The Library of Congress / Ameritech National Digital Library Competition (1996-1999)
Lessons Learned: Intellectual Access (and other types of metadata)
Several of the projects have taken advantage of existing
machine-readable (often MARC) bibliographic descriptions. For other
projects, descriptive records are being prepared in conjunction with the
digitization. In most cases, relational databases (such as FileMaker Pro
and Microsoft Access) are in use for data preparation and maintenance;
metadata beyond that needed to provide intellectual access is usually
recorded in the same database. Information to support the ongoing
management of the resources (original and digital) is often included.
Some institutions discovered a need to record unanticipated information
relating to the structure of a converted item in order to support
convenient access and navigation for users. In addition to exporting the
data to LC for indexing within American Memory, several projects export
the descriptive records for integration into other delivery systems at the
host institution.
Decisions to change the level and sources of descriptive terminology
were discussed in some reports.
- Ohio Historical Society
- Online Collection: The African-American Experience in Ohio, 1850-1920
- At OHS, a database in Microsoft Access is being used both for data preparation and to support dynamic web-based presentation of items.
Copies of the descriptive records will be sent to LC for integration into American Memory, but OHS is also providing a local search interface.
In the first interim report, George Parkinson commented that the OHS pilot project led them to
- extend the initial set of fifty Library of Congress Subject Headings selected (and built into the database for authority control) to provide more specific subject access.
- record additional information in the database, including the
number of pages in a scanned item, and dates for individual items scanned
from newspapers and serials.
- In the second interim report, Parkinson identified two
issues that arose during scanning. The first relates to the occurrence of
inconsistencies in the numbering of issues and volumes of newspaper
titles. While this does not necessarily cause problems in terms of
searching or retrieving articles, it is problematic because the
- directory structure is based in part on volume and issue information; thus, duplicates cause confusion for project staff and create difficulties for storage.
- the numbering problems...confuse patrons [who would]
encounter difficulties in looking for an article in volume 16, issue 20,
but not find it because it was actually in the next issue, which was also
volume 16, issue 20! However, there were also concerns about changing or
altering the original citation.
- Since the date is generally considered much more reliable than the volume and issue information, LC staff and OHS decided:
- to keep the issues with duplicated numbers in a separate directory structure [while ensuring user access would be seamless].
- to add a browse-by-date option, which should allow more
accurate retrieval of issues. Patrons will be able to view a list of
newspaper articles in date order for each newspaper. Every entry, which
will hyperlink to the item, will give the original date and
volume/issue information. This presentation should make it clear to the
patron that there are duplicate issues. We will also indicate what the
"correct" issue number should be.
- The second issue relates to the existence of blank pages in
manuscript material, such as notes and minutes in ledger books. The
question is whether it would be acceptable to scan only those pages with
pertinent material, then indicate that other blank pages were not
scanned. There are established standards and procedures for other types
of reformatting projects. For example, when microfilming books, every
page is microfilmed regardless of content or condition. There are no such
standards for digital images yet. OHS and LC staff decided that there is
no need to present (or scan) all the blank pages, but that some indication
as to their existence is recommended.
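The browse-by-date decision described above can be illustrated with a minimal Python sketch. It is not the OHS implementation: the `Issue` record, the `browse_by_date` function, the newspaper title, and the dates are all hypothetical, and only the "volume 16, issue 20" duplication comes from the report. The sketch sorts issues by date and flags any volume/issue pair that appears more than once, so a listing can show patrons that the printed numbering is duplicated.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for one scanned newspaper issue."""
    title: str
    published: date
    volume: int
    number: int  # issue number exactly as printed on the original


def browse_by_date(issues):
    """Return display rows in date order, flagging reused volume/issue numbers."""
    # Count how often each (title, volume, issue) combination occurs.
    counts = Counter((i.title, i.volume, i.number) for i in issues)
    rows = []
    for i in sorted(issues, key=lambda i: (i.title, i.published)):
        duplicated = counts[(i.title, i.volume, i.number)] > 1
        rows.append((i.published.isoformat(), i.volume, i.number, duplicated))
    return rows


# Invented sample data: two consecutive issues both printed as vol. 16, no. 20.
sample = [
    Issue("Example Gazette", date(1900, 5, 12), 16, 20),
    Issue("Example Gazette", date(1900, 5, 5), 16, 20),
    Issue("Example Gazette", date(1900, 4, 28), 16, 19),
]

for row in browse_by_date(sample):
    print(row)
```

Because the date is the sort key and the printed numbering is carried along unchanged, the original citation is preserved while the duplicate flag tells the presentation layer where a note about the "correct" issue number is needed.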
- University of Chicago
- Online Collection: American Environmental
Photographs, 1897-1931
- Item-level descriptions are being prepared for 5,800
photographic images. Sources of information about the images include the
card file that came to the library with the collection and a variety of
notebooks, hand-written notes, and archival records.
- In the first interim report, Alice Schreyer emphasized:
- In addition to the sources of subject authorities mentioned
specifically in the proposal (LC Subject Headings, LC Thesaurus for
Graphic Materials, and the Art and Architecture Thesaurus), the team
decided to add two categories of botanical name (scientific and common),
and terminology from the 1890-1920 period as found in the published work
of Henry Chandler Cowles and John Merle Coulter. For terms describing
physical geography, two published reference works were used.
- Up to 37 data fields are used to record descriptive and
administrative data for items, including notes on damage revealed during
the inspection and inventory process. Programs have been written to
extract data from the database using ODBC and SQL (Open Database
Connectivity / Structured Query Language) and tag it using SGML according
to a simple document type definition (DTD). Export in another format is
planned for use with software developed locally for use in the ARTFL
project (American and French Research on the Treasury of the French
Language).
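The extract-and-tag step described above might look roughly like the following Python sketch. It rests on several assumptions: the standard-library `sqlite3` module stands in for an ODBC connection so the example runs anywhere, the `items` table and its four columns are invented (the report mentions up to 37 fields but does not name them), and the SGML element names are hypothetical rather than taken from the project's DTD.

```python
import sqlite3
from html import escape

# Hypothetical (database column, SGML element) pairs; the real DTD is not
# given in the report. Element names are kept short and free of underscores.
FIELD_MAP = (
    ("item_id", "itemid"),
    ("title", "title"),
    ("scientific_name", "sciname"),
    ("damage_note", "damage"),
)


def record_to_sgml(row):
    """Wrap one database row in SGML tags, skipping empty fields."""
    lines = ["<record>"]
    for column, element in FIELD_MAP:
        value = row[column]
        if value is None or value == "":
            continue
        # Escape &, < and > so field text cannot be mistaken for markup.
        text = escape(str(value), quote=False)
        lines.append(f"  <{element}>{text}</{element}>")
    lines.append("</record>")
    return "\n".join(lines)


def export(conn):
    """Run a SQL query and return one tagged record per row."""
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    columns = ", ".join(column for column, _ in FIELD_MAP)
    cursor = conn.execute(f"SELECT {columns} FROM items ORDER BY item_id")
    return [record_to_sgml(row) for row in cursor]


# In-memory stand-in for the project database, with one invented record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (item_id TEXT, title TEXT, "
    "scientific_name TEXT, damage_note TEXT)"
)
conn.execute(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    ("ex-0001", "Dunes & shoreline", "Ammophila", None),
)

for record in export(conn):
    print(record)
```

Keeping the column-to-element mapping in one table means a second export format, such as the one planned for the ARTFL software, could reuse the same query with a different tagging function.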
- University of Chicago
- Online Collection: First American West
- The two collections for which the University of Chicago won
awards differed in two major ways. The first was a homogeneous
collection of photographs; the second was a heterogeneous set of
materials in different physical formats, selected from the
collections of two institutions.
- At the end of the second project, Alice Schreyer recommended:
- Examples of all possible types of materials to be
digitized should be looked at early on and an assessment made of the
challenges and needs of each format. For the FAW project there were many
formats (pictures, books, manuscripts, maps, objects, etc.). Some
required special cataloging attention for their digital versions, such as
multivolume works. A clear understanding upfront of existing descriptive
metadata that will be needed for selections and of the level of
description that is desired for Internet access to digital versions would
assist in planning a workflow for this part of the project.