Building the Digital Collection

Conservation Efforts

The opportunity to digitize the African American Pamphlet Collection also became an opportunity for conservation of the collection. The African American Pamphlet Collection is composed primarily of bound and unbound materials containing single or multiple sections; complete pamphlets range in length from four to over one hundred pages, including one uncut quarto. The most common method of binding consists of metal staples either punched through the fold of the sections or punched vertically through all folios at approximately one-half centimeter from the spine edge. A substantial number of pamphlets were hand-sewn, with a single or double loop introduced vertically through the folios. These bindings made scanning problematic for those pamphlets containing brittle paper. If these pamphlets were opened to 90 degrees, the minimum aperture necessary to capture all information on a page, they could be damaged. In fact, brittle paper, dirt, losses, tears, or restricted or tight openings caused by the binding method rendered forty-two items from the African American Pamphlet Collection damaged or at risk of damage and in need of pre-scanning treatment. None of these items had received previous treatment of paper or binding. The pH of these items ranged between 2.5 and 6.0, but was generally between 4.5 and 5.0.

It was essential to rebind those pamphlets that could not safely be opened, substituting a more freely opening and flexible structure. Single-section pamphlets were given a simple stitch through the fold. Multiple-section items received a traditional long stitch through the fold and were sewn through a text-block wrapper of heavy handmade (Barrett) paper, a process that also created a new protective cover. The embrittled folios often had to be guarded with a lightweight Japanese paper because they retained little flexibility at the original point of fold, and breakage could occur with repeated opening of the materials. When a fold had already broken because of extreme degradation of the paper, a very wide Japanese paper guard was used to absorb all stress and flex when the item was opened. This step changed the dimensions of the original item slightly but was necessary to ensure safe handling during the scanning process.

In instances where any handling, no matter how careful, would result in loss and breakage, individual pamphlet pages were encapsulated in polyester film (trade-named Mylar), though this method was avoided whenever possible. From a digital perspective, encapsulated items can cause scanning problems, creating a halo effect in digital images. From a researcher's perspective, Mylar can reflect the glare from overhead lights, making reading difficult. Encapsulation also places an intrusive barrier between the researcher and the item. In some cases, only the covers of the pamphlets were encapsulated in polyester film. Typically, if the pamphlets have original covers, they are of a paper quite different in quality from that of the text block--usually a thicker colored paper that becomes much weaker over time than the pages contained within. For these items, a Mylar jacket was designed for the covers and incorporated into the sewing structure; the text block was not encapsulated.

All African American Pamphlet Collection items treated by the Conservation Laboratory were deacidified and buffered using either aqueous (calcium hydroxide or magnesium bicarbonate solutions) or non-aqueous (Bookkeeper deacidification spray) methods; the choice depended on the item and its condition. Tears and losses were repaired using Japanese papers and wheat starch paste as well as heat-set tissue, toned with acrylics or pastels to blend with the original materials. Any materials left vulnerable after treatment received new housing using polyester sleeves or wrappers of buffered card stock.

Creation of Digital Images

The digital images that make up From Slavery to Freedom: The African-American Pamphlet Collection, 1822-1909 were created at the Library of Congress by Systems Integration Group of Lanham, Maryland, using a variety of capture devices.

Most pages were scanned as bitonal TIFF images using a Bookeye overhead scanner. Pages with illustrations and significant handwriting were scanned as 8-bit grayscale TIFFs. Color graphs and color covers that were too dark for bitonal scanning were scanned as 24-bit color TIFFs. Grayscale and color scanning was performed on two devices. Small disbound or single-leaf items that could be inverted were scanned on a UMAX 11 x 17-inch flatbed scanner. Items that could not be inverted or that were larger than 11 x 17 inches were scanned with a PhaseOne Photophase Plus digital camera back mounted on an overhead Toyo 4 x 5-inch studio camera.
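The selection rules in the preceding paragraph can be summarized as a small decision function. The sketch below is illustrative only and does not describe any software used in the project. The Page fields, the function name choose_capture, the rule that color takes priority over grayscale, and the test that an item "fits" the flatbed when its shorter side is at most 11 inches and its longer side at most 17 inches are all assumptions made for the example.

```python
from dataclasses import dataclass

# Flatbed platen size stated in the text (11 x 17 inches).
FLATBED_SHORT_IN = 11.0
FLATBED_LONG_IN = 17.0


@dataclass
class Page:
    """Hypothetical description of a page; field names are assumptions."""
    width_in: float
    height_in: float
    has_illustration_or_handwriting: bool = False
    needs_color: bool = False       # color graph, or cover too dark for bitonal
    can_be_inverted: bool = False   # small disbound or single-leaf item


def choose_capture(page: Page) -> tuple[str, str]:
    """Return (image type, capture device) following the rules in the text."""
    if page.needs_color:
        mode = "24-bit color TIFF"
    elif page.has_illustration_or_handwriting:
        mode = "8-bit grayscale TIFF"
    else:
        # Most pages: bitonal capture on the overhead scanner.
        return ("bitonal TIFF", "Bookeye overhead scanner")

    # Grayscale and color work was split between two devices.
    short_side, long_side = sorted((page.width_in, page.height_in))
    fits_flatbed = short_side <= FLATBED_SHORT_IN and long_side <= FLATBED_LONG_IN
    if page.can_be_inverted and fits_flatbed:
        device = "UMAX flatbed scanner"
    else:
        device = "PhaseOne digital camera back on Toyo studio camera"
    return (mode, device)
```

For instance, a bound pamphlet page bearing significant handwriting cannot be inverted, so under these assumptions it would be captured as an 8-bit grayscale TIFF with the overhead camera.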

The average file size for bitonal TIFFs is approximately 50 KB, while grayscale or color TIFFs are 8-20 MB. All uncompressed grayscale and color images were also delivered as JPEG files. The National Digital Library team used Image Alchemy to create the GIF files featured in the Page Image Viewer.

Creation of Digitized Text

The preparation of the searchable texts occurred offsite, where a subcontractor rekeyed the documents from the page images. The texts have a transcription accuracy of approximately 99.95 percent and are marked up with SGML (Standard Generalized Markup Language). The online presentation offers access to the SGML versions of the texts for users with appropriate viewing software. The text was transformed with an OmniMark program to HTML 3.2 for indexing and viewing with Web browsers, including links from the text to the page images. MARC records were enhanced to provide access to the collection, with links either to the full text or to page images. In addition, a page-turning feature allows consecutive viewing of the page images.

In the transcriptions, capitalization and spelling reflect those of the original document. Where the text was illegible or otherwise untranscribable, a bracketed statement of omitted text appears, with a number followed by the letter 'w' to indicate the number of omitted words. For example, "{Omitted text, 2w}" means that two words have been omitted. If the illegible or untranscribable text is less than one word, a question mark replaces each omitted character. Thus, "But what if Mr. Greeley should, notwithstanding his candid??y and election, remain miraculously unchanged?" means that two letters in the ninth word have been omitted.
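A reader working with the transcriptions programmatically might want to tally these omission markers. The following is a minimal sketch, not part of the collection's tooling. The function name count_omissions is hypothetical, and the character-level pattern is a heuristic assumption: it counts only question marks that sit between two letters, so a sentence-ending question mark is ignored, but an omitted character at the very start or end of a word would be missed as well.

```python
import re

# Word-level marker as described in the text, e.g. "{Omitted text, 2w}".
OMITTED_WORDS = re.compile(r"\{Omitted text, (\d+)w\}")

# Heuristic (assumption): question marks flanked by letters stand for
# omitted characters; ordinary end-of-sentence question marks do not match.
OMITTED_CHARS = re.compile(r"(?<=[A-Za-z])\?+(?=[A-Za-z])")


def count_omissions(text: str) -> dict[str, int]:
    """Count omitted words and omitted characters flagged in a transcription."""
    words = sum(int(m.group(1)) for m in OMITTED_WORDS.finditer(text))
    chars = sum(len(m.group(0)) for m in OMITTED_CHARS.finditer(text))
    return {"omitted_words": words, "omitted_characters": chars}


sample = ("But what if Mr. Greeley should, notwithstanding his candid??y "
          "and election, remain miraculously unchanged?")
result = count_omissions(sample)
```

Applied to the example sentence above, the function reports two omitted characters and no omitted words, because the final question mark is not followed by a letter.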

