Scanning the Printed Material
Paper-based printed documents in Puerto Rico at the Dawn of the Modern Age: Nineteenth- and Early-Twentieth-Century Perspectives were digitized by Systems Integration Group (SIG) of Lanham, Maryland. Each item was reproduced as facsimile page images. The image capture took place at the Library of Congress. In order to preserve the originals, bound works were scanned face-up in their bindings, one page at a time. The master or archival version of the textual pages (containing typography and line art) is a 300-dots-per-inch (dpi) bitonal image in the TIFF format, with ITU Group IV compression. Pages with printed halftone illustrations, finely detailed line drawings, or pages with significant color, including book covers, were captured as 8-bit grayscale or 24-bit color images, as appropriate, and stored in the JFIF image format (with JPEG compression). Books containing bitonal text pages and no illustrations were scanned using the Minolta PS3000. Books containing grayscale illustrations were scanned using the Toyo 4x5 inch studio camera with a Phase One Photophase Plus digital camera back.
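The resolutions and bit depths above translate directly into pixel dimensions and uncompressed file sizes. The following is a minimal sketch of that arithmetic. The 6 x 9 inch page size is a hypothetical value chosen for illustration, and applying 300 dpi to the grayscale and color cases is also an assumption, since the text states a resolution only for the bitonal masters. Compressed TIFF and JFIF files would be considerably smaller than these figures.

```python
def uncompressed_size(width_in, height_in, dpi, bits_per_pixel):
    """Return (width_px, height_px, bytes) for an uncompressed page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return width_px, height_px, (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical 6 x 9 inch page (not a figure from the text).
PAGE_W, PAGE_H = 6, 9

bitonal = uncompressed_size(PAGE_W, PAGE_H, 300, 1)     # 1 bit per pixel
grayscale = uncompressed_size(PAGE_W, PAGE_H, 300, 8)   # 8-bit grayscale
color = uncompressed_size(PAGE_W, PAGE_H, 300, 24)      # 24-bit color

for name, (w, h, size) in [("bitonal", bitonal),
                           ("grayscale", grayscale),
                           ("color", color)]:
    print(f"{name}: {w} x {h} pixels, {size:,} bytes before compression")
```

For the same page, an 8-bit grayscale image holds eight times as much raw data as the bitonal image, and a 24-bit color image holds twenty-four times as much.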
The browser-display images for all document pages are in the GIF format. The staff produces these images by processing batches of the master or archival images. When bitonal images are being processed, gray tones are added and the resulting image is blurred to mimic grayscale. Then the image is reduced in scale to fit the typical display monitor and sharpened to enhance legibility. When the source image is grayscale, only rescaling and sharpening are undertaken to create the GIF image.
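The sequence of steps described above can be sketched in a few lines of code. This is an illustration only, not the Library's actual tooling: the text does not name the software, filters, or parameters used for this batch processing, so the 3x3 box blur, the block-averaging reduction, the unsharp-mask sharpening, and every function name here are assumptions. Images are represented as plain lists of rows, and the final step of writing a GIF file is left out.

```python
def bitonal_to_gray(bits):
    """Map 1-bit pixels (1 = ink, 0 = paper) to 8-bit gray (0 = black)."""
    return [[0 if b else 255 for b in row] for row in bits]


def box_blur(img):
    """3x3 box blur; edge pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out


def downscale(img, factor):
    """Reduce size by an integer factor by averaging factor x factor blocks."""
    h, w = len(img) // factor, len(img[0]) // factor
    area = factor * factor
    return [[sum(img[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor)) // area
             for x in range(w)]
            for y in range(h)]


def sharpen(img, amount=1.0):
    """Unsharp mask: push each pixel away from its blurred neighbourhood."""
    blurred = box_blur(img)
    return [[max(0, min(255, round(p + amount * (p - b))))
             for p, b in zip(row, blurred_row)]
            for row, blurred_row in zip(img, blurred)]


def make_display_image(img, factor, is_bitonal):
    """Bitonal sources gain gray tones and a blur first; grayscale sources
    are only rescaled and sharpened."""
    if is_bitonal:
        img = box_blur(bitonal_to_gray(img))
    return sharpen(downscale(img, factor))
```

The order matters in this sketch: blurring the bitonal image before reduction produces intermediate gray values along character edges, which is what lets a much smaller image remain legible once it is sharpened.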
Materials in Puerto Rico at the Dawn of the Modern Age: Nineteenth- and Early-Twentieth-Century Perspectives that were digitized from microfilm include the pamphlets and the periodical Repertorio Historico de Puerto-Rico as well as two monographs, Puerto Rico y Su Historia: Investigaciones Criticas and Historia de la Insurreccion de Lares. For optimal capture of detail, a duplicate negative was printed directly from the master microfilm and prepared for scanning by Preservation Resources of Bethlehem, Pennsylvania. Preservation Resources captured the digital images as 600-dpi bitonal images saved in TIFF format, with ITU Group IV compression.
Preservation Resources also created GIF files for quick online access to the microfilm items in this collection. These images were derived from the bitonal TIFF files or the grayscale TIFF files during the post-processing phase of production using Image Alchemy image-processing software.
Creating the Searchable Text
After the images were approved by the Library, searchable texts were prepared offsite, where a subcontractor rekeyed the documents from the page images. These typescript materials were converted to machine-readable form at an accuracy rate of 99.95% and encoded with Standard Generalized Markup Language (SGML), according to the American Memory Document Type Definition (DTD). This DTD is a markup scheme that conforms to the guidelines of the Text Encoding Initiative (TEI), the work of a consortium of scholarly institutions. The online presentation of the texts also includes a version in HTML (HyperText Markup Language), produced by the Library in an automated process. Because it requires no special software, the HTML version is easier for most users to access.
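To give a sense of what a 99.95% accuracy rate permits, the following minimal sketch works through the arithmetic. It assumes the rate is measured per character, which the text does not state, and the function names are illustrative only.

```python
from fractions import Fraction

# Accuracy rate quoted in the text, held as an exact fraction (1999/2000).
ACCURACY = Fraction("99.95") / 100


def max_errors(char_count, accuracy=ACCURACY):
    """Largest number of wrong characters that still meets the rate."""
    return int(char_count * (1 - accuracy))


def meets_rate(char_count, errors, accuracy=ACCURACY):
    """True if a text with this many errors is at or above the rate."""
    return Fraction(char_count - errors, char_count) >= accuracy


print(max_errors(10_000))     # errors allowed in 10,000 characters
print(meets_rate(10_000, 6))  # one error too many
```

Under this assumption, the rate allows one incorrect character in every 2,000, or five in every 10,000.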