Technical Considerations:
Digitizing the Winslow Family Papers

The digitized version of the Winslow Family Papers represents more than five years of daring and dreaming, technical challenges and triumphs, and old-fashioned hard work in the life of the Electronic Text Centre.

As a pilot project, the Winslow Family Papers and the Electronic Text Centre's digital imaging operations turned out to be a good match. Experts in digitization typically recommend that a first conversion project draw on a body of material that has high information value; is cohesive, manageable, doable, and affordable; and is not covered by copyright or other access restrictions. Such material should be heavily used or expected to be. Do not, they stress, underestimate the value of learning by doing and advancing incrementally. By those measures the Winslow manuscript collection appeared less than manageable, doable, or affordable: it consists of more than 2500 documents totalling in excess of 11,000 pages, of varying sizes and conditions, with a massive print index only partially digitized at the time, and the project had no external funding. Although hindsight is 20/20, sometimes it is best not to know then what one knows now. We had only a partial sense of the challenges lying ahead, motivated as we were by the sheer import of these manuscripts and the youthful enthusiasm that comes with exploring uncharted territory.

In 1998 UNB Libraries' Electronic Text Centre purchased a PhaseOne 4" X 5" digital camera back, mounted to a copy stand, and began offering high-resolution digital imaging services to various University units. It was natural that we should develop a close working relationship with UNB Archives, whose rich collection of manuscripts spurred our initial interest in the research and development of digital capture and preservation techniques. Around the same time the Electronic Text Centre was designing Web-accessible database prototypes based on multimedia metadata schemas developed under contract to Industry Canada's SchoolNet. The Winslow project was to be our first serious coupling of the digital and the archival, pulling together the ETC's digital imaging and database development activities in what we hoped would be an ongoing, fruitful partnership.

The Winslow Family Papers are a lively and impassioned account of Loyalist life during the American Revolution and beyond, in which words pour forth onto the page with the force of blood. Edward Winslow's collection of letters, diaries, reports, memos, ledgers, invoices, and other papers contains some of the most immediate and important documents on Atlantic Canadian and American history. Though the immensity of the undertaking was apparent from the outset, the possibility of making such a resource available to people around the world was all the motivation we needed to face challenges with optimism and determination.

Although personnel changes were many and procedures took months - in some cases, years - to establish and perfect, there is in the end a consistency of output that attests to the importance of monitoring industry trends and keeping close watch on workflow.

Technical Specifications

Digital masters

Master archival image files were captured in full colour (24 bit RGB) at a resolution of 300 pixels per inch (ppi), also expressed as dots per inch (dpi). Following the recommended best practices of leading cultural heritage preservation institutions (for example, Cornell University's Department of Preservation and Collection Maintenance and Canadian Heritage's Guidelines for Creating and Managing Digital Content), tonal scale and colour balance controls were set prior to image capture so that digital surrogates would be as true to the appearance of the original documents as possible and adjustments during processing would be minimized. Images were sharpened as needed during image processing to approximate the appearance of the original. All sharpening was effected with an unsharp mask algorithm. Master image files were stored as uncompressed TIFF files (Intel byte order, header version 6). File naming follows established conventions at the University of New Brunswick for effective management of digital image collections: we use unique alphanumeric file names modeled on the physical organization of the source documents (volume, number, and pagination).
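
As an illustration of the unsharp mask technique named above, the following is a minimal sketch in Python, not the software used on the project. It assumes a greyscale image held as a list of rows of 0-255 integers, and it substitutes a simple 3 x 3 mean blur for the Gaussian blur that imaging software normally applies. The function names and the `amount` parameter are hypothetical. The principle is the same in either case: subtract a blurred copy from the original and add the difference back, which exaggerates local contrast at edges.

```python
def box_blur(img):
    """3x3 mean blur; border pixels are replicated so the output keeps its size.

    Assumption: a stand-in for the Gaussian blur used by real imaging tools.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def unsharp_mask(img, amount=1.0):
    """Return a sharpened copy: original + amount * (original - blurred)."""
    blurred = box_blur(img)
    result = []
    for y, row in enumerate(img):
        new_row = []
        for x, value in enumerate(row):
            sharpened = value + amount * (value - blurred[y][x])
            # Clip to the valid 8-bit range.
            new_row.append(int(min(max(round(sharpened), 0), 255)))
        result.append(new_row)
    return result
```

On a flat area the blurred copy equals the original, so nothing changes; only pixels near an edge are pushed apart, which is why the effect reads as sharpening.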

Web surrogates

Web surrogates for use in on-line delivery were derived from the master archival TIFF files. The format for these files is JPEG (24 bit RGB), a flexible, compressed format and recognized industry standard for the Web presentation of textual and photographic documents. In order to improve networked access and use of the images, resolution was reduced to 72 dpi. Additional surrogates in the form of thumbnails are also used for Web access. Thumbnails are in JPEG format at a resolution of 72 dpi with reduced dimensions of 150 pixels in width for landscape images and 150 pixels in height for portrait images. As with the master archival image files, file naming follows established conventions for effective management of University of New Brunswick digital image collections. The university has ensured that image files can be identified with a persistent URL to enable reliable citation, cross-linking, and integrated access.
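
The thumbnail rule above reduces to simple arithmetic on pixel dimensions. The sketch below is illustrative only: the function name `thumbnail_size`, the treatment of square images as landscape, and the rounding of the shorter side are assumptions rather than documented project practice.

```python
def thumbnail_size(width, height, target=150):
    """Scale (width, height) so the longer side becomes `target` pixels.

    Landscape images get a width of `target`; portrait images get a
    height of `target`. Square images are treated as landscape (assumption).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width >= height:
        # Landscape: fix the width, scale the height proportionally.
        return target, max(1, round(height * target / width))
    # Portrait: fix the height, scale the width proportionally.
    return max(1, round(width * target / height)), target
```

For example, a portrait master of 2400 x 3000 pixels (an 8 x 10 inch page at 300 ppi) would yield a thumbnail of 120 x 150 pixels.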

Image archiving

Master images (TIFFs) are archived to CD-R, two copies each - one for University of New Brunswick Archives and one kept onsite with the ETC. Masters are also backed up on a high-performance server cluster maintained jointly by the ETC and UNB's Advanced Computation Research Laboratory, while surrogates (JPEGs) have been uploaded to a Linux Web server running Apache Web server software.

Metadata creation

Metadata descriptions have been created at collection, document, and component image levels according to the Electronic Text Centre's extended Dublin Core metadata schema (http://www.lib.unb.ca/Texts/metadata.html). The project follows a Dublin Core framework with relevant terminology standards and controlled vocabularies in creating rich and highly portable metadata records.
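
To give a sense of what a Dublin Core description looks like in practice, here is a minimal sketch that serializes a few elements as XML using Python's standard library. It uses only unqualified elements from the standard Dublin Core namespace; the ETC's extensions to the schema are not shown. The `record` wrapper element, the function name `build_record`, and every field value are placeholders invented for illustration, not actual Winslow records.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build a <record> element holding one dc:* child per (name, value) pair.

    The <record> wrapper is a hypothetical container, not part of Dublin Core.
    """
    record = ET.Element("record")
    for name, value in fields:
        element = ET.SubElement(record, "{%s}%s" % (DC_NS, name))
        element.text = value
    return record


# Placeholder values only; a real record would come from a cataloguer.
sample = build_record([
    ("title", "Placeholder title"),
    ("creator", "Placeholder creator"),
    ("type", "Text"),
    ("identifier", "placeholder-identifier"),
])
xml_text = ET.tostring(sample, encoding="unicode")
```

Because each element carries the shared Dublin Core namespace, a record of this kind can be read by other systems without knowledge of the database that produced it, which is what makes such metadata portable.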

Database design and implementation

Project cataloguers have created metadata descriptions using custom Web-accessible editors that interface with a MySQL database. MySQL is an open-source database designed for speed and flexibility in heavy load use. The ETC's MySQL image database resides on a Unix (Linux) Server running Apache Web server software and is used for storing and delivering Dublin Core-compliant metadata records as well as linking them to associated image files.
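
The linking of metadata records to image files can be sketched with two related tables. The example below is an assumption-laden illustration, not the ETC's schema: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the table names, column names, and helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per document, one row per page image.
SCHEMA = """
CREATE TABLE document (
    id         INTEGER PRIMARY KEY,
    identifier TEXT NOT NULL UNIQUE,
    title      TEXT,
    creator    TEXT,
    date       TEXT
);
CREATE TABLE image (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),
    page        INTEGER NOT NULL,
    jpeg_path   TEXT NOT NULL
);
"""


def open_database(path=":memory:"):
    """Open a database (in memory by default) and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, identifier, title, creator, date, jpeg_paths):
    """Insert a document and its page images, numbered from 1 in order."""
    cur = conn.execute(
        "INSERT INTO document (identifier, title, creator, date) "
        "VALUES (?, ?, ?, ?)",
        (identifier, title, creator, date),
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO image (document_id, page, jpeg_path) VALUES (?, ?, ?)",
        [(doc_id, page, path) for page, path in enumerate(jpeg_paths, start=1)],
    )
    conn.commit()
    return doc_id


def images_for(conn, identifier):
    """Return (page, jpeg_path) pairs for a document, in page order."""
    rows = conn.execute(
        "SELECT image.page, image.jpeg_path FROM image "
        "JOIN document ON document.id = image.document_id "
        "WHERE document.identifier = ? ORDER BY image.page",
        (identifier,),
    )
    return rows.fetchall()
```

Keeping descriptive fields in one table and image paths in another lets a single document record point to any number of page images, which suits a collection in which documents vary widely in length.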

Marc Bragdon
University of New Brunswick
