A2. Difficulty accessing content of a scanned document

Situation Definition:

A situation that arises from difficulty reading a digitized document because of poor quality in scanning or processing.

Factor(s) Leading to the Situation:

    • Inadequate support:

    • Inadequate support:

Guideline or Design Recommendation:

    1. Ensure the full text of a scanned document is readable
    2. Provide essential information for a scanned document with readability issues

Rationale and Objective:

A scanned document is a convenient information source for users. One issue results from poor content quality, which makes a transcript difficult for screen reader software to recognize (1). The guidelines and techniques presented here must largely be implemented at the time an item is digitized. If rescanning poorly digitized items with better methods is not possible, then manual editing and correction of transcripts can be helpful (2).

Techniques and Methods:

1.1. Choose the most appropriate and effective software to scan documents based on their content (e.g., OCR for printed or typed text, HTR for handwritten text, missing letter prediction, natural language, and Adobe software for converting Microsoft Office files to PDF files)
1.2. Set the OCR or HTR software and/or hardware for the highest quality recognition possible (See example 1.2.a and 1.2.b)
1.3. Scan materials at high resolution (more than 600 dpi)
1.4. Establish quality control procedures for the digitization process
1.5. Use the Text-to-speech function (e.g., “Read out loud” function in PDF) to test the readability of the scanned document
1.6. Manually review and correct poor-quality items generated through OCR or HTR
1.7. Transcribe original sources manually if character recognition software is not available or effective
1.8. Add PDF tags to PDF files so that their content can be accessed (See example 1.8)
2.1. Provide summaries and keywords for documents with poor quality transcripts generated by OCR or HTR
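
One possible way to support the quality control in techniques 1.4 and 1.6 is to compare a sample of OCR or HTR output against a manually corrected transcript and compute a character error rate. The sketch below is illustrative only: the function names and the 5% review threshold are assumptions, not part of the guideline.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest single-character edits between two strings."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # add a hypothesis character
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the corrected transcript."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical threshold; choose one that suits the collection.
REVIEW_THRESHOLD = 0.05


def needs_manual_review(reference: str, hypothesis: str,
                        threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag an item when its sampled error rate exceeds the threshold."""
    return character_error_rate(reference, hypothesis) > threshold
```

For example, `needs_manual_review("Dear Sir", "Dcar 5ir")` flags the item, because two of the eight characters were misrecognized.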

Recommended Features:

1.1. Document scanning and transcript generation software (e.g., OCR software for print or typed text, HTR software for handwritten text) (See example 1.1.a and 1.1.b)
1.5. Text-to-speech function (See example 1.5)
1.7. Manual creation of a document transcript (See example 1.7)
2.1. Summaries and keywords (See example 2.1)

Examples:

1.1.a. Appropriate document scanning: How-to example and Good design

Example of a transcript generated for a handwritten letter. University of Cambridge, Darwin Correspondence Project. Retrieved from https://www.darwinproject.ac.uk/letter/DCP-LETT-9247.xml

Handwritten letter transcript

1.1.b. Appropriate document scanning: How-to example

Check PDF accessibility using a PDF accessibility checker such as PAVE

Instructions for PDF accessibility:

    1. Upload your PDF document to PAVE. Please note: The maximal allowed file size is 5 megabytes.
    2. PAVE will make the automatic corrections.
    3. Simply make the remaining corrections yourself in PAVE.
    4. Now you can download the accessible PDF document. The PDF document will remain on the PAVE server for a maximum of three weeks, unless you delete it manually beforehand.

1.2.a. Set high-quality recognition: Bad design

Text too small to read. Ensure the full text of a scanned document is readable by using high-resolution scanning.

Letter to the General Assembly in which scanned letter is too small for page and difficult to read
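
The resolution target in technique 1.3 can be checked with simple arithmetic: effective resolution is the pixel dimension divided by the physical dimension in inches. The sketch below assumes the physical page size is known. The function names are hypothetical, and treating 600 dpi as a passing floor is an assumption (the guideline asks for more than 600 dpi).

```python
MIN_DPI = 600  # target from technique 1.3; treating it as a floor is an assumption


def effective_dpi(pixel_width: int, pixel_height: int,
                  width_in: float, height_in: float) -> float:
    """Return the lower of the horizontal and vertical pixels-per-inch values."""
    if width_in <= 0 or height_in <= 0:
        raise ValueError("physical dimensions must be positive")
    return min(pixel_width / width_in, pixel_height / height_in)


def meets_resolution_target(pixel_width: int, pixel_height: int,
                            width_in: float, height_in: float,
                            min_dpi: float = MIN_DPI) -> bool:
    """True when the scan reaches the target resolution in both directions."""
    return effective_dpi(pixel_width, pixel_height, width_in, height_in) >= min_dpi
```

A US Letter page (8.5 by 11 inches) scanned to 5100 by 6600 pixels works out to 600 dpi, while the same page at 2550 by 3300 pixels is only 300 dpi and would fall short of the target.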

1.2.b. Set high-quality recognition: How-to example

Disable OCR Fast mode option (recommended for poor quality material) (OCLC, 2017)

OCR options menu with Fast mode checkbox enabled (Fast mode offers increased speed but less accurate OCR text.)

1.5. Text to Speech function: Good design

This DL provides a Text to Speech function and also shows which text is currently being read by highlighting the sentence and word.

ReadSpeaker toolbar visible on About this Collection page with highlighted text demonstrating what is currently being read

1.7. Manual creation of a document transcript: Good design

Provide manual transcripts for handwritten documents

A transcript of a handwritten letter

1.8. Put PDF tags into a file: How-to example

Create PDF tags in an untagged PDF file to provide functionality similar to HTML tags for header navigation (For more instructions, see https://webaim.org/techniques/acrobat/acrobat or https://www.adobe.com/accessibility/products/acrobat/pdf-repair-add-tags.html)

Menu walkthrough to access PDF tags: Go to View, Show/Hide, Navigation Panes, and Tags
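
Before opening each file in Acrobat, a batch of PDFs can be screened roughly for missing tags. A tagged PDF normally has a `/StructTreeRoot` entry and a `/MarkInfo` dictionary with `/Marked true` in its document catalog. The sketch below only searches the raw bytes for those markers, so it is a heuristic under stated assumptions: files that store the catalog in compressed object streams can be reported as untagged even when they are tagged, and the function name is hypothetical. A full accessibility checker remains the reliable test.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: report True when both tagged-PDF markers appear in the raw bytes."""
    has_structure_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_structure_tree and is_marked


def looks_tagged_file(path: str) -> bool:
    """Read a PDF from disk and apply the same byte-level heuristic."""
    with open(path, "rb") as handle:
        return looks_tagged(handle.read())
```

Files for which `looks_tagged_file` returns `False` are candidates for the tagging steps described above.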

2.1. Provide a summary and keywords for the scanned document: Good design

Even if there is no text or text is hard to recognize, it is good to provide a brief summary and description like the following example.

Example of a map titled "A survey of the roads of the United States of America", which includes a short summary: "Relief shown by hachures. Orientation varies. Phillips, 1326 Maps no. 34-39 are believed to never have been engraved as they are wanting in all known copies. Available also through the Library of Congress Web site as a raster image. On original paper cover in manuscript ink: "Collis's plan of the roads throughout the United States." Accompanied by broadside: Proposals for publishing A survey of..."
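
To decide which items most need a summary and keywords, one rough option is to estimate how much of a transcript consists of plausible words. The sketch below is a heuristic only. It assumes English-language text, its function names are hypothetical, and the 0.8 threshold is an arbitrary starting point that is not taken from the guideline.

```python
import re
import string

# A token counts as plausible if it is letters only (allowing inner
# apostrophes or hyphens) or a plain number such as 1326 or 3.5.
PLAUSIBLE_TOKEN = re.compile(r"^(?:[A-Za-z]+(?:['-][A-Za-z]+)*|\d+(?:[.,]\d+)*)$")


def plausible_token_ratio(text: str) -> float:
    """Share of whitespace-separated tokens that look like words or numbers."""
    tokens = [token.strip(string.punctuation) for token in text.split()]
    tokens = [token for token in tokens if token]
    if not tokens:
        return 0.0  # no recognizable text at all
    plausible = sum(1 for token in tokens if PLAUSIBLE_TOKEN.match(token))
    return plausible / len(tokens)


def needs_summary_and_keywords(text: str, threshold: float = 0.8) -> bool:
    """Flag transcripts whose plausible-token share falls below the threshold."""
    return plausible_token_ratio(text) < threshold
```

An empty transcript scores 0.0 and is flagged, which matches the case where a scanned item has no recognizable text at all.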

Related Resources:

    1. DGM 2120. (n.d.). Accessibility and Collated Text Transcript. Retrieved from http://desource.uvu.edu/dgm/2120/in/steinja/lessons/08/08_08.html
    2. Transkribus. (2017). How to transcribe documents with transkribus-introduction. Retrieved from https://readcoop.eu/transkribus/howto/how-to-transcribe-documents-with-transkribus-introduction/
    3. Adobe. (n.d.). Determine if the Document has been Tagged. Retrieved from https://www.adobe.com/accessibility/products/acrobat/pdf-repair-add-tags.html
    4. OCLC. (2017). OCR Settings. Retrieved from https://www.oclc.org/support/services/contentdm/help/project-client-help/entering-metadata/ocr-settings.en.html
    5. WebAIM. (2019). Acrobat and Accessibility in PDF Accessibility. Retrieved from https://webaim.org/techniques/acrobat/acrobat
    6. WebAIM. (2019). Converting Documents to PDFs in PDF Accessibility. Retrieved from https://webaim.org/techniques/acrobat/converting

See Also:

Help-seeking Situations > Difficulty accessing information