Situation Definition:
A situation that arises from difficulty reading a digitized document because of poor quality in scanning or processing.
Factor(s) Leading to the Situation:
- Inadequate support:
- Inadequate support:
- Inadequate support:
Guideline or Design Recommendation:
- Ensure the full text of a scanned document is readable
- Provide essential information for a scanned document with readability issues
Rationale and Objective:
A
is a convenient information source for
users. One issue is poor content quality, which makes a transcript difficult for screen reader software to recognize (1). The guidelines and techniques presented here must largely be implemented when an item is digitized. If poorly digitized items cannot be rescanned with better methods, manually editing and correcting their transcripts can help (2).
Techniques and Methods:
1.1. Choose the most appropriate and effective software to scan documents based on their content (e.g., OCR for printed or typed text, HTR for handwritten text, missing letter prediction, natural language, and Adobe software for converting Microsoft Office files to PDF files).
1.2. Set the OCR or HTR software and/or hardware for the highest-quality recognition possible (See examples 1.2.a and 1.2.b)
1.3. Scan materials at high resolution (more than 600 dpi)
1.4. Establish quality control procedures for the digitization process
1.5. Use a text-to-speech function (e.g., the “Read Out Loud” function in Adobe Acrobat) to test the readability of the scanned document
1.6. Manually review and correct poor-quality items generated through OCR or HTR
1.7. Transcribe original sources manually if character recognition software is not available or effective
1.8. Add tags to PDF files so that their content can be accessed and navigated (See example 1.8)
2.1. Provide summaries and keywords for documents with poor quality transcripts generated by OCR or HTR
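Techniques 1.4 and 1.6 can be combined into a simple triage step that decides which pages go to manual review. The sketch below is one possible approach, not a prescribed method. It assumes that the OCR or HTR software can export a confidence score for each recognized word and that scores run from 0 to 100; export formats vary by product. The function name `needs_manual_review`, its parameters, and all threshold values are hypothetical and should be tuned against a sample of manually checked pages.

```python
from statistics import mean


def needs_manual_review(word_confidences, min_mean_confidence=85.0,
                        low_confidence=60.0, max_low_ratio=0.10):
    """Return True if a page's OCR/HTR output should be checked by a person.

    word_confidences: per-word scores from 0 to 100 (assumed format).
    All thresholds are illustrative defaults, not established standards.
    """
    # A page with no recognized words cannot be read by a screen reader.
    if not word_confidences:
        return True

    # Flag pages whose average confidence is low overall.
    if mean(word_confidences) < min_mean_confidence:
        return True

    # Flag pages where too many individual words are doubtful,
    # even if the average looks acceptable.
    low_count = sum(1 for c in word_confidences if c < low_confidence)
    return low_count / len(word_confidences) > max_low_ratio


# Hypothetical scores for three pages of one digitized item.
pages = {
    "page_001": [96, 94, 91, 98, 95],
    "page_002": [92, 40, 35, 97, 99],
    "page_003": [],
}
review_queue = [name for name, scores in pages.items()
                if needs_manual_review(scores)]
print(review_queue)
```

Pages placed in the review queue would then be corrected or transcribed manually, as described in techniques 1.6 and 1.7.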
Recommended Features:
1.1. Document scanning and transcript generation software (e.g., OCR software for printed or typed text, HTR software for handwritten text) (See examples 1.1.a and 1.1.b)
1.5. Text-to-speech function (See example 1.5)
1.7. Manual creation of a document transcript (See example 1.7)
2.1. Summaries and keywords (See example 2.1)
Examples:
1.1.a. Appropriate document scanning: How-to example and Good design
Example of a transcript generated for a handwritten letter. University of Cambridge, Darwin Correspondence Project. Retrieved from https://www.darwinproject.ac.uk/letter/DCP-LETT-9247.xml
1.1.b. Appropriate document scanning: How-to example
Check PDF accessibility using a PDF accessibility checker such as PAVE
1.2.a. Set high-quality recognition: Bad design
Text too small to read. Ensure the full text of a scanned document is readable by using high-resolution scanning.
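Whether a scan meets a resolution target can be checked from the image's pixel dimensions and the physical size of the original. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the 600 dpi default mirrors technique 1.3, and the check treats that value as a minimum rather than as a strict "more than" threshold.

```python
def effective_dpi(pixels, inches):
    """Dots per inch along one side of a scanned page."""
    if inches <= 0:
        raise ValueError("physical size must be positive")
    return pixels / inches


def meets_resolution_target(width_px, height_px, width_in, height_in,
                            target_dpi=600):
    """Check both dimensions, since an image may be resampled unevenly."""
    return (effective_dpi(width_px, width_in) >= target_dpi
            and effective_dpi(height_px, height_in) >= target_dpi)


# A US Letter page (8.5 x 11 in) scanned to 2550 x 3300 pixels is 300 dpi,
# so it falls short of the target; 5100 x 6600 pixels reaches 600 dpi.
print(effective_dpi(2550, 8.5))
print(meets_resolution_target(2550, 3300, 8.5, 11))
print(meets_resolution_target(5100, 6600, 8.5, 11))
```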
1.2.b. Set high-quality recognition: How-to example
Disable OCR Fast mode option (recommended for poor quality material) (OCLC, 2017)
1.5. Text-to-speech function: Good design
This DL provides a text-to-speech function and also shows which text is currently being read by highlighting the sentence and word.
1.7. Manual creation of a document transcript: Good design
Provide manually created transcripts for handwritten documents
1.8. Put PDF tags into a file: How-to example
Add tags to an untagged PDF file to provide functionality similar to HTML tags, such as heading navigation (For more instructions, see https://webaim.org/techniques/acrobat/acrobat or https://www.adobe.com/accessibility/products/acrobat/pdf-repair-add-tags.html)
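Before adding tags, it can help to screen a batch of files for ones that are probably untagged. The sketch below is a rough heuristic, not a substitute for an accessibility checker: it looks in the raw bytes of a file for `/StructTreeRoot` and `/MarkInfo`, the entries that a tagged PDF carries in its document catalog. The function name `tagging_markers` is hypothetical. A negative result is inconclusive, because the catalog may be stored in a compressed object stream where these names are not visible without a PDF parser.

```python
def tagging_markers(pdf_bytes):
    """Report which tagging-related names appear in raw PDF bytes.

    True values suggest the file is tagged. False values only mean the
    names were not found in uncompressed form, so the file should still
    be opened in a PDF accessibility checker to confirm.
    """
    return {
        "structure_tree": b"/StructTreeRoot" in pdf_bytes,
        "mark_info": b"/MarkInfo" in pdf_bytes,
    }


# Illustrative catalog fragments, not complete PDF files.
tagged_fragment = (b"<< /Type /Catalog /MarkInfo << /Marked true >> "
                   b"/StructTreeRoot 5 0 R >>")
untagged_fragment = b"<< /Type /Catalog /Pages 2 0 R >>"

print(tagging_markers(tagged_fragment))
print(tagging_markers(untagged_fragment))
```

For a real file, the bytes could be read with `pathlib.Path(path).read_bytes()` and passed to the same function.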
2.1. Provide a summary and keywords for the scanned document: Good design
Even if there is no text or text is hard to recognize, it is good to provide a brief summary and description like the following example.
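One way to make sure the summary and keywords reach users even when the transcript is unusable is to assemble the text exposed to assistive technology from the item's metadata. The sketch below is a hypothetical illustration only: the `ScannedItem` fields, the `text_for_assistive_technology` function, and the sample record are all invented for this example and do not describe any particular digital library system.

```python
from dataclasses import dataclass, field


@dataclass
class ScannedItem:
    title: str
    summary: str
    keywords: list = field(default_factory=list)
    transcript: str = ""
    # Set by staff after quality control, not inferred automatically.
    transcript_is_reliable: bool = False


def text_for_assistive_technology(item):
    """Build the text a record page could expose to screen reader users."""
    parts = [item.title, "Summary: " + item.summary]
    if item.keywords:
        parts.append("Keywords: " + ", ".join(item.keywords))
    # Only offer the transcript when it has passed review; otherwise say
    # so plainly, so users are not left with garbled text.
    if item.transcript and item.transcript_is_reliable:
        parts.append("Transcript: " + item.transcript)
    else:
        parts.append("A reliable transcript is not available for this item.")
    return "\n".join(parts)


# Invented sample record with an unreliable OCR transcript.
item = ScannedItem(
    title="Handwritten letter (sample record)",
    summary="A two-page letter describing travel plans.",
    keywords=["letter", "travel"],
    transcript="Tl1e j0urn3y w1ll",
)
print(text_for_assistive_technology(item))
```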
Related Resources:
- DGM 2120. (n.d.). Accessibility and Collated Text Transcript. Retrieved from http://desource.uvu.edu/dgm/2120/in/steinja/lessons/08/08_08.html
- Transkribus. (2017). How to transcribe documents with transkribus-introduction. Retrieved from https://readcoop.eu/transkribus/howto/how-to-transcribe-documents-with-transkribus-introduction/
- Adobe. (n.d.). Determine if the Document has been Tagged. Retrieved from https://www.adobe.com/accessibility/products/acrobat/pdf-repair-add-tags.html
- OCLC. (2017). OCR Settings. Retrieved from https://www.oclc.org/support/services/contentdm/help/project-client-help/entering-metadata/ocr-settings.en.html
- WebAIM. (2019). Acrobat and Accessibility in PDF Accessibility. Retrieved from https://webaim.org/techniques/acrobat/acrobat
- WebAIM. (2019). Converting Documents to PDFs in PDF Accessibility. Retrieved from https://webaim.org/techniques/acrobat/converting
See Also:
Help-seeking Situations > Difficulty accessing information