A3/D4. Difficulty locating or accessing information related to visual items

Situation Definition:

A situation that arises from difficulty finding and gaining access to alternative text, transcripts, or description for visual items in a DL.

Factor(s) Leading to the Situation:

    • Complex information presentation:

    • Inappropriate labeling:

    • Lack of features/functions/items/information:

Guideline or Design Recommendation

    1. Provide concise and meaningful alternative text for all visual items
    2. Provide transcripts for images of text documents
    3. Provide audio descriptions for video materials
    4. Provide descriptive metadata for each image or video item
    5. Ensure alternative text, transcripts and descriptions for visual items can be easily located
    6. Ensure alternative text, transcripts and descriptions for visual items can be easily accessed
    7. Provide help information regarding alternative text, transcript, or description access

Rationale and Objective:

Providing a text transcript for an image or a video item helps BVI users search for a keyword or scan the overall content (1). Specifically, since DLs include a range of heterogeneous content, including illustrated books, photos, maps, and manuscripts, it is necessary to include a text transcript in an accessible document format. Preferably, the transcript should be in a markup language for web pages, such as the HTML document format, as it is considered the most easily accessible by screen readers. Regarding downloadable formats, because PDF typically requires up-to-date assistive technology and a plug-in for accessing the PDF files, Microsoft Word format might be more appropriate. A


should also be provided to offer additional information along with the transcript (2). Also, providing a clear label for the transcript is important because “text” metadata refers to text type, but BVI users may mistake this term to mean “transcript.” The addition of invisible headings would support ease of navigation for BVI users (3).

Techniques and Methods:

1.1. Provide a supplementary description for a visual item next to it with a clear

2.1. Provide an option to download a transcript of a visual item
2.2. Provide partial transcripts if full transcripts are unavailable, and clearly indicate a partial transcript is available for a visual item
3.1. Offer audio descriptions for scenes of videos between dialogues
4.1. Provide a clear and concise description of visual items in metadata
5.1. Provide an option for transcript display next to a visual item
5.2. Add “Transcript” as an invisible part of the title for a heading
5.1/6.1. Add an invisible “Skip to transcript” link
7.1. Provide a context-sensitive description for information embedded in a visual item. For example, if an image portrays protestors holding a sign for their protest, BVI audiences should be provided with contextual information on the broader meaning of the sign in addition to the text or the visual image.

Recommended Features:


1.1.A. Alt text (See example 1.1.A.a and 1.1.A.b)
1.1.B. Meaningful

(See example 1.1.B.a and 1.1.B.b)
2.1. Downloadable transcript feature (See example 2.1)
2.2. Transcript display feature (See example 2.2.a and 2.2.b)
3.1. Invisible audio description controls (See example 3.1.a and 3.1.b)
4.1. Description element in metadata (See example 4.1)
5.1/6.1. Skip-to-transcript links (See example 5.1/6.1)
5.2. Invisible headings (See example 5.2.a and 5.2.b)
7.1. Context-sensitive description (See example 7.1)


1.1.A.a. Alt text: How-to example

Choose alternative text for an image by considering its context and any text description or summary displayed with it. For example, if an image is provided without any text (e.g., title, summary), appropriate alt text should include a title and brief information about the image. If an image has a title but no summary, the alt text should provide a summary without repeating the title. If an image is a link, the alt text should state where the link leads.
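The three cases above might be marked up roughly as follows. This is only a sketch: the file names, titles, descriptions, and link target are hypothetical placeholders and are not taken from any particular DL.

```html
<!-- Case 1: image displayed with no title or summary.
     The alt text carries a title plus brief information. -->
<img src="harbor-map.jpg"
     alt="Harbor map: hand-drawn chart showing piers and shipping lanes">

<!-- Case 2: image displayed with a visible title but no summary.
     The alt text gives the summary only, so the title is not read twice. -->
<figure>
  <img src="harbor-map.jpg"
       alt="Hand-drawn chart showing piers and shipping lanes">
  <figcaption>Harbor map</figcaption>
</figure>

<!-- Case 3: image that is a link.
     The alt text says where the link leads. -->
<a href="/collections/example-maps">
  <img src="harbor-map-thumbnail.jpg" alt="Go to the example maps collection">
</a>
```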

1.1.A.b. Alt text: How-to example

Use “access interview,” “view movie,” or “listen to audio recording.”

In the media example, links to the media opened in a new browser tab, but the links did not indicate this behavior. Adding aria-label="view movie opens in a new window" would have addressed the issue.
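A link of this kind could be written along the following lines. The href is a hypothetical placeholder; the aria-label value is the one suggested above.

```html
<!-- Hypothetical media link that opens in a new tab.
     aria-label announces both the action and the new-window behavior. -->
<a href="/media/example-movie"
   target="_blank"
   rel="noopener"
   aria-label="view movie opens in a new window">
  View movie
</a>
```

Because aria-label replaces the visible link text as the name a screen reader announces, the label here begins with the same words as the visible text ("view movie"), which keeps the spoken and visible names aligned.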

1.1.B.a. A supplementary description for a visual item: Good example

Replace the first image with the second image, with the “Image” tab selected, so that screen reader users can access both versions of the content.

First image indicates "There is no text for this item" while second image shows searchable text information about the item. View Image and Text options are both available.

1.1.B.b. A supplementary description for a visual item: Bad example

No alt text for an image, and the file name is not meaningful.

Image of large group of individuals wearing surgical masks in a room with the heading "America during the 1918 Influenza Pandemic" but without alt text or other meaningful labels

Corresponds to this markup:

Markup corresponding to Influenza Pandemic image:

    <div>
      <div tabindex="-1" style="width: 100%; display: inline-block;">
        <a class="HomePageSlider_item_23jN2" href="/exhibitions/1918-influenza">
          event
          <div class="HomePageSlider_itemImgWrapper_3D4X3">
            <div class="HomePageSlider_itemImg_2zHpT" style="background-image: url('https://dp.la/api/files/fullsize/0cc3a204f084885ef171604130ba5f2.jpg');"></div>
          </div>
          <div class="HomePageSlider_itemText_3gBG3">America during the 1918 Influenza Pandemic</div>
        </a>
      </div>
    </div>
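One possible repair, offered as a sketch rather than as the site's actual fix, is to replace the CSS background image with a real img element that carries alt text. The class names and URLs are copied from the markup above; the alt wording is an assumption based on the image description given in this example.

```html
<a class="HomePageSlider_item_23jN2" href="/exhibitions/1918-influenza">
  <div class="HomePageSlider_itemImgWrapper_3D4X3">
    <!-- An <img> with alt text is exposed to screen readers;
         a CSS background image on an empty <div> is not. -->
    <img class="HomePageSlider_itemImg_2zHpT"
         src="https://dp.la/api/files/fullsize/0cc3a204f084885ef171604130ba5f2.jpg"
         alt="Large group of people wearing surgical masks in a room">
  </div>
  <div class="HomePageSlider_itemText_3gBG3">America during the 1918 Influenza Pandemic</div>
</a>
```

If the background image has to stay for layout reasons, adding role="img" and an aria-label to the div that holds it is another commonly used option.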

2.1. Downloadable transcript feature: Good example

Make the transcript downloadable in .txt format (e.g., “Download transcript here”), similar to the HathiTrust digital library, where metadata is downloadable in text format (.txt) (HathiTrust, 2018).

Example from HathiTrust Digital Library with a clear Download Metadata button available for an item
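A download option for a transcript could be as simple as the link below. The file path is a hypothetical placeholder.

```html
<!-- The download attribute asks the browser to save the file instead of opening it.
     The link text names both the content and the file format. -->
<a href="/items/example-item/transcript.txt" download>
  Download transcript (.txt)
</a>
```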

2.2.a. Provide partial transcripts: Good example

Enabled transcript feature that provides a text-based representation of the visual content.

Text search results for the term "film" with "4 found in document" message

2.2.b. Provide partial transcripts: Bad example

No semantic markup around the transcript data. The transcript is set as plain p elements, and nothing in the markup indicates that they contain the transcript text.

HTML code for an item's description which states it includes a transcription. The transcription is placed in separate <p> elements which do not indicate that it is the transcript text
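By contrast, a transcript can be wrapped in a labeled region with its own heading so that screen reader users can find it by heading or landmark navigation. The sketch below assumes a partial transcript; the id value and the bracketed paragraph text are placeholders.

```html
<!-- The section is named by its heading, so it is announced as a
     "Partial transcript" region and appears in the heading list. -->
<section aria-labelledby="transcript-heading">
  <h2 id="transcript-heading">Partial transcript</h2>
  <p>[First paragraph of the transcript]</p>
  <p>[Second paragraph of the transcript]</p>
</section>
```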

3.1.a. Provide audio descriptions: Good example

Audio description (Art Beyond Sight, n.d.) with transcription.

Display page for "Woman 1" painting which includes a physical description of the painting and a transcript of the audio

3.1.b. Provide audio descriptions: Bad example

No audio description and no closed captions (CC) on a video file.

Closeup of video's CC icon with No Subtitles message
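For comparison, a video element can declare both a captions track and a descriptions track. This is a minimal sketch with hypothetical file names. Browser support for kind="descriptions" is limited, so a separately narrated version of the video or a text description may still be needed.

```html
<video controls>
  <source src="example-video.mp4" type="video/mp4">
  <!-- Captions: dialogue and relevant sounds as timed text -->
  <track kind="captions" src="example-captions.vtt"
         srclang="en" label="English captions" default>
  <!-- Descriptions: timed text describing visual content,
       intended to be read out between passages of dialogue -->
  <track kind="descriptions" src="example-descriptions.vtt"
         srclang="en" label="English audio descriptions">
</video>
```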

4.1. Provide a clear and concise description of visual items in metadata: How-to example

Label metadata clearly to avoid confusion between transcript and metadata: rename “Object description” to “Folder level metadata” and “Description” to “Item level metadata.”

Example of clear labeling for metadata: Use Folder Level Metadata and Item Level Metadata as descriptive names instead of the unclear names Object Description and Description

5.1/6.1. Add an invisible “Skip to transcript” link: Good example

Invisible link: “Skip to Transcript” links to the “Text” tab if the transcript is available

Example of Invisible link added: Skip to Transcript link to activate text tab if link to transcript is available
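An invisible skip link might be implemented roughly as follows. The class name, id, and transcript container are assumptions made for illustration; in the example above, the link target would be the “Text” tab rather than a section on the same page.

```html
<style>
  /* Moves the link out of view while keeping it in the reading and tab order */
  .skip-link {
    position: absolute;
    left: -10000px;
    width: 1px;
    height: 1px;
    overflow: hidden;
  }
</style>

<!-- Placed before the visual item so screen reader users reach it first -->
<a class="skip-link" href="#transcript">Skip to transcript</a>

<!-- tabindex="-1" lets the target receive focus when the link is followed -->
<section id="transcript" tabindex="-1">
  <p>[Transcript text]</p>
</section>
```

Some sites also make such a link visible while it has keyboard focus, which helps sighted keyboard users. That is a design choice beyond what this guideline specifies.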

5.2.a. Provide invisible heading: Good example

Invisible headings: a heading not visible to sighted users but readable by screen readers

Example of Invisible heading
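An invisible heading is typically produced with a CSS class that hides text visually without removing it from the accessibility tree. The sketch below uses a commonly seen pattern; the class name and the bracketed item title are placeholders.

```html
<style>
  /* Hides content from sighted users but leaves it readable by screen readers.
     display: none or visibility: hidden would hide it from screen readers too. */
  .visually-hidden {
    position: absolute;
    width: 1px;
    height: 1px;
    margin: -1px;
    padding: 0;
    overflow: hidden;
    clip: rect(0, 0, 0, 0);
    white-space: nowrap;
    border: 0;
  }
</style>

<!-- Option A: a whole heading that only screen readers announce -->
<h2 class="visually-hidden">Transcript</h2>

<!-- Option B: "Transcript" added as an invisible part of a visible heading -->
<h2>[Item title] <span class="visually-hidden">Transcript</span></h2>
```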

5.2.b. Provide invisible heading: Bad example

No option to skip to the transcript or invisible heading.

A film for "A Map of the Cattle Drive" that does not provide any transcripts and no options for skipping.

7.1. Provide a context-sensitive description for information embedded in a visual item: Good example

Context-sensitive description (providing clear descriptions based on image or scanned document’s content)

A Source Set of items with context-sensitive descriptions including: Resource 1- "A map showing the gold mining region of California and routes for traveling there, 1849." Resource 2- "A print depicting a long line of men, women, and families waiting to depart for the gold regions of California, 1848." Resource 3- "An excerpt from A trip across the plains, and life in California by George Keller, 1851." Resource 4- "A letter from gold prospector Newton Chandler to his wife Jane after arriving in San Francisco, January 15, 1855."

Related Resources:

    1. W3C. (2018). Understanding Success Criterion 2.4.6: Headings and Labels. Retrieved from https://www.w3.org/WAI/WCAG21/Understanding/headings-and-labels.html
    2. Bitstreams. (2018). Interactive Transcripts have arrived. Retrieved from https://blogs.library.duke.edu/bitstreams/2018/02/16/interactive-transcripts-arrived/
    3. Xie, I., Babu, R., Joo, S., & Fuller, P. (2015). Using digital libraries non-visually: Understanding the help-seeking situations of blind users. Information Research: An International Electronic Journal, 20(2), paper 673. Retrieved from http://InformationR.net/ir/20-2/paper673.html.
    4. Griffin, E. (2015). Tips for making web video & audio accessible. Retrieved from http://www.3playmedia.com/2015/07/13/tips-for-making-web-video-audio-accessible/
    5. Graves, A., Mohamed, A. R., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In Acoustics, speech and signal processing (ICASSP), 2013 IEEE International Conference (pp. 6645-6649). IEEE.
    6. HathiTrust. (2018). Retrieved from https://babel.hathitrust.org/cgi/mb?a=listis;c=464226859;sort=title_a;pn=1;lmt=ft
    7. Ingle, R. R., Fujii, Y., Deselaers, T., Baccash, J., & Popat, A. C. (2019). A Scalable Handwritten Text Recognition System. arXiv preprint arXiv:1904.09150.
    8. Muehlberger, G., Seaward, L., Terras, M., Oliveira, S. A., Bosch, V., Bryan, M., & Gatos, B. (2019). Transforming scholarship in the archives through handwritten text recognition. Journal of Documentation.
    9. Penn State. (n.d.). Caption guidelines and policy. Retrieved from https://accessibility.psu.edu/video/captions/
    10. Transcripts. (2019). Retrieved from https://www.w3.org/WAI/media/av/transcripts/#where-to-put-transcripts.
    11. Video Captions. (2019). Retrieved from https://www.w3.org/WAI/perspective-videos/captions/.
    12. WebAIM. (2019). Alternative Text. Retrieved from https://webaim.org/techniques/alttext/
    13. Zhong, Y., Raman, T. V., Burkhardt, C., Biadsy, F., & Bigham, J. P. (2014, April). JustSpeak: enabling universal voice control on Android. In Proceedings of the 11th Web for All Conference (p. 36). ACM.

See also:

Help-seeking Situations > A. Difficulty Accessing Information

Help-seeking Situations > D. Difficulty locating specific information, items, or features