1. How to conduct blind and visually impaired (BVI) user studies in the mobile environment

With support from the Institute of Museum and Library Services’ (IMLS) National Leadership Grant (LG-252289-OLS-22) and the University of Wisconsin-Milwaukee’s (UWM) Discovery and Innovation Grant (DIG), we have examined BVI users’ help-seeking situations and developed accessibility and usability guidelines for digital libraries (DLs) in the mobile context. We use the user studies from these two research projects as examples to illustrate how to recruit BVI participants and how to collect and analyze data. The studies were approved by the institutional review board (IRB).

1.1. IMLS studies

The IMLS project has four stages. The first stage focused on identifying help-seeking situations in mobile interactions and on surveying existing guidelines and papers through document analysis. The remaining three stages involve user studies.

1.1.1. Sampling

Participants are recruited from three different communities, each taking part in a different stage of the mDLAUG development. First, 120 BVI participants were recruited for a user study in DL mobile contexts to identify help-seeking situations. Second, 150 participants will be recruited for in-depth surveys to provide feedback on the mDLAUG draft. Third, 30 DL developers will be recruited to assess six DLs and to participate in focus groups that identify the challenges and associated solutions in implementing and adopting mDLAUG.

To create the mDLAUG draft, we recruited 120 BVI participants across the United States, with 30 in each of four groups representing different types of mobile devices (iPhone, iPad, Android phone, and Android tablet). All of the BVI participants relied on screen readers to interact with DLs on mobile devices. Each participant used the default screen reader of their device: VoiceOver for the iPhone and iPad groups and TalkBack for the Android phone and Android tablet groups. A variety of methods were used to recruit BVI users. Most importantly, the recruitment flyer was distributed through the National Federation of the Blind (NFB). In addition, the research team searched for online communities of BVI users using information retrieval (IR) systems (e.g., Google) and posted recruitment messages in those communities. Snowball sampling was also used to find more participants, especially for the Android groups: participants who had finished the study were encouraged to share the research project information with BVI peers who they thought might be interested.

BVI participants had to meet the following inclusion criteria. Eligible participants were required to (1) be 18 years old or older, (2) rely on a screen reader to access the Internet, (3) have at least three years of experience in using mobile devices to search for information, (4) feel comfortable verbalizing in English, and (5) be willing to install Microsoft Teams on their mobile devices.

1.1.2. Data collection

We employed multiple data collection methods, including pre-questionnaires, think-aloud protocols, transaction logs, post-system interviews, and post-search interviews. The data collection procedure was as follows. First, participants completed the pre-questionnaires to provide background information (e.g., demographic information, information search skills, and experience of using DLs). Second, each participant joined an online meeting via Microsoft Teams, using their own device in their natural setting, to work on three tasks on each of the two DLs: one orientation task and two search tasks (a specific search and an exploratory search). Before the search tasks, they completed the 10-minute orientation task to familiarize themselves with the assigned DL. Third, a think-aloud protocol was used while participants worked on the tasks. Specifically, they were asked to talk continuously about their thoughts and actions during their interactions with each DL, specifying the intentions behind their actions, the problems they encountered, and their solutions. At the same time, transaction logs were recorded. Fourth, a post-system interview was conducted after each DL to solicit information about participants’ experiences and perceptions of the accessibility and usability of that DL. A post-search interview was conducted at the end of their participation to gather a final assessment of the two selected DLs. Participants’ interactions with the DLs during tasks and interviews were recorded using Microsoft Teams, and the videos and transcripts were exported for analysis.

1.1.3. Data analysis

Qualitative data from the transcripts were examined, and the session videos were checked continually during the analysis. Relevant screenshots were taken to help illustrate the identified help-seeking situations. Open coding was used for data analysis, that is, the process of breaking down, examining, comparing, conceptualizing, and categorizing the textual transcripts. Inter-coder reliability was 0.81 based on Krippendorff’s Alpha, indicating a good level of agreement among coders. Multiple group discussion sessions were held to scrutinize the identified help-seeking situations and the associated design recommendations.
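
For readers who wish to compute a comparable agreement statistic, the following is a minimal Python sketch of Krippendorff’s Alpha for nominal codes, built from a coincidence matrix. It is not the procedure or the data used in the IMLS study: the function name `krippendorff_alpha_nominal`, the input layout (one list of codes per coded segment, with `None` for a missing code), and the example category labels are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(segments):
    """Krippendorff's Alpha for nominal data.

    `segments` is a list of lists (hypothetical layout): each inner list
    holds the codes that the coders assigned to one segment; None marks
    a missing code.
    """
    # Coincidence matrix: every ordered pair of codes given to the same
    # segment by different coders, weighted by 1 / (m - 1).
    coincidences = Counter()
    for segment in segments:
        values = [v for v in segment if v is not None]
        m = len(values)
        if m < 2:
            continue  # a segment coded by fewer than two coders is not pairable
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1 / (m - 1)

    # Marginal totals per category and overall number of pairable codes.
    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k]
                   for c in totals for k in totals if c != k)
    if expected == 0:
        raise ValueError("Alpha is undefined when only one category occurs")
    return 1 - (n - 1) * observed / expected


# Hypothetical example: two coders, five segments, one missing code.
example_segments = [
    ["navigation", "navigation"],
    ["labeling", "labeling"],
    ["navigation", "labeling"],
    ["search", "search"],
    ["labeling", None],
]
alpha = krippendorff_alpha_nominal(example_segments)
print(f"Krippendorff's Alpha (nominal): {alpha:.2f}")
```

The sketch covers only the nominal case; ordinal or interval codes would need a different distance function.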

1.2. DIG user study

1.2.1. Sampling

Thirty BVI participants were recruited throughout the United States by distributing the recruitment flyer to the National Federation of the Blind (NFB) listserv. Prior to joining the study, potential participants received a brief pre-screening questionnaire and a consent form. Participants needed to meet the following requirements: (1) using an iPhone 6S or newer running iOS 11 or later; (2) using the iPhone non-visually by listening to VoiceOver; (3) having at least three years of experience searching for information on the Internet via iPhone; (4) feeling comfortable verbalizing thoughts in English; and (5) being willing to install the Microsoft Teams software and the Library of Congress Digital Collections app. The Library of Congress Digital Collections was selected for this study because it is one of the few DLs that provide both a mobile website and a mobile app. As a national DL, the Library of Congress Digital Collections consists of multiple digital collections covering a wide range of topics of interest to BVI users. All thirty participants used both the mobile app and the mobile website. Each participant received a $100 electronic gift card as compensation for the time and effort of participating in the study.

1.2.2. Data collection

Multiple data collection methods were used, including pre-search interviews, think-aloud protocols, transaction logs, post-platform questionnaires, post-platform interviews, and post-search interviews. Because the study used a within-subjects design, the platform order was counterbalanced following a Latin square design: the first participant (P1) was assigned to use the mobile app first, and the second participant (P2) was assigned to use the mobile website first. The remaining participants were assigned a starting platform accordingly. For each platform, participants performed one orientation task, one specific search task, and one subject search task. After successfully installing the required apps, each participant spent around 3.5 hours completing all of the study activities. Microsoft Teams was employed to collect data. Think-aloud protocols and transaction logs were used to capture participants’ thoughts and movements while they worked on the tasks. All recorded audio and video files were transcribed verbatim for further analysis. For each platform, a questionnaire measured participants’ perceptions of the platform’s accessibility and usability, as well as their satisfaction with the platform, on a 7-point Likert scale with 1 indicating “not at all” and 7 indicating “extremely.” The post-platform and post-search interviews allowed the researchers to record participants’ responses related to the research questions. During the post-platform interviews, participants gave the reasons for their accessibility and usability ratings of each platform and offered suggestions for enhancing the DL on mobile platforms. In the post-search interviews, participants compared the two platforms of the DL and shared their final thoughts.
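
The counterbalanced assignment can be illustrated with a short Python sketch. It assumes that participants simply alternated their starting platform after P1 and P2, which is one reading of the assignment rule and not a detail confirmed by the study protocol; the function name `platform_order` and the 1-based participant numbering are hypothetical.

```python
PLATFORMS = ("mobile app", "mobile website")


def platform_order(participant_number):
    """Return the platform order for a 1-based participant number.

    Assumption: odd-numbered participants start with the mobile app and
    even-numbered participants start with the mobile website, which is
    a 2 x 2 Latin square repeated across participants.
    """
    if participant_number < 1:
        raise ValueError("participant numbers start at 1")
    first = (participant_number - 1) % len(PLATFORMS)
    # Rotate the platform tuple so that `first` comes first.
    return PLATFORMS[first:] + PLATFORMS[:first]


# Under this assumption, 30 participants split evenly between the two orders.
for number in (1, 2, 3):
    print(f"P{number}: {' -> '.join(platform_order(number))}")
```

With two platforms and an even number of participants, this rotation gives each order to the same number of participants, which is the point of counterbalancing in a within-subjects design.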

1.2.3. Data analysis

Qualitative data (i.e., interview transcripts, think-aloud protocols, and transaction logs) were examined to identify types of help-seeking situations and design factors. Open coding was used for the analysis. Two coders coded the data independently, and the inter-coder reliability was 0.94 based on Holsti’s (1969) method. Disagreements and questions about the codes were discussed within the research team until agreement was reached, to ensure the reliability of the data analysis.
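
Holsti’s coefficient is commonly written as CR = 2M / (N1 + N2), where M is the number of coding decisions on which the two coders agree and N1 and N2 are the numbers of coding decisions made by each coder. The Python sketch below illustrates the calculation under the assumption that both coders coded the same segments in the same order; the function name `holsti_reliability` and the example codes are hypothetical and do not reproduce the study data.

```python
def holsti_reliability(codes_a, codes_b):
    """Holsti's coefficient of reliability: 2M / (N1 + N2).

    Assumes `codes_a` and `codes_b` are aligned, i.e. position i in both
    lists refers to the same coded segment.
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same segments")
    if not codes_a:
        raise ValueError("at least one coded segment is required")
    # M: number of segments on which the two coders chose the same code.
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return 2 * agreements / (len(codes_a) + len(codes_b))


# Hypothetical example: the coders agree on 3 of 4 segments.
coder_1 = ["navigation", "labeling", "search", "navigation"]
coder_2 = ["navigation", "labeling", "navigation", "navigation"]
print(holsti_reliability(coder_1, coder_2))  # 0.75
```

Because the coefficient does not correct for chance agreement, it tends to be higher than chance-corrected statistics such as Krippendorff’s Alpha for the same data.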