TY - GEN
T1 - Screen Reading Enabled by Large Language Models
AU - Ghosh, Anujay
AU - Reddy, Monalika Padma
AU - Kodandaram, Satwik Ram
AU - Uckun, Utku
AU - Ashok, Vikas
AU - Bi, Xiaojun
AU - Ramakrishnan, I. V.
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/10/27
Y1 - 2024/10/27
N2 - Large language models (LLMs), such as the pioneering GPT technology by OpenAI, have undeniably become one of the most significant innovations in recent history. They have achieved phenomenal success across a broad spectrum of applications in numerous industries, transforming how we interact with the digital world. Notwithstanding these remarkable successes, the application of LLMs within the realm of accessibility has remained largely unexplored. We introduce Savant as a demonstration of the potential of LLMs for accessibility. Specifically, Savant leverages the impressive text comprehension abilities of LLMs to provide uniform interaction for screen reader users across various applications, mitigating the significant interaction burden imposed on blind screen reader users by the heterogeneity of user interfaces. Savant automates screen reader actions on control elements such as buttons, text fields, and drop-down menus via spoken natural language commands (NLCs). Interpreting the NLC, identifying the correct control element, and formulating the action sequence are all facilitated by LLMs. Few-shot prompts supply context and guidance for the LLMs to produce appropriate responses, specifically converting the NLC into a correct series of actions on the user interface elements, which are then performed automatically. The demonstration will exhibit Savant's capability across a variety of exemplar applications, emphasizing its versatility.
AB - Large language models (LLMs), such as the pioneering GPT technology by OpenAI, have undeniably become one of the most significant innovations in recent history. They have achieved phenomenal success across a broad spectrum of applications in numerous industries, transforming how we interact with the digital world. Notwithstanding these remarkable successes, the application of LLMs within the realm of accessibility has remained largely unexplored. We introduce Savant as a demonstration of the potential of LLMs for accessibility. Specifically, Savant leverages the impressive text comprehension abilities of LLMs to provide uniform interaction for screen reader users across various applications, mitigating the significant interaction burden imposed on blind screen reader users by the heterogeneity of user interfaces. Savant automates screen reader actions on control elements such as buttons, text fields, and drop-down menus via spoken natural language commands (NLCs). Interpreting the NLC, identifying the correct control element, and formulating the action sequence are all facilitated by LLMs. Few-shot prompts supply context and guidance for the LLMs to produce appropriate responses, specifically converting the NLC into a correct series of actions on the user interface elements, which are then performed automatically. The demonstration will exhibit Savant's capability across a variety of exemplar applications, emphasizing its versatility.
KW - Accessibility
KW - Assistive technology
KW - Blind users
KW - Computer Interaction
KW - Large language models (LLMs)
KW - Uniform interaction
UR - https://www.scopus.com/pages/publications/85211451665
U2 - 10.1145/3663548.3688491
DO - 10.1145/3663548.3688491
M3 - Conference contribution
T3 - ASSETS 2024 - Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility
BT - ASSETS 2024 - Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility
PB - Association for Computing Machinery, Inc
T2 - 26th International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS 2024
Y2 - 28 October 2024 through 30 October 2024
ER -