Hillming Li

VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models

Jan 29, 2024

Yi Zhao, Yilin Zhang, Rong Xiang, Jing Li, Hillming Li

Abstract: Visually Impaired Assistance (VIA) aims to automatically help visually impaired (VI) people handle daily activities. The advancement of VIA primarily depends on developments in Computer Vision (CV) and Natural Language Processing (NLP), both of which exhibit cutting-edge paradigms with large models (LMs). Furthermore, LMs have shown exceptional multimodal abilities to tackle challenging physically grounded tasks such as embodied robots. To investigate the potential and limitations of state-of-the-art (SOTA) LMs' capabilities in VIA applications, we present an extensive study for the task of VIA with LMs (VIALM). In this task, given an image illustrating the physical environment and a linguistic request from a VI user, VIALM aims to output step-by-step guidance to assist the VI user in fulfilling the request grounded in the environment. The study consists of a survey reviewing recent LM research and benchmark experiments examining selected LMs' capabilities in VIA. The results indicate that while LMs can augment VIA, their output cannot be well grounded in reality (i.e., 25.7% of GPT-4's responses) and lacks fine-grained guidance (i.e., 32.1% of GPT-4's responses).

* under review
