Abstract: Image editing is an iterative process that requires precise visual evaluation and manipulation for the output to match the editing intent. However, current image editing tools provide neither accessible interaction nor sufficient feedback for blind and low-vision individuals to achieve this level of control. To address this, we developed EditScribe, a prototype system that makes image editing accessible using natural language verification loops powered by large multimodal models. With EditScribe, the user first comprehends the image content through initial general and object descriptions, then specifies edit actions using open-ended natural language prompts. EditScribe performs the image edit and provides four types of verification feedback for the user to verify the performed edit: a summary of visual changes, AI judgement, and updated general and object descriptions. The user can ask follow-up questions to clarify and probe into the edits or verification feedback before performing another edit. In a study with ten blind or low-vision users, we found that EditScribe supported participants in performing and verifying image edit actions non-visually. We also observed participants' different prompting strategies and their perceptions of the various types of verification feedback. Finally, we discuss the implications of leveraging natural language verification loops to make visual authoring non-visually accessible.
Abstract: Automated live visual descriptions can aid blind people in understanding their surroundings with autonomy and independence. However, providing descriptions that are rich, contextual, and just-in-time has been a long-standing challenge in accessibility. In this work, we develop WorldScribe, a system that generates automated live real-world visual descriptions that are customizable and adaptive to users' contexts: (i) WorldScribe's descriptions are tailored to users' intents and prioritized based on semantic relevance. (ii) WorldScribe is adaptive to visual contexts, e.g., providing succinct descriptions in quick succession for dynamic scenes, while presenting longer, detailed ones for stable settings. (iii) WorldScribe is adaptive to sound contexts, e.g., increasing volume in noisy environments or pausing when conversations start. Powered by a suite of vision, language, and sound recognition models, WorldScribe introduces a description generation pipeline that balances the tradeoff between description richness and latency to support real-time use. The design of WorldScribe is informed by prior work on providing visual descriptions and a formative study with blind participants. Our user study and subsequent pipeline evaluation show that WorldScribe can provide real-time, fairly accurate visual descriptions that facilitate environment understanding adapted and customized to users' contexts. Finally, we discuss the implications and further steps toward making live visual descriptions more context-aware and humanized.
Abstract: Mixed-reality (MR) soundscapes blend real-world sound with virtual audio from hearing devices, presenting intricate auditory information that is hard to discern and differentiate. This is particularly challenging for blind or visually impaired individuals, who rely on sounds and descriptions in their everyday lives. To understand how complex audio information is consumed, we analyzed online forum posts within the blind community, identifying prevailing challenges, needs, and desired solutions. We synthesized the results and proposed Sound Unblending for increasing MR sound awareness, which includes six sound manipulations: Ambience Builder, Feature Shifter, Earcon Generator, Prioritizer, Spatializer, and Stylizer. To evaluate the effectiveness of sound unblending, we conducted a user study with 18 blind participants across three simulated MR scenarios, where participants identified specific sounds within intricate soundscapes. We found that sound unblending increased participants' MR sound awareness and reduced their cognitive load. Finally, we developed three real-world example applications to demonstrate the practicality of sound unblending.