Title: RWKV-UI: UI Understanding with Enhanced Perception and Reasoning

Feb 06, 2025

Jiaxi Yang, Haowen Hou

Abstract: Existing Visual Language Models often struggle with information loss and limited reasoning abilities when handling high-resolution web interfaces that combine complex visual, textual, and interactive elements. These challenges are particularly evident in tasks requiring webpage layout comprehension and multi-step interactive reasoning. To address them, we propose RWKV-UI, a Visual Language Model based on the RWKV architecture and specifically designed to handle high-resolution UI images. During model training, we introduce layout detection as a visual prompt to help the model better understand webpage layout structures. Additionally, we design a visual prompt based on the Chain-of-Thought (CoT) mechanism, which enhances the model's ability to understand and reason about webpage content through reasoning chains. Experimental results show that RWKV-UI demonstrates significant performance improvements in high-resolution UI understanding and interactive reasoning tasks.
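
The abstract does not say how detected layout regions are turned into a visual prompt. As a rough illustration only, and not the authors' pipeline, the sketch below assumes one common approach: drawing the outline of each detected region onto the screenshot and producing a numbered text legend to accompany it. The names `LayoutBox` and `overlay_layout_prompt`, the box format, and the legend format are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # row-major: image[y][x]


@dataclass
class LayoutBox:
    """A detected layout region (hypothetical structure, not from the paper)."""
    label: str
    x0: int
    y0: int
    x1: int  # inclusive right edge
    y1: int  # inclusive bottom edge


def overlay_layout_prompt(image: Image, boxes: List[LayoutBox],
                          color: Pixel = (255, 0, 0)) -> Tuple[Image, str]:
    """Return a copy of `image` with box outlines drawn, plus a text legend."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy rows so the input stays untouched
    legend_lines = []
    for index, box in enumerate(boxes, start=1):
        # Clamp coordinates so boxes partly outside the image do not crash.
        x0, x1 = max(0, box.x0), min(width - 1, box.x1)
        y0, y1 = max(0, box.y0), min(height - 1, box.y1)
        if x0 > x1 or y0 > y1:
            continue  # box lies entirely outside the image
        for x in range(x0, x1 + 1):  # top and bottom edges
            out[y0][x] = color
            out[y1][x] = color
        for y in range(y0, y1 + 1):  # left and right edges
            out[y][x0] = color
            out[y][x1] = color
        legend_lines.append(f"[{index}] {box.label}: ({x0}, {y0}, {x1}, {y1})")
    return out, "\n".join(legend_lines)


# Toy 8x6 white "screenshot" with two assumed layout regions.
screenshot = [[(255, 255, 255)] * 8 for _ in range(6)]
marked, legend = overlay_layout_prompt(
    screenshot,
    [LayoutBox("nav", 0, 0, 7, 1), LayoutBox("content", 1, 2, 6, 5)],
)
```

In a real setting the boxes would come from a layout detector and the image from a rendered page; here both are toy values chosen to keep the example self-contained.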

* 10 pages, 5 figures, conference
