Kanzhi Cheng

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Oct 30, 2024

Vision-Language Models Can Self-Improve Reasoning via Reflection

Oct 30, 2024

Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models

Jun 17, 2024

A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

Mar 21, 2024

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

Jan 17, 2024

Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models

Aug 06, 2023

Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model

Aug 02, 2023

ADS-Cap: A Framework for Accurate and Diverse Stylized Captioning with Unpaired Stylistic Corpora

Aug 02, 2023