Kanzhi Cheng

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

Feb 05, 2026

TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents

Feb 03, 2026

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Jan 12, 2026

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

May 26, 2025

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Apr 11, 2025

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

Mar 16, 2025

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Dec 27, 2024

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Oct 30, 2024

Vision-Language Models Can Self-Improve Reasoning via Reflection

Oct 30, 2024

Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models

Jun 17, 2024