Kaixin Ma

OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization

Oct 25, 2024

DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects

Oct 03, 2024

LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks

Oct 02, 2024

Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots

Sep 16, 2024

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Sep 12, 2024

COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes

Sep 06, 2024

DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems

Jul 15, 2024

MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning

Apr 24, 2024

SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

Apr 22, 2024

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Jan 28, 2024