Zekun Wang

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Dec 12, 2024

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Dec 05, 2024

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

Oct 28, 2024

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

Oct 15, 2024

PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness

Oct 09, 2024

MIO: A Foundation Model on Multimodal Tokens

Sep 26, 2024

Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos

Aug 26, 2024

GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension

Jun 26, 2024

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Jun 20, 2024

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Jun 11, 2024