Picture for Yunxin Li

Yunxin Li

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Add code
Jan 21, 2025
Viaarxiv icon

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

Add code
Aug 19, 2024
Figure 1 for Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Figure 2 for Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Figure 3 for Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Figure 4 for Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Viaarxiv icon

VideoVista: A Versatile Benchmark for Video Understanding and Reasoning

Add code
Jun 17, 2024
Figure 1 for VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Figure 2 for VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Figure 3 for VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Figure 4 for VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Viaarxiv icon

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Add code
May 18, 2024
Viaarxiv icon

VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

Add code
May 08, 2024
Viaarxiv icon

LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs

Add code
Feb 21, 2024
Figure 1 for LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs
Figure 2 for LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs
Figure 3 for LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs
Figure 4 for LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs
Viaarxiv icon

Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment

Add code
Feb 21, 2024
Figure 1 for Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Figure 2 for Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Figure 3 for Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Figure 4 for Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Viaarxiv icon

A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation

Add code
Feb 21, 2024
Viaarxiv icon

Frame Structure and Protocol Design for Sensing-Assisted NR-V2X Communications

Add code
Dec 27, 2023
Viaarxiv icon

Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs

Add code
Nov 27, 2023
Viaarxiv icon