Picture for Yunxin Li

Yunxin Li

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

Add code
Aug 19, 2024
Figure 1 for Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Figure 2 for Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Figure 3 for Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Figure 4 for Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Viaarxiv icon

VideoVista: A Versatile Benchmark for Video Understanding and Reasoning

Add code
Jun 17, 2024
Figure 1 for VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Figure 2 for VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Figure 3 for VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Figure 4 for VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Viaarxiv icon

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Add code
May 18, 2024
Viaarxiv icon

VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

Add code
May 08, 2024
Viaarxiv icon

Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment

Add code
Feb 21, 2024
Figure 1 for Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Figure 2 for Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Figure 3 for Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Figure 4 for Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Viaarxiv icon

A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation

Add code
Feb 21, 2024
Viaarxiv icon

LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs

Add code
Feb 21, 2024
Figure 1 for LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs
Figure 2 for LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs
Figure 3 for LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs
Figure 4 for LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs
Viaarxiv icon

Frame Structure and Protocol Design for Sensing-Assisted NR-V2X Communications

Add code
Dec 27, 2023
Viaarxiv icon

Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs

Add code
Nov 27, 2023
Viaarxiv icon

A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering

Add code
Nov 13, 2023
Viaarxiv icon