Yanwei Li

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Feb 13, 2025

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Dec 12, 2024

Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration

Dec 01, 2024

LLaVA-OneVision: Easy Visual Task Transfer

Aug 06, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

May 31, 2024

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Mar 27, 2024

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

Feb 29, 2024

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Nov 28, 2023

LISA: Reasoning Segmentation via Large Language Model

Aug 03, 2023

Democratizing Pathological Image Segmentation with Lay Annotators via Molecular-empowered Learning

May 31, 2023