Picture for Yanwei Li

Yanwei Li

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Add code
Dec 12, 2024
Viaarxiv icon

Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration

Add code
Dec 01, 2024
Viaarxiv icon

LLaVA-OneVision: Easy Visual Task Transfer

Add code
Aug 06, 2024
Figure 1 for LLaVA-OneVision: Easy Visual Task Transfer
Figure 2 for LLaVA-OneVision: Easy Visual Task Transfer
Figure 3 for LLaVA-OneVision: Easy Visual Task Transfer
Figure 4 for LLaVA-OneVision: Easy Visual Task Transfer
Viaarxiv icon

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Add code
May 31, 2024
Figure 1 for Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Figure 2 for Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Figure 3 for Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Figure 4 for Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Viaarxiv icon

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Add code
Mar 27, 2024
Figure 1 for Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Figure 2 for Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Figure 3 for Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Figure 4 for Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Viaarxiv icon

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

Add code
Feb 29, 2024
Figure 1 for RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Figure 2 for RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Figure 3 for RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Figure 4 for RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Viaarxiv icon

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Add code
Nov 28, 2023
Viaarxiv icon

LISA: Reasoning Segmentation via Large Language Model

Add code
Aug 03, 2023
Viaarxiv icon

Democratizing Pathological Image Segmentation with Lay Annotators via Molecular-empowered Learning

Add code
May 31, 2023
Viaarxiv icon

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction

Add code
May 30, 2023
Figure 1 for GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
Figure 2 for GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
Figure 3 for GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
Figure 4 for GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
Viaarxiv icon