Yanwei Li

LLaVA-OneVision: Easy Visual Task Transfer

Aug 06, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

May 31, 2024

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Mar 27, 2024

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

Feb 29, 2024

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Nov 28, 2023

LISA: Reasoning Segmentation via Large Language Model

Aug 03, 2023

Democratizing Pathological Image Segmentation with Lay Annotators via Molecular-empowered Learning

May 31, 2023

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction

May 30, 2023

Diversified Dynamic Routing for Vision Tasks

Sep 26, 2022

Unifying Voxel-based Representation with Transformer for 3D Object Detection

Jun 01, 2022