Jiaming Han

Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

Feb 23, 2025

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

Dec 03, 2024

Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant

Oct 17, 2024

OneLLM: One Framework to Align All Modalities with Language

Dec 06, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

Nov 13, 2023

ImageBind-LLM: Multi-modality Instruction Tuning

Sep 11, 2023

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

Sep 01, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

Apr 28, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Mar 28, 2023

Few-Shot Object Detection via Variational Feature Aggregation

Jan 31, 2023