Haifeng Huang

KiToke: Kernel-based Interval-aware Token Compression for Video Large Language Models

Apr 03, 2026
Via arXiv

Chat-Scene++: Exploiting Context-Rich Object Identification for 3D LLM

Mar 29, 2026
Via arXiv

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation

Jun 12, 2025
Via arXiv

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

Apr 30, 2025
Via arXiv

Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine

Dec 12, 2024
Via arXiv

Improving Retrieval Augmented Language Model with Self-Reasoning

Jul 29, 2024
Via arXiv

A Refer-and-Ground Multimodal Large Language Model for Biomedicine

Jun 26, 2024
Via arXiv

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

Jun 13, 2024
Via arXiv

Grounded 3D-LLM with Referent Tokens

May 16, 2024
Via arXiv

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion

May 10, 2024
Via arXiv