Jian Ding

A computational transition for detecting correlated stochastic block models by low-degree polynomials

Sep 02, 2024
Via arXiv

Goldfish: Vision-Language Understanding of Arbitrarily Long Videos

Jul 17, 2024
Via arXiv

InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding

Jun 28, 2024
Via arXiv

VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding

Jun 18, 2024
Via arXiv

iMotion-LLM: Motion Prediction Instruction Tuning

Jun 11, 2024
Via arXiv

Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding

May 29, 2024
Via arXiv

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

May 16, 2024
Via arXiv

Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation

May 16, 2024
Via arXiv

MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

Apr 04, 2024
Via arXiv

Uni3DL: Unified Model for 3D and Language Understanding

Dec 05, 2023
Via arXiv