Jingkang Yang

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene

Mar 19, 2025
Via arXiv

EgoLife: Towards Egocentric Life Assistant

Mar 05, 2025
Via arXiv

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Nov 21, 2024
Via arXiv

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Jul 31, 2024
Via arXiv

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Jul 17, 2024
Via arXiv

Long Context Transfer from Language to Vision

Jun 24, 2024
Via arXiv

4D Panoptic Scene Graph Generation

May 16, 2024
Via arXiv

WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning

May 06, 2024
Via arXiv

Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

Mar 29, 2024
Via arXiv

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Jan 18, 2024
Via arXiv