Chenhui Gou

Evaluating and Advancing Multimodal Large Language Models in Ability Lens

Nov 22, 2024

EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance

Sep 12, 2024

How Well Can Vision Language Models See Image Details?

Aug 07, 2024

InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding

Jun 28, 2024

DrVideo: Document Retrieval Based Long Video Understanding

Jun 18, 2024

JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

Apr 02, 2024

Strong and Controllable Blind Image Decomposition

Mar 15, 2024

RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer

Oct 13, 2022