Yiwu Zhong

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Oct 15, 2024

Enhancing Temporal Modeling of Video LLMs via Time Gating

Oct 08, 2024

Generalized Tampered Scene Text Detection in the era of Generative AI

Jul 31, 2024

Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models

Mar 27, 2024

Towards Learning a Generalist Model for Embodied Navigation

Dec 06, 2023

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Nov 13, 2023

Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models

Oct 04, 2023

Learning Concise and Descriptive Attributes for Visual Recognition

Aug 07, 2023

Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations

Mar 31, 2023

RegionCLIP: Region-based Language-Image Pretraining

Dec 16, 2021