Zongyang Ma

mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA

Nov 22, 2024, via arXiv

E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding

Sep 26, 2024, via arXiv

How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

Jul 10, 2024, via arXiv

EA-VTR: Event-Aware Video-Text Retrieval

Jul 10, 2024, via arXiv

CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation

Mar 31, 2022, via arXiv

Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation

Mar 20, 2022, via arXiv