Yun Zheng

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models

Mar 20, 2025
Via arXiv

Rethinking Video Tokenization: A Conditioned Diffusion-based Approach

Mar 05, 2025
Via arXiv

UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface

Mar 04, 2025
Via arXiv

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs

Feb 19, 2025
Via arXiv

ContextHOI: Spatial Context Learning for Human-Object Interaction Detection

Dec 12, 2024
Via arXiv

Orchestrating the Symphony of Prompt Distribution Learning for Human-Object Interaction Detection

Dec 11, 2024
Via arXiv

CoReS: Orchestrating the Dance of Reasoning and Segmentation

Apr 08, 2024
Via arXiv

Automated Identification and Segmentation of HI Sources in CRAFTS Using Deep Learning Method

Mar 29, 2024
Via arXiv

Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization

Mar 05, 2024
Via arXiv

Understanding the Multi-modal Prompts of the Pre-trained Vision-Language Model

Dec 18, 2023
Via arXiv