Yuankai Qi

Exploring Primitive Visual Measurement Understanding and the Role of Output Format in Learning in Vision-Language Models

Jan 25, 2025
Via arXiv

Adapter-Enhanced Semantic Prompting for Continual Learning

Dec 15, 2024
Via arXiv

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing

Dec 12, 2024
Via arXiv

Generating High-quality Symbolic Music Using Fine-grained Discriminators

Aug 03, 2024
Via arXiv

Structural Attention: Rethinking Transformer for Unpaired Medical Image Synthesis

Jun 27, 2024
Via arXiv

Augmented Commonsense Knowledge for Remote Object Grounding

Jun 03, 2024
Via arXiv

Retrieval Enhanced Zero-Shot Video Captioning

May 11, 2024
Via arXiv

Generating Content for HDR Deghosting from Frequency View

Apr 01, 2024
Via arXiv

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Matching Framework

Mar 12, 2024
Via arXiv

StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing

Feb 21, 2024
Via arXiv