Yapeng Tian

Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters

Dec 18, 2024

Modality-Inconsistent Continual Learning of Multimodal Large Language Models

Dec 17, 2024

VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation

Dec 14, 2024

Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach

Nov 26, 2024

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs

Nov 19, 2024

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

Nov 15, 2024

SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering

Nov 07, 2024

Continual Audio-Visual Sound Separation

Nov 05, 2024

Scaling Concept With Text-Guided Diffusion Models

Oct 31, 2024

CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP

Oct 30, 2024