Yapeng Tian

SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering

Nov 07, 2024
Via arXiv

Continual Audio-Visual Sound Separation

Nov 05, 2024
Via arXiv

Scaling Concept With Text-Guided Diffusion Models

Oct 31, 2024
Via arXiv

CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP

Oct 30, 2024
Via arXiv

Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models

Oct 15, 2024
Via arXiv

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation

Oct 09, 2024
Via arXiv

DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures

Sep 11, 2024
Via arXiv

Semantic Grouping Network for Audio Source Separation

Jul 04, 2024
Via arXiv

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

Jun 11, 2024
Via arXiv

MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers

Jun 07, 2024
Via arXiv