Yuanyuan Jiang

CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering

May 13, 2024
Via arXiv

Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios

May 21, 2023
Via arXiv

Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization

Oct 11, 2022
Via arXiv

ML4C: Seeing Causality Through Latent Vicinity

Oct 1, 2021
Via arXiv