Shusuke Takahashi

SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation

Dec 18, 2024

Music Foundation Model as Generic Booster for Music Downstream Tasks

Nov 05, 2024

OpenMU: Your Swiss Army Knife for Music Understanding

Oct 21, 2024

Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models

Oct 02, 2024

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

Jun 26, 2024

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training

Jun 04, 2024

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

May 23, 2024

Zero- and Few-shot Sound Event Localization and Detection

Sep 17, 2023

The Sound Demixing Challenge 2023 – Cinematic Demixing Track

Aug 14, 2023

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

Jun 15, 2023