Kazuki Shimada

SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation

Dec 18, 2024
Via arXiv

Music Foundation Model as Generic Booster for Music Downstream Tasks

Nov 05, 2024
Via arXiv

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes

Dec 31, 2023
Via arXiv

Zero- and Few-shot Sound Event Localization and Detection

Sep 17, 2023
Via arXiv

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

Jun 15, 2023
Via arXiv

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

May 18, 2023
Via arXiv

Diffusion-based Signal Refiner for Speech Separation

May 12, 2023
Via arXiv

Extending Audio Masked Autoencoders Toward Audio Restoration

May 11, 2023
Via arXiv

An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification

Feb 16, 2023
Via arXiv

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

Jun 04, 2022
Via arXiv