Masahiro Yasuda

M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation

Jun 04, 2024
Via arXiv

Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis

Apr 12, 2024
Via arXiv

6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human

Mar 04, 2024
Via arXiv

First-shot anomaly sound detection for machine condition monitoring: A domain generalization baseline

Mar 01, 2023
Via arXiv

Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion

Feb 18, 2022
Via arXiv

Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments

Feb 18, 2022
Via arXiv

Wearable SELD dataset: Dataset for sound event localization and detection using wearable devices around head

Feb 17, 2022
Via arXiv

APPLADE: Adjustable Plug-and-play Audio Declipper Combining DNN with Sparse Optimization

Feb 16, 2022
Via arXiv

ToyADMOS2: Another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions

Jun 04, 2021
Via arXiv

Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval

Dec 14, 2020
Via arXiv