Chiori Hori

Factorized RVQ-GAN For Disentangled Speech Tokenization

Jun 18, 2025

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization

Feb 27, 2024

Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks

Dec 11, 2023

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

Oct 30, 2023

Generation or Replication: Auscultating Audio Latent Diffusion Models

Oct 16, 2023

Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos

Jun 27, 2023

(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering

Feb 18, 2022

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

Oct 13, 2021

Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers

Aug 04, 2021

Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers

Apr 19, 2021