
Chiori Hori

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization

Feb 27, 2024
Via arXiv

Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks

Dec 11, 2023
Via arXiv

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

Oct 30, 2023
Via arXiv

Generation or Replication: Auscultating Audio Latent Diffusion Models

Oct 16, 2023
Via arXiv

Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos

Jun 27, 2023
Via arXiv

(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering

Feb 18, 2022
Via arXiv

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

Oct 13, 2021
Via arXiv

Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers

Aug 04, 2021
Via arXiv

Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers

Apr 19, 2021
Via arXiv

Multi-Pass Transformer for Machine Translation

Sep 23, 2020
Via arXiv