Tomohiro Tanaka

All-in-One ASR: Unifying Encoder-Decoder Models of CTC, Attention, and Transducer in Dual-Mode ASR

Dec 12, 2025
Via arXiv

Difference Vector Equalization for Robust Fine-tuning of Vision-Language Models

Nov 13, 2025
Via arXiv

Joint Modeling of Big Five and HEXACO for Multimodal Apparent Personality-trait Recognition

Oct 16, 2025
Via arXiv

Few-shot Personalization via In-Context Learning for Speech Emotion Recognition based on Speech-Language Model

Sep 10, 2025
Via arXiv

Constant Rate Schedule: Constant-Rate Distributional Change for Efficient Training and Sampling in Diffusion Models

Nov 19, 2024
Via arXiv

Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion

Sep 04, 2023
Via arXiv

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?

Jun 14, 2023
Via arXiv

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization

Jun 07, 2023
Via arXiv

End-to-End Joint Target and Non-Target Speakers ASR

Jun 04, 2023
Via arXiv

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

May 25, 2023
Via arXiv