Hoirin Kim

Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition

Jul 04, 2024
Via arXiv

One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection

Jun 24, 2024
Via arXiv

STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models

Dec 14, 2023
Via arXiv

Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation

May 19, 2023
Via arXiv

Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination

Oct 26, 2022
Via arXiv

FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning

Jul 01, 2022
Via arXiv

Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck

Apr 04, 2022
Via arXiv

Asymmetric Proxy Loss for Multi-View Acoustic Word Embeddings

Mar 30, 2022
Via arXiv

Perceptually Guided End-to-End Text-to-Speech

Nov 02, 2020
Via arXiv

A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments

Oct 06, 2020
Via arXiv