Anurag Kumar

Hearing Anywhere in Any Environment

Apr 14, 2025

Quickest change detection for UAV-based sensing

Apr 10, 2025

Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment

Jan 30, 2025

SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models

Jan 14, 2025

Bridging Context Gaps: Enhancing Comprehension in Long-Form Social Conversations Through Contextualized Excerpts

Dec 28, 2024

Scaling Concept With Text-Guided Diffusion Models

Oct 31, 2024

Using RLHF to align speech enhancement approaches to mean-opinion quality scores

Oct 17, 2024

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation

Oct 09, 2024

Improved direction of arrival estimations with a wearable microphone array for dynamic environments by reliability weighting

Sep 22, 2024

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

Aug 09, 2024