Anurag Kumar

Scaling Concept With Text-Guided Diffusion Models

Oct 31, 2024

Using RLHF to align speech enhancement approaches to mean-opinion quality scores

Oct 17, 2024

Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation

Oct 09, 2024

Improved direction of arrival estimations with a wearable microphone array for dynamic environments by reliability weighting

Sep 22, 2024

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

Aug 09, 2024

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching

Jul 04, 2024

AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling

Jun 17, 2024

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

Jun 07, 2024

Cross-Talk Reduction

May 30, 2024

Few Shot Class Incremental Learning using Vision-Language models

May 02, 2024