Siddhant Arora

ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems

Mar 11, 2025

Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics

Mar 03, 2025

ESPnet-SpeechLM: An Open Speech Language Model Toolkit

Feb 21, 2025

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

Dec 23, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Nov 08, 2024

Task Arithmetic for Language Expansion in Speech Translation

Sep 17, 2024

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

Sep 14, 2024

Decoder-only Architecture for Streaming End-to-end Speech Recognition

Jun 23, 2024

Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model

Jun 18, 2024

Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting

Jun 18, 2024