Title: SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

Oct 13, 2023

Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg

Abstract: We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a modality adapter module, and LoRA layers to accommodate speech input and associated task instructions. The unified SALM not only achieves performance on par with task-specific Conformer baselines for Automatic Speech Recognition (ASR) and Automatic Speech Translation (AST), but also exhibits zero-shot in-context learning capabilities, demonstrated through a keyword-boosting task for ASR and AST. Moreover, speech supervised in-context training is proposed to bridge the gap between LLM training and downstream speech tasks, which further boosts the in-context learning ability of speech-to-text models. The proposed model is open-sourced via the NeMo toolkit.
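As a rough illustration of how the four components named in the abstract could fit together, the toy sketch below projects audio-encoder frames into the LLM embedding space with an adapter, joins them with instruction embeddings, and passes the sequence through a frozen linear layer carrying a LoRA update. It is not the authors' implementation and not NeMo's API: every name (`LoRALinear`, `build_llm_input`, `rand_matrix`), the toy dimensions, the random weights, and the instruction-then-speech ordering are assumptions made for illustration only.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical helper: small random matrix standing in for learned weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LoRALinear:
    """Frozen weight W plus a low-rank update B @ A (the standard LoRA form)."""

    def __init__(self, dim_in, dim_out, rank, rng):
        self.W = rand_matrix(dim_out, dim_in, rng)        # frozen, stands in for an LLM weight
        self.A = rand_matrix(rank, dim_in, rng)           # trainable
        self.B = [[0.0] * rank for _ in range(dim_out)]   # trainable, zero-initialised

    def __call__(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + d for b, d in zip(base, delta)]


def build_llm_input(audio_frames, adapter, instruction_embeddings):
    """Project audio frames into the LLM embedding space and join them
    with the instruction embeddings (ordering is an assumption)."""
    speech_embeddings = [matvec(adapter, frame) for frame in audio_frames]
    return instruction_embeddings + speech_embeddings


rng = random.Random(0)
AUDIO_DIM, LLM_DIM, RANK = 4, 6, 2  # toy sizes, not taken from the paper

adapter = rand_matrix(LLM_DIM, AUDIO_DIM, rng)  # modality adapter as one linear map
audio_frames = [[rng.random() for _ in range(AUDIO_DIM)] for _ in range(3)]
instruction = [[rng.random() for _ in range(LLM_DIM)] for _ in range(2)]

sequence = build_llm_input(audio_frames, adapter, instruction)
layer = LoRALinear(LLM_DIM, LLM_DIM, RANK, rng)
outputs = [layer(token) for token in sequence]
```

Because `B` starts at zero, the LoRA path contributes nothing before training, so the layer initially behaves exactly like the frozen weight; only `A`, `B`, and the adapter would be updated in a setup like this one.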

* Submitted to ICASSP 2024
