
Title: CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

Sep 29, 2024

Yexing Du, Ziyang Ma, Yifan Yang, Keqi Deng, Xie Chen, Bo Yang, Yang Xiang, Ming Liu, Bing Qin

[Figures 1-4 for CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought]


Abstract: Speech Language Models (SLMs) have demonstrated impressive performance on speech translation tasks. However, existing research primarily focuses on direct instruction fine-tuning and often overlooks the inherent reasoning capabilities of SLMs. In this paper, we introduce a three-stage training framework designed to activate the chain-of-thought (CoT) capabilities of SLMs. We propose CoT-ST, a speech translation model that uses multimodal CoT to decompose speech translation into two sequential steps: speech recognition, then translation. We validate the effectiveness of our method on two datasets, CoVoST-2 and MuST-C. The experimental results demonstrate that CoT-ST outperforms previous state-of-the-art methods, achieving higher BLEU scores (CoVoST-2 en-ja: 30.5 -> 30.8, en-zh: 45.2 -> 47.7; MuST-C en-zh: 19.6 -> 21.2). This work is open-sourced at https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/st_covost2.
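The decomposition described in the abstract (recognize first, then translate) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `cot_translate`, `CoTResult`, and the two stub callables are hypothetical names introduced only for illustration, and the choice to pass both the audio and the transcript to the translation step is an assumption about what "multimodal" conditioning might look like. In CoT-ST itself both steps are carried out by a single speech language model; see the linked repository for the actual code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CoTResult:
    """Holds the intermediate transcript and the final translation."""
    transcript: str
    translation: str


def cot_translate(
    audio: bytes,
    recognize: Callable[[bytes], str],
    translate: Callable[[bytes, str], str],
) -> CoTResult:
    """Two-step chain: speech recognition first, then translation.

    `recognize` and `translate` are hypothetical stand-ins for model calls.
    Passing the audio to `translate` as well as the transcript is an
    assumption made for this sketch, not a detail stated in the abstract.
    """
    transcript = recognize(audio)                # step 1: speech -> source text
    translation = translate(audio, transcript)   # step 2: (speech, text) -> target text
    return CoTResult(transcript=transcript, translation=translation)


# Toy stubs so the sketch runs without any model.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(audio: bytes, transcript: str) -> str:
    return transcript.upper()


result = cot_translate(b"hello world", fake_recognize, fake_translate)
print(result.transcript)
print(result.translation)
```

Keeping the transcript in the result makes the intermediate reasoning step inspectable, which is the main practical difference from a direct speech-to-translation call.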
