Papers
arxiv:2312.13585

Speech Translation with Large Language Models: An Industrial Practice

Published on Dec 21, 2023
Authors:
,
,
,
,

Abstract

LLM-ST, a speech translation model combining a pre-trained LLM with a speech encoder and multi-task instruction tuning, achieves superior performance with CoT prompting on English and Chinese datasets.

Given the great success of large language models (LLMs) across various tasks, in this paper, we introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained LLM. By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long audio inputs. Furthermore, our findings indicate that the implementation of Chain-of-Thought (CoT) prompting can yield advantages in the context of LLM-ST. Through rigorous experimentation on English and Chinese datasets, we showcase the exceptional performance of LLM-ST, establishing a new benchmark in the field of speech translation. Demo: https://speechtranslation.github.io/llm-st/.

Community

Paper author

Webpage and more examples: https://speechtranslation.github.io/llm-st/
:)

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2312.13585
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2312.13585 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2312.13585 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2312.13585 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.