Zhipu: GLM-4-Voice Technical Report (English Original + Translation) (14 pages).pdf

GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot

Aohan Zeng, Zhengxiao Du, Mingdao Liu, Kedong Wang, Shengmin Jiang, Lei Zhao, Yuxiao Dong, Jie Tang
Zhipu.AI, Tsinghua University
https:/

We introduce GLM-4-Voice, an intelligent and human-like end-to-end spoken chatbot. It supports both Chinese and English, engages in real-time voice conversations, and varies vocal nuances such as emotion, intonation, speech rate, and dialect according to user instructions. GLM-4-Voice uses an ultra-low-bitrate (175 bps), single-codebook speech tokenizer with a 12.5 Hz frame rate, derived from an automatic speech recognition (ASR) model by incorporating a vector-quantized bottleneck into the encoder. To efficiently transfer knowledge from the text modality to the speech modality, we synthesize speech-text interleaved data from existing text pre-training corpora using a text-to-token model. We continue pre-training from the pre-trained text language model GLM-4-9B on a combination of unsupervised speech data, interleaved speech-text data, and supervised speech-text data, scaling up to 1 trillion tokens and achieving state-of-the-art performance in both speech language modeling and spoken question answering. We then fine-tune the pre-trained model with high-quality conversational speech data, achieving superior performance compared to existing baselines in both conversational ability and speech quality. The open models can be accessed through https:/ and https://huggingface.co/THUDM/glm-4-voice-9b.

1 Introduction

The success of large language models (LLMs) has driven significant advancements in conversational AI, enabling the development of text-based chatbots and digital assistants. However, LLMs are primarily designed to process text input and generate text output, focusing on semantic and logical communication. In contrast, human communication…
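The tokenizer figures in the abstract can be checked with simple arithmetic. If every frame emits exactly one index from a single codebook, the bitrate equals the frame rate times the bits per index, so 175 bps / 12.5 Hz = 14 bits per token, which implies a codebook of 2^14 = 16,384 entries. That codebook size is derived here under the fixed-length-index assumption rather than quoted from this excerpt, and the function names are illustrative only.

```python
import math

# Figures stated in the abstract.
BITRATE_BPS = 175.0   # speech tokenizer bitrate
FRAME_RATE_HZ = 12.5  # tokens emitted per second of audio


def bits_per_token(bitrate_bps: float, frame_rate_hz: float) -> float:
    """Bits carried by each token when exactly one token is emitted per frame."""
    return bitrate_bps / frame_rate_hz


def implied_codebook_size(bitrate_bps: float, frame_rate_hz: float) -> int:
    """Assumes each token is a fixed-length index, so size = 2 ** bits."""
    bits = bits_per_token(bitrate_bps, frame_rate_hz)
    if not bits.is_integer():
        raise ValueError(f"{bits} bits per token is not a whole number")
    return 2 ** int(bits)


bits = bits_per_token(BITRATE_BPS, FRAME_RATE_HZ)
codebook_size = implied_codebook_size(BITRATE_BPS, FRAME_RATE_HZ)
tokens_per_minute = FRAME_RATE_HZ * 60

# Round trip: frame rate x log2(codebook size) recovers the bitrate.
recovered_bps = FRAME_RATE_HZ * math.log2(codebook_size)

print(bits, codebook_size, tokens_per_minute, recovered_bps)
# prints: 14.0 16384 750.0 175.0
```

At 12.5 Hz, one minute of audio becomes 750 tokens; a lower frame rate shortens the sequences the language model has to process.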

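The abstract says the tokenizer places a vector-quantized bottleneck inside an ASR encoder. At inference time such a bottleneck replaces each continuous encoder frame with the index of its nearest codebook vector. The sketch below shows only that lookup, assuming squared Euclidean distance and a toy 3-entry codebook of 2-dimensional vectors. Codebook training, where the bottleneck sits in the encoder, and how the 12.5 Hz frame rate is reached are not covered, and `nearest`, `quantize`, and `dequantize` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook vector closest to one encoder frame."""
    return min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))


def quantize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map every encoder frame to a discrete token (a codebook index)."""
    return [nearest(frame, codebook) for frame in frames]


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Look each token back up in the codebook, giving the bottleneck's output."""
    return [codebook[i] for i in indices]


# Toy values: a 3-entry codebook of 2-D vectors and three encoder frames.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.9, 0.1), (0.1, 0.2), (0.2, 0.8)]

tokens = quantize(frames, codebook)
print(tokens)  # prints: [1, 0, 2]
print(dequantize(tokens, codebook))  # prints: [(1.0, 0.0), (0.0, 0.0), (0.0, 1.0)]
```

A real tokenizer applies the same nearest-entry rule to much higher-dimensional vectors, typically as a batched matrix operation; the rule is what turns continuous ASR features into discrete tokens a language model can predict.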

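The abstract describes synthesizing speech-text interleaved data by running a text-to-token model over existing text pre-training corpora. One way to picture the result is a document whose spans alternate between plain text and speech tokens synthesized from the same words. The fixed span length, the strict alternation, and the stand-in `toy_text_to_tokens` function are all assumptions for illustration: the excerpt does not say how spans are chosen, and a real pipeline would call a trained text-to-token model.

```python
from typing import Callable, List, Tuple, Union

# A segment is ("text", words) or ("speech", token ids).
Segment = Tuple[str, Union[str, List[int]]]


def toy_text_to_tokens(span: str) -> List[int]:
    """Hypothetical stand-in for a trained text-to-token model.

    Emits one fake token per character; a real model would predict
    speech-tokenizer ids for the spoken form of the span.
    """
    return [ord(ch) for ch in span]


def interleave(
    words: List[str],
    text_to_tokens: Callable[[str], List[int]],
    span_len: int = 3,
) -> List[Segment]:
    """Split words into fixed-length spans and alternate their modality.

    Even-numbered spans stay as text; odd-numbered spans are replaced by
    speech tokens synthesized from the same words (hypothetical policy).
    """
    if span_len < 1:
        raise ValueError("span_len must be positive")
    segments: List[Segment] = []
    for i, start in enumerate(range(0, len(words), span_len)):
        span = " ".join(words[start:start + span_len])
        if i % 2 == 0:
            segments.append(("text", span))
        else:
            segments.append(("speech", text_to_tokens(span)))
    return segments


sentence = "the quick brown fox jumps over the lazy dog".split()
for modality, content in interleave(sentence, toy_text_to_tokens):
    print(modality, content)
```

Because each speech span carries content the model already knows from text, training on such sequences is meant to help knowledge learned from text carry over to speech; the actual span policy and token format in GLM-4-Voice may differ from this sketch.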
