Report Preview

Sun Haoze (孙豪泽) - Baichuan-Omni-1.5: Baichuan AI's Practical Exploration of End-to-End Multimodal Large Models.pdf

ID: 631147

PDF, 43 pages, 20.40 MB



ML-Summit: www.cpp- www.ml-summit.org www.gosim.org www.pm-summit.org

Speaker: Sun Haoze, Head of Multimodal, Baichuan AI
Sun Haoze graduated from Peking University in 2017 and has frontline industry experience in NLP, search, and recommendation. Since joining Baichuan AI, he has worked on text pretraining, SFT, code agents, and multimodal pretraining. He currently focuses on omni-modal models, especially algorithms for end-to-end speech models. The open-source Baichuan-Omni-1.5 omni-modal model strikes the best balance across text capability, image/video understanding, and speech understanding and generation.

Talk title: Baichuan-Omni-1.5: Baichuan AI's Practical Exploration of End-to-End Multimodal Large Models
ML-Summit 2025 Global Machine Learning Technology Conference
Sun Haoze, Baichuan Multimodal Team

Contents
Baichuan-Audio: end-to-end speech model
Baichuan-Omni-1.5: omni-modal model in practice
Future outlook

End-to-end speech understanding and generation models

End-to-End Audio-LLM Framework

Moshi:
1. Full-duplex model that generates audio tokens and text tokens simultaneously through a multi-stream output mechanism.
2. Balances semantic and acoustic features through distillation, similar to the approach of SpeechTokenizer.
3. Streamable, low-latency generation.
Main limitations:
1. Dual-channel (text and speech) input requires extensive training from scratch, which places high demands on the LLM's parameter scale.
2. Shows noticeable intelligence degradation: its performance tends to fall below that of pure text-based dialogue models.
Reference: Moshi: a speech-text foundation model for real-time dialogue

VITA-1.5 / MiniCPM-o / Freeze-Omni, etc.:
1. Pseudo end-to-end: three paradigms for processing text and speech input and output.
2. No intelligence degradation.
3. Low training cost.
Main limitations:
1. Cannot use pure audio data or reuse part of the LLM's parameters, which limits scalability.
2. Audio understanding, audio generation, and text processing do not share a unified paradigm, which makes it hard to carry future advances in the text domain (e.g., RL) over to speech.
3. Potential deficiencies i…
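The Moshi slide mentions a multi-stream output mechanism that emits text and audio tokens at the same time. The sketch below is a toy illustration of what such a per-frame token grid can look like. It loosely follows the published description of Moshi: a text stream, a semantic codebook, and acoustic codebooks offset by a small delay. The stream sizes, the `delay` value, the `PAD` id, and the function name `build_multistream_frames` are hypothetical. The real model also carries a separate stream for the user's audio, which this sketch omits.

```python
"""Toy layout of a Moshi-style multi-stream output (illustrative only).

Assumptions (not from the slides): one text token per frame, one semantic
codebook plus a few acoustic codebooks per frame, and a fixed delay on the
acoustic codebooks. All names here are hypothetical.
"""
from typing import List

PAD = -1  # hypothetical padding id for positions that hold no token


def build_multistream_frames(
    text: List[int],
    semantic: List[int],
    acoustic: List[List[int]],
    delay: int = 1,
) -> List[List[int]]:
    """Stack text, semantic, and delayed acoustic streams into one grid.

    Row 0 is the text stream, row 1 the semantic codebook, and rows 2..
    are acoustic codebooks shifted right by `delay` frames. Column t holds
    every token the model emits at time step t.
    """
    n = len(text)
    if len(semantic) != n or any(len(row) != n for row in acoustic):
        raise ValueError("all streams must have the same number of frames")
    # Pad the undelayed streams at the end so every row has n + delay steps.
    grid = [text + [PAD] * delay, semantic + [PAD] * delay]
    for row in acoustic:
        # Acoustic tokens for frame t appear at step t + delay.
        grid.append([PAD] * delay + row)
    return grid


if __name__ == "__main__":
    frames = build_multistream_frames(
        text=[11, 12, 13],
        semantic=[21, 22, 23],
        acoustic=[[31, 32, 33], [41, 42, 43]],
        delay=1,
    )
    for t in range(len(frames[0])):
        print(t, [row[t] for row in frames])
```

Each column of the grid is one generation step. With `delay=1`, the acoustic tokens for frame t appear one step after that frame's text and semantic tokens, so the final step carries only acoustic tokens.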
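The slide says Moshi balances semantic and acoustic features through distillation, similar to SpeechTokenizer. Below is a minimal NumPy sketch of that idea. It assumes a residual vector quantizer (RVQ) whose first layer is pulled toward teacher features, for example from a self-supervised speech model, by a cosine-similarity loss. The function names, shapes, the projection `proj`, and the simple "1 minus cosine similarity" loss are illustrative assumptions, not the exact training objective of Moshi or SpeechTokenizer. Gradient updates and codebook learning are omitted.

```python
"""Residual vector quantization with a first-layer distillation loss (sketch).

Assumptions: NumPy arrays stand in for encoder outputs and teacher features,
codebooks are fixed (no training step), and the distillation objective is a
simple mean (1 - cosine similarity), not the exact loss of any paper.
"""
import numpy as np


def residual_vq(x, codebooks):
    """Quantize frames x of shape (T, D) with a stack of (K, D) codebooks.

    Returns codes of shape (L, T) and per-layer quantized vectors (L, T, D);
    summing the quantized vectors over layers approximates x.
    """
    residual = np.asarray(x, dtype=float)
    codes, outputs = [], []
    for cb in codebooks:
        cb = np.asarray(cb, dtype=float)
        # Squared Euclidean distance from every residual frame to every codeword.
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        q = cb[idx]
        codes.append(idx)
        outputs.append(q)
        residual = residual - q  # the next layer quantizes what is left over
    return np.stack(codes), np.stack(outputs)


def semantic_distill_loss(first_layer, teacher, proj):
    """Mean (1 - cosine similarity) between projected layer-1 output and teacher."""
    z = first_layer @ proj
    num = (z * teacher).sum(axis=-1)
    den = np.linalg.norm(z, axis=-1) * np.linalg.norm(teacher, axis=-1) + 1e-8
    return float(np.mean(1.0 - num / den))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(5, 4))                      # 5 frames, 4-dim features
    books = [rng.normal(size=(8, 4)) for _ in range(3)]   # 3 layers, 8 codes each
    teacher = rng.normal(size=(5, 6))                     # hypothetical teacher features
    proj = rng.normal(size=(4, 6))                        # hypothetical projection
    codes, outs = residual_vq(frames, books)
    print(codes.shape, semantic_distill_loss(outs[0], teacher, proj))
```

Only the first layer's output is compared with the teacher, so in a trained model of this shape the later layers are left to encode the residual detail. That is one way to read the semantic/acoustic split the slide refers to.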
