2-4 Exploration and Practice of Big-Data-Based Speech Recognition in Complex Scenarios.pdf


The exploration of complex Large-scale databased scenario automatic speech recognition
Complex scenario ASR in ZOOM
Haoyu (Charlie) Tang
April 24, 2022
Zoom AI/ML Engineering

Content
1. Introduction to automatic speech recognition
2. End-to-End automatic speech recognition
3. Model innovation
4. Training pipeline innovation
5. Large scale data model training acceleration in ZOOM
6. Summary and next step

Introduction to automatic speech recognition

What is automatic speech recognition
Automatic Speech Recognition (ASR): generate text from the given audio (wav), $\hat{Y} = \arg\max_{Y} P(Y \mid X)$, where X is the audio and Y is the text.
Figure 1: ASR [1]
Conventional Method: Acoustic model, Language model and Pronunciation dict/model.
End-To-End Method: Main Model, and language model (optional).
[1] https:/

Brief History of ASR
Figure 2: ASR history [2]
[2] https://sonix.ai/history-of-speech-recognition

Brief History of ASR
Figure 3: Recent decade ASR history [3]
[3] https://sonix.ai/history-of-speech-recognition

ASR: current problems
Current problems in ASR for live, meeting and online chat scenarios:
1. Speech is spontaneous, but most open ASR data is read speech
2. Open-set recognition + Large vocabulary
3. Noise, especially background music
4. Accent independence
5. Code-switching
6. Free switching between far-field and near-field, because the speaker moves

End-to-End automatic speech recognition

A standard end-to-end (E2E) ASR architecture
Figure 4: A standard end-to-end (E2E) ASR architecture [4]
This figure shows two standard E2E modeling methods: CTC and encoder-attention-decoder. These can be combined into a CTC-ATT architecture.
[4] Watanabe et al., "Hybrid CTC/attention architecture for end-to-end speech recognition".

ATT-CTC Training and Inference [5]
Loss combine:
$\mathcal{L}_{\mathrm{MTL}} = \lambda \mathcal{L}_{\mathrm{CTC}} + (1 - \lambda) \mathcal{L}_{\mathrm{Attention}}$ (1)
Figure 5: Loss combine
Joint decoding/rescoring:
$\hat{C} = \arg\max_{h} \{ \lambda \log p_{\mathrm{ctc}}(h \mid X) + (1 - \lambda) \log p_{\mathrm{att}}(h \mid X) \}$ (2)
[5] Watanabe et al., "Hybrid CTC/attention architecture for end-to-end speech recognition".
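
The loss combination (1) and the joint decoding/rescoring rule (2) on the ATT-CTC slide can be illustrated with a minimal Python sketch. It is not the implementation used at Zoom. The function names `joint_loss` and `joint_rescore`, the weight name `lam` (standing for the interpolation weight in both equations), the default value 0.3, and the toy n-best list are all assumptions made for illustration. The sketch also assumes that both the CTC and the attention scores are log-probabilities, as in the cited Watanabe et al. paper.

```python
from typing import List, Tuple

# Hypothetical n-best entry: (hypothesis text, CTC log-prob, attention log-prob).
Hypothesis = Tuple[str, float, float]


def joint_loss(loss_ctc: float, loss_att: float, lam: float = 0.3) -> float:
    """Equation (1): interpolate the CTC and attention losses with weight lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * loss_ctc + (1.0 - lam) * loss_att


def joint_rescore(hyps: List[Hypothesis], lam: float = 0.3) -> str:
    """Equation (2): return the hypothesis with the highest interpolated log score."""
    if not hyps:
        raise ValueError("hyps must not be empty")

    def score(h: Hypothesis) -> float:
        _, logp_ctc, logp_att = h
        return lam * logp_ctc + (1.0 - lam) * logp_att

    return max(hyps, key=score)[0]


# Toy scores, invented for this example only.
nbest = [
    ("hello world", -4.0, -2.0),
    ("hollow world", -3.0, -5.0),
]
best = joint_rescore(nbest, lam=0.3)
```

With `lam=0.3` the attention score dominates and the first hypothesis wins; setting `lam=1.0` reduces the rule to pure CTC scoring and the ranking flips, which shows how the weight trades off the two models.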


