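Advances in Multi-Speaker Separation Technology and Its Applications (多说话人分离技术及应用进展)
Hong Qingyang (洪青阳)
Collaborators: Yu Hongyong (余洪涌), Jiang Yuemeng (姜跃猛), Li Chaoyang (李朝阳), Wang Jie (王捷), Li Lin (李琳)
Intelligent Speech Laboratory, Xiamen University
March 2024

Outline
1. Research background
2. Industrial-grade modular system
3. Improvements
4. Deployed applications

1. Research background

Multi-speaker separation (speaker diarization): given a recording in which several people take turns speaking, the system must determine who is speaking in each time interval.
[Diagram: audio -> multi-speaker separation system -> segmentation information]

Application scenarios: meeting minutes, multi-speaker transcription, intelligent customer service, recording quality inspection, etc.
Terminal devices: smartphones, personal computers, voice recorders.
Supporting vendors: iFLYTEK (Smart Office Notebook), Huawei (AI Minutes), Shengyun (声云, speech transcription).

[Timeline figure, 2000-2023; years marked: 2000, 2002, 2006, 2009, 2013, 2018, 2019, 2020, 2021, 2022, 2023]
Competitions/datasets: Rich Transcription (RT), AMI, CALLHOME, DIHARD (I, II, III), CHiME-6, M2MeT, AISHELL-4, MIXER6, VoxSRC (20, 21, 22, 23), M2MeT 2.0, CHiME-7, AliMeeting
Architectures: modular architecture, end-to-end architecture
Research trend: simple scenarios -> complex scenarios
Challenges: noise interference, unknown number of speakers, overlapping speech, etc.
Applications: offline -> online, single microphone -> multiple microphones, adaptation to new scenarios

Modular systems
Clustering methods: AHC [1], SC [2, 3], VB/VBx [4, 5], UIS-RNN [6], DNC [7]

[1] K. C. Gowda and G. Krishna, "Agglomerative Clustering Using the Concept of Mutual Nearest Neighbourhood," Pattern Recognition, vol. 10, pp. 105-112, 1978.
[2] U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, pp. 395-416, 2007.
[3] T. Park, Kyu J. Han, Manoj Kumar, and Shrikanth S. Narayanan, "Auto-tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap," IEEE Signal Processing Letters, vol. 27, pp. 381-385, 2020.
[4] M. Diez, L. Burget, S. Wang, J. Rohdin, H. Cernocky, "Bayesian HMM based x-vector Clustering for Speaker Diarization," Interspeech, 2019, pp. 346-350.
[5] M. Diez, L. Burget, F. Landini, J. Cernocky, "Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors," IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 28, pp. 355-368, 2020.
[6] A. Zhang, Q. Wang, Z. Zhu, J. Paisley, and C. Wang, "Fully Supervised Speaker Diarization," ICASSP, 2019.
[7] Q. J. Li, F. L. Kreyssig, C. Zhang, P. C. Woodland, "Discriminative Neural Clustering for Speaker Diarisation," IEEE Spoken Language Technology Workshop (SLT 2021), Jan 2021, Shenzhen, China.

End-to-end systems
End-to-end systems: EEND [1], SA-EEND [2], TS-VAD [4]
Slide labels: Bi-LSTM-based end-to-end model; target-speaker voice activity detection model

[1] Y. Fujita, N. Kanda, S. Horiguchi, K. Nagamatsu, and S. Watanabe, "End-to-end Neural Speaker Diarization with Permutation-free Objectives," in Interspe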