Speech Signal Improvement in Real-Time Communication
Yannan Wang
Tencent Ethereal Audio Lab, Tencent, Shenzhen, China

Outline
1. Introduction
2. Speech Signal Improvement
3. Future work

Background
Real-time communication (RTC) systems are widely used:
- Teleconferencing systems
- Video calls
Factors that degrade the speech quality of current RTC systems:
- Device robustness
- Acoustic capture
- Noise/reverberation corruption
- Interfering speakers
- Network congestion

Introduction

Device robustness

Outline
1. Introduction
2. Speech Signal Improvement
   I. Enhancement
   II. Restoration
3. Future work

Speech denoising
Example noises: keyboard typing, rain, WeChat message notifications, a cup being set down on a table, coughing

Dereverberation
- Sound reflected from the walls, ceiling, floor and other objects in a room superimposes on the direct sound, reducing speech quality and intelligibility
- Drawbacks of traditional methods:
  - It is difficult to accurately estimate the nonlinear mapping between clean and reverberant speech
  - The algorithms require substantial prior information and converge slowly
  - Only a small part of the reverberation is removed, so the improvement is limited

Speaker extraction
- Explicit (user-aware) enrollment
- Implicit (user-unaware) enrollment

Related Works
Yukai Ju et al., ASLP@NPU & Tencent Ethereal Audio Lab, China
Our previous winning model: TEA-PSE [2]
- The 1st-stage network estimates the target speaker's magnitude, which is combined with the noisy phase
- The 2nd-stage network estimates the residual real and imaginary parts
- The speaker embedding is combined with the network features by simple concatenation

Contribution
- A residual LSTM [3] is incorporated after the squeezed temporal convolution network (S-TCN) to enhance sequence modeling capability
- A local-global representation (LGR) [4] structure is introduced to boost speaker information extraction
- A multi-STFT resolution loss [5] is used to capture the time-frequency characteristics of the speech signal effectively
- Retraining based on a freeze-training strategy is employed to fine-tune the system
- TEA-PSE 3.0 ranks 1st in both track 1 and track 2 of the ICASSP 2023 DNS Challenge [6]

TEA-PSE 3.0
Network structure
- Same dual-stage framework as TEA-PSE
- A residual LSTM is added after every S-TCN module to further enhance the model's sequence modeling capability
- Local-globa