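UBTECH's Multimodal Machine Learning Technology
Presenter: Ding Wan, Humanoid Robot Business Unit, UBTECH

Presenter bio
Ding Wan holds a bachelor's degree from Wuhan University and a Ph.D. from Central China Normal University, and previously worked as a postdoctoral researcher and Scientist I at the Institute for Infocomm Research, A*STAR, Singapore. Main research areas: multimodal emotion recognition and multimodal speech synthesis. Ding Wan joined UBTECH in 2019 and is mainly responsible for core algorithm R&D and productization of UBTECH's online and offline speech synthesis technology. Ding Wan also contributed to drafting a technical specification for virtual digital humans that support speech and visual interaction. Honors include first place in the EmotioNet 2017 facial action unit recognition challenge, second place in the MEC 2017 multimodal emotion recognition competition, and the ACII Asia 2018 Outstanding Paper Award.
Ding Wan, Expert Engineer, Humanoid Robot Business Unit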
Outline
1. Motivation and problems
2. Multimodal emotion recognition
3. Speech-driven digital human synthesis
4. Summary

Motivation and problems

Research motivation
Computing with environmental information: humans perceive their environment through multiple modalities.
"Modality" is a biological concept proposed by the German physiologist Helmholtz: a channel through which an organism receives information by means of its sensory organs and experience. Humans, for example, have visual, auditory, tactile, gustatory, and olfactory modalities. According to published research, vision, hearing, touch, smell, and taste account for 83%, 11%, 3.5%, 1.5%, and 1% of the external information humans acquire, respectively. Multimodality refers to the fusion of multiple senses.

It is widely acknowledged that human affective expression consists of a complex coordination of signals encompassing mostly involuntary (e.g., physiology), semi-voluntary (facial expressions, body movements), and voluntary (e.g., overt actions such as key presses) responses [Ekman 1992; Rosenberg and Ekman 1994]. Analyzing multiple signals and their mutual interdependence is expected to yield models that more accurately reflect the underlying nature of human affective expression.

For emotion recognition, multimodal information is complementary.
(There is no one-to-one mapping between an expression and an affective state, for example:) A furrowed brow caused by squinting to focus at something in the distance is diagnostic of a different cognitive state (information seeking) than a furrowed brow that accompanies an expression of confusion [D'Mello and Graesser 2014]. Furthermore, the same affective state can be differentially expressed as a function of the underlying eliciting stimulus. For example, a nearby spider (about to strike) and a spider across the room elicit different responses because they require different actions even though the underlying affective state (fear) elicited by both situations might be the same [Coan 2010]. In general, there is a loose coupling between observable expressions and specific affective states; hence, UniMod