EPoSS White Paper: AI at the Edge (2021)

Contents
1 Introduction to “AI at the Edge” for Smart Systems
 1.1 How to read this document
 1.2 Artificial intelligence at the Edge: introduction to the topic
  1.2.1 Definition of “AI at the Edge”
 1.3 Impact for the European Smart Systems industry and market
 1.4 Technical advantages and opportunities of AI at the Edge
 1.5 Cost effectiveness
2 Applications for “AI at the Edge”
 2.1 Automotive and multi-modal mobility
  2.1.1 Vehicle intelligence and V2X communication
  2.1.2 Occupant activity “understanding”
  2.1.3 Air path control and diagnostics
  2.1.4 Battery lifecycle management
  2.1.5 Thermal management of the powertrain
  2.1.6 Autonomous (inland) ships by project A-Swarm
  2.1.7 Massive sensor technology and networks
 2.2 Energy
  2.2.1 Smart Grid
  2.2.2 Smart buildings
 2.3 Digital Industry
  2.3.1 Predictive maintenance
  2.3.2 Reliable prevention of early failures
  2.3.3 Robot co-working
 2.4 Health and wellbeing
  2.4.1 Vital sensing with radar
  2.4.2 Personalised medicine
  2.4.3 Affective computing (“AI of emotions”)
  2.4.4 Sport analytics
  2.4.5 Physiological monitoring
 2.5 Agriculture, Farming and Natural Resources
  2.5.1 Automated weeding
  2.5.2 Drones for precision agriculture
  2.5.3 Soil control
 2.6 Smart Cities
  2.6.1 Smart streetlights
  2.6.2 Air quality
  2.6.3 Alarm systems
3 State-of-the-art of “AI at the Edge”
 3.1 Edge AI for smarter systems
 3.2 Hardware for edge AI
 3.3 Machine learning models for edge AI
 3.4 Distributed learning at the Edge
 3.5 Frameworks and platforms for AI at the Edge
 3.6 Orchestration of AI between cloud and edge resources
 3.7 Hardware-software co-design for AI at the Edge
4 Future Challenges and Trends
 4.1 Trust and explainability
 4.2 Re-learning
 4.3 Security and adversarial attacks
 4.4 Learning at the Edge
 4.5 Integrating AI into the smallest devices
 4.6 Data as a basis for AI
 4.7 Neuromorphic technologies
 4.8 Meta-learning
 4.9 Hybrid modelling
 4.10 Energy efficiency
5 Milestones for AI at the Edge in Smart Systems
6 Policy Recommendations
 6.1 Sustainable business model innovation
 6.2 Our vision: cross-domain technology stack
 6.3 Common standards
 6.4 Heterogeneous approaches, multiple vendors
 6.5 Education and network building
 6.6 Data collection, testing and experimentation facilities for AI at the Edge
7 Summary
References
Authors

1 Introduction to “AI at the Edge” for Smart Systems
In this paper, members of the European Platform on Smart Systems Integration (EPoSS) have collected their views on the benefits of incorporating Artificial Intelligence in future Smart devices and defined the actions required to implement “AI at the Edge”.

1.1 How to read this document
To address the individual needs of our audience, we have divided this document into two parts. The first part focuses on the status quo: Chapter 1 covers the market potential of AI at the Edge, Chapter 2 describes the possible application domains and challenges, and Chapter 3 presents the state-of-the-art technologies that are available now. The second part addresses the future: Chapters 4 and 5 cover the novel technologies, trends and technological milestones that will drive research activities over the next ten years. Chapter 6 outlines the recommendations of the
experts to the political decision makers, in order to seize the full potential of AI at the Edge. Finally, the white paper concludes with a summary of the major insights and recommendations.

[Document overview figure. Today: 1. Market (IoT, Edge, AI); 2. Applications (Automotive, Energy, Industry, Health, Agriculture, Smart Cities); 3. State-of-the-art (Hardware, Machine Learning, Distributed Learning over Edge and Cloud, Frameworks, Hardware and Software Co-Design). Future: 4. Future Challenges and Trends; 5. Milestones for AI at the Edge in Smart Systems; 6. Policy Recommendations (our vision and call for actions); 7. Summary (conclusions).]

1.2 Artificial intelligence at the Edge: introduction to the topic
1.2.1 DEFINITION OF “AI AT THE EDGE”
Artificial Intelligence (AI) is a technical system which has the ability to mimic human intelligence, as characterized by behaviours such as sensing, learning, understanding, decision-making and acting. Owing to the availability of powerful computing hardware (GPUs and specialist architectures) and of large amounts of data, AI solutions, especially Machine Learning (ML) and more specifically Deep Learning (DL), have found numerous and widespread applications over the past two decades (such as image recognition, fault detection or automated driving functions). Due to their reliance on large amounts of data, most current AI solutions require large-scale cloud data centres for computationally demanding processing tasks. Nevertheless, we are now in a new information-centric era in which computing is becoming pervasive and ubiquitous thanks to the billions of IoT devices connected to the Internet, and increasing digitalisation generates Zettabytes of data every year. Consequently, edge computing is emerging as a strong alternative to traditional cloud computing, enabling new types of applications (such as connected health, autonomous driving, Industry 4.0) with the advantage of implementing the required AI solutions as close as possible to the end-users and the data sources.

Figure 1: Positioning of edge/extreme edge. The data processing stack for the Internet of Things consists of three layers: the edge layer, the fog layer and the cloud layer. In this paper we mainly address AI implementation at the Edge, close to smart sensors. The lowest layer represents the current use of AI-enhanced systems, often acting in a single system. As complexity and functionality increase, interaction between several AI systems is needed (between sensors for sensor fusion, between the AI model and a simulation model (Digital Twin) in Hybrid AI, or between several AI systems). Ultimately, the highest complexity and functionality is achieved with distributed systems of systems (swarm AI, general intelligence).
[Figure 1 diagram: the industrial IoT data processing layer stack, from sensors and controllers (data origination and use) through the edge layer (large-volume real-time data processing at source/on premises, data visualization, industrial PCs, embedded systems, gateways, micro data storage) and the fog layer (local network data analysis and reduction, control response, virtualization/standardization, fog nodes/servers) up to the cloud layer (big data processing, business logic, data warehousing, business analytics/intelligence); processing speed/response time is faster towards the edge and slower towards the cloud.]
Although a consensus across academia and industry on a worldwide AI roadmap is still to be reached, one fact is clear: the pure dominance of cloud computing will come to an end. To exploit the full potential of AI, data processing solutions will run on distributed edge computing nodes, interconnected by next-generation IoT platforms and communications. In the future, functionality, energy consumption, stability, resilience, robustness and safety constraints will define their features. Figure 1 shows an example of the interaction between cloud and edge computing as envisioned for the future.

1.3 Impact for the European Smart Systems industry and market
Factors driving the demand for edge computing solutions include the growing adoption of the Internet of Things (IoT) across industry, from 7% in 2019 to 12% by 2025 [1]. Low-latency processing, real-time automated decision-making and the need to process exponentially increasing data volumes and network traffic require novel edge-based approaches. Moreover, the emergence of autonomous vehicles, wearable devices and connected infrastructures, and the need for lightweight frameworks and systems (to enhance the efficiency of edge computing solutions), will create additional market opportunities for edge computing vendors. The Linux Foundation estimates in its report “The 2021 State of the Edge” [2] that between 2019 and 2028 up to USD 800 billion of infrastructure investment will be required to cover the growing device and infrastructure edge demand. The expected investments into IoT devices and infrastructure edges will be relatively evenly split. Technologies for wireless connectivity such as 5G are acting as a catalyst for market growth, up to 35% CAGR alone for industrial IoT solutions [3], with the total market for intelligent industrial edge computing (hardware, software, services) growing from USD 11.6 billion in 2019 to USD 30.8 billion by 2025. Current leaders in cloud technologies see this as an opportunity to increase their market share and have started investing in the edge ecosystem by engaging in partnerships with global telecom companies and smaller innovative vendors [4]. It is quite evident that 5G, with its predicted benefits, has the potential to create a powerful network-based technology that is expected to reorganize industrial value chains [5].

Yole Développement forecasts that AI computing in consumer applications (in particular stand-alone and embedded sound and vision processors) will grow from USD 2.3 billion in 2018 to 15.6 billion in 2024, at an average CAGR of 37.5% [6]. For AI in the automotive field (robotic vehicles, infotainment and ADAS), revenues are forecast to grow from USD 174 million in 2018 to 13.8 billion in 2028 (average CAGR of 49%) [7]; for AI in medical imaging, from USD 332 million in 2019 to 2.886 billion in 2025 (CAGR of 36%) [8]; and for neuromorphic computing and sensing, the markets in the mobile, consumer, computing, automotive, medical and industrial fields are expected to grow from USD 112 million in 2024 to 25.993 billion in 2034 (CAGR of 64%) [9].
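The growth figures quoted above can be sanity-checked by recomputing the implied compound annual growth rate (CAGR). The short Python sketch below does this for two of the forecasts; it is purely illustrative and not part of the white paper, and small deviations from the quoted CAGRs can stem from rounding or from slightly different base years in the underlying reports.

```python
# Recompute the CAGR implied by a start value, end value and number of years.
def cagr(start, end, years):
    """Compound annual growth rate, e.g. cagr(2.3, 15.6, 6) ~= 0.376."""
    return (end / start) ** (1.0 / years) - 1.0

# Consumer AI computing: USD 2.3 billion (2018) -> 15.6 billion (2024)
print(f"Consumer AI computing: {cagr(2.3, 15.6, 6):.1%}")    # ~37.6%, close to the quoted 37.5%
# AI for medical imaging: USD 0.332 billion (2019) -> 2.886 billion (2025)
print(f"Medical imaging AI:   {cagr(0.332, 2.886, 6):.1%}")  # ~43%; the quoted 36% likely reflects other rounding or base years
```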
1.4 Technical advantages and opportunities of AI at the Edge
AI solutions that run autonomously, are distributed and are implemented at the Edge offer the following advantages:
- Increased real-time performance (low latency): Edge applications process data and generate results locally on the sensing device. As a consequence, the device is not required to be continuously connected to a cloud data centre. As it can process data and take decisions independently, there is increased real-time performance in the decision-making process, reduced delay of data transmissions and improved response speed.
- Reliable low-bandwidth communication: Distributed devices can handle a large number of computational tasks, thereby reducing the need to send data to the cloud for storage and further processing. Overall, this minimizes the traffic load in the network and supports low-bandwidth communication.
- Enhanced power efficiency: As the amount and rate of data exchange with the cloud is minimized, the power consumption of the device is reduced, thus improving battery lifetime, which is critical for many edge devices.
- Improved data security and privacy: By processing data locally, it does not have to be sent over a network to remote servers for processing. This improves data security and privacy, as the data is not visible externally.

Figure 2: Especially for applications that need real-time performance (low latency), the processing or pre-processing of data at the application edge is mandatory.

1.5 Cost effectiveness
Over the past decades, the emergence and growth of the smartphone market and the mass production of semiconductors have allowed a significant cost reduction per unit for high-performance processors that include AI capability. However, the required cost of an IoT device may limit the ability to include the hardware components (memory, logic or storage) required for edge AI, and this may restrict the use of edge AI to high-end IoT products. Indeed, the applications currently driving the development of edge AI hardware are either computing-intensive (e.g. image processing for autonomous cars) or characterized by a high level of criticality (e.g. organ-on-chip for health care). IoT devices represent a wide range of connected objects; some will benefit from increased computing capacity, but for low-power, remote and less data-intensive devices the integration of high-end components to process AI may not be the best solution.

Sensor and Smart Systems solutions have evolved in recent years, reducing the gap between the semiconductor chips and the final user/application. This strength allows a more tailored approach to AI system design that can mitigate the impact of component prices. This can provide adapted solutions for a wider range of devices and applications without a significant increase in cost when compared to “non-AI” solutions.

[Figure 2 diagram: where to place intelligence? Factors driving intelligence towards the edge, from the data centre/cloud via the network edge and gateway to the application edge and devices, with latency requirements for description, classification, regression and pre-processing determining the placement.]

2 Applications for “AI at the Edge”
In this chapter, a number of applications involving the adoption of edge AI solutions are illustrated and analysed, with the goal of highlighting the considerable breadth of scenarios where this technology can play an important role.

2.1 Automotive and multi-modal mobility
2.1.1 VEHICLE INTELLIGENCE AND V2X COMMUNICATION
Vehicle intelligence: While some advanced driver-assistance systems (ADAS), such as a lane keeping assistant or cruise
control,are already commercially available.For several additional vehicle automation functions suffi-ciently efficient and reliable performance must still be developed and implemented before human drivers can be replaced by AI(in all operating domains).Transferring driving tasks successively from human to AI drivers and meeting all requirements with respect to sensing(scene understanding),decision-making and acting,pres-ents a complex technological challenge with respect to both AI hardware and software models.Today it is clear that besides AI,the connectivity vehicle-to-vehicle(V2V)and between vehicles and infrastructure(V2I)will be key to deploying automated vehicles,since it provides the basis for the coordination of vehicles.AI potential at the edge:Advances toward automated and ultimately autonomous mobility depend on prog-ress in sensor and actuator technology,but most importantly on progress in AI technology.Each vehicle rep-resents an edge node within the mobility system,connected to the cloud for services such as traffic or fleet management or mapping.Transferring AI tasks to the edge offers multiple benefits including improved system performance due to reduced communication and thereby processing latency,enhanced privacy or new func-tions such as driver authentication.The combination of vehicle intelligence and intelligent infrastructure us-ing,for example,Multi-access Edge Computing(MEC)can provide further significant safety improvements10.Challenges:The optimal distribution of intelligence between the edge nodes(cars),the fog computing layer(e.g.traffic lights at an intersection)and the cloud(e.g.traffic management centres)presents a key,and strong-ly debated,topic in the field of vehicle automation.The answer is likely to differ for different operational do-mains,as automated shuttles on dedicated lanes require far less coordination from a central intelligence than an automated vehicle moving through dense mixed traffic(including non-automated,partially automated and fully automated vehicles).2.1.2 OCCUPANT ACTIVITY“UNDERSTANDING”Automated driving:Automated driving is one of the four main automotive trends11,driven by technical de-velopments,market expectations and continual legislative tightening.The technical solutions are focusing on the human-centred component,which covers two challenges:the Human-Machine Interface(HMI)and human perception of automated driving.Advanced HMI is the essential interface for seamless operation between the(semi)automatic system and humans.The EU-funded HADRIAN project12 is developing a holistic driving solu-tion,focusing on the utility of dynamically adjusting(fluid)human-machine interfaces taking environmental and driver conditions into account.On the other hand,human perception of driving style and safety is crucial for the acceptance of new technology through the increase of trust.The EU-funded TEACHING project13 explores AI techniques at the Edge to realise the human-centred vision leveraging the physiological,emotional and cognitive state of vehicle occupants for the adaptation and optimisation of the autonomous driving applications.AI potential at the edge:Managing transitions between different levels of autonomy is fundamental.The AI-based observer is a key point of this system as it detects the behaviour and the mental state of the driver.Edge AI offers the local calculation of the driver states,thus allowing for control of the response time thus EPoSS WHITE PAPER AI at the Edge10preventing personal data from leaving the vehicle and 
continuous learning to adapt the AI-based observer to each driver and passenger.Challenges:Understanding vehicle occupants physiological state and their ability to take over control of the(semi)autonomous vehicle are crucial for driving safety.Those challenges are further complemented by the need for the most effective interfacing to the driver.Those calculations must be performed at the local level to avoid basic connectivity risks.The inherent conflict between safety and AI is an open challenge,which is also complemented by the need for continuous learning.2.1.3 AIR PATH CONTROL AND DIAGNOSTICSEmission control and overall efficiency of the engine:The air path of an internal combustion engine is a cru-cial component for emission control and overall efficiency of the engine.The goal of air path diagnostics is to detect faults or poor performance and to identify the root cause.AI potential at the edge:AI helps in efficiently executing the control strategy and in diagnostics of the air-path.For example,the heavyweight processing(e.g.physics-based simulations)used in executing the control strategy can be substituted with ML workloads.Compared to the original simulation model,the execution of the trained model implementation is less demanding.The deployed model can thus be executed on the edge.As sensors are mounted on the vehicle,this requires a split of intelligence between the backend(e.g.crowd-sourcing of vehicle-data to obtain the diagnostic model)and the edge(e.g.the various privacy related aspects).Challenges:Challenging requirements such as time-predictability,dependability,energy-efficiency,and secu-rity need to be fulfilled.In this respect,the aim of ECSEL Joint Undertaking FRACTAL14 is to create a reliable computing platform node,implementing a so-called Cognitive Edge with industry standards.This comput-ing platform node will be the building block of scalable decentralized Internet of Things(ranging from Smart Low-Energy Computing Systems to High-Performance Computing Edge Nodes).2.1.4 BATTERY LIFECYCLE MANAGEMENTPredictive maintenance for battery aging:The in-use phase of a vehicle(road profile,climatic condition,driv-ing,parking,charging)has a significant impact on battery aging.Batteries pose the risk of exploding(“thermal runaway”)in normal use and the existing methods such as strain-,acoustic and/or temperature sensors to detect thermal runaways.AI potential at the edge:To better understand the aging behaviour of batteries,data-driven models based on aging experiments enable lifetime simulation and prediction.Predictive algorithms on the edge can crowd source data from vehicles and/or the lab.This provides critical correlations with battery safety and offers the potential of increasing the warning period.Within the ECSEL JU Integrated Development 4.015 a digital twin that allows the prediction of state of charge(SoC),state of health(SoH)and/or remaining lifetime is developed.Challenges:There are a large variety of modelling approaches ranging from models using first principles(e.g.electro-chemical models)to purely data-driven models(needing to collect aging-related data in the lab and while operating the vehicle).These can be applied to the cell,module as well as package-level.Hybrid models have to be developed that aim to combine the models from first principles and data-driven models.2.1.5 THERMAL MANAGEMENT OF THE POWERTRAINEnergy control in the vehicle:The perception of the environment is carried out via vehicle and powertrain sen-sors coupled with the weather data and the 
traffic information that can be retrieved from dedicated service providers.The computing workload is split between processing in the backend(e.g.crowd-sourced data)and dedicated control units(energy control units)in the vehicle.1 POTENTIAL OF AI AT THE EDGE FOR SMART SYSTEMS11AI potential at the edge:AI has a tremendous potential in optimizing the energy efficiency based on the per-ceived environmental conditions.For example,by considering weather and traffic conditions to accelerate or delay the cooling process,AI-based strategies can augment classical model-based thermal control strategies.In line with such efforts,the Horizon Europe draft work programme 2021/2022,Cluster 5,mentions safe,seam-less,smart,inclusive,resilient,climate neutral and sustainable mobility systems in terms of their expected impacts.Challenges:The perception of the environment of the vehicle is key to optimize the vehicles energy efficiency.Including the powertrain system(internal combustion engine,electric motor,fuel cell),energy storage system(hydrogen,electric),the passenger or cargo air conditioning system,and the traffic information(V2V,V2X).The collection of data offers the potential to characterize the context under which to perform the optimiza-tion.However,continuously collecting such data requires highly reliable connectivity(V2V,V2X)and an agree-ment on common mobility data sharing space16.In addition to standardization of data interoperability this includes data-lifecycle management that is designed around B2B,B2C and B2G data sharing.2.1.6 AUTONOMOUS(INLAND)SHIPS BY PROJECT A-SWARMAutonomous inland ships:could play an important role for the transportation of goods in big cities in the fu-ture.The German-funded A-Swarm17 project explores this topic.It is planned in the project to fit a barge with near-field and far-field sensors as well as edge platforms with AI accelerators.AI potential at the edge:The accelerators will be used to run AI models that locally process the data from sensors and control the engines of the barge.Allowing for the implementation of a system with the necessary real time capabilities required to traverse the waterways in cities.Challenges:To realize this system many different problems need to be solved.One of the biggest is to filter out,in real time,the noise and movement generated by the water from the sensor data.Furthermore,it needs to be explored how different intelligent edge systems can efficiently communicate with each other to enable for example an automated unloading of the barge.2.1.7 MASSIVE SENSOR TECHNOLOGY AND NETWORKSMassive sensor technology and networks:The new concepts for autonomous mobility,digital industry,and decentralized bidirectional and multi-modal energy supply,as well as smart city and smart home applications,require massively more sensor technology and electronics with significantly higher performance in each indi-vidual product than today.At the same time,reliability and safety requirements are increasing dramatically,as the operation of automated and autonomous systems are no longer overseen by human operators.Instead,the lives of passengers,the economics of production,and the stability of utilities depend entirely on the func-tionality of their electronics.The current way to ensure the highest standards of safety and availability relies mainly on redundancy at all levels of integration(including full system redundancy).This is very expensive,resource heavy and sub optimal.The failures can occur without warning and in both the primary and the 
re-dundant unit.Therefore,the ultimate fallback solutions have to be used quite often(e.g.emergency stop).They are safe but usually mean the sudden end of operation.This approach would lead to an unreasonably low availability of ultra-complex systems like autonomous cars.New strategies with smart and pro-active safety assurance need to be developed that are based on continuous self-monitoring,remaining life estimation,and active failure prevention in the electronic systems.EPoSS WHITE PAPER AI at the Edge12AI potential at the edge:An intelligent approach to functional safety will achieve a higher level of confidence and trustworthiness with less redundancy than today.It will require the inclusion of artificial intelligence al-gorithms as essential elements.Trained by data from comprehensive physics of failure(PoF)studies and big data-driven(DD)analyses,compact AI routines can be developed and implemented directly into the individual products to deliver maximum availability.Challenges:Despite the limited computational resources at this edge position,the AI routines should cover the current application scenario very well.This can be achieved by developing dedicated meta-models and by resource-optimized programming.In addition,a(non-permanent)connection to the cloud server allows dy-namic updates to best adapt to changing scenarios(e.g.from summer to winter conditions)and to continuous-ly improve the meta-models(e.g.by learning from the entire fleet).A number of projects have already started to explore this approach to AI-based smart safety solution for electronic systems,e.g.ECSEL iRel4.018 PoF and DD analyses,ITEA3 COMPAS19 compact models for AI,H2020-GV EVC100020 early warning indicators,lifetime estimation.However,the main part of the research work in this area is still ahead.2.2 Energy2.2.1 SMART GRID Distributed energy sources:The development of Smart Grids over the past two decades was a necessary re-sponse to the fundamental shift from a unidirectional supply of electricity(from power plants to consumers)toward an increasingly decentralized,bidirectional and complex network.The widespread use of renewable energy sources has resulted in a corresponding growth in the number of network nodes.Advances in ICT have enabled smart home applications which,alongside the introduction of electric vehicles,constitute new agents and further increase the complexity at individual network nodes.At the same time,the introduction of smart meters at network endpoints and ubiquitous sensors throughout the grid,have added a digital layer compris-ing a myriad of sensors and providing large amounts of data.This data availability and the increasing diversi-fication and distribution of energy sources and applications call for an equivalent distribution of intelligence throughout the grid,to maximise network efficiency,optimize grid management and enable new(end-user)applications,including data privacy.AI potential at the edge:Large amounts of data concerning energy demand and supply accumulate at individ-ual network nodes and must be processed efficiently at the Edge to exploit their full value.Machine learning applications for Smart Grids include classification and clustering models for big data processing,are used pri-marily by utility suppliers and cloud service providers to group consumers according to their usage patterns and apply predictive models for future demand21.Prediction models can be used for the supply of renewable energy when weather forecasts are included.While many of these models can be 
applied for management, decision-making and control processes at the (micro)grid level and can be run in cloud data centres or fog gateways, the potential optimisations of some applications can only be fully unlocked using edge AI. Cognitive applications of edge computing in Smart Grids include intelligent agents used both for energy market issues (management, pricing and scheduling) and for network management (security, reliability, fault handling and efficiency). Possible use cases include:
- combination of AI and blockchain technology for the integration of electric vehicles into power management platforms for Smart Grids [22]
- dynamic pricing to balance demand and supply [23]
- pre-processing strategies of hierarchical decision-making to optimise resource usage based on service level requirements
- data-driven methods to analyse equipment and end-user behaviour in the distribution network, for example to provide energy as a service (EaaS) [24]
- fault detection and diagnosis in the transmission grid (e.g. video surveillance and scene interpretation using drones)
- real-time monitoring [25]

Challenges: A central challenge for the application of edge computing and AI in Smart Grids remains the design and implementation of efficient system architectures that meet the real-time and safety requirements on AI at the device, network and application level, and that distribute tasks as well as the required intelligence between cloud, fog and edge.

2.2.2 SMART BUILDINGS
Buildings as networked cyber-physical energy systems: Buildings are a major contributor to overall energy consumption. Since passive means (e.g. thermal insulation) are nearly fully exploited, smart buildings are envisaged to be the future enabler for further improvements in energy efficiency [26]. The objectives of building energy control systems are multi-dimensional and complex: a minimum of energy should be used (preferably generated on-site from renewable sources), while a prescribed level of comfort and a healthy indoor climate must be provided. Since the components of building energy systems are integrating more and more sensors and embedded systems, buildings, especially larger objects like airports, shopping malls or office buildings, are becoming networked cyber-physical energy systems.

AI potential at the edge: A high number of multivariate sensors are required to exploit the full potential of model predictive control schemes, in addition to standard parameters such as temperature, humidity, CO2 and room occupation, which are the usual inputs to the control system. While the main control system is usually implemented as a centralized controller, there are relevant applications for data analytics and AI on edge devices and smart sensors. For example, the number of persons present inside a room is a relevant input parameter for building controls; image sensors allow for a precise counting of people, but to ensure the required level of privacy, raw image data should not be spread across open data networks. Implementing AI algorithms directly on the device can help to analyse the image and extract only the information relevant for the control system. Sensors in energy system components, like fans or air filters, are an enabler for predictive maintenance schemes, allowing higher efficiency and reduced maintenance costs. Using wireless technology, easy installation or retrofitting would be possible, especially at places that are hard to reach with a tethered data network. However, wireless data transmission from basements can be difficult due to the metal structures in heating, ventilation and air conditioning systems. Data can be reliably transmitted, with a reduced bandwidth, by using data analytics at the sensor to provide only the relevant information on the current status of the component instead of time series data from pressure sensors etc.
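The "status instead of time series" idea described above can be illustrated with a minimal sketch of on-sensor data reduction. The feature set, the warning threshold and the message format below are illustrative assumptions, not specifications from the white paper.

```python
# Minimal sketch of on-sensor data reduction: instead of streaming the raw time
# series, the node transmits only a compact status summary (illustrative values).
import json
import statistics

def summarise(samples, warn_rms=2.5):
    """Reduce a window of raw samples to a few features plus a coarse status flag."""
    mean = statistics.fmean(samples)
    rms = statistics.fmean(s * s for s in samples) ** 0.5
    peak = max(abs(s) for s in samples)
    status = "warning" if rms > warn_rms else "ok"
    return {"mean": round(mean, 3), "rms": round(rms, 3),
            "peak": round(peak, 3), "status": status}

# A window of raw samples, e.g. differential pressure across an air filter:
window = [0.9, 1.1, 1.0, 1.2, 0.95, 1.05, 1.1, 1.0]
payload = json.dumps(summarise(window))
print(payload)  # a few dozen bytes instead of the full trace
```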
Challenges: Only with a large number of sensors can the merits of energy savings be fully exploited, i.e. the economic benefit per sensor is rather low. In turn, a smart sensor for a building energy system must be a rugged and low-cost system. Furthermore, in many use cases wireless connectivity is strongly demanded. Optimized power consumption ensures long maintenance intervals, imposing challenges on the energy efficiency of the on-board data acquisition and processing.

2.3 Digital Industry
2.3.1 PREDICTIVE MAINTENANCE
Industry 4.0 and predictive maintenance: Industrial applications of IoT are predicted to generate a significant economic benefit. Predictive maintenance is a popular example, enabled mainly by the analysis of huge amounts of data generated by sensors integrated into industrial assets. It is implemented by classifying the acquired data with respect to the status of critical components and by using prediction models to forecast the remaining lifetime and, in turn, optimize maintenance schedules.

AI potential at the edge: Implementing AI on edge devices near the sensors offers several benefits. It reduces the transmitted data volume, which is particularly important for sensors generating large data streams such as vibration time series. Data from heterogeneous sensors can be fused on the device, which also enables cross-validation of sensor data and improves the resilience of the system. Local data analysis can reduce the latency of the AI compared to a cloud-based solution; this can be an important advantage when detecting critical faults.

Challenges: In order to gain economic benefits from the sensor signal analysis, the accuracy of the algorithms has to be very high. False alarms or undetected failures can cause severe financial losses or even damage to equipment. Another important aspect is the availability of training and validation data. Only for mass production lines can the necessary amount of representative data be collected in a reasonable time. In cases of more individualized production, algorithms have to cope with small training sets, or the application of synthetic data from simulation models can be considered.

2.3.2 RELIABLE PREVENTION OF EARLY FAILURES
Reliable prevention of early product failures: Product reliability has a typical characteristic: a relatively high failure rate occurs during the first operating period. These early failures are caused by small variations in material, shape or process properties during fabrication. None of these stochastic deviations exceeds its specified limits, so current process control algorithms cannot detect the reliability risk that arises from unfortunate combinations of these variations.

AI potential at the edge: Expanding the scope of process control, by including a larger number of process steps in advanced data analysis using artificial intelligence schemes, can detect a significant portion of these risky combinations of inherently permissible variations. The ECSEL project iRel4.0 [27] explores this approach with the example of microelectronic production. While the core part of the AI-based data analysis can be performed by the large computer clusters that provide general process control at the manufacturing site, additional edge
capabilities are required to enable corrective countermeasures to be taken in real-time at all relevant process tools to provide the important data in a pre-aggregated form.Challenges:The computational edge capabilities are thus an essential part of the overall AI system.The flexibili-ty,latency,and security requirements of advanced process control cannot be met without them.2.3.3 ROBOT CO-WORKINGCollaborative robots in industrial environments support human workforce in the fulfilment of repetitive jobs or heavy lifting,for instance.Applications can be found mainly in the manufacturing industry,e.g.assembly of automotive parts.AI potential on the edge:Edge AI enables new possibilities for the cooperation of humans and robots,be-cause in contrast to cloud based systems edge AI is fast enough to handle situations where the robot could inflict harm.To implement these new possibilities sensors need to be deployed that are able to monitor the 1 POTENTIAL OF AI AT THE EDGE FOR SMART SYSTEMS15environment and the movement of humans and animals within range.The data from these sensors are locally processed by AI models in the robot or running on nearby edge nodes.Afterwards,edge AI-based components use the processed data to control the robot allowing for close cooperation with humans in performing com-plex tasks like the manufacturing of custom products in workshops or rescue operations.To implement this vision of close cooperation many challenges need to be solved such as training the robots for new tasks.Challenges:In addition to more traditional robotic applications,the safety of the human worker has to be con-sidered,since the robot and the human share a common working space.Operational strategies ensuring safety of the worker require advanced sensing capabilities of the robot28.In addition,the sensor data,e.g.from an image sensor,has to be processed with low latency in order to enable a quick reaction of the robot in a critical situation.Thus,transferring cognitive and analytic capabilities to the edge,i.e.a single robot,is advantageous.Potential strategies include distribution of AI methods in a network of robotic devices29.Finally,reliability and functional safety requirements of the robotic system with integrated AI capabilities have to be met during the design process.2.4 Health and wellbeing 2.4.1 VITAL SENSING WITH RADARVital sign monitoring based on radar sensors:will be an important component in many medical applications.However,a cloud-based implementation of the sensing would be too slow in time critical contexts.This is not the only problem of cloud systems as storing generated data in them is also a privacy concern.AI potential at the edge:Issues of latency and privacy can be solved by using edge AI.When the radar data is processed locally,the information about heart rate,respiration and so on are available fast enough to trigger other parts of the system that can save the life of the human.Furthermore,the results can then be deleted or anonymized before they are sent to the cloud.Challenges:The accuracy of current edge AI implementations of such products is too low to avoid high false alarms rates.Hence,the accuracy of algorithms needs to be improved to enable better adoption of life saving applications.2.4.2 PERSONALISED MEDICINEPersonalised Medicine:Human physiology can vary greatly from individual to individual.Examples for that in-clude blood pressure or lung capacity.However,these differences need to be considered for accurate medical applications like vital sign 
monitoring.Due to privacy concerns,it is difficult to process this information in the cloud-based solutions.AI potential at the edge:Edge AI offers the possibility of maintaining privacy when processing medical data.Furthermore,many medical applications require real time processing,which can be better realized with local AI.By exploiting these two aspects,many medical and consumer applications can be implemented which were not possible in the past.For example,different organisations work on integrating sensors and AI into clothes allowing for feedback loop based training of athletes.Challenges:Processing data at the Edge does not make it totally safe against malicious access.Hence,the security measures of edge AI processing pipelines need to be further improved to ensure that medical data or applications are not misused.EPoSS WHITE PAPER AI at the Edge162.4.3 AFFECTIVE COMPUTING(“AI OF EMOTIONS”)Detecting and measuring human emotions:Affective computing is interested in automatically detecting and recognizing the emotional state of a human either with remote or“nearable”sensors(visible and IR imagery,audio,physiology),or with sensors in contact(wearables)for physiology,or activity monitoring.Emotions,a classic conceptual representation of which follows a 2D valence(negative/positive)versus intensity(calm/excited)pattern,have an essential role in human behaviour.These influence the mechanisms of perception,attention,decision making,and social behaviour.The purpose of estimating emotional states is to improve un-derstanding of human behaviour.This is the strongest reason as emotional states are both very personal and evolving,very different from one individual to another,and from one situation to another.AI potential at the edge:The edge AI allows for maintaining the confidentiality of the data inside the mea-surement device,to guarantee the autonomy of the devices,and to aim for an individual estimator learning over time.The objective of the studies conducted at the CEA LETI is to develop an autonomous and ambula-tory stress observer based on physiological signals,aimed at self-assessment and coaching for well-being(see M.O.T.I.O.N project)30.Challenges:Privacy and personalisation.On the road to individual guidance whether medical or for other purpose(wellbeing,sports or emotion management)local processing of data answers potential issue of con-fidentiality and data protection.In addition,the use of AI allows identification and adaptation to individual re-sponse pattern to the targeted monitoring(activity,treatment).Once anonymised,this individual response(learned and characterised thanks to the AI)can feed wider models so that it can be shared and benefits to other users/patients and helps them in managing their own activities.2.4.4 SPORT ANALYTICSPrevalence of lower-limb injuries:Lower-limb injuries are common among athletes,accounting for 77%of hospitalized sport-related injuries,and are a risk factor for early-onset osteoarthritis.High-impact forces are one of the factors contributing to lower-limb injuries.To decrease the prevalence of lower-limb injuries,and their associated long-term disability and economic burden,multiple injury prevention programs have been proposed.These take into account the study of ground reaction forces(GRFs)in order to enhance athletes performance,determine injury-related factors,and evaluate rehabilitation programs outcomes.AI potential at the edge:Together with industry partners,the Tyndall National Institute have developed a miniaturised monitoring 
system,integrating ultra-accurate accelerometers and neural networks,to estimate the impact GRF forces while running.Besides being a unique solution for multiple injury prevention,the devel-oped solution can be used by elite athletes,sports teams,coaches,scientists,and consumers who would use novel performance monitoring systems to keep pushing the boundaries of their sports and gain performance advantages.Challenges:Major challenges in the system implementation are related to the development of a neural network that is sufficiently accurate to model GRFs while it is also simple enough to be deployed on a re-source-constrained microcontroller with limited energy consumption.Moreover,an open challenge is related to the deployment of personalized athlete-specific models rather than general-purpose neural networks;this could be achieved by either training a whole network from scratch directly on the wearable unit by relying only on the data collected from the individual athlete,or by adopting a transfer learning approach where a number of layers in the general-purpose network are trained based on the data from all the available subjects and are frozen and deployed on the microcontroller and the data collected from the individual athlete are used to train only the last layers of the deployed neural network.1 POTENTIAL OF AI AT THE EDGE FOR SMART SYSTEMS172.4.5 PHYSIOLOGICAL MONITORINGHealth markers:Elevated blood pressure is a major health concern and a risk factor for complicated cardio-vascular morbidities including coronary heart disease,ischemic,and haemorrhagic stroke.WHO reported an estimated 7.5 million deaths due to elevated blood pressure.The accurate measurement of blood pressure is important to timely detect health threats.Therefore,to get a continuous,accurate,and reliable insight into a persons cardiovascular health condition requires a practical approach.Sepsis is also another good example,one of the leading causes of death worldwide,with incidence and mortal-ity rates failing to decrease substantially over the last few decades.AI potential at the edge:Edge computing is experiencing a massive growth in healthcare applications as it helps to maintain the privacy of patients(e.g.data is locally processed,without engaging cloud services in the overall process),and allows a fast and real-time decision support system.One of the objectives of the HOLISTICS project led by Tyndall National Institute31,in cooperation with its in-dustry partners,is the adoption of edge analytics into wearable devices for health-related use case scenarios(e.g.blood pressure monitoring).Cuffless blood pressure monitoring devices adopting AI solutions based on the analysis of PPG or PTT signals have shown promising results in recent years.As an example to illustrate the value of edge-based AI models in the management of vital signs,the model can raise timely alerts pro-actively prompting clinicians without needing time-consuming and costly laboratory tests.AI solutions have,therefore,the potential to be used on wearable devices to predict the prognosis(e.g.blood pressure),and/or detect the pathogens causing an infectious process(i.e.sepsis).Challenges:A typical challenge of health-related datasets is the presence of a high imbalance in the data.The development of the outcomes for patients with sepsis and recommend the treatment process(e.g.the medi-cations to be used during sepsis),of techniques and approaches able to tackle this problem at a technical level(i.e.data augmentation,resampling techniques)and 
policy level(e.g.data collection process,data sharing pol-icy,new standards)is diffusing steadily.Moreover,the possibility to provide tailored medical treatment(e.g.personalized medicine)is attracting increased attention over the recent years;however,its implementation and deployment into edge devices in real-world scenarios is still in its infancy.2.5 Agriculture,Farming and Natural Resources2.5.1 AUTOMATED WEEDINGChemical weeding to reduce the competition between weeds and crops:Vegetable production imposes a wide variety of farming operations because of the diversity of crops and the related planting parameters such as the seedbed structure,the seeding density,the spacing between rows and the distance between plants in each row.In addition to these agricultural operations,vegetables require early weeding(7 to 15 days after sowing or planting)due to the strong competition between weeds and crop and the increasing difficulty of removing weeds without damaging the crop.Once the crops cover all the row,weeds are stifled as soon as they appear,and weeding becomes less critical.Chemical weeding is the classical solution to reduce the competition between weeds and crops.However,the growing consumers demand for product quality and for the absence of phytosan-itary residues,is having an increasing impact on agricultural practices.Mechanical weeding(hoeing)is,therefore,increasingly necessary.Nevertheless,it remains difficult to implement weeding within the rows,because de-stroying the weeds inside a row while preserving the plants is very delicate,especially when the sowing is dense.To date the only mechanized or automated solutions concern inter-row weeding(weeding between two rows).EPoSS WHITE PAPER AI at the Edge18AI potential at the edge:Commercial AI-based intra-row weeding solutions exist only for crops with significant inter-plant distances(lettuce or cabbage for instance).No automatic hoeing solution exist for carrot,peas,beans,sweet corn,onions,etc.To realize intra-weeding for these crops some AI-capabilities in the weeding machines are required to adapt to changing environment in real time.Challenges:A stable connection to the cloud cannot be guaranteed on fields all the time.Furthermore,the auto-mated weeding machines should be as power efficient as possible.Both of these requirements could be solved by neuromorphic AI algorithms,due to their lower energy demand compared to standard neural networks at the Edge.Such algorithms and corresponding hardware are explored in European funding project Andante32 but these topics will require much more work than which can be achieved within one project.2.5.2 DRONES FOR PRECISION AGRICULTUREPrecision agriculture is one of the scenarios where Unmanned Aerial Vehicles(UAVs)or drones are currently being used and demonstrated.They are equipped with cameras and sensors which allow taking close images of the crops,field operations and of the machines.AI potential at the edge:This information can be used for tasks such as obtaining Normalised Difference Veg-etation Index(NDVI)maps from multispectral cameras which can support decision making about spraying or perform additional operations in the crops,for example to recognise areas that may be affected by pests and to apply phytosanitary or pesticide treatments.They can even act as a network gateway to collect information from IoT sensors using low-cost and wide area network protocols like LoRaWAN(Long Range Wide Area Network).Challenges:The deployment and the usage of drones and UAVs in the agriculture 
domain still presents challeng-es that must be solved,e.g.be intelligent enough to fly autonomously without requiring major interventions from specialised human operators be capable of dynamically readjusting the missions based on context information coming from onboard sensors and other sources of data deployed in the crops collaborate with other drones or ground robots to perform more complex tasks in complex and larger terrains guarantee compliance with security regulations and incorporate trustworthy requirements and guidelines.To address most of the previous points,artificial intelligence processes will be embedded directly on drones and robots in order to increase their autonomy and real-time capabilities.2.5.3 SOIL CONTROLEfficient food production is important to ensure the food supply of mankind.Hence,more and more sensors are deployed around,and in fields to gather data about their state and planted crops.For soil monitoring Biode-gradable sensors are being researched.The idea is to mix them into the fertilizer,which is then put on the field.Afterwards,they send their data for between six months and one year to a node near the field and this node transfers the data to the cloud.AI potential at the edge:The amount of data generated by the field monitoring sensors is very high.However,not all of the data is relevant and can be averaged over multiple sensors e.g.the average soil moisture level of a field.Edge AI can be trained to analyse these large data volumes resulting in lower amounts of data needed to be sent to the cloud as well as lower network load.1 POTENTIAL OF AI AT THE EDGE FOR SMART SYSTEMS19Challenges:The critical point about bio-degradable sensors is that they cease to work after a specific amount of time.In addition,other sensors deployed around the field may have a lower average lifetime than sensors in other contexts.This requires the Edge AI solutions for this application to be able to handle fluctuating amounts of incoming data.Such high levels of flexibility are not well explored yet.2.6 Smart Cities2.6.1 SMART STREETLIGHTSLighting systems adjusting the brightness to the individual conditions of the surroundings:Smart street-lights could provide important services for smarter and greener cities in the future.AI potential at the edge:Using different kinds of sensors and edge AI,the streetlights can detect whether,and at what speed,a pedestrian or motorist is approaching.As long as the person is within the radius of the light,this area is illuminated by built-in LED lamps.If the person moves away,the lighting is reduced.In adverse weather conditions,such as snow or rain,the light output could be increased automatically as required.The edge AI eval-uating the sensor data can run on microcontrollers in the lamp or on other nodes in the proximity.This dynamic light regulation saves energy and costs.Smart streetlights could also be used for implementing other important services like the charging of electric vehicles and the measurement of the air quality.Challenges:A central challenge of this application is managing the access to the results of the Edge AI.The processed sensor data can be of interest to different parties,for example for the police in case of accidents or insurance services that insure shops near the smart street lights.One approach to solve this challenge would be to combine block chain technologies with Edge AI.However,this is a research field which is still in its initial phase.2.6.2 AIR QUALITYImproving air quality using gas sensors:Gas sensors 
currently available on the market are often quite unstable, inaccurate and show large cross-sensitivities to other interfering gases. Moreover, they are often very large (not in a portable form factor) and quite costly.

AI potential at the edge: Neural networks at the Edge are crucial to gas sensing, especially when it comes to accurately identifying different gases in an outdoor environment. While the sensor technology itself (materials, geometry, temperature modulation, number of sensing fields, etc.) can surely help to improve sensitivity to target gases, algorithms play a very important role when it comes not only to classifying gases but also to quantifying them in parts per billion (ppb). Since gas sensors in most use cases have only a limited connection to the internet, these algorithms need to be deployed on the sensor node.

Challenges: Recent results already show that traditional neural networks can strike the right balance between accuracy and robustness for air quality monitoring; still, many open questions remain on the behaviour of air quality monitoring sensors deployed in the field over a long time. Here it is even more crucial to ensure long battery life, wider online learning at the Edge for specific use cases and more self-diagnostics on the performance of the sensor.

2.6.3 ALARM SYSTEMS
Increased safety with intelligent alarm systems: Edge AI-based alarm systems are a good example of how edge computing solutions enrich existing smart building systems.

AI potential at the edge: While previous alarm systems use a microphone and simple threshold rules to detect glass breakage when an unlawful entry is made into an apartment, the new generation of alarm systems processes information from multiple data sources via data fusion and neural networks. This minimizes the number of false alarms and significantly increases the reliability of the system. The solution can easily be integrated into existing glass breakage alarm systems.

Challenges: As these new generations of alarm systems become more widespread, they will become targets for attacks. Currently, little work has been done on securing neural networks, even though they are responsible for the whole system.

3 State-of-the-art of “AI at the Edge”
3.1 Edge AI for smarter systems
Edge AI resides at the location where the virtual world of the network hits the real world, where sensors and actuators are the link.

Figure 3: At the system level, the place to run AI algorithms depends on multiple factors and is often a balancing act between the time and energy cost of local compute vs remote compute. Algorithms can be distributed at multiple levels as well. The balance point will shift over time, following advances in wireless technologies and neural processing.
[Figure 3 diagram: the compute element (sensors, signal conditioning, data acquisition, MCU/DSP, GPU, NPU, memory, security, actuators, communication) sits between sensor-local AI at the extreme edge/endpoint (high-speed and rich sensors, real-time control, privacy; compute latency, power and energy) and remote AI in the edge/fog and cloud (complex algorithms, data storage, explainability; communication latency, bandwidth, power and energy, and remote compute latency/throughput).]

Edge computing can mainly be segmented into the following areas: hardware (HW), software, IoT platforms/communication, and services. These areas are discussed separately below.
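The balancing act named in the Figure 3 caption can be made concrete with a back-of-the-envelope comparison between local execution and offloading. The sketch below is a deliberately simplified illustration; all parameter values are assumptions chosen for the example, not figures from the white paper, and a real decision would also weigh privacy, reliability and connectivity.

```python
# Back-of-the-envelope placement decision: run the AI locally (edge) or offload (cloud).
# All parameter values are illustrative assumptions.

def local_cost(ops, edge_tops, edge_power_w):
    """Latency (s) and device energy (J) of running the workload on the edge device."""
    latency = ops / (edge_tops * 1e12)
    return latency, latency * edge_power_w

def remote_cost(payload_bits, uplink_bps, radio_power_w, cloud_latency_s):
    """Latency (s) and device-side energy (J) of shipping the raw data to a remote node."""
    tx_time = payload_bits / uplink_bps
    return tx_time + cloud_latency_s, tx_time * radio_power_w

ops = 5e9  # assumed ~5 GOP per inference
edge_lat, edge_energy = local_cost(ops, edge_tops=0.5, edge_power_w=2.0)
net_lat, net_energy = remote_cost(payload_bits=8e6, uplink_bps=20e6,
                                  radio_power_w=1.5, cloud_latency_s=0.05)

print(f"local  : {edge_lat*1e3:6.1f} ms, {edge_energy*1e3:6.1f} mJ per inference")
print(f"offload: {net_lat*1e3:6.1f} ms, {net_energy*1e3:6.1f} mJ per inference")
print("run locally" if edge_lat <= net_lat and edge_energy <= net_energy else "offload")
```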
storageExplainabilityCommunicatelatency,power,energyRemote compute latencySensorsMCU/DSPGPUMemoryNPUSecurityActuatorsSignal conditioningData AcquisitionCommunicationLatency,bandwidthLatency,bandwidthCompute latency,throughputEDGE/FOGCompute and communicatelatency,throughput,energy,powerEXTREME EDGE/ENDPOINTCLOUDCompute latency,throughputEPoSS WHITE PAPER AI at the Edge223.2 Hardware for edge AI Choosing the ideal hardware for a particular application requires careful consideration of all the requirements.A successful system design finds a balance between the different aspects of system architecture,such as mem-ory footprint,executing time,model accuracy,power consumption,scalability,cost,and maintainability.While data-centres allow engineers to scale available computational power to the current demand(via GPUs or TPUs),an application running on edge devices needs to keep sufficient power reserves.An increasing number of ven-dors are now moving from producing simple resource-scarce microcontrollers(e.g.ARM Cortex MCU)to pairing general-purpose processors with specialized units tailored to execute the computational tasks required to imple-ment AI solution.As embedded systems are typically focused on using AI in the form of machine learning(ML)for interpreting incoming sensor data,these specialized sub-processors aim to speed up a classification or prediction tasks while maintaining a low power draw.This is especially important in applications running on battery power or with a low potential for cooling the system.Figure 4:Depending on the particular application requirements,different types of hardware are available and have to be chosen for Edge AI realisation.A common approach is the use of HW accelerators which can be either directly co-located with the general-pur-pose processor on the same silicon or might be connected as a separate chip.These accelerators stretch the pow-er continuum from relatively simple digital signal processors(DSPs)to highly parallel matrix computation units and similar advanced designs.These accelerators can either be monolithic designs such as special Edge variants of Googles TPU or a distributed set of smaller compute cores.Some examples for the latter are the Tensor Cores in newer Nvidia GPU architectures,the Hexagon cores in Qualcomm SnapDragon SoCs and Intel Movidius SHAVE processors.They commonly need specialized drivers and software libraries that allow software developers to take advantage of their capabilities.Currently,many accelerators rely on reduced precision computation,replacing costly floating-point mathematics with lightweight integer operations of 8bit precision,or even lower.Co-locating memory close to computation Application specific solutions for AI at the edge General-purpose processors with AI unit for ML classificationResource-scarce microcontrollers accelerator Field-programmable gate arrays for specific AI/ML architecturesGraphics processing unit acceleratorFLEXIBILITYCOSTSlowlowhighhighFigure 4:Depending on the particular application requirements,different types of hardware are available and have to be chosen for Edge AI realisation.GPUASICGPP AI UnitFPGA with built in AI/MLMCU 1 POTENTIAL OF AI AT THE EDGE FOR SMART SYSTEMS23cores and pre-loading repeatedly used variables cuts down on time spent shuffling data around.Nowadays some accelerator hardware architectures are closely tailored to the operation of specific ML algorithms,allowing for very efficient computation.This specialization however comes at the cost of flexibility in a 
rapidly evolving field. A solution is offered by field-programmable gate arrays (FPGAs), which are becoming heterogeneous platforms that combine powerful CPU systems, specialized arrays of AI accelerator cores and traditional fabric, where the hardware can be programmed using a hardware description language (HDL) or in C/C++ via high-level synthesis tools. FPGAs combine the benefits of specialized hardware with the freedom to change the layout even after the chip has left the factory. The use of FPGAs can create more dynamic, scalable, and flexible systems, even though they often carry a higher cost.

Another type of processor is emerging as a new class of accelerator for these predominantly data-centric, heterogeneous processing tasks offloaded from the main CPU. These processors are called manycores and are referred to as DPUs (Data Processing Units) in the industry. For example, in the case of a car or a drone, the challenge is to integrate the AI into complex, heterogeneous, real-time systems, especially regarding pre-processing, DL-based processing, and post-processing. Handling intensive mathematical algorithms, signal processing, and network or storage software stacks in the context of an end-to-end use case is critical to meet key requirements such as form factor/size, power consumption, and cost. DPUs or manycores provide a solution for such complex requirements. One pioneer company in such processors is Kalray with its manycore MPPA (Massively Parallel Processor Array) solution.

Today, a system architect needs to weigh both the current and future requirements of their systems when deciding which combination of conventional and specialized computation cores to pick. Unlike datacentres, where upgrades are done under controlled conditions, devices out in the field are harder to upgrade to more capable hardware, especially when faced with the number and variety of different customers.

Figure 5: The system analysis results in constraints for computation speed, power consumption and power efficiency for the compute element, given a specific algorithm. The target performance zone is different for each application or application domain. (The chart plots speed in TOPS against power in W, with iso-efficiency lines from 0.001 to 1000 TOPS/W; the zone is bounded by the minimum speed to execute the algorithm, the maximum power available on the system, and the minimum efficiency at which local compute power beats communication power.)

For each application, one can balance the energy cost and latency between computing the AI locally and transferring the data to the cloud for processing. Typical mobile applications are limited by power constraints and a maximum latency requirement. The performance and power efficiency of the local computation solution should be better than the performance and efficiency of the communication system. For instance, an AI processor on a small drone, tasked with running the tiny-YOLOv3 algorithm, should consume less than 10 W and provide more than 336 GOPS of compute power in order to run at 30 frames per second. Its power efficiency must be better than 112 GOPS/W (giga operations per second per watt) to be competitive with current 5G data transmission and computation at the edge. It is clear that improvements at the communication level will push the minimal requirements for local AI computing even higher.
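To make the target performance zone of Figure 5 concrete, the following minimal Python sketch checks whether a candidate accelerator satisfies the three constraints described above (minimum speed, maximum power, minimum efficiency). Only the numeric values of the drone example come from the text; the class names and the candidate device are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class EdgeAccelerator:
    name: str
    speed_tops: float   # sustained throughput in TOPS (10^12 operations/s)
    power_w: float      # power drawn while running the workload

@dataclass
class WorkloadEnvelope:
    min_speed_tops: float             # minimum speed to execute the algorithm in time
    max_power_w: float                # maximum power the system can supply/dissipate
    min_efficiency_tops_per_w: float  # must beat the cost of sending the data away

def in_target_zone(acc: EdgeAccelerator, env: WorkloadEnvelope) -> bool:
    """Return True if the accelerator lies inside the target performance zone."""
    efficiency = acc.speed_tops / acc.power_w
    return (acc.speed_tops >= env.min_speed_tops
            and acc.power_w <= env.max_power_w
            and efficiency >= env.min_efficiency_tops_per_w)

# Drone example from the text: tiny-YOLOv3 at 30 fps needs >336 GOPS (~0.336 TOPS),
# under 10 W, and better than 112 GOPS/W (~0.112 TOPS/W) to beat 5G offloading.
envelope = WorkloadEnvelope(min_speed_tops=0.336, max_power_w=10.0,
                            min_efficiency_tops_per_w=0.112)
candidate = EdgeAccelerator(name="hypothetical-npu", speed_tops=0.5, power_w=3.0)
print(candidate.name, "meets envelope:", in_target_zone(candidate, envelope))
```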
Even though vendors are typically focusing on ever more capable hardware accelerators, new development tools and libraries of algorithms and software can also contribute to boosting system performance.

3.3 Machine learning models for edge AI

Current AI models for the edge are far more limited in terms of performance when compared to cloud-based models because of the relatively limited computation and storage abilities. Model training and inference on resource-scarce devices are still a debated problem throughout academia and industry.

A number of novel libraries and algorithms have been developed in recent years with the goal of adapting standard ML models to resource-constrained devices. A well-known example is ProtoNN, which aims to adapt kNN to memory-limited microcontrollers via sparse projection and joint optimization. For low-memory scenarios (2 kB), ProtoNN outperformed the state-of-the-art compressed models. In settings allowing 16-32 kB of memory, it matched the performance of the state-of-the-art compressed models. Moreover, when compared to the best uncompressed models, ProtoNN was only 1-2% less accurate while consuming 1-2 orders of magnitude less memory.

Bonsai is another novel algorithm, based instead on decision trees, which aims to reduce the model size by learning a single, sparse, shallow tree. When deployed on an Arduino Uno, Bonsai required only 70 bytes for a binary classification model and 500 bytes for a 62-class classification model. Its prediction accuracy was up to 30% higher than that of other resource-constrained models and even comparable with unconstrained models, with better prediction times and energy usage.

The development of neural networks and deep neural networks with lighter and faster architectures (e.g. small model size, minimization of trainable parameters, minimization of the number of computations) for edge platforms has also gained massive traction among researchers. One example is CMSIS-NN (developed for Cortex-M processor cores), which generates neural networks that achieve about a fourfold improvement in performance and energy efficiency while minimizing the memory footprint. Even recurrent neural networks (RNNs) have been implemented on tiny IoT devices (FastGRNN and FastRNN). It is possible to fit FastGRNN in 1-6 kilobytes, which makes this algorithm suitable for IoT devices such as the Arduino Uno.

Some of the well-known techniques considered for model size reduction include: knowledge distillation, whereby a small (easy to deploy) model (the student) is trained to behave like a larger trained neural network (the teacher) while trying to preserve the accuracy of the teacher model, thus enabling the deployment of such models on small devices; compression steps such as quantization, dimensionality reduction, pruning and component sharing, which exploit the inherent sparsity structure of gradients and weights to reduce memory and channel occupation as much as possible; and conditional computation, which reduces the amount of calculation by selectively turning off unimportant computations (for example with component shutoff, input filtering, early exit, result caching, etc.).
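As an illustration of the quantization step listed above, the following minimal sketch converts a placeholder Keras model to TensorFlow Lite with post-training quantization (TensorFlow Lite is discussed again later in the report in the context of tinyML). The toy model and file name are illustrative assumptions, and the actual size and accuracy trade-offs depend on the network and data.

```python
import tensorflow as tf

# Placeholder model: in practice this would be a trained Keras network.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(4, activation="softmax"),
])

# Convert to TensorFlow Lite with default post-training quantization,
# which stores weights at reduced precision and shrinks the model file.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Full integer (8-bit) quantization would additionally require a
# representative dataset to calibrate activation ranges.
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)

print(f"Quantized TFLite model: {len(tflite_model)} bytes")
```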
3.4 Distributed learning at the Edge

The computation resources of nodes at the Edge are limited in comparison to the cloud. Hence, the training of ML algorithms cannot be accomplished with a single edge node in many cases. There are several approaches to solve this problem by distributing the learning process. The approaches can be divided into three categories.

The first is data parallelism (Figure 6.1), which is about splitting the training data into smaller parts, training a model on each part and then implementing a model that combines the results of the individual models. Due to the reduction in the amount of training data, the models are not as big and complex as a model trained on the whole dataset. In some cases, this means that one edge node can perform the entire training.

In contrast to data parallelism, the second category, model parallelism (Figure 6.2), is about splitting the model into sub-modules and letting multiple nodes train each of these modules on the same data. After the training is completed, the modules are combined again into one model.

Offloading the training is the last category. Since the training and inference steps have different requirements for computation power, it is possible to let a more powerful node take care of the training of the model and then deploy the model for inference on a node with fewer resources (Figure 6.3).

Figure 6.1: Distributed learning: data parallelism
Figure 6.2: Distributed learning: model parallelism
Figure 6.3: AI model distribution and reasoning at the Edge

A combination of data and model parallelism is federated learning [33], which is based on training a series of local models on different devices, which are then combined in a central node for a global model update. The central node is also responsible for the coordination between edge nodes. However, this approach involves trade-offs between model performance and communication overheads. The data in federated learning is split into smaller parts, which are distributed amongst the nodes of the edge network, as shown in Figure 6.4. Each node trains a separate model based on the data received, thus training a single part of the final DNN (Distributed Neural Network).

Figure 6.4: Outline of the federated learning approach
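The central aggregation step of federated learning described above can be sketched as follows. This is a minimal NumPy illustration, assuming a simple sample-count-weighted parameter average (one common choice, not prescribed by the report) and stubbing out the on-device training; all names and values are illustrative.

```python
import numpy as np

def local_update(global_weights, local_data):
    """Stub for on-device training: start from the global weights and
    return locally adjusted weights plus the number of local samples."""
    # In a real system this would run a few epochs of SGD on the device's data.
    updated = [w + 0.01 * np.random.randn(*w.shape) for w in global_weights]
    return updated, len(local_data)

def federated_average(updates):
    """Combine local models on the central node, weighted by sample count."""
    total = sum(n for _, n in updates)
    num_layers = len(updates[0][0])
    return [
        sum(weights[i] * (n / total) for weights, n in updates)
        for i in range(num_layers)
    ]

# Toy setup: one global model, three edge nodes with different data volumes.
global_weights = [np.zeros((4, 4)), np.zeros(4)]
node_data = [np.arange(100), np.arange(250), np.arange(50)]

for round_id in range(3):  # a few federated rounds
    updates = [local_update(global_weights, d) for d in node_data]
    global_weights = federated_average(updates)
    print("round", round_id, "first-layer weight mean:", global_weights[0].mean())
```

Only the model parameters travel between the nodes and the central node, never the raw data, which is where the communication-versus-performance trade-off mentioned above arises.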
3.5 Frameworks and platforms for AI at the Edge

Edge computing aims to bring high-performance computing capabilities and next-generation analytics powered by AI/ML into hardware systems deployable at the edge. This requires the implementation of comprehensive intelligent edge frameworks and platforms whose development addresses the specific requirements and challenges spanning hardware, power efficiency, software, connectivity, flexibility and interoperability, and security. Remaining major challenges include the end-to-end integration of connected systems, cloud endpoints, and third-party platforms or services, with a flexible embedded framework required to ensure maximum use of the data generated and collected over the long term. This is a necessary precursor to developing edge servers and processing solutions. Without a flexible edge-to-cloud integration platform and supporting software/middleware [34] libraries for IoT edge gateways [35] and other connected systems, solutions for high-performance edge computing cannot scale or adapt to the dynamic requirements of solution providers and end users. Other major challenges are cross-platform flexibility (e.g. being usable on both Android OS and Linux OS), dynamic parallelisation of the computational tasks, model compression, and end-user customization.

To address the challenges for data analysis of edge intelligence (computing power limitation, data sharing and collaboration, and the mismatch between the edge platform and AI algorithms), Zhang et al. introduced an Open Framework for Edge Intelligence (OpenEI), a lightweight software platform to equip the edge with intelligent processing and data sharing capability [36]. The goal of OpenEI is that any hardware, ranging from a Raspberry Pi to a powerful cluster, will become an intelligent edge, while accuracy, latency, energy, and memory footprint improve by an order of magnitude compared to current AI algorithms running on existing deep learning packages. The framework includes: a Package Manager, which works as a running environment for AI algorithms on the edge platform, supporting both inference tasks and model re-training; a Model Selector, which is designed to find the most suitable models for the specific edge platform based on the user's requirements in terms of accuracy, latency, energy, memory, etc.; and a RESTful API, which is used for communication with the cloud, other edge devices, and IoT devices.

3.6 Orchestration of AI between cloud and edge resources

The implementation of complex AI-based applications like self-driving cars is very difficult due to the limited resources of edge devices. Hence, these kinds of systems are often distributed between the edge and the cloud. This often means that the time-sensitive part of the processing is implemented at the Edge while parts that can take more time are executed in the cloud. A specific example of such a procedure is the "Big-Little approach" by E. De Conick et al. [37]. They proposed to split a classification problem into a smaller part with a limited number of high-priority classes and a larger part including all other classes. Afterwards, a model is trained for each part. Due to the reduced number of classes, the size of the model handling the smaller part of the problem is reduced, allowing it to be deployed on lower-powered edge devices. In contrast, the larger model is deployed in the cloud or on an edge device with a high amount of computation power, as depicted in Figure 7.

Figure 7: Architecture of the Big-Little neural network: the little neural network only classifies a subset of the output classes and can be executed locally with limited CPU power. When the little neural network cannot classify the input sample, a big neural network running in the cloud can be queried.
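A minimal sketch of the inference-time logic of the Big-Little pattern in Figure 7 might look as follows, assuming the little model covers only the high-priority classes plus an explicit "other" class. Both predict functions are stand-ins for real networks, and the class names and confidence threshold are illustrative choices, not part of the report.

```python
import numpy as np

HIGH_PRIORITY_CLASSES = ["class_1", "class_5"]   # handled by the little model
CONFIDENCE_THRESHOLD = 0.8                        # illustrative value

def little_model_predict(sample):
    """Stand-in for the small on-device network: returns (label, confidence)
    over the high-priority classes plus an explicit 'other' class."""
    scores = np.random.dirichlet(np.ones(len(HIGH_PRIORITY_CLASSES) + 1))
    labels = HIGH_PRIORITY_CLASSES + ["other"]
    best = int(np.argmax(scores))
    return labels[best], float(scores[best])

def big_model_predict(sample):
    """Stand-in for the large remote network covering all output classes."""
    return "class_42", 0.99

def classify(sample):
    label, confidence = little_model_predict(sample)
    # Only escalate to the remote model when the little network cannot
    # confidently place the sample in one of its high-priority classes.
    if label == "other" or confidence < CONFIDENCE_THRESHOLD:
        return big_model_predict(sample)
    return label, confidence

print(classify(sample=np.zeros(16)))
```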
Another distribution approach is to focus on distributing the processing of the data. This means that, as the data travels from its source at the Edge to the cloud, each of the intermediate nodes that it passes performs a small share of the processing task until the final result of the processing pipeline is obtained in a cloud server. An implementation of this approach was proposed by S. Teerapittayanon et al. [38]. They exploited the fact that only the first few layers of a DNN are required for general processing in order to distribute one DNN over edge and cloud nodes. Furthermore, this implementation also introduces exit points that allow the termination of processing when the results are sufficiently accurate, are required earlier due to time constraints, or cannot be processed further due to a node failure. The deployment of this approach is presented in Figure 8.

Figure 8: Overview of the DDNN architecture. The vertical lines represent the DNN pipeline, which connects the horizontal bars (NN layers): (1) is the standard DNN (processed entirely in the cloud), (2) introduces end devices and a local exit point that may classify samples before the cloud, (3) extends (2) by adding multiple end devices which are aggregated together for classification, (4) and (5) extend (2) and (3) by adding edge layers between the cloud and end devices, and (6) shows how the edge can also be distributed like the end devices.
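The early-exit idea behind the DDNN architecture of Figure 8 can be sketched as follows: the end device runs the first layers, exits locally when its prediction is confident enough (here an entropy criterion, one common choice), and otherwise forwards the intermediate features towards the cloud exit. The layer functions are placeholders and the threshold is illustrative; this is a sketch of the principle, not the published implementation.

```python
import numpy as np

ENTROPY_EXIT_THRESHOLD = 0.5   # illustrative confidence criterion

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def device_layers(sample):
    """Stand-in for the first few DNN layers running on the end device."""
    features = np.tanh(sample)      # placeholder computation
    local_logits = features[:4]     # feeds the local exit point
    return features, local_logits

def cloud_layers(features):
    """Stand-in for the remaining layers running in the cloud."""
    return softmax(features)

def classify(sample):
    features, local_logits = device_layers(sample)
    local_probs = softmax(local_logits)
    if entropy(local_probs) < ENTROPY_EXIT_THRESHOLD:
        # Confident enough: terminate at the local exit, nothing is sent upward.
        return "local exit", int(np.argmax(local_probs))
    # Otherwise forward the intermediate representation to the cloud exit.
    cloud_probs = cloud_layers(features)
    return "cloud exit", int(np.argmax(cloud_probs))

print(classify(np.random.randn(16)))
```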
3.7 Hardware-software co-design for AI at the Edge

One of the most important challenges in the implementation of AI at the Edge is to offer scalable solutions that still meet diverse application needs in terms of: the end-user's varying context (e.g. the job to be done), individual end-users' characteristics (such as demographics), latency expectations, and the battery power and computational power available within the device.

To address these challenges, it is important to ensure that both the hardware and the software adapt to the dynamic context at the Edge and to the device's state. Here, hardware-software co-design helps to ensure that this adaptation and personalization happen seamlessly. In general, AI models have been designed with a top-down design flow, mainly focused on achieving the highest possible accuracy and performance, assuming that the hardware will deliver the computational tasks required. This approach ignores the limitations present in the deployment of intelligent systems at the Edge. Instead, AI models should be built bottom-up with an adequate understanding of the hardware constraints. In order to provide an optimized solution, it is most important that AI models and the associated hardware are developed simultaneously.

A good example of this co-design approach is that some smart sensors include self-learning AI together with other non-AI signal processing functions. As the sensor's co-processor is capable of executing context-sensitive firmware on demand, the device can switch between AI and non-AI firmware depending on the need. This solution can thereby reduce electronic waste, since no separate specialized AI hardware is required, and minimise the overall bill-of-materials cost. Additionally, the co-design of software and hardware helps to extend, or easily integrate, further physical and virtual sensors (e.g. magnetometer, pressure sensors, inertial sensors, etc.) as additional external inputs. This enables faster and more robust learning from an expandable list of input sources, chosen according to the edge application, as opposed to pre-programmed (AI) solutions with a fixed number of physical inputs and without a built-in learning function. As the self-learning AI function executes on the sensor's co-processor, the overall system power and memory requirements are extremely low in comparison to other non-edge AI systems.

In summary, as highlighted in the previous paragraphs, hardware aspects, models, and communication platforms are inter-linked when developing a system working at the Edge. Hardware-software co-design of edge-AI systems provides a path to the execution of a wide variety of applications (AI and non-AI included), whilst having the capability to adapt to the application's needs on demand.

4 Future Challenges and Trends

The development and deployment of secure and trustworthy Edge AI will require a wide range of challenges to be addressed and solved.

4.1 Trust and explainability

AI algorithms, and especially deep neural networks, are often considered black boxes. The decision-making process of typical ML algorithms is not always transparent, and the usual data models based on neural networks do not represent the characteristics of the process being modelled. Furthermore, their internal computations present a black box that is not easily understandable for humans. The drawbacks of such algorithms include: any bias within the training data is potentially transferred to the algorithm and remains undetected; users may not trust their predictions; and they may lack robustness in operational environments. Explainable AI aims to provide insights into the internal decision-making process of machine learning algorithms. Using these insights, algorithms can be developed whose predictions are not only correct but right for the right reasons [39].

4.2 Re-learning

Acceptance and market uptake of products and services are directly dependent on trust in the offered solutions. That is particularly manifested in emerging automotive applications, such as driving automation, which are heavily reliant on edge AI. Hence, it is of utmost importance that a common approach to AI is based on trust and excellence. Considering the importance of edge AI, there is a need for commitment to consider the impact of edge AI throughout its lifecycle. To that extent, the developed algorithms must be kept up to date and performant on new data, with the ability to integrate external sources through re-training. In addition to meeting the requirements and defined metrics that indicate the training state of the AI system, the re-training must also consider any consequence it may have on other components or on the system itself. The implication is that, rather than having to spend time and resources on re-training from scratch to incorporate slightly different insights, the re-training should focus on creating more generic models. The aim is to permit improvements in performance through a quick re-training of an edge AI model that has already been trained using previous data
sets.Equally,the re-train-ing of one model should not compromise the performance of other components within the system(or other systems within a system of systems).In simple terms,the re-training must enable improved performance through exploitation of new data and in parallel it must not negatively impact its surroundings.Additionally,changes in calibration(e.g.of sensors or actuators)should be permitted without the need to retrain the edge AI.4.3 Security and adversarial attacks In distributed learning,a communication overhead is introduced in order for the edge platforms and the system aggregator to transfer data during training and inference.When compared to data processing in large central data centres,data produced on resource-constrained end devices in a decentralized distributed setting is partic-ularly vulnerable to security threats and the necessary level of protection against such risks should be considered carefully for specific applications.EPoSS WHITE PAPER AI at the Edge32Figure 9:A real-world attack on VGG16,using a physical patch generated by the white-box ensemble method described in Section 3.When a photo of a tabletop with a banana and a notebook(top photograph)is passed through VGG16,the network reports class banana with 97%confidence(top plot).If we physically place a sticker targeted to the class“toaster”on the table(bottom photograph),the photograph is classified as a toaster with 99%confidence(bottom plot).See the following video for a full demonstration:https:/youtu.be/i1sp4X57TL4(Source:https:/arxiv.org/pdf/1712.09665.pdf,Foto:Pixabay)An example of a security threat debated in the recent years is adversarial attacks.Adversarial attacks de-scribe the use of erroneous data to manipulate the results of AI algorithms,especially of neural networks.In the context of image or video classification,attacks are done by designing specific noises,colours,lighting,or orientation patterns,which are then integrated in the corresponding data.An example of the so-called“adversarial patches attack”was presented at NIPS 2017 Conference on Neural Information Processing Systems.After their generation these patches can be placed anywhere within the field of view of the classi-fier and cause the classifier to output a targeted class.In Figure 9 above,a banana is correctly classified as a banana.Placing a sticker with a toaster printed on it is not enough to fool the network and it continues to classify it as a banana.However,with a carefully constructed“adversarial patch”,it is easy to trick the net-work into thinking that it is a toaster.This patch attack is especially difficult as these patches can easily be distributed after their creation.Adversarial attacks exist for other kinds of AI-based data processing,e.g.audio or LiDAR(Light Detection And Ranging).However,to date,these areas are not as well investigated as attacks on image or video clas-sifiers.Further research is required to increase the security,privacy,and robustness of edge AI by reducing the overhead,or by adopting novel approaches such as clustered federated learning or federated distillations.Figure 9:A real-world attack on VGG16,using a physical patch generated by the white-box ensemble method described in Section 3.When a photo of a tabletop with a banana and a notebook(top photograph)is passed through VGG16,the network reports class banana with 97%confidence(top plot).If we physically place a sticker targeted to the class“toaster”on the table(bottom photograph),the photograph is classified as a toaster with 
99%confidence(bottom plot).See the following video for a full demonstration:https:/youtu.be/i1sp4X57TL4(Source:https:/arxiv.org/pdf/1712.09665.pdf,Foto:Pixabay)1.00.80.60.40.20.0CLASSIFIER INPUTCLASSIFIER OUTPUTCLASSIFIER INPUTCLASSIFIER OUTPUTToasterBananaPiggy Bank SpaghettiBananaSlugSnailOrange1.00.80.60.40.20.0Place sticker on table 1 POTENTIAL OF AI AT THE EDGE FOR SMART SYSTEMS334.4 Learning at the Edge Training artificial neural networks at the Edge remains a challenge.Work has been done to optimize inference at the Edge by optimizing algorithms and accelerators for low precision,low memory footprint and feed-for-ward computations.However,an additional re-training phase of an artificial neural network can undo part of those optimizations as higher precision is needed to enable the iterative approach typically used and more storage is needed to keep track of the intermediate data required.Also,the frequent weight updates during training can pose additional challenges regarding energy efficiency as well as reliability.As such,neuromorphic-based architectures hold potential,as they allow on-line learning to be built in by mod-elling plasticity.Plenty of challenges remain to achieve this goal as it is difficult to make a single synapse and neuron device that allows the capture of a very wide range of time constants.Another approach for edge learning is to implement a pre-trained neural network for inference but permit ad-aptation of the final network layers to tune classification or detection,this approach is called transfer learning.4.5 Integrating AI into the smallest devices Recently a number of tools have been developed with the goal of implementing AI models which could fit the memory available in edge platforms.As an example,tinyML is about processing sensor data at extremely low power and,in many cases,at the outer-most edge of the network.Therefore,tinyML applications could be deployed on the microcontroller in a sensor node to reduce the amount of data that the node forwards to the rest of the system.These integrated“tiny”machine learning applications require“full-stack”solutions(hardware,system,software,and applications)plus the machine learning architectures,techniques,and tools performing on-device analytics.Furthermore,a variety of sensing modalities(vision,audio,motion,environmental,human health monitoring,etc.)are used with extreme energy efficiency(typically in the single milliwatt,or lower,power range)to enable machine in-telligence at the boundary of the physical and digital worlds.Tensorflow Lite(TFLite)was created specifically for this purpose,it proposes a set of tools that help program-mers to run AI models on embedded,mobile,and IoT devices.A typical workflow will involve the definition of the AI model in Keras/Tensorflow,followed by the conversion of the model from Keras to TFLite,and the final compression of the model(for example,via post-training quantization)to further decrease the overall foot-print.Many tinyML implementations actually use TFLite under the hood.With the increase in dedicated hardware for machine learning,an important direction for future work is the de-velopment of compilers,such as Glow,and other tools that optimize neural network graphs for heterogeneous hardware or train and handle specialized technologies and algorithms.4.6 Data as a basis for AIData is the fundamental piece behind ML/AI.However,one of the major problems when developing AI solu-tions can be the lack of sufficient data to achieve the required performance in a specific 
application.In recent years several techniques have been considered to deal with this problem in the context of cloud-based solu-tions;for example,by using semi-supervised learning(to take advantage of the large amounts of unlabelled data generated by edge devices),by using data augmentation(via Generative Adversarial Networks(GANs)or transformations),or by transfer learning.These have become cutting-edge methods deployed to improve the overall performance in AI models.However,the adoption of these techniques in edge computing still needs to be thoroughly investigated.EPoSS WHITE PAPER AI at the Edge34Moreover,edge systems need to interact with various types of IoT sensors,which produce a diversity of data such as image,text,sound,and motion.Edge analytics should be able to deal with those heterogeneous en-vironments and adapt to be multimodal allowing learning from features collected over multiple modalities.4.7 Neuromorphic technologiesNeuromorphic engineering is a ground-breaking approach to the design of computing technology that draws inspiration from powerful and efficient biological neural processing systems.Neuromorphic devices are able to carry out sensing,processing,and control strategies with ultra-low power performance.Today,the neu-romorphic community in Europe is leading the State-of-the-Art in this domain.The community includes an increasing number of labs that work on the theory,modelling,and implementation of neuromorphic comput-ing systems using conventional VLSI technologies,emerging memristive devices,photonics,spin-based,and other nano-technological solutions.Extensive work is needed in terms of neuromorphic algorithms,emerging technologies,hardware design and neuromorphic applications to enable the uptake of this technology,and to match the needs of real-world applications that solve real-world tasks in industry,health-care,assistive systems,and consumer devices.It is important to note that“neuromorphic”is most commonly defined as the group of brain-inspired hardware and algorithms.Parallel to the advancement in neuromorphic computing,the underlying computation of such technology gets increasingly complex and requires more and more parameters.This triggers further development of efficient neuromorphic hardware designs,e.g.the development of neuromorphic hardware that can tackle the well-known memory wall issues and limited power budget in order to make such technology applicable on edge de-vices.The emerging memory technologies provide additional benefits for neuromorphic solutions,especially memory technology that can allow us to perform computation directly in the memory cells themselves instead of having to load and store the parameters,inputs,and outputs into computation cores.Such technology,coupled with the properties of neuromorphic computing,delivers many benefits.Firstly,DL and spiking neural networks(SNN)parameters are often fixed and/or modified very seldom.This matches the capability of emerging non-volatile memories where write accesses are typically one or two orders slower than read accesses as the number of memory writes required is lower.Secondly,most computations are matrix ad-dition and multiplication.This operation can be mapped efficiently in memory arrays.Thirdly,inference of such neuromorphic networks can be optimized for low-bit precision and coarse quantization without sacrificing the quality of the network outputs.Some tasks,such as classification,are proven to be good enough even when networks are optimized to binary and/or ternary 
representation.This provides an excellent opportunity as the underlying operation can be simply replaced by AND/XOR logic.Fourthly,neural networks are robust to error.Thus,process variations on the emerging memory technologies do not limit their capability to compute and/or and load/store in the networks.These benefits can be achieved by in-memory compute technology using emerging memory technologies.4.8 Meta-learningIn most of todays industrial applications of deep learning,models and related learning algorithms are tai-lor-made for very specific tasks4041.This procedure can lead to accurate solutions of complex and multidimen-sional problems but it also has visible weaknesses4243.Normally,these models require an enormous amount of data to be able to learn how to correctly solve problems.Labelled data can be costly as it may require the intervention of experts or not be available in real-time applications due to the lack of generation events.1 POTENTIAL OF AI AT THE EDGE FOR SMART SYSTEMS35A question can therefore arise:in addition to having the correct formulation and the descriptive data for the problem,is it possible not only to try to solve it but also to learn how to solve it in the best way?Therefore:“is it possible to learn how to learn?”Precisely on this question,the branch of machine learning,called meta-learn-ing(Meta-L),is based4546.In Meta-L the optimization is performed on multiple learning examples that consider different learning objec-tives in a series of training steps.In base learning,an inner learning algorithm,given a dataset and a target,solves a specific task such as image recognition.During meta learning,an outer algorithm updates the internal algorithm so that the model learned during base learning also optimizes an outer objective,which tries,for example,to increase the inner algorithms robustness or its generalization performance47.This two-step iterative approach is resulting in successful solutions to problems where few labels or,in general,little data is available,as the highest level of information is extracted thanks to the formulation of the opti-mization problem itself.Intelligent extraction of information,by addressing the problem from a general point of view can also lead to the ability of the inner algorithm to handle new situations quickly and with little data available with a robust approach48.Exactly for the reasons listed above,Meta-L is gaining significant attention in Edge AI,where the new data collected can be immediately processed and fed to the algorithms to increase the robustness of the model and generalisation of new tasks that may be useful for systems,even in the deployment phase.Looking at the advantages of Meta-Learning and the possibility of using it together with Edge computing to increase its benefits,provides a good outline of how this branch of ML can soon find concrete uses in the most varied application scenarios49.4.9 Hybrid modellingData-based and knowledge-based modelling can be combined into hybrid modelling approaches.Some solu-tions can take advantage of a-priori knowledge in the form of physical equations describing known causal re-lationships in the behaviour of the systems or by using well known simulation techniques.Whereas dependen-cies not known a priori can be represented by many kinds of machine learning methods using big data based on observing the behaviour of the systems.The former type of situation can be seen as white box modelling as the internal states possess a physical meaning,while the latter is referred to 
as black box modelling,using just the input-output-behaviour,but not maintaining information on the internal physical states of the system.Howev-er,in many cases,a model is not purely physics-based nor purely data-driven,giving rise to grey box modelling methods that can be formulated50.The assignment of models to the scale varies within the literature:For instance,a transfer function can be derived from physical considerations(white),identified from measurement data with a well-educated guess of the model order(grey)or without(black).Approaches for combining machine learning and simulation,by simulation-assisted machine learning or by ma-chine-learning-assisted simulation and combinations are described by von Rueden et al.in“Combining Machine Learning and Simulation to a Hybrid Modelling approach:Current and Future Directions”51 and in“Informed machine learning towards a taxonomy of explicit integration of knowledge into machine learning.”52 advan-tage of hybrid modelling is avoiding the necessity of learning a-priori the behaviour of systems from huge amounts of data,if they can be described by simulation techniques.Also,in the case of missing data,hybrid modelling is a possible approach53.A practical example of combining physical white-box modelling and machine learning to improve a model for the highly non-linear dynamic behaviour of a ship,described by a set of analytical equations has been recently investigated by Mei et al.54.Another example is hybrid modelling in process industries55.EPoSS WHITE PAPER AI at the Edge364.10 EnergyefficiencyReducing energy consumption is a general goal,not only,but especially for smart systems providers to address the challenges of global warming and enable a higher degree of miniaturization of intelligent devices.For a long time power reduction has been a challenge in micro and nano electronics and also a target for all AI applications,regardless of whether data is processed in the cloud or at the edge.But at the edge,this target is especially important as applications usually have only limited power resources available.They often have to be battery powered or even use energy harvesting.Special energy-efficient neural network architectures have been investigated56.Not only is the hardware crucial for low-power AI applications,but also the implemented methods and models have great influence on the energy consumption.This has been examined for the example of computer vision57.Moving away from traditional von Neumann processing solutions and using dedicated hardware58 allows for additional power reduction.Even more can be achieved with neuromorphic architectures59.The“ultimate benchmark”in power consumption for artificial intelligence would be the“natural intelligence”in form of the human brain,which has 86 bn.neurons60 and approximately 10141015 synapses61 with an energy consumption of less than 20W,based on glucose available to the brain,or only 0.2W,when counting the ATP usage instead of glucose62.Current GPU based solutions with that complexity are far from this energy efficiency.There is obviously plenty of headroom for further development.1 POTENTIAL OF AI AT THE EDGE FOR SMART SYSTEMS375 Milestones for AI at the Edge in Smart Systems Edge AI is a key technological area that is ending the pure dominance of the cloud in the data analytics world.As shown by the numerous scenarios and contexts reported in this white paper,Edge AI technology is poised to disrupt a wide variety of industries because of the huge advantages introduced,such as increase 
real-time performance, improved energy efficiency, improved security and privacy, etc. The evolution of a new generation of edge intelligence systems will take place during the next 5-15 years, with the completion of different technological steps supporting the development of new devices, technology and applications. However, there are still several challenges that have to be addressed: the development of new algorithms and applications, the development of neuromorphic-based chips and new specialized computing platforms and their integration with classical systems, the development of efficient and automated transfer learning to support federated learning as well
THE DAWN OF THEHUMAN-MACHINE ERAA FORECAST OF NEW AND EMERGING LANGUAGE TECHNOLOGIESThe Dawn of the Human-Machine EraA forecast of new and emerging language technologiesThis publication is based upon work from COST Action Language in the Human-Machine Era,supported by COST(European Cooperation in Science and Technology).COST(European Cooperation in Science and Technology)is a funding agency for research and innovation networks.Our Actions help connect research initiatives across Europe and enable scientists to grow their ideas by sharing them with their peers.This boosts their research,career and innovation.www.cost.euFunded by the Horizon 2020 Framework Programme of the European UnionThis work is licenced under a Creative Commons Attribution 4.0 International Licencehttps:/creativecommons.org/licenses/by/4.0/To cite this reportSayers,D.,R.Sousa-Silva,S.Hhn et al.(2021).The Dawn of the Human-Machine Era:A forecast of new and emerging language technologies.Report for EU COST Action CA19102 Language In The Human-Machine Era.https:/doi.org/10.17011/jyx/reports/20210518/1Sayers,Dave 0000-0003-1124-7132Sousa-Silva,Rui 0000-0002-5249-0617Hhn,Sviatlana 0000-0003-0646-3738Ahmedi,Lule 0000-0003-0384-6952Allkivi-Metsoja,Kais 0000-0003-3975-5104Anastasiou,Dimitra 0000-0002-9037-0317Beu,tefan 0000-0001-8266-393XBowker,Lynne 0000-0002-0848-1035Bytyi,Eliot 0000-0001-7273-9929Catala,Alejandro 0000-0002-3677-672Xepani,Anila 0000-0002-8400-8987Chacn-Beltrn,Rubn 0000-0002-3055-0682Dadi,Sami 0000-0001-7221-9747Dalipi,Fisnik 0000-0001-7520-695XDespotovic,Vladimir 0000-0002-8950-4111Doczekalska,Agnieszka 0000-0002-3371-3803Drude,Sebastian 0000-0002-2970-7996Fort,Karn 0000-0002-0723-8850Fuchs,Robert 0000-0001-7694-062XGalinski,Christian (no ORCID number)Gobbo,Federico 0000-0003-1748-4921Gungor,Tunga 0000-0001-9448-9422Guo,Siwen 0000-0002-6132-6093Hckner,Klaus 0000-0001-6390-4179Lncos,Petra Lea 0000-0002-1174-6882Libal,Tomer 0000-0003-3261-0180Jantunen,Tommi 0000-0001-9736-5425Jones,Dewi 0000-0003-1263-6332Klimova,Blanka 0000-0001-8000-9766Korkmaz,Emin Erkan 0000-0002-7842-7667Mauec,Mirjam Sepesy 0000-0003-0215-513XMelo,Miguel 0000-0003-4050-3473Meunier,Fanny 0000-0003-2186-2163Migge,Bettina 0000-0002-3305-7113Mititelu,Verginica Barbu 0000-0003-1945-2587Nvol,Aurlie 0000-0002-1846-9144Rossi,Arianna 0000-0002-4199-5898Pareja-Lora,Antonio 0000-0001-5804-4119Sanchez-Stockhammer,C.0000-0002-6294-3579ahin,Aysel 0000-0001-6277-6208Soltan,Angela 0000-0002-2130-7621Soria,Claudia 0000-0002-6548-9711Shaikh,Sarang 0000-0003-2099-4797Turchi,Marco 0000-0002-5899-4496Yildirim Yayilgan,Sule 0000-0002-1982-6609Contributors(names and ORCID numbers)This report began life in October 2020 at the start of the Language In The Human-Machine Era net-work(lithme.eu).Several online co-writing workshops followed,working together in Google Docs while video-conferencing.The list of contributors was recorded automatically in the Google Doc activity log.The content of the report was finalised on 12 May 2021,at which point this activity log was copied into a Google spreadsheet,and a table chart automatically rendered to weigh contributions.On this basis LITHMEs Chair,Dave Sayers,is the named first author.He is very closely followed in the activity log by Rui Sousa Silva,Chair of LITHME Working Group 1,and then by Sviatlana Hhn,LITHMEs Vice-Chair.All three contributed significantly and consistently.The other named contributors all made the report what it is:authoritative,clear,diverse,and future-oriented.We look forward to working together on 
future editions of this important forecast.A note on the contributorsContents1 Introduction:speaking through and to technology.51.1 Speaking through technology.61.2 Speaking to technology.71.3 The variety of languages,tools and use-cases.71.3.1 Non-standard language(data).81.3.2 Minority and under-resourced languages.91.3.3 Sign languages.91.3.4 Haptic language.101.4 Endless possibilities vs boundless risks,ethical challenges.101.5 The way ahead.112 Behind the scenes:the software powering the human-machine era.12Summary and overview.122.1 Text Technology.142.1.1 Translation of texts.142.1.2 Sentiment,bias.172.1.3 Text-based conversation.182.2 Speech Technology.192.2.1 Automatic speech recognition,speech-to-text,and speech-to-speech.192.2.2 Voice Synthesis.212.3 Visual and tactile elements of interaction.232.3.1 Facial expression,gesture,sign language.232.3.2 Tactile expression and haptic technology.242.4 Pragmatics:the social life of words.242.5 Politeness.273 Gadgets&gizmos:human-integrated devices .28Summary and overview.283.1 Augmented Reality.303.1.2 The visual overlay.303.1.3 Augmenting the voices we hear.313.1.4 Augmenting the faces we see.323.2 Virtual reality.343.3 Automated conversation(chatbots).353.4 Second Language Learning and Teaching.373.4.1 Intelligent Computer-Assisted Language Learning(ICALL).373.4.2 Communicative ICALL.383.4.3 VR-based Language learning.393.4.4 AR-based Language Learning.403.5 Law and Order.423.5.1 Automated legal reasoning.423.5.2 Computational forensic linguistics.433.5.3 Legal chatbots.443.6 Health and Care.453.7 Sign-language Applications.463.8 Writing through technology.473.8.1 Professional translators as co-creators with machines.483.8.2 Cyber Journalism,Essay and Contract Generation.493.9 Personality profiling.504 Language in the Human-machine Era.524.1 Multilingual everything.524.2 The future of language norms.534.3 Natural language as an API.534.4 Problems to solve before using NLP tools.534.5 Speechless speech.544.6 Artificial Companions and AI-Love.554.7 Privacy of linguistic data and machine learning.554.8 Decolonising Speech and Language Technology.554.9 Authorship and Intellectual Property.564.10 Affective chatbots.564.11 Ethics,lobbying,legislation,regulation.575 Conclusion,looking ahead.58Acknowledgements.61References.615The human-machine era is coming soon:a time when technology is integrated with our senses,not confined to mobile devices.What will this mean for language?Over the centuries there have been very few major and distinctive milestones in how we use language.The inven-tion(s)of writing allowed our words to outlive the moment of their origin(Socrates was famously suspicious of writing for this reason).The printing press enabled faithful mass reproduction of the same text.The telegram and later the telephone allowed speedy written and then spoken communication worldwide.The internet enabled bil-lions of us to publish mass messages in a way previously confined to mass media and governments.Smartphones brought all these prior inventions into the palms of our hands.The next major milestone is coming very soon.For decades,there has been a growing awareness that technology plays some kind of active role in our communi-cation.As Marshall McLuhan so powerfully put it,the medium is the message(e.g.McLuhan&Fiore 1967;Carr 2020;Cavanaugh et al.2016).But the coming human-machine era represents something much more fundamental.Highly advanced audio and visual filters powered by artificial intelligence evolutionary leaps from the filters we 
know today will overlay and augment the language we hear,see,and feel in the world around us,in real time,all the time.We will also hold complex conversations with highly intelligent machines that are able to respond in detail.1Introduction:speaking through and to technology“Within the next 10 years,many millions of people will walk around wearing relatively unobtrusive AR devices that offer an immersive and high-res-olution view of a visually augmented world”(Perlin 2016:85)6Language In The Human-Machine Era lithme.eu COST Action 19102In this report we describe and forecast two imminent changes to human communication:Speaking through technology.Technology will actively contribute and participate in our commu-nication altering the voices we hear and facial movements we see,instantly and imperceptibly trans-lating between languages,while clarifying and amplifying our own languages.This will not happen overnight,but it will happen.Technology will weave into the fabric of our language in real time,no longer as a supplementary resource but as an inextricable part of it.Speaking to technology.The current crop of smart assistants,embedded in phones,wearables,and home listening devices will evolve into highly intelligent and responsive utilities,able to address complex queries and engage in lengthy detailed conversation.Technology will increasingly under-stand both the content and the context of natural language,and interact with us in real time.It will understand and interpret what we say.We will have increasingly substantive and meaningful conversations with these devices.Combined with enhanced virtual reality featuring lifelike characters,this will increasingly enable learning and even socialising among a limitless selection of intelligent and responsive artificial partners.In this introduction,we further elaborate these two features of the human-machine era,by describing the advance of key technologies and offering some illustrative scenarios.The rest of our report then goes into further detail about the current state of relevant technologies,and their likely future trajectories.1.1 Speaking through technologyThese days,if youre on holiday and you dont speak the local language,you can speak into your phone and a translation app will re-voice your words in an automated translation.This translation technology is still nascent,its reliability is limited,and it is confined to a relatively small and marketable range of languages.The scope for error and miscommunication,confusion or embarrassment remains real.The devices are also clearly physically separate from us.We speak into the phone,awkwardly break our gaze,wait for the translation,and proceed in stops and starts.These barriers will soon fade,then disappear.In the foreseeable future we will look back at this as a quaint rudimentary baby step towards a much more immersive and fluid experience.The hardware will move from our hands into our eyes and ears.Intelligent eyewear and earwear currently in prototype will beam augmented information and images directly into our eyes and ears.This is the defining dis-tinction of the human-machine era.These new wearable devices will dissolve that boundary between technology and conversation.Our current binary understanding of humans on the one hand,and technology on the other,will drift and blur.These devices will integrate seamlessly into our conversation,adding parallel information flows in real time.The world around us will be overlain by additional visual and audible information directions on 
streets,opening hours on stores,the locations of friends in a crowd,social feeds,agendas,anything one could find using ones phone but instead beamed directly into ones eyes and ears.We will interact with machines imperceptibly,either through subtle finger movements detected by tiny sensors or through direct sensing of brainwaves(both are in development).This will alter the basic fabric of our interactions,fundamentally and permanently.As these devices blossom into mass consumer adoption,this will begin to reshape the nature of face-to-face interaction.Instead of breaking the flow of conversation to consult handheld devices,our talk will be interwoven with technological input.We will not be speaking with technology,but through technology.The software is also set to evolve dramatically.For example,the currently awkward translation scenario described above will improve,as future iterations of translation apps reduce error and ambiguity to almost imperceptible levels finessed by artificial intelligence churning through vast and ever-growing databases of natural language.And this will be joined by new software that can not only speak a translation of someones words,but automatically mimic their voice too.Meanwhile,the evolution of Augmented Reality software,combined with emerging new eyepieces,will digitally augment our view of each persons face,in real time.This could alter facial movements,including lip movements,to match the automated voice translation.So we will hear people speaking our language,in their voice,and see their mouth move as if they were speaking those translated words.If our interlocutors have the same kit,they will hear 7Language In The Human-Machine Era lithme.eu COST Action 19102and see the same.This is what we mean when we say technology will become an active participant,inextricably woven into the interaction.All this might feel like a sci-fi scenario,but it is all based on real technologies currently at prototype stage,under active development,and the subject of vast(and competing)corporate R&D investment.These devices are com-ing,and they will transform how we use and think about language.1.2 Speaking to technologyAs well as taking an active role in interaction between people,new smart technologies will also be able to hold complex and lengthy conversations with us.Technology will be the end agent of communicative acts,rather than just a mediator between humans.Currently,smart assistants are in millions of homes.Their owners call out commands to order groceries,adjust the temperature,play some music,and so on.Recent advances in chatbot technology and natural language interfaces have enabled people to speak to a range of machines,including stereos,cars,refrigerators,and heating systems.Many companies use chatbots as a first response in customer service,to filter out the easily answerable queries before releasing the expense of a human operator;and even that human operator will be prompted by another algorithm to give pre-specified responses to queries.We already speak to technology,but in quite tightly defined and structured ways,where our queries are likely to fit into a few limited categories.This,too,is set to change.New generations of chatbots,currently under active development,will not only perform services but also engage in significantly more complex and diverse conversations,including offering advice,thinking through problems,consoling,celebrating,debating,and much else.The change here will be in the volume and nature of conversation we hold with 
technology;and,along with it,our levels of trust,engagement,and even emotional investment.Furthemore,devices will be able to solve complicated requests and find or suggest possible user intentions.This,too,will be entirely new terrain for language and communication in the human-machine era.Like the move to Augmented Reality eyewear and earwear,this will be qualitatively distinct from the earlier uses of technology.Now switch from Augmented Reality to Virtual Reality,and imagine a virtual world of highly lifelike artificial characters all ready and willing to interact with us,on topics of our choice,and in a range of languages.Perhaps you want to brush up your Italian but you dont have the time or courage to arrange lessons or find a conversation partner.Would those barriers come down if you could enter a virtual world full of Italian speakers,who would happily repeat themselves as slowly as you need,and wait without a frown for you to piece together your own words?Language learning may be facing entirely new domains and learning environments.The same systems could be used for a range of other purposes,from talking therapy to coaching autistic children in interactional cues.The ability to construct a virtual world of lifelike interlocutors who will never get scared or offended,never judge you,never laugh at you or gossip about you carries with it immense potential for learning,training,and communication support.Indeed,highly intelligent chatbots are unlikely to remain constrained to specific contexts of use.They will adapt and learn from our input as silent algorithms contour their responses to maximise our satisfaction.As they become more widely available,many people may talk to them more or less all the time.Able to understand us,deploying algorithms to anticipate our needs,patiently responding and never getting tired or bored,bots may become our best imaginable friends.Again,all this is simply a logical and indeed explicitly planned progression of current prototype technology,a foreseeable eventuality heading towards us.Many millions of people will soon be regularly and substantively speaking to technology.1.3 The variety of languages,tools and use-casesBelow is a model that shows different levels of complexity in the different technologies we discuss in this report from simple online form-filling to highly complex immersive Virtual Reality.We map two measures of complexity against each other:formality;and number of modalities.Formal language tends to be easier for machines to handle:more predictably structured,with less variation and innovation.Informal language tends to be more free-flowing 8Language In The Human-Machine Era lithme.eu COST Action 19102and innovative,harder to process.Next is modalities.Modalities are the various ways that humans use language through our senses,including writing,speech,sign,and touch.The more of these a machine uses at once,the more processing power is needed.The model below sets all these out for comparison.Figure 1.Levels of difficulty for machines,according to language formality and modalities There are predictions that over time the distinction between written and spoken language will gradually fade,as more texts are dictated to(and processed by)speech recognition tools,and texts we read become more speech-like.Below we discuss types of human language,combining the perspectives of linguists and technologists.As above,this is relevant to the amount of work a machine must do.1.3.1 Non-standard language(data)Many languages around the world have a 
standard form(often associated with writing,education,and officialdom)alongside many non-standard varieties dialects,and if the language is used internationally,perhaps also distinctive national varieties(for example Singaporean English or Morrocan Arabic).There will also be various registers of language,for example text messages,historical texts,formal letters,news media reporting,conversation,and so on(Biber&Conrad 2009).There will also be approximations associated with language learners.All these variations present challenges for standard Natural Language Processing(NLP)methods,not least be-cause NLP systems are typically trained on written,standard language such as newspaper articles.Usually,language processing with such language as input suffers from low accuracy and high rates of errors(Nerbonne 2016).Plank(2016)suggests“embracing”variations in linguistic data and combining them with proper algorithms in order to produce more robust language models and adaptive language technology.Learner language is described as non-standard and non-canonical language in NLP research,as“learners tend to make errors when writing in a second language and in this regard,can be seen to violate the canonical rules of a language”(Cahill 2015).Other examples of non-canonical language are dialects,ordinary conversation and historical texts,which stray from the standard.Different approaches have been used to manage the contents of conversation with the user and to deal with learner errors.Wilske(2014)mentions constraining possible input and error diagnosis as strategies used by researchers and software developers in order to deal with the complexity of learner input.9Language In The Human-Machine Era lithme.eu COST Action 191021.3.2 Minority and under-resourced languagesMinority languages are typically spoken by a numerical minority in a given country or polity;languages such as Occitan or Smi.They tend to be under-resourced in terms of technology and the data needed for AI.Certain of-ficial languages of smaller countries face similar barriers,such as Latvian or Icelandic.Under-resourced languages suffer from a chronic lack of available resources(human-,financial-,time-,data-and technology-wise),and from the fragmentation of efforts in resource development.Their scarce resources are only usable for limited purposes,or are developed in isolation,without much connection with other resources and initiatives.The benefits of reusability,accessibility and data sustainability are often out of reach for such languages.Until relatively recently,most NLP research has focused on just a few well-described languages,those with abundant data.In fact,state-of-the-art NLP methods rely heavily on large datasets.However,the situation is rapidly evolving,as we discuss further in this report.Research and development are being driven both by a growing demand from communities,and by the scientific and technological challenges that this category of languages presents.1.3.3 Sign languagesAs discussed above,speech and writing are two modalities of language,two ways of transmitting meaning through human senses(hearing and sight respectively).There are other modalities,principally used by people with hearing and sight impairments,shown in Table 1.Sign languages are those languages that typically use the signed modality.However,the table 1 risks some over-simplifications.Firstly,each sign language is not simply a visual representation of e.g.English,Finnish,etc.;they are entirely independent languages,with their own 
grammar, vocabulary, and other levels of linguistic structure. And, like spoken languages, they have huge variety, individual nuance, and creativity. Still, some spoken/written languages can be expressed visually, such as Signing Exact English for expressing (spoken or written) English.

Written modality: meaning is encoded in graphemes (written characters); sense required: sight; commonly associated languages: English, Finnish, Esperanto, Quechua, etc.; the machine must produce text.
Spoken modality: meaning is encoded in phonemes (distinctive sounds); sense required: hearing; the machine must produce a synthesised voice.
Haptic modality: meaning is encoded in touch (as in Braille or fingerspelling); sense required: touch; the machine must produce a moveable surface.
Signed modality: meaning is encoded in movements of the hands, arms, head and body, and in facial expression; sense required: vision; commonly associated languages: British Sign Language, Finnish Sign Language, International Sign, etc.; the machine must produce an avatar with distinguishable arms, fingers, facial features, mouth detail and posture.
Table 1. Modalities of language and what they require from machines

Put another way, the signed modality is the basic modality for individual sign languages, but some other languages can also be expressed in the signed modality. It is possible to differentiate further into full sign languages and signed languages, such as fingerspelling, etc., often used in school education for young students (see ISO, in prep.). A further distinction is needed between visual sign languages and tactile sign languages. For example, unlike visual sign languages, tactile sign languages do not have clearly defined grammatical forms to mark questions. Additionally, visual sign languages use a whole range of visible movements beyond just the handshapes that hearing people typically associate with sign. This includes facial expression, head tilt, eyebrow position and other ways of managing what in spoken language would be intonation (Willoughby et al. 2018). "Unlike spoken languages, sign languages employ multiple asynchronous channels to convey information. These channels include both the manual (i.e. upper body motion, hand shape and trajectory) and non-manual (i.e. facial expressions, mouthings, body posture) features" (Stoll et al. 2018). It is important to distinguish all these, for understanding different people's needs and the different kinds of use cases of new and emerging language technologies.

1.3.4 Haptic language

The haptic modality is used particularly by deafblind people, who have limited or no access to the visual or auditory channels. Such communication systems can be based on an existing language (English, Finnish, etc.), often by adapting individual sign languages to the haptic modality or by fingerspelling in a spoken and written language. This may appear to be simply the use of the same language in a different modality; however, haptic systems are far more complicated. Deafblind signers have heterogeneous backgrounds and needs. For example, vision loss during life may lead to the development of idiosyncratic choices when language is developed in isolation. If a haptic system is not related to any other language but is instead an independent development, then it constitutes an individual language in its own right. Tadoma is a method of communication used by deafblind individuals, in which the deafblind person places their thumb on the speaker's lips and their fingers along the jawline. The middle three fingers often fall along the speaker's cheeks, with the little finger picking up the vibrations of the speaker's throat. See https:/ . (In the USA, the movements made by deafblind users to develop and promote interactional conventions have been referred to as pro-tactile movements; see http://www.protactile.org/.) Haptics, short
for social-haptic communication,refers to a range of communicative sym-bols and practices that differ from standard tactile signing that are used to convey information,e.g.the description of a location,to deafblind people(Willoughby et al.2018).Braille is the written language used by blind people to read and write.It consists of raised dots corresponding to written characters,which can be read with the fingers.Strictly speaking,communication through braille belongs to the haptic modality,although it is very close to writing,especially for the speaker.For extensive introductory detail on how Braille works,see e.g.http:/www.dotlessbraille.org/.A key detail is that there is not a one-to-one relationship between text in a visual alphabet and text in Braille.Even plain text needs to be translated into Braille before it can be read.To complicate matters further,Braille is language-specific,and the Braille code differs from country to country and according to domain(e.g.literary Braille,scientific Braille,Braille music,Braille poetry,pharmaceutical Braille),medium of rendition(six-dot Braille for paper,eight-dot for computers),and contraction levels(from two levels in British English Braille to five in the recently revitalised Norwegian Braille).Added to this comes the issue of Braille character sets(Christensen 2009).In section 2.3,we return to current capabilities and limitations of technologies for signed and haptic modalities.1.4 Endless possibilities vs boundless risks,ethical challengesThe above scenarios sketch out some exciting advances,and important limitations.There are some additional conspicuous gaps in our story.Every new technology drags behind it the inequalities of the world,and usually contributes to them in ways nobody thought to foresee.Perhaps the most obvious inequality will be financial access to expensive new gadgets.This will inevitably follow and perhaps worsen familiar disadvantages,both enabling and disenfranchising different groups according to their means.Access will certainly not correlate to need,or environmental impact sustained(Bender et al.2021).There have already been concerns raised about inequalities and injustice in emerging language technologies,for example poorer performance in non-standard language varieties(including of ethnic minorities),or citizens being unjustly treated due to technologies(https:/ is widely used to support decisions in life-altering scenarios including employment,healthcare(Char et al.2018),justice,and finance:who gets a loan,who gets a job,who is potentially a spy or a terrorist,who is at risk of suicide,which medical treatment one receives,how long a prison sentence one serves,etc.But NLP is trained on human language,and human language contains human biases(Saleiro et al.2020).This inevitably feeds through into NLP tools and language models(Blodgett et al.2020).Work is underway to address this(Bender 2019;Beukeboom&Burgers 2020;Benjamin 2020;Saleiro et al.2020).Remedies could lead to improved equality,or perhaps polarise society in new ways.LITHME is here to pay attention to all these possible outcomes,and to urge collaboration that is inclusive and representative of society.A further major gap was discussed in the previous section:sign languages.There have been many attempts to apply similar technology to sign language:smart gloves that decode gestures into words and sentences,and virtual ava-tars that do the same in reverse.But the consensus among the Deaf community so far is that these are a profoundly poor substitute for human 
interpreters.They over-simplify,they elide crucial nuance,and they completely miss the 11Language In The Human-Machine Era lithme.eu COST Action 19102diversity of facial expression,body posture,and social context that add multiple layers of meaning,emphasis and feeling to sign.Moreover,these technologies help non-signers to understand something from sign but they strip signers of much intended meaning.The inequality is quite palpable.There are early signs of progress,with small and gradual steps towards multimodal chatbots which are more able to detect and produce facial movements and complex gestures.But this is a much more emergent field than verbal translation,so for the foreseeable future,sign language automation will be distantly inferior.Another issue is privacy and security.The more we speak through and to a companys technology,the more data we provide.AI feeds on data,using it to learn and improve.We already trade privacy for technology.AI,the Internet of Things,and social robots all offer endless possibilities,but they may conceal boundless risks.Whilst improving user experiences,reducing health and safety risks,easing communication between languages and other benefits,technology can also lead to discrimination and exclusion,surveillance,and security risks.This can take many forms.Some exist already,and may be exacerbated,like the“filter bubbles”(Pariser 2011),“ideological frames”(Scheufele,1999;Guenther et al.2020)or“echo chambers”(Cinelli et al.,2021)of social media,which risk intellectual isolation and constrained choices(Holone 2016).Meanwhile automatic text generation will increasingly help in identifying criminals based on their writing,for example grooming messages or threatening letters,or a false suicide letter.Such text generation technologies can also challenge current plagiarism detection methods and procedures,and allow speakers and writers of a language to plagiarise other original texts.Likewise,the automatic emulation of someones speech can be used to trick speech recognition systems used by banks,thus contributing to cybercriminal activities.New vectors for deception and fraud will emerge with every new advance.The limits of technology must be clearly understood by human users.Consider the scenario we outlined earlier,a virtual world of lifelike characters endlessly patient interlocutors,teachers,trainers,sports partners,and plenty else besides.Those characters will never be truly sad or happy for us,or empathise even if they can emulate these things.We may be diverted away from communicating and interacting with imperfect but real humans.Last but not least,another challenging setting for technology is its use by minority languages communities.From a machine learning perspective,the shortage of digital infrastructure to support these languages may hamper development of appropriate technologies.Speakers of less widely-used languages may lag in access to the exciting resources that are coming.The consequences of this can be far-reaching,well beyond the technological domain:unavailability of a certain technology may lead speakers of a language to use another one,hastening the disappear-ance of their language altogether.LITHME is here to scrutinise these various critical issues,not simply shrug our shoulders as we cheer exciting shiny new gadgets.A major purpose of this report,and of the LITHME network,is to think through and foresee future societal risks as technology advances,and amplify these warnings so that technology developers and regu-lators can act 
pre-emptively.1.5 The way aheadLITHME is a diverse network of researchers,developers and other specialists,aiming to share insights about how new and emerging technologies will impact interaction and language use.We hope to foresee strengths,weakness-es,opportunities and threats.The remainder of this report sketches the likely way ahead for the transformative technologies identified above.We move on now to a more detailed breakdown of new and emerging language technologies likely to see wide-spread adoption in the foreseeable future.The rest of the report falls into two broad areas:software;and hardware.Section 2 examines developments in computing behind the scenes:advances in Artificial Intelligence,Natural Language Processing,and other fields of coding that will power the human-machine era.Section 3 focuses on the application of this software in new physical devices,which will integrate with our bodies and define the human-machine era.12Artificial Intelligence(AI)is a broad term applied to computing approaches that enable ma-chines to learn from data,and generate new outputs that were not explicitly programmed into them.AI has been trained on a wide range of inputs,including maps,weather data,planetary movements,and human language.The major overarching goal for language AI is for machines to both interpret and then produce language with human levels of accuracy,fluency,and speed.Recent advances in Neural Networks and deep learning have enabled machines to reach un-precedented levels of accuracy in interpretation and production.Machines can receive text or audio inputs and summarise these or translate them into other languages,with reasonable(and increasing)levels of comprehensibility.They are not yet generally at a human level,and there is distinct inequality between languages,especially smaller languages with less data to train the AI,and sign languages sign is a different modality of language in which data collection and machine training are significantly more difficult.There are also persistent issues of bias.Machines learn from large bodies of human language data,which naturally contain all of our biases and prejudices.Work is underway to address this ongoing challenge and attempt to mitigate those biases.Machines are being trained to produce human language and communicate with us in in-creasingly sophisticated ways enabling us to talk to technology.Currently these chatbots 2Behind the scenes:the software powering the human-machine eraSummary and overview13Language In The Human-Machine Era lithme.eu COST Action 19102power many consumer devices including smart assistants embedded in mobile phones and standalone units.Development in this area will soon enable more complex conversations on a wider range of topics,though again marked by inequality,at least in the early stages,between languages and modalities.Automatic recognition of our voices,and then production of synthesised voices,is progressing rapidly.Currently machines can receive and automatically transcribe many languages,though only after training on several thousand hours of transcribed audio data.This presents issues for smaller languages.Deep learning has also enabled machines to produce highly lifelike synthetic voices.Recently this has come to include the ability to mimic real peoples voices,based on a similar principle of churning through long recordings of their voice and learning how individual sounds are pro-duced and combined.This has remarkable promise,especially when combined with automated translation,for both 
dubbing of recorded video and translation of conversation,potentially enabling us to talk in other languages,in our own voice.There are various new ways of talking through technology that will appear in the coming years.Aside from text and voice,attempts are underway to train AI on sign language.Sign is an entirely different system of language with its own grammar,and uses a mix of modalities to achieve full meaning:not just shapes made with the hands but also facial expression,gaze,body posture,and other aspects of social context.Currently AI is only being trained on handshapes;other modalities are simply beyond current technologies.Progress on handshape detection and production is focused on speed,accuracy,and making technologies less intrusive moving from awkward sensor gloves towards camera-based facilities embedded in phones and web-cams.Still,progress is notably slower than for the spoken and written modalities.A further significant challenge for machines will be to understand what lies beyond just words,all the other things we achieve in conversation:from the use of intonation(questioning,happy,aggressive,polite,etc.),to the understanding of physical space,implicit references to common knowledge,and other aspects woven into our conversation which we typically understand alongside our words,almost without thinking,but which machines currently cannot.Progress to date in all these areas has been significant,and more has been achieved in recent years than in the preceding decades.However,significant challenges lie ahead,both in the state of the art and in the equality of its application across languages and modalities.This section covers advances in software that will power the human-machine era.We describe the way machines will be able to understand language.We begin with text,then move on to speech,before looking at paralinguistic features like emotion,sentiment,and politeness.Underlying these software advances are some techniques and processes that enable machines to understand human speech,text,and to a lesser extent facial expression,sign and gesture.Deep learning techniques have now been used extensively to analyse and understand text sequences,to recognise human speech and transcribe it to text,and to translate between languages.This has typically relied on supervised machine learning approaches;that is,large manually annotated corpora from which the machine can learn.An example would be a large transcribed audio database,from which the machine could build up an understanding of the likelihood that a certain combination of sounds correspond to certain words,or(in a bilingual corpus)that a certain word in one language will correspond to another word in another language.The machine learns from a huge amount of data,and is then able to make educated guesses based on probabilities in that data set.The term Neural Networks is something of an analogy,based on the idea that these probabilistic models are working less like a traditional machine with fixed inputs and outputs and more like a human brain,able to arrive at new solutions somewhat more independently,having learned from prior data.This is a problematic and somewhat superficial metaphor;the brain cannot be reduced to the sum of its parts,to its computational abilities(see e.g.Epstein 2016;Cobb 2020;Marincat 2020).Neural Networks do represent a clear advance from computers that simply repeat code programmed into them.Still,they continue to require extensive prior data and programming,and have less flexibility in computing the 
importance and accuracy of data points.This is significant 14Language In The Human-Machine Era lithme.eu COST Action 19102in the real world because,for example,the large amounts of data required for deep learning are costly and time consuming to gather.Investment has therefore followed the line of greatest utility and profit with lowest initial cost.Low-resource languages lose out from deep learning.Deep Neural Networks(DNNs),by contrast,work by building up layers of knowledge about different aspects of a given type of data,and establishing accuracies more dynamically.DNNs enable much greater flexibility in determining,layer by layer,whether a sound being made was a k or a g and so on,and whether a group of sounds together corresponded to a given word,and words to sentences.DNNs allow adaptive,dynamic,estimated guesses of linguistic inputs which have much greater speed and accuracy.Consequently,many commercial products inte-grate speech recognition;and some approach a level comparable with human recognition.Major recent advances in machine learning have centred around different approaches to Neural Networks.Widely used technical terms include Recurrent Neural Networks(RNNs),Long Short-Term Memory(LSTM),and Gated Recurrent Units(GRUs).Each of these three can be used for a technique known as sequence-to-sequence,se-q2seq.Introduced by Google in 2014(https:/arxiv.org/pdf/1409.3215.pdf),seq2seq analyses language input(speech,audio etc.)not as individual words or sounds,but as combined sequences;for example in a translation task,interpreting a whole sentence in the input(based on prior understanding of grammar)and assembling that into a likely whole sentence in a target language all based on probabilities of word combinations in each language.This marks a major advance from translating word for word,and enables more fluent translations.In particular it allows input and output sequences of different lengths,for example a different number of words in the source and translation useful if source and target languages construct grammar differently(for example presence of absence of articles,prepositions,etc.)or have words that dont translate into a single word in another language.The above is a highly compressed review of some of the underlying machinery for machine learning of language.Worth also noting that many of these same processes are used in areas like automatic captioning of photos(inter-preting what is in a photo by comparing similar combinations of colours and shapes in billions of other photos),facial recognition(identifying someones unique features by referring to different layers of what makes a face look like a human,like a man,like a 45 year old,and so on),self-driving cars(distinguishing a cyclist from a parking space),and so on.These algorithms will govern far more than language technology in the human-machine era.We move on now to discuss how these underlying machine smarts are used to analyse text,speech,paralinguistic features like sentiment,and then visual elements like gesture and sign.2.1 Text TechnologyHeadline terminology for automated text facilities include:information extraction,semantic analysis,sentiment analysis,machine translation,text summarisation,text categorisation,keyword identification,named entity recog-nition,and grammar/spell-checkers,among others.A major challenge for NLP research is that most information is expressed as unstructured text.Computational models are based on numerical entities and probabilistic modelling;but natural language is obviously not so 
straightforward.Furthermore,the number of categories that exist in natural language data is magnitudes greater than,say,image processing.Success in NLP applications has therefore been slower and more limited.2.1.1 Translation of textsHumans have long had high hopes for machine translation;but for many years these hopes were in vain.The ALPAC report(Pierce&Carroll 1966)conveyed a sense of that disappointment.Significant technological invest-ment at this time was paying off in the developments of the early internet.Investment in machine translation,however,generated much less satisfying results.Initial attempts at machine translation were rule-based,built on the assumption that,if a computer was given a set of rules,eventually it would be able to translate any combination of words.Preliminary results of trials run on short messages produced under tightly controlled circumstances were promising.However,when fed texts pro-duced naturally(often containing ungrammatical formulations),the system fell down.This is because translation is not about words,but about meanings.Computers have long struggled to process meanings in a source language and produce them in a target language.15Language In The Human-Machine Era lithme.eu COST Action 19102Attempts at machine translation were soon dropped,but were resumed later on by projects such as Google Translate,which approached the problem not based on rules but statistics,not on direct dictionary correspondence but on the likelihood of one word following another,or surrounding others in the semantic space.Statistical machine translation systems first aligned large volumes of text in a source and target language side by side,and then arrived at statistical assumptions for which words or word combinations were more likely to produce the same meanings in another language.Companies like Google were ideally placed for this,as they indexed trillions of pages written in many languages.The system would soon become a victim of its own success,as companies and users worldwide started using poor quality translations,including those produced by Google,to produce websites in many different languages.As a result,poor quality data fed into the same system.Garbage in,garbage out.Statistical machine translation,too,then fell short of expectations,and Google invited their users to correct the translations produced by the system.Translation is nowadays perhaps the area where human-machine interaction technologies have advanced the most.Yet,not all types of translation have evolved at the same pace;translation of written language has progressed more than spoken and haptic languages.More recently,research has focused on neural machine translation(NMT).The rationale behind NMT is that technology is able to simulate human reasoning and hence produce human-like machine translations.Indeed,the functions of MT are likely to continue to expand.In the area of machine translation there are now various utilities including Google Translate,Microsoft Translate and DeepL.Open source alternatives include ESPNet,and FBK-Fairseq-ST.These are based on deep learning techniques,and can produce convincing results for many language pairs.Deep learning uses large datasets of previously translated text to build probabilistic models for translating new text.There are many such sources of data.One example is multilingual subtitles:and within these,a particularly useful dataset comes from TED talks these are routinely translated by volunteers into many languages with adminis-tratively managed quality 
checks;they cover a variety of topics and knowledge domains,and they are open access(Cettolo et al.2012).There are limitations,for example translations are mainly from English to other languages;and since many talks are pre-scripted,they may not represent typical conversational register(Dupont&Zufferey 2017;Lefer&Grabar 2015).TED talks are nevertheless valuable for parallel data.They are employed as a data set for statistical machine translation systems and are one of the most popular data resources for multilingual neural machine translation(Aharoni et al.2019;Chu et al.2017;Hoang et al.2018;Khayrallah et al.2018;Zhang et al.2019).The accuracy of machine translation is lower in highly inflected languages(as in the Slavic family),and aggluti-native languages(like Hungarian,Turkish,Korean,and Swahili).In many cases,this can be remedied with more data,since the basis of deep learning is precisely to churn through huge data sets to infer patterns.This,however,presents problems for languages spoken by relatively small populations often minority languages.Hence,prog-ress is running at different paces,with potential for inequalities.Even though deep learning techniques can provide good results,there are still rule-based machine translation sys-tems in the market like that of the oldest machine learning company SYSTRAN().There are also open source systems like Apertium(apertium.org).These toolkits allow users to train neural machine translation(NMT)systems with parallel corpora,word embeddings(for source and target languages),and dictionaries.The different toolkits offer different(maybe overlapping)model implementations and architectures.Nematus(https:/ an attention-based encoder-decoder model for NMT built in Tensorflow.OpenNMT(,https:/www.aclweb.org/anthology/P17-4012)and MarianNMT(https:/marian-nmt.github.io/)are two other open source translation systems.One of the most prolific open source machine translation systems is the Moses phrase-based system (https:/www.statmt.org/moses),used by Amazon and Facebook,among other corporations.Moses was also successfully used for translation of MOOCs across four translation directions from English into German,Greek,Portuguese,and Russian(Castilho et.al.2017).Another research trend is AI-powered Quality Estimation(QE)of machine translation.This provides a quality indication for machine translation output without human intervention.Much work is being undertaken on QE,and some systems such as those of Memsource(https:/ available;but so far none seems to have reached sufficient robustness for large-scale adoption.According to Sun et al.(2020),it is likely that QE models trained on publicly available datasets are simply guessing translation quality rather than estimating it.Although QE models might capture fluency of translated sentences and complexity of source sentences,they cannot model adequacy of translations effectively.There could be vari-16Language In The Human-Machine Era lithme.eu COST Action 19102ous reasons for this,but this ineffectiveness has been attributed to potential inherent flaws in current QE datasets,which cause the resulting models to ignore semantic relationships between translated segments and the originals,resulting in incorrect judgments of adequacy.CJEU MT Systran SYStem TRANSlation has contributed significantly to machine translation(https:/curia.europa.eu/jcms/upload/docs/application/pdf/2013-04/cp130048en.pdf).Another example is the European Unions eTranslation online machine translation service,which is provided by the European 
Commission (EC) for European official administration, small and medium-sized enterprises (SMEs), and higher education institutions (https://ec.europa.eu/info/resources-partners/machine-translation-public-administrations-etranslation_en). Bergamot (browser.mt/) is a further interesting project whose aim is to add and improve client-side machine translation in a web browser. The project will release an open-source software package to run inside Mozilla Firefox. It aims to enable bottom-up adoption by non-experts, resulting in cost savings for private and public sector users. Lastly, ParaCrawl (paracrawl.eu/) is a European project which applies state-of-the-art neural methods to the detection of parallel sentences and the processing of the extracted corpora.

As mentioned above, translation systems tend to focus on languages spoken by large populations. However, there are systems focusing on low-resource languages. For instance, the GoURMET project (https://gourmet-project.eu/) aims to use and improve neural machine translation for low-resource language pairs and domains. The WALS database (https://wals.info/) (Dryer & Haspelmath 2013) is used to improve systems (language transfer), especially for less-resourced languages (Naseem et al. 2012; Ahmad et al. 2019).

Machine translation has been particularly successful when applied to specialised domains, such as education, health, and science. Activities focused on specific domains abound: for example, the Workshop on Machine Translation (WMT) has offered a track on biomedical machine translation which has led to the development of domain-specific resources (http://www.statmt.org/wmt20/biomedical-translation-task.html). There are limited parallel corpora, and much more monolingual data, in specialised domains (e.g. for the biomedical domain: https://www.aclweb.org/anthology/L18-1043.pdf). Back-translation is studied as a way to integrate monolingual corpora into NMT training for domain-adapted machine translation (https://www.aclweb.org/anthology/P17-2061.pdf). The European Language Resource Coordination (ELRC, http://lr-coordination.eu/node/2) is gathering data (corpora) specialised for Digital Service Infrastructures. The EU's Connecting Europe Facility (CEF) in Telecom enables cross-border interaction between organisations (public and private). Projects financed by CEF Telecom usually deliver domain-specific corpora (especially for less-resourced languages) for training and tuning of the eTranslation system. Examples include MARCELL (marcell-project.eu) and CURLICAT (curlicat.eu).

Currently, the main obstacle is the need for huge amounts of data. As noted above, this creates inequalities for smaller languages. Current technology based on neural systems conceals a hidden threat: neural systems require much more data for training than rule-based or traditional statistical machine-learning systems. Hence, technological language inclusion depends to a significant extent on how much data is available, which widens the technological gap between resourced and under-resourced languages. Inclusion of additional, under-resourced languages is desirable, but this becomes harder as the resources to build on are scarce. Consequently, these languages will be excluded from the use of current technologies for a long time to come, and this might pose serious threats to the vitality and future active use of such languages. A useful analytical tool to assess the resources of such languages is the Digital Language Vitality Scale (Soria 2017). Advances in transfer learning may help here (Nguyen & Chiang 2017; Aji et al. 2020), as well as less supervised MT (Artetxe et al. 2018). Relevant examples include HuggingFace (https://huggingface.co/Helsinki-NLP/opus-mt-mt-en) and OPUS (opus.nlpl.eu).
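To make the toolkits and models mentioned above more tangible, the short sketch below loads one of the freely published OPUS-MT models through the open-source Hugging Face transformers library and uses it to translate two sentences. It is purely illustrative and is not drawn from any of the projects described in this report; the specific model identifier (Helsinki-NLP/opus-mt-en-de, English to German) and the example sentences are assumptions chosen for the sketch, and any other published language pair could be substituted.

```python
# Minimal sketch: translating text with a pretrained OPUS-MT model via the
# Hugging Face transformers library. The model name and sentences below are
# illustrative assumptions, not part of any system described in this report.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"   # English -> German; other pairs exist
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

sentences = [
    "Where is the nearest railway station?",
    "The report will be published next year.",
]

# Tokenise the source text, let the sequence-to-sequence model generate
# target-language token IDs, then decode those IDs back into plain text.
batch = tokenizer(sentences, return_tensors="pt", padding=True)
outputs = model.generate(**batch)
for source, output in zip(sentences, outputs):
    print(source, "->", tokenizer.decode(output, skip_special_tokens=True))
```

Under the hood this is the encoder-decoder, sequence-to-sequence approach sketched earlier in this section: whole sentences are encoded and decoded as sequences, rather than translated word for word.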
There is also a need to consider the economic impact for translation companies. For example, in Wales the Cymen translation company has developed and trained its own NMT within its workflow, as part of the public-private SMART partnership (https://businesswales.gov.wales/expertisewales/support-and-funding-businesses/smart-partnerships). Other companies have adopted similar approaches. The benefits of such technology are evident, although their use raises issues related to ownership of data, similar to older ethical questions of who owns translation memories.

Human translators have not yet been entirely surpassed, but machines are catching up. A 2017 university study of Korean-English translation, pitting various machine translators against a human rival, came out decisively in favour of the human; but still the machines averaged around one-third accuracy (Andrew 2018). Another controlled test, comparing the accuracy of automated translation tools, concludes that "new technologies of neural and adaptive translation are not just hype, but provide substantial improvements in machine translation quality" (Lilt Labs 2017). More recently, Popel et al. (2020) demonstrated a deep learning system for machine translation of news media, which human judges assessed as more accurate than human translation, though not yet as fluent. This was limited to news media, which is a specific linguistic register that follows fairly predictable conventions compared to conversation, personal correspondence, etc. (see Biber & Conrad 2009); but this still shows progress.

2.1.2 Sentiment, bias

Sentiment analysis is the use of automated text analysis to detect and infer opinions, feelings, and other subjective aspects of writing: for example, whether the writer was angry or happy. Extensive contributions have been made already, especially in more widely spoken languages (see Yadav & Vishwakarma 2020 for an accessible review). Social networking sites represent a landscape continuously enriched by vast amounts of data daily. Finding and extracting the hidden "pearls" from the ocean of social media data constitutes one of the great advantages that sentiment analysis and opinion mining techniques can provide. Nevertheless, features of the language used on social networks, such as tagging, likes, and the context of a comment, have yet to be explored by communities in computation, linguistics, and the social sciences in order to improve automatic sentiment analysis performance. Some well-known business applications include product and service reviews (Yang et al. 2020), financial markets (Carosia et al. 2020), customer relationship management (Capuano et al. 2020), marketing strategies and research (Carosia et al. 2019), politics (Chauhan et al. 2021), and e-learning environments (Kastrati et al. 2020), among others.

Most work on sentiment extraction has focused on English or other widely used languages; only a few studies have identified and proposed patterns for sentiment extraction applicable across multiple languages (i.e. for bridging the gap between languages) (Abbasi et al. 2008; Vilares et al. 2017). Focusing now on machine translation, the authors in Baccianella et al. (2010), Denecke (2008) and Esuli & Sebastiani (2006) performed sentiment classification for German texts using a multilingual approach. The authors translated the German texts into English and then used SentiWordNet to assign polarity scores.
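The sketch below illustrates this translate-then-classify strategy in its simplest form: a non-English text is machine-translated into English, and an English sentiment classifier is then applied to the translation. It is a simplified stand-in for the pipelines used in the studies cited here, not a reproduction of them; the model identifiers and the example sentences are assumptions made for the illustration. Any translation errors propagate directly into the sentiment judgement, a drawback taken up immediately below.

```python
# Minimal sketch of the translate-then-classify approach to multilingual
# sentiment analysis. Model names and example sentences are illustrative
# assumptions, not the resources used in the studies cited in this report.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")
classifier = pipeline("sentiment-analysis")  # default English sentiment model

german_texts = [
    "Das Hotel war wunderbar und das Personal sehr freundlich.",
    "Das Essen war kalt und der Service enttäuschend.",
]

for text in german_texts:
    english = translator(text)[0]["translation_text"]   # step 1: translate
    result = classifier(english)[0]                      # step 2: classify
    print(f"{text!r} -> {english!r} -> {result['label']} ({result['score']:.2f})")
```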
Poncelas et al. (2020) discussed both advantages and drawbacks of sentiment analysis on translated texts. They reported exceptionally good results from English to languages like French and Spanish, which are relatively close to English in grammar, syntax, etc., but less good results for languages like Japanese, which are structurally more distinct. Shalunts et al. (2016) investigated the impact of machine translation on sentiment analysis. The authors translated Russian, German and Spanish datasets into English. The experimental results showed less than 5% performance difference for sentiment analysis on English vs. non-English datasets. This gives an indication that machine translation can help to create multilingual corpora for sentiment analysis. Balahur & Turchi (2014) used machine translation to translate an English dataset of New York Times articles into German, French and Spanish using three different translators (Google, Bing & Moses). These four versions of the text were then used to train a multilingual sentiment classifier. For the test, the authors also used Yahoo Translator. The results supported the quality of the translated text and of the sentiment analysis. Barriere & Balahur (2020) proposed using automatic translation together with multilingual transformer models, drawing on recent advances in NLP to address sentiment analysis across language combinations. For more detailed analysis in this area, see Lo et al. (2017).

On the issue of bias, machine learning has been applied to, for example, hyperpartisan news detection; that is, news articles biased towards a person, a party or a certain community (Färber et al. 2019). Bias, however, has increasingly been discussed as an issue in language created automatically by machines themselves. Popular cited examples include Google Translate translating non-gendered languages like Finnish and adding gendered pronouns according to traditional gender associations: "he works, she cooks", etc. One of the challenges faced by machine learning systems and methods in general is judging the "fairness" of the computational model underlying those systems. Because machine learning uses real data produced by real people, to which some sort of statistical processing is applied, it is reasonable to expect that the closer those systems are to human communication, the more likely they are to reproduce all things good and bad about the respective population. When training corpora are skewed towards white American English-speaking males, the systems tend to be more error-prone when handling speech by English-speaking females and varieties of English other than American (Hovy et al. 2017; Tatman 2017; see also https://plan-norge.no/english/girls-first; Costa-Jussà 2019). Such systems reproduce social and cultural issues and stereotypes (Nangia et al. 2020; Vanmassenhove et al. 2018), and racial bias (Saunders et al. 2016; Lum & Isaac 2016).

Further relevant technical terminology in this field includes:
- sentiment ontologies enrichment and refinement
- syntactic-semantic relations
- metaphoric and implicit language properties
- sentiment evaluative terms
- multimodal contexts for spoken data
- analysis performance

Likely future developments

Work is underway to mitigate gender and other bias in machine learning, for example the automatic gendering discussed above (e.g. Sun et al. 2019; Tomalin et al. 2021). This will be especially important since automatically produced texts feed into future machine learning, potentially exacerbating their own biases. There are also early attempts
to mobilise automated sentiment analysis for predicting suicide or self-harm,using the writing of known sufferers and victims to predict these conditions in others,scaled up using massive data sets(see e.g.Patil et al.2020).From the clinical to the verificational and forensic:voice is already used as an alternative to passwords in call centres(voice signature verified by algorithm);and sentiment analysis is under development for identifying early signs of political extremist behaviour or radicalisation(see e.g.Asif et al.2020;De Bruyn 2020).The focus on text brings distinct limitations for other modalities speech,sign,gesture,etc.Further studies are also required to address the cross-lingual differences and to design better sentiment classifiers.Future devel-opments will also seek to enhance detection approaches with more accurate supervised/semi-supervised ML techniques,including transfer(transformer)models.From the linguistic standpoint,many approaches have been recently introduced,such as Googles Neural Machine Translation(https:/research.google/pubs/pub45610/)for delivering English text contextually similar to a certain foreign language.2.1.3 Text-based conversationWithin technology circles,chatbots are seen as relatively primitive early predecessors to smarter and more complex successors;terms for these include“dialogue systems”(Klwer 2011),“conversational interfaces”and“conversational AI”.However,the term chatbot has stuck and become much more common;it is therefore likely to continue dominating the popular understanding of all sorts of conversational interfaces,including dialogue systems,intelligent agents,companions and voice assistants.So we use the term chatbot in this report as an umbrella term.Current chatbots are very heterogeneous.This section is only a brief overview of all aspects of chatbot technology.For a more detailed reference see for example McTear(2020).Chatbots embody a long-held fantasy for humanity:a machine capable of maintaining smart conversations with its creator.Chatbot technology has three principle requirements:understanding what the user said;understanding what to do next;and doing this next(usually sending a response,sometimes also performing other actions).ELIZA(Weizenbaum 1966)is recognised to be the first chatbot.It was followed by thousands of similar machines.ELIZA was primitive:able to recognise patterns in written input,and retrieve precompiled responses.Over time,the complexity of the language comprehension capabilities increased.Audio-and video-signals were also added to the initial text-only communication.A variety of use cases for chatbots have been explored in academic research,such as education,health,companion-ship,and therapy.Despite significant research,only a few of the first chatbots reached the commercial market and a wider audience(usually customer service contexts).Some car manufacturers installed conversational interfaces for GPS controls and hands-free phone calls.More complex,technical,forensic or clinical uses are likely some way off;indeed current early experiments have led to some alarming initial results,such as a prototype healthcare chat-bot answering a dummy test patients question“Should I kill myself?”,with“I think you should”(Hutson 2021).In 2015,social network providers realised that people use instant messengers more intensively than social networks.This was the time of the“chatbot revolution”:messengers opened their APIs to developers and encouraged them to become chatbot developers by providing learning resources and 
free-of-charge access to developer tools.Natural Language Understanding as a service became a rapidly developing business area.19Language In The Human-Machine Era lithme.eu COST Action 19102Natural Language Understanding(NLU)includes a range of technologies such as pattern-based NLU;these are powerful and successful due to a huge number of stored patterns.For instance,AIML(Artificial Intelligence Mark-up Language)forms the brain of KuKi(former Mitsuku),the Loebner prize-winner chatbot.2.2 Speech TechnologyThe previous section discussed machines analysing and producing written language,including translation.The current section turns to machines working on spoken language,also including a focus on translation.Relevant terminology includes Automatic Speech Recognition(ASR)and Speech-To-Text(STT).The human voice is impressive technology.It allows hearing people to express ideas,emotions,personality,mood,and other thoughts to other hearing people.In addition to linguistic characteristics,speech carries important para-linguistic features over and above the literal meaning of words,information about intensity,urgency,sentiment,and so on can all be conveyed in our tone,pace,pitch and other features that accompany the sounds we call words.Think of the word sorry.You could say this sincerely or sarcastically,earnestly or reluctantly,happily or sadly;you could say it in your local dialect or a more standard form;as you say it you could cry,sigh,exhale heavily,etc.;and if you heard someone saying sorry,you could immediately decode all these small but highly meaningful nuances,from voice alone.Context matters too:are you sorry only for yourself,or on behalf of someone else?Are you apologising to one person,two people,a whole country,or the entire United Federation of Planets?Fully understanding an apology means fully grasping these contextual details.Now think about programming a machine to grasp all that,to listen like a human.Its much more than simply teaching the machine to piece together sounds into words.But progress is occurring.The evolution of speech recognition and natural language understanding have opened the way to numerous applications of voice in smart homes and ambient-assisted living,healthcare,military,education etc.Speech technologies are considered to be one of the most promising sectors,with the global market estimated at$9.6 billion in 2020 and forecasted increase to$32.2 billion by 2027(Research&Markets 2020).But as we have cautioned already,if private corporations are leading on these technologies,then significant concerns arise with regard to data security,privacy,and equality of access.Figure 2.Automatic speech recognition and voice synthesis2.2.1 Automatic speech recognition,speech-to-text,and speech-to-speech2.2.1.1 What is it,and how is it performed?Automatic Speech Recognition(ASR)is the ability of devices to recognize human speech.In 1952,the first speech recognizer Audrey was invented at Bell Laboratories.Since then,ASR has been rapidly developing.In the early 1970s,the US Department of Defences Advanced Research Projects Agency funded a program involving ASR.This led to Carnegie Mellon Universitys Harpy(1976),which could recognize over 1000 words.In the 1980s,20Language In The Human-Machine Era lithme.eu COST Action 19102Hidden Markov Models(HMMs)also made a big impact,allowing researchers to move beyond conventional recognition methods to statistical approaches.Accuracy,accordingly,increased.By the 1990s,products began to appear on the market.Perhaps the most well-known 
is Dragon Dictate (released in 1990) which, though cutting edge for its time, actually required consumers to "train" the algorithm themselves, and to speak very slowly. Progress from this point was relatively slow until the 2010s, when Deep Neural Networks (DNNs, discussed earlier) were introduced in speech engineering. Commercial speech recognition facilities include Microsoft Windows' inbuilt dictation facility, IBM Watson, Amazon Transcribe, and Google Speech-to-Text. Open source alternatives include Mozilla Deep Speech, Jasper (https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition/jasper), Kaldi ASR, and Fairseq-S2T (Wang et al. 2020). Notably, some of these open source facilities are developed by private companies (e.g. Facebook, NVIDIA) with their own incentives to contribute to other products in their portfolio. A recently founded EU-funded project, MateSUB, is leveraging these kinds of capabilities specifically for adding subtitles following speech recognition. Machine translation of subtitles was the topic of the SUMAT project (http://www.fp7-sumat-project.eu/).

2.2.1.2 What are some of the challenges?

Many challenges remain for machines to faithfully and reliably decode human speech. These include at least the following:
- Different words that sound the same, like here/hear, bare/bear. These are known as homophones (i.e. same sound) and require more than just the sound alone to understand
- Rapidly switching between dialects or languages (code-switching), which is extremely common in normal human conversation around the world
- Variability in the volume or quality of someone's voice, including things like illness, or physical blockages like chewing food
- Ambient sounds like echoes or road noise
- Transfer influences from one's first language(s) to second languages (Elfeky et al. 2018; Li et al. 2017)
- All sorts of other conversational devices we use, like elisions (skipping sounds within words to say them more easily), or repair (making a small error and going back to correct it)
- Paralinguistic features: pace, tone, intonation, volume

For all these various levels of meaning and nuances of speech, there are relatively few annotated training sets, that is, databases of speech that contain not only transcribed speech but all that other necessary information, in a format a machine could understand. This is especially acute for lesser-resourced languages. And if systems are initially trained on English (and/or English paired with other languages), and then transferred to other languages and language pairs, there could be a bias towards the norms of the English language, which might differ from those of other languages. The issue of speaker variation (individual variation, non-standard dialects, learner varieties, etc.) requires greater attention. Equal access to speech recognition technology will depend heavily on this. A standardized framework for describing these is under development in ISO (International Organization for Standardization).

Likely future improvements

Traditional speech technologies require massive amounts of transcribed speech and expert knowledge. While this works fairly well for major languages, the majority of the world's languages lack such resources. A large amount of research is therefore dedicated to the development of speech technologies for low-resource or zero-resource languages. The idea is to mimic the way infants learn to speak: spontaneously, directly from raw sensory input, with minimal or no supervision. A huge step towards unsupervised speech recognition was made when Facebook released wav2vec (Schneider et al. 2019) and its successor wav2vec 2.0 (Baevski et al. 2020), which is able to achieve a 5.2% word error rate using only 10 minutes of transcribed speech. It learns speech representations directly from raw speech signals without any annotations, requiring no domain knowledge, while the model is fine-tuned using only a minimal amount of transcribed speech. This holds great promise, though success will depend on factors including accessibility, privacy concerns, and end-user cost.
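As an illustration of how such pretrained models are used in practice, the sketch below transcribes a short audio file with a publicly released wav2vec 2.0 checkpoint through the Hugging Face transformers library. The checkpoint name (facebook/wav2vec2-base-960h, fine-tuned on English read speech) and the file name are assumptions made for the example; the sketch does not reproduce the 10-minute low-resource setting reported above.

```python
# Minimal sketch of automatic speech recognition with a pretrained wav2vec 2.0
# checkpoint via the Hugging Face transformers library. The checkpoint name
# and the audio file are illustrative assumptions.
import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Load a mono recording sampled at 16 kHz (the rate this checkpoint expects).
speech, sample_rate = sf.read("recording.wav")

inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits           # frame-level character scores
predicted_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
print(processor.batch_decode(predicted_ids)[0])
```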
Mozilla Deep Speech, accompanied by its annotated-data CommonVoice initiative (see next section), aims to employ transfer learning. That is, models previously trained on existing large annotated corpora, such as for American English, are adapted and re-trained with smaller annotated corpora for a new domain and/or language. Such an approach has been proven to be viable for bootstrapping speech recognition in a voice assistant for a low-resourced language. Companies like Google, as well as many university AI research groups, are busily attempting to apply self-supervised learning techniques to the automatic discovery and learning of representations in speech. With little or no need for an annotated corpus, self-supervised learning has the potential to provide speech technology to a very wide diversity of languages and varieties: see for example https://icml-sas.gitlab.io/. Further challenges ahead for automated subtitling include improved quality, and less reliance on human post-editing (Matusov et al. 2019).

2.2.2 Voice Synthesis

2.2.2.1 What is it, and how is it performed?

Voice synthesis is the ability of machines to produce speech sounds artificially. With origins as a niche research field restricted to the most highly trained specialists, speech synthesis is now a large domain with people of varying specialisms producing core components of successful commercial products. The success of voices like Amazon's Alexa and Apple's Siri was built on years of work on speech modelling and parametrization. Contemporary systems, based on advanced neural modelling techniques, get us closer to bridging the quality and naturalness gap while still offering flexibility and control. They are capable of modelling challenging heterogeneous data, i.e. data that contains multiple sources of variation such as speakers and languages, non-ideal recording conditions, and expressive and spontaneous speech.

The current cutting edge in voice synthesis is to go beyond simply creating a lifelike robot voice, and instead to fully mimic a real person. This can be achieved by mobilising the recently discussed deep learning advances in AI. A voice synthesis algorithm can take a recording of a person's voice (ideally a relatively long one with a range of different sounds), and then apply deep learning techniques to assemble building blocks of sounds in that person's voice: their accent, their pitch, tone, pace, and use of pauses and fillers. The machine can then create entirely new words, in new combinations, with that person's unique cadence, and with smooth transitions between words. There is a vibrant and jostling market of companies offering automated voice mimicry, for example ReSpeecher. They offer re-voicing of a user's voice into the voice of another person, and automated dubbing of video into other languages in the actors' original voices. MateDUB is an EU-funded project for automatically creating dubbed audio. Anyone is free to upload their voice and set a price for using the new automated voice. This means it isn't free, but it is significantly cheaper than voice actors recording
There is a vibrant and jostling market of companies offering automated voice mimicry, for example ReSpeecher. They offer re-voicing of a user's speech in the voice of another person, and automated dubbing of video into other languages in the actors' original voices. MateDUB is an EU-funded project for automatically creating dubbed audio. Anyone is free to upload their voice and set a price for using the new automated voice. This means it isn't free, but it is significantly cheaper than having voice actors record everything. Amazon is also working in this area: https://arxiv.org/pdf/2001.06785.pdf. Mozilla's Common Voice (https://commonvoice.mozilla.org) aims to bring some of these capabilities to the world in a fully free and open-source way. This is intended to complement Mozilla's Deep Speech speech recognition utility mentioned earlier in this report.

Further relevant terminology in the field of voice synthesis includes:

- Text processing and signal processing, including text normalisation, letter-to-sound conversion, short-term analysis of sequential signals, frequency analysis and pitch extraction
- Concatenative speech synthesis, including diphone synthesis and unit selection synthesis
- Statistical parametric speech synthesis, including parametrizing speech using vocoders and acoustic modelling using HMMs
- Currently: deep neural networks for acoustic modelling and waveform generation (replacing decision-tree HMM-based models and vocoders)
- Advanced techniques for acoustic modelling using sequence-to-sequence (seq2seq) models (Hewitt & Kriz 2018) for end-to-end (e2e) speech synthesis (Taylor & Richmond 2020)

2.2.2.2 What are some of the challenges?

Some challenges are similar to those of voice recognition noted earlier, for example the availability of data to train machines: Mozilla Common Voice, for instance, requires 10,000 hours of speech to add a new language. Work is ongoing to bridge this gap by applying transfer learning, as discussed earlier; see for example Jones (2020) on the development of a voice assistant for Welsh. Other challenges shared with voice recognition include linguistic structural issues like homophones and code-switching. Challenges specific to voice synthesis include:

- Achieving "naturalness", that is, passing for human according to human test subjects
- Persuasiveness and trustworthiness of automated voices; for example Dubiel et al. (2020) compare the persuasiveness of a chatbot speaking in different styles ("debating style vs. speech from audiobooks"), while Gálvez et al. (2017) discuss variability in pitch, intensity and speech rate linked to judgements of truthfulness
- Measures to detect and intercept malicious imitation used for identity fraud

Speech recognition and voice synthesis can be combined in various software applications. The application of machine translation to spoken language is more recent, though gaining rapidly in use. The principles do not differ significantly. The technology that enables two or more interlocutors to communicate in their own language in near real time already exists and is available to common users. Jibbigo, which was developed at Carnegie Mellon University in the early 2000s, is a good example. It started as an iPhone app to provide speech-to-speech translation between English and Spanish, but later included more languages, and was also ported to Android. In addition to popular tools like Microsoft Translator and Google Translate, a number of services are available that provide users with voice translation, or allow them to translate text and then read it aloud, including iTranslate, Speak to Voice Translator, Translator 2020, Translate All: Translation Voice Text & Dictionary, and Naver Papago AI Translator. Microsoft has also developed its own voice translation system, Skype Translator, which enables translation of text from/into over 60 languages, and speech from/into 11 languages (or, to be more precise, language varieties), including Chinese (Simplified and Traditional), English (UK and US), French, German, Italian, Japanese, Portuguese, Russian and Spanish. More languages will certainly follow in the near future.
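As a rough illustration of the text-translation step that underlies many of these services, the sketch below runs a small open-source neural MT model (a Helsinki-NLP OPUS-MT English-to-Spanish checkpoint) through the Hugging Face Transformers pipeline. The model choice and example sentence are assumptions for illustration, not the engine used by any of the products listed above.

```python
# Minimal sketch: the text-to-text translation step of a cascaded speech translator.
# Assumes: pip install transformers sentencepiece torch; the OPUS-MT checkpoint
# below is an openly published English->Spanish model used here only as an example.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")

result = translator("Where is the nearest train station?")
print(result[0]["translation_text"])   # e.g. "¿Dónde está la estación de tren más cercana?"
```

In a full speech-to-speech cascade, text recognised by a system like wav2vec 2.0 would be passed through such a model and the output handed to a synthesiser of the kind sketched in the previous section.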
The development of applications for machine translation of haptic languages has been slower, especially given the limitations underlying haptic language technologies. Yet systems have evolved significantly in recent years. RoboBraille.org (http://www.robobraille.org/) is an online service designed to translate text into braille, rendered as either six-dot or eight-dot Braille (Christensen 2009). A similar functionality is provided by BrailleTranslator.org (https://www.brailletranslator.org/). The Dot Translation Engine and its related tactile device, the Dot Mini, allow the visually impaired to translate documents rapidly into Braille (https://hyperinteractive.de/portfolio/dot-translation-engine/). For a current overview of technologies taking automated approaches to Braille, see Shokat et al. (2020). The Concept Coding Framework (CCF) is an attempt to provide a generic approach to making content interoperable in any language/communication modality: http://www.conceptcoding.org/.
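At their simplest, text-to-braille services of this kind begin from a character-level mapping before applying contraction rules. The toy sketch below builds only the uncontracted (Grade 1) letter mapping using the Unicode braille block, so it illustrates the basic idea rather than the contracted braille that services such as RoboBraille actually produce.

```python
# Toy sketch: uncontracted (Grade 1) letter-to-braille mapping via Unicode (U+2800 block).
# Real services additionally handle contractions, numbers, punctuation and formatting;
# this covers only the 26 basic letters and maps everything else to a blank cell.

# Dot patterns for a-j (dots 1,2,4,5); k-t add dot 3; u,v,x,y,z add dots 3 and 6; w is j + dot 6.
BASE = {
    "a": {1}, "b": {1, 2}, "c": {1, 4}, "d": {1, 4, 5}, "e": {1, 5},
    "f": {1, 2, 4}, "g": {1, 2, 4, 5}, "h": {1, 2, 5}, "i": {2, 4}, "j": {2, 4, 5},
}

def _cell(dots):
    """Convert a set of raised dots (1-6) into the corresponding Unicode braille character."""
    return chr(0x2800 + sum(1 << (d - 1) for d in dots))

LETTERS = {}
for i, (letter, dots) in enumerate(BASE.items()):
    LETTERS[letter] = _cell(dots)                     # a-j
    LETTERS[chr(ord("k") + i)] = _cell(dots | {3})    # k-t
LETTERS["w"] = _cell(BASE["j"] | {6})                 # w is historically irregular
for src, dst in zip("abcde", "uvxyz"):                # u,v,x,y,z = a,b,c,d,e + dots 3,6
    LETTERS[dst] = _cell(BASE[src] | {3, 6})

def to_braille(text):
    return "".join(LETTERS.get(ch, " ") for ch in text.lower())

print(to_braille("braille"))   # ⠃⠗⠁⠊⠇⠇⠑
```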
Most machine translation of speech currently works in a cascade: interpreting voice patterns into words, translating that text, and then reading out the result in a synthesised voice. Further advances in machine translation are working towards direct speech-to-speech translation, for example the Google Translatotron: https://google-research.github.io/lingvo-lab/translatotron/.

2.3 Visual and tactile elements of interaction

We previously discussed sign language and the contribution of factors like facial expression and body posture to the meaning of sign. Sign languages make use of a wide range of very specific bodily cues. In spoken language too, the body is used, though less precisely. For hearing people, the body is a wide paint roller; for signers, it is a fine brush. As we also noted earlier, machine analysis of the visual elements of interaction has quite some way to go in comparison to voice and text.

2.3.1 Facial expression, gesture, sign language

Sorgini et al. (2018) give a review of progress in this area. Small-sized, tailor-made, low-cost haptic interfaces are in development. These interfaces will be integrated with common devices such as smartphones, contributing to a massification of sensory assistants among those impaired. This will also mean a move from invasive sensory implants to less invasive alternatives (ibid.).

Machines have different tasks ahead of them in gauging the importance of body movements. For example, there is work aiming simply to identify who is speaking based on gesture (Gebre et al. 2013). The task gets more and more specific until we get down to the much more precise work of interpreting sign language, which is still some appreciable way from being within the command of machines. As explained by Jantunen et al. (2021), research into automated sign language detection, processing and translation is important and worthy; but currently no automated systems are anywhere close to full functionality. First of all, as we noted earlier in this report, sign languages are not just a visual rendering of spoken language. They are entirely separate languages, with their own grammar. These grammars have attracted significant research attention. A relevant source for comparative grammars of sign languages, potentially relevant for computational purposes, is the Sign-Hub project (sign-hub.eu). Additionally, rule-based machine translation has recently demonstrated promising results for sign language, given how it represents grammar (Filhol et al. 2016). But these endeavours have faced serious limitations. Sign-Hub has adopted the formal (generative) theory as its starting point, so its blueprint of sign language grammars is based on the study of spoken language, and especially English. The sensor gloves, too, require further development, not least because the data used in their test and evaluation consisted of only ten finger-alphabet and number signs plus one emblem gesture, "I love you". That is eleven sign types, and only in American Sign Language. Consequently, it still remains to be established how the information produced as part of Sign-Hub can be used to contribute to a robust and generally accepted SL translator. Hadjadj et al. (2018) proposed an alternative approach to the grammar of French Sign Language that takes into account the additional grammatical characteristics of sign language. There is much progress left to make here.

One problem for machine learning of sign language is shared with minority spoken languages: scarce and unstructured data. For spoken languages, this is somewhat easier to solve than for sign: just feed more data into the same systems used for larger languages, and/or improve the transfer learning approaches which we discussed earlier. For sign, the problem is much more complex. Systems for automatic recognition of speech (audio only) and writing simply cannot understand sign. Sign language corpora are not only smaller than spoken corpora; they are much harder to gather. Remember that sign is a different modality to speech; sign corpora must be annotated manually for machines to learn from, which is demanding and very time-consuming (Jantunen et al. 2021). The main enigma for machine translation of sign language in the near future is the bounty of unconventional characteristics that exist in all sign languages, the so-called indicating and depicting aspects of signed utterances, which are not easily translatable. As a result, data sets are fewer, and those that exist have been collected under "controlled environments with limited vocabulary" (Camgöz et al. 2018: 7785). Ultimately, machines learn from annotated models, and not directly from video as would be required to capture the inherently multimodal nature of sign. Attempts have been made to make progress in this area, by finding new ways to collect multimodal data (Camgöz et al. 2018) and to program intelligent avatars to produce sign language after translating speech (Stoll et al. 2018). Neural networks can generate such video content without relying on extensive motion capture data and complex animation. Nevertheless, as the authors caution, this work is rare and foundational, and still far behind the progress achieved so far by research into writing and speech. Another issue is the confidentiality of the individuals involved in the production of sign language samples collected for shared datasets: as a recent study suggests, signers can be recognised based on motion capture information (Bigand et al. 2020).

Although all progress is welcome, the technological advances in the field of sign language have been slow when compared to the pace at which written and spoken language technologies have evolved. As Jantunen et al. (2021) predict, machine translation of sign languages will face at least three important challenges for some time to come (that is, long after similar obstacles are overcome for spoken and written modalities): (a) multimodality, wherein meaning is made not only with hand gestures but also with a rich mix of gesture, facial expression, body posture and other physical cues, yet even the most advanced detection systems in development are focused only on hand movement; (b) there are hundreds to thousands of sign languages, but research so far has focused on major sign languages, so the complexity of sign language communication and translation is higher than gesture-recognition systems currently take into account; and (c) meaning often also depends on socio-cultural context and signers' knowledge of each other's lives, which machines cannot know (and training them to find out provokes major privacy concerns).
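To show what the hand-focused detection systems mentioned in point (a) typically start from, the sketch below extracts hand landmarks from a single video frame with the open-source MediaPipe library. The image path is a placeholder, and turning such keypoints into sign recognition would still require a trained sequence model and, as stressed above, far more than manual information alone.

```python
# Minimal sketch: extracting hand keypoints from one video frame with MediaPipe Hands.
# Assumes: pip install mediapipe opencv-python; "frame.jpg" is a placeholder image.
# Keypoints like these are a common input to sign-recognition models, but they
# capture only manual information, not facial expression or body posture.
import cv2
import mediapipe as mp

image = cv2.imread("frame.jpg")
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)      # MediaPipe expects RGB input

with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
    results = hands.process(rgb)

if results.multi_hand_landmarks:
    for hand in results.multi_hand_landmarks:
        # 21 (x, y, z) landmarks per detected hand, normalised to the image size
        coords = [(lm.x, lm.y, lm.z) for lm in hand.landmark]
        print(len(coords), "landmarks:", coords[:2], "...")
else:
    print("No hands detected in this frame.")
```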
In section 1.3.3 we discussed sign language. In section 2.3.1 we have expanded on sign recognition and synthesis, including its many persistent limitations.

2.3.2 Tactile expression and haptic technology

Haptic assistive technologies build upon the tactile sense to offer sensory information to deaf, blind and deaf-blind individuals when communicating with the non-disabled community. The visual and auditory cues received by the machine are converted into haptic feedback; that is, targeted pressure on the skin in a pattern that corresponds to meaning. It is a common misconception that the human-machine interaction of blind people requires a special, highly specific and sophisticated, Braille-enabled computer. Currently, a blind person can use an ordinary computer, equipped with an ordinary keyboard; no physical adaptations or alterations are required. They touch-type, perhaps using automated audio readouts of each keystroke to confirm. Instead of a mouse, shortcut keys are used (many such shortcuts pre-date the invention of the computer mouse and are quite standard). The information shown on the computer screen is vocalised by a "screen reader" using the voice synthesis methods discussed earlier. Screen readers also allow users to control a Braille terminal, a device connected to the computer to show the computer screen in Braille. Information can also be printed using a Braille printer.

Research into so-called tactile sign language is even scarcer (Willoughby et al. 2018), partly because tactile sign languages are still developing stable conventions. The group of deaf-blind signers is highly heterogeneous (ibid.), and the influence of sociolinguistic or fluency factors (e.g. at what life stage the tactile sign language acquisition occurred) is still unknown. Because so much pragmatic information in visual sign languages is communicated non-manually, such meaning can hardly be made, if at all, with tactile signs. We are at our most cautious when discussing progress in this area.

2.4 Pragmatics: the social life of words

As well as words and body movements, machines will also need to understand the purpose of each utterance, and how it changes the world around us. This brings us into the realm of pragmatics, including the subtle negotiation of politeness, sincerity, honesty, deception, and so on. Pragmatics has a long history as an academic discipline, and more recently the interdisciplinary field of Computational Pragmatics has arisen, somewhat as a subdiscipline of Computational Linguistics. Computational Pragmatics is perhaps less developed than other areas of Computational Linguistics, basically due to two main limitations: the need to further develop and structure Pragmatics itself as a theoretical field; and the need to further develop other subdisciplinary areas of Computational Linguistics, not least gesture recognition.

The first limitation is the bounds of Pragmatics itself. This should not be underestimated. It remains quite an enigma how we combine such a rich and diverse range of actions to achieve things in conversation.
There is simply a vast and splaying prism of pragmatic meaning-making: the myriad social meanings and intentions that go into what we say, and the similarly disparate array of effects these have in the world. Pragmatics is simply enormous, and very far from reaching any kind of unifying theories; it is sometimes affectionately (or exasperatedly) labelled "the pragmatics wastebasket" (Yule 1996: 6) for accommodating more or less all communicative phenomena that do not neatly fit in other linguistic levels (syntax, morphology, etc.). To elaborate a little further, the myriad phenomena at play include at least:

- how we encode physical distance ("You're too far away", "Come closer")
- how we select the right terms to address each other ("Madam", "buddy", "Mx", etc.)
- tacit cultural knowledge, or prior understanding that directly affects word choice, like who I am and who you are, where we are, what will happen if you drop a ball vs. a glass, etc.
- how we convince people to do things by appearing authoritative, weak, apologetic, etc.
- discourse markers and fillers ("hmm", "uhuh", "right")

And for every single one of these, there is bountiful and wonderful but baffling and unknowable diversity and change across languages and cultures, within smaller groups, between individuals, and in the same individual when talking to different people at different times. All this adds up to quite a challenge for machines to understand pragmatic meaning.

The second limitation noted above is the need to further develop other fields of Computational Linguistics. Recognising and classifying pragmatic phenomena first relies on recognising and classifying other linguistic phenomena (e.g. phonological, prosodic, morphological, syntactic or semantic). If someone tells a friend they are short of money, it could imply a request, a dispensation, a simple plea for pity, or something else; a machine cannot know which without first knowing about different ways of describing money, poverty, and so on, as well as the subtle but vital combination of gestures, facial expressions and other movements that could contribute to any of these intended meanings. All this might entail the incorporation into pragmatics-related corpora of sound at a larger scale, and its processing and annotation with adequate schemes and formats.

Archer et al. (2008) mention two definitions of computational pragmatics: "the computational study of the relation between utterances and action" (Jurafsky 2004: 578); and "getting natural language processing systems to reason in a way that allows machines to interpret utterances in context" (McEnery 1995: 12). As far as pragmatic annotation is concerned, it is noted that "the majority of the better-known (corpus-based) pragmatic annotation schemes are devoted to one aspect of inference: the identification of speech/dialogue acts" (Archer et al. 2008: 620). Some projects developed subsequently, such as the Penn Discourse Treebank (https://www.cis.upenn.edu/p-dtb/), have also worked extensively on the annotation of discourse connectives, discourse relations and discourse structure in many languages (see e.g. Ramesh et al. 2012; Lee et al. 2016; Webber et al. 2012).

Progress in the last two decades includes attempts to standardise subareas of Pragmatics, such as discourse structure (ISO/TS 24617-5:2014), discourse relations (ISO 24617-8:2016), speech act annotation (ISO 24617-2:2020), dialogue acts (ISO 24617-2:2020), and semantic relations in discourse (ISO 24617-8:2016); and even to structure the whole field of Computational Pragmatics and pragmatic annotation (Pareja-Lora & Aguado de Cea 2010; Pareja-Lora 2014) and integrate it with other levels of Computational Linguistics and linguistic annotation (Pareja-Lora 2012).
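Because speech/dialogue act identification is the pragmatic task with the most mature annotation schemes, a common baseline is simply a supervised text classifier trained on act-labelled utterances. The sketch below is a minimal, illustrative version using scikit-learn; the tiny inline training set is invented for demonstration, whereas real work would train on a corpus annotated to a scheme such as ISO 24617-2.

```python
# Minimal sketch: a bag-of-words dialogue act classifier (baseline only).
# Assumes: pip install scikit-learn. The toy labelled utterances below are
# invented for illustration; a real system would use an ISO 24617-2-style corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

utterances = [
    "could you pass the salt",      "please close the window",      # requests
    "what time does the shop open", "where did you put the keys",   # questions
    "the meeting starts at nine",   "it is raining outside",        # statements
    "thanks so much for your help", "sorry about the delay",        # social acts
]
acts = ["request", "request", "question", "question",
        "statement", "statement", "thanking", "apology"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(utterances, acts)

print(model.predict(["can you open the door", "the train leaves at six"]))
```

Such surface-level classifiers illustrate the point made above: they label the act, but say nothing about politeness, intention or effect without further linguistic and contextual features.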
Further current research concerns, for instance, the polarity of speech acts, that is, their classification as neutral, face-saving or face-threatening acts (Naderi & Hirst 2018). However, as Archer, Culpeper & Davies (2008) indicate, "unlike the computational studies concerning speech act interpretation, … corpus-based schemes are, in the main, applied manually, and schemes that are semi-automatic tend to be limited to specific domains" (e.g. "task-oriented telephone dialogues"). This is only one of the manifold limitations of research in this area. All this could be solved, to some extent, by suitable annotated gold standards to help train machine learning tools for the (semi-)automatic annotation and/or recognition of pragmatic phenomena. These gold standards would need to include and integrate annotations pertaining to all the linguistic levels discussed above: a machine may struggle to identify pragmatic values if it cannot first identify other linguistic features (for instance politeness encoded in honorifics, pronouns and verb forms). Such annotated gold standards would also be quite useful for the evaluation of any other kind of system that classifies and/or predicts some particular pragmatic phenomenon.

Another big limitation in this field, as discussed above, concerns the journey our words take out there in the real world. Much of our discussion so far in this report is about the basic message we transmit and receive: words, signs, and so on. But human language is much more complex. The basic message, the combination of sounds, the array of signs and gestures that make up our utterances, is absolutely not the end of the story for language. When we put together the words "Let me go", those words have a linguistic meaning; they also carry an intention; and then subsequently (we hope) they have an actual effect on our lives. These are different aspects of our language, all essential for any kind of full understanding. Neurotypical adults understand all these intuitively, but they must be learned; and so a machine must be trained accordingly. There have been some advances but constraints remain (mentioned below and also pervasively in this document), for example:

1. Higher-order logical representations of language, discourse and statement meaning are still partial, incomplete and/or under development.
2. Perhaps also as a consequence, computational inference over higher-order logic(s) for language (e.g. to deal with presuppositions or inference) requires further research to overcome its own current limitations and problems. Indeed, inference is said to pose "four core inferential problems" for the computational community: abduction …, reference resolution …, the interpretation and generation of speech acts …, and the interpretation and generation of discourse structure and coherence relations (Archer, Culpeper & Davies 2008).

The first of these, abduction, means roughly inference towards the best possible explanation, and has proved the most challenging for machines to learn; no great progress is expected here in the next decade. But progress on speech acts and discourse structure (and/or relations) has been robust for some widely spoken languages; and some resources and efforts are being devoted to the reference resolution problem (that is, reference, inference, and ways of referring to physical space), in the fields of (i) named entity recognition and annotation and (ii) anaphora (and co-reference) resolution.
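As one concrete building block for that reference-resolution work, the sketch below runs off-the-shelf named entity recognition with the open-source spaCy library. It assumes the small English model has been downloaded (python -m spacy download en_core_web_sm) and uses an invented example sentence; full anaphora and co-reference resolution would require an additional component on top of this step.

```python
# Minimal sketch: named entity recognition as one ingredient of reference resolution.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("Maria met the director in Helsinki on Friday, and she thanked him afterwards.")

# Entity mentions are only a starting point; linking "she"/"him" back to them
# (anaphora resolution) needs a separate co-reference component.
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Maria PERSON, Helsinki GPE, Friday DATE
```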
The final big limitation of this field is the strong dependency of pragmatic features on culture and cultural differences. Indeed, once identified, the values of these pragmatic features must be interpreted (or generated) according to their particular cultural and societal (not only linguistic) context. That pragmatic disambiguation is often a challenge for humans, let alone machines. Take face-saving or face-threatening acts: for example, we attempt face-saving for a friend when we say that something has gone missing, not that our friend lost it; face-threatening acts, by contrast, are less forgiving. The interpretation or formulation of face-saving and face-threatening acts is highly culture-dependent. This also affects, for example, the interpretation and production of distance-related features (any kind of distance: spatial, social, temporal, etc.). Earlier we mentioned some levels of pragmatic meaning: our basic literal message, our intention, and its possible effects in the world. Understanding face-saving is a key part of managing those things, and they all differ according to who we are talking to. It is almost impossible to understand pragmatic meaning without understanding a huge amount of overlapping social information. Nothing is understood before everything is understood.

In the end, all these aspects entail the codification and management of a great deal of common (or world) knowledge, information, features and values. Machines may be more able to process all these items now than in the past by means of big data processes and techniques (such as supercomputation or cloud computing). However, all these items still need to be identified and encoded in a suitable computer-readable format. The community of linked open data is working hard on this aspect, and their advances might help solve this issue in due course (Pareja-Lora et al. 2020).

Likely future developments

Seemingly, the likely future developments in this field will be the application of all this progress to the areas of:

- Human-machine interaction (e.g. chatbots). As above, chatbots can decode speech into words and grammatical structures, but they are much less adept at understanding the social purpose of our words, much less the varied interpretations of our intentions by those around us. Chatbots therefore make mistakes, and the human user usually feels frustrated. Efforts will be directed towards these issues.
- Virtual reality avatars. More and more sociolinguistic knowledge will be incorporated in the programming of these entities, to make them increasingly natural and user-friendly.
- Machine translation and interpretation. Machine (or automatic) interpretation is still a very young field, since it has to somehow encompass and integrate both natural language processing and generation. Thus, it needs both of these areas to progress before it can be developed further. However, it seems that the time is ripe for a major leap forward in this field, and machine interpretation should blossom in the coming years, hand in hand with advances within Computational Pragmatics.

2.5 Politeness

Linguistic politeness concerns the way we negotiate relationships with language. This is in some senses a subdisciplinary area of pragmatics. We design our speech in order to further our intentions. For that, we pay attention to face. In common parlance, to save face is to say something in a way that minimises the imposition or embarrassment it might cause.
It is simple everyday conversational diplomacy. Politeness theory builds on this simple insight to interrogate the various ways we attend to other people's face needs, their self-esteem and their sense of worth. Neurotypical adults have an intuitive sense of interlocutors' face needs. That sense enables a choice about whether we want to uphold or undermine those needs. Do we say "I'm sorry I wasn't clear" or "You idiot, you completely misunderstood me!", or something in between these extremes? They mean the same thing, but they attend to face needs very differently. How we attend to face needs will depend on the nature