ARTIFICIAL INTELLIGENCE MEETS BUSINESS INTELLIGENCE
Big Data's Role in the Future of Artificial Intelligence Across Key Verticals

Harnessing the Potential of Intelligence

Companies across the world, and across different industries, have embraced the potential of Artificial Intelligence (AI). The automotive, commercial, consumer, healthcare, and manufacturing sectors are investing billions into AI, hoping to boost productivity and profitability in the process. ABI Research forecasts that the total installed base of devices with AI will grow from 2.694 billion in 2019 to 4.471 billion in 2024.

Disconnected Data

There are billions of petabytes of data flowing through AI devices every day, and the volume will only increase in the years to come. The challenge now facing both technology companies and implementers is getting all of these devices to learn, think, and work together. Multimodal learning is the key to making this happen, and it is fast becoming one of the most exciting, and potentially transformative, fields of AI. This whitepaper explores the growing commercial demand for multimodal learning, as well as the key business benefits that it can deliver. It also provides an in-depth look at the end-market opportunities being created in key verticals, along with a snapshot of the current state of activity and recommendations for technology providers, chipset makers, and connectivity vendors.
[Figure: Total Installed Base of Edge AI Devices, World Markets, Forecast 2017 to 2024, installed base in millions. Source: ABI Research]

What Is Multimodal Learning?

Multimodal learning consolidates disconnected, heterogeneous data from various sensors and data inputs into a single model. Unlike traditional unimodal learning systems, multimodal systems combine modalities that carry complementary information about one another, information that only becomes evident when all of them are included in the learning process. Learning-based methods that combine signals from different modalities are therefore capable of generating more robust inference, or even new insights, which would be impossible in a unimodal system. There are two main benefits of multimodal learning:

• Predictions: Using sensors from multiple modalities to observe the same phenomenon can make predictions more robust, because detecting a change in that phenomenon may only be possible when both modalities are present.

• Inference: Fusing multiple sensor modalities can also allow for the capture of complementary information, e.g., a trend or instance that is not visible to the individual modalities on their own.
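To make this concrete, the following minimal sketch shows one common pattern, late fusion, in which each modality is encoded separately and a shared head learns from both embeddings at once. It is written in PyTorch purely for illustration; the layer sizes, names, and the choice of framework are assumptions, not anything prescribed by this paper.

# Minimal late-fusion sketch: two modality encoders feed one shared classifier.
# All layer sizes and names are illustrative assumptions, not a reference design.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, image_dim=512, audio_dim=128, hidden_dim=256, num_classes=10):
        super().__init__()
        # Per-modality encoders produce comparable embeddings.
        self.image_encoder = nn.Sequential(nn.Linear(image_dim, hidden_dim), nn.ReLU())
        self.audio_encoder = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        # The fusion head sees both embeddings at once, so it can exploit
        # complementary information that neither modality carries alone.
        self.fusion_head = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, image_features, audio_features):
        fused = torch.cat([self.image_encoder(image_features),
                           self.audio_encoder(audio_features)], dim=-1)
        return self.fusion_head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 128))  # batch of 4 samples

A unimodal baseline would see only one of the two inputs; because the fusion head here is trained on both jointly, it can pick up cross-modal patterns that neither encoder captures on its own.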
New Opportunities at the Edge

Multimodal learning will create an opportunity for chip vendors, as some use cases will need to be implemented at the edge. The implementation requirements of sophisticated edge multimodal learning systems will favor heterogeneous chip systems, because of their ability to serve both sequential and parallel processing.

Multimodal Ecosystem Development Roadmap (Source: ABI Research)

• 2011 - 2017: Adoption of DNNs into multimodal AI; multimodal AI translation becomes prominent in voice assistants.

• 2018 - 2022: Open source multimodal learning platforms launched; start-ups launch specialized multimodal AI chips; multimodal learning becomes a feature of AI platforms.

• 2023 - 2027: Open source multimodal software ecosystem thrives; chipset vendors provide full support and implementation for all scenarios of multimodal learning; major technical multimodal learning issues are addressed; companies achieve massive multimodal learning, spanning the length of the enterprise.

Commercial Demand Is Already Building, but There Is a Lack of Supply (and a Lack of Strategy)
The first wave of multimodal learning applications and products is just now permeating the market. However, very few companies have begun to formalize, let alone finalize, their strategies in this area. Even the most widely known multimodal learning systems, IBM Watson and Microsoft Azure, have failed to gain traction, a result of poor marketing and positioning of their multimodal capabilities. Most companies developing multimodal learning applications are doing so in proprietary environments, from scratch. AI platform companies like IBM, Microsoft, Amazon, and Google are continuing to focus on commercially proven technology, which is predominantly unimodal in nature.

However, the era of multimodal learning is not too far off in the future. Multimodal learning is well placed to scale, as the underlying supporting technologies, such as Deep Neural Networks (DNNs), have already done so in unimodal applications like image recognition in camera surveillance, or voice recognition and Natural Language Processing (NLP) in virtual assistants like Amazon's Alexa. Furthermore, the cost of developing new multimodal systems has fallen because the market landscape for both hardware sensors and perception software is already very competitive. At the same time, organizations are recognizing the need for multimodal learning to manage and automate processes that span the entirety of their operations.

Given these factors, ABI Research estimates that the total number of devices shipped with multimodal learning applications will grow from 3.94 million in 2017 to 514.12 million in 2023, at a Compound Annual Growth Rate (CAGR) of 83%. This dramatic growth of multimodal learning is made possible by the speed at which technologies can be rolled out in updates to devices like smartphones.
[Figure: Total Shipments of Devices Using Multimodal AI by Software Approach (Rules Based, DNN, Hybrid), World Markets, Forecast 2017 to 2023, shipments in millions. Source: ABI Research]

Why Does the Industry Need Multimodal Learning?
Multimodal learning is not a new concept. It has been a research topic in computer science since the mid-1970s, but it took a giant leap forward when researchers started applying DNNs to their models. In 2011, an influential paper from a group of researchers at Stanford University explained the benefits of using DNNs in multimodal learning over classical multimodal learning approaches, which used rules-based software. Multimodal training exploits complementary aspects of the modality data streams, making it a powerful technology and enabling new business applications that fall into three categories: classification, decision making, and Human Machine Interfaces (HMIs).

Classification

Multimodal training gives developers the capability to perform classifications of data that would be impossible with only one modality present. It also allows them to automate the classification process, which, when left to employees, can come at considerable cost. Companies are constantly creating vast amounts of unstructured data in their processes, including image, video, text, and audio data. While structuring these streams is possible with unimodal learning, actually generating insight into how these data streams operate in relation to each other, and how they impact one another, requires multimodal classification.

Multimodal learning can also be used to improve the quality of data classification, allowing more meaningful classification to take place. For instance, if a system was trying to classify a video using only image recognition, it would miss important information about the video that is present in the sound and dialogue. Companies have not yet fully realized the potential value of structuring their data, and as they incorporate multimodal learning into their classification processes, they will be able to improve both the quality and depth of classification.
Real-World Example: Media Tagging

Google offers a product for structuring video data, using a combination of audio and image recognition to infer the genre, themes, and tone of a piece of video content, which is then used to tag it. Companies can use Google Cloud Video Intelligence to label their content and then map it to other similar videos based on those tags, enhancing their recommendation systems. Better content recommendation improves the likelihood of subscribers watching more videos, which has been shown to reduce subscriber churn.
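As a rough illustration of how such tagging can be driven programmatically, the sketch below asks the Google Cloud Video Intelligence API for visual labels and a speech transcript of a single video, one output per modality. The bucket URI is a placeholder and the exact client options may vary between library versions; treat this as an assumption-laden sketch rather than a reference integration.

# Sketch: request visual labels and a speech transcript for one video.
# "gs://my-bucket/clip.mp4" is a placeholder URI; requires the
# google-cloud-videointelligence package and configured application credentials.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
features = [
    videointelligence.Feature.LABEL_DETECTION,       # image/visual modality
    videointelligence.Feature.SPEECH_TRANSCRIPTION,  # audio modality
]
context = videointelligence.VideoContext(
    speech_transcription_config=videointelligence.SpeechTranscriptionConfig(
        language_code="en-US"
    )
)
operation = client.annotate_video(
    request={"features": features,
             "input_uri": "gs://my-bucket/clip.mp4",
             "video_context": context}
)
result = operation.result(timeout=600).annotation_results[0]

tags = [label.entity.description for label in result.segment_label_annotations]
transcript = " ".join(
    t.alternatives[0].transcript for t in result.speech_transcriptions if t.alternatives
)
# Downstream, a recommendation system can index the video on both tags and transcript.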
Real-World Example: Medical Imaging and Diagnosis

Some scientific researchers are also using multimodal learning for classifying medical conditions. Researchers at Maharshi Dayanand University in India and Chongqing University in China have published papers that look at combining data from a mixture of Computed Tomography (CT) scans and Magnetic Resonance Imaging (MRI) scans to find coefficients that relate the two data sources. The combined data can be used to better diagnose diseases in patients. There are currently no commercial deployments of this technology, and it will take some time for the algorithms used in this process to gain acceptance in the medical community and by regulators.

Decision-Making Systems
Multimodal learning can be used to help decision-making systems predict the best course of action in response to their current situation or unfolding events. By combining multiple modalities in the training process of a decision-making system, it is possible to spot unfolding situations that would be impossible to identify using a single modality. Decision-making systems are one of the most valuable application areas of multimodal training, as they will play a major role in automating transportation and robotics.

Real-World Example: Advanced Driver Assistance Systems (ADAS)

An example of decision-making systems using multimodal learning is ADAS on vehicles. ADAS technology developed by NXP for Level 3 autonomous driving is already using a mixture of Red, Green, Blue (RGB) camera, Light Detection and Ranging (LiDAR), and radar data to infer the best course of action for the vehicle. Spanish applied data research company Vicomtech was also able to show that multimodal learning improved the accuracy of ADAS perception and prediction. Vicomtech trained a multimodal algorithm that estimates the future position of obstacles and other road users, given a sequence of their past trajectory data obtained from LiDAR and Global Positioning System (GPS) sensors. The system learned to accurately predict and react to where obstacles were going to be, a critical capability for software guiding an autonomous vehicle's path.
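Vicomtech's implementation is not public, but the general pattern described here can be sketched as a small sequence model that consumes a fused history of LiDAR-derived obstacle positions and GPS-derived ego-vehicle context, and regresses the obstacle's position a short horizon ahead. Every dimension, name, and the choice of an LSTM are illustrative assumptions.

# Illustrative sketch of multimodal trajectory prediction (not Vicomtech's system):
# an LSTM consumes a fused per-timestep feature (LiDAR-derived obstacle position plus
# GPS-derived ego-vehicle pose) and regresses the obstacle's position one step ahead.
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    def __init__(self, lidar_dim=3, gps_dim=3, hidden_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(lidar_dim + gps_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # predicted (x, y) position ahead

    def forward(self, lidar_seq, gps_seq):
        # lidar_seq: (batch, time, 3) obstacle positions; gps_seq: (batch, time, 3) ego pose.
        fused = torch.cat([lidar_seq, gps_seq], dim=-1)
        _, (h_n, _) = self.encoder(fused)
        return self.head(h_n[-1])

model = TrajectoryPredictor()
pred = model(torch.randn(8, 20, 3), torch.randn(8, 20, 3))  # 8 obstacles, 20 past steps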
Human Machine Interfaces

Multimodal learning is becoming a popular area of commercial research for those building HMI systems. HMI developers have found they can make their software safer, more secure, more personalized, and more accurate by incorporating multimodal learning.

Real-World Example: Companion Robots

Intuition Robotics launched ElliQ, a companion robot for elderly citizens that uses a mixture of audio and camera data to learn how best to interact with users. Based on the user's responses to interactions with ElliQ, the HMI adjusts the frequency and manner of its future interactions. Multimodal learning is relevant here because a user might appear visually happy to respond to the ElliQ robot regularly, but the tone of their voice might indicate that they are finding the robot slightly annoying; ElliQ's software would then adjust the frequency at which it interacts with the user.
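A toy version of that adjustment logic might look like the function below, where the facial-expression score and the voice-tone score stand in for the outputs of the robot's two perception models; the weights and thresholds are invented purely for illustration.

# Toy policy: fuse a visual sentiment score with a voice-tone score before deciding
# how often to initiate interactions. Scores are in [0, 1]; thresholds are invented.
def adjust_interaction_rate(face_score: float, voice_score: float,
                            current_per_day: int) -> int:
    # Weight the vocal cue more heavily: irritation often shows in tone before it
    # shows on the face, which is exactly the case a camera-only system would miss.
    fused = 0.4 * face_score + 0.6 * voice_score
    if fused < 0.5:           # user seems annoyed overall
        return max(1, current_per_day - 2)
    if fused > 0.75:          # user seems engaged
        return current_per_day + 1
    return current_per_day

# A smiling user (0.9) with a flat, irritated tone (0.2) still gets fewer prompts.
print(adjust_interaction_rate(face_score=0.9, voice_score=0.2, current_per_day=6))  # -> 4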
Real-World Example: Collaborative Robots

A group of technical researchers at the University of Novi Sad in Serbia has also developed an HMI system for collaborative robots that manages interactions between ABB's industrial robots and humans. The system incorporates speech recognition and typed text for worker commands, and image recognition to determine the positions of different individuals. The researchers used multimodal learning so that the robot takes the context and location of workers into account when dealing with commands, adapting its movement behavior to the current situation in which it is operating. Multimodal learning allows the HMI system to learn how to carry out the commands it has received via speech or text and to schedule them effectively, in a safe way that avoids any conflict with the workers the robot might be near.
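A highly simplified version of that scheduling step is sketched below: a command may arrive as speech or typed text, and the robot only executes it once the vision system reports that no worker is inside the relevant zone. The clearance radius, data shapes, and message strings are illustrative assumptions, not values from the Novi Sad system.

# Simplified sketch of command scheduling that fuses a recognized command (speech or
# typed text) with worker positions from the vision system. The 1.5 m clearance radius
# is an invented value used only for illustration.
import math
from typing import List, Tuple

SAFE_RADIUS_M = 1.5

def clear_of_workers(target_xy: Tuple[float, float],
                     worker_positions: List[Tuple[float, float]]) -> bool:
    return all(math.dist(target_xy, w) > SAFE_RADIUS_M for w in worker_positions)

def schedule(command: str, target_xy: Tuple[float, float],
             worker_positions: List[Tuple[float, float]]) -> str:
    # The same handler serves both modalities: "command" may come from the speech
    # recognizer or from the typed-text interface.
    if clear_of_workers(target_xy, worker_positions):
        return f"EXECUTE {command} at {target_xy}"
    return f"HOLD {command}: worker within {SAFE_RADIUS_M} m of target"

print(schedule("pick part A", (2.0, 0.5), worker_positions=[(2.4, 0.9)]))  # held
print(schedule("pick part A", (2.0, 0.5), worker_positions=[(5.0, 4.0)]))  # executed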
Real-World Example: Smartphones

Improvements in HMI systems are not limited to the robotics space. Google has begun using multimodal learning to optimize how smartphone users interact with Google Assistant via both touch and speech. Google is using multimodal learning to personalize the experience of using Google Assistant, balancing voice with touch interactions and learning from users' previous interactions with the system.

Real-World Example: Automotive

In the automotive space, Cerence (formerly Nuance) has developed an in-car assistant that uses a mixture of voice, GPS data, and gesture and gaze recognition to interpret the commands of a driver. For instance, a driver might ask the vehicle “What is the telephone number of that restaurant?” and then gesture toward the restaurant from inside the vehicle. The assistant is able to infer which restaurant the driver meant by combining GPS location data with information from the driver's gesture, picking the correct one out and then handing the information over to the driver. Multimodal learning is used here to fuse different command modalities with location data, allowing the driver to keep more of their attention on the activity of driving and improving driver safety.
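One way to picture that fusion step is the sketch below: it resolves “that restaurant” by combining the vehicle's GPS position and heading with the pointing angle reported by the gesture system, then picks the nearby candidate whose bearing best matches the pointing direction. The candidate list, coordinate frame, and tolerances are all invented for illustration.

# Sketch of resolving "that restaurant" from fused modalities: the vehicle's GPS
# position and heading plus the pointing angle reported by gesture recognition.
# Candidate restaurants and the toy coordinate frame are invented for illustration.
import math

def bearing_deg(from_xy, to_xy):
    dx, dy = to_xy[0] - from_xy[0], to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360  # 0 deg = "north" in this toy frame

def resolve_pointed_poi(vehicle_xy, vehicle_heading_deg, gesture_offset_deg, candidates):
    # The gesture angle is relative to the cabin, so add the vehicle heading from GPS.
    pointed = (vehicle_heading_deg + gesture_offset_deg) % 360
    def angular_error(poi):
        diff = abs(bearing_deg(vehicle_xy, poi["xy"]) - pointed)
        return min(diff, 360 - diff)
    return min(candidates, key=angular_error)

restaurants = [
    {"name": "Trattoria Roma", "xy": (120.0, 40.0), "phone": "+1-555-0100"},
    {"name": "Sushi Kawa",     "xy": (-60.0, 90.0), "phone": "+1-555-0101"},
]
choice = resolve_pointed_poi(vehicle_xy=(0.0, 0.0), vehicle_heading_deg=20.0,
                             gesture_offset_deg=50.0, candidates=restaurants)
print(choice["name"], choice["phone"])  # the assistant reads back the matching number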
46、ver, picking the correct one out and then handing over the information to the driver. Multimodal learning is used here to fuse different command modalities with location data, allowing the driver to focus more of their attention on the activity of driving, improving driver safety. End-Market Opportu
47、nities There is impressive momentum driving multimodal applications onto devices, with five key end-market verticals most aggressive- ly adopting multimodal learning: automotive, robotics, consumer, healthcare, and media and entertainment. Automotive In the automotive space, there are three instance
48、s where multi- modal learning is being introduced: ADAS: By 2023, annual shipments of vehicles with ADAS Level 3 and higher will reach 21.7 million. These vehicles will contain some of the most sophisticated multimodal learning systems. Many players are trying to deliver ADAS, with nearly all the ma
49、jor vehicle Original Equipment Manufacturers (OEMs) having internal projects or strategic partnerships to develop Level 4 and 5 ADAS in the next 5 to 10 years. The current market leader, Waymo, has already delivered the first autonomous driving taxi service in Ari- zona, and Waymos competitors are trying to replicate a scalable version of this same service. ADAS needs real-time inferencing to ensure safety requirements are met, meaning processing for ADAS needs to take place at the edge on the vehicle. In-Vehicle HMI Assistants: In-car voice assistants have