User Interface Interaction and Testing Based on Multimodal Large Language Models
Junjie Wang (王俊杰), Institute of Software, Chinese Academy of Sciences

Speaker
Junjie Wang is a research professor (研究员) and doctoral supervisor at the Institute of Software, Chinese Academy of Sciences, holds a CAS special research position (中国科学院特聘研究岗位), and is a member of the Youth Innovation Promotion Association (青年创新促进会). Main research areas are intelligent software engineering and software quality, with a recent focus on intelligent software testing and LLM-driven software testing. Wang has published more than 60 papers in well-known international journals and conferences and has received the ACM/IEEE Distinguished Paper Award four times. Wang has led or participated in multiple National Natural Science Foundation of China projects, Ministry of Science and Technology key R&D programs, and the CCF-Huawei Populus Grove Fund (胡杨林基金). Wang serves as Associate Editor of the CCF-A journal TSE, as a PC member of ICSE, FSE and ISSRE, and as a reviewer for TOSEM, EMSE, AUSE and the Journal of Software (软件学报).

Contents
1. Status and challenges of user interface testing
2. Test input generation
3. Automated GUI testing oriented to test path planning
4. Automated GUI testing based on multimodal large models
5. Fuzz testing for text inputs
6. Interaction enhancement for text input components
7. Summary and outlook

1. Status and challenges of user interface testing

Existing tools
- Monkey, Fastbot2, Droidbot, Ape, WCTester, Stoat, TimeMachine, ComboDroid, Humanoid, Q-testing

Challenges
- Appropriate text inputs
- Long sequences of consecutive operations
- Compound operations
- Understanding page functionality
- Finding logic errors

Related results
- Multiple papers published in flagship conferences and journals in software engineering and human-computer interaction (ICSE, TSE, CHI, among others).
- Applied to, or currently being integrated with, the Beike Zhaofang (贝壳找房) app, the Douyin (抖音) app, the Huawei HarmonyOS ecosystem, and in-vehicle systems for new-energy vehicles.
- Papers shown: ICSE 2023; ICSE 2024 (three papers); CHI 2024 (best paper nomination, 最佳论文提名奖); TSE 2024; one under submission; FSE 2024-SE2030.

2. Test input generation

Paper: Fill in the Blank: Context-aware Automated Text Input Generation for Mobile GUI. ICSE 2023

Text input generation
- Set up linguistic patterns to generate prompts based on the current page.
- Ask the LLM to fill in the blank according to the generated prompts.

[Figure: examples of text input generation for mobile app testing]

Evaluation
- Passing rate: 0.87.
- When added to GUI testing tools: a significant boost in activity coverage and 122% more bugs detected (51 vs. 23).

3. Automated GUI testing oriented to test path planning

Auto GUI testing with LLM
- Formulate the automatic GUI testing problem as an interactive question-and-answering task, so that the LLM conducts the whole app testing by understanding the GUI semantic information and automatically inferring possible operation steps.

GPTDroid: Function-aware Automatic GUI Testing
Paper: Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions. ICSE 2024

GUI context extraction
- Accurately depict the GUI page currently under test, together with the information of the widgets it contains (a micro perspective) and the app information (a macro perspective).

GUI prompting and executive command generation
- Feedback prompt: inform the LLM that an error occurred and retry the query for the next operation.
- Command generation: provide the LLM with the output template, including the available operations and operation primitives.

Functionality-aware memory prompting
- Build a testing sequence memorizer to record all the detailed interactive testing information, e.g., the explored activities and widgets.
- Query the LLM about the function-level progress of the testing.
- Combine both into the functionality-aware memory prompt.

[Figure: example of GPTDroid testing]

Evaluation
- 75% average activity coverage, 32% higher than the best baseline.
- Detects 95 bugs in the 93 apps, 31% more than the best baseline.

Capability of GPTDroid
- Function-aware exploration through long, meaningful testing traces
- Prioritization
- Valid text inputs and compound operations

4. Automated GUI testing based on multimodal large models

Crash bugs vs. non-crash functional bugs

VisionDroid: Vision-driven Automated Mobile GUI Testing
Paper: Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model, arXiv

A vision-driven, multi-agent collaborative automated GUI testing approach for detecting non-crash functional bugs.
- Explorer Agent: navigates through the app, captures view hierarchies and screenshots, and guides the exploration towards diverse GUI pages while focusing on the app's functionalities.
- Monitor Agent: supervises the testing process, records the exploration history, and triggers the Detector Agent at the appropriate time.
- Detector Agent: identifies potential functional bugs by examining whether there are any issues in the logical transitions that occur during GUI page changes.

Challenge 1: Aligning visual and textual information for MLLM input
- An alignment method that integrates text properties with visual context.
- A screenshot annotation method that pays attention to the different types of actionable widgets and resolves the issue of overlapping.

Challenge 2: Functionality-oriented exploration
- Infer and abstract the current functionality from the detailed exploration sequences.
- This avoids exceeding token limits when interacting with the LLM and makes the exploration focus more on functionality.

Challenge 3: Inferring the test oracle
- Let the Monitor Agent trigger the Detector Agent at the end of each functionality exploration.
- Use functionality-aware Chain-of-Thought (CoT) so that the MLLM first explicitly infers oracles and then detects functional bugs based on these inferences.

[Figures: Explorer Agent; Monitor Agent]

Detector Agent: enriching the detector prompt with examples
- Each example consists of a bug description, a bug screenshot, and a bug reproduction path described in natural language, which helps the MLLM understand what non-crash functional bugs are.

Evaluation
- 50%-72% precision and 42%-65% recall.
- A boost of more than 14%-112% and 108%-147% in average recall and precision compared with the best baseline.

Test case migration driven by natural-language descriptions
- Test requirement: the "My Home - Add Asset" flow (我家-添加资产流程).

5. Fuzz testing for text inputs

Constraints on text inputs
- Intra-widget constraint: requirements on a single text input, e.g., a widget for a person's height accepts only non-negative numbers.
- Inter-widget constraint: requirements among multiple text input widgets on a GUI page, e.g., the diastolic pressure should be less than the systolic pressure.

Unusual text inputs generation
Paper: Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model, ICSE 2024
- Produce the test generators (each a code snippet) with the LLM.
- Each generator can produce a batch of unusual text inputs under the same mutation rule (e.g., insert special characters into a string).

[Figure: example prompt]

Evaluation
- Bug detection rate of 72%-78%, significantly higher than the baselines.
- Significantly fewer attempts needed.
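The test generators described above can be pictured with a small Python sketch. It shows one generator that yields a batch of unusual text inputs under a single mutation rule (inserting special characters into a string), plus checks for the two constraint examples given in the talk. The function names, the special-character pool and the fixed random seed are assumptions made for illustration; in the talk the generators are produced by the LLM, and their actual form is not shown in the slides.

```python
import random

# Hypothetical character pool; the slides do not say which characters are used.
SPECIAL_CHARS = "!@#$%^&*()'\"\\<>"


def special_char_generator(seed_text, batch_size=5, rng_seed=0):
    """Return a batch of variants of seed_text, each with one special
    character inserted at a random position (a single mutation rule)."""
    rng = random.Random(rng_seed)  # fixed seed keeps the batch reproducible
    batch = []
    for _ in range(batch_size):
        pos = rng.randint(0, len(seed_text))  # both ends are valid insertion points
        ch = rng.choice(SPECIAL_CHARS)
        batch.append(seed_text[:pos] + ch + seed_text[pos:])
    return batch


def violates_intra_widget(height_text):
    """Intra-widget example: a height must be a non-negative number."""
    try:
        return float(height_text) < 0
    except ValueError:
        return True  # not parseable as a number at all


def violates_inter_widget(diastolic, systolic):
    """Inter-widget example: diastolic pressure must be less than systolic."""
    return not (diastolic < systolic)
```

A batch such as `special_char_generator("170")` could then be fed into a height field; every string in it breaks the intra-widget constraint, which is the kind of input this fuzzing approach aims at.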
6. Interaction enhancement for text input components

Good examples of text input
[Figure: examples of differences; labels: hint-text, label, context description]

Bad examples of text input
- Without hint-text
- Screen reader cannot obtain information
- Simple and lacks meaning

Motivation study
- Dataset: 4,950 apps from 33 categories on Google Play.
- 3,398 (76%) of them are without hint-text.
- 30,226 (66%) had at least one text input without explicit hint-text content.

HintDroid: Predicting Hint-text of Text Input
Paper: Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM, CHI 2024
- GUI entity extraction and prompting
- Enriching prompt with examples
- Feedback extraction and prompting

Module 1: GUI Entity Extraction and Prompting
- App entity information
- Page GUI entity information
- Input component entity information

[Figure: example of the GUI prompt construction]
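Module 1 combines app-level, page-level and input-component-level entity information into one prompt. The following Python sketch shows one way such a prompt might be assembled. The `InputComponent` fields, the `build_gui_prompt` function and the prompt wording are all hypothetical; the slides only name the three levels of entity information and do not give the template HintDroid actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class InputComponent:
    # Hypothetical fields; the slides only name the three entity levels.
    widget_id: str
    label: str = ""
    nearby_text: list = field(default_factory=list)


def build_gui_prompt(app_name, app_category, page_name, page_widgets, component):
    """Assemble a prompt from app-, page- and input-component-level information."""
    lines = [
        # App entity information
        f"App: {app_name} (category: {app_category}).",
        # Page GUI entity information
        f"Page: {page_name}; widgets on this page: {', '.join(page_widgets) or 'none'}.",
        # Input component entity information
        f"Text input: id={component.widget_id}, label={component.label or 'missing'}, "
        f"nearby text: {', '.join(component.nearby_text) or 'none'}.",
        "Question: what hint-text should this text input display?",
    ]
    return "\n".join(lines)
```

Keeping the three levels on separate lines makes it easy to see which piece of context is missing for a given text input, for example an empty label.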
Module 2: Enriching Prompt with Examples
- Example dataset construction
- Retrieval-based example selection and in-context learning

Module 3: Feedback Extraction and Prompting
- Automated input content checking
- Error message extraction

Implementation
- Extracting GUI pages
- Detecting GUI pages with missing hint-text
- Predicting hint-text based on GUI information
- Decompiling the APK to obtain code
- Repackaging the APK after code modification

Evaluation: Effectiveness
We evaluate the effectiveness of HintDroid from the point of view of hint-text generation accuracy.

Evaluation: Ablation Study
HintDroid's hint-text generation performance is much higher than that of all other variants.

Evaluation: Usefulness
Whether HintDroid can help visually impaired users fill in the input.
- Data: 33 apps with 237 text inputs
- Participants: P1 to P18 use HintDroid; P19 to P36 do not
- Metrics: input accuracy, activity coverage, state coverage, filling time

[Figure: examples of different good cases generated by HintDroid]

7. Summary and outlook

Challenges
- Test coverage cannot reach 100%.
- Pages contain far more elements than those of open-source apps; a single page often has more than 100 elements.
- Icon-type widgets lack labels and similar attributes.
- Text descriptions contain vague wording, such as "top left" and "top right".
- Some expressions are special cases, for example "select region" (选择地区).
- Indices and descriptions in the LLM output do not match.

Thanks