#BHUSA BlackHatEvents

What Lies Beneath the Surface: Evaluating LLMs for Offensive Cyber Capabilities through Prompting, Simulation & Emulation
Speaker(s): Michael Kouremetis, Marissa Dotter, Alexander Byrne

Copyright 2024 The MITRE Corporation. ALL RIGHTS RESERVED. Approved for public release. Distribution unlimited. Case: 24-2367

Team
Marissa Dotter (Speaker): AI, AI Security, LLMs
Alex Byrne (Speaker): AI, LLMs, Autonomous Cyber Ops
Michael Threet: AI Infrastructure, LLMs
Ethan Michalak: Adversary Emulation, Software Dev
Michael Kouremetis (Speaker): Autonomous Cyber Ops, Adversary Emulation
Guido Zarrella: MITRE AI Technical Fellow
Dan Martin: Red teaming, Adversary Emulation
Gianpaolo Russo: Autonomous Cyber Ops, OCO

The Problem
$10 gift card problem
Is this LLM an offensive cyber threat?
What is the actual level of risk?
Y2K problem
Source: https:/
Proliferation: 804K public LLMs (HuggingFace)
Application of LLMs to cyber domain: 3.5K public “cyber” datasets (HuggingFace)
LLM power increasing: ChatGPT is estimated to be 1-1.5T parameters
“No. Well maybe but probably not. LLMs are hard to test; and are very hard to test for offensive cyber capability. So no?”
Source: https:/

Current & Emerging Efforts
Purple Llama - CyberSecEval 1 & 2
Google Project Zero - Naptime
DeepMind Evaluating Frontier Models
NTU - PentestGPT
UIUC “LLM Agents Hack Websites”
Evaluating LLMs for Offensive Cyber Operation (OCO) Capabilities
Copyright 2024 The MITRE Corporation. ALL RIGHTS RES