When Agents Go Rogue (and how to fix them)
Creating LLM judges to Measure Domain-Specific Agent Quality
Samraj Moorjani & Nikhil Thorat
6/10/25

Forward-looking Statement

This presentation has been prepared for informational purposes only. The information set forth herein does not purport to be complete or contain all relevant information. Statements contained herein are made as of the date of this presentation unless stated otherwise.

This presentation and the accompanying oral commentary may contain forward-looking statements. In some cases, forward-looking statements can be identified by terms such as “may”, “will”, “should”, “expects”, “plans”, “anticipates”, “could”, “intends”, “projects”, “believes”, “estimates”, “predicts”, or “continue”, or the negative of these words or other similar terms or expressions that concern Databricks' expectations, strategy, plans, or intentions. Forward-looking statements are based on information available at the time those statements are made and are inherently subject to risks and uncertainties that could cause actual results to differ materially from those expressed in or suggested by the forward-looking statements. Forward-looking statements should not be read as a guarantee of future performance or outcomes. Except as required by law, Databricks does not undertake any obligation to publicly update or revise any forward-looking statement, whether as a result of new information, future developments or otherwise.

Production Quality Agents

This talk is not for you if:
- You are okay shipping untested software to production.
- You are okay with the financial and reputational risk when Agents make fatal mistakes.
- You don't care about Agents or AI.

Why GenAI quality is hard

- Inputs and outputs are free-form, natural language
- Domain expertise is required to assess quality
- Must trade-off