Why Observability Matters (More!) with AI Applications
Monitoring vLLM and Llamastack in Kubernetes, with OpenTelemetry, Prometheus, Tempo, and Grafana

Sally O'Malley
Principal Software Engineer
Red Hat

"I'm Sally O'Malley, Principal SWE within Red Hat's Emerging Technologies, Office of the CTO. I aim to help customers run enterprise AI applications in production, on OpenShift and RHEL, with a focus on observability."

Where we're at

Observability for LLMs

LLMs must run not just efficiently, but reliably and with full transparency into their runtime behaviour.

Reliable & Transparent: LLMs are moving from research labs into business-critical enterprise applications, and the number of LLM use cases keeps increasing.
Optimize Performance: Track GPU usage, token throughput, and inference latency to keep output fast and responsive.
Complex Pipelines: Debugging requires visibility into multiple components and phases, including retrieval, prompting, generation, and distributed inference.

Why, How, What, Where?

Why do LLMs pose uniqu