From Failure to Fix: OCI Makes AI Infrastructure Management Easy [LRN3344].pdf

Uploaded by: Fl****zo | ID: 970987 | 2025-11-08 | 26 pages | 2.23 MB

AI Infrastructure management on OCI
Streamlining AI Infra operations
Jonathan Schrieber
Sr. Principal Product Manager, Compute
Oct 14, 2025

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle's products may change and remains at the sole discretion of Oracle Corporation.

Safe harbor statement
Copyright 2025, Oracle and/or its affiliates | Confidential: Internal/Restricted/Highly Restricted

Forward-Looking statements
This presentation is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle's products may change and remains at the sole discretion of Oracle Corporation. Statements in this presentation relating to Oracle's future plans, expectations, beliefs, intentions, and prospects are "forward-looking statements" and are subject to material risks and uncertainties. A detailed discussion of these factors and other risks that affect our business is contained in Oracle's Securities and Exchange Commission (SEC) filings, including our most recent reports on Form 10-K and Form 10-Q under the heading "Risk Factors." These filings are available on the SEC's website or on Oracle's website at https:/ information in this presentation is current as of October 2025 and Oracl…

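Summary of the main content of the full document:

1. **Topic**: Optimizing AI infrastructure management on Oracle Cloud Infrastructure (OCI).
2. **Speakers**: Jonathan Schrieber (Sr. Principal Product Manager, OCI Compute) and Ahnjae Shin (cloud team lead and product manager).
3. **Core content**:
   - **Infrastructure setup and configuration**: a one-click Terraform installation deploys the controller and GPU nodes, with integrated Grafana dashboards and a local Prometheus instance for monitoring.
   - **Monitoring and observability**: the OCI GPU Scanner runs passive and active health checks, covering GPU performance and network health.
   - **Operations and troubleshooting**: clusters are monitored through the Grafana dashboards and Prometheus, and node repair is automated.
4. **Customer case**: Friendli.ai runs AI clusters on OCI and stresses that bare-metal performance and network performance are essential for low-latency serving.
5. **Future outlook**:
   - OCI health, repair, and observability features.
   - Automated node repair and remediation in OKE.
   - A managed add-on for the NVIDIA Network Operator, with RDMA support.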
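The passive health-check path described above (GPU metrics collected into a local Prometheus and surfaced in Grafana, feeding node repair) can be sketched as follows. This is a minimal illustration under stated assumptions, not the OCI GPU Scanner's implementation: it assumes GPU metrics are exported by NVIDIA's dcgm-exporter (for example `DCGM_FI_DEV_GPU_TEMP`, labeled by `Hostname`), that Prometheus is reachable at a hypothetical `http://localhost:9090`, and that 85 °C is an illustrative threshold. The function names `query_prometheus` and `unhealthy_nodes` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical threshold for illustration only; not an OCI GPU Scanner default.
MAX_GPU_TEMP_C = 85.0


def query_prometheus(base_url: str, promql: str) -> dict:
    """Run an instant query against the Prometheus HTTP API (/api/v1/query)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def unhealthy_nodes(response: dict, threshold: float) -> set[str]:
    """Return the nodes whose sample value exceeds the threshold.

    Expects a Prometheus instant-vector response of the form:
    {"status": "success", "data": {"resultType": "vector",
     "result": [{"metric": {...labels...}, "value": [timestamp, "value"]}]}}
    """
    if response.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {response}")
    flagged = set()
    for sample in response["data"]["result"]:
        labels = sample["metric"]
        # Assumes dcgm-exporter's "Hostname" label; falls back to the scrape target.
        node = labels.get("Hostname", labels.get("instance", "unknown"))
        # Prometheus encodes sample values as strings.
        if float(sample["value"][1]) > threshold:
            flagged.add(node)
    return flagged
```

For example, `unhealthy_nodes(query_prometheus("http://localhost:9090", "max by (Hostname) (DCGM_FI_DEV_GPU_TEMP)"), MAX_GPU_TEMP_C)` would return the hosts running hot, and a remediation step (such as cordoning the node in OKE) could then act on that set. Keeping the classification separate from the HTTP call lets the threshold logic be tested without a live Prometheus server.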