Cloud Native Empowering AI: Democratizing AI Development on Cloud Native (2018, 24 pages)
AI everywhere

[Architecture diagram. Labels recoverable from the slide:]
Operations; Develop
Network; Compute; CPU; GPU; FPGA; Elastic Compute Service; Storage; NAS/NFS
Alibaba Cloud Kubernetes Service; Auto scaling; Docker Registry Service; Load balance; VPC
Train; TensorBoard; Logging; profiling; git; Jupyter Notebook; Inference; Resource monitoring
TensorFlow, Caffe, PyTorch, MXNet, Keras, TensorFlow Serving
Data preparation; EMR; OSS; HDFS; Hadoop; CPFS; RDMA; Spark
Seldon/TensorRT inference server; A/B test, canary release
Orchestration & workflow (Kubeflow/arena/pipelines)
[Stack diagram. Labels recoverable from the slide:]
Kubeflow / Kubernetes / NVIDIA Docker runtime
arena CLI
CPU/GPU; Ethernet/RDMA; CPFS/HDFS/OSS
Arena: TensorFlow, Caffe, Horovod, PyTorch

NAME          STATUS     TRAINER  AGE  INSTANCE                     NODE
tf-dist-data  RUNNING    tfjob    3d   tf-dist-data-tfjob-ps-0      192.168.1.120
tf-dist-data  SUCCEEDED  tfjob    3d   tf-dist-data-tfjob-worker-0  N/A
tf-dist-data  SUCCEEDED  tfjob    3d   tf-dist-data-tfjob-worker-1  N/A

Your tensorboard will be available on