Scaling RAG and Embedding Computation with Ray and Pinecone
Cheng Su, Anyscale
Roy Miara, Pinecone
2024 Databricks Inc. All rights reserved

ABOUT US
Roy Miara: Engineering Manager, Generative AI, Pinecone. Previously worked on Data/ML infra (Spark, DBT, Entity Knowledge Graphs).
Cheng Su: Engineering Manager, Data, Anyscale. Previously worked on Data Infra (Spark, Hadoop) at Meta.

AGENDA
Sections: Intro, Ray & Anyscale, Pinecone
“The Problem”
RAG: Retrieval Augmented Generation
Vector Database & Embedding
Ray & Anyscale
Embedding
LLM Offline Inference
Serverless Architecture
Scale and Cost
Quality of RAG vs Training

THE “PROBLEM”
What did we try to solve together?
Evaluate a large-scale RAG solution
Data: Falcon RefinedWeb, 1B documents from Common Crawl
Embedding Model: gte-large, dimension 1024
Process and Embed with Ray
Upload and Index on Pinecone Serverless
Run a large-scale RAG Evaluation

INTRO to RAG

WHAT IS RAG, WHY WE RAG?
Motivation
LLMs don't know what they do not know
LLMs hallucinate even when they know the answer
RAG solves these issues by providing models with factually correct context*
*Errors and omissions excepted

WHAT IS RAG, WHY WE RAG?
New information