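ML-Summit
www.cpp- www.ml-summit.org www.gosim.org www.pm-summit.org

Zhenxiao Luo (罗震霄)
Senior Software Engineer, Pinterest

Zhenxiao Luo is a Sr. Staff Software Engineer at Pinterest, responsible for the real-time big data processing engine, the monitoring platform, and data preprocessing for large models. Before joining Pinterest, he worked at Cloudera, Uber, Twitter, and Facebook on the development and operation of big data engines and machine learning platforms. He is a committer on the open-source Presto project and a member of the Presto Technical Steering Committee. He received his bachelor's degree from Fudan University and pursued a Ph.D. (on leave) at the University of Wisconsin Madison.

Talk topic: Vector Database Support and Optimization for Large Language Models

2025 Global Machine Learning Technology Conference
Vector Database Support and Optimization for Large Language Models
Zhenxiao Luo
Sr. Staff Software Engineer, Pinterest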
CONTENTS
- Language Models at Pinterest
- Why we need VectorDB
- Which are the VectorDB options?
- Our approach

- Undergraduate student, Fudan University, 2003-2007
- Ph.D. student (on leave), University of Wisconsin Madison, 2007-2010
- Software Engineer, Vertica, 2010-2011
- Software Engineer, Cloudera, 2011-2012
- Software Engineer, Facebook, 2012-2013
- Sr. Software Engineer, Netflix, 2013-2016
- Staff Engineer and Engineering Manager, Uber, 2016-2019
- Sr. Staff Engineer, Twitter, 2019-2022
- Sr. Staff Engineer, Pinterest, 2022-present
- Presto Committer & Technical Steering Committee member, 2019-present

01 Language Models at Pinterest

Language Models at Pinterest
- Vector Table Search
- Text to SQL
- Automated table documentation
- Ads models
- Growth models
- Many more

Challenges
- Use generic GPT models from OpenAI
- Retrieval Augmented Generation (RAG)
- Infuse Pinterest context into models
- Both structured and unstructured data
- Data documentation
- Slack
- Query history

More Challenges
- No coordination
- Manually curated datasets, no data reuse
- One-off solution, no technology reuse
- Embeddings and raw data together
- Standardization: vectorDB as a service

Why we n