JoinBoost: Tree Models on DB with only SQL

Speaker: Zachary Huang (1)
Contributors: Rathijit Sen (2), Jiaxiang Liu (1), Pavan Kalyan Damalapati (1), Weisheng Wang (1), Matthew Schoenbauer (1), Eugene Wu (1)
(1) Columbia University, (2) Microsoft Gray Systems Lab
Databricks 2023

ML on DB is popular
Tree models are the top picks

Tree models and DB
  DB: full of tables
  Tree models: for tables

Tree model on DB: seems natural and easy
  Example: a bakery chain wants to predict Rating.
  Pick the features from the DB; ML trains the model.

    Price   Age     Weather   Location   Rating
    $10     20-30   Sunny     New York   4.5

  Price, Age, Weather and Location are the features X; Rating is the target y.

Challenge: DB-ML Impedance
  DB: stores many tables
    Sales(ItemID, UserID, StoreID, DateID, Rating)
    Customer(UserID, Age)
    Store(StoreID, Location)
    Item(ItemID, Price)
    Date(DateID, Weather)
  ML: needs a single table
    Price, Age, Weather, Location (features X) and Rating (y)

DB-ML Impedance: current solution
  The DB tables (Sales, Cust, Store, Item, Date) are joined into a single table (Price through Rating) for ML.
  SQL: join, materialize & export

    CREATE TABLE train AS
    SELECT Price, Income, Weather, Location, Rating
    FROM Sales
    JOIN Date ON Sales.Date = Date.Date
    JOIN User ON Sales.UserID = User.UserID
    JOIN Item ON Sales.ItemID = Item.ItemID
    JOIN Store ON Sales.StoreID = Store.StoreID;

Join, Materialization & Export is Laborious, Slow and Insecure

1. Laborious
  Expected: pick the features from DB
  Reality: navigate through a complex schema to write SQL

    CREATE TABLE train AS
    SELECT *
    FROM title t
    JOIN aka_title at ON t.id = at.movie_id
    JOIN kind_type kt ON t.kind_id = kt.id
    JOIN movie_info mi ON t.id = mi.movie_id
    JOIN info_type it ON mi.info_type_id = it.id
    JOIN movie_companies mc ON t.id = mc.movie_id
    JOIN company_name cn ON mc.company_id = cn.id
    JOIN company_type ct ON mc.company_type_id = c