Multi-cloud data governance on the Databricks Lakehouse
Ioannis Papadopoulos, Databricks
Volker Tjaden, Databricks
Databricks 2023

Data governance as answers to questions
The questions that data governance is attempting to address:
- Who has access to what data?
- How do we ensure we can trust the data?
- How can we prove the validity of the insights generated by the data?

Data governance defined as questions
The privacy and security dimension: who has access to what data, when, and how?
Diagram labels: Principal (who), Privilege (has access), Securable (what data). Together these make up an Entitlement.

Data governance in the real world
Constraints that do matter when implementing a data governance system:
- Multiple cloud providers
- Multiple geo locations
- Multiple data types
- Multiple technology stacks

Data governance features of the Databricks Lakehouse

Centralised governance with Unity Catalog
Diagram components: cloud storage (S3, ADLS, GCS) containers/buckets; Unity Catalog with account-level user management, credentials, metastore, ACL store, access control, audit log, lineage explorer and data explorer; an identity provider; a Databricks workspace; a Delta Sharing server and a Delta Sharing client.
Diagram annotations: a user on the governed path receives a short-lived token; a user going directly to storage is marked "X: access denied".

Where do identities and entitlements live?
Diagram: several Databricks workspaces, each running clusters and SQL warehouses, all attached to Unity Catalog, which holds user management, the metastore and access control.

Databricks and cloud provider hierarchies
A Databricks account per cloud provider:
- GCP: Organization > Projects > Databricks workspaces; one Databricks account with its account console.
- Azure: AAD tenant > Azure subscriptions > Databricks workspaces; one Databricks account with its account console.
- AWS: Organizational unit > AWS accounts > Databricks workspaces; one Databricks account with its account console.

The Databricks account console
Diagram: Cloud 1, Cloud 2, Cloud 3
User/group
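To make the "principal, privilege, securable" vocabulary from the governance slides concrete, here is a minimal Python sketch of an ACL store that answers "is this user entitled to this privilege on this securable?". It assumes a simplified model in which securables use a dotted three-level name (catalog.schema.table), a grant on a parent is inherited by its children, and users inherit grants made to their groups. The class and method names (`AclStore`, `grant`, `is_entitled`) are hypothetical and this is not the Unity Catalog API; for example, it ignores the extra usage privileges a real catalog requires on parent objects.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class Grant:
    principal: str   # user or group name (the "who")
    privilege: str   # e.g. "SELECT" (the "has access")
    securable: str   # dotted name such as "sales", "sales.emea", "sales.emea.orders" (the "what data")


@dataclass
class AclStore:
    grants: Set[Grant] = field(default_factory=set)
    groups: Dict[str, Set[str]] = field(default_factory=dict)  # group name -> member user names

    def grant(self, principal: str, privilege: str, securable: str) -> None:
        self.grants.add(Grant(principal, privilege, securable))

    def principals_for(self, user: str) -> Set[str]:
        # A user acts as themselves plus every group they are a member of.
        return {user} | {g for g, members in self.groups.items() if user in members}

    def is_entitled(self, user: str, privilege: str, securable: str) -> bool:
        parts = securable.split(".")
        # "a.b.c" has the scopes "a", "a.b" and "a.b.c"; a grant on any of them applies.
        scopes = {".".join(parts[:i]) for i in range(1, len(parts) + 1)}
        principals = self.principals_for(user)
        return any(
            g.principal in principals and g.privilege == privilege and g.securable in scopes
            for g in self.grants
        )


acl = AclStore(groups={"analysts": {"alice"}})
acl.grant("analysts", "SELECT", "sales.emea")                 # schema-level grant to a group
print(acl.is_entitled("alice", "SELECT", "sales.emea.orders"))  # True: inherited via group and schema
print(acl.is_entitled("bob", "SELECT", "sales.emea.orders"))    # False: bob is not an analyst
print(acl.is_entitled("alice", "SELECT", "sales.apac.orders"))  # False: different schema
```

Granting at the schema level and to a group rather than to individual users on individual tables keeps the number of entitlements small, which matters once the same governance model has to span several workspaces and clouds.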
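The Unity Catalog diagram contrasts a user who receives a short-lived token with a user who is denied direct access to the storage bucket. The sketch below illustrates that general idea: a central service checks entitlements, then issues a token scoped to one storage path prefix that expires after a fixed time, and the storage side verifies signature, scope and expiry. It is a generic HMAC-signed token written for illustration only; the signing key, the `vend` and `storage_allows` functions and the token format are all hypothetical, and the real Databricks services rely on cloud-provider credential mechanisms rather than this toy scheme.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Dict, Optional, Set

SECRET = b"demo-signing-key"  # hypothetical key known only to the issuer and the storage check


def _sign(body: str) -> str:
    return hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()


def vend(user: str, prefix: str, allowed: Dict[str, Set[str]],
         ttl_s: int = 900, now: Optional[float] = None) -> Optional[str]:
    """Issue a short-lived token for `prefix`, or None if the user is not entitled."""
    if prefix not in allowed.get(user, set()):
        return None  # access denied: no credential is issued at all
    now = time.time() if now is None else now
    claims = {"sub": user, "prefix": prefix, "exp": now + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    return f"{body}.{_sign(body)}"


def storage_allows(token: str, path: str, now: Optional[float] = None) -> bool:
    """Storage-side check: valid signature, path inside the granted prefix, not expired."""
    now = time.time() if now is None else now
    body, sep, sig = token.rpartition(".")
    if not sep or not hmac.compare_digest(sig, _sign(body)):
        return False  # malformed or tampered token
    claims = json.loads(base64.urlsafe_b64decode(body.encode()))
    return now < claims["exp"] and path.startswith(claims["prefix"])


allowed = {"alice": {"s3://bucket/sales/"}}  # hypothetical entitlement table
tok = vend("alice", "s3://bucket/sales/", allowed, ttl_s=900, now=1000.0)
print(storage_allows(tok, "s3://bucket/sales/part-0.parquet", now=1500.0))  # True
print(storage_allows(tok, "s3://bucket/hr/part-0.parquet", now=1500.0))     # False: outside scope
print(storage_allows(tok, "s3://bucket/sales/part-0.parquet", now=2000.0))  # False: expired
```

Keeping the token short-lived and narrowly scoped limits the damage if it leaks, and because users never hold long-lived storage keys, every access still has to pass through the central entitlement check.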