Table: databricks_catalog_metastore - Query Databricks Catalog Metastores using SQL
A Databricks Catalog Metastore is a critical component within Databricks that stores metadata of databases and tables. It provides a centralized way to manage and access metadata, which includes information about databases, tables, views, functions, and more. Databricks Catalog Metastore helps users to understand the structure and properties of their data.
Table Usage Guide
The databricks_catalog_metastore
table provides insights into the Databricks Catalog Metastore within Databricks. As a Data Engineer or Data Scientist, explore metadata-specific details through this table, including database names, table names, and associated properties. Use it to uncover information about the organization and structure of your data, helping you to manage and utilize your data more effectively.
Examples
Basic info
Explore the key details of your Databricks metastores to gain insights into their creation time, ownership, and associated cloud platform. This can help you manage and track your data resources effectively.
select metastore_id, name, cloud, created_at, owner, account_idfrom databricks_catalog_metastore;
select metastore_id, name, cloud, created_at, owner, account_idfrom databricks_catalog_metastore;
List cloud provider configuration for the metastores
Analyze the settings to understand the configuration of your cloud provider for the metastores. This can be useful to identify the regions, storage roots, and associated credentials, helping you manage and optimize your cloud resources effectively.
select metastore_id, name, cloud, global_metastore_id, region, storage_root, storage_root_credential_id, storage_root_credential_namefrom databricks_catalog_metastore;
select metastore_id, name, cloud, global_metastore_id, region, storage_root, storage_root_credential_id, storage_root_credential_namefrom databricks_catalog_metastore;
List metastores that could be shared externally
Identify metastores that have the potential to be shared externally. This query is beneficial in managing data access and ensuring appropriate security measures are in place.
select metastore_id, name, cloud, global_metastore_id, owner, account_idfrom databricks_catalog_metastorewhere delta_sharing_scope = 'INTERNAL_AND_EXTERNAL';
select metastore_id, name, cloud, global_metastore_id, owner, account_idfrom databricks_catalog_metastorewhere delta_sharing_scope = 'INTERNAL_AND_EXTERNAL';
List metastores that were updated in the last 7 days
Identify recently updated metastores in your Databricks catalog to keep track of changes. This can be useful for auditing and maintaining up-to-date information.
select metastore_id, name, cloud, owner, updated_at, account_idfrom databricks_catalog_metastorewhere updated_at >= now() - interval '7 days';
select metastore_id, name, cloud, owner, updated_at, account_idfrom databricks_catalog_metastorewhere updated_at >= datetime('now', '-7 days');
Get effective permissions for each function
Analyze the permissions associated with each function in your Databricks catalog to understand who has what level of access. This can help in maintaining security and managing access control effectively.
select metastore_id, name, p ->> 'principal' as principal_name, p ->> 'privileges' as permissionsfrom databricks_catalog_metastore, jsonb_array_elements(metastore_effective_permissions) p;
select metastore_id, name, json_extract(p.value, '$.principal') as principal_name, json_extract(p.value, '$.privileges') as permissionsfrom databricks_catalog_metastore, json_each(metastore_effective_permissions) as p;
List metastores with the highest number of effective permissions
Discover the metastores that have the most effective permissions. This is useful for analyzing which metastores are potentially more exposed or have more complex access configurations.
select name, cloud, jsonb_array_length(metastore_effective_permissions) as permission_countfrom databricks_catalog_metastoreorder by permission_count desclimit 10;
select name, cloud, json_array_length(metastore_effective_permissions) as permission_countfrom databricks_catalog_metastoreorder by permission_count desclimit 10;
Find the most recently updated metastores
Explore which metastores have been updated most recently to keep track of the latest changes and ensure data integrity. This is particularly useful in managing large datasets across multiple cloud platforms.
select name, cloud, updated_atfrom databricks_catalog_metastoreorder by updated_at desclimit 10;
select name, cloud, updated_atfrom databricks_catalog_metastoreorder by updated_at desclimit 10;
Count the number of metastores per cloud
This query is used to analyze the distribution of metastores across different cloud platforms. It helps in understanding the spread of data storage and management systems, providing insights for strategic decision-making in cloud resource allocation.
select cloud, count(*) as metastore_countfrom databricks_catalog_metastoregroup by cloud;
select cloud, count(*) as metastore_countfrom databricks_catalog_metastoregroup by cloud;
Schema for databricks_catalog_metastore
Name | Type | Operators | Description |
---|---|---|---|
_ctx | jsonb | Steampipe context in JSON form, e.g. connection_name. | |
account_id | text | The Databricks Account ID in which the resource is located. | |
cloud | text | Cloud vendor of the metastore home shard. | |
created_at | timestamp with time zone | Time at which this metastore was created. | |
default_data_access_config_id | text | Unique identifier of the metastore's (Default) Data Access Configuration. | |
delta_sharing_organization_name | text | The organization name of a Delta Sharing entity, to be used in Databricks-to-Databricks Delta Sharing as the official name. | |
delta_sharing_recipient_token_lifetime_in_seconds | bigint | The lifetime of a delta sharing recipient token in seconds. | |
delta_sharing_scope | text | The scope of Delta Sharing enabled for the metastore. | |
global_metastore_id | text | Globally unique metastore ID across clouds and regions, of the form `cloud:region:metastore_id`. | |
metastore_effective_permissions | jsonb | The metastore effective permissions. | |
metastore_id | text | = | Unique identifier of metastore. |
metastore_permissions | jsonb | The metastore permissions. | |
name | text | The user-specified name of the metastore. | |
owner | text | The owner of the metastore. | |
privilege_model_version | text | The privilege model version of the metastore. | |
region | text | Cloud region which the metastore serves. | |
storage_root | text | The storage root URL for metastore. | |
storage_root_credential_id | text | UUID of storage credential to access the metastore storage_root. | |
storage_root_credential_name | text | Name of storage credential to access the metastore storage_root. | |
title | text | The title of the resource. | |
updated_at | timestamp with time zone | Time at which this metastore was last updated. |
Export
This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh
script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks
You can pass the configuration to the command with the --config
argument:
steampipe_export_databricks --config '<your_config>' databricks_catalog_metastore