turbot/databricks
steampipe plugin install databricks

Table: databricks_catalog_metastore - Query Databricks Catalog Metastores using SQL

A Databricks Catalog Metastore is a critical component within Databricks that stores metadata of databases and tables. It provides a centralized way to manage and access metadata, which includes information about databases, tables, views, functions, and more. Databricks Catalog Metastore helps users to understand the structure and properties of their data.

Table Usage Guide

The databricks_catalog_metastore table provides insights into the Databricks Catalog Metastore within Databricks. As a Data Engineer or Data Scientist, explore metadata-specific details through this table, including database names, table names, and associated properties. Use it to uncover information about the organization and structure of your data, helping you to manage and utilize your data more effectively.

Examples

Basic info

Explore the key details of your Databricks metastores to gain insights into their creation time, ownership, and associated cloud platform. This can help you manage and track your data resources effectively.

select
metastore_id,
name,
cloud,
created_at,
owner,
account_id
from
databricks_catalog_metastore;
select
metastore_id,
name,
cloud,
created_at,
owner,
account_id
from
databricks_catalog_metastore;

List cloud provider configuration for the metastores

Analyze the settings to understand the configuration of your cloud provider for the metastores. This can be useful to identify the regions, storage roots, and associated credentials, helping you manage and optimize your cloud resources effectively.

select
metastore_id,
name,
cloud,
global_metastore_id,
region,
storage_root,
storage_root_credential_id,
storage_root_credential_name
from
databricks_catalog_metastore;
select
metastore_id,
name,
cloud,
global_metastore_id,
region,
storage_root,
storage_root_credential_id,
storage_root_credential_name
from
databricks_catalog_metastore;

List metastores that could be shared externally

Identify metastores that have the potential to be shared externally. This query is beneficial in managing data access and ensuring appropriate security measures are in place.

select
metastore_id,
name,
cloud,
global_metastore_id,
owner,
account_id
from
databricks_catalog_metastore
where
delta_sharing_scope = 'INTERNAL_AND_EXTERNAL';
select
metastore_id,
name,
cloud,
global_metastore_id,
owner,
account_id
from
databricks_catalog_metastore
where
delta_sharing_scope = 'INTERNAL_AND_EXTERNAL';

List metastores that were updated in the last 7 days

Identify recently updated metastores in your Databricks catalog to keep track of changes. This can be useful for auditing and maintaining up-to-date information.

select
metastore_id,
name,
cloud,
owner,
updated_at,
account_id
from
databricks_catalog_metastore
where
updated_at >= now() - interval '7 days';
select
metastore_id,
name,
cloud,
owner,
updated_at,
account_id
from
databricks_catalog_metastore
where
updated_at >= datetime('now', '-7 days');

Get effective permissions for each function

Analyze the permissions associated with each function in your Databricks catalog to understand who has what level of access. This can help in maintaining security and managing access control effectively.

select
metastore_id,
name,
p ->> 'principal' as principal_name,
p ->> 'privileges' as permissions
from
databricks_catalog_metastore,
jsonb_array_elements(metastore_effective_permissions) p;
select
metastore_id,
name,
json_extract(p.value, '$.principal') as principal_name,
json_extract(p.value, '$.privileges') as permissions
from
databricks_catalog_metastore,
json_each(metastore_effective_permissions) as p;

List metastores with the highest number of effective permissions

Discover the metastores that have the most effective permissions. This is useful for analyzing which metastores are potentially more exposed or have more complex access configurations.

select
name,
cloud,
jsonb_array_length(metastore_effective_permissions) as permission_count
from
databricks_catalog_metastore
order by
permission_count desc
limit
10;
select
name,
cloud,
json_array_length(metastore_effective_permissions) as permission_count
from
databricks_catalog_metastore
order by
permission_count desc
limit
10;

Find the most recently updated metastores

Explore which metastores have been updated most recently to keep track of the latest changes and ensure data integrity. This is particularly useful in managing large datasets across multiple cloud platforms.

select
name,
cloud,
updated_at
from
databricks_catalog_metastore
order by
updated_at desc
limit
10;
select
name,
cloud,
updated_at
from
databricks_catalog_metastore
order by
updated_at desc
limit
10;

Count the number of metastores per cloud

This query is used to analyze the distribution of metastores across different cloud platforms. It helps in understanding the spread of data storage and management systems, providing insights for strategic decision-making in cloud resource allocation.

select
cloud,
count(*) as metastore_count
from
databricks_catalog_metastore
group by
cloud;
select
cloud,
count(*) as metastore_count
from
databricks_catalog_metastore
group by
cloud;

Schema for databricks_catalog_metastore

NameTypeOperatorsDescription
_ctxjsonbSteampipe context in JSON form, e.g. connection_name.
account_idtextThe Databricks Account ID in which the resource is located.
cloudtextCloud vendor of the metastore home shard.
created_attimestamp with time zoneTime at which this metastore was created.
default_data_access_config_idtextUnique identifier of the metastore's (Default) Data Access Configuration.
delta_sharing_organization_nametextThe organization name of a Delta Sharing entity, to be used in Databricks-to-Databricks Delta Sharing as the official name.
delta_sharing_recipient_token_lifetime_in_secondsbigintThe lifetime of a delta sharing recipient token in seconds.
delta_sharing_scopetextThe scope of Delta Sharing enabled for the metastore.
global_metastore_idtextGlobally unique metastore ID across clouds and regions, of the form `cloud:region:metastore_id`.
metastore_effective_permissionsjsonbThe metastore effective permissions.
metastore_idtext=Unique identifier of metastore.
metastore_permissionsjsonbThe metastore permissions.
nametextThe user-specified name of the metastore.
ownertextThe owner of the metastore.
privilege_model_versiontextThe privilege model version of the metastore.
regiontextCloud region which the metastore serves.
storage_roottextThe storage root URL for metastore.
storage_root_credential_idtextUUID of storage credential to access the metastore storage_root.
storage_root_credential_nametextName of storage credential to access the metastore storage_root.
titletextThe title of the resource.
updated_attimestamp with time zoneTime at which this metastore was last updated.

Export

This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks

You can pass the configuration to the command with the --config argument:

steampipe_export_databricks --config '<your_config>' databricks_catalog_metastore