Table: databricks_catalog - Query Databricks Catalogs using SQL
Databricks Catalog is a feature within Databricks that provides a unified view of all your Databricks data objects. It allows you to organize, manage, and access Databricks databases, tables, and functions. The Catalog helps in managing data objects across all workspaces and simplifies data discovery and access.
Table Usage Guide
The databricks_catalog
table provides insights into Databricks Catalogs within Databricks. As a Data Engineer or Data Analyst, you can explore catalog-specific details through this table, including the databases and their corresponding tables. Utilize it to uncover information about your Databricks data objects, such as their organization, structure, and accessibility across workspaces.
Examples
Basic info
Explore the basic information related to your Databricks catalog, such as who created it and when, to better understand its origin and usage. This can be useful in auditing or managing your Databricks resources.
select name, catalog_type, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalog;
select name, catalog_type, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalog;
List catalogs modified in the last 7 days
Explore recent modifications to catalogs in the past week. This is useful for keeping track of changes and updates to your data resources.
select name, catalog_type, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalogwhere updated_at >= now() - interval '7 days';
select name, catalog_type, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalogwhere updated_at >= datetime('now', '-7 days');
List catalogs with auto maintenance enabled
Explore which catalogs have automatic maintenance enabled in Databricks to better manage and optimize your data storage and processing. This can help in ensuring data integrity and improved performance.
select name, catalog_type, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalogwhere enable_auto_maintenance = 'ENABLE';
select name, catalog_type, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalogwhere enable_auto_maintenance = 'ENABLE';
List catalogs that are not isolated to the workspace
Explore which catalogs are not isolated within your workspace. This is useful for identifying potential security risks or data sharing opportunities across different areas of your organization.
select name, catalog_type, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalogwhere isolation_mode = 'OPEN';
select name, catalog_type, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalogwhere isolation_mode = 'OPEN';
Get effective permissions for each catalog
Explore the various permissions assigned to each catalog to understand the access levels of different principals. This can be useful in managing and auditing access control within your Databricks environment.
select name, p ->> 'principal' as principal_name, p ->> 'privileges' as permissionsfrom databricks_catalog, jsonb_array_elements(catalog_effective_permissions) p;
select name, json_extract(p.value, '$.principal') as principal_name, json_extract(p.value, '$.privileges') as permissionsfrom databricks_catalog, json_each(catalog_effective_permissions) as p;
Get total catalogs of each type
Analyze the distribution of different catalog types in your Databricks environment. This can be useful to understand the variety of data sources and their prevalence, aiding in data management and strategic planning.
select catalog_type, count(*) as total_catalogsfrom databricks_cataloggroup by catalog_type;
select catalog_type, count(*) as total_catalogsfrom databricks_cataloggroup by catalog_type;
List the most recently updated catalog
Explore the most recent changes in your catalog. This query is useful in monitoring updates and ensuring the latest modifications are as expected.
select name, catalog_type, updated_atfrom databricks_catalogorder by updated_at desclimit 1;
select name, catalog_type, updated_atfrom databricks_catalogorder by updated_at desclimit 1;
Count the number of catalogs created by each user, including a percentage of their ownership
Discover the segments that have been created by each user and understand their relative contribution in terms of percentage ownership. This can provide valuable insights into user activity and resource utilization.
select owner, count(*) as total_catalogs, (count(*) * 100.0 / sum(count(*)) over ()) as ownership_percentagefrom databricks_cataloggroup by owner;
select owner, count(*) as total_catalogs, ( count(*) * 100.0 / ( select count(*) from databricks_catalog ) ) as ownership_percentagefrom databricks_cataloggroup by owner;
Schema for databricks_catalog
Name | Type | Operators | Description |
---|---|---|---|
_ctx | jsonb | Steampipe context in JSON form, e.g. connection_name. | |
account_id | text | The Databricks Account ID in which the resource is located. | |
catalog_effective_permissions | jsonb | Effective permissions for the catalog. | |
catalog_permissions | jsonb | Permissions for the catalog. | |
catalog_type | text | Type of the catalog. | |
comment | text | User-provided free-form text description. | |
connection_name | text | The name of the connection to an external data source. | |
created_at | timestamp with time zone | Time at which this catalog was created, in epoch milliseconds. | |
created_by | text | Username of catalog creator. | |
effective_auto_maintenance_flag | jsonb | Effective auto maintenance flag. | |
enable_auto_maintenance | text | Whether auto maintenance should be enabled for this object and objects under it. | |
isolation_mode | text | Whether the current securable is accessible from all workspaces or a specific set of workspaces. | |
metastore_id | text | Unique identifier of parent metastore. | |
name | text | = | Name of the catalog. |
options | jsonb | Catalog options - A map of key-value properties attached to the securable. | |
owner | text | Username of current owner of catalog. | |
properties | jsonb | Catalog properties - A map of key-value properties attached to the securable. | |
provider_name | text | The name of delta sharing provider. | |
share_name | text | The name of the share under the share provider. | |
storage_location | text | Storage Location URL (full path) for managed tables within catalog. | |
storage_root | text | Storage root URL for managed tables within catalog. | |
title | text | The title of the resource. | |
updated_at | timestamp with time zone | Time at which this catalog was last updated, in epoch milliseconds. | |
updated_by | text | Username of user who last modified catalog. | |
workspace_bindings | jsonb | Array of workspace bindings. |
Export
This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh
script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks
You can pass the configuration to the command with the --config
argument:
steampipe_export_databricks --config '<your_config>' databricks_catalog