turbot/databricks
steampipe plugin install databricks

Table: databricks_catalog - Query Databricks Catalogs using SQL

Databricks Catalog is a feature within Databricks that provides a unified view of all your Databricks data objects. It allows you to organize, manage, and access Databricks databases, tables, and functions. The Catalog helps in managing data objects across all workspaces and simplifies data discovery and access.

Table Usage Guide

The databricks_catalog table provides insights into Databricks Catalogs within Databricks. As a Data Engineer or Data Analyst, you can explore catalog-specific details through this table, including the databases and their corresponding tables. Utilize it to uncover information about your Databricks data objects, such as their organization, structure, and accessibility across workspaces.

Examples

Basic info

Explore the basic information related to your Databricks catalog, such as who created it and when, to better understand its origin and usage. This can be useful in auditing or managing your Databricks resources.

select
name,
catalog_type,
comment,
created_at,
created_by,
metastore_id,
account_id
from
databricks_catalog;
select
name,
catalog_type,
comment,
created_at,
created_by,
metastore_id,
account_id
from
databricks_catalog;

List catalogs modified in the last 7 days

Explore recent modifications to catalogs in the past week. This is useful for keeping track of changes and updates to your data resources.

select
name,
catalog_type,
comment,
created_at,
created_by,
metastore_id,
account_id
from
databricks_catalog
where
updated_at >= now() - interval '7 days';
select
name,
catalog_type,
comment,
created_at,
created_by,
metastore_id,
account_id
from
databricks_catalog
where
updated_at >= datetime('now', '-7 days');

List catalogs with auto maintenance enabled

Explore which catalogs have automatic maintenance enabled in Databricks to better manage and optimize your data storage and processing. This can help in ensuring data integrity and improved performance.

select
name,
catalog_type,
comment,
created_at,
created_by,
metastore_id,
account_id
from
databricks_catalog
where
enable_auto_maintenance = 'ENABLE';
select
name,
catalog_type,
comment,
created_at,
created_by,
metastore_id,
account_id
from
databricks_catalog
where
enable_auto_maintenance = 'ENABLE';

List catalogs that are not isolated to the workspace

Explore which catalogs are not isolated within your workspace. This is useful for identifying potential security risks or data sharing opportunities across different areas of your organization.

select
name,
catalog_type,
comment,
created_at,
created_by,
metastore_id,
account_id
from
databricks_catalog
where
isolation_mode = 'OPEN';
select
name,
catalog_type,
comment,
created_at,
created_by,
metastore_id,
account_id
from
databricks_catalog
where
isolation_mode = 'OPEN';

Get effective permissions for each catalog

Explore the various permissions assigned to each catalog to understand the access levels of different principals. This can be useful in managing and auditing access control within your Databricks environment.

select
name,
p ->> 'principal' as principal_name,
p ->> 'privileges' as permissions
from
databricks_catalog,
jsonb_array_elements(catalog_effective_permissions) p;
select
name,
json_extract(p.value, '$.principal') as principal_name,
json_extract(p.value, '$.privileges') as permissions
from
databricks_catalog,
json_each(catalog_effective_permissions) as p;

Get total catalogs of each type

Analyze the distribution of different catalog types in your Databricks environment. This can be useful to understand the variety of data sources and their prevalence, aiding in data management and strategic planning.

select
catalog_type,
count(*) as total_catalogs
from
databricks_catalog
group by
catalog_type;
select
catalog_type,
count(*) as total_catalogs
from
databricks_catalog
group by
catalog_type;

List the most recently updated catalog

Explore the most recent changes in your catalog. This query is useful in monitoring updates and ensuring the latest modifications are as expected.

select
name,
catalog_type,
updated_at
from
databricks_catalog
order by
updated_at desc
limit
1;
select
name,
catalog_type,
updated_at
from
databricks_catalog
order by
updated_at desc
limit
1;

Count the number of catalogs created by each user, including a percentage of their ownership

Discover the segments that have been created by each user and understand their relative contribution in terms of percentage ownership. This can provide valuable insights into user activity and resource utilization.

select
owner,
count(*) as total_catalogs,
(count(*) * 100.0 / sum(count(*)) over ()) as ownership_percentage
from
databricks_catalog
group by
owner;
select
owner,
count(*) as total_catalogs,
(
count(*) * 100.0 / (
select
count(*)
from
databricks_catalog
)
) as ownership_percentage
from
databricks_catalog
group by
owner;

Schema for databricks_catalog

NameTypeOperatorsDescription
_ctxjsonbSteampipe context in JSON form, e.g. connection_name.
account_idtextThe Databricks Account ID in which the resource is located.
catalog_effective_permissionsjsonbEffective permissions for the catalog.
catalog_permissionsjsonbPermissions for the catalog.
catalog_typetextType of the catalog.
commenttextUser-provided free-form text description.
connection_nametextThe name of the connection to an external data source.
created_attimestamp with time zoneTime at which this catalog was created, in epoch milliseconds.
created_bytextUsername of catalog creator.
effective_auto_maintenance_flagjsonbEffective auto maintenance flag.
enable_auto_maintenancetextWhether auto maintenance should be enabled for this object and objects under it.
isolation_modetextWhether the current securable is accessible from all workspaces or a specific set of workspaces.
metastore_idtextUnique identifier of parent metastore.
nametext=Name of the catalog.
optionsjsonbCatalog options - A map of key-value properties attached to the securable.
ownertextUsername of current owner of catalog.
propertiesjsonbCatalog properties - A map of key-value properties attached to the securable.
provider_nametextThe name of delta sharing provider.
share_nametextThe name of the share under the share provider.
storage_locationtextStorage Location URL (full path) for managed tables within catalog.
storage_roottextStorage root URL for managed tables within catalog.
titletextThe title of the resource.
updated_attimestamp with time zoneTime at which this catalog was last updated, in epoch milliseconds.
updated_bytextUsername of user who last modified catalog.
workspace_bindingsjsonbArray of workspace bindings.

Export

This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks

You can pass the configuration to the command with the --config argument:

steampipe_export_databricks --config '<your_config>' databricks_catalog