Table: databricks_catalog_schema - Query Databricks Catalog Schemas using SQL
Databricks Catalog is a feature within Databricks that organizes data into databases and tables. It provides a unified view of all data in Databricks and allows users to manage, discover, and utilize data effectively. A Databricks Catalog Schema is a logical grouping of tables within a database, providing a way to organize and manage data within a Databricks workspace.
Table Usage Guide
The databricks_catalog_schema
table provides insights into the organization and ownership of data within a Databricks workspace. As a data engineer or data scientist, explore schema-specific details through this table, including the database name, schema name, and schema owner. Utilize it to understand the structure of your data, discover who owns which schemas, and manage your data more effectively.
Examples
Basic info
Explore the basic details of your Databricks catalog schemas, such as who created them and when, to gain insights into their usage and management. This could be particularly useful for auditing purposes or for understanding the distribution of responsibility within your team.
select full_name, name, catalog_name, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalog_schema;
select full_name, name, catalog_name, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalog_schema;
List schemas modified in the last 7 days
Gain insights into recent schema modifications by identifying those that have been updated in the past week. This can be useful for tracking changes, auditing purposes, or troubleshooting recent issues.
select full_name, name, catalog_name, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalog_schemawhere updated_at >= now() - interval '7 days';
select full_name, name, catalog_name, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalog_schemawhere updated_at >= datetime('now', '-7 days');
List system created schemas
Explore which schemas have been created by the system to gain insights into the organization and management of your data. This can be particularly useful for understanding the structure of your data and identifying areas for optimization.
select full_name, name, catalog_name, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalog_schemawhere owner = 'System user';
select full_name, name, catalog_name, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalog_schemawhere owner = 'System user';
List schemas having auto maintenance enabled
Explore which schemas have automatic maintenance enabled to streamline management and ensure optimal performance. This can be useful in identifying areas for potential optimization and troubleshooting.
select full_name, name, catalog_name, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalog_schemawhere enable_auto_maintenance;
select full_name, name, catalog_name, comment, created_at, created_by, metastore_id, account_idfrom databricks_catalog_schemawhere enable_auto_maintenance = 1;
Get effective permissions for each external location
Analyze the settings to understand the effective permissions assigned to each external location. This can help in managing access control and maintaining security protocols within your system.
select name, p ->> 'principal' as principal_name, p ->> 'privileges' as permissionsfrom databricks_catalog_schema, jsonb_array_elements(schema_effective_permissions) p;
select name, json_extract(p.value, '$.principal') as principal_name, json_extract(p.value, '$.privileges') as permissionsfrom databricks_catalog_schema, json_each(schema_effective_permissions) as p;
List catalog types and the average number of schemas per catalog
Explore the different types of catalogs and understand the average number of schemas each type typically contains. This can help in managing and optimizing the distribution of schemas across various catalogs.
select catalog_schema_counts.catalog_type, avg(catalog_schema_counts.schema_count) as avg_schemas_per_catalogfrom ( select c.catalog_type, count(s.full_name) as schema_count from databricks_catalog as c left join databricks_catalog_schema as s on c.name = s.catalog_name group by c.catalog_type ) as catalog_schema_countsgroup by catalog_schema_counts.catalog_type;
select catalog_schema_counts.catalog_type, avg(catalog_schema_counts.schema_count) as avg_schemas_per_catalogfrom ( select c.catalog_type, count(s.full_name) as schema_count from databricks_catalog as c left join databricks_catalog_schema as s on c.name = s.catalog_name group by c.catalog_type ) as catalog_schema_countsgroup by catalog_schema_counts.catalog_type;
Schema for databricks_catalog_schema
Name | Type | Operators | Description |
---|---|---|---|
_ctx | jsonb | Steampipe context in JSON form, e.g. connection_name. | |
account_id | text | The Databricks Account ID in which the resource is located. | |
catalog_name | text | = | Name of parent catalog. |
catalog_type | text | The type of the parent catalog. | |
comment | text | User-provided free-form text description. | |
created_at | timestamp with time zone | Time at which this schema was created. | |
created_by | text | The user who created this schema. | |
effective_auto_maintenance_flag | jsonb | Effective auto maintenance flag of the schema. | |
enable_auto_maintenance | boolean | Whether auto maintenance should be enabled for this object and objects under it. | |
full_name | text | = | Full name of schema, in form of __catalog_name__.__schema_name__. |
metastore_id | text | Unique identifier of parent metastore. | |
name | text | Name of schema, relative to parent catalog. | |
owner | text | Owner of the schema. | |
properties | jsonb | A map of key-value properties attached to the securable. | |
schema_effective_permissions | jsonb | Effective permissions of the schema. | |
schema_permissions | jsonb | Permissions of the schema. | |
storage_location | text | Storage location for managed tables within schema. | |
storage_root | text | Storage root URL for managed tables within schema. | |
title | text | The title of the resource. | |
updated_at | timestamp with time zone | Time at which this schema was last updated. | |
updated_by | text | The user who last updated this schema. |
Export
This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh
script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks
You can pass the configuration to the command with the --config
argument:
steampipe_export_databricks --config '<your_config>' databricks_catalog_schema