# Table: databricks_catalog_connection - Query Databricks Catalog Connections using SQL
Databricks Catalog Connections is a feature that lets users manage and use data connections within their Databricks workspace. It provides a centralized way to set up and manage connections to various data sources, including databases, data lakes, and other data storage systems, helping users streamline their data workflows and improve data accessibility and utilization.
## Table Usage Guide
The `databricks_catalog_connection` table provides insights into the catalog connections within Databricks. As a data engineer, explore connection-specific details through this table, including connection properties, type, and associated metadata. Use it to manage and monitor data connections, such as those with specific properties, the types of connections, and the verification of connection status.
## Examples
### Basic info
Explore the various connections within your Databricks catalog to understand their types, creators, and creation dates. This can be useful for auditing purposes or to assess the overall configuration of your Databricks environment.
```sql+postgres
select
  name,
  connection_id,
  comment,
  connection_type,
  created_at,
  created_by,
  full_name,
  metastore_id,
  account_id
from
  databricks_catalog_connection;
```

```sql+sqlite
select
  name,
  connection_id,
  comment,
  connection_type,
  created_at,
  created_by,
  full_name,
  metastore_id,
  account_id
from
  databricks_catalog_connection;
```
### List connections modified in the last 7 days
Explore which connections have been updated in the past week. This could be useful for monitoring recent changes and ensuring they align with your organization's data policies.
```sql+postgres
select
  name,
  connection_id,
  comment,
  connection_type,
  created_at,
  created_by,
  full_name,
  metastore_id,
  account_id
from
  databricks_catalog_connection
where
  updated_at >= now() - interval '7 days';
```

```sql+sqlite
select
  name,
  connection_id,
  comment,
  connection_type,
  created_at,
  created_by,
  full_name,
  metastore_id,
  account_id
from
  databricks_catalog_connection
where
  updated_at >= datetime('now', '-7 days');
```
### List read-only connections
Discover the connections that have read-only access in your Databricks catalog to better manage data security and access permissions. This can be particularly useful when auditing user access or refining access control policies.
```sql+postgres
select
  name,
  connection_id,
  comment,
  connection_type,
  created_at,
  created_by,
  full_name,
  metastore_id,
  account_id
from
  databricks_catalog_connection
where
  read_only;
```

```sql+sqlite
select
  name,
  connection_id,
  comment,
  connection_type,
  created_at,
  created_by,
  full_name,
  metastore_id,
  account_id
from
  databricks_catalog_connection
where
  read_only;
```
### List all postgres connections
Explore which Databricks catalog connections are specifically linked to PostgreSQL. This is useful for understanding and managing your PostgreSQL integrations within Databricks.
```sql+postgres
select
  name,
  connection_id,
  comment,
  connection_type,
  created_at,
  created_by,
  full_name,
  metastore_id,
  account_id
from
  databricks_catalog_connection
where
  connection_type = 'POSTGRESQL';
```

```sql+sqlite
select
  name,
  connection_id,
  comment,
  connection_type,
  created_at,
  created_by,
  full_name,
  metastore_id,
  account_id
from
  databricks_catalog_connection
where
  connection_type = 'POSTGRESQL';
```
### Count the number of connections per connection type
Analyze your Databricks catalog to understand the distribution of different connection types. This can help optimize resource allocation by identifying which connection types are used most frequently.
```sql+postgres
select
  connection_type,
  count(*) as connection_count
from
  databricks_catalog_connection
group by
  connection_type;
```

```sql+sqlite
select
  connection_type,
  count(*) as connection_count
from
  databricks_catalog_connection
group by
  connection_type;
```
### Calculate the total number of connections per owner, sorted by owner's total connections
Assess your Databricks catalog to understand the distribution of connections per owner. This is particularly useful in identifying which owners are utilizing the most connections, thereby aiding in resource allocation and management.
```sql+postgres
select
  owner,
  count(*) as total_connections
from
  databricks_catalog_connection
group by
  owner
order by
  total_connections desc;
```

```sql+sqlite
select
  owner,
  count(*) as total_connections
from
  databricks_catalog_connection
group by
  owner
order by
  total_connections desc;
```
### List connections with properties that have the highest number of key-value pairs
Analyze your connections to understand which ones have the most complex properties, helping you focus on the ones that may require more maintenance or are more likely to experience issues due to their complexity.
```sql+postgres
select
  name,
  connection_type,
  array(
    select jsonb_object_keys(properties_kvpairs)
  ) as keys
from
  databricks_catalog_connection
order by
  array_length(
    array(
      select jsonb_object_keys(properties_kvpairs)
    ),
    1
  ) desc
limit 10;
```
```sql+sqlite
Error: SQLite does not support array functions like array_length and array.
```
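As a sketch of an SQLite-compatible workaround (assuming `properties_kvpairs` is stored as JSON text that SQLite's built-in `json_each` table-valued function can read), the top-level keys can be counted with a correlated subquery instead of array functions:

```sql+sqlite
select
  name,
  connection_type,
  (
    select count(*)
    from json_each(properties_kvpairs)
  ) as key_count
from
  databricks_catalog_connection
order by
  key_count desc
limit 10;
```

This returns a key count rather than the key list itself, but it sorts connections by property complexity in the same way as the PostgreSQL query.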
### Get details of the metastore associated with a particular connection
Explore the specifics of a particular connection by identifying its associated metastore details. This can be beneficial in understanding the ownership, creation time, and account details of the metastore, providing a comprehensive view of the connection's metadata storage.
```sql+postgres
select
  c.name as connection_name,
  m.metastore_id,
  m.name as metastore_name,
  m.created_at as metastore_create_time,
  m.owner as metastore_owner,
  m.account_id as metastore_account_id
from
  databricks_catalog_connection as c
  left join databricks_catalog_metastore as m on c.metastore_id = m.metastore_id;
```

```sql+sqlite
select
  c.name as connection_name,
  m.metastore_id,
  m.name as metastore_name,
  m.created_at as metastore_create_time,
  m.owner as metastore_owner,
  m.account_id as metastore_account_id
from
  databricks_catalog_connection as c
  left join databricks_catalog_metastore as m on c.metastore_id = m.metastore_id;
```
## Schema for databricks_catalog_connection
| Name | Type | Operators | Description |
|---|---|---|---|
| _ctx | jsonb | | Steampipe context in JSON form, e.g. connection_name. |
| account_id | text | | The Databricks Account ID in which the resource is located. |
| comment | text | | User-provided free-form text description. |
| connection_id | text | | Unique identifier of the Connection. |
| connection_type | text | | Type of the connection. |
| created_at | timestamp with time zone | | The creation time of the connection. |
| created_by | text | | The user who created the connection. |
| credential_type | text | | Type of the credential. |
| full_name | text | | The full name of the connection. |
| metastore_id | text | | Unique identifier of parent metastore. |
| name | text | = | Name of the connection. |
| options_kvpairs | jsonb | | A map of key-value properties attached to the securable. |
| owner | text | | The user who owns the connection. |
| properties_kvpairs | jsonb | | An object containing map of key-value properties attached to the connection. |
| read_only | boolean | | Whether the connection is read-only. |
| title | text | | The title of the resource. |
| updated_at | timestamp with time zone | | The last time the connection was updated. |
| updated_by | text | | The user who last updated the connection. |
| url | text | | URL of the remote data source, extracted from options_kvpairs. |
## Export
This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the `steampipe_export_installer.sh` script:

```shell
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks
```
You can pass the configuration to the command with the `--config` argument:

```shell
steampipe_export_databricks --config '<your_config>' databricks_catalog_connection
```