turbot/databricks
steampipe plugin install databricks

Table: databricks_catalog_external_location - Query Databricks Catalog External Locations using SQL

A Databricks external location is a Unity Catalog object that pairs a cloud storage path (URL) with a storage credential that authorizes access to that path. External locations provide a centralized way to catalog and govern data stored outside the metastore's managed storage, making data discovery and access control more consistent across external sources.

Table Usage Guide

The databricks_catalog_external_location table provides insights into the catalog external locations within Databricks. As a data engineer or data scientist, explore specific details about these locations through this table, including the name of the location, the associated database, and the owner. Utilize it to gain a deeper understanding of your data's organization, distribution, and ownership in Databricks.

Examples

Basic info

Explore the metadata of external locations in your Databricks catalog to understand when and by whom they were created, as well as their associated account details. This can help you manage and track the usage of your resources.

PostgreSQL:

select
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  url,
  account_id
from
  databricks_catalog_external_location;

SQLite:

select
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  url,
  account_id
from
  databricks_catalog_external_location;

List external locations modified in the last 7 days

Identify external locations that were modified during the past week. This is useful for tracking recent changes and understanding who made them and when.

PostgreSQL:

select
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  url,
  account_id
from
  databricks_catalog_external_location
where
  updated_at >= now() - interval '7 days';

SQLite:

select
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  url,
  account_id
from
  databricks_catalog_external_location
where
  updated_at >= datetime('now', '-7 days');
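The SQLite variant's `datetime('now', '-7 days')` filter can be exercised outside Steampipe. A minimal standalone sketch, using Python's built-in sqlite3 module on sample rows (not live Steampipe data; the table and values here are illustrative):

```python
import sqlite3

# In-memory stand-in for the Steampipe table, with only the columns the filter needs.
conn = sqlite3.connect(":memory:")
conn.execute("create table locations (name text, updated_at text)")
conn.executemany(
    "insert into locations values (?, datetime('now', ?))",
    [("recent", "-1 days"), ("old", "-30 days")],  # sample rows, one inside the window
)

# Same WHERE clause as the SQLite query above.
rows = conn.execute(
    "select name from locations where updated_at >= datetime('now', '-7 days')"
).fetchall()
print(rows)  # only the row updated within the last 7 days
```

SQLite stores these timestamps as ISO-8601 text, so the string comparison orders them chronologically, which is why the `>=` filter works without any date parsing.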

List read-only external locations

Identify which of your external locations are read-only. This helps preserve data integrity by confirming that these locations cannot be accidentally modified.

PostgreSQL:

select
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  url,
  account_id
from
  databricks_catalog_external_location
where
  read_only;

SQLite:

select
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  url,
  account_id
from
  databricks_catalog_external_location
where
  read_only = 1;

Get the associated credential for each external location

Determine the associated credentials for each external location in Databricks to understand the correlation between data storage and access permissions. This can aid in managing security and access control across different data repositories.

PostgreSQL:

select
  l.name,
  l.comment,
  l.url,
  c.name as credential_name,
  c.id,
  c.aws_iam_role as credential_aws_iam_role,
  l.account_id
from
  databricks_catalog_external_location l,
  databricks_catalog_storage_credential c
where
  l.credential_id = c.id
  and l.account_id = c.account_id;

SQLite:

select
  l.name,
  l.comment,
  l.url,
  c.name as credential_name,
  c.id,
  c.aws_iam_role as credential_aws_iam_role,
  l.account_id
from
  databricks_catalog_external_location l,
  databricks_catalog_storage_credential c
where
  l.credential_id = c.id
  and l.account_id = c.account_id;

Get effective permissions for each external location

Explore which permissions are associated with each external location in your Databricks catalog. This can be particularly useful for understanding access rights and maintaining security compliance.

PostgreSQL:

select
  name,
  p ->> 'principal' as principal_name,
  p ->> 'privileges' as permissions
from
  databricks_catalog_external_location,
  jsonb_array_elements(external_location_effective_permissions) p;

SQLite:

select
  name,
  json_extract(p.value, '$.principal') as principal_name,
  json_extract(p.value, '$.privileges') as permissions
from
  databricks_catalog_external_location,
  json_each(external_location_effective_permissions) as p;
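The SQLite variant's `json_each`/`json_extract` unpacking can be demonstrated standalone. A minimal sketch, using Python's built-in sqlite3 module on sample data rather than live Steampipe; the table name, location name, and privilege values below are illustrative assumptions (it requires an SQLite build with the JSON1 functions, which modern Python distributions include):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table locations (name text, perms text)")

# Sample permissions array shaped like external_location_effective_permissions.
perms = json.dumps([
    {"principal": "data_engineers", "privileges": ["READ_FILES"]},
    {"principal": "admins", "privileges": ["ALL_PRIVILEGES"]},
])
conn.execute("insert into locations values (?, ?)", ("loc1", perms))

# Same shape as the SQLite query above: json_each expands the array into rows,
# json_extract pulls fields out of each element.
rows = conn.execute(
    """
    select
      name,
      json_extract(p.value, '$.principal') as principal_name,
      json_extract(p.value, '$.privileges') as permissions
    from
      locations,
      json_each(perms) as p
    """
).fetchall()
print(rows)
```

Note that when the extracted path points at a JSON array, such as `$.privileges`, `json_extract` returns it as JSON text, so you may want to decode it again before further processing.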

Count the number of external locations per account ID

See how many external locations are defined in each account. This is useful for understanding how resources are distributed across your accounts.

PostgreSQL:

select
  account_id,
  count(*) as location_count
from
  databricks_catalog_external_location
group by
  account_id;

SQLite:

select
  account_id,
  count(*) as location_count
from
  databricks_catalog_external_location
group by
  account_id;

List users who created the most external locations

Discover who has been the most active in creating external locations within your Databricks catalog. This is beneficial for understanding usage patterns and identifying key contributors to your data infrastructure.

PostgreSQL:

select
  created_by,
  count(*) as location_count
from
  databricks_catalog_external_location
group by
  created_by
order by
  location_count desc;

SQLite:

select
  created_by,
  count(*) as location_count
from
  databricks_catalog_external_location
group by
  created_by
order by
  location_count desc;

Schema for databricks_catalog_external_location

Name | Type | Operators | Description
_ctx | jsonb | | Steampipe context in JSON form, e.g. connection_name.
account_id | text | | The Databricks Account ID in which the resource is located.
comment | text | | User-provided free-form text description.
created_at | timestamp with time zone | | Time at which this external location was created.
created_by | text | | The user who created this external location.
credential_id | text | | Unique ID of the location's storage credential.
credential_name | text | | Name of the storage credential used with this location.
external_location_effective_permissions | jsonb | | The effective permissions for the external location.
external_location_permissions | jsonb | | The permissions for the external location.
metastore_id | text | | Unique identifier of the metastore hosting the external location.
name | text | = | Human-readable name that identifies the external location.
owner | text | | The user who owns this external location.
read_only | boolean | | Whether this external location is read-only.
title | text | | The title of the resource.
updated_at | timestamp with time zone | | Time at which this external location was last updated.
updated_by | text | | The user who last updated this external location.
url | text | | The path URL of the external location.

Export

This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks

You can pass the configuration to the command with the --config argument:

steampipe_export_databricks --config '<your_config>' databricks_catalog_external_location