turbot/databricks
```sh
steampipe plugin install databricks
```

# Table: databricks_catalog_volume - Query Databricks Catalog Volumes using SQL

A Databricks Catalog Volume is a logical storage unit within Databricks, a unified analytics platform. Volumes are used to organize and govern access to data within the Databricks environment. They can contain files in a variety of formats, including structured, semi-structured, and unstructured data, and sit alongside the tables, views, and functions defined in the same schema.

## Table Usage Guide

The `databricks_catalog_volume` table provides insights into the Catalog Volumes within Databricks. As a data engineer or data analyst, you can explore volume-specific details through this table, including volume type, properties, and associated metadata. Use it to uncover information about volumes, such as how they are organized, the types of data they contain, and their properties.

## Examples

### Basic info

Gain insights into specific volumes within your Databricks catalog, pinpointing details such as their creation date and creator. This information can be useful for auditing purposes or to understand the history and ownership of different volumes.

```sql+postgres
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  catalog_name = 'catalog'
  and schema_name = 'schema';
```

```sql+sqlite
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  catalog_name = 'catalog'
  and schema_name = 'schema';
```

### List volumes not modified in the last 90 days

Determine the volumes that have remained unmodified for the past 90 days. This query can help in identifying potential areas for data cleanup or archiving, thus optimizing your data storage and management.

```sql+postgres
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  updated_at <= now() - interval '90 days'
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```

```sql+sqlite
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  updated_at <= datetime('now', '-90 days')
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```
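Rather than filtering on a fixed cutoff, you can also rank volumes by staleness directly. The sketch below orders by `updated_at` and adds the `owner` and `updated_by` columns from the schema; the catalog and schema names are placeholders, and the query runs unchanged in both PostgreSQL and SQLite:

```sql
select
  name,
  owner,
  updated_at,
  updated_by
from
  databricks_catalog_volume
where
  catalog_name = 'catalog'
  and schema_name = 'schema'
order by
  updated_at asc
limit 10;
```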

### List volumes created in the last 7 days

Explore volumes that have been created within the past week. This is useful for tracking recent additions and understanding the growth of your data storage.

select
volume_id,
name,
comment,
created_at,
created_by,
metastore_id,
account_id
from
databricks_catalog_volume
where
created_at >= now() - interval '7 days'
and catalog_name = 'catalog'
and schema_name = 'schema';
select
volume_id,
name,
comment,
created_at,
created_by,
metastore_id,
account_id
from
databricks_catalog_volume
where
created_at >= datetime('now', '-7 days')
and catalog_name = 'catalog'
and schema_name = 'schema';
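A related view is how many volumes each user has created recently. The following sketch groups the same 7-day filter by `created_by`; it uses only documented columns, and the catalog and schema values are placeholders:

```sql+postgres
select
  created_by,
  count(*) as volumes_created
from
  databricks_catalog_volume
where
  created_at >= now() - interval '7 days'
  and catalog_name = 'catalog'
  and schema_name = 'schema'
group by
  created_by;
```

```sql+sqlite
select
  created_by,
  count(*) as volumes_created
from
  databricks_catalog_volume
where
  created_at >= datetime('now', '-7 days')
  and catalog_name = 'catalog'
  and schema_name = 'schema'
group by
  created_by;
```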

### List all external volumes

Identify the volumes in your Databricks catalog that are backed by external storage. This can be helpful for understanding your data storage and management strategy.

```sql+postgres
select
  volume_id,
  name,
  volume_type,
  storage_location,
  account_id
from
  databricks_catalog_volume
where
  volume_type = 'EXTERNAL'
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```

```sql+sqlite
select
  volume_id,
  name,
  volume_type,
  storage_location,
  account_id
from
  databricks_catalog_volume
where
  volume_type = 'EXTERNAL'
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```
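To see the split between external and managed volumes rather than listing them, you can aggregate on `volume_type`. A sketch using only documented columns, identical in PostgreSQL and SQLite:

```sql
select
  volume_type,
  count(*) as volume_count
from
  databricks_catalog_volume
where
  catalog_name = 'catalog'
  and schema_name = 'schema'
group by
  volume_type;
```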

### Get details for a specific volume

Explore specific volume details to understand its creation timeline, associated account, and metadata store. This is useful for auditing and tracking changes to the volume over time.

```sql+postgres
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  full_name = '__catalog_name__.__schema_name__.__volume_name__';
```

```sql+sqlite
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  full_name = '__catalog_name__.__schema_name__.__volume_name__';
```

### Count the number of volumes in a particular catalog

Explore the quantity of volumes within a specific catalog to understand its size and complexity. This is beneficial in managing and optimizing data storage and accessibility.

```sql+postgres
select
  catalog_name,
  count(*) as volume_count
from
  databricks_catalog_volume
where
  catalog_name = 'catalog'
  and schema_name = 'schema'
group by
  catalog_name;
```

```sql+sqlite
select
  catalog_name,
  count(*) as volume_count
from
  databricks_catalog_volume
where
  catalog_name = 'catalog'
  and schema_name = 'schema'
group by
  catalog_name;
```
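If you want the breakdown per schema rather than a single total for one schema, drop the `schema_name` filter and group by it instead. A sketch (with the same placeholder catalog name), identical in both dialects:

```sql
select
  schema_name,
  count(*) as volume_count
from
  databricks_catalog_volume
where
  catalog_name = 'catalog'
group by
  schema_name;
```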

## Schema for databricks_catalog_volume

| Name | Type | Operators | Description |
| --- | --- | --- | --- |
| _ctx | jsonb | | Steampipe context in JSON form, e.g. connection_name. |
| account_id | text | | The Databricks Account ID in which the resource is located. |
| catalog_name | text | = | The name of the catalog where the schema and the volume are. |
| comment | text | | The comment attached to the volume. |
| created_at | timestamp with time zone | | Time at which this volume was created. |
| created_by | text | | The user who created the volume. |
| full_name | text | = | The three-level (fully qualified) name of the volume. |
| metastore_id | text | | Unique identifier of parent metastore. |
| name | text | | Name of volume. |
| owner | text | | The identifier of the user who owns the volume. |
| schema_name | text | = | The name of the schema where the volume is. |
| storage_location | text | | The storage location on the cloud. |
| title | text | | The title of the resource. |
| updated_at | timestamp with time zone | | Time at which this volume was last updated. |
| updated_by | text | | The user who last updated the volume. |
| volume_id | text | | The unique identifier of the volume. |
| volume_type | text | | The type of the volume. |

## Export

This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the `steampipe_export_installer.sh` script:

```sh
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks
```

You can pass the configuration to the command with the `--config` argument:

```sh
steampipe_export_databricks --config '<your_config>' databricks_catalog_volume
```