# Table: databricks_catalog_volume - Query Databricks Catalog Volumes using SQL
A Databricks Catalog Volume is a logical storage unit within Unity Catalog in Databricks, a unified analytics platform. Volumes are used to organize and manage non-tabular data within the Databricks environment, such as structured, semi-structured, and unstructured files, and they sit alongside tables, views, and functions in a catalog schema.
## Table Usage Guide

The databricks_catalog_volume table provides insights into the Catalog Volumes within Databricks. As a data engineer or data analyst, you can explore volume-specific details through this table, including volume type, properties, and associated metadata. Utilize it to uncover information about volumes, such as their organization, the types of data they contain, and their properties. The catalog_name, schema_name, and full_name columns support the = operator, so you can scope queries to a specific catalog and schema, or address a single volume by its fully qualified name.
## Examples

### Basic info
Gain insights into specific volumes within your Databricks catalog, pinpointing details such as their creation date and creator. This information can be useful for auditing purposes or to understand the history and ownership of different volumes.
```sql+postgres
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  catalog_name = 'catalog'
  and schema_name = 'schema';
```

```sql+sqlite
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  catalog_name = 'catalog'
  and schema_name = 'schema';
```
### List volumes not modified in the last 90 days
Determine the volumes that have remained unmodified for the past 90 days. This query can help in identifying potential areas for data cleanup or archiving, thus optimizing your data storage and management.
```sql+postgres
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  updated_at <= now() - interval '90 days'
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```

```sql+sqlite
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  updated_at <= datetime('now', '-90 days')
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```
### List volumes created in the last 7 days
Explore volumes that have been created within the past week. This is useful for tracking recent additions and understanding the growth of your data storage.
```sql+postgres
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  created_at >= now() - interval '7 days'
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```

```sql+sqlite
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  created_at >= datetime('now', '-7 days')
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```
### List all external volumes

Discover the volumes in your Databricks catalog that are backed by external storage locations. This can be beneficial in understanding your data storage and management strategy.
```sql+postgres
select
  volume_id,
  name,
  volume_type,
  storage_location,
  account_id
from
  databricks_catalog_volume
where
  volume_type = 'EXTERNAL'
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```

```sql+sqlite
select
  volume_id,
  name,
  volume_type,
  storage_location,
  account_id
from
  databricks_catalog_volume
where
  volume_type = 'EXTERNAL'
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```
### Get details for a specific volume
Explore specific volume details to understand its creation timeline, associated account, and metadata store. This is useful for auditing and tracking changes to the volume over time.
```sql+postgres
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  full_name = '__catalog_name__.__schema_name__.__volume_name__';
```

```sql+sqlite
select
  volume_id,
  name,
  comment,
  created_at,
  created_by,
  metastore_id,
  account_id
from
  databricks_catalog_volume
where
  full_name = '__catalog_name__.__schema_name__.__volume_name__';
```
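### List volumes owned by a specific user

Identify the volumes that a given principal owns, which is helpful when reviewing ownership ahead of a migration or an access review. This is a sketch along the lines of the examples above; the owner value and the catalog and schema names are placeholders to replace with your own.

```sql+postgres
select
  volume_id,
  name,
  owner,
  volume_type
from
  databricks_catalog_volume
where
  owner = 'user@example.com' -- placeholder principal; replace with a real user or group
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```

```sql+sqlite
select
  volume_id,
  name,
  owner,
  volume_type
from
  databricks_catalog_volume
where
  owner = 'user@example.com' -- placeholder principal; replace with a real user or group
  and catalog_name = 'catalog'
  and schema_name = 'schema';
```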
### Count the number of volumes in a particular catalog
Explore the quantity of volumes within a specific catalog to understand its size and complexity. This is beneficial in managing and optimizing data storage and accessibility.
```sql+postgres
select
  catalog_name,
  count(*) as volume_count
from
  databricks_catalog_volume
where
  catalog_name = 'catalog'
  and schema_name = 'schema'
group by
  catalog_name;
```

```sql+sqlite
select
  catalog_name,
  count(*) as volume_count
from
  databricks_catalog_volume
where
  catalog_name = 'catalog'
  and schema_name = 'schema'
group by
  catalog_name;
```
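### Count volumes by volume type

Break down the volumes in a schema by their type to see the split between managed and external storage. This is a sketch that assumes the same placeholder catalog and schema names as the examples above; volume_type values are typically MANAGED or EXTERNAL.

```sql+postgres
select
  volume_type,
  count(*) as volume_count
from
  databricks_catalog_volume
where
  catalog_name = 'catalog' -- placeholder catalog and schema names
  and schema_name = 'schema'
group by
  volume_type;
```

```sql+sqlite
select
  volume_type,
  count(*) as volume_count
from
  databricks_catalog_volume
where
  catalog_name = 'catalog' -- placeholder catalog and schema names
  and schema_name = 'schema'
group by
  volume_type;
```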
## Schema for databricks_catalog_volume
| Name | Type | Operators | Description |
|------|------|-----------|-------------|
| _ctx | jsonb | | Steampipe context in JSON form, e.g. connection_name. |
| account_id | text | | The Databricks Account ID in which the resource is located. |
| catalog_name | text | = | The name of the catalog where the schema and the volume are. |
| comment | text | | The comment attached to the volume. |
| created_at | timestamp with time zone | | Time at which this volume was created. |
| created_by | text | | The user who created the volume. |
| full_name | text | = | The three-level (fully qualified) name of the volume. |
| metastore_id | text | | Unique identifier of parent metastore. |
| name | text | | Name of volume. |
| owner | text | | The identifier of the user who owns the volume. |
| schema_name | text | = | The name of the schema where the volume is. |
| storage_location | text | | The storage location on the cloud. |
| title | text | | The title of the resource. |
| updated_at | timestamp with time zone | | Time at which this volume was last updated. |
| updated_by | text | | The user who last updated the volume. |
| volume_id | text | | The unique identifier of the volume. |
| volume_type | text | | The type of the volume. |
## Export
This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh script:

```sh
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks
```
You can pass the configuration to the command with the --config argument:

```sh
steampipe_export_databricks --config '<your_config>' databricks_catalog_volume
```