Databricks + Steampipe
Databricks is a unified set of tools for building, deploying, sharing, and maintaining enterprise-grade data solutions at scale.
Steampipe is an open-source zero-ETL engine to instantly query cloud APIs using SQL.
List details of your Databricks clusters:
select
  cluster_id,
  title,
  cluster_source,
  creator_user_name,
  driver_node_type_id,
  node_type_id,
  state,
  start_time
from
  databricks_compute_cluster;
+----------------------+--------------------------------+----------------+-------------------+---------------------+--------------+------------+---------------------------+
| cluster_id           | title                          | cluster_source | creator_user_name | driver_node_type_id | node_type_id | state      | start_time                |
+----------------------+--------------------------------+----------------+-------------------+---------------------+--------------+------------+---------------------------+
| 1234-141524-10b6dv2h | [default]basic-starter-cluster | "API"          | user@turbot.com   | i3.xlarge           | i3.xlarge    | TERMINATED | 2023-07-21T19:45:24+05:30 |
| 1234-061816-mvns8mxz | test-cluster-for-ml            | "UI"           | user@turbot.com   | i3.xlarge           | i3.xlarge    | TERMINATED | 2023-07-28T11:48:16+05:30 |
+----------------------+--------------------------------+----------------+-------------------+---------------------+--------------+------------+---------------------------+
Quick start
Install
Download and install the latest Databricks plugin:
steampipe plugin install databricks
Credentials
Item | Description |
---|---|
Credentials | For Databricks native authentication, specify a named profile from the .databrickscfg file with the profile argument. |
Permissions | Grant READ permissions to your user. |
Radius | Each connection represents a single Databricks installation. |
Resolution | 1. Credentials explicitly set in a Steampipe config file (~/.steampipe/config/databricks.spc). 2. Credentials specified in environment variables, e.g., DATABRICKS_TOKEN. 3. Credentials in the credential file (~/.databrickscfg) for the profile specified in the DATABRICKS_CONFIG_PROFILE environment variable. |
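The three-step resolution order can be modeled as a simple first-match fallback. The Python sketch below is illustrative only and is not the plugin's actual code. The function name `credential_source`, the exact argument and variable lists it checks, and the `"DEFAULT"` fallback profile (taken from the default connection's comments) are all assumptions. The argument list mirrors the one given under "Credentials from Environment Variables".

```python
from typing import AbstractSet, Mapping, Optional, Tuple

# Assumed lists of what counts as "credentials set" at each step (hypothetical).
SPC_CREDENTIAL_ARGS = (
    "profile", "account_id", "client_id", "client_secret",
    "account_host", "workspace_host", "account_token", "workspace_token",
)
ENV_CREDENTIAL_VARS = (
    "DATABRICKS_TOKEN", "DATABRICKS_USERNAME", "DATABRICKS_PASSWORD",
    "DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET",
)


def credential_source(
    spc_args: Mapping[str, str],
    environ: Mapping[str, str],
    cfg_profiles: AbstractSet[str],
) -> Tuple[str, Optional[str]]:
    """Return (source, profile) following the three-step resolution order."""
    # 1. Anything set explicitly in ~/.steampipe/config/databricks.spc wins.
    if any(spc_args.get(arg) for arg in SPC_CREDENTIAL_ARGS):
        return ("spc", spc_args.get("profile"))
    # 2. Otherwise fall back to credential environment variables.
    if any(environ.get(var) for var in ENV_CREDENTIAL_VARS):
        return ("env", None)
    # 3. Finally use ~/.databrickscfg, with the profile named by
    #    DATABRICKS_CONFIG_PROFILE (assumed to default to "DEFAULT").
    profile = environ.get("DATABRICKS_CONFIG_PROFILE") or "DEFAULT"
    if profile in cfg_profiles:
        return ("file", profile)
    return ("none", None)


print(credential_source({}, {"DATABRICKS_CONFIG_PROFILE": "user1-workspace"}, {"user1-workspace"}))
# ('file', 'user1-workspace')
```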
Configuration
Installing the latest databricks plugin will create a config file (~/.steampipe/config/databricks.spc) with a single connection named databricks:
connection "databricks" {
  plugin = "databricks"

  # A connection profile specified within .databrickscfg to use instead of DEFAULT.
  # This can also be set via the `DATABRICKS_CONFIG_PROFILE` environment variable.
  # profile = "databricks-dev"

  # The target Databricks account ID.
  # This can also be set via the `DATABRICKS_ACCOUNT_ID` environment variable.
  # See Locate your account ID: https://docs.databricks.com/administration-guide/account-settings/index.html#account-id.
  # account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"

  # The target Databricks account SCIM token.
  # See: https://docs.databricks.com/administration-guide/account-settings/index.html#generate-a-scim-token
  # This can also be set via the `DATABRICKS_TOKEN` environment variable.
  # account_token = "dsapi5c72c067b40df73ccb6be3b085d3ba"

  # The target Databricks account console URL, which is typically https://accounts.cloud.databricks.com.
  # This can also be set via the `DATABRICKS_HOST` environment variable.
  # account_host = "https://accounts.cloud.databricks.com/"

  # The target Databricks workspace Personal Access Token.
  # This can also be set via the `DATABRICKS_TOKEN` environment variable.
  # See: https://docs.databricks.com/dev-tools/auth.html#databricks-personal-access-tokens-for-users
  # workspace_token = "dapia865b9d1d41389ed883455032d090ee"

  # The target Databricks workspace URL.
  # See https://docs.databricks.com/workspace/workspace-details.html#workspace-url
  # This can also be set via the `DATABRICKS_HOST` environment variable.
  # workspace_host = "https://dbc-a1b2c3d4-e6f7.cloud.databricks.com"

  # The Databricks username part of basic authentication. Only possible when Host is *.cloud.databricks.com (AWS).
  # This can also be set via the `DATABRICKS_USERNAME` environment variable.
  # username = "user@turbot.com"

  # The Databricks password part of basic authentication. Only possible when Host is *.cloud.databricks.com (AWS).
  # This can also be set via the `DATABRICKS_PASSWORD` environment variable.
  # password = "password"

  # A non-default location of the Databricks CLI credentials file.
  # This can also be set via the `DATABRICKS_CONFIG_FILE` environment variable.
  # config_file_path = "/Users/username/.databrickscfg"

  # OAuth secret client ID of a service principal
  # This can also be set via the `DATABRICKS_CLIENT_ID` environment variable.
  # client_id = "123-456-789"

  # OAuth secret value of a service principal
  # This can also be set via the `DATABRICKS_CLIENT_SECRET` environment variable.
  # client_secret = "dose1234567789abcde"
}
By default, all options are commented out in the default connection, so Steampipe resolves your credentials using the same mechanism as the Databricks CLI (Databricks environment variables, the DEFAULT profile, etc.). This is a quick way to get started with Steampipe, but you will probably want to customize your configuration, for example to query multiple accounts or to use credentials from your Databricks profiles.
Multi-Account Connections
You may create multiple databricks connections:
connection "databricks_dev" {
  plugin     = "databricks"
  profile    = "databricks_dev"
  account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
}

connection "databricks_qa" {
  plugin     = "databricks"
  profile    = "databricks_qa"
  account_id = "wxyzd0f81-9be0-4425-9e29-3a7d96782373"
}

connection "databricks_prod" {
  plugin     = "databricks"
  profile    = "databricks_prod"
  account_id = "pqrsd0f81-9be0-4425-9e29-3a7d96782373"
}
Each connection is implemented as a distinct Postgres schema. As such, you can use qualified table names to query a specific connection:
select * from databricks_dev.databricks_iam_account_user;
You can create a multi-account connection by using an aggregator connection. Aggregators allow you to query data from multiple connections for a plugin as if they are a single connection.
connection "databricks_all" {
  plugin      = "databricks"
  type        = "aggregator"
  connections = ["databricks_dev", "databricks_qa", "databricks_prod"]
}
Querying tables from this connection will return results from the databricks_dev, databricks_qa, and databricks_prod connections:
select * from databricks_all.databricks_iam_account_user;
Alternatively, you can use an unqualified name and it will be resolved according to the Search Path. It's a good idea to name your aggregator first alphabetically, so that it is the first connection in the search path (i.e. databricks_all comes before databricks_dev):
select * from databricks_iam_account_user;
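To see why the aggregator's name matters, here is a toy Python model of unqualified-name resolution. It walks the search path and returns the first schema that defines the table. It assumes, as the paragraph above implies, that connection schemas appear in alphabetical order. The helper names are hypothetical, and in practice Postgres itself performs this lookup.

```python
from typing import Dict, List, Set


def default_search_path(connections: List[str]) -> List[str]:
    # Assumption: connection schemas are placed on the search path alphabetically.
    return sorted(connections)


def resolve_table(table: str, search_path: List[str],
                  tables_by_schema: Dict[str, Set[str]]) -> str:
    """Return the schema-qualified name that an unqualified table resolves to."""
    for schema in search_path:
        if table in tables_by_schema.get(schema, set()):
            return f"{schema}.{table}"
    raise LookupError(f"relation {table!r} not found on search path")


conns = ["databricks_dev", "databricks_qa", "databricks_prod", "databricks_all"]
path = default_search_path(conns)
# Every connection (including the aggregator) exposes the same tables.
tables = {c: {"databricks_iam_account_user"} for c in conns}
print(resolve_table("databricks_iam_account_user", path, tables))
# databricks_all.databricks_iam_account_user
```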
Steampipe supports the * wildcard in the connection names. For example, to aggregate all the Databricks plugin connections whose names begin with databricks_:
connection "databricks_all" {
  type        = "aggregator"
  plugin      = "databricks"
  connections = ["databricks_*"]
}
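The wildcard is matched against the names of your defined connections. The following rough Python model uses shell-style `fnmatch` patterns as a stand-in for Steampipe's own matcher. It assumes that only non-aggregator connections of the same plugin are eligible. `Connection` and `expand_aggregator` are hypothetical names used for illustration.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple


class Connection(NamedTuple):
    name: str
    plugin: str
    kind: str  # hypothetical labels: "plugin" or "aggregator"


def expand_aggregator(patterns: List[str], plugin: str,
                      connections: List[Connection]) -> List[str]:
    """Return the connections an aggregator would include, in definition order."""
    members: List[str] = []
    for conn in connections:
        # Assumption: aggregators are not nested, and other plugins are ignored.
        if conn.plugin != plugin or conn.kind == "aggregator":
            continue
        if any(fnmatchcase(conn.name, p) for p in patterns) and conn.name not in members:
            members.append(conn.name)
    return members


defined = [
    Connection("databricks_all", "databricks", "aggregator"),
    Connection("databricks_dev", "databricks", "plugin"),
    Connection("databricks_qa", "databricks", "plugin"),
    Connection("databricks_prod", "databricks", "plugin"),
    Connection("aws_prod", "aws", "plugin"),
]
print(expand_aggregator(["databricks_*"], "databricks", defined))
# ['databricks_dev', 'databricks_qa', 'databricks_prod']
```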
Configuring Databricks Credentials
Databricks Profile Credentials
You may specify a named profile from a Databricks credential file with the profile argument. A connection per profile, using named profiles, is probably the most common configuration:
databricks credential file:
[user1-account]
host = https://accounts.cloud.databricks.com
token = dsapi5c72c067b40df73ccb6be3b085d3ba
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373

[user1-workspace]
host = https://dbc-a1b2c3d4-e6f7.cloud.databricks.com/
token = dapia865b9d1d41389ed883455032d090ee
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373

[user1-basic]
host = https://accounts.cloud.databricks.com
username = user1@turbot.com
password = Pass****word
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373
databricks.spc:
connection "databricks_user1-account" {
  plugin     = "databricks"
  profile    = "user1-account"
  account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
}

connection "databricks_user1-workspace" {
  plugin     = "databricks"
  profile    = "user1-workspace"
  account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
}

connection "databricks_user1-basic" {
  plugin     = "databricks"
  profile    = "user1-basic"
  account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
}
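Because the pattern is one connection per profile, the connection blocks can be generated mechanically from the credential file. The following Python sketch uses the standard library's configparser. The function name, the databricks_<profile> naming (copied from the example above), and the copying of account_id into each block are illustrative choices; Steampipe does not do this for you.

```python
import configparser
from typing import List

# Sample credential file content copied from the example above.
SAMPLE_CFG = """\
[user1-account]
host = https://accounts.cloud.databricks.com
token = dsapi5c72c067b40df73ccb6be3b085d3ba
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373

[user1-workspace]
host = https://dbc-a1b2c3d4-e6f7.cloud.databricks.com/
token = dapia865b9d1d41389ed883455032d090ee
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373
"""


def connections_from_cfg(cfg_text: str) -> List[str]:
    """Render one Steampipe connection block per profile in a .databrickscfg file."""
    # interpolation=None keeps '%' in tokens/passwords literal; a custom
    # default_section stops configparser from merging [DEFAULT] into every profile.
    parser = configparser.ConfigParser(interpolation=None, default_section="__none__")
    parser.read_string(cfg_text)
    blocks = []
    for profile in parser.sections():
        lines = [
            f'connection "databricks_{profile}" {{',
            '  plugin     = "databricks"',
            f'  profile    = "{profile}"',
        ]
        account_id = parser.get(profile, "account_id", fallback=None)
        if account_id:
            lines.append(f'  account_id = "{account_id}"')
        lines.append("}")
        blocks.append("\n".join(lines))
    return blocks


print("\n\n".join(connections_from_cfg(SAMPLE_CFG)))
```

The printed output can be pasted into databricks.spc. Review it first, because profile names become part of the connection names.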
Databricks Account Credentials
Configuration to query a Databricks account.
databricks credential file:
[user1-account]
host = https://accounts.cloud.databricks.com
token = dsapi5c72c067b40df73ccb6be3b085d3ba
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373
databricks.spc:
connection "databricks_user1-account" {
  plugin     = "databricks"
  profile    = "user1-account"
  account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
}
Databricks Workspace Credentials
Configuration to query a Databricks workspace.
databricks credential file:
[user1-workspace]
host = https://dbc-a1b2c3d4-e6f7.cloud.databricks.com/
token = dapia865b9d1d41389ed883455032d090ee
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373
databricks.spc:
connection "databricks_user1-workspace" {
  plugin     = "databricks"
  profile    = "user1-workspace"
  account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
}
Databricks Account and Workspace Credentials
Configuration to query both a Databricks workspace and a Databricks account using the same connection.
databricks.spc:
connection "databricks_user1-workspace" {
  plugin = "databricks"

  account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"

  account_host  = "https://accounts.cloud.databricks.com/"
  account_token = "dsapi5c72c067b40df73ccb6be3b085d3ba"

  workspace_host  = "https://dbc-a1b2c3d4-e6f7.cloud.databricks.com/"
  workspace_token = "dapia865b9d1d41389ed883455032d090ee"
}
Databricks OAuth credentials
Configuration to query a Databricks workspace using OAuth for service principals.
databricks.spc:
connection "databricks_user1-account" {
  plugin         = "databricks"
  account_id     = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
  workspace_host = "https://dbc-a1b2c3d4-e6f7.cloud.databricks.com/"

  client_id     = "123-456-789"
  client_secret = "dose1234567789abcde"
}
Credentials from Environment Variables
Alternatively, you can use the standard Databricks environment variables to obtain credentials, but only if none of the other arguments (profile, account_id, client_id/client_secret/account_host/workspace_host, account_token/account_host, workspace_token/workspace_host) is specified in the connection:
export DATABRICKS_CONFIG_PROFILE=user1-test
export DATABRICKS_TOKEN=dsapi5c72c067b40df73ccb6be3b085d3ba
export DATABRICKS_HOST=https://accounts.cloud.databricks.com
export DATABRICKS_ACCOUNT_ID=abcdd0f81-9be0-4425-9e29-3a7d96782373
export DATABRICKS_CLIENT_ID=123-456-789
export DATABRICKS_CLIENT_SECRET=dose1234567789abcde
export DATABRICKS_USERNAME=user@turbot.com
export DATABRICKS_PASSWORD=password
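The comments in the default databricks.spc pair each connection argument with an environment variable. The Python sketch below simply restates that mapping as data, so you can see which arguments a given environment would fill. The dictionary and function names are hypothetical, and this is not how the plugin reads its environment.

```python
from typing import Dict, Mapping

# Connection argument -> environment variable, as listed in the default
# databricks.spc comments. DATABRICKS_HOST and DATABRICKS_TOKEN each back two arguments.
ENV_FOR_ARG: Dict[str, str] = {
    "profile": "DATABRICKS_CONFIG_PROFILE",
    "account_id": "DATABRICKS_ACCOUNT_ID",
    "account_token": "DATABRICKS_TOKEN",
    "account_host": "DATABRICKS_HOST",
    "workspace_token": "DATABRICKS_TOKEN",
    "workspace_host": "DATABRICKS_HOST",
    "username": "DATABRICKS_USERNAME",
    "password": "DATABRICKS_PASSWORD",
    "config_file_path": "DATABRICKS_CONFIG_FILE",
    "client_id": "DATABRICKS_CLIENT_ID",
    "client_secret": "DATABRICKS_CLIENT_SECRET",
}


def args_from_env(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return the connection arguments the given environment would supply (empty values skipped)."""
    return {arg: environ[var] for arg, var in ENV_FOR_ARG.items() if environ.get(var)}


HOST = "https://accounts.cloud.databricks.com"
TOKEN = "dsapi5c72c067b40df73ccb6be3b085d3ba"
print(args_from_env({"DATABRICKS_HOST": HOST, "DATABRICKS_TOKEN": TOKEN}))
```

Because one DATABRICKS_HOST/DATABRICKS_TOKEN pair feeds both the account_* and workspace_* arguments, environment variables can presumably point at only one host at a time. To query account and workspace APIs together, set the arguments explicitly in the connection, as in the account-and-workspace example above.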
Postgres FDW
This plugin is available as a native Postgres FDW. Unlike the Steampipe CLI, which ships with an embedded Postgres server instance, the Postgres FDW can be installed in any supported Postgres database version.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_postgres_installer.sh script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/postgres.sh)" -- databricks
The installer will prompt you for the plugin name and version, download and install the appropriate files for your OS, system architecture, and Postgres version.
To configure the Postgres FDW, you will create an extension, foreign server, and schema and import the foreign schema.
CREATE EXTENSION IF NOT EXISTS steampipe_postgres_databricks;
CREATE SERVER steampipe_databricks FOREIGN DATA WRAPPER steampipe_postgres_databricks OPTIONS (config '<your_config>');
CREATE SCHEMA databricks;
IMPORT FOREIGN SCHEMA databricks FROM SERVER steampipe_databricks INTO databricks;
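The config is embedded in a single-quoted SQL string literal, so any single quote inside it must be doubled. The Python sketch below builds the same four statements with that escaping applied. It assumes the config text consists of the connection arguments you would otherwise put in databricks.spc (for example profile = "user1-workspace"). The helper name is hypothetical.

```python
def fdw_setup_sql(config: str) -> str:
    """Build the Postgres FDW setup statements for a given connection config."""
    # Double any single quotes so the config stays a valid SQL string literal.
    literal = "'" + config.replace("'", "''") + "'"
    return "\n".join([
        "CREATE EXTENSION IF NOT EXISTS steampipe_postgres_databricks;",
        "CREATE SERVER steampipe_databricks FOREIGN DATA WRAPPER steampipe_postgres_databricks "
        f"OPTIONS (config {literal});",
        "CREATE SCHEMA databricks;",
        "IMPORT FOREIGN SCHEMA databricks FROM SERVER steampipe_databricks INTO databricks;",
    ])


print(fdw_setup_sql('profile = "user1-workspace"'))
```

The output can be saved to a file and run with psql against the database where the FDW is installed.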
SQLite Extension
This plugin is available as a SQLite Extension, making the tables available as SQLite virtual tables.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_sqlite_installer.sh script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/sqlite.sh)" -- databricks
The installer will prompt you for the plugin name, version, and destination directory. It will then determine the OS and system architecture, and it will download and install the appropriate package.
To configure the SQLite extension, load the extension module and then run the steampipe_configure_databricks function to configure it with plugin-specific options.
$ sqlite3
sqlite> .load ./steampipe_sqlite_extension_databricks.so
sqlite> select steampipe_configure_databricks('<your_config>');
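The same steps can be scripted with Python's built-in sqlite3 module. Binding the config as a query parameter avoids escaping the quotes it contains. This is a sketch under the following assumptions: the extension file sits at the path used in the session above, and your Python build supports loadable extensions. Some builds, such as certain system Pythons on macOS, lack `enable_load_extension`. The helper name is hypothetical.

```python
import sqlite3
from typing import Optional

# Path from the sqlite3 session above; adjust for your platform (assumption).
EXTENSION_PATH = "./steampipe_sqlite_extension_databricks.so"


def configure_databricks(conn: sqlite3.Connection, config: str,
                         extension_path: Optional[str] = EXTENSION_PATH) -> None:
    """Load the Databricks extension (if a path is given) and pass it a config."""
    if extension_path is not None:
        # Requires a Python build compiled with loadable-extension support.
        conn.enable_load_extension(True)
        try:
            conn.load_extension(extension_path)
        finally:
            conn.enable_load_extension(False)
    # Bind the config as a parameter instead of splicing it into the SQL text.
    conn.execute("select steampipe_configure_databricks(?)", (config,)).fetchone()
```

Typical usage would be `configure_databricks(sqlite3.connect(":memory:"), 'profile = "user1-workspace"')`, followed by ordinary queries such as `select cluster_id, state from databricks_compute_cluster`.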
Export
This plugin is available as a standalone Export CLI. Steampipe exporters are standalone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks
You can pass the configuration to the command with the --config argument:
steampipe_export_databricks --config '<your_config>' <table_name>
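When you call the exporter from a script, passing its arguments as a list avoids shell quoting of the config string. The following Python sketch uses hypothetical helper names. It relies only on the --config flag and table-name argument shown above, assumes the binary is on your PATH, and makes no assumption about the output format.

```python
import subprocess
from typing import List


def export_command(config: str, table: str) -> List[str]:
    """Build the argv for steampipe_export_databricks using only the documented arguments."""
    # A list (no shell) means quotes inside the config need no escaping.
    return ["steampipe_export_databricks", "--config", config, table]


def run_export(config: str, table: str) -> str:
    """Run the exporter and return whatever it writes to stdout."""
    result = subprocess.run(export_command(config, table), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

For example, `run_export('profile = "user1-workspace"', "databricks_compute_cluster")` would export the cluster table shown at the top of this page.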