Databricks + Steampipe

Databricks is a unified set of tools for building, deploying, sharing, and maintaining enterprise-grade data solutions at scale.

Steampipe is an open-source zero-ETL engine to instantly query cloud APIs using SQL.

List details of your Databricks clusters:

select
  cluster_id,
  title,
  cluster_source,
  creator_user_name,
  driver_node_type_id,
  node_type_id,
  state,
  start_time
from
  databricks_compute_cluster;
+----------------------+--------------------------------+----------------+-------------------+---------------------+--------------+------------+---------------------------+
| cluster_id | title | cluster_source | creator_user_name | driver_node_type_id | node_type_id | state | start_time |
+----------------------+--------------------------------+----------------+-------------------+---------------------+--------------+------------+---------------------------+
| 1234-141524-10b6dv2h | [default]basic-starter-cluster | "API" | user@turbot.com | i3.xlarge | i3.xlarge | TERMINATED | 2023-07-21T19:45:24+05:30 |
| 1234-061816-mvns8mxz | test-cluster-for-ml | "UI" | user@turbot.com | i3.xlarge | i3.xlarge | TERMINATED | 2023-07-28T11:48:16+05:30 |
+----------------------+--------------------------------+----------------+-------------------+---------------------+--------------+------------+---------------------------+
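
You can filter or aggregate on the same columns; for example, a quick sketch that counts clusters per state:

select
  state,
  count(*)
from
  databricks_compute_cluster
group by
  state;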

Quick start

Install

Download and install the latest Databricks plugin:

steampipe plugin install databricks

Credentials

Credentials: For Databricks native authentication, specify a named profile from the .databrickscfg file with the profile argument.
Permissions: Grant READ permissions to your user.
Radius: Each connection represents a single Databricks installation.
Resolution:
1. Credentials explicitly set in a steampipe config file (~/.steampipe/config/databricks.spc).
2. Credentials specified in environment variables, e.g., DATABRICKS_TOKEN.
3. Credentials in the credential file (~/.databrickscfg) for the profile specified in the DATABRICKS_CONFIG_PROFILE environment variable.

Configuration

Installing the latest databricks plugin will create a config file (~/.steampipe/config/databricks.spc) with a single connection named databricks:

connection "databricks" {
plugin = "databricks"
# A connection profile specified within .databrickscfg to use instead of DEFAULT.
# This can also be set via the `DATABRICKS_CONFIG_PROFILE` environment variable.
# profile = "databricks-dev"
# The target Databricks account ID.
# This can also be set via the `DATABRICKS_ACCOUNT_ID` environment variable.
# See Locate your account ID: https://docs.databricks.com/administration-guide/account-settings/index.html#account-id.
# account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
# The target Databricks account SCIM token.
# See: https://docs.databricks.com/administration-guide/account-settings/index.html#generate-a-scim-token
# This can also be set via the `DATABRICKS_TOKEN` environment variable.
# account_token = "dsapi5c72c067b40df73ccb6be3b085d3ba"
# The target Databricks account console URL, which is typically https://accounts.cloud.databricks.com.
# This can also be set via the `DATABRICKS_HOST` environment variable.
# account_host = "https://accounts.cloud.databricks.com/"
# The target Databricks workspace Personal Access Token.
# This can also be set via the `DATABRICKS_TOKEN` environment variable.
# See: https://docs.databricks.com/dev-tools/auth.html#databricks-personal-access-tokens-for-users
# workspace_token = "dapia865b9d1d41389ed883455032d090ee"
# The target Databricks workspace URL.
# See https://docs.databricks.com/workspace/workspace-details.html#workspace-url
# This can also be set via the `DATABRICKS_HOST` environment variable.
# workspace_host = "https://dbc-a1b2c3d4-e6f7.cloud.databricks.com"
# The Databricks username part of basic authentication. Only possible when Host is *.cloud.databricks.com (AWS).
# This can also be set via the `DATABRICKS_USERNAME` environment variable.
# username = "user@turbot.com"
# The Databricks password part of basic authentication. Only possible when Host is *.cloud.databricks.com (AWS).
# This can also be set via the `DATABRICKS_PASSWORD` environment variable.
# password = "password"
# A non-default location of the Databricks CLI credentials file.
# This can also be set via the `DATABRICKS_CONFIG_FILE` environment variable.
# config_file_path = "/Users/username/.databrickscfg"
# OAuth secret client ID of a service principal
# This can also be set via the `DATABRICKS_CLIENT_ID` environment variable.
# client_id = "123-456-789"
# OAuth secret value of a service principal
# This can also be set via the `DATABRICKS_CLIENT_SECRET` environment variable.
# client_secret = "dose1234567789abcde"
}

By default, all options are commented out in the default connection, so Steampipe will resolve your credentials using the same mechanism as the Databricks CLI (Databricks environment variables, the DEFAULT profile, etc.). This is a quick way to get started with Steampipe, but you will probably want to customize your experience using the configuration options, e.g., to query multiple accounts or to set credentials from specific Databricks profiles.
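
As a quick check that your credentials resolve, you can query any of the plugin's tables; for example (assuming account-level credentials are configured, and reusing the databricks_iam_account_user table from the examples below):

select
  count(*)
from
  databricks_iam_account_user;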

Multi-Account Connections

You may create multiple databricks connections:

connection "databricks_dev" {
plugin = "databricks"
profile = "databricks_dev"
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373
}
connection "databricks_qa" {
plugin = "databricks"
profile = "databricks_qa"
account_id = wxyzd0f81-9be0-4425-9e29-3a7d96782373
}
connection "databricks_prod" {
plugin = "databricks"
profile = "databricks_prod"
account_id = pqrsd0f81-9be0-4425-9e29-3a7d96782373
}

Each connection is implemented as a distinct Postgres schema. As such, you can use qualified table names to query a specific connection:

select
  *
from
  databricks_dev.databricks_iam_account_user;

You can create a multi-account connection by using an aggregator connection. Aggregators allow you to query data from multiple connections for a plugin as if they are a single connection.

connection "databricks_all" {
plugin = "databricks"
type = "aggregator"
connections = ["databricks_dev", "databricks_qa", "databricks_prod"]
}

Querying tables from this connection will return results from the databricks_dev, databricks_qa, and databricks_prod connections:

select
  *
from
  databricks_all.databricks_iam_account_user;
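
Every Steampipe table also exposes a _ctx column that records which member connection each row came from, which is useful when reading aggregated results; a small sketch:

select
  _ctx ->> 'connection_name' as connection_name,
  count(*)
from
  databricks_all.databricks_iam_account_user
group by
  connection_name;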

Alternatively, you can use an unqualified name and it will be resolved according to the Search Path. It's a good idea to name your aggregator first alphabetically so that it is the first connection in the search path (i.e., databricks_all comes before databricks_dev):

select
  *
from
  databricks_iam_account_user;

Steampipe supports the * wildcard in the connection names. For example, to aggregate all the Databricks plugin connections whose names begin with databricks_:

connection "databricks_all" {
type = "aggregator"
plugin = "databricks"
connections = ["databricks_*"]
}

Configuring Databricks Credentials

Databricks Profile Credentials

You may specify a named profile from a Databricks credential file with the profile argument. A connection per profile, using named profiles, is probably the most common configuration:

databricks credential file:

[user1-account]
host = https://accounts.cloud.databricks.com
token = dsapi5c72c067b40df73ccb6be3b085d3ba
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373

[user1-workspace]
host = https://dbc-a1b2c3d4-e6f7.cloud.databricks.com/
token = dapia865b9d1d41389ed883455032d090ee
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373

[user1-basic]
host = https://accounts.cloud.databricks.com
username = user1@turbot.com
password = Pass****word
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373

databricks.spc:

connection "databricks_user1-account" {
plugin = "databricks"
profile = "user1-account"
account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
}
connection "databricks_user1-workspace" {
plugin = "databricks"
profile = "user1-workspace"
account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
}
connection "databricks_user1-basic" {
plugin = "databricks"
profile = "user1-basic"
account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
}

Databricks Account Credentials

Configuration to query a Databricks account.

databricks credential file:

[user1-account]
host = https://accounts.cloud.databricks.com
token = dsapi5c72c067b40df73ccb6be3b085d3ba
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373

databricks.spc:

connection "databricks_user1-account" {
plugin = "databricks"
profile = "user1-account"
account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
}

Databricks Workspace Credentials

Configuration to query a Databricks workspace.

databricks credential file:

[user1-workspace]
host = https://dbc-a1b2c3d4-e6f7.cloud.databricks.com/
token = dapia865b9d1d41389ed883455032d090ee
account_id = abcdd0f81-9be0-4425-9e29-3a7d96782373

databricks.spc:

connection "databricks_user1-workspace" {
plugin = "databricks"
profile = "user1-workspace"
account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
}

Databricks Account and Workspace Credentials

Configuration to query a Databricks workspace and account using the same connection.

databricks.spc:

connection "databricks_user1-workspace" {
plugin = "databricks"
account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
account_host = "https://accounts.cloud.databricks.com/"
account_token = "dsapi5c72c067b40df73ccb6be3b085d3ba"
workspace_host = "https://dbc-a1b2c3d4-e6f7.cloud.databricks.com/"
workspace_token = "dapia865b9d1d41389ed883455032d090ee"
}
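
With both sets of credentials in a single connection, account-level and workspace-level tables can be queried side by side; a minimal sketch using tables shown elsewhere on this page:

-- account-level table
select count(*) from databricks_iam_account_user;

-- workspace-level table
select count(*) from databricks_compute_cluster;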

Databricks OAuth Credentials

Configuration to query a Databricks workspace using OAuth for service principals.

databricks.spc:

connection "databricks_user1-account" {
plugin = "databricks"
account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"
workspace_host = "https://dbc-a1b2c3d4-e6f7.cloud.databricks.com/"
client_id = "123-456-789"
client_secret = "dose1234567789abcde"
}

Credentials from Environment Variables

Alternatively, you can use the standard Databricks environment variables to obtain credentials; they are only used if no other arguments (profile, account_id, client_id/client_secret/account_host/workspace_host, account_token/account_host/workspace_token/workspace_host) are specified in the connection:

export DATABRICKS_CONFIG_PROFILE=user1-test
export DATABRICKS_TOKEN=dsapi5c72c067b40df73ccb6be3b085d3ba
export DATABRICKS_HOST=https://accounts.cloud.databricks.com
export DATABRICKS_ACCOUNT_ID=abcdd0f81-9be0-4425-9e29-3a7d96782373
export DATABRICKS_CLIENT_ID=123-456-789
export DATABRICKS_CLIENT_SECRET=dose1234567789abcde
export DATABRICKS_USERNAME=user@turbot.com
export DATABRICKS_PASSWORD=password

Postgres FDW

This plugin is available as a native Postgres FDW. Unlike the Steampipe CLI, which ships with an embedded Postgres server instance, the Postgres FDW can be installed in any supported Postgres database version.

You can download the tarball for your platform from the Releases page, but it is simplest to install it with the steampipe_postgres_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/postgres.sh)" -- databricks

The installer will prompt you for the plugin name and version, then download and install the appropriate files for your OS, system architecture, and Postgres version.

To configure the Postgres FDW, you will create an extension, a foreign server, and a schema, then import the foreign schema.

CREATE EXTENSION IF NOT EXISTS steampipe_postgres_databricks;
CREATE SERVER steampipe_databricks FOREIGN DATA WRAPPER steampipe_postgres_databricks OPTIONS (config '<your_config>');
CREATE SCHEMA databricks;
IMPORT FOREIGN SCHEMA databricks FROM SERVER steampipe_databricks INTO databricks;
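
Once the foreign schema is imported, the plugin's tables behave like ordinary Postgres foreign tables; for example, reusing the cluster query from the top of this page:

SELECT cluster_id, state FROM databricks.databricks_compute_cluster;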

SQLite Extension

This plugin is available as a SQLite extension, which exposes the plugin's tables as SQLite virtual tables.

You can download the tarball for your platform from the Releases page, but it is simplest to install it with the steampipe_sqlite_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/sqlite.sh)" -- databricks

The installer will prompt you for the plugin name, version, and destination directory. It will then determine the OS and system architecture, and it will download and install the appropriate package.

To configure the SQLite extension, load the extension module and then run the steampipe_configure_databricks function to configure it with plugin-specific options.

$ sqlite3
sqlite> .load ./steampipe_sqlite_extension_databricks.so
sqlite> select steampipe_configure_databricks('<your_config>');
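
Once configured, the virtual tables can be queried directly; for example:

sqlite> select cluster_id, state from databricks_compute_cluster;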

Export

This plugin is available as a standalone Export CLI. Steampipe exporters are standalone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install it with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks

You can pass the configuration to the command with the --config argument:

steampipe_export_databricks --config '<your_config>' <table_name>
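
For example, a sketch that exports the cluster table, reusing the user1-account profile defined earlier on this page:

steampipe_export_databricks --config 'profile = "user1-account"' databricks_compute_cluster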