turbot/databricks
steampipe plugin install databricks

Table: databricks_workspace_repo - Query Databricks Workspace Repositories using SQL

Databricks Workspace Repositories are a Databricks feature that allows users to manage and version control notebooks and other workspace objects. Each repository can be linked to a remote Git repository, enabling integration with existing version control workflows. This provides a robust and efficient way to manage, track, and version data science and machine learning workflows.

Table Usage Guide

The databricks_workspace_repo table provides insights into Databricks Workspace Repositories. As a data scientist or DevOps engineer, explore repository-specific details through this table, including repository ID, path, branch, Git provider, and head commit. Utilize it to manage and track your data science and machine learning workflows, ensuring efficient version control and workflow management.

Examples

Basic info

Explore which Databricks workspace repositories are linked to your account. This helps you review each repository's path, branch, and Git provider, and identify the commit currently checked out as the repository's HEAD.

select
  id,
  path,
  branch,
  provider,
  head_commit_id,
  url,
  account_id
from
  databricks_workspace_repo;

List the master repositories

Identify repositories in your Databricks workspace that are checked out to the master branch. This can help in understanding your workspace's code base, tracking changes, and managing versions.

select
  id,
  path,
  branch,
  provider,
  head_commit_id,
  url,
  account_id
from
  databricks_workspace_repo
where
  branch = 'master';

List repositories for the GitHub provider

Analyze your Databricks workspace to identify all repositories linked with the GitHub provider. This can help in understanding the codebase distribution and managing repositories effectively.

select
  id,
  path,
  branch,
  provider,
  head_commit_id,
  url,
  account_id
from
  databricks_workspace_repo
where
  provider = 'gitHub';

List patterns included for sparse checkout

Explore which patterns are included for sparse checkout in a Databricks workspace repository. This can help in understanding the specific files or directories that are included in the workspace without downloading the entire repository, aiding in efficient data management.

PostgreSQL:

select
  id,
  path,
  branch,
  patterns,
  account_id
from
  databricks_workspace_repo,
  jsonb_array_elements_text(sparse_checkout_patterns) as patterns;

SQLite:

select
  id,
  path,
  branch,
  patterns.value as patterns,
  account_id
from
  databricks_workspace_repo,
  json_each(sparse_checkout_patterns) as patterns;

List total repos per provider

Gain insights into the distribution of repositories across different providers. This is useful for understanding which providers are most commonly used for hosting repositories in your Databricks workspace.

select
  provider,
  count(*) as total_repos
from
  databricks_workspace_repo
group by
  provider;

Schema for databricks_workspace_repo

Name                     | Type   | Operators | Description
_ctx                     | jsonb  |           | Steampipe context in JSON form, e.g. connection_name.
account_id               | text   |           | The Databricks Account ID in which the resource is located.
branch                   | text   |           | Branch that the local version of the repo is checked out to.
head_commit_id           | text   |           | SHA-1 hash representing the commit ID of the current HEAD of the repo.
id                       | bigint | =         | ID of the repo object in the workspace.
path                     | text   | =         | Path for the repo in the workspace.
provider                 | text   |           | Git provider.
sparse_checkout_patterns | jsonb  |           | List of patterns to include for sparse checkout.
title                    | text   |           | The title of the resource.
url                      | text   |           | URL of the Git repo to be linked.
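
The = operator on the id and path columns indicates they can be used as key column qualifiers, so equality filters on them are passed to the Databricks API rather than applied only after listing. As a minimal sketch (the repo ID below is illustrative, not a real value), a single repository can be looked up by its ID:

select
  id,
  path,
  branch,
  head_commit_id
from
  databricks_workspace_repo
where
  id = 123456789012345;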

Export

This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks

You can pass the configuration to the command with the --config argument:

steampipe_export_databricks --config '<your_config>' databricks_workspace_repo
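
As a minimal sketch, assuming the exporter accepts the same connection arguments documented for the Databricks plugin (the workspace host and token below are placeholders, not real credentials), the configuration could look like:

steampipe_export_databricks --config 'workspace_host = "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com"
workspace_token = "<your_workspace_token>"' databricks_workspace_repo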