# Table: databricks_workspace_repo - Query Databricks Workspace Repositories using SQL
Databricks Workspace Repositories are a feature of Databricks that allows users to manage and version control notebooks and other workspace objects. Each repository can be linked to a remote Git repository, enabling seamless integration with existing version control workflows and providing a robust, efficient way to manage, track, and version control data science and machine learning workflows.
## Table Usage Guide
The `databricks_workspace_repo` table provides insights into Databricks Workspace Repositories. As a data scientist or DevOps engineer, explore repository-specific details through this table, including the repository ID, path, branch, and Git provider. Use it to manage and track your data science and machine learning workflows, ensuring efficient version control and workflow management.
## Examples
### Basic info
Explore which Databricks workspace repositories are linked to your account. This lets you review each repository's path, branch, and Git provider, and pinpoint where changes have been made.
```sql+postgres
select
  id,
  path,
  branch,
  provider,
  head_commit_id,
  url,
  account_id
from
  databricks_workspace_repo;
```

```sql+sqlite
select
  id,
  path,
  branch,
  provider,
  head_commit_id,
  url,
  account_id
from
  databricks_workspace_repo;
```
### List the master repositories
Identify the repositories in your Databricks workspace that are checked out to the master branch. This can help in understanding your workspace's code base, tracking changes, and managing versions.
```sql+postgres
select
  id,
  path,
  branch,
  provider,
  head_commit_id,
  url,
  account_id
from
  databricks_workspace_repo
where
  branch = 'master';
```

```sql+sqlite
select
  id,
  path,
  branch,
  provider,
  head_commit_id,
  url,
  account_id
from
  databricks_workspace_repo
where
  branch = 'master';
```
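Note that newer Git hosts often default to `main` rather than `master`. As a minimal sketch covering both common default branch names (the branch list is illustrative, and the query is shown in the Postgres dialect; the SQLite form is identical):

```sql+postgres
select
  id,
  path,
  branch
from
  databricks_workspace_repo
where
  branch in ('master', 'main');
```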
### List repositories for the GitHub provider
Analyze your Databricks workspace to identify all repositories linked to the GitHub provider. This can help in understanding the codebase distribution and managing repositories effectively. Note that Databricks reports Git providers in camelCase, so GitHub appears as `gitHub`.
```sql+postgres
select
  id,
  path,
  branch,
  provider,
  head_commit_id,
  url,
  account_id
from
  databricks_workspace_repo
where
  provider = 'gitHub';
```

```sql+sqlite
select
  id,
  path,
  branch,
  provider,
  head_commit_id,
  url,
  account_id
from
  databricks_workspace_repo
where
  provider = 'gitHub';
```
### List patterns included for sparse checkout
Explore which patterns are included for sparse checkout in a Databricks workspace repository. Sparse checkout lets a repo include only specific files or directories without cloning the entire repository, aiding in efficient data management.
```sql+postgres
select
  id,
  path,
  branch,
  patterns,
  account_id
from
  databricks_workspace_repo,
  jsonb_array_elements_text(sparse_checkout_patterns) as patterns;
```

```sql+sqlite
select
  id,
  path,
  branch,
  patterns.value as patterns,
  account_id
from
  databricks_workspace_repo,
  json_each(sparse_checkout_patterns) as patterns;
```
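Since `sparse_checkout_patterns` is a `jsonb` array of strings, in Postgres you can also test it directly with the `?` containment operator to find repos whose sparse checkout includes a particular pattern (the pattern value below is hypothetical):

```sql+postgres
select
  id,
  path
from
  databricks_workspace_repo
where
  sparse_checkout_patterns ? 'notebooks/*';
```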
### List total repos per provider
Gain insights into the distribution of repositories across different providers. This is useful for understanding which providers are most commonly used for hosting repositories in your Databricks workspace.
```sql+postgres
select
  provider,
  count(*) as total_repos
from
  databricks_workspace_repo
group by
  provider;
```

```sql+sqlite
select
  provider,
  count(*) as total_repos
from
  databricks_workspace_repo
group by
  provider;
```
## Schema for databricks_workspace_repo
| Name | Type | Operators | Description |
|---|---|---|---|
| _ctx | jsonb | | Steampipe context in JSON form, e.g. connection_name. |
| account_id | text | | The Databricks Account ID in which the resource is located. |
| branch | text | | Branch that the local version of the repo is checked out to. |
| head_commit_id | text | | SHA-1 hash representing the commit ID of the current HEAD of the repo. |
| id | bigint | = | ID of the repo object in the workspace. |
| path | text | = | Path for the repo in the workspace. |
| provider | text | | Git provider. |
| sparse_checkout_patterns | jsonb | | List of patterns to include for sparse checkout. |
| title | text | | The title of the resource. |
| url | text | | URL of the Git repo to be linked. |
## Export
This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install it with the `steampipe_export_installer.sh` script:
```sh
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks
```
You can pass the configuration to the command with the `--config` argument:
```sh
steampipe_export_databricks --config '<your_config>' databricks_workspace_repo
```
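As a minimal sketch, assuming token-based workspace authentication, the config string carries the same key-value arguments as the plugin's connection config (the host and token values below are placeholders, and the argument names shown are assumptions; check the Databricks plugin's connection documentation for the exact names your setup requires):

```sh
# Placeholder workspace host and token; substitute your own values.
steampipe_export_databricks \
  --config 'workspace_host = "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com"
            workspace_token = "dapi..."' \
  databricks_workspace_repo
```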