Table: databricks_sql_query_history - Query Databricks SQL Query History using SQL
Databricks SQL Query History is a feature of Databricks that keeps a record of the SQL queries executed in a Databricks workspace. It provides details about each query such as the query text, execution time, user who ran the query, and more. This information can be useful for auditing, performance tuning, and understanding the usage patterns of the Databricks workspace.
Table Usage Guide
The databricks_sql_query_history table provides insights into the SQL queries executed within a Databricks workspace. As a data analyst or data engineer, you can explore the details of past queries through this table, including the query text, execution time, user who ran the query, and more. Utilize it to audit the usage of the Databricks workspace, tune the performance of your SQL queries, and understand the usage patterns of your team.
Examples
Basic info
Analyze your Databricks SQL query history to gain insights into user activity and query performance. This can help you identify trends, optimize resource usage, and improve overall system efficiency.
select
  query_id,
  warehouse_id,
  executed_as_user_name,
  query_text,
  rows_produced,
  account_id
from
  databricks_sql_query_history;
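To narrow the history to a single user, you can filter on user_name. A minimal sketch, where the email address is a placeholder you would replace with a real workspace user:

select
  query_id,
  warehouse_id,
  query_text,
  rows_produced,
  account_id
from
  databricks_sql_query_history
where
  -- 'analyst@example.com' is a placeholder; substitute a real user
  user_name = 'analyst@example.com';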
List all DML queries
Explore which data modifications have been made in your Databricks warehouse. This is useful for tracking changes and understanding the impact of various insertions, updates, and deletions within your data.
select
  query_id,
  warehouse_id,
  executed_as_user_name,
  query_text,
  rows_produced,
  statement_type,
  account_id
from
  databricks_sql_query_history
where
  statement_type in ('INSERT', 'UPDATE', 'DELETE');
List all failed queries
Analyze your Databricks query history to pinpoint specific instances where queries have failed. This can help you identify common issues and improve the overall efficiency of your database operations.
select
  query_id,
  query_text,
  error_message,
  account_id
from
  databricks_sql_query_history
where
  status = 'FAILED';
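To see which users encounter failures most often, the same filter can feed an aggregate. A minimal sketch:

select
  executed_as_user_name,
  count(*) as failed_queries
from
  databricks_sql_query_history
where
  status = 'FAILED'
group by
  executed_as_user_name
order by
  failed_queries desc;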
List queries with no more expected updates
Explore queries for which no further updates are expected, gaining insight into any errors or issues once results are final. This is useful for identifying and troubleshooting problematic queries within your Databricks account.
select
  query_id,
  query_text,
  error_message,
  account_id
from
  databricks_sql_query_history
where
  is_final;
List queries that have existing plans
Discover the queries that have existing execution plans in your Databricks SQL query history. This can be useful for identifying potential errors or inefficiencies in your queries.
select
  query_id,
  query_text,
  error_message,
  account_id
from
  databricks_sql_query_history
where
  plans_state = 'EXISTS';
List query history by order of duration
Analyze the history of executed queries to understand which ones took the most time to complete. This can help in pinpointing inefficient queries that may need optimization for better performance.
select
  query_id,
  rows_produced,
  duration,
  executed_as_user_name,
  query_text,
  account_id
from
  databricks_sql_query_history
order by
  duration desc;
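The duration column appears to be reported in milliseconds, consistent with the API's other *_ms timing fields, so converting it makes the output easier to read. A minimal sketch that filters to queries running longer than an arbitrary one-minute threshold:

select
  query_id,
  executed_as_user_name,
  duration / 1000.0 as duration_seconds, -- assumes duration is in milliseconds
  query_text
from
  databricks_sql_query_history
where
  duration > 60000
order by
  duration desc;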
List metrics for each query execution
Determine the performance of each executed query by assessing factors such as compilation time, execution time, and total time taken. This can help in identifying inefficient queries and optimizing them for better performance.
select
  query_id,
  metrics ->> 'compilation_time_ms' as compilation_time_ms,
  metrics ->> 'execution_time_ms' as execution_time_ms,
  metrics ->> 'network_sent_bytes' as network_sent_bytes,
  metrics ->> 'read_bytes' as read_bytes,
  metrics ->> 'result_fetch_time_ms' as result_fetch_time_ms,
  metrics ->> 'result_from_cache' as result_from_cache,
  metrics ->> 'rows_read_count' as rows_read_count,
  metrics ->> 'total_time_ms' as total_time_ms
from
  databricks_sql_query_history;

-- SQLite-compatible variant using json_extract:
select
  query_id,
  json_extract(metrics, '$.compilation_time_ms') as compilation_time_ms,
  json_extract(metrics, '$.execution_time_ms') as execution_time_ms,
  json_extract(metrics, '$.network_sent_bytes') as network_sent_bytes,
  json_extract(metrics, '$.read_bytes') as read_bytes,
  json_extract(metrics, '$.result_fetch_time_ms') as result_fetch_time_ms,
  json_extract(metrics, '$.result_from_cache') as result_from_cache,
  json_extract(metrics, '$.rows_read_count') as rows_read_count,
  json_extract(metrics, '$.total_time_ms') as total_time_ms
from
  databricks_sql_query_history;
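Because metrics is a JSONB column, the ->> operator returns its values as text, so a cast is needed before doing arithmetic on them. A minimal PostgreSQL-syntax sketch that averages execution time per warehouse:

select
  warehouse_id,
  count(*) as query_count,
  -- ->> yields text, so cast to bigint before averaging
  avg((metrics ->> 'execution_time_ms')::bigint) as avg_execution_time_ms
from
  databricks_sql_query_history
where
  metrics is not null
group by
  warehouse_id
order by
  avg_execution_time_ms desc;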
Schema for databricks_sql_query_history
Name | Type | Operators | Description |
---|---|---|---|
_ctx | jsonb | | Steampipe context in JSON form, e.g. connection_name. |
account_id | text | | The Databricks Account ID in which the resource is located. |
channel_used | jsonb | | Channel information for the SQL warehouse at the time of query execution. |
duration | bigint | | Total execution time of the query from the client's point of view. |
endpoint_id | text | | Alias for warehouse ID. |
error_message | text | | Message describing why the query could not complete. |
executed_as_user_id | bigint | | The ID of the user whose credentials were used to run the query. |
executed_as_user_name | text | | The email address or username of the user whose credentials were used to run the query. |
execution_end_time_ms | bigint | | The time execution of the query ended. |
is_final | boolean | | Whether more updates for the query are expected. |
lookup_key | text | | A key that can be used to look up query details. |
metrics | jsonb | | Metrics about query execution. |
plans_state | text | | Whether plans exist for the execution, or the reason why they are missing. |
query_end_time_ms | bigint | | The time the query ended. |
query_id | text | | Databricks query ID. |
query_start_time_ms | bigint | | The time the query started. |
query_text | text | | The text of the query. |
rows_produced | bigint | | The number of results returned by the query. |
spark_ui_url | text | | URL to the query plan. |
statement_type | text | | Type of statement for this query. |
status | text | = | The status of the query. |
user_id | bigint | = | The ID of the user who ran the query. |
user_name | text | | The email address or username of the user who ran the query. |
warehouse_id | text | = | The UUID that uniquely identifies this data source/SQL warehouse across the API. |
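The query_start_time_ms and query_end_time_ms columns hold epoch timestamps in milliseconds (as the _ms suffix suggests), so comparison values need to be expressed in the same unit. A minimal PostgreSQL-syntax sketch restricting the history to the last 24 hours:

select
  query_id,
  user_name,
  status,
  query_text
from
  databricks_sql_query_history
where
  -- convert now() - 1 day to epoch milliseconds to match query_start_time_ms
  query_start_time_ms > extract(epoch from now() - interval '1 day') * 1000;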
Export
This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- databricks
You can pass the configuration to the command with the --config argument:
steampipe_export_databricks --config '<your_config>' databricks_sql_query_history
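The exporter should also accept flags such as --select and --where to trim the output before it is written; this assumes the standard Steampipe exporter flag set, so check steampipe_export_databricks --help for the options available in your build:

# assumes --select and --where are supported by this exporter build
steampipe_export_databricks --config '<your_config>' \
  --select "query_id,status,duration" \
  --where "status='FAILED'" \
  databricks_sql_query_history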