# Table: gcp_dataplex_task - Query GCP Dataplex Tasks using SQL
GCP Dataplex Tasks are jobs that run on a scheduled basis or are triggered by specific events within a Dataplex Lake. These tasks can be used to manage and process data, including running custom Spark jobs or scheduled notebooks.
## Table Usage Guide
The `gcp_dataplex_task` table allows data engineers and cloud administrators to query and manage Dataplex Tasks within their GCP environment. You can retrieve information about a task's configuration, execution status, and associated lake. This table is useful for monitoring and managing the state and execution of Dataplex Tasks.
## Examples
### List all Dataplex Tasks
Retrieve a list of all Dataplex Tasks in your GCP account to get an overview of your scheduled and triggered jobs.
```sql+postgres
select
  display_name,
  name,
  lake_name,
  state,
  create_time
from
  gcp_dataplex_task;
```

```sql+sqlite
select
  display_name,
  name,
  lake_name,
  state,
  create_time
from
  gcp_dataplex_task;
```
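### List tasks that are not active

As a variation on the query above, you can filter on the `state` column. The `ACTIVE` literal below is an assumption based on the Dataplex task state values (`ACTIVE`, `CREATING`, `DELETING`, `ACTION_REQUIRED`); adjust it to the states present in your environment. The query is the same in both dialects.

```sql+postgres
select
  display_name,
  name,
  lake_name,
  state
from
  gcp_dataplex_task
where
  state <> 'ACTIVE';
```

```sql+sqlite
select
  display_name,
  name,
  lake_name,
  state
from
  gcp_dataplex_task
where
  state <> 'ACTIVE';
```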
### Get details of task trigger specifications
This query extracts details about how tasks are triggered, including the schedule, type, start time, and maximum retries.
```sql+postgres
select
  name,
  trigger_spec ->> 'type' as trigger_type,
  trigger_spec ->> 'schedule' as trigger_schedule,
  trigger_spec ->> 'startTime' as trigger_start_time,
  trigger_spec ->> 'maxRetries' as trigger_max_retries
from
  gcp_dataplex_task;
```

```sql+sqlite
select
  name,
  json_extract(trigger_spec, '$.type') as trigger_type,
  json_extract(trigger_spec, '$.schedule') as trigger_schedule,
  json_extract(trigger_spec, '$.startTime') as trigger_start_time,
  json_extract(trigger_spec, '$.maxRetries') as trigger_max_retries
from
  gcp_dataplex_task;
```
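### List tasks on a recurring schedule

Building on the trigger fields above, you can filter tasks by trigger type. The `RECURRING` literal is an assumption based on the Dataplex `TriggerSpec` type values (`ON_DEMAND`, `RECURRING`); verify against the values returned in your environment.

```sql+postgres
select
  name,
  trigger_spec ->> 'schedule' as trigger_schedule,
  trigger_spec ->> 'maxRetries' as trigger_max_retries
from
  gcp_dataplex_task
where
  trigger_spec ->> 'type' = 'RECURRING';
```

```sql+sqlite
select
  name,
  json_extract(trigger_spec, '$.schedule') as trigger_schedule,
  json_extract(trigger_spec, '$.maxRetries') as trigger_max_retries
from
  gcp_dataplex_task
where
  json_extract(trigger_spec, '$.type') = 'RECURRING';
```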
### Get execution specifications for tasks
This query retrieves the execution specifications for each task, including the service account used, project, and maximum job execution lifetime.
```sql+postgres
select
  name,
  execution_spec ->> 'serviceAccount' as service_account,
  execution_spec ->> 'project' as project,
  execution_spec ->> 'maxJobExecutionLifetime' as max_execution_lifetime
from
  gcp_dataplex_task;
```

```sql+sqlite
select
  name,
  json_extract(execution_spec, '$.serviceAccount') as service_account,
  json_extract(execution_spec, '$.project') as project,
  json_extract(execution_spec, '$.maxJobExecutionLifetime') as max_execution_lifetime
from
  gcp_dataplex_task;
```
### Get the latest execution status of tasks
This query retrieves the latest execution status for each task, including the state, trigger, and any messages associated with the last job execution.
```sql+postgres
select
  name,
  execution_status -> 'latestJob' ->> 'state' as latest_job_state,
  execution_status -> 'latestJob' ->> 'trigger' as latest_job_trigger,
  execution_status -> 'latestJob' ->> 'message' as latest_job_message,
  execution_status -> 'latestJob' -> 'executionSpec' ->> 'kmsKey' as latest_job_kms_key
from
  gcp_dataplex_task;
```

```sql+sqlite
select
  name,
  json_extract(execution_status, '$.latestJob.state') as latest_job_state,
  json_extract(execution_status, '$.latestJob.trigger') as latest_job_trigger,
  json_extract(execution_status, '$.latestJob.message') as latest_job_message,
  json_extract(execution_status, '$.latestJob.executionSpec.kmsKey') as latest_job_kms_key
from
  gcp_dataplex_task;
```
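### List tasks whose latest job failed

A sketch that narrows the previous query to failed executions, useful for alerting. The `FAILED` literal assumes the Dataplex job state values (`RUNNING`, `SUCCEEDED`, `FAILED`, `CANCELLED`, `ABORTED`); adjust as needed.

```sql+postgres
select
  name,
  execution_status -> 'latestJob' ->> 'state' as latest_job_state,
  execution_status -> 'latestJob' ->> 'message' as latest_job_message
from
  gcp_dataplex_task
where
  execution_status -> 'latestJob' ->> 'state' = 'FAILED';
```

```sql+sqlite
select
  name,
  json_extract(execution_status, '$.latestJob.state') as latest_job_state,
  json_extract(execution_status, '$.latestJob.message') as latest_job_message
from
  gcp_dataplex_task
where
  json_extract(execution_status, '$.latestJob.state') = 'FAILED';
```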
### Dataplex tasks with their associated lakes
This query joins tasks to their parent lakes, which is useful for understanding how tasks are distributed across different lakes in your Dataplex environment.
```sql+postgres
select
  t.name as task_name,
  t.state as task_state,
  t.create_time as task_create_time,
  l.name as lake_name,
  l.location as lake_location,
  l.state as lake_state
from
  gcp_dataplex_task as t
  join gcp_dataplex_lake as l on t.lake_name = l.name;
```

```sql+sqlite
select
  t.name as task_name,
  t.state as task_state,
  t.create_time as task_create_time,
  l.name as lake_name,
  l.location as lake_location,
  l.state as lake_state
from
  gcp_dataplex_task as t
  join gcp_dataplex_lake as l on t.lake_name = l.name;
```
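### Count tasks per lake

The join above can also be aggregated to show how many tasks each lake contains; a `left join` keeps lakes that have no tasks. The query is the same in both dialects.

```sql+postgres
select
  l.name as lake_name,
  count(t.name) as task_count
from
  gcp_dataplex_lake as l
  left join gcp_dataplex_task as t on t.lake_name = l.name
group by
  l.name;
```

```sql+sqlite
select
  l.name as lake_name,
  count(t.name) as task_count
from
  gcp_dataplex_lake as l
  left join gcp_dataplex_task as t on t.lake_name = l.name
group by
  l.name;
```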
## Schema for gcp_dataplex_task
| Name | Type | Operators | Description |
|---|---|---|---|
| _ctx | jsonb | | Steampipe context in JSON form. |
| akas | jsonb | | Array of globally unique identifier strings (also known as) for the resource. |
| create_time | timestamp with time zone | | The time when the task was created. |
| description | text | | Description of the task. |
| display_name | text | = | User friendly display name. |
| execution_spec | jsonb | | Spec related to how a task is executed. |
| execution_status | jsonb | | Status of the latest task executions. |
| lake_name | text | = | The relative resource name of the lake. |
| location | text | | The GCP multi-region, region, or zone in which the resource is located. |
| name | text | = | The relative resource name of the task. |
| notebook | jsonb | | Config related to running scheduled Notebooks. |
| project | text | =, !=, ~~, ~~*, !~~, !~~* | The GCP Project in which the resource is located. |
| self_link | text | | Server-defined URL for the resource. |
| sp_connection_name | text | =, !=, ~~, ~~*, !~~, !~~* | Steampipe connection name. |
| sp_ctx | jsonb | | Steampipe context in JSON form. |
| spark | jsonb | | Config related to running custom Spark tasks. |
| state | text | = | Current state of the task. |
| tags | jsonb | | A map of tags for the resource. |
| title | text | | Title of the resource. |
| trigger_spec | jsonb | | Spec related to how often and when a task should be triggered. |
| uid | text | | System generated globally unique ID for the task. |
| update_time | timestamp with time zone | | The time when the task was last updated. |
## Export
This table is available as a standalone Exporter CLI. Steampipe exporters are standalone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the `steampipe_export_installer.sh` script:

```shell
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- gcp
```
You can pass the configuration to the command with the `--config` argument:

```shell
steampipe_export_gcp --config '<your_config>' gcp_dataplex_task
```