steampipe plugin install gcp

Table: gcp_dataplex_task - Query GCP Dataplex Tasks using SQL

GCP Dataplex Tasks are jobs that run on a scheduled basis or are triggered by specific events within a Dataplex Lake. These tasks can be used to manage and process data, including running custom Spark jobs or scheduled notebooks.

Table Usage Guide

The gcp_dataplex_task table allows data engineers and cloud administrators to query and manage Dataplex Tasks within their GCP environment. You can retrieve information about a task’s configuration, execution status, and associated lake. This table is useful for monitoring and managing the state and execution of Dataplex Tasks.

Examples

List all Dataplex Tasks

Retrieve a list of all Dataplex Tasks in your GCP account to get an overview of your scheduled and triggered jobs.

-- PostgreSQL
select
  display_name,
  name,
  lake_name,
  state,
  create_time
from
  gcp_dataplex_task;

-- SQLite
select
  display_name,
  name,
  lake_name,
  state,
  create_time
from
  gcp_dataplex_task;
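
List Dataplex Tasks that are not active

A sketch of a filtered variant of the query above, surfacing tasks that may need attention. The 'ACTIVE' state value is an assumption based on the task states reported by the Dataplex API; adjust it to match the values returned in your environment.

-- PostgreSQL and SQLite
select
  display_name,
  name,
  lake_name,
  state
from
  gcp_dataplex_task
where
  state <> 'ACTIVE'; -- assumed state value; e.g. CREATING or DELETING may also appear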

Get details of task trigger specifications

This query extracts details about how tasks are triggered, including the schedule, type, start time, and maximum retries.

-- PostgreSQL
select
  name,
  trigger_spec ->> 'type' as trigger_type,
  trigger_spec ->> 'schedule' as trigger_schedule,
  trigger_spec ->> 'startTime' as trigger_start_time,
  trigger_spec ->> 'maxRetries' as trigger_max_retries
from
  gcp_dataplex_task;

-- SQLite
select
  name,
  json_extract(trigger_spec, '$.type') as trigger_type,
  json_extract(trigger_spec, '$.schedule') as trigger_schedule,
  json_extract(trigger_spec, '$.startTime') as trigger_start_time,
  json_extract(trigger_spec, '$.maxRetries') as trigger_max_retries
from
  gcp_dataplex_task;
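
List recurring tasks with their cron schedules

A sketch that uses the trigger type to separate scheduled tasks from on-demand ones. The 'RECURRING' value is an assumption based on the Dataplex trigger types; verify it against the type values returned in your environment.

-- PostgreSQL
select
  name,
  trigger_spec ->> 'schedule' as cron_schedule
from
  gcp_dataplex_task
where
  trigger_spec ->> 'type' = 'RECURRING'; -- assumed trigger type value

-- SQLite
select
  name,
  json_extract(trigger_spec, '$.schedule') as cron_schedule
from
  gcp_dataplex_task
where
  json_extract(trigger_spec, '$.type') = 'RECURRING'; -- assumed trigger type value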

Get execution specifications for tasks

This query retrieves the execution specifications for each task, including the service account used, project, and maximum job execution lifetime.

-- PostgreSQL
select
  name,
  execution_spec ->> 'serviceAccount' as service_account,
  execution_spec ->> 'project' as project,
  execution_spec ->> 'maxJobExecutionLifetime' as max_execution_lifetime
from
  gcp_dataplex_task;

-- SQLite
select
  name,
  json_extract(execution_spec, '$.serviceAccount') as service_account,
  json_extract(execution_spec, '$.project') as project,
  json_extract(execution_spec, '$.maxJobExecutionLifetime') as max_execution_lifetime
from
  gcp_dataplex_task;
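
Count tasks per service account

When auditing permissions, it can help to see how many tasks run under each service account. A minimal sketch, grouping on the same execution_spec field used above:

-- PostgreSQL
select
  execution_spec ->> 'serviceAccount' as service_account,
  count(*) as task_count
from
  gcp_dataplex_task
group by
  execution_spec ->> 'serviceAccount';

-- SQLite
select
  json_extract(execution_spec, '$.serviceAccount') as service_account,
  count(*) as task_count
from
  gcp_dataplex_task
group by
  json_extract(execution_spec, '$.serviceAccount');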

Get the latest execution status of tasks

This query retrieves the latest execution status for each task, including the state, trigger, and any messages associated with the last job execution.

-- PostgreSQL
select
  name,
  execution_status -> 'latestJob' ->> 'state' as latest_job_state,
  execution_status -> 'latestJob' ->> 'trigger' as latest_job_trigger,
  execution_status -> 'latestJob' ->> 'message' as latest_job_message,
  execution_status -> 'latestJob' -> 'executionSpec' ->> 'kmsKey' as latest_job_kms_key
from
  gcp_dataplex_task;

-- SQLite
select
  name,
  json_extract(execution_status, '$.latestJob.state') as latest_job_state,
  json_extract(execution_status, '$.latestJob.trigger') as latest_job_trigger,
  json_extract(execution_status, '$.latestJob.message') as latest_job_message,
  json_extract(execution_status, '$.latestJob.executionSpec.kmsKey') as latest_job_kms_key
from
  gcp_dataplex_task;
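
Find tasks whose latest job failed

A sketch that filters on the latest job state to flag tasks whose most recent run did not succeed. The 'FAILED' value is an assumption based on the Dataplex job states; adjust it to the state values you observe.

-- PostgreSQL
select
  name,
  execution_status -> 'latestJob' ->> 'state' as latest_job_state,
  execution_status -> 'latestJob' ->> 'message' as latest_job_message
from
  gcp_dataplex_task
where
  execution_status -> 'latestJob' ->> 'state' = 'FAILED'; -- assumed job state value

-- SQLite
select
  name,
  json_extract(execution_status, '$.latestJob.state') as latest_job_state,
  json_extract(execution_status, '$.latestJob.message') as latest_job_message
from
  gcp_dataplex_task
where
  json_extract(execution_status, '$.latestJob.state') = 'FAILED'; -- assumed job state value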

Dataplex tasks with their associated lakes

This query joins tasks with their parent lakes, which is useful for understanding how tasks are distributed across different lakes in your Dataplex environment.

-- PostgreSQL
select
  t.name as task_name,
  t.state as task_state,
  t.create_time as task_create_time,
  l.name as lake_name,
  l.location as lake_location,
  l.state as lake_state
from
  gcp_dataplex_task as t
  join gcp_dataplex_lake as l on t.lake_name = l.name;

-- SQLite
select
  t.name as task_name,
  t.state as task_state,
  t.create_time as task_create_time,
  l.name as lake_name,
  l.location as lake_location,
  l.state as lake_state
from
  gcp_dataplex_task as t
  join gcp_dataplex_lake as l on t.lake_name = l.name;
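
Count tasks per lake

A compact variant of the join above that counts tasks per lake. The left join keeps lakes with no tasks in the result (shown with a count of zero).

-- PostgreSQL and SQLite
select
  l.name as lake_name,
  count(t.name) as task_count
from
  gcp_dataplex_lake as l
  left join gcp_dataplex_task as t on t.lake_name = l.name
group by
  l.name;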

Schema for gcp_dataplex_task

Name               | Type                     | Operators                  | Description
-------------------|--------------------------|----------------------------|------------------------------------------------------------------------------
_ctx               | jsonb                    |                            | Steampipe context in JSON form.
akas               | jsonb                    |                            | Array of globally unique identifier strings (also known as) for the resource.
create_time        | timestamp with time zone |                            | The time when the task was created.
description        | text                     |                            | Description of the task.
display_name       | text                     | =                          | User friendly display name.
execution_spec     | jsonb                    |                            | Spec related to how a task is executed.
execution_status   | jsonb                    |                            | Status of the latest task executions.
lake_name          | text                     | =                          | The relative resource name of the lake.
location           | text                     |                            | The GCP multi-region, region, or zone in which the resource is located.
name               | text                     | =                          | The relative resource name of the task.
notebook           | jsonb                    |                            | Config related to running scheduled Notebooks.
project            | text                     | =, !=, ~~, ~~*, !~~, !~~*  | The GCP Project in which the resource is located.
self_link          | text                     |                            | Server-defined URL for the resource.
sp_connection_name | text                     | =, !=, ~~, ~~*, !~~, !~~*  | Steampipe connection name.
sp_ctx             | jsonb                    |                            | Steampipe context in JSON form.
spark              | jsonb                    |                            | Config related to running custom Spark tasks.
state              | text                     | =                          | Current state of the task.
tags               | jsonb                    |                            | A map of tags for the resource.
title              | text                     |                            | Title of the resource.
trigger_spec       | jsonb                    |                            | Spec related to how often and when a task should be triggered.
uid                | text                     |                            | System generated globally unique ID for the task.
update_time        | timestamp with time zone |                            | The time when the task was last updated.
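
The Operators column lists the qualifiers that can be pushed down to the API for each key column. For example, project supports pattern matching (~~ is PostgreSQL's LIKE operator), so a filter like the following (the 'dev-%' project prefix is a hypothetical example) restricts the query to matching projects:

-- PostgreSQL
select
  name,
  state
from
  gcp_dataplex_task
where
  project ~~ 'dev-%'; -- hypothetical project-name prefix; equivalent to LIKE 'dev-%'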

Export

This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- gcp

You can pass the configuration to the command with the --config argument:

steampipe_export_gcp --config '<your_config>' gcp_dataplex_task