steampipe plugin install aws

Table: aws_glue_job - Query AWS Glue Jobs using SQL

AWS Glue Jobs are a part of AWS Glue service that enables you to organize, clean, and transform your data. These jobs can be used to extract, transform, and load (ETL) data from data sources to data targets. AWS Glue Jobs automate the time-consuming data preparation steps, making it easier for you to analyze data.

Table Usage Guide

The aws_glue_job table in Steampipe provides you with information about AWS Glue Jobs. This table enables you, as a DevOps engineer, data engineer, or other technical professional, to query job-specific details, such as job properties, execution status, and associated metadata. You can utilize this table to gather insights on jobs, including job run states, job parameters, allocated capacity, and more. The schema outlines the various attributes of the AWS Glue Job for you, including the job name, role, command, and associated tags.

Examples

Basic info

Explore which AWS Glue jobs have been created in different regions, along with their capacity and timeout details. This can help in managing resources and strategizing workload distribution across multiple regions effectively.

select
name,
created_on,
description,
max_capacity,
number_of_workers,
region,
timeout
from
aws_glue_job;
select
name,
created_on,
description,
max_capacity,
number_of_workers,
region,
timeout
from
aws_glue_job;

List jobs with glue connections attached

Identify the AWS Glue jobs that have established connections to understand where data processing tasks are linked. This can assist in managing and optimizing data workflows.

select
title,
arn,
created_on,
connections -> 'Connections' as connections
from
aws_glue_job
where
connections is not null;
select
title,
arn,
created_on,
json_extract(connections, '$.Connections') as connections
from
aws_glue_job
where
connections is not null;

List job details with bookmark enabled

Explore which jobs have the bookmark feature enabled in AWS Glue. This is beneficial for tracking job progress, particularly useful when you want to resume certain jobs from where they left off.

select
title,
arn,
created_on,
job_bookmark ->> 'Attempt' as total_attempts,
job_bookmark ->> 'Run' as total_runs,
job_bookmark ->> 'RunId' as run_id
from
aws_glue_job
where
job_bookmark is not null;
select
title,
arn,
created_on,
json_extract(job_bookmark, '$.Attempt') as total_attempts,
json_extract(job_bookmark, '$.Run') as total_runs,
json_extract(job_bookmark, '$.RunId') as run_id
from
aws_glue_job
where
job_bookmark is not null;

List jobs with cloud watch encryption disabled

Determine the areas in which job encryption settings may be compromising the security of your cloud watch data. This query is useful for identifying potential vulnerabilities and ensuring data protection standards are met.

select
j.title,
j.arn,
j.created_on,
j.region,
j.account_id,
cloud_watch_encryption
from
aws_glue_job j
left join aws_glue_security_configuration s on j.security_configuration = s.name
where
cloud_watch_encryption is null
or cloud_watch_encryption ->> 'CloudWatchEncryptionMode' = 'DISABLED';
select
j.title,
j.arn,
j.created_on,
j.region,
j.account_id,
cloud_watch_encryption
from
aws_glue_job j
left join aws_glue_security_configuration s on j.security_configuration = s.name
where
cloud_watch_encryption is null
or json_extract(
cloud_watch_encryption,
'$.CloudWatchEncryptionMode'
) = 'DISABLED';

List jobs with job bookmarks encryption disabled

Determine the areas in which job bookmarks encryption is disabled within AWS Glue jobs. This is useful for identifying potential security vulnerabilities where sensitive job data may not be adequately protected.

select
j.title,
j.arn,
j.created_on,
j.region,
j.account_id,
job_bookmarks_encryption
from
aws_glue_job j
left join aws_glue_security_configuration s on j.security_configuration = s.name
where
job_bookmarks_encryption is null
or job_bookmarks_encryption ->> 'JobBookmarksEncryptionMode' = 'DISABLED';
select
j.title,
j.arn,
j.created_on,
j.region,
j.account_id,
job_bookmarks_encryption
from
aws_glue_job j
left join aws_glue_security_configuration s on j.security_configuration = s.name
where
job_bookmarks_encryption is null
or json_extract(
job_bookmarks_encryption,
'$.JobBookmarksEncryptionMode'
) = 'DISABLED';

List jobs with s3 encryption disabled

Determine the areas in which AWS Glue jobs may have S3 encryption disabled. This can help identify potential security risks and ensure data protection compliance.

select
j.title,
j.arn,
j.created_on,
j.region,
j.account_id,
e as s3_encryption
from
aws_glue_job j
left join aws_glue_security_configuration s on j.security_configuration = s.name,
jsonb_array_elements(s.s3_encryption) e
where
e is null
or e ->> 'S3EncryptionMode' = 'DISABLED';
select
j.title,
j.arn,
j.created_on,
j.region,
j.account_id,
json_extract(s.s3_encryption, '$[*]') as s3_encryption
from
aws_glue_job j
left join aws_glue_security_configuration s on j.security_configuration = s.name
where
s3_encryption is null
or json_extract(s3_encryption, '$.S3EncryptionMode') = 'DISABLED';

List jobs with logging disabled

Determine the areas in which AWS Glue jobs have continuous CloudWatch logging disabled. This can be useful to identify potential gaps in your logging strategy, ensuring that all jobs are adequately tracked and monitored.

select
title,
arn,
created_on region,
account_id
from
aws_glue_job
where
default_arguments ->> '--enable-continuous-cloudwatch-log' = 'false';
select
title,
arn,
created_on,
region,
account_id
from
aws_glue_job
where
json_extract(
default_arguments,
'$.--enable-continuous-cloudwatch-log'
) = 'false';

List jobs with monitoring disabled

Determine the areas in which AWS Glue jobs have been set up without monitoring enabled. This is useful for identifying potential blind spots in your system's performance tracking.

select
title,
arn,
created_on region,
account_id
from
aws_glue_job
where
default_arguments ->> '--enable-metrics' = 'false';
select
title,
arn,
created_on,
region,
account_id
from
aws_glue_job
where
json_extract(default_arguments, '$.--enable-metrics') = 'false';

List script details associated to the job

Determine the specifics of scripts linked to a job in your AWS Glue setup, such as their names, locations, and associated languages. This can help in managing and troubleshooting your ETL (Extract, Transform, Load) jobs.

select
title,
arn,
created_on,
command ->> 'Name' as script_name,
command ->> 'ScriptLocation' as script_location,
default_arguments ->> '--job-language' as job_language
from
aws_glue_job;
select
title,
arn,
created_on,
json_extract(command, '$.Name') as script_name,
json_extract(command, '$.ScriptLocation') as script_location,
json_extract(default_arguments, '$.--job-language') as job_language
from
aws_glue_job;

List jobs with server side encryption disabled

Determine the areas in which jobs are running without server-side encryption. This is useful for identifying potential security risks and ensuring compliance with encryption protocols.

select
title,
arn,
created_on region,
account_id,
default_arguments ->> '--encryption-type' as encryption_type
from
aws_glue_job
where
default_arguments ->> '--encryption-type' is null;
select
title,
arn,
created_on,
region,
account_id,
json_extract(default_arguments, '$.--encryption-type') as encryption_type
from
aws_glue_job
where
json_extract(default_arguments, '$.--encryption-type') is null;

Schema for aws_glue_job

NameTypeOperatorsDescription
_ctxjsonbSteampipe context in JSON form.
account_idtext=, !=, ~~, ~~*, !~~, !~~*The AWS Account ID in which the resource is located.
akasjsonbArray of globally unique identifier strings (also known as) for the resource.
arntextThe Amazon Resource Name (ARN) of the GlueJob.
code_gen_configuration_nodesjsonbThe representation of a directed acyclic graph on which both the Glue Studio visual component and Glue Studio code generation is based.
commandjsonbThe JobCommand that runs this job.
connectionsjsonbThe connections used for this job.
created_ontimestamp with time zoneThe time and date that this job definition was created.
default_argumentsjsonbThe default arguments for this job, specified as name-value pairs.
descriptiontextA description of the job.
execution_classtextTIndicates whether the job is run with a standard or flexible execution class.
execution_propertyjsonbAn ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.
glue_versiontextGlue version determines the versions of Apache Spark and Python that Glue supports.
job_bookmarkjsonbDefines a point that a job can resume processing.
last_modified_ontimestamp with time zoneThe last point in time when this job definition was modified.
log_uritextThis field is reserved for future use.
max_capacitydouble precisionThe number of Glue data processing units (DPUs) that can be allocated when this job runs.
max_retriesbigintThe maximum number of times to retry this job after a JobRun fails.
nametext=The name of the GlueJob.
non_overridable_argumentsjsonbNon-overridable arguments for this job, specified as name-value pairs.
notification_propertyjsonbSpecifies configuration properties of a job notification.
number_of_workersbigintThe number of workers of a defined workerType that are allocated when a job runs.
partitiontextThe AWS partition in which the resource is located (aws, aws-cn, or aws-us-gov).
regiontextThe AWS Region in which the resource is located.
roletextThe name or Amazon Resource Name (ARN) of the IAM role associated with this job.
security_configurationtextThe name of the SecurityConfiguration structure to be used with this job.
source_control_detailsjsonbThe details for a source control configuration for a job, allowing synchronization of job artifacts to or from a remote repository.
sp_connection_nametext=, !=, ~~, ~~*, !~~, !~~*Steampipe connection name.
sp_ctxjsonbSteampipe context in JSON form.
timeoutbigintThe job timeout in minutes.
titletextTitle of the resource.
worker_typetextThe type of predefined worker that is allocated when a job runs. Accepts a value of Standard, G.1X, or G.2X.

Export

This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- aws

You can pass the configuration to the command with the --config argument:

steampipe_export_aws --config '<your_config>' aws_glue_job