Table: aws_glue_job - Query AWS Glue Jobs using SQL
AWS Glue Jobs are a part of AWS Glue service that enables you to organize, clean, and transform your data. These jobs can be used to extract, transform, and load (ETL) data from data sources to data targets. AWS Glue Jobs automate the time-consuming data preparation steps, making it easier for you to analyze data.
Table Usage Guide
The aws_glue_job
table in Steampipe provides you with information about AWS Glue Jobs. This table enables you, as a DevOps engineer, data engineer, or other technical professional, to query job-specific details, such as job properties, execution status, and associated metadata. You can utilize this table to gather insights on jobs, including job run states, job parameters, allocated capacity, and more. The schema outlines the various attributes of the AWS Glue Job for you, including the job name, role, command, and associated tags.
Examples
Basic info
Explore which AWS Glue jobs have been created in different regions, along with their capacity and timeout details. This can help in managing resources and strategizing workload distribution across multiple regions effectively.
select name, created_on, description, max_capacity, number_of_workers, region, timeoutfrom aws_glue_job;
select name, created_on, description, max_capacity, number_of_workers, region, timeoutfrom aws_glue_job;
List jobs with glue connections attached
Identify the AWS Glue jobs that have established connections to understand where data processing tasks are linked. This can assist in managing and optimizing data workflows.
select title, arn, created_on, connections -> 'Connections' as connectionsfrom aws_glue_jobwhere connections is not null;
select title, arn, created_on, json_extract(connections, '$.Connections') as connectionsfrom aws_glue_jobwhere connections is not null;
List job details with bookmark enabled
Explore which jobs have the bookmark feature enabled in AWS Glue. This is beneficial for tracking job progress, particularly useful when you want to resume certain jobs from where they left off.
select title, arn, created_on, job_bookmark ->> 'Attempt' as total_attempts, job_bookmark ->> 'Run' as total_runs, job_bookmark ->> 'RunId' as run_idfrom aws_glue_jobwhere job_bookmark is not null;
select title, arn, created_on, json_extract(job_bookmark, '$.Attempt') as total_attempts, json_extract(job_bookmark, '$.Run') as total_runs, json_extract(job_bookmark, '$.RunId') as run_idfrom aws_glue_jobwhere job_bookmark is not null;
List jobs with cloud watch encryption disabled
Determine the areas in which job encryption settings may be compromising the security of your cloud watch data. This query is useful for identifying potential vulnerabilities and ensuring data protection standards are met.
select j.title, j.arn, j.created_on, j.region, j.account_id, cloud_watch_encryptionfrom aws_glue_job j left join aws_glue_security_configuration s on j.security_configuration = s.namewhere cloud_watch_encryption is null or cloud_watch_encryption ->> 'CloudWatchEncryptionMode' = 'DISABLED';
select j.title, j.arn, j.created_on, j.region, j.account_id, cloud_watch_encryptionfrom aws_glue_job j left join aws_glue_security_configuration s on j.security_configuration = s.namewhere cloud_watch_encryption is null or json_extract( cloud_watch_encryption, '$.CloudWatchEncryptionMode' ) = 'DISABLED';
List jobs with job bookmarks encryption disabled
Determine the areas in which job bookmarks encryption is disabled within AWS Glue jobs. This is useful for identifying potential security vulnerabilities where sensitive job data may not be adequately protected.
select j.title, j.arn, j.created_on, j.region, j.account_id, job_bookmarks_encryptionfrom aws_glue_job j left join aws_glue_security_configuration s on j.security_configuration = s.namewhere job_bookmarks_encryption is null or job_bookmarks_encryption ->> 'JobBookmarksEncryptionMode' = 'DISABLED';
select j.title, j.arn, j.created_on, j.region, j.account_id, job_bookmarks_encryptionfrom aws_glue_job j left join aws_glue_security_configuration s on j.security_configuration = s.namewhere job_bookmarks_encryption is null or json_extract( job_bookmarks_encryption, '$.JobBookmarksEncryptionMode' ) = 'DISABLED';
List jobs with s3 encryption disabled
Determine the areas in which AWS Glue jobs may have S3 encryption disabled. This can help identify potential security risks and ensure data protection compliance.
select j.title, j.arn, j.created_on, j.region, j.account_id, e as s3_encryptionfrom aws_glue_job j left join aws_glue_security_configuration s on j.security_configuration = s.name, jsonb_array_elements(s.s3_encryption) ewhere e is null or e ->> 'S3EncryptionMode' = 'DISABLED';
select j.title, j.arn, j.created_on, j.region, j.account_id, json_extract(s.s3_encryption, '$[*]') as s3_encryptionfrom aws_glue_job j left join aws_glue_security_configuration s on j.security_configuration = s.namewhere s3_encryption is null or json_extract(s3_encryption, '$.S3EncryptionMode') = 'DISABLED';
List jobs with logging disabled
Determine the areas in which AWS Glue jobs have continuous CloudWatch logging disabled. This can be useful to identify potential gaps in your logging strategy, ensuring that all jobs are adequately tracked and monitored.
select title, arn, created_on region, account_idfrom aws_glue_jobwhere default_arguments ->> '--enable-continuous-cloudwatch-log' = 'false';
select title, arn, created_on, region, account_idfrom aws_glue_jobwhere json_extract( default_arguments, '$.--enable-continuous-cloudwatch-log' ) = 'false';
List jobs with monitoring disabled
Determine the areas in which AWS Glue jobs have been set up without monitoring enabled. This is useful for identifying potential blind spots in your system's performance tracking.
select title, arn, created_on region, account_idfrom aws_glue_jobwhere default_arguments ->> '--enable-metrics' = 'false';
select title, arn, created_on, region, account_idfrom aws_glue_jobwhere json_extract(default_arguments, '$.--enable-metrics') = 'false';
List script details associated to the job
Determine the specifics of scripts linked to a job in your AWS Glue setup, such as their names, locations, and associated languages. This can help in managing and troubleshooting your ETL (Extract, Transform, Load) jobs.
select title, arn, created_on, command ->> 'Name' as script_name, command ->> 'ScriptLocation' as script_location, default_arguments ->> '--job-language' as job_languagefrom aws_glue_job;
select title, arn, created_on, json_extract(command, '$.Name') as script_name, json_extract(command, '$.ScriptLocation') as script_location, json_extract(default_arguments, '$.--job-language') as job_languagefrom aws_glue_job;
List jobs with server side encryption disabled
Determine the areas in which jobs are running without server-side encryption. This is useful for identifying potential security risks and ensuring compliance with encryption protocols.
select title, arn, created_on region, account_id, default_arguments ->> '--encryption-type' as encryption_typefrom aws_glue_jobwhere default_arguments ->> '--encryption-type' is null;
select title, arn, created_on, region, account_id, json_extract(default_arguments, '$.--encryption-type') as encryption_typefrom aws_glue_jobwhere json_extract(default_arguments, '$.--encryption-type') is null;
Control examples
Schema for aws_glue_job
Name | Type | Operators | Description |
---|---|---|---|
_ctx | jsonb | Steampipe context in JSON form. | |
account_id | text | =, !=, ~~, ~~*, !~~, !~~* | The AWS Account ID in which the resource is located. |
akas | jsonb | Array of globally unique identifier strings (also known as) for the resource. | |
allocated_capacity | double precision | [DEPRECATED] This column has been deprecated and will be removed in a future release, use max_capacity instead. The number of Glue data processing units (DPUs) that can be allocated when this job runs. | |
arn | text | The Amazon Resource Name (ARN) of the GlueJob. | |
code_gen_configuration_nodes | jsonb | The representation of a directed acyclic graph on which both the Glue Studio visual component and Glue Studio code generation is based. | |
command | jsonb | The JobCommand that runs this job. | |
connections | jsonb | The connections used for this job. | |
created_on | timestamp with time zone | The time and date that this job definition was created. | |
default_arguments | jsonb | The default arguments for this job, specified as name-value pairs. | |
description | text | A description of the job. | |
execution_class | text | TIndicates whether the job is run with a standard or flexible execution class. | |
execution_property | jsonb | An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job. | |
glue_version | text | Glue version determines the versions of Apache Spark and Python that Glue supports. | |
job_bookmark | jsonb | Defines a point that a job can resume processing. | |
last_modified_on | timestamp with time zone | The last point in time when this job definition was modified. | |
log_uri | text | This field is reserved for future use. | |
max_capacity | double precision | The number of Glue data processing units (DPUs) that can be allocated when this job runs. | |
max_retries | bigint | The maximum number of times to retry this job after a JobRun fails. | |
name | text | = | The name of the GlueJob. |
non_overridable_arguments | jsonb | Non-overridable arguments for this job, specified as name-value pairs. | |
notification_property | jsonb | Specifies configuration properties of a job notification. | |
number_of_workers | bigint | The number of workers of a defined workerType that are allocated when a job runs. | |
partition | text | The AWS partition in which the resource is located (aws, aws-cn, or aws-us-gov). | |
region | text | The AWS Region in which the resource is located. | |
role | text | The name or Amazon Resource Name (ARN) of the IAM role associated with this job. | |
security_configuration | text | The name of the SecurityConfiguration structure to be used with this job. | |
source_control_details | jsonb | The details for a source control configuration for a job, allowing synchronization of job artifacts to or from a remote repository. | |
sp_connection_name | text | =, !=, ~~, ~~*, !~~, !~~* | Steampipe connection name. |
sp_ctx | jsonb | Steampipe context in JSON form. | |
timeout | bigint | The job timeout in minutes. | |
title | text | Title of the resource. | |
worker_type | text | The type of predefined worker that is allocated when a job runs. Accepts a value of Standard, G.1X, or G.2X. |
Export
This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh
script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- aws
You can pass the configuration to the command with the --config
argument:
steampipe_export_aws --config '<your_config>' aws_glue_job