steampipe plugin install aws

Table: aws_glue_crawler - Query AWS Glue Crawlers using SQL

AWS Glue Crawler is a component of the AWS Glue service that automates the extract, transform, and load (ETL) process. It traverses your data stores, identifies data formats, and suggests schemas and transformations, enabling you to categorize, search, and query metadata across your AWS environment.

Table Usage Guide

The aws_glue_crawler table in Steampipe provides you with information about crawlers within AWS Glue. This table allows you, as a DevOps engineer, to query crawler-specific details, including their role, database, schedule, classifiers, and associated metadata. You can use this table to gather insights on crawlers, such as their run frequency, the database they are associated with, their status, and more. The schema outlines the various attributes of the Glue crawler for you, including the crawler ARN, creation date, last run time, and associated tags.

Examples

Basic info

Determine the status and creation details of your AWS Glue crawlers to better understand their function and manage them effectively. This can be particularly useful for identifying any crawlers that may require attention or modification.

select
  name,
  state,
  database_name,
  creation_time,
  description,
  recrawl_behavior
from
  aws_glue_crawler;

List running crawlers

Discover the segments that are currently operational within your AWS Glue Crawlers to understand which tasks are active and could be consuming resources. This could be useful for resource management and troubleshooting ongoing tasks.

select
  name,
  state,
  database_name,
  creation_time,
  description,
  recrawl_behavior
from
  aws_glue_crawler
where
  state = 'RUNNING';
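List crawlers configured to recrawl the entire dataset

As a further illustrative query (not part of the original examples), you can filter on the recrawl_behavior column documented in the schema: crawlers set to CRAWL_EVERYTHING rescan the full dataset on every run, which can be slower and more costly than incremental crawling.

select
  name,
  state,
  database_name,
  recrawl_behavior
from
  aws_glue_crawler
where
  recrawl_behavior = 'CRAWL_EVERYTHING';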

Schema for aws_glue_crawler

| Name | Type | Operators | Description |
| --- | --- | --- | --- |
| _ctx | jsonb | | Steampipe context in JSON form, e.g. connection_name. |
| account_id | text | | The AWS Account ID in which the resource is located. |
| akas | jsonb | | Array of globally unique identifier strings (also known as) for the resource. |
| arn | text | | The ARN of the crawler. |
| classifiers | jsonb | | A list of UTF-8 strings that specify the custom classifiers that are associated with the crawler. |
| configuration | jsonb | | Crawler configuration information. |
| crawl_elapsed_time | bigint | | If the crawler is running, contains the total time elapsed since the last crawl began. |
| crawler_lineage_settings | text | | Specifies whether data lineage is enabled for the crawler. |
| crawler_security_configuration | text | | The name of the SecurityConfiguration structure to be used by this crawler. |
| creation_time | timestamp with time zone | | The time that the crawler was created. |
| database_name | text | | The name of the database in which the crawler's output is stored. |
| description | text | | A description of the crawler. |
| last_crawl | jsonb | | The status of the last crawl, and potentially error information if an error occurred. |
| last_updated | timestamp with time zone | | The time that the crawler was last updated. |
| name | text | = | The name of the crawler. |
| partition | text | | The AWS partition in which the resource is located (aws, aws-cn, or aws-us-gov). |
| recrawl_behavior | text | | Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. A value of CRAWL_EVERYTHING specifies crawling the entire dataset again. A value of CRAWL_NEW_FOLDERS_ONLY specifies crawling only folders that were added since the last crawler run. A value of CRAWL_EVENT_MODE specifies crawling only the changes identified by Amazon S3 events. |
| region | text | | The AWS Region in which the resource is located. |
| role | text | | The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data. |
| schedule | jsonb | | For scheduled crawlers, the schedule when the crawler runs. |
| schema_change_policy | jsonb | | The policy that specifies update and delete behaviors for the crawler. |
| state | text | | Indicates whether the crawler is running or pending. |
| table_prefix | text | | The prefix added to the names of tables that are created. |
| targets | jsonb | | A collection of targets to crawl. |
| title | text | | Title of the resource. |
| version | bigint | | The version of the crawler. |
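Since last_crawl is a jsonb column, its fields can be extracted with standard PostgreSQL JSON operators. As an illustrative sketch, the field names below (Status, ErrorMessage) are an assumption based on the AWS Glue LastCrawlInfo API structure, not guaranteed by the schema above; verify them against your own results before relying on them.

select
  name,
  last_crawl ->> 'Status' as last_crawl_status,
  last_crawl ->> 'ErrorMessage' as last_crawl_error
from
  aws_glue_crawler;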

Export

This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- aws

You can pass the configuration to the command with the --config argument:

steampipe_export_aws --config '<your_config>' aws_glue_crawler
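For example, assuming the exporter accepts the same connection options as the aws plugin's connection config (an assumption; check your plugin's documentation), you could scope the export to a single region:

steampipe_export_aws --config 'regions = ["us-east-1"]' aws_glue_crawler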