Table: aws_glue_data_quality_ruleset - Query AWS Glue Data Quality Rulesets using SQL
An AWS Glue Data Quality ruleset is a set of rules that AWS Glue evaluates against your data sources. You can define, manage, and run rulesets on your AWS Glue Data Catalog tables to help ensure that your data is accurate, consistent, and reliable.
Table Usage Guide
The aws_glue_data_quality_ruleset table in Steampipe provides information about the rulesets used for data quality checks in AWS Glue. As a data engineer or developer, you can query ruleset-specific details, including the ruleset name, its description, the database and table it targets, its DQDL rule definition, and the number of rules it contains. The schema at the end of this page lists every available column, including the creation and last-modified timestamps and the ID of any linked recommendation run.
Examples
Basic info
Explore the creation dates and descriptions of various data quality rulesets in AWS Glue. This can help in understanding the evolution of data quality standards and guidelines over time in your AWS environment.
select
  name,
  database_name,
  table_name,
  created_on,
  description,
  rule_set,
  recommendation_run_id
from
  aws_glue_data_quality_ruleset;
List rulesets created in the last 30 days
List the rulesets created in the past 30 days to get a recent history of data quality ruleset creation. This can be useful for monitoring how often, and when, new rulesets are added.
-- PostgreSQL
select
  name,
  database_name,
  table_name,
  created_on,
  description,
  rule_set,
  recommendation_run_id
from
  aws_glue_data_quality_ruleset
where
  created_on >= now() - interval '30' day;

-- SQLite
select
  name,
  database_name,
  table_name,
  created_on,
  description,
  rule_set,
  recommendation_run_id
from
  aws_glue_data_quality_ruleset
where
  created_on >= datetime('now', '-30 day');
Count rulesets by database
Explore which databases have the most rulesets in order to optimize data quality checks. This insight can help prioritize which databases need more attention or resources for maintaining data quality.
select
  database_name,
  count("name") as ruleset_count
from
  aws_glue_data_quality_ruleset
group by
  database_name;
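The same grouping can be applied to other columns. As a minimal sketch, assuming your connection spans several accounts or Regions, the query below counts rulesets per account and Region using the account_id and region columns from the schema. It is an illustrative variant rather than one of the original examples.

```sql
-- Count rulesets per AWS account and Region, largest groups first.
-- Assumes a connection (or aggregator) that covers more than one Region.
select
  account_id,
  region,
  count(name) as ruleset_count
from
  aws_glue_data_quality_ruleset
group by
  account_id,
  region
order by
  ruleset_count desc;
```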
Get Glue database details for a ruleset
Join a ruleset to its Glue database to see the database's catalog ID, creation time, and location URI. This can be particularly useful for auditing or troubleshooting, because it shows where the associated database is stored and when it was created.
select
  r.name,
  r.database_name,
  d.catalog_id,
  d.create_time as database_create_time,
  d.location_uri
from
  aws_glue_data_quality_ruleset as r,
  aws_glue_catalog_database as d
where
  r.database_name = d.name
  and r.name = 'ruleset1';
Count rules per data quality ruleset
Determine the number of rules within each data quality ruleset to assess the complexity and thoroughness of your data validation process.
select
  name,
  rule_count
from
  aws_glue_data_quality_ruleset;
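To focus on the largest rulesets, the result can be sorted by rule_count. The following is a small sketch in PostgreSQL syntax. The limit of 10 is an arbitrary choice for illustration.

```sql
-- Show the ten rulesets with the most rules, along with the table each one targets.
select
  name,
  database_name,
  table_name,
  rule_count
from
  aws_glue_data_quality_ruleset
order by
  rule_count desc
limit 10;
```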
Schema for aws_glue_data_quality_ruleset
Name | Type | Operators | Description |
---|---|---|---|
_ctx | jsonb | | Steampipe context in JSON form. |
account_id | text | =, !=, ~~, ~~*, !~~, !~~* | The AWS Account ID in which the resource is located. |
created_on | timestamp with time zone | <=, <, >=, > | The date and time the data quality ruleset was created. |
database_name | text | | The name of the database where the glue table exists. |
description | text | | A description of the data quality ruleset. |
last_modified_on | timestamp with time zone | = | The date and time the data quality ruleset was last modified. |
name | text | = | The name of the data quality ruleset. |
partition | text | | The AWS partition in which the resource is located (aws, aws-cn, or aws-us-gov). |
recommendation_run_id | text | | When a ruleset was created from a recommendation run, this run ID is generated to link the two together. |
region | text | | The AWS Region in which the resource is located. |
rule_count | bigint | | The number of rules in the ruleset. |
rule_set | text | | A Data Quality Definition Language (DQDL) ruleset. |
sp_connection_name | text | =, !=, ~~, ~~*, !~~, !~~* | Steampipe connection name. |
sp_ctx | jsonb | | Steampipe context in JSON form. |
table_name | text | | The name of the glue table. |
target_table | jsonb | | An object representing a glue table. |
title | text | | Title of the resource. |
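As an illustration of how the schema columns can be combined, the sketch below lists only the rulesets that were generated from a recommendation run. It relies on the recommendation_run_id description above and assumes the column is null for rulesets that were not created from a recommendation run.

```sql
-- List rulesets that originated from a recommendation run, newest first.
-- Assumption: recommendation_run_id is null for rulesets created by hand.
select
  name,
  database_name,
  table_name,
  recommendation_run_id,
  created_on
from
  aws_glue_data_quality_ruleset
where
  recommendation_run_id is not null
order by
  created_on desc;
```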
Export
This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install the exporter with the steampipe_export_installer.sh script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- aws
You can pass the configuration to the command with the --config argument:
steampipe_export_aws --config '<your_config>' aws_glue_data_quality_ruleset