Table: aws_macie2_classification_job - Query AWS Macie2 Classification Jobs using SQL
The AWS Macie2 Classification Job is a feature of Amazon Macie, a fully managed data security and data privacy service. It uses machine learning and pattern matching to discover and protect your sensitive data. The Classification Job specifically scans and classifies data in specified S3 buckets, providing visibility into the types and nature of data stored, and assisting in meeting data privacy regulations.
Table Usage Guide
The aws_macie2_classification_job table in Steampipe provides you with information about classification jobs within AWS Macie2. This table allows you, as a DevOps engineer, to query job-specific details, including job type, job status, and job creation and completion times. You can utilize this table to gather insights on jobs, such as jobs that are currently running, jobs that have completed, and the results of those jobs. The schema outlines the various attributes of the Macie2 classification job for you, including the job ID, job ARN, S3 bucket definition, and associated tags.
Examples
Basic info
List the ID, ARN, name, status, and Region of each Amazon Macie classification job. This gives you a quick overview of which jobs exist, where they run, and what state they are in.
select job_id, arn, name, job_status, region from aws_macie2_classification_job;
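As a minimal sketch of how the same columns can be summarized, the query below counts jobs per Region and status. It uses only columns listed in the schema; the grouping itself is an illustration and not one of the original examples.

```sql
-- Count classification jobs per Region and status.
select
  region,
  job_status,
  count(*) as job_count
from
  aws_macie2_classification_job
group by
  region,
  job_status
order by
  region,
  job_status;
```

A Region with many jobs in a non-running state is a reasonable place to start a review.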
Get S3 bucket details for each classification job
Expand the bucket definitions of each classification job to see which account IDs and S3 buckets the job analyzes. The first query uses PostgreSQL JSON operators; the second is the SQLite equivalent.
select job_id, detail -> 'AccountId' as account_id, detail -> 'Buckets' as buckets from aws_macie2_classification_job, jsonb_array_elements(s3_job_definition -> 'BucketDefinitions') as detail;
select job_id, json_extract(detail.value, '$.AccountId') as account_id, json_extract(detail.value, '$.Buckets') as buckets from aws_macie2_classification_job, json_each(s3_job_definition, '$.BucketDefinitions') as detail;
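If one row per bucket is more convenient than one row per bucket definition, the nested Buckets array can be expanded as well. The following PostgreSQL sketch assumes that Buckets is a JSON array of bucket-name strings, as the query above suggests. The aliases j, d, and b are illustrative.

```sql
-- One row per (job, account, bucket).
-- Assumption: each element of BucketDefinitions has an 'AccountId' string
-- and a 'Buckets' array of bucket-name strings.
select
  j.job_id,
  d ->> 'AccountId' as account_id,
  b as bucket_name
from
  aws_macie2_classification_job as j,
  jsonb_array_elements(j.s3_job_definition -> 'BucketDefinitions') as d,
  jsonb_array_elements_text(d -> 'Buckets') as b;
```

Jobs whose s3_job_definition has no BucketDefinitions array produce no rows here, because expanding a null array yields an empty set.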
List paused or cancelled classification jobs
List classification jobs whose status is PAUSED or CANCELLED. This helps you review jobs that are not currently analyzing data and decide whether any of them need attention.
select job_id, arn, name, job_status as status from aws_macie2_classification_job where job_status = 'CANCELLED' or job_status = 'PAUSED';
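Macie also reports a USER_PAUSED status for jobs that a user paused manually, and the user_paused_details column describes that pause. The PostgreSQL sketch below assumes the JSON keys follow the same PascalCase convention as the other JSON columns on this page (JobPausedAt, JobExpiresAt). Verify the key names against your own data before relying on them.

```sql
-- Jobs paused by a user, with pause and expiry timestamps.
-- Assumption: user_paused_details uses the keys 'JobPausedAt' and 'JobExpiresAt'.
select
  job_id,
  name,
  user_paused_details ->> 'JobPausedAt' as paused_at,
  user_paused_details ->> 'JobExpiresAt' as expires_at
from
  aws_macie2_classification_job
where
  job_status = 'USER_PAUSED';
```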
Get the number of times each classification job has run
Get the number of times each classification job has run, along with the approximate number of objects it still has to process. The first query uses PostgreSQL JSON operators; the second is the SQLite equivalent.
select job_id, arn, statistics ->> 'ApproximateNumberOfObjectsToProcess' as approximate_number_of_objects_to_process, statistics ->> 'NumberOfRuns' as number_of_runs from aws_macie2_classification_job;
select job_id, arn, json_extract(statistics, '$.ApproximateNumberOfObjectsToProcess') as approximate_number_of_objects_to_process, json_extract(statistics, '$.NumberOfRuns') as number_of_runs from aws_macie2_classification_job;
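Because the ->> operator returns text, the run count has to be cast before it can be sorted or compared numerically. The PostgreSQL sketch below restricts the result to recurring jobs and orders them by run count. The 'SCHEDULED' value for job_type is taken from the Macie API and is not stated elsewhere on this page.

```sql
-- Recurring jobs ordered by how often they have run.
-- Assumption: job_type is 'SCHEDULED' for recurring jobs.
select
  job_id,
  name,
  last_run_time,
  (statistics ->> 'NumberOfRuns')::numeric as number_of_runs
from
  aws_macie2_classification_job
where
  job_type = 'SCHEDULED'
order by
  number_of_runs desc;
```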
Control examples
- All Controls > S3 > Ensure all data in AWS S3 has been discovered, classified and secured when required
- CIS v1.4.0 > 2 Storage > 2.1 Simple Storage Service (S3) > 2.1.4 Ensure all data in Amazon S3 has been discovered, classified and secured when required
- CIS v1.5.0 > 2 Storage > 2.1 Simple Storage Service (S3) > 2.1.4 Ensure all data in Amazon S3 has been discovered, classified and secured when required
- CIS v2.0.0 > 2 Storage > 2.1 Simple Storage Service (S3) > 2.1.3 Ensure all data in Amazon S3 has been discovered, classified and secured when required
- CIS v3.0.0 > 2 Storage > 2.1 Simple Storage Service (S3) > 2.1.3 Ensure all data in Amazon S3 has been discovered, classified and secured when required
Schema for aws_macie2_classification_job
Name | Type | Operators | Description |
---|---|---|---|
_ctx | jsonb | | Steampipe context in JSON form. |
account_id | text | =, !=, ~~, ~~*, !~~, !~~* | The AWS Account ID in which the resource is located. |
akas | jsonb | | Array of globally unique identifier strings (also known as) for the resource. |
allow_list_ids | jsonb | | An array of unique identifiers, one for each allow list that the job uses when it analyzes data. |
arn | text | | The Amazon Resource Name (ARN) of the job. |
bucket_definitions | jsonb | | An array of objects, one for each AWS account that owns S3 buckets to analyze. Each object specifies the account ID and the buckets to analyze for that account. |
client_token | text | | The token that was provided to ensure the idempotency of the request to create the job. |
created_at | timestamp with time zone | | The date and time, in UTC and extended ISO 8601 format, when the job was created. |
custom_data_identifier_ids | jsonb | | The custom data identifiers that the job uses to analyze data. |
description | text | | The custom description of the job. |
initial_run | boolean | | For a recurring job, specifies whether you configured the job to analyze all existing, eligible objects immediately after the job was created (true). |
job_id | text | = | The unique identifier for the job. |
job_status | text | =, != | The status of a classification job. |
job_type | text | =, != | The schedule for running a classification job. |
last_run_error_status | jsonb | | Specifies whether any account- or bucket-level access errors occurred when a classification job ran. |
last_run_time | timestamp with time zone | | This value indicates when the most recent run started. |
managed_data_identifier_ids | jsonb | | An array of unique identifiers, one for each managed data identifier that the job is explicitly configured to include (use) or exclude (not use) when it analyzes data. |
managed_data_identifier_selector | text | | The selection type that determines which managed data identifiers the job uses when it analyzes data. |
name | text | =, != | The custom name of the job. |
partition | text | | The AWS partition in which the resource is located (aws, aws-cn, or aws-us-gov). |
region | text | | The AWS Region in which the resource is located. |
s3_job_definition | jsonb | | Specifies which S3 buckets contain the objects that a classification job analyzes, and the scope of that analysis. |
sampling_percentage | bigint | | The sampling depth, as a percentage, that determines the percentage of eligible objects that the job analyzes. |
schedule_frequency | jsonb | | Specifies the recurrence pattern for running a classification job. |
sp_connection_name | text | =, !=, ~~, ~~*, !~~, !~~* | Steampipe connection name. |
sp_ctx | jsonb | | Steampipe context in JSON form. |
statistics | jsonb | | Provides processing statistics for a classification job. |
tags | jsonb | | A map of tags for the resource. |
title | text | | Title of the resource. |
user_paused_details | jsonb | | Provides information about when a classification job was paused. |
Export
This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install the exporter with the steampipe_export_installer.sh script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- aws
You can pass the configuration to the command with the --config argument:
steampipe_export_aws --config '<your_config>' aws_macie2_classification_job