steampipe plugin install aws

Table: aws_macie2_classification_job - Query AWS Macie2 Classification Jobs using SQL

The AWS Macie2 Classification Job is a feature of Amazon Macie, a fully managed data security and data privacy service. It uses machine learning and pattern matching to discover and protect your sensitive data. The Classification Job specifically scans and classifies data in specified S3 buckets, providing visibility into the types and nature of data stored, and assisting in meeting data privacy regulations.

Table Usage Guide

The aws_macie2_classification_job table in Steampipe provides you with information about classification jobs within AWS Macie2. This table allows you, as a DevOps engineer, to query job-specific details, including job type, job status, and job creation and completion times. You can utilize this table to gather insights on jobs, such as jobs that are currently running, jobs that have completed, and the results of those jobs. The schema outlines the various attributes of the Macie2 classification job for you, including the job ID, job ARN, S3 bucket definition, and associated tags.

Examples

Basic info

List the ID, ARN, name, status, and Region of every Amazon Macie classification job. This gives you a quick overview of which jobs exist, where they run, and what state they are in.

select
  job_id,
  arn,
  name,
  job_status,
  region
from
  aws_macie2_classification_job;

Get S3 bucket details for each classification job

Expand the bucket definitions of each classification job into one row per account, showing the account ID and the buckets that the job analyzes for that account. This is useful for understanding which S3 buckets each classification job covers.

PostgreSQL:

select
  job_id,
  detail -> 'AccountId' as account_id,
  detail -> 'Buckets' as buckets
from
  aws_macie2_classification_job,
  jsonb_array_elements(s3_job_definition -> 'BucketDefinitions') as detail;

SQLite:

select
  job_id,
  json_extract(detail.value, '$.AccountId') as account_id,
  json_extract(detail.value, '$.Buckets') as buckets
from
  aws_macie2_classification_job,
  json_each(s3_job_definition, '$.BucketDefinitions') as detail;
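If you need one row per bucket rather than one row per account, the bucket list can be expanded a second time. The following PostgreSQL sketch assumes that `Buckets` is a JSON array of bucket-name strings, as its use in the query for this example suggests; the `bucket_name` alias is illustrative and not part of the table.

```sql
-- Sketch: one row per (job, account, bucket).
-- Assumption: 'Buckets' is a JSON array of plain strings.
select
  job_id,
  detail ->> 'AccountId' as account_id,
  bucket_name
from
  aws_macie2_classification_job,
  -- First expansion: one row per bucket definition (per account).
  jsonb_array_elements(s3_job_definition -> 'BucketDefinitions') as detail,
  -- Second expansion: one row per bucket name in that definition.
  jsonb_array_elements_text(detail -> 'Buckets') as bucket_name;
```

Using `->>` for the account ID returns it as text rather than as a quoted JSON value, which is usually easier to join against other tables.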

List paused or cancelled classification jobs

List the classification jobs that are paused or cancelled to better manage your AWS Macie resources. This helps you spot paused jobs that should be resumed or cancelled, and review jobs that have already been cancelled.

select
  job_id,
  arn,
  name,
  job_status as status
from
  aws_macie2_classification_job
where
  job_status = 'CANCELLED'
  or job_status = 'PAUSED';
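
To see how paused and cancelled jobs compare with the rest, a count per status can help. This is a minimal sketch that uses only the `job_status` column; the `job_count` alias is illustrative.

```sql
-- Sketch: number of classification jobs in each status.
select
  job_status,
  count(*) as job_count
from
  aws_macie2_classification_job
group by
  job_status
order by
  job_count desc;
```

The query uses only standard SQL, so it should behave the same way in PostgreSQL and SQLite.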

Get the number of times each classification job has run

Determine the frequency of each classification job's execution in your AWS Macie environment. This information can be useful to understand the workload distribution and identify any potential areas of optimization.

PostgreSQL:

select
  job_id,
  arn,
  statistics ->> 'ApproximateNumberOfObjectsToProcess' as approximate_number_of_objects_to_process,
  statistics ->> 'NumberOfRuns' as number_of_runs
from
  aws_macie2_classification_job;

SQLite:

select
  job_id,
  arn,
  json_extract(
    statistics,
    '$.ApproximateNumberOfObjectsToProcess'
  ) as approximate_number_of_objects_to_process,
  json_extract(statistics, '$.NumberOfRuns') as number_of_runs
from
  aws_macie2_classification_job;
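
In PostgreSQL, the `->>` operator returns text, so sorting on `number_of_runs` directly would order the values as strings. A hedged sketch of a numeric ranking, assuming `NumberOfRuns` always holds an integer value:

```sql
-- Sketch: rank jobs by run count, most frequently run first.
-- Assumption: statistics ->> 'NumberOfRuns' is always a valid integer or null.
select
  job_id,
  name,
  (statistics ->> 'NumberOfRuns')::bigint as number_of_runs
from
  aws_macie2_classification_job
order by
  number_of_runs desc nulls last;
```

`nulls last` keeps jobs without statistics at the bottom of the result instead of the top.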

Schema for aws_macie2_classification_job

| Name | Type | Operators | Description |
| --- | --- | --- | --- |
| _ctx | jsonb | | Steampipe context in JSON form. |
| account_id | text | =, !=, ~~, ~~*, !~~, !~~* | The AWS Account ID in which the resource is located. |
| akas | jsonb | | Array of globally unique identifier strings (also known as) for the resource. |
| allow_list_ids | jsonb | | An array of unique identifiers, one for each allow list that the job uses when it analyzes data. |
| arn | text | | The Amazon Resource Name (ARN) of the job. |
| bucket_definitions | jsonb | | The S3 bucket definitions for the job, each specifying an account ID and the buckets to analyze for that account. |
| client_token | text | | The token that was provided to ensure the idempotency of the request to create the job. |
| created_at | timestamp with time zone | | The date and time, in UTC and extended ISO 8601 format, when the job was created. |
| custom_data_identifier_ids | jsonb | | The custom data identifiers that the job uses to analyze data. |
| description | text | | The custom description of the job. |
| initial_run | boolean | | For a recurring job, specifies whether you configured the job to analyze all existing, eligible objects immediately after the job was created (true). |
| job_id | text | = | The unique identifier for the job. |
| job_status | text | =, != | The status of a classification job. |
| job_type | text | =, != | The schedule for running a classification job. |
| last_run_error_status | jsonb | | Specifies whether any account- or bucket-level access errors occurred when a classification job ran. |
| last_run_time | timestamp with time zone | | This value indicates when the most recent run started. |
| managed_data_identifier_ids | jsonb | | An array of unique identifiers, one for each managed data identifier that the job is explicitly configured to include (use) or exclude (not use) when it analyzes data. |
| managed_data_identifier_selector | text | | The selection type that determines which managed data identifiers the job uses when it analyzes data. |
| name | text | =, != | The custom name of the job. |
| partition | text | | The AWS partition in which the resource is located (aws, aws-cn, or aws-us-gov). |
| region | text | | The AWS Region in which the resource is located. |
| s3_job_definition | jsonb | | Specifies which S3 buckets contain the objects that a classification job analyzes, and the scope of that analysis. |
| sampling_percentage | bigint | | The sampling depth, as a percentage, that determines the percentage of eligible objects that the job analyzes. |
| schedule_frequency | jsonb | | Specifies the recurrence pattern for running a classification job. |
| sp_connection_name | text | =, !=, ~~, ~~*, !~~, !~~* | Steampipe connection name. |
| sp_ctx | jsonb | | Steampipe context in JSON form. |
| statistics | jsonb | | Provides processing statistics for a classification job. |
| tags | jsonb | | A map of tags for the resource. |
| title | text | | Title of the resource. |
| user_paused_details | jsonb | | Provides information about when a classification job was paused. |
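
The columns in this schema can be combined in a single filter. The sketch below lists recurring jobs created in the last 30 days. It assumes that `'SCHEDULED'` is the `job_type` value Macie reports for recurring jobs; the 30-day window is arbitrary, and the `now() - interval` syntax is PostgreSQL-specific.

```sql
-- Sketch: recently created recurring jobs, newest first.
-- Assumption: recurring jobs have job_type = 'SCHEDULED'.
select
  job_id,
  name,
  job_type,
  job_status,
  created_at,
  last_run_time
from
  aws_macie2_classification_job
where
  job_type = 'SCHEDULED'
  and created_at >= now() - interval '30 days'
order by
  created_at desc;
```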

Export

This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install the exporter with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- aws

You can pass the configuration to the command with the --config argument:

steampipe_export_aws --config '<your_config>' aws_macie2_classification_job