steampipe plugin install aws

Table: aws_emr_cluster - Query AWS Elastic MapReduce Cluster using SQL

AWS Elastic MapReduce (EMR) is a web service that makes it easy to process large amounts of data efficiently. EMR uses Hadoop processing combined with several AWS products to do tasks such as web indexing, data mining, log file analysis, machine learning, scientific simulation, and data warehousing. Users can analyze their data interactively to reach insights faster.

Table Usage Guide

The aws_emr_cluster table in Steampipe provides information about clusters within AWS Elastic MapReduce (EMR). As a data engineer, you can query cluster-specific details, including cluster status, hardware and software configurations, VPC settings, and associated metadata, for example to check cluster states or verify VPC settings. The schema outlines the attributes of each EMR cluster, including the cluster ID, name, status, normalized instance hours, and associated tags.

Examples

Basic info

Explore the status and termination settings of your AWS EMR clusters to manage resources effectively. This helps in identifying clusters that are in use and those that can be terminated to save costs.

-- PostgreSQL
select
  id,
  cluster_arn,
  name,
  auto_terminate,
  status ->> 'State' as state,
  tags
from
  aws_emr_cluster;

-- SQLite
select
  id,
  cluster_arn,
  name,
  auto_terminate,
  json_extract(status, '$.State') as state,
  tags
from
  aws_emr_cluster;

List clusters with auto-termination disabled

Find clusters that are running with auto-termination disabled, which can lead to unnecessary resource usage and increased costs.

-- PostgreSQL
select
  name,
  cluster_arn,
  auto_terminate
from
  aws_emr_cluster
where
  not auto_terminate;

-- SQLite
select
  name,
  cluster_arn,
  auto_terminate
from
  aws_emr_cluster
where
  auto_terminate = 0;
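
A cluster with auto-termination disabled costs the most when it sits idle. As a minimal sketch, the query below narrows the result to clusters that are waiting for work. It assumes that the state column listed in the schema carries the same values as status ->> 'State', and that WAITING is the EMR state for an idle cluster with no steps to run; adjust the filter if your clusters report differently.

```sql
-- PostgreSQL
-- Assumption: state = 'WAITING' marks a cluster that is idle between steps.
select
  name,
  cluster_arn,
  state,
  normalized_instance_hours
from
  aws_emr_cluster
where
  not auto_terminate
  and state = 'WAITING';
```

For SQLite, replacing not auto_terminate with auto_terminate = 0 should give the equivalent filter.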

List clusters which have terminated with errors

Identify clusters that ended with errors, along with the message reported for the state change. This gives you a starting point for troubleshooting each failed cluster.

-- PostgreSQL
select
  id,
  name,
  status ->> 'State' as state,
  status -> 'StateChangeReason' ->> 'Message' as state_change_reason
from
  aws_emr_cluster
where
  status ->> 'State' = 'TERMINATED_WITH_ERRORS';

-- SQLite
select
  id,
  name,
  json_extract(status, '$.State') as state,
  json_extract(
    json_extract(status, '$.StateChangeReason'),
    '$.Message'
  ) as state_change_reason
from
  aws_emr_cluster
where
  json_extract(status, '$.State') = 'TERMINATED_WITH_ERRORS';

Get application names and versions installed for each cluster

Determine the applications and their respective versions installed across different clusters. This is useful for tracking software versions and ensuring consistency across your cluster environment.

-- PostgreSQL
select
  name,
  cluster_arn,
  a ->> 'Name' as application_name,
  a ->> 'Version' as application_version
from
  aws_emr_cluster,
  jsonb_array_elements(applications) as a;

-- SQLite
select
  name,
  cluster_arn,
  json_extract(a.value, '$.Name') as application_name,
  json_extract(a.value, '$.Version') as application_version
from
  aws_emr_cluster,
  json_each(applications) as a;
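
To check version consistency at a glance, the same expansion of the applications array can be aggregated. The sketch below counts how many clusters run each application and version pair. It uses only the columns and JSON keys shown above; the cluster_count alias is introduced here for illustration.

```sql
-- PostgreSQL
-- One row per (application, version) pair, with the number of clusters using it.
select
  a ->> 'Name' as application_name,
  a ->> 'Version' as application_version,
  count(*) as cluster_count
from
  aws_emr_cluster,
  jsonb_array_elements(applications) as a
group by
  a ->> 'Name',
  a ->> 'Version'
order by
  application_name,
  application_version;
```

An application that appears with more than one version in the output is a likely candidate for a closer look.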

List clusters with logging disabled

Find clusters that have logging disabled, meaning no log URI is configured. This helps you identify gaps in monitoring across your clusters.

-- Same query for PostgreSQL and SQLite
select
  name,
  cluster_arn,
  log_uri
from
  aws_emr_cluster
where
  log_uri is null;

List clusters with logging enabled but log encryption is disabled

Explore clusters where logging is activated but without the added security layer of log encryption. This can help identify potential vulnerabilities in your data security practices.

-- Same query for PostgreSQL and SQLite
select
  name,
  cluster_arn,
  log_uri,
  log_encryption_kms_key_id
from
  aws_emr_cluster
where
  log_uri is not null
  and log_encryption_kms_key_id is null;

Schema for aws_emr_cluster

| Name | Type | Operators | Description |
| --- | --- | --- | --- |
| _ctx | jsonb | | Steampipe context in JSON form. |
| account_id | text | =, !=, ~~, ~~*, !~~, !~~* | The AWS Account ID in which the resource is located. |
| akas | jsonb | | Array of globally unique identifier strings (also known as) for the resource. |
| applications | jsonb | | The applications installed on this cluster. |
| auto_scaling_role | text | | An IAM role for automatic scaling policies. |
| auto_terminate | boolean | | Specifies whether the cluster should terminate after completing all steps. |
| cluster_arn | text | | The Amazon Resource Name of the cluster. |
| configurations | jsonb | | Applies only to Amazon EMR releases 4.x and later. The list of Configurations supplied to the EMR cluster. |
| custom_ami_id | text | | Available only in Amazon EMR version 5.7.0 and later. The ID of a custom Amazon EBS-backed Linux AMI if the cluster uses a custom AMI. |
| ebs_root_volume_iops | bigint | | The IOPS, of the Amazon EBS root device volume of the Linux AMI that is used for each Amazon EC2 instance. |
| ebs_root_volume_size | text | | The size of the Amazon EBS root device volume of the Linux AMI that is used for each EC2 instance, in GiB. Available in Amazon EMR version 4.x and later. |
| ebs_root_volume_throughput | bigint | | The throughput, in MiB/s, of the Amazon EBS root device volume of the Linux AMI that is used for each Amazon EC2 instance. |
| ec2_instance_attributes | jsonb | | Provides information about the EC2 instances in a cluster grouped by category. |
| id | text | = | The unique identifier for the cluster. |
| instance_collection_type | text | | The instance group configuration of the cluster. |
| kerberos_attributes | jsonb | | Attributes for Kerberos configuration when Kerberos authentication is enabled using a security configuration. |
| log_encryption_kms_key_id | text | | The AWS KMS customer master key (CMK) used for encrypting log files. This attribute is only available with EMR version 5.30.0 and later, excluding EMR 6.0.0. |
| log_uri | text | | The path to the Amazon S3 location where logs for this cluster are stored. |
| master_public_dns_name | text | | The DNS name of the master node. |
| name | text | | The name of the cluster. |
| normalized_instance_hours | bigint | | An approximation of the cost of the cluster, represented in m1.small/hours. |
| os_release_label | text | | The Amazon Linux release specified in a cluster launch RunJobFlow request. |
| outpost_arn | text | | The Amazon Resource Name (ARN) of the Outpost where the cluster is launched. |
| partition | text | | The AWS partition in which the resource is located (aws, aws-cn, or aws-us-gov). |
| placement_groups | jsonb | | Placement group configured for an Amazon EMR cluster. |
| region | text | | The AWS Region in which the resource is located. |
| release_label | text | | The Amazon EMR release label, which determines the version of open-source application packages installed on the cluster. |
| repo_upgrade_on_boot | text | | Applies only when CustomAmiID is used. Specifies the type of updates that are applied from the Amazon Linux AMI package repositories when an instance boots using the AMI. |
| requested_ami_version | text | | Applies only when CustomAmiID is used. Specifies the type of updates that are applied from the Amazon Linux AMI package repositories when an instance boots using the AMI. |
| running_ami_version | text | | The AMI version running on this cluster. |
| scale_down_behavior | text | | The way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized. |
| security_configuration | text | | The name of the security configuration applied to the cluster. |
| service_role | text | | The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf. |
| sp_connection_name | text | =, !=, ~~, ~~*, !~~, !~~* | Steampipe connection name. |
| sp_ctx | jsonb | | Steampipe context in JSON form. |
| state | text | = | The current state of the cluster. |
| status | jsonb | | The current status details about the cluster. |
| step_concurrency_level | bigint | | Specifies the number of steps that can be executed concurrently. |
| tags | jsonb | | A map of tags for the resource. |
| tags_src | jsonb | | A list of tags associated with a cluster. |
| termination_protected | boolean | | Indicates whether Amazon EMR will lock the cluster to prevent the EC2 instances from being terminated by an API call or user intervention, or in the event of a cluster error. |
| title | text | | Title of the resource. |
| unhealthy_node_replacement | boolean | | Indicates whether Amazon EMR should gracefully replace Amazon EC2 core instances that have degraded within the cluster. |
| visible_to_all_users | boolean | | Indicates whether the cluster is visible to all IAM users of the AWS account associated with the cluster. |
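
The schema lists the = operator for the id column, so a lookup by cluster ID is a natural way to inspect a single cluster. The sketch below shows such a lookup. The ID 'j-EXAMPLE1234567' is a hypothetical placeholder, not a real cluster, and the selected columns are just one reasonable choice from the schema above.

```sql
-- Same query for PostgreSQL and SQLite
-- 'j-EXAMPLE1234567' is a hypothetical cluster ID; substitute one of your own.
select
  id,
  name,
  state,
  release_label,
  termination_protected,
  visible_to_all_users
from
  aws_emr_cluster
where
  id = 'j-EXAMPLE1234567';
```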

Export

This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install the exporter with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- aws

You can pass the configuration to the command with the --config argument:

steampipe_export_aws --config '<your_config>' aws_emr_cluster