Table: gcp_dataproc_cluster - Query Google Cloud Platform Dataproc Clusters using SQL
Google Cloud Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way. Operations that used to take hours or days take seconds or minutes instead, and you pay only for the resources you use. Dataproc also easily integrates with other Google Cloud services, giving you a powerful and complete data processing platform.
Table Usage Guide
The gcp_dataproc_cluster table provides insights into Dataproc clusters within Google Cloud Platform. As a data engineer, you can explore cluster-specific details through this table, including configurations, status, and associated metadata. Use it to find clusters with specific configurations, check the operational status of clusters, and verify their associated metadata.
Examples
Basic info
Explore the configuration and status of your Dataproc clusters in Google Cloud Platform. This helps you assess the current state and settings of your clusters for resource management and optimization.
select
  cluster_name,
  cluster_uuid,
  config,
  state,
  tags
from
  gcp_dataproc_cluster;
List the clusters which are in error state
Identify clusters that are in the ERROR state so you can troubleshoot and resolve issues promptly. This matters when applications and services depend on those clusters staying healthy.
select
  cluster_name,
  cluster_uuid,
  state
from
  gcp_dataproc_cluster
where
  state = 'ERROR';
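To put the error count in context, it can help to see how many clusters are in each state. The query below is a minimal sketch that uses only the state column listed in the schema; it contains no dialect-specific syntax, so it should run on both Postgres and SQLite.

```sql
-- Count clusters per state, largest group first.
select
  state,
  count(*) as cluster_count
from
  gcp_dataproc_cluster
group by
  state
order by
  cluster_count desc;
```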
Get config details of a cluster
Explore the configuration details of a specific cluster, such as its endpoint configuration, config bucket, shielded instance configuration, and master configuration. This is useful for understanding and managing the cluster's settings.
-- Postgres
select
  cluster_name,
  config -> 'endpointConfig' as endpoint_config,
  config -> 'configBucket' as config_bucket,
  config -> 'shieldedInstanceConfig' as shielded_instance_config,
  config -> 'masterConfig' as master_config
from
  gcp_dataproc_cluster
where
  cluster_name = 'cluster-5824';

-- SQLite
select
  cluster_name,
  json_extract(config, '$.endpointConfig') as endpoint_config,
  json_extract(config, '$.configBucket') as config_bucket,
  json_extract(config, '$.shieldedInstanceConfig') as shielded_instance_config,
  json_extract(config, '$.masterConfig') as master_config
from
  gcp_dataproc_cluster
where
  cluster_name = 'cluster-5824';
Control examples
- CIS v1.3.0 > 1 Identity and Access Management > 1.17 Ensure that dataproc cluster is encrypted using customer-managed encryption key
- CIS v2.0.0 > 1 Identity and Access Management > 1.17 Ensure that dataproc cluster is encrypted using customer-managed encryption key
- Ensure that dataproc cluster is encrypted using customer-managed encryption key
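The controls above check whether a cluster uses a customer-managed encryption key. A rough sketch of a similar ad hoc check, in Postgres syntax, is shown here. It assumes the key name is stored under encryptionConfig.gcePdKmsKeyName inside the config column, following the Dataproc API's cluster config layout; this page does not document that path, so treat it as an assumption and confirm it against your own data. This is not necessarily how the listed controls are implemented.

```sql
-- Postgres syntax.
-- Assumption: the customer-managed key name lives at
--   config -> 'encryptionConfig' ->> 'gcePdKmsKeyName'
-- Rows returned are clusters with no such key recorded.
select
  cluster_name,
  project,
  location,
  config -> 'encryptionConfig' ->> 'gcePdKmsKeyName' as kms_key_name
from
  gcp_dataproc_cluster
where
  config -> 'encryptionConfig' ->> 'gcePdKmsKeyName' is null;
```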
Schema for gcp_dataproc_cluster
Name | Type | Operators | Description |
---|---|---|---|
_ctx | jsonb | | Steampipe context in JSON form. |
akas | jsonb | | Array of globally unique identifier strings (also known as) for the resource. |
cluster_name | text | = | The cluster name. |
cluster_uuid | text | | A cluster UUID (Universally Unique Identifier). Dataproc generates this value when it creates the cluster. |
config | jsonb | | The cluster config. |
labels | jsonb | | The labels to associate with this cluster. |
location | text | | The GCP multi-region, region, or zone in which the resource is located. |
metrics | jsonb | | Contains cluster daemon metrics such as HDFS and YARN stats. |
project | text | =, !=, ~~, ~~*, !~~, !~~* | The GCP Project in which the resource is located. |
self_link | text | | Server-defined URL for the resource. |
sp_connection_name | text | =, !=, ~~, ~~*, !~~, !~~* | Steampipe connection name. |
sp_ctx | jsonb | | Steampipe context in JSON form. |
state | text | = | The cluster's state. |
status | jsonb | | Cluster status. |
status_history | jsonb | | The previous cluster status. |
tags | jsonb | | A map of tags for the resource. |
title | text | | Title of the resource. |
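As an illustration of the operators listed for project and state, the following sketch filters on both columns. In Postgres, ~~ is the operator form of LIKE and ~~* is the operator form of ILIKE, so the like predicate here corresponds to the ~~ entry in the table. The project name pattern 'analytics-%' is a hypothetical placeholder; substitute your own.

```sql
-- Running clusters in projects whose ID starts with a given prefix.
-- 'analytics-%' is a made-up pattern used only for illustration.
select
  cluster_name,
  project,
  location,
  state,
  labels
from
  gcp_dataproc_cluster
where
  project like 'analytics-%'
  and state = 'RUNNING';
```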
Export
This table is available as a standalone Exporter CLI. Steampipe exporters are standalone binaries that let you extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install it with the steampipe_export_installer.sh script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- gcp
You can pass the configuration to the command with the --config argument:
steampipe_export_gcp --config '<your_config>' gcp_dataproc_cluster