Table: gcp_bigquery_dataset - Query Google Cloud Platform BigQuery Datasets using SQL
Google Cloud Platform's BigQuery is a serverless, highly scalable, and cost-effective multi-cloud data warehouse designed for business agility. BigQuery Datasets are top-level containers that are used to organize and control access to tables and views. They provide a mechanism for grouping tables and views, and setting permissions at a group level.
Table Usage Guide
The gcp_bigquery_dataset table provides insights into BigQuery Datasets within Google Cloud Platform. As a data analyst or data engineer, explore dataset-specific details through this table, including access controls, locations, and associated metadata. Utilize it to uncover information about datasets, such as their default partition type, default table expiration settings, and the labels applied to them.
Examples
Basic info
Explore the creation times and locations of your Google Cloud Platform BigQuery datasets. This is useful for understanding when and where your data resources were established, which can aid in resource management and optimization.
select
  dataset_id,
  self_link,
  creation_time,
  location
from
  gcp_bigquery_dataset;
List datasets which are not encrypted using CMK
Identify instances where your datasets are not encrypted using a Customer-Managed Key (CMK). This helps in enhancing the security of your data by ensuring that all datasets are encrypted with a key that you manage and control.
select
  dataset_id,
  location,
  kms_key_name
from
  gcp_bigquery_dataset
where
  kms_key_name is null;
List publicly accessible datasets
Discover the datasets that are publicly accessible for data analysis and manipulation. This is useful for identifying potential data privacy risks, or for confirming which data is intentionally made publicly available for collaboration.
PostgreSQL:

select
  dataset_id,
  location,
  ls as access_policy
from
  gcp_bigquery_dataset,
  jsonb_array_elements(access) as ls
where
  ls ->> 'specialGroup' = 'allAuthenticatedUsers'
  or ls ->> 'iamMember' = 'allUsers';

SQLite:

select
  dataset_id,
  location,
  ls.value as access_policy
from
  gcp_bigquery_dataset,
  json_each(access) as ls
where
  json_extract(ls.value, '$.specialGroup') = 'allAuthenticatedUsers'
  or json_extract(ls.value, '$.iamMember') = 'allUsers';
List datasets that do not have the owner tag key
Discover datasets that lack an assigned owner within the Google Cloud Platform's BigQuery service. This query can be useful to identify potential gaps in data ownership and accountability.
PostgreSQL:

select
  dataset_id,
  location
from
  gcp_bigquery_dataset
where
  tags -> 'owner' is null;

SQLite:

select
  dataset_id,
  location
from
  gcp_bigquery_dataset
where
  json_extract(tags, '$.owner') is null;
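List datasets without a default table expiration
The Table Usage Guide mentions default table expiration settings and labels. As a minimal sketch using the default_table_expiration_ms and labels columns documented in the schema below, the following query finds datasets whose tables never expire automatically, which can be worth reviewing for storage cost control.

select
  dataset_id,
  location,
  default_table_expiration_ms,
  labels
from
  gcp_bigquery_dataset
where
  default_table_expiration_ms is null;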
Control examples
- Check if BigQuery datasets are publicly readable
- CIS v1.2.0 > 7 BigQuery > 7.1 Ensure that BigQuery datasets are not anonymously or publicly accessible
- CIS v1.2.0 > 7 BigQuery > 7.3 Ensure that a Default Customer-managed encryption key (CMEK) is specified for all BigQuery Data Sets
- CIS v1.3.0 > 7 BigQuery > 7.1 Ensure that BigQuery datasets are not anonymously or publicly accessible
- CIS v1.3.0 > 7 BigQuery > 7.3 Ensure that a Default Customer-managed encryption key (CMEK) is specified for all BigQuery Data Sets
- CIS v2.0.0 > 7 BigQuery > 7.1 Ensure that BigQuery datasets are not anonymously or publicly accessible
- CIS v2.0.0 > 7 BigQuery > 7.3 Ensure that a Default Customer-managed encryption key (CMEK) is specified for all BigQuery Data Sets
- CIS v3.0.0 > 7 BigQuery > 7.1 Ensure That BigQuery Datasets Are Not Anonymously or Publicly Accessible
- CIS v3.0.0 > 7 BigQuery > 7.3 Ensure That a Default Customer-Managed Encryption Key (CMEK) Is Specified for All BigQuery Data Sets
- CIS v3.0.0 > 8 Dataproc > 8.1 Ensure that Dataproc Cluster is encrypted using Customer-Managed Encryption Key
- Enforce corporate domain by banning gmail.com addresses access to BigQuery datasets
- Enforce corporate domain by banning googlegroups.com addresses access to BigQuery datasets
- Ensure that a default customer-managed encryption key (CMEK) is specified for all BigQuery Data Sets
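The corporate-domain controls above inspect the entities granted access to each dataset. As a minimal sketch of how such a check might be written with this table (assuming, per the access column documented below, that access entries expose a userByEmail field), the following PostgreSQL query lists grants made to gmail.com addresses:

select
  dataset_id,
  ls ->> 'userByEmail' as user_email
from
  gcp_bigquery_dataset,
  jsonb_array_elements(access) as ls
where
  ls ->> 'userByEmail' like '%@gmail.com';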
Schema for gcp_bigquery_dataset
| Name | Type | Operators | Description |
|---|---|---|---|
| _ctx | jsonb | | Steampipe context in JSON form. |
| access | jsonb | | An array of objects that define dataset access for one or more entities. |
| akas | jsonb | | Array of globally unique identifier strings (also known as) for the resource. |
| creation_time | timestamp with time zone | | The time when this dataset was created. |
| dataset_id | text | = | The ID of the dataset resource. |
| default_partition_expiration_ms | bigint | | The default partition expiration for all partitioned tables in the dataset, in milliseconds. |
| default_table_expiration_ms | bigint | | The default lifetime of all tables in the dataset, in milliseconds. |
| description | text | | A user-friendly description of the dataset. |
| etag | text | | A hash of the resource. |
| id | text | | The fully-qualified, unique, opaque ID of the dataset. |
| kind | text | | The type of the resource. This property always returns the value 'bigquery#dataset'. |
| kms_key_name | text | | Describes the Cloud KMS encryption key that will be used to protect the destination BigQuery table. |
| labels | jsonb | | A set of labels associated with this dataset. |
| last_modified_time | timestamp with time zone | | The date when this dataset or any of its tables was last modified. |
| location | text | | The GCP multi-region, region, or zone in which the resource is located. |
| name | text | | A descriptive name for the dataset, if one exists. |
| project | text | =, !=, ~~, ~~*, !~~, !~~* | The GCP Project in which the resource is located. |
| self_link | text | | A URL that can be used to access the resource again. |
| sp_connection_name | text | =, !=, ~~, ~~*, !~~, !~~* | Steampipe connection name. |
| sp_ctx | jsonb | | Steampipe context in JSON form. |
| tags | jsonb | | A map of tags for the resource. |
| title | text | | Title of the resource. |
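The Operators column above lists the qualifiers that can be pushed down for key columns. As a minimal sketch (the dataset ID and project pattern are placeholder values), a query restricting dataset_id with = and matching project with the like-style ~~ operator might look like:

select
  dataset_id,
  project,
  location
from
  gcp_bigquery_dataset
where
  dataset_id = 'my_dataset'   -- placeholder dataset ID
  and project ~~ 'prod-%';    -- placeholder project pattern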
Export
This table is also available via a standalone exporter CLI. Steampipe exporters are stand-alone binaries that let you extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install the exporter with the steampipe_export_installer.sh script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- gcp
You can pass the configuration to the command with the --config argument:
steampipe_export_gcp --config '<your_config>' gcp_bigquery_dataset
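For example, assuming a hypothetical project named my-gcp-project and that the config string follows the GCP plugin's connection arguments, the invocation might look like:

steampipe_export_gcp --config 'project = "my-gcp-project"' gcp_bigquery_dataset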