steampipe plugin install csvsteampipe plugin install csv

CSV + Steampipe

A comma-separated values (CSV) file is a delimited text file that contains records of data.

Steampipe is an open source CLI to instantly query data using SQL.

Query data from the users.csv file:

select
first_name,
last_name
from
users;
+------------+-----------+
| first_name | last_name |
+------------+-----------+
| Michael | Scott |
| Dwight | Schrute |
+------------+-----------+

Documentation

Get started

Install

Download and install the latest CSV plugin:

steampipe plugin install csv

Credentials

No credentials are required.

Configuration

Installing the latest csv plugin will create a config file (~/.steampipe/config/csv.spc) with a single connection named csv:

connection "csv" {
plugin = "csv"
# Paths is a list of locations to search for CSV files
# Paths can be configured with a local directory, a remote Git repository URL, or an S3 bucket URL
# Refer https://hub.steampipe.io/plugins/turbot/csv#supported-path-formats for more information
# All paths are resolved relative to the current working directory (CWD)
# Wildcard based searches are supported, including recursive searches
# For example:
# - "*.csv" matches all CSV files in the CWD
# - "*.csv.gz" matches all gzipped CSV files in the CWD
# - "**/*.csv" matches all CSV files in the CWD and all sub-directories
# - "../*.csv" matches all CSV files in the CWD's parent directory
# - "steampipe*.csv" matches all CSV files starting with "steampipe" in the CWD
# - "/path/to/dir/*.csv" matches all CSV files in a specific directory
# - "/path/to/dir/custom.csv" matches a specific file
# If paths includes "*", all files (including non-CSV files) in
# the CWD will be matched, which may cause errors if incompatible file types exist
# Defaults to CWD
paths = [ "*.csv", "*.csv.gz" ]
# The field delimiter character when parsing CSV files. Must be a single
# character. Defaults to comma.
# separator = ","
# If set, then lines beginning with the comment character without preceding
# whitespace are ignored. Disabled by default.
# comment = "#"
# Determine whether to use the first row as the header row when creating column names.
# Valid values are "auto", "on", "off":
# - "auto": If there are no empty or duplicate values use the first row as the header; else, use the first row as a data row and use generic column names, e.g., "a", "b".
# - "on": Use the first row as the header. If there are empty or duplicate values, the tables will fail to load.
# - "off": Do not use the first row as the header. All column names will be generic.
# Defaults to "auto".
# header = "auto"
}
  • paths - A list of directory paths to search for CSV files. Paths are resolved relative to the current working directory. Paths may include wildcards and also supports ** for recursive matching. Defaults to the current working directory.
  • separator - Field delimiter when parsing files. Defaults to ,.
  • comment - Lines starting with this comment character are ignored. Disabled by default.
  • header - Whether to use the first row as the header row when creating column names. Valid values are "auto", "on", "off". Defaults to "auto". For more information, please see Column Names.

Supported Path Formats

The paths config argument is flexible and can search for CSV files from several sources, e.g., local directory paths, Git, S3.

The following sources are supported:

Paths may include wildcards and support ** for recursive matching. For example:

connection "csv" {
plugin = "csv"
paths = [
"*.csv",
"**/*.csv",
"github.com/codeforamerica/ohana-api//data/sample-csv//*.csv",
"bitbucket.org/ved_misra/sample-csv//*.csv",
"gitlab.com/subhajit7/example-files//sample-csv-files//*.csv",
"s3::https://bucket.s3.us-east-1.amazonaws.com/test_folder//*.csv"
]
}

Note: If any path matches on * without .csv, all files (including non-CSV files) in the directory will be matched, which may cause errors if incompatible file types exist.

Configuring Local File Paths

You can define a list of local directory paths to search for CSV files. Paths are resolved relative to the current working directory. For example:

  • "*.csv" matches all CSV files in the CWD
  • "*.csv.gz" matches all gzipped CSV files in the CWD
  • "*/.csv" matches all CSV files in the CWD and all sub-directories
  • "../*.csv" matches all CSV files in the CWD's parent directory
  • "steampipe*.csv" matches all CSV files starting with "steampipe" in the CWD
  • "/path/to/dir/*.csv" matches all CSV files in a specific directory
  • "/path/to/dir/custom.csv" matches a specific file
connection "csv" {
plugin = "csv"
paths = [ "*.csv", "~/*.csv", "/path/to/dir/custom.csv" ]
}

Configuring Remote Git Repository URLs

You can also configure paths with any Git remote repository URLs, e.g., GitHub, BitBucket, GitLab. The plugin will then attempt to retrieve any CSV files from the remote repositories.

For example:

  • github.com/codeforamerica/ohana-api//data/sample-csv//*.csv matches CSV files in the specified repository.
  • github.com/codeforamerica/ohana-api//data//**/*.csv matches all CSV files in the specified repository and all subdirectories.
  • github.com/codeforamerica/ohana-api//data//**/*.csv?ref=v1.2.0 matches all CSV files in the specific tag of a repository.
  • github.com/codeforamerica/ohana-api//data//**/*.csv matches all CSV files in the specified folder path.

You can specify a subdirectory after a double-slash (//) if you want to download only a specific subdirectory from a downloaded directory.

connection "csv" {
plugin = "csv"
paths = [ "github.com/codeforamerica/ohana-api//data/sample-csv//*.csv" ]
}

Similarly, you can define a list of GitLab and BitBucket URLs to search for CSV files:

connection "csv" {
plugin = "csv"
paths = [
"github.com/codeforamerica/ohana-api//data/sample-csv//*.csv",
"bitbucket.org/ved_misra/sample-csv//*.csv",
"gitlab.com/subhajit7/example-files//sample-csv-files//*.csv"
]
}

Configuring S3 URLs

You can also query all CSV files stored inside an S3 bucket (public or private) using the bucket URL.

Accessing a Private Bucket

In order to access your files in a private S3 bucket, you will need to configure your credentials. You can use your configured AWS profile from local ~/.aws/config, or pass the credentials using the standard AWS environment variables, e.g., AWS_PROFILE, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION.

We recommend using AWS profiles for authentication.

Note: Make sure that region is configured in the config. If not set in the config, region will be fetched from the standard environment variable AWS_REGION.

You can also authenticate your request by setting the AWS profile and region in paths. For example:

connection "csv" {
plugin = "csv"
paths = [
"s3::https://bucket-2.s3.us-east-1.amazonaws.com//*.csv?aws_profile=<AWS_PROFILE>",
"s3::https://bucket-2.s3.us-east-1.amazonaws.com/test_folder//*.csv?aws_profile=<AWS_PROFILE>"
]
}

Note:

In order to access the bucket, the IAM user or role will require the following IAM permissions:

  • s3:ListBucket
  • s3:GetObject
  • s3:GetObjectVersion

If the bucket is in another AWS account, the bucket policy will need to grant access to your user or role. For example:

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ReadBucketObject",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::123456789012:user/YOUR_USER"
},
"Action": ["s3:ListBucket", "s3:GetObject", "s3:GetObjectVersion"],
"Resource": ["arn:aws:s3:::test-bucket1", "arn:aws:s3:::test-bucket1/*"]
}
]
}
Accessing a Public Bucket

Public access granted to buckets and objects through ACLs and bucket policies allows any user access to data in the bucket. We do not recommend making S3 buckets public, but if there are specific objects you'd like to make public, please see How can I grant public read access to some objects in my Amazon S3 bucket?.

You can query any public S3 bucket directly using the URL without passing credentials. For example:

connection "config" {
plugin = "config"
paths = [
"s3::https://bucket-1.s3.us-east-1.amazonaws.com/test_folder//*.csv",
"s3::https://bucket-2.s3.us-east-1.amazonaws.com/test_folder//**/*.csv"
]
}

Postgres FDW

This plugin is available as a native Postgres FDW. Unlike Steampipe CLI, which ships with an embedded Postgres server instance, the Postgres FDW can be installed in any supported Postgres database version.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_postgres_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/postgres.sh)" -- csv

The installer will prompt you for the plugin name and version, download and install the appropriate files for your OS, system architecture, and Postgres version.

To configure the Postgres FDW, you will create an extension, foreign server, and schema and import the foreign schema.

CREATE EXTENSION IF NOT EXISTS steampipe_postgres_csv;
CREATE SERVER steampipe_csv FOREIGN DATA WRAPPER steampipe_postgres_csv OPTIONS (config '<your_config>');
CREATE SCHEMA csv;
IMPORT FOREIGN SCHEMA csv FROM SERVER steampipe_csv INTO csv;

SQLite Extension

This plugin is available as a SQLite Extension, making the tables available as SQLite virtual tables.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_sqlite_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/sqlite.sh)" -- csv

The installer will prompt you for the plugin name, version, and destination directory. It will then determine the OS and system architecture, and it will download and install the appropriate package.

To configure the SQLite extension, load the extension module and then run the steampipe_configure_csv function to configure it with plugin-specific options.

$ sqlite3
sqlite> .load ./steampipe_sqlite_extension_csv.so
sqlite> select steampipe_configure_csv('<your_config>');

Export

This plugin is available as a standalone Export CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- csv

You can pass the configuration to the command with the --config argument:

steampipe_export_csv --config '<your_config>' <table_name>