Table: {csv_filename} - Query CSV Files using SQL
CSV Files are a type of structured data file format that stores tabular data, such as a spreadsheet or database in plain text. These files can be easily imported and exported from programs that store data in tables, such as Microsoft Excel or Open Office Calc. CSV files are primarily used to transport data between applications that handle lots of data, and they support a wide array of data types and are flexible in terms of what type of data they can hold.
Table Usage Guide
The csv_filename
table provides insights into the data stored within CSV files. As a data analyst, you can leverage this table to extract, transform, and load data from CSV files for in-depth analysis and reporting. Use it to uncover valuable insights from your data, such as identifying trends, patterns, and correlations.
Schema link: https://hub.steampipe.io/plugins/turbot/csv/tables/csv_filename
Query data from CSV files. A table is automatically created to represent each
CSV file found in the configured paths
.
For instance, if paths
is set to /Users/myuser/csv/*.csv
, and that directory contains:
- products.csv:product_name,skuPaper,P001Better Paper,P002
- users.csv:first_name,last_name,emailMichael,Scott,mscott@dmi.comDwight,Schrute,dschrute@dmi.comPamela,Beesly,pbeesly@dmi.com
This plugin will create 2 tables:
- products
- users
Which you can then query directly:
select *from users;
All column values are returned as text data type.
Examples
Note: All examples in this section assume the header
configuration argument is set to auto
(default value). For more information on how column names are created, please see Column Names.
Inspect the table structure
Assuming your connection is called csv
(the default), list all tables with:
.inspect csv+----------+--------------------------------------------+| table | description |+----------+--------------------------------------------+| products | CSV file at /Users/myuser/csv/products.csv || users | CSV file at /Users/myuser/csv/users.csv |+----------+--------------------------------------------+
To get defails for a specific table, inspect it by name:
.inspect csv.users+------------+------+-------------+| column | type | description |+------------+------+-------------+| first_name | text | Field 0. || last_name | text | Field 1. || email | text | Field 2. |+------------+------+-------------+
Query a simple file
Explore all user data to gain a comprehensive understanding of your user base. This could be beneficial in identifying trends, understanding user behavior, and informing strategic decisions.
Given the file users.csv
, the query is:
select *from users;
select *from users;
Query a complex file name
Explore the intricacies of a complex file by analyzing its various attributes. This is useful for understanding the file's structure and content in a comprehensive manner.
Given the file My complex file-name.csv
, the query uses identifier quotes:
select *from "My complex file-name";
select *from "My complex file-name";
Query specific columns
Determine the areas in which you want to focus by selecting specific user details. This is useful when you want to narrow down your data analysis to specific user attributes. Columns are always in text form when read from the CSV file. The column names come from the first row of the file.
select first_name, last_namefrom users;
select first_name, last_namefrom users;
If your column names are complex, use identifier quotes:
select "First Name", "Last Name"from users;
select "First Name", "Last Name"from users;
Casting column data for analysis
Analyze the settings to understand which users are older than 25. This can be useful in tailoring age-specific content or offers. Text columns can be easily cast to other types:
select first_name, age :: int as iagefrom userswhere iage > 25;
select first_name, CAST(age as INTEGER) as iagefrom userswhere CAST(age as INTEGER) > 25;
Query multiple CSV files
Determine the contents of multiple CSV files in a unified view. This is beneficial when needing to analyze or compare data from multiple sources simultaneously. Given this data:
ips1.csv:
service,ip_addrservice1,54.176.63.153service2,222.236.38.99
ips2.csv:
service,ip_addrservice3,41.65.221.12service4,83.151.87.112service5,85.188.10.179
You can query both files like so:
create view all_ips asselect *from ips1unionselect *from ips2;select *from all_ips;
create view all_ips asselect *from ips1unionselect *from ips2;select *from all_ips;
Column Names
By default, the header
configuration argument is set to auto
, so when CSV files are loaded, the first row will be checked if it's a valid header row, i.e., no missing or duplicate values.
For instance, for the following CSV file users.csv
:
first_name,last_name,emailMichael,Scott,mscott@dmi.comDwight,Schrute,dschrute@dmi.comPamela,Beesly,pbeesly@dmi.com
The CSV plugin will create a table called users
with the header values as column names:
.inspect csv.users+------------+------+-------------+| column | type | description |+------------+------+-------------+| first_name | text | Field 0. || last_name | text | Field 1. || email | text | Field 2. |+------------+------+-------------+
Which produces the following query results:
> select * from dmi+------------+-----------+------------------+| first_name | last_name | email |+------------+-----------+------------------+| Dwight | Schrute | dschrute@dmi.com || Michael | Scott | mscott@dmi.com || Pamela | Beesly | pbeesly@dmi.com |+------------+-----------+------------------+
However, if the first row in users.csv
was missing a value:
first_name,,emailMichael,Scott,mscott@dmi.comDwight,Schrute,dschrute@dmi.comPamela,Beesly,pbeesly@dmi.com
The CSV plugin will assume the first row is not the header row and will create a table called users
with positional column names:
.inspect csv.users+--------+------+-------------+| column | type | description |+--------+------+-------------+| a | text | Field 0. || b | text | Field 1. || c | text | Field 2. |+--------+------+-------------+
Which produces the following query results:
> select * from dmi+------------+---------+------------------+| a | b | c |+------------+---------+------------------+| first_name | | email || Pamela | Beesly | pbeesly@dmi.com || Dwight | Schrute | dschrute@dmi.com || Michael | Scott | mscott@dmi.com |+------------+---------+------------------+
The header
configuration argument can also be set to:
on
: This setting requires the first row to be a valid header row, else the plugin will fail to create the tables.off
: This setting always assumes the first row isn't the header row and uses positional column names for all tables.
Export
This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the steampipe_export_installer.sh
script:
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- csv
You can pass the configuration to the command with the --config
argument:
steampipe_export_csv --config '<your_config>' {csv_filename}