Table: {csv_filename}
Query data from CSV files. A table is automatically created to represent each
CSV file found in the configured paths
.
For instance, if paths
is set to /Users/myuser/csv/*.csv
, and that directory contains:
- products.csv:product_name,skuPaper,P001Better Paper,P002
- users.csv:first_name,last_name,emailMichael,Scott,mscott@dmi.comDwight,Schrute,dschrute@dmi.comPamela,Beesly,pbeesly@dmi.com
This plugin will create 2 tables:
- products
- users
Which you can then query directly:
select *from users;
All column values are returned as text data type.
Examples
Note: All examples in this section assume the header
configuration argument is set to auto
(default value). For more information on how column names are created, please see Column Names.
Inspect the table structure
Assuming your connection is called csv
(the default), list all tables with:
.inspect csv+----------+--------------------------------------------+| table | description |+----------+--------------------------------------------+| products | CSV file at /Users/myuser/csv/products.csv || users | CSV file at /Users/myuser/csv/users.csv |+----------+--------------------------------------------+
To get defails for a specific table, inspect it by name:
.inspect csv.users+------------+------+-------------+| column | type | description |+------------+------+-------------+| first_name | text | Field 0. || last_name | text | Field 1. || email | text | Field 2. |+------------+------+-------------+
Query a simple file
Given the file users.csv
, the query is:
select *from users;
Query a complex file name
Given the file My complex file-name.csv
, the query uses identifier quotes:
select *from "My complex file-name"
Query specific columns
Columns are always in text form when read from the CSV file. The column names come from the first row of the file.
select first_name, last_namefrom users
If your column names are complex, use identifier quotes:
select "First Name", "Last Name"from users
Casting column data for analysis
Text columns can be easily cast to other types:
select first_name, age :: int as iagefrom userswhere iage > 25
Query multiple CSV files
Given this data:
ips1.csv:
service,ip_addrservice1,54.176.63.153service2,222.236.38.99
ips2.csv:
service,ip_addrservice3,41.65.221.12service4,83.151.87.112service5,85.188.10.179
You can query both files like so:
create view all_ips asselect *from ips1unionselect *from ips2;select *from all_ips
Column Names
By default, the header
configuration argument is set to auto
, so when CSV files are loaded, the first row will be checked if it's a valid header row, i.e., no missing or duplicate values.
For instance, for the following CSV file users.csv
:
first_name,last_name,emailMichael,Scott,mscott@dmi.comDwight,Schrute,dschrute@dmi.comPamela,Beesly,pbeesly@dmi.com
The CSV plugin will create a table called users
with the header values as column names:
.inspect csv.users+------------+------+-------------+| column | type | description |+------------+------+-------------+| first_name | text | Field 0. || last_name | text | Field 1. || email | text | Field 2. |+------------+------+-------------+
Which produces the following query results:
> select * from dmi+------------+-----------+------------------+| first_name | last_name | email |+------------+-----------+------------------+| Dwight | Schrute | dschrute@dmi.com || Michael | Scott | mscott@dmi.com || Pamela | Beesly | pbeesly@dmi.com |+------------+-----------+------------------+
However, if the first row in users.csv
was missing a value:
first_name,,emailMichael,Scott,mscott@dmi.comDwight,Schrute,dschrute@dmi.comPamela,Beesly,pbeesly@dmi.com
The CSV plugin will assume the first row is not the header row and will create a table called users
with positional column names:
.inspect csv.users+--------+------+-------------+| column | type | description |+--------+------+-------------+| a | text | Field 0. || b | text | Field 1. || c | text | Field 2. |+--------+------+-------------+
Which produces the following query results:
> select * from dmi+------------+---------+------------------+| a | b | c |+------------+---------+------------------+| first_name | | email || Pamela | Beesly | pbeesly@dmi.com || Dwight | Schrute | dschrute@dmi.com || Michael | Scott | mscott@dmi.com |+------------+---------+------------------+
The header
configuration argument can also be set to:
on
: This setting requires the first row to be a valid header row, else the plugin will fail to create the tables.off
: This setting always assumes the first row isn't the header row and uses positional column names for all tables.