
Executing SQL queries without databases on CSV files using q
Problem
Have you ever stared at a CSV file on your screen and wished it was a database executing SQL queries? We've often found ourselves wanting not to directly process CSVs but favoring to run SQL queries.
Solution
brew install q
Examples
Example 1
ID,name,yearofbirth,haircolor
1,Max Mustermann,1994,brown
2,Lars Agne,1983,blonde
3,Otto Normal,1995,brown
4,Lieschen Müller,1987,black
5,John Doe,1980,brown
$> q -H -d ";" "SELECT COUNT(ID) FROM persons.csv WHERE haircolor = 'brown'"
- Result -
3
Example 2
ID,email
1,max.mustermann@example.com
2,lars.agne@example.com
3,otto.normal@example.com
4,lieschen.mueller@example.com
5,john.doe@example.com
$> q -H -d ";" "SELECT persons.name, emails.email FROM persons.csv persons
JOIN emails.csv emails ON (persons.id = emails.id)
WHERE persons.haircolor = 'brown'"
- Result -
Max Mustermann max.mustermann@example.com
Otto Normal otto.normal@example.com
John Doe john.doe@example.com
Example 3
$> ps aux | q -H "SELECT COMMAND FROM - WHERE STARTED = '24Dez20'"
- Result -
/System/Library/CoreServices/Santa
...
Performance and Limitations
The current version of q is significantly faster in running SQL queries on CSV files than comparable Go-based tools, such as Textql and Octosql [3]. We would be interested in comparing q with xsv, which can index and process CSVs.
However, we noticed that q does not allow, for example, to execute FROM on a subquery. Another limitation is that q uses SQLite as its single SQL dialect. Further limitations can be found on q's website [1].

Further Aspects
- [1] http://harelba.github.io/q/
- [2] http://harelba.github.io/q/#installation
- [3] https://github.com/harelba/q/blob/master/test/BENCHMARK.md
- [4] https://github.com/BurntSushi/xsv
---
Authors: Fionn Fuchs & Maximilian Konzack / Working Student & Software Engineer / Office Leipzig
Download Toilet Paper #140: Executing SQL queries without databases on CSV files using q (pdf)
Want to write the next ToiletPaper? Apply at jambit!