BigQuery lets you query public datasets using SQL-like syntax. You can download a small result as CSV directly, but if the result is very large, you first have to save it to a table and export it through a Google Cloud Storage bucket.
The general steps are:
- Save as Table
- Export to Google Bucket
- Download from bucket
Go to your BigQuery interface and open a dataset, such as this dataset of GitHub archives:
Run your Query and Save as Table
First, make sure you’ve selected the right project (note red box on left).
Then, click the Save as Table button.
In the pop-up, enter the table to copy the result to: select the project (gharchiver-240019) and the dataset (gitarchive) under that project, and give the table a name (2015).
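If you would rather script this step than click through the interface, the following is a minimal sketch using the google-cloud-bigquery Python client. It assumes the library is installed and that you are already authenticated. The function names `table_id` and `save_query_to_table` are my own, and the query itself is whatever you ran above.

```python
def table_id(project: str, dataset: str, table: str) -> str:
    """Build the fully qualified `project.dataset.table` ID that BigQuery expects."""
    return f"{project}.{dataset}.{table}"


def save_query_to_table(sql: str, destination: str) -> None:
    """Run `sql` and write the result to `destination`, replacing any existing table."""
    # Imported lazily so table_id() works even without the client library.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=destination.split(".")[0])
    job_config = bigquery.QueryJobConfig(
        destination=destination,
        # Assumption: overwrite the table if it already exists.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.query(sql, job_config=job_config).result()  # blocks until the job is done


# Project, dataset, and table name taken from the example in this post.
DESTINATION = table_id("gharchiver-240019", "gitarchive", "2015")
```

Calling `save_query_to_table(your_sql, DESTINATION)` should leave you with the same table that the Save as Table button creates.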
Export to Google Bucket
I prefer GZIP compression to make the export smaller. In the Google Cloud Storage URI field, enter the bucket name (gharchiver in this case) and a filename. Notice that in this case, the filename contains the * wildcard character. BigQuery can export at most 1 GB to a single file, so if the table is larger than that, the export is split across several files. BigQuery replaces the * wildcard with a number, starting from 0, that indicates the file number.
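The export can also be sketched in Python with the google-cloud-bigquery client. The filename pattern `2015-*.csv.gz` below is a made-up example, not the one from my screenshot, and `shard_name` and `export_table_gzip` are hypothetical helper names. `shard_name` shows how the * is filled in: BigQuery substitutes a 12-digit, zero-padded counter.

```python
def shard_name(uri_pattern: str, index: int) -> str:
    """Return the URI BigQuery writes for file number `index` of a wildcard URI."""
    # The single * becomes a 12-digit, zero-padded counter starting at 0.
    return uri_pattern.replace("*", f"{index:012d}", 1)


def export_table_gzip(table: str, uri_pattern: str) -> None:
    """Export `table` to Cloud Storage as GZIP-compressed CSV files."""
    # Imported lazily so shard_name() works even without the client library.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=table.split(".")[0])
    job_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.CSV,
        compression=bigquery.Compression.GZIP,
    )
    client.extract_table(table, uri_pattern, job_config=job_config).result()


# Hypothetical filename pattern inside the gharchiver bucket.
URI_PATTERN = "gs://gharchiver/2015-*.csv.gz"
FIRST_FILE = shard_name(URI_PATTERN, 0)
```

With this pattern, the first exported file would be `gs://gharchiver/2015-000000000000.csv.gz`, the second would end in `000000000001`, and so on.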
Download from bucket
Finally, go to your Google Cloud Storage bucket and download your data.
(In this tutorial, I assumed you had a bucket already created, but if not, use this interface to create one.)
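If the export was split across many files, downloading them one by one gets tedious. Here is a rough sketch that downloads every file with a given prefix using the google-cloud-storage client and then merges the gzipped CSV files into one plain CSV. It assumes each exported file starts with a header row (BigQuery's default for CSV exports). The function names and the `2015-` prefix are illustrative, not something the interface gives you.

```python
import gzip
import shutil
from pathlib import Path
from typing import Iterable, List


def download_shards(bucket_name: str, prefix: str, dest_dir: Path) -> List[Path]:
    """Download every object in the bucket whose name starts with `prefix`."""
    # Imported lazily so the merge helper works without the client library.
    from google.cloud import storage  # pip install google-cloud-storage

    dest_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for blob in storage.Client().list_blobs(bucket_name, prefix=prefix):
        target = dest_dir / Path(blob.name).name
        blob.download_to_filename(str(target))
        paths.append(target)
    return paths


def merge_gzip_csv_shards(shards: Iterable[Path], output: Path) -> None:
    """Concatenate gzipped CSV files into one CSV, keeping only the first header."""
    with open(output, "w", encoding="utf-8", newline="") as out:
        # Sorting works because the file numbers are zero-padded.
        for i, shard in enumerate(sorted(shards)):
            with gzip.open(shard, "rt", encoding="utf-8", newline="") as src:
                header = src.readline()  # assumption: every file has a header row
                if i == 0:
                    out.write(header)
                shutil.copyfileobj(src, out)  # copy the remaining rows unchanged
```

For example, `merge_gzip_csv_shards(download_shards("gharchiver", "2015-", Path("shards")), Path("2015.csv"))` would fetch the files and write a single `2015.csv`.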