
Exporting a large result set from Google BigQuery

BigQuery lets you query public datasets using SQL-like syntax. You can download small results as CSV directly from the interface, but if your result set is very large, you have to jump through a few hoops to get it.

The general steps are:

  1. Save as Table
  2. Export to Google Bucket
  3. Download from bucket
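The three steps above can also be scripted instead of clicked through. Here is a minimal sketch that drives the same flow through the `bq` and `gsutil` command-line tools (part of the Google Cloud SDK) from Python. The project (gharchiver-240019), dataset (gitarchive), table (2015), and bucket (gharchiver) names mirror this post's examples, so substitute your own.

```python
# Sketch of the three export steps via the bq and gsutil CLIs.
# The names below are this post's examples, not real resources.
import shutil
import subprocess

PROJECT = "gharchiver-240019"
DATASET = "gitarchive"
TABLE = "2015"
BUCKET = "gharchiver"

def save_query_to_table(sql: str) -> list[str]:
    # Step 1: run the query and save the result set as a table.
    return [
        "bq", "query",
        f"--project_id={PROJECT}",
        f"--destination_table={DATASET}.{TABLE}",
        "--use_legacy_sql=false",
        sql,
    ]

def export_table_to_bucket() -> list[str]:
    # Step 2: extract the table to the bucket as gzipped CSV shards.
    return [
        "bq", "extract",
        "--compression=GZIP",
        f"{PROJECT}:{DATASET}.{TABLE}",
        f"gs://{BUCKET}/{TABLE}-*.csv.gz",
    ]

def download_from_bucket() -> list[str]:
    # Step 3: copy the exported shards down from the bucket.
    return ["gsutil", "cp", f"gs://{BUCKET}/{TABLE}-*.csv.gz", "."]

# Only attempt to run the commands if the CLI tools are installed.
if shutil.which("bq") and shutil.which("gsutil"):
    for cmd in (save_query_to_table("SELECT 1"),  # replace with your query
                export_table_to_bucket(),
                download_from_bucket()):
        subprocess.run(cmd, check=True)
```

If you prefer the web console, the sections below walk through the same three steps with screenshots.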

Go to the BigQuery web interface and open a dataset, such as this dataset of GitHub archives:

Run your Query and Save as Table
First, make sure you’ve selected the right project (note the red box on the left).
Then click the Save as Table button.

In the pop-up, enter the table to copy to: select the project (gharchiver-240019) and a dataset under it (gitarchive), then give the table a name (2015).

Export to Google Bucket

I prefer to GZIP the export to make it smaller. In the Google Cloud Storage URI, write the bucket name (gharchiver in this case) and a filename. Notice that the filename contains the * wildcard character: if the table is too large to fit in a single file, BigQuery splits the export across several files, and the * is replaced with a file number starting from 0.
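If you want to predict what the exported object names will look like: in my experience BigQuery replaces the * with a zero-padded 12-digit shard number. A small helper that mimics that naming (the URI is this post's example bucket):

```python
# Mimic how BigQuery expands the * wildcard in a destination URI into
# zero-padded shard numbers counting up from 0 (12 digits, in my experience).
def shard_names(uri_pattern: str, num_shards: int) -> list[str]:
    return [uri_pattern.replace("*", f"{i:012d}") for i in range(num_shards)]

print(shard_names("gs://gharchiver/2015-*.csv.gz", 2))
# ['gs://gharchiver/2015-000000000000.csv.gz',
#  'gs://gharchiver/2015-000000000001.csv.gz']
```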

Download from bucket
Finally, go to your Google Cloud Storage bucket and download your data.
(This tutorial assumes you already have a bucket; if not, use this interface to create one.)
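Once the shards are downloaded, you will often want them back as a single CSV. A minimal sketch, assuming the export produced gzipped CSV shards that each start with the same header row (which is what I have seen BigQuery write when headers are enabled):

```python
# Merge gzipped CSV shards into one plain CSV, keeping only the first header.
import csv
import glob
import gzip

def merge_shards(pattern: str, out_path: str) -> int:
    """Concatenate gzipped CSV shards matching `pattern`; return data rows written."""
    rows_written = 0
    header_done = False
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        # Sort so shard 000000000000 comes first, 000000000001 next, etc.
        for path in sorted(glob.glob(pattern)):
            with gzip.open(path, "rt", newline="") as f:
                reader = csv.reader(f)
                header = next(reader)
                if not header_done:
                    writer.writerow(header)  # write the header only once
                    header_done = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, `merge_shards("2015-*.csv.gz", "2015.csv")` would stitch all downloaded shards of the table from this post into one file. If you exported without headers, plain concatenation of the decompressed files works just as well.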