
Exporting large resultset from Google BigQuery

BigQuery allows you to query public datasets using SQL-like syntax. You can download your results directly as a CSV file, but if the resultset is very large, you have to jump through a few hoops to get it.

The general steps are:

  1. Save as Table
  2. Export to Google Bucket
  3. Download from bucket
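Each of these steps can also be scripted with Google Cloud's Python client libraries instead of clicking through the UI; sketches follow each step below. A minimal setup sketch, assuming the standard PyPI packages and the project ID used in this tutorial:

```python
# Install the client libraries first (from a shell):
#   pip install google-cloud-bigquery google-cloud-storage
# Authenticate, e.g. with application default credentials:
#   gcloud auth application-default login

from google.cloud import bigquery, storage

# Project ID from this tutorial; swap in your own.
bq_client = bigquery.Client(project="gharchiver-240019")
gcs_client = storage.Client(project="gharchiver-240019")
```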

Go to your BigQuery interface, such as this dataset of github archives:
https://bigquery.cloud.google.com/table/githubarchive:day.20150101

Run your Query and Save as Table
First, make sure you've selected the right project (note the red box on the left).
Then, click the Save as Table button.
[Screenshot: project selector and the Save as Table button]

In the pop-up, enter the table to copy to: select the project (gharchiver-240019) and the dataset (gitarchive) under that project, and give the table a name (2015).
[Screenshot: pop-up with project, dataset, and table name fields]
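If you'd rather script this step, a query job can write its results straight into a destination table. A sketch with the google-cloud-bigquery Python client, assuming a recent version of the library; the SQL query here is a placeholder for whatever you actually ran:

```python
from google.cloud import bigquery

client = bigquery.Client(project="gharchiver-240019")

# Write the query results into gharchiver-240019.gitarchive.2015
destination = bigquery.TableReference.from_string(
    "gharchiver-240019.gitarchive.2015"
)
job_config = bigquery.QueryJobConfig(
    destination=destination,
    # Overwrite the table if it already exists.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

sql = "SELECT * FROM `githubarchive.day.20150101`"  # your query here
client.query(sql, job_config=job_config).result()  # wait for completion
```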

Export to Google Bucket
[Screenshot: Export Table option]

I prefer to GZIP it to make it smaller. In the Google Cloud Storage URI field, enter the bucket name (gharchiver in this case) and a filename. Notice that the filename contains the * wildcard character. If the table is too large to export as a single file, BigQuery will split it across several files, and the * is replaced with a file number starting from 0.
[Screenshot: export dialog with compression and Google Cloud Storage URI]
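The export can be scripted too. A sketch using the same Python client; the filename pattern 2015-*.csv.gz is my own choice for this example, not something fixed by BigQuery:

```python
from google.cloud import bigquery

client = bigquery.Client(project="gharchiver-240019")

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV,
    compression=bigquery.Compression.GZIP,  # gzip each exported file
)

# The * is expanded to a file number (000000000000, 000000000001, ...)
# when the table is too large to fit in a single file.
extract_job = client.extract_table(
    "gharchiver-240019.gitarchive.2015",
    "gs://gharchiver/2015-*.csv.gz",
    job_config=job_config,
)
extract_job.result()  # wait for the export to finish
```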

Download from bucket
Finally, go to your Google Cloud Storage bucket and download your data.
(This tutorial assumes you already have a bucket created; if not, use this interface to create one.)
[Screenshot: bucket contents with the exported files]
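And to script the download as well, here's a sketch with the google-cloud-storage Python client; the bucket name and file prefix match the hypothetical pattern from the export step:

```python
from google.cloud import storage

client = storage.Client(project="gharchiver-240019")
bucket = client.bucket("gharchiver")

# Fetch every exported shard matching the prefix used during export.
for blob in bucket.list_blobs(prefix="2015-"):
    blob.download_to_filename(blob.name)
    print("downloaded", blob.name)
```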
