S3’s Role in a Company

As a data company, we sometimes have some very odd ways of cataloguing data. In one instance, a 50GB file was referenced by a Word doc, which said to look on someone's old laptop under a cryptic folder structure. I dusted off and turned on the five-year-old laptop. It made some uncomfortable churning noises but managed to let me in. I was lucky that this was my old laptop, so I knew the login password and the directories well enough to figure out what the doc was referring to. If I weren't here, I don't know how the file would have been found.

This is a problem.

Here’s the solution.

I decided to look into using Amazon's S3 service. We had already used it, mostly as a dropbox with console access, but I wanted to do more with it: make it an integral part of our data management process.

My goal was to access the data in S3 programmatically. I was building a Vagrant VM, so I needed it to pull the data automatically and use it during provisioning.

First, I had to create an IAM user with credentials. Simply log into AWS's IAM Management Console and create a new user; I called mine vagrantVM. Immediately after the user is created, you'll be shown the user's AWS Access Key ID and AWS Secret Access Key. WRITE THESE DOWN BECAUSE IT WILL NEVER SHOW YOU THE SECRET ACCESS KEY AGAIN AND YOU'LL NEED IT LATER.

Click on the user you just created and go to the Permissions tab. You need to attach a policy: either AmazonS3ReadOnlyAccess or AmazonS3FullAccess, depending on your needs. I want my Vagrant machines to have read-only access, so I opted for the former.
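
If you'd rather script these IAM steps than click through the console, the AWS CLI can do the same from a machine that already has admin credentials configured. A rough sketch (the user name matches mine above; adjust to taste):

# Create the user and generate its access key pair; note the
# SecretAccessKey in the output, since it can't be retrieved later
aws iam create-user --user-name vagrantVM
aws iam create-access-key --user-name vagrantVM

# Attach the read-only S3 policy (swap in AmazonS3FullAccess if you need writes)
aws iam attach-user-policy --user-name vagrantVM \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess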


Next, we need to set up the Linux box with the AWS CLI (command-line) tool and the right credentials.

The easiest way to install the AWS CLI is through pip.

  1. First, make sure you have pip, otherwise:
    sudo apt-get install -y python-pip
  2. Then install the CLI:
    sudo pip install awscli

Note: If the second step above doesn't work for you and you're on Mac OS X El Capitan, then try this command instead:
sudo pip install awscli --upgrade --ignore-installed six
There's an issue with El Capitan at the time of this writing.
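
To confirm the install worked, ask the CLI for its version; it should print something like aws-cli/1.x.x along with the Python version it's running under:

aws --version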

Next, you’ll need to configure it.

A lot of articles encourage you to use the "aws configure" command, which works nicely. But it requires user interaction, which may not always be possible, especially since I want my Vagrant machine to set this up automatically.

So the alternative is to create a ~/.aws folder and add two files under it (a shell sketch for creating both non-interactively follows the list):

  • “credentials” file
    [default]
    aws_access_key_id=YOUR_AWS_ACCESS_KEY_ID
    aws_secret_access_key=YOUR_AWS_SECRET_KEY
  • “config” file
    [default]
    region=us-west-2
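
For reference, here's the plain shell equivalent of creating those two files by hand (the key values are placeholders to be replaced with the ones you wrote down earlier):

# Create the folder and write both files non-interactively
mkdir -p ~/.aws

cat > ~/.aws/credentials <<'EOF'
[default]
aws_access_key_id=YOUR_AWS_ACCESS_KEY_ID
aws_secret_access_key=YOUR_AWS_SECRET_KEY
EOF

cat > ~/.aws/config <<'EOF'
[default]
region=us-west-2
EOF

# Keep the credentials readable only by their owner
chmod 600 ~/.aws/credentials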

I placed these two files in a resources/awscli folder under my root Vagrant folder, and this is the Puppet snippet in manifests/site.pp that copies them into place:

# Create /home/vagrant/.aws and copy in the credentials and config files
# from the synced /vagrant folder
$awsDir = '/home/vagrant/.aws'

file { $awsDir:
  ensure => 'directory',
}

# Double quotes are needed here so Puppet interpolates ${awsDir};
# in single quotes the path would be taken literally
file { "${awsDir}/credentials":
  require => File[$awsDir],
  source  => '/vagrant/resources/awscli/credentials',
}

file { "${awsDir}/config":
  require => File[$awsDir],
  source  => '/vagrant/resources/awscli/config',
}

You can now download (and upload) files from S3, assuming the right policies are attached to the user.

If you prefer to upload manually, you can do that through the AWS S3 Console.

You can also upload with the AWS CLI like so:

aws s3 cp [my_file] s3://[bucket]/[folder]/[filename]

You can then download that same file with the AWS CLI:

aws s3 cp s3://[bucket]/[folder]/[filename] [my_file]
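
If, like me, you want the VM to pull down a whole folder of data during provisioning rather than a single file, the recursive forms are handy (bucket and folder names are placeholders, as above):

# Mirror an entire S3 prefix into a local directory;
# sync only copies files that are new or have changed
aws s3 sync s3://[bucket]/[folder] /path/to/local/dir

# Or copy the prefix recursively, which copies everything regardless
aws s3 cp s3://[bucket]/[folder] /path/to/local/dir --recursive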

 

EDIT

I need to make one thing clear when doing this with Vagrant. Everything above is right, but when I use Puppet to set up my Vagrant instance, for some reason it doesn't recognize my .aws/credentials file. I don't know enough about Vagrant's provisioning internals, but either the home directory isn't set or a different user is running the setup, because it can't find ~/.aws/credentials or ~/.aws/config.

You get an error such as "Unable to locate credentials".
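
A quick way to see what the CLI is (or isn't) actually picking up is to run aws configure list, which prints the resolved access key, region, and where each value came from:

aws configure list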

So you need to give the script that calls the "aws s3 …" commands the path to the AWS config. There's an AWS_CONFIG_FILE environment variable you can set, so in my .sh script I do this:


export AWS_CONFIG_FILE=/home/vagrant/.aws/config
aws s3 cp s3://[bucket]/[folder]/[file] .

Note that I'm only pointing to the config file, which, per my setup above, does not contain the access key ID or secret access key. Unfortunately, I couldn't find an environment variable that points to the credentials file. However, you can add the keys to your config file:

[default]
aws_access_key_id=YOUR_AWS_ACCESS_KEY_ID
aws_secret_access_key=YOUR_AWS_SECRET_KEY
region=us-west-2

This allows your Vagrant script to find your AWS credentials and invoke the AWS CLI commands.
