Setting up Tomcat for debugging

Short PSA. If you want to set up Tomcat for debugging, set up JPDA.

To do so, open the startup.sh file in the TOMCAT/bin folder.

Add these two lines near the top:

export JPDA_ADDRESS=7000
export JPDA_TRANSPORT=dt_socket

Then modify the exec line (usually the last line in the file) from this:

exec "$PRGDIR"/"$EXECUTABLE" start "$@"

to this:

exec "$PRGDIR"/"$EXECUTABLE" jpda start "$@"
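
Once Tomcat restarts with these changes, you can sanity-check that the JPDA port is actually listening before pointing your IDE at it. Here's a small illustrative Python check (the helper name is my own):

```python
import socket

def is_port_listening(host, port, timeout=2.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Prints True once Tomcat is up with JPDA enabled on port 7000.
print(is_port_listening("localhost", 7000))
```

Your debugger then attaches as a remote/attach configuration to localhost:7000.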

Upgrading to Jersey 2.x in Tomcat8

Jersey 1.x is still supported. As of this writing, version 1.19.3 was just released on Oct 24th, 2016.

But recently I discovered that Tomcat8 doesn’t play well with Jersey 1.x so we simply have to upgrade to Jersey 2.x. Easy right? (If it were, I wouldn’t be writing this post.)

Jersey 1.x
Let’s review the Jersey 1.x configurations first

As far as maven dependencies, here’s what I used in my pom.xml

<dependency>
  <groupId>com.cedarsoft.rest</groupId>
  <artifactId>jersey</artifactId>
  <version>1.0.0</version>
</dependency>

<dependency>
  <groupId>com.sun.jersey</groupId>
  <artifactId>jersey-json</artifactId>
  <version>1.5</version>
</dependency>

com.cedarsoft.rest:jersey is a bundle that pulls in the core Jersey 1.x dependencies.

Unfortunately, I could find no such bundle for Jersey 2.x, so I had to mix and match until trial and error led me to a workable solution. (I'll save you the trouble.)

Here’s my old web.xml

<web-app version="2.4"
 xmlns="http://java.sun.com/xml/ns/j2ee"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
  <servlet>
    <servlet-name>JerseyREST</servlet-name>
    <servlet-class>com.sun.jersey.spi.container.servlet.ServletContainer</servlet-class>
    <init-param>
      <param-name>com.sun.jersey.config.property.packages</param-name>
      <param-value>INSERT_MY_PACKAGES_AND_CLASSES_HERE</param-value>
    </init-param>
    <load-on-startup>2</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>JerseyREST</servlet-name>
    <url-pattern>/rest/*</url-pattern>
  </servlet-mapping>
</web-app>

Jersey 2.x
Now here are the changes for Jersey 2.

   <dependency>
     <groupId>javax.servlet</groupId>
     <artifactId>javax.servlet-api</artifactId>
     <version>3.1.0</version>
     <scope>provided</scope>
   </dependency>
   <dependency>
     <groupId>org.glassfish.jersey.containers</groupId>
     <artifactId>jersey-container-servlet-core</artifactId>
     <version>2.13</version>
   </dependency>
   <dependency>
     <groupId>org.glassfish.jersey.containers</groupId>
     <artifactId>jersey-container-servlet</artifactId>
     <version>2.13</version>
   </dependency>
   <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-databind</artifactId>
     <version>2.8.5</version>
   </dependency>
   <dependency>
     <groupId>org.glassfish.jersey.media</groupId>
     <artifactId>jersey-media-json-jackson</artifactId>
     <version>2.13</version>
   </dependency>
   <!-- jersey file upload dependencies -->
   <dependency>
     <groupId>org.glassfish.jersey.media</groupId>
     <artifactId>jersey-media-multipart</artifactId>
     <version>2.13</version>
   </dependency>

org.glassfish.jersey.media:jersey-media-multipart is only required for file upload capability.

Here’s the new web.xml

<web-app version="3.1"
         metadata-complete="false"
         xmlns="http://java.sun.com/xml/ns/j2ee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                             http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd">
  <servlet>
    <servlet-name>JerseyREST</servlet-name>
    <servlet-class>org.glassfish.jersey.servlet.ServletContainer</servlet-class>
    <init-param>
        <param-name>jersey.config.server.provider.classnames</param-name>
        <param-value>org.glassfish.jersey.media.multipart.MultiPartFeature</param-value>
    </init-param>        
    <init-param>
      <param-name>jersey.config.server.provider.packages</param-name>
      <param-value>
        INSERT_MY_PACKAGES_AND_CLASSES_HERE
      </param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
      <servlet-name>JerseyREST</servlet-name>
      <url-pattern>/rest/*</url-pattern>
  </servlet-mapping>
</web-app>

Notice the new servlet class.
It's also important to load org.glassfish.jersey.media.multipart.MultiPartFeature to support file uploads.

You might also have to change some of your code.
For example, all your @JsonIgnore imports will have to change from
org.codehaus.jackson.annotate.JsonIgnore
to
com.fasterxml.jackson.annotation.JsonIgnore


Quick Ubuntu 16 Setup with java8, mysql 5.7, tomcat7

sudo apt-get update

# get latest java, which is java8 at time of writing
sudo apt-get install default-jdk

# get latest mysql, which is mysql5.7 at time of writing
sudo apt-get install mysql-server

# get tomcat7
sudo apt-get install tomcat7

Meet Watson

I had the opportunity to attend the IBM Watson Conference in San Francisco a few weeks ago. It was an amazing event to showcase the new Watson AI Platform.

IBM Watson is a platform that encompasses many AI and ML (machine learning) services. The so-called Watson Services include APIs to help you understand natural language, convert speech-to-text and text-to-speech, build chatbots, recognize images, perform trade-off analytics and much more.

You can try out demos of some of the services to get a feel for how well they work. Just click on the Launch app link next to each Starter Kit.

I believe many of the AI technologies underneath the hood of Watson have been around for many years. I know I have used some of them via different open source toolkits.

However, IBM Watson introduces three game-changing characteristics:

  • Accessibility of knowledge and tools
  • Integration of services
  • Fundamental change in the business model

Accessibility
IBM designed their AI tools in such a way that they are accessible not only to non-AI experts, but even to non-developers. Their user interface makes tasks such as training models and identifying images easy to perform, without a PhD in computer science or even programming know-how.

Having a single tool can be useful but is still limiting. More interesting applications will arise from the integration of all your tools. How much can you do with a single screwdriver, right?

Integration
To that effect, Watson supplies you with a workbench of AI tools. The value of the platform is that it can seamlessly integrate and manage all services in one place. There’s a common language between the services so to speak.

Cost Model
Finally, this platform is offered as an on-demand, pay-as-you-go service. IBM has historically been known as an enterprise software vendor: expect to show up with a check with lots of zeros on it if you want to use something IBM-branded. But Watson follows an Amazon Web Services model where you only pay for the service calls and CPU time used on their machines. Instead of millions of dollars to get started, developers can start for free and small businesses can probably run on a few hundred dollars a month.

Final Thoughts
The knowledge and tools that were once locked in the hands of select companies and experts are now open to the world, and available at a fraction of the cost. Microsoft is moving towards this model and Google has already released free AI tools. I believe we are entering a new world, where the greatest value will come from those who can blend just the right AI/ML services into the most interesting applications.

As an AI developer, this scares me a bit, but personally, I welcome this new world. It will mean change on my part. I believe my value, and that of my company, will be to offer expert consultation on services like Watson's and others, to show you the possibilities as well as the boundaries of this new world.

Disclaimer: While I worked for IBM in a previous life, I am not affiliated with them in any way. I am part of a smaller, more agile AI company now that is agnostic to the tools we use. The opinions in this piece are my own, and do not necessarily reflect IBM or my current company’s views.

 


How to hash mysql varchar/string into different bins

First off, what am I talking about?
Here’s a scenario.

You have a bunch of rows in your table that are URLs. You want to select an evenly distributed random set of them every so often and check if they’re still alive. How do you do this?

You could just take a range.

Get the first 10

SELECT url FROM myTable
ORDER BY url
LIMIT 0, 10

Get the next 10

SELECT url FROM myTable
ORDER BY url
LIMIT 10, 10

And the next 10…

SELECT url FROM myTable
ORDER BY url
LIMIT 20, 10

But what if you wanted to evenly distribute your checks, because the URLs from the same site are sequentially ordered?

Well, you could ensure an auto-increment ID and just get chunks of them based on a mod.

So to get every 10th one

SELECT id, url FROM myTable
WHERE id % 10 = 0 

And the next 10th sequence

SELECT id, url FROM myTable
WHERE id % 10 = 1 

And the next 10th sequence

SELECT id, url FROM myTable
WHERE id % 10 = 2 

This is still not that well distributed, since you might run into large clusters of URLs from the same site (i.e., more than 10).
It also requires a nice integer column value.

Instead, you could just use a VARCHAR value, like the URL itself

So to get the first 10

SELECT id, ... FROM myTable
WHERE CAST(CONV(SUBSTRING(MD5(url), 1, 16), 16, 10) AS SIGNED INTEGER) % 10 = 0

To get the next 10

SELECT id, ... FROM myTable
WHERE CAST(CONV(SUBSTRING(MD5(url), 1, 16), 16, 10) AS SIGNED INTEGER) % 10 = 1

And so on…

Let’s go over what this does.

MD5() is a hash function that converts your VARCHAR into a seemingly random sequence of numbers and letters. It's not actually random, though: the same input always produces the same output, but the outputs are distributed much more uniformly than the inputs.

The SUBSTRING(…, 1, 16) takes the first 16 hex digits of the MD5 hash, which is its first 64 bits. Taking more than 16 digits risks a 64-bit overflow.

The CONV(…, 16, 10) function converts the hash (which is a hex or base-16 value) into a base-10 value.

The CAST(… AS SIGNED INTEGER) function converts it to a signed integer. (If you're going to read this value into Java, you want a signed integer; otherwise you'll get an overflow.)

Then simply mod (%) it by the number of bins you want. In my example, I modded it with 10.
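
To sanity-check how evenly this spreads your rows, you can replicate the hash outside MySQL. Here's a Python analogue (analogous, not guaranteed bit-identical to the SQL, since the signed CAST has murky edge cases for values above 2^63):

```python
import hashlib
from collections import Counter

def url_bin(url, num_bins=10):
    """Hash a string into one of num_bins buckets, deterministically.

    Analogous to the SQL above: take the first 16 hex digits (64 bits)
    of the MD5 and mod the integer value by the number of bins.
    """
    return int(hashlib.md5(url.encode("utf-8")).hexdigest()[:16], 16) % num_bins

# Same input always lands in the same bin; different inputs spread out.
urls = ["http://example.com/page/%d" % i for i in range(1000)]
counts = Counter(url_bin(u) for u in urls)
print(sorted(counts.items()))  # roughly 100 urls in each of bins 0..9
```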


Formatting DATE and DATETIME in Mysql

Did you ever inherit a table with a VARCHAR for one of the date fields?
Doesn’t seem that bad, except that it gives people license to start putting different date formats into it.

e.g.
2001-May-05 11:30
11-19-2009 23:33
Nov 4, 1998 8:03
3/18/08 3:50
8-15-1999 13:00

You should put these into a DATE or DATETIME column. And here’s how you would parse them

SELECT id, strDate,
  CASE WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%Y-%m-%d %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%Y-%m-%d %H:%i:%S")
       WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%Y-%M-%d %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%Y-%M-%d %H:%i:%S")
       WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%d-%M-%Y %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%d-%M-%Y %H:%i:%S")
  END AS newDate
FROM date_table
WHERE strDate IS NOT NULL

Add as many formats as you like and make sure you test!
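
The same try-each-format-in-order idea, sketched in Python, with a format list that mirrors the sample values above:

```python
from datetime import datetime

# Formats matching the sample values above; extend as needed.
FORMATS = [
    "%Y-%b-%d %H:%M",   # 2001-May-05 11:30
    "%m-%d-%Y %H:%M",   # 11-19-2009 23:33 (also 8-15-1999 13:00)
    "%b %d, %Y %H:%M",  # Nov 4, 1998 8:03
    "%m/%d/%y %H:%M",   # 3/18/08 3:50
]

def parse_messy_date(s):
    """Try each format in order, like the CASE/STR_TO_DATE cascade above."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass
    return None  # like the CASE falling through to NULL
```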

Also, if you wanted to update the date_table with this new DATETIME value, you can do this

UPDATE date_table
SET newDate = CASE
  WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%Y-%m-%d %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%Y-%m-%d %H:%i:%S")
  WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%Y-%M-%d %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%Y-%M-%d %H:%i:%S")
  WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%d-%M-%Y %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%d-%M-%Y %H:%i:%S")  
END
WHERE strDate IS NOT NULL
AND newDate IS NULL

One thing to note. If you’re going to CREATE, UPDATE or INSERT into a table with these values, there’s a chance you may run into the following error

 “Incorrect datetime value: ‘XXXX’ for function str_to_date”

It may be that your MySQL server is running in strict mode.

To check, run

select @@session.sql_mode

It might produce something like “STRICT_ALL_TABLES” or “STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION”

To set it to a less strict mode, run

set session sql_mode =''

Now your UPDATE, INSERT or CREATE should work.

Once it completes, you may want to set the sql_mode back to the previous value.


Dear Mom, Yours Truly, Program

Have you ever wanted to write an email to your mom… sent by your program? Of course! What respectable programmer hasn’t? Well, this tutorial will show you how to do it from Java.

import java.util.Properties;

import javax.mail.Message;
import javax.mail.PasswordAuthentication;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;


  private static final String SMTP_HOST = "smtp.office365.com";
  private static final String SMTP_PORT = "587";
  private static final boolean DEBUG = true;

  private static void sendEmail(String contentType, final String login, final String password,
                                String fromEmail, String replyToEmail, String[] a_to, String a_subject,
                                String a_contents) {
    try {
      Properties props = new Properties();
      props.put("mail.smtp.auth", "true");
      props.put("mail.smtp.starttls.enable", "true");
      props.put("mail.smtp.host", SMTP_HOST);
      props.put("mail.smtp.port", SMTP_PORT);
      Session session = Session.getInstance(props,
          new javax.mail.Authenticator() {
        protected PasswordAuthentication getPasswordAuthentication() {
          return new PasswordAuthentication(login, password);
        }
      });
      if( DEBUG ) session.setDebug(true);

      MimeMessage message = new MimeMessage(session);
      message.setFrom(new InternetAddress(fromEmail));
      for (String toTarget : a_to) {
        message.addRecipient(Message.RecipientType.TO, new InternetAddress(
            toTarget));
      }
      message.setReplyTo(new InternetAddress[] { new InternetAddress(replyToEmail) });
      message.setSubject(a_subject);
      message.setContent(a_contents, contentType);
      message.setHeader("Content-Transfer-Encoding", "7bit");

      Transport.send(message);
    } catch (Exception e) {
      throw new RuntimeException("Unable to send HTML mail", e);
    }
  }

I won’t explain the code. Just use it. You’re smart. You’ll figure it out.
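
(Okay, fine. If you do want the moving parts spelled out, here's the same message assembly in Python's standard email library. Construction only, no SMTP, and the helper name is my own.)

```python
from email.message import EmailMessage

def build_message(from_addr, reply_to, to_addrs, subject, html_body):
    """Assemble the same kind of message the Java code builds (no sending)."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = ", ".join(to_addrs)             # like the addRecipient loop
    msg["Reply-To"] = reply_to                  # like message.setReplyTo(...)
    msg["Subject"] = subject
    msg.set_content(html_body, subtype="html")  # like setContent(contents, contentType)
    return msg

msg = build_message("me@example.com", "replies@example.com",
                    ["mom@example.com"], "Hi Mom", "<p>Love, your program</p>")
```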

So let’s say we wanted to send out from a gmail account. Just change SMTP_HOST to “smtp.gmail.com” and we’re good right? No, not quite.

First of all, you need to create an App password as opposed to using your account login password.

Then you’ll test it on your local machine and exclaim “Yay! It works!” Then you push it to your Amazon server and go to Friday Happy Hour. Then at 4am on Saturday, an error wakes you up in the middle of the night: the email failed, like this

Caused by: javax.mail.MessagingException: [EOF]
        at com.sun.mail.smtp.SMTPTransport.issueCommand(SMTPTransport.java:1481)
        at com.sun.mail.smtp.SMTPTransport.helo(SMTPTransport.java:917)
        at com.sun.mail.smtp.SMTPTransport.protocolConnect(SMTPTransport.java:417)
        at javax.mail.Service.connect(Service.java:310)
        at javax.mail.Service.connect(Service.java:169)
        at javax.mail.Service.connect(Service.java:118)
        at javax.mail.Transport.send0(Transport.java:188)
        at javax.mail.Transport.send(Transport.java:118)

Why didn’t it work? It seems Google’s SMTP servers have blocked emails sent from AWS machines. Why? Because they hate you, and also probably to prevent spam machines from polluting our internet.

So what can you do to send from a Gmail account? You have to use Google’s own brand of code. It looks the same, but take a close look at the imports.

import java.util.Properties;

import com.google.code.javax.mail.Message;
import com.google.code.javax.mail.PasswordAuthentication;
import com.google.code.javax.mail.Session;
import com.google.code.javax.mail.Transport;
import com.google.code.javax.mail.internet.InternetAddress;
import com.google.code.javax.mail.internet.MimeMessage;

  private static void sendEmail(String contentType, final String username, final String password,
                                String fromAddr, String replyToEmail, String[] toAddr, String subj,
                                String txt) {
    
    Properties props = new Properties();
    props.put("mail.smtp.auth", "true");
    props.put("mail.smtp.starttls.enable", "true");
    props.put("mail.smtp.host", "smtp.gmail.com");
    props.put("mail.smtp.port", "587");
    Session session = Session.getInstance(props,
        new com.google.code.javax.mail.Authenticator() {
          protected PasswordAuthentication getPasswordAuthentication() {
            return new PasswordAuthentication(username, password);
          }
        });
    try {
      Message message = new MimeMessage(session);
      message.setFrom(new InternetAddress(username, fromAddr));
      for (String ta : toAddr) {
        message.addRecipients(Message.RecipientType.TO,
            InternetAddress.parse(ta));
      }
      message.setSubject(subj);
      message.setContent(txt, "text/html; charset=utf-8");
      Transport.send(message);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }  

Ok, now go write your mom an email.


Maven is a weirdo when it comes to resolving dependencies

The other day, I was experimenting with maven dependency conflicts. That is, if your project pulls in the same dependency with conflicting versions, maven will resolve/pick one for you depending on your rules and its heuristics.

For the record, I’m using maven 3.3

According to maven docs

“by default Maven resolves version conflicts with a nearest-wins strategy”

You’d think these heuristics are simple, but not really. Let’s look at some examples.

Let’s say you have a pom with two conflicting dependencies

<dependencies>
  <dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.4</version>
  </dependency>

  <dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.5</version>
  </dependency>
</dependencies>

You can run “mvn dependency:tree -Dverbose” to see which of the two commons-codec version it picks.

In this case, maven seems to prefer the last commons-codec in the list of dependencies. 

That makes some sense. Maybe developers have the habit of adding dependencies to the end of the list, so maven prefers the last one.

Let’s suppose we have a dependency, such as hadoop-common, that depends on commons-codec 1.4 and we have a commons-codec 1.5 dependency at the top-level. Which version would it prefer then?

<dependencies>
  <dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.5</version>
  </dependency>

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.4.0</version>
  </dependency>
</dependencies>

Maven prefers the top-level commons-codec 1.5 version here.

Even though the commons-codec 1.4 within hadoop-common comes later in the dependency list, maven prefers the top-level version because the developer chose it explicitly, while the one inside hadoop-common is implicit. So maven obeys explicit top-level dependencies.

Here’s where it gets a little weird. What happens if we have two dependencies that depend on different versions of commons-codec?

poi depends on commons-codec 1.5 and hadoop-common depends on commons-codec 1.4

<dependencies>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.8-beta5</version>
  </dependency>

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.4.0</version>
  </dependency>
</dependencies>

Maven will choose the FIRST version it sees; in this case, it prefers commons-codec 1.5, found in the earlier poi dependency.

This is a bit counter-intuitive. Remember that previously, maven preferred the LAST version of commons-codec when both were listed at the top level.

Let’s dive deeper. Does the depth at which commons-codec is found matter?

hadoop-client depends on hadoop-common which depends on commons-codec 1.4. And poi depends on commons-codec 1.5

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.4.0</version>
  </dependency>

  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.8-beta5</version>
  </dependency>
</dependencies>

Maven prefers poi’s commons-codec 1.5 since it is found at the 2nd level, whereas commons-codec 1.4 is found at the 3rd level of hadoop-client.

It seems that the closer to the top-level the dependency is, the more maven prefers it. This is probably consistent with the fact that maven picks explicit top-level dependencies over sub-dependencies at lower levels. You can try switching the order of hadoop-client and poi and you’ll see that the depth is more important than the dependency order here.
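To make the nearest-wins rule concrete, here's a toy resolver in Python that walks the dependency tree breadth-first and keeps the first version it sees per artifact. This is a sketch of the heuristic only; it ignores maven's special handling of duplicate declarations in the same pom, where the last one wins.

```python
from collections import deque

def resolve(dependency_tree):
    """Toy nearest-wins resolver.

    Breadth-first walk of the dependency tree; the first version seen
    for an artifact wins, so a shallower depth beats declaration order,
    and order only breaks ties at equal depth.

    Each node is (("group:artifact", "version"), [child_nodes]).
    """
    chosen = {}
    queue = deque(dependency_tree)
    while queue:
        (artifact, version), children = queue.popleft()
        chosen.setdefault(artifact, version)  # first sighting wins
        queue.extend(children)
    return chosen

# The hadoop-client + poi example above:
tree = [
    (("org.apache.hadoop:hadoop-client", "2.4.0"), [
        (("org.apache.hadoop:hadoop-common", "2.4.0"), [
            (("commons-codec:commons-codec", "1.4"), []),
        ]),
    ]),
    (("org.apache.poi:poi", "3.8-beta5"), [
        (("commons-codec:commons-codec", "1.5"), []),
    ]),
]
print(resolve(tree)["commons-codec:commons-codec"])  # → 1.5 (depth 2 beats depth 3)
```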

So do you think you have a good handle on how maven resolves dependencies?

 

Mysql tip: How to SELECT all the columns by name

Thanks to this stackoverflow answer, here’s how you can build a SELECT statement with all columns except one:

SET @sql = CONCAT('SELECT '
, (SELECT REPLACE(GROUP_CONCAT(COLUMN_NAME), '<columns_to_omit>,', '')
   FROM INFORMATION_SCHEMA.COLUMNS
   WHERE TABLE_NAME = '<table>'
   AND TABLE_SCHEMA = '<database>'), ' FROM <table>');

And thanks to this stackoverflow answer

Here’s how to print that statement


select @sql;
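
If you're assembling the statement from application code instead of inside MySQL, the same idea is a few lines of Python (an illustrative helper; names are my own):

```python
def select_all_except(table, columns, omit):
    """Build the same kind of statement the SQL above generates:
    SELECT every column in `columns` except those listed in `omit`."""
    kept = [c for c in columns if c not in set(omit)]
    return "SELECT %s FROM %s" % (", ".join(kept), table)

sql = select_all_except("myTable", ["id", "url", "big_blob"], ["big_blob"])
print(sql)  # → SELECT id, url FROM myTable
```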

S3’s Role in a Company

As a data company, we sometimes have some very odd ways of cataloguing data. In one instance, a 50GB file was referenced by a word doc, which said to look in someone’s old laptop under a cryptic folder structure. I dusted off and turned on the 5-year-old laptop. It made some uncomfortable churning noises but managed to let me in. I was lucky that this was my old laptop and I knew the login password and the directories well enough to know what the doc was referring to. Had I not been around, I don’t know how that file would ever have been found.

This is a problem.

Here’s the solution.

I decided to look into utilizing Amazon’s S3 service. We had already used it, mostly as a dropbox with console access. But I wanted to do more with it. I wanted to make it an integral part of our data management process.

My goal was to be able to programmatically access the data in the S3 cloud. I was building a vagrant VM so I needed it to be able to automatically pull the data and use it during its installation.

First, I had to create an IAM User with credentials. Simply log into AWS’s IAM Management Console. There, create a New User. I called mine vagrantVM. Immediately after the user creation, it will give you this user’s AWS Access Key and AWS Secret Access Key. WRITE THESE DOWN BECAUSE IT WILL NEVER SHOW YOU THE SECRET ACCESS KEY AGAIN AND YOU’LL NEED IT LATER.

Click on the user you just created and go to the Permissions tab. You need to Attach a Policy. You can either give this user the AmazonS3ReadOnlyAccess policy or AmazonS3FullAccess policy depending on your needs. I want my vagrant machines to only have read-only access so I opted to only give it the former policy.


Next we need to set up the Linux box with the Amazon CLI (command-line) tool and the right credentials.

The easiest way to install Amazon CLI is through pip.

  1. First, make sure you have pip, otherwise:
    “sudo apt-get install -y python-pip”
  2. Then install the CLI:
    “sudo pip install awscli”

Note: If the 2nd step above doesn’t work for you and you’re on Mac OSX El Capitan, then try this command instead:
“sudo pip install awscli --upgrade --ignore-installed six”
There’s an issue with El Capitan at the time of this writing.

Next, you’ll need to configure it.

A lot of articles encourage you to use the “aws configure” command, which works nicely. But this requires user interaction, which may not always be possible, especially if I want my vagrant machine to automatically set this up.

So the alternative is to create a “~/.aws” folder and under it add two files

  • “credentials” file
    [default]
    aws_access_key_id=YOUR_AWS_ACCESS_KEY_ID
    aws_secret_access_key=YOUR_AWS_SECRET_KEY
  • “config” file
    [default]
    region=us-west-2
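
Both files are plain INI, so if a provisioning script wants to verify they're well-formed before calling the CLI, Python's standard configparser reads them directly (an illustrative check, not part of the AWS tooling):

```python
import configparser

def read_aws_credentials(path):
    """Parse the [default] section of an ~/.aws/credentials-style INI file."""
    cp = configparser.ConfigParser()
    cp.read(path)
    # Returns e.g. {"aws_access_key_id": "...", "aws_secret_access_key": "..."}
    return dict(cp["default"])
```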

I placed these two files in my resources/awscli folder under my root vagrant folder. And this is my puppet script inside manifests/site.pp

$awsDir = '/home/vagrant/.aws'

file { $awsDir:
  ensure => 'directory',
}

file { "${awsDir}/credentials":
  require => File[$awsDir],
  source  => '/vagrant/resources/awscli/credentials',
}

file { "${awsDir}/config":
  require => File[$awsDir],
  source  => '/vagrant/resources/awscli/config',
}

You can now download (and upload) files to S3 assuming you have the right policies attached to the user.

If you prefer to upload via a console, you can do that through the AWS S3 Console.

You can also upload through the AWS CLI commands like so:

aws s3 cp [my_file] s3://[bucket]/[folder]/[filename]

You can now also download that same file using AWS CLI

aws s3 cp s3://[bucket]/[folder]/[filename] [my_file]

 

EDIT

I need to make one thing clear when doing this with Vagrant. Everything said above is right, but when I use puppet to set up my vagrant instance, for some reason it doesn’t recognize my .aws/credentials file. I don’t know enough about vagrant provisioning (perhaps the home directory isn’t set, or a different user performs the vagrant setup), but it can’t find ~/.aws/credentials or ~/.aws/config.

You get an error such as “Unable to locate credentials”

So the script that calls “aws s3 …” commands needs to be given the path to the AWS config. There’s an AWS_CONFIG_FILE environment variable that you can set, so in my .sh script file, I do this


export AWS_CONFIG_FILE=/home/vagrant/.aws/config
aws s3 cp s3://[bucket]/[folder]/[file] .

Note that I’m only pointing to the config file, which does not contain the access key id or secret access key according to my post above. There is no environment variable that points to the credentials file unfortunately. However, you can add these into your config file

[default]
aws_access_key_id=YOUR_AWS_ACCESS_KEY_ID
aws_secret_access_key=YOUR_AWS_SECRET_KEY
region=us-west-2

This will allow your vagrant script to point to your AWS credentials and invoke the AWS CLI command
