Meet Watson

I had the opportunity to attend the IBM Watson Conference in San Francisco a few weeks ago. It was an amazing event to showcase the new Watson AI Platform.

IBM Watson is a platform that encompasses many AI and ML (machine learning) services. The so-called Watson Services include APIs to help you understand natural language, convert speech-to-text and text-to-speech, build chatbots, recognize images, perform trade-off analytics and much more.

You can try out demos of some of the services to get a feel for how well they work. Just click the Launch app link next to each Starter Kit.

I believe many of the AI technologies underneath the hood of Watson have been around for many years. I know I have used some of them via different open source toolkits.

However, IBM Watson introduces three game-changing characteristics:

  • Accessibility of knowledge and tools
  • Integration of services
  • Fundamental change in the business model

IBM designed their AI tools to be accessible not only to non-AI experts, but even to non-developers. Their user interface makes tasks such as training and identifying images easy to perform, without a PhD in computer science or even programming know-how.

Having a single tool can be useful but is still limiting. More interesting applications will rise from the integration of all your tools. How much can you do with a single screwdriver, right?

To that effect, Watson supplies you with a workbench of AI tools. The value of the platform is that it can seamlessly integrate and manage all services in one place. There’s a common language between the services so to speak.

Cost Model
Finally, this platform is offered as an on-demand, pay-as-you-go service. IBM has historically been known as an enterprise software vendor: expect to come with a check with lots of zeros on it if you want to use something IBM-branded. But Watson follows an Amazon Web Services model where you only pay for the service calls and CPU time used on their machines. Instead of millions of dollars to get started, developers can start for free, and small businesses can probably run on a few hundred dollars a month.

Final Thoughts
The knowledge and tools that were once locked in the hands of select companies and experts are now open to the world, and they're available at a fraction of the cost. Microsoft is moving towards this model and Google has already released free AI tools. I believe we are entering a new world, where the greatest value will come from those who can blend just the right AI/ML services into the most interesting applications.

As an AI developer, this scares me a bit, but personally, I welcome this new world. It will mean change on my part. I believe my value, and that of my company, will be to offer expert consultation on services like Watson's and others, to show you the possibilities as well as the boundaries of this new world.

Disclaimer: While I worked for IBM in a previous life, I am not affiliated with them in any way. I am part of a smaller, more agile AI company now that is agnostic to the tools we use. The opinions in this piece are my own, and do not necessarily reflect IBM or my current company’s views.



How to hash mysql varchar/string into different bins

First off, what am I talking about?
Here’s a scenario.

You have a bunch of rows in your table that are URLs. You want to select an evenly distributed random set of them every so often and check if they’re still alive. How do you do this?

You could just take a range.

Get the first 10

SELECT url FROM myTable
LIMIT 0, 10

Get the next 10

SELECT url FROM myTable
LIMIT 10, 10

And the next 10…

SELECT url FROM myTable
LIMIT 20, 10

But what if you wanted to evenly distribute your checks because the URLs from the same site are sequentially ordered?

Well, you could add an auto-increment ID column and just grab chunks of rows based on a mod.

So to get every 10th one

SELECT id, url FROM myTable
WHERE id % 10 = 0 

And the next 10th sequence

SELECT id, url FROM myTable
WHERE id % 10 = 1 

And the next 10th sequence

SELECT id, url FROM myTable
WHERE id % 10 = 2 

This is still not that well distributed, since you might run into large clusters of site URLs (i.e., more than 10).
It also requires a nice integer column value.

Instead, you could use the VARCHAR value itself, like the URL.

So to get the first 10

SELECT id, ... FROM myTable
WHERE CAST(CONV(SUBSTRING(MD5(url), 1, 16), 16, 10) AS SIGNED INTEGER) % 10 = 0

To get the next 10

SELECT id, ... FROM myTable
WHERE CAST(CONV(SUBSTRING(MD5(url), 1, 16), 16, 10) AS SIGNED INTEGER) % 10 = 1

And so on…

Let’s go over what this does.

MD5() is a hash function that converts your VARCHAR into a seemingly random sequence of numbers and letters. It's not actually random though: the same VARCHAR always converts to the same sequence, but the hash spreads the values out much more uniformly.

The SUBSTRING(…, 1, 16) takes the first 16 hex digits of the MD5 value, which is the first 64 bits of the hash; taking more than 16 digits could overflow a 64-bit integer.

The CONV(…, 16, 10) function converts the hash (which is a hex or base-16 value) into a base-10 value.

The CAST(… AS SIGNED INTEGER) function converts it to a signed integer. (If you’re going to read this value into java, you want a signed integer otherwise you’ll get an overflow)

Then simply mod (%) it by the number of bins you want. In my example, I modded it with 10.


Formatting DATE and DATETIME in Mysql

Did you ever inherit a table with a VARCHAR for one of the date fields?
It doesn't seem that bad, except that it gives people license to start putting different date formats into it:

2001-May-05 11:30
11-19-2009 23:33
Nov 4, 1998 8:03
3/18/08 3:50
8-15-1999 13:00

You should put these into a DATE or DATETIME column. Here's how you would parse them:

SELECT id, strDate,
  CASE WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%Y-%m-%d %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%Y-%m-%d %H:%i:%S")
       WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%Y-%M-%d %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%Y-%M-%d %H:%i:%S")
       WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%d-%M-%Y %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%d-%M-%Y %H:%i:%S")
  END AS newDate
FROM date_table

Add as many formats as you like and make sure you test!
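One quick way to sanity-check format strings before writing the SQL is a throwaway script. Here's a Python sketch of the same try-each-format-in-order idea (note Python's strptime directives differ from MySQL's: %M is minutes in Python but month name in MySQL; the formats below are illustrative):

```python
from datetime import datetime

# Rough Python equivalents of the MySQL formats; %b = abbreviated month name
formats = ["%Y-%b-%d %H:%M", "%m-%d-%Y %H:%M", "%b %d, %Y %H:%M", "%m/%d/%y %H:%M"]

def parse_any(s, fmts):
    for fmt in fmts:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass  # try the next format, like the next WHEN in the CASE
    return None

print(parse_any("2001-May-05 11:30", formats))  # 2001-05-05 11:30:00
print(parse_any("nonsense", formats))           # None
```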

Also, if you wanted to update the date_table with this new DATETIME value, you can do this

UPDATE date_table
SET newDate = CASE
  WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%Y-%m-%d %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%Y-%m-%d %H:%i:%S")
  WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%Y-%M-%d %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%Y-%M-%d %H:%i:%S")
  WHEN LENGTH(DATE(STR_TO_DATE(strDate,"%d-%M-%Y %H:%i:%S"))) IS NOT NULL THEN STR_TO_DATE(strDate,"%d-%M-%Y %H:%i:%S")
END

One thing to note: if you're going to CREATE, UPDATE or INSERT into a table with these values, there's a chance you may run into the following error

 “Incorrect datetime value: ‘XXXX’ for function str_to_date”

It may be that your MySql server is running in strict mode.

To check, run

select @@session.sql_mode


To set it to a less strict mode, run

set session sql_mode =''

Now your UPDATE, INSERT or CREATE should work.

Once it completes, you may want to set the sql_mode back to the previous value.


Dear Mom, Yours Truly, Program

Have you ever wanted to write an email to your mom… sent by your program? Of course! What respectable programmer hasn’t. Well, this tutorial will show you how to do it from Java.

import java.util.Properties;

import javax.mail.Message;
import javax.mail.PasswordAuthentication;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

  private static String SMTP_HOST = "";
  private static String SMTP_PORT = "587";
  private static boolean DEBUG = true;

  private static void sendEmail(String contentType, final String login, final String password,
                                String fromEmail, String replyToEmail, String[] a_to, String a_subject,
                                String a_contents) {
    try {
      Properties props = new Properties();
      props.put("mail.smtp.auth", "true");
      props.put("mail.smtp.starttls.enable", "true");
      props.put("mail.smtp.host", SMTP_HOST);
      props.put("mail.smtp.port", SMTP_PORT);
      Session session = Session.getInstance(props,
          new javax.mail.Authenticator() {
            protected PasswordAuthentication getPasswordAuthentication() {
              return new PasswordAuthentication(login, password);
            }
          });
      if (DEBUG) session.setDebug(true);

      MimeMessage message = new MimeMessage(session);
      message.setFrom(new InternetAddress(fromEmail));
      for (String toTarget : a_to) {
        message.addRecipient(Message.RecipientType.TO, new InternetAddress(toTarget));
      }
      message.setReplyTo(new InternetAddress[] { new InternetAddress(replyToEmail) });
      message.setSubject(a_subject);
      message.setContent(a_contents, contentType);
      message.setHeader("Content-Transfer-Encoding", "7bit");

      Transport.send(message);
    } catch (Exception e) {
      throw new RuntimeException("Unable to send HTML mail", e);
    }
  }

I won’t explain the code. Just use it. You’re smart. You’ll figure it out.

So let's say we wanted to send out from a gmail account. Just change SMTP_HOST to "" and we're good, right? No, not quite.

First of all, you need to create an App password as opposed to using your account login password.

Then you'll test it on your local machine and exclaim "Yay! It works!". Then you push it to your Amazon server and go to Friday Happy Hour. Then at 4am on Saturday, an error will wake you up, saying the email failed like this

Caused by: javax.mail.MessagingException: [EOF]
        at com.sun.mail.smtp.SMTPTransport.issueCommand(
        at com.sun.mail.smtp.SMTPTransport.helo(
        at com.sun.mail.smtp.SMTPTransport.protocolConnect(
        at javax.mail.Service.connect(
        at javax.mail.Service.connect(
        at javax.mail.Service.connect(
        at javax.mail.Transport.send0(
        at javax.mail.Transport.send(

Why didn't it work? It seems Google's SMTP servers have blocked emails sent from AWS machines. Why? Because they hate you, and also probably to prevent spam machines from polluting our internet.

So what can you do to send from a Gmail account? You have to use Google’s own brand of code. It looks the same, but take a close look at the imports.


  private static void sendEmail(String contentType, final String username, final String password,
                                String fromAddr, String replyToEmail, String[] toAddr, String subj,
                                String txt) {
    Properties props = new Properties();
    props.put("mail.smtp.auth", "true");
    props.put("mail.smtp.starttls.enable", "true");
    props.put("mail.smtp.host", "");
    props.put("mail.smtp.port", "587");
    Session session = Session.getInstance(props,
        new javax.mail.Authenticator() {
          protected PasswordAuthentication getPasswordAuthentication() {
            return new PasswordAuthentication(username, password);
          }
        });
    try {
      Message message = new MimeMessage(session);
      message.setFrom(new InternetAddress(username, fromAddr));
      for (String ta : toAddr) {
        message.addRecipient(Message.RecipientType.TO, new InternetAddress(ta));
      }
      message.setReplyTo(new InternetAddress[] { new InternetAddress(replyToEmail) });
      message.setSubject(subj);
      message.setContent(txt, "text/html; charset=utf-8");
      Transport.send(message);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

Ok, now go write your mom an email.


Maven is a weirdo when it comes to resolving dependencies

The other day, I was experimenting with maven dependency conflicts. That is, if your project pulls in a dependency with conflicting versions, maven will resolve/pick one for you depending on your rules and its heuristics.

For the record, I’m using maven 3.3

According to maven docs

“by default Maven resolves version conflicts with a nearest-wins strategy”

You’d think these heuristics are simple, but not really. Let’s look at some examples.

Let’s say you have a pom with two conflicting dependencies
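The pom might look something like this (commons-codec 1.4 and 1.5 here are just illustrative versions for the conflict):

```xml
<dependencies>
  <dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.4</version>
  </dependency>
  <dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.5</version>
  </dependency>
</dependencies>
```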



You can run "mvn dependency:tree -Dverbose" to see which of the two commons-codec versions it picks.

In this case, maven seems to prefer the last commons-codec in the list of dependencies. 

That makes some sense. Maybe developers have the habit of adding dependencies to the end of the list, so maven prefers the last one.

Let’s suppose we have a dependency, such as hadoop-common, that depends on commons-codec 1.4 and we have a commons-codec 1.5 dependency at the top-level. Which version would it prefer then?
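A sketch of that pom (hadoop-common 2.7.1 is my guess at a version that pulls commons-codec 1.4 transitively):

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.1</version> <!-- pulls in commons-codec 1.4 transitively -->
  </dependency>
  <dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.5</version>
  </dependency>
</dependencies>
```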



Maven prefers the top-level commons-codec 1.5 version here.

Even though the commons-codec 1.4 within hadoop-common comes later in the dependency list, maven prefers the top-level one that the developer explicitly chose. This makes sense: the top-level dependency is explicit, while the one within hadoop-common is implicit. So maven seems to obey explicit top-level dependencies.

Here’s where it gets a little weird. What happens if we have two dependencies that depend on different versions of commons-codec?

poi depends on commons-codec 1.5 and hadoop-common depends on commons-codec 1.4
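Something like this (poi 3.9 and hadoop-common 2.7.1 are illustrative versions that, to my knowledge, pull commons-codec 1.5 and 1.4 respectively):

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.9</version> <!-- pulls commons-codec 1.5 -->
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.1</version> <!-- pulls commons-codec 1.4 -->
  </dependency>
</dependencies>
```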



Maven will choose the FIRST version it sees; in this case, it prefers the commons-codec 1.5 found in the earlier poi dependency.

This is a bit counter-intuitive. Remember that previously, maven prefers the LAST version of commons-codec when both were listed in the top-level.

Let’s dive deeper. Does the depth at which commons-codec is found matter?

hadoop-client depends on hadoop-common which depends on commons-codec 1.4. And poi depends on commons-codec 1.5
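The pom for this scenario, sketched with the same illustrative versions as before:

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version> <!-- hadoop-common -> commons-codec 1.4, two levels down -->
  </dependency>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.9</version> <!-- commons-codec 1.5, one level down -->
  </dependency>
</dependencies>
```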



Maven prefers poi's commons-codec 1.5 since it is found at the 2nd level, whereas commons-codec 1.4 is found at the 3rd level of hadoop-client.

It seems that the closer to the top-level the dependency is, the more maven prefers it. This is probably consistent with the fact that maven picks explicit top-level dependencies over sub-dependencies at lower levels. You can try switching the order of hadoop-client and poi and you’ll see that the depth is more important than the dependency order here.

So do you think you have a good handle on how maven resolves dependencies?


Mysql tip: How to SELECT all the columns by name

Thanks to this stackoverflow answer, here's how you can build a SELECT of all the columns (except one):

SET @sql = CONCAT('SELECT '
, (SELECT REPLACE(GROUP_CONCAT(COLUMN_NAME), '<columns_to_omit>,', '')
   FROM INFORMATION_SCHEMA.COLUMNS
   WHERE TABLE_NAME = '<table>'
   AND TABLE_SCHEMA = '<database>'), ' FROM <table>');

And thanks to this stackoverflow answer, here's how to print that statement

select @sql;
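And if you also want to execute the generated statement (not just print it), MySQL's server-side prepared statements can run it:

```sql
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```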

S3’s Role in a Company

As a data company, we sometimes have some very odd ways of cataloguing data. In one instance, a 50GB file was referenced by a word doc, which said to look in someone's old laptop under a cryptic folder structure. I dusted off and turned on the 5-year-old laptop. It made some uncomfortable churning noises but managed to let me in. I was lucky that this was my old laptop, so I knew the login password and the directories well enough to find what it was referring to. If I hadn't been around, I don't know how it would have been found.

This is a problem.

Here’s the solution.

I decided to look into utilizing Amazon’s S3 service. We had already used it, mostly as a dropbox with console access. But I wanted to do more with it. I wanted to make it an integral part of our data management process.

My goal was to be able to programmatically access the data in the S3 cloud. I was building a vagrant VM so I needed it to be able to automatically pull the data and use it during its installation.

First, I had to create an IAM User with credentials. Simply log into AWS's IAM Management Console and Create a New User. I called mine vagrantVM. Immediately after the user creation, it will give you this user's AWS Access Key and AWS Secret Access Key. WRITE THESE DOWN BECAUSE IT WILL NEVER SHOW YOU THE SECRET ACCESS KEY AGAIN AND YOU'LL NEED IT LATER.

Click on the user you just created and go to the Permissions tab. You need to Attach a Policy. You can either give this user the AmazonS3ReadOnlyAccess policy or AmazonS3FullAccess policy depending on your needs. I want my vagrant machines to only have read-only access so I opted to only give it the former policy.


Next we need to set up the Linux box with the Amazon CLI (command-line) tool and the right credentials.

The easiest way to install Amazon CLI is through pip.

  1. First, make sure you have pip, otherwise:
    “sudo apt-get install -y python-pip”
  2. Then install the CLI:
    “sudo pip install awscli”

Note: If the 2nd step above doesn't work for you and you're on Mac OS X El Capitan, then try this command instead:
"sudo pip install awscli --upgrade --ignore-installed six"
There's an issue with El Capitan at the time of this writing.

Next, you’ll need to configure it.

A lot of articles encourage you to use the “aws configure” command, which works nicely. But this requires user interaction, which may not always be possible, especially if I want my vagrant machine to automatically set this up.

So the alternative is to create a “~/.aws” folder and under it add two files

  • “credentials” file
  • “config” file
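For reference, the two files follow the AWS CLI's ini-style layout. Roughly like this (the key values are placeholders, and the region/output values are just examples):

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

# ~/.aws/config
[default]
region = us-east-1
output = json
```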

I placed these two files in my resources/awscli folder under my root vagrant folder. And this is my puppet script inside manifests/site.pp

$awsDir = '/home/vagrant/.aws'

file { $awsDir:
  ensure => 'directory',
}

file { "${awsDir}/credentials":
  require => File[$awsDir],
  source  => '/vagrant/resources/awscli/credentials',
}

file { "${awsDir}/config":
  require => File[$awsDir],
  source  => '/vagrant/resources/awscli/config',
}

You can now download (and upload) files to S3 assuming you have the right policies attached to the user.

If you prefer to upload via a console, you can do that through the AWS S3 Console.

You can also upload through the AWS CLI commands like so:

aws s3 cp [my_file] s3://[bucket]/[folder]/[filename]

You can now also download that same file using AWS CLI

aws s3 cp s3://[bucket]/[folder]/[filename] [my_file]



I need to make one thing clear when doing this with Vagrant. Everything said above is right, but when I use puppet to set up my vagrant instance, for some reason, it doesn't recognize my .aws/credentials file. I don't know enough about vagrant initialization, but either the home directory isn't set or a different user is doing the vagrant setup, because it can't find ~/.aws/credentials or ~/.aws/config.

You get an error such as “Unable to locate credentials”

So you need to give the script that calls "aws s3 …" commands the path to the aws config. There's an AWS_CONFIG_FILE environment variable that you can set, so in my .sh script file, I do this

export AWS_CONFIG_FILE=/home/vagrant/.aws/config
aws s3 cp s3://[bucket]/[folder]/[file] .

Note that I'm only pointing to the config file, which does not contain the access key id or secret access key, as described above. Unfortunately, there is no environment variable that points to the credentials file. However, you can add the keys into your config file.
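So a config file that carries the credentials too might look like this (placeholder values):

```ini
[default]
region = us-east-1
output = json
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```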


This will allow your vagrant script to point to your AWS credentials and invoke the AWS CLI command


Writing Synchronous Code in Node

First of all, you should be writing asynchronous code in node since that is the philosophy behind it.

But I’ve been experimenting with using node to write non-webapp, throwaway code and sometimes it helps to write synchronous code. This is how you’d do it.

There's a library for this called synchronize.js.

Install it in your development environment

npm install synchronize

Then use it. Here’s a sample

var db = require('./db.js');
var sync = require('synchronize');

sync(db.connection, 'query');

function getDBValues() {
  var sql =
      'SELECT value'
    + ' FROM table';
  var values = [];
  var rows = db.connection.query(sql);
  for (var i = 0; i < rows.length; i++) {
    values[i] = rows[i].value;
  }
  return values;
}

sync.fiber(function() {
  var values = getDBValues();
});


I’ll point out the necessary pieces.

First is the import of synchronize

var sync = require('synchronize');

Then declare the object and the functions within that object which you want to be synchronous. Here, I declare my db.connection object’s query() method to be synchronous

sync(db.connection, 'query');

Wrap your code in a sync.fiber() method.

sync.fiber(function() { ...} )

Now you can make synchronous calls like this

  var rows = db.connection.query(sql);

instead of asynchronous code like

  db.connection.query(sql, function(err, rows) { ... });

A Docker Development Environment

In a development environment, you want two things. You want access to your shiny development tools when coding. And you want the full suite of production services at your disposal to test your code in.

In the Getting Started with Docker guide, we went over how to set up a Docker container. You could put your code inside that container, but remember that I warned containers are ephemeral, meaning you could lose your changes inside the container. Second, you’d have to find a way to use your fancy dev tools on the code inside the container and that’s not simple.

It would be nice if you could develop your code on your dev machine and then automatically have changes reflected inside the container so it’s available for testing.

Well there’s a way.

Docker allows you to “mount” a volume from your host machine’s drive to your container’s drive.

> docker run -d -P -v /Users/myaccount/mywebserver:/var/www/public_html myname/myimage:1.0
  • docker run this creates a container from an image
  • -d runs it in the background
  • -P exposes all ports
  • -v to indicate a volume mount
  • /Users/myaccount/mywebserver source folder on host to mount
  • /var/www/public_html destination folder on container to mount to
  • myname/myimage:1.0 name of image to instantiate

If you’re not on a Linux machine, there’s something you should be cautious of. According to Docker:

If you are using Docker Machine on Mac or Windows, your Docker daemon has only limited access to your OS X or Windows filesystem. Docker Machine tries to auto-share your /Users (OS X) or C:\Users (Windows) directory.

Notice in my example above that I’m mounting a folder within my home folder /Users/myaccount.


Getting Started with Docker (using the docker-machine)

Docker is one of the newest kids on the block. A while ago, I posted about how you can get started with Vagrant to create isolated development environments. We're going to do the same with Docker.

What’s the difference? To sum it up, Vagrant manages virtual machines and Docker manages application environments. Check out this Stackoverflow discussion with the authors of both systems chiming in.

Here's an outline of the commands we'll cover:
  1. docker-machine create (optional) to create host machine
  2. docker build to build docker image
  3. docker run to instantiate a container from an image
  4. docker exec -i … bash to get into container
  5. docker commit (optional) to convert container to image
  6. docker push to upload image to Docker Hub
  7. docker pull to download an image
  8. docker save/load to save a committed image to a tar file
  9. docker stop to stop a container
  10. docker start to start a container

Ok so if you’re convinced you need to run Docker or you just want to add Docker to your skillset, let’s get started…

First get Docker. Go to docker’s website and download it. (I can’t link you directly to the download since I don’t know if you’re running Mac, Windows or Linux.)

It’s worth noting I have Mac OS X 10.10 and I ended up installing Docker 1.9.1 at the time of this writing. Your experience may be different but hopefully not by much.

Quick architecture lesson. Docker works off LinuX Containers (LXC). Mac and Windows are obviously not Linux, so they work slightly differently. On my Mac, Docker requires a virtual machine, specifically VirtualBox, to run a Linux OS so Docker can work.

Here’s my understanding of the Docker architecture in my own image


At the foundation is the machine, or more precisely a Linux machine with a kernel that supports LXC’s. Docker lets you “build” an Image which contains all the layers of changes you made. A Container is an instance of an image. It’s the thing that’s runnable. A good description at Stackoverflow.

You get two tools after the installation

  • Docker Quickstart Terminal (docker’s command-line or CLI): gives you a terminal initialized with the “default” host virtual machine settings. (I’ll comment more about this below.)
  • Kitematic: a GUI to manage the running containers

Let’s talk about the Machine


Besides the two tools, you also get a VirtualBox VM (if you're on a Mac or Windows). This is your docker-machine. You can see all the docker machines on your system by running this command in the Docker Quickstart Terminal CLI

> docker-machine ls

You may get a listing that looks similar to this

NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   ERRORS

default   *        virtualbox   Running   tcp://           

You likely have one machine called "default", and it's the active one. I believe what makes it active depends on the environment variables in your terminal. To see what environment variables are set for the "default" machine, type

> docker-machine env default

To switch to an active machine, you simply set the environment variables for that machine. You can use the linux command eval to run the output of the last command like so

> eval "$(docker-machine env default)"

You can remove and create machines using the docker-machine command. See all available commands by typing

> docker-machine help

Why would you need a different or new machine? I'm not really sure, but I had to remove my "default" one and recreate it to add some DNS settings, like so

> docker-machine create -d virtualbox --engine-opt dns= default

For some reason, my docker containers didn’t have access to the internet due to bad DNS settings and this has to be set at the machine level.

There may be a less destructive way of altering the DNS according to this thread.

Now that you’re familiar with the machines, let’s move on.

Let’s talk about the Image

Open the Docker Quickstart Terminal CLI and run

> docker images

You probably don’t have any images yet. Let’s get an image from Docker Hub.

> docker pull linode/lamp

This command fetches a docker image called linode/lamp which contains a LAMP stack.

If you run “docker images” again, you should now see the fetched image.

At this time, you should get acquainted with Docker Hub and create an account on it (remember your username). Docker Hub is a place where you and others can push docker images to share with the rest of the world. This could be a pre-built LAMP stack image or just a simple database image. As a user of docker, you may be able to get away with only using these pre-built images… but we're going to get our hands dirty and create our own docker image.

Create an empty directory, and in it, create a file called Dockerfile. Here’s a sample

# inherit from the base image 'linode/lamp'
FROM linode/lamp

# Just a note about who wrote this image
MAINTAINER myname

# Since it's a LAMP image, this exposes the apache2 and mysql ports
EXPOSE 80 3306

Now in the same directory where the Dockerfile is, run

> docker build -t myname/myimage .
  • docker build: command to build an image
  • -t: flag to assign a repo and tag to the image
  • myname: should be the account name you just created on Docker Hub
  • myimage: is a name you want to assign to this image
  • “.”: to indicate the image should be built from the current directory where the Dockerfile is located

Check your list of images again with the “docker images” command and you should see your newly built image.

Now what?

Now let’s talk about containers


Remember that a container is an instance of an image. So now that we’ve got an image, we can instantiate a container with the “docker run” command like so

> docker run -d -P myname/myimage
  • docker run: command to create a container from an image
  • -d runs the container in the background
  • -P publishes all exposed ports (ie port 80 and 3306) to arbitrary ports available on the host
  • myname/myimage is the image name that we’re instantiating

You should be able to see your running container by typing

> docker ps

You can also see your containers if you open Kitematic.

Inside Kitematic, look under Settings followed by Port.

There you’ll see the port mappings of the two exposed ports.

You should be able to point your mysql client and browser to them. (Since the image was based on linode/lamp, consult its documentation for mysql credentials)

You might ask, how do I get into the container to view the directory, edit the config files or execute scripts?

From Kitematic, you can enter the container via the Exec button


This will open a shell inside the container. This shell lacks tab-completion, history and other basic shell niceties, but it gets you inside.

There’s another way in though. Open Docker CLI and type

> docker exec -it container_name bash
  • docker exec executes a command inside the container
  • -it runs the command in interactive mode with a terminal (TTY) attached
  • container_name is arbitrary name given to your container when you created it
  • bash is the command you’re running which in this case is the bash shell

In summary, this (exec)utes an (-i)nteractive (bash) shell into your (container_name) container

This approach is superior since you get history and tab-complete in a familiar interactive bash shell.

Inside the container, go ahead and create a file or edit the filesystem in some way. Now, you can take my word for it or try it yourself, but the data inside a container is essentially ephemeral. If you were to remove the container or make some other change to its lifecycle, everything would be wiped out. More importantly, you couldn't transfer this container and its contents to anyone else unless you performed one additional step.

Commit your Docker container as an image

We’re going to convert our container into an image which can be persisted and transferred.

First we need the container ID (not the name but the nonsensical alphanumeric ID) from this command

> docker ps

Next commit your container

> docker commit CONTAINER_ID myname/myimage:version
  • docker commit converts the container to an image
  • CONTAINER_ID id of the container you’re converting
  • myname name that should match your Docker Hub account name
  • myimage name of image
  • version version of image (e.g. 1.0 or “latest”)

Check your set of images again with “docker images”. You should now have an image that encapsulates the container and any changes to it.

Now you’re ready to share this image.

Push your docker image to Docker Hub

> docker push myname/myimage:version
  • docker push uploads the image to Docker Hub
  • myname/myimage:version is the image you’re pushing

Check your Docker Hub account and you should see your image. Now anyone can pull your image like we pulled linode/lamp earlier.

Save your docker image to a tar file (and Load it back)

Alternatively, you can save your image to a tar file instead of pushing it to Docker Hub. If you have private, proprietary data, you may not want it in the public Hub, or you may just want to transfer this image internally or to a client.

> docker save myname/myimage:version > myimage.tar

To reload it

> docker load < myimage.tar

You should be able to see your loaded image with this command

> docker images

Start and Stop a container

You can list the available containers with this command

> docker ps -a

To start a container

> docker start [CONTAINER_ID]

To stop a container

> docker stop [CONTAINER_ID]