I tried a few different flavors of databases to see what life might look like outside of the relational database world.
I compared MySQL to MongoDB and Couchbase, which are document-oriented databases. I also looked at Cassandra (a column-oriented database), using the java Hector API. And of course, I used a filesystem storage implementation to see what the top range performance might be for storing and retrieving data stuff.
In my experiments, I was importing stuff and retrieving stuff, but I also wanted to know how easy and fast it would be to append to stuff I already stored (e.g. adding to a collection).
I also compared performance when doing I/O remotely (application on one machine, and database on another) vs collocated (both application and database on same machine). I relied on the databases’ transfer protocols. I did not write one for the filesystem approach so remote results for filesystem do not apply.
Let’s look at the collocated tests first (even though it’s listed second). The filesystem is obviously the fastest. It’s no frills and gets straight to the job.
MongoDB came in at nearly the same write speeds, though a bit slower in reading.
Mysql tied MongoDB’s read speed, but performed worst for writing.
Cassandra was just slow all around.
Now let’s take a look at the Remote tests.
Amazingly, MongoDB’s write speeds are still super fast. It’s transfer protocol must be very efficient. The read speeds go down but I suspect this is because it has to transfer large amounts of data back.
Mysql is consistent with read and write speeds but orders of magnitude slower than Mongo.
Cassandra’s performance is now on par and a little better than Mysql in remote I/O.
I didn’t mention anything about Couchbase because I couldn’t get it to work right. With more than 10k data writes and reads, it would start to “lose” data. Even though I import 100k items, I may end up with only 60k or so. So I may not have configured the database properly or used the client properly.
In the test results I show above, I was merely reading and writing 100K data items to the databases. I tried pushing the limits of each by reading and writing more and more. At 10M items, MongoDB and Mysql speeds are comparable at around 1500s. When I tried to import 1B items, it took Mysql 160,732s. The others all failed…
But I only took the time to configure Mongo and Cassandra as a single-node database (because I only had 1 server to work with). It seems fairly easy to make a cluster of Mongo or Cassandra nodes but that was beyond the scope of this test. I had to remove and install the 64-bit version of MongoDB at some point, but I think a single-node will only get you so far.
I found MongoDB’s java SDK super easy to use. My company decided to use it after my experiments and I’ve used the php api for MongoDB which is also very easy. It requires very little lines of code and the calls are intuitive.
I found Cassandra’s Hector SDK extremely difficult to use. I was able to read and write to the database without fail, but I can’t really say I understand every line of code I wrote (which could have resulted in poorer performance).
Needless to say, I don’t think Couchbase is a good candidate, at least out of the box.
Who’s the Winner?
Like anything else, it depends. I said nothing of data modeling which is fundamental to deciding. Lots of good reads on the interweb about this.
My general gut feel is that Mongo will be best when you need to simply read and write. There’s not a lot of support for complex joins. So use this for your front-end user data needs. You can get an app up and running quick without much consideration regarding what columns to add (since the collections/tables are schema-free).
If you need to do joins and complex analysis on data, I would stick with a relational database like Mysql.