Last week, I got the urge to start learning more about NoSQL databases. I know they've been a hot technology for several years now; but, other than watching a few presentations at various conferences, I don't know much about them at all. So, after finding out that MongoLab - a hosted MongoDB service - had a free developer sandbox, I signed up and started to look for a book about MongoDB. I ended up going with MongoDB: The Definitive Guide by Kristina Chodorow; and, half-way through the book, I can tell you that it is an outstanding resource.
Kristina Chodorow is a core contributor to the MongoDB project, so you know that she really knows her stuff! But, she also happens to be a fantastic writer and truly brings the MongoDB database to life in her epic 587-page book.
NOTE: The book is 587 pages according to the iBooks app on my iPad. Your version may be different.
As of this blog post, I am only about half-way through the book. I decided to put the book on hold when I reached the Sharding chapter so that I could take some time to start applying the concepts that were outlined in the first few hundred pages. The first half of the book covers every aspect of MongoDB from connecting to databases, to creating, querying, updating, upserting, and deleting documents, indexing, extensive performance considerations, and replication.
It's a seriously robust book! It even talks about how documents are stored on the physical hard drive, and how the size and mutability of a given document affect where on the drive it is stored and how often the document needs to be moved to a new disk location.
Since MongoDB - and document-oriented databases in general - are so new to me, I am approaching the subject matter with caution; I don't want to simply jump on the next "big thing." And, that's one reason that I really like MongoDB: The Definitive Guide - Kristina Chodorow doesn't sell MongoDB as the perfect solution to all your data persistence needs; instead, she very clearly talks about the pros and cons of using a document-oriented store as opposed to a more traditional relational database management system (RDBMS). To quote page 227:
While MongoDB is a general-purpose database that works well for most applications, it isn't good at everything. Here are some tasks that MongoDB is not designed to do:
MongoDB does not support transactions, so systems that require transactions should use another data store. There are a couple of ways to hack in simple transaction-like semantics, particularly on a single document, but there is no database enforcement. Thus, you can make all of your clients agree to obey whatever semantics you come up with (e.g., Check the "locks" field before doing any operation) but there is nothing stopping an ignorant or malicious client from messing things up.
Joining many different types of data across many different dimensions is something relational databases are fantastic at. MongoDB isn't supposed to do this well and most likely never will.
Finally, one of the big (if hopefully temporary) reasons to use a relational database over MongoDB is if you're using tools that don't support MongoDB. From SQLAlchemy to Wordpress, there are thousands of tools that just weren't built to support MongoDB. The pool of tools that support MongoDB is growing but is hardly the size of relational databases' ecosystem, yet.
That second point, about joining data, is another aspect of NoSQL databases that keeps me rather cautious. In my current applications, I JOIN records all the time. Heck, I even blog about the mindset I have when crafting INNER JOINs in a SQL statement. So, naturally, moving to a system that doesn't lend itself well to joining feels like it could be a considerable roadblock.
Luckily, Kristina Chodorow talks at length about application design and special considerations that should be taken in a MongoDB context. Specifically, she talks about normalization vs. denormalization and when it makes sense to embed one document within another; that is, when to duplicate data inside a document as opposed to storing a reference to another document. According to her suggestions on page 216,
Embedding is better for:
- Small subdocuments.
- Data that does not change regularly.
- When eventual consistency is acceptable.
- Documents that grow by a small amount.
- Data that you'll often need to perform a second query to fetch.
- Fast reads.
References are better for:
- Large subdocuments.
- Volatile data.
- When immediate consistency is necessary.
- Documents that grow a large amount.
- Data that you'll often exclude from the results.
- Fast writes.
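To make that trade-off concrete for myself, here is a minimal sketch of the two shapes. It uses plain Python dictionaries as stand-ins for MongoDB documents, so no database or driver is involved; the blog-post data, field names, and helper functions are all hypothetical and are not taken from the book.

```python
# Plain dicts stand in for MongoDB documents; all names here are hypothetical.

# Embedded: each post carries its own copy of the author's details.
embedded_posts = [
    {"_id": 1, "title": "Hello NoSQL", "author": {"name": "Ada", "email": "ada@example.com"}},
    {"_id": 2, "title": "On Indexing", "author": {"name": "Ada", "email": "ada@example.com"}},
]

# Referenced: each post stores only the author's _id; the author lives elsewhere.
authors = {42: {"_id": 42, "name": "Ada", "email": "ada@example.com"}}
referenced_posts = [
    {"_id": 1, "title": "Hello NoSQL", "author_id": 42},
    {"_id": 2, "title": "On Indexing", "author_id": 42},
]


def author_name_embedded(post):
    """Fast read: the author data travels with the post, so one lookup suffices."""
    return post["author"]["name"]


def author_name_referenced(post, authors_by_id):
    """Slower read: a second lookup is needed to resolve the reference."""
    return authors_by_id[post["author_id"]]["name"]


def rename_author_embedded(posts, old_name, new_name):
    """Slower write: every duplicated copy has to be found and updated."""
    changed = 0
    for post in posts:
        if post["author"]["name"] == old_name:
            post["author"]["name"] = new_name
            changed += 1
    return changed
```

With the referenced shape, renaming the author is a single change to `authors[42]`, and every post sees it immediately. With the embedded shape, the same rename has to touch each post, and until it finishes, some copies are stale. That is how I read the "eventual consistency" and "fast reads" vs. "fast writes" items in the lists above.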
She also goes on to talk about half-way solutions that use partial data duplication as well as when it makes sense to break a "sub-collection" out into its own document. The chapters on application design and collection design were incredibly comforting and helped me feel much more confident that I may, one day, reach a NoSQL mindset. That said, I have a huge journey ahead of me.
Anyway, I definitely recommend MongoDB: The Definitive Guide. Kristina Chodorow did a wonderful job. And, if the first half of the book is any indicator of quality, I know that the second half will be time well spent.