What's new in MongoDB 2.2?

The end of summer in the northern hemisphere brought more than just cool temperatures and the promise of baseball playoffs. 10gen released version 2.2 of MongoDB, the leading document-based NoSQL data store. This release offers substantial upgrades in data-center awareness and concurrent operations, as well as two very powerful new features: an aggregation framework and expiring collections.

In this article I'll explore the aggregation framework and expiring collections. I'll also briefly discuss some of the concurrency improvements available in 2.2.

Aggregation Framework

One of the most significant improvements in MongoDB 2.2 is the introduction of the Aggregation Framework. Prior to 2.2, getting a simple count on a field required you to use a map-reduce or group query. For example, if I wanted to know the number of earthquakes in each USGS network area with a magnitude greater than or equal to 3, the following group query would suffice:

db.usgs.group({
    cond: { 'properties.mag': { $gte: 3 } },
    key: { 'properties.net': true },
    reduce: function(obj, prev) {
        prev.count++;
    },
    initial: { count: 0 }
});

This query produces the following results:

[
    { "properties.net" : "ak", "count" : 16 },
    { "properties.net" : "us", "count" : 141 },
    { "properties.net" : "pr", "count" : 17 },
    { "properties.net" : "se", "count" : 1 },
    { "properties.net" : "ci", "count" : 3 },
    { "properties.net" : "hv", "count" : 1 },
    { "properties.net" : "uu", "count" : 1 }
]

While straightforward in this case, the group command is unintuitive and quite a bit more work than it should be. The Aggregation Framework simplifies the creation of these values. Instead of running a map-reduce job, the Aggregation Framework processes aggregations as a pipeline: some pipeline steps create new outbound documents based on the inbound documents, while others perform filtering or sorting. If the earlier group command is restructured into an aggregate command, it might look something like this:

db.usgs.aggregate([
    { $project : { mag : '$properties.mag',
                   net : '$properties.net' } },
    { $match : { mag : { $gte : 3 } } },
    { $group : { _id : '$net',
                 quakesPerNet : { $sum : 1 } } },
    { $sort : { _id : 1 } }
]);

In case it's not clear, the above aggregate command has the following pipeline:

The first step in the aggregation pipeline renames fields in the original document to something more manageable. The $project operator pulls two fields out of the properties sub-document; each document passed to the next operator, $match, contains only these two fields (plus _id, which is included by default).
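
For example, a document emerging from the $project stage might look something like this (the values are illustrative):

{ "_id" : ObjectId("..."), "mag" : 3.4, "net" : "ak" }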

In addition to renaming fields, the $project operator supports excluding fields from documents, creating new sub-documents, and computing new fields with expression operators.
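
Here's a minimal sketch of those options against the same earthquake data; the isStrong field is hypothetical, computed with the $gte expression operator:

db.usgs.aggregate([
    { $project : {
        _id : 0,                                      // exclude the _id field
        quake : { mag : '$properties.mag',            // build a new sub-document
                  net : '$properties.net' },
        isStrong : { $gte : ['$properties.mag', 5] }  // computed boolean field
    } }
]);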

The $match command applies a query to the documents streaming through the pipeline, removing those which fail to meet the criteria. A best practice is to use a $match filter early in the pipeline to reduce the number of processed documents later in the pipeline.
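
Applying that advice to the earlier pipeline, the $match can move to the front so that only matching documents are ever projected and grouped (a $match at the head of the pipeline can also take advantage of an index):

db.usgs.aggregate([
    { $match : { 'properties.mag' : { $gte : 3 } } },  // filter first
    { $project : { net : '$properties.net' } },        // then reshape
    { $group : { _id : '$net',
                 quakesPerNet : { $sum : 1 } } },
    { $sort : { _id : 1 } }
]);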

The $group command collects the documents it receives into groups, emitting one new document per distinct _id. In this case, each output document has an _id of the projected network name and a count of the number of times it occurred, represented by the quakesPerNet field. Finally, the results are sorted on the _id field, yielding the following result:

{
    "result" : [
        { "_id" : "ak", "quakesPerNet" : 16 },
        { "_id" : "ci", "quakesPerNet" : 3 },
        { "_id" : "hv", "quakesPerNet" : 1 },
        { "_id" : "pr", "quakesPerNet" : 17 },
        { "_id" : "se", "quakesPerNet" : 1 },
        { "_id" : "us", "quakesPerNet" : 141 },
        { "_id" : "uu", "quakesPerNet" : 1 }
    ],
    "ok" : 1
}

This only scratches the surface of MongoDB's Aggregation Framework. You'll want to review the reference documentation for more detail.

Time-To-Live Collections

Prior to MongoDB 2.2, if your application needed to expire data from collections, you had to create your own process to handle the expiration. This often called for tedious cron or Quartz jobs. Starting in MongoDB 2.2, you can easily create collections which expire documents in a controlled manner. Creating a TTL collection is easy: just create an index on a Date field with the expireAfterSeconds option:

db.sessions.ensureIndex( { 'sessionCreated' : 1 }, { expireAfterSeconds : 1200 } );

Be careful, though: if you specify expireAfterSeconds on a field that doesn't hold a Date value, MongoDB still creates the index, but the affected documents never expire, and the shell won't warn you about the mistake.
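
To see the distinction in action, here's a quick sketch (the user field is just for illustration). The first document holds a real Date and will be removed roughly 1200 seconds after sessionCreated; the second stores its timestamp as a string and will never expire:

// Expires about 20 minutes after sessionCreated
// (the TTL monitor runs periodically, so timing is approximate)
db.sessions.insert({ user : 'alice', sessionCreated : new Date() });

// Never expires: the indexed field is a string, not a Date
db.sessions.insert({ user : 'bob', sessionCreated : new Date().toString() });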

TTL collections are a natural fit for expiring cached items, log entries, or sessions.

Concurrency Improvements

The most significant performance improvements in MongoDB 2.2 are in the area of concurrency: yields on page faults and the replacement of the global read/write lock with database-level locking. With the first improvement, mongod yields its write lock when a page fault is about to occur. This greatly improves concurrency around disk I/O and is especially helpful if your database is substantially larger than available RAM.

The second improvement, database-level locking, offers obvious benefits if your mongod hosts more than one database. While not an ideal level of granularity, 10gen claims finer-grained locking will be forthcoming in future versions of MongoDB. It should be mentioned that global read/write locks still occur for certain operations. For example, writing to the journal causes a brief global lock since there's only one journal file per mongod. A global lock can also occur when executing multi-database commands, such as copyDatabase.
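
If you're curious how lock time is being spent per database, the locks section of the serverStatus output reports per-database statistics in 2.2. A quick peek from the shell:

// Per-database lock statistics, reported by serverStatus in 2.2
db.serverStatus().locks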

