Another lesson learned with MongoDB

MongoDB is the primary database at VersionEye. Currently VersionEye is crawling more than 600K open source projects on a daily basis. Some of the crawlers are implemented in Java, others in Ruby. You can follow a library at VersionEye, and as soon as the next version comes out you get an email notification. Today I got this email from VersionEye.

[Screenshot of the VersionEye notification email, 2015-06-25]

As you can see, the version information is missing for the Java libraries. The email template was not touched in the last couple of days. Obviously the crawlers for Maven repositories are implemented in Java 🙂 and they get updated more frequently. The error must be somewhere in the Java crawlers.

The version object is an embedded document in the product object. Every time a crawler finds a new version, it adds a new version object to the corresponding product object. The code for that looks like this.

BasicDBObject productMatch = new BasicDBObject();
productMatch.put(Product.LANGUAGE, language);
productMatch.put(Product.PROD_KEY, prodKey);

BasicDBObject versionObj = version.getDBObject();
versionObj.put(Version.VERSION, version.getVersion());

BasicDBObject versionsUpdate = new BasicDBObject();
versionsUpdate.put("$push", new BasicDBObject(Version.VERSIONS, versionObj));

So far so good. In the next lines the product object is updated with the current time.

DBObject verUpdate = getDBObjectByKey(language, prodKey);
verUpdate.put(Product.UPDATED_AT, new Date());
getCollection().update(productMatch, verUpdate);

And of course there is a unit test case for this code, and the test case is always green. In production, the new version sometimes just disappears. Not always! Just sometimes. At first I thought I had found a bug in MongoDB, but this only happened to the Java crawlers, never to the Ruby crawlers. So the root of all evil must be the implementation. I needed a whole day to figure it out!

In production, MongoDB runs as a replica set on multiple hosts, and two days ago I changed the read preference of the MongoDB driver to “secondary”. That means read operations are no longer served by the primary but by the secondary nodes of the replica set, which can lag behind the primary. And this is what happened.

The first code snippet always runs through and adds a new version to the product. But then the 2nd code snippet is reloading the product object from the db and executing an update.

DBObject verUpdate = getDBObjectByKey(language, prodKey);
verUpdate.put(Product.UPDATED_AT, new Date());
getCollection().update(productMatch, verUpdate);

If the change has not yet been replicated to the whole replica set and the read operation goes to a node which doesn’t have the new version yet, a product object is loaded without the new version. The “updated_at” field is set on this object and the object is stored back to the database. But the “update” method of the Java driver doesn’t update only the changed field. When the update document contains no operators, it replaces the whole document. And so the product is stored without the new version.
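The sequence described above can be replayed without a database. The following is only a sketch under stated assumptions: plain Java maps stand in for the documents on the primary and on a lagging secondary, and the class and method names (`StaleReadDemo`, `replaceWhole`, `setFields`) are made up for illustration. They are not part of the MongoDB driver or of the VersionEye code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the scenario; no MongoDB driver is involved.
class StaleReadDemo {

    // Builds a product document with the given versions and an initial timestamp.
    static Map<String, Object> product(String... versions) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("versions", new ArrayList<String>(Arrays.asList(versions)));
        doc.put("updated_at", "t0");
        return doc;
    }

    // Copies a product document, including its "versions" list.
    @SuppressWarnings("unchecked")
    static Map<String, Object> copyOf(Map<String, Object> doc) {
        Map<String, Object> copy = new HashMap<>(doc);
        copy.put("versions", new ArrayList<String>((List<String>) doc.get("versions")));
        return copy;
    }

    // Mimics an update without operators: the stored document is replaced as a whole.
    static Map<String, Object> replaceWhole(Map<String, Object> stored, Map<String, Object> replacement) {
        return copyOf(replacement);
    }

    // Mimics an update with "$set": only the listed fields are overwritten.
    static Map<String, Object> setFields(Map<String, Object> stored, Map<String, Object> fields) {
        Map<String, Object> result = copyOf(stored);
        result.putAll(fields);
        return result;
    }

    // Runs the scenario; useSet selects how the timestamp is written.
    static Object versionsAfterTimestampUpdate(boolean useSet) {
        Map<String, Object> primary = product("1.0", "1.1"); // the $push of 1.1 already happened
        Map<String, Object> staleSecondary = product("1.0"); // replication has not caught up yet
        if (useSet) {
            Map<String, Object> fields = Collections.<String, Object>singletonMap("updated_at", "t1");
            return setFields(primary, fields).get("versions");
        }
        Map<String, Object> reloaded = copyOf(staleSecondary); // read served by the secondary
        reloaded.put("updated_at", "t1");
        return replaceWhole(primary, reloaded).get("versions");
    }

    public static void main(String[] args) {
        System.out.println("whole document: " + versionsAfterTimestampUpdate(false)); // [1.0]
        System.out.println("$set only:      " + versionsAfterTimestampUpdate(true));  // [1.0, 1.1]
    }
}
```

In the whole-document case version 1.1 is gone, even though the push itself succeeded. In the field-level case the stale read never comes into play.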

There are different solutions to this. First of all I could change the read preference back to “primary” again. But there is a better solution. Actually there is a way to only update single properties in a document in MongoDB. That works like this.

BasicDBObject newValues = new BasicDBObject(); // only the fields to change, not the reloaded (possibly stale) product
newValues.put(Product.UPDATED_AT, new Date());
BasicDBObject set = new BasicDBObject("$set", newValues);
getCollection().update(productMatch, set);

The big difference is in line 3. The “$set” operator tells MongoDB to update only the listed properties. One day of headache for a one-liner! I hope I can save somebody else a day of headache with this blog post.
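Since the crawler already sends a “$push” for the new version, one possible variation is to put the timestamp into the same update document, so that no read is needed at all. The sketch below only illustrates the shape of such a document and how its two operators act on a product. Plain maps stand in for BasicDBObject, and `CombinedUpdateSketch`, `buildUpdate` and `apply` are hypothetical names, not driver API and not VersionEye code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch: plain maps stand in for BasicDBObject, no driver calls.
class CombinedUpdateSketch {

    // Builds {"$push": {"versions": version}, "$set": {"updated_at": now}}.
    static Map<String, Map<String, Object>> buildUpdate(String version, Date now) {
        Map<String, Object> push = new LinkedHashMap<>();
        push.put("versions", version);
        Map<String, Object> set = new LinkedHashMap<>();
        set.put("updated_at", now);
        Map<String, Map<String, Object>> update = new LinkedHashMap<>();
        update.put("$push", push);
        update.put("$set", set);
        return update;
    }

    // Applies both operators field by field; expects both keys and an existing array.
    @SuppressWarnings("unchecked")
    static void apply(Map<String, Object> doc, Map<String, Map<String, Object>> update) {
        for (Map.Entry<String, Object> e : update.get("$push").entrySet()) {
            List<Object> array = (List<Object>) doc.get(e.getKey());
            array.add(e.getValue()); // append to the embedded array
        }
        doc.putAll(update.get("$set")); // overwrite only the listed fields
    }

    public static void main(String[] args) {
        Map<String, Object> product = new LinkedHashMap<>();
        product.put("versions", new ArrayList<Object>(Arrays.asList("1.0")));
        product.put("updated_at", new Date(0));

        apply(product, buildUpdate("1.1", new Date()));
        System.out.println(product.get("versions")); // [1.0, 1.1]
    }
}
```

With the real driver, the same idea would mean building one update object that carries both operators and passing it to a single “update” call. Whether that fits the crawler code depends on details not shown in this post.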

7 thoughts on “Another lesson learned with MongoDB”

  1. It’s been a year since I last worked with MongoDB but couldn’t you change the preference for the update to go to the primary? Or alternately maintain two connections, one with .SECONDARY set for reads and another targeted at the PRIMARY for writes?

    1. Yes. It would be quite easy to have 2 connections open, one for PRIMARY and one for SECONDARY. But the actual problem was that the Java driver’s method name was a little bit misleading. If the method name is “update”, I don’t assume that the whole object gets saved back to disk. From “update” I expect atomic updates on the fields I changed. Actually the “update” method behaves like the “save” method, and to force atomic updates on specific fields I have to use native MongoDB syntax in the update object.

  2. Glad you figured that out, you should almost never use the method you were using before. This is covered in most MongoDB training.

    You should also read up on secondary reads and why they are also not advised except for some specific use cases (global latency, different indexes on secondaries etc.) Stick to primary reads.

    1. OK. That is good to know. But I don’t want to give up on the SECONDARY reads. Isn’t that the whole point of using MongoDB, that you can scale read operations?
