Document Versioning and MVCC

Important: All Cloudant documentation has moved to the IBM Bluemix platform. You can find the new content here, and the ‘Document Versioning and MVCC’ topic in particular here.

Content on this page will no longer be updated (Jan 31st, 2017).

Multi-version concurrency control (MVCC) is how Cloudant databases ensure that all of the nodes in a database’s cluster contain only the newest version of a document. Since Cloudant databases are eventually consistent, this is necessary to prevent inconsistencies arising between nodes as a result of synchronizing between outdated documents.

Multi-Version Concurrency Control (MVCC) enables concurrent read and write access to a Cloudant database. MVCC is a form of optimistic concurrency. It makes both read and write operations on Cloudant databases faster because there is no need for database locking on either read or write operations. MVCC also enables synchronization between Cloudant database nodes.

Revisions

Every document in a Cloudant database has a _rev field indicating its revision number.

A revision number is added to your documents by the server when you insert or modify them. The number is included in the server response when you make changes or read a document. The _rev value is constructed using a combination of a simple counter and a hash of the document.

The two main uses of the revision number are to help:

  1. Determine what documents must be replicated between servers.
  2. Confirm that a client is trying to modify the latest version of a document.

You must specify the previous _rev when updating a document or else your request will fail and return a 409 error.

You can query a particular revision using its _rev, however, older revisions are regularly deleted by a process called compaction. A consequence of compaction is that you cannot rely on a successful response when querying a particular document revision using its _rev to obtain a history of revisions to your document. If you need a version history of your documents, a solution is to create a new document for each revision.

Distributed Databases and Conflicts

Distributed databases work without a constant connection to the main database on Cloudant, which is itself distributed, so updates based on the same previous version can still be in conflict.

To find conflicts, add the query parameter conflicts=true when retrieving a document. The document will contain a _conflicts array with all conflicting revisions.

To find conflicts for multiple documents in a database, write a view. An example map function is provided, that emits all conflicting revisions for every document with a conflict.

function (doc) {
  if (doc._conflicts) {
    emit(null, [doc._rev].concat(doc._conflicts));
  }
}

You can then regularly query this view and resolve conflicts as needed, or query the view after each replication.

How to resolve conflicts

Once you’ve found a conflict, you can resolve it in 4 steps.

Let’s consider an example of how this can be done. Suppose you have a database of products for an online shop. The first version of a document might look like this example provided.

{
  "_id": "74b2be56045bed0c8c9d24b939000dbe",
  "_rev": "1-7438df87b632b312c53a08361a7c3299",
  "name": "Samsung Galaxy S4",
  "description": "",
  "price": 650
}

As the document doesn’t have a description yet, someone might add one.

{
  "_id": "74b2be56045bed0c8c9d24b939000dbe",
  "_rev": "2-61ae00e029d4f5edd2981841243ded13",
  "name": "Samsung Galaxy S4",
  "description": "Latest smartphone from Samsung",
  "price": 650
}

At the same time, someone else - working with a replicated database - reduces the price.

{
  "_id": "74b2be56045bed0c8c9d24b939000dbe",
  "_rev": "2-f796915a291b37254f6df8f6f3389121",
  "name": "Samsung Galaxy S4",
  "description": "",
  "price": 600
}

Then the two databases are replicated, leading to a conflict.

Get conflicting revisions

You get the document with conflicts=true like this:

http://<account>.cloudant.com/products/<_id>?conflicts=true

And get the following response:

{
  "_id":"74b2be56045bed0c8c9d24b939000dbe",
  "_rev":"2-f796915a291b37254f6df8f6f3389121",
  "name":"Samsung Galaxy S4",
  "description":"",
  "price":600,
  "_conflicts":["2-61ae00e029d4f5edd2981841243ded13"]
}

The version with the changed price has been chosen arbitrarily as the latest version of the document and the conflict is noted in the _conflicts array. In most cases this array has only one element, but there can be many conflicting revisions.

Merge the changes

To compare the revisions to see what has been changed, your application gets all of the versions from the database with URLs like this:

Since these two changes are for different fields of the document, it is easy to merge them.

Other conflict resolution strategies are:

For a practical example of how to implement these changes, see this project with sample code.

Upload the new revision

In this example, you create a document similar to the example provided, and update the database with it.

{
  "_id": "74b2be56045bed0c8c9d24b939000dbe",
  "_rev": "3-daaecd7213301a1ad5493186d6916755",
  "name": "Samsung Galaxy S4",
  "description": "Latest smartphone from Samsung",
  "price": 600
}

Delete old revisions

To delete an old revisions, send a DELETE request to the URLs with the revision we want to delete.

DELETE https://<account>.cloudant.com/products/$_ID?rev=2-61ae00e029d4f5edd2981841243ded13
curl "https://$ACCOUNT.cloudant.com/products/$_ID?rev=2-f796915a291b37254f6df8f6f3389121" -X DELETE

After this, conflicts are resolved. You can verify this by making a GET request to the document again with the conflicts parameter set to true.