Design Document Management

Important: All Cloudant documentation has moved to the IBM Bluemix platform. You can find the new content here, and the ‘Design Document Management’ topic in particular here.

Content on this page will no longer be updated (Jan 31st, 2017).

Article contributed by Glynn Bird, Developer Advocate at IBM Cloudant, glynn@cloudant.com

Cloudant’s scalable JSON data store has several querying mechanisms, all of which generate indices that are created and maintained separately from the core data. Indexing is not performed immediately when a document is saved. Instead, it is scheduled to happen later, which gives faster, non-blocking write throughput.

{
  "_id": "23966717-5A6F-E581-AF79-BB55D6BBB613",
  "_rev": "1-96daf2e7c7c0c277d0a63c49b57919bc",
  "doc_name": "Markdown Reference",
  "body": "Lorem Ipsum",
  "ts": 1422358827
}

Cloudant’s search indexes and MapReduce views are configured by adding Design Documents to a database. Design Documents are JSON documents that contain the instructions for building the view or index. Let’s take a simple example. Assume we have a collection of data documents, similar to the provided example.

Each data document includes a name, a body, and a timestamp. We want to create a MapReduce view to sort our documents by timestamp.

function(doc) {
  if (doc.ts) {
    emit( doc.ts, null);
  }
}

We can do this by creating a Map function like the provided example.

The function emits the document’s timestamp so that we can use it as the key to the index; as we are not interested in the value in the index, null is emitted. The effect is to provide a time-ordered index into the document set.
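To see what the map function produces, the following minimal sketch runs it locally over a handful of documents. The `buildIndex` helper, the `emit` stub, and the sample documents are assumptions made for illustration; Cloudant runs the function server-side and stores the rows in its own index.

```javascript
// Local stand-in for the server-side view build (illustrative only).
function buildIndex(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // Hypothetical emit stub: records each key/value pair with the source document id.
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // View rows are ordered by key; numeric comparison is enough for this sketch.
  return rows.sort((a, b) => a.key - b.key);
}

// The map function from the text, with emit passed in so it can run outside Cloudant.
function mapByTs(doc, emit) {
  if (doc.ts) {
    emit(doc.ts, null);
  }
}

const sampleDocs = [
  { _id: "b", doc_name: "Second", ts: 1422358900 },
  { _id: "a", doc_name: "First", ts: 1422358827 },
  { _id: "c", doc_name: "No timestamp" }
];

const index = buildIndex(sampleDocs, mapByTs);
console.log(index.map((row) => row.id)); // [ 'a', 'b' ]
```

The document without a `ts` field emits nothing, so it does not appear in the index at all.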

{
    "_id": "_design/fetch",
    "views": {
        "by_ts": {
            "map": "function(doc) { if (doc.ts) { emit(doc.ts, null); } }"
        }
    },
    "language": "javascript"
}

We are going to call this view “by_ts” and put it into a Design Document called “fetch”, like the provided example.

The result is that our map code has been turned into a JSON-compatible string, and included in a Design Document.
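One way to produce that JSON-compatible string without escaping it by hand is to let JavaScript serialise the function. This is a sketch rather than part of the original article, and the variable names are assumptions.

```javascript
// The map function, written as ordinary JavaScript source.
// emit is supplied by the view server at index time; it is never called here.
const map = function (doc) {
  if (doc.ts) {
    emit(doc.ts, null);
  }
};

// Function.prototype.toString() returns the source text, and JSON.stringify
// escapes the newlines so that the result is valid JSON.
const designDoc = {
  _id: "_design/fetch",
  views: {
    by_ts: { map: map.toString() }
  },
  language: "javascript"
};

const body = JSON.stringify(designDoc);
console.log(body);
```

The resulting `body` string is what would be sent as the design document when saving it to the database.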

Once the Design Document is saved, Cloudant triggers server-side processes to build the fetch/by_ts view. It does this by iterating over every document in the database, and sending each one to the JavaScript map function. The function returns the emitted key/value pair. As the iteration continues, each key/value pair is stored in a B-Tree index. After the index is built for the first time, subsequent re-indexing is performed only against new and updated documents. Deleted documents are de-indexed. This time-saving process is known as incremental MapReduce, as shown in the following diagram:

Illustration of Incremental MapReduce
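The incremental behaviour can be modelled with a toy in-memory view. This is a sketch under stated assumptions, not Cloudant’s implementation: the `ToyView` class, the shape of the change records, and the sequence numbers are all invented for illustration, and a plain Map stands in for the B-Tree.

```javascript
// Toy model of incremental indexing (illustrative only).
class ToyView {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.rowsByDoc = new Map(); // doc id -> rows emitted for that doc
    this.lastSeq = 0;           // highest change sequence already indexed
    this.docsProcessed = 0;     // how many documents have been passed to mapFn
  }

  // changes: array of { seq, id, doc, deleted } in ascending seq order.
  update(changes) {
    for (const change of changes) {
      if (change.seq <= this.lastSeq) continue; // already indexed, skip
      this.rowsByDoc.delete(change.id);          // drop rows from the old revision
      if (!change.deleted) {
        const rows = [];
        this.mapFn(change.doc, (key, value) => rows.push({ id: change.id, key, value }));
        this.rowsByDoc.set(change.id, rows);
        this.docsProcessed += 1;
      }
      this.lastSeq = change.seq;
    }
  }

  rows() {
    return [...this.rowsByDoc.values()].flat().sort((a, b) => a.key - b.key);
  }
}

const view = new ToyView((doc, emit) => { if (doc.ts) emit(doc.ts, null); });
const changes = [
  { seq: 1, id: "a", doc: { ts: 10 } },
  { seq: 2, id: "b", doc: { ts: 20 } }
];
view.update(changes);                               // initial build maps both documents
changes.push({ seq: 3, id: "a", doc: { ts: 30 } }); // document a is updated
changes.push({ seq: 4, id: "b", deleted: true });   // document b is deleted
view.update(changes);                               // only seq 3 and 4 are processed
console.log(view.docsProcessed, view.rows());
```

After the second update the map function has run three times in total (twice for the initial build, once for the updated document), and the deleted document has been removed from the index without being mapped again.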

It’s worth remembering at this point that:

Multiple views in the same design document

If we define several views in the same design document, then they are built efficiently at the same time. Each document is only read once, and passed through each view’s Map function. The downside of this approach is that modifying a design document invalidates all of the existing MapReduce views defined in that document, even if some of the views remain unaltered.

If MapReduce views must be altered independently of each other, place their definitions in separate design documents.
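The single-pass build described above can be sketched as follows. The `buildAllViews` helper and the second view, `by_name`, are hypothetical and exist only to show each document being read once and handed to every map function in the design document.

```javascript
// Toy model: one pass over the documents feeds every view in the design document.
function buildAllViews(docs, views) {
  const indexes = {};
  for (const name of Object.keys(views)) indexes[name] = [];
  let reads = 0;
  for (const doc of docs) {
    reads += 1; // each document is read once...
    for (const [name, mapFn] of Object.entries(views)) {
      // ...and passed through each view's map function.
      mapFn(doc, (key, value) => indexes[name].push({ id: doc._id, key, value }));
    }
  }
  return { indexes, reads };
}

const views = {
  by_ts: (doc, emit) => { if (doc.ts) emit(doc.ts, null); },
  by_name: (doc, emit) => { if (doc.doc_name) emit(doc.doc_name, null); } // hypothetical second view
};
const docs = [
  { _id: "a", doc_name: "Markdown Reference", ts: 1422358827 },
  { _id: "b", doc_name: "Style Guide" }
];

const result = buildAllViews(docs, views);
console.log(result.reads, result.indexes.by_ts.length, result.indexes.by_name.length); // 2 1 2
```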

Illustration of Design Document version change

Managing changes to a design document

{
    "_id": "_design/fetch",
    "_rev": "2-a2324c9e74a76d2a16179c56f5315dba",
    "views": {
        "by_ts": {
            "map": "function(doc) { if (doc.ts) { emit(doc.ts, null); } }",
            "reduce": "_count"
        }
    },
    "language": "javascript"
}

Imagine at some point in the future we decide to change the design of our view. Now, instead of returning the actual timestamp result, we are only interested in the count of how many documents match the criteria. To achieve this, the map function remains the same, but we now use a reduce of “_count”. The effect is that our design document looks like the provided example.
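As a rough local approximation, the built-in `_count` reduce returns the number of rows emitted by the map function and ignores their values. The sample rows and helper names below are assumptions for illustration.

```javascript
// Rows as the map function would emit them: key is doc.ts, value is null.
const emitted = [
  { key: 1422358827, value: null },
  { key: 1422358900, value: null },
  { key: 1422359000, value: null }
];

// Local approximation of _count: it counts rows and ignores their values.
function countReduce(rows) {
  return rows.length;
}

// Counting a key range, similar in spirit to querying the view with startkey and endkey.
function countBetween(rows, startkey, endkey) {
  return countReduce(rows.filter((row) => row.key >= startkey && row.key <= endkey));
}

console.log(countReduce(emitted));                          // 3
console.log(countBetween(emitted, 1422358800, 1422358950)); // 2
```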

When this design document is saved, Cloudant completely invalidates the old index and begins building the new index from scratch, iterating over every document in turn. As with the original build, the time taken depends on how many documents are in the database, and the build blocks incoming queries on that view until it is complete.

But there’s a problem…

If we have an application that is accessing this view in real-time, then we might well encounter a deployment dilemma:

Coordinating changes to Design Documents

There are two ways of dealing with this change control problem.

Versioned design documents

One solution is to use versioned design document names:

Using versioned design documents is a simple way to manage change control in your Design Documents, as long as you remember to remove the older versions at a later date!
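On the application side, a versioned name can be kept in one place so that a release only has to change a single constant. The sketch below assumes a hypothetical naming scheme in which a numeric suffix is appended to the design document name (for example `fetchv2`); the `viewPath` helper and the `VIEW_VERSION` constant are illustrative names, not part of any Cloudant API.

```javascript
// Hypothetical naming scheme: the design document name carries a version suffix.
const VIEW_VERSION = 2; // changed in the application release that needs the new view

function viewPath(db, baseName, version, viewName) {
  // Path layout follows the view endpoint: /{db}/_design/{ddoc}/_view/{view}
  const ddoc = `${baseName}v${version}`;
  return `/${encodeURIComponent(db)}/_design/${encodeURIComponent(ddoc)}/_view/${encodeURIComponent(viewName)}`;
}

console.log(viewPath("mydb", "fetch", VIEW_VERSION, "by_ts"));
// /mydb/_design/fetchv2/_view/by_ts
```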

‘Move and switch’ design documents

Another approach, documented here, relies on the fact that Cloudant recognises when it has two identical design documents, and won’t waste time and resources rebuilding views it already has. In other words, if we take our design document _design/fetch and create an exact duplicate _design/fetch_OLD, then both endpoints would work interchangeably without triggering any reindexing.

The procedure to switch to the new view is this:

Move and Switch tooling

npm install -g couchmigrate

A command-line Node.js script called ‘couchmigrate’ automates the ‘move and switch’ procedure. It can be installed with:

export COUCH_URL=http://127.0.0.1:5984

To use the script, first define the URL of our CouchDB/Cloudant instance by setting an environment variable called COUCH_URL.

export COUCH_URL="https://<account>@myhost.cloudant.com"

This URL can be HTTP or HTTPS, and can include authentication credentials.
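A URL of this form can be inspected with the standard `URL` class available in Node.js. The credentials and host below are made-up example values; in practice the string would be read from `process.env.COUCH_URL`.

```javascript
// Example value only; in practice this would come from process.env.COUCH_URL.
const couchUrl = "https://myuser:mypassword@myhost.cloudant.com";

const parsed = new URL(couchUrl);
console.log(parsed.protocol); // https:
console.log(parsed.username); // myuser
console.log(parsed.host);     // myhost.cloudant.com

// Strip the credentials before writing the URL to a log.
parsed.username = "";
parsed.password = "";
console.log(parsed.href);     // https://myhost.cloudant.com/
```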

couchmigrate --db mydb --dd /path/to/my/dd.json

Assuming we have a design document in JSON format, stored in a file, we can then run the migrate command.

The --db option specifies the name of the database to change, and the --dd option specifies the path to our Design Document file.

The script coordinates the ‘move and switch’ procedure, waiting until the view is built before returning. If the incoming design document is the same as the incumbent one, then the script returns almost immediately.

The source code for the script is available here: https://github.com/glynnbird/couchmigrate .

The ‘stale’ parameter

If an index is complete, but new records are added into the database, then the index is scheduled to be updated in the background. This is the state of the database shown in the following diagram:

Illustration of index scheduled for updating

When querying the view, we have three choices:

Adding “stale=ok” or “stale=update_after” can be a good way of getting answers more quickly from a view, but at the expense of freshness.
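The stale values are passed as query-string parameters on the view request. The sketch below builds such URLs; the host, the database name, and the `viewUrl` helper are placeholders introduced for illustration.

```javascript
// Builds a view query URL; base URL, database and view names are placeholders.
function viewUrl(baseUrl, db, ddoc, view, params = {}) {
  const url = new URL(`/${db}/_design/${ddoc}/_view/${view}`, baseUrl);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, value);
  }
  return url.href;
}

const base = "https://myhost.cloudant.com";

// Default behaviour: no stale parameter.
console.log(viewUrl(base, "mydb", "fetch", "by_ts"));
// https://myhost.cloudant.com/mydb/_design/fetch/_view/by_ts

console.log(viewUrl(base, "mydb", "fetch", "by_ts", { stale: "ok" }));
// https://myhost.cloudant.com/mydb/_design/fetch/_view/by_ts?stale=ok

console.log(viewUrl(base, "mydb", "fetch", "by_ts", { stale: "update_after" }));
// https://myhost.cloudant.com/mydb/_design/fetch/_view/by_ts?stale=update_after
```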

A word of caution: The default behaviour distributes load evenly across nodes in the Cloudant cluster. If you use the alternative stale=ok or stale=update_after options, this might favour a subset of cluster nodes, in order to return consistent results from across the eventually consistent set. This means that the ‘stale’ parameter isn’t a perfect solution for all use-cases. However, it can be useful for providing timely responses on fast-changing data sets if your application is happy to accept stale results. If the rate of change of your data is small, adding “stale=ok” or “stale=update_after” will not bring a performance benefit, and might unevenly distribute the load on larger clusters.

Avoid using stale=ok or stale=update_after whenever possible. The reason is that the default behaviour provides the freshest data, and distributes load within the cluster. If it is possible to make a client app aware that a large data processing task is in progress (during a regular bulk data update, for example), then the app could switch to stale=ok temporarily during these times, then revert to the default behaviour afterwards.
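A minimal sketch of that temporary switch is shown here. The `bulkUpdateInProgress` flag is hypothetical; how the app learns that a bulk update is running is left to the application.

```javascript
// Hypothetical application-side switch; the flag is set by whatever process runs the bulk update.
function queryParams(bulkUpdateInProgress) {
  // Default behaviour (no stale parameter) normally; stale=ok only while a bulk update runs.
  return bulkUpdateInProgress ? { stale: "ok" } : {};
}

const normalQuery = new URLSearchParams(queryParams(false)).toString();
const bulkQuery = new URLSearchParams(queryParams(true)).toString();
console.log(JSON.stringify(normalQuery)); // ""
console.log(bulkQuery);                   // stale=ok
```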

Note: The stale option is still available, but the stable and update options are more useful and should be used instead. For more details, see Accessing a stale view.