Replication

Cloudant replication is the process that synchronizes (‘syncs’) the state of two databases. Any change which has occurred in the source database is reproduced in the target database. You can create replications between any number of databases, whether continuous or not.

Depending on your application requirements, you use replication to share and aggregate state and content.

Replication takes place in one direction only. To keep two databases synchronized with each other, you must replicate in both directions. This means that you must replicate from database1 to database2, and separately from database2 to database1.

Replications are created in one of two ways:

  1. A replication can be created using a replication document in the _replicator database. Creating and modifying replications in this way allows you to control replication in the same as working with other documents. Replication jobs created this way will be resumed after a node restart.
  2. A replication can be started by POSTing a JSON document describing the desired replication directly to the /_replicate endpoint. Replication jobs created this way are not resumed if the node they run on is restarted.

Replication document format

The format of the document used to describe a replication is as follows:

Field Name Required Description
source yes Identifies the database to copy revisions from. Can be a database URL, or an object whose url property contains the full URL of the database.
target yes Identifies the database to copy revisions to. Same format and interpretation as source. Does not have to be the same value as the source field.
continuous no Continuously syncs state from the source to the target, only stopping when deleted.
create_target no A value of true tells the replicator to create the target database if it does not exist.
doc_ids no Array of document IDs; if given, only these documents are replicated.
filter no Name of a filter function, defined in a design document. The filter function determines which documents get replicated. Note that using the selector option provides performance benefits compared with using the filter option. You should use the selector option where possible.
proxy no Proxy server URL.
selector no Provide a simple filter to select the documents that are included in the replication. Using the selector option provides performance benefits compared with using the filter option. More information about selector is available here.
since_seq no Sequence from which the replication should start. More information about since_seq is available here.
use_checkpoints
no Indicate whether to create checkpoints. Checkpoints greatly reduce the time and resources needed for repeated replications. Setting this to false removes the requirement for write access to the source database. Defaults to true.
user_ctx no An object containing the username and optionally an array of roles, for example: "user_ctx": {"name": "jane", "roles": ["admin"]}. This is needed for the replication to show up in the output of /_active_tasks.

The selector field

If you do not want to replicate the entire contents of a database, you can specify a simple filter in the selector field. The filter takes the form of a Cloudant Query selector object.

Using a selector object provides performance benefits when compared with using a filter function. You should use the selector option where possible.

{
    "source": "https://$USERNAME1:$PASSWORD1@$USERNAME1.cloudant.com/$DATABASE1",
    "target": "https://$USERNAME2:$PASSWORD2@$USERNAME2.cloudant.com/$DATABASE2",
    "selector": {
        "_id": {
            "$gte": "d2"
        }
    },
    "continuous": true
}

The selector object identifies a field (such as _id in the example), and an expression (such as "$gte": "d2") that must be true for that field in order for the selector to allow the document to be replicated. In the example, only documents that have a _id field with a value greater than or equal to "d2" are replicated.

{
    "error": "bad request",
    "reason": "<details of the problem>"
}

If there is a problem with the request, an HTTP 400 error is returned, with more details about the problem in the "reason" field of the response. The reason might be one of:

More information about using a selector object is available in the Apache CouchDB documentation.

The since_seq field

If you do not want to replicate the entire contents of a database, you can specify a ‘replication sequence value’ in the since_seq field.

The replication sequence value indicates how far you have progressed ‘through’ a database, for example as part of a replication. Setting the contents of the since_seq field to this value ensures that the replication starts from that point, rather than from the very beginning.

This field is especially useful for creating incremental copies of databases. To do this:

  1. Find the ID of the checkpoint document for the last replication. It is stored in the _replication_id field of the replication document in the _replicator database.
  2. Open the checkpoint document at /<database>/_local/<_replication_id>, where <_replication_id> is the ID you found in the previous step, and <database> is the name of the source or the target database. The document usually exists on both databases but might only exist on one.
  3. Search for the recorded_seq field of the first element in the history array.
  4. Set the since_seq field in the replication document to the value of the recorded_seq field.
  5. Start replicating to a new database.

The /_replicator database

The /_replicator database is a special database where you can PUT or POST documents to trigger replications, or DELETE to cancel ongoing replications. These documents have exactly the same content as the JSON documents you can POST to the /_replicate/ endpoint. The fields supplied in the replication document are source, target, continuous, create_target, doc_ids, filter, proxy, query_params, use_checkpoints. These fields are described in the Replication document format.

Replication documents can have a user defined _id.

The names of the source and target databases do not have to be the same.

Creating a replication

curl -X PUT https://$USERNAME:$PASSWORD@USERNAME.cloudant.com/_replicator/replication-doc -H 'Content-Type: application/json' -d @replication-document.json
#assuming replication-document.json is a json file with the following content:
PUT /_replicator/replication-doc HTTP/1.1
Content-Type: application/json
{
  "source": "https://$USERNAME1:$PASSWORD1@$USERNAME1.cloudant.com/$DATABASE1",
  "target": "https://$USERNAME2:$PASSWORD2@$USERNAME2.cloudant.com/$DATABASE2",
  "create_target": true,
  "continuous": true
}

To start a replication, add a replication document to the _replicator database.

(Optional) Creating a replication to two Bluemix environments

You can replicate a Cloudant database to multiple Bluemix environments. When you set up the replication job for each environment, the source database and target database names you provide must use this format, https://$USERNAME:$PASSWORD@$REMOTE_USERNAME.cloudant.com/$DATABASE_NAME. You create the database name, $DATABASE_NAME, and add it to the URL format. Do not copy the URL field from the VCAP_SERVICES environment variable.

Monitoring a replication

curl https://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/_active_tasks
GET /_active_tasks HTTP/1.1
var nano = require('nano');
var account = nano("https://"+$USERNAME+":"+$PASSWORD+"@"+$USERNAME+".cloudant.com");

account.request({
  path: '_active_tasks'
}, function (err, body, headers) {
  if (!err) {
    console.log(body.filter(function (task) {
      return (task.type === 'replication');
    })); 
  }
});
[
  {
    "user": null,
    "updated_on": 1363274088,
    "type": "replication",
    "target": "https://repl:*****@tsm.cloudant.com/user-3dglstqg8aq0uunzimv4uiimy/",
    "docs_read": 0,
    "doc_write_failures": 0,
    "doc_id": "tsm-admin__to__user-3dglstqg8aq0uunzimv4uiimy",
    "continuous": true,
    "checkpointed_source_seq": "403-g1AAAADfeJzLYWBgYMlgTmGQS0lKzi9KdUhJMjTRyyrNSS3QS87JL01JzCvRy0styQGqY0pkSLL___9_VmIymg5TXDqSHIBkUj1YUxyaJkNcmvJYgCRDA5AC6tuflZhGrPsgGg9ANAJtzMkCAPFSStc",
    "changes_pending": 134,
    "pid": "<0.1781.4101>",
    "node": "dbcore@db11.julep.cloudant.net",
    "docs_written": 0,
    "missing_revisions_found": 0,
    "replication_id": "d0cdbfee50a80fd43e83a9f62ea650ad+continuous",
    "revisions_checked": 0,
    "source": "https://repl:*****@tsm.cloudant.com/tsm-admin/",
    "source_seq": "537-g1AAAADfeJzLYWBgYMlgTmGQS0lKzi9KdUhJMjTUyyrNSS3QS87JL01JzCvRy0styQGqY0pkSLL___9_VmI9mg4jXDqSHIBkUj1WTTityWMBkgwNQAqob39WYhextkE0HoBoBNo4MQsAFuVLVQ",
    "started_on": 1363274083
  }
]
[
  {
    "pid": "<0.1303.0>",
    "replication_id": "e42a443f5d08375c8c7a1c3af60518fb+create_target",
    "checkpointed_source_seq": 17333,
    "continuous": false,
    "doc_write_failures": 0,
    "docs_read": 17833,
    "docs_written": 17833,
    "missing_revisions_found": 17833,
    "progress": 3,
    "revisions_checked": 17833,
    "source": "http://username.cloudant.com/db/",
    "source_seq": 551202,
    "started_on": 1316229471,
    "target": "test_db",
    "type": "replication",
    "updated_on": 1316230082
  }
]

To monitor replicators currently in process, make a GET request to https://$USERNAME.cloudant.com/_active_tasks. This returns any active tasks, including replications. To filter for replications, look for documents with "type": "replication".

If you monitor the _active_tasks and find that the state of a replication is not changing, you might have a ‘stalled’ replication. If you are sure that the replication has stalled, contact Cloudant support for assistance.

For more details about the information provided by _active_tasks, see Active tasks.

Delete

DELETE /_replicator/replication-doc?rev=1-... HTTP/1.1
curl -X DELETE https://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/_replicator/replication-doc?rev=1-...

To cancel a replication, simply delete its replication document from the _replicator database.

If the replication is in an error state, the replicator makes repeated attempts to achieve a successful replication. A consequence is that the replication document is updated with each attempt. This also changes the document revision value. Therefore, you should get the revision value immediately before deleting the document, otherwise you might get an HTTP 409 “document update conflict” response.

The /_replicate endpoint

curl -H 'Content-Type: application/json' -X POST "https://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/_replicate" -d @replication-doc.json
#with the file replication-doc.json containing the required replication.
POST /_replicate HTTP/1.1
Content-Type: application/json
{
  "source": "http://$USERNAME:$PASSWORD@username.cloudant.com/example-database",
  "target": "http://$USERNAME2:$PASSWORD2@example.org/example-target-database"
}

It is preferable to use the Replicator Database to manage replication. Details of why are provided here.

However, replication can also be triggered by sending a POST request directly to the /_replicate API URL. The POST contains a JSON document that describes the desired replication.

Return Codes

Code Description
200 Replication request successfully completed.
202 Continuous replication request has been accepted.
404 Either the source or target database was not found.
500 JSON specification was invalid.

Use this call to request, configure, or stop, a replication operation.

The specification of the replication request is controlled through the JSON content of the request. The JSON should be an object with fields defining the source, target and other options. The fields of the JSON request are as follows:

Avoiding the /_replicate endpoint

You should use the /_replicator database in preference to the /_replicate endpoint.

A significant reason for this is that in the event of problem during replication, such as a stall, timeout, or application crash, a replication defined within the /_replicator database is automatically restarted by the system.

If you defined a replication by sending a request to the /_replicate endpoint, it cannot be restarted by the system if a problem occurs because the replication request does not persist.

In addition, replications defined in the /_replicator database are easier to monitor.

Replication Operation

POST /_replicate
Content-Type: application/json
Accept: application/json

{
   "source" : "http://user:pass@example.com/db",
   "target" : "http://user:pass@user.cloudant.com/db",
}
{
   "error" : "db_not_found"
   "reason" : "could not open http://username.cloudant.com/ol1ka/",
}

The aim of replication is that at the end of the process, all active documents on the source database are also in the destination or ‘target’ database, and that all documents deleted from the source databases are also deleted from the destination database (if they existed there).

Replication has two forms: push or pull replication:

Pull replication is helpful if your source database has a permanent IP address, and your destination database is local and has a dynamically assigned IP address, for example, obtained through DHCP. Pull replication is especially appropriate if you are replicating to a mobile or other device from a central server.

In all cases, the requested databases in the source and target specification must exist. If they do not, an error is returned within the JSON object.

Creating a target database during replication

POST http://username.cloudant.com/_replicate
Content-Type: application/json
Accept: application/json

{
   "create_target" : true
   "source" : "http://user:pass@example.com/db",
   "target" : "http://user:pass@user.cloudant.com/db",
}

If your user credentials allow it, you can create the target database during replication by adding the create_target field to the request object.

The create_target field is not destructive. If the database already exists, the replication proceeds as normal.

Canceling replication

curl -H 'Content-Type: application/json' -X POST 'https://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/_replicate HTTP/1.1' -d @replication-doc.json
#the file replication-doc.json has the following content:
POST /_replicate HTTP/1.1
Content-Type: application/json
{
  "source": "https://username:password@username.cloudant.com/example-database",
  "target": "https://username:password@example.org/example-database",
  "cancel": true
}

A replication triggered by POSTing to /_replicate can be canceled by POSTing the exact same JSON object but with the additional cancel property set to true.

The replication ID can be obtained from the original replication request if it is a continuous replication. Alternatively, the replication ID can be obtained from /_active_tasks.

Single Replication

POST /_replicate
Content-Type: application/json
Accept: application/json

{
   "source" : "http://user:pass@user.cloudant.com/recipes",
   "target" : "http://user:pass@user.cloudant.com/recipes2",
}
{
   "ok" : true,
   "history" : [
      {
         "docs_read" : 1000,
         "session_id" : "52c2370f5027043d286daca4de247db0",
         "recorded_seq" : 1000,
         "end_last_seq" : 1000,
         "doc_write_failures" : 0,
         "start_time" : "Thu, 28 Oct 2010 10:24:13 GMT",
         "start_last_seq" : 0,
         "end_time" : "Thu, 28 Oct 2010 10:24:14 GMT",
         "missing_checked" : 0,
         "docs_written" : 1000,
         "missing_found" : 1000
      }
   ],
   "session_id" : "52c2370f5027043d286daca4de247db0",
   "source_last_seq" : 1000
}

Replication of a database means that the two databases - the ‘source’ and the ‘target’ - are synchronized. By default, the replication process occurs one time, and synchronizes the two databases together.

The response to a request for a single replication is a JSON structure containing the success or failure status of the synchronization process. The response also contains statistics about the process.

The structure of the response includes details about the replication status:

Continuous Replication

curl -H 'Content-Type: application/json' -X POST http://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/_replicate -d @replication-doc.json
# where the file replication-doc.json indicates that the replication should be continuous
POST /_replicate HTTP/1.1
Content-Type: application/json
{
  "source": "http://username:password@example.com/foo", 
  "target": "http://username:password@username.cloudant.com/bar", 
  "continuous": true
}

By default, the synchronization of a database during replication happens only once, at the time the replicate request is made. To ensure that replication from the source database to the target database takes place continually, set the continuous field of the JSON object within the request to true.

With continuous replication, changes in the source database are replicated to the target database forever, until you specifically cancel the replication.

Changes are replicated between the two databases as long as a network connection is available between the two instances.

When in operation, the replication process does not stop when it has processed all current updates. Instead, the replication process continues to wait for further updates to the source database, and applies them to the target.

Canceling Continuous Replication

{
   "source" : "http://user:pass@example.com/db",
   "target" : "http://user:pass@user.cloudant.com/db",
   "create_target" : true,
   "continuous" : true
}
{
    "cancel" : true,
    "continuous" : true
    "create_target" : true,
    "source" : "http://user:pass@example.com/db",
    "target" : "http://user:pass@user.cloudant.com/db",
}

Cancel continuous replication by including the cancel field in the JSON request object, and setting the value to true.

Requesting cancellation of a replication that does not exist results in a 404 error.

Example replication sequence

$ curl -H 'Content-Type: application/json' -X POST 'http://username.cloudant.com/_replicate' -d @replication-doc.json
#the file replication-doc.json describes the intended replication.
POST /_replicate HTTP/1.1
Content-Type: application/json
{
  "source": "https://username:password@example.com/foo", 
  "target": "https://username:password@username.cloudant.com/bar", 
  "create_target": true, 
  "continuous": true
}
{
  "ok": true,
  "_local_id": "0a81b645497e6270611ec3419767a584+continuous+create_target"
}
curl -H 'Content-Type: application/json' -X POST http://$USERNAME:$PASSWORD@$USERNAME.cloudant.com/_replicate -d @replication-doc.json
# where the file replication-doc.json specifies the replication task to be canceled.
POST /_replicate HTTP/1.1
Content-Type: application/json
{
  "replication_id": "0a81b645497e6270611ec3419767a584+continuous+create_target",
  "cancel": true
}
{
  "ok": true,
  "_local_id": "0a81b645497e6270611ec3419767a584+continuous+create_target"
}

A simple example of creating a replication task, then cancelling it.

Replication database maintenance

A replication database should be looked after like any other database. If you do not perform regular maintenance, you might accumulate invalid documents caused by interruptions to the replication process. A large number of invalid documents can result in excess load being placed on your cluster when the replicator process is restarted by Cloudant operations.

The main action you can perform to maintain a replication database is to remove old documents. This can be done simply by determining the age of documents, and deleting them if they are no longer required.