{
  "_id": "_design/search_example",
  "indexes": {
    "animals": {
      "index": "function(doc){ ... }"
    }
  }
}

Search indexes, defined in design documents, allow databases to be queried using Lucene Query Parser Syntax. Search indexes are defined by an index function, similar to a map function in MapReduce views. The index function decides what data to index and store in the index.

Index functions

function(doc){
  index("default", doc._id);
  if (doc.min_length){
    index("min_length", doc.min_length, {"store": true});
  }
  if (doc.diet){
    index("diet", doc.diet, {"store": true});
  }
  if (doc.latin_name){
    index("latin_name", doc.latin_name, {"store": true});
  }
  if (doc.class){
    index("class", doc.class, {"store": true});
  }
}

The function contained in the index field is a JavaScript function that is called for each document in the database. It takes the document as a parameter, extracts some data from it, and then calls the built-in index function to index that data.

The index function takes three parameters, where the third parameter is optional.

The first parameter is the name of the field you intend to use when querying the index, and which is specified in the Lucene syntax portion of subsequent queries. For example, when querying:

query=color:red

color is the Lucene field name specified as the first parameter of the index function.

The query parameter can be abbreviated to q, so another way of writing the query is as follows:

q=color:red

If the special value "default" is used when defining the name, you do not have to specify a field name at query time. The effect is that the query can be simplified:

query=red

The second parameter is the data to be indexed.

The third, optional parameter is a JavaScript object with the following fields:

| Option | Description | Values | Default |
|--------|-------------|--------|---------|
| index | Whether the data is indexed, and if so, how. If set to false or no, the data cannot be used for searches, but can still be retrieved from the index if store is set to true. See Analyzers for more information. | analyzed, analyzed_no_norms, false, no, not_analyzed, not_analyzed_no_norms | analyzed |
| facet | Creates a faceted index. See Faceting for more information. | true, false | false |
| store | If true, the value is returned in the search result; otherwise, the value is not returned. | true, false | false |
| boost | A number specifying the relevance in search results. Content indexed with a boost value greater than 1 is more relevant than content indexed without a boost value. Content with a boost value less than 1 is less relevant. | A positive floating point number | 1 (no boosting) |
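
To see how the three parameters and the options object fit together, the following sketch runs an index function locally against a sample document. It is only an illustration: Cloudant supplies the real index function when it builds the index, and the runIndexFunction harness, the recording stub, and the sample document are hypothetical helpers that are not part of Cloudant.

```javascript
// Hypothetical local harness (not part of Cloudant): records what an index
// function would pass to index(), so the calls can be inspected.
function runIndexFunction(indexFn, doc) {
  const calls = [];
  // Stand-in for the index() that Cloudant provides at indexing time.
  const index = (name, value, options) => {
    calls.push({ name: name, value: value, options: options || {} });
  };
  // Rebuild the function from its source so that the free variable
  // "index" resolves to the recording stub above.
  const bound = new Function("index", "return (" + indexFn.toString() + ");")(index);
  bound(doc);
  return calls;
}

const animalIndex = function (doc) {
  index("default", doc._id);
  if (doc.diet) {
    // Stored in the index, and boosted above the default relevance of 1.
    index("diet", doc.diet, { store: true, boost: 2 });
  }
};

const calls = runIndexFunction(animalIndex, { _id: "badger", diet: "omnivore" });
console.log(calls);
```

Each recorded entry shows the field name, the indexed value, and the options that would apply to it, which can help when checking an index function before saving it in a design document.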

Index Guard Clauses

if (doc.min_length) {
  index("min_length", doc.min_length, {"store": true});
}

The index function takes the data to index as its second parameter. However, if that data field does not exist in the document, an error occurs. The solution is to use an appropriate ‘guard clause’ that checks that the field exists, and contains the expected type of data, before attempting to index it.

if (typeof(doc.min_length) === 'number') {
  index("min_length", doc.min_length, {"store": true});
}

You might use the JavaScript typeof operator to perform the guard clause test. If the field exists and has the expected type, the correct type name is returned, so the guard clause test succeeds and it is safe to use the index function. If the field does not exist, typeof does not return the expected type name, so the guard clause fails and no attempt is made to index the field.

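Whatever guard clause test you decide to use, remember that JavaScript considers a result to be false if one of the following values is tested: undefined, null, the number +0, the number -0, NaN (not a number), or "" (the empty string). A guard such as if (doc.min_length) therefore also skips documents where the field exists but holds 0 or an empty string.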

if (typeof(doc.min_length) !== 'undefined') {
  // The field exists, and does have a type, so we can proceed to index using it.
  ...
}

Therefore, a possible generic guard clause simply tests to ensure that the type of the candidate data field is defined.
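
The difference between these guard styles can be sketched as follows. The guard function names and the sample documents are made up for illustration; only the min_length field comes from the examples above.

```javascript
// Truthiness guard: skips missing fields, but also skips 0, NaN and "".
function truthyGuard(doc) {
  return Boolean(doc.min_length);
}

// Type guard: accepts any value whose type is "number", including 0.
function numberGuard(doc) {
  return typeof doc.min_length === "number";
}

// Generic guard: only checks that the field is defined at all.
function definedGuard(doc) {
  return typeof doc.min_length !== "undefined";
}

// Hypothetical sample documents.
const samples = [{ min_length: 1.5 }, { min_length: 0 }, { min_length: "long" }, {}];
for (const doc of samples) {
  console.log(JSON.stringify(doc), truthyGuard(doc), numberGuard(doc), definedGuard(doc));
}
```

Note that typeof NaN is also "number", so a stricter guard might additionally use Number.isFinite if non-finite values must be excluded.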

Analyzers

{
  "_id": "_design/analyzer_example",
  "indexes": {
    "INDEX_NAME": {
      "index": "function (doc) { ... }",
      "analyzer": "$ANALYZER_NAME"
    }
  }
}

Analyzers are settings which define how to recognize terms within text. This can be helpful if you need to index multiple languages.

Here’s the list of generic analyzers supported by Cloudant search:

| Analyzer | Description |
|----------|-------------|
| classic | The standard Lucene analyzer, circa release 3.1. You’ll know if you need it. |
| email | Like the standard analyzer, but tries harder to match an email address as a complete token. |
| keyword | Input is not tokenized at all. |
| simple | Divides text at non-letters. |
| standard | The default analyzer. It implements the Word Break rules from the Unicode Text Segmentation algorithm. |
| whitespace | Divides text at whitespace boundaries. |
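
To get a feel for how the splitting rules of the simpler analyzers differ, here is a rough sketch that imitates only the descriptions in the table. These functions are approximations written for illustration and are not the Lucene implementations; real analyzers may also normalize case or apply other filters, so use the _search_analyze endpoint to see actual tokens.

```javascript
// Rough approximations of the table descriptions; NOT the Lucene analyzers.
const approximateAnalyzers = {
  // keyword: input is not tokenized at all.
  keyword: (text) => [text],
  // whitespace: divide text at whitespace boundaries.
  whitespace: (text) => text.split(/\s+/).filter(Boolean),
  // simple: divide text at anything that is not a letter.
  simple: (text) => text.split(/[^\p{L}]+/u).filter(Boolean),
};

const sample = "Meles meles, 0.6m long";
for (const [name, tokenize] of Object.entries(approximateAnalyzers)) {
  console.log(name, tokenize(sample));
}
```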

Language-Specific Analyzers

These analyzers omit very common words in the specific language, and many also remove prefixes and suffixes. The name of the language is also the name of the analyzer.

Per-Field Analyzers

{
  "_id": "_design/analyzer_example",
  "indexes": {
    "INDEX_NAME": {
      "analyzer": {
        "name": "perfield",
        "default": "english",
        "fields": {
          "spanish": "spanish",
          "german": "german"
        }
      },
      "index": "function (doc) { ... }"
    }
  }
}

The perfield analyzer configures multiple analyzers for different fields.
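
Because a design document is JSON, the index function has to be stored as a string. The sketch below assembles a design document with a perfield analyzer from a regular JavaScript function. The buildSearchDesignDoc helper, the index name translations, and the document fields doc.spanish, doc.german, and doc.english are assumptions made for the example.

```javascript
// Hypothetical helper: builds a design document object ready to be
// serialized with JSON.stringify and saved to the database.
function buildSearchDesignDoc(name, indexName, indexFn, analyzer) {
  return {
    _id: "_design/" + name,
    indexes: {
      [indexName]: {
        analyzer: analyzer,
        // Functions cannot be represented in JSON, so store the source text.
        index: indexFn.toString(),
      },
    },
  };
}

const ddoc = buildSearchDesignDoc(
  "analyzer_example",
  "translations",
  function (doc) {
    // Field names match the keys under "fields" in the analyzer definition.
    if (typeof doc.spanish === "string") { index("spanish", doc.spanish); }
    if (typeof doc.german === "string") { index("german", doc.german); }
    // Any other field falls back to the "default" analyzer (english).
    if (typeof doc.english === "string") { index("default", doc.english); }
  },
  { name: "perfield", default: "english", fields: { spanish: "spanish", german: "german" } }
);

console.log(JSON.stringify(ddoc, null, 2));
```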

Stop Words

{
  "_id": "_design/stop_words_example",
  "indexes": {
    "INDEX_NAME": {
      "analyzer": {
        "name": "portuguese",
        "stopwords": [
          "foo",
          "bar",
          "baz"
        ]
      },
      "index": "function (doc) { ... }"
    }
  }
}

Stop words are words that do not get indexed. You define them within a design document by turning the analyzer string into an object.

Testing analyzer tokenization

curl 'https://<account>.cloudant.com/_search_analyze' -H 'Content-Type: application/json' \
  -d '{"analyzer":"keyword", "text":"ablanks@renovations.com"}'
POST /_search_analyze HTTP/1.1
Host: <account>.cloudant.com
Content-Type: application/json
{"analyzer":"keyword", "text":"ablanks@renovations.com"}
{
  "tokens": [
    "ablanks@renovations.com"
  ]
}
curl 'https://<account>.cloudant.com/_search_analyze' -H 'Content-Type: application/json' \
  -d '{"analyzer":"standard", "text":"ablanks@renovations.com"}'
POST /_search_analyze HTTP/1.1
Host: <account>.cloudant.com
Content-Type: application/json
{"analyzer":"standard", "text":"ablanks@renovations.com"}
{
  "tokens": [
    "ablanks",
    "renovations.com"
  ]
}

You can test the results of analyzer tokenization by posting sample data to the _search_analyze endpoint or using our analyzer test form.

Queries

curl https://$USERNAME.cloudant.com/$DATABASE/_design/$DESIGN_DOC/_search/$INDEX_NAME?include_docs=true\&query="*:*"\&limit=1 \
     -u $USERNAME
GET /$DATABASE/_design/$DESIGN_DOC/_search/$INDEX_NAME?include_docs=true&query="*:*"&limit=1 HTTP/1.1
Content-Type: application/json
Host: account.cloudant.com
var nano = require('nano');
var account = nano("https://"+$USERNAME+":"+$PASSWORD+"@"+$USERNAME+".cloudant.com");
var db = account.use($DATABASE);

db.search($DESIGN_ID, $SEARCH_INDEX, {
  q: $QUERY
}, function (err, body, headers) {
  if (!err) {
    console.log(body);
  }
});

Once you’ve got an index written, you can query it with a GET request to https://$USERNAME.cloudant.com/$DATABASE/_design/$DESIGN_DOC/_search/$INDEX_NAME. Specify your search query in the query parameter, which can be abbreviated to q.
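
Query strings such as *:* contain characters that need URL encoding, so it can be convenient to assemble the request URL programmatically. The following is a minimal sketch using the standard URL class; the buildSearchUrl helper and the account, database, design document, and index names passed to it are placeholders chosen for the example.

```javascript
// Hypothetical helper: assembles a search URL and encodes the parameters.
function buildSearchUrl(account, database, designDoc, indexName, params) {
  const path = [database, "_design", designDoc, "_search", indexName]
    .map(encodeURIComponent)
    .join("/");
  const url = new URL("https://" + account + ".cloudant.com/" + path);
  for (const [key, value] of Object.entries(params)) {
    // searchParams takes care of percent-encoding each value.
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const searchUrl = buildSearchUrl("example", "animaldb", "search_example", "animals", {
  query: "*:*",
  include_docs: true,
  limit: 1,
});
console.log(searchUrl);
```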

Query Parameters

| Argument | Description | Optional | Type | Supported Values |
|----------|-------------|----------|------|------------------|
| bookmark | A bookmark that was received from a previous search. This allows you to page through the results. If there are no more results after the bookmark, you get a response with an empty rows array and the same bookmark. That way you can determine that you have reached the end of the result list. | yes | string | |
| counts | This field defines an array of names of string fields, for which counts should be produced. The response contains counts for each unique value of this field name among the documents matching the search query. Faceting must be enabled for this parameter to function. | yes | JSON | A JSON array of field names |
| drilldown | This field can be used several times. Each use defines a pair of a field name and a value. The search only matches documents that have the given value in the field name. It differs from using "fieldname:value" in the q parameter only in that the values are not analyzed. Faceting must be enabled for this parameter to function. | yes | JSON | A JSON array with two elements, the field name and the value. |
| group_field | Field by which to group search matches. | yes | String | A string containing the name of a string field. Fields containing other data (numbers, objects, arrays) cannot be used. |
| group_limit | Maximum group count. This field can only be used if group_field is specified. | yes | Numeric | |
| group_sort | This field defines the order of the groups in a search using group_field. The default sort order is relevance. | yes | JSON | This field can have the same values as the sort field, so single fields as well as arrays of fields are supported. |
| highlight_fields | Specifies which fields should be highlighted. If specified, the result object contains a highlights field with an entry for each specified field. | yes | Array of strings | |
| highlight_pre_tag | A string inserted before the highlighted word in the highlights output. | yes, defaults to <em> | String | |
| highlight_post_tag | A string inserted after the highlighted word in the highlights output. | yes, defaults to </em> | String | |
| highlight_number | Number of fragments returned in highlights. If the search term occurs less often than the number of fragments specified, longer fragments are returned. | yes, defaults to 1 | Numeric | |
| highlight_size | Number of characters in each fragment for highlights. | yes, defaults to 100 characters | Numeric | |
| include_docs | Include the full content of the documents in the response. | yes | boolean | |
| include_fields | A JSON array of field names to include in search results. Any fields included must have been indexed with the store:true option. | yes, the default is all fields | Array of strings | |
| limit | Limit the number of the returned documents to the specified number. In case of a grouped search, this parameter limits the number of documents per group. | yes | numeric | The limit value can be any positive integer number up to and including 200. |
| q | Abbreviation for query. Performs a Lucene query. | no | string or number | |
| query | A Lucene query. | no | string or number | |
| ranges | This field defines ranges for faceted, numeric search fields. The value is a JSON object where the field names are numeric, faceted search fields and the values of the fields are again JSON objects. Their field names are names for ranges. The values are strings describing the range, for example "[0 TO 10]". | yes | JSON | The value must be an object whose fields again have objects as their values. These objects must have strings describing ranges as their field values. |
| sort | Specifies the sort order of the results. In a grouped search (when group_field is used), this parameter specifies the sort order within a group. The default sort order is relevance. | yes | JSON | A JSON string of the form "fieldname<type>" or "-fieldname<type>" for descending order, where fieldname is the name of a string or number field, and type is either number or string, or a JSON array of such strings. The type part is optional and defaults to number. Some examples are "foo", "-foo", "bar<string>", "-foo<number>" and ["-foo<number>", "bar<string>"]. String fields used for sorting must not be analyzed fields. Fields used for sorting must be indexed by the same indexer used for the search query. |
| stale | Don’t wait for the index to finish building to return results. | yes | string | ok |

Relevance

When more than one result might be returned, it is possible for them to be sorted. By default, the sorting order is determined by ‘relevance’.

Relevance is measured according to Apache Lucene Scoring. As an example, if you search a simple database for the word “example”, there might be two documents that contain the word. If one document mentions the word “example” ten times, but the second document mentions it only twice, then the first document is considered to be more ‘relevant’.

If you do not provide a sort parameter, the default order used is relevance. The highest scoring matches are returned first.

If you provide a sort parameter, then matches are returned in that order, ignoring relevance.

If you still want to include the relevance ordering in your search results, use the special fields -<score> or <score> within the sort parameter.
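
The sort strings described in the query parameters table ("fieldname<type>", with an optional leading - for descending order) can be produced by a small helper. This is only a sketch: the sortField function and its option names are invented for the example, and the field names are the ones used elsewhere in this document.

```javascript
// Hypothetical helper: formats one sort entry as "fieldname<type>",
// prefixed with "-" for descending order. The type part is optional.
function sortField(name, { type, descending = false } = {}) {
  const suffix = type ? "<" + type + ">" : "";
  return (descending ? "-" : "") + name + suffix;
}

// Sort by diet, then by relevance score using the special <score> field.
const sortValue = JSON.stringify([
  sortField("diet", { type: "string" }),
  "-<score>",
]);
console.log(sortValue);
```

The resulting JSON text is what would be passed as the value of the sort parameter.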

POSTing search queries

POST /db/_design/ddoc/_search/searchname HTTP/1.1
Content-Type: application/json
Host: account.cloudant.com
curl 'https://account.cloudant.com/db/_design/ddoc/_search/searchname' -X POST -H 'Content-Type: application/json' -d @search.json
{
  "q": "index:my query",
  "sort": "foo",
  "limit": 3
}

Instead of using the GET HTTP method, you can also use POST. The main advantage of POST queries is that they can have a request body, so you can specify the request as a JSON object. Each parameter in the previous table corresponds to a field in the JSON object in the request body.

Query Syntax

// Birds
class:bird
// Animals that begin with the letter "l"
l*
// Carnivorous birds
class:bird AND diet:carnivore
// Herbivores that start with letter "l"
l* AND diet:herbivore
// Medium-sized herbivores
min_length:[1 TO 3] AND diet:herbivore
// Herbivores that are 2m long or less
diet:herbivore AND min_length:[-Infinity TO 2]
// Mammals that are at least 1.5m long
class:mammal AND min_length:[1.5 TO Infinity]
// Find "Meles meles"
latin_name:"Meles meles"
// Mammals that are herbivores or omnivores
diet:(herbivore OR omnivore) AND class:mammal
// Return all results
*:*

The Cloudant search query syntax is based on the Lucene syntax. Search queries take the form of name:value unless the name is omitted, in which case they use the default field, as demonstrated in the example provided.

Queries over multiple fields can be logically combined, and groups and fields can be further grouped. The available logical operators are case sensitive and are AND, +, OR, NOT and -. Range queries can run over strings or numbers.

If you want a fuzzy search, you can run a query with ~ to find terms similar to the search term. For instance, look~ finds the terms book and took.

You can alter the importance of a search term by adding ^ and a positive number. This makes matches containing the term more or less relevant, in proportion to the power of the boost value, with 1 as the default. Any decimal between 0 and 1 reduces importance. A value greater than 1 increases importance.

Wild card searches are supported, for both single (?) and multiple (*) character searches. dat? would match date and data, dat* would match date, data, database, and dates. Wildcards must come after the search term. Use *:* to return all results.

Result sets from searches are limited to 200 rows, and return 25 rows by default. The number of rows returned can be changed via the limit parameter. If the search query does not specify the "group_field" argument, the response contains a bookmark. If the bookmark is passed back as a URL parameter you’ll skip through the rows you’ve already seen and get the next set of results.
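
Paging with bookmarks can be sketched as a loop that keeps passing the latest bookmark back until an empty rows array is returned. In this sketch the search argument is a hypothetical function that performs one search request and resolves to the parsed response; fakeSearch is an in-memory stand-in used only so that the example runs without a database.

```javascript
// Collects all rows by following bookmarks. "search" is any async function
// that takes { q, limit, bookmark } and resolves to { rows, bookmark }.
async function fetchAllRows(search, query, limit) {
  const rows = [];
  let bookmark; // undefined on the first request
  for (;;) {
    const page = await search({ q: query, limit: limit, bookmark: bookmark });
    if (page.rows.length === 0) {
      break; // an empty rows array marks the end of the result list
    }
    rows.push(...page.rows);
    bookmark = page.bookmark;
  }
  return rows;
}

// In-memory stand-in for a real search call, for demonstration only.
// Real bookmarks are opaque strings; here the bookmark is just an offset.
function fakeSearch(allRows) {
  return async ({ limit, bookmark }) => {
    const start = bookmark ? Number(bookmark) : 0;
    const rows = allRows.slice(start, start + limit);
    return { rows: rows, bookmark: String(start + rows.length) };
  };
}

fetchAllRows(fakeSearch(["a", "b", "c", "d", "e"]), "*:*", 2).then((rows) => {
  console.log(rows);
});
```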

The following characters require escaping if you want to search on them: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

Escape these with a preceding backslash character.
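
A small helper can apply this escaping to user-supplied search terms. The following sketch covers exactly the characters listed above; the escapeLucene name is made up for the example, and it puts a single backslash in front of the two-character operators && and ||.

```javascript
// Hypothetical helper: puts a backslash in front of every special
// character (or two-character operator) from the list above.
function escapeLucene(term) {
  return term.replace(/[+\-!(){}\[\]^"~*?:\\\/]|&&|\|\|/g, "\\$&");
}

console.log(escapeLucene("AC/DC"));
console.log(escapeLucene("(1+1):2"));
```

Escape only the term itself, not the whole query; otherwise the field separator in name:value would be escaped as well.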

The response to a search query contains an order field for each of the results. The order field is an array where the first element is the field or fields specified in the sort parameter. If no sort parameter is included in the query, then the order field contains the Lucene relevance score. If using the ‘sort by distance’ feature as described in Geographical Searches, then the first element is the distance from a point, measured using either kilometers or miles.

Faceting

function(doc) {
  index("type", doc.type, {"facet": true});
  index("price", doc.price, {"facet": true});
}
?q=*:*&ranges={"price":{"cheap":"[0 TO 100]","expensive":"{100 TO Infinity}"}}
{
  "total_rows":100000,
  "bookmark":"g...",
  "rows":[...],
  "ranges": {
    "price": {
      "expensive": 278682,
      "cheap": 257023
    }
  }
}

Cloudant Search also supports faceted searching, which allows you to discover aggregate information about all your matches quickly and easily. You can match all documents using the special ?q=*:* query syntax, and use the returned facets to refine your query. To indicate a field should be indexed for faceted queries, set {"facet": true} in its options.

if (typeof doc.town == "string" && typeof doc.name == "string") {
  index("town", doc.town, {facet: true});
  index("name", doc.name, {facet: true});
}

Counts

?q=*:*&counts=["type"]
{
  "total_rows":100000,
  "bookmark":"g...",
  "rows":[...],
  "counts":{
    "type":{
      "sofa": 10,
      "chair": 100,
      "lamp": 97
    }
  }
}

The count facet syntax takes a list of fields, and returns the number of query results for each unique value of each named field.

Drilldown

Add drilldown=["dimension","label"] to a search query and restrict results to documents with dimension equal to the given label. You can include multiple drilldown parameters to restrict results along multiple dimensions.

Using a drilldown parameter is similar to using key:value in the q parameter, but the drilldown parameter returns values that the search’s analyzer might skip. For example, if the analyzer did not index a stop word like “a”, you can still match it by specifying drilldown=["key","a"].
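
The counts and drilldown parameters both take JSON values, and drilldown can be repeated. A minimal sketch of assembling such a query string follows; the buildFacetParams helper is hypothetical, and the type and price fields are taken from the faceting example earlier in this section.

```javascript
// Hypothetical helper: builds the query string for a faceted search.
function buildFacetParams(query, counts, drilldowns) {
  const params = new URLSearchParams();
  params.set("q", query);
  if (counts.length > 0) {
    // counts is a single JSON array of field names.
    params.set("counts", JSON.stringify(counts));
  }
  for (const [dimension, label] of drilldowns) {
    // Each drilldown is its own parameter holding a two-element JSON array.
    params.append("drilldown", JSON.stringify([dimension, label]));
  }
  return params;
}

const facetParams = buildFacetParams("*:*", ["type"], [["type", "sofa"], ["price", "cheap"]]);
console.log("?" + facetParams.toString());
```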

Ranges

?q=*:*&ranges={"price":{"cheap":"[0 TO 100]","expensive":"{100 TO Infinity}"}}
{
  "total_rows":100000,
  "bookmark":"g...",
  "rows":[...],
  "ranges": {
    "price": {
      "expensive": 278682,
      "cheap": 257023
    }
  }
}

The range facet syntax reuses the standard Lucene syntax for ranges to return counts of results which fit into each specified category. Inclusive range queries are denoted by brackets ([, ]). Exclusive range queries are denoted by curly brackets ({, }).
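
The difference between inclusive and exclusive bounds matters at the boundary value: with the ranges from the example, a price of exactly 100 counts as cheap but not as expensive. The sketch below mimics that rule locally for numeric ranges. The parseRange function is an illustration written for this document and is not how Lucene evaluates ranges.

```javascript
// Illustrative only: turns a range string such as "[0 TO 100]" or
// "{100 TO Infinity}" into a predicate over numbers.
function parseRange(text) {
  const match = /^([\[{])\s*(\S+)\s+TO\s+(\S+)\s*([\]}])$/.exec(text);
  if (!match) {
    throw new Error("Unrecognized range: " + text);
  }
  const [, open, low, high, close] = match;
  const min = Number(low);
  const max = Number(high);
  return (value) =>
    (open === "[" ? value >= min : value > min) &&   // [ is inclusive, { exclusive
    (close === "]" ? value <= max : value < max);    // ] is inclusive, } exclusive
}

const cheap = parseRange("[0 TO 100]");
const expensive = parseRange("{100 TO Infinity}");
console.log(cheap(100), expensive(100));
```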

Geographical searches

{
    "name":"Aberdeen, Scotland",
    "lat":57.15,
    "lon":-2.15,
    "type":"city"
}
function(doc) {
    if (doc.type && doc.type == 'city') {
        index('city', doc.name, {'store': true});
        index('lat', doc.lat, {'store': true});
        index('lon', doc.lon, {'store': true});
    }
}
curl 'https://docs.cloudant.com/examples/_design/cities-designdoc/_search/cities?q=lat:[0+TO+90]&sort="<distance,lon,lat,-74.0059,40.7127,km>"'
GET /examples/_design/cities-designdoc/_search/cities?q=lat:[0+TO+90]&sort="<distance,lon,lat,-74.0059,40.7127,km>" HTTP/1.1
Host: docs.cloudant.com
{
    "total_rows": 205,
    "bookmark": "g1AAAAEbeJzLYWBgYMlgTmGQS0lKzi9KdUhJMtfLTczJLyrRS87JL01JzCvRy0styQGqY0pkSLL___9_Fpjj5tDCOG974NGNieJZqAaY4DQgyQFIJtUjmyHXJivfY5PIgmaGKU4z8liAJEMDkAIasx9mTnPNv-PSgosTmbOI9QzEnAMQc-DuqY3U-vbZXTSRNSsLAMMnXIU",
    "rows": [
        {
            "id": "city180",
            "order": [
                8.530665755719783,
                18
            ],
            "fields": {
                "city": "New York, N.Y.",
                "lat": 40.78333333333333,
                "lon": -73.96666666666667
            }
        },
        {
            "id": "city177",
            "order": [
                13.756343205985946,
                17
            ],
            "fields": {
                "city": "Newark, N.J.",
                "lat": 40.733333333333334,
                "lon": -74.16666666666667
            }
        },
        {
            "id": "city178",
            "order": [
                113.53603438866077,
                26
            ],
            "fields": {
                "city": "New Haven, Conn.",
                "lat": 41.31666666666667,
                "lon": -72.91666666666667
            }
        }
    ]
}

In addition to searching by the content of textual fields, you can also sort your results by their distance from a geographic coordinate.

You will need to index two numeric fields (representing the longitude and latitude).

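You can then query using the special <distance...> sort field, which takes 5 parameters. As in the example <distance,lon,lat,-74.0059,40.7127,km>, these are the name of the longitude field, the name of the latitude field, the longitude of the origin, the latitude of the origin, and the units (km for kilometers or mi for miles).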

You can combine sorting by distance with any other search query, such as range searches on the latitude and longitude or queries involving non-geographical information. That way, you can search in a bounding box and narrow down the search with additional criteria.
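
The sort value from the example request can be assembled with a small helper. This is a sketch based only on the example <distance,lon,lat,-74.0059,40.7127,km>: the distanceSort function is hypothetical, and the unit codes km and mi are an assumption drawn from that example and from the kilometers-or-miles wording in this document.

```javascript
// Hypothetical helper: formats the special distance sort field as the
// quoted JSON string used in the example request.
function distanceSort(lonField, latField, originLon, originLat, units) {
  // Assumed unit codes: "km" (kilometers) or "mi" (miles).
  if (units !== "km" && units !== "mi") {
    throw new RangeError("units must be km or mi");
  }
  const parts = ["distance", lonField, latField, originLon, originLat, units];
  return JSON.stringify("<" + parts.join(",") + ">");
}

const geoSort = distanceSort("lon", "lat", -74.0059, 40.7127, "km");
console.log("q=lat:[0 TO 90]&sort=" + geoSort);
```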

Highlighting Search Terms

GET /movies/_design/searches/_search/movies?q=movie_name:Azazel&highlight_fields=["movie_name"]&highlight_pre_tag="<b>"&highlight_post_tag="</b>"&highlight_size=30&highlight_number=2 HTTP/1.1
Host: <account>.cloudant.com
Authorization: ...
curl "https://$user:$password@$account.cloudant.com/movies/_design/searches/_search/movies?q=movie_name:Azazel&highlight_fields=\[\"movie_name\"\]&highlight_pre_tag=\"<b>\"&highlight_post_tag=\"</b>\"&highlight_size=30&highlight_number=2"
{
  "highlights": {
    "movie_name": [
      " on the <b>Azazel</b> Orient Express",
      " <b>Azazel</b> manuals, you"
    ]
  }
}

Sometimes it is useful to get the context in which a search term was mentioned so that you can display more detailed results to a user. To do this, add the highlight_fields parameter to the search query, specifying the field names for which you would like excerpts with the highlighted search term to be returned. By default, the search term is placed in <em> tags to highlight it, but this can be overridden using the highlight_pre_tag and highlight_post_tag parameters. The length of the fragments is 100 characters by default. A different length can be requested with the highlight_size parameter. The highlight_number parameter controls the number of fragments returned, which defaults to 1.

In the response, a highlights field will be added with one subfield per field name. For each field, you will receive an array of fragments with the search term highlighted.

Search index metadata

GET /<DATABASE>/_design/<DDOC>/_search_info/<INDEX> HTTP/1.1
curl "https://$ACCOUNT.cloudant.com/$DATABASE/_design/$DDOC/_search_info/$INDEX" \
     -X GET -u "$USERNAME:$PASSWORD"

To retrieve information about a search index, you can send a GET request to the _search_info endpoint as shown in the example. DDOC refers to the design document that contains the index and INDEX is the name of the index.

{
  "name": "_design/DDOC/INDEX",
  "search_index": {
    "pending_seq": 7125496,
    "doc_del_count": 129180,
    "doc_count": 1066173,
    "disk_size": 728305827,
    "committed_seq": 7125496
  }
}

The response contains information about your index, such as the number of documents in the index and its size on disk.