What do 'store', 'index', '_all', '_source' mean in Elasticsearch?


TL;DR: the "store" option (and the _source field) controls whether a field is retrievable, while the "index" option controls whether it is searchable and how it is searched.

It's not always easy to guess exactly what's going on inside your Elasticsearch indexes.

For one, the actual representation of your data (i.e. tokens in an index) might look very different from the documents you index. You can control the whole process, but it can take a little digging.

For example, when defining a mapping for a type (i.e. how to map incoming documents to actual data structures in the index), you can define all sorts of options for documents and their attributes.

You can set options "index" and "store" when defining a mapping for a type. But what do they mean?

Two options that may strike you as a little odd are "index" and "store". It seems strange that ES (a search engine) would give you the option not to store some attribute, or to make it unsearchable. Also, what do indexing and storing mean? Are they synonyms?

Also, special fields _source and _all are not always fully understood. Using these effectively requires understanding what gets stored and what gets indexed, and how you can tune these settings to enhance your results.

What do fields _all and _source represent?

What is the inverted index?

The inverted index is a data structure (conceptually like a hash or map) that keeps all tokens together with references to the documents that contain them (not the whole documents!).

It is called an inverted index because tokens are the keys and document IDs are the values. In a regular (non-inverted) index, document IDs are the keys and the tokens each document contains are the values.
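
To make the idea concrete, here is a minimal Python sketch of an inverted index. It is only an illustration: the function name build_inverted_index and the sample documents are made up, the "analysis" is a naive lowercase-and-split, and the real structure used by ES is far more sophisticated.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each token to the set of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive stand-in for analysis: lowercase, then split on whitespace.
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


# Hypothetical documents, keyed by document ID.
docs = {
    1: "Jazz night in the bay area",
    2: "Rock night downtown",
}

inverted = build_inverted_index(docs)
print(sorted(inverted["night"]))  # [1, 2]
print(sorted(inverted["jazz"]))   # [1]
```

Note that the keys are tokens and the values are document IDs only; the documents themselves are not part of this structure.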

Retrievable fields vs searchable fields

One does not necessarily imply the other.

In general, a field can be retrieved (i.e. will be part of your search result) if it has been stored by ES. In other words, if it is included in field _source or if option "store" has been explicitly set to true for this field.

For a field to be searchable, on the other hand, mapping option "index" should be set to anything other than "no". The default value for this is "analyzed", which means it is indexed and analyzed by default.

The "_source" attribute

When you index a document into Elasticsearch, the original document (without any analyzing or tokenizing) is stored in a special field called _source.

You can disable this though (whether you should do so is another question).

Since this setting applies to the whole document, you need to configure it at the level of your type. For example, if you have an event type, this mapping disables the _source field (in other words, it prevents ES from storing the whole document by default):

{
  "event" : {
    "_source" : {
      "enabled" : false
    }
  }
}

If you do this, however, you will need to manually set each field's "store" option to true; otherwise it won't be retrievable. This brings us to the next topic.

Setting option "store" in mappings

This controls whether the value of a field is stored individually in the index, separately from _source.

Note that setting "store" to false (or not setting it at all) does not change the fact that ES still stores the whole original document on disk, and it can retrieve whatever individual field you ask for from the _source field (as long as _source is enabled).

Here is how you would tell ES to individually store the field (also called property) "title" alongside the original document:

{
  "event" : {
    "properties" : {
      "title" : {
        "type" : "string",
        "store" : true
      },
      "description" : {
        "type" : "string"
      }
    }
  }
}

"Why would I do that?" You might ask yourself, since Elasticsearch already stores the original document in the _source field?

Apart from being the only way to retrieve a field from ES if you have disabled _source, it might make life easier for ES if you know for sure you will only ever need that one field and not the whole document. Look at the SO answer given in the references for extra info.
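
To tie the last two sections together, here is a minimal Python sketch that builds a mapping with _source disabled and "store" enabled only on selected fields. The helper name event_mapping and its parameters are made up for illustration; the mapping layout simply follows the examples above (version 1.5 syntax).

```python
import json


def event_mapping(stored_fields, all_fields):
    """Build an 'event' mapping with _source disabled and some fields stored."""
    properties = {}
    for name in all_fields:
        field = {"type": "string"}
        if name in stored_fields:
            # Only stored fields stay retrievable once _source is disabled.
            field["store"] = True
        properties[name] = field
    return {
        "event": {
            "_source": {"enabled": False},
            "properties": properties,
        }
    }


mapping = event_mapping(stored_fields={"title"},
                        all_fields=["title", "description"])
print(json.dumps(mapping, indent=2))
```

With this mapping, "title" could still be retrieved, while "description" would be searchable but not retrievable.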

Setting option "index" in mappings

The default for every field is that it should be analyzed. This means that it will go through every step of the analysis process (i.e. char filters, tokenizers and token filters), and the resulting tokens will be added to the index.

You can set option "index" to make that field not searchable at all ("no") or to make it searchable as a single token, without any analysis ("not_analyzed"). The default value is "analyzed".

Coming back to our event type, let's pretend every event has a complex code like "ev-01-bay-area". If you ask ES to analyze this field (the default behaviour), it will most likely split it into these tokens: "ev", "01", "bay" and "area". This is how you would set "event_code" to not be analyzed, so that the code is indexed as a single token:

{
  "event": {
    "properties": {
      "event_code": { 
        "type": "string", 
        "index": "not_analyzed" 
      },
      "description": { 
        "type": "string" 
      }
    }
  }
}
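
The difference between the three values of "index" can be sketched in Python as follows. This is a toy model, not the real analysis chain: the functions analyze and index_terms are hypothetical, and the tokenizer is a rough approximation (lowercase, split on anything that is not a letter or digit).

```python
import re


def analyze(value):
    """Rough stand-in for default analysis: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]


def index_terms(value, index_option="analyzed"):
    """Return the terms that would end up in the inverted index for a field value."""
    if index_option == "no":
        return []            # not searchable at all
    if index_option == "not_analyzed":
        return [value]       # the whole value is a single term
    return analyze(value)    # default: analyzed


code = "ev-01-bay-area"
print(index_terms(code))                  # ['ev', '01', 'bay', 'area']
print(index_terms(code, "not_analyzed"))  # ['ev-01-bay-area']
print(index_terms(code, "no"))            # []
```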

Field _all

The _all field is a special field that contains the values of every other field in your document. It is convenient when you want to perform a search on all fields at the same time (probably the most common use case). In fact, when you don't specify a field for your search, the _all field is queried.

You can disable the _all field altogether and you can also prevent one field from being included in this field.

To disable the _all field altogether:

{
  "event": {
    "_all": {
      "enabled": false
    }
  }
}

To prevent a field (e.g. field "producer_name") from being included in the _all field (so that queries on the _all field will not include documents where the query matches on field "producer_name"):

{
  "event": {
    "properties": {
      "producer_name": {
        "type": "string",
        "include_in_all": false
      }
    }
  }
} 

Note that field "producer_name" will still be analyzed and returned in the query results. But, in order to search this field, you will have to query it specifically (e.g. specify it in a simple match query).
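
As a rough mental model, you can think of _all as the values of all included fields joined together. The Python sketch below illustrates this; build_all and the sample document are made up for illustration, and the real _all field is analyzed and indexed rather than kept as a plain string.

```python
def build_all(doc, properties):
    """Concatenate the values of every field not excluded via include_in_all."""
    parts = []
    for name, value in doc.items():
        # include_in_all defaults to true when it is not set in the mapping.
        if properties.get(name, {}).get("include_in_all", True):
            parts.append(str(value))
    return " ".join(parts)


properties = {
    "title": {"type": "string"},
    "producer_name": {"type": "string", "include_in_all": False},
}
doc = {"title": "Jazz night", "producer_name": "Acme Events"}

all_text = build_all(doc, properties)
print(all_text)  # Jazz night
```

A search on _all for "acme" would therefore not match this document, while a query that targets "producer_name" directly still could.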


Resources


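  • All information is valid for version 1.5. Newer versions changed some of these options (for example, 5.0 replaced the "string" type with "text" and "keyword", and the _all field was deprecated in 6.0), so check the documentation for your version.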
