Today we are going to learn about facets in Elasticsearch. In this article we use Elasticsearch 0.90.7 and follow the official documentation. Let's get started.
First, we index some documents to run facet queries against later. We create an index named articles with a type named article; the documents differ mainly in their tags field.
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "One", "tags" : ["foo"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Two", "tags" : ["foo", "bar"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Five", "tags" : ["doo", "alpha", "omega"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Six", "tags" : ["doo", "beep", "ultra"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Seven", "tags" : ["doo", "boop", "beta"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}'
[user@localhost ~]$ curl -XGET 'http://localhost:9200/articles/_mapping?pretty'
{
"articles" : {
"article" : {
"properties" : {
"tags" : {
"type" : "string"
},
"title" : {
"type" : "string"
}
}
}
}
}
As the mapping above shows, both fields are of type string. The official documentation says: "The field used for facet calculations must be of type numeric, date/time or be analyzed as a single token", and it points to the Mapping guide for details on the analysis process. With that in mind, let's experiment with different types of facets.
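To illustrate why the single-token requirement matters, here is a rough Python sketch. The function rough_analyze is a hypothetical stand-in that lowercases and splits on whitespace; it is not Elasticsearch's real analyzer, and the tag "New York" is made up for illustration. It compares the facet terms you would get from an analyzed multi-word value with those from a value kept as a single token.

```python
from collections import Counter

def rough_analyze(value):
    """Hypothetical stand-in for an analyzer: lowercase, split on whitespace."""
    return value.lower().split()

# Made-up tags; "New York" is not part of this article's sample data.
tags = ["New York", "foo", "New York"]

# Analyzed field: every token becomes its own facet term.
analyzed_counts = Counter(tok for tag in tags for tok in rough_analyze(tag))

# Single-token field: the whole value is counted as one facet term.
single_token_counts = Counter(tags)

print(analyzed_counts)
print(single_token_counts)
```

The tags indexed in this article are single lowercase words, so under this simplified model they would come out of analysis unchanged.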
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "query_string" : {"query" : "T*"} }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } } '
{
"took" : 90,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
} ]
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 0,
"total" : 5,
"other" : 0,
"terms" : [ {
"term" : "foo",
"count" : 2
}, {
"term" : "bar",
"count" : 2
}, {
"term" : "baz",
"count" : 1
} ]
}
}
}
The query_string query matched the two titles that start with T, and the terms facet counted the tags of those two hits. If the facet output is unclear, here is what each field means:
missing : The number of documents which have no value for the faceted field
total : The total number of term occurrences counted by the facet (here 5: two tags from Two plus three tags from Three)
other : The number of term occurrences not included in the returned terms (effectively other = total - the sum of the returned counts)
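The three fields above can be reproduced with a small Python sketch. This is only a toy model of a terms facet, written for this article; the function terms_facet and its size parameter are illustrative names and say nothing about how Elasticsearch computes facets internally.

```python
from collections import Counter

# The two documents matched by the T* query above.
hits = [
    {"title": "Two", "tags": ["foo", "bar"]},
    {"title": "Three", "tags": ["foo", "bar", "baz"]},
]

def terms_facet(docs, field, size=10):
    """Toy model of a terms facet; not Elasticsearch's implementation."""
    counts = Counter()
    missing = 0
    for doc in docs:
        values = doc.get(field) or []
        if not values:
            missing += 1           # document has no value for the field
        counts.update(values)      # one count per term occurrence
    top = counts.most_common(size)
    total = sum(counts.values())
    other = total - sum(n for _, n in top)
    return {"missing": missing, "total": total, "other": other, "terms": top}

result = terms_facet(hits, "tags")
print(result)
```

For these two hits the sketch gives missing 0, total 5 and other 0, the same numbers as the response above.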
Here is another example, this time matching titles that start with S:
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "query_string" : {"query" : "S*"} }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } } '
{
"took" : 17,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
} ]
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 0,
"total" : 6,
"other" : 0,
"terms" : [ {
"term" : "doo",
"count" : 2
}, {
"term" : "ultra",
"count" : 1
}, {
"term" : "boop",
"count" : 1
}, {
"term" : "beta",
"count" : 1
}, {
"term" : "beep",
"count" : 1
} ]
}
}
}
Okay, let's try other facet options. This is a match_all query with a terms facet on the tags field, limiting the facet output to the top 3 terms.
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "size" : 3 } } } }'
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"tag" : {
"_type" : "terms",
"missing" : 0,
"total" : 18,
"other" : 9,
"terms" : [ {
"term" : "doo",
"count" : 4
}, {
"term" : "foo",
"count" : 3
}, {
"term" : "bar",
"count" : 2
} ]
}
}
}
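The total of 18 and other of 9 in this response can be checked by hand. The Python sketch below is a toy calculation over the seven indexed tag lists, not Elasticsearch code; most_common(3) plays the role of "size" : 3.

```python
from collections import Counter

# Tags of the seven indexed documents, in the order they were indexed.
all_tags = [
    ["foo"], ["foo", "bar"], ["foo", "bar", "baz"],
    ["doo", "alpha", "omega"], ["doo", "beep", "ultra"],
    ["doo", "boop", "beta"], ["doo", "gamma", "beep"],
]

counts = Counter(tag for tags in all_tags for tag in tags)
total = sum(counts.values())             # every tag occurrence
top3 = counts.most_common(3)             # what "size" : 3 keeps
other = total - sum(n for _, n in top3)  # occurrences left out

print(total, top3, other)
```

Note that bar and beep both have a count of 2, so which of them takes the third slot depends on how ties are broken; the sketch makes no claim about Elasticsearch's tie-breaking.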
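Now we ask the facet to consider all terms by setting all_terms to true. Note that the response below still lists only ten terms; the eleventh distinct tag (alpha) is reported under other, so raise size if you want to see every term.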
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "all_terms" : true } } } } '
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"tag" : {
"_type" : "terms",
"missing" : 0,
"total" : 18,
"other" : 1,
"terms" : [ {
"term" : "doo",
"count" : 4
}, {
"term" : "foo",
"count" : 3
}, {
"term" : "beep",
"count" : 2
}, {
"term" : "bar",
"count" : 2
}, {
"term" : "ultra",
"count" : 1
}, {
"term" : "omega",
"count" : 1
}, {
"term" : "gamma",
"count" : 1
}, {
"term" : "boop",
"count" : 1
}, {
"term" : "beta",
"count" : 1
}, {
"term" : "baz",
"count" : 1
} ]
}
}
}
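With match_all every term has at least one hit, so the effect of all_terms is hard to see here. As I understand the documentation, all_terms also returns terms that match no hit, with a count of 0. The Python sketch below is a toy model of that behaviour under this assumption; it narrows the hits to titles starting with T, as in the first example.

```python
from collections import Counter

docs = [
    {"title": "One", "tags": ["foo"]},
    {"title": "Two", "tags": ["foo", "bar"]},
    {"title": "Three", "tags": ["foo", "bar", "baz"]},
    {"title": "Five", "tags": ["doo", "alpha", "omega"]},
    {"title": "Six", "tags": ["doo", "beep", "ultra"]},
    {"title": "Seven", "tags": ["doo", "boop", "beta"]},
    {"title": "Nine", "tags": ["doo", "gamma", "beep"]},
]

# Hits for a narrower query: titles starting with "T".
hits = [d for d in docs if d["title"].startswith("T")]
hit_counts = Counter(t for d in hits for t in d["tags"])

# Assumed all_terms behaviour: start from every term in the index,
# and give terms without a hit a count of 0.
index_terms = {t for d in docs for t in d["tags"]}
all_terms_counts = {t: hit_counts.get(t, 0) for t in sorted(index_terms)}

print(all_terms_counts)
```

The sketch yields eleven distinct terms, which is consistent with the response above showing ten terms plus other 1.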
How about excluding some terms from the facet output?
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "exclude" : ["boop", "baz", "beta", "gamma"] } } } }'
{
"took" : 24,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"tag" : {
"_type" : "terms",
"missing" : 0,
"total" : 18,
"other" : 4,
"terms" : [ {
"term" : "doo",
"count" : 4
}, {
"term" : "foo",
"count" : 3
}, {
"term" : "beep",
"count" : 2
}, {
"term" : "bar",
"count" : 2
}, {
"term" : "ultra",
"count" : 1
}, {
"term" : "omega",
"count" : 1
}, {
"term" : "alpha",
"count" : 1
} ]
}
}
}
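The response above keeps total at 18 and reports other as 4, which suggests the excluded terms are still counted but simply not listed. The Python sketch below is a toy model written to check that reading of the numbers; it is not Elasticsearch's implementation.

```python
from collections import Counter

# Tags of the seven indexed documents.
all_tags = [
    ["foo"], ["foo", "bar"], ["foo", "bar", "baz"],
    ["doo", "alpha", "omega"], ["doo", "beep", "ultra"],
    ["doo", "boop", "beta"], ["doo", "gamma", "beep"],
]
exclude = {"boop", "baz", "beta", "gamma"}

counts = Counter(tag for tags in all_tags for tag in tags)
total = sum(counts.values())

# Drop the excluded terms from the listing only.
shown = {term: n for term, n in counts.items() if term not in exclude}
other = total - sum(shown.values())

print(shown, total, other)
```

Seven terms remain in the listing, and the four excluded occurrences account for the other value of 4.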
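What if the facet should cover more than one field? The fields parameter takes a list of fields. This example has only one field worth faceting on, so the list contains only tags; you should try indexing documents with more fields to see the combined result.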
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "fields" : ["tags"], "size" : 10 } } } }'
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"tag" : {
"_type" : "terms",
"missing" : 0,
"total" : 18,
"other" : 1,
"terms" : [ {
"term" : "doo",
"count" : 4
}, {
"term" : "foo",
"count" : 3
}, {
"term" : "beep",
"count" : 2
}, {
"term" : "bar",
"count" : 2
}, {
"term" : "ultra",
"count" : 1
}, {
"term" : "omega",
"count" : 1
}, {
"term" : "gamma",
"count" : 1
}, {
"term" : "boop",
"count" : 1
}, {
"term" : "beta",
"count" : 1
}, {
"term" : "baz",
"count" : 1
} ]
}
}
}
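To give an idea of what a multi-field facet might return, here is a Python sketch with made-up data. The categories field and both documents are hypothetical and not part of this article's index, and the function multi_field_terms is a toy model that assumes the facet simply counts term occurrences across the listed fields.

```python
from collections import Counter

# Hypothetical documents with a second field, "categories", for illustration.
docs = [
    {"tags": ["foo"], "categories": ["news"]},
    {"tags": ["news", "bar"], "categories": ["sport"]},
]

def multi_field_terms(docs, fields):
    """Toy model: count term occurrences across all listed fields."""
    counts = Counter()
    for doc in docs:
        for field in fields:
            counts.update(doc.get(field, []))
    return counts

combined = multi_field_terms(docs, ["tags", "categories"])
print(combined.most_common())
```

In this model the term news is counted twice, once from tags and once from categories.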
What if you just want to count the documents that match a filter? A filter facet returns that count; here it counts the documents tagged doo.
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "facets" : { "doo_facet" : { "filter" : { "term" : { "tags" : "doo" } } } } }'
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"doo_facet" : {
"_type" : "filter",
"count" : 4
}
}
}
You can also use a query facet, which returns a similar count for the documents matching a query.
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "facets" : { "foo_facet" : { "query" : { "term" : { "tags" : "foo" } } } } }'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"foo_facet" : {
"_type" : "query",
"count" : 3
}
}
}
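Both counting facets above boil down to "how many documents match this condition". The Python sketch below is a toy model of that idea over the seven sample documents; count_matching is an illustrative helper, not an Elasticsearch API.

```python
docs = [
    {"title": "One", "tags": ["foo"]},
    {"title": "Two", "tags": ["foo", "bar"]},
    {"title": "Three", "tags": ["foo", "bar", "baz"]},
    {"title": "Five", "tags": ["doo", "alpha", "omega"]},
    {"title": "Six", "tags": ["doo", "beep", "ultra"]},
    {"title": "Seven", "tags": ["doo", "boop", "beta"]},
    {"title": "Nine", "tags": ["doo", "gamma", "beep"]},
]

def count_matching(docs, field, term):
    """Count documents whose field contains the term (toy filter/query facet)."""
    return sum(1 for doc in docs if term in doc.get(field, []))

doo_count = count_matching(docs, "tags", "doo")
foo_count = count_matching(docs, "tags", "foo")
print(doo_count, foo_count)
```

The counts agree with the doo_facet and foo_facet values in the two responses.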
To end this article, I leave some homework for you. You should also try the following facets, but do take note of the data type each facet operates on.
range
histogram
date histogram
statistical
terms stats
geo distance
In the next article, I will try out the newer version of facets, that is, aggregations.