First, let's index some data to run facet queries against later. We will create an index named articles with a type named article; the documents differ mainly in the tags field.
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "One", "tags" : ["foo"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Two", "tags" : ["foo", "bar"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Five", "tags" : ["doo", "alpha", "omega"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Six", "tags" : ["doo", "beep", "ultra"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Seven", "tags" : ["doo", "boop", "beta"]}'
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}'
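If you would rather index the same seven documents from a script, here is a minimal Python sketch. It assumes Elasticsearch is listening on localhost:9200 as in the curl commands above; the helper names build_request and index_all are my own, and the Content-Type header is only there because newer versions tend to expect it.

```python
import json
import urllib.request

# Assumption: same host, index and type as the curl commands above.
BASE_URL = "http://localhost:9200/articles/article"

ARTICLES = [
    {"title": "One", "tags": ["foo"]},
    {"title": "Two", "tags": ["foo", "bar"]},
    {"title": "Three", "tags": ["foo", "bar", "baz"]},
    {"title": "Five", "tags": ["doo", "alpha", "omega"]},
    {"title": "Six", "tags": ["doo", "beep", "ultra"]},
    {"title": "Seven", "tags": ["doo", "boop", "beta"]},
    {"title": "Nine", "tags": ["doo", "gamma", "beep"]},
]


def build_request(doc, base_url=BASE_URL):
    """Build (but do not send) a POST request that indexes one document."""
    body = json.dumps(doc).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(base_url, data=body, headers=headers, method="POST")


def index_all(docs=ARTICLES):
    """Send every document and return the parsed JSON responses."""
    responses = []
    for doc in docs:
        with urllib.request.urlopen(build_request(doc)) as resp:
            responses.append(json.load(resp))
    return responses

# Call index_all() yourself once the server is up; nothing is sent on import.
```

Building the request separately from sending it makes it easy to inspect the payloads before anything touches the cluster.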
[user@localhost ~]$ curl -XGET 'http://localhost:9200/articles/_mapping?pretty'
{
"articles" : {
"article" : {
"properties" : {
"tags" : {
"type" : "string"
},
"title" : {
"type" : "string"
}
}
}
}
}
Okay, as the mapping above shows, both fields in the articles index are of type string. The article notes that the field used for facet calculations must be of type numeric, date/time, or be analyzed as a single token (see the Mapping guide for details on the analysis process). Each of our tags is a single lowercase word, so every tag ends up as exactly one token and the tags field is safe to facet on. Okay, let's experiment with different types of facets.
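Before running the queries, here is a small sketch of why the single-token requirement matters. The toy_analyze function is only a stand-in I made up (lowercase, then split on whitespace), not the real Elasticsearch analyzer, and the sample tag values are hypothetical.

```python
from collections import Counter


def toy_analyze(value):
    # Stand-in analyzer (assumption): lowercase, then split on whitespace.
    return value.lower().split()


def facet_counts(tag_values, analyzed):
    # Count facet terms for a list of tag values, one value per document.
    counts = Counter()
    for value in tag_values:
        counts.update(toy_analyze(value) if analyzed else [value])
    return dict(counts)


# Hypothetical tag values, not part of the articles index above.
sample = ["foo", "New York", "New Jersey"]
analyzed_counts = facet_counts(sample, analyzed=True)
raw_counts = facet_counts(sample, analyzed=False)
```

With analysis, "New York" and "New Jersey" are broken into new, york and jersey, so the facet would report new with a count of 2 instead of the two original tags. A single-word tag such as foo comes out the same either way.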
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "query_string" : {"query" : "T*"} }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } } '
{
"took" : 90,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
} ]
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 0,
"total" : 5,
"other" : 0,
"terms" : [ {
"term" : "foo",
"count" : 2
}, {
"term" : "bar",
"count" : 2
}, {
"term" : "baz",
"count" : 1
} ]
}
}
}
So the query string search ran, and the facets section reports the count of each tag among the matching documents. If the other fields in the facet output are unclear, here is what they mean.
missing : The number of documents which have no value for the faceted field
total : The sum of the counts of all terms found in the matching documents (here 2 + 2 + 1 = 5)
other : The sum of the counts of the terms not included in the returned list (effectively other = total - sum of the returned counts)
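To make these three numbers concrete, here is a minimal Python sketch that recomputes them locally from the sample documents. The terms_facet function is my own illustration of the arithmetic, not how Elasticsearch implements it, and the order of terms with equal counts may differ from the real output.

```python
from collections import Counter

DOCS = [
    {"title": "One", "tags": ["foo"]},
    {"title": "Two", "tags": ["foo", "bar"]},
    {"title": "Three", "tags": ["foo", "bar", "baz"]},
    {"title": "Five", "tags": ["doo", "alpha", "omega"]},
    {"title": "Six", "tags": ["doo", "beep", "ultra"]},
    {"title": "Seven", "tags": ["doo", "boop", "beta"]},
    {"title": "Nine", "tags": ["doo", "gamma", "beep"]},
]


def terms_facet(docs, field="tags", size=10):
    """Recompute missing / total / other for a terms facet over docs."""
    counts = Counter()
    missing = 0
    for doc in docs:
        values = doc.get(field) or []
        if not values:
            missing += 1  # document has no value for the faceted field
        counts.update(values)
    total = sum(counts.values())    # every counted term occurrence
    top = counts.most_common(size)  # only the first `size` terms are returned
    other = total - sum(count for _, count in top)
    return {
        "missing": missing,
        "total": total,
        "other": other,
        "terms": [{"term": term, "count": count} for term, count in top],
    }


# Documents whose title starts with T, mimicking the T* query above.
t_hits = [doc for doc in DOCS if doc["title"].startswith("T")]
t_facet = terms_facet(t_hits)
small_facet = terms_facet(DOCS, size=3)
```

For the T* hits this gives total 5 and other 0, matching the response above. Over all seven documents with size=3, the three returned counts add up to 9 out of 18, so other becomes 9.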
Another example, this time matching titles that start with S:
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "query_string" : {"query" : "S*"} }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } } '
{
"took" : 17,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
} ]
},
"facets" : {
"tags" : {
"_type" : "terms",
"missing" : 0,
"total" : 6,
"other" : 0,
"terms" : [ {
"term" : "doo",
"count" : 2
}, {
"term" : "ultra",
"count" : 1
}, {
"term" : "boop",
"count" : 1
}, {
"term" : "beta",
"count" : 1
}, {
"term" : "beep",
"count" : 1
} ]
}
}
}
Okay, let's try other facet options. Here is a match_all query with a terms facet on the tags field, limiting the facet output to 3 terms.
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "size" : 3 } } } }'
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"tag" : {
"_type" : "terms",
"missing" : 0,
"total" : 18,
"other" : 9,
"terms" : [ {
"term" : "doo",
"count" : 4
}, {
"term" : "foo",
"count" : 3
}, {
"term" : "bar",
"count" : 2
} ]
}
}
}
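Now let's set all_terms so that the facet considers every term in the index. Note that the default size of 10 still applies: the index holds 11 distinct tags, so one of them (alpha) ends up in other.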
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "all_terms" : true } } } } '
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"tag" : {
"_type" : "terms",
"missing" : 0,
"total" : 18,
"other" : 1,
"terms" : [ {
"term" : "doo",
"count" : 4
}, {
"term" : "foo",
"count" : 3
}, {
"term" : "beep",
"count" : 2
}, {
"term" : "bar",
"count" : 2
}, {
"term" : "ultra",
"count" : 1
}, {
"term" : "omega",
"count" : 1
}, {
"term" : "gamma",
"count" : 1
}, {
"term" : "boop",
"count" : 1
}, {
"term" : "beta",
"count" : 1
}, {
"term" : "baz",
"count" : 1
} ]
}
}
}
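With match_all every term is hit anyway, so the effect of all_terms is hard to see in the output above. As far as I understand the option, it also returns the terms that no hit contains, with a count of 0. Here is a minimal local sketch of that idea in Python. The function name is my own, and the T* filter on titles is only an assumption used to get a smaller set of hits.

```python
from collections import Counter

DOCS = [
    {"title": "One", "tags": ["foo"]},
    {"title": "Two", "tags": ["foo", "bar"]},
    {"title": "Three", "tags": ["foo", "bar", "baz"]},
    {"title": "Five", "tags": ["doo", "alpha", "omega"]},
    {"title": "Six", "tags": ["doo", "beep", "ultra"]},
    {"title": "Seven", "tags": ["doo", "boop", "beta"]},
    {"title": "Nine", "tags": ["doo", "gamma", "beep"]},
]


def terms_facet_all_terms(hits, index_docs, field="tags"):
    """Count terms over the hits, then add every other indexed term with 0."""
    counts = Counter()
    for doc in hits:
        counts.update(doc.get(field, []))
    for doc in index_docs:
        for term in doc.get(field, []):
            counts.setdefault(term, 0)  # term exists in the index but not in the hits
    return dict(counts)


# Hits for a T*-style query: the titles Two and Three.
hits = [doc for doc in DOCS if doc["title"].startswith("T")]
result = terms_facet_all_terms(hits, DOCS)
```

Here foo, bar and baz keep their real counts, while doo, alpha and the rest show up with 0. The real facet would still cut this list at size.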
How about excluding some terms from the facet output?
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "exclude" : ["boop", "baz", "beta", "gamma"] } } } }'
{
"took" : 24,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"tag" : {
"_type" : "terms",
"missing" : 0,
"total" : 18,
"other" : 4,
"terms" : [ {
"term" : "doo",
"count" : 4
}, {
"term" : "foo",
"count" : 3
}, {
"term" : "beep",
"count" : 2
}, {
"term" : "bar",
"count" : 2
}, {
"term" : "ultra",
"count" : 1
}, {
"term" : "omega",
"count" : 1
}, {
"term" : "alpha",
"count" : 1
} ]
}
}
}
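Notice that total stays at 18 and other is 4, which is exactly the sum of the counts of the four excluded terms. Here is a minimal Python sketch that reproduces those numbers locally. It mirrors what this particular response shows; I am inferring the rule from the output, so treat it as an assumption rather than documented behaviour.

```python
from collections import Counter

DOCS = [
    {"title": "One", "tags": ["foo"]},
    {"title": "Two", "tags": ["foo", "bar"]},
    {"title": "Three", "tags": ["foo", "bar", "baz"]},
    {"title": "Five", "tags": ["doo", "alpha", "omega"]},
    {"title": "Six", "tags": ["doo", "beep", "ultra"]},
    {"title": "Seven", "tags": ["doo", "boop", "beta"]},
    {"title": "Nine", "tags": ["doo", "gamma", "beep"]},
]


def terms_facet_excluding(docs, exclude, field="tags", size=10):
    """Terms facet where excluded terms are hidden but still counted in total."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.get(field, []))
    total = sum(counts.values())
    # Drop the excluded terms, then keep the `size` most frequent ones.
    visible = Counter({t: c for t, c in counts.items() if t not in exclude})
    top = visible.most_common(size)
    other = total - sum(count for _, count in top)
    return {"total": total, "other": other, "terms": dict(top)}


facet = terms_facet_excluding(DOCS, exclude={"boop", "baz", "beta", "gamma"})
```

Because four terms are gone, only 7 distinct terms remain, which fits inside the default size of 10. That is why alpha appears in this output.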
What if I want the facet to cover only certain fields? The fields parameter takes a list of fields. Because this example has only one suitable field, the output just shows that field; you should try indexing more fields.
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "fields" : ["tags"], "size" : 10 } } } }'
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"tag" : {
"_type" : "terms",
"missing" : 0,
"total" : 18,
"other" : 1,
"terms" : [ {
"term" : "doo",
"count" : 4
}, {
"term" : "foo",
"count" : 3
}, {
"term" : "beep",
"count" : 2
}, {
"term" : "bar",
"count" : 2
}, {
"term" : "ultra",
"count" : 1
}, {
"term" : "omega",
"count" : 1
}, {
"term" : "gamma",
"count" : 1
}, {
"term" : "boop",
"count" : 1
}, {
"term" : "beta",
"count" : 1
}, {
"term" : "baz",
"count" : 1
} ]
}
}
}
What if you just want a count of the documents that match a certain filter?
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "facets" : { "doo_facet" : { "filter" : { "term" : { "tags" : "doo" } } } } }'
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"doo_facet" : {
"_type" : "filter",
"count" : 4
}
}
}
You can also use a query facet, which gives similar output to the above.
[user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "facets" : { "foo_facet" : { "query" : { "term" : { "tags" : "foo" } } } } }'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [ {
"_index" : "articles",
"_type" : "article",
"_id" : "WZXN-8BcSDehuM-l1tJE3w",
"_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",
"_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "JJNPiO3_SPOIiliXEfFnRA",
"_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "cJFllNNOSYa1SxQLaDSGqA",
"_score" : 1.0, "_source" : {"title" : "One", "tags" : ["foo"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "76AjyLVST4aRhY0JE2jlAw",
"_score" : 1.0, "_source" : {"title" : "Two", "tags" : ["foo", "bar"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "3f3LNtvOT0GmZ4FNpL4wxA",
"_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}
}, {
"_index" : "articles",
"_type" : "article",
"_id" : "HccmhIJOTXqX2XG6uGbuXw",
"_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}
} ]
},
"facets" : {
"foo_facet" : {
"_type" : "query",
"count" : 3
}
}
}
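Both the filter facet and the query facet boil down to counting the documents that match one condition. A minimal Python sketch of that counting over the sample documents follows; the function name count_matching is my own.

```python
DOCS = [
    {"title": "One", "tags": ["foo"]},
    {"title": "Two", "tags": ["foo", "bar"]},
    {"title": "Three", "tags": ["foo", "bar", "baz"]},
    {"title": "Five", "tags": ["doo", "alpha", "omega"]},
    {"title": "Six", "tags": ["doo", "beep", "ultra"]},
    {"title": "Seven", "tags": ["doo", "boop", "beta"]},
    {"title": "Nine", "tags": ["doo", "gamma", "beep"]},
]


def count_matching(docs, field, term):
    """Count the documents whose field contains the exact term."""
    return sum(1 for doc in docs if term in doc.get(field, []))


doo_count = count_matching(DOCS, "tags", "doo")  # same condition as doo_facet
foo_count = count_matching(DOCS, "tags", "foo")  # same condition as foo_facet
```

The counts come out as 4 for doo and 3 for foo, the same as the doo_facet and foo_facet results above.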
To end this article, I leave some homework for you. You should also try the following facets, but do take note of the data types each facet operates on.
range
histogram
date histogram
statistical
terms stats
geo distance
In the next article, I will try out the newer version of facets, that is, aggregations.