Sunday, May 24, 2015

Learning facets in elasticsearch 0.90

Today we are going to learn about facets in Elasticsearch. In this article we use Elasticsearch 0.90.7 and follow along with the official documentation. Let's get started.

First we index some data to run facet queries against later. We are going to create an index named articles with a type named article; the documents differ mainly in the tags field.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "One",  "tags" : ["foo"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Two",  "tags" : ["foo", "bar"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Five", "tags" : ["doo", "alpha", "omega"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Six", "tags" : ["doo", "beep", "ultra"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Seven", "tags" : ["doo", "boop", "beta"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}'  

 [user@localhost ~]$ curl -XGET 'http://localhost:9200/articles/_mapping?pretty'  
 {  
  "articles" : {  
   "article" : {  
    "properties" : {  
     "tags" : {  
      "type" : "string"  
     },  
     "title" : {  
      "type" : "string"  
     }  
    }  
   }  
  }  
 }  

Okay, as the mapping of the articles index above shows, both fields are of type string. From the article: "The field used for facet calculations must be of type numeric, date/time or be analyzed as a single token — see the Mapping guide for details on the analysis process." Each tag in our sample data is a single lowercase word, so the default analysis keeps it as one token. Okay, let's experiment with different types of facets.
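Before running the queries, here is a rough local sketch of why the "single token" requirement matters. It does not call Elasticsearch; rough_analyze is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters, which is a simplification of what a real analyzer does. The tags "New York" and "New Jersey" are made up for illustration and are not part of the sample data.

```python
import re
from collections import Counter

def rough_analyze(text):
    """Hypothetical, simplified analyzer: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Made-up multi-word tags (not in the sample index from this post).
tags = ["New York", "New Jersey"]

# Faceting on an analyzed field counts each token separately.
analyzed = Counter(tok for tag in tags for tok in rough_analyze(tag))

# Faceting on a field kept as a single token counts the whole value.
not_analyzed = Counter(tags)

print(analyzed["new"], analyzed["york"], analyzed["jersey"])
print(not_analyzed["New York"], not_analyzed["New Jersey"])
```

With multi-word tags, the analyzed version reports counts for "new", "york" and "jersey" rather than for the tags themselves, which is usually not what you want from a facet.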

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "query_string" : {"query" : "T*"} }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } } '  
 {  
  "took" : 90,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 2,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   } ]  
  },  
  "facets" : {  
   "tags" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 5,  
    "other" : 0,  
    "terms" : [ {  
     "term" : "foo",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "baz",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

So a query_string query was executed, and the facet reports the tag counts for the matching documents. If the facet output is unclear, here is what the fields mean.

missing : The number of documents which have no value for the faceted field
total   : The total number of term values counted by the facet across the matching documents (5 above: 2 + 2 + 1)
other   : The number of term values not included in the returned facet (effectively other = total - sum of the returned term counts)
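To make the arithmetic concrete, here is a minimal local sketch that recomputes missing, total and other for the two documents matched by the "T*" query. It is only an approximation written for this post, not how Elasticsearch implements facets; terms_facet and its size parameter are illustrative names.

```python
from collections import Counter

# Documents matched by the "T*" query above (titles "Two" and "Three").
docs = [
    {"title": "Two", "tags": ["foo", "bar"]},
    {"title": "Three", "tags": ["foo", "bar", "baz"]},
]

def terms_facet(docs, field, size=10):
    """Local approximation of a terms facet; not Elasticsearch code."""
    counts = Counter()
    missing = 0
    for doc in docs:
        values = doc.get(field) or []
        if not values:
            missing += 1              # document has no value for the field
        counts.update(values)         # count every term value
    total = sum(counts.values())      # all term values seen
    top = counts.most_common(size)    # the terms actually returned
    other = total - sum(count for _, count in top)
    return {
        "missing": missing,
        "total": total,
        "other": other,
        "terms": [{"term": term, "count": count} for term, count in top],
    }

result = terms_facet(docs, "tags")
print(result["missing"], result["total"], result["other"])  # 0 5 0
```

Calling terms_facet(docs, "tags", size=1) returns only "foo" and moves the remaining three term values into other.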

Another example,

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "query_string" : {"query" : "S*"} }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } } '  
 {  
  "took" : 17,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 2,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   } ]  
  },  
  "facets" : {  
   "tags" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 6,  
    "other" : 0,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "boop",  
     "count" : 1  
    }, {  
     "term" : "beta",  
     "count" : 1  
    }, {  
     "term" : "beep",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

Okay, let's try other facet options. Here is a match_all query with a terms facet on the tags field, limiting the facet output to 3 terms.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "size" : 3 } } } }'  
 {  
  "took" : 8,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 9,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "bar",  
     "count" : 2  
    } ]  
   }  
  }  
 }  

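Now we want the query to show counts for all the terms. The all_terms option also returns terms that match no hit (with a count of 0), but the output is still capped by size, which defaults to 10; that is why alpha is missing below and other is 1.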

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "all_terms" : true } } } } '  
 {  
  "took" : 3,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 1,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "beep",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "omega",  
     "count" : 1  
    }, {  
     "term" : "gamma",  
     "count" : 1  
    }, {  
     "term" : "boop",  
     "count" : 1  
    }, {  
     "term" : "beta",  
     "count" : 1  
    }, {  
     "term" : "baz",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

How about excluding some terms from the facet output?

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "exclude" : ["boop", "baz", "beta", "gamma"] } } } }'  
 {  
  "took" : 24,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 4,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "beep",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "omega",  
     "count" : 1  
    }, {  
     "term" : "alpha",  
     "count" : 1  
    } ]  
   }  
  }  
 }  
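In the response above, total stays at 18 while other rises to 4, which suggests that the excluded terms are still counted in total and simply end up in other. The following local sketch reproduces those two numbers under that assumption; it works on a plain Python copy of the sample tags and does not talk to Elasticsearch.

```python
from collections import Counter

# Tags of the seven sample articles indexed at the start of the post.
all_tags = [
    ["foo"],
    ["foo", "bar"],
    ["foo", "bar", "baz"],
    ["doo", "alpha", "omega"],
    ["doo", "beep", "ultra"],
    ["doo", "boop", "beta"],
    ["doo", "gamma", "beep"],
]
exclude = {"boop", "baz", "beta", "gamma"}

counts = Counter(tag for tags in all_tags for tag in tags)
total = sum(counts.values())  # every tag value, excluded or not

# Terms that remain after the exclude list is applied.
kept = {term: count for term, count in counts.items() if term not in exclude}
other = total - sum(kept.values())  # values that are not returned

print(total, other)  # 18 4
print(sorted(kept))
```

Seven terms remain after the exclusion, which is below the default size of 10, so nothing else is cut from the output.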

What if I want to facet on specific fields? The fields option accepts a list of field names. Because this example uses only the tags field, the output covers only that field; try indexing more fields to see the difference.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "fields" : ["tags"], "size" : 10 } } } }'  
 {  
  "took" : 6,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 1,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "beep",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "omega",  
     "count" : 1  
    }, {  
     "term" : "gamma",  
     "count" : 1  
    }, {  
     "term" : "boop",  
     "count" : 1  
    }, {  
     "term" : "beta",  
     "count" : 1  
    }, {  
     "term" : "baz",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

What if you just want a count of the documents that match a certain term? A filter facet does that.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "facets" : { "doo_facet" : { "filter" : { "term" : { "tags" : "doo" } } } } }'  
 {  
  "took" : 3,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "doo_facet" : {  
    "_type" : "filter",  
    "count" : 4  
   }  
  }  
 }  

You can also use a query facet, which gives similar output.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "facets" : { "foo_facet" : { "query" : { "term" : { "tags" : "foo" } } } } }'  
 {  
  "took" : 2,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "foo_facet" : {  
    "_type" : "query",  
    "count" : 3  
   }  
  }  
 }  
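Both the filter facet and the query facet above boil down to counting the documents that match a condition. As a sanity check, this small local sketch counts the sample articles by tag; count_matching is an illustrative helper, not part of any Elasticsearch client.

```python
# Tags of the seven sample articles, keyed by title.
articles = {
    "One": ["foo"],
    "Two": ["foo", "bar"],
    "Three": ["foo", "bar", "baz"],
    "Five": ["doo", "alpha", "omega"],
    "Six": ["doo", "beep", "ultra"],
    "Seven": ["doo", "boop", "beta"],
    "Nine": ["doo", "gamma", "beep"],
}

def count_matching(articles, tag):
    """Count the documents whose tags contain the given tag."""
    return sum(1 for tags in articles.values() if tag in tags)

print(count_matching(articles, "doo"))  # 4, as in doo_facet
print(count_matching(articles, "foo"))  # 3, as in foo_facet
```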

To end this article, I leave some homework for you. You should also try the following facets, but do take note of the data types each facet operates on.
range
histogram
date histogram
statistical
terms stats
geo distance

In the next article, I will try out the newer version of facets, that is, aggregations.
