Showing posts with label Learning. Show all posts

Friday, June 5, 2015

Learning aggregations in elasticsearch 1.5 - part1

In the last article we learned about elasticsearch facets; in this article we will learn the newer aggregations framework, which supersedes facets. I recommend reading the previous article first. It gives you an idea of what came before and will help you follow this one.

Before we jump into elasticsearch aggregations, let's first look at the taxonomy of the data. We will use an example that you and I deal with every day: food. Food includes fruits, and since a wide variety of fruits is available, we will use different types of fruit in the queries later. As you may have guessed, the index is foods and the type is fruits. Unlike the previous article, this time we will create the mapping first. Giving each field of a fruit an explicit type is what lets us run range, date, or geo queries on it later.

Okay, let's get started. The following script creates the index and its mapping. You should also be able to retrieve it here.

#!/bin/bash

# create the foods index
curl -XPUT 'http://localhost:9200/foods/?pretty'

# wait until the index's primary shards are allocated, instead of sleeping a fixed time
curl 'http://localhost:9200/_cluster/health/foods?wait_for_status=yellow&timeout=30s&pretty'

# create the mapping for the fruits type
curl -XPUT 'http://localhost:9200/foods/_mapping/fruits' -d '
{
  "fruits" : {
    "properties" : {
      "insert_date"     : { "type" : "date" },
      "name"            : { "type" : "string" },
      "grade"           : { "type" : "string" },
      "price"           : { "type" : "float" },
      "price_date"      : { "type" : "date" },
      "staff_update"    : { "type" : "object", "properties" : { "staff" : { "type" : "object", "properties" : { "id" : { "type" : "string" }, "name" : { "type" : "string" } } } } },
      "quantity"        : { "type" : "integer" },
      "quantity_max"    : { "type" : "integer" },
      "quantity_min"    : { "type" : "integer" },
      "tags"            : { "type" : "string" },
      "quantity_enough" : { "type" : "boolean" },
      "suppliers"       : { "type" : "nested", "properties" : { "vendor_name" : { "type" : "string" }, "vendor_ip" : { "type" : "ip" }, "vendor_coordinate" : { "type" : "string" } } }
    }
  }
}'

Now let's check that the mapping is okay and look at the health of the cluster.

[user@localhost ~]$ curl 'localhost:9200/_cat/health?v'
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks
1431691876 14:11:16  elasticsearch green           3         3     10   5    0    0        0             0
[user@localhost ~]$ curl -XGET 'http://localhost:9200/foods/_mapping/?pretty'
{
  "foods" : {
    "mappings" : {
      "fruits" : {
        "properties" : {
          "grade" : {
            "type" : "string"
          },
          "insert_date" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "name" : {
            "type" : "string"
          },
          "price" : {
            "type" : "float"
          },
          "price_date" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "quantity" : {
            "type" : "integer"
          },
          "quantity_enough" : {
            "type" : "boolean"
          },
          "quantity_max" : {
            "type" : "integer"
          },
          "quantity_min" : {
            "type" : "integer"
          },
          "staff_update" : {
            "properties" : {
              "staff" : {
                "properties" : {
                  "id" : {
                    "type" : "string"
                  },
                  "name" : {
                    "type" : "string"
                  }
                }
              }
            }
          },
          "suppliers" : {
            "type" : "nested",
            "properties" : {
              "vendor_coordinate" : {
                "type" : "string"
              },
              "vendor_ip" : {
                "type" : "ip"
              },
              "vendor_name" : {
                "type" : "string"
              }
            }
          },
          "tags" : {
            "type" : "string"
          }
        }
      }
    }
  }
}
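If you script these steps, you may want to check the status in code rather than by eye. Below is a small sketch that reads the verbose `_cat/health?v` output shown above and prints the status column. It finds that column by its header name, so it does not depend on column positions. The function name `cluster_status` is hypothetical.

```shell
#!/bin/bash
# Read verbose _cat/health?v output on stdin and print the value of the
# "status" column from the first data row.
cluster_status() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") col = i; next }
       col     { print $col; exit }'
}

# Example against a live node (assumes localhost:9200):
#   status="$(curl -s 'localhost:9200/_cat/health?v' | cluster_status)"
#   [ "$status" = "green" ] || echo "cluster is $status"
```

The cat APIs also accept an `h` parameter (for example `_cat/health?h=status`) that asks the node for just the columns you name.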

Looks good, so we are ready to index some sample data. One sample document is below, but you should be able to get more here.

curl -XPOST "http://localhost:9200/foods/fruits/1?pretty" -d '
{
  "insert_date"     : "2015-05-15T20:18:50",
  "name"            : "apple-a",
  "grade"           : "A",
  "price"           : 4.98,
  "price_date"      : "2015-05-15",
  "staff_update"    : {"staff" : {"id" : 9739, "name" : "b"} },
  "quantity"        : 20,
  "quantity_max"    : 30,
  "quantity_min"    : 10,
  "tags"            : ["fruits", "foods", "large", "red"],
  "quantity_enough" : true,
  "suppliers"       : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]
}'
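To load more than a handful of documents, the bulk API saves you one request per document. Here is a sketch that assumes the extra sample data is stored one JSON document per line in a file (`fruits.json` is a hypothetical name). The hypothetical `to_bulk` function wraps each document in the action line that the `_bulk` endpoint expects, numbering ids from 1.

```shell
#!/bin/bash
# Turn one-JSON-document-per-line input (stdin) into a _bulk request body
# for the foods index / fruits type, assigning ids 1, 2, 3, ...
to_bulk() {
  local id=1 doc
  # "|| [ -n "$doc" ]" keeps a final line that has no trailing newline
  while IFS= read -r doc || [ -n "$doc" ]; do
    if [ -z "$doc" ]; then continue; fi   # skip blank lines
    printf '{ "index" : { "_index" : "foods", "_type" : "fruits", "_id" : "%d" } }\n' "$id"
    printf '%s\n' "$doc"
    id=$((id + 1))
  done
}

# Example (fruits.json is hypothetical):
#   to_bulk < fruits.json > fruits.bulk
#   curl -XPOST 'http://localhost:9200/_bulk?pretty' --data-binary @fruits.bulk
```

Use `--data-binary` in the curl call. With `-d @file`, curl strips the newlines that separate the bulk lines.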

Okay, let's get into the actual work, starting with the min aggregation.
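The post shows only the response, not the request that produced it. Below is a minimal sketch of a search body that would give the min_price result. It assumes the aggregation ran on price with no query, since all three documents come back as hits. The function name `min_price_body` and the `--send` switch are hypothetical helpers, not part of the original scripts.

```shell
#!/bin/bash
# Print a search body with one "min" metric aggregation over the price field.
# "min_price" is only the name the result is reported under in the response.
min_price_body() {
  cat <<'EOF'
{
  "aggs" : {
    "min_price" : { "min" : { "field" : "price" } }
  }
}
EOF
}

# Only contact elasticsearch when run with --send (assumes a node on localhost:9200).
if [ "${1:-}" = "--send" ]; then
  curl -XGET 'http://localhost:9200/foods/fruits/_search?pretty' -d "$(min_price_body)"
fi
```

Run the file with `--send` to issue the request. Without the switch it only defines the function.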

 {  
  "took" : 131,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 1.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 1.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 0.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "min_price" : {  
    "value" : 0.9900000095367432  
   }  
  }  
 }  

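The result is 0.9900000095367432 rather than 0.99 because price is mapped as float, a 32-bit type. The indexed value is the nearest 32-bit approximation of 0.99, and the aggregation reports it as a 64-bit double, which exposes the extra digits. Mapping price as double would avoid most of this noise. Let's move on to the next example, the max aggregation.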
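Again, the request is not shown. This sketch assumes the max ran on price. It also adds `"size" : 0`, which the post did not use. That setting makes the response carry only hits.total and the aggregation, instead of repeating every document as the responses here do. Remove it if you want the hits back. The helper name is hypothetical.

```shell
#!/bin/bash
# Search body for a "max" aggregation on price. "size" : 0 is an addition
# (not in the post) that suppresses the hit list so only the aggregation returns.
max_price_body() {
  cat <<'EOF'
{
  "size" : 0,
  "aggs" : {
    "max_price" : { "max" : { "field" : "price" } }
  }
}
EOF
}

# Only contact elasticsearch when run with --send (assumes a node on localhost:9200).
if [ "${1:-}" = "--send" ]; then
  curl -XGET 'http://localhost:9200/foods/fruits/_search?pretty' -d "$(max_price_body)"
fi
```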

 {  
  "took" : 5,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 1.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 1.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 0.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "max_price" : {  
    "value" : 1.9800000190734863  
   }  
  }  
 }  

Next, the sum aggregation.
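Here is a sketch of a matching request, assuming the sum ran over price. The prices add up to 0.99 + 1.38 + 1.98 = 4.35, which matches the result apart from the float rounding. The helper name is hypothetical.

```shell
#!/bin/bash
# Search body for a "sum" aggregation over price, reported as sum_all_item_price.
sum_price_body() {
  cat <<'EOF'
{
  "aggs" : {
    "sum_all_item_price" : { "sum" : { "field" : "price" } }
  }
}
EOF
}

# Only contact elasticsearch when run with --send (assumes a node on localhost:9200).
if [ "${1:-}" = "--send" ]; then
  curl -XGET 'http://localhost:9200/foods/fruits/_search?pretty' -d "$(sum_price_body)"
fi
```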

 {  
  "took" : 5,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 1.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 1.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 0.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "sum_all_item_price" : {  
    "value" : 4.350000023841858  
   }  
  }  
 }  

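Then the average, using the avg aggregation. Although the result is named avg_grade, it is the average of price: 1.45 is 4.35 / 3, and grade is a string field, which cannot be averaged.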
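Here is a sketch of the request. It keeps the post's result name avg_grade but points the aggregation at the numeric price field, matching the 1.45 value. The helper name is hypothetical.

```shell
#!/bin/bash
# Search body for an "avg" aggregation. The name avg_grade is kept from the
# response, but the field must be numeric, so it averages price.
avg_price_body() {
  cat <<'EOF'
{
  "aggs" : {
    "avg_grade" : { "avg" : { "field" : "price" } }
  }
}
EOF
}

# Only contact elasticsearch when run with --send (assumes a node on localhost:9200).
if [ "${1:-}" = "--send" ]; then
  curl -XGET 'http://localhost:9200/foods/fruits/_search?pretty' -d "$(avg_price_body)"
fi
```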

 {  
  "took" : 5,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 1.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 1.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 0.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "avg_grade" : {  
    "value" : 1.450000007947286  
   }  
  }  
 }  

Something different now: the stats aggregation. This one is handy because it combines all of the above outputs (count, min, max, avg and sum) into one result.
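Here is a sketch of the request, assuming stats over price and reusing the response key prices_stats as the aggregation name. The helper name is hypothetical.

```shell
#!/bin/bash
# Search body for a "stats" aggregation: count, min, max, avg and sum of price
# come back together under prices_stats.
price_stats_body() {
  cat <<'EOF'
{
  "aggs" : {
    "prices_stats" : { "stats" : { "field" : "price" } }
  }
}
EOF
}

# Only contact elasticsearch when run with --send (assumes a node on localhost:9200).
if [ "${1:-}" = "--send" ]; then
  curl -XGET 'http://localhost:9200/foods/fruits/_search?pretty' -d "$(price_stats_body)"
fi
```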

 {  
  "took" : 5,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 1.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 1.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 0.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "prices_stats" : {  
    "count" : 3,  
    "min" : 0.9900000095367432,  
    "max" : 1.9800000190734863,  
    "avg" : 1.450000007947286,  
    "sum" : 4.350000023841858  
   }  
  }  
 }  

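And if you want even more statistics, try the extended_stats aggregation. On top of stats it returns sum_of_squares, variance, std_deviation and std_deviation_bounds. The bounds are avg plus or minus two standard deviations (1.45 ± 2 × 0.407).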
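The request differs from the stats one only in the aggregation type. Here is a sketch, assuming price again and the default bounds. The helper name is hypothetical.

```shell
#!/bin/bash
# Search body for an "extended_stats" aggregation over price; the response adds
# sum_of_squares, variance, std_deviation and std_deviation_bounds to stats.
price_extended_stats_body() {
  cat <<'EOF'
{
  "aggs" : {
    "prices_stats" : { "extended_stats" : { "field" : "price" } }
  }
}
EOF
}

# Only contact elasticsearch when run with --send (assumes a node on localhost:9200).
if [ "${1:-}" = "--send" ]; then
  curl -XGET 'http://localhost:9200/foods/fruits/_search?pretty' -d "$(price_extended_stats_body)"
fi
```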

 {  
  "took" : 3,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 1.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 1.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 0.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "prices_stats" : {  
    "count" : 3,  
    "min" : 0.9900000095367432,  
    "max" : 1.9800000190734863,  
    "avg" : 1.450000007947286,  
    "sum" : 4.350000023841858,  
    "sum_of_squares" : 6.804900081253052,  
    "variance" : 0.16580000403722148,  
    "std_deviation" : 0.4071854663875191,  
    "std_deviation_bounds" : {  
     "upper" : 2.264370940722324,  
     "lower" : 0.6356290751722476  
    }  
   }  
  }  
 }  

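Next, the value_count aggregation, which counts how many values the field holds across the matching documents: 3 prices here. Note that the sample prices were changed before this run, so from here on the documents show 4.98, 3.38 and 2.99.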
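Here is a sketch of the request, assuming the values were counted on price (the response key is prices_count). The helper name is hypothetical.

```shell
#!/bin/bash
# Search body for a "value_count" aggregation: the number of price values
# extracted from the matching documents.
price_count_body() {
  cat <<'EOF'
{
  "aggs" : {
    "prices_count" : { "value_count" : { "field" : "price" } }
  }
}
EOF
}

# Only contact elasticsearch when run with --send (assumes a node on localhost:9200).
if [ "${1:-}" = "--send" ]; then
  curl -XGET 'http://localhost:9200/foods/fruits/_search?pretty' -d "$(price_count_body)"
fi
```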

 {  
  "took" : 9,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 4.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 3.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 2.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "prices_count" : {  
    "value" : 3  
   }  
  }  
 }  

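Next, the percentiles aggregation. This is a nice way to see how your data is distributed. By default it reports the 1st, 5th, 25th, 50th, 75th, 95th and 99th percentiles, so you can see where the bulk of the values lie and spot outliers.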
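The result fits the quantity field: the values are 9, 18 and 20, and the median is 18. The seven percentiles shown are the defaults, so the percents array in this sketch only spells them out, and leaving it out should give the same keys. The helper name is hypothetical.

```shell
#!/bin/bash
# Search body for a "percentiles" aggregation on quantity. The percents list
# repeats the default set explicitly; edit it to ask for other percentiles.
quantity_percentiles_body() {
  cat <<'EOF'
{
  "aggs" : {
    "quantity_outlier" : {
      "percentiles" : {
        "field" : "quantity",
        "percents" : [1, 5, 25, 50, 75, 95, 99]
      }
    }
  }
}
EOF
}

# Only contact elasticsearch when run with --send (assumes a node on localhost:9200).
if [ "${1:-}" = "--send" ]; then
  curl -XGET 'http://localhost:9200/foods/fruits/_search?pretty' -d "$(quantity_percentiles_body)"
fi
```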

 {  
  "took" : 15,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 4.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 3.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 2.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "quantity_outlier" : {  
    "values" : {  
     "1.0" : 9.18,  
     "5.0" : 9.9,  
     "25.0" : 13.5,  
     "50.0" : 18.0,  
     "75.0" : 19.0,  
     "95.0" : 19.8,  
     "99.0" : 19.96  
    }  
   }  
  }  
 }  

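The percentile_ranks aggregation works the other way around: you supply values, and it reports what percentage of the data falls below each of them.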
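The values 15 and 30 come from the response keys. The field itself is not visible in the post, so quantity is only a guess based on the name quantity_outlier, and your numbers may differ from the ones shown. The helper name is hypothetical.

```shell
#!/bin/bash
# Search body for a "percentile_ranks" aggregation. The field (quantity) is an
# assumption; "values" lists the points whose percentile rank you want.
quantity_ranks_body() {
  cat <<'EOF'
{
  "aggs" : {
    "quantity_outlier" : {
      "percentile_ranks" : {
        "field" : "quantity",
        "values" : [15, 30]
      }
    }
  }
}
EOF
}

# Only contact elasticsearch when run with --send (assumes a node on localhost:9200).
if [ "${1:-}" = "--send" ]; then
  curl -XGET 'http://localhost:9200/foods/fruits/_search?pretty' -d "$(quantity_ranks_body)"
fi
```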

 {  
  "took" : 4,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 4.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 3.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 2.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "quantity_outlier" : {  
    "values" : {  
     "15.0" : 0.0,  
     "30.0" : 100.0  
    }  
   }  
  }  
 }  

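The cardinality aggregation counts distinct values, here the grades A, B and C. The count is approximate, though it is expected to be close to accurate at low cardinalities like this one. Note that this aggregation is marked experimental in this release, so it may change or be removed in the future.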
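Here is a sketch of the request, assuming the distinct count ran on grade: three distinct grades give the value 3. The helper name is hypothetical.

```shell
#!/bin/bash
# Search body for a "cardinality" aggregation: the (approximate) number of
# distinct grade values, reported as grade_count.
grade_count_body() {
  cat <<'EOF'
{
  "aggs" : {
    "grade_count" : { "cardinality" : { "field" : "grade" } }
  }
}
EOF
}

# Only contact elasticsearch when run with --send (assumes a node on localhost:9200).
if [ "${1:-}" = "--send" ]; then
  curl -XGET 'http://localhost:9200/foods/fruits/_search?pretty' -d "$(grade_count_body)"
fi
```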

 {  
  "took" : 21,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 4.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 3.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 2.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "grade_count" : {  
    "value" : 3  
   }  
  }  
 }  
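The aggregation name grade_count suggests the cardinality ran on the grade field, and the three sample documents have the grades A, B and C, hence 3. As a plain Java sketch of what is being counted, here is an exact distinct count. Keep in mind that Elasticsearch's cardinality aggregation estimates this value (HyperLogLog++) rather than computing it exactly, so on large data sets the two can differ slightly.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Exact distinct count, for illustration only. Elasticsearch's cardinality
// aggregation estimates this value instead of computing it exactly.
class CardinalitySketch {

    // Number of distinct values in the list
    static int distinctCount(List<?> values) {
        return new HashSet<Object>(values).size();
    }

    public static void main(String[] args) {
        // grade and staff_update.staff.id of apple-a, apple-b and apple-c
        List<String> grades = Arrays.asList("A", "B", "C");
        List<Integer> staffIds = Arrays.asList(9739, 7795, 7795);
        System.out.println(distinctCount(grades));   // same value as grade_count above
        System.out.println(distinctCount(staffIds)); // 7795 appears twice, so it counts once
    }
}
```

The staff id example shows the point of the aggregation: duplicates (7795 appears in two documents) count only once.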

geo bounds aggregation

 {  
  "took" : 4,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 0.8465736,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 0.8465736,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 2.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 0.70273256,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 4.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 0.70273256,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 3.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "viewport" : { }  
  }  
 }  
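The viewport came back empty in this run, so the response does not show what geo_bounds computes. As a rough sketch, here is the bounding box done by hand in Java over the three distinct supplier coordinates in the sample documents, read as "lat,lon" strings (a format geo_point accepts). It assumes those coordinates would be indexed as geo points, which this response does not confirm, and it does not handle boxes that cross the dateline.

```java
import java.util.Arrays;
import java.util.List;

// Bounding box over "lat,lon" strings, reported like a geo_bounds viewport
// (top_left and bottom_right). Simplification: longitudes are not wrapped
// across the dateline, and an empty list yields infinities.
class GeoBoundsSketch {

    // Returns {top, left, bottom, right}, i.e. {max lat, min lon, min lat, max lon}
    static double[] viewport(List<String> latLons) {
        double top = Double.NEGATIVE_INFINITY, bottom = Double.POSITIVE_INFINITY;
        double left = Double.POSITIVE_INFINITY, right = Double.NEGATIVE_INFINITY;
        for (String s : latLons) {
            String[] parts = s.split(",");
            double lat = Double.parseDouble(parts[0].trim());
            double lon = Double.parseDouble(parts[1].trim());
            top = Math.max(top, lat);
            bottom = Math.min(bottom, lat);
            left = Math.min(left, lon);
            right = Math.max(right, lon);
        }
        return new double[] {top, left, bottom, right};
    }

    public static void main(String[] args) {
        // distinct vendor_coordinate values from the three sample documents
        List<String> coords = Arrays.asList("41.72,-10.35", "45.72,8.35", "11.72,18.72");
        double[] v = viewport(coords);
        System.out.println("top_left: lat " + v[0] + ", lon " + v[1]);
        System.out.println("bottom_right: lat " + v[2] + ", lon " + v[3]);
    }
}
```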

top hits aggregation

 {  
  "took" : 38,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 3,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "1",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-15T20:18:50",  
   "name"      : "apple-a",  
   "grade"      : "A",  
   "price"      : 4.98,  
   "price_date"   : "2015-05-15",  
   "staff_update"  : {"staff" : {"id" : 9739, "name" : "John Smith"} },  
   "quantity"    : 20,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "large", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "2",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:50",  
   "name"      : "apple-b",  
   "grade"      : "B",  
   "price"      : 3.38,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 18,  
   "quantity_max"  : 30,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "medium", "red"],  
   "quantity_enough" : true,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}]  
 }  
   }, {  
    "_index" : "foods",  
    "_type" : "fruits",  
    "_id" : "3",  
    "_score" : 1.0,  
    "_source":  
 {  
   "insert_date"   : "2015-05-14T20:18:55",  
   "name"      : "apple-c",  
   "grade"      : "C",  
   "price"      : 2.99,  
   "price_date"   : "2015-05-14",  
   "staff_update"  : {"staff" : {"id" : 7795, "name" : "Tide Hunter"} },  
   "quantity"    : 9,  
   "quantity_max"  : 40,  
   "quantity_min"  : 10,  
   "tags"      : ["fruits", "foods", "small", "red"],  
   "quantity_enough" : false,  
   "suppliers"    : [{"vendor_name": "company-A", "vendor_ip": "10.10.10.1", "vendor_coordinate": "41.72,-10.35"}, {"vendor_name": "company-B", "vendor_ip": "10.20.10.1", "vendor_coordinate": "45.72,8.35"}, {"vendor_name": "company-C", "vendor_ip": "203.83.10.55", "vendor_coordinate": "11.72,18.72"}]  
 }  
   } ]  
  },  
  "aggregations" : {  
   "top-tags" : {  
    "doc_count_error_upper_bound" : 0,  
    "sum_other_doc_count" : 6,  
    "buckets" : [ {  
     "key" : "foods",  
     "doc_count" : 3,  
     "top_tag_hits" : {  
      "hits" : {  
       "total" : 3,  
       "max_score" : null,  
       "hits" : [ {  
        "_index" : "foods",  
        "_type" : "fruits",  
        "_id" : "1",  
        "_score" : null,  
        "_source":{"price":4.98},  
        "sort" : [ 1431721130000 ]  
       } ]  
      }  
     }  
    }, {  
     "key" : "fruits",  
     "doc_count" : 3,  
     "top_tag_hits" : {  
      "hits" : {  
       "total" : 3,  
       "max_score" : null,  
       "hits" : [ {  
        "_index" : "foods",  
        "_type" : "fruits",  
        "_id" : "1",  
        "_score" : null,  
        "_source":{"price":4.98},  
        "sort" : [ 1431721130000 ]  
       } ]  
      }  
     }  
    } ]  
   }  
  }  
 }  
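The request behind this output is not shown here, but the response suggests a terms aggregation on tags with size 2, plus a top_hits sub-aggregation returning one hit sorted by insert_date descending, with only price in _source (the sort value 1431721130000 is 2015-05-15T20:18:50 in epoch milliseconds, UTC). Under those assumptions, here is a plain Java sketch that rebuilds the same numbers by hand. foods, fruits and red each have 3 documents; breaking ties by term in ascending order matches the two buckets above, and red plus large, medium and small make up sum_other_doc_count = 6.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Rebuilds "top-tags" by hand: tag buckets plus the newest document per bucket.
class TopHitsSketch {

    static final class Doc {
        final String id;
        final long insertMillis;
        final double price;
        final List<String> tags;

        Doc(String id, String insertDate, double price, String... tags) {
            this.id = id;
            // insert_date has no zone; read it as UTC, matching the sort values above
            this.insertMillis = LocalDateTime.parse(insertDate).toInstant(ZoneOffset.UTC).toEpochMilli();
            this.price = price;
            this.tags = Arrays.asList(tags);
        }
    }

    static final List<Doc> FRUITS = Arrays.asList(
        new Doc("1", "2015-05-15T20:18:50", 4.98, "fruits", "foods", "large", "red"),
        new Doc("2", "2015-05-14T20:18:50", 3.38, "fruits", "foods", "medium", "red"),
        new Doc("3", "2015-05-14T20:18:55", 2.99, "fruits", "foods", "small", "red"));

    // Tag buckets ordered by doc count descending, then by tag ascending
    static List<Map.Entry<String, List<Doc>>> bucketsByTag(List<Doc> docs) {
        Map<String, List<Doc>> byTag = new TreeMap<>(); // keys kept in ascending order
        for (Doc d : docs) {
            for (String tag : d.tags) {
                byTag.computeIfAbsent(tag, k -> new ArrayList<>()).add(d);
            }
        }
        List<Map.Entry<String, List<Doc>>> buckets = new ArrayList<>(byTag.entrySet());
        // List.sort is stable, so equal counts keep the ascending tag order
        buckets.sort((a, b) -> Integer.compare(b.getValue().size(), a.getValue().size()));
        return buckets;
    }

    // The most recently inserted document in a bucket (top_hits size 1, insert_date desc)
    static Doc topHit(List<Doc> bucketDocs) {
        return bucketDocs.stream().max(Comparator.comparingLong((Doc d) -> d.insertMillis)).get();
    }

    // Doc counts of all buckets beyond the first `size`
    static int sumOtherDocCount(List<Map.Entry<String, List<Doc>>> buckets, int size) {
        int sum = 0;
        for (int i = size; i < buckets.size(); i++) {
            sum += buckets.get(i).getValue().size();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, List<Doc>>> buckets = bucketsByTag(FRUITS);
        int size = 2;
        for (Map.Entry<String, List<Doc>> b : buckets.subList(0, Math.min(size, buckets.size()))) {
            Doc hit = topHit(b.getValue());
            System.out.println(b.getKey() + " doc_count=" + b.getValue().size()
                + " top hit _id=" + hit.id + " price=" + hit.price + " sort=" + hit.insertMillis);
        }
        System.out.println("sum_other_doc_count=" + sumOtherDocCount(buckets, size));
    }
}
```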
   

Okay, we have covered a lot of aggregations in this article, but there are more to come. Let's continue with the rest of the aggregations in the next article.

Sunday, May 24, 2015

Learning facets in elasticsearch 0.90

Today we are going to learn about facets in elasticsearch. In this article, we use elasticsearch 0.90.7 together with the official facets documentation. Let's get started.

First, we index a few documents to run facet queries against later. We create an index named articles with type article; the documents differ mainly in the field tags.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "One",  "tags" : ["foo"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Two",  "tags" : ["foo", "bar"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Five", "tags" : ["doo", "alpha", "omega"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Six", "tags" : ["doo", "beep", "ultra"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Seven", "tags" : ["doo", "boop", "beta"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}'  

 [user@localhost ~]$ curl -XGET 'http://localhost:9200/articles/_mapping?pretty'  
 {  
  "articles" : {  
   "article" : {  
    "properties" : {  
     "tags" : {  
      "type" : "string"  
     },  
     "title" : {  
      "type" : "string"  
     }  
    }  
   }  
  }  
 }  

Okay, as the mapping above shows, both fields are of type string. The documentation says: "The field used for facet calculations must be of type numeric, date/time or be analyzed as a single token — see the Mapping guide for details on the analysis process." Let's experiment with different types of facets.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "query_string" : {"query" : "T*"} }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } } '  
 {  
  "took" : 90,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 2,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   } ]  
  },  
  "facets" : {  
   "tags" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 5,  
    "other" : 0,  
    "terms" : [ {  
     "term" : "foo",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "baz",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

So we ran a query_string query for titles starting with T, and the facet counted the tags of the matching documents. In case the facet output is unclear, here is what its fields mean:

missing : The number of documents which have no value for the faceted field
total   : The total number of term occurrences counted in the facet (5 above: foo, bar, foo, bar, baz)
other   : The number of term occurrences not included in the returned facet (effectively other = total minus the sum of the counts of the returned terms)

Another example, this time for titles starting with S:

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "query_string" : {"query" : "S*"} }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } } '  
 {  
  "took" : 17,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 2,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   } ]  
  },  
  "facets" : {  
   "tags" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 6,  
    "other" : 0,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "boop",  
     "count" : 1  
    }, {  
     "term" : "beta",  
     "count" : 1  
    }, {  
     "term" : "beep",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

Okay, let's try other facet options. Here is a match_all query with a terms facet on the field tags, with the facet output limited to the top 3 terms using size.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "size" : 3 } } } }'  
 {  
  "took" : 8,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 9,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "bar",  
     "count" : 2  
    } ]  
   }  
  }  
 }  

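Now we want the facet to cover all terms. Setting all_terms to true also includes terms that match no hits (with a count of 0); with match_all every term matches anyway. Note that only 10 terms come back and other is 1 (alpha is left out), because size still caps the number of returned terms and defaults to 10.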

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "all_terms" : true } } } } '  
 {  
  "took" : 3,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 1,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "beep",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "omega",  
     "count" : 1  
    }, {  
     "term" : "gamma",  
     "count" : 1  
    }, {  
     "term" : "boop",  
     "count" : 1  
    }, {  
     "term" : "beta",  
     "count" : 1  
    }, {  
     "term" : "baz",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

How about excluding some terms from the facet output? Note in the result that the excluded terms still count toward total and end up in other.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "exclude" : ["boop", "baz", "beta", "gamma"] } } } }'  
 {  
  "took" : 24,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 4,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "beep",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "omega",  
     "count" : 1  
    }, {  
     "term" : "alpha",  
     "count" : 1  
    } ]  
   }  
  }  
 }  
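To make total and other concrete, here is a small, single-shard Java model of the terms facet over the seven sample articles. It is a sketch of the arithmetic, not how Elasticsearch implements facets. It counts every tag occurrence toward total (excluded terms included, as in the output above), keeps the size highest counts, and sets other to total minus the returned counts. It reproduces total 18 and other 9 for size 3, other 1 for size 10, and other 4 with the four excluded tags. Ties are broken in reverse term order here, which matches most outputs above; Elasticsearch 0.90 computes facets per shard and merges the results, which likely explains why bar rather than beep (both 2) survived the size 3 cut-off.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Single-shard model of the terms facet numbers: missing, total, other.
class TermsFacetSketch {

    // tags of the sample articles One, Two, Three, Five, Six, Seven, Nine
    static final List<List<String>> ARTICLES = Arrays.asList(
        Arrays.asList("foo"),
        Arrays.asList("foo", "bar"),
        Arrays.asList("foo", "bar", "baz"),
        Arrays.asList("doo", "alpha", "omega"),
        Arrays.asList("doo", "beep", "ultra"),
        Arrays.asList("doo", "boop", "beta"),
        Arrays.asList("doo", "gamma", "beep"));

    static final class Result {
        final int missing; // documents with no value in the field
        final int total;   // term occurrences counted, excluded terms included
        final int other;   // total minus the counts of the returned terms
        final List<Map.Entry<String, Integer>> terms;

        Result(int missing, int total, int other, List<Map.Entry<String, Integer>> terms) {
            this.missing = missing;
            this.total = total;
            this.other = other;
            this.terms = terms;
        }
    }

    static Result termsFacet(List<List<String>> docs, int size, Set<String> exclude) {
        Map<String, Integer> counts = new HashMap<>();
        int missing = 0;
        int total = 0;
        for (List<String> tags : docs) {
            if (tags.isEmpty()) {
                missing++;
                continue;
            }
            for (String tag : tags) {
                counts.merge(tag, 1, Integer::sum);
                total++;
            }
        }
        List<Map.Entry<String, Integer>> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (!exclude.contains(e.getKey())) {
                candidates.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue()));
            }
        }
        // highest count first; ties in reverse term order (an arbitrary choice)
        candidates.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
        });
        List<Map.Entry<String, Integer>> returned =
            new ArrayList<>(candidates.subList(0, Math.min(size, candidates.size())));
        int returnedSum = 0;
        for (Map.Entry<String, Integer> e : returned) {
            returnedSum += e.getValue();
        }
        return new Result(missing, total, total - returnedSum, returned);
    }

    public static void main(String[] args) {
        Result top3 = termsFacet(ARTICLES, 3, new HashSet<String>());
        System.out.println("size 3: total=" + top3.total + " other=" + top3.other + " " + top3.terms);
        Set<String> excluded = new HashSet<>(Arrays.asList("boop", "baz", "beta", "gamma"));
        Result ex = termsFacet(ARTICLES, 10, excluded);
        System.out.println("exclude: total=" + ex.total + " other=" + ex.other + " " + ex.terms);
    }
}
```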

What if you only want facets on certain fields? The fields parameter takes a list of field names. Because this example lists only one field, tags, the output is the same as before; try indexing more fields to see the difference.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "fields" : ["tags"], "size" : 10 } } } }'  
 {  
  "took" : 6,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 1,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "beep",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "omega",  
     "count" : 1  
    }, {  
     "term" : "gamma",  
     "count" : 1  
    }, {  
     "term" : "boop",  
     "count" : 1  
    }, {  
     "term" : "beta",  
     "count" : 1  
    }, {  
     "term" : "baz",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

What if you just want to count the documents that match a certain term? A filter facet does that:

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "facets" : { "doo_facet" : { "filter" : { "term" : { "tags" : "doo" } } } } }'  
 {  
  "took" : 3,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "doo_facet" : {  
    "_type" : "filter",  
    "count" : 4  
   }  
  }  
 }  

You can also use a query facet, which gives output similar to the filter facet above:

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "facets" : { "foo_facet" : { "query" : { "term" : { "tags" : "foo" } } } } }'  
 {  
  "took" : 2,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "foo_facet" : {  
    "_type" : "query",  
    "count" : 3  
   }  
  }  
 }  
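Both the filter facet and the query facet report a document count: a document counts once if its tags contain the term. Since neither request has a query, every document is considered (hits total is 7). As a plain Java sketch over the same seven articles, the count is just a filter over the documents:

```java
import java.util.Arrays;
import java.util.List;

// Counts documents whose tags contain a given term, the number that a
// filter or query facet with a term on "tags" reports for these articles.
class FilterFacetSketch {

    // tags of the sample articles One, Two, Three, Five, Six, Seven, Nine
    static final List<List<String>> ARTICLES = Arrays.asList(
        Arrays.asList("foo"),
        Arrays.asList("foo", "bar"),
        Arrays.asList("foo", "bar", "baz"),
        Arrays.asList("doo", "alpha", "omega"),
        Arrays.asList("doo", "beep", "ultra"),
        Arrays.asList("doo", "boop", "beta"),
        Arrays.asList("doo", "gamma", "beep"));

    static long countWithTag(List<List<String>> docs, String tag) {
        return docs.stream().filter(tags -> tags.contains(tag)).count();
    }

    public static void main(String[] args) {
        System.out.println("doo_facet count = " + countWithTag(ARTICLES, "doo"));
        System.out.println("foo_facet count = " + countWithTag(ARTICLES, "foo"));
    }
}
```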

To end this article, here is some homework. Try the following facets as well, but take note of the data types each facet operates on:
range
histogram
date histogram
statistical
terms stats
geo distance

In the next article, I will try out the newer version of facets, that is, aggregations.

Saturday, May 9, 2015

Light walkthrough on Java Execution Time Measurement Library (JETM)

Today, let's learn about a Java library, the Java Execution Time Measurement Library, or JETM. What is JETM?

From the official site:
A small and free library, that helps locating performance problems in existing Java applications.

JETM enables developers to track down performance issues on demand, either programmatic or declarative with minimal impact on application performance, even in production.

JETM is pretty cool and has a lot of features.

You can follow the tutorial trail here. The following code is taken from one of the tutorials, with minor modifications.
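import etm.core.configuration.BasicEtmConfigurator;
import etm.core.configuration.EtmManager;
import etm.core.monitor.EtmMonitor;
import etm.core.monitor.EtmPoint;
import etm.core.renderer.SimpleTextRenderer;

public class BusinessService {

    // shared monitor used to create measurement points
    private static final EtmMonitor etmMonitor = EtmManager.getEtmMonitor();

    public void someMethod() {
        // start a measurement point named after this method
        EtmPoint point = etmMonitor.createPoint("BusinessService:someMethod");

        try {
            Thread.sleep((long) (10d * Math.random())); // simulate 0 to 9 ms of work
            nestedMethod(); // measured as a nested point inside someMethod
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        } finally {
            point.collect(); // stop the point and record the elapsed time
        }
    }

    public void nestedMethod() {
        EtmPoint point = etmMonitor.createPoint("BusinessService:nestedMethod");

        try {
            Thread.sleep((long) (15d * Math.random())); // simulate 0 to 14 ms of work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            point.collect();
        }
    }

    public static void main(String[] args) {
        BasicEtmConfigurator.configure(true); // true = nested measurements
        etmMonitor.start();

        BusinessService bizz = new BusinessService();
        for (int i = 0; i < 4; i++) {
            bizz.someMethod();
        }
        bizz.nestedMethod(); // one direct, top-level call

        etmMonitor.render(new SimpleTextRenderer()); // print the results table

        etmMonitor.stop();
    }
}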

Run it (for example, with the Run button in Eclipse) and you get output like this:
EtmMonitor info [INFO] JETM 1.2.3 started.
|----------------------------------|---|---------|-------|--------|--------|
| Measurement Point                | # | Average | Min   | Max    | Total  |
|----------------------------------|---|---------|-------|--------|--------|
| BusinessService:nestedMethod     | 1 |   4.121 | 4.121 |  4.121 |  4.121 |
|----------------------------------|---|---------|-------|--------|--------|
| BusinessService:someMethod       | 4 |  12.611 | 6.196 | 16.347 | 50.442 |
|   BusinessService:nestedMethod   | 4 |   5.381 | 0.017 | 10.194 | 21.523 |
|----------------------------------|---|---------|-------|--------|--------|
EtmMonitor info [INFO] Shutting down JETM.

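So nestedMethod was executed once on its own, and someMethod four times; the second nestedMethod row, grouped under someMethod, counts the four calls made from inside someMethod. For each measurement point, the table shows the number of executions (#), the average, minimum and maximum execution time in milliseconds, and the total in the last column. Pretty neat for a small Java library.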
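To see how the columns relate (average = total / #, for example 50.442 / 4 ≈ 12.611 for someMethod), here is a hypothetical, JETM-free Java helper that keeps the same statistics for one measurement point. It is only a sketch of the bookkeeping, timed with System.nanoTime; JETM itself adds named points, nesting and rendering on top.

```java
// Hypothetical helper, not part of JETM: keeps count, min, max and total of
// recorded durations in milliseconds, like one row of the JETM table.
class TimingStats {
    private long count;
    private double total;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    void record(double millis) {
        count++;
        total += millis;
        min = Math.min(min, millis);
        max = Math.max(max, millis);
    }

    // Times a task and records the elapsed milliseconds, even if the task
    // throws (similar to calling collect() in a finally block)
    void time(Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            record((System.nanoTime() - start) / 1_000_000.0);
        }
    }

    long count() { return count; }
    double total() { return total; }
    double min() { return min; }
    double max() { return max; }
    double average() { return count == 0 ? 0.0 : total / count; }

    public static void main(String[] args) {
        TimingStats stats = new TimingStats();
        for (int i = 0; i < 4; i++) {
            stats.time(() -> {
                try {
                    Thread.sleep((long) (10d * Math.random()));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.printf("# %d  avg %.3f  min %.3f  max %.3f  total %.3f%n",
            stats.count(), stats.average(), stats.min(), stats.max(), stats.total());
    }
}
```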

 

Friday, May 8, 2015

Elasticsearch no node exception happened in tomcat web container

If you ever see a stack trace like the one below in your web container log and wonder how to solve it, read on. But first, a little background: an elasticsearch 0.90 cluster, with a client running in a Tomcat web container that uses the elasticsearch Java transport client. Both server and client run the same elasticsearch version and the same Java version.
16.Feb 6:21:30,830 ERROR WebAppTransportClient [put]: error
org.elasticsearch.client.transport.NoNodeAvailableException: No node available
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:212)
at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106)
at org.elasticsearch.client.support.AbstractClient.index(AbstractClient.java:84)
at org.elasticsearch.client.transport.TransportClient.index(TransportClient.java:316)
at org.elasticsearch.action.index.IndexRequestBuilder.doExecute(IndexRequestBuilder.java:324)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
at com.example.elasticsearch.WebAppTransportClient.put(WebAppTransportClient.java:258)
at com.example.elasticsearch.WebAppTransportClient.put(WebAppTransportClient.java:307)
at com.example.threadpool.TaskThread.run(TaskThread.java:38)
at java.lang.Thread.run(Thread.java:662)

This exception disappears once the web container is restarted, but restarting the webapp that often is not a good solution in production. I did some research online and gathered the following information:

* The default number of channels in each of these class are configured with the configuration prefix of transport.connections_per_node.
https://www.found.no/foundation/elasticsearch-networking/

* If you see NoNodeAvailableException you may have hit a connect timeout of the client. Connect timeout is 30 secs IIRC.
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/elasticsearch/VyNpCs17aTA/CcXkYvVMYWAJ

* You can set org.elasticsearch.client.transport to TRACE level in your logging configuration (on the client side) to see the failures it has (to connect, for example).
https://groups.google.com/forum/#!topic/elasticsearch/Mt2x4d5BCGI

* This means that you started to get disconnections between the client (transport) and the server. It will try and reconnect automatically, and possibly manages to do it. For more information, you can turn on logging on org.elasticsearch.client.transport.
* Can you try and increase the timeout and see how it goes? Set client.transport.ping_timeout in the settings you pass to the TransportClient to 10s for example.
* We had the same problem. Reason: the application server uses an older version of log4j than ES needs.
http://elasticsearch-users.115913.n3.nabble.com/No-node-available-Exception-td3920119.html

* The correct method is to add the known host addresses with addTransportAddresses() and afterwards check the connectedNodes() method. If it returns empty list, no nodes could be found.
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/elasticsearch/ceH3UIy14jM/XJSFKd8kAXEJ

* the most common case for NoNodeAvailable is the regular pinging that the transport client does fails to do it, so no nodes end up as the list of nodes that the transport client uses. If you will set client.transport (or org.elasticsearch.client.transport if running embedded) to TRACE, you will see the pinging effort and if it failed or not (and the reason for the failures). This might get us further into trying to understand why it happens.
* .put("client.transport.ping_timeout", pingTimeout)
* .put("client.transport.nodes_sampler_interval", pingSamplerInterval).build();
https://groups.google.com/forum/#!msg/elasticsearch/9aSkB0AVrHU/_4kDkjAFKuYJ

* This has nothing to do with migration errors. Your JVM performs a very long GC of 9 seconds, which exceeds the default ping timeout of 5 seconds, so ES dropped the connection, assuming your JVM is just too busy. Try again and see if you can reproduce it. If yes, increase the timeout to something like 10 seconds, or consider updating your Java version.
http://elasticsearch-users.115913.n3.nabble.com/Migration-errors-0-20-1-to-0-90-td4035165.html

* During a long GC the JVM is effectively suspended, so your client cannot see it anymore.
http://grokbase.com/t/gg/elasticsearch/136fw0hppp/transport-client-ping-timeout-no-node-available-exception

* You wrote that you have a 0.90.9 cluster but you added 0.90.0 jars to the client. Is that correct?
* Please check:
* if your cluster nodes and client node are using exactly the same JVM
* if your cluster and client use exactly the same ES version
* if your cluster and client use the same cluster name
* reasons outside ES: IP blocking, network reachability, network interfaces, IPv4/IPv6 etc.
* Then you should be able to connect with TransportClient.

https://groups.google.com/forum/#!msg/elasticsearch/fYmKjGywe8o/z9Ci5L5WjUAJ
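Several items in the checklist above (IP blocking, network reachability, IPv4/IPv6) can be ruled out before touching any Elasticsearch setting, simply by checking that a plain TCP connection to the node's transport port succeeds. The sketch below uses only java.net; the class name TransportPortCheck is hypothetical, and the default of localhost:9300 assumes the usual transport port. A successful connect only proves the port is reachable; it says nothing about cluster name or version mismatches.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TransportPortCheck {

    /**
     * Returns true if a plain TCP connection to host:port succeeds within timeoutMillis.
     * This only checks network reachability, not cluster name or ES version.
     */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host (UnknownHostException is an IOException)
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults: localhost and the usual ES transport port 9300.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9300;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```

Run it as java TransportPortCheck es-node-1 9300 (the host name is a placeholder). If it prints false, the problem is likely in the network rather than in the client settings.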

So I tried all the options mentioned above, and the problem was solved by enabling sniff (client.transport.sniff) in the transport client settings. For more information, read here.

I hope this will solve your problem too.

Saturday, April 25, 2015

My way of solving tomcat memory leaking issue

Recently, I made a mistake: I accidentally committed some badly written static code inside a static method into production, causing heap usage to grow tremendously. Because anything referenced statically stays alive for as long as the class is loaded, Tomcat had to be restarted often to free up the heap being held. Today I will share my experience of how I solved it, and I hope it gives you a way to approach this difficult problem.
First, to cut to the end, here is a summary of the sequence you need to follow to investigate and find the fix.
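The post does not show the exact code that leaked, so the following is only a hypothetical illustration of the general mistake: a static collection keeps everything added to it reachable for as long as the class stays loaded, so the garbage collector can never free it. The names StaticLeakDemo, REQUEST_LOG and BOUNDED_CACHE are made up, and the bounded variant (a LinkedHashMap that evicts its eldest entry) is one common mitigation, not necessarily the fix used here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StaticLeakDemo {

    // LEAKY (hypothetical): a static list lives as long as the class is loaded,
    // so every array added here stays reachable and can never be garbage collected.
    private static final List<byte[]> REQUEST_LOG = Collections.synchronizedList(new ArrayList<byte[]>());

    public static void handleRequestLeaky(int payloadSize) {
        REQUEST_LOG.add(new byte[payloadSize]); // grows on every call, never cleared
    }

    public static int leakyRetained() {
        return REQUEST_LOG.size();
    }

    // SAFER (hypothetical): if something must be cached statically, bound its size.
    private static final int MAX_ENTRIES = 100;
    private static final Map<Integer, byte[]> BOUNDED_CACHE = Collections.synchronizedMap(
            new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                    // Evict the least recently used entry once we exceed MAX_ENTRIES
                    return size() > MAX_ENTRIES;
                }
            });

    public static void handleRequestBounded(int requestId, int payloadSize) {
        BOUNDED_CACHE.put(requestId, new byte[payloadSize]);
    }

    public static int boundedRetained() {
        return BOUNDED_CACHE.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10000; i++) {
            handleRequestLeaky(1024);
            handleRequestBounded(i, 1024);
        }
        System.out.println("leaky retained: " + leakyRetained());     // 10000
        System.out.println("bounded retained: " + boundedRetained()); // 100
    }
}
```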

* CHECK YOUR CODE.
* Learn how to find the memory leak using Google.
* Go one step at a time, tracing until you successfully pin down the problem and fix it.

As you can see, there are only three general steps, but I will talk more about each of them.
CHECK YOUR CODE.

Always check your code by reading it and testing it! Ideally, find someone experienced to whom you can send your code for inspection. Remember, four eyes and two brains are better than two eyes and one brain. If you are using open-source projects, the libraries are most probably well tested, so you should spend your time investigating your own code. It's difficult, especially for a new programmer, but that should not stop you from finding the problem. If you still cannot find it, start searching on a search engine for how other people have solved it.
Learn how to find the memory leak using Google.
Nobody is perfect or knows everything, so if you are unsure, always google it. Search for keywords such as java memory leak, tomcat memory leak or even java coding best practices. Pay attention to the first 10 links returned by Google, then read blog posts or even Stack Overflow; they will give you knowledge you never knew of. Examples of useful tools include jstat, jmap, jhat, and VisualVM, which can give you an idea of what the problem might be, or even where it comes from. Remember, reading this material is a way of growing and it takes time, so please be patient at this step, make sure you spend an adequate amount of time, and jot down the important points so you can use them in the final step.
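Alongside external tools such as jstat, jmap and VisualVM, a running application can also report its own heap usage and GC counts through the standard java.lang.management API. This is a minimal sketch; the class name HeapWatcher and the one-second sampling are assumptions. Heap usage that keeps climbing even after collections is a hint, not proof, of a leak.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapWatcher {

    /** Current heap usage as a percentage of max heap (or of committed heap if max is undefined). */
    public static double heapUsedPercent() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted(); // getMax() is -1 if undefined
        return 100.0 * heap.getUsed() / limit;
    }

    /** Sum of collection counts over all collectors (young and old), comparable to YGC + FGC in jstat. */
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count; // -1 means the count is undefined for this collector
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample five times, one second apart, and watch whether heap usage keeps growing.
        for (int i = 0; i < 5; i++) {
            System.out.printf("heap used: %.1f%%, total GCs so far: %d%n", heapUsedPercent(), totalGcCount());
            Thread.sleep(1000);
        }
    }
}
```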

Go one step at a time, tracing until you successfully pin down the problem and fix it.
The final step is probably to repeat steps 1 and 2 slowly to determine the root cause. If you are using a versioning system, find the last known good version of the code and check file by file where the problem was introduced. This is a TEDIOUS and DAUNTING process, but it is effective at finding the root cause.
I used these steps to track down the Tomcat web application memory problem. Thank you, and I hope you can benefit too.

Friday, April 24, 2015

Learning java jstat

Today, we are going to learn about a Java tool that is incredibly useful if you frequently write Java applications. This monitoring tool is called jstat, and it ships with the JDK. You might ask why you would need jstat when your app runs just fine. For a simple Java application, you probably do not need this monitoring tool. However, if you have a long-running application or a big Java codebase, and your application sometimes hangs (pauses/freezes) midway, then you should really start looking into this tool. In this article, I'm going to show you how I use it.

But first, let's understand what jstat is.
The jstat tool displays performance statistics for an instrumented HotSpot Java virtual machine (JVM).

As you are aware, objects that you create in your code will eventually be freed from the heap once they are no longer referenced. If you have a lot of objects and heap usage keeps growing, you can use this monitoring tool to check what is happening with heap allocation. Okay, now let's look at the command syntax.
jstat [ generalOption | outputOptions vmid [interval[s|ms] [count]] ]

So it's pretty simple: the command jstat followed by a few parameters, which are explained below. You can find the official documentation here.

generalOption
A single general command-line option (-help or -options)

outputOptions
One or more output options, consisting of a single statOption, plus any of the -t, -h, and -J options.

vmid
Virtual machine identifier, a string indicating the target Java virtual machine (JVM). The general syntax is
[protocol:][//]lvmid[@hostname[:port]/servername]
The syntax of the vmid string largely corresponds to the syntax of a URI. The vmid can vary from a simple integer representing a local JVM to a more complex construction specifying a communications protocol, port number, and other implementation-specific values. See Virtual Machine Identifier for details.

interval[s|ms]
Sampling interval in the specified units, seconds (s) or milliseconds (ms). Default units are milliseconds. Must be a positive integer. If specified, jstat will produce its output at each interval.

count
Number of samples to display. Default value is infinity; that is, jstat displays statistics until the target JVM terminates or the jstat command is terminated. Must be a positive integer.
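To make the interval rules concrete (optional s or ms suffix, milliseconds by default, positive integer only), here is a small hypothetical helper that interprets an interval argument as the documentation above describes. It illustrates the documented rules; it is not jstat's own parsing code.

```java
public class JstatInterval {

    /**
     * Converts a jstat-style interval ("250", "250ms", "1s") to milliseconds:
     * optional s|ms suffix, milliseconds by default, value must be a positive integer.
     */
    public static long toMillis(String interval) {
        String number = interval;
        long multiplier = 1; // default unit: milliseconds
        if (interval.endsWith("ms")) { // check "ms" first, since it also ends with "s"
            number = interval.substring(0, interval.length() - 2);
        } else if (interval.endsWith("s")) {
            number = interval.substring(0, interval.length() - 1);
            multiplier = 1000;
        }
        long value;
        try {
            value = Long.parseLong(number);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("interval must be an integer: " + interval, e);
        }
        if (value <= 0) {
            throw new IllegalArgumentException("interval must be positive: " + interval);
        }
        return value * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(toMillis("1s"));    // 1000
        System.out.println(toMillis("250ms")); // 250
        System.out.println(toMillis("500"));   // 500
    }
}
```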

This should be very clear if you are a seasoned Java coder; if not, take a look at the example below.
[iser@localhost ~]$ jstat -gcutil 12345 1s
S0 S1 E O P YGC YGCT FGC FGCT GCT
10.08 0.00 70.70 69.22 59.49 122328 4380.327 355 43.146 4423.474
10.08 0.00 84.99 69.22 59.49 122328 4380.327 355 43.146 4423.474
0.00 15.62 0.00 69.24 59.49 122329 4380.351 355 43.146 4423.497

So jstat instruments a local JVM with process id 12345 at an interval of 1 second and loops indefinitely. Different types of statistics can be shown; the example above shows a summary of garbage collection statistics. To see which other statistics are available, you can use the command jstat -options; the table below summarizes what each option displays.
Option 	                Displays...
class Statistics on the behavior of the class loader.
compiler Statistics of the behavior of the HotSpot Just-in-Time compiler.
gc Statistics of the behavior of the garbage collected heap.
gccapacity Statistics of the capacities of the generations and their corresponding spaces.
gccause Summary of garbage collection statistics (same as -gcutil), with the cause of the last and current (if applicable) garbage collection events.
gcnew Statistics of the behavior of the new generation.
gcnewcapacity Statistics of the sizes of the new generations and its corresponding spaces.
gcold Statistics of the behavior of the old and permanent generations.
gcoldcapacity Statistics of the sizes of the old generation.
gcpermcapacity Statistics of the sizes of the permanent generation.
gcutil Summary of garbage collection statistics.
printcompilation HotSpot compilation method statistics.

Out of all these options, the ones you will probably use most frequently are gcutil, gc and gccapacity. We will look at each of them with an example. Please note that, to protect user privacy, some information has been removed, but what needs to be presented in this article remains as is.

option gcutil

(Screenshot: jstat -gcutil output)

As shown above, the command jstat with option gcutil runs on Java process id 23483. The statistics are generated at an interval of 1 second. There are 10 columns, explained below.
Column 	Description
S0 Survivor space 0 utilization as a percentage of the space's current capacity.
S1 Survivor space 1 utilization as a percentage of the space's current capacity.
E Eden space utilization as a percentage of the space's current capacity.
O Old space utilization as a percentage of the space's current capacity.
P Permanent space utilization as a percentage of the space's current capacity.
YGC Number of young generation GC events.
YGCT Young generation garbage collection time.
FGC Number of full GC events.
FGCT Full garbage collection time.
GCT Total garbage collection time.

The first five columns show space utilization as a percentage. The next five show the number of young generation collections and their time, the number of full garbage collections and their time, and lastly the total garbage collection time. In this screen capture, we see that the eden space fills up quickly and surviving objects are copied to either survivor space 0 or survivor space 1. At one point, some objects survived long enough to be promoted to the old space, increasing its usage by 0.01% to 5.24%. Note also that YGC increased by one, to 256, as a result. This young generation collection took 13 milliseconds. A similar pattern happens again later: YGC increases by one to 257, with another 13 milliseconds of collection time. In this output there is no change in full collections, which is good. Only one full collection has happened, but it caused a pause of 94 milliseconds! You might want to keep an eye on the E column so it doesn't fill up too quickly, and adjust the young generation size of your Java app accordingly. For a long-term solution, though, you might want to spend some time finding out which code consumes a lot of resources and improving it.
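The per-collection cost discussed above is simply the change in YGCT divided by the change in YGC between two samples. The sketch below looks columns up by header name rather than by position, because the column set differs between JDK versions (the sample here has a P column; newer JDKs print other columns). Using the last two rows of the earlier sample for process 12345, YGC goes from 122328 to 122329 and YGCT from 4380.327 s to 4380.351 s, which works out to one young collection of about 24 ms. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class GcutilDelta {

    /** Maps each header name (e.g. "YGC", "YGCT") to its numeric value in the given row. */
    public static Map<String, Double> parseRow(String header, String row) {
        String[] names = header.trim().split("\\s+");
        String[] values = row.trim().split("\\s+");
        if (names.length != values.length) {
            throw new IllegalArgumentException("header/row column count mismatch");
        }
        Map<String, Double> result = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            // Skip placeholder values such as "-" that are not numbers
            if (!values[i].equals("-")) {
                result.put(names[i], Double.parseDouble(values[i]));
            }
        }
        return result;
    }

    /** Average young GC time in milliseconds between two samples, or 0 if no young GC happened. */
    public static double avgYoungGcMillis(Map<String, Double> earlier, Map<String, Double> later) {
        double collections = later.get("YGC") - earlier.get("YGC");
        double seconds = later.get("YGCT") - earlier.get("YGCT"); // YGCT is reported in seconds
        return collections == 0 ? 0.0 : seconds * 1000.0 / collections;
    }

    public static void main(String[] args) {
        String header = "S0 S1 E O P YGC YGCT FGC FGCT GCT";
        Map<String, Double> a = parseRow(header, "10.08 0.00 84.99 69.22 59.49 122328 4380.327 355 43.146 4423.474");
        Map<String, Double> b = parseRow(header, "0.00 15.62 0.00 69.24 59.49 122329 4380.351 355 43.146 4423.497");
        System.out.printf("young GCs: %.0f, avg pause: %.0f ms%n",
                b.get("YGC") - a.get("YGC"), avgYoungGcMillis(a, b));
    }
}
```

Running main prints young GCs: 1, avg pause: 24 ms.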

option gc

(Screenshot: jstat -gc output)

As shown above, the command jstat with option gc runs on Java process id 28276. The statistics are generated at an interval of 1 second. There are 15 columns, explained below.
Column 	Description
S0C Current survivor space 0 capacity (KB).
S1C Current survivor space 1 capacity (KB).
S0U Survivor space 0 utilization (KB).
S1U Survivor space 1 utilization (KB).
EC Current eden space capacity (KB).
EU Eden space utilization (KB).
OC Current old space capacity (KB).
OU Old space utilization (KB).
PC Current permanent space capacity (KB).
PU Permanent space utilization (KB).
YGC Number of young generation GC Events.
YGCT Young generation garbage collection time.
FGC Number of full GC events.
FGCT Full garbage collection time.
GCT Total garbage collection time.

The statistics show capacity in kilobytes. The first ten columns are pretty easy: each space's current capacity and its current utilization. The last five columns are the same as the last five columns of gcutil. Notice that when the EU value gets close to the EC value, a young generation collection happens and objects are promoted to the survivor spaces. Notice also that the OU column grows gradually. These statistics are almost the same as gcutil's, except that here they are displayed in kilobytes, whereas gcutil displays percentages.
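Because gcutil's percentages are derived from the same data, E is EU as a percentage of EC (and O is OU over OC, and so on). The sketch below converts -gc values into gcutil-style percentages and flags when eden is nearly full, the point at which the text above says a young collection happens. The KB values in main are hypothetical, not taken from the screenshot, and the 90% threshold is an arbitrary assumption.

```java
public class GcToGcutil {

    /** Utilization as a percentage of current capacity, as gcutil reports it (e.g. E = EU / EC). */
    public static double percent(double usedKb, double capacityKb) {
        if (capacityKb <= 0) {
            return 0.0; // e.g. a space with no capacity allocated
        }
        return 100.0 * usedKb / capacityKb;
    }

    /** Hypothetical heuristic: eden is "nearly full" when EU reaches thresholdPercent of EC. */
    public static boolean edenNearlyFull(double euKb, double ecKb, double thresholdPercent) {
        return percent(euKb, ecKb) >= thresholdPercent;
    }

    public static void main(String[] args) {
        // Hypothetical -gc sample values in KB (not from the screenshot).
        double ec = 139776.0, eu = 131392.0;
        double oc = 349568.0, ou = 18312.0;
        System.out.printf("E = %.2f%%, O = %.2f%%, eden nearly full: %b%n",
                percent(eu, ec), percent(ou, oc), edenNearlyFull(eu, ec, 90.0));
    }
}
```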

option gccapacity

(Screenshot: jstat -gccapacity output)

As shown above, the command jstat with option gccapacity runs on Java process id 13080. The statistics are generated at an interval of 1 second. There are 16 columns, explained below.
Column 	Description
NGCMN Minimum new generation capacity (KB).
NGCMX Maximum new generation capacity (KB).
NGC Current new generation capacity (KB).
S0C Current survivor space 0 capacity (KB).
S1C Current survivor space 1 capacity (KB).
EC Current eden space capacity (KB).
OGCMN Minimum old generation capacity (KB).
OGCMX Maximum old generation capacity (KB).
OGC Current old generation capacity (KB).
OC Current old space capacity (KB).
PGCMN Minimum permanent generation capacity (KB).
PGCMX Maximum Permanent generation capacity (KB).
PGC Current Permanent generation capacity (KB).
PC Current Permanent space capacity (KB).
YGC Number of Young generation GC Events.
FGC Number of Full GC Events.

This output is similar to the output of option gc, but it adds the minimum and maximum capacity of each heap generation.

That's it for this article; I will leave three links for your reference.

http://www.cubrid.org/blog/dev-platform/how-to-monitor-java-garbage-collection/
http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html
http://oracle-base.com/articles/misc/monitoring-java-garbage-collection-using-jstat.php

 

Saturday, March 28, 2015

Investigate into apache cassandra corrupt sstable exception

Today, we will take a look at another Apache Cassandra 1.0.8 exception. An example stack trace is below.
ERROR [SSTableBatchOpen:2] 2015-03-07 06:11:58,544 SSTableReader.java (line 228) Corrupt sstable /var/lib/cassandra/data/MySuperKeyspace/MyColumnFamily-hc-6681=[Index.db, Statistics.db, CompressionInfo.db, Filter.db, Data.db]; skipped
java.io.IOException: Input/output error
at java.io.RandomAccessFile.readBytes0(Native Method)
at java.io.RandomAccessFile.readBytes(RandomAccessFile.java:350)
at java.io.RandomAccessFile.read(RandomAccessFile.java:385)
at org.apache.cassandra.io.util.RandomAccessReader.reBuffer(RandomAccessReader.java:128)
at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:444)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:424)
at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:324)
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:393)
at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:375)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:186)
at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:224)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Before we go into the code base for this stack trace: I had no idea what this was about, and it showed up while the Cassandra 1.0.12 instance was booting up. The last thing I remember is that I triggered a user-defined compaction twice in Cassandra 1.0.8 using the same sstables, and after the first compaction was done, this sstable stayed around forever... for two weeks plus. Then we upgraded Cassandra.

Enough said, let's go into the code base and understand what a corrupt sstable really means. The bottom of the stack trace is pretty obvious: a ThreadPoolExecutor executes a FutureTask's run method. After that we are in the Apache Cassandra code base, in class SSTableReader, method batchOpen(), shown in the code snippet below.
    public static Collection<SSTableReader> batchOpen(Set<Map.Entry<Descriptor, Set<Component>>> entries,
                                                      final Set<DecoratedKey> savedKeys,
                                                      final DataTracker tracker,
                                                      final CFMetaData metadata,
                                                      final IPartitioner partitioner)
    {
        final Collection<SSTableReader> sstables = new LinkedBlockingQueue<SSTableReader>();

        ExecutorService executor = DebuggableThreadPoolExecutor.createWithPoolSize("SSTableBatchOpen", Runtime.getRuntime().availableProcessors());
        for (final Map.Entry<Descriptor, Set<Component>> entry : entries)
        {
            Runnable runnable = new Runnable()
            {
                public void run()
                {
                    SSTableReader sstable;
                    try
                    {
                        sstable = open(entry.getKey(), entry.getValue(), savedKeys, tracker, metadata, partitioner);
                    }
                    catch (IOException ex)
                    {
                        logger.error("Corrupt sstable " + entry + "; skipped", ex);
                        return;
                    }
                    sstables.add(sstable);
                }
            };
            executor.submit(runnable);
        }

        executor.shutdown();
        try
        {
            executor.awaitTermination(7, TimeUnit.DAYS);
        }
        catch (InterruptedException e)
        {
            throw new AssertionError(e);
        }

        return sstables;
    }
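The pattern in batchOpen() is worth noting: each sstable is opened in its own task, and an IOException from one of them is logged and that sstable skipped instead of aborting the whole startup. A stripped-down, self-contained sketch of the same pattern using only java.util.concurrent follows; Opener is a hypothetical stand-in for SSTableReader.open(), and the one-minute wait replaces Cassandra's 7 days for the sake of the example.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchOpenSketch {

    /** Hypothetical stand-in for SSTableReader.open(): may fail with an IOException. */
    public interface Opener<T> {
        T open(String name) throws IOException;
    }

    /** Opens every name in parallel; a failed open is logged and skipped rather than aborting. */
    public static <T> Collection<T> batchOpen(List<String> names, Opener<T> opener)
            throws InterruptedException {
        final Collection<T> opened = new ConcurrentLinkedQueue<>(); // thread-safe, like LinkedBlockingQueue
        ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (final String name : names) {
            executor.submit(() -> {
                try {
                    opened.add(opener.open(name));
                } catch (IOException ex) {
                    System.err.println("Corrupt entry " + name + "; skipped (" + ex.getMessage() + ")");
                }
            });
        }
        executor.shutdown();                            // accept no new tasks; queued tasks still run
        executor.awaitTermination(1, TimeUnit.MINUTES); // Cassandra waits up to 7 days here
        return opened;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> names = Arrays.asList("a-Data.db", "bad-Data.db", "b-Data.db");
        // Hypothetical opener: names starting with "bad" simulate an unreadable sstable.
        Collection<String> opened = batchOpen(names, name -> {
            if (name.startsWith("bad")) {
                throw new IOException("Input/output error");
            }
            return name;
        });
        System.out.println("opened " + opened.size() + " of " + names.size()); // opened 2 of 3
    }
}
```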

In batchOpen(), an IOException thrown somewhere inside the method open() is caught and logged as "Corrupt sstable ...; skipped", which is the error we saw. Further up the stack trace, we see that SSTableReader.load() executes and calls ByteBufferUtil.read(). The read() method of class ByteBufferUtil is shown below.
    public static ByteBuffer read(DataInput in, int length) throws IOException
    {
        if (in instanceof FileDataInput)
            return ((FileDataInput) in).readBytes(length);

        byte[] buff = new byte[length];
        in.readFully(buff);
        return ByteBuffer.wrap(buff);
    }

We see that when the input is an instance of FileDataInput, it reads length bytes from it. Since FileDataInput is an interface, we look at the class implementing it here, RandomAccessReader, and its method readBytes(), shown below.
    public ByteBuffer readBytes(int length) throws IOException
    {
        assert length >= 0 : "buffer length should not be negative: " + length;

        byte[] buff = new byte[length];
        readFully(buff); // reading data buffer

        return ByteBuffer.wrap(buff);
    }

Reading length bytes actually means calling readFully() for that length, starting at the position the current file pointer points to. A little further up the stack trace is method reBuffer():
    /**
     * Read data from file starting from current currentOffset to populate buffer.
     * @throws IOException on any I/O error.
     */
    protected void reBuffer() throws IOException
    {
        resetBuffer();

        if (bufferOffset >= channel.size())
            return;

        channel.position(bufferOffset); // setting channel position

        int read = 0;

        while (read < buffer.length)
        {
            int n = super.read(buffer, read, buffer.length - read);
            if (n < 0)
                break;
            read += n;
        }

        validBufferBytes = read;

        bytesSinceCacheFlush += read;

        if (skipIOCache && bytesSinceCacheFlush >= MAX_BYTES_IN_PAGE_CACHE)
        {
            // with random I/O we can't control what we are skipping so
            // it will be more appropriate to just skip a whole file after
            // we reach threshold
            CLibrary.trySkipCache(this.fd, 0, 0);
            bytesSinceCacheFlush = 0;
        }
    }
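The while loop in reBuffer() is the standard way to fill a buffer from a stream: a single read() call may return fewer bytes than requested, -1 means end of stream, and any IOException from the underlying read simply propagates up, which is how the native Input/output error reached SSTableReader. A minimal sketch of the same loop over a plain InputStream follows (the class name FillBuffer is hypothetical).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FillBuffer {

    /**
     * Reads from in until buffer is full or the stream ends, returning the number of bytes read.
     * Any IOException thrown by the underlying read propagates to the caller unchanged.
     */
    public static int fill(InputStream in, byte[] buffer) throws IOException {
        int read = 0;
        while (read < buffer.length) {
            int n = in.read(buffer, read, buffer.length - read);
            if (n < 0) {
                break; // end of stream before the buffer was full
            }
            read += n;
        }
        return read;
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[8];
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        System.out.println(fill(in, buffer)); // 5: the stream ended before the buffer was full
    }
}
```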

reBuffer() calls the superclass read method to load another chunk into the buffer. In the superclass RandomAccessFile, method readBytes() looks like this:
    /**
     * Reads a sub array as a sequence of bytes.
     * @param b the buffer into which the data is read.
     * @param off the start offset of the data.
     * @param len the number of bytes to read.
     * @exception IOException If an I/O error has occurred.
     */
    private int readBytes(byte b[], int off, int len) throws IOException {
        Object traceContext = IoTrace.fileReadBegin(path);
        int bytesRead = 0;
        try {
            bytesRead = readBytes0(b, off, len);
        } finally {
            IoTrace.fileReadEnd(traceContext, bytesRead == -1 ? 0 : bytesRead);
        }
        return bytesRead;
    }

    private native int readBytes0(byte b[], int off, int len) throws IOException;

...and we are at the end of this path. It turns out that the call to readBytes0(), a native (non-Java) method in the lower layer, threw the IOException. You can try nodetool scrub to see if it fixes the problem, but what I did was basically wipe the Cassandra data directory and rebuild it. After that, I no longer saw this message.
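Because the exception came from the native read, meaning the operating system itself reported an Input/output error, it may be worth checking whether the files are readable at all before wiping the data directory. The sketch below is not a Cassandra tool: it simply reads every file under a directory end to end and reports those whose read throws IOException. The class name ReadCheck and the directory argument (for example the data directory from the log above) are assumptions, and it only detects read failures, not logically corrupt sstable contents.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReadCheck {

    /** Reads each regular file under dir completely and returns the files whose read failed. */
    public static List<Path> unreadableFiles(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> failed = new ArrayList<>();
        byte[] chunk = new byte[64 * 1024];
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                while (in.read(chunk) != -1) {
                    // discard the data; we only care whether reading succeeds
                }
            } catch (IOException e) {
                System.err.println(file + ": " + e.getMessage());
                failed.add(file);
            }
        }
        return failed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> failed = unreadableFiles(dir);
        System.out.println(failed.isEmpty() ? "all files readable" : failed.size() + " unreadable file(s)");
    }
}
```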

That's it for this article. If you want to improve it and/or comment, please leave your input below.