
Sunday, May 24, 2015

Learning facets in elasticsearch 0.90

Today we are going to learn about facets in elasticsearch. In this article, we are going to use elasticsearch 0.90.7 together with the official documentation. Let's get started.

First, we index some documents to run facet queries against later. We are going to create the index articles with the type article, with the interesting part being the field tags.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "One",  "tags" : ["foo"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Two",  "tags" : ["foo", "bar"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Five", "tags" : ["doo", "alpha", "omega"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Six", "tags" : ["doo", "beep", "ultra"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Seven", "tags" : ["doo", "boop", "beta"]}'  
 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/article?pretty" -d '{"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}'  

 [user@localhost ~]$ curl -XGET 'http://localhost:9200/articles/_mapping?pretty'  
 {  
  "articles" : {  
   "article" : {  
    "properties" : {  
     "tags" : {  
      "type" : "string"  
     },  
     "title" : {  
      "type" : "string"  
     }  
    }  
   }  
  }  
 }  

Okay, as we can read from the mapping above, both fields are of type string. From the official documentation: "The field used for facet calculations must be of type numeric, date/time or be analyzed as a single token — see the Mapping guide for details on the analysis process."
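
Our single-word tags happen to be analyzed into single tokens, so they work for facets as-is. If your tags contained multi-word values, you would want to map the field as not_analyzed when creating the index. A minimal sketch, shown for illustration only (the articles index above was created implicitly with dynamic mapping, so this would have to be done on a fresh index):

 [user@localhost ~]$ curl -X PUT "http://localhost:9200/articles?pretty" -d '{ "mappings" : { "article" : { "properties" : { "tags" : { "type" : "string", "index" : "not_analyzed" } } } } }'

Okay, let's experiment with the different types of facets.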

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "query_string" : {"query" : "T*"} }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } } '  
 {  
  "took" : 90,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 2,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   } ]  
  },  
  "facets" : {  
   "tags" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 5,  
    "other" : 0,  
    "terms" : [ {  
     "term" : "foo",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "baz",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

So a query_string query matching titles that start with T was performed, and the facet reports the tag counts across the matching documents. If the facet output looks vague, here is what the fields mean.

missing : The number of documents which have no value for the faceted field
total   : The total number of terms in the facet
other   : The number of terms not included in the returned facet (effectively other = total - terms )

In the response above, total is 5 because the two matching documents carry five tag values between them (two foo, two bar, one baz), all of which appear in the terms list, so other is 0.

Another example,

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "query_string" : {"query" : "S*"} }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } } '  
 {  
  "took" : 17,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 2,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   } ]  
  },  
  "facets" : {  
   "tags" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 6,  
    "other" : 0,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "boop",  
     "count" : 1  
    }, {  
     "term" : "beta",  
     "count" : 1  
    }, {  
     "term" : "beep",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

Okay, let's try other variations. This time a match_all query with a terms facet on the field tags, limiting the facet output to the top 3 terms.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "size" : 3 } } } }'  
 {  
  "took" : 8,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 9,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "bar",  
     "count" : 2  
    } ]  
   }  
  }  
 }  

In the response above, the seven documents carry 18 tag occurrences in total; the top three terms (doo 4, foo 3, bar 2) account for 9 of them, so other = 18 - 9 = 9. Now we want the facet to show counts for all the terms.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "all_terms" : true } } } } '  
 {  
  "took" : 3,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 1,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "beep",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "omega",  
     "count" : 1  
    }, {  
     "term" : "gamma",  
     "count" : 1  
    }, {  
     "term" : "boop",  
     "count" : 1  
    }, {  
     "term" : "beta",  
     "count" : 1  
    }, {  
     "term" : "baz",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

Notice that even with all_terms set to true, it looks like the default size of 10 still applies, so the eleventh distinct term (alpha) ends up in other, which is why other is 1 above. Next, how about excluding some terms from the facet output?

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "field" : "tags", "exclude" : ["boop", "baz", "beta", "gamma"] } } } }'  
 {  
  "took" : 24,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 4,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "beep",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "omega",  
     "count" : 1  
    }, {  
     "term" : "alpha",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

What if you want to facet over more than one field at once? The terms facet also accepts a fields array instead of a single field. Because this example only has one useful field, the output is the same as before; try indexing documents with more fields to see the difference.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "tag" : { "terms" : { "fields" : ["tags"], "size" : 10 } } } }'  
 {  
  "took" : 6,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "tag" : {  
    "_type" : "terms",  
    "missing" : 0,  
    "total" : 18,  
    "other" : 1,  
    "terms" : [ {  
     "term" : "doo",  
     "count" : 4  
    }, {  
     "term" : "foo",  
     "count" : 3  
    }, {  
     "term" : "beep",  
     "count" : 2  
    }, {  
     "term" : "bar",  
     "count" : 2  
    }, {  
     "term" : "ultra",  
     "count" : 1  
    }, {  
     "term" : "omega",  
     "count" : 1  
    }, {  
     "term" : "gamma",  
     "count" : 1  
    }, {  
     "term" : "boop",  
     "count" : 1  
    }, {  
     "term" : "beta",  
     "count" : 1  
    }, {  
     "term" : "baz",  
     "count" : 1  
    } ]  
   }  
  }  
 }  

What if you just want a count of the documents that carry a certain term? A filter facet gives exactly that.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "facets" : { "doo_facet" : { "filter" : { "term" : { "tags" : "doo" } } } } }'  
 {  
  "took" : 3,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "doo_facet" : {  
    "_type" : "filter",  
    "count" : 4  
   }  
  }  
 }  

You can also use a query facet, which gives output similar to the filter facet above.

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d ' { "facets" : { "foo_facet" : { "query" : { "term" : { "tags" : "foo" } } } } }'  
 {  
  "took" : 2,  
  "timed_out" : false,  
  "_shards" : {  
   "total" : 5,  
   "successful" : 5,  
   "failed" : 0  
  },  
  "hits" : {  
   "total" : 7,  
   "max_score" : 1.0,  
   "hits" : [ {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "WZXN-8BcSDehuM-l1tJE3w",  
    "_score" : 1.0, "_source" : {"title" : "Five", "tags" : ["doo", "alpha", "omega"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "k-Z3lbE9Tx2ZlNDb3ypA8A",  
    "_score" : 1.0, "_source" : {"title" : "Six", "tags" : ["doo", "beep", "ultra"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "JJNPiO3_SPOIiliXEfFnRA",  
    "_score" : 1.0, "_source" : {"title" : "Seven", "tags" : ["doo", "boop", "beta"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "cJFllNNOSYa1SxQLaDSGqA",  
    "_score" : 1.0, "_source" : {"title" : "One",  "tags" : ["foo"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "76AjyLVST4aRhY0JE2jlAw",  
    "_score" : 1.0, "_source" : {"title" : "Two",  "tags" : ["foo", "bar"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "3f3LNtvOT0GmZ4FNpL4wxA",  
    "_score" : 1.0, "_source" : {"title" : "Three", "tags" : ["foo", "bar", "baz"]}  
   }, {  
    "_index" : "articles",  
    "_type" : "article",  
    "_id" : "HccmhIJOTXqX2XG6uGbuXw",  
    "_score" : 1.0, "_source" : {"title" : "Nine", "tags" : ["doo", "gamma", "beep"]}  
   } ]  
  },  
  "facets" : {  
   "foo_facet" : {  
    "_type" : "query",  
    "count" : 3  
   }  
  }  
 }  
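
Since in these last two examples we only care about the facet count and not the hits, you can also skip returning hits by setting size to 0 on the search request. A small sketch reusing the query facet above:

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "size" : 0, "facets" : { "foo_facet" : { "query" : { "term" : { "tags" : "foo" } } } } }'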

To end this article, I leave some homework for you. You should also try the following facet types, but do take note of the data types each of them operates on; a small range facet example follows the list.
range
histogram
date histogram
statistical
terms stats
geo distance
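
Of these, the range, histogram, statistical and terms stats facets need a numeric (or date) field, and our articles only have string fields. The sketch below therefore assumes a numeric field named word_count that you would have to index yourself first (the field name is purely illustrative):

 [user@localhost ~]$ curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '{ "query" : { "match_all" : { } }, "facets" : { "wc_ranges" : { "range" : { "field" : "word_count", "ranges" : [ { "to" : 100 }, { "from" : 100, "to" : 500 }, { "from" : 500 } ] } } } }'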

In the next article, I will try out the newer replacement for facets, that is, aggregations.

Saturday, April 25, 2015

My way of solving tomcat memory leaking issue

Recently, I made a mistake by accidentally committing some careless code into a static method in production, causing heap usage to grow tremendously. Since the objects referenced from the static field stayed reachable, tomcat had to be restarted often to free up the heap being held. So today, I will share my experience on how I solved it and I hope it gives you a way to approach this difficult problem.
From start to end, I will summarize the sequence you need to follow to investigate and find the fix.

* CHECK YOUR CODE.
* learn how to find the memory leak using google.
* one step at a time to trace until you successfully pin down the problem and fix it.

As you can read, there are only three general steps, but I will talk more about each one.
CHECK YOUR CODE.

Always check your code by reading it and writing tests! It is best if you have someone experienced to whom you can send your code for inspection. Remember, four eyes and two brains are better than two eyes and one brain. If you are using an open source project, the library is most probably well tested, so you should spend your time investigating your own code. It is difficult, especially for a new programmer, but that should not stop you from finding the problem. If you still cannot find it, then start searching for how other people have solved it.
Learn how to find the memory leak using google.
Nobody is perfect or knows everything, but if you are unsure, always google away. Try keywords such as java memory leak, tomcat memory leak or even best java coding practices. Pay attention to the first ten links returned by google and read the blog posts or stackoverflow answers; they will give you knowledge you never knew of. Examples of useful tools include jstat, jmap, jhat and visualvm, which can give you an idea of what, or even where, the problem might come from. Remember, reading this material is a way of growing and it takes time, so please be patient at this step, spend an adequate amount of time, and jot down the important points so you can use them in the final step.

one step at a time to trace until you successfully pin down the problem and fix it.
The final step is essentially repeating step 1 and step 2 slowly to determine the root cause. If you are using a versioning system, you should find the last known good revision and then check file by file where the problem was introduced. This is a TEDIOUS and DAUNTING process but it is effective at finding the root cause.
These are the steps I used to determine the tomcat web application memory problem. Thank you and I hope you can benefit too.

Friday, April 24, 2015

Learning java jstat

Today, we are going to learn a java tool which is incredibly useful if you frequently code java applications. It is a monitoring tool known as jstat and it comes with the jdk. You might ask why you would need jstat when your app runs just fine. For a simple java application, indeed, you do not need this monitoring tool. However, if you have a long running application or a large java codebase, and your java application sometimes hangs (pauses/freezes) midway, then you should really start looking into this tool. In this article, I'm going to show you how I use it.

But first, let's understand what jstat is.
The jstat tool displays performance statistics for an instrumented HotSpot Java virtual machine (JVM).

As you are aware, objects that you create in code will eventually be freed from the heap when they are no longer referenced. If you have a lot of objects and heap usage grows, you can use this monitoring tool to check what is going on with the heap allocation. Okay, let's now look at the command syntax.
jstat [ generalOption | outputOptions vmid [interval[s|ms] [count]] ]

So, pretty simple: the jstat command followed by a few parameters. The parameters are explained below. You can find the official documentation here.

generalOption
A single general command-line option (-help or -options)

outputOptions
One or more output options, consisting of a single statOption, plus any of the -t, -h, and -J options.

vmid
Virtual machine identifier, a string indicating the target Java virtual machine (JVM). The general syntax is
[protocol:][//]lvmid[@hostname[:port]/servername]
The syntax of the vmid string largely corresponds to the syntax of a URI. The vmid can vary from a simple integer representing a local JVM to a more complex construction
specifying a communications protocol, port number, and other implementation-specific values. See Virtual Machine Identifier for details.

interval[s|ms]
Sampling interval in the specified units, seconds (s) or milliseconds (ms). Default units are milliseconds. Must be a positive integer. If specified, jstat will produce its
output at each interval.

count
Number of samples to display. Default value is infinity; that is, jstat displays statistics until the target JVM terminates or the jstat command is terminated. Must be a
positive integer.

It should be very clear to you if you are a seasoned java coder; if not, take a look at the example below.
[user@localhost ~]$ jstat -gcutil 12345 1s
S0 S1 E O P YGC YGCT FGC FGCT GCT
10.08 0.00 70.70 69.22 59.49 122328 4380.327 355 43.146 4423.474
10.08 0.00 84.99 69.22 59.49 122328 4380.327 355 43.146 4423.474
0.00 15.62 0.00 69.24 59.49 122329 4380.351 355 43.146 4423.497

So jstat instruments a local jvm with process id 12345 at an interval of 1 second, looping infinitely. There are different types of statistics that can be shown; the example above shows a summary of garbage collection statistics (-gcutil). If you want to show other types of gc statistics, you can list the options with the command jstat -options, and below is a table summarizing what each option displays.
Option            Displays...
class             Statistics on the behavior of the class loader.
compiler          Statistics of the behavior of the HotSpot Just-in-Time compiler.
gc                Statistics of the behavior of the garbage collected heap.
gccapacity        Statistics of the capacities of the generations and their corresponding spaces.
gccause           Summary of garbage collection statistics (same as -gcutil), with the cause of the last and current (if applicable) garbage collection events.
gcnew             Statistics of the behavior of the new generation.
gcnewcapacity     Statistics of the sizes of the new generations and its corresponding spaces.
gcold             Statistics of the behavior of the old and permanent generations.
gcoldcapacity     Statistics of the sizes of the old generation.
gcpermcapacity    Statistics of the sizes of the permanent generation.
gcutil            Summary of garbage collection statistics.
printcompilation  HotSpot compilation method statistics.

Out of all these options, the ones you will probably use most frequently are gcutil, gc and gccapacity. We will look at each of them with an example. Please note that in order to protect the privacy of the user, some information has been removed from the screenshots, but what needs to be presented in this article remains as is.
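
If you do not remember the process id, the jps tool (also shipped with the jdk) lists the local JVMs, and the -h flag makes jstat repeat the column header every N samples, which helps for long runs. A small sketch (23483 is the process id used in the gcutil screenshot below; the -h20 is just a choice):

 [user@localhost ~]$ jps -l
 [user@localhost ~]$ jstat -gcutil -h20 23483 1s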

option gcutil

[screenshot: output of jstat -gcutil 23483 1s]

As can be seen above, this is the jstat command with the option gcutil on a java process with id 23483. The statistics are generated at an interval of 1 second. The output has 10 columns, which are explained below.
Column  Description
S0      Survivor space 0 utilization as a percentage of the space's current capacity.
S1      Survivor space 1 utilization as a percentage of the space's current capacity.
E       Eden space utilization as a percentage of the space's current capacity.
O       Old space utilization as a percentage of the space's current capacity.
P       Permanent space utilization as a percentage of the space's current capacity.
YGC     Number of young generation GC events.
YGCT    Young generation garbage collection time.
FGC     Number of full GC events.
FGCT    Full garbage collection time.
GCT     Total garbage collection time.

The first five columns show space utilization as a percentage. The next five show the number of young generation collections and their total time, the number of full collections and their total time, and finally the total garbage collection time. In this screen capture, we see that the eden space fills up quickly and objects get promoted to either survivor space 0 or survivor space 1. At one point, some objects survived and were eventually promoted to old space, increasing its usage by 0.01% to 5.24%. Note also that YGC increased by one as a result, to 256; that young generation collection took 13 milliseconds. A similar pattern happens again later, and we see that YGC increased by one to 257 with another 13 milliseconds of collection time. In this output there is no change in the full collection count, which is good; only one full collection happened, but with a pause of 94 milliseconds! You might want to keep an eye on the E column so it does not fill up too quickly, and adjust the young generation size in your java app accordingly. For a long term solution, you might want to spend some time finding out which code uses a lot of resources and improve it.

option gc

[screenshot: output of jstat -gc 28276 1s]

As can be seen above, this is the jstat command with the option gc on a java process with id 28276. The statistics are generated at an interval of 1 second. The output has 15 columns, which are explained below.
Column  Description
S0C     Current survivor space 0 capacity (KB).
S1C     Current survivor space 1 capacity (KB).
S0U     Survivor space 0 utilization (KB).
S1U     Survivor space 1 utilization (KB).
EC      Current eden space capacity (KB).
EU      Eden space utilization (KB).
OC      Current old space capacity (KB).
OU      Old space utilization (KB).
PC      Current permanent space capacity (KB).
PU      Permanent space utilization (KB).
YGC     Number of young generation GC events.
YGCT    Young generation garbage collection time.
FGC     Number of full GC events.
FGCT    Full garbage collection time.
GCT     Total garbage collection time.

These statistics show the capacities in kilobytes. The first ten columns are straightforward: each space's capacity and its current utilization. The last five columns are the same as the last five columns of gcutil. Notice that when the EU value gets close to the EC value, a young generation collection happens and objects are promoted to the survivor spaces. Notice also that the OU column grows gradually. These statistics are almost the same as gcutil, except that they are displayed in kilobytes whereas the gcutil statistics are displayed as percentages.

option gccapacity

[screenshot: output of jstat -gccapacity 13080 1s]

As can be seen above, this is the jstat command with the option gccapacity on a java process with id 13080. The statistics are generated at an interval of 1 second. The output has 16 columns, which are explained below.
Column  Description
NGCMN   Minimum new generation capacity (KB).
NGCMX   Maximum new generation capacity (KB).
NGC     Current new generation capacity (KB).
S0C     Current survivor space 0 capacity (KB).
S1C     Current survivor space 1 capacity (KB).
EC      Current eden space capacity (KB).
OGCMN   Minimum old generation capacity (KB).
OGCMX   Maximum old generation capacity (KB).
OGC     Current old generation capacity (KB).
OC      Current old space capacity (KB).
PGCMN   Minimum permanent generation capacity (KB).
PGCMX   Maximum permanent generation capacity (KB).
PGC     Current permanent generation capacity (KB).
PC      Current permanent space capacity (KB).
YGC     Number of young generation GC events.
FGC     Number of full GC events.

This output is similar to the output of the gc option, but with the minimum and maximum capacities for the individual heap generations.

That's it for this article; I will leave three links below for your reference.

http://www.cubrid.org/blog/dev-platform/how-to-monitor-java-garbage-collection/
http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html
http://oracle-base.com/articles/misc/monitoring-java-garbage-collection-using-jstat.php

 

Sunday, March 29, 2015

My journey and experience on upgrading apache cassandra 1.0.8 to 1.0.12

At the request of a blog reader, today I will share my experience of upgrading apache cassandra from version 1.0.8 to 1.0.12 on a live production cluster. By sharing this, I hope that if you also run and/or administer a cassandra cluster, you can learn from my experience and ease your worry or pain.

First, let's lay out the current architecture in this environment.

  • java 6

  • 12 nodes cluster.

  • two spinning disks in raid 0; 32GB total system memory, of which 14GB is allocated to the cassandra heap with 800MB for the young generation; quad core cpu.

  • pretty much stock cassandra.yaml configuration, with a few differences such as concurrent_writes set to 64, flush_largest_memtables_at to 0.8 and compaction_throughput_mb_per_sec to 64.

  • load per node averages 500-550GB.


As you can see, this is a pretty ancient cassandra as of this time of writing, but because cassandra has been rock solid serving read/write requests for years, it has stayed in this stable condition and we have leveraged the benefit of scaling out, growing from six to nine and eventually to the current twelve nodes. Disk failures do happen in the nodes of the cluster, but because cassandra is designed with no single point of failure in mind, we can afford to lose a single node from operation while replacing it. Those were a few of the reasons we stayed with cassandra 1.0 for quite some time.

Because we would probably like to go to cassandra 2.0 and beyond, and java 6 has been EOL for quite some time, it is wise to upgrade java before cassandra. Because systems are integrated like an ecosystem, it is also wise to look at the java versions used by the client systems that send read/write requests to the cassandra cluster. So make a checklist of which clients integrate with the cluster and then check which stable java 7 releases are available. Example:

cassandra 1.0 (cassandra-1.0.12): java minimum 6 and above.
https://github.com/apache/cassandra/tree/cassandra-1.0.12

hector client uses cassandra 2.0.4, so java 7 minimum
https://github.com/hector-client/hector/blob/master/pom.xml

datastax cql driver uses cassandra 2.1.2, so java 7 minimum
https://github.com/datastax/java-driver/blob/2.1/pom.xml

java 7 update release note
http://www.oracle.com/technetwork/java/javase/7u-relnotes-515228.html

features and enhancement
http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html

java 7 in wiki http://en.wikipedia.org/wiki/Java_version_history#Java_SE_7_.28July_28.2C_2011.29

unicode
before upgrading, check whether the new java version supports a different unicode version, since that can affect the string data: http://www.herongyang.com/Unicode/Java-Unicode-Version-Supported-in-Java-History.html
http://docs.oracle.com/javase/7/docs/technotes/guides/intl/enhancements.7.html
Early versions of the Java SE 7 release added support for Unicode 5.1.0. The final version of the Java SE 7 release supports Unicode 6.0.0. Unicode 6.0.0 is a major version of the Unicode Standard and adds support for over 2000 additional characters, as well as support for properties and data files.

At the time of checking, we picked java 7 update 72. Upgrading java 6 to java 7 update 72 under cassandra 1.0.8 is a painless process, just time consuming, given the huge load per node and the total number of nodes in the cluster. I followed these steps for the java upgrade on each cassandra node.

upgrade java for all cassandra nodes
1. write a script to automatically install java 7 on the node, update the java thread stack size to 256k in cassandra-env.sh, and set JAVA_HOME in cassandra.in.sh to java 7 (a rough sketch follows this list).
2. execute the script in a rolling fashion across all the nodes, one upgrade at a time.
3. stop cassandra
4. execute the script.
5. start the cassandra instance
6.0 monitor after the node is up, then check the monitoring system after the node has been running for 30 minutes, 1 hour and 2 hours.
6.1 check that your clients can read/write to the upgraded node.
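
The script mentioned in step 1 might look roughly like the sketch below. The install step, file paths and package layout are assumptions for illustration only; adjust them to your environment.

 #!/bin/bash
 # rough sketch: prepare one node for java 7 (paths below are assumptions)
 set -e
 # install java 7 update 72 however your distribution packages it (rpm, deb or tarball)
 JAVA7_HOME=/usr/java/jdk1.7.0_72
 # bump the thread stack size to 256k in cassandra-env.sh
 sed -i 's/-Xss[0-9]*k/-Xss256k/' /etc/cassandra/cassandra-env.sh
 # point cassandra.in.sh at the new jdk
 echo "JAVA_HOME=$JAVA7_HOME" >> /etc/cassandra/cassandra.in.sh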

By now, you can move on to the next node in the ring, and you can skip step 6.0 once you are sure it is going to work. One thing I observed is that the gc duration under java 7 is down by half compared to java 6! Faster gc means more cpu cycles to process other tasks and less stop-the-world time for the cassandra instance.

Let the cluster run with java 7 for a day or two and, if everything is okay, continue with the cassandra upgrade. So which cassandra version should you upgrade to? There are several guidelines I followed.

1. choose ONLY STABLE release for production cluster. How to choose? You should read this link.
2. read NEWS.txt and CHANGES.txt. From time to time, changes to the code base may affect, for example, the sstable format, so pay attention, especially between major cassandra versions.
3. read the code difference between the current version and the version you decided to upgrade to, for example for this upgrade: https://github.com/apache/cassandra/compare/cassandra-1.0.8...cassandra-1.0.12
4. read the datastax upgrading notes for the minor version.

I spent a lot of time on step 3; by reading the code difference you realize what has been changed and/or added and can consider whether it will impact your cassandra environment. In order to later upgrade to cassandra 1.1, you will need to first upgrade to the latest version of the series currently deployed. Example here. Once you have read through the above checkpoints, you may have a lot of questions and TODOs, and that will lead to further work. In the next step, it is best to work through those questions and TODOs and verify them on a test cluster before applying anything to the production cluster.
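
If you prefer to do step 3 locally rather than on github, a quick sketch using a clone of the apache/cassandra repository:

 [user@localhost ~]$ git clone https://github.com/apache/cassandra.git && cd cassandra
 [user@localhost ~]$ git log --oneline cassandra-1.0.8..cassandra-1.0.12
 [user@localhost ~]$ git diff cassandra-1.0.8 cassandra-1.0.12 -- NEWS.txt CHANGES.txt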

For me, I wrote a few bash scripts, for example the java upgrade mentioned above, and a script to install a test cluster for the cassandra upgrade. Remember to also write a script to snapshot the data directory using nodetool, and a script to automatically downgrade. When something goes wrong, you can revert using the automatic downgrade script and the backup taken with nodetool snapshot. You will also need to save the configuration files, for example cassandra.in.sh, cassandra-env.sh, cassandra.yaml or any others in your cluster environment.
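
A minimal sketch of the snapshot step, assuming nodetool is on the PATH and MyKeyspace is your keyspace name (both names and the tag are illustrative):

 [user@localhost ~]$ nodetool -h localhost snapshot -t MyKeyspace-1.0.8-20150301 MyKeyspace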

With these scripts written and tested, it is best to get acknowledgement from management before proceeding, and it is also best to have someone else who administers the cassandra cluster with you, just for the good and bad moments. ;-) You can also reach me via the follow button on the home page. :)

upgrade cassandra from 1.0.8 to 1.0.12

  1. stop repair and cleanup in all nodes in the cluster.

  2. write a script to automate the upgrade so that you do not panic or waste time, and stay composed during the node upgrade. Trust me, it saves you a lot of time and keeps things free of human error. The script content could be the following:
    - download cassandra 1.0.12 and extract it, set file permissions, etc.
    - back up the current cassandra 1.0.8 data using nodetool snapshot. make sure you name the snapshot something like MyKeyspace-1.0.8-date
    - drain the node.
    - stop cassandra if it is not yet stopped.
    - update the cassandra 1.0.12 configuration with your cluster settings.

  3. check the changed configuration and then start the new cassandra 1.0.12 instance.

  4. monitor after the node is up and then check the monitoring system after the node has been running for 30 minutes, 1 hour and 2 hours.

  5. check that your clients can read/write to the upgraded node.


By now, you can move on to the next node in the ring, and you can skip step 4 once you are sure it is going to work. As the cassandra sstable version changed in 1.0.10, from hc to hd, it is best that all sstables on all nodes are on the hd version before performing the next major upgrade.
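
One way to rewrite any remaining hc sstables into the current format is nodetool upgradesstables, run node by node during a quiet period (check nodetool help on your exact 1.0.x build to confirm the command is available; nodetool scrub also rewrites sstables but does extra validation work):

 [user@localhost ~]$ nodetool -h localhost upgradesstables MyKeyspace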

That's it for this article. It may not cover everything and may contain mistakes; if you want to comment, please leave your comment below.

Saturday, March 28, 2015

Investigate into apache cassandra corrupt sstable exception

Today, we will take a look at another apache cassandra 1.0.8 exception. An example of the stack trace is below.
ERROR [SSTableBatchOpen:2] 2015-03-07 06:11:58,544 SSTableReader.java (line 228) Corrupt sstable /var/lib/cassandra/data/MySuperKeyspace/MyColumnFamily-hc-6681=[Index.db, Statistics.db, CompressionInfo.db, Filter.db, Data.db]; skipped
java.io.IOException: Input/output error
at java.io.RandomAccessFile.readBytes0(Native Method)
at java.io.RandomAccessFile.readBytes(RandomAccessFile.java:350)
at java.io.RandomAccessFile.read(RandomAccessFile.java:385)
at org.apache.cassandra.io.util.RandomAccessReader.reBuffer(RandomAccessReader.java:128)
at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:444)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:424)
at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:324)
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:393)
at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:375)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:186)
at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:224)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Before we go into the code base for this stack trace: I had no idea what this was about, and it showed up when the cassandra 1.0.12 instance was booting up. From what I remember, I triggered user defined compaction twice in cassandra 1.0.8 using the same sstables; after the first compaction was done, this sstable stayed around forever... for two weeks plus. Then we upgraded cassandra.

Enough said, let's go into the code base and understand what is really meant by a corrupt sstable. The bottom of the stack trace is pretty obvious: a ThreadPoolExecutor executes a future task's run method. Then we are in the apache cassandra codebase, in the class SSTableReader, method batchOpen(), code snippet below.
    public static Collection<SSTableReader> batchOpen(Set<Map.Entry<Descriptor, Set<Component>>> entries,
                                                      final Set<DecoratedKey> savedKeys,
                                                      final DataTracker tracker,
                                                      final CFMetaData metadata,
                                                      final IPartitioner partitioner)
    {
        final Collection<SSTableReader> sstables = new LinkedBlockingQueue<SSTableReader>();

        ExecutorService executor = DebuggableThreadPoolExecutor.createWithPoolSize("SSTableBatchOpen", Runtime.getRuntime().availableProcessors());
        for (final Map.Entry<Descriptor, Set<Component>> entry : entries)
        {
            Runnable runnable = new Runnable()
            {
                public void run()
                {
                    SSTableReader sstable;
                    try
                    {
                        sstable = open(entry.getKey(), entry.getValue(), savedKeys, tracker, metadata, partitioner);
                    }
                    catch (IOException ex)
                    {
                        logger.error("Corrupt sstable " + entry + "; skipped", ex);
                        return;
                    }
                    sstables.add(sstable);
                }
            };
            executor.submit(runnable);
        }

        executor.shutdown();
        try
        {
            executor.awaitTermination(7, TimeUnit.DAYS);
        }
        catch (InterruptedException e)
        {
            throw new AssertionError(e);
        }

        return sstables;
    }

As can be read above, somewhere within the method open() the IOException is thrown, hence the log message above. Two stack frames up, we see that the sstable load method executes and calls the ByteBufferUtil.read() method, shown below.
    public static ByteBuffer read(DataInput in, int length) throws IOException
    {
        if (in instanceof FileDataInput)
            return ((FileDataInput) in).readBytes(length);

        byte[] buff = new byte[length];
        in.readFully(buff);
        return ByteBuffer.wrap(buff);
    }

We see that the input in is an instance of a FileDataInput stream, and length bytes are read from it. Since FileDataInput is an interface, the class that implements it here is RandomAccessReader, with the method readBytes as follows.
    public ByteBuffer readBytes(int length) throws IOException
    {
        assert length >= 0 : "buffer length should not be negative: " + length;

        byte[] buff = new byte[length];
        readFully(buff); // reading data buffer

        return ByteBuffer.wrap(buff);
    }

So reading length bytes is actually a readFully of that length, starting at the position the current file pointer points to. A little further up the stack trace is the method reBuffer().
    /**
     * Read data from file starting from current currentOffset to populate buffer.
     * @throws IOException on any I/O error.
     */
    protected void reBuffer() throws IOException
    {
        resetBuffer();

        if (bufferOffset >= channel.size())
            return;

        channel.position(bufferOffset); // setting channel position

        int read = 0;

        while (read < buffer.length)
        {
            int n = super.read(buffer, read, buffer.length - read);
            if (n < 0)
                break;
            read += n;
        }

        validBufferBytes = read;

        bytesSinceCacheFlush += read;

        if (skipIOCache && bytesSinceCacheFlush >= MAX_BYTES_IN_PAGE_CACHE)
        {
            // with random I/O we can't control what we are skipping so
            // it will be more appropriate to just skip a whole file after
            // we reach threshold
            CLibrary.trySkipCache(this.fd, 0, 0);
            bytesSinceCacheFlush = 0;
        }
    }

This method calls the superclass to read another chunk into the buffer. In the superclass RandomAccessFile, the method readBytes():
    /**
     * Reads a sub array as a sequence of bytes.
     * @param b the buffer into which the data is read.
     * @param off the start offset of the data.
     * @param len the number of bytes to read.
     * @exception IOException If an I/O error has occurred.
     */
    private int readBytes(byte b[], int off, int len) throws IOException {
        Object traceContext = IoTrace.fileReadBegin(path);
        int bytesRead = 0;
        try {
            bytesRead = readBytes0(b, off, len);
        } finally {
            IoTrace.fileReadEnd(traceContext, bytesRead == -1 ? 0 : bytesRead);
        }
        return bytesRead;
    }

    private native int readBytes0(byte b[], int off, int len) throws IOException;

... and we are at the end of this path. It turns out that the call to readBytes0 threw the exception; in other words, the lower layer native, non-java call raised the IOException, which matches the Input/output error in the log. You can use nodetool scrub to see if it fixes the problem, but what I did was basically wipe the data directory for that cassandra node and rebuild it. After that, I did not see this message anymore.
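
If you want to try the scrub route first, a minimal sketch using the keyspace and column family names from the log message above (take a snapshot beforehand, just in case):

 [user@localhost ~]$ nodetool -h localhost snapshot MySuperKeyspace
 [user@localhost ~]$ nodetool -h localhost scrub MySuperKeyspace MyColumnFamily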

That's it for this article and if you want to improve and/or comment, please leave your input below.

Sunday, March 15, 2015

Investigate into elasticsearch already flushing exception

Today, we will take a look at another elasticsearch issue. The example message below was output in the log file of one of my nodes.
[2015-03-02 04:51:30,450][DEBUG][action.admin.indices.flush] [node02] [index_A][1], node[DEF_-ABC], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.admin.indices.flush.FlushRequest@23465ba5]
org.elasticsearch.index.engine.FlushNotAllowedEngineException: [index_A][1] already flushing...
at org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:825)
at org.elasticsearch.index.shard.service.InternalIndexShard.flush(InternalIndexShard.java:564)
at org.elasticsearch.action.admin.indices.flush.TransportFlushAction.shardOperation(TransportFlushAction.java:114)
at org.elasticsearch.action.admin.indices.flush.TransportFlushAction.shardOperation(TransportFlushAction.java:49)
at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$2.run(TransportBroadcastOperationAction.java:225)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

We will investigate the code path that leads to the exception above and decide whether this is a problem. Below is the snippet of the "entry" elasticsearch code.
    public void run() {
        try {
            onOperation(shard, shardIndex, shardOperation(shardRequest));
        } catch (Throwable e) {
            onOperation(shard, shardIt, shardIndex, e);
        }
    }

So as we read above, this is an asynchronous broadcast action being executed against a shard. The code execution path tells us that this shard is a local shard and that it encountered an exception during flushing.
    @Override
    protected ShardFlushResponse shardOperation(ShardFlushRequest request) throws ElasticSearchException {
        IndexShard indexShard = indicesService.indexServiceSafe(request.index()).shardSafe(request.shardId());
        indexShard.flush(new Engine.Flush().type(request.full() ? Engine.Flush.Type.NEW_WRITER : Engine.Flush.Type.COMMIT_TRANSLOG).force(request.force()));
        return new ShardFlushResponse(request.index(), request.shardId());
    }

Tracing up the stack trace, we notice the InternalIndexShard class flushing the shard.
    @Override
    public void flush(Engine.Flush flush) throws ElasticSearchException {
        // we allows flush while recovering, since we allow for operations to happen
        // while recovering, and we want to keep the translog at bay (up to deletes, which
        // we don't gc).
        verifyStartedOrRecovering();
        if (logger.isTraceEnabled()) {
            logger.trace("flush with {}", flush);
        }
        long time = System.nanoTime();
        engine.flush(flush);
        flushMetric.inc(System.nanoTime() - time);
    }

As can be read above, we are at the end of the investigation journey: inside engine.flush(), the engine keeps a flushing count, and when it finds that a flush is already in progress it decreases the count again and throws the FlushNotAllowedEngineException, telling us that this shard is currently being flushed.

As to why this happened, it is puzzling, but I never saw this exception again, and from the investigation above it does not look like it does any harm, since the shard in question was simply already being flushed.
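
For reference, this code path is what the flush API goes through. A sketch of issuing such a flush manually against the index from the log (the exception in the log could equally have been triggered by an automatic flush):

 [user@localhost ~]$ curl -X POST "http://localhost:9200/index_A/_flush?pretty"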

If you think this analysis is not correct or you have better insight, please leave your comment below. Thank you.

Sunday, March 1, 2015

Study logic elasticsearch discovery zen fd ping_timeout and ping_retries

Today, we are going to study two parameters from elasticsearch version 0.90.7, specifically

  • discovery.zen.fd.ping_timeout

  • discovery.zen.fd.ping_retries


Let's find the definitions in the official documentation, here and here.
There are two fault detection processes running. The first is by the master, to ping all the other nodes in the cluster and verify that they are alive. And on the other end, each node pings to master to verify if its still alive or an election process needs to be initiated.

 

ping_timeout How long to wait for a ping response, defaults to 30s.
ping_retries How many ping failures / timeouts cause a node to be considered failed. Defaults to 3.
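
For reference, these settings (together with the related discovery.zen.fd.ping_interval) go into elasticsearch.yml. A small sketch with the defaults spelled out explicitly:

 discovery.zen.fd.ping_interval: 1s
 discovery.zen.fd.ping_timeout: 30s
 discovery.zen.fd.ping_retries: 3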

So these settings are used by the master node and the data nodes to detect whether a node is okay or not: once a ping is sent, the time to wait for a response is 30 seconds (default), and a node is considered down/failed if the ping fails or times out 3 times. Okay, let's go into the code.

NodesFaultDetection.java

Within the class NodesFaultDetection, there is an inner class SendPingRequest. As it implements the interface Runnable, its run method will be executed by an executor. Instead of a network-level ping, objects are written and read to emulate a ping behaviour; you can read the class PingRequest for more information. As you may have noticed, ping_timeout is passed along when the PingRequest is sent.

The essence of the logic is pretty much in the call transportService.sendRequest(final DiscoveryNode node, final String action, final TransportRequest request, final TransportRequestOptions options, TransportResponseHandler<T> handler). You might think it would be an ICMP ping, but it is not; isReachable() is never called.

In the method handleResponse(PingResponse response), we see that the retry count is reset to 0 and the SendPingRequest object is scheduled again with the ping_interval you set earlier. In the method handleException(TransportException exp), we see that the variable retryCount is increased by one, and if the current retry count reaches the default of 3, the node is considered dead and is removed. If the current retry count is still below 3, another ping is sent with the same ping_timeout.

MasterFaultDetection.java

Master fault detection is a little different from nodes fault detection. When the public method start of MasterFaultDetection is called, followed by the method innerStart(), a MasterPinger object is created.
this.masterPinger = new MasterPinger();
// start the ping process
threadPool.schedule(pingInterval, ThreadPool.Names.SAME, masterPinger);

So there is a periodic ping, by default every 1 second. When the MasterPinger instance is run, we notice that it goes through the same process of sending the request using the transport service: transportService.sendRequest(final DiscoveryNode node, final String action, final TransportRequest request, final TransportRequestOptions options, TransportResponseHandler<T> handler).
The logic of this request sending is the same as in NodesFaultDetection. What is interesting are the methods overridden in the class BaseTransportResponseHandler. In handleResponse, we see that the retry count is reset back to 0 and then another ping is scheduled.

In the overridden method handleException(TransportException exp), there are three exception checks: whether the pinged node is no longer the master, whether we pinged a node that is not a master, or whether this node does not exist on the master. After those checks, the retry count is increased by one. If the current retry count is greater than or equal to the default of 3, the ping to the master is considered failed and the master is considered failed. If the current retry count is less than three, another ping is sent.

That's it. If you think this analysis needs improvement, please leave your comment below. Thank you.

Saturday, January 3, 2015

apache cassandra 1.0.8 IncompatibleClassChangeError vtable stub and AssertionError Added column does not sort as the last column

Today we will spend some time looking into two errors and see whether they are really something to be concerned about. The errors are thrown while apache cassandra version 1.0.8 is running. Okay, let's go to the first error.
ERROR [ReadStage:1559] 2012-10-16 20:38:25,336 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReadStage:1559,5,main]
java.lang.IncompatibleClassChangeError: vtable stub
at org.apache.cassandra.db.AbstractColumnContainer.getColumn(AbstractColumnContainer.java:134)
at org.apache.cassandra.db.Memtable$6.computeNext(Memtable.java:402)
at org.apache.cassandra.db.Memtable$6.computeNext(Memtable.java:384)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.db.CollationController.collectTimeOrderedData(CollationController.java:93)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1298)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1184)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1151)
at org.apache.cassandra.db.Table.getRow(Table.java:375)
at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:58)
at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:765)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1224)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

I guess this really is a fatal error. Let's check out what IncompatibleClassChangeError means. From the IncompatibleClassChangeError javadoc:

Thrown when an incompatible class change has occurred to some class definition. The definition of some class, on which the currently executing method depends, has since changed.

Honestly, this is really odd: the error was thrown out of nowhere and the jvm stopped. I am sure nobody changed the compiled apache cassandra code while the node instance was running. It certainly sounds odd, but I guess when the data held per node is huge, strange things start to happen. Now, let's get into the stack trace and follow its path.

The bottom three stack frames are pretty obvious: a new thread was started and executed by the thread pool executor. Then we have the class StorageProxy. Within this class, there is a static class LocalReadRunnable which implements the abstract method runMayThrow. It looks like it is trying to read a table row on the local node. The table eventually makes calls to the column family store to retrieve columns. Tracing even deeper, line 134 of the class AbstractColumnContainer reveals that no exception is thrown from the java code there. This is like a mystery! :) The stack trace analysis and the observed jvm stop show that something is wrong. I am not sure what went wrong, but if you have any idea, please discuss it in a comment below.

Next, we look at another error.
ERROR [CompactionExecutor:4] 2014-10-22 06:13:00,884 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[CompactionExecutor:4,1,main]
java.lang.AssertionError: Added column does not sort as the last column
at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:126)
at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:129)
at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:124)
at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:148)
at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:232)
at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:110)
at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:97)
at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:137)
at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:102)
at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:87)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:135)
at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:115)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Again, this error popped up out of nowhere; during normal node operation, this exception was just thrown. So now, we will look into what happened using the given stack trace. Similar to the above, the bottom three stack frames are pretty obvious: a new thread was started and executed by the thread pool executor. It is a compaction thread, and while compacting sstables, things break. Rows get reduced and columns get deserialized, and when the columns are re-assembled, an exception is thrown because a column does not sort after the last column. This is another strange phenomenon; the columns should have been in sorted order anyway, but they were not. The javadoc for the method addColumn:

AddColumn throws an exception if the column added does not sort after the last column in the map. The reasoning is that this implementation can get slower if too much insertions are done in unsorted order and right now we only use it when *all* insertion (with this method) are done in sorted order. The assertion throwing is thus a protection against performance regression without knowing about (we can revisit that decision later if we have use cases where most insert are in sorted order but a few are not).

It seems this could probably be reproduced in a development environment by inserting a few columns out of sorted order. I guess this is not as fatal as the previous error. That's it for this analysis; please give your input or comments if you have a workaround.




 

UPDATE:

It seems that the second error is fixed in https://issues.apache.org/jira/browse/CASSANDRA-5856