Friday, February 27, 2015

How to determine the currently occupied queue size and cache usage in Elasticsearch 0.90

Have you ever had an Elasticsearch client get a rejection exception from the cluster while it was indexing? Or have you cached some filters in your queries and wanted to know how much memory they are using at the moment? If so, and you are running Elasticsearch 0.90, you have come to the right place. I'm going to show you how to get these statistics through the metrics API that Elasticsearch exposes. They are important if you want to judge the health of your cluster.

Okay, let's start with the first one: how to get the occupied queue size on a node in the cluster.
[jason@node009 ~]$ curl -XGET 'http://localhost:9200/_nodes/node009/stats/thread_pool?pretty'
{
"cluster_name" : "MY_TEST_Cluster",
"nodes" : {
"1111111111111111111111" : {
"timestamp" : 1422372473667,
"name" : "node009",
"transport_address" : "inet[my.private.ip.com/1.2.3.4:9300]",
"hostname" : "node009.foobar.com",
"thread_pool" : {
"generic" : {
"threads" : 1,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 82,
"completed" : 6378594
},
"index" : {
"threads" : 8,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 8,
"completed" : 25735782
},
"get" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"snapshot" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 4,
"completed" : 1003286
},
"merge" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 4,
"completed" : 4863710
},
"suggest" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"bulk" : {
"threads" : 8,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 8,
"completed" : 42148
},
"optimize" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"warmer" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 4,
"completed" : 2087615
},
"flush" : {
"threads" : 3,
"queue" : 0,
"active" : 1,
"rejected" : 0,
"largest" : 4,
"completed" : 10492
},
"search" : {
"threads" : 512,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 512,
"completed" : 245843
},
"percolate" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"management" : {
"threads" : 5,
"queue" : 0,
"active" : 1,
"rejected" : 0,
"largest" : 5,
"completed" : 2082438
},
"refresh" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 4,
"completed" : 1521727
}
}
}
}
}

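As you can read above, these are node statistics, restricted to the thread pool stats. If the client is actively indexing into Elasticsearch, the thread pool you should look at is index: queue is the number of requests currently waiting, and rejected is the number of requests that were turned away. The sample above looks pretty good: nothing is queued and nothing has been rejected.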
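To avoid scanning that output by eye, the check described above can be scripted. The following is a minimal Python sketch, not an official client: the function names `fetch_thread_pool_stats` and `busy_pools` are made up for illustration, and the default host and node name are simply the ones used in the curl call in this post. It assumes the response has the same shape as the sample output.

```python
import json
from urllib.request import urlopen


def fetch_thread_pool_stats(base_url="http://localhost:9200", node="node009"):
    """Fetch thread pool stats for one node (same URL as the curl call in this post).

    base_url and node are assumptions; adjust them to your own cluster.
    """
    url = "%s/_nodes/%s/stats/thread_pool" % (base_url, node)
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def busy_pools(stats):
    """Return (node_name, pool_name, queue, rejected) for every thread pool
    that has a non-zero queue or a non-zero rejected count."""
    findings = []
    for node in stats.get("nodes", {}).values():
        pools = node.get("thread_pool", {})
        for pool_name in sorted(pools):
            queue = pools[pool_name].get("queue", 0)
            rejected = pools[pool_name].get("rejected", 0)
            if queue > 0 or rejected > 0:
                findings.append((node.get("name"), pool_name, queue, rejected))
    return findings
```

Calling `busy_pools(fetch_thread_pool_stats())` against a node that looks like the sample above should return an empty list, since every pool there has a queue and rejected count of 0.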

Next, we will take a look at cache usage. Use the same node stats API, but change thread_pool to indices.
[jason@node009 ~]$ curl -XGET 'http://localhost:9200/_nodes/node009/stats/indices?pretty'
{
"cluster_name" : "MY_TEST_Cluster",
"nodes" : {
"1111111111111111111111" : {
"timestamp" : 1422373322128,
"name" : "node009",
"transport_address" : "inet[my.private.ip.com/1.2.3.4:9300]",
"hostname" : "node009.foobar.com",
"indices" : {
"docs" : {
"count" : 134502646,
"deleted" : 104806463
},
"store" : {
"size" : "340.9gb",
"size_in_bytes" : 366092384499,
"throttle_time" : "2.1ms",
"throttle_time_in_millis" : 2
},
"indexing" : {
"index_total" : 25692998,
"index_time" : "6.8h",
"index_time_in_millis" : 24495073,
"index_current" : 22015,
"delete_total" : 13217673,
"delete_time" : "14.6m",
"delete_time_in_millis" : 877101,
"delete_current" : 0
},
"get" : {
"total" : 0,
"get_time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"open_contexts" : 1,
"query_total" : 204027,
"query_time" : "1.9h",
"query_time_in_millis" : 6856699,
"query_current" : 0,
"fetch_total" : 34409,
"fetch_time" : "2.4m",
"fetch_time_in_millis" : 148210,
"fetch_current" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 563950,
"total_time" : "7.5h",
"total_time_in_millis" : 27166425,
"total_docs" : 219150610,
"total_size" : "374.6gb",
"total_size_in_bytes" : 402324404903
},
"refresh" : {
"total" : 1522907,
"total_time" : "10.5h",
"total_time_in_millis" : 38110904
},
"flush" : {
"total" : 10499,
"total_time" : "2.4h",
"total_time_in_millis" : 8726951
},
"warmer" : {
"current" : 0,
"total" : 2089352,
"total_time" : "9.3m",
"total_time_in_millis" : 560985
},
"filter_cache" : {
"memory_size" : "2.8gb",
"memory_size_in_bytes" : 3011274608,
"evictions" : 35449
},
"id_cache" : {
"memory_size" : "0b",
"memory_size_in_bytes" : 0
},
"fielddata" : {
"memory_size" : "140.5mb",
"memory_size_in_bytes" : 147415629,
"evictions" : 86803
},
"completion" : {
"size" : "231b",
"size_in_bytes" : 231
},
"segments" : {
"count" : 700
}
}
}
}
}

As seen above, there are two cache metrics, filter_cache and id_cache. They make it pretty clear how much memory each cache uses on this node and, for the filter cache, how many evictions have happened. There is also a fielddata metric, which is memory occupied in the JVM; you might want to keep an eye on it during monitoring. If you want to know exactly which field is using how much memory, you can use this API:
curl -XGET 'http://localhost:9200/_nodes/stats/indices/fielddata/field1,field2?pretty'

But I will leave this one for you to play with as homework. Hint: replace field1 and field2 with the names of fields you index, then read the output. That's it. :-)
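If you want to track the node-level cache figures over time, they can be extracted from the parsed indices stats response. This is a rough Python sketch under the assumption that the response looks like the sample output in this post; `cache_summary` is a hypothetical helper name, not part of Elasticsearch or any client library.

```python
def cache_summary(stats):
    """Map each node name to the memory use (bytes) and eviction count of
    filter_cache, id_cache and fielddata, as reported under "indices".

    evictions is None when the section does not report it
    (id_cache has no evictions field in the sample output).
    """
    summary = {}
    for node in stats.get("nodes", {}).values():
        indices = node.get("indices", {})
        caches = {}
        for cache_name in ("filter_cache", "id_cache", "fielddata"):
            section = indices.get(cache_name, {})
            caches[cache_name] = {
                "memory_size_in_bytes": section.get("memory_size_in_bytes", 0),
                "evictions": section.get("evictions"),
            }
        summary[node.get("name")] = caches
    return summary
```

Feed it the JSON returned by the `_nodes/node009/stats/indices` call after parsing it with `json.loads`. A steadily growing evictions count between two calls is the kind of thing worth watching.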
