Sunday, June 7, 2015

code path learning on elasticsearch monitoring jvm logging

Today we are going to study the following info logging from elasticsearch 0.90.7.

1:  [2015-03-25 00:45:07,008][INFO ][monitor.jvm       ] [node03] [gc][ParNew][649825][391294] duration [829ms], collections [1]/[1.1s], total [829ms]/[2.8h], memory [8.4gb]->[8.5gb]/[14gb], all_pools {[Code Cache] [14.4mb]->[14.4mb]/[48mb]}{[Par Eden Space] [34.4mb]->[460.1kb]/[133.1mb]}{[Par Survivor Space] [16.6mb]->[16.6mb]/[16.6mb]}{[CMS Old Gen] [8.3gb]->[8.4gb]/[13.8gb]}{[CMS Perm Gen] [43.5mb]->[43.5mb]/[82mb]}  
2:  [2015-03-25 00:45:11,529][INFO ][monitor.jvm       ] [node03] [gc][ParNew][649829][391299] duration [921ms], collections [1]/[1.3s], total [921ms]/[2.8h], memory [8.7gb]->[8.8gb]/[14gb], all_pools {[Code Cache] [14.4mb]->[14.4mb]/[48mb]}{[Par Eden Space] [203.2kb]->[4.1mb]/[133.1mb]}{[Par Survivor Space] [16.6mb]->[16.6mb]/[16.6mb]}{[CMS Old Gen] [8.7gb]->[8.8gb]/[13.8gb]}{[CMS Perm Gen] [43.5mb]->[43.5mb]/[82mb]}  
3:  [2015-03-25 00:45:13,800][INFO ][monitor.jvm       ] [node03] [gc][ParNew][649831][391301] duration [744ms], collections [1]/[1.1s], total [744ms]/[2.8h], memory [8.9gb]->[9gb]/[14gb], all_pools {[Code Cache] [14.4mb]->[14.4mb]/[48mb]}{[Par Eden Space] [5.7mb]->[537.8kb]/[133.1mb]}{[Par Survivor Space] [16.6mb]->[16.6mb]/[16.6mb]}{[CMS Old Gen] [8.9gb]->[9gb]/[13.8gb]}{[CMS Perm Gen] [43.5mb]->[43.5mb]/[82mb]}  
4:  [2015-03-25 00:45:15,088][INFO ][monitor.jvm       ] [node03] [gc][ParNew][649832][391302] duration [891ms], collections [1]/[1.2s], total [891ms]/[2.8h], memory [9gb]->[9.1gb]/[14gb], all_pools {[Code Cache] [14.4mb]->[14.4mb]/[48mb]}{[Par Eden Space] [537.8kb]->[5.4mb]/[133.1mb]}{[Par Survivor Space] [16.6mb]->[16.6mb]/[16.6mb]}{[CMS Old Gen] [9gb]->[9.1gb]/[13.8gb]}{[CMS Perm Gen] [43.5mb]->[43.5mb]/[82mb]}  
5:  [2015-03-25 00:45:17,287][INFO ][monitor.jvm       ] [node03] [gc][ParNew][649834][391304] duration [770ms], collections [1]/[1.1s], total [770ms]/[2.8h], memory [9.2gb]->[9.3gb]/[14gb], all_pools {[Code Cache] [14.4mb]->[14.4mb]/[48mb]}{[Par Eden Space] [359.7kb]->[357.3kb]/[133.1mb]}{[Par Survivor Space] [16.6mb]->[16.6mb]/[16.6mb]}{[CMS Old Gen] [9.2gb]->[9.3gb]/[13.8gb]}{[CMS Perm Gen] [43.5mb]->[43.5mb]/[82mb]}  
6:  [2015-03-25 00:45:18,531][INFO ][monitor.jvm       ] [node03] [gc][ParNew][649835][391305] duration [713ms], collections [1]/[1.2s], total [713ms]/[2.8h], memory [9.3gb]->[9.4gb]/[14gb], all_pools {[Code Cache] [14.4mb]->[14.4mb]/[48mb]}{[Par Eden Space] [357.3kb]->[441.9kb]/[133.1mb]}{[Par Survivor Space] [16.6mb]->[16.6mb]/[16.6mb]}{[CMS Old Gen] [9.3gb]->[9.4gb]/[13.8gb]}{[CMS Perm Gen] [43.5mb]->[43.5mb]/[82mb]}  

Okay, before we go into this codes, let's analyze based on just the output above. We have info logging, nothing very seriously but as a sign we should already pay attention. Happened on one of the node, node03 with class file from package monitor.jvm. Let's take the first line in the log above and format nicely and so we can analyze further. Read below.

 [2015-03-25 00:45:07,008][INFO ][monitor.jvm       ] [node03] [gc]  
   
 [ParNew][649825][391294] duration [829ms],   
 collections [1]/[1.1s],   
 total [829ms]/[2.8h],   
 memory [8.4gb]->[8.5gb]/[14gb],   
 all_pools {[Code Cache]          [14.4mb]->[ 14.4mb ] / [   48mb]}  
           {[Par Eden Space]      [34.4mb]->[460.1kb ] / [133.1mb]}  
           {[Par Survivor Space]  [16.6mb]->[ 16.6mb ] / [ 16.6mb]}  
           {[CMS Old Gen]         [8.3gb] ->[  8.4gb ] / [ 13.8gb]}  
           {[CMS Perm Gen]        [43.5mb]->[ 43.5mb ] / [   82mb]}  

From just the above output, it become even clearer, the gc par new collection duration is 829milliseconds and one collection of 1.1seconds. The heap before was 8.4GB and after collected became 8.5GB with a total heap of 14GB. Then we have all the pools jvm statistics revealed. Pretty obvious, before and after with total memory heap assigned for the pool respectively.

Now, we will read into the code and verify if our assumption are valid. The class that log this is JvmMonitorService.java and we noticed it fall in the if else second evaluation clause.

Again, interpretation based on the code, it is an info logging with gc name is par new. Sequence of this gc is 649825. In this collection, total collection for par new is 391294 and its total collections time is 391294 milliseconds (6 minutes 31 seconds). Note that this collections time is derived with the difference in the last gc and this gc.

Next, the collections total counts between last count and this count. It is follow by the collections time in milliseconds. In this example, we have one collections performed in  1.1seconds.. hmm... Then we have the total time in milliseconds (829 milliseconds) with the gc collections time and in the output, it is showing 2.8 hours!

For the next jvm statistics, we have memory/heap statistics. So it is different than we interpret based on the log before, this statistics shown in the last jvm memory used and then current jvm used. The different is about one GB. The total memory/heap allocated for this jvm is 14GB.

The last statistics is the jvm pool statistics. Notice that the pool statistics is built such that the previous jvm collection run and this jvm collection run were built. The logic of output is such that the previous jvm pool used and then current jvm pool used with current pool max heap allocated. Let's take one from the pool for discussion, {[CMS Old Gen]        [8.3gb] ->[  8.4gb ] / [ 13.8gb]} . So cms old gen pool previous jvm usage was 8.3gb and in this jvm collection, the current usage is 8.4GB. This cms old gen max used is 13.8GB.

That's it and if you think this analysis is not correct, please leave your comment below and make the necessary correction.

No comments:

Post a Comment