Information Technology Blogs: garbage collector


Saturday, November 21, 2015

Java Garbage Collector

If you are a Java developer, Java garbage collection (GC) pops up from time to time in javadoc, online articles and online discussions. It is a hot and tough topic because it is an entirely different paradigm from what programmers usually do, which is coding. The Java GC frees, in the background, the heap memory used by the objects you create in your classes. In the past I have covered a few articles related to Java GC, and today I am going to go through several blogs/articles I found online, learn the basics and share what I've learned. Hopefully Java GC will become clearer for Java programmers.

When you start a Java application, the JVM reserves memory from the operating system, sized according to the parameters passed to java; this memory is known as the heap. The heap is further divided into several regions: eden, the survivor spaces, old gen and perm gen. In Oracle Java 8 HotSpot, perm gen has been removed (class metadata now lives in Metaspace), so be sure to always check the official documentation on the garbage collector for changes. Below are a few links for the HotSpot implementation of Java GC.

The survivor space is divided into two, survivor 0 and survivor 1. Eden and the survivor spaces are collectively known as the young generation (or new generation), while old gen is also known as the tenured generation. Garbage collection happens on both the young generation and the old generation. Below are two diagrams showing how the heap regions are divided.
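To see these regions on a running JVM, here is a minimal sketch using the standard java.lang.management API. It simply lists every memory pool the JVM reports; the pool names depend on the JVM version and collector (for example "PS Eden Space" with the parallel collector, or "Metaspace" instead of perm gen on Java 8). The class name HeapRegions is hypothetical, chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapRegions {
    // Returns one line per memory pool the running JVM reports.
    // Heap pools correspond to eden, survivor and old gen; non-heap pools
    // include perm gen (Java 7 and earlier) or Metaspace (Java 8+).
    public static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            // getMax() is -1 when the pool has no defined maximum
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "K";
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            lines.add(String.format("%-8s %-32s used=%dK committed=%dK max=%s",
                    kind, pool.getName(), usage.getUsed() / 1024,
                    usage.getCommitted() / 1024, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

Running it with different collectors (see the flags further down) is a quick way to see that the region names change while the eden/survivor/old layout stays the same.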

While the concept of garbage collection is the same across JVMs, the implementation is not, and neither are the default settings or how to tune them. Well-known JVMs include Oracle (Sun) HotSpot, Oracle JRockit and IBM J9. You can find a list of other JVMs here. Essentially, garbage collection runs on the young generation and the old generation to remove objects on the heap that are no longer reachable through any valid reference.

Common Java parameter settings (for the full list of non-standard options, run the command java -X):

-Xms initial java heap size

-Xmx maximum java heap size

-Xmn the size of the heap for the young generation
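One way to check what -Xms and -Xmx actually gave you is to ask the runtime. The sketch below uses the standard Runtime methods; maxMemory() is close to -Xmx (it can be slightly lower, and is Long.MAX_VALUE if there is no limit), and totalMemory() is the currently committed heap, which typically starts near -Xms. The class name HeapSizes is hypothetical.

```java
public class HeapSizes {
    private static final long MB = 1024L * 1024L;

    // Compares the current heap with the limits set by -Xms and -Xmx.
    public static String summary() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();         // close to -Xmx; Long.MAX_VALUE if unlimited
        long committed = rt.totalMemory(); // heap currently committed; typically starts near -Xms
        long free = rt.freeMemory();       // unused part of the committed heap
        long used = committed - free;
        return String.format("used=%dMB committed=%dMB max=%dMB",
                used / MB, committed / MB, max / MB);
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

For example, java -Xms256m -Xmx512m HeapSizes should report a max of roughly 512MB. The -Xmn setting is not visible through Runtime; it shows up in the sizes of the young generation memory pools instead.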

There are a few types of GC:

- serial gc

- parallel gc

- parallel old gc

- cms (concurrent mark sweep) gc

- g1 (garbage first) gc

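You can specify which GC implementation the JVM runs on the Java heap with a command-line flag: -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseParallelOldGC, -XX:+UseConcMarkSweepGC or -XX:+UseG1GC. Availability depends on the JDK version; for example, CMS was removed in JDK 14.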
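To confirm which collector a JVM is actually running, you can list its GarbageCollectorMXBeans. The name-to-type mapping below reflects the names HotSpot commonly reports (for example ParNew and ConcurrentMarkSweep for CMS); treat it as an assumption, since JRockit, J9 and newer JDKs use other names. The class ActiveCollectors and its gcType helper are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    // Maps collector names as commonly reported by HotSpot to the GC types
    // listed above. Young and old collector names are mapped independently.
    public static String gcType(String collectorName) {
        switch (collectorName) {
            case "Copy":
            case "MarkSweepCompact":
                return "serial gc";
            case "PS Scavenge":
            case "PS MarkSweep":
                return "parallel / parallel old gc";
            case "ParNew":
            case "ConcurrentMarkSweep":
                return "cms gc";
            case "G1 Young Generation":
            case "G1 Old Generation":
                return "g1 gc";
            default:
                return "other";
        }
    }

    // One line per collector bean, e.g. "G1 Young Generation -> g1 gc".
    public static List<String> describe() {
        List<String> out = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.add(gc.getName() + " -> " + gcType(gc.getName()));
        }
        return out;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Running it once per flag, for example java -XX:+UseSerialGC ActiveCollectors, shows how each choice changes the reported collectors.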

If you run a server application, the metrics exposed by the GC are definitely something to watch. To get these metrics, you can use

* jstat

* gc logging
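Besides jstat (for example jstat -gcutil with a process id and a sampling interval) and GC logging, the same counters are available inside the process. The sketch below sums collection counts and times over all GarbageCollectorMXBeans and divides by JVM uptime to get a rough GC overhead. Depending on the collector, the reported time may include concurrent phases, so this is an indicator rather than a pure pause figure. The class name GcOverhead is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    // Total number of collections across all collectors since JVM start.
    public static long totalCollections() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 if undefined for this collector
            if (c > 0) {
                count += c;
            }
        }
        return count;
    }

    // Accumulated collection time divided by JVM uptime (0.0 .. ~1.0).
    public static double overhead() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis > 0 ? (double) gcMillis / uptimeMillis : 0.0;
    }

    public static void main(String[] args) {
        // Create short-lived garbage so the young collector has some work to do.
        long allocated = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] chunk = new byte[1024];
            allocated += chunk.length;
        }
        System.out.printf("allocated=%dK collections=%d overhead=%.3f%%%n",
                allocated / 1024, totalCollections(), overhead() * 100);
    }
}
```

On a server you would sample these values periodically and alert on the change between samples rather than the lifetime totals.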

That's it for this brief introduction.

Friday, July 31, 2015

Attempting to understand java garbage collection statistics

If you have been developing large Java applications, troubleshooting can at times go as deep as looking into the garbage collector while the application is running. Unfortunately, there are simply too many statistics to know where to begin investigating or how to understand them. At least for me, it is pretty tedious, so if you come across this article, I would appreciate your help too; please leave a comment.

Very little documentation describes how these statistics should be interpreted. There is this post on an Oracle blog dated 2006; it is rather outdated, but it nonetheless analyzes the output line by line. More recent articles from Alexey Ragozin and Poonam Bajaj are worth a look too.

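You should be able to reproduce these GC statistics by adding these parameters to the java command line: -XX:+PrintGCDetails -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1. The heap dumps, the tenuring distribution and the stopped-time line in the snippet suggest that -XX:+PrintHeapAtGC, -XX:+PrintTenuringDistribution and -XX:+PrintGCApplicationStoppedTime were enabled as well. The following snippets were extracted from a production machine. Let's take a look at them line by line.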

 Before GC:  
 Statistics for BinaryTreeDictionary:  
 ------------------------------------  
 Total Free Space: 230400  
 Max  Chunk Size: 230400  
 Number of Blocks: 1  
 Av. Block Size: 230400  
 Tree   Height: 1  
 586945.492: [ParNew  
 Desired survivor size 41943040 bytes, new threshold 1 (max 1)  
 - age  1:  10038008 bytes,  10038008 total  
 : 660426K->10292K(737280K), 0.0353470 secs] 9424156K->8774094K(12500992K)After GC:  
 Statistics for BinaryTreeDictionary:  
 ------------------------------------  
 Total Free Space: 127053189  
 Max  Chunk Size: 21404293  
 Number of Blocks: 125654  
 Av. Block Size: 1011  
 Tree   Height: 36  
   
   
   
 After GC:  
 Statistics for BinaryTreeDictionary:  
 ------------------------------------  
 Total Free Space: 230400  
 Max  Chunk Size: 230400  
 Number of Blocks: 1  
 Av. Block Size: 230400  
 Tree   Height: 1  
 , 0.0359540 secs] [Times: user=0.26 sys=0.00, real=0.03 secs]   
 Heap after GC invocations=550778 (full 2090):  
  par new generation  total 737280K, used 10292K [0x00000004fae00000, 0x000000052ce00000, 0x000000052ce00000)  
  eden space 655360K,  0% used [0x00000004fae00000, 0x00000004fae00000, 0x0000000522e00000)  
  from space 81920K, 12% used [0x0000000522e00000, 0x000000052380d360, 0x0000000527e00000)  
  to  space 81920K,  0% used [0x0000000527e00000, 0x0000000527e00000, 0x000000052ce00000)  
  concurrent mark-sweep generation total 11763712K, used 8763801K [0x000000052ce00000, 0x00000007fae00000, 0x00000007fae00000)  
  concurrent-mark-sweep perm gen total 40952K, used 24563K [0x00000007fae00000, 0x00000007fd5fe000, 0x0000000800000000)  
 }  
 Total time for which application threads were stopped: 0.0675660 seconds  
 {Heap before GC invocations=550778 (full 2090):  
  par new generation  total 737280K, used 11677K [0x00000004fae00000, 0x000000052ce00000, 0x000000052ce00000)  
  eden space 655360K,  0% used [0x00000004fae00000, 0x00000004faf5a220, 0x0000000522e00000)  
  from space 81920K, 12% used [0x0000000522e00000, 0x000000052380d360, 0x0000000527e00000)  
  to  space 81920K,  0% used [0x0000000527e00000, 0x0000000527e00000, 0x000000052ce00000)  
  concurrent mark-sweep generation total 11763712K, used 8763801K [0x000000052ce00000, 0x00000007fae00000, 0x00000007fae00000)  
  concurrent-mark-sweep perm gen total 40952K, used 24563K [0x00000007fae00000, 0x00000007fd5fe000, 0x0000000800000000)
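The ParNew line packs two transitions: young generation before->after(capacity) and whole heap before->after(capacity). The memory freed from the young generation minus the memory freed from the whole heap is roughly what was promoted into the old generation, and heap-after minus young-after is the old generation occupancy. A minimal parser for that line, assuming exactly the format shown above (the class ParNewLine is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParNewLine {
    // young before->after(capacity), pause secs] heap before->after(capacity)
    private static final Pattern TRANSITION = Pattern.compile(
            "(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\] (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    public final long youngBefore, youngAfter, youngCapacity;
    public final long heapBefore, heapAfter, heapCapacity;
    public final double pauseSecs;

    private ParNewLine(Matcher m) {
        youngBefore = Long.parseLong(m.group(1));
        youngAfter = Long.parseLong(m.group(2));
        youngCapacity = Long.parseLong(m.group(3));
        pauseSecs = Double.parseDouble(m.group(4));
        heapBefore = Long.parseLong(m.group(5));
        heapAfter = Long.parseLong(m.group(6));
        heapCapacity = Long.parseLong(m.group(7));
    }

    public static ParNewLine parse(String line) {
        Matcher m = TRANSITION.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no ParNew transition in: " + line);
        }
        return new ParNewLine(m);
    }

    // Left the young gen but did not leave the heap => promoted to old gen.
    public long promotedK() {
        return (youngBefore - youngAfter) - (heapBefore - heapAfter);
    }

    // Old gen occupancy after the collection: whole heap minus young gen.
    public long oldAfterK() {
        return heapAfter - youngAfter;
    }

    public static void main(String[] args) {
        ParNewLine p = parse(": 660426K->10292K(737280K), 0.0353470 secs] "
                + "9424156K->8774094K(12500992K)After GC:");
        System.out.println("promoted ~" + p.promotedK() + "K, old gen after ~"
                + p.oldAfterK() + "K, pause " + p.pauseSecs + " s");
    }
}
```

For the line above this gives (660426 - 10292) - (9424156 - 8774094) = 650134 - 650062 = 72K promoted, and an old generation of 8774094 - 10292 = 8763802K, which agrees with the 8763801K in the "Heap after GC" section to within rounding. The estimate assumes nothing else freed old generation space during the pause.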

We can summarize the statistics above with the following points.

* the statistics above are generated by Java HotSpot, and the source code that prints them can be found here https://github.com/openjdk-mirror/jdk7u-hotspot/blob/master/src/share/vm/gc_implementation/concurrentMarkSweep/binaryTreeDictionary.cpp#L1098-L1112

* there are two sets of statistics, before gc and after gc, and this is not a full gc but a ParNew (young generation) collection.

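* before gc, the max chunk size is equal to the total free space and the number of blocks is 1, so that free space sits in a single contiguous chunk, i.e. it is not fragmented. This by itself does not tell us whether the space is in use.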

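* after gc, the other dictionary (presumably the CMS old generation) shows a total free space of 127053189 with a max chunk size of only 21404293, spread over 125654 blocks with a tree height of 36, so that free space is heavily fragmented.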

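* after gc, the Times line shows user=0.26 (CPU seconds summed over all GC threads), sys=0.00 and real=0.03 (wall-clock seconds). user being much larger than real means the collection ran on several threads in parallel.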

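* after gc, the from survivor space is 12% used (10292K of 81920K, matching the young generation's 10292K after collection). That is 12% of the survivor space, not of the whole heap.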

* after gc, the concurrent mark sweep (old) generation has a total of 11,763,712K, of which 8,763,801K (about 75%) is used, while the concurrent mark sweep permanent generation has a total of 40,952K and uses only 24,563K.

* application threads were stopped for a total of 0.0675660 seconds (about 68 ms).
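The two free-list observations above can be put into numbers. The ratio below (one minus max chunk size over total free space) is a simple heuristic of my own for illustration, not a metric HotSpot prints: 0.0 means all free space is one contiguous chunk, values close to 1.0 mean it is split into many small chunks. The class FlsFragmentation is hypothetical; inputs are in whatever units the log uses.

```java
public class FlsFragmentation {
    // Hypothetical heuristic: share of free space NOT in the largest chunk.
    public static double fragmentation(long totalFreeSpace, long maxChunkSize) {
        if (totalFreeSpace <= 0) {
            return 0.0;
        }
        return 1.0 - (double) maxChunkSize / totalFreeSpace;
    }

    // Same as the "Av. Block Size" line: total free space / number of blocks.
    public static long averageBlock(long totalFreeSpace, long numberOfBlocks) {
        return numberOfBlocks == 0 ? 0 : totalFreeSpace / numberOfBlocks;
    }

    public static void main(String[] args) {
        // Figures copied from the two dictionaries in the snippet above.
        System.out.printf("single-block dictionary: %.2f%n", fragmentation(230400, 230400));
        System.out.printf("large dictionary:        %.2f%n", fragmentation(127053189, 21404293));
        System.out.println("average block: " + averageBlock(127053189, 125654));
    }
}
```

The single-block dictionary scores 0.00 and the large one about 0.83, while the average block of 1011 reproduces the "Av. Block Size" line. A value that keeps climbing over time is the kind of fragmentation that can eventually lead to a promotion failure under CMS.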

So we can reasonably guess that this GC snippet looks healthy. It is not a full GC, and usage does not reach 100%. No failure or error appears anywhere. The total stop time is trivial too, at about 68 ms.

That's it. If you think this analysis is wrong or can be improved upon, please leave a message below. I would like to learn more too.

Friday, October 18, 2013

Allocate jvm heap more than 8GB for cassandra

What happens if you allocate more than 8GB of JVM heap to a Cassandra instance? In my past experience, we allocated more than 16GB to Cassandra instances and they still ran fine, but occasionally we encountered performance issues when we increased the heap beyond 16GB. After a little googling, I found this doc:

Excessive heap space size

DataStax recommends using the default heap space size for most use 
cases. Exceeding this size can impair the Java virtual machine's 
(JVM) ability to perform fluid garbage collections (GC). The 
following table shows a comparison of heap space performances 
reported by a Cassandra user:

Heap     CPU utilization   Queries per second      Latency
40 GB    50%               750                     1 second
8 GB     5%                8500 (not maxed out)    10 ms

For information on heap sizing, see Tuning Java resources.
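The "default heap space size" in the DataStax doc is computed by the calculate_heap_sizes function in cassandra-env.sh. Below is a Java transcription of that heuristic as I understand it from the Cassandra 1.x/2.x era scripts, max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB)). Treat the formula as an assumption and check the script shipped with your version; the class CassandraDefaultHeap is hypothetical.

```java
public class CassandraDefaultHeap {
    // Assumed transcription of calculate_heap_sizes() from cassandra-env.sh
    // (1.x/2.x era): max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB)).
    public static long maxHeapMb(long systemMemoryMb) {
        long half = systemMemoryMb / 2;
        long quarter = half / 2;
        if (half > 1024) {
            half = 1024;     // 1/2 RAM, capped at 1GB
        }
        if (quarter > 8192) {
            quarter = 8192;  // 1/4 RAM, capped at 8GB
        }
        return Math.max(half, quarter);
    }

    public static void main(String[] args) {
        for (long ramMb : new long[] {2048, 16384, 65536}) {
            System.out.println(ramMb + "MB RAM -> " + maxHeapMb(ramMb) + "MB default max heap");
        }
    }
}
```

Under this heuristic a 2GB machine gets 1024MB, a 16GB machine 4096MB and a 64GB machine 8192MB; no amount of RAM pushes the default past 8GB, which is consistent with the 8GB figure discussed in this post.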

As the benchmark indicates, the larger heap came with much higher CPU usage (50% versus 5%), roughly 11 times fewer queries per second and 100 times higher latency. The heap grew only 5 times, so the performance drop is far out of proportion to the heap increase, although two data points are not enough to say whether the relationship is exponential. So it is wise to keep the heap at 8GB or not more than 50% of that value. It is not fatal, but it certainly decreases the performance of the cluster dramatically, which can render it useless. If you encounter memory errors in the log in this situation, then, apart from other factors, it is better to consider scaling your cluster horizontally, that is, adding more nodes to increase capacity. But a quick workaround should you encounter memory error, the

So what really happens in the GC if a large heap is allocated? Well, here is an excerpt from the guru:

..the concurrent mark/sweep phase runs concurrently with your
application. CMS will cause a stop-the-world full pause if it fails to
complete a CMS sweep in time and you hit the maximum heap size, but
unless that happens, CMS will run concurrently (though there are
stop-the-world pauses involved, that are typically very short, the
mark/sweep phase is concurrent).

Hence, if you do hit a stop-the-world situation, the node is rendered useless: it is so busy doing GC that Cassandra cannot do its work.

http://www.mail-archive.com/user@cassandra.apache.org/msg17481.html
http://www.mail-archive.com/user@cassandra.apache.org/msg32312.html
