Showing posts with label gc. Show all posts
Showing posts with label gc. Show all posts

Saturday, November 21, 2015

Java Garbage Collector

If you are a java developer, java garbage collection (gc) sometime pop up from time to time in javadoc, online article or online discussion. It is such a hot and tough topic because that is entirely different paradigm than what programmer usually do, that is coding. Java gc free heap for the object you created in class in the background. In the past, I also cover a few article which related to java gc and today I am thinking to go through several blogs/articles which I found online, learn the basic and share what I've learned  and hopefully for java programmer, java gc will become clearer.

When you start a java application, with the parameters that are assigned to the java, the operating system will reserved some memory for java application known as heap. The heap further divided into several regions collectively known as eden, survivor spaces, old gen and perm gens. In oracle java8 hotspot, perm gen has been removed, be sure to always check official documention on garbage collector for changes. Below are a few links for hotspot implementation for java gc.
Survivor spaces are divided into two, survivor 0 and survivor 1. Both eden and survivor spaces collectively known as Young generation or new generation whilst old gen also known as tenured generation. Garbage collections will happened on young generation and old generations. Below are two diagrams show the heap regions are divided.

While the concept of Garbage Collection is the same, the implementation is not and neither are the default settings or how to tune it. The well known jvm includes the oracle sun hotspot, oracle jrockit and ibm j9. You can find the other jvm lists here. Essentially garbage collection will perform on young generation and old generation to remove object on heap that has no valid reference.

common java parameters settings. For full list, issue the command java -X

-Xms initial java heap size
-Xmx maximum java heap size
-Xmn the size of the heap for the young generation

There are a few type of GC
- serial gc
- parallel gc
- parallel old gc
- cms gc 

You can specify what gc implementation to run on the java heap region.

If you run a server application, the metric exposed by gc is definitely to watch out for. In order to get the metric, you can use

That's it for this brief introduction.

Sunday, October 25, 2015

Learning Java Eden Space

If you have been a java developer and you should came across java garbage collection that free the object created by your application from occupied all the java heap. In today article, we will look into java heap and particular into java eden space. First, let's look at the general java heap.

From this StackOverflow

Heap memory

The heap memory is the runtime data area from which the Java VM allocates memory for all class instances and arrays. The heap may be of a fixed or variable size. The garbage collector is an automatic memory management system that reclaims heap memory for objects.

Eden Space: The pool from which memory is initially allocated for most objects.

Survivor Space: The pool containing objects that have survived the garbage collection of the Eden space.

Tenured Generation: The pool containing objects that have existed for some time in the survivor space.

When you created a new object, jvm allocate a part of the heap for your object. Visually, it is something as of following.

                   |     |  
   <-minor gc->    v     v   <------------- major gc---------------------->  
   |            |     |     |                                              |             |
   | Eden       | S0  | S1  |  Tenure Generation                           | Perm gen    |
   |            |     |     |                                              |             |
    <---------------------jvm heap (-Xms -Xmx)----------------------------> -XX:PermSize  
    <-- young gen(-Xmn)---->                                                -XX:MaxPermSize  

When eden space is fill with object and minor gc is performed, survive objects will copy to either survivor spaces; s0 or s1. At a time, one of the survivor space is empty. Because the eden space are relatively small in comparison to the tenure generation, hence, the gc that happened in eden space is quick.  Eden and both survivors spaces are also known as young or new generation.

To understand into how young generation heap get free, this article provided detail explanation.

The Sun/Oracle HotSpot JVM further divides the young generation into three sub-areas: one large area named "Eden" and two smaller "survivor spaces" named "From" and "To". As a rule, new objects are allocated in "Eden" (with the exception that if a new object is too large to fit into "Eden" space, it will be directly allocated in the old generation). During a GC, the live objects in "Eden" first move into the survivor spaces and stay there until they have reached a certain age (in terms of numbers of GCs passed since their creation), and only then they are transferred to the old generation. Thus, the role of the survivor spaces is to keep young objects in the young generation for a little longer than just their first GC, in order to be able to still collect them quickly should they die soon afterwards.
Based on the assumption that most of the young objects may be deleted during a GC, a copying strategy ("copy collection") is being used for young generation GC. At the beginning of a GC, the survivor space "To" is empty and objects can only exist in "Eden" or "From". Then, during the GC, all objects in "Eden" that are still being referenced are moved into "To". Regarding "From", the still referenced objects in this space are handled depending on their age. If they have not reached a certain age ("tenuring threshold"), they are also moved into "To". Otherwise they are moved into the old generation. At the end of this copying procedure, "Eden" and "From" can be considered empty (because they only contain dead objects), and all live objects in the young generation are located in "To". Should "to" fill up at some point during the GC, all remaining objects are moved into the old generation instead (and will never return). As a final step, "From" and "To" swap their roles (or, more precisely, their names) so that "To" is empty again for the next GC and "From" contains all remaining young objects.

As you can observed based on the visual diagram above, you can set the amount of heap for the eden and survivor space using -Xmn in the java parameter. There is also -XX:SurvivorRatio=ratio and you can find further information here for java8. Note that in the diagram above, Perm gen has been removed in java8, hence always refer find out what java run your application and refer to the right version of java documentation.

If you want to monitor the statistics of eden , you can use jstats. Previously I have written an article about jstat and you can read here what is jstat and how to use it. You can also enable gc log statistics and so jvm will write the gc statistics into a file, you can further read more here.

Till then we meet again in the next article. Please consider donate, thank you!

Saturday, April 26, 2014

study gc parameters in cassandra 1.0.8

Today we are going to study the GC parameter in the file
. Below are the GC parameter extracted from cassandra 1.0.8 environment file . So let's study them one by one what is the parameter means and what can be change.
# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

# GC logging options -- uncomment to enable
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
# JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
# JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
# JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"
# JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"


Use parallel algorithm for young space collection.


Use Concurrent Mark-Sweep GC in the old generation


Ratio of eden/survivor space size. The default value is 8


Max value for tenuring threshold.


Percentage CMS generation occupancy to start a CMS collection cycle (A negative value means that CMSTirggerRatio is used).


Only use occupancy as a criterion for starting a CMS collection.



Print more elaborated GC info


Print date stamps at garbage collection events (e.g. 2011-09-08T14:20:29.557+0400: [GC... )


heap layout before and after each GC


Print detailed demography of young space after each collection


Print the time the application has been stopped


Print additional diagnostic information following promotion failure


Print additional info concerning free lists


Redirects GC output to file instead of console

The first part of GC tuning is geared toward which GC strategy to use in cassandra. The second GC tuning is more toward fine tune GC logging example timestamp, heap layaout, etc. If you want to get even more challenging, I end this article by providing a few good links for your further references.

Last but not least, if you are happy reading this and learn something, please remember to donate too.

Friday, October 18, 2013

Allocate jvm heap more than 8GB for cassandra

What happen if you allocate jvm heap more than 8GB for cassandra instance? With my past experience, we allocate more than 16GB for the cassandra instance and it is still running fine. But occasionally we encounter performance issue when we increase more than 16GB to the heap. Google a little and found this doc

Excessive heap space size

DataStax recommends using the default heap space size for most use
cases. Exceeding this size can impair the Java virtual machine's
(JVM) ability to perform fluid garbage collections (GC). The
following table shows a comparison of heap space performances
reported by a Cassandra user:

Heap CPU utilization Queries per second Latency
40 GB 50% 750 1 second
8 GB 5% 8500 (not maxed out) 10 ms

For information on heap sizing, see Tuning Java resources.

As the benchmark indicate, the more heap you allocate, the higher the cpu usage is. Though the performance decrease is not linear but rather exponentially. So it is wise to keep the heap at 8GB or not more than 50% of that value. It is not deadly but it certainly decrease the performance of the cluster dramatically which would render it useless. If you encountered memory error in the log, in this situation, apart from other factors, it is better if you consider scale your cluster horizontally, that is adding more nodes to increase the capacity. But a quick workaround should you encounter memory error, the

So what happen really happen in the gc if high heap is allocated? well, excerpt from the guru,

..the concurrent mark/sweep phase runs concurrently with your
application. CMS will cause a stop-the-world full pause it it fails to
complete a CMS sweep in time and you hit the maximum heap size, but
unless that happens, CMS will run concurrently (though there are
stop-the-world pauses involved, that are typically very short, the
mark/sweep phase is concurrent).

Hence, if you really hit stop the world situation, this would render the node useless, because the node is too busy doing gc that, cassandra would not be able to perform.