Friday, June 30, 2017

How to start apache cassandra in eclipse?

This quick tutorial serves as a guide on how to start Cassandra in Eclipse. It assumes that you have already cloned Apache Cassandra with git and set the project up in Eclipse.

With that said, open the 'Run' menu and choose 'Run Configurations...'. In the popup window, select 'Java Application' in the tree menu on the left and create a new launch configuration with the following main class:

Main class : org.apache.cassandra.service.CassandraDaemon

Click on the 'Arguments' tab and provide the following JVM arguments in the 'VM arguments' box to start Cassandra.

-Xms1024M -Xmx1024M -Xmn220M -Xss256k -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCondCardMark -javaagent:./lib/jamm-0.3.0.jar

Click on 'Apply' and then the 'Run' button!

Check the 'Progress' window to see that Cassandra is starting.

Last but not least, when you are done, remember to terminate the JVM.

Sunday, June 18, 2017

set up a local cassandra repository to coexist with cassandra upstream

For the past several years, Apache Cassandra has always been part of my work. From Cassandra 0.8 through 1.2 and on to 3.9 (as of this writing), I have been modelling data in it, inserting and retrieving data, and administering and maintaining production clusters.

So, to take a step further, I thought of going into Cassandra development. In this article, I will describe how I got the source from the official Cassandra git repository and set up my own repository on GitHub so that both can coexist.

Previously I had done a git clone, so the remotes looked like this:

 user@localhost:~/cassandra-trunk$ git remote -v  
 origin (fetch)  
 origin (push)  

So far, pretty much the usual. Now, let's change it to the following:
* origin points to my GitHub repository
* upstream points to the official Cassandra git repository

First, let's remove the remote origin and then add my repository on GitHub. Of course, create an empty repository on GitHub first before you continue.

 user@localhost:~/cassandra-trunk$ git remote remove origin  
 user@localhost:~/cassandra-trunk$ git remote add origin  

Now, let's check the remote origin.

 user@localhost:~/cassandra-trunk$ git remote -v  
 origin (fetch)  
 origin (push)  

Okay, everything is on track and as expected. Now let's add upstream, pointing to the Cassandra git repository.

 user@localhost:~/cassandra-trunk$ git remote add upstream  

and then we check again.

 user@localhost:~/workspace/StudyCassandra/cassandra-trunk$ git remote -v  
 origin (fetch)  
 origin (push)  
 upstream (fetch)  
 upstream (push)  

OK! Everything is good to go. Now that we have two remote repositories, how should we continue to work? When we pull, we have to specify which remote to pull from and which branch to pull. In the following example, we pull from upstream on the trunk branch (Cassandra's equivalent of master).

 user@localhost:~/cassandra-trunk$ git pull upstream trunk  
  * branch      trunk   -> FETCH_HEAD  
 Already up-to-date.  

Beautiful, now we can pull from upstream into our working repository. Let's push our repository to GitHub now.

 user@localhost:~/cassandra-trunk$ git push -u origin trunk  
 Counting objects: 266270, done.  
 Delta compression using up to 8 threads.  
 Compressing objects: 100% (42703/42703), done.  
 Writing objects: 100% (266270/266270), 136.16 MiB | 547.00 KiB/s, done.  
 Total 266270 (delta 160130), reused 265316 (delta 159364)  
 remote: Resolving deltas: 100% (160130/160130), done.  
  * [new branch]   trunk -> trunk  
 Branch trunk set up to track remote branch trunk from origin.  
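
The whole remote layout can be rehearsed in a throwaway directory before touching a real clone. The sketch below is only an illustration under stated assumptions: it uses two local bare repositories as stand-ins for the upstream Cassandra repository and the personal GitHub repository, and the directory names and the demo commit identity are hypothetical. It also passes --ff-only to the pull so the rehearsal never creates a merge commit.

```shell
#!/bin/sh
# Rehearse the origin/upstream layout with local bare repositories.
# Every path here is a hypothetical stand-in, not a real repository URL.
set -e

work=$(mktemp -d)
git init --quiet --bare "$work/upstream.git"   # stand-in for the upstream repository
git init --quiet --bare "$work/fork.git"       # stand-in for the personal repository
git --git-dir="$work/upstream.git" symbolic-ref HEAD refs/heads/trunk

# Seed the stand-in upstream with one commit on a branch named trunk.
git init --quiet "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/trunk
git -c user.name=demo -c user.email=demo@example.invalid \
    commit --quiet --allow-empty -m "initial commit"
git push --quiet "$work/upstream.git" trunk

# Clone, then re-point the remotes: origin -> fork, upstream -> upstream.
git clone --quiet "$work/upstream.git" "$work/cassandra-trunk"
cd "$work/cassandra-trunk"
git remote remove origin
git remote add origin "$work/fork.git"
git remote add upstream "$work/upstream.git"

git pull --quiet --ff-only upstream trunk   # take changes from upstream
git push --quiet -u origin trunk            # publish to the fork, set up tracking

remotes=$(git remote | sort | xargs)
echo "remotes: $remotes"
```

After the script runs, trunk in the scratch clone tracks origin/trunk while upstream stays available for pulls, which is the same arrangement described above.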

Okay, that's it, one more step ahead.

Saturday, June 17, 2017

how to debug remote cassandra with eclipse

This tutorial assumes that you installed Cassandra from the Debian package and want to debug that remote Apache Cassandra instance from another workstation that has Eclipse installed. With that said, let's start.

* Uncomment the following line in the Cassandra environment file in /etc/cassandra/ . That is where the Debian package puts the Cassandra environment file.

 # JVM_OPTS="$JVM_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=1414"  

* Restart the Apache Cassandra instance and verify that the JVM option above is shown in the ps output.

 $ ps aux | grep cassandra | grep --color 1414  
 cassand+ 26718 44.8 29.8 1556048 1232024 ?   SLl 19:19  2:28 /usr/lib/jvm/jdk1.8.0_45//bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSWaitDuration=10000 -XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways -XX:+CMSClassUnloadingEnabled -Xms1024M -Xmx1024M -Xmn200M -ea -Xss256k -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:StringTableSize=1000003 -XX:+UseTLAB -XX:+ResizeTLAB -XX:+PerfDisableSharedMem -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=1414 -Dcassandra.jmx.local.port=7199 -XX:+DisableExplicitGC -Djava.library.path=/usr/share/cassandra/lib/sigar-bin -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir=/var/lib/cassandra -Dcassandra-pidfile=/var/run/cassandra/ -cp 
/etc/cassandra:/usr/share/cassandra/lib/ST4-4.0.8.jar:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/asm-5.0.4.jar:/usr/share/cassandra/lib/cassandra-driver-core-3.0.0-beta1-bb1bce4-SNAPSHOT-shaded.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/ecj-4.4.2.jar:/usr/share/cassandra/lib/guava-18.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.3.0.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/jgrapht-core-0.9.1.jar:/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/joda-time-2.4.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.2.jar:/usr/share/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/usr/share/cassandra/lib/logback-classic-1.1.3.jar:/usr/share/cassandra/lib/logback-core-1.1.3.jar:/usr/share/cassandra/lib/lz4-1.3.0.jar:/usr/share/cassandra/lib/metrics-core-3.1.0.jar:/usr/share/cassandra/lib/metrics-logback-3.1.0.jar:/usr/share/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/cassandra/lib/ohc-core-0.4.2.jar:/usr/share/cassandra/lib/ohc-core-j8-0.4.2.jar:/usr/share/cassandra/lib/reporter-config-base-3.0.0.jar:/usr/share/cassandra/lib/reporter-config3-3.0.0.jar:/usr/share/cassandra/lib/sigar-1.6.4.jar:/usr/share/cassandra/lib/slf4j-api-1.7.7.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java- 
-XX:HeapDumpPath=/var/lib/cassandra/java_1473247174.hprof -XX:ErrorFile=/var/lib/cassandra/hs_err_1473247174.log org.apache.cassandra.service.CassandraDaemon  

* Okay, looks good. Now let's move on to Eclipse. To get to the debug configuration window, click 'Run' in the menu, then 'Debug Configurations...'. In the popup window, right-click 'Remote Java Application' and click 'New'. Then connect to the Cassandra host on the debug port (1414 in this example).

* Check on the machine that runs the Cassandra instance that the debugger connection is established.

 user@workstation:~$ sudo netstat -tupan | grep 1414  
 tcp    0   0   ESTABLISHED 26718/java   
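
The first step above (uncommenting the JDWP line) can also be scripted. The following is a minimal sketch, not the package's own tooling: enable_jdwp is a hypothetical helper, and it is demonstrated on a scratch file rather than on the live configuration. It assumes the commented line has the same shape as the one shown earlier.

```shell
#!/bin/sh
# enable_jdwp is a hypothetical helper: it uncomments the JVM_OPTS line
# that carries the jdwp agent option in the file passed as $1.
# A backup of the original file is kept next to it with a .bak suffix.
enable_jdwp() {
    env_file=$1
    sed -i.bak 's/^#[[:space:]]*\(JVM_OPTS="\$JVM_OPTS -agentlib:jdwp=.*\)$/\1/' "$env_file"
}

# Demonstration on a scratch file containing the commented-out line.
demo=$(mktemp)
printf '%s\n' '# JVM_OPTS="$JVM_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=1414"' > "$demo"
enable_jdwp "$demo"
cat "$demo"
```

Lines that do not match the pattern are left untouched, so running the helper twice on the same file is harmless.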

That's it! When you are done, remember to terminate the debug process.

Friday, June 16, 2017

Journey on cassandra development part1

Getting into Cassandra development has always been on my wish list for my spare time, but due to hectic work and life, this wish never really got kicked off... until now. I found a very fine piece of documentation and started immediately once spare time was available. In this article, I will describe my journey into Cassandra development.

Let's clone the Cassandra repository using git. It takes a while on my slow line: about 136 MiB at 436 KiB/s.

 user@localhost:~/$ git clone cassandra-trunk  
 Cloning into 'cassandra-trunk'...  
 remote: Counting objects: 273406, done.  
 remote: Compressing objects: 100% (45601/45601), done.  
 remote: Total 273406 (delta 164011), reused 269444 (delta 161446)  
 Receiving objects: 100% (273406/273406), 136.60 MiB | 436.00 KiB/s, done.  
 Resolving deltas: 100% (164011/164011), done.  
 Checking connectivity... done.  

Once the repository is cloned, let's take a look at the directory.

 user@localhost:~/$ cd cassandra-trunk/  
 user@localhost:~/cassandra-trunk$ ls  
 total 596K  
 -rw-r--r-- 1 user user 3.5K Sep 6 21:28 README.asc  
 -rw-r--r-- 1 user user 2.8K Sep 6 21:28 NOTICE.txt  
 -rw-r--r-- 1 user user 99K Sep 6 21:28 NEWS.txt  
 -rw-r--r-- 1 user user 12K Sep 6 21:28 LICENSE.txt  
 -rw-r--r-- 1 user user 1.3K Sep 6 21:28  
 -rw-r--r-- 1 user user 320K Sep 6 21:28 CHANGES.txt  
 drwxr-xr-x 3 user user 4.0K Sep 6 21:28 conf  
 -rw-r--r-- 1 user user 91K Sep 6 21:28 build.xml  
 -rw-r--r-- 1 user user 516 Sep 6 21:28  
 drwxr-xr-x 2 user user 4.0K Sep 6 21:28 bin  
 drwxr-xr-x 4 user user 4.0K Sep 6 21:28 doc  
 drwxr-xr-x 3 user user 4.0K Sep 6 21:28 debian  
 drwxr-xr-x 3 user user 4.0K Sep 6 21:28 interface  
 drwxr-xr-x 3 user user 4.0K Sep 6 21:28 ide  
 drwxr-xr-x 4 user user 4.0K Sep 6 21:28 examples  
 -rw-r--r-- 1 user user 5.8K Sep 6 21:28  
 drwxr-xr-x 3 user user 4.0K Sep 6 21:28 pylib  
 drwxr-xr-x 5 user user 4.0K Sep 6 21:28 lib  
 drwxr-xr-x 6 user user 4.0K Sep 6 21:28 src  
 drwxr-xr-x 9 user user 4.0K Sep 6 21:28 test  
 drwxr-xr-x 4 user user 4.0K Sep 6 21:28 tools  
 user@localhost:~/cassandra-trunk$ git branch  
 * trunk  
 user@localhost:~/cassandra-trunk$ git branch -a  
 * trunk  
  remotes/origin/HEAD -> origin/trunk  

Okay, it looks like almost the same set of files you get from the binary tarball package, except that there is much more here. That is expected because we are in a development environment. As of this moment, Cassandra is up to version 3.9.

Before we go further, let's check the tools required to build Cassandra.

 user@localhost:~/cassandra-trunk$ ant -version  
 Apache Ant(TM) version 1.9.7 compiled on May 16 2016  
 user@localhost:~/cassandra-trunk$ export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_45/  

So I'm using Cassandra trunk, Ant version 1.9.7 and Oracle JDK 8 update 45. On the Eclipse side I'm a bit behind: I'm still sticking with Eclipse Luna, and I should upgrade to Eclipse Neon soon!

Okay, our tools are all checked at this point, so let's start the ant build.
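
A quick way to confirm the tools are on the PATH before building is a small check like the one below. It is only a sketch: require_tool is a hypothetical helper, and the list of tools (git, ant, java) is taken from the steps in this article rather than from any official requirements list.

```shell
#!/bin/sh
# require_tool is a hypothetical helper: it reports whether a command
# can be found on PATH and returns non-zero when it cannot.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check every tool used in this article; keep going so that all
# missing tools are reported in one pass.
for tool in git ant java; do
    require_tool "$tool" || true
done
```

The check only tells you a tool exists; the versions still need a look with commands such as ant -version.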

 user@localhost:~/cassandra-trunk$ ant  
 Buildfile: /home/user/cassandra-trunk/build.xml  
   [mkdir] Created dir: /home/user/cassandra-trunk/build/classes/main  
   [mkdir] Created dir: /home/user/cassandra-trunk/build/classes/thrift  
   [mkdir] Created dir: /home/user/cassandra-trunk/build/test/lib  
   [mkdir] Created dir: /home/user/cassandra-trunk/build/test/classes  
   [mkdir] Created dir: /home/user/cassandra-trunk/build/test/stress-classes  
   [mkdir] Created dir: /home/user/cassandra-trunk/src/gen-java  
   [mkdir] Created dir: /home/user/cassandra-trunk/build/lib  
   [mkdir] Created dir: /home/user/cassandra-trunk/build/jacoco  
   [mkdir] Created dir: /home/user/cassandra-trunk/build/jacoco/partials  
    [echo] Downloading Maven ANT Tasks...  
    [get] Getting:  
    [get] To: /home/user/cassandra-trunk/build/maven-ant-tasks-2.1.3.jar  
    [copy] Copying 1 file to /home/user/.m2/repository/org/apache/maven/maven-ant-tasks/2.1.3  
 [artifact:dependencies] Downloading: com/datastax/cassandra/cassandra-driver-core/3.0.1/cassandra-driver-core-3.0.1.pom from repository central at  
 [artifact:dependencies] Transferring 15K from central  
 [artifact:dependencies] Downloading: com/datastax/cassandra/cassandra-driver-parent/3.0.1/cassandra-driver-parent-3.0.1.pom from repository central at  
 [artifact:dependencies] Transferring 18K from central  
 [artifact:dependencies] Downloading: io/dropwizard/metrics/metrics-core/3.1.2/metrics-core-3.1.2.pom from repository central at  
 [artifact:dependencies] Transferring 1K from central  
 [artifact:dependencies] Downloading: io/dropwizard/metrics/metrics-parent/3.1.2/metrics-parent-3.1.2.pom from repository central at  
 [artifact:dependencies] Transferring 12K from central  
 [artifact:dependencies] Downloading: org/slf4j/slf4j-api/1.7.7/slf4j-api-1.7.7.pom from repository central at  
 [artifact:dependencies] Transferring 3K from central  
 [artifact:dependencies] Downloading: org/slf4j/slf4j-parent/1.7.7/slf4j-parent-1.7.7.pom from repository central at  
 [artifact:dependencies] Transferring 12K from central  
 [artifact:dependencies] Downloading: org/eclipse/jdt/core/compiler/ecj/4.4.2/ecj-4.4.2.pom from repository central at  
 [artifact:dependencies] Transferring 2K from central  
 [artifact:dependencies] Downloading: org/caffinitas/ohc/ohc-core/0.4.4/ohc-core-0.4.4.pom from repository central at  
 [artifact:dependencies] Transferring 11K from central  
 [artifact:dependencies] Downloading: org/caffinitas/ohc/ohc-parent/0.4.4/ohc-parent-0.4.4.pom from repository central at  
 [artifact:dependencies] Transferring 16K from central  
 [artifact:dependencies] Downloading: org/caffinitas/ohc/ohc-core-j8/0.4.4/ohc-core-j8-0.4.4.pom from repository central at  
 [artifact:dependencies] Transferring 5K from central  
 [artifact:dependencies] Downloading: org/openjdk/jmh/jmh-core/1.13/jmh-core-1.13.pom from repository central at  
 [artifact:dependencies] Transferring 10K from central  
 [artifact:dependencies] Downloading: org/openjdk/jmh/jmh-parent/1.13/jmh-parent-1.13.pom from repository at  
 [artifact:dependencies] Unable to locate resource in repository  
 [artifact:dependencies] [INFO] Unable to find resource 'org.openjdk.jmh:jmh-parent:pom:1.13' in repository (  
 [artifact:dependencies] Downloading: org/openjdk/jmh/jmh-parent/1.13/jmh-parent-1.13.pom from repository central at  
 [artifact:dependencies] Transferring 6K from central  
 [artifact:dependencies] Downloading: net/sf/jopt-simple/jopt-simple/4.6/jopt-simple-4.6.pom from repository central at  
 [artifact:dependencies] Transferring 11K from central  
 [artifact:dependencies] Downloading: org/apache/commons/commons-math3/3.2/commons-math3-3.2.pom from repository central at  
 [artifact:dependencies] Transferring 17K from central  
 [artifact:dependencies] Downloading: org/openjdk/jmh/jmh-generator-annprocess/1.13/jmh-generator-annprocess-1.13.pom from repository central at  
 [artifact:dependencies] Transferring 4K from central  
 [artifact:dependencies] Downloading: net/ju-n/compile-command-annotations/compile-command-annotations/1.2.0/compile-command-annotations-1.2.0.pom from repository central at  
 [artifact:dependencies] Transferring 5K from central  
 [artifact:dependencies] Downloading: net/ju-n/net-ju-n-parent/32/net-ju-n-parent-32.pom from repository central at  
 [artifact:dependencies] Transferring 21K from central  
 [artifact:dependencies] Downloading: org/apache/ant/ant-junit/1.9.4/ant-junit-1.9.4.pom from repository central at  
 [artifact:dependencies] Transferring 4K from central  
 [artifact:dependencies] Downloading: org/caffinitas/ohc/ohc-core/0.4.4/ohc-core-0.4.4.jar from repository central at  
 [artifact:dependencies] Downloading: org/apache/ant/ant-junit/1.9.4/ant-junit-1.9.4.jar from repository central at  
 [artifact:dependencies] Downloading: org/apache/commons/commons-math3/3.2/commons-math3-3.2.jar from repository central at  
 [artifact:dependencies] Downloading: io/dropwizard/metrics/metrics-core/3.1.2/metrics-core-3.1.2.jar from repository central at  
 [artifact:dependencies] Downloading: net/ju-n/compile-command-annotations/compile-command-annotations/1.2.0/compile-command-annotations-1.2.0.jar from repository central at  
 [artifact:dependencies] Transferring 132K from central  
 [artifact:dependencies] Transferring 16K from central  
 [artifact:dependencies] Transferring 110K from central  
 [artifact:dependencies] Transferring 1653K from central  
 [artifact:dependencies] Downloading: org/eclipse/jdt/core/compiler/ecj/4.4.2/ecj-4.4.2.jar from repository central at  
 [artifact:dependencies] Transferring 2256K from central  
 [artifact:dependencies] Downloading: net/sf/jopt-simple/jopt-simple/4.6/jopt-simple-4.6.jar from repository central at  
 [artifact:dependencies] Transferring 115K from central  
 [artifact:dependencies] Downloading: com/datastax/cassandra/cassandra-driver-core/3.0.1/cassandra-driver-core-3.0.1-shaded.jar from repository central at  
 [artifact:dependencies] Transferring 61K from central  
 [artifact:dependencies] Transferring 2388K from central  
 [artifact:dependencies] Downloading: org/openjdk/jmh/jmh-core/1.13/jmh-core-1.13.jar from repository central at  
 [artifact:dependencies] Transferring 454K from central  
 [artifact:dependencies] Downloading: org/caffinitas/ohc/ohc-core-j8/0.4.4/ohc-core-j8-0.4.4.jar from repository central at  
 [artifact:dependencies] Transferring 5K from central  
 [artifact:dependencies] Downloading: org/openjdk/jmh/jmh-generator-annprocess/1.13/jmh-generator-annprocess-1.13.jar from repository central at  
 [artifact:dependencies] Transferring 30K from central  
 [artifact:dependencies] Building ant file: /home/user/cassandra-trunk/build/build-dependencies.xml  
 [artifact:dependencies] Downloading: io/netty/netty-all/4.0.39.Final/netty-all-4.0.39.Final.pom from repository central at  
 [artifact:dependencies] Transferring 17K from central  
 [artifact:dependencies] Downloading: io/netty/netty-parent/4.0.39.Final/netty-parent-4.0.39.Final.pom from repository central at  
 [artifact:dependencies] Transferring 44K from central  
 [artifact:dependencies] Downloading: io/netty/netty-all/4.0.39.Final/netty-all-4.0.39.Final.jar from repository central at  
 [artifact:dependencies] Transferring 2218K from central  
 [artifact:dependencies] Downloading: com/datastax/cassandra/cassandra-driver-core/3.0.1/cassandra-driver-core-3.0.1-sources.jar from repository central at  
 [artifact:dependencies] Transferring 552K from central  
 [artifact:dependencies] Downloading: io/dropwizard/metrics/metrics-core/3.1.2/metrics-core-3.1.2-sources.jar from repository central at  
 [artifact:dependencies] Transferring 52K from central  
 [artifact:dependencies] Downloading: org/slf4j/slf4j-api/1.7.12/slf4j-api-1.7.12-sources.jar from repository central at  
 [artifact:dependencies] Transferring 50K from central  
 [artifact:dependencies] Downloading: io/netty/netty-all/4.0.39.Final/netty-all-4.0.39.Final-sources.jar from repository central at  
 [artifact:dependencies] Transferring 1749K from central  
 [artifact:dependencies] Downloading: org/eclipse/jdt/core/compiler/ecj/4.4.2/ecj-4.4.2-sources.jar from repository central at  
 [artifact:dependencies] Transferring 1724K from central  
 [artifact:dependencies] Downloading: org/caffinitas/ohc/ohc-core/0.4.4/ohc-core-0.4.4-sources.jar from repository central at  
 [artifact:dependencies] Transferring 83K from central  
 [artifact:dependencies] Downloading: org/openjdk/jmh/jmh-core/1.13/jmh-core-1.13-sources.jar from repository central at  
 [artifact:dependencies] Transferring 357K from central  
 [artifact:dependencies] Downloading: net/sf/jopt-simple/jopt-simple/4.6/jopt-simple-4.6-sources.jar from repository central at  
 [artifact:dependencies] Transferring 73K from central  
 [artifact:dependencies] Downloading: org/apache/commons/commons-math3/3.2/commons-math3-3.2-sources.jar from repository central at  
 [artifact:dependencies] Transferring 1958K from central  
 [artifact:dependencies] Downloading: org/openjdk/jmh/jmh-generator-annprocess/1.13/jmh-generator-annprocess-1.13-sources.jar from repository central at  
 [artifact:dependencies] Transferring 26K from central  
 [artifact:dependencies] Downloading: net/ju-n/compile-command-annotations/compile-command-annotations/1.2.0/compile-command-annotations-1.2.0-sources.jar from repository central at  
 [artifact:dependencies] Transferring 15K from central  
 [artifact:dependencies] Downloading: org/apache/ant/ant-junit/1.9.4/ant-junit-1.9.4-sources.jar from repository central at  
 [artifact:dependencies] Transferring 92K from central  
 [artifact:dependencies] Downloading: org/apache/ant/ant/1.9.4/ant-1.9.4-sources.jar from repository central at  
 [artifact:dependencies] Transferring 1889K from central  
 [artifact:dependencies] Downloading: org/apache/ant/ant-launcher/1.9.4/ant-launcher-1.9.4-sources.jar from repository central at  
 [artifact:dependencies] Transferring 19K from central  
 [artifact:dependencies] Building ant file: /home/user/cassandra-trunk/build/build-dependencies-sources.xml  
    [copy] Copying 62 files to /home/user/cassandra-trunk/build/lib/jars  
    [copy] Copying 17 files to /home/user/cassandra-trunk/build/lib/sources  
 [artifact:dependencies] Downloading: org/jacoco/org.jacoco.agent/ from repository central at  
 [artifact:dependencies] Transferring 3K from central  
 [artifact:dependencies] Downloading: org/jacoco/ from repository central at  
 [artifact:dependencies] Transferring 36K from central  
 [artifact:dependencies] Downloading: org/jacoco/org.jacoco.ant/ from repository central at  
 [artifact:dependencies] Transferring 2K from central  
 [artifact:dependencies] Downloading: org/jacoco/org.jacoco.core/ from repository central at  
 [artifact:dependencies] Transferring 1K from central  
 [artifact:dependencies] Downloading: org/ow2/asm/asm-debug-all/5.0.1/asm-debug-all-5.0.1.pom from repository central at  
 [artifact:dependencies] Transferring 2K from central  
 [artifact:dependencies] Downloading: org/ow2/asm/asm-parent/5.0.1/asm-parent-5.0.1.pom from repository central at  
 [artifact:dependencies] Transferring 5K from central  
 [artifact:dependencies] Downloading: org/jacoco/ from repository central at  
 [artifact:dependencies] Transferring 1K from central  
 [artifact:dependencies] Downloading: org/jboss/byteman/byteman/3.0.3/byteman-3.0.3.pom from repository central at  
 [artifact:dependencies] Transferring 91K from central  
 [artifact:dependencies] Downloading: org/jboss/byteman/byteman-root/3.0.3/byteman-root-3.0.3.pom from repository central at  
 [artifact:dependencies] Transferring 18K from central  
 [artifact:dependencies] Downloading: org/jboss/byteman/byteman-submit/3.0.3/byteman-submit-3.0.3.pom from repository central at  
 [artifact:dependencies] Transferring 2K from central  
 [artifact:dependencies] Downloading: org/jboss/byteman/byteman-bmunit/3.0.3/byteman-bmunit-3.0.3.pom from repository central at  
 [artifact:dependencies] Transferring 6K from central  
 [artifact:dependencies] Downloading: org/jboss/byteman/byteman-install/3.0.3/byteman-install-3.0.3.pom from repository central at  
 [artifact:dependencies] Transferring 2K from central  
 [artifact:dependencies] Downloading: org/jacoco/org.jacoco.agent/ from repository central at  
 [artifact:dependencies] Downloading: org/ow2/asm/asm-debug-all/5.0.1/asm-debug-all-5.0.1.jar from repository central at  
 [artifact:dependencies] Downloading: org/jboss/byteman/byteman/3.0.3/byteman-3.0.3.jar from repository central at  
 [artifact:dependencies] Transferring 251K from central  
 [artifact:dependencies] Transferring 743K from central  
 [artifact:dependencies] Transferring 371K from central  
 [artifact:dependencies] Downloading: org/jboss/byteman/byteman-submit/3.0.3/byteman-submit-3.0.3.jar from repository central at  
 [artifact:dependencies] Transferring 14K from central  
 [artifact:dependencies] Downloading: org/jboss/byteman/byteman-bmunit/3.0.3/byteman-bmunit-3.0.3.jar from repository central at  
 [artifact:dependencies] Downloading: org/jacoco/org.jacoco.ant/ from repository central at  
 [artifact:dependencies] Transferring 38K from central  
 [artifact:dependencies] Transferring 37K from central  
 [artifact:dependencies] Downloading: org/jboss/byteman/byteman-install/3.0.3/byteman-install-3.0.3.jar from repository central at  
 [artifact:dependencies] Transferring 9K from central  
 [artifact:dependencies] Downloading: org/jacoco/org.jacoco.core/ from repository central at  
 [artifact:dependencies] Transferring 130K from central  
 [artifact:dependencies] Downloading: org/jacoco/ from repository central at  
 [artifact:dependencies] Transferring 137K from central  
    [copy] Copying 9 files to /home/user/cassandra-trunk/build/lib/jars  
   [unzip] Expanding: /home/user/cassandra-trunk/build/lib/jars/org.jacoco.agent- into /home/user/cassandra-trunk/build/lib/jars  
    [echo] Building Grammar /home/user/cassandra-trunk/src/antlr/Cql.g ...  
 [artifact:dependencies] Downloading: com/datastax/wikitext/wikitext-core-ant/1.3/wikitext-core-ant-1.3.pom from repository central at  
 [artifact:dependencies] Transferring 3K from central  
 [artifact:dependencies] Downloading: org/fusesource/wikitext/wikitext-core/1.3/wikitext-core-1.3.pom from repository central at  
 [artifact:dependencies] Transferring 2K from central  
 [artifact:dependencies] Downloading: org/fusesource/wikitext/wikitext-project/1.3/wikitext-project-1.3.pom from repository central at  
 [artifact:dependencies] Transferring 4K from central  
 [artifact:dependencies] Downloading: org/fusesource/fusesource-pom/1.3/fusesource-pom-1.3.pom from repository central at  
 [artifact:dependencies] Transferring 13K from central  
 [artifact:dependencies] Downloading: org/fusesource/wikitext/textile-core/1.3/textile-core-1.3.pom from repository central at  
 [artifact:dependencies] Transferring 1K from central  
 [artifact:dependencies] Downloading: org/fusesource/wikitext/wikitext-core/1.3/wikitext-core-1.3.jar from repository central at  
 [artifact:dependencies] Downloading: com/datastax/wikitext/wikitext-core-ant/1.3/wikitext-core-ant-1.3.jar from repository central at  
 [artifact:dependencies] Transferring 237K from central  
 [artifact:dependencies] Transferring 32K from central  
 [artifact:dependencies] Downloading: org/fusesource/wikitext/textile-core/1.3/textile-core-1.3.jar from repository central at  
 [artifact:dependencies] Transferring 54K from central  
   [jflex] Generated:  
    [echo] apache-cassandra: /home/user/cassandra-trunk/build.xml  
   [javac] Compiling 45 source files to /home/user/cassandra-trunk/build/classes/thrift  
   [javac] Note: /home/user/cassandra-trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/ uses or overrides a deprecated API.  
   [javac] Note: Recompile with -Xlint:deprecation for details.  
   [javac] Note: Some input files use unchecked or unsafe operations.  
   [javac] Note: Recompile with -Xlint:unchecked for details.  
   [javac] Compiling 1459 source files to /home/user/cassandra-trunk/build/classes/main  
   [javac] Note: Processing compiler hints annotations  
   [javac] Note: Processing compiler hints annotations  
   [javac] Note: Writing compiler command file at META-INF/hotspot_compiler  
   [javac] Note: Done processing compiler hints annotations  
   [javac] Note: Some input files use or override a deprecated API.  
   [javac] Note: Recompile with -Xlint:deprecation for details.  
   [javac] Note: Some input files use unchecked or unsafe operations.  
   [javac] Note: Recompile with -Xlint:unchecked for details.  
   [javac] Creating empty /home/user/cassandra-trunk/build/classes/main/org/apache/cassandra/hints/package-info.class  
   [mkdir] Created dir: /home/user/cassandra-trunk/src/resources/org/apache/cassandra/config  
 [propertyfile] Creating new property file: /home/user/cassandra-trunk/src/resources/org/apache/cassandra/config/  
    [copy] Copying 18 files to /home/user/cassandra-trunk/build/classes/main  
    [copy] Copying 1 file to /home/user/cassandra-trunk/conf  
   [javac] Compiling 481 source files to /home/user/cassandra-trunk/build/test/classes  
   [javac] Note: Some input files use or override a deprecated API.  
   [javac] Note: Recompile with -Xlint:deprecation for details.  
   [javac] Note: Some input files use unchecked or unsafe operations.  
   [javac] Note: Recompile with -Xlint:unchecked for details.  
    [copy] Copying 22 files to /home/user/cassandra-trunk/build/test/classes  
   [mkdir] Created dir: /home/user/cassandra-trunk/build/classes/stress  
   [javac] Compiling 118 source files to /home/user/cassandra-trunk/build/classes/stress  
   [javac] Note: Some input files use or override a deprecated API.  
   [javac] Note: Recompile with -Xlint:deprecation for details.  
   [javac] Note: Some input files use unchecked or unsafe operations.  
   [javac] Note: Recompile with -Xlint:unchecked for details.  
    [copy] Copying 1 file to /home/user/cassandra-trunk/build/classes/stress  
    [copy] Copying 1 file to /home/user/cassandra-trunk/build/classes/main/META-INF  
    [copy] Copying 1 file to /home/user/cassandra-trunk/build/classes/thrift/META-INF  
    [copy] Copying 1 file to /home/user/cassandra-trunk/build/classes/main/META-INF  
    [copy] Copying 1 file to /home/user/cassandra-trunk/build/classes/thrift/META-INF  
    [jar] Building jar: /home/user/cassandra-trunk/build/apache-cassandra-thrift-3.10-SNAPSHOT.jar  
    [jar] Building jar: /home/user/cassandra-trunk/build/apache-cassandra-3.10-SNAPSHOT.jar  
   [mkdir] Created dir: /home/user/cassandra-trunk/build/classes/stress/META-INF  
   [mkdir] Created dir: /home/user/cassandra-trunk/build/tools/lib  
    [jar] Building jar: /home/user/cassandra-trunk/build/tools/lib/stress.jar  
 Total time: 4 minutes 45 seconds  

Wow, the build succeeded! Next, launch Eclipse and generate the Eclipse project files for Cassandra.

 user@localhost:~/cassandra-trunk$ ant generate-eclipse-files  
 Buildfile: /home/user/cassandra-trunk/build.xml  
    [echo] Loading dependency paths from file: /home/user/cassandra-trunk/build/build-dependencies.xml  
    [echo] Loading dependency paths from file: /home/user/cassandra-trunk/build/build-dependencies-sources.xml  
   [unzip] Expanding: /home/user/cassandra-trunk/build/lib/jars/org.jacoco.agent- into /home/user/cassandra-trunk/build/lib/jars  
    [echo] apache-cassandra: /home/user/cassandra-trunk/build.xml  
 [propertyfile] Updating property file: /home/user/cassandra-trunk/src/resources/org/apache/cassandra/config/  
    [copy] Copying 1 file to /home/user/cassandra-trunk/build/classes/main  
   [mkdir] Created dir: /home/user/cassandra-trunk/.settings  
 Total time: 1 second  

Everything is good. Assuming Eclipse is also started, let's import the project. Use this sequence: 'File' -> 'Import...' -> 'Existing Projects into Workspace'. Click 'Next', and in the 'Select root directory:' textbox, browse to the git repository cloned earlier and select it. Leave the remaining settings as they are and click 'Finish'.

If you get errors, fix them :D For me, the default Java configured in Eclipse was 7, so the code was compiled as Java 7. I had to change that and move the Java library to the top of the project's build path. In particular, you can reorder the Java library in the 'Order and Export' tab of the project's properties.

There were two class files which gave compile errors, so I had to comment out the code.

Since both classes are unit tests, I wasn't really bothered.

Okay, we'll pause here. In the next article, I will explain testing in Cassandra. Stay tuned!

Saturday, June 18, 2016

yet another upgrade to cassandra virtual nodes fail

Recently I was assigned a project to upgrade cassandra from 1.1 to 1.2 (I know it is an ancient cassandra, but who cares? We just want it to work, and cassandra delivers just that). One of the main features of cassandra 1.2 is virtual nodes.

Although there is a red warning note in this instruction, I took some time to investigate it, knowing that we had not enabled any bleeding edge technology or customized the cassandra code in-house. If you are selecting cassandra 1.2 for your upgrade and want to try the virtual nodes upgrade as well, choose the version one below 1.2.19. Why? Read here.

I started a three-node cassandra 1.2.18 cluster in a sandbox environment where I could safely test the upgrade from 1.1 to 1.2 and, after that, the upgrade to virtual nodes.

 [user@localhost ~]$ sudo ./cassandra-shuffle -h -p 7210 create
 Token                   From      To
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~
 73107539768170373009709388315418951678
 169033493463981801837600797832317151914
 136467407567251362951457524855448709801
 133808951575681531205649910734888020649
 75544457760442718776699701259266250066

 [user@localhost ~]$ sudo ./cassandra-shuffle -h -p 7210 enable
 [user@localhost ~]$ sudo ./cassandra-shuffle -h -p 7210 ls
 Token                   Endpoint    Requested at
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 159285821494892418769639546056927958356    Tue Feb 02 16:12:41 MYT 2016
 91938269708456681209179988336057166505    Tue Feb 02 16:12:41 MYT 2016
 74436767763955288882613195375699296254    Tue Feb 02 16:12:41 MYT 2016
 103901321670520924065314251878580267688    Tue Feb 02 16:12:41 MYT 2016

 [user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h -p 7210 status
 Datacenter: datacenter1
 ===================
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 -- Address  Load    Tokens Owns (effective) Host ID                Rack
 UN 20.01 GB  256   100.0%      ba12301a-2e3f-49d2-bb5b-e125e91fcd1b rack1
 UN 20.01 GB  1    100.0%      e09705bf-01b1-423a-863c-c425a7796e51 rack1
 UN 20.01 GB  1    100.0%      937f97ce-a55f-4785-8636-809456123a63 rack1
 [user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h -p 7210 ring

 Datacenter: datacenter1
 ==========
 Replicas: 257

 Address  Rack    Status State  Load      Owns        Token
                                      169919645461171745752870002539170714965
 1e     Up   Normal 20.01 GB    100.00%       0
 1e     Up   Normal 20.01 GB    100.00%       56713727820156410577229101238628035242
 1e     Up   Normal 20.01 GB    100.00%       113427455640312821154458202477256070485
 1e     Up   Normal 20.01 GB    100.00%       113648993639610307133275503653969461247
 1e     Up   Normal 20.01 GB    100.00%       113870531638907793112092804830682852010
 1e     Up   Normal 20.01 GB    100.00%       114092069638205279090910106007396242772
 1e     Up   Normal 20.01 GB    100.00%       114313607637502765069727407184109633535
 1e     Up   Normal 20.01 GB    100.00%       114535145636800251048544708360823024297
 1e     Up   Normal 20.01 GB    100.00%       114756683636097737027362009537536415060
 1e     Up   Normal 20.01 GB    100.00%       114978221635395223006179310714249805823
 ...
 ...

 shuffling ongoing
 [user@localhost ~]$ sudo ./cassandra-shuffle -h -p 7210 ls | wc -l
 767
 [user@localhost ~]$ sudo ./cassandra-shuffle -h -p 7210 ls | wc -l
 764

 [user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h -p 7210 netstats
 Mode: RELOCATING
 Not sending any streams.
 Not receiving any streams.
 Read Repair Statistics:
 Attempted: 0
 Mismatch (Blocking): 0
 Mismatch (Background): 0
 Pool Name          Active  Pending   Completed
 Commands            n/a     0      154
 Responses            n/a     0      3310
 [user@localhost ~]$ apache-cassandra-1.2.18/bin/nodetool -h -p 7210 compactionstats
 pending tasks: 0
 Active compaction remaining time :    n/a
 [user@localhost ~]$

As you can read above, I created a shuffle schedule and enabled it. The token count on the node started to change to 256 and the count of queued shuffle entries started coming down. I thought, hey man, this can actually work! Happily I announced to the team that it looked like we would be able to migrate to cassandra vnodes.

However, the next morning, when I checked the upgrade process, oh gosh, it seemed the shuffle had gone into a loop.

  WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:29,594 (line 120) Pausing until token count stabilizes (target=256, actual=282)
  WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:30,836 (line 120) Pausing until token count stabilizes (target=256, actual=282)
  WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:32,667 (line 120) Pausing until token count stabilizes (target=256, actual=282)
  WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:33,339 (line 120) Pausing until token count stabilizes (target=256, actual=282)
  WARN [ScheduledRangeXfers:0] 2016-02-03 05:14:34,582 (line 120) Pausing until token count stabilizes (target=256, actual=282)

      if (res.size() < 1)
      {
        LOG.debug("No queued ranges to transfer");
        return;
      }

      if (!isReady())
        return;

      UntypedResultSet.Row row = res.iterator().next();

      Date requestedAt = row.getTimestamp("requested_at");
      ByteBuffer tokenBytes = row.getBytes("token_bytes");
      Token token = StorageService.getPartitioner().getTokenFactory().fromByteArray(tokenBytes);

      LOG.info("Initiating transfer of {} (scheduled at {})", token, requestedAt.toString());
      try
      {
        StorageService.instance.relocateTokens(Collections.singleton(token));
      }
      catch (Exception e)
      {
        LOG.error("Error removing {}: {}", token, e);
      }
      finally
      {
        LOG.debug("Removing queued entry for transfer of {}", token);
        processInternal(String.format("DELETE FROM system.%s WHERE token_bytes = '%s'",
                       SystemTable.RANGE_XFERS_CF,
                       ByteBufferUtil.bytesToHex(tokenBytes)));
      }
    }

    private boolean isReady()
    {
      int targetTokens = DatabaseDescriptor.getNumTokens();
      int highMark = (int)Math.ceil(targetTokens + (targetTokens * .10));
      int actualTokens = StorageService.instance.getTokens().size();

      if (actualTokens >= highMark)
      {
        LOG.warn("Pausing until token count stabilizes (target={}, actual={})", targetTokens, actualTokens);
        return false;
      }

      return true;
    }
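
To see why the shuffle never moved on, here is a minimal sketch of the arithmetic in isReady() above. The class name ShuffleReadiness and its main method are only for illustration and are not part of cassandra; the formula is copied from the snippet. With a target of 256 tokens the high mark works out to 282, which is exactly the actual count in the WARN lines, so as far as I can tell every scheduled run paused and returned before touching the queue.

```java
// Illustrative sketch only: reproduces the readiness arithmetic of isReady().
final class ShuffleReadiness
{
    /** Same formula as isReady(): the target plus 10%, rounded up. */
    static int highMark(int targetTokens)
    {
        return (int) Math.ceil(targetTokens + (targetTokens * .10));
    }

    /** True when a node holding actualTokens would be allowed to start a transfer. */
    static boolean isReady(int targetTokens, int actualTokens)
    {
        return actualTokens < highMark(targetTokens);
    }

    public static void main(String[] args)
    {
        int target = 256; // target from the WARN lines
        int actual = 282; // actual token count from the WARN lines
        System.out.println("highMark = " + highMark(target));
        System.out.println("ready    = " + isReady(target, actual));
    }
}
```

If the token count on the node never drops below the high mark, nothing in this path removes a queued entry, which would match the shuffle count staying flat.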

The shuffle count stayed at 744, so unfortunately we had to stay with the non-vnode technology. If you have successfully upgraded to virtual nodes, please leave a comment below describing which version path you took and which shuffle steps you followed to upgrade your c* cluster to vnodes.

I end this article with the steps I took. If you intend to upgrade to vnodes, I suggest you don't waste time on it; you might as well spin up a new cluster, since more and more features cannot be reached by an in-place upgrade. The ones that come to mind now are the partitioner (random to murmur3) and vnodes.

  • stop automatic cassandra maintenance.
  • make sure the data is consistent.

1. change cassandra.yaml on all servers
set initial_token to empty
rolling restart all servers

2. cassandra-shuffle create

3. cassandra-shuffle enable

4. cassandra-shuffle ls

5. periodic checks.
check in log,
check in nodetool netstats
user@localhost ~$ sudo ./cassandra-shuffle -h -p 7210 ls | wc -l

Sunday, May 10, 2015

My journey and experience on upgrading apache cassandra from version 1.0.12 to 1.1.12

If you have read my previous post on upgrading apache cassandra, this is another journey, this time a major upgrade from version 1.0 to 1.1. In this article, I will share my experience of upgrading cassandra from version 1.0.12 to 1.1.12.

The sstable version used by cassandra 1.0.12 is hd, and you should ensure that the sstables on all nodes are at version hd before proceeding with the upgrade to a newer version of cassandra.
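
One way to check this is to look at the version field in the sstable file names. Below is a minimal sketch, assuming the file names follow the columnfamily-version-generation-component layout (for example MyColumnFamily-hc-6681-Data.db). The class SSTableVersionCheck and the sample names are made up for illustration; in practice you would feed it the listing of your own data directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: finds data files that are not yet at the wanted sstable version.
final class SSTableVersionCheck
{
    /**
     * Extracts the version field from a name such as "MyColumnFamily-hc-6681-Data.db".
     * Returns null when the name does not have the assumed four-part layout.
     */
    static String versionOf(String fileName)
    {
        String[] parts = fileName.split("-");
        if (parts.length != 4)
            return null;
        return parts[1];
    }

    /** Returns the Data.db file names whose version differs from the wanted one. */
    static List<String> notAtVersion(List<String> fileNames, String wanted)
    {
        List<String> stale = new ArrayList<String>();
        for (String name : fileNames)
        {
            if (!name.endsWith("-Data.db"))
                continue; // one check per sstable is enough
            if (!wanted.equals(versionOf(name)))
                stale.add(name);
        }
        return stale;
    }

    public static void main(String[] args)
    {
        // Hypothetical sample names.
        List<String> names = Arrays.asList(
            "MyColumnFamily-hc-6681-Data.db",
            "MyColumnFamily-hd-6702-Data.db",
            "MyColumnFamily-hd-6702-Index.db");
        System.out.println(notAtVersion(names, "hd"));
    }
}
```

An empty result on every node would suggest the sstables are all at hd; temporary files with a different name layout are reported as stale by this sketch, so treat the output as a hint only.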

First, let's read some highlights of cassandra 1.1.

  • api version 19.33.0

  • new file,

  • new directory structure for sstable and filename change for sstable.

  • more features/improvements in nodetool: compactionstats shows the remaining time, cleanup calculates the exact size required, you can now stop a compaction, rangekeysample, getsstables, repair prints progress, etc.

  • global key and row cache.

  • cql 3.0 beta

  • schema change for cassandra in caching.

  • libthrift version 0.7.0.

  • sstable hf version.

  • the default compressor becomes the snappy compressor.

  • a lot of improvements to the leveled compaction strategy.

  • sliced_buffer_size_in_kb option has been removed from the cassandra.yaml configuration file (this option was a no-op since 1.0).

  • thread stack size increased to 160k

  • added flag UseTLAB for jvm to improve read speed.

As this is a newer version of cassandra compared to the previous one, it is always good to set up a test node so you can play around and get familiar with it before actually doing the upgrade. With this new node, you can also quickly test the application clients that write to and/or read from the test cassandra node. It is also recommended to do some load testing to see whether the results are what you expect.

If you want to be extremely careful about the upgrade, reading the code changes between the two versions is always recommended. This is the link for this upgrade. I know there are huge differences between them, so you should split the diff into pieces as small as possible to read through it. You can learn a lot from experienced coders if you spend time reading their code, and you can learn new technology too. It is a daunting task, but if you are willing to spend some time on it, the benefits are too many to describe here.

If you upgrade from 1.0.12 to 1.1.12, cassandra 1.1 is smart enough to move the sstables into the new directory structure. This eases your job, as you do not need to move the sstables yourself. When the new cassandra 1.1.12 starts up, it will move them for you.

So you might want to prepare the configuration files for your cluster environment beforehand, for example cassandra.yaml. By doing this, you shorten the upgrade and reduce errors, because you are not editing files by hand during the upgrade; an upgrade script will symlink them for you. So spend some time writing upgrade and downgrade scripts for the production cluster, and test them.

Because the upgrade process takes time (a long time, depending on how many nodes you have in the cluster) and it will tire you out (remember, there will be post-upgrade issues to deal with), I suggest you create an upgrade script to handle it. The cassandra configuration you prepared earlier is symlinked automatically by this script. Doing this reduces risks such as human error, and for a production cluster you will want to cut the risk to the minimum possible.

There is official upgrade documentation here at datastax, but because your cluster environment might be different, you might want to write your own upgrade steps, taking the official documentation into consideration, and have a peer review them so you cover as much as possible. Best of all is if your peer tests the steps and raises questions you might not have thought of.

If you are using a monitoring system such as opscenter, spm, jconsole, or your own, check whether it supports the newer version of cassandra.

The per-column-family key cache and row cache have been replaced with a global key cache and a global row cache respectively. These global cache settings can be found in the cassandra.yaml file. If you leave it at the default, the key cache holds 1 million keys. Below are some new parameters in cassandra 1.1:

  • populate_io_cache_on_flush

  • key_cache_size_in_mb

  • key_cache_save_period

  • row_cache_size_in_mb

  • row_cache_save_period

  • row_cache_provider

  • commitlog_segment_size_in_mb

  • trickle_fsync

  • trickle_fsync_interval_in_kb

  • internode_authenticator

and below are the configuration options that were removed:

  • sliced_buffer_size_in_kb

  • thrift_max_message_length_in_mb

For the upgrade in production, these steps were taken:

pre-upgrade, applies to all nodes in the cluster.
* stop any repair or cleanup on all cassandra nodes and make sure no streaming is happening. Streaming happens when a node bootstraps or when you rebuild a node.

upgrade steps.
1. download cassandra 1.1.12 and verify the binary is not corrupted.
2. extract the compressed tarball.
3. nodetool snapshot.
4. nodetool drain.
5. stop cassandra if it has not stopped.
6. symlink the new configuration files.
7. start cassandra 1.1.12.
8. monitor cassandra system.log.
9. check the monitoring system.
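
For step 1, verifying the binary usually means comparing a digest of the downloaded tarball with the checksum published next to the download. Here is a minimal sketch of that comparison. The class TarballChecksum is made up for illustration, and the algorithm name is an assumption: pass whichever one ("MD5", "SHA-1", "SHA-256") matches the checksum file you actually have.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch only: computes a file digest to compare with a published checksum.
final class TarballChecksum
{
    /** Streams the file through the given digest algorithm and returns lowercase hex. */
    static String digestHex(Path file, String algorithm) throws IOException, NoSuchAlgorithmException
    {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file))
        {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1)
                digest.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest())
            hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception
    {
        // A small temporary file stands in for the downloaded tarball so the sketch runs anywhere.
        Path file = Files.createTempFile("tarball-stand-in", ".bin");
        Files.write(file, "abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(digestHex(file, "SHA-256"));
        Files.delete(file);
    }
}
```

In an upgrade script you would compare the returned string with the published value and stop before extracting anything if they differ.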

If everything looks okay for the first node, it is best to do two nodes and then continue with the rest of the nodes in rolling upgrade fashion. After you migrate, you might also notice that there are 3 additional column families in cassandra 1.1.

cassandra 1.0 system keyspace has a total of 7 column families

  • HintsColumnFamily

  • IndexInfo

  • LocationInfo

  • Migrations

  • NodeIdInfo

  • Schema

  • Versions

cassandra 1.1 system keyspace has a total of 10 column families.

  • HintsColumnFamily

  • IndexInfo

  • LocationInfo

  • Migrations

  • NodeIdInfo

  • Schema

  • schema_columnfamilies

  • schema_columns

  • schema_keyspaces

  • Versions

If you are using the leveled compaction strategy, the sstables need to be scrubbed accordingly. nodetool scrub and the offline sstablescrub are available for this job. If you have defined column families using counter types, you should upgrade the sstables using nodetool upgradesstables.

That's it, and if you need professional service for this, please contact me and I will be glad to provide professional advice and/or service.

Sunday, March 29, 2015

My journey and experience on upgrading apache cassandra 1.0.8 to 1.0.12

At the request of a blog reader, today I will share my experience of upgrading apache cassandra from version 1.0.8 to 1.0.12 on a live production cluster. By sharing this information, I hope that if you are also running and/or administering a cassandra cluster, you can learn from my experience and ease your worry or pain.

First, let's lay out what's the current architecture in this environment.

  • java 6

  • 12 nodes cluster.

  • two spinning disk with raid 0, 32GB total system memory where 14GB allocated to the cassandra heap instance, with 800MB for young gen. quad core cpu.

  • pretty much stock cassandra.yaml configuration, with the following differences: concurrent_writes at 64, flush_largest_memtables_at at 0.8, compaction_throughput_mb_per_sec at 64.

  • average load per node at 500-550GB.

As you can see, this is a pretty ancient cassandra as of the time of writing, but because cassandra has been rock solid serving read/write requests for years, it stayed in this stable condition, and we leveraged the benefit of scaling out by adding nodes, from six to nine and eventually to twelve now. Disk failures do happen on the nodes of the cluster, but because cassandra is designed with no single point of failure, we can afford to lose a single node while replacing it. Those were a few of the reasons we stayed with cassandra 1.0 for quite some time.

Because we would probably like to go to cassandra 2.0 and beyond, and java 6 has been EOL for quite some time, it would be wise to upgrade java before cassandra. Because the systems are integrated like an ecosystem, it would also be wise to look at the java used by the client systems that send read/write requests to the cassandra cluster. So make a checklist of the clients that integrate with the cluster, and then check which stable java 7 release is currently available. Example:

cassandra 1.0 (cassandra-1.0.12): java 6 minimum.

hector client built against cassandra 2.0.4, so java 7 minimum

datastax cql driver built against cassandra 2.1.2, so java 7 minimum

java 7 update release note

features and enhancement

java 7 in wiki

before upgrading, check whether the data in cassandra depends on a different unicode version
Early versions of the Java SE 7 release added support for Unicode 5.1.0. The final version of the Java SE 7 release supports Unicode 6.0.0. Unicode 6.0.0 is a major version of the Unicode Standard and adds support for over 2000 additional characters, as well as support for properties and data files.

At the time of checking, we picked java 7 update 72. Upgrading java 6 to java 7 update 72 on cassandra 1.0.8 was a painless process, just time consuming, because the load per node is huge and so is the total number of nodes in the cluster. I followed these steps for the java upgrade on a cassandra node.

upgrade java for all cassandra nodes
1. write a script to automatically install java 7 on a node, update the java stack size to 256k and set JAVA_HOME to java 7.
2. execute the script in rolling fashion across all nodes, one node at a time.
3. stop cassandra.
4. execute the script.
5. start the cassandra instance.
6.0 monitor the node after it is up, then check the monitoring system after 30 minutes, 1 hour and 2 hours.
6.1 check that your clients can read from and write to the upgraded node.

By now, you can move on to the next node in the ring, and you can skip step 6.0 once you are sure that it is going to work. One thing I observed is that the gc duration for cassandra on java 7 is down by half compared to java 6! Faster gc could mean more cpu cycles to process other tasks and fewer stop-the-world pauses for the cassandra instance.

Leave the cluster running on java 7 for a day or two, and if it is okay, continue with the cassandra upgrade. So which cassandra version to upgrade to? There are several guidelines I followed.

1. choose ONLY a STABLE release for a production cluster. How to choose? You should read this link.
2. read NEWS.txt and CHANGES.txt. From time to time, a change to the code base may affect, for example, the sstables. So pay attention, especially across a cassandra major upgrade.
3. read the code differences between your current version and the version you decided to upgrade to, for example for this upgrade.
4. read the datastax notes on upgrading a node for a minor version.

I spent a lot of time on step 3. By reading the code differences, you realize what has been changed and/or added and can consider how it will impact your cassandra environment. For a further upgrade to cassandra 1.1, you will first need to upgrade to the latest version of the release line currently deployed. Example here. Once you have gone through the checkpoints above, you may have a lot of questions and TODOs, and that will lead to further work. As the next step, it is best to answer those questions and TODOs and verify them in a test cluster before applying anything to the production cluster.

For my part, I have written a few bash scripts, for example the java upgrade script mentioned above. I have also written a script that installs a test cluster for the cassandra upgrade. Remember to also write a script to snapshot the data directory using nodetool, and another to downgrade automatically. When something goes wrong, you can revert using the automatic downgrade script and the backup from the nodetool snapshot. You will also need to save the configuration files, for example cassandra.yaml or any others in your cluster environment.

With these scripts written and tested, it is best to get acknowledgement from the management that this is to proceed. It would also be best to have someone who also administers the cassandra cluster with you, for the good and the bad moments. ;-) You can also reach me via the follow button on the home page. :)

upgrade cassandra from 1.0.8 to 1.0.12

  1. stop repair and cleanup on all nodes in the cluster.

  2. write a script to do the upgrade automatically, so that you don't panic or waste time and stay composed during the node upgrade. Trust me, it saves you a lot of time and avoids human error. The script content could be the following:
    - download cassandra 1.0.12 and extract it, set file permissions, etc.
    - back up the current cassandra 1.0.8 using nodetool snapshot. make sure you name the snapshot directory something like MyKeyspace-1.0.8-date
    - drain the node.
    - stop cassandra if it has not yet stopped.
    - update cassandra 1.0.12 with your cluster settings.

  3. check the configuration changes and then start the new cassandra 1.0.12 instance.

  4. monitor the node after it is up, then check the monitoring system after 30 minutes, 1 hour and 2 hours.

  5. check that your clients can read from and write to the upgraded node.

By now, you can move on to the next node in the ring, and you can skip step 4 once you are sure that it is going to work. As the cassandra sstable version changed in 1.0.10, from hc to hd, it is best to have all sstables on all nodes at the hd version before performing the next major upgrade.

That's it for this article. It may not cover everything and may contain mistakes, so if you want to comment, please leave your comment below.

Saturday, March 28, 2015

Investigate into apache cassandra corrupt sstable exception

Today, we will take a look at another apache cassandra 1.0.8 exception. Example of stack trace below.
ERROR [SSTableBatchOpen:2] 2015-03-07 06:11:58,544 (line 228) Corrupt sstable /var/lib/cassandra/data/MySuperKeyspace/MyColumnFamily-hc-6681=[Index.db, Statistics.db, CompressionInfo.db, Filter.db, Data.db]; skipped Input/output error
at Method)
at java.util.concurrent.Executors$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$

Before going into the code base for this stack trace, I had no idea what it was about. It showed up while the cassandra 1.0.12 instance was booting. As far as I remember, I triggered a user defined compaction twice in cassandra 1.0.8 on the same sstables, and after the first compaction was done, this sstable stayed around forever... like for two weeks plus. Then we upgraded cassandra.

Enough said, let's go into the code base and understand what is really meant by corrupt sstable. The bottom of the stack trace is pretty obvious: ThreadPoolExecutor executes the run method of a future task. From there we are in the apache cassandra code base, in class SSTableReader, method batchOpen(), as can be read in the code snippet below.
    public static Collection<SSTableReader> batchOpen(Set<Map.Entry<Descriptor, Set<Component>>> entries,
                                                      final Set<DecoratedKey> savedKeys,
                                                      final DataTracker tracker,
                                                      final CFMetaData metadata,
                                                      final IPartitioner partitioner)
    {
        final Collection<SSTableReader> sstables = new LinkedBlockingQueue<SSTableReader>();

        ExecutorService executor = DebuggableThreadPoolExecutor.createWithPoolSize("SSTableBatchOpen", Runtime.getRuntime().availableProcessors());
        for (final Map.Entry<Descriptor, Set<Component>> entry : entries)
        {
            Runnable runnable = new Runnable()
            {
                public void run()
                {
                    SSTableReader sstable;
                    try
                    {
                        sstable = open(entry.getKey(), entry.getValue(), savedKeys, tracker, metadata, partitioner);
                    }
                    catch (IOException ex)
                    {
                        logger.error("Corrupt sstable " + entry + "; skipped", ex);
                        return;
                    }
                    sstables.add(sstable);
                }
            };
            executor.submit(runnable);
        }

        executor.shutdown();
        try
        {
            executor.awaitTermination(7, TimeUnit.DAYS);
        }
        catch (InterruptedException e)
        {
            throw new AssertionError(e);
        }

        return sstables;
    }
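
The pattern in batchOpen() can be tried outside cassandra with a minimal sketch like the one below. Everything in it is a stand-in made up for illustration: BatchOpenSketch, its open() method (which fails for any name containing "bad") and the sample names. It only shows the shape of the logic, where each open runs on a pool thread and an IOException is logged and the item skipped instead of failing the whole batch.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: open items in parallel, log and skip the ones that fail.
final class BatchOpenSketch
{
    /** Hypothetical stand-in for SSTableReader.open(): fails for names containing "bad". */
    static String open(String name) throws IOException
    {
        if (name.contains("bad"))
            throw new IOException("Input/output error");
        return "reader:" + name;
    }

    /** Opens every name on a thread pool; a name whose open() throws is logged and skipped. */
    static Collection<String> batchOpen(List<String> names) throws InterruptedException
    {
        final Collection<String> opened = new LinkedBlockingQueue<String>();
        ExecutorService executor = Executors.newFixedThreadPool(2);
        for (final String name : names)
        {
            executor.submit(new Runnable()
            {
                public void run()
                {
                    try
                    {
                        opened.add(open(name));
                    }
                    catch (IOException ex)
                    {
                        System.err.println("Corrupt sstable " + name + "; skipped " + ex.getMessage());
                    }
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return opened;
    }

    public static void main(String[] args) throws InterruptedException
    {
        System.out.println(batchOpen(Arrays.asList("a-hc-1", "bad-hc-2", "c-hc-3")));
    }
}
```

This is why the node still boots with such an sstable on disk: the failing file is only left out of the returned collection.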


As can be read above, the IOException is thrown somewhere within the method open(), and batchOpen() logs it as a corrupt sstable and skips it. Two frames up the stack trace, we read that the sstable load method executes and then the read method of class ByteBufferUtil, shown below.
    public static ByteBuffer read(DataInput in, int length) throws IOException
    {
        if (in instanceof FileDataInput)
            return ((FileDataInput) in).readBytes(length);

        byte[] buff = new byte[length];
        in.readFully(buff);
        return ByteBuffer.wrap(buff);
    }

We see that the input is an instance of FileDataInput, so the bytes are read with the given length. Since FileDataInput is an interface, we look at the class that implements it, RandomAccessReader, and its method readBytes as follows.
    public ByteBuffer readBytes(int length) throws IOException
    {
        assert length >= 0 : "buffer length should not be negative: " + length;

        byte[] buff = new byte[length];
        readFully(buff); // reading data buffer

        return ByteBuffer.wrap(buff);
    }
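
The readFully() behaviour that readBytes() relies on can be tried in isolation with plain java.io.RandomAccessFile. The sketch below is only an illustration (ReadFullySketch and readBytesAt are made-up names, and the temporary file stands in for a Data.db file): it reads exactly the requested number of bytes from the current file pointer, and throws EOFException when fewer bytes remain. Note that the corrupt sstable in this post failed with a lower level "Input/output error", not with end of file, so this shows the read semantics and does not reproduce the fault.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only: exact-length reads from a file position.
final class ReadFullySketch
{
    /** Reads exactly length bytes starting at offset, or throws if the file is too short. */
    static ByteBuffer readBytesAt(Path file, long offset, int length) throws IOException
    {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r"))
        {
            raf.seek(offset); // move the file pointer
            byte[] buff = new byte[length];
            raf.readFully(buff); // EOFException if fewer than length bytes remain
            return ByteBuffer.wrap(buff);
        }
    }

    public static void main(String[] args) throws IOException
    {
        Path file = Files.createTempFile("readfully", ".bin");
        Files.write(file, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        try
        {
            ByteBuffer bytes = readBytesAt(file, 4, 3);
            System.out.println(bytes.get(0) + " " + bytes.get(1) + " " + bytes.get(2));
            readBytesAt(file, 8, 5); // only 2 bytes remain after offset 8
        }
        catch (EOFException e)
        {
            System.out.println("short read: " + e);
        }
        finally
        {
            Files.delete(file);
        }
    }
}
```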

Reading bytes with a length is actually a full read of that length, starting from wherever the current file pointer is. A little further up the stack trace is the method reBuffer().
    /**
     * Read data from file starting from current currentOffset to populate buffer.
     * @throws IOException on any I/O error.
     */
    protected void reBuffer() throws IOException
    {
        resetBuffer();

        if (bufferOffset >= channel.size())
            return;

        channel.position(bufferOffset); // setting channel position

        int read = 0;

        while (read < buffer.length)
        {
            int n = super.read(buffer, read, buffer.length - read);
            if (n < 0)
                break;
            read += n;
        }

        validBufferBytes = read;

        bytesSinceCacheFlush += read;

        if (skipIOCache && bytesSinceCacheFlush >= MAX_BYTES_IN_PAGE_CACHE)
        {
            // with random I/O we can't control what we are skipping so
            // it will be more appropriate to just skip a whole file after
            // we reach threshold
            CLibrary.trySkipCache(this.fd, 0, 0);
            bytesSinceCacheFlush = 0;
        }
    }

This method calls the superclass to read another chunk into the buffer. The superclass is RandomAccessFile, and its method readBytes() is shown below.
    /**
     * Reads a sub array as a sequence of bytes.
     * @param b the buffer into which the data is read.
     * @param off the start offset of the data.
     * @param len the number of bytes to read.
     * @exception IOException If an I/O error has occurred.
     */
    private int readBytes(byte b[], int off, int len) throws IOException {
        Object traceContext = IoTrace.fileReadBegin(path);
        int bytesRead = 0;
        try {
            bytesRead = readBytes0(b, off, len);
        } finally {
            IoTrace.fileReadEnd(traceContext, bytesRead == -1 ? 0 : bytesRead);
        }
        return bytesRead;
    }

    private native int readBytes0(byte b[], int off, int len) throws IOException;

.. and we are at the end of this path. It turns out that the call to readBytes0 threw the exception: the lower, native (non-java) layer raised the IO exception. You can use nodetool scrub to see if it fixes the problem, but what I did was basically wipe the cassandra data directory on that node and rebuild it. After that I did not see this message anymore.

That's it for this article and if you want to improve and/or comment, please leave your input below.

Friday, March 27, 2015

Investigate into apache cassandra get_slice assertion error

Today, we will investigate another error from apache cassandra. Error as shown below in cassandra log.
ERROR [Thrift:2] 2015-02-11 11:06:10,837 (line 3041) Internal error processing get_slice
at org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(
at org.apache.cassandra.locator.TokenMetadata.firstToken(
at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(
at org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(
at org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(
at org.apache.cassandra.service.StorageProxy.fetchRows(
at org.apache.cassandra.thrift.CassandraServer.readColumnFamily(
at org.apache.cassandra.thrift.CassandraServer.getSlice(
at org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(
at org.apache.cassandra.thrift.CassandraServer.get_slice(
at org.apache.cassandra.thrift.Cassandra$Processor$get_slice.process(
at org.apache.cassandra.thrift.Cassandra$Processor.process(
at org.apache.cassandra.thrift.CustomTThreadPoolServer$
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
at java.util.concurrent.ThreadPoolExecutor$

So the bottom three lines are pretty easy: a thread is run by the thread pool executor. As indicated by the code snippet below, a worker process is having trouble processing a request.
    processor = processorFactory_.getProcessor(client_);
    inputTransport = inputTransportFactory_.getTransport(client_);
    outputTransport = outputTransportFactory_.getTransport(client_);
    inputProtocol = inputProtocolFactory_.getProtocol(inputTransport);
    outputProtocol = outputProtocolFactory_.getProtocol(outputTransport);
    // we check stopped_ first to make sure we're not supposed to be shutting
    // down. this is necessary for graceful shutdown. (but not sufficient,
    // since process() can take arbitrarily long waiting for client input.
    // See comments at the end of serve().)
    while (!stopped_ && processor.process(inputProtocol, outputProtocol))
    {
        inputProtocol = inputProtocolFactory_.getProtocol(inputTransport);
        outputProtocol = outputProtocolFactory_.getProtocol(outputTransport);
    }
Skipping a few low level byte stream processing frames, we arrive at the class that actually implements the method get_slice. Read the code snippet below.
    public List<ColumnOrSuperColumn> get_slice(ByteBuffer key, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level)
    throws InvalidRequestException, UnavailableException, TimedOutException
    {
        state().hasColumnFamilyAccess(column_parent.column_family, Permission.READ);
        return multigetSliceInternal(state().getKeyspace(), Collections.singletonList(key), column_parent, predicate, consistency_level).get(key);
    }

So we see another method is called, multigetSliceInternal. Read the code snippet below, where a few validations are done on the request before the read commands are built.
    private Map<ByteBuffer, List<ColumnOrSuperColumn>> multigetSliceInternal(String keyspace, List<ByteBuffer> keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level)
    throws InvalidRequestException, UnavailableException, TimedOutException
    {
        CFMetaData metadata = ThriftValidation.validateColumnFamily(keyspace, column_parent.column_family);
        ThriftValidation.validateColumnParent(metadata, column_parent);
        ThriftValidation.validatePredicate(metadata, column_parent, predicate);
        ThriftValidation.validateConsistencyLevel(keyspace, consistency_level);

        List<ReadCommand> commands = new ArrayList<ReadCommand>();
        if (predicate.column_names != null)
        {
            for (ByteBuffer key: keys)
            {
                ThriftValidation.validateKey(metadata, key);
                commands.add(new SliceByNamesReadCommand(keyspace, key, column_parent, predicate.column_names));
            }
        }
        else
        {
            SliceRange range = predicate.slice_range;
            for (ByteBuffer key: keys)
            {
                ThriftValidation.validateKey(metadata, key);
                commands.add(new SliceFromReadCommand(keyspace, key, column_parent, range.start, range.finish, range.reversed, range.count));
            }
        }

        return getSlice(commands, consistency_level);
    }

Then the method getSlice is called, which in turn calls the method readColumnFamily(), shown in the code snippet below.
    protected Map<DecoratedKey, ColumnFamily> readColumnFamily(List<ReadCommand> commands, ConsistencyLevel consistency_level)
    throws InvalidRequestException, UnavailableException, TimedOutException
    {
        // TODO - Support multiple column families per row, right now row only contains 1 column family
        Map<DecoratedKey, ColumnFamily> columnFamilyKeyMap = new HashMap<DecoratedKey, ColumnFamily>();

        if (consistency_level == ConsistencyLevel.ANY)
        {
            throw new InvalidRequestException("Consistency level any may not be applied to read operations");
        }

        List<Row> rows;
        try
        {
            rows = StorageProxy.read(commands, consistency_level);
        }
        catch (TimeoutException e)
        {
            logger.debug("... timed out");
            throw new TimedOutException();
        }
        catch (IOException e)
        {
            throw new RuntimeException(e);
        }

        for (Row row: rows)
        {
            columnFamilyKeyMap.put(row.key, row.cf);
        }
        return columnFamilyKeyMap;
    }

Another class, StorageProxy, is then called to read the row in question. Its read method is shown in the code snippet below.
    /**
     * Performs the actual reading of a row out of the StorageService, fetching
     * a specific set of column names from a given column family.
     */
    public static List<Row> read(List<ReadCommand> commands, ConsistencyLevel consistency_level)
    throws IOException, UnavailableException, TimeoutException, InvalidRequestException
    {
        if (StorageService.instance.isBootstrapMode())
            throw new UnavailableException();
        long startTime = System.nanoTime();
        List<Row> rows;
        try
        {
            rows = fetchRows(commands, consistency_level);
        }
        finally
        {
            readStats.addNano(System.nanoTime() - startTime);
        }
        return rows;
    }

The exception leads this investigation to the row fetch, in the method fetchRows of the same class. The code snippet is shown below.
* This function executes local and remote reads, and blocks for the results:
* 1. Get the replica locations, sorted by response time according to the snitch
* 2. Send a data request to the closest replica, and digest requests to either
* a) all the replicas, if read repair is enabled
* b) the closest R-1 replicas, where R is the number required to satisfy the ConsistencyLevel
* 3. Wait for a response from R replicas
* 4. If the digests (if any) match the data return the data
* 5. else carry out read repair by getting data from all the nodes.
private static List<Row> fetchRows(List<ReadCommand> initialCommands, ConsistencyLevel consistency_level) throws IOException, UnavailableException, TimeoutException
List<Row> rows = new ArrayList<Row>(initialCommands.size());
List<ReadCommand> commandsToRetry = Collections.emptyList();

List<ReadCommand> commands = commandsToRetry.isEmpty() ? initialCommands : commandsToRetry;
ReadCallback<Row>[] readCallbacks = new ReadCallback[commands.size()];

if (!commandsToRetry.isEmpty())
logger.debug("Retrying {} commands", commandsToRetry.size());

// send out read requests
for (int i = 0; i < commands.size(); i++)
ReadCommand command = commands.get(i);
assert !command.isDigestQuery();
logger.debug("Command/ConsistencyLevel is {}/{}", command, consistency_level);

List<InetAddress> endpoints = StorageService.instance.getLiveNaturalEndpoints(command.table,
DatabaseDescriptor.getEndpointSnitch().sortByProximity(FBUtilities.getBroadcastAddress(), endpoints);

RowDigestResolver resolver = new RowDigestResolver(command.table, command.key);
ReadCallback<Row> handler = getReadCallback(resolver, command, consistency_level, endpoints);
assert !handler.endpoints.isEmpty();
readCallbacks[i] = handler;

// The data-request message is sent to dataPoint, the node that will actually get the data for us
InetAddress dataPoint = handler.endpoints.get(0);
if (dataPoint.equals(FBUtilities.getBroadcastAddress()) && OPTIMIZE_LOCAL_REQUESTS)
logger.debug("reading data locally");
StageManager.getStage(Stage.READ).execute(new LocalReadRunnable(command, handler));
logger.debug("reading data from {}", dataPoint);
MessagingService.instance().sendRR(command, dataPoint, handler);

if (handler.endpoints.size() == 1)

// send the other endpoints a digest request
ReadCommand digestCommand = command.copy();
MessageProducer producer = null;
for (InetAddress digestPoint : handler.endpoints.subList(1, handler.endpoints.size()))
if (digestPoint.equals(FBUtilities.getBroadcastAddress()))
logger.debug("reading digest locally");
StageManager.getStage(Stage.READ).execute(new LocalReadRunnable(digestCommand, handler));
logger.debug("reading digest from {}", digestPoint);
// (We lazy-construct the digest Message object since it may not be necessary if we
// are doing a local digest read, or no digest reads at all.)
if (producer == null)
producer = new CachingMessageProducer(digestCommand);
MessagingService.instance().sendRR(producer, digestPoint, handler);

// read results and make a second pass for any digest mismatches
List<ReadCommand> repairCommands = null;
List<RepairCallback> repairResponseHandlers = null;
for (int i = 0; i < commands.size(); i++)
ReadCallback<Row> handler = readCallbacks[i];
ReadCommand command = commands.get(i);
long startTime2 = System.currentTimeMillis();
Row row = handler.get();
if (row != null)

if (logger.isDebugEnabled())
logger.debug("Read: " + (System.currentTimeMillis() - startTime2) + " ms.");
catch (TimeoutException ex)
if (logger.isDebugEnabled())
logger.debug("Read timeout: {}", ex.toString());
throw ex;
catch (DigestMismatchException ex)
if (logger.isDebugEnabled())
logger.debug("Digest mismatch: {}", ex.toString());
RowRepairResolver resolver = new RowRepairResolver(command.table, command.key);
RepairCallback repairHandler = new RepairCallback(resolver, handler.endpoints);

if (repairCommands == null)
repairCommands = new ArrayList<ReadCommand>();
repairResponseHandlers = new ArrayList<RepairCallback>();

MessageProducer producer = new CachingMessageProducer(command);
for (InetAddress endpoint : handler.endpoints)
MessagingService.instance().sendRR(producer, endpoint, repairHandler);

if (commandsToRetry != Collections.EMPTY_LIST)

// read the results for the digest mismatch retries
if (repairResponseHandlers != null)
for (int i = 0; i < repairCommands.size(); i++)
ReadCommand command = repairCommands.get(i);
RepairCallback handler = repairResponseHandlers.get(i);
// wait for the repair writes to be acknowledged, to minimize impact on any replica that's
// behind on writes in case the out-of-sync row is read multiple times in quick succession
FBUtilities.waitOnFutures(handler.resolver.repairResults, DatabaseDescriptor.getRpcTimeout());

Row row;
row = handler.get();
catch (DigestMismatchException e)
throw new AssertionError(e); // full data requested from each node here, no digests should be sent

ReadCommand retryCommand = command.maybeGenerateRetryCommand(handler, row);
if (retryCommand != null)
logger.debug("issuing retry for read command");
if (commandsToRetry == Collections.EMPTY_LIST)
commandsToRetry = new ArrayList<ReadCommand>();

if (row != null)
} while (!commandsToRetry.isEmpty());

return rows;

At this point of the investigation, the documentation of the method fetchRows is pretty useful for us.
* This function executes local and remote reads, and blocks for the results:
* 1. Get the replica locations, sorted by response time according to the snitch
* 2. Send a data request to the closest replica, and digest requests to either
* a) all the replicas, if read repair is enabled
* b) the closest R-1 replicas, where R is the number required to satisfy the ConsistencyLevel
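
To make step 2 of that comment more concrete, here is a minimal sketch of the data/digest idea. It is not Cassandra code: the class DigestReadSketch, its methods and the use of MD5 are assumptions made only for illustration. One replica returns the full value, the other replicas return only a digest of their copy, and the coordinator compares the digests against the data it received.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Cassandra code.
class DigestReadSketch
{
    // Digest of a replica's copy of the row; MD5 is used here only as an example hash.
    static byte[] digest(String value)
    {
        try
        {
            return MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new IllegalStateException(e);
        }
    }

    // True when every digest equals the digest of the data returned by the closest replica.
    static boolean digestsMatch(String data, List<byte[]> replicaDigests)
    {
        byte[] expected = digest(data);
        for (byte[] d : replicaDigests)
            if (!Arrays.equals(expected, d))
                return false;
        return true;
    }

    public static void main(String[] args)
    {
        String data = "row-v2"; // full data from the closest replica
        // All replicas agree, so the data can be returned as is.
        System.out.println(digestsMatch(data, Arrays.asList(digest("row-v2"), digest("row-v2")))); // prints true
        // One replica is stale, which is the case where a read repair would be needed.
        System.out.println(digestsMatch(data, Arrays.asList(digest("row-v2"), digest("row-v1")))); // prints false
    }
}
```

The point of sending digests is that only one replica has to ship the full row over the network, while the others only ship a small hash.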

We see that this method executes reads on both the local and remote nodes, and the problem occurs while looking up the nodes responsible for storing the row. Let's read the method getLiveNaturalEndpoints(), shown below.
* This method attempts to return N endpoints that are responsible for storing the
* specified key i.e for replication.
* @param key - key for which we need to find the endpoint return value -
* the endpoint responsible for this key
public List<InetAddress> getLiveNaturalEndpoints(String table, ByteBuffer key)
return getLiveNaturalEndpoints(table, partitioner.getToken(key));

public List<InetAddress> getLiveNaturalEndpoints(String table, Token token)
List<InetAddress> liveEps = new ArrayList<InetAddress>();
List<InetAddress> endpoints =;

for (InetAddress endpoint : endpoints)
if (FailureDetector.instance.isAlive(endpoint))

return liveEps;
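
As a rough illustration of what the method name and the liveEps list suggest, here is a minimal sketch of filtering the natural endpoints down to the live ones. It is not the Cassandra implementation: the class LiveEndpointsSketch, the method liveOnly and the isAlive predicate (standing in for the failure detector) are hypothetical, and endpoints are plain strings instead of InetAddress.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not Cassandra code.
class LiveEndpointsSketch
{
    // Keep only the endpoints that the (assumed) failure detector reports as alive.
    static List<String> liveOnly(List<String> naturalEndpoints, Predicate<String> isAlive)
    {
        List<String> live = new ArrayList<String>();
        for (String endpoint : naturalEndpoints)
            if (isAlive.test(endpoint))
                live.add(endpoint);
        return live;
    }

    public static void main(String[] args)
    {
        List<String> natural = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        // Pretend that 10.0.0.2 is down.
        System.out.println(liveOnly(natural, ep -> !ep.equals("10.0.0.2"))); // prints [10.0.0.1, 10.0.0.3]
    }
}
```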

A little further up the stack trace is the abstract class AbstractReplicationStrategy.
    /**
     * get the (possibly cached) endpoints that should store the given Token
     * Note that while the endpoints are conceptually a Set (no duplicates will be included),
     * we return a List to avoid an extra allocation when sorting by proximity later
     * @param searchToken the token the natural endpoints are requested for
     * @return a copy of the natural endpoints for the given token
     */
    public ArrayList<InetAddress> getNaturalEndpoints(Token searchToken)
    {
        Token keyToken = TokenMetadata.firstToken(tokenMetadata.sortedTokens(), searchToken);
        ArrayList<InetAddress> endpoints = getCachedEndpoints(keyToken);
        if (endpoints == null)
        {
            TokenMetadata tokenMetadataClone = tokenMetadata.cloneOnlyTokenMap();
            keyToken = TokenMetadata.firstToken(tokenMetadataClone.sortedTokens(), searchToken);
            endpoints = new ArrayList<InetAddress>(calculateNaturalEndpoints(searchToken, tokenMetadataClone));
            cacheEndpoint(keyToken, endpoints);
        }

        return new ArrayList<InetAddress>(endpoints);
    }
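
The pattern in this method is "look in the cache, calculate on a miss, and hand back a copy". Below is a minimal sketch of that pattern only. It is not Cassandra code: the class EndpointCacheSketch, the calculator function and the calculations counter are hypothetical, and tokens and endpoints are simplified to Long and String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch, not Cassandra code.
class EndpointCacheSketch
{
    private final Map<Long, List<String>> cache = new ConcurrentHashMap<Long, List<String>>();
    private final Function<Long, List<String>> calculator; // stands in for the replication strategy
    int calculations = 0;                                  // counts cache misses, for the demo only

    EndpointCacheSketch(Function<Long, List<String>> calculator)
    {
        this.calculator = calculator;
    }

    List<String> getNaturalEndpoints(long keyToken)
    {
        List<String> endpoints = cache.get(keyToken);
        if (endpoints == null)
        {
            // Cache miss: calculate once and remember the result for this token.
            calculations++;
            endpoints = new ArrayList<String>(calculator.apply(keyToken));
            cache.put(keyToken, endpoints);
        }
        // Return a copy so callers can sort or modify it without touching the cached list.
        return new ArrayList<String>(endpoints);
    }

    public static void main(String[] args)
    {
        EndpointCacheSketch sketch = new EndpointCacheSketch(t -> Arrays.asList("10.0.0.1", "10.0.0.2"));
        sketch.getNaturalEndpoints(42L).clear();                     // modifying the copy is harmless
        System.out.println(sketch.getNaturalEndpoints(42L).size()); // prints 2
        System.out.println(sketch.calculations);                    // prints 1
    }
}
```

Note that in the real method the token lookup (TokenMetadata.firstToken) happens before the cache is even consulted, which is why an empty ring fails before any endpoint is calculated.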

Somehow the ring is empty, so the assertion ring.size() > 0 fails. Below is the code snippet from the class TokenMetadata where the assertion is thrown.
    public static int firstTokenIndex(final ArrayList ring, Token start, boolean insertMin)
    {
        assert ring.size() > 0;
        // insert the minimum token (at index == -1) if we were asked to include it and it isn't a member of the ring
        int i = Collections.binarySearch(ring, start);
        if (i < 0)
        {
            i = (i + 1) * (-1);
            if (i >= ring.size())
                i = insertMin ? -1 : 0;
        }
        return i;
    }

    public static Token firstToken(final ArrayList<Token> ring, Token start)
    {
        return ring.get(firstTokenIndex(ring, start, false));
    }
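
To see how this lookup behaves, here is a small self-contained sketch of the same binary search with wrap-around, using Long values in place of Token. The class RingLookupSketch is hypothetical. It also throws an explicit IllegalStateException for an empty ring instead of using assert, so the failure shows up even when the JVM runs without -ea. That is a deliberate difference from the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Cassandra code.
class RingLookupSketch
{
    // Index of the first token in the sorted ring that is >= start, wrapping to 0 past the end.
    static int firstTokenIndex(List<Long> ring, long start)
    {
        if (ring.isEmpty())
            throw new IllegalStateException("ring is empty, no token owns " + start);
        int i = Collections.binarySearch(ring, Long.valueOf(start));
        if (i < 0)
        {
            i = (i + 1) * (-1); // binarySearch returns -(insertion point) - 1 when not found
            if (i >= ring.size())
                i = 0;          // past the last token, so wrap around to the first one
        }
        return i;
    }

    public static void main(String[] args)
    {
        List<Long> ring = Arrays.asList(10L, 20L, 30L);
        System.out.println(firstTokenIndex(ring, 15L)); // prints 1, the token 20
        System.out.println(firstTokenIndex(ring, 20L)); // prints 1, exact match
        System.out.println(firstTokenIndex(ring, 35L)); // prints 0, wrapped around
        try
        {
            firstTokenIndex(new ArrayList<Long>(), 15L);
        }
        catch (IllegalStateException e)
        {
            System.out.println(e.getMessage()); // prints ring is empty, no token owns 15
        }
    }
}
```

With an empty ring there is simply no index that could be returned, which is exactly the situation the assertion in firstTokenIndex guards against.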

So something went wrong while reading a row's columns: the token ring used to look up the natural endpoints was empty. My guess is that gossip was disabled, so the ring metadata was empty. The solution is to enable gossip and then restart the cassandra instance.

If you think this analysis is not accurate or want to provide more information, please do so by commenting below. Thank you.