Showing posts with label major_upgrade. Show all posts
Showing posts with label major_upgrade. Show all posts

Sunday, May 10, 2015

My journey and experience on upgrading apache cassandra from version1.0.12 to 1.1.12

If you have read my previous post on apache cassandra upgrade, this is another journey to major upgrade apache cassandra from version 1.0 to 1.1. In this article, I will share on my experience on upgrading cassandra from version 1.0.12 to 1.1.12.

The sstable version used by cassandra 1.0.12 is hd  and you should ensure that all nodes sstables become hd before proceed upgrade to a newer version of cassandra.

First, let read some highlight of cassandra 1.1

  • api version 19.33.0

  • new file cassandra-rackdc.properties, commitlog_archiving.properties

  • new directory structure for sstable and filename change for sstable.

  • more features/improvement to nodetool such as compactionstats has remaining timestamp, calculate exact size required for cleanup operations, you can now stop compaction, rangekeysample, getsstables, repair print progress, etc.

  • global key and row cache.

  • cql 3.0 beta

  • schema change for cassandra in caching.

  • libthrift version 0.7.0.

  • sstable hf version.

  • default compressor become snappy compressor.

  • a lot of improvement to level compression strategy.

  • sliced_buffer_size_in_kb option has been removed from the cassandra.yaml configuration file (this option was a no-op since 1.0).

  • thread stack size increased to 160k

  • added flag UseTLAB for jvm to improve read speed.

As this is a newer version of cassandra compare the previous, it is always good to setup a test node and so you can play around and get familiar with it before actually doing the upgrade. With this new node, you can also quickly test with your application client which write and/or read to the test cassandra node. It is also recommended to do some load test to see the result is what you have expected.

If you want to be extremely careful on the upgrade, then reading the code changes between the version you chose to upgrade is always recommended. This is the link for this upgrade  and I know and understand as there are huge differences in betweeen them, so you should split as small as possible to read through it. You can learn a lot from the experience coder if you spend a lot of time reading their code and you can learn new technology too. It is a daunting huge tasks but if you willing to spend sometime to read them, the benefits return is just too much to even describe here.

If you upgrade from 1.0.12 to 1.1.12, cassandra 1.1 is smart enough to move the sstable into new directory structure. So, it ease your job that you do not need to move the sstable into the new directory structure. When the new cassandra 1.1.12 starting up, it will move for you.

So you might want to consider prepare the configuration file for your cluster environment before hand. For example, cassandra.yaml, cassandra-env.sh and cassandra.in.sh. By doing this, you can decrease the upgrade process time duration and less error when you are not actually doing it but a upgrade script will symlink this for you. So spend sometime to write upgrade and downgrade scripts for the production cluster and tests it.

Because upgrade process will take time (a long one, depend on how many nodes you have in cluster) and it will tired you in the process (remember, there will be post upgrade issues which you need to deal with), so I suggest you create a upgrade script to handle the upgrade process. The cassandra configuration which you prepare before will be automatically symlink within this script. When you do this, you reduce risk such as factor human error and for a production cluster, you will NOT want to risk anything or cut the risk to as minimum as possible.

There is official upgrade documentation here at datastax but because your cluster environment might be different, so you might want to write the upgrade step taking into consideration from the official documentation and let peer review so you cover as much as possible. Best if your peer will tests and raise in some questions which you might not think of.

If you have using monitoring system such as opscenter, spm, jconsole, or your own monitoring system, you wanna check it out if these monitoring can support the newer version of cassandra.

key cache and row cache per column family based has been replace with global key cache and global row cache respectively. These global cache settings can be found in casandra.yaml file. If you leave it to default, 1 millon key cache by default. Below are some new parameter for cassandra 1.1,

  • populate_io_cache_on_flush

  • key_cache_size_in_mb

  • key_cache_save_period

  • row_cache_size_in_mb

  • row_cache_save_period

  • row_cache_provider

  • commitlog_segment_size_in_mb

  • trickle_fsync

  • trickle_fsync_interval_in_kb

  • internode_authenticator

and below are configuration get removed

  • sliced_buffer_size_in_kb

  • thrift_max_message_length_in_mb

For the upgrade steps in production, these steps are taken appropriately:

pre-upgrade apply to all node in cluster.
* stop any repair , cleanup in all cassandra node and no streaming happened. Streaming are the nodes bootstrap or you rebuild a node.

upgrade steps.
1. download cassandra 1.1.12 and verify binary is not corrupted.
2. extract the compressed tarball.
3. nodetool snapshot.
4. nodetool drain.
5. stop cassandra if it not stopped.
6. symlink new configuration files.
7. start cassandra 1.1.12
8. monitor cassandra system.log
9. check monitoring system.

If everything looks okay for first node, best if you do two nodes, and then continue till the rest of the node in rolling upgrade fashion. After you migrate, you might also noticed there are 3 more additional column families in cassandra 1.1

cassandra 1.0 system keyspace has a total of 7 column families

  • HintsColumnFamily

  • IndexInfo

  • LocationInfo

  • Migrations

  • NodeIdInfo

  • Schema

  • Versions

cassandra 1.1 system keyspace has a total 10 column families.

  • HintsColumnFamily

  • IndexInfo

  • LocationInfo

  • Migrations

  • NodeIdInfo

  • Schema

  • schema_columnfamilies

  • schema_columns

  • schema_keyspaces

  • Versions

If you are using level compaction strategy, these sstable need to be scrub accordingly. There are nodetool scrub and offline sstablescrub for this job. If you have defined column family using counter type, you should upgrade the sstable using nodetool upgradesstables.

That's it and if you need professional service for this, please contact me and I will be gladly to provide professional advice and/or service.