Sunday, January 18, 2015

how to improve apache cassandra 1.0.8 read speed

This article is for improve reading speed for apache cassandra 1.0.8. Because the reading improvement determined by many factors, we will investigate all possible areas so the gain will be improve collectively. So you may experience these factors and alter according to suit your node environment to achieve the best result. As the cassandra 1.0 released, the official cited that the read performance has increased up to 400%!

First and foremost, there are numerous articles which I use as a reference has cited copyright, I take no ownership nor credit of their hardwork as that is rightfully belong to them entirely. I only reference their work to improve my knowledge and to help people (like me) who need help and came to read what I share in my article.

Let's split these improvements into two parts, the hardware and the software.

hardware

ssd

ssd disk is way faster than hdd disk in term of reading in multiple magnitude, please read cassandra-benchmark for the benchmark. Although the cassandra using was version 0.8.10, but when cassandra 1.0 was released, read gained tremendous improvement. Then these two improvemetns will be linear gain too. Also, it is recommend to read the aforementioned article as it explain why is the random speed will hurt the read performance for hdd disk.

multiple disks

disks allocation to the commit log should be different than the data directory. Because during data write, data is repeating appending to the commit log. If the data directory is located on different drive, read performance gain should be visible.

 

software

memtable
If write behaviour has a lot of updates, it is good to look into memtable settings. There are two settings which you can start with

  • memtable_total_space_in_mb

  • commitlog_total_space_in_mb


more memory to this settings means the frequent write (update) will be absorded by the memory and thus, reading will be fast too as read start from memtable first before going into sstable. But because this impact system wide, you might want to gradually start to increase it and measure them. Read below for more information on what these two settings are and how to tune them
http://www.datastax.com/docs/1.0/operations/tuning#memtable-sizing
http://www.datastax.com/docs/1.0/configuration/node_configuration#memtable-total-space-in-mb
http://www.slideshare.net/driftx/cassandra-summit-2010-performance-tuning slide 14 and slide 26
http://wiki.apache.org/cassandra/MemtableThresholds

WP-DataStax-Cassandra page 16
Specifically, for read performance, Cassandra 1.0 optimizes queries by using a lighter-weight data structure for representing a row fragment from a read, than for a row fragment in a memtable into which updates accumulates. Also, with named reads, Cassandra 1.0 includes enhancements for deserializing the most recent versions of requested columns. Combined with the other optimizations, this makes reads in Cassandra as fast as writes for many workloads.

data compression

Previously I have done a study on the compression affect improvement read, read here, herehere and here. Please read the links as it provide comprehensive explanation than I could describe here.

compaction

compaction can improve (or impact) the read speed. Citation fromWP-DataStax-Cassandra,
The above process produces exceptionally fast write operations; however it also can lead to data fragmentation across the disk. Read requests may have to combine data from many SStables as well as Memtables to satisfy end user requests for data, and this can increase query response times.

To reduce data fragmentation and reclaim space taken by obsolete data, Cassandra performs "compactions" that merge the most recent data from many different SStables on disk into a new one.

So with my experience, if you trigger compaction (major) through nodetool, during compaction, the read latency will increase, thus impact but when the compaction is done, the read performance is improved.

In this documentation,  it explain different compaction strategy to use for read or write workload. So identify how's your environment write and read pattern and always measure it so you know what and when it could went wrong. Choose compaction strategy to suit your data model. For instance, if cassandra is not strong at a point, choose other big data technology. Read here for bad experience encountered.

sstables counts

Keep the sstables counts as low as possible for a column family. Excerpt from FULLTEXT02 page 39.
If a read operation is performed, initially the data are read from the memtable. If data are not in the memtable, then data get read from SSTable. Multiple SSTables may be looked up to find the data. Reading directly from SSTable decreases the performance because there are many SSTable that might need to be looked at hence requires an I/O operation means it requires touching the disk. Compared to SSTable, reading directly from memtable is fast because there is no I/O involved. The more the I/O operations are involved, the more performance will be degraded. Performance can also be increased by increasing the size of memtable [7].

Cassandra uses Bloom filter to judge quickly whether the key exists in the SSTable or not before touching the disk. Bloom filter is a efficient data structure that checks whether element is a member of a set by dividing the memory into buckets. Check each bucket to see if a key is present and if any bucket is empty then key was never inserted before. If there are many SSTables, then lots of I/O operations would be needed to read the data which can definitely decrease the performance. This is because of the fact that I/O operations are expensive and therefore compaction is used to improve read performance. Compaction merge two SSTables and sort to become one SSTable, which eventually decreases the number of SSTables and number of I/O operations, hence increasing the performance [7].

key / row cache

key cache should be enable to reduce the search from touching the disk, especially spinning disk. Excerpt from FULLTEXT02 page 47.
By default key cache is enabled and Cassandra caches 20,000 keys per Column Family (CF). The key cache decreases the Input/Output (I/O) operations because if key cache is not enabled then I/O operation is required in order to figure out the exact location of the row. Key cache holds the exact location of the data belonging to that key.

 

Row cache holds the entire content of the row in cache. By default, row cache is disabled. The overhead of enabling or increasing the row cache is that it may require more Java Virtual Machine (JVM) heap of Cassandra. By if jna lib is available, then storing row cache off heap is a good option. This article has diagram on how read is perform.

concurrent_read

Excerpt from FULLTEXT02 page 48.
Read performance can also be increased by tuning the concurrent reads. The rule is span 4 threads per Central Processing Units (CPU) core in the cluster. The higher the number of threads spanned for read, the higher performance can be achieved if the machines have got faster I/O.

A word of cautious, I tried increase concurrent_read from 32 to 64 and see some unpredictable behaviour, so it is better you do this in test environment.

decreasing read consistency level

If your business requirement can tolerate of eventual consistency, then decrease from quorum to one will improve read speed as only one node acknowledgement is sufficient to fulfill the read request compare to a certain amount of nodes in quorum.

turn off swap space

When the node start to swap due to shortage of memory, the response of node be it write or read will be visible. Hence it is best to turn off swap, and let the operating system kill or jvm kill itself to oom than the page swap start to happen.

java heap

Citation from OS-8.1.3-Cassandra Installation and Configuration Guide page 33
HEAP_NEWSIZE : Size of young generation. Larger value leads to longer GC pause times while smaller value will typically lead to more expensive GC. Set in conjunction with MAX_HEAP_SIZE.

So tune it carefully since this is pretty low level. Read this article as it mentioned a few garbage collector settings for cassandra and memory footprint.

upgrade

Each release of software improve or fix the previous defect, so is cassandra. If upgrade is viable, you should consider. For instance, to quote Aaron Morton
1.0 has key and row caches defined per CF, 1.1 has global ones which are better utilised and easier to manage. 1.2 moves bloom filters and compression meta off heap which reduces GC, which will help.  Things normally get faster.

This is also true.

monitoring

Because data load increase and/or decrease will impact the read response time, it is vital if there is monitoring services running. As cited from this paper, p1724_tilmannrabl_vldb2012 Page 1,
In modern enterprise systems it is not uncommon to have thousands of different metrics that are reported from a single host machine.

So monitor crucial metrics by cassandra, example, cpu, java heap and io should give some indicator if your speed has been reduced.

Whilst these are collected knowledge are from public and free will sharing. Any mistake and errors in this article is mine alone and does not reflect to them. Thank you and I hope you learned something.

No comments:

Post a Comment