
Sunday, November 23, 2014

Investigating why the key cache in Apache Cassandra 1.0.8 gets reduced

Today we will investigate when and why Apache Cassandra 1.0.8 reduces the configured key cache. If you run the command nodetool cfstats, one of the statistics will probably interest you. I paste a snippet below.
Key cache capacity: 200000
Key cache size: 200000
Key cache hit rate: 0.9655797101449275
Row cache: disabled

After the cassandra instance has been running for some time, you may notice that the key cache capacity has gone down.
Key cache capacity: 150000
Key cache size: 150000
Key cache hit rate: 0.962251615630851
Row cache: disabled

As seen above, the initial key cache capacity for this column family is 200,000 keys, and the cache is fully occupied (size equals capacity). The hit rate is about 96%, which is a very good statistic. So after a while, what happened and why was the capacity reduced? Let's investigate the log file.
 WARN [ScheduledTasks:1] 2014-02-02 00:46:46,384 AutoSavingCache.java (line 187) Reducing MyColumnFamily KeyCache capacity from 200000 to 150000 to reduce memory pressure

Apparently memory is running low at this point in time, and the key cache is reduced to free up more memory for the cassandra instance. Let's look at cassandra.yaml to see whether there is any description of this behaviour.
# emergency pressure valve #2: the first time heap usage after a full
# (CMS) garbage collection is above this fraction of the max,
# Cassandra will reduce cache maximum _capacity_ to the given fraction
# of the current _size_. Should usually be set substantially above
# flush_largest_memtables_at, since that will have less long-term
# impact on the system.
#
# Set to 1.0 to disable. Setting this lower than
# CMSInitiatingOccupancyFraction is not likely to be useful.
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6

There are two configurations involved in reducing the cache size. When heap usage after a full garbage collection exceeds 85%, the cache capacity is reduced to 60% of the cache's current size. Now let's dive deeper into the code to see what happens, starting with the class GCInspector.
double usage = (double) memoryUsed / memoryMax;

if (memoryUsed > DatabaseDescriptor.getReduceCacheSizesAt() * memoryMax && !cacheSizesReduced)
{
    cacheSizesReduced = true;
    logger.warn("Heap is " + usage + " full. You may need to reduce memtable and/or cache sizes. Cassandra is now reducing cache sizes to free up memory. Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically");
    StorageService.instance.reduceCacheSizes();
}

The condition holds when memory used is greater than reduce_cache_sizes_at (configured in cassandra.yaml, here 0.85) multiplied by the maximum heap memory, and the caches have not been reduced before. For example, if the JVM is assigned an 8GB heap, the if statement evaluates to true once post-GC memory usage exceeds 6.8GB, provided the cache sizes have not already been reduced.
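Below is a minimal, standalone sketch of that arithmetic (illustrative only, not the Cassandra source); the heap size and usage figures are just example numbers.

public class ReduceCacheThresholdDemo
{
    public static void main(String[] args)
    {
        long memoryMax = 8L * 1024 * 1024 * 1024;   // example: 8GB heap
        long memoryUsed = 7L * 1024 * 1024 * 1024;  // example: 7GB used after a full GC
        double reduceCacheSizesAt = 0.85;           // cassandra.yaml value shown above
        boolean cacheSizesReduced = false;

        double usage = (double) memoryUsed / memoryMax;
        // 0.85 * 8GB = 6.8GB, so 7GB of post-GC usage trips the valve.
        if (memoryUsed > reduceCacheSizesAt * memoryMax && !cacheSizesReduced)
        {
            cacheSizesReduced = true;
            System.out.println("Heap is " + usage + " full; cache sizes would be reduced");
        }
    }
}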

When the condition becomes true, StorageService starts to reduce the cache sizes: a simple for loop over all column families, reducing the caches of each. There are two caches being reduced, the row cache and the key cache. Because we did not enable the row cache and it is not the focus of this study, I'll leave it as an exercise for you. The investigation continues with keyCache.reduceCacheSize(), shown in the snippet below.
public void reduceCacheSize()
{
    if (getCapacity() > 0)
    {
        int newCapacity = (int) (DatabaseDescriptor.getReduceCacheCapacityTo() * size());
        logger.warn(String.format("Reducing %s %s capacity from %d to %s to reduce memory pressure",
                                  cfName, cacheType, getCapacity(), newCapacity));
        setCapacity(newCapacity);
    }
}

So if the capacity was initially assigned a value larger than 0, a new capacity is set. The new capacity is reduce_cache_capacity_to (default 0.60 in cassandra.yaml) multiplied by the current size of the cache. For example, if the cache currently holds 200,000 keys, the new capacity becomes 200,000 × 0.60 = 120,000.
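Here is another small standalone sketch (again not the Cassandra source) of that calculation, using the figures from the cfstats output at the top of the post and the default reduce_cache_capacity_to of 0.6; a non-default setting would of course give a different result.

public class ReduceCacheCapacityDemo
{
    public static void main(String[] args)
    {
        double reduceCacheCapacityTo = 0.6;  // cassandra.yaml default shown earlier
        int size = 200000;                   // the key cache is full, so size == capacity

        int newCapacity = (int) (reduceCacheCapacityTo * size);
        System.out.println("New key cache capacity: " + newCapacity);  // prints 120000
    }
}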

This wraps up the investigation. Final thought: because memory consumption exceeded a certain threshold, this emergency pressure valve kicked in. As an immediate fix, increasing the heap for the cassandra instance will help, but the proper fix is probably to reduce the load on the node or add nodes to the cluster. When cache capacity is reduced, expect reads to become slower too; from a data storage perspective, speed and performance are everything, and a reduced cache definitely has an impact on the cluster.

Saturday, May 3, 2014

What is the all time blocked count for the cassandra pool FlushWriter, and why does it increase?

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
FlushWriter                       0         0            941         0                53

In a cassandra cluster, I often notice that the FlushWriter pool's all time blocked count keeps increasing while the other pools remain at 0. Is this something we should be concerned about?

Snippet from class ColumnFamilyStore:
/*
* maybeSwitchMemtable puts Memtable.getSortedContents on the writer executor. When the write is complete,
* we turn the writer into an SSTableReader and add it to ssTables_ where it is available for reads.
*
* There are two other things that maybeSwitchMemtable does.
* First, it puts the Memtable into memtablesPendingFlush, where it stays until the flush is complete
* and it's been added as an SSTableReader to ssTables_. Second, it adds an entry to commitLogUpdater
* that waits for the flush to complete, then calls onMemtableFlush. This allows multiple flushes
* to happen simultaneously on multicore systems, while still calling onMF in the correct order,
* which is necessary for replay in case of a restart since CommitLog assumes that when onMF is
* called, all data up to the given context has been persisted to SSTables.
*/
private static final ExecutorService flushWriter
        = new JMXEnabledThreadPoolExecutor(DatabaseDescriptor.getFlushWriters(),
                                           StageManager.KEEPALIVE,
                                           TimeUnit.SECONDS,
                                           new LinkedBlockingQueue<Runnable>(DatabaseDescriptor.getFlushQueueSize()),
                                           new NamedThreadFactory("FlushWriter"),
                                           "internal");

Just like other pools such as Stage.replicate_on_write, FlushWriter is an instance of JMXEnabledThreadPoolExecutor, governed by two configuration settings which you can alter in cassandra.yaml:

  • memtable_flush_writers: defaults to the number of data_file_directories specified.

  • memtable_flush_queue_size: defaults to 4.


Whenever maybeSwitchMemtable is called, it calls memtable.flushAndSignal() internally.

Notice that in Memtable.flushAndSignal(), the flush task is submitted to that ExecutorService, which a few classes up the construction chain is the JMXEnabledThreadPoolExecutor for the FlushWriter pool shown above. So whenever a task is rejected because the queue is full, the method rejectedExecution() is triggered, which increases the all time blocked count by one.
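To make the mechanism concrete, here is a minimal standalone sketch built from plain JDK classes (not the Cassandra source); the pool and queue sizes stand in for memtable_flush_writers and memtable_flush_queue_size, and the allTimeBlocked counter plays the role of the all time blocked column in nodetool tpstats.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class FlushWriterBlockDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        final AtomicLong allTimeBlocked = new AtomicLong();

        int flushWriters = 1;    // stand-in for memtable_flush_writers
        int flushQueueSize = 4;  // stand-in for memtable_flush_queue_size

        ThreadPoolExecutor flushWriter = new ThreadPoolExecutor(
                flushWriters, flushWriters, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(flushQueueSize),
                new RejectedExecutionHandler()
                {
                    // Called when the queue is full: count the block, then wait for room.
                    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor)
                    {
                        allTimeBlocked.incrementAndGet();
                        try
                        {
                            executor.getQueue().put(task);
                        }
                        catch (InterruptedException e)
                        {
                            throw new RejectedExecutionException(e);
                        }
                    }
                });

        // Submit more "flushes" than one writer thread and a queue of four can absorb.
        for (int i = 0; i < 20; i++)
        {
            flushWriter.execute(new Runnable()
            {
                public void run()
                {
                    try { Thread.sleep(50); } catch (InterruptedException ignored) { }
                }
            });
        }

        flushWriter.shutdown();
        flushWriter.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("All time blocked: " + allTimeBlocked.get());
    }
}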

So that's it. I hope you now have an idea of what the all time blocked count for the FlushWriter pool is and why it increases; a growing count is an indication that you should tune those two configuration parameters in the cassandra.yaml file.

Last, if you learned something and would like to contribute back, please visit our donation page. Thank you.