ERROR [ReadStage:1559] 2012-10-16 20:38:25,336 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReadStage:1559,5,main]
java.lang.IncompatibleClassChangeError: vtable stub
at org.apache.cassandra.db.AbstractColumnContainer.getColumn(AbstractColumnContainer.java:134)
at org.apache.cassandra.db.Memtable$6.computeNext(Memtable.java:402)
at org.apache.cassandra.db.Memtable$6.computeNext(Memtable.java:384)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.db.CollationController.collectTimeOrderedData(CollationController.java:93)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1298)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1184)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1151)
at org.apache.cassandra.db.Table.getRow(Table.java:375)
at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:58)
at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:765)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1224)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
I guess this is really some fatal errors encountered. Let's check out what is IncompatibleClassChangeError means. From IncompatibleClassChangeError javadoc
Thrown when an incompatible class change has occurred to some class definition. The definition of some class, on which the currently executing method depends, has since changed.
Honestly, this is really odd, this error thrown out of no where and the jvm stopped. I guess nobody change the compiled apache cassandra code during the node instance is running. It certainly sounds odd but I guess when the data hold per node is huge, then strange thing start to happened. Now, let's get into stack trace and follow stack trace path.
Bottom three stack traces are pretty obvious, a new thread was started and execute by the thread pool executor. Then we have class StorageProxy. Within this class, there is a static class LocalReadRunnable which implement the abstract method LocalReadRunnable. It seem like it is trying to read a local node table row. The table eventually make calls to the column family to retrieve column. Tracing even deeper, at line 134 of class AbstractColumnContainer, reveal that no exception is thrown from here. This is like a mystery! :) The stack trace analysis and observed jvm stopped shown something is wrong. Though I am not sure what went wrong but if you have any idea, please discuss it as a comment below.
Next we look to another error.
ERROR [CompactionExecutor:4] 2014-10-22 06:13:00,884 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[CompactionExecutor:4,1,main]
java.lang.AssertionError: Added column does not sort as the last column
at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:126)
at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:129)
at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:124)
at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:148)
at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:232)
at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:110)
at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:97)
at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:137)
at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:102)
at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:87)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:135)
at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:115)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Again, this error pop out of nowhere, during node operation, this exception just thrown. So now, we will look into what happened using the stack trace given. Similar to the above, bottom three stack traces are pretty obvious, a new thread was started and execute by the thread pool executor. It is a compaction thread and when compacting sstable, things break. Then row get reduced and columns get deserialized. When columns are re-formation, it throw an exception as the column does not sort. This is another strange phenomenon. It should have been sort before anyway but it is not. The javdoc for this method addColumn
AddColumn throws an exception if the column added does not sort after the last column in the map. The reasoning is that this implementation can get slower if too much insertions are done in unsorted order and right now we only use it when *all* insertion (with this method) are done in sorted order. The assertion throwing is thus a protection against performance regression without knowing about (we can revisit that decision later if we have use cases where most insert are in sorted order but a few are not).
It seem that this is probably reproducible during development environment as a few are not.. I guess this is not as fatal as previous error. That's it for this analysis, please give your inputs or comments if you have workaround.
UPDATE:
It seem that the second error is fix in https://issues.apache.org/jira/browse/CASSANDRA-5856