
Sunday, December 21, 2014

Apache Cassandra 1.0.8 OutOfMemoryError: unable to create new native thread

If you are using Apache Cassandra 1.0.8 and are seeing an exception like the one below, read on. Today we will investigate what this error means and what we can do to correct the situation.
ERROR [Thread-273] 2012-14-10 16:33:18,328 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-273,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:657)
at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:104)
at org.apache.cassandra.thrift.CassandraDaemon$ThriftServer.run(CassandraDaemon.java:214)

This is not good: the application crashed with this error during operation. For context, the node runs Oracle Java 6 with Apache Cassandra 1.0.8. It has a 12GB Java heap, a thread stack size (-Xss) of 128k, max user processes set to 260000 and open files capped at 65536.
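On Linux, the limits of a running process can be read from /proc/<pid>/limits. Per the getrlimit(2) man page, the "Max processes" row (RLIMIT_NPROC, what ulimit -u sets) counts threads, not just processes, for the whole user, so every JVM thread counts against it. The sketch below parses that file for its own process (/proc/self/limits) and prints the JVM's live thread count next to it. The class and method names are hypothetical, and the parser assumes the column layout of current Linux kernels. Changing the path to /proc/<pid>/limits would read another process's limits, for example Cassandra's.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Linux-only sketch (assumption): reads this JVM's limits from /proc/self/limits
// and reports its live thread count for comparison.
final class ProcLimits {

    /** Soft limit of the named row in /proc/<pid>/limits text, or -1 for "unlimited". */
    static long softLimit(String limitsText, String name) {
        for (String line : limitsText.split("\n")) {
            if (line.startsWith(name + " ")) {
                // Columns after the row name: soft limit, hard limit, units.
                String soft = line.substring(name.length()).trim().split("\\s+")[0];
                return "unlimited".equals(soft) ? -1L : Long.parseLong(soft);
            }
        }
        throw new IllegalArgumentException("no limit named: " + name);
    }

    public static void main(String[] args) throws IOException {
        Path limits = Paths.get("/proc/self/limits");
        if (!Files.exists(limits)) {
            System.out.println("/proc/self/limits not found; this sketch assumes Linux");
            return;
        }
        String text = new String(Files.readAllBytes(limits), StandardCharsets.UTF_8);
        System.out.println("Max processes (soft):  " + softLimit(text, "Max processes"));
        System.out.println("Max open files (soft): " + softLimit(text, "Max open files"));
        // Live threads in this JVM; all of them count against the per-user process limit.
        System.out.println("Live JVM threads:      " + ManagementFactory.getThreadMXBean().getThreadCount());
    }
}
```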

Investigating the Java stack trace reveals that this error is not thrown by Java code but by native code. Below is the trace path.

  1. https://github.com/apache/cassandra/blob/cassandra-0.8/src/java/org/apache/cassandra/thrift/CassandraDaemon.java#L214

  2. https://github.com/apache/cassandra/blob/cassandra-1.0.8/src/java/org/apache/cassandra/thrift/CustomTThreadPoolServer.java#L104

  3. ThreadPoolExecutor.java line 657

  4. ThreadPoolExecutor.java line 727

  5. Thread.java line 640


A little explanation before we delve even deeper. Items 3 to 5 are JDK dependent, so if you are using OpenJDK the line numbers may differ. As mentioned earlier, I'm using the Oracle JDK. Unfortunately, its source is not available online for browsing, but you can download it from the Oracle site.
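Frames 1 to 5 describe a thread-per-task pattern: the Thrift server's serve() loop hands work to a ThreadPoolExecutor, and when no idle worker is available and the pool is below its maximum size, execute() starts a brand-new thread, which is the Thread.start() call that fails. The sketch below is a minimal model of that pattern, not Cassandra's actual code. It assumes one long-lived task per client connection, a SynchronousQueue and an effectively unbounded maximum pool size; the class and method names are hypothetical. It needs Java 8 or later, where the internal frame is named addWorker rather than the Java 6 addIfUnderMaximumPoolSize seen in the trace.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a thread-per-connection server pool (not Cassandra's source).
final class ThreadPerConnectionDemo {

    /** Simulates `connections` open client connections; returns how many threads were started. */
    static int threadsCreatedFor(int connections) {
        AtomicInteger created = new AtomicInteger();
        ThreadFactory factory = task -> {
            Thread t = new Thread(task, "rpc-worker-" + created.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        // A SynchronousQueue stores nothing: if no worker is idle, execute() must start a new
        // thread, and every such Thread.start() needs a fresh native thread from the OS.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>(), factory);
        CountDownLatch allRunning = new CountDownLatch(connections);
        CountDownLatch disconnect = new CountDownLatch(1);
        try {
            for (int i = 0; i < connections; i++) {
                // Each task stands in for one client connection that stays open.
                pool.execute(() -> {
                    allRunning.countDown();
                    try {
                        disconnect.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            allRunning.await();
            return created.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // "Close" all connections so the worker threads can finish.
            disconnect.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads started for 200 connections: " + threadsCreatedFor(200));
    }
}
```

Because every busy connection holds its own thread, the number of native threads tracks the number of concurrent connections; nothing in this model stops it from growing until the OS refuses.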

Because this is a native call, we have to look into code that is not written in Java. If the following code looks alien to you, it certainly looks alien to me; it is C++ from the HotSpot virtual machine. You may also notice that this code is taken from OpenJDK and is not found in the Oracle JDK source. Probably that part is closed source, but we will not go there. Let's just focus on where this error is thrown from and why. It is taken from here and the explanation here.
JVM_ENTRY(void, JVM_StartThread(JNIEnv* env, jobject jthread))
  JVMWrapper("JVM_StartThread");
  JavaThread *native_thread = NULL;

  // We cannot hold the Threads_lock when we throw an exception,
  // due to rank ordering issues. Example: we might need to grab the
  // Heap_lock while we construct the exception.
  bool throw_illegal_thread_state = false;

  // We must release the Threads_lock before we can post a jvmti event
  // in Thread::start.
  {
    // Ensure that the C++ Thread and OSThread structures aren't freed before
    // we operate.
    MutexLocker mu(Threads_lock);

    // Since JDK 5 the java.lang.Thread threadStatus is used to prevent
    // re-starting an already started thread, so we should usually find
    // that the JavaThread is null. However for a JNI attached thread
    // there is a small window between the Thread object being created
    // (with its JavaThread set) and the update to its threadStatus, so we
    // have to check for this
    if (java_lang_Thread::thread(JNIHandles::resolve_non_null(jthread)) != NULL) {
      throw_illegal_thread_state = true;
    } else {
      // We could also check the stillborn flag to see if this thread was already stopped, but
      // for historical reasons we let the thread detect that itself when it starts running

      jlong size =
             java_lang_Thread::stackSize(JNIHandles::resolve_non_null(jthread));
      // Allocate the C++ Thread structure and create the native thread. The
      // stack size retrieved from java is signed, but the constructor takes
      // size_t (an unsigned type), so avoid passing negative values which would
      // result in really large stacks.
      size_t sz = size > 0 ? (size_t) size : 0;
      native_thread = new JavaThread(&thread_entry, sz);

      // At this point it may be possible that no osthread was created for the
      // JavaThread due to lack of memory. Check for this situation and throw
      // an exception if necessary. Eventually we may want to change this so
      // that we only grab the lock if the thread was created successfully -
      // then we can also do this check and throw the exception in the
      // JavaThread constructor.
      if (native_thread->osthread() != NULL) {
        // Note: the current thread is not being used within "prepare".
        native_thread->prepare(jthread);
      }
    }
  }

  if (throw_illegal_thread_state) {
    THROW(vmSymbols::java_lang_IllegalThreadStateException());
  }

  assert(native_thread != NULL, "Starting null thread?");

  if (native_thread->osthread() == NULL) {
    // No one should hold a reference to the 'native_thread'.
    delete native_thread;
    if (JvmtiExport::should_post_resource_exhausted()) {
      JvmtiExport::post_resource_exhausted(
        JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR | JVMTI_RESOURCE_EXHAUSTED_THREADS,
        "unable to create new native thread");
    }
    THROW_MSG(vmSymbols::java_lang_OutOfMemoryError(),
              "unable to create new native thread");
  }

  Thread::start(native_thread);

JVM_END

As I don't have any knowledge of C++, I won't attempt a detailed analysis of the snippet above, but if you understand what it does, I would be happy to read your analysis in a comment below this article. It certainly looks to me that the operating system could not create a thread at this point (the JavaThread ends up without an osthread), and that the JVM then reports this condition with both JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR and JVMTI_RESOURCE_EXHAUSTED_THREADS before throwing the OutOfMemoryError. Let's google what that is supposed to mean. Below are some interesting results.

To summarize the analysis from the links above.

  • A stack is created together with each thread, so as more threads are created, the total memory taken by stacks increases as well.

  • A Java Virtual Machine stack stores frames. A Java Virtual Machine stack is analogous to the stack of a conventional language such as C: it holds local variables and partial results, and plays a part in method invocation and return.

  • Java thread stacks are not part of the Java heap, so even if you give Cassandra a larger heap via -Xms or -Xmx, this error will happen again whenever the same condition recurs.

  • If Java Virtual Machine stacks can be dynamically expanded, and expansion is attempted but insufficient memory can be made available to effect the expansion, or if insufficient memory can be made available to create the initial Java Virtual Machine stack for a new thread, the Java Virtual Machine throws an OutOfMemoryError.
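Because each thread gets its own stack outside the heap, the memory reserved for stacks grows linearly with the thread count: roughly threads × stack size. The sketch below does that arithmetic for the 128k -Xss used on this node; the thread counts are illustrative, not measurements from this incident, and the result is an upper bound on reserved address space (pages are typically committed only as a stack is actually used). It also shows java.lang.Thread's four-argument constructor, which lets code request a per-thread stack size that, according to its Javadoc, the VM may treat only as a hint. The class and method names are hypothetical.

```java
// Illustrative arithmetic: native memory reserved for Java thread stacks.
final class StackReservation {

    /** Bytes reserved for stacks when `threads` threads each reserve `stackBytes`. */
    static long totalStackBytes(long threads, long stackBytes) {
        return threads * stackBytes;
    }

    public static void main(String[] args) throws InterruptedException {
        long xss = 128L * 1024; // -Xss128k, as configured on this node
        // Prints 125, 1250 and 3750 MiB for the three thread counts below.
        for (long threads : new long[] {1_000, 10_000, 30_000}) {
            long mib = totalStackBytes(threads, xss) / (1024 * 1024);
            System.out.println(threads + " threads -> " + mib + " MiB of stack (outside the heap)");
        }

        // A single thread can request its own stack size; the VM may ignore the request.
        Thread worker = new Thread(null, () -> System.out.println("running with a requested 256k stack"),
                "small-stack-worker", 256L * 1024);
        worker.start();
        worker.join();
    }
}
```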


Based on the analysis so far, it certainly looks to me that when the Cassandra instance tried to create a new thread, it could not, because the underlying operating system was unable to create the thread, which the JVM reports as resource exhaustion. It looks like the operating system did not have sufficient memory to create the thread, so increasing -Xms or -Xmx will not solve the problem. Note that the file descriptor limit was not reached in this case either, as most of these limits are set so high that they are practically unlimited.
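This difference between a heap problem and a thread-creation problem also shows up in the error message. As the JVM_StartThread snippet shows, this OutOfMemoryError carries the message "unable to create new native thread", whereas HotSpot reports heap exhaustion as "Java heap space". The sketch below tells them apart by message text, for example when scanning logs or deciding whether a bigger heap could help. It assumes HotSpot's wording, which is not part of any specification and varies between JVMs and versions (newer JDKs phrase the thread case slightly differently, which is why it only matches on "native thread"). The enum and method names are hypothetical.

```java
// Hypothetical helper: classifies an OutOfMemoryError by HotSpot's message text.
final class OomClassifier {

    enum Kind { NATIVE_THREAD, JAVA_HEAP, GC_OVERHEAD, OTHER }

    static Kind classify(OutOfMemoryError error) {
        String msg = error.getMessage() == null ? "" : error.getMessage();
        if (msg.contains("native thread")) {
            // The OS refused to create a thread; a larger -Xmx does not help here.
            return Kind.NATIVE_THREAD;
        }
        if (msg.contains("Java heap space")) {
            return Kind.JAVA_HEAP;
        }
        if (msg.contains("GC overhead limit exceeded")) {
            return Kind.GC_OVERHEAD;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        OutOfMemoryError fromLog = new OutOfMemoryError("unable to create new native thread");
        System.out.println(classify(fromLog)); // NATIVE_THREAD
    }
}
```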

Interestingly, a commonly suggested fix for this error is to decrease -Xss, or even the heap size via -Xms and -Xmx. I don't understand the logic behind this method; perhaps it is worth trying, but I seriously doubt it. If a Cassandra node has high heap usage, decreasing the heap will only create a different type of problem.
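The usual reasoning behind that advice is arithmetic on native memory: the heap and all thread stacks must fit into the same process, so whatever the heap takes is unavailable for stacks, and a smaller -Xss lets more threads fit into what remains. The sketch below uses a simplified model, max threads ≈ (memory budget - heap - other overhead) / stack size. The budget and overhead figures are made-up inputs, not values from this node, and the model ignores OS limits such as max user processes. The class and method names are hypothetical.

```java
// Simplified model (assumption): how many thread stacks fit in what is left after the heap.
final class ThreadBudget {

    static long maxThreads(long budgetBytes, long heapBytes, long otherBytes, long stackBytes) {
        long remaining = budgetBytes - heapBytes - otherBytes;
        return remaining <= 0 ? 0 : remaining / stackBytes;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long budget = 16 * gib; // hypothetical memory available to the process
        long other = 1 * gib;   // hypothetical non-heap overhead (code cache, GC structures, ...)
        System.out.println(maxThreads(budget, 12 * gib, other, 128 * 1024)); // 24576 with -Xmx12g, -Xss128k
        System.out.println(maxThreads(budget, 12 * gib, other, 256 * 1024)); // 12288: bigger stacks, half the threads
        System.out.println(maxThreads(budget, 10 * gib, other, 128 * 1024)); // 40960: smaller heap, more threads
    }
}
```

In this model both knobs help only by leaving more room for stacks, which is also why shrinking the heap trades one problem for another on a node that genuinely needs its heap.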

If you know of or have encountered this problem before and have a good fix, please leave a comment below this article. To end this article: as of this writing, there is an ongoing discussion on the Cassandra mailing list.