
Sunday, December 21, 2014

apache cassandra 1.0.8 out of memory error unable to create new native thread

If you are using apache cassandra 1.0.8 and seeing an exception like the one below, you may want to read further. Today, we will investigate what this error means and what can be done to correct the situation.
ERROR [Thread-273] 2012-14-10 16:33:18,328 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-273,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:657)
at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:104)
at org.apache.cassandra.thrift.CassandraDaemon$ThriftServer.run(CassandraDaemon.java:214)

This is not good; the application crashed with this error during operation. To describe the environment: it runs Oracle Java 6 with apache cassandra 1.0.8, with 12GB of java heap assigned, a thread stack size of 128k, max user processes at 260000, and open files capped at 65536.

Investigating the java stack trace reveals that this error is thrown not by java code but by native code. Below is the trace path.

  1. https://github.com/apache/cassandra/blob/cassandra-0.8/src/java/org/apache/cassandra/thrift/CassandraDaemon.java#L214

  2. https://github.com/apache/cassandra/blob/cassandra-1.0.8/src/java/org/apache/cassandra/thrift/CustomTThreadPoolServer.java#L104

  3. ThreadPoolExecutor.java line 657 (screenshot: cassandra_investigation_1)

  4. ThreadPoolExecutor.java line 727 (screenshot: cassandra_investigation_2)

  5. Thread.java line 640 (screenshot: cassandra_investigation_3)


A little explanation before we delve even deeper. Items 3 to 5 are jdk dependent; hence, if you are using openjdk, the line numbers may differ. As mentioned earlier, I'm using the oracle jdk. Unfortunately, its source is not available online for browsing, but you can download it from the oracle site.

Because this is a native call, we will look into code that is not Java. If the following code looks alien to you, it looks alien to me too, as it is written in C++. As you may have noticed, this code is taken from openjdk and is not found in the oracle jdk; it is probably closed source, but we will not go there. Let's just focus on where this error is thrown from and why. It is taken from here and the explanation is here.
JVM_ENTRY(void, JVM_StartThread(JNIEnv* env, jobject jthread))
  JVMWrapper("JVM_StartThread");
  JavaThread *native_thread = NULL;

  // We cannot hold the Threads_lock when we throw an exception,
  // due to rank ordering issues. Example: we might need to grab the
  // Heap_lock while we construct the exception.
  bool throw_illegal_thread_state = false;

  // We must release the Threads_lock before we can post a jvmti event
  // in Thread::start.
  {
    // Ensure that the C++ Thread and OSThread structures aren't freed before
    // we operate.
    MutexLocker mu(Threads_lock);

    // Since JDK 5 the java.lang.Thread threadStatus is used to prevent
    // re-starting an already started thread, so we should usually find
    // that the JavaThread is null. However for a JNI attached thread
    // there is a small window between the Thread object being created
    // (with its JavaThread set) and the update to its threadStatus, so we
    // have to check for this
    if (java_lang_Thread::thread(JNIHandles::resolve_non_null(jthread)) != NULL) {
      throw_illegal_thread_state = true;
    } else {
      // We could also check the stillborn flag to see if this thread was already stopped, but
      // for historical reasons we let the thread detect that itself when it starts running

      jlong size =
             java_lang_Thread::stackSize(JNIHandles::resolve_non_null(jthread));
      // Allocate the C++ Thread structure and create the native thread. The
      // stack size retrieved from java is signed, but the constructor takes
      // size_t (an unsigned type), so avoid passing negative values which would
      // result in really large stacks.
      size_t sz = size > 0 ? (size_t) size : 0;
      native_thread = new JavaThread(&thread_entry, sz);

      // At this point it may be possible that no osthread was created for the
      // JavaThread due to lack of memory. Check for this situation and throw
      // an exception if necessary. Eventually we may want to change this so
      // that we only grab the lock if the thread was created successfully -
      // then we can also do this check and throw the exception in the
      // JavaThread constructor.
      if (native_thread->osthread() != NULL) {
        // Note: the current thread is not being used within "prepare".
        native_thread->prepare(jthread);
      }
    }
  }

  if (throw_illegal_thread_state) {
    THROW(vmSymbols::java_lang_IllegalThreadStateException());
  }

  assert(native_thread != NULL, "Starting null thread?");

  if (native_thread->osthread() == NULL) {
    // No one should hold a reference to the 'native_thread'.
    delete native_thread;
    if (JvmtiExport::should_post_resource_exhausted()) {
      JvmtiExport::post_resource_exhausted(
        JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR | JVMTI_RESOURCE_EXHAUSTED_THREADS,
        "unable to create new native thread");
    }
    THROW_MSG(vmSymbols::java_lang_OutOfMemoryError(),
              "unable to create new native thread");
  }

  Thread::start(native_thread);

JVM_END

As I don't have much knowledge of C++, there is no deep analysis of the snippet above, but if you understand what it does, I will be happy if you leave your analysis as a comment below this article. It certainly looks to me that the operating system cannot create a thread at this point, and the JVM reports this through the resource-exhausted flags JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR and / or JVMTI_RESOURCE_EXHAUSTED_THREADS. Let's google to find out what that is supposed to mean. Below are some interesting findings.

To summarize the analysis from the links above:

  • A stack is created when a thread is created; as more threads are created, the total stack memory increases as a result.

  • A Java Virtual Machine stack stores frames. A Java Virtual Machine stack is analogous to the stack of a conventional language such as C: it holds local variables and partial results, and plays a part in method invocation and return.

  • The Java thread stack is not within the java heap; hence, even if you increase the java heap given to cassandra via the -Xms or -Xmx parameters, this error will happen again if the same condition is met in the future.

  • If Java Virtual Machine stacks can be dynamically expanded, and expansion is attempted but insufficient memory can be made available to effect the expansion, or if insufficient memory can be made available to create the initial Java Virtual Machine stack for a new thread, the Java Virtual Machine throws an OutOfMemoryError.


From the analysis so far, it certainly looks as if the cassandra instance tried to create a new thread and was not able to, because the underlying operating system could not create it. It actually looks like the operating system does not have sufficient memory to create the thread, hence increasing -Xms or -Xmx will not solve the problem. Note that the file descriptor limit is not the issue here either, as those limits were set very high.

It's pretty interesting to note that a suggested fix for this error is to decrease -Xss or even the heap via -Xms and -Xmx. Although I don't fully understand the logic behind this method (presumably a smaller stack size or heap leaves more native memory for new thread stacks), perhaps you should try it, but I seriously doubt it will help. If the cassandra node already has high heap usage, decreasing the heap will only create another type of problem.
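As an aside not in the original post, here is a minimal, hypothetical Java sketch that reproduces this same error by creating threads until the operating system refuses to allocate another native stack. The class name and counter are made up for illustration; run it with a small stack size such as -Xss128k and the process will eventually fail with "unable to create new native thread", depending on the OS limits.
public class ThreadExhaustion {

    public static void main(String[] args) {
        long count = 0;
        try {
            while (true) {
                // each thread parks forever, holding on to its native stack
                Thread t = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            Thread.sleep(Long.MAX_VALUE);
                        } catch (InterruptedException ignored) {
                        }
                    }
                });
                t.setDaemon(true);
                t.start();
                count++;
            }
        } catch (OutOfMemoryError e) {
            // typically "unable to create new native thread"
            System.out.println("failed after " + count + " threads: " + e.getMessage());
        }
    }
}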

If you have encountered this problem before and have a good fix, please leave a comment below this article. To end this article, as of this writing there is a discussion happening on the cassandra mailing list.

Sunday, November 23, 2014

Investigate into why key cache in apache cassandra 1.0.8 gets reduced

Today, we will investigate when and why apache cassandra 1.0.8 reduces the configured key cache. If you run the command nodetool cfstats, one of the statistics will probably interest you. I paste the snippet below.
Key cache capacity: 200000
Key cache size: 200000
Key cache hit rate: 0.9655797101449275
Row cache: disabled

After the cassandra instance has been running for some time, you may start to notice that the key cache capacity has gone down.
Key cache capacity: 150000
Key cache size: 150000
Key cache hit rate: 0.962251615630851
Row cache: disabled

As seen above, the initial key cache capacity for this column family is 200,000 keys, and the cache is fully occupied (all 200,000 entries are in use). The hit rate is 96%, which is a very good statistic. So after a while, what happened and why was the capacity reduced? Let's investigate the log file.
 WARN [ScheduledTasks:1] 2014-02-02 00:46:46,384 AutoSavingCache.java (line 187) Reducing MyColumnFamily KeyCache capacity from 200000 to 150000 to reduce memory pressure

Apparently memory was not enough at this point in time, and the key cache was reduced to free up memory for the cassandra instance. Let's look at the cassandra yaml file to see if there is any description of this behaviour.
# emergency pressure valve #2: the first time heap usage after a full
# (CMS) garbage collection is above this fraction of the max,
# Cassandra will reduce cache maximum _capacity_ to the given fraction
# of the current _size_. Should usually be set substantially above
# flush_largest_memtables_at, since that will have less long-term
# impact on the system.
#
# Set to 1.0 to disable. Setting this lower than
# CMSInitiatingOccupancyFraction is not likely to be useful.
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6

There are two configurations that control this behaviour: when heap usage exceeds 85% after a full garbage collection, the cache capacity is reduced to 60% of its current size. So now we dive deeper into the code to see what happened. Let's read the class GCInspector.
double usage = (double) memoryUsed / memoryMax;

if (memoryUsed > DatabaseDescriptor.getReduceCacheSizesAt() * memoryMax && !cacheSizesReduced)
{
    cacheSizesReduced = true;
    logger.warn("Heap is " + usage + " full. You may need to reduce memtable and/or cache sizes. Cassandra is now reducing cache sizes to free up memory. Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically");
    StorageService.instance.reduceCacheSizes();
}

The condition is true when memory used is greater than reduce_cache_sizes_at (configured in cassandra.yaml, value 0.85) multiplied by the maximum heap memory, and the caches have not been reduced before. For example, if the jvm is assigned 8GB of heap, the if statement evaluates to true when heap usage is greater than roughly 6.8GB and the cache sizes have not been reduced before.
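To make the arithmetic concrete, below is a small standalone Java sketch of the same check under an assumed 8GB heap; the values are made up for illustration and only mirror the snippet above.
public class CacheReductionCheck {

    public static void main(String[] args) {
        long memoryMax = 8L * 1024 * 1024 * 1024;  // assume -Xmx8G
        long memoryUsed = 7L * 1024 * 1024 * 1024; // assume 7GB used after a full GC
        double reduceCacheSizesAt = 0.85;          // reduce_cache_sizes_at from cassandra.yaml
        boolean cacheSizesReduced = false;

        // 7GB > 0.85 * 8GB (~6.8GB), so the emergency valve would trigger here
        if (memoryUsed > reduceCacheSizesAt * memoryMax && !cacheSizesReduced) {
            System.out.println("heap is " + ((double) memoryUsed / memoryMax)
                    + " full, cache sizes would be reduced");
        }
    }
}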

When the condition becomes true, StorageService starts to reduce the cache sizes with a simple for loop over all column families. As seen there, two caches are reduced: the row cache and the key cache. Because the row cache is not enabled here and is not the focus of this study, I'll leave it as an exercise for you. The investigation continues into keyCache.reduceCacheSize(), as the snippet of code below shows.
public void reduceCacheSize()
{
    if (getCapacity() > 0)
    {
        int newCapacity = (int) (DatabaseDescriptor.getReduceCacheCapacityTo() * size());
        logger.warn(String.format("Reducing %s %s capacity from %d to %s to reduce memory pressure",
                                  cfName, cacheType, getCapacity(), newCapacity));
        setCapacity(newCapacity);
    }
}

So if the capacity is currently larger than 0, a new capacity is set. The new capacity is reduce_cache_capacity_to (default 0.60 in the cassandra yaml) multiplied by the current size of the cache. For example, if the cache currently holds 20000 entries, the new capacity becomes 20000 x 0.60 = 12000.

This wraps up the investigation. Final thought: because memory consumption exceeded a certain threshold, this emergency pressure valve kicked in. An immediate fix is to increase the heap for the cassandra instance, but the correct fix is probably to reduce the node load or add nodes to the cluster. When the cache capacity is reduced, expect reads to become slower too; from a data storage perspective, speed and performance are everything, and a reduced cache definitely has an impact on the cluster.

Friday, November 7, 2014

How to set java thread name, thread pool name, get stack dump and heap dump

If you have been developing java applications, one of the exceptions that could show up in the log is "java.lang.OutOfMemoryError: unable to create new native thread". That is what we are going to work with today. We will learn how to set a thread name, set a thread pool name, get a stack dump and get a java heap dump, so we know why the aforementioned exception is thrown. This is in response to doing the exercises after reading the article http://code.nomad-labs.com/2012/12/20/some-tips-on-java-lang-outofmemoryerror-unable-to-create-new-native-thread/

I will divide these exercises into sub topics and we will go through them one by one, starting with the simplest exercise first.

How to name a thread?
To set a name for a thread, call the method setName(); you can then show the currently running thread's name using Thread.currentThread().getName(). See the example below.
public class Minion implements Runnable {

    private String name = null;

    public Minion(String name) {
        this.name = name;
    }

    @Override
    public void run() {
        Thread.currentThread().setName(name);
        System.out.printf("minion %s eating banana", Thread.currentThread().getName());
    }

}

Another, modified version is below. Note that we do not set the thread name here; we set it in the ThreadFactory class. Read on to find out why later in this article.
import java.util.Random;


public class Minion implements Runnable {

    private static String[] fruits = {"apple", "banana"};

    public Minion() {

    }

    @Override
    public void run() {
        try {
            // banana or apple
            int idx = new Random().nextInt(fruits.length);
            System.out.printf("minion %s eating %s %n", Thread.currentThread().getName(), fruits[idx]);
            Thread.sleep((idx * 1000) + 10000);
            StackTraceElement[] ste = Thread.currentThread().getStackTrace();

            System.out.printf("minion %s done eat %s %n", Thread.currentThread().getName(), fruits[idx]);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

    }

}

You could pass the task in as a constructor parameter (a small variation doing that is sketched below), but for the sake of simplicity I hard coded the task within this worker thread: two fruits, apple and banana. The thread sleep is longer because later in this article we will use this class for getting a heap dump and thread dump.
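For reference, here is a hypothetical variation that passes the work in instead of hard-coding it; the class name FruitMinion and its constructor parameter are my own additions, not from the original exercise.
public class FruitMinion implements Runnable {

    private final String fruit;

    public FruitMinion(String fruit) {
        this.fruit = fruit;
    }

    @Override
    public void run() {
        try {
            // the fruit to eat is supplied by the caller
            System.out.printf("minion %s eating %s %n", Thread.currentThread().getName(), fruit);
            Thread.sleep(10000);
            System.out.printf("minion %s done eat %s %n", Thread.currentThread().getName(), fruit);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

}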

How to name the thread pool?

To name the threads of a thread pool, implement ThreadFactory and set a namePrefix in the constructor. There are other libraries that make this easier and do it for you, but we will not look at other libraries here, only the java standard library. Let's do the work; see the example below.
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;


public class GruThreadFactory implements ThreadFactory {

    static final AtomicInteger poolNumber = new AtomicInteger(1);
    final ThreadGroup group;
    final AtomicInteger threadNumber = new AtomicInteger(1);
    final String namePrefix;

    public GruThreadFactory() {
        SecurityManager s = System.getSecurityManager();
        group = (s != null) ? s.getThreadGroup() : Thread.currentThread().getThreadGroup();
        namePrefix = "GruPool-" + poolNumber.getAndIncrement() + "-thread-";
    }

    @Override
    public Thread newThread(Runnable r) {
        final Thread t = new Thread(group, r, namePrefix + threadNumber.getAndIncrement(), 0);
        if (t.isDaemon())
            t.setDaemon(false);
        if (t.getPriority() != Thread.NORM_PRIORITY)
            t.setPriority(Thread.NORM_PRIORITY);
        return t;
    }

}

and the modified version. As you can see, the minions array is passed into the GruThreadFactory constructor, and each time newThread() is called, the thread is named after a random entry of the String array minions.
import java.util.Random;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;


public class GruThreadFactory implements ThreadFactory {

    static final AtomicInteger poolNumber = new AtomicInteger(1);
    final ThreadGroup group;
    final AtomicInteger threadNumber = new AtomicInteger(1);
    final String namePrefix;
    final String[] minions;

    public GruThreadFactory(String[] minions) {
        SecurityManager s = System.getSecurityManager();
        group = (s != null) ? s.getThreadGroup() : Thread.currentThread().getThreadGroup();
        namePrefix = "GruPool-" + poolNumber.getAndIncrement() + "-thread-";
        this.minions = minions;
    }

    @Override
    public Thread newThread(Runnable r) {
        int index = new Random().nextInt(minions.length);
        final Thread t = new Thread(group, r, namePrefix + minions[index], 0);
        if (t.isDaemon())
            t.setDaemon(false);
        if (t.getPriority() != Thread.NORM_PRIORITY)
            t.setPriority(Thread.NORM_PRIORITY);
        return t;
    }

}

and then our main app to bind the ThreadFactory and the workers.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;


public class MainApp {

    /**
     * @param args
     * @throws InterruptedException
     */
    public static void main(String[] args) throws InterruptedException {
        ThreadFactory GruFactory = null;
        ExecutorService exec = null;

        try {
            String[] minions = {"Dave", "Stuart", "Kevin", "Jerry", "Carl", "Phil", "Paul", "Mike", "Jorge"};
            GruFactory = new GruThreadFactory(minions);
            exec = Executors.newFixedThreadPool(minions.length, GruFactory);

            for (int i = 0; i < 90; i++) {
                Minion m = new Minion();
                exec.execute(m);
            }
            exec.shutdown();
            while (!exec.isTerminated()) {

            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {

            System.out.println("done");
        }

    }

}

Now to check. The output below is from the modified version. Note that you may get different output than mine, which is expected. I paste only a few lines below.
minion GruPool-1-thread-Mike eating banana 
minion GruPool-1-thread-Phil eating banana
minion GruPool-1-thread-Mike eating apple
minion GruPool-1-thread-Stuart eating banana
minion GruPool-1-thread-Mike done eat apple
minion GruPool-1-thread-Paul eating banana
minion GruPool-1-thread-Stuart done eat banana
minion GruPool-1-thread-Stuart eating apple
minion GruPool-1-thread-Stuart done eat apple
minion GruPool-1-thread-Stuart eating apple
minion GruPool-1-thread-Stuart done eat apple
minion GruPool-1-thread-Stuart eating banana
..
..
minion GruPool-1-thread-Paul done eat apple
minion GruPool-1-thread-Jorge eating apple
minion GruPool-1-thread-Jorge done eat apple
minion GruPool-1-thread-Mike done eat banana
minion GruPool-1-thread-Paul done eat banana
minion GruPool-1-thread-Mike done eat banana
minion GruPool-1-thread-Jerry done eat banana
minion GruPool-1-thread-Stuart done eat banana
minion GruPool-1-thread-Mike done eat banana
done

So far so good, our app works.

How to get a stack dump?

To get a stack dump, you can use jstack, which comes with the jdk installation. To get the stack dump:
jstack -F 19557  &> jstack.log

where 19557 is the pid of the java process, and we redirect stdout and stderr to a file. There are several commands to get the process id; you can use either ps or jps from the jdk. Output as attached.
Attaching to process ID 19557, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.6-b01
Deadlock Detection:

No deadlocks found.

Thread 19586: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- Minion.run() @bci=55, line=18 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Thread 19585: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- Minion.run() @bci=55, line=18 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Thread 19584: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- Minion.run() @bci=55, line=18 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Thread 19583: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- Minion.run() @bci=55, line=18 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Thread 19582: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- Minion.run() @bci=55, line=18 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Thread 19581: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- Minion.run() @bci=55, line=18 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Thread 19580: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- Minion.run() @bci=55, line=18 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Thread 19579: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- Minion.run() @bci=55, line=18 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Thread 19578: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
- Minion.run() @bci=55, line=18 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Thread 19573: (state = BLOCKED)


Thread 19572: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=118 (Interpreted frame)
- java.lang.ref.ReferenceQueue.remove() @bci=2, line=134 (Interpreted frame)
- java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159 (Interpreted frame)


Thread 19571: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
- java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=116 (Interpreted frame)


Thread 19564: (state = IN_JAVA)
- MainApp.main(java.lang.String[]) @bci=119, line=28 (Compiled frame; information may be imprecise)

The jstack output shows:

  • there are 9 threads running Minion.run().

  • no deadlocks were found.

  • the individual worker thread states show as BLOCKED because they were sleeping.

  • the main thread state is IN_JAVA, which means it is running in Java or in stub code.
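As a side note beyond jstack, you can also capture stack traces programmatically from inside the JVM with the standard Thread.getAllStackTraces() API; below is a minimal sketch of that idea (the class name is my own).

import java.util.Map;

public class StackDumper {

    public static void dump() {
        // snapshot of every live thread and its current stack
        Map<Thread, StackTraceElement[]> stacks = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> entry : stacks.entrySet()) {
            Thread t = entry.getKey();
            System.out.println("\"" + t.getName() + "\" state=" + t.getState());
            for (StackTraceElement frame : entry.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}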


Another useful piece of information is the thread dump. With visualvm, you can actually watch the threads running over time; I recorded a video and you should try to watch it.

http://www.youtube.com/watch?v=REoi6sbcIUY

Also see the thread dump captured using VisualVM.
2014-10-07 22:58:32
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode):

"RMI TCP Connection(2)-127.0.0.1" daemon prio=10 tid=0x00007fd5c0003000 nid=0x646c runnable [0x00007fd618af8000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked <0x00000007d7ece620> (a java.io.BufferedInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:66)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:517)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- <0x00000007d79e3908> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"JMX server connection timeout 22" daemon prio=10 tid=0x00007fd5b4008000 nid=0x6464 in Object.wait() [0x00007fd618bf9000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000007d7b6ddb8> (a [I)
at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
- locked <0x00000007d7b6ddb8> (a [I)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- None

"RMI Scheduler(0)" daemon prio=10 tid=0x00007fd5b4004800 nid=0x6463 waiting on condition [0x00007fd618cfa000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007d799dc18> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- None

"RMI TCP Connection(1)-127.0.0.1" daemon prio=10 tid=0x00007fd5c0001800 nid=0x6462 runnable [0x00007fd618efc000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked <0x00000007d7b1e8f8> (a java.io.BufferedInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:66)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:517)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- <0x00000007d79e3430> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"RMI TCP Accept-0" daemon prio=10 tid=0x00007fd5bc424800 nid=0x6460 runnable [0x00007fd618ffd000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
- locked <0x00000007d79ac788> (a java.net.SocksSocketImpl)
at java.net.ServerSocket.implAccept(ServerSocket.java:462)
at java.net.ServerSocket.accept(ServerSocket.java:430)
at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- None

"Attach Listener" daemon prio=10 tid=0x00007fd5f4001000 nid=0x645e waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:
- None

"GruPool-1-thread-Paul" prio=10 tid=0x00007fd61c0f8000 nid=0x6451 waiting on condition [0x00007fd620898000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at Minion.run(Minion.java:18)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- <0x00000007d706f6c8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"GruPool-1-thread-Carl" prio=10 tid=0x00007fd61c0f6000 nid=0x6450 waiting on condition [0x00007fd620999000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at Minion.run(Minion.java:18)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- <0x00000007d706f3f0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"GruPool-1-thread-Jorge" prio=10 tid=0x00007fd61c0f4000 nid=0x644f waiting on condition [0x00007fd620a9a000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at Minion.run(Minion.java:18)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- <0x00000007d706f168> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"GruPool-1-thread-Dave" prio=10 tid=0x00007fd61c0f2000 nid=0x644e waiting on condition [0x00007fd620b9b000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at Minion.run(Minion.java:18)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- <0x00000007d706eee0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"GruPool-1-thread-Phil" prio=10 tid=0x00007fd61c0f0000 nid=0x644d waiting on condition [0x00007fd620c9c000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at Minion.run(Minion.java:18)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- <0x00000007d706ec58> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"GruPool-1-thread-Jorge" prio=10 tid=0x00007fd61c0ee000 nid=0x644c waiting on condition [0x00007fd620d9d000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at Minion.run(Minion.java:18)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- <0x00000007d706e9a0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"GruPool-1-thread-Carl" prio=10 tid=0x00007fd61c0ec800 nid=0x644b waiting on condition [0x00007fd620e9e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at Minion.run(Minion.java:18)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- <0x00000007d706e718> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"GruPool-1-thread-Dave" prio=10 tid=0x00007fd61c0eb000 nid=0x644a waiting on condition [0x00007fd620f9f000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at Minion.run(Minion.java:18)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- <0x00000007d706e490> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"GruPool-1-thread-Mike" prio=10 tid=0x00007fd61c0e9800 nid=0x6449 waiting on condition [0x00007fd6210a0000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at Minion.run(Minion.java:18)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Locked ownable synchronizers:
- <0x00000007d706d8e8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"Low Memory Detector" daemon prio=10 tid=0x00007fd61c094000 nid=0x6447 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:
- None

"C2 CompilerThread1" daemon prio=10 tid=0x00007fd61c092000 nid=0x6446 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:
- None

"C2 CompilerThread0" daemon prio=10 tid=0x00007fd61c08f000 nid=0x6445 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:
- None

"Signal Dispatcher" daemon prio=10 tid=0x00007fd61c08d000 nid=0x6444 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

Locked ownable synchronizers:
- None

"Finalizer" daemon prio=10 tid=0x00007fd61c071000 nid=0x6443 in Object.wait() [0x00007fd62182f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000007d7001300> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
- locked <0x00000007d7001300> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

Locked ownable synchronizers:
- None

"Reference Handler" daemon prio=10 tid=0x00007fd61c06f000 nid=0x6442 in Object.wait() [0x00007fd621930000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000007d70011d8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
- locked <0x00000007d70011d8> (a java.lang.ref.Reference$Lock)

Locked ownable synchronizers:
- None

"main" prio=10 tid=0x00007fd61c00a800 nid=0x643c runnable [0x00007fd623a7f000]
java.lang.Thread.State: RUNNABLE
at MainApp.main(MainApp.java:28)

Locked ownable synchronizers:
- None

"VM Thread" prio=10 tid=0x00007fd61c068800 nid=0x6441 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fd61c01d800 nid=0x643d runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fd61c01f800 nid=0x643e runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fd61c021000 nid=0x643f runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fd61c023000 nid=0x6440 runnable

"VM Periodic Task Thread" prio=10 tid=0x00007fd61c09e800 nid=0x6448 waiting on condition

JNI global references: 948

How to get a heap dump?

To get a heap dump, we will use jmap. Just like jstack, jmap also comes with the jdk installation. To get a heap dump, run a command such as the one below.
user@localhost:~$ jmap -dump:format=b,file=myapp.hprof 26266
Dumping heap to /home/user/myapp.hprof ...
Heap dump file created

Then you can inspect using visualvm. See the following screenshot.

As you can see, the classes and instances views give a lot of information about how many instances were created and how much space they occupy.
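If you want to trigger a heap dump from within the application itself instead of using jmap, the HotSpot-specific HotSpotDiagnosticMXBean can do it. Below is a minimal sketch assuming an Oracle/HotSpot JVM; the class name and output path are just examples.

import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {

    public static void dump(String path) throws Exception {
        // HotSpot-specific MXBean exposed by the platform MBean server
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // second argument true = dump only live (reachable) objects
        bean.dumpHeap(path, true);
    }

    public static void main(String[] args) throws Exception {
        dump("/tmp/myapp.hprof");
    }
}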

With all these methods written out, I hope the next time this exception happens to you, you know what tools and steps are needed to start investigating the cause.

Good luck!

Friday, July 18, 2014

Cassandra discarding obsolete commit log

Often, I notice the log showing the following line in apache cassandra 1.0.8:
INFO [COMMIT-LOG-WRITER] 2014-06-30 08:58:55,039 CommitLog.java (line 490) Discarding obsolete commit log:CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1404095354976.log)

It looks like nothing to worry about, as the log level is INFO. However, I see it often, so let's check to understand what it really is.

This log is written by CommitLog.java.
private void maybeDiscardSegment(CommitLogSegment segment, Iterator<CommitLogSegment> iter)
{
    if (segment.isSafeToDelete() && iter.hasNext())
    {
        logger.info("Discarding obsolete commit log:" + segment);
        FileUtils.deleteAsync(segment.getPath());
        // usually this will be the first (remaining) segment, but not always, if segment A contains
        // writes to a CF that is unflushed but is followed by segment B whose CFs are all flushed.
        iter.remove();
    }
    else
    {
        if (logger.isDebugEnabled())
            logger.debug("Not safe to delete commit log " + segment + "; dirty is " + segment.dirtyString() + "; hasNext: " + iter.hasNext());
    }
}

So why discard a segment? From the code documentation:

Commit Log tracks every write operation into the system. The aim of the commit log is to be able to successfully recover data that was not stored to disk via the Memtable. Every Commit Log maintains a header represented by the abstraction CommitLogHeader. The header contains a bit array and an array of longs and both the arrays are of size, #column families for the Table, the Commit Log represents.

Whenever a ColumnFamily is written to, for the first time its bit flag is set to one in the CommitLogHeader. When it is flushed to disk by the Memtable its corresponding bit in the header is set to zero. This helps track which CommitLogs can be thrown away as a result of Memtable flushes. Additionally, when a ColumnFamily is flushed and written to disk, its entry in the array of longs is updated with the offset in the Commit Log file where it was written. This helps speed up recovery since we can seek to these offsets and start processing the commit log.

Every Commit Log is rolled over everytime it reaches its threshold in size; the new log inherits the "dirty" bits from the old.

Over time there could be a number of commit logs that would be generated. To allow cleaning up non-active commit logs, whenever we flush a column family and update its bit flag in the active CL, we take the dirty bit array and bitwise & it with the headers of the older logs. If the result is 0, then it is safe to remove the older file. (Since the new CL inherited the old's dirty bitflags, getting a zero for any given bit in the anding means that either the CF was clean in the old CL or it has been flushed since the switch in the new.)

So it's pretty clear why the commit log segment is removed. Tracing the call upward:
CommitLog.maybeDiscardSegment()
  ^
  |
  +--- CommitLog.discardCompletedSegmentsInternal()
         ^
         |
         +--- CommitLog.discardCompletedSegments()
                ^
                |
                +--- ColumnFamilyStore.maybeSwitchMemtable()
                |
                +--- ColumnFamilyStore.truncate()

So whenever a column family is truncated, the commit log segment is discarded (removed) as well, which makes sense. And when maybeSwitchMemtable is executed, that is, when a memtable is flushed, the segments that are now fully flushed get discarded (removed) too.
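To make the "dirty bit" bookkeeping from the code documentation above a little more concrete, here is a small hypothetical sketch of the bitwise check; the bit positions and variable names are made up purely for illustration and do not come from the Cassandra source.
public class DirtyBitExample {

    public static void main(String[] args) {
        // pretend each bit position represents one column family
        int activeLogDirtyBits = 0x5; // binary 0101: CF0 and CF2 still have unflushed writes
        int oldLogHeaderBits   = 0x2; // binary 0010: the old segment only contains writes for CF1

        // if no column family is dirty in both the active log and the old segment,
        // the old segment no longer holds anything that still needs recovery
        boolean safeToDelete = (activeLogDirtyBits & oldLogHeaderBits) == 0;

        System.out.println("old commit log segment safe to delete: " + safeToDelete);
    }
}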

This logging message is fine, as it is part of normal cassandra operation. That's it for this article.

Sunday, July 6, 2014

Learning java selenium webdriver

Today, we are going to do something different and explore software testing technology. As end users we mostly deal with point and click, and with this article we are going to learn how to test a user interface.

There is plenty of testing software out there; just google and you will find a lot. In this article, we are going to use the Java Selenium WebDriver. To get started quickly, you will have to complete these bullet points. For complete information about selenium, read here.

or additional optional download at http://docs.seleniumhq.org/download/



Let's continue this article with two examples.
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class GoogleSuggest {

    public static void main(String[] args) throws Exception {
        // The Firefox driver supports javascript
        WebDriver driver = new FirefoxDriver();

        // Go to the Google Suggest home page
        driver.get("http://www.google.com/webhp?complete=1&hl=en");

        // Enter the query string "Cheese"
        WebElement query = driver.findElement(By.name("q"));
        query.sendKeys("Cheese");

        // Sleep until the div we want is visible or 5 seconds is over
        long end = System.currentTimeMillis() + 5000;
        while (System.currentTimeMillis() < end) {
            WebElement resultsDiv = driver.findElement(By.className("gssb_e"));

            // If results have been returned, the results are displayed in a drop down.
            if (resultsDiv.isDisplayed()) {
                break;
            }
        }

        // And now list the suggestions
        List<WebElement> allSuggestions = driver.findElements(By.xpath("//td[@class='gssb_a gbqfsf']"));

        for (WebElement suggestion : allSuggestions) {
            System.out.println("suggestion => " + suggestion.getText());
        }

        driver.quit();
    }

}

import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

import com.thoughtworks.selenium.SeleneseTestBase;
import com.thoughtworks.selenium.webdriven.WebDriverBackedSelenium;

public class BaseSeleniumTest extends SeleneseTestBase {

    @Before
    public void setUp() throws Exception {
        WebDriver driver1 = new FirefoxDriver();
        String baseUrl = "http://google.com/";
        selenium = new WebDriverBackedSelenium(driver1, baseUrl);
    }

    @Test
    public void testGoogle() throws Exception {
        selenium.open("http://google.com/");
        System.out.println(selenium.getTitle());
        assertEquals("Google", selenium.getTitle());
        selenium.close();
    }
}

The first example constructs a Firefox WebDriver, loads the google url and mimics typing by sending the keys "Cheese" to the query box. When the web element resultsDiv is displayed, all the suggestions get printed.
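As a side note, instead of the hand-rolled polling loop in the first example you could use Selenium's explicit wait support. Below is a hedged sketch using WebDriverWait and ExpectedConditions; the 5 second timeout mirrors the loop above, and the exact constructor signature may differ between Selenium versions.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class WaitExample {

    // waits up to 5 seconds for the suggestion drop-down to become visible
    public static WebElement waitForSuggestions(WebDriver driver) {
        WebDriverWait wait = new WebDriverWait(driver, 5);
        return wait.until(ExpectedConditions.visibilityOfElementLocated(By.className("gssb_e")));
    }
}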

The second example makes it much easier to construct a test. By extending SeleneseTestBase, you can use the selenium object and call the rich methods it provides from your test subclass. As seen here, similar to the first example, the test starts with the construction of a FirefoxDriver. Then comes the very familiar junit test: open the google link, assert that the title is Google, and close the selenium object.

That's it for this article; I hope it helps you get started. Remember to donate to us if you would like to contribute back.

Saturday, July 5, 2014

Study MongoDB GridFS with java example

In the past, we have learned basic MongoDB and studied its data model; in this article, we will study MongoDB GridFS by storing a file into MongoDB. Below is a simple java application that shows how to store, retrieve, and eventually delete a file.
import java.io.File;
import java.io.IOException;

import com.mongodb.DB;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;
import com.mongodb.MongoException;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSDBFile;
import com.mongodb.gridfs.GridFSInputFile;

public class LearnMongo {

    public static void main(String[] args) throws MongoException, IOException {
        Mongo mongo = new Mongo("192.168.0.2", 27017);
        DB db = mongo.getDB("mp3db");

        // save the mp3
        String newFilename = "django.mp3";
        File mp3File = new File("src/resources/django.mp3");
        GridFS gfsMp3 = new GridFS(db, "mp3");
        GridFSInputFile gfsFile = gfsMp3.createFile(mp3File);
        gfsFile.setFilename(newFilename);
        gfsFile.setContentType("audio/mpeg");
        System.out.println(gfsFile.toString());
        gfsFile.save();

        // get the mp3
        GridFSDBFile imageForOutput = gfsMp3.findOne(newFilename);
        System.out.println(imageForOutput);

        // print the file list
        DBCursor cursor = gfsMp3.getFileList();
        while (cursor.hasNext()) {
            System.out.println(cursor.next());
        }

        // write the mp3 out to another file
        imageForOutput.writeTo("/home/jason/Desktop/newsong.mp3");

        // delete the mp3
        gfsMp3.remove(gfsMp3.findOne(newFilename));

    }

}

We start by connecting to the server; in this example the MongoDB instance is running on server 192.168.0.2 on port 27017. You may want to check the MongoDB configuration if you connect remotely, as the default MongoDB configuration only listens on localhost.

Then we get a MongoDB DB object for the mp3db database. You can store other kinds of objects as well, but for this example I'm going to store an mp3. With this ready, we are going to store the mp3. The important piece of code is probably below.
GridFS gfsMp3 = new GridFS(db, "mp3");
GridFSInputFile gfsFile = gfsMp3.createFile(mp3File);

We instantiate two objects, GridFS and GridFSInputFile. You can set additional information like the filename and content type. Calling GridFSInputFile.save() saves the object into MongoDB. If you have access to the MongoDB cli, a command such as > db.mp3.files.find(); will show the output below.
{ "_id" : ObjectId("53ad60f944aeaca83109d253"), "chunkSize" : NumberLong(262144), "length" : NumberLong(316773), "md5" : "7293e9fd795e2bb6d5035e5b69cb2923", "filename" : "django.mp3", "contentType" : "audio/mpeg", "uploadDate" : ISODate("2014-06-27T12:18:01.934Z"), "aliases" : null }

To find the mp3, you can use the code GridFSDBFile imageForOutput = gfsMp3.findOne(newFilename); below is the output.
{ "_id" : { "$oid" : "53ad60f944aeaca83109d253"} , "chunkSize" : 262144 , "length" : 316773 , "md5" : "7293e9fd795e2bb6d5035e5b69cb2923" , "filename" : "django.mp3" , "contentType" : "audio/mpeg" , "uploadDate" : { "$date" : "2014-06-27T12:18:01Z"} , "aliases" : null }

You can also use GridFS.getFileList() to retrieve all the files currently stored in this database. The code continues by writing the object out to a file. As you can see, I'm writing it to the desktop just to ensure it is not read from the source path.

I end this article by removing the object in the MongoDB database.

Saturday, June 21, 2014

measure java object size using jamm

Often when you develop a java application, you want to measure how much heap an object occupies. There are many tools available, such as SizeOf.jar, but today we will take a look at jbellis/jamm. What is jamm? Jamm provides MemoryMeter, a java agent to measure actual object memory use including JVM overhead.

It is very easy to use: get the source and build the jar. Below is the output of building the jar.
jason@localhost:~/codes/jamm$ ant jar
Buildfile: /home/jason/codes/jamm/build.xml

ivy-download:
[echo] Downloading Ivy...
[mkdir] Created dir: /home/jason/codes/jamm/target
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
[get] To: /home/jason/codes/jamm/target/ivy-2.1.0.jar

ivy-init:
[mkdir] Created dir: /home/jason/codes/jamm/target/lib

ivy-retrieve-build:
[ivy:retrieve] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
[ivy:retrieve] :: loading settings :: url = jar:file:/home/jason/codes/jamm/target/ivy-2.1.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: jamm#jamm;working@debby.e2e.serveftp.net
[ivy:retrieve] confs: [default]
[ivy:retrieve] found junit#junit;4.11 in public
[ivy:retrieve] found org.hamcrest#hamcrest-core;1.3 in public
[ivy:retrieve] downloading http://repo1.maven.org/maven2/junit/junit/4.11/junit-4.11-javadoc.jar ...
[ivy:retrieve] .................................................................................................................................................................................................................. (370kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] junit#junit;4.11!junit.jar(javadoc) (2772ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/junit/junit/4.11/junit-4.11.jar ...
[ivy:retrieve] ........................................................................................................................................... (239kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] junit#junit;4.11!junit.jar (1725ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/junit/junit/4.11/junit-4.11-sources.jar ...
[ivy:retrieve] ................................................................................................. (147kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] junit#junit;4.11!junit.jar(source) (1403ms)
[ivy:retrieve] downloading http://repo1.maven.org/maven2/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar ...
[ivy:retrieve] ........................... (43kB)
[ivy:retrieve] .. (0kB)
[ivy:retrieve] [SUCCESSFUL ] org.hamcrest#hamcrest-core;1.3!hamcrest-core.jar (1363ms)
[ivy:retrieve] :: resolution report :: resolve 9107ms :: artifacts dl 7338ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 2 | 2 | 2 | 0 || 4 | 4 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: jamm#jamm [sync]
[ivy:retrieve] confs: [default]
[ivy:retrieve] 3 artifacts copied, 0 already retrieved (431kB/40ms)

init:
[mkdir] Created dir: /home/jason/codes/jamm/target/classes
[mkdir] Created dir: /home/jason/codes/jamm/target/test/classes

build:
[echo] jamm: /home/jason/codes/jamm/build.xml
[javac] Compiling 3 source files to /home/jason/codes/jamm/target/classes
[javac] Note: /home/jason/codes/jamm/src/org/github/jamm/AlwaysEmptySet.java uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

jar:
[jar] Building jar: /home/jason/codes/jamm/target/jamm-0.2.7-SNAPSHOT.jar

BUILD SUCCESSFUL
Total time: 26 seconds

The jar is found at target/jamm-0.2.7-SNAPSHOT.jar. You can test the built jar using ant test. Below is the output.
jason@localhost:~/codes/jamm$ ant test
Buildfile: /home/jason/codes/jamm/build.xml

ivy-download:

ivy-init:

ivy-retrieve-build:
[ivy:retrieve] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
[ivy:retrieve] :: loading settings :: url = jar:file:/home/jason/codes/jamm/target/ivy-2.1.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: jamm#jamm;working@debby.e2e.serveftp.net
[ivy:retrieve] confs: [default]
[ivy:retrieve] found junit#junit;4.11 in public
[ivy:retrieve] found org.hamcrest#hamcrest-core;1.3 in public
[ivy:retrieve] :: resolution report :: resolve 266ms :: artifacts dl 23ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 2 | 0 | 0 | 0 || 4 | 0 |
---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: jamm#jamm [sync]
[ivy:retrieve] confs: [default]
[ivy:retrieve] 0 artifacts copied, 3 already retrieved (0kB/18ms)

init:

build:
[echo] jamm: /home/jason/codes/jamm/build.xml

jar:

build-test:
[javac] Compiling 2 source files to /home/jason/codes/jamm/target/test/classes
[javac] Note: /home/jason/codes/jamm/test/org/github/jamm/MemoryMeterTest.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: /home/jason/codes/jamm/test/org/github/jamm/MemoryMeterTest.java uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

checkos:

test-mac:

test:
[echo] running tests
[mkdir] Created dir: /home/jason/codes/jamm/target/test/output
[echo] Testing with default Java
[junit] Testsuite: org.github.jamm.GuessTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.869 sec
[junit]
[junit] Testsuite: org.github.jamm.MemoryMeterTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 0.813 sec
[junit]
[junit] Testcase: testMacOSX_i386(org.github.jamm.MemoryMeterTest):SKIPPED: got: "Linux", expected: is "Mac OS X"
[junit] Testcase: testMacOSX_x86_64(org.github.jamm.MemoryMeterTest):SKIPPED: got: "Linux", expected: is "Mac OS X"
[junit] Testcase: testCollections(org.github.jamm.MemoryMeterTest):SKIPPED: These vary quite radically depending on the JVM.

BUILD SUCCESSFUL
Total time: 14 seconds

Very easy, and it works out of the box. To start using it, just import the class and start measuring the object you are interested in,
MemoryMeter meter = new MemoryMeter();
meter.measure(object);

but remember to add "-javaagent:<path to>/jamm.jar" to your JVM options when you run your app.
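Below is a slightly fuller, hypothetical example of how this could look in practice, assuming jamm's org.github.jamm.MemoryMeter with its measure() and measureDeep() methods; the measured objects and printed labels are my own.

import java.util.ArrayList;
import java.util.List;

import org.github.jamm.MemoryMeter;

public class MeasureExample {

    public static void main(String[] args) {
        MemoryMeter meter = new MemoryMeter();

        String text = "hello jamm";
        List<Integer> numbers = new ArrayList<Integer>();
        for (int i = 0; i < 1000; i++) {
            numbers.add(i);
        }

        // shallow size of the object itself
        System.out.println("shallow String size: " + meter.measure(text));
        // deep size, following references (e.g. the backing char[] of the String)
        System.out.println("deep String size: " + meter.measureDeep(text));
        // deep size of the list including its Integer elements
        System.out.println("deep List size: " + meter.measureDeep(numbers));
    }
}

Run it with java -javaagent:<path to>/jamm.jar MeasureExample (the class name is illustrative).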

Sunday, June 8, 2014

Initial study into apache hadoop single node cluster

If you have read any big data article, you will definitely have encountered the term hadoop. Today, we are going to learn Apache Hadoop.

What is Apache Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

There are links here and here that further explain what hadoop is and its components.

I must admit that quickly setting up and running a single node cluster is difficult, mainly because this is my first time learning hadoop and the official documentation is not aimed at the starter. So I googled around and got a few helpful links. The following setup is mainly for the starter to get a feel for how it works, sort of like a hello world example of hadoop. As such, the goal is to be as simple as possible.

The setup is a single node cluster; it works with the current linux (debian) user environment, and we can easily remove the changes we've made after this tutorial. Note that the example below uses my own username (jason), and it should work with your own user ($HOME) environment too. User security is not a concern here, as the objective is to learn the basics of hadoop. A few system setup steps are needed, and we start by preparing the environment for hadoop.

Because this is a java library, an installed JRE is required. This article assumes you have java installed and running; you can check as shown below. If you do not have java, google how to install a JRE.
jason@localhost:~$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

An ssh daemon is required on your workstation. It is also recommended that openssh-client is installed, as we will generate a public and private key for automatic ssh login. Thus, apt-get install openssh-server openssh-client

Once both packages are installed, make sure the sshd daemon is running and generate a public and private key pair.
ssh-keygen -t rsa -P '' -f id_rsa_hadoop

With the above command, we specified the key type rsa with an empty passphrase, so ssh will not prompt for a passphrase, and the key filename is id_rsa_hadoop. It's okay if you do not specify the key filename, but because I have a few key files, this makes it easy for me to identify and remove it later when this tutorial is done. The key should be available in your current user's .ssh directory. To ensure ssh to localhost is automatic, append your public key to the authorized_keys file as a valid authorized key.
jason@localhost:~$ ls .ssh/
authorized_keys id_rsa id_rsa_hadoop id_rsa_hadoop.pub id_rsa.pub known_hosts

$ cat $HOME/.ssh/id_rsa_hadoop.pub >> $HOME/.ssh/authorized_keys

Right now, if you ssh to localhost, you should be logged in without ssh asking for a password in the terminal (a quick check is sketched below). That's it for the localhost setup; we will move on to the Hadoop configuration.
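A quick way to verify the passwordless login (the -i flag is only needed if ssh does not pick up the id_rsa_hadoop key on its own):
ssh -i $HOME/.ssh/id_rsa_hadoop localhost
exit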

Download a copy of Hadoop. For this example, we are using Hadoop version 2.4.0; you can download it here. Then extract it in the Desktop directory.
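If you prefer the command line, the release should also be available from the Apache archive; the exact URL below is an assumption based on the usual archive layout, so adjust it if the mirror has moved:
jason@localhost:~/Desktop$ wget http://archive.apache.org/dist/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz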
jason@localhost:~/Desktop$ tar -zxf hadoop-2.4.0.tar.gz
jason@localhost:~/Desktop$ cd hadoop-2.4.0
jason@localhost:~/Desktop/hadoop-2.4.0$ ls
bin etc include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share

Then we will create directories for the namenode and datanode.
jason@localhost:~/Desktop/hadoop-2.4.0$ pwd
/home/jason/Desktop/hadoop-2.4.0
jason@localhost:~/Desktop/hadoop-2.4.0$ mkdir -p hadoop_store/hdfs/namenode hadoop_store/hdfs/datanode

Then there are a few environment variables that need to be set up. Assuming you are using bash, enter the following into your .bashrc
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_55
export HADOOP_INSTALL=/home/jason/Desktop/hadoop-2.4.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

The only variables you need to take note of are JAVA_HOME and HADOOP_INSTALL. Once this is done, source this file immediately in your terminal, as you will use the commands next.
jason@localhost:~/Desktop/hadoop-2.4.0$ source $HOME/.bashrc

We will now configure five files for Hadoop, namely

  1. etc/hadoop/hadoop-env.sh

  2. etc/hadoop/core-site.xml

  3. etc/hadoop/hdfs-site.xml

  4. etc/hadoop/yarn-site.xml

  5. etc/hadoop/mapred-site.xml


It is assumed you are still in the current working directory shown below, so that you can easily edit the above files.
$ pwd
/home/jason/Desktop/hadoop-2.4.0

add the following content into etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_55

add the following contents into etc/hadoop/core-site.xml; this property (and the XML snippets for the files that follow) goes inside the existing <configuration> element, as sketched right after this snippet
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
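For reference, a minimal complete core-site.xml would look roughly like this (assuming a fresh copy of the file from the tarball, which ships with an empty <configuration> element):
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>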

add the following contents into etc/hadoop/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/jason/Desktop/hadoop-2.4.0/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/jason/Desktop/hadoop-2.4.0/hadoop_store/hdfs/datanode</value>
</property>

add the following into etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

for the file etc/hadoop/mapred-site.xml, you can start by copying from etc/hadoop/mapred-site.xml.template
jason@localhost:~/Desktop/hadoop-2.4.0$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

then add the following into the file etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

Once that is done, the Hadoop configuration is complete. Now run the command hdfs namenode -format. Below is the output in my terminal.
jason@localhost:~/Desktop/hadoop-2.4.0$ hdfs namenode -format
14/05/30 16:00:55 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.4.0
STARTUP_MSG: classpath = /home/jason/Desktop/hadoop-2.4.0/etc/hadoop:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/log4j-1.2.17.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/paranamer-2.3.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/avro-1.7.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jackson-xc-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jersey-core-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jersey-server-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/hadoop-annotations-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/httpcore-4.2.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jets3t-0.9.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jetty-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jackson-jaxrs-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/asm-3.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-cli-1.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/zookeeper-3.4.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/hadoop-auth-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jsr305-1.3.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-el-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-math3-3.1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jsp-api-2.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-lang-2.6.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jsch-0.1.42.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/mockito-all-1.8.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jettison-1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/servlet-api-2.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/activation-1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/xmlenc-0.52.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jetty-util-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/guava-11.0.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-collections-3.2.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-codec-1.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/stax-api-1.0-2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-digester-1.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/xz-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib
/commons-httpclient-3.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-net-3.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jersey-json-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/netty-3.6.2.Final.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-logging-1.1.3.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-io-2.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-compress-1.4.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/httpclient-4.2.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/junit-4.8.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-configuration-1.6.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/hadoop-common-2.4.0-tests.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/hadoop-nfs-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/hadoop-common-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jackson-mapper-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/asm-3.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-el-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jackson-core-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-io-2.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/hadoop-hdfs-nfs-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/hadoop-hdfs-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/hadoop-hdfs-2.4.0-tests.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/log4j-1.2.17.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jackson-xc-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jersey-core-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jersey-server-1.9.jar:/home/jason/D
esktop/hadoop-2.4.0/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jackson-mapper-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jline-0.9.94.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jersey-client-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jetty-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jackson-jaxrs-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/asm-3.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-cli-1.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/zookeeper-3.4.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jsr305-1.3.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-lang-2.6.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jettison-1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/servlet-api-2.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/activation-1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/javax.inject-1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/guava-11.0.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-collections-3.2.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-codec-1.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/xz-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-httpclient-3.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jackson-core-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/guice-3.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jersey-json-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-io-2.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-tests-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-common-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-client-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-common-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-api-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/log4j-
1.2.17.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/hadoop-annotations-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/asm-3.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/junit-4.10.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/javax.inject-1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/xz-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/hamcrest-core-1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/jackson-core-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/guice-3.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.0.jar:/contrib/capacity-scheduler/*.jar
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common -r 1583262; compiled by 'jenkins' on 2014-03-31T08:29Z
STARTUP_MSG: java = 1.7.0_55
************************************************************/
14/05/30 16:00:55 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
14/05/30 16:00:55 INFO namenode.NameNode: createNameNode [-format]
14/05/30 16:00:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-a15244a5-fea6-42ad-ab38-92b9730521f5
14/05/30 16:00:58 INFO namenode.FSNamesystem: fsLock is fair:true
14/05/30 16:00:58 INFO namenode.HostFileManager: read includes:
HostSet(
)
14/05/30 16:00:58 INFO namenode.HostFileManager: read excludes:
HostSet(
)
14/05/30 16:00:58 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
14/05/30 16:00:58 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
14/05/30 16:00:58 INFO util.GSet: Computing capacity for map BlocksMap
14/05/30 16:00:58 INFO util.GSet: VM type = 64-bit
14/05/30 16:00:58 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
14/05/30 16:00:58 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/05/30 16:00:58 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
14/05/30 16:00:58 INFO blockmanagement.BlockManager: defaultReplication = 1
14/05/30 16:00:58 INFO blockmanagement.BlockManager: maxReplication = 512
14/05/30 16:00:58 INFO blockmanagement.BlockManager: minReplication = 1
14/05/30 16:00:58 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
14/05/30 16:00:58 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
14/05/30 16:00:58 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
14/05/30 16:00:58 INFO blockmanagement.BlockManager: encryptDataTransfer = false
14/05/30 16:00:58 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
14/05/30 16:00:58 INFO namenode.FSNamesystem: fsOwner = jason (auth:SIMPLE)
14/05/30 16:00:58 INFO namenode.FSNamesystem: supergroup = supergroup
14/05/30 16:00:58 INFO namenode.FSNamesystem: isPermissionEnabled = true
14/05/30 16:00:58 INFO namenode.FSNamesystem: HA Enabled: false
14/05/30 16:00:58 INFO namenode.FSNamesystem: Append Enabled: true
14/05/30 16:00:59 INFO util.GSet: Computing capacity for map INodeMap
14/05/30 16:00:59 INFO util.GSet: VM type = 64-bit
14/05/30 16:00:59 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
14/05/30 16:00:59 INFO util.GSet: capacity = 2^20 = 1048576 entries
14/05/30 16:00:59 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/05/30 16:00:59 INFO util.GSet: Computing capacity for map cachedBlocks
14/05/30 16:00:59 INFO util.GSet: VM type = 64-bit
14/05/30 16:00:59 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
14/05/30 16:00:59 INFO util.GSet: capacity = 2^18 = 262144 entries
14/05/30 16:00:59 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
14/05/30 16:00:59 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
14/05/30 16:00:59 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
14/05/30 16:00:59 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/05/30 16:00:59 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
14/05/30 16:00:59 INFO util.GSet: Computing capacity for map NameNodeRetryCache
14/05/30 16:00:59 INFO util.GSet: VM type = 64-bit
14/05/30 16:00:59 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
14/05/30 16:00:59 INFO util.GSet: capacity = 2^15 = 32768 entries
14/05/30 16:00:59 INFO namenode.AclConfigFlag: ACLs enabled? false
14/05/30 16:01:00 INFO namenode.FSImage: Allocated new BlockPoolId: BP-908722954-127.0.1.1-1401436859922
14/05/30 16:01:00 INFO common.Storage: Storage directory /home/jason/Desktop/hadoop-2.4.0/hadoop_store/hdfs/namenode has been successfully formatted.
14/05/30 16:01:01 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/05/30 16:01:01 INFO util.ExitUtil: Exiting with status 0
14/05/30 16:01:01 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.1.1
************************************************************/

With this output, you should not see any errors. Okay, all good; now start the engine!
jason@localhost:~/Desktop/hadoop-2.4.0$ start-dfs.sh && start-yarn.sh
14/05/30 16:04:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/jason/Desktop/hadoop-2.4.0/logs/hadoop-jason-namenode-localhost.out
localhost: starting datanode, logging to /home/jason/Desktop/hadoop-2.4.0/logs/hadoop-jason-datanode-localhost.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/jason/Desktop/hadoop-2.4.0/logs/hadoop-jason-secondarynamenode-localhost.out
14/05/30 16:05:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/jason/Desktop/hadoop-2.4.0/logs/yarn-jason-resourcemanager-localhost.out
localhost: starting nodemanager, logging to /home/jason/Desktop/hadoop-2.4.0/logs/yarn-jason-nodemanager-localhost.out
jason@localhost:~/Desktop/hadoop-2.4.0$

You can check using jps whether Hadoop is running. The expected Hadoop processes are ResourceManager, SecondaryNameNode, NameNode, NodeManager and DataNode.
jason@localhost:~$ jps
22701 ResourceManager
22512 SecondaryNameNode
22210 NameNode
22800 NodeManager
6728 org.eclipse.equinox.launcher_1.3.0.v20120522-1813.jar
22840 Jps
22326 DataNode

You can access Apache Hadoop via the web interfaces:

Cluster status: http://localhost:8088
HDFS status: http://localhost:50070
Secondary NameNode status: http://localhost:50090

That looks good; everything is configured and it is now running fine. We will continue by running a few examples.
jason@localhost:~/Desktop/hadoop-2.4.0$ hadoop jar /home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
14/05/30 16:10:54 INFO fs.TestDFSIO: TestDFSIO.1.7
14/05/30 16:10:54 INFO fs.TestDFSIO: nrFiles = 20
14/05/30 16:10:54 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
14/05/30 16:10:54 INFO fs.TestDFSIO: bufferSize = 1000000
14/05/30 16:10:54 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
14/05/30 16:10:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/30 16:10:57 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 20 files
14/05/30 16:11:01 INFO fs.TestDFSIO: created control files for: 20 files
14/05/30 16:11:01 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/30 16:11:01 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/30 16:11:04 INFO mapred.FileInputFormat: Total input paths to process : 20
14/05/30 16:11:04 INFO mapreduce.JobSubmitter: number of splits:20
14/05/30 16:11:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401437120030_0001
14/05/30 16:11:06 INFO impl.YarnClientImpl: Submitted application application_1401437120030_0001
14/05/30 16:11:06 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1401437120030_0001/
14/05/30 16:11:06 INFO mapreduce.Job: Running job: job_1401437120030_0001
14/05/30 16:11:28 INFO mapreduce.Job: Job job_1401437120030_0001 running in uber mode : false
14/05/30 16:11:28 INFO mapreduce.Job: map 0% reduce 0%
14/05/30 16:12:30 INFO mapreduce.Job: map 7% reduce 0%
14/05/30 16:12:31 INFO mapreduce.Job: map 17% reduce 0%
14/05/30 16:12:34 INFO mapreduce.Job: map 23% reduce 0%
14/05/30 16:12:36 INFO mapreduce.Job: map 28% reduce 0%
14/05/30 16:12:37 INFO mapreduce.Job: map 30% reduce 0%
14/05/30 16:13:36 INFO mapreduce.Job: map 33% reduce 0%
14/05/30 16:13:39 INFO mapreduce.Job: map 40% reduce 0%
14/05/30 16:13:40 INFO mapreduce.Job: map 42% reduce 0%
14/05/30 16:13:42 INFO mapreduce.Job: map 52% reduce 0%
14/05/30 16:13:43 INFO mapreduce.Job: map 55% reduce 0%
14/05/30 16:13:44 INFO mapreduce.Job: map 58% reduce 0%
14/05/30 16:13:45 INFO mapreduce.Job: map 60% reduce 0%
14/05/30 16:14:47 INFO mapreduce.Job: map 67% reduce 2%
14/05/30 16:14:50 INFO mapreduce.Job: map 75% reduce 2%
14/05/30 16:14:51 INFO mapreduce.Job: map 78% reduce 22%
14/05/30 16:14:53 INFO mapreduce.Job: map 82% reduce 22%
14/05/30 16:14:54 INFO mapreduce.Job: map 85% reduce 22%
14/05/30 16:14:55 INFO mapreduce.Job: map 85% reduce 28%
14/05/30 16:15:37 INFO mapreduce.Job: map 88% reduce 28%
14/05/30 16:15:40 INFO mapreduce.Job: map 93% reduce 28%
14/05/30 16:15:42 INFO mapreduce.Job: map 95% reduce 32%
14/05/30 16:15:44 INFO mapreduce.Job: map 100% reduce 32%
14/05/30 16:15:45 INFO mapreduce.Job: map 100% reduce 67%
14/05/30 16:15:47 INFO mapreduce.Job: map 100% reduce 100%
14/05/30 16:15:49 INFO mapreduce.Job: Job job_1401437120030_0001 completed successfully
14/05/30 16:15:50 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=1673
FILE: Number of bytes written=1965945
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=4720
HDFS: Number of bytes written=209715278
HDFS: Number of read operations=83
HDFS: Number of large read operations=0
HDFS: Number of write operations=22
Job Counters
Killed map tasks=3
Launched map tasks=23
Launched reduce tasks=1
Data-local map tasks=23
Total time spent by all maps in occupied slots (ms)=1319128
Total time spent by all reduces in occupied slots (ms)=124593
Total time spent by all map tasks (ms)=1319128
Total time spent by all reduce tasks (ms)=124593
Total vcore-seconds taken by all map tasks=1319128
Total vcore-seconds taken by all reduce tasks=124593
Total megabyte-seconds taken by all map tasks=1350787072
Total megabyte-seconds taken by all reduce tasks=127583232
Map-Reduce Framework
Map input records=20
Map output records=100
Map output bytes=1467
Map output materialized bytes=1787
Input split bytes=2470
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=1787
Reduce input records=100
Reduce output records=5
Spilled Records=200
Shuffled Maps =20
Failed Shuffles=0
Merged Map outputs=20
GC time elapsed (ms)=14063
CPU time spent (ms)=127640
Physical memory (bytes) snapshot=5418561536
Virtual memory (bytes) snapshot=14516457472
Total committed heap usage (bytes)=4196401152
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=2250
File Output Format Counters
Bytes Written=78
14/05/30 16:15:50 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
14/05/30 16:15:50 INFO fs.TestDFSIO: Date & time: Fri May 30 16:15:50 MYT 2014
14/05/30 16:15:50 INFO fs.TestDFSIO: Number of files: 20
14/05/30 16:15:50 INFO fs.TestDFSIO: Total MBytes processed: 200.0
14/05/30 16:15:50 INFO fs.TestDFSIO: Throughput mb/sec: 1.6888468553671554
14/05/30 16:15:50 INFO fs.TestDFSIO: Average IO rate mb/sec: 1.840719223022461
14/05/30 16:15:50 INFO fs.TestDFSIO: IO rate std deviation: 0.7043729046488437
14/05/30 16:15:50 INFO fs.TestDFSIO: Test exec time sec: 289.58
14/05/30 16:15:50 INFO fs.TestDFSIO:

Clean up the TestDFSIO benchmark files.
jason@localhost:~/Desktop/hadoop-2.4.0$ hadoop jar /home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO -clean
14/05/30 16:20:03 INFO fs.TestDFSIO: TestDFSIO.1.7
14/05/30 16:20:03 INFO fs.TestDFSIO: nrFiles = 1
14/05/30 16:20:03 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
14/05/30 16:20:03 INFO fs.TestDFSIO: bufferSize = 1000000
14/05/30 16:20:03 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
14/05/30 16:20:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/30 16:20:06 INFO fs.TestDFSIO: Cleaning up test files

Another example job, this time estimating the value of pi.
jason@localhost:~/Desktop/hadoop-2.4.0$ hadoop jar /home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar pi 2 5
Number of Maps = 2
Samples per Map = 5
14/05/30 16:21:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/05/30 16:21:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/30 16:21:25 INFO input.FileInputFormat: Total input paths to process : 2
14/05/30 16:21:26 INFO mapreduce.JobSubmitter: number of splits:2
14/05/30 16:21:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401437120030_0002
14/05/30 16:21:28 INFO impl.YarnClientImpl: Submitted application application_1401437120030_0002
14/05/30 16:21:28 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1401437120030_0002/
14/05/30 16:21:28 INFO mapreduce.Job: Running job: job_1401437120030_0002
14/05/30 16:21:53 INFO mapreduce.Job: Job job_1401437120030_0002 running in uber mode : false
14/05/30 16:21:53 INFO mapreduce.Job: map 0% reduce 0%
14/05/30 16:22:18 INFO mapreduce.Job: map 100% reduce 0%
14/05/30 16:22:34 INFO mapreduce.Job: map 100% reduce 100%
14/05/30 16:22:35 INFO mapreduce.Job: Job job_1401437120030_0002 completed successfully
14/05/30 16:22:36 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=50
FILE: Number of bytes written=280470
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=530
HDFS: Number of bytes written=215
HDFS: Number of read operations=11
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=46538
Total time spent by all reduces in occupied slots (ms)=13821
Total time spent by all map tasks (ms)=46538
Total time spent by all reduce tasks (ms)=13821
Total vcore-seconds taken by all map tasks=46538
Total vcore-seconds taken by all reduce tasks=13821
Total megabyte-seconds taken by all map tasks=47654912
Total megabyte-seconds taken by all reduce tasks=14152704
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=36
Map output materialized bytes=56
Input split bytes=294
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=56
Reduce input records=4
Reduce output records=0
Spilled Records=8
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=631
CPU time spent (ms)=7890
Physical memory (bytes) snapshot=623665152
Virtual memory (bytes) snapshot=2097958912
Total committed heap usage (bytes)=559939584
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=236
File Output Format Counters
Bytes Written=97
Job Finished in 73.196 seconds
Estimated value of Pi is 3.60000000000000000000

You can also create files and save them on Hadoop (HDFS). You can read more at http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
jason@localhost:~$ hadoop fs -mkdir -p /user/hduser
14/05/30 16:27:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
jason@localhost:~$ hadoop fs -copyFromLocal dummy.txt dummy.txt
14/05/30 16:27:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
jason@localhost:~$ hadoop fs -ls
14/05/30 16:28:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 jason supergroup 13 2014-05-30 16:27 dummy.txt
jason@localhost:~$ hadoop fs -cat /user/hduser/dummy.txt
14/05/30 16:29:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
cat: `/user/hduser/dummy.txt': No such file or directory
jason@localhost:~$ hadoop fs -cat /user/jason/dummy.txt
14/05/30 16:29:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hello world.
jason@localhost:~$ hadoop fs -ls /
14/05/30 16:29:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x - jason supergroup 0 2014-05-30 16:20 /benchmarks
drwx------ - jason supergroup 0 2014-05-30 16:11 /tmp
drwxr-xr-x - jason supergroup 0 2014-05-30 16:27 /user
jason@localhost:~$ hadoop fs -rm dummy.txt
14/05/30 16:29:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/30 16:29:54 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted dummy.txt
jason@localhost:~$ hadoop fs -ls
14/05/30 16:30:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
jason@localhost:~$

Once you are done with the Hadoop cluster, you can shut it down using stop-dfs.sh && stop-yarn.sh
jason@localhost:~/Desktop/hadoop-2.4.0$ stop-dfs.sh && stop-yarn.sh
14/05/30 17:51:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
14/05/30 17:51:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop

You can remove/revert the changes you made for this tutorial. The files and directories involved are listed below, and a rough cleanup sketch follows the list.

/home/jason/Desktop/hadoop-2.4.0
/home/jason/.ssh/id_rsa_hadoop.pub
/home/jason/.ssh/id_rsa_hadoop
/home/jason/.ssh/authorized_keys
/home/jason/.bashrc
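A rough cleanup sketch, assuming the exact paths used throughout this tutorial; authorized_keys and .bashrc should be edited rather than deleted, since they may contain entries unrelated to Hadoop:
rm -rf /home/jason/Desktop/hadoop-2.4.0
rm /home/jason/.ssh/id_rsa_hadoop /home/jason/.ssh/id_rsa_hadoop.pub
# open authorized_keys and delete only the key line appended earlier
vi /home/jason/.ssh/authorized_keys
# remove the block between "#HADOOP VARIABLES START" and "#HADOOP VARIABLES END"
vi /home/jason/.bashrc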

That's it for this lengthy article. I hope you like it, and if you learned something, remember to donate to us too!