Information Technology Blogs: java1.7.0

Showing posts with label java1.7.0_55. Show all posts

Sunday, June 8, 2014

Initial study into apache hadoop single node cluster

If you have read big data article, you will definitely encountered the term, hadoop. Today, we are going to learn Apache Hadoop.

What is Apache Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

There are links here and here further explain what is hadoop and its components.

I must admit to quickly setup and run single node cluster is difficult. Mainly because this is my first time learning hadoop and official documentation is not for the starter. So I google a few and got a few helpful links. The following setup is mainly for starter to get a feel on how it works. Sort of like hello world example of hadoop. As such goal is as simple as possible to get a feel of what it is.

Setup is a single node cluster, it work with current linux (debian) user environment and we can remove easily changes we've made after this tutorial. Note that example below is using my own username (jason), and it should work with your user ($HOME) environment too. User security is not a concern issue here as the objective is to learn the basic of hadoop here. A few system setup are needed and we start to prepare for the environment for hadoop.

Because this is a java library, a required JRE installed is needed. This article assume you have java installed and running. You can check it below. If you do not have java, google how to install JRE.

jason@localhost:~$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

ssh daemon is required on your workstation. It is also recommend that openssh-client is installed as we will generate public and private key for automatic ssh login. Thus, apt-get install openssh-server openssh-client

Once both packages are installed, make sure sshd daemon is running and generate public and private key.

ssh-keygen -t rsa -P '' -f id_rsa_hadoop

with the above commands, we specified key type is rsa with empty passphrase so ssh will not prompt for passphrase and the key filename is id_rsa_hadoop. It's okay if you do not specify the key filename but because I have a few keys file, it is easy for me to identify and remove it later when this tutorial is done. The key should be available in your current user .ssh directory. To ensure ssh to localhost is automatic, echo your public key into authorized_keys file as a valid authorized key.

jason@localhost:~$ ls .ssh/
authorized_keys  id_rsa  id_rsa_hadoop  id_rsa_hadoop.pub  id_rsa.pub  known_hosts

$ cat $HOME/.ssh/id_rsa_hadoop.pub >> $HOME/.ssh/authorized_keys

Right now if you ssh to localhost, you should logged without ssh asking for password in the terminal. That's it for the localhost setup. We will move on to the hadoop configuration.

Download a copy of hadoop. For this example, we are using hadoop version 2.4.0 . You can download it here. Then extract in the Desktop directory.

jason@localhost:~/Desktop$ tar -zxf hadoop-2.4.0.tar.gz
jason@localhost:~/Desktop$ cd hadoop-2.4.0
jason@localhost:~/Desktop/hadoop-2.4.0$ ls
bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin	share

Then we will create directory for namenode and datanode.

jason@localhost:~/Desktop/hadoop-2.4.0$ pwd
/home/jason/Desktop/hadoop-2.4.0
jason@localhost:~/Desktop/hadoop-2.4.0$ mkdir -p hadoop_store/hdfs/namenode hadoop_store/hdfs/datanode

Then there are a few environment needed to be setup. Assuming you are using bash, enter the following into your .bashrc

#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_55
export HADOOP_INSTALL=/home/jason/Desktop/hadoop-2.4.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

The only variable you need to take notice is JAVA_HOME and HADOOP_INSTALL. Once this is done, source immediately this file in your terminal as you will use the commands next.

jason@localhost:~/Desktop/hadoop-2.4.0$ source $HOME/.bashrc

We will now configured five xml properties files for hadoop, namely

etc/hadoop/hadoop-env.sh

etc/hadoop/core-site.xml

etc/hadoop/hdfs-site.xml

etc/hadoop/yarn-site.xml

etc/hadoop/mapred-site.xml

It is assume you are still at current working directory such as below so you can easily edit the above files.

$ pwd
/home/jason/Desktop/hadoop-2.4.0

add the following content into etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_55

add the following contents into etc/hadoop/core-site.xml

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

add the following contents into etc/hadoop/hdfs-site.xml

<property>
   <name>dfs.replication</name>
   <value>1</value>
</property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/jason/Desktop/hadoop-2.4.0/hadoop_store/hdfs/namenode</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/jason/Desktop/hadoop-2.4.0/hadoop_store/hdfs/datanode</value>
</property>

add the following into etc/hadoop/yarn-site.xml

<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

for file etc/hadoop/mapred-site.xml, you can start by copy from etc/hadoop/mapred-site.xml.template

jason@localhost:~/Desktop/hadoop-2.4.0$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

then add the following into the file etc/hadoop/mapred-site.xml

<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>

Once it is done, that's it for the hadoop configuration and now run the command hdfs namenode -format . Below is the output in my terminal.

jason@localhost:~/Desktop/hadoop-2.4.0$ hdfs namenode -format
14/05/30 16:00:55 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.4.0
STARTUP_MSG:   classpath = /home/jason/Desktop/hadoop-2.4.0/etc/hadoop:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/log4j-1.2.17.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/paranamer-2.3.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/avro-1.7.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jackson-xc-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jersey-core-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jersey-server-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/hadoop-annotations-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/httpcore-4.2.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jets3t-0.9.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jetty-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jackson-jaxrs-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/asm-3.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-cli-1.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/zookeeper-3.4.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/hadoop-auth-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jsr305-1.3.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-el-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-math3-3.1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jsp-api-2.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-lang-2.6.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jsch-0.1.42.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/mockito-all-1.8.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jettison-1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/servlet-api-2.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/activation-1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/xmlenc-0.52.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jetty-util-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/guava-11.0.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-collections-3.2.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-codec-1.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/stax-api-1.0-2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-digester-1.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/xz-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-httpclient-3.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-net-3.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/jersey-json-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/netty-3.6.2.Final.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-logging-1.1.3.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-io-2.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-compress-1.4.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/httpclient-4.2.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/junit-4.8.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/lib/commons-configuration-1.6.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/hadoop-common-2.4.0-tests.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/hadoop-nfs-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/common/hadoop-common-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jackson-mapper-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/asm-3.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-el-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/jackson-core-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/lib/commons-io-2.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/hadoop-hdfs-nfs-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/hadoop-hdfs-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/hdfs/hadoop-hdfs-2.4.0-tests.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/log4j-1.2.17.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jackson-xc-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jersey-core-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jersey-server-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jackson-mapper-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jline-0.9.94.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jersey-client-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jetty-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jackson-jaxrs-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/asm-3.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-cli-1.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/zookeeper-3.4.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jsr305-1.3.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-lang-2.6.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jettison-1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/servlet-api-2.5.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/activation-1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/javax.inject-1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/guava-11.0.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-collections-3.2.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-codec-1.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/xz-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-httpclient-3.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jackson-core-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/guice-3.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/jersey-json-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-io-2.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-tests-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-common-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-client-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-common-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-api-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/hadoop-annotations-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/asm-3.2.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/junit-4.10.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/javax.inject-1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/xz-1.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/hamcrest-core-1.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/jackson-core-asl-1.8.8.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/guice-3.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.4.0.jar:/home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.0.jar:/contrib/capacity-scheduler/*.jar
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common -r 1583262; compiled by 'jenkins' on 2014-03-31T08:29Z
STARTUP_MSG:   java = 1.7.0_55
************************************************************/
14/05/30 16:00:55 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
14/05/30 16:00:55 INFO namenode.NameNode: createNameNode [-format]
14/05/30 16:00:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-a15244a5-fea6-42ad-ab38-92b9730521f5
14/05/30 16:00:58 INFO namenode.FSNamesystem: fsLock is fair:true
14/05/30 16:00:58 INFO namenode.HostFileManager: read includes:
HostSet(
)
14/05/30 16:00:58 INFO namenode.HostFileManager: read excludes:
HostSet(
)
14/05/30 16:00:58 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
14/05/30 16:00:58 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
14/05/30 16:00:58 INFO util.GSet: Computing capacity for map BlocksMap
14/05/30 16:00:58 INFO util.GSet: VM type       = 64-bit
14/05/30 16:00:58 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
14/05/30 16:00:58 INFO util.GSet: capacity      = 2^21 = 2097152 entries
14/05/30 16:00:58 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
14/05/30 16:00:58 INFO blockmanagement.BlockManager: defaultReplication         = 1
14/05/30 16:00:58 INFO blockmanagement.BlockManager: maxReplication             = 512
14/05/30 16:00:58 INFO blockmanagement.BlockManager: minReplication             = 1
14/05/30 16:00:58 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
14/05/30 16:00:58 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
14/05/30 16:00:58 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
14/05/30 16:00:58 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
14/05/30 16:00:58 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
14/05/30 16:00:58 INFO namenode.FSNamesystem: fsOwner             = jason (auth:SIMPLE)
14/05/30 16:00:58 INFO namenode.FSNamesystem: supergroup          = supergroup
14/05/30 16:00:58 INFO namenode.FSNamesystem: isPermissionEnabled = true
14/05/30 16:00:58 INFO namenode.FSNamesystem: HA Enabled: false
14/05/30 16:00:58 INFO namenode.FSNamesystem: Append Enabled: true
14/05/30 16:00:59 INFO util.GSet: Computing capacity for map INodeMap
14/05/30 16:00:59 INFO util.GSet: VM type       = 64-bit
14/05/30 16:00:59 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
14/05/30 16:00:59 INFO util.GSet: capacity      = 2^20 = 1048576 entries
14/05/30 16:00:59 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/05/30 16:00:59 INFO util.GSet: Computing capacity for map cachedBlocks
14/05/30 16:00:59 INFO util.GSet: VM type       = 64-bit
14/05/30 16:00:59 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
14/05/30 16:00:59 INFO util.GSet: capacity      = 2^18 = 262144 entries
14/05/30 16:00:59 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
14/05/30 16:00:59 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
14/05/30 16:00:59 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
14/05/30 16:00:59 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/05/30 16:00:59 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
14/05/30 16:00:59 INFO util.GSet: Computing capacity for map NameNodeRetryCache
14/05/30 16:00:59 INFO util.GSet: VM type       = 64-bit
14/05/30 16:00:59 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
14/05/30 16:00:59 INFO util.GSet: capacity      = 2^15 = 32768 entries
14/05/30 16:00:59 INFO namenode.AclConfigFlag: ACLs enabled? false
14/05/30 16:01:00 INFO namenode.FSImage: Allocated new BlockPoolId: BP-908722954-127.0.1.1-1401436859922
14/05/30 16:01:00 INFO common.Storage: Storage directory /home/jason/Desktop/hadoop-2.4.0/hadoop_store/hdfs/namenode has been successfully formatted.
14/05/30 16:01:01 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/05/30 16:01:01 INFO util.ExitUtil: Exiting with status 0
14/05/30 16:01:01 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.1.1
************************************************************/

With this output, you should not see any error. Okay, all good and now, start the engine!

jason@localhost:~/Desktop/hadoop-2.4.0$ start-dfs.sh && start-yarn.sh
14/05/30 16:04:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/jason/Desktop/hadoop-2.4.0/logs/hadoop-jason-namenode-localhost.out
localhost: starting datanode, logging to /home/jason/Desktop/hadoop-2.4.0/logs/hadoop-jason-datanode-localhost.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/jason/Desktop/hadoop-2.4.0/logs/hadoop-jason-secondarynamenode-localhost.out
14/05/30 16:05:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/jason/Desktop/hadoop-2.4.0/logs/yarn-jason-resourcemanager-localhost.out
localhost: starting nodemanager, logging to /home/jason/Desktop/hadoop-2.4.0/logs/yarn-jason-nodemanager-localhost.out
jason@localhost:~/Desktop/hadoop-2.4.0$

So you can check using jps if your hadoop is running. The expected hadoop processes are ResourceManager, SecondaryNameNode, NameNode, NodeManager and DataNode.

jason@localhost:~$ jps
22701 ResourceManager
22512 SecondaryNameNode
22210 NameNode
22800 NodeManager
6728 org.eclipse.equinox.launcher_1.3.0.v20120522-1813.jar
22840 Jps
22326 DataNode

You can access apache hadoop via the web interfaces:

Cluster status: http://localhost:8088
HDFS status: http://localhost:50070
Secondary NameNode status: http://localhost:50090

So that's looks good, everything is configured and now it is running fine. So we will continue by running a few examples.

jason@localhost:~/Desktop/hadoop-2.4.0$ hadoop jar /home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
14/05/30 16:10:54 INFO fs.TestDFSIO: TestDFSIO.1.7
14/05/30 16:10:54 INFO fs.TestDFSIO: nrFiles = 20
14/05/30 16:10:54 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
14/05/30 16:10:54 INFO fs.TestDFSIO: bufferSize = 1000000
14/05/30 16:10:54 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
14/05/30 16:10:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/30 16:10:57 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 20 files
14/05/30 16:11:01 INFO fs.TestDFSIO: created control files for: 20 files
14/05/30 16:11:01 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/30 16:11:01 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/30 16:11:04 INFO mapred.FileInputFormat: Total input paths to process : 20
14/05/30 16:11:04 INFO mapreduce.JobSubmitter: number of splits:20
14/05/30 16:11:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401437120030_0001
14/05/30 16:11:06 INFO impl.YarnClientImpl: Submitted application application_1401437120030_0001
14/05/30 16:11:06 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1401437120030_0001/
14/05/30 16:11:06 INFO mapreduce.Job: Running job: job_1401437120030_0001
14/05/30 16:11:28 INFO mapreduce.Job: Job job_1401437120030_0001 running in uber mode : false
14/05/30 16:11:28 INFO mapreduce.Job:  map 0% reduce 0%
14/05/30 16:12:30 INFO mapreduce.Job:  map 7% reduce 0%
14/05/30 16:12:31 INFO mapreduce.Job:  map 17% reduce 0%
14/05/30 16:12:34 INFO mapreduce.Job:  map 23% reduce 0%
14/05/30 16:12:36 INFO mapreduce.Job:  map 28% reduce 0%
14/05/30 16:12:37 INFO mapreduce.Job:  map 30% reduce 0%
14/05/30 16:13:36 INFO mapreduce.Job:  map 33% reduce 0%
14/05/30 16:13:39 INFO mapreduce.Job:  map 40% reduce 0%
14/05/30 16:13:40 INFO mapreduce.Job:  map 42% reduce 0%
14/05/30 16:13:42 INFO mapreduce.Job:  map 52% reduce 0%
14/05/30 16:13:43 INFO mapreduce.Job:  map 55% reduce 0%
14/05/30 16:13:44 INFO mapreduce.Job:  map 58% reduce 0%
14/05/30 16:13:45 INFO mapreduce.Job:  map 60% reduce 0%
14/05/30 16:14:47 INFO mapreduce.Job:  map 67% reduce 2%
14/05/30 16:14:50 INFO mapreduce.Job:  map 75% reduce 2%
14/05/30 16:14:51 INFO mapreduce.Job:  map 78% reduce 22%
14/05/30 16:14:53 INFO mapreduce.Job:  map 82% reduce 22%
14/05/30 16:14:54 INFO mapreduce.Job:  map 85% reduce 22%
14/05/30 16:14:55 INFO mapreduce.Job:  map 85% reduce 28%
14/05/30 16:15:37 INFO mapreduce.Job:  map 88% reduce 28%
14/05/30 16:15:40 INFO mapreduce.Job:  map 93% reduce 28%
14/05/30 16:15:42 INFO mapreduce.Job:  map 95% reduce 32%
14/05/30 16:15:44 INFO mapreduce.Job:  map 100% reduce 32%
14/05/30 16:15:45 INFO mapreduce.Job:  map 100% reduce 67%
14/05/30 16:15:47 INFO mapreduce.Job:  map 100% reduce 100%
14/05/30 16:15:49 INFO mapreduce.Job: Job job_1401437120030_0001 completed successfully
14/05/30 16:15:50 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=1673
		FILE: Number of bytes written=1965945
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=4720
		HDFS: Number of bytes written=209715278
		HDFS: Number of read operations=83
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=22
	Job Counters
		Killed map tasks=3
		Launched map tasks=23
		Launched reduce tasks=1
		Data-local map tasks=23
		Total time spent by all maps in occupied slots (ms)=1319128
		Total time spent by all reduces in occupied slots (ms)=124593
		Total time spent by all map tasks (ms)=1319128
		Total time spent by all reduce tasks (ms)=124593
		Total vcore-seconds taken by all map tasks=1319128
		Total vcore-seconds taken by all reduce tasks=124593
		Total megabyte-seconds taken by all map tasks=1350787072
		Total megabyte-seconds taken by all reduce tasks=127583232
	Map-Reduce Framework
		Map input records=20
		Map output records=100
		Map output bytes=1467
		Map output materialized bytes=1787
		Input split bytes=2470
		Combine input records=0
		Combine output records=0
		Reduce input groups=5
		Reduce shuffle bytes=1787
		Reduce input records=100
		Reduce output records=5
		Spilled Records=200
		Shuffled Maps =20
		Failed Shuffles=0
		Merged Map outputs=20
		GC time elapsed (ms)=14063
		CPU time spent (ms)=127640
		Physical memory (bytes) snapshot=5418561536
		Virtual memory (bytes) snapshot=14516457472
		Total committed heap usage (bytes)=4196401152
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=2250
	File Output Format Counters
		Bytes Written=78
14/05/30 16:15:50 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
14/05/30 16:15:50 INFO fs.TestDFSIO:            Date & time: Fri May 30 16:15:50 MYT 2014
14/05/30 16:15:50 INFO fs.TestDFSIO:        Number of files: 20
14/05/30 16:15:50 INFO fs.TestDFSIO: Total MBytes processed: 200.0
14/05/30 16:15:50 INFO fs.TestDFSIO:      Throughput mb/sec: 1.6888468553671554
14/05/30 16:15:50 INFO fs.TestDFSIO: Average IO rate mb/sec: 1.840719223022461
14/05/30 16:15:50 INFO fs.TestDFSIO:  IO rate std deviation: 0.7043729046488437
14/05/30 16:15:50 INFO fs.TestDFSIO:     Test exec time sec: 289.58
14/05/30 16:15:50 INFO fs.TestDFSIO:

clean the project.

jason@localhost:~/Desktop/hadoop-2.4.0$ hadoop jar /home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO -clean
14/05/30 16:20:03 INFO fs.TestDFSIO: TestDFSIO.1.7
14/05/30 16:20:03 INFO fs.TestDFSIO: nrFiles = 1
14/05/30 16:20:03 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
14/05/30 16:20:03 INFO fs.TestDFSIO: bufferSize = 1000000
14/05/30 16:20:03 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
14/05/30 16:20:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/30 16:20:06 INFO fs.TestDFSIO: Cleaning up test files

another job example.

jason@localhost:~/Desktop/hadoop-2.4.0$ hadoop jar /home/jason/Desktop/hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar pi 2 5
Number of Maps  = 2
Samples per Map = 5
14/05/30 16:21:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/05/30 16:21:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/30 16:21:25 INFO input.FileInputFormat: Total input paths to process : 2
14/05/30 16:21:26 INFO mapreduce.JobSubmitter: number of splits:2
14/05/30 16:21:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401437120030_0002
14/05/30 16:21:28 INFO impl.YarnClientImpl: Submitted application application_1401437120030_0002
14/05/30 16:21:28 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1401437120030_0002/
14/05/30 16:21:28 INFO mapreduce.Job: Running job: job_1401437120030_0002
14/05/30 16:21:53 INFO mapreduce.Job: Job job_1401437120030_0002 running in uber mode : false
14/05/30 16:21:53 INFO mapreduce.Job:  map 0% reduce 0%
14/05/30 16:22:18 INFO mapreduce.Job:  map 100% reduce 0%
14/05/30 16:22:34 INFO mapreduce.Job:  map 100% reduce 100%
14/05/30 16:22:35 INFO mapreduce.Job: Job job_1401437120030_0002 completed successfully
14/05/30 16:22:36 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=50
		FILE: Number of bytes written=280470
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=530
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=46538
		Total time spent by all reduces in occupied slots (ms)=13821
		Total time spent by all map tasks (ms)=46538
		Total time spent by all reduce tasks (ms)=13821
		Total vcore-seconds taken by all map tasks=46538
		Total vcore-seconds taken by all reduce tasks=13821
		Total megabyte-seconds taken by all map tasks=47654912
		Total megabyte-seconds taken by all reduce tasks=14152704
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=36
		Map output materialized bytes=56
		Input split bytes=294
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=56
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=631
		CPU time spent (ms)=7890
		Physical memory (bytes) snapshot=623665152
		Virtual memory (bytes) snapshot=2097958912
		Total committed heap usage (bytes)=559939584
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=236
	File Output Format Counters
		Bytes Written=97
Job Finished in 73.196 seconds
Estimated value of Pi is 3.60000000000000000000

You can also create file and save on hadoop. You can read more at http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/FileSystemShell.html

jason@localhost:~$ hadoop fs -mkdir -p /user/hduser
14/05/30 16:27:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
jason@localhost:~$ hadoop fs -copyFromLocal dummy.txt dummy.txt
14/05/30 16:27:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
jason@localhost:~$ hadoop fs -ls
14/05/30 16:28:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 jason supergroup         13 2014-05-30 16:27 dummy.txt
jason@localhost:~$ hadoop fs -cat /user/hduser/dummy.txt
14/05/30 16:29:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
cat: `/user/hduser/dummy.txt': No such file or directory
jason@localhost:~$ hadoop fs -cat /user/jason/dummy.txt
14/05/30 16:29:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hello world.
jason@localhost:~$ hadoop fs -ls /
14/05/30 16:29:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x   - jason supergroup          0 2014-05-30 16:20 /benchmarks
drwx------   - jason supergroup          0 2014-05-30 16:11 /tmp
drwxr-xr-x   - jason supergroup          0 2014-05-30 16:27 /user
jason@localhost:~$ hadoop fs -rm dummy.txt
14/05/30 16:29:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/30 16:29:54 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted dummy.txt
jason@localhost:~$ hadoop fs -ls
14/05/30 16:30:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
jason@localhost:~$

Once you are done with hadoop cluster, you can shut it down using stop-dfs.sh && stop-yarn.sh

jason@localhost:~/Desktop/hadoop-2.4.0$ stop-dfs.sh && stop-yarn.sh
14/05/30 17:51:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
14/05/30 17:51:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop

You can remove/revert the changes you made for this tutorial.

/home/jason/Desktop/hadoop-2.4.0
/home/jason/.ssh/id_rsa_hadoop.pub
/home/jason/.ssh/id_rsa_hadoop
/home/jason/.ssh/authorized_keys
/home/jason/.bashrc

That's it for this lengthy article, hope you like it and if you learn something , remember to donate to us too!

Sunday, May 18, 2014

Learning java native keyword

Today when I study into java code UnixNativeDispatcher.java, the code caught my attention. Snippet below

/**
 * int openat(int dfd, const char* path, int oflag, mode_t mode)
 */
static int openat(int dfd, byte[] path, int flags, int mode) throws UnixException {
    NativeBuffer buffer = NativeBuffers.asNativeBuffer(path);
    try {
        return openat0(dfd, buffer.address(), flags, mode);
    } finally {
        buffer.release();
    }
}
private static native int openat0(int dfd, long pathAddress, int flags, int mode) throws UnixException;

So what happened actually? The static method openat is declared with package modifier which eventually do conversion before native private method openat0 is called. Note that the private method openat0 is actually a non java code execution.

native : Used in method declarations to specify that the method is not implemented in the same Java source file, but rather in another language.

Because it is executed in non java code, if you called native code, the associated method must be implemented in non java language, you can see example how is it done here with hello world example.

But with the native openat0, it came with the precompiled soname file, depending on which jvm you are using, to illustrate using this example. In the directory /usr/lib/jvm/jdk1.7.0_55/jre/lib/amd64/

$ ls /usr/lib/jvm/jdk1.7.0_55/jre/lib/amd64/
fxavcodecplugin-52.so  libawt.so	      libgstreamer-lite.so  libjava_crw_demo.so   libjdwp.so	   libjsoundalsa.so  libnpjp2.so	 libt2k.so
fxavcodecplugin-53.so  libdcpr.so	      libhprof.so	    libjavafx-font.so	  libjfr.so	   libjsound.so      libnpt.so		 libunpack.so
fxplugins.so	       libdeploy.so	      libinstrument.so	    libjavafx-iio.so	  libjfxmedia.so   libkcms.so	     libprism-es2.so	 libverify.so
headless	       libdt_socket.so	      libj2gss.so	    libjavaplugin_jni.so  libjfxwebkit.so  libmanagement.so  libsaproc.so	 libzip.so
jli		       libfontmanager.so      libj2pcsc.so	    libjava.so		  libjpeg.so	   libmlib_image.so  libsctp.so		 server
jvm.cfg		       libglass.so	      libj2pkcs11.so	    libjawt.so		  libjsdt.so	   libnet.so	     libsplashscreen.so  xawt
libattach.so	       libgstplugins-lite.so  libjaas_unix.so	    libJdbcOdbc.so	  libjsig.so	   libnio.so	     libsunec.so

As you may notice, there are many precompiled soname files come with jre. To check which soname loaded,

static {
    AccessController.doPrivileged(new PrivilegedAction<Void>() {
        public Void run() {
            System.loadLibrary("nio");
            return null;
    }});
    int flags = init();

    hasAtSysCalls = (flags & HAS_AT_SYSCALLS) > 0;
}

obviously libnio.so is loaded, and to read the content of the soname file,

$ objdump -T /usr/lib/jvm/jdk1.7.0_55/jre/lib/amd64/libnio.so | grep openat
000000000000c350 g DF .text 00000000000000ac SUNWprivate_1.1 Java_sun_nio_fs_UnixNativeDispatcher_openat0
$ nm -D /usr/lib/jvm/jdk1.7.0_55/jre/lib/amd64/libnio.so | grep openat0
000000000000c350 T Java_sun_nio_fs_UnixNativeDispatcher_openat0

So that's it, I hope you learned java native keyword too!

Saturday, May 17, 2014

Getting familiar with Java FileChannel

When I was studying into lucene 4.8.0 codebase, one particular code that stumble upon was the use of FileChannel. So today, I'm spending time to play around this class FileChannel.

So you would ask, why use FileChannel instead of BufferedWriter?

From the FileChannel documentation:

In addition to the familiar read, write, and close operations of byte channels, this class defines the following file-specific operations:

Bytes may be read or written at an absolute position in a file in a way that does not affect the channel's current position.

A region of a file may be mapped directly into memory; for large files this is often much more efficient than invoking the usual read or write methods.

Updates made to a file may be forced out to the underlying storage device, ensuring that data are not lost in the event of a system crash.

Bytes can be transferred from a file to some other channel, and vice versa, in a way that can be optimized by many operating systems into a very fast transfer directly to or from the filesystem cache.

A region of a file may be locked against access by other programs.

That sounds interesting, and to understand better, we will start to write code using class FileChannel. Below is one I wrote and explanation come after.

import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

public class FileChannelTest {

	public static void main(String[] args) throws IOException {

		try {
			File aFile = new File("test.txt");
			FileChannel fc = FileChannel.open(aFile.toPath(), StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.READ);
			System.out.println("initialized is Open " + fc.isOpen()); // true

			String data = "hello orld";
			ByteBuffer buf = ByteBuffer.allocate(data.length());
			buf.put(data.getBytes());
			buf.flip();

			System.out.println("initial size " + fc.size());  // 0
			System.out.println("initial position " + fc.position()); // 0
			fc.write(buf);
			System.out.println("after write size " + fc.size()); // 10
			System.out.println("after write position " + fc.position()); //10 

			ByteBuffer dst = ByteBuffer.allocate(200);
			fc.read(dst, 0);
			System.out.println("initial read " + new String(dst.array(), "UTF-8")); // hello orld

			ByteBuffer newData = ByteBuffer.wrap("world\ndelete me".getBytes());
			fc.write(newData, 6);
			fc.position(21);

			dst.clear();
			fc.read(dst, 0);
			System.out.println("read second write " + new String(dst.array(), "UTF-8")); //hello world
                                                                                                     //delete me
			System.out.println("after second write size " + fc.size());  // 21
			System.out.println("after second write pos  " + fc.position()); // 21

			fc.truncate(12);
			System.out.println("after truncate size " + fc.size()); // 12
			System.out.println("after truncate pos  " + fc.position()); // 12
			dst = ByteBuffer.allocate(200);
			fc.read(dst, 0);
			System.out.println("after truncate " + new String(dst.array(), "UTF-8"));

			newData.clear();
			newData = ByteBuffer.wrap("a new line of text\n".getBytes());
			fc.write(newData);
			System.out.println("after second write size " + fc.size()); // 31
			System.out.println("after second write pos  " + fc.position()); // 31

			fc.force(true);
			fc.close();

			System.out.println("after close " + fc.isOpen()); // false

		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}

	}

}

It's a simple single threaded class. So we start with a File object with a test file. Then we create FileChannel object where we create test.txt if is not exists, then write and read it.

To dip our toe into the water, we start by checking if the file Channel is open. In order to write, we need ByteBuffer. We construct a new ByteBuffer object with the data length and write the data into the buffer array. In order for the channel to write this ByteBuffer, you must call the method flip().

We checked what is the current FileChannel size and position. Initially, they are zero. After data is written, note that size and position has been increased to 10. So in order to check the file channel written, we can invoke the method read(), hence the next few statements in the code.

To hold the data read from the file channel object, we create a new ByteBuffer object called dst, with a capacity of 200, so we can fit 200 bytes of data. We read the file channel object starting from file position 0 into the dst object. As expected in the print out, hello orld is written into file channel object. Interesting! It means we can write based on position we want to specify and that is exactly the next few lines of code did, we write "world\ndelete me" using method write but with position specify. If you notice carefully, unlike method write(), write() did not update file channel position, hence we are setting it explicitly to advance to position 21, that is, last position of character in the file. With this second write, we change the original data (hello orld) to (hello world\ndelete me), this overwrite the existing string than appending.

We check the file channel object after second write, we corrected the typo in the string and the position and size is as expected (21). Now we truncate the file channel up to 12bytes. That is, we start to truncate from position 0 until 12, which the byte array containing "hello world\n" survive and the remaining will be truncated. so the remaining size is 12 and position is set to 12 as well. As verified from the print out, delete me is no longer exist in the file channel.

As the file position is set end of file, we can simulate append effect by just writing and that is exactly what the next lines of code did. To ensure data and metadata is flush to the block device, we called force with parameter true. We end this example with invoking close method and check after that file channel is no longer opened.

When file channel object is created, we added open option, StandardOpenOption.READ is because when read into the file channel, this bit has to be set or else you will get exception. That's it about learning FileChannel.

Pages