
Saturday, November 7, 2015

Apache accumulo first learning experience


Today we will take a look at another big data technology: Apache Accumulo is the topic for today. First, what is Accumulo?

Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. Other notable improvements and features are outlined here.
Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, Hypertable, and Cassandra. Accumulo began its development in 2008 and joined the Apache community in 2011.

In this article, as always, we will set up the infrastructure. I referenced this article and used the following environment.

  • 64bit arch
  • open jdk 1.7/1.8
  • zookeeper-3.4.6
  • hadoop-2.6.1
  • accumulo-1.7.0
  • openssh 
  • rsync
  • debian sid

As Accumulo is a Java-based project, you must have Java installed and configured. Get the latest Java 1.7 or 1.8 as of this writing. After Java is installed, you need to export JAVA_HOME in your bash configuration file, .bashrc, with this line: export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_55

Then you need to source the new .bashrc; . .bashrc is sufficient. For ssh and rsync, you can use the apt-get package manager, as it is easy. What's important is that you set up a public/private key pair in your user's ssh configuration directory, so that passwordless ssh to localhost works.

You can create two directories, $HOME/Downloads and $HOME/Installs. It's pretty intuitive: the Downloads directory is for the downloaded packages, and Installs is the working directory where the compressed packages are extracted.


Download the above packages into the $HOME/Downloads directory and extract them into $HOME/Installs. First, let's configure Apache Hadoop.

 $ vim $HOME/Installs/hadoop-2.6.1/etc/hadoop/hadoop-env.sh  
 $ # uncomment in the file above export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_55  
 $ vim $HOME/Installs/hadoop-2.6.1/etc/hadoop/core-site.xml  
 $ cat $HOME/Installs/hadoop-2.6.1/etc/hadoop/core-site.xml  
 <?xml version="1.0" encoding="UTF-8"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <configuration>  
   <property>  
     <name>fs.defaultFS</name>  
     <value>hdfs://localhost:9000</value>  
   </property>  
 </configuration>  
 $ vim $HOME/Installs/hadoop-2.6.1/etc/hadoop/hdfs-site.xml  
 $ cat $HOME/Installs/hadoop-2.6.1/etc/hadoop/hdfs-site.xml  
 <?xml version="1.0" encoding="UTF-8"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <configuration>  
   <property>  
     <name>dfs.replication</name>  
     <value>1</value>  
   </property>  
   <property>  
     <name>dfs.name.dir</name>  
     <value>hdfs_storage/name</value>  
   </property>  
   <property>  
     <name>dfs.data.dir</name>  
     <value>hdfs_storage/data</value>  
   </property>  
 </configuration>  
 $ vim $HOME/Installs/hadoop-2.6.1/etc/hadoop/mapred-site.xml  
 $ cat $HOME/Installs/hadoop-2.6.1/etc/hadoop/mapred-site.xml  
 <?xml version="1.0"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <configuration>  
    <property>  
      <name>mapred.job.tracker</name>  
      <value>localhost:9001</value>  
    </property>  
 </configuration>  
 $ cd $HOME/Installs/hadoop-2.6.1/  
 $ $HOME/Installs/hadoop-2.6.1/bin/hdfs namenode -format  
 $ $HOME/Installs/hadoop-2.6.1/sbin/start-dfs.sh  

As you can read above, we specify the Java home for Hadoop and then configure HDFS to listen on port 9000, so make sure this port is free for Hadoop to use. Then we format the Hadoop namenode and start HDFS.

Next we will configure zookeeper.

 $ cp $HOME/Installs/zookeeper-3.4.6/conf/zoo_sample.cfg $HOME/Installs/zookeeper-3.4.6/conf/zoo.cfg  
 $ $HOME/Installs/zookeeper-3.4.6/bin/zkServer.sh start  

Pretty simple: copy the default config file and start the service. The last step is Apache Accumulo.

 $ cp $HOME/Installs/accumulo-1.7.0/conf/examples/512MB/standalone/* $HOME/Installs/accumulo-1.7.0/conf/  
 $ vim $HOME/.bashrc  
 $ tail -2 $HOME/.bashrc  
 export HADOOP_HOME=$HOME/Installs/hadoop-2.6.1/  
 export ZOOKEEPER_HOME=$HOME/Installs/zookeeper-3.4.6/  
 $ . $HOME/.bashrc  
 $ vim $HOME/Installs/accumulo-1.7.0/conf/accumulo-env.sh  
 $ # SET ACCUMULO_MONITOR_BIND_ALL to true.  
 $ vim $HOME/Installs/accumulo-1.7.0/conf/accumulo-site.xml  
 $ # in file $HOME/Installs/accumulo-1.7.0/conf/accumulo-site.xml  
 <property>  
   <name>instance.volumes</name>  
   <value>hdfs://localhost:9000/accumulo</value>  
 </property>  
 $ # in file $HOME/Installs/accumulo-1.7.0/conf/accumulo-site.xml  
  <property>  
   <name>instance.secret</name>  
   <value>mysecret</value>  
  </property>  
 $ # in file $HOME/Installs/accumulo-1.7.0/conf/accumulo-site.xml  
  <property>  
   <name>trace.token.property.password</name>  
   <value>mysecret</value>  
  </property>  

So we have configured the settings for Accumulo in .bashrc and some properties in accumulo-env.sh and accumulo-site.xml. Next, we will initialize Accumulo and start it using the password we specified previously.

 $ $HOME/Installs/accumulo-1.7.0/bin/accumulo init  
 $ # give an instance name.  
 $ # type in the password as specified in trace.token.property.password.  
 $ $HOME/Installs/accumulo-1.7.0/bin/start-all.sh  

That's it! If you want to do CRUD in Accumulo, I suggest you start with the official documentation; a small Java client sketch follows below.
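
For a first taste of CRUD from Java, here is a minimal sketch against the Accumulo 1.7 client API. The instance name, table name and credentials are placeholders of my own (the instance name is whatever you typed during accumulo init, and the password is the one set for the root user), and accumulo-core plus its dependencies are assumed to be on the classpath.

 import java.util.Map.Entry;
 
 import org.apache.accumulo.core.client.BatchWriter;
 import org.apache.accumulo.core.client.BatchWriterConfig;
 import org.apache.accumulo.core.client.Connector;
 import org.apache.accumulo.core.client.Scanner;
 import org.apache.accumulo.core.client.ZooKeeperInstance;
 import org.apache.accumulo.core.client.security.tokens.PasswordToken;
 import org.apache.accumulo.core.data.Key;
 import org.apache.accumulo.core.data.Mutation;
 import org.apache.accumulo.core.data.Value;
 import org.apache.accumulo.core.security.Authorizations;
 
 public class AccumuloCrudDemo {
   public static void main(String[] args) throws Exception {
     // "myinstance" is a placeholder: use the instance name given to "accumulo init",
     // and the root password typed during init.
     Connector conn = new ZooKeeperInstance("myinstance", "localhost:2181")
         .getConnector("root", new PasswordToken("mysecret"));
 
     if (!conn.tableOperations().exists("demo")) {
       conn.tableOperations().create("demo");
     }
 
     // Create: write one key/value pair.
     BatchWriter writer = conn.createBatchWriter("demo", new BatchWriterConfig());
     Mutation m = new Mutation("row1");
     m.put("cf", "cq", "value1");
     writer.addMutation(m);
     writer.close();
 
     // Read: scan the table back.
     Scanner scanner = conn.createScanner("demo", Authorizations.EMPTY);
     for (Entry<Key, Value> e : scanner) {
       System.out.println(e.getKey() + " -> " + e.getValue());
     }
 
     // Delete: drop the table again.
     conn.tableOperations().delete("demo");
   }
 }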





Saturday, August 29, 2015

First time learning Apache HBase

Today, we will take a look at another big data technology. Apache HBase is the topic for today, and before we dip our toes into Apache HBase, let's find out what Apache HBase actually is.

Apache HBase [1] is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al.[2]  Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop [3].

In this article, we will set up a single node for this adventure. Before we begin, let's download a copy of Apache HBase here. Once downloaded, extract the compressed content. At the time of this writing, I'm using Apache HBase version 1.1.1 for this learning experience.

 user@localhost:~/Desktop/hbase-1.1.1$ ls  
 bin CHANGES.txt conf     docs hbase-webapps lib LICENSE.txt NOTICE.txt README.txt  

If you have not installed Java, go ahead and install it. Pick a recent Java, at least Java 7. Make sure the terminal reports the correct version of Java. An example would be as follows:

 user@localhost:~/Desktop/hbase-1.1.1$ java -version  
 java version "1.7.0_55"  
 Java(TM) SE Runtime Environment (build 1.7.0_55-b13)  
 Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)  

If you cannot change the system configuration for Java, then in the HBase configuration file conf/hbase-env.sh, uncomment the JAVA_HOME variable and set it to the Java that you installed. The main configuration file for HBase is conf/hbase-site.xml, and we will now edit this file so it looks like the following. Change it to suit your environment as required.

 user@localhost:~/Desktop/hbase-1.1.1$ cat conf/hbase-site.xml   
 <?xml version="1.0"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <!--  
 /**  
  *  
  * Licensed to the Apache Software Foundation (ASF) under one  
  * or more contributor license agreements. See the NOTICE file  
  * distributed with this work for additional information  
  * regarding copyright ownership. The ASF licenses this file  
  * to you under the Apache License, Version 2.0 (the  
  * "License"); you may not use this file except in compliance  
  * with the License. You may obtain a copy of the License at  
  *  
  *   http://www.apache.org/licenses/LICENSE-2.0  
  *  
  * Unless required by applicable law or agreed to in writing, software  
  * distributed under the License is distributed on an "AS IS" BASIS,  
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
  * See the License for the specific language governing permissions and  
  * limitations under the License.  
  */  
 -->  
 <configuration>  
  <property>  
   <name>hbase.rootdir</name>  
   <value>file:///home/user/Desktop/hbase-1.1.1</value>  
  </property>  
  <property>  
   <name>hbase.zookeeper.property.dataDir</name>  
   <value>/home/user/zookeeper</value>  
  </property>  
 </configuration>  

Okay, we are ready to start HBase. Start it with the helper script bin/start-hbase.sh.

 user@localhost:~/Desktop/hbase-1.1.1$ bin/start-hbase.sh   
 starting master, logging to /home/user/Desktop/hbase-1.1.1/bin/../logs/hbase-user-master-localhost.out  
   
 user@localhost:~/Desktop/hbase-1.1.1/logs$ tail -F hbase-user-master-localhost.out SecurityAuth.audit hbase-user-master-localhost.log  
 ==> hbase-user-master-localhost.out <==  
   
 ==> SecurityAuth.audit <==  
 2015-08-18 17:49:41,533 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.1.1 port: 36745 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
 2015-08-18 17:49:46,812 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.0.1 port: 53042 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
 2015-08-18 17:49:48,309 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.0.1 port: 53043 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
 2015-08-18 17:49:49,317 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.0.1 port: 53044 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
   
 ==> hbase-user-master-localhost.log <==  
 2015-08-18 17:49:49,281 INFO [StoreOpener-78a2a3664205fcf679d2043ac3259648-1] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=831688, freeSize=808983544, maxSize=809815232, heapSize=831688, minSize=769324480, minFactor=0.95, multiSize=384662240, multiFactor=0.5, singleSize=192331120, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false  
 2015-08-18 17:49:49,282 INFO [StoreOpener-78a2a3664205fcf679d2043ac3259648-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.000000  
 2015-08-18 17:49:49,295 INFO [RS_OPEN_REGION-localhost:60631-0] regionserver.HRegion: Onlined 78a2a3664205fcf679d2043ac3259648; next sequenceid=2  
 2015-08-18 17:49:49,303 INFO [PostOpenDeployTasks:78a2a3664205fcf679d2043ac3259648] regionserver.HRegionServer: Post open deploy tasks for hbase:namespace,,1439891388424.78a2a3664205fcf679d2043ac3259648.  
 2015-08-18 17:49:49,322 INFO [PostOpenDeployTasks:78a2a3664205fcf679d2043ac3259648] hbase.MetaTableAccessor: Updated row hbase:namespace,,1439891388424.78a2a3664205fcf679d2043ac3259648. with server=localhost,60631,1439891378840  
 2015-08-18 17:49:49,332 INFO [AM.ZK.Worker-pool3-t6] master.RegionStates: Transition {78a2a3664205fcf679d2043ac3259648 state=OPENING, ts=1439891389276, server=localhost,60631,1439891378840} to {78a2a3664205fcf679d2043ac3259648 state=OPEN, ts=1439891389332, server=localhost,60631,1439891378840}  
 2015-08-18 17:49:49,603 INFO [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14f4036b87d0000 type:create cxid:0x1d5 zxid:0x44 txntype:-1 reqpath:n/a Error Path:/hbase/namespace/default Error:KeeperErrorCode = NodeExists for /hbase/namespace/default  
 2015-08-18 17:49:49,625 INFO [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14f4036b87d0000 type:create cxid:0x1d8 zxid:0x46 txntype:-1 reqpath:n/a Error Path:/hbase/namespace/hbase Error:KeeperErrorCode = NodeExists for /hbase/namespace/hbase  
 2015-08-18 17:49:49,639 INFO [localhost:51452.activeMasterManager] master.HMaster: Master has completed initialization  
 2015-08-18 17:49:49,642 INFO [localhost:51452.activeMasterManager] quotas.MasterQuotaManager: Quota support disabled  

As you will notice, a log file is also available, and jps shows that an HMaster is running.

 user@localhost: $ jps  
 22144 Jps  
 21793 HMaster  

Okay, let's experience Apache HBase using the HBase shell.

 user@localhost:~/Desktop/hbase-1.1.1$ ./bin/hbase shell  
 2015-08-18 17:55:25,134 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable  
 HBase Shell; enter 'help<RETURN>' for list of supported commands.  
 Type "exit<RETURN>" to leave the HBase Shell  
 Version 1.1.1, rd0a115a7267f54e01c72c603ec53e91ec418292f, Tue Jun 23 14:44:07 PDT 2015  
   
 hbase(main):001:0>   
   
 A help command shows a very helpful description such as the following.  
   
 hbase(main):001:0> help  
 HBase Shell, version 1.1.1, rd0a115a7267f54e01c72c603ec53e91ec418292f, Tue Jun 23 14:44:07 PDT 2015  
 Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.  
 Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.  
   
 COMMAND GROUPS:  
  Group name: general  
  Commands: status, table_help, version, whoami  
   
  Group name: ddl  
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters  
   
  Group name: namespace  
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables  
   
  Group name: dml  
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve  
   
  Group name: tools  
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, split, trace, unassign, wal_roll, zk_dump  
   
  Group name: replication  
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs  
   
  Group name: snapshots  
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot  
   
  Group name: configuration  
  Commands: update_all_config, update_config  
   
  Group name: quotas  
  Commands: list_quotas, set_quota  
   
  Group name: security  
  Commands: grant, revoke, user_permission  
   
  Group name: visibility labels  
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility  
   
 SHELL USAGE:  
 Quote all names in HBase Shell such as table and column names. Commas delimit  
 command parameters. Type <RETURN> after entering a command to run it.  
 Dictionaries of configuration used in the creation and alteration of tables are  
 Ruby Hashes. They look like this:  
   
  {'key1' => 'value1', 'key2' => 'value2', ...}  
   
 and are opened and closed with curley-braces. Key/values are delimited by the  
 '=>' character combination. Usually keys are predefined constants such as  
 NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type  
 'Object.constants' to see a (messy) list of all constants in the environment.  
   
 If you are using binary keys or values and need to enter them in the shell, use  
 double-quote'd hexadecimal representation. For example:  
   
  hbase> get 't1', "key\x03\x3f\xcd"  
  hbase> get 't1', "key\003\023\011"  
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"  
   
 The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.  
 For more on the HBase Shell, see http://hbase.apache.org/book.html  
 hbase(main):002:0>   

To create a table with one column family:

 hbase(main):002:0> create 'test', 'cf'  
 0 row(s) in 1.5700 seconds  
   
 => Hbase::Table - test  
 hbase(main):003:0>   

List information about the table:

 hbase(main):001:0> list 'test'  
 TABLE                                                                                               
 test                                                                                               
 1 row(s) in 0.3530 seconds  
   
 => ["test"]  

Let's put something into the table we have just created.

 hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'  
 0 row(s) in 0.2280 seconds  
   
 hbase(main):003:0> put 'test', 'row2', 'cf:b', 'value2'  
 0 row(s) in 0.0140 seconds  
   
 hbase(main):004:0> put 'test', 'row3', 'cf:c', 'value3'  
 0 row(s) in 0.0060 seconds  
   
 hbase(main):005:0>   

Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are composed of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.
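
For comparison, the same insert and read can be done from Java. The sketch below is my own illustration against the HBase 1.1 client API; it assumes the hbase-client jars and conf/hbase-site.xml are on the classpath, and the class name is arbitrary.

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.TableName;
 import org.apache.hadoop.hbase.client.Connection;
 import org.apache.hadoop.hbase.client.ConnectionFactory;
 import org.apache.hadoop.hbase.client.Get;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.client.Table;
 import org.apache.hadoop.hbase.util.Bytes;
 
 public class HBasePutGetDemo {
   public static void main(String[] args) throws Exception {
     // Picks up hbase-site.xml from the classpath.
     Configuration conf = HBaseConfiguration.create();
     try (Connection connection = ConnectionFactory.createConnection(conf);
          Table table = connection.getTable(TableName.valueOf("test"))) {
 
       // Equivalent of: put 'test', 'row1', 'cf:a', 'value1'
       Put put = new Put(Bytes.toBytes("row1"));
       put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
       table.put(put);
 
       // Equivalent of: get 'test', 'row1'
       Result result = table.get(new Get(Bytes.toBytes("row1")));
       byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("a"));
       System.out.println("cf:a = " + Bytes.toString(value));
     }
   }
 }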

To select the row from the table, use scan.

 hbase(main):005:0> scan 'test'  
 ROW                       COLUMN+CELL                                                                   
  row1                      column=cf:a, timestamp=1439892359305, value=value1                                                
  row2                      column=cf:b, timestamp=1439892363921, value=value2                                                
  row3                      column=cf:c, timestamp=1439892369775, value=value3                                                
 3 row(s) in 0.0420 seconds  
   
 hbase(main):006:0>   

To get a single row only, use get.

 hbase(main):006:0> get 'test', 'row1'  
 COLUMN                      CELL                                                                       
  cf:a                      timestamp=1439892359305, value=value1                                                      
 1 row(s) in 0.0340 seconds  
   
 hbase(main):007:0>   

Something really interesting about Apache HBase: if you want to delete a table or change its settings, you need to disable it first. After that, you can enable it back. (A Java sketch of the same rule appears after the drop example below.)

 hbase(main):007:0> disable 'test'  
 0 row(s) in 2.3610 seconds  
   
 hbase(main):008:0> enable 'test'  
 0 row(s) in 1.2790 seconds  
   
 hbase(main):009:0>   

Okay, now let's delete this table.

 hbase(main):009:0> drop 'test'  
   
 ERROR: Table test is enabled. Disable it first.  
   
 Here is some help for this command:  
 Drop the named table. Table must first be disabled:  
  hbase> drop 't1'  
  hbase> drop 'ns1:t1'  
   
   
 hbase(main):010:0> disable 'test'  
 0 row(s) in 2.2640 seconds  
   
 hbase(main):011:0> drop 'test'  
 0 row(s) in 1.2800 seconds  
   
 hbase(main):012:0>   
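
The same disable-before-drop rule applies when using the Java client. Here is a small sketch with the 1.1 Admin API, again my own illustration, with the same classpath assumptions as the earlier sketch.

 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.TableName;
 import org.apache.hadoop.hbase.client.Admin;
 import org.apache.hadoop.hbase.client.Connection;
 import org.apache.hadoop.hbase.client.ConnectionFactory;
 
 public class HBaseDropTableDemo {
   public static void main(String[] args) throws Exception {
     try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
          Admin admin = connection.getAdmin()) {
       TableName test = TableName.valueOf("test");
       // Just like in the shell, the table must be disabled before it can be dropped.
       if (admin.isTableEnabled(test)) {
         admin.disableTable(test);
       }
       admin.deleteTable(test);
     }
   }
 }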

Okay, we are done for this basic learning. Let's quit for now.

 hbase(main):012:0> quit  
 user@localhost:~/Desktop/hbase-1.1.1$   

To stop the Apache HBase instance,

 user@localhost:~/Desktop/hbase-1.1.1$ ./bin/stop-hbase.sh   
 stopping hbase.................  

 user@localhost:~/Desktop/hbase-1.1.1$ jps  
 23399 Jps  
 5445 org.eclipse.equinox.launcher_1.3.0.v20140415-2008.jar  

If you are like me and came from Apache Cassandra, Apache HBase looks very similar. If this interests you, I shall leave you with the following three links, which will take you further.

http://hbase.apache.org/book.html

http://wiki.apache.org/hadoop/Hbase

https://blogs.apache.org/hbase/

Sunday, August 16, 2015

First time learning gradle

It is difficult to jump-start into software development if you are new to the many sub-technologies involved. Today, I'm going to put aside my project and start to learn another technology: Gradle, a build system, though it does much more than just build. If you are also new to Gradle, you might want to find out what Gradle actually is.

Gradle on wikipedia

Gradle is a build automation tool that builds upon the concepts of Apache Ant and Apache Maven and introduces a Groovy-based domain-specific language (DSL) instead of the more traditional XML form of declaring the project configuration. Gradle uses a directed acyclic graph ("DAG") to determine the order in which tasks can be run.
Gradle was designed for multi-project builds which can grow to be quite large, and supports incremental builds by intelligently determining which parts of the build tree are up-to-date, so that any task dependent upon those parts will not need to be re-executed.

If you have many projects that depend on another project, Gradle will solve your problems. We will look into the basics of the Gradle build automation tool today. I love to code Java, so I will use Java for this demo. First, let's install Gradle. If you are using a deb-based distribution like Debian or Ubuntu, installing Gradle is as easy as $ sudo apt-get install gradle. Otherwise, you can download Gradle from http://gradle.org/ and install it on your system. Now let's create a Gradle build file. See below.

 user@localhost:~/gradle$ cat build.gradle   
 apply plugin: 'java'  
 user@localhost:~/gradle$ ls -a  
 total 36K  
 -rw-r--r--  1 user user  21 Aug 6 17:15 build.gradle  
 drwxr-xr-x 214 user user 28K Aug 6 17:15 ..  
 drwxr-xr-x  2 user user 4.0K Aug 6 17:15 .  
 user@localhost:~/gradle$ gradle build  
 :compileJava UP-TO-DATE  
 :processResources UP-TO-DATE  
 :classes UP-TO-DATE  
 :jar  
 :assemble  
 :compileTestJava UP-TO-DATE  
 :processTestResources UP-TO-DATE  
 :testClasses UP-TO-DATE  
 :test  
 :check  
 :build  
   
 BUILD SUCCESSFUL  
   
 Total time: 13.304 secs  
 user@localhost:~/gradle$ ls -a  
 total 44K  
 -rw-r--r--  1 user user  21 Aug 6 17:15 build.gradle  
 drwxr-xr-x 214 user user 28K Aug 6 17:15 ..  
 drwxr-xr-x  3 user user 4.0K Aug 6 17:15 .gradle  
 drwxr-xr-x  4 user user 4.0K Aug 6 17:15 .  
 drwxr-xr-x  6 user user 4.0K Aug 6 17:15 build  
 user@localhost:~/gradle$ find .gradle/  
 .gradle/  
 .gradle/1.5  
 .gradle/1.5/taskArtifacts  
 .gradle/1.5/taskArtifacts/fileHashes.bin  
 .gradle/1.5/taskArtifacts/taskArtifacts.bin  
 .gradle/1.5/taskArtifacts/fileSnapshots.bin  
 .gradle/1.5/taskArtifacts/outputFileStates.bin  
 .gradle/1.5/taskArtifacts/cache.properties.lock  
 .gradle/1.5/taskArtifacts/cache.properties  
 user@localhost:~/gradle$ find build  
 build  
 build/libs  
 build/libs/gradle.jar  
 build/test-results  
 build/test-results/binary  
 build/test-results/binary/test  
 build/test-results/binary/test/results.bin  
 build/reports  
 build/reports/tests  
 build/reports/tests/report.js  
 build/reports/tests/index.html  
 build/reports/tests/base-style.css  
 build/reports/tests/style.css  
 build/tmp  
 build/tmp/jar  
 build/tmp/jar/MANIFEST.MF  

One line of input produces so many output files. Amazing! Why were so many files generated? Read the command output above: it compiles, processes resources, jars, assembles, tests, checks and builds. What all of these mean I will not explain one by one; you learn better if you read the definitions yourself, as they are documented very well here. You might say, hey, I have a different Java source path, can Gradle handle this? Yes, of course! In the build file you created, you can add another line.

 // set the source java folder to another non maven standard path  
 sourceSets.main.java.srcDirs = ['src/java']  

Most of us coming from Java have an Ant build file. If that is the case, Gradle integrates nicely with Ant too; you just need to import the Ant build file and then call the Ant target from Gradle. See the code snippet below.

 user@localhost:~/gradle$ cat build.xml   
 <project>  
  <target name="helloAnt">  
   <echo message="hello this is ant."/>  
  </target>  
 </project>  
 user@localhost:~/gradle$ cat build.gradle  
 apply plugin: 'java'  
   
 // set the source java folder to another non maven standard path  
 sourceSets.main.java.srcDirs = ['src/java']  
   
 // import ant build file.  
 ant.importBuild 'build.xml'  
 user@localhost:~/gradle$ gradle helloAnt   
 :helloAnt  
 [ant:echo] hello this is ant.  
   
 BUILD SUCCESSFUL  
   
 Total time: 5.573 secs  

That looks pretty good! If you are curious about which Gradle parameters you can use to figure out what went wrong in a build, you should really read this link. Also, read up on the environment variables, as you can specify a different JDK for Gradle or even Java parameters for compiling big projects.

You might also ask: what if I only want to compile and don't want to go through all the automatic build steps above? No problem; since this is a Java project, you specify compileJava.

 user@localhost:~/gradle$ gradle compileJava  
 :compileJava UP-TO-DATE  
   
 BUILD SUCCESSFUL  
   
 Total time: 4.976 secs  

As you can see, Gradle is very flexible, and because of that you might want to exploit it further, for example by customizing tasks in build.gradle, listing projects, listing tasks and so on. For that, read here, as it explains and gives a lot of examples of how all that can be done. So at this stage, you might want to add more features to the Gradle build file. Okay, let's do just that.

 user@localhost:~/gradle$ cat build.gradle   
 apply plugin: 'java'  
 apply plugin: 'eclipse'  
   
 // set the source java folder to another non maven standard path  
 // default src/main/java  
 sourceSets.main.java.srcDirs = ['src/java']  
   
 // default src test   
 //src/test/java  
   
 // default src resources.  
 // src/main/resources   
   
 // default src test resources.  
 // src/test/resources  
   
 // default build  
 // build  
   
 // default jar built  
 // build/libs  
   
   
 // dependencies of external jar, we reference the very good from maven.  
 repositories {  
   mavenCentral()  
 }  
   
 // actual libs dependencies  
 dependencies {  
   compile group: 'commons-collections', name: 'commons-collections', version: '3.2'  
   testCompile group: 'junit', name: 'junit', version: '4.+'  
 }  
   
 test {  
   testLogging {  
     // Show that tests are run in the command-line output  
     events 'started', 'passed'  
   }  
 }  
   
 sourceCompatibility = 1.5  
 version = '1.0'  
 jar {  
   manifest {  
     attributes 'Implementation-Title': 'Gradle Quickstart',  
           'Implementation-Version': version  
   }  
 }  
   
 // import ant build file.  
 ant.importBuild 'build.xml'  
   
 // common for subprojects  
 subprojects {  
   apply plugin: 'java'  
   
   repositories {  
     mavenCentral()  
   }  
   
   dependencies {  
     testCompile 'junit:junit:4.12'  
   }  
   
   version = '1.0'  
   
   jar {  
     manifest.attributes provider: 'gradle'  
   }  
 }  
 user@localhost:~/gradle$ cat settings.gradle   
 include ":nativeapp",":webapp"  

Now, if you want to generate an Eclipse configuration, just run gradle eclipse; all the Eclipse configuration and settings are created automatically. Of course, you can customize the settings even further.

 user@localhost:~/gradle$ gradle eclipse  
 :eclipseClasspath  
 Download http://repo1.maven.org/maven2/junit/junit/4.12/junit-4.12.pom  
 Download http://repo1.maven.org/maven2/junit/junit/4.12/junit-4.12-sources.jar  
 Download http://repo1.maven.org/maven2/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3-sources.jar  
 Download http://repo1.maven.org/maven2/junit/junit/4.12/junit-4.12.jar  
 :eclipseJdt  
 :eclipseProject  
 :eclipse  
   
 BUILD SUCCESSFUL  
   
 Total time: 19.497 secs  
 user@localhost:~/gradle$ find .  
 .  
 .  
 ./build.xml  
 ./build  
 ./build/classes  
 ./build/classes/test  
 ./build/classes/test/org  
 ./build/classes/test/org/just4fun  
 ./build/classes/test/org/just4fun/voc  
 ./build/classes/test/org/just4fun/voc/file  
 ./build/classes/test/org/just4fun/voc/file/QuickTest.class  
 ./build/libs  
 ./build/libs/gradle.jar  
 ./build/libs/gradle-1.0.jar  
 ./build/test-results  
 ./build/test-results/binary  
 ./build/test-results/binary/test  
 ./build/test-results/binary/test/results.bin  
 ./build/test-results/TEST-org.just4fun.voc.file.QuickTest.xml  
 ./build/reports  
 ./build/reports/tests  
 ./build/reports/tests/report.js  
 ./build/reports/tests/index.html  
 ./build/reports/tests/org.just4fun.voc.file.html  
 ./build/reports/tests/base-style.css  
 ./build/reports/tests/org.just4fun.voc.file.QuickTest.html  
 ./build/reports/tests/style.css  
 ./build/dependency-cache  
 ./build/tmp  
 ./build/tmp/jar  
 ./build/tmp/jar/MANIFEST.MF  
 ./webapp  
 ./webapp/build.gradle  
 ./.gradle  
 ./.gradle/1.5  
 ./.gradle/1.5/taskArtifacts  
 ./.gradle/1.5/taskArtifacts/fileHashes.bin  
 ./.gradle/1.5/taskArtifacts/taskArtifacts.bin  
 ./.gradle/1.5/taskArtifacts/fileSnapshots.bin  
 ./.gradle/1.5/taskArtifacts/outputFileStates.bin  
 ./.gradle/1.5/taskArtifacts/cache.properties.lock  
 ./.gradle/1.5/taskArtifacts/cache.properties  
 ./.classpath  
 ./build.gradle  
 ./.project  
 ./.settings  
 ./.settings/org.eclipse.jdt.core.prefs  
 ./settings.gradle  
 ./nativeapp  
 ./nativeapp/build.gradle  
 ./src  
 ./src/test  
 ./src/test/java  
 ./src/test/java/org  
 ./src/test/java/org/just4fun  
 ./src/test/java/org/just4fun/voc  
 ./src/test/java/org/just4fun/voc/file  
 ./src/test/java/org/just4fun/voc/file/QuickTest.java  

Now, I create a simple unit test class file (a sketch of it is shown after the listing below) and then run only that single unit test; that's very cool.

 user@localhost:~/gradle$ find src/  
 src/  
 src/test  
 src/test/java  
 src/test/java/org  
 src/test/java/org/just4fun  
 src/test/java/org/just4fun/voc  
 src/test/java/org/just4fun/voc/file  
 src/test/java/org/just4fun/voc/file/QuickTest.java  
 $ gradle -Dtest.single=Quick test  
 :compileJava UP-TO-DATE  
 :processResources UP-TO-DATE  
 :classes UP-TO-DATE  
 :compileTestJavawarning: [options] bootstrap class path not set in conjunction with -source 1.5  
 1 warning  
   
 :processTestResources UP-TO-DATE  
 :testClasses  
 :test  
   
 org.just4fun.voc.file.QuickTest > test STARTED  
   
 org.just4fun.voc.file.QuickTest > test PASSED  
   
 BUILD SUCCESSFUL  
   
 Total time: 55.81 secs  
 user@localhost:~/gradle $  
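
For reference, a minimal QuickTest.java along these lines could look like the sketch below; the package matches the source listing above and uses the JUnit 4 dependency declared in build.gradle, while the test body itself is just a placeholder assertion.

 package org.just4fun.voc.file;
 
 import static org.junit.Assert.assertEquals;
 
 import org.junit.Test;
 
 public class QuickTest {
 
   // A trivial check, just enough for "gradle -Dtest.single=Quick test" to find and pass.
   @Test
   public void test() {
     assertEquals(4, 2 + 2);
   }
 }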

There are two additional directories created, nativeapp and webapp; these are subprojects of this big project, and each contains its own Gradle build file. In the parent Gradle build file, we see a subprojects configuration block, as this will be applied to all the subprojects. You create a settings.gradle to specify the subprojects.

That's all for today; this is just an introduction to quickly dive into some of the cool features of Gradle. With this shown, I hope it gives you some idea of where to head next. Good luck!


Friday, August 14, 2015

Light learning apache spark

A while back, I was reading articles and many of them referenced Spark, so this week, hey, why not check out what Spark actually is. Googling spark returns many results, and we are particularly interested in Apache Spark. Let us take a look today at Apache Spark and what it is all about. From the official Spark GitHub,

Apache Spark
Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, and Python, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.
Okay, let's download a copy of Spark to your local PC. You can download it from this site.


Extract the downloaded file and run the command; as seen below, the result is not good.

 user@localhost:~/Desktop/spark-1.4.1$ ./bin/pyspark   
 ls: cannot access /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10: No such file or directory  
 Failed to find Spark assembly in /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10.  
 You need to build Spark before running this program.  
 user@localhost:~/Desktop/spark-1.4.1$ ./bin/spark-shell   
 ls: cannot access /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10: No such file or directory  
 Failed to find Spark assembly in /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10.  
 You need to build Spark before running this program.  

Well, the default download option is the source package, so you will have to compile the source.

 user@localhost:~/Desktop/spark-1.4.1$ mvn -DskipTests clean package  
 [INFO] Scanning for projects...  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Reactor Build Order:  
 [INFO]   
 [INFO] Spark Project Parent POM  
 [INFO] Spark Launcher Project  
 [INFO] Spark Project Networking  
 [INFO] Spark Project Shuffle Streaming Service  
 [INFO] Spark Project Unsafe  
 ...  
 ...  
 ...  
 constituent[20]: file:/usr/share/maven/lib/wagon-http-shaded.jar  
 constituent[21]: file:/usr/share/maven/lib/maven-settings-builder-3.x.jar  
 constituent[22]: file:/usr/share/maven/lib/maven-aether-provider-3.x.jar  
 constituent[23]: file:/usr/share/maven/lib/maven-core-3.x.jar  
 constituent[24]: file:/usr/share/maven/lib/plexus-cipher.jar  
 constituent[25]: file:/usr/share/maven/lib/aether-util.jar  
 constituent[26]: file:/usr/share/maven/lib/commons-httpclient.jar  
 constituent[27]: file:/usr/share/maven/lib/commons-cli.jar  
 constituent[28]: file:/usr/share/maven/lib/aether-api.jar  
 constituent[29]: file:/usr/share/maven/lib/maven-model-3.x.jar  
 constituent[30]: file:/usr/share/maven/lib/guava.jar  
 constituent[31]: file:/usr/share/maven/lib/wagon-file.jar  
 ---------------------------------------------------  
 Exception in thread "main" java.lang.OutOfMemoryError: PermGen space  
      at java.lang.ClassLoader.defineClass1(Native Method)  
      at java.lang.ClassLoader.defineClass(ClassLoader.java:800)  
      at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)  
      at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)  
      at java.net.URLClassLoader.access$100(URLClassLoader.java:71)  
      at java.net.URLClassLoader$1.run(URLClassLoader.java:361)  
      at java.net.URLClassLoader$1.run(URLClassLoader.java:355)  
      at java.security.AccessController.doPrivileged(Native Method)  
      at java.net.URLClassLoader.findClass(URLClassLoader.java:354)  
      at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClassFromSelf(ClassRealm.java:401)  
      at org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy.loadClass(SelfFirstStrategy.java:42)  
      at org.codehaus.plexus.classworlds.realm.ClassRealm.unsynchronizedLoadClass(ClassRealm.java:271)  
      at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:247)  
      at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:239)  
      at org.apache.maven.cli.MavenCli.execute(MavenCli.java:545)  
      at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)  
      at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)  
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  
      at java.lang.reflect.Method.invoke(Method.java:606)  
      at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)  
      at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)  
      at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)  
      at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)  

Okay, let's beef up the build settings a little; even so, the build eventually took a very long time. I also switched to the mvn wrapper shipped in the build directory. See below.

 user@localhost:~/Desktop/spark-1.4.1$ export MAVEN_OPTS="-XX:MaxPermSize=1024M"  
 user@localhost:~/Desktop/spark-1.4.1$ mvn -DskipTests clean package  
   
 [INFO] Scanning for projects...  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Reactor Build Order:  
 [INFO]   
 [INFO] Spark Project Parent POM  
 [INFO] Spark Launcher Project  
 [INFO] Spark Project Networking  
 [INFO] Spark Project Shuffle Streaming Service  
 [INFO] Spark Project Unsafe  
 [INFO] Spark Project Core  
   
 user@localhost:~/Desktop/spark-1.4.1$ build/mvn -DskipTests clean package  
 [INFO] Scanning for projects...  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Reactor Build Order:  
 [INFO]   
 [INFO] Spark Project Parent POM  
 [INFO] Spark Launcher Project  
 [INFO] Spark Project Networking  
 [INFO] Spark Project Shuffle Streaming Service  
 [INFO] Spark Project Unsafe  
 [INFO] Spark Project Core  
 ..  
 ...  
 ...  
 ...  
 get/spark-streaming-kafka-assembly_2.10-1.4.1-shaded.jar  
 [INFO]   
 [INFO] --- maven-source-plugin:2.4:jar-no-fork (create-source-jar) @ spark-streaming-kafka-assembly_2.10 ---  
 [INFO] Building jar: /home/user/Desktop/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1-sources.jar  
 [INFO]   
 [INFO] --- maven-source-plugin:2.4:test-jar-no-fork (create-source-jar) @ spark-streaming-kafka-assembly_2.10 ---  
 [INFO] Building jar: /home/user/Desktop/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1-test-sources.jar  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Reactor Summary:  
 [INFO]   
 [INFO] Spark Project Parent POM .......................... SUCCESS [26.138s]  
 [INFO] Spark Launcher Project ............................ SUCCESS [1:15.976s]  
 [INFO] Spark Project Networking .......................... SUCCESS [26.347s]  
 [INFO] Spark Project Shuffle Streaming Service ........... SUCCESS [14.123s]  
 [INFO] Spark Project Unsafe .............................. SUCCESS [12.643s]  
 [INFO] Spark Project Core ................................ SUCCESS [9:49.622s]  
 [INFO] Spark Project Bagel ............................... SUCCESS [17.426s]  
 [INFO] Spark Project GraphX .............................. SUCCESS [53.601s]  
 [INFO] Spark Project Streaming ........................... SUCCESS [1:34.290s]  
 [INFO] Spark Project Catalyst ............................ SUCCESS [2:04.020s]  
 [INFO] Spark Project SQL ................................. SUCCESS [2:11.032s]  
 [INFO] Spark Project ML Library .......................... SUCCESS [2:57.880s]  
 [INFO] Spark Project Tools ............................... SUCCESS [6.920s]  
 [INFO] Spark Project Hive ................................ SUCCESS [2:58.649s]  
 [INFO] Spark Project REPL ................................ SUCCESS [36.564s]  
 [INFO] Spark Project Assembly ............................ SUCCESS [3:13.152s]  
 [INFO] Spark Project External Twitter .................... SUCCESS [1:09.316s]  
 [INFO] Spark Project External Flume Sink ................. SUCCESS [42.294s]  
 [INFO] Spark Project External Flume ...................... SUCCESS [37.907s]  
 [INFO] Spark Project External MQTT ....................... SUCCESS [1:20.999s]  
 [INFO] Spark Project External ZeroMQ ..................... SUCCESS [29.090s]  
 [INFO] Spark Project External Kafka ...................... SUCCESS [54.212s]  
 [INFO] Spark Project Examples ............................ SUCCESS [5:54.508s]  
 [INFO] Spark Project External Kafka Assembly ............. SUCCESS [1:24.962s]  
 [INFO] ------------------------------------------------------------------------  
 [INFO] BUILD SUCCESS  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Total time: 41:53.884s  
 [INFO] Finished at: Tue Aug 04 08:56:02 MYT 2015  
 [INFO] Final Memory: 71M/684M  
 [INFO] ------------------------------------------------------------------------  

Yes, finally the build is a success. Even so, as you can see above, it took 41 minutes on my PC just to compile. Okay, now that all the libs are built, let's repeat the command we typed just now.

 $ ./bin/spark-shell  
 log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).  
 log4j:WARN Please initialize the log4j system properly.  
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.  
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties  
 15/08/04 20:21:16 INFO SecurityManager: Changing view acls to: user  
 15/08/04 20:21:16 INFO SecurityManager: Changing modify acls to: user  
 15/08/04 20:21:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)  
 15/08/04 20:21:16 INFO HttpServer: Starting HTTP Server  
 15/08/04 20:21:17 INFO Utils: Successfully started service 'HTTP class server' on port 56379.  
 Welcome to  
    ____       __  
    / __/__ ___ _____/ /__  
   _\ \/ _ \/ _ `/ __/ '_/  
   /___/ .__/\_,_/_/ /_/\_\  version 1.4.1  
    /_/  
   
 Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)  
 Type in expressions to have them evaluated.  
 Type :help for more information.  
 15/08/04 20:21:24 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.1.1; using 192.168.133.28 instead (on interface eth0)  
 15/08/04 20:21:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address  
 15/08/04 20:21:24 INFO SparkContext: Running Spark version 1.4.1  
 15/08/04 20:21:24 INFO SecurityManager: Changing view acls to: user  
 15/08/04 20:21:24 INFO SecurityManager: Changing modify acls to: user  
 15/08/04 20:21:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)  
 15/08/04 20:21:25 INFO Slf4jLogger: Slf4jLogger started  
 15/08/04 20:21:26 INFO Remoting: Starting remoting  
 15/08/04 20:21:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.133.28:47888]  
 15/08/04 20:21:26 INFO Utils: Successfully started service 'sparkDriver' on port 47888.  
 15/08/04 20:21:27 INFO SparkEnv: Registering MapOutputTracker  
 15/08/04 20:21:27 INFO SparkEnv: Registering BlockManagerMaster  
 15/08/04 20:21:27 INFO DiskBlockManager: Created local directory at /tmp/spark-660b5f39-26be-4ea2-8593-c0c05a093a23/blockmgr-c3225f03-5ecf-4fed-bbe4-df2331ac7742  
 15/08/04 20:21:27 INFO MemoryStore: MemoryStore started with capacity 265.4 MB  
 15/08/04 20:21:27 INFO HttpFileServer: HTTP File server directory is /tmp/spark-660b5f39-26be-4ea2-8593-c0c05a093a23/httpd-3ab40971-a6d0-42a7-b39e-4d1ce4290642  
 15/08/04 20:21:27 INFO HttpServer: Starting HTTP Server  
 15/08/04 20:21:27 INFO Utils: Successfully started service 'HTTP file server' on port 50089.  
 15/08/04 20:21:27 INFO SparkEnv: Registering OutputCommitCoordinator  
 15/08/04 20:21:28 INFO Utils: Successfully started service 'SparkUI' on port 4040.  
 15/08/04 20:21:28 INFO SparkUI: Started SparkUI at http://192.168.133.28:4040  
 15/08/04 20:21:28 INFO Executor: Starting executor ID driver on host localhost  
 15/08/04 20:21:28 INFO Executor: Using REPL class URI: http://192.168.133.28:56379  
 15/08/04 20:21:28 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36428.  
 15/08/04 20:21:28 INFO NettyBlockTransferService: Server created on 36428  
 15/08/04 20:21:28 INFO BlockManagerMaster: Trying to register BlockManager  
 15/08/04 20:21:28 INFO BlockManagerMasterEndpoint: Registering block manager localhost:36428 with 265.4 MB RAM, BlockManagerId(driver, localhost, 36428)  
 15/08/04 20:21:28 INFO BlockManagerMaster: Registered BlockManager  
 15/08/04 20:21:29 INFO SparkILoop: Created spark context..  
 Spark context available as sc.  
 15/08/04 20:21:30 INFO SparkILoop: Created sql context..  
 SQL context available as sqlContext.  
   
 scala>   

Okay, everything looks good, the error above no longer exists. Let's explore further.

 scala> sc.parallelize(1 to 1000).count()  
 15/08/04 20:30:05 INFO SparkContext: Starting job: count at <console>:22  
 15/08/04 20:30:05 INFO DAGScheduler: Got job 0 (count at <console>:22) with 4 output partitions (allowLocal=false)  
 15/08/04 20:30:05 INFO DAGScheduler: Final stage: ResultStage 0(count at <console>:22)  
 15/08/04 20:30:05 INFO DAGScheduler: Parents of final stage: List()  
 15/08/04 20:30:05 INFO DAGScheduler: Missing parents: List()  
 15/08/04 20:30:05 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at <console>:22), which has no missing parents  
 15/08/04 20:30:05 INFO MemoryStore: ensureFreeSpace(1096) called with curMem=0, maxMem=278302556  
 15/08/04 20:30:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1096.0 B, free 265.4 MB)  
 15/08/04 20:30:05 INFO MemoryStore: ensureFreeSpace(804) called with curMem=1096, maxMem=278302556  
 15/08/04 20:30:05 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 804.0 B, free 265.4 MB)  
 15/08/04 20:30:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:36428 (size: 804.0 B, free: 265.4 MB)  
 15/08/04 20:30:05 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874  
 15/08/04 20:30:05 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at <console>:22)  
 15/08/04 20:30:05 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks  
 15/08/04 20:30:05 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1369 bytes)  
 15/08/04 20:30:05 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1369 bytes)  
 15/08/04 20:30:05 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 1369 bytes)  
 15/08/04 20:30:05 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 1426 bytes)  
 15/08/04 20:30:05 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)  
 15/08/04 20:30:05 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)  
 15/08/04 20:30:05 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)  
 15/08/04 20:30:05 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)  
 15/08/04 20:30:06 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 658 bytes result sent to driver  
 15/08/04 20:30:06 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 658 bytes result sent to driver  
 15/08/04 20:30:06 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 658 bytes result sent to driver  
 15/08/04 20:30:06 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 658 bytes result sent to driver  
 15/08/04 20:30:06 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 477 ms on localhost (1/4)  
 15/08/04 20:30:06 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 478 ms on localhost (2/4)  
 15/08/04 20:30:06 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 508 ms on localhost (3/4)  
 15/08/04 20:30:06 INFO DAGScheduler: ResultStage 0 (count at <console>:22) finished in 0.520 s  
 15/08/04 20:30:06 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 478 ms on localhost (4/4)  
 15/08/04 20:30:06 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool   
 15/08/04 20:30:06 INFO DAGScheduler: Job 0 finished: count at <console>:22, took 1.079304 s  
 res0: Long = 1000  

That's pretty nice for a small demo of how Spark works. The same count can also be written with Spark's Java API; a sketch follows below. After that, let's move on to the next example in another terminal.
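
A minimal sketch of that same count through Spark's Java API might look like the following; this is my own illustration, assuming spark-core (for example the assembly jar we just built) is on the classpath, and the class name is arbitrary.

 import java.util.ArrayList;
 import java.util.List;
 
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaSparkContext;
 
 public class CountDemo {
   public static void main(String[] args) {
     // local[*] runs Spark in-process on all cores; no cluster required.
     SparkConf conf = new SparkConf().setAppName("CountDemo").setMaster("local[*]");
     JavaSparkContext sc = new JavaSparkContext(conf);
 
     List<Integer> numbers = new ArrayList<Integer>();
     for (int i = 1; i <= 1000; i++) {
       numbers.add(i);
     }
 
     // Same idea as sc.parallelize(1 to 1000).count() in the Scala shell above.
     long count = sc.parallelize(numbers).count();
     System.out.println("count = " + count);
 
     sc.stop();
   }
 }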

 user@localhost:~/Desktop/spark-1.4.1$ ./bin/pyspark   
 Python 2.7.10 (default, Jul 1 2015, 10:54:53)   
 [GCC 4.9.2] on linux2  
 Type "help", "copyright", "credits" or "license" for more information.  
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties  
 15/08/04 20:37:42 INFO SparkContext: Running Spark version 1.4.1  
 15/08/04 20:37:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable  
 15/08/04 20:37:44 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.1.1; using 182.168.133.28 instead (on interface eth0)  
 15/08/04 20:37:44 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address  
 15/08/04 20:37:44 INFO SecurityManager: Changing view acls to: user  
 15/08/04 20:37:44 INFO SecurityManager: Changing modify acls to: user  
 15/08/04 20:37:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)  
 15/08/04 20:37:46 INFO Slf4jLogger: Slf4jLogger started  
 15/08/04 20:37:46 INFO Remoting: Starting remoting  
 15/08/04 20:37:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@182.168.133.28:35904]  
 15/08/04 20:37:46 INFO Utils: Successfully started service 'sparkDriver' on port 35904.  
 15/08/04 20:37:46 INFO SparkEnv: Registering MapOutputTracker  
 15/08/04 20:37:46 INFO SparkEnv: Registering BlockManagerMaster  
 15/08/04 20:37:47 INFO DiskBlockManager: Created local directory at /tmp/spark-2b46e9e7-1779-45d1-b9cf-46000baf7d9b/blockmgr-e2f47b34-47a8-4b72-a0d6-25d0a7daa02e  
 15/08/04 20:37:47 INFO MemoryStore: MemoryStore started with capacity 265.4 MB  
 15/08/04 20:37:47 INFO HttpFileServer: HTTP File server directory is /tmp/spark-2b46e9e7-1779-45d1-b9cf-46000baf7d9b/httpd-2ec128c2-bad0-4dd9-a826-eab2ee0779cb  
 15/08/04 20:37:47 INFO HttpServer: Starting HTTP Server  
 15/08/04 20:37:47 INFO Utils: Successfully started service 'HTTP file server' on port 45429.  
 15/08/04 20:37:47 INFO SparkEnv: Registering OutputCommitCoordinator  
 15/08/04 20:37:49 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.  
 15/08/04 20:37:50 INFO Utils: Successfully started service 'SparkUI' on port 4041.  
 15/08/04 20:37:50 INFO SparkUI: Started SparkUI at http://182.168.133.28:4041  
 15/08/04 20:37:50 INFO Executor: Starting executor ID driver on host localhost  
 15/08/04 20:37:51 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 47045.  
 15/08/04 20:37:51 INFO NettyBlockTransferService: Server created on 47045  
 15/08/04 20:37:51 INFO BlockManagerMaster: Trying to register BlockManager  
 15/08/04 20:37:51 INFO BlockManagerMasterEndpoint: Registering block manager localhost:47045 with 265.4 MB RAM, BlockManagerId(driver, localhost, 47045)  
 15/08/04 20:37:51 INFO BlockManagerMaster: Registered BlockManager  
 Welcome to  
    ____       __  
    / __/__ ___ _____/ /__  
   _\ \/ _ \/ _ `/ __/ '_/  
   /__ / .__/\_,_/_/ /_/\_\  version 1.4.1  
    /_/  
   
 Using Python version 2.7.10 (default, Jul 1 2015 10:54:53)  
 SparkContext available as sc, SQLContext available as sqlContext.  
 >>> sc.parallelize(range(1000)).count()  
 15/08/04 20:37:55 INFO SparkContext: Starting job: count at <stdin>:1  
 15/08/04 20:37:55 INFO DAGScheduler: Got job 0 (count at <stdin>:1) with 4 output partitions (allowLocal=false)  
 15/08/04 20:37:55 INFO DAGScheduler: Final stage: ResultStage 0(count at <stdin>:1)  
 15/08/04 20:37:55 INFO DAGScheduler: Parents of final stage: List()  
 15/08/04 20:37:55 INFO DAGScheduler: Missing parents: List()  
 15/08/04 20:37:55 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at count at <stdin>:1), which has no missing parents  
 15/08/04 20:37:55 INFO MemoryStore: ensureFreeSpace(4416) called with curMem=0, maxMem=278302556  
 15/08/04 20:37:55 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.3 KB, free 265.4 MB)  
 15/08/04 20:37:55 INFO MemoryStore: ensureFreeSpace(2722) called with curMem=4416, maxMem=278302556  
 15/08/04 20:37:55 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.7 KB, free 265.4 MB)  
 15/08/04 20:37:55 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:47045 (size: 2.7 KB, free: 265.4 MB)  
 15/08/04 20:37:55 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874  
 15/08/04 20:37:55 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (PythonRDD[1] at count at <stdin>:1)  
 15/08/04 20:37:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks  
 15/08/04 20:37:55 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1873 bytes)  
 15/08/04 20:37:55 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 2117 bytes)  
 15/08/04 20:37:55 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 2123 bytes)  
 15/08/04 20:37:55 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 2123 bytes)  
 15/08/04 20:37:55 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)  
 15/08/04 20:37:55 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)  
 15/08/04 20:37:55 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)  
 15/08/04 20:37:55 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)  
 15/08/04 20:37:56 INFO PythonRDD: Times: total = 421, boot = 376, init = 44, finish = 1  
 15/08/04 20:37:56 INFO PythonRDD: Times: total = 418, boot = 354, init = 64, finish = 0  
 15/08/04 20:37:56 INFO PythonRDD: Times: total = 423, boot = 372, init = 51, finish = 0  
 15/08/04 20:37:56 INFO PythonRDD: Times: total = 421, boot = 381, init = 40, finish = 0  
 15/08/04 20:37:56 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 698 bytes result sent to driver  
 15/08/04 20:37:56 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 698 bytes result sent to driver  
 15/08/04 20:37:56 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 698 bytes result sent to driver  
 15/08/04 20:37:56 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 698 bytes result sent to driver  
 15/08/04 20:37:56 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 552 ms on localhost (1/4)  
 15/08/04 20:37:56 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 560 ms on localhost (2/4)  
 15/08/04 20:37:56 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 562 ms on localhost (3/4)  
 15/08/04 20:37:56 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 626 ms on localhost (4/4)  
 15/08/04 20:37:56 INFO DAGScheduler: ResultStage 0 (count at <stdin>:1) finished in 0.641 s  
 15/08/04 20:37:56 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool   
 15/08/04 20:37:56 INFO DAGScheduler: Job 0 finished: count at <stdin>:1, took 1.137405 s  
 1000  
 >>>   

Looks good. The next example calculates pi using Spark; a rough Java sketch of the idea follows below, and then we run the bundled SparkPi example.
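
Conceptually, SparkPi estimates pi by sampling random points in a square and counting how many fall inside the unit circle. Here is a rough Java sketch of that idea; it is my own simplified version, not the bundled example itself, with the same classpath assumption as before.

 import java.util.ArrayList;
 import java.util.List;
 
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.api.java.function.Function;
 
 public class PiEstimate {
   public static void main(String[] args) {
     SparkConf conf = new SparkConf().setAppName("PiEstimate").setMaster("local[*]");
     JavaSparkContext sc = new JavaSparkContext(conf);
 
     final int samples = 100000;
     List<Integer> seeds = new ArrayList<Integer>(samples);
     for (int i = 0; i < samples; i++) {
       seeds.add(i);
     }
 
     // Count how many random points land inside the unit circle.
     long inside = sc.parallelize(seeds).filter(new Function<Integer, Boolean>() {
       public Boolean call(Integer i) {
         double x = Math.random() * 2 - 1;
         double y = Math.random() * 2 - 1;
         return x * x + y * y < 1;
       }
     }).count();
 
     System.out.println("Pi is roughly " + 4.0 * inside / samples);
     sc.stop();
   }
 }

And here is the bundled SparkPi example being run: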

 user@localhost:~/Desktop/spark-1.4.1$ ./bin/run-example SparkPi  
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties  
 15/08/04 20:44:50 INFO SparkContext: Running Spark version 1.4.1  
 15/08/04 20:44:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable  
 15/08/04 20:44:51 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.1.1; using 182.168.133.28 instead (on interface eth0)  
 15/08/04 20:44:51 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address  
 15/08/04 20:44:51 INFO SecurityManager: Changing view acls to: user  
 15/08/04 20:44:51 INFO SecurityManager: Changing modify acls to: user  
 15/08/04 20:44:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)  
 15/08/04 20:44:52 INFO Slf4jLogger: Slf4jLogger started  
 15/08/04 20:44:52 INFO Remoting: Starting remoting  
 15/08/04 20:44:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@182.168.133.28:45817]  
 15/08/04 20:44:53 INFO Utils: Successfully started service 'sparkDriver' on port 45817.  
 15/08/04 20:44:53 INFO SparkEnv: Registering MapOutputTracker  
 15/08/04 20:44:53 INFO SparkEnv: Registering BlockManagerMaster  
 15/08/04 20:44:53 INFO DiskBlockManager: Created local directory at /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/blockmgr-5ed813af-a26f-413c-bdfc-1e08001f9cb2  
 15/08/04 20:44:53 INFO MemoryStore: MemoryStore started with capacity 265.4 MB  
 15/08/04 20:44:53 INFO HttpFileServer: HTTP File server directory is /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/httpd-f07ff755-e34d-4149-b4ac-399e6897221a  
 15/08/04 20:44:53 INFO HttpServer: Starting HTTP Server  
 15/08/04 20:44:53 INFO Utils: Successfully started service 'HTTP file server' on port 50955.  
 15/08/04 20:44:53 INFO SparkEnv: Registering OutputCommitCoordinator  
 15/08/04 20:44:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.  
 15/08/04 20:44:54 INFO SparkUI: Started SparkUI at http://182.168.133.28:4040  
 15/08/04 20:44:58 INFO SparkContext: Added JAR file:/home/user/Desktop/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.2.0.jar at http://182.168.133.28:50955/jars/spark-examples-1.4.1-hadoop2.2.0.jar with timestamp 1438692298221  
 15/08/04 20:44:58 INFO Executor: Starting executor ID driver on host localhost  
 15/08/04 20:44:58 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45731.  
 15/08/04 20:44:58 INFO NettyBlockTransferService: Server created on 45731  
 15/08/04 20:44:58 INFO BlockManagerMaster: Trying to register BlockManager  
 15/08/04 20:44:58 INFO BlockManagerMasterEndpoint: Registering block manager localhost:45731 with 265.4 MB RAM, BlockManagerId(driver, localhost, 45731)  
 15/08/04 20:44:58 INFO BlockManagerMaster: Registered BlockManager  
 15/08/04 20:44:59 INFO SparkContext: Starting job: reduce at SparkPi.scala:35  
 15/08/04 20:44:59 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2 output partitions (allowLocal=false)  
 15/08/04 20:44:59 INFO DAGScheduler: Final stage: ResultStage 0(reduce at SparkPi.scala:35)  
 15/08/04 20:44:59 INFO DAGScheduler: Parents of final stage: List()  
 15/08/04 20:44:59 INFO DAGScheduler: Missing parents: List()  
 15/08/04 20:44:59 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31), which has no missing parents  
 15/08/04 20:44:59 INFO MemoryStore: ensureFreeSpace(1888) called with curMem=0, maxMem=278302556  
 15/08/04 20:44:59 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 265.4 MB)  
 15/08/04 20:44:59 INFO MemoryStore: ensureFreeSpace(1202) called with curMem=1888, maxMem=278302556  
 15/08/04 20:44:59 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 265.4 MB)  
 15/08/04 20:44:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:45731 (size: 1202.0 B, free: 265.4 MB)  
 15/08/04 20:44:59 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874  
 15/08/04 20:44:59 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31)  
 15/08/04 20:44:59 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks  
 15/08/04 20:44:59 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1446 bytes)  
 15/08/04 20:44:59 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1446 bytes)  
 15/08/04 20:44:59 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)  
 15/08/04 20:44:59 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)  
 15/08/04 20:44:59 INFO Executor: Fetching http://182.168.133.28:50955/jars/spark-examples-1.4.1-hadoop2.2.0.jar with timestamp 1438692298221  
 15/08/04 20:45:00 INFO Utils: Fetching http://182.168.133.28:50955/jars/spark-examples-1.4.1-hadoop2.2.0.jar to /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/userFiles-f3a72f24-78e5-4d5d-82eb-dcc8c6b899cb/fetchFileTemp5981400277552657211.tmp  
 15/08/04 20:45:03 INFO Executor: Adding file:/tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/userFiles-f3a72f24-78e5-4d5d-82eb-dcc8c6b899cb/spark-examples-1.4.1-hadoop2.2.0.jar to class loader  
 15/08/04 20:45:03 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 736 bytes result sent to driver  
 15/08/04 20:45:03 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 736 bytes result sent to driver  
 15/08/04 20:45:03 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3722 ms on localhost (1/2)  
 15/08/04 20:45:03 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 3685 ms on localhost (2/2)  
 15/08/04 20:45:03 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool   
 15/08/04 20:45:03 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:35) finished in 3.750 s  
 15/08/04 20:45:03 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:35, took 4.032610 s  
 Pi is roughly 3.14038  
 15/08/04 20:45:03 INFO SparkUI: Stopped Spark web UI at http://182.168.133.28:4040  
 15/08/04 20:45:03 INFO DAGScheduler: Stopping DAGScheduler  
 15/08/04 20:45:03 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!  
 15/08/04 20:45:03 INFO Utils: path = /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/blockmgr-5ed813af-a26f-413c-bdfc-1e08001f9cb2, already present as root for deletion.  
 15/08/04 20:45:03 INFO MemoryStore: MemoryStore cleared  
 15/08/04 20:45:03 INFO BlockManager: BlockManager stopped  
 15/08/04 20:45:03 INFO BlockManagerMaster: BlockManagerMaster stopped  
 15/08/04 20:45:03 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!  
 15/08/04 20:45:03 INFO SparkContext: Successfully stopped SparkContext  
 15/08/04 20:45:03 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.  
 15/08/04 20:45:03 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.  
 15/08/04 20:45:03 INFO Utils: Shutdown hook called  
 15/08/04 20:45:03 INFO Utils: Deleting directory /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9  

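The line "Pi is roughly 3.14038" comes from a Monte Carlo estimate: random points are thrown into a square, the fraction that land inside the inscribed circle approaches pi/4, and Spark simply spreads that sampling across partitions with map and reduce. Below is a minimal single-threaded Java sketch of the same estimate (my own illustration, not Spark's actual code):

import java.util.concurrent.ThreadLocalRandom;

public class PiEstimate {

    public static void main(String[] args) {
        int samples = 1000000;
        int inside = 0;

        for (int i = 0; i < samples; i++) {
            // Throw a random dart into the square [-1, 1] x [-1, 1].
            double x = ThreadLocalRandom.current().nextDouble(-1.0, 1.0);
            double y = ThreadLocalRandom.current().nextDouble(-1.0, 1.0);
            // Count the darts that land inside the unit circle.
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }

        // circle area / square area = pi / 4, so the ratio times 4 estimates pi.
        System.out.println("Pi is roughly " + 4.0 * inside / samples);
    }
}

More samples, or more partitions in the Spark version, tighten the estimate around 3.14159.
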
With this introduction, you should have an idea of what Spark is all about: you can basically use Spark to do distributed processing. These tutorials give a quick idea of what Spark is and how you can use it. It is definitely worthwhile to look into the examples directory to see what Spark can really do for you. Before I end this, I think these two links are very helpful to get you further.

http://spark.apache.org/docs/latest/quick-start.html

http://spark.apache.org/docs/latest/#launching-on-a-cluster

Friday, March 13, 2015

check out java wait, notify and notifyAll method

If you have been programming Java for a while, you have most probably encountered methods such as wait, notify and notifyAll. Example screenshot below.

java_object_notify

So today, we will take a look at these methods, what they are about, and example code that uses them. First, let's read the javadoc for these methods.

wait()
public final void wait()
throws InterruptedException

 

Causes the current thread to wait until another thread invokes the notify() method or the notifyAll() method for this object. In other words, this method behaves exactly as if it simply performs the call wait(0).

 

The current thread must own this object's monitor. The thread releases ownership of this monitor and waits until another thread notifies threads waiting on this object's monitor to wake up either through a call to the notify method or the notifyAll method. The thread then waits until it can re-obtain ownership of the monitor and resumes execution.

 

As in the one argument version, interrupts and spurious wakeups are possible, and this method should always be used in a loop:

 

synchronized (obj) {
    while (<condition does not hold>)
        obj.wait();
    ... // Perform action appropriate to condition
}

 

This method should only be called by a thread that is the owner of this object's monitor. See the notify method for a description of the ways in which a thread can become the owner of a monitor.

 

Throws:
IllegalMonitorStateException - if the current thread is not the owner of the object's monitor.
InterruptedException - if any thread interrupted the current thread before or while the current thread was waiting for a notification. The interrupted status of the current thread is cleared when this exception is thrown.

notify()
public final void notify()

 

Wakes up a single thread that is waiting on this object's monitor. If any threads are waiting on this object, one of them is chosen to be awakened. The choice is arbitrary and occurs at the discretion of the implementation. A thread waits on an object's monitor by calling one of the wait methods.

 

The awakened thread will not be able to proceed until the current thread relinquishes the lock on this object. The awakened thread will compete in the usual manner with any other threads that might be actively competing to synchronize on this object; for example, the awakened thread enjoys no reliable privilege or disadvantage in being the next thread to lock this object.

 

This method should only be called by a thread that is the owner of this object's monitor. A thread becomes the owner of the object's monitor in one of three ways:

 

By executing a synchronized instance method of that object.
By executing the body of a synchronized statement that synchronizes on the object.
For objects of type Class, by executing a synchronized static method of that class.

 

Only one thread at a time can own an object's monitor.

 

Throws:
IllegalMonitorStateException - if the current thread is not the owner of this object's monitor.

notifyAll()
public final void notifyAll()

 

Wakes up all threads that are waiting on this object's monitor. A thread waits on an object's monitor by calling one of the wait methods.

 

The awakened threads will not be able to proceed until the current thread relinquishes the lock on this object. The awakened threads will compete in the usual manner with any other threads that might be actively competing to synchronize on this object; for example, the awakened threads enjoy no reliable privilege or disadvantage in being the next thread to lock this object.

 

This method should only be called by a thread that is the owner of this object's monitor. See the notify method for a description of the ways in which a thread can become the owner of a monitor.

 

Throws:
IllegalMonitorStateException - if the current thread is not the owner of this object's monitor.

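To make the three ways of owning a monitor concrete, here is a minimal sketch (the class, field and method names are made up for illustration):

public class MonitorOwnership {

    private int instanceCounter;
    private static int classCounter;

    // 1. Executing a synchronized instance method of the object: the thread
    //    owns the monitor of "this" while the method body runs.
    public synchronized void incrementViaSynchronizedMethod() {
        instanceCounter++;
    }

    // 2. Executing the body of a synchronized statement that synchronizes on
    //    the object: the thread owns the monitor of the object in parentheses.
    public void incrementViaSynchronizedBlock() {
        synchronized (this) {
            instanceCounter++;
        }
    }

    // 3. For objects of type Class, executing a synchronized static method:
    //    the thread owns the monitor of MonitorOwnership.class.
    public static synchronized void incrementViaSynchronizedStaticMethod() {
        classCounter++;
    }
}
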
Careful readers like you might notice that all these methods belong to the class Object, and remember that every object in Java has Object as its superclass. That answers why these methods always exist on all objects.

Now, it is pretty clear that these methods are used when two or more threads interact with each other. You might ask when or why you should use them. If you have code where a thread sleeps for a second (or less), wakes up to poll whether a queue has been filled, and goes back to sleep, that wastes resources such as expensive CPU cycles. In that case you should change the code to use these methods instead; a small sketch of the difference follows.

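Here is a minimal sketch of that difference, assuming a shared string queue (the class and method names are my own, not from any particular library):

import java.util.LinkedList;
import java.util.Queue;

public class PollingVersusWaiting {

    private final Queue<String> queue = new LinkedList<String>();

    // Wasteful: this consumer wakes up every 100 ms just to check the queue,
    // burning CPU cycles and adding latency even when nothing has arrived.
    public String takeByPolling() throws InterruptedException {
        while (true) {
            synchronized (queue) {
                if (!queue.isEmpty()) {
                    return queue.remove();
                }
            }
            Thread.sleep(100);
        }
    }

    // Better: wait() parks this consumer until a producer calls notify() or
    // notifyAll() on the same monitor after adding an item.
    public String takeByWaiting() throws InterruptedException {
        synchronized (queue) {
            while (queue.isEmpty()) {
                queue.wait();
            }
            return queue.remove();
        }
    }

    // Producer side: add an item and wake up one waiting consumer.
    public void put(String item) {
        synchronized (queue) {
            queue.add(item);
            queue.notify();
        }
    }
}

The waiting consumer and the producer synchronize on the same queue object, which is the same pattern the examples below use.
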
Okay, reasons given; now let's read some example code that uses these methods. I googled around and many people give a calculator example, so I will use that as my first example too. >:)
public class Calculator {

    public static void main(String[] args) {

        ThreadB b = new ThreadB();
        b.start();

        synchronized (b) {
            try {
                System.out.println("Waiting for b to complete...");
                b.wait();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("total is : " + b.total);
        }

    }

    static class ThreadB extends Thread {
        int total;

        @Override
        public void run() {
            System.out.println("calculate start");
            synchronized (this) {
                for (int i = 0; i < 100; i++) {
                    total += i;
                }
                notify();
            }
            System.out.println("calculate done");
        }
    }

}

output

Waiting for b to complete...
calculate start
calculate done
total is : 4950

As can be read above, we have a Calculator class with a static inner class that sums the numbers 0 through 99. Notice that a thread is started in the main method. The main thread synchronizes on object b and waits on it until it is notified. Inside thread b, the run method loops through the summation, and once that long-running summation is done, thread b calls notify. The total is then printed in the main method. As homework, try modifying the code by removing the synchronized keyword and the wait and notify calls, then run the application again and observe the output: it will not be consistent and the result will not be as expected.

Okay, let's go to a more complex example. Imagine a simple clinic where many patients are waiting to be seen by a doctor, and a nurse hands the patients over to the doctor one at a time. With that said, let's lay out and read the code.
import java.util.Queue;
import java.util.Random;

public class Doctor extends Thread {

    private Queue<String> patientQueue;

    public Doctor(Queue<String> patientQueue) {
        this.patientQueue = patientQueue;
    }

    @Override
    public void run() {
        try {
            while (true) {
                synchronized (patientQueue) {
                    if (patientQueue.size() == 0) {
                        patientQueue.wait();
                    }

                    String patient = patientQueue.remove();

                    System.out.println("treating patient: " + patient + " total patient in queue " + patientQueue.size());
                    Thread.sleep(duration(2, 3));
                }
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    private static int duration(int min, int max) {

        Random rand = new Random();

        // nextInt is normally exclusive of the top value,
        // so add 1 to make it inclusive
        int randomNum = rand.nextInt((max - min) + 1) + min;
        randomNum *= 1000;

        return randomNum;
    }

}

import java.util.Queue;

public class Nurse extends Thread {

    static final int MAX = 10;
    private int i;

    private Queue<String> patientQueue;

    public Nurse(Queue<String> patientQueue) {
        this.patientQueue = patientQueue;
    }

    @Override
    public void run() {
        try {
            while (true) {
                patientAddedToQueue();
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    private synchronized void patientAddedToQueue() throws InterruptedException {
        synchronized (patientQueue) {
            while (patientQueue.size() == MAX) {
                patientQueue.wait(10000);
                System.out.println("done wait");
            }
            String patientName = "patient " + i;
            patientQueue.add(patientName);
            i++;
            System.out.println("adding patient : " + patientName);
            patientQueue.notify();
        }
    }

}

import java.util.LinkedList;
import java.util.Queue;

public class Clinics {

    public static void main(String args[]) {

        Queue<String> patientQueue = new LinkedList<String>();

        Doctor veelan = new Doctor(patientQueue);
        veelan.start();

        Nurse gomathy = new Nurse(patientQueue);
        gomathy.start();

    }

}

Example output
adding patient : patient 0
adding patient : patient 1
adding patient : patient 2
adding patient : patient 3
adding patient : patient 4
adding patient : patient 5
adding patient : patient 6
adding patient : patient 7
adding patient : patient 8
adding patient : patient 9
treating patient: patient 0 total patient in queue 9
treating patient: patient 1 total patient in queue 8
treating patient: patient 2 total patient in queue 7
treating patient: patient 3 total patient in queue 6
treating patient: patient 4 total patient in queue 5
treating patient: patient 5 total patient in queue 4
treating patient: patient 6 total patient in queue 3
treating patient: patient 7 total patient in queue 2
treating patient: patient 8 total patient in queue 1
treating patient: patient 9 total patient in queue 0
done wait
adding patient : patient 10
adding patient : patient 11
adding patient : patient 12
adding patient : patient 13
adding patient : patient 14
adding patient : patient 15
adding patient : patient 16
adding patient : patient 17
adding patient : patient 18
adding patient : patient 19
treating patient: patient 10 total patient in queue 9
treating patient: patient 11 total patient in queue 8

As you can read above, we have three classes: Doctor, Nurse and Clinics. The entry point is the Clinics class, which binds the other two together. A queue is shared between the Doctor and Nurse classes. In this mock-up, the Nurse "accepts" new patients in the private method patientAddedToQueue; when the queue is full at 10 patients, the nurse waits up to 10 seconds before accepting more. Why a 10 second timeout? I will explain later. If the queue holds fewer than 10 patients, a patient is added to the queue and the notify method is called.

In the Doctor class's run method, we can read that the patientQueue object is synchronized. If the queue size is zero, there is no point in calling patientQueue's remove method, so the doctor waits instead. ;-) The doctor thread then sleeps between 2 and 3 seconds to simulate the doctor treating the patient.

Now, imagine nurse gomathy has a queue of size 10 and is in the wait state, while doctor veelan gradually works through patientQueue, so the queue slowly comes down. Eventually the queue reaches 0, and at that point both the gomathy nurse object and the veelan doctor object are in the wait state. Now you can see why the 10 second timeout matters: once it expires, gomathy starts to receive more patients. :-) Designing a better interaction between the nurse thread and the doctor thread is probably another homework exercise for you, the reader. Also try adding one more doctor and changing notify to notifyAll; a sketch of that change follows.

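Here is a minimal sketch of that homework, reusing the Doctor and Nurse classes above (the extra class name is my own suggestion, not part of the original example):

import java.util.LinkedList;
import java.util.Queue;

public class ClinicsWithTwoDoctors {

    public static void main(String args[]) {

        Queue<String> patientQueue = new LinkedList<String>();

        // Two doctors now compete for the same queue.
        Doctor veelan = new Doctor(patientQueue);
        veelan.start();

        Doctor anotherDoctor = new Doctor(patientQueue);
        anotherDoctor.start();

        Nurse gomathy = new Nurse(patientQueue);
        gomathy.start();

        // In Nurse.patientAddedToQueue(), change
        //     patientQueue.notify();
        // to
        //     patientQueue.notifyAll();
        // and in Doctor.run(), change
        //     if (patientQueue.size() == 0) { patientQueue.wait(); }
        // to
        //     while (patientQueue.size() == 0) { patientQueue.wait(); }
        // so a doctor that wakes up to an already-emptied queue simply waits
        // again instead of calling remove() on an empty queue.
    }
}
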
Well, that's it for this article. I hope you learned something, and remember to contribute back.

Friday, December 19, 2014

Setting up eclipse with Android development environment

Today, we will explore something different. Android is a very well known brand now, and it can be found in devices such as smartphones, tablets, wearables like watches and glasses, and end user devices like TVs and smart appliances. Here we will take a look from the programming point of view. This is a fresh-start article: if you come from a Java development background and have zero knowledge of Android development, then this article is for you. More Android development articles will come later.

To reduce the learning curve and to have a better learning and development experience with Android, you should set up the Android Developer Tools (ADT) plugin in an IDE. If you have been developing Java for some time, IDEs such as Eclipse or NetBeans should come to mind. Eclipse has always been my favourite IDE for Java development, and in this article I will share how to install the ADT plugin in Eclipse for the Android environment.

This article is based on this instruction. You should refer to it for any changes or errors encountered if you are reading this article in the future. As of this writing, I'm using Eclipse Luna for this ADT plugin installation. Let's start the installation.

  1. Launch Eclipse, click Help, then Install New Software..., and then click the Add button.

  2. We will add the ADT plugin repository to this Eclipse IDE. In the pop-up window, fill in the following and then click OK.
    Name: ADT Plugin
    Location: https://dl-ssl.google.com/android/eclipse/
    eclipse_add_repository_adt

  3. Wait a while for Eclipse to pull in the update site, then in the Available Software dialog, select the checkbox next to Developer Tools and click Next.
    eclipse_available_software

  4. In the next window, you'll see a list of the tools to be downloaded. Click Next.

  5. Read and accept the license agreements, then click Finish.
    If you get a security warning saying that the authenticity or validity of the software can't be established, click OK.

  6. When the installation completes, restart Eclipse.

  7. Once Eclipse restarts, you must specify the location of your Android SDK directory. Because this is a new installation, there is no Android SDK installed yet, so click Open Preferences.
    eclipse_welcome_to_android_development eclipse_android_sdk

  8. In the "Welcome to Android Development" window that appears, select Install new SDK. Then wait until the installation is complete.
    eclipse_welcome_to_android_development_install_new_sdk
    eclipse_welcome_to_android_development_installing_new_sdk

  9. When the SDK is installed, a new window pops up requesting that new build tools be installed. Click Open SDK Manager.
    eclipse_android_sdk_install_build-tools

  10. There are preselected build tools; as of this moment, just accept the default selection and click the Install button. We can install more tools later.
    eclipse_android_sdk_managereclipse_android_sdk_manager_confirm_install_packages

  11. Wait until the installation is complete, and you are done.


That's it. Now you can start your first session of Android hello world development.