Saturday, September 12, 2015

First look into Pentaho


Today, we will look into another big data technology: Pentaho.

Pentaho is a company that offers Pentaho Business Analytics, a suite of open source Business Intelligence (BI) products which provide data integration, OLAP services, reporting, dashboarding, data mining and ETL capabilities.[1] Pentaho was founded in 2004 by five founders[2][3] and is headquartered in Orlando, FL, USA.[4]

The Pentaho suite comes in two editions, enterprise and community. We will pick the community edition to explore Pentaho further. With that said, let's download the free community Pentaho suite. I downloaded Pentaho version 5.4 for this learning experience.

The download takes some time as it is a huge file: 603MB compressed. It is difficult to even figure out where to begin with the installation. I spent some time googling and searching around; although there is data-integration/docs/English/install_graphical.pdf, it still took too much time to fiddle with just the installation. I don't think this is good motivation for a newcomer.

As a starter, I just want to quickly experience what Pentaho is. I think there should be a better way to jump-start the Pentaho suite quickly. Sorry, this is a short article.

Friday, September 11, 2015

Amazon AWSome Day 2015 Kuala Lumpur, Malaysia

Recently (actually yesterday) I attended an event organized by Amazon known as AWSome Day in Kuala Lumpur, Malaysia. From the agenda, it looked to me like Amazon would focus more on development and technical topics in this one-day event. So I registered, particularly interested in NoSQL from Amazon and in how far Malaysia has adopted big data technologies.

To begin with, I did not expect that many people to turn up. I have been to many seminars, webinars and forum discussions, and I have seldom seen that many Malaysian participants. That got me excited when I arrived.




People were queuing up for their turn to register for their badges. As you can see above, food and beverages were everywhere in the lobby. Throughout the event, food and coffee were served as if you could eat and drink as much as you wanted. It was a pity that when I asked for the goodies bag containing the Amazon manual, they had run out, but the helpful staff said they could send me the softcopy.

The event started at around 9am in the grand ballroom...



My badge number was 937, so I suppose there were around one thousand attendees! The pictures speak for themselves. A show-of-hands survey by the speaker showed that many developers, system administrators, software engineers, etc. came to this event.

The first half of the event was boring, perhaps because I expected the talks to focus more on technical than business matters. It was mostly about selling Amazon Web Services and convincing people to come on board with its attractive pricing. The speaker explained how the I.T. world is changing, how Amazon fills that role, and its pricing. All the marketing jargon to impress. :-) If you know what I mean.




The second half of the event was more technical. I wish the speaker could have elaborated much longer, but due to time constraints it was brief.




Topics such as SSL termination on the Amazon load balancer and Amazon's NoSQL technologies, such as DynamoDB and Amazon ElastiCache, caught my attention. I was pretty surprised that when the speaker did another show-of-hands survey on how many attendees knew about NoSQL, almost nobody raised a hand. So I would say more big data jobs are coming to Malaysia, and Malaysia is still very young with this new technology. The speaker also talked about using metrics from CloudWatch together with Auto Scaling to provision new servers into the cloud, which attracted me. Being a devops and software engineer day to day, I am interested in which metrics, checked how often, should actually result in a new server being provisioned into the cloud server group.
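To make the question of "which metric, checked how often" a bit more concrete, here is a minimal sketch. It assumes a simple rule that loosely mirrors the threshold and evaluation-period idea of a CloudWatch alarm: scale out only when a metric (say, average CPU in percent) stays above a threshold for several consecutive periods. The class `ScaleOutAlarm` and the numbers are hypothetical; nothing here calls an AWS API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, NOT the AWS API: decide whether to add a server
// based on the most recent datapoints of one metric.
class ScaleOutAlarm {
    private final double threshold;      // e.g. average CPU utilization in percent
    private final int evaluationPeriods; // consecutive periods that must breach

    ScaleOutAlarm(double threshold, int evaluationPeriods) {
        this.threshold = threshold;
        this.evaluationPeriods = evaluationPeriods;
    }

    // True only when the last `evaluationPeriods` datapoints are all above the threshold.
    boolean shouldScaleOut(List<Double> datapoints) {
        if (datapoints.size() < evaluationPeriods) {
            return false; // not enough data collected yet
        }
        for (int i = datapoints.size() - evaluationPeriods; i < datapoints.size(); i++) {
            if (datapoints.get(i) <= threshold) {
                return false; // one period back under the threshold breaks the streak
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ScaleOutAlarm alarm = new ScaleOutAlarm(70.0, 3);
        // a single spike does not provision a new server
        System.out.println(alarm.shouldScaleOut(Arrays.asList(40.0, 90.0, 50.0, 60.0))); // false
        // three consecutive breaches do
        System.out.println(alarm.shouldScaleOut(Arrays.asList(40.0, 75.0, 80.0, 85.0))); // true
    }
}
```

Requiring several consecutive breaches is the usual trade-off: it avoids paying for a new server on every short spike, at the cost of reacting a few periods later.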

Last but not least, there was a lucky draw, and certificates of attendance were handed out to all the attendees. I wish there were more technical seminars like this, or even pure developer seminars focusing on these topics. It would be great if events on NoSQL and big data technologies like Cassandra, Elasticsearch and Hadoop happened in Malaysia in the near future.


Sunday, August 30, 2015

First learning into Cloudera Impala

Let's take a look at a vendor's big data technology today. In this article, we will look at Cloudera Impala. So what is Impala all about?

Wikipedia definition:

Cloudera Impala is Cloudera's open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.[1]

And from the official GitHub repository:

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters. 
Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets you analyze, transform and combine data from a variety of data sources.
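"Massively parallel processing" in the definitions above roughly means that every node works on its own slice of the data and a coordinator combines the partial results. The Java sketch below illustrates only that idea, assuming threads stand in for cluster nodes and in-memory lists stand in for data files. `MppSumSketch` is a hypothetical name; this is not Impala code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual sketch only: each "node" sums its own partition, then a
// coordinator merges the partial sums, roughly what SELECT SUM(x) FROM t
// needs when t is spread over many machines.
class MppSumSketch {

    static long parallelSum(List<List<Long>> partitions) {
        ExecutorService nodes = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Long> partition : partitions) {
                // each worker scans only its local partition
                partials.add(nodes.submit(() -> {
                    long sum = 0;
                    for (long v : partition) {
                        sum += v;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                try {
                    total += partial.get(); // coordinator merges the partial aggregates
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while merging", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a partition failed", e.getCause());
                }
            }
            return total;
        } finally {
            nodes.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Long>> partitions = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                Arrays.asList(4L, 5L),
                Arrays.asList(6L));
        System.out.println(parallelSum(partitions)); // 21
    }
}
```

Aggregates like SUM and COUNT split cleanly this way because partial results can simply be added together at the end.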

Let us download a virtual machine image. This is a good approach because Impala works in integration with Hadoop, and without the image you would have to establish a Hadoop cluster first (which requires Hadoop knowledge) before integrating it with Impala. With the virtual machine image, it is as easy as importing it into the host and powering it up. It also saves you setup time and reduces errors.

With that said, I'm downloading the VirtualBox image. Once it is downloaded, extract it to a directory. If you have not installed VirtualBox yet, install it now with apt-get install virtualbox virtualbox-guest-additions-iso, and make sure the virtualbox service is running.

 root@localhost:~# /etc/init.d/virtualbox status  
 ● virtualbox.service - LSB: VirtualBox Linux kernel module  
   Loaded: loaded (/etc/init.d/virtualbox)  
   Active: active (exited) since Thu 2015-08-20 17:07:43 MYT; 2min 36s ago  
    Docs: man:systemd-sysv-generator(8)  
  Process: 29390 ExecStop=/etc/init.d/virtualbox stop (code=exited, status=0/SUCCESS)  
  Process: 29425 ExecStart=/etc/init.d/virtualbox start (code=exited, status=0/SUCCESS)  
   
 Aug 20 17:07:43 localhost systemd[1]: Starting LSB: VirtualBox Linux kernel module...  
 Aug 20 17:07:43 localhost systemd[1]: Started LSB: VirtualBox Linux kernel module.  
 Aug 20 17:07:43 localhost virtualbox[29425]: Starting VirtualBox kernel modules.  

Launch VirtualBox and add the virtual image to a new instance; see the screenshot below.




Now power this virtual machine up! Please be patient, as it takes a long time to boot, at least on my PC; you might want to get a drink in the meantime. The rest of this article follows this tutorial. However, I gave up because the select statement took a long time and was very slow in the virtual environment, at least for me. But I will illustrate up to the point where it became slow.

First, you need to copy the CSV files (tab1.csv and tab2.csv) into the virtual machine.







Then you can load the script with the SQL that creates the tables and loads the CSV files into them. The example given in the tutorial does not create a database, so I suggest you add the following two lines to the top of the script before loading it.

 create database testdb;  
 use testdb;  
 DROP TABLE IF EXISTS tab1;  
 -- The EXTERNAL clause means the data is located outside the central location  
 -- for Impala data files and is preserved when the associated Impala table is dropped.  
 -- We expect the data to already ex  



After that, you can run impala-shell and issue SQL queries, but as you can see, the select statement just hangs there forever.



Not a good experience, but if Impala is what you need, find out what the problem is and let me know. :-)

Saturday, August 29, 2015

First time learning Apache HBase

Today, we will take a look at another big data technology. Apache HBase is the topic for today, and before we dip our toes into it, let's find out what Apache HBase actually is.

Apache HBase [1] is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al.[2] Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop [3].

In this article, we will set up a single node for this adventure. Before we begin, download a copy of Apache HBase here. Once downloaded, extract the compressed content. At the time of this writing, I'm using Apache HBase version 1.1.1 for this learning experience.

 user@localhost:~/Desktop/hbase-1.1.1$ ls  
 bin CHANGES.txt conf     docs hbase-webapps lib LICENSE.txt NOTICE.txt README.txt  

If you have not installed Java, go ahead and install it. Pick a recent Java, at least Java 7. Make sure the terminal reports the correct Java version, for example:

 user@localhost:~/Desktop/hbase-1.1.1$ java -version  
 java version "1.7.0_55"  
 Java(TM) SE Runtime Environment (build 1.7.0_55-b13)  
 Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)  

If you cannot change the system's Java configuration, uncomment the JAVA_HOME variable in the HBase configuration file conf/hbase-env.sh and set it to the Java you installed. The main configuration file for HBase is conf/hbase-site.xml; edit it so that it looks like the following, changing the paths to suit your environment.

 user@localhost:~/Desktop/hbase-1.1.1$ cat conf/hbase-site.xml   
 <?xml version="1.0"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <!--  
 /**  
  *  
  * Licensed to the Apache Software Foundation (ASF) under one  
  * or more contributor license agreements. See the NOTICE file  
  * distributed with this work for additional information  
  * regarding copyright ownership. The ASF licenses this file  
  * to you under the Apache License, Version 2.0 (the  
  * "License"); you may not use this file except in compliance  
  * with the License. You may obtain a copy of the License at  
  *  
  *   http://www.apache.org/licenses/LICENSE-2.0  
  *  
  * Unless required by applicable law or agreed to in writing, software  
  * distributed under the License is distributed on an "AS IS" BASIS,  
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
  * See the License for the specific language governing permissions and  
  * limitations under the License.  
  */  
 -->  
 <configuration>  
  <property>  
   <name>hbase.rootdir</name>  
   <value>file:///home/user/Desktop/hbase-1.1.1</value>  
  </property>  
  <property>  
   <name>hbase.zookeeper.property.dataDir</name>  
   <value>/home/user/zookeeper</value>  
  </property>  
 </configuration>  

Okay, we are ready to start HBase. Start it with the helpful script bin/start-hbase.sh.

 user@localhost:~/Desktop/hbase-1.1.1$ bin/start-hbase.sh   
 starting master, logging to /home/user/Desktop/hbase-1.1.1/bin/../logs/hbase-user-master-localhost.out  
   
 user@localhost:~/Desktop/hbase-1.1.1/logs$ tail -F hbase-user-master-localhost.out SecurityAuth.audit hbase-user-master-localhost.log  
 ==> hbase-user-master-localhost.out <==  
   
 ==> SecurityAuth.audit <==  
 2015-08-18 17:49:41,533 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.1.1 port: 36745 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
 2015-08-18 17:49:46,812 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.0.1 port: 53042 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
 2015-08-18 17:49:48,309 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.0.1 port: 53043 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
 2015-08-18 17:49:49,317 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 127.0.0.1 port: 53044 with version info: version: "1.1.1" url: "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision: "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun 23 14:44:07 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"  
   
 ==> hbase-user-master-localhost.log <==  
 2015-08-18 17:49:49,281 INFO [StoreOpener-78a2a3664205fcf679d2043ac3259648-1] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=0, currentSize=831688, freeSize=808983544, maxSize=809815232, heapSize=831688, minSize=769324480, minFactor=0.95, multiSize=384662240, multiFactor=0.5, singleSize=192331120, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false  
 2015-08-18 17:49:49,282 INFO [StoreOpener-78a2a3664205fcf679d2043ac3259648-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.000000  
 2015-08-18 17:49:49,295 INFO [RS_OPEN_REGION-localhost:60631-0] regionserver.HRegion: Onlined 78a2a3664205fcf679d2043ac3259648; next sequenceid=2  
 2015-08-18 17:49:49,303 INFO [PostOpenDeployTasks:78a2a3664205fcf679d2043ac3259648] regionserver.HRegionServer: Post open deploy tasks for hbase:namespace,,1439891388424.78a2a3664205fcf679d2043ac3259648.  
 2015-08-18 17:49:49,322 INFO [PostOpenDeployTasks:78a2a3664205fcf679d2043ac3259648] hbase.MetaTableAccessor: Updated row hbase:namespace,,1439891388424.78a2a3664205fcf679d2043ac3259648. with server=localhost,60631,1439891378840  
 2015-08-18 17:49:49,332 INFO [AM.ZK.Worker-pool3-t6] master.RegionStates: Transition {78a2a3664205fcf679d2043ac3259648 state=OPENING, ts=1439891389276, server=localhost,60631,1439891378840} to {78a2a3664205fcf679d2043ac3259648 state=OPEN, ts=1439891389332, server=localhost,60631,1439891378840}  
 2015-08-18 17:49:49,603 INFO [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14f4036b87d0000 type:create cxid:0x1d5 zxid:0x44 txntype:-1 reqpath:n/a Error Path:/hbase/namespace/default Error:KeeperErrorCode = NodeExists for /hbase/namespace/default  
 2015-08-18 17:49:49,625 INFO [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14f4036b87d0000 type:create cxid:0x1d8 zxid:0x46 txntype:-1 reqpath:n/a Error Path:/hbase/namespace/hbase Error:KeeperErrorCode = NodeExists for /hbase/namespace/hbase  
 2015-08-18 17:49:49,639 INFO [localhost:51452.activeMasterManager] master.HMaster: Master has completed initialization  
 2015-08-18 17:49:49,642 INFO [localhost:51452.activeMasterManager] quotas.MasterQuotaManager: Quota support disabled  

As you can see, a log file is also available, and jps shows that an HMaster process is running.

 user@localhost: $ jps  
 22144 Jps  
 21793 HMaster  

Okay, let's experience Apache HBase using the HBase shell.

 user@localhost:~/Desktop/hbase-1.1.1$ ./bin/hbase shell  
 2015-08-18 17:55:25,134 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable  
 HBase Shell; enter 'help<RETURN>' for list of supported commands.  
 Type "exit<RETURN>" to leave the HBase Shell  
 Version 1.1.1, rd0a115a7267f54e01c72c603ec53e91ec418292f, Tue Jun 23 14:44:07 PDT 2015  
   
 hbase(main):001:0>   

The help command shows a very helpful description, such as the following.

 hbase(main):001:0> help  
 HBase Shell, version 1.1.1, rd0a115a7267f54e01c72c603ec53e91ec418292f, Tue Jun 23 14:44:07 PDT 2015  
 Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.  
 Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.  
   
 COMMAND GROUPS:  
  Group name: general  
  Commands: status, table_help, version, whoami  
   
  Group name: ddl  
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters  
   
  Group name: namespace  
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables  
   
  Group name: dml  
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve  
   
  Group name: tools  
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, split, trace, unassign, wal_roll, zk_dump  
   
  Group name: replication  
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs  
   
  Group name: snapshots  
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot  
   
  Group name: configuration  
  Commands: update_all_config, update_config  
   
  Group name: quotas  
  Commands: list_quotas, set_quota  
   
  Group name: security  
  Commands: grant, revoke, user_permission  
   
  Group name: visibility labels  
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility  
   
 SHELL USAGE:  
 Quote all names in HBase Shell such as table and column names. Commas delimit  
 command parameters. Type <RETURN> after entering a command to run it.  
 Dictionaries of configuration used in the creation and alteration of tables are  
 Ruby Hashes. They look like this:  
   
  {'key1' => 'value1', 'key2' => 'value2', ...}  
   
 and are opened and closed with curley-braces. Key/values are delimited by the  
 '=>' character combination. Usually keys are predefined constants such as  
 NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type  
 'Object.constants' to see a (messy) list of all constants in the environment.  
   
 If you are using binary keys or values and need to enter them in the shell, use  
 double-quote'd hexadecimal representation. For example:  
   
  hbase> get 't1', "key\x03\x3f\xcd"  
  hbase> get 't1', "key\003\023\011"  
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"  
   
 The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.  
 For more on the HBase Shell, see http://hbase.apache.org/book.html  
 hbase(main):002:0>   

To create a table named test with a column family named cf:

 hbase(main):002:0> create 'test', 'cf'  
 0 row(s) in 1.5700 seconds  
   
 => Hbase::Table - test  
 hbase(main):003:0>   

To confirm that the table exists, list it:

 hbase(main):001:0> list 'test'  
 TABLE                                                                                               
 test                                                                                               
 1 row(s) in 0.3530 seconds  
   
 => ["test"]  

Let's put something into the table we have just created.

 hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'  
 0 row(s) in 0.2280 seconds  
   
 hbase(main):003:0> put 'test', 'row2', 'cf:b', 'value2'  
 0 row(s) in 0.0140 seconds  
   
 hbase(main):004:0> put 'test', 'row3', 'cf:c', 'value3'  
 0 row(s) in 0.0060 seconds  
   
 hbase(main):005:0>   

Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are comprised of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.
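To picture what those three puts create, here is a toy in-memory model of the data layout described above and in the Bigtable paper: a sorted map from row key to `family:qualifier` column to timestamp to value. It is a hypothetical sketch for intuition only (`ToyHTable` is made up), not the HBase client API, and it assumes a single column family and string values, whereas real HBase stores bytes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model, NOT the HBase client API:
// row key -> "family:qualifier" -> timestamp -> value, every level kept sorted.
class ToyHTable {
    private final String family; // the single column family, e.g. "cf"
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    ToyHTable(String family) {
        this.family = family;
    }

    // like: put 'test', 'row1', 'cf:a', 'value1' (the timestamp is passed explicitly here)
    void put(String row, String column, long timestamp, String value) {
        String[] parts = column.split(":", 2); // "cf:a" -> family "cf", qualifier "a"
        if (parts.length != 2 || !parts[0].equals(family)) {
            throw new IllegalArgumentException("unknown column family in " + column);
        }
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    // like: get 'test', 'row1' -> newest version of each column in that row
    Map<String, String> get(String row) {
        Map<String, String> result = new TreeMap<>();
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns != null) {
            for (Map.Entry<String, NavigableMap<Long, String>> cell : columns.entrySet()) {
                result.put(cell.getKey(), cell.getValue().lastEntry().getValue());
            }
        }
        return result;
    }

    // like: scan 'test' -> row keys come back in sorted order
    List<String> scanRowKeys() {
        return new ArrayList<>(rows.keySet());
    }

    public static void main(String[] args) {
        ToyHTable test = new ToyHTable("cf");
        test.put("row2", "cf:b", 2L, "value2");
        test.put("row1", "cf:a", 1L, "value1");
        test.put("row1", "cf:a", 3L, "value1-updated"); // newer version of the same cell
        System.out.println(test.scanRowKeys()); // [row1, row2]
        System.out.println(test.get("row1"));   // {cf:a=value1-updated}
    }
}
```

Because rows are kept sorted by key, a scan returns them in row-key order, which matches the row1, row2, row3 ordering in the scan output of the session.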

To read all rows from the table, use scan.

 hbase(main):005:0> scan 'test'  
 ROW                       COLUMN+CELL                                                                   
  row1                      column=cf:a, timestamp=1439892359305, value=value1                                                
  row2                      column=cf:b, timestamp=1439892363921, value=value2                                                
  row3                      column=cf:c, timestamp=1439892369775, value=value3                                                
 3 row(s) in 0.0420 seconds  
   
 hbase(main):006:0>   

To get a single row, use get.

 hbase(main):006:0> get 'test', 'row1'  
 COLUMN                      CELL                                                                       
  cf:a                      timestamp=1439892359305, value=value1                                                      
 1 row(s) in 0.0340 seconds  
   
 hbase(main):007:0>   

Something really interesting about Apache HBase: if you want to delete a table or change its settings, you need to disable it first. After that, you can enable it again.

 hbase(main):007:0> disable 'test'  
 0 row(s) in 2.3610 seconds  
   
 hbase(main):008:0> enable 'test'  
 0 row(s) in 1.2790 seconds  
   
 hbase(main):009:0>   

Okay, now let's delete this table.

 hbase(main):009:0> drop 'test'  
   
 ERROR: Table test is enabled. Disable it first.  
   
 Here is some help for this command:  
 Drop the named table. Table must first be disabled:  
  hbase> drop 't1'  
  hbase> drop 'ns1:t1'  
   
   
 hbase(main):010:0> disable 'test'  
 0 row(s) in 2.2640 seconds  
   
 hbase(main):011:0> drop 'test'  
 0 row(s) in 1.2800 seconds  
   
 hbase(main):012:0>   
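The disable-before-drop rule in the session above can be summarized as a tiny state machine. This is a hypothetical model (`ToyTableAdmin` is made up), not HBase code; it only reproduces the rule that a newly created table is enabled and must be disabled before it can be dropped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the table lifecycle rule, NOT HBase code.
class ToyTableAdmin {
    private final Map<String, Boolean> enabled = new HashMap<>(); // table -> is it enabled?

    void create(String table) {
        enabled.put(table, true); // new tables start out enabled
    }

    void disable(String table) {
        requireExists(table);
        enabled.put(table, false);
    }

    void enable(String table) {
        requireExists(table);
        enabled.put(table, true);
    }

    void drop(String table) {
        requireExists(table);
        if (enabled.get(table)) {
            // same wording as the shell error above
            throw new IllegalStateException("Table " + table + " is enabled. Disable it first.");
        }
        enabled.remove(table);
    }

    boolean exists(String table) {
        return enabled.containsKey(table);
    }

    private void requireExists(String table) {
        if (!enabled.containsKey(table)) {
            throw new IllegalArgumentException("no such table: " + table);
        }
    }

    public static void main(String[] args) {
        ToyTableAdmin admin = new ToyTableAdmin();
        admin.create("test");
        try {
            admin.drop("test"); // fails: the table is still enabled
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Table test is enabled. Disable it first.
        }
        admin.disable("test");
        admin.drop("test"); // succeeds once disabled
        System.out.println(admin.exists("test")); // false
    }
}
```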

Okay, we are done with this basic learning. Let's quit for now.

 hbase(main):012:0> quit  
 user@localhost:~/Desktop/hbase-1.1.1$   

To stop the Apache HBase instance:

 user@localhost:~/Desktop/hbase-1.1.1$ ./bin/stop-hbase.sh   
 stopping hbase.................  
   
   
 user@localhost:~/Desktop/hbase-1.1.1$ jps  
 23399 Jps  
 5445 org.eclipse.equinox.launcher_1.3.0.v20140415-2008.jar  

If, like me, you came from Apache Cassandra, Apache HBase looks very similar. If this interests you, I shall leave you with the following three links to get you further.

http://hbase.apache.org/book.html

http://wiki.apache.org/hadoop/Hbase

https://blogs.apache.org/hbase/

Friday, August 28, 2015

First light learning into Apache Storm part 1

Today we will go through another piece of software, Apache Storm. According to the official Apache Storm GitHub repository:

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation.

Well, if, like me, you are new to Apache Storm, this may seem a bit vague. Fear not: in this series of articles we will go through some Apache Storm basics, such as installing Storm, setting up a Storm cluster and running a Storm hello world. There is also a good video that gives an introduction to Apache Storm.

If you study Storm, the three fundamental terms you will come across are spouts, bolts and topologies. These definitions are excerpted from this site link.

There are just three abstractions in Storm: spouts, bolts, and topologies. A spout is a source of streams in a computation. Typically a spout reads from a queueing broker such as Kestrel, RabbitMQ, or Kafka, but a spout can also generate its own stream or read from somewhere like the Twitter streaming API. Spout implementations already exist for most queueing systems.
A bolt processes any number of input streams and produces any number of new output streams. Most of the logic of a computation goes into bolts, such as functions, filters, streaming joins, streaming aggregations, talking to databases, and so on.
A topology is a network of spouts and bolts, with each edge in the network representing a bolt subscribing to the output stream of some other spout or bolt. A topology is an arbitrarily complex multi-stage stream computation. Topologies run indefinitely when deployed.
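To get a feel for how the three abstractions fit together before touching the Storm API, here is a plain-Java analogy: a "spout" that emits sentences, a "bolt" that splits them into words, and a "bolt" that keeps running word counts, wired together into a "topology". It is a hypothetical sketch only (`ToyTopology` is made up); it does not use Storm classes such as `TopologyBuilder`, runs in a single thread, and stops when its input is exhausted, unlike a deployed topology.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Plain-Java analogy of spout -> bolt -> bolt, NOT the Storm API.
class ToyTopology {

    // "Spout": the source of the stream; a fixed list here instead of a queue such as Kafka.
    static void sentenceSpout(List<String> sentences, Consumer<String> downstream) {
        for (String sentence : sentences) {
            downstream.accept(sentence);
        }
    }

    // "Bolt" 1: consumes the sentence stream and emits a new stream of words.
    static Consumer<String> splitBolt(Consumer<String> downstream) {
        return sentence -> {
            for (String word : sentence.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    downstream.accept(word);
                }
            }
        };
    }

    // "Bolt" 2: a streaming aggregation that keeps a running count per word.
    static Consumer<String> countBolt(Map<String, Integer> counts) {
        return word -> counts.merge(word, 1, Integer::sum);
    }

    // "Topology": wires spout -> split bolt -> count bolt and returns the counts.
    static Map<String, Integer> run(List<String> sentences) {
        Map<String, Integer> counts = new TreeMap<>();
        sentenceSpout(sentences, splitBolt(countBolt(counts)));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello storm", "hello world")));
        // {hello=2, storm=1, world=1}
    }
}
```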


Let's first download and install Apache Storm. Pick a stable version here, download it and then extract it. By now, your directory should look similar to the one below. I'm using Apache Storm 0.9.5 for this learning experience.

 user@localhost:~/Desktop/apache-storm-0.9.5$ ls   
 bin CHANGELOG.md conf DISCLAIMER examples external lib LICENSE logback     NOTICE     public     README.markdown RELEASE SECURITY.md  
 user@localhost:~/Desktop/apache-storm-0.9.5$   

In the next article, we will set up a Storm cluster.

Sunday, August 16, 2015

First time learning gradle

It is difficult to jump-start software development when you are introduced to many sub-technologies at once. Today, I'm going to put my project aside and learn another technology: Gradle, a build system that does much more than just build. If you are also new to Gradle, you might want to find out what Gradle actually is.

Gradle on Wikipedia:

Gradle is a build automation tool that builds upon the concepts of Apache Ant and Apache Maven and introduces a Groovy-based domain-specific language (DSL) instead of the more traditional XML form of declaring the project configuration. Gradle uses a directed acyclic graph ("DAG") to determine the order in which tasks can be run.
Gradle was designed for multi-project builds which can grow to be quite large, and supports incremental builds by intelligently determining which parts of the build tree are up-to-date, so that any task dependent upon those parts will not need to be re-executed.
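To make the DAG idea concrete, here is a minimal Java sketch of ordering tasks with a depth-first topological sort: a task is emitted only after all of its dependencies. The dependency edges are my own hand-written approximation of the Java plugin's lifecycle (an assumption, not read from Gradle), and `TaskGraph` is a hypothetical class, not Gradle's API. With dependencies visited in the order listed, it reproduces the task order that `gradle build` prints further down in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of DAG-based task ordering, NOT Gradle's API.
class TaskGraph {
    private final Map<String, List<String>> dependsOn = new HashMap<>();

    void task(String name, String... deps) {
        dependsOn.put(name, Arrays.asList(deps));
    }

    // Execution order needed to run `target`: dependencies first, each task once.
    List<String> executionOrder(String target) {
        List<String> order = new ArrayList<>();
        visit(target, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    private void visit(String t, Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(t)) {
            return; // already scheduled
        }
        if (!inProgress.add(t)) {
            throw new IllegalStateException("cycle detected at " + t); // not a DAG
        }
        for (String dep : dependsOn.getOrDefault(t, Collections.emptyList())) {
            visit(dep, done, inProgress, order);
        }
        inProgress.remove(t);
        done.add(t);
        order.add(t); // post-order: all dependencies were added before t
    }

    // Approximation of the Java plugin's task dependencies (assumption).
    static TaskGraph javaPluginLike() {
        TaskGraph g = new TaskGraph();
        g.task("compileJava");
        g.task("processResources");
        g.task("classes", "compileJava", "processResources");
        g.task("jar", "classes");
        g.task("assemble", "jar");
        g.task("compileTestJava", "classes");
        g.task("processTestResources");
        g.task("testClasses", "compileTestJava", "processTestResources");
        g.task("test", "testClasses");
        g.task("check", "test");
        g.task("build", "assemble", "check");
        return g;
    }

    public static void main(String[] args) {
        TaskGraph g = javaPluginLike();
        System.out.println(g.executionOrder("build"));
        // [compileJava, processResources, classes, jar, assemble, compileTestJava,
        //  processTestResources, testClasses, test, check, build]
        System.out.println(g.executionOrder("compileJava")); // [compileJava]
    }
}
```

Asking for `compileJava` alone returns just `[compileJava]`, which is the same idea as running `gradle compileJava` to skip the rest of the lifecycle.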

If you have many projects that depend on one another, Gradle can solve your problems. We will look into the basics of the Gradle build automation tool today. I love to code in Java, so I will use Java for this demo. First, let's install Gradle. If you are using a deb-based distribution like Debian or Ubuntu, installing Gradle is as easy as $ sudo apt-get install gradle. Otherwise, you can download Gradle from http://gradle.org/ and install it on your system. Now let's create a Gradle build file, as shown below.

 user@localhost:~/gradle$ cat build.gradle   
 apply plugin: 'java'  
 user@localhost:~/gradle$ ls -a  
 total 36K  
 -rw-r--r--  1 user user  21 Aug 6 17:15 build.gradle  
 drwxr-xr-x 214 user user 28K Aug 6 17:15 ..  
 drwxr-xr-x  2 user user 4.0K Aug 6 17:15 .  
 user@localhost:~/gradle$ gradle build  
 :compileJava UP-TO-DATE  
 :processResources UP-TO-DATE  
 :classes UP-TO-DATE  
 :jar  
 :assemble  
 :compileTestJava UP-TO-DATE  
 :processTestResources UP-TO-DATE  
 :testClasses UP-TO-DATE  
 :test  
 :check  
 :build  
   
 BUILD SUCCESSFUL  
   
 Total time: 13.304 secs  
 user@localhost:~/gradle$ ls -a  
 total 44K  
 -rw-r--r--  1 user user  21 Aug 6 17:15 build.gradle  
 drwxr-xr-x 214 user user 28K Aug 6 17:15 ..  
 drwxr-xr-x  3 user user 4.0K Aug 6 17:15 .gradle  
 drwxr-xr-x  4 user user 4.0K Aug 6 17:15 .  
 drwxr-xr-x  6 user user 4.0K Aug 6 17:15 build  
 user@localhost:~/gradle$ find .gradle/  
 .gradle/  
 .gradle/1.5  
 .gradle/1.5/taskArtifacts  
 .gradle/1.5/taskArtifacts/fileHashes.bin  
 .gradle/1.5/taskArtifacts/taskArtifacts.bin  
 .gradle/1.5/taskArtifacts/fileSnapshots.bin  
 .gradle/1.5/taskArtifacts/outputFileStates.bin  
 .gradle/1.5/taskArtifacts/cache.properties.lock  
 .gradle/1.5/taskArtifacts/cache.properties  
 user@localhost:~/gradle$ find build  
 build  
 build/libs  
 build/libs/gradle.jar  
 build/test-results  
 build/test-results/binary  
 build/test-results/binary/test  
 build/test-results/binary/test/results.bin  
 build/reports  
 build/reports/tests  
 build/reports/tests/report.js  
 build/reports/tests/index.html  
 build/reports/tests/base-style.css  
 build/reports/tests/style.css  
 build/tmp  
 build/tmp/jar  
 build/tmp/jar/MANIFEST.MF  
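The listing above includes `.gradle/1.5/taskArtifacts/fileHashes.bin`, and the build output printed `UP-TO-DATE` next to several tasks. The general idea behind such incremental builds can be sketched as: fingerprint a task's inputs and skip the task when the fingerprint equals the one recorded on the previous run. This is a simplified, hypothetical model (`UpToDateChecker` and `src/java/Hello.java` are made up); Gradle's real checks also consider task outputs and keep more state, and this says nothing about how those `.bin` files are laid out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical incremental-build check, NOT Gradle's implementation.
class UpToDateChecker {
    private final Map<String, String> lastFingerprint = new HashMap<>(); // task -> input hash

    // SHA-256 over (path, content) pairs, visited in sorted path order so the hash is stable.
    static String fingerprint(SortedMap<String, String> inputs) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> file : inputs.entrySet()) {
                md.update(file.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator so "ab"+"c" differs from "a"+"bc"
                md.update(file.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide SHA-256
        }
    }

    // "UP-TO-DATE" when inputs are unchanged; otherwise "EXECUTED" and the new hash is recorded.
    String run(String task, SortedMap<String, String> inputs) {
        String fp = fingerprint(inputs);
        if (fp.equals(lastFingerprint.get(task))) {
            return "UP-TO-DATE";
        }
        lastFingerprint.put(task, fp);
        return "EXECUTED";
    }

    public static void main(String[] args) {
        UpToDateChecker checker = new UpToDateChecker();
        SortedMap<String, String> sources = new TreeMap<>();
        sources.put("src/java/Hello.java", "class Hello {}");
        System.out.println(checker.run("compileJava", sources)); // EXECUTED
        System.out.println(checker.run("compileJava", sources)); // UP-TO-DATE
        sources.put("src/java/Hello.java", "class Hello { int x; }");
        System.out.println(checker.run("compileJava", sources)); // EXECUTED
    }
}
```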

A one-line build file produces so many output files. Amazing! Why were so many files generated? Read the command output: it compiles, processes resources, builds the jar, assembles, tests, checks and builds. I will not explain what each of these means one by one; you learn better by reading the definitions yourself, which are documented very well here. You might ask: hey, I have a different Java source path, can Gradle handle this? Yes, of course! In the build file you created, you can add another line.

 // set the source java folder to another non maven standard path  
 sourceSets.main.java.srcDirs = ['src/java']  

Most of us coming from Java have an Ant build file. If that is the case, Gradle integrates nicely with Ant too: you just need to import the Ant build file and then call the Ant target from Gradle. See the code snippet below.

 user@localhost:~/gradle$ cat build.xml   
 <project>  
  <target name="helloAnt">  
   <echo message="hello this is ant."/>  
  </target>  
 </project>  
 user@localhost:~/gradle$ cat build.gradle  
 apply plugin: 'java'  
   
 // set the source java folder to another non maven standard path  
 sourceSets.main.java.srcDirs = ['src/java']  
   
 // import ant build file.  
 ant.importBuild 'build.xml'  
 user@localhost:~/gradle$ gradle helloAnt   
 :helloAnt  
 [ant:echo] hello this is ant.  
   
 BUILD SUCCESSFUL  
   
 Total time: 5.573 secs  

That looks pretty good! If you are curious about which Gradle command-line parameters help you figure out what went wrong in a build, you should really read this link. Also, read up on the environment variables, as they let you specify another JDK for Gradle or even Java parameters when compiling big projects.

You might also ask: what if I only want to compile and don't want to go through the whole build above? No problem; since this is a Java project, specify compileJava.

 user@localhost:~/gradle$ gradle compileJava  
 :compileJava UP-TO-DATE  
   
 BUILD SUCCESSFUL  
   
 Total time: 4.976 secs  

As you can see, Gradle is very flexible, so you might want to exploit it further, for example by customizing tasks in build.gradle, listing projects, listing tasks and so on. For that, read here, as it explains and gives a lot of examples of how all that can be done. At this stage, you might want to add more features to the Gradle build file. Okay, let's do just that.

 user@localhost:~/gradle$ cat build.gradle   
 apply plugin: 'java'  
 apply plugin: 'eclipse'  
   
 // set the source java folder to another non maven standard path  
 // default src/main/java  
 sourceSets.main.java.srcDirs = ['src/java']  
   
 // default src test   
 //src/test/java  
   
 // default src resources.  
 // src/main/resources   
   
 // default src test resources.  
 // src/test/resources  
   
 // default build  
 // build  
   
 // default jar built  
 // build/libs  
   
   
 // repositories used to resolve external jar dependencies; here, Maven Central.  
 repositories {  
   mavenCentral()  
 }  
   
 // actual libs dependencies  
 dependencies {  
   compile group: 'commons-collections', name: 'commons-collections', version: '3.2'  
   testCompile group: 'junit', name: 'junit', version: '4.+'  
 }  
   
 test {  
   testLogging {  
     // Show that tests are run in the command-line output  
     events 'started', 'passed'  
   }  
 }  
   
 sourceCompatibility = 1.5  
 version = '1.0'  
 jar {  
   manifest {  
     attributes 'Implementation-Title': 'Gradle Quickstart',  
           'Implementation-Version': version  
   }  
 }  
   
 // import ant build file.  
 ant.importBuild 'build.xml'  
   
 // common for subprojects  
 subprojects {  
   apply plugin: 'java'  
   
   repositories {  
     mavenCentral()  
   }  
   
   dependencies {  
     testCompile 'junit:junit:4.12'  
   }  
   
   version = '1.0'  
   
   jar {  
     manifest.attributes provider: 'gradle'  
   }  
 }  
 user@localhost:~/gradle$ cat settings.gradle   
 include ":nativeapp",":webapp"  

Now, if you want to generate the Eclipse configuration, just run gradle eclipse and all the Eclipse project files and settings are created automatically. Of course, you can customize these settings even further.

 user@localhost:~/gradle$ gradle eclipse  
 :eclipseClasspath  
 Download http://repo1.maven.org/maven2/junit/junit/4.12/junit-4.12.pom  
 Download http://repo1.maven.org/maven2/junit/junit/4.12/junit-4.12-sources.jar  
 Download http://repo1.maven.org/maven2/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3-sources.jar  
 Download http://repo1.maven.org/maven2/junit/junit/4.12/junit-4.12.jar  
 :eclipseJdt  
 :eclipseProject  
 :eclipse  
   
 BUILD SUCCESSFUL  
   
 Total time: 19.497 secs  
 user@localhost:~/gradle$ find .  
 .  
 ./build.xml  
 ./build  
 ./build/classes  
 ./build/classes/test  
 ./build/classes/test/org  
 ./build/classes/test/org/just4fun  
 ./build/classes/test/org/just4fun/voc  
 ./build/classes/test/org/just4fun/voc/file  
 ./build/classes/test/org/just4fun/voc/file/QuickTest.class  
 ./build/libs  
 ./build/libs/gradle.jar  
 ./build/libs/gradle-1.0.jar  
 ./build/test-results  
 ./build/test-results/binary  
 ./build/test-results/binary/test  
 ./build/test-results/binary/test/results.bin  
 ./build/test-results/TEST-org.just4fun.voc.file.QuickTest.xml  
 ./build/reports  
 ./build/reports/tests  
 ./build/reports/tests/report.js  
 ./build/reports/tests/index.html  
 ./build/reports/tests/org.just4fun.voc.file.html  
 ./build/reports/tests/base-style.css  
 ./build/reports/tests/org.just4fun.voc.file.QuickTest.html  
 ./build/reports/tests/style.css  
 ./build/dependency-cache  
 ./build/tmp  
 ./build/tmp/jar  
 ./build/tmp/jar/MANIFEST.MF  
 ./webapp  
 ./webapp/build.gradle  
 ./.gradle  
 ./.gradle/1.5  
 ./.gradle/1.5/taskArtifacts  
 ./.gradle/1.5/taskArtifacts/fileHashes.bin  
 ./.gradle/1.5/taskArtifacts/taskArtifacts.bin  
 ./.gradle/1.5/taskArtifacts/fileSnapshots.bin  
 ./.gradle/1.5/taskArtifacts/outputFileStates.bin  
 ./.gradle/1.5/taskArtifacts/cache.properties.lock  
 ./.gradle/1.5/taskArtifacts/cache.properties  
 ./.classpath  
 ./build.gradle  
 ./.project  
 ./.settings  
 ./.settings/org.eclipse.jdt.core.prefs  
 ./settings.gradle  
 ./nativeapp  
 ./nativeapp/build.gradle  
 ./src  
 ./src/test  
 ./src/test/java  
 ./src/test/java/org  
 ./src/test/java/org/just4fun  
 ./src/test/java/org/just4fun/voc  
 ./src/test/java/org/just4fun/voc/file  
 ./src/test/java/org/just4fun/voc/file/QuickTest.java  

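Next, I created a simple unit test class (see below) and ran only that single test, which is very cool. The -Dtest.single=Quick option tells Gradle (in the 1.x version used here) to run only the test classes whose names start with Quick.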

 user@localhost:~/gradle$ find src/  
 src/  
 src/test  
 src/test/java  
 src/test/java/org  
 src/test/java/org/just4fun  
 src/test/java/org/just4fun/voc  
 src/test/java/org/just4fun/voc/file  
 src/test/java/org/just4fun/voc/file/QuickTest.java  
 $ gradle -Dtest.single=Quick test  
 :compileJava UP-TO-DATE  
 :processResources UP-TO-DATE  
 :classes UP-TO-DATE  
 :compileTestJavawarning: [options] bootstrap class path not set in conjunction with -source 1.5  
 1 warning  
   
 :processTestResources UP-TO-DATE  
 :testClasses  
 :test  
   
 org.just4fun.voc.file.QuickTest > test STARTED  
   
 org.just4fun.voc.file.QuickTest > test PASSED  
   
 BUILD SUCCESSFUL  
   
 Total time: 55.81 secs  
 user@localhost:~/gradle $  

You may also notice two additional directories, nativeapp and webapp. These are subprojects of this larger project, and each contains its own Gradle build file. In the parent build.gradle, the subprojects { ... } block holds configuration that is applied to all subprojects, and settings.gradle (include ":nativeapp",":webapp") specifies which subprojects belong to the build.

That's all for today. This was just a quick introduction to some of the cool features of Gradle; I hope it gives you some idea of where to head next. Good luck!


Saturday, August 15, 2015

First learning Node.js

Today we will learn another piece of software, Node.js, a name I have come across many times when reading information technology articles. First, let's look at what Node.js is. From the official site:

Node.js® is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.

That is a lot to take in from just two sentences, but as you read on, you will get some idea. If you have basic JavaScript experience, you may think of JavaScript as just scripts that run in the browser to enhance the user experience. Node.js, however, runs JavaScript outside the browser, so you can write server applications with it! We will see that in a moment.
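To make "event-driven" a little more concrete, here is a small sketch using the built-in events module that ships with Node.js. The bell object and the ring event are hypothetical names made up for this illustration. The idea is that you register listener functions first, and they only run when the event is emitted.

```javascript
// A minimal sketch of Node.js's event-driven style using the built-in
// "events" module. "bell" and "ring" are hypothetical names for illustration.
var EventEmitter = require("events").EventEmitter;

var bell = new EventEmitter();
var heard = [];

// Register listeners first; they do nothing until the event fires.
bell.on("ring", function (visitor) {
  heard.push("Someone is at the door: " + visitor);
});

// once() registers a listener that is removed after its first call.
bell.once("ring", function () {
  heard.push("First ring of the day!");
});

// emit() calls the "ring" listeners in the order they were registered.
bell.emit("ring", "postman");
bell.emit("ring", "neighbour");

heard.forEach(function (line) {
  console.log(line);
});
```

Running it prints "Someone is at the door: postman", then "First ring of the day!", then "Someone is at the door: neighbour". The once() listener fires only for the first ring.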

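Okay, let's install Node.js. If you are using a Debian-based Linux distribution such as Debian or Ubuntu, it is as easy as $ sudo apt-get install nodejs. Otherwise, you can download a copy from the official site and install it. Note that the Debian/Ubuntu package installs the command as nodejs rather than node, which is why the examples below run nodejs.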

Let's start with a simple Node.js hello world. It's very easy: create helloworld.js, print a message and run it. See below.

 user@localhost:~/nodejs$ cat helloworld.js   
 console.log("Hello World");  
 user@localhost:~/nodejs$ nodejs helloworld.js   
 Hello World  
 user@localhost:~/nodejs$   

Very simple: a one-liner produces the hello world output. You might ask what other Node.js functionality you can use besides console. Well, at the end of this article I will give you a link so you can explore further. In the meantime, let me show you how easy it is to create a web server using Node.js! Let's read the code below.

 user@localhost:~/nodejs$ cat server.js   
 var http = require("http");  
 http.createServer(function(request, response) {  
  response.writeHead(200, {"Content-Type": "text/plain"});  
  response.write("Hello World");  
  response.end();  
 }).listen(8888);  
 console.log("create a webserver at port 8888");  
 user@localhost:~/nodejs$ nodejs server.js   
 create a webserver at port 8888  

As you can read, we create a file called server.js that requires a module called http. We pass an anonymous function to the http module's createServer function; this function is called for every incoming request and responds with HTTP status 200 and the text Hello World. You can try it in your browser at http://localhost:8888. Notice that execution continues right after the server is created: the console.log line runs immediately, and the process then keeps running, waiting for requests. Rather than blocking until an operation finishes before moving on to the next line of code, as is common in many other languages, Node.js registers a callback and carries on, and this is what makes Node.js asynchronous.
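To see this asynchronous behaviour for yourself, here is a small sketch, assuming a reasonably recent Node.js with built-in Promise support. It starts the same Hello World server, but on port 0 (so the operating system picks a free port and it does not clash with server.js on 8888), then uses http.get to play the part of the browser. The order array and the done promise are names introduced for this illustration only.

```javascript
// Sketch: the Hello World server plus a client request to itself,
// recording the order in which things happen.
var http = require("http");

var order = [];

var server = http.createServer(function (request, response) {
  response.writeHead(200, { "Content-Type": "text/plain" });
  response.end("Hello World");
});

// "done" resolves once our own request has been answered.
var done = new Promise(function (resolve) {
  // Port 0 = any free port (an assumption of this demo, not the 8888 above).
  server.listen(0, "127.0.0.1", function () {
    order.push("server is listening");
    var port = server.address().port;

    // Act as the browser: request the page from our own server.
    http.get({ host: "127.0.0.1", port: port, path: "/" }, function (res) {
      var body = "";
      res.setEncoding("utf8");
      res.on("data", function (chunk) { body += chunk; });
      res.on("end", function () {
        order.push("got " + res.statusCode + ": " + body);
        server.close(); // stop listening so the process can exit
        resolve(order);
      });
    });
  });
});

// This runs right away, before the listen callback above has fired.
order.push("line after listen()");

done.then(function (result) {
  result.forEach(function (line) { console.log(line); });
});
```

The first entry printed is "line after listen()", followed by "server is listening" and "got 200: Hello World": the line after listen() did not wait for the server to start.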

Well, by now you should have an idea of what Node.js can do for you. If you are interested in learning more about Node.js, I will leave you with this very helpful link.