Friday, August 14, 2015

Light learning Apache Spark

A while back, I kept reading articles that referenced Spark, so this week I thought, hey, why not check out what Spark actually is. Googling "spark" returns many results, but we are particularly interested in Apache Spark. Let us take a look today at Apache Spark and what it is all about. From the official Spark GitHub:

Apache Spark
Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, and Python, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.
Okay, let's download a copy of Spark to your local PC. You can download it from this site.
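
If you prefer the command line, fetching and extracting the 1.4.1 source release looks roughly like this; the URL below points at the Apache release archive and is my assumption, so adjust it to whichever mirror the download page offers:

 $ wget http://archive.apache.org/dist/spark/spark-1.4.1/spark-1.4.1.tgz  
 $ tar xzf spark-1.4.1.tgz -C ~/Desktop  
 $ cd ~/Desktop/spark-1.4.1  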


I extracted the downloaded file and ran the commands below; not good.

 user@localhost:~/Desktop/spark-1.4.1$ ./bin/pyspark   
 ls: cannot access /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10: No such file or directory  
 Failed to find Spark assembly in /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10.  
 You need to build Spark before running this program.  
 user@localhost:~/Desktop/spark-1.4.1$ ./bin/spark-shell   
 ls: cannot access /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10: No such file or directory  
 Failed to find Spark assembly in /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10.  
 You need to build Spark before running this program.  

Well, the default download option is the source package, so you will have to compile the source first.

 user@localhost:~/Desktop/spark-1.4.1$ mvn -DskipTests clean package  
 [INFO] Scanning for projects...  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Reactor Build Order:  
 [INFO]   
 [INFO] Spark Project Parent POM  
 [INFO] Spark Launcher Project  
 [INFO] Spark Project Networking  
 [INFO] Spark Project Shuffle Streaming Service  
 [INFO] Spark Project Unsafe  
 ...  
 ...  
 ...  
 constituent[20]: file:/usr/share/maven/lib/wagon-http-shaded.jar  
 constituent[21]: file:/usr/share/maven/lib/maven-settings-builder-3.x.jar  
 constituent[22]: file:/usr/share/maven/lib/maven-aether-provider-3.x.jar  
 constituent[23]: file:/usr/share/maven/lib/maven-core-3.x.jar  
 constituent[24]: file:/usr/share/maven/lib/plexus-cipher.jar  
 constituent[25]: file:/usr/share/maven/lib/aether-util.jar  
 constituent[26]: file:/usr/share/maven/lib/commons-httpclient.jar  
 constituent[27]: file:/usr/share/maven/lib/commons-cli.jar  
 constituent[28]: file:/usr/share/maven/lib/aether-api.jar  
 constituent[29]: file:/usr/share/maven/lib/maven-model-3.x.jar  
 constituent[30]: file:/usr/share/maven/lib/guava.jar  
 constituent[31]: file:/usr/share/maven/lib/wagon-file.jar  
 ---------------------------------------------------  
 Exception in thread "main" java.lang.OutOfMemoryError: PermGen space  
      at java.lang.ClassLoader.defineClass1(Native Method)  
      at java.lang.ClassLoader.defineClass(ClassLoader.java:800)  
      at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)  
      at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)  
      at java.net.URLClassLoader.access$100(URLClassLoader.java:71)  
      at java.net.URLClassLoader$1.run(URLClassLoader.java:361)  
      at java.net.URLClassLoader$1.run(URLClassLoader.java:355)  
      at java.security.AccessController.doPrivileged(Native Method)  
      at java.net.URLClassLoader.findClass(URLClassLoader.java:354)  
      at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClassFromSelf(ClassRealm.java:401)  
      at org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy.loadClass(SelfFirstStrategy.java:42)  
      at org.codehaus.plexus.classworlds.realm.ClassRealm.unsynchronizedLoadClass(ClassRealm.java:271)  
      at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:247)  
      at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:239)  
      at org.apache.maven.cli.MavenCli.execute(MavenCli.java:545)  
      at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)  
      at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)  
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  
      at java.lang.reflect.Method.invoke(Method.java:606)  
      at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)  
      at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)  
      at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)  
      at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)  

Okay, let's beef up the build settings a little. Even then the build took a very long time, and eventually I switched to using the Maven wrapper in the build directory instead. See below.

 user@localhost:~/Desktop/spark-1.4.1$ export MAVEN_OPTS="-XX:MaxPermSize=1024M"  
 user@localhost:~/Desktop/spark-1.4.1$ mvn -DskipTests clean package  
   
 [INFO] Scanning for projects...  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Reactor Build Order:  
 [INFO]   
 [INFO] Spark Project Parent POM  
 [INFO] Spark Launcher Project  
 [INFO] Spark Project Networking  
 [INFO] Spark Project Shuffle Streaming Service  
 [INFO] Spark Project Unsafe  
 [INFO] Spark Project Core  
   
 user@localhost:~/Desktop/spark-1.4.1$ build/mvn -DskipTests clean package  
 [INFO] Scanning for projects...  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Reactor Build Order:  
 [INFO]   
 [INFO] Spark Project Parent POM  
 [INFO] Spark Launcher Project  
 [INFO] Spark Project Networking  
 [INFO] Spark Project Shuffle Streaming Service  
 [INFO] Spark Project Unsafe  
 [INFO] Spark Project Core  
 ..  
 ...  
 ...  
 ...  
 get/spark-streaming-kafka-assembly_2.10-1.4.1-shaded.jar  
 [INFO]   
 [INFO] --- maven-source-plugin:2.4:jar-no-fork (create-source-jar) @ spark-streaming-kafka-assembly_2.10 ---  
 [INFO] Building jar: /home/user/Desktop/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1-sources.jar  
 [INFO]   
 [INFO] --- maven-source-plugin:2.4:test-jar-no-fork (create-source-jar) @ spark-streaming-kafka-assembly_2.10 ---  
 [INFO] Building jar: /home/user/Desktop/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1-test-sources.jar  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Reactor Summary:  
 [INFO]   
 [INFO] Spark Project Parent POM .......................... SUCCESS [26.138s]  
 [INFO] Spark Launcher Project ............................ SUCCESS [1:15.976s]  
 [INFO] Spark Project Networking .......................... SUCCESS [26.347s]  
 [INFO] Spark Project Shuffle Streaming Service ........... SUCCESS [14.123s]  
 [INFO] Spark Project Unsafe .............................. SUCCESS [12.643s]  
 [INFO] Spark Project Core ................................ SUCCESS [9:49.622s]  
 [INFO] Spark Project Bagel ............................... SUCCESS [17.426s]  
 [INFO] Spark Project GraphX .............................. SUCCESS [53.601s]  
 [INFO] Spark Project Streaming ........................... SUCCESS [1:34.290s]  
 [INFO] Spark Project Catalyst ............................ SUCCESS [2:04.020s]  
 [INFO] Spark Project SQL ................................. SUCCESS [2:11.032s]  
 [INFO] Spark Project ML Library .......................... SUCCESS [2:57.880s]  
 [INFO] Spark Project Tools ............................... SUCCESS [6.920s]  
 [INFO] Spark Project Hive ................................ SUCCESS [2:58.649s]  
 [INFO] Spark Project REPL ................................ SUCCESS [36.564s]  
 [INFO] Spark Project Assembly ............................ SUCCESS [3:13.152s]  
 [INFO] Spark Project External Twitter .................... SUCCESS [1:09.316s]  
 [INFO] Spark Project External Flume Sink ................. SUCCESS [42.294s]  
 [INFO] Spark Project External Flume ...................... SUCCESS [37.907s]  
 [INFO] Spark Project External MQTT ....................... SUCCESS [1:20.999s]  
 [INFO] Spark Project External ZeroMQ ..................... SUCCESS [29.090s]  
 [INFO] Spark Project External Kafka ...................... SUCCESS [54.212s]  
 [INFO] Spark Project Examples ............................ SUCCESS [5:54.508s]  
 [INFO] Spark Project External Kafka Assembly ............. SUCCESS [1:24.962s]  
 [INFO] ------------------------------------------------------------------------  
 [INFO] BUILD SUCCESS  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Total time: 41:53.884s  
 [INFO] Finished at: Tue Aug 04 08:56:02 MYT 2015  
 [INFO] Final Memory: 71M/684M  
 [INFO] ------------------------------------------------------------------------  

Yes, finally the build is a success. Even so, as you can see above, it took nearly 42 minutes on my PC just to compile. Okay, now that all the libraries are built, let's repeat the command we typed just now.

 $ ./bin/spark-shell  
 log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).  
 log4j:WARN Please initialize the log4j system properly.  
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.  
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties  
 15/08/04 20:21:16 INFO SecurityManager: Changing view acls to: user  
 15/08/04 20:21:16 INFO SecurityManager: Changing modify acls to: user  
 15/08/04 20:21:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)  
 15/08/04 20:21:16 INFO HttpServer: Starting HTTP Server  
 15/08/04 20:21:17 INFO Utils: Successfully started service 'HTTP class server' on port 56379.  
 Welcome to  
    ____       __  
    / __/__ ___ _____/ /__  
   _\ \/ _ \/ _ `/ __/ '_/  
   /___/ .__/\_,_/_/ /_/\_\  version 1.4.1  
    /_/  
   
 Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)  
 Type in expressions to have them evaluated.  
 Type :help for more information.  
 15/08/04 20:21:24 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.1.1; using 192.168.133.28 instead (on interface eth0)  
 15/08/04 20:21:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address  
 15/08/04 20:21:24 INFO SparkContext: Running Spark version 1.4.1  
 15/08/04 20:21:24 INFO SecurityManager: Changing view acls to: user  
 15/08/04 20:21:24 INFO SecurityManager: Changing modify acls to: user  
 15/08/04 20:21:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)  
 15/08/04 20:21:25 INFO Slf4jLogger: Slf4jLogger started  
 15/08/04 20:21:26 INFO Remoting: Starting remoting  
 15/08/04 20:21:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.133.28:47888]  
 15/08/04 20:21:26 INFO Utils: Successfully started service 'sparkDriver' on port 47888.  
 15/08/04 20:21:27 INFO SparkEnv: Registering MapOutputTracker  
 15/08/04 20:21:27 INFO SparkEnv: Registering BlockManagerMaster  
 15/08/04 20:21:27 INFO DiskBlockManager: Created local directory at /tmp/spark-660b5f39-26be-4ea2-8593-c0c05a093a23/blockmgr-c3225f03-5ecf-4fed-bbe4-df2331ac7742  
 15/08/04 20:21:27 INFO MemoryStore: MemoryStore started with capacity 265.4 MB  
 15/08/04 20:21:27 INFO HttpFileServer: HTTP File server directory is /tmp/spark-660b5f39-26be-4ea2-8593-c0c05a093a23/httpd-3ab40971-a6d0-42a7-b39e-4d1ce4290642  
 15/08/04 20:21:27 INFO HttpServer: Starting HTTP Server  
 15/08/04 20:21:27 INFO Utils: Successfully started service 'HTTP file server' on port 50089.  
 15/08/04 20:21:27 INFO SparkEnv: Registering OutputCommitCoordinator  
 15/08/04 20:21:28 INFO Utils: Successfully started service 'SparkUI' on port 4040.  
 15/08/04 20:21:28 INFO SparkUI: Started SparkUI at http://192.168.133.28:4040  
 15/08/04 20:21:28 INFO Executor: Starting executor ID driver on host localhost  
 15/08/04 20:21:28 INFO Executor: Using REPL class URI: http://192.168.133.28:56379  
 15/08/04 20:21:28 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36428.  
 15/08/04 20:21:28 INFO NettyBlockTransferService: Server created on 36428  
 15/08/04 20:21:28 INFO BlockManagerMaster: Trying to register BlockManager  
 15/08/04 20:21:28 INFO BlockManagerMasterEndpoint: Registering block manager localhost:36428 with 265.4 MB RAM, BlockManagerId(driver, localhost, 36428)  
 15/08/04 20:21:28 INFO BlockManagerMaster: Registered BlockManager  
 15/08/04 20:21:29 INFO SparkILoop: Created spark context..  
 Spark context available as sc.  
 15/08/04 20:21:30 INFO SparkILoop: Created sql context..  
 SQL context available as sqlContext.  
   
 scala>   

Okay, everything looks good and the earlier error is gone. Let's explore further.

 scala> sc.parallelize(1 to 1000).count()  
 15/08/04 20:30:05 INFO SparkContext: Starting job: count at <console>:22  
 15/08/04 20:30:05 INFO DAGScheduler: Got job 0 (count at <console>:22) with 4 output partitions (allowLocal=false)  
 15/08/04 20:30:05 INFO DAGScheduler: Final stage: ResultStage 0(count at <console>:22)  
 15/08/04 20:30:05 INFO DAGScheduler: Parents of final stage: List()  
 15/08/04 20:30:05 INFO DAGScheduler: Missing parents: List()  
 15/08/04 20:30:05 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at <console>:22), which has no missing parents  
 15/08/04 20:30:05 INFO MemoryStore: ensureFreeSpace(1096) called with curMem=0, maxMem=278302556  
 15/08/04 20:30:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1096.0 B, free 265.4 MB)  
 15/08/04 20:30:05 INFO MemoryStore: ensureFreeSpace(804) called with curMem=1096, maxMem=278302556  
 15/08/04 20:30:05 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 804.0 B, free 265.4 MB)  
 15/08/04 20:30:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:36428 (size: 804.0 B, free: 265.4 MB)  
 15/08/04 20:30:05 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874  
 15/08/04 20:30:05 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at <console>:22)  
 15/08/04 20:30:05 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks  
 15/08/04 20:30:05 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1369 bytes)  
 15/08/04 20:30:05 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1369 bytes)  
 15/08/04 20:30:05 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 1369 bytes)  
 15/08/04 20:30:05 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 1426 bytes)  
 15/08/04 20:30:05 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)  
 15/08/04 20:30:05 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)  
 15/08/04 20:30:05 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)  
 15/08/04 20:30:05 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)  
 15/08/04 20:30:06 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 658 bytes result sent to driver  
 15/08/04 20:30:06 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 658 bytes result sent to driver  
 15/08/04 20:30:06 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 658 bytes result sent to driver  
 15/08/04 20:30:06 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 658 bytes result sent to driver  
 15/08/04 20:30:06 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 477 ms on localhost (1/4)  
 15/08/04 20:30:06 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 478 ms on localhost (2/4)  
 15/08/04 20:30:06 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 508 ms on localhost (3/4)  
 15/08/04 20:30:06 INFO DAGScheduler: ResultStage 0 (count at <console>:22) finished in 0.520 s  
 15/08/04 20:30:06 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 478 ms on localhost (4/4)  
 15/08/04 20:30:06 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool   
 15/08/04 20:30:06 INFO DAGScheduler: Job 0 finished: count at <console>:22, took 1.079304 s  
 res0: Long = 1000  

That's pretty nice for a small demo of how Spark works. Before moving on to the Python shell, here is one more snippet you could try in the same spark-shell session; see the sketch below. After that, let's open another terminal for the next example.
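
This is a minimal word-count sketch you could paste into the same scala> prompt. It assumes you launched spark-shell from the Spark directory, so the README.md that ships with the source tree is readable from the current path; it is my own illustration, not part of the official quick start output.

 // word count over the README that ships with the Spark source tree  
 val lines = sc.textFile("README.md")  
 val counts = lines.flatMap(_.split("\\s+"))  
                   .filter(_.nonEmpty)  
                   .map(word => (word, 1))  
                   .reduceByKey(_ + _)  
 counts.take(10).foreach(println)  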

 user@localhost:~/Desktop/spark-1.4.1$ ./bin/pyspark   
 Python 2.7.10 (default, Jul 1 2015, 10:54:53)   
 [GCC 4.9.2] on linux2  
 Type "help", "copyright", "credits" or "license" for more information.  
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties  
 15/08/04 20:37:42 INFO SparkContext: Running Spark version 1.4.1  
 15/08/04 20:37:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable  
 15/08/04 20:37:44 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.1.1; using 182.168.133.28 instead (on interface eth0)  
 15/08/04 20:37:44 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address  
 15/08/04 20:37:44 INFO SecurityManager: Changing view acls to: user  
 15/08/04 20:37:44 INFO SecurityManager: Changing modify acls to: user  
 15/08/04 20:37:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)  
 15/08/04 20:37:46 INFO Slf4jLogger: Slf4jLogger started  
 15/08/04 20:37:46 INFO Remoting: Starting remoting  
 15/08/04 20:37:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@182.168.133.28:35904]  
 15/08/04 20:37:46 INFO Utils: Successfully started service 'sparkDriver' on port 35904.  
 15/08/04 20:37:46 INFO SparkEnv: Registering MapOutputTracker  
 15/08/04 20:37:46 INFO SparkEnv: Registering BlockManagerMaster  
 15/08/04 20:37:47 INFO DiskBlockManager: Created local directory at /tmp/spark-2b46e9e7-1779-45d1-b9cf-46000baf7d9b/blockmgr-e2f47b34-47a8-4b72-a0d6-25d0a7daa02e  
 15/08/04 20:37:47 INFO MemoryStore: MemoryStore started with capacity 265.4 MB  
 15/08/04 20:37:47 INFO HttpFileServer: HTTP File server directory is /tmp/spark-2b46e9e7-1779-45d1-b9cf-46000baf7d9b/httpd-2ec128c2-bad0-4dd9-a826-eab2ee0779cb  
 15/08/04 20:37:47 INFO HttpServer: Starting HTTP Server  
 15/08/04 20:37:47 INFO Utils: Successfully started service 'HTTP file server' on port 45429.  
 15/08/04 20:37:47 INFO SparkEnv: Registering OutputCommitCoordinator  
 15/08/04 20:37:49 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.  
 15/08/04 20:37:50 INFO Utils: Successfully started service 'SparkUI' on port 4041.  
 15/08/04 20:37:50 INFO SparkUI: Started SparkUI at http://182.168.133.28:4041  
 15/08/04 20:37:50 INFO Executor: Starting executor ID driver on host localhost  
 15/08/04 20:37:51 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 47045.  
 15/08/04 20:37:51 INFO NettyBlockTransferService: Server created on 47045  
 15/08/04 20:37:51 INFO BlockManagerMaster: Trying to register BlockManager  
 15/08/04 20:37:51 INFO BlockManagerMasterEndpoint: Registering block manager localhost:47045 with 265.4 MB RAM, BlockManagerId(driver, localhost, 47045)  
 15/08/04 20:37:51 INFO BlockManagerMaster: Registered BlockManager  
 Welcome to  
    ____       __  
    / __/__ ___ _____/ /__  
   _\ \/ _ \/ _ `/ __/ '_/  
   /__ / .__/\_,_/_/ /_/\_\  version 1.4.1  
    /_/  
   
 Using Python version 2.7.10 (default, Jul 1 2015 10:54:53)  
 SparkContext available as sc, SQLContext available as sqlContext.  
 >>> sc.parallelize(range(1000)).count()  
 15/08/04 20:37:55 INFO SparkContext: Starting job: count at <stdin>:1  
 15/08/04 20:37:55 INFO DAGScheduler: Got job 0 (count at <stdin>:1) with 4 output partitions (allowLocal=false)  
 15/08/04 20:37:55 INFO DAGScheduler: Final stage: ResultStage 0(count at <stdin>:1)  
 15/08/04 20:37:55 INFO DAGScheduler: Parents of final stage: List()  
 15/08/04 20:37:55 INFO DAGScheduler: Missing parents: List()  
 15/08/04 20:37:55 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at count at <stdin>:1), which has no missing parents  
 15/08/04 20:37:55 INFO MemoryStore: ensureFreeSpace(4416) called with curMem=0, maxMem=278302556  
 15/08/04 20:37:55 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.3 KB, free 265.4 MB)  
 15/08/04 20:37:55 INFO MemoryStore: ensureFreeSpace(2722) called with curMem=4416, maxMem=278302556  
 15/08/04 20:37:55 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.7 KB, free 265.4 MB)  
 15/08/04 20:37:55 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:47045 (size: 2.7 KB, free: 265.4 MB)  
 15/08/04 20:37:55 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874  
 15/08/04 20:37:55 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (PythonRDD[1] at count at <stdin>:1)  
 15/08/04 20:37:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks  
 15/08/04 20:37:55 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1873 bytes)  
 15/08/04 20:37:55 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 2117 bytes)  
 15/08/04 20:37:55 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 2123 bytes)  
 15/08/04 20:37:55 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 2123 bytes)  
 15/08/04 20:37:55 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)  
 15/08/04 20:37:55 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)  
 15/08/04 20:37:55 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)  
 15/08/04 20:37:55 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)  
 15/08/04 20:37:56 INFO PythonRDD: Times: total = 421, boot = 376, init = 44, finish = 1  
 15/08/04 20:37:56 INFO PythonRDD: Times: total = 418, boot = 354, init = 64, finish = 0  
 15/08/04 20:37:56 INFO PythonRDD: Times: total = 423, boot = 372, init = 51, finish = 0  
 15/08/04 20:37:56 INFO PythonRDD: Times: total = 421, boot = 381, init = 40, finish = 0  
 15/08/04 20:37:56 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 698 bytes result sent to driver  
 15/08/04 20:37:56 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 698 bytes result sent to driver  
 15/08/04 20:37:56 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 698 bytes result sent to driver  
 15/08/04 20:37:56 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 698 bytes result sent to driver  
 15/08/04 20:37:56 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 552 ms on localhost (1/4)  
 15/08/04 20:37:56 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 560 ms on localhost (2/4)  
 15/08/04 20:37:56 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 562 ms on localhost (3/4)  
 15/08/04 20:37:56 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 626 ms on localhost (4/4)  
 15/08/04 20:37:56 INFO DAGScheduler: ResultStage 0 (count at <stdin>:1) finished in 0.641 s  
 15/08/04 20:37:56 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool   
 15/08/04 20:37:56 INFO DAGScheduler: Job 0 finished: count at <stdin>:1, took 1.137405 s  
 1000  
 >>>   

Looks good. The next example estimates pi using Spark; after the output below, I have added a short sketch of the idea behind it.

 user@localhost:~/Desktop/spark-1.4.1$ ./bin/run-example SparkPi  
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties  
 15/08/04 20:44:50 INFO SparkContext: Running Spark version 1.4.1  
 15/08/04 20:44:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable  
 15/08/04 20:44:51 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.1.1; using 182.168.133.28 instead (on interface eth0)  
 15/08/04 20:44:51 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address  
 15/08/04 20:44:51 INFO SecurityManager: Changing view acls to: user  
 15/08/04 20:44:51 INFO SecurityManager: Changing modify acls to: user  
 15/08/04 20:44:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)  
 15/08/04 20:44:52 INFO Slf4jLogger: Slf4jLogger started  
 15/08/04 20:44:52 INFO Remoting: Starting remoting  
 15/08/04 20:44:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@182.168.133.28:45817]  
 15/08/04 20:44:53 INFO Utils: Successfully started service 'sparkDriver' on port 45817.  
 15/08/04 20:44:53 INFO SparkEnv: Registering MapOutputTracker  
 15/08/04 20:44:53 INFO SparkEnv: Registering BlockManagerMaster  
 15/08/04 20:44:53 INFO DiskBlockManager: Created local directory at /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/blockmgr-5ed813af-a26f-413c-bdfc-1e08001f9cb2  
 15/08/04 20:44:53 INFO MemoryStore: MemoryStore started with capacity 265.4 MB  
 15/08/04 20:44:53 INFO HttpFileServer: HTTP File server directory is /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/httpd-f07ff755-e34d-4149-b4ac-399e6897221a  
 15/08/04 20:44:53 INFO HttpServer: Starting HTTP Server  
 15/08/04 20:44:53 INFO Utils: Successfully started service 'HTTP file server' on port 50955.  
 15/08/04 20:44:53 INFO SparkEnv: Registering OutputCommitCoordinator  
 15/08/04 20:44:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.  
 15/08/04 20:44:54 INFO SparkUI: Started SparkUI at http://182.168.133.28:4040  
 15/08/04 20:44:58 INFO SparkContext: Added JAR file:/home/user/Desktop/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.2.0.jar at http://182.168.133.28:50955/jars/spark-examples-1.4.1-hadoop2.2.0.jar with timestamp 1438692298221  
 15/08/04 20:44:58 INFO Executor: Starting executor ID driver on host localhost  
 15/08/04 20:44:58 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45731.  
 15/08/04 20:44:58 INFO NettyBlockTransferService: Server created on 45731  
 15/08/04 20:44:58 INFO BlockManagerMaster: Trying to register BlockManager  
 15/08/04 20:44:58 INFO BlockManagerMasterEndpoint: Registering block manager localhost:45731 with 265.4 MB RAM, BlockManagerId(driver, localhost, 45731)  
 15/08/04 20:44:58 INFO BlockManagerMaster: Registered BlockManager  
 15/08/04 20:44:59 INFO SparkContext: Starting job: reduce at SparkPi.scala:35  
 15/08/04 20:44:59 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2 output partitions (allowLocal=false)  
 15/08/04 20:44:59 INFO DAGScheduler: Final stage: ResultStage 0(reduce at SparkPi.scala:35)  
 15/08/04 20:44:59 INFO DAGScheduler: Parents of final stage: List()  
 15/08/04 20:44:59 INFO DAGScheduler: Missing parents: List()  
 15/08/04 20:44:59 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31), which has no missing parents  
 15/08/04 20:44:59 INFO MemoryStore: ensureFreeSpace(1888) called with curMem=0, maxMem=278302556  
 15/08/04 20:44:59 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 265.4 MB)  
 15/08/04 20:44:59 INFO MemoryStore: ensureFreeSpace(1202) called with curMem=1888, maxMem=278302556  
 15/08/04 20:44:59 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 265.4 MB)  
 15/08/04 20:44:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:45731 (size: 1202.0 B, free: 265.4 MB)  
 15/08/04 20:44:59 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874  
 15/08/04 20:44:59 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31)  
 15/08/04 20:44:59 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks  
 15/08/04 20:44:59 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1446 bytes)  
 15/08/04 20:44:59 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1446 bytes)  
 15/08/04 20:44:59 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)  
 15/08/04 20:44:59 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)  
 15/08/04 20:44:59 INFO Executor: Fetching http://182.168.133.28:50955/jars/spark-examples-1.4.1-hadoop2.2.0.jar with timestamp 1438692298221  
 15/08/04 20:45:00 INFO Utils: Fetching http://182.168.133.28:50955/jars/spark-examples-1.4.1-hadoop2.2.0.jar to /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/userFiles-f3a72f24-78e5-4d5d-82eb-dcc8c6b899cb/fetchFileTemp5981400277552657211.tmp  
 15/08/04 20:45:03 INFO Executor: Adding file:/tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/userFiles-f3a72f24-78e5-4d5d-82eb-dcc8c6b899cb/spark-examples-1.4.1-hadoop2.2.0.jar to class loader  
 15/08/04 20:45:03 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 736 bytes result sent to driver  
 15/08/04 20:45:03 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 736 bytes result sent to driver  
 15/08/04 20:45:03 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3722 ms on localhost (1/2)  
 15/08/04 20:45:03 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 3685 ms on localhost (2/2)  
 15/08/04 20:45:03 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool   
 15/08/04 20:45:03 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:35) finished in 3.750 s  
 15/08/04 20:45:03 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:35, took 4.032610 s  
 Pi is roughly 3.14038  
 15/08/04 20:45:03 INFO SparkUI: Stopped Spark web UI at http://182.168.133.28:4040  
 15/08/04 20:45:03 INFO DAGScheduler: Stopping DAGScheduler  
 15/08/04 20:45:03 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!  
 15/08/04 20:45:03 INFO Utils: path = /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/blockmgr-5ed813af-a26f-413c-bdfc-1e08001f9cb2, already present as root for deletion.  
 15/08/04 20:45:03 INFO MemoryStore: MemoryStore cleared  
 15/08/04 20:45:03 INFO BlockManager: BlockManager stopped  
 15/08/04 20:45:03 INFO BlockManagerMaster: BlockManagerMaster stopped  
 15/08/04 20:45:03 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!  
 15/08/04 20:45:03 INFO SparkContext: Successfully stopped SparkContext  
 15/08/04 20:45:03 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.  
 15/08/04 20:45:03 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.  
 15/08/04 20:45:03 INFO Utils: Shutdown hook called  
 15/08/04 20:45:03 INFO Utils: Deleting directory /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9  
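
As an aside, SparkPi is essentially a Monte Carlo estimate. The snippet below is a rough sketch of that idea which you could paste into spark-shell; it is a simplified illustration of the technique, not the actual example source.

 // Monte Carlo estimate of pi: sample random points in the unit square  
 // and count how many fall inside the quarter circle of radius 1.  
 val n = 100000  
 val inside = sc.parallelize(1 to n).map { _ =>  
   val x = math.random  
   val y = math.random  
   if (x * x + y * y <= 1) 1 else 0  
 }.reduce(_ + _)  
 println("Pi is roughly " + 4.0 * inside / n)  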

With this introduction, you should have an idea of what Spark is all about: you can basically use Spark to do distributed processing. It is definitely worthwhile to look into the examples directory to see what Spark can really do for you. Before I end this, I think these two links are very helpful to take you further.

http://spark.apache.org/docs/latest/quick-start.html

http://spark.apache.org/docs/latest/#launching-on-a-cluster

Sunday, August 2, 2015

Learning the basics of cobertura

A while back, I was reading an article about code coverage, and after some googling I found this open-source code coverage tool called Cobertura. So naturally I thought I would give it a try, and the result did not disappoint. Read on to find out why. You might be wondering about the name cobertura; from the official site's explanation:

"Cobertura" is the Spanish and Portuguese word for "coverage." We were trying to avoid acronyms and coffee references. It's not too hard to associate the word "cobertura" with the word "coverage," and it even has a bit of a zesty kick to it!

Okay, but again, why would I want this when I already have JUnit running?

Cobertura is a free Java tool that calculates the percentage of code accessed by tests. It can be used to identify which parts of your Java program are lacking test coverage. It is based on jcoverage.

So Cobertura is auxiliary to your existing tests: it shows how much of your main codebase the tests currently cover. One requirement, then, is that you need to have tests written before you can use Cobertura.

Okay, enough theory; let's dip a toe into the water. First, download the library; you can download it from this link. Next, unzip the file and change into the library directory. There is a nice ready-made example for you to play with.

 $ ls  
 cobertura-2.1.1.jar        cobertura-2.1.1-sources.jar cobertura-check.sh cobertura-instrument.bat     cobertura-merge.bat cobertura-report.bat examples LICENSE.txt  
 cobertura-2.1.1-javadoc.jar cobertura-check.bat      coberturaFlush.war cobertura-instrument.sh     cobertura-merge.sh  cobertura-report.sh  lib        README.markdown  

Change into the examples/basic directory and run the command below. Yes, you will need Ant installed, and Java.

 $ ant -p  
 Buildfile: /home/user/Desktop/cobertura-2.1.1/examples/basic/build.xml  
   Cobertura - http://cobertura.sourceforge.net/  
   Copyright (C) 2003 jcoverage ltd.  
   Copyright (C) 2005 Mark Doliner <thekingant@users.sourceforge.net>  
   Copyright (C) 2006 Dan Godfrey  
   Cobertura is licensed under the GNU General Public License  
   Cobertura comes with ABSOLUTELY NO WARRANTY  
 Main targets:  
  clean   Remove all files created by the build/test process.  
  coverage Compile, instrument ourself, run the tests and generate JUnit and coverage reports.  
 Default target: coverage  

So that's pretty clear: we have two targets, clean and coverage. The coverage target will generate all the necessary files for you. See below.

 $ ant coverage  
 Buildfile: /home/user/Desktop/cobertura-2.1.1/examples/basic/build.xml  
 init:  
   [mkdir] Created dir: /home/user/Desktop/cobertura-2.1.1/examples/basic/classes  
   [mkdir] Created dir: /home/user/Desktop/cobertura-2.1.1/examples/basic/instrumented  
   [mkdir] Created dir: /home/user/Desktop/cobertura-2.1.1/examples/basic/reports/junit-xml  
   [mkdir] Created dir: /home/user/Desktop/cobertura-2.1.1/examples/basic/reports/junit-html  
   [mkdir] Created dir: /home/user/Desktop/cobertura-2.1.1/examples/basic/reports/cobertura-xml  
   [mkdir] Created dir: /home/user/Desktop/cobertura-2.1.1/examples/basic/reports/cobertura-summary-xml  
   [mkdir] Created dir: /home/user/Desktop/cobertura-2.1.1/examples/basic/reports/cobertura-html  
 compile:  
   [javac] /home/user/Desktop/cobertura-2.1.1/examples/basic/build.xml:36: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds  
   [javac] Compiling 2 source files to /home/user/Desktop/cobertura-2.1.1/examples/basic/classes  
   [javac] Note: /home/user/Desktop/cobertura-2.1.1/examples/basic/src/com/example/simple/SimpleTest.java uses unchecked or unsafe operations.  
   [javac] Note: Recompile with -Xlint:unchecked for details.  
 instrument:  
   [delete] Deleting directory /home/user/Desktop/cobertura-2.1.1/examples/basic/instrumented  
 [cobertura-instrument] 21:55:08,566 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]  
 [cobertura-instrument] 21:55:08,566 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]  
 [cobertura-instrument] 21:55:08,566 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml]  
 [cobertura-instrument] 21:55:08,567 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.  
 [cobertura-instrument] 21:55:08,567 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml]  
 [cobertura-instrument] 21:55:08,567 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1.jar!/logback.xml]  
 [cobertura-instrument] 21:55:08,601 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@4fce7ceb - URL [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml] is not of type file  
 [cobertura-instrument] 21:55:08,699 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set  
 [cobertura-instrument] 21:55:08,704 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]  
 [cobertura-instrument] 21:55:08,716 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]  
 [cobertura-instrument] 21:55:08,813 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property  
 [cobertura-instrument] 21:55:08,897 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [net.sourceforge.cobertura] to INFO  
 [cobertura-instrument] 21:55:08,897 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to DEBUG  
 [cobertura-instrument] 21:55:08,897 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]  
 [cobertura-instrument] 21:55:08,898 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.  
 [cobertura-instrument] 21:55:08,899 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@7d6b513b - Registering current configuration as safe fallback point  
 [cobertura-instrument]   
 [cobertura-instrument] 21:55:09,216 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]  
 [cobertura-instrument] 21:55:09,217 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]  
 [cobertura-instrument] 21:55:09,217 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml]  
 [cobertura-instrument] 21:55:09,218 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.  
 [cobertura-instrument] 21:55:09,218 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml]  
 [cobertura-instrument] 21:55:09,218 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1.jar!/logback.xml]  
 [cobertura-instrument] 21:55:09,243 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@45a5049a - URL [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml] is not of type file  
 [cobertura-instrument] 21:55:09,310 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set  
 [cobertura-instrument] 21:55:09,315 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]  
 [cobertura-instrument] 21:55:09,325 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]  
 [cobertura-instrument] 21:55:09,354 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property  
 [cobertura-instrument] 21:55:09,402 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [net.sourceforge.cobertura] to INFO  
 [cobertura-instrument] 21:55:09,402 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to DEBUG  
 [cobertura-instrument] 21:55:09,402 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]  
 [cobertura-instrument] 21:55:09,403 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.  
 [cobertura-instrument] 21:55:09,405 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@54d13e2e - Registering current configuration as safe fallback point  
 [cobertura-instrument]   
 [cobertura-instrument] Cobertura 2.1.1 - GNU GPL License (NO WARRANTY) - See COPYRIGHT file  
 [cobertura-instrument] [INFO] Cobertura: Saved information on 1 classes.  
 [cobertura-instrument] [INFO] Cobertura: Saved information on 1 classes.  
 test:  
   [junit] [INFO] Cobertura: Loaded information on 1 classes.  
   [junit] [INFO] Cobertura: Saved information on 1 classes.  
 [junitreport] Processing /home/user/Desktop/cobertura-2.1.1/examples/basic/reports/junit-xml/TESTS-TestSuites.xml to /tmp/null1467716178  
 [junitreport] Loading stylesheet jar:file:/usr/share/ant/lib/ant-junit.jar!/org/apache/tools/ant/taskdefs/optional/junit/xsl/junit-frames.xsl  
 [junitreport] Transform time: 1272ms  
 [junitreport] Deleting: /tmp/null1467716178  
 coverage-report:  
 [cobertura-report] 21:55:13,533 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]  
 [cobertura-report] 21:55:13,533 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]  
 [cobertura-report] 21:55:13,533 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml]  
 [cobertura-report] 21:55:13,535 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.  
 [cobertura-report] 21:55:13,535 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml]  
 [cobertura-report] 21:55:13,535 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1.jar!/logback.xml]  
 [cobertura-report] 21:55:13,561 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@6e038230 - URL [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml] is not of type file  
 [cobertura-report] 21:55:13,636 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set  
 [cobertura-report] 21:55:13,643 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]  
 [cobertura-report] 21:55:13,653 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]  
 [cobertura-report] 21:55:13,684 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property  
 [cobertura-report] 21:55:13,748 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [net.sourceforge.cobertura] to INFO  
 [cobertura-report] 21:55:13,748 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to DEBUG  
 [cobertura-report] 21:55:13,748 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]  
 [cobertura-report] 21:55:13,749 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.  
 [cobertura-report] 21:55:13,751 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@285855bd - Registering current configuration as safe fallback point  
 [cobertura-report]   
 [cobertura-report] Cobertura 2.1.1 - GNU GPL License (NO WARRANTY) - See COPYRIGHT file  
 [cobertura-report] [INFO] Cobertura: Loaded information on 1 classes.  
 [cobertura-report] Report time: 159ms  
 summary-coverage-report:  
 [cobertura-report] 21:55:14,128 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]  
 [cobertura-report] 21:55:14,129 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]  
 [cobertura-report] 21:55:14,129 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml]  
 [cobertura-report] 21:55:14,131 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.  
 [cobertura-report] 21:55:14,131 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml]  
 [cobertura-report] 21:55:14,131 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1.jar!/logback.xml]  
 [cobertura-report] 21:55:14,161 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@52633079 - URL [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml] is not of type file  
 [cobertura-report] 21:55:14,234 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set  
 [cobertura-report] 21:55:14,239 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]  
 [cobertura-report] 21:55:14,250 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]  
 [cobertura-report] 21:55:14,281 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property  
 [cobertura-report] 21:55:14,334 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [net.sourceforge.cobertura] to INFO  
 [cobertura-report] 21:55:14,335 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to DEBUG  
 [cobertura-report] 21:55:14,335 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]  
 [cobertura-report] 21:55:14,336 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.  
 [cobertura-report] 21:55:14,338 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@6e038230 - Registering current configuration as safe fallback point  
 [cobertura-report]   
 [cobertura-report] Cobertura 2.1.1 - GNU GPL License (NO WARRANTY) - See COPYRIGHT file  
 [cobertura-report] [INFO] Cobertura: Loaded information on 1 classes.  
 [cobertura-report] Report time: 124ms  
 alternate-coverage-report:  
 [cobertura-report] 21:55:14,694 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]  
 [cobertura-report] 21:55:14,694 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]  
 [cobertura-report] 21:55:14,694 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml]  
 [cobertura-report] 21:55:14,695 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.  
 [cobertura-report] 21:55:14,695 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml]  
 [cobertura-report] 21:55:14,695 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1.jar!/logback.xml]  
 [cobertura-report] 21:55:14,727 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@5abce07 - URL [jar:file:/home/user/Desktop/cobertura-2.1.1/cobertura-2.1.1-sources.jar!/logback.xml] is not of type file  
 [cobertura-report] 21:55:14,814 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set  
 [cobertura-report] 21:55:14,821 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]  
 [cobertura-report] 21:55:14,832 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]  
 [cobertura-report] 21:55:14,874 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property  
 [cobertura-report] 21:55:14,934 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [net.sourceforge.cobertura] to INFO  
 [cobertura-report] 21:55:14,934 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to DEBUG  
 [cobertura-report] 21:55:14,935 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]  
 [cobertura-report] 21:55:14,935 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.  
 [cobertura-report] 21:55:14,937 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@52633079 - Registering current configuration as safe fallback point  
 [cobertura-report]   
 [cobertura-report] Cobertura 2.1.1 - GNU GPL License (NO WARRANTY) - See COPYRIGHT file  
 [cobertura-report] [INFO] Cobertura: Loaded information on 1 classes.  
 [cobertura-report] Report time: 171ms  
 coverage:  
 BUILD SUCCESSFUL  
 Total time: 9 seconds  

From the output, we see that a target called instrument ran and then test was called, and after that all the reports were generated. If you list the directory now, you should see a few additional files and directories: cobertura.ser, reports, classes and instrumented. All the generated reports are in the reports directory. The one relevant for this article is Cobertura's, and I will show you the cobertura-html report.



That's it. If you want to go further and integrate this code coverage tool into your own project, start by looking into the example build.xml. Have fun.

Saturday, August 1, 2015

Learning ActionScript with hello world

Today, we are going to learn ActionScript. First, what is ActionScript?

ActionScript is an object-oriented programming language originally developed by Macromedia Inc. (now dissolved into Adobe Systems). It is a derivation of HyperTalk, the scripting language for HyperCard.[2] It is a dialect of ECMAScript (meaning it is a superset of the syntax and semantics of the language more widely known as JavaScript).

ActionScript is used primarily for the development of websites and software targeting the Adobe Flash Player platform, used on Web pages in the form of embedded SWF files.

Okay, let's download the Adobe Flex SDK; it is available here. Then extract the SDK into a directory.

Let's get into the ActionScript hello world. Create a file HelloWorld.as:

 user@localhost:~/Desktop/flex$ cat HelloWorld.as  
 package {  
   import flash.display.*;  
   import flash.text.*;  
  
   public class HelloWorld extends Sprite {  
     private var greeting:TextField = new TextField();  
  
     public function HelloWorld() {  
       greeting.text = "Hello World!";  
       greeting.x = 200;  
       greeting.y = 200;  
       addChild(greeting);  
     }  
   }  
 }  

Now, let's compile the ActionScript into a SWF. From where you extracted the SDK, you can follow the steps below. I kept things simple, as this is my first time doing Flash development work, so I just put the file into the SDK folder.

 user@localhost:~/Desktop/flex$ ./bin/mxmlc HelloWorld.as  
 Loading configuration file /home/user/Desktop/flex/frameworks/flex-config.xml  
 /home/user/Desktop/flex/HelloWorld.as: Warning: This compilation unit did not have a factoryClass specified in Frame metadata to load the configured runtime shared libraries. To compile without runtime shared libraries either set the -static-link-runtime-shared-libraries option to true or remove the -runtime-shared-libraries option.  
  
 /home/user/Desktop/flex/HelloWorld.swf (649 bytes)  
 user@localhost:~/Desktop/flex$ ll HelloWorld.*  
 -rw-r--r-- 1 user user 312 Jul 13 22:20 HelloWorld.as  
 -rw-r--r-- 1 user user 649 Jul 14 00:32 HelloWorld.swf  

Apart from the compiler warning, a Shockwave Flash file is generated. Now you can render it through a SWF player or using HTML. The HTML is as simple as the following.

 user@localhost:~/Desktop/flex$ cat index.html  
 <object data="HelloWorld.swf"  
   type="application/x-shockwave-flash" width="500" height="500" >  
   <param name="movie" value="HelloWorld.swf" />  
 </object>  

Then open this file using your browser.



This is a start to dip your toe into the SWF water. You can follow this tutorial and article to become a better Flash developer! If you are just starting with Flash, I suggest you get an IDE, which will give you a great learning experience. If you have been following my articles, you should be aware that Eclipse has been mentioned many times; as such, you can use the FDT plugin for Shockwave development.

Friday, July 31, 2015

Attempting to understand Java garbage collection statistics

If you have been developing large Java applications, troubleshooting can at times go as deep as looking into the garbage collector while the application is running. Unfortunately, the statistics are just too much to begin investigating or trying to understand. At least for me it is pretty heavy going, so I seek your help too: if you come across this article, please leave a comment.

There is very little documentation describing how these statistics should be interpreted. There is this Oracle blog post dated 2006, probably too outdated to be fully relevant, but it nonetheless analyzes the output line by line. More recent articles from Alexy Ragozin and Poonam Bajaj are worth a look too.

The GC statistics below can be regenerated by passing these parameters on the java command line: -XX:+PrintGCDetails -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1. The following snippets were extracted from a production machine; let's take a look at them line by line.
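
For example, a complete invocation might look like the line below, where the application jar name is only a placeholder for whatever you normally launch:

 java -XX:+PrintGCDetails -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1 -jar myapp.jar  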

 Before GC:  
 Statistics for BinaryTreeDictionary:  
 ------------------------------------  
 Total Free Space: 230400  
 Max  Chunk Size: 230400  
 Number of Blocks: 1  
 Av. Block Size: 230400  
 Tree   Height: 1  
 586945.492: [ParNew  
 Desired survivor size 41943040 bytes, new threshold 1 (max 1)  
 - age  1:  10038008 bytes,  10038008 total  
 : 660426K->10292K(737280K), 0.0353470 secs] 9424156K->8774094K(12500992K)After GC:  
 Statistics for BinaryTreeDictionary:  
 ------------------------------------  
 Total Free Space: 127053189  
 Max  Chunk Size: 21404293  
 Number of Blocks: 125654  
 Av. Block Size: 1011  
 Tree   Height: 36  
   
   
   
 After GC:  
 Statistics for BinaryTreeDictionary:  
 ------------------------------------  
 Total Free Space: 230400  
 Max  Chunk Size: 230400  
 Number of Blocks: 1  
 Av. Block Size: 230400  
 Tree   Height: 1  
 , 0.0359540 secs] [Times: user=0.26 sys=0.00, real=0.03 secs]   
 Heap after GC invocations=550778 (full 2090):  
  par new generation  total 737280K, used 10292K [0x00000004fae00000, 0x000000052ce00000, 0x000000052ce00000)  
  eden space 655360K,  0% used [0x00000004fae00000, 0x00000004fae00000, 0x0000000522e00000)  
  from space 81920K, 12% used [0x0000000522e00000, 0x000000052380d360, 0x0000000527e00000)  
  to  space 81920K,  0% used [0x0000000527e00000, 0x0000000527e00000, 0x000000052ce00000)  
  concurrent mark-sweep generation total 11763712K, used 8763801K [0x000000052ce00000, 0x00000007fae00000, 0x00000007fae00000)  
  concurrent-mark-sweep perm gen total 40952K, used 24563K [0x00000007fae00000, 0x00000007fd5fe000, 0x0000000800000000)  
 }  
 Total time for which application threads were stopped: 0.0675660 seconds  
 {Heap before GC invocations=550778 (full 2090):  
  par new generation  total 737280K, used 11677K [0x00000004fae00000, 0x000000052ce00000, 0x000000052ce00000)  
  eden space 655360K,  0% used [0x00000004fae00000, 0x00000004faf5a220, 0x0000000522e00000)  
  from space 81920K, 12% used [0x0000000522e00000, 0x000000052380d360, 0x0000000527e00000)  
  to  space 81920K,  0% used [0x0000000527e00000, 0x0000000527e00000, 0x000000052ce00000)  
  concurrent mark-sweep generation total 11763712K, used 8763801K [0x000000052ce00000, 0x00000007fae00000, 0x00000007fae00000)  
  concurrent-mark-sweep perm gen total 40952K, used 24563K [0x00000007fae00000, 0x00000007fd5fe000, 0x0000000800000000)  

We can summarize the statistics above with the following points.

* the statistics above are generated by Java HotSpot, and the source code can be found here: https://github.com/openjdk-mirror/jdk7u-hotspot/blob/master/src/share/vm/gc_implementation/concurrentMarkSweep/binaryTreeDictionary.cpp#L1098-L1112

* there are two sets of statistics, before GC and after GC, and this is not a full GC.

* before GC, we notice the max chunk size is equal to the total free space, so the free space consists of a single block and is not fragmented.

* after GC, we also notice that the total free space is 127053189 while the max chunk size is only 21404293, spread over 125654 blocks, so this free space is fairly fragmented.

* after GC, the reported times are 0.26 seconds of user CPU and 0.03 seconds of real (wall-clock) time.

* after GC, the survivor "from" space is 12% used (see the quick arithmetic check after this list).

* after GC, the concurrent mark-sweep generation totals 11,763,712K, whilst the concurrent mark-sweep permanent generation totals 40,952K with only 24,563K used.

* the total time for which application threads were stopped was 0.0675660 seconds.
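
As a quick sanity check, the sizes printed in the heap layout are internally consistent. Below is a small sketch, just arithmetic on the figures from the log; the only assumption is that the default TargetSurvivorRatio of 50 is in effect.

 # Quick arithmetic check on the figures printed in the GC log above.
 eden_k     = 655360   # "eden space 655360K"
 survivor_k = 81920    # "from" / "to" space, 81920K each
 par_new_k  = 737280   # "par new generation total 737280K"
 
 # young generation total = eden + one survivor space (the other stays empty)
 print eden_k + survivor_k == par_new_k    # True
 
 # "Desired survivor size 41943040 bytes" is half of one survivor space,
 # which matches HotSpot's default TargetSurvivorRatio of 50 (assumed here).
 print 41943040 == survivor_k * 1024 / 2   # True
 
 # ParNew shrank the young generation from 660426K to 10292K in one pass.
 print (660426 - 10292) / 1024             # roughly 634 MB reclaimed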

So we can guess that this GC snippet is healthy. It is not a full GC, usage does not climb to 100%, and no failure or error appears anywhere. The total stop time is trivial too, well under a second.
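
A log from a production machine will contain thousands of these entries, so it helps to reduce them to a single number. Here is a minimal sketch that sums up the safepoint pauses; it assumes the log was saved to a file named gc.log (a hypothetical name) and that every pause is reported on a "Total time for which application threads were stopped" line like the one above.

 # Sum up all reported safepoint pauses in a GC log (hypothetical file gc.log).
 import re
 
 total = 0.0
 count = 0
 with open("gc.log") as log:
     for line in log:
         m = re.search(r"application threads were stopped: ([0-9.]+) seconds", line)
         if m:
             total += float(m.group(1))
             count += 1
 
 print "%d pauses, %.4f seconds stopped in total" % (count, total)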

That's it, and if you think this analysis is wrong and/or can be improved upon, please leave your message below. I would like to learn more too.

Sunday, July 19, 2015

Light learning using tsung on elasticsearch

Today we will learn another software tool, Tsung. As this is just an introductory article, we will not go into details such as interpreting the statistics or whether they make sense. The plan is to show how you can quickly install, set up, run a test and view results with Tsung, so you get acquainted with distributed load testing quickly. Let's first understand what Tsung is.

Tsung is an open-source multi-protocol distributed load testing tool. It can be used to stress HTTP, WebDAV, SOAP, PostgreSQL, MySQL, LDAP and Jabber/XMPP servers. Tsung is a free software released under the GPLv2 license. That we like sir!

Okay, HTTP is very common, so let's pick Elasticsearch 1.6.0 for this test. You can download Elasticsearch from elastic.co, extract the content and just launch it from the command line. See below.

 user@localhost:~/Desktop/elasticsearch-1.6.0/bin$ ./elasticsearch  
 [2015-07-04 09:46:10,756][INFO ][node           ] [Futurist] version[1.6.0], pid[16404], build[cdd3ac4/2015-06-09T13:36:34Z]  
 [2015-07-04 09:46:10,757][INFO ][node           ] [Futurist] initializing ...  
 [2015-07-04 09:46:10,762][INFO ][plugins         ] [Futurist] loaded [], sites []  
 [2015-07-04 09:46:10,863][INFO ][env           ] [Futurist] using [1] data paths, mounts [[/ (/dev/sda5)]], net usable_space [15.4gb], net total_space [215.2gb], types [ext3]  
 [2015-07-04 09:46:13,833][INFO ][node           ] [Futurist] initialized  
 [2015-07-04 09:46:13,834][INFO ][node           ] [Futurist] starting ...  
 [2015-07-04 09:46:13,960][INFO ][transport        ] [Futurist] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.2:9300]}  
 [2015-07-04 09:46:13,978][INFO ][discovery        ] [Futurist] elasticsearch/eSG-tzQuQdCz5QKIomYm5Q  
 [2015-07-04 09:46:17,774][INFO ][cluster.service     ] [Futurist] new_master [Futurist][eSG-tzQuQdCz5QKIomYm5Q][VerticalHorizon][inet[/192.168.1.2:9300]], reason: zen-disco-join (elected_as_master)  
 [2015-07-04 09:46:17,809][INFO ][http           ] [Futurist] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.2:9200]}  
 [2015-07-04 09:46:17,810][INFO ][node           ] [Futurist] started  
 [2015-07-04 09:46:17,934][INFO ][gateway         ] [Futurist] recovered [0] indices into cluster_state  

A single-node Elasticsearch instance with no index and no mapping. We just want something quick to use as the software under test.
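
The Tsung session shown further down queries /myindex/_search, and this node starts out empty, so every request would hit a missing index. If you want the load test to return real hits, you can seed a matching index first. The following is only a sketch; the index name and document are made up for illustration, and it assumes the node is listening on localhost:9200.

 # Seed a hypothetical index "myindex" so /myindex/_search has data to return.
 import json
 import urllib2
 
 ES = "http://localhost:9200"
 
 def put(path, body=None):
     # urllib2 sends a PUT once we override get_method and supply a body
     data = json.dumps(body) if body is not None else ""
     req = urllib2.Request(ES + path, data=data)
     req.get_method = lambda: "PUT"
     return json.loads(urllib2.urlopen(req).read())
 
 print put("/myindex")                                   # create the index
 print put("/myindex/doc/1", {"title": "hello tsung"})   # index one document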

Next, let's install Tsung. If you are using Debian, that's good news! See below.

 $ sudo apt-get install tsung  

Easy peasy. Okay, now let's create a directory called tsung in the user's home directory, then create two files. See below.

 user@localhost:~$ mkdir tsung  
 user@localhost:~$ cd tsung  
 user@localhost:~/tsung$ cat query.json   
 {"size":10,"query":{"filtered":{"query":{"match_all":{}}}}}  
 $ cat tsung.xml  
 <?xml version="1.0" encoding="utf-8"?>  
 <!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd" []>  
 <tsung loglevel="info">  
   
  <clients>  
   <client host="localhost" use_controller_vm="true" cpu="1" maxusers="30000000"/>  
  </clients>  
   
  <servers>  
   <server host="localhost" port="9200" type="tcp"/>  
  </servers>  
   
  <load>  
   <arrivalphase phase="1" duration="1" unit="minute">  
    <users arrivalrate="5" unit="second"/>  
   </arrivalphase>  
  </load>  
   
  <sessions>  
   <session name="es_load" weight="1" type="ts_http">  
    <request>  
    <http url="/myindex/_search"  
        method="GET"  
        contents_from_file="/home/user/tsung/query.json" />  
    </request>  
   </session>  
  </sessions>  
 </tsung>  
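
To get a feel for how much load this configuration generates: the single arrival phase lasts one minute and spawns five new users per second, and each user runs the es_load session once. A rough back-of-the-envelope sketch of the numbers involved, assuming every user arrives within the phase:

 # Rough sizing of the load defined in tsung.xml above.
 arrival_rate   = 5    # users per second, from <users arrivalrate="5" unit="second"/>
 phase_duration = 60   # seconds, from <arrivalphase ... duration="1" unit="minute">
 total_users = arrival_rate * phase_duration
 print total_users     # about 300 users, each issuing one GET to /myindex/_search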

Now we are ready to do the load test; you can start Tsung with the configuration file. See below.

 user@localhost:~/tsung$ tsung -f /home/user/tsung/tsung.xml start  
 Starting Tsung  
 "Log directory is: /home/user/.tsung/log/20150704-1104"  
 user@localhost:~/tsung$  

Give it a few minutes to run; you should be able to check the log output while it is going. Once it is done, let's generate the graphs.

 user@localhost:~/tsung$ ls  
 total 8.0K  
 -rw-r--r-- 1 user user 60 Jul 4 09:33 query.json  
 -rw-r--r-- 1 user user 724 Jul 4 11:04 tsung.xml  
 user@localhost:~/tsung$ mkdir show_me_da_graph  
 user@localhost:~/tsung$ cd show_me_da_graph/  
 user@localhost:~/tsung/show_me_da_graph$ /usr/lib/tsung/bin/tsung_stats.pl --stats /home/user/.tsung/log/20150704-1104/tsung.log   
 creating subdirectory data   
 creating subdirectory gnuplot_scripts   
 creating subdirectory images   
 warn, last interval (5) not equal to the first, use the first one (10)  
 No data for Bosh  
 No data for Match  
 No data for Event  
 No data for Async  
 No data for Errors  
 user@localhost:~/tsung/show_me_da_graph$ ls  
 total 32K  
 drwxr-xr-x 2 user user 4.0K Jul 4 11:07 data  
 drwxr-xr-x 2 user user 4.0K Jul 4 11:07 gnuplot_scripts  
 drwxr-xr-x 2 user user 4.0K Jul 4 11:07 images  
 -rw-r--r-- 1 user user 3.7K Jul 4 11:07 gnuplot.log  
 -rw-r--r-- 1 user user 7.9K Jul 4 11:07 report.html  
 -rw-r--r-- 1 user user 7.1K Jul 4 11:07 graph.html  
 user@localhost:~/tsung/show_me_da_graph$ chromium graph.html  
 [18334:18334:0704/110724:ERROR:nss_util.cc(819)] After loading Root Certs, loaded==false: NSS error code: -8018  
 Created new window in existing browser session.  

Now look at the beautiful output! If you don't have the Chromium browser, copy this link and paste it into your browser's URL bar: file:///home/user/tsung/show_me_da_graph/graph.html . Adjust the path as appropriate for your workstation. See the screenshot below.



If Tsung is what you like, consider going deeper into its details; Tsung provides comprehensive documentation! That's it for this article, I hope you learned something.



Saturday, July 18, 2015

Learn what Python method and variable mangling is

While I was studying Python, a unique word, mangling, caught my attention. There are two kinds of mangling, for methods and for variables. But first, let's take a look at what mangling is in Python. From the official PEP 8 documentation:

  If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores. This invokes Python's name mangling algorithm, where the name of the class is mangled into the attribute name. This helps avoid attribute name collisions should subclasses inadvertently contain attributes with the same name.

  Python mangles these names with the class name: if class Foo has an attribute named __a , it cannot be accessed by Foo.__a . (An insistent user could still gain access by calling Foo._Foo__a .) Generally, double leading underscores should be used only to avoid name conflicts with attributes in classes designed to be subclassed.

Okay, with that said and explained, let's hop into the code. We will create two classes, a parent and a child, both sharing the same method and variable names.

 $ cat mangle.py  
 class Parent(object):  
   __name = "John Smith"  
   def __init__(self):  
     self.__alive = False  
     self.__parentAlive = False  
   def __show_age(self):  
     print "65"  
   
 class Child(Parent):  
   __name = "John Smith Junior"  
   def __init__(self):  
     super(Child, self).__init__()  
     self.__alive = True  
     self.__childAlive = True  
   def __show_age(self):  
     print "34"  

Now import this module into the Python interpreter to understand how mangling works.

 $ python  
 Python 2.7.10 (default, Jun 1 2015, 16:21:46)   
 [GCC 4.9.2] on linux2  
 Type "help", "copyright", "credits" or "license" for more information.  
 >>> from mangle import Child  
 >>> johnny = Child()  
 >>> dir(johnny)  
 ['_Child__alive', '_Child__childAlive', '_Child__name', '_Child__show_age', '_Parent__alive', '_Parent__name', '_Parent__parentAlive', '_Parent__show_age', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']  
 >>> johnny._Child__alive  
 True  
 >>> johnny._Child__name  
 'John Smith Junior'  
 >>> johnny._Child__show_age()  
 34  
 >>> johnny._Parent__alive  
 False  
 >>> johnny._Parent__name  
 'John Smith'  
 >>> johnny._Parent__show_age()  
 65  
 >>>   

So with this, it should be clear that attributes with double leading underscores are stored on the Python object with an underscore and the class name prepended to the name. The same applies to method names too.
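
One more check worth doing is confirming that the unmangled name really is unreachable from outside the class: mangling is applied only to identifiers written inside a class body, so plain attribute access at module level looks up a literal __name and fails. A small follow-up, reusing the mangle.py module above:

 # Mangling only rewrites __name inside a class body, so attribute access
 # from module level looks for a literal "__name" and fails.
 from mangle import Child
 
 johnny = Child()
 try:
     johnny.__name
 except AttributeError as e:
     print e                   # 'Child' object has no attribute '__name'
 
 print johnny._Child__name     # 'John Smith Junior', the mangled name works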

Friday, July 17, 2015

Generate flame graph using FlameGraph goodies

Lately, I have been reading slideshares from Brendan Gregg on how to profile stacks in Linux. An example would be this slide share. There is an interesting part among his slides that uses a few commands to generate a flame graph. This flame graph project can be found in his GitHub project.

Today we are trying FlameGraph on my local system, a simple walkthrough using his great software. Okay, in one of his slides he gave a few commands, and I modified them a bit from his original version. Remember, I ran these commands on my Debian box.

 git clone --depth 1 https://github.com/brendangregg/FlameGraph  
 cd FlameGraph  
 perf record -F 99 -a -g -- sleep 30  
 perf script| ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg  

and in the terminal, the output is below.

 user@localhost:~$ git clone --depth 1 https://github.com/brendangregg/FlameGraph  
 Cloning into 'FlameGraph'...  
 remote: Counting objects: 50, done.  
 remote: Compressing objects: 100% (29/29), done.  
 remote: Total 50 (delta 24), reused 37 (delta 20), pack-reused 0  
 Unpacking objects: 100% (50/50), done.  
 Checking connectivity... done.  
 user@localhost:~$ cd FlameGraph  
 user@localhost:~/FlameGraph$ sudo perf record -F 99 -a -g -- sleep 30  
 /usr/bin/perf: line 24: exec: perf_4.0: not found  
 E: linux-tools-4.0 is not installed.  
 user@localhost:~/FlameGraph$ sudo perf record -F 99 -a -g -- sleep 30  
 [ perf record: Woken up 1 times to write data ]  
 [ perf record: Captured and wrote 1.719 MB perf.data (5082 samples) ]  
 user@localhost:~/FlameGraph$ perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg  
 failed to open perf.data: Permission denied  
 ERROR: No stack counts found  
 user@localhost:~/FlameGraph$ sudo perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg  
 Failed to open /tmp/perf-3763.map, continuing without symbols  
 Failed to open /tmp/perf-4908.map, continuing without symbols  
 Failed to open /usr/lib/i386-linux-gnu/libQtCore.so.4.8.6, continuing without symbols  
 Failed to open /lib/i386-linux-gnu/libglib-2.0.so.0.4200.1, continuing without symbols  
 Failed to open /tmp/perf-5995.map, continuing without symbols  
 Failed to open /tmp/perf-2337.map, continuing without symbols  
 Failed to open /tmp/perf-3012.map, continuing without symbols  
 no symbols found in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstplayback.so, maybe install a debug package?  
 Failed to open /usr/lib/i386-linux-gnu/libQtGui.so.4.8.6, continuing without symbols  
 Failed to open /tmp/perf-19187.map, continuing without symbols  
 no symbols found in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstmatroska.so, maybe install a debug package?  
 no symbols found in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstcoreelements.so, maybe install a debug package?  
 Failed to open /run/user/1000/orcexec.Sg4yUn, continuing without symbols  
 no symbols found in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstfaad.so, maybe install a debug package?  
 Failed to open /usr/bin/skype, continuing without symbols  
 no symbols found in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstlibav.so, maybe install a debug package?  
 user@localhost:~/FlameGraph$  

As you can see above, you need perf to be installed; perf is provided by the package linux-tools-4.0, and you need root permission to run the perf command. It takes a little while (30 seconds here, from the sleep 30) to collect the statistics, and then you can use the scripts again to generate the SVG. Now you should be able to view the SVG file using eog or GIMP. See below for the flame graph generated on my workstation. :) Note, I had to convert the SVG to JPG to upload it to Blogger.