Friday, July 17, 2015

Generate flame graph using FlameGraph goodies

Lately, I have been reading Brendan Gregg's SlideShare presentations on Linux performance monitoring and stack profiling. An example would be this slide deck. Among his slides there is an interesting one that uses a few commands to generate a flame graph. The FlameGraph project itself can be found on his GitHub.

Today, we are trying FlameGraph on my local system: a simple walkthrough of his great software. In one of his slides he gives a few commands, which I modified a bit from his original version. Note that I ran these commands on my Debian box.

 git clone --depth 1 https://github.com/brendangregg/FlameGraph  
 cd FlameGraph  
 sudo perf record -F 99 -a -g -- sleep 30  
 sudo perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg  

And here is the output in the terminal.

 user@localhost:~$ git clone --depth 1 https://github.com/brendangregg/FlameGraph  
 Cloning into 'FlameGraph'...  
 remote: Counting objects: 50, done.  
 remote: Compressing objects: 100% (29/29), done.  
 remote: Total 50 (delta 24), reused 37 (delta 20), pack-reused 0  
 Unpacking objects: 100% (50/50), done.  
 Checking connectivity... done.  
 user@localhost:~$ cd FlameGraph  
 user@localhost:~/FlameGraph$ sudo perf record -F 99 -a -g -- sleep 30  
 /usr/bin/perf: line 24: exec: perf_4.0: not found  
 E: linux-tools-4.0 is not installed.  
 user@localhost:~/FlameGraph$ sudo perf record -F 99 -a -g -- sleep 30  
 [ perf record: Woken up 1 times to write data ]  
 [ perf record: Captured and wrote 1.719 MB perf.data (5082 samples) ]  
 user@localhost:~/FlameGraph$ perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg  
 failed to open perf.data: Permission denied  
 ERROR: No stack counts found  
 user@localhost:~/FlameGraph$ sudo perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg  
 Failed to open /tmp/perf-3763.map, continuing without symbols  
 Failed to open /tmp/perf-4908.map, continuing without symbols  
 Failed to open /usr/lib/i386-linux-gnu/libQtCore.so.4.8.6, continuing without symbols  
 Failed to open /lib/i386-linux-gnu/libglib-2.0.so.0.4200.1, continuing without symbols  
 Failed to open /tmp/perf-5995.map, continuing without symbols  
 Failed to open /tmp/perf-2337.map, continuing without symbols  
 Failed to open /tmp/perf-3012.map, continuing without symbols  
 no symbols found in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstplayback.so, maybe install a debug package?  
 Failed to open /usr/lib/i386-linux-gnu/libQtGui.so.4.8.6, continuing without symbols  
 Failed to open /tmp/perf-19187.map, continuing without symbols  
 no symbols found in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstmatroska.so, maybe install a debug package?  
 no symbols found in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstcoreelements.so, maybe install a debug package?  
 Failed to open /run/user/1000/orcexec.Sg4yUn, continuing without symbols  
 no symbols found in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstfaad.so, maybe install a debug package?  
 Failed to open /usr/bin/skype, continuing without symbols  
 no symbols found in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstlibav.so, maybe install a debug package?  
 user@localhost:~/FlameGraph$  

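As you can see above, you need perf to be installed. On Debian, perf is provided by a package matching your kernel version (here linux-tools-4.0, as the error message says), and you need root permission to run the perf command. perf record samples the stacks on all CPUs (-a) at 99 Hz (-F 99), recording call graphs (-g), for as long as the command after -- runs; here that is sleep 30, so collection takes 30 seconds. Then run perf script through the two scripts to generate the SVG. perf script also needs sudo, because perf.data was written by root. The "Failed to open" and "no symbols found" lines are only warnings: perf could not find symbols for those binaries (for example, missing debug packages, or the /tmp/perf-PID.map files that JIT runtimes can write), so those frames will not be named in the graph. Now you should be able to view the SVG file using eog or GIMP; a web browser also works and lets you click a frame to zoom in. See below for the flame graph generated on my workstation. :) Note, I had to convert the SVG to JPG to upload it to Blogger.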
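The stackcollapse-perf.pl step turns the per-sample stacks printed by perf script into "folded" lines: one line per unique stack, frames joined by semicolons from root to leaf, followed by the number of samples. flamegraph.pl then draws one box per frame, with a width proportional to that count. Below is a minimal Java sketch of just the folding idea, assuming each sample has already been parsed into a list of frame names in the leaf-first order perf script prints them. The class FoldStacks and the sample frames are hypothetical; the real Perl script also parses perf's text output, adds the process name as the first frame and tidies up symbol names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FoldStacks {

    /**
     * Folds stack samples into "root;...;leaf" keys with a sample count.
     * Each sample lists its frames leaf first, as perf script prints them.
     */
    public static Map<String, Integer> fold(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted only for stable output
        for (List<String> sample : samples) {
            List<String> rootFirst = new ArrayList<>(sample);
            Collections.reverse(rootFirst); // flame graphs put the root at the bottom
            // identical stacks collapse into one key and their counts are summed
            counts.merge(String.join(";", rootFirst), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical samples: two identical stacks and one different one.
        List<List<String>> samples = Arrays.asList(
                Arrays.asList("memcpy", "read_file", "main"),
                Arrays.asList("memcpy", "read_file", "main"),
                Arrays.asList("parse", "main"));
        fold(samples).forEach((stack, n) -> System.out.println(stack + " " + n));
    }
}
```

Running main prints main;parse 1 followed by main;read_file;memcpy 2, which is the text format flamegraph.pl reads.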






Sunday, July 5, 2015

Checking out what a Python package is

It's been a while since I started learning Python, and today I would like to check out what a Python package is. These two references define a Python package pretty clearly.

Packages are a way of structuring Python's module namespace by using "dotted module names". For example, the module name ‘A.B’ designates a submodule named ‘B’ in a package named ‘A’. Just like the use of modules saves the authors of different modules from having to worry about each other's global variable names, the use of dotted module names saves the authors of multi-module packages like NumPy or the Python Imaging Library from having to worry about each other's module names. 

And from learnpython.org:

Packages are namespaces which contain multiple packages and modules themselves. They are simply directories, but with a twist. 
Each package in Python is a directory which MUST contain a special file called __init__.py. This file can be empty, and it indicates that the directory it contains is a Python package, so it can be imported the same way a module can be imported. 

If you come from a Java background: a Java package is essentially a directory of classes, declared by a package statement at the top of each source file. In Python, you mark a directory as a package by creating a special file called __init__.py in it, which may be empty.

So, something like this:

router_statistics
    __init__.py
    routerStats.py
    test
        __init__.py
        router_stats_test.py

The file structure above is from my GitHub project. We have a Python package, router_statistics, with a module routerStats.py. Inside it, we have a test package with a test module, router_stats_test.py.
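The dotted module name maps directly onto this directory layout. As a rough illustration in Java (the language used for most code on this blog), the hypothetical helper below turns a dotted module name into the module's file path and lists the __init__.py file each enclosing package directory needs. It assumes regular packages only; since Python 3.3, namespace packages without __init__.py are also possible (PEP 420), which this sketch ignores.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DottedModuleName {

    /** File holding the module, e.g. a.b.c -> a/b/c.py (hypothetical helper). */
    public static Path moduleFile(String dotted) {
        String[] parts = dotted.split("\\.");
        parts[parts.length - 1] = parts[parts.length - 1] + ".py"; // last part is the module
        return Paths.get(parts[0], Arrays.copyOfRange(parts, 1, parts.length));
    }

    /** __init__.py files of every enclosing package, outermost first. */
    public static List<Path> initFiles(String dotted) {
        String[] parts = dotted.split("\\.");
        List<Path> result = new ArrayList<>();
        Path dir = null;
        // every part except the last one is a package directory
        for (int i = 0; i < parts.length - 1; i++) {
            dir = (dir == null) ? Paths.get(parts[i]) : dir.resolve(parts[i]);
            result.add(dir.resolve("__init__.py"));
        }
        return result;
    }

    public static void main(String[] args) {
        String name = "router_statistics.test.router_stats_test";
        System.out.println(moduleFile(name));
        initFiles(name).forEach(System.out::println);
    }
}
```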

Pretty neat :) That's all for this light learning experience.




Saturday, July 4, 2015

Light walkthrough on Groovy

Today, we will learn another language, Groovy. It can be used as a scripting language, much like Perl and Python. First, let's understand what Groovy is. From Wikipedia:

Groovy is an object-oriented programming language for the Java platform. It is a dynamic language with features similar to those of Python, Ruby, Perl, and Smalltalk. It can be used as a scripting language for the Java Platform, is dynamically compiled to Java Virtual Machine (JVM) bytecode, and interoperates with other Java code and libraries. Groovy uses a Java-like curly-bracket syntax. Most Java code is also syntactically valid Groovy, although semantics may be different.

Groovy 1.0 was released on January 2, 2007, and Groovy 2.0 in July, 2012. Groovy 3.0 is planned for release in late 2015, with support for a new Meta Object Protocol.[2] Since version 2, Groovy can also be compiled statically, offering type inference and performance very close to that of Java.[3][4] Groovy 2.4 was the last major release under Pivotal Software's sponsorship which ended in March 2015.[5]

A few current facts, summarized from the official Groovy site.


Because Groovy is compiled to JVM bytecode and runs on the JVM, you need to check which JVM version your Groovy version requires. Below is the table.

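Groovy branch    JVM required (non-indy)    JVM required (indy)
2.3 - current    1.6                        1.7
2.0 - 2.2        1.5                        1.7
1.6 - 1.8        1.5                        N/A
1.0 - 1.5        1.4                        N/A

(indy = the invokedynamic-enabled build of Groovy, which relies on invokedynamic support added in Java 7)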

Okay, let's start with a Groovy hello world. Groovy provides three quick ways to show a "hello world" application: via the Groovy console, a Groovy script, or the Groovy shell.

 $ cat hello.groovy  
 #!/usr/bin/env groovy  
   
 println "Hello world!"  
 $ groovy hello.groovy  
 Hello world!  

$ groovyConsole


1:  $ groovysh   
2:  Groovy Shell (1.8.6, JVM: 1.7.0_55)  
3:  Type 'help' or '\h' for help.  
4:  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
5:  groovy:000> println "hello world"  
6:  hello world  
7:  ===> null  
8:  groovy:000>   
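The Groovy script above is a single println line. For comparison, here is a sketch of the same program in plain Java (the class name HelloJava is just illustrative): Java needs an explicit class, a static main method and System.out, which Groovy's script mode supplies for you. Since most Java code is also valid Groovy, a class like this should also run under the groovy command, although, as the Wikipedia excerpt warns, semantics can differ in less trivial code.

```java
public class HelloJava {

    // Kept separate from main so the greeting can be checked without capturing stdout.
    public static String greeting() {
        return "Hello world!";
    }

    // Java requires an explicit entry point; a Groovy script does not.
    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```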

So that's it. If you want to learn more about Groovy, here are a few FAQs with helpful links.

How does it differ from Java?
http://www.groovy-lang.org/differences.html

Can you give me a few examples?
http://www.groovy-lang.org/groovy-dev-kit.html

Show me the syntax?
http://www.groovy-lang.org/syntax.html

Operators?
http://www.groovy-lang.org/operators.html

Groovy compiler?
http://www.groovy-lang.org/groovyc.html

Groovy shell?
http://www.groovy-lang.org/groovysh.html

Groovy console?
http://www.groovy-lang.org/groovyconsole.html


Friday, July 3, 2015

How can big data help law firms?

Today, we are going to do something a little different from our usual learning journey: not purely about information technology, but related to it. Let me explain further. I was reading about the Malaysian Personal Data Protection Act 2010 (PDPA 2010) on this blog. Law is not my profession, but reading this article as an information technology professional gave me several ideas.

Reading this article is, no offence, a daunting activity. It is a long blog post and dull. :-) Nonetheless, every word is equally important in defining what the act means and what scope it encompasses. I think an information retrieval application like Elasticsearch would be a good match for this: index all the words in the articles, then search quickly and show which act, section and article reference them. It would be even better with scoring, so that more relevant documents are shown first. That is something a lawyer would probably want, to quickly find the relevant documents to read further. I'm sure there are books with thousands of pages, and remembering every single line of the acts is almost impossible, or at least impractical. Information technology can fill this gap for them.
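To make the indexing idea concrete, here is a toy inverted index in Java. It is only a sketch of the principle behind tools like Elasticsearch, not how Elasticsearch is implemented: each word maps to the documents containing it and how often it occurs there, and a query's score is simply the sum of those counts, whereas Elasticsearch uses a more sophisticated relevance formula. The class TinyIndex and the document ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TinyIndex {
    // word -> (document id -> number of times the word occurs in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    /** Splits text into lower-case words and records every occurrence. */
    public void add(String docId, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split yields "" when the text starts with punctuation
            }
            postings.computeIfAbsent(word, w -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    /** Returns matching document ids, highest score first, ties broken by id. */
    public List<String> search(String query) {
        Map<String, Integer> scores = new HashMap<>();
        for (String word : query.toLowerCase().split("\\W+")) {
            postings.getOrDefault(word, Collections.emptyMap())
                    .forEach((doc, count) -> scores.merge(doc, count, Integer::sum));
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("doc-a", "personal data must be processed lawfully");
        index.add("doc-b", "personal data and personal consent");
        System.out.println(index.search("personal data"));
    }
}
```

Here search("personal data") returns [doc-b, doc-a]: "personal" occurs twice in doc-b, so it scores 3 against doc-a's 2.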

For law students, this is especially useful, as it would speed up the way they learn law. Nobody wants to sit in a library for twelve hours a day reading 1000 pages. I think what drives people is active learning, not passive reading. So with Elasticsearch, they could quickly search using legal terminology, and the results would show them the books that best serve their interest.

For each court case, the transcripts, or indeed any text data, can be digitized into queryable data. Then, with information retrieval tools like Elasticsearch, that data can be turned into information. I believe a High Court case can take months or even years to complete; quickly digitizing this data so it can be referenced later, whether later in the same case or in the next one, would take a law firm to the next level.

Of course, this is just my opinion, expressed from the information technology point of view (as, rightfully, I.T. is my profession). Please feel free to comment and suggest improvements. Thank you.

Sunday, June 21, 2015

Learning JavaFX on eclipse luna

Today, we will learn JavaFX using Eclipse Luna as the IDE. This is a first step to get acquainted with the basics of JavaFX in the Eclipse development environment; essentially, it is a 'hello world' application. First, let's take a look at what JavaFX is. From Wikipedia:

JavaFX is a software platform for creating and delivering rich internet applications (RIAs) that can run across a wide variety of devices. JavaFX is intended to replace Swing as the standard GUI library for Java SE, but both will be included for the foreseeable future.[3] JavaFX has support for desktop computers and web browsers on Microsoft Windows, Linux, and Mac OS X.

Okay, so JavaFX is about GUI development. With that said, let's start with a simple hello world GUI application for JavaFX. This article assumes your Java project uses Java 8 and that Eclipse Luna is already set up. Below is the sample code.

 package play.learn.java.fx;  
   
 import javafx.application.Application;  
 import javafx.event.ActionEvent;  
 import javafx.event.EventHandler;  
 import javafx.scene.Scene;  
 import javafx.scene.control.Button;  
 import javafx.scene.layout.StackPane;  
 import javafx.stage.Stage;  
   
 public class HelloWorld extends Application {  
   
     // start() is the entry point JavaFX calls once the toolkit is ready  
     @Override  
     public void start(Stage primaryStage) throws Exception {  
         Button btn = new Button();  
         btn.setText("Say 'Hello World'");  
         // print to the console every time the button is clicked  
         btn.setOnAction(new EventHandler<ActionEvent>() {  
   
             @Override  
             public void handle(ActionEvent event) {  
                 System.out.println("Hello World!");  
             }  
         });  
   
         // StackPane is the root of the scene graph; the button is its only child  
         StackPane root = new StackPane();  
         root.getChildren().add(btn);  
   
         // a 300 x 250 pixel scene, shown in the primary stage (the window)  
         Scene scene = new Scene(root, 300, 250);  
   
         primaryStage.setTitle("Hello World!");  
         primaryStage.setScene(scene);  
         primaryStage.show();  
     }  
   
     // main() lets the class run from the IDE; launch() ends up calling start()  
     public static void main(String[] args) {  
         launch(args);  
     }  
 }  


Eclipse shows a warning about restricted access to the JavaFX API here. In short, the JavaFX library is not imported into the project by default, so you have to add it manually. It's simple: right-click the project and select Properties. In the window that pops up, select Java Build Path, and on the Libraries tab click 'Add External JARs...'. Now locate where Java 8 is installed and select the jar file named jfxrt.jar. Its location is relative to JAVA_HOME: <JAVA_HOME>/jre/lib/ext/




When that is done, the warning should disappear. Now run the application: a window should pop up. Click the button and look at the Eclipse console; you should see "Hello World!". Below are a few remarks to help you understand the basics of this application.

Here are the important things to know about the basic structure of a JavaFX application:


  •     The main class for a JavaFX application extends the javafx.application.Application class. The start() method is the main entry point for all JavaFX applications.
  •     A JavaFX application defines the user interface container by means of a stage and a scene. The JavaFX Stage class is the top-level JavaFX container. The JavaFX Scene class is the container for all content. Example 3-1 creates the stage and scene and makes the scene visible in a given pixel size.
  •     In JavaFX, the content of the scene is represented as a hierarchical scene graph of nodes. In this example, the root node is a StackPane object, which is a resizable layout node. This means that the root node's size tracks the scene's size and changes when the stage is resized by a user.
  •     The root node contains one child node, a button control with text, plus an event handler to print a message when the button is pressed.
  •     The main() method is not required for JavaFX applications when the JAR file for the application is created with the JavaFX Packager tool, which embeds the JavaFX Launcher in the JAR file. However, it is useful to include the main() method so you can run JAR files that were created without the JavaFX Launcher, such as when using an IDE in which the JavaFX tools are not fully integrated. Also, Swing applications that embed JavaFX code require the main() method.


The points above are excerpts from the official documentation. The code can also be found here. That's it; have fun exploring more of JavaFX.

Saturday, June 20, 2015

Fix corrupted ods file

If you have been working with spreadsheets, then one day you may open a file and, for some unknown reason, it shows gibberish text. You will go, OH MY GAWD!! Where is my file!!?? Well, fear not: today, we will try to recover the file. To be exact, the spreadsheet is in ODS format, from OpenOffice. You can find more information here.

A normal, working ODS file starts with the bytes PK (50 4b in hex). See the example below.

 $ hexdump -C myfile.ods | head -1  
 00000000 50 4b 03 04 14 00 00 08 00 00 b7 71 c5 46 85 6c |PK.........q.F.l|  

The broken one does not start with PK; for my spreadsheet, it looks like the following. Yours may be different, but that does not matter.

 $ hexdump -C myfile.ods | head -1  
 00000000 2c 75 73 65 72 2c 55 73 65 72 57 6f 72 6b 73 |,user,UserWorks|  

Because an OpenDocument file is actually a ZIP archive (PK is the ZIP file signature), you can fix it using the zip utility. To fix it, run a command such as the one below.
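The PK check above can also be scripted. Here is a minimal Java sketch, assuming you only want to compare the first four bytes against the ZIP local file header signature 50 4b 03 04; the class name OdsSignature is hypothetical. A file that passes may still be damaged further in, so this only catches the kind of corruption shown above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OdsSignature {
    // ZIP local file header signature: "PK" followed by 0x03 0x04
    private static final byte[] ZIP_MAGIC = {0x50, 0x4b, 0x03, 0x04};

    /** True if the given leading bytes start with the ZIP signature. */
    public static boolean startsWithZipMagic(byte[] head) {
        if (head.length < ZIP_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < ZIP_MAGIC.length; i++) {
            if (head[i] != ZIP_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the first four bytes of the file and checks them. */
    public static boolean looksLikeZip(Path file) throws IOException {
        byte[] head = new byte[ZIP_MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int total = 0;
            while (total < head.length) { // read() may return fewer bytes than asked
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                total += n;
            }
        }
        return startsWithZipMagic(head);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean ok = looksLikeZip(Paths.get(name));
            System.out.println(name + (ok ? ": ZIP signature found" : ": no ZIP signature, possibly corrupted"));
        }
    }
}
```

Run it as java OdsSignature myfile.ods; for a file like the broken one above, whose first byte is 2c, it reports that the signature is missing.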

 user@localhost:~$ zip --fixfix myfile.ods --out myfixfile.ods   
 Fix archive (-FF) - salvage what can  
  Found end record (EOCDR) - says expect single disk archive  
 Scanning for entries...  
  copying: Object 1/styles.xml (398 bytes)  
  copying: Object 1/content.xml (1892 bytes)  
  copying: Object 1/meta.xml (281 bytes)  
  copying: Object 2/content.xml (1999 bytes)  
  copying: Object 2/meta.xml (281 bytes)  
  copying: Object 2/styles.xml (483 bytes)  
  copying: Object 3/content.xml (2116 bytes)  
  copying: Object 3/meta.xml (281 bytes)  
  copying: Object 3/styles.xml (398 bytes)  
  copying: styles.xml (1999 bytes)  
  copying: Object 4/meta.xml (281 bytes)  
  copying: Object 4/content.xml (2405 bytes)  
  copying: Object 4/styles.xml (398 bytes)  
  copying: content.xml (17364 bytes)  
  copying: meta.xml (441 bytes)  
  copying: ObjectReplacements/Object 1 (2278 bytes)  
  copying: ObjectReplacements/Object 2 (3654 bytes)  
  copying: ObjectReplacements/Object 3 (1924 bytes)  
  copying: ObjectReplacements/Object 4 (2483 bytes)  
  copying: META-INF/manifest.xml (449 bytes)  
 Central Directory found...  
 no local entry: mimetype  
 no local entry: settings.xml  
 no local entry: manifest.rdf  
 no local entry: Configurations2/menubar/  
 no local entry: Configurations2/toolpanel/  
 no local entry: Configurations2/progressbar/  
 no local entry: Configurations2/accelerator/current.xml  
 no local entry: Configurations2/statusbar/  
 no local entry: Configurations2/images/Bitmaps/  
 no local entry: Configurations2/toolbar/  
 no local entry: Configurations2/floater/  
 no local entry: Configurations2/popupmenu/  
 no local entry: Thumbnails/thumbnail.png  
 EOCDR found ( 1 73809)...  

The command above salvages whatever it can. As you might have guessed, the fixed file is the one specified by the --out parameter.

This method worked superbly for my corrupted file. The fixed file contained all the data as before, and I was happy. :) I hope it works for you too. That's it for today's learning. Good luck!

Friday, June 19, 2015

Learn lucene term range query

Today, we are going to learn about the Lucene term range query. But first, what actually is a term range query? From the official javadoc:

A Query that matches documents within an range of terms.

This query matches the documents looking for terms that fall into the supplied range according to Byte.compareTo(Byte). It is not intended for numerical ranges; use NumericRangeQuery instead.

This query uses the MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method.

So it is a byte-by-byte comparison against the two bounds of the range, and because of that, the ordering is lexicographic. If you want to find a range between two numbers, this is not the class you should use. If this is not clear yet, let's go into the code, shall we?

As you know, working with Lucene has two parts: indexing (write) and searching (query). In this article, we are going to index some files and then query them using a term range query. To give you an overview, there are four classes.

  • LuceneConstants - just a setting class for this application.
  • Indexer - the class that does the indexing. 
  • Searcher - the class that does the search.
  • LearnTermRangeQuery - our main entry class, which ties the three classes above together. 
We create an object, tester, for this learning journey. We then create the index by calling the method createIndex() and search the index using a term range query.


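 LearnTermRangeQuery tester;  
   
 try {  
     tester = new LearnTermRangeQuery();  
     // index every sample file from dataDir into indexDir  
     tester.createIndex();  
     // search for documents whose term falls in the range bounded by these two file names  
     tester.searchUsingTermRangeQuery("record2.txt", "record6.txt");  
 } catch (Exception e) {  
     // report the failure instead of silently swallowing it  
     e.printStackTrace();  
 }  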

In the method createIndex(), I use some lambdas, which you can spot by the arrow symbol (->), so you need Java 8 installed. There are two variables, indexDir and dataDir. The variable indexDir is the directory where the created index resides, whilst dataDir holds the sample data to be indexed. In the class Indexer, the method getDocument() builds the document for each sample file to be indexed. Nothing fancy, just an ordinary Lucene document with three fields: filename, filepath and file content.

Back to the class LearnTermRangeQuery and its method searchUsingTermRangeQuery(). Notice that we search a range with two file names as the boundaries. We initialize a Lucene Directory object and pass it to the index searcher. Everything else about the Lucene index searcher is standard: we construct the TermRangeQuery and pass it to the searcher object. The results are then printed, and everything is closed at the end.

Below is the sample output from the Eclipse console.

 record 21.txt  
 src/resources/samples.termrange/record 21.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record 21.txt  
 record 33 .txt  
 src/resources/samples.termrange/record 33 .txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record 33 .txt  
 record10.txt  
 src/resources/samples.termrange/record10.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record10.txt  
 record7.txt  
 src/resources/samples.termrange/record7.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record7.txt  
 record6.txt  
 src/resources/samples.termrange/record6.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record6.txt  
 record9.txt  
 src/resources/samples.termrange/record9.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record9.txt  
 record33.txt  
 src/resources/samples.termrange/record33.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record33.txt  
 record2.txt  
 src/resources/samples.termrange/record2.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record2.txt  
 record5.txt  
 src/resources/samples.termrange/record5.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record5.txt  
 record 33.txt  
 src/resources/samples.termrange/record 33.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record 33.txt  
 record3.txt  
 src/resources/samples.termrange/record3.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record3.txt  
 record8.txt  
 src/resources/samples.termrange/record8.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record8.txt  
 record2.1.txt  
 src/resources/samples.termrange/record2.1.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record2.1.txt  
 record1.txt  
 src/resources/samples.termrange/record1.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record1.txt  
 record4.txt  
 src/resources/samples.termrange/record4.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record4.txt  
 record22.txt  
 src/resources/samples.termrange/record22.txt  
 Indexing /home/user/eclipse/test/src/resources/samples.termrange/record22.txt  
 16 File indexed, time taken: 800 ms  
 6 documents found. Time :74ms  
 File : /home/user/eclipse/test/src/resources/samples.termrange/record33.txt  
 File : /home/user/eclipse/test/src/resources/samples.termrange/record2.txt  
 File : /home/user/eclipse/test/src/resources/samples.termrange/record5.txt  
 File : /home/user/eclipse/test/src/resources/samples.termrange/record3.txt  
 File : /home/user/eclipse/test/src/resources/samples.termrange/record4.txt  
 File : /home/user/eclipse/test/src/resources/samples.termrange/record22.txt  
   

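As you can see above, the results are not what you would expect if you read the file names numerically, from record2.txt to record6.txt: because the comparison is lexicographic, record22.txt and record33.txt are included, while record6.txt itself is not returned. So always experiment with a few values before you implement. Hehe, have fun! You can get the source for this code at my GitHub.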
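To see why, you can reproduce the ordering without Lucene. This sketch (the class name LexicographicRange is hypothetical) sorts the same 16 file names with plain string comparison and takes the names from record2.txt up to, but not including, record6.txt. The exclusive upper bound is an assumption chosen to match the result above, where record6.txt was not returned. For ASCII names like these, String.compareTo gives the same order as Lucene's unsigned byte-by-byte comparison of UTF-8 terms, and comparison stops at the first differing character: '2' in record22.txt is less than '6' in record6.txt, while record2.1.txt sorts before record2.txt because '1' is less than 't', and names with a space sort before all the others.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

public class LexicographicRange {
    // The file names indexed in the run above.
    static final String[] NAMES = {
        "record 21.txt", "record 33 .txt", "record10.txt", "record7.txt",
        "record6.txt", "record9.txt", "record33.txt", "record2.txt",
        "record5.txt", "record 33.txt", "record3.txt", "record8.txt",
        "record2.1.txt", "record1.txt", "record4.txt", "record22.txt"};

    /** Names in [lower, upper): lower bound inclusive, upper bound exclusive. */
    public static NavigableSet<String> range(String lower, String upper) {
        // TreeSet orders the names with String.compareTo, i.e. lexicographically
        TreeSet<String> sorted = new TreeSet<>(Arrays.asList(NAMES));
        return sorted.subSet(lower, true, upper, false);
    }

    public static void main(String[] args) {
        System.out.println(range("record2.txt", "record6.txt"));
    }
}
```

This prints [record2.txt, record22.txt, record3.txt, record33.txt, record4.txt, record5.txt], the same six files Lucene returned. If you really need numeric ranges, use NumericRangeQuery as the javadoc suggests, or zero-pad the numbers in the names (record02.txt, record06.txt) so that lexicographic order matches numeric order.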