Sunday, March 1, 2015

Study of the logic behind elasticsearch discovery.zen.fd ping_timeout and ping_retries

Today, we are going to study two parameters in elasticsearch version 0.90.7, specifically:

  • discovery.zen.fd.ping_timeout

  • discovery.zen.fd.ping_retries


Let's find the definitions in the official documentation, here and here.
There are two fault detection processes running. The first is by the master, to ping all the other nodes in the cluster and verify that they are alive. And on the other end, each node pings to master to verify if its still alive or an election process needs to be initiated.

ping_timeout: How long to wait for a ping response. Defaults to 30s.
ping_retries: How many ping failures / timeouts cause a node to be considered failed. Defaults to 3.

So these settings are used by the master node and by the other nodes to detect whether a node is okay. Each ping waits up to 30 seconds (the default ping_timeout) for a response, and once 3 pings in a row have failed (the default ping_retries), the node is considered down/failed. Okay, let's go into the code.
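Before going into the code, here is a rough, back-of-the-envelope estimate of how long a silently hanging node could go unnoticed with the defaults. It assumes that every ping waits the full ping_timeout, that a failed ping is resent right away (as the NodesFaultDetection walkthrough below describes), and a ping_interval (discovery.zen.fd.ping_interval) of 1 second. The class FdDetectionEstimate is a hypothetical helper, not elasticsearch code, and real detection may be faster, for example when the connection is dropped outright.

```java
// Hypothetical estimate, not elasticsearch code: the worst-case time before a
// silently hanging node is declared failed by zen fault detection.
public class FdDetectionEstimate {

    // intervalMs: ping_interval, timeoutMs: ping_timeout, retries: ping_retries
    public static long worstCaseMillis(long intervalMs, long timeoutMs, int retries) {
        // wait one interval before the first ping, then assume every attempt
        // times out in full and the next attempt is sent immediately
        return intervalMs + retries * timeoutMs;
    }

    public static void main(String[] args) {
        // defaults: 1s interval, 30s timeout, 3 retries
        System.out.println(worstCaseMillis(1000L, 30000L, 3) + " ms");
    }
}
```

With the defaults this comes to roughly a minute and a half, which is why lowering ping_timeout is the usual lever when faster detection is wanted.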

NodesFaultDetection.java

Within the class NodesFaultDetection, there is an inner class SendPingRequest. As it implements the Runnable interface, its run method is executed by an executor. No real network ping takes place; instead, objects are written and read (serialized) to emulate ping behaviour, so read the class PingRequest for more information. As you noticed, ping_timeout is passed to the super class of PingRequest.

The essence of the logic is in the statement transportService.sendRequest(final DiscoveryNode node, final String action, final TransportRequest request, final TransportRequestOptions options, TransportResponseHandler<T> handler). You might think it would be an ICMP ping, but it is not; isReachable() is never called.

In the method handleResponse(PingResponse response), the retry count is reset to 0 and this SendPingRequest object is scheduled again after ping_interval (discovery.zen.fd.ping_interval, 1 second by default). In the method handleException(TransportException exp), the variable retryCount is increased by one, and once it reaches ping_retries (3 by default), the node is considered dead and is removed. While the retry count is still below that, another ping is sent right away with the same ping_timeout.
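The retry bookkeeping described above can be modelled in a few lines. This is a simplified sketch under the assumption that the description above is complete; NodeFdModel and its methods are hypothetical names, not the elasticsearch class, and no networking is involved.

```java
// Hypothetical model of the NodesFaultDetection retry counter, not elasticsearch code.
public class NodeFdModel {

    private final int pingRetries; // discovery.zen.fd.ping_retries
    private int retryCount = 0;
    private boolean failed = false;

    public NodeFdModel(int pingRetries) {
        this.pingRetries = pingRetries;
    }

    // a ping was answered: reset the counter (the next ping waits ping_interval)
    public void onResponse() {
        retryCount = 0;
    }

    // a ping failed or timed out; returns true if another ping should be sent immediately
    public boolean onFailure() {
        retryCount++;
        if (retryCount >= pingRetries) {
            failed = true; // node is considered failed and removed
            return false;
        }
        return true; // resend right away with the same ping_timeout
    }

    public boolean isFailed() {
        return failed;
    }

    public int getRetryCount() {
        return retryCount;
    }

    public static void main(String[] args) {
        NodeFdModel fd = new NodeFdModel(3);
        fd.onFailure();  // one failure...
        fd.onResponse(); // ...then a successful ping resets the counter
        System.out.println("after response: retryCount=" + fd.getRetryCount());

        boolean resend = true;
        int pings = 0;
        while (resend) { // keep failing until the node is declared dead
            pings++;
            resend = fd.onFailure();
        }
        System.out.println("failed after " + pings + " consecutive failures: " + fd.isFailed());
    }
}
```

The key point the model shows is that only consecutive failures count: a single successful response in between starts the count from zero again.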

MasterFaultDetection.java

Master fault detection is a little different from nodes fault detection. When the public method start() of MasterFaultDetection is called, it calls innerStart(), which creates a MasterPinger object:
this.masterPinger = new MasterPinger();
// start the ping process
threadPool.schedule(pingInterval, ThreadPool.Names.SAME, masterPinger);

So there is a periodic ping, every 1 second by default. When the MasterPinger instance runs, it goes through the same process of sending the request with the transport service: transportService.sendRequest(final DiscoveryNode node, final String action, final TransportRequest request, final TransportRequestOptions options, TransportResponseHandler<T> handler)
The request-sending logic is the same as in NodesFaultDetection. What is interesting is the handler, which overrides methods of the class BaseTransportResponseHandler. In handleResponse, the retry count is reset back to 0, and then another ping is scheduled.

In the overridden method handleException(TransportException exp), there are three exception checks that immediately treat the master as failed: the pinged node is no longer the master, the pinged node is not the master, or the pinging node does not exist on the master. Otherwise, the retry count is increased by one. If it is greater than or equal to ping_retries (3 by default), pinging the master has failed and the master is considered failed. If it is still less than that, another ping is sent.
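Based on the description above, the difference from node fault detection is that some failures skip the retry counter altogether. The sketch below models that decision; the enum values are hypothetical stand-ins for the real exception classes, and MasterFdModel is not elasticsearch code.

```java
// Hypothetical model of how a node reacts to a failed ping of its master.
public class MasterFdModel {

    // stand-ins for "no longer master", "not master", "node does not exist on master", timeout
    public enum PingFailure { NO_LONGER_MASTER, NOT_MASTER, NODE_DOES_NOT_EXIST, TIMEOUT }

    public enum Action { MASTER_FAILED, RETRY }

    private final int pingRetries;
    private int retryCount = 0;

    public MasterFdModel(int pingRetries) {
        this.pingRetries = pingRetries;
    }

    public void onResponse() {
        retryCount = 0; // successful ping: reset, next ping after ping_interval
    }

    public Action onFailure(PingFailure failure) {
        switch (failure) {
            case NO_LONGER_MASTER:
            case NOT_MASTER:
            case NODE_DOES_NOT_EXIST:
                // treated as a master failure right away, without retries
                return Action.MASTER_FAILED;
            default:
                retryCount++;
                return retryCount >= pingRetries ? Action.MASTER_FAILED : Action.RETRY;
        }
    }

    public static void main(String[] args) {
        MasterFdModel fd = new MasterFdModel(3);
        System.out.println(fd.onFailure(PingFailure.TIMEOUT));
        System.out.println(fd.onFailure(PingFailure.NOT_MASTER));
    }
}
```

A MASTER_FAILED result is the point where, per the documentation quoted earlier, an election process may need to be initiated.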

That's it. If you think this analysis needs improvement, please leave a comment below. Thank you.

Saturday, February 28, 2015

Implement java remote method invocation on tomcat6

Today, we will learn a bit about remote method invocation (rmi) in java. I know rmi is an old concept, but for the sake of learning, nothing is old :) fun and knowledge are what matter. First, let's see what java remote method invocation is. From wikipedia:
The Java Remote Method Invocation (Java RMI) is a Java API that performs the object-oriented equivalent of remote procedure calls (RPC), with support for direct transfer of serialized Java classes and distributed garbage collection.

The original implementation depends on Java Virtual Machine (JVM) class representation mechanisms and it thus only supports making calls from one JVM to another. The protocol underlying this Java-only implementation is known as Java Remote Method Protocol (JRMP).
In order to support code running in a non-JVM context, a CORBA version was later developed.

Usage of the term RMI may denote solely the programming interface or may signify both the API and JRMP, whereas the term RMI-IIOP (read: RMI over IIOP) denotes the RMI interface delegating most of the functionality to the supporting CORBA implementation.



If that is not clear, look one step up: java rmi is actually a java implementation of remote procedure call (rpc). An excerpt from wikipedia:
In computer science, a remote procedure call (RPC) is an inter-process communication that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction.[1] That is, the programmer writes essentially the same code whether the subroutine is local to the executing program, or remote. When the software in question uses object-oriented principles, RPC is called remote invocation or remote method invocation.
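To make "the programmer writes essentially the same code whether the subroutine is local ... or remote" concrete, here is a toy sketch using java.lang.reflect.Proxy. Since Java 5, RMI can generate stubs dynamically as such proxies, so the idea is close in spirit, but nothing here goes over the network: the "remote" side is just another object in the same JVM. RpcIdea, Adder and the sent log are hypothetical names for illustration only.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration of the RPC idea; a real RMI stub would serialize the call
// and send it to another JVM instead of invoking the target directly.
public class RpcIdea {

    public interface Adder {
        double add(double a, double b);
    }

    // hypothetical log of the "messages" the proxy would have put on the wire
    public static final List<String> sent = new ArrayList<String>();

    public static Adder remoteAdder(final Adder target) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                // "marshal" the call: record method name and arguments
                sent.add(method.getName() + Arrays.toString(args));
                // "server side": dispatch to the real implementation
                return method.invoke(target, args);
            }
        };
        return (Adder) Proxy.newProxyInstance(
                Adder.class.getClassLoader(), new Class<?>[] { Adder.class }, handler);
    }

    public static void main(String[] args) {
        Adder local = new Adder() {
            @Override
            public double add(double a, double b) {
                return a + b;
            }
        };
        Adder remote = remoteAdder(local);
        // the calling code looks identical for the local and the "remote" object
        System.out.println(local.add(1, 1) + " " + remote.add(1, 1));
        System.out.println(sent);
    }
}
```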

Okay, enough of the theory, let's build a simple java rmi service using tomcat. This tutorial assumes you have a running tomcat server and know the basics of deploying a jar file into it.

First, provide access permissions to the jar. If you are just starting out and there are too many concepts to grasp, the easiest approach is to grant all permissions, then fine-tune them once you are comfortable. Set this in <TOMCAT_HOME>/conf/catalina.policy (this policy file only takes effect when tomcat runs with the security manager, i.e. when started with the -security option):
grant {
    permission java.security.AllPermission;
};

Now let's code the server side. First, create a java interface that the server will implement and whose methods the client will invoke remotely.
import java.rmi.Remote;

public interface CalculatorInterface extends Remote {

    // name under which the stub is bound in the RMI registry
    public final String serviceName = "MyRemoteService";

    // every remote method must declare java.rmi.RemoteException or a superclass of it;
    // declaring Exception satisfies that rule
    public Double Add(Double num1, Double num2) throws Exception;

    public Double Sub(Double num1, Double num2) throws Exception;

    public Double Mul(Double num1, Double num2) throws Exception;

    public Double Div(Double num1, Double num2) throws Exception;

    public Integer Factorial(Integer num) throws Exception;

    public Float Random() throws Exception;
}

So we have a Calculator interface which extends the Remote interface. It exposes a few public methods, which the client will invoke later.
public class Calculator implements CalculatorInterface {

    public Calculator() {
        super();
    }

    @Override
    public Double Add(Double num1, Double num2) throws Exception {
        return num1 + num2;
    }

    @Override
    public Double Sub(Double num1, Double num2) throws Exception {
        return num1 - num2;
    }

    @Override
    public Double Mul(Double num1, Double num2) throws Exception {
        return num1 * num2;
    }

    @Override
    public Double Div(Double num1, Double num2) throws Exception {
        // double division: dividing by 0 gives Infinity (or NaN for 0/0), not an exception
        return num1 / num2;
    }

    @Override
    public Integer Factorial(Integer num) throws Exception {
        // note: Integer overflows for num > 12 (13! is larger than Integer.MAX_VALUE)
        Integer t = 1;
        for (int i = 1; i <= num; i++) {
            t = t * i;
        }
        return t;
    }

    @Override
    public Float Random() throws Exception {
        return (float) Math.random();
    }

}

Here, we implement the calculator: all basic mathematics. Now we will start this instance in tomcat. Probably the easiest way is to implement ServletContextListener and export the stub on a port when tomcat starts. With that said, let's read the code below.
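import java.rmi.NoSuchObjectException;
import java.rmi.Remote;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class InitCalculator implements ServletContextListener {
    public static boolean isRegistered = false;
    public static CalculatorInterface service;
    // keep the registry so it can be released when the webapp stops
    private static Registry registry;

    public InitCalculator() {
        if (!isRegistered) {
            try {
                service = new Calculator();
                // export the calculator; port 0 lets RMI pick an anonymous port
                CalculatorInterface stub = (CalculatorInterface) UnicastRemoteObject.exportObject(service, 0);
                // create an RMI registry inside the tomcat JVM and bind the stub in it
                registry = LocateRegistry.createRegistry(9345);
                registry.rebind(CalculatorInterface.serviceName, stub);
                System.out.println("Remote service bound");
                isRegistered = true;
            } catch (Exception e) {
                System.err.println("Remote service exception:");
                e.printStackTrace();
            }
        }
    }

    @Override
    public void contextDestroyed(ServletContextEvent arg0) {
        // unexport both objects so port 9345 is freed on undeploy/redeploy
        unexportQuietly(service);
        unexportQuietly(registry);
        service = null;
        registry = null;
        isRegistered = false;
    }

    @Override
    public void contextInitialized(ServletContextEvent arg0) {
        new InitCalculator();
        System.out.println("started ...");
    }

    private static void unexportQuietly(Remote obj) {
        if (obj == null) {
            return;
        }
        try {
            UnicastRemoteObject.unexportObject(obj, true);
        } catch (NoSuchObjectException e) {
            // already unexported, nothing to do
        }
    }
}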

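As seen above, when the webapp context is initialized, a new InitCalculator object is created. It exports the calculator, creates an RMI registry on port 9345 and binds the stub in that registry under the service name. Make sure your firewall allows port 9345, as later you will need to access this port remotely. Note that exportObject(service, 0) exports the calculator itself on an anonymous port, and remote clients connect to that port after the lookup, so the firewall must allow it as well (passing a fixed port number instead of 0 makes this easier). Very easy code. Remember to register this listener class in the tomcat web descriptor (web.xml), using the fully qualified name of your class: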
<listener>
    <listener-class>com.example.InitCalculator</listener-class>
</listener>

Moving on to the last piece of the puzzle, the client code.
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

public class CalculatorClient {

    public static void main(String[] args) {
        try {
            // connect to the registry created by InitCalculator
            Registry registry = LocateRegistry.getRegistry("localhost", 9345);
            // list every name bound in the registry
            String[] names = registry.list();
            for (String name : names) {
                System.out.println("~~~~~" + name + "~~~");
            }
            // fetch the stub and call the remote method as if it were local
            CalculatorInterface serv = (CalculatorInterface) registry.lookup(CalculatorInterface.serviceName);
            System.out.println("add total " + serv.Add(1d, 1d));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

As can be read above, the client connects to the registry on localhost port 9345 and lists what is in the registry. It then looks up the service name in the registry to get a stub implementing the interface. Now we can invoke the server method. Pretty cool stuff. :) See below.
[user@localhost ~]$ java -cp /var/lib/tomcat/webapps/example/WEB-INF/lib/example.jar:. CalculatorClient
~~~~~MyRemoteService~~~
add total 2.0
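If you want to try the same round trip without tomcat, here is a minimal sketch that runs the server and the client in one JVM. The names RmiRoundTrip, Adder and "AdderService" and the port are hypothetical; setting java.rmi.server.hostname to 127.0.0.1 is an assumption that only suits this single-machine demo (for remote clients it has to be an address they can reach).

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Server and client of a tiny RMI service in one JVM; names are hypothetical.
public class RmiRoundTrip {

    public interface Adder extends Remote {
        Double add(Double a, Double b) throws RemoteException;
    }

    static class AdderImpl implements Adder {
        @Override
        public Double add(Double a, Double b) {
            return a + b;
        }
    }

    public static double callAdd(int port, double a, double b) throws Exception {
        // advertise loopback in the stub, since client and server share this machine
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        Adder impl = new AdderImpl();
        // server side: export the object and bind its stub in a fresh registry
        Adder stub = (Adder) UnicastRemoteObject.exportObject(impl, 0);
        Registry serverRegistry = LocateRegistry.createRegistry(port);
        try {
            serverRegistry.rebind("AdderService", stub);
            // client side: look the service up through the registry on localhost
            Registry clientRegistry = LocateRegistry.getRegistry("localhost", port);
            Adder remote = (Adder) clientRegistry.lookup("AdderService");
            return remote.add(a, b);
        } finally {
            // unexport everything so the RMI threads stop and the JVM can exit
            UnicastRemoteObject.unexportObject(impl, true);
            UnicastRemoteObject.unexportObject(serverRegistry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("add total " + callAdd(9346, 1d, 1d));
    }
}
```

The finally block matters: without unexporting, the exported objects keep RMI threads alive and the JVM would not exit after main returns.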

That's it.

Friday, February 27, 2015

how to determine currently occupied queue size and cache usage in elasticsearch 0.90

Have you encountered a situation where an elasticsearch client indexing into an elasticsearch cluster gets a rejected exception from the cluster? Or, if you cache some filters in your queries, do you want to know how much memory they use at the moment? If yes, and you are using elasticsearch 0.90, then you have come to the right place. I'm going to show you how to get these statistics through the metrics API that elasticsearch exposes. This is important if you want to determine the health of your cluster.

Okay, let's start with the first one: how to get the occupied queue size on a node in the cluster.
[jason@node009 ~]$ curl -XGET 'http://localhost:9200/_nodes/node009/stats/thread_pool?pretty'
{
"cluster_name" : "MY_TEST_Cluster",
"nodes" : {
"1111111111111111111111" : {
"timestamp" : 1422372473667,
"name" : "node009",
"transport_address" : "inet[my.private.ip.com/1.2.3.4:9300]",
"hostname" : "node009.foobar.com",
"thread_pool" : {
"generic" : {
"threads" : 1,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 82,
"completed" : 6378594
},
"index" : {
"threads" : 8,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 8,
"completed" : 25735782
},
"get" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"snapshot" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 4,
"completed" : 1003286
},
"merge" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 4,
"completed" : 4863710
},
"suggest" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"bulk" : {
"threads" : 8,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 8,
"completed" : 42148
},
"optimize" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"warmer" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 4,
"completed" : 2087615
},
"flush" : {
"threads" : 3,
"queue" : 0,
"active" : 1,
"rejected" : 0,
"largest" : 4,
"completed" : 10492
},
"search" : {
"threads" : 512,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 512,
"completed" : 245843
},
"percolate" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"management" : {
"threads" : 5,
"queue" : 0,
"active" : 1,
"rejected" : 0,
"largest" : 5,
"completed" : 2082438
},
"refresh" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 4,
"completed" : 1521727
}
}
}
}
}

As can be read above, these are node statistics with only the thread pool stats exposed. If the client is actively indexing into elasticsearch, the pool you should look at is index (bulk requests go to the bulk pool). The sample above looks pretty good: there is no queue and no rejection.
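If you want to check this from a program instead of by eye, here is a small sketch that pulls queue and rejected for one pool out of the thread_pool response shown above. ThreadPoolCheck and poolStat are hypothetical names, fetching the JSON over HTTP is left out, and a regex only works here because each pool is a flat object; a real JSON parser is the safer choice. Keep in mind that rejected is a running total since the node started, so compare two samples to see whether rejections are still happening.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: read one counter of one thread pool from the node stats JSON.
public class ThreadPoolCheck {

    public static long poolStat(String json, String pool, String stat) {
        // match "pool" : { ... } up to the first closing brace (pools are flat objects)
        Matcher poolMatcher = Pattern.compile(
                "\"" + Pattern.quote(pool) + "\"\\s*:\\s*\\{([^}]*)\\}").matcher(json);
        if (!poolMatcher.find()) {
            throw new IllegalArgumentException("pool not found: " + pool);
        }
        Matcher statMatcher = Pattern.compile(
                "\"" + Pattern.quote(stat) + "\"\\s*:\\s*(\\d+)").matcher(poolMatcher.group(1));
        if (!statMatcher.find()) {
            throw new IllegalArgumentException("stat not found: " + stat);
        }
        return Long.parseLong(statMatcher.group(1));
    }

    public static void main(String[] args) {
        // an excerpt of the thread_pool response above
        String json = "{ \"thread_pool\" : { \"index\" : { \"threads\" : 8, \"queue\" : 0, "
                + "\"active\" : 0, \"rejected\" : 0, \"largest\" : 8, \"completed\" : 25735782 } } }";
        long queue = poolStat(json, "index", "queue");
        long rejected = poolStat(json, "index", "rejected");
        System.out.println("index queue=" + queue + " rejected=" + rejected
                + (rejected > 0 ? " -> clients are being rejected" : " -> ok"));
    }
}
```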

Next, we will take a look at cache usage, using the same node stats API but with indices instead of thread_pool.
[jason@node009 ~]$ curl -XGET 'http://localhost:9200/_nodes/node009/stats/indices?pretty'
{
"cluster_name" : "MY_TEST_Cluster",
"nodes" : {
"1111111111111111111111" : {
"timestamp" : 1422373322128,
"name" : "node009",
"transport_address" : "inet[my.private.ip.com/1.2.3.4:9300]",
"hostname" : "node009.foobar.com",
"indices" : {
"docs" : {
"count" : 134502646,
"deleted" : 104806463
},
"store" : {
"size" : "340.9gb",
"size_in_bytes" : 366092384499,
"throttle_time" : "2.1ms",
"throttle_time_in_millis" : 2
},
"indexing" : {
"index_total" : 25692998,
"index_time" : "6.8h",
"index_time_in_millis" : 24495073,
"index_current" : 22015,
"delete_total" : 13217673,
"delete_time" : "14.6m",
"delete_time_in_millis" : 877101,
"delete_current" : 0
},
"get" : {
"total" : 0,
"get_time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"open_contexts" : 1,
"query_total" : 204027,
"query_time" : "1.9h",
"query_time_in_millis" : 6856699,
"query_current" : 0,
"fetch_total" : 34409,
"fetch_time" : "2.4m",
"fetch_time_in_millis" : 148210,
"fetch_current" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 563950,
"total_time" : "7.5h",
"total_time_in_millis" : 27166425,
"total_docs" : 219150610,
"total_size" : "374.6gb",
"total_size_in_bytes" : 402324404903
},
"refresh" : {
"total" : 1522907,
"total_time" : "10.5h",
"total_time_in_millis" : 38110904
},
"flush" : {
"total" : 10499,
"total_time" : "2.4h",
"total_time_in_millis" : 8726951
},
"warmer" : {
"current" : 0,
"total" : 2089352,
"total_time" : "9.3m",
"total_time_in_millis" : 560985
},
"filter_cache" : {
"memory_size" : "2.8gb",
"memory_size_in_bytes" : 3011274608,
"evictions" : 35449
},
"id_cache" : {
"memory_size" : "0b",
"memory_size_in_bytes" : 0
},
"fielddata" : {
"memory_size" : "140.5mb",
"memory_size_in_bytes" : 147415629,
"evictions" : 86803
},
"completion" : {
"size" : "231b",
"size_in_bytes" : 231
},
"segments" : {
"count" : 700
}
}
}
}
}

As seen above, there are two cache metrics, filter_cache and id_cache. It is now pretty clear how much of each cache is used on this node and how many evictions happened. There is also the fielddata metric, which is memory occupied in the jvm; you might want to keep an eye on it during monitoring. If you want to know exactly how much memory each field uses, you can use this api:
curl localhost:9200/_nodes/stats/indices/fielddata/field1,field2?pretty

But this one is left for you to play with as homework. Hint: replace field1 and field2 with fields you index and read the output. That's it. :-)
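One more idea for the cache metrics in the indices stats response: add up memory_size_in_bytes of filter_cache, id_cache and fielddata to see roughly how much memory these caches hold together. This is a sketch with hypothetical names (CacheUsage, memoryBytes); the regex relies on these sections being flat objects, and fetching the JSON is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: sum the memory used by the caches reported in node stats.
public class CacheUsage {

    // memory_size_in_bytes of one flat section, or 0 if the section is missing
    public static long memoryBytes(String json, String section) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(section)
                + "\"\\s*:\\s*\\{[^}]*?\"memory_size_in_bytes\"\\s*:\\s*(\\d+)").matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : 0L;
    }

    public static void main(String[] args) {
        // an excerpt of the indices response above
        String json = "{ \"filter_cache\" : { \"memory_size\" : \"2.8gb\", \"memory_size_in_bytes\" : 3011274608, \"evictions\" : 35449 },"
                + " \"id_cache\" : { \"memory_size\" : \"0b\", \"memory_size_in_bytes\" : 0 },"
                + " \"fielddata\" : { \"memory_size\" : \"140.5mb\", \"memory_size_in_bytes\" : 147415629, \"evictions\" : 86803 } }";
        long total = 0;
        for (String section : new String[] { "filter_cache", "id_cache", "fielddata" }) {
            long bytes = memoryBytes(json, section);
            System.out.println(section + ": " + bytes + " bytes");
            total += bytes;
        }
        System.out.printf("total: %d bytes (%.2f GiB)%n", total, total / (1024.0 * 1024 * 1024));
    }
}
```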

Sunday, February 15, 2015

Fix steam error libGL error: failed to load driver: swrast in debian

If you have the steam client installed on debian sid, once in a while an operating system upgrade breaks the steam client. Below is an example of such an error.
user@localhost:~$ steam
Running Steam on debian 8 64-bit
STEAM_RUNTIME is enabled automatically
Installing breakpad exception handler for appid(steam)/version(1421694684)
libGL error: unable to load driver: r600_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: r600
libGL error: unable to load driver: swrast_dri.so
libGL error: failed to load driver: swrast
^C

So the steam client fails to launch, and it looks like the 3d graphics driver cannot be loaded or is not installed. Don't bother installing the package libgl1-mesa-swx11, which provides the swrast driver, because at this point in time installing it will not work; the conflicts are clearly declared: Conflicts: libgl1, libgl1-mesa-swrast, mesag3, mesag3+ggi, mesag3-glide, mesag3-glide2, nvidia-glx. Installing this package will render the debian gui unusable; I have been down that path :( So don't do that.

So I googled and found a good solution; below is what I did. I hope it works for you too.
user@localhost:~/.local/share/Steam/ubuntu12_32/steam-runtime/i386/usr/lib/i386-linux-gnu$ mv libstdc++.so.6.0.18 libstdc++.so.6.0.18.remove.by.user
user@localhost:~/.local/share/Steam/ubuntu12_32/steam-runtime/i386/usr/lib/i386-linux-gnu$ ls libstdc++.so.6*
lrwxrwxrwx 1 user user 19 Jul 19 2014 libstdc++.so.6 -> libstdc++.so.6.0.18
-rw-r--r-- 1 user user 901K Jul 19 2014 libstdc++.so.6.0.18.remove.by.user
user@localhost:~/.local/share/Steam/ubuntu12_32/steam-runtime/i386/usr/lib/i386-linux-gnu$ rm libstdc++.so.6
rm: remove symbolic link ‘libstdc++.so.6’? y
user@localhost:~/.local/share/Steam/ubuntu12_32/steam-runtime/i386/usr/lib/i386-linux-gnu$ pwd
/home/user/.local/share/Steam/ubuntu12_32/steam-runtime/i386/usr/lib/i386-linux-gnu


user@localhost:~/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/lib/x86_64-linux-gnu$ pwd
/home/user/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/lib/x86_64-linux-gnu
user@localhost:~/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/lib/x86_64-linux-gnu$ mv libstdc++.so.6.0.18 libstdc++.so.6.0.18.remove.by.user
user@localhost:~/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/lib/x86_64-linux-gnu$ rm libstdc++.so.6
rm: remove symbolic link ‘libstdc++.so.6’? y
user@localhost:~/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/lib/x86_64-linux-gnu$ pwd
/home/user/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/lib/x86_64-linux-gnu
user@localhost:~/.local/share/Steam/ubuntu12_32/steam-runtime/amd64/usr/lib/x86_64-linux-gnu$

As you can see above, the symbolic link libstdc++.so.6 was removed in two different directories, i386 and amd64. The file each symlink pointed to, libstdc++.so.6.0.18, was renamed rather than deleted, so it can be restored if something goes wrong afterwards.

After these are removed, start the steam client again; steam will redownload the files and it should work again! :-)

Saturday, February 14, 2015

how to connect to msn server with pidgin 2.10.11

After numerous news reports (here, here and here) that msn will be (or has been?) shut down, today we take another look at whether connecting to the msn server is still possible. hehe

Well, the problem of not being able to connect to the msn server happened to me again. But I'm not so sure whether microsoft has really shut down the messenger server. Anyway, let's fire up pidgin with the --debug option.
(04:39:32) account: Connecting to account ursa@hotmail.com.
(04:39:32) connection: Connecting. gc = 0x7fb12226a4a0
(04:39:32) msn: new httpconn (0x7fb12244ce40)
(04:39:32) proxy: Gnome proxy settings are set to 'manual' but no suitable proxy server is specified. Using Pidgin's proxy settings instead.
(04:39:32) dnsquery: Performing DNS lookup for messenger.hotmail.com
(04:39:32) proxy: Gnome proxy settings are set to 'manual' but no suitable proxy server is specified. Using Pidgin's proxy settings instead.
(04:39:32) dns: Wait for DNS child 4807 failed: No child processes
(04:39:32) dns: Wait for DNS child 4816 failed: No child processes
(04:39:32) dns: Created new DNS child 5206, there are now 1 children.
(04:39:32) dns: Successfully sent DNS request to child 5206
(04:39:32) dns: Got response for 'messenger.hotmail.com'
(04:39:32) dnsquery: IP resolved for messenger.hotmail.com
(04:39:32) proxy: Attempting connection to 64.4.45.209
(04:39:32) proxy: Connecting to messenger.hotmail.com:1863 with no proxy
(04:39:32) proxy: Connection in progress
(04:39:32) proxy: Connecting to messenger.hotmail.com:1863.
(04:39:32) proxy: Error connecting to messenger.hotmail.com:1863 (Connection refused).
(04:39:32) proxy: Connection attempt failed: Connection refused
(04:39:32) msn: Connection error: Connection refused
(04:39:32) msn: Connection error from Notification server (messenger.hotmail.com): Connection refused
(04:39:32) connection: Connection error on 0x7fb12226a4a0 (reason: 0 description: Connection error from Notification server:
Connection refused)
(04:39:32) account: Disconnecting account ursa@hotmail.com (0x7fb1218f83c0)
(04:39:32) connection: Disconnecting connection 0x7fb12226a4a0
(04:39:32) msn: destroy the OIM 0x7fb12226b250
(04:39:32) msn: destroy httpconn (0x7fb12244ce40)
(04:39:32) connection: Destroying connection 0x7fb12226a4a0

Bummer! So connecting to msn really is a problem (again)! So I played around with the settings and, surprise surprise, pidgin could connect to msn again. >:-) Here is how I made it work.

  1. In the pidgin menu, click on Accounts.

  2. Click on Manage Accounts.

  3. Select your msn account and click on Modify...

  4. In the Modify Account window, click on Advanced tab and check the checkbox Use HTTP Method.

  5. Then in the Proxy tab, for the proxy type, select Use Environmental Settings. Note that this setting really depends on your network setup, so check with your network admin.


(Screenshots: the Proxy and Advanced tabs of the Modify Account window.)

Save the settings and check the checkbox in the Enabled column in Accounts. Fingers crossed it will work; at least it did for me this time (until it breaks again). :-)

That's it!

Friday, February 13, 2015

using google guava library to hold data for report

Often when working with a report (just a typical report), it is pretty common to need to hold a list of rows in a data type that has a key and a value, and maybe a page number. As a java programmer, you will encounter something like this:
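import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map.Entry;

public class Report {

    // each row is an ordered map of column name -> value
    private List<LinkedHashMap<String, String>> old = new ArrayList<LinkedHashMap<String, String>>();
    private int page;

    public List<LinkedHashMap<String, String>> getOld() {
        return old;
    }

    public static void printReport(Report report) {
        List<LinkedHashMap<String, String>> oldReport = report.getOld();

        // iterate over every row, then over every column of that row
        for (LinkedHashMap<String, String> oldRows : oldReport) {
            for (Entry<String, String> entry : oldRows.entrySet()) {
                System.out.print(entry.getValue());
            }
        }
    }

}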

So you have a list holding each row of the report, and within each row you have keys and values. For instance, on the first page of the report, you have a person with first name john, last name doe and age 30. Then you have another person, first name dan, last name christensen, age 40, etc. To print the report, you basically iterate over the data collections and print out the values.

Is there another way that is better, or even more efficient?

So I googled, and people suggest using guava, so I will take a look at the features guava offers and how they help in the situation above. So what is google guava?
The Google Guava is an open-source set of common libraries for Java, mainly developed by Google engineers.

This page gives a general overview of the common libraries found in google guava. As you will notice, there are many features in this library, but for the report above I will use only two of them. Let's rewrite the above code.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import com.google.common.base.Joiner;
import com.google.common.collect.LinkedListMultimap;
import com.google.common.collect.Multimap;

public class Report {

    private int page;
    private List<String> header;
    // row key -> column values of that row
    private Multimap<String, String> rows;

    private List<LinkedHashMap<String, String>> old = new ArrayList<LinkedHashMap<String, String>>();

    public Report() {
        page = 0;
        header = new ArrayList<String>();
        // LinkedListMultimap keeps insertion order and, unlike LinkedHashMultimap
        // (a SetMultimap), also keeps duplicate values, e.g. equal first and last names
        rows = LinkedListMultimap.create();
    }

    // sample data; in practice this would come from a data source such as a database
    public void getReportFromDS() {
        header.addAll(Arrays.asList("firstName", "lastName", "age"));
        rows.put("2829f395317df0f88597ef288f132827794707af", "john");
        rows.put("2829f395317df0f88597ef288f132827794707af", "doe");
        rows.put("2829f395317df0f88597ef288f132827794707af", "30");
        rows.put("d94c2ddf2a4817e5c9a56db45d41ed876e823fcf", "dan");
        rows.put("d94c2ddf2a4817e5c9a56db45d41ed876e823fcf", "christensen");
        rows.put("d94c2ddf2a4817e5c9a56db45d41ed876e823fcf", "40");
        rows.put("1fd23a55e9780810d2e6f0ec9ba1ddb99827e4cf", "chai");
        rows.put("1fd23a55e9780810d2e6f0ec9ba1ddb99827e4cf", "lenny");
        rows.put("1fd23a55e9780810d2e6f0ec9ba1ddb99827e4cf", "20");
    }

    public int getPage() {
        return page;
    }

    public List<String> getHeader() {
        return header;
    }

    public Multimap<String, String> getRows() {
        return rows;
    }

    public List<LinkedHashMap<String, String>> getOld() {
        return old;
    }

    public static void printReport(Report report) {
        Joiner joiner = Joiner.on(", ");
        String headers = joiner.join(report.getHeader());
        System.out.println(headers);

        // asMap() groups all values of a row key into one collection
        Map<String, Collection<String>> rows = report.getRows().asMap();
        for (Entry<String, Collection<String>> row : rows.entrySet()) {
            String line = joiner.join(row.getValue());
            System.out.println(line);
        }

        // the old list-of-maps representation, still printed for comparison
        List<LinkedHashMap<String, String>> oldReport = report.getOld();
        for (LinkedHashMap<String, String> oldRows : oldReport) {
            for (Entry<String, String> entry : oldRows.entrySet()) {
                System.out.print(entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Report sampleReport = new Report();
        sampleReport.getReportFromDS();
        printReport(sampleReport);
    }

}

As you can see from the full code above, it contains a constructor to initialize the objects. Then there is the method getReportFromDS(), whose data you would probably get from a data source such as a database. Then we have getter methods and a static method to print the report. If you run this app, you will notice it prints out the report header and then the rows.

Joiner is a class that joins strings together in just two lines; even better, you can make it one line ;-). To print each row of the sample report, you still use a for loop, but only a single one: join the values of the row and print it out. Less code and more readability. If you measured the sampleReport object, I guess it would also have a smaller memory footprint.
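For comparison, here is a sketch of the same row layout using only the JDK (Java 8 or later), which shows what the Multimap and Joiner pair saves you from writing. PlainReport and its methods are hypothetical names. Keeping a list per row key means duplicate values survive, for example a person whose first and last names are the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// JDK-only sketch: insertion-ordered map from row key to that row's values.
public class PlainReport {

    private final List<String> header = new ArrayList<>();
    private final Map<String, List<String>> rows = new LinkedHashMap<>();

    public void setHeader(String... names) {
        header.clear();
        header.addAll(Arrays.asList(names));
    }

    public void put(String rowKey, String value) {
        // create the row's value list on first use, similar to Multimap.put
        rows.computeIfAbsent(rowKey, k -> new ArrayList<>()).add(value);
    }

    // header line followed by one comma-separated line per row
    public List<String> lines() {
        List<String> out = new ArrayList<>();
        out.add(String.join(", ", header));
        for (List<String> values : rows.values()) {
            out.add(String.join(", ", values));
        }
        return out;
    }

    public static void main(String[] args) {
        PlainReport report = new PlainReport();
        report.setHeader("firstName", "lastName", "age");
        report.put("row1", "john");
        report.put("row1", "doe");
        report.put("row1", "30");
        report.put("row2", "dan");
        report.put("row2", "christensen");
        report.put("row2", "40");
        report.lines().forEach(System.out::println);
    }
}
```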

That's it: just two goodies from google guava. I suggest you read about the other features it offers and make full use of this great library.

Sunday, February 1, 2015

Initial study on apache lucene

Today, we are going to learn about apache lucene. First things first, what is apache lucene?
Apache Lucene is a free open source information retrieval software library, originally written in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License.

Let's go through the apache lucene "hello world" so we get a basic idea of what it is. Go to the official site and download the latest release. Below is the tutorial I followed from the official documentation, using apache lucene version 4.10.3 with oracle java 7 and a slight modification to the tutorial.
jason@localhost:~/Desktop/lucene-4.10.3$ java -cp ./core/lucene-core-4.10.3.jar:./queryparser/lucene-queryparser-4.10.3.jar:./analysis/common/lucene-analyzers-common-4.10.3.jar:./demo/lucene-demo-4.10.3.jar org.apache.lucene.demo.IndexFiles
Usage: java org.apache.lucene.demo.IndexFiles [-index INDEX_PATH] [-docs DOCS_PATH] [-update]

This indexes the documents in DOCS_PATH, creating a Lucene indexin INDEX_PATH that can be searched with SearchFiles
jason@localhost:~/Desktop/lucene-4.10.3$ java -cp ./core/lucene-core-4.10.3.jar:./queryparser/lucene-queryparser-4.10.3.jar:./analysis/common/lucene-analyzers-common-4.10.3.jar:./demo/lucene-demo-4.10.3.jar org.apache.lucene.demo.IndexFiles -index data/ -docs docs/
Indexing to directory 'data/'...
adding docs/grouping/constant-values.html
adding docs/grouping/index.html
adding docs/grouping/allclasses-noframe.html
adding docs/grouping/overview-frame.html
adding docs/grouping/org/apache/lucene/search/grouping/AbstractGroupFacetCollector.html
...
...
...
adding docs/analyzers-phonetic/deprecated-list.html
adding docs/analyzers-phonetic/package-list
adding docs/analyzers-phonetic/allclasses-frame.html
95794 total milliseconds
jason@localhost:~/Desktop/lucene-4.10.3$ uptime
21:10:16 up 16:44, 23 users, load average: 5.45, 4.49, 3.59

As you can see, instead of indexing java source files, I indexed the javadoc in html format, and it works nicely. Although my system is loaded, indexing is still reasonably quick: apache lucene finished indexing a total of 5818 files in about 96 seconds (95794 milliseconds). After indexing is done, if you list the directory data, you will see the lucene index files. If you want to learn in detail what these files are, you should read this documentation.
jason@localhost:~/Desktop/lucene-4.10.3$ ls -l data/
total 13784
-rw-r--r-- 1 jason jason 284 Jan 13 21:07 _0.cfe
-rw-r--r-- 1 jason jason 12387776 Jan 13 21:07 _0.cfs
-rw-r--r-- 1 jason jason 242 Jan 13 21:07 _0.si
-rw-r--r-- 1 jason jason 284 Jan 13 21:07 _1.cfe
-rw-r--r-- 1 jason jason 1677329 Jan 13 21:07 _1.cfs
-rw-r--r-- 1 jason jason 242 Jan 13 21:07 _1.si
-rw-r--r-- 1 jason jason 151 Jan 13 21:07 segments_1
-rw-r--r-- 1 jason jason 36 Jan 13 21:07 segments.gen
-rw-r--r-- 1 jason jason 0 Jan 13 21:06 write.lock
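Conceptually, the central structure behind such an index is an inverted index: a map from each term to the documents containing it, which is what makes a query like string answer quickly with a count of matching documents. The toy sketch below (Java 8 or later) only illustrates that idea with a naive lowercase-and-split analysis; ToyInvertedIndex is a hypothetical name, and Lucene's real segment files store far richer, compressed structures.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of ids of documents containing the term.
public class ToyInvertedIndex {

    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    public void add(int docId, String text) {
        // naive analysis: lowercase, then split on anything that is not a letter or digit
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    public Set<Integer> search(String term) {
        Set<Integer> docs = postings.get(term.toLowerCase());
        return docs == null ? Collections.<Integer>emptySet() : Collections.unmodifiableSet(docs);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Format a String");
        index.add(2, "Search files and quit");
        index.add(3, "string utilities");
        System.out.println("string -> " + index.search("string"));
        System.out.println("quit -> " + index.search("quit"));
    }
}
```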

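Okay, now to the search. Note that SearchFiles looks for an index directory named index by default, which is why the first two attempts below fail until -index data is given.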
jason@localhost:~/Desktop/lucene-4.10.3$ java -cp ./core/lucene-core-4.10.3.jar:./queryparser/lucene-queryparser-4.10.3.jar:./analysis/common/lucene-analyzers-common-4.10.3.jar:./demo/lucene-demo-4.10.3.jar  org.apache.lucene.demo.SearchFiles
Exception in thread "main" org.apache.lucene.store.NoSuchDirectoryException: directory '/home/jason/Desktop/lucene-4.10.3/index' does not exist
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:218)
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:242)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:801)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:53)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:67)
at org.apache.lucene.demo.SearchFiles.main(SearchFiles.java:91)
jason@localhost:~/Desktop/lucene-4.10.3$ java -cp ./core/lucene-core-4.10.3.jar:./queryparser/lucene-queryparser-4.10.3.jar:./analysis/common/lucene-analyzers-common-4.10.3.jar:./demo/lucene-demo-4.10.3.jar org.apache.lucene.demo.SearchFiles --help
Exception in thread "main" org.apache.lucene.store.NoSuchDirectoryException: directory '/home/jason/Desktop/lucene-4.10.3/index' does not exist
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:218)
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:242)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:801)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:53)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:67)
at org.apache.lucene.demo.SearchFiles.main(SearchFiles.java:91)
jason@localhost:~/Desktop/lucene-4.10.3$ java -cp ./core/lucene-core-4.10.3.jar:./queryparser/lucene-queryparser-4.10.3.jar:./analysis/common/lucene-analyzers-common-4.10.3.jar:./demo/lucene-demo-4.10.3.jar org.apache.lucene.demo.SearchFiles -h
Usage: java org.apache.lucene.demo.SearchFiles [-index dir] [-field f] [-repeat n] [-queries file] [-query string] [-raw] [-paging hitsPerPage]

See http://lucene.apache.org/core/4_1_0/demo/ for details.
jason@localhost:~/Desktop/lucene-4.10.3$ java -cp ./core/lucene-core-4.10.3.jar:./queryparser/lucene-queryparser-4.10.3.jar:./analysis/common/lucene-analyzers-common-4.10.3.jar:./demo/lucene-demo-4.10.3.jar org.apache.lucene.demo.SearchFiles -index data
Enter query:
string
Searching for: string
1674 total matching documents
1. docs/benchmark/org/apache/lucene/benchmark/byTask/utils/Format.html
2. docs/analyzers-common/org/apache/lucene/analysis/util/AbstractAnalysisFactory.html
3. docs/queryparser/deprecated-list.html
4. docs/queryparser/org/apache/lucene/queryparser/classic/class-use/ParseException.html
5. docs/queryparser/org/apache/lucene/queryparser/flexible/core/messages/QueryParserMessages.html
6. docs/core/org/apache/lucene/index/IndexFileNames.html
7. docs/analyzers-stempel/org/egothor/stemmer/Diff.html
8. docs/queryparser/org/apache/lucene/queryparser/ext/Extensions.html
9. docs/facet/org/apache/lucene/facet/FacetsConfig.html
10. docs/queryparser/org/apache/lucene/queryparser/flexible/messages/package-summary.html
Press (n)ext page, (q)uit or enter number to jump to a page.
n
11. docs/highlighter/org/apache/lucene/search/highlight/class-use/InvalidTokenOffsetsException.html
12. docs/queryparser/org/apache/lucene/queryparser/xml/DOMUtils.html
13. docs/queryparser/org/apache/lucene/queryparser/classic/MultiFieldQueryParser.html
14. docs/core/org/apache/lucene/index/SegmentInfo.html
15. docs/highlighter/org/apache/lucene/search/vectorhighlight/FragmentsBuilder.html
16. docs/highlighter/org/apache/lucene/search/vectorhighlight/class-use/FieldFragList.html
17. docs/highlighter/org/apache/lucene/search/vectorhighlight/BaseFragmentsBuilder.html
18. docs/queryparser/org/apache/lucene/queryparser/flexible/standard/QueryParserUtil.html
19. docs/highlighter/org/apache/lucene/search/highlight/GradientFormatter.html
20. docs/highlighter/org/apache/lucene/search/postingshighlight/PostingsHighlighter.html
Press (p)revious page, (n)ext page, (q)uit or enter number to jump to a page.
q
Enter query:
quit
Searching for: quit
2 total matching documents
1. docs/demo/src-html/org/apache/lucene/demo/SearchFiles.html
2. docs/changes/Changes.html
Press (q)uit or enter number to jump to a page.
q
Enter query:
^Cjason@localhost:~/Desktop/lucene-4.10.3$

The search is quick even on the loaded system. That's it: a light learning experience with apache lucene.