Saturday, December 5, 2015

Learn java util concurrent part3

This article continues the java.util.concurrent learning series. You should read the first part and the second part too. Today we will go through the remaining ten interfaces in the java.util.concurrent package.

Okay, let's start with ForkJoinPool.ForkJoinWorkerThreadFactory. In day-to-day programming you should not have to worry about it, as the class ForkJoinPool already takes care of the implementation. Within the class ForkJoinPool, we can see that there are two classes


  • DefaultForkJoinWorkerThreadFactory
  • InnocuousForkJoinWorkerThreadFactory

which implement ForkJoinWorkerThreadFactory with default (package-private) access. So unless you know exactly what you want and how ForkJoinPool works internally, there is little reason to subclass it. For a beginner, and for this article, it is sufficient to just use ForkJoinPool as-is.

ForkJoinPool.ManagedBlocker is an interface for extending managed parallelism for tasks running in ForkJoinPools. There are two methods to implement, block() and isReleasable(). Below is an example implementation that blocks on a BlockingQueue.

1:  public class QueueManagedBlocker<T> implements ManagedBlocker {  
2:       
3:     final BlockingQueue<T> queue;  
4:     volatile T value = null;  
5:       
6:     QueueManagedBlocker(BlockingQueue<T> queue) {  
7:        this.queue = queue;  
8:     }  
9:    
10:     @Override  
11:     public boolean block() throws InterruptedException {  
12:        if (value == null)  
13:           value = queue.take();  
14:        return true;  
15:     }  
16:    
17:     @Override  
18:     public boolean isReleasable() {  
19:        return value != null || (value = queue.poll()) != null;  
20:     }  
21:       
22:     public T getValue() {  
23:        return value;  
24:     }  
25:    
26:  }  
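
To actually use the blocker above, you hand it to the static method ForkJoinPool.managedBlock, which tells the pool that the current task may block so it can compensate with extra worker threads. A minimal sketch (my own, not from the original post), assuming the QueueManagedBlocker above is available:

 BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
 queue.offer("hello");

 QueueManagedBlocker<String> blocker = new QueueManagedBlocker<String>(queue);
 ForkJoinPool.commonPool().submit(() -> {
    try {
       // tell the pool this task may block while waiting on the queue
       ForkJoinPool.managedBlock(blocker);
       System.out.println("took: " + blocker.getValue());
    } catch (InterruptedException e) {
       Thread.currentThread().interrupt();
    }
 });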

Next, we have the interface Future<V>, which we have seen before in the previous learning series.

1:  ExecutorService executorService = Executors.newFixedThreadPool(1);  
2:  Future<Integer> future = executorService.submit(new Summer(11,22));  

It's clear that you can obtain the result via the future variable above. The interface RejectedExecutionHandler, on the other hand, is mostly for error handling.

1:  RejectedExecutionHandler executionHandler = new MyRejectedExecutionHandlerImpl();  
2:  ThreadPoolExecutor executor = new ThreadPoolExecutor(3, 3, 10, TimeUnit.SECONDS, worksQueue, executionHandler);  
3:    
4:  public class MyRejectedExecutionHandlerImpl implements RejectedExecutionHandler {  
5:    
6:     @Override  
7:     public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {  
8:        System.out.println(r.toString() + " : I've been rejected !");  
9:     }  
10:    
11:  }  
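
To actually see the handler fire, the pool and its queue have to be saturated. A minimal sketch (my own example) with a single worker thread and a queue of capacity one:

 BlockingQueue<Runnable> worksQueue = new ArrayBlockingQueue<Runnable>(1);
 RejectedExecutionHandler executionHandler = new MyRejectedExecutionHandlerImpl();
 ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 1, 10, TimeUnit.SECONDS, worksQueue, executionHandler);

 for (int i = 0; i < 5; i++) {
    // once the single thread is busy and the queue is full, the remaining
    // submissions go to rejectedExecution instead of being run
    executor.execute(() -> {
       try {
          Thread.sleep(1000);
       } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
       }
    });
 }
 executor.shutdown();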

So you set the implementing class on the ThreadPoolExecutor, and if a task cannot be executed by the ThreadPoolExecutor, rejectedExecution will be called. Moving on to the next interface, RunnableFuture<V>.

1:  RunnableFuture<Integer> rf = new FutureTask<Integer>(new Summer(22,33));  

Here we initialize a FutureTask with the Callable class Summer, which we created in the previous learning series. The interface RunnableScheduledFuture, which extends RunnableFuture, has one additional method to implement.

1:  RunnableScheduledFuture<Integer> rsf = new Summer1();  
2:  System.out.println(rsf.isPeriodic());  

In the class Summer1, the isPeriodic() implementation determines whether the task is periodic or not. ScheduledExecutorService is a pretty common interface; if you google it you will find plenty of examples like the code below.

1:  ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);  
2:  scheduler.scheduleAtFixedRate(() -> System.out.println("hihi"), 1, 1, TimeUnit.SECONDS);  
3:  Thread.sleep(3000);  
4:  scheduler.shutdown();  

Here the task is executed once every second until the scheduler is shut down after three seconds.
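
schedule() itself returns a ScheduledFuture, so you can inspect the remaining delay or cancel the task before it runs. A small sketch (values are arbitrary) reusing the scheduler above, before it is shut down:

 ScheduledFuture<?> handle = scheduler.schedule(() -> System.out.println("runs once"), 5, TimeUnit.SECONDS);
 System.out.println("remaining delay: " + handle.getDelay(TimeUnit.SECONDS));
 handle.cancel(false);   // cancel before the 5 seconds elapse
 System.out.println("cancelled? " + handle.isCancelled());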

1:  ScheduledFuture<Integer> sf = new ScheduledFutureImpl();  
2:  sf.isCancelled();  

ScheduledFuture<V> is a delayed, result-bearing action that can be cancelled. Usually a scheduled future is the result of scheduling a task with a ScheduledExecutorService. If you have a future task that gets delayed for whatever reason, or that may get cancelled, you will want to look further into this interface.

ThreadFactory is another interface which creates new threads on demand. Using thread factories removes hardwiring of calls to new Thread, enabling applications to use special thread subclasses, priorities, etc.

1:  ThreadFactory tf = Executors.defaultThreadFactory();  
2:  tf.newThread(()->System.out.println("ThreadFactory")).start();  
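
If the default factory is not enough, a custom ThreadFactory lets you control the name, daemon flag and priority of every thread a pool creates, in one place. A minimal sketch (my own example):

 ThreadFactory named = r -> {
    Thread t = new Thread(r);
    t.setName("worker-" + t.getId());   // descriptive names make thread dumps easier to read
    t.setDaemon(true);
    return t;
 };
 ExecutorService es = Executors.newFixedThreadPool(2, named);
 es.submit(() -> System.out.println(Thread.currentThread().getName()));
 es.shutdown();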

Finally, we take a look at the last interface, TransferQueue. A TransferQueue may be useful, for example, in message-passing applications in which producers sometimes (using method transfer(E)) await receipt of elements by consumers invoking take or poll, while at other times they enqueue elements (via method put) without waiting for receipt.

1:  TransferQueue<Integer> tq = new LinkedTransferQueue<Integer>();  
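
The interesting method is transfer(E), which blocks the producer until a consumer has actually received the element, unlike put which returns as soon as the element is enqueued. A minimal sketch (my own, exception handling kept short) with one consumer thread:

 new Thread(() -> {
    try {
       System.out.println("consumed: " + tq.take());
    } catch (InterruptedException e) {
       Thread.currentThread().interrupt();
    }
 }).start();

 // blocks until the consumer above has taken the element
 tq.transfer(42);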

That's it for this learning series, thank you. Oh, and here is the source code:

https://github.com/jasonwee/videoOnCloud/commit/ce479e5befaf7abe84d3d85930d5196a639e2643

https://github.com/jasonwee/videoOnCloud/blob/master/src/java/org/just4fun/concurrent/MyRejectedExecutionHandlerImpl.java
https://github.com/jasonwee/videoOnCloud/blob/master/src/java/org/just4fun/concurrent/ExampleThreadPoolExecutor.java
https://github.com/jasonwee/videoOnCloud/blob/master/src/java/org/just4fun/concurrent/DemoExecutor.java

https://github.com/jasonwee/videoOnCloud/blob/master/src/java/play/learn/java/concurrent/LearnConcurrentInterfaceP2.java
https://github.com/jasonwee/videoOnCloud/blob/master/src/java/play/learn/java/concurrent/QueueManagedBlocker.java
https://github.com/jasonwee/videoOnCloud/blob/master/src/java/play/learn/java/concurrent/ScheduledFutureImpl.java
https://github.com/jasonwee/videoOnCloud/blob/master/src/java/play/learn/java/concurrent/Summer1.java


Friday, December 4, 2015

Learn java util concurrent part2

This article continues the java.util.concurrent learning series. You should read the first part here too. Today we will learn ten interfaces in the java.util.concurrent package.

Okay, let's start with the queue interface BlockingDeque. Some characteristics of this interface include:

  • blocking
  • thread safe
  • does not permit null elements
  • may (or may not) be capacity-constrained.

and we can add and remove items from this queue. Example below; the declaration line is my addition and assumes a LinkedBlockingDeque with a capacity of 3, which is why the fourth add would throw.

1:  BlockingDeque<Integer> bd = new LinkedBlockingDeque<Integer>(3);  
2:  bd.add(1);  
3:  bd.add(2);  
4:  System.out.println("size: " + bd.size());  
5:          
6:  bd.add(3);  
7:  System.out.println("size: " + bd.size());  
8:  //bd.add(4); // throws IllegalStateException, the deque is full  
9:          
10:  bd.forEach(s -> System.out.println(s));  

Try playing around with this class and its different methods to get a basic understanding of it. Next, we have a similar interface called BlockingQueue. Its characteristics are the same as BlockingDeque; the difference is that BlockingDeque (which actually extends BlockingQueue) supports blocking insertion and removal at both ends, while a BlockingQueue only works at the head and tail like a normal queue. The JDK ships many classes that implement BlockingQueue, as shown below, and a blocking producer/consumer sketch follows after that.

1:  BlockingQueue<Integer> bq = new ArrayBlockingQueue<Integer>(10);  
2:  bq = new DelayQueue();  // raw type here; DelayQueue elements must implement Delayed  
3:  bq = new LinkedBlockingDeque<Integer>();  
4:  bq = new LinkedTransferQueue<Integer>();  
5:  bq = new PriorityBlockingQueue<Integer>();  
6:  bq = new SynchronousQueue<Integer>();  
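
The point of all these implementations is that put and take block instead of failing when the queue is full or empty. A small producer/consumer sketch (my own example) using an ArrayBlockingQueue of capacity 2; put and take both throw InterruptedException, so in a real class you would handle or declare it:

 BlockingQueue<Integer> jobs = new ArrayBlockingQueue<Integer>(2);

 new Thread(() -> {
    try {
       while (true) {
          // take() blocks until an element is available
          System.out.println("consumed " + jobs.take());
       }
    } catch (InterruptedException e) {
       Thread.currentThread().interrupt();
    }
 }).start();

 for (int i = 0; i < 5; i++) {
    jobs.put(i);   // put() blocks whenever the queue already holds 2 elements
 }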

Next we have the Callable interface, which is similar to Runnable with one clear distinction: Callable returns a value.

1:  ExecutorService executorService = Executors.newFixedThreadPool(1);  
2:    
3:  Future<Integer> future = executorService.submit(new Summer(11,22));  
4:    
5:  try {  
6:     Integer total = future.get();  
7:     System.out.println("sum " + total);  
8:  } catch (Exception e) {  
9:     e.printStackTrace();  
10:  }  
11:    
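
The Summer class itself is not listed in this post (it is in the linked source); a minimal sketch of what it might look like, assuming it simply adds its two constructor arguments:

 public class Summer implements Callable<Integer> {

    private final int a;
    private final int b;

    public Summer(int a, int b) {
       this.a = a;
       this.b = b;
    }

    @Override
    public Integer call() throws Exception {
       return a + b;   // the "work" is just adding the two numbers
    }
 }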

As can be read above, Summer is an implementation of the interface Callable and it is submitted to an executor service to be executed. CompletableFuture.AsynchronousCompletionTask is an interesting interface; its official documentation says: "A marker interface identifying asynchronous tasks produced by async methods. This may be useful for monitoring, debugging, and tracking asynchronous activities."

1:  CompletableFuture<Integer> cf = new CompletableFuture<Integer>();  
2:  ForkJoinPool.commonPool().submit(  
3:        (Runnable & CompletableFuture.AsynchronousCompletionTask)()->{  
4:      try {  
5:         cf.complete(1);  
6:      } catch (Exception e) {  
7:         cf.completeExceptionally(e);  
8:      }  
9:   });  

As can be read above, we submit a lambda to the ForkJoinPool, casting it to the intersection of the interfaces Runnable and CompletableFuture.AsynchronousCompletionTask. Moving on, we have the interface CompletionService. If you have long-running tasks, you might want to look into this interface. An example can be read below.

1:  CompletionService<Integer> longRunningCompletionService = new ExecutorCompletionService<Integer>(executorService);  
2:    
3:  longRunningCompletionService.submit(() -> {System.out.println("done"); return 1;});  
4:    
5:  try {  
6:     Future<Integer> result = longRunningCompletionService.take();  
7:     System.out.println(result.get());  
8:  } catch (Exception e) {  
9:     // TODO Auto-generated catch block  
10:     e.printStackTrace();  
11:  }  
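
Where CompletionService really pays off is with several tasks: take() hands you results in the order the tasks finish, not the order they were submitted. A small sketch (my own, reusing the same service) with three tasks of different durations:

 for (int i = 1; i <= 3; i++) {
    final int n = i;
    longRunningCompletionService.submit(() -> {
       Thread.sleep(n * 100);   // simulate work of different lengths
       return n;
    });
 }

 try {
    for (int i = 0; i < 3; i++) {
       // take() returns whichever submitted task completed first
       System.out.println("completed: " + longRunningCompletionService.take().get());
    }
 } catch (Exception e) {
    e.printStackTrace();
 }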

Persist the longRunningCompletionService object throughout your application and the results can be retrieved later. Pretty handy. Moving on, we have a new interface, CompletionStage, which debuted in JDK 8. From the CompletionStage javadoc: a stage of a possibly asynchronous computation, that performs an action or computes a value when another CompletionStage completes. A stage completes upon termination of its computation, but this may in turn trigger other dependent stages.

Example code using CompletionStage follows.

1:  ListenableFuture<String> springListenableFuture = createSpringListenableFuture();  
2:    
3:  CompletableCompletionStage<Object> completionStage = factory.createCompletionStage();  
4:  springListenableFuture.addCallback(new ListenableFutureCallback<String>() {  
5:    @Override  
6:    public void onSuccess(String result) {  
7:       System.out.println("onSuccess called");  
8:      completionStage.complete(result);  
9:    }  
10:    @Override  
11:    public void onFailure(Throwable t) {  
12:       System.out.println("onFailure called");  
13:      completionStage.completeExceptionally(t);  
14:    }  
15:  });  
16:    
17:  completionStage.thenAccept(System.out::println);  
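
Note that the example above relies on Spring's ListenableFuture and a third-party CompletableCompletionStage factory. With the plain JDK you get the same idea from CompletableFuture, which implements CompletionStage; a minimal sketch (my own example):

 CompletionStage<String> stage = CompletableFuture.supplyAsync(() -> "hello");

 stage.thenApply(String::toUpperCase)           // runs when the previous stage completes
      .thenAccept(s -> System.out.println(s))   // consume the transformed value
      .exceptionally(t -> { t.printStackTrace(); return null; });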

If things are not clear up to this point, you should take the code and start working with it. In this concurrent package we have two map interfaces to use: ConcurrentMap and ConcurrentNavigableMap.

1:  ConcurrentMap<String, String> cm = new ConcurrentHashMap<String, String>();  
2:  cm = new ConcurrentSkipListMap<String, String>();  
3:    
4:  ConcurrentNavigableMap<String, String> cnm = new ConcurrentSkipListMap<String, String>();  
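
The main reason to reach for a ConcurrentMap over a synchronized HashMap is its atomic compound operations. A small sketch (my own, keys and values are arbitrary):

 ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<String, Integer>();

 hits.putIfAbsent("home", 0);              // insert only if no mapping exists yet
 hits.merge("home", 1, Integer::sum);      // atomic read-modify-write (jdk8)
 hits.computeIfAbsent("about", k -> 0);    // lazily create a missing entry

 System.out.println(hits);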

ConcurrentMap provides thread-safety and atomicity guarantees, whilst ConcurrentNavigableMap additionally supports NavigableMap operations, and recursively so for its navigable sub-maps. Then we have the interface Delayed.

1:  Random random = new Random();  
2:  int delay = random.nextInt(10000);  
3:  Delayed employer = new SalaryDelay("a lot of bs reasons", delay);  
4:  System.out.println("bullshit delay this time " + employer.getDelay(TimeUnit.SECONDS));  

Delayed is an interface where you implement two required methods, getDelay and compareTo. Fundamentally, you have objects that are fed to an executor to be processed not immediately but only after some delay, for whatever reason. As an example, pretty common in the working world: an employer delays the salary for any number of reasons. A sketch of such an implementation follows below.
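
SalaryDelay itself is only in the linked source; a minimal sketch of what an implementation might look like, assuming the constructor takes a reason and a delay in milliseconds:

 public class SalaryDelay implements Delayed {

    private final String reason;
    private final long dueTime;   // absolute time in millis when the salary is "due"

    public SalaryDelay(String reason, long delayInMillis) {
       this.reason = reason;
       this.dueTime = System.currentTimeMillis() + delayInMillis;
    }

    @Override
    public long getDelay(TimeUnit unit) {
       // remaining delay expressed in whatever unit the caller asks for
       return unit.convert(dueTime - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
       return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
    }
 }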

The last two interfaces are related to each other: ExecutorService is a sub-interface of Executor. For ExecutorService we have already seen an example above, and below are some Executor implementations,

1:  Executor executor = new ForkJoinPool();  
2:  executor = new ScheduledThreadPoolExecutor(1);  
3:    
4:  BlockingQueue<Runnable> blockingQueue = new ArrayBlockingQueue<Runnable>(4);  
5:  executor = new ThreadPoolExecutor(1, 1, 1000, TimeUnit.SECONDS, blockingQueue);  

We see there are many classes that implement the Executor interface. The Executor interface guarantees the following:

  • it is an object that executes submitted Runnable tasks
  • it does not strictly require that execution be asynchronous (see the sketch below)
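
Because Executor does not promise asynchronous execution, the simplest valid implementation just runs the task on the caller's thread. A one-line sketch (my own example):

 Executor direct = command -> command.run();   // synchronous: runs on the calling thread
 direct.execute(() -> System.out.println("ran on " + Thread.currentThread().getName()));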


That's it for this article, we continue the rest in the next article!

Oh, before that, you can download the full source at the following links:

https://github.com/jasonwee/videoOnCloud/commit/3d291610df89c610215b8ecdbcc59cbb028ba14b
https://github.com/jasonwee/videoOnCloud/blob/master/src/java/play/learn/java/concurrent/LearnConcurrentInterfaceP1.java
https://github.com/jasonwee/videoOnCloud/blob/master/src/java/play/learn/java/concurrent/SalaryDelay.java
https://github.com/jasonwee/videoOnCloud/blob/master/src/java/play/learn/java/concurrent/Summer.java
https://github.com/jasonwee/videoOnCloud/blob/master/src/java/play/learn/java/CompletableFuture/LearnCompletionStage.java

Sunday, November 22, 2015

Learn java util concurrent part1

Today we are going to learn the classes in the Java package java.util.concurrent. Because there are many classes within this package, there will be several articles covering them. Let's start with a simple class first, TimeUnit.

1:  package play.learn.java.concurrent;  
2:    
3:  import java.util.concurrent.BrokenBarrierException;  
4:  import java.util.concurrent.TimeUnit;  
5:    
6:  public class LearnTimeUnit {  
7:       
8:     public LearnTimeUnit() throws InterruptedException {  
9:          
10:        // assuming we have a long running app which ran for 2 days 7 hours 35 minutes 2 seconds   
11:        long longRunningApplicationDuration = 200102l;  
12:          
13:        System.out.println("duration in nanos " + TimeUnit.SECONDS.toNanos(longRunningApplicationDuration));  
14:        System.out.println("duration in days " + TimeUnit.SECONDS.toDays(longRunningApplicationDuration));  
15:        System.out.println("duration in hours " + TimeUnit.SECONDS.toHours(longRunningApplicationDuration));  
16:        System.out.println("duration in micros " + TimeUnit.SECONDS.toMicros(longRunningApplicationDuration));  
17:        System.out.println("duration in millis " + TimeUnit.SECONDS.toMillis(longRunningApplicationDuration));  
18:        System.out.println("duration in minutes " + TimeUnit.SECONDS.toMinutes(longRunningApplicationDuration));  
19:        System.out.println("duration in seconds " + TimeUnit.SECONDS.toSeconds(longRunningApplicationDuration));  
20:          
21:          
22:        TimeUnit[] var = TimeUnit.values();  
23:        System.out.println("size " + var.length);  
24:          
25:        for (TimeUnit elem : var) {  
26:           System.out.println(elem.name());  
27:        }  
28:          
29:        TimeUnit.SECONDS.sleep(10);  
30:     }  
31:    
32:     public static void main(String[] args) throws InterruptedException {  
33:        new LearnTimeUnit();  
34:     }  
35:    
36:  }  

TimeUnit provides several helpful methods to convert a duration to a different unit. You can also download the above source code here.

Next, we will take a look at the concurrent exceptions. These exceptions become meaningful when we catch them in code. For now, we will go through the definitions to get a basic understanding of them. Below is a summary.

BrokenBarrierException
Exception thrown when a thread tries to wait upon a barrier that is in a broken state, or which enters the broken state while the thread is waiting.

CancellationException
Exception indicating that the result of a value-producing task, such as a FutureTask, cannot be retrieved because the task was cancelled.

CompletionException
Exception thrown when an error or other exception is encountered in the course of completing a result or task.

ExecutionException
Exception thrown when attempting to retrieve the result of a task that aborted by throwing an exception. This exception can be inspected using the Throwable.getCause() method.

RejectedExecutionException
Exception thrown by an Executor when a task cannot be accepted for execution.

TimeoutException
Exception thrown when a blocking operation times out. Blocking operations for which a timeout is specified need a means to indicate that the timeout has occurred. For many such operations it is possible to return a value that indicates timeout; when that is not possible or desirable then TimeoutException should be declared and thrown.


A BrokenBarrierException example follows; for the full source code, you can download it here.

1:  package play.learn.java.concurrent;  
2:    
3:  import java.util.concurrent.BrokenBarrierException;  
4:  import java.util.concurrent.CyclicBarrier;  
5:    
6:  public class LearnBrokenBarrierException {  
7:       
8:     private CyclicBarrier cibai;  
9:     public static int count = 0;  
10:       
11:     private void manageThread() {  
12:        cibai = new CyclicBarrier(3);  
13:          
14:        for (int i = 0; i < 3; i++) {  
15:           new Thread(new Worker(cibai)).start();  
16:        }  
17:     }  
18:       
19:     public static void barrierComplete(CyclicBarrier cb) {  
20:        System.out.println("collating task");  
21:          
22:        if (count == 3) {  
23:           System.out.println("Exit from system");  
24:           // comment for finite  
25:           System.exit(0);  
26:        }  
27:        count++;  
28:          
29:        for (int i = 0; i < 3; i++) {  
30:           new Thread(new Worker(cb)).start();  
31:        }  
32:     }  
33:       
34:     public static void main(String[] args) {  
35:        new LearnBrokenBarrierException().manageThread();   
36:     }  
37:       
38:     static class Worker implements Runnable {  
39:          
40:        CyclicBarrier cibai;  
41:          
42:        public Worker(CyclicBarrier cb) {  
43:           this.cibai = cb;  
44:        }  
45:          
46:        @Override  
47:        public void run() {  
48:           doSomeWork();  
49:           try {  
50:              if (cibai.await() == 0)  
51:                 LearnBrokenBarrierException.barrierComplete(cibai);  
52:           } catch (InterruptedException e) {  
53:              e.printStackTrace();  
54:           } catch (BrokenBarrierException e) {  
55:              e.printStackTrace();  
56:           }  
57:        }  
58:    
59:        private void doSomeWork() {  
60:           System.out.println("Doing some work");  
61:        }  
62:          
63:     }  
64:    
65:  }  
66:    

CancellationException, ExecutionException, RejectedExecutionException and TimeoutException examples, see below. The full source code can be downloaded here.

1:  package play.learn.java.concurrent;  
2:    
3:  import java.util.concurrent.Callable;  
4:  import java.util.concurrent.CancellationException;  
5:  import java.util.concurrent.ExecutionException;  
6:  import java.util.concurrent.ExecutorService;  
7:  import java.util.concurrent.Executors;  
8:  import java.util.concurrent.TimeUnit;  
9:  import java.util.concurrent.TimeoutException;  
10:  import java.util.concurrent.FutureTask;  
11:    
12:  public class LearnCancellationException {  
13:    
14:     public static void main(String[] args) {  
15:        MyCallable callable1 = new MyCallable(1000);  
16:        MyCallable callable2 = new MyCallable(2000);  
17:    
18:        FutureTask<String> futureTask1 = new FutureTask<String>(callable1);  
19:        FutureTask<String> futureTask2 = new FutureTask<String>(callable2);  
20:    
21:        ExecutorService executor = Executors.newFixedThreadPool(2);  
22:        executor.execute(futureTask1);  
23:        executor.execute(futureTask2);  
24:    
25:        while (true) {  
26:           try {  
27:              if(futureTask1.isDone() && futureTask2.isDone()){  
28:                 System.out.println("Done");  
29:                 //shut down executor service  
30:                 executor.shutdown();  
31:                 return;  
32:              }  
33:                
34:              // uncomment for cancel  
35:              //futureTask2.cancel(true);  
36:    
37:              if(!futureTask1.isDone()){  
38:              //wait indefinitely for future task to complete  
39:              System.out.println("FutureTask1 output="+futureTask1.get());  
40:              }  
41:    
42:              System.out.println("Waiting for FutureTask2 to complete");  
43:              // set a small timeout to get a TimeoutException  
44:              String s = futureTask2.get(2000L, TimeUnit.MILLISECONDS);  
45:              if(s !=null){  
46:                 System.out.println("FutureTask2 output="+s);  
47:              }  
48:           } catch (CancellationException e) {  
49:              e.printStackTrace();  
50:           } catch (InterruptedException | ExecutionException e) {  
51:              e.printStackTrace();  
52:           } catch(TimeoutException e){  
53:              e.printStackTrace();  
54:           }  
55:        }  
56:    
57:     }  
58:       
59:     static class MyCallable implements Callable<String> {  
60:          
61:        private long waitTime;  
62:          
63:        public MyCallable(int timeInMillis) {  
64:           this.waitTime = timeInMillis;  
65:        }  
66:    
67:        @Override  
68:        public String call() throws Exception {  
69:           Thread.sleep(waitTime);  
70:           return Thread.currentThread().getName();  
71:        }  
72:          
73:     }  
74:    
75:  }  
76:    

A CompletionException example follows; for the full source code, you can download it here.

1:  import java.util.Arrays;  
2:  import java.util.List;  
3:  import java.util.concurrent.CompletableFuture;  
4:  import java.util.concurrent.CompletionException;  
5:    
6:  public class LearnCompletableFuture {  
7:    
8:     public void learnCompletionException() {  
9:        try {  
10:           List<String> list = Arrays.asList("A", "B", "C", "D");  
11:           list.stream().map(s -> CompletableFuture.supplyAsync(() -> s + s))  
12:              .map(f -> f.getNow("Not Done")).forEach(s -> System.out.println(s));  
13:    
14:        } catch (CompletionException e) {  
15:           e.printStackTrace();  
16:        }  
17:     }  
18:    
19:     public static void main(String[] args) {  
20:        LearnCompletableFuture c = new LearnCompletableFuture();  
21:        c.learnCompletionException();  
22:     }  
23:  }  
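
getNow rarely surfaces a CompletionException because it does not wait; join() does. A small sketch (my own) where a task that throws gets its exception wrapped in a CompletionException:

 try {
    CompletableFuture.supplyAsync(() -> {
       throw new IllegalStateException("boom");
    }).join();   // join wraps the cause in an unchecked CompletionException
 } catch (CompletionException e) {
    System.out.println("cause: " + e.getCause());
 }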

That's it for this article. The remaining interfaces and classes in java.util.concurrent will be covered in the next few articles, until then.

Saturday, November 21, 2015

Java Garbage Collector

If you are a Java developer, Java garbage collection (GC) pops up from time to time in javadoc, online articles or online discussions. It is such a hot and tough topic because it is an entirely different paradigm from what programmers usually do, which is writing code. The Java GC frees, in the background, the heap memory of objects you created in your classes. In the past I have covered a few articles related to Java GC, and today I am going to go through several blogs/articles that I found online, learn the basics and share what I've learned; hopefully, for Java programmers, Java GC will become a bit clearer.

When you start a Java application, based on the parameters passed to the java command, the operating system reserves some memory for the Java application, known as the heap. The heap is further divided into several regions: eden, the survivor spaces, the old gen and the perm gen. In Oracle Java 8 HotSpot, the perm gen has been removed, so be sure to always check the official garbage collector documentation for changes.
The survivor space is divided into two, survivor 0 and survivor 1. Eden and the survivor spaces are collectively known as the young generation (or new generation), whilst the old gen is also known as the tenured generation. Garbage collections happen on both the young generation and the old generation. The two diagrams below show how the heap regions are divided.



While the concept of garbage collection is the same, the implementation is not, and neither are the default settings or how to tune it. The well-known JVMs include Oracle/Sun HotSpot, Oracle JRockit and IBM J9. You can find a list of other JVMs here. Essentially, garbage collection is performed on the young generation and the old generation to remove objects on the heap that no longer have a valid reference.

Some common Java heap parameters are listed below. For the full list, issue the command java -X.

-Xms initial java heap size
-Xmx maximum java heap size
-Xmn the size of the heap for the young generation

There are a few types of GC:
- serial gc
- parallel gc
- parallel old gc
- cms gc 

You can specify which GC implementation runs on the Java heap with command-line flags such as -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseParallelOldGC and -XX:+UseConcMarkSweepGC.

If you run a server application, the metrics exposed by the GC are definitely something to watch out for. To get the metrics, you can use tools such as jstat, JMX (for example through jconsole or VisualVM), or GC logging flags such as -verbose:gc and -XX:+PrintGCDetails.

That's it for this brief introduction.

Friday, November 20, 2015

Yet another hdd vs ssd comparison

There are many articles online that articulate how fast a solid state drive is in comparison to a spinning hard disk drive. But one wonders, really, just how fast is an SSD? I mean, if your income is limited and the current spinning disk is working fine, there is no compelling reason to make the change, and at the same time it will be difficult to afford an SSD considering the cost per GB. As of this writing, a Samsung 850 Pro 512GB is selling at Compuzone Malaysia at a staggering price of 1378 MYR!!! But there is a kind soul who generously donated an SSD to me, and as a gesture of goodwill back to him, here I will write a blog post describing my experience migrating from a Hitachi/HGST Travelstar Z7K500 (HGST HTS725050A7E630) to a Samsung SSD 850 PRO 512GB.

One thing is for sure, the SSD is really light. It weighs only 66 grams, compared with the HDD at 95 grams. For a first-timer, the SSD is so light that it made me wonder whether it really held 512GB.

For the spinning HDD, I took 5 samples: 1min 41sec, 1min 34sec, 1min 34sec, 1min 34sec and 1min 31sec. All measurements are from the moment I push the power button until I see the login screen. Of course, there are many services running during bootup. On average, the bootup time is around 1min 35sec.

Next, I benchmark the current hdd with hdparm and dd.

 user@localhost:~$ sudo hdparm -t /dev/sda5  
   
 /dev/sda5:  
  Timing buffered disk reads: 322 MB in 3.01 seconds = 107.12 MB/sec  
 user@localhost:~$ sudo hdparm -t /dev/sda5  
   
 /dev/sda5:  
  Timing buffered disk reads: 322 MB in 3.01 seconds = 107.13 MB/sec  
 user@localhost:~$ sudo hdparm -t /dev/sda5  
   
 /dev/sda5:  
  Timing buffered disk reads: 322 MB in 3.01 seconds = 107.12 MB/sec  
   
   
 user@localhost:~$ sudo hdparm -T /dev/sda5  
   
 /dev/sda5:  
  Timing cached reads:  5778 MB in 2.00 seconds = 2889.25 MB/sec  
 user@localhost:~$ sudo hdparm -T /dev/sda5  
   
 /dev/sda5:  
  Timing cached reads:  5726 MB in 2.00 seconds = 2863.32 MB/sec  
 user@localhost:~$ sudo hdparm -T /dev/sda5  
   
 /dev/sda5:  
  Timing cached reads:  5702 MB in 2.00 seconds = 2850.84 MB/sec  

I performed 3 tests each for cached and non-cached reads. On average, the non-cached read speed is about 107.12MB/sec and the cached read speed is 2867.80MB/sec. The hdparm parameters used are described below.
       -t     Perform timings of device reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading through the buffer cache to the disk without any prior caching of data. This measurement is an indication of how fast the drive can sustain sequential data reads under Linux, without any filesystem overhead. To ensure accurate measurements, the buffer cache is flushed during the processing of -t using the BLKFLSBUF ioctl.
       -T     Perform timings of cache reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading directly from the Linux buffer cache without disk access. This measurement is essentially an indication of the throughput of the processor, cache, and memory of the system under test.

Next, I used the dd command to do a higher-layer benchmark on the disk. See below.

 user@localhost:~$ time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm -f ddfile  
 250000+0 records in  
 250000+0 records out  
 2048000000 bytes (2.0 GB) copied, 32.5028 s, 63.0 MB/s  
   
 real     0m41.773s  
 user     0m0.068s  
 sys     0m4.048s  
 user@localhost:~$ time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm -f ddfile  
 250000+0 records in  
 250000+0 records out  
 2048000000 bytes (2.0 GB) copied, 27.1676 s, 75.4 MB/s  
   
 real     0m37.012s  
 user     0m0.056s  
 sys     0m3.848s  
 user@localhost:~$ time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm -f ddfile  
 250000+0 records in  
 250000+0 records out  
 2048000000 bytes (2.0 GB) copied, 19.4599 s, 105 MB/s  
   
 real     0m37.929s  
 user     0m0.064s  
 sys     0m3.740s  

So the arithmetic is (8 x 1024 x 250000 / 1024 / 1024) / real, which gives 46.75MB/sec, 52.77MB/sec and 51.49MB/sec respectively for the three tests, and 50.33MB/sec on average.

Now to the samsung ssd.

 root@localhost:~# sudo hdparm -t /dev/sda6  
   
 /dev/sda6:  
  Timing buffered disk reads: 770 MB in 3.01 seconds = 256.00 MB/sec  
 root@localhost:~# sudo hdparm -t /dev/sda6  
   
 /dev/sda6:  
  Timing buffered disk reads: 762 MB in 3.00 seconds = 253.98 MB/sec  
 root@localhost:~# sudo hdparm -t /dev/sda6  
   
 /dev/sda6:  
  Timing buffered disk reads: 758 MB in 3.00 seconds = 252.44 MB/sec  
   
   
 root@localhost:~# sudo hdparm -T /dev/sda6   
   
 /dev/sda6:  
  Timing cached reads:  5820 MB in 2.00 seconds = 2910.31 MB/sec  
 root@localhost:~# sudo hdparm -T /dev/sda6   
   
 /dev/sda6:  
  Timing cached reads:  6022 MB in 2.00 seconds = 3011.33 MB/sec  
 root@localhost:~# sudo hdparm -T /dev/sda6   
   
 /dev/sda6:  
  Timing cached reads:  5698 MB in 2.00 seconds = 2849.14 MB/sec  
 root@localhost:~#   
   
   
 root@localhost:~# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm -f ddfile   
 250000+0 records in  
 250000+0 records out  
 2048000000 bytes (2.0 GB) copied, 4.14268 s, 494 MB/s  
   
 real     0m8.280s  
 user     0m0.040s  
 sys     0m2.084s  
 root@localhost:~# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm -f ddfile   
 250000+0 records in  
 250000+0 records out  
 2048000000 bytes (2.0 GB) copied, 4.18595 s, 489 MB/s  
   
 real     0m8.279s  
 user     0m0.068s  
 sys     0m2.036s  
 root@localhost:~# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm -f ddfile   
 250000+0 records in  
 250000+0 records out  
 2048000000 bytes (2.0 GB) copied, 3.94227 s, 519 MB/s  
   
 real     0m8.258s  
 user     0m0.080s  
 sys     0m2.060s  
   

On average, the non-cached read speed is 254.14MB/sec and the cached read speed is 2923.59MB/sec. As for the dd tests, the average is 236.10MB/sec. The bootup time is now around 30sec on average! The most significant change is after grub: the login screen shows up in around 3 seconds or less. Although this SSD is rated at 550/520MB/sec for sequential read and write, the lower numbers are probably due to my old system's bandwidth maxing out.

The significant reduction in bootup and disk read time is clearly seen from the statistics above. As for the user experience, everything becomes so fast! To put it into comparison, going from HDD to SSD is like going from a Proton car to an F1 car. I think in the future it will also help with programming tasks like grepping and searching through code.


UPDATE: with an adjustment to the grub timeout and changing the POST to fast boot, my cold boot to login screen has now improved to 20 seconds!

Sunday, November 8, 2015

Light learning into CouchDB

Today we will explore another open source database, CouchDB. First, let's understand what Apache CouchDB is:

Apache CouchDB, commonly referred to as CouchDB, is an open source database that focuses on ease of use and on being "a database that completely embraces the web".[1] It is a document-oriented NoSQL database that uses JSON to store data, JavaScript as its query language using MapReduce, and HTTP for an API.[1] CouchDB was first released in 2005 and later became an Apache project in 2008.

Actually, couch is an acronym for Cluster Of Unreliable Commodity Hardware. If you are using a Debian-based Linux distribution, installation is very easy: just apt-get install couchdb. If not, you can check out this link on how to install it on other Linux distributions. Once installed, make sure couchdb is running.

 root@localhost:~# /etc/init.d/couchdb status  
 ● couchdb.service - LSB: Apache CouchDB init script  
   Loaded: loaded (/etc/init.d/couchdb)  
   Active: active (exited) since Thu 2015-08-20 23:07:30 MYT; 14s ago  
    Docs: man:systemd-sysv-generator(8)  
   
 Aug 20 23:07:28 localhost systemd[1]: Starting LSB: Apache CouchDB init script...  
 Aug 20 23:07:28 localhost su[14399]: Successful su for couchdb by root  
 Aug 20 23:07:28 localhost su[14399]: + ??? root:couchdb  
 Aug 20 23:07:28 localhost su[14399]: pam_unix(su:session): session opened for user couchdb by (uid=0)  
 Aug 20 23:07:30 localhost couchdb[14392]: Starting database server: couchdb.  
 Aug 20 23:07:30 localhost systemd[1]: Started LSB: Apache CouchDB init script.  
   

Very easy. You can quickly check the version of couchdb that is running; just put the following link into the browser.

http://127.0.0.1:5984/

I have version 1.4.0 running. Let's create a database. If you have a terminal ready, you can copy and paste the commands below and check the database that gets created.

 user@localhost:~$ curl -X PUT http://127.0.0.1:5984/wiki  
 {"ok":true}  
 user@localhost:~$ curl -X PUT http://127.0.0.1:5984/wiki  
 {"error":"file_exists","reason":"The database could not be created, the file already exists."}  
 user@localhost:~$ curl http://127.0.0.1:5984/wiki  
 {"db_name":"wiki","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":79,"data_size":0,"instance_start_time":"1440083544219325","disk_format_version":6,"committed_update_seq":0}  
 user@localhost:~$ curl -X GET http://127.0.0.1:5984/_all_dbs  
 ["_replicator","_users","wiki"]  

couchdb outputs JSON key/value responses, and the error handling is pretty good. Very speedy too! Okay, now let's try CRUD on couchdb, and we will do that using curl.

 user@localhost:~$ curl -X POST -H "Content-Type: application/json" --data '{ "text" : "Wikipedia on CouchDB", "rating": 5 }' http://127.0.0.1:5984/wiki  
 {"ok":true,"id":"4c6a6dce960e16aba7e50d02c9001241","rev":"1-80fd6f7aeb55c83c8999b4613843af5d"}  
   
 user@localhost:~$ curl -X GET -H "Content-Type: application/json" http://127.0.0.1:5984/wiki/4c6a6dce960e16aba7e50d02c9001241  
 {"_id":"4c6a6dce960e16aba7e50d02c9001241","_rev":"1-80fd6f7aeb55c83c8999b4613843af5d","text":"Wikipedia on CouchDB","rating":5}  

First, we create a new document with an HTTP POST carrying data in JSON format. Then, using the id that couchdb generated, we get that document back.

 user@localhost:~$ curl -X PUT -H "Content-Type: application/json" --data '{ "text" : "Wikipedia on CouchDB", "rating": 6, "_rev": "1-80fd6f7aeb55c83c8999b4613843af5d" }' http://127.0.0.1:5984/wiki/4c6a6dce960e16aba7e50d02c9001241  
 {"ok":true,"id":"4c6a6dce960e16aba7e50d02c9001241","rev":"2-b7248b6af9b6efcea5a8fe8cc299a85c"}  
 user@localhost:~$ curl -X GET -H "Content-Type: application/json" http://127.0.0.1:5984/wiki/4c6a6dce960e16aba7e50d02c9001241  
 {"_id":"4c6a6dce960e16aba7e50d02c9001241","_rev":"2-b7248b6af9b6efcea5a8fe8cc299a85c","text":"Wikipedia on CouchDB","rating":6}  
 user@localhost:~$ curl -X PUT -H "Content-Type: application/json" --data '{ "views" : 10, "_rev": "2-b7248b6af9b6efcea5a8fe8cc299a85c" }' http://127.0.0.1:5984/wiki/4c6a6dce960e16aba7e50d02c9001241  
 {"ok":true,"id":"4c6a6dce960e16aba7e50d02c9001241","rev":"3-9d1c59138f909760f9de6e5ce63c3a4e"}  
 user@localhost:~$ curl -X GET -H "Content-Type: application/json" http://127.0.0.1:5984/wiki/4c6a6dce960e16aba7e50d02c9001241  
 {"_id":"4c6a6dce960e16aba7e50d02c9001241","_rev":"3-9d1c59138f909760f9de6e5ce63c3a4e","views":10}  

As you can read above, we update this document twice. First, we increase the rating to 6, appending the _rev value that couchdb generated. If the update succeeds, notice that the revision increases by 1 (note the 2- prefix on rev). On the second update, we set views to 10, but essentially couchdb wipes everything in the document and inserts only the new key views. So note that if you want to update a document, you should send in all the existing values along with the update.

Finally, now we delete this document. Very easy, see below.

 user@localhost:~$ curl -X DELETE -H "Content-Type: application/json" http://127.0.0.1:5984/wiki/4c6a6dce960e16aba7e50d02c9001241?rev=3-9d1c59138f909760f9de6e5ce63c3a4e  
 {"ok":true,"id":"4c6a6dce960e16aba7e50d02c9001241","rev":"4-aa379357deff42739d2fc77aea38dde1"}  
 user@localhost:~$ curl -X GET -H "Content-Type: application/json" http://127.0.0.1:5984/wiki/4c6a6dce960e16aba7e50d02c9001241  
 {"error":"not_found","reason":"deleted"}  
 user@localhost:~$ curl -X DELETE http://127.0.0.1:5984/wiki  
 {"ok":true}  
 user@localhost:~$ curl -X GET http://127.0.0.1:5984/_all_dbs  
 ["_replicator","_users"]  

couchdb is very good and efficient at handling documents; it certainly earns its spot and is worth delving deeper into. If you think so too, take a look at this link. That's it for today, have fun learning!

Saturday, November 7, 2015

Apache accumulo first learning experience


Today we will take a look into another big data technology. Apache accumulo is the topic for today. First, what is accumulo?

Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. Other notable improvements and feature are outlined here.
Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, Hypertable, and Cassandra. Accumulo began its development in 2008 and joined the Apache community in 2011.

In this article, as always, we will set up the infrastructure. I referenced this article, with the following environment:

  • 64bit arch
  • open jdk 1.7/1.8
  • zookeeper-3.4.6
  • hadoop-2.6.1
  • accumulo-1.7.0
  • openssh 
  • rsync
  • debian sid

As accumulo is a Java-based project, you must have Java installed and configured. Get the latest Java 1.7 or 1.8 as of this writing. After Java is installed you need to export JAVA_HOME in your bash configuration file, .bashrc, with a line like export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_55

Then you need to source the new .bashrc; running . .bashrc is sufficient. For ssh and rsync, you can use the apt-get package manager as it is easy. What's important is that you set up passwordless ssh with a public/private key pair in your user's ssh configuration directory.

You can create two directories, $HOME/Downloads and $HOME/Installs. It's pretty intuitive: the Downloads directory is for the downloaded packages and Installs is the working directory where the compressed packages are extracted.


Download the above packages into the $HOME/Downloads directory and extract them into $HOME/Installs. First, let's configure Apache Hadoop.

 $ vim $HOME/Installs/hadoop-2.6.1/etc/hadoop/hadoop-env.sh  
 $ # uncomment in the file above export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_55  
 $ vim $HOME/Installs/hadoop-2.6.1/etc/hadoop/core-site.xml  
 $ cat $HOME/Installs/hadoop-2.6.1/etc/hadoop/core-site.xml  
 <?xml version="1.0" encoding="UTF-8"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <configuration>  
   <property>  
     <name>fs.defaultFS</name>  
     <value>hdfs://localhost:9000</value>  
   </property>  
 </configuration>  
 $ vim $HOME/Installs/hadoop-2.6.1/etc/hadoop/hdfs-site.xml  
 $ cat $HOME/Installs/hadoop-2.6.1/etc/hadoop/hdfs-site.xml  
 <?xml version="1.0" encoding="UTF-8"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <configuration>  
   <property>  
     <name>dfs.replication</name>  
     <value>1</value>  
   </property>  
   <property>  
     <name>dfs.name.dir</name>  
     <value>hdfs_storage/name</value>  
   </property>  
   <property>  
     <name>dfs.data.dir</name>  
     <value>hdfs_storage/data</value>  
   </property>  
 </configuration>  
 $ vim $HOME/Installs/hadoop-2.6.1/etc/hadoop/mapred-site.xml  
 $ cat $HOME/Installs/hadoop-2.6.1/etc/hadoop/mapred-site.xml  
 <?xml version="1.0"?>  
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
 <configuration>  
    <property>  
      <name>mapred.job.tracker</name>  
      <value>localhost:9001</value>  
    </property>  
 </configuration>  
 $ cd $HOME/Installs/hadoop-2.6.1/  
 $ $HOME/Installs/hadoop-2.6.1/bin/hdfs namenode -format  
 $ $HOME/Installs/hadoop-2.6.1/sbin/start-dfs.sh  

As you can read above, we set the Java home for Hadoop and then configure HDFS to run on port 9000, so make sure this port is free for Hadoop to use. Then we format the Hadoop namenode and start HDFS.

Next we will configure zookeeper.

 $ cp $HOME/Installs/zookeeper-3.4.6/conf/zoo_sample.cfg $HOME/Installs/zookeeper-3.4.6/conf/zoo.cfg  
 $ $HOME/Installs/zookeeper-3.4.6/bin/zkServer.sh start  

Pretty simple: copy the default config file and start the service. The last step is Apache Accumulo.

 $ cp $HOME/Installs/accumulo-1.7.0/conf/examples/512MB/standalone/* $HOME/Installs/accumulo-1.7.0/conf/  
 $ vim $HOME/.bashrc  
 $ tail -2 $HOME/.bashrc  
 export HADOOP_HOME=$HOME/Installs/hadoop-2.6.1/  
 export ZOOKEEPER_HOME=$HOME/Installs/zookeeper-3.4.6/  
 $ . $HOME/.bashrc  
 $ vim $HOME/Installs/accumulo-1.7.0/conf/accumulo-env.sh  
 $ # SET ACCUMULO_MONITOR_BIND_ALL to true.  
 $ vim $HOME/Installs/accumulo-1.7.0/conf/accumulo-site.xml  
 $ # in file $HOME/Installs/accumulo-1.7.0/conf/accumulo-site.xml  
 <property>  
   <name>instance.volumes</name>  
   <value>hdfs://localhost:9000/accumulo</value>  
 </property>  
 $ # in file $HOME/Installs/accumulo-1.7.0/conf/accumulo-site.xml   
   <name>instance.secret</name>  
   <value>mysecret</value>  
 $ # in file $HOME/Installs/accumulo-1.7.0/conf/accumulo-site.xml    
  <property>  
   <name>trace.token.property.password</name>  
   <value>my scret</value>  
  </property>  

So we have configured the settings for accumulo in .bashrc and some properties in accumulo-env.sh and accumulo-site.xml. Next, we will initialize accumulo and start it using the password we specified previously.

 $ $HOME/Installs/accumulo-1.7.0/bin/accumulo init  
 $ # give a instance name.  
 $ # type in the password as specify in trace.token.property.password.  
 $ $HOME/Installs/accumulo-1.7.0/bin/start-all.sh  

That's it! If you want to do CRUD in accumulo, I suggest you go with this official documentation.