Tuesday, December 31, 2013

Happy new year!

My software pick of 2013
* So the year is coming to an end, time to evaluate. When I look back at
2013 to see what software I used the most, compared to 2012, one piece of software
comes to mind: vimwiki, a personal wiki system for vim. I know it doesn't sound
like much, but it is just so much more. The wiki part refers to the way it uses
links: in previous versions vimwiki used CamelCase words to make a link, while in the
latest version any word can be used as a link, and you can link to files on your pc
or on the web.

From the homepage: http://code.google.com/p/vimwiki/
* With vimwiki you can
* organize notes and ideas
* manage todo-lists
* write documentation

* I use it for many purposes: notes, blogs, todos, appointments, projects, etc.
Before vimwiki I used zim, which is a nice gui wiki, but the great thing with
vimwiki is that all the time I spent learning to navigate in vim pays off: I jump
around my wiki-pages with a speed no gui tool can match, and everything is kept
in text files, so the footprint on the hard drive is very small.

* I have been using vimwiki for most of 2013 and my wiki folder is only 392K, so
it's fast even if you put it on dropbox or a local server. vimwiki has many
options to fit your needs; look up the vimwiki help file to learn more. A nice
thing is that you can have as many wikis as you need, like one for work, one
private, and one for a single project. I can't encourage you enough to read the
help file so you can get the most out of this handy tool. I use it all day,
every day. The speed and power of vim in a wiki tool, it doesn't get better
than this.

* Thanks to habamax for this wonderful plugin.

* What is your software pick of 2013?

Cassandra cluster JMX metrics inspection to decide if cluster expansion is justified

One of the important decisions when managing clusters is determining when cluster capacity should be expanded. Keeping a cluster at optimal performance keeps the applications working nicely and, most importantly, gives confidence to the people running them.

So, how do you answer a question like: is my cluster hitting a bottleneck? To answer this type of question, you need to have measuring tools ready and to measure over time. That means displaying the statistics in graphical form; together with the history, they should give an indication of the cluster's performance.

Because this topic can grow huge, we will focus on a specific metric. This article inspects the metrics exposed by the JMX beans. In order to inspect JMX metrics, you will need a JMX client. There is a gui JMX client that comes with the JDK, jconsole, but given the nature of this article I would suggest you go for a cli JMX client, for example jmxterm. You can read an introduction to jmxterm here.

There are many metrics exposed by the cassandra JMX beans, but we will focus on the bean org.apache.cassandra.db:type=CompactionManager.

If you are using jmxterm, you can read the output below:
$ cat test.script 
open localhost:7199
bean org.apache.cassandra.db:type=CompactionManager
get PendingTasks
$ java -jar jmxterm-1.0-alpha-4-uber.jar -i test.script
Welcome to JMX terminal. Type "help" for available commands.
#Connection to localhost:7199 is opened
#bean is set to org.apache.cassandra.db:type=CompactionManager
#mbean = org.apache.cassandra.db:type=CompactionManager:
PendingTasks = 0;

So if you plot PendingTasks in a graph over time, at a periodic interval, it should give insight into your cluster's performance. You can also plot the statistics output from nodetool tpstats; I would suggest you plot the dropped message counts as well, as those metrics indicate over time that performance is being impacted. If you are running stock cassandra settings, you will probably want to fine-tune your nodes at this point, after this investigation and analysis of the graphs. A small sketch of collecting these numbers periodically is shown below.
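If you do not have a full monitoring stack yet, even a small shell script run from cron can collect the raw numbers for plotting later. The sketch below reuses the jmxterm script shown above; the csv file names and the idea of a five-minute cron interval are assumptions, so adjust them to your environment.

#!/bin/bash
# sketch: append compaction PendingTasks and dropped message counts to csv files for graphing
TS=$(date +%s)

# PendingTasks via the jmxterm script from above (test.script)
PENDING=$(java -jar jmxterm-1.0-alpha-4-uber.jar -i test.script 2>/dev/null \
  | awk -F' = ' '/PendingTasks/ {print $2}' | tr -d ';')
echo "$TS,$PENDING" >> pending_tasks.csv

# dropped messages as reported by nodetool tpstats (everything after the "Message type" header)
nodetool -h localhost tpstats | sed -n '/Message type/,$p' | tail -n +2 \
  | awk -v ts="$TS" 'NF {print ts","$1","$2}' >> dropped_messages.csv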

There is no single best strategy; as mentioned earlier, you need experience, and there are many other measuring tools, for example sar, iostat, top, and your application's own measurements. It takes some time to master all of these, but it is crucial if you manage a production cluster.

Thursday, December 26, 2013

cassandra 2.0 catch 101 – part5

Our cluster status.
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.33.31 147.34 KB 512 34.3% f13c9390-4c52-4fc2-afa8-f7f74e7fd710 rack1
UN 192.168.33.32 123.31 KB 512 33.2% bc7fcfcc-9a30-4929-bf24-35ec770856a3 rack1
UN 192.168.33.33 160.74 KB 256 16.5% 999d58bf-2b31-49ff-a452-6f0d01598429 rack1
UN 192.168.33.34 137.39 KB 256 16.0% 222796e9-d330-469a-8dcd-3f3581c9d795 rack1

So it is pretty interesting that a node can own a different share of the cluster load based on the number of tokens specified. In our cluster environment we have different types of hardware, and mine, for instance, is pretty old. With the settings below, 512 tokens are assigned to this node.

The setting for tokens can be found in cassandra.yaml
# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
# that this node will store. You probably want all nodes to have the same number
# of tokens assuming they have equal hardware capability.
#
# If you leave this unspecified, Cassandra will use the default of 1 token for legacy compatibility,
# and will use the initial_token as described below.
#
# Specifying initial_token will override this setting.
#
# If you already have a cluster with 1 token per node, and wish to migrate to
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
num_tokens: 512

Just note that this setting is applied only once, when a node joins the cluster. The first time the node joins, 512 tokens will be assigned to it, and these tokens are stored in the system keyspace, in the column family local. Even if you remove the data directory and start all over, the data will be streamed back from the other nodes, so the information still persists. If you really want to change it at a later date, it is possible: treat the node as dead by decommissioning it, stop the cassandra instance, change the num_tokens configuration in the yaml file, and then start the cassandra instance back up. Think about this carefully, though, because decommissioning streams data to the other servers, which creates load on the system and extra network traffic. A rough sketch of the procedure is shown below.
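The service name, yaml path, and data directories below are assumptions based on a typical debian package install, so adjust them to your setup. Clearing the old local state is usually also required before a decommissioned node will rejoin the ring.

# 1. stream this node's data to the rest of the ring and leave it
nodetool -h localhost decommission

# 2. stop the instance (service name is an assumption)
sudo service cassandra stop

# 3. change num_tokens in cassandra.yaml, e.g. from 512 to 256
sudo sed -i 's/^num_tokens: 512/num_tokens: 256/' /etc/cassandra/cassandra.yaml

# 4. clear the old local state so the node can bootstrap again with the new token count
sudo rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches

# 5. start it back up and let it rejoin the ring
sudo service cassandra start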

Wednesday, December 25, 2013

Lightweight Java Game Library

Since childhood, gaming has been one of my favorite activities. If you are from the 80s, Super Mario should sound familiar to you. =) 30 years have passed, and game development has improved tremendously over that period.

In this article, we are going to explore game development. Most games are written in low-level languages, for example C, and thus development is quite complicated. This certainly introduces a steep learning curve if you are a beginner. Hence, we will choose a simple starting point to learn about game development. An example of a library that can be used is the Lightweight Java Game Library, or by its acronym, LWJGL.

What is Lightweight Java Game Library?

The Lightweight Java Game Library (LWJGL) is a solution aimed directly at professional and amateur Java programmers alike to enable commercial quality games to be written in Java. LWJGL provides developers access to high performance crossplatform libraries such as OpenGL (Open Graphics Library), OpenCL (Open Computing Language) and OpenAL (Open Audio Library) allowing for state of the art 3D games and 3D sound. Additionally LWJGL provides access to controllers such as Gamepads, Steering wheel and Joysticks. All in a simple and straight forward API.

Because this library deals with graphics display, the hardware display driver must be set up correctly. My workstation uses an ati radeon card with the xserver-xorg-video-radeon driver, and 3D acceleration is enabled through the package libgl1-mesa-dri. We won't delve deep into graphics driver installation and configuration since our focus here is game development. You can check whether your driver is set up properly by running glxgears from a terminal: if a window pops up with three spinning gears, your driver installation and setup should be fine for this coding tutorial. Another quick check is sketched below.
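Assuming the mesa-utils package is installed, glxinfo also reports whether direct rendering is enabled:

# should print "direct rendering: Yes" when 3D acceleration is working
glxinfo | grep "direct rendering"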

The official wiki is well written and documented to get you started. With it, I have set up my eclipse environment on debian sid. The library needs to be added to the project build path so that when you run your application, the library is detected. Because I'm running linux, the native library location points to lwjgl-2.9.1/native/linux. These two entries must be configured before any development begins. If you noticed, I've attached the sources as well; it is convenient to be able to read the code if you need to check something later during the coding phase.



There are many tutorials to pick from; as a start, I just picked the basics - LWJGL Basics 1 (The Display). The source code is in the link, and it is incredibly easy to create the display with a few lines of code; I got the window to display on my first try. Very impressive and promising.



It is pretty impressive what this library can do. There are many examples that come with the library, and one of them is an example game. Just execute
java -cp .:res:jar/lwjgl.jar:jar/lwjgl_test.jar:jar/lwjgl_util.jar:jar/jinput.jar: -Djava.library.path=native/linux org.lwjgl.examples.spaceinvaders.Game

if you are running linux. It ran fine in my environment and I played the bundled game; amazing. Maybe in my next article, I'm gonna try to complete it.

Monday, December 23, 2013

Elasticsearch index slow log for search and indexing

Today, we are going to learn about logging in elasticsearch for its search and indexing operations. In the elasticsearch config file, elasticsearch.yml, you should find a configuration section such as the one below:
################################## Slow Log ##################################

# Shard level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms

So with this example, I have enabled tracing for the search query and search fetch phases at 500ms and 200ms respectively. A search in elasticsearch consists of a query phase and a fetch phase, hence the two settings for search. Meanwhile, slow logging for elasticsearch indexing is also enabled, with a threshold of 500ms.

With these configurations set, if your indexing or search exceeds the threshold, an entry will be logged to a file. The log files are located in the path.logs directory that is set in elasticsearch.yml; a quick way to watch for new entries is sketched below.
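With the stock logging.yml the slow log entries go to their own files, named after the cluster. The exact path below is an assumption based on the default cluster name and log directory; adjust it to your own path.logs setting.

# watch slow search entries as they are written (path and cluster name are assumptions)
tail -f /var/log/elasticsearch/elasticsearch_index_search_slowlog.log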

So what do these numbers really mean? Excerpts from the official elasticsearch documentation:

The logging is done on the shard level scope, meaning the execution of a search request within a specific shard. It does not encompass the whole search request, which can be broadcast to several shards in order to execute. Some of the benefits of shard level logging is the association of the actual execution on the specific machine, compared with request level.

All settings are index level settings (and each index can have different values for it), and can be changed in runtime using the index update settings API.

... and I have tried updating the index setting via a simple tool I made earlier on. But the idea is the same: you just need to make an HTTP call that puts the variable into the index settings. You can find more information here. The keys for the configuration are available in the ShardSlowLogSearchService.java class.
[jason@node1 bin]$ ./indices-setting.sh set search.slowlog.threshold.query.trace 500
{
"ok" : true,
"acknowledged" : true
}

[2013-12-23 12:31:12,758][TRACE][index.search.slowlog.query] [node1] [index_test][146] took[1s], took_millis[1026], types[foo,bar], stats[], search_type[QUERY_THEN_FETCH], total_shards[90], source[{"size":80,"timeout":10000,"query":{"filtered":{"query":{"query_string":{"query":"maxis*","default_operator":"and"}},"filter":{"and":{"filters":[{"query":{"match":{"site":{"query":"www.google.com","type":"boolean"}}}},{"range":{"unixtimestamp":{"from":null,"to":1387825199000,"include_lower":true,"include_upper":true}}}]}}}},"filter":{"query":{"match":{"site":{"query":"www.google.com","type":"boolean"}}}},"sort":[{"unixtimestamp":{"order":"desc"}}]}], extra_source[],

In this example, the query exceeded the threshold set at 500ms: it ran for 1 second. If you don't have a tool like mine around, a plain curl call against the index settings API achieves the same thing, as sketched below.
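The index name index_test and the 500ms value below are just the ones from this example; adjust them to your own index.

curl -XPUT 'http://localhost:9200/index_test/_settings' -d '
{
"index.search.slowlog.threshold.query.trace" : "500ms"
}'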

As for indexing, the fundamental concept is the same, so we won't elaborate on it in this article; that is left to you as an exercise. :-)

Sunday, December 22, 2013

Learning Jmxterm

If you have been using jconsole to inspect an application running on the jvm, you might want to look for an alternative in command line form. In this article, we are going to spend some time learning about Jmxterm. So what is Jmxterm? Jmxterm is a command line based interactive JMX client. It's designed to allow users to access a Java MBean server from the command line without a graphical environment. In other words, it's a command line based jconsole.

To get started, you will of course need the JDK installed and a java application that you want to inspect. To start using it, go to http://wiki.cyclopsgroup.org/jmxterm/download and download it. You should end up with a jmxterm-[version].jar file.

So, I'm gonna demonstrate how to use Jmxterm with an example terminal session.
$ java -jar jmxterm-1.0-alpha-4-uber.jar
Welcome to JMX terminal. Type "help" for available commands.
$>help;
#IllegalArgumentException: Command help; isn't valid, run help to see available commands
$>help
#following commands are available to use:
about - Display about page
bean - Display or set current selected MBean.
beans - List available beans under a domain or all domains
bye - Terminate console and exit
close - Close current JMX connection
domain - Display or set current selected domain.
domains - List all available domain names
exit - Terminate console and exit
get - Get value of MBean attribute(s)
help - Display available commands or usage of a command
info - Display detail information about an MBean
jvms - List all running local JVM processes
open - Open JMX session or display current connection
option - Set options for command session
quit - Terminate console and exit
run - Invoke an MBean operation
set - Set value of an MBean attribute
$> bean
null
$>beans
#IllegalStateException: Connection isn't open yet. Run open command to open a connection
$>domains
#following domains are available
#IllegalStateException: Connection isn't open yet. Run open command to open a connection
$>jvms
5552 ( ) - jmxterm-1.0-alpha-4-uber.jar
$>help open
usage: open [-h] [-p <val>] [-u <val>]
Open JMX session or display current connection
-h,--help Display usage
-p,--password <val> Password for user/password authentication
-u,--user <val> User name for user/password authentication
Without argument this command display current connection. URL can be a <PID>,
<hostname>:<port> or full qualified JMX service URL. For example
open localhost:9991,
open jmx:service:...
$>open 192.168.0.2:7199
#RuntimeIOException: Runtime IO exception: Connection refused to host: 127.0.0.1; nested exception is:
java.net.ConnectException: Connection refused
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean
null
$>bean org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies
#bean is set to org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies
$>info
#mbean = org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies
#class name = org.apache.cassandra.db.ColumnFamilyStore
# attributes
%0 - AutoCompactionDisabled (boolean, r)
%1 - BloomFilterDiskSpaceUsed (long, r)
%2 - BloomFilterFalsePositives (long, r)
%3 - BloomFilterFalseRatio (double, r)
%4 - BuiltIndexes (java.util.List, r)
%5 - ColumnFamilyName (java.lang.String, r)
%6 - CompactionStrategyClass (java.lang.String, rw)
%7 - CompressionParameters (java.util.Map, rw)
%8 - CompressionRatio (double, r)
%9 - CrcCheckChance (double, w)
%10 - DroppableTombstoneRatio (double, r)
%11 - EstimatedColumnCountHistogram ([J, r)
%12 - EstimatedRowSizeHistogram ([J, r)
%13 - LifetimeReadLatencyHistogramMicros ([J, r)
%14 - LifetimeWriteLatencyHistogramMicros ([J, r)
%15 - LiveCellsPerSlice (double, r)
%16 - LiveDiskSpaceUsed (long, r)
%17 - LiveSSTableCount (int, r)
%18 - MaxRowSize (long, r)
%19 - MaximumCompactionThreshold (int, rw)
%20 - MeanRowSize (long, r)
%21 - MemtableColumnsCount (long, r)
%22 - MemtableDataSize (long, r)
%23 - MemtableSwitchCount (int, r)
%24 - MinRowSize (long, r)
%25 - MinimumCompactionThreshold (int, rw)
%26 - PendingTasks (int, r)
%27 - ReadCount (long, r)
%28 - RecentBloomFilterFalsePositives (long, r)
%29 - RecentBloomFilterFalseRatio (double, r)
%30 - RecentReadLatencyHistogramMicros ([J, r)
%31 - RecentReadLatencyMicros (double, r)
%32 - RecentSSTablesPerReadHistogram ([J, r)
%33 - RecentWriteLatencyHistogramMicros ([J, r)
%34 - RecentWriteLatencyMicros (double, r)
%35 - SSTableCountPerLevel ([I, r)
%36 - SSTablesPerReadHistogram ([J, r)
%37 - TombstonesPerSlice (double, r)
%38 - TotalDiskSpaceUsed (long, r)
%39 - TotalReadLatencyMicros (long, r)
%40 - TotalWriteLatencyMicros (long, r)
%41 - UnleveledSSTables (int, r)
%42 - WriteCount (long, r)
# operations
%0 - long estimateKeys()
%1 - void forceMajorCompaction()
%2 - java.util.List getSSTablesForKey(java.lang.String p1)
%3 - void loadNewSSTables()
%4 - void setCompactionThresholds(int p1,int p2)
#there's no notifications
$>get WriteCount
#mbean = org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies:
WriteCount = 0;
$>get TotalDiskSpaceUsed
#mbean = org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies:
TotalDiskSpaceUsed = 9437;

So, a brief explanation of what I have just done. To start, you run Jmxterm from a terminal. To understand what commands it has and what you can use them for, simply issue the command help. In order to inspect anything, you need to open a connection to the jvm. Once a connection is established, you can do all sorts of operations; in this example, I connected to cassandra, inspected its bean org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies, and read the WriteCount and TotalDiskSpaceUsed statistics. Jmxterm can also be driven non-interactively, which is handy in scripts; see the small sketch below.
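A minimal non-interactive sketch, assuming the same local cassandra JMX port as in the session above:

echo "get -b org.apache.cassandra.db:type=CompactionManager PendingTasks" \
| java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7199 -n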

That's all folks! I hope you got an idea of what it does and where it might be applicable for you.

Saturday, December 21, 2013

vifm a true gem

filemanager goes vim


  * An ncurses file manager with a vim-like UI. As a vim user you will feel right
at home: a command like dd deletes a line just like in vim, and you can move to the other
window and type p to paste the line from the "clipboard" there. The normal movement
commands like hjkl work as expected: j/k move down/up through the item list, and h/l
move up to the parent directory or down into a directory. Like in vim, most settings
are made in its rc file; the vifmrc lives in ~/.vifm. To get an idea of all the options
you have in vifm, go to http://vifm.sourceforge.net/docs.html#OPTIONS. It's amazing
what you can do with vifm; if, like me, you have been using vim for some time, this
file manager is a true gem. And much like vim, the options are "endless": browse the
project homepage or github.com, where you will find the source code, setups, and help
for making your own setup. I often look at configs and setups on github to get ideas
and maybe improve my own.

* The documentation for vifm is at http://vifm.sourceforge.net/docs.html.

* I must say that after using vifm for some time, and after some github'ing, I have
made my own vifmrc with some nice filetype settings and hard bookmarks. It's like vim:
the more you use it, and the more you add to your rc file, the better and faster it gets.

  * So thanks to ksteen & xaizek for this power-tool.
It doesn't look like much, but it is ;)

Friday, December 20, 2013

A maven introduction.

So recently, I have been working on an opensource project and stumbled upon maven. I'm an ant guy (with an ant background), and I guessed that starting to use maven should not be that difficult.

When you see a pom.xml file in a java project, it tells you that the project is configured with maven. A simple java project, for instance, would look like this:
project home
├── src
│   ├── main
│   │   ├── java
│   │   └── resources
│   └── test
│       ├── java
│       └── resources
├── target
└── pom.xml

The most basic command, and the one you will probably use most often, is

mvn package

With the above command, mvn will compile your classes, run any tests, and package the deliverable code and resources into target/my-app-1.0.jar. If mvn produces this jar, that should be enough, and the developer can concentrate on the java project itself.

But if you are adventurous and want to know more about maven, read on. There are a few maven phases for which you can issue commands. The following is the standard maven lifecycle with its ordered phases; a sketch of invoking them directly follows the list.

  • process-resources

  • compile

  • process-test-resources

  • test-compile

  • test

  • package

  • install

  • deploy
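Running a later phase always runs the earlier phases first, so each of these is a single command. A few typical invocations (the artifact name is the one from the earlier example):

mvn compile        # process-resources and compile
mvn test           # everything up to and including test
mvn package        # build target/my-app-1.0.jar
mvn install        # package, then copy the artifact into the local ~/.m2 repository
mvn clean package  # wipe target/ first, then build from scratch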


So, in order to satisfy the library dependencies of your project, you specify the coordinates of the libraries it depends on in pom.xml. You can use this site to search for the libraries it depends on. Maven then resolves them for you on the next build; a quick way to verify what was resolved is sketched below.
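The maven dependency plugin can print the resolved dependencies, including transitive ones:

mvn dependency:tree     # print the resolved dependency tree
mvn dependency:resolve  # download and list the declared dependencies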

I hope this serves as a simple introduction to using maven to assist in your java project. If you have reached this point and have further questions, see this link and this link.

Changing the ElasticSearch logging level by updating cluster settings.

In this article, we are going to learn how to update the logging level for all the nodes in an elasticsearch cluster. Logging is crucial to understanding system behaviour, so from time to time you may want to change the logging level in elasticsearch via elasticsearch.yml and restart the elasticsearch instance so that the new logging level is picked up. Unfortunately, restarting a live production node takes some time (because of shard recovery), so this is not efficient.

Luckily, there is a cluster setting which allows the logging level to be changed on the fly.

So with that, if you want to understand what's happening in a cluster node, you can change the logging level,

e.g.
curl -XPUT localhost:9200/_cluster/settings -d '{
"transient" : {
"logger.cluster.service" : "DEBUG"
}
}'

and then tail the elasticsearch log; you should see new log entries start to appear. Logging here is managed by the class NodeSettingsService, so you should look into the elasticsearch packages that are initialized with this class, for example cluster.service, cluster.routing.allocation.allocator, indices.ttl.IndicesTTLService, etc. Note that the package prefix org.elasticsearch is not needed when the setting is updated. You can check what is currently applied, or revert it, with the calls sketched below.
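Both use the same _cluster/settings endpoint; the logger name here is just the one from the example above.

# show the transient and persistent cluster settings currently in effect
curl 'http://localhost:9200/_cluster/settings?pretty'

# put the logger back to a quieter level when you are done
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
"transient" : {
"logger.cluster.service" : "INFO"
}
}'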

If you want more information, this link provides further help.

Friday, December 6, 2013

cassandra 2.0 catch 101 – part4

It has been a while since my last post, mainly due to an abundance of work. :-( In this article, I'm gonna share the lessons learned on cassandra 2.0.2, using cqlsh 4.1.0.

Recently we had to remove all the files in /var/lib/cassandra/, simply because something broke when we upgraded from cassandra 2.0.0 to 2.0.2 and nobody on the team had the time to go into the details. Since this is a just4fun cluster, we agreed to remove the /var/lib/cassandra/ directory and start the cluster on cassandra 2.0.2.

In order to better understand cassandra, we will take a detailed look at alter table. But before that, let's create a new keyspace and table.
cqlsh> CREATE KEYSPACE jw_schema1 WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
cqlsh>

and the corresponding entry in cassandra's system.log:

INFO [Thrift:7] 2013-12-06 16:17:21,902 MigrationManager.java (line 217) Create new Keyspace: jw_schema1, rep strategy:SimpleStrategy{}, strategy_options: {replication_factor=3}, durable_writes: true

cassandra 2.0 catch 101 – part3

Many of us come from a mysql / postgres background and are used to quickly working with the database from the command line. Commenting in cassandra cql is different than in sql; look at the example below:
cqlsh:jw_schema1> #select * from users;
Invalid syntax at line 1, char 1
#select * from users;
^
cqlsh:jw_schema1> --select * from users;
cqlsh:jw_schema1> -select * from users;
Bad Request: line 1:0 no viable alternative at input '-'
cqlsh:jw_schema1> -- select * from users;
cqlsh:jw_schema1>

So as you can see, the hash glyph does not work in cqlsh; you need to use double dashes in front of the comment you want to make.

Voila! =)