Friday, December 6, 2013

cassandra 2.0 catch 101 – part4

It has been a while since I last posted, mainly due to a heavy workload. :-(  In this article, I'm going to share the lessons learned on Cassandra 2.0.2 using cqlsh 4.1.0.

Last time we had to remove all the files in /var/lib/cassandra/ simply because something broke when we upgraded from Cassandra 2.0.0 to 2.0.2, and nobody on the team had the time to go into the details. Since this is only a just4fun cluster, we agreed to remove the directory /var/lib/cassandra/ and start the cluster fresh on Cassandra 2.0.2.

To better understand Cassandra, we will take a detailed look at ALTER TABLE. But before that, let's create a new keyspace and table.
cqlsh> CREATE KEYSPACE jw_schema1 WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
cqlsh>

and the corresponding entry in the Cassandra system.log:

INFO [Thrift:7] 2013-12-06 16:17:21,902 MigrationManager.java (line 217) Create new Keyspace: jw_schema1, rep strategy:SimpleStrategy{}, strategy_options: {replication_factor=3}, durable_writes: true

cassandra 2.0 catch 101 – part3

Many of us come from a MySQL / PostgreSQL background and are used to working with the database quickly from the command line. Commenting in Cassandra CQL works differently from what you may be used to in SQL. Read the example below.
cqlsh:jw_schema1> #select * from users;
Invalid syntax at line 1, char 1
#select * from users;
^
cqlsh:jw_schema1> --select * from users;
cqlsh:jw_schema1> -select * from users;
Bad Request: line 1:0 no viable alternative at input '-'
cqlsh:jw_schema1> -- select * from users;
cqlsh:jw_schema1>

As you can see, the hash glyph does not work in cqlsh; you need to put double dashes in front of the comment you want to make.

Voila! =)

Saturday, November 30, 2013

how does read performance gains when in compression?

I read the following interesting discussion on the Cassandra mailing list, thought the explanations were very good, and would like to share them.

how does read performance gains when in compression?
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=index#cassandra/dml/dml_about_reads_c.html

Quote from Artur Kronenberg:
The way I understand it is that compression gives you the advantage of having to use way less IO and rather use CPU. The bottleneck of reads is usually the IO time you need to read the data from disk. As a figure, we had about 25 reads/s reading from disk, while we get up to 3000 reads/s when we have all of it in cache. So having good compression reduces the amount you have to read from disk. Rather you may spend a little bit more time decompressing data, but this data will be in cache anyways so it won't matter.

Quote from Edward Capriolo:
The big * in the explanation: Smaller file size footprint leads to better disk cache, however decompression adds work for the JVM to do and increases the churn of objects in the JVM. Additionally compression block sizes might be 4KB while for some use cases a small row may be 200bytes. This means that internally a large block might be decompressed to get at the row inside of it.

In many use cases compression is a performance win, but not necessarily in all cases. In particular if you are already doing JVM performance tuning issues to stop garbage collection pauses enabling compression could make performance worse.
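
To make the block-size point concrete, here is a minimal sketch of reading one 200-byte row out of a compressed 4 KB block. It is only an illustration: it uses java.util.zip rather than Cassandra's own compressors and SSTable layout, and the class and method names (BlockReadSketch, makeBlock, compress, readRow) are made up for this example.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: Cassandra does not use java.util.zip for its SSTables.
// The sizes come from the quote above (4 KB block, 200-byte row).
class BlockReadSketch {
    static final int BLOCK_SIZE = 4096;
    static final int ROW_SIZE = 200;

    // Build a block where row i is filled with the letter 'a' + i.
    static byte[] makeBlock() {
        byte[] block = new byte[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            block[i] = (byte) ('a' + (i / ROW_SIZE));
        }
        return block;
    }

    // Compress the whole block as one unit.
    static byte[] compress(byte[] block) {
        Deflater deflater = new Deflater();
        deflater.setInput(block);
        deflater.finish();
        byte[] buf = new byte[block.length + 64];
        int n = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, n);
    }

    // To get at a single row, the entire block has to be inflated first.
    static byte[] readRow(byte[] compressed, int rowIndex) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] block = new byte[BLOCK_SIZE];
        try {
            int n = 0;
            while (n < BLOCK_SIZE && !inflater.finished()) {
                n += inflater.inflate(block, n, BLOCK_SIZE - n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        int from = rowIndex * ROW_SIZE;
        return Arrays.copyOfRange(block, from, from + ROW_SIZE);
    }

    public static void main(String[] args) {
        byte[] compressed = compress(makeBlock());
        byte[] row = readRow(compressed, 3);
        System.out.println("bytes on disk:   " + compressed.length);
        System.out.println("bytes inflated:  " + BLOCK_SIZE);
        System.out.println("bytes wanted:    " + row.length);
    }
}
```

The smaller on-disk size is the I/O win Artur describes; the gap between bytes inflated and bytes wanted is the extra CPU and object churn Edward warns about.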

Thursday, November 14, 2013

C++ "hello world" (test syntax highlight)

#include <iostream>

int main()
{
    std::cout << "hello world" << std::endl;

    return 0;
}

Wednesday, November 13, 2013

cassandra 2.0 catch 101 – part2

After playing around with Cassandra 2.0 for quite some time, in this article I'm going to share a strange issue I encountered: I was unable to drop a table no matter what I tried.

I used the stress tool in the Cassandra package to create the tables (column families). It seemed that the keyspace and tables were created successfully. The output follows.


Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate, interval_key_rate,latency,95th,99.9th,elapsed_time
..
..
..


So everything seemed to have been created okay in Cassandra.


cqlsh:system> desc keyspaces;

jw_schema1 system system_traces

cqlsh:system> use jw_schema1;
cqlsh:jw_schema1> desc tables;

Counter1 Counter3 Standard1 Super1 SuperCounter1

cqlsh:jw_schema1> desc table Counter1;

CREATE TABLE "Counter1" (
key blob,
column1 ascii,
value counter,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='NONE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={};

cqlsh:jw_schema1>




But when selecting from or dropping any table within the keyspace, things started to go wrong, and the Cassandra server debug log showed nothing wrong.
cqlsh:jw_schema1> select * from Counter1;
Bad Request: unconfigured columnfamily counter1
cqlsh:jw_schema1>

DEBUG [Thrift:105] 2013-11-13 20:55:29,050 CassandraServer.java (line 1932) execute_cql3_query
DEBUG [Thrift:105] 2013-11-13 20:55:29,050 Tracing.java (line 159) request complete

cqlsh:jw_schema1> drop table Counter1;
Bad Request: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.
cqlsh:jw_schema1>

DEBUG [Thrift:105] 2013-11-13 20:55:59,392 CassandraServer.java (line 1932) execute_cql3_query
DEBUG [Thrift:105] 2013-11-13 20:55:59,393 Tracing.java (line 159) request complete

The same happens when using the DataStax Java binary driver.
public void connect(String node) {
    cluster = Cluster.builder().addContactPoint(node)
        .addContactPoints("127.0.0.1")
        .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
        .withReconnectionPolicy(new ConstantReconnectionPolicy(100L)).build();
    session = cluster.connect("jw_schema1");

    // The table name is unquoted here, so Cassandra looks for "counter1" (see the error below).
    ExecutionInfo info = session.execute("DROP TABLE Counter1").getExecutionInfo();
}

 
Exception in thread "main" com.datastax.driver.core.exceptions.InvalidQueryException: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:271)
at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:187)
at com.datastax.driver.core.Session.execute(Session.java:126)
at com.datastax.driver.core.Session.execute(Session.java:77)
at foo.bar.main.SimpleClient.connect(SimpleClient.java:38)
at foo.bar.main.SimpleClient.main(SimpleClient.java:69)
Caused by: com.datastax.driver.core.exceptions.InvalidConfigurationInQueryException: Cannot drop non existing column family 'counter1' in keyspace 'jw_schema1'.
at com.datastax.driver.core.Responses$Error.asException(Responses.java:97)
at com.datastax.driver.core.ResultSetFuture$ResponseCallback.onSet(ResultSetFuture.java:122)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:217)
at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:349)
at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:500)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:458)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:439)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:472)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:333)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

At the time I was not sure what had gone wrong, so I ended up dropping the keyspace as a workaround.
cqlsh:system> drop keyspace jw_schema1;
cqlsh:system>





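Workaround: CQL folds unquoted identifiers to lowercase, so a mixed-case name such as TestKeyspace must be wrapped in double quotes.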
cqlsh:system> desc keyspaces;

TestKeyspace system system_traces

cqlsh:system> drop keyspace TestKeyspace;
Bad Request: Cannot drop non existing keyspace 'testkeyspace'.
cqlsh:system> drop keyspace "TestKeyspace";
cqlsh:system> desc keyspaces;

system system_traces

cqlsh:system>
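
The same quoting rule applies when building statements from Java. Below is a minimal sketch of a helper that adds the double quotes only when a name would not survive the lowercase folding. The class CqlIdentifiers and the method quoteIfNeeded are hypothetical names for this post, not part of cqlsh or the DataStax driver, and the helper assumes the name contains no double quote and is not a reserved CQL keyword.

```java
// Hypothetical helper, not part of cqlsh or the DataStax driver.
class CqlIdentifiers {

    // Returns the name unchanged when it is already all lowercase
    // (letters, digits, underscore, starting with a letter); otherwise
    // wraps it in double quotes so the original case is preserved.
    // Assumption: no embedded double quotes and no reserved keywords.
    static String quoteIfNeeded(String name) {
        if (name.matches("[a-z][a-z0-9_]*")) {
            return name;
        }
        return "\"" + name + "\"";
    }

    public static void main(String[] args) {
        System.out.println("DROP TABLE " + quoteIfNeeded("Counter1"));
        System.out.println("DROP KEYSPACE " + quoteIfNeeded("jw_schema1"));
        // prints:
        // DROP TABLE "Counter1"
        // DROP KEYSPACE jw_schema1
    }
}
```

With this, the failing statement from the driver example would have been sent as DROP TABLE "Counter1" instead of DROP TABLE Counter1.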

 

Friday, October 18, 2013

How to generate murmur3 in cassandra2.0

Reading into this doc, I got really curious about how the murmur3 hash value is generated.

So I dug into the Cassandra GitHub repository and found this class. It seems that Cassandra 2.0 generates the token for the partition key (the first part of the primary key) using the method hash3_x64_128. Below is the code to get it working; just put it into any Java class and see the token generated.

    

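// Imports assume the Cassandra 2.0 jar is on the classpath.
import java.nio.ByteBuffer;
import org.apache.cassandra.dht.LongToken;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.cassandra.utils.MurmurHash;

public static LongToken genToken(String rowKey) {
    ByteBuffer key = ByteBufferUtil.bytes(rowKey);
    // The token is the first 64 bits of the 128-bit murmur3 hash
    long hash = MurmurHash.hash3_x64_128(key, key.position(), key.remaining(), 0)[0];
    return new LongToken(normalize(hash));
}

public static void main(String[] args) {
    System.out.println(genToken("jim"));
}

private static long normalize(long v)
{
    // We exclude the MINIMUM value; see getToken()
    return v == Long.MIN_VALUE ? Long.MAX_VALUE : v;
}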

 


With jim, the token generated is 2680261686609811218, so that should be correct. As an extra, if you use nodetool to show the token ranges, e.g. nodetool -h localhost describering jw_schema1, you can see which range the generated token falls into and which nodes are responsible for holding the row data.
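
To see how a token maps onto a range, here is a minimal sketch of the lookup: each node owns the range that ends at its own token, so the owner is the node with the smallest token greater than or equal to the key's token, wrapping around to the first node. The ring below is hypothetical (three evenly spaced tokens and made-up node names), and the sketch only finds the first replica; with a replication factor of 3 the following nodes on the ring hold copies as well.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of a token ring lookup; tokens and node names are made up.
class TokenRingSketch {

    // A hypothetical three-node ring with evenly spaced tokens.
    static TreeMap<Long, String> sampleRing() {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(Long.MIN_VALUE, "node-a");
        ring.put(-3074457345618258603L, "node-b");
        ring.put(3074457345618258602L, "node-c");
        return ring;
    }

    // The owner is the node whose token is the first one >= the key token;
    // past the highest token, the lookup wraps around to the first node.
    static String ownerOf(TreeMap<Long, String> ring, long token) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(token);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        // Token generated for "jim" above
        System.out.println(ownerOf(sampleRing(), 2680261686609811218L)); // prints node-c
    }
}
```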

Allocate jvm heap more than 8GB for cassandra

What happens if you allocate more than 8GB of JVM heap to a Cassandra instance? In my past experience, we allocated more than 16GB to the Cassandra instance and it still ran fine, but we occasionally encountered performance issues when we increased the heap beyond 16GB. I googled a little and found this doc:

Excessive heap space size

DataStax recommends using the default heap space size for most use
cases. Exceeding this size can impair the Java virtual machine's
(JVM) ability to perform fluid garbage collections (GC). The
following table shows a comparison of heap space performances
reported by a Cassandra user:

Heap     CPU utilization   Queries per second      Latency
40 GB    50%               750                     1 second
8 GB     5%                8500 (not maxed out)    10 ms

For information on heap sizing, see Tuning Java resources.


As the benchmark indicates, the more heap you allocate, the higher the CPU usage is, and the drop in performance looks far worse than linear. So it is wise to keep the heap at 8GB or not more than 50% of that value. A larger heap is not deadly, but it can decrease the performance of the cluster so dramatically that it becomes useless. If you encounter memory errors in the log, then apart from other factors, it is better to consider scaling your cluster horizontally, that is, adding more nodes to increase the capacity. But a quick workaround should you encounter memory error, the

So what really happens in the GC when a large heap is allocated? Well, an excerpt from the guru:

..the concurrent mark/sweep phase runs concurrently with your
application. CMS will cause a stop-the-world full pause if it fails to
complete a CMS sweep in time and you hit the maximum heap size, but
unless that happens, CMS will run concurrently (though there are
stop-the-world pauses involved, that are typically very short, the
mark/sweep phase is concurrent).


Hence, if you really hit a stop-the-world situation, it renders the node useless: the node is so busy doing GC that Cassandra is not able to serve requests.
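
If you want to check how much time a JVM is spending in GC, the standard java.lang.management API exposes per-collector counters. The sketch below reads them for the JVM it runs in; to watch a Cassandra node you would read the same beans remotely over JMX, which is not shown here. The class name GcPauseSketch is made up, and note that the reported time is accumulated collection time, which for a concurrent collector is not the same thing as stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reads GC counters of the current JVM only (not a remote Cassandra node).
class GcPauseSketch {

    // Sum of accumulated collection time over all collectors, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time: " + totalGcMillis() + " ms");
    }
}
```

A total that keeps climbing quickly between two samples is a hint that the node is spending its time collecting rather than serving requests.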

http://www.mail-archive.com/user@cassandra.apache.org/msg17481.html
http://www.mail-archive.com/user@cassandra.apache.org/msg32312.html