Showing posts with label yaml. Show all posts
Showing posts with label yaml. Show all posts

Thursday, December 26, 2013

cassandra 2.0 catch 101 – part5

Our cluster status.
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.33.31 147.34 KB 512 34.3% f13c9390-4c52-4fc2-afa8-f7f74e7fd710 rack1
UN 192.168.33.32 123.31 KB 512 33.2% bc7fcfcc-9a30-4929-bf24-35ec770856a3 rack1
UN 192.168.33.33 160.74 KB 256 16.5% 999d58bf-2b31-49ff-a452-6f0d01598429 rack1
UN 192.168.33.34 137.39 KB 256 16.0% 222796e9-d330-469a-8dcd-3f3581c9d795 rack1

So it is pretty interesting that a node can own different amount of cluster load based on the tokens specified. Because in our cluster environment, we have different types of hardware and for instance, mine is pretty old. With default settings, a default of 512 tokens is assigned.

The setting for tokens can be found in cassandra.yaml
# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
# that this node will store. You probably want all nodes to have the same number
# of tokens assuming they have equal hardware capability.
#
# If you leave this unspecified, Cassandra will use the default of 1 token for legacy compatibility,
# and will use the initial_token as described below.
#
# Specifying initial_token will override this setting.
#
# If you already have a cluster with 1 token per node, and wish to migrate to
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
num_tokens: 512

Just note that this setting is one time only when a join the cluster. Meaning that the first time it join, 512 tokens will be assign for this node and this tokens will be store in the keyspace system and in column family local. Even if you removed the data directory and start all over, the data will be stream from other nodes, hence, the information is still persists. If you really want to change at later day, it is possible, you may want to treat this node as dead through decommission, stop cassandra instance. Change the num_tokens configuration in yaml file and then start cassandra instance back. You may want to think about this because decommission stream data to other servers and it may create load in system and also network traffic.

Monday, December 23, 2013

Elasticsearch index slow log for search and indexing

Today, we are going to learn on the logging for elasticsearch for its search and index. In elasticsearch config file, elasticsearch.yml, it should have a configuration such as below:
################################## Slow Log ##################################

# Shard level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms

So with this example, I have enable tracing for search query and search fetch with 500ms and 200ms respectively. A search in elasticsearch consists of query time and fetch time. Hence the two configuration for search. Meanwhile, logging for elasticsearch index is also enable with a threshold of 500ms.

With these configuration sets, and if your indexing or search exceed that threshold,
an entry will be log into a file. The logging file should be located in path.log
that is set in elasticsearch.yml.

So what does the number really means? Excerpts from elasticsearch official documentation

The logging is done on the shard level scope, meaning the executionof a search request within a specific shard. It does not encompass the whole search request, which can be broadcast to several shards in order to execute. Some of the benefits of shard level logging is the association of the actual execution on the specific machine, compared with request level.


 

All settings are index level settings (and each index can have different values for it), and can be changed in runtime using the indexupdate settings API.


 

... and, I have tried updating the index setting via a simple tool I've made earlier on. But the idea is same, you just need to http get by putting the variable into the index setting. You can find more information here The key for the configuration is available at ShardSlowLogSearchService.java class.
[jason@node1 bin]$ ./indices-setting.sh set search.slowlog.threshold.query.trace 500
{
"ok" : true,
"acknowledged" : true
}

[2013-12-23 12:31:12,758][TRACE][index.search.slowlog.query] [node1] [index_test][146] took[1s], took_millis[1026], types[foo,bar], stats[], search_type[QUERY_THEN_FETCH], total_shards[90], source[{"size":80,"timeout":10000,"query":{"filtered":{"query":{"query_string":{"query":"maxis*","default_operator":"and"}},"filter":{"and":{"filters":[{"query":{"match":{"site":{"query":"www.google.com","type":"boolean"}}}},{"range":{"unixtimestamp":{"from":null,"to":1387825199000,"include_lower":true,"include_upper":true}}}]}}}},"filter":{"query":{"match":{"site":{"query":"www.google.com","type":"boolean"}}}},"sort":[{"unixtimestamp":{"order":"desc"}}]}], extra_source[],

With this example, it has exceed the threshold set at 500ms which it ran for 1 second.

As for indexing, the fundamental concept is the same, so we won't elaborate in this article and that should leave you as a tutorial. :-)