Saturday, May 24, 2014

Load balancing policy in datastax java driver

Today we are going to explore LoadBalancingPolicy in datastax java driver for apache cassandra.

So what is load balancing policy in datastax java driver? From code description :

The policy that decides which Cassandra hosts to contact for each new query.

Two methods need to be implemented:

  • LoadBalancingPolicy.distance : returns the "distance" of an host for that balancing policy.

  • LoadBalancingPolicy.newQueryPlan: it is used for each query to find which host to query first, and which hosts to use as failover.


The LoadBalancingPolicy is a com.datastax.driver.core.Host.StateListener and is thus informed of hosts up/down events. For efficiency purposes, the policy is expected to exclude down hosts from query plans.

The default policy for java driver version 2.0.2, is TokenAwarePolicy() and with child policy DCAwareRoundRobinPolicy().

Below are a list of policies available in this version of driver.

RoundRobinPolicy 

This policy queries nodes in a round-robin fashion. For a given query, if an host fail, the next one (following the round-robin order) is tried, until all hosts have been tried. This policy is not datacenter aware and will include every known Cassandra host in its round robin algorithm. If you use multiple datacenter this will be inefficient and you will want to use the DCAwareRoundRobinPolicy load balancing policy instead.

DCAwareRoundRobinPolicy

This policy provides round-robin queries over the node of the local data center. It also includes in the query plans returned a configurable number of hosts in the remote data centers, but those are always tried after the local nodes. In other words, this policy guarantees that no host in a remote data center will be queried unless no host in the local data center can be reached.

If used with a single data center, this policy is equivalent to the RoundRobin policy, but its DC awareness incurs a slight overhead so the RoundRobin policy could be preferred to this policy in that case.

TokenAwarePolicy

This policy encapsulates another policy. The resulting policy works in the following way:

  • the distance method is inherited from the child policy.

  • the iterator return by the newQueryPlan method will first return the LOCAL replicas for the query (based on Statement.getRoutingKey if possible (i.e. if the query getRoutingKey method doesn't return null and if Metadata.getReplicas returns a non empty set of replicas for that partition key). If no local replica can be either found or successfully contacted, the rest of the query plan will fallback to one of the child policy.


Do note that only replica for which the child policy distance method returns HostDistance.LOCAL will be considered having priority. For example, if you wrap DCAwareRoundRobinPolicy with this token aware policy, replicas from remote data centers may only be returned after all the host of the local data center.

WhiteListPolicy

A load balancing policy wrapper that ensure that only hosts from a provided white list will ever be returned.

This policy wraps another load balancing policy and will delegate the choice of hosts to the wrapped policy with the exception that only hosts contained in the white list provided when constructing this policy will ever be returned. Any host not in the while list will be considered IGNORED and thus will not be connected to.

This policy can be useful to ensure that the driver only connects to a predefined set of hosts. Keep in mind however that this policy defeats somewhat the host auto-detection of the driver. As such, this policy is only useful in a few special cases or for testing, but is not optimal in general. If all you want to do is limiting connections to hosts of the local data-center then you should use DCAwareRoundRobinPolicy and *not* this policy in particular.

LatencyAwarePolicy

A wrapper load balancing policy that adds latency awareness to a child policy.

When used, this policy will collect the latencies of the queries to each Cassandra node and maintain a per-node latency score (an average). Based on these scores, the policy will penalize (technically, it will ignore them unless no other nodes are up) the nodes that are slower than the best performing node by more than some configurable amount (the exclusion threshold).

The latency score for a given node is a based on a form of http://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average exponential moving average.
In other words, the latency score of a node is the average of its previously measured latencies, but where older measurements gets an exponentially decreasing weight. The exact weight applied to a newly received latency is based on the time elapsed since the previous measure (to account for the fact that latencies are not necessarily reported with equal regularity, neither over time nor between different nodes).

Once a node is excluded from query plans (because its averaged latency grew over the exclusion threshold), its latency score will not be updated anymore (since it is not queried). To give a chance to this node to recover, the policy has a configurable retry period. The policy will not penalize a host for which no measurement has been collected for more than this retry period.

 

Of cause, not a single load balancing is perfect for one environment and thus you should evaluate the load balancing policy that suit your needs. Because of this, load balancing will be fine tune or more will be added in the future, so always check back in the next release for newly update driver.

No comments:

Post a Comment