Friday, May 8, 2015

Elasticsearch no node exception happened in tomcat web container

If you ever get the stack trace in web container log file such as below and wondering how to solve these. Then read on but first, a little background. A elasticsearch cluster 0.90 and client running on tomcat web container using elasticsearch java transport client. Both server and client running same elasticsearch version and same java version.
16.Feb 6:21:30,830 ERROR WebAppTransportClient [put]: error
org.elasticsearch.client.transport.NoNodeAvailableException: No node available
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:212)
at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106)
at org.elasticsearch.client.support.AbstractClient.index(AbstractClient.java:84)
at org.elasticsearch.client.transport.TransportClient.index(TransportClient.java:316)
at org.elasticsearch.action.index.IndexRequestBuilder.doExecute(IndexRequestBuilder.java:324)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
at com.example.elasticsearch.WebAppTransportClient.put(WebAppTransportClient.java:258)
at com.example.elasticsearch.WebAppTransportClient.put(WebAppTransportClient.java:307)
at com.example.threadpool.TaskThread.run(TaskThread.java:38)
at java.lang.Thread.run(Thread.java:662)

This exception will disappear once web container is restarted but restarting webapp that often is not a good solution in production. I did a few research on line and gather a few information, they are as following:

* The default number of channels in each of these class are configured with the configuration prefix of transport.connections_per_node.
https://www.found.no/foundation/elasticsearch-networking/

* If you see NoNodeAvailableException you may have hit a connect timeout of the client. Connect timeout is 30 secs IIRC.
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/elasticsearch/VyNpCs17aTA/CcXkYvVMYWAJ

* You can set org.elasticsearch.client.transport to TRACE level in your logging configuration (on the client side) to see the failures it has (to connect for example). For more information, you can turn on logging on org.elasticsearch.client.transport.
https://groups.google.com/forum/#!topic/elasticsearch/Mt2x4d5BCGI

* This means that you started to get disconnections between the client (transport) and the server. It will try and reconnect automatically, and possibly manages to do it. For more information, you can turn on logging on org.elasticsearch.client.transport.
* Can you try and increase the timeout and see how it goes? Set client.transport.ping_timeout in the settings you pass to the TransportClient to 10s for example.
* We had the same problem. reason: The application server uses a older version of log4j than ES needed.
http://elasticsearch-users.115913.n3.nabble.com/No-node-available-Exception-td3920119.html

* The correct method is to add the known host addresses with addTransportAddresses() and afterwards check the connectedNodes() method. If it returns empty list, no nodes could be found.
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/elasticsearch/ceH3UIy14jM/XJSFKd8kAXEJ

* the most common case for NoNodeAvailable is the regular pinging that the transport client does fails to do it, so no nodes end up as the list of nodes that the transport client uses. If you will set client.transport (or org.elasticsearch.client.transport if running embedded) to TRACE, you will see the pinging effort and if it failed or not (and the reason for the failures). This might get us further into trying to understand why it happens.
* .put("client.transport.ping_timeout", pingTimeout)
* .put("client.transport.nodes_sampler_interval", pingSamplerInterval).build();
https://groups.google.com/forum/#!msg/elasticsearch/9aSkB0AVrHU/_4kDkjAFKuYJ

* this has nothing to do with migration errors. Your JVM performs a very long GC of 9 seconds which exceeds the default ping timeout of 5 seconds, so ES dropped the connection ,assuming your JVM is just too busy. Try again if you can reproduce it. If yes, increase the timeout to something like 10 seconds, or consider to update your Java version.
http://elasticsearch-users.115913.n3.nabble.com/Migration-errors-0-20-1-to-0-90-td4035165.html

* During long GC the JVM is somehow suspended. So your client can not see it anymore.
http://grokbase.com/t/gg/elasticsearch/136fw0hppp/transport-client-ping-timeout-no-node-available-exception

* You wrote that you have a 0.90.9 cluster but you added 0.90.0 jars to the client. Is that correct?
* Please check:
*
* if your cluster nodes and client node is using exactly the same JVM
* if your cluster and client use exactly the same ES version
* if your cluster and client use the same cluster name
* reasons outside ES: IP blocking, network reachability, network interfaces, IPv4/IPv6 etc.
* Then you should be able to connect with TransportClient.

https://groups.google.com/forum/#!msg/elasticsearch/fYmKjGywe8o/z9Ci5L5WjUAJ

So I have tried all that option mentioned and the problem solve by added sniff to the transport client setting. 08988For more information, read here.

I hope this will solve your problem too.

No comments:

Post a Comment