Friday, May 22, 2015

learning elasticsearch percolator

Today, we are going to learn about the elasticsearch percolator. But first, what's a percolator? An excerpt from Wikipedia:

A coffee percolator is a type of pot used to brew coffee by continually cycling the boiling or nearly boiling brew through the grounds using gravity until the required strength is reached.
But that's a coffee percolator. As for elasticsearch's percolator:
The percolator allows to register queries against an index, and then send percolate requests which include a doc, and getting back the queries that match on that doc out of the set of registered queries. 
Think of it as the reverse operation of indexing and then searching. Instead of sending docs, indexing them, and then running queries. One sends queries, registers them, and then sends docs and finds out which queries match that doc.
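
To make the reversal concrete, here is a minimal in-memory sketch in Python. It is only a toy model, not how elasticsearch implements percolation: the `ToyPercolator` class and its method names are hypothetical, and it only supports exact-match term queries.

```python
class ToyPercolator:
    """Toy model of percolation: store queries first, match documents later."""

    def __init__(self):
        # query id -> {field: required value}; a stand-in for a term query
        self.queries = {}

    def register(self, query_id, term):
        """Register a term query such as {"field1": "value1"} under an id."""
        self.queries[query_id] = dict(term)

    def percolate(self, doc):
        """Return the sorted ids of all registered queries that match doc."""
        return sorted(
            query_id
            for query_id, term in self.queries.items()
            if all(doc.get(field) == value for field, value in term.items())
        )


percolator = ToyPercolator()
percolator.register("kuku", {"field1": "value1"})
print(percolator.percolate({"field1": "value1"}))   # ['kuku']
print(percolator.percolate({"field1": "value33"}))  # []
```

The queries are stored up front and each incoming document is checked against them, which is the opposite of storing documents and running a query later.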

If that sounds a little abstract, let's dip our hands into the water and experiment with the elasticsearch percolator. Start by creating an index.

[user@localhost ~]$ curl -XPUT 'localhost:9200/test?pretty'
{
  "ok" : true,
  "acknowledged" : true
}

Then we register a percolator query.
[user@localhost ~]$ curl -XPUT 'localhost:9200/_percolator/test/kuku?pretty' -d '{ "query" : { "term" : { "field1" : "value1" } } }'
{
  "ok" : true,
  "_index" : "_percolator",
  "_type" : "test",
  "_id" : "kuku",
  "_version" : 1
}

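Now we percolate a document against the registered queries. To do that, we append _percolate to the url and wrap the document in a "doc" field. Note that this request only checks the document against the queries; it does not index it.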

[user@localhost ~]$ curl -XGET 'localhost:9200/test/type1/_percolate?pretty' -d '{ "doc" : { "field1" : "value1" } }'
{
  "ok" : true,
  "matches" : [ "kuku" ]
}
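
The same percolate call can be prepared from Python using only the standard library. This is a sketch, assuming the endpoint layout used in this post (`/<index>/<type>/_percolate`) and a node on `localhost:9200`. `build_percolate_request` is a hypothetical helper, the JSON content type header is an addition that the curl call does not send, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_percolate_request(index, doc_type, doc, host="localhost:9200"):
    """Build (but do not send) a percolate request for one document."""
    url = "http://%s/%s/%s/_percolate" % (host, index, doc_type)
    # The document to percolate is wrapped in a "doc" field, as in the curl call.
    body = json.dumps({"doc": doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


request = build_percolate_request("test", "type1", {"field1": "value1"})
print(request.get_method(), request.full_url)
# To send it against a running node: urllib.request.urlopen(request)
```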

So the response lists "kuku", the query we registered, because the document has field1 equal to value1. Percolation can also happen while indexing: add the percolate parameter to the index request, and use an asterisk to run the document against all registered percolator queries.

[user@localhost ~]$ curl -XPUT 'localhost:9200/test/type1/1?percolate=*&pretty' -d ' { "field1" : "value1" }'
{
  "ok" : true,
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 2,
  "matches" : [ "kuku" ]
}

So yes, another match! That's cool! But what if, instead of the asterisk, we pass percolate=color:green, so that only percolator queries tagged with color green are run?

[user@localhost ~]$ curl -XPUT 'localhost:9200/test/type1/1?percolate=color:green&pretty' -d '{ "field1" : "value1", "field2" : "value2" }'
{
  "ok" : true,
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 3,
  "matches" : [ ]
}

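There is no match, even though field1 is still value1: the "kuku" query was registered without a color field, so the color:green filter excludes it. Let's index entirely different content with the asterisk again, to see if it matches the percolator query we set up before.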

[user@localhost ~]$ curl -XPUT 'localhost:9200/test/type1/1?percolate=*&pretty' -d '{ "field1" : "value33", "field2" : "value2" }'
{
  "ok" : true,
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 7,
  "matches" : [ ]
}

So there is no match. This is pretty cool: you can pre-register a few percolator queries, and whenever an incoming document is indexed, any registered queries it matches are listed in the response.

Instead of querying the indexed data afterwards, the percolator tells you which queries match at indexing time. Something very cool.
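
The percolate-while-indexing behaviour we saw, including the color:green filter, can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: the `ToyIndex` class is hypothetical, queries are plain term matches, and the filter is modelled as a check on metadata stored next to each query, which is my reading of what the percolate=color:green parameter does.

```python
class ToyIndex:
    """Toy model: index a document and report which stored queries it matches."""

    def __init__(self):
        self.docs = {}     # doc id -> document
        self.queries = {}  # query id -> (term dict, metadata dict)

    def register(self, query_id, term, **metadata):
        """Store a term query plus optional metadata such as color="green"."""
        self.queries[query_id] = (dict(term), metadata)

    def index(self, doc_id, doc, percolate=None):
        """Store doc; percolate="*" runs every query, a dict filters by metadata."""
        self.docs[doc_id] = doc
        if percolate is None:
            return []
        matches = []
        for query_id in sorted(self.queries):
            term, metadata = self.queries[query_id]
            if percolate != "*" and any(
                metadata.get(key) != value for key, value in percolate.items()
            ):
                continue  # filtered out before the query is even evaluated
            if all(doc.get(field) == value for field, value in term.items()):
                matches.append(query_id)
        return matches


idx = ToyIndex()
idx.register("kuku", {"field1": "value1"})  # no color metadata, as in this post
print(idx.index("1", {"field1": "value1"}, percolate="*"))                 # ['kuku']
print(idx.index("1", {"field1": "value1"}, percolate={"color": "green"}))  # []
print(idx.index("1", {"field1": "value33"}, percolate="*"))                # []
```

In this model, registering a second query with `color="green"` would make it the only candidate for the filtered request, while the asterisk still runs every registered query.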
