A Query that matches documents within a range of terms.
This query matches the documents looking for terms that fall into the supplied range according to Byte.compareTo(Byte). It is not intended for numerical ranges; use NumericRangeQuery instead.
This query uses the MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method.
So this is a byte-by-byte comparison of each term against the two bounds of the range, and because it is byte by byte, the ordering is lexicographic. If you intend to find a range between two numbers, this is not the class you should use. Okay, if this is not clear, let's go into the code, shall we?
As you know, Lucene has two parts: first the indexing (write) part and then the search (query) part. In this article, we are going to index and then query using a term range query. To give you an overview of this article, we have four classes.
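To see why a byte-by-byte comparison gives a lexicographic rather than a numeric order, here is a minimal sketch in plain Java, with no Lucene involved. The class ByteOrderDemo and the method compareBytes are hypothetical names made up for this illustration. The sketch assumes the terms are compared as unsigned UTF-8 bytes, which for plain ASCII file names gives the same order as String.compareTo.

```java
import java.nio.charset.StandardCharsets;

final class ByteOrderDemo {

    // Compares two strings by their UTF-8 bytes, treating each byte as unsigned.
    // Negative result: a sorts before b. Positive: a sorts after b. Zero: equal.
    static int compareBytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff; // the first differing byte decides the order
            }
        }
        return x.length - y.length; // a shorter prefix sorts first
    }

    public static void main(String[] args) {
        // '1' sorts before '2', so record10.txt comes before record2.txt.
        System.out.println(compareBytes("record10.txt", "record2.txt") < 0);  // true
        // '.' sorts before '2', so record2.txt comes before record22.txt.
        System.out.println(compareBytes("record2.txt", "record22.txt") < 0);  // true
        // '2' sorts before '6', so record22.txt comes before record6.txt.
        System.out.println(compareBytes("record22.txt", "record6.txt") < 0);  // true
    }
}
```

In other words, record22.txt sits between record2.txt and record6.txt in this ordering, even though 22 is not between 2 and 6.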
- LuceneConstants - just a settings class for this application.
- Indexer - the class that does the indexing.
- Searcher - the class that does the search.
- LearnTermRangeQuery - our main entry class, which binds the above three classes together.
We create an object, tester, for this learning journey. We then create the index by calling the method createIndex() and then search the index using a term range query.
LearnTermRangeQuery tester;

try {
    tester = new LearnTermRangeQuery();
    tester.createIndex();
    tester.searchUsingTermRangeQuery("record2.txt", "record6.txt");
} catch (Exception e) {

}
In the method createIndex(), I use a lambda, which you can spot by the arrow symbol, so you need to have Java 8 installed. There are two variables, indexDir and dataDir. The variable indexDir is the directory where the created index will reside, whilst dataDir holds the sample data to be indexed. In the class Indexer, the method getDocument() essentially builds the document for each sample file. Nothing fancy, just an ordinary Lucene document with three fields: file name, file path and file content.
Back to the class LearnTermRangeQuery and its method searchUsingTermRangeQuery(). Notice that we search the range with two file names as the bounds. We initialize a Lucene directory object and pass it to the index searcher. Everything else for the Lucene index searcher is just standard. We construct the TermRangeQuery and pass it to the searcher object. The results are then shown and the searcher is eventually closed.
Below is the sample output from the Eclipse console.
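As a mental model only, and not Lucene's actual implementation, a term range behaves like a slice of a sorted set of terms. The sketch below uses a plain Java TreeSet over the sample file names from this article. The class TermRangeModel and the method range are hypothetical names. The flags (lower bound inclusive, upper bound exclusive) are an assumption about how the query was built, chosen because they agree with the six hits in the sample output. It also assumes each file name is indexed as one whole term.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class TermRangeModel {

    // The 16 file names from the sample data, in the order they were indexed.
    static final List<String> SAMPLE = Arrays.asList(
        "record 21.txt", "record 33 .txt", "record10.txt", "record7.txt",
        "record6.txt", "record9.txt", "record33.txt", "record2.txt",
        "record5.txt", "record 33.txt", "record3.txt", "record8.txt",
        "record2.1.txt", "record1.txt", "record4.txt", "record22.txt");

    // Returns the slice of the sorted terms between lower and upper.
    // The two flags decide whether each bound itself is part of the result.
    static NavigableSet<String> range(NavigableSet<String> terms, String lower, String upper,
                                      boolean includeLower, boolean includeUpper) {
        return terms.subSet(lower, includeLower, upper, includeUpper);
    }

    public static void main(String[] args) {
        // TreeSet sorts strings lexicographically, like a term dictionary would
        // for these ASCII names.
        NavigableSet<String> terms = new TreeSet<>(SAMPLE);
        System.out.println(range(terms, "record2.txt", "record6.txt", true, false));
        // prints [record2.txt, record22.txt, record3.txt, record33.txt, record4.txt, record5.txt]
    }
}
```

Flipping includeUpper to true would add record6.txt to the slice, so the two flags are worth checking whenever a bound seems to be missing from the hits.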
record 21.txt
src/resources/samples.termrange/record 21.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record 21.txt
record 33 .txt
src/resources/samples.termrange/record 33 .txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record 33 .txt
record10.txt
src/resources/samples.termrange/record10.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record10.txt
record7.txt
src/resources/samples.termrange/record7.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record7.txt
record6.txt
src/resources/samples.termrange/record6.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record6.txt
record9.txt
src/resources/samples.termrange/record9.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record9.txt
record33.txt
src/resources/samples.termrange/record33.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record33.txt
record2.txt
src/resources/samples.termrange/record2.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record2.txt
record5.txt
src/resources/samples.termrange/record5.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record5.txt
record 33.txt
src/resources/samples.termrange/record 33.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record 33.txt
record3.txt
src/resources/samples.termrange/record3.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record3.txt
record8.txt
src/resources/samples.termrange/record8.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record8.txt
record2.1.txt
src/resources/samples.termrange/record2.1.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record2.1.txt
record1.txt
src/resources/samples.termrange/record1.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record1.txt
record4.txt
src/resources/samples.termrange/record4.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record4.txt
record22.txt
src/resources/samples.termrange/record22.txt
Indexing /home/user/eclipse/test/src/resources/samples.termrange/record22.txt
16 File indexed, time taken: 800 ms
6 documents found. Time :74ms
File : /home/user/eclipse/test/src/resources/samples.termrange/record33.txt
File : /home/user/eclipse/test/src/resources/samples.termrange/record2.txt
File : /home/user/eclipse/test/src/resources/samples.termrange/record5.txt
File : /home/user/eclipse/test/src/resources/samples.termrange/record3.txt
File : /home/user/eclipse/test/src/resources/samples.termrange/record4.txt
File : /home/user/eclipse/test/src/resources/samples.termrange/record22.txt
As you can see above, the results are not correct if you read the file names numerically, from record2.txt to record6.txt: record22.txt and record33.txt are matched because they sort between the two bounds lexicographically. So always experiment with a few values before you implement. Hehe, have fun! You can get the source for this code at my github.
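For comparison, here is a rough sketch of what a numeric reading of the same range would look like. It is plain Java, not Lucene's NumericRangeQuery, and the class NumericNameRange, the method inNumericRange and the pattern record<digits>.txt are all assumptions made for this illustration. Names that do not fit the pattern, such as the ones containing spaces, are simply skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericNameRange {

    // The 16 file names from the sample data, in the order they were indexed.
    static final List<String> SAMPLE = Arrays.asList(
        "record 21.txt", "record 33 .txt", "record10.txt", "record7.txt",
        "record6.txt", "record9.txt", "record33.txt", "record2.txt",
        "record5.txt", "record 33.txt", "record3.txt", "record8.txt",
        "record2.1.txt", "record1.txt", "record4.txt", "record22.txt");

    // Assumed naming scheme: "record", then digits only, then ".txt".
    private static final Pattern NAME = Pattern.compile("record(\\d+)\\.txt");

    // Keeps the names whose number lies in [low, high], both bounds inclusive.
    static List<String> inNumericRange(List<String> names, int low, int high) {
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            Matcher m = NAME.matcher(name);
            if (!m.matches()) {
                continue; // not of the form record<digits>.txt
            }
            int n = Integer.parseInt(m.group(1));
            if (n >= low && n <= high) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(inNumericRange(SAMPLE, 2, 6));
        // prints [record6.txt, record2.txt, record5.txt, record3.txt, record4.txt]
    }
}
```

This time record22.txt and record33.txt are left out, because the comparison is on the parsed number and not on the bytes of the name. In Lucene itself you would index the number in a numeric field and query that field with NumericRangeQuery, as the documentation quoted at the top suggests.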