Thursday, March 3, 2016

Dive into QueryParser (1) - Hacking into SOLR

This series differs from other tutorials. Usually a tutorial teaches you how to write a simple QueryParser and then explains it. That approach has some drawbacks:
  • The example is far from the complex QueryParser that you will actually have to write.
  • It does not cover every aspect of a QueryParser.
  • I find that I can't remember anything afterwards; whenever I have to write another QueryParser, I must go back to the tutorial again and again.
So we will look into Solr with a top-down approach: solrconfig.xml -> SearchHandler -> SearchComponent -> QueryParser. This should give you a deep understanding of Solr's search flow.
Note: you can find any file mentioned in this article in IntelliJ IDEA by pressing Shift twice and typing the name of the file.

Understanding the search flow of Solr

Here is a sample SearchHandler configuration from solrconfig.xml.
<requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
    </lst>
</requestHandler>
SearchHandler is a RequestHandler that handles search requests from clients; in the config above it is registered at “/select”. Let's look at its source code (I love reading source code: it is self-explanatory, it is always up to date, and it can be debugged).
[Code listing: SearchHandler.java]
The highlighted method is where your request is handled. The request passes through a number of SearchComponents, like this (the image is from Solr in Action).
[Figure: SearchHandler]
SearchComponents:
  • QueryComponent is where Solr finds the relevant documents corresponding to your query.
  • FacetComponent is where Solr runs faceting on the results.
The great thing about Solr is that you can replace the components above with your own, or add your own components anywhere in this chain. Let's look at the source code of SearchHandler one more time. Here is the simplified code of the handleRequestBody method in non-distributed mode (some parts are hidden as ...).
[Code listing: SearchHandler.java]
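As a rough illustration of that flow, here is a minimal sketch in plain Java. The names ComponentChainSketch, RequestState, Component, named and handle are hypothetical stand-ins, not Solr's real classes; the only thing the sketch tries to mirror is the order of calls in non-distributed mode: prepare runs on every component first, then process runs on every component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration only; this is NOT the Solr API.
final class ComponentChainSketch {

    /** Per-request state shared by all components (plays the role of ResponseBuilder). */
    static final class RequestState {
        final Map<String, String> params;
        final List<String> trace = new ArrayList<>();

        RequestState(Map<String, String> params) {
            this.params = params;
        }
    }

    /** Plays the role of SearchComponent: two phases per request. */
    interface Component {
        void prepare(RequestState rb);
        void process(RequestState rb);
    }

    /** Builds a component that only records when each of its phases runs. */
    static Component named(String name) {
        return new Component() {
            @Override public void prepare(RequestState rb) { rb.trace.add(name + ".prepare"); }
            @Override public void process(RequestState rb) { rb.trace.add(name + ".process"); }
        };
    }

    /** Non-distributed flow: prepare all components, then process all components. */
    static List<String> handle(List<Component> components, Map<String, String> params) {
        RequestState rb = new RequestState(params);
        for (Component c : components) {
            c.prepare(rb);
        }
        for (Component c : components) {
            c.process(rb);
        }
        return rb.trace;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("q", "title:solr");
        List<Component> chain = Arrays.asList(named("query"), named("facet"));
        // prints [query.prepare, facet.prepare, query.process, facet.process]
        System.out.println(handle(chain, params));
    }
}
```

Because every component sees the same RequestState, whatever one component stores during prepare is visible to the others later in the chain.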
ResponseBuilder rb holds references to almost every other object of the request. The code above is quite self-explanatory; we only have to care about the prepare and process methods of the SearchComponents. Here is the QueryComponent.prepare(ResponseBuilder) method.
[Code listing: QueryComponent.prepare(ResponseBuilder)]
So in QueryComponent.prepare(ResponseBuilder):
  • (1): it gets the defType from the request (defType is the name of the QueryParser); the default defType is lucene.
  • (2): it gets the actual QueryParser object for that defType and converts the query string into a Lucene query object -> this is the role of a QueryParser in Solr.
  • (3): it sets the query object on the ResponseBuilder so that later steps/components can retrieve it.
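The three steps above can be sketched in plain Java as follows. This is only a model under stated assumptions: PrepareSketch, RequestState and the PARSERS registry are hypothetical names, and the "query object" is just a string standing in for a Lucene Query. It does not reproduce the real QueryComponent code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified model of the three steps; this is NOT the real QueryComponent.
final class PrepareSketch {

    static final String DEFAULT_DEF_TYPE = "lucene";

    /** Plays the role of ResponseBuilder: request params plus the parsed query. */
    static final class RequestState {
        final Map<String, String> params;
        String query; // stand-in for a Lucene Query object

        RequestState(Map<String, String> params) {
            this.params = params;
        }
    }

    /** Hypothetical registry: parser name -> function turning a query string into a "query". */
    static final Map<String, Function<String, String>> PARSERS = new HashMap<>();
    static {
        PARSERS.put("lucene", q -> "LuceneSyntaxQuery[" + q + "]");
        PARSERS.put("raw", q -> "TermQuery[" + q + "]");
    }

    static void prepare(RequestState rb) {
        // (1) read defType from the request, falling back to the default parser name
        String defType = rb.params.getOrDefault("defType", DEFAULT_DEF_TYPE);

        // (2) look up the parser for that name and convert the query string
        Function<String, String> parser = PARSERS.get(defType);
        if (parser == null) {
            throw new IllegalArgumentException("Unknown query parser: " + defType);
        }
        String query = parser.apply(rb.params.get("q"));

        // (3) store the result so later steps/components can retrieve it
        rb.query = query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("q", "hello");
        RequestState rb = new RequestState(params);
        prepare(rb);
        System.out.println(rb.query); // prints LuceneSyntaxQuery[hello]
    }
}
```

Adding defType=raw to the params in this sketch would select the other registry entry, which is the same switch that defType performs between parsers in a real request.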

Why do we need to write a QueryParser?

The role of a QueryParser is to convert the request into a Lucene query object. So why would we need to write one? Here are some reasons:
  • When you have written a custom Lucene Query and need a way to build it from a request.
  • When you want to hide long Solr parameters behind something shorter.
  • When none of Solr's built-in QueryParsers suits your application.

RawQParser explained

I will use RawQParserPlugin as an example because of its simplicity.
[Code listing: RawQParserPlugin]
QParserPlugin is the first class you have to write. It acts as a factory that creates QParser objects; Solr uses the Factory pattern a lot in its source. Because the plugin creates a new QParser for each request, you can feel free to write non-thread-safe code inside your QParser. Inside the parse() method, we create a new TermQuery, where (1) is the field name and (2) is the term.
I hope this post lets you read the Solr source with confidence and understand Solr's search flow. In the next part we will return to QueryParser in a more practical way: we will write a custom QParser in a new project, then build it, test it, and plug it into Solr.
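The factory idea can be sketched in plain Java as below. All names here (RawParserSketch, Parser, RawParserPlugin, and this TermQuery) are hypothetical stand-ins rather than the Solr or Lucene classes. I also assume that the field name arrives in a local parameter called f, following the {!raw f=myfield}Foo Bar syntax of the raw parser, and that the whole query string is used as the term.

```java
import java.util.Collections;
import java.util.Map;

// Simplified stand-ins for illustration only; NOT the real Solr/Lucene classes.
final class RawParserSketch {

    /** Stand-in for Lucene's TermQuery: one field, one exact term. */
    static final class TermQuery {
        final String field;
        final String text;

        TermQuery(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override public String toString() {
            return field + ":" + text;
        }
    }

    /** Stand-in for QParser: holds the state of a single request. */
    abstract static class Parser {
        protected final String qstr;
        protected final Map<String, String> localParams;

        Parser(String qstr, Map<String, String> localParams) {
            this.qstr = qstr;
            this.localParams = localParams;
        }

        abstract TermQuery parse();
    }

    /** Stand-in for QParserPlugin: a shared factory that creates one Parser per request. */
    static final class RawParserPlugin {
        Parser createParser(String queryString, Map<String, String> locals) {
            return new Parser(queryString, locals) {
                @Override TermQuery parse() {
                    // (1) field name from the local param "f", (2) the raw query string as the term
                    return new TermQuery(localParams.get("f"), qstr);
                }
            };
        }
    }

    public static void main(String[] args) {
        RawParserPlugin plugin = new RawParserPlugin();
        Parser parser = plugin.createParser("Foo Bar", Collections.singletonMap("f", "myfield"));
        System.out.println(parser.parse()); // prints myfield:Foo Bar
    }
}
```

Only the plugin object is shared between requests in this sketch; each Parser instance belongs to exactly one request, which is why per-request state inside it is safe.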
