Thursday, March 3, 2016

Dive into QueryParser (1) - Hacking into SOLR

This series is different from other tutorials. Usually a tutorial teaches you how to write a simple QueryParser and explains it. But that approach has some drawbacks:
  • It is far from the complex QueryParser that you actually have to write.
  • It does not cover every aspect of a QueryParser.
  • I find that I can't remember anything afterwards; when I have to write another QueryParser, I must go back to the tutorial again and again.
So we will look into Solr with a top-down approach: solrconfig.xml -> SearchHandler -> SearchComponent -> QueryParser. This will give you a deep understanding of Solr's search flow.
Note: for any file mentioned in this article, you can find it in IntelliJ IDEA by pressing Shift twice and typing the name of the file.

Understanding the search flow of Solr

This is a sample SearchHandler configuration in solrconfig.xml.
<requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
    </lst>
</requestHandler>
SearchHandler is a RequestHandler that handles search requests from clients and is registered at “/select”. Let's look at its source code (I love reading source code: it is self-explanatory, it is always up to date, and it can be debugged).
SearchHandler.java
The highlighted method is where your request is handled. The request goes through a number of SearchComponents, like this (image from Solr in Action).
SearchHandler
SearchComponents :
  • QueryComponent is where Solr finds the relevant documents corresponding to your query.
  • FacetComponent is where Solr runs faceting on the result.
The great thing about Solr is that you can replace the components above with your own, and you can add your own components anywhere in this chain. Let's look at the source code of SearchHandler one more time. Here is a simplified version of the handleRequestBody method in non-distributed mode (some parts are hidden behind ...).
SearchHandler.java
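As a rough illustration of the two-phase flow in handleRequestBody, here is a minimal, self-contained sketch. It is not Solr's code: SketchComponent and SketchResponseBuilder are hypothetical stand-ins for SearchComponent and ResponseBuilder, and the only thing they do is record the order in which they are called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Solr's ResponseBuilder: shared state for one request.
class SketchResponseBuilder {
    final List<String> trace = new ArrayList<>();
}

// Hypothetical stand-in for Solr's SearchComponent, reduced to its two phases.
class SketchComponent {
    final String name;

    SketchComponent(String name) {
        this.name = name;
    }

    void prepare(SketchResponseBuilder rb) {
        rb.trace.add(name + ".prepare");
    }

    void process(SketchResponseBuilder rb) {
        rb.trace.add(name + ".process");
    }
}

class ComponentChainSketch {
    // Assumed shape of the non-distributed flow: every component prepares first,
    // then every component processes, all sharing the same response builder.
    static List<String> handleRequestBody(List<SketchComponent> components) {
        SketchResponseBuilder rb = new SketchResponseBuilder();
        for (SketchComponent c : components) {
            c.prepare(rb);
        }
        for (SketchComponent c : components) {
            c.process(rb);
        }
        return rb.trace;
    }

    public static void main(String[] args) {
        List<SketchComponent> chain = Arrays.asList(
                new SketchComponent("query"),
                new SketchComponent("facet"));
        // prints [query.prepare, facet.prepare, query.process, facet.process]
        System.out.println(handleRequestBody(chain));
    }
}
```

The point of the sketch is the ordering: all prepare calls finish before any process call starts, which is why a component can rely on whatever an earlier component stored in the shared builder.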
ResponseBuilder rb holds references to almost every other object of the request. The code above is fairly self-explanatory; we only have to care about the prepare and process methods of the SearchComponents. Here is the QueryComponent.prepare(ResponseBuilder) method.
SearchComponent
So in QueryComponent.prepare(ResponseBuilder):
  • (1) : it gets the defType from the request (defType is the name of the QueryParser); the default defType is lucene.
  • (2) : it gets the actual QueryParser object for that defType and converts the query string to a Lucene query object -> this is the role of a QueryParser in Solr.
  • (3) : it sets the query object on the ResponseBuilder so that later steps/components can retrieve it.
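The three steps above can be modelled in a few lines. This is only a sketch under stated assumptions: the parser registry, the string "query objects" and the map used as a response builder are all hypothetical simplifications, not Solr's API. In Solr itself the corresponding work is done, roughly, through QParser.getParser(...), QParser.getQuery() and ResponseBuilder.setQuery(...).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class PrepareSketch {
    // Hypothetical registry: parser name -> function that turns a query string
    // into a "query object" (a plain string here, a Lucene Query in Solr).
    static final Map<String, Function<String, String>> PARSERS = new HashMap<>();

    static {
        PARSERS.put("lucene", q -> "LuceneQuery(" + q + ")");
        PARSERS.put("raw", q -> "TermQuery(" + q + ")");
    }

    static final String DEFAULT_TYPE = "lucene";

    // The returned map stands in for ResponseBuilder.
    static Map<String, Object> prepare(Map<String, String> params) {
        Map<String, Object> rb = new HashMap<>();

        // (1) read defType from the request, falling back to the default parser
        String defType = params.getOrDefault("defType", DEFAULT_TYPE);

        // (2) look up the parser and convert the query string to a query object
        Function<String, String> parser = PARSERS.get(defType);
        if (parser == null) {
            throw new IllegalArgumentException("Unknown query parser: " + defType);
        }
        Object query = parser.apply(params.get("q"));

        // (3) store the query so later steps/components can retrieve it
        rb.put("query", query);
        return rb;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("q", "solr");
        // prints LuceneQuery(solr)
        System.out.println(prepare(params).get("query"));
    }
}
```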

Why do we need to write a QueryParser?

The role of a QueryParser is to convert the request into a Lucene query object. So why would we need to write one? Here are some reasons:
  • When you write a custom Lucene Query.
  • When you want to hide long Solr parameters behind a simpler query.
  • When you find that none of Solr's built-in QueryParsers suits your application.

RawQParser explained

I will use RawQParserPlugin as an example because of its simplicity.
SearchComponent
QParserPlugin is the first class you have to write. It acts as a factory that creates QParser objects; Solr uses the Factory pattern a lot in its source. Because the plugin creates a new QParser each time it is asked for one, you can feel free to write non-thread-safe code inside your QParser. Inside the parse() method, we create a new TermQuery, where (1) is the field name and (2) is the term.
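To make the factory relationship concrete, here is a small self-contained model of it. The classes below are hypothetical stand-ins (named Sketch...) and not Solr or Lucene classes; the assumption that the field comes from a local parameter named "f" and the term from the query string mirrors how the raw parser is normally invoked, for example {!raw f=title}Foo Bar.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's TermQuery: one field, one exact term.
class SketchTermQuery {
    final String field;
    final String text;

    SketchTermQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public String toString() {
        return field + ":" + text;
    }
}

// Hypothetical stand-in for QParser: holds the state of a single parse.
abstract class SketchQParser {
    final String qstr;
    final Map<String, String> localParams;

    SketchQParser(String qstr, Map<String, String> localParams) {
        this.qstr = qstr;
        this.localParams = localParams;
    }

    abstract SketchTermQuery parse();
}

// Hypothetical stand-in for QParserPlugin: the factory that creates parsers.
abstract class SketchQParserPlugin {
    abstract SketchQParser createParser(String qstr, Map<String, String> localParams);
}

class SketchRawQParserPlugin extends SketchQParserPlugin {
    @Override
    SketchQParser createParser(String qstr, Map<String, String> localParams) {
        // A fresh parser per call, so the parser itself may keep mutable state.
        return new SketchQParser(qstr, localParams) {
            @Override
            SketchTermQuery parse() {
                // (1) the field name, (2) the term, taken as-is with no analysis
                return new SketchTermQuery(this.localParams.get("f"), this.qstr);
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> localParams = new HashMap<>();
        localParams.put("f", "title");
        SketchQParser parser = new SketchRawQParserPlugin().createParser("Foo Bar", localParams);
        // prints title:Foo Bar
        System.out.println(parser.parse());
    }
}
```

The plugin object is shared and long-lived, while each parser is created for one query and then discarded; that split is what keeps per-query state out of the shared object.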
I hope that after this post you can confidently read the Solr source and understand Solr's search flow. In the next part we will come back to QueryParser in a more practical way: we will write a custom QParser in a new project, build it, test it, and plug it into Solr.

Thursday, February 25, 2016

Setting up the environment - Hacking into SOLR

Introduction

Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene. But the greatest strength of Solr is its customisability. This series will help you become familiar with Solr's source, write plugins, and hack into any part of Solr/Lucene.

Setting up the IDE

Download the source
> wget http://www.us.apache.org/dist/lucene/solr/5.5.0/solr-5.5.0-src.tgz
> tar -xvf solr-5.5.0-src.tgz
Solr uses ant as its build tool, so if you want to understand the following steps in depth and customise the Solr build process, I recommend reading a book about ant.
> ant -p
Buildfile: /Users/caomanhdat/workspace/lucene-solr/build.xml

Main targets:
 ...
 idea                          Setup IntelliJ IDEA configuration
 ...
> ant idea
First, we list all the ant targets that come with Solr, then run ant idea to generate all the configuration files IntelliJ IDEA needs. After that, we can open the Solr source as a normal IntelliJ project.
Opening the Solr project in IntelliJ IDEA

Solr project structure

Solr source folders
The most important folders in the Solr source are:
  • core : Solr core code.
  • contrib : contribution modules, such as dataimporthandler, velocity, etc.
  • solrj : Java client for accessing Solr.
Note that Solr/Lucene is a very well-tested project. So when I find a Solr feature that is hard to understand, I just go to the corresponding test class and everything becomes much clearer.
For example: you can run TestJsonRequest.testLocalJsonRequest and place a breakpoint in SearchHandler.handleRequestBody() to understand Solr's search flow.
A debug screen along with variables window
I hope that after this post you can open Solr in your favourite IDE and run some unit tests to become familiar with the Solr code. In the next part we will examine the structure of a QueryParser and then write a custom one.