Skip to main content

Importing/Indexing database (MySQL or SQL Server) in Solr using Data Import Handler

Install Solr

download and install Solr from http://lucene.apache.org/solr/.
you can access Solr admin from your browser: http://localhost:8983/solr/
use the port number used in installation.

MySQL connector

Download JDBC driver for MySQL from http://dev.mysql.com/downloads/connector/j/.
Copy file from the downloaded archive 'mysql-connector-java-*.jar' to the folder 'contrib/dataimporthandler/lib' in the folder where Solr was installed. Create 'lib' folder if needed.

MS SQL Server connector

Download Microsoft JDBC Driver 4.0 for SQL Server from: http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774
copy file 'sqljdbc4.jar' to 'contrib/dataimporthandler/lib'

Setup a new collection

create a new folder for a new collection - 'myproducts'. The collection will be located in '/solr/myproducts' folder. Create folders conf and data in the collection folder:
  • /solr/myproducts/conf
  • /solr/myproducts/data

solrconfig.xml

copy solrconfig.xml from an existing collection. Find my version of solrconfig.xml below in this gist.
edit solrconfig.xml by adding:
<lib dir="../../contrib/dataimporthandler/lib" regex=".*\.jar" />
<lib dir="../../dist/" regex="solr-dataimporthandler-.*\.jar" />
Make sure that 'dist' folder contains two files for data import handler:
  • solr-dataimporthandler-4.10.2.jar
  • solr-dataimporthandler-extras-4.10.2.jar
add these lines to solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
    <str name="config">data-config.xml</str>
    </lst>
</requestHandler>

data-config.xml for MySQL database

the file 'data-config.xml' will define data we want to import/index from our datasource. Assuming that our DB named mydb1 and we have table products with columns id, name and updated_at. Column 'updated_at' of datetime type stores the date of last modification of the row. This column will be used in incremental import to track rows modified since the last import into Solr.
# define data source
<dataConfig>
<dataSource type="JdbcDataSource" 
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/mydb1" 
            user="root" 
            password=""/>
<document>
  <entity name="product"  
    pk="id"
    query="select id,name from products"
    deltaImportQuery="SELECT id,name from products WHERE id='${dih.delta.id}'"
    deltaQuery="SELECT id FROM products  WHERE updated_at > '${dih.last_index_time}'"
    >
     <field column="id" name="id"/>
     <field column="name" name="name"/>       
  </entity>
</document>
</dataConfig>
  • The 'query' gives the data needed to populate fields of the Solr document in full-import
  • The 'deltaImportQuery' gives the data needed to populate fields when running a delta-import
  • The 'deltaQuery' gives the primary keys of the current entity which have changes since the last index time
Full-import command uses the "query" query, delta-import command uses the delta components.

data-config.xml for SQL Server database

<dataConfig>
  <dataSource type="JdbcDataSource" 
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" 
              url="jdbc:sqlserver://servername\instancename;databaseName=mydb"   
              user="sa" 
              password="mypass"/>
  <document>
    <entity name="product"  
      pk="id"
      query="select id,name from products"
      deltaImportQuery="SELECT id,name from products WHERE id='${dih.delta.id}'"
      deltaQuery="SELECT id FROM products  WHERE updated_at > '${dih.last_index_time}'"
      >
       <field column="id" name="id"/>
       <field column="name" name="name"/>       
    </entity>
  </document>
</dataConfig>

schema.xml

edit file 'schema.xml' accordingly to fields defined in data-import.xml:
<schema name="example" version="1.5">
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
    <field name="name" type="text_general" indexed="true" stored="true"/>
    ...

add collection to solr

use admin interface in your browser to add a new collection. Add core:
  • name: myproducts
  • instanceDir: myproducts

Perform full or incremental import

After successfully adding a collection to Solr you can select it and run dataimport commands:
  • full-import - use URL: http://localhost:8983/solr/myproducts/dataimport?command=full-import
  • delta-import - use URL: http://localhost:8983/solr/myproducts/dataimport?command=delta-import
The full import loads all data every time, while incremental import means only adding the data that changed since the last indexing. By default, full import starts with removal the existing index (parameter clean=true).
Note! Use clean=false while running delta-import command.
debug=true - The debug mode limits the number of rows to 10 by default and it also forces indexing to be synchronous with the request.

Comments

Popular posts from this blog

HTML Emailing in ZF2

<?php namespace Application\Model; use Zend\Mail\Message; use Zend\Mime\Message as MimeMessage; use Zend\Mime\Part as MimePart; use Zend\Mail\Transport\Sendmail as SendmailTransport; use Zend\Mail\Transport\Smtp as SmtpTransport; use Zend\Mail\Transport\SmtpOptions; use Zend\View\Renderer\PhpRenderer; use Zend\View\Resolver\TemplatePathStack; use Zend\View\Model\ViewModel; use Zend\View\Model\ModelInterface; class ApiEmail extends ApiInterface {    protected $From = "XXXXX@gmail.com";    protected $FromName = "XXXX";    protected $To = "" ;    protected $Subject = "" ;    protected $Body = "" ;    protected $Cc = "XXXXXX@gmail.com";    protected $Bcc = "XXXXX@gmail.com" ;    protected $ReplyTo = "XXXX@gmail.com" ;    protected $Sender = "" ;    protected $Encoding = "UTF-8" ;    protected $MESSAGE ; // getter setter ...

How to Set Up a Home Smart Office – A Step-by-Step Guide

  By Luna Lush In today's world, working from home has become the norm for many, making it essential to have a smart, efficient, and comfortable home office. With the right tech tools, you can transform any space into a productivity hub. In this tutorial, we'll walk you through setting up a home smart office that maximizes efficiency, comfort, and connectivity. Step 1: Choose the Right Space Before diving into tech, pick the ideal spot in your home. Whether it's a dedicated room or a cozy corner, ensure the area is quiet, well-lit, and free from distractions. Good lighting is crucial for video calls and reduces eye strain during long hours of work. Step 2: Invest in a Smart Desk Setup A smart desk setup is the foundation of your home office. Consider the following: Adjustable Standing Desk: Invest in a desk that allows you to switch between sitting and standing. Many options come with programmable settings to easily adjust the height. Ergonomic Chair: A chair that support...

Grouped CheckboxSelectMultiple in Django template

from itertools import groupby from django import forms from django . forms . models import ModelChoiceIterator , ModelMultipleChoiceField from . models import Feature , Widget class GroupedModelMultipleChoiceField ( ModelMultipleChoiceField ): def __init__ ( self , group_by_field , group_label = None , * args , ** kwargs ): """ ``group_by_field`` is the name of a field on the model ``group_label`` is a function to return a label for each choice group """ super ( GroupedModelMultipleChoiceField , self ). __init__ (* args , ** kwargs ) self . group_by_field = group_by_field if group_label is None : self . group_label = lambda group : group else : self . group_label = group_label def _get_choices ( self ): if hasattr ( self , '_choices' ): return self . _choices return GroupedModelCho...