The Coder's Way

Geospatial Search So Easy with Solr

For WeatherNation, I had to build an AJAX search box that lets users switch their current forecast zone. The requirements included searching cities, places, ICAO codes, ZIP codes, and forecast zones. I also wanted to support additional ranking signals: for example, if the user's GeoIP location was in Colorado, I wanted Denver, CO to rank above Denver, TN. This sounds like a great job for Lucene, or, to make it easier, Solr. Solr is a search server built on top of Lucene that exposes it through a very easy RESTful HTTP interface. One of the hardest parts is setting up Solr, so let's do that now and get it out of the way.

$ cd /tmp/
$ wget http://apache.mesi.com.ar/lucene/solr/3.2.0/apache-solr-3.2.0.zip
$ unzip apache-solr-3.2.0.zip

$ mkdir /tmp/playground
$ cd /tmp/playground
$ cp /tmp/apache-solr-3.2.0/example/start.jar .
$ cp -rp /tmp/apache-solr-3.2.0/example/etc .
$ cp -rp /tmp/apache-solr-3.2.0/example/lib .
$ cp -rp /tmp/apache-solr-3.2.0/example/webapps .

$ mkdir -p data/weather/conf

At this point we have set up the most basic Solr install. We are going to use start.jar, which hosts the service in Jetty, and we have copied all the basic files needed to run Solr. Next, create the configuration files: solr.xml, which declares the weather core, plus that core's solrconfig.xml and schema.xml. You can copy the versions below as-is.

# /tmp/playground/data/solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="weather" instanceDir="weather" />
  </cores>
</solr>

# /tmp/playground/data/weather/conf/solrconfig.xml
<?xml version="1.0" encoding="UTF-8" ?>
<config>
 <luceneMatchVersion>LUCENE_32</luceneMatchVersion>
 <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

 <indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>10</mergeFactor>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <maxFieldLength>10000</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>10000</commitLockTimeout>
  <lockType>native</lockType>
 </indexDefaults>

 <mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
  <unlockOnStartup>false</unlockOnStartup>
  <reopenReaders>true</reopenReaders>
  <infoStream file="INFOSTREAM.txt">false</infoStream> 
 </mainIndex>
  
 <updateHandler class="solr.DirectUpdateHandler2" />

 <requestDispatcher handleSelect="true" >
  <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
 </requestDispatcher>
  
 <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
 <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />

 <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" />
 <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" startup="lazy" />

 <requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
   <str name="echoParams">explicit</str>
   <str name="defType">edismax</str>
   <str name="qf">
    name^1.8 locid^2.0 state^1.0
   </str>
   <str name="q.alt">*:*</str>
  </lst>
 </requestHandler>
 <admin>
  <defaultQuery>solr</defaultQuery>
 </admin>
</config>
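The defaults list in the standard handler only supplies values that a request leaves out, which is why a bare q=Denver still runs through edismax with the qf boosts. Here is a rough Python sketch of that merge, run against the handler section of the solrconfig.xml above. The helpers handler_defaults and effective_params are hypothetical names, and the sketch ignores Solr's separate appends and invariants lists.

```python
import xml.etree.ElementTree as ET

# The "standard" handler from solrconfig.xml above
HANDLER_XML = """<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <str name="qf">
      name^1.8 locid^2.0 state^1.0
    </str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>"""


def handler_defaults(xml_text):
    """Read the <lst name="defaults"> entries into a dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    lst = root.find("lst[@name='defaults']")
    # Strip surrounding whitespace, e.g. the newlines around the qf value
    return {s.get("name"): s.text.strip() for s in lst.findall("str")}


def effective_params(defaults, request_params):
    """Request parameters win; defaults fill in anything the request omits."""
    merged = dict(defaults)
    merged.update(request_params)
    return merged


params = effective_params(handler_defaults(HANDLER_XML), {"q": "Denver"})
print(params["defType"], params["qf"])
```

Passing defType explicitly in the URL would override the edismax default in the same way.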

# /tmp/playground/data/weather/conf/schema.xml
<?xml version="1.0" ?>
<schema name="weather lookup index" version="1.3">
<types>
 <fieldType name="integer" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
 <fieldType name="string"  class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
 <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
 <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
 </fieldType>
</types>

<fields>
 <field name="name" type="text" indexed="true" stored="true" multiValued="false" required="true"/>
 <field name="state" type="text" indexed="true" stored="true" multiValued="false" /> 
 <field name="locid" type="text" indexed="true" stored="true" multiValued="false" /> 
 <field name="loc" type="string" indexed="false" stored="true" multiValued="false" /> 
 <field name="type" type="string" indexed="false" stored="true" multiValued="false" /> 
 <field name="rank" type="integer" indexed="true" stored="true" multiValued="false" /> 
 <field name="geoloc" type="location" indexed="true" stored="true"/>
 <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>
</fields>

<defaultSearchField>name</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
</schema>
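The text field type is what lets a query for Denver match names such as DENVER/ARTCC or "denver / stapleton international, co.". Whitespace tokenization runs first, then WordDelimiterFilter splits on punctuation, lowercase-to-uppercase changes, and letter/digit boundaries, and finally everything is lowercased. On the index side, catenateWords="1" and catenateNumbers="1" also add joined-up forms. Below is a simplified Python approximation of that chain, not Lucene's implementation. It ignores token positions, preserveOriginal, possessive stemming, and non-ASCII letters, and analyze is a hypothetical name.

```python
import re


def _word_parts(token):
    """Roughly split one whitespace token the way WordDelimiterFilter does."""
    parts = []
    # Non-alphanumeric characters are delimiters; letter/digit boundaries also split
    for run in re.findall(r"[A-Za-z]+|[0-9]+", token):
        # splitOnCaseChange="1": split at lowercase -> uppercase transitions
        parts.extend(re.split(r"(?<=[a-z])(?=[A-Z])", run))
    return parts


def _catenations(parts, catenate_words, catenate_numbers):
    """Join maximal runs of consecutive word parts or number parts."""
    extra = []
    run = []
    for part in parts + [None]:  # None flushes the final run
        same_kind = run and part is not None and part.isdigit() == run[0].isdigit()
        if same_kind:
            run.append(part)
            continue
        if len(run) > 1:
            is_number = run[0].isdigit()
            if (is_number and catenate_numbers) or (not is_number and catenate_words):
                extra.append("".join(run))
        run = [part] if part is not None else []
    return extra


def analyze(text, index=True):
    """Approximate the 'text' fieldType: the index side catenates, the query side does not."""
    tokens = []
    for token in text.split():  # WhitespaceTokenizer
        parts = _word_parts(token)
        tokens.extend(parts)
        tokens.extend(_catenations(parts, catenate_words=index, catenate_numbers=index))
    return [t.lower() for t in tokens]  # LowerCaseFilter


print(analyze("DENVER/ARTCC"))          # index-time tokens
print(analyze("Denver", index=False))   # query-time tokens
```

Both sides produce the token denver, so the query matches. The extra denverartcc token exists only in the index.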

Now that the configuration is in place, let's start Solr and test it.

$ cd /tmp/playground
$ java -Dsolr.solr.home=data -jar start.jar

Now fire up your favorite browser (Chrome is just a suggestion ;) and go to http://localhost:8983/solr, the admin interface for the Solr service. With Solr running, we can load data into the index. It's easy as pie: we can just import a CSV file to be indexed. Create the file /tmp/city.csv shown below, then use curl to POST city.csv to Solr's RESTful CSV update handler so it loads and indexes the file. This loads the cities along with their latitude, longitude, ZIP codes, and so on.

# /tmp/city.csv
name,state,locid,loc,type,rank,geoloc
"woodland park",NJ,,"40.8892,-74.195",PLACE,4,"40.8892,-74.195"
Denver,NY,12421,"42.27,-74.53",ZIP,2,"42.27,-74.53"
Denver,PA,17517,"40.24,-76.13",ZIP,2,"40.24,-76.13"
Denver,NC,28037,"35.51,-81.01",ZIP,2,"35.51,-81.01"
Denver,IN,46926,"40.89,-86.05",ZIP,2,"40.89,-86.05"
Denver,IA,50622,"42.66,-92.33",ZIP,2,"42.66,-92.33"
Denver,MO,64441,"40.42,-94.29",ZIP,2,"40.42,-94.29"
"Denver City",TX,79323,"32.96,-102.82",ZIP,2,"32.96,-102.82"
Denver,CO,80201,"39.69,-105.08",ZIP,2,"39.69,-105.08"
Denver,CO,80202,"39.74,-104.99",ZIP,2,"39.74,-104.99"
"denver, centennial airport",CO,KAPA,"39.56389,-104.8483",ICAO,9,"39.56389,-104.8483"
"denver, denver international airport",CO,KDEN,"39.83278,-104.6575",ICAO,9,"39.83278,-104.6575"
"denver / stapleton international, co.",CO,KDNR,"39.78333,-104.8666",ICAO,9,"39.78333,-104.8666"
"denver nexrad",CO,KFTG,"39.78333,-104.55",ICAO,9,"39.78333,-104.55"
DENVER/ARTCC,CO,KZDV,"40.1833333333333,-105.133333333333",ICAO,9,"40.1833333333333,-105.133333333333"
"DENVER CITY 7W",TX,XDVS,"32.9833333333333,-102.933333333333",ICAO,9,"32.9833333333333,-102.933333333333"
"North Douglas County Below 6000 Feet/Denver/West Adams and Arapahoe Counties/East Broomfield County",CO,COZ040,"39.72,-104.80",ZONE,10,"39.72,-104.80"
"Denver International",CO,KDEN,"39.8333,-104.65",AIRPORT,5,"39.8333,-104.65"
"DENVER F. RANGE",CO,KFTG,"39.7833,-104.55",AIRPORT,5,"39.7833,-104.55"
DENVER/ARTCC,CO,KBJC,"40.1833,-105.133",AIRPORT,5,"40.1833,-105.133"
"DENVER WATER DEPARTMENT",CO,KBKF,"39.729,-105.009",AIRPORT,5,"39.729,-105.009"
"DENVER HEALTH",CO,KBKF,"39.727,-104.991",AIRPORT,5,"39.727,-104.991"
"DENVER FEDERAL CENTER HELISTOP",CO,KBJC,"39.723,-105.111",AIRPORT,5,"39.723,-105.111"
"DENVER POLICE DEPARTMENT-DISTRICT 3",CO,KAPA,"39.687,-104.96",AIRPORT,5,"39.687,-104.96"
"DENVER ARTCC",CO,KBJC,"40.187,-105.127",AIRPORT,5,"40.187,-105.127"
"DENVER OF THE EAST",NC,KIPJ,"35.493,-80.966",AIRPORT,5,"35.493,-80.966"
"DENVER CITY",TX,KGNC,"32.975,-102.842",AIRPORT,5,"32.975,-102.842"
"denver county",CO,,"39.76,-104.83",PLACE,4,"39.76,-104.83"
"denver nexrad",CO,,"39.78333,-104.55",PLACE,4,"39.78333,-104.55"
denver,CO,,"39.76800,-104.87270",PLACE,4,"39.76800,-104.87270"
denver,IA,,"42.67120,-92.33410",PLACE,4,"42.67120,-92.33410"
denver,IN,,"40.86420,-86.07640",PLACE,4,"40.86420,-86.07640"
denver,KY,,"37.77590,-82.85510",PLACE,4,"37.77590,-82.85510"
denver,MO,,"40.39900,-94.32370",PLACE,4,"40.39900,-94.32370"
denver,NC,,"35.53110,-81.03000",PLACE,4,"35.53110,-81.03000"
denver,NY,,"42.2125,-74.56972",PLACE,4,"42.2125,-74.56972"
denver,PA,,"40.23250,-76.13870",PLACE,4,"40.23250,-76.13870"
"denver city",TX,,"32.96960,-102.83070",PLACE,4,"32.96960,-102.83070"
Denver,IL,,"40.28330,-91.10000",PLACE,4,"40.28330,-91.10000"
Denver,NE,,"40.434655,-98.66677",PLACE,4,"40.434655,-98.66677"
Denver,TN,,"36.04694,-87.92083",PLACE,4,"36.04694,-87.92083"

$ cd /tmp/
$ curl 'http://localhost:8983/solr/weather/update/csv?commit=true' --data-binary @city.csv -H 'Content-type:text/plain; charset=utf-8'
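If you would rather generate and post the CSV from code than write it by hand, here is a Python sketch using only the standard library. It assumes the core URL and column order from this post. The loc and geoloc values contain a comma, so they must be quoted, which csv.writer handles automatically. build_csv and build_update_request are hypothetical helpers, and the request is only constructed here, not sent.

```python
import csv
import io
import urllib.request

COLUMNS = ["name", "state", "locid", "loc", "type", "rank", "geoloc"]


def build_csv(rows):
    """Serialize (name, state, locid, lat, lon, type, rank) tuples in the city.csv layout."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for name, state, locid, lat, lon, kind, rank in rows:
        point = f"{lat},{lon}"  # "lat,lon" string, the format LatLonType expects
        writer.writerow([name, state, locid, point, kind, rank, point])
    return out.getvalue()


def build_update_request(csv_text, base_url="http://localhost:8983/solr/weather"):
    """Prepare the same POST the curl command makes (not sent here)."""
    return urllib.request.Request(
        base_url + "/update/csv?commit=true",
        data=csv_text.encode("utf-8"),
        headers={"Content-type": "text/plain; charset=utf-8"},
        method="POST",
    )


body = build_csv([("Denver", "CO", "80202", "39.74", "-104.99", "ZIP", 2)])
req = build_update_request(body)
print(body)
# urllib.request.urlopen(req)  # uncomment to send it to a running Solr
```

The second line of body comes out identical to the Denver, CO 80202 row in city.csv.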

Next, let's test the search with a few queries.

http://localhost:8983/solr/weather/select?q=Denver

This returns every document that matches Denver. Our solrconfig.xml also tells Solr to boost fields using “name^1.8 locid^2.0 state^1.0”, so a match on the name field counts for more than a match on the state field, and a match on locid (a ZIP code, ICAO code, or forecast zone ID) counts for the most. For example, if one document matched the query in its name and another matched only in its state, the name match would generally rank first. Now let's try:
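To make the qf boosts concrete, here is a toy Python model of how a dismax-style query scores a single-term query like q=Denver. Each field's score is multiplied by its boost, the best one is kept, and a tie fraction of the others is added (Solr's tie parameter, which defaults to 0.0). The raw per-field scores below are made-up inputs of 1.0. Real Lucene scores also depend on term frequency, IDF, and field norms, so this only isolates the effect of the boosts. parse_qf and dismax_score are hypothetical names.

```python
def parse_qf(qf):
    """Turn 'name^1.8 locid^2.0 state^1.0' into a field -> boost dict (no '^' means 1.0)."""
    boosts = {}
    for entry in qf.split():
        field, _, boost = entry.partition("^")
        boosts[field] = float(boost) if boost else 1.0
    return boosts


def dismax_score(field_scores, boosts, tie=0.0):
    """Toy DisjunctionMax: best boosted field score plus tie * the other boosted scores."""
    boosted = [boosts[f] * s for f, s in field_scores.items() if f in boosts]
    if not boosted:
        return 0.0  # the term matched no field listed in qf
    best = max(boosted)
    return best + tie * (sum(boosted) - best)


boosts = parse_qf("name^1.8 locid^2.0 state^1.0")
# Made-up raw score of 1.0 for a match in exactly one field
locid_hit = dismax_score({"locid": 1.0}, boosts)
name_hit = dismax_score({"name": 1.0}, boosts)
state_hit = dismax_score({"state": 1.0}, boosts)
print(locid_hit, name_hit, state_hit)
```

With tie at 0.0, a document matching in both name and state scores the same as one matching only in name, because only the best field counts.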

http://localhost:8983/solr/weather/select?q=Denver&sfield=geoloc&pt=40.08,-105.36&sort=score%20desc&bf=recip(geodist(),2,200,20)

This is where the geospatial magic kicks in; look at the sfield, pt, sort, and bf parameters. sfield specifies which field is used to calculate the distance from pt, and pt is the point that every result is measured against. In this example, (40.08,-105.36) is Boulder, CO, next to Denver. bf tells Solr to add a boost to each result based on its calculated distance from (40.08,-105.36), with closer documents getting a bigger boost, and sort=score desc orders the results by that boosted score. So Denver, CO wins over Denver, TN and Denver City, TX because Denver, CO is closer to Boulder.
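Here is the arithmetic behind bf=recip(geodist(),2,200,20). Solr's recip(x,m,a,b) computes a/(m*x + b), and geodist() returns the distance in kilometers between the sfield value and pt. The boost is therefore 200/(2*d + 20): 10 for a document sitting exactly on the point, falling off as distance grows. The Python sketch below approximates geodist() with the haversine formula and an assumed mean Earth radius of 6371 km (Solr's exact constant may differ slightly), using coordinates taken from city.csv.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, an approximation of Solr's geodist()."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def recip(x, m, a, b):
    """Solr's recip(x,m,a,b) = a / (m*x + b)."""
    return a / (m * x + b)


boulder = (40.08, -105.36)  # pt from the example URL
places = {  # PLACE rows from city.csv
    "denver, CO": (39.76800, -104.87270),
    "Denver, TN": (36.04694, -87.92083),
    "denver city, TX": (32.96960, -102.83070),
}
for name, (lat, lon) in places.items():
    d = haversine_km(*boulder, lat, lon)
    print(f"{name}: {d:.0f} km, boost {recip(d, 2, 200, 20):.3f}")
```

Because bf adds to the text score, the larger boost for the nearby Denver, CO entries is what pushes them above the other Denvers when their text scores are similar.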

If you change (40.08,-105.36) to (32.78,-96.79), which is Dallas, TX, you can see the results boost Denver City, TX to the top instead of Denver, CO. You can do much more with Solr; check out the Solr Spatial Search wiki page to learn more. I hope you find this useful. Please leave a comment if you have any questions.
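Switching the reference point only means changing pt, and that is also where a GeoIP lookup like the one mentioned at the start would plug in. Here is a small Python sketch that builds the same distance-boosted select URL for any latitude/longitude with urllib.parse.urlencode. geo_select_url is a hypothetical helper, and the GeoIP lookup itself is left out.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def geo_select_url(query, lat, lon, base="http://localhost:8983/solr/weather/select"):
    """Build this post's distance-boosted search URL for an arbitrary reference point."""
    params = {
        "q": query,
        "sfield": "geoloc",                 # LatLonType field holding each document's point
        "pt": f"{lat},{lon}",               # reference point, e.g. from a GeoIP lookup
        "sort": "score desc",
        "bf": "recip(geodist(),2,200,20)",  # closer documents get a larger additive boost
    }
    return base + "?" + urlencode(params)


dallas_url = geo_select_url("Denver", 32.78, -96.79)
print(dallas_url)
print(parse_qs(urlparse(dallas_url).query)["pt"])
```

urlencode percent-encodes the commas and parentheses and turns spaces into '+', which Solr decodes back to the same parameters as the hand-written URL.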

In the hope of furthering my knowledge, I want to share my ideas and my passion for writing code, and learn from all of you in return. In this blog, I plan to cover technologies I am working on, including server-side and client-side JavaScript, open source, cloud computing, design, the social web, and more.

Find Me At:

  • @demetriusjoh on Twitter
  • Facebook Profile
  • Linkedin Profile
  • demetriusj on github

© 2011 Demetrius Johnson.