Spatial search on multiple points in Solr

· django python search solr

At TrialReach we deal with clinical trials data, which contains a lot of spatial information. Typically, a clinical trial treats a certain set of conditions and takes place in various locations around the globe. If you are a patient, searching across clinical trials becomes very location-sensitive: you are only interested in the location closest to you.

This case might apply to other kinds of events as well, but the key point is global distribution: I am not interested in every point on the globe, just the one closest to me.


Solr 4 supports this with its new spatial field type, SpatialRecursivePrefixTreeFieldType, though with many caveats.

A schema could look like this:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
   <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
   <field name="title" type="text_en" indexed="true" stored="true" required="true"/>
   <field name="condition" type="text_en" indexed="true" stored="true" required="true" multiValued="true"/>
   <field name="location" type="location_rpt" indexed="true" stored="true" multiValued="true"/>
   <field name="_version_" type="long" indexed="true" stored="true" />

   <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
       geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees" />
   <!-- fieldType definitions for "string", "text_en" and "long" are omitted here -->
</schema>

Here is a sample indexer using GeoDjango and PySolr (Haystack does not support this). It should be easy to follow: PySolr is just a very thin wrapper that sends HTTP POST requests to Apache Solr.

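import pysolr

# The Solr URL is left empty here, as in the original listing.
solr = pysolr.Solr("", timeout=10)

# "models" is the Django app's models module (its import is not shown).
records = models.Study.objects.all()
solr_data = []
for record in records:
    solr_dict = {
        # Assumption: the expressions for "id" and "condition" were lost from
        # the listing; record.pk and str(c) are stand-ins.
        "id": str(record.pk),
        "title": record.title,
        "condition": [str(c) for c in record.conditions.all()],
        # "point" is a GeoDjango Point. Solr's space-separated format is
        # "long lat", so "{1} {0}" assumes coords holds (lat, long); use
        # "{0} {1}" if your points are stored as (long, lat).
        "location": ["{1} {0}".format(l.point.coords[0], l.point.coords[1]) for l in record.locations.all()],
    }
    solr_data.append(solr_dict)

# Assumption: the listing stops before this step; pysolr's add() posts the documents.
solr.add(solr_data)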
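
The axis order is the easiest thing to get wrong when indexing, so it can help to isolate it. The following is a minimal sketch, not part of the original indexer: `solr_point` is a hypothetical helper that takes latitude and longitude as keyword arguments and emits the space-separated "long lat" string described above.

```python
def solr_point(lat, lon):
    """Format one location as Solr's space-separated "long lat" string.

    Hypothetical helper: named arguments make the axis order explicit.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: {0}".format(lat))
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: {0}".format(lon))
    # Longitude first, then latitude, separated by a single space.
    return "{0} {1}".format(lon, lat)


# Example: a point at latitude 51.5, longitude -0.12.
london = solr_point(lat=51.5, lon=-0.12)
```

The range checks are a cheap way to catch swapped arguments: a latitude outside [-90, 90] is almost always a longitude in disguise.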

For querying, we use this sort of URL query string:

q={!geofilt score=distance sfield=location pt=LAT,LONG d=KM_RADIUS}&fl=*,score

  • To return the distance you need to use the score, so the geofilt must be the only thing in the q parameter (anything else would influence the score). All other filters go in fq.
  • If you do not need the distance, drop the score parameter from geofilt (it is inefficient).
  • The distance returned is the distance between the specified LAT,LONG and the closest LAT,LONG among the values of the SpatialRecursivePrefixTreeFieldType field.
  • The score is returned in DEGREES. You have to convert it to km or miles.
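
The points above can be sketched in Python. This is an illustration under assumptions, not our production code: `geofilt_params` and `degrees_to_km` are hypothetical helper names, the field name `location` comes from the schema above, and the conversion assumes a spherical Earth with a mean radius of about 6371 km (roughly 111.2 km per degree).

```python
import math

# Kilometres per degree of arc, assuming a sphere with the mean Earth radius.
EARTH_MEAN_RADIUS_KM = 6371.0087714
KM_PER_DEGREE = math.pi / 180 * EARTH_MEAN_RADIUS_KM  # about 111.2


def geofilt_params(lat, lon, radius_km, filters=(), with_distance=True):
    """Build Solr query parameters for a geofilt search on "location".

    The geofilt is the only thing in q; every other filter goes in fq.
    """
    # Only ask for score=distance when the distance is actually needed.
    score = "score=distance " if with_distance else ""
    params = {
        "q": "{{!geofilt {0}sfield=location pt={1},{2} d={3}}}".format(
            score, lat, lon, radius_km),
        "fq": list(filters),
    }
    if with_distance:
        # The distance comes back in the score pseudo-field.
        params["fl"] = "*,score"
    return params


def degrees_to_km(degrees):
    """Convert a score expressed in degrees to kilometres."""
    return degrees * KM_PER_DEGREE


params = geofilt_params(51.5, -0.12, 50, filters=["condition:asthma"])
```

With PySolr, these parameters should be usable as `solr.search(params.pop("q"), **params)`, after which each returned document's `score` can be passed through `degrees_to_km`. For miles, multiply the kilometre value by about 0.621.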


To sum up the caveats:
  • The only way to get the distance is through the score.
  • You cannot get the matched point through highlighting or any other way.
  • The units of measure are a bit confusing.
