Thursday, January 9, 2014

Geo-Spatial Features in Solr 3

Solr is mainly known for its full-text search capabilities. You index text and can search it in lowercased or stemmed form, depending on your analyzer chain. But Solr can do more than text: you can use range queries on numeric fields ("Find all products with a price lower than 2€"), do date arithmetic ("Find me all news entries from last week"), or run geospatial queries, which we will look at in this post. What I am describing here is the old spatial search support. Next week I will show you how to do the same things using recent versions of Solr.

Indexing Locations

Suppose we are indexing talks in Solr that contain a title and a location. We need to add the field type for locations to our schema:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

LatLonType is a subfield type, which means that it creates not just one field but also additional fields, one for latitude and one for longitude. The subFieldSuffix attribute determines the names of those fields, which follow the pattern <fieldname>_<i><subFieldSuffix>. If the name of our field is location and we are indexing a latitude/longitude pair, this leads to three fields: location, location_0_coordinate, location_1_coordinate.
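The naming scheme can be spelled out with a tiny shell loop. This is only an illustration of how the names are put together; subfield_names is a hypothetical helper and has nothing to do with Solr's own code.

```shell
# Hypothetical helper: prints the sub-field names that a two-dimensional
# field such as LatLonType would use for a given field name and suffix.
subfield_names() {
  field="$1"
  suffix="$2"
  for i in 0 1; do
    # field name, underscore, dimension index, then the configured suffix
    printf '%s_%s%s\n' "$field" "$i" "$suffix"
  done
}

subfield_names location _coordinate
```

Note that the suffix "_coordinate" already brings its own leading underscore.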

To use the type in our schema we need to add one field and one dynamic field definition for the sub fields:

<field name="location" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

The dynamic field is of type tdouble, so we need to make sure that this type is also available in our schema. The indexed attribute on location is special in this case: it determines whether the subfields for the coordinates are created at all.

Let's index some documents. We are adding three fields: the path (which is our id), the title of the talk, and the location.

curl "http://localhost:8983/solr/update/json?commit=true" -H 'Content-type:application/json' -d '
[
 {"path" : "1", "title" : "Search Evolution", "location" : "49.487036,8.458001"},
 {"path" : "2", "title" : "Suchen und Finden mit Lucene und Solr", "location" : "49.013787,8.419936"}
]'

The location of the first document is Mannheim, the second Karlsruhe. We can see that our documents are indexed and that the location is stored by querying all documents:

curl "http://localhost:8983/solr/select?q=*%3A*&wt=json&indent=true"

Looking at the schema browser we can also see that the two subfields have been created. Each contains the terms for the Trie field.

Sorting by Distance

One use case you might have when indexing locations is to sort the results by distance from a certain location. This can for example be useful for classifieds or rentals to show the nearest results first.

Sorting can be done via the geodist() function. We need to pass in the location that is used as a basis via the pt parameter and the location field to use in the function via the sfield parameter. We can see this in action by sorting twice, once for a location in Durlach near Karlsruhe and once for Heidelberg, which is near Mannheim:

curl "http://localhost:8983/solr/select?wt=json&indent=true&q=*:*&sfield=location&pt=49.003421,8.483133&sort=geodist%28%29%20asc"
curl "http://localhost:8983/solr/select?wt=json&indent=true&q=*:*&sfield=location&pt=49.399119,8.672479&sort=geodist%28%29%20asc"

Both return the results in the correct order. You can also use the geodist() function to boost results that are closer to your location. See the Solr wiki for details.
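To see why both orderings come out this way, the distances can be approximated outside of Solr. The following is a minimal sketch that assumes a great-circle (haversine) distance on a sphere with a radius of 6371 km; the constants Solr uses internally may differ slightly. geodist_km and rank_from are hypothetical helpers for this post, not Solr functions.

```shell
# Hypothetical helper: haversine distance in kilometers between two points.
# Usage: geodist_km LAT1 LON1 LAT2 LON2
geodist_km() {
  LC_ALL=C awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
    rad = atan2(0, -1) / 180              # degrees to radians
    dlat = (lat2 - lat1) * rad
    dlon = (lon2 - lon1) * rad
    a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    printf "%.1f\n", 6371 * c             # 6371 km: assumed mean Earth radius
  }'
}

# Prints "distance<TAB>title" for both talks, nearest first.
# Usage: rank_from LAT LON
rank_from() {
  {
    printf '%s\t%s\n' "$(geodist_km "$1" "$2" 49.487036 8.458001)" "Search Evolution"
    printf '%s\t%s\n' "$(geodist_km "$1" "$2" 49.013787 8.419936)" "Suchen und Finden mit Lucene und Solr"
  } | LC_ALL=C sort -n
}

rank_from 49.003421 8.483133   # Durlach
rank_from 49.399119 8.672479   # Heidelberg
```

Seen from Durlach the Karlsruhe talk is only a few kilometers away and comes first, while from Heidelberg the Mannheim talk is the closer one.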

Filtering by Distance

Another common use case is to filter the search results so that only results from a certain area are shown, e.g. within 10 kilometers. This can either be done automatically or via facets.

Filtering is done using another function, geofilt(). It accepts the same parameters we have seen before, but for filtering you add it as a filter query. The distance is passed using the parameter d; the unit defaults to kilometers. Suppose you are in Durlach and only want to see talks that are within 10 kilometers:

curl "http://localhost:8983/solr/select?wt=json&indent=true&q=*:*&fq=%7B%21geofilt%7D&pt=49.003421,8.483133&sfield=location&d=10"

This only returns the result in Karlsruhe. If we widen the radius to 100 kilometers, we see both results again:

curl "http://localhost:8983/solr/select?wt=json&indent=true&q=*:*&fq=%7B%21geofilt%7D&pt=49.003421,8.483133&sfield=location&d=100"
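As a rough cross-check of these two filter results, the radius test can be imitated locally. This sketch assumes that the filter keeps every point whose great-circle (haversine) distance from pt is at most d kilometers, using an assumed Earth radius of 6371 km. within_km and talks are hypothetical helpers and only approximate what Solr does.

```shell
# Hypothetical helper: reads "lat lon title" lines on stdin and prints the
# titles of all points within D kilometers of the given center.
# Usage: within_km LAT LON D
within_km() {
  LC_ALL=C awk -v plat="$1" -v plon="$2" -v d="$3" '
    function haversine_km(lat1, lon1, lat2, lon2,    rad, a) {
      rad = atan2(0, -1) / 180            # degrees to radians
      a = sin((lat2 - lat1) * rad / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin((lon2 - lon1) * rad / 2) ^ 2
      return 6371 * 2 * atan2(sqrt(a), sqrt(1 - a))
    }
    {
      title = $0
      sub(/^[^ ]+ +[^ ]+ +/, "", title)   # strip the two coordinate columns
      if (haversine_km(plat, plon, $1, $2) <= d) print title
    }'
}

# The two talks indexed earlier, as "lat lon title".
talks() {
  printf '%s\n' \
    "49.487036 8.458001 Search Evolution" \
    "49.013787 8.419936 Suchen und Finden mit Lucene und Solr"
}

talks | within_km 49.003421 8.483133 10    # radius of 10 km around Durlach
talks | within_km 49.003421 8.483133 100   # radius of 100 km around Durlach
```

The first call should print only the Karlsruhe talk and the second one both talks, matching what Solr returned.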

Pretty useful! If you are interested, there is more on the Solr wiki. Next week I will show you how to do the same using the new spatial support in Solr versions starting from 4.2.

About Florian Hopf

I am working as a freelance software developer and consultant in Karlsruhe, Germany and have written a German book about Elasticsearch. If you liked this post you can follow me on Twitter or subscribe to my feed to get notified of new posts. If you think I could help you and your company and you'd like to work with me please contact me directly.
