This is the second part of the article on things I learned while building a simple Java-based search application on top of ElasticSearch. In the first part of this article we looked at how to index data in ElasticSearch and what the mapping is. Though ElasticSearch is often called schema free, specifying the mapping is still a crucial part of creating a search application. This time we will look at the query side and see how we can get our indexed talks out of it again.
Simple Search
Recall that our documents consist of a title, the date and the speaker of a talk. We have adjusted the mapping so that for the title we are using the German analyzer that stems our terms and we can search on variations of words. This curl request creates a similar index:
curl -XPUT "http://localhost:9200/blog" -d'
{
"mappings" : {
"talk" : {
"properties" : {
"title" : { "type" : "string", "store" : "yes", "analyzer" : "german" }
}
}
}
}'
Let's see how we can search on our content. We are indexing another document with a German title.
curl -XPOST "http://localhost:9200/blog/talk/" -d'
{
"speaker" : "Florian Hopf",
"date" : "2012-07-04T19:15:00",
"title" : "Suchen und Finden mit Lucene und Solr"
}'
All searching is done on the _search endpoint, which is available on the type or index level (you can also search on multiple types and indexes by separating them with a comma). As the title field uses the German analyzer we can search on variations of the words, e.g. suche, which stems to the same root (such) as suchen.
curl -XGET "http://localhost:9200/blog/talk/_search?q=title:suche&pretty=true"
{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.15342641,
"hits" : [ {
"_index" : "blog",
"_type" : "talk",
"_id" : "A2Qv3fN3TkeYEhxA4zicgw",
"_score" : 0.15342641, "_source" : {
"speaker" : "Florian Hopf",
"date" : "2012-07-04T19:15:00",
"title" : "Suchen und Finden mit Lucene und Solr"
}
} ]
}
}
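The URL layout of such a request is regular enough to build in code. The following is a minimal sketch and not part of the ElasticSearch client: the class UriSearchUrl and its method are made up for illustration, and the base URL assumes the local node from the curl examples.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not an ElasticSearch class: builds the URL for a
// simple query-parameter search on one or more indexes and types.
final class UriSearchUrl {

    static String build(String baseUrl, List<String> indexes, List<String> types, String query) {
        StringBuilder url = new StringBuilder(baseUrl);
        // Multiple indexes are separated by commas, e.g. /blog,other
        url.append('/').append(String.join(",", indexes));
        // The type segment is optional; without it the search runs on the index level
        if (!types.isEmpty()) {
            url.append('/').append(String.join(",", types));
        }
        // The query string has to be URL encoded (":" becomes "%3A")
        url.append("/_search?q=").append(URLEncoder.encode(query, StandardCharsets.UTF_8));
        return url.toString();
    }
}
```

Called with the base URL http://localhost:9200, the index blog, the type talk and the query title:suche this produces the URL of the curl request above, with the colon percent-encoded.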
The _all field
Now that this works, we might want to search on multiple fields. ElasticSearch provides the convenience functionality of copying all field content to the _all field, which is used when the field name is omitted from the query. Let's try the query again:
curl -XGET "http://localhost:9200/blog/talk/_search?q=suche&pretty=true"
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
No results. Why is that? Of course we have set the analyzer correctly for the title as we have seen above. But this doesn't mean that the content is analyzed in the same way for the _all field. As we didn't specify an analyzer for this field it still uses the StandardAnalyzer, which tokenizes and lowercases the text but doesn't do any stemming. If you want to have a consistent behavior for the title and the _all field you need to set the analyzer in the mapping:
curl -XPUT "http://localhost:9200/blog" -d'
{
"mappings" : {
"talk" : {
"_all" : {"analyzer" : "german"},
"properties" : {
"title" : { "type" : "string", "store" : "yes", "analyzer" : "german" }
}
}
}
}'
Note that, as with other changes to how an existing field is analyzed, you can't change the analyzer of the _all field once it's created. You need to delete the index, create it again with the new mapping and reindex your data. Afterwards our search will return the same results for the two queries.
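To see why the analyzers of both fields have to agree, here is a toy sketch in plain Java. It is not how Lucene implements analysis: standardLike only lowercases and splits on non-letters, and toyStem is a made-up stand-in for the German stemmer that merely strips a trailing "en" or "e". All names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of two analysis chains; the stemmer is an assumption for illustration only.
final class AnalyzerMismatchSketch {

    // Stand-in for the standard analyzer: lowercase and tokenize, no stemming
    static Set<String> standardLike(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Made-up stemmer: strips a trailing "en" or "e" from longer words
    static String toyStem(String token) {
        if (token.length() > 4 && token.endsWith("en")) {
            return token.substring(0, token.length() - 2);
        }
        if (token.length() > 4 && token.endsWith("e")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Stand-in for the German analyzer: same tokens, but stemmed
    static Set<String> germanLike(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : standardLike(text)) {
            terms.add(toyStem(token));
        }
        return terms;
    }

    // A query matches if at least one analyzed query term was indexed
    static boolean matches(Set<String> indexedTerms, Set<String> queryTerms) {
        return !Collections.disjoint(indexedTerms, queryTerms);
    }
}
```

With the title of our talk, germanLike indexes the term such, which the stemmed query suche hits. standardLike indexes suchen, and the unstemmed query term suche doesn't match it. This mirrors the difference between the title query and the _all query.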
_source
You might have noticed from the example above that ElasticSearch returns the special _source field for each result. This is very convenient as you don't need to specify which fields should be stored. But be aware that this might become a problem for large fields that you don't need for each search request (content section of articles, images that you might store in the index). You can either disable the use of the source field and indicate which fields should be stored in the mapping for your indexed type or you can specify in the query which fields you'd like to retrieve:
curl -XGET "http://localhost:9200/blog/talk/_search?q=suche&pretty=true&fields=speaker,title"
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.15342641,
"hits" : [ {
"_index" : "blog",
"_type" : "talk",
"_id" : "MA2oYAqnTdqJhbjnCNq2zA",
"_score" : 0.15342641
}, {
"_index" : "blog",
"_type" : "talk",
"_id" : "aGdDy24cSImz6DVNSQ5iwA",
"_score" : 0.076713204,
"fields" : {
"speaker" : "Florian Hopf",
"title" : "Suchen und Finden mit Lucene und Solr"
}
} ]
}
}
The same can be done if you are not using the simple query parameters but the more advanced query DSL:
curl -XPOST "http://localhost:9200/blog/talk/_search" -d'
{
"fields" : ["title", "speaker"],
"query" : {
"term" : { "speaker" : "florian" }
}
}'
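If you want to send this request body from code without the ElasticSearch client, a minimal sketch using the HTTP client of the JDK (Java 11 or later) could look like the following. The class FieldsSearchRequest and its methods are made up for illustration, the JSON is concatenated by hand without any escaping, and the fields syntax is simply the one from the curl example above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds the search request from the curl example.
final class FieldsSearchRequest {

    // Builds the JSON body by hand; field names and values must not contain quotes
    static String body(List<String> fields, String termField, String termValue) {
        String fieldList = fields.stream()
                .map(field -> "\"" + field + "\"")
                .collect(Collectors.joining(", "));
        return "{ \"fields\" : [" + fieldList + "], "
                + "\"query\" : { \"term\" : { \"" + termField + "\" : \"" + termValue + "\" } } }";
    }

    // Creates the POST request; nothing is sent until it is passed to an HttpClient
    static HttpRequest build(String baseUrl, List<String> fields, String termField, String termValue) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/blog/talk/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(fields, termField, termValue)))
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Note that a term query is not analyzed, which is why the example searches for the lowercase florian.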
Querying from Java
Besides the JSON-based Query DSL you can also query ElasticSearch using Java. The default ElasticSearch Java client provides builders for creating different parts of the query that can then be combined. For example, if you'd like to query on two fields using the multi_match query, this is what it looks like using curl:
curl -XPOST "http://localhost:9200/blog/_search" -d'
{
"query" : {
"multi_match" : {
"query" : "Solr",
"fields" : [ "title", "speaker" ]
}
}
}'
The Java version maps quite well to this. Once you have found the builders you need you can use the excellent documentation of the Query DSL for your Java client as well.
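// multiMatchQuery is statically imported from org.elasticsearch.index.query.QueryBuilders
// Search for "Solr" in the title and the speaker field, as in the curl request
QueryBuilder multiMatch = multiMatchQuery("Solr", "title", "speaker");
SearchResponse response = esClient.prepareSearch("blog")
    .setQuery(multiMatch)
    .execute().actionGet();
// Only the talk we indexed above matches
assertEquals(1, response.getHits().getTotalHits());
SearchHit hit = response.getHits().getAt(0);
assertEquals("Suchen und Finden mit Lucene und Solr", hit.getSource().get("title"));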
The same QueryBuilder we are constructing above can also be used in other parts of the query: for example, it can be passed as a parameter to create a QueryFilterBuilder or can be used to construct a QueryFacetBuilder. This composition is a very powerful way to build flexible applications. It is easier to reason about the components of the query and you could even test parts of the query on their own.
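The idea behind this composition can be sketched without the client library. The following is not the ElasticSearch API: JsonPart, multiMatch, queryFilter and queryFacet are made-up stand-ins for the builders, and the JSON they render is only my assumption of what the corresponding DSL fragments look like.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the builders, to show how one query object
// can be reused in several places and tested in isolation.
final class QueryCompositionSketch {

    // Anything that can render itself as a JSON fragment
    interface JsonPart {
        String toJson();
    }

    // Stand-in for a multi_match query builder
    static JsonPart multiMatch(String text, String... fields) {
        String fieldList = Arrays.stream(fields)
                .map(field -> "\"" + field + "\"")
                .collect(Collectors.joining(", "));
        return () -> "{ \"multi_match\" : { \"query\" : \"" + text
                + "\", \"fields\" : [ " + fieldList + " ] } }";
    }

    // Stand-in for a query filter: wraps any query so it can be used as a filter
    static JsonPart queryFilter(JsonPart query) {
        return () -> "{ \"query\" : " + query.toJson() + " }";
    }

    // Stand-in for a query facet: counts the hits of the wrapped query under a name
    static JsonPart queryFacet(String name, JsonPart query) {
        return () -> "{ \"" + name + "\" : { \"query\" : " + query.toJson() + " } }";
    }
}
```

The same multiMatch instance can be passed to queryFilter and to queryFacet, and each part can be checked on its own by comparing the output of toJson with the expected fragment.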
Faceting
One of the most prominent features of ElasticSearch is its excellent faceting support, which is used not only for building search applications but also for doing analytics on large data sets. You can use different kinds of faceting, e.g. for certain terms, using the TermsFacet, or for queries, using the query facet. The query facet would accept the same QueryBuilder that we used above.
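// termsFacet is statically imported from org.elasticsearch.search.facet.FacetBuilders,
// queryString from org.elasticsearch.index.query.QueryBuilders
// Count the terms of the speaker field for all talks matching the query
TermsFacetBuilder facet = termsFacet("speaker").field("speaker");
QueryBuilder query = queryString("solr");
SearchResponse response = esClient.prepareSearch("blog")
    .addFacet(facet)
    .setQuery(query)
    .execute().actionGet();
assertEquals(1, response.getHits().getTotalHits());
SearchHit hit = response.getHits().getAt(0);
assertEquals("Suchen und Finden mit Lucene und Solr", hit.getSource().get("title"));
// The facet is looked up by the name it was registered with
TermsFacet resultFacet = response.getFacets().facet(TermsFacet.class, "speaker");
assertEquals(1, resultFacet.getEntries().size());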
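Conceptually a terms facet counts how often each term of a field occurs in the matching documents. The following plain Java sketch shows that idea on an in-memory list. It is not the ElasticSearch implementation, the class TermsFacetSketch is made up, and it counts whole field values, which assumes that the speaker field is indexed as a single term. With an analyzed field each token would be counted separately.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a terms facet over documents held in memory.
final class TermsFacetSketch {

    // Counts the values of the given field across all hits
    static Map<String, Integer> countTerms(List<Map<String, String>> hits, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : hits) {
            String value = doc.get(field);
            // Documents without the field don't contribute to the facet
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For a result list in which every talk has the same speaker, the returned map contains a single entry, which corresponds to the one facet entry asserted in the test above.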
Conclusion
ElasticSearch has a really nice Java API, be it for indexing or for querying. You can get started with indexing and searching in no time, though you need to know some concepts or the results might not be what you expect.
About Florian Hopf
I am working as a freelance software developer and consultant in Karlsruhe, Germany and have written a German book about Elasticsearch. If you liked this post you can follow me on Twitter or subscribe to my feed to get notified of new posts. If you think I could help you and your company and you'd like to work with me please contact me directly.