Solr 3.1 and Beyond
Yonik Seeley
Lucid Imagination
[email protected]
October 8, 2010
Agenda
Goal : Introduce new features you can try & use now in
Solr development versions 3.1 or 4.0
Q&A
10/12/10 3
Solr 3.1? What happened to 1.5?
Spatial Search
Step1: Index some locations!
<field name="name">The Alpine Shop</field>
<field name="store">44.013617,-73.168264</field>
Step3: Profit!
RESULT GROUPING /
FIELD COLLAPSING
Field Collapsing Definition
Field collapsing
Limit the number of results per category
“category” normally defined by unique values in a field
Uses
Web Search – collapse by web site
Email threads – collapse by thread id
Ecommerce/retail
Show the top 5 items for each store category (music, movies, etc.)
Field Collapsing by Site
Result Grouping by Category
Field Collapse on Product Type
Group by Field
http://...&fl=id,name&q=ipod&group=true&group.field=manu_exact
"grouped":{
"manu_exact":{
"matches":3,
"groups":[{
"groupValue":"Belkin",
"doclist":{"numFound":2,"start":0,"docs":[
{
"id":"IW-02",
"name":"iPod & iPod Mini USB 2.0 Cable"}]
}},
{
"groupValue":"Apple Computer Inc.",
"doclist":{"numFound":1,"start":0,"docs":[
{
"id":"MA147LL/A",
Group by Query
http://...&group=true&group.query=price:[0 TO 99.99]
&group.query=price:[100 TO *]&group.limit=5
"grouped":{
"price:[0 TO 99.99]":{
"matches":3,
"doclist":{"numFound":2,"start":0,"docs":[
{
"id":"IW-02",
"name":"iPod & iPod Mini USB 2.0 Cable"},
{
"id":"F8V7067-APL-KIT",
"name":"Belkin Mobile Power Cord for iPod"}]
}},
"price:[100 TO *]":{
"matches":3,
"doclist":{"numFound":1,"start":0,"docs":[
{
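Because `group.query` is repeated once per bucket, the query string is easiest to build from a list of pairs rather than a dict. A small Python sketch follows; the function name `group_query_params` is hypothetical, and `q=ipod` is an assumption borrowed from the earlier grouping example, since the slide elides the start of the URL.

```python
from urllib.parse import urlencode, parse_qs


def group_query_params(q, group_queries, limit):
    """Build a query string with one group.query parameter per bucket."""
    params = [("q", q), ("group", "true")]
    params += [("group.query", gq) for gq in group_queries]
    params.append(("group.limit", str(limit)))
    # urlencode percent-encodes brackets and '*' and turns spaces into '+'.
    return urlencode(params)


qs = group_query_params("ipod", ["price:[0 TO 99.99]", "price:[100 TO *]"], 5)
print(qs)
# Round-trip to confirm both group.query values survive encoding.
print(parse_qs(qs)["group.query"])
```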
Grouping Params
parameter meaning default
Pivot Faceting
http://...&facet=true&facet.pivot=cat,popularity
14 docs w/ cat==electronics
5 docs w/ cat==electronics && popularity==6
"facet_counts":{
  "facet_pivot":{
    "cat,popularity":[{
      "field":"cat",
      "value":"electronics",
      "count":14,
      "pivot":[{
        "field":"popularity",
        "value":"6",
        "count":5},
       {
        "field":"popularity",
        "value":"7",
        "count":4},
(continued)
       {
        "field":"popularity",
        "value":"1",
        "count":2}]},
     {
      "field":"cat",
      "value":"memory",
      "count":3,
      "pivot":[]},
     […]
Range Faceting
• Like Date faceting, but more generic
http://...&facet=true
  &facet.range=price
  &facet.range.start=0
  &facet.range.end=500
  &facet.range.gap=50
"facet_counts":{
  "facet_ranges":{
    "price":{
      "counts":{
        "0.0":5,
        "50.0":2,
        "100.0":0,
        "150.0":2,
        "200.0":0,
        "250.0":1,
        "300.0":2,
        "350.0":2,
        "400.0":0,
        "450.0":1},
      "gap":50.0,
      "start":0.0,
      "end":500.0}}}}
Existing single-valued faceting algorithm
[Diagram] q=Juggernaut&facet=true&facet.field=hero
Documents matching the base query “Juggernaut”
Lucene FieldCache Entry (StringIndex) for the “hero” field
order: for each doc, an index into the lookup array
lookup: the string values: (null), batman, flash, spiderman, superman, wolverine
accumulator: one counter per lookup entry, incremented for each matching doc
Priority queue: flash, 5; Batman, 3
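The diagram's data flow can be sketched in a few lines of Python. This is an illustration of the idea, not Solr's implementation: the `order` values and document ids are invented so that the result matches the slide's priority queue (flash 5, batman 3), and `heapq.nlargest` stands in for the priority queue.

```python
import heapq


def facet_counts(matching_docs, order, lookup, top_n):
    """Count field values for matching docs via order/lookup arrays."""
    accumulator = [0] * len(lookup)
    for doc in matching_docs:
        # order[doc] is the index of this doc's value in the lookup array.
        accumulator[order[doc]] += 1
    # Index 0 is the (null) entry for docs without a value; skip it.
    candidates = ((count, term)
                  for count, term in zip(accumulator[1:], lookup[1:])
                  if count > 0)
    top = heapq.nlargest(top_n, candidates)
    return [(term, count) for count, term in top]


lookup = [None, "batman", "flash", "spiderman", "superman", "wolverine"]
order = [2, 1, 2, 0, 2, 1, 4, 2, 1, 2]   # one entry per document (made up)
print(facet_counts(range(10), order, lookup, 2))
```

The accumulator is sized by the number of distinct values, so the per-document work is a single array lookup and increment.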
Per-segment single-valued algorithm
[Diagram] Segment1, Segment2, Segment3, Segment4: one FieldCache Entry per segment
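With one FieldCache entry per segment, ordinals are only meaningful inside their own segment, so counts have to be combined by term. The Python sketch below shows one way to do that. It is an assumption about the merge step, which the slide does not show, and every name and value in it is illustrative.

```python
from collections import Counter


def count_segment(matching_docs, order, lookup):
    """Count terms within one segment using its own order/lookup arrays."""
    acc = [0] * len(lookup)
    for doc in matching_docs:
        acc[order[doc]] += 1
    # Index 0 is the (null) entry; report only terms that were seen.
    return {lookup[i]: c for i, c in enumerate(acc) if i > 0 and c > 0}


def facet_counts_per_segment(segments, top_n):
    """Merge per-segment counts by term string and return the top terms."""
    total = Counter()
    for matching_docs, order, lookup in segments:
        total.update(count_segment(matching_docs, order, lookup))
    return total.most_common(top_n)


# Two made-up segments: (matching doc ids, order array, lookup array).
segments = [
    ([0, 1, 2], [1, 2, 2], [None, "batman", "flash"]),
    ([0, 1, 2, 3], [1, 1, 2, 0], [None, "flash", "superman"]),
]
print(facet_counts_per_segment(segments, 1))
```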
SCALABILITY
SolrCloud
First steps toward simplifying cluster management
Integrates Zookeeper
Central configuration (schema.xml, solrconfig.xml, etc)
Tracks live nodes + shards of collections
Removes need for external load balancers
shards=localhost:8983/solr|localhost:8900/solr,
localhost:7574/solr|localhost:7500/solr
Can specify logical shard ids
shards=NY_shard,NJ_shard
Clients don’t need to know shards at all:
http://localhost:8983/solr/collection1/select?distrib=true
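In the `shards` parameter above, commas separate shards and `|` separates equivalent servers for the same shard. A brief Python sketch of parsing that syntax and picking one server per shard follows. The function names are hypothetical, and random selection is only a stand-in for whatever balancing policy Solr actually applies.

```python
import random


def parse_shards(shards):
    """Split a shards value into a list of shards, each a list of replicas."""
    return [s.strip().split("|") for s in shards.split(",") if s.strip()]


def pick_replicas(shards, rng=random):
    """Choose one replica per shard (random choice is an assumption)."""
    return [rng.choice(replicas) for replicas in parse_shards(shards)]


value = ("localhost:8983/solr|localhost:8900/solr,"
         "localhost:7574/solr|localhost:7500/solr")
print(parse_shards(value))
print(pick_replicas(value))
```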
SolrCloud : The Future
Eliminate all single points of failure
Remove Master/Searcher distinction
Enables near real-time search in a highly scalable environment
High Availability for Writes
Eventual consistency model (like Amazon Dynamo, Cassandra)
Elastic
Simply add/subtract servers, cluster will rebalance automatically
By default, Solr will handle document partitioning
ODDS & ENDS
Auto-Suggest
Many people currently use terms component
Can be slow for a large corpus
New auto-suggest builds off SpellCheck component
Compact memory based trie for really fast completions
Based on a field in the main index, or on a dictionary file
http://localhost:8983/solr/suggest?wt=json&indent=true&q=ult
"spellcheck":{
"suggestions":[
"ult",{
"numFound":1,
"startOffset":0,
"endOffset":3,
"suggestion":["ultrasharp"]},
"collation","ultrasharp"]}}
Index with JSON
$ URL=http://localhost:8983/solr/update/json
$ curl $URL -H 'Content-type:application/json' -d '
{
  "add": {
    "doc": {
      "id" : "978-0641723445",
      "cat" : ["book","hardcover"],
      "title" : "The Lightning Thief",
      "author" : "Rick Riordan",
      "series_t" : "Percy Jackson and the Olympians",
      "sequence_i" : 1,
      "genre_s" : "fantasy",
      "inStock" : true,
      "price" : 12.50,
      "pages_i" : 384
    }
  }
}'
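The same update can be prepared from Python using only the standard library. This sketch builds the request but does not send it. The endpoint and the document come from the slide, while the variable names are illustrative, and uncommenting the `urlopen` call assumes a Solr instance is listening locally.

```python
import json
from urllib.request import Request

doc = {
    "id": "978-0641723445",
    "cat": ["book", "hardcover"],
    "title": "The Lightning Thief",
    "author": "Rick Riordan",
    "series_t": "Percy Jackson and the Olympians",
    "sequence_i": 1,
    "genre_s": "fantasy",
    "inStock": True,
    "price": 12.50,
    "pages_i": 384,
}

# Wrap the document in the same {"add": {"doc": ...}} envelope as the curl example.
payload = json.dumps({"add": {"doc": doc}}).encode("utf-8")
req = Request(
    "http://localhost:8983/solr/update/json",
    data=payload,
    headers={"Content-type": "application/json"},
)
print(req.get_method(), req.full_url)
# To actually send it (requires a running Solr):
# from urllib.request import urlopen; urlopen(req)
```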
Query Results in CSV
http://localhost:8983/solr/select?q=ipod&fl=name,price,cat,popularity&wt=csv
name,price,cat,popularity
iPod & iPod Mini USB 2.0 Cable,11.5,"electronics,connector",1
Belkin Mobile Power Cord for iPod w/ Dock,19.95,"electronics,connector",1
Apple 60 GB iPod with Video Playback Black,399.0,"electronics,music",10
Results are streamed – good for dumping entire parts of the index
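Multi-valued fields such as `cat` arrive as a single quoted cell containing commas, so a real CSV parser is needed rather than a plain split. Below is a short Python sketch using the rows shown above. The function name is hypothetical, and splitting `cat` on commas assumes the values themselves contain none.

```python
import csv
import io


def parse_csv_results(text, multi_valued=("cat",)):
    """Parse wt=csv output into dicts, splitting multi-valued fields into lists."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        for field in multi_valued:
            row[field] = row[field].split(",")
    return rows


sample = """name,price,cat,popularity
iPod & iPod Mini USB 2.0 Cable,11.5,"electronics,connector",1
Belkin Mobile Power Cord for iPod w/ Dock,19.95,"electronics,connector",1
Apple 60 GB iPod with Video Playback Black,399.0,"electronics,music",10
"""

for row in parse_csv_results(sample):
    print(row["name"], row["cat"])
```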
http://localhost:8983/solr/browse
Q&A
For more information about Solr visit
www.lucidimagination.com