Learn How To Master Solr 1.4
Solr 1.4
Yonik Seeley
March 4, 2010
Agenda
Faceting
Trie Fields (numeric ranges)
Distributed Search
1.5 Preview
Q&A
See http://bit.ly/mastersolr for
Slides for download after the presentation
On-demand viewing of the webinar in ~48 hours
facet.method=fc
for each doc in base set:
ord = FieldCache.order[doc]
accumulator[ord]++
O(docs_matching_q)
Not used for boolean fields
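The counting loop above can be sketched in Python under some simplifying assumptions: the field is single-valued, `doc_to_ord` stands in for the FieldCache ordinal array, and the toy data is invented for illustration. This is only the shape of the algorithm, not Solr's implementation.

```python
def facet_counts_fc(base_docs, doc_to_ord, ord_to_term):
    """Count facet values by visiting each matching document once."""
    accumulator = [0] * len(ord_to_term)
    for doc in base_docs:
        term_ord = doc_to_ord[doc]   # stand-in for FieldCache.order[doc]
        accumulator[term_ord] += 1
    # Report only the terms that actually occurred in the base set.
    return {ord_to_term[o]: c for o, c in enumerate(accumulator) if c > 0}


# Toy index (hypothetical): one term ordinal per document id.
ord_to_term = ["batman", "flash", "superman"]
doc_to_ord = [0, 1, 0, 2, 1, 0]

# Documents 0, 2, 3 and 5 matched the query.
fc_result = facet_counts_fc([0, 2, 3, 5], doc_to_ord, ord_to_term)
print(fc_result)
```

The work is proportional to the number of matching documents, which is why the cost is O(docs_matching_q) regardless of how many distinct terms the field has.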
facet.method=enum
For each term in field:
Retrieve filter
Solr filterCache (in memory)
Calculate intersection size
hero:batman hero:flash
O(n_terms_in_field)
Short circuits based on term.df
filterCache entries int[ndocs] or BitSet(maxDoc)
filterCache concurrency + efficiency upgrades in 1.4
Size filterCache appropriately
Prevent filterCache use for small terms with facet.enum.cache.minDf
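As a rough illustration of the enum approach, the sketch below walks every term in the field and intersects its document set with the base set. A plain dict stands in for the filterCache, `min_df` plays the role of facet.enum.cache.minDf, and the data is made up. The short circuit on term.df is left out to keep the example small.

```python
def facet_counts_enum(base_docs, term_to_docs, min_df=0, filter_cache=None):
    """Count facet values by intersecting each term's docs with the base set."""
    base = set(base_docs)
    cache = {} if filter_cache is None else filter_cache
    counts = {}
    for term, postings in term_to_docs.items():
        df = len(postings)                 # document frequency of the term
        if df >= min_df:
            # Frequent term: reuse (or populate) the cached filter.
            docs = cache.setdefault(term, frozenset(postings))
        else:
            # Rare term: bypass the cache and walk the postings directly.
            docs = postings
        counts[term] = sum(1 for d in docs if d in base)
    return counts


# Toy index (hypothetical): term -> ids of documents containing it.
term_to_docs = {"batman": [0, 2, 5], "flash": [1, 4], "superman": [3]}
cache = {}
enum_result = facet_counts_enum([0, 2, 3, 5], term_to_docs,
                                min_df=2, filter_cache=cache)
print(enum_result, sorted(cache))
```

Here the loop runs once per term, so the cost grows with the number of terms in the field, and only terms with a document frequency of at least `min_df` occupy a cache slot.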
item_cat:{field=cat,memSize=5376,tindexSize=52,time=2,phase1=2,
nTerms=16,bigTerms=10,termInstances=6,uses=44}
filterCache
Lower size (need room for normal fq filters and “big terms”)
Lower or eliminate autowarm count
1.3 enum method can sometimes be better
f.<fieldname>.facet.method=enum
Field has many values per document and many unique values
Huge index without faceting time constraints
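A per-field override is passed as an ordinary request parameter. The sketch below builds such a query string with Python's standard library. The field name `cat` is borrowed from the stats line earlier in the deck, and the minDf value of 25 is an arbitrary assumption for illustration.

```python
from urllib.parse import urlencode

# Parameters as (name, value) pairs; the values are illustrative assumptions.
facet_params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "cat"),
    ("f.cat.facet.method", "enum"),      # per-field override of the method
    ("facet.enum.cache.minDf", "25"),    # hypothetical threshold
]

query_string = urlencode(facet_params)
print(query_string)
```

The resulting string can be appended to whatever select URL the deployment uses.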
*Index Size reflects only the portion of the index related to indexing the field
[Diagram: 2x4 and 4x2 shard/replica layouts, labeled: searchers, replication, VIP1 VIP2 VIP3 VIP4, http://...?shards=VIP1,VIP2,VIP3,VIP4]
2 shards x 4 replicas
Fewer masters to manage (and fewer total servers)
Smaller load increase on the remaining replicas when one goes down (33% vs 100%)
Less network bandwidth
4 shards x 2 replicas
Greater indexing bandwidth
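The 33% vs 100% figures follow from simple arithmetic, assuming queries for a shard are spread evenly over its replicas: when one of r replicas fails, each survivor picks up an extra 1/(r-1) of its previous load. A small sketch of that calculation (the function name is made up for this example):

```python
def extra_load_on_survivors(replicas_per_shard):
    """Fractional load increase per surviving replica after one failure."""
    if replicas_per_shard < 2:
        raise ValueError("need at least 2 replicas to survive a failure")
    return 1.0 / (replicas_per_shard - 1)


for replicas in (4, 2):
    increase = extra_load_on_survivors(replicas)
    print(f"{replicas} replicas: +{increase:.0%} load on each survivor")
```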
Partition by date
Easy incremental scalability - add more servers over time as needed
Easy to remove oldest data
Enables increased replication factor for newest data
Partition by document id
Works well for updating
Not easily resizable
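One common way to partition by document id is to hash the id and take it modulo the shard count. The sketch below assumes that scheme (the deck does not specify one) and uses CRC32 only because it is stable across runs. It also hints at why resizing is awkward: changing the shard count changes the target shard for many ids.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Map a document id to a shard index with a stable hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Updates are easy: the same id always lands on the same shard.
print(shard_for("doc42", 4), shard_for("doc42", 4))

# Resizing is not: count how many ids would move when going from 4 to 5 shards.
ids = [f"doc{i}" for i in range(1000)]
moved = sum(1 for i in ids if shard_for(i, 4) != shard_for(i, 5))
print(f"{moved} of {len(ids)} documents would change shards")
```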
Partitioning to allow querying a subset of shards
Increases system throughput, decreases network bandwidth
Partition by userId for mailbox search
Partition by region for geographic search
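When the partitioning key is known at query time, the request can name only the relevant shards. The sketch below assumes a region-to-shard table and region names that are entirely hypothetical, and reuses the VIP1..VIP4 labels from the diagram as shard addresses.

```python
from urllib.parse import urlencode

# Hypothetical mapping from region to the shards holding its documents.
ALL_SHARDS = ["VIP1", "VIP2", "VIP3", "VIP4"]
REGION_SHARDS = {
    "us-east": ["VIP1"],
    "us-west": ["VIP2"],
    "europe": ["VIP3", "VIP4"],
}


def shards_for(region=None):
    """Return the shards to query, falling back to all of them."""
    return REGION_SHARDS.get(region, ALL_SHARDS)


def query_params(q, region=None):
    """Build request parameters that restrict the search to a shard subset."""
    params = {"q": q, "shards": ",".join(shards_for(region))}
    return urlencode(params, safe=",")   # keep commas readable


print(query_params("pizza", "europe"))
print(query_params("pizza"))
```

A query for one region touches fewer servers, which is where the throughput and bandwidth gains come from.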
PointType
Compound values: 38.89,-77.03
Range queries and exact matches supported
Distance Functions
haversine
Sorting by function query
Still needed
Ability to return sort values
Distance filtering
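For reference, the haversine distance named above can be written in a few lines. This is the standard great-circle formula, not Solr's code. The Earth radius of 6371 km is a commonly used mean value, and the second coordinate below is made up for the example. The `lat,lon` parsing mirrors the compound value format shown for PointType.

```python
import math


def parse_point(value):
    """Split a 'lat,lon' compound value into two floats."""
    lat, lon = (float(part) for part in value.split(","))
    return lat, lon


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


origin = parse_point("38.89,-77.03")
candidates = ["40.71,-74.01", "38.89,-77.03"]   # second entry equals origin

# Sorting by distance, in the spirit of sorting by a function query.
by_distance = sorted(candidates,
                     key=lambda v: haversine_km(*origin, *parse_point(v)))
print(by_distance)
```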