Rebuilding Solr 6 examples –
layer by layer
Alexandre Rafalovitch
www.solr-start.com
Who am I
• Software developer with 20+ years of experience
– Including 3 years as Senior Tech Support (BEA Weblogic)
• Solr popularizer
• Published book author on Solr Indexing (for Solr 4.3)
• Run http://www.solr-start.com resource site
• Solr committer (since August 2016)
• Past and present Solr focus on onboarding, usability,
tooling, information sharing
Example catch-22
• Search is a (surprisingly) complex area of expertise
• Solr is a complex product
– Wide
– Deep
– History-rich
• And so are its many examples
Fasten the seatbelt
• Review all of the (Solr 6.2) OOTB examples
• Make a small one from scratch
• Deconstruct a real shipped example
• Next learning action...
OOTB Examples – how many?
bin/solr start -e
-e <example> Name of the example to run; available examples:
cloud: SolrCloud example
techproducts: Comprehensive example
illustrating many of Solr's core capabilities
dih: Data Import Handler
schemaless: Schema-less example
techproducts example
• Used to be collection1
• solr.home: example/techproducts/solr
– Can restart with
bin/solr start -s example/techproducts/solr
– Actual core at
example/techproducts/solr/techproducts
techproducts example (cont.)
• Source configuration
– server/solr/configsets/sample_techproducts_configs
– Not actually a configset (copy, not share)
• Can be rebuilt
rm -rf example/techproducts
• Has data (14 files of products, money, utf8 tests)
bin/post -c techproducts example/exampledocs/*.xml
schemaless example
• solr.home: example/schemaless/solr
• Actual core: example/schemaless/solr/gettingstarted
• Source configuration:
– server/solr/configsets/data_driven_schema_configs
– Config you get when you do not specify one:
bin/solr create -c newcore
• No data, but can take (nearly) anything:
bin/post -c <name> example/exampledocs/*.xml
schemaless mode?
• “Let us guess what you mean”
– Auto-guess field type based on first content occurrence
– Create explicit field definitions
• booleans, dates, numbers, strings
• Always multivalued (because: who knows?!?)
• Can be configured (URP chain in solrconfig.xml)
– Rewrites managed-schema (comments begone!)
– Makes search work with
<copyField source="*" dest="_text_"/>
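A rough sketch of the "auto-guess on first occurrence" idea, in Python rather than Solr's Java. This is not the actual update request processor chain: the function names are made up for illustration, only one date layout is tried, and the multivalued type names are the ones that show up later in the films example.

```python
from datetime import datetime


def guess_field_type(value: str) -> str:
    """Guess a multivalued field type name from one raw string value."""
    text = value.strip()
    if text.lower() in ("true", "false"):
        return "booleans"
    try:
        int(text)
        return "tlongs"
    except ValueError:
        pass
    try:
        float(text)
        return "tdoubles"
    except ValueError:
        pass
    try:
        # Assumption: a single ISO-style layout; the real chain accepts several.
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        return "tdates"
    except ValueError:
        pass
    return "strings"


def guess_schema(docs):
    """The first value seen for a field decides its type; later values are ignored."""
    schema = {}
    for doc in docs:
        for name, value in doc.items():
            schema.setdefault(name, guess_field_type(str(value)))
    return schema


print(guess_schema([{"price": "10", "inStock": "true"}, {"price": "n/a"}]))
```

The second document shows the weak spot: price was already guessed as a number from "10", so a later non-numeric value no longer fits the guessed type.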
techproducts vs schemaless
• Configured techproducts vs
auto-detecting schemaless
• Strings
"name":"Test with some GB18030 encoded characters",
"name":["Test with some GB18030 encoded characters"],
• Numbers
"price":0.0, "price_c":"0.0,USD",
"price":[0.0],
• Booleans
"inStock":true,
"inStock":[true],
cloud example
• Highly configurable (unless using -noprompt)
• solr.home: example/cloud/nodeX/solr
• Source configuration is a choice
Please choose a configuration for the gettingstarted collection, available
options are: basic_configs, data_driven_schema_configs, or
sample_techproducts_configs [data_driven_schema_configs]
• Can be rebuilt:
bin/solr stop -all
rm -rf example/cloud
• Demonstrates Config API (configoverlay.json)
dih example(s)
• Data import handler – legacy, but still kicking
• solr.home: example/example-DIH/solr
• Has 5 (five!) different cores
– db - database import (example/example-DIH/hsqldb/ex.*)
– solr - import from another Solr core (configured for db core)
– mail - import from IMAP (needs some configuration)
– tika - import rich-content (example/exampledocs/solr-word.pdf)
– rss - external XML feed (very broken right now)
• Cannot be rebuilt – only emptied
bin/post -c db -type 'application/json' -d '{delete: {query:"*:*"}}'
What about: bin/solr start?
• solr.home: server/solr
• No initial collection/cores, have to create explicitly:
– With script (see bin/solr create_core -h for details):
bin/solr create -c <corename> -d <name or path>
– With Core Admin UI for non-SolrCloud:
http://localhost:8983/solr/admin/cores?action=CREATE&…
– With Collection API for SolrCloud:
http://localhost:8983/solr/admin/collections?action=CREATE&…
basic_configs configuration
• Available for cloud example
and explicit creation
• Schemaless mode is configured, not enabled
• “Minimal Solr configuration” !?!
– managed-schema: 1005 lines
– solrconfig.xml: 1484 lines
files example
• Specifically tuned for file indexing
– Augmented schemaless mode with language,
content-type guessing
– Custom /browse end-point
– Source configuration: example/files/conf
– Setup instructions: example/files/README.txt
– Bring your own data
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
films example
• Schemaless (Based on data_driven_schema_configs)
– Uses Schema API to add custom fields
– Uses schemaless for rest of fields
• Comes with its own data (1100 film records)
• Uses velocity (/browse), Schema API, Request
Parameters API (params.json)
• Setup instructions: example/films/README.txt
That was the good news
• Many examples
• Easy to get one running
• Some come with data
• Some you can throw your own data into
• Lots of comments
This is the bad news
Example               Files  Types  Fields  Dynamic fields  managed-schema size  solrconfig.xml size
basic                    46     71       4              73                 1005                 1484
data_driven              46     71       4              73                 1005                 1482
techproducts            101     66      33              28                 1149                 1701
dih db                   62     62      31              28                 1129                 1490
dih tika                  6     61       3              27                  901                 1466
files                    69     73       9              73                  517                 1508
films (data_driven+)     46     71       8              73                  481                 1482
Tip – getting these numbers
• XML extraction with XMLStarlet (XML/XSLT CLI)
– xml sel -t -m "//fieldType" -v @name -n managed-schema
– xml sel -t -m "//copyField" -c . -n managed-schema | wc -l
– xml sel -t -m "//*[@docValues]" -v "concat(local-name(), ' ', @name, ' docValues:', @docValues)" -n managed-schema
– xml sel -t -m "//requestHandler" -v "@name" -n solrconfig.xml
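If XMLStarlet is not at hand, similar counts can be pulled with Python's standard library. A minimal sketch, assuming the schema is plain XML without namespaces; schema_counts and the inline SAMPLE are illustrative, not part of Solr.

```python
import xml.etree.ElementTree as ET


def schema_counts(schema_xml: str) -> dict:
    """Count the main definition elements anywhere in a schema document."""
    root = ET.fromstring(schema_xml)
    return {
        "types": len(root.findall(".//fieldType")),
        "fields": len(root.findall(".//field")),
        "dynamic_fields": len(root.findall(".//dynamicField")),
        "copy_fields": len(root.findall(".//copyField")),
    }


# Small stand-in schema; a real run would read managed-schema from disk.
SAMPLE = """<schema name="demo" version="1.6">
  <dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="text" type="text_basic" indexed="true" stored="false" multiValued="true"/>
  <copyField source="*" dest="text"/>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="text_basic" class="solr.TextField">
    <analyzer><tokenizer class="solr.LowerCaseTokenizerFactory"/></analyzer>
  </fieldType>
</schema>"""

print(schema_counts(SAMPLE))
```

For a real configuration, pass the contents of managed-schema instead of SAMPLE.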
Why is it like this?
• Many examples predate Solr Reference Guide
• grep for options, possibilities, defaults
• Each example is a kitchen sink
“Too much of a good thing is also a bad thing”
Source: 1980s Soviet joke about Virtual Reality
Go small – managed-schema
<schema name="demo" version="1.6">
<dynamicField name="*" type="string"
indexed="true" stored="true" multiValued="true"/>
<field name="text" type="text_basic"
indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="text"/>
…
Go small – managed-schema(2)
…
<fieldType name="string" class="solr.StrField"/>
<fieldType name="text_basic" class="solr.TextField">
<analyzer>
<tokenizer class="solr.LowerCaseTokenizerFactory" />
</analyzer>
</fieldType>
</schema>
Go small – solrconfig.xml
<config>
<luceneMatchVersion>6.2.0</luceneMatchVersion>
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="df">text</str>
</lst>
</requestHandler>
</config>
Go small – load and test
• bin/solr create -c demo -d .../demo-config/
• bin/post -c demo example/exampledocs/*.xml
• Test it works, using HTTPie (HTTP CLI)
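The same check can be scripted. A small sketch using only the Python standard library; it assumes Solr on the default localhost:8983, the demo core created above, and a made-up query term (ipod), so adjust all three as needed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(core: str, query: str, rows: int = 3,
               base: str = "http://localhost:8983/solr") -> str:
    """Build a /select URL; the default field (df) comes from solrconfig.xml."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/{core}/select?{params}"


def num_found(url: str) -> int:
    """Run the query and return the hit count; needs a running Solr."""
    with urlopen(url) as response:
        return json.load(response)["response"]["numFound"]


# Only the URL is built here, so this line runs without a server.
print(select_url("demo", "ipod"))
```

Calling num_found(select_url("demo", "ipod")) against a live core should return a non-zero count if the example documents were posted.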
Go small - review
• A minimal example can be very minimal
• Some things will not work
– No uniqueKey – no way to update documents, no
SolrCloud
– No _version_ – no SolrCloud
– Everything is multiValued – no sorting
– copyField * => text – no meaningful relevancy, no
specialized analyzer chain processing
Deconstructing films example
• bin/solr create -c films
• curl http://localhost:8983/solr/films/schema ... (add name, initial_release_date)
• Index 1100 records from
– (Solr) XML,
– (generic) JSON (doc), or
– CSV format
• Search for batman
• Use /browse end-point and search for batman
• Enable highlighting in results
Initial stats for films core
Sizes (line counts)
managed-schema* 481
solrconfig.xml 1482
params.json 20
File count in conf
.txt 41
.xml 3
.json 1
managed-schema (xml) 1
* already has no comments
Deconstructing – just straight tags
• managed-schema lost comments during
construction
• Let's remove comments from solrconfig.xml
• xml ed -L -d "//comment()" solrconfig.xml
– Edit in place
– Delete XPATH
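For comparison, a comment-stripping sketch in Python. It relies on the standard ElementTree parser discarding comments by default, and unlike xml ed -L it returns the result rather than editing the file in place. The sample config and its comment are made up.

```python
import xml.etree.ElementTree as ET


def strip_comments(xml_text: str) -> str:
    # The default ElementTree parser drops comments while parsing,
    # so a parse/serialize round trip is enough to remove them.
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<config>
  <!-- An explanatory comment that will disappear -->
  <luceneMatchVersion>6.2.0</luceneMatchVersion>
</config>"""

cleaned = strip_comments(SAMPLE)
print(cleaned)
```

The whitespace around a removed comment stays behind, so blank lines may need a separate cleanup pass before line counts are comparable.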
solrconfig.xml without comments
Sizes (line counts)
managed-schema 481
solrconfig.xml 1482 => 278
params.json 20
File count in conf
.txt 41
.xml 3
.json 1
managed-schema (xml) 1
Deconstructing – what to clean
• Currently
– (explicit) fields: 8
– dynamic fields: 73
• xml sel -t -m "//dynamicField" -v @name -n managed-schema | wc -l
– types: 71
– copyFields: 1
• Let's start from dynamic fields
Deconstructing – dynamic fields
• Used dynamic fields
– do NOT modify schema
– DO show up in Admin UI, if used
– Example from different schema:
• Used/matched fields
• Generic definitions
Deconstructing – in use dynamic fields
• NO dynamic fields are used
– * is a copyField instruction
• Can remove them all
• xml ed -L -d "//dynamicField"
managed-schema
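The same deletion, sketched with Python's ElementTree for readers without XMLStarlet. The function name and the sample schema are illustrative; as with any ElementTree round trip, comments and original formatting are not preserved.

```python
import xml.etree.ElementTree as ET


def remove_dynamic_fields(schema_xml: str) -> str:
    """Return the schema with every dynamicField element removed."""
    root = ET.fromstring(schema_xml)
    # Collect (parent, child) pairs first so removal does not disturb iteration.
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if child.tag == "dynamicField"]
    for parent, child in doomed:
        parent.remove(child)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<schema name="films" version="1.6">
  <field name="id" type="string"/>
  <dynamicField name="*_s" type="string"/>
  <dynamicField name="*_txt" type="text_general"/>
  <copyField source="*" dest="_text_"/>
</schema>"""

trimmed = remove_dynamic_fields(SAMPLE)
print(trimmed)
```

Explicit fields and copyField instructions are left untouched, which matches the point above: the * here belongs to a copyField, not to a dynamic field.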
Remove dynamicFields
Sizes (line counts)
managed-schema 481 => 409
solrconfig.xml 278
params.json 20
File count in conf
.txt 41
.xml 3
.json 1
managed-schema (xml) 1
Deconstructing – field types
• How many types out of 71 do we use?
– xml sel -t -m "//field|//dynamicField" -v "@type" -n conf/managed-schema | sort -u
– long, string, strings, tdate, text_general
• But also some in solrconfig.xml
– booleans, string, strings, tdates, tdoubles, text_general,
tlongs
• Combined total: 9 field type definitions
• Delete the rest (by hand)
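The by-hand deletion could also be scripted. A sketch under stated assumptions: types referenced from solrconfig.xml are passed in explicitly through the extra argument (they are not parsed here), and the function names and sample schema are hypothetical.

```python
import xml.etree.ElementTree as ET


def types_in_use(schema_root, extra=()):
    """Type names referenced by field/dynamicField, plus any extra names."""
    used = {el.get("type")
            for tag in ("field", "dynamicField")
            for el in schema_root.iter(tag)}
    used.discard(None)
    return used | set(extra)


def prune_types(schema_xml, extra=()):
    """Drop fieldType definitions that nothing refers to."""
    root = ET.fromstring(schema_xml)
    keep = types_in_use(root, extra)
    removed = []
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "fieldType" and child.get("name") not in keep:
                parent.remove(child)
                removed.append(child.get("name"))
    return ET.tostring(root, encoding="unicode"), sorted(removed)


SAMPLE = """<schema name="films" version="1.6">
  <field name="id" type="string"/>
  <field name="name" type="text_general"/>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="text_general" class="solr.TextField"/>
  <fieldType name="tlongs" class="solr.TrieLongField" multiValued="true"/>
  <fieldType name="text_fr" class="solr.TextField"/>
</schema>"""

pruned, dropped = prune_types(SAMPLE, extra=("tlongs",))
print(dropped)
```

Here tlongs survives only because it was listed in extra, standing in for a type that the schemaless chain in solrconfig.xml still needs.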
Remove no-longer used types
Sizes (line counts)
managed-schema 409 => 34 (!!!)
solrconfig.xml 278
params.json 20
File count in conf
.txt 41
.xml 3
.json 1
managed-schema (xml) 1
Deconstructing – support files
• Inside lang directory (38 files)
– find lang -name 'stopwords_*.txt' | wc -l
• stopwords_*.txt: 30 files
• contractions_*.txt: 4 files
– find lang -type f |egrep -v 'stopwords_|contractions_'
• hyphenations_ga.txt, stemdict_nl.txt, stoptags_ja.txt,
userdict_ja.txt
Support files – still in use?
• Check for usage
– grep -o 'stopwords_.*.txt' managed-schema solrconfig.xml
– grep -o 'contractions_.*.txt' ...
– ...
• NO Matches (we no longer have related types)
– Delete the whole lang directory
• What about files just inside config directory
– Don't need currency.xml, protwords.txt
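The grep checks can be generalized into one pass over the conf directory. A sketch only: it does a plain substring match on file names, builds a throwaway directory with hypothetical contents to demonstrate, and the helper name is made up.

```python
import tempfile
from pathlib import Path

CONFIG_NAMES = ("managed-schema", "solrconfig.xml")


def unreferenced_files(conf_dir):
    """List files under conf_dir whose names never appear in the config files."""
    conf = Path(conf_dir)
    config_text = "".join(
        (conf / name).read_text(encoding="utf-8")
        for name in CONFIG_NAMES if (conf / name).exists()
    )
    return sorted(
        path.relative_to(conf).as_posix()
        for path in conf.rglob("*")
        if path.is_file()
        and path.name not in CONFIG_NAMES
        and path.name not in config_text
    )


# Tiny throwaway conf directory (hypothetical contents) to show the idea.
with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp)
    (conf / "lang").mkdir()
    (conf / "managed-schema").write_text(
        '<schema><fieldType name="t" class="solr.TextField">'
        '<analyzer><tokenizer class="solr.StandardTokenizerFactory"/>'
        '<filter class="solr.StopFilterFactory" words="stopwords.txt"/>'
        '</analyzer></fieldType></schema>', encoding="utf-8")
    (conf / "solrconfig.xml").write_text("<config/>", encoding="utf-8")
    for name in ("stopwords.txt", "protwords.txt", "lang/stopwords_fr.txt"):
        (conf / name).write_text("", encoding="utf-8")
    unused = unreferenced_files(conf)

print(unused)
```

Treat the result as a list of candidates, not a delete list: files that Solr picks up by convention, such as params.json, are not named in either config file and would be flagged too.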
Remove no-longer used support files
Sizes (line counts)
managed-schema 34
solrconfig.xml 278
params.json 20
File count in conf
.txt 41 => 2
.xml 3 => 2
.json 1
managed-schema (xml) 1
Deconstructing – actual field usage
Actual field usage - _root_
The mystery of _root_
• In the original schema – no explanations
• Documentation – used for nested documents:
To support nested documents, the schema must include an indexed/non-stored
field _root_ . The value of that field is populated automatically and is the same for
all documents in the block, regardless of the inheritance depth.
• We are not using nested documents
• And neither does any other shipped example...
Remove _root_
Sizes (line counts)
managed-schema 34 => 33
solrconfig.xml 278
params.json 20
File count in conf
.txt 2
.xml 2
.json 1
managed-schema (xml) 1
Deconstructing – text_general type
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"
multiValued="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.SynonymFilterFactory" expand="true"
ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
text_general support files
• stopwords.txt
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
• synonyms.txt
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# ......
#-----------------------------------------------------------------------
#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
cccfoo => cccbar cccbaz
fooaaa,baraaa,bazaaa
# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
#after us won't split it into two words.
# Synonym mappings can be used for spelling correction
toopixima => pixma
text_general's empty stopwords
• No file
=> default stopwords
=> English
• Empty file
=> disabled stopwords
• Currently – NOT used
text_general simplified definition
<fieldType name="text_general" class="solr.TextField"
positionIncrementGap="100" multiValued="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Remove stopwords and synonyms
Sizes (line counts)
managed-schema 33 => 26
solrconfig.xml 278
params.json 20
File count in conf
.txt 2 => 0
.xml 2
.json 1
managed-schema (xml) 1
How far did we get
Sizes (line counts)
managed-schema* 481 => 26
solrconfig.xml 1482 => 278
params.json 20
File count in conf
.txt 41 => 0
.xml 3 => 2
.json 1
managed-schema (xml) 1
* already has no comments
Deconstructing – solrconfig.xml
• solrconfig.xml is more complex than schema
• Heterogeneous Sections
• Nested definitions
• Alternative implementations (e.g. highlighter)
• Also remember
– configoverlay.json – overrides solrconfig.xml
– params.json – additional configuration parameters
solrconfig.xml – feature counts
11 requestHandler
8 lib
5 searchComponent
3 queryResponseWriter
2 initParams
1 updateRequestProcessorChain
1 updateHandler
1 requestDispatcher
1 query
1 luceneMatchVersion
1 jmx
1 indexConfig
1 directoryFactory
1 dataDir
1 codecFactory
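Counts like these can be reproduced by tallying the direct children of the config root. A short Python sketch; it assumes the numbers above refer to top-level elements, and the sample config is a made-up stand-in.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def feature_counts(solrconfig_xml: str):
    """Tally top-level element names, most frequent first."""
    root = ET.fromstring(solrconfig_xml)
    return Counter(child.tag for child in root).most_common()


SAMPLE = """<config>
  <luceneMatchVersion>6.2.0</luceneMatchVersion>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <requestHandler name="/query" class="solr.SearchHandler"/>
</config>"""

for tag, count in feature_counts(SAMPLE):
    print(count, tag)
```

Nested definitions, such as the pieces inside a highlighter searchComponent, are deliberately not counted here.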
solrconfig.xml – line counts
55:<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
52:<searchComponent class="solr.HighlightComponent" name="highlight">
18:<query>
17:<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
15:<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
13:<updateHandler class="solr.DirectUpdateHandler2">
9:<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
8:<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
8:<requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
7:<requestHandler name="/update/extract" startup="lazy"
class="solr.extraction.ExtractingRequestHandler">
7:<requestHandler name="/query" class="solr.SearchHandler">
6:<requestHandler name="/debug/dump" class="solr.DumpRequestHandler">
......
Remember, this works!
<config>
<luceneMatchVersion>6.2.0</luceneMatchVersion>
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="df">text</str>
</lst>
</requestHandler>
</config>
add-unknown-fields-to-the-schema
• Famous "schemaless" mode
• Generic, but fully configurable
• Far from perfect
– Remember, we had to manually pre-add fields
– Development, not production
– Has normalization side-effects (normalizes dates)
• Cannot remove it in our example
solrconfig.xml - highlighter
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<fragmenter name="gap" default="true"
class="solr.highlight.GapFragmenter">
<lst name="defaults">
<int name="hl.fragsize">100</int>
</lst>
</fragmenter>
<fragmenter name="regex" class="solr.highlight.RegexFragmenter">
<lst name="defaults">
<int name="hl.fragsize">70</int>
<float name="hl.regex.slop">0.5</float>
<str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
</lst>
</fragmenter>
<formatter name="html" default="true"
class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
<encoder name="html" class="solr.highlight.HtmlEncoder"/>
<fragListBuilder name="simple" class="solr.highlight.SimpleFragListBuilder"/>
<fragListBuilder name="single" class="solr.highlight.SingleFragListBuilder"/>
.......
• fragmenters
• encoders
• fragListBuilders
• fragmentBuilders
• boundaryScanners
• ....
highlighter – the truth
• Highlighter searchComponent is in default stack
• The params are a mix of standard highlighter,
alternative FastVector highlighter
• Cannot use FastVector version as schema fields
are missing termVectors, etc
• And standard highlighter params are same as
implicit values
• Therefore, we can remove the WHOLE definition
Remove highlighter
Sizes (line counts)
managed-schema 26
solrconfig.xml 278 => 226
params.json 20
File count in conf
.txt 0
.xml 2
.json 1
managed-schema (xml) 1
Other searchComponents
• Not on the default stack
– spellcheck
– term
– termVector
– elevator
• Have dedicated requestHandlers
• Inception (example within example)
• Can be deleted
– also delete elevate.xml
15:<searchComponent name="spellcheck"
class="solr.SpellCheckComponent">
17:<requestHandler name="/spell"
class="solr.SearchHandler" startup="lazy">
1:<searchComponent name="terms"
class="solr.TermsComponent"/>
9:<requestHandler name="/terms"
class="solr.SearchHandler" startup="lazy">
1:<searchComponent name="tvComponent"
class="solr.TermVectorComponent"/>
8:<requestHandler name="/tvrh"
class="solr.SearchHandler" startup="lazy">
4:<searchComponent name="elevator"
class="solr.QueryElevationComponent">
8:<requestHandler name="/elevate"
class="solr.SearchHandler" startup="lazy">
Remove custom searchComponents
Sizes (line counts)
managed-schema 26
solrconfig.xml 226 => 163
params.json 20
File count in conf
.txt 0
.xml 2 => 1
.json 1
managed-schema (xml) 1
solrconfig.xml – more stuff
• There is more that can be taken out
– query section, since you have to tune it anyway
– updateHandler, and revert to basic commits
– jmx
– enableRemoteStreaming – definitely take that out
• But keep velocity, browse, search support
Next action
• Join the (virtual) Solr Example Reading Group
– Starts November 2016
– Register at http://bit.ly/SolrERG
• Join mailing list at http://www.solr-start.com
– Get the link to the presentation source
– Learn about other similar projects
– Get news of Solr articles and projects on the web
Dev8d Apache Solr Tutorial
Sourcesense
 
Logstash
LogstashLogstash
Logstash
琛琳 饶
 
JS Essence
JS EssenceJS Essence
JS Essence
Uladzimir Piatryka
 
Advanced guide to develop ajax applications using dojo
Advanced guide to develop ajax applications using dojoAdvanced guide to develop ajax applications using dojo
Advanced guide to develop ajax applications using dojo
Fu Cheng
 
Solr workshop
Solr workshopSolr workshop
Solr workshop
Yasas Senarath
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
Alexander Tokarev
 
Rails Tips and Best Practices
Rails Tips and Best PracticesRails Tips and Best Practices
Rails Tips and Best Practices
David Keener
 
Using existing language skillsets to create large-scale, cloud-based analytics
Using existing language skillsets to create large-scale, cloud-based analyticsUsing existing language skillsets to create large-scale, cloud-based analytics
Using existing language skillsets to create large-scale, cloud-based analytics
Microsoft Tech Community
 
WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...
Fabio Franzini
 
Deploying Machine Learning Models to Production
Deploying Machine Learning Models to ProductionDeploying Machine Learning Models to Production
Deploying Machine Learning Models to Production
Anass Bensrhir - Senior Data Scientist
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
Antonio García-Domínguez
 
Hot tutorials
Hot tutorialsHot tutorials
Hot tutorials
Kanagaraj M
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Lucidworks
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science Bootcamp
Kais Hassan, PhD
 
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentOpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
Alkacon Software GmbH & Co. KG
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Kai Chan
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Lucidworks (Archived)
 
Alfresco Content Modelling and Policy Behaviours
Alfresco Content Modelling and Policy BehavioursAlfresco Content Modelling and Policy Behaviours
Alfresco Content Modelling and Policy Behaviours
J V
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
DataArt
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
lucenerevolution
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
Sourcesense
 
Advanced guide to develop ajax applications using dojo
Advanced guide to develop ajax applications using dojoAdvanced guide to develop ajax applications using dojo
Advanced guide to develop ajax applications using dojo
Fu Cheng
 
Rails Tips and Best Practices
Rails Tips and Best PracticesRails Tips and Best Practices
Rails Tips and Best Practices
David Keener
 
Using existing language skillsets to create large-scale, cloud-based analytics
Using existing language skillsets to create large-scale, cloud-based analyticsUsing existing language skillsets to create large-scale, cloud-based analytics
Using existing language skillsets to create large-scale, cloud-based analytics
Microsoft Tech Community
 
WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...
Fabio Franzini
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
Antonio García-Domínguez
 
Ad

Recently uploaded (20)

Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 

Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)

  • 1. Rebuilding Solr 6 examples – layer by layer Alexandre Rafalovitch www.solr-start.com
  • 2. Who am I • Software developer with 20+ years of experience – Including 3 years as Senior Tech Support (BEA Weblogic) • Solr popularizer • Published book author on Solr Indexing (for Solr 4.3) • Run http://www.solr-start.com resource site • Solr committer (since August 2016) • Past and present Solr focus on onboarding, usability, tooling, information sharing
  • 3. Example catch-22 • Search is a – surprisingly - complex expertise • Solr is a complex product – Wide – Deep – History-rich • And so are its many examples
  • 4. Fasten the seatbelt • Review all of the (Solr 6.2) OOTB examples • Make a small one from scratch • Deconstruct a real shipped example • Next learning action...
  • 5. OOTB Examples – how many? bin/solr start -e • Usage text: -e <example> Name of the example to run; available examples: • cloud: SolrCloud example • techproducts: Comprehensive example illustrating many of Solr's core capabilities • dih: Data Import Handler • schemaless: Schema-less example
  • 6. techproducts example • Used to be collection1 • solr.home: example/techproducts/solr – Can restart with bin/solr start -s example/techproducts/solr – Actual core at example/techproducts/solr/techproducts
  • 7. techproducts example (cont.) • Source configuration – server/solr/configsets/sample_techproducts_configs – Not actually a configset (copy, not share) • Can be rebuilt: rm -rf example/techproducts • Has data (14 files of products, money, utf8 tests) bin/post -c techproducts example/exampledocs/*.xml
  • 8. schemaless example • solr.home: example/schemaless/solr • Actual core: example/schemaless/solr/gettingstarted • Source configuration: – server/solr/configsets/data_driven_schema_configs – Config you get when you do not specify one: bin/solr create -c newcore • No data, but can take (nearly) anything: bin/post -c <name> example/exampledocs/*.xml
  • 9. schemaless mode? • “Let us guess what you mean” – Auto-guess field type based on first content occurrence – Create explicit field definitions • booleans, dates, numbers, strings • Always multivalued (because: who knows?!?) • Can be configured (URP chain in solrconfig.xml) – Rewrites managed-schema (comments begone!) – Makes search work with <copyField source="*" dest="_text_"/>
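The "guess on first occurrence" behaviour can be illustrated with a small sketch. This is not Solr's implementation (that lives in the update request processor chain in solrconfig.xml); the function names, the order of the checks, and the single accepted date format are assumptions made for illustration. The returned type names are the multivalued types named later in this deck.

```python
from datetime import datetime


def guess_field_type(value: str) -> str:
    """Guess a multivalued field type name from one raw string value."""
    text = value.strip()
    if text.lower() in ("true", "false"):
        return "booleans"
    try:
        int(text)
        return "tlongs"
    except ValueError:
        pass
    try:
        float(text)
        return "tdoubles"
    except ValueError:
        pass
    try:
        # Assumption: only one ISO-style date format is recognised here.
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        return "tdates"
    except ValueError:
        return "strings"


def build_schema(docs):
    """The first occurrence of a field fixes its type; later values do not change it."""
    schema = {}
    for doc in docs:
        for name, value in doc.items():
            schema.setdefault(name, guess_field_type(value))
    return schema
```

In this sketch, `build_schema([{"price": "10"}, {"price": "9.99"}])` types `price` as `tlongs`, because only the first value is consulted. That is the kind of wrong guess that makes pre-declaring important fields worthwhile.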
  • 10. techproducts vs schemaless • Configured techproducts vs auto-detecting schemaless • Strings – techproducts: "name":"Test with some GB18030 encoded characters", – schemaless: "name":["Test with some GB18030 encoded characters"], • Numbers – techproducts: "price":0.0, "price_c":"0.0,USD", – schemaless: "price":[0.0], • Booleans – techproducts: "inStock":true, – schemaless: "inStock":[true],
  • 11. cloud example • Highly configurable (unless using -noprompt) • solr.home: example/cloud/nodeX/solr • Source configuration is a choice: Please choose a configuration for the gettingstarted collection, available options are: basic_configs, data_driven_schema_configs, or sample_techproducts_configs [data_driven_schema_configs] • Can be rebuilt: bin/solr stop -all; rm -rf example/cloud • Demonstrates Config API (configoverlay.json)
  • 12. dih example(s) • Data import handler – legacy, but still kicking • solr.home: example/example-DIH/solr • Has 5 (five!) different cores – db - database import (example/example-DIH/hsqldb/ex.*) – solr - import from another Solr core (configured for db core) – mail - import from IMAP (needs some configuration) – tika - import rich-content (example/exampledocs/solr-word.pdf) – rss - external XML feed (very broken right now) • Cannot be rebuilt – only emptied bin/post -c db -type 'application/json' -d '{delete: {query:"*:*"}}'
  • 13. What about: bin/solr start? • solr.home: server/solr • No initial collection/cores, have to create explicitly: – With script (see bin/solr create_core -h for details): bin/solr create -c <corename> -d <name or path> – With Core Admin API for non-SolrCloud: http://localhost:8983/solr/admin/cores?action=CREATE&… – With Collection API for SolrCloud: http://localhost:8983/solr/admin/collections?action=CREATE&…
  • 14. basic_configs configuration • Available for cloud example and explicit creation • Schemaless mode is configured, not enabled • “Minimal Solr configuration” !?! – managed-schema: 1005 lines – solrconfig.xml: 1484 lines
  • 15. files example • Specifically tuned for file indexing – Augmented schemaless mode with language, content-type guessing – Custom /browse end-point – Source configuration: example/files/conf – Setup instructions: example/files/README.txt – Bring your own data
  • 17. films example • Schemaless (Based on data_driven_schema_configs) – Uses Schema API to add custom fields – Uses schemaless for rest of fields • Comes with its own data (1100 film records) • Uses velocity (/browse), Schema API, Request Parameters API (params.json) • Setup instructions: example/films/README.txt
  • 18. That was the good news • Many examples • Easy to get one running • Some come with data • Some you can throw your own data into • Lots of comments
  • 19. This is the bad news
      Example               Files  Types  Fields  Dynamic fields  managed-schema (lines)  solrconfig.xml (lines)
      basic                    46     71       4              73                    1005                    1484
      data_driven              46     71       4              73                    1005                    1482
      techproducts            101     66      33              28                    1149                    1701
      dih db                   62     62      31              28                    1129                    1490
      dih tika                  6     61       3              27                     901                    1466
      files                    69     73       9              73                     517                    1508
      films (data_driven+)     46     71       8              73                     481                    1482
  • 20. Tip – getting these numbers • XML extraction with XMLStarlet (XSLT CLI) – xml sel -t -m "//fieldType" -v @name -n managed-schema – xml sel -t -m "//copyField" -c . -n managed-schema | wc -l – xml sel -t -m "//*[@docValues]" -v "concat(local-name(), ' ', @name, ' docValues:', @docValues)" -n managed-schema – xml sel -t -m "//requestHandler" -v "@name" -n solrconfig.xml
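When XMLStarlet is not available, similar numbers can be collected with Python's standard library. The sketch below mirrors the first three queries above; `SAMPLE_SCHEMA` and `schema_stats` are made-up names, and the sample is a tiny invented schema, not one of the shipped files.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature schema, used only to exercise the function.
SAMPLE_SCHEMA = """<schema name="sample" version="1.6">
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="name" type="text_general" docValues="false"/>
  <dynamicField name="*_s" type="string" docValues="true"/>
  <copyField source="*" dest="_text_"/>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="text_general" class="solr.TextField"/>
</schema>"""


def schema_stats(xml_text):
    """Collect field type names, element counts and docValues settings."""
    root = ET.fromstring(xml_text)
    return {
        "types": [el.get("name") for el in root.iter("fieldType")],
        "fields": len(root.findall(".//field")),
        "dynamic_fields": len(root.findall(".//dynamicField")),
        "copy_fields": len(root.findall(".//copyField")),
        "doc_values": [
            f"{el.tag} {el.get('name')} docValues:{el.get('docValues')}"
            for el in root.iter()
            if el.get("docValues") is not None
        ],
    }


if __name__ == "__main__":
    for key, value in schema_stats(SAMPLE_SCHEMA).items():
        print(key, value)
```

For a real configuration, read the text of `managed-schema` from disk and pass it to `schema_stats`.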
  • 21. Why is it like this? • Many examples predate Solr Reference Guide • grep for options, possibilities, defaults • Each example is a kitchen sink “Too much of a good thing is also a bad thing” Source: 1980s Soviet joke about Virtual Reality
  • 22. Go small – managed-schema <schema name="demo" version="1.6"> <dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/> <field name="text" type="text_basic" indexed="true" stored="false" multiValued="true"/> <copyField source="*" dest="text"/> …
  • 23. Go small – managed-schema(2) … <fieldType name="string" class="solr.StrField"/> <fieldType name="text_basic" class="solr.TextField"> <analyzer> <tokenizer class="solr.LowerCaseTokenizerFactory" /> </analyzer> </fieldType> </schema>
  • 24. Go small – solrconfig.xml <config> <luceneMatchVersion>6.2.0</luceneMatchVersion> <requestHandler name="/select" class="solr.SearchHandler"> <lst name="defaults"> <str name="df">text</str> </lst> </requestHandler> </config>
  • 25. Go small – load and test • bin/solr create -c demo -d .../demo-config/ • bin/post -c demo example/exampledocs/*.xml • Test it works, using HTTPie (HTTP CLI)
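The minimal configuration from the preceding slides can be written to disk and checked for well-formedness before it is handed to `bin/solr create`. The sketch assumes that the two schema fragments join with nothing in between. The function name is hypothetical. Whether `-d` expects the files directly in the directory or under a `conf` subdirectory should be checked against your Solr version.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

MANAGED_SCHEMA = """<schema name="demo" version="1.6">
  <dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="text" type="text_basic" indexed="true" stored="false" multiValued="true"/>
  <copyField source="*" dest="text"/>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="text_basic" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
  </fieldType>
</schema>
"""

SOLRCONFIG = """<config>
  <luceneMatchVersion>6.2.0</luceneMatchVersion>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="df">text</str>
    </lst>
  </requestHandler>
</config>
"""


def write_demo_config(target_dir):
    """Write both files into target_dir and return the file names created."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    files = {"managed-schema": MANAGED_SCHEMA, "solrconfig.xml": SOLRCONFIG}
    for name, content in files.items():
        ET.fromstring(content)  # raises ParseError if the XML is not well-formed
        (target / name).write_text(content, encoding="utf-8")
    return sorted(p.name for p in target.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_demo_config(Path(tmp) / "demo-config"))
```

The parse step is cheap insurance: curly quotes pasted from a slide into an attribute would make the file fail here rather than at core creation time.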
  • 27. Go small - review • Minimal example could be very minimal • Some things will not work – No uniqueKey – no way to update documents, no SolrCloud – No _version_ – no SolrCloud – Everything is multiValued – no sorting – copyField * => text, no meaningful relevancy, specialized analyzer chain processing
  • 28. Deconstructing films example • bin/solr create -c films • curl http://localhost:8983/solr/films/schema ... (add name, initial_release_date) • Index 1100 records from – (Solr) XML, – (generic) JSON (doc), or – CSV format • Search for batman • Use /browse end-point and search for batman • Enable highlighting in results
  • 30. Initial stats for films core • Sizes (line counts): managed-schema* 481, solrconfig.xml 1482, params.json 20 • File count in conf: .txt 41, .xml 3, .json 1, managed-schema (xml) 1 • * already has no comments
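The Schema API step can be sketched as a request builder. The slide elides the actual curl arguments, so everything in the payload beyond the two field names is an assumption: the field types (`text_general`, `tdate`) are taken from the types this deck later reports as in use, and the `multiValued`/`stored` flags are illustrative. The helper name is hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request


def build_add_field_request(base_url, core, fields):
    """Build (but do not send) a Schema API POST that adds the given fields."""
    payload = {"add-field": fields}
    return urllib.request.Request(
        url=f"{base_url}/solr/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed field definitions; only the two names come from the slide.
FILM_FIELDS = [
    {"name": "name", "type": "text_general", "multiValued": False, "stored": True},
    {"name": "initial_release_date", "type": "tdate", "stored": True},
]

request = build_add_field_request("http://localhost:8983", "films", FILM_FIELDS)
# Sending is left to the caller, e.g. urllib.request.urlopen(request),
# and needs a running Solr with a "films" core.
```

Declaring these two fields up front keeps schemaless guessing from fixing their types from whatever the first indexed record happens to contain.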
  • 31. Deconstructing – just straight tags • managed-schema lost comments during construction • Let's remove comments from solrconfig.xml • xml ed -L -d "//comment()" solrconfig.xml – Edit in place – Delete XPATH
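A standard-library alternative to the XMLStarlet command is sketched below. It relies on ElementTree's default parser discarding comments, so a parse/serialize round trip removes them. It also drops any XML declaration and leaves blank lines where comments used to be, which is why the helper counts only non-blank lines. The sample text and function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for a heavily commented solrconfig.xml.
SAMPLE = """<config>
  <!-- The default high-performance update handler -->
  <luceneMatchVersion>6.2.0</luceneMatchVersion>
  <!--
     A multi-line comment
     describing the handler
  -->
  <requestHandler name="/select" class="solr.SearchHandler"/>
</config>"""


def strip_comments(xml_text):
    """Return the XML re-serialized without comments."""
    root = ET.fromstring(xml_text)  # default parser drops comments
    return ET.tostring(root, encoding="unicode")


def non_blank_lines(text):
    """Count lines that contain something other than whitespace."""
    return sum(1 for line in text.splitlines() if line.strip())


if __name__ == "__main__":
    cleaned = strip_comments(SAMPLE)
    print(non_blank_lines(SAMPLE), "->", non_blank_lines(cleaned))
```

Unlike `xml ed -L`, this does not edit in place; write the returned string back to a file yourself if that is what you want.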
  • 32. solrconfig.xml without comments • Sizes (line counts): managed-schema 481, solrconfig.xml 1482 → 278, params.json 20 • File count in conf: .txt 41, .xml 3, .json 1, managed-schema (xml) 1
  • 33. Deconstructing – what to clean • Currently – (explicit) fields: 8 – dynamic fields: 73 • xml sel -t -m "//dynamicField" -v @name -n managed-schema | wc -l – types: 71 – copyFields: 1 • Let's start from dynamic fields
  • 34. Deconstructing – dynamic fields • Used dynamic fields – do NOT modify schema – DO show up in Admin UI, if used – Example from different schema: • Used/matched fields • Generic definitions
  • 35. Deconstructing – in use dynamic fields
  • 36. Deconstructing – in use dynamic fields • NO dynamic fields are used – * is a copyField instruction • Can remove them all • xml ed -L -d "//dynamicField" managed-schema
  • 37. Remove dynamicFields • Sizes (line counts): managed-schema 481 → 409, solrconfig.xml 278, params.json 20 • File count in conf: .txt 41, .xml 3, .json 1, managed-schema (xml) 1
  • 38. Deconstructing – field types • How many types out of 71 do we use? – xml sel -t -m "//field|//dynamicField" -v "@type" -n conf/managed-schema | sort -u – long, string, strings, tdate, text_general • But also some in solrconfig.xml – booleans, string, strings, tdates, tdoubles, text_general, tlongs • Combined total: 9 field type definitions • Delete the rest (by hand)
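The "delete the rest by hand" step could also be scripted. The sketch below assumes that `fieldType` elements are direct children of `<schema>` and that types referenced only from solrconfig.xml are passed in explicitly through `extra_types`. The function name and the sample schema are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: four types in use, three candidates for removal.
SAMPLE = """<schema name="films" version="1.6">
  <field name="id" type="string"/>
  <field name="name" type="text_general"/>
  <field name="initial_release_date" type="tdate"/>
  <field name="_version_" type="long"/>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="long" class="solr.TrieLongField"/>
  <fieldType name="tdate" class="solr.TrieDateField"/>
  <fieldType name="text_general" class="solr.TextField"/>
  <fieldType name="booleans" class="solr.BoolField" multiValued="true"/>
  <fieldType name="text_ja" class="solr.TextField"/>
  <fieldType name="currency" class="solr.CurrencyField"/>
</schema>"""


def prune_field_types(schema_xml, extra_types=()):
    """Drop fieldType definitions no field, dynamicField or extra_types entry refers to."""
    root = ET.fromstring(schema_xml)
    used = {
        el.get("type")
        for el in root.iter()
        if el.tag in ("field", "dynamicField") and el.get("type")
    }
    used.update(extra_types)
    removed = []
    for field_type in root.findall("fieldType"):  # direct children only
        if field_type.get("name") not in used:
            root.remove(field_type)
            removed.append(field_type.get("name"))
    return ET.tostring(root, encoding="unicode"), sorted(used), removed


if __name__ == "__main__":
    _, used, removed = prune_field_types(SAMPLE, extra_types=("booleans",))
    print("kept:", used)
    print("removed:", removed)
```

Note that the round trip through ElementTree also drops comments, which does not matter here because managed-schema has already lost them.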
  • 39. Remove no-longer used types • Sizes (line counts): managed-schema 409 → 34 (!!!), solrconfig.xml 278, params.json 20 • File count in conf: .txt 41, .xml 3, .json 1, managed-schema (xml) 1
  • 40. Deconstructing – support files • Inside lang directory (38 files) – find lang -name 'stopwords_*.txt' | wc -l • stopwords_*.txt: 30 files • contractions_*.txt: 4 files – find lang -type f | egrep -v 'stopwords_|contractions_' • hyphenations_ga.txt, stemdict_nl.txt, stoptags_ja.txt, userdict_ja.txt
  • 41. Support files – still in use? • Check for usage – grep -o 'stopwords_.*.txt' managed-schema solrconfig.xml – grep -o 'contractions_.*.txt' ... – ... • NO Matches (we no longer have related types) – Delete the whole lang directory • What about files just inside config directory – Don't need currency.xml, protwords.txt
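The grep check can be sketched in Python as a plain substring search over the configuration texts. The function name and sample inputs are hypothetical. Substring matching is deliberately crude: a file name that happens to appear inside a longer name or in leftover text would count as "used".

```python
def split_by_usage(config_texts, candidates):
    """Split candidate file names into (used, unused) by substring search."""
    combined = "\n".join(config_texts)
    used = sorted(name for name in candidates if name in combined)
    unused = sorted(name for name in candidates if name not in combined)
    return used, unused


if __name__ == "__main__":
    # Hypothetical inputs: one analyzer line and three support files.
    configs = ['<filter class="solr.StopFilterFactory" words="stopwords.txt"/>']
    files = ["stopwords.txt", "lang/stopwords_en.txt", "protwords.txt"]
    print(split_by_usage(configs, files))
```

In practice `config_texts` would hold the contents of managed-schema and solrconfig.xml, and `candidates` the file names found under the conf directory.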
  • 42. Remove no-longer used types • Sizes (line counts): managed-schema 34, solrconfig.xml 278, params.json 20 • File count in conf: .txt 41 → 2, .xml 3 → 2, .json 1, managed-schema (xml) 1
  • 44. Actual field usage - _root_
  • 45. The mystery of _root_ • In the original schema – no explanations • Documentation – used for nested documents: To support nested documents, the schema must include an indexed/non-stored field _root_ . The value of that field is populated automatically and is the same for all documents in the block, regardless of the inheritance depth. • We are not using nested documents • And neither does any other shipped example...
  • 46. Remove _root_ • Sizes (line counts): managed-schema 34 → 33, solrconfig.xml 278, params.json 20 • File count in conf: .txt 2, .xml 2, .json 1, managed-schema (xml) 1
  • 47. Deconstructing – text_general type <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/> <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType>
  • 48. text_general support files • stopwords.txt # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. • synonyms.txt # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # ... #----------------------------------------------------------------------- #some test synonym mappings unlikely to appear in real input text aaafoo => aaabar bbbfoo => bbbfoo bbbbar cccfoo => cccbar cccbaz fooaaa,baraaa,bazaaa # Some synonym groups specific to this example GB,gib,gigabyte,gigabytes MB,mib,megabyte,megabytes Television, Televisions, TV, TVs #notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming #after us won't split it into two words. # Synonym mappings can be used for spelling correction too pixima => pixma
  • 49. text_general's empty stopwords • No file => default stopwords => English • Empty file => disabled stopwords • Currently – NOT used
  • 50. text_general simplified definition <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType>
  • 51. Remove stopwords and synonyms • Sizes (line counts): managed-schema 33 → 26, solrconfig.xml 278, params.json 20 • File count in conf: .txt 2 → 0, .xml 2, .json 1, managed-schema (xml) 1
  • 52. How far did we get • Sizes (line counts): managed-schema* 481 → 26, solrconfig.xml 1482 → 278, params.json 20 • File count in conf: .txt 41 → 0, .xml 3 → 2, .json 1, managed-schema (xml) 1 • * already has no comments
  • 53. Deconstructing – solrconfig.xml • solrconfig.xml is more complex than schema • Heterogeneous Sections • Nested definitions • Alternative implementations (e.g. highlighter) • Also remember – configoverlay.json – overrides solrconfig.xml – params.json – additional configuration parameters
  • 54. solrconfig.xml – feature counts 11 requestHandler 8 lib 5 searchComponent 3 queryResponseWriter 2 initParams 1 updateRequestProcessorChain 1 updateHandler 1 requestDispatcher 1 query 1 luceneMatchVersion 1 jmx 1 indexConfig 1 directoryFactory 1 dataDir 1 codecFactory
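A count like the one above can be reproduced by tallying the tag names of the top-level children of `<config>`. The sketch below uses a small invented sample, not the shipped solrconfig.xml, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily shortened solrconfig.xml.
SAMPLE = """<config>
  <luceneMatchVersion>6.2.0</luceneMatchVersion>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <query/>
  <requestHandler name="/query" class="solr.SearchHandler"/>
</config>"""


def top_level_counts(solrconfig_xml):
    """Return (tag, count) pairs for direct children of the root, most frequent first."""
    root = ET.fromstring(solrconfig_xml)
    return Counter(child.tag for child in root).most_common()


if __name__ == "__main__":
    for tag, count in top_level_counts(SAMPLE):
        print(count, tag)
```

Only direct children are counted, so nested definitions (for example the fragmenters inside the highlight component) do not inflate the totals.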
  • 55. solrconfig.xml – line counts 55:<updateRequestProcessorChain name="add-unknown-fields-to-the-schema"> 52:<searchComponent class="solr.HighlightComponent" name="highlight"> 18:<query> 17:<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy"> 15:<searchComponent name="spellcheck" class="solr.SpellCheckComponent"> 13:<updateHandler class="solr.DirectUpdateHandler2"> 9:<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy"> 8:<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy"> 8:<requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy"> 7:<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler"> 7:<requestHandler name="/query" class="solr.SearchHandler"> 6:<requestHandler name="/debug/dump" class="solr.DumpRequestHandler"> ......
  • 56. Remember, this works! <config> <luceneMatchVersion>6.2.0</luceneMatchVersion> <requestHandler name="/select" class="solr.SearchHandler"> <lst name="defaults"> <str name="df">text</str> </lst> </requestHandler> </config>
  • 57. add-unknown-fields-to-the-schema • Famous "schemaless" mode • Generic, but fully configurable • Far from perfect – Remember, we had to manually pre-add fields – Development, not production – Has normalization side-effects (normalizes dates) • Cannot remove it in our example
  • 58. solrconfig.xml - highlighter <searchComponent class="solr.HighlightComponent" name="highlight"> <highlighting> <fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter"> <lst name="defaults"> <int name="hl.fragsize">100</int> </lst> </fragmenter> <fragmenter name="regex" class="solr.highlight.RegexFragmenter"> <lst name="defaults"> <int name="hl.fragsize">70</int> <float name="hl.regex.slop">0.5</float> <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str> </lst> </fragmenter> <formatter name="html" default="true" class="solr.highlight.HtmlFormatter"> <lst name="defaults"> <str name="hl.simple.pre"><![CDATA[<em>]]></str> <str name="hl.simple.post"><![CDATA[</em>]]></str> </lst> </formatter> <encoder name="html" class="solr.highlight.HtmlEncoder"/> <fragListBuilder name="simple" class="solr.highlight.SimpleFragListBuilder"/> <fragListBuilder name="single" class="solr.highlight.SingleFragListBuilder"/> ....... • fragmenters • encoders • fragListBuilders • fragmentBuilders • boundaryScanners • ....
  • 59. highlighter – the truth • Highlighter searchComponent is in the default stack • The params are a mix of standard highlighter and alternative FastVector highlighter settings • Cannot use the FastVector version, as schema fields are missing termVectors, etc. • And the standard highlighter params are the same as the implicit values • Therefore, we can remove the WHOLE definition
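Because the highlight component sits in the default search stack, highlighting can be switched on purely through request parameters, with no `<searchComponent>` definition in solrconfig.xml. The sketch below only builds such a request URL and does not contact a server. The base URL, the core name `mycore`, and the field names `name` and `features` are assumptions for illustration; `q`, `hl`, `hl.fl` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, core, query, fields):
    """Build a /select URL that enables highlighting via request params only."""
    params = {
        "q": query,
        "hl": "true",                # turn the default highlighter on
        "hl.fl": ",".join(fields),   # fields to generate snippets for
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical host, core and field names.
url = highlight_query_url(
    "http://localhost:8983/solr", "mycore", "name:ipod", ["name", "features"]
)
print(url)
```

If the snippets look the same before and after deleting the explicit highlighter definition, that is consistent with the slide's point that the shipped parameters match the implicit defaults.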
  • 60. Remove highlighter • Sizes (line counts, before → after): managed-schema 26; solrconfig.xml 278 → 226; params.json 20 • File count in conf: .txt 0; .xml 2; .json 1; managed-schema (xml) 1
  • 61. Other searchComponents • Not on the default stack – spellcheck – term – termVector – elevator • Have dedicated requestHandlers • Inception (example within example) • Can be deleted – also delete elevate.xml 15:<searchComponent name="spellcheck" class="solr.SpellCheckComponent"> 17:<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy"> 1:<searchComponent name="terms" class="solr.TermsComponent"/> 9:<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy"> 1:<searchComponent name="tvComponent" class="solr.TermVectorComponent"/> 8:<requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy"> 4:<searchComponent name="elevator" class="solr.QueryElevationComponent"> 8:<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
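The deletion on slide 61 can also be scripted rather than done by hand. This is a minimal sketch, assuming the component and handler names shown on the slide. `SAMPLE_CONFIG` is a hypothetical trimmed config and `strip_examples` is a helper name invented here. It removes each dedicated searchComponent together with its matching requestHandler.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed config; a real solrconfig.xml has many more sections.
SAMPLE_CONFIG = """<config>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <searchComponent name="terms" class="solr.TermsComponent"/>
  <requestHandler name="/terms" class="solr.SearchHandler" startup="lazy"/>
  <searchComponent name="elevator" class="solr.QueryElevationComponent"/>
  <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy"/>
</config>"""

# Names taken from slide 61: components and their dedicated handlers.
REMOVE = {
    "searchComponent": {"spellcheck", "terms", "tvComponent", "elevator"},
    "requestHandler": {"/spell", "/terms", "/tvrh", "/elevate"},
}


def strip_examples(config_xml):
    """Return the parsed config with the example components removed."""
    root = ET.fromstring(config_xml)
    # Iterate over a copy, since elements are removed while looping.
    for child in list(root):
        if child.get("name") in REMOVE.get(child.tag, set()):
            root.remove(child)
    return root


remaining = [(c.tag, c.get("name")) for c in strip_examples(SAMPLE_CONFIG)]
print(remaining)
```

Note that ElementTree's default parser discards XML comments, so writing the result back would also drop any comments in the file. The script does not touch elevate.xml, which the slide says should be deleted as well.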
  • 62. Remove custom searchComponents • Sizes (line counts, before → after): managed-schema 26; solrconfig.xml 226 → 163; params.json 20 • File count in conf (before → after): .txt 0; .xml 2 → 1; .json 1; managed-schema (xml) 1
  • 63. solrconfig.xml – more stuff • There is more that can be taken out – query section, since you have to tune it anyway – updateHandler, and revert to basic commits – jmx – enableRemoteStreaming – definitely take that out • But keep velocity, browse, search support
  • 64. Next action • Join the (virtual) Solr Example Reading Group – Starts November 2016 – Register at http://bit.ly/SolrERG • Join mailing list at http://www.solr-start.com – Get the link to the presentation source – Learn about other similar projects – Get news of Solr articles and projects on the web

Editor's Notes

  • #5: I fully expect you to get lost several times along the way and hopefully find your way again. But don't worry, at the end, I will give you a way forward even if you got completely lost.
  • #8: I Don't Think It Means What You Think It Means
  • #13: Small tool makes good
  • #20: Too many files, types, fields, dynamic fields, and copyFields. And every example is “same same but different”
  • #44: Looking at bottom-right corner
  • #46: Actually, I added the description to the documentation – while preparing these slides. It's nice when committers dogfood, isn't it?
  • #48: We have split analyzers, because
  • #51: Both stopwords and synonyms here use text configuration files, but Solr now ships with REST-managed implementations as well, which you may prefer. You can see them in techproducts configuration