Harnessing The Power of Search
André Ricardo Barreto de Oliveira ("Arbo")
Software Engineer - Team Lead - Search
Darmstadt, Germany
7 October, 2015
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
What's Search
and why is it so cool?
The dawn of Search
Searching higher
Search and the Digital Experience
Understanding Search
Inside the Search Engine
The Index Documents Fields
Not that different from ye olde database?...
Indexing documents
PUT /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
PUT /megacorp/employee/2
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
PUT /megacorp/employee/3
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
Queries and Filters
GET /megacorp/employee/_search?q=last_name:Smith
"hits": [
{
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
GET /megacorp/employee/_search
{
"query" : {
"filtered" : {
"filter" : {
"range" : {
"age" : { "gt" : 21 }
}
},
"query" : {
"match" : {
"last_name" : "smith"
}
}
}
}
}
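The filtered request above pairs a yes/no filter (age greater than 21) with a scored query (last_name matches "smith"). A minimal in-memory sketch of that split follows, in plain Java with no Elasticsearch dependency. The `Employee` class and the `search` method are hypothetical names used only for illustration, and the case-insensitive comparison is only a rough stand-in for an analyzed field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FilteredSearchSketch {

    // Hypothetical stand-in for an indexed document.
    static final class Employee {
        final String firstName;
        final String lastName;
        final int age;

        Employee(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }
    }

    // The three employees indexed on the earlier slide.
    static final List<Employee> EMPLOYEES = Arrays.asList(
        new Employee("John", "Smith", 25),
        new Employee("Jane", "Smith", 32),
        new Employee("Douglas", "Fir", 35));

    // Applies the age filter, then the last-name match, and returns
    // the first names of the documents that pass both.
    static List<String> search(int minAgeExclusive, String lastName) {
        List<String> firstNames = new ArrayList<>();
        for (Employee e : EMPLOYEES) {
            boolean passesFilter = e.age > minAgeExclusive;
            boolean matchesQuery = e.lastName.equalsIgnoreCase(lastName);
            if (passesFilter && matchesQuery) {
                firstNames.add(e.firstName);
            }
        }
        return firstNames;
    }

    public static void main(String[] args) {
        System.out.println(search(21, "smith"));
    }
}
```

In a real engine the filter contributes no score and can be cached, which is why the slide keeps it separate from the query.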
Full-Text Search
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
"hits": [
{
"_score": 0.16273327,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
"_score": 0.016878016,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
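The two hits above come back with different scores because one document matches both query terms and the other matches only one. The sketch below imitates that ordering with a deliberately naive score: the number of distinct query terms found in the document. This is an assumption made for illustration and is not the formula the engine uses (which also weighs term frequency, rarity and field length); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class TermOverlapScoreSketch {

    // Lowercases the text and splits it on anything that is not a letter or digit.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // Toy score: how many distinct query terms appear in the document.
    static int score(String query, String document) {
        Set<String> matched = terms(query);
        matched.retainAll(terms(document));
        return matched.size();
    }

    // Returns only the matching documents, highest toy score first.
    static List<String> rank(String query, List<String> documents) {
        List<String> hits = new ArrayList<>();
        for (String document : documents) {
            if (score(query, document) > 0) {
                hits.add(document);
            }
        }
        hits.sort((a, b) -> Integer.compare(score(query, b), score(query, a)));
        return hits;
    }

    public static void main(String[] args) {
        List<String> abouts = Arrays.asList(
            "I like to collect rock albums",
            "I love to go rock climbing",
            "I like to build cabinets");
        System.out.println(rank("rock climbing", abouts));
    }
}
```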
Analysis and Analyzers
Set the shape to semi-transparent by calling Set_Trans(5)
Standard analyzer
set, the, shape, to, semi, transparent, by, calling, set_trans, 5
Simple analyzer
set, the, shape, to, semi, transparent, by, calling, set, trans
Whitespace analyzer
Set, the, shape, to, semi-transparent, by, calling, Set_Trans(5)
English language analyzer
set, shape, semi, transpar, call, set_tran, 5
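To make the difference between analyzers concrete, here is a small plain-Java approximation of two of them. It assumes the whitespace analyzer only splits on whitespace and the simple analyzer splits on non-letters and lowercases; the real analyzers are Lucene components, and `AnalyzerSketch` is a hypothetical name, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class AnalyzerSketch {

    // Whitespace-style: split on whitespace only, keep case and punctuation.
    static List<String> whitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Simple-style: split on anything that is not a letter, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Set the shape to semi-transparent by calling Set_Trans(5)";
        System.out.println(whitespace(text));
        System.out.println(simple(text));
    }
}
```

Run on the sentence from the slide, both methods produce the token lists shown above for their respective analyzers.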
Field mappings
{
"number_of_clicks": {
"type": "integer"
}
}
{
"tag": {
"type": "string",
"index": "not_analyzed"
}
}
{
"tweet": {
"type": "string",
"analyzer": "english"
}
}
Analytics and Aggregations
GET /megacorp/employee/_search
{
"query": {
"match": {
"last_name": "smith"
}
},
"aggs" : {
"all_interests" : {
"terms" : { "field" : "interests" },
"aggs" : {
"avg_age" : {
"avg" : { "field" : "age" }
}
}
}
}
}
"buckets": [
{
"key": "music",
"doc_count": 2,
"avg_age": {
"value": 28.5
}
},
{
"key": "sports",
"doc_count": 1,
"avg_age": {
"value": 25
}
}
]
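The buckets above can be reproduced by hand: keep the employees whose last name matches, group them by each of their interests, and average the ages per group. A minimal in-memory sketch under that reading follows; `InterestAggregationSketch` and its methods are hypothetical names, and the data is the three employees indexed earlier.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class InterestAggregationSketch {

    static final class Employee {
        final String lastName;
        final int age;
        final List<String> interests;

        Employee(String lastName, int age, String... interests) {
            this.lastName = lastName;
            this.age = age;
            this.interests = Arrays.asList(interests);
        }
    }

    static final List<Employee> EMPLOYEES = Arrays.asList(
        new Employee("Smith", 25, "sports", "music"),
        new Employee("Smith", 32, "music"),
        new Employee("Fir", 35, "forestry"));

    // One bucket per interest; each bucket holds {doc_count, sum of ages}.
    static Map<String, int[]> buckets(String lastName) {
        Map<String, int[]> buckets = new TreeMap<>();
        for (Employee e : EMPLOYEES) {
            if (!e.lastName.equalsIgnoreCase(lastName)) {
                continue; // the query part: only matching documents are aggregated
            }
            for (String interest : e.interests) {
                int[] bucket = buckets.computeIfAbsent(interest, k -> new int[2]);
                bucket[0]++;
                bucket[1] += e.age;
            }
        }
        return buckets;
    }

    static int docCount(String lastName, String interest) {
        int[] bucket = buckets(lastName).get(interest);
        return bucket == null ? 0 : bucket[0];
    }

    static double avgAge(String lastName, String interest) {
        int[] bucket = buckets(lastName).get(interest);
        return bucket == null ? Double.NaN : (double) bucket[1] / bucket[0];
    }

    public static void main(String[] args) {
        for (String key : buckets("smith").keySet()) {
            System.out.println(key + " " + docCount("smith", key) + " " + avgAge("smith", key));
        }
    }
}
```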
The Liferay Search Infrastructure
The Liferay Search architecture
Liferay Portal: assets (web content, message boards, wiki pages...)
Search infrastructure: (magic happens here)
Search engine(s): indices, documents, analysis...
The Liferay Search Engine plugins
public interface SearchEngine {
public IndexSearcher getIndexSearcher();
public IndexWriter getIndexWriter();
}
public class ElasticsearchSearchEngine
extends BaseSearchEngine
public class ElasticsearchIndexSearcher
extends BaseIndexSearcher
public class ElasticsearchIndexWriter
extends BaseIndexWriter
public class SolrSearchEngine
extends BaseSearchEngine
public class SolrIndexSearcher
extends BaseIndexSearcher
public class SolrIndexWriter
extends BaseIndexWriter
The Liferay Document Mappings
Solr: schema.xml
<fields>
  <field indexed="true" name="articleId" stored="true" type="string_keyword_lowercase" />
  <field indexed="true" name="companyId" stored="true" type="long" />
  <field indexed="true" name="emailAddress" stored="true" type="string" />
</fields>
Elasticsearch: liferay-type-mappings.json
"LiferayDocumentType": {
"properties": {
"articleId": {
"analyzer": "keyword_lowercase",
"store": "yes",
"type": "string"
},
"companyId": {
"index": "not_analyzed",
"store": "yes",
"type": "string"
},
"emailAddress": {
"index": "not_analyzed",
"store": "yes",
"type": "string"
}
}
}
From Portal assets to Index documents…
public interface Indexer<T> {
public Document getDocument(T object);
}
public class JournalArticleIndexer extends BaseIndexer<JournalArticle> {
    protected Document doGetDocument(JournalArticle journalArticle) {
        Document document = getBaseModelDocument(CLASS_NAME, journalArticle);
        // languageId and content are defined elsewhere (not shown on the slide)
        document.addText(
            LocalizationUtil.getLocalizedName(Field.CONTENT, languageId),
            content);
        document.addKeyword(
            Field.VERSION, journalArticle.getVersion());
        document.addDate(
            "displayDate", journalArticle.getDisplayDate());
        return document;
    }
}
public class MBMessageIndexer extends BaseIndexer<MBMessage> {
    protected Document doGetDocument(MBMessage mbMessage) {
        Document document = getBaseModelDocument(CLASS_NAME, mbMessage);
        document.addText(
            Field.CONTENT, processContent(mbMessage));
        // discussion is defined elsewhere (not shown on the slide)
        document.addKeyword(
            "discussion", discussion != null);
        if (mbMessage.isAnonymous()) {
            document.remove(Field.USER_NAME);
        }
        return document;
    }
}
public interface Document {
public void addKeyword(String name, String value);
public void addNumber(String name, long value);
}
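The indexers above all follow one pattern: take a portal model object and copy selected properties into named, typed fields of a document. A self-contained sketch of that pattern follows. Every type in it (`SimpleDocument`, `SimpleIndexer`, `Article`, `ArticleIndexer`) is a hypothetical stand-in written for this example; none of them is Liferay's actual API, which has many more methods and field types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IndexerSketch {

    // Hypothetical minimal document: field name -> value.
    static final class SimpleDocument {
        final Map<String, Object> fields = new LinkedHashMap<>();

        void addKeyword(String name, String value) {
            fields.put(name, value);
        }

        void addNumber(String name, long value) {
            fields.put(name, value);
        }
    }

    // Hypothetical indexer contract: one model object in, one document out.
    interface SimpleIndexer<T> {
        SimpleDocument getDocument(T object);
    }

    // Hypothetical model object standing in for a portal asset.
    static final class Article {
        final long companyId;
        final String articleId;
        final String version;

        Article(long companyId, String articleId, String version) {
            this.companyId = companyId;
            this.articleId = articleId;
            this.version = version;
        }
    }

    static final class ArticleIndexer implements SimpleIndexer<Article> {
        @Override
        public SimpleDocument getDocument(Article article) {
            SimpleDocument document = new SimpleDocument();
            document.addKeyword("articleId", article.articleId);
            document.addNumber("companyId", article.companyId);
            document.addKeyword("version", article.version);
            return document;
        }
    }

    public static void main(String[] args) {
        Article article = new Article(20155L, "ART-1", "1.0");
        System.out.println(new ArticleIndexer().getDocument(article).fields);
    }
}
```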
… from Search Box to queries and filters
public class JournalArticleIndexer
extends BaseIndexer<JournalArticle> {
public void postProcessSearchQuery(
BooleanQuery searchQuery,
BooleanFilter fullQueryBooleanFilter,
SearchContext searchContext) {
addSearchTerm(searchQuery, searchContext,
Field.ARTICLE_ID, false);
addSearchLocalizedTerm(searchQuery, searchContext,
Field.CONTENT, false);
addSearchLocalizedTerm(searchQuery, searchContext,
Field.TITLE, false);
addSearchTerm(searchQuery, searchContext,
Field.USER_NAME, false);
}
}
public class MBThreadIndexer
extends BaseIndexer<MBThread> {
public void postProcessContextBooleanFilter(
BooleanFilter contextBooleanFilter,
SearchContext searchContext) {
contextBooleanFilter.addRequiredTerm(
"discussion", discussion);
if ((endDate > 0) && (startDate > 0)) {
contextBooleanFilter.addRangeTerm(
"lastPostDate", startDate, endDate);
}
}
}
Classic query types (and filters)
TermQuery / TermFilter
"term" : { "locale" : "de_DE" }
TermRangeQuery / RangeTermFilter
"range" : { "age" :
{ "gte" : 8, "lte" : 42 } }
WildcardQuery
"wildcard" : { "company" : "L*ray" }
StringQuery
"query_string": { "query": "(content:this OR name:this) AND (content:that OR name:that)" }
BooleanQuery / BooleanFilter
"bool" : {
"must" : {
"term" : { "locale" : "de_DE" }
},
"must_not" : {
"range" : {
"age" : { "from" : 8, "to" : 42 }
}
},
"should" : [
{
"wildcard" : { "company" : "L*ray" }
},
{
"term" : { "product" : "Portal" }
}
]
}
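The bool example above reads as: the locale must be de_DE, the age must not fall between 8 and 42, and each matching "should" clause makes the document more relevant. The sketch below evaluates exactly those clauses against an in-memory record. It assumes the range bounds are inclusive and counts matching "should" clauses as a stand-in for a relevance score; `BoolQuerySketch` and `Doc` are hypothetical names.

```java
final class BoolQuerySketch {

    // Hypothetical document with just the fields the slide's query touches.
    static final class Doc {
        final String locale;
        final int age;
        final String company;
        final String product;

        Doc(String locale, int age, String company, String product) {
            this.locale = locale;
            this.age = age;
            this.company = company;
            this.product = product;
        }
    }

    // Returns -1 if the document is excluded by "must" or "must_not";
    // otherwise the number of "should" clauses it satisfies.
    static int evaluate(Doc d) {
        boolean must = "de_DE".equals(d.locale);         // term: locale
        boolean mustNot = d.age >= 8 && d.age <= 42;     // range: age, bounds assumed inclusive
        if (!must || mustNot) {
            return -1;
        }
        int shouldMatches = 0;
        if (d.company.matches("L.*ray")) {               // wildcard: L*ray
            shouldMatches++;
        }
        if ("Portal".equals(d.product)) {                // term: product
            shouldMatches++;
        }
        return shouldMatches;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(new Doc("de_DE", 50, "Liferay", "Portal")));
        System.out.println(evaluate(new Doc("de_DE", 30, "Liferay", "Portal")));
    }
}
```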
Speaking to the Search Engine
public interface Query {
public BooleanFilter getPreBooleanFilter();
public Filter getPostFilter();
}
public interface Filter {
public Boolean isCached();
}
public class StringQueryTranslatorImpl
implements StringQueryTranslator {
public QueryBuilder translate(StringQuery stringQuery) {
// Elasticsearch Client Java API
return QueryBuilders.queryStringQuery(stringQuery.getQuery());
}
}
public class ElasticsearchIndexSearcher extends BaseIndexSearcher {
protected SearchResponse doSearch(
SearchContext searchContext, Query query) {
// Elasticsearch Client Java API
Client client = _elasticsearchConnectionManager.getClient();
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(
getSelectedIndexNames(queryConfig, searchContext));
QueryBuilder queryBuilder = _queryTranslator.translate(
query, searchContext);
searchRequestBuilder.setQuery(queryBuilder);
SearchResponse searchResponse = searchRequestBuilder.get();
return searchResponse;
}
}
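The translator classes above convert engine-neutral query objects into whatever the target engine understands. To show the idea without the Elasticsearch client on the classpath, this sketch translates two hypothetical query types straight into JSON text. The `TermQuery` and `WildcardQuery` classes here are simplified stand-ins written for the example, and the string building assumes field names and values need no JSON escaping.

```java
final class QueryTranslatorSketch {

    // Hypothetical engine-neutral query types.
    static final class TermQuery {
        final String field;
        final String value;

        TermQuery(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    static final class WildcardQuery {
        final String field;
        final String pattern;

        WildcardQuery(String field, String pattern) {
            this.field = field;
            this.pattern = pattern;
        }
    }

    // Dispatches on the query type and emits the engine-specific form.
    static String translate(Object query) {
        if (query instanceof TermQuery) {
            TermQuery term = (TermQuery) query;
            return leaf("term", term.field, term.value);
        }
        if (query instanceof WildcardQuery) {
            WildcardQuery wildcard = (WildcardQuery) query;
            return leaf("wildcard", wildcard.field, wildcard.pattern);
        }
        throw new IllegalArgumentException("Unsupported query: " + query);
    }

    // Builds {"type":{"field":"value"}}; no escaping is attempted.
    private static String leaf(String type, String field, String value) {
        return "{\"" + type + "\":{\"" + field + "\":\"" + value + "\"}}";
    }

    public static void main(String[] args) {
        System.out.println(translate(new TermQuery("locale", "de_DE")));
        System.out.println(translate(new WildcardQuery("company", "L*ray")));
    }
}
```

Keeping translation outside the query classes is what lets the same query objects be served by either engine plugin.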
Search in Liferay 7
What's new in Liferay 7
Liferay 6
● Embedded Lucene by default
● Remote: Solr only
● Solr 4
● Portal-centric Lucene clustering
Liferay 7
● Embedded Elasticsearch by default
● Remote: Elasticsearch and Solr
● Solr 5.x and SolrCloud
● Native, transparent Elasticsearch clustering
● Queries + Filters + Boosting + Geolocation
● Extensibility and modularization
● Enterprise extras
○ Shield for security
○ Marvel for cluster monitoring
○ Kibana for visualization
New Queries
MatchQuery
"match" : {
"subject" : {
"query" : "Liferay Portal",
"type" : "phrase"
}
}
MoreLikeThisQuery
"more_like_this" : {
"fields" : ["title", "content"],
"like_text" : "Search In Liferay 7",
"min_term_freq" : 1, "max_query_terms" : 12
}
DisMaxQuery
"dis_max" : {
"tie_breaker" : 0.7,
"queries" : [
{ "term" : { "age" : 34 } },
{ "term" : { "age" : 35 } }
]
}
FuzzyQuery
"fuzzy" : {
"user" : {
"value" : "ed",
"fuzziness" : 2,
"max_expansions": 100
}
}
MatchAllQuery / MatchAllFilter
"match_all" : {
"boost" : 1.2
}
MultiMatchQuery
"multi_match" : {
"query": "Enterprise. Open Source. For Life",
"type": "most_fields",
"fields": [ "title", "title.original", "title.shingles" ]
}
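Of the queries above, FuzzyQuery is the one whose matching rule is least obvious: a term matches when it is within a given number of single-character edits of the search value. The sketch below computes plain Levenshtein distance and applies the fuzziness of 2 from the slide. It is an illustration only: the engine's own distance measure may differ (for example in how it treats transposed characters), and `EditDistanceSketch` is a hypothetical name.

```java
final class EditDistanceSketch {

    // Classic Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // A candidate term matches if it is within `fuzziness` edits of the value.
    static boolean fuzzyMatches(String value, String candidate, int fuzziness) {
        return levenshtein(value, candidate) <= fuzziness;
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("ed", "fred", 2));
        System.out.println(fuzzyMatches("ed", "edward", 2));
    }
}
```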
New Filters
ExistsFilter
"exists" : { "field" : "emailAddress" }
MissingFilter
"missing" : { "field" : "emailAddress" }
PrefixFilter
"prefix" : { "product" : "life" }
TermsFilter
"terms" : { "locale" : ["de_DE", "pt_BR", "en_CA"] }
QueryFilter
"fquery" : {
"query" : {
"bool" : {
"must" : [
{
"wildcard" : { "company" : "L*ray" }
},
{
"term" : { "product" : "Portal" }
}
]
}
},
"_cache" : true
}
Geolocation filters
GeoDistanceFilter
"geo_distance" : {
"distance" : "12km",
"pin.location" : {
"lat" : 40,
"lon" : -70
}
}
GeoBoundingBoxFilter
"geo_bounding_box" : {
"pin.location" : {
"top_left" : {
"lat" : 40.73,
"lon" : -74.1
},
"bottom_right" : {
"lat" : 40.01,
"lon" : -71.12
}
}
}
GeoDistanceRangeFilter
"geo_distance_range" : {
"from" : "200km",
"to" : "400km",
"pin.location" : {
"lat" : 40,
"lon" : -70
}
}
GeoPolygonFilter
"geo_polygon" : {
"person.location" : {
"points" : [
[-70, 40],
[-80, 30],
[-90, 20]
]
}
}
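A distance filter such as geo_distance boils down to computing the great-circle distance between the document's point and the query point and comparing it with the limit. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. Both choices are assumptions for illustration; the engine may use a different distance computation, and `GeoDistanceSketch` is a hypothetical name.

```java
final class GeoDistanceSketch {

    // Mean Earth radius in kilometres (an assumption for this sketch).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two lat/lon points, haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
            * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True when the document point lies within maxKm of the query point.
    static boolean withinDistance(
            double docLat, double docLon, double pinLat, double pinLon, double maxKm) {
        return haversineKm(docLat, docLon, pinLat, pinLon) <= maxKm;
    }

    public static void main(String[] args) {
        // Query point from the slide: lat 40, lon -70, distance 12km.
        System.out.println(withinDistance(40.05, -70, 40, -70, 12));
        System.out.println(withinDistance(41, -70, 40, -70, 12));
    }
}
```

The geo_distance_range filter follows from the same function by checking a lower bound as well as an upper one.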
Query-time boosting
"should": [
{
"match": {
"title": {
"query": "Liferay Portal",
"boost": 2
}
}
},
{
"match": {
"content": {
"query": "Liferay Portal",
}
}
}
]
New Aggregations: Top Hits
"terms": {
"field": "conference",
"size": 2
},
"aggs": {
"talks": {
"top_hits": {
"size" : 1,
"sort": [
{
"attendees": {
"order": "desc"
}
}
]
}
}
}
{
"key": "Liferay DEVCON",
"talks": {
"hits": [
{
"_source": {
"title": "The Power of Search"
}
}
]
}
},
{
"key": "Liferay North America Symposium",
"talks": {
"hits": [
{
"_source": {
"title": "The ELK Stack"
}
}
]
}
}
New Aggregations: Extended Stats
"extended_stats" : {
"field" : "attendees"
}
"attendees_per_talk_stats": {
"count": 9,
"min": 72,
"max": 99,
"avg": 86,
"sum": 774,
"sum_of_squares": 67028,
"std_deviation": 7.180219742846005
}
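The figures above are consistent with each other: the average and the standard deviation can be derived from count, sum and sum_of_squares alone. The sketch below does so using the population variance (sum of squares divided by count, minus the squared mean), which is the form that reproduces the std_deviation on the slide. `ExtendedStatsSketch` is a hypothetical name.

```java
final class ExtendedStatsSketch {

    static double avg(long count, double sum) {
        return sum / count;
    }

    // Population variance: E[x^2] - (E[x])^2.
    static double variance(long count, double sum, double sumOfSquares) {
        double mean = avg(count, sum);
        return sumOfSquares / count - mean * mean;
    }

    static double stdDeviation(long count, double sum, double sumOfSquares) {
        return Math.sqrt(variance(count, sum, sumOfSquares));
    }

    public static void main(String[] args) {
        // Values from the slide: count 9, sum 774, sum_of_squares 67028.
        System.out.println(avg(9, 774));
        System.out.println(stdDeviation(9, 774, 67028));
    }
}
```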
Modularity and Search
● OSGi
● Liferay's default Search Engine: now a plugin in itself
● Extension points in Search
○ Node Settings contributors → fine tune your cluster
○ Index Settings contributors → fine tune your shards and logs
○ Analyzers and Mappings contributors → fine tune your fields and queries
Liferay 7: Enter Elasticsearch
Why Elasticsearch?
Best of breed
Built for modern web applications
Distributed and clusterable by design
Lucene based
Multi-tenancy
Great vendor support
Great monitoring tools: Marvel, Logstash
Great for Developers
Open Source
Amazing documentation
High "just works" factor, e.g. zero-config indexing and clustering
REST for queries, health, admin - everything
Update live settings programmatically
Great Java Client API
Pretty JSON for talks ;-)
Clustering with Liferay and Elasticsearch
Production mode
Dev mode
Scaling and tuning made easy
Enterprise-level Search in Liferay 7 EE
Security: Shield
Protect your Liferay index with a username and password
SSL/TLS encryption for traffic within the Liferay Elasticsearch cluster
Elasticsearch plugin - no need for an external security solution
Restrict access to Liferay Portal instances with IP filtering
Monitoring: Marvel
Visualization: Kibana
Thanks and happy searching!
http://j.mp/SearchLiferayDevcon2015
andre.oliveira@liferay.com
github.com/arboliveira
@arbocombr
