SlideShare a Scribd company logo
Deduplication Using Solr: Presented by Neeraj Jain, Stubhub
Deduplication using SOLR 
Neeraj Jain, Software Engineer, Stubhub, Inc. 
neeraj.adi@gmail.com
About myself 
RIDE 
ON
StubHub is about….. 
Worlds 
Largest 
Ticke<ng 
marketplace 
10M active listings 
We 
enable 
“access” 
to 
events 
We want to be more!!!
Some Fun Facts about StubHub! 
Ø An eBay owned company 
Ø Over 25 million users and growing 
Ø We sell one ticket per second 
Ø ~8.5 million page views a day, on an average 
Ø ~ 3 million additional page views per day on Mobile devices 
Ø ~10 M tickets for sale in sports, concerts and others. 
Ø ~ 1 TB of data processed monthly by the analytics infrastructure – This number will 
significantly go up as we bring in data from many of the unstructured data sources 
Ø ~300 Million SQL executions/day
Search at Stubhub! 
SOLR 
1.2 
SOLR 
3+, 
Geo 
spa<al 
search 
SOLR 
4, 
NRT 
SOLR 
Cloud 
2010 
2011 
2012 
2013 
2014
Agenda 
Ø Use case 
Ø Challenges 
Ø Legacy solution 
Ø Our approach 
Ø Results
Use Case : Content Ingestion 
Input 
record 
Pre 
deduplica<on 
Deduplica<on 
Post 
deduplica<on 
Normalize 
Filtering 
Classifica<on 
Geocode 
Review 
Insert 
Update 
Discard 
Feed-­‐1 
Feed-­‐2 
Feed-­‐3 
Feed-­‐n 
Form 
Event 
DB
Challenges : Deduplication 
Ø Problem space 
² Event 
catalog 
Ø Performance considerations 
² Real 
<me 
processing 
² Batch 
processing 
Ø Speed and data quality
Legacy Solution : Deduplication Flow 
Deduplica<onModule 
for 
each 
field 
Event 
DB 
for 
each 
document 
Client 
1: 
getDuplicates() 
2: 
getSubsetByLoca.on() 
3: 
loop 
4: 
DuplicateList 
5: 
upsert() 
Normalize 
Filter 
Compute 
Score 
Feed 
Ingestor 
Batch 
Job 
UGC
Approach : Problem Model 
Ø Milpitas 
Library 
vs 
Milpitas 
Public 
Library 
Ø 1601 
E 
7th 
St 
vs 
1601 
E. 
Seventh 
St. 
Ø Pick 
up 
the 
right 
algo, 
edit 
distance, 
jaccard. 
Library, 
Restaurant, 
etc 
Milpitas 
Library 
160 
N. 
Main 
St; 
40 
N. 
Milpitas 
Blvd. 
Distance 
: 
~0.5 
mi 
e.g. 
venue 
Boost 
name, 
street 
number 
Dup 
detec<on 
-­‐ 
name, 
address 
etc 
Subset 
-­‐ 
Text 
Similarity 
on 
Categories 
Subset 
-­‐ 
Geo 
spa<al 
distance 
Venue 
Deduplica.on
Approach : Deduplication Flow 
Feed 
Ingestor 
Client 
Deduplica<onService 
QueryBuilder 
QueryExecuter 
Scorer 
SOLR 
Index 
Batch 
Job 
UGC 
1: 
/dedupe 
3.1: 
/select 
2: 
build() 
3: 
execute() 
4: 
compute() 
7: 
/update 
6: 
DedupeResponse 
Event 
DB 
8: 
upsert() 
A1: 
poll() 
IndexUpdater 
A2: 
/update, 
/delete 
NameFilter 
AddressFilter 
*Filter 
Filter
Approach : Deduplication Service 
public interface DeduplicationService<T> { 
/** 
* Checks for duplicate entity and return a DeduplicationResponse containing information about duplicates 
found. For each possible duplicate, there is a justification as to why it's a duplicate. 
* @param t entity for which duplicates need to be found. 
* @param options use options provided by this object to find and filter the results. 
* @return a not null instance of DeduplicationResponse object. 
* @throws DeduplicationConnectivityException if there was an issue in connecting to the dedupe data 
store. 
*/ 
public DeduplicationResponse<T> findDuplicates(T t, DedupeOptions options) 
throws DeduplicationConnectivityException; 
}
Approach : Deduplication Service 
@Component(value = "VenueDeduplicationService”) 
public class VenueDeduplicationService 
implements DeduplicationService<Venue> { 
@Override 
public DeduplicationResponse<Venue> findDuplicates(Venue venue, DedupeOptions options) 
throws 
Deduplica<onConnec<vityExcep<on 
{ 
} 
} 
@Component(value = "EventDeduplicationService”) 
public class EventDeduplicationService 
implements DeduplicationService<Event> { 
@Override 
public DeduplicationResponse<Event> findDuplicates(Event event, DedupeOptions options) 
throws DeduplicationConnectivityException { 
} 
}
Approach : Optimizations 
Ø How to keep the score consistent? 
² 
<similarity 
class=“TfSimilarity"/> 
Ø Auto commit settings 
² <autoSomCommit><maxTime>5</maxTime></autoSomCommit> 
Ø Custom PostFilter 
² <queryParser 
name="fdist" 
class=“DistanceQParserPlugin"/> 
Ø Custom update handler 
² 
<processor 
class=“VenueUpdateProcessorFactory”></processor>
Results : Sample Output 
Input 
Venue 
Matched 
Venue 
Score 
Distance 
Jillian's 
Billiards 
Club 
101 
Fourth 
St. 
Jillian's 
175 
4th 
St. 
1.5573 
5.6352 
Lush 
Lounge 
1092 
Post 
St. 
Lush 
Lounge 
1221 
Polk 
St. 
12.9836 
16.6501 
Mountain 
Theatre 
10 
Panoramic 
Hwy. 
Mountain 
Theater 
Nearby 
E 
Ridgecrest 
Boulevard 
and 
Pantoll 
Road 
3.2509 
5.8913
Results : Sample Output 
Input 
Venue 
Matched 
Venue 
Score 
Distance 
The 
Hedley 
Club 
at 
Hotel 
DeAnza 
233 
W. 
Santa 
Clara 
St. 
Hedley 
Club 
233 
W. 
Santa 
Clara 
St. 
5.0805 
0.0000 
Sonya 
Paz 
Fine 
Art 
Gallery 
1793 
LafayeYe 
St. 
Sonya 
Paz 
Gallery 
and 
Studio 
1793 
LafayeYe 
St. 
Suite 
110 
6.6764 
0.0069 
Pearl 
Avenue 
Library 
Community 
Room 
4270 
Pearl 
Ave. 
Pearl 
Avenue 
Branch 
Library 
4270 
Pearl 
Ave. 
5.7024 
0.0000 
Milpitas 
Library 
160 
N. 
Main 
St. 
Milpitas 
Library 
40 
N. 
Milpitas 
Blvd. 
16.4318 
0.7284
Summary 
Ø Use case 
² Content 
inges<on 
Ø Challenges 
² Deduplica<on 
Ø Legacy solution 
Ø Our approach 
² Used 
SOLR 
for 
text 
similarity 
² Extended 
default 
behavior 
² REST 
endpoint 
over 
SOLR 
interface 
Ø Next steps 
² Big 
data 
² Performer 
matching 
² I18n 
Ø Results
Thank You
Ad

More Related Content

What's hot (20)

Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Trey Grainger
 
Webinar: Search and Recommenders
Webinar: Search and RecommendersWebinar: Search and Recommenders
Webinar: Search and Recommenders
Lucidworks
 
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, TargetJourney of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Lucidworks
 
Webinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks FusionWebinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks Fusion
Lucidworks
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
lucenerevolution
 
10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks
10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks
10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks
Lucidworks
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Lucidworks (Archived)
 
Relevancy hacks for eCommerce
Relevancy hacks for eCommerceRelevancy hacks for eCommerce
Relevancy hacks for eCommerce
Varun Thacker
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Lucidworks
 
Managed Search: Presented by Jacob Graves, Getty Images
Managed Search: Presented by Jacob Graves, Getty ImagesManaged Search: Presented by Jacob Graves, Getty Images
Managed Search: Presented by Jacob Graves, Getty Images
Lucidworks
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
Rahul Jain
 
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Lucidworks
 
Cost-based Query Optimization
Cost-based Query Optimization Cost-based Query Optimization
Cost-based Query Optimization
DataWorks Summit/Hadoop Summit
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Lucidworks
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and Spark
Lucidworks
 
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Lucidworks
 
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Ramzi Alqrainy
 
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
Lucidworks
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
lucenerevolution
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Lucidworks
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Trey Grainger
 
Webinar: Search and Recommenders
Webinar: Search and RecommendersWebinar: Search and Recommenders
Webinar: Search and Recommenders
Lucidworks
 
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, TargetJourney of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Lucidworks
 
Webinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks FusionWebinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks Fusion
Lucidworks
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
lucenerevolution
 
10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks
10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks
10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks
Lucidworks
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Lucidworks (Archived)
 
Relevancy hacks for eCommerce
Relevancy hacks for eCommerceRelevancy hacks for eCommerce
Relevancy hacks for eCommerce
Varun Thacker
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Lucidworks
 
Managed Search: Presented by Jacob Graves, Getty Images
Managed Search: Presented by Jacob Graves, Getty ImagesManaged Search: Presented by Jacob Graves, Getty Images
Managed Search: Presented by Jacob Graves, Getty Images
Lucidworks
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
Rahul Jain
 
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Lucidworks
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Lucidworks
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and Spark
Lucidworks
 
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Lucidworks
 
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Ramzi Alqrainy
 
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
The Evolution of Streaming Expressions - Joel Bernstein, Alfresco & Dennis Go...
Lucidworks
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
lucenerevolution
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Lucidworks
 

Similar to Deduplication Using Solr: Presented by Neeraj Jain, Stubhub (20)

Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015
Johann de Boer
 
Reactive Stream Processing Using DDS and Rx
Reactive Stream Processing Using DDS and RxReactive Stream Processing Using DDS and Rx
Reactive Stream Processing Using DDS and Rx
Sumant Tambe
 
Interactively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsInteractively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalytics
Johann de Boer
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
Paolo Missier
 
Consuming Data From Many Platforms: The Benefits of OData - St. Louis Day of ...
Consuming Data From Many Platforms: The Benefits of OData - St. Louis Day of ...Consuming Data From Many Platforms: The Benefits of OData - St. Louis Day of ...
Consuming Data From Many Platforms: The Benefits of OData - St. Louis Day of ...
Eric D. Boyd
 
CQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspectiveCQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspective
Maria Gomez
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
Ruhani Arora
 
5 must have patterns for your microservice - techorama
5 must have patterns for your microservice - techorama5 must have patterns for your microservice - techorama
5 must have patterns for your microservice - techorama
Ali Kheyrollahi
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
Wim Godden
 
Idempotency of commands in distributed systems
Idempotency of commands in distributed systemsIdempotency of commands in distributed systems
Idempotency of commands in distributed systems
Max Małecki
 
WebAppseqweqweqweqwewqeqweqweReImagined.pptx
WebAppseqweqweqweqwewqeqweqweReImagined.pptxWebAppseqweqweqweqwewqeqweqweReImagined.pptx
WebAppseqweqweqweqwewqeqweqweReImagined.pptx
FaisalTiftaZany1
 
Hw09 Analytics And Reporting
Hw09   Analytics And ReportingHw09   Analytics And Reporting
Hw09 Analytics And Reporting
Cloudera, Inc.
 
What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?
Matt Wood
 
.NET Fest 2018. Антон Молдован. One year of using F# in production at SBTech
.NET Fest 2018. Антон Молдован. One year of using F# in production at SBTech.NET Fest 2018. Антон Молдован. One year of using F# in production at SBTech
.NET Fest 2018. Антон Молдован. One year of using F# in production at SBTech
NETFest
 
Design and Implementation of A Data Stream Management System
Design and Implementation of A Data Stream Management SystemDesign and Implementation of A Data Stream Management System
Design and Implementation of A Data Stream Management System
Erdi Olmezogullari
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdf
sash236
 
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016
Phil Leggetter
 
eBay EDW元数据管理及应用
eBay EDW元数据管理及应用eBay EDW元数据管理及应用
eBay EDW元数据管理及应用
mysqlops
 
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Sease
 
Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015
Johann de Boer
 
Reactive Stream Processing Using DDS and Rx
Reactive Stream Processing Using DDS and RxReactive Stream Processing Using DDS and Rx
Reactive Stream Processing Using DDS and Rx
Sumant Tambe
 
Interactively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsInteractively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalytics
Johann de Boer
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
Paolo Missier
 
Consuming Data From Many Platforms: The Benefits of OData - St. Louis Day of ...
Consuming Data From Many Platforms: The Benefits of OData - St. Louis Day of ...Consuming Data From Many Platforms: The Benefits of OData - St. Louis Day of ...
Consuming Data From Many Platforms: The Benefits of OData - St. Louis Day of ...
Eric D. Boyd
 
CQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspectiveCQRS and Event Sourcing: A DevOps perspective
CQRS and Event Sourcing: A DevOps perspective
Maria Gomez
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
Ruhani Arora
 
5 must have patterns for your microservice - techorama
5 must have patterns for your microservice - techorama5 must have patterns for your microservice - techorama
5 must have patterns for your microservice - techorama
Ali Kheyrollahi
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
Wim Godden
 
Idempotency of commands in distributed systems
Idempotency of commands in distributed systemsIdempotency of commands in distributed systems
Idempotency of commands in distributed systems
Max Małecki
 
WebAppseqweqweqweqwewqeqweqweReImagined.pptx
WebAppseqweqweqweqwewqeqweqweReImagined.pptxWebAppseqweqweqweqwewqeqweqweReImagined.pptx
WebAppseqweqweqweqwewqeqweqweReImagined.pptx
FaisalTiftaZany1
 
Hw09 Analytics And Reporting
Hw09   Analytics And ReportingHw09   Analytics And Reporting
Hw09 Analytics And Reporting
Cloudera, Inc.
 
What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?
Matt Wood
 
.NET Fest 2018. Антон Молдован. One year of using F# in production at SBTech
.NET Fest 2018. Антон Молдован. One year of using F# in production at SBTech.NET Fest 2018. Антон Молдован. One year of using F# in production at SBTech
.NET Fest 2018. Антон Молдован. One year of using F# in production at SBTech
NETFest
 
Design and Implementation of A Data Stream Management System
Design and Implementation of A Data Stream Management SystemDesign and Implementation of A Data Stream Management System
Design and Implementation of A Data Stream Management System
Erdi Olmezogullari
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdf
sash236
 
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016
Real-Time Web Apps & .NET. What Are Your Options? NDC Oslo 2016
Phil Leggetter
 
eBay EDW元数据管理及应用
eBay EDW元数据管理及应用eBay EDW元数据管理及应用
eBay EDW元数据管理及应用
mysqlops
 
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Sease
 
Ad

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
Lucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
Lucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
Lucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
Lucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
Lucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
Lucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Lucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
Lucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Lucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Lucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
Lucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
Lucidworks
 
Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
Lucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
Lucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
Lucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
Lucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
Lucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
Lucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Lucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
Lucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Lucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Lucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
Lucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
Lucidworks
 
Ad

Recently uploaded (20)

Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 

Deduplication Using Solr: Presented by Neeraj Jain, Stubhub

  • 2. Deduplication using SOLR Neeraj Jain, Software Engineer, Stubhub, Inc. [email protected]
  • 4. StubHub is about….. Worlds Largest Ticke<ng marketplace 10M active listings We enable “access” to events We want to be more!!!
  • 5. Some Fun Facts about StubHub! Ø An eBay owned company Ø Over 25 million users and growing Ø We sell one ticket per second Ø ~8.5 million page views a day, on an average Ø ~ 3 million additional page views per day on Mobile devices Ø ~10 M tickets for sale in sports, concerts and others. Ø ~ 1 TB of data processed monthly by the analytics infrastructure – This number will significantly go up as we bring in data from many of the unstructured data sources Ø ~300 Million SQL executions/day
  • 6. Search at Stubhub! SOLR 1.2 SOLR 3+, Geo spa<al search SOLR 4, NRT SOLR Cloud 2010 2011 2012 2013 2014
  • 7. Agenda Ø Use case Ø Challenges Ø Legacy solution Ø Our approach Ø Results
  • 8. Use Case : Content Ingestion Input record Pre deduplica<on Deduplica<on Post deduplica<on Normalize Filtering Classifica<on Geocode Review Insert Update Discard Feed-­‐1 Feed-­‐2 Feed-­‐3 Feed-­‐n Form Event DB
  • 9. Challenges : Deduplication Ø Problem space ² Event catalog Ø Performance considerations ² Real <me processing ² Batch processing Ø Speed and data quality
  • 10. Legacy Solution : Deduplication Flow Deduplica<onModule for each field Event DB for each document Client 1: getDuplicates() 2: getSubsetByLoca.on() 3: loop 4: DuplicateList 5: upsert() Normalize Filter Compute Score Feed Ingestor Batch Job UGC
  • 11. Approach : Problem Model Ø Milpitas Library vs Milpitas Public Library Ø 1601 E 7th St vs 1601 E. Seventh St. Ø Pick up the right algo, edit distance, jaccard. Library, Restaurant, etc Milpitas Library 160 N. Main St; 40 N. Milpitas Blvd. Distance : ~0.5 mi e.g. venue Boost name, street number Dup detec<on -­‐ name, address etc Subset -­‐ Text Similarity on Categories Subset -­‐ Geo spa<al distance Venue Deduplica.on
  • 12. Approach : Deduplication Flow Feed Ingestor Client Deduplica<onService QueryBuilder QueryExecuter Scorer SOLR Index Batch Job UGC 1: /dedupe 3.1: /select 2: build() 3: execute() 4: compute() 7: /update 6: DedupeResponse Event DB 8: upsert() A1: poll() IndexUpdater A2: /update, /delete NameFilter AddressFilter *Filter Filter
  • 13. Approach : Deduplication Service public interface DeduplicationService<T> { /** * Checks for duplicate entity and return a DeduplicationResponse containing information about duplicates found. For each possible duplicate, there is a justification as to why it's a duplicate. * @param t entity for which duplicates need to be found. * @param options use options provided by this object to find and filter the results. * @return a not null instance of DeduplicationResponse object. * @throws DeduplicationConnectivityException if there was an issue in connecting to the dedupe data store. */ public DeduplicationResponse<T> findDuplicates(T t, DedupeOptions options) throws DeduplicationConnectivityException; }
  • 14. Approach : Deduplication Service @Component(value = "VenueDeduplicationService”) public class VenueDeduplicationService implements DeduplicationService<Venue> { @Override public DeduplicationResponse<Venue> findDuplicates(Venue venue, DedupeOptions options) throws Deduplica<onConnec<vityExcep<on { } } @Component(value = "EventDeduplicationService”) public class EventDeduplicationService implements DeduplicationService<Event> { @Override public DeduplicationResponse<Event> findDuplicates(Event event, DedupeOptions options) throws DeduplicationConnectivityException { } }
  • 15. Approach : Optimizations Ø How to keep the score consistent? ² <similarity class=“TfSimilarity"/> Ø Auto commit settings ² <autoSomCommit><maxTime>5</maxTime></autoSomCommit> Ø Custom PostFilter ² <queryParser name="fdist" class=“DistanceQParserPlugin"/> Ø Custom update handler ² <processor class=“VenueUpdateProcessorFactory”></processor>
  • 16. Results : Sample Output Input Venue Matched Venue Score Distance Jillian's Billiards Club 101 Fourth St. Jillian's 175 4th St. 1.5573 5.6352 Lush Lounge 1092 Post St. Lush Lounge 1221 Polk St. 12.9836 16.6501 Mountain Theatre 10 Panoramic Hwy. Mountain Theater Nearby E Ridgecrest Boulevard and Pantoll Road 3.2509 5.8913
  • 17. Results : Sample Output Input Venue Matched Venue Score Distance The Hedley Club at Hotel DeAnza 233 W. Santa Clara St. Hedley Club 233 W. Santa Clara St. 5.0805 0.0000 Sonya Paz Fine Art Gallery 1793 LafayeYe St. Sonya Paz Gallery and Studio 1793 LafayeYe St. Suite 110 6.6764 0.0069 Pearl Avenue Library Community Room 4270 Pearl Ave. Pearl Avenue Branch Library 4270 Pearl Ave. 5.7024 0.0000 Milpitas Library 160 N. Main St. Milpitas Library 40 N. Milpitas Blvd. 16.4318 0.7284
  • 18. Summary Ø Use case ² Content inges<on Ø Challenges ² Deduplica<on Ø Legacy solution Ø Our approach ² Used SOLR for text similarity ² Extended default behavior ² REST endpoint over SOLR interface Ø Next steps ² Big data ² Performer matching ² I18n Ø Results