SlideShare a Scribd company logo
Elasticsearch for reporting
analytics on communities
Elasticsearch Meetup - 23 April 2014
Marc Harrison
Lithium makes software that helps brands
better connect with their customers
Our social software helps companies respond on social networks and build
trusted content on a community they own.
Empower brands to distill terabytes of daily
data into understanding participation
▪ fast
▪ flexible
▪ scalable
What
products/services
are generating the
most conversations?
Who is authoring
content that
generates the most
kudos/likes?
Are customer posts
getting timely
replies?
What types of
content does this
audience segment
look for?
Lithium Social Intelligence (LSI)
Cluster specs
▪ One of our clusters – elastic search 1.0
• 7+ billion documents/4.4+ TB and growing fast!
• 21 nodes (3 masters, 2 clients, 16 data)
Lessons learned
▪ Bulk loading
▪ Faceting
Bulk initial load / rebuild of data
Hadoop
mysql stream
Transform/
route
…
JSON Elasticsearch
Bulk loading
▪ Make sure ingest logic is robust
• Idempotent for bulk reply - ‘_id’
• Include revision based on processor/time
• Check cluster/index status to make sure ready to ingest
▪ Know the cache and thread pool sizes
• Bulk – fixed - # of processors - queue size 50
• Handle back off and retry
▪ How many docs?
• Like capacity - test with data –
• number of shards
• index.refresh_interval: 30s
• indices.memory.index_buffer_size: 5%
• indices.memory.*
• index.translog.*
Search - time series pattern for scale
Faceting
▪ Don't forget about memory!
• Strings - not_analyzed
• Numbers long vs int, double vs float, etc
• Do you need seconds/minutes when faceting?
• fielddata format - doc_values (1.0)
• Admin API’s allow checking field data size + evictions
• indices.cache.filter.size: 15%
• indices.fielddata.cache.size: 45%
Faceting II
▪ Accuracy
• shard_size
• Number of shards
• Cardinality
• Routing
▪ Great custom plugin
framework
• Uniques
• Array faceting
Impact
▪ Order of magnitude improvement
▪ Developers able to focus on improving insights
▪ community + elasticsearch + hadoop + horton works =
exciting
Select settings (data center)
• bootstrap.mlockall: true
• cluster.routing.allocation.disk.threshold_enabled: true
• http.compression: true
• transport.tcp.compress: true
• gateway.recover_after_data_nodes: 13
• gateway.recover_after_master_nodes: 2
• gateway.recover_after_time: 3m
• gateway.expected_nodes: 17
• indices.memory.index_buffer_size: 5%
• indices.cache.filter.size: 15%
• indices.fielddata.cache.size: 45%
• index.store.type: mmapfs
• index.translog.flush_threshold_ops: 10000
• action.auto_create_index: false
• action.disable_delete_all_indices: true
• cluster.routing.allocation.node_initial_primaries_recoveries: 4
• cluster.routing.allocation.node_concurrent_recoveries: 15
• indices.recovery.max_bytes_per_sec: 100mb
• indices.recovery.concurrent_streams: 5
• discovery.zen.minimum_master_nodes: 2
• index.search.slowlog.threshold.query.warn: 5s
• index.search.slowlog.threshold.query.info: 1s
• index.indexing.slowlog.threshold.index.warn: 5s
• plugin.mandatory: lithium-unique-facets
Questions?
Ad

More Related Content

What's hot (20)

Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSession #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
SaaS Is Beautiful
 
Practical Use of a NoSQL
Practical Use of a NoSQLPractical Use of a NoSQL
Practical Use of a NoSQL
IBM Cloud Data Services
 
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
Yann Cluchey
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon Thomas
Thoughtworks
 
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big Data
MongoDB
 
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Lucidworks
 
Why Elastic? @ 50th Vinitaly 2016
Why Elastic? @ 50th Vinitaly 2016Why Elastic? @ 50th Vinitaly 2016
Why Elastic? @ 50th Vinitaly 2016
Christoph Wurm
 
MongoDB meetup at Hike
MongoDB meetup at HikeMongoDB meetup at Hike
MongoDB meetup at Hike
Bharvi Dixit
 
NoSQL for SQL Users
NoSQL for SQL UsersNoSQL for SQL Users
NoSQL for SQL Users
IBM Cloud Data Services
 
Practical Use of a NoSQL Database
Practical Use of a NoSQL DatabasePractical Use of a NoSQL Database
Practical Use of a NoSQL Database
IBM Cloud Data Services
 
Building tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsBuilding tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systems
Regunath B
 
Redis & MongoDB: Stop Big Data Indigestion Before It Starts
Redis & MongoDB: Stop Big Data Indigestion Before It StartsRedis & MongoDB: Stop Big Data Indigestion Before It Starts
Redis & MongoDB: Stop Big Data Indigestion Before It Starts
Itamar Haber
 
Enterprise Presto PaaS offering in Google Cloud
Enterprise Presto PaaS offering in Google Cloud Enterprise Presto PaaS offering in Google Cloud
Enterprise Presto PaaS offering in Google Cloud
Ashish Tadose
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
Michael Stack
 
Basic Introduction to Crate @ ViennaDB Meetup
Basic Introduction to Crate @ ViennaDB MeetupBasic Introduction to Crate @ ViennaDB Meetup
Basic Introduction to Crate @ ViennaDB Meetup
Johannes Moser
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB Cluster
MongoDB
 
Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
 Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart... Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
Ashish Tadose
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagation
Regunath B
 
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business InsightsWebinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
MongoDB
 
Cortana Analytics Workshop: Big Data @ Microsoft
Cortana Analytics Workshop: Big Data @ MicrosoftCortana Analytics Workshop: Big Data @ Microsoft
Cortana Analytics Workshop: Big Data @ Microsoft
MSAdvAnalytics
 
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSession #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
SaaS Is Beautiful
 
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
Yann Cluchey
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon Thomas
Thoughtworks
 
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big Data
MongoDB
 
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Lucidworks
 
Why Elastic? @ 50th Vinitaly 2016
Why Elastic? @ 50th Vinitaly 2016Why Elastic? @ 50th Vinitaly 2016
Why Elastic? @ 50th Vinitaly 2016
Christoph Wurm
 
MongoDB meetup at Hike
MongoDB meetup at HikeMongoDB meetup at Hike
MongoDB meetup at Hike
Bharvi Dixit
 
Building tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsBuilding tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systems
Regunath B
 
Redis & MongoDB: Stop Big Data Indigestion Before It Starts
Redis & MongoDB: Stop Big Data Indigestion Before It StartsRedis & MongoDB: Stop Big Data Indigestion Before It Starts
Redis & MongoDB: Stop Big Data Indigestion Before It Starts
Itamar Haber
 
Enterprise Presto PaaS offering in Google Cloud
Enterprise Presto PaaS offering in Google Cloud Enterprise Presto PaaS offering in Google Cloud
Enterprise Presto PaaS offering in Google Cloud
Ashish Tadose
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
Michael Stack
 
Basic Introduction to Crate @ ViennaDB Meetup
Basic Introduction to Crate @ ViennaDB MeetupBasic Introduction to Crate @ ViennaDB Meetup
Basic Introduction to Crate @ ViennaDB Meetup
Johannes Moser
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB Cluster
MongoDB
 
Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
 Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart... Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
Distributed Query Service Powered By Presto & Alluxio Across Clouds @Walmart...
Ashish Tadose
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagation
Regunath B
 
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business InsightsWebinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
MongoDB
 
Cortana Analytics Workshop: Big Data @ Microsoft
Cortana Analytics Workshop: Big Data @ MicrosoftCortana Analytics Workshop: Big Data @ Microsoft
Cortana Analytics Workshop: Big Data @ Microsoft
MSAdvAnalytics
 

Viewers also liked (8)

Fostering Community With Social Media - Midwest Newspaper Summit 2010
Fostering Community With Social Media - Midwest Newspaper Summit 2010Fostering Community With Social Media - Midwest Newspaper Summit 2010
Fostering Community With Social Media - Midwest Newspaper Summit 2010
Nathan Wright
 
How to Grow a Vibrant Community that Delivers Real Business Results
How to Grow a Vibrant Community that Delivers Real Business ResultsHow to Grow a Vibrant Community that Delivers Real Business Results
How to Grow a Vibrant Community that Delivers Real Business Results
CMX
 
Digital Innovation That Drives Business
Digital Innovation That Drives BusinessDigital Innovation That Drives Business
Digital Innovation That Drives Business
Phillip Parisis
 
Digital Engagement Journey
Digital Engagement Journey Digital Engagement Journey
Digital Engagement Journey
Kapil Bhatia
 
Building, Growing & Sustaining Social Media Communities Keynote Presentation
Building, Growing & Sustaining Social Media Communities Keynote Presentation Building, Growing & Sustaining Social Media Communities Keynote Presentation
Building, Growing & Sustaining Social Media Communities Keynote Presentation
Pam Moore
 
Social Media: Growing Brand Awareness
Social Media: Growing Brand AwarenessSocial Media: Growing Brand Awareness
Social Media: Growing Brand Awareness
Liam Dempsey
 
Deep Social: The Next Phase of Social Media
Deep Social: The Next Phase of Social MediaDeep Social: The Next Phase of Social Media
Deep Social: The Next Phase of Social Media
Ogilvy Consulting
 
Radical Candor: No BS, helping your team create better work.
Radical Candor: No BS, helping your team create better work.Radical Candor: No BS, helping your team create better work.
Radical Candor: No BS, helping your team create better work.
Digital Surgeons
 
Fostering Community With Social Media - Midwest Newspaper Summit 2010
Fostering Community With Social Media - Midwest Newspaper Summit 2010Fostering Community With Social Media - Midwest Newspaper Summit 2010
Fostering Community With Social Media - Midwest Newspaper Summit 2010
Nathan Wright
 
How to Grow a Vibrant Community that Delivers Real Business Results
How to Grow a Vibrant Community that Delivers Real Business ResultsHow to Grow a Vibrant Community that Delivers Real Business Results
How to Grow a Vibrant Community that Delivers Real Business Results
CMX
 
Digital Innovation That Drives Business
Digital Innovation That Drives BusinessDigital Innovation That Drives Business
Digital Innovation That Drives Business
Phillip Parisis
 
Digital Engagement Journey
Digital Engagement Journey Digital Engagement Journey
Digital Engagement Journey
Kapil Bhatia
 
Building, Growing & Sustaining Social Media Communities Keynote Presentation
Building, Growing & Sustaining Social Media Communities Keynote Presentation Building, Growing & Sustaining Social Media Communities Keynote Presentation
Building, Growing & Sustaining Social Media Communities Keynote Presentation
Pam Moore
 
Social Media: Growing Brand Awareness
Social Media: Growing Brand AwarenessSocial Media: Growing Brand Awareness
Social Media: Growing Brand Awareness
Liam Dempsey
 
Deep Social: The Next Phase of Social Media
Deep Social: The Next Phase of Social MediaDeep Social: The Next Phase of Social Media
Deep Social: The Next Phase of Social Media
Ogilvy Consulting
 
Radical Candor: No BS, helping your team create better work.
Radical Candor: No BS, helping your team create better work.Radical Candor: No BS, helping your team create better work.
Radical Candor: No BS, helping your team create better work.
Digital Surgeons
 
Ad

Similar to Elasticsearch meetup final_2014_04 (20)

Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
Christopher Whitaker
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016
Nisha Talagala
 
Engineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsEngineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platforms
Hisham Arafat
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
Eric Rodriguez (Hiring in Lex)
 
Elasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyElasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and Multitenancy
Bozhidar Bozhanov
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
Anubhav Kale
 
Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)
Petter Skodvin-Hvammen
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Lucidworks
 
Apache drill
Apache drillApache drill
Apache drill
MapR Technologies
 
Making Session Stores More Intelligent
Making Session Stores More IntelligentMaking Session Stores More Intelligent
Making Session Stores More Intelligent
Kyle Davis
 
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB
 
Building high performance and scalable share point applications
Building high performance and scalable share point applicationsBuilding high performance and scalable share point applications
Building high performance and scalable share point applications
Talbott Crowell
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
Jesus Rodriguez
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
S. Diana Hu
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
Joaquin Delgado PhD.
 
ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented MiddlewareADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
DATAVERSITY
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Codemotion
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
Christopher Whitaker
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016Nisha talagala keynote_inflow_2016
Nisha talagala keynote_inflow_2016
Nisha Talagala
 
Engineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsEngineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platforms
Hisham Arafat
 
Elasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyElasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and Multitenancy
Bozhidar Bozhanov
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
Anubhav Kale
 
Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)Share point 2013 enterprise search (public)
Share point 2013 enterprise search (public)
Petter Skodvin-Hvammen
 
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Lucidworks
 
Making Session Stores More Intelligent
Making Session Stores More IntelligentMaking Session Stores More Intelligent
Making Session Stores More Intelligent
Kyle Davis
 
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB
 
Building high performance and scalable share point applications
Building high performance and scalable share point applicationsBuilding high performance and scalable share point applications
Building high performance and scalable share point applications
Talbott Crowell
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
Jesus Rodriguez
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
S. Diana Hu
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
Joaquin Delgado PhD.
 
ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented MiddlewareADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
DATAVERSITY
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Codemotion
 
Ad

Recently uploaded (20)

Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 

Elasticsearch meetup final_2014_04

  • 1. Elasticsearch for reporting analytics on communities Elasticsearch Meetup - 23 April 2014 Marc Harrison
  • 2. Lithium makes software that helps brands better connect with their customers Our social software helps companies respond on social networks and build trusted content on a community they own.
  • 3. Empower brands to distill terabytes of daily data into understanding participation ▪ fast ▪ flexible ▪ scalable What products/services are generating the most conversations? Who is authoring content that generates the most kudos/likes? Are customer posts getting timely replies? What types of content does this audience segment look for?
  • 5. Cluster specs ▪ One of our clusters – elastic search 1.0 • 7+ billion documents/4.4+ TB and growing fast! • 21 nodes (3 masters, 2 clients, 16 data)
  • 6. Lessons learned ▪ Bulk loading ▪ Faceting
  • 7. Bulk initial load / rebuild of data Hadoop mysql stream Transform/ route … JSON Elasticsearch
  • 8. Bulk loading ▪ Make sure ingest logic is robust • Idempotent for bulk reply - ‘_id’ • Include revision based on processor/time • Check cluster/index status to make sure ready to ingest ▪ Know the cache and thread pool sizes • Bulk – fixed - # of processors - queue size 50 • Handle back off and retry ▪ How many docs? • Like capacity - test with data – • number of shards • index.refresh_interval: 30s • indices.memory.index_buffer_size: 5% • indices.memory.* • index.translog.*
  • 9. Search - time series pattern for scale
  • 10. Faceting ▪ Don't forget about memory! • Strings - not_analyzed • Numbers long vs int, double vs float, etc • Do you need seconds/minutes when faceting? • fielddata format - doc_values (1.0) • Admin API’s allow checking field data size + evictions • indices.cache.filter.size: 15% • indices.fielddata.cache.size: 45%
  • 11. Faceting II ▪ Accuracy • shard_size • Number of shards • Cardinality • Routing ▪ Great custom plugin framework • Uniques • Array faceting
  • 12. Impact ▪ Order of magnitude improvement ▪ Developers able to focus on improving insights ▪ community + elasticsearch + hadoop + horton works = exciting
  • 13. Select settings (data center) • bootstrap.mlockall: true • cluster.routing.allocation.disk.threshold_enabled: true • http.compression: true • transport.tcp.compress: true • gateway.recover_after_data_nodes: 13 • gateway.recover_after_master_nodes: 2 • gateway.recover_after_time: 3m • gateway.expected_nodes: 17 • indices.memory.index_buffer_size: 5% • indices.cache.filter.size: 15% • indices.fielddata.cache.size: 45% • index.store.type: mmapfs • index.translog.flush_threshold_ops: 10000 • action.auto_create_index: false • action.disable_delete_all_indices: true • cluster.routing.allocation.node_initial_primaries_recoveries: 4 • cluster.routing.allocation.node_concurrent_recoveries: 15 • indices.recovery.max_bytes_per_sec: 100mb • indices.recovery.concurrent_streams: 5 • discovery.zen.minimum_master_nodes: 2 • index.search.slowlog.threshold.query.warn: 5s • index.search.slowlog.threshold.query.info: 1s • index.indexing.slowlog.threshold.index.warn: 5s • plugin.mandatory: lithium-unique-facets