SlideShare a Scribd company logo
DEVELOP OPEN
                SOURCE SEARCH
                ENGINE
26th Feb 2012   Ritesh Ambastha – CEO, iWillStudy.com
Open Source Search Engines

          Lucen     Datapark
Sphinx               Search
                               Zettair
            e

 YaCy     Xapian    SWISH-E    Seeks


 Recoll   OpenFTS    Nutch     Namazu
Platform Ideas !




                   Credits: http://zooie.wordpress.com
Comparision




              Credits: http://zooie.wordpress.com
Comparision




              Credits: http://zooie.wordpress.com
We are going to talk
      about
 Sphinx & Apache-
       Solr
Sphinx
   Sphinx is an open source full
    text search server.
   It's written in C++ and works
    on Linux
    (RedHat, Ubuntu, etc), Window
    s, MacOS, Solaris, FreeBSD, a
    nd a few other systems.
   Sphinx lets you either batch
    index and search data stored
    in an SQL database, NoSQL
    storage, or just files quickly
    and easily
Sphinx

 Text processing features
 Searching via SphinxAPI is as simple as
  3 lines of code, and querying via
  SphinxQL is even simpler
 Sphinx clusters scale up to billions of
  documents and tens of millions search
  queries per day, powering top websites
  such as
  Craigslist, DailyMotion, NetLog, etc.
Performance and scalability
   Indexing performance: Sphinx indexes up to 10-
    15 MB of text per second per single CPU core.
   Searching performance: Searching through
    1,000,000-document, 1.2 GB text collection that
    they use for everyday development and testing runs
    at 500+ queries/sec on a 2-core desktop machine
    with 2 GB of RAM.
   Scalability: Biggest known Sphinx cluster indexes
    almost 5 billion documents, resulting in over 6 TB of
    data.
   Busiest known one is, unsurpisingly, Craigslist, top-
    10 website in the US that serves 50+ million search
Key Features
   Batch and Real-Time full-text indexes
   Non-text attributes support
   SQL database indexing
   Non-SQL storage indexing
   Easy application integration
   Advanced full-text searching syntax
   Rich database-like querying features
   Better relevance ranking
   Flexible text processing
   Distributed searching
http://lucene.apache.org/solr/
Solr is the
popular, blazing fast
open source enterprise
search platform from
the Apache Lucene
project.
Its major features include
powerful full-text search, hit
highlighting, faceted
search, dynamic
clustering, database
integration, rich document
(e.g., Word, PDF)
handling, and geospatial
Solr is written in Java
and runs as a
standalone full-text
search server within a
servlet container such
as Tomcat.
Solr Features
   Advanced Full-Text Search Capabilities
   Optimized for High Volume Web Traffic
   Standards Based Open Interfaces - XML,JSON
    and HTTP
   Comprehensive HTML Administration Interfaces
   Server statistics exposed over JMX for monitoring
   Scalability - Efficient Replication to other Solr
    Search Servers
   Flexible and Adaptable with XML configuration
   Extensible Plugin Architecture
What is it all about?
Develop open source search engine
Solr is based on Lucene
More about Lucene
Ad

More Related Content

What's hot (20)

ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
BeyondTrees
 
Elastic search
Elastic searchElastic search
Elastic search
NexThoughts Technologies
 
Resource Oriented Architecture
Resource Oriented ArchitectureResource Oriented Architecture
Resource Oriented Architecture
Aurélien Pelletier
 
Seravia in the Cloud
Seravia in the CloudSeravia in the Cloud
Seravia in the Cloud
kidrane
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
Rahul Jain
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
Ricardo Peres
 
Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with Elasticsearch
Samantha Quiñones
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Lucidworks (Archived)
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
Eric Rodriguez (Hiring in Lex)
 
ElasticSearch Basics
ElasticSearch Basics ElasticSearch Basics
ElasticSearch Basics
Satya Mohapatra
 
Use Cases for Elastic Search Percolator
Use Cases for Elastic Search PercolatorUse Cases for Elastic Search Percolator
Use Cases for Elastic Search Percolator
Maxim Shelest
 
Solr: 4 big features
Solr: 4 big featuresSolr: 4 big features
Solr: 4 big features
David Smiley
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
Kristijan Duvnjak
 
Solr 8 interview
Solr 8 interview Solr 8 interview
Solr 8 interview
Alihossein shahabi
 
E commerce Search using Apache Solr
E commerce Search using Apache SolrE commerce Search using Apache Solr
E commerce Search using Apache Solr
Rohan Makkar
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
Rahul Jain
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
ebiquity
 
Big data Intro by Kaushik Dutta
Big data Intro by Kaushik DuttaBig data Intro by Kaushik Dutta
Big data Intro by Kaushik Dutta
Kaushik Dutta
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
Ricardo Peres
 
Scaling Analytics with elasticsearch
Scaling Analytics with elasticsearchScaling Analytics with elasticsearch
Scaling Analytics with elasticsearch
dnoble00
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
BeyondTrees
 
Seravia in the Cloud
Seravia in the CloudSeravia in the Cloud
Seravia in the Cloud
kidrane
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
Rahul Jain
 
Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with Elasticsearch
Samantha Quiñones
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Lucidworks (Archived)
 
Use Cases for Elastic Search Percolator
Use Cases for Elastic Search PercolatorUse Cases for Elastic Search Percolator
Use Cases for Elastic Search Percolator
Maxim Shelest
 
Solr: 4 big features
Solr: 4 big featuresSolr: 4 big features
Solr: 4 big features
David Smiley
 
Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
Kristijan Duvnjak
 
E commerce Search using Apache Solr
E commerce Search using Apache SolrE commerce Search using Apache Solr
E commerce Search using Apache Solr
Rohan Makkar
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
Rahul Jain
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
ebiquity
 
Big data Intro by Kaushik Dutta
Big data Intro by Kaushik DuttaBig data Intro by Kaushik Dutta
Big data Intro by Kaushik Dutta
Kaushik Dutta
 
Scaling Analytics with elasticsearch
Scaling Analytics with elasticsearchScaling Analytics with elasticsearch
Scaling Analytics with elasticsearch
dnoble00
 

Similar to Develop open source search engine (20)

Get involved with the Apache Software Foundation
Get involved with the Apache Software FoundationGet involved with the Apache Software Foundation
Get involved with the Apache Software Foundation
Shalin Shekhar Mangar
 
What is Apache Solr? Check Out Its Advantages
What is Apache Solr? Check Out Its AdvantagesWhat is Apache Solr? Check Out Its Advantages
What is Apache Solr? Check Out Its Advantages
NextBrick Inc
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
ABC Talks
 
TriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr HadoopTriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr Hadoop
Grant Ingersoll
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
Rahul Jain
 
If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!
gagravarr
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solr
sagar chaturvedi
 
Semtech 2011 impressions
Semtech 2011 impressionsSemtech 2011 impressions
Semtech 2011 impressions
George Roth
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data Search
Mark Miller
 
Intelligent crawling and indexing using lucene
Intelligent crawling and indexing using luceneIntelligent crawling and indexing using lucene
Intelligent crawling and indexing using lucene
Swapnil & Patil
 
Wanna search? Piece of cake!
Wanna search? Piece of cake!Wanna search? Piece of cake!
Wanna search? Piece of cake!
Alex Kursov
 
963
963963
963
Annu Ahmed
 
In search of: A meetup about Liferay and Search 2016-04-20
In search of: A meetup about Liferay and Search   2016-04-20In search of: A meetup about Liferay and Search   2016-04-20
In search of: A meetup about Liferay and Search 2016-04-20
Tibor Lipusz
 
Building a semantic website
Building a semantic websiteBuilding a semantic website
Building a semantic website
CJ Jenkins
 
PyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonPyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in python
Chetan Giridhar
 
Oss and libraries enabling arabic libraries and creating opportunities
Oss and libraries   enabling arabic libraries and creating opportunitiesOss and libraries   enabling arabic libraries and creating opportunities
Oss and libraries enabling arabic libraries and creating opportunities
Massoud AlShareef
 
ApacheCon NA 2011 report
ApacheCon NA 2011 reportApacheCon NA 2011 report
ApacheCon NA 2011 report
Koji Kawamura
 
Elastic pivorak
Elastic pivorakElastic pivorak
Elastic pivorak
Pivorak MeetUp
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Lucidworks
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
Charlie Hull
 
Get involved with the Apache Software Foundation
Get involved with the Apache Software FoundationGet involved with the Apache Software Foundation
Get involved with the Apache Software Foundation
Shalin Shekhar Mangar
 
What is Apache Solr? Check Out Its Advantages
What is Apache Solr? Check Out Its AdvantagesWhat is Apache Solr? Check Out Its Advantages
What is Apache Solr? Check Out Its Advantages
NextBrick Inc
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
ABC Talks
 
TriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr HadoopTriHUG: Lucene Solr Hadoop
TriHUG: Lucene Solr Hadoop
Grant Ingersoll
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
Rahul Jain
 
If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!
gagravarr
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solr
sagar chaturvedi
 
Semtech 2011 impressions
Semtech 2011 impressionsSemtech 2011 impressions
Semtech 2011 impressions
George Roth
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data Search
Mark Miller
 
Intelligent crawling and indexing using lucene
Intelligent crawling and indexing using luceneIntelligent crawling and indexing using lucene
Intelligent crawling and indexing using lucene
Swapnil & Patil
 
Wanna search? Piece of cake!
Wanna search? Piece of cake!Wanna search? Piece of cake!
Wanna search? Piece of cake!
Alex Kursov
 
In search of: A meetup about Liferay and Search 2016-04-20
In search of: A meetup about Liferay and Search   2016-04-20In search of: A meetup about Liferay and Search   2016-04-20
In search of: A meetup about Liferay and Search 2016-04-20
Tibor Lipusz
 
Building a semantic website
Building a semantic websiteBuilding a semantic website
Building a semantic website
CJ Jenkins
 
PyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonPyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in python
Chetan Giridhar
 
Oss and libraries enabling arabic libraries and creating opportunities
Oss and libraries   enabling arabic libraries and creating opportunitiesOss and libraries   enabling arabic libraries and creating opportunities
Oss and libraries enabling arabic libraries and creating opportunities
Massoud AlShareef
 
ApacheCon NA 2011 report
ApacheCon NA 2011 reportApacheCon NA 2011 report
ApacheCon NA 2011 report
Koji Kawamura
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Lucidworks
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
Charlie Hull
 
Ad

More from NAILBITER (20)

Social Media Strategies
Social Media StrategiesSocial Media Strategies
Social Media Strategies
NAILBITER
 
jQuery for Beginners
jQuery for Beginners jQuery for Beginners
jQuery for Beginners
NAILBITER
 
GBGahmedabad - Create your Business Website
GBGahmedabad - Create your Business WebsiteGBGahmedabad - Create your Business Website
GBGahmedabad - Create your Business Website
NAILBITER
 
Mapathon 2013 - Google Maps Javascript API
Mapathon 2013 - Google Maps Javascript APIMapathon 2013 - Google Maps Javascript API
Mapathon 2013 - Google Maps Javascript API
NAILBITER
 
Cloud Workshop - Presentation
Cloud Workshop - PresentationCloud Workshop - Presentation
Cloud Workshop - Presentation
NAILBITER
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
NAILBITER
 
iWillStudy.com - Light Pitch
iWillStudy.com - Light PitchiWillStudy.com - Light Pitch
iWillStudy.com - Light Pitch
NAILBITER
 
Cloud Summit Ahmedabad
Cloud Summit AhmedabadCloud Summit Ahmedabad
Cloud Summit Ahmedabad
NAILBITER
 
Android Fundamentals & Figures of 2012
Android Fundamentals & Figures of 2012Android Fundamentals & Figures of 2012
Android Fundamentals & Figures of 2012
NAILBITER
 
The iPhone development on windows
The iPhone development on windowsThe iPhone development on windows
The iPhone development on windows
NAILBITER
 
Ambastha EduTech Pvt Ltd
Ambastha EduTech Pvt LtdAmbastha EduTech Pvt Ltd
Ambastha EduTech Pvt Ltd
NAILBITER
 
Branding
BrandingBranding
Branding
NAILBITER
 
Advertising
AdvertisingAdvertising
Advertising
NAILBITER
 
Location based solutions maps & your location
Location based solutions   maps & your locationLocation based solutions   maps & your location
Location based solutions maps & your location
NAILBITER
 
Html5 workshop part 1
Html5 workshop part 1Html5 workshop part 1
Html5 workshop part 1
NAILBITER
 
Android Workshop - Session 2
Android Workshop - Session 2Android Workshop - Session 2
Android Workshop - Session 2
NAILBITER
 
Android Workshop Session 1
Android Workshop Session 1Android Workshop Session 1
Android Workshop Session 1
NAILBITER
 
Linux Seminar for Beginners
Linux Seminar for BeginnersLinux Seminar for Beginners
Linux Seminar for Beginners
NAILBITER
 
Linux advanced concepts - Part 2
Linux advanced concepts - Part 2Linux advanced concepts - Part 2
Linux advanced concepts - Part 2
NAILBITER
 
Linux advanced concepts - Part 1
Linux advanced concepts - Part 1Linux advanced concepts - Part 1
Linux advanced concepts - Part 1
NAILBITER
 
Social Media Strategies
Social Media StrategiesSocial Media Strategies
Social Media Strategies
NAILBITER
 
jQuery for Beginners
jQuery for Beginners jQuery for Beginners
jQuery for Beginners
NAILBITER
 
GBGahmedabad - Create your Business Website
GBGahmedabad - Create your Business WebsiteGBGahmedabad - Create your Business Website
GBGahmedabad - Create your Business Website
NAILBITER
 
Mapathon 2013 - Google Maps Javascript API
Mapathon 2013 - Google Maps Javascript APIMapathon 2013 - Google Maps Javascript API
Mapathon 2013 - Google Maps Javascript API
NAILBITER
 
Cloud Workshop - Presentation
Cloud Workshop - PresentationCloud Workshop - Presentation
Cloud Workshop - Presentation
NAILBITER
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
NAILBITER
 
iWillStudy.com - Light Pitch
iWillStudy.com - Light PitchiWillStudy.com - Light Pitch
iWillStudy.com - Light Pitch
NAILBITER
 
Cloud Summit Ahmedabad
Cloud Summit AhmedabadCloud Summit Ahmedabad
Cloud Summit Ahmedabad
NAILBITER
 
Android Fundamentals & Figures of 2012
Android Fundamentals & Figures of 2012Android Fundamentals & Figures of 2012
Android Fundamentals & Figures of 2012
NAILBITER
 
The iPhone development on windows
The iPhone development on windowsThe iPhone development on windows
The iPhone development on windows
NAILBITER
 
Ambastha EduTech Pvt Ltd
Ambastha EduTech Pvt LtdAmbastha EduTech Pvt Ltd
Ambastha EduTech Pvt Ltd
NAILBITER
 
Location based solutions maps & your location
Location based solutions   maps & your locationLocation based solutions   maps & your location
Location based solutions maps & your location
NAILBITER
 
Html5 workshop part 1
Html5 workshop part 1Html5 workshop part 1
Html5 workshop part 1
NAILBITER
 
Android Workshop - Session 2
Android Workshop - Session 2Android Workshop - Session 2
Android Workshop - Session 2
NAILBITER
 
Android Workshop Session 1
Android Workshop Session 1Android Workshop Session 1
Android Workshop Session 1
NAILBITER
 
Linux Seminar for Beginners
Linux Seminar for BeginnersLinux Seminar for Beginners
Linux Seminar for Beginners
NAILBITER
 
Linux advanced concepts - Part 2
Linux advanced concepts - Part 2Linux advanced concepts - Part 2
Linux advanced concepts - Part 2
NAILBITER
 
Linux advanced concepts - Part 1
Linux advanced concepts - Part 1Linux advanced concepts - Part 1
Linux advanced concepts - Part 1
NAILBITER
 
Ad

Recently uploaded (20)

Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 

Develop open source search engine

  • 1. DEVELOP OPEN SOURCE SEARCH ENGINE 26th Feb 2012 Ritesh Ambastha – CEO, iWillStudy.com
  • 2. Open Source Search Engines Lucen Datapark Sphinx Search Zettair e YaCy Xapian SWISH-E Seeks Recoll OpenFTS Nutch Namazu
  • 3. Platform Ideas ! Credits: http://zooie.wordpress.com
  • 4. Comparision Credits: http://zooie.wordpress.com
  • 5. Comparision Credits: http://zooie.wordpress.com
  • 6. We are going to talk about Sphinx & Apache- Solr
  • 7. Sphinx  Sphinx is an open source full text search server.  It's written in C++ and works on Linux (RedHat, Ubuntu, etc), Window s, MacOS, Solaris, FreeBSD, a nd a few other systems.  Sphinx lets you either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily
  • 8. Sphinx  Text processing features  Searching via SphinxAPI is as simple as 3 lines of code, and querying via SphinxQL is even simpler  Sphinx clusters scale up to billions of documents and tens of millions search queries per day, powering top websites such as Craigslist, DailyMotion, NetLog, etc.
  • 9. Performance and scalability  Indexing performance: Sphinx indexes up to 10- 15 MB of text per second per single CPU core.  Searching performance: Searching through 1,000,000-document, 1.2 GB text collection that they use for everyday development and testing runs at 500+ queries/sec on a 2-core desktop machine with 2 GB of RAM.  Scalability: Biggest known Sphinx cluster indexes almost 5 billion documents, resulting in over 6 TB of data.  Busiest known one is, unsurpisingly, Craigslist, top- 10 website in the US that serves 50+ million search
  • 10. Key Features  Batch and Real-Time full-text indexes  Non-text attributes support  SQL database indexing  Non-SQL storage indexing  Easy application integration  Advanced full-text searching syntax  Rich database-like querying features  Better relevance ranking  Flexible text processing  Distributed searching
  • 12. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project.
  • 13. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial
  • 14. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat.
  • 15. Solr Features  Advanced Full-Text Search Capabilities  Optimized for High Volume Web Traffic  Standards Based Open Interfaces - XML,JSON and HTTP  Comprehensive HTML Administration Interfaces  Server statistics exposed over JMX for monitoring  Scalability - Efficient Replication to other Solr Search Servers  Flexible and Adaptable with XML configuration  Extensible Plugin Architecture
  • 16. What is it all about?
  • 18. Solr is based on Lucene