Search Engine Using Apache Lucene
2 authors, including Balasubramani Ramasamy (NMAM Institute of Technology)
International Journal of Computer Applications (0975 – 8887)
Volume 127 – No.9, October 2015
Sara Cohen et al. developed XSEarch, a semantic search engine for XML (S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv, 2003). It retrieves the parts of documents that are relevant to the user's query, ranks them using extended information retrieval methods, and presents them in ranked order.
Bhagwat and Polyzotis presented Eureka, a semantics-based file system search engine (D. Bhagwat and N. Polyzotis, 2003). It creates links between files and uses a file ranking system to order the files according to their importance.
Wang et al. presented a semantic search technique for extracting information from regular tables (H.-L. Wang, S.-H. Wu, I. Wang, C.-L. Sung, W.-L. Hsu, and W.-K. Shih, 2000). The technique recognizes the relationships between table cells, stores the data in a database, and uses a query language to retrieve information from that database.
Kandogan et al. presented Avatar, a semantic search engine that combines text search with ontology annotations.
Mädche et al. developed an ontology search engine that uses a unified method for ontology searching (A. Mädche, B. Motik, L. Stojanovic, R. Studer, and R. Volz, 2003). It uses an ontology registry to store ontology metadata and a server to store the ontologies themselves.
George Gardarin et al. presented SEWISE, which maps the text data present in web pages to an XML structure. It also makes the semantics hidden in the text available to programs.
search engine is developed using Java, and threads are created to visit each link on different pages and download the linked pages. Memory is managed efficiently by using thread pools (P. Houston, 2013).
3.2 Indexing and searching using Apache Lucene
3.2.1 Apache Lucene
Apache Lucene is an open-source library developed by the Apache Software Foundation and is commonly used to implement text-based search engines. Its API can efficiently index text documents, query the index, and fetch the documents that match a query.
Lucene can be used to search the web and databases, and it has been used by sites such as Wikipedia and LinkedIn. It is written in Java but can also be used from other languages and platforms such as Perl, Python and .NET (A. Sonawane, 2009).
Lucene's search algorithms are efficient and precise, and it returns the matching documents in order of their ranking. It supports many query types, such as PhraseQuery, WildcardQuery, RangeQuery, FuzzyQuery and BooleanQuery.
In the search engine developed here, an indexer built with the Apache Lucene API indexes the downloaded web pages (A. Sonawane, 2009).
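The indexing and searching steps described above can be illustrated with a toy inverted index. The sketch below is plain Java and is not Lucene's API or the authors' indexer: the class TinyIndex, its methods and its TF-IDF weighting are hypothetical choices made for illustration. In a Lucene-based indexer, classes such as IndexWriter and IndexSearcher fill these roles, with configurable analyzers, segment-based index files and, since Lucene 6, BM25 as the default ranking function.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical toy inverted index (NOT Lucene's API): maps each term to the
 * documents containing it, then ranks AND-query matches with a simple TF-IDF score.
 */
class TinyIndex {
    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // Crude analyzer stand-in: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Indexes one document and returns the id assigned to it.
    int addDocument(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    // Returns ids of documents containing every query term, highest score first.
    List<Integer> search(String query) {
        List<String> terms = tokenize(query);
        if (terms.isEmpty()) {
            return new ArrayList<>();
        }
        // Intersect the posting lists (all terms are required).
        Set<Integer> hits = null;
        for (String term : terms) {
            Set<Integer> ids = postings.getOrDefault(term, Map.of()).keySet();
            if (hits == null) {
                hits = new HashSet<>(ids);
            } else {
                hits.retainAll(ids);
            }
        }
        // Score each hit: sum over query terms of tf * (1 + ln(N / df)).
        Map<Integer, Double> scores = new HashMap<>();
        for (int id : hits) {
            double score = 0.0;
            for (String term : terms) {
                Map<Integer, Integer> docsWithTerm = postings.get(term);
                double idf = 1.0 + Math.log((double) documents.size() / docsWithTerm.size());
                score += docsWithTerm.get(id) * idf;
            }
            scores.put(id, score);
        }
        List<Integer> ranked = new ArrayList<>(hits);
        // Higher score first; ties broken by the smaller document id.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.addDocument("Apache Lucene is a search library");
        index.addDocument("Lucene indexes text; Lucene searches text");
        index.addDocument("JSoup parses HTML pages");
        System.out.println(index.search("lucene"));      // [1, 0]
        System.out.println(index.search("lucene text")); // [1]
    }
}
```

Here every query term is required, which mirrors a BooleanQuery whose clauses are all mandatory; with the three sample documents in main, a search for "lucene" returns [1, 0] because document 1 contains the term twice.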
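The crawler described in this section creates threads to visit each link and uses thread pools to keep memory under control (P. Houston, 2013). A minimal sketch of that idea in plain Java follows; it is not the authors' code. PooledCrawler, crawl and the fetchLinks function are hypothetical names, and fetchLinks stands in for downloading a page and extracting its links (for example with JSoup). Instead of one thread per link, a fixed-size ExecutorService processes each breadth-first level of links, so the number of threads never exceeds the pool size.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/**
 * Hypothetical crawler sketch: pages are fetched by a fixed-size thread pool,
 * so the number of live threads stays bounded however many links are found.
 */
class PooledCrawler {
    private final ExecutorService pool;
    // Stand-in for "download the page and extract its links" (e.g. with JSoup).
    private final Function<String, List<String>> fetchLinks;

    PooledCrawler(int threads, Function<String, List<String>> fetchLinks) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.fetchLinks = fetchLinks;
    }

    // Breadth-first crawl from seed; returns at most maxPages URLs in discovery order.
    // Single use: the pool is shut down when the crawl finishes.
    List<String> crawl(String seed, int maxPages) {
        Set<String> visited = new LinkedHashSet<>(); // only touched by the calling thread
        visited.add(seed);
        List<String> frontier = List.of(seed);
        try {
            while (!frontier.isEmpty()) {
                // One task per page in the current level; the pool runs them concurrently.
                List<Callable<List<String>>> tasks = new ArrayList<>();
                for (String url : frontier) {
                    tasks.add(() -> fetchLinks.apply(url));
                }
                List<String> next = new ArrayList<>();
                for (Future<List<String>> result : pool.invokeAll(tasks)) {
                    for (String link : result.get()) {
                        if (visited.size() < maxPages && visited.add(link)) {
                            next.add(link);
                        }
                    }
                }
                frontier = next;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("crawl interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("fetching a page failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        // A tiny in-memory "web" instead of real HTTP requests.
        Map<String, List<String>> web = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("c", "d"),
                "c", List.of("a"),
                "d", List.of());
        PooledCrawler crawler = new PooledCrawler(2, url -> web.getOrDefault(url, List.of()));
        System.out.println(crawler.crawl("a", 10)); // [a, b, c, d]
    }
}
```

Because only the pool's worker threads exist at any time, discovering more links grows the task list rather than the number of threads, which avoids creating one thread per link.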
the search engine. The out-of-memory error that occurred when running multiple threads was handled by creating thread pools.
Fig 4: Code for Indexing using Lucene
3.2.7 Code Snippet for calculating ranking/scores
3.2.8 Code Snippet for Searching
4. CONCLUSION
This paper discusses the techniques used in search engines and the work that has already been done in this area. It also describes the JSoup parser and its use in developing a search engine. Finally, it discusses the Apache Lucene API, whose indexing and searching APIs can be used to index the downloaded pages and to perform text-based search over the indexed documents, both of which are vital to building a search engine. As future work, the authors propose using natural language processing to mine the information available in web pages and to optimize the search engine.
5. REFERENCES
[1] V. V. Vydiswaran, Q. Mei, D. A. Hanauer, and K. Zheng, 2014, “Mining consumer health vocabulary from community-generated text,” in AMIA Annual Symposium Proceedings, vol. 2014, p. 1150, American Medical Informatics Association.
[2] H. Sampathkumar, X.-w. Chen, and B. Luo, 2014, “Mining adverse drug reactions from online healthcare forums using hidden Markov model,” BMC Medical Informatics and Decision Making, vol. 14, no. 1, p. 91.
[6] H.-L. Wang, S.-H. Wu, I. Wang, C.-L. Sung, W.-L. Hsu, and W.-K. Shih, 2000, “Semantic search on internet tabular information extraction for answering queries,” in Proceedings of the Ninth International Conference on Information and Knowledge Management, pp. 243–249, ACM.
[7] A. Mädche, B. Motik, L. Stojanovic, R. Studer, and R. Volz, 2003, “An infrastructure for searching, reusing and evolving distributed ontologies,” in Proceedings of the 12th International Conference on World Wide Web, pp. 439–448, ACM.
[9] P. Gupta and D. A. Sharma, 2010, “Context based indexing in search engines using ontology,” International Journal of Computer Applications (0975–8887), vol. 1, no. 14.
[10] P. Houston, 2013, Instant jsoup How-to, Packt Publishing Ltd.
[11] A. Sonawane, 2009, “Using Apache Lucene to search text,” online at http://www.ibm.com/developerworks/opensource/library/os-apachelucenesearch/ (as of 11 December 2013).
IJCATM : www.ijcaonline.org