Running Head: MULTILINGUALISM IN INFORMATION RETRIEVAL SYSTEMS 1
Multilingualism in Information Retrieval Systems
Ariel Hess
University of North Texas
INFO 5206
May 5, 2017
Summary/Author’s Note:
Multilingualism in information retrieval systems is a topic that researchers have examined
extensively. Building a system that accepts a query containing multiple languages and returns
results in multiple languages remains an open challenge. Information retrieval systems can be
extended with features that translate documents and queries. This paper examines the strategies
used for text translation, the projects that have implemented them, and the challenges they face.
Introduction
Most search engines provide only a monolingual search interface for documents written mostly
in English (Yang, Lee, & Chen, 2009, p. 4). Users often translate their query into English
before using a search engine. The goal of a Multilingual Retrieval System is to allow users to
search for information in multiple languages and retrieve information in multiple languages.
This is done by deploying Cross Language Retrieval, which allows the user to ask a question in
one language and retrieve the information in another.
A survey of academic users was conducted to better understand why users want access to
documents in different languages, and whether users of a digital library would want access to a
multilingual retrieval system. Most users wanted access for educational purposes: they would
use a Multilingual Information Retrieval System to complete assignments that require searching
for documents in a language other than English. The study also showed that some users felt it
would be too difficult to search for documents that contain more than one language (Wu, He, &
Luo, 2012, p. 188). The overall takeaway from the survey is that user needs must be understood
in order to determine whether such a system fits both the preexisting Information Retrieval
System and its users. Developers want to dismantle the barrier between the user query and
multilingual documents. This can be done by adjusting the Information Retrieval System to
incorporate multilingualism through translation tools and various other techniques.
Generally, a Multilingual Retrieval System works by first retrieving documents from a separate
collection for each language. A monolingual list of results is then retrieved from each
collection, and the lists are merged to create a multilingual list. Each system can be adapted
to the needs of the organization. Different tools are employed to ensure compatibility. The
Multilingual Retrieval System generally focuses on one or all of the following: document,
query, and translation.
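The retrieve-then-merge flow described above can be illustrated with a minimal sketch. It
assumes that each monolingual collection returns (document id, score) pairs and that scores
from different collections are made comparable by min-max normalization. The normalization
step and the names normalize and merge_lists are assumptions made for illustration; the
sources reviewed here do not prescribe this particular merging rule.

```python
from typing import Dict, List, Tuple

Result = Tuple[str, float]  # (document id, retrieval score)


def normalize(results: List[Result]) -> List[Result]:
    """Min-max scale the scores of one monolingual list into [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so every document gets the same weight.
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def merge_lists(per_language: Dict[str, List[Result]]) -> List[Tuple[str, str, float]]:
    """Combine one result list per language into a single multilingual list."""
    merged = []
    for lang, results in per_language.items():
        for doc, score in normalize(results):
            merged.append((lang, doc, score))
    # Highest normalized score first; ties are broken by language and id
    # so that the output order is deterministic.
    return sorted(merged, key=lambda r: (-r[2], r[0], r[1]))


per_language = {
    "en": [("en1", 1.2), ("en2", 0.6)],
    "es": [("es1", 9.0), ("es2", 3.0), ("es3", 6.0)],
}
for lang, doc, score in merge_lists(per_language):
    print(lang, doc, round(score, 2))
```

Normalizing before merging matters because raw scores from different collections are not on
the same scale, so sorting the raw scores would favor whichever collection scores highest.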
Approach/ Methods
A Multilingual Retrieval System is built from a variety of tools and features. The system has
three levels of concern: query, translation, and document. These areas are addressed through
different techniques, such as creating a dictionary based model. Each Multilingual Retrieval
System has its own features and deploys different methods for retrieval. These methods are
adaptable and are tailored to the audience the system is intended for.
Text mining is the process of deriving quality information from unstructured text (Yang, Lee,
& Chen, 2009, p. 4). “Text mining in a multilingual setting [is also incorporated as] an
automated process that is [designed] to discover the relationship between languages” (Yang,
Hsiao, & Lee, 2011, p. 648). Three techniques are often employed to deal with the problem of
creating a multilingual friendly system: using a machine translation system, using a bilingual
dictionary or terminology base, and using a statistical/probabilistic model based on parallel
texts.
Query translation is a strategy in which the user’s query is translated into each language
present in the multilingual collection, generating one monolingual information retrieval process
per language (García-Cumbreras, Martínez-Santiago, & Ureña-López, 2011, p. 414). The most
common form of query search depends on concepts of natural language. A dictionary based tool
uses a bilingual list of words to translate query terms into different languages. Machine
translation, by contrast, translates every document in the corpus into multiple languages.
Corpus based retrieval tools use knowledge acquisition techniques to discover cross-lingual
relationships and use them in Multilingual Retrieval Systems. This method uses word alignment
on bilingual corpora to establish relationships between words in different languages, which in
turn are used to create a translation table for query translation. It is recommended that the
corpus be virtual to save storage and time. These three methods are grouped together because
they are related to each other. Query translation is made possible by dictionary based tools.
Once the query is translated, the information is obtained from a corpus, which may have its
documents clustered. The documents in the corpora are commonly indexed by a single keyword or
a group of keywords that can be easily found during searching. A multilingual comparable corpus
is another tool: a set of documents in different languages that share the same topics. Many of
the text mining themes are based on this method (Yang, Hsiao, & Lee, 2011, p. 650).
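As a rough illustration of how a translation table derived from word alignment can drive
query translation, the sketch below counts aligned word pairs and translates each query term
to its most frequent counterpart. The aligned pairs are invented toy data, and the function
names build_translation_table and translate_query are hypothetical; real systems estimate
alignments statistically from much larger parallel corpora.

```python
from collections import Counter, defaultdict


def build_translation_table(aligned_pairs):
    """Turn (source word, target word) alignments into relative frequencies."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {t: c / total for t, c in targets.items()}
    return table


def translate_query(query, table, top_k=1):
    """Replace each query term with its top_k most likely translations."""
    translated = []
    for word in query.lower().split():
        candidates = table.get(word)
        if not candidates:
            # Terms missing from the table (often proper nouns) pass through.
            translated.append(word)
            continue
        best = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        translated.extend(target for target, _ in best)
    return translated


# Toy English-to-Spanish alignments, invented for this example.
pairs = [
    ("library", "biblioteca"),
    ("library", "biblioteca"),
    ("library", "librería"),
    ("digital", "digital"),
]
table = build_translation_table(pairs)
print(translate_query("Digital library Madrid", table))
```

Passing unknown terms through unchanged is one simple way to handle proper nouns; raising
top_k keeps several candidate translations per term at the cost of a noisier query.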
Thesaurus based multilingual retrieval takes related terms that are commonly used in a
document and indexes them. In Multilingual Information Retrieval this is done by mapping
between the thesauri of different languages (Yang, Lee, & Chen, 2009, p. 6).
The methods addressed above are interchangeable and can be used in any system that is
implementing a multilingual extension. The purpose of tools such as corpora is to ensure that a
repository is available from which the intended information can be accessed. The benefit of
clustering corpora is that it provides a narrower grouping of comparable documents and text.
Applications
The following sections provide examples of existing systems that have either added a
multilingual feature to an existing Information Retrieval System or created a new system.
Multilingualism is designed to be incorporated into an already existing system. Each section
below examines how a system implemented multilingualism.
SveMed
SveMed uses terms from the Medical Subject Headings (MeSH) thesaurus, which contains a
controlled vocabulary, and translates these terms into different languages. The terms are
arranged in a hierarchical tree, and when deciding which terms to index, the indexer tries to
select the most specific term possible. The terms are then indexed and can be retrieved by
performing a truncation search, which ensures that user submitted queries return results. The
interface uses a thesaurus based database to translate the medical terms into three different
languages and to distinguish between the document terms (Gavel & Andersson, 2014, p. 272). It
uses the Solr search engine and relies solely on query expansion. “The search interface allows
the user to search terms in English, Swedish, or Norwegian, and browse for MeSH terms” (Gavel
& Andersson, 2014, p. 274). A great advantage of this search interface is that it allows the
user to select the language in which to search for information.
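A minimal sketch of thesaurus based query expansion with truncation is given below. The
three-concept thesaurus, its concept identifiers (C1 to C3), and the function names are
hypothetical and are not taken from SveMed or MeSH; the sketch only shows the idea that a
term entered in one language is expanded to the labels of the same concept in the other
languages.

```python
# Hypothetical mini-thesaurus: concept id -> label per language
# (en = English, sv = Swedish, no = Norwegian).
THESAURUS = {
    "C1": {"en": "heart", "sv": "hjärta", "no": "hjerte"},
    "C2": {"en": "heart failure", "sv": "hjärtsvikt", "no": "hjertesvikt"},
    "C3": {"en": "liver", "sv": "lever", "no": "lever"},
}


def matches(pattern, label):
    """Exact match, or prefix match when the pattern ends with '*'."""
    pattern, label = pattern.lower(), label.lower()
    if pattern.endswith("*"):
        return label.startswith(pattern[:-1])
    return label == pattern


def expand_query(term, thesaurus=THESAURUS):
    """Return the labels, in all languages, of every concept the term matches."""
    expanded = set()
    for labels in thesaurus.values():
        if any(matches(term, label) for label in labels.values()):
            expanded.update(labels.values())
    return sorted(expanded)


print(expand_query("hjärta"))  # labels of one concept in three languages
print(expand_query("heart*"))  # truncation also matches "heart failure"
```

Because the expansion happens on the query side, the indexed documents do not need to be
translated; only the thesaurus has to carry labels in each supported language.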
GHSOM
“Growing hierarchical self-organizing map (GHSOM) constructs hierarchical structure of
expandable maps. Algorithms are developed after the relationships between other languages
based on the hierarchical map has been determined” (Yang, Lee, & Chen, 2009, p. 7). A
part-of-speech tagger is used to select nouns from the text that will be used as keywords. The
text is preprocessed and converted into vectors that capture the overall meaning of the
document. Once the keywords have been selected, they are reduced to their roots. Training the
maps aids in encoding bilingual documents so that users can access the information in these
documents. The expandable maps allow for better results.
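The step from keywords to vectors to map units can be sketched as follows. This is not the
GHSOM algorithm: it shows only the winner update of a plain self-organizing map, with no
neighborhood function and no map growth, and it assumes that keywords have already been
selected and stemmed. The vocabulary, the two map units, and the function names are
illustrative assumptions.

```python
import math


def to_vector(keywords, vocabulary):
    """Encode a keyword list as a length-normalized term-count vector."""
    counts = [float(keywords.count(term)) for term in vocabulary]
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else counts


def best_matching_unit(vector, units):
    """Index of the map unit closest to the vector (squared Euclidean distance)."""
    def distance(unit):
        return sum((a - b) ** 2 for a, b in zip(vector, unit))
    return min(range(len(units)), key=lambda i: distance(units[i]))


def train_step(vector, units, rate=0.5):
    """Move the winning unit toward the vector; return the winner's index."""
    winner = best_matching_unit(vector, units)
    units[winner] = [u + rate * (v - u) for u, v in zip(units[winner], vector)]
    return winner


vocabulary = ["library", "search", "heart"]
units = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # two map units, set by hand
doc = to_vector(["library", "search", "library"], vocabulary)
print(train_step(doc, units), units)
```

After repeated steps over many documents, units come to represent groups of documents with
similar keywords, which is the property the hierarchical maps build on.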
Merge Model
The process starts with the user query, which is carried out by the Cross Lingual Information
Retrieval system. The query is sent to three different collections, and three sets of results
are returned. The merge model is designed to combine the three monolingual lists into one
multilingual list. In this model, sixty-two features are extracted from the three levels of
Multilingual Retrieval Systems: query, document, and translation (Tsai, Chen, & Wang, 2011,
p. 638). A learning based ranking algorithm called FRank is employed to rank items by
relevance. This learning based merge model has room for improvement.
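To make the idea of a learned merge model concrete, the sketch below trains a weight vector
from preference pairs and uses it to order candidates from several collections. It uses a
simple pairwise perceptron in place of FRank, and two made-up features per document in place
of the sixty-two features of the original model; the feature meanings, the training pairs,
and the function names are all assumptions for illustration.

```python
def dot(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(pairs, n_features, epochs=10):
    """Learn weights so that each 'better' item outscores its 'worse' item."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - w for b, w in zip(better, worse)]
            if dot(weights, diff) <= 0:
                # The pair is misordered, so nudge the weights toward it.
                weights = [wi + di for wi, di in zip(weights, diff)]
    return weights


def merge(candidates, weights):
    """Order (doc id, features) candidates by learned score, best first."""
    return sorted(candidates, key=lambda c: -dot(weights, c[1]))


# Features: [normalized retrieval score, translation confidence] (hypothetical).
training_pairs = [
    ([0.9, 0.8], [0.4, 0.3]),
    ([0.6, 0.9], [0.7, 0.2]),
]
weights = train_pairwise(training_pairs, n_features=2)
candidates = [("en-1", [0.9, 0.8]), ("zh-1", [0.7, 0.2]), ("ja-1", [0.6, 0.9])]
print([doc for doc, _ in merge(candidates, weights)])
```

The second training pair shows why a learned merge can differ from sorting by retrieval
score alone: a document with a lower retrieval score can rank higher when its translation
is more reliable.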
ICE-TEA
Interactive Cross-Language search Engine with Translation Enhancement (ICE-TEA) performs
query translation as part of an interactive Multilingual Information Access system. The language
resource used is a bilingual dictionary that translates English to Chinese. “Translation
enhancement is a feature of this system that provides users the original returned documents and
their translations. [The] system implements post-translation query expansions” (Wu, He, & Xu,
2012, p. 527). The system is designed to allow users to delete any returned translations that
are not needed, and it allows users to interact with various stages of the Multilingual
Information Access system (Wu, He, & Xu, 2012, p. 536). The system will need further
development to allow for better retrieval of relevant documents. With the help of this system,
users can become more involved in the information retrieval process.
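The interaction described above, where a user removes unwanted translations before the query
is expanded, might look like the following sketch. It is not the ICE-TEA implementation: the
expansion simply adds the most frequent new terms from a few feedback documents, the
documents are assumed to be already segmented into tokens, and the sample translations and
function names are invented for illustration.

```python
from collections import Counter


def prune(translations, rejected):
    """Drop the translations the user marked as unwanted."""
    return [t for t in translations if t not in rejected]


def expand_after_translation(translated_terms, feedback_docs, n_new=2):
    """Add the n_new most frequent terms found in the feedback documents."""
    counts = Counter()
    for tokens in feedback_docs:
        counts.update(t for t in tokens if t not in translated_terms)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(translated_terms) + [term for term, _ in ranked[:n_new]]


# Dictionary candidates for the English query term "bank" (toy example).
candidates = ["银行", "河岸"]       # financial bank, river bank
kept = prune(candidates, {"河岸"})  # the user deletes the river-bank sense
feedback = [["银行", "贷款", "利率"], ["银行", "贷款"], ["利率", "银行"]]
print(expand_after_translation(kept, feedback))
```

Pruning before expansion keeps a wrong word sense from pulling unrelated terms into the
expanded query.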
BRUJA
BRUJA is a question answering system for the management of multilingual collections. The
system uses Cross Lingual Information Retrieval to retrieve documents from a multilingual
collection, which is a common practice in multilingual systems. The system produces more
correct answers in Spanish than in other languages. It uses a machine translation resource,
which requires a word-level alignment algorithm for the translations (García-Cumbreras,
Martínez-Santiago, & Ureña-López, 2011, p. 420).
What the systems have in common is the use of some form of query translation to bridge the
gap between the query and the documents. Each system’s goal is to enable the user to search for
information in multiple languages. The systems describe a Cross Lingual Retrieval System
working inside the Multilingual Retrieval System; the two work together to connect the user to
the information requested. The user submits a query, and a tool translates the query into the
language of each collection. A list of monolingual results is then returned for each collection.
These lists are merged using the merge model explained above. That model is only one approach
and can be adjusted to suit any other system. The process of organizing the multilingual
documents differs depending on the use of the system. Documents can be translated and then
divided into comparable clusters or comparable corpora. Keywords are often taken from documents
and translated into various languages before being searched in the system. The sample systems
and methods explained above help the user from the input of the query to the receipt of the
information.
ML News Clustering
Multilingual Document Clustering involves dividing a set of documents in two languages into
clusters in such a way that similar documents are in the same cluster. News clustering is
popular because of the vast amount of news available to users. This study uses a language
independent representation of news documents and focuses on clustering the documents according
to their content. The authors started with comparable multilingual news articles (Montalvo,
Martínez, & Fresno, 2015, p. 522). Named entities play a role in natural language processing
tasks such as machine translation, clustering, summarization, and extraction. The comparable
corpora used were in Spanish and English. Expected Density is a measure that can be used in a
multilingual setting to determine the quality of the clusters (Montalvo, Martínez, & Fresno,
2015, p. 528).
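One way to picture a language independent representation is to describe each news article by
the set of named entities it mentions, since many names are written the same way in Spanish
and English. The sketch below groups articles with a single greedy pass using Jaccard
similarity. The threshold, the toy articles, and the function names are assumptions; this is
not the clustering procedure or the Expected Density measure of the cited study.

```python
def jaccard(a, b):
    """Share of entities two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.5):
    """Assign each document to the first cluster whose entities are similar enough."""
    clusters = []  # each cluster: {"entities": set, "members": [doc ids]}
    for doc_id, entities in docs:
        for c in clusters:
            if jaccard(entities, c["entities"]) >= threshold:
                c["members"].append(doc_id)
                c["entities"] |= entities
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append({"entities": set(entities), "members": [doc_id]})
    return [c["members"] for c in clusters]


# Toy articles: (id, named entities found in the text).
docs = [
    ("en-1", {"Madrid", "Real Madrid", "UEFA"}),
    ("es-1", {"Madrid", "Real Madrid", "UEFA", "Zidane"}),
    ("en-2", {"NASA", "Curiosity"}),
    ("es-2", {"NASA", "Curiosity", "JPL"}),
]
print(cluster(docs))
```

Entities that are translated between languages would not match under this scheme, which is
one reason a real system needs more than exact string overlap.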
Challenges/ Limitations
Each article reviewed explains the challenges of creating a multilingual retrieval system.
Many words have multiple meanings in different languages. This poses a problem when indexed
terms are translated into a term that is represented in the system. Multilingualism in
Information Retrieval Systems is also a challenge because of the limitations of the programs
that are currently available. The resources available are limited to core functions such as
query translation. Many developers want to steer away from translators because of the
inaccuracy of some translations. When words are translated into another language, the developer
runs the risk of a word being translated incorrectly because its meaning was missed or because
translation tools are inadequate for the variety of a language spoken in a specific region. For
example, Spanish is spoken in many regions, which means a viable translation system must be
equipped to translate different regional versions of Spanish words. Such a system has not been
developed.
Some translation systems are not equipped to handle the translation of proper nouns. A machine
translation system is deemed impractical because of the large amount of text being translated
(Sujatha & Dhavachelvan, 2011, p. 116). The larger the text, the slower the retrieval time.
When choosing keywords, it is important to choose comprehensive ones to improve the chance of
retrieving relevant documents (Peters, Braschler, & Clough, 2011, p. 5). In some languages there
is no way to change a verb to a noun, which is why some systems require the keyword to contain
a noun (Peters, Braschler, & Clough, 2011, p. 11). These challenges are common in an
information setting where the user is looking for information in either their native or
nonnative language.
Future Research
Future research should include the creation of large bilingual text corpora, large scale text
databases for testing, and a database of lexical semantic relations (Fluhr, n.d., para. 24).
Systems need to be tested in various languages. The Cross-Language Evaluation Forum (CLEF)
spent 2000 to 2005 researching implemented systems that have multilingual features for digital
media. CLEF noticed that most systems examined pre-processed the document collection and
adopted linguistic processors and language resources such as POS-taggers (Peters, 2011,
p. 677).
Future testing should include a wide range of users in the test group. A group of test users
from one specific region does not allow for accurate results; the test group needs to be
diverse. Questions about multilingualism should be asked to determine how users would use the
system and whether it is necessary to implement.
User knowledge needs to be improved. Implementing a new system that involves more than one
language can frustrate both native and nonnative English speakers. A study showed that “the
language choices made by the students while searching for information on the Internet seemed
to indicate that the students used their native languages just as much as they used English.
This is a reflection of the rising multilingualism and multiculturalism in the online
environment and the fact that English is not as dominant as it was some years ago” (Nzomo et
al., 2016, p. 498). Adequate time needs to be set aside to train users in how to search and use
such a system. Organizations need to decide whether implementing a Multilingual Retrieval
System will be beneficial to their user audience.
Discussion
Multilingualism in information retrieval systems is a concept that is still in its beginning
stages. It is a challenge to take a document that is written in multiple languages and translate
it into the language of the search query. “Multilingualism plays a role in the quality and
effectiveness of communication services offered [to users]” (Ménard, 2011, p. 15).
Multilingualism is needed not only in library systems: a museum also felt the need to offer
this service to its users. The feature allowed users to search images that had been indexed in
multiple languages.
A Multilingual Information Retrieval System provides document retrieval techniques that
enable a user to enter a query, including a natural language query, in a desired one of a
plurality of supported languages, and retrieve documents from a database that includes
documents in at least one other language of the plurality of supported languages (Libby et al.,
1999, p. 8).
A variety of articles were examined, each discussing different but related aspects of
Multilingual Retrieval Systems. Significant improvements can be made to the existing retrieval
systems that are implementing multilingual features. Multilingualism is designed to be
incorporated into an already existing Information Retrieval System. Many tools are currently
available, and others still need to be developed. Currently the approach is limited to
dictionary based tools, corpora, clustering, indexing, and thesaurus based tools. These tools
have been beneficial to the development of multilingual retrieval but need to be enhanced
because of the errors that can arise.
References
García-Cumbreras, M. Á, Martínez-Santiago, F., & Ureña-López, L. A. (2011, 10). Architecture and
evaluation of BRUJA, a multilingual question answering system. Information Retrieval, 15(5),
413-432. doi:10.1007/s10791-011-9177-5
Fluhr, C. (n.d.). Multilingual Information Retrieval. Retrieved from
http://www.cslu.ogi.edu/HLTsurvey/ch8node7.html
Gavel, Y., & Andersson, P. (2014, 06). Multilingual query expansion in the SveMed bibliographic
database: A case study. Journal of Information Science, 40(3), 269-280.
doi:10.1177/0165551514524685
Libby, E. D., Palk, W., Yu, E. S., & Li, M. (1999). U.S. Patent No. 6006221. Washington, DC: U.S.
Patent and Trademark Office.
Montalvo, S., Martínez, R., & Fresno, V. (2015, 08). Quality prediction of multilingual news
clustering: An experimental study. Journal of Information Science, 41(4), 518-530.
doi:10.1177/0165551515586671
Ménard, E. (2011, 07). Search Behaviours of Image Users: A Pilot Study on Museum Objects.
Partnership: The Canadian Journal of Library and Information Practice and Research, 6(1).
doi:10.21083/partnership.v6i1.1433
Nzomo, P., Ajiferuke, I., Vaughan, L., & Mckenzie, P. (2016, 09). Multilingual Information Retrieval
& Use: Perceptions and Practices Amongst Bi/Multilingual Academic Users. The Journal of
Academic Librarianship, 42(5), 495-502. doi:10.1016/j.acalib.2016.06.012
Peters, C., Braschler, M., & Clough, P. (2011, 09). Evaluation for Multilingual Information Retrieval
Systems. Multilingual Information Retrieval, 129-169. doi:10.1007/978-3-642-23008-0_5
Sujatha, P., & Dhavachelvan, P. (2011, 10). A Review on the Cross and Multilingual Information Retrieval. International
Journal of Web & Semantic Technology, 2(4), 115-124. doi:10.5121/ijwest.2011.2409
Tsai, M., Chen, H., & Wang, Y. (2011, 09). Learning a merge model for multilingual information
retrieval. Information Processing & Management, 47(5), 635-646.
doi:10.1016/j.ipm.2009.12.002
Wu, D., He, D., & Luo, B. (2012, 04). Multilingual needs and expectations in digital libraries. The
Electronic Library, 30(2), 182-197. doi:10.1108/02640471211221322
Wu, D., He, D., & Xu, X. (2012, 08). A study of relevance feedback techniques in interactive
multilingual information access. Library Hi Tech, 30(3), 523-544.
doi:10.1108/07378831211266645
Yang, H., Hsiao, H., & Lee, C. (2011, 09). Multilingual document mining and navigation using self-
organizing maps. Information Processing & Management, 47(5), 647-666.
doi:10.1016/j.ipm.2009.12.003
Yang, H., Lee, C., & Chen, D. (2009, 02). A method for multilingual text mining and retrieval using
growing hierarchical self-organizing maps. Journal of Information Science, 35(1), 3-23.
doi:10.1177/0165551508088968
Zhang, X., Liu, J. N., & Atwell, E. (n.d.). Multilingual Information Retrieval in World Wide Web.
Retrieved from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.165.90&rep=rep1&type=pdf
14. Michael Oakes (UoW) Natural Language Processing for Translation14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for Translation
RIILP
 
DICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISH
DICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISHDICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISH
DICTIONARY-BASED CONCEPT MINING: AN APPLICATION FOR TURKISH
cscpconf
 
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
Kim Daniels
 
Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval System
IJTET Journal
 
Survey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse DictionarySurvey On Building A Database Driven Reverse Dictionary
Survey On Building A Database Driven Reverse Dictionary
Editor IJMTER
 
Diacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval SystemDiacritic Oriented Arabic Information Retrieval System
Diacritic Oriented Arabic Information Retrieval System
CSCJournals
 
Ad

Recently uploaded (16)

Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 SupportReliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
steve198109
 
Best web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you businessBest web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you business
steve198109
 
Cyber Safety: security measure about navegating on internet.
Cyber Safety: security measure about navegating on internet.Cyber Safety: security measure about navegating on internet.
Cyber Safety: security measure about navegating on internet.
manugodinhogentil
 
Grade 7 Google_Sites_Lesson creating website.pptx
Grade 7 Google_Sites_Lesson creating website.pptxGrade 7 Google_Sites_Lesson creating website.pptx
Grade 7 Google_Sites_Lesson creating website.pptx
AllanGuevarra1
 
Seminar.MAJor presentation for final project viva
Seminar.MAJor presentation for final project vivaSeminar.MAJor presentation for final project viva
Seminar.MAJor presentation for final project viva
daditya2501
 
OSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description fOSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description f
cbr49917
 
highend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptxhighend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptx
elhadjcheikhdiop
 
Organizing_Data_Grade4 how to organize.pptx
Organizing_Data_Grade4 how to organize.pptxOrganizing_Data_Grade4 how to organize.pptx
Organizing_Data_Grade4 how to organize.pptx
AllanGuevarra1
 
Determining Glass is mechanical textile
Determining  Glass is mechanical textileDetermining  Glass is mechanical textile
Determining Glass is mechanical textile
Azizul Hakim
 
project_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptxproject_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptx
redzuriel13
 
5-Ways-To-Future-Proof-Your-SIEM-Securonix[1].pdf
5-Ways-To-Future-Proof-Your-SIEM-Securonix[1].pdf5-Ways-To-Future-Proof-Your-SIEM-Securonix[1].pdf
5-Ways-To-Future-Proof-Your-SIEM-Securonix[1].pdf
AndrHenrique77
 
Breaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdf
Breaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdfBreaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdf
Breaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdf
Nirmalthapa24
 
(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security
aluacharya169
 
cxbcxfzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz7.pdf
cxbcxfzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz7.pdfcxbcxfzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz7.pdf
cxbcxfzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz7.pdf
ssuser060b2e1
 
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHostingTop Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
steve198109
 
AI Days 2025_GM1 : Interface in theage of AI
AI Days 2025_GM1 : Interface in theage of AIAI Days 2025_GM1 : Interface in theage of AI
AI Days 2025_GM1 : Interface in theage of AI
Prashant Singh
 
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 SupportReliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
steve198109
 
Best web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you businessBest web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you business
steve198109
 
Cyber Safety: security measure about navegating on internet.
Cyber Safety: security measure about navegating on internet.Cyber Safety: security measure about navegating on internet.
Cyber Safety: security measure about navegating on internet.
manugodinhogentil
 
Grade 7 Google_Sites_Lesson creating website.pptx
Grade 7 Google_Sites_Lesson creating website.pptxGrade 7 Google_Sites_Lesson creating website.pptx
Grade 7 Google_Sites_Lesson creating website.pptx
AllanGuevarra1
 
Seminar.MAJor presentation for final project viva
Seminar.MAJor presentation for final project vivaSeminar.MAJor presentation for final project viva
Seminar.MAJor presentation for final project viva
daditya2501
 
OSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description fOSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description f
cbr49917
 
highend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptxhighend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptx
elhadjcheikhdiop
 
Organizing_Data_Grade4 how to organize.pptx
Organizing_Data_Grade4 how to organize.pptxOrganizing_Data_Grade4 how to organize.pptx
Organizing_Data_Grade4 how to organize.pptx
AllanGuevarra1
 
Determining Glass is mechanical textile
Determining  Glass is mechanical textileDetermining  Glass is mechanical textile
Determining Glass is mechanical textile
Azizul Hakim
 
project_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptxproject_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptx
redzuriel13
 
5-Ways-To-Future-Proof-Your-SIEM-Securonix[1].pdf
5-Ways-To-Future-Proof-Your-SIEM-Securonix[1].pdf5-Ways-To-Future-Proof-Your-SIEM-Securonix[1].pdf
5-Ways-To-Future-Proof-Your-SIEM-Securonix[1].pdf
AndrHenrique77
 
Breaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdf
Breaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdfBreaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdf
Breaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdf
Nirmalthapa24
 
(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security
aluacharya169
 
cxbcxfzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz7.pdf
cxbcxfzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz7.pdfcxbcxfzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz7.pdf
cxbcxfzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz7.pdf
ssuser060b2e1
 
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHostingTop Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
steve198109
 
AI Days 2025_GM1 : Interface in theage of AI
AI Days 2025_GM1 : Interface in theage of AIAI Days 2025_GM1 : Interface in theage of AI
AI Days 2025_GM1 : Interface in theage of AI
Prashant Singh
 
Ad

Multilingualism in Information Retrieval System

  • 1. Running Head: MULTILINGUALISM IN INFORMATION RETRIEVAL SYSTEMS
    Multilingualism in Information Retrieval Systems
    Ariel Hess
    University of North Texas
    INFO 5206
    May 5, 2017
    Summary/Author’s Note: Multilingualism in information retrieval systems is a topic that researchers have examined extensively. The challenge of creating a system that lets the user enter a query containing multiple languages and returns results in multiple languages will continue to be examined. Information retrieval systems can be adjusted to include features designed to translate documents and queries. This paper examines different strategies used for text translation, projects that have been implemented, and challenges faced.
  • 2. Introduction
    Most search engines provide only a monolingual search interface for documents written mostly in English (Chen, Lee & Yang, 2009, p. 4). Users often translate their query into English before using a search engine. The goal of a Multilingual Retrieval System is to allow users to search for information in multiple languages and retrieve information in multiple languages. This is done by deploying Cross-Language Retrieval, which allows the user to ask a question in one language and retrieve the information in another.
    A survey of academic users was conducted to better understand why users want access to documents in different languages, and whether users of a digital library would want access to a multilingual retrieval system. Most users wanted access for educational purposes: they would use a Multilingual Information Retrieval System to complete assignments that require searching for documents in a language other than English. The study showed that some users felt it would be too difficult to search for documents that contain more than one language (He, Luo & Wu, 2012, p. 188). The overall takeaway from the survey is that user needs must be better understood to determine whether such a system works with the preexisting Information Retrieval System and its users.
    Developers want to dismantle the barrier between the user query and multilingual documents. This can be done by adjusting the Information Retrieval System to incorporate multilingualism through translation tools and various other techniques. Generally, a Multilingual Retrieval System works by first retrieving documents from a separate collection for each language. A monolingual list of results is retrieved from each collection, and these lists are then merged to create a multilingual list. Each system can be adapted to cater to the needs of the organization.
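The merging step described above can be sketched as follows. This is a minimal illustration, assuming each monolingual collection returns a ranked list of (document id, score) pairs. The function names `min_max_normalize` and `merge_result_lists`, the choice of min-max score normalization, and the sample data are all assumptions made for the example; they are not taken from any of the cited systems.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]  # (document id, retrieval score)


def min_max_normalize(results: Ranked) -> Ranked:
    """Rescale the scores of one result list to [0, 1].

    Scores from different collections are not directly comparable,
    so each list is normalized before merging.
    """
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def merge_result_lists(per_language: Dict[str, Ranked]) -> List[Tuple[str, str, float]]:
    """Merge monolingual lists into one multilingual list of
    (language, document id, normalized score), best score first."""
    merged = []
    for lang, results in per_language.items():
        for doc, score in min_max_normalize(results):
            merged.append((lang, doc, score))
    # Python's sort is stable, so ties keep their original order.
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged


# Hypothetical monolingual results with scores on different scales.
results = {
    "en": [("en-1", 12.0), ("en-2", 9.0), ("en-3", 6.0)],
    "es": [("es-1", 0.8), ("es-2", 0.5)],
}
merged = merge_result_lists(results)
for lang, doc, score in merged:
    print(lang, doc, round(score, 2))
```

Normalizing per list is only one possible design choice; a learned merge model, such as the one discussed later in this paper, replaces the fixed normalization with a trained ranking function.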
    Different tools are employed to ensure compatibility. The Multilingual Retrieval System generally focuses on one or all of the following: document, query, and translation.
  • 3. Approach/Methods
    Executing a Multilingual Retrieval System involves a variety of tools and features. The system has three levels of concern: query, translation, and document. These areas are addressed through different techniques, such as creating a dictionary-based model. Each Multilingual Retrieval System has its own features and deploys different methods for retrieval. These methods are adaptable and tailored to the audience the system is intended for.
    Text mining is the process of deriving quality information from unstructured text (Chen, Lee, & Yang, 2009, p. 4). “Text mining in a multilingual setting [is also incorporated as] an automated process that is design[ed] to discover the relationship between languages” (Hsiao, Lee & Yang, 2011, p. 648). Three techniques are often employed to deal with the problem of creating a multilingual-friendly system: using a machine translation system, using a bilingual dictionary or terminology base, and using a statistical/probabilistic model based on parallel texts.
    Query translation is a strategy in which the user's query is translated into each language present in the multilingual collection to generate a monolingual information retrieval process per language (Cumbreras, Lopez & Santiago, 2011, p. 414). The most common query search depends on concepts of natural language. A dictionary-based tool uses a bilingual list of words to translate terms into different languages. A machine translation approach translates every document in the corpus into multiple languages. Corpus-based retrieval tools use knowledge acquisition techniques to discover cross-lingual relationships and use them in Multilingual Retrieval Systems. This
  • 4. method uses word alignment to generate bilingual corpora, which establish relationships between words in different languages. These relationships are in turn used to create a translation table for query translation. It is recommended that the corpus be virtual to save storage and time.
    These three methods are grouped together because of their relation to each other. Query translation is made possible by dictionary-based tools. Once the query is translated, the information is obtained from a corpus, which may have its documents clustered. The documents in the corpora are commonly indexed by a single keyword or a group of keywords that can be easily found during searching. A multilingual comparable corpus is another tool: a collection of translated documents that have the same topics. Many text mining approaches are based on this method (Hsiao, Lee & Yang, 2011, p. 650). Thesaurus-based multilingual retrieval takes related terms that are commonly used in a document and indexes them. This method can be applied in Multilingual Information Retrieval through mapping between thesauri of different languages (Chen, Lee, & Yang, 2009, p. 6).
    The methods addressed above can be applied in any system that implements a multilingual extension. The purpose of tools such as corpora is to ensure that a repository is available for accessing the intended information. The benefit of clustering corpora is that it provides a narrower grouping of documents and texts that are comparable.
    Applications
    The following sections provide examples of existing systems that have added a multilingual feature to an existing Information Retrieval System or created a new system. Multilingualism is designed to be incorporated into an already existing system. The following systems examine
  • 5. their implementation of multilingualism into their pre-existing system.
    SveMed
    SveMed uses terms from the Medical Subject Headings (MeSH) thesaurus, which contains a controlled vocabulary, and translates these terms into different languages. The terms are arranged in a hierarchical tree, and when deciding which terms to index, the indexer tries to select the most specific term possible. These terms are then indexed and can be retrieved by performing a truncation search, which helps ensure that user-submitted queries return results. The interface uses a thesaurus-based database to translate the medical terms into three different languages and to distinguish information between the document terms (Gavel & Andersson, 2014, p. 272). It uses the Solr search engine and relies solely on query expansion. “The search interface allows the user to search terms in English, Swedish, or Norwegian, and browse for MeSH terms” (Gavel & Andersson, 2014, p. 274). A great advantage of this search interface is that it allows the user to select which language to search in.
    GHSOM
    “Growing hierarchical self-organizing map (GHSOM) constructs hierarchical structure of expandable maps. Algorithms are developed after the relationships between other languages based on the hierarchical map has been determined” (Chen, Lee & Yang, 2009, p. 7). A part-of-speech tagger is used to select nouns from the text to serve as keywords. The queries are then processed and converted to vectors that reflect the overall meaning of the document. Once the keywords have been selected, they are reduced to their roots. The training aids in the encoding of bilingual documents to ensure users can access the information in these documents. The expandable maps allow for better results.
    Merge Model
  • 6. The system starts with the user query, which is carried out by the Cross-Lingual Information Retrieval system. The query is sent to three different collections, and three sets of results are returned. The merge model is designed to combine the three monolingual lists into one multilingual list. In this model, sixty-two features are extracted from the three levels of Multilingual Retrieval Systems: query, document, and translation (Chen, Tsai, & Wang, 2011, p. 638). A learning-based ranking algorithm called FRank is employed to rank items by relevance. This learning-based merge model has room for improvement.
    ICE-TEA
    Interactive Cross-Language search Engine with Translation Enhancement (ICE-TEA) performs query translation as part of an interactive Multilingual Information Access system. The language resource used is a bilingual dictionary translating English to Chinese. “Translation enhancement is a feature of this system that provides users the original returned documents and their translations. [The] system implements post-translation query expansions” (He, Wu & Xu, 2012, p. 527). The system is designed to let users delete any returned translations that are not needed. It allows users to interact with various stages of the Multilingual Information Access system (He, Wu & Xu, 2012, p. 536). The system will need further development to retrieve relevant documents more effectively. With the help of this system, users can become more involved in the information retrieval process.
    BRUJA
    BRUJA is a question answering system for the management of multilingual collections. It uses Cross-Lingual Information Retrieval to retrieve documents from a multilingual collection, which is a common practice in multilingual systems. The system produces more correct answers in Spanish than in other languages. This system uses a machine translation
  • 7. resource which requires a word-level alignment algorithm for the translations (Cumbreras, Lopez & Santiago, 2011, p. 420).
    What the systems have in common is the use of some form of query translation to bridge the gap between the query and the documents. Each system’s goal is to enable the user to search for information in multiple languages. The systems involve a Cross-Lingual Retrieval System within the Multilingual Retrieval System; the two work together to connect the user to the requested information. The user submits a query, and a tool translates the query into the language of each collection. A list of monolingual results is then returned for each collection, and these lists are merged using the merge model explained above. This model is only one example and can be adjusted to suit other systems.
    The process of organizing multilingual documents differs depending on the use of the system. Documents can be translated and then divided into comparable clusters or comparable corpora. Keywords are often taken from documents and then translated into various languages before being searched in the system. The sample systems and methods explained above help the user from the input of the query to the receipt of the information.
    ML News Clustering
    Multilingual document clustering involves dividing a set of documents written in two languages into clusters in such a way that similar documents are in the same cluster. News clustering is popular because of the vast amount of news available to users. This study uses a language-independent representation of news documents, focusing on clustering them according to their content. The authors started with comparable multilingual news articles.
    (Fresno, Martinez & Montalvo, 2015, p. 522) Named entities play a role in natural language processing tasks such as machine translation, clustering, summarizing, and extraction (cite). Comparable corpora were used, and the languages were Spanish and English. Expected Density is a measure that can be used in a multilingual setting to determine the quality of the clusters (Fresno, Martinez & Montalvo, 2015, p. 528).
  • 8. Challenges/Limitations
    Each article reviewed explains the challenges of creating a multilingual retrieval system. Many words have multiple meanings across languages, which poses a problem when indexed terms are translated into a term that is represented in the system. Multilingualism in Information Retrieval Systems is a challenge because of the limitations of the programs currently available. The available resources are limited mainly to items such as query translation. Many developers want to steer away from translators because some translations are inaccurate. When a word is translated into another language, the developer runs the risk that it is not translated correctly, either because the meaning is missed or because the translation tools are inadequate for languages derived from a specific region. For example, Spanish has many regional varieties, which means a viable translation system must be equipped to translate different versions of Spanish words. Such a system has not yet been developed. Some translation systems are not equipped to handle the translation of proper nouns.
    Machine translation of documents is deemed impractical because of the large amount of text being translated (Dhavachelvan & Sujatha, 2011, p. 116). The larger the text, the slower the retrieval time. When choosing keywords, it is important to choose comprehensive ones to improve the chance of retrieving relevant documents (Peters, et al., 2011, p. 5). In some languages there is no way to change a verb to a noun, which is why some systems require the keyword to contain a noun (Peters, et al., 2011, p. 11). These challenges are common in an information setting where the user is looking for information in either their native or nonnative language.
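The ambiguity and proper-noun problems described in this section can be illustrated with a small dictionary-based query translation sketch. The toy English-to-Spanish dictionary, the function name `translate_query`, and the sample query are hypothetical and chosen only for illustration; none of the cited systems is claimed to work this way.

```python
from itertools import product
from typing import Dict, List

# Toy English-to-Spanish dictionary (illustrative entries only).
EN_ES: Dict[str, List[str]] = {
    "bank": ["banco", "orilla"],  # financial institution vs. river bank
    "river": ["río"],
    "near": ["cerca"],
}


def translate_query(query: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Return every candidate translation of a query, term by term.

    Terms missing from the dictionary (for example proper nouns) are
    passed through untranslated, in lowercase.
    """
    terms = query.lower().split()
    options = [dictionary.get(term, [term]) for term in terms]
    # One candidate per combination of word senses.
    return [" ".join(choice) for choice in product(*options)]


candidates = translate_query("bank near Thames", EN_ES)
print(candidates)
```

A single ambiguous word doubles the number of candidate queries here, and the proper noun is left untranslated because the dictionary has no entry for it. A real system would need some way to choose among the candidates, for instance the corpus-based statistics discussed under Approach/Methods.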
  • 9. Future Research
    Future research should include the creation of large bilingual text corpora, large-scale text databases for testing, and a database of lexical semantic relations (Fluhr, n.d., para. 24). Systems need to be tested in various languages. From 2000 to 2005, the Cross-Language Evaluation Forum (CLEF) studied implemented systems that have multilingual features for digital media. CLEF noted that most of the systems examined pre-processed the document collection and adopted linguistic processors and language resources such as POS taggers (Peters, 2011, p. 677).
    Future testing should include a wide range of users in the test group. A group of test users from one specific region does not allow for accurate results; the test group needs to be diverse. Questions about multilingualism should be asked to determine how users would use the system and whether it is necessary to implement.
    User knowledge also needs to be improved. Implementing a new system that involves more than one language can frustrate both native and nonnative English speakers. One study showed that “the language choices made by the students while searching for information on the Internet seemed to indicate that the students used their native languages just as much as they used English. This is a reflection of the rising multilingualism and multiculturalism in the online environment and the fact that English is not as dominant as it was some years ago” (Ajiferuke, et al., 2016, p. 498). Adequate time needs to be set aside to train users to search with and use such a system. Organizations need to decide whether implementing a Multilingual Retrieval System will benefit their users.
  • 10. Discussion
    Multilingualism in information retrieval systems is a concept that is still in its early stages. It is a challenge to take a document that is written in multiple languages and translate it into the language of the search query. “Multilingualism plays a role in the quality and effectiveness of communication services offered [to users]” (Menard, 2011, p. 15). Multilingualism is needed not only in library systems; a museum also felt the need to offer this service to its users, allowing them to search images that have been indexed in multiple languages. A Multilingual Information Retrieval System provides document retrieval techniques that enable a user to enter a query, including a natural language query, in any one of several supported languages and to retrieve documents from a database that includes documents in at least one other supported language (Libby, et al., 1999, p. 8).
    A variety of articles were examined, each discussing different but related aspects of Multilingual Retrieval Systems. Significant improvements can be made to the existing retrieval systems that are adding multilingual support. Multilingualism is designed to be incorporated into an already existing Information Retrieval System. Many tools are currently available, and others still need to be developed. Currently, such systems are limited to dictionary-based tools, corpora, clustering, indexing, and thesaurus-based tools. These tools have been beneficial to the development of multilingual retrieval but need to be enhanced because of the errors that can arise.
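Of the tools listed above, thesaurus-based query expansion with truncation search is perhaps the simplest to sketch. The example below is a rough illustration under stated assumptions: the concept ids `C1` and `C2`, the three-language labels, and the function name `expand_query` are made up for the example and do not reproduce MeSH identifiers or the SveMed implementation.

```python
from typing import Dict, Set

# Hypothetical concept ids mapped to labels in three languages.
THESAURUS: Dict[str, Dict[str, str]] = {
    "C1": {"en": "heart", "sv": "hjärta", "no": "hjerte"},
    "C2": {"en": "lung", "sv": "lunga", "no": "lunge"},
}


def expand_query(term: str, thesaurus: Dict[str, Dict[str, str]]) -> Set[str]:
    """Expand a term to the labels of every matching concept in all languages.

    A trailing '*' requests a truncation (prefix) search; otherwise the
    term must match a label exactly.
    """
    needle = term.lower()
    is_prefix = needle.endswith("*")
    needle = needle.rstrip("*")
    expanded: Set[str] = set()
    for labels in thesaurus.values():
        values = list(labels.values())
        if is_prefix:
            hit = any(value.startswith(needle) for value in values)
        else:
            hit = needle in values
        if hit:
            expanded.update(values)
    return expanded


print(sorted(expand_query("hjärt*", THESAURUS)))
```

Because every label of a matching concept is added to the query, a user who searches in Swedish can also retrieve documents indexed under the English or Norwegian term.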
  • 11. References
    García-Cumbreras, M. Á., Martínez-Santiago, F., & Ureña-López, L. A. (2011, 10). Architecture and evaluation of BRUJA, a multilingual question answering system. Information Retrieval, 15(5), 413-432. doi:10.1007/s10791-011-9177-5
    Fluhr, C. (n.d.). Multilingual Information Retrieval. Retrieved from http://www.cslu.ogi.edu/HLTsurvey/ch8node7.html
    Gavel, Y., & Andersson, P. (2014, 06). Multilingual query expansion in the SveMed bibliographic database: A case study. Journal of Information Science, 40(3), 269-280. doi:10.1177/0165551514524685
    Libby, E. D., Palk, W., Yu, E. S., & Li, M. (1999). U.S. Patent No. 6006221. Washington, DC: U.S. Patent and Trademark Office.
    Montalvo, S., Martínez, R., & Fresno, V. (2015, 08). Quality prediction of multilingual news clustering: An experimental study. Journal of Information Science, 41(4), 518-530. doi:10.1177/0165551515586671
    Ménard, E. (2011, 07). Search Behaviours of Image Users: A Pilot Study on Museum Objects. Partnership: The Canadian Journal of Library and Information Practice and Research, 6(1). doi:10.21083/partnership.v6i1.1433
    Nzomo, P., Ajiferuke, I., Vaughan, L., & Mckenzie, P. (2016, 09). Multilingual Information Retrieval & Use: Perceptions and Practices Amongst Bi/Multilingual Academic Users. The Journal of Academic Librarianship, 42(5), 495-502. doi:10.1016/j.acalib.2016.06.012
    Peters, C., Braschler, M., & Clough, P. (2011, 09). Evaluation for Multilingual Information Retrieval Systems. Multilingual Information Retrieval, 129-169. doi:10.1007/978-3-642-23008-0_5
  • 12. Sujatha, P., & Dhavachelvan, P. (2011, 10). A Review on the Cross and Multilingual Information Retrieval. International Journal of Web & Semantic Technology, 2(4), 115-124. doi:10.5121/ijwest.2011.2409
    Tsai, M., Chen, H., & Wang, Y. (2011, 09). Learning a merge model for multilingual information retrieval. Information Processing & Management, 47(5), 635-646. doi:10.1016/j.ipm.2009.12.002
    Wu, D., He, D., & Luo, B. (2012, 04). Multilingual needs and expectations in digital libraries. The Electronic Library, 30(2), 182-197. doi:10.1108/02640471211221322
    Wu, D., He, D., & Xu, X. (2012, 08). A study of relevance feedback techniques in interactive multilingual information access. Library Hi Tech, 30(3), 523-544. doi:10.1108/07378831211266645
    Yang, H., Hsiao, H., & Lee, C. (2011, 09). Multilingual document mining and navigation using self-organizing maps. Information Processing & Management, 47(5), 647-666. doi:10.1016/j.ipm.2009.12.003
    Yang, H., Lee, C., & Chen, D. (2009, 02). A method for multilingual text mining and retrieval using growing hierarchical self-organizing maps. Journal of Information Science, 35(1), 3-23. doi:10.1177/0165551508088968
    Zhang, X., Liu, J. N., & Atwell, E. (n.d.). Multilingual Information Retrieval in World Wide Web. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.165.90&rep=rep1&type=pdf