Top 10 cited Natural Language Computing
International Journal on Natural
Language Computing (IJNLC)
http://airccse.org/journal/ijnlc/index.html
ISSN : 2278 - 1307 [Online]; 2319 - 4111 [Print]
Google Scholar
https://scholar.google.com/citations?user=A5tqIdoAAAAJ&hl=en
AN IMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULES
Mohammed Al-Maolegi, Bassam Arkok
Computer Science, Jordan University of Science and Technology, Irbid, Jordan
ABSTRACT
There are several algorithms for mining association rules. One of the most popular is
Apriori, which extracts frequent itemsets from a large database and derives association
rules for knowledge discovery. This paper identifies a limitation of the original Apriori
algorithm: it wastes time scanning the whole database when searching for frequent
itemsets. The paper presents an improvement that reduces this wasted time by scanning
only some of the transactions. Experiments with several groups of transactions and several
minimum-support values, run on both the original Apriori and our improved implementation,
show that the improved Apriori reduces the time consumed by 67.38% compared with the
original Apriori, making the algorithm more efficient.
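The idea of counting support without rescanning the whole database can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the abstract only states that the improved algorithm scans some of the transactions, so the per-item transaction index `tids`, the choice to scan the transactions of each candidate's rarest item, and the function name `apriori` are all assumptions made here.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset itemset: support count} for all frequent itemsets."""
    transactions = [set(t) for t in transactions]

    # Assumed index: item -> ids of the transactions containing it (one full scan).
    tids = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(item, set()).add(tid)

    # Frequent 1-itemsets come straight from the index.
    frequent = {frozenset([i]): len(t) for i, t in tids.items()
                if len(t) >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unions of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        frequent = {}
        for cand in candidates:
            # Scan only the transactions that contain the candidate's rarest item.
            rarest = min(cand, key=lambda i: len(tids[i]))
            count = sum(1 for tid in tids[rarest] if cand <= transactions[tid])
            if count >= min_support:
                frequent[cand] = count
        result.update(frequent)
        k += 1
    return result
```

Calling `apriori([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], 2)` returns the support counts of every itemset that appears in at least two transactions. The join and prune steps are those of classic Apriori; only the support-counting loop differs from a full database scan.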
KEYWORDS
Apriori, Improved Apriori, Frequent itemset, Support, Candidate itemset, Time consuming.
FULL TEXT: http://airccse.org/journal/ijnlc/papers/3114ijnlc03.pdf
VOLUME URL: http://airccse.org/journal/ijnlc/vol3.html
REFERENCES
[1] X. Wu, V. Kumar, J. Ross Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng,
B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg, “Top 10 algorithms in
data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, Dec. 2007.
[2] S. Rao, R. Gupta, “Implementing Improved Algorithm Over APRIORI Data Mining
Association Rule Algorithm”, International Journal of Computer Science And Technology, pp.
489-493, Mar. 2012
[3] H. H. O. Nasereddin, “Stream data mining,” International Journal of Web Applications, vol.
1, no. 4, pp. 183–190, 2009.
[4] F. Crespo and R. Weber, “A methodology for dynamic data mining based on fuzzy
clustering,” Fuzzy Sets and Systems, vol. 150, no. 2, pp. 267–284, Mar. 2005.
[5] R. Srikant, “Fast algorithms for mining association rules and sequential patterns,”
UNIVERSITY OF WISCONSIN, 1996.
[6] J. Han, M. Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers,
Book, 2000.
[7] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data mining to knowledge discovery
in databases,” AI magazine, vol. 17, no. 3, p. 37, 1996.
[8] F. H. AL-Zawaidah, Y. H. Jbara, and A. L. Marwan, “An Improved Algorithm for Mining
Association Rules in Large Databases,” Vol. 1, No. 7, 311-316, 2011
[9] T. C. Corporation, “Introduction to Data Mining and Knowledge Discovery”, Two Crows
Corporation, Book, 1999.
[10] R. Agrawal, T. Imieliński, and A. Swami, “Mining association rules between sets of items
in large databases,” in ACM SIGMOD Record, vol. 22, pp. 207–216, 1993
[11] M. Halkidi, “Quality assessment and uncertainty handling in data mining process,” in Proc,
EDBT Conference, Konstanz, Germany, 2000.
AUTHORS
Mohammed Al-Maolegi obtained his Master's degree in computer science from Jordan
University of Science and Technology (Jordan) in 2014. He received his
B.Sc. in computer information systems from Mutah University (Jordan) in 2010. His
research interests include software engineering, software metrics, data mining and
wireless sensor networks.
Bassam Arkok obtained his Master's degree in computer science from Jordan University of
Science and Technology (Jordan) in 2014. He received his B.Sc. in computer science
from Alhodidah University (Yemen). His research interests include software engineering,
software metrics, data mining and wireless sensor networks.
NAMED ENTITY RECOGNITION USING HIDDEN MARKOV MODEL (HMM)
Sudha Morwal1, Nusrat Jahan2 and Deepti Chopra3
1Associate Professor, Banasthali University, Jaipur, Rajasthan-302001
2M.Tech (CS), Banasthali University, Jaipur, Rajasthan-302001
3M.Tech (CS), Banasthali University, Jaipur, Rajasthan-302001
ABSTRACT
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP),
which is a branch of artificial intelligence. It has many applications, mainly in machine
translation, text-to-speech synthesis, natural language understanding, information extraction,
information retrieval and question answering. The aim of NER is to classify words into
predefined categories such as location name, person name, organization name, date and time.
In this paper we describe in detail a Hidden Markov Model (HMM) based machine
learning approach to identifying named entities. The main idea behind using an HMM
to build an NER system is that it is language independent, so the system can be applied to
any language domain. In our NER system the states are not fixed: they are dynamic,
and one can choose them according to one's interest. The corpus used by our NER system is also
not domain specific.
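As a rough illustration of how an HMM assigns entity tags, the sketch below runs Viterbi decoding over a sentence, with the tag set passed in as a parameter so that the states are not fixed. It is an assumption-laden example rather than the system described in the paper: the function name `viterbi`, the smoothing constant `unk` for unseen words, and the toy probabilities in the usage note are all invented for illustration. In practice the probabilities would be estimated from an annotated corpus.

```python
import math

def viterbi(words, states, start_p, trans_p, emit_p, unk=1e-6):
    """Most likely tag sequence for `words` under a first-order HMM."""
    # best[t][s] = (log-probability of the best path ending in s at position t,
    #               previous state on that path)
    best = [{s: (math.log(start_p[s])
                 + math.log(emit_p[s].get(words[0], unk)), None)
             for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            # Best predecessor state for s at this position.
            score, prev = max((best[-1][p][0] + math.log(trans_p[p][s]), p)
                              for p in states)
            column[s] = (score + math.log(emit_p[s].get(word, unk)), prev)
        best.append(column)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    tags = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        tags.append(state)
    return tags[::-1]
```

With a toy model over the hypothetical states `PER`, `LOC` and `O`, the call `viterbi(["sudha", "lives", "in", "jaipur"], states, start_p, trans_p, emit_p)` picks one tag per word. Changing the tag set only requires passing different `states` and probability tables.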
KEYWORDS
Named Entity Recognition (NER), Natural Language processing (NLP), Hidden Markov
Model (HMM).
FULL TEXT: http://airccse.org/journal/ijnlc/papers/1412ijnlc02.pdf
VOLUME URL: http://airccse.org/journal/ijnlc/vol1.html
REFERENCES
[1] Pramod Kumar Gupta, Sunita Arora, “An Approach for Named Entity Recognition
System for Hindi: An Experimental Study”, in Proceedings of ASCNT – 2009, CDAC,
Noida, India, pp. 103 – 108.
[2] Shilpi Srivastava, Mukund Sanglikar & D.C Kothari, “Named Entity Recognition System
for Hindi Language: A Hybrid Approach”, International Journal of Computational Linguistics
(IJCL), Volume (2): Issue (1): 2011. Available at:
http://cscjournals.org/csc/manuscript/Journals/IJCL/volume2/Issue1/IJCL-19.pdf
[3] Padmaja Sharma, Utpal Sharma, Jugal Kalita, “Named Entity Recognition: A Survey for
the Indian Languages” (Language in India www.languageinindia.com 11:5 May 2011 Special
Volume: Problems of Parsing in Indian Languages). Available at:
http://www.languageinindia.com/may2011/padmajautpaljugal.pdf
[4] Lawrence R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications
in Speech Recognition”, in Proceedings of the IEEE, Vol. 77, No. 2, February
1989. Available at: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
[5] Sujan Kumar Saha, Sudeshna Sarkar, Pabitra Mitra, “Gazetteer Preparation for Named
Entity Recognition in Indian Languages”, in the Proceedings of the 6th Workshop on Asian
Language Resources, 2008. Available at:
http://www.aclweb.org/anthology-new/I/I08/I08-7002.pdf
[6] B. Sasidhar, P. M. Yohan, Dr. A. Vinaya Babu, Dr. A. Govardhan, “A Survey on
Named Entity Recognition in Indian Languages with particular reference to Telugu”, in IJCSI
International Journal of Computer Science Issues, Vol. 8, Issue 2, March 2011. Available at:
http://www.ijcsi.org/papers/IJCSI-8-2-438-443.pdf
[7] GuoDong Zhou, Jian Su, “Named Entity Recognition using an HMM-based Chunk
Tagger”, in Proceedings of the 40th Annual Meeting of the Association for Computational
Linguistics (ACL), Philadelphia, July 2002, pp. 473-480.
[8] http://en.wikipedia.org/wiki/Forward–backward_algorithm
[9] http://en.wikipedia.org/wiki/Baum-Welch_algorithm
[10] Dan Shen, Jie Zhang, Guodong Zhou, Jian Su, Chew-Lim Tan, “Effective Adaptation of a
Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain”. Available
at: http://acl.ldc.upenn.edu/W/W03/W03-1307.pdf
AUTHORS
Sudha Morwal is an active researcher in the field of Natural Language
Processing. She currently works as Associate Professor in the Department of
Computer Science at Banasthali University (Rajasthan), India. She holds an
M.Tech (Computer Science), NET and M.Sc (Computer Science), and her PhD is in
progress at Banasthali University (Rajasthan), India.
Nusrat Jahan received her B.Tech degree in Computer Science and Engineering from R.N.
Modi Engineering College, Kota, Rajasthan in 2010. She is currently pursuing her
M.Tech degree in Computer Science and Engineering at Banasthali University,
Rajasthan. Her interests include Natural Language Processing and
Information Retrieval.
Deepti Chopra received her B.Tech degree in Computer Science and Engineering from
Rajasthan College of Engineering for Women, Jaipur, Rajasthan in 2011. She is currently
pursuing her M.Tech degree in Computer Science and Engineering at Banasthali
University, Rajasthan. Her research interests include Natural Language Processing.
SENTIMENT ANALYSIS FOR MODERN STANDARD ARABIC AND COLLOQUIAL
Hossam S. Ibrahim1, Sherif M. Abdou2 and Mervat Gheith
1Computer Science Department, Institute of Statistical Studies and Research (ISSR), Cairo
University, EGYPT
2Information Technology Department, Faculty of Computers and Information, Cairo
University, EGYPT
ABSTRACT
The rise of social media such as blogs and social networks has fueled interest in sentiment
analysis. With the proliferation of reviews, ratings, recommendations and other forms of online
expression, online opinion has turned into a kind of virtual currency for businesses looking to
market their products, identify new opportunities and manage their reputations; many
are therefore now looking to the field of sentiment analysis. In this paper, we present a
feature-based, sentence-level approach for Arabic sentiment analysis. Our approach uses a
lexicon of Arabic idioms and saying phrases as a key resource for improving the detection of
sentiment polarity in Arabic sentences, together with a novel and rich set of linguistically
motivated features (contextual intensifiers, contextual shifters and negation handling) and
syntactic features for conflicting phrases, which enhance the sentiment classification accuracy.
Furthermore, we introduce an automatically expandable, wide-coverage polarity lexicon of
Arabic sentiment words. The lexicon is seeded with gold-standard sentiment words that are
manually collected and annotated; it then expands automatically, detecting the sentiment
orientation of new sentiment words using a synset aggregation technique and free online
Arabic lexicons and thesauruses. Our data focus on modern standard Arabic (MSA) and
Egyptian dialectal Arabic tweets and microblogs (hotel reservation, product reviews, etc.).
The experimental results using our resources and techniques with an SVM classifier indicate
high performance levels, with accuracies of over 95%.
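The interplay of a polarity lexicon with intensifiers and negation can be illustrated with a small Python sketch. Everything in it is a stand-in: the English toy entries replace the Arabic lexicon, and the weights, the names `POLARITY`, `INTENSIFIERS`, `NEGATORS` and `sentence_features`, and the rule that a modifier applies to the next sentiment word are assumptions, not the feature definitions used in the paper. The two scores it returns are the kind of sentence-level features that could be passed to a classifier such as an SVM.

```python
# Toy stand-ins for a seed polarity lexicon and contextual modifiers (hypothetical values).
POLARITY = {"good": 1.0, "excellent": 1.0, "bad": -1.0, "terrible": -1.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
NEGATORS = {"not", "never"}

def sentence_features(tokens):
    """Return (positive_score, negative_score) for one tokenized sentence."""
    pos = neg = 0.0
    weight, negate = 1.0, False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True                      # flip the next sentiment word
        elif tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]        # strengthen the next sentiment word
        elif tok in POLARITY:
            score = POLARITY[tok] * weight
            if negate:
                score = -score
            if score > 0:
                pos += score
            else:
                neg += -score
            weight, negate = 1.0, False        # modifiers are consumed once applied
    return pos, neg
```

For example, `sentence_features("the room was very good".split())` yields a positive score boosted by the intensifier, while `sentence_features("not good".split())` moves the same word to the negative score.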
KEYWORDS
Sentiment Analysis, opinion mining, social network, sentiment lexicon, modern standard Arabic,
colloquial, natural language processing
FULL TEXT: http://airccse.org/journal/ijnlc/papers/4215ijnlc07.pdf
VOLUME URL: http://airccse.org/journal/ijnlc/vol4.html
REFERENCES
[1] A. Shoukry and A. Rafea, "Sentence-level Arabic sentiment analysis," in Collaboration
Technologies and Systems (CTS) International Conference, Denver, CO, USA, 2012, pp. 546-
550.
[2] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine
learning techniques," in Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP), 2002, pp. 79–86.
[3] D. Davidiv, O. Tsur, and A. Rappoport, "Enhanced Sentiment Learning Using Twitter Hash-
tags and Smileys," in Proceedings of the 23rd International Conference on Computational
Linguistics (Coling2010), Beijing, China, 2010, pp. 241–249.
[4] L. Barbosa and J. Feng, "Robust Sentiment Detection on Twitter from Biased and Noisy Data
" in Proceedings of the 23rd International Conference on Computational Linguistics (Coling),
2010.
[5] P. Turney, "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised
Classification of Reviews," in Proceedings of the 40th Annual Meeting on Association for
Computational Linguistics ACL '02, Stroudsburg, PA, USA, 2002, pp. 417-424.
[6] V. Hatzivassiloglou and K. R. McKeown, "Predicting the semantic orientation of adjectives,"
in Proceedings of the Joint ACL / EACL Conference, 1997, pp. 174–181.
[7] B. Pang and L. Lee, "Opinion mining and sentiment analysis," Foundations and Trends in
Information Retrieval vol. 2, pp. 1–135, 2008.
[8] M. Hu and B. Liu, "Mining and summarizing customer reviews " in Proceedings of the ACM
SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2004, pp. 168–177.
[9] B. Liu, "Sentiment Analysis and Subjectivity," in Handbook of Natural Language Processing,
Second ed: CRC Press, Taylor and Francis Group, 2010.
[10] P. Alexander and P. Patrick, "Twitter as a Corpus for Sentiment Analysis and Opinion
Mining " in Proceedings of the Seventh conference on International Language Resources and
Evaluation (LREC'10), European Language Resources Association ELRA, Valletta, Malta, 2010.
[11] C. Scheible and H. Schütze, "Bootstrapping Sentiment Labels For Unannotated Documents
With Polarity PageRank," in Proceedings of the Eighth International Conference on Language
Resources and Evaluation (LREC 2012), Istanbul, Turkey, 2012.
[12] C. Manning and D. Klein, "Optimization, maxent models, and conditional estimation
without magic," in Proceedings of the 2003 Conference of the North American Chapter of the
Association for Computational Linguistics on Human Language Technology, 2003, p. 8.
[13] A. Abbasi, H. Chen, and A. Salem, "Sentiment Analysis in Multiple Languages: Feature
Selection for Opinion Classification in Web Forums," ACM Transactions on Information
Systems, vol. 26, 2008.
[14] E. Riloff and J. Wiebe, "Learning extraction patterns for subjective expressions," in
Proceedings of the Conference on Empirical Methods in Natural Language Processing
(EMNLP), 2003.
[15] E. Riloff, J. Wiebe, and T. Wilson, "Learning subjective nouns using extraction pattern
bootstrapping," in Proceedings of the Conference on Natural Language Learning (CoNLL),
2003, pp. 25–32.
[16] M. Abdul-Mageed and M. Diab, "Subjectivity and Sentiment Annotation of Modern
Standard Arabic Newswire," in Proceedings of the Fifth Law Workshop (LAW V), Association
for Computational Linguistics, Portland, Oregon, 2011, pp. 110–118.
[17] M. Abdul-Mageed, M. Diab, and M. Korayem, "Subjectivity and sentiment analysis of
modern standard Arabic," in Proceedings of the 49th Annual Meeting of the Association for
Computational Linguistics, 2011.
[18] M. Abdul-Mageed, K. Sandra, and M. Diab, "SAMAR: A System for Subjectivity and
Sentiment Analysis of Arabic Social Media," in Proceedings of the 3rd Workshop on
Computational Approaches to Subjectivity and Sentiment Analysis, Jeju,Republic of Korea,
2012, pp. 19–28.
[19] A. Mourad and K. Darwish, "Subjectivity and Sentiment Analysis of Modern Standard
Arabic and Arabic Microblogs," in Proceedings of the 4th Workshop on Computational
Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), Atlanta, Georgia,
2013, pp. 55–64.
[20] M. Korayem, D. Crandall, and M. Abdul-Mageed, "Subjectivity and Sentiment Analysis of
Arabic: A Survey," in Advanced Machine Learning Technologies and Applications,
Communications in Computer and Information Science series 322, (Springer), AMLTA, 2012,
pp. 128-139.
[21]M. Abdul-Mageed and M. Diab, "AWATIF: A multi-genre corpus for Arabic subjectivity
and sentiment analysis," in Proceedings of the 8th International Conference on Language
Resources and Evaluation (LREC), Istanbul, Turkey, 2012a.
[22] M. Rushdi-Saleh, M. Martín-Valdivia, L. Ureña-López, and J. Perea-Ortega, "OCA:
Opinion corpus for Arabic," Journal of the American Society for Information Science and
Technology, vol. 62, pp. 2045–2054, 2011.
[23] M. Elarnaoty, S. AbdelRahman, and A. Fahmy, "A Machine Learning Approach for
Opinion Holder Extraction Arabic Language," CoRR, abs/1206.1011, vol. 3, 2012.
[24] M. Abdul-Mageed and M. Diab, "SANA: A Large Scale Multi-Genre, Multi-Dialect
Lexicon for Arabic Subjectivity and Sentiment Analysis," in Proceedings of The 9th edition of
the Language Resources and Evaluation Conference (LREC ), Reykjavik, Iceland, 2014.
[25] E. Refaee and V. Rieser, "An Arabic Twitter Corpus for Subjectivity and Sentiment
Analysis," in Proceedings of The 9th edition of the Language Resources and Evaluation
Conference (LREC 2014), Reykjavik, Iceland, 2014.
[26] M. Elmahdy, G. Rainer, M. Wolfgang, and A. Slim, "Survey on common Arabic language
forms from a speech recognition point of view," in proceeding of International conference on
Acoustics (NAG-DAGA), Rotterdam, Netherlands, 2009, pp. 63-66.
[27] J. C. Carletta, "Assessing agreement on classification tasks: the KAPPA statistic "
Computational Linguistics, vol. 22, pp. 249- 254, 1996.
[28] B. Liu, Sentiment Analysis and Opinion Mining, Morgan & Claypool Publishers, 2012.
[29] Basha, [Colloquial sayings: annotated and arranged by the first letter of the saying, with
a topical index]. Egypt: Al-Ahram Foundation - Al-Ahram Center for Translation and
Publishing, 1986.
[30] A. Saalan, [Encyclopedia of Egyptian popular sayings], First ed.
Egypt: Dar-alafkalarabia press, 2003.
[31] F. Husain, [Colloquial sayings, Egyptian folklore]. Egypt: General Egyptian Book
Organization GEBO, 1984.
[32] G. Taher. (2006). [Encyclopedia of public sayings - a
scientific study]. Available: http://books.google.com.eg/books?id=2CR_EKTjxRgC
[33] PROz. (2014). PROz website for Arabic Idioms/Maxims/Sayings (Jan 2014). Available:
http://www.proz.com/glossary-translations/
[34] M. Diab, "Towards an optimal POS tag set for Modern Standard Arabic processing," in
Proceedings of Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria,
2007.
[35] O. F. Zaidan and C. Callison-Burch, "Arabic dialect identification," Computational
Linguistics, vol. 40, pp. 171-202, March 2014.
[36] H. S. Ibrahim, S. M. Abdou, and M. Gheith, "Automatic expandable large-scale sentiment
lexicon of Modern Standard Arabic and Colloquial," in 16th International Conference on
Intelligent Text Processing and Computational Linguistics (CICLING), Cairo - Egypt, 2015.
[37] M. Sharifi and W. Cohen. (2008, May, 2014). “Finding domain specific polar words for
sentiment classification.” Available: http://www.cs.cmu.edu/~mehrbod/polarity_08.pdf
[38] J. YI, T. NASUKAWA, R. BUNESCU, and W. NIBLACK, "Sentiment analyzer:
Extracting sentiments about a given topic using natural language processing techniques " in
Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM), 2003, pp. 427–
434.
[39] Z. Fei, J. LIU, and G. WU, "Sentiment classification using phrase patterns," in Proceedings
of the 4th IEEE International Conference on Computer Information Technology, 2004, pp.
1147–1152.
[40] T. Joachims. (2008, Jan-2013). SVM-light: Support vector machine. Available:
http://svmlight.joachims.org/
SURVEY OF MACHINE TRANSLATION SYSTEMS IN INDIA
G V Garje1 and G K Kharate2
1Department of Computer Engineering and Information Technology, PVG’s College of
Engineering and Technology, Pune, India
2Principal, Matoshri College of Engineering and Research Centre, Nashik, India
ABSTRACT
Work in the area of machine translation has been going on for the last few decades, but
promising translation work began in the early 1990s thanks to advanced research in Artificial
Intelligence and Computational Linguistics. India is a multilingual and multicultural country
with a population of over 1.25 billion and 22 constitutionally recognized languages, which are
written in 12 different scripts. This calls for automated machine translation systems from
English to Indian languages and among Indian languages, so that people can exchange
information in their local language. Many usable machine translation systems have been
developed or are under development in India and around the world. The paper focuses on
the different approaches used in the development of Machine Translation Systems and
briefly describes some of these systems along with their features,
domains and limitations.
KEYWORDS
Machine Translation, Example-based MT, Transfer-based MT, Interlingua-based MT
FULL TEXT: http://airccse.org/journal/ijnlc/papers/2513ijnlc04.pdf
VOLUME URL: http://airccse.org/journal/ijnlc/vol2.html
REFERENCES
[1] Sitender & Seema Bawa, (2012) “Survey of Indian Machine Translation Systems”,
International Journal Computer Science and Technology, Vol. 3, Issue 1, pp. 286-290, ISSN :
0976-8491 (Online) | ISSN : 2229-4333 (Print)
[2] Sanjay Kumar Dwivedi & Pramod Premdas Sukhadeve, (2010) “Machine Translation
System in Indian Perspectives”, Journal of Computer Science 6 (10): 1082-1087, ISSN 1549-
3636, © 2010 Science
[3] John Hutchins, (2005) “Current commercial machine translation systems and computer-
based translation tools: system types and their uses”, International Journal of Translation
vol.17, no.1-2, pp.5-38.
[4] Vishal Goyal & Gurpreet Singh Lehal, (2009) “Advances in Machine Translation
Systems”, National Open Access Journal, Volume 9, ISSN 1930-2940
http://www.languageinindia.
[5] Latha R. Nair & David Peter S., (2012) “Machine Translation Systems for Indian
Languages”, International Journal of Computer Applications (0975 – 8887) Volume 39–
No.1
[6] Vishal Goyal & Gurpreet Singh Lehal, (2010) “Web Based Hindi to Punjabi Machine
Translation System”, International Journal of Emerging Technologies in Web Intelligence,
Vol. 2, no. 2, pp. 148-151, ACADEMY PUBLISHER
[7] Shachi Dave, Jignashu Parikh & Pushpak Bhattacharyya, (2002) “Interlingua-based
English-Hindi Machine Translation and Language Divergence”, Journal of Machine
Translation, pp. 251-304.
[8] Sudip Naskar & Shivaji Bandyopadhyay, (2005) “Use of Machine Translation in India:
Current status” AAMT Journal, pp. 25-31.
[9] Sneha Tripathi & Juran Krishna Sarkhel, (2010) “Approaches to Machine Translation”,
International journal of Annals of Library and Information Studies, Vol. 57, pp. 388-393
[10] Gurpreet Singh Josan & Jagroop Kaur, (2011) “Punjabi To Hindi Statistical Machine
Transliteration”, International Journal of Information Technology and Knowledge
Management , Volume 4, No. 2, pp. 459-463.
[11] S. Bandyopadhyay, (2004) "ANUBAAD - The Translator from English to Indian
Languages", in proceedings of the VIIth State Science and Technology Congress. Calcutta.
India. pp. 43-51
[12] R.M.K. Sinha & A. Jain, (2002) “AnglaHindi: An English to Hindi Machine-Aided
Translation System”, International Conference AMTA(Association of Machine Translation
in the Americas)
[13] Murthy. K, (2002) “MAT: A Machine Assisted Translation System”, In Proceedings of
Symposium on Translation Support System( STRANS-2002), IIT Kanpur. pp. 134-139.
[14] Lata Gore & Nishigandha Patil, (2002) “English to Hindi - Translation System”, In
proceedings of Symposium on Translation Support Systems. IIT Kanpur. pp. 178-184.
[15] Kommaluri Vijayanand, Sirajul Islam Choudhury & Pranab Ratna
“VAASAANUBAADA - Automatic Machine Translation of Bilingual Bengali-Assamese
News Texts”, in proceedings of Language Engineering Conference-2002, Hyderabad, India
© IEEE Computer Society.
[16] Bharati, R. Moona, P. Reddy, B. Sankar, D.M. Sharma & R. Sangal, (2003) “Machine
Translation: The Shakti Approach”, Pre-Conference Tutorial, ICON-2003.
[17] S. Mohanty & R. C. Balabantaray, (2004) “English to Oriya Translation System
(OMTrans)” cs.pitt.edu/chang/cpol/c087.pdf
[18] Ananthakrishnan R, Kavitha M, Jayprasad J Hegde, Chandra Shekhar, Ritesh Shah,
Sawani Bade & Sasikumar M., (2006) “MaTra: A Practical Approach to Fully- Automatic
Indicative EnglishHindi Machine Translation”, In the proceedings of MSPIL-06.
[19] G. S. Josan & G. S. Lehal, (2008) “A Punjabi to Hindi Machine Translation System”, in
proceedings of COLING-2008: Companion volume: Posters and Demonstrations,
Manchester, UK, pp. 157-160.
[20] Sanjay Chatterji, Devshri Roy, Sudeshna Sarkar & Anupam Basu, (2009) “A Hybrid
Approach for Bengali to Hindi Machine Translation”, In proceedings of ICON-2009, 7th
International Conference on Natural Language Processing, pp. 83-91.
[21] Vishal Goyal & Gurpreet Singh Lehal, (2011) “Hindi to Punjabi Machine Translation
System”, in proceedings of the ACL-HLT 2011 System Demonstrations, pages 1–6, Portland,
Oregon, USA, 21 June 2011.
[22] Ankit Kumar Srivastava, Rejwanul Haque, Sudip Kumar Naskar & Andy Way, (2008)
“The MATREX (Machine Translation using Example): The DCU Machine Translation
System for ICON 2008”, in Proceedings of ICON-2008: 6th International Conference on
Natural Language Processing, Macmillan Publishers, India,
http://ltrc.iiit.ac.in/proceedings/ICON-2008.
[23] hutchinsweb.me.uk/Nutshell-2005.pdf
[24] John Hutchins “Historical survey of machine translation in Eastern and Central Europe”,
Based on an unpublished presentation at the conference on Crosslingual Language
Technology in service of an integrated multilingual Europe, 4-5 May 2012, Hamburg,
Germany. (www.hutchinsweb.me.uk/Hamburg-2012.pdf)
[25] Sampark: Machine Translation System among Indian languages (2009)
http://tdildc.in/index.php?option=com_vertical&parentid=74, http://sampark.iiit.ac.in/
[26] Akshar Bharti, Chaitanya Vineet, Amba P. Kulkarni & Rajiv Sangal, (1997)
“ANUSAARAKA: Machine Translation in stages”, Vivek, a quarterly in Artificial
Intelligence, Vol. 10, No. 3, NCST Mumbai, pp. 22-25
[27] Akshar Bharti, Chaitanya Vineet, Amba P. Kulkarni & Rajiv Sangal, (2001)
“ANUSAARAKA: overcoming the language barrier in India”, published in Anuvad:
approaches to Translation
[28] Hemant Darabari, (1999) “Computer Assisted Translation System- An Indian
Perspective”, in proceedings of MT Summit VII, Thailand
[29] R. Mahesh K. Sinha & Anil
Thakur, (2005) “Machine Translation of Bi-lingual Hindi-English (Hinglish) Text”, in
proceedings of 10th Machine Translation Summit organized by Asia-Pacific Association for
Machine Translation (AAMT), Phuket, Thailand
[30] Parameswari K, Sreenivasulu N.V., Uma Maheshwar Rao G & Christopher M, (2012)
“Development of Telugu-Tamil Bidirectional Machine Translation System: A special focus
on case divergence”, in proceedings of 11th International Tamil Internet conference, pp 180-
191
[31] Salil Badodekar, (2004) “Translation Resources, Services and Tools for Indian
Languages”, a report of Centre for Indian Language Technology, IITB,
http://www.cfilt.iitb.ac.in/Translationsurvey/survey.pdf
[32] Ananthakrishnan R, Kavitha M, Jayprasad J Hegde, Chandra Shekhar, Ritesh Shah,
Sawani Bade & Sasikumar M, (2006) “MaTra: A Practical Approach to Fully-Automatic
Indicative English-Hindi Machine Translation”, in proceedings of the first national
symposium on Modelling and shallow parsing of Indian languages (MSPIL-06) organized by
IIT Bombay, 202.141.152.9/clir/papers/matra_mspil06.pdf
[33] CDAC Mumbai, (2008) “MaTra: an English to Hindi Machine Translation System”, a
report by CDAC Mumbai formerly NCST.
[34] Sanjay Chatterji, Praveen Sonare, Sudeshna Sarkar & Anupam Basu, (2011) “Lattice
Based Lexical Transfer in Bengali Hindi Machine Translation Framework”, in Proceedings
of ICON2011: 9th International Conference on Natural Language Processing, Macmillan
Publishers, India. Also accessible from ltrc.iiit.ac.in/proceedings/ICON-2011.
[35] R. Ananthakrishnan, Jayprasad Hegde, Pushpak Bhattacharyya, Ritesh Shah & M.
Sasikumar, (2008) “Simple Syntactic and Morphological Processing Can Help English-Hindi
Statistical Machine Translation”, in proceedings of International Joint Conference on NLP
(IJCNLP08), Hyderabad, India.
[36] Yanjun Ma, John Tinsley, Hany Hassan, Jinhua Du & Andy Way, (2008) “Exploiting
Alignment Techniques in MATREX: the DCU Machine Translation System for IWSLT
2008”, in proceedings of IWSLT 2008, Hawaii, USA
[37] projects.uptuwatch.com/cs-it/anubharti-an-hybrid-example-based-approach-for-machine-aidedtrapnslation/
[38] Sugata Sanyal & Rajdeep Borgohain, (2013) “Machine Translation Systems in India”,
Cornell University Library, arxiv.org/ftp/arxiv/papers/1304/1304.7728.pdf
[39] Antony P. J.,
(2013) “Machine Translation Approaches and Survey for Indian Languages”, International
journal of Computational Linguistics and Chinese Language Processing Vol. 18, No. 1, pp.
47-78.
[40] Manoj Jain & Om P. Damani, (2009) “English to UNL (Interlingua) Enconversion”, in
proceedings of 4th Language and Translation Conference (LTC-09).
[41] Smriti Singh, Mrugank Dalal, Vishal Vachhani, Pushpak Bhattacharyya & Om P.
Damani, (2007) “Hindi Generation from Interlingua (UNL)”, in proceedings of MT Summit,
2007
[42] language.worldofcomputing.net
[43] sampark.iiit.ac.in
[44] www.cdacmumbai.in/xlit
[45] www.cdacmumbai.in/rupantar
[46] translationjournal.net/journal/29computers.htm
[47] www.cfilt.iitb.ac.in/resources/surveys/MT-Literature%20Survey-2012-Somya.pdf
[48] www.cdacmumbai.in/e-ilmt
[49] www.iiit.net/ltrc/Anusaaraka/anu_home.html
[50] cdac.in/html/aai/mantra.asp
[51] translate.google.com/about/intl/en_ALL/
AUTHORS
G V Garje (gvg_comp@pvgcoet.ac.in) completed his ME in Computer Science and
Engineering at NITTR, Chandigarh, India in 1998. He currently works as
Associate Professor and Head of the Computer Engineering and
Information Technology Department at Pune Vidyarthi Griha’s College of
Engineering and Technology, Pune. He is presently Chairman, Board of
Studies in Information Technology, University of Pune, and was formerly
Chairman, Board of Studies in Computer Engineering, University of Pune.
He is pursuing his Ph.D. at the University of Pune, Maharashtra,
India, in the area of Machine Translation. His areas of research are NLP and
Machine Translation, specifically the English-Marathi language pair. He has developed a tool
for translating simple English interrogative sentences to Marathi sentences, funded by the
University of Pune. His areas of interest are Data Structures, Operating Systems and
Software Architecture.
G K Kharate (gkkharate@rediffmail.com) completed his Ph.D. in
Electronics and Telecommunication Engineering at the University of Pune
and his ME in Electronics and Communication at Walchand College of
Engineering, Sangali, Maharashtra. He is currently Principal at Matoshri
College of Engineering and Research Centre, Nashik, Maharashtra. He is
Dean, Faculty of Engineering, and Member of the Management Council,
University of Pune. He is a former Chairman, Board of Studies in Electronics
Engineering, University of Pune. His areas of research are Image Processing, Pattern
Recognition and Artificial Intelligence. His areas of interest are Digital Electronics,
Computer Networks, Image Processing and Natural Language Processing.
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
Deepti Bhalla1, Nisheeth Joshi2 and Iti Mathur3
1,2,3Apaji Institute, Banasthali University, Rajasthan, India
ABSTRACT
Machine transliteration has emerged as an important research area in the field of machine
translation. Transliteration aims to preserve the phonological structure of words. Proper
transliteration of named entities plays a significant role in improving the quality of machine
translation. In this paper we perform machine transliteration for the English-Punjabi
language pair using a rule-based approach. We have constructed rules for syllabification,
the process of separating a word into its syllables. We calculate probabilities for named
entities (proper names and locations). For words that are not named entities, separate
probabilities are calculated by relative frequency using the statistical machine translation
toolkit MOSES. Using these probabilities, we transliterate the input text from English to
Punjabi.
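The relative-frequency estimate mentioned in the abstract can be illustrated with a minimal sketch. It assumes syllable units have already been aligned between the two scripts; the function names and the tiny aligned list are hypothetical illustrations and are not taken from the paper or from the MOSES toolkit.

```python
from collections import Counter, defaultdict

def relative_frequencies(aligned_pairs):
    """Estimate P(target | source) as count(source, target) / count(source)."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return dict(table)

def best_target(table, source_unit):
    """Pick the most probable target unit, or return the input unchanged."""
    candidates = table.get(source_unit)
    if not candidates:
        return source_unit
    return max(candidates, key=candidates.get)

# Hypothetical aligned syllable pairs (English unit, Punjabi unit).
pairs = [("ka", "ਕਾ"), ("ka", "ਕਾ"), ("ka", "ਕ"), ("ma", "ਮਾ")]
table = relative_frequencies(pairs)
```

With this toy data, `best_target(table, "ka")` selects the target unit seen most often with "ka"; an unseen unit is passed through unchanged, which is only one of several possible fallback choices.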
KEYWORDS
Machine Translation, Machine Transliteration, Name entity recognition, Syllabification
FULL TEXT: http://airccse.org/journal/ijnlc/papers/2213ijnlc07.pdf
VOLUME URL: http://airccse.org/journal/ijnlc/vol2.html
REFERENCES
[1] Kamal Deep and Vishal Goyal, (2011) ”Development of a Punjabi to English
transliteration system”. In International Journal of Computer Science and Communication
Vol. 2, No. 2, pp. 521-526.
[2] Shubhangi Sharma, Neha Bora and Mitali Halder, (2012) “English-Hindi Transliteration
using Statistical Machine Translation in different Notation” International Conference on
Computing and Control Engineering (ICCCE 2012).
[3] Kamal Deep, Dr.Vishal Goyal, (2011) “Hybrid Approach for Punjabi to English
Transliteration System” International Journal of Computer Applications (0975 – 8887)
Volume 28– No.1.
[4] Jasleen Kaur, Gurpreet Singh Josan, (2011) “Statistical Approach to Transliteration from
English to Punjabi”, In Proceeding of International Journal on Computer Science and
Engineering (IJCSE), Vol. 3 Issue 4, p1518.
[5] Er. Sheilly Padda, Rupinderdeep Kaur, Er. Nidhi, (2012) “Punjabi Phonetic: Punjabi Text
to IPA Conversion” International Journal of Emerging Technology and Advanced
Engineering Website: www.ijetae.com ISSN 2250-2459, Volume 2, Issue 10.
[6] Gurpreet Singh Josan, Gurpreet Singh Lehal, (2010) “A Punjabi to Hindi Machine
Transliteration System” Computational Linguistics and Chinese Language Processing Vol.
15, No. 2, pp. 77-102.
[7] Manikrao L Dhore, Shantanu K Dixit, Tushar D Sonwalkar, (2012) “Hindi to English
Machine Transliteration of Named Entities using Conditional Random Fields.” International
Journal of Computer Applications;6/15/2012, Vol. 48, p31.
[8] Musa, Hafiz, Rabith A.kadir, Azreen Azman, M.taufik Abadullah, (2011) "Syllabification
algorithm based on syllable rules matching for Malay language." Proceedings of the 10th
WSEAS international conference on Applied computer and applied computational science.
World Scientific and Engineering Academy and Society (WSEAS).
[9] To download IRSTLM toolkit http://www.statmt.org
[10] Jenny Rose Finkel, Trond Grenager, and Christopher Manning, (2005) Incorporating
Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings
of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005),
pp. 363-370.
[11] Daniel Jurafsky, James H. Martin Speech and Language processing An Introduction to
speech Recognition, natural language processing, and computational linguistics.
AUTHORS
Deepti Bhalla is pursuing her M.Tech in Computer Science at Banasthali
University, Rajasthan, and is working as a Research Assistant on the English-Indian
Languages Machine Translation System Project sponsored by the TDIL
Programme, DEITY. She is interested in Machine Translation, specifically the
English-Punjabi language pair, and has developed various tools for Punjabi language
processing. Her current research interests include Natural Language
Processing and Machine Translation.
Nisheeth Joshi is a researcher working in the area of Machine Translation.
He has been working primarily on the design and development of evaluation
metrics for Indian languages. He is also actively involved in the
development of MT engines for English to Indian Languages. He is one of
the experts empanelled with the TDIL Programme, Department of Electronics and
Information Technology (DEITY), Govt. of India, a premier organization which oversees
Language Technology Funding and Research in India. He has several publications in various
journals and conferences and also serves on the Programme Committees and
Editorial Boards of several conferences and journals.
Iti Mathur is an assistant professor at Banasthali University. Her primary
area of research is computational semantics and ontological engineering.
Besides this she is also involved in the development of MT engines for
English to Indian Languages. She is one of the experts empanelled with the TDIL
Programme, Department of Electronics and Information Technology (DEITY), Govt. of India, a
premier organization which oversees Language Technology Funding and Research in India.
She has several publications in various journals and conferences and also serves on the
Programme Committees and Editorial Boards of several conferences and journals.
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING
SVM
P H Rathod1, M L Dhore2, R M Dhore3
1,2 Department of Computer Engineering, Vishwakarma Institute of Technology, Pune
3 Pune Vidhyarthi Griha’s College of Engineering and Technology, Pune
ABSTRACT
Language transliteration is an important area in NLP. Transliteration is useful for
converting named entities (NEs) written in one script into another script in NLP
applications such as Cross Lingual Information Retrieval (CLIR), multilingual voice chat
applications and real-time Machine Translation (MT). The most important requirement of
a transliteration system is to preserve the phonetic properties of the source language after
transliteration into the target language. In this paper, we propose named entity
transliteration for the Hindi to English and Marathi to English language pairs using a
Support Vector Machine (SVM). In the proposed approach, the source named entity is
segmented into transliteration units, so the transliteration problem can be viewed as a
sequence labeling problem. The phonetic units are classified using the polynomial kernel
function of the SVM. The proposed approach uses the phonetics of the source language
and n-grams as the two features for transliteration.
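As a rough illustration of the two ingredients named in the abstract, the sketch below builds character n-gram counts for a transliteration unit and evaluates a polynomial kernel between two such feature vectors. The padding symbol, the romanized example units and the default `degree`, `gamma` and `coef0` values are assumptions made for the example; the paper's actual feature encoding and kernel parameters are not given here.

```python
from collections import Counter

def char_ngrams(unit, n=2):
    """Count character n-grams of a transliteration unit, padded with '#'."""
    padded = "#" + unit + "#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree on sparse count vectors."""
    dot = sum(count * y[gram] for gram, count in x.items())
    return (gamma * dot + coef0) ** degree

# Hypothetical romanized units; "ra" and "ram" share the bigrams "#r" and "ra".
similarity = polynomial_kernel(char_ngrams("ra"), char_ngrams("ram"))
```

An SVM never needs the expanded feature space explicitly; it only needs kernel values like `similarity` between pairs of training units, which is why the kernel is shown on its own.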
KEYWORDS
Machine Transliteration, n-gram, Support Vector Machine, Syllabification.
FULL TEXT: http://airccse.org/journal/ijnlc/papers/2413ijnlc04.pdf
VOLUME URL: http://airccse.org/journal/ijnlc/vol2.html
REFERENCES
[1] Padariya Nilesh, Chinnakotla Manoj, Nagesh Ajay, Damani Om P.(2008) “Evaluation of
Hindi to English, Marathi to English and English to Hindi”, IIT Mumbai CLIR at FIRE.
[2] Saha Sujan Kumar, Ghosh P. S, Sarkar Sudeshna and Mitra Pabitra (2008) “Named entity
recognition in Hindi using maximum entropy and transliteration.”
[3] BIS (1991) “Indian standard code for information interchange (ISCII)”, Bureau of Indian
Standards, New Delhi.
[4] Joshi R K, Shroff Keyur and Mudur S P (2003) “A Phonemic code based scheme for
effective processing of Indian languages”, National Centre for Software Technology,
Mumbai, 23rd Internationalization and Unicode Conference, Prague, Czech Republic, pp 1-
17.
[5] Arbabi M, Fischthal S M, Cheng V C and Bart E (1994) “Algorithms for Arabic name
transliteration”, IBM Journal of Research and Development, pp 183-194.
[6] Knight Kevin and Graehl Jonathan (1997) “Machine transliteration”, In proceedings of
the 35th annual meetings of the Association for Computational Linguistics, pp 128-135.
[7] Stalls Bonnie Glover and Kevin Knight (1998) “Translating names and technical terms in
Arabic text.”
[8] Al-Onaizan Y, Knight K (2002) “Machine translation of names in Arabic text”,
Proceedings of the ACL conference workshop on computational approaches to Semitic
languages.
[9] Jaleel Nasreen Abdul and Larkey Leah S. (2003) “Statistical transliteration for English-
Arabic cross language information retrieval”, In Proceedings of the 12th international
conference on information and knowledge management, pp 139 – 146.
[10] Jung S. Y., Hong S., S., Paek E.(2003) “English to Korean transliteration model of
extended Markov window”, In Proceedings of the 18th Conference on Computational
Linguistics, pp 383–389.
[11] Ganapathiraju M., Balakrishnan M., Balakrishnan N., Reddy R. (2005) “OM: One Tool
for Many (Indian) Languages”, ICUDL: International Conference on Universal Digital
Library, Hangzhou.
[12] Malik M G A (2006) “Punjabi Machine Transliteration”, Proceedings of the 21st
International Conference on Computational Linguistics and the 44th annual meeting of the
ACL, pp 1137–1144.
[13] Sproat R.(2002) “Brahmi scripts, In Constraints on Spelling Changes”, Fifth
International Workshop on Writing Systems, Nijmegen, The Netherlands.
[14] Sproat R.(2003) “A formal computational analysis of Indic scripts”, In International
Symposium on Indic Scripts: Past and Future, Tokyo.
[15] Sproat R.(2004) “A computational theory of writing systems, In Constraints on Spelling
Changes”, Fifth International Workshop on Writing Systems, Nijmegen, The Netherlands.
[16] Kopytonenko M., Lyytinen K., and Kärkkäinen T. (2006) “Comparison of phonological
representations for the grapheme-to-phoneme mapping, In Constraints on Spelling Changes”,
Fifth International Workshop on Writing Systems, Nijmegen, The Netherlands.
[17] Ganesh S, Harsha S, Pingali P, and Verma V (2008) “Statistical transliteration for cross
language information retrieval using HMM alignment and CRF”, In Proceedings of the
Workshop on CLIA, Addressing the Needs of Multilingual Societies.
[18] Sumaja Sasidharan, Loganathan R, and Soman K P (2009) “English to Malayalam
Transliteration Using Sequence Labeling Approach” International Journal of Recent Trends
in Engineering, Vol. 1, No. 2, pp 170-172
[19] Oh Jong-Hoon, Kiyotaka Uchimoto, and Kentaro Torisawa (2009) “Machine
transliteration using target-language grapheme and phoneme: Multi-engine transliteration
approach”, Proceedings of the Named Entities Workshop ACL-IJCNLP Suntec,
Singapore,AFNLP, pp 36–39
[20] Antony P.J, Soman K.P (2010) “Kernel Method for English to Kannada Transliteration”,
Conference on Machine Learning and Cybernetics, pp 11-14
[21] Ekbal A. and Bandyopadhyay S. (2007) “A Hidden Markov Model based named entity
recognition system: Bengali and Hindi as case studies”, Proceedings of 2nd International
conference in Pattern Recognition and Machine Intelligence, Kolkata, India, pp 545–552.
[22] Ekbal A. and Bandyopadhyay S. (2008) “Bengali named entity recognition using
support vector machine”, In Proceedings of the IJCNLP-08 Workshop on NER for South and
South East Asian languages, Hyderabad, India, pp 51–58.
[23] Ekbal A. and Bandyopadhyay S. (2008), “Development of Bengali named entity tagged
corpus and its use in NER system”, In Proceedings of the 6th Workshop on Asian Language
Resources.
[24] Ekbal A. and Bandyopadhyay S. (2008) “A web-based Bengali news corpus for named
entity recognition”, Language Resources & Evaluation, vol. 42, pp 173–182.
[25] Ekbal A. and Bandyopadhyay S.(2008) “Improving the performance of a NER system
by postprocessing and voting”, In Proceedings of Joint IAPR International Workshop on
Structural Syntactic and Statistical Pattern Recognition, Orlando, Florida, pp 831–841.
[26] Ekbal A. and Bandyopadhyay S.(2009) “Bengali Named Entity Recognition using
Classifier Combination”, In Proceedings of Seventh International Conference on Advances in
Pattern Recognition, pp 259–262.
[27] Ekbal A. and Bandyopadhyay S. (2009) “Voted NER system using appropriate
unlabelled data”, In Proceedings of the Named Entities Workshop, ACL-IJCNLP.
[28] Ekbal A. and Bandyopadhyay S. (2010) “ Named entity recognition using appropriate
unlabeled data, post-processing and voting”, In Informatica, Vol 34, No. 1, pp 55-76.
[29] Chinnakotla Manoj K., Damani Om P., and Satoskar Avijit (2010) “Transliteration for
Resource-Scarce Languages”, ACM Trans. Asian Lang. Inform., Article 14, pp 1-30.
[30] Kishorjit Nongmeikapam (2012) “Transliterated SVM Based Manipuri POS Tagging”,
Advances in Computer Science and Engineering and Applications, pp 989-999
[31] K.P.Sonam, V. Ajay, R. Laganatha.(2009) “Machine Learning with SVM and Other
Kernel Methods”, Machine Learning Book, PHI.
[32] Koul Omkar N. (2008) “Modern Hindi Grammar”, Dunwoody Press
[33] Walambe M. R. (1990) “Marathi Shuddalekhan”, Nitin Prakashan, Pune
[34] Walambe M. R. (1990) “Marathi Vyakran”, Nitin Prakashan, Pune.
[35] Dhore M L, Dixit S K and Dhore R M (2012) “Hindi and Marathi to English NE
Transliteration Tool using Phonology and Stress Analysis”, 24th International Conference on
Computational Linguistics, Proceedings of COLING Demonstration Papers, at IIT Bombay,
pp 111-118
AUTHORS
P H Rathod (pravin.rathod@vit.edu) completed his BE in Information Technology at
Government College of Engineering, Karad, Maharashtra, India, in 2008.
He completed his ME in Computer Science and Engineering at
Vishwakarma Institute of Technology, Pune, India in 2013. He is currently
working as Assistant Professor in the Department of Computer Engineering at
Vishwakarma Institute of Technology, Pune. He is interested in Machine
Translation and Machine Transliteration, specifically in Devanagari-English language pairs.
His current areas of research are Mobile Ad hoc Networks, Internet Routing Algorithms,
Computer Networking, Machine Translation and Transliteration.
M. L. Dhore (manikrao.dhore@vit.edu) completed his ME in Computer
Science and Engineering at NITTR, Chandigarh, India in 1998. He is currently
working as Associate Professor in the Department of Computer Engineering
at Vishwakarma Institute of Technology, Pune. He is pursuing his
Ph.D. at the University of Solapur, Maharashtra, India, in the area of
Computational Linguistics. He is interested in Machine Translation and
Machine Transliteration, specifically in the Marathi-English and Hindi-English language pairs.
He has developed tools for Devanagari to English Machine Transliteration for online
web-based commercial applications. His current areas of research are Internet Routing
Algorithms, Computer Networking, Machine Translation and Transliteration.
Ruchi M Dhore (ruchidhore93@gmail.com) is a third-year Computer
Engineering student at Pune Vidyarthi Griha’s College of Engineering and
Technology, Pune, Maharashtra, India. She is a scholar student of her college
and has secured distinction every year in the University of Pune examinations.
She is a strong programmer and has won prizes in state-level and
national-level competitions. Her areas of research interest include Text
Processing and Pattern Searching. She would like to build her career in the development of
language processing tools for the Marathi language.
HYBRID PART-OF-SPEECH TAGGER FOR NON-VOCALIZED ARABIC TEXT
Meryeme Hadni1, Said Alaoui Ouatik1, Abdelmonaime Lachkar2 and Mohammed Meknassi1
1 FSDM, Sidi Mohamed Ben Abdellah University (USMBA), Morocco
2 E.N.S.A, Sidi Mohamed Ben Abdellah University (USMBA), Morocco
ABSTRACT
Part-of-speech (POS) tagging plays a crucial role in many fields of natural language
processing (NLP), including Speech Recognition, Natural Language Parsing, Information
Retrieval and Multi Words Term Extraction. This paper proposes an efficient and accurate
POS tagging technique for the Arabic language using a hybrid approach. Because of
ambiguity, the Arabic rule-based method suffers from misclassified and unanalyzed words.
To overcome these two problems, we propose a Hidden Markov Model (HMM) integrated with
the Arabic rule-based method. Our POS tagger generates a set of three POS tags: Noun, Verb,
and Particle. The proposed technique uses the contextual information of the words
together with a variety of features that help to predict the POS classes. To
evaluate its accuracy, the proposed method was trained and tested with two corpora: the
Holy Quran Corpus and the Kalimat Corpus for undiacritized Classical Arabic. The
experimental results demonstrate the efficiency of our method for Arabic POS tagging. On
the Holy Quran Corpus, the accuracies obtained are 97.6%, 96.8% and 94.4% for our Hybrid
Tagger, the HMM Tagger and the Rule-Based Tagger, respectively. On the
Kalimat Corpus, we obtained 94.60%, 97.40% and 98% for the Rule-Based Tagger,
the HMM Tagger and our Hybrid Tagger, respectively.
KEY WORDS
Part-Of-Speech Tagger, Natural Language Applications, Natural Language Parsing, Hidden
Markov Model, Multi Words Term Extraction, Speech Recognition.
FULL TEXT : http://airccse.org/journal/ijnlc/papers/2613ijnlc01.pdf
VOLUME URL: http://airccse.org/journal/ijnlc/vol2.html
REFERENCE
[1] http://en.wikipedia.org/wiki/Part-of-speech_tagging.
[2] L.Van Guilder, (1995) “Automated Part of Speech Tagging: A Brief Overview” Handout
for LING361, Georgetown University.
[3] H. Halteren, J.Zavrel & Walter Daelemans (2001).Improving Accuracy in NLP Through
Combination of Machine Learning Systems. Computational Linguistics. 27(2): 199–229.
[4] DeRose & J.Steven (1990) "Stochastic Methods for Resolution of Grammatical Category
Ambiguity in Inflected and Uninflected Languages." PhD.Dissertation. Providence, RI:
Brown University Department of Cognitive and Linguistic Sciences.
[5] N. Kumar Kumar, Anikel Dalal & Uma Sawant (2006) “Hindi part of speech tagging and
chunking”, NLPAI machine learning contest.
[6] M. Mohseni, H. Motalebi, B. Minaei-bidgoli & M. Shokrollahi-far (2008) “A farsi part-
of-speech tagger based on markov”. In the proceedings of ACM symposium on Applied
computing, Brazil.
[7] S. Jabbari & B. Allison (2007) “Persian Part of Speech Tagging”, In the Proceedings of
Workshop on Computational Approaches to Arabic Script-Based Languages (CAASL-2),
USA.
[8] E. Brill (1995) “Transformation-Based Error-Driven Learning and Natural Language
Processing: A case Study in Part of Speech Tagging”, Computational Linguistics, USA.
[9] M. Hepple (2000), ”Independence and Commitment: Assumptions for Rapid Training and
Execution of Rule-based Part of-Speech Taggers”, In Proceedings of the 38th Annual
Meeting of the Association for Computational Linguistics (ACL). Hong Kong.
[10] T. Brants (2000), “TNT – a Statistical Part-of-Speech Tagger”, In the Proceedings of 6th
conference on applied natural language processing (ANLP), USA.
[11] K. Megerdoomian (2004), “Developing a Persian part-of speech tagger”, In the
Proceedings of first Workshop on Persian Language and computer, Iran .
[12] Khoja, S.( 2001) “ APT: Arabic part-of-speech tagger”. Proceeding of the Student
Workshop at the 2nd Meeting of the NAACL, (NAACL’01), Carnegie Mellon University,
Pennsylvania, pp: 1- 6. http://zeus.cs.pacificu.edu/shereen/NAACL.pdf
[13] Freeman A (2001), “Brill’s POS tagger and a morphology parser for Arabic”, In
ACL’01 Workshop on Arabic language processing.
[14] Maamouri M, Cieri C. (2002). “Resources for Arabic Natural Language Processing at
the LDC”, Proceedings of the International Symposium on the Processing of Arabic,Tunisia,
pp.125-146.
[15] Diab M., Hacioglu K. and Jurafsky D. (2004), “Automatic Tagging of Arabic Text:
From Raw Text to Base Phrase Chunks”. proc. of HLTNAACL’04: 149–152.
[16] Banko M, Moore R. C. (2004). “Part of Speech Tagging in Context”, Proc of the 20th
international conference on Computational Linguistics, Switzerland.
[17] Tlili-Guiassa Y. (2006) “Hybrid Method for Tagging Arabic Text”. Journal of Computer
Science 2 (3): 245-248.
[18] L. Young-Suk, K. Papineni & S. Roukos ( 2003), “Language Model Based Arabic Word
Segmentation,” in Proceedings of the Annual Meeting on Association for Computational
Linguistics, Japan, pp. 399- 406.
[19] A.T Al-Taani & S. Abu-Al-Rub (2009),”A rule-based approaches for tagging non-
vocalized Arabic words”. The International Arab Journal of Information Technology,
Volume6 (3): 320-328.
[20] T. Brants (2000),” TnT: A statistical part of speech tagger”, Proceedings of the 6th
Conference on Applied Natural Language Processing, Apr. 29- May 04, Association for
Computational Linguistics Morristown, New Jersey, USA., pp: 224-231.
[21] NLTK, Natural Language Toolkit. http://www.nltk.org/Home
[22] Quranic Arabic Corpus: http://corpus.quran.com
[23] Quran Tagset: http://corpus.quran.com/documentation/tagset.jsp
[24] N. Habash & O. Rambow (2005), “Arabic Tokenization, Part-of-Speech Tagging and
Morphological Disambiguation in One Fell Swoop,” in Proceedings of the Annual Meeting
on Association for Computational Linguistics, Michigan, pp. 573-580.
[25] http://sibawayh.emi.ac.ma/web/s/?q=node/79
[26] http://bit.ly/16jO3Ks
[27] http://www.alwatan.com/
[28] F. Al Shamsi & A.Guessoum(2006),” A Hidden Markov Model–Based POS Tagger for
Arabic”, 8es Journées internationales d’Analyse statistique des Données Textuelles (JADT).
[29] M. Albared & O.Nazlia(2010),” Automatic Part of Speech Tagging for Arabic: An
Experiment Using Bigram Hidden Markov Model “,Springer-Verlag Berlin Heidelberg,
LNAI 6401, pp. 361– 370.
[30] Y.O. Mohamed Elhadj(2009),” Statistical Part-of-Speech Tagger for Traditional Arabic
Texts”, Journal of Computer Science 5 (11): 794-800.
Authors
Miss Meryeme Hadni is a PhD student in the Laboratory of Computer and Modelization,
Faculty of Sciences, University Sidi Mohamed Ben Abdellah (USMBA), Fez,
Morocco. She has presented papers at national and
international conferences.
Pr. Abdelmonaime LACHKAR received his PhD degree from the USMBA,
Morocco in 2004. He is Professor and Computer Engineering Program Coordinator
at E.N.S.A, Fez, and Head of the Systems Architecture and Multimedia Team
(LSIS Laboratory) at Sidi Mohamed Ben Abdellah University, Fez, Morocco. His
current research interests include Arabic Natural Language Processing (ANLP),
Arabic Web Document Clustering and Categorization, Arabic Information Retrieval Systems,
Arabic Text Summarization, Arabic Ontologies development and usage, and Arabic Semantic
Search Engines (SSEs).
Pr. Said Alaoui Ouatik is working as a Professor in the Department of Computer
Science, Faculty of Science Dhar EL Mahraz (FSDM), Fez, Morocco. His
research interests include high-dimensional indexing and content-based retrieval,
Arabic Document Categorization, and 2D/3D Shapes Indexing and Retrieval in large
3D Objects Databases.
Mohammed Meknassi received his Ph.D. degree in computer sciences from Montreal
University in 1993. Since 1993, he has been a professor of computer sciences. He teaches
and conducts research in the following fields: Parallel Processing,
Distributed Computing, Operating Systems and Image Processing. He is a member
of the research unit Systems Image and Multimedia (SIM), attached to the
laboratory Computer Sciences, Statistics and Quality (LISQ). He is the head of the Computer
Sciences Department in the Faculty of Sciences Dhar El Mahraz of Fez.
HYBRID APPROACHES FOR AUTOMATIC VOWELIZATION OF ARABIC
TEXTS
Mohamed Bebah1, Chennoufi Amine2, Mazroui Azzeddine3 and Lakhouaja Abdelhak4
1 Arab Center for Research and Policy Studies, Doha, Qatar
2,3,4 Faculty of Sciences/University Mohamed I, Oujda, Morocco
ABSTRACT
Hybrid approaches for automatic vowelization of Arabic texts are presented in this article.
The process is made up of two modules. In the first, a morphological analysis of the words
of the text is performed using the open-source morphological analyzer AlKhalil Morpho Sys.
For each word, analyzed out of context, the output is its set of possible vowelizations.
Integrating this analyzer into our vowelization system required the addition of a lexical
database containing the most frequent words in the Arabic language. The second module
aims to eliminate the ambiguities using a statistical approach based on two hidden Markov
models (HMMs). In the first HMM, the unvowelized Arabic words are the observed
states and the vowelized words are the hidden states. The observed states of the second
HMM are identical to those of the first, but the hidden states are the lists of possible diacritics
of the word without its Arabic letters. Our system uses the Viterbi algorithm to select the
optimal path among the solutions proposed by AlKhalil Morpho Sys. Our approach opens an
important way to improve the performance of automatic vowelization of Arabic texts for
other uses in automatic natural language processing.
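The path selection step described for the first HMM can be sketched as a small Viterbi search restricted to the candidate vowelizations of each word. This is only an illustration under stated assumptions: the candidate lists stand in for the analyzer's output, the word forms are Latin-letter placeholders rather than Arabic, and the probabilities and the `floor` smoothing value are invented for the example. Because every candidate maps back to exactly one unvowelized word, the emission probability is treated as 1 and omitted.

```python
import math

def viterbi(words, candidates, start_p, trans_p, floor=1e-6):
    """Return the most probable sequence of vowelized forms.

    words: unvowelized tokens (observations).
    candidates: maps each word to its possible vowelized forms.
    start_p / trans_p: hypothetical initial and bigram transition
    probabilities; unseen events fall back to `floor`.
    """
    # best[state] = (log-probability of the best path ending here, the path)
    best = {s: (math.log(start_p.get(s, floor)), [s])
            for s in candidates[words[0]]}
    for word in words[1:]:
        step = {}
        for s in candidates[word]:
            score, path = max(
                (prev_score + math.log(trans_p.get((prev, s), floor)), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[s] = (score, path + [s])
        best = step
    return max(best.values())[1]

# Placeholder data, not real analyzer output.
words = ["ktb", "drs"]
candidates = {"ktb": ["kataba", "kutub"], "drs": ["darsun", "darasa"]}
start_p = {"kataba": 0.6, "kutub": 0.4}
trans_p = {("kataba", "darsun"): 0.7, ("kataba", "darasa"): 0.1,
           ("kutub", "darsun"): 0.3, ("kutub", "darasa"): 0.2}
path = viterbi(words, candidates, start_p, trans_p)
```

Working in log space avoids numeric underflow on long sentences; restricting each step to the analyzer's candidates keeps the search small compared with a search over the full vocabulary.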
KEYWORDS
Arabic language, Automatic vowelization, morphological analysis, hidden Markov model,
corpus
FULL TEXT: http://airccse.org/journal/ijnlc/papers/3414ijnlc04.pdf
VOLUME URL: http://airccse.org/journal/ijnlc/vol3.html
REFERENCE
[1] Debili, Fathi & Hadhemi Achour (1998) Voyellation automatique de l’arabe. In
Proceedings of the workshop on Computation approaches to Semitic languages, COLING-
ACL ’98, pages 42–49.
[2] Maamouri, Mohamed, Ann Bies, and Seth Kulick. (2006) Diacritization: a challenge to
Arabic treebank annotation and parsing. In Proceedings of the British Computer Society
Arabic NLP/MT Conference.
[3] Zitouni, Imed, Jeffrey S. Sorensen, and Ruhi Sarikaya. (2006) Maximum entropy based
restoration of arabic diacritics. In Proceedings of the 21st International Conference on
Computational Linguistics and 44th Annual Meeting of the Association for Computational
Linguistics. Workshop on Computational approaches to Semitic Languages, Sydney,
Australia. July 2006, pages 577– 584.
[4] Vergyri, Dimitra & Katrin Kirchhoff. (2004) Automatic diacritization of arabic for
acoustic modeling in speech recognition. In Proceedings of the Workshop on Computational
Approaches to Arabic Script-based Languages. COLING, Geneva, pages 66–73.
[5] Messaoudi, Abdel, Lori Lamel, and Jean-Luc Gauvain. (2004) The limsi rt04 b arabic
system. In Proceedings DARPA RT04, Palisades NY.
[6] Elshafei, Moustafa, Husni Al-Muhtaseb, and Mansour Alghamdi. (2006) Machine
generation of arabic diacritical marks. In The 2006 World Congress in Computer Science
Computer Engineering, and Applied Computing. Las Vegas, USA., pages 128–133.
[7] Emam, Ossama and Volker Fischer. (2005) Hierarchical approach for the statistical
vowelization of arabic text. Technical report, IBM Corporation Intellectual Property Law,
Austin, TX, US.
[8] Schlippe, Tim, ThuyLinh Nguyen, and Stephan Vogel. (2008) Diacritization as a
machine translation problem and as a sequence labeling problem. In 8th AMTA conference,
Hawaii, pages 21–25.
[9] Gal, Yaakov. (2002) An hmm approach to vowel restoration in arabic and hebrew. In
Proceedings of the Workshop on Computational Approaches to Semitic Languages-
Philadelphia- Association for Computational Linguistics, pages 27–33.
[10] Nelken, Rani and Stuart M. Shieber. (2005) Arabic diacritization using weighted finite-
state transducers. In Proceedings of the ACL 2005 Workshop On Computational Approaches
To Semitic Languages, Ann Arbor, Michigan, USA,, pages 79–86.
[11] Habash, Nizar and Owen Rambow. (2007) Arabic diacritization through full
morphological tagging. In Proceeding NAACL-Short ’07 Human Language Technologies
2007: The Conference of the North American Chapter of the Association for Computational
Linguistics - Companion Volume - Short Papers Rochester - New York- USA, pages 53–56.
[12] Bebah, Mohamed Ould Abdallahi Ould, Abdelouafi Meziane, Azzeddine Mazroui, and
Abdelhak Lakhouaja. (2012) Approche morpho-statistique pour la voyellation des texts
arabes. Journal of Computer Science and Engineering, 5(1).
[13] Bebah, Mohamed Ould Abdallahi Ould, Abdelouafi Meziane, Azzeddine Mazroui, and
Abdelhak Lakhouaja. (2011) Alkhalil morpho sys. In 7th International Computing
Conference in Arabic, May 31- June 2, 2011, Riyadh, Saudi Arabia.
[14] El-Sadany, T and M Hashish. (1988) Semi-automatic vowelization of arabic verbs. In
10th NC Conference, Jeddah, Saudi Arabia.
[15] Manning, Chris and Hinrich Schutze. (1999) Foundations of statistical natural language
processing. Massachusetts Institute of Technology Press - Library of Congress Cataloging in
publication Information.
[16] Deltour, Amelie. (2003) Methodes statistiques pour la voyellisation des texts arabes.
Master’s thesis, ENSIMAG-Karlsruhe University.
AUTHORS
Mohamed Ould Abdallahi Ould Bebah Researcher at Doha Institute for
Graduate Studies since 2013. "Doctorat" in Computer Sciences, Mohamed I
University, Oujda, Morocco, 2013. "DESA" in "Numerical Analysis, Computer
science and Signal Processing" from Mohamed I University, 2005. Member of
Arabic NLP unit, LaRI Laboratory, Mohamed I University since 2005. Member
of Language Studies unit at the Center of Social and Human Studies and
Researches (CERHSO) in Oujda since 2005. Member of Arabic Language Engineering
Society in Morocco (ALESM) since 2012.
Amine CHENNOUFI Master in Computer Sciences from Mohamed I
University, Oujda, Morocco, 2010. Engineering degree in Meteorology from the
National School of Meteorology (ENM), Toulouse, France, 1994. Since
January 2011, he has been preparing his PhD thesis in Arabic Natural Language
Processing within the Computer Research Laboratory (LaRI). His research
interests are especially in automatic vowelization of the Arabic language. Professionally, he is
in charge of the Meteorological Centre of Oujda Airport.
Azzeddine Mazroui "Doctorat d’Etat" in Numerical Analysis, University
Mohammed I Morocco, 2000. PhD in Probability and Statistics, Pierre & Marie
Curie University France, 1993. Professor of Mathematics and Computer Sciences
in University Mohammed I. Member of Computer Research Laboratory (LaRI).
Director of the ANLP unit in the LaRI laboratory.
Abdelhak Lakhouaja "Doctorat d’Etat" in Computer Sciences, University
Mohammed I Morocco, 2000. Professor of Computer Sciences in University
Mohammed I. Member of Computer Research Laboratory (LaRI). Cofounder of
the ANLP unit in the LaRI laboratory.
WORD SENSE DISAMBIGUATION USING WSD SPECIFIC WORDNET OF
POLYSEMY WORDS
Udaya Raj Dhungana1, Subarna Shakya2, Kabita Baral3 and Bharat Sharma4
1, 2, 4 Department of Electronics and Computer Engineering, Central Campus, IOE,
Tribhuvan University, Lalitpur, Nepal
3 Department of Computer Science, GBS, Lamachaur, Kaski, Nepal
ABSTRACT
This paper presents a new model of WordNet that is used to disambiguate the correct sense
of a polysemy word based on clue words. The related words for each sense of a polysemy
word, as well as for a single-sense word, are referred to as the clue words. The conventional
WordNet organizes nouns, verbs, adjectives and adverbs into sets of synonyms
called synsets, each expressing a different concept. In contrast to this structure,
we developed a new model of WordNet that organizes the different senses of polysemy
words, as well as the single-sense words, based on the clue words. These clue words are
used to disambiguate the correct meaning of the polysemy word in the given context using
knowledge-based Word Sense Disambiguation (WSD) algorithms. A clue word can be a
noun, verb, adjective or adverb.
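One simple way to picture clue-word disambiguation is an overlap count between the context and each sense's clue set, in the spirit of knowledge-based methods such as Lesk. The sketch below is an assumption-laden illustration: the sense labels, the clue sets and the tie handling are hypothetical and do not reproduce the paper's WordNet model or its scoring.

```python
def disambiguate(context_words, sense_clues):
    """Return the sense whose clue words overlap most with the context."""
    context = {w.lower() for w in context_words}
    scores = {sense: len(context & clues) for sense, clues in sense_clues.items()}
    best = max(scores, key=scores.get)
    # No shared clue word means the context gives no evidence for any sense.
    return best if scores[best] > 0 else None

# Hypothetical clue-word sets for two senses of the polysemy word "bank".
BANK = {
    "bank/finance": {"money", "loan", "deposit", "account"},
    "bank/river": {"river", "water", "shore", "fishing"},
}
sense = disambiguate("He went fishing by the river".split(), BANK)
```

Returning `None` when nothing overlaps is a design choice for the sketch; a fuller system might instead fall back to the most frequent sense.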
KEYWORDS
Word Sense Disambiguation, WordNet, Polysemy Words, Synset, Hypernymy, Context
word, Clue Word
FULL TEXT: http://airccse.org/journal/ijnlc/papers/3414ijnlc05.pdf
VOLUME URL: http://airccse.org/journal/ijnlc/vol3.html
REFERENCES
[1] N. Ide and J. Véronis, “Word sense disambiguation: The state of the art,” Computational
Linguistics, pp. 1–40, 1998.
[2] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller, “Introduction to
WordNet: An on-line lexical database,” International Journal of Lexicography, 1998.
[3] U. R. Dhungana and S. Shakya, “Word sense disambiguation in Nepali language,” in The
Fourth International Conference on Digital Information and Communication Technology and
Its Application (DICTAP2014), Bangkok, Thailand, 2014, pp. 46–50.
[4] M. E. Lesk, “Automatic sense disambiguation using machine readable dictionaries: How
to tell a pine cone from an ice cream cone,” in SIGDOC Conference, Toronto, Ontario,
Canada, 1986.
[5] S. Banerjee and T. Pedersen, “An adapted Lesk algorithm for word sense disambiguation
using WordNet,” in Third International Conference on Intelligent Text Processing and
Computational Linguistics, Gelbukh, 2002.
[6] M. Sinha, M. K. Reddy, P. Bhattacharyya, P. Pandey, and L. Kashyap, “Hindi word sense
disambiguation,” Master’s thesis, Indian Institute of Technology Bombay, Mumbai, India,
2004.
[7] N. Shrestha, A. V. H. Patrick, and S. K. Bista, “Resources for Nepali word sense
disambiguation,” in IEEE International Conference on Natural Language Processing and
Knowledge Engineering (IEEE NLP-KE’08), Beijing, China, 2008.
[8] P. Bhattacharyya, P. Pande, and L. Lupu, “Hindi WordNet,” Indian Institute of
Technology Bombay, Mumbai, India, Tech. Rep., 2008.
[9] N. Shrestha, A. V. H. Patrick, and S. K. Bista, “Nepali word sense disambiguation using
Lesk algorithm,” Master’s thesis, Kathmandu University, Dhulikhel, Kavre, Nepal, 2004.
AN UNSUPERVISED APPROACH TO DEVELOP STEMMER
Mohd. Shahid Husain
Department of Information Technology, Integral University, Lucknow
ABSTRACT
This paper presents an unsupervised approach for the development of a stemmer (for the
case of the Urdu and Marathi languages). During the last few years, a wide range of
information in Indian regional languages has been made available on the web in the form
of e-data. But access to these data repositories is very low because efficient search
engines/retrieval systems supporting these languages are very limited. Hence automatic
information processing and retrieval has become an urgent requirement. To train the
system, training datasets taken from CRULP [22] and the Marathi corpus [23] are used. For
generating suffix rules, two different approaches, namely frequency-based stripping and
length-based stripping, have been proposed. The evaluation has been made on 1200 words
extracted from the Emille corpus. The experimental results show that in the case of the
Urdu language the frequency-based suffix generation approach gives the maximum accuracy
of 85.36%, whereas the length-based suffix stripping algorithm gives a maximum accuracy
of 79.76%. In the case of the Marathi language the system gives 63.5% accuracy with
frequency-based stripping and achieves a maximum accuracy of 82.5% with the length-based
suffix stripping algorithm.
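The two suffix-rule strategies named in the abstract can be sketched as follows. This is an illustrative reading, not the paper's algorithm: the function names, the `min_stem` and `min_count` thresholds, and the rule that "frequency-based" picks the most frequent matching suffix while "length-based" picks the longest one are all assumptions.

```python
from collections import Counter
from typing import Iterable


def learn_suffixes(words: Iterable[str], min_stem: int = 2, min_count: int = 2) -> Counter:
    """Collect candidate suffixes from an unlabeled word list.

    Every split of a word that leaves at least `min_stem` characters of stem
    contributes one count to its suffix; suffixes seen fewer than `min_count`
    times are discarded.
    """
    counts: Counter = Counter()
    for word in set(words):
        for i in range(min_stem, len(word)):
            counts[word[i:]] += 1
    return Counter({s: c for s, c in counts.items() if c >= min_count})


def stem(word: str, suffixes: Counter, by: str = "frequency", min_stem: int = 2) -> str:
    """Strip one learned suffix from `word`.

    by="frequency": strip the most frequent matching suffix (longer wins ties).
    by="length":    strip the longest matching suffix.
    """
    candidates = [
        s for s in suffixes
        if word.endswith(s) and len(word) - len(s) >= min_stem
    ]
    if not candidates:
        return word
    if by == "frequency":
        best = max(candidates, key=lambda s: (suffixes[s], len(s)))
    else:
        best = max(candidates, key=len)
    return word[: -len(best)]
```

On a toy English word list such as walking, talking, jumping, playing and their -ed and -s forms, the frequency-based choice strips "ing" from "walking", while the length-based choice can over-strip (for instance removing "lking") because rarer but longer suffixes also pass the count threshold. Which strategy works better depends on the language, as the reported Urdu and Marathi accuracies suggest.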
KEYWORDS
Stemming, Morphology, Urdu stemmer, Marathi stemmer, Information retrieval.
FULL TEXT: http://airccse.org/journal/ijnlc/papers/1212ijnlc02.pdf
VOLUME URL: http://airccse.org/journal/ijnlc/vol1.html
REFERENCES
[1] Rizvi, J. et al. “Modeling case marking system of Urdu-Hindi languages by using
semantic information”. Proceedings of the IEEE International Conference on Natural
Language Processing and Knowledge Engineering (IEEE NLP-KE '05). 2005.
[2] Butt, M. King, T. “Non-Nominative Subjects in Urdu: A Computational Analysis”.
Proceedings of the International Symposium on Non-nominative Subjects, Tokyo, December,
pp. 525-548, 2001.
[3] Savoy, J. “Stemming of French words based on grammatical categories”. Journal of the
American Society for Information Science, 44(1), 1-9, 1993.
[4] Lovins Julie Beth: Development of a stemming algorithm. Mechanical Translation and
Computational Linguistics 11:22–31. (1968)
[5] Mokhtaripour, A., Jahanpour, S. “Introduction to a New Farsi Stemmer”. Proceedings of
CIKM Arlington VA, USA, 826-827, 2006.
[6] R. Wicentowski. "Multilingual Noise-Robust Supervised Morphological Analysis using
the Word Frame Model." In Proceedings of Seventh Meeting of the ACL Special Interest
Group on Computational Phonology (SIGPHON), pp. 70-77, 2004.
[7] Rizvi, Hussain M. “Analysis, Design and Implementation of Urdu Morphological
Analyzer”. SCONEST, 1-7, 2005.
[8] Krovetz, R. “View Morphology as an Inference Process”. In the Proceedings of 5th
International Conference on Research and Development in Information Retrieval, 1993.
[9] Porter, M. “An Algorithm for Suffix Stripping”. Program, 14(3): 130-137, 1980.
[10] Thabet, N. “Stemming the Qur’an”. In the Proceedings of the Workshop on
Computational Approaches to Arabic Script-based Languages, 2004.
[11] Paik, Pauri. “A Simple Stemmer for Inflectional Languages”. FIRE 2008.
[12] Sharifloo, A.A., Shamsfard, M. “A Bottom up Approach to Persian Stemming”. IJCNLP,
2008.
[13] Croft and Xu. “Corpus-Based Stemming Using Co occurrence of Word Variants”. ACM
Transactions on Information Systems (61-81), 1998.
[14] Kumar, A. and Siddiqui, T. “An Unsupervised Hindi Stemmer with Heuristics
Improvements”. In Proceedings of the Second Workshop on Analytics for Noisy
Unstructured Text Data, 2008.
[15] Kumar, M. S. and Murthy, K. N. “Corpus Based Statistical Approach for Stemming
Telugu”. Creation of Lexical Resources for Indian Language Computing and Processing
(LRIL), C-DAC, Mumbai, India, 2007.
[16] Qurat-ul-Ain Akram, Asma Naseer, Sarmad Hussain. “Assas-Band, an Affix-Exception-
List Based Urdu Stemmer”. Proceedings of ACL-IJCNLP 2009.
[17] http://en.wikipedia.org/wiki/Urdu
[18] http://www.bbc.co.uk/languages/other/guide/urdu/steps.shtml
[19] http://www.andaman.org/BOOK/reprints/weber/rep-weber.htm
[20] Natural Language processing and Information Retrieval by Tanveer Siddiqui, U S
Tiwary.
[21] Information retrieval: data structure and algorithms by William B. Frakes, Ricardo
Baeza-Yates.
[22] http://www.crulp.org/software/ling_resources.htm
[23] Marathi Corpus, http://www.cfilt.iitb.ac.in/marathi_Corpus/ , IIT Powai, Mumbai.
AUTHORS
Mohd. Shahid Husain received his M.Tech. from the Indian Institute of Information
Technology (IIIT-A), Allahabad, with Intelligent Systems as specialization. He is
currently pursuing a Ph.D. and working as an assistant professor in the Department of
Information Technology, Integral University, Lucknow.

Top 10 cited articles in nlp

  • 1. Top 10 cited Natural Language Computing International Journal on Natural Language Computing (IJNLC) http://airccse.org/journal/ijnlc/index.html ISSN : 2278 - 1307 [Online]; 2319 - 4111 [Print] Google Scholar https://scholar.google.com/citations?user=A5tqIdoAAAAJ&hl=en
  • 2. AN IMPROVED APRIORI ALGORITHM FOR ASSOCIATION RULES Mohammed Al-Maolegi1 , Bassam Arkok2 Computer Science, Jordan University of Science and Technology, Irbid, Jordan ABSTRACT There are several mining algorithms of association rules. One of the most popular algorithms is Apriori that is used to extract frequent itemsets from large database and getting the association rule for discovering the knowledge. Based on this algorithm, this paper indicates the limitation of the original Apriori algorithm of wasting time for scanning the whole database searching on the frequent itemsets, and presents an improvement on Apriori by reducing that wasted time depending on scanning only some transactions. The paper shows by experimental results with several groups of transactions, and with several values of minimum support that applied on the original Apriori and our implemented improved Apriori that our improved Apriori reduces the time consumed by 67.38% in comparison with the original Apriori, and makes the Apriori algorithm more efficient and less time consuming. KEYWORDS Apriori, Improved Apriori, Frequent itemset, Support, Candidate itemset, Time consuming. FULL TEXT: http://airccse.org/journal/ijnlc/papers/3114ijnlc03.pdf VOLUME URL: http://airccse.org/journal/ijnlc/vol3.html
  • 3. REFERENCES [1] X. Wu, V. Kumar, J. Ross Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg, “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, Dec. 2007. [2] S. Rao, R. Gupta, “Implementing Improved Algorithm Over APRIORI Data Mining Association Rule Algorithm”, International Journal of Computer Science And Technology, pp. 489-493, Mar. 2012 [3] H. H. O. Nasereddin, “Stream data mining,” International Journal of Web Applications, vol. 1, no. 4, pp. 183–190, 2009. [4] F. Crespo and R. Weber, “A methodology for dynamic data mining based on fuzzy clustering,” Fuzzy Sets and Systems, vol. 150, no. 2, pp. 267–284, Mar. 2005. [5] R. Srikant, “Fast algorithms for mining association rules and sequential patterns,” UNIVERSITY OF WISCONSIN, 1996. [6] J. Han, M. Kamber,”Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, Book, 2000. [7] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data mining to knowledge discovery in databases,” AI magazine, vol. 17, no. 3, p. 37, 1996. [8] F. H. AL-Zawaidah, Y. H. Jbara, and A. L. Marwan, “An Improved Algorithm for Mining Association Rules in Large Databases,” Vol. 1, No. 7, 311-316, 2011 [9] T. C. Corporation, “Introduction to Data Miningand Knowledge Discovery”, Two Crows Corporation, Book, 1999. [10] R. Agrawal, T. Imieliński, and A. Swami, “Mining association rules between sets of items in large databases,” in ACM SIGMOD Record, vol. 22, pp. 207–216, 1993 [11] M. Halkidi, “Quality assessment and uncertainty handling in data mining process,” in Proc, EDBT Conference, Konstanz, Germany, 2000. AUTHORS Mohammed Al-Maolegi Obtained his Master degree in computer science from Jordan University of Science and Technology University (Jordan) in 2014. He received his B.Sc. in computer information system from Mutah University (Jordan) in 2010. 
His research interests include: softw are engineering, software metrics, data mining and wireless sensor networks.
  • 4. Bassam Arkok Obtained his Master degree in computer science from Jordan University of Science and Technology University (Jordan) in 2014. He received his B.Sc. in computer science from Alhodidah University (Yemen). His research interests include: software engineering, software metrics, data mining and wireless sensor networks.
  • 5. NAMED ENTITY RECOGNITION USING HIDDEN MARKOV MODEL (HMM) Sudha Morwal 1 , Nusrat Jahan 2 and Deepti Chopra 3 1 Associate Professor, Banasthali University, Jaipur, Rajasthan-302001 2 M.Tech (CS), Banasthali University, Jaipur, Rajasthan-302001 3 M. Tech (CS), Banasthali University, Jaipur, Rajasthan-302001 ABSTRACT: Named Entity Recognition (NER) is the subtask of Natural Language Processing (NLP) which is the branch of artificial intelligence. It has many applications mainly in machine translation, text to speech synthesis, natural language understanding, Information Extraction, Information retrieval, question answering etc. The aim of NER is to classify words into some predefined categories like location name, person name, organization name, date, time etc. In this paper we describe the Hidden Markov Model (HMM) based approach of machine learning in detail to identify the named entities. The main idea behind the use of HMM model for building NER system is that it is language independent and we can apply this system for any language domain. In our NER system the states are not fixed means it is of dynamic in nature one can use it according to their interest. The corpus used by our NER system is also not domain specific. KEYWORDS Named Entity Recognition (NER), Natural Language processing (NLP), Hidden Markov Model (HMM). FULL TEXT: http://airccse.org/journal/ijnlc/papers/1412ijnlc02.pdf VOLUME URL: http://airccse.org/journal/ijnlc/vol1.html
  • 6. REFERENCES [1] Pramod Kumar Gupta, Sunita Arora “An Approach for Named Entity Recognition System for Hindi: An Experimental Study” in Proceedings of ASCNT – 2009, CDAC, Noida, India, pp. 103 – 108. [2] Shilpi Srivastava, Mukund Sanglikar & D.C Kothari. ”Named Entity Recognition System for Hindi Language: A Hybrid Approach” International Journal of Computational Linguistics (IJCL), Volume(2):Issue(1):2011.Availableat: http://cscjournals.org/csc/manuscript/Journals/IJCL/volume2/Issue1/IJCL-19.pdf [3] “Padmaja Sharma, Utpal Sharma, Jugal Kalita”Named Entity Recognition: A Survey for the Indian Languages”(Language in India www.languageinindia.com 11:5 May 2011 Special Volume: Problems of Parsing in Indian Languages.) Available at: http://www.languageinindia.com/may2011/padmajautpaljugal.pdf. [4] Lawrence R. Rabiner, " A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", In Proceedings of the IEEE, VOL.77,NO.2, February 1989.Available at: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf. [5] Sujan Kumar Saha, Sudeshna Sarkar, Pabitra Mitra “Gazetteer Preparation for Named Entity Recognition in Indian Languages” in the Proceeding of the 6th Workshop on Asian Language Resources, 2008 . Available at: http://www.aclweb.org/anthology-new/I/I08/I08- 7002.pdf [6] B. Sasidhar#1, P. M. Yohan*2, Dr. A. Vinaya Babu3, Dr. A. Govardhan4” A Survey on Named Entity Recognition in Indian Languages with particular reference to Telugu” in IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 2, March 2011 available at : http://www.ijcsi.org/papers/IJCSI-8-2-438-443.pdf. [7] GuoDong Zhou Jian Su,” Named Entity Recognition using an HMM-based Chunk Tagger” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 473-480. 
[8] http://en.wikipedia.org/wiki/Forward–backward_algorithm [9] http://en.wikipedia.org/wiki/Baum-Welch_algorithm. [10] Dan Shen, jie Zhang, Guodong Zhou,Jian Su, Chew-Lim Tan” Effective Adaptation of a Hidden Markov Model-based Named Entity Recognizer for Biomedical Domain” available at: http://acl.ldc.upenn.edu/W/W03/W03-1307.pdf. AUTHORS Sudha Morwal is an active researcher in the field of Natural Language Processing. Currently working as Associate Professor in the Department of Computer Science at Banasthali University (Rajasthan), India. She has done M.Tech (Computer Science), NET, M.Sc (Computer Science) and her PhD is in progress from Banasthali University (Rajasthan), India.
  • 7. Nusrat Jahan received B.Tech degree in Computer Science and Engineering from R.N. Modi Engineering College, Kota, Rajasthan in 2010.Currently she is pursuing her M.Tech degree in Computer Science and Engineering from Banasthali University, Rajasthan. Her subject of interests includes Natural Language Processing and Information retrieval. Deepti Chopra received B. Tech degree in Computer Science and Engineering from Rajasthan College of Engineering for Women, Jaipur, Rajasthan in 2011.Currently she is pursuing her M.Tech.degree in Computer Science and Engineering from Banasthali University, Rajasthan. Her subject of research includes Natural Language Processing.
  • 8. SENTIMENT ANALYSIS FOR MODERN STANDARD ARABIC AND COLLOQUIAL Hossam S. Ibrahim 1 , Sherif M. Abdou2 and Mervat Gheith 1 Computer Science Department, Institute of statistical studies and research (ISSR), Cairo University, EGYPT 2 Information Technology Department, Faculty of Computers and information Cairo University, EGYPT ABSTRACT The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations, therefore many are now looking to the field of sentiment analysis. In this paper, we present a feature-based sentence level approach for Arabic sentiment analysis. Our approach is using Arabic idioms/saying phrases lexicon as a key importance for improving the detection of the sentiment polarity in Arabic sentences as well as a number of novels and rich set of linguistically motivated features (contextual Intensifiers, contextual Shifter and negation handling), syntactic features for conflicting phrases which enhance the sentiment classification accuracy. Furthermore, we introduce an automatic expandable wide coverage polarity lexicon of Arabic sentiment words. The lexicon is built with gold-standard sentiment words as a seed which is manually collected and annotated and it expands and detects the sentiment orientation automatically of new sentiment words using synset aggregation technique and free online Arabic lexicons and thesauruses. Our data focus on modern standard Arabic (MSA) and Egyptian dialectal Arabic tweets and microblogs (hotel reservation, product reviews, etc.). The experimental results using our resources and techniques with SVM classifier indicate high performance levels, with accuracies of over 95%. 
KEYWORDS Sentiment Analysis, opinion mining, social network, sentiment lexicon, modern standard Arabic, colloquial, natural language processing FULL TEXT: http://airccse.org/journal/ijnlc/papers/4215ijnlc07.pdf VOLUME URL: http://airccse.org/journal/ijnlc/vol4.html
  • 9. REFERENCES [1] A. Shoukry and A. Rafea, "Sentence-level Arabic sentiment analysis," in Collaboration Technologies and Systems (CTS) International Conference, Denver, CO, USA, 2012, pp. 546- 550. [2] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002, pp. 79–86. [3] D. Davidiv, O. Tsur, and A. Rappoport, "Enhanced Sentiment Learning Using Twitter Hash- tags and Smileys," in Proceedings of the 23rd International Conference on Computational Linguistics (Coling2010), Beijing, China, 2010, pp. 241–249. [4] L. Barbosa and J. Feng, "Robust Sentiment Detection on Twitter from Biased and Noisy Data " in Proceedings of the 23rd International Conference on Computational Linguistics (Coling), 2010. [5] P. Turney, "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews," in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics ACL '02, Stroudsburg, PA, USA, 2002, pp. 417-424. [6] V. Hatzivassiloglou and K. R. McKeown, "Predicting the semantic orientation of adjectives," in Proceedings of the Joint ACL / EACL Conference, 1997, pp. 174–181. [7] B. Pang and L. Lee, "Opinion mining and sentiment analysis," Foundations and Trends in Information Retrieval vol. 2, pp. 1–135, 2008. [8] M. Hu and B. Liu, "Mining and summarizing customer reviews " in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2004, pp. 168–177. [9] B. Liu, "Sentiment Analysis and Subjectivity," in Handbook of Natural Language Processing, Second ed: CRC Press, Taylor and Francis Group, 2010. [10] P. Alexander and P. 
Patrick, "Twitter as a Corpus for Sentiment Analysis and Opinion Mining " in Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10), European Language Resources Association ELRA, Valletta, Malta, 2010. [11] C. Scheible and H. Schütze, "Bootstrapping Sentiment Labels For Unannotated Documents With Polarity PageRank," in Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC 2012), Istambol-Turki, 2012. [12] C. Manning and D. Klein, "Optimization, maxent models, and conditional estimation without magic," in Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, 2003, p. 8.
  • 10. [13] A. Abbasi, H. Chen, and A. Salem, "Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums," ACM Transactions on Information Systems, vol. 26, 2008. [14] E. Riloff and J. Wiebe, "Learning extraction patterns for subjective expressions," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2003. [15] E. Riloff, J. Wiebe, and T. Wilson, "Learning subjective nouns using extraction pattern bootstrapping," in Proceedings of the Conference on Natural Language Learning (CoNLL), 2003, pp. 25–32. [16] M. Abdul-Mageed and M. Diab, "Subjectivity and Sentiment Annotation of Modern Standard Arabic Newswire," in Proceedings of the Fifth Law Workshop (LAW V), Association for Computational Linguistics, Portland, Oregon, 2011, pp. 110–118. [17] M. Abdul-Mageed, M. Diab, and M. Korayem, "Subjectivity and sentiment analysis of modern standard Arabic," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 2011. [18] M. Abdul-Mageed, K. Sandra, and M. Diab, "SAMAR: A System for Subjectivity and Sentiment Analysis of Arabic Social Media," in Proceedings of the 3rd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, Jeju,Republic of Korea, 2012, pp. 19–28. [19] A. Mourad and K. Darwish, "Subjectivity and Sentiment Analysis of Modern Standard Arabic and Arabic Microblogs," in Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), Atlanta, Georgia, 2013, pp. 55–64. [20] M. Korayem, D. Crandall, and M. Abdul-Mageed, "Subjectivity and Sentiment Analysis of Arabic: A Survey," in Advanced Machine Learning Technologies and Applications, Communications in Computer and Information Science series 322, (Springer), AMLTA, 2012, pp. 128-139. [21]M. Abdul-Mageed and M. 
Diab, "AWATIF: A multi-genre corpus for Arabic subjectivity and sentiment analysis," in Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, 2012a. [22] M. Rushdi-Saleh, M. Mart´ın-Valdivia, L. Ure˜na-L´opez, and J. Perea-Ortega, "Oca: Opinion corpus for Arabic," Journal of the American Society for Information Science and Technology, vol. 62, pp. 2045–2054, 2011. [23] M. Elarnaoty, S. AbdelRahman, and A. Fahmy, "A Machine Learning Approach for Opinion Holder Extraction Arabic Language," CoRR, abs/1206.1011, vol. 3, 2012.
  • 11. [24] M. Abdul-Mageed and M. Diab, "SANA: A Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis," in Proceedings of The 9th edition of the Language Resources and Evaluation Conference (LREC ), Reykjavik, Iceland, 2014. [25] E. Refaee and V. Rieser, "An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis," in Proceedings of The 9th edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik, Iceland, 2014. [26] M. Elmahdy, G. Rainer, M. Wolfgang, and A. Slim, "Survey on common Arabic language forms from a speech recognition point of view," in proceeding of International conference on Acoustics (NAG-DAGA), Rotterdam, Netherlands, 2009, pp. 63-66. [27] J. C. Carletta, "Assessing agreement on classification tasks: the KAPPA statistic " Computational Linguistics, vol. 22, pp. 249- 254, 1996. [28] B. Liu, Sentiment Analysis and Opinion Mining Morgan &Claypool Publishers, 2012. :sayings Colloquial [‫ا‬B‫ا‬ ‫الحرف‬ ‫حسب‬ ‫ومرتبة‬ ‫مشروحة‬ :‫العالمية‬ ‫مثال‬B‫موضوعى‬ ‫كشاف‬ ‫مع‬ ‫المثل‬ ‫من‬ ‫ول‬ ,Basha. [29] an annotated and arranged by the first letter of ideals with the Scout TOPICAL]. Egypt: Al- Ahram Foundation - Al-Ahram Center for Translation and Publishing, 1986. [30] A. Saalan, ‫مثال‬ ‫الشعبية‬ ‫المصرية‬B‫موسوعة‬ ‫]ا‬ Encyclopedia of Egyptian popular sayings], First ed. Egypt: Dar-alafkalarabia press, 2003. Egyptian, sayings Colloquial [ ‫ا‬ ‫النوادر‬ ,‫الشعبية‬ ‫القصص‬ ,‫لعربية‬ ‫ا‬B‫المصرى‬ ‫الفولكلور‬ ,‫العامية‬ ‫مثال‬ ,Husain. F] 31[ folklore]. Egypt: General Egyptian Book Organization GEBO, 1984. [32] G. Taher. (2006). ‫دراسة‬ ‫علمية‬ - ‫مثال‬ ‫الشعبية‬ P‫موسوعة‬ ‫]ا‬ Encyclopedia of public sayings - a scientific study]. Available: http://books.google.com.eg/books?id=2CR_EKTjxRgC [33] PROz. (2014). PROz website for Arabic Idioms/Maxims/Sayings (Jan 2014). Available: http://www.proz.com/glossary-translations/ [34] M. 
Diab, "Towards an optimal POS tag set for Modern Standard Arabic processing," in Proceedings of Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria, 2007. [35] O. F. Zaidan and C. Callison-Burch, "Arabic dialect identification," Computational Linguistics, vol. 40, pp. 171-202, March 2014 2012. [36] H. S. Ibrahim, S. M. Abdou, and M. Gheith, "Automatic expandable large-scale sentiment lexicon of Modern Standard Arabic and Colloquial," in 16th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING), Cairo - Egypt, 2015. [37] M. Sharifi and W. Cohen. (2008, May, 2014). “Finding domain specifc polar words for sentiment classification. Available: http://www.cs.cmu.edu/~mehrbod/polarity_08.pdf
  • 12. [38] J. YI, T. NASUKAWA, R. BUNESCU, and W. NIBLACK, "Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques " in Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM), 2003, pp. 427– 434. [39] Z. Fei, J. LIU, and G. WU, "Sentiment classification using phrase patterns," in Proceedings of the 4th IEEE International Conference on Computer Information Technology, 2004, pp. 1147–1152. [40] T. Joachims. (2008, Jan-2013). SVM-light: Support vector machine. Available: http://svmlight.joachims.org/
  • 13. SURVEY OF MACHINE TRANSLATION SYSTEMS IN INDIA G V Garje1 and G K Kharate2 1 Department of Computer Engineering and Information Technology, PVG’s College of Engineering and Technology, Pune, India 2 Principal, Matoshri College of Engineering and Research Centre, Nashik, India ABSTRACT Work on machine translation has been going on for the last few decades, but promising translation work began in the early 1990s owing to advanced research in Artificial Intelligence and Computational Linguistics. India is a multilingual and multicultural country with a population of over 1.25 billion and 22 constitutionally recognized languages, which are written in 12 different scripts. This necessitates automated machine translation systems for English to Indian languages and among Indian languages, so that people can exchange information in their local language. Many usable machine translation systems have been developed, and more are under development, in India and around the world. The paper focuses on the different approaches used in the development of machine translation systems and briefly describes some of these systems along with their features, domains and limitations. KEYWORDS Machine Translation, Example-based MT, Transfer-based MT, Interlingua-based MT FULL TEXT: http://airccse.org/journal/ijnlc/papers/2513ijnlc04.pdf VOLUME URL: http://airccse.org/journal/ijnlc/vol2.html
  • 17. AUTHORS G V Garje ([email protected]) completed an ME in Computer Science and Engineering from NITTR, Chandigarh, India, in 1998. He is currently working as Associate Professor and Head of the Computer Engineering and Information Technology Department at Pune Vidyarthi Griha’s College of Engineering and Technology, Pune. He is presently Chairman of the Board of Studies in Information Technology, University of Pune, and was formerly Chairman of the Board of Studies in Computer Engineering, University of Pune. He is pursuing his Ph.D. at the University of Pune, Maharashtra, India, in the area of Machine Translation. His areas of research are NLP and Machine Translation, specifically for the English-Marathi language pair. He has developed a tool, funded by the University of Pune, for translating simple English interrogative sentences into Marathi. His areas of interest are Data Structures, Operating Systems and Software Architecture. G K Kharate ([email protected]) completed his Ph.D. in Electronics and Telecommunication Engineering at the University of Pune and an ME in Electronics and Communication at Walchand College of Engineering, Sangli, Maharashtra. He is currently Principal at Matoshri College of Engineering and Research Centre, Nashik, Maharashtra. He is Dean of the Faculty of Engineering and a Member of the Management Council, University of Pune. He is a former Chairman of the Board of Studies in Electronics Engineering, University of Pune. His areas of research are Image Processing, Pattern Recognition, and Artificial Intelligence. His areas of interest are Digital Electronics, Computer Networks, Image Processing and Natural Language Processing.
  • 18. RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI Deepti Bhalla1 , Nisheeth Joshi2 and Iti Mathur3 1,2,3 Apaji Institute, Banasthali University, Rajasthan, India ABSTRACT Machine transliteration has emerged as an important research area in the field of machine translation. Transliteration aims to preserve the phonological structure of words. Proper transliteration of named entities plays a significant role in improving the quality of machine translation. In this paper we perform machine transliteration for the English-Punjabi language pair using a rule-based approach. We have constructed rules for syllabification, the process of separating a word into its syllables. We calculate probabilities for named entities (proper names and locations). For words that are not named entities, separate probabilities are calculated by relative frequency using the statistical machine translation toolkit MOSES. Using these probabilities, we transliterate the input text from English to Punjabi. KEYWORDS Machine Translation, Machine Transliteration, Named entity recognition, Syllabification FULL TEXT: http://airccse.org/journal/ijnlc/papers/2213ijnlc07.pdf VOLUME URL: http://airccse.org/journal/ijnlc/vol2.html
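The abstract above combines rule-based syllabification with relative-frequency probabilities. The sketch below illustrates that idea only; it is not the authors' rule set. The single regex rule, the function names, and the target labels ("A", "B", "C") are hypothetical placeholders, and the aligned pairs are toy data rather than output from MOSES.

```python
import re
from collections import Counter, defaultdict

# Hypothetical rule (an assumption, not the paper's): a unit is optional
# consonants followed by one or more vowels; trailing consonants at the end
# of the word attach to the last unit.
_UNIT = re.compile(r"[^aeiou]*[aeiou]+(?:[^aeiou]+$)?")


def syllabify(word):
    """Split a romanized word into syllable-like units."""
    word = word.lower()
    units = _UNIT.findall(word)
    return units if units else [word]  # words without vowels stay whole


def relative_frequencies(aligned_pairs):
    """Estimate P(target | source) = count(source, target) / count(source)."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    probs = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        probs[src][tgt] = n / source_counts[src]
    return dict(probs)


def transliterate(word, probs):
    """Pick the most probable target for each unit; copy unknown units."""
    out = []
    for unit in syllabify(word):
        candidates = probs.get(unit)
        out.append(max(candidates, key=candidates.get) if candidates else unit)
    return "".join(out)


# Toy aligned (source unit, target unit) pairs; target labels are placeholders.
PAIRS = [("pu", "A"), ("pu", "A"), ("pu", "B"), ("njab", "C")]
PROBS = relative_frequencies(PAIRS)
```

In a real system the target units would be Gurmukhi strings and the counts would come from a syllable-aligned parallel name list; units with no counts are simply copied through here, which a production system would handle differently.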
  • 19. AUTHORS Deepti Bhalla is pursuing her M.Tech in Computer Science at Banasthali University, Rajasthan, and is working as a Research Assistant in the English-Indian Languages Machine Translation System Project sponsored by the TDIL Programme, DEITY. Her interest is in Machine Translation, specifically in
  • 20. English-Punjabi Language Pair. She has developed various tools for Punjabi language processing. Her current research interests include Natural Language Processing and Machine Translation. Nisheeth Joshi is a researcher working in the area of Machine Translation. He has worked primarily on the design and development of evaluation metrics for Indian languages. Besides this, he is actively involved in the development of MT engines for English to Indian languages. He is one of the experts empanelled with the TDIL Programme, Department of Electronics and Information Technology (DEITY), Govt. of India, a premier organization which oversees language technology funding and research in India. He has several publications in various journals and conferences and also serves on the Programme Committees and Editorial Boards of several conferences and journals. Iti Mathur is an assistant professor at Banasthali University. Her primary area of research is computational semantics and ontological engineering. Besides this, she is involved in the development of MT engines for English to Indian languages. She is one of the experts empanelled with the TDIL Programme, Department of Electronics and Information Technology (DEITY), Govt. of India, a premier organization which oversees language technology funding and research in India. She has several publications in various journals and conferences and also serves on the Programme Committees and Editorial Boards of several conferences and journals.
  • 21. HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM P H Rathod1 , M L Dhore2 , R M Dhore3 1,2 Department of Computer Engineering, Vishwakarma Institute of Technology, Pune 3 Pune Vidyarthi Griha’s College of Engineering and Technology, Pune ABSTRACT Language transliteration is one of the important areas in NLP. Transliteration is very useful for converting named entities (NEs) written in one script into another script in NLP applications such as Cross Lingual Information Retrieval (CLIR), multilingual voice chat applications and real-time Machine Translation (MT). The most important requirement of a transliteration system is to preserve the phonetic properties of the source language after transliteration into the target language. In this paper, we propose named entity transliteration for the Hindi to English and Marathi to English language pairs using a Support Vector Machine (SVM). In the proposed approach, the source named entity is segmented into transliteration units; hence the transliteration problem can be viewed as a sequence labeling problem. The classification of phonetic units is done using the polynomial kernel function of the SVM. The proposed approach uses the phonetics of the source language and n-grams as two features for transliteration. KEYWORDS Machine Transliteration, n-gram, Support Vector Machine, Syllabification. FULL TEXT: http://airccse.org/journal/ijnlc/papers/2413ijnlc04.pdf VOLUME URL: http://airccse.org/journal/ijnlc/vol2.html
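The abstract above frames transliteration as sequence labeling over transliteration units, classified with a polynomial-kernel SVM on n-gram features. The sketch below shows only the two ingredients, under stated assumptions: the context-window feature template, the function names, and the kernel parameters (`degree`, `gamma`, `coef0`) are illustrative choices rather than the authors' settings, and no SVM is trained here. A real system would pass such features to an SVM library.

```python
def context_features(units, i, n=2):
    """Sparse n-gram style features for unit i: the unit itself plus up to
    n neighbouring units on each side (an assumed feature template)."""
    feats = {}
    for offset in range(-n, n + 1):
        j = i + offset
        token = units[j] if 0 <= j < len(units) else "<pad>"
        feats[f"u[{offset}]={token}"] = 1.0
    return feats


def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree
    computed on sparse feature dictionaries."""
    dot = sum(value * y.get(key, 0.0) for key, value in x.items())
    return (gamma * dot + coef0) ** degree


# Toy segmentation of a romanized name into units (hypothetical).
UNITS = ["ra", "me", "sh"]
FIRST = context_features(UNITS, 0, n=1)
SECOND = context_features(UNITS, 1, n=1)
```

With a degree-2 kernel, the classifier implicitly sees pairs of features (for example, the current unit together with its left neighbour), which is one reason polynomial kernels suit context-window features.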
  • 24. AUTHORS P H Rathod ([email protected]) completed a BE in Information Technology at Government College of Engineering, Karad, Maharashtra, India, in 2008. He recently completed an ME in Computer Science and Engineering at Vishwakarma Institute of Technology, Pune, India, in 2013. He is currently working as Assistant Professor in the Department of Computer Engineering at Vishwakarma Institute of Technology, Pune. His interest is in Machine Translation and Machine Transliteration, specifically for Devanagari-English language pairs. His current areas of research are Mobile Ad hoc Networks, Internet Routing Algorithms, Computer Networking, Machine Translation and Transliteration. M. L. Dhore ([email protected]) completed an ME in Computer Science and Engineering at NITTR, Chandigarh, India, in 1998. He is currently working as Associate Professor in the Department of Computer Engineering at Vishwakarma Institute of Technology, Pune. He is pursuing his Ph.D. at the University of Solapur, Maharashtra, India, in the area of Computational Linguistics. His interest is in Machine Translation and Machine Transliteration, specifically for the Marathi-English and Hindi-English language pairs. He has developed tools for Devanagari to English machine transliteration for online web-based commercial applications. His current areas of research are Internet Routing Algorithms, Computer Networking, Machine Translation and Transliteration.
Ruchi M Dhore ([email protected]) is a third-year Computer Engineering student at Pune Vidyarthi Griha’s College of Engineering and Technology, Pune, Maharashtra, India. She is a scholar student of her college and has secured a distinction every year in the University of Pune examinations. She is a strong programmer and has won prizes in state-level and national-level competitions. Her research interests include Text Processing and Pattern Searching. She would like to build her career in the development of language processing tools for the Marathi language.
  • 25. HYBRID PART-OF-SPEECH TAGGER FOR NON-VOCALIZED ARABIC TEXT Meryeme Hadni1 , Said Alaoui Ouatik1 , Abdelmonaime Lachkar2 and Mohammed Meknassi1 1 FSDM, Sidi Mohamed Ben Abdellah University (USMBA), Morocco 2 E.N.S.A, Sidi Mohamed Ben Abdellah University (USMBA), Morocco ABSTRACT Part-of-speech tagging (POS tagging) plays a crucial role in many fields of natural language processing (NLP), including Speech Recognition, Natural Language Parsing, Information Retrieval and Multi Words Term Extraction. This paper proposes an efficient and accurate POS tagging technique for the Arabic language using a hybrid approach. Because of ambiguity, the Arabic Rule-Based method suffers from misclassified and unanalyzed words. To overcome these two problems, we propose a Hidden Markov Model (HMM) integrated with the Arabic Rule-Based method. Our POS tagger generates a set of three POS tags: Noun, Verb, and Particle. The proposed technique uses the contextual information of the words together with a variety of features that help predict the various POS classes. To evaluate its accuracy, the proposed method has been trained and tested on two corpora: the Holy Quran Corpus and the Kalimat Corpus for undiacritized Classical Arabic. The experimental results demonstrate the efficiency of our method for Arabic POS tagging. On the Holy Quran Corpus, the accuracy rates are 97.6% for our Hybrid Tagger, 96.8% for the HMM Tagger and 94.4% for the Rule-Based Tagger. On the Kalimat Corpus, they are 98% for our Hybrid Tagger, 97.40% for the HMM Tagger and 94.60% for the Rule-Based Tagger. KEY WORDS Part-Of-Speech Tagger, Natural Language Applications, Natural Language Parsing, Hidden Markov Model, Multi Words Term Extraction, Speech Recognition. FULL TEXT : http://airccse.org/journal/ijnlc/papers/2613ijnlc01.pdf VOLUME URL: http://airccse.org/journal/ijnlc/vol2.html
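The abstract does not spell out how the HMM and the rule-based method are combined, so the following Python sketch shows only one plausible integration, under stated assumptions: a rule-based step fixes the tag of every word it can analyze, and Viterbi decoding over an HMM fills in the words the rules leave unanalyzed. The rules, the transliterated toy tokens and all probabilities are hypothetical and are not taken from the paper.

```python
import math

TAGS = ["Noun", "Verb", "Particle"]

# Toy probabilities (hypothetical); a real tagger would estimate them from a tagged corpus.
START = {"Noun": 0.5, "Verb": 0.3, "Particle": 0.2}
TRANS = {
    "Noun": {"Noun": 0.4, "Verb": 0.3, "Particle": 0.3},
    "Verb": {"Noun": 0.6, "Verb": 0.1, "Particle": 0.3},
    "Particle": {"Noun": 0.7, "Verb": 0.2, "Particle": 0.1},
}
EMIT = {
    "Noun": {"kitab": 0.6, "ktb": 0.4},
    "Verb": {"ktb": 0.7, "qra": 0.3},
    "Particle": {"fy": 0.5, "mn": 0.5},
}
SMOOTH = 1e-6  # floor probability for word/tag pairs missing from EMIT


def rule_based_tag(word):
    """Hypothetical rules: return a tag when a rule fires, else None (unanalyzed)."""
    if word in {"fy", "mn"}:
        return "Particle"
    if word.startswith("al"):
        return "Noun"
    return None


def hybrid_tag(words):
    """Viterbi decoding in log space; words decided by a rule keep that single tag."""
    if not words:
        return []
    allowed = []
    for w in words:
        fixed = rule_based_tag(w)
        allowed.append([fixed] if fixed else TAGS)

    def emit(tag, word):
        return math.log(EMIT[tag].get(word, SMOOTH))

    score = {t: math.log(START[t]) + emit(t, words[0]) for t in allowed[0]}
    back = []
    for i in range(1, len(words)):
        new_score, pointers = {}, {}
        for t in allowed[i]:
            prev = max(score, key=lambda p: score[p] + math.log(TRANS[p][t]))
            new_score[t] = score[prev] + math.log(TRANS[prev][t]) + emit(t, words[i])
            pointers[t] = prev
        score = new_score
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    tag = max(score, key=score.get)
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


print(hybrid_tag(["ktb", "alwalad", "fy", "kitab"]))
```

Restricting the Viterbi lattice to the rule-decided tag is a design choice made for this sketch; another option would be to treat the rule output as a feature or a prior rather than a hard constraint.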
  • 26. REFERENCE [1] http://en.wikipedia.org/wiki/Part-of-speech_tagging. [2] L.Van Guilder, (1995) “Automated Part of Speech Tagging: A Brief Overview” Handout for LING361, Georgetown University. [3] H. Halteren, J.Zavrel & Walter Daelemans (2001).Improving Accuracy in NLP Through Combination of Machine Learning Systems. Computational Linguistics. 27(2): 199–229. [4] DeRose & J.Steven (1990) "Stochastic Methods for Resolution of Grammatical Category Ambiguity in Inflected and Uninflected Languages." PhD.Dissertation. Providence, RI: Brown University Department of Cognitive and Linguistic Sciences. [5] N. kumar Kumar, Anikel Dalal & Uma Sawant (2006) “hindi part of speech tagging and chunking”, NLPAI machine learning contest. [6] M. Mohseni, H. Motalebi, B. Minaei-bidgoli & M. Shokrollahi-far (2008) “A farsi part-of-speech tagger based on markov”. In the proceedings of ACM symposium on Applied computing, Brazil. [7] S. Jabbari & B. Allison (2007) “Persian Part of Speech Tagging”, In the Proceedings of Workshop on Computational Approaches to Arabic Script-Based Languages (CAASL-2), USA. [8] E. Brill (1995) “Transformation-Based Error-Driven Learning and Natural Language Processing: A case Study in Part of Speech Tagging”, Computational Linguistics, USA. [9] M. Hepple (2000), “Independence and Commitment: Assumptions for Rapid Training and Execution of Rule-based Part of-Speech Taggers”, In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL). Hong Kong. [10] T. Brants (200), “TNT – a Statistical Part-of-Speech Tagger”, In the Proceedings of 6th conference on applied natural language processing (ANLP), USA. [11] K.
Megerdoomian (2004), “Developing a Persian part-of speech tagger”, In the Proceedings of first Workshop on Persian Language and computer, Iran . [12] Khoja, S.( 2001) “ APT: Arabic part-of-speech tagger”. Proceeding of the Student Workshop at the 2nd Meeting of the NAACL, (NAACL’01), Carnegie Mellon University, Pennsylvania, pp: 1- 6. http://zeus.cs.pacificu.edu/shereen/NAACL.pdf [13] Freeman A (2001), “Brill’s POS tagger and a morphology parser for Arabic”, In ACL’01 Workshop on Arabic language processing. [14] Maamouri M, Cieri C. (2002). “Resources for Arabic Natural Language Processing at the LDC”, Proceedings of the International Symposium on the Processing of Arabic,Tunisia, pp.125-146. [15] Diab M., Hacioglu K. and Jurafsky D. (2004), “Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks”. proc. of HLTNAACL’04: 149–152. [16] Banko M, Moore R. C. (2004). “Part of Speech Tagging in Context”, Proc of the 20th international conference on Computational Linguistics, Switzerland.
  • 27. [17] Tlili-Guiassa Y. (2006) “Hybrid Method for Tagging Arabic Text”. Journal of Computer Science 2 (3): 245-248. [18] L. Young-Suk, K. Papineni & S. Roukos ( 2003), “Language Model Based Arabic Word Segmentation,” in Proceedings of the Annual Meeting on Association for Computational Linguistics, Japan, pp. 399- 406. [19] A.T Al-Taani & S. Abu-Al-Rub (2009),”A rule-based approaches for tagging non- vocalized Arabic words”. The International Arab Journal of Information Technology, Volume6 (3): 320-328. [20] T. Brants (2000),” TnT: A statistical part of speech tagger”, Proceedings of the 6th Conference on Applied Natural Language Processing, Apr. 29- May 04, Association for Computational Linguistics Morristown, New Jersey, USA., pp: 224-231. [21] NLTK, Natural Language Toolkit. http://www.nltk.org/Home [22] Quranic Arabic Corpus: http://corpus.quran.com [23] Quran Tagset: http://corpus.quran.com/documentation/tagset.jsp [24] N. Habash & O. Rambow (2005), “Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop,” in Proceedings of the Annual Meeting on Association for Computational Linguistics, Michigan, pp. 573-580. [25] http://sibawayh.emi.ac.ma/web/s/?q=node/79 [26] http://bit.ly/16jO3Ks [27] http://www.alwatan.com/ [28] F. Al Shamsi & A.Guessoum(2006),” A Hidden Markov Model–Based POS Tagger for Arabic”, 8es Journées internationales d’Analyse statistique des Données Textuelles (JADT). [29] M. Albared & O.Nazlia(2010),” Automatic Part of Speech Tagging for Arabic: An Experiment Using Bigram Hidden Markov Model “,Springer-Verlag Berlin Heidelberg, LNAI 6401, pp. 361– 370. [30] Y.O. Mohamed Elhadj(2009),” Statistical Part-of-Speech Tagger for Traditional Arabic Texts”, Journal of Computer Science 5 (11): 794-800. Authors Miss. 
Meryeme Hadni is a PhD student in the Laboratory of Computer and Modelization, Faculty of Sciences, University Sidi Mohamed Ben Abdellah (USMBA), Fez, Morocco. She has presented papers at several national and international conferences. Pr. Abdelmonaime LACHKAR received his PhD degree from the USMBA, Morocco, in 2004. He is Professor and Computer Engineering Program Coordinator at E.N.S.A, Fez, and Head of the Systems Architecture and Multimedia Team (LSIS Laboratory) at Sidi Mohamed Ben Abdellah University, Fez, Morocco. His current research interests include Arabic Natural Language Processing (ANLP), Arabic Web Document Clustering and Categorization, Arabic Information Retrieval Systems, Arabic Text Summarization, Arabic Ontologies development and usage, and Arabic Semantic Search Engines (SSEs).
  • 28. Pr. Said Alaoui Ouatik is working as a Professor in the Department of Computer Science, Faculty of Science Dhar EL Mahraz (FSDM), Fez, Morocco. His research interests include high-dimensional indexing and content-based retrieval, Arabic document categorization, and 2D/3D shape indexing and retrieval in large 3D object databases. Mohammed Meknassi received his Ph.D. degree in computer science from Montreal University in 1993. Since 1993, he has been a professor of computer science. He teaches and conducts research in the following fields: Parallel Processing, Distributed Computing, Operating Systems and Image Processing. He is a member of the research unit Systems Image and Multimedia (SIM), attached to the laboratory Computer Sciences, Statistics and Quality (LISQ). He is the head of the Computer Sciences Department in the Faculty of Sciences Dhar El Mahraz of Fez.
  • 29. HYBRID APPROACHES FOR AUTOMATIC VOWELIZATION OF ARABIC TEXTS Mohamed Bebah1, Chennoufi Amine2, Mazroui Azzeddine3 and Lakhouaja Abdelhak4 1 Arab Center for Research and Policy Studies, Doha, Qatar 2 Faculty of Sciences/University Mohamed I, Oujda, Morocco 3 Faculty of Sciences/University Mohamed I, Oujda, Morocco 4 Faculty of Sciences/University Mohamed I, Oujda, Morocco ABSTRACT This article presents hybrid approaches for the automatic vowelization of Arabic texts. The process is made up of two modules. In the first, a morphological analysis of the words of the text is performed using the open source morphological analyzer AlKhalil Morpho Sys. For each word, analyzed out of context, the output is the set of its possible vowelizations. Integrating this analyzer into our vowelization system required the addition of a lexical database containing the most frequent words of the Arabic language. The second module aims to eliminate the ambiguities using a statistical approach based on two hidden Markov models (HMM). In the first HMM, the unvowelized Arabic words are the observed states and the vowelized words are the hidden states. The observed states of the second HMM are identical to those of the first, but the hidden states are the lists of possible diacritics of the word without its Arabic letters. Our system uses the Viterbi algorithm to select the optimal path among the solutions proposed by AlKhalil Morpho Sys. Our approach opens an important way to improve the performance of automatic vowelization of Arabic texts for other uses in automatic natural language processing. KEYWORDS Arabic language, Automatic vowelization, morphological analysis, hidden Markov model, corpus FULL TEXT: http://airccse.org/journal/ijnlc/papers/3414ijnlc04.pdf VOLUME URL: http://airccse.org/journal/ijnlc/vol3.html
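The disambiguation step of the first HMM (unvowelized words observed, vowelized words hidden) can be illustrated with a small Python sketch. It is a minimal illustration under assumptions, not the authors' system: the `CANDIDATES` table stands in for the output of a morphological analyzer, the bigram probabilities are invented, and Latin transliterations replace Arabic script. Because stripping the diacritics from a candidate yields exactly the observed word, the emission probability is taken to be 1 for every candidate, so only the transition probabilities drive the choice.

```python
import math

# Hypothetical analyzer output: possible vowelizations of each unvowelized word.
CANDIDATES = {
    "ktb": ["kataba", "kutub", "kutiba"],
    "alwld": ["alwaladu", "alwalada"],
}

# Hypothetical bigram probabilities P(current vowelized word | previous vowelized word).
BIGRAM = {
    ("<s>", "kataba"): 0.5,
    ("<s>", "kutub"): 0.3,
    ("<s>", "kutiba"): 0.2,
    ("kataba", "alwaladu"): 0.6,
    ("kataba", "alwalada"): 0.2,
    ("kutub", "alwaladu"): 0.1,
    ("kutiba", "alwalada"): 0.3,
}
FLOOR = 1e-6  # probability assigned to bigrams never seen in training


def vowelize(words):
    """Pick the most probable sequence of vowelized candidates with Viterbi."""
    if not words:
        return []
    prev_scores = {"<s>": 0.0}  # log-probability of the best path ending in each state
    back = []
    for w in words:
        candidates = CANDIDATES.get(w, [w])  # unknown word: keep it unchanged
        scores, pointers = {}, {}
        for c in candidates:
            best_prev = max(
                prev_scores,
                key=lambda p: prev_scores[p] + math.log(BIGRAM.get((p, c), FLOOR)),
            )
            scores[c] = prev_scores[best_prev] + math.log(BIGRAM.get((best_prev, c), FLOOR))
            pointers[c] = best_prev
        back.append(pointers)
        prev_scores = scores

    # Trace back from the best final candidate; back[0] only points at "<s>".
    best = max(prev_scores, key=prev_scores.get)
    path = [best]
    for pointers in reversed(back[1:]):
        best = pointers[best]
        path.append(best)
    return path[::-1]


print(vowelize(["ktb", "alwld"]))
```

Limiting the hidden states at each position to the analyzer's candidates keeps the lattice small, which is the practical benefit of running the morphological analysis before the statistical step.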
  • 30. REFERENCE [1] Debili, Fathi & Hadhemi Achour (1998) Voyellation automatique de l’arabe. In Proceedings of the workshop on Computation approaches to Semitic languages, COLING-ACL ’98, pages 42–49. [2] Maamouri, Mohamed, Ann Bies, and Seth Kulick. (2006) Diacritization: a challenge to Arabic treebank annotation and parsing. In Proceedings of the British Computer Society Arabic NLP/MT Conference. [3] Zitouni, Imed, Jeffrey S. Sorensen, and Ruhi Sarikaya. (2006) Maximum entropy based restoration of arabic diacritics. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. Workshop on Computational approaches to Semitic Languages, Sydney, Australia. July 2006, pages 577–584. [4] Vergyri, Dimitra & Katrin Kirchhoff. (2004) Automatic diacritization of arabic for acoustic modeling in speech recognition. In Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages. COLING, Geneva, pages 66–73. [5] Messaoudi, Abdel, Lori Lamel, and Jean-Luc Gauvain. (2004) The limsi rt04 b arabic system. In Proceedings DARPA RT04, Palisades NY. [6] Elshafei, Moustafa, Husni Al-Muhtaseb, and Mansour Alghamdi. (2006) Machine generation of arabic diacritical marks. In The 2006 World Congress in Computer Science Computer Engineering, and Applied Computing. Las Vegas, USA, pages 128–133. [7] Emam, Ossama and Volker Fischer. (2005) Hierarchical approach for the statistical vowelization of arabic text. Technical report, IBM Corporation Intellectual Property Law, Austin, TX, US. [8] Schlippe, Tim, ThuyLinh Nguyen, and Stephan Vogel. (2008) Diacritization as a machine translation problem and as a sequence labeling problem. In 8th AMTA conference, Hawaii, pages 21–25. [9] Gal, Yaakov. (2002) An hmm approach to vowel restoration in arabic and hebrew.
In Proceedings of the Workshop on Computational Approaches to Semitic Languages- Philadelphia- Association for Computational Linguistics, pages 27–33. [10] Nelken, Rani and Stuart M. Shieber. (2005) Arabic diacritization using weighted finite- state transducers. In Proceedings of the ACL 2005 Workshop On Computational Approaches To Semitic Languages, Ann Arbor, Michigan, USA,, pages 79–86. [11] Habash, Nizar and Owen Rambow. (2007) Arabic diacritization through full morphological tagging. In Proceeding NAACL-Short ’07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics - Companion Volume - Short Papers Rochester - New York- USA, pages 53–56. [12] Bebah, Mohamed Ould Abdallahi Ould, Abdelouafi Meziane, Azzeddine Mazroui, and Abdelhak Lakhouaja. (2012) Approche morpho-statistique pour la voyellation des texts arabes. Journal of Computer Science and Engineering, 5(1). [13] Bebah, Mohamed Ould Abdallahi Ould, Abdelouafi Meziane, Azzeddine Mazroui, and Abdelhak Lakhouaja. (2011) Alkhalil morpho sys. In 7th International Computing Conference in Arabic, May 31- June 2, 2011, Riyadh, Saudi Arabia.
  • 31. [14] El-Sadany, T and M Hashish. (1988) Semi-automatic vowelization of arabic verbs. In 10th NC Conference, Jeddah, Saudi Arabia. [15] Manning, Chris and Hinrich Schutze. (1999) Foundations of statistical natural language processing. Massachusetts Institute of Technology Press - Library of Congress Cataloging in publication Information. [16] Deltour, Amelie. (2003) Methodes statistiques pour la voyellisation des texts arabes. Master’s thesis, ENSIMAG-Karlsruhe University. AUTHORS Mohamed Ould Abdallahi Ould Bebah Researcher at Doha Institute for Graduate Studies since 2013. "Doctorat" in Computer Sciences, Mohamed I University, Oujda, Morocco, 2013. "DESA" in "Numerical Analysis, Computer science and Signal Processing" from Mohamed I University, 2005. Member of Arabic NLP unit, LaRI Laboratory, Mohamed I University since 2005. Member of Language Studies unit at the Center of Social and Human Studies and Researches (CERHSO) in Oujda since 2005. Member of Arabic Language Engineering Society in Morocco (ALESM) since 2012. Amine CHENNOUFI Master in Computer Sciences from Mohamed I University, Oujda, Morocco 2010. Engineering degree in Meteorology from the National School of Meteorology ENM in Toulouse in France since 1994. Since January 2011, He prepares his PhD thesis in Arabic Natural Language Processing within the Computer Research Laboratory (LaRI). His research interests are especially in Automatic vowelization of Arabic language. Professionally he is the responsible of Meteorological Centre of Oujda Airport. Azzeddine Mazroui "Doctorat d’Etat" in Numerical Analysis, University Mohammed I Morocco, 2000. PHD in Probability and Statistics, Pierre & Marie Curie University France, 1993. Professor of mathematics and Computer Sciences in University Mohammed I. Member of Computer Research Laboratory (LaRI). Director of the ANLP unit in the LaRI laboratory Abdelhak Lakhouaja "Doctorat d’Etat" in Computer Sciences, University Mohammed I Morocco, 2000. 
Professor of Computer Sciences in University Mohammed I. Member of Computer Research Laboratory (LaRI). Cofounder of the ANLP unit in the LaRI laboratory.
  • 32. WORD SENSE DISAMBIGUATION USING WSD SPECIFIC WORDNET OF POLYSEMY WORDS Udaya Raj Dhungana1 , Subarna Shakya2 , Kabita Baral3 and Bharat Sharma4 1, 2, 4 Department of Electronics and Computer Engineering, Central Campus, IOE, Tribhuvan University, Lalitpur, Nepal 3 Department of Computer Science, GBS, Lamachaur, Kaski, Nepal ABSTRACT This paper presents a new model of WordNet that is used to identify the correct sense of a polysemy word based on clue words. The clue words are the words related to each sense of a polysemy word, or to a single sense word. The conventional WordNet organizes nouns, verbs, adjectives and adverbs into sets of synonyms called synsets, each expressing a different concept. In contrast to this structure, we developed a new model of WordNet that organizes the different senses of polysemy words, as well as single sense words, around their clue words. The clue words for each sense of a polysemy word, and for each single sense word, are used to determine the correct meaning of the polysemy word in a given context using knowledge-based Word Sense Disambiguation (WSD) algorithms. A clue word can be a noun, verb, adjective or adverb. KEYWORDS Word Sense Disambiguation, WordNet, Polysemy Words, Synset, Hypernymy, Context word, Clue Word FULL TEXT: http://airccse.org/journal/ijnlc/papers/3414ijnlc05.pdf VOLUME URL: http://airccse.org/journal/ijnlc/vol3.html
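To make the clue-word idea concrete, here is a minimal Python sketch of an overlap-based disambiguator in the spirit of the Lesk family of knowledge-based algorithms. The abstract does not say which algorithm or scoring the authors use, so the overlap count is an assumption, and the `CLUE_WORDS` table, its sense labels and its English entries are hypothetical examples rather than data from the paper.

```python
# Hypothetical clue-word lexicon: each sense of a polysemy word lists its clue words.
CLUE_WORDS = {
    "bank": {
        "financial institution": {"money", "loan", "deposit", "account"},
        "river side": {"river", "water", "shore", "fishing"},
    },
}


def disambiguate(word, context):
    """Return the sense whose clue words overlap most with the context, or None."""
    senses = CLUE_WORDS.get(word)
    if not senses:
        return None  # word is not in the lexicon
    context_words = {w.lower() for w in context}
    best = max(senses, key=lambda sense: len(senses[sense] & context_words))
    # Without any overlap there is no evidence for any sense.
    return best if senses[best] & context_words else None


sentence = "he sat on the bank of the river fishing".split()
print(disambiguate("bank", sentence))
```

Returning `None` when no clue word appears in the context avoids guessing a sense without evidence; a fuller system could fall back to a most-frequent-sense default instead.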
  • 33. REFERENCES [1] N. Ide and J. Véronis, “Word sense disambiguation: The state of the art,” Computational Linguistics, pp. 1–40, 1998. [2] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller, “Introduction to wordnet: An on-line lexical database,” International Journal of Lexicography, 1998. [3] U. R. Dhungana and S. Shakya, “Word sense disambiguation in nepali language,” in The Fourth International Conference on Digital Information and Communication Technology and Its Application (DICTAP2014), Bangkok, Thailand, 2014, pp. 46–50. [4] M. E. Lesk, “Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from a ice cream cone,” in SIGDOC Conference, Toronto, Ontario, Canada, 1986. [5] S. Banerjee and T. Pedersen, “An adapted lesk algorithm for word sense disambiguation using wordnet,” in Third International Conference on Intelligent Text Processing and Computational Linguistics, Gelbukh, 2002. [6] M. Sinha, M. K. Reddy, P. Bhattacharyya, P. Pandey, and L. Kashyap, “Hindi word sense disambiguation,” Master’s thesis, Indian Institute of Technology Bombay, Mumbai, India, 2004. [7] N. Shrestha, A. V. H. Patrick, and S. K. Bista, “Resources for nepali word sense disambiguation,” in IEEE International conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE’08), Beijing, China, 2008. [8] P. Bhattacharyya, P. Pande, and L. Lupu, “Hindi wordnet,” Indian Institute of Technology Bombay, Mumbai, India, Tech. Rep., 2008. [9] N. Shrestha, A. V. H. Patrick, and S. K. Bista, “Nepali word sense disambiguation using lesk algorithm,” Master’s thesis, Kathmandu University, Dhulikhel, Kavre, Nepal, 2004.
  • 34. AN UNSUPERVISED APPROACH TO DEVELOP STEMMER Mohd. Shahid Husain Department of Information Technology, Integral University, Lucknow ABSTRACT This paper presents an unsupervised approach to the development of a stemmer for the Urdu and Marathi languages. During the last few years in particular, a wide range of information in Indian regional languages has been made available on the web in the form of e-data. However, access to these data repositories is very low because efficient search engines and retrieval systems supporting these languages are very limited. Hence automatic information processing and retrieval has become an urgent requirement. To train the system, training datasets taken from CRULP [22] and the Marathi corpus [23] are used. For generating suffix rules, two different approaches, namely frequency based stripping and length based stripping, have been proposed. The evaluation has been made on 1200 words extracted from the Emille corpus. The experimental results show that, for Urdu, the frequency based suffix generation approach gives the maximum accuracy of 85.36%, whereas the length based suffix stripping algorithm gives a maximum accuracy of 79.76%. For Marathi, the system gives 63.5% accuracy with frequency based stripping and achieves a maximum accuracy of 82.5% with the length based suffix stripping algorithm. KEYWORDS Stemming, Morphology, Urdu stemmer, Marathi stemmer, Information retrieval. FULL TEXT: http://airccse.org/journal/ijnlc/papers/1212ijnlc02.pdf VOLUME URL: http://airccse.org/journal/ijnlc/vol1.html
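A frequency based suffix stripping stemmer can be sketched in a few lines of Python. The abstract gives no algorithmic details, so everything below is an assumption made for illustration: candidate suffixes are the word endings up to a maximum length, a suffix is kept when it occurs in at least `min_freq` distinct training words, and stemming strips the longest learned suffix while leaving a stem of at least `min_stem` characters. English toy words stand in for the Urdu and Marathi training data.

```python
from collections import Counter


def learn_suffixes(words, min_freq=3, min_stem=2, max_suffix=3):
    """Collect endings that occur in at least min_freq distinct words."""
    counts = Counter()
    for word in set(words):
        for k in range(1, max_suffix + 1):
            if len(word) - k >= min_stem:  # keep a stem of reasonable length
                counts[word[-k:]] += 1
    return {suffix for suffix, freq in counts.items() if freq >= min_freq}


def stem(word, suffixes, min_stem=2):
    """Strip the longest learned suffix that leaves at least min_stem characters."""
    for k in range(len(word) - min_stem, 0, -1):
        if word[-k:] in suffixes:
            return word[:-k]
    return word


# Toy training vocabulary (hypothetical stand-in for a real corpus).
training = [
    "walking", "walked", "walks",
    "talking", "talked", "talks",
    "jumping", "jumped", "jumps",
]
suffixes = learn_suffixes(training)
print(sorted(suffixes))
print(stem("walking", suffixes), stem("talked", suffixes), stem("jumps", suffixes))
```

A length based variant would differ mainly in how the suffix rules are chosen, for example by ranking candidate endings by length instead of by frequency; the abstract reports that which variant works better depends on the language.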
  • 35. REFERENCES [1] Rizvi, J et. al. “Modeling case marking system of Urdu-Hindi languages by using semantic information”. Proceedings of the IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE '05). 2005. [2] Butt, M. King, T. “Non-Nominative Subjects in Urdu: A Computational Analysis”. Proceedings of the International Symposium on Non-nominative Subjects, Tokyo, December, pp. 525-548, 2001. [3] Savoy, J. “Stemming of French words based on grammatical categories”. Journal of the American Society for Information Science, 44(1), 1-9, 1993. [4] Lovins Julie Beth: Development of a stemming algorithm. Mechanical Translation and Computational Linguistics 11:22–31. (1968) [5] Mokhtaripour, A., Jahanpour, S. “Introduction to a New Farsi Stemmer”. Proceedings of CIKM Arlington VA, USA, 826-827, 2006. [6] R. Wicentowski. "Multilingual Noise-Robust Supervised Morphological Analysis using the Word Frame Model." In Proceedings of Seventh Meeting of the ACL Special Interest Group on Computational Phonology (SIGPHON), pp. 70-77, 2004. [7] Rizvi, Hussain M. “Analysis, Design and Implementation of Urdu Morphological Analyzer”. SCONEST, 1-7, 2005. [8] Krovetz, R. “View Morphology as an Inference Process”. In the Proceedings of 5th International Conference on Research and Development in Information Retrieval, 1993. [9] Porter, M. “An Algorithm for Suffix Stripping”. Program, 14(3): 130-137, 1980. [10] Thabet, N. “Stemming the Qur’an”. In the Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages, 2004. [11] Paik, Pauri. “A Simple Stemmer for Inflectional Languages”. FIRE 2008. [12] Sharifloo, A.A., Shamsfard M. “A Bottom up Approach to Persian Stemming”. IJCNLP, 2008 [13] Croft and Xu. “Corpus-Based Stemming Using Co occurrence of Word Variants”. ACM Transactions on Information Systems (61-81), 1998. [14] Kumar, A. and Siddiqui, T. “An Unsupervised Hindi Stemmer with Heuristics Improvements”. 
In Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data, 2008. [15] Kumar, M. S. and Murthy, K. N. “Corpus Based Statistical Approach for Stemming Telugu”. Creation of Lexical Resources for Indian Language Computing and Processing (LRIL), C-DAC, Mumbai, India, 2007. [16] Qurat-ul-Ain Akram, Asma Naseer, Sarmad Hussain. “Assas-Band, an Affix-Exception- List Based Urdu Stemmer”. Proceedings of ACL-IJCNLP 2009. [17] http://en.wikipedia.org/wiki/Urdu [18] .http://www.bbc.co.uk/languages/other/guide/urdu/steps.shtml [19] http://www.andaman.org/BOOK/reprints/weber/rep-weber.htm [20] Natural Language processing and Information Retrieval by Tanveer Siddiqui, U S
  • 36. Tiwary. [21] Information retrieval: data structure and algorithms by William B. Frakes, Ricardo Baeza-Yates. [22] http://www.crulp.org/software/ling_resources.htm [23] Marathi Corpus, http://www.cfilt.iitb.ac.in/marathi_Corpus/ , IIT Powai, Mumbai. AUTHORS Mohd. Shahid Husain holds an M.Tech. from the Indian Institute of Information Technology (IIIT-A), Allahabad, with Intelligent Systems as specialization. He is currently pursuing a Ph.D. and working as an assistant professor in the Department of Information Technology, Integral University, Lucknow.