Proper Noun Extracting Algorithm For Arabic Language: Abstract-Many of Natural Language
Riyad Al-Shalabi, Ghassan Kanaan, Bashar Al-Sarayreh, Khalid Khanfar, Ali Al-Ghonmein, Hamed Talhouni,
and Salem Al-Azazmeh
list of roots; most of these roots are three consonants.

The Arabic language differs from other natural languages such as English; it has its own features that are not found in other languages. Natural Language Processing (NLP) in the Arabic language is still in its initial stage compared to the work in the English language, which has already benefited from the extensive research in this area. There are some aspects that slow down progress in Arabic NLP compared to the accomplishments in English and European languages [5].

their functions in other languages. In modern English and modern French, for example, the prefix or the suffix is usually a modifier of the meaning of the noun or the verb. It does not add any entity (prefixing "happy" with "un-" gives "unhappy"). In Arabic, the prefix can add an entity to a noun or a verb. For example, the prefix can be a preposition and the suffix can be a pronoun. Figure 1 tells more about prefixes and suffixes in Arabic [8][23][24][27].

[Figure 1: Affixes (لواصق): Prefixes (سوابق), Infixes, Suffixes (لواحق)]
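The point that an Arabic prefix or suffix can carry its own entity can be illustrated with a minimal sketch. The example word بكتابه ("with his book") and the one-entry affix tables are illustrative assumptions, not data from this paper, and the function name `segment` is hypothetical.

```python
# Hypothetical one-entry affix tables, chosen only for this illustration.
PREFIXES = {"ب": "preposition (by/with)"}
SUFFIXES = {"ه": "pronoun (his)"}


def segment(word):
    """Split a word into (prefix, stem, suffix) using the tables above."""
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    stem = rest[:len(rest) - len(suffix)]
    return prefix, stem, suffix


prefix, stem, suffix = segment("بكتابه")
print(prefix, PREFIXES.get(prefix))  # the attached preposition
print(stem)                          # the noun itself
print(suffix, SUFFIXES.get(suffix))  # the attached pronoun
```

A sketch like this over-segments words whose root happens to begin or end with an affix letter, so a practical system would presumably check candidate stems against a list of roots.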
Special Issue of the International Journal of the Computer, the Internet and Management, Vol.17 No. SP1, March, 2009
the name Al-Ordoni (الاردني, the Jordanian, or rather "of Jordan"), and Al-Misri (المصري, the Egyptian, or rather "of Egypt") in many places in the Middle East, despite the fact that their families may have resided outside Jordan or Egypt for several generations. The nisba, among the components of the Arabic name, perhaps most closely resembles the Western surname and sometimes becomes the person's family name [22][27].

IV. ARABIC NAME DETECTION
Identifying proper nouns in Arabic is particularly difficult: names in the Arabic language do not start with capital letters, so we cannot mark them in the text by looking at the first letter of the word. There is no fixed naming method in the Arabic language; there are multiple ways of writing a name. For example, the word "Ould" (ولد), which means "son of", is frequently used in some North African countries such as Mauritania, as in "Mauritanian poet Ahmed Ould Abdul Kader", in Arabic "أحمد ولد عبد القادر". The word "bin" or "ibn" (بن or ابن), which also means "son of", is widespread in some Middle Eastern and Arab Gulf countries, following the naming method of old Arab Islamic names, as in Prince Mohammed bin Rashid Al Maktoum.
Modern naming conventions may drop the words "bin", "ibn", "ould", or "bint", since the relation of a child to the father that they express is already implied in many Arab countries, so Fatimah's full name would be "Fatimah Ahmad Haroun Al fulany" (فاطمة أحمد هارون الفلاني).

In this paper, we first use the structure of Arabic names described above to guide us in marking Arabic names in text. Second, we use a set of keywords that help us identify where Arabic names occur in the text and extract them; such a keyword is usually followed by a personal name. Abd X (عبد) means "slave of X", where X is a word describing Allah (الله, God), e.g. Abdul aziz (عبد العزيز). Abu (أبو) means "father of Y", Umm (أم) means "mother of Y", and Ibn (ابن) or bin (بن) means "son of Y", where Y is a personal name. In such a case the personal name would be preceded by bin (بن) or Ibn (ابن).
Abu Karim Muhammad al-Jamil ibn Nidal ibn Abdulaziz al-Filistini
ابو كريم محمد الجميل بن ندال بن عبد العزيز الفلسطيني
"Father-of-Karim, Muhammad, the beautiful, son of Nidal, son of Abdulaziz, the Palestinian" (Karim means generous; Muhammad means praised; Jamil means beautiful; Aziz means magnificent, and it is one of the 99 names of God). Abu Karim is a kunya, Muhammad is the person's proper name (ism), al-Jamil is a laqab, Nidal is his father (a nasab), Abdulaziz his grandfather (second-generation nasab) and "al-Filistini" is his family nisba.
If the person has performed the Hajj (حج), the honorific "Haji" (الحاج) would be prefixed to his name, e.g. Haji Muhammad (الحاج محمد). Other words that prefix a person's name are "Mr." (السيد) or "Sheikh" (الشيخ), and, for females, "Sharifah" (الشريفة) or "Mrs." (السيدة).

V. DETECTION AND EXTRACTING OF PROPER NOUN
Detecting proper nouns in English is not very difficult. Nouns name people, places, and things. Every noun can further be classified as common or proper. A proper noun has two distinctive features: it names a specific (usually one-of-a-kind) item, and it begins with a capital letter no matter where it occurs in a sentence. Detecting proper nouns is quite challenging in Arabic, as it shares no cognates with English. The Arabic Information Retrieval proper name module utilizes clue words in the document text to detect proper names in the following categories: People (اسم شخص), Major Cities (المدن الرئيسة), Locations (مواقع), Countries (دول), Organizations (منظمات), Political parties (أحزاب سياسية), and Terrorist Groups (مجموعات ارهابية).

To detect proper nouns in Arabic text, we use a set of keywords to guide us to the places where we can find them in the text. By using keywords we mark name phrases that might
are based on two things: the keyword and some special verbs. Names seem to appear close to one of these keywords or special verbs in Arabic text. To mark the proper nouns in the text, we look for the keywords and special verbs and mark the name phrases [20]. We classified them in different classes: people, locations, organizations, events and products. Table 1 shows some examples of these keywords and special verbs.

حة للشرب جنوب الأردن", one named entity is recognized: جنوب الأردن as Location.

We presented prefixes for personal names, such as (Mr., Dr., Majesty, Sir, etc.), and prefixes for place names, such as (city, country, republic, kingdom, etc.). In this system, to retrieve names (surname, middle name, last name), "بن" or "بنت" must be written between two names.

VI. EXTRACTING PROPER NOUN
The steps of the algorithm to extract proper nouns in the Arabic language are as follows:
* Remove diacritics
- Diacritics: special marks put above or below the characters to determine the correct pronunciation, such as " ٌ ، ِ ، ُ ، ً ", e.g. العَرَبِيَّة to العربية, Arabic (language).
* Remove punctuation and non-letters, such as "،", "؟", ".", "!".
* Search in the keyword file and special verbs using a set of rules.
* Check for a prefix from the list "فب، فك، أبال، وكال، وبال، أكال، فبال، أل، أب، أك، فل، لل، ال، ألل، بال، كال، فال، وال، فلل، ولل، أال، اللا، فكال" and strip it off.
* Check for suffixes: "ية، ات، ين، ون، ان، تان, etc."
* Extract the words that follow keywords and save them in the proper noun database.

[Flowchart: Start -> Read the word -> Remove diacritics -> Remove punctuation and non-letters -> Search in keyword files and special verbs -> Match? (Yes / No) -> Check for the prefix and suffix of the word, then apply the stemming and extractor process]
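The steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword, connector, prefix and suffix lists are small subsets taken from the examples in this paper, the function names are hypothetical, the `min_stem` guard is an addition of this sketch, and applying affix stripping to the extracted name words is only one possible reading of the step order.

```python
import re

# Arabic diacritic marks from fathatan (U+064B) to sukun (U+0652).
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Anything that is neither an Arabic letter nor whitespace.
NON_LETTERS = re.compile(r"[^\u0621-\u063A\u0641-\u064A\s]")

# Illustrative subsets only; the paper's keyword file is not reproduced here.
KEYWORDS = {"السيد", "السيدة", "الشيخ", "الحاج"}
CONNECTORS = {"بن", "ابن", "بنت"}
PREFIXES = ["وال", "فال", "بال", "كال", "ال"]  # longest first
SUFFIXES = ["تان", "ية", "ات", "ين", "ون", "ان"]


def normalize(text):
    """Remove diacritics, then replace punctuation and non-letters with spaces."""
    return NON_LETTERS.sub(" ", DIACRITICS.sub("", text))


def strip_affixes(word, min_stem=3):
    """Strip at most one listed prefix and one listed suffix.

    min_stem is an assumption of this sketch: it leaves short words intact.
    """
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_stem:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_stem:
            word = word[:-len(s)]
            break
    return word


def extract_proper_nouns(text):
    """Return the name phrases that follow a keyword in the text."""
    tokens = normalize(text).split()
    names, i = [], 0
    while i < len(tokens):
        if tokens[i] in KEYWORDS and i + 1 < len(tokens):
            j = i + 1
            phrase = [strip_affixes(tokens[j])]
            j += 1
            # Extend the phrase while a connector links another name.
            while j + 1 < len(tokens) and tokens[j] in CONNECTORS:
                phrase += [tokens[j], strip_affixes(tokens[j + 1])]
                j += 2
            names.append(" ".join(phrase))
            i = j
        else:
            i += 1
    return names


print(extract_proper_nouns("زار الشيخ محمد بن راشد، المدينة."))
```

Stripping suffixes this way can damage names that happen to end in a listed suffix, so a fuller system would presumably check candidates against the proper noun database before saving them.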