Arabic Language Learning Assistance Base
Test 9 1.5 Hours 45 45
Total 9 8.5 Hours 280 280

However, P(X) does not depend on any particular value of W and can therefore be dropped from the calculation of the argmax:

\hat{W} = \arg\max_{W} P(X \mid W)\, P(W)    (3)

where P(W) is estimated by the language model and P(X | W) is the probability given by the acoustic models. This type of approach integrates acoustic and linguistic information in a single decision process (Figure 1).

3.4 Pronunciation Dictionaries

The pronunciation dictionary provides the link between the sequences of acoustic units and the words represented in the language model. While text and speech corpora can be collected, a pronunciation dictionary is usually not directly available. Although a manually created pronunciation dictionary gives good performance, building one is very cumbersome and requires extensive knowledge of the language. The literature suggests approaches that can automatically generate the pronunciation dictionary. The
approach that uses phonemes as the modeling unit, simple and fully automatic, has been well validated in many works. However, for the Arabic language, we used a new approach to automatically generate our pronunciation dictionary. In the first step, each Arabic phoneme is taken as a modeling unit. In the second step, a small dictionary is built manually, as shown in the following table. Finally, in the third step, an Arabic phonetiser is built automatically from this manually created pronunciation dictionary and an Arabic language model.

Table 2. Sample from the pronunciation dictionary.

Arabic word    Phonetic
أمس    E AE M S
الدائرة    EL D AE: E IH R AE H
الجنائية    EL J IH N AE: E IH Y AE H
الرابعة    EL R AE: B IH AI AE H
بالمحكمة    B IH EL M AE K AE M AE H
الابتدائية    EL E IH B T IH D AE: E IH Y AE H
بتونس    B IH T UW N IH S
النظر    EL N AE DH2 AE R AE

3.5 Evaluation of automated tools

To analyze large-scale variation across dialect speakers in automatic Arabic speech recognition, we evaluate the contribution of two automatic tools based on Sphinx: acoustic-phonetic decoding and forced alignment.

3.6 Acoustic-phonetic decoding

This step yields the transcript generated from the speech signal itself, that is, the transcript of what the speaker is supposed to say.

The first processing step extracts parameter vectors from the speech signal. These vectors are the input to the acoustic (acoustic-phonetic decoding) module, which in turn produces one or more phonetic hypotheses, usually with an associated probability, for each segment (a window or a frame) of the speech signal.

3.7 Principle of forced alignment

The second processing step performs, for each sentence in the corpus, a forced alignment between the sentence and the corresponding speech signal. The eventual aim is to compare the results of our acoustic-phonetic decoding with those of the forced alignment in order to extract phonetic confusions. Forced alignment requires our pronunciation dictionary. The task is to align the speech signals of each class with the corresponding orthographic transcription in our corpus, giving a phoneme segmentation of the corpus that is constrained by the transcript. The purpose of this experiment is to compare the forced alignment with the output of acoustic-phonetic decoding. After forced alignment, we obtain a corpus segmented into phonemes with timestamps. Each line of the resulting transcription contains the start and end times of a phoneme and its position, given as the expected frame number.

3.8 Experiments

We use SphinxTrain [1] to train the acoustic models (HMM). Context-independent (CI) and context-dependent (CD, with 1000 states) models based on graphemes and on phonemes are built from the speech corpus described in section I.3. We obtain four acoustic models: Grapheme_CI, Grapheme_CD, Phoneme_CI and Phoneme_CD.

Experiments are conducted with Sphinx3 [13]. The model topology is a 3-state HMM with 8 Gaussians per state. The parameter vector contains 13 MFCCs and their first and second derivatives. The text corpus is first segmented into words, and the 20k most frequent words are extracted for use as the test vocabulary. This vocabulary and the language-model training corpus are then segmented into 8800 syllables and 3500 character clusters, respectively. The transcript of the speech training corpus is also used to train the language model. The language models used in our experiments are obtained by linear interpolation between the models built from the web data and those built from the transcriptions of the speech corpora. Development data are used to optimize the interpolation parameters.

3.9 Results and Discussion

Phonemes grouping

Using the DAP (acoustic-phonetic decoding), we obtain a phonetic transcription of the corpus from the speech signal alone, without using the orthographic transcription (no knowledge of the lexicon and no language model). From this phonetic transcription, we computed statistics on the percentage of each phoneme in our corpus [20].

We analyze phonemes by class for two reasons. First, analysis by class is simpler than analysis of individual phonemes. Second, the DAP produces few errors. We classified all the phonemes generated by the DAP into two classes, consonant and vowel, and each class is divided into subclasses.

We have grouped the sounds into six classes of phonemes: long or short vowel and gemination; words containing unfamiliar sounds; sounds that exist in other languages; middle and final hamza; emphatic letters; and unproblematic sounds. These groups preserve the manner of production and do not take the place of articulation of the sounds into account.
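The comparison between forced alignment and acoustic-phonetic decoding can be illustrated with a minimal sketch. It assumes that both tools yield timestamped phoneme segments, pairs each aligned phoneme with the decoded segment that overlaps it most in time, and counts the resulting (expected, recognized) pairs. The `Segment` type, the function names and the sample timings are hypothetical and are not taken from Sphinx or from our corpus; the paper does not specify how confusions were actually extracted.

```python
from collections import Counter
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # start time in seconds
    end: float    # end time in seconds
    phone: str    # phoneme label


def overlap(a: Segment, b: Segment) -> float:
    """Duration in seconds shared by two segments (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def confusions(aligned: list[Segment], decoded: list[Segment]) -> Counter:
    """Count (expected, recognized) pairs.

    Each forced-alignment segment is paired with the decoded segment
    that overlaps it the most; "<none>" marks a phoneme with no overlap.
    """
    counts: Counter = Counter()
    for ref in aligned:
        best = max(decoded, key=lambda hyp: overlap(ref, hyp), default=None)
        if best is None or overlap(ref, best) == 0.0:
            counts[(ref.phone, "<none>")] += 1
        else:
            counts[(ref.phone, best.phone)] += 1
    return counts


# Hypothetical segments for one word (illustrative timings only).
aligned = [Segment(0.00, 0.10, "E"), Segment(0.10, 0.25, "AE"),
           Segment(0.25, 0.40, "M"), Segment(0.40, 0.60, "S")]
decoded = [Segment(0.00, 0.12, "E"), Segment(0.12, 0.24, "IH"),
           Segment(0.24, 0.41, "M"), Segment(0.41, 0.60, "S")]

table = confusions(aligned, decoded)
# Keep only the pairs where the decoder disagrees with the alignment.
errors = {pair: n for pair, n in table.items() if pair[0] != pair[1]}
print(errors)
```

Summing such counts over the whole corpus would give a confusion table that can then be aggregated by phoneme class.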
Table 3. Arabic phonemes grouping.

LSVG WUS SEOL MFH EL US
حع ذ هـ ثـ خ قصض
ر طظ
ت ّم شارع ماذا جزائري طا ب سن
تفّاح جامعة ه قاء ف
ّ ص اسبانيا
ستّة عربي ذهب سؤا طب بنان
أمّي مرحبًا ا
ً أه سأ برتقا في

The abbreviations used in Table 3 are:
LSVG: Long or short vowel and gemination
WUS: Words with unfamiliar sounds
SEOL: Sounds that exist in other languages
MFH: Middle and final "hamza"
EL: Emphatic letters
US: Unproblematic sounds

Large-vocabulary recognition problems may appear depending on the conditions under which the test signal is recorded. If a word is pronounced closer to or farther from the microphone, recognition rates can vary widely, despite the signal normalization intended to prevent this phenomenon.

However, if the user always pronounces the words at the same distance and with the same intensity, the recognition rates are very satisfactory, which allows the system to reach a rate of automatic speech recognition for large Arabic vocabulary not reached before.

4 CALL Applications

Computer-assisted learning has attracted considerable attention in recent years. Many research efforts have been made to improve such systems, particularly in the field of foreign language teaching.
In the second part of this article, we describe our system for computer-assisted learning of spoken Arabic and its results. This work was developed to teach Arabic pronunciation to speakers of a foreign language: French ... This application uses our speech recognition system to detect errors in the user's pronunciation.
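As an illustration of how recognized phonemes could be used to flag pronunciation errors, the sketch below aligns the phoneme sequence expected from the pronunciation dictionary with the sequence returned by the recognizer and reports the differences. It relies on Python's standard `difflib.SequenceMatcher`, which matches longest common blocks rather than computing a minimal edit distance; this is an assumption made for brevity, not the method used in our system. The function name and the learner's output are hypothetical.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags to the usual error names.
ERROR_NAMES = {"replace": "substitution", "delete": "deletion", "insert": "insertion"}


def pronunciation_errors(expected, recognized):
    """Return a list of (error type, expected phonemes, recognized phonemes)."""
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors.append((ERROR_NAMES[tag], expected[i1:i2], recognized[j1:j2]))
    return errors


# Expected phonemes for a word, and a hypothetical recognizer output in
# which the learner uses an emphatic TT and drops a short vowel.
expected = ["T", "UW", "N", "IH", "S"]
recognized = ["TT", "UW", "N", "S"]

for kind, exp, rec in pronunciation_errors(expected, recognized):
    print(kind, exp, rec)
```

Counting such errors per phoneme class and per learner would yield the kind of statistics a teacher can consult.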
All these rules are designed to make our system a very simple application that allows a genuine dialogue with the student, even in the absence of the teacher.

4.2 The process of ALO: Testing and Results

This part describes how our system was tested. The application was tested to obtain quantitative information on its validity and, in particular, on its ability to provide statistics on a learner or on the class (level). Systematic tests were run on a large Arabic corpus of about 352 words, selected by a linguist, covering Arabic for everyday communication (introducing oneself, family, food, clothing, orientation in space and time ...):

- Unproblematic sounds: 52 words.
- Emphatic letters (ق ص ض ط ظ): 60 words.
- Middle and final hamza: 60 words.
- Sounds existing in other languages (ر): 60 words.
- Words with unfamiliar sounds ( ): 60 words.
- Long or short vowel and gemination: 60 words.

The system was tested by 13 French students from Stendhal University in Grenoble, France, after the training course "Learning foreign languages: the case of Arabic". The following figures show, for each student, the level achieved on three classes of Arabic phonemes: long or short vowel and gemination, emphatic letters, and unfamiliar sounds.

[Figure 4: chart of per-student results; vertical axis 0% to 150%, horizontal axis 0 to 15; series AE, AE:, UH, UW, IH]
Fig. 4. Result of the system for each student on the phoneme class: long or short vowel and gemination.

Fig. 5. Result of the system for each student on the phoneme class: unfamiliar sounds of Arabic.

[Figure 6: chart of per-student results; vertical axis 0% to 100%, horizontal axis 0 to 15; series Q, SS, DD, TT]
Fig. 6. Result of the system for each student on the phoneme class: Arabic emphatic letters.

The results and statistics of our system for learning foreign spoken languages, in the case of Arabic, are very satisfactory. The previous figures show very good levels for each learner with respect to his or her pronunciation difficulties in each class of phonemes. These statistics help the teacher detect each learner's pronunciation errors automatically.

5 Conclusion and Outlook

In this paper, we presented our system, a platform for learning foreign spoken languages, in the case of Arabic, based on the formalism of automatic speech recognition for standard Arabic.

Our system differs in several respects from the few other works on standard Arabic that address computer-assisted learning of spoken Arabic as a foreign language. It incorporates an acoustic model of Arabic speech based on Hidden Markov Models (HMM) that gives results in the form of phonetic structures, whereas other systems lack such a model and assume that the input signal is already phonetically labeled and organized (as in El-Kasasy [11]).

Our system also includes a language model to validate the acoustic analysis obtained. Our work opens several perspectives, among them:
In terms of modeling: many modeling efforts can be undertaken to expand the coverage of the linguistic phenomena treated (enlarging our training corpus, our language model, our pronunciation dictionary, etc.).

In terms of implementation: we propose to implement other modules of the platform (voice learning of Arabic sentences, a greater diversity of exercises for the learner, extension of the platform to the learning of other languages, etc.).

6 References

[1] http://cmusphinx.sourceforge.net.

[2] Sherwood B. « Man-Machine Studies » University Illinois. USA, 1980.

[3] Simon J. « L'éducation et l'informatisation de la société » Rapport au président de la république. 1981.

[4] Bestougeff H. Thèse de l'état université de Paris VII, 1970.

[5] Nelles R. Thèse à l'université de Fribourg, 1977.

[15] Mourad MARS, Georges Antoniadis, Mounir Zrigui: "Nouvelles ressources et nouvelles pratiques pédagogiques avec les outils TAL", ISDM32, N°571, Avril (2008).

[16] Aymen Trigui, Mohsen Maraoui, Mounir Zrigui: The Gemination Effect on Consonant and Vowel Duration in Standard Arabic Speech. SNPD 2010: 102-105

[17] Mohamed Belgacem, Mounir Zrigui: Automatic Identification System of Arabic Dialects. IPCV 2010: 740-749

[18] Tahar Saidane, Mounir Zrigui, and Mohamed Ben Ahmed: Arabic Speech Synthesis Using a Concatenation of Polyphones: The Results. Canadian Conference on AI 2005: 406-411

[19] Mounir Zrigui, Mbarki Chahad, Anis Zouaghi, and Mohsen Maraoui: A Framework of Indexation and Document Video Retrieval based on the Conceptual Graphs. CIT 18(3): (2010)

[20] Rami Ayadi, Mohsen Maraoui, Mounir Zrigui: Intertextual distance for Arabic texts classification. ICITST 2009, pages 1-6.