This thesis proposes a complete system that classifies and recognizes machine-printed
Arabic text. The input to the system is a clean, high-resolution Tag Image File Format
(.TIFF) file that contains the Arabic text to be recognized; the output is the recognized
Arabic text saved in a Microsoft Word Document (.DOC) file. The technique is based on
describing the text in terms of shape primitives derived from Freeman chain codes. A
rule-based data enhancement technique is used to improve the recognized features. The
recognized features are
processed by a Prolog feature-matching engine that outputs character classes, diacritic
information, and corner information as three separate streams (a character class stream,
a diacritic stream, and a corner information stream). In addition to these three streams,
the estimated font size is provided as a fourth input. Characters are finally determined
by processing a permutation of the three streams using a Definite Clause Grammar (DCG).
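
As a rough illustration of the Freeman chain codes mentioned above, the following minimal Python sketch encodes a traced pixel path as 8-direction codes and groups them into runs. It is not the system's implementation: the function names `freeman_chain_code` and `run_lengths`, the direction numbering (0 = east, counterclockwise), and the image-coordinate convention (y grows downward) are all assumptions made for this example, and the thesis's actual shape primitives may be derived differently.

```python
# Standard 8-direction Freeman codes: 0 = east, numbered counterclockwise.
# Assumption: image coordinates, so y grows downward and "north" is dy = -1.
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def freeman_chain_code(points):
    """Encode a path of 8-connected (x, y) pixels as a list of Freeman codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(x0, y0)} and {(x1, y1)} are not 8-connected")
        codes.append(DIRECTIONS[step])
    return codes


def run_lengths(codes):
    """Group consecutive identical codes into (code, length) runs.

    Hypothetical helper: long runs of one code are one simple way to spot
    straight segments that could serve as shape primitives.
    """
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1] = (code, runs[-1][1] + 1)
        else:
            runs.append((code, 1))
    return runs


# A short stroke: two pixels right, two pixels down, then one down-left.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 3)]
codes = freeman_chain_code(stroke)
print(codes)               # [0, 0, 6, 6, 5]
print(run_lengths(codes))  # [(0, 2), (6, 2), (5, 1)]
```

In this sketch a stroke reduces to a short sequence of direction runs, which is the kind of compact description a rule-based matcher can work with.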