A Hybrid Approach for Automatic Morphological Diacritization of Arabic Text

Document Type: Original Research Articles

Authors

1 Computer Science Department, Mansoura University, Egypt

2 Electronics and Communications Department, Cairo University, Egypt

Abstract

Modern Arabic text is commonly written without diacritics. Restoring them (diacritization) is a critical step for other Arabic processing tasks such as word sense disambiguation, automatic speech recognition, and text-to-speech, where word meaning or pronunciation is determined by the diacritic signs assigned to each letter. This paper presents a novel approach to automatic Arabic text diacritization that uses deep encoder-decoder recurrent neural networks followed by several text correction techniques to improve the overall accuracy of the system output. Experimental results on the Wikinews test set show strong performance that is competitive with state-of-the-art diacritization methods. Specifically, our method achieves a morphological diacritization Word Error Rate (WER) of 3.85% and a Diacritic Error Rate (DER) of 1.12%.
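The two reported metrics can be illustrated with a minimal sketch. It is not the evaluation script used in the paper, and the conventions below are assumptions: DER is taken as the fraction of letters whose diacritics differ from the reference, WER as the fraction of words containing at least one such letter, every letter is counted (including the final letter that carries the case ending), and the diacritic set is limited to the Unicode range U+064B to U+0652. The function names `split_diacritics` and `error_rates` are hypothetical.

```python
# Assumed diacritic set: fathatan .. sukun (U+064B .. U+0652).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Return a list of (base_letter, diacritic_string) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base letter
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def error_rates(reference, hypothesis):
    """Return (DER, WER) for two diacritized strings with identical base text."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words or len(ref_words) != len(hyp_words):
        raise ValueError("texts must be non-empty and have the same word count")

    letters = letter_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs = split_diacritics(ref_word)
        hyp_pairs = split_diacritics(hyp_word)
        if [b for b, _ in ref_pairs] != [b for b, _ in hyp_pairs]:
            raise ValueError("base letters differ between reference and hypothesis")
        # Sort the marks so that shadda+vowel matches vowel+shadda.
        wrong = sum(
            1
            for (_, ref_marks), (_, hyp_marks) in zip(ref_pairs, hyp_pairs)
            if sorted(ref_marks) != sorted(hyp_marks)
        )
        letters += len(ref_pairs)
        letter_errors += wrong
        word_errors += 1 if wrong else 0

    return letter_errors / letters, word_errors / len(ref_words)
```

Under these assumptions, a two-word reference with eight letters and a hypothesis that gets one diacritic wrong yields a DER of 1/8 and a WER of 1/2. Evaluations that exclude case endings or skip undiacritized letters would need the counting loop adjusted accordingly.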
 

Keywords

Main Subjects