A Top-Down Chart Parser for Amharic Sentences

Addis Ababa University


Natural language processing (NLP) applications play an important role in daily life by enabling computers to understand human languages. Applications such as machine translation, question answering, knowledge extraction and information retrieval are among the most common ones used to accomplish different tasks. Developing these applications well depends on assigning parts of speech to words, extracting phrases and sub-phrases, and recovering the syntactic structure of sentences in natural language text. Amharic is an under-resourced language for which NLP tools and applications have not yet been successfully built, so parsing Amharic sentences is a necessary step for many applications. Sentence parsing is the NLP task of identifying the syntactic structure of a sentence according to the grammar of a language, and for this reason many NLP applications rely on a sentence parser for better performance. For languages such as English and Arabic, many sentence parsers have been developed using different approaches. For Amharic, however, only a few works exist; they still require improvements and additional features, and they were evaluated on small datasets covering only specific types of sentences. In our study, we have designed a parser that handles all types of Amharic sentences, using a top-down chart parsing algorithm with a Context Free Grammar (CFG) to represent Amharic grammar. We have developed a lexicon generator that automatically builds the lexicon, which is kept separate from the CFG. In addition, we have integrated a morphological analyzer into the construction of the lexicon. The main purpose of the morphological analyzer is to reduce the number of words that must be stored in the lexicon.
The morphological analyzer returns the morphemes of the given words, so that words sharing a common root are represented by that root morpheme in the lexicon. The parser was tested on sentences extracted from different sources, and the experimental results showed the effectiveness of the proposed parser.
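One common way to realise top-down chart parsing is an Earley-style chart with predictor, scanner and completer steps, where the predictor expands expected non-terminals from the start symbol downward. The sketch below assumes that form and is not the thesis implementation: the grammar, the lexicon and the sentence ልጁ ዳቦ በላ (roughly "the child ate bread", subject-object-verb order) are hypothetical toy examples. As in the system described above, the phrase-structure rules (`GRAMMAR`) and the word-to-tag lexicon (`LEXICON`) are kept in separate structures. The sketch also assumes the grammar has no empty productions.

```python
from collections import namedtuple

# Hypothetical toy CFG: phrase-structure rules only (no words appear here).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["NP", "V"], ["V"]],  # object-before-verb order, or an intransitive verb
}

# Hypothetical lexicon kept separate from the grammar: word -> set of POS tags.
LEXICON = {
    "ልጁ": {"N"},
    "ዳቦ": {"N"},
    "በላ": {"V"},
}

# A chart edge: lhs -> rhs with a dot before rhs[dot], covering words[start:end].
Edge = namedtuple("Edge", "lhs rhs dot start end")


def next_symbol(edge):
    """Symbol right after the dot, or None if the edge is complete."""
    return edge.rhs[edge.dot] if edge.dot < len(edge.rhs) else None


def parse(words, grammar=GRAMMAR, lexicon=LEXICON, root="S"):
    """Top-down (Earley-style) chart recognizer; True if `words` derives `root`.

    Assumes no empty (epsilon) productions in `grammar`.
    """
    n = len(words)
    chart = [[] for _ in range(n + 1)]  # chart[i] holds edges ending at position i

    def add(edge):
        if edge not in chart[edge.end]:
            chart[edge.end].append(edge)

    # Start top-down: predict every expansion of the root symbol at position 0.
    for rhs in grammar[root]:
        add(Edge(root, tuple(rhs), 0, 0, 0))

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] may grow while it is processed
            edge = chart[i][j]
            j += 1
            sym = next_symbol(edge)
            if sym is None:
                # Completer: advance edges that were waiting for edge.lhs.
                for waiting in list(chart[edge.start]):
                    if next_symbol(waiting) == edge.lhs:
                        add(waiting._replace(dot=waiting.dot + 1, end=i))
            elif sym in grammar:
                # Predictor (the top-down step): expand the expected non-terminal.
                for rhs in grammar[sym]:
                    add(Edge(sym, tuple(rhs), 0, i, i))
            elif i < n and sym in lexicon.get(words[i], set()):
                # Scanner: the next word carries the expected POS tag.
                add(edge._replace(dot=edge.dot + 1, end=i + 1))

    return any(
        e.lhs == root and e.start == 0 and next_symbol(e) is None for e in chart[n]
    )
```

With this toy grammar, `parse(["ልጁ", "ዳቦ", "በላ"])` and `parse(["ልጁ", "በላ"])` succeed, while `parse(["በላ", "ልጁ"])` fails because no rule starts a sentence with a verb. Keeping the lexicon outside the rule set means new words can be added without touching the grammar.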
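To illustrate how a morphological analyzer can shrink the lexicon, the following sketch uses a toy suffix stripper over a simplified Latin transliteration. The suffix table, the `KNOWN_ROOTS` set and the functions `analyze` and `generate_lexicon` are hypothetical and cover only a few noun suffixes (for example bet "house", betu "the house", betoch "houses", betochu "the houses"). A real Amharic analyzer handles far richer morphology, including root-and-pattern verb forms and the fusion of suffixes into Ethiopic syllables.

```python
# Hypothetical suffix table in a simplified Latin transliteration,
# longest suffix first so that "ochu" is tried before "u".
SUFFIXES = [
    ("ochu", {"num": "pl", "def": True}),   # e.g. betochu 'the houses'
    ("och", {"num": "pl", "def": False}),   # e.g. betoch 'houses'
    ("u", {"num": "sg", "def": True}),      # e.g. betu 'the house'
]

# Hypothetical known roots with their POS tag.
KNOWN_ROOTS = {"bet": "N", "lij": "N"}


def analyze(word):
    """Return (root, tag, features) for a word, or None if it cannot be analysed."""
    if word in KNOWN_ROOTS:
        return word, KNOWN_ROOTS[word], {"num": "sg", "def": False}
    for suffix, feats in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in KNOWN_ROOTS:
            return stem, KNOWN_ROOTS[stem], dict(feats)
    return None


def generate_lexicon(corpus_words):
    """Build a lexicon keyed by root so that inflected forms share one entry."""
    lexicon, unknown = {}, []
    for word in corpus_words:
        analysis = analyze(word)
        if analysis is None:
            unknown.append(word)  # would need manual tagging
        else:
            root, tag, _ = analysis
            lexicon[root] = tag
    return lexicon, unknown
```

For the corpus `["bet", "betu", "betoch", "betochu", "lij", "liju"]`, six surface forms collapse into two lexicon entries (`bet` and `lij`). This is the kind of reduction in stored words that motivates placing the analyzer in front of the lexicon generator.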



Keywords: NLP; Parser; Context Free Grammar; Top-Down Chart Parser; Lexicon Generator; Lexicon; Morphological Analyzer