A Top-Down Chart Parser for Amharic Sentences
Date
2015-04
Publisher
Addis Ababa University
Abstract
Natural language processing (NLP) applications play an important role in daily life by enabling
computers to understand human languages. Applications such as machine translation, question
answering, knowledge extraction and information retrieval are among the most common, and they
are needed to accomplish a variety of tasks. Developing these applications well depends on
assigning parts of speech to words, extracting phrases and sub-phrases, and extracting the
syntactic structure of sentences from natural language texts. Amharic is an under-resourced
language for which natural language tools and applications have not yet been built successfully.
Parsing Amharic sentences is therefore a necessary mechanism for many applications. Sentence
parsing is an NLP task that identifies the syntactic structure of a given sentence according to
the grammar of a language. For this reason, many natural language applications rely on a
sentence parser for better performance.
For foreign languages such as English and Arabic, many sentence parsers have been developed
using different approaches. For Amharic, however, only a few works exist, and they still require
improvements and additional features. In addition, they were conducted on small datasets and on
specific types of sentences.
In our study, we designed a parser intended to handle all types of Amharic sentences, using a
top-down chart parsing algorithm with a Context Free Grammar (CFG) to represent Amharic
grammar. We developed a lexicon generator that automatically generates the lexicon, which is
kept separate from the CFG. In addition, we integrated a morphological analyzer into the
construction of the lexicon. Its main purpose is to reduce the number of words that must be
stored in the lexicon: the analyzer returns the morpheme of a given word, so that words sharing
a common root are represented by that morpheme in the lexicon. The parser was tested on
sentences extracted from different sources. Experimental results showed the effectiveness of
the proposed parser.
Keywords: NLP, Parser, context free grammar, top-down chart parser, lexicon generator,
lexicon, morphological analyzer.
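The ideas in the abstract (a CFG kept separate from the lexicon, a lexicon keyed by root, and
top-down chart parsing) can be illustrated with a minimal sketch. It is not the thesis
implementation. The grammar rules, the lexicon entries, the function names toy_root and
recognize, and the suffix-stripping stand-in for a morphological analyzer are all hypothetical.
English placeholder words in subject-object-verb order are used instead of real Amharic data.
The sketch uses an Earley-style predict/scan/complete loop, one common form of top-down chart
parsing, and only recognizes sentences rather than building parse trees.

```python
# Toy CFG, stored separately from the lexicon (all rules are illustrative assumptions).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("ADJ", "N")],
    "VP": [("NP", "V"), ("V",)],
}

# Toy lexicon keyed by root, so inflected forms need no entries of their own.
LEXICON = {"dog": "N", "cat": "N", "big": "ADJ", "chase": "V"}


def toy_root(word):
    """Hypothetical stand-in for a morphological analyzer: strip a known suffix."""
    for suffix in ("", "ed", "d", "s"):
        stem = word[:len(word) - len(suffix)] if suffix else word
        if word.endswith(suffix) and stem in LEXICON:
            return stem
    return None  # word cannot be reduced to a root in the lexicon


def recognize(words, start="S"):
    """Return True if the word sequence is derivable from `start`."""
    tags = []
    for word in words:
        root = toy_root(word)
        if root is None:
            return False
        tags.append(LEXICON[root])

    n = len(tags)
    # chart[i] holds edges (lhs, rhs, dot, origin) that end at position i.
    chart = [[] for _ in range(n + 1)]

    def add(i, edge):
        if edge not in chart[i]:
            chart[i].append(edge)

    for rhs in GRAMMAR[start]:
        add(0, (start, rhs, 0, 0))

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] may grow while it is processed
            lhs, rhs, dot, origin = chart[i][j]
            j += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:
                    # Predict: expand the expected non-terminal top-down.
                    for expansion in GRAMMAR[nxt]:
                        add(i, (nxt, expansion, 0, i))
                elif i < n and tags[i] == nxt:
                    # Scan: the expected category matches the next word.
                    add(i + 1, (lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every edge that was waiting for `lhs`.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[n]
    )
```

Under these assumptions, recognize("big dogs cat chased".split()) accepts the sentence even
though "dogs" and "chased" have no lexicon entries of their own, because both are reduced to
stored roots first. Keeping GRAMMAR and LEXICON as separate tables mirrors the separation of
CFG and lexicon described in the abstract.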
Keywords
NLP; Parser; Context Free Grammar; Top-Down Chart Parser; Lexicon Generator; Lexicon; Morphological Analyzer