Designing a Stemming Algorithm for Silt’e Language
Date
2012-06
Publisher
Addis Ababa University
Abstract
Variant word forms encountered in indexing and retrieval are one of the causes of the
problems involved in using free-text retrieval systems. The variant word forms used in
indexing and searching have to be taken into account when determining the relevance of a
document to a user query that specifies just a single form. Reducing the variant words to one
form improves the performance of an IR system, and this can be achieved by conflation
techniques, usually stemming, which is the technique adopted in this work. Stemmers are used
in information retrieval to reduce as many related words and word forms as possible to a
standard form, which can then be used in the retrieval process. This research explores the
possibility of developing a stemmer to conflate variant words of the Silt’e language.
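As a minimal sketch of what conflation means for retrieval, the Python fragment below indexes documents under stemmed keys so that a query in one form can match documents containing another form. The function names (`toy_stem`, `build_index`, `search`) and the English suffix list are assumptions made purely for illustration; they are not taken from the thesis and have nothing to do with Silt’e morphology.

```python
from collections import defaultdict

def toy_stem(word):
    """Hypothetical stand-in stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs, stem):
    """Map each stem to the set of document ids that contain a variant of it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[stem(token)].add(doc_id)
    return index

def search(index, query, stem):
    """Stem the query term the same way, then look it up."""
    return sorted(index.get(stem(query.lower()), set()))

docs = {
    1: "indexing documents",
    2: "the index was indexed",
    3: "retrieval systems",
}
index = build_index(docs, toy_stem)
print(search(index, "indexing", toy_stem))
```

Because both the documents and the query pass through the same stemmer, the single query form "indexing" also reaches the document that only contains "index" and "indexed".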
Silt’e belongs to the Semitic language group. These languages have a common grammatical
system based on a root-pattern structure. Consonants bear the basic meaning while vowels form
different patterns. Stems are built from consonantal roots before other word forms are built.
Silt’e uses affixation and reduplication to derive different word forms from stems. Common
affix types are prefixes, suffixes, and infixes. Silt’e concatenates affixes extensively, which
can result in relatively long words that often carry as much semantic information as a whole
English phrase, clause or sentence. As a result of this complex morphological structure, a
single Silt’e word can have a very large number of variants.
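The root-pattern idea can be illustrated with a small Python sketch that keeps only the consonants of a transliterated word. The example forms are the familiar Arabic k-t-b family, chosen only because it is a widely cited Semitic example; they are not Silt’e data. The sketch also assumes a Latin transliteration with the vowels a, e, i, o, u, which is a simplification: it does not handle Ethiopic script directly.

```python
# Assumed vowel inventory of the (hypothetical) Latin transliteration.
VOWELS = set("aeiou")

def consonant_skeleton(word):
    """Return the consonants of a transliterated word, in order."""
    return "".join(
        ch for ch in word.lower() if ch.isalpha() and ch not in VOWELS
    )

# Arabic illustration (NOT Silt'e): different vowel patterns, one root.
forms = ["kataba", "kitab", "kutub", "katib"]
skeletons = {consonant_skeleton(form) for form in forms}
print(skeletons)
```

All four forms reduce to the same consonant sequence, which is the sense in which consonants carry the basic meaning while vowels form the patterns.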
To design the stemmer, a sample text was collected from different sources, and a research
paper that explains the morphology of the Silt’e language was also used. Affixes and stopwords
were collected from this research paper and from the sample text to develop the stemmer. The
stemmer developed in this study is iterative and uses context-sensitive and recoding rules that
remove prefixes, suffixes and letter reduplication (type 1 and type 2). In the experiment, the
stripping procedures were applied in order: prefix, suffix and finally letter reduplication. The
stemmer was tested on a sample of 1486 words, which were selected randomly from the sample
texts. The results show that the stemmer achieves an accuracy of 85.71% and a dictionary
reduction of 34.99% for stem words. Lastly, conclusions and recommendations for future work
are reported.
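The stripping order described above (prefix, then suffix, then reduplication) could be sketched in Python roughly as follows. Everything concrete here is an assumption: the affix lists are hypothetical placeholders rather than real Silt’e morphemes, the minimum stem length stands in for the context-sensitive conditions, the reduplication rule is a simple guess and not the thesis's type 1 or type 2 rules, and recoding rules are omitted. The two metric functions use common definitions of accuracy and dictionary reduction, which may differ from the ones used in the study.

```python
import re

# Hypothetical placeholder affixes; NOT real Silt'e morphemes.
PREFIXES = ("ya", "ta")
SUFFIXES = ("och", "at")
MIN_STEM = 3  # assumed context condition: never strip below three letters

def strip_affixes(word, affixes, from_start):
    """Iteratively remove affixes until none applies."""
    changed = True
    while changed:
        changed = False
        for affix in affixes:
            matches = word.startswith(affix) if from_start else word.endswith(affix)
            if matches and len(word) - len(affix) >= MIN_STEM:
                word = word[len(affix):] if from_start else word[:-len(affix)]
                changed = True
                break
    return word

def remove_reduplication(word):
    """Assumed rule: collapse an immediately repeated two-letter chunk."""
    return re.sub(r"([a-z]{2})\1", r"\1", word)

def stem(word):
    """Apply the steps in the stated order: prefix, suffix, reduplication."""
    word = strip_affixes(word.lower(), PREFIXES, from_start=True)
    word = strip_affixes(word, SUFFIXES, from_start=False)
    return remove_reduplication(word)

def accuracy(predicted, gold):
    """Percentage of words whose predicted stem equals the expected stem."""
    correct = sum(1 for w, s in predicted.items() if gold.get(w) == s)
    return 100.0 * correct / len(predicted)

def dictionary_reduction(words, stems):
    """Percentage drop from distinct words to distinct stems."""
    unique_words = len(set(words))
    return 100.0 * (unique_words - len(set(stems))) / unique_words

# Made-up test words built from the placeholder affixes above.
words = ["yasababoch", "tasabat", "sab", "dubat"]
stems = [stem(w) for w in words]
print(stems, dictionary_reduction(words, stems))
```

Running the prefix and suffix steps in a loop is what makes the sketch iterative: a word carrying several stacked affixes loses them one at a time until no rule applies.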
Keywords
Algorithm