Amharic-English cross-lingual information retrieval (CLIR): A corpus-based approach

dc.contributor.advisor | Abebe Ermias (Ato)
dc.contributor.author | Tesfaye Aynalem
dc.date.accessioned | 2020-05-27T10:07:38Z
dc.date.accessioned | 2023-11-18T12:44:41Z
dc.date.available | 2020-05-27T10:07:38Z
dc.date.available | 2023-11-18T12:44:41Z
dc.date.issued | 2009-08
dc.description.abstract | Amharic is the official working language of the Federal Democratic Republic of Ethiopia. English, on the other hand, serves as the medium of instruction and communication in educational institutions and as a working language in governmental and non-governmental organizations in Ethiopia. It is therefore important to test the applicability of an Amharic-English cross-language information retrieval system that can break the language barrier. This research was conducted to break the language barrier that users face in obtaining and using documents prepared in Amharic and English. The experiments employed a corpus-based approach, which requires a large volume of parallel documents prepared in Amharic and English. The documents collected for this research are news articles and legal documents. System performance was measured by precision and recall. In the first phase of experimentation, precision values were very low, the highest being 0.2 and 0.3 for Amharic and English, respectively. This was because the index term list could not fully represent the documents used in the experiments: the indexing process removed important terms from the index list, so no documents could be retrieved for most of the queries.
Thus, the index list was modified so that all terms occurring in the corpus, except stop words, were used. This increased precision, with the highest values reaching 0.36 and 0.33 for Amharic and English documents, respectively. Therefore, with a sufficiently large and clean parallel Amharic-English document collection, it is possible to develop a cross-language information retrieval system for this language pair. | en_US
dc.identifier.uri | http://etd.aau.edu.et/handle/12345678/21335
dc.language.iso | en | en_US
dc.publisher | Addis Ababa University | en_US
dc.subject | Information Science | en_US
dc.title | Amharic-English cross-lingual information retrieval (CLIR): A corpus-based approach | en_US
dc.type | Thesis | en_US
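The abstract describes two concrete steps: indexing every corpus term except stop words, and evaluating retrieval with precision and recall. The Python sketch below (3.9+) illustrates both steps on a toy document-aligned parallel corpus. It is not the thesis's implementation. The stop-word list, the term-overlap ranking, the example documents, and the helper names (`build_index`, `search`, `cross_language_search`, `precision_recall`) are all hypothetical. A query is answered by searching the query-language side of each parallel pair and returning the aligned document in the other language. This is only one simple way a parallel corpus can support cross-language retrieval.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration only; the thesis's own
# Amharic and English stop-word lists are not reproduced here.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase and split into runs of Unicode word characters (covers Ethiopic script)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[str, str], stop_words: set[str]) -> dict[str, set[str]]:
    """Inverted index over every corpus term except stop words."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            if term not in stop_words:
                index[term].add(doc_id)
    return index


def search(query: str, index: dict[str, set[str]], stop_words: set[str], top_k: int = 10) -> list[str]:
    """Rank document IDs by how many distinct query terms they contain."""
    scores: Counter = Counter()
    for term in set(tokenize(query)) - stop_words:
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Sort by score (descending), then by ID so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]


def cross_language_search(query: str, source_index: dict[str, set[str]],
                          alignment: dict[str, str], stop_words: set[str],
                          top_k: int = 10) -> list[str]:
    """Search the query-language side, then map hits to their aligned counterparts."""
    hits = search(query, source_index, stop_words, top_k)
    return [alignment[doc_id] for doc_id in hits if doc_id in alignment]


def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical English side of a parallel corpus; `alignment` maps each
    # English document ID to the ID of its (hypothetical) Amharic counterpart.
    english_docs = {
        "en1": "The parliament approved the new budget",
        "en2": "The court ruled on the land law",
        "en3": "Farmers in the region expect rain",
    }
    alignment = {"en1": "am1", "en2": "am2", "en3": "am3"}
    index = build_index(english_docs, STOP_WORDS)
    retrieved = cross_language_search("budget law", index, alignment, STOP_WORDS)
    print(retrieved)                                # ['am1', 'am2']
    print(precision_recall(retrieved, {"am1"}))     # (0.5, 1.0)
```

The sketch also shows the effect the abstract reports. If indexing drops a content term such as "budget", the query can no longer reach that document, and both precision and recall for that query fall. Keeping every non-stop-word term avoids this loss.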

Files

Original bundle
Name: Aynalem Tesfaye.pdf
Size: 14.51 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text