Browsing by Author "Yifiru, Martha (PhD)"

Now showing 1 - 13 of 13
  • Afan Oromo news text summarizer
    (Addis Ababa University, 2012-06) Debele, Girma; Yifiru, Martha (PhD)
    Information overload is a global problem that requires a solution. Automatic text summarization is one of the natural language processing technologies that have attracted researchers' attention as a way to help information users. A summarizer is a computer program that removes redundant information from an input text and produces a shorter, non-redundant output text. In this study, a generic automatic text summarizer for Afan Oromo news text has been developed based on the Open Text Summarizer (OTS). OTS summarizes texts in English, German, Spanish, Russian, Hebrew, Esperanto and other languages. For this master’s thesis, most of the work consisted of customizing the OTS code so that it can make use of Afan Oromo lexicons and work for the Afan Oromo language. The summarizer combines term frequency and sentence position methods with language-specific lexicons in order to identify the most important sentences for an extractive summary. In this study we developed three methods for Afan Oromo news text summarization and tested their performance both objectively and subjectively. The three summarizers are: M1, which uses term frequency and position methods without the Afan Oromo stemmer and other lexicons (synonyms and abbreviations); M2, which combines term frequency and position methods with the Afan Oromo stemmer and language-specific lexicons (synonyms and abbreviations); and M3, which uses an improved position method and term frequency together with the stemmer and language-specific lexicons (synonyms and abbreviations). The performance of the summarizers was measured using subjective as well as objective evaluation methods. The objective evaluation shows that the three summarizers M1, M2 and M3 registered F-measure values of 34%, 47% and 81% respectively, i.e. M3 outperformed M1 and M2 by 47 and 34 percentage points.
    Moreover, the subjective evaluation shows that the scores of the three summarizers (M1, M2 and M3), as judged by human evaluators, are 34.37%, 37% and 62.5% for informativeness; 59.37%, 60% and 65% for linguistic quality; and 21.87%, 28.12% and 75% for coherence and structure. The subjective and objective results are consistent: summarizer M3, which combines term frequency with the improved position method, outperforms the other summarizers, followed by M2.
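    The combination of term frequency and sentence position described in this abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the OTS code or the thesis implementation: the function name `summarize`, the linear position score and the `position_weight` parameter are assumptions made here, and stemming and lexicon lookups are left out.

```python
import re
from collections import Counter


def summarize(text, ratio=0.3, stopwords=frozenset(), position_weight=1.0):
    """Pick the highest-scoring sentences and return them in document order.

    Hypothetical scoring: average term frequency of a sentence's words
    plus a position bonus that decreases linearly through the document.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [
        [w for w in re.findall(r"\w+", s.lower()) if w not in stopwords]
        for s in sentences
    ]
    # Document-wide term frequencies.
    tf = Counter(w for toks in tokenized for w in toks)

    n = len(sentences)
    scores = []
    for i, toks in enumerate(tokenized):
        tf_score = sum(tf[w] for w in toks) / len(toks) if toks else 0.0
        pos_score = position_weight * (n - i) / n  # earlier sentences score higher
        scores.append(tf_score + pos_score)

    k = max(1, round(n * ratio))  # extraction rate -> number of sentences
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    demo = ("Rain fell on the city. Rain and rain again flooded the city roads. "
            "People stayed home.")
    print(summarize(demo, ratio=0.34))
```

    In this sketch, variants such as M1, M2 and M3 would differ only in how tokens are normalized before counting and in how `pos_score` is computed.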
  • Amharic Named Entity Recognition Using A Hybrid Approach
    (Addis Ababa University, 2014-08) Tadele, Mikiyas; Yifiru, Martha (PhD)
    Named Entity Recognition (NER) is a subcomponent of information extraction (IE) that detects and classifies named entities (NEs), which include proper nouns representing person, location, and organization names as well as dates, times, and measurements. NER has also been found to be vital for other NLP applications, such as information retrieval, question answering, machine translation, and text summarization, to mention a few. This research reports the performance of an Amharic NER (ANER) system built using a hybrid approach and different feature sets to detect and classify NEs of type person, location, and organization. Two state-of-the-art machine learning (ML) algorithms, namely decision tree and support vector machines (SVM), are used to investigate the performance of the hybrid ANER. This is the first research that has used these ML algorithms for ANER and also the first to explore ANER using the hybrid approach. The rule-based component of the hybrid ANER has been built using two rules that base their predictions on the presence of trigger words before and after NEs. The ML component is built using decision tree (J48) and SVM (libsvm). The hybrid ANER integrates the two components by using the NE class predicted by the rule-based component as a feature in the ML component. We conducted different experiments to compare the performance of the hybrid approach with that of the pure ML approach using different feature sets. The best-performing models for both J48 and libsvm were obtained without the rule-based feature, using the POS feature together with the nominal flag feature, with an F-measure of 96.1% for J48 and 85.9% for libsvm. Based on the experimental results, we concluded that the pure ML approach with POS and nominal flag features outperformed the hybrid approach. This is because the rule-based component used in the experiment uses only trigger words.
    Using rules prepared by linguists and gazetteers may improve the rule-based component and consequently the hybrid ANER system.
    Keywords: Amharic Named Entity Recognition, Information Extraction, Decision tree, Support Vector Machine, Hybrid Named Entity Recognition System.
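    The hybrid design described here, in which the class predicted by trigger-word rules becomes one more feature for the ML classifier, might look like the sketch below. Everything in it is illustrative: the English trigger words, the tag names and the `is_nominal` definition are hypothetical stand-ins, since the abstract does not list the actual Amharic rules or features.

```python
# Hypothetical trigger words; the thesis used Amharic triggers not given in the abstract.
TRIGGERS_BEFORE = {"mr": "PERSON", "dr": "PERSON", "city": "LOCATION"}
TRIGGERS_AFTER = {"university": "ORGANIZATION", "river": "LOCATION"}


def rule_label(tokens, i):
    """Rule-based component: guess a class from the neighbouring trigger words."""
    prev_word = tokens[i - 1].lower() if i > 0 else ""
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    if prev_word in TRIGGERS_BEFORE:      # rule 1: trigger word before the token
        return TRIGGERS_BEFORE[prev_word]
    if next_word in TRIGGERS_AFTER:       # rule 2: trigger word after the token
        return TRIGGERS_AFTER[next_word]
    return "O"                            # no rule fired


def token_features(tokens, pos_tags, i):
    """Feature dictionary for one token, ready for a classifier such as a decision tree."""
    return {
        "word": tokens[i],
        "pos": pos_tags[i],
        # Assumption: the "nominal flag" marks noun-like POS tags.
        "is_nominal": pos_tags[i].startswith("N"),
        # Hybrid step: the rule-based prediction is passed on as a feature.
        "rule_label": rule_label(tokens, i),
    }


if __name__ == "__main__":
    toks = ["Dr", "Abebe", "visited", "Addis", "University"]
    tags = ["N", "N", "V", "N", "N"]
    print(token_features(toks, tags, 1))
```

    Dropping the `rule_label` key from the dictionary gives the pure ML configuration that the abstract reports as performing best.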
  • Applying Text Mining Techniques to Extract Knowledge from Cancer Patients’ Medcial Records-The case of Tikur Anbessa Specialized Hospital
    (Addis Ababa University, 2018-01) Abebe, Bethelhem; Yifiru, Martha (PhD); Yilma, Mengistu (PhD)
    Background: Currently, far more medical texts are accumulated than are interpreted. These texts have to be organized and analyzed effectively in order to be useful, and text mining has become very important for analyzing them and finding patterns. The medical records of chronically ill patients (especially cancer patients) contain a lot of information in both image and textual formats that is vital to immediate patient care and that could also support research and the identification of difficult cases.
    Objective: The aim of this project is to apply text mining techniques to extract knowledge from oncology patients’ medical records.
    Method: Data was collected from oncology patients’ medical records at Tikur Anbessa Specialized Hospital. The CRISP methodology was applied for describing the patterns in these medical records, and R software was used to extract patterns from 137 medical records. After creating a corpus and preprocessing the medical records, patterns were extracted using hierarchical and k-means clustering algorithms. The patterns extracted by the two algorithms were compared and evaluated using both subjective and objective approaches. The subjective evaluation was done with the help of ten physicians (both residents and oncologists). For the objective evaluation, the Rand index, accuracy, precision, recall and F-measure were computed.
    Result: The assessment of the medical records indicated that finding the necessary records in the record room was difficult, almost impossible; their follow-up formats are disorganized and the physicians’ handwriting is illegible. These problems make knowledge discovery difficult, time-consuming and tiresome. The objective evaluation showed that the hierarchical algorithm performed better than k-means (Rand index = 66.2%, accuracy = 48.8%, precision and recall = 65.6%), and 50% of the physicians also chose the hierarchical algorithm. In the subjective evaluation, three of the ten physicians (30%) had no opinion as to which algorithm is better, because the idea was new to them and the patterns were difficult to understand. Two (20%) preferred k-means because the hierarchical output seemed complicated to them. The remaining five (50%) chose the hierarchical algorithm since it tried to show almost all the necessary patterns found in the patients’ medical records.
    Conclusion: In general, text mining eases access to the necessary knowledge compared with going through patients’ medical records, and anyone can obtain that knowledge from the patterns extracted from the oncology patients’ medical records. The project indicates that the hierarchical algorithm performed better.
    Recommendation: Text mining has different applications, each with different benefits in medical health care. Different kinds of knowledge can be discovered and predicted not only from cancer medical records but also from the records of other chronic illnesses, which needs further research and experiments. For researchers, there is a great need for text mining applications in the medical domain, especially ones using clustering algorithms, in order to extract new knowledge; the efficiency of k-means and hierarchical clustering also needs to be improved. For health practitioners and Tikur Anbessa Specialized Hospital, this application would be of great benefit, and handling patients’ medical records in a proper and organized way would create the opportunity to give quality care to patients. For physicians and researchers, it would ease access to the data necessary for knowledge extraction and patient management. For software development organizations, there is a great opportunity to work on text mining, especially in the medical domain, which needs more structuring and handling of medical data.
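    For readers unfamiliar with the Rand index used in the objective evaluation above, the following Python sketch shows the standard definition: the fraction of item pairs on which two clusterings agree (both place the pair together, or both place it apart). The function name and the toy labels are illustrative only; the project itself used R, and its data are not reproduced here.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs treated the same way by two clusterings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    reference = [0, 0, 1, 1]          # e.g. expert grouping of four records
    clustering = [1, 1, 0, 0]         # same grouping under different label names
    print(rand_index(reference, clustering))
```

    Because only pair co-membership is compared, the index does not depend on how the clusters are named, which makes it suitable for comparing k-means output with a hierarchical cut.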
  • Assessing the Usage of Ict: In Ethiopian Management Institute
    (Addis Ababa University, 2017-10-04) Tesfaye, Natnael; Yifiru, Martha (PhD)
    Introduction: These days, management in every organization operates in a highly dynamic and competitive environment. Management training institutions are therefore starting to understand the relevance and importance of information and communication technology (ICT) and beginning to appreciate its use in organizations. This study assessed the current day-to-day usage of ICT, and how much of the work burden is relieved by ICT, at the Ethiopian Management Institute.
    Objective: The general objective of this study is to assess the usage of information and communication technology at the Ethiopian Management Institute.
    Methodology: This study adopted a descriptive survey research design. The target population was the employees of the Ethiopian Management Institute. Data were collected from 46 core employees (development program department) and 53 support employees (IT department, Audit, HRM, Women and Youth, PR, Accounting and Finance, Procurement and Supply Management, and Strategic). Self-administered questionnaires and interviews were used; SPSS version 21 was used to analyze the quantitative data, and frequencies and percentages were used to describe the study population.
    Conclusion: The institute has computers and some other ICT equipment such as copiers, printers and scanners; the major barriers to using ICT identified in the assessment are lack of skill and lack of awareness of what ICT equipment is available in the institute.
    Recommendation: The study recommends that administrators work to ensure that the institute is equipped with the latest ICT equipment that could be useful in promoting e-learning, and that they consider regular training sessions to improve the usage of ICT. Further research on enhancing the usage of ICT at the Ethiopian Management Institute is also recommended.
  • Assessment of the Current Paper Based Medical Record System At Multi-Drug Resistance Tuberculosis Department in Saint Peter Hospital for Introducing Electronic Medical Record System.
    (Addis Ababa University, 2014-05) Melaku, Zenebech; Yifiru, Martha (PhD); Seme, Assefa (PhD)
    Introduction: Health care is one of the critical components of basic social services, with a direct link to the growth and development of a country as well as to the wellbeing of society. In response, the Federal Ministry of Health, supported by its technical partners, is involved in a number of ICT projects and services. One of these projects is the electronic medical record (EMR), a computerized medical information system that collects, stores and displays patient information. It is a means to create legible and organized records and to access clinical information about individual patients. As the medical record system of Ethiopia has been entangled with a number of problems, this study assessed the problems of the paper-based medical record system with a view to introducing EMR at the MDR-TB department of St. Peter Hospital.
    Objective: The general objective of this study is to assess the current paper-based medical record system for introducing EMR at the MDR-TB department of St. Peter Hospital.
    Methodology: A cross-sectional study with quantitative and qualitative data collection methods was conducted at the MDR-TB department of St. Peter TB Hospital from April 2014 to May 2014. All 27 health professionals (physicians, nurses and health officers) working in the MDR-TB department, including the heads of laboratory, pharmacy, imaging and the MDR-TB department and the head nurse of the department, were selected using purposive sampling. Moreover, all 182 MDR-TB patients were included in the study. Self-administered questionnaires, interviews and observation were used. SPSS version 20 was used to analyze the quantitative data, and frequencies and percentages were used to describe the study population.
    Findings: The gaps or problems identified in the paper-based medical record system were illegibility, incompleteness, redundancy of data, difficulty in accessing data, inefficient communication with other departments about patient issues, shortage of storage space, shortage of digital x-ray machines and computers, and misplacement of patient cards.
    Conclusion and Recommendations: Most of the problems identified with the paper-based medical record system at the MDR-TB department are typical of those faced in any paper-based system. Therefore, the introduction of EMR would help to reduce problems associated with legibility, completeness, redundancy and other issues, and would foster good communication of patient data among the departments. Moreover, it would relieve storage space problems. The hospital plans to adopt Smart care, developed by Tulane University. However, it would be important to analyze the actual situation in the department and study the software so as to make it adaptable.
  • Automatic Summarization for Amharic Text Using Open Text Summarizer
    (Addis Ababa University, 2013-06) Ashagre, Addis; Yifiru, Martha (PhD)
    Information overload is a problem in this information era due to the mass production of information in many formats, which is amplified by internet technology. Amharic text documents are part of this mass production. Automatic text summarization plays a decisive role in extracting the useful information from a given text document within a short time. Several studies have been done on Amharic text summarization, but more research is needed to reach the results achieved in other languages such as English. The objective of this study is therefore to investigate the applicability of the Open Text Summarizer (OTS) for Amharic news text summarization. OTS is an open source, language-independent, single-document text summarization tool. It uses a combination of term frequency and sentence position methods to rank the sentences of an article. Forty news articles on different issues were gathered from the EPA, WIC and RANP web pages, from which a corpus of 30 news articles was prepared for experimentation. Some modifications were made to the interface of the tool, which was designed in the C# programming language. The OTS tool was customized in two ways for the two experiments. The first customization did not change the code of the tool significantly; it made a few modifications to the punctuation rules and prepared the dictionary file that holds the Amharic language lexicons. The system uses language-specific lexicons, which include lists of affixes, abbreviations, stop words, synonyms, compound words and other rules. The second customization replaced the tool's Porter stemmer with an Amharic stemmer. Each system was tested by generating 90 summaries, one for each news article at each of the 10%, 20% and 30% extraction rates. The performance of the two systems was evaluated using subjective and objective evaluation. Subjective evaluation was done for 45 summaries extracted in experiment one, and a good result was obtained.
    Objective evaluation was done for all the summaries generated in both experiments by comparing them with an ideal manual summary using the F-measure. The highest score in the first experiment is 75.65%, at the 30% extraction rate for medium-sized articles, with a corpus average score of 66.23%; in the second experiment it is 72.83%, at the 30% extraction rate for large news articles, with a corpus average score of 72.37%. The system with the Amharic stemmer performed better than the other regardless of the size of the original article at a given extraction rate, with a better corpus average score at 20% and 30%. The system also showed a regular improvement in performance as the extraction rate increases.
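    The objective evaluation compares each system extract with an ideal manual summary using the F-measure. A common way to do this for extractive summaries, sketched below in Python, is to treat both summaries as sets of sentence indices and take the harmonic mean of precision and recall. Whether the thesis matched whole sentences in exactly this way is an assumption; the function name and sample indices are illustrative.

```python
def summary_f_measure(system_sentences, ideal_sentences):
    """Precision, recall and F-measure of a system extract against an ideal extract.

    Both arguments are collections of sentence indices from the source article.
    """
    system, ideal = set(system_sentences), set(ideal_sentences)
    overlap = len(system & ideal)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(ideal) if ideal else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # The system picked sentences 1, 2, 3; the human summary contains 2, 3, 4, 5.
    print(summary_f_measure([1, 2, 3], [2, 3, 4, 5]))
```

    When the system and ideal extracts contain the same number of sentences, precision, recall and F-measure coincide, so the score is simply the share of ideal sentences that the system recovered.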
  • College of Natural and Computational Sciences School of Information Science
    (Addis Ababa University, 2017-10-04) Getaneh, Desalegn; Yifiru, Martha (PhD)
    Ethiopia is one of the most populous countries in Africa, with a fertility rate of 4.6 and an unwanted pregnancy rate of eight percent. This contributes to maternal mortality and child death. Family planning plays an essential role in ensuring that every child is wanted, through delaying, spacing or limiting births. However, the contraceptive prevalence rate of Ethiopia is low, and hard work is needed to meet the demand for family planning by providing women with a choice of contraceptive methods. Socio-demographic characteristics contribute to the choice of contraceptive method, but identifying which variables determine the choice is a challenge. Moreover, clients who receive a contraceptive method might not get the method of their choice. As a result, there is a 13 percent discontinuation rate even though service providers use different job aids to support the choice of contraceptive method. This research therefore focuses on identifying the key variables that determine the choice of contraceptive method by applying data mining techniques, and on developing a knowledge-based system that supports health service providers in the choice of contraceptive methods. An empirical research design that combines experimental and non-experimental research is applied to achieve this objective. A prototyping approach is followed to develop the knowledge-based system. As research methods, knowledge engineering and a hybrid data mining methodology were employed. Interviews and document analysis were conducted to acquire knowledge from domain experts and documents respectively. The J48 decision tree algorithm was used to predict the variables that determine the choice of contraceptive method. Client age, number of children, education, residence, marital status, religion, region and contraceptive history were found to be the key variables in the choice of contraceptive method.
    In addition, the medical eligibility criteria and the lifestyle of a woman are factors in the choice of contraceptive method. A prototype knowledge-based system was developed that determines the choice of contraceptive method by integrating data mining results (socio-demographic variables), medical eligibility criteria (explicit knowledge) and the lifestyle of a woman (tacit knowledge). Based on system performance evaluation and a user acceptance test, the system scored 86.6% accuracy and 76% acceptance respectively, which indicates that integrating socio-demographic data, medical eligibility criteria and the lifestyle of a woman is possible and can be implemented in the domain area. Finally, further exploration is needed to refine the knowledge base and to extend the benefits of the knowledge-based system to women who have special characteristics.
  • Designing Amharic Definitive Question Answering
    (Addis Ababa University, 2013-06) Teshome, Wondwossen; Yifiru, Martha (PhD)
    The amount of available information is becoming very large, especially with the proliferation of the Web. The problem faced by users is not a lack of documents or information but a lack of time to find a short and precise answer among the variety of available documents. Search engines offer many links to web pages but are not able to provide an exact answer; instead they return documents ranked by their relevance to the user's query. Thus, a new need has emerged: the possibility of obtaining a brief and concise answer, which is the main goal of question answering (QA) systems. Though there are studies on developing question answering systems for factoid questions, no research has been conducted on a definition question answering system for Amharic, so we could not compare our results with any other efforts on the topic. In this study, an attempt is made to design an Amharic question answering system for definition questions. Definition QA systems in other languages have been extensively researched and have shown reasonable outcomes. The proposed approach deals with Amharic definition questions by applying the surface text pattern method. This method has two main steps. First, it discovers a set of definition-related text patterns from the Amharic legal corpus. Then, using these patterns, it extracts a collection of concept-description pairs from a target document file and applies definition extraction to return an answer to a given question. The research achieved a nugget precision of 85.6%, a nugget recall of 73% and an F-measure of 78.8%. The use of surface patterns is effective for answering Amharic definition questions. Extracting the definiendum from the user's question and extracting concept-descriptions from corpora are the major challenges of this study. In future work, sequence mining algorithms can be tried for extracting concept-description relationships from the corpus.
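    The surface text pattern method described above can be illustrated with a small Python sketch. It uses two hand-written English patterns as stand-ins; the thesis worked with patterns discovered from an Amharic legal corpus, which are not given in the abstract, so the patterns, function names and sample text here are all hypothetical.

```python
import re

# Hypothetical definition patterns (English stand-ins for the Amharic ones).
DEFINITION_PATTERNS = [
    re.compile(r"(?:An? |The )?(?P<term>[\w ]+?) (?:is|are) defined as (?P<desc>[^.]+)\."),
    re.compile(r'"(?P<term>[^"]+)" means (?P<desc>[^.]+)\.'),
]

# Hypothetical question template used to pull out the definiendum.
QUESTION_PATTERN = re.compile(r"what is (?:an? |the )?(?P<term>.+?)\??$", re.IGNORECASE)


def extract_definitions(text):
    """Step 2 of the method: collect concept-description pairs from a document."""
    pairs = {}
    for pattern in DEFINITION_PATTERNS:
        for match in pattern.finditer(text):
            term = match.group("term").strip().lower()
            pairs.setdefault(term, match.group("desc").strip())
    return pairs


def answer(question, pairs):
    """Extract the definiendum from the question and look up its description."""
    match = QUESTION_PATTERN.match(question.strip())
    if match is None:
        return None
    return pairs.get(match.group("term").strip().lower())


if __name__ == "__main__":
    document = ('A contract is defined as an agreement enforceable by law. '
                '"Minor" means a person under eighteen years of age.')
    knowledge = extract_definitions(document)
    print(answer("What is a contract?", knowledge))
```

    The two challenges named in the abstract map onto the two functions: `answer` has to isolate the definiendum from free-form questions, and `extract_definitions` is only as good as the patterns it is given.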
  • Developing Mobile Application for Public Health Emergency Management System for Ethiopia Public Health Institute
    (Addis Ababa University, 2015-07) Sebsibe, Esubalew; Yifiru, Martha (PhD); Sime, Assefa (PhD)
    Introduction: Public Health Emergency Management (PHEM) is one of the core business processes in the Ethiopia Public Health Institute. Its main function is to collect timely information about the occurrence of disease outbreaks throughout the country. This information helps the responsible organizations take timely action if the number of cases is above the expected threshold level. The major problems in PHEM are the lack of adequate communication media among the responsible stakeholders and the difficulty of getting complete, quality information on time.
    Objectives: The objective of this project is to analyze, design, and develop a prototype mobile application for the Public Health Emergency Management system of the Ethiopia Public Health Institute.
    Methods: The relevant data and requirement collection instruments were used for this study, and interviews were conducted with respondents selected using purposive sampling. In addition, national PHEM guidelines and other related documents were reviewed alongside the interviews and observation. This helped us determine the requirements of the new system. The software development life cycle methodology with a waterfall approach was used to develop the prototype mobile application.
    Analysis and Design of the System: The analysis and design models included use cases to describe the basic functions of the information system; use case descriptions to detail the activities and functions in the early warning and surveillance sub-process and other health-related services; data flow diagrams to depict the actual flow of data in the system; and activity diagrams to show business processes and workflow. The analysis and design model was finalized by identifying the relevant analysis classes, their attributes and their operations for the new system; architectural design, entity relationship diagrams and user interface diagrams were used to elicit the parts of the system.
    Conclusion: Timely collection, organization and reporting of disease outbreak data are the main challenges in the public health emergency management system. Meanwhile, the rapid growth of mobile phones has been contributing a lot to people's daily activities and to organizational business. We therefore concluded that mobile phones and mobile applications could play a vital role in PHEM by enabling timely collection, organization and summarization of data for evidence-based decision making, and a prototype mobile application was developed to mitigate the problems mentioned above.
    Recommendation: We recommend that the responsible scholars develop a complete mobile application based on the analysis, design and prototype produced in this study, conduct summative usability testing, and hand over all the necessary documentation to the responsible government organization for the sustainability and future use of the system.
  • Developing Tigrinya Speech Recognizer Using Amharic and Tigrinya Data
    (Addis Ababa University, 2015-03) Deressa, Dionasios; Yifiru, Martha (PhD)
    This study introduces the design of a Hidden Markov Model based large vocabulary continuous speech recognition (LVCSR) system in a new target language based on a different source language and without the need for a large speech database in the target language. The Tigrinya LVCSR system was developed using an Amharic corpus of 10,850 sentences and a limited Tigrinya data set of 600 sentences to train the acoustic models. The study followed the knowledge-based approach, assuming that the articulatory representations of phonemes are similar across Tigrinya and Amharic with the exception of 2 phonemes unique to Tigrinya, and using all the phonemes as acoustic units. A total of six experiments were performed using different parameters, each in an effort to increase the performance of the recognizer. Out of the five experiments, the best result was obtained in the experiment that trained the seed model with the 10,850 Amharic sentences up to the 8th iteration and used the 600 Tigrinya sentences from the 8th iteration of the training process onward. This experiment yielded 88.33% correctly recognized words with an accuracy of 73.43%. The baseline Tigrinya recognizer, trained using only the 600 Tigrinya sentences, yielded 80.80% correctly recognized words and an accuracy of 67.39% on the tri-phone model with 12 Gaussian mixtures. Comparing the baseline with the best result shows an increase of about 8 percentage points in correctly recognized words and of about 6 percentage points in accuracy. This indicates that using Amharic data together with limited Tigrinya data for training a Tigrinya recognizer results in a significant performance increase, and that it is a promising research direction provided that different methods are applied to achieve further improvements.
    As this is the first attempt, other phone mapping techniques and approaches, such as the data-driven approach, can also be tried to improve performance.
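    The abstract reports two figures per system, "correctly recognized words" and "accuracy", without defining them. A widely used convention, for example in the HTK toolkit's scoring output, computes both from the aligned counts of deletions, substitutions and insertions; the Python sketch below assumes that convention, which the abstract itself does not confirm. The function name and the counts in the demo are made up.

```python
def word_scores(n_ref, deletions, substitutions, insertions):
    """Percent correct and word accuracy under the common D/S/I convention.

    percent correct = (N - D - S) / N * 100
    accuracy        = (N - D - S - I) / N * 100
    where N is the number of words in the reference transcription.
    """
    if n_ref <= 0:
        raise ValueError("reference must contain at least one word")
    hits = n_ref - deletions - substitutions
    percent_correct = 100.0 * hits / n_ref
    accuracy = 100.0 * (hits - insertions) / n_ref
    return percent_correct, accuracy


if __name__ == "__main__":
    # Made-up counts: 100 reference words, 5 deleted, 10 substituted, 8 inserted.
    print(word_scores(100, 5, 10, 8))
```

    Under these definitions the gap between the two figures is exactly the insertion rate, so a system with 88.33% correct and 73.43% accuracy would be inserting extra words at a rate of roughly 15% of the reference length.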
  • Impact of Knowledge Management System on Customer Service Employees Performance the Case of Ethio Telecom
    (Addis Ababa University, 2021-12-02) Mesele, Meklit; Yifiru, Martha (PhD)
    This research aimed to identify the impact of a knowledge management system (KMS) on the performance of call center employees in the ethio telecom customer service department. The researcher used a quantitative research design to explain the relationships among variables, that is, to show the impact of four independent variables (knowledge acquisition, knowledge storage, knowledge sharing and KMS application) on three dependent variables that are indicators of employee performance (innovational performance, operational performance and quality performance). Based on their use of the knowledge management system, ethio telecom call center employees working at the 994 contact center were randomly selected from two clusters, and questionnaires were administered to collect data from the 316 sample respondents. Structural equation modeling (SEM) and confirmatory factor analysis (CFA) were conducted to analyze the relationship and impact of KMS on employee performance at ethio telecom. The findings show that three of the independent variables (knowledge acquisition, knowledge sharing and KMS application) had a positive and significant impact on quality performance, which means that an organization can increase its quality performance by giving more attention to knowledge acquisition, knowledge sharing and KMS application. KMS application and knowledge acquisition had an insignificant impact on innovational performance. Knowledge acquisition had a positive and significant impact on operational performance, so to increase operational performance an organization should give more prominence to knowledge acquisition. The relationship of knowledge sharing, knowledge storage and KMS application with operational performance is insignificant.
  • Predicting Under Nutrition Status of Under-Five Children Using Data Mining Techniques: The Case of 2011 Ethiopian Demographic and Health Survey
    (Addis Ababa University, 2013-06) Markos, Zenebe; Yifiru, Martha (PhD)
    Background: Undernutrition is one of the leading causes of morbidity and mortality in children under the age of five in most developing countries, including Ethiopia.
    Objective: The general objective of this study was to design a model that predicts the nutritional status of under-five children using data mining techniques.
    Methodology: This study followed a hybrid Knowledge Discovery Process methodology to build a predictive model using data mining techniques, and used secondary data from the 2011 Ethiopia Demographic and Health Survey dataset. The hybrid process model was selected because it combines the best features of the Cross-Industry Standard Process for Data Mining and the Knowledge Discovery in Databases methodology, and identifies and describes several explicit feedback loops that are helpful in attaining the research objectives. WEKA 3.6.8 and techniques such as the J48 decision tree, Naïve Bayes and PART rule induction classifiers were used to address the research problem.
    Result: The predictive model developed using PART pruned rule induction was found to be the best performing, with 92.6% accuracy and a 97.8% WROC area. Promising results were achieved from the rules regarding nutritional status prediction.
    Conclusion: The results from this study were encouraging and confirmed that applying data mining techniques could indeed support building a model that predicts the nutritional status of under-five children in Ethiopia. In the future, integrating large demographic and health survey datasets with clinical datasets and employing other classification algorithms, tools and techniques could yield better results.
    Keywords: Predictive modeling, Nutritional status, children, Data mining, EDHS dataset
  • No Thumbnail Available
    Item
    Query-based Automatic Summarizer for Afaan Oromo Text
    (Addis Ababa University, 2015-03) Bayisa, Asefa; Yifiru, Martha (PhD)
    Text summarization is one of the most challenging tasks in information retrieval. It arose from the explosion of electronic documents and can be seen as the condensation of a document collection. Automatic text summarization can be generic or query-specific. In query-focused (query-oriented) summarization, a query is provided to the summarizer in addition to the source documents, and the summarizer is expected to construct a summary that contains the information requested by the query. A document retrieval system combined with a query-oriented summarization system is potentially a very powerful combination, which might be much more effective than a document retrieval system alone. This thesis therefore focused on the possibility of developing a query-based, single-document, extractive summarization system. Two methods that create text summaries by extracting and ranking sentences from the original documents are proposed. The first is based on the vector space model (VSM), the most commonly used IR model, and finds the sentences most relevant to the user's query. The second is the position method, which is used along with VSM in an attempt to improve the quality of the summary. The sentence ranking algorithm uses the sentence scores to rank sentences in order of importance, and the summary is produced by selecting the top N sentences, where the value of N is set by the user. Experiments were conducted using 40 Afaan Oromo news items contained in the corpus. Three language experts from the language department of the Oromia Culture and Tourism Bureau produced manual summaries, which served as the ideal summaries. An intrinsic evaluation technique, involving both objective and subjective evaluation, was used. The objective evaluation measures the performance of the system using standard Information Retrieval (IR) evaluation metrics (Precision, Recall and F-measure), and the subjective evaluation assesses linguistic quality, such as informativeness and coherence, using scores on a five-point scale assigned by human evaluators. The results showed that the proposed system registered F-measures of 82%, 78% and 82% at summary extraction rates of 10%, 20%, and 30% respectively when VSM is used along with the position method. Moreover, the informativeness and coherence of the system's summaries reached their best performance, with average scores of 59%, 77% and 91% on the five-point scale at extraction rates of 10%, 20%, and 30% respectively, when both methods were used together. Challenges in the study included the lack of query expansion tools, which would help to obtain more clues for finding important sentences, and the fact that some final summaries contain unresolved references that may make them difficult to understand. These are future research directions that would contribute to improving the proposed system.
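The ranking procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the raw term-frequency vectors, the `1 / (index + 1)` position score, the `position_weight` parameter, the function names, and the English example sentences are all assumptions made for the sketch. The abstract does not state how the two scores were combined.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, query, n, position_weight=0.2):
    """Pick the top-n sentences by query similarity plus a position bonus."""
    query_vec = Counter(tokenize(query))
    scored = []
    for index, sentence in enumerate(sentences):
        similarity = cosine(Counter(tokenize(sentence)), query_vec)
        position = 1.0 / (index + 1)  # assumed: earlier sentences score higher
        scored.append((similarity + position_weight * position, index))
    top = sorted(scored, reverse=True)[:n]
    # Return the selected sentences in their original document order.
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]


def f_measure(system, reference):
    """Sentence-level precision, recall and F-measure against an ideal summary."""
    overlap = len(set(system) & set(reference))
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example document and query.
sentences = [
    "The regional council met on Monday.",
    "Farmers reported a strong coffee harvest this year.",
    "Coffee exports are expected to rise.",
    "The weather was mild.",
]
summary = summarize(sentences, "coffee harvest", n=2)
print(summary)
print(f_measure(summary, [sentences[1], sentences[3]]))
```

Here `n` plays the role of the user-set N in the abstract, and the second argument to `f_measure` stands in for a manually produced ideal summary.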

Addis Ababa University © 2023