Automatic Amharic Factual Question Generation from Historic Text Using Rule Based Approach
Date
2021-06-28
Publisher
Addis Ababa University
Abstract
Nowadays, owing to the availability of digital devices, important educational materials have become available in a variety of languages. However, these texts lack a sufficient number of practice questions and assessments. Manually preparing meaningful and relevant questions from such materials is a time-consuming and difficult task that requires expertise, experience, and resources. This research addresses the problem by automatically generating factual questions from Amharic text. The automatic Amharic factual question generation system developed in this research takes a historical text as input and produces a set of possible questions as output. Historical texts contain many named entities, such as the names of persons, locations, cities, and countries, as well as dates, which help to generate many questions.
The methodology used in this study is design science, which comprises six main activities: problem identification and motivation, definition of objectives, design and development, demonstration, evaluation, and communication. The research used a Part of Speech (PoS) tagger and Named Entity Recognition (NER). The PoS tagger aids in the development of the NER component, and NER is used to identify answer keywords and generate probable question phrases. In addition, informative sentences are selected from the text based on the NER output and a set of rules. Transformation rules are then used to construct questions from the selected sentences. A prototype was developed in Python, and the question generation system was assessed through human evaluation.
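The sentence selection and transformation steps described above can be illustrated with a minimal sketch. It is not the prototype developed in this research: the tag names (`PERSON`, `LOCATION`, `DATE`, `NUMBER`), the function names, and the example sentence with its hand-assigned tags are all assumptions made for illustration. It also assumes single-token entities and a single rule, namely that the entity is replaced in place by the question word for its type.

```python
# Hypothetical mapping from NER tag to Amharic question word.
# The question words come from the abstract; the tag names are assumed.
QUESTION_WORDS = {
    "PERSON": "ማን",     # who
    "DATE": "መቼ",       # when
    "LOCATION": "የት",   # where
    "NUMBER": "ስንት",    # how much/many
}

SENTENCE_END = {"።", "."}


def select_informative(sentences):
    """Keep sentences containing at least one entity we can ask about."""
    return [
        sent for sent in sentences
        if any(tag in QUESTION_WORDS for _, tag in sent)
    ]


def generate_questions(tagged_sentence):
    """Return (question, answer) pairs, one per recognized entity.

    Each question is formed by replacing one entity token with the
    question word for its type and ending the sentence with "?".
    """
    results = []
    for i, (token, tag) in enumerate(tagged_sentence):
        if tag not in QUESTION_WORDS:
            continue
        words = [t for t, _ in tagged_sentence]
        words[i] = QUESTION_WORDS[tag]
        # Drop the sentence-final full stop before appending "?".
        if words and words[-1] in SENTENCE_END:
            words = words[:-1]
        results.append((" ".join(words) + "?", token))
    return results


if __name__ == "__main__":
    # Illustrative sentence ("Tewodros died at Magdala."), tagged by hand.
    sentence = [
        ("ቴዎድሮስ", "PERSON"),
        ("በመቅደላ", "LOCATION"),
        ("ሞቱ", "O"),
        ("።", "O"),
    ]
    for sent in select_informative([sentence]):
        for question, answer in generate_questions(sent):
            print(question, "->", answer)
```

A real rule set would also need to handle multi-token entities, case markers and prepositions attached to the entity, and verb agreement, which this sketch ignores.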
The experimental results showed 86.4% accuracy for the PoS tagger, 82.0% accuracy for NER, and 95.3% accuracy for relevant sentence selection. By question type, the system achieved 94.1% accuracy for “ስንት” (how much/many), 91.6% for “ማን” (who), 83.3% for “መቼ” (when), and 73.0% for “የት” (where). The overall question generation system achieved 84.6% accuracy. This shows that the system has high accuracy for the question type “ስንት” (how much/many) and needs improvement for the question type “የት” (where).
The system gives good results for some question types. Accordingly, it is concluded that the system achieves good accuracy when the domain-specific datasets have good coverage and when more rules are defined by adding more word classes.
For future work, the following are recommended: forming new rules and improving the existing ones by adding more word classes, handling exceptions, preparing more domain-specific training datasets, and developing a common architecture and common evaluation techniques for automatic question generation.
Keywords
Automatic Question Generation, Factual Questions, Natural Language Generation