The Automatic Extraction of Bibliographic Information from Locally Published Journals in Ethiopia: A Feasibility of OCR
Date
2000-05
Publisher
Addis Ababa University
Abstract
Research and development communities use journals as a mechanism of communication
among themselves. As the volume of research output increases from time to time, however, it becomes
impossible to access each and every report that appears in journals. Therefore, journal
articles have to be indexed to facilitate access and control. The activity of indexing has to be
systematic, so that research outputs remain accessible to the scientific community. To
achieve this lofty goal, indexing has to be done on a regional/national basis to serve as part of
the universal bibliographic control of journals.

For document analysis, two levels of segmentation are used. The first level segmentation
divides an input text into four zones (a first text zone consisting of journal title, volume, issue
number, year and page range; article title; author(s); and author abstract), using white line
spacing as the end of a text zone. The second level segmentation decomposes the contents of the
first text zone into journal title, volume, issue number, year and page range. The
results of the two-level segmentation algorithms are then considered for field classification
(document understanding). Classification of fields is based on geometric and non-geometric
features. The geometric feature, zone order, is used to label article title, author(s)
and author abstract. On the other hand, the non-geometric features (different punctuation
marks such as comma, colon and braces) serve to label the fields in the first text zone
as journal title, volume, issue number, year, and page range. The system is 85.57%
successful in correctly segmenting and labeling bibliographic fields. The recognized fields
are converted to ISO 2709 format to export into Misfortune Windows.
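
The two-level segmentation and field labeling described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the sample page and the citation layout "Title, volume(issue), year: pages" are assumptions chosen only to show how blank lines can delimit zones, how zone order can label the last three zones, and how punctuation can split the first zone.

```python
import re

def split_zones(lines):
    """First level: group text lines into zones; a blank (white) line ends a zone."""
    zones, current = [], []
    for line in lines:
        if line.strip():
            current.append(line.strip())
        elif current:
            zones.append(" ".join(current))
            current = []
    if current:
        zones.append(" ".join(current))
    return zones

# Second level: punctuation (comma, braces, colon) separates the fields of the
# first zone. The layout below is a hypothetical example, not the one used in
# the journals studied in the thesis.
CITATION = re.compile(
    r"^(?P<journal_title>.+?),\s*(?P<volume>\d+)\s*\((?P<issue>\d+)\),\s*"
    r"(?P<year>\d{4}):\s*(?P<pages>\d+\s*-\s*\d+)$"
)

def label_fields(zones):
    """Label fields: punctuation for the first zone, zone order for the rest."""
    if len(zones) != 4:
        raise ValueError("expected four zones, got %d" % len(zones))
    match = CITATION.match(zones[0])
    if match is None:
        raise ValueError("first zone does not follow the assumed layout")
    record = match.groupdict()
    record["article_title"], record["authors"], record["abstract"] = zones[1:]
    return record

# Made-up sample page, used only to exercise the sketch.
page = """Sample Journal of Science, 12(3), 1998: 45-60

A Study of Something

A. Author and B. Writer

This is the abstract.
It spans two lines.
"""

record = label_fields(split_zones(page.splitlines()))
for field, value in record.items():
    print(field, "=", value)
```

A real system would work on OCR output with recognition errors and layout variation, so both the blank-line rule and the punctuation pattern would need tolerance that this sketch omits.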
Keywords
Information Science