The Automatic Extraction of Bibliographic Information from Locally Published Journals in Ethiopia: A Feasibility of OCR

Date

2000-05

Publisher

Addis Ababa University

Abstract

Research and development communities use journals as a mechanism of communication among themselves. As the size of research output increases from time to time, however, it becomes impossible to access each and every report that appears in journals. Therefore, journal articles have to be indexed to facilitate access and control. The activity of indexing has to be systematic, so that research outputs remain accessible to the scientific community. To achieve this lofty goal, indexing has to be done on a regional or national basis to serve as part of the universal bibliographic control of journals. For document analysis, two levels of segmentation are used. The first-level segmentation divides an input text into four zones: a first text zone (consisting of journal title, volume, issue number, year and page range), the article title, the author(s) and the author abstract. White line spacing is taken as the end of a text zone. The second-level segmentation breaks the contents of the first text zone into journal title, volume, issue number, year and page range. The results of the two-level segmentation algorithms are then considered for field classification (document understanding). Classification of fields is based on geometric and non-geometric features. The geometric feature, zone order, is used to label the article title, author(s) and author abstract. On the other hand, the non-geometric features (different punctuation marks consisting of comma, colon, braces, etc.) serve to label the fields in the first text zone as journal title, volume, issue number, year, and page range. The system is 85.57% successful in correctly segmenting and labeling bibliographic fields. The recognized fields are converted to ISO 2709 format to export into Misfortune Windows.
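The two-level segmentation and field labeling described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes the OCR output is already plain text, that a blank line stands in for white line spacing, and that the first text zone follows a hypothetical layout such as "Journal Title, Vol. 12, No. 3 (1998): 45-60". The names ZONE_LABELS, segment_zones, split_first_zone and label_fields, the sample record, and the journal name in it are all made up for illustration.

```python
import re

# Assumed zone order (the geometric feature): first text zone, article title,
# author(s), author abstract.
ZONE_LABELS = ["first_zone", "article_title", "authors", "abstract"]

# Assumed punctuation layout of the first text zone (non-geometric features):
#   Journal Title, Vol. 12, No. 3 (1998): 45-60
# Commas separate title, volume and issue; parentheses enclose the year;
# a colon introduces the page range. Real journals may differ.
FIRST_ZONE = re.compile(
    r"^(?P<journal_title>[^,]+),\s*"
    r"(?:Vol\.?\s*)?(?P<volume>\d+),\s*"
    r"(?:No\.?\s*)?(?P<issue>\d+)\s*"
    r"\((?P<year>\d{4})\)\s*:\s*"
    r"(?P<pages>\d+\s*-\s*\d+)$"
)


def segment_zones(text):
    """First-level segmentation: split the text into zones at blank lines."""
    zones = re.split(r"\n\s*\n", text.strip())
    # Join wrapped lines inside a zone into a single line of text.
    return [" ".join(zone.split()) for zone in zones]


def split_first_zone(zone):
    """Second-level segmentation: split the first zone using punctuation."""
    match = FIRST_ZONE.match(zone)
    return match.groupdict() if match else None


def label_fields(text):
    """Label fields by zone order, then by punctuation in the first zone."""
    zones = segment_zones(text)
    if len(zones) != len(ZONE_LABELS):
        raise ValueError(f"expected 4 zones, found {len(zones)}")
    record = dict(zip(ZONE_LABELS[1:], zones[1:]))
    first = split_first_zone(zones[0])
    if first is None:
        raise ValueError("first text zone does not match the assumed layout")
    record.update(first)
    return record


# Made-up sample record, used only to exercise the sketch.
SAMPLE = """Journal of Example Studies, Vol. 12, No. 3 (1998): 45-60

A Hypothetical Article Title
Spanning Two Lines

A. Author and B. Author

This is a made-up abstract for illustration."""

record = label_fields(SAMPLE)
for field, value in record.items():
    print(f"{field}: {value}")
```

A rule-based split like this fails as soon as a journal deviates from the assumed punctuation pattern, which is one plausible reason a system of this kind would fall short of fully correct labeling.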

Keywords

Information Science
