

Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/1188

Title: OPTICAL CHARACTER RECOGNITION OF AMHARIC TEXT: AN INTEGRATED APPROACH
Authors: Yaregal, Assabie
Advisors: Ato Million Meshesha; Ato Dereje Teferi
Copyright: 2002
Date Added: 23-May-2008
Publisher: Addis Ababa University
Abstract:
CHAPTER ONE: INTRODUCTION
  1.1. Background
  1.2. Statement of the Problem
  1.3. Justification of the Study
  1.4. Objectives of the Study
    1.4.1. General Objective
    1.4.2. Specific Objectives
  1.5. Methods
    1.5.1. Review of Literature
    1.5.2. Development and/or Adoption of Pattern Extraction Algorithms
    1.5.3. Design and Development of Character Database
    1.5.4. Neural Network Model Building
    1.5.5. Testing
  1.6. Scope and Limitation of the Study
  1.7. Organization of the Study
CHAPTER TWO: REVIEW OF OCR SYSTEM
  2.1. Introduction
  2.2. Basic Methods of Preprocessing
    2.2.1. Digitization
    2.2.2. Segmentation
    2.2.3. Thinning
    2.2.4. Size Normalization
    2.2.5. Slant correction
  2.3. Pattern Recognition Techniques
    2.3.1. Template Matching Approach
    2.3.2. Statistical Approach
    2.3.3. The Syntactic or Structural Approach
    2.3.4. The Neural Network Approach
      2.3.4.1. The Biological Neural System
      2.3.4.2. From Biological Neural Network to ANN
      2.3.4.3. Why Neural Networks are Used?
      2.3.4.4. Network Layers
      2.3.4.5. Architectures of Neural Networks
      2.3.4.6. The Learning Process
      2.3.4.7. Transfer Functions
      2.3.4.8. The Back-Propagation Algorithm
      2.3.4.9. Neural Network Parameters
      2.3.4.10. Neural Network Statistical Data
    2.3.5. Hybrid Approach
CHAPTER THREE: THE AMHARIC WRITING SYSTEM
  3.1. Introduction
  3.2. The Amharic Writing System
    3.2.1. The Amharic Characters (Fidel)
    3.2.2. Nature of Amharic Characters
  3.3. The Amharic Character OCR Developments
    3.3.1. Segmentation
    3.3.2. Image Restoration
    3.3.3. Underline Detection and Removal
    3.3.4. Thinning
    3.3.5. Size Normalization
    3.3.6. Feature Extraction
    3.3.7. The Neural Network Approach
CHAPTER FOUR: DESIGN AND DEVELOPMENT
  4.1. Introduction
  4.2. Primitive Structures in Amharic Characters
    4.2.1. The Vertical Line Primitives
    4.2.2. The Appendage Primitives
    4.2.3. The Backslash Primitive
    4.2.4. The Forward Slash primitives
  4.3. Primitive Relationship Handling and Pattern Generation
  4.4. Neural Network Approach
  4.5. General Design of Amharic OCR System
  4.6. Preprocessing
    4.6.1. Digitization
    4.6.2. Segmentation
    4.6.3. Identification of Character Boundary
  4.7. Primitive Extraction
  4.8. Pattern Generation
    4.8.1. Selection of the Root Primitive
    4.8.2. Construction of Primitive Tree
    4.8.3. Generation of Patterns of Primitives
  4.9. Training Patterns with Neural Network
  4.10. Recognition
  4.11. An Improved Primitive Extraction Algorithm
  4.12. The EthiopicOCR Prototype
CHAPTER FIVE: TESTING AND EVALUATION
  5.1. Testing
  5.2. Evaluation
    5.2.1. General Error Analysis
    5.2.2. Analysis of Results for VG2000 Agazian Font Size 12
    5.2.3. Analysis of Results for VG2000 Agazian Font Size 8
    5.2.4. Analysis of Results for VG2000 Agazian Font Size 14
    5.2.5. General Discussion of the Results
CHAPTER SIX: CONCLUSION AND RECOMMENDATION
  5.1. Conclusion
  5.2. Recommendation
REFERENCES
APPENDICES
  Appendix I. The Amharic Character Set (Bender et al., 1976)
  Appendix II. The Amharic Characters Included in the Training Set
  Appendix III. The Amharic Document Used for Test Case
  Appendix IV. Result of the Test Case with Font of Size 8
  Appendix V. Result of the Test Case with Font of Size 12
  Appendix VI. Result of the Test Case with Font of Size 14
  Appendix VII. The Source Code of the Experimentation
Description: A thesis submitted in partial fulfilment of the requirements for the degree of Master of Science in Information Science
URI: http://hdl.handle.net/123456789/1188
Appears in: Thesis - Information Science

Files in This Item:

File: Yaregal Assabie Lake.pdf
Size: 836.67 kB
Format: Adobe PDF

Items in the AAUL Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.

 

Last updated: May 2010. Copyright © Addis Ababa University Libraries