Ethio-Semitic Proto-Language Reconstruction with In-Context Learning and LSTM Encode-Decode Model

dc.contributor.advisor: Fitsum Assamnew
dc.contributor.author: Elleni Sisay
dc.date.accessioned: 2026-04-24T13:27:49Z
dc.date.available: 2026-04-24T13:27:49Z
dc.date.issued: 2024-12
dc.description.abstract: As languages evolve, words acquire new meanings and lose old ones, making the reconstruction of ancestral forms a critical area of study. Proto-Ethio-Semitic languages, in particular, remain underexplored despite their cultural and historical significance. This research investigates Historical Language Reconstruction (HLR) for Proto-Ethio-Semitic languages at the word level, focusing on two core objectives: cognate identification and proto-word reconstruction. A three-way dictionary was used to compile a dataset of 14,100 semantically related words from Amharic, Ge’ez, and Tigrinya. Linguists manually identified a gold-standard dataset of 74 cognate pairs from Swadesh-list concepts translated into the three languages and reconstructed their proto-forms, while automated methods (SCA and LexStat) extracted an additional 1,847 cognates from the dataset, significantly increasing its scale. Building on these results, synthetic proto-forms were generated using in-context learning with GPT-4o, which achieved a reconstruction accuracy of 85% when evaluated against the gold-standard data. Furthermore, an LSTM-based encode-decode model was trained on the generated data to predict proto-forms from cognates, achieving a prediction accuracy of 91% and an average edit distance of 0.21. This work establishes a foundation for reconstructing ancestral languages within the Afro-Semitic family by integrating linguistic expertise, automated cognate extraction tools, and state-of-the-art large language models. The findings underscore the potential of interdisciplinary approaches for preserving and understanding linguistic heritage, with implications for future studies in historical linguistics and language preservation.
dc.identifier.uri: https://etd.aau.edu.et/handle/123456789/8078
dc.language.iso: en_US
dc.publisher: Addis Ababa University
dc.subject: Cognates
dc.subject: Proto-word
dc.subject: In-context learning
dc.subject: GPT 4o
dc.subject: LSTM-based encode-decode
dc.title: Ethio-Semitic Proto-Language Reconstruction with In-Context Learning and LSTM Encode-Decode Model
dc.type: Thesis
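
The abstract reports two evaluation figures for predicted proto-forms: an accuracy and an average edit distance. The record does not say how either was computed, so the following is only a minimal sketch under stated assumptions: accuracy is taken to be the exact-match rate, and the edit distance is taken to be Levenshtein distance normalized by the length of the longer string (an average of 0.21 suggests some normalization, but this is a guess). The function names are hypothetical and the strings in the usage note are placeholders, not forms from the thesis.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, gold: str) -> float:
    """Levenshtein distance divided by the longer length (assumed normalization)."""
    longest = max(len(pred), len(gold))
    return levenshtein(pred, gold) / longest if longest else 0.0


def evaluate(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (exact-match accuracy, mean normalized edit distance)
    over (predicted, gold) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, gold) pair")
    exact = sum(pred == gold for pred, gold in pairs)
    total = sum(normalized_edit_distance(pred, gold) for pred, gold in pairs)
    return exact / len(pairs), total / len(pairs)
```

For example, `evaluate([("abc", "abc"), ("abd", "abc")])` scores one exact match out of two placeholder pairs, and the second pair differs from its target by one substitution over three characters.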

Files

Original bundle
Name:
Elleni Sisay.pdf
Size:
542.59 KB
Format:
Adobe Portable Document Format