Knowledge Graph Construction Based on Ontology from Source Code: The Case of Python
Date
2021-03-29
Publisher
Addis Ababa University
Abstract
Technology companies and online communities have grown rapidly, resulting in the release of millions of software products. Source code holds important information about a software system and its business logic, so a semantically well-linked and organized system for managing code data has become a crucial issue in software engineering. This study presents an automatic method for constructing a knowledge graph from Python source code based on a domain ontology. The resulting graph allows software engineers working in areas such as online communities, open-source development, knowledge management, expert systems, and the semantic web to understand and process code semantically. A supervised Bi-LSTM (bidirectional Long Short-Term Memory) network with a CRF (Conditional Random Field) layer on top was used to extract candidate terms as concepts/entities. The models were defined manually and trained automatically and simultaneously on a labeled corpus. Placing a CRF on top of the Bi-LSTM optimizes the classification of terms in a given piece of source code. Features extracted from the source code were defined in addition to the default CRF features, which helped the model learn constraints for classification. A Bi-LSTM model was then adopted to extract relations (taxonomic and non-taxonomic). Relations among concepts were extracted at both the term level and the code level, and the results were merged using max pooling.
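The merging step can be illustrated with a minimal sketch. The abstract does not say exactly what is pooled, so this assumes each extractor (term level and code level) yields one score per relation label for every concept pair, and that max pooling means taking the element-wise maximum of the two score vectors. The label set, function names, concept names, and scores below are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# Hypothetical label set: the abstract only names taxonomic and
# non-taxonomic relations; "none" is an assumed "no relation" class.
RELATION_LABELS = ["none", "taxonomic", "non_taxonomic"]

Pair = Tuple[str, str]
Scores = Dict[Pair, List[float]]


def merge_by_max_pooling(term_level: Scores, code_level: Scores) -> Scores:
    """Element-wise maximum of the two score vectors for each concept pair."""
    zeros = [0.0] * len(RELATION_LABELS)
    merged: Scores = {}
    for pair in set(term_level) | set(code_level):
        a = term_level.get(pair, zeros)  # pair missing at one level -> zeros
        b = code_level.get(pair, zeros)
        merged[pair] = [max(x, y) for x, y in zip(a, b)]
    return merged


def predict_relations(merged: Scores) -> Dict[Pair, str]:
    """Pick the highest-scoring relation label for each concept pair."""
    result: Dict[Pair, str] = {}
    for pair, scores in merged.items():
        best = max(range(len(scores)), key=scores.__getitem__)
        result[pair] = RELATION_LABELS[best]
    return result


# Made-up scores for two made-up concept pairs.
term_scores = {
    ("ChildParser", "BaseParser"): [0.2, 0.7, 0.1],
    ("BaseParser", "Dataset"): [0.6, 0.1, 0.3],
}
code_scores = {
    ("ChildParser", "BaseParser"): [0.1, 0.9, 0.0],
    ("BaseParser", "Dataset"): [0.3, 0.1, 0.8],
}

merged_scores = merge_by_max_pooling(term_scores, code_scores)
for pair, label in sorted(predict_relations(merged_scores).items()):
    print(pair, label)
```

With this choice, a relation that is strongly supported at either level survives the merge, which is one plausible reason to prefer max pooling over averaging.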
Experiments on the SNIPS-NLU library (a Python library for natural language processing) show the relevance and feasibility of the proposed approach. Evaluation was done in two ways: against a gold-standard ontology developed by an expert, and by expert evaluation. The approach achieved an average F-measure of 77.04 and an average relevance of 81.275 based on expert evaluation. These results suggest that recurrent neural networks are effective and promising for entity and relation extraction from Python and related programming languages.
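For readers unfamiliar with gold-standard evaluation, the sketch below shows how precision, recall, and F-measure are commonly computed when extracted items are compared with a reference set. It assumes exact matching between extracted and gold items, which may differ from the matching criterion used in the thesis. The function name and the example triples are illustrative only and do not reproduce the reported figures.

```python
from typing import Hashable, Set, Tuple


def precision_recall_f1(
    extracted: Set[Hashable], gold: Set[Hashable]
) -> Tuple[float, float, float]:
    """Compare extracted items with a gold standard using exact matching."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up (subject, relation, object) triples for illustration.
extracted_triples = {
    ("ChildParser", "subClassOf", "BaseParser"),
    ("BaseParser", "uses", "Dataset"),
    ("Dataset", "uses", "Loader"),
    ("Loader", "subClassOf", "Dataset"),
}
gold_triples = {
    ("ChildParser", "subClassOf", "BaseParser"),
    ("BaseParser", "uses", "Dataset"),
    ("Dataset", "uses", "Loader"),
    ("Loader", "uses", "Config"),
    ("Config", "subClassOf", "Settings"),
}

p, r, f = precision_recall_f1(extracted_triples, gold_triples)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

An average F-measure such as the one reported would then be the mean of this value over the evaluated categories or runs, depending on how the evaluation was set up.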
Keywords
Knowledge Graph, Knowledge Graph Construction, Ontology, Ontology Learning, Semantic Web, Knowledge-Base