Selection of Data Mining Algorithm for Masked Feature Network Intrusion Detection on Real World Data With Missing Value: The Case of Ethiopian Institutes of Agricultural Research






Addis Ababa University


Intrusion detection has become a critical component of network administration because of the vast number of attacks persistently threatening computer systems. As network attacks have increased in number and severity over the past few years, intrusion detection systems (IDS) have become increasingly important for securing networks. Because of the large volume of security audit data and the complex, dynamic nature of intrusion behavior, optimizing IDS performance is an important open problem that is receiving growing attention from the research community. Recently there has been much interest in applying data mining to computer network intrusion detection, and many methods have been developed to secure computer networks and communication over the Internet. However, none of the existing methods developed by different researchers detects attacks with both a high detection rate and a low false alarm rate. Moreover, intruders can cheat the system by masking some of their features when they attack. In addition, most methods rely on a single detection approach with a large number of features, which is challenging and time-consuming to implement. This thesis addresses these problems for the Ethiopian Institute of Agricultural Research (EIAR) using an intrusion detection system architecture based on the semi-supervised collective classification algorithm meta.Filtered Collective Classifier, which can promptly detect and classify attacks, whether they are known or never seen before, even when they mask some of their features; masked features are represented by a dataset with missing values. The dataset used in this study was taken from the EIAR data center and then preprocessed. Preprocessing consisted of removing outliers and resolving inconsistencies.
The dataset initially had 25192 records; after preprocessing it was reduced to 28 attributes and 12596 records, labeled as Normal, DOS, U2R, Probe and R2L. For supervised modeling, 6965 records were used. To build a predictive model for intrusion detection, the semi-supervised collective classification algorithm meta.Filtered Collective Classifier and the ordinary J48 decision tree algorithm were tested as classification approaches, using an unlabeled class with two datasets: one with missing values and one with no missing values. The model created using the semi-supervised meta.Filtered Collective Classifier with a full training/test set showed the best classification accuracy, 96.2%, on the first dataset (with missing values), while the ordinary J48 tree with its default 10-fold cross-validation performed better, at 100% accuracy, on the second dataset (with no missing values), in classifying new instances into the Normal, DOS, U2R, Probe and R2L classes. The findings of this study show that data mining methods generate interesting rules that are crucial for intrusion detection in the networking industry. Future research directions are proposed toward an applicable system in the area of the study.
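As an illustration only, the two ideas the abstract relies on (masking some features of a record as missing values, and scoring a classifier by detection rate and false alarm rate) can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `mask_features` and `detection_and_false_alarm_rate`, the use of `None` for a missing value, and the masking fraction are all hypothetical, and every non-Normal class is treated as an attack.

```python
import random

# Class labels named in the abstract.
CLASSES = ["Normal", "DOS", "U2R", "Probe", "R2L"]


def mask_features(record, fraction, rng):
    """Return a copy of `record` with a fraction of its values set to None.

    Hypothetical helper: None stands in for a missing (masked) feature value.
    """
    n_mask = int(len(record) * fraction)
    masked_idx = set(rng.sample(range(len(record)), n_mask))
    return [None if i in masked_idx else v for i, v in enumerate(record)]


def detection_and_false_alarm_rate(true_labels, predicted_labels, normal="Normal"):
    """Compute (detection rate, false alarm rate) for attack-vs-normal decisions.

    Detection rate   = attacks flagged as attacks / all attacks.
    False alarm rate = normal records flagged as attacks / all normal records.
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        is_attack = truth != normal
        flagged = pred != normal
        if is_attack and flagged:
            tp += 1
        elif is_attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


if __name__ == "__main__":
    rng = random.Random(0)
    record = list(range(28))  # a made-up record with 28 attributes
    print(mask_features(record, 0.25, rng))

    truth = ["Normal", "DOS", "Probe", "Normal", "R2L"]
    pred = ["Normal", "DOS", "Normal", "DOS", "R2L"]
    print(detection_and_false_alarm_rate(truth, pred))
```

A good detector in the sense used above would push the first returned value toward 1 and the second toward 0 at the same time.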



Intrusion Detection, Data Mining, Semi-Supervised Learning, Collective Classifier, Missing Value Dataset, Masked Feature Intrusion Detection