K‐Means Clustering and Random Forest Based Hybrid Intrusion Detection Algorithm

Date

2017-12

Publisher

AAU

Abstract

The rapid growth of computers has transformed the way in which information and data are stored and transmitted. With this new paradigm of data access comes the threat of this information being exposed to unauthorized and unintended users. Because of this, the integrity, confidentiality, and availability of data in a network have become the most challenging issues. Many systems have been developed that scrutinize data for deviations from normal behavior or search for a known signature within the data. These systems are termed Intrusion Detection Systems (IDSs). IDSs employ techniques ranging from statistical methods to machine learning algorithms. This paper evaluates the performance of different intrusion detection algorithms on the KDD’99 dataset and explores whether certain algorithms perform better for certain attack classes and, consequently, whether a multi-expert classifier design can deliver the desired performance. The algorithms' detection performance is compared using the Detection Rate (DR) and False Alarm Rate (FAR) evaluation metrics. The experiments show that the algorithms do in fact have different detection performance for different attack types and that no single algorithm excels at detecting all attack types. Based on these evaluation results, the best algorithm for each attack category is chosen, and an optimized hybrid algorithm called K-Means Clustering and Random Forest Based Hybrid Intrusion Detection Algorithm (KRHA) is proposed. The proposed algorithm classifies DoS, Probe, U2R and R2L attacks with 99.12%, 99.06%, 89.79% and 78.63% accuracy, respectively. This is an improvement over Fuzzy Logic, which has a high detection rate for Probe (98.51%), Random Forest for U2R (85.6%), and the K-means clustering algorithm for R2L (72.04%).
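
The abstract names Detection Rate and False Alarm Rate but does not give the formulas used. A minimal sketch follows, assuming the common definitions DR = TP / (TP + FN) and FAR = FP / (FP + TN), with any non-normal label counted as an attack. The function name, the label strings, and the sample data are illustrative and are not taken from the paper.

```python
def detection_and_false_alarm_rate(y_true, y_pred, normal_label="normal"):
    """Return (DR, FAR) for an attack-vs-normal evaluation.

    Assumed definitions (not confirmed by the abstract):
      DR  = TP / (TP + FN)  -> share of attack records that were flagged
      FAR = FP / (FP + TN)  -> share of normal records wrongly flagged
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        is_attack = truth != normal_label   # ground truth is any attack class
        flagged = pred != normal_label      # classifier raised an alarm
        if is_attack and flagged:
            tp += 1
        elif is_attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1

    attacks = tp + fn
    normals = fp + tn
    # Guard against empty classes to avoid division by zero.
    dr = tp / attacks if attacks else 0.0
    far = fp / normals if normals else 0.0
    return dr, far


# Hypothetical labels in the style of the KDD'99 attack categories.
y_true = ["normal", "dos", "probe", "normal", "r2l", "normal"]
y_pred = ["normal", "dos", "normal", "dos", "r2l", "normal"]
dr, far = detection_and_false_alarm_rate(y_true, y_pred)
print(f"DR={dr:.2%} FAR={far:.2%}")
```

Computing the two rates per attack category (for example, by keeping only the DoS and normal records before the call) would give the kind of per-class comparison the abstract describes.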

Description

Keywords

Intrusion Detection System, Data Mining, Machine Learning, Anomaly, Misuse, Clustering, Classification, KDD’99 Dataset, Hybrid, Detection Rate

Citation