K‐Means Clustering and Random Forest Based Hybrid Intrusion Detection Algorithm
Date
2017-12
Publisher
AAU
Abstract
The rapid growth of computers has transformed the way in which information and data are stored
and transmitted. With this new paradigm of data access comes the threat of this information
being exposed to unauthorized and unintended users. As a result, preserving the integrity,
confidentiality, and availability of data in a network has become one of the most challenging
issues. Many systems have been developed that scrutinize data for deviations from normal
behavior or search for a known signature within the data. These systems are termed Intrusion
Detection Systems (IDSs). IDSs employ different techniques, ranging from statistical methods to
machine learning algorithms.
This paper evaluates the performance of different intrusion detection algorithms using the KDD’99
dataset and explores whether certain algorithms perform better for certain attack classes and,
consequently, whether a multi-expert classifier design can deliver the desired performance. The
algorithms' detection performance is compared using the Detection Rate (DR) and False Alarm
Rate (FAR) evaluation metrics.
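The abstract does not define DR and FAR, so the following is only a minimal sketch under the definitions commonly used in the intrusion detection literature: DR = TP / (TP + FN) and FAR = FP / (FP + TN), where any record not labeled normal counts as an attack. The function name `detection_metrics` and the `normal_label` parameter are hypothetical and are not taken from the paper.

```python
def detection_metrics(y_true, y_pred, normal_label="normal"):
    """Return (detection_rate, false_alarm_rate) for attack-vs-normal labels.

    Assumed definitions (not stated in the abstract):
      DR  = TP / (TP + FN)  -> share of attacks that were flagged
      FAR = FP / (FP + TN)  -> share of normal records wrongly flagged
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        is_attack = truth != normal_label
        flagged = pred != normal_label
        if is_attack and flagged:
            tp += 1
        elif is_attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes to avoid division by zero.
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return dr, far


if __name__ == "__main__":
    # Toy labels for illustration only; these are not KDD'99 results.
    truth = ["dos", "normal", "probe", "normal"]
    preds = ["dos", "dos", "normal", "normal"]
    print(detection_metrics(truth, preds))
```

A per-category DR, as reported later in the abstract, could be obtained by restricting `y_true` and `y_pred` to the records of one attack category plus the normal records.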
The experiments show that the algorithms do in fact have different detection
performance for different attack types and that no single algorithm excels at detecting all attack
types. Based on these evaluation results, the best algorithm for each attack category is chosen and an
optimized hybrid algorithm called the K-Means Clustering and Random Forest Based Hybrid
Intrusion Detection Algorithm (KRHA) is proposed.
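The selection step described above, choosing the best-performing algorithm per attack category, can be sketched as follows. This illustrates the general multi-expert idea only and is not the KRHA procedure itself, which the abstract does not detail. The function `pick_experts`, the algorithm names, and the rates in the usage example are hypothetical placeholders.

```python
def pick_experts(detection_rates):
    """Map each attack category to the algorithm with the highest rate.

    detection_rates: {algorithm_name: {category: detection_rate}}
    Returns: {category: algorithm_name}
    """
    experts = {}
    for algo, per_category in detection_rates.items():
        for category, rate in per_category.items():
            current = experts.get(category)
            # Keep the first algorithm seen on ties.
            if current is None or rate > detection_rates[current][category]:
                experts[category] = algo
    return experts


if __name__ == "__main__":
    # Placeholder values for illustration; not results from the paper.
    rates = {
        "algo_a": {"DoS": 0.90, "R2L": 0.50},
        "algo_b": {"DoS": 0.80, "R2L": 0.70},
    }
    print(pick_experts(rates))
```

In a multi-expert design, a mapping of this kind would decide which classifier's verdict is trusted for each attack category.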
The proposed algorithm classifies DoS, Probe, U2R and R2L attacks with 99.12%, 99.06%,
89.79% and 78.63% accuracy, respectively. This is an improvement over the individual algorithms:
Fuzzy Logic, which has a high detection rate for Probe (98.51%); Random Forest for U2R (85.6%);
and K-means clustering for R2L (72.04%).
Keywords
Intrusion Detection System, Data Mining, Machine Learning, Anomaly, Misuse, Clustering, Classification, KDD’99 Dataset, Hybrid, Detection Rate