User Behavior-Based Insider Threat Detection Using Few Shot Learning
Date
2023-08
Publisher
Addis Ababa University
Abstract
Insider threats are among the most difficult cyber threats to counteract because they originate
from an organization's own trusted employees, who know its organizational structure and
systems, and they often cause the organization significant losses. The problem of insider threat
detection has long been studied in both the security and data mining fields.
Existing studies face challenges due to a lack of labeled datasets, imbalanced classes,
and feature representation. Machine learning approaches depend on manual feature
engineering, which takes time and requires expertise. Deep learning approaches
depend on a large amount of labeled and balanced training data. In insider threat
detection, however, malicious users are heavily outnumbered by normal ones
in both real-world scenarios and the datasets used for insider threat research.
Data on insider threats often includes many features, which makes the data high-dimensional
and makes the relevant features difficult to represent. In this paper, we propose a
novel approach in which a CNN (Convolutional Neural Network) and an LSTM (Long
Short-Term Memory) network act independently on the publicly available insider
threat dataset, CERT (Computer Emergency Response Team) release 4.2, for feature extraction,
followed by a few-shot learning based detector for insider threat detection. We concatenate
features extracted from the dataset using both deep learning models and apply feature selection
to retain the most relevant features. We use the Siamese neural network (SNN), the
'twin' network of few-shot learning, to detect malicious insiders. We run experiments
on three datasets: CNN-extracted features datasets (obtained by feature extraction
using the CNN), LSTM-extracted features datasets (obtained by feature extraction using
the LSTM), and Selected-features datasets (obtained after applying feature concatenation
and selection techniques). We compare our model with baseline models such as RNN,
isolation forest, and XGBoost. The experimental
results show that, in the first experiment, using the CNN-extracted features datasets,
the proposed model performs best with an F1 score of 68%, which is 18 percentage points
better than the isolation forest, the worst performer, and an FNR (False Negative Rate)
value of 0.006. In the second experiment, using the LSTM-extracted features datasets, the
SNN achieves an F1 score of 69%, which is 11 percentage points better than the isolation
forest, the worst performer with an F1 score of 58%, and an FNR value of 0.2. In the last
experiment, using the Selected-features dataset, the proposed model achieves the highest
F1 score, 87%, which is 10 percentage points higher than that of the weakest baseline
model (XGBoost), at 77%. In terms of FNR, the lowest value, 0.11, is obtained with the
CNN-extracted features datasets.
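The F1 score and FNR reported above follow the standard definitions for binary classification. As a minimal sketch, assuming labels are encoded as 1 for a malicious user and 0 for a normal one, they can be computed from the confusion counts as follows. The function name `f1_and_fnr` and the toy labels are hypothetical illustrations, not taken from the thesis or its experiments.

```python
def f1_and_fnr(y_true, y_pred):
    """Return (F1, FNR) for binary labels, assuming 1 = malicious and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # FNR is the fraction of truly malicious users the detector misses.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return f1, fnr


# Hypothetical toy labels for illustration only (not data from the thesis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
f1, fnr = f1_and_fnr(y_true, y_pred)
print(f"F1 = {f1:.2f}, FNR = {fnr:.2f}")
```

Because FNR equals one minus recall, a low FNR means few malicious insiders go undetected, which is why it is reported alongside F1 on heavily imbalanced data.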
Keywords
Insider threat, detection, Training, CNN, LSTM, Few-shot learning, dataset, SNN, features