Fitsum Assamnew (PhD); Eden Teklemariam; 2023-12-14; 2023-12-14; 2023-08; http://etd.aau.edu.et/handle/123456789/1032
Insider threats are among the most difficult cyber threats to counteract because they originate from an organization's own trusted employees, who know its structure and systems, and they often cause the organization significant losses. Insider threat detection has long been studied in both the security and data mining communities. Existing studies are hampered by a lack of labeled datasets, imbalanced classes, and difficulties with feature representation. Machine learning approaches depend on manual feature engineering, which takes time and requires expertise. Deep learning approaches depend on large amounts of labeled, balanced training data. In insider threat detection, malicious users are heavily outnumbered by normal ones, both in real-world scenarios and in the working datasets. Insider threat data also tends to include many features, which makes it high-dimensional and makes the relevant features difficult to represent. In this paper, we propose a novel approach in which a CNN (Convolutional Neural Network) and an LSTM (Long Short-Term Memory) network act independently on the publicly available insider threat dataset, CERT (Computer Emergency Response Team) release 4.2, for feature extraction, followed by a few-shot learning based detector for insider threat detection. We concatenate the features extracted by the two deep learning models and apply feature selection to keep the most relevant ones. We use a Siamese neural network (SNN), the ’twin’ network of few-shot learning, to detect malicious insiders.
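The concatenation step and the 'twin' comparison described above can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the feature sizes, the untrained random weights, the `tanh` embedding, and the contrastive loss are all assumptions made for the example, and the feature selection step is omitted.

```python
import numpy as np

def embed(x, weights):
    """Shared ('twin') embedding: the same weights are applied to every input."""
    return np.tanh(x @ weights)

def pair_distance(x1, x2, weights):
    """Euclidean distance between the embeddings of two feature vectors."""
    return float(np.linalg.norm(embed(x1, weights) - embed(x2, weights)))

def contrastive_loss(distance, same_class, margin=1.0):
    """Assumed loss: pull same-class pairs together, push other pairs past the margin."""
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

rng = np.random.default_rng(0)
cnn_features = rng.normal(size=(3, 4))    # hypothetical CNN-extracted features for 3 users
lstm_features = rng.normal(size=(3, 6))   # hypothetical LSTM-extracted features, same users

# Concatenate the two feature sets column-wise, one row per user.
combined = np.concatenate([cnn_features, lstm_features], axis=1)

weights = rng.normal(size=(combined.shape[1], 5))  # untrained, illustrative only
d = pair_distance(combined[0], combined[1], weights)
loss = contrastive_loss(d, same_class=False)
print(combined.shape, d, loss)
```

Because both inputs pass through the same weights, identical inputs always have distance zero, which is the property a Siamese detector relies on when comparing a new user's behavior against a few labeled examples.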
We run experiments on three datasets: CNN-extracted features datasets (obtained by feature extraction with the CNN), LSTM-extracted features datasets (obtained by feature extraction with the LSTM), and Selected-features datasets (obtained after feature concatenation and selection). We compare our model with the baseline models RNN, isolation forest, and XGBoost. In the experiment on the CNN-extracted features datasets, the proposed model performs best, with an F1 score of 68%, which is 18% better than the isolation forest, the worst performer, and an FNR (False Negative Rate) value of 0.006. In the second experiment, on the LSTM-extracted features datasets, the SNN reaches an F1 score of 69%, which is 11 percentage points better than the isolation forest, which performs the worst with an F1 score of 58% and an FNR value of 0.2. In the last experiment, on the Selected-features dataset, the proposed model achieves the highest F1 score, 87%, which is 10 percentage points higher than the weakest baseline (XGBoost), which scores 77%. In terms of FNR, the lowest value, 0.11, is obtained with the CNN-extracted features datasets.
en-US
Insider threat, detection, Training, CNN, LSTM, Few-shot learning, dataset, SNN, features
User Behavior-Based Insider Threat Detection Using Few Shot Learning
Thesis
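For readers unfamiliar with the two metrics reported in the abstract, the sketch below shows how an F1 score and a False Negative Rate are computed from binary predictions. The labels are made-up toy values (1 = malicious, 0 = normal) and the function name `f1_and_fnr` is hypothetical; none of this reproduces the thesis's results.

```python
def f1_and_fnr(y_true, y_pred):
    """Return (F1 score, false negative rate) for binary labels, positive class = 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # FNR is the share of truly malicious users that the detector missed.
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return f1, fnr

# Toy data: 4 malicious users, 4 normal users.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
f1, fnr = f1_and_fnr(y_true, y_pred)
print(f1, fnr)
```

A low FNR matters in this setting because a missed malicious insider is usually costlier than a false alarm, which is why the abstract reports it alongside F1.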