Automatic Malaria Detection Using Machine Learning Approaches


Addis Ababa University


Malaria is one of the most common infectious diseases, causing widespread suffering and death in many parts of the world. Many studies have sought to automate the detection of infection. However, most of the proposed techniques achieve limited detection accuracy. Moreover, they focus on specific types of features rather than searching for feature types better suited to automated detection, so the resulting models do not generalize well. Automated detection therefore remains an active area of research that demands efficient, reliable, and accurate systems. For this reason, this thesis assesses various features and classification techniques and selects the method that yields the highest detection performance. The approach followed to determine whether a patient's blood sample is infected with malaria consists of dataset collection, image preprocessing, feature extraction, and classification. The experiments used a total of 27,558 segmented cell images, extracted from thin blood smear slide images, from the US National Institutes of Health (NIH) dataset. These images were enhanced using various preprocessing techniques. After preprocessing, three types of features were extracted: color histogram features, Haralick texture features, and the combination of the two. Finally, several supervised machine learning techniques, each with different model parameters, were used for classification: support vector machine, decision tree, k-nearest neighbors, multi-layer perceptron, random forest, and naive Bayes. The techniques were evaluated using a confusion matrix and a classification performance report to assess which has the highest classification potential.
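The feature extraction step described above can be illustrated with a minimal sketch. It is not the thesis implementation: it uses only the Python standard library, represents an image as rows of (r, g, b) tuples, and computes just three of the Haralick statistics (angular second moment, contrast, and inverse difference moment) from a single horizontal co-occurrence offset. The function names, the bin count, and the number of gray levels are assumptions chosen for illustration.

```python
def color_histogram(pixels, bins=4):
    """Normalized per-channel histogram of an RGB image.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    Returns 3 * bins values; each channel's bins sum to 1.
    """
    hist = [0.0] * (3 * bins)
    count = 0
    for row in pixels:
        for rgb in row:
            for channel, value in enumerate(rgb):
                hist[channel * bins + value * bins // 256] += 1
            count += 1
    return [h / count for h in hist]


def glcm(gray, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset.

    Assumes the image is large enough that at least one pixel pair exists.
    """
    matrix = [[0.0] * levels for _ in range(levels)]
    height, width = len(gray), len(gray[0])
    total = 0
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                i, j = gray[y][x], gray[ny][nx]
                matrix[i][j] += 1
                matrix[j][i] += 1  # count the pair in both directions
                total += 2
    return [[v / total for v in row] for row in matrix]


def texture_features(p):
    """Contrast, angular second moment, and inverse difference moment."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in cells)
    asm = sum(p[i][j] ** 2 for i, j in cells)
    idm = sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells)
    return [contrast, asm, idm]


def combined_features(pixels, bins=4, levels=8):
    """Concatenate color histogram and texture features into one vector."""
    # Quantize the mean of the three channels to `levels` gray levels.
    gray = [[(sum(rgb) // 3) * levels // 256 for rgb in row] for row in pixels]
    return color_histogram(pixels, bins) + texture_features(glcm(gray, levels))


# A toy 2x2 "cell image": a dark left column and a bright right column.
toy = [[(0, 0, 0), (255, 255, 255)],
       [(0, 0, 0), (255, 255, 255)]]
print(combined_features(toy, bins=2, levels=2))
```

A vector of this kind, computed per cell image, is what a classifier such as a random forest would take as input. Which of the three feature sets to pass on corresponds to the comparison the thesis carries out.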
The random forest algorithm achieved an average accuracy of 95%, an average precision of 95.0%, an average recall of 95.0%, and an average F1 score of 95.0% on a test set of 8,266 previously unseen images. The experimental results show that random forest outperforms the other supervised machine learning classifiers. Because a random forest aggregates multiple decision trees, it reduces overfitting as well as error due to bias, which makes it the most accurate of the algorithms analyzed and shows the feasibility of using it in real-time applications to determine whether a cell is infected with the malaria parasite.
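The averaged metrics reported above can be derived from confusion-matrix counts. The abstract does not state which averaging scheme was used, so the sketch below assumes macro-averaging over the two classes (infected and uninfected). The function names and the toy labels are illustrative only and are not data from the thesis.

```python
def per_class_counts(y_true, y_pred, label):
    """True positives, false positives, and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def macro_report(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1s) / len(labels),
    }


# Toy labels: 1 = infected, 0 = uninfected (not thesis data).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
report = macro_report(y_true, y_pred)
print(report)
```

On a balanced two-class test set, accuracy and the macro-averaged metrics tend to lie close together, which is consistent with the near-identical figures reported for random forest.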



Malaria, Blood smear, Image processing, Supervised machine learning, Feature extraction