Explainable Rhythm-Based Heart Disease Detection from ECG Signals






Addis Ababa University


Healthcare decision support systems must function with confidence, trust, and a functional understanding of their outputs. Much research has been done to automate the identification and classification of cardiovascular conditions from Electrocardiogram (ECG) signals, and one active area is the use of Deep Learning (DL) to classify ECG signals. However, DL models do not explain why they reached their final decision, which makes it difficult to trust their output in a medical environment. To address this trust issue, research is ongoing into explaining the decisions that DL models arrive at. Some approaches improve the interpretability of DL models using the Shapley Additive Explanations (SHAP) technique; however, SHAP explanations are computationally expensive. In this research, we develop a deep learning model that detects five rhythm-based heart diseases and incorporates explainability. We employ the visual explainers Grad-CAM and Grad-CAM++ as our explainability framework; these explainers are relatively lightweight and can be executed quickly on a standard CPU or GPU. Our model was trained on 12-lead ECG signals from the large PTB-XL dataset, using 3,229 ECG records for training, 404 for validation, and 403 for testing. The model was effective, with a classification accuracy of 0.96 and an F1 score of 0.88. To evaluate the explainability, we gave ten randomly selected outputs to two domain experts, who agreed with at least 80% of the explanations given to them. Even in the explanations the experts did not fully accept, many of the 12 leads were correctly explained, showing that visual explainability methods such as Grad-CAM++ could be useful in the diagnosis of heart diseases.
The outcomes of this evaluation suggest that, averaged over the ten sample cases, our model's explanations are 80% correct and consistent with the assessments of the two experts.
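To illustrate the kind of computation Grad-CAM performs, the sketch below shows the core heatmap step on a 1-D signal such as a single ECG lead. It assumes the activations and gradients of the last convolutional layer have already been extracted from the trained network (the function name and shapes are illustrative, not taken from the thesis): the channel weights are the globally averaged gradients, and the heatmap is the ReLU of the weighted sum of activation maps.

```python
import numpy as np

def grad_cam_1d(activations, gradients):
    """Minimal Grad-CAM for a 1-D signal.

    activations, gradients: arrays of shape (channels, length), the
    feature maps of the last conv layer and the gradients of the
    target class score with respect to those feature maps.
    """
    # alpha_k: global-average-pool the gradients over the time axis
    weights = gradients.mean(axis=1)
    # weighted combination of activation maps, then ReLU
    cam = np.maximum((weights[:, None] * activations).sum(axis=0), 0.0)
    # normalize to [0, 1] for visualization as a heatmap over the signal
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Synthetic example standing in for real extracted tensors
rng = np.random.default_rng(0)
acts = rng.random((8, 100))            # 8 channels, 100 time steps
grads = rng.standard_normal((8, 100))
heatmap = grad_cam_1d(acts, grads)     # one importance score per time step
```

In practice the heatmap is upsampled to the length of the original ECG trace and overlaid on each lead, which is what the domain experts inspected in the evaluation. Grad-CAM++ replaces the plain gradient average with a weighted average, but the overall structure of the computation is the same.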