Explainable Rhythm-Based Heart Disease Detection from ECG Signals






Addis Ababa University


Healthcare decision support systems must function with confidence, trust, and a functional understanding of their outputs. Much research has been done to automate the identification and classification of cardiovascular conditions from Electrocardiogram (ECG) signals, and one active area is the use of Deep Learning (DL) to classify ECG signals. However, DL models do not explain why they reached their final decision, which makes it difficult to trust their output in a medical environment. To address this trust issue, research is ongoing into explaining the decisions that DL models arrive at. Some approaches improve the interpretability of DL models using the Shapley Additive Explanations (SHAP) technique; however, SHAP explanations are computationally expensive. In this research, we develop a deep learning model that detects five rhythm-based heart diseases and incorporates explainability. We employ the visual explainers Grad-CAM and Grad-CAM++ as our explainability framework; these explainers are relatively lightweight and can be executed quickly on a standard CPU or GPU. Our model was trained on 12-lead ECG signals from the large PTB-XL dataset, using 3,229 ECG records for training, 404 for validation, and 403 for testing. The model was effective, with a classification accuracy of 0.96 and an F1 score of 0.88. To evaluate the explainability, we gave ten randomly selected outputs to two domain experts, who agreed with at least 80% of the explanations given to them. Even in the explanations the experts did not fully accept, many of the 12 leads were correctly explained, showing that visual explainability methods such as Grad-CAM++ could be useful in the diagnosis of heart diseases.
The outcomes of this evaluation suggest that, averaged over the ten sample cases, our model's explanations are 80% correct and consistent with the assessments of the two experts.
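To illustrate the kind of computation Grad-CAM performs, the sketch below shows the core heatmap step on a 1-D signal such as a single ECG lead. It assumes the activations and gradients of the last convolutional layer have already been extracted from the trained network (the function name and shapes are illustrative, not taken from the thesis): the channel weights are the globally averaged gradients, and the heatmap is the ReLU of the weighted sum of activation maps.

```python
import numpy as np

def grad_cam_1d(activations, gradients):
    """Minimal Grad-CAM for a 1-D signal.

    activations, gradients: arrays of shape (channels, length), the
    feature maps of the last conv layer and the gradients of the
    target class score with respect to those feature maps.
    """
    # alpha_k: global-average-pool the gradients over the time axis
    weights = gradients.mean(axis=1)
    # weighted combination of activation maps, then ReLU
    cam = np.maximum((weights[:, None] * activations).sum(axis=0), 0.0)
    # normalize to [0, 1] for visualization as a heatmap over the signal
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Synthetic example standing in for real extracted tensors
rng = np.random.default_rng(0)
acts = rng.random((8, 100))            # 8 channels, 100 time steps
grads = rng.standard_normal((8, 100))
heatmap = grad_cam_1d(acts, grads)     # one importance score per time step
```

In practice the heatmap is upsampled to the length of the original ECG trace and overlaid on each lead, which is what the domain experts inspected in the evaluation. Grad-CAM++ replaces the plain gradient average with a weighted average, but the overall structure of the computation is the same.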