Automated Construction of a New Dataset for Histopathological Breast Cancer Images
Date
2024-01
Publisher
Addis Ababa University
Abstract
Cancer is a medical condition where cells grow uncontrollably and can spread to other
parts of the body, posing a significant global health challenge. Among women worldwide,
breast cancer is the most frequently diagnosed cancer and the leading cause of cancer-related
deaths. Automated classification of breast cancer has been extensively studied,
particularly in differentiating types, subtypes, and stages. However, simultaneous classification
of subtypes with stages, such as Lobular Carcinoma In Situ (LCIS) and Invasive
Lobular Carcinoma (ILC), remains challenging due to limited data availability.
This research addresses that gap by generating a new dataset labeled with these combined
subtype-and-stage classes, using existing datasets as primary sources. The ductal and
lobular carcinoma labels from the BreakHis dataset and the invasive and in situ carcinoma
labels from the Yan et al. dataset are used to train the models that generate the new
dataset.
To achieve this, two separate ensemble models are trained using distinct datasets. The
first ensemble model classifies ductal and lobular carcinoma using the BreakHis dataset.
The second ensemble model classifies invasive and in situ carcinoma using the Yan et al.
dataset. Both models are then used to extract a new dataset through soft voting techniques.
The extracted labels include Ductal Carcinoma In Situ (DCIS), Invasive Ductal Carcinoma
(IDC), LCIS, and ILC. This approach aims to provide a more comprehensive classification
system by leveraging labels from both datasets.
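The label-combination step can be illustrated with a minimal sketch. The abstract does not specify how the two ensembles' outputs are merged, so the following assumes that each ensemble averages its members' class probabilities (soft voting) and that the two winning classes are then paired into one of the four joint labels. The function names, class orderings, and probability values are hypothetical and not taken from the thesis.

```python
from typing import Dict, List, Sequence, Tuple

# Joint labels named in the abstract, keyed by (histological type, stage).
JOINT_LABELS: Dict[Tuple[str, str], str] = {
    ("ductal", "in_situ"): "DCIS",
    ("ductal", "invasive"): "IDC",
    ("lobular", "in_situ"): "LCIS",
    ("lobular", "invasive"): "ILC",
}


def soft_vote(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities of all ensemble members."""
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    n_classes = len(member_probs[0])
    if any(len(p) != n_classes for p in member_probs):
        raise ValueError("all members must output the same number of classes")
    n_members = len(member_probs)
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def joint_label(
    type_member_probs: Sequence[Sequence[float]],
    stage_member_probs: Sequence[Sequence[float]],
    type_classes: Sequence[str] = ("ductal", "lobular"),    # assumed ordering
    stage_classes: Sequence[str] = ("in_situ", "invasive"),  # assumed ordering
) -> str:
    """Pair the soft-voted type and stage predictions for one image."""
    type_avg = soft_vote(type_member_probs)
    stage_avg = soft_vote(stage_member_probs)
    type_name = type_classes[type_avg.index(max(type_avg))]
    stage_name = stage_classes[stage_avg.index(max(stage_avg))]
    return JOINT_LABELS[(type_name, stage_name)]


# Hypothetical probabilities from two members of each ensemble for one image.
example = joint_label(
    type_member_probs=[[0.2, 0.8], [0.4, 0.6]],   # leans lobular
    stage_member_probs=[[0.9, 0.1], [0.7, 0.3]],  # leans in situ
)
print(example)
```

Averaging probabilities rather than counting hard votes lets a confident member outweigh an uncertain one, which is the usual motivation for soft voting. Whether the thesis also applies a confidence threshold before accepting a joint label is not stated in the abstract.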
To validate the newly extracted labels, three pathologists reviewed images randomly sampled
from the test set of the Yan et al. dataset. The pathologists agreed with the model outputs
on 87.5% of the samples. Subsequently, the newly generated dataset was used to classify
DCIS, IDC, LCIS, and ILC with an accuracy of 76.06%.
Keywords
Breast cancer, histopathology, DCIS, IDC, LCIS, ILC, BreakHis, Yan et al.