Automated Construction of a New Dataset for Histopathological Breast Cancer Images

dc.contributor.advisor: Fitsum Assamnew (PhD)
dc.contributor.author: Kalkidan Kebede
dc.date.accessioned: 2025-01-14T13:58:59Z
dc.date.available: 2025-01-14T13:58:59Z
dc.date.issued: 2024-01
dc.description.abstract: Cancer is a medical condition in which cells grow uncontrollably and can spread to other parts of the body, posing a significant global health challenge. Among women worldwide, breast cancer is the most frequently diagnosed cancer and the leading cause of cancer-related deaths. Automated classification of breast cancer has been studied extensively, particularly for differentiating types, subtypes, and stages. However, simultaneous classification of subtype and stage, as in Lobular Carcinoma In Situ (LCIS) and Invasive Lobular Carcinoma (ILC), remains challenging because of limited data availability. This research addresses that gap by generating a new dataset that includes these combined subtype-and-stage labels, using existing datasets as primary sources. Two separate ensemble models are trained on distinct datasets. The first classifies ductal and lobular carcinoma using labels from the BreakHis dataset. The second classifies invasive and in situ carcinoma using labels from the Yan et al. dataset. Both models are then used to extract a new dataset through soft voting techniques. The extracted labels are Ductal Carcinoma In Situ (DCIS), Invasive Ductal Carcinoma (IDC), LCIS, and ILC. This approach aims to provide a more comprehensive classification system by leveraging labels from both datasets. To validate the newly extracted labels, three pathologists were given randomly selected images from the test set of the Yan et al. dataset. The pathologists agreed with the model outputs on 87.5% of the samples. Subsequently, the newly generated dataset was used to classify DCIS, IDC, LCIS, and ILC with an accuracy of 76.06%.
dc.identifier.uri: https://etd.aau.edu.et/handle/123456789/4103
dc.language.iso: en_US
dc.publisher: Addis Ababa University
dc.subject: Breast cancer
dc.subject: histopathology
dc.subject: DCIS
dc.subject: IDC
dc.subject: LCIS
dc.subject: ILC
dc.subject: BreakHis
dc.subject: Yan et al.
dc.title: Automated Construction of a New Dataset for Histopathological Breast Cancer Images
dc.type: Thesis
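
The abstract describes two ensembles whose soft-voted outputs are combined into four labels. The record gives no implementation details, so the following is only a minimal sketch of one way this could work, assuming each ensemble member returns a per-class probability vector and that the final label is formed by pairing the top subtype with the top stage. The names `soft_vote`, `combined_label`, `SUBTYPES`, `STAGES`, and `COMBINED`, and all probability values, are hypothetical and not taken from the thesis.

```python
from statistics import fmean

# Class orderings assumed for illustration only.
SUBTYPES = ("ductal", "lobular")    # first ensemble (BreakHis labels)
STAGES = ("in_situ", "invasive")    # second ensemble (Yan et al. labels)

# Pairing of subtype and stage into the four labels named in the abstract.
COMBINED = {
    ("ductal", "in_situ"): "DCIS",
    ("ductal", "invasive"): "IDC",
    ("lobular", "in_situ"): "LCIS",
    ("lobular", "invasive"): "ILC",
}


def soft_vote(member_probs):
    """Average per-class probabilities across ensemble members."""
    return [fmean(class_probs) for class_probs in zip(*member_probs)]


def combined_label(subtype_member_probs, stage_member_probs):
    """Pick the top class from each soft-voted ensemble and pair them."""
    subtype_avg = soft_vote(subtype_member_probs)
    stage_avg = soft_vote(stage_member_probs)
    subtype = SUBTYPES[subtype_avg.index(max(subtype_avg))]
    stage = STAGES[stage_avg.index(max(stage_avg))]
    return COMBINED[(subtype, stage)]


# Made-up probabilities for one image from three members per ensemble.
subtype_probs = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
stage_probs = [[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]]
label = combined_label(subtype_probs, stage_probs)
print(label)
```

Averaging probabilities rather than counting hard votes lets a confident member outweigh uncertain ones, which is the usual motivation for soft voting; whether the thesis weights members or applies a confidence threshold is not stated in this record.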

Files

Original bundle
Name: Kalkidan Kebede.pdf
Size: 1.24 MB
Format: Adobe Portable Document Format