Automated Construction of a New Dataset for Histopathological Breast Cancer Images
dc.contributor.advisor | Fitsum Assamnew (PhD) | |
dc.contributor.author | Kalkidan Kebede | |
dc.date.accessioned | 2025-01-14T13:58:59Z | |
dc.date.available | 2025-01-14T13:58:59Z | |
dc.date.issued | 2024-01 | |
dc.description.abstract | Cancer is a medical condition where cells grow uncontrollably and can spread to other parts of the body, posing a significant global health challenge. Among women worldwide, breast cancer is the most frequently diagnosed cancer and the leading cause of cancer-related deaths. Automated classification of breast cancer has been extensively studied, particularly in differentiating types, subtypes, and stages. However, simultaneous classification of subtypes with stages, such as Lobular Carcinoma In Situ (LCIS) and Invasive Lobular Carcinoma (ILC), remains challenging due to limited data availability. This research aims to address this gap by generating a new dataset that includes these unclassified subtypes with staging, utilizing existing datasets as primary sources. Labels for ductal and lobular carcinoma from the BreakHis dataset and invasive and in situ carcinoma labels from the Yan et al. dataset are used to train models for generating the new dataset. To achieve this, two separate ensemble models are trained, each on a different dataset. The first ensemble model classifies ductal and lobular carcinoma using the BreakHis dataset. The second ensemble model classifies invasive and in situ carcinoma using the Yan et al. dataset. Both models are then used to extract a new dataset through soft voting techniques. The extracted labels include Ductal Carcinoma In Situ (DCIS), Invasive Ductal Carcinoma (IDC), LCIS, and ILC. This approach aims to provide a more comprehensive classification system by leveraging labels from both datasets. To validate the newly extracted labels, three pathologists were given randomly selected images from the Yan et al. dataset test set. The pathologists agreed with the model outputs on 87.5% of the samples. Subsequently, the newly generated dataset was used to classify DCIS, IDC, LCIS, and ILC with an accuracy of 76.06%. | |
dc.identifier.uri | https://etd.aau.edu.et/handle/123456789/4103 | |
dc.language.iso | en_US | |
dc.publisher | Addis Ababa University | |
dc.subject | Breast cancer | |
dc.subject | histopathology | |
dc.subject | DCIS | |
dc.subject | IDC | |
dc.subject | LCIS | |
dc.subject | ILC | |
dc.subject | BreakHis | |
dc.subject | Yan et al. | |
dc.title | Automated Construction of a New Dataset for Histopathological Breast Cancer Images | |
dc.type | Thesis |
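
The label-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract states only that two ensembles are combined through soft voting; everything else here is an assumption made for illustration: the function names (`soft_vote`, `combined_label`), the class ordering, and the rule that the averaged subtype prediction is paired with the averaged stage prediction to form one of the four labels. It is not the thesis implementation.

```python
from typing import Sequence

# Assumed class orderings for the two ensembles (hypothetical).
SUBTYPES = ("ductal", "lobular")    # ensemble trained on BreakHis labels
STAGES = ("in_situ", "invasive")    # ensemble trained on Yan et al. labels

# Pairing of a subtype and a stage into the four labels named in the abstract.
COMBINED = {
    ("ductal", "in_situ"): "DCIS",
    ("ductal", "invasive"): "IDC",
    ("lobular", "in_situ"): "LCIS",
    ("lobular", "invasive"): "ILC",
}


def soft_vote(member_probs: Sequence[Sequence[float]]) -> list:
    """Average the class probabilities predicted by each ensemble member."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [
        sum(probs[c] for probs in member_probs) / n_members
        for c in range(n_classes)
    ]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def combined_label(
    subtype_member_probs: Sequence[Sequence[float]],
    stage_member_probs: Sequence[Sequence[float]],
) -> str:
    """Soft-vote within each ensemble, then pair the two winning classes."""
    subtype = SUBTYPES[argmax(soft_vote(subtype_member_probs))]
    stage = STAGES[argmax(soft_vote(stage_member_probs))]
    return COMBINED[(subtype, stage)]


# Example with made-up probabilities for a single image:
# two subtype models favour "ductal", two stage models favour "invasive".
example = combined_label(
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.2, 0.8], [0.4, 0.6]],
)
```

With these made-up inputs, `example` evaluates to `"IDC"`. A real pipeline would presumably also apply a confidence threshold before accepting a label, but the abstract does not say whether one was used.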