Automated Construction of a New Dataset for Histopathological Breast Cancer Images

Kalkidan Kebede

dc.contributor.advisor	Fitsum Assamnew (PhD)
dc.contributor.author	Kalkidan Kebede
dc.date.accessioned	2025-01-14T13:58:59Z
dc.date.available	2025-01-14T13:58:59Z
dc.date.issued	2024-01
dc.description.abstract	Cancer is a condition in which cells grow uncontrollably and can spread to other parts of the body, and it poses a significant global health challenge. Among women worldwide, breast cancer is the most frequently diagnosed cancer and the leading cause of cancer-related deaths. Automated classification of breast cancer has been studied extensively, particularly for differentiating types, subtypes, and stages. However, classifying subtype and stage simultaneously, as in Lobular Carcinoma In Situ (LCIS) and Invasive Lobular Carcinoma (ILC), remains challenging because of limited data availability. This research addresses that gap by generating a new dataset that includes these combined subtype-and-stage labels, using existing datasets as primary sources. The ductal and lobular carcinoma labels from the BreakHis dataset and the invasive and in situ carcinoma labels from the Yan et al. dataset are used to train two separate ensemble models. The first ensemble model classifies ductal versus lobular carcinoma using the BreakHis dataset. The second ensemble model classifies invasive versus in situ carcinoma using the Yan et al. dataset. Both models are then used to extract the new dataset through soft voting. The extracted labels are Ductal Carcinoma In Situ (DCIS), Invasive Ductal Carcinoma (IDC), LCIS, and ILC. By drawing on labels from both datasets, this approach aims to provide a more comprehensive classification system. To validate the newly extracted labels, three pathologists were given randomly selected images from the test set of the Yan et al. dataset. The pathologists agreed with the model outputs on 87.5% of the samples. Subsequently, the newly generated dataset was used to classify DCIS, IDC, LCIS, and ILC with an accuracy of 76.06%.
dc.identifier.uri	https://etd.aau.edu.et/handle/123456789/4103
dc.language.iso	en_US
dc.publisher	Addis Ababa University
dc.subject	Breast cancer
dc.subject	histopathology
dc.subject	DCIS
dc.subject	IDC
dc.subject	LCIS
dc.subject	ILC
dc.subject	BreakHis
dc.subject	Yan et al.
dc.title	Automated Construction of a New Dataset for Histopathological Breast Cancer Images
dc.type	Thesis
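
The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that soft voting means averaging the class probabilities of each ensemble's members, and that the combined label is formed by pairing the top subtype with the top stage. The names SUBTYPES, STAGES, soft_vote, and combined_label are hypothetical, as are the class orderings.

```python
from typing import Sequence

# Assumed class order for the BreakHis-trained ensemble (hypothetical).
SUBTYPES = ("ductal", "lobular")
# Assumed class order for the Yan et al.-trained ensemble (hypothetical).
STAGES = ("in_situ", "invasive")

# The four combined labels named in the abstract.
COMBINED = {
    ("ductal", "in_situ"): "DCIS",
    ("ductal", "invasive"): "IDC",
    ("lobular", "in_situ"): "LCIS",
    ("lobular", "invasive"): "ILC",
}


def soft_vote(member_probs: Sequence[Sequence[float]]) -> list[float]:
    """Average the class-probability vectors predicted by the ensemble members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[i] for p in member_probs) / n_members for i in range(n_classes)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def combined_label(
    subtype_member_probs: Sequence[Sequence[float]],
    stage_member_probs: Sequence[Sequence[float]],
) -> str:
    """Pair the soft-voted subtype with the soft-voted stage for one image."""
    subtype = SUBTYPES[argmax(soft_vote(subtype_member_probs))]
    stage = STAGES[argmax(soft_vote(stage_member_probs))]
    return COMBINED[(subtype, stage)]


# Two members per ensemble, one image; the probabilities are made up for illustration.
label = combined_label(
    subtype_member_probs=[[0.2, 0.8], [0.4, 0.6]],
    stage_member_probs=[[0.9, 0.1], [0.7, 0.3]],
)
```

In this illustrative call the averaged subtype probabilities favor lobular and the averaged stage probabilities favor in situ, so the image would be labeled LCIS. A real pipeline might also discard images where either ensemble is not confident; the abstract does not say whether such a threshold was used.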

Files

Original bundle

Name: Kalkidan Kebede.pdf
Size: 1.24 MB
Format: Adobe Portable Document Format

Collections

Computer Engineering