Training Stability of Multi-modal Unsupervised Image-to-Image Translation for Low Image Resolution Quality
dc.contributor.advisor | Bisrat Derbesa (PhD) | |
dc.contributor.author | Yonas Desta | |
dc.date.accessioned | 2024-05-27T08:35:28Z | |
dc.date.available | 2024-05-27T08:35:28Z | |
dc.date.issued | 2023-05 | |
dc.description.abstract | The ultimate objective of unsupervised image-to-image translation is to find the relationship between two distinct visual domains. A major challenge of this task is that a single input image admits several alternative outputs. In a multi-modal unsupervised image-to-image translation model, images from different domains share common latent space representations. The model learns a one-to-many mapping and can therefore produce several outputs from a single source image. One of the challenges with the multi-modal unsupervised image-to-image translation model is training instability, which occurs when the model is trained on a dataset of low-resolution images, such as 128x128. During unstable training, the generator loss decreases slowly because the generator struggles to find a new equilibrium. To address this limitation, we propose spectral normalization as a weight normalization method, which limits the fitting ability of the network and thereby stabilizes the training of the discriminator. The Lipschitz constant was the only hyperparameter that was adjusted. Our experiments used two different datasets. The first dataset contains 5000 images, and we conducted two separate experiments on it, with 5 and 10 epochs. With 5 epochs, our proposed method reduced generator losses by 5.049% and discriminator losses by 2.882% on average. With 10 epochs, generator losses decreased by 5.032% and discriminator losses by 2.864% on average. The second dataset contains 20000 images, and we again ran two experiments, with 5 and 10 epochs. With 5 epochs, our proposed method reduced generator losses by 4.745% and discriminator losses by 2.787% on average. With 10 epochs, generator losses decreased by 3.092% and discriminator losses by 2.497% on average. In addition, during translation, our approach produces output images that are more realistic than those of the baseline multi-modal unsupervised image-to-image translation model. | |
dc.identifier.uri | https://etd.aau.edu.et/handle/123456789/3033 | |
dc.language.iso | en_US | |
dc.publisher | Addis Ababa University | |
dc.subject | Generative Adversarial Networks | |
dc.subject | Image-to-Image translation | |
dc.subject | Style Transfer | |
dc.title | Training Stability of Multi-modal Unsupervised Image-to-Image Translation for Low Image Resolution Quality | |
dc.type | Thesis |