Training Stability of Multi-modal Unsupervised Image-to-Image Translation for Low Image Resolution Quality

dc.contributor.advisor: Bisrat Derbesa (PhD)
dc.contributor.author: Yonas Desta
dc.date.accessioned: 2024-05-27T08:35:28Z
dc.date.available: 2024-05-27T08:35:28Z
dc.date.issued: 2023-05
dc.description.abstract: The ultimate objective of unsupervised image-to-image translation is to learn the relationship between two distinct visual domains. A major difficulty of this task is that a single input image may correspond to several alternative outputs. In a multi-modal unsupervised image-to-image translation model, a shared latent space representation exists across images from the different domains; the model learns a one-to-many mapping and can produce several outputs from a single source image. One challenge with the multi-modal unsupervised image-to-image translation model is training instability, which occurs when the model is trained on a dataset of low-resolution images, such as 128×128. During this instability, the generator loss decreases slowly because the generator struggles to find a new equilibrium. To address this limitation, we propose spectral normalization as a weight normalization method that limits the fitting ability of the network and stabilizes the training of the discriminator. The Lipschitz constant is the single hyperparameter to be tuned. Our experiments used two datasets. The first dataset contains 5,000 images, on which we ran two separate experiments of 5 and 10 epochs. At 5 epochs, our proposed method reduced overall training generator losses by 5.049% on average and discriminator losses by 2.882% on average; at 10 epochs, generator losses decreased by 5.032% and discriminator losses by 2.864% on average. The second dataset contains 20,000 images, again trained for 5 and 10 epochs in two experiments. Over 5 epochs, our method reduced overall training generator losses by 4.745% on average and discriminator losses by 2.787% on average; at 10 epochs, generator losses decreased by 3.092% and discriminator losses by 2.497% on average. In addition, during translation, our approach produces more realistic output images than the baseline multi-modal unsupervised image-to-image translation model.
dc.identifier.uri: https://etd.aau.edu.et/handle/123456789/3033
dc.language.iso: en_US
dc.publisher: Addis Ababa University
dc.subject: Generative Adversarial Networks
dc.subject: Image-to-Image translation
dc.subject: Style Transfer
dc.title: Training Stability of Multi-modal Unsupervised Image-to-Image Translation for Low Image Resolution Quality
dc.type: Thesis
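
As context for the approach described in the abstract: spectral normalization (Miyato et al., 2018) rescales each weight matrix W of the discriminator by its largest singular value sigma(W), i.e. W_SN = W / sigma(W), which bounds each layer's Lipschitz constant. Below is a minimal PyTorch sketch of the technique, assuming a simple convolutional discriminator; the SNDiscriminator name and layer widths are illustrative assumptions, not the architecture used in the thesis.

import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SNDiscriminator(nn.Module):
    """Hypothetical convolutional discriminator with spectral normalization.

    Layer widths are illustrative assumptions, not the thesis architecture.
    """
    def __init__(self, in_channels=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            # spectral_norm divides each weight by its largest singular
            # value (estimated by power iteration), bounding the layer's
            # Lipschitz constant and stabilizing discriminator training.
            spectral_norm(nn.Conv2d(in_channels, width, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(width, width * 2, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(width * 2, 1, 4, stride=1, padding=1)),
        )

    def forward(self, x):
        return self.net(x)

# Score a batch of 128x128 images, the low-resolution setting the
# abstract discusses.
d = SNDiscriminator()
scores = d(torch.randn(2, 3, 128, 128))
print(scores.shape)  # torch.Size([2, 1, 31, 31])

Because only the weights are rescaled, this change drops into an existing GAN training loop without altering the loss functions; as the abstract notes, the Lipschitz constant is then the single hyperparameter to adjust.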

Files

Original bundle
Name: Yonas Desta.pdf
Size: 5.38 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission