Training Stability of Multi-modal Unsupervised Image-to-Image Translation for Low Image Resolution Quality
Date
2023-05
Authors
Publisher
Addis Ababa University
Abstract
The ultimate objective of unsupervised image-to-image translation is to learn
the relationship between two distinct visual domains. A major difficulty of
this task is that a single input image may correspond to several alternative
outputs. In a multi-modal unsupervised image-to-image translation model, images
from different domains share a common latent space representation; the model
performs a one-to-many mapping and can produce several outputs from a single
source image. One of the challenges of the multi-modal unsupervised
image-to-image translation model is training instability, which occurs when the
model is trained on a dataset of low-resolution images, such as 128x128. During
this instability, the generator loss decreases slowly because the generator
struggles to find a new equilibrium. To address this limitation, we propose
spectral normalization as a weight normalization method that limits the fitting
capacity of the network and stabilizes the training of the discriminator. The
Lipschitz constant is the only hyperparameter that needs to be tuned. Our
experiments used two different datasets. The first dataset contains 5,000
images, on which we conducted two separate experiments with 5 and 10 epochs.
Over 5 epochs, our proposed method reduced the overall training generator loss
by 5.049% on average and the discriminator loss by 2.882% on average. Over 10
epochs, the total training generator loss decreased by 5.032% and the
discriminator loss by 2.864% on average. The second dataset contains 20,000
images, again used in two experiments with 5 and 10 epochs. Over 5 epochs, our
proposed method reduced the overall training generator loss by 4.745% on
average and the discriminator loss by 2.787% on average. Over 10 epochs, the
average total training loss was reduced, with generator losses down 3.092% and
discriminator losses down 2.497%. In addition, during translation our approach
produces output images that are more realistic than those of the baseline
multi-modal unsupervised image-to-image translation model.
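
The stabilization technique described above, spectral normalization applied to the discriminator's weights, can be sketched as follows. This is a minimal illustration assuming PyTorch and a simple convolutional discriminator; the SNDiscriminator class, layer sizes, and channel counts are illustrative assumptions, not the thesis's actual architecture.

    import torch
    import torch.nn as nn
    from torch.nn.utils import spectral_norm

    # Sketch: a convolutional discriminator for 128x128 inputs with spectral
    # normalization on every convolution, constraining each layer's Lipschitz
    # constant to roughly 1 to stabilize adversarial training.
    class SNDiscriminator(nn.Module):
        def __init__(self, in_channels=3, base_channels=64):
            super().__init__()
            channels = [in_channels, base_channels, base_channels * 2,
                        base_channels * 4, base_channels * 8]
            layers = []
            for c_in, c_out in zip(channels[:-1], channels[1:]):
                layers += [
                    spectral_norm(nn.Conv2d(c_in, c_out, kernel_size=4,
                                            stride=2, padding=1)),
                    nn.LeakyReLU(0.2, inplace=True),
                ]
            # Final 1-channel real/fake prediction map.
            layers.append(spectral_norm(nn.Conv2d(channels[-1], 1, kernel_size=1)))
            self.model = nn.Sequential(*layers)

        def forward(self, x):
            return self.model(x)

    # Usage: scores for a batch of four 128x128 RGB images.
    disc = SNDiscriminator()
    scores = disc(torch.randn(4, 3, 128, 128))

Because spectral normalization rescales each weight matrix by its largest singular value, the only quantity left to tune is the target Lipschitz constant, which is why it is the single hyperparameter mentioned in the abstract.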
Description
Keywords
Generative Adversarial Networks, Image-to-Image translation, Style Transfer