Recent research has shown that it is possible to find interpretable directions in the latent spaces of pre-trained Generative Adversarial Networks (GANs). These directions enable controllable image generation and support a wide range of semantic editing operations, such as zoom or rotation. Such directions are typically discovered in a supervised or semi-supervised manner, which requires manual annotations and limits their use in practice. In contrast, unsupervised discovery allows finding subtle directions that are difficult to detect a priori.
In this work, we propose a contrastive learning-based approach to discover semantic directions in the latent space of pre-trained GANs in a self-supervised manner. Our approach finds semantically meaningful dimensions compatible with state-of-the-art methods.
We propose to use contrastive learning on feature divergences to discover interpretable directions in the latent space of pre-trained GAN models such as StyleGAN2 and BigGAN.
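To make the idea concrete, below is a minimal, self-contained sketch of a contrastive loss on feature divergences. It is not the authors' implementation: the small MLP feature_extractor merely stands in for the intermediate features of a pre-trained, frozen generator (e.g., an early StyleGAN2 synthesis block), the directions are modeled as a single learnable matrix, and names such as latentclr_loss are ours. Edits made with the same direction across a batch are treated as positive pairs, and edits made with different directions as negatives.

import torch
import torch.nn.functional as F

# Minimal sketch of contrastive learning on feature divergences (not the
# authors' code). The MLP below stands in for intermediate features of a
# pre-trained, frozen GAN generator so that the snippet runs on its own.
latent_dim, feat_dim, num_directions = 512, 256, 8
feature_extractor = torch.nn.Sequential(
    torch.nn.Linear(latent_dim, feat_dim), torch.nn.ReLU(),
    torch.nn.Linear(feat_dim, feat_dim),
)
for p in feature_extractor.parameters():          # the generator stays frozen
    p.requires_grad_(False)

# Learnable latent-space directions, one row per direction (a simple
# global, linear direction model is assumed here for illustration).
directions = torch.nn.Parameter(0.1 * torch.randn(num_directions, latent_dim))

def latentclr_loss(z, alpha=3.0, tau=0.5):
    """NT-Xent-style loss: divergences from the same direction are positives."""
    f_z = feature_extractor(z)                                   # (B, F)
    z_edit = z.unsqueeze(1) + alpha * directions.unsqueeze(0)    # (B, K, D)
    f_edit = feature_extractor(z_edit)                           # (B, K, F)
    div = F.normalize(f_edit - f_z.unsqueeze(1), dim=-1)         # feature divergences

    B, K, _ = div.shape
    flat = div.reshape(B * K, -1)
    sim = flat @ flat.t() / tau                                  # cosine similarities
    eye = torch.eye(B * K, dtype=torch.bool)
    sim = sim.masked_fill(eye, float('-inf'))                    # drop self-pairs

    dir_idx = torch.arange(K).repeat(B)                          # direction of each row
    pos = (dir_idx.unsqueeze(0) == dir_idx.unsqueeze(1)) & ~eye  # same direction, other edit
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -log_prob[pos].mean()

z = torch.randn(4, latent_dim)      # a batch of latent codes
loss = latentclr_loss(z)
loss.backward()                     # gradients only reach `directions`

In a realistic setup, one would optimize the direction parameters with a standard optimizer while keeping the pre-trained generator fixed, and take the feature divergences at a chosen intermediate layer of the generator rather than from a stand-in network.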
Our visual analysis shows that directions learned from a single ImageNet class are applicable to a variety of other ImageNet classes. Below you can find directions transferred from the Bulbul class to various other ImageNet classes.
We compare how the directions found on FFHQ differ across methods. The figure below shows a visual comparison of several directions found in common by all methods, including Smile, Lipstick, Elderly, Curly Hair, and Young.
Comparison of manipulation results on the FFHQ dataset with the GANSpace and SeFa methods. The leftmost image is the original, while images denoted with ↑ and ↓ show the result of moving in the positive or negative direction, respectively.

@misc{yüksel2021latentclr,
title={LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions},
author={Oğuz Kaan Yüksel and Enis Simsar and Ezgi Gülperi Er and Pinar Yanardag},
year={2021},
eprint={2104.00820},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
This publication has been produced benefiting from the 2232 International Fellowship for Outstanding Researchers Program of TUBITAK (Project No: 118c321). We also acknowledge the support of NVIDIA Corporation through the donation of a TITAN X GPU and GCP research credits from Google. We thank Irem Simsar for proofreading our paper.