Published at ICCV 2021

LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions

Oguz Kaan Yuksel1*, Enis Simsar2,3*, Ezgi Gulperi Er3, Pinar Yanardag3
1EPFL, 2TUM, 3Bogazici University


Abstract

Recent research has shown that it is possible to find interpretable directions in the latent spaces of pre-trained Generative Adversarial Networks (GANs). These directions enable controllable image generation and support a wide range of semantic editing operations, such as zoom or rotation. Such directions are typically discovered in a supervised or semi-supervised manner, which requires manual annotations and limits their use in practice. In comparison, unsupervised discovery allows finding subtle directions that are difficult to detect a priori.

In this work, we propose a contrastive learning-based approach to discover semantic directions in the latent space of pre-trained GANs in a self-supervised manner. Our approach finds semantically meaningful directions whose quality is comparable to that of state-of-the-art methods.


LatentCLR Framework

  • Direction models Dk that operate on the latent space and produce latent code edits
  • Target feature layer Gf in the pre-trained GAN
  • Contrastive objective function based on NT-Xent loss
The latent code z1 is passed through the direction models to obtain edited codes.
Each edited code zk is fed into the generator up to the target feature layer.
The original z1 is also fed through the generator up to the same layer.
The effect of each direction model is computed as the difference between the feature representations of the edited and original codes.
The same steps are executed for every latent code in the batch.
Feature differences produced by the same direction model are treated as positive pairs, while all others are treated as negatives. The aim is for each direction to leave a unique footprint in the target feature layer, encouraging semantic disentanglement; a minimal sketch of this objective is given below.
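A minimal PyTorch sketch of this contrastive objective is given below. It assumes the feature differences have already been stacked into a tensor of shape (N, K, D) for N latent codes and K direction models; the function name, tensor layout, and default temperature are illustrative assumptions rather than the released implementation.

import torch
import torch.nn.functional as F

def latentclr_nt_xent(features, temperature=0.5):
    """NT-Xent-style contrastive loss over feature differences.

    features: (N, K, D) tensor of feature-space differences for N latent
    codes edited by K direction models. Differences produced by the same
    direction model are positives; all other pairs are negatives.
    """
    n, k, d = features.shape
    feats = F.normalize(features.reshape(n * k, d), dim=1)  # unit-norm rows for cosine similarity
    sim = feats @ feats.t() / temperature                   # (N*K, N*K) similarity matrix

    # Exclude self-similarities from both numerator and denominator.
    self_mask = torch.eye(n * k, dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(self_mask, float("-inf"))

    # Positive pairs share the same direction index k (row index modulo K).
    dir_idx = torch.arange(n * k, device=feats.device) % k
    pos_mask = (dir_idx.unsqueeze(0) == dir_idx.unsqueeze(1)) & ~self_mask

    exp_sim = sim.exp()                      # self terms become exp(-inf) = 0
    pos = (exp_sim * pos_mask).sum(dim=1)    # similarity mass on same-direction edits
    return -(pos / exp_sim.sum(dim=1)).log().mean()

With a batch of N >= 2 latent codes, minimizing this loss pulls together feature differences produced by the same direction model and pushes apart those produced by different ones.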

Diversity of the directions

We propose to use contrastive learning on feature divergences to discover interpretable directions in the latent space of pre-trained GAN models such as StyleGAN2 and BigGAN.
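As a concrete illustration, the sketch below shows one possible form of a direction model: a set of learnable, unit-normalized linear directions added to a latent code with a fixed scale. The class name, interface, and scale value are assumptions of this sketch; the paper also studies conditional and nonlinear (MLP-based) variants.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearDirectionModel(nn.Module):
    """K learnable directions applied to latent codes of dimension `dim`.

    Each direction is unit-normalized so that the edit strength is
    controlled only by the scale `alpha`.
    """
    def __init__(self, num_directions, dim, alpha=3.0):
        super().__init__()
        self.directions = nn.Parameter(torch.randn(num_directions, dim))
        self.alpha = alpha

    def forward(self, z, k):
        # Edit the latent code along the k-th direction: z_k = z + alpha * d_k
        d_k = F.normalize(self.directions[k], dim=-1)
        return z + self.alpha * d_k

The edited and original codes are both passed through the generator up to the target feature layer Gf, and their feature difference is what enters the contrastive objective sketched above.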

StyleGAN2 Directions

Directions discovered by our method on the FFHQ dataset with StyleGAN2 (left).
Different types of smile directions (denoted with SMILE 1-3) discovered by the non-linear approach (right).

BigGAN Directions

Class-specific directions discovered by our method in several ImageNet classes on the BigGAN model.

Transferability of directions

Our visual analysis shows that the directions learned from one ImageNet class are applicable to a variety of other ImageNet classes. Below you can find the directions transferred from the Bulbul class to various other ImageNet classes, followed by a minimal sketch of how such a transfer might be exercised in code.

Transferred directions shown: Zoom, Rotate, Contrast, and Sitting.
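To make the transfer concrete, the sketch below reuses a trained direction model on latent codes sampled for a different class. Here `generator` stands for a class-conditional, BigGAN-style callable taking (z, class_idx) and `direction_model` for a trained LinearDirectionModel from the sketch above; both are hypothetical stand-ins rather than the released API.

import torch

@torch.no_grad()
def transfer_direction(generator, direction_model, k, class_idx, n=4, dim=128):
    # Sample latent codes for the target class and apply the k-th direction,
    # even though it was trained on latents of a different class.
    z = torch.randn(n, dim)
    originals = generator(z, class_idx)                   # unedited images
    edited = generator(direction_model(z, k), class_idx)  # edited with direction k
    return originals, edited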

Comparison with other methods

We compare how the directions found on FFHQ differ across methods. The figure shows a visual comparison of several directions found in common by all methods, including Smile, Lipstick, Elderly, Curly Hair, and Young.

Comparison of manipulation results on the FFHQ dataset with the GANSpace and SeFa methods. The leftmost image is the original, while images denoted with ↑ and ↓ show the edit applied in the positive and negative direction, respectively.

BibTeX

@misc{yuksel2021latentclr,
      title={LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions},
      author={Oğuz Kaan Yüksel and Enis Simsar and Ezgi Gülperi Er and Pinar Yanardag},
      year={2021},
      eprint={2104.00820},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Acknowledgments

This publication has been produced benefiting from the 2232 International Fellowship for Outstanding Researchers Program of TUBITAK (Project No: 118C321). We also acknowledge the support of NVIDIA Corporation through the donation of the TITAN X GPU and GCP research credits from Google. We thank Irem Simsar for proofreading our paper.