In this paper, we propose a graph-based image-to-image translation framework for generating images. We use rich data collected from the popular creativity platform Artbreeder (http://artbreeder.com), where users interpolate multiple GAN-generated images to create artworks. This unique approach of creating new images leads to a tree-like structure where one can track historical data about the creation of a particular image. Inspired by this structure, we propose a novel graph-to-image translation model called Graph2Pix, which takes a graph and corresponding images as input and generates a single image as output. Our experiments show that Graph2Pix is able to outperform several image-to-image translation frameworks on benchmark metrics, including LPIPS (with a 25% improvement) and human perception studies (n=60), where users preferred the images generated by our method 81.5% of the time.
Sample lineage data up to two levels (the full tree can be accessed via https://www.artbreeder.com/lineage?k=1fcdf872ec11c80e955bb5c1). The creators of the images are annotated with labels.
One of the most popular GAN-based creativity platforms is Artbreeder. The platform's easy-to-use interface has attracted thousands of users and enabled them to generate over 70 million GAN-based images. Artbreeder lets users create new images with BigGAN- [4] or StyleGAN-based models by adjusting parameters or blending different images. Users can breed new images from a single one by editing genes such as age, gender, or ethnicity, or create new ones by crossbreeding multiple images together. The crossbreeding functionality is unique in that it gives each generated image lineage data, in which the ancestors of the image can be tracked in a tree-based structure.
The lineage structure of images generated with Artbreeder opens up a wide range of possible applications, but also presents unique challenges. The lineage information provides a tree-like structure in which one can trace the parents, grandparents, and further ancestors of a given image. However, it is not entirely clear how such a structure can be used in GAN-based models. For example, how can we generate a child image based on a list of its ancestors? One could try to use image-to-image translation methods such as Pix2Pix or CycleGAN and use a single ancestor to generate the child image (e.g., feeding Parent1 and generating Child1 in the figure). However, this approach results in a significant loss of information, since only one ancestor can be used (e.g., Parent2 and the grandparents are not considered).
In this paper, we propose a novel image-to-image translation method that takes multiple images and their corresponding lineage structure as input and generates the target image as output. To provide a general solution, our framework accepts an arbitrary graph structure, of which the tree structure of lineage data is a special case. To the best of our knowledge, this is the first image-to-image translation method that uses a graph-based structure.
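As a concrete illustration of how lineage data can be expressed as a graph input, the sketch below builds an adjacency matrix and a stack of node images from a toy lineage tree. The node ordering, image sizes, and normalization scheme are illustrative assumptions on our part, not the paper's exact preprocessing.

import torch

# Hypothetical lineage: node 0 is the child, nodes 1-2 its parents,
# nodes 3-4 the grandparents of node 0 through parent 1.
edges = [(1, 0), (2, 0), (3, 1), (4, 1)]   # (ancestor -> descendant)
num_nodes = 5

# One RGB image per lineage node (placeholder tensors, e.g. 256x256).
images = torch.randn(num_nodes, 3, 256, 256)

# Symmetric adjacency with self-loops, a common choice for GCN-style layers.
adj = torch.eye(num_nodes)
for src, dst in edges:
    adj[src, dst] = 1.0
    adj[dst, src] = 1.0

# Row-normalize so each node averages over itself and its neighbors.
adj = adj / adj.sum(dim=1, keepdim=True)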
An illustration of Graph2Pix. During the generation process (shown on the left), our 2-layer GCN module takes A1 ... An, the ancestors of an image, as input and generates the prediction. The discriminator (shown on the right) takes the concatenation of the input images A1 ... An and the generated image G(x) and produces a prediction. Note that we place the GCN after the convolutional layer to use it more efficiently.
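To make the architecture in the caption concrete, here is a minimal, self-contained sketch of the same idea: a shared convolutional encoder produces per-ancestor features, a 2-layer GCN mixes them over the lineage adjacency matrix (as built in the sketch above), and a decoder maps the child node's features to an image. The layer sizes, the toy decoder, and the way the GCN output is merged back are our own simplifications, not the paper's exact architecture.

import torch
import torch.nn as nn

class Graph2PixSketch(nn.Module):
    """Toy generator: conv encoder -> 2-layer GCN over the lineage graph -> decoder."""

    def __init__(self, feat_dim=64):
        super().__init__()
        # Shared convolutional encoder applied to every ancestor image A_i.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Two GCN layers: a linear map applied after neighborhood averaging
        # with the row-normalized adjacency matrix.
        self.gcn1 = nn.Linear(feat_dim, feat_dim)
        self.gcn2 = nn.Linear(feat_dim, feat_dim)
        # Toy decoder mapping the child node's features back to an image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, ancestors, adj):
        # ancestors: (N, 3, H, W) images of the lineage nodes
        # adj:       (N, N) row-normalized adjacency of the lineage graph
        feats = self.encoder(ancestors)                     # (N, C, H/2, W/2)
        n, c, h, w = feats.shape
        x = feats.flatten(2).permute(0, 2, 1)               # (N, H*W/4, C)
        x = torch.relu(self.gcn1(torch.einsum('ij,jpc->ipc', adj, x)))
        x = torch.relu(self.gcn2(torch.einsum('ij,jpc->ipc', adj, x)))
        x = x.permute(0, 2, 1).reshape(n, c, h, w)
        return self.decoder(x[:1])                          # predicted child image

# The discriminator described in the caption would see the ancestors and the
# generated image concatenated along the channel dimension, for example:
#   fake = model(images, adj)                               # (1, 3, H, W)
#   d_input = torch.cat([images.flatten(0, 1).unsqueeze(0), fake], dim=1)

A full model would use deeper encoder/decoder stacks and multi-scale discriminators; the sketch only shows where the graph convolution sits relative to the convolutional features, as the caption notes.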
We compared our method against several baselines quantitatively and qualitatively. Our quantitative experiments show that our method outperforms the competitors, improving LPIPS, FID, and KID scores by 25%, 0.3%, and 12%, respectively. Furthermore, our qualitative experiments show that human participants prefer the images generated by our method over those of the competitors 81.5% of the time.
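For reference, LPIPS, the metric with the largest reported gain, can be computed with the official lpips package as sketched below; the file names are placeholders for a generated image and its ground-truth child, not files provided with the paper.

import lpips
import torch

# LPIPS with the AlexNet backbone, as in the original metric implementation.
loss_fn = lpips.LPIPS(net='alex')

# load_image reads an image in [0, 255]; im2tensor rescales it to [-1, 1].
generated = lpips.im2tensor(lpips.load_image('generated_child.png'))    # placeholder path
ground_truth = lpips.im2tensor(lpips.load_image('real_child.png'))      # placeholder path

with torch.no_grad():
    distance = loss_fn(generated, ground_truth)
print(f'LPIPS distance: {distance.item():.4f}')   # lower means more similar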
A qualitative comparison of our method with the top-performing competitors, Pix2PixHD and U-GAT-IT. First-level ancestors (Parent1 and Parent2) are shown on the left, and the ground-truth image is denoted as Child. As can be seen from the results, our method is able to incorporate the ancestor information when generating the images, whereas Pix2PixHD and U-GAT-IT are limited to a single ancestor image (Parent1). Due to lack of space, only first-degree parents are shown; however, a sample lineage for the bottom-left image can be seen on Artbreeder (https://www.artbreeder.com/lineage?k=f8c31131db9a73ac0425).
The results for each question in our human evaluation survey are shown on the left (the x-axis represents the image IDs, the y-axis represents the number of votes received for each method). The top-performing image, where participants unanimously found our method was most successful, is shown in the upper right. Our method is the least successful on the image shown on the bottom right, where participants preferred the Pix2PixHD result.
@inproceedings{graph2pix2021,
  title={Graph2Pix: A Graph-Based Image to Image Translation Framework},
  author={Gokay, Dilara and Simsar, Enis and Atici, Efehan and Ahmetoglu, Alper and Yuksel, Atif Emre and Yanardag, Pinar},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={2001--2010},
  year={2021}
}
This publication was produced with support from the 2232 International Fellowship for Outstanding Researchers Program of TUBITAK (Project No: 118c321). We also acknowledge the support of NVIDIA Corporation through the donation of a TITAN X GPU, and GCP research credits from Google. We would also like to thank Joel Simon for their support in collecting the dataset.