While recent works have shown that it is possible to find disentangled directions in the latent space of image generation networks, finding directions in the latent space of sequential models for music generation remains a largely unexplored topic. In this work, we propose a method for discovering linear directions in the latent space of a music generating Variational Auto-Encoder (VAE).
We use PCA, a statistical method to transform the input data such that the variation along the new axes is maximized. We apply PCA on the latent space activations of our model and find largely disentangled directions that change the style and characteristics of the input music. Our experiments show that the found directions are often monotonic, global and encode fundamental musical characteristics such as colorfulness, speed and repetitiveness. Moreover, we propose a set of quantitative metrics to describe different musical styles and characteristics to evaluate our results. We show that the found directions decouple content and can be utilized for style transfer and conditional music generation tasks.
This publication has been produced benefiting from the 2232 International Fellowship for Outstanding Researchers Program of TUBITAK (Project No:118c321). We also acknowledge the support of NVIDIA Corporation through the donation of the TITAN X GPU and GCP research credits from Google.