In ACM Creativity & Cognition 2022

MIDISpace: Finding linear directions in latent space for music generation

Meliksah Turker*, Alara Dirik*, Pinar Yanardag
Bogazici University

Change in musical features of slow and colorful music (five random samples of each) as the latent space is traversed along the directions of the principal components (PCs). Positive values on the x-axis indicate taking s steps in the direction of the corresponding PC; negative values indicate steps in the opposite direction. Local fluctuations are due to sampling in the VAE.


While recent works have shown that it is possible to find disentangled directions in the latent space of image generation networks, finding directions in the latent space of sequential models for music generation remains largely unexplored. In this work, we propose a method for discovering linear directions in the latent space of a music-generating Variational Auto-Encoder (VAE).

We use principal component analysis (PCA), a statistical method that transforms the input data so that the variance along the new axes is maximized. We apply PCA to the latent space activations of our model and find largely disentangled directions that change the style and characteristics of the input music. Our experiments show that the discovered directions are often monotonic and global, and that they encode fundamental musical characteristics such as colorfulness, speed and repetitiveness. Moreover, we propose a set of quantitative metrics that describe different musical styles and characteristics, and we use them to evaluate our results. We show that the discovered directions decouple content from style and can be used for style transfer and conditional music generation tasks.
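The general idea of applying PCA to latent activations and then stepping along a principal component can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `find_pca_directions` and `traverse`, the `step_size` parameter, and the synthetic latent vectors (a stand-in for the activations a trained VAE encoder would produce) are all assumptions introduced here.

```python
import numpy as np


def find_pca_directions(latents, n_components):
    """Return (mean, directions, explained_variance) for latents of shape (N, D)."""
    mean = latents.mean(axis=0)
    centered = latents - mean
    # SVD of the centered data: the rows of vt are unit-norm principal directions,
    # ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values**2 / (len(latents) - 1)
    return mean, vt[:n_components], explained_variance[:n_components]


def traverse(z, direction, steps, step_size=1.0):
    """Shift latent vector z by s * step_size along direction, for each s in steps."""
    steps = np.asarray(steps, dtype=float)
    return z + step_size * steps[:, None] * direction


# Hypothetical stand-in for encoder outputs: 1000 latent vectors of dimension 4,
# with most of the variance along the first axis.
rng = np.random.default_rng(0)
scales = np.array([5.0, 2.0, 1.0, 0.5])
latents = rng.normal(size=(1000, 4)) * scales

mean, directions, variance = find_pca_directions(latents, n_components=2)

# Negative step counts move against the principal component, positive ones along it.
path = traverse(latents[0], directions[0], steps=[-2, -1, 0, 1, 2])
```

In a real pipeline, each row of `path` would be passed to the VAE decoder to produce a piece of music, and the musical features of the decoded outputs would then be compared across step counts.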

More details are coming soon.


This publication was produced with support from the 2232 International Fellowship for Outstanding Researchers Program of TUBITAK (Project No: 118c321). We also acknowledge the support of NVIDIA Corporation through the donation of the TITAN X GPU, and GCP research credits from Google.