MIDISpace: Finding linear directions in latent space for music generation

While recent work has shown that disentangled directions can be found in the latent space of image generation networks, finding such directions in the latent space of sequential models for music generation remains largely unexplored. In this work, we propose a method for discovering linear directions in the latent space of a music-generating Variational Auto-Encoder (VAE).

We use principal component analysis (PCA), a statistical method that transforms the input data so that the variance along the new axes is maximized. We apply PCA to the latent space activations of our model and find largely disentangled directions that change the style and characteristics of the input music. Our experiments show that the directions found are often monotonic and global, and that they encode fundamental musical characteristics such as colorfulness, speed, and repetitiveness. Moreover, we propose a set of quantitative metrics that describe different musical styles and characteristics, and we use them to evaluate our results. We show that the directions found decouple content and can be used for style transfer and conditional music generation tasks.
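The general idea of applying PCA to latent activations and then moving a latent code along a principal direction can be illustrated with a minimal sketch. This is not the paper's implementation: the latent vectors here are random stand-ins for encoder outputs, the sizes (500 samples, 16 dimensions) are arbitrary, and the helper names `pca_directions` and `move_along` are hypothetical. In practice the edited latent code would be passed to the VAE decoder, which is omitted.

```python
import numpy as np


def pca_directions(latents, n_components):
    """Return (mean, directions, variance) for an (N, D) matrix of latent codes."""
    mean = latents.mean(axis=0)
    centered = latents - mean
    # Rows of vt are unit-norm principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values**2 / (len(latents) - 1)
    return mean, vt[:n_components], variance[:n_components]


def move_along(z, direction, alpha):
    """Shift latent code z by alpha units along a unit-norm direction."""
    return z + alpha * direction


# Assumption: random data stands in for the latent activations of a trained encoder.
rng = np.random.default_rng(0)
scales = np.linspace(3.0, 0.5, 16)  # give each latent axis a different spread
latents = rng.normal(size=(500, 16)) * scales

mean, directions, variance = pca_directions(latents, n_components=4)

# Edit one latent code along the first principal direction at several strengths.
z = latents[0]
edited = [move_along(z, directions[0], alpha) for alpha in (-2.0, 0.0, 2.0)]
```

Sweeping `alpha` over a range and decoding each edited code is one way to check whether a direction changes a musical attribute monotonically.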

More details are coming soon.

In ACM Creativity & Cognition 2022
