Abstract

We create and study a generative model for Irish traditional music based on Variational Autoencoders, and we analyze the learned latent space in search of musically significant correlations in the distributions of the latent codes, with the goal of performing musical analysis on data. We train two kinds of models: one on a dataset of Irish folk melodies and one on bars extracted from those melodies, each in five variations of increasing size. We conduct the following experiments: we inspect the latent space of tunes and bars in relation to key, time signature, and the estimated harmonic function of bars; we search for links between tunes in a particular style (e.g. “reels”) and their position in latent space relative to other tunes; and we compute distances between the embedded bars of a tune to gain insight into the model’s notion of similarity between bars. Finally, we show and evaluate generated examples. We find that the learned latent space does not explicitly encode musical information and is therefore unusable for musical analysis of the data, while the generative results are generally good and not strictly dependent on the musical coherence of the model’s internal representation.

Introduction

In the following work, we study the use of generative models based on artificial neural networks applied to Irish folk music.

Similar to natural language, music follows specific rules and patterns. One could, for example, imagine building an iterative rule-based system, but such an approach quickly becomes impractical because of the difficulty of specifying rules for high-level concepts (e.g. “a meaningful sentence” or “a good melody”). Having a large dataset of Irish folk music at our disposal, we chose machine learning instead, which does not require an explicit formulation of rules but rather infers them from data. In particular, we use deep learning, which is the state of the art for both natural language and music generation.

Inspired by “The AI Music Generation Challenge 2022”, we were interested in creating a model capable of both generating and analyzing musical material. We therefore chose Variational Autoencoders as a starting point: they are generative models, and they learn an internal representation of the training data that we aim to use for musical analysis.
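As a brief illustration of the mechanism behind this choice (a generic sketch, not code from this work): a VAE's encoder outputs a mean and a log-variance for each latent dimension, and sampling is made differentiable through the reparameterization trick, z = mu + sigma * eps. A minimal NumPy sketch, with hypothetical batch and latent sizes:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way keeps mu and log_var outside the
    random draw, so gradients can flow through them during training.
    """
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

# Hypothetical encoder output: a batch of 4 tunes, 8 latent dimensions.
rng = np.random.default_rng(0)
mu = np.zeros((4, 8))
log_var = np.zeros((4, 8))  # log_var = 0 means sigma = 1
z = reparameterize(mu, log_var, rng)
print(z.shape)  # (4, 8)
```

The same function is reused at generation time: decoding samples of z drawn around different means yields variations of the corresponding tunes.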

We expect to find meaningful links between the model’s internal representation and the musical properties of the data while being able to generate plausible and interesting Irish folk music.

Music has often entered the realm of computer science: from Hiller & Isaacson’s Illiac Suite in 1958 to the most recent deep models, algorithmic and AI-driven musical composition already has a long history. A variety of approaches have been proposed, from expert systems to Markov Chains; two of the most effective results come from Google’s Magenta team, which created MusicVAE [] and Music Transformer []. These models are remarkably effective at representing and generating musical excerpts and have inspired numerous works, including the present one.

Irish traditional music

By Irish traditional music, we mean a collection of tunes and melodies, mostly monophonic, with no specific harmonic accompaniment, native to Ireland and built up over centuries of continuous playing and composition. It is usually performed in “sessions”, where several musicians play the same melody at once, at times backed by a rhythm section (for example piano, guitar, or accordion).

This paper focuses on this particular genre of music for a few reasons:

All of this allows us to build and train models easily and simplifies the analysis of the generated material.

Moreover, each melody in the dataset is stored in ABC notation, a textual, human-readable encoding of musical notation; this allows existing text-based models to be repurposed with few modifications.
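To give a concrete sense of the format (the fragment below is an illustrative tune written for this example, not one taken from the dataset): an ABC tune consists of header fields such as X (index), T (title), M (meter), L (default note length), and K (key), followed by the melody as plain text. Because the entire tune is a character string, it can be fed to text-based models almost directly, and fields like key and meter can be read off with trivial string handling:

```python
# Illustrative ABC fragment (hypothetical jig, not from the dataset).
abc_tune = """X:1
T:Example Jig
M:6/8
L:1/8
K:D
DED F2A|d2f ecA|G2B F2A|E3 E2:|"""

def parse_abc_headers(abc_text):
    """Split an ABC tune into its 'X:1'-style header fields (as a dict)
    and the remaining melody body."""
    headers, body = {}, []
    for line in abc_text.splitlines():
        if len(line) > 1 and line[1] == ":" and line[0].isalpha():
            headers[line[0]] = line[2:].strip()
        else:
            body.append(line)
    return headers, "\n".join(body)

headers, melody = parse_abc_headers(abc_tune)
print(headers["K"], headers["M"])  # D 6/8
```

Header fields such as K and M are exactly the labels (key, time signature) used later when inspecting the latent space.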

As stated before, other works have targeted the problem of Irish folk music generation.