September 13-15 2022, Online, Anywhere

3rd Conference on AI Music Creativity

Welcome to the 3rd Conference on AI Music Creativity!

This year's edition was hosted from Japan and held online, with hopes of returning to an in-person format in the near future.

AIMC 2022 brought together several collaborating institutions and partners in a three-day program featuring paper presentations, concerts, panel discussions, workshops, and three keynotes.

Important Dates

CFP launch: 15 January 2022
Submission deadline: 9 May 2022 (AoE)
Notification of acceptance: 11 July 2022
NEW! Camera-ready deadline: 25 July 2022
Conference dates: 13-15 September 2022

Conference Theme

The Sound of Future Past — Colliding AI with Music Tradition

AI is becoming increasingly ubiquitous in the music industry, giving end users easy access to music experimentation and creation: generating original compositions, cinematic soundtracks, orchestrations, digital masters, or novel remixes from audio stems, among other things. Of course, these newly acquired affordances are not devoid of political, sociological, and aesthetic implications, and they shape our understanding and enjoyment of the musical experience. These implications are particularly salient when AI techniques and methods are applied to the generation of music modeled on traditional musical expressions and heritages. For example, AI involvement with musical traditions foregrounds issues of copyright, technocratic appropriation, dilution of the specificities of musical expression, standardization of genres, perpetuation of embedded musico-cultural biases and clichés, and homogenization of the aesthetic experience through indiscriminate use of music corpora, to name but a few.

In this edition of AIMC, we were particularly interested in contributions that addressed such themes.

Call for Participation


There are two formats for paper submissions:

- Full papers: 8 pages maximum
- Short papers: 4 pages maximum

Note that the maximum length does not include references.


All papers must use the AI Music Creativity 2022 template, which you can download here (do not use previous years' templates). Alternatively, you might want to use our Overleaf template. Papers must be submitted via CMT3.
Submissions must be anonymized and will be subject to peer review by at least three independent reviewers. We encourage audio and video material to accompany and illustrate submitted papers. We ask that authors arrange their own web hosting of audio and video files and provide URL links to all such files within the text of the submitted paper.

At least one author of each accepted submission will be required to register for the conference.


All accepted papers, regardless of their format, will be included in the Conference Proceedings. The proceedings will be published as an electronic Open Access publication with DOI. Selected papers will be published, in expanded form, in a special issue of the Journal of Creative Music Systems.

We invite artistic submissions that deal with topics related to AI music creativity, human-machine co-creativity, and machine learning. We especially encourage and promote submissions that address AI from diverse perspectives with regard to questions of ethics, aesthetics, ecology, performance, and gender. Due to the format of this year's edition, the music program will take place online and feature recorded material, online installations and live streamed discussions with the participating artists. Within these constraints, the conference provides a number of different formats in order to represent a broad range of artistic approaches.


There are three formats for music work submissions:

- Concert/Performance
- Fixed Media
- Web-based

Special Call for Music Generation

AI x Japanese Music Tradition

In this year's edition of AIMC, we are particularly interested in how the conference's theme maps to the music tradition of Japan. To this end, we note a substantial lack of musical corpora and resources: for example, the Essen Folk Song Collection and the Nottingham Folk Music Database have long been used in generative and musicological tasks of a computational nature, while others have explored traditional Irish folk music or Canadian folk songs. While there exist collections of Carnatic, Turkish Makam, Beijing Opera, and Arab Andalusian music, little exists for traditional Japanese music. For this year's special call for music generation, we invite submissions of AI-driven, Japanese-inspired music works and soundscapes. Criteria for submissions are:

  • Music works must comprise an element of AI technology, or what is perceived/referred to as such (e.g., machine learning, deep learning, seemingly creative and autonomous generative processes, computer-assisted composition, etc.)
  • Said element must use or be based on Japanese folk music for the generation of musical output/interaction
  • All works must be original and composed/designed/performed specifically for this music generation challenge


For all formats, please submit the following via CMT3:

  • A two-page A4-portrait PDF containing
    • A description of your work and an explanation of how it relates to the topics of the conference
    • A link to audio files (fixed media), video recordings (concert/performance) or web-based live demo
    • A biography (name, affiliation, short biography up to 150 words)

  • A score and/or a link to a repository with the source code, if applicable

We strongly encourage submissions accompanied by a short paper (follow the instructions in the Papers call). All submissions will be reviewed by independent members of the selection committee. Once these reviews have been completed, the reviewers and the chairs will discuss the works, the reviews, and their technical feasibility. Reviewers will be anonymous to the authors. At least one artist per work must register for the conference.


Workshop and tutorial proposals should include:
- A two-page workshop/tutorial description, outlining its aims, scope, presentation and participation formats
- Bios of the workshop leaders/tutors
- A detailed description of technical requirements
- A link to audio and/or video material that showcases the technological components and presentation formats described in the proposal

All workshop and tutorial proposals must be submitted to [deactivated link]. Submissions do not need to be anonymized and will be evaluated by a review panel based on their feasibility, impact and relevance to the conference themes and topics. Software and hardware components should be tested and working by the time of submission. All workshops and tutorials will be held online.

About AIMC

AIMC's principal goal is to bring together scholars and artists interested in the virtual emulation of musical creativity and its use for music creation, and to provide an interdisciplinary platform to promote, present and discuss their work in scientific and artistic contexts.

You can find the previous editions here.

Organising Committee

Conference Chair & Webmaster

Stefano Kalonaris, RIKEN, Japan

Artistic Program Chairs

Akihiro Kubota, Tama Arts University, Japan

Scientific Program Chairs

Taketo Akama, Sony Computer Science Laboratories, Inc., Japan
Eric P. Nichols, Zillow Group, USA

Workshops & Tutorials Chairs

Hiroya Miura, RIKEN, Japan
Iris Yuping Ren, Utrecht University, Netherlands

Virtual Conference Chairs

Anna Aljanaki, University of Tartu, Estonia
Gianluca Micchi, IRIS Audio Technologies, UK

Public Relations

Motoko Kakubayashi, Freelance Writer, Japan

Steering Committee

Philippe Pasquier, Simon Fraser University, Canada
Robin Laney, The Open University, UK
Anna Jordanous, University of Kent, UK
Roisin Loughran, Dundalk Institute of Technology, Ireland
Steven Jan, University of Huddersfield, UK
Valerio Velardo, MusiMAP
Bob L. T. Sturm, Royal Institute of Technology, Sweden
Artemi-Maria Gioti, University of Music Carl Maria von Weber Dresden, Germany

Scientific Program Committee

Anna Aljanaki, Cyran Aouameur, Oded Ben-Tal, Andrea Bolzoni, Jean-Pierre Briot, Ke Chen, Kin Wai Cheuk, Laura Cros Vila, Tom Davis, Roger Dean, Ken Déguernel, Hao-Wen Dong, Andrea Franceschini, Satoru Fukayama, Siddharth Gurunani, Simon Holland, Steven Jan, Anna Jordanous, Tetsuro Kitahara, Stefan Lattner, Keon Ju Lee, Wei-Hsiang Liao, Roisin Loughran, Marco Martinez Ramirez, Panos Mavromatis, Gianluca Micchi, Ashis Pati, Elio Quinton, Adam Roberts, Robert Rowe, Richard Savery, Hugo Scurto, Ian Simon, Jason Smith, Bob Sturm, Kivanç Tatar, George Tzanetakis, Mauro Vallati, Juan Vasquez, Valerio Velardo, Benedikte Wallace, Anna Weisling, Kazuyoshi Yoshii, Maximos Kaliakatsos-Papakostas

Artistic Program Committee

Alo Allik, Tim Blackwell, Oliver Bown, Teresa Carrasco Garcia, Scott Cazan, Marko Ciciliani, Cathy Cox, Agostino Di Scipio, Frédéric Dufeu, Robert Ek, Artemi-Maria Gioti, Owen Green, Bjarni Gunnarsson, Kazuhiro Jo, Shelly Knotts, Adam Linson, Scott McLaughlin, Jonathan Reus, Alexander Sigman, Nikos Stavropoulos, Pierre Alexandre Tremblay, Sebastian Trump, Iannis Zannos, Ivan Zavada



The proceedings have been published as an electronic Open Access publication with DOI. You can find all the papers here or bulk download them from here.


Visit the AIMC 2022 YouTube channel for the video recordings of keynote speeches, spotlight talks, panel discussions, music sessions, and paper presentations.



Click on the different tabs to view details for each day.

Note: all times are JST (UTC + 9) in 24-hour format

Day One

Tuesday, September 13

10:45 – 11:00 Opening Session
11:00 – 12:00 Industry Session
12:00 – 14:00
14:00 – 16:00 Workshop
16:00 – 17:00 Industry Session
17:00 – 18:00
18:00 – 18:15 Spotlight Talk
18:15 – 19:00 Keynote Speech
19:00 – 20:00
20:00 – 22:00 Workshop
Day Two

Wednesday, September 14

11:00 – 13:00 Paper Session
13:00 – 14:00 Music Session
14:00 – 17:00
17:00 – 17:15 Spotlight Talk
17:15 – 18:00 Panel Discussion
18:00 – 18:15 Spotlight Talk
18:15 – 19:00 Keynote Speech
19:00 – 20:00
20:00 – 22:00 Paper Session
22:00 – 23:00 Music Session
Day Three

Thursday, September 15

12:00 – 13:00 Music Session
13:00 – 15:00
15:00 – 15:45 Panel Discussion
16:00 – 18:00 Paper Session
18:00 – 18:15 Spotlight Talk
18:15 – 19:00 Keynote Speech
19:00 – 20:00
20:00 – 22:00 Paper Session
22:00 – 23:00 Music Session
23:00 – 23:15 Closing Session
JST 10:45 – 11:00 | UTC 01:45 – 02:00

Opening Session

Session Chair: Stefano Kalonaris

JST 11:00 – 12:00 | UTC 02:00 – 03:00

Industry Session

Session Chair: Taketo Akama


JST 14:00 – 16:00 | UTC 05:00 – 07:00


by Qosmo


This two-hour tutorial walks you through the process of building real-time timbre transfer models using the DDSP and RAVE algorithms. We will then demonstrate how you can make these models available to musicians and creators by wrapping them with the Neutone SDK, and finally run the models in DAWs.


The Neutone project was launched in the spring of 2022 with the aim of narrowing the gap between AI research and creativity. Our Plugin/SDK/Community framework makes the latest AI technologies accessible to users with few technical prerequisites. In this workshop we hope to involve AI researchers with a wide range of technical specialities in our movement.


  • Training real-time audio models: RAVE and DDSP are among the most exciting real-time sound modeling algorithms. We will set up the development environment together and share our tips on how to prepare data and build quality models using these algorithms.
  • Neutone architecture: a walkthrough of the components of the Neutone framework and how the Neutone VST/AU plugin interfaces between an AI model and a DAW, taking care of all the trickiness of DSP
  • Wrapping AI models: we will outline the development workflow of preparing PyTorch audio processing models for use in DAWs using a simple SDK and demonstrate how easily they can be distributed to potential users
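To make the wrapping idea above concrete, here is a minimal, hypothetical sketch: a trained model is hidden behind a fixed block-based `process()` interface that a plugin host can call in real time. The names (`AudioModelWrapper`, `process`, the stand-in gain "model") are illustrative assumptions, not the actual Neutone SDK API.

```python
class AudioModelWrapper:
    """Wraps a sample-processing 'model' behind a block-based interface,
    loosely mimicking how an SDK exposes a model to a DAW."""

    def __init__(self, model, block_size=4):
        self.model = model          # any callable: list[float] -> list[float]
        self.block_size = block_size

    def process(self, buffer):
        """Process audio block by block, as a plugin host would call us."""
        out = []
        for start in range(0, len(buffer), self.block_size):
            block = buffer[start:start + self.block_size]
            out.extend(self.model(block))
        return out

# Stand-in "model": a simple gain stage instead of a trained network.
gain_model = lambda block: [0.5 * x for x in block]

wrapper = AudioModelWrapper(gain_model, block_size=2)
print(wrapper.process([1.0, 1.0, 0.2, 0.2]))  # [0.5, 0.5, 0.1, 0.1]
```

In a real deployment, the callable would be a PyTorch model and the wrapper would also handle sample-rate conversion, buffering, and parameter mapping, which is the "trickiness of DSP" the workshop addresses.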

Workshop leaders/tutors:

Akira Shibata is COO of Qosmo, Inc., where he is responsible for business execution and products. Akira has an extensive background in applying AI/ML to a wide range of industrial use cases, having previously been CEO of DataRobot Japan. Akira received his Ph.D. in Experimental Particle Physics from Queen Mary, University of London.
Andrew Fyfe is a Tech Lead on the Neutone project at Qosmo, Inc. He is an AI and DSP engineer who has worked at Krotos Audio, Audio Imperia and Otago Engineering, developing audio software for music/sound production. His PhD research focuses on Neural Audio Synthesis, Neural Audio effects and developing tools for artists that utilise AI audio technologies. Andrew is currently pursuing his PhD at the University of Glasgow.
Bogdan Teleaga is an AI Engineer at Qosmo, Inc. where he is building different kinds of AI solutions for music related use cases. His work includes building innovative solutions as well as deploying them into production at different levels of the stack. Bogdan graduated from the University of Amsterdam and previously worked at CERN and startups in the cloud computing and AI fields.
Naotake Masuda works as an AI engineer for the Neutone project, building new AI models for Neutone. He is a Ph.D student at the University of Tokyo and his main research interests are audio synthesis, machine learning, and AI-assisted music production. His research focuses on AI systems for assisting the use of synthesizers. He has worked on the FlowSynth project during his time as an intern at IRCAM.

JST 16:00 – 17:00 | UTC 07:00 – 08:00

Industry Session

Session Chair: Taketo Akama


JST 18:00 – 18:15 | UTC 09:00 – 09:15

Spotlight Talk

by Valerio Velardo

In this talk, I present a post-mortem analysis of Open Source Research, a collaborative AI Music research project run by The Sound of AI Community. The project lasted 1.5 years, involved more than 150 people with different backgrounds and levels of expertise, and led to the publication of a paper at AIMC 2022. I discuss how we managed the project, set goals, and organised members. I also share insights, DOs and DON'Ts for community-driven research projects.

Valerio Velardo is an AI music consultant with experience in R&D and production. He is the founder of The Sound of AI, the largest YouTube channel and online community on AI Music. He works as MLOps lead at Utopia Music.

JST 18:15 – 19:00 | UTC 09:15 – 10:00

Keynote Speech

by Nao Tokui

How can we use AI to enhance our creativity and create something original? This talk will present a series of music performances based on real-time interaction with AI models, including dance music performances using generative music models in both symbolic and signal domains. I'll also introduce a deep learning-based real-time audio processing tool/platform for AI researchers and musicians. By illustrating the process of realizing the performances and how I used the tool, I'd like to discuss how we can create AI music tools musicians actually use and the importance of "misusable" AI.

Nao Tokui is an artist, researcher, and associate professor at Keio University. Tokui received his Ph.D. from The University of Tokyo on his research on Human-Computer Interaction and Artificial Intelligence (AI). While pursuing his Ph.D., he released his first album and collaborated with Nujabes, a legendary Japanese hip-hop producer. He has been exploring the potential expansion of human creativity through the use of AI. His recent works include a collaboration with Brian Eno and AI DJ Project, where a Machine Learning-based DJ plays alongside a human DJ (Tokui, himself), taking turns selecting and playing one song at a time. His works have been exhibited at New York MoMA, Barbican Centre (London), and InterCommunication Center (Tokyo), and more. In April 2019, he started his professorship at Keio University and founded Computational Creativity Lab to advance his practices in research and educational environment as well. He published his first book on AI and Creativity in January 2021.

JST 20:00 – 22:00 | UTC 11:00 – 13:00


by Bob L. T. Sturm

Description & Aim

The Machine Folk Music School introduces participants to Ai-generated (machine) folk music through practice - as if such a thing had been done for centuries.


One AI-generated folk tune will be taught and discussed. A brief presentation of the originating machine learning system will be given. There will be a pause in the middle of the school as well.


The tune will first be performed. Then it will be taught gradually by repeating small phrases and combining them to form the parts. Participants should be comfortable with their musical instrument of choice and be able to learn by ear (but music notation will be provided).

Workshop leader/tutor

Bob L. T. Sturm has been experimenting with machine folk music since building a folk-music Ai in 2015 “for a laugh”. He runs the blog, Tunes from the Ai Frontiers, where every week a new machine folk tune is premiered. In 2020, he founded The Society for the Preservation and Promotion of Machine Folk Music (v1.1), with support of the MUSAiC project (Music at the Frontiers of Artificial Creativity and Criticism, ERC-2019-COG, No. 864189). One of the remits of The Society is the organization of Machine Folk Music Schools – which have appeared at AIMC 2020 and 2021. Sturm has also been learning Irish traditional music and accordion since 2018.

JST 11:00 – 13:00 | UTC 02:00 – 04:00

Paper Session

Session Chair: Eric P. Nichols

  • Sarah Lecompte-Bergeron, Dominic Thibault, and I Putu Arya Deva Suryanegara. Automatic Transcription of "Gambang" Balinese Gamelan

    This article describes the automatic transcription process of a gambang ensemble, a type of Balinese gamelan, from multitrack audio recordings. This project showcases the use of artificial intelligence techniques that have proven efficient in the literature, namely multi-layer perceptron classifiers trained on chroma analysis, and here adapted to repertoire that offers specific challenges due to its musical realities. After describing the musical formation and its musical context, we will describe the method used for data acquisition, the analysis process of aforesaid dataset and transcription, concluding with an evaluation of the process' efficiency and discussion on further developments of the algorithm.

  • Philon Nguyen and Eldad Tsabary. Random Walks on Neo-Riemannian Spaces: Towards Generative Transformations

    Random walks, fractional Brownian motion and stochastic processes have been used extensively by composers such as Iannis Xenakis and others, creating instantly recognizable textures. A trained ear can differentiate a uniform random walk from a Poisson process or an fBm process and random rotations. In the opera Sophocles: Antigone by one of the authors of this paper, random walks on neo-Riemannian PLR spaces were experimented with, yielding mixed impressions of process music and post-romantic chromaticism. When the random walk is steered by transformational rules, special textures and harmonies emerge. We propose a new kind of parameterizable random walk, a generative system, on a space of arbitrary-length chords equipped with an arbitrary distance measure, steered from a customizable corpus learned by the system. The corpus provides a particular texture and harmony to the generative process. The learned neo-Riemannian spaces equipped with some distance measure provide the transformational rule base of the concatenative synthesis process.
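For readers unfamiliar with PLR spaces, a textbook-style sketch may help: the code below runs a uniform random walk over major and minor triads using the standard neo-Riemannian P (parallel), L (leading-tone exchange) and R (relative) transformations. This is only an illustration of the underlying idea, not the authors' system, which steers the walk with rules learned from a corpus.

```python
import random

def plr(op, triad):
    """Apply P, L or R to a triad given as (root_pitch_class, quality)."""
    root, quality = triad
    if op == "P":    # parallel: C maj <-> C min
        return (root, "min" if quality == "maj" else "maj")
    if op == "R":    # relative: C maj <-> A min
        return ((root + 9) % 12, "min") if quality == "maj" else ((root + 3) % 12, "maj")
    if op == "L":    # leading-tone exchange: C maj <-> E min
        return ((root + 4) % 12, "min") if quality == "maj" else ((root + 8) % 12, "maj")
    raise ValueError(op)

def random_walk(start, steps, rng=random):
    """Uniform random walk on the PLR space, returning the triad path."""
    path = [start]
    for _ in range(steps):
        path.append(plr(rng.choice("PLR"), path[-1]))
    return path

print(random_walk((0, "maj"), 4))  # a 5-triad path starting from C major
```

Each of P, L and R is an involution (applying it twice returns the starting triad), which is what makes the space navigable as an undirected graph; a steered walk would simply replace the uniform `rng.choice` with a corpus-derived policy.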

  • Elliott Waissbluth, Jon Gillick and Carmine Cella. Synthesis by Layering: Learning a Variational Space of Drum Sounds

    In this work, we demonstrate a variational autoencoder designed to reconstruct drum samples using linear combinations from a small predefined library of existing samples. Inspired by the music production practice of layering two or more samples on top of each other to create rich and unique textures, we synthesize drum sounds by producing sparse sets of mixing coefficients to apply to the predefined library, which are then layered to create new audio samples. By training this model to approximate a range of professionally produced and recorded drum samples, we aim to learn a distribution over possible layering strategies given a fixed sample library, which we can subsequently sample from or otherwise manipulate. We find that varying a particular dimension of the latent vectors in the space learned by the model does not simply linearly scale the mixing weights; rather, it smoothly varies the perceptual nature of the sample by swapping different samples in and out of the sparse mixture. We present a user-interface prototype to engage intuitively with our system, discuss the performance of our modeling approach, and highlight potential applications in a studio production environment.
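The layering step described in this abstract can be sketched in a few lines: a new drum sound is a sparse linear combination of a fixed sample library. In the paper the sparse weights come from a VAE decoder; in this illustrative sketch they are hard-coded, and the "samples" are short lists of floats standing in for audio buffers.

```python
def layer(library, weights):
    """Mix equal-length samples from a library by weight (sparse sum)."""
    length = len(next(iter(library.values())))
    mix = [0.0] * length
    for name, w in weights.items():
        if w == 0.0:
            continue                     # sparsity: most weights are zero
        for i, value in enumerate(library[name]):
            mix[i] += w * value
    return mix

library = {
    "kick":  [1.0, 0.8, 0.3, 0.1],
    "snare": [0.2, 1.0, 0.6, 0.2],
    "hat":   [0.1, 0.1, 0.9, 0.4],
}
# Sparse mixing vector: only two of the three samples contribute.
print(layer(library, {"kick": 0.7, "hat": 0.3, "snare": 0.0}))
```

Sampling the VAE's latent space then amounts to generating different sparse weight vectors, which is how the model swaps samples in and out of the mixture as a latent dimension is varied.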

  • Luke Dzwonczyk, Carmine-Emanuele Cella, Léo Chédin and Helene-Camille Crayencour. Source Separation Methods for Computer-assisted Orchestration

    In this paper, we study the possibility of adding source separation as a pre-processing step to the computer-assisted orchestration process. We first discuss the motivation of this addition and its potential to increase the quality of orchestrations of multi-layered sounds. Second, we select several state-of-the-art models for both music source separation (separation of instruments) and universal sound separation (separation of arbitrary sounds), and compare their effectiveness for the task of orchestration. We assess which methods best suit the needs of orchestration by applying them on our own target sounds, orchestrating the separated outputs, and finally comparing them to the orchestration of the same target without separation. Our experiments show that source separation improves the quality of orchestrations, and the more accurate the separation, the better the resulting orchestration. Finally, we compare unsupervised methods to supervised methods for separation, and comment on the effect of training data selection on performance of supervised methods.

  • Atsuya Kobayashi, Ryogo Ishino, Ryuku Nobusue, Takumi Inoue, Keisuke Okazaki, Shoma Sawa and Nao Tokui. MR4MR: Mixed Reality for Melody Reincarnation

    There is a long history of an effort made to explore musical elements with the entities and spaces around us, such as musique concrète and ambient music. In the context of computer music and digital art, interactive experiences that concentrate on the surrounding objects and physical spaces have also been designed. In recent years, with the development and popularization of devices, an increasing number of works have been designed in Extended Reality to create such musical experiences. In this paper, we describe MR4MR, a sound installation work that allows users to experience melodies produced from interactions with their surrounding space in the context of Mixed Reality (MR). Using HoloLens, an MR head-mounted display, users can bump virtual objects that emit sound against real objects in their surroundings. Then, by continuously creating a melody following the sound made by the object and re-generating randomly and gradually changing melody using music generation machine learning models, users can feel their ambient melody "reincarnating".

  • Aiko Uemura and Tetsuro Kitahara. Morphing-based Reharmonization with VAE: Reducing Dissonance with Consonance-based Loss Function

    This study proposes a model to reduce dissonant chords in morphing-based reharmonization using a variational autoencoder (VAE). The conventional morphing of chords generates dissonance because the generated results are supplemented by the surrounding learning data in the latent space without considering consonance. Therefore, this study focuses on degrees of dissonance that can be calculated from pitch intervals. We define an ideal consonance degree based on the dissonance degree, and we implement a dissonance penalty as a simple network structure by adding nodes to the VAE so that a co-occurrence vector of pitch interval computed from each chord tone approaches the ideal consonance degree. Our objective evaluation demonstrates that the number of standard triads increased compared to the conventional VAE morphing method when the model was used. In addition, the number of non-standard and dissonant chords decreased through our model.
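The "degree of dissonance calculated from pitch intervals" can be illustrated with a rough sketch: count the interval classes among chord tones and weight them with a dissonance table. The weights below are illustrative assumptions, not the paper's actual values or loss formulation.

```python
# Illustrative per-interval-class dissonance weights (0 = unison/octave,
# 1 = semitone, ..., 6 = tritone); these numbers are assumptions.
DISSONANCE = {0: 0.0, 1: 1.0, 2: 0.8, 3: 0.2, 4: 0.2, 5: 0.1, 6: 0.65}

def dissonance(chord):
    """Mean pairwise dissonance of a chord given as pitch classes."""
    pairs = [(a, b) for i, a in enumerate(chord) for b in chord[i + 1:]]
    total = 0.0
    for a, b in pairs:
        ic = min((a - b) % 12, (b - a) % 12)   # interval class, 0..6
        total += DISSONANCE[ic]
    return total / len(pairs)

print(dissonance([0, 4, 7]))   # C major triad: low score
print(dissonance([0, 1, 6]))   # semitone cluster with tritone: higher score
```

A penalty of this kind, computed from a co-occurrence vector of pitch intervals, is differentiable when the vector is produced by the network, which is what allows it to be added to the VAE's loss.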

JST 13:00 – 14:00 | UTC 04:00 – 05:00

Music Session

Session Chair: Cathy Cox

  • Yiğit Kolat. Inference Engines

  • Fernando Egido. Collaborametrum

  • Jack Walker. Power Trio

JST 17:00 – 17:15 | UTC 08:00 – 08:15

Spotlight Talk

by Ed Newton-Rex

An overview of how companies are commercialising AI music creation.

Ed Newton-Rex founded Jukedeck, an AI music generation company, which was acquired by ByteDance in 2019. At ByteDance, he ran the Music Intelligence lab, then led Product in Europe for TikTok. Since then, he has been a Product Director at Snap, working on music collaboration. He is a mentor at Abbey Road Studios’ music startup incubator, Abbey Road Red, and he composes choral music, which is published by Boosey & Hawkes.

JST 17:15 – 18:00 | UTC 08:15 – 09:00

Panel Discussion

Session Chair: Stefano Kalonaris


Bogdan Teleaga is an AI Engineer at Qosmo, Inc. where he is building different kinds of AI solutions for music related use cases. His work includes building innovative solutions as well as deploying them into production at different levels of the stack. Bogdan graduated from the University of Amsterdam and previously worked at CERN and startups in the cloud computing and AI fields.
Jennifer Walshe is a composer and performer whose music has been commissioned, broadcast and performed all over the world. Many of her recent works involve AI. For ULTRACHUNK, a collaboration with artist Memo Akten, Walshe spent a year filming herself improvising to provide material for an ML system to train on. Her third solo album, A Late Anthology of Early Music Vol. 1: Ancient to Renaissance, released on Tetbind in 2020, uses AI to rework canonical works from early Western music history. A Late Anthology was chosen as an album of the year in The Irish Times, The Wire and The Quietus. Walshe is Professor of Composition at the University of Oxford.
Harry Lehmann studied physics and received his Ph.D. in philosophy from the University of Potsdam in Germany with a thesis on Niklas Luhmann's sociology of art. He has since published a variety of essays and books on art philosophy, music philosophy, and aesthetics, most notably the 2012 book "The Digital Revolution of Music, A Philosophy of Music." This year the follow-up volume, also published by Schott Music, will appear under the title "Music and Reality, Models of a Philosophy of Music".
Alexis André is a researcher, artist and designer aiming at redefining entertainment. In this golden age of computation and data overflow, why is our entertainment still designed to be consumed in a passive way? A few media are offering interactive experiences, but none of them are designed specifically for you. Alexis is working towards a future where you could enjoy unique experiences that were tailored to your preferences, where the power of generative systems is leveraged to offer individually custom-created pieces.
Enongo Lumumba-Kasongo (a.k.a., SAMMUS) is a Black-feminist-rap-artist and -producer from Ithaca, NY with family roots in Côte D'Ivoire and the Congo. She is currently the David S. Josephson Assistant Professor of Music at Brown University, a member of the steering committee for Brown's science, technology, and society program, and a member of theKEEPERS, a Hip Hop collective that is currently developing the most comprehensive digital archive to map the international contributions of womxn and girls across Hip Hop's 50-year history.
Hideki Nakazawa is a representative of the Artificial Intelligence Art and Aesthetics Research Group (AIAARG). His activities include "Silly CG" in the 1990s, "Methodicist Manifesto" in 2000, "New-Methodicist Manifesto" in 2010, and "Manifesto of Artificial Intelligence Art and Aesthetics" in 2016; a patent related to 3D printing, the publication of books including "Art History: Japan 1945-2014," and the release of his CD "Nakazawa Hideki Music Works."

JST 18:00 – 18:15 | UTC 09:00 – 09:15

Spotlight Talk

by Rujing Stacy Huang

In this talk, I examine the visible optimism surrounding an AI-enabled future blessed with the "democratic" production of music: what does it really mean to "democratize" music? What is the ethical significance of this "wave of democratization"? How can we grasp the shifting notions of talent, virtuosity, musical labor, and the "artist-genius" as new technologies continue to alter the social relations of creative production?

Rujing Stacy Huang is a musicologist, singer-songwriter, and currently Presidential Postdoctoral Fellow at the University of Hong Kong. In 2020-21, she was a postdoc in the ERC-funded "MUSAiC" project (ERC-2019-COG No. 864189). She is also the Co-Organizer of the AI Song Contest, and most recently the founder and CEO of her startup, Project Grain (under Grain Music International Ltd.). She completed her PhD in Ethnomusicology from Harvard University in 2019.

JST 18:15 – 19:00 | UTC 09:15 – 10:00

Keynote Speech

By Dorien Herremans

In their more than 60-year history, music generation systems have never been more popular than today. While the number of music AI startups is rising, there are still a few issues with generated music. Firstly, it is notoriously hard to enforce long-term structure (e.g. earworms) in the music. Secondly, making the systems controllable in terms of meta-attributes like emotion would make them practically useful for music producers. In this talk, I will discuss several deep learning-based controllable music generation systems developed over the last few years in our lab. These include TensionVAE, a music generation system guided by tonal tension; MusicFaderNets, a variational autoencoder model that allows for controllable arousal; and a seq2seq controllable lead sheet generator with Transformers. Finally, I will discuss some more recent projects by our AMAAI lab, including generating music that matches a video.

Dorien Herremans is an Assistant Professor at Singapore University of Technology and Design, where she is also Director of Game Lab. Before joining SUTD, she was a Marie Sklodowska-Curie Postdoctoral Fellow at the Centre for Digital Music at Queen Mary University of London, where she worked on the project: "MorpheuS: Hybrid Machine Learning – Optimization techniques To Generate Structured Music Through Morphing And Fusion". She received her Ph.D. in Applied Economics, and graduated as a Business Engineer in management information systems at the University of Antwerp in 2005. After that, she worked as a consultant and was an IT lecturer at the Les Roches University in Bluche, Switzerland. Dr. Herremans' research interests include AI for novel applications such as music and audio. At SUTD she leads the AMAAI Lab on Audio, Music, and AI.

JST 20:00 – 22:00 | UTC 11:00 – 13:00

Paper Session

Session Chair: Roisin Loughran

  • Anna Aljanaki. Attitude towards and evaluation of computer-generated music in music listeners and musicians

    Despite accepting the involvement of computers in such important areas as finance, self-driving and surveillance, society still struggles with the concept of computer as an originator of art, and has difficulties with defining value and place of computer-generated art. In this paper we will explore the questions of value of computer music from the perspectives of music listeners and professional musicians with different backgrounds in a series of open-ended interviews. Several solutions for the problem of lack of intentional agency are proposed.

  • Bob L. T. Sturm. The Ai Music Generation Challenge 2021: Summary and Results

    We discuss the design and results of The Ai Music Generation Challenge 2021 and compare it to the previous year's challenge. While the 2020 challenge focused on the Irish double jig, the 2021 challenge focused on a particular kind of Swedish traditional dance music called slängpolska. Six systems participated in the 2021 challenge, each generating a number of tunes evaluated by five judges, all professional musicians and experts in the music style. In the first phase, the judges reject all tunes that are plagiarised, or that have incorrect meter or rhythm. In the second phase, they score the remaining tunes along four qualities: danceability, structure coherence, formal coherence, and playability. The judges know all the tunes are computer generated, but do not know which tunes come from which systems, or what kinds of machine learning and data are involved. In the third phase, the judges award prizes to the top tunes. This resulted in five tunes garnering first and second prizes, four of which come from one particular system. We perform a statistical analysis of the scores from all judges, which allows a quantitative comparison of all factors in the challenge. Finally, we look ahead to the 2022 challenge.

  • Axel Chemla–Romeu-Santos and Philippe Esling. Challenges in creative generative models for music: a divergence maximization perspective

    The development of generative Machine Learning (ML) models in creative practices, enabled by the recent improvements in usability and availability of pre-trained models, is raising more and more interest among artists, practitioners and performers. Yet, the introduction of such techniques in artistic domains has also revealed multiple limitations that escape current evaluation methods used by scientists. Notably, most models are still unable to generate content that lies outside of the domain defined by the training dataset. In this paper, we propose an alternative framework, starting from a new general formulation of ML objectives, which we use to delineate possible implications and solutions that already exist in the ML literature (notably for the audio and musical domain). We also discuss existing relations between generative models and computational creativity and how our framework could help address the lack of creativity in existing models.

  • Luc Döbereiner. Artistic Potentials of Fallacies in AI Research

    This paper seeks to identify aesthetically productive problems. Based on Melanie Mitchell's much-discussed 2021 paper "Why AI is Harder Than We Think," it seeks to outline four areas of artistic potential that are related to the four "fallacies" in AI research identified by Mitchell. These are underlying assumptions of AI research that have contributed to overconfident predictions. The paper uses these fallacies as a point of departure to discuss the relation of AI research and artistic practice, not from a utilitarian or problem-solving point of view, but rather in order to identify how frictions and fallacies disclose aesthetically productive areas. The paper seeks to demonstrate how these fallacies are not only shortcomings with regard to our understanding of intelligence, but how they are actually at the core of what constitutes aesthetics and artistic practice.

  • Steven Jan. Universal Acid in the Computer Chip: Music, Memetics and Metacreation

    Universal Darwinism (UD) (Plotkin, 1995) holds that the "evolutionary algorithm" (Dennett, 1995, pp. 50–52) operates across the interconnected realms of a "recursive ontology" (Velardo, 2016) that binds together all that exists. Indeed, UD maintains that all phenomena in the universe are emergent properties of Darwinian processes of variation, replication and selection. If true, an evolutionary view of culture must take priority over more "creationist" accounts that rely upon the mysterious intercession of inspiration and imagination and the supposed conscious agency of the composer (or programmer). The most widely accepted (and critiqued) theory of cultural evolution – memetics – arguably has significant explanatory power for human-generated music (HGM), and if one accepts the logic of UD, it follows that computer-generated music (CGM) must in some sense be "accountable" to evolutionary precepts. This paper explores how AIMC can be understood in the light of memetics (as a window on the operation of UD in human culture), even when the outputs of generative systems seem far removed from the structural norms, aesthetic values and sound-worlds of HGM.

  • Mia Windsor. Using raw audio neural network systems to define musical creativity

    This paper will use the hacker-duo Dadabots (who generate raw audio using SampleRNN) and OpenAI's Jukebox project (who generate raw audio using a hierarchical vq-vae transformer) as case studies to assess whether machines are capable of musical creativity, how they are capable of musical creativity, and whether this helps to define what musical creativity is. It will also discuss how these systems can be useful for human creative processes. The findings from evaluating Dadabots' and OpenAI's work will firstly demonstrate that our assumptions about musical creativity in both humans and machines revolve too strongly around symbolic models. Secondly, the findings will suggest that what Boden describes as 'transformational creativity' can take place through unexpected machine consequences.

JST 22:00 – 23:00 | UTC 13:00 – 14:00

Music Session

Session Chair: Cathy Cox

  • William A. Thompson IV. Seven Piano Etudes Speaks the Moody Machine

  • Agustin Jorge Spinetto and Andres Duarte Loza. 30-hour dream – 30時間の夢

  • Kiyoshi Furukawa and Yota Morimoto. Beethoven meets the Wizard of Oz

  • Axel Chemla–Romeu-Santos. Aletheia

JST 12:00 – 13:00 | UTC 03:00 – 04:00

Music Session

Session Chair: Akihiro Kubota

  • Hongshuo Fan. Conversation in the Cloud

  • Mark Hanslip. 'SoloSoloDuo'

  • Farzaneh Nouri. Énacteur x Énacteur I

JST 15:00 – 15:45 | UTC 06:00 – 06:45

Panel Discussion

Session Chair: Tomasz Rutkowski


Elvira Brattico is a Professor at the Center for Music in the Brain, Aarhus University, Denmark, and at the University of Bari Aldo Moro, Italy. She has worked on the EU project Tuning the Brain for Music and directed the Aesthetics module of the Finnish Centre of Excellence in Interdisciplinary Music Research. Her main interests cover neuroplasticity, neuroaesthetics, auditory prediction, and individual differences in auditory processing.
Tatsuya Daikoku is an Assistant Professor at The University of Tokyo, and studies how musical creativity emerges in the brain, using neural and computational approaches. He is also a composer and is interested in developing a neofuturistic music based on the brain's statistical learning.
Natalia Polouliakh is a researcher at Sony CSL Tokyo. Her interest is in understanding how music can impact psychosomatics in order to create technologies for wellness. Natalia is working on a recommendation technology that understands and supports a person's goals in education, healthcare and happiness, using AGCT-Deep and AI methods. She serves as Associate Editor of the journal Frontiers in Systems Biology, in the Interactive Genetics and Genomics and Interactive Systems Neuroscience sections.
Max Frenzel is an AI researcher, health tech entrepreneur and bestselling author. He is the creator of the biofeedback app Yudemon, and part of the healthcare team at Woven City, a prototype city of the future currently under construction at the base of Mt. Fuji.
Kat Agres is an Assistant Professor at the Yong Siew Toh Conservatory of Music, National University of Singapore, where she leads the Music Cognition lab. She recently co-led an interdisciplinary, 15-author roadmap paper on 'Music, Computing, and Health', and is PI of a 4-year grant to develop a 'music-based Brain-Computer Interface' that leverages neurofeedback and a real-time affective music generation system for emotion regulation in listeners.
Shinya Fujii is an Associate Professor at the Keio University Shonan Fujisawa Campus (SFC) and the Director of the NeuroMusic / x-Music Laboratories. He was awarded a scholarship from AN Music School Kyoto as a drummer and completed his Ph.D. in Human and Environmental Studies at Kyoto University by studying the neuromuscular mechanisms underlying skilled drumming movements. His research interests lie in neuroscience and music, particularly the neural origins of the human ability to perceive and produce rhythm, beat and groove.

JST 16:00 – 17:30 | UTC 07:00 – 08:30

Paper Session

Session Chair: Taketo Akama

  • Jeremy J. Lee, Carmine Cella and Helene-Camille Crayencour. Vivace: Web Application for Real-Time Feedback on Piano Performance

    Software tools that provide feedback on music performance are increasingly used by music students to support their daily practice. The dominant piano performance evaluation models are based on metrics such as playing correct notes and global tempo. However, these metrics are often limited in capturing expressive aspects of performance, which can be seen in the slight variations a performer chooses to employ in their playing. In order to capture some of these variations, we propose a web application that compares the user's playing to professional recordings of the same piece. The user's playing is plotted in real time on a tempo-volume graph, alongside performances by great pianists such as Rubinstein and Horowitz. We then give this application to music students and, from the feedback we receive, determine what kind of live performance feedback will be useful to pianists.

  • Nutan Chen, Djalel Benbouzid, Francesco Ferroni, Mathis Nitschke, Luciano Pinna and Patrick van der Smagt. Flat latent manifolds for human-machine co-creation of music

    The use of machine learning in artistic music generation leads to controversial discussions of the quality of art, for which objective quantification is nonsensical. We therefore consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal interplay is to lead to new experiences, both for the musician and the audience. To obtain this behaviour, we resort to the framework of recurrent Variational Auto-Encoders (VAE) and learn to generate music, seeded by a human musician. In the learned model, we generate novel musical sequences by interpolation in latent space. Standard VAEs however do not guarantee any form of smoothness in their latent representation. This translates into abrupt changes in the generated music sequences. To overcome these limitations, we regularise the decoder and endow the latent space with a flat Riemannian manifold, i.e., a manifold that is isometric to the Euclidean space. As a result, linearly interpolating in the latent space yields realistic and smooth musical changes that fit the type of machine–musician interactions we aim for. We provide empirical evidence for our method via a set of experiments on music datasets and we deploy our model for an interactive jam session with a professional drummer. The live performance provides qualitative evidence that the latent representation can be intuitively interpreted and exploited by the drummer to drive the interplay. Beyond the musical application, our approach showcases an instance of human-centred design of machine-learning models, driven by interpretability and the interaction with the end user.
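The linear latent-space interpolation this abstract relies on can be sketched as follows; the vectors, dimensionality and step count are illustrative, and a real system would decode each interpolated point with the trained VAE decoder:

```python
import numpy as np

def interpolate_latents(z_start, z_end, n_steps):
    """Linearly interpolate between two latent vectors.

    On a flat (Euclidean-isometric) latent manifold, equal steps in
    latent space should correspond to perceptually even musical changes."""
    alphas = np.linspace(0.0, 1.0, n_steps)
    return [(1.0 - a) * z_start + a * z_end for a in alphas]

# Hypothetical usage: latent codes of two musical phrases.
z_a = np.zeros(16)   # e.g. the musician's seed phrase
z_b = np.ones(16)    # e.g. a target phrase
path = interpolate_latents(z_a, z_b, n_steps=5)
```

The paper's contribution is precisely what makes such straight-line paths musically meaningful: without the flatness regularisation, equal latent steps can produce abrupt changes in the decoded music.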

  • Behzad Haki, Marina Nieto, Teresa Pelinski and Sergi Jordà. Real-Time Drum Accompaniment Using Transformer Architecture

    This paper presents a real-time drum generation system capable of accompanying a human instrumentalist. The drum generation model is a transformer encoder trained to predict a short drum pattern given a reduced rhythmic representation. We demonstrate that with certain design considerations, the short drum pattern generator can be used as a real-time accompaniment in musical sessions lasting much longer than the duration of the training samples. A discussion on the potentials, limitations and possible future continuations of this work is provided.

  • Victor Shepardson, Jack Armitage and Thor Magnusson. Notochord: a Flexible Probabilistic Model for Embodied MIDI Performance

    Deep learning-based probabilistic models of musical data are producing increasingly realistic results and promise to enter creative workflows of many kinds. Yet they have been little-studied in a performance setting, where the results of user actions typically ought to feel instantaneous. To enable such study, we designed Notochord, a deep probabilistic model for sequences of structured events, and trained an instance of it on the Lakh MIDI dataset. Our probabilistic formulation allows interpretable interventions at a sub-event level, which enables one model to act as a backbone for diverse interactive musical functions including steerable generation, harmonization, machine improvisation, and likelihood-based interfaces. Notochord can generate polyphonic and multi-track MIDI, and respond to inputs with latency below ten milliseconds. Training code, model checkpoints and interactive examples are provided as open source software.
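A sketch of the sub-event-level intervention such a model enables (the field order, field names and toy model below are assumptions for illustration, not Notochord's actual implementation):

```python
import random

def sample_event(model, fixed=None):
    """Sample an event's fields one at a time; `fixed` pins chosen fields.

    Sampling field-by-field lets a user intervene below the event level,
    e.g. fixing the pitch (harmonization) while the model chooses the rest."""
    fixed = fixed or {}
    event = {}
    for field in ("pitch", "velocity", "dt"):
        if field in fixed:
            event[field] = fixed[field]          # user intervention
        else:
            event[field] = model(field, event)   # condition on sampled fields
    return event

# Toy stand-in for the model: uniform choices per field.
def toy_model(field, context):
    choices = {"pitch": range(60, 72), "velocity": range(1, 128),
               "dt": [0.0, 0.25, 0.5]}
    return random.choice(list(choices[field]))

ev = sample_event(toy_model, fixed={"pitch": 64})
```

In the real model each field's distribution is conditioned on the sequence history as well as the fields already sampled, which is what makes one backbone serve generation, harmonization and improvisation alike.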

JST 18:00 – 18:15 | UTC 09:00 – 09:15

Spotlight Talk

by Emily Howard

An introduction to some of the ongoing research collaborations developing through the Centre for Practice & Research in Science & Music (PRiSM) at the Royal Northern College of Music in Manchester, UK.

Emily Howard holds a Personal Chair in Composition at the Royal Northern College of Music and is Director of PRiSM, the RNCM Centre for Practice & Research in Science & Music. Howard's music is commissioned, performed and broadcast all over the world. Her work is known for its inventive connections with mathematical shapes and processes: Antisphere – the latest addition to Howard's ongoing series of orchestral geometries – was commissioned by the Barbican for Sir Simon Rattle and the London Symphony Orchestra and opened the 2019-20 season. PRiSM takes a lead in interdisciplinary and reflexive research between the creative arts and the sciences with a view to making a real contribution to society, to developing new digital technology and creative practice, and to addressing fundamental questions about what it means to be human and creative today.

JST 18:15 – 19:00 | UTC 09:15 – 10:00

Keynote Speech

By Patrick Savage

AI has made great strides in analyzing and generating Western classical and popular music. However, our ability to extend such advances to traditional/non-Western music remains limited, due in part to the limited availability of training data and lower commercial potential. Is this a good or bad thing? What promises and pitfalls await as we expand automated approaches to traditional music and other creative arts? I will discuss recent work from our and other labs attempting to apply automated methods to large samples of traditional music, including the Global Jukebox (coded recordings of >5,000 traditional songs from almost 1,000 societies) and a sample of >10,000 Japanese and English folk songs. I will propose ideas for ethical and equitable ways forward as AI inevitably expands to incorporate more traditional music.

Patrick Savage is an Associate Professor in the Faculty of Environment and Information Studies at Keio University in Japan, where he directs the CompMusic Lab for comparative and computational musicology. Previously, he received his MSc in Psychology from McMaster University, PhD in Ethnomusicology from Tokyo University of the Arts, and was a postdoc in the University of Oxford School of Anthropology. His research focuses on using science to understand cross-cultural diversity and unity in human music. His research has been featured in outlets including Proceedings of the National Academy of Sciences, Behavioral & Brain Sciences, Current Biology, Ethnomusicology, The New York Times, and Nautilus.

JST 20:00 – 22:00 | UTC 11:00 – 13:00

Paper Session

Session Chair: Taketo Akama

  • Koray Tahiroğlu and Miranda Kastemaa. Augmented Granular Synthesis Method for GAN Latent Space with Redundancy Parameter

    In this paper we introduce an augmented granular sound synthesis method for GAN latent space exploration in the audio domain. We use the AI-terity musical instrument for sound-generating events, in which the neural network (NN) parameters are optimised and the resulting features are used as a basis to generate new sounds. The exploration of a latent space is realised by creating a latent space from the original features of the training data set and finding the corresponding audio feature of the vector points in this space. Our proposed sound synthesis method can achieve multiple audio generation and sound synthesis events simultaneously without interrupting the playback grains. To do so, we introduce a redundancy parameter that schedules additional buffer slots divided from a large buffer slot, allowing multiple latent space vector points to be used in granular synthesis, in real time on the GPU. Our implementation demonstrates that augmented buffer schedule slots can be used to explore GAN-latent sound synthesis of granular musical events with multiple generated audio samples without interrupting the granular musical features of the synthesis method.

  • Emmanuel Deruty and Maarten Grachten. "Melatonin": A Case Study on AI-induced Musical Style

    Although the use of AI tools in music composition and production is steadily increasing, as witnessed by the newly founded AI Song Contest, analysis of music produced using these tools is still relatively uncommon as a means to gain insight into the ways AI tools impact music production. In this paper we present a case study of "Melatonin", a song produced through extensive use of BassNet, an AI tool originally designed to generate bass lines. Through analysis of the artists' workflow and song project, we identify style characteristics of the song in relation to the affordances of the tool, highlighting manifestations of style in terms of both idiom and sound.

  • The Sound of AI Community. From Words to Sound: Neural Audio Synthesis of Guitar Sounds with Timbral Descriptors

    Interest in neural audio synthesis has been growing lately both in academia and industry. Deep Learning (DL) synthesisers enable musicians to generate fresh, often completely unconventional sounds. However, most of these applications present a drawback. It is difficult for musicians to generate sounds which reflect the timbral properties they have in mind, because of the nature of the latent spaces of such systems. These spaces generally have large dimensionality and cannot easily be mapped to semantically meaningful timbral properties. Navigation of such timbral spaces is therefore impractical. In this paper, we introduce a DL-powered instrument that generates guitar sounds from vocal commands. The system analyses vocal instructions to extract timbral descriptors which condition the sound generation.

  • Mark R. H. Gotham, Kunpeng Song, Nicolai Böhlefeld and Ahmed Elgammal. Beethoven X: 'Es könnte sein!' ('It could be!')

    To mark the 250th anniversary of Beethoven's birth, Deutsche Telekom commissioned a team to develop Beethoven's fragmentary sketches for a Tenth Symphony using machine learning. Here, the research part of that team discusses some of the key musical and computational decisions involved, including: making sense of Beethoven's few, ambiguous plans for this work; curating that and other musical data in machine-readable formats; and identifying suitable generation tasks, relevant training materials, and network architectures. Most of all, we consider highly topical themes for AI in the creative arts today, notably the interaction not only between human and computer, but also between living and historical artists.

  • Fabian Ostermann, Igor Vatolkin, and Günter Rudolph. Artificial Music Producer: Filtering Music Compositions by Artificial Taste

    Human composers arrive at creative decisions on the basis of their individual musical taste. For automatic algorithmic composition, we propose to embrace that concept and encode taste as a binary classification task. We identify and reconsider an implicit assumption: that each and every result of a successful composing algorithm should be of great quality. In contrast, we formulate a general concept of composer-producer collaboration: an artificial music producer that filters 'good' and 'poor' results of an independent composer can improve musical quality without the need to refactor composing strategies. That way, creative programming can be divided into independent subtasks, which allows for modular (multi-agent) system designs as well as productive team development. In a proof-of-concept experiment, we use neural networks to discriminate real Bach chorales from fakes generated by DeepBach. This leads to an improvement of the overall results and provides possibilities to explain model behavior. Our concept can effortlessly be transferred to any pre-existing music generator.
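The generate-and-filter idea in this abstract can be sketched as follows; the composer and critic below are toy stand-ins for illustration, not DeepBach or the authors' classifier:

```python
import random

def producer_filter(composer, critic, n_candidates, keep):
    """Generate many candidate pieces and keep only those the 'producer'
    (a taste model, e.g. a binary classifier's score) ranks highest."""
    candidates = [composer() for _ in range(n_candidates)]
    ranked = sorted(candidates, key=critic, reverse=True)
    return ranked[:keep]

# Toy stand-ins: a composer emitting random 8-note 'pieces' and a critic.
def toy_composer():
    return [random.randint(60, 72) for _ in range(8)]

def toy_critic(piece):
    # e.g. a 'taste' preferring smaller melodic leaps (higher is better)
    return -sum(abs(a - b) for a, b in zip(piece, piece[1:]))

best = producer_filter(toy_composer, toy_critic, n_candidates=50, keep=5)
```

The design point is modularity: the producer improves output quality purely by rejection, so it can wrap any pre-existing generator without changing the generator itself.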

JST 22:00 – 23:00 | UTC 13:00 – 14:00

Music Session

Session Chair: Artemi-Maria Gioti

  • Phivos-Angelos Kollias. Nostophiliac

  • Jean-Philippe Jullin. Aus' dreams

  • Andrew Fyfe. Resembling Shakuhachi

  • Bob L. T. Sturm. The shadow still lay where he had been standing

JST 23:00 – 23:15 | UTC 14:00 – 14:15

Closing Session

Session Chair: Stefano Kalonaris


AIMC 2022 is free for all registrants

Registration is now closed


Code of Conduct

Participants in AIMC 2022, including organizers, speakers, volunteers, staff and attendees, are expected to abide by the Berlin Code of Conduct.

Reporting Violations

You can report any violation of this code of conduct by contacting us.


By becoming a sponsor of the conference, you can:

  • Advertise your research & products to potential users/customers as well as to potential innovation partners
  • Recruit top technical talent and catch the attention of highly qualified people
  • Follow state-of-the-art AI music creativity research and technologies

AIMC offers sponsorship opportunities at Gold, Silver, and Bronze tiers. For more information, please read below and/or contact us.

Gold 🥇

¥100,000 (JPY)

  • Large logo & company name on the conference website
  • Sponsorship acknowledgment and announcements on social media
  • Submit promoted posts to the job board
  • 30-minute Sponsor Presentation
  • 2 reserved seats in 2 Panel Discussions

Silver 🥈

¥50,000 (JPY)

  • Medium logo & company name on the conference website
  • Sponsorship acknowledgment and announcements on social media
  • Submit posts to the job board
  • 15-minute Sponsor Presentation
  • 1 reserved seat in 1 Panel Discussion

Bronze 🥉

¥30,000 (JPY)

  • Small logo & company name on the conference website
  • Sponsorship acknowledgment and announcements on social media
  • Submit posts to the job board
  • 10-minute Sponsor Presentation


General Enquiries

--- closed ---