This year's edition was hosted from Japan and held online, with the hope of returning to an in-person format in the near future.
AIMC 2022 brought together several collaborating institutions and partners in a three-day program featuring paper presentations, concerts, panel discussions, workshops, and three keynotes.
CFP launch: 15 January 2022
Submission deadline: 9 May 2022 (AoE)
Notification of acceptance: 11 July 2022
Camera-ready deadline: 25 July 2022
Conference dates: 13-15 September 2022
AI is becoming increasingly ubiquitous in the music industry, giving end users easy access to music experimentation and creation, for example generating original compositions, cinematic soundtracks, orchestrations, digital mastering or novel remixes from audio stems, among other things. Of course, these newly acquired affordances are not devoid of political, sociological and aesthetic implications, and they shape our understanding and enjoyment of the musical experience. These implications are particularly salient when AI techniques and methods are applied to the generation of music modeled on traditional musical expressions and heritages. For example, AI involvement with musical traditions foregrounds issues of copyright, technocratic appropriation, the dilution of what makes particular musical expressions distinctive, the standardization of genres, the perpetuation of embedded musico-cultural biases and clichés, and the homogenization of the aesthetic experience through indiscriminate use of music corpora, to name but a few.
In this edition of AIMC, we were particularly interested in contributions that addressed themes such as (but not limited to):
There are two formats for paper submissions:
- Full papers: 8 pages maximum
- Short papers: 4 pages maximum
Note that the maximum length does not include references.
All papers must use the AI Music Creativity 2022 template, which you can download here (do not use previous years' templates).
Alternatively, you might want to use our Overleaf template.
Papers must be submitted via CMT3.
Submissions must be anonymized and will be subject to peer review by at least
three independent reviewers. We encourage audio and video material to accompany
and illustrate submitted papers. We ask that authors arrange their own web hosting
for audio and video files and provide URL links to all such files within the text of
the submitted paper.
At least one author of each accepted submission will be required to register for the conference.
All accepted papers, regardless of their format, will be included in the Conference Proceedings. The proceedings will be published as an electronic Open Access publication with DOI. Selected papers will be published, in expanded form, in a special issue of the Journal of Creative Music Systems.
We invite artistic submissions that deal with topics related to AI music creativity, human-machine co-creativity, and machine learning. We especially encourage and promote submissions that address AI from diverse perspectives with regard to questions of ethics, aesthetics, ecology, performance, and gender. Due to the format of this year's edition, the music program will take place online and feature recorded material, online installations and live streamed discussions with the participating artists. Within these constraints, the conference provides a number of different formats in order to represent a broad range of artistic approaches.
There are three formats for music work submissions:
- Concert/Performance
- Fixed Media
- Web-based
In this year's edition of AIMC, we are particularly interested in how the conference's theme maps onto the musical traditions of Japan. We note a substantial lack of musical corpora and resources in this area: for example, the Essen Folksong Collection and the Nottingham Folk Music Database have long been used in computational generative and musicological tasks, while others have explored traditional Irish folk music or Canadian folk songs. While there exist collections of Carnatic, Turkish makam, Beijing opera, and Arab-Andalusian music, little exists for traditional Japanese music. For this year's special call for music generation, we are inviting submissions of AI-driven, Japanese-inspired music works and soundscapes. Criteria for submissions are:
For all formats, please submit the following via CMT3:
We strongly encourage submissions accompanied by a short paper (follow the instructions in the Papers call). All submissions will be reviewed by independent members of the selection committee. Once these reviews have been completed, the reviewers and the chairs will discuss the works, the reviews, and their technical feasibility. Reviewers will be anonymous to the authors. At least one artist per work must register for the conference.
Workshop and tutorial proposals should include:
- A two-page workshop/tutorial description, outlining its aims, scope, presentation and participation formats
- Bios of the workshop leaders/tutors
- A detailed description of technical requirements
- A link to audio and/or video material that showcases the technological components and presentation formats described in the proposal
All workshop and tutorial proposals must be submitted to [deactivated link]. Submissions do not need to be anonymized and will be evaluated by a review panel based on their feasibility, impact and relevance to the conference themes and topics. Software and hardware components should be tested and working by the time of submission. All workshops and tutorials will be held online.
AIMC's principal goal is to bring together scholars and artists interested in the virtual emulation of musical creativity and its use for music creation, and to provide an interdisciplinary platform to promote, present and discuss their work in scientific and artistic contexts.
You can find the previous editions here.
Philippe Pasquier, Simon Fraser University, Canada
Robin Laney, The Open University, UK
Anna Jordanous, University of Kent, UK
Roisin Loughran, Dundalk Institute of Technology, Ireland
Steven Jan, University of Huddersfield, UK
Valerio Velardo, MusiMAP
Bob L. T. Sturm, Royal Institute of Technology, Sweden
Artemi-Maria Gioti, University of Music Carl Maria von Weber Dresden, Germany
Anna Aljanaki, Cyran Aouameur, Oded Ben-Tal, Andrea Bolzoni, Jean-Pierre Briot, Ke Chen, Kin Wai Cheuk, Laura Cros Vila, Tom Davis, Roger Dean, Ken Déguernel, Hao-Wen Dong, Andrea Franceschini, Satoru Fukayama, Siddharth Gurunani, Simon Holland, Steven Jan, Anna Jordanous, Tetsuro Kitahara, Stefan Lattner, Keon Ju Lee, Wei-Hsiang Liao, Roisin Loughran, Marco Martinez Ramirez, Panos Mavromatis, Gianluca Micchi, Ashis Pati, Elio Quinton, Adam Roberts, Robert Rowe, Richard Savery, Hugo Scurto, Ian Simon, Jason Smith, Bob Sturm, Kivanç Tatar, George Tzanetakis, Mauro Vallati, Juan Vasquez, Valerio Velardo, Benedikte Wallace, Anna Weisling, Kazuyoshi Yoshii, Maximos Kaliakatsos-Papakostas
Alo Allik, Tim Blackwell, Oliver Bown, Teresa Carrasco Garcia, Scott Cazan, Marko Ciciliani, Cathy Cox, Agostino Di Scipio, Frédéric Dufeu, Robert Ek, Artemi-Maria Gioti, Owen Green, Bjarni Gunnarsson, Kazuhiro Jo, Shelly Knotts, Adam Linson, Scott McLaughlin, Jonathan Reus, Alexander Sigman, Nikos Stavropoulos, Pierre Alexandre Tremblay, Sebastian Trump, Iannis Zannos, Ivan Zavada
The proceedings have been published as an electronic Open Access publication with DOI. You can find all the papers here or bulk download them from here.
Visit the AIMC 2022 YouTube channel for the video recordings of keynote speeches, spotlight talks, panel discussions, music sessions, and paper presentations.
How can we use AI to enhance our creativity and create something original? This talk will present a series of music performances based on real-time interaction with AI models, including dance music performances using generative music models in both symbolic and signal domains. I'll also introduce a deep learning-based real-time audio processing tool/platform for AI researchers and musicians. By illustrating the process of realizing the performances and how I used the tool, I'd like to discuss how we can create AI music tools musicians actually use and the importance of "misusable" AI.
Nao Tokui is an artist, researcher, and associate professor at Keio University. Tokui received his Ph.D. from The University of Tokyo for his research on Human-Computer Interaction and Artificial Intelligence (AI). While pursuing his Ph.D., he released his first album and collaborated with Nujabes, a legendary Japanese hip-hop producer. He has been exploring the potential expansion of human creativity through the use of AI. His recent works include a collaboration with Brian Eno and the AI DJ Project, in which a machine-learning-based DJ plays alongside a human DJ (Tokui himself), taking turns selecting and playing one song at a time. His works have been exhibited at MoMA (New York), the Barbican Centre (London), and the InterCommunication Center (Tokyo), among other venues. In April 2019, he started his professorship at Keio University and founded the Computational Creativity Lab to advance his practice in both research and educational settings. He published his first book on AI and Creativity in January 2021.
In their more than 60-year history, music generation systems have never been more popular than they are today. While the number of music AI startups is rising, there are still a few issues with generated music. Firstly, it is notoriously hard to enforce long-term structure (e.g. earworms) in the music. Secondly, by making the systems controllable in terms of meta-attributes such as emotion, they could become practically useful for music producers. In this talk, I will discuss several deep learning-based controllable music generation systems that have been developed over the last few years in our lab. These include TensionVAE, a music generation system guided by tonal tension; MusicFaderNets, a variational autoencoder model that allows for controllable arousal; and a seq2seq controllable lead sheet generator with Transformers. Finally, I will discuss some more recent projects by our AMAAI lab, including generating music that matches a video.
Dorien Herremans is an Assistant Professor at Singapore University of Technology and Design, where she is also Director of Game Lab. Before joining SUTD, she was a Marie Sklodowska-Curie Postdoctoral Fellow at the Centre for Digital Music at Queen Mary University of London, where she worked on the project: "MorpheuS: Hybrid Machine Learning – Optimization techniques To Generate Structured Music Through Morphing And Fusion". She received her Ph.D. in Applied Economics, and graduated as a Business Engineer in management information systems at the University of Antwerp in 2005. After that, she worked as a consultant and was an IT lecturer at the Les Roches University in Bluche, Switzerland. Dr. Herremans' research interests include AI for novel applications such as music and audio. At SUTD she leads the AMAAI Lab on Audio, Music, and AI.
AI has made great strides in analyzing and generating Western classical and popular music. However, our ability to extend such advances to traditional/non-Western music remains limited, due in part to the limited availability of training data and lower commercial potential. Is this a good or a bad thing? What promises and pitfalls await as we expand automated approaches to traditional music and other creative arts? I will discuss recent work from our and other labs applying automated methods to large samples of traditional music, including the Global Jukebox (comprising coded recordings of >5,000 traditional songs from almost 1,000 societies) and a sample of >10,000 Japanese and English folk songs. I will propose ideas for ethical and equitable ways forward as AI inevitably expands to incorporate more traditional music.
Patrick Savage is an Associate Professor in the Faculty of Environment and Information Studies at Keio University in Japan, where he directs the CompMusic Lab for comparative and computational musicology. Previously, he received his MSc in Psychology from McMaster University, PhD in Ethnomusicology from Tokyo University of the Arts, and was a postdoc in the University of Oxford School of Anthropology. His research focuses on using science to understand cross-cultural diversity and unity in human music. His research has been featured in outlets including Proceedings of the National Academy of Sciences, Behavioral & Brain Sciences, Current Biology, Ethnomusicology, The New York Times, and Nautilus.
The program for each of the three conference days is given below.

Day 1 (13 September)
10:45 – 11:00 | Opening Session |
11:00 – 12:00 | Industry Session |
12:00 – 14:00 | |
14:00 – 16:00 | Workshop |
16:00 – 17:00 | Industry Session |
17:00 – 18:00 | |
18:00 – 18:15 | Spotlight Talk |
18:15 – 19:00 | Keynote Speech |
19:00 – 20:00 | |
20:00 – 22:00 | Workshop |
Day 2 (14 September)

11:00 – 13:00 | Paper Session |
13:00 – 14:00 | Music Session |
14:00 – 17:00 | |
17:00 – 17:15 | Spotlight Talk |
17:15 – 18:00 | Panel Discussion |
18:00 – 18:15 | Spotlight Talk |
18:15 – 19:00 | Keynote Speech |
19:00 – 20:00 | |
20:00 – 22:00 | Paper Session |
22:00 – 23:00 | Music Session |
Day 3 (15 September)

12:00 – 13:00 | Music Session |
13:00 – 15:00 | |
15:00 – 15:45 | Panel Discussion |
16:00 – 18:00 | Paper Session |
18:00 – 18:15 | Spotlight Talk |
18:15 – 19:00 | Keynote Speech |
19:00 – 20:00 | |
20:00 – 22:00 | Paper Session |
22:00 – 23:00 | Music Session |
23:00 – 23:15 | Closing Session |
In this talk, I present a post-mortem analysis of Open Source Research.
This is a collaborative AI Music research project run by The Sound of AI Community.
The project lasted 1.5 years, involved >150 people with different backgrounds
and levels of expertise, and led to the publication of a paper at AIMC 2022.
I discuss how we managed the project, set goals, and organised members.
I also share insights, dos, and don'ts for community-driven research projects.
Valerio Velardo is an AI music consultant with experience in R&D and production.
He is the founder of The Sound of AI, the largest YouTube channel and online community on AI Music.
He works as MLOps lead at Utopia Music.
Keynote speech by Nao Tokui (abstract and speaker bio as given in the keynote listing above).
This article describes the automatic transcription process of a gambang ensemble, a type of Balinese gamelan, from multitrack audio recordings.
This project showcases the use of artificial intelligence techniques that have proven efficient in the literature, namely multi-layer perceptron
classifiers trained on chroma features, adapted here to a repertoire that presents specific challenges due to its musical realities.
After describing the ensemble and its musical context, we describe the method used for data acquisition, the analysis and transcription process
applied to the resulting dataset, and conclude with an evaluation of the efficiency of the process and a discussion of further developments of the algorithm.
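As a rough illustration of the general chroma-plus-MLP approach described above (not the authors' implementation), a minimal Python sketch might look as follows; the labelled excerpt list `clips` and the classifier settings are placeholders.

```python
# Minimal sketch of a chroma-based MLP classifier in the spirit of the abstract
# above (not the authors' code). Assumes labelled audio excerpts are available
# as clips = [(path, label), ...].
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def chroma_vector(path, sr=22050):
    """Average chroma profile of one excerpt (12-dimensional feature vector)."""
    y, _ = librosa.load(path, sr=sr)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)   # shape (12, frames)
    return chroma.mean(axis=1)

def train_transcriber(clips):
    X = np.array([chroma_vector(p) for p, _ in clips])
    y = np.array([label for _, label in clips])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))
    return clf
```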
Random walks, fractional Brownian motion and stochastic processes have been used extensively by composers such as Iannis Xenakis and others,
creating instantly recognizable textures. A trained ear can differentiate a uniform random walk from a Poisson process or an fBm process and random rotations.
In the opera Sophocles: Antigone by one of the authors of this paper, random walks on neo-Riemannian PLR spaces were experimented with, yielding
mixed impressions of process music and post-romantic chromaticism. When the random walk is steered by transformational rules, special textures and harmonies emerge.
We propose a new kind of parameterizable random walk, a generative system, on a space of arbitrary-length chords equipped with an arbitrary distance measure, steered
from a customizable corpus learned by the system. The corpus provides a particular texture and harmony to the generative process. The learned neo-Riemannian spaces
equipped with some distance measure provide the transformational rule base of the concatenative synthesis process.
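To make the idea of a corpus-steered, distance-constrained random walk concrete, here is a hypothetical minimal sketch; the pitch-class-set chord representation, the distance measure and the toy corpus are illustrative choices, not those of the paper.

```python
# Illustrative corpus-steered random walk over chords as pitch-class sets.
# All names and choices here are hypothetical, not the authors' system.
import random

def distance(a, b):
    """Simple chord distance: size of the symmetric difference of pitch-class sets."""
    return len(set(a) ^ set(b))

def corpus_weights(corpus):
    """Count how often each chord-to-chord move appears in the corpus."""
    counts = {}
    for prev, nxt in zip(corpus, corpus[1:]):
        key = (tuple(sorted(prev)), tuple(sorted(nxt)))
        counts[key] = counts.get(key, 0) + 1
    return counts

def step(current, candidates, counts, max_dist=2):
    """Pick the next chord: stay close in chord space, favour corpus transitions."""
    near = [c for c in candidates if 0 < distance(current, c) <= max_dist]
    if not near:                        # no neighbour within range: allow any move
        near = [c for c in candidates if distance(current, c) > 0]
    weights = [1 + counts.get((tuple(sorted(current)), tuple(sorted(c))), 0) for c in near]
    return random.choices(near, weights=weights, k=1)[0]

corpus = [(0, 4, 7), (7, 11, 2), (9, 0, 4), (5, 9, 0), (0, 4, 7)]  # C G Am F C
walk = [corpus[0]]
for _ in range(8):
    walk.append(step(walk[-1], corpus, corpus_weights(corpus)))
print(walk)
```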
In this work, we demonstrate a variational autoencoder designed to reconstruct drum samples using linear combinations from a small predefined library of existing samples.
Inspired by the music production practice of layering two or more samples on top of each other to create rich and unique textures, we synthesize drum sounds by producing
sparse sets of mixing coefficients to apply to the predefined library, which are then layered to create new audio samples.
By training this model to approximate a range of professionally produced and recorded drum samples, we aim to learn a distribution over possible layering strategies given
a fixed sample library, which we can subsequently sample from or otherwise manipulate. We find that varying a particular dimension of the latent vectors in the space learned
by the model does not simply linearly scale the mixing weights; rather, it smoothly varies the perceptual nature of the sample by swapping different samples in and out of the
sparse mixture. We present a user-interface prototype to engage intuitively with our system, discuss the performance of our modeling approach, and highlight potential applications
in a studio production environment.
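The layering step itself can be illustrated with a short sketch; the VAE that would produce the coefficients is omitted, and the sample library and coefficients below are random placeholders.

```python
# Rough sketch of layering via sparse mixing coefficients over a fixed library.
# The VAE that would output the coefficients is omitted; arrays are placeholders.
import numpy as np

def layer_samples(library, coeffs, top_k=3):
    """library: (n_samples, n_frames); coeffs: (n_samples,) mixing weights."""
    coeffs = np.asarray(coeffs, dtype=float)
    sparse = np.zeros_like(coeffs)
    keep = np.argsort(coeffs)[-top_k:]          # enforce sparsity: keep k largest
    sparse[keep] = coeffs[keep]
    if sparse.sum() > 0:
        sparse /= sparse.sum()                  # normalise the kept weights
    return (sparse[:, None] * library).sum(axis=0)

rng = np.random.default_rng(0)
library = rng.standard_normal((16, 44100))      # 16 one-second placeholder samples
coeffs = rng.random(16)
new_drum = layer_samples(library, coeffs)       # shape (44100,): the layered sample
```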
In this paper, we study the possibility of adding source separation as a pre-processing step to the computer-assisted orchestration process.
We first discuss the motivation of this addition and its potential to increase the quality of orchestrations of multi-layered sounds.
Second, we select several state-of-the-art models for both music source separation (separation of instruments) and universal sound separation
(separation of arbitrary sounds), and compare their effectiveness for the task of orchestration. We assess which methods best suit the needs of
orchestration by applying them on our own target sounds, orchestrating the separated outputs, and finally comparing them to the orchestration of
the same target without separation. Our experiments show that source separation improves the quality of orchestrations, and the more accurate the
separation, the better the resulting orchestration. Finally, we compare unsupervised methods to supervised methods for separation, and comment on
the effect of training data selection on performance of supervised methods.
There is a long history of efforts to explore musical elements in the entities and spaces around us, as in musique concrète and ambient music.
In the context of computer music and digital art, interactive experiences that concentrate on the surrounding objects and physical spaces have also been
designed. In recent years, with the development and popularization of devices, an increasing number of works have been designed in Extended Reality to
create such musical experiences. In this paper, we describe MR4MR, a sound installation work that allows users to experience melodies produced from
interactions with their surrounding space in the context of Mixed Reality (MR). Using HoloLens, an MR head-mounted display, users can bump virtual objects
that emit sound against real objects in their surroundings. Then, by continuously creating a melody that follows the sound made by the object and regenerating
a randomly and gradually changing melody using machine learning music generation models, users can feel their ambient melody "reincarnating".
This study proposes a model to reduce dissonant chords in morphing-based reharmonization using a variational autoencoder (VAE). The conventional morphing of chords
generates dissonance because the generated results are supplemented by the surrounding learning data in the latent space without considering consonance. Therefore,
this study focuses on degrees of dissonance that can be calculated from pitch intervals. We define an ideal consonance degree based on the dissonance degree, and we
implement a dissonance penalty as a simple network structure by adding nodes to the VAE so that a co-occurrence vector of pitch intervals computed from the chord tones
approaches the ideal consonance degree. Our objective evaluation demonstrates that the number of standard triads increased compared to the conventional VAE morphing
method when the model was used. In addition, the number of non-standard and dissonant chords decreased through our model.
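For readers unfamiliar with interval-class vectors, here is a hedged sketch of the kind of penalty term described above, using an arbitrary placeholder consonance profile rather than the authors' definition.

```python
# Sketch of a dissonance penalty: compute an interval-class co-occurrence vector
# for a chord and penalise its distance from a target consonance profile.
# The target profile is an arbitrary placeholder, not the authors' definition.
import numpy as np

def interval_vector(chord):
    """6-dimensional interval-class co-occurrence vector of a pitch-class chord."""
    vec = np.zeros(6)
    pcs = sorted(set(p % 12 for p in chord))
    for i in range(len(pcs)):
        for j in range(i + 1, len(pcs)):
            d = pcs[j] - pcs[i]
            ic = min(d, 12 - d)                 # interval class 1..6
            vec[ic - 1] += 1
    return vec

# Placeholder "ideal consonance" profile favouring interval classes 3, 4, 5.
IDEAL = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0])

def dissonance_penalty(chord, weight=1.0):
    v = interval_vector(chord)
    if v.sum() > 0:
        v = v / v.sum()
    target = IDEAL / IDEAL.sum()
    return weight * float(np.square(v - target).sum())  # term added to the VAE loss

print(dissonance_penalty([0, 4, 7]))   # C major triad: zero penalty here
print(dissonance_penalty([0, 1, 2]))   # chromatic cluster: large penalty
```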
An overview of how companies are commercialising AI music creation.
Ed Newton-Rex founded Jukedeck, an AI music generation company, which was acquired by ByteDance in 2019. At ByteDance, he ran the Music Intelligence lab, then led Product in Europe for TikTok.
Since then, he has been a Product Director at Snap, working on music collaboration. He is a mentor at Abbey Road Studios’ music startup incubator, Abbey Road Red, and he composes choral music,
which is published by Boosey & Hawkes.
In this talk, I examine the visible optimism surrounding an AI-enabled future blessed with the "democratic" production of music: what does it really mean to "democratize" music?
What is the ethical significance of this "wave of democratization"? How can we grasp the shifting notions of talent, virtuosity, musical labor, and the "artist-genius" as new
technologies continue to alter the social relations of creative production?
Rujing Stacy Huang is a musicologist, singer-songwriter, and currently Presidential Postdoctoral Fellow at the University of Hong Kong. In 2020-21, she was a postdoc in the
ERC-funded "MUSAiC" project (ERC-2019-COG No. 864189). She is also the Co-Organizer of the AI Song Contest, and most recently the founder and CEO of her startup, Project Grain
(under Grain Music International Ltd.). She received her PhD in Ethnomusicology from Harvard University in 2019.
Keynote speech by Dorien Herremans (abstract and speaker bio as given in the keynote listing above).
Despite accepting the involvement of computers in such important areas as finance, self-driving and surveillance,
society still struggles with the concept of the computer as an originator of art, and has difficulty defining
the value and place of computer-generated art. In this paper we explore questions of the value of computer music
from the perspectives of music listeners and professional musicians with different backgrounds, through a series of
open-ended interviews. Several solutions to the problem of the lack of intentional agency are proposed.
We discuss the design and results of The Ai Music Generation Challenge 2021 and compare it to the challenge of the previous year.
While the 2020 challenge was focused on the Irish double jig, the 2021 challenge was focused on a particular kind of Swedish traditional
dance music, called slängpolska. Six systems participated in the 2021 challenge, each generating a number of tunes evaluated by
five judges, all professional musicians and experts in the music style. In the first phase, the judges reject all tunes that are plagiarised,
or that have incorrect meter or rhythm. In the second phase, they score the remaining tunes along four qualities: danceability, structural coherence,
formal coherence, and playability. The judges know all the tunes are computer generated, but do not know which tunes come from which systems, or what
kinds of machine learning and data are involved. In the third stage, the judges award prizes to the top tunes. This resulted in five tunes garnering
first and second prizes, four of which come from one particular system. We perform a statistical analysis of the scores from all judges, which allows
a quantitative comparison of all factors in the challenge. Finally, we look to the 2022 challenge.
The development of generative Machine Learning (ML) models in creative practices, enabled by the recent improvements
in usability and availability of pre-trained models, is raising more and more interest among artists, practitioners
and performers. Yet, the introduction of such techniques in artistic domains also revealed multiple limitations that
escape current evaluation methods used by scientists. Notably, most models are still unable to generate content that
lies outside the domain defined by the training dataset. In this paper, we propose an alternative framework, starting
from a new general formulation of ML objectives, that we derive to delineate possible implications and solutions that
already exist in the ML literature (notably for the audio and musical domain). We also discuss existing relations between
generative models and computational creativity and how our framework could help address the lack of creativity in existing
models.
This paper seeks to identify aesthetically productive problems.
Based on Melanie Mitchell's much-discussed 2021 paper "Why AI is Harder Than We Think," it seeks to outline four
areas of artistic potential that are related to the four "fallacies" in AI research identified by Mitchell.
These are underlying assumptions of AI research that have contributed to overconfident predictions.
The paper uses these fallacies as a point of departure to discuss the relation of AI research and artistic practice,
not from a utilitarian or problem-solving point of view, but rather in order to identify how frictions and fallacies
disclose aesthetically productive areas. The paper seeks to demonstrate how these fallacies are not only shortcomings
with regard to our understanding of intelligence, but how they are actually at the core of what constitutes aesthetics
and artistic practice.
Universal Darwinism (UD) (Plotkin, 1995) holds that the "evolutionary algorithm" (Dennett, 1995, pp. 50–52)
operates across the interconnected realms of a "recursive ontology" (Velardo, 2016) that binds together all that exists.
Indeed, UD maintains that all phenomena in the universe are emergent properties of Darwinian processes of variation,
replication and selection. If true, an evolutionary view of culture must take priority over more "creationist" accounts
that rely upon the mysterious intercession of inspiration and imagination and the supposed conscious agency of the composer
(or programmer). The most widely accepted (and critiqued) theory of cultural evolution – memetics – arguably has significant
explanatory power for human-generated music (HGM), and, accepting the logic of UD, it follows that computer-generated music (CGM)
must in some sense be "accountable" to evolutionary precepts. This paper explores how AIMC can be understood in the light of
memetics (as a window on the operation of UD in human culture), even when the outputs of generative systems seem far removed
from the structural norms, aesthetic values and sound-worlds of HGM.
This paper will use the hacker duo Dadabots (who generate raw audio using SampleRNN) and OpenAI's Jukebox project (which generates raw audio using a
hierarchical VQ-VAE transformer) as case studies to assess whether machines are capable of musical creativity, how they are capable of musical creativity,
and whether this helps to define what musical creativity is. It will also discuss how these systems can be useful for human creative processes. The findings
from evaluating Dadabots' and OpenAI's work will firstly demonstrate that our assumptions about musical creativity in both humans and machines revolve too
strongly around symbolic models. Secondly, the findings will suggest that what Boden describes as 'transformational creativity' can take place through
unexpected machine consequences.
Software that provides feedback on music performance is increasingly being used by music students to support their daily practice.
The dominant piano performance evaluation models are based on metrics such as playing correct notes and global tempo.
However, these metrics are often limited in capturing expressive aspects of performance, which can be seen through slight variations
a performer chooses to employ in their playing. In order to capture some of these variations, we propose a web application that serves
to compare the user's playing to that of professional recordings of the same piece. The user's playing is plotted in real time on a
tempo-volume graph, alongside performances by great pianists such as Rubinstein and Horowitz. We then give this application to music
students and, from the feedback we receive, determine what kind of live performance feedback will be useful to pianists.
The use of machine learning in artistic music generation leads to controversial discussions of the quality of art, for which objective quantification is nonsensical.
We therefore consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal interplay is to lead to new experiences,
both for the musician and the audience. To obtain this behaviour, we resort to the framework of recurrent Variational Auto-Encoders (VAE) and learn to generate music,
seeded by a human musician. In the learned model, we generate novel musical sequences by interpolation in latent space. Standard VAEs however do not guarantee any form
of smoothness in their latent representation. This translates into abrupt changes in the generated music sequences. To overcome these limitations, we regularise the decoder
and endow the latent space with a flat Riemannian manifold, i.e., a manifold that is isometric to the Euclidean space. As a result, linearly interpolating in the latent space
yields realistic and smooth musical changes that fit the type of machine–musician interactions we aim for. We provide empirical evidence for our method via a set of
experiments on music datasets and we deploy our model for an interactive jam session with a professional drummer. The live performance provides qualitative evidence that the
latent representation can be intuitively interpreted and exploited by the drummer to drive the interplay. Beyond the musical application, our approach showcases an instance of
human-centred design of machine-learning models, driven by interpretability and the interaction with the end user.
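As a minimal sketch of the latent-space interpolation that this regularisation is meant to make perceptually smooth, the snippet below interpolates linearly between two latent codes and decodes each point; the encoder and decoder here are untrained stand-ins, not the authors' model.

```python
# Latent-space interpolation sketch with untrained stand-in encoder/decoder.
import torch
import torch.nn as nn

latent_dim, seq_len, vocab = 16, 32, 128
encoder = nn.GRU(input_size=vocab, hidden_size=latent_dim, batch_first=True)
decoder = nn.Sequential(nn.Linear(latent_dim, seq_len * vocab))

def encode(x_onehot):                      # x_onehot: (1, seq_len, vocab)
    _, h = encoder(x_onehot)
    return h[-1]                           # latent code, shape (1, latent_dim)

def decode(z):                             # z: (1, latent_dim)
    logits = decoder(z).view(1, seq_len, vocab)
    return logits.argmax(dim=-1)           # token ids per time step

x_a = torch.zeros(1, seq_len, vocab); x_a[0, torch.arange(seq_len), 60] = 1.0
x_b = torch.zeros(1, seq_len, vocab); x_b[0, torch.arange(seq_len), 67] = 1.0
z_a, z_b = encode(x_a), encode(x_b)

# Linear interpolation in latent space; with a flat (isometric) latent geometry,
# equal steps in z should correspond to similarly sized musical changes.
for alpha in torch.linspace(0, 1, 5):
    z = (1 - alpha) * z_a + alpha * z_b
    print(decode(z)[0, :8].tolist())
```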
This paper presents a real-time drum generation system capable of accompanying a human instrumentalist. The drum generation model is a transformer encoder trained to predict a short
drum pattern given a reduced rhythmic representation. We demonstrate that with certain design considerations, the short drum pattern generator can be used as a real-time accompaniment
in musical sessions lasting much longer than the duration of the training samples. A discussion on the potentials, limitations and possible future continuations of this work is provided.
Deep learning-based probabilistic models of musical data are producing increasingly realistic results and promise to enter creative workflows of many kinds.
Yet they have been little-studied in a performance setting, where the results of user actions typically ought to feel instantaneous. To enable such study,
we designed Notochord, a deep probabilistic model for sequences of structured events, and trained an instance of it on the Lakh MIDI dataset. Our probabilistic
formulation allows interpretable interventions at a sub-event level, which enables one model to act as a backbone for diverse interactive musical functions including
steerable generation, harmonization, machine improvisation, and likelihood-based interfaces. Notochord can generate polyphonic and multi-track MIDI, and respond to
inputs with latency below ten milliseconds. Training code, model checkpoints and interactive examples are provided as open source software.
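The sub-event idea can be illustrated generically as follows; this is a hypothetical sketch of chain-rule factorisation with user-clamped sub-events, not Notochord's actual API.

```python
# Generic illustration of sub-event factorisation: one MIDI-like event is split
# into sub-events (instrument, pitch, time, velocity) sampled one after another,
# so an interface can clamp any subset and let the model fill in the rest.
# This is NOT Notochord's API, only the idea of sub-event-level intervention.
import random

SUB_EVENTS = ["instrument", "pitch", "time", "velocity"]

def toy_distribution(name, context):
    """Stand-in for a learned conditional p(sub_event | earlier sub-events)."""
    ranges = {"instrument": range(0, 16), "pitch": range(36, 84),
              "time": range(0, 48), "velocity": range(1, 128)}
    return list(ranges[name])

def sample_event(fixed=None):
    """Sample one event, clamping any sub-events the user has fixed."""
    fixed = fixed or {}
    event = {}
    for name in SUB_EVENTS:
        if name in fixed:
            event[name] = fixed[name]          # user intervention at sub-event level
        else:
            event[name] = random.choice(toy_distribution(name, event))
    return event

print(sample_event())                          # fully generated event
print(sample_event(fixed={"pitch": 60}))       # steered generation: pitch clamped
```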
An introduction to some of the ongoing research collaborations developing through the Centre for Practice & Research in Science & Music (PRiSM) at the Royal Northern College of Music in Manchester, UK.
Emily Howard holds a Personal Chair in Composition at the Royal Northern College of Music and is Director of PRiSM, the RNCM Centre for Practice & Research in Science & Music.
Howard's music is commissioned, performed and broadcast all over the world. Her work is known for its inventive connections with mathematical shapes and processes: Antisphere – the latest addition to Howard's
ongoing series of orchestral geometries – was commissioned by the Barbican for Sir Simon Rattle and the London Symphony Orchestra and opened the 2019-20 season. PRiSM takes a lead in interdisciplinary and
reflexive research between the creative arts and the sciences with a view to making a real contribution to society, to developing new digital technology and creative practice, and to addressing fundamental questions
about what it means to be human and creative today.
Keynote speech by Patrick Savage (abstract and speaker bio as given in the keynote listing above).
In this paper we introduce an augmented granular sound synthesis method for a GAN latent space exploration in audio domain.
We use the AI-terity musical instrument for sound generating events in which the neural network (NN) parameters are optimised
and then the features are used as a basis to generate new sounds. The exploration of a latent space is realised by creating a
latent space through the original features of the training data set and finding the corresponding audio feature of the vector
points in this space. Our proposed sound synthesis method can achieve multiple audio generation and sound synthesising events
simultaneously without interrupting the playback grains. To do that, we introduce a redundancy parameter that schedules additional
buffer slots divided from a large buffer slot, allowing multiple latent space vector points to be used in granular synthesis,
in real time on the GPU. Our implementation demonstrates that augmented buffer schedule slots can be used as a feature for a sound
synthesis method to explore GAN-latent sound synthesis of granular-musical events with multiple generated audio samples without
interrupting the granular musical features of the synthesis method.
Although the use of AI tools in music composition and production is steadily increasing, as witnessed by the newly founded AI Song Contest,
analysis of music produced using these tools is still relatively uncommon as a means to gain insight into the ways AI tools impact music production.
In this paper we present a case study of "Melatonin", a song produced through extensive use of BassNet, an AI tool originally designed to generate bass lines.
Through analysis of the artists' workflow and song project, we identify style characteristics of the song in relation to the affordances of the tool,
highlighting manifestations of style in terms of both idiom and sound.
Interest in neural audio synthesis has been growing lately both in academia and industry. Deep Learning (DL) synthesisers enable musicians to generate fresh,
often completely unconventional sounds. However, most of these applications present a drawback. It is difficult for musicians to generate sounds which reflect
the timbral properties they have in mind, because of the nature of the latent spaces of such systems. These spaces generally have large dimensionality and
cannot easily be mapped to semantically meaningful timbral properties. Navigation of such timbral spaces is therefore impractical. In this paper, we introduce
a DL-powered instrument that generates guitar sounds from vocal commands. The system analyses vocal instructions to extract timbral descriptors which condition
the sound generation.
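One plausible, hypothetical way to map a vocal command to a small timbral conditioning vector is sketched below with librosa; the descriptor set and the conditioned generator are assumptions, not the authors' design.

```python
# Hypothetical extraction of a few timbral descriptors from a vocal command,
# to be used as a conditioning vector; the guitar generator itself is omitted.
import numpy as np
import librosa

def timbral_descriptors(path, sr=22050):
    y, _ = librosa.load(path, sr=sr)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()   # "brightness"
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr).mean()
    flatness = librosa.feature.spectral_flatness(y=y).mean()          # "noisiness"
    rms = librosa.feature.rms(y=y).mean()                             # loudness proxy
    return np.array([centroid, bandwidth, flatness, rms], dtype=np.float32)

# cond = timbral_descriptors("vocal_command.wav")
# generated_guitar = synth_model(cond)   # hypothetical conditioned generator
```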
To mark the 250th anniversary of Beethoven's birth, Deutsche Telekom commissioned a team to develop Beethoven's fragmentary sketches for a Tenth Symphony using machine learning.
Here, the research members of that team discuss some of the key musical and computational decisions involved, including: making sense of Beethoven's few, ambiguous plans for this work;
curating that and other musical data in machine-readable formats; and identifying suitable generation tasks, relevant training materials, and network architectures. Most of all,
we consider highly topical themes for AI in the creative arts today, notably on the interaction between not only human and computer, but also living and historical artists.
Human composers arrive at creative decisions on the basis of their individual musical taste. For automatic algorithmic composition, we propose to embrace that concept and encode taste
as a binary classification task. We identify and reconsider an implicit assumption: each and every result of a successful composing algorithm should be of great quality. In contrast, we
formulate a general concept of composer-producer collaboration: an artificial music producer that filters 'good' and 'poor' results of an independent composer can improve musical quality
without the need to refactor composing strategies. That way, creative programming can be divided into independent subtasks, which allow for modular (multi-agent) system designs as well
as productive team development. In a proof-of-concept experiment, we perform the discrimination of real Bach chorales from fakes generated by DeepBach using neural networks. This leads to
an improvement of the overall results and provides possibilities to explain model behavior. Our concept can effortlessly be transferred to any pre-existing music generator.
Participants of AIMC 2022, including organizers, speakers, volunteers, staff and attendees are expected to abide by the Berlin code of conduct.
You can report any violation of this code of conduct by contacting us.
By taking the opportunity to become a proud sponsor of the conference, you may: