2025

JAES

Designing Neural Synthesizers for Low Latency Interaction

Caspe, Franco, Shier, Jordie, Sandler, Mark and 2 more authors

Journal of the Audio Engineering Society 2025

Abs arXiv Code Plugin

Neural Audio Synthesis (NAS) models offer interactive musical control over high-quality, expressive audio generators. While these models can operate in real-time, they often suffer from high latency, making them unsuitable for intimate musical interaction. The impact of architectural choices in deep learning models on audio latency remains largely unexplored in the NAS literature. In this work, we investigate the sources of latency and jitter typically found in interactive NAS models. We then apply this analysis to the task of timbre transfer using RAVE, a convolutional variational autoencoder for audio waveforms introduced by Caillon et al. in 2021. Finally, we present an iterative design approach for optimizing latency. This culminates with a model we call BRAVE (Bravely Realtime Audio Variational autoEncoder), which is low-latency and exhibits better pitch and loudness replication while showing timbre modification capabilities similar to RAVE. We implement it in a specialized inference framework for low-latency, real-time inference and present a proof-of-concept audio plugin compatible with audio signals from musical instruments. We expect the challenges and guidelines described in this document to support NAS researchers in designing models for low-latency inference from the ground up, enriching the landscape of possibilities for musicians.

2024

CHIME

Ethnographic Exploration of Timbre in Hackathon Designs

Saitis, Charalampos, Del Sette, Bleiz Macsen, Tian, Haokun and 5 more authors

In CHIME Annual Conference 2024

Abs PDF

This paper reports a summary account of the Timbre Tools Hackathon: a hackathon that invited audio developers and music technologists to consider and work with timbre through the design of tools that promote a timbre-first approach to digital instrument craft practice (timbre tools). Through ethnographic observation, we identified different approaches towards integrating timbre as an active part of creating tools and technologies in music. These strategies inform future work and the development of tools to assist awareness and exploration of timbre for instrument makers.

AM

Timbre Tools: Ethnographic Perspectives on Timbre and Sonic Cultures in Hackathon Designs

Saitis, Charalampos, Del Sette, Bleiz M, Shier, Jordie and 5 more authors

In Proceedings of the 19th International Audio Mostly Conference: Explorations in Sonic Cultures 2024

Abs PDF

Timbre is a nuanced yet abstractly defined concept. Its inherently subjective qualities make it challenging to design and work with. In this paper, we propose to explore the conceptualisation and negotiation of timbre within the creative practice of timbre tool makers. To this end, we hosted a hackathon event and performed an ethnographic study to explore how participants engaged with the notion of timbre and how their conception of timbre was shaped through social interactions and technological encounters. We present individual descriptions of each team’s design process and reflect on our data to identify commonalities in the ways that timbre is understood and informed by sound technologies and their surrounding sonic cultures, e.g., by relating concepts of timbre to metaphors. We further current understanding by offering novel interdisciplinary and multimodal insights into understandings of timbre.

NIME

Real-time timbre remapping with differentiable DSP

Shier, Jordie, Saitis, Charalampos, Robertson, Andrew and 1 more author

In New Interfaces for Musical Expression 2024

Abs arXiv Code Plugin Video Website

Timbre is a primary mode of expression in diverse musical contexts. However, prevalent audio-driven synthesis methods predominantly rely on pitch and loudness envelopes, effectively flattening timbral expression from the input. Our approach draws on the concept of timbre analogies and investigates how timbral expression from an input signal can be mapped onto controls for a synthesizer. Leveraging differentiable digital signal processing, our method facilitates direct optimization of synthesizer parameters through a novel feature difference loss. This loss function, designed to learn relative timbral differences between musical events, prioritizes the subtleties of graded timbre modulations within phrases, allowing for meaningful translations in a timbre space. Using snare drum performances as a case study, where timbral expression is central, we demonstrate real-time timbre remapping from acoustic snare drums to a differentiable synthesizer modeled after the Roland TR-808.

Frontiers

A review of differentiable digital signal processing for music and speech synthesis

Hayes, Ben, Shier, Jordie, Fazekas, György and 2 more authors

Frontiers in Signal Processing 2024

Abs PDF

The term “differentiable digital signal processing” describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music and speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably, which is further supported by a web book containing practical advice on differentiable synthesiser programming (https://intro2ddsp.github.io/). Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research.

2023

EAA

Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis

Shier, Jordie*, Caspe, Franco*, Robertson, Andrew and 3 more authors

In Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023

Abs PDF Code Website

Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified synthesis framework aiming to address transient generation and percussive synthesis within a DDSP framework. To this end, we propose a model for percussive synthesis that builds on sinusoidal modeling synthesis and incorporates a modulated temporal convolutional network for transient generation. We use a modified sinusoidal peak picking algorithm to generate time-varying non-harmonic sinusoids and pair it with differentiable noise and transient encoders that are jointly trained to reconstruct drumset sounds. We compute a set of reconstruction metrics using a large dataset of acoustic and electronic percussion samples that show that our method leads to improved onset signal reconstruction for membranophone percussion instruments.

2022

DMRN

Real-time timbre mapping for synthesized percussive performance

Shier, Jordie

In DMRN+17: Digital Music Research Network One-day Workshop 2022

Poster

arXiv

Hear 2021: Holistic evaluation of audio representations

Turian, Joseph, Shier, Jordie, Khan, Humair Raj and 20 more authors

arXiv preprint 2022

Abs arXiv

What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, including speech, environmental sound, and music. HEAR was launched as a NeurIPS 2021 shared challenge. In the spirit of shared exchange, each participant submitted an audio embedding model following a common API that is general-purpose, open-source, and freely available to use. Twenty-nine models by thirteen external teams were evaluated on nineteen diverse downstream tasks derived from sixteen datasets. Open evaluation code, submitted models and datasets are key contributions, enabling comprehensive and reproducible evaluation, as well as previously impossible longitudinal studies. It still remains an open question whether one single general-purpose audio representation can perform as holistically as the human ear.

2021

DAFx

One billion audio sounds from gpu-enabled modular synthesis

Turian, Joseph*, Shier, Jordie*, Tzanetakis, George and 2 more authors

In 2021 24th International Conference on Digital Audio Effects (DAFx) 2021

Abs PDF Code Video Website

We release synth1B1, a multi-modal audio corpus consisting of 1 billion 4-second synthesized sounds, paired with the synthesis parameters used to generate them. The dataset is 100x larger than any audio dataset in the literature. We also introduce torchsynth, an open source modular synthesizer that generates the synth1B1 samples on-the-fly at 16200x faster than real-time (714MHz) on a single GPU. Finally, we release two new audio datasets: FM synth timbre and subtractive synth pitch. Using these datasets, we demonstrate new rank-based evaluation criteria for existing audio representations. Finally, we propose a novel approach to synthesizer hyperparameter optimization.

JAES

Manifold learning methods for visualization and browsing of drum machine samples

Shier, Jordie, McNally, Kirk, Tzanetakis, George and 1 more author

Journal of the Audio Engineering Society 2021

Abs PDF Analysis Code Audio Plugin

The use of electronic drum samples is widespread in contemporary music productions, with music producers having an unprecedented number of samples available to them. The task of organizing and selecting from these large collections can be challenging and time consuming, which points to the need for improved methods for user interaction. This paper presents a system that computationally characterizes and organizes drum machine samples in two dimensions based on sound similarity. The goal of the work is to support the development of intuitive drum sample browsing systems. The methodology presented explores time segmentation, which isolates temporal subsets from the input signal prior to audio feature extraction, as a technique for improving similarity calculations. Manifold learning techniques are compared and evaluated for dimensionality reduction tasks, and used to organize and visualize audio collections in two dimensions. This methodology is evaluated using a combination of objective and subjective methods including audio classification tasks and a user listening study. Finally, we present an open-source audio plug-in developed using the JUCE software framework that incorporates the findings from this study into an application that can be used in the context of a music production environment.

2020

AES

Spiegelib: An automatic synthesizer programming library

Shier, Jordie, Tzanetakis, George, and McNally, Kirk

In Audio Engineering Society Convention 148 2020

Abs PDF Code Poster Website

Automatic synthesizer programming is the field of research focused on using algorithmic techniques to generate parameter settings and patch connections for a sound synthesizer. In this paper, we present the Synthesizer Programming with Intelligent Exploration, Generation, and Evaluation Library (spiegelib), an open-source, object-oriented software library to support continued development, collaboration, and reproducibility within this field. spiegelib is designed to be extensible, providing an API with classes for conducting automatic synthesizer programming research. The name spiegelib was chosen to pay homage to Laurie Spiegel, an early pioneer in electronic music. In this paper, we review the algorithms currently implemented in spiegelib and provide an example case to illustrate an application of spiegelib in automatic synthesizer programming research.

2017

AES

Analysis of drum machine kick and snare sounds

Shier, Jordie, McNally, Kirk, and Tzanetakis, George

In Audio Engineering Society Convention 143 2017

Abs PDF Poster

The use of electronic drum samples is widespread in contemporary music productions, with music producers having an unprecedented number of samples available to them. The development of new tools to assist users in organizing and managing libraries of this type requires comprehensive audio analysis that is distinct from that used for general classification or onset detection tasks. In this paper, 4230 kick and snare samples, representing 250 individual electronic drum machines, are evaluated. Samples are segmented into different lengths and analyzed using comprehensive audio feature analysis. Audio classification is used to evaluate and compare the effect of this time segmentation and establish the overall effectiveness of the selected feature set. Results demonstrate that there is improvement in classification scores when using time segmentation as a pre-processing step.

WIMP

Sieve: A plugin for the automatic classification and intelligent browsing of kick and snare samples

Shier, Jordie, McNally, Kirk, and Tzanetakis, George

In 3rd Workshop on Intelligent Music Production. WIMP 2017

Abs PDF Code Poster

The use of electronic drum samples is widespread in contemporary music productions, with music producers having an unprecedented number of samples available to them. To be efficient, users of these large collections require new tools to assist them in sorting, selection and auditioning tasks. This paper presents a new plugin for working with a large collection of kick and snare samples within a music production context. A database of 4230 kick and snare samples, representing 250 individual electronic drum machines, is analyzed by segmenting the audio samples into different sample lengths and characterizing these segments using audio feature analysis. The resulting multidimensional feature space is reduced using principal component analysis (PCA). Samples are mapped to a 2D grid interface within an audio plug-in built using the JUCE software framework.

Thesis

2021

Master’s

The synthesizer programming problem: improving the usability of sound synthesizers

Shier, Jordie

University of Victoria 2021

Abs PDF Slides

The sound synthesizer is an electronic musical instrument that has become commonplace in audio production for music, film, television and video games. Despite its widespread use, creating new sounds on a synthesizer, referred to as synthesizer programming, is a complex task that can impede the creative process. The primary aim of this thesis is to support the development of techniques to assist synthesizer users to more easily achieve their creative goals. One of the main focuses is the development and evaluation of algorithms for inverse synthesis, a technique that involves the prediction of synthesizer parameters to match a target sound. Deep learning and evolutionary programming techniques are compared on a baseline FM synthesis problem and a novel hybrid approach is presented that produces high quality results in less than half the computation time of a state-of-the-art genetic algorithm. Another focus is the development of intuitive user interfaces that encourage novice users to engage with synthesizers and learn the relationship between synthesizer parameters and the associated auditory result. To this end, a novel interface (Synth Explorer) is introduced that uses a visual representation of synthesizer sounds on a two-dimensional layout. An additional focus of this thesis is to support further research in automatic synthesizer programming. An open-source library (SpiegeLib) has been developed to support reproducibility, sharing, and evaluation of techniques for inverse synthesis. Additionally, a large-scale dataset of one billion sounds paired with synthesizer parameters (synth1B1) and a GPU-enabled modular synthesizer (torchsynth) are also introduced to support further exploration of the complex relationship between synthesizer parameters and auditory results.