Designing Percussive Timbre Remappings: Negotiating Audio Representations and Evolving Parameter Spaces

Jordie Shier, Rodrigo Constanzo, Charalampos Saitis, Andrew Robertson, and Andrew McPherson

Website accompanying the NIME 2025 submission. See the Appendix PDF for further details, including feature extraction and hyperparameter tuning. All code for the Max/MSP patches and the data-driven experiments conducted in Python will be publicly released alongside the paper in June 2025.

Annotated Portfolio

An annotated portfolio brings together a collection of individual artefacts with annotations that help to illuminate design thinking and capture similarities and differences amongst artefacts. It serves to present material that was created in a particular design context while drawing out important features that may be interesting in a broader research context. “Annotated portfolios are, perhaps, a way of modestly reaching out beyond the particular without losing grounding” (Gaver & Bowers, 2012).

Here, we present a set of eight patches that were created during an in-studio session. A video of a short performance recording is provided for each patch, annotated with reflections on design thinking and motivations. We used the real-time timbre remapping system presented in the paper, and each patch either exemplifies a feature of the system or presents an intervention applied to overcome a limitation or achieve an aesthetic goal. A concept map highlights important features that emerged during the curation of this portfolio and connects individual patches.

Derailer

Patch Notes

This patch used Derailer by Physical Audio, a physical modeling VST instrument, for timbre remapping. The MIDI note being triggered was varied at inference time based on the spectral centroid from the real-time audio analysis (256-sample window). The spectral centroid was scaled down to a narrower range of the available MIDI notes, split into two registers corresponding to the head of the drum and the cowbell, and each register was quantized to a fixed key and scale.
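As a rough illustration of this mapping chain, the following is a minimal Python sketch of centroid-to-pitch mapping with register splitting and scale quantization. The actual patch is implemented in Max/MSP; the centroid range, note ranges, and scale used here are illustrative assumptions, not the patch's exact settings.

```python
# Minimal sketch (illustrative only; the actual patch is built in Max/MSP).
# Centroid range, register ranges, and scale are hypothetical assumptions.

C_MAJOR = [0, 2, 4, 5, 7, 9, 11]  # pitch classes of an assumed fixed scale

def centroid_to_midi(centroid_hz, surface,
                     centroid_range=(200.0, 4000.0),
                     low_register=(36, 52), high_register=(60, 76)):
    """Map spectral centroid to a quantized MIDI note in a per-surface register."""
    lo, hi = centroid_range
    norm = min(max((centroid_hz - lo) / (hi - lo), 0.0), 1.0)  # clamp to [0, 1]
    # Choose register by playing surface: drum head -> low, cowbell -> high
    note_lo, note_hi = low_register if surface == "head" else high_register
    note = round(note_lo + norm * (note_hi - note_lo))
    # Quantize to the nearest pitch class in the fixed scale
    octave, pitch_class = divmod(note, 12)
    nearest = min(C_MAJOR, key=lambda pc: abs(pc - pitch_class))
    return octave * 12 + nearest
```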

Reflections

(Author B) The long resonance of Derailer and the somewhat polyphonic nature of playback worked well with separating the two playing surfaces out into different registers. The idea here was to have a kind of “bass line” underneath a flurry of melodic pitches somewhat inspired by handpan instruments.

(Author A) Pitch and timbre interact in interesting ways in Derailer, with lower pitches creating deeper and darker timbres. Mapping spectral centroid from the input drum to pitch and splitting the ranges into different registers based on playing surfaces allowed us to create “two instruments” in a way. Quantizing the pitch mitigated the effect of subtle but random pitch variations, turning them into a more musical performance.

Modular Synthesizer

Patch Notes

This patch uses the timbre remapping approach presented in the paper applied to a Eurorack modular synthesizer patch. Five parameters are included for modulation and are controlled via CV from an Expert Sleepers ES-8. Modules and parameters used for mapping include: a Make Noise Maths, used to generate an attack-decay envelope (mapping decay); an Intellijel Shapeshifter as the main sound source (mapping FM ratio, FM index, and modulator wavetable shape); an Intellijel Polaris as a low-pass filter (mapping resonance); an Intellijel uVCA as a voltage controlled amplifier (VCA); and a Mutable Instruments Clouds for reverb. One channel is used to trigger the envelope on Maths, and another CV was mapped to the Clouds dry/wet but was removed during the performance.

Reflections

(Author B) There was a palpable fragility to the mapping here, where tiny changes in dynamics led to very wide timbral changes on the synth. This led to a more subdued playing language that tried to navigate the subtle nuances of individual notes. As a contrast, some faster playing was useful to show the melodic capabilities of this mapping and setup.

(Author A) This patch reflects a synthesizer patch that I might use in my practice with a modular synth, but with modulations on parameter values from the drum. One affordance of this patch that isn’t shown in the video, but has been musically useful, is the ability to directly modify parameter values via the knobs on the synthesizer or scale the CV values being used for modulation. This enables high-level control in conjunction with the modulation mapping and allows for collaborative performances.

808 Snare

Patch Notes

This patch used a mapping generated by the genetic algorithm crawl, which was then modified for playback by removing the modulation of one of the oscillators to limit the variability of the snare “tone”, keeping it more in line with conventional uses of an 808 snare.

Reflections

(Author B) Using the patch with the reduced pitch mapping felt much more like an 808 snare would, in the context of a backbeat. To highlight this, a more conventional backbeat with “cowbell” accents was used to demonstrate the nuance available in the core 808 hits based on the dynamic and timbral variation of the input sounds.

(Author A) Because the mapping system used the 808’s frequency controls to achieve target timbral variations, and because of our perceptual sensitivity to pitch, the resulting mappings were misaligned with our expectations. Removing frequency control altogether during playback was an effective method for re-aligning them.

Loudness Compensation

Patch Notes

This patch used a noisier Quantussy mapping as the basis, chosen for its acoustic snare-like qualities. To add additional variance, the loudness of the matched preset was compensated by computing the loudness difference between the preset and the real-time audio. The result is that loudness closely followed the performance dynamics.
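The following is a minimal sketch of how such a compensation could be computed, assuming RMS level in decibels as the loudness measure applied to numpy audio buffers; the loudness descriptor and gain staging used in the actual patch may differ.

```python
import numpy as np

# Minimal sketch of loudness compensation, assuming RMS level in dB as the
# loudness measure; the patch's actual loudness descriptor may differ.

def rms_db(x, eps=1e-8):
    """RMS level of an audio buffer (numpy array) in decibels."""
    return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)) + eps)

def compensation_gain(input_buffer, preset_buffer):
    """Linear gain that moves the matched preset's loudness to the input's loudness."""
    diff_db = rms_db(input_buffer) - rms_db(preset_buffer)
    return 10.0 ** (diff_db / 20.0)

# Usage: scale the synthesized preset audio so its loudness follows the performance.
# synth_out = preset_audio * compensation_gain(live_onset_buffer, preset_audio)
```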

Reflections

(Author B) Although this example is simply a demonstration of the impact of loudness compensation, the overall expressiveness it enabled meant that it was applied to nearly every example. Even though loudness is directly encoded in some descriptor types (i.e., hybrid descriptors and Melbands) and implicitly in the others via variations in synthesis parameters, direct loudness compensation resulted in additional dynamic variance that felt significantly more organic than when it was disabled.

(Author A) The effectiveness of loudness compensation highlights the benefit of directly connecting the dynamics of the input drum to the synthesized results, regardless of the timbre. Considering loudness/timbre interactions in conjunction with direct loudness compensation is something that we didn’t explicitly investigate, but could be an interesting line of inquiry for future exploration.

Classification + Parallel Timbre Remapping

Patch Notes

This patch is built from two parallel mappings on the Quantussy synthesizer, which are selected between during inference by a classifier trained on examples of hits on the head and on the rim of the drum. The loudness of the real-time audio input is additionally mapped to the duration of the low-pass gate that the genetic algorithm crawled through. The additional layer of a one-to-one mapping (loudness -> duration) adds morphological variety to the timbral range of the inference.
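The sketch below illustrates one way the routing could be structured in Python. The classifier type (k-nearest neighbours), the feature vectors, the mapping callables, and the duration range are all illustrative assumptions; only the head/rim labels and the loudness-to-duration mapping follow the description above.

```python
# Minimal sketch of routing between two parallel mappings via a classifier,
# plus a one-to-one loudness -> duration mapping. Classifier choice, features,
# and the duration range are illustrative assumptions.
from sklearn.neighbors import KNeighborsClassifier

class ParallelRemapper:
    def __init__(self, head_mapping, rim_mapping):
        # head_mapping / rim_mapping: callables from onset features to synth parameters
        self.mappings = {"head": head_mapping, "rim": rim_mapping}
        self.classifier = KNeighborsClassifier(n_neighbors=3)

    def fit(self, onset_features, surface_labels):
        """Train on example hits labelled 'head' or 'rim'."""
        self.classifier.fit(onset_features, surface_labels)

    def infer(self, onset_feature, loudness_norm):
        """Route to the mapping for the predicted surface; map loudness to LPG duration."""
        surface = self.classifier.predict([onset_feature])[0]
        params = self.mappings[surface](onset_feature)
        params["lpg_duration_ms"] = 20.0 + loudness_norm * 480.0  # assumed range
        return params
```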

Reflections

(Author B) Having two clear presets opened up some performance possibilities where inter- and intra-modulations were more distinct and “playable” in the instrument. The additional mapping of loudness to duration allowed for some very fast gestures and rolls, while still retaining clear articulation with the quieter and shorter sounds.

(Author A) Similar to the Derailer patch, this patch represents another approach to distinguishing between the different playing zones present on the drum, effectively creating “two instruments.” The classification approach enables two different mappings to be included in parallel, meaning that each parallel path can be modified independently of the other, for example by using two different synthesizers or different feature sets.

Multiple Timescale Morphing

Patch Notes

This patch uses three different mappings, each generated from the same anchor preset on the Quantussy synthesizer, but each focusing on a different window size (timescale): 256, 4410, and 11025 samples, respectively. The idea was to capture the morphology of the synthesizer sound using features computed at each timescale, and then map to synthesizer parameters for each timescale using timescale regressors.

At inference time, the presets are then morphed through (via linear interpolation applied to the synthesizer parameters) based on the morphology of the original synthesizer sound: the preset corresponding to 256 samples is triggered immediately, the preset corresponding to 4410 samples is reached after 4410 samples, and the preset corresponding to 11025 samples after 11025 samples. This creates an evolving synthesizer sound.
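A minimal sketch of this morphing schedule is shown below, assuming each preset is a dict of normalized synthesizer parameters. The schedule times follow the description above; everything else (data structures, function names) is an assumption rather than the Max/MSP implementation.

```python
# Minimal sketch of the preset morphing schedule. Presets are assumed to be
# dicts of normalized synthesizer parameters; the schedule follows the text above.

SCHEDULE = [0, 4410, 11025]  # samples after onset at which each preset is reached

def interpolate_presets(preset_a, preset_b, t):
    """Linear interpolation between two parameter dicts, t in [0, 1]."""
    return {k: (1.0 - t) * preset_a[k] + t * preset_b[k] for k in preset_a}

def preset_at(samples_since_onset, presets):
    """Return interpolated parameters for a point in time after note onset."""
    if samples_since_onset <= SCHEDULE[0]:
        return presets[0]
    if samples_since_onset >= SCHEDULE[-1]:
        return presets[-1]
    for i in range(len(SCHEDULE) - 1):
        t0, t1 = SCHEDULE[i], SCHEDULE[i + 1]
        if t0 <= samples_since_onset < t1:
            t = (samples_since_onset - t0) / (t1 - t0)
            return interpolate_presets(presets[i], presets[i + 1], t)
```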

The interface also allowed for transforming and distorting the preset interpolation in a manner similar to an attack-decay-release (ADR) envelope in conventional synthesis topologies. This allowed us to compress or stretch the time spent in each preset.

During inference, loudness and spectral centroid were also mapped to the interpolation/timbral morphology duration (making it shorter or longer overall), as well as to the attack and release of the ADSR applied to the Quantussy output.

Reflections

(Author B) This combination of processes, incorporating multiple timescales of feature mappings and real-time analysis via timescale regression, along with additional layers of morphological mappings based on real-time descriptor analysis, felt the most expressive and dynamic of all the mappings. The nuance and variability afforded by the layers of mappings created a responsive and expressive instrument in terms of timbral, morphological, and dynamic playability. Much like in the classification example, the direct control of duration also enabled incredibly fast gestures without any loss in clarity.

(Author A) Early experiments with a prototype mapping system identified that sonic morphology of our input sounds mapped poorly onto synthesizer parameters responsible for controlling morphology (e.g., envelope parameters). This patch shows one method for overcoming this limitation through computation of audio features over different timescales and explicit mapping of presets over time.

Latent Feature Control

Patch Notes

This patch uses the same set of mappings as the timescale morphing example, on the same Quantussy synthesizer, but makes explicit use of “latent” manipulation of the feature differences from the anchor that are used to compute the nearest matching preset. In the video, one author adjusts “latent controls” via sliders (shown in the top right) while the other performs on the drum. The top slider adjusts the range and offset of spectral centroid, spectral skewness, and 5% spectral rolloff. The second slider adjusts the range and offset of spectral spread.
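The sketch below illustrates one way such a control could operate in Python: a shift-and-scale (affine) adjustment applied to grouped feature differences before nearest-preset matching. The feature groupings follow the sliders described above; the control ranges and data structures are assumptions.

```python
# Minimal sketch of a "latent control": an affine (shift + scale) transform
# applied to feature differences before nearest-preset matching. Groupings
# follow the sliders described above; the structures are assumptions.

LATENT_GROUPS = {
    "brightness": ["spectral_centroid", "spectral_skewness", "spectral_rolloff_5"],
    "spread": ["spectral_spread"],
}

def apply_latent_controls(feature_diffs, controls):
    """Shift and scale grouped feature differences.

    feature_diffs: dict of feature name -> difference from the anchor
    controls: dict of group name -> (scale, offset) set by the performer's sliders
    """
    adjusted = dict(feature_diffs)
    for group, (scale, offset) in controls.items():
        for name in LATENT_GROUPS[group]:
            adjusted[name] = adjusted[name] * scale + offset
    return adjusted
```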

Reflections

(Author B) Although this example is simply a demonstration of the “latent” space scaling, this kind of adjustment and fine-tuning was profoundly helpful when evaluating the expressivity of any given mapping. There were many examples where the preset encoded the timbral difference well, but the resultant mapping felt “too bright” or “too harsh”. Being able to manipulate and scale the “latent” space this way made these kinds of presets still useful and expressive.

(Author A) The “latent controls” apply affine transforms (i.e., shift + scale) to feature differences, and were surprisingly effective at enabling high-level modifications. The effect was similar to the modular patch, where high-level modifications were applied directly on the synthesizer; here, however, they are applied at an intermediary stage within the mapping. The extent of modification is limited by the range of sounds “discovered” during the genetic algorithm search and could likely be improved with larger searches or with a more general parameter mapping neural network (e.g., Le Vaillant & Dutoit (2024)).

Multiple Timescale + Multiple Feature Morphing

Patch Notes

This patch is similar to the timescale morphing example, using a Quantussy synthesizer, but uses different feature types for each temporal slice of the preset. This allowed features more tailored to different areas of the sound: for example, MFCCs computed with a window of 256 samples were used to capture the transient phase of the synthesized sound, which we found beneficial due to their ability to encode complex timbral variety. This patch also made use of the loudness and spectral centroid mappings tied to playback morphology and some “latent” feature manipulation to shift the overall voicing of the presets.
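A minimal sketch of how per-timescale feature choices might be expressed is shown below. The MFCC choice at 256 samples follows the description above; the other feature selections, the extractor dictionary, and the function names are illustrative assumptions.

```python
# Minimal sketch of per-timescale feature selection. MFCCs at 256 samples follow
# the description above; the other feature choices are illustrative assumptions.

TIMESCALE_FEATURES = {
    256:   "mfcc",            # transient phase: complex timbral variety
    4410:  "spectral_shape",  # assumed: mid-term spectral descriptors
    11025: "spectral_shape",  # assumed: longer-term spectral descriptors
}

def features_for_slice(audio, window_size, extractors):
    """Compute the feature set configured for this timescale over its window.

    extractors: dict mapping feature-set name to a callable over an audio segment.
    """
    return extractors[TIMESCALE_FEATURES[window_size]](audio[:window_size])
```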

Reflections

(Author B) Much like in the Timescale Morphing patch, this complex and multidimensional mapping schema creates an expressive instrument with clear detail and articulation for the different types of sounds coming from the muffled drum, rim, and cowbell.

(Author A) Different audio features may be more or less suited to capturing sonic phenomena that are present at different times during the evolution of a sound. This patch serves to highlight the ability of our timbre remapping system to support this sort of audio feature pick-and-mix to draw out different sonic attributes across the evolution of a sound. Although not thoroughly investigated here, it points to a line of future work investigating morphology with audio features.

Concept Map

Several concepts emerged from the patch examples and annotated reflections. The graph below presents a visual summary connecting concepts and patches, where circle nodes represent the different patches (click a circle to navigate back to the corresponding video) and squares represent concepts. A brief description of each concept is provided below the graph.


Direct Mapping: Mappings made in addition or in parallel to timbre remapping. These include directly mapping the input loudness to the synthesizer loudness, or mapping specific features to controls such as envelope parameters to create a more dynamic response to the input.

High-Level Control: Patches that enabled high-level modifications to mappings/synthesized sounds.

Mapping Intermodulations: Distinguishing between intermodulations more explicitly through mapping, where intermodulations refer to larger timbral differences coming from different sound sources (e.g., drum head vs. rim). These mappings sought to create more sonic distinction between these sources and used them for timbral performance.

Morphology: Evolving timbral and amplitude attributes over time through interpolation across multiple presets or with explicit amplitude envelopes.

Navigating Pitch: Intervening on pitch mappings by reducing the emphasis of frequency controls or by quantizing pitch controls and explicitly mapping to MIDI notes.

References

Gaver, B., & Bowers, J. (2012). Annotated portfolios. Interactions, 19(4), 40–49. doi.org/10.1145/2212877.2212889

Le Vaillant, G., & Dutoit, T. (2024). Latent Space Interpolation of Synthesizer Parameters Using Timbre-Regularized Auto-Encoders. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 32, 3379–3392. doi.org/10.1109/TASLP.2024.3426987