

Multitrack Mixing: An Investigation into Music Mixing Practices

Thesis · March 2018


DOI: 10.13140/RG.2.2.26537.49767


1 author:

Josef Tot
Staffordshire University

All content following this page was uploaded by Josef Tot on 27 April 2018.



Multitrack Mixing
An Investigation into Music Mixing Practices

By Josef Tot

MSc Music Technology

A project submitted in partial fulfilment of the award of the degree
of Master of Science from Staffordshire University

Supervised by Dr. Dave Payling, March 2018

School of Computing and Digital Technologies


Abstract

The presented work explores music mixing practices and workflows commonly used
by experienced authors and acclaimed mix engineers. A historical overview traces
the roots of mix engineering up to the first use of multitrack tape recording and
to today's view of mix engineering as creative “mix artistry”. Ten participant
mixes were analyzed and perceptually evaluated by 21 assessors, and the resulting
feedback was compared with dynamic and spectral audio features of the multitracks
and final mixes. Additional surveys recorded 29 subject responses, giving insight
into mixing workflows and common practices used by engineers. Further literature
research extracted 800 equalization and 300 compression rules derived from author,
tutorial and manufacturer mixing recommendations. Text mining of 195 mixing
session transcripts of critically acclaimed mix engineers explored mix engineering
data using natural language processing techniques. The reviewed literature
indicated that audio perceptual evaluation (APE) is an efficient methodology for
collecting subjective preferences, hence it was used to collect the perceptual
feedback of 21 mix engineers. The collected data showed agreement among
participants' ratings of mixes of higher and lower perceived quality. The analysis
combined assessor comments and ratings with dynamic and spectral audio feature
data to identify qualitative differences between the mixes. Combining perceptual
data and features with session settings further allowed a better estimate of
which parameters and mixing strategies may contribute to the enhancement of
mixes.

Table of Contents

Introduction
Mixing History and Analogies
Literature Review
Mixing Session & Survey Analysis
Invitation of Mix Engineers & Preparation
Organization & Workflow
Submixing, Color Coding & Track Order
Mixes Dynamic Properties
Mixes Spectral Properties
Audio Perceptual Evaluation
Perceptual Evaluation Participants Background
Perceptual Evaluation Ratings
Perceptual Evaluation Feedback Analysis
Acclaimed Engineers Semantic Analysis
Conclusion
Appendix
References

Introduction

The aim of this project is to measure subjective preferences toward mixes and to analyze the
parameters used to create music mixes. The subjective feedback of 21 assessors on 10 mixes
is combined with mixing surveys, mixing parameters from literature and acclaimed mix
engineers' sessions to explore workflows and techniques used in popular music mixes. The
aims and objectives can be broken down as follows:

• Investigation of literature and manufacturer mixing parameters
• Investigation of “real-world” mix workflows, parameters & audio features
• Analysis of audio perceptual feedback and sentiment classification
• Analysis of popular mixing engineers' workflows using text analysis techniques

Historical advancements in audio technologies enabled the recording, transmission and
duplication of sound for the masses. Developments such as the phonautograph and the subsequent
phonograph made it possible to record, replicate and commercialize music recordings, further
enabling the development of advanced recording and mixing technologies. The relatively recent
introduction of digital audio workstations and virtual studio technologies democratized the
production of music itself (Owsinski, 2017). Apart from reducing recording costs and enabling
home studio owners to achieve high quality recordings, these developments also opened access to
an integral part of the music production chain: mix engineering (Huber & Runstein, 2014).

This project examines the process of mix engineering, which is undertaken after a recording is
finalized and before a song (final mix) is sent to mastering. In the context of music production,
terms such as mix engineering, mixing, and mixdown are used synonymously for multitrack mixing
or a studio mix of recorded music. Broadly, the objective of mixing is to find and apply the right
settings of audio processing parameters and effects to multiple audio sources, resulting in a
desirable sounding final mix. Fundamentally, mixing is a technical art form in which a usually
experienced mix engineer adjusts, through many iterations, the dynamic, spectral, and spatial
audio features of multiple audio tracks to achieve a perceptually cohesive mix (Izhaki,
2018). In other words, audio signal processing tools such as equalization, dynamic range
compression, reverberation, delay and modulation, as well as level adjustments and panning, are
applied to balance multiple audio tracks dynamically, tonally and spatially and so achieve a
technically but, more importantly, creatively adequate stereo mix. Instrument, vocal and FX
recordings are commonly referred to as multitracks or multitrack stems. They contain the final
recordings (takes, composites, edits etc.) from prior recording sessions, usually as individual
audio files such as consolidated WAV or AIFF files, commonly of the same length
(Nichols, 2017).

Naturally, a fundamental question emerges when one is introduced to the subject: “how do mix
engineers make a great mix?”. Unfortunately, research into the perceptual side is fairly limited,
and understanding the subject relies on a few textbooks and, above all, on experience or
concepts that give little insight into why a certain mixing technique led to a great sounding
mix. Furthermore, achieving consensus on what great mixes are is only possible by
understanding the language used to describe negative and positive subjective preferences
towards audio features. This requires bridging the gap between an emotional response and
descriptions of audio stimuli using natural language.

Mixing History and Analogies

The invention of magnetic tape first spawned multitrack mixing as it is known today.
However, mixing, or the notion of a “right sound” arising from a mix of audible sources
and characteristics, can be traced back much earlier. Arguably, the earliest form of mixing
occurred simply when a performer played a musical piece in a way that complemented a
song's tonal and dynamic balance (Izhaki, 2018).

A study of three painted caves in France proposes that the locations of Paleolithic paintings
strongly correlate with acoustically resonant spaces within cave systems, suggesting that
resonant spaces played a key role as far back as 20,000 BC (Scarre, 1989). Similar to how the
archaeologists produced a series of tones to determine points of resonance, Paleolithic man
would, through trial and error, have played an early musical instrument or chanted through cave
systems until finding resonant ceremonial spaces yielding a better sound for ritualistic purposes.

As far-fetched as it may seem to compare the process of Paleolithic man with mixing, the core
process of iterating until a “good sound” is found is similar to mixing, where engineers iterate
through processing parameters until a pleasant sound is achieved. The underlying idea closely
parallels the process shown below (fig. 1), which represents a fundamental concept of mix
engineering:

Fig. 1 “The three steps of creative mixing” (Izhaki, 2018, p.21)

Here, a subject first envisions a goal (“I want a good sound”), then acts on it through gathered
knowledge or experimentation, and finally evaluates the outcome, repeating the cycle until the
best solution is found.

A much later example may be the 12th century composer of polyphonic church music, “Perotin the
Great”, who considered the “right mix” of compositional approaches (note articulation, timbre,
spacing of notes etc.) for the acoustic spaces he composed for, such as when commissioned to
compose for the French cathedral of Notre-Dame (Cox, 2014).

Centuries later, more advanced uses of spatial effects in music can be found in the work of the
16th century composer Giovanni Gabrieli, who composed sacred vocal and instrumental music
that deliberately utilized the acoustic characteristics of St. Mark’s church in Venice, lending his
compositions antiphonal, echoing and reverberant features (Cox, 2014). Notably, he placed
choirs and ensembles in various parts of the church, experimenting until the desired effects
were created.

Consequently, a musical piece written for one space may not always have sounded adequate in
other spaces. As a result, words describing negative characteristics of sound emerged. An
example is the 19th century Romantic composer Hector Berlioz commenting on hearing works by
Haydn and Mozart played in a large space: “The symphonies of Haydn and Mozart
(generally speaking, works of a rather intimate kind), when played by an inadequate orchestra in
a building far too large and acoustically unsuitable, produced about as much effect as if they
had been performed in the Plaine de Grenelle. They sounded small, frigid, and incoherent.”
(Eustace, n.d.). Given descriptors such as “small, frigid and incoherent”, an
experienced mix engineer might already have various solutions at hand to overcome a poor mix.

The later invention of recorded media dramatically changed the way music was perceived and
enabled the replication of music for the masses, consequently demanding further audio
enhancement technologies.

The French inventor Edouard-Leon Scott de Martinville patented the “phonautograph” in 1857,
the first device to deliberately record sound (UCSB, 2005). As explained in the NPR podcast
“1860 'Phonautograph' Is Earliest Known Recording” (Feaster & Flatow, 2008), audio
engineers at Indiana University in Bloomington were able to reconstruct the
“phonautograms”, which were made on a sheet of soot-covered paper by the vibration of a
membrane and bristle. The result is the earliest recording to date of a human voice and of a
vocal music performance. Although it is not known whether Scott de Martinville recorded multiple
sources simultaneously, he must have positioned himself strategically around the membrane to
capture an adequate representation of the sound wave.

Edison’s later phonograph first made it possible to play back a recorded medium. It used
malleable tin foil as the physical recording medium and a stylus attached to a membrane at the
end of a horn, so that the vibrations of the membrane caused the stylus to cut a groove into the
foil. On playback, the groove caused the membrane to move in sympathy, so that the air in the
horn reproduced the sound. Commercial distribution, however, was made possible by Berliner’s
gramophone discs, which were simpler to replicate (Rumsey & McCormick, 2013).

Notably, early recordings onto acetate disc required the whole band to play in one room,
where the recording engineer strategically positioned band members around the horn and
instructed them to play as loud as possible (Winer, 2012). Here too, the engineer repositioned
individuals until the best sound (“mix”) was achieved.

Later, magnetic tape technology and the commercialization of tape recorders in the late 1940s
enabled the recording of multiple audio sources onto tape machines. This was an important
development, as it was also first utilized for multitrack recordings, as explained by Bill Putnam
in an interview with the AES (1989):

“I was fortunate to be on the show only as it related to having done the first multiple voice
recording. I did the original tracks, the rhythm tracks on disk at 78 rpm on the outside diameter
of a 17 1/4 inch disk. Then we did the second generation and the rhythm track with one vocal
part, usually the lead.” (AES, 1989, p. 730).

Multitrack recording was then further popularized by Les Paul, who commissioned AMPEX to
build the first 8-track recording deck, which he routinely used for his “sound on sound”
technique in the 1950s (Juried, n.d.).

Today, advancements in discrete audio and mixing tools (equalization, dynamic range
compression etc.) have come so far that they introduce virtually no degradation of audio
signals while letting users manipulate audio sources with minimal effort (e.g. in a DAW). This
in turn gave rise to creative descriptions of plugin parameters used to achieve perceptually
adequate characteristics in music mixes and progressively turned mix engineering into more of
a “mix artistry”. This is best explained by the acclaimed mix engineer Michael Brauer at a recent
AES convention, as seen in the Mix With The Masters presentation “Michael Brauer live
workshop at AES NY 2017” (2018):

“Whether it’s a compressor or it’s an EQ, I don’t look at it as a piece of gear or piece of software,
I’m just thinking of what’s an attitude? What am I trying to feel? It’s always based on that, it’s
always been based on that. …you know I look at the gear and I don’t see like a motown EQ, I
just think: ”ah man, that’s gonna add a sort of “fatness” on the kick and then I can scan through
all my compressors and say oh that’s “silky”, “nasty”, “rough”, you know “smooth”. …let’s say I
got a vocal and I go like: ”man that vocal really lacks urgency”, so I’ll just scan through my stuff
you know (outboard compressors)...or: ”ah that guitar man, if I can just get more “bump” or real
“fatness” and get it “nasty” so you know I just go through my software (plugins) and look…” (Mix
With The Masters. 2017, 1:13 min.).

Engineers thus conceptualize technical parameters under terms such as “silky”,
“rough”, “smooth” etc. and “play” the mixing tools like a musical instrument.

Literature Review

Empirical research publications on automated mixing technologies offered the best resource
for capturing the essence of mix engineering while connecting perceptual characteristics with
technical parameters. On a practical or conceptual level, the tutorials and books of experienced
mix engineers made these explanations applicable through concepts.

Mixing rules from various books and plugins were extracted and combined with existing
collections, amounting to a total of ~1300 rules. Some literature sources were entirely dedicated
to mixing, while the majority were either recording or music production books that include
chapters dedicated to the mixing process. More concise books were often practically inclined,
supplying information as “tips”, while others were practical tutorials showcasing exact parameter
settings used with a set of specialized manufacturer plugins. A notable difference between
practical tutorials and theoretical literature was that the latter used less drastic parameter
settings.

Notably, computer music science labs and audio engineering centers are the ones most commonly
attempting to automate the mixing process. As to the definition of mixing, the engineering
literature agrees that it is a task of expertise and “non-linear” in nature, which nevertheless
must be measured empirically for machines to be able to understand the process. Goals vary
between computer-assisted mixing and full automation of the process, using knowledge-engineered
processes, grounded theory, or artificial intelligence and machine learning algorithms (ideally
a combination thereof), as seen in Semantic Audio’s video “Intelligent Music Production:
Challenges, Frontiers and Implications - Josh Reiss” (2015).

To formulate what the mixing process entails, it is important to analyze the processor
settings used by experienced mix engineers. A great deal of groundwork has been laid by
researchers at QMUL and Birmingham City University, primarily investigating autonomous
mixing systems. One of several studies, by De Man & Reiss (2013), concluded:

“We have found such a knowledge-engineered system performs well, with no measured
difference in subject preference from professional human mixes, and outperforming a system
based on recent automatic mixing work using only low-level features to inform mixing
decisions.” (De Man & Reiss. 2013, p. 18).

In this study, researchers extracted semantic mixing rules from textbooks and applied the
collected parameter settings to a set of multitracks, while the exact same multitrack material
was mixed in parallel by a group of professional engineers. Perceptual data showed no
significant difference in perceived quality; hence, for the practical purposes of this study,
using presets and standards derived from literature can greatly speed up the mixing process.

Furthermore, the work laid out primarily by De Man (2017) was extended by collecting additional
literature resources, amounting in total to ~800 equalization and ~300 compression rules, as well
as various suggestions on the order of mixing processes and on common effects processors
used to create perceived width, depth and space in mixes. The information was collected from
authors who are renowned engineers, the majority of whom have worked with critically
acclaimed artists and have therefore gathered substantial experience.

Although many engineers stress the importance to view such rules as mere guidelines, a
database of rules may further improve the accuracy.

Best practices and recommendations (EQ, compression etc.) were gathered extensively by De
Man (2017), covering the following literature sources (titles only):

● Mixing Audio: Concepts, Practices and Tools
● Mix Smart: Pro Audio Tips for Your Multitrack Mix
● Mixing Secrets
● Sound FX: Unlocking the Creative Potential of Recording Studio Effects
● The Mixing Engineer's Handbook
● Guerrilla Home Recording: How to Get Great Sound from Any Studio (No Matter How Weird or Cheap Your Gear Is)
● The Art of Mixing: A Visual Guide to Recording, Engineering, and Production
● Basic Effects & Processors
● Compressors: Exploration
● Basic Mixers
● Practical Mastering: A Guide to Mastering in the Modern Studio
● Complete Audio Mastering: Practical Techniques
● Mastering Audio
● Modern Recording Techniques
● Zen and the Art of Mixing

These were additionally combined with parameter recommendations from the following literature
sources (titles only):

● The Audio Expert
● Pro tools 9, The Mixers Toolkit
● Mixing and Mastering in the Box
● Dance Music Manual
● The Systematic Mixing Guide
● Audio Engineering 101
● Mastering Pro Tools Effects
● Idiots Guides to Mixing Music
● Recording Tips for Engineers
● Producing in the Home Studio with Pro Tools
● Production, Mixing, Mastering with Waves
● Pro tools 7 Session Secrets
● Desktop Mastering
● Pro Tools all in one Desk Reference
● Remixers Bible
● Creative Sequencing Techniques
● The Secrets of House Music Production
● Practical Recording Techniques
● Audio Post for Video Production

EQ and compression rules were parsed into spreadsheets available through the links below:

Literature EQ Practices & Spectral Characteristics Recommendations:

https://docs.google.com/spreadsheets/d/1xzogs5nnUP-j5bbWCJWvv6KNGJRZaJvDsZJ0SDSup-Y/edit?usp=sharing

Literature & Manufacturer Compression Recommendations:

https://docs.google.com/spreadsheets/d/1m3rWn9ID1oH_BmNa_skakwBI1EhvOxsfXkmOknMP_Hs/edit?usp=sharing

The parsed rules are categorized by audio track, descriptor and the relevant parameter
suggestions. The excerpt below (fig. 2) shows the basic categories:

track name   descriptor   filter   subtractive (Hz)   additive (Hz)
kick         bottom       bell                        60-100
kick         attack       bell                        1000-3000
kick         click        bell                        4000-5000

Fig. 2 “Literature EQ Practices & Spectral Characteristics Recommendations”

The example above shows the “kick” track being treated by an author with additive
equalization for more perceived “bottom”, “attack” or “click”, together with the corresponding
frequency range in Hz.
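
The categories of such a rule can be sketched as a small data structure. This is only an illustrative sketch: the class name `EqRule`, the field names and the helper `find_rules` are hypothetical and do not describe the layout of the actual spreadsheet. Only the three kick rows come from the excerpt.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EqRule:
    """One parsed EQ recommendation (hypothetical structure)."""
    track: str
    descriptor: str
    filter_type: str
    mode: str  # "additive", "subtractive" or "neutral"
    freq_range_hz: Tuple[float, float]


# The three rows of the excerpt; all are additive bell boosts on the kick.
RULES: List[EqRule] = [
    EqRule("kick", "bottom", "bell", "additive", (60.0, 100.0)),
    EqRule("kick", "attack", "bell", "additive", (1000.0, 3000.0)),
    EqRule("kick", "click", "bell", "additive", (4000.0, 5000.0)),
]


def find_rules(rules: List[EqRule], track: str,
               descriptor: Optional[str] = None) -> List[EqRule]:
    """Return all rules for a track, optionally narrowed to one descriptor."""
    return [
        rule for rule in rules
        if rule.track == track
        and (descriptor is None or rule.descriptor == descriptor)
    ]
```

A lookup such as `find_rules(RULES, "kick", "attack")` then returns the single rule suggesting a boost between 1000 and 3000 Hz.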

Among the many rules, several widespread equalization techniques were observed frequently,
such as high-pass filtering cymbals with cutoff frequencies as extreme as 700 Hz (Egizii,
2004). Usually, the higher the cutoff of the high-pass filter, the gentler the roll-off
(6 dB/oct), whereas much steeper roll-offs (18-24 dB/oct) were used when high-passing kick
drum “rumble” (<40 Hz) (Adam & Ward, 2007).
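
To give a sense of what these slopes mean in practice, the following sketch computes the attenuation of a high-pass filter below its cutoff. It assumes a Butterworth response, which the cited authors do not specify; under that assumption, slopes of 6, 18 and 24 dB/oct correspond to filter orders 1, 3 and 4. The function names are illustrative.

```python
import math


def butterworth_highpass_attenuation_db(freq_hz: float, cutoff_hz: float,
                                        order: int) -> float:
    """Attenuation in dB of an ideal Butterworth high-pass at freq_hz.

    Uses |H(f)|^2 = 1 / (1 + (fc / f)^(2n)); the result is positive for loss.
    """
    ratio = cutoff_hz / freq_hz
    return 10.0 * math.log10(1.0 + ratio ** (2 * order))


def nominal_slope_db_per_octave(order: int) -> float:
    """Asymptotic roll-off far below the cutoff (about 6 dB/oct per order)."""
    return 20.0 * math.log10(2.0) * order


# One octave below the cutoff: gentle first-order filter on cymbals (700 Hz)
# versus a steep fourth-order filter on kick drum rumble (40 Hz).
cymbal_loss_db = butterworth_highpass_attenuation_db(350.0, 700.0, order=1)
rumble_loss_db = butterworth_highpass_attenuation_db(20.0, 40.0, order=4)
```

Under these assumptions the first-order filter removes roughly 7 dB one octave below its cutoff, while the fourth-order filter removes roughly 24 dB, which is why steep slopes suit narrow problem regions such as sub-40 Hz rumble.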

The overall data was sparse, as nearly every author specified only a few elements of a rule,
such as the descriptor (e.g. “rumble”) and its location within the frequency spectrum. Exact
parameters such as Q width and gain (dB) were mentioned less commonly. If an author described a
technique as adding something to a mix without specifying gain values, it was parsed into the
“additive” category. If an author presented data without exact information on how to use the
frequency range, it was parsed as a “neutral” attribution. Occasionally an author gave
additional comments to be taken into consideration when using the EQ techniques, which were
parsed into the “comment” category.

Similarly, compression techniques were collected and parsed into a separate spreadsheet, as
demonstrated in the excerpt (fig. 3) below:

track             descriptor           threshold (dB)   gain reduction (dB)   ratio (n:1)   attack (ms)   release (ms)
vocal             pump, in your face   -2               more                  4-6           <1            40
electric guitar   slight punch         -1               peaks                 2-3           25-30         200
bass guitar       control              -4               peaks                 2.5-3         40-50         180

Fig. 3 “Literature & Manufacturer Compression Recommendations”

Again, not all parameters were mentioned, and some values were given only in very
general terms (e.g. “more”, “peaks only” etc.). Various concepts for the perceived
character of compression were common, such as using the tool only to catch “peaks”, which
implies rather sparing use compared to extreme uses such as “squashing” or “slamming” a
track with generally “severe” ratios, e.g. 8:1 (Krug, 2013). In general, fast attack settings
(0.01-10 ms) “kill” transients, whereas slower attack times (>50 ms) let transients pass to
create more “punch”. It was commonly agreed that release time depends on the song's tempo and
that, in general, the compressor should recover before the next critical transient (Dittmar, 2013).
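
The relationship between threshold, ratio and gain reduction, and the tempo dependence of the release time, can be sketched as follows. This is a simplified illustration, not a model of any particular compressor: it assumes a hard-knee static curve and assumes that critical transients fall on a regular rhythmic grid. All function names are hypothetical.

```python
def gain_reduction_db(input_db: float, threshold_db: float,
                      ratio: float) -> float:
    """Static gain reduction of a hard-knee compressor (assumed model).

    Above the threshold, every `ratio` dB of input yields 1 dB of output.
    """
    if input_db <= threshold_db:
        return 0.0
    overshoot = input_db - threshold_db
    return overshoot - overshoot / ratio


def note_interval_ms(bpm: float, notes_per_beat: float = 1.0) -> float:
    """Time between successive notes at a given tempo and subdivision."""
    return 60000.0 / (bpm * notes_per_beat)


def release_recovers_in_time(release_ms: float, bpm: float,
                             notes_per_beat: float = 1.0) -> bool:
    """Check whether the release time is shorter than the note interval.

    Simplification: treats the release setting as the full recovery time.
    """
    return release_ms < note_interval_ms(bpm, notes_per_beat)
```

For example, a 200 ms release fits between eighth notes at 120 BPM (250 ms apart) but not at 160 BPM (187.5 ms apart), which illustrates why the literature ties release settings to tempo.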

Panning rules generally recommend keeping heavy percussive elements mono, while “lighter”
elements (e.g. cymbals, hats, shakers) are spread wider, creating more width and separation
within the drum kit. Occasionally it is recommended to pan the snare +25-45%. Bass-heavy
instruments are generally mono, while e.g. guitars are panned to opposite sides (commonly
-100/+100%). The authors' panning rules were parsed into a spreadsheet available through the
link below:

https://docs.google.com/spreadsheets/d/1anRuOnjoz0M9eGHJBQQsANu8gsXtNRT8KqOi_xHKxpM/edit?usp=sharing
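
How a pan percentage translates into left and right channel gains depends on the pan law of the mixer or DAW, which the collected rules do not specify. As a minimal sketch, assuming a constant-power (-3 dB center) pan law, the mapping could look like this:

```python
import math
from typing import Tuple


def constant_power_gains(pan_percent: float) -> Tuple[float, float]:
    """Map a pan position (-100 = hard left, +100 = hard right) to L/R gains.

    Assumes a constant-power pan law; real mixers may use other laws.
    """
    if not -100.0 <= pan_percent <= 100.0:
        raise ValueError("pan_percent must be between -100 and +100")
    # Sweep a quarter circle so that left^2 + right^2 stays equal to 1.
    angle = (pan_percent + 100.0) / 200.0 * (math.pi / 2.0)
    return math.cos(angle), math.sin(angle)


snare_left, snare_right = constant_power_gains(35.0)   # within the +25-45% range
kick_left, kick_right = constant_power_gains(0.0)      # mono, center
```

With this law a centered source is attenuated by about 3 dB in each channel, so its summed acoustic power matches that of a hard-panned source.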

Similarly, send effects recommendations were collected into the following spreadsheet:

https://docs.google.com/spreadsheets/d/1GFyLdT9JtXC-S0NOHm9mTX9S1xiMyWleCob3gnFVpVw/edit?usp=sharing

As mixing is a non-linear and complex process that follows an organic approach, it is very
difficult to map the order of individual processes. Nevertheless, based on the various authors'
recommendations and the order in which they present their approach, an approximate map was
attempted for visual purposes, available through the following link:

https://drive.google.com/file/d/1l6s9ua-kvqfhSoXZjVxZgsKXaP12JLC5/view?usp=sharing

One of many such processes, the use of “subtractive” processing before adding FX, is
commonly considered a logical approach, as additive processing on top of untreated audio may
increase noise or amplify excessive resonance and unwanted artefacts. Additionally, identifying
frequently equalized instruments may allow tracks to be set up with EQs and common settings
in advance. A common next processing unit in the chain may be a compressor, again with a common
setting derived from literature or an averaged setting considering all recommendations for the
instrument in question. The organizational approach can also potentially be sped up greatly by a
template that includes a series of common instruments (respecting the engineer's genre routines)
as well as color coding, grouping, session markers, sub mixes, aux FX, mix buss etc.
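
An averaged setting of this kind could be derived from the parsed spreadsheet values roughly as sketched below. The sketch assumes that numeric ranges such as “2.5-3” are reduced to their midpoint and that qualitative or open-ended entries such as “peaks” or “<1” are skipped; both choices, as well as the function names, are illustrative assumptions rather than the method used in this project.

```python
import re
from statistics import mean
from typing import Iterable, Optional

# Matches "200", "-2" or a range such as "2.5-3" (low-high).
_RANGE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)(?:\s*-\s*(\d+(?:\.\d+)?))?\s*$")


def midpoint(value: str) -> Optional[float]:
    """Return the midpoint of a numeric entry, or None if it is not numeric."""
    match = _RANGE.match(value)
    if match is None:
        return None
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) is not None else low
    return (low + high) / 2.0


def averaged_setting(values: Iterable[str]) -> Optional[float]:
    """Average the midpoints of all numeric recommendations for a parameter."""
    numeric = [m for m in (midpoint(v) for v in values) if m is not None]
    return mean(numeric) if numeric else None


# Hypothetical ratio recommendations from three sources for one instrument.
suggested_ratio = averaged_setting(["2-3", "2.5-3", "peaks"])
```

Here the two numeric entries reduce to 2.5 and 2.75, giving an averaged ratio of 2.625, while the qualitative entry is ignored.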

Even considering all recommendations, however, mixing is far more complex than current
literature research may be able to explain. Compressors, for example, come in various forms
such as FET, optical, tube (vari-mu), VCA etc., each of which may have a different perceptual
characteristic: an analog-modelled tube compressor may introduce more harmonic
distortion than a VCA (Universal Audio, 2009). Similarly, there is a tremendous
number of equalizers, the majority of which will probably have a different perceptual character
because of manufacturing choices (e.g. digital or analog emulation), overall circuit design, or
the use of minimum phase (IIR filters) vs linear phase (FIR filters) (MiniDSP, n.d.). Also,
considering that many popular plugins are analog emulations, engineers need to ensure that the
input signal is at nominal operating level, respecting the manufacturer's dBu calibration,
e.g. -18 dBFS (Waves, 2017).
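
The gain staging implied by such a calibration can be illustrated with a few decibel conversions. The sketch below takes the -18 dBFS figure from the text as the nominal level; how the signal level is measured (peak, RMS or otherwise) and the exact reference differ between manufacturers and are left as assumptions.

```python
import math


def dbfs_to_linear(dbfs: float) -> float:
    """Convert a level in dBFS to linear amplitude (1.0 = full scale)."""
    return 10.0 ** (dbfs / 20.0)


def linear_to_dbfs(amplitude: float) -> float:
    """Convert a positive linear amplitude to dBFS."""
    return 20.0 * math.log10(amplitude)


def trim_to_nominal_db(measured_dbfs: float,
                       nominal_dbfs: float = -18.0) -> float:
    """Gain change in dB that brings a measured level to the nominal level."""
    return nominal_dbfs - measured_dbfs


# A track measured at -10 dBFS would need 8 dB of attenuation before
# entering an emulation calibrated for -18 dBFS.
trim_db = trim_to_nominal_db(-10.0)
```

In linear terms, -18 dBFS corresponds to an amplitude of roughly 0.126 of full scale.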

The non-uniformity with which plugins are manufactured further complicates the process of
deriving a streamlined workflow and multiplies the number of “if then” scenarios almost without
limit. This can only be mitigated by collecting data and by setting up a strategic workflow
tailored to specific circumstances, while at the same time keeping up to date with musical and
technological advancements.

Relatively recent research into automated mixing tools led researchers to develop an
intelligent audio workstation (IAW), accessible through an internet browser, which allows the
manipulation of mixes in a way similar to standard digital audio workstations such as Pro Tools.
Advanced functions enabled the extraction of participants' usage metrics, as stated by Jillings &
Stables (2017):

“In an early experiment, we were able to use the DAW to discover trends in users producing
balance mixes [8]. The results showed that performing a mix on an excerpt of a song takes 100-
150 actions over 8-16 minutes of work (an interaction every 4-10 seconds).” (Jillings & Stables,
2017, p.3)

Further data extraction showed which actions were performed and how often, as seen in the table
(fig. 4) below, taken from the authors' publication:

Fig. 4 “Count of all action types when creating a balance mix” (Jillings & Stables, 2017, p.3)

Comparable to the track routing approach of the mix engineers in the current project, similar
sub mixes were created in the online IAW for drums (containing kick, snare, hat, toms etc.),
while the bass was left primarily isolated, as stated by Jillings & Stables (2017):

“There were 13 entries in total for this song, and three prominent groups of tracks appear which
can be attributed to the Drums (cyan), percussive (red) and synthesisers (green) of the mix. The
bass guitar is mostly independent and has a high distance from the other instruments in the
mix.” (Jillings & Stables, 2017).

The literature most relevant to this project's perceptual evaluation and analysis of mixing
practices comprises publications such as “Analysis of Peer Reviews in Music Production” (De Man
& Reiss, 2015) and “Toward a better understanding of Mix Engineering” (De Man, 2017). This
work follows similar principles in collecting mix feedback data using the APE methodology,
which was devised by the mentioned authors themselves, and then undertakes a similar keyword
analysis in order to compare textual feedback with audio features.

The main differences in this project's approach are the more conceptually inclined collection of
data and less computer scientific rigor. For example, the further ranking of textual feedback by
additionally weighting semantic quantitative terms such as “slight”, “bit”, “too much” etc. is a
risky and not yet standardized approach, and the weights are judged by the author himself.
Nevertheless, the attempt will be made, offering a slightly different methodology. Also, the
inclusion of a different multitrack set (the author's own production), the analysis of Pro Tools
sessions with various manufacturers' plugins, and an international assessor demographic may
lead to different results, which will be compared and contrasted with existing data.
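
The idea of weighting quantitative terms can be sketched as follows. The weights below are placeholder values chosen purely for illustration and are not the weights used in this project; the function names and the default weight are likewise hypothetical.

```python
import re

# Placeholder weights (assumptions): stronger wording gets a larger weight.
QUANTIFIER_WEIGHTS = {"too much": 1.0, "bit": 0.5, "slight": 0.25}
DEFAULT_WEIGHT = 0.75  # hypothetical weight when no quantifier is present


def quantifier_weight(comment: str) -> float:
    """Return the weight of the first matching quantifier in a comment.

    Longer phrases are checked first; only whole words are matched, so
    inflected forms such as "slightly" are not handled by this sketch.
    """
    text = comment.lower()
    for phrase in sorted(QUANTIFIER_WEIGHTS, key=len, reverse=True):
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            return QUANTIFIER_WEIGHTS[phrase]
    return DEFAULT_WEIGHT


def weighted_score(comment: str, polarity: float) -> float:
    """Scale a comment's polarity (+1 positive, -1 negative) by its weight."""
    return polarity * quantifier_weight(comment)
```

A comment such as “vocal is a bit harsh” with negative polarity would then contribute -0.5, whereas “too much reverb” would contribute -1.0.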

In order to gain a “real-world” understanding of mixing, it was important to assess the
collected mixing engineers' sessions effectively. An evaluation solution was needed that is
ideally available through a web browser while allowing enough data to be collected for
meaningful analysis. The paper “Web Audio Evaluation Tool: A Browser-based Listening Test
Environment” introduces “WAET”, a browser based listening test environment that does not require
participants to install proprietary software, and it was thus the best solution for the purposes
of this study. As outlined by its creators (Jillings et al., 2015) in comparison to available tools:

“For instance, the option to provide free-text comment fields allows for tests with individual
vocabulary methods, as opposed to only allowing quantitative scales associated to a fixed set of
descriptors.” (Jillings et al. 2015, p. 2)

Furthermore, the use of the XML document format made it simpler for the author to modify the
interface by adding customized feedback boxes such as “what did you like about mix XYZ” and
“what did you dislike about mix XYZ”, allowing the collection of both positive and negative
assessor feedback. Additionally, options such as “randomize fragment”, “require moving” and
“enforce scale usage” ensured more reliable data, as subjects would be reminded if they
forgot to comment or to move audio stimuli.

The WAET interface included all functions necessary to collect data with as little bias as
possible. The randomize function was enabled to mitigate “order biases” and
“sequential dependencies” (Zacharov, 2007), as also recommended by ITU-R
BS.1534-1. The ITU further recommends keeping the audio stimuli short, as candidates
cannot correctly recollect more than 20 seconds of audio for purposes of comparison due to
human memory limitations (ITU, 2003). However, in the case of this project's stimuli, limiting
the samples to such a short length would not have allowed the playback of at least two sections
of the mixes. This would be unnatural for music production purposes, because participants would
judge only one section of a song, while mix engineers often use automation or varying
plugin processing across several sections of a song. Therefore, for the project's purposes, the
samples were limited to 26 seconds, which allowed playback of the bridge leading into the
chorus of the mixes, enabling a more realistic judgement. Additionally, according to the
assessors' survey responses, only one assessor had mixed fewer than 20 songs, which means that
20 assessors had mixed at least 20 songs in their career or as a hobby. From this it is assumed
that the candidates possess enough experience to recollect information about a 26 second audio
excerpt for comparison between mixes, as mixing is a highly evaluative task.
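
The principle behind randomizing the presentation order can be illustrated with a short sketch. It does not reproduce how WAET implements its randomization; the function name, the per-assessor seeding scheme and the mix labels are assumptions made for the example.

```python
import random
from typing import List, Sequence


def presentation_order(stimuli: Sequence[str], assessor_id: str,
                       seed: int = 0) -> List[str]:
    """Return a shuffled copy of the stimuli for one assessor.

    Seeding with the assessor id keeps each assessor's order reproducible
    while different assessors generally receive different orders.
    """
    rng = random.Random(f"{seed}-{assessor_id}")
    order = list(stimuli)
    rng.shuffle(order)
    return order


# Ten hypothetical mix labels, shuffled for one hypothetical assessor.
mixes = [f"mix_{index:02d}" for index in range(1, 11)]
order_for_assessor = presentation_order(mixes, assessor_id="assessor_07")
```

Because no mix consistently appears first or last across assessors, systematic order effects are averaged out over the panel.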

A further custom approach was the use of a simplified scale ranging from “least best” to “best”
mix. One of the main reasons was that music mixes are highly subjective, and the rating scale
recommended by ITU-R BS.1284-1, which includes “bad” and “poor”, did not seem as fitting as
rewording “bad” to a less offensive “least”. Also, as few incentives were offered (a prize
draw with only 1 random winner), the prospect of receiving a rating of “bad” may have negatively
impacted the participation rate. Additionally, because of the time it takes to evaluate mixes,
additional scale labels such as “poor”, “fair” or “good” may have led participants to rate mixes
according to their preconceptions of e.g. the word “fair” rather than using the mixes themselves
as the means of differentiation, which may further have caused “stimulus spacing and frequency”
bias as mentioned by Bech & Zacharov (2007):

“However owing to the spacing bias, all parts of the response range will be used equally often,
which means that the average responses are more equally spaced than they should be
according to the spacing of the stimuli”. (Bech & Zacharov. 2007, p. 91)

To date it is unclear whether high sample rates in music recordings make a clearly perceptible
difference. The subject has been debated for many years, with no clear conclusion regarding its
practical relevance to music production. The Nyquist-Shannon theorem states that sampling a
bandlimited signal at a rate greater than twice its highest frequency suffices to reconstruct the
signal without loss (Olshausen, 2000). However, several factors may affect the perceived
difference between standard and high-resolution sample rates. One of these is that anti-aliasing
filters introduce errors during conversion, most notably “time smearing”. Because humans can
perceive timing differences as low as 5 microseconds, this may be relevant when differentiating
between standard and high-resolution audio (Reiss, 2016).
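
The sampling theorem mentioned above can be illustrated with a small Python sketch. It is not part of the project's analysis; the function names are hypothetical and the tone frequencies are arbitrary examples. It computes the Nyquist limit of a sample rate and the frequency at which a tone above that limit would fold back (alias) if the anti-aliasing filter did not remove it.

```python
def nyquist_limit(sample_rate_hz: float) -> float:
    """Highest frequency that can be represented at the given sample rate."""
    return sample_rate_hz / 2.0


def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a sampled real sinusoid appears after spectral folding."""
    folded = tone_hz % sample_rate_hz
    # Components above the Nyquist limit are mirrored back into the baseband.
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    for rate in (44100, 48000, 96000):
        print(rate, "Hz -> Nyquist limit", nyquist_limit(rate), "Hz")
    # A 30 kHz component is representable at 96 kHz but folds back at 44.1 kHz.
    print(alias_frequency(30000, 96000))
    print(alias_frequency(30000, 44100))
```

The sketch only shows why band limiting is required before conversion; it says nothing about the audibility of the filter artefacts discussed above.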

Although a detailed analysis of the perceived differences is outside the scope of this project, a
meta-analysis of 80 references on the differences between standard and high-resolution audio
by Reiss (2016) concluded:

“Overall, there was a small but statistically significant ability to discriminate between standard
quality audio (44.1 or 48 kHz, 16 bit) and high resolution audio (beyond standard quality). When
subjects were trained, the ability to discriminate was far more significant.” (Reiss. 2016, p. 373)

However, retaining high sample rates throughout a production chain may prove difficult,
because nearly every audio processor manufacturer implements their own band limiting
specifications, as pointed out in an article by Robjohns (2014):

“Not surprisingly, it is the emulations involving the most complex non-linearities that are band-
limited: plug-ins like the Urei 1176 and Neve 33609 compressors, and the Manley Massive
Passive equaliser, for example. These emulations all roll off smoothly above 28 to 35kHz”
(Sound On Sound, 2014)

Finally, if resources allow, working with higher sample rates should be considered. Moore’s law,
the observation that the number of transistors on integrated circuits doubles roughly every two
years, suggests that real-time processing of large numbers of multitracks at high sample rates
may become possible at low processing cost in the near future. Engineers should, however, still
aim to maintain similar recording specifications across the entire production chain (e.g.
microphones, preamps, AD/DA converters etc.) in order to maintain the high-resolution
paradigm.

Mixing Session & Survey Analysis

The Pro Tools sessions of the 10 mixing participants were collected and combined with the 29
mix engineer surveys. A set of practically essential parameters and responses was parsed into
Excel and graphing tools in order to extract meaningful data. The basic methodology included
additive and subtractive arithmetic as well as mean and median calculations summarizing the
session settings and surveys. The aim was to represent the data in such a way that it could be
easily understood and potentially used to inform mix engineers’ decisions and to gain insight
into music mixing practices.

Invitation of Mix Engineers & Preparation

The mix engineers were sourced from online audio mixing forums and groups. There was no
formal screening, and no minimum requirements were set regarding mixing experience or age
(besides a minimum age of 18). The majority of mix engineers had at least an online portfolio
showcasing mixes and, according to the questionnaire, all subjects except one had a minimum
of 1-2 years of mixing or audio engineering related experience.

The song’s multitracks were sourced from the author’s own production undertaken in previous
studies. This made it simpler to understand the production circumstances and the intention
behind the production. The song, titled “If You Don’t Know” and performed by Julian Michel, was
a hybrid Pop/Hip-Hop production with an energetic, cheerful and upbeat mood. The production
was done in a home studio environment using professional equipment. The beat was
sequenced in Ableton Live using MIDI, sample shots and several synths. The vocals were
recorded in a DIY vocal booth isolated from background noise, using a Rode NT1a microphone
and a Focusrite Scarlett 6i6 interface. The cut down song had 18 multitracks with a rather
sparse harmonic section, so for the purposes of this study 4 additional melodic instruments
(e-guitar, sax, strings & trumpet) were produced, amounting to a total of 22 multitracks. The
multitracks were shortened to a verse/bridge/chorus section with a total track length of 01:02
minutes. Participants were fully briefed about the project’s aims and objectives and given 12
days to complete the mix.

To preserve the natural mixing workflow and create a scenario as realistic as possible, it was
important to impose few limitations and let the engineers use their own mixing tools.
Nevertheless, a small set of rules was needed to limit the submissions to tools available to the
researcher. To make it possible to open the sessions and analyze parameter and workflow
settings, participants were instructed to use Avid Pro Tools (minimum version 10) and
software-only (AAX) processing tools that did not need DSP acceleration cards (as e.g. UAD
plugins do). The limitation was imposed after a test run in which 1 participant submitted a
Reaper session and 1 an FL Studio session, both of which had several compatibility issues,
ranging from being Windows versions and limited demo capabilities to the use of plugins not
available for OS X. Additionally, to focus strictly on mix processing, participants were instructed
not to add sample replacement or additional instrument tracks. The last crucial rule was that
subjects should not apply any master processing such as limiting on the master buss; however,
participants were free to use mix buss processing (e.g. compression or “coloring” a stereo mix).

Considering the timeframe and the time-intensive nature of mix engineering, prize money of
100 GBP was made available and awarded to one participant via a random ticketing system
(http://www.randomresult.com/) after the mixes had been collected.

Organization & Workflow

Due to the complexity of the process, it is standard practice to organize a session prior to
mixing. In the general mixing survey, the majority of engineers estimated that finalizing a mix
(approx. 24 multitracks) takes them up to 2 days, spending approximately 3.5 hours in one day
or 3 hours on each of two days, as shown below (fig. 5).

Fig. 5 “Time spent on mixing” (Mixing Survey)

Fig. 6 “Genre Routine” (Mixing Survey)

However, besides the number of multitracks, the time spent on a mix can also depend on which
music genre is being mixed. 21 out of 29 subjects stated rock music as their primary mixing
routine, hence the time taken to complete a rock mix may differ from that for an EDM (11
responses) or Pop/Hip Hop (10 responses) mix, as seen in the bar graph above (fig. 6). It is
assumed that acoustic music with live recordings may take longer to mix than sequenced or
sampled music, as live recordings are more prone to errors (e.g. inconsistent dynamics,
background noise, microphone bleed etc.), whereas popular electronic music such as EDM or
Hip Hop relies to a considerable extent on samples that have most likely been preprocessed
(e.g. licensed sample packs). Furthermore, sequenced instruments may exhibit fewer dynamic
artifacts, as it is common practice to sequence statically, e.g. to trigger MIDI “sample shots” or
use prefabricated loops (Egizii, 2006). Nevertheless, this does not preclude an individual from
introducing errors in a “premix” while using sample triggers, such as those mentioned by
Snoman (2009):

● “Poor recording, programming or choice of timbre/sample
● Poor-quality effects, or use of effects when programming
● Poor arrangement or MIDI programming” (Snoman, 2009, p. 309)

Further survey data shows that, prior to organizing the actual mixing session, a series of “pre-
editing” steps may still be required due to errors in recordings. It is also assumed that a perfectly
clean and prepared set of multitracks greatly reduces the effort needed to create an adequate
mixdown.

The following problems are most frequently encountered, in order of occurrence (bar graphs
available in the appendix):

1. Unnecessary track count (superfluous audio)
   a. Stereo audio tracks where mono tracks would suffice
   b. Silent tracks or duplicates
2. Vocal recordings
   a. Pops and otherwise undesired plosives or harsh “ess” sounds
   b. Mouth sounds (lip smacks & breaths)
3. Dynamic artifacts
   a. Harsh transients in instrument recordings
   b. Dynamics (e.g. mic placement or moving subject)
4. Background noise
   a. General room noise
   b. Instrument/headphone bleed
   c. Hiss, hum, rumble etc.
5. Phase incoherence
   a. Out of phase multitracks
   b. DC offset
6. Tuning
   a. Out of tune vocals or instruments
7. Gain artifacts
   a. Clipping or “hot” recordings

The relatively high number of erroneous recordings encountered may indicate that the subjects’
clients work from home studios with untreated rooms and with little to no recording expertise.
Notably, errors may become more apparent after processing has been applied; for example,
dynamic range compression on vocals may increase background noise or reveal subtle errors.

How exactly these errors arise is subject to further study. Nevertheless, given the relatively
high number of responses, an adequate mix may be less complex to achieve when errors are
removed prior to the mixing process. This could be done in a pre-mix or pre-processing stage
using editing techniques or audio restoration tools, before starting the “clean” mix session.

Submixing, Color Coding & Track Order

Setting up submixes speeds up mixing, as it allows a subset of tracks to be processed using
one auxiliary (aux) track. A common approach is to submix a set of tracks that belong to a
similar category, e.g. “kick, snare and hats” as a “drums submix”. However, depending on the
genre in question, different submixing strategies may be utilized. Also, a large set of multitracks
may require hierarchical submixing: for example, “kick 1” and “kick layer 2” may be submixed to
a “kicks” aux track prior to further submixing to a “drum” aux track or the stereo master.
Conversely, a genre with a relatively sparse multitrack count may need fewer submixes or none.
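
Hierarchical submixing of this kind can be thought of as a routing tree. The following Python sketch is only an illustration under assumed names: the routing table, the track names and the helper functions are hypothetical and are not taken from any participant session.

```python
from typing import Dict, List

# Hypothetical routing table: every track or aux points to its output.
ROUTING: Dict[str, str] = {
    "kick 1": "kicks",
    "kick layer 2": "kicks",
    "kicks": "drums",
    "snare": "drums",
    "hats": "drums",
    "drums": "stereo master",
    "lead vocal": "stereo master",
}


def signal_path(track: str, routing: Dict[str, str]) -> List[str]:
    """Follow a track through its submixes until the final output is reached."""
    path = [track]
    seen = {track}
    while path[-1] in routing:
        destination = routing[path[-1]]
        if destination in seen:
            raise ValueError(f"routing loop at {destination!r}")
        path.append(destination)
        seen.add(destination)
    return path


def direct_members(bus: str, routing: Dict[str, str]) -> List[str]:
    """List the tracks or auxes routed directly into the given bus."""
    return sorted(source for source, dest in routing.items() if dest == bus)


if __name__ == "__main__":
    print(" -> ".join(signal_path("kick 1", ROUTING)))
    print(direct_members("drums", ROUTING))
```

A flat session with no submixes would simply map every track to the stereo master, which gives every signal path a length of two.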

9 out of 10 engineers utilized submixing, and each session had a unique approach. The
dendrogram below (fig. 7) shows the submixing strategy of mix 8, which categorizes tracks into a
“drum sub”, “inst sub” and “vocal sub” prior to sending the audio to a sub master.

Fig. 7 “Mix 8 Submixes Dendrogram” (Mix Engineer Sessions)

Mix 1, in contrast, submixed tracks as shown below (fig. 8) into “chorus Vox”, “outro”,
“keyboard”, “horns” & “kick”, while percussive and FX related tracks were routed directly to the
stereo master.

Fig. 8 “Mix 1 Submixes Dendrogram” (Mix Engineer Sessions)

Comparatively, submixing fewer tracks in mix 1 may have led the mix engineer to iterate level
balances more frequently between individual non-submixed tracks, thereby achieving better
assessor feedback. However, this observation should be viewed with care, as mix engineers
can still adjust individual track levels within submixes, and at this point it is unclear how exactly
engineers iterate through the levelling process.

Nevertheless, a negative qualitative indicator in mix 8 (fig. 7) may be the submixing of vocals
together with reverb and delay under the same submix while applying compression to the “Vox
sub”. This allows less control over individual track processing and was an uncommon approach
among the majority of the mixes.

In nearly all cases, specialized FX or singly occurring tracks, as well as parallel processing aux
tracks such as reverbs, delays or FX, were routed directly to either the mix buss or the master
output without further submixing. Tracks with no inherent similarity were often either submixed
under a “misc.” or “other” aux track or routed directly to a mix buss (sub master) or the stereo
master. Further submix dendrograms can be viewed in the appendix.

Similar to submixing, timeline markers are commonly utilized to improve organization and to
mark changes in songs, which may indicate processing that varies between sections (e.g. level
automation). The following session markers were utilized in 4 out of 10 mixes:

● Mix 1: Intro, Verse A, Verse B, Chorus A, Chorus B, Outro, End
● Mix 3: intro, verse, bridge, chorus
● Mix 4: song start, instruments in, hi hat in, all in, arpeggios in, sax in, end
● Mix 8: verse, bridge, chorus, end

Color coding was utilized in 8 out of 10 mixes, with no apparent qualitative impact: mix 1 did not
have any color coding, while the subject in mix 8 applied an advanced color coding scheme,
e.g. drum/percussion instruments colored red with their submixes in a darker shade of the same
color. The mixing participants’ color coding schemes resembled the mix engineer survey
responses, as seen below (fig. 9):

Fig. 9 “Color Coding Multitracks” (Mixing Survey)

The responses were simplified into 1 graph showing the number of responses on the Y axis
and the color spectrum on the X axis.

The mix windows of the majority of sessions followed a similar pattern in track order. The most
common arrangement (left to right) was: drums, melodic instruments, vocals, submixes and
parallel processing. Mix 6 followed a similar pattern except for the vocals being leftmost, and
mix 10 had the vocals after the drums. The track order may loosely relate to the order in which
the tracks are processed.

Mixes Dynamic Properties

The scatter plot below (fig. 10) shows the integrated loudness (in LUFS) of each of the 10
mixes. LUFS was chosen over RMS because, according to the EBU, the loudness
measurement combines a frequency weighting with statistical gating, yielding a more accurate
representation of “loudness” as it relates to human perception. A detailed analysis of audio
feature algorithms is outside the scope of the project; however, the basic approach according to
EBU TECH 3341 (2016) is as follows:

● Input signal is filtered in accordance with K-weighting
● Momentary loudness computed
● Calculate relative threshold
● Computation of integrated loudness

Ultimately this allows investigating the target level, in LUFS, to which the invited mix engineers
adjusted the sum of their mix. Potentially this may give engineers an anchor point when
metering mixes similar to this project. The integrated loudness was measured using MATLAB’s
audio analysis toolkit and the following commands as specified by Mathworks (2018):

● Read file
○ [x,fs] = audioread('Drum-Verb.cm_01.wav');
● Calculate values from file:
○ [L,LRA] = integratedLoudness(x,fs);
● Print values:
○ fprintf(['Loudness: %0.2f\n', 'Loudness range: %0.2f\n\n'], L, LRA);
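
To make the gating steps listed earlier more concrete, the following Python sketch shows the two-stage gating of ITU-R BS.1770 on which the EBU measurement is based. It is a simplified illustration, not the MATLAB implementation used in this project: it assumes that the K-weighted, channel-summed mean-square power of each 400 ms block has already been computed, and the function names and example values are hypothetical.

```python
import math
from typing import Sequence

ABSOLUTE_GATE_LUFS = -70.0   # blocks at or below this level are always ignored
RELATIVE_GATE_LU = -10.0     # second gate, relative to the ungated average


def block_loudness(mean_square: float) -> float:
    """Loudness of one block from its K-weighted mean-square power."""
    return -0.691 + 10.0 * math.log10(mean_square)


def integrated_loudness(block_powers: Sequence[float]) -> float:
    """Gated integrated loudness from per-block mean-square powers (assumed input)."""
    # Stage 1: absolute gate removes silence and very quiet blocks.
    stage1 = [p for p in block_powers
              if p > 0 and block_loudness(p) > ABSOLUTE_GATE_LUFS]
    if not stage1:
        return float("-inf")
    # Stage 2: relative threshold, 10 LU below the loudness of the stage 1 blocks.
    relative_threshold = block_loudness(sum(stage1) / len(stage1)) + RELATIVE_GATE_LU
    stage2 = [p for p in stage1 if block_loudness(p) > relative_threshold]
    return block_loudness(sum(stage2) / len(stage2))


if __name__ == "__main__":
    # Hypothetical block powers: steady programme material plus digital silence.
    powers = [0.01] * 8 + [1e-10] * 4
    print(f"Loudness: {integrated_loudness(powers):0.2f} LUFS")
```

Because of the gates, pauses and very quiet passages do not pull the integrated value down, which is one reason the measure tracks perceived programme loudness better than a plain RMS average.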

Fig. 10 “Mixes Integrated Levels (LUFS)” (Mix Engineer Sessions)

The scatter plot above (fig. 10) shows the LUFS of each mix in descending order. No
qualitative relationship with the feedback of the second group (the evaluators) could be found.
Mix 8, which was rated the least favorable mix, had the highest level; however, at this point this
may not indicate a negative qualitative link, as mix 1 was the highest rated yet had the second
highest level (although lower by 3.13 LU). Nonetheless, a practical reference anchor point
based on this project’s median may be a value between -19.48 and -21 LUFS. Finally, this
consideration does not imply an improved mix and may apply only to this project’s mix or to
mixes very similar in nature, as engineers tend to mix towards target levels.

Fig. 11 “Mixes Loudness Range (LRA)” (Mix Engineer Sessions)

The LRA value above (fig. 11), standardized by the EBU, is expressed in LU (loudness units), a
relative unit that, unlike LUFS, is not referenced to full scale. LRA quantifies the variation in
loudness of an audio signal over time. For this project’s genre, a median LRA value of 2.6 LU
may be used as a reference value for similar genres. As with the integrated LUFS levels, this
does not suggest a qualitative benefit but rather a starting point. As explained by EBU (2010):

“‘Loudness Range’ estimates the distribution of loudness of a programme with statistical tools. A
broadcaster can establish a maximum LRA value for specific genres and transmission
platforms.” (EBU. 2010, p. 2)
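
The “statistical tools” in the quote above can be sketched in Python as follows. The sketch follows the general scheme of EBU Tech 3342 (absolute gate at -70 LUFS, relative gate 20 LU below the gated average, then the spread between the 10th and 95th percentiles), but it is a simplified illustration rather than a compliant meter: the input is assumed to be a list of short-term loudness values that has already been measured, and the percentile indexing is an assumption.

```python
import math
from typing import Sequence


def loudness_range(short_term_lufs: Sequence[float]) -> float:
    """Approximate LRA in LU from short-term loudness values (assumed input)."""
    # Absolute gate: discard silence and near-silence.
    gated = [l for l in short_term_lufs if l >= -70.0]
    if not gated:
        return 0.0
    # Relative gate: 20 LU below the power-averaged loudness of the gated values.
    mean_power = sum(10.0 ** (l / 10.0) for l in gated) / len(gated)
    relative_threshold = 10.0 * math.log10(mean_power) - 20.0
    kept = sorted(l for l in gated if l >= relative_threshold)

    def percentile(p: float) -> float:
        # Nearest-index percentile; the exact indexing rule is an assumption.
        return kept[round((len(kept) - 1) * p / 100.0)]

    return percentile(95.0) - percentile(10.0)


if __name__ == "__main__":
    # Hypothetical short-term values for a densely mixed pop excerpt.
    values = [-21.0, -20.5, -19.8, -19.5, -20.2, -22.1, -19.9, -80.0]
    print(f"Loudness range: {loudness_range(values):0.2f} LU")
```

A heavily compressed excerpt yields values that cluster tightly and therefore a small LRA, consistent with the low values reported for this project's genre.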

Better insight into the levelling of the mixes can be gained by analyzing the integrated LUFS
levels of the individual multitracks, as shown in the boxplot below (fig. 12):

Fig. 12 “Mixes Median Track Level (LUFS Integrated)” (Mix Engineer Sessions)

The central tendency (in LUFS) can be broken down from highest to lowest level as:

1. 808 Bass -26.27
2. Kick -29.02
3. Vocals -29.58
4. Keys -32.52
5. Stringed -33.44
6. Snare -33.81
7. Brass -37.47
8. Synth/FX -39.05
9. Cymbals -42.71
10. Percussion -43.26

Further isolation of the mixes by negative and positive sentiment in relation to “balance”
descriptors (from assessor feedback) reveals the level differences between the top 5 positively
balanced (fig. 13) and the top 5 negatively balanced (fig. 14) mixes:

Fig. 13 “Top 5 Balance Mixes Track Levels (LUFS Integrated)” (Mix Engineer Sessions)

Fig. 13 “Top 5 Balance Mixes Track Levels (LUFS Integrated)” (Mix Engineer Sessions)

The most apparent difference in the negative group is the level of the kick drum relative to the
snare, with the kick +6.64 LU higher, whereas in the positive group the kick and snare drum
were nearly level, only 0.26 LU apart. The table below (fig. 14) shows the level differences
between the 5 highest levelled instruments and the kick track for both the positive and the
negative group.

Positive (LU)           Negative (LU)
808 Bass    +6.47       808 Bass    +4.01
Vocal       +2.93       Vocals      -0.41
Snare       +0.26       Stringed    -2.93
Stringed    +0.13       Keys        -4.27
Keys        +0.06       Snare       -6.26

Fig. 14 “Instrument Levels Relative to Kick Track (LUFS)” (Mix Engineer Sessions)

The results indicate that the majority of assessors preferred the vocal track to be higher in level
and the kick drum to be less prominent (in LU). Additionally, the snare drum in the positive
group is much closer in level to the kick drum (+0.26 LU) than the snare in the negative group
(-6.26 LU), which most likely was not perceived as loud enough.
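
The arithmetic behind such relative levels is a simple difference in LU against a reference track. As the per-group track levels are not listed in the text, the Python sketch below instead uses the all-mix central tendency values given earlier; the variable and function names are hypothetical.

```python
# Central tendency per instrument group (LUFS), as listed earlier in this chapter.
TRACK_LEVELS_LUFS = {
    "808 Bass": -26.27,
    "Kick": -29.02,
    "Vocals": -29.58,
    "Keys": -32.52,
    "Stringed": -33.44,
    "Snare": -33.81,
}


def relative_to(reference: str, levels: dict) -> dict:
    """Level of every other track relative to the reference track, in LU."""
    ref = levels[reference]
    return {name: round(level - ref, 2)
            for name, level in levels.items() if name != reference}


if __name__ == "__main__":
    for name, diff in relative_to("Kick", TRACK_LEVELS_LUFS).items():
        print(f"{name:<10} {diff:+.2f} LU")
```

Across all 10 mixes the snare sits several LU below the kick, which falls between the values reported for the positive and the negative group.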

Again, these observations should be viewed with care, as perceived loudness can be altered
spectrally, e.g. using equalization, while retaining a similar amplitude. Furthermore, the
relationship between a kick drum’s level and common timbral characteristics such as “cut
through”, “punch” or “click” needs further investigation: altering the “click” of a kick drum may
make the sound perceptually “cut through” and seem louder, while in measured terms (LUFS) it
may not make a significant difference.

The relationship between the frequency content of musical sound sources and perceived
loudness is subject to further study; studies popularized by Fletcher & Munson concluded that
the response of the ear is not linear and that perception can vary depending on frequency
content (Smyth, 2017).

Nevertheless, according to the given data it would be beneficial to aim for a multitrack balance
similar to the top 5 positively rated mixes. Ultimately, this project’s findings may differ
considerably for other genres, as the collected data is limited to 10 mixing sessions of one genre
and the feedback of 21 assessors.

It is common practice to reference mixes in various ways (The Recording Revolution, 2015).
One common practice is to reference a mix on different playback systems or environments as
noted by experienced engineer and author Mike Collins (2011):

“I always check my mixes in the stereo system in my car, which instantly reveals whether I have
made the bass drum and bass guitar too loud or not when compared with typical commercial
albums.” (Collins. 2011, p. 305)

Similarly, mix engineers were surveyed regarding 3 possible referencing procedures common
to mixing. Subjects were asked to rate the importance of each with a Likert type response
anchor (not at all important, slightly important, important, fairly important, very important and no
opinion).

The survey found all the referencing methods to be important, in order of highest to lowest
priority:

1. Various Playback Devices & Environments
   a. Subjects compare their mixes on various playback devices, making sure the mix
      “translates” well on different speakers/consumer devices and environments (e.g.
      car, living room etc.).
2. Alternating Playback Levels
   a. Subjects play back a mix at varying monitoring levels, ensuring the audibility of
      sources at low playback levels whilst retaining dynamic coherence at higher
      playback levels.
3. Referencing Other Music Mixes
   a. Subjects tend to reference other music mixes, presumably comparing the dynamic
      and tonal characteristics of various reference tracks to their own mixes (e.g.
      multitrack balances).

Bar graphs of the collected response counts are shown in the appendix.

Mixes Spectral Properties

Frequency is a multi-dimensional property incorporating amplitude, timing & pitch, and the word
timbre is often used as a better term (Lerch, 2012). One of several ways to measure timbral
characteristics effectively is to extract an audio signal’s spectral features. This was realized
using the open source MATLAB toolkit “MIRtoolbox”, used for music information retrieval
(Lartillot & Toiviainen, 2007). The mirrolloff function was used to extract the spectral roll-off of
the individual mixes, whereas mircentroid extracted the spectral centroid, i.e. the
magnitude-weighted mean frequency (“centre of mass”) of the spectrum.

The spectral roll-off is the frequency below which most of the spectral energy is contained,
here taken as an average over the length of each mix. Perceptually it may indicate how bright
or harsh (if extreme) a signal is when combined with assessor feedback data.
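
Both features can be illustrated with a short Python sketch that uses only the standard library. It is not the MIRtoolbox code: the naive DFT, the function names and the 85 % roll-off fraction are assumptions chosen for illustration, and the exact weighting used by mirrolloff and mircentroid may differ.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of the non-negative frequency bins (naive DFT, short frames only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mags: Sequence[float], sample_rate: float, n: int) -> float:
    """Magnitude-weighted mean frequency of the spectrum in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def spectral_rolloff(mags: Sequence[float], sample_rate: float, n: int,
                     fraction: float = 0.85) -> float:
    """Frequency in Hz below which `fraction` of the spectral energy is contained."""
    energy = [m * m for m in mags]
    target = fraction * sum(energy)
    cumulative = 0.0
    for k, e in enumerate(energy):
        cumulative += e
        if cumulative >= target:
            return k * sample_rate / n
    return (len(mags) - 1) * sample_rate / n


if __name__ == "__main__":
    sample_rate, n = 6400.0, 64
    # Hypothetical test signal: a strong 800 Hz tone plus a weaker 2400 Hz tone.
    frame = [math.sin(2 * math.pi * 800 * t / sample_rate)
             + 0.3 * math.sin(2 * math.pi * 2400 * t / sample_rate) for t in range(n)]
    mags = magnitude_spectrum(frame)
    print(spectral_centroid(mags, sample_rate, n), spectral_rolloff(mags, sample_rate, n))
```

Adding high-frequency content raises the centroid, while the roll-off moves only once that content holds enough energy to cross the chosen fraction, so the two features respond differently to the same change.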

The graph below (fig. 15) shows the distribution of the subject mixes’ spectral roll-off (blue)
and spectral centroid (red) in Hz, arranged in ascending order.

Fig. 15 “Mixes Spectral Properties” (Mix Engineer Sessions)

Combined with the assessors’ feedback, mix 5 received a high amount of negative feedback for
being too “harsh”. The spectral analysis suggests that a roll-off of 10923 Hz may be too “harsh”
for the project mixes, as it lies a relatively large distance (~1400 Hz) above the next lower
roll-off (mix 8). However, mix 1, which was rated the best mix overall, is not far (~500 Hz) from
the lowest rated mix 8. A further contradiction is that mix 9 was described as excessive in
“highs” (2 points), yet the spectral analysis indicates that it has the lowest roll-off (7365 Hz). One
would expect mix 9 to receive negative keywords relating to it being “dark” or lacking “highs”,
but it received 2 points for being excessively “high” and 4 points for being deficiently “low” (fig.
40). Spectral analysis alone may therefore not indicate perceptual relationships and may not
provide enough data without being combined with assessor feedback.

It remains unclear how exactly phase artefacts affect the quality of tracks. Linear phase filters
are commonly recommended for “surgical” or subtractive workflows, as they maintain the phase
relationship at all frequencies at the cost of extra processing power. Standard EQs using
minimum phase (IIR) filters may introduce phase distortion, often with a post-ringing character
at extreme settings. Linear phase filters, implemented as FIR (finite impulse response) filters,
may in contrast introduce pre-ringing as an artefact. The audibility of the differences between
FIR and IIR filters is subject to further study, but given the data, linear phase EQs may be
utilized for subtractive processing, while for additive workflows (or for “coloring”) minimum
phase filters may be “good enough” and retain the quality of the processed audio.

Fig. 16 “Filter Type Count” (Mix Engineer Sessions)

The graph above (fig. 16) shows that the most common equalization shapes were bell and
hi-pass filters. As in experienced engineers’ usage, bell curves are most commonly utilized for
boosting or cutting a frequency, as they allow specific ranges to be treated. As to the
equalization filter model type, no linear phase filters were utilized for subtractive processing.
The mix engineers’ session parameters were parsed into an Excel sheet available through the
following link:

https://docs.google.com/spreadsheets/d/1GbBCigRPq9WrA4LhlH_ezT1HvCn3RFYNU-LU2ptZgiY/edit?usp=sharing

Fig. 17 “Subtractive vs. Additive Equalization” (Mix Engineer Sessions)

The graph above (fig. 17) shows the count of subtractive vs additive equalization by mix (in
descending order). The EQ settings were counted for every EQ plugin used in the sessions:
boosts were counted as additive and cuts as subtractive, and if both were utilized a count was
given for each. Notably, the 2 negatively rated mixes utilized the most filtering, which may
indicate “over processing”. Mix 1 utilized relatively few cuts and boosts overall and had only 1
additive equalization. Qualitatively, additive processing may not be as beneficial as cutting
frequencies. The overall count also indicates that sparse equalization may be “just enough” to
create an adequate mix when comparing the best rated mix (17 processes) against the lowest
rated mix 8 (56 processes).
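
The counting rule can be sketched in Python as follows. This reflects one reading of the rule described above (one additive count per EQ plugin instance with any boost, one subtractive count per instance with any cut); the data structure, the function name and the gain values are hypothetical and do not come from the participant sessions.

```python
from typing import Dict, List, Sequence


def count_eq_types(plugins: Sequence[Sequence[float]]) -> Dict[str, int]:
    """Count additive and subtractive EQ instances from per-band gains in dB."""
    counts = {"additive": 0, "subtractive": 0}
    for band_gains_db in plugins:
        if any(gain > 0 for gain in band_gains_db):
            counts["additive"] += 1      # at least one boost in this plugin
        if any(gain < 0 for gain in band_gains_db):
            counts["subtractive"] += 1   # at least one cut in this plugin
    return counts


if __name__ == "__main__":
    # Hypothetical session: each inner list holds the band gains of one EQ plugin.
    session: List[List[float]] = [
        [-3.0, -1.5],      # cuts only
        [2.0],             # boost only
        [-4.0, 1.0],       # both: counted once in each category
        [0.0],             # bypassed band: counted in neither
    ]
    print(count_eq_types(session))
```

Counting individual bands instead of plugin instances would be an equally plausible variant and would increase the totals for multi-band EQs.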

Fig. 18 “Overall Plugin Type Count” (Mix Engineer Sessions)

The bar chart above (fig. 18) shows the total number of plugins used by type, regardless of
manufacturer or “all-in-one” solutions. Equalization was the most utilized tool, followed by
compression and distortion/saturation units. Compared to insert processors in general, send FX
(reverb, modulation, delay) were used sparingly. Engineers often utilized additional filtering
after send FX, most prominently after reverb and delay units. Notably, most reverb and delay
units already include a filter section with at least LP/HP functionality.

Fig. 19 “Average Processing Chain Count by Track” (Mix Engineer Sessions)

The graph above (fig. 19) shows the plugin chain count by track in descending order. The
vocals had the highest mean number of plugins (3). The average processing chain may have
been subtractive EQ - compression - additive EQ; however, further analysis of the exact chain
usage is still needed. A study undertaken by researchers at BCU extracted the plugin chain
order by analyzing a total of 178 submissions and, through a generality score, determined which
plugins are most likely to be used in which order (Stasis et al, 2017). Taken from the authors’
publication, the graph below (fig. 20) shows the plugin chain order for each instrument including
the generality score:

Fig. 20 “Number of Plugins applied and Generality Score” (Stasis et al. 2017, p. 2)

Furthermore, compression settings were isolated from the mixes with the top 5 positive “vocal”
sentiment and averaged. The median compression settings for these top 5 vocal mixes can be
seen in the table below (fig. 20).

threshold (dB)   ratio (n:1)   attack (ms)   release (ms)   gain (dB)
-17              5             2             45             0

Fig. 20 “Top 5 Vocal Mixes Compression Parameters” (Mix Engineer Sessions)
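
To show what the median threshold and ratio imply for a vocal signal, the following Python sketch evaluates the static transfer curve of a compressor. It assumes an idealized hard-knee design and ignores attack and release, so it does not model any specific plugin used by the participants; the function name and the input levels are hypothetical.

```python
def compressed_level_db(input_db: float, threshold_db: float = -17.0,
                        ratio: float = 5.0, makeup_db: float = 0.0) -> float:
    """Static output level of an idealized hard-knee compressor, in dB."""
    if input_db <= threshold_db:
        # Below the threshold the signal passes unchanged (plus make-up gain).
        return input_db + makeup_db
    # Above the threshold, every `ratio` dB of input yields 1 dB of output.
    return threshold_db + (input_db - threshold_db) / ratio + makeup_db


if __name__ == "__main__":
    for level in (-30.0, -17.0, -12.0, -7.0):
        out = compressed_level_db(level)
        print(f"in {level:6.1f} dB -> out {out:6.1f} dB "
              f"(gain reduction {level - out:4.1f} dB)")
```

With these settings a vocal peak 10 dB above the threshold is reduced by 8 dB, which gives a sense of how firmly the median settings control the vocal dynamics.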

Audio Perceptual Evaluation

The tool for subjective assessment was set up on a web server accessible through the following link:
http://waemixing.com/test.html?url=tests/apemixes.xml

The user-friendly GUI allowed advanced customization without the need for web programming
knowledge. A further reason for choosing the APE model was that, unlike the majority of
subjective assessment methods, it represents multiple audio stimuli as “tangible” green
fragments that can be moved along a horizontal axis, making the rating of stimuli more intuitive,
as seen below (fig. 21).

Fig. 21 “Web Audio Evaluation Tool: A Browser-Based Listening Test Environment” (De Man et al. 2015)

The core focus of the assessment configuration, however, was the comment box feature, which
allowed participants to describe the mixes using their native audio vocabulary. A small change
to the XML specification made it possible to separate the comments into negative and positive
sentiment by adding 2 comment boxes per stimulus, one labelled “what do you like about mix
xyz” and one “what do you dislike about mix xyz”, as shown below (fig. 22).

Fig. 22 “Comment Boxes WAET - APE” (Audio Perceptual Evaluation)

The web server configuration was simple and required only uploading the WAET package to
the server and assigning permissions (0777), while the test configuration was done using the
interface accessible via the browser itself. The data was stored by the server in corresponding
XML files, which were further parsed and visualized using the WAET toolkit’s Python scripts.
Overall, participants’ feedback on the testing procedure was positive, and only 1 participant
suggested the use of fewer audio samples in future evaluations (~8 stimuli).

Perceptual Evaluation Participants Background

Similar to the first group of mix engineers, the assessors were sourced from mixing forums and
groups on the internet. Links to the evaluation were shared after participants stated their
interest and registered with the author.

The figure below (fig. 23) shows the subjects’ locations. The majority of participants originated
from the U.S. Overall, according to the given responses, subjects from 12 different countries
participated in the evaluation of the 10 mixes.

Fig. 23 “Assessor Geo-Location” (Audio Perceptual Evaluation)

Participants’ ages ranged from 21 to 49 years, with a median age of 31 years, as shown
below (fig. 24).

Fig. 25 “Evaluators Age” (Audio Perceptual Evaluation)

The majority (13) of subjects evaluated the mixes in their home studio, with 6 participating
from project studios and only 2 using commercial studios for assessment (fig. 26).

Fig. 26 “Monitoring Environment” (Audio Perceptual Evaluation)

The mix engineering experience of the majority of subjects ranged from 3 to 10+ years, with
only 1 subject having less than 3 years of mixing experience, as seen in the bar graph below
(fig. 27).

Fig. 27 “Mix/Engineering Experience” (Audio Perceptual Evaluation)

When surveyed on the total number of songs mixed, the majority of participants stated having
mixed between 50 and 100 songs, whereas 2 of the assessors had mixed over 300 songs and
only 2 had mixed fewer than 20 songs, as shown in the bar graph below (fig. 28).

Fig. 28 “Total Nr. of Songs Mixed” (Audio Perceptual Evaluation)

Additionally, the majority of participants (17) stated using professional monitors for playback,
whereas 1 participant stated additionally using earbuds when assessing. Notably, 2 participants
used room acoustics correction software when playing back the mixes, as seen in the graph
below (fig. 29).

Fig. 29 “Type of Playback Device” (Audio Perceptual Evaluation)

Finally, rock was part of the mixing routine of the majority (14) of assessors, whereas fewer than
half (6) stated hip-hop as part of their common mixing routine, as seen in the bar graph below (fig. 30).

Fig. 30 “Mix Routine (Genre)” (Audio Perceptual Evaluation)

Given the data, it can be assumed that the participants had substantial mixing knowledge for the
purposes of evaluating 10 mixes. Additionally, the fairly varied distribution of location and
age, with 1 of the participants being a German speaker (taken from feedback), may have affected
the data differently than if assessors had been sourced from a specific location, age range or
genre. A further notable observation is that the majority of participants routinely mixed rock
music, which may also have influenced the results, as the mix was primarily hip-hop oriented.

Perceptual Evaluation Ratings

Fig. 31 “Mixes Evaluation Ratings” (Audio Perceptual Evaluation)

As demonstrated in the boxplot above (fig. 31), the subjective ratings of most mixes did not
suggest clear discrimination between the least (0.0) and most favorable (1.0) mixes, but rather
show wide ranges (whiskers and dashed lines) in the rating distributions of most mixes. Notably,
ignoring the outlier in mix 1, a clearer qualitative distinction was made for mix 1 and mix 8,
whose total rating ranges indicate higher agreement. Mix 8, with a median (red line) of 0.05,
lies close to the lower boundary of 0, whereas mix 1 showed strong agreement as the best mix
with a median of 0.85, although it sits 0.15 points below the upper boundary. Taking the
median, the mixes can be ordered from least to best as follows (fig. 32):

Fig. 32 “Mixes Rating Median Scale” (Audio Perceptual Evaluation)

The median ratings scale above (fig. 32) shows the ratings of the individual mixes, where the
“least” and “best” rated mixes are noticeably distant from the rest: mix 8 is separated from
mix 5 by 0.19 points, while mix 1 is separated from mix 2 by 0.30 points. The overall central
tendency among all mixes falls toward 0.45, with the majority of ratings positioned below 0.5.
The comparatively long distance of 0.30 points between mix 1 and mix 2 indicates a relatively
high qualitative difference, marking mix 1 as the “best” mix. Similarly, the distance of 0.19
points between mix 8 and mix 5 suggests that mix 8 is the “less good” mix by a comparably large margin.

The dense clustering of mixes 6, 3 and 2 may indicate high qualitative similarity; they score a
relatively high median of 0.55 points, 0.10 above the overall central tendency (0.45). In
contrast to this cluster, the pair of mixes 5 and 10 both score around 0.25 (±0.01) points and
lie ~0.20 below the overall central tendency, suggesting a lower subjective rating.

Mixes 9 and 4 share a fairly similar position on the scale, being merely 0.04 points apart. Mix 4
lies the same distance (0.10) from the central tendency (0.45) as mixes 6, 3 and 2, but in the
opposite direction, and not as far left as the mix 5 and 10 pair, which may suggest an overall
“acceptable” subjective preference. Furthermore, mix 4, positioned at 0.35, indicates a slightly
“better” subjective preference than mix 9 at 0.31.

Lastly, mix 7, scoring 0.45 points, coincides with the central tendency, which may suggest an
overall neutral inclination among participant ratings. Mix 7 is also separated by an equal
distance of 0.10 from both mix 4 and mix 6, suggesting a noticeable qualitative difference from each.
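
The distances quoted above can be reproduced with a few lines of Python. This is only a sketch: the median values are the ones stated in the text, and the values for mix 3 and mix 10 are assumptions, because the text only gives 0.55 for the cluster and 0.25 (±0.01) for the pair. The function name `ordered_gaps` is hypothetical.

```python
# Median ratings as quoted in the text; mix 3 and mix 10 are assumed values.
medians = {8: 0.05, 5: 0.24, 10: 0.25, 9: 0.31, 4: 0.35,
           7: 0.45, 6: 0.55, 3: 0.55, 2: 0.55, 1: 0.85}


def ordered_gaps(medians):
    """Return (lower_mix, higher_mix, gap) for neighbours on the median scale."""
    ranked = sorted(medians.items(), key=lambda item: item[1])
    return [(a, b, round(mb - ma, 2))
            for (a, ma), (b, mb) in zip(ranked, ranked[1:])]


gaps = ordered_gaps(medians)
largest = max(gaps, key=lambda g: g[2])  # the widest step on the scale
```

Listing the gaps this way makes it easier to see which mixes are isolated and which form clusters without reading the distances off the figure.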

Interpreting statistical indicators in isolation may not be as effective as comparing them with
participant feedback in terms of the natural language used (descriptors) and combining them with
audio features. Consequently, to understand subjective preferences with higher
certainty, it is crucial to understand how subjects explained their preferences toward the
individual mixes in the collected textual feedback, presented in the next section.

Perceptual Evaluation Feedback Analysis

Using the “dislike” and “like” comment boxes in the web audio evaluation interface, a total of 392
comments were collected, of which 204 were dislikes and 188 likes. Notably, not all boxes were
used: 6 were left empty for negative feedback and 22 for positive feedback. The comments were
collected via the server's XML files and later parsed using the WAET Python scripts. The text was
then linked to the corresponding mixes for further analysis. The sorted participant
comments can be accessed via the following link:

https://docs.google.com/document/d/1BhqLfBUQsfUImdBnaIcsfNUmOdRE0g1RLID8ZQONWpE/edit?usp=sharing

Some of the comments were brief, e.g. “by far the best one”, whereas others were more
elaborate:

“Nice mix with elements well placed and at good levels I feel that this is the best mix. It's
cohesive in levels and stereo placement, and I feel it will respond very well to mastering with
just some corrective treatment for the kick frequencies”

Furthermore, one participant used his native language (German) for describing the mixes:

“es fehlt bass. alles wirkt zu stark zusammengedrückt vor allem am anfang.”
Translated:
“The bass is missing. Everything seems too heavily compressed, especially at the beginning.”

Ideally, the evaluation participants' feedback had to be analyzed in a way that allowed
quantifying the responses around keywords in a weighted manner. For example, if one
participant commented with negative sentiment “the drums are slightly high in the mix” while
another participant commented “the drums are way too loud in the mix” (on the same mix), the
difference between the quantitative terms used had to be taken into account. Therefore,
quantitative descriptors such as slight, way too, little etc. each required a respective
“weighting score” to be assigned.

Fig. 33 “Level of Problem” (Vagias & Wade, 2006)

Borrowing the psychometric scale from Likert-type responses for “level of problem”, as
shown in the figure above (fig. 33), allowed each quantitative descriptor to be assigned a “level
of problem” rating, making the assignment of a number manageable for further
evaluation by the author. This approach is experimental; ideally, a standardized text
classification system would better detect a participant's sentiment as well as the
quantitative terms surrounding the descriptor. Unfortunately, tested services such as
text2data.com were not able to classify the comments effectively and returned wrong “sentiment
scores”, as most commercial solutions are configured for customer reviews and are therefore not
trained on music production feedback.

One modification was made to adapt the scale to feedback analysis. The metric “Not at
all a problem” was substituted with “slight” as a weighting, since participants would not have
commented in the dislike box if there had been no problem at all, as seen below (fig. 34):

“Problem” Weighting Table

Weighting Nr.   Sentiment descriptor examples                      Likert-type response (equivalent/similar)

+1 point        tiny (bit), slight(ly),                            slight
                “just a tiny/slightly”

+2 points       little (bit), bit (more)                           minor

+3 points       more/too much                                      moderate

+4 points       (much) more, missing,                              serious
                very|way (much) (too) (more),
                use of exclamation marks, use of
                neighboring words “wow” or other
                extreme semantic hints

Fig. 34 “Descriptor Problem Weighting” (APE Feedback Analysis)
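
The weighting in the table above was assigned manually by the author. As an illustration only, a rule-based lookup along the same lines could be sketched in Python as follows. The rule list, the rule order and the function name `problem_weight` are assumptions made for this sketch and are not part of the original method; comments without a recognized quantitative term return `None` because the table does not define a default.

```python
import re

# Hypothetical rules derived from the weighting table (fig. 34).
# Rules are tried in order and the first match wins, so that
# "tiny bit" scores 1 and "bit more" scores 2, as in the table.
RULES = [
    (re.compile(r"!|\b(way|very|much more|missing)\b"), 4),  # serious
    (re.compile(r"\b(tiny|slight|slightly)\b"), 1),          # slight
    (re.compile(r"\b(little|bit)\b"), 2),                    # minor
    (re.compile(r"\b(too much|more)\b"), 3),                 # moderate
]


def problem_weight(comment):
    """Return the weighting (1-4) of the first matching rule, or None."""
    text = comment.lower()
    for pattern, weight in RULES:
        if pattern.search(text):
            return weight
    return None
```

With these rules, the two example comments from the text receive different weights: “the drums are slightly high in the mix” yields 1, while “the drums are way too loud in the mix” yields 4.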

Additionally, negative sentiment was divided into 2 groups, one denoting an excess and the
other a deficit, as assessors frequently described negative aspects as being “too much” or
“not enough”. Since both represent a problem in the mixing process, the total excess and
deficit scores can be combined into an overall “problematic score”. This allowed better
discrimination between the individual mixes and also showed in which particular areas one mix
outperformed another. For example, if the problematic score for vocals is 36 for mix “A” and 7
for mix “B”, this could indicate that mix “B” outperformed mix “A” in terms of vocal quality,
because it had a significantly lower “vocal problem” score.
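
A minimal sketch of such a tally is given below, assuming that each dislike comment has already been reduced to a tuple of mix, element, direction and weight. The sample tuples, the tuple layout and the function names are hypothetical, and treating the combined score as the sum of excess and deficit is this sketch's reading of the description above.

```python
from collections import defaultdict

# Hypothetical, already-weighted dislike comments:
# (mix, element, direction, weight), direction being "excess" or "deficit".
comments = [
    ("A", "vocals", "excess", 4),
    ("A", "vocals", "deficit", 3),
    ("B", "vocals", "excess", 1),
    ("B", "bass", "deficit", 2),
]


def tally(comments):
    """Sum the weights per (mix, element), kept apart by direction."""
    scores = defaultdict(lambda: {"excess": 0, "deficit": 0})
    for mix, element, direction, weight in comments:
        scores[(mix, element)][direction] += weight
    return dict(scores)


def problem_score(entry):
    """Excess and deficit both count as problems, so they are added."""
    return entry["excess"] + entry["deficit"]


scores = tally(comments)
```

Keeping excess and deficit apart until the last step preserves the information about whether an element was “too much” or “not enough”.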

The scoring forms can be viewed via the links below:

Positive sentiment scores:
https://docs.google.com/document/d/18cdniTZ5V36-GyfM6V-0BFrxCtfEOncg_fQdt9NCkww/edit?usp=sharing

Negative sentiment scores:
https://docs.google.com/document/d/1wAwzxHu1ch8xzh4t0mjGoSRe56gYLBzchofG3s09Shk/edit?usp=sharing

Fig. 35 “Mixes Sentiment Scores” (APE Feedback Analysis)

The graph above (fig. 35) shows the split between the negative (red bar) and positive
(green bar) descriptors used by subjects to describe each mix. The participants' use of
descriptors and formulation of sentences indicate a pronounced focus on negative aspects when
evaluating mixes, with more dislikes than likes pointed out.

Fig. 36 “Problematic Score” (APE Feedback Analysis)

The “problematic score” bar chart above (fig. 36) simply shows the positive scores subtracted
from the negative scores, visualizing the overall problematic score per mix in descending
order.
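
The subtraction and ordering can be sketched as follows. The score values are invented for illustration and are not the thesis data; the function name `problematic_scores` is likewise hypothetical.

```python
# Hypothetical sentiment totals per mix (not the thesis data).
negative = {"mix 1": 12, "mix 2": 30, "mix 3": 41}
positive = {"mix 1": 25, "mix 2": 18, "mix 3": 9}


def problematic_scores(negative, positive):
    """Negative minus positive score per mix, most problematic first."""
    scores = {mix: negative[mix] - positive.get(mix, 0) for mix in negative}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = problematic_scores(negative, positive)
```

A negative result, as for “mix 1” in this invented example, means that positive descriptors outweighed the negative ones.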

Further isolation in the graph below (fig. 37) reveals the mix elements and the corresponding
negative-sentiment excesses and deficiencies. The graph further indicates the assessors'
inclination to respond to excessive characteristics more than to deficient ones; in other words,
assessors tend to focus more on “something that's too much”. Assessors often commented on the
overall mix without specifying instrument tracks; such comments were grouped as “mix”.
Occasionally, assessors also specifically evaluated the instrumental (instruments) or harmonic
(HMX) elements within the mixes. Comparing the track keywords of mix 8 with those of the other
mixes, a unique excessive characteristic is the “bass/808” (yellow bar), which assessors
frequently pointed out as a relatively high excess (23 points).

Fig. 37 “Negative Sentiment by Mix Element” (APE Feedback Analysis)

Similarly, positive response keywords were graphed, as seen in the bar chart below (fig. 38).
Notably, mix 1 outperformed all other mixes in terms of overall positive “mix” keywords. Likewise
(among other insights), mix 10 outperformed the other mixes in positive “vocals” keywords.

Fig. 38 “Positive Sentiment by Element” (APE Feedback Analysis)

Further isolation of the feedback by descriptor enabled the representation of perceptual keywords
describing a mix's overall “mix” character. As before, the feedback was divided into excess,
deficiency and positive descriptors.

Fig. 39 “Excess Descriptors in Mixes” (APE Feedback Analysis)

The bar chart above (fig. 39) shows that excessive lows were the most common problem overall,
with mix 8 being the most excessive in the “lows” descriptor, followed by mix 3. Considering
the overall occurrences of “excesses”, it may be beneficial to focus on treating the “lows”
right from the beginning of the mixing process, followed by the next most frequently occurring
problems.

Fig. 40 “Deficiency Descriptors in Mixes” (APE Feedback Analysis)

As in the previous chart (fig. 39), the bar chart above (fig. 40) isolates the mixes by
descriptor, here by deficiency. Mixes 9 and 5 were both relatively deficient in the “lows”
descriptor (4 points). In terms of “balance” descriptors, both mix 5 and mix 8 had a deficiently
balanced overall mix (4 points). Notably, mix 4 had a deficient “stereofield” (4 points).
Combined with the excessive “panning” descriptor from the previous bar chart (fig. 39), this may
indicate a negative sentiment toward poor panning choices. Several descriptors were section
dependent, e.g. mix 3 (yellow) had deficient “chorusenergy” (2 points). The results indicate
that, similarly to the excesses, deficiently balanced lows are common among the mixes, further
suggesting the treatment of lows as the highest priority when starting a mix.

Fig. 41 “Positive Descriptors in Mixes for overall Mix” (APE Feedback Analysis)

The positive descriptors of the mixes are shown in the bar chart above (fig. 41), with the top 3
occurring descriptors being “balance”, “width” and “spectrum” related. As an example, mix 2
received positive feedback in terms of balance (4), width (2), panning (2), spectrum (1),
dynamic (1), lows (1), warm (1), controlled (1) and cohesive (1). The positive “balance”
descriptors indicate that the assessors primarily focused on overall multitrack balances, which
suggests that level balancing may be addressed prior to equalization processing.

Given the various observations, mix 1 was the best in terms of “balance” and “vocal” keywords.
Considering the sparse amount of organization and processing used, such as subtractive
equalization with digitally modelled stock units, this suggests that a simple approach focused
on the “basics” such as levelling, panning and subtractive equalization can suffice to achieve
a great mix with seemingly low effort.

Acclaimed Engineers Semantic Analysis

As mixing relies greatly on experience, it would be useful to understand how critically
acclaimed mix engineers navigate their mixing process in practice. However, even if it
were possible to sit in on a session, this would be a very time-consuming approach,
because manually transcribing a large volume of tutorials covering numerous mix engineers'
approaches would be a strenuous endeavor. Fortunately, methods such as natural language
processing (NLP) can simplify the process by quantifying e.g. the occurrence of words within
text transcribed from video tutorials. A further challenge is that automatically generated
captions (e.g. YouTube auto captions) cannot provide adequate transcriptions of engineers'
tutorials, as the speech recognition algorithms are not trained to understand the mixing context
or technical nomenclature.

Fortunately, knowledge hubs such as Mix With The Masters (MWTM) enable users to switch on
captions that have been curated by content editors. The transcripts are in turn
downloadable through the page source as “VTT” files, making further text analysis
possible. A total of 195 sessions were downloaded, each between ~15 and 25 minutes long,
amounting to a total runtime of approximately 65 hours and approximately 500,000
words of expert content available for analysis.

The video captions were accessible by inspecting the video embeds on the MWTM
website via the Chrome browser console, which revealed the path to the VTT file on the server, as
seen in the screenshot below (fig. 42).

Fig. 42 “MWTM VTT Path” (Semantic Analysis)

This made it possible to download the “Web Video Text Tracks” (VTT) files as plain text.
The process was repeated for all mix-specific MWTM session videos available at the time.
Permission to use the information for academic purposes was granted by staff, with referencing
as the only requirement. The downloaded plain-text files are time stamped, as seen in the
excerpt from one of the sessions in the screenshot below (fig. 43).

Fig. 43 “Raw VTT Files” (Semantic Analysis)

The transcripts were then divided into two large files, each containing ~250,000 words/numbers,
as the available processing power did not allow handling the whole text in a single open file
while running regular expressions for preparation.

It is common practice to preprocess a large text file prior to analysis in order to filter out
superfluous text and numeric data while preserving the intended meaning. The breakdown below
shows the process used for this project's text analysis:

1. Load file into a text editor capable of regular expressions (e.g. Atom)
   a. Convert all text to lower case (Mac: Cmd+K then Cmd+L)
   b. Remove
      i. Superfluous characters (“.”, “space”, “line-break”, arrows, empty lines etc.)
      ii. Plural characters (e.g. compressor“’s”, reverb“’s” etc.)
   c. Remove non-mixing/engineering words
      i. “Stop words” from lists available online (e.g. “a”, “an”, “and”, “to” etc.)
      ii. Music production terms that do not relate to mixing directly
         1. E.g. regex function:
            a. \s+(ahead|allowed|nicely|gonna|needs|real|...)+(\s|\.|,)
   d. Erase numeric data while preserving technical nomenclature (e.g. preserve “1176”,
      “LA2A”)
      i. VTT fragments
         1. Intro numbers & transcript timecode
         2. E.g. regex function:
            a. [1234567890.:,->]
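
The steps above were carried out in a text editor. A rough Python equivalent is sketched below under several assumptions: the stop-word set is a small illustrative subset rather than the list used in the thesis, the function name `clean_vtt` is hypothetical, and, instead of deleting every digit with a character class, the sketch drops whole cue-number and timecode lines so that model names such as “1176” or “LA2A” survive.

```python
import re

# Illustrative subset only; the thesis used a longer, manually curated list.
STOP_WORDS = {"a", "an", "and", "to", "the", "gonna", "real", "nicely"}

# A WebVTT cue timing line, e.g. "00:00:01.000 --> 00:00:04.000".
TIMING = re.compile(r"^(\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")


def clean_vtt(text):
    """Return lower-cased content words from a WebVTT transcript."""
    words = []
    for line in text.splitlines():
        line = line.strip()
        # Skip the header, blank lines, numeric cue identifiers and timecodes.
        if (not line or line.startswith("WEBVTT")
                or line.isdigit() or TIMING.match(line)):
            continue
        for token in re.findall(r"[a-z0-9]+(?:'[a-z]+)?", line.lower()):
            token = re.sub(r"'s$", "", token)  # compressor's -> compressor
            if token not in STOP_WORDS:
                words.append(token)
    return words


sample = "WEBVTT\n\n1\n00:00:01.000 --> 00:00:04.000\nThe 1176 and the compressor's attack\n"
tokens = clean_vtt(sample)
```

One limitation of this sketch is that a caption line consisting only of digits would be mistaken for a cue identifier and skipped.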

Choosing the exact words to remove was the critical part of cleaning the text file, as there is
no standardized stop word list of terms that are too general for audio production. However,
given that the subject is mix engineering, terms such as mix, engineering, audio or music were
removed, as they have little semantic value compared to more specific words such as “drums”,
“EQ” or “punchy”. The primary focus was to retain words denoting mixing tools (EQ,
compression, reverb etc.), audio tracks (instrument tracks, aux etc.) and descriptors (hard,
punchy, crunch etc.), which, with the help of further processing, were intended to represent
semantic links that preserve an interpretable meaning between e.g. the tools being used and the
associated sound characteristics.

A common way to visualize words is a word cloud, which was created using the MATLAB commands
below, as specified by Mathworks (2017):

1. Load & clean text:
   a. str = extractFileText("MWTM Filtered.txt");            % read the filtered transcript
   b. cleanDocuments = tokenizedDocument(str);               % split the text into word tokens
   c. cleanDocuments = removeShortWords(cleanDocuments,1);   % drop single-character words
   d. cleanBag = bagOfWords(cleanDocuments);                 % count word occurrences
   e. cleanBag = removeInfrequentWords(cleanBag,3);          % drop rarely occurring words
2. Create wordcloud:
   a. figure
   b. subplot(1,1,1)
   c. wordcloud(cleanBag); title("MWTM - Deconstructing a Mix")

This allows visualizing words by occurrence as demonstrated in word cloud below (fig. 44).

Fig. 44 “MWTM Wordcloud” (Semantic Analysis)
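
The counting behind such a cloud can also be sketched in Python. The function name `word_counts` and its default thresholds are assumptions for this sketch; they are intended to mirror the MATLAB steps by dropping single-character words and words that occur 3 times or fewer.

```python
from collections import Counter


def word_counts(tokens, min_length=2, min_count=4):
    """Count tokens, dropping short words and infrequent words."""
    counts = Counter(t for t in tokens if len(t) >= min_length)
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical token list standing in for the cleaned transcript.
tokens = ["vocal"] * 5 + ["snare"] * 4 + ["eq"] * 3 + ["a"] * 10
frequent = word_counts(tokens)
```

The resulting dictionary maps each remaining word to its number of occurrences, which is the quantity a word cloud scales its font size by.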

The most frequently occurring words (fig. 44) quickly show vocal, snare, kick, bass and
equalization to be the terms most mentioned by the critically acclaimed mix engineers (e.g.
Andrew Scheps, Chris Lord-Alge, Michael Brauer etc.). Although the cloud shows only word
occurrences, it supports the view that the vocal is the center of attention in practical mixing.
It is important to note that the transcripts are derived from practical tutorials in which the
mix engineers step through their sessions; hence the focus on elements can be inferred from
simple word occurrence counts, whereas the question of which processing was used cannot be
answered by word counts alone.

Nevertheless, an effective way to map data links for visualization purposes while retaining the
relationships within the data is “t-distributed stochastic neighbor embedding”, in short
“t-SNE”. This is a dimensionality reduction process often used to represent high-dimensional
data, which is mapped onto low-dimensional 2D/3D maps while preserving the structure of the
underlying data in an intuitive way (Van der Maaten, 2013).

This approach is commonly used for exploratory data analysis of text, making it possible to
read text as a “map of text” (Heuer, 2015). The author's main goal was to combine the acclaimed
engineers' transcripts, remove unnecessary words and numerical data (as done previously) and
have the t-SNE algorithm place similar words close together, so that meaning can be inferred by
interpreting “clusters” in individual parts of the word map.

The process was realized using MATLAB's text analysis toolkit, which includes the necessary
functions for creating a t-SNE map with relative ease, as specified by Mathworks
(2017):

1. Read file and train word embedding:
   a. mwtmfiltered = "mwtmfiltered.txt";
   b. emb = trainWordEmbedding(mwtmfiltered,'Dimension',200,'MinCount',3,'NumEpochs',500);
2. Visualise 2D scatter plot:
   a. words = emb.Vocabulary; V = word2vec(emb,words);
   b. XY = tsne(V); textscatter(XY,words);
   c. title("Deconstructing a Mix t-SNE Plot")

A common complication is that successive runs with similar settings do not always create similar
embeddings (Jaju, 2017). In order to compare iterations of the map, various numbers of
“epochs” were compared. The epoch count specifies the number of training passes the algorithm
makes to optimize the distances between data points. It is commonly recommended to use
a higher number of epochs depending on the size of the underlying data. The value was found
through trial and error until a reasonably adequate representation was achieved without straining
processing resources. The example below (fig. 45) shows the map surrounding the word
“saturation” after only 50 epochs.

Fig. 45 “MWTM t-SNE 50 Epoch’s” (Semantic Analysis)

The underlying algorithm seemed to identify the relationships within the text better when
more epochs were run. Notably, a low number of epochs at times placed similar-sounding words
next to each other and grouped words that did not seem to relate much in the context of mixing,
while more epochs enabled more meaningful neighboring of words, as shown in the next map (fig.
46), for which 500 epochs were run. The idea was to represent the words while retaining the
relationship between e.g. the processors used and the tracks affected, as opposed to
just showing term frequency.

Fig. 46 “MWTM t-SNE 500 Epoch’s” (Semantic Analysis)

In the context of mixing, the second map makes more sense, as words such as saturation relate
better to tools such as distressor, federal or arousor (analog-modelled plugins), which seem more
adequate as “neighbors”. An even higher epoch count of 5000 or 10000 might achieve more
precise results. However, at 500 epochs it already took the author's computer (OSX, 16 GB, i7)
approximately 1 minute to render the word embedding, with the CPU load at its maximum
throughout the render.

Potentially, if a DAW offered a search function, an engineer could search for “Michael Brauer”
and the DAW would auto-suggest a tree of processors and settings associated with that engineer's
explanations while the user types into the search box (similar to Google's autocomplete
function). Likewise, an engineer could type “saturation” into a search box, for which the engine
would return the “distressor”, “federal” or “arousor” processing units (if based on the current
t-SNE embedding). Ultimately, the exact behavior of t-SNE in combination with mixing-related
text is subject to further extensive study, as the relationships between the words seemed to
differ between successive runs with minimal changes of settings, and even similar settings
returned differing results at times.
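
How such a lookup might work can be sketched with a nearest-neighbor search over word vectors. Everything in this sketch is hypothetical: the 3-dimensional vectors are invented for illustration (the trained embedding in this project had 200 dimensions), cosine similarity is one common choice of similarity measure rather than the method used here, and `suggest` is an assumed function name.

```python
import math

# Toy vectors invented for illustration; a trained embedding would supply real ones.
embedding = {
    "saturation": [0.9, 0.1, 0.0],
    "distressor": [0.8, 0.2, 0.1],
    "federal":    [0.7, 0.3, 0.0],
    "reverb":     [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def suggest(query, embedding, k=2):
    """Return the k words whose vectors are most similar to the query word."""
    target = embedding[query]
    others = [(w, cosine(target, v)) for w, v in embedding.items() if w != query]
    ranked = sorted(others, key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:k]]
```

With the toy vectors above, `suggest("saturation", embedding)` returns “distressor” and “federal”, while “reverb” is ranked last.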

Conclusion

Overall, findings from the audio perceptual evaluation, literature research and mix session
analysis indicate common agreements among mixing practices, which can be used as a starting
point when mixing a multitrack session. However, the mixing process in its entirety remains a
complicated subject, and the findings of this project may be valid only for the demographic of
the mixes and the music genre analyzed. Furthermore, finding qualitative differences between
mixes proved hard even when a range of audio features was correlated with categorized subjective
feedback data.

The quality of the data therefore depends on the methodology used to collect it. Additionally,
the accuracy of predicting which parameters to use on a specific track depends on the data
available to inform such decisions (audio features, descriptors and parameters). The
participants' awareness that the project was a study may have biased the data compared to how
they would have treated their own project. Also, the limited compensation in the form of merely
a prize draw may have resulted in less overall effort by subjects, which may have degraded the
data further. Ultimately, the quality of feedback also relies to a great extent on the subjects'
evaluation experience and breadth of audio vocabulary, as opposed to being practically capable
of “feeling what's a good sound”.

Future developments in decentralized applications may improve the collection of data
by integrating statistical analysis into a DAW, making the “sharism” of data accessible to a
music production market on demand. Tools such as the “IAW” developed by BCU & QMUL
may in the near future enable running a full session with any plugin native to a participant
while collecting user metrics. Furthermore, it is important to note that the majority of mixing
procedures are heavily genre dependent. It is plausible that the introduction and morphing of
new music genres greatly influences engineers' mixing choices and creative possibilities (and
vice versa). Notably, in popular electronic and sampled music, where fewer errors are introduced
during recording, mix engineers may have more time to experiment and further “perfect”
something that would otherwise have received lower priority.

Future improvements could be made by isolating mixes by genre, location and experience of the
mix engineer participants in order to better understand the differences between mixes of various
demographics. A further improvement may be the classification of assessor comment data by
standardized approaches using advanced text classifiers instead of manual judgement. This may be
realized with open source Python libraries such as “NLTK” & “scikit-learn”. Ultimately, the
formulation of standards allows further experimentation on top of “old” best practices,
furthering the exploration of “new” best practices and hence enhancing the art of mixing.

Appendix

General Mix Engineer Survey (Subjects Background)

General Mixing Survey (Errors encountered during Mixing)

General Mixing Survey (Referencing Methods)

Mixing Engineers Background

Mix Engineer Data by Mix*

Mix   Primary Genre(s)               Experience in Years   Monitoring Tool

1     RnB, Hip Hop, Rap, Jazz        3-4 years             Pro Monitors
2     Rock, Metal, Pop, Classical    9-10 years            Pro Monitors, Pro Headphones
3     Rock                           7-8 years             Pro Headphones
4     Rock, Jazz, Blues              5-6 years             Pro Monitors, Pro Headphones, Consumer Headphones
5     Rock, Metal                    3-4 years             Pro Headphones
6     Rock, Pop, RnB, Hip Hop, Rap   3-4 years             Pro Monitors, Pro Monitors
7     Rock, Jazz, Classical          5-6 years             Pro Monitors, Pro Headphones
8     Rock                           1-2 years             Pro Monitors, Pro Headphones
9     Rock, Hip Hop, Rap             3-4 years             Pro Monitors + Subwoofer, Pro Headphones
10    Rock, Pop, Blues               More than 10 years    Pro Headphones, Room Correction Software

*exact data excluded to protect identities

Mix Engineer Session Data

Mix Sessions Submixes Dendrograms

“Top 5 Positive Vocal Compression Parameters” (Mix Sessions)

“Mixes Median Track LRA Range” (Mix Sessions)

References

Turnidge, S. (2012). Desktop Mastering. Milwaukee, WI: Hal Leonard Books.

Adam, N. & Ward, K. (2011). Pro Tools 9: The Mixer's Toolkit. Kidlington, Oxford: Elsevier, Inc.

Adarno, M., Felton, D. & Raoofi, S. (2012). The Secrets of House Music Production. 4th Ed.
London: Sample Magic.

Waves Audio Ltd. (n.d.). API 2500. [Online]. Available from:
https://www.waves.com/plugins/api-2500. [Accessed: 16 February 2018].

Bartlett, B. & Bartlett, J. (2013). Practical Recording Techniques: The Step-by-Step Approach
to Professional Audio Recording. 6th Ed. Waltham, MA: Focal Press.

Waves Audio Ltd. (n.d.). BSS DPR-402. [Online]. Available from:
https://www.waves.com/plugins/bss-dpr-402. [Accessed: 16 February 2018].

Case, A.U. (2011). Mix Smart: Pro Audio Tips for Your Multitrack Mix. Waltham, MA: Focal
Press.

Crich, T. (2010). Recording Tips for Engineers: For Cleaner, Brighter Tracks. 3rd Ed. Kidlington,
Oxford: Elsevier, Ltd.

Egizii, A. (2004). Production - Mixing - Mastering with Waves. Knoxville, TN: Waves Publishing
Inc.

Franz, D. (2003). Producing in the Home Studio with Pro Tools. 3rd Ed. Boston, MA: Berklee
Press.

Hamidovic, E. (2012). The Systematic Mixing Guide. Melbourne: Systematic Productions.

Hirsch, S. & Heithecker, S. (2006). Pro Tools 7 Session Secrets: Professional Recipes for High-
Octane Results. Indianapolis, Indiana: Wiley Publishing, Inc.

Izhaki, R. (2018). Mixing Audio: Concepts, Practices, and Tools. 3rd Ed. New York, NY:
Routledge.

Katz, B. (2014). Mastering Audio: The Art and the Science. 3rd Ed. Burlington, MA: Focal Press.

Krug, J. (2013). Mastering Pro Tools Effects: Getting the Most Out of Pro Tools' Effects
Processors. Boston, MA: Course Technology, a part of Cengage Learning.

Miller, M. (2016). Mixing Music. Idiot's Guides. Indianapolis, Indiana: DK Publishing.

Owsinski, B. (2017). The Mixing Engineer’s Handbook. 4th Ed. Burbank, CA: Bobby Owsinski
Media Group.

Pejrolo, A. (2005). Creative Sequencing Techniques for Music Production: A practical guide to
Logic, Digital Performer, Cubase and Pro Tools. Burlington, MA: Focal Press.

Preve, F. (2006). The Remixer's Bible: Build Better Beats. San Francisco, CA: Backbeat Books.

Waves Audio Ltd. (n.d.). PuigChild Compressor. [Online]. Available from:
https://www.waves.com/plugins/puigchild-compressor. [Accessed: 16 February 2018].

Waves Audio Ltd. (n.d.). Renaissance Compressor. [Online]. Available from:
https://www.waves.com/plugins/renaissance-compressor. [Accessed: 16 February 2018].

Rose, J. (2002). Audio Post Production for Digital Video. San Francisco, CA: CMP Books.

Savage, S. (2014). Mixing and Mastering In the Box: The Guide to Making Great Mixes and
Final Masters on Your Computer. New York, NY: Oxford University Press.

Senior, M. (2011). Mixing Secrets for the Small Studio. Burlington, MA: Focal Press.

Snoman, R. (2012). Dance Music Manual: Tools, Toys, and Techniques. 2nd Ed. Burlington,
MA: Focal Press.

Waves Audio Ltd. (n.d.). SSL G-Master Buss Compressor. [Online]. Available from:
https://www.waves.com/plugins/ssl-g-master-buss-compressor. [Accessed: 16 February 2018].

Strong, J. (2008). Pro Tools All-in-one Desk Reference for Dummies. 2nd Ed. Hoboken, NJ:
Wiley Publishing, Inc.

Waves Audio Ltd. (n.d.). V-Comp. [Online]. Available from:
https://www.waves.com/plugins/v-comp. [Accessed: 16 February 2018].

Winer, E. (2013). The Audio Expert: Everything You Need to Know about Audio. Burlington, MA:
Focal Press.

Mixerman (2014). Zen and the Art of Mixing. Milwaukee, WI: Hal Leonard Books.

Dittmar, T. (2012). Audio Engineering 101: A Beginner's Guide to Music Production. Kidlington,
Oxford: Elsevier, Inc.

Mix With The Masters. (2018) Michael Brauer live workshop at AES NY 2017 [Online Video].
January 15. Available from: https://www.youtube.com/watch?v=joR2Osi-yho. [Accessed:
February 10th 2018].

Semantic Audio. (2015) Intelligent Music Production: Challenges, Frontiers and Implications -
Josh Reiss [Online Video]. September 11. Available from:
https://www.youtube.com/watch?v=joR2Osi-yho. [Accessed: February 10th 2018].

Method for the subjective assessment of intermediate quality level of coding systems.
Recommendation ITU-R BS.1534-1, 2003.

General methods for the subjective assessment of sound quality. Recommendation ITU-R
BS.1284-1, 2003.

Anon (1989). An Afternoon With: Bill Putnam. [Online]. 37 (9). Available from:
http://www.aes.org/aeshc/docs/afternoon_putnam.pdf. [Accessed: 14 February 2018].

Bech, S. & Zacharov, N. (2007). Perceptual Audio Evaluation - Theory, Method and Application.
John Wiley & Sons.

Cox, T. (2014). Sonic Wonderland: A Scientific Odyssey of Sound. Bodley Head.

De Man, B. & Reiss, J.D. (2013). A Semantic Approach To Autonomous Mixing. Journal on the
Art of Record Production. [Online]. (8). Available from:
https://www.eecs.qmul.ac.uk/~josh/documents/2013/De%20Man%20Reiss%20-%20ARP2013.pdf.
[Accessed: 5 February 2018].

De Man, B. & Reiss, J.D. (2015). Analysis of Peer Reviews in Music Production. Journal on the
Art of Record Production. [Online]. (10). Available from:
https://qmro.qmul.ac.uk/xmlui/bitstream/handle/123456789/12562/De%20Man%20Analysis%20of%20Peer%20Reviews%202015%20Accepted.pdf?sequence=2.
[Accessed: 4 January 2018].

De Man, B. (2017). Towards a better understanding of mix engineering. Thesis. [Online].
Available from: http://www.brechtdeman.com/publications/pdf/PhD-thesis.pdf. [Accessed: 6
January 2018].

Eustace, D. (n.d.). Musical Acoustics. [Online]. Concert Hall Acoustics: Art and Science.
Available from:
http://www.acoustics.salford.ac.uk/acoustics_info/concert_hall_acoustics/?content=musical_aco
ustics. [Accessed: 9 February 2018].

Anon (n.d.). FIR vs IIR filtering. [Online]. Mini DSP. Available from:
https://www.minidsp.com/applications/dsp-basics/fir-vs-iir-filtering. [Accessed: 4 February 2018].

Flatow, I. & Feaster, P. (2008). 1860 'Phonautograph' Is Earliest Known Recording. Technology.
[Online]. Available from: https://www.npr.org/templates/story/story.php?storyId=89380697.
[Accessed: 9 February 2018].

Anon (2017). Gain Staging in Your DAW: Good Levels, Better Mix. [Online]. 23 November 2017.
Available from: https://www.waves.com/gain-staging-in-your-daw-better-mix. [Accessed: 3
February 2017].

Huber, D.M. & Runstein, R.E. (2014). Modern Recording Techniques. Burlington, MA: Focal
Press.

Jillings, N., De Man, B., Moffat, D. & Reiss, J.D. (2015). Web Audio Evaluation Tool: A Browser-
based Listening Test Environment.

Jillings, N. & Stables, R. (2017). Web Audio Conference WAC-2017. In: An Intelligent audio
workstation in the browser. [Online]. 21 August 2017. Available from:
http://eecs.qmul.ac.uk/~keno/37.pdf. [Accessed: 2 February 2018].

Juried, C. (n.d.). Les Paul. [Online]. History Of Recording. Available from:
https://www.historyofrecording.com/Les_Paul.html. [Accessed: 18 February 2018].

Nichols, P. (2017). Stems and Multitracks: What’s the Difference? [Online]. 5 January 2017.
iZotope. Available from: https://www.izotope.com/en/community/blog/tips-
tutorials/2018/01/stems-and-multitracks-whats-the-difference.html. [Accessed: 7 February
2018].

Rumsey, F. & McCormick, T. (2013). Sound and Recording. 6th Ed. Burlington, MA: Focal
Press.

Scarre, C. (1989). Painting by Resonance. [Online]. 1989. UCLA. Available from:
http://cogweb.ucla.edu/ep/Art/Scarre_89.html. [Accessed: 8 February 2018].

Anon (n.d.). The Phonautograph and Precursors to Edison's Phonograph. [Online]. UCSB
Cylinder Audio Archive. Available from: http://cylinders.library.ucsb.edu/history-early.php.
[Accessed: 7 February 2018].

Olshausen, B.A. (2000). Aliasing. [Online]. 10 October 2000. rctn. Available from:
http://www.rctn.org/bruno/npb261/aliasing.pdf. [Accessed: 7 February 2018].

Reiss, J.D. (2016). A Meta-Analysis of High Resolution Audio Perceptual Evaluation. Journal of
the Audio Engineering Society. [Online]. Available from:
https://qmro.qmul.ac.uk/xmlui/bitstream/handle/123456789/13493/Reiss%20A%20Meta-Analysis%20of%20High%20Resolution%202016%20Published.pdf?sequence=1.
[Accessed: 22 February 2018].

Robjohns, H. (2014). Q. Why do Universal Audio restrict the processing bandwith of their UAD
plug-ins? [Online]. May 2014. SOS. Available from: https://www.soundonsound.com/sound-
advice/q-why-do-universal-audio-restrict-processing-bandwith-their-uad-plug-ins. [Accessed: 22
February 2018].

Loudness Metering: 'EBU Mode' Metering To Supplement EBU R 128 Loudness Normalization.
Recommendation EBU TECH 3341, 2016.

Anon (2017). integratedLoudness. [Online]. 11 October 2017. Mathworks. Available from:
https://uk.mathworks.com/help/audio/ref/loudnessmeter-system-object.html. [Accessed: 7
February 2018].

European Broadcasting Union (2011). EBU R 128 – the EBU loudness recommendation. [Online].
Available from: https://tech.ebu.ch/docs/events/ibc11-
ebutechnical/presentations/ibc11_10things_r128.pdf. [Accessed: 14 February 2018].

Smyth, T. (2017). Music 175: Loudness. [Online]. Available from:
http://musicweb.ucsd.edu/~trsmyth/loudness175/loudness175_4up.pdf. [Accessed: 3 January
2018].

Collins, M. (2013). Pro Tools 8: Music Production, Recording, Editing, and Mixing. Taylor &
Francis.

Lerch, A. (2012). An Introduction to Audio Content Analysis: Applications in Signal Processing
and Music Informatics. John Wiley & Sons.

Lartillot, O. & Toiviainen, P. (2007). Proc. of the 10th Int. Conference on Digital Audio Effects
(DAFx-07). In: A Matlab Toolbox For Musical Feature Extraction From Audio. [Online]. 15
September 2007, Bordeaux. Available from: http://dafx.labri.fr/main/papers/p237.pdf. [Accessed:
15 February 2018].

Anon (n.d.). Boxplots. [Online]. Stat Trek. Available from:
http://stattrek.com/statistics/charts/boxplot.aspx?Tutorial=AP. [Accessed: 3 January 2018].

Vagias, W.M. (2006). Likert Type Response Anchors. Available from:
https://www.uc.edu/content/dam/uc/sas/docs/Assessment/likert-
type%20response%20anchors.pdf. [Accessed: 6 February 2018].

Anon (2017). Wordcloud. [Online]. 11 October 2017. Mathworks. Available from:
https://uk.mathworks.com/help/matlab/ref/wordcloud.html. [Accessed: 14 February 2018].

van der Maaten, L. & Hinton, G. (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research.
[Online]. Available from:
http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf. [Accessed: 7
February 2018].

Heuer, H. (2016). Proc. of the 8th Eur. Conf. on Python in Science (EuroSciPy 2015). In: Text
comparison using word vector representations and dimensionality reduction. [Online]. 2 July
2016. Available from: https://arxiv.org/pdf/1607.00534. [Accessed: 15 February 2018].

Jaju, S. (2017). Comprehensive Guide on t-SNE algorithm with implementation in R &
Python. [Online]. 22 January 2017. Analytics Vidhya. Available from:
https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/. [Accessed: 1
February 2018].

Brauer, M., Alge, C.L., Alge, T.L., Wallace, A., King, J., Maserati, T., Massy, S., Blake, T., Wells,
G., Douglas, J., Albini, S., Puig, J.J., Meyerson, A., Kramer, E., Schmitt, A., Launay, N. &
Scheps, A. (n.d.). Mix With The Masters - Deconstructing a Mix Series. [Online].
Mix With The Masters. Available from: https://www.mixwiththemasters.com/videos/series.
[Accessed: 2 January 2018].
