
Institute of Electronic Music and Acoustics - IEM, University for Music and Dramatic Arts Graz

Science By Ear. An Interdisciplinary Approach to Sonifying Scientific Data

Alberto de Campo

Dissertation
Graz, February 23, 2009

Supervisors: Prof. Dr. Robert Höldrich (IEM/KUG), Prof. Dr. Curtis Roads (MAT/UCSB)

Science By Ear. An Interdisciplinary Approach to Sonifying Scientific Data

Author: Alberto de Campo
Contact: decampo@iem.at
Supervisors: Prof. Dr. Robert Höldrich (IEM/KUG), Prof. Dr. Curtis Roads (MAT/UCSB)
Contact: hoeldrich@iem.at, clang@create.ucsb.edu

Dissertation
Institute of Electronic Music and Acoustics - IEM, University for Music and Dramatic Arts Graz
Inffeldgasse 10, A-8020 Graz, Austria
February 23, 2009, 211 pages

Abstract
Sonification of Scientific Data is intrinsically interdisciplinary: It requires collaboration between experts in the respective scientific domains, in psychoacoustics, in artistic design of synthetic sound, and in working with appropriate programming environments. The SonEnvir project hosted at IEM Graz put this view into practice: in four domain sciences, sonification designs for current research questions were realised.

This dissertation contributes to sonification research in three aspects:

- The body of sonification designs realised within the SonEnvir context is described, which may be reused in sonification research in different ways.
- The software framework built with and for these sonification designs is presented, which supports fluid experimentation with evolving sonification designs.
- A theoretical model for sonification design work, the Sonification Design Space Map, was synthesised based on the analysis of this body of sonification designs (and a few selected others). This model allows systematic reasoning about the process of creating sonification designs, and provides concepts for analysing and categorising existing sonification designs more systematically.

Deutsche Zusammenfassung - German abstract

Die Sonifikation von wissenschaftlichen Daten ist intrinsisch interdisziplinär: Sie verlangt Zusammenarbeit zwischen ExpertInnen in den jeweiligen wissenschaftlichen Gebieten, in Psychoakustik, in der künstlerischen Gestaltung von synthetischem Klang, und in der Arbeit mit geeigneten Programmierumgebungen. Das Projekt SonEnvir, das am IEM Graz stattfand, hat diese Sichtweise in die Praxis umgesetzt: in vier wissenschaftlichen Gebieten (domain sciences) wurden Sonifikations-Designs zu aktuellen Forschungsfragen realisiert.

Diese Dissertation trägt drei Aspekte zur Sonifikationsforschung bei:

- Der Korpus der im Kontext von SonEnvir entwickelten Sonification Designs wird detailliert beschrieben; diese Designs können in der Forschungsgemeinschaft in verschiedener Weise Weiterverwendung finden.
- Das Software-Framework, das für und mit diesen Designs gebaut wurde, wird beschrieben; es erlaubt fließendes Experimentieren in der Entwicklung von Sonifikationsdesigns.
- Ein theoretisches Modell für die Gestaltung von Sonifikationen, die Sonification Design Space Map, wurde auf Basis der Analysen dieser (und ausgewählter anderer) Designs synthetisiert. Dieses Modell erlaubt systematisches Nachdenken (reasoning) über den Gestaltungsprozess von Sonifikationsdesigns, und bietet Konzepte für die Analyse und Kategorisierung existierender Sonifikationsdesigns an.

Keywords: Sonification, Sonification Theory, Perceptualisation, Interdisciplinary Research, Interactive Software Development, Just In Time Programming


Acknowledgements
First of all, I would like to thank Marianne Egger de Campo for designing several versions of the XENAKIS proposal with me - a sonification project with European partners that eventually became SonEnvir. Then, I would like to thank my research partners in the SonEnvir project: Christian Dayé, Christopher Frauenberger, Kathi Vogt and Annette Wallisch, without whom this work would not have been possible. I would like to thank Robert Höldrich for his collaboration on the grant proposals, and for his contribution to the EEG realtime sonification; and Gerhard Eckel for leading the SonEnvir project for most of its lifetime.

I would like to thank the participants of the Science By Ear workshop, who have been very open to a very particular experimental setup in interdisciplinary collaboration, especially for the discussions which eventually led to formulating the concept of the Sonification Design Space Map. A very special thank you is in order for the brave people who were willing to try programming sonification designs just-in-time within this workshop: Till Bovermann, Christopher Frauenberger, Thomas Musil, Sandra Pauletto, and Julian Rohrhuber.

For the Spin Models, the following Science By Ear participants also worked on a sonification design for the Ising model (besides the SonEnvir team): Thomas Hermann, Harald Markum, Julian Rohrhuber and Tony Stockman. Concerning the background in theoretical physics, we would also like to thank Christof Gattringer, Christian Bernd Lang, Leopold Mathelitsch and Ulrich Hohenester. For the piece Navegar, I would like to thank Peter Jakober for researching the detailed timeline, and Marianne Egger de Campo for suggesting the Gini index as an interesting variable.

Alberto de Campo

Graz, February 23, 2009

Contents

1 Introduction
  1.1 Motivation
  1.2 Scope
  1.3 Methodology
  1.4 Overview of this thesis

2 Psychoacoustics, Perception, Cognition, and Interaction
  2.1 Psychoacoustics
  2.2 Auditory perception and memory
  2.3 Cognition, action, and embodiment
  2.4 Perception, perceptualisation and interaction
  2.5 Mapping, mixing and matching metaphors

3 Sonification Systems
  3.1 Background
    3.1.1 A short history of sonification
    3.1.2 A taxonomy of intended sonification uses
  3.2 Sonification toolkits, frameworks, applications
    3.2.1 Historic systems
    3.2.2 Current systems
  3.3 Music and sound programming environments
  3.4 Design of a new system
    3.4.1 Requirements of an ideal sonification environment
    3.4.2 Platform choice
  3.5 SonEnvir software - Overall scope
    3.5.1 Software framework
    3.5.2 Framework structure
    3.5.3 The Data model

4 Project Background
  4.1 The SonEnvir project
    4.1.1 Partner institutions and people
    4.1.2 Project flow
    4.1.3 Publications
  4.2 Science By Ear - An interdisciplinary workshop
    4.2.1 Workshop design
    4.2.2 Working methods
    4.2.3 Evaluation
  4.3 ICAD 2006 concert
    4.3.1 Listening to the Mind Listening
    4.3.2 Global Music - The World by Ear

5 General Sonification Models
  5.1 The Sonification Design Space Map (SDSM)
    5.1.1 Introduction
    5.1.2 Background
    5.1.3 The Sonification Design Space Map
    5.1.4 Refinement by moving on the map
    5.1.5 Examples from the Science by Ear workshop
    5.1.6 Conclusions
    5.1.7 Extensions of the SDS map
  5.2 Data dimensions
    5.2.1 Data categorisation
    5.2.2 Data organisation
    5.2.3 Task Data analysis - LoadFlow data
  5.3 Synthesis models
    5.3.1 Sonification strategies
    5.3.2 Continuous Data Representation
    5.3.3 Discrete Data Representation
    5.3.4 Parallel streams
    5.3.5 Model Based Sonification
  5.4 User, task, interaction models
    5.4.1 Background - related disciplines
    5.4.2 Music interfaces and musical instruments
    5.4.3 Interactive sonification
    5.4.4 The Humane Interface and sonification
    5.4.5 Goals, tasks, skills, context
    5.4.6 Two examples
  5.5 Spatialisation Model
    5.5.1 Speaker-based sound rendering
    5.5.2 Headphones
    5.5.3 Handling speaker imperfections

6 Examples from Sociology
  6.1 FRR Log Player
    6.1.1 Technical background
    6.1.2 Analysis steps
    6.1.3 Sonification design
    6.1.4 Interface design
    6.1.5 Evaluation for the research context
    6.1.6 Evaluation in SDSM terms
  6.2 Wahlgesänge - Election Songs
    6.2.1 Interface and sonification design
    6.2.2 Evaluation
  6.3 Social Data Explorer
    6.3.1 Background
    6.3.2 Interaction design
    6.3.3 Sonification design
    6.3.4 Evaluation

7 Examples from Physics
  7.1 Quantum Spectra sonification
    7.1.1 Quantum spectra of baryons
    7.1.2 The Quantum Spectra Browser
    7.1.3 The Hyperfine Splitter
    7.1.4 Possible future work and conclusions
  7.2 Sonification of Spin models
    7.2.1 Physical background
    7.2.2 Ising model
    7.2.3 Potts model
    7.2.4 Audification-based sonification
    7.2.5 Channel sonification
    7.2.6 Granular sonification
    7.2.7 Sonification of self-similar structures
    7.2.8 Evaluation

8 Examples from Speech Communication and Signal Processing
  8.1 Time Series Analyser
    8.1.1 Mathematical background
    8.1.2 Sonification tools
    8.1.3 The PDFShaper
    8.1.4 TSAnalyser
  8.2 Listening test
    8.2.1 Test data
    8.2.2 Listening experiment
    8.2.3 Experiment results
    8.2.4 Conclusions

9 Examples from Neurology
  9.1 Auditory screening and monitoring of EEG data
    9.1.1 EEG and sonification
    9.1.2 Rapid screening of long-time EEG recordings
    9.1.3 Realtime monitoring during EEG recording sessions
  9.2 The EEG Screener
    9.2.1 Sonification design
    9.2.2 Interface design
  9.3 The EEG Realtime Player
    9.3.1 Sonification design
    9.3.2 Interface design
  9.4 Evaluation with user tests
    9.4.1 EEG test data
    9.4.2 Initial pre-tests
    9.4.3 Tests with expert users
    9.4.4 Analysis of expert user tests - EEG Screener 1 vs. 2
    9.4.5 Analysis of expert user tests - RealtimePlayer 1 vs. 2
    9.4.6 Qualitative results for both players (versions 2)
    9.4.7 Conclusions from user tests
    9.4.8 Next steps
    9.4.9 Evaluation in SDSM terms

10 Examples from the Science by Ear Workshop
  10.1 Rainfall data
  10.2 Polysaccharides
    10.2.1 Polysaccharides - Materials made by nature
    10.2.2 Session notes

11 Examples from the ICAD 2006 Concert
  11.1 Life Expectancy - Tim Barrass
  11.2 Guernica 2006 - Guillaume Potard
  11.3 Navegar É Preciso, Viver Não É Preciso
    11.3.1 Navigation
    11.3.2 The route
    11.3.3 Data choices
    11.3.4 Economic characteristics
    11.3.5 Access to drinking water
    11.3.6 Mapping choices
  11.4 Terra Nullius - Julian Rohrhuber
    11.4.1 Missing values
    11.4.2 The piece
  11.5 Comparison of the pieces

12 Conclusions
  12.1 Further work

A The SonEnvir framework structure in subversion
  A.1 The folder Framework
  A.2 The folder SC3-Support
  A.3 Other folders in the svn repository
  A.4 Quarks-SonEnvir
  A.5 Quarks-SuperCollider

B Models - code examples
  B.1 Spatialisation examples
    B.1.1 Physical sources
    B.1.2 Amplitude panning
    B.1.3 Ambisonics
    B.1.4 Headphones
    B.1.5 Handling speaker imperfections

C Physics Background
  C.1 Constituent Quark Models
  C.2 Potts model - theoretical background
    C.2.1 Spin models sound examples

D Science By Ear participants

E Background on Navegar

F Sound, meaning, language

List of Tables

5.1 Scale types
5.2 The Keys
5.3 The Task
5.4 The Data/Information
5.5 The Data
6.1 Sectors of economic activities
9.1 Equally spaced EEG band ranges
9.2 Questionnaire scales for EEG sonification designs
11.1 Navegar - Mappings of data to sound parameters
11.2 Some stations along the timeline of Navegar
B.1 Remapping spatial control values
E.1 Os Argonautas - Caetano Veloso

List of Figures

2.1 Some aspects of auditory memory, from Snyder (2000)
3.1 Inclined plane for Galilei's experiments on the law of falling bodies
3.2 UML diagram of the data model
5.1 The Sonification Design Space Map
5.2 SDS Map for designs with varying numbers of streams
5.3 All design steps for the LoadFlow dataset
5.4 LoadFlow - time series of dataset (averaged over many households)
5.5 LoadFlow - time series for 3 individual households
6.1 The toilet prototype system used for the FRR field test
6.2 Graphical display of one usage episode (Excel)
6.3 FRR Log Player GUI and sounds mixer
6.4 SDS Map for the FRR Log Player
6.5 GUI Window for the Wahlgesänge Design
6.6 SDS Map for Wahlgesänge
6.7 GUI Window for the Social Data Explorer
7.1 Excitation spectra of N (left) and Δ (right) particles
7.2 The QuantumSpectraBrowser GUI
7.3 The Hyperfine Splitter GUI
7.4 Schema of spins in the Ising model as an example for Spin models
7.5 Schema of the orders of phase transitions in spin models
7.6 GUI for the running 4-state Potts Model in 2D
7.7 Audification of a 4-state Potts model
7.8 Sequentialisation schemes for the lattice used for the audification
7.9 A 3-state Potts model cooling down from super- to subcritical state
7.10 Granular sonification scheme for the Ising model
7.11 A self-similar structure as a state of an Ising model
8.1 The PDFShaper interface
8.2 The TSAnalyser interface
8.3 The interface for the time series listening experiment
8.4 Probability of correctness over kurtosis in set 1
8.5 Probability of correctness over kurtosis in set 2
8.6 Probability of correctness over skew in set 2
8.7 Probability of correctness over skew and kurtosis in set 2
8.8 Number of replays over kurtosis in set 2
9.1 The Sonification Design Space Map for both EEG Players
9.2 The EEGScreener GUI
9.3 The Montage Window
9.4 EEG Realtime Sonification block diagram
9.5 The EEG Realtime Player GUI
9.6 Expert user test ratings for both EEGScreener versions
9.7 Expert user test ratings for both RealtimePlayer versions
10.1 Precipitation in the Alpine region, 1980-1991
10.2 Orography of the grid of regions
10.3 SDSM map of Rainfall data set
11.1 Magellan's route in Antonio Pigafetta's travelogue
11.2 Magellan's route, as reported in Wikipedia
11.3 The countries of the world and their Gini coefficients
11.4 Terra Nullius, latitude zones
11.5 SDSM comparison of the ICAD 2006 concert pieces
B.1 The Spectralyzer GUI window
C.1 Multiplet structure of the baryons as a decuplet
Chapter 1

Introduction
Sonification of Scientific Data, i.e., the perceptualisation of data by means of sound in order to find structures and patterns within them, is intrinsically interdisciplinary: It requires collaboration between experts in the respective scientific domains the data come from, in psychoacoustics, in the artistic design of synthetic sound, and in working with appropriate programming environments to realise successful sonification designs. The concept of the SonEnvir project (hosted at IEM Graz from 2005 to 2007) has put this view into practice: in four science domains, sonification designs for current research questions were realised in close collaboration with audio programming specialists. The research reported here mainly took place in the SonEnvir project context.

This dissertation contributes to sonification research in three ways:

The body of sonification designs realised within SonEnvir is described in detail. They may be reused in sonification research by the community, both as concepts and as open-source implementations on which new solutions can be based.

For realising these sonification designs, a software framework was built in the language SuperCollider3 that allows for flexible, rapid experimentation with evolving sonification designs (in Just In Time programming style). Being open-source, this framework may be reused and possibly maintained by the research community in the future.

The analysis of this body of sonification designs (and a few others of interest) has eventually led to a general model of sonification design work, the Sonification Design Space Map. This contribution to sonification theory allows systematic reasoning about the process of developing sonification designs; based on data properties and context, it suggests candidates for the next experimental steps in the ongoing design process. It also provides concepts for analysing and categorising existing sonification designs more systematically.
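To give a flavour of the Just In Time programming style referred to above: in SuperCollider's JITLib, a sounding sonification sketch can be redefined while it keeps playing, which is what makes fluid design iteration practical. The following minimal sketch is illustrative only; the proxy name \sketch and the placeholder data sequence are mine, not part of the SonEnvir framework.

    // boot the server first: s.boot
    // first mapping sketch: a placeholder data sequence controls tone pitch
    Ndef(\sketch, {
        var trig = Impulse.kr(8);
        var val = Demand.kr(trig, 0, Dseq([1, 2, 3, 5, 8, 13] / 13, inf));
        SinOsc.ar(val.linexp(0, 1, 200, 2000)) * Decay2.kr(trig, 0.01, 0.2) * 0.2 ! 2
    }).play;

    // redefine the mapping while it keeps sounding - no stop/restart needed
    Ndef(\sketch, {
        var trig = Impulse.kr(8);
        var val = Demand.kr(trig, 0, Dseq([1, 2, 3, 5, 8, 13] / 13, inf));
        LFTri.ar(val.linexp(0, 1, 100, 1000)) * Decay2.kr(trig, 0.01, 0.3) * 0.2 ! 2
    }).play;

    Ndef(\sketch).end;  // fade out and free when done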

1.1  Motivation

Data are pervasive in modern societies: Science, politics, economics, and everyday life depend fundamentally on data for decisions. Larger and larger amounts of data are being acquired in the hope of their usefulness, taking advantage of continuing progress in information technology. While data may contain obvious information (i.e., well-understood content), very often one also assumes they contain implicit or even hidden facts about the phenomena observed; understanding these hitherto unknown facts is highly desired. The research field that most directly addresses this interest is Data Mining, or Exploratory Data Analysis.

Two approaches are in common use for extracting new information from data: One is statistical analysis, the other is data perceptualisation, i.e., making data properties perceptible to the human senses; and many existing software tools combine both: from statistics programs like Excel and SPSS, science and engineering environments like MATLab and Mathematica, to a host of special-purpose tools for specific domains of science or economy.

For scientists, perceptualisation of data is of vital interest; it is almost exclusively approached by visual means for a combination of reasons.[1] Visualisation tools have permeated scientific cultures to the point of being invisible; many scientists are well-versed in tools that visualize their results, and rarely do scientists question how accurately and adequately visual representations represent the data content. Many Virtual Reality systems, such as the CAVE (Cruz-Neira et al. (1992)) and others, claim scientific data exploration as one of their stronger usage scenarios. Nevertheless, sound often seems to be added to such systems only as an afterthought, usually with the intention to achieve better immersion and emotional engagement (sometimes even alluding to cinema-like effects as the inspiration for the approach intended).

Sonification, the representation of data by acoustic means, is a potentially useful alternative and complement to visual approaches that has not reached the same level of acceptance. This is the starting point for the research agenda described here: To create an interdisciplinary research setting where scientists from different domains (domain scientists) and specialists in artistic audio design and programming (sound experts) work together on auditory representations (sonification designs) for specific scientific data sets and their context. Such a venture should be well positioned to contribute to the progress of sonification as a scientific discipline. This has been the guiding strategy for the research project SonEnvir, described in some detail in section 4.1.

The thesis presented here analyses sonification design work done within the SonEnvir project.[2] From these designs, it abstracts a general model for approaching sonification design work, from the general Sonification Design Space Map to detailed models of synthesis, spatialisation, and user interaction, presented in chapter 5. This abstraction process is based on Grounded Theory (Glaser and Strauss (1967)), aiming to design flexible theoretical models that capture and explain as much detail as possible of the observation data collected. Such an integrative approach appears to be the most promising way forward for sonification as a research discipline.

Finally, it should be noted that scientists are not the only social group that is interested in the role of data for modern societies: Artists have always taken part in the general discourse in society, and in recent years, media artists as well as musicians and sound artists have become interested in creating works of art that represent data in artistically interesting ways. This aspect certainly played a role in my personal motivation for this dissertation project.

[1] Availability, traditions of scientific cultures, ease of publishing on paper, and many others.
[2] These analyses follow the notion of providing rich context, taken from Science Studies (see e.g. Latour and Woolgar (1986); Rheinberger (2006)).

1.2  Scope

While multimodal display systems are extremely interesting for data exploration, the complexity of interactions between modalities and individual differences in perception is considerable. Therefore, the research work in this thesis has been intentionally limited to audio-centric data representation; however, simple forms of visual representations and haptic interaction have been provided where it seemed appropriate and helpful. Abstract representations of data by auditory means are not at all well understood yet; thus providing collections of different approaches for discussion may well be fruitful for the community. Special importance has been given to design methodology, and to considering the human-computer interaction loop, ranging from interaction in the design process to interactive choices and control in a realtime sonification design.

Sonification designs may be intended for several different uses, with different aims. To give a few examples:

Presentation entails clear, straightforward auditory demonstration of finished results; this may be useful in conference lectures, science talks, and similar situations.

Exploration is all about interaction with the data, acquiring a feeling for one's data; this must necessarily remain informal, as it is a heuristic for generating hypotheses, which will be cross-checked and verified later with every analysis tool available.

Analysis requires well-understood, reliable tools for detecting specific phenomena, accepted by the conventions in the scientific domain they belong to.

In Pedagogy, different students may learn to understand structures/patterns in data better when presented in different modalities; the auditory approach may be more appropriate and useful for some cases, e.g. people with visual impairments.

This thesis focuses on studying the viability of exploration and analysis of scientific data by means of sonification; thus we (meaning the author and the SonEnvir team) developed exemplary cases in close collaboration with the domain scientists, implemented sonification designs for these cases, and analysed them to understand their general usefulness. We built a software framework to support the efficient realisation of these sonification designs; this is reported on in section 3.5.1, and available as open-source code.[3] The sonification design prototypes developed are also accessible online[4] and can be re-used both as concepts and as fully functional code.

Note that the SonEnvir software environment is not a complete big system, but a flexible, extensible collection of approaches, and the infrastructure needed to support them. This software environment is freely extensible by others (being open source), and it aims to shorten development times for Auditory Display design sketches, thus allowing for freely moving between discussion and fast redesign. It also supports Auditory Display design pedagogy, as well as other uses, such as artistic projects involving data-related control of sound and image processes.

[3] https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Framework/
[4] https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/

1.3  Methodology

The methodology employed in the SonEnvir project is centered on interdisciplinary collaboration - domain scientists bring current questions and related data from their research context, and learn the basic concepts of sonification and auditory perception. The questions are addressed with sonification design prototypes which are refined in iterative steps; common understanding and patience while learning is the key to eventual success. This concept was condensed into an experimental setting of the interdisciplinary work process: The Science By Ear workshop brought together international sonification experts, mostly Austrian domain scientists, and audio programming specialists to work on sonification designs in a very controlled setting, within very short time frames. This workshop has been received very favorably by the participants, and is reported on in section 4.2.

The methodology of the thesis is based on Grounded Theory[5] (Glaser and Strauss (1967), see also section 5.1): By looking at a body of sonification designs, and analysing their context, design approaches and decisions, a general, practice-based model is abstracted: the Sonification Design Space Map (SDSM). Aspects of this model that warrant further detail are given: models for synthesis approaches, spatialisation, and user/task/interaction.

The sonification designs analysed stem from the following sources:

- Work with SonEnvir domain scientists
- The Science By Ear workshop
- Submissions to the ICAD 2006 concert

[5] In sociology, Grounded Theory is used inductively to create new hypotheses from observations or data collected with few pre-assumptions; this is in contrast to formulating hypotheses a priori and testing them by experiments.

1.4  Overview of this thesis

Chapter 2, Psychoacoustics, Perception, Cognition, and Action, provides the necessary background in psychoacoustics, covering mainly the psychoacoustics and auditory cognition literature that is directly relevant to sonification design work in more detail, rather than giving a general overview of the psychoacoustics literature.

Chapter 3, Sonification Systems, provides an introduction to sonification and its history, and covers some current systems that support sonification design work. The software system implemented for the SonEnvir project is described here from a more general perspective.

Chapter 4, Sonification and Interdisciplinary Research, provides further details on the interdisciplinary nature of sonification research; here, the research design of the SonEnvir project, and two activities within it, namely the Science By Ear workshop and the ICAD 2006 Concert, are described.

Chapter 5, General Sonification Models, is the main contribution to sonification theory in this thesis. It describes a general model for sonification design work, divided into several aspects: Overall design decisions and strategies are covered by the Sonification Design Space Map (SDSM); appropriate synthesis approaches are covered in the Synthesis model; user interaction is covered in the User Interaction model; and spatial aspects of sonification design are covered in the Spatialisation model.

Chapters 6, 7, 8, and 9 present example sonification designs from the four domain sciences in SonEnvir, chapter 10 presents designs for two datasets explored in the Science By Ear workshop, and chapter 11 discusses and compares four works from the ICAD 2006 concert. This is the main practical and analytic contribution in this thesis. These chapters describe much of the body of sonification designs created within the SonEnvir project, as well as some others; this body of designs provided the background material for creating the General Sonification Models.

Chapter 12, Conclusions, positions the scope of work presented within the wider context of sonification research, and summarises the insights gained.

Chapter 2

Psychoacoustics, Perception, Cognition, and Interaction

2.1  Psychoacoustics

Psychoacoustics is a branch of psychophysics, the psychological discipline which studies the relationship between (objective) physical stimuli and their subjective perception by human beings; psychoacoustics then studies acoustic stimuli and their auditory perception. Consequently, much of its literature is mainly concerned with the physiological base of auditory perception, i.e., finding out how perception works by creating stimuli that force the auditory system into specific interpretations of what it hears. When considering the stimuli used in traditional psychoacoustics experiments as a world of sounds, this world has an extremely reduced vocabulary. Of course this reduction makes perfect sense for experiments which try to clarify how (especially lower level, more physiological) perceptual mechanisms (assumed to be hard-coded in the neural hardware) work, but the knowledge thus acquired is often only indirectly useful for sonification design work.

A number of works are considered major references for the field: For psychoacoustics in general, Psychoacoustics - Facts and Models (Zwicker and Fastl (1999)) is very comprehensive; a good introductory textbook that is also accessible for non-specialists is An Introduction to the Psychology of Hearing (Moore (2004)); Bregman thoroughly studies the organisation of auditory perception in more complex (and thus nearer to everyday life) situations in Auditory Scene Analysis (Bregman (1990)); for the spatial aspects of human hearing, the standard reference is Spatial Hearing (Blauert (1997)). The typical background of psychoacoustics research is speech, spatial hearing, and music; sonification is fundamentally different from all of these, possibly with the exception of conceptual similarity to experimental strands of electronic music.

The main concepts in these sources which are relevant for sonification research are:

Just Noticeable Differences (JNDs) for different audible properties of sounds (and consequently, the corresponding synthesis parameters) have been studied extensively; being aware of these helps to make sure that differences in synthetic sounds will be noticeable by users with normal hearing.

Masking Effects can occur when sonifications produce dense streams of events; understanding how these depend on properties of the individual events is important to avoid perceptually losing information in the soundscape created by a sonification design.

Auditory Stream Formation and its rules are essential for multiple stream sonification; here it is important to control whether streams will tend to perceptually segregate or fuse into merged percepts (a minimal demonstration follows at the end of this section).

Testing Methodology can be employed to verify that sonification users are physically able to perceive the sensory differences of interest. In effect, this entails writing auditory tests for sonification designs, such that designers can test that they can hear the differentiation they are aiming for, and that users can acquire analytical listening skills from well-controlled examples.

Cognitive and Memory Limits determine how we understand common musical structures, and in fact, much music intended to be accessible is created (unknowingly) conforming to these limits. Sonification design issues, from choices of time scalings to user interface options for quick repetitions and choosing segments to listen to, also crucially depend on these limits.

More recent research assumes the perspective of Ecological Psychoacoustics (Neuhoff (2004)), which takes into account that in daily life, hearing usually deals with complex environments of sounds, and thus allows for considering sonification designs from the perspective of ecologies of sounds that coexist. However, in a way sonification research and design work addresses a problem that is inverse to what psychoacoustics studies: rather than asking how we perceive existing worlds of sounds, the question in sonification is, how can we create a world of sounds that can communicate meaning by aggregates of finely differentiated streams of sound events?

Bob Snyder actually addresses this inverse problem (i.e., how to create worlds of sounds that can communicate meaning) directly, if for a more traditional purpose: Music and Memory (Snyder (2000)) is a textbook for teaching composition to non-musicians in a perceptually deeply informed way, in a course Snyder gives at the Art Institute of Chicago. He describes how limitations of perception and memory influence artistic choices, and explains and demonstrates these with examples from a very wide range of musical cultures and traditions, almost entirely without traditional (Western) music notation. This is intended to give musicians/composers informed free choice to stay within these limitations (and be accessible), or approach and transgress them intentionally. By covering a wide range of psychoacoustics and auditory perception literature from the perspective of art practice, and describing it in terms accessible for art students, many of whom do not have traditional musical or scientific training, Snyder has created a very useful resource for practicing sonification designers who are willing to learn more about creating perceptually informed (and artistically interesting) worlds of sounds.
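As a concrete handle on stream formation, here is a minimal SuperCollider sketch of Bregman's classic ABA_ pattern (my illustration, not taken from the cited sources; ~aba is a placeholder name, and only the built-in default instrument is used). With a small pitch distance, the three tones fuse into a single "galloping" stream; an octave apart, they segregate into two separate streams.

    (
    ~aba = { |semitones = 2, dur = 0.1|
        Pbind(
            \midinote, Pseq([72, 72 + semitones, 72, \rest], inf),  // A B A _
            \dur, dur,
            \legato, 0.5,
            \amp, 0.2
        ).play;
    };
    )
    ~aba.(2);    // 2 semitones apart: one coherent galloping stream
    ~aba.(12);   // 12 semitones apart: segregates into two streams
    // stop all sound with Cmd-. (macOS) or Ctrl-.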

2.2  Auditory perception and memory

This section is a brief summary of the first part of Music and Memory, to provide enough background for readers to follow auditory perception-related arguments made later. Figure 2.1 shows a symbolic representation of the current models of both bottom-up and top-down perceptual processes.

Figure 2.1: Some aspects of auditory memory, from Snyder (2000), p. 6. The connections shown are only a momentary configuration of the perceptual system, and will continuously change quite rapidly.

Bottom-up processes begin with sound exciting the eardrums, which gets translated into firing patterns of a large number of auditory nerves (ca. 30,000) coming from the ears. For a short time, a raw representation of the sound just heard remains in echoic memory. This raw signal is being held available for many concurrent feature extraction processes: these processes can include rather low-level aspects (which are almost certainly built into the neural hardware) like ascribing sound components coming from the same direction to the same sound source, but also higher-level aspects like a surprising harmonic modulation in a piece of music (which is certainly culturally learned).

The extracted features are then integrated into higher level percepts, often in several stages; in this process of abstraction, finer details are discarded, e.g. pitches in a musical context are categorised into a familiar tuning system, and nuances in rhythm and articulation usually also fade quickly from memory, unless one makes a special effort to retain them.

Feature extraction interacts very strongly with long term memory: personal auditory experience determines what is in long term memory, so for any listener, the extracted features will unconsciously activate related memory content, which may or may not become conscious. Note that unconsciously activated memories feed back into the feature extraction processes, potentially priming the perceptual binding that happens toward specific cultural or personal notions.

Short term memory (STM) is the only conscious part in figure 2.1: perceptual awareness of what one is hearing now, as well as the few related memories that become activated enough, are the only results of perception one becomes consciously aware of. Short term memory content can be rehearsed, and thus kept in working memory for a while, which increases its chance of being committed to long term memory eventually. On average, short term auditory memory can keep several seconds of sound around. This depends on chunking: generally, it is assumed that one can keep 7 ± 2 items in working memory at any moment; however, one can and does increase this number by forming groups of multiple items, which are then treated by memory as single (bigger) items (again with a limit of ca. 7 applying).

The longer the auditory structures one tries to keep in memory, the more this depends on abstraction, i.e. forming categories, simplifying detail, and grouping into higher level items. This imposes a limit that is relevant for sonification contexts: comparing a hard-to-categorize structural shape that only becomes recognizable over two minutes with a potentially similar two-minute episode one hears an hour later is very difficult.

Generally, while bottom-up processes (usually equated with perception) are usually assumed to be properties of the human neural system, and thus quite universal for all people with normal hearing, top-down processes (often equated with cognition) are more personal: they depend on cultural learning and are informed by individual experience, and thus can vary much more between individuals.

2.3  Cognition, action, and embodiment

A closer connection to sonification research, as well as some useful terminology, can be found in Music Cognition research: Recent work, e.g. by Marc Leman (Leman and Camurri (2006), and Leman (2006)), defines terminology that works well for describing what sonification can achieve. Leman talks of proximal and distal cues: Proximal (near) cues refer to the perceptually relevant features of auditory events, i.e. the audible properties on the surface of a sound event; by contrast, distal (further away) cues are actions inferred by the listeners that are likely to have caused the proximal cues. One example of distal actions would be a musician's physical actions; and a little further away, a performer's likely intentions behind her actions would also be considered distal cues.

In recent years, Cognition research has widely moved away from the traditionally abstract notion of cognitive (meaning only dealing with symbols, and thus easy to model by computation); today the idea is widely accepted that cognition is deeply intertwined with the body, resulting in the concept of Embodied Cognition (see e.g. Anderson (2003)). Applying this idea to auditory cognition, Leman says that the perception of gesture in music involves the whole body (of the performer and the listener). Music listeners who engage with listening may spontaneously express this by moving along with the music; when asked in experimental settings to make movements that correspond to the music they are listening to, even musically untrained listeners can be remarkably good at imitating performer gestures.

Appropriating this terminology and applying it to sonification, one can describe sonification elegantly in these terms: sound design decisions inform details of the created streams of sound, i.e. they determine the proximal cues; ideally, these design decisions lead to perceptual entities (auditory gestalts), which can create a sensation of plausible distal cues behind the proximal cues. In case of success, these distal cues, which arise within the listener's perception, create an implied sense in the sounds presented (which could be called the sonificate); thus these distal cues are likely to be closely related to data meaning (the equivalent to performers' gestures, which are commonly taken to correspond closely to their intentions).

In reflecting on his research on design of experimental electronic music instruments, David Wessel argues that the equivalent of the babbling phase (of small infants) is really essential for electronic music instruments: free-form, purpose-free interaction with the full possibility space of an instrument allows for more efficient and meaningful learning of what the instrument is capable of, just like children learn the phonetic possibilities of their vocal tract by (seemingly random) exploration (Wessel (2006)). He cites a classic experiment by Held and Hein, where two kittens acquire visual perception skills in very different ways: one kitten can move about the space, while the other kitten gets the same visual stimuli, but does not make its own choices of where to move - instead, it has the moving kitten's choices imposed on it. This second kitten sustained considerable perceptual impairments.

Wessel argues that the role of sensory-motor engagement is essential in auditory learning, but not well understood yet; he suggests designing electro-acoustic musical instruments such that they allow for the described forms of interaction by providing control intimacy, in short low-latency, high-resolution, multichannel control data from performer gestures. This strategy should create a long term chance of arriving at the equivalent of virtuosity on (or at least mastery of) that instrument. Transposed to the context of sonification for scientific data, this is in full agreement with an Embodied Cognition perspective, and is another strong argument for allowing as much user interaction with sonification tools as possible: from haptic interfaces used e.g. for dynamic selection of data subsets, to access for tuning sound design parameters, to fully accessible code that defines how a particular sonification design operates.

2.4  Perception, perceptualisation and interaction

Perception of the physical world is intuitively non-modal and unified: events in the world are synchronous, so sensory input from different modalities is too.[1] Many multimodal data exploration projects use virtual environments so that they can provide integrated visual, auditory and haptic modes for perception and interaction. The argument that learning is strongly dependent on sensory-motor involvement has found its way into HCI research literature; here, the common term is closing the human-computer interaction loop (see e.g. Dix (1996)).

[1] One interesting exception here is far-away events that are both visible and audible; the puzzling difference between the speeds of sound and light has led to the first measurements of the speed of sound.

In the context of sonification research, this has led to a special conference series, the Interactive Sonification workshops (ISon),[2] so far held at Bielefeld (2004) and York (2007). In a special issue of IEEE Multimedia resulting from ISon 2004, the editors emphasize that learning how to play a sonification design with physical actions, in fact similar to a musical instrument, really helps for an enactive understanding of both the nature of the perceptualisation processes involved and of the data under study (Hermann and Hunt (2005)). They find that there is a lack of research on how learning in interactive contexts takes place; obviously this applies equally to interactive visual display applications.

[2] http://interactive-sonification.org/

2.5  Mapping, mixing and matching metaphors

Mapping data dimensions to representation parameters always involves choices. Walker and Kramer (1996) report interesting experiments on this topic: They play through a number of different permutations of mappings of the same data to the same set of display parameters, rated by the designers as intuitive, okay, bad, and random, and they test how well users accomplish defined tasks with them. Expert assumptions turned out to be not as accurate as expected; users could learn quite arbitrary mappings nearly as well as supposedly more natural ones.[3]

[3] This paper was republished in a recent special issue of IEEE Multimedia on Sonification, with a new commentary (Walker and Kramer (2005a,b)).

Whether this also holds true for exploratory contexts, when there is no pre-defined goal to be achieved, is an open question. Here, performance in an easy-to-measure (but trivial) task is not a very interesting criterion for sonification designs. On the other hand, it is of course good design to reduce cognitive load while users are involved in data exploration (by using cognitively simple mappings). For visualisation systems designed for exploration, the idea of measuring insight and the number of hypotheses formed in the exploration process has been suggested recently (Saraiya et al. (2005)); as far as we know this evaluation strategy has not been applied to exploratory sonification yet.

In de Campo et al. (2004), we make the case that the impression of perceiving the sources of representations (in Leman's terms, the distal cues) becomes easier when the metaphorical distance between the data dimension and the audible representation appears smaller; i.e., when a reasonably similar concept in the world of sound was found for the data property to be communicated. For example, almost all time-series data can be treated as if they were acoustic waveforms, which is what audification essentially does. With more complex data, the option of accessing data subsets by interactive choice, browsing through the data space with different auditory perspectives, can potentially allow forming new hypotheses on the data.
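As a minimal illustration of the audification idea just described (a SuperCollider sketch under assumed names: ~data and ~buf are placeholders, and a logistic map merely stands in for a measured time series): the data are loaded into a buffer and replayed as a waveform, so the playback rate directly sets the time compression.

    (
    // synthetic stand-in for a measured time series, scaled to -1..1
    var x = 0.5;
    ~data = Array.fill(20000, { x = 3.7 * x * (1 - x); x * 2 - 1 });
    ~buf = Buffer.loadCollection(s, ~data);
    )

    // play the series as an audio waveform; rate > 1 compresses time
    // (as in speeded-up seismogram playback), rate < 1 stretches it
    { PlayBuf.ar(1, ~buf, rate: 1, loop: 1) * 0.25 ! 2 }.play;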

Chapter 3

Sonification Systems
In a certain Chinese Encyclopedia, the Celestial Emporium of Benevolent Knowledge, it is written that animals are divided into: (a) Those that belong to the Emperor, (b) embalmed ones, (c) those that are trained, (d) suckling pigs, (e) mermaids, (f) fabulous ones, (g) stray dogs, (h) those included in the present classification, (i) those that tremble as if they were mad, (j) innumerable ones, (k) those drawn with a very fine camel-hair brush, (l) others, (m) those that have just broken a flower vase, (n) those that from a long way off look like flies.

    Jorge Luis Borges - The Analytical Language of John Wilkins (Borges (1980))

Perceptualisation of scientific data by visualisation has been extremely successful. It is by now completely established scientific practice, and a wide variety of visualisation tools exist for a wide range of applications. Given the different set of perceptual strengths of audition compared to vision, sonification has long been considered to have similar potential as an exploratory tool for scientists, complementary to visualisation and statistics.

One strategy to realize more of this potential of sonification is to create a general software environment that supports fast development of sonification designs for a wide range of scientific applications, a design process in close interaction with scientific users, and simple exchange of fully functional sonification designs. This is the central idea of the SonEnvir project, as described in detail (in advance of the project itself) in de Campo et al. (2004).

There are a number of software packages for sonification and auditory display (Ben-Tal et al. (2002); Pauletto and Hunt (2004); Walker and Cothran (2003), and others), all of which make different choices: whether they are to be used as toolkits to integrate into applications, or whether they are full applications already; which data formats or realtime input modalities are supported; what sonification models are assumed (sometimes implicitly); and what kinds of interaction modes are possible and provided.

This chapter provides a very short overview of the history of sonification, and describes the most common uses of sonification. Then, some historical and current sonification toolkits and environments are described, and the main types of audio and music programming environments. Finally, the system developed for the present thesis is described.

3.1  Background

3.1.1  A short history of sonification

The prehistory and early history of sonification is covered very interestingly (within a very good general overview) in Gregory Kramer's Introduction to Auditory Display (Kramer (1994a)). Employing auditory perception for scientific research was not always as unusual as it is considered in today's visually dominated scientific cultures; in fact, sonification can be said to have had a number of precursors:

In medicine, the practice of auscultation, i.e., listening to the body's internal sounds for diagnostic purposes, seems to have been present in Hippocrates' time (McKusick et al. (1957)). This was long before Laennec, who is usually credited with the invention of the stethoscope in 1819.

In engineering, mechanics tend to be very good at hearing which parts of a machine they are familiar with are not functioning well; just consider how much good car mechanics can tell just from listening to a running engine.

Moving on to technically mediated acoustic means of measurement, there is evidence that Galileo Galilei employed listening for scientific purposes: Following Stillman Drake's biography of Galilei (Drake (1980)), it seems plausible that Galilei used auditory information to verify the quadratic law of falling bodies (see figure 3.1). By running strings across the plane at distances increasing according to the quadratic law (1, 4, 9, 16, etc.), the ball running down the plane would ring the bells attached to the strings in a regular rhythm. In a reconstruction of the experiment, Riess et al. (2005) found that time measuring devices of the 17th century were likely too imprecise, while listening for rhythmic precision works well and is thus more plausible to have been used.

An early example of a technical device rendering an environment variable perceptible which humans do not naturally perceive is the Geiger-Müller counter: Incidence of a particle generated by radioactive decay on the detector causes an audible click; the density of the irregular sequence of such clicks informs users instantly about changes in radiation levels.
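The Geiger counter principle - event density as the display variable - is easily sketched in sound; here is a one-line SuperCollider analogue (my illustration, with mouse position standing in for the radiation level):

    // irregular clicks; the average density (0.5..500 per second, exponential
    // mouse mapping) is instantly perceptible as a "radiation level"
    { Dust.ar(MouseX.kr(0.5, 500, 1)) * 0.3 ! 2 }.play;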


Figure 3.1: Inclined plane for Galilei's experiments on the law of falling bodies. This device was rebuilt at the Istituto e Museo di Storia della Scienza in Florence. © Photo Franca Principe, IMSS, Florence.

Sonar is another interesting case to consider: Passive Sonar, where one listens to underwater sound to determine distances and directions of ships, has apparently been experimented with by Leonardo da Vinci (Urick (1967), cited in Kramer (1994a)); in Active Sonar, sound pulses are projected in order to penetrate visually opaque volumes of water, listening to reflections to understand local topography, as well as moving objects of interest, be they vessels, whales, or fish swarms.

In seismology, Speeth (1961) had subjects try to differentiate between seismograms of natural earthquakes and artificial explosions by playing them back speeded up. While subjects could classify the data very successfully, and rapidly (thanks to the speedup), little use was made of this until Hayward (1994) and later Dombois (2001) revived the practice and the discussion.
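As a minimal illustration (not from the studies cited), this playback-with-speedup approach can be sketched in SuperCollider3, the platform used later in this thesis; the file path and the speedup factor of 100 are placeholder assumptions:

// Minimal audification sketch: play a seismogram as sound, sped up.
s.boot;   // start the audio server first

// once the server is running, load the recording (assumed to be a
// single-channel sound file):
b = Buffer.read(s, "~/data/seismogram.aiff".standardizePath);

// play back 100 times faster than realtime, so that minutes of
// seismic data collapse into seconds of audible rhythm and texture
{ PlayBuf.ar(1, b, rate: 100, doneAction: 2) }.play;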

An interesting case of auditory proof of a long-standing hypothesis was reported in Pereverzev et al. (1997): In the early 1960s, Josephson and Feynman had predicted quantum oscillations between weakly coupled reservoirs of superfluid helium; 30 years later, the effect was verified by listening to an amplified vibration sensor signal of these mass-current oscillations (see also chapter 7).

One can say that the history of sonification research officially began with the first International Conference on Auditory Display (ICAD) in 1992, organised by Gregory Kramer to bring all the researchers working on related topics, but largely unaware of each other, into one research community. The extended book version of the conference proceedings (Kramer (1994b)) is considered the main founding document of this research domain, and the yearly ICAD conferences are still the central event for researchers, generating much of the body of sonification research literature. In 1997, the ICAD board wrote a report for the NSF (National Science Foundation) on the state of the art in sonification (http://icad.org/node/400); and more recently, a collection of seminal papers mostly presented at ICADs between 1992 and 2004 appeared as a special issue of ACM Transactions on Applied Perception (TAP, ACM (2004)), which shows the range and quality of related research.

Many interesting applications of sonification for specific purposes have been made: Fitch and Kramer (1994) showed that an auditory display of medical patients' life signs can be superior to visual displays; Gaver et al. (1991) found that monitoring a virtual factory (ArKola) by acoustic means works remarkably well for keeping it operating smoothly. The connection between neural signals and audition has its own fascinating history, from early neurophysiologists like Wedensky (1883) listening to nerve signals by telephone, to current EEG sonifications like Baier et al. (2007); Hermann et al. (2006); Hinterberger and Baier (2005); as well as musicians' fascination with brainwaves, beginning with Alvin Lucier's Music for Solo Performer (1965), among many others. (See also the ICAD concert 2004, described in section 4.3.)

The idea of listening for scientific insight keeps being rediscovered by researchers even if they seem to be unaware of sonification research; e.g., what James Gimzewski calls Sonocytology (Pelling et al. (2004), see also http://en.wikipedia.org/wiki/James_Gimzewski) is (in auditory display terminology) a form of audification of signals recorded with an atomic force microscope used as a vibration sensor. There are also current uses in astronomy at NASA (Candey et al. (2006)), where one of the motivations given is providing better data accessibility for visually impaired scientists; and at the University of Iowa (http://www-pw.physics.uiowa.edu/space-audio/), mainly dealing with electromagnetic signals.

Nevertheless, a large number of scientists still appear quite surprised when they hear of the idea of employing sound to understand scientific data.

3.1.2 A taxonomy of intended sonification uses

Sonification designs may be intended for a wide range of different uses, with substantially different aims (note that the categories separated here may overlap; e.g. presentation and pedagogy certainly do):

Presentation calls for clear, straightforward, auditory demonstration of finished results; this may be useful in conference lectures, science talks, teaching contexts, and other situations.

Exploration is very much about interaction with the data, acquiring a feeling for the data; while this seems a rather fuzzy target, and is in fact hard to measure, it is actually indispensable and central. Following Rheinberger (2006), exploration must necessarily remain informal; it is a heuristic for generating hypotheses - once they appear on the epistemic horizon, they will be cross-checked and verified with every analysis tool available. So generating some hypotheses that eventually turn out to be wrong is not a problem at all; in the worst case, if too many hypotheses are wrong, this can be an efficiency issue.

Analysis requires well-understood, reliable tools for detecting specific phenomena, which are accepted by the conventions in the scientific domain they belong to. The practice of auscultation in medicine may be considered to belong into this category, even though it relies on physical means only, with no electronic mediation. The informal practice of listening to seismic recordings also belongs here.

Monitoring is intended for a variety of processes that benefit from continuous monitoring by human observers, whether in industrial production, in medical contexts like intensive care units, or in scientific experiments. Human auditory perception habituates quickly to soundscapes with little change; any sudden changes in the soundscape, even of an unexpected nature, are easily noticed, and enable the observer to intervene if necessary.

Pedagogy - Different students may learn to understand structures/patterns in data better when presented in different modalities; an auditory approach to presentation may be more appropriate and useful in some cases. For example, students with visual impairments may benefit from data representations with sound, as research on auditory graphs shows (e.g. Harrar and Stockman (2007); Stockman et al. (2005)).

Artistic Uses - Many works in sound art are sonification-based, whether they are sound-only installations, or more generally, data-driven multimedia works. The recent appearance of special topics issues like Leonardo Music Journal, Volume 16 (2006) confirms this trend, as do sonification research activities at art institutions like the Bern University of Arts (see http://www.hkb.bfh.ch/y.html).

The intended uses a specific sonification system has been designed for largely determine the scope of its functionality, and its usefulness for different contexts.

3.2 Sonification toolkits, frameworks, applications

A number of sonification systems have been implemented and described since the 1980s. They all differ in scope of features and limitations; some are historic, meaning they run on operating systems that are now obsolete, while others are in current use, and thus alive and well; most of them are toolkits meant for integration into (usually visualisation) applications. Few are really open and easily extensible; some are specialised for very particular types of datasets. Current systems are given more space here, as they are more interesting to compare with the system developed for this thesis.

3.2.1 Historic systems

The Porsonify toolkit (Madhyastha (1992)) was developed at a time when realtime synthesis was still out of reach on affordable computers; thus Porsonify aimed to provide an interface to the Sun Sparc's audio device and two MIDI synthesizers. Behaviour defined for a single sound event (usually triggered from a single data point) is formulated in sonic widgets, which generate control commands for the respective sound device. Example sonifications were created using data comparing living conditions of different U.S. cities (cf. the accompanying CD to Kramer (1994b)), and multi-processor performance data.

The LISTEN toolkit (Wilson and Lodha (1996)) was written for SGI workstations, using (alternatively) the internal sound chip or external MIDI for sound rendering; it was meant to be easy to integrate into existing visualisation software, which was done for visualising geometric uncertainty of surface interpolants, and for algorithmic uncertainty in fluid flow.

The Musical Data Sonification Toolkit, or MUSE (Lodha et al. (1997)), was a follow-up project, aiming to map scientific data to musical sound. Also written for SGI, it uses mappings to very traditional musical notions: timbres are traditional orchestra instruments and vowel sounds generated with CSound instruments, rhythms come from a choice of

seven dance rhythms, pitch is defined from the major scale, following rules for melodic shapes, and harmony is based on overtone ratios. It has been applied to visualize (sic) uncertainty in isosurfaces and volumetric data. A later incarnation, MUSART (Musical Audio transfer function Real-time Toolkit, see Joseph and Lodha (2002)), sonifies data by means of musical sound maps. It converts data dimensions into audio transfer functions, and renders these with CSound instruments. Users can personalise their auditory displays by choosing which data dimensions to map to which display parameters. In the article cited, the authors report uses for exploring seismic volumes for the oil industry. Again, the authors emphasize their use of musical concepts for sonification design.

While not a single software system, Auditory Information Design by Stephen Barrass (Barrass (1997)) is a fascinating collection of multiple concepts (all with catchy names): it encompasses a task-data analysis method (TaDa), a collection of use cases for finding auditory metaphors for design (ear-benders), a set of design principles (Hearsay), a perceptually linearised information sound space (GreyMUMS), and tools for designing sonifications (Personify). The practical implementations described show a wide variety of approaches; they all share unix flavor, often being shell scripts that connect command-line programs. Thus it is not one consistent framework, but rather a collection of how-to examples. For data treatment, mostly perl scripts are used; for sound synthesis, CSound, which at the time was non-realtime. Some examples also appeared in the CSound book (Boulanger (2000)) mentioned below.

3.2.2 Current systems

xSonify (Candey et al. (2006)) has been developed at NASA; it is based on Java, and runs as a web service (http://spdf.gsfc.nasa.gov/research/sonication). It aims at making space physics data more easily accessible to visually impaired people. Considering that it requires data to be in a special format, and that it only features rather simplistic sonification approaches (here called modi), it will likely only be used to play back NASA-prepared data and sonification designs.

SonART (Ben-Tal et al. (2002); Yeo et al. (2004)) is a framework for data sonification, visualisation and networked multimedia applications. In its latest incarnation, it is intended to be cross-platform and uses OpenSoundControl for communication between (potentially distributed) processes for synthesis, visualisation, and user interfaces.

The Sonification Sandbox (Walker and Cothran (2003)) has intentionally limited range, but it covers that range well: Being written in Java, it is cross-platform; it generates MIDI output, e.g. to any General MIDI synth (such as the internal synth on many soundcards). One can import data from CSV text files, and view these as visual graphs; a mapping editor lets users choose which data dimension to map to which sound parameter: Timbre

(musical instruments), pitch (chromatic by default), amplitude, and (stereo) panning. One can select to hear an auditory reference grid (clicks) as context. It is very useful for learning basic concepts of parameter mapping sonification with simple data, and it may be sufficient for many auditory graph applications. Development is still continuing, as the release of version 4 (and later small updates) in 2007 shows.

The Sonification Integrable Flexible Toolkit (SIFT, see Bruce and Palmer (2005)) is again a toolkit for integration into other applications, typically for visualisation. While it is also written in Java and uses MIDI for sound rendering, it emphasizes realtime data input support from network sources. It has been used for oceanographic data sets; however, the paper cited describes the first prototype of this system, and no later versions of it seem to have been developed.

Sandra Pauletto's toolkit for Sonification (Pauletto and Hunt (2004)) is based on PureData (see section 3.3 below), and has been used for several application domains: electromyography data for physiotherapy (Hunt and Pauletto (2006)), helicopter flight data, and others. While it supports some data types well, adapting it for new data is rather cumbersome, mainly because PureData is not a general-purpose programming language.

SoniPy is a very recent and quite ambitious project, written in the Python language, and described in Worrall et al. (2007). It is still in the early stages of development at this time, but may well become interesting. Being an open source project, it is hosted at sourceforge (http://sourceforge.net/projects/sonipy); at the beginning of this thesis, it did not exist yet.

All these toolkits and applications are limited in different ways, based on the resources for development available to their creators, and the applications envisioned for them. For the broad parallel approach we had in mind, and the flexibility required for it, none of these systems seemed entirely suitable, so we chose to build on a platform that is both a very efficient realtime performance system for music and audio processing and a full-featured modern programming language: SuperCollider3 (McCartney (2007)). To provide some more background, here is an overview of the three main families of music programming environments.

3.3 Music and sound programming environments

Computer Music has been dealing with programming to create sound and music structures and processes for over fifty years now; current music and sound programming environments offer many features that are directly useful for sonification purposes as well. Mainly, three big families of programs have evolved; most other music programming

systems are conceptually similar to one of them:

Offline synthesis - MusicN to CSound

MusicN languages started in 1957/58, with the Music I program developed at Bell Labs by Max Mathews and others; Music IV (Mathews and Miller (1963)) already featured many central concepts of computer music languages, such as the idea of a Unit Generator as the building block for audio processes (unit generators can be e.g. oscillators, noises, filters, delay lines, and envelopes). As the first widely used incarnation, Music V, was written in FORTRAN and thus relatively easy to port to new computer architectures, it spawned a large number of descendants.

The main strand of successors in this family is CSound, developed at the MIT Media Lab beginning in 1985 (Vercoe (1986)), which has been very popular in academic computer music. Its main approach is to use very reduced language dialects for orchestra files (consisting of descriptions of DSP processes called instruments), and score files (descriptions of sequences of events that each call one specific instrument with specific parameters at specific times). A large number of programs were developed as compositional frontends, to write score files based on algorithmic procedures, such as Cecilia (Piché and Burton (1998)), Cmix, Common Lisp Music, and others; so CSound has in fact created an ecosystem of surrounding software.

CSound has a very wide range of unit generators and thus synthesis possibilities, and a strong community; e.g. the CSound Book (Boulanger (2000)) demonstrates its scope impressively. However, for sonification, it has a few disadvantages: Even though it is text-based, it uses specialised dialects for music, and thus is not a full-featured programming language. Any control logic and domain-specific logic would have to be built in other languages/applications, while CSound could provide a sound synthesis back-end. Being originally designed for offline rendering, and not built for high-performance realtime demands, it is not an ideal choice for realtime synthesis either. CSound has been ported to very many platforms.

Graphical patching - Max/FTS to Max/MSP(/Jitter) to PD/GEM

The second big family of music software began with Miller Puckette's work at IRCAM on Max/FTS in the mid-1980s, which later evolved into Opcode Max, which eventually became Cycling74's Max/MSP/Jitter environment. In the mid-1990s, Puckette began developing an open source program called PureData (Pd), later extended with a graphics system called GEM. All these programs share a metaphor of patching cables, with essentially static object allocation of both DSP and control graphs. This approach was never meant to be a full programming language, but a simple facility

to allow for patching together multiple DSP processes written in lower-level (and thus more efficient) languages; with Max/FTS, the programs actually ran on a DSP card built by IRCAM. Thus, the usual procedure for realising more complex ideas often entails writing new Max or Pd objects in C; while these can run very efficiently if well written, special expertise is required, and the development process is rather slow. In terms of sound synthesis, Max/MSP has a much more limited palette than CSound, though a range of user-written MSP objects exist; support for graphics with Jitter has become very good recently. Both Max and Pd have a strong (and somewhat overlapping) user base; Pd's is somewhat smaller, as Pd started later than Max. While Max is commercial software with professional support by a company, Pd is open-source software. Max runs on Mac OS X and Windows, but not on linux, while Pd runs best on linux, reasonably well on Windows, and less smoothly on OS X.

Realtime text-based environments - SuperCollider, ChucK

The SuperCollider language and realtime system came from the idea of having both realtime synthesis and musical structure generation in one environment, using the same language. Like Max/Pd, it can be said to be an indirect descendant of CSound. Starting from SC1, written by James McCartney in 1996, it has gone through three complete rewriting cycles, so the current version SC3 is a very mature system. In version 2, SC2, it inherited much of its language characteristics from Smalltalk; in SC3 the language and the synthesis engine were split into a client/server architecture, and many syntax features from other languages were adopted as options. Its sound synthesis is fully dynamic like CSound's, it has been written for realtime use with scientific precision, and being a text-based, modern, elegant, full programming language, it is a very flexible environment for very many uses, including sonification. The range of unit generators is quite wide, though not as abundant as in CSound; synthesis in SC3 is very efficient. SC3 also provides a GUI system with a variety of interface widgets, but its main emphasis is on stable realtime synthesis. SC3 has a somewhat smaller user community, which is nevertheless quite active. Having become open source with version 3, it has since flourished in terms of development activity. SC3 runs very well on OS X, pretty well on Linux, and less well on Windows (though the SonEnvir team put some effort into improving the Windows port).

The ChucK language has been written by Ge Wang and Perry Cook, starting in 2002. It is still under development, exploring specific notions such as being strongly-timed, and others. Like SC3, it is not really intended as a general purpose language, but as a music-specific environment. While being cross-platform, and having interfacing options similar to SC3 and Max, it has a considerably smaller palette of unit generator choices. One possible advantage of ChucK is that it has very fine-grained control over time; both synthesis and control can have single-sample precision.
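As an aside, the one-environment idea behind SuperCollider3 - synthesis definitions and algorithmic control structure in the same language - can be illustrated with a minimal sketch; the synth and the sequence are arbitrary examples, not project code:

// define a simple percussive synth
(
SynthDef(\blip, { |freq = 440, amp = 0.1|
    var sig = SinOsc.ar(freq) * EnvGen.kr(Env.perc(0.01, 0.3), doneAction: 2);
    Out.ar(0, sig * amp ! 2);
}).add;
)

// an algorithmic sequence, written in the same language
(
Routine {
    20.do { |i|
        Synth(\blip, [\freq, 300 + (i * 50)]);
        0.2.wait;
    };
}.play;
)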


3.4 Design of a new system

As the existing systems did not have the scope we required, we designed our own. A full description of the design of the Sonification Environment as it stood before the SonEnvir project started is given in de Campo et al. (2004); the following section is updated from a post-project perspective.

3.4.1 Requirements of an ideal sonification environment

The main design aim is to allow fluid development of new sonification designs, and modification of existing ones. By using a modular software design which decouples components like basic data handling objects, data processing, sound synthesis processes, mappings used, playback approaches, and real-time interaction possibilities, all the individual aspects of one sonification design can be re-used as starting points for new designs. A Sonification Environment should:

Read data files in various formats. The minimum is human-readable text files for small data sets, and binary data files for fast handling of large data sets. Reading routines for special file formats should be writable quickly. Realtime input from network sources should also be supported.

Perform basic statistics on the data for user orientation. This includes (for every data channel): minimum, maximum, average, standard deviation, and simple histograms. This functionality should be user-extensible in a straightforward way.

Provide basic playback facilities like ordered iteration (in effect, a play button with a speed control; a minimal sketch follows after this list), loop playback of user-chosen segments, zooming while playing, data-controlled playback timing, and 2D and 3D navigation along user-chosen data dimensions. Later on, navigation along data-derived dimensions such as lower-dimensional projections of the data space is also desirable.

Provide a rich choice of interaction possibilities: Graphical user interfaces, MIDI controllers, graphics tablets, other human interaction devices, and tracking data should be supported. (The central importance of interaction only became clear in the course of the project.)

Provide a variety of possible synthesis approaches, and allow for changing and refining them while playing. (The initial design suggested a more static library of synthesis processes, which turned out to be unnecessary.)

Allow for programming domain-specific models to run and generate data to sonify. This strongly suggests a full modern programming language. (This requirement only came up in the course of the project, for the physics sonifications.)

Store sonification designs in human-readable text format: This allows for long-term platform independence of designs, provides possibilities for informal rapid exchange (text is easy to send by e-mail), and can be an appropriate and useful publication format for sonification designs that employ user interaction.

Serve to build a library/database of high-quality sonification designs made in this environment, with real research data coming from a diverse range of scientific fields, developed in close collaboration with experts from these domains.

More generally, the implementation should be kept as lightweight, open, and flexible as possible to accommodate evolving new understanding of the design issues involved.
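As a concrete illustration of the ordered-iteration requirement above, the following minimal SuperCollider3 sketch (placeholder data and mapping, not project code) steps through a data channel in order, one short sound event per point, with a rate variable that can be reassigned while it plays:

(
~data = Array.fill(200, { |i| sin(i * 0.1) });  // placeholder data channel
~rate = 10;                                     // events per second, changeable while playing

Routine {
    ~data.do { |val|
        // one short sound event per data point, value mapped to pitch
        {
            SinOsc.ar(val.linexp(-1.0, 1.0, 200, 2000))
            * EnvGen.kr(Env.perc(0.01, 0.1), doneAction: 2) * 0.1
        }.play;
        (1 / ~rate).wait;
    };
}.play;
)

~rate = 25;  // evaluate while playing to speed up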

3.4.2 Platform choice

While PureData was a platform option for a while, we soon decided to stay entirely within SuperCollider3, based on the list of requirements given above. This decision had some benefits, as well as some drawbacks. The benefits we experienced were:

A fully open source programming language is easy to extend in ways that are useful for a wider community;

Interpreted languages like SC3 provide a relatively simple entry into programming for users (starting with little scripts, and changing details for experimentation);

Readability has turned out to be very useful, as the code script is also a full technical documentation;

An interactive development environment encourages code literacy, and thus general competence, of sonification users.

In this context, the notion of Just In Time Programming (as described e.g. in Rohrhuber et al. (2005)) has turned out to be extremely useful for interdisciplinary team development sessions, see chapter 4.

The main drawback we encountered was that SC3 only runs really well on OS X, a bit more uncomfortably on linux (which was not used by any of the team members), while on Windows (which we had to support), it was initially quite unusable; this led to SonEnvir taking care of substantially improving the Windows port.
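The core of Just In Time Programming mentioned above is that running sound processes can be redefined on the fly. A minimal sketch using JITLib's Ndef (the sound content is an arbitrary example):

// start a sound process
Ndef(\son, { SinOsc.ar(440, 0, 0.1) }).play;

// later, replace its definition on the fly;
// Ndef crossfades smoothly to the new version without stopping
Ndef(\son, { LFSaw.ar(220, 0, 0.05) + SinOsc.ar(330, 0, 0.05) });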

3.5 SonEnvir software - Overall scope

The main goal of the SonEnvir sonification framework is to allow for the creation of meaningful and effective sonifications more easily. Such a sonification environment supports sonification designers by providing software components, and concepts for using them. It combines all the important aspects that need to be considered: data representation, interaction, mapping and rendering. A famous phrase about computer music programming systems is that they are kitchens, not restaurants, which also applies to SonEnvir: rather than giving users a menu of finished dishes (which other people created) to choose from, it provides ingredients, utensils, recipes and examples.

3.5.1 Software framework

SuperCollider3 has a very elegant extension system; one can assemble components to be published in different ways: Classes, their respective Help files, UnitGenerator plugins, and all kinds of support files can be combined into packages which can be downloaded, installed, and de-installed directly from within SC3. Such packages are called Quarks.

Currently, most of the code created in the project is under version control with Subversion at the SonEnvir website (https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/). In order to achieve maximum reuse, some parts have been converted into Quarks, while for others, this is still in process. Many items of general usefulness have already been migrated directly into the main SC3 distribution. The sonification-specific components will remain available at the SonEnvir website, as will the collection of sonification designs. (For an overview, see the end of this section.)

The subsequent sections briefly describe the overall structure of the framework and the design and implementation of the data representation module. For reference, the framework structure in the subversion repository is described in appendix A.
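Returning to Quarks for a moment: package management happens from within the language itself. A typical session might look like this (an illustrative sketch; method names as in current SC3 versions):

Quarks.gui;                // browse the list of available quarks
Quarks.install("AmbIEM");  // install a package by name, here the AmbIEM quark mentioned below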

3.5.2 Framework structure

The SonEnvir framework implements a generic sonification model consisting of four aspects:

Data model - The data model unifies the notions of how data are handled in the framework, and deals with the diversity of data types that can be used for sonification.

User-Interaction model - This aspect deals with the interactive model for exploration and analysis of data. It is mainly implemented in the JInT package (see below).

Synthesis model - The mapping onto properties of sound, or the creation of more complex structures of sound by a sonification model. As all the needed code infrastructure existed in the JITLib library within SC3, it is not coded as classes, but only as a conceptual model, described in section 5.3, Synthesis Models.

Spatialisation model - This model takes care of the audio rendering of the designed sonification for different requirements and playback environments. It is described in detail in section 5.5, Spatialisation Model. Its code components reside partially in SC3 itself, in the Framework/Rendering folder, and in the AmbIEM package (a port of a subset of a system by Musil et al. (2005) and Noisternig et al. (2003)), which is now a SuperCollider quark package.

All these models taken together allow for designing sonifications in a flexible way. As the data model is the most implementation-related aspect, it is described in detail here, and not in the more conceptual chapter on the general models (chapter 5).

3.5.3 The Data model

The aim of the data model is to provide a unified representation of the different types of data that can be used in the sonification framework. This demands a highly flexible and abstract model, as data may have very different structures. The data model also provides functionality for input/output in the original form the data are supplied in, and includes various statistical functions for data analysis. All models are object-oriented in design, and the classes and their inter-relations are described using UML (Unified Modelling Language) charts. In order to avoid possible name-space conflicts with other class definitions on any target platform, the classes in the SonEnvir framework have the prefix SE. Figure 3.2 illustrates the design of the data model in a UML graph.

The SEData class is central to the design of the data model. It is the highest abstraction of any kind of dataset to be sonified. Besides providing properties for the name and the data source, the actual data is organised in channels. An SEData object contains instances of SEDataChannel, which is the base class for all different types of data channels and represents a single dimension in a dataset. Data channels can be numerical data, but also any sort of nominal data, with the only restriction that they are organised as a sequence and addressable by index.

SENumDataChan specifies that the data values in the given channel are all numbers, and provides a basic set of numerical properties of this set of numbers. Besides the usual minimum, maximum, mean, and standard deviation values, it also implements functions that proved to be useful for sonifications, such as removing offsets or a drift, as well as normalising and whitening the numbers. Another important subclass of numeric data channel covers all time-based data channels. These basically refer to two types: time series (SETimeSeriesCh) providing a sample rate, and data with time stamps (SETimeStampsCh).

Figure 3.2: UML diagram of the data model.

Although basically a numeric data channel as well, we decided to introduce another basic type for vector-based data, with a subclass for 3D spatial data. Any of the data channel types mentioned above may be combined in order to form a dataset described through SEData. For convenience, there are two predefined classes derived from SEData that cover some common combinations of data channels: SETimeData and SESpatialData.

Every SEData instance is associated with an SEDataSource. This class abstracts the

access to the raw data material. It takes care that the space required for big datasets is made available when needed, and uses different parsers for reading different file formats. If needed, it can be extended to include network resources and real-time data. Each SEDataSource also provides information about the type of each data series that is contained in the raw data. This might be available from headers of some data formats, or it has to be set explicitly such that SEData can create the appropriate SEDataChannels.

Like the entire framework, the data model is provided as a class library for SuperCollider3. Once the library is brought into place, it is compiled at startup of the SuperCollider3 language. The following listing illustrates using SEData objects in SC3:

// Example listing of data model usage in SC3.
(
// read an ascii data file
~vectors = FileReader.readInterpret("~/data/C179_T_s.dat", true, true);

// supply data channel names by hand
~chanNames = [\temperature, \solvent, \specificHeat, \marker];

// make an SEData object
~phaseData = SEData.fromVect(
    \phaseData,
    ~chanNames,
    ~vectors,
    SENumDataChan    // all numerical data, so use SENumDataChan class.
);

// provide simple statistics
~phaseData.analyse;
~phaseData.means.postln;
~phaseData.stdDevs.postln;
)

Chapter 4

Project Background
A physicist, a chemist, and a computer scientist try to go up a hill in an ancient car. The car crawls, stutters, and then stalls. The physicist says, "The transmission ratio is wrong - I'll take a look at it."; the chemist says, "No, the fuel mix is wrong, I'll experiment with it."; the computer scientist says, "Why don't we all get out, close the doors, get back in, and try again."

This chapter describes the research design for, and the working methodology developed within, the SonEnvir project; the design and the process of the workshop Science By Ear the project team held in March 2006; and the concert the team organised for the ICAD 2006 conference in London. As most of the work presented in this dissertation was done within the context of the SonEnvir project, it is helpful to provide some background on that context here.

4.1 The SonEnvir project

The central concept of the SonEnvir project was to create an interdisciplinary setting in which scientists from different domains and sonification researchers could learn how to work on data perceptualisation by auditory means. The project took place from January 2005 to March 2007, and it was the first collaboration of all four universities in Graz. SonEnvir was funded by the Future Funds of the Province of Styria.

4.1.1 Partner institutions and people

The project brought together the following institutions as partners: the Institute of Electronic Music and Acoustics (IEM), at the University of Music and Dramatic Arts Graz;

the Theoretical Physics Group - Institute of Physics, at the University of Graz; the Institute for Sociology, at the University of Graz; the University Clinic for Neurology, at the Medical University Graz; and the Signal Processing and Speech Communication Laboratory SPSC, at the University of Technology Graz.

The IEM was the host institution coordinating the project, and the source of audio design and programming as well as sonification expertise in the project. The main researcher here was the author of this dissertation.

From the Institute of Sociology, Christian Dayé provided data from a variety of sociological contexts, and co-designed and experimented with sonifications for them, as discussed in section 6. He was also responsible for feedback and evaluation of the interdisciplinary work process from the perspective of sociology of science.

The Physics group had changing members in the course of the project: initially, Bianka Sengl provided data from quantum physics research, namely from competing Constituent Quark models, as discussed in section 7.1 and appendix C.1. Later on, Katharina Vogt worked on a number of different physics topics and sonifications for them, including the Ising and Potts models discussed in section 7.2.

The Signal Processing and Speech Communication Laboratory was represented by Christopher Frauenberger, who worked on a number of different sonification experiments, among others on propagation of electromagnetic waves, and on time series classification, as discussed in section 8. He also contributed substantially to the code implementations, and has become the main developer of the python-based Windows version of SuperCollider3.

For the Institute of Neurology, Annette Wallisch was the main researcher. She provided a variety of EEG data for experimenting with sonification designs for screening and monitoring, as described in section 9. She also dealt with an industry research partner, the company BEST medical systems (Vienna), and she wrote a dissertation (Wallisch (2007), in German) on the research done within SonEnvir.

4.1.2 Project flow

In order to create a broad base of sonification designs for a wide range of data from the scientific contexts described, the project was structured in three iterations. Each iteration began with identifying potentially interesting research questions from the domains, and collecting example data for these. Then sonification designs were created and tested, a process which became more collaborative and experimental as the project proceeded.

In each of the scientific fields, we started by building simple sonification designs to begin the discussion process. The key question has turned out to be learning how to work in such a highly interdisciplinary group, how to build bridges for common understanding, and how to develop a common language for collaboration.

We focused on building sonification designs that demonstrate the usefulness of sonification by showing practical benefit for the respective scientific field. Identifying good research questions at this intermediate level of complexity was not trivial. Nevertheless, being able to come up with sufficiently convincing examples to reach the immediate partner audience is very important.

Finally, the project goal was to integrate all the approaches that worked well in one context into a single software framework that includes all the software infrastructure, thus making them re-usable for a wide range of applications; this was intended to result in a meaningful contribution to the sonification community. The diversity of the research group and their problem domains forced us toward very flexible and re-usable solutions. By making our collection of implemented sonification designs freely accessible, we hope to capture much of what we have learned in a form that other researchers can build on.

4.1.3 Publications

Many research results were published in conference and journal papers, which are indicated in the respective chapters, and briefly listed here: de Campo et al. (2004) was a project plan for SonEnvir before the fact. Papers on sociological data (Dayé et al. (2005)), quantum spectra (de Campo et al. (2005d)), and the project in general (de Campo et al. (2005a)) were presented at ICAD and ICMC 2005. We wrote some papers with external collaborators, on electrical systems (Fickert et al. (2006)), and on various kinds of lattice data (de Campo et al. (2005c), de Campo et al. (2006b), de Campo et al. (2006c), de Campo et al. (2005b)).

For ICAD 2006, we contributed an overview paper, de Campo et al. (2006a), and organised a concert of sonifications described in section 4.3, contributing a piece described in de Campo and Dayé (2006) and in section 11.3. At ICAD 2007, we presented papers on EEG (de Campo et al. (2007)), time series (Frauenberger et al. (2007)), Potts models (Vogt et al. (2007)), and on the Design Space Map concept (de Campo (2007b)). At the ISon workshop in York 2007, we presented work on juggling sonification (Bovermann et al. (2007)) and on the Sonification Design Space Map (de Campo (2007a)).

Some project results and insights in the sociological context were also presented in two journal publications: Dayé et al. (2006) and Dayé and de Campo (2006).


4.2 Science By Ear - An interdisciplinary workshop

This workshop was, in our opinion, the most innovative experiment in methodology within SonEnvir. Aiming to intensify the interdisciplinary work setting within SonEnvir, we brought in both sonification experts and scientists from different domains to spend three days working on sonification experiments. Considering participant responses (both during and after the event), this workshop was very successful. Detailed background is available online at http://sonenvir.at/workshop/.

4.2.1 Workshop design

We chose the participants to invite so that they would form an ideal combination of competences: eight international sonification experts, eight domain scientists (mainly from Austria), six audio specialists and programmers, and (partially overlapping with the above) the SonEnvir team itself (see appendix D). This group of ca. 24-28 people was just large enough to allow for different combinations over three days, but still small enough to allow for good group cohesion.

The workshop program consisted of five short lectures by the sonification experts, which served to inform less experienced domain scientists about sonification history, methodology, and psychoacoustics. This helped to bring all participants closer to a common language.

Most of the workshop time was spent in sonification design sessions. For each day, three interdisciplinary teams were formed, composed of the three categories: 2-3 sonification experts, 2-3 domain scientists, 2 audio programmers, and 1 moderator (a SonEnvir member). These sessions typically lasted 2 hours, after which the group would report to the plenary about their results. For the first two days, all three teams worked on the same problem at the same time (in parallel), which allowed for good comparisons of design results. On the last day, each group worked on a separate problem for two sessions, to allow working more in depth on the exploration of ideas.

4.2.2 Working methods

The design sessions focused on data submitted by the participating domain scientists; the scientific domains included electrical power systems, EEG rhythms, global social data, meteorology in the Alpine region, computational Ising models, Ultra-Wide-Band communication, and research on biological materials called polysaccharides. The parallel sessions began with a talk by the submitting domain specialist, introducing the problem dataset to the plenary group. Then the group split into the three teams,

and the teams began their parallel sessions. The typical sequence in a session was to do some brainstorming first, to get ideas about which sonification strategies might be applicable. Once a few candidate ideas were around, experimentation began by coding little sonification designs (some administrative code, like data reading routines, was prepared beforehand). Time tended to be rather short, so decisions about what to try first were often based on what seemed doable within the limited time.

Toward the end of the session time, the group began preparing what they would report to the plenary meeting. This usually consisted of little demos of what the group had tried, many more ideas for experiments to do as follow-up steps, and an informal evaluation of what the group felt they had learned.

On the final workshop day, spending two sessions on a topic was a welcome change. Having more time to experiment, and especially taking a break and then continuing work on a problem, allowed for more sophisticated mini-realisations.

Having a wiki set up for the workshop allowed us to distribute the latest versions of information materials, all the code examples written, and the notes that were taken during all sessions. Furthermore, most sessions and discussions were recorded (audio, and some video) to allow later analysis of the working process and the interactions taking place.

4.2.3 Evaluation

Many of the designs ended up being adapted in some form for later work in SonEnvir; two that were not used elsewhere are described in section 10 for completeness. Based on feedback given by the workshop participants, it can be considered a highly successful experiment in methodology. Many participants commented very positively on the innovative aspects of this workshop: actually doing design work in an interdisciplinary group setting, rather than going through prepared examples, was considered remarkable.

The major design trade-off that was also discussed in the responses was how much time to spend on each data problem: time pressure limited the eventual usefulness of the designs that were created, so the alternative of working on far fewer data sets for much longer may be worth trying - at the potential risk of a less comprehensive overall scope.

Christian Dayé made a qualitative and quantitative content analysis of the audio recordings of the sessions that confirmed the overall positive response (publication still in progress), and he developed a number of guidelines for future similar events:

Prepare and distribute basic literature on the domains well beforehand. In the SBE workshop, there was sometimes a tendency that domain scientists would mainly listen, thus leaving the sonification experts and programmers to do most

of the talking. From an interdisciplinary point of view, this is not ideal, as it does not create equally shared understanding.

Do more technical preparation together with the programmers beforehand. In some sessions, problems came up with reading and handling data properly, which made them less practical than intended.

Have a scientist from the problem domain in every group. As the SBE workshop covered a wide range of problems, this was not feasible in the parallel sessions. This strategy would work well in combination with a more limited set of problems to work on.

4.3 ICAD 2006 concert

While ICAD has been holding conferences since 1992, the first ever concert of sonifications at an ICAD conference took place only in 2004.

4.3.1 Listening to the Mind Listening

For the ICAD conference in Sydney 2004, Stephen Barrass organised a concert of sonifications of brain activity, called Listening to the Mind Listening (http://www.icad.org/websiteV2.0/Conferences/ICAD2004/concert.htm). The concert call (http://www.icad.org/websiteV2.0/Conferences/ICAD2004/concert_call.htm) invited participants to create sonifications of neural activity: a dataset was provided with five minutes of multichannel EEG recording of a person listening to a piece of music. A jury selected ten submissions for the concert, which took place in the Sydney Opera House.

Even though the pieces were constrained to adhere to the time axis of the recording, the diversity of the approaches taken, and the variety in the sounding results, was extremely interesting. The pieces can be listened to at the concert website given above, and the organisers published an analytical paper in Leonardo Music Journal comparing all the entries in a number of different ways (Barrass et al. (2006)). The concert was a great success, so it seemed likely to become a regular event at ICAD.

4.3.2 Global Music - The World by Ear

In 2006, the author was invited to be Concert Chair for the ICAD conference in London. Together with SonEnvir colleagues Christopher Frauenberger and Christian Dayé, we agreed that social data would be an interesting and accessible topic for a sonification

concert/competition, and we proceeded to collect and prepare social data on the 190 nations represented in the United Nations. The concert call (http://www.dcs.qmul.ac.uk/research/imc/icad2006/concertcall.php) invited participants to contribute a sonification that illuminates aspects of the social, political and economic circumstances represented in the data. The following quote is the central part of the concert call.

Motivation

Werner Pirchner, Ein halbes Doppelalbum, 1973: "The military costs every person still alive roughly as much as half a kilogram of bread per day."

Global data are ubiquitous - one finds them in every newspaper, and they cover a range of themes, from global warming to increasing poverty, from individual purchasing power to the ageing of the world's population. Obviously these data are of a social nature: They describe specific aspects (e.g. ecological or economic) of the environment in which societies exist, which taken together determine culture, i.e. the way people live. Rising awareness of these global interdependencies has led both to fear and concerns (e.g. captured in the notion of the risk society, see Beck (1992); Giddens (1990, 1999)), as well as to hopes for eventual positive consequences of globalisation.

Along with developments like the scientisation of politics (see Drori et al. (2003)), this growing understanding of global issues has redefined the context of the political discourse in modern societies: As modern societies claim to steer their own course based on self-observation by means of data, an information feedback loop is realised. Alternative choices of data that are important to consider, which data should be set in relation to each other, and a consideration of how to perceptualise these data choices meaningfully can enrich this discourse.

Closing the feedback loop by informing society about its current state and its development is a task that both scientists and artists have responded to, and this is the key point of this call: You can contribute to the discourse by perceptualising aspects of world societal developments, search for data that concern interesting questions, and devise strategies for investigating them, and demonstrate that sound can communicate information in an accessible way for the general public.

The reference dataset of 190 countries included data ranging from commonly expected dimensions like geographical data (capital location, area) and population number, to basic social indicators such as GDP, access to sanitation and drinking water, and life expectancy. An extended dataset included data on education (years in school for males and females), illiteracy, housing situation, economic independence of males and females, and others.

The call went on to specify the following constraints: Using the reference dataset was mandatory, so countries, capital locations, population and area data should be used. Participants were strongly encouraged to extend this dataset with more dimensions, and possible resources for such data extensions were pointed out. The concert sound system was to be a symmetrical ring of eight speakers, so any spatialisation used in the pieces should employ such a configuration. Finally, participants had to provide a short paper documenting the context and background of their data choices and sonification design.

An international jury composed of sociologists, computer musicians/composers, and sonification specialists wrote reviews rating the anonymous submissions, and eight pieces were finally selected for the concert; papers and headphone-rendered mp3 files for all pieces are available at http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html. Four of these pieces are described in more detail in section 11.


Chapter 5

General Sonification Models


A British Euro-joke tells of a meeting of officials from various countries who listen to a British proposal, nodding sagely at its numerous benefits; the French delegate stays silent until the end, then taps his pencil and remarks: "I can see that it will work in practice. But will it work in theory?" (reported in Barnes (2007))

In this chapter, several models are proposed to allow better understanding of the main aspects of sonification designs:

Sonification Design Space Map - General orientation in the design process

Synthesis Model - Considerations of, and examples for, synthesis approaches

User Interaction Model - Understanding sonification usage contexts and users' goals and tasks to be achieved

Spatialisation Model - Using spatial distribution of sound for sonification

Note the entangled nature of these aspects: splitting sonification designs into aspects is only a simplification that is temporarily useful for grasping the concepts. Because of their close connections, it will be necessary to cross-reference between sections. Generally, because of these interdependencies, the understanding of these sections will benefit from re-reading.


5.1 The Sonification Design Space Map (SDSM)

5.1.1 Introduction

This section describes a systematic approach for reasoning about experimental sonification designs for a given type of dataset. Starting from general data properties, the approach recommends initial strategies, and lists possible refinements to consider in the design process. An overview of the strategies included is presented as a mental (and visual) map called the Sonification Design Space Map (SDSM), and the refinement steps to consider correspond to movements on this map.

The main purpose of this approach is to extract theory from observation (in our case, of design practice), similar to Grounded Theory in sociology (Glaser and Strauss (1967)): to make implicit knowledge (often found in ad hoc design decisions which sonification experts consider natural) explicit, and thus available for reflection, discussion, learning, and application in design work.

This approach is mainly the result of studying design sessions which took place in the interdisciplinary sonification workshop Science By Ear, described in detail in section 4.2. In order to explain the concept in practice as well, a set of workshop sessions on one simple dataset is analysed here in the terms proposed; in the chapters on implemented designs, many more of these are described in detail using SDSM terms.

5.1.2 Background

When collaborations on sonification for a new field of application start, sonification researchers may know little about the new domain, its common types of data, and its interesting research questions; similarly, domain scientists may know little about sonification, its general possibilities, and its possible benefits for them. In such early phases of collaboration, the task to be achieved with a single particular sonification is often difficult to define clearly, so it makes sense to employ an exploratory strategy which allows for mutual learning and exchange. Eventually, the interesting tasks to achieve become clearer in the process. Note that even when revisiting familiar domains, it is good methodological practice to start with as few implicit assumptions as possible, and to introduce any concepts from domain knowledge later, transparently and explicitly, in the course of the design process.

Rheinberger (2006) describes that researchers deal with epistemic things, which are by definition vague at first (they can be e.g. physical objects, concepts or procedures whose usefulness is only slowly becoming clear); they choose experimental setups (ensembles of epistemic things and established tools, devices, procedures), which allow for endless

repetitions of experiments with minimal variations. The differential results gained from this exhaustion of a chosen area in the possibility space can allow for new insights. Then, an experimental setup can collapse into an established device or practice, and become part of a later experimental setup. From this perspective, sonification designs start their lifecycle as epistemic things, which need to be refined through usage; they may in time become part of experimental setups, and if successful, eventually disappear into the background of a scientific culture as established tools.

Some working definitions

The objects or content to be perceptualised can be well-known information, or new unknown data (or shades of gray in between). The aims for these two applications are very different: for information, establishing easy-to-grasp analogies is central; for data, enabling the perceptual emergence of latent phenomena of unforeseeable type in the data. As working terminology for the context here, we propose to define the following three terms:

Auditory Display is the rendering of data and/or information into sound designed for human listening. This is the most general, all-encompassing term (even though the term display has a visual undertone to it). We further propose to differentiate between two subspecies of Auditory Displays:

Auditory Information Display is the rendering of well-understood information into sound designed for communication to human beings. It includes speech messages such as in airports and train stations, auditory feedback sounds on computers, alarms and warning systems, etc.

Sonification or Data Sonification is the rendering of (typically scientific) data into (typically non-speech) sound designed for human auditory perception. The informational value of the rendering is often unknown beforehand, particularly in data exploration. The model described here focuses on Data Sonification in this narrower sense.

These definitions are quite close to the current state of the evolving terminology; in the International Encyclopedia of Ergonomics and Human Factors, Walker and Kramer (2006) define the terms quite similarly: "Auditory display is a generic term including all intentional, nonspeech audio that is designed to transmit information between a system and a user. ... Sonification is the use of nonspeech audio to present data. Specifically, sonification is the transformation of data relations into auditory relations, for the purpose of studying and interpreting the data."

Common sonification strategies

The literature usually classifies sonification approaches into Audification and Parameter Mapping (Kramer (1994b)), and Model-Based Sonification (Hermann (2002)). For the context here, we prefer to differentiate the categories more sharply, which will become clear along the way; so, our three most common approaches are: sonification (or generally, perceptualisation) by Continuous Data Representation, Discrete Point Data Representation, and Model-Based Data Representation.

Continuous Data Representation treats data as quasi-analog continuous signals, and relies on two preconditions: equal distances along at least one dimension, typically time and/or space; and a sufficient (spatial or temporal) sampling rate, so that the signal is free of sampling artifacts, and interpolation between data points is smooth. Both simple audification and parameter mapping involving continuous sounds belong in this category. Its advantages include: subjective perceptual smoothness; interpolation can make the sampling interval (which is an observation artifact) disappear; perception of continuous shapes (curves) can be appropriate; audition is very good at structures in time; and mapping data time to listening time is metaphorically very close and thus easy to understand. Its drawbacks include: it is often tied to linear movement along one axis only; and events present in the data (e.g. global state changes in a system) may be difficult to represent well.

Discrete Point Data Representation creates individual events for every data point. Here, one can easily arrange the data in different orders, choose subsets based on special criteria (e.g. based on navigation input), and when special conditions arise, they can be expressed well. Its advantages include: more flexibility, e.g. subset selections of changeable sizes, based on changeable criteria, and random iterations over the chosen subsets; and the lack of an illusion of continuity may be more accurate to the data. Its drawbacks include: attention may be drawn to data-independent display parameters, such as a fixed grain repetition rate; and at higher event rates, interactions between overlapping sound events may occur, such as phase cancellations.

Model-Based Data Representation employs more complex mediation between data and sound rendering by introducing a model whose properties are informed by the data. Its advantages include: apart from data properties, more domain knowledge can be captured and employed in the model; and models may be applicable to datasets from a variety of contexts, as is commonly aimed for in Data Mining. Its drawbacks include: assumptions built into models may introduce bias leading away from understanding the domain at hand; there may be a sense of disconnection between data and sounding representations; and the higher complexity of model metaphors may be

41 dicult to understand and interpret.

5.1.3 The Sonification Design Space Map

Task/Data Analysis (Barrass (1997)) focuses on solving well-defined auditory information design problems: how to design an Auditory Display for a specific task, based on systematic descriptions of the task and the data. Here, the phenomena to be perceptualised are known beforehand, and one tries to render them as clearly as possible. The Sonification Design Space Map given here addresses a similar but different problem: the aim is to find transformations that let structures/patterns in the data (which are not known beforehand) emerge as perceptual entities in the sound which jump to the foreground, i.e. as identifiable interesting audible objects; these are closely related to sound objects in the electronic music field (from objets sonores, see Schaeffer (1997)), and in the psychoacoustics literature, auditory gestalts (e.g. Williams (1994)). In other words, the most general task in data sonification designs for exploratory purposes is to detect auditory gestalts in the acoustic representation, which one assumes correspond to any patterns and structures in the data one wants to find.

SDS Map axes

To facilitate this search for the unknown, the Design Space Map enables a designer, researcher, or artist to engage in systematic reasoning about applying different sonification strategies to his/her task or problem, based on data dimensionality and perceptual concepts. Especially while the task is not yet clearly understood and defined (which is often the case in exploratory contexts), reasoning about data aspects, and making well-informed initial choices based on perceptual givens, can help to develop a clearer formulation of useful tasks. So, the proposed map of the Sonification Design Space (see figure 5.1) has these axes:

X-axis: the number of data points estimated to be involved in forming one gestalt, or expected gestalt size;
Y-axis: the number of data dimensions of interest, i.e. to be represented in the current sonification design;
Z-axis: the number of auditory streams to be employed for data representation.


Figure 5.1: The Sonification Design Space Map


The overlapping zones are fuzzy areas where different sonification approaches apply; the arrows on the right refer to movements on the map, which correspond to design iterations. For detailed explanations see sections 5.1.3 and 5.1.4.

To ensure that the auditory gestalts of interest will be easily perceptible, the most fundamental design decision is the time scale: in auditory gestalts (or sound objects) of 100 milliseconds and less it becomes more and more difficult to discern meaningful detail, while following a single gestalt for longer than, say, 30 seconds is nearly impossible, or at least takes enormous concentration; thus, a reasonable rule of thumb for single gestalts is to time-scale their rendering into the duration of echoic memory and short-term memory, i.e. on the order of 1-3 seconds (Snyder (2000)). Sounds up to this duration can be kept in working memory with much detail information, keeping all the nuances and inflections while more perceptual processing goes on. This time frame can be called the echoic memory time frame. The expected gestalt size is the number of data points (of the dataset under study) that should be represented within this time frame to allow for perception of individual gestalts at this data subset size. Note that the three-second time frame does not impose a limit on the number of data points represented: as a deep exploration of the world of Microsound (Roads (2002)) shows, clouds of short sound events can happen at very high densities in the micro-time scale; in fact this is a fascinating area for creating sound that is rich in perceptual detail and artistic possibilities.
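As a quick back-of-the-envelope step, the rendering rate a synakusis implies can be computed directly; a minimal SuperCollider sketch (numbers invented for illustration; 672 points corresponds to one week of 15-minute values, as in the dataset used later):

(
var numPoints = 672;   // e.g. one week of 15-minute measurement values
var timeFrame = 3;     // seconds, the echoic memory order of magnitude
"rendering rate: % data points per second\n".postf(numPoints / timeFrame);
)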

SDS Map zones

The zones shown in figure 5.1 do not have hard borders; their extensions are only meant to give an indication of how close-by (and thus meaningfully applicable) which strategies are for a given data gestalt size and number of dimensions. Similarly, the number ranges given below are only approximate orders of magnitude, and mainly based on personal experience both in electronic music and sonification research.

The Discrete-Point zone ranges roughly from gestalt size 1 - 1000 and from dimensions number 1 - 20; the transitions shown in the map from note-like percepts via textures to granular events which merge into clouds of sound particles are mainly perceptual.

The Continuous zone ranges roughly from gestalt size 10 - 100,000 and from dimensions number 1 - 20; the main transition here is between parameter mapping and audification, with various technical choices indicated along the way, such as using the continuous data signal as a modulation source, band-splitting it, and/or applying filtering to it.

The Model-Based zone ranges roughly from gestalt size 10 - 50,000 and from dimensions number 8 - 128; because the approach is so varied and flexible, there are no further orientation points in it yet. Existing varieties of model-based approaches are still to be analysed in the terms of this Sonification Design Space, and can eventually be integrated in appropriate locations on the map.

All these zones apply mainly for single auditory streams; generally, when multiple streams are used in a sonification design, the individual streams can and should use fewer dimensions. In fact, using multiple streams is the main strategy for reducing the number of dimensions while keeping the overall density of presentation constant.
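As a rough illustration, the zone boundaries estimated above can be encoded as a simple lookup; this helper function is hypothetical (not part of any sonification framework) and only restates the orders of magnitude given in the text:

(
var anchorZones = { |numPoints, numDims|
    var zones = List[];
    if ((numPoints <= 1000) and: (numDims <= 20)) { zones.add(\discretePoint) };
    if ((numPoints >= 10) and: (numPoints <= 100000) and: (numDims <= 20)) {
        zones.add(\continuous)
    };
    if ((numPoints >= 10) and: (numPoints <= 50000)
        and: (numDims >= 8) and: (numDims <= 128)) {
        zones.add(\modelBased)
    };
    zones
};
anchorZones.(672, 5).postln;   // the LoadFlow dataset discussed below
)

For the LoadFlow dataset of section 5.1.5 (672 points, 5 dimensions), this returns both the discrete-point and the continuous zone, matching the overlap-zone placement of its data anchor described there.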

5.1.4 Refinement by moving on the map

In the evolution of a sonification design, all intermediate incarnations can be conceptualised easily as locations on the map, based on how many data points are rendered into the basic time interval, how many data dimensions are being used in the representation, and how many perceptual streams are in use. A step from one version to the next can then be considered analogous to a movement on the map. This mind model aims to capture the design processes we could observe in concentrated form in the Science by Ear workshop (SBE, described in detail in section 4.2), and in extended form in the development work in the main strands of the SonEnvir project.

Data anchor

For exploring a dataset, one can start by putting a reference point on the map, which we call the Data Anchor: this is a point on the map corresponding to the full number of data points and data dimensions. A first synopsis, or more properly synakusis, of the entire dataset (within the echoic memory time frame of ca. 3 seconds) can then be created with one of the nearest sonification strategies on the map. Subsequent sonification designs and sketches will typically correspond to a movement down from this point, i.e. toward using fewer dimensions at a time, and to the left, toward listening to less than the total number of data points in the echoic memory time frame. Of course one can still listen to the entire dataset; total presentation time will simply become longer.

Shift arrows

Shift arrows, as shown in figure 5.1 on the right hand side, allow for moving one's current working position on the Design Space Map, in effect deploying different sonification strategies in the exploration process. Note that some shifting operations are used for zooming, and leave the original data untouched, while others employ (temporary) data reduction, extension, and transformation; in any sonification design one develops, it is essential to differentiate between these kinds of transformations and to document the steps taken clearly. Finally, one can decide to defer such decisions and turn them into interaction possibilities, so that e.g. subsets are selected interactively. (A minimal code sketch of some of these shifts as data operations follows at the end of this subsection.)

A left-shifting arrow can be used to reduce the expected gestalt size, in effect using fewer data points within the echoic memory time frame. Some options are: investigating smaller, user-chosen data point subsets (this can be by means of interaction, e.g. tapping on a data region and hearing that subset); downsampling; choosing subsets by appropriate random functions; and other forms of data preprocessing.

A down-shifting arrow can be used to reduce the number of dimensions, i.e. to employ fewer data properties (or dimensions) in the presentation. Some options are: dimensionality reduction by preprocessing (e.g. statistical approaches like Principal Component Analysis (PCA), or using locality-preserving space-filling curves in higher-dimensional spaces, e.g. Hilbert curves); and user-chosen data property subsets, keeping the option to explore others later. Model-based sonification concepts may also involve dimensionality reduction techniques, yet they are in principle quite different from mapping-based approaches.[1]

An up-shifting arrow can be used to increase the number of dimensions used in the sonification design, e.g. for better discrimination of components in mixed signals, or to increase contrast by emphasizing aspects with relevance-based weighting. Some options are: time series data could be split into frequency bands to increase detail resolution; extracting the amplitude envelope of a time series and using it to accentuate its dynamic range[2]; other domain-specific forms of preprocessing may be appropriate for adding secondary data dimensions to be used in the sonification design.
[1] Thomas Hermann, personal communication, Jan 2007.
[2] Whether such transformations happen in the data preprocessing stage or in the audio DSP implementation of a sonification design makes no difference to the conceptual reasoning process.

A right-shifting arrow can be used to increase the number of data points used, which can help to reduce representation artifacts. Some options are: interpolation of signal shape between data points; repetition of data segments (e.g. granular synthesis with slower-moving windows); local waveset audification (see section 5.3); and model-based sonification strategies can be used to create e.g. physical vibrational models, whose state may be represented in larger secondary datasets informed by comparatively few original data points. Interpolation in time-series data is often employed habitually without further notice; the model proposed here strongly suggests notating this transformation as a right-shifting arrow. If one is certain that the sampling rate used was sufficient, using cubic (or better) interpolation instead of the actually measured steps creates a smoother signal which is nearer to the phenomenon measured than the sampled values. When such a smoothed signal is used for modulating an audible synthesis parameter, the potentially distracting presence of the time step unit should be less apparent.

Z axis shifts

So far, all arrows have concerned movement in the front plane of the map, where only a single auditory stream is used for data representation. After the time scale, the number of streams is the second most fundamental perceptual design decision. By presenting some data dimensions in parallel auditory streams (especially data dimensions of the same type, such as time series of EEG measurements for multiple electrodes), overall display dimensionality can be increased in a straightforward way, while dimensionality in each individual stream can be lowered substantially, thus making each single stream easier to perceive. (The equivalent movement is difficult to represent well visually on a 2D map, but easy to imagine in 3D space. Figure 5.2 shows a rotated view.) For multiple streams, all previous arrow movements apply as above, and two more arrows become available:

An inward arrow can be used to increase the number of parallel streams in the representation. Some options are: multichannel audio presentation; and setting one perceptual dimension of the parallel streams to fixed values with large enough differences to cause stream separation, thus in effect labelling the streams.

An outward arrow can be used to decrease the number of parallel streams in the representation. Some options are: selecting fewer streams to listen to; intentionally allowing for perceptual merging of streams.

Figure 5.2: SDS Map for designs with varying numbers of streams.
Hypothetical variants of a sonification design for a dataset with 16 dimensions; see text.

Experimenting with different numbers of auditory streams can be very interesting, as multiple perspectives on the same data content may well contribute to more intuitive understanding of the dataset under study. Figure 5.2 shows the range of hypothetical variants of a sonification design for a dataset with 16 dimensions; the graph plane is at an expected gestalt size of 100 data points, and the axes shown are Y (number of data properties mapped) and Z (number of auditory streams employed). Different designs might employ, for example, one stream with 16 mapped parameters, 2 streams with 8, 4 streams with 4, 8 with 2, and 16 streams with a single parameter. Of course, depending on the character of the data dimensions, other, more asymmetrical combinations may be worth exploring; these will typically be located below the diagonal shown. Note that the map is slightly ambiguous between the number of generated versus perceived streams; parallel streams of generated sound may fuse or separate based on perceptual context. This is a very interesting phenomenon which can be quite fruitful: perceptual fusion between streams can be an appropriate expression of data features; e.g. in EEG recordings, massive synchronisation of signals across electrodes may cause the streams to fuse, which can represent the nature of some epileptic seizures well.
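As announced above, here is a minimal sketch of some shift arrows expressed as plain data operations, assuming a dataset stored as an array q.data of data points (each an array of dimension values, as in the code examples of section 5.3); all subset choices are invented for illustration:

// left shift: fewer data points in the time frame (a random subset here)
q.subset = q.data.scramble.keep(100);

// down shift: fewer dimensions (keep only channels 0 and 2 of each point)
q.reduced = q.data.collect { |point| [point[0], point[2]] };

// up shift: add a derived dimension (difference between two channels)
q.extended = q.data.collect { |point| point ++ [point[0] - point[2]] };

// inward shift: split the dimensions into parallel per-channel streams
q.streams = q.data.flop;   // now one array per data channel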


5.1.5 Examples from the Science by Ear workshop

In order to clarify the theoretical considerations given so far, we now turn to analysing design work done in an interdisciplinary setting. We report one exemplary set of design sessions as they happened, with added after-the-fact analysis in terms of the Sonification Design Space Map concept (short: SDSM). Where the SDSM strongly calls for additional designs, these are provided and marked as additions. This is intended to demonstrate the potential of going from practice-grounded theory back to theory-informed practice. The workshop concept is described in section 4.2.

The workshop setting

True to the inherently interdisciplinary nature of scientific data sonification, the SBE workshop brought together three groups of people for three days: domain scientists who were invited to supply data they usually work with; an international group of sonification experts; and audio programmers/sound designers. Apart from invited talks by the sonification experts, the main body of work consisted of sonification design sessions, where interdisciplinary groups (ca. 8 people: domain scientists, sonification experts, programmers, and a moderator) spent 2 hours discussing one submitted data set, experimenting with different sonification designs, and then discussing results across groups in plenary meetings. In each session, discussion notes were taken as documentation, where possible the sonification designs were kept as code, and all the sound examples played in the plenary meetings were rendered as audio files. All this documentation is available online [3].

Load Flow - data background

The particular data set serving as a starting example came from electrical power systems: it captures electrical power usage for one week (December 18 - 24, 2004) across five groups of power consumers: households, trade and industry, agriculture, heating and warm water, and street lighting; a sum over all consumer groups was also provided. Clear daily cycles were to be expected, as well as changes between workdays and weekends/holidays. While this is not scientifically challenging, it is a good example of simple data with everyday relevance. We chose this dataset for the first parallel session, and it did serve well for exploring basic sonification concepts with novices. The full documentation for these sessions is available online [4].
[3] http://sonenvir.at/workshop/
[4] http://sonenvir.at/workshop/problems/loadflow/. All sound examples can be found here, in the folders TeamA, TeamB, TeamC, and Extras; for layout reasons, relative links at this site are given here as ./TeamX/name.mp3 etc.


Figure 5.3: All design steps for the LoadFlow dataset.


Steps are shown as locations labeled with team name and step number (A1, B2, C3, etc.), and arrows between locations.

The dataset was an Excel file with 5 columns for the consumer groups, and consumption values were sampled at 15-minute intervals; so for a week, there are 24 * 4 * 7 = 672 data points for the entire dataset. In SDSM terms, this puts the Data Anchor for this set right in the middle of the Design Space Map, in the overlap zone between Discrete-Point and Continuous sonification, see section 5.3.

Sonification designs

All sonification designs are shown as locations on the Design Space Map in figure 5.3, labeled as A1, B1, C3 etc. Teams A and B created their design sketches in SuperCollider3, while Team C worked with the PureData environment.

[A1] Team A began by sonifying the entire dataset as five parallel streams, scaled to 13 seconds, i.e. one day scaled to ca. 2 seconds; power values were mapped to frequency with identical scaling for all channels [5]. The resulting five parallel streams were panned into one stereo panorama. After experimenting with larger and smaller timescales, agreement was reached that the initial choice of timescale was appropriate and useful. In SDSM terms, this means the team was looking for auditory gestalts at the scale of single days.

[A+] As the SDSM recommends starting with a synakusis into a timeframe of 3 seconds, this is provided here [6]. This was only added after the workshop.

Then, alternative sound parameter mappings were tried out based on team suggestions:

[A2] Mapping powers to amplitudes of five tones labeled with different pitches [7]. While this is closer in metaphorical distance, it is perceptually less successful: one could not distinguish much shape detail in amplitude changes.

[A3] Mapping powers to amplitudes and the cutoff frequencies of resonant lowpass filters of five differently pitched tones [8]. This was clearer again, but still not as differentiated as mapping to tone frequencies.

[A4] Going back to mapping to frequencies, each tone was labeled with a different phase modulation index (essentially, different levels of brightness) [9]. While this allowed for better stream identification, the (very quickly chosen) scaling was not deemed very pleasant, if inadvertently amusing.

[A5] Finally, the team tried using fewer parallel streams, and adding secondary data: the phase modulation depth (basically, the brightness) of both channels (household and agriculture) was controlled from the difference between the two data channels [10]. While this did not work very well, it seemed promising with better secondary data choices; however, at this point session time was over. In SDSM terms, design A5 is a move down - to fewer channels - and a move back up - derived data used to control additional parameters (the map only shows the resultant move).

[5] ./Team A/TeamA 1 FiveSines PowersToFreqs.mp3
[6] ./extras/LoadflowSynakusis.mp3
[7] ./Team A/TeamA 2 FiveTones PowersToAmps.mp3
[8] ./Team A/TeamA 3 FiveTones PowersToAmpsAndFilterfreqs.mp3
[9] ./Team A/TeamA 4 FiveFMSounds IDbyModDepth.mp3
[10] ./Team A/TeamA 5 TwoFMSounds DiffToModDepth.mp3

Team B chose to do audification (following one sonification expert's request), and to use an interactive sonification approach: their design loaded the entire data for one channel (672 values, equivalent to one week of data time) into a buffer, and played back a movable 96-value segment (equal to one day) as a looped waveform. The computer mouse position was used to control which 24-hour segment is heard at any time. This maps the signal's local jaggedness into spectral richness and its overall daily change into amplitude. (For the non-interactive sound examples that follow, the mouse is moved automatically through the week within 14 seconds.) While the team found the data sample rate and overall data size too low for much detail, an interesting side effect turned up: when audifying segments in this fashion, the difference between the same time of day for two adjacent days was emphasized; large differences at a specific time between adjacent days created strong buzzing [11]. In the next design step, 2 channels, households (left) and agriculture (right), were compared side by side [12], and for clearer separation, they were labeled with different loop frequencies [13]. The final design example maps the power values corresponding to the current mouse position directly to the amplitude of a 50 Hz (European mains frequency) filtered pulse wave [14]. As above, in the fixed rendering here, the mouse moves through the week at constant speed within 14 seconds.

In SDSM terms, the initial choices were to move all the way down on the map (to only 1, and then 2 out of 5 channels at a time), and essentially a move to the left: a chosen one-day data subset was played by moving a one-day window within the data. Note that this move actually creates an interaction parameter for sonification design users, which is one of the many advantages of current interactive programming environments. Note also that the interpolation commonly used in audification is actually slightly dubious here: there may well have been meaningful short-time fluctuations within the 15-minute intervals which would not have been captured in the data as supplied.

Team C used PureData as programming environment. Their approach was quite similar to Team A, with interesting differences: they began with scaling each single data channel into 3 seconds, mapping power in that channel both to frequency and to amplitude, and subsequently rendered all channels in this fashion [15]. Finally, this team also produced a version with six parallel streams (including the sum value), scaled into 12 seconds, and with different timbres [16]. In SDSM terms, they first moved to the bottom of the map, while keeping full data scale, i.e. a synakusis-sized time window; example 7 moves back up (using all channels), and to the left (i.e. toward higher time resolution, gestalts on the order of single days of data).

[11] ./Team B/1 LoadFlow B Households.mp3
[12] ./Team B/2 LoadFlow B households agriculture.mp3
[13] ./Team B/3 LoadFlow B households agriculture.mp3
[14] ./Team B/4 LoadFlow B households agriculture.mp3
[15] http://sonenvir.at/workshop/problems/loadflow/Team C/, sound examples 1-6.
[16] ./Team C/TeamC AllChannels.mp3

5.1.6 Conclusions

Conceptualising the sonification design process in terms of movements on a design space map, one can experiment freely by making informed decisions between different strategies to use for the data exploration process; this can help to arrive at a representation which produces perceptible auditory gestalts more efficiently and more clearly. Understanding the sonification process itself, its development, and how all the choices made influence the sound representation one has arrived at, is essential in order to attribute perceptual features of the sound to their possible causes: they may express properties of the dataset, they may be typical features of the particular sonification approach chosen, or they can be artifacts of data transformation processes used.

As these analyses of some rather basic sonification design sessions show, the terminology and map metaphor provide valuable descriptions of the steps taken; having the map available (mentally or physically) for a design work session seems very likely to provide good clues for next experimental steps to take. Note that the map is open to extensions: as new sonification strategies and techniques evolve, they can easily be classified as either new zones, areas within existing zones, or as transforms belonging to one of the directional arrow categories; then their appropriate locations on the map can easily be estimated and assigned.

5.1.7 Extensions of the SDS map

There are several ways to extend the map and make it more useful, and this dissertation aims to provide some of them:

More and richer detail can be added by analysing the steps taken in observed design sessions, classifying them as strategies, and adding them if new or different. This is the object of chapters 6, 7, 8, 9, 10, and 11, the example sonification designs from different SonEnvir research activities.

A more detailed analysis of the existing varieties of model-based sonification can be undertaken, and that understanding can and should be expressed in the terms of the conceptual framework of the map; however, this is beyond the scope of this thesis.

Expertise can be integrated by interviewing sonification experts, tapping into their experience, inquiring about their favorite strategies, or decisions they remember that made a big difference for a specific design process.

One can imagine building an application that lets designers navigate a design space map, on which simple example data sets with coded sonification designs are located. When one moves in an area that corresponds to the dimensionality of the data under study, the nearest example pops up, and can be adapted for experimentation with one's own data. Obviously such examples should be canonical and capture established sonification best practices and guidelines, e.g. concerning mapping (Walker (2000)), as well as sonification design patterns (Barrass and Adcock (2004)).

Finally, many of the strategies need not be fixed decisions made once; being able to delay many of the strategic choices, and to make them available as interaction parameters when exploring a dataset, can be extremely valuable.


5.2 Data dimensions

Before proceeding to synthesis models, it will be helpful to discuss the nature of data dimensions in more depth.

5.2.1 Data categorisation

In data analysis, data dimensions are classified by scales: data may capture categorical differences, or ordered differences, which may additionally have a metric, and a natural zero.

Table 5.1: Scale types

Scale:      Characteristics:                                Example:
nominal     difference without order                        kind of animal
ordinal     difference with order                           degrees of sweetness
interval    difference with order and metric                temperature
ratio       difference, order, metric, and natural zero     length

For nominal scales (such as kind of animal) and ordinal scales, it is useful to know the set of all occurring values, or categories (such as cat, dog, horse). The size of this set greatly influences the choices of possible representations of the values in this data dimension. For metrical scales (interval and ratio), it is necessary to know the numerical range in order to make scaling choices; knowing the measurement resolution or increment (for example, age could be measured in full years, or days since birth) and precision (e.g. tolerances of a measuring device) is also useful.
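Such bookkeeping can be done directly in code; a minimal sketch, assuming the LoadFlow data of section 5.3 are already loaded as q.data:

(
var channel = q.data.flop[0];   // households
var steps = channel.asSet.asArray.sort.differentiate.drop(1);
"range: % to %\n".postf(channel.minItem, channel.maxItem);
"smallest nonzero increment: %\n".postf(steps.select(_ > 0).minItem);
)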

5.2.2 Data organisation

Apart from the phenomena recorded, and their respective values, data may have different forms of organisational structure: individual data points may have different kinds of neighbour relations to specific other data points. The simplest case would be no organisation at all: measuring all the individual weights of a herd of cows is just a set of measured values with no order. When recording health status at the same time, each data point has two dimensions, but there is still no order. If the cows' identities are recorded as well, similar measurements at different times can be compared. If the cows have names, the data can be sorted alphabetically (nominal scale); if the cows' birth dates are known as well, the data can also be sorted by age (interval). Both sortings are derived from data dimensions, and there is no obvious best, or preferable, order. Often the order in which data are recorded is considered an implicit order; however, in the example given, it may simply be the order in which the cows happened to be weighed. In social statistics, data for individuals or aggregates without obvious neighbour relations are the most frequent case.

When physical phenomena are studied, measurements and simulations are often organised in time (e.g. time series of temperature) and space (temperature in n measuring stations in a geographical area, or force field simulations in a 3D grid of a specific resolution). These orders can actually be considered separate data dimensions; for clear differentiation, one may call a dimension which expresses a value (such as temperature) a value dimension, while a dimension that expresses an order (e.g. a position in time or space) can be called an ordering dimension or indexing dimension.

Task/Data analysis by Barrass (1997), chapter 4, provides a template that captures data dimensions systematically, as well as initial ideas for desirable ways of representation of and interaction with the data under study. As a practical example, the TaDa Analysis made for the LoadFlow dataset as a preparation for the Science By Ear workshop is reproduced here.

5.2.3 Task Data analysis - LoadFlow data

Name of Dataset: Load Flow
Date: March 12, 2006
Authors: Walter Hipp, Alberto de Campo (TaDa)
File: LoadFlow.xls (original), .tab, .mtx
Format: Excel xls original, tab-delimited for SC3, mtx format for Pure Data

The file contains 672 lines with date and time, total electrical power consumption, and consumption for five groups of power consumers.

Scenario

The Story: Load Flow describes how the electrical power consumption of different groups of consumers changes in time. A time series was taken for a week (in Winter 2004) of 15-minute average values, documenting date and time, total power consumption, and consumption for a) households, b) trade, c) agriculture, d) heating and warm water, and e) street lighting.

Tasks for this data set:

Find out which kinds of patterns can be discerned at which time domain, e.g. daily cycles versus shorter fluctuations.

Since all five individual channels have the same unit of measurement, find ways to represent them in a way that their values and their movements can be compared directly.

Table 5.2: The Keys

Question: Who uses how much power when? Are there patterns that recur? At what time scales? Are there overall periodicities?
Answers:  One or several of the channels; Yes/No, days/hours/times of day; categories of pattern shapes
Subject:  Relative proportions, patterns of change in time
Sounds:   ? (none at the time the TaDa analysis was written)

Table 5.3: The Task

Generic question: What is it? How does it develop?
Purpose:          Identify, compare
Mode:             interactive
Type:             continuous
Style:            exploration

Table 5.4: The Data/Information

Level:        Intermediate and global
Reading:      Conventional (possibly direct)
Type:         5 channels, ratio
Range:        continuous
Organization: time

Table 5.5: The Data

Type:         5 channels of ratio scale with absolute zero
Range:        Individual channels 0 - 2.24, total power 1.08 - 4.55
Organisation: Time

Appendix

Figure 5.4: LoadFlow - time series of dataset (averaged over many households)

Figure 5.5: LoadFlow - time series for 3 individual households


5.3 Synthesis models

Perceptualisation designs always require decisions about in what manner precisely data values (the sonificate) determine perceptible representations (in the case of auditory representation, the sonifications). While section 5.1 focused on which data subsets are to be presented in the rendering, this section covers the question of which technical aspects of the deployed sound synthesis algorithms are to be determined by which data dimensions. The three sonification strategies defined in section 5.1.2 are discussed in more depth, and concrete examples of synthesis processes are provided in ascending complexity.

With all strategies, from the very simplest to the most complex model-based designs, decisions of mappings (of data dimensions or model properties) to synthesis parameters are required; these decisions need to be informed by perceptual principles such as those covered in chapter 2. While building sonification designs may be technically simple, mapping choices are by no means trivial. One aspect to consider is metaphorical proximity: mappings that relate closely to concepts in the scientific domain may well reduce cognitive load and thus allow for better concentration on exploration tasks. (For a discussion of performance of clearly defined tasks with intuitive, okay, random, and intentionally bad mappings, see Walker and Kramer (1996), described in section 2.5.)

Another aspect is the clarity of the communicative function to be fulfilled in the research context: what will a perceptible aspect of the sound serve as? Some possible categories are:

analogic display of a data dimension - a value dimension mapped to a synthesis parameter which is straightforward to recognise and follow

a label - identification for a stream, needed when several streams are heard in parallel

an indexing strategy - ordering the data by one dimension, then indexing into subsets

context information/orientation - mapping non-data, e.g. using clicks to represent a time grid (a small sketch of this follows at the end of this section)

Finally, it is essential to understand the resolution of perceptual dimensions, such as Just Noticeable Differences (JNDs). Note that sound process parameters need not be directly perceptible; they may govern aspects of the sound that will indirectly produce differences that may be described perceptually in other terms. Perceptual tests can be integrated into the sonification design process, like writing tests to verify that new code works as intended. Writing examples that test whether a specific concept produces audible differences for the data differences of interest can provide such immediate confirmatory feedback, as well as direct learning experience for the test listeners immediately at hand. Such examples also provide a good base for discussions with domain specialists.

Similar mapping decisions come up in the process of designing electronic or software-based music instruments; how the ranges of sensor/controller inputs (the equivalent to data to be sonified) are scaled into synthesis parameter ranges determines how playing that instrument will feel to a performer.
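As a small sketch of the context information category listed above, here is a hypothetical grid of clicks marking day boundaries when a week of data is scaled into 3 seconds; it assumes the ProxySpace setup used in the code examples below:

(
~clicks.play;
~clicks = {
    var trig = Impulse.ar(7 / 3);   // 7 day boundaries within 3 seconds
    Pan2.ar(Decay2.ar(trig, 0.001, 0.02) * PinkNoise.ar(0.3));
};
)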

5.3.1 Sonification strategies

The three most common concepts, Continuous Data Representation, Discrete Point Data Representation, and Model-Based Data Representation, correspond closely to the approaches first described in Scaletti (1994). The examples given again use the LoadFlow dataset, and loosely follow the order given by Scaletti. Pauletto and Hunt (2004) briefly describe how different data characteristics sound under different sonification methods: static areas, trends, single outliers, discontinuities, noisy sections, periodicities (loops), or near-periodicities are simple characteristics that may occur in a single data dimension, and will be used as examples of easily detectable phenomena. The data for the code examples can be prepared as follows:
(  // load data file
q = q ? ();   // a dictionary to store things by name
// load tab-delimited data file:
q.text = TabFileReader.read("LoadFlow.tab".resolveRelative, true, true);
// keep the 5 interesting channels, convert to numbers
q.data = q.text.drop(1).collect { |line| line[3..7].collect(_.asFloat) };
// load one data channel into a buffer on the server
q.buf1 = Buffer.loadCollection(s, q.data.flop[0]);   // households
)

5.3.2 Continuous Data Representation

Audification is the simplest case of continuous data representation: typically, converting the numerical values of a long enough time series into a soundfile is a good first pass at finding structures in the data. Scaletti (1994) calls this 0th-order sonification. Scaling the numerical values is straightforward, as one only needs to fit them into the legal range for the type of soundfile to be used; for high precision, 32-bit floating point data can be converted to sound file formats without any loss of information. For audification, one can simply scale the (expected or actual) maximum and minimum values to the conventional -1.0 to +1.0 range for audio signals at full level. This maps the data dimension under study directly to the amplitude of the audible signal.

Making the playback rate user-adjustable allows for simple time-scaling; one can change the expected gestalt size interactively. The fastest time-scaling value will typically be around 40-50 kHz, which includes the default sample rates of most common audio hardware; this puts roughly 100,000 data points into working memory, which makes audification the fastest option for screening large amounts of data with minimal preprocessing. Typical further operations to provide are: selection of an index range in the data, options for looped and non-looped playback, and synchronised visual display of the waveform under study. The EEGScreener described in chapter 9.1 is an example of a powerful, flexible audification instrument.

Of the phenomena to be detected, static values will become silent: the human ear does not hear absolute pressure values, and while audio hardware may output DC offsets, loudspeakers do not render these as reproducible pressure offsets. Trends are also not represented clearly: ramp direction is not an audible property. Single outliers become sharp clicks, and discontinuities (e.g. large steps) become loud pops. Rapidly fluctuating sections will sound noisy, and periodicities will be easy to discern even if they are only weak components in mixed signals.

Code examples for 0th order - audification:
p = ProxySpace.push;   // prepare sound
~audif.play;           // start an empty sound source

// play entire week once, within 0.05 seconds
~audif = { PlayBuf.ar(1, q.buf1, q.buf1.duration / 0.05) * 0.1 };

// try agriculture data (Buffer's instance method takes the collection only)
q.buf1.loadCollection(q.data.flop[2]);

// play the entire week looped
~audif = { PlayBuf.ar(1, q.buf1, q.buf1.duration / 0.05, loop: 1) * 0.1 };

The next example loops over an adjustable range of days; starting day within the week and loop length can be set in days.
(
~audif = { |dur = 0.05, day = 0, length = 1|
    var stepsPerDay = 96;
    var start = day * stepsPerDay;
    var rate = q.buf1.duration / dur;
    // read position in the data buffer:
    var phase = Phasor.ar(1, rate, start, start + (length * stepsPerDay));
    BufRd.ar(1, q.buf1, phase, interpolation: 4) * 0.1;
};
)

The next example loops a single day, and allows moving the day-long time window, thus navigating by mouse - this is the solution SBE TeamB developed.
(
~audif = {
    var start = MouseX.kr;              // time in the week (0 - 1)
    var range = BufFrames.kr(q.buf1);   // full range is one week
    var rate = 1 / 10;                  // guess a usable rate
    var phase = Phasor.ar(0, rate, 0, range / 7) + (start * range);
    var out = BufRd.ar(1, q.buf1, phase, interpolation: 4);
    out = LeakDC.ar(out * 0.5);         // remove DC offset
};
)
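Returning to the phenomena listed at the beginning of this section: one can illustrate them by deliberately manipulating the data. A minimal sketch (the inserted value is invented) that turns a single artificial outlier into a sharp click on every loop pass:

(
var channel = q.data.flop[0].copy;
channel[300] = 10;   // invented extreme value in the households channel
q.bufOutlier = Buffer.loadCollection(s, channel);
)
// audify the manipulated week, looped twice per second
~audif = { PlayBuf.ar(1, q.bufOutlier, q.bufOutlier.duration / 0.5, loop: 1) * 0.1 };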

Parameter mapping continuous sonification, or what Scaletti calls 1st-order sonification, maps data dimensions onto directly audible synthesis parameters, such as pitch, amplitude (of a carrier signal), brightness, etc. Here, the simplest case would be mapping to frequency (a synthesis parameter), respectively pitch (a perceptual property of the rendered sound). The first example maps the data range of 0 - 2.24 into a pitch range of (midinote) 60 - 96, or frequencies between ca. 260 and 2000 Hz, time-scaled into 3 seconds.
// loop a week's equivalent of data
(
~maptopitch.play;
~maptopitch = { |loopdur = 3|
    var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);
    var pitch = datasignal.linlin(0, 2.24, 60, 96);   // scale into 3 octaves
    var sound = SinOsc.ar(pitch.midicps) * 0.2;
    Pan2.ar(sound);
};
)

It may seem a little over-engineered here, but in general it is a good idea to consider what the smallest data variations of interest are, and whether they will be audible in the mapping used.

While data for Just Noticeable Differences for some perceptual properties of sound exist in the literature, their values will depend on the experimental context and circumstances. Thus, rather than relying only on experiments which were conducted for other purposes, it makes sense to do at least some perceptual tests for the intended usage context. For the example given above, data resolution is 0.01 units; scaled from the range [0, 2.24] into [60, 96], this creates a minimum step of 0.01 * 36 / 2.24, or 0.16 semitones. The literature agrees that humans are most sensitive to pitch variation when it occurs at a (vibrato) rate of ca. 5 Hz, so a first test may use a pitch of 78 (center of the chosen range), a drift/variation rate of 5 Hz, and a variation depth of ±0.08 semitones; all of these can be adjusted to find the marginal conditions where pitch variation is just noticeable.
(
~test.play;
~test = { |driftrate = 5, driftdepth = 0.08, centerpitch = 78|
    var pitchdrift = LFNoise0.kr(driftrate, driftdepth);
    SinOsc.ar((centerpitch + pitchdrift).midicps) * 0.2
};
)

Changing driftrate, driftdepth and centerpitch will give an impression of how this behaves; to my ears, 0.08 is in fact very near the edge of noticeability. One could systematically test this by setting drift depth to random start values above and below the expected JND, and having test persons do e.g. forced-choice tests that would converge on the border for a given drift rate and center pitch.

The next example maps the same data values to amplitude, which could seem metaphorically closer - the data value is consumed energy, and amplitude is directly correlated to acoustical energy. However, the rendering is perceptually not very clear: humans are good at filling in dropouts in audio signals, such as speech phonemes masked in noisy environments, or damaged by bad audio connections such as intermittent telephone lines. The patterns that emerged in the pitch example, where the last three days are clearly different, almost disappear. Changing to linear mapping instead of exponential makes little difference.
(
~maptoamp.play;
~maptoamp = { |loopdur = 3|
    var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);
    var amp = datasignal.linlin(0, 2.24, -60, -10).dbamp;   // exponential mapping
    // var amp = datasignal * 0.2;   // linear mapping
    var sound = SinOsc.ar(300) * amp;
    Pan2.ar(sound);
};
)

The next example shows what Scaletti calls a second-order mapping. The data are mapped to a parameter that controls another parameter, phase modulation depth; however, perceptually this translates roughly to brightness (which could be considered a first-order audible property).
(
~maptomod.play;
~maptomod = { |loopdur = 3|
    var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);
    var modulator = SinOsc.ar(300) * datasignal * 2;
    var sound = SinOsc.ar(300, modulator) * 0.2;
    Pan2.ar(sound);
};
)

5.3.3 Discrete Data Representation

As an alternative to creating continuous signals based on data dimensions, one can also create streams of events, which may sound note-like when slower than ca. 20 events per second; at higher rates, they are best described with Microsound terminology, as granular synthesis. The example below demonstrates the simplest case: one synthesis event is created for each data point, with a single data dimension mapped to one parameter. A duration of 3 seconds will create a continuous-seeming stream; 10 seconds will sound like very fast grains, while 30 seconds takes the density down to 22.4 events per second, which can seem like very fast marimba-like sounds.
(
~grain.play;
~grain = { |pitch = 60, pan|
    var sound = SinOsc.ar(pitch.midicps);
    var envelope = EnvGen.kr(Env.perc(0.005, 0.03, 0.2), doneAction: 2);
    Pan2.ar(sound * envelope, pan)
};
// ~grain.spawn([\pitch, 79]);

Tdef(\data, {
    var duration = 10;
    var datachannel = q.data;
    var power;
    q.data.do { |chans|
        power = chans[0];   // households
        ~grain.spawn([\pitch, power.linlin(0, 2.24, 60, 96)]);
        (duration / datachannel.size).wait;
    };
}).play;
)

5.3.4 Parallel streams

When the dimensions in a data set are directly comparable (like here, where they are all power consumption measured in the same units at the same time instants), it is conceptually convincing to render them as parallel streams. Auditory streams, as discussed in Bregman (1990) and Snyder (2000), are a perceptual concept: a stream is formed when auditory events are grouped together perceptually, and multiple streams can form when all the auditory events separate into several groups. With the example above, a minimal change can be made to create two parallel streams: Instead of creating one sound event for one data dimension, one creates two, and pans them left and right for separating the two streams by spatial location.
(
Tdef(\data, {
    var duration = 10;
    var datachannel = q.data;
    var powerHouse, powerAgri;
    ~grain.play;
    q.data.do { |chans|
        powerHouse = chans[0];
        powerAgri = chans[2];
        ~grain.spawn([\pitch, powerHouse.linlin(0, 2.24, 60, 96), \pan, -1]);
        ~grain.spawn([\pitch, powerAgri.linlin(0, 2.24, 60, 96), \pan, 1]);
        (duration / datachannel.size).wait;
    };
}).play;
)

When presenting several data dimensions simultaneously, one can obviously map them to multiple parameters of a single synthesis process, thus creating one stream with multiparametric controls. This makes the individual events fairly complex, and may require that each event has more time to unfold perceptually. (In the piece Navegar, a fairly complex mapping is used, see section 11.3.)
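A minimal sketch of such a multiparametric single stream, reusing the grain approach of the previous examples (the mapping choices and scalings here are invented for illustration): households control pitch, agriculture controls brightness via phase modulation depth.

(
~grain2.play;
~grain2 = { |pitch = 60, modDepth = 0.5, pan = 0|
    var carrfreq = pitch.midicps;
    var modulator = SinOsc.ar(carrfreq) * modDepth;
    var sound = SinOsc.ar(carrfreq, modulator);
    var envelope = EnvGen.kr(Env.perc(0.005, 0.03, 0.2), doneAction: 2);
    Pan2.ar(sound * envelope, pan)
};
Tdef(\multiparam, {
    var duration = 10;
    q.data.do { |chans|
        ~grain2.spawn([
            \pitch, chans[0].linlin(0, 2.24, 60, 96),      // households -> pitch
            \modDepth, chans[2].linlin(0, 2.24, 0.1, 3)    // agriculture -> brightness
        ]);
        (duration / q.data.size).wait;
    };
}).play;
)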

It should be noted that what is technically created as one stream of sound events is not guaranteed to fuse into one perceptual stream - it may split into several layers, just like separately created multiple streams may perceptually merge into a single auditory stream. In fact, as perception is strongly influenced by a listener's attitude, one can intentionally choose analytic or holistic listening attitudes: either focusing on details of rather few streams, or listening to the overall flow of the unfolding soundscape - whether it is a piece of music or a sonification.

5.3.5 Model Based Sonification

In Model-Based Sonification (Hermann and Ritter (1999)), the general concept is that the data values are not mapped directly, but inform the state of a model; properties of that model (which is a kind of front-end) are then accessed when user input demands it (e.g. by exciting the model with energy, somewhat akin to playing a musical instrument). The model properties then determine how the sound engine renders the current user input; this backend inevitably contains some mapping decisions to which the considerations given here can be applied.

Till Bovermann's example implementation of the Data Sonogram (Bovermann (2005)) is a good compact example for MBS. The approach is to treat the data values as points in n-dimensional space (for the example Iris data set, 4); user input then triggers a circular energy wave propagating from a current user-determined position, and the reflections of each data point are simulated by mapping distance (in 4D space) to amplitude and delay time, as if in natural 3D space. The other parameters for the sound grains (frequency, number of harmonics) are also determined by data-based mappings.

The Wahlgesänge sonification based on this example uses a somewhat more elaborate mapping: distance in 2D is mapped to delay and amplitude, with user-tunable scaling; panning is determined by 2D circular coordinates; the data value of interest (voter percentage) is mapped to the sound grain parameter pitch, and controls for attack/decay times make the tradeoff between auditory pitch resolution and time resolution explicit. Both of these examples are too extended for the context here; but they are both available online, and Wahlgesänge is described in more detail in section 6.2.

While it would be worthwhile to analyse more MBS examples in detail, this is beyond the scope of the present thesis. Further research will be necessary for a more fine-grained integration of the model-based approach into the context of the sonification models given here.
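As a toy illustration of the excitation/reflection idea (an invented miniature, not Bovermann's implementation; the 2D stand-in dataset and all scalings are made up): each data point reflects a user-triggered impulse, with distance mapped to delay time and amplitude:

(
~ping.play;
~ping = { |freq = 800, amp = 0.1|
    var env = EnvGen.kr(Env.perc(0.001, 0.05), doneAction: 2);
    Pan2.ar(SinOsc.ar(freq) * env * amp)
};
Tdef(\sonogram, {
    var center = [1.0, 1.0];                   // excitation point in data space
    var points = { { 2.0.rand } ! 2 } ! 200;   // invented 2D dataset
    var dists = points.collect { |p| (p - center).squared.sum.sqrt }.sort;
    var now = 0;
    dists.do { |dist|
        var delay = dist * 0.5;                // distance -> delay time
        (delay - now).wait;
        now = delay;
        ~ping.spawn([\amp, 0.2 / (1 + (dist * 4))]);   // distance -> amplitude
    };
}).play;
)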


5.4 User, task, interaction models

Humans experience the world with all their senses, and interacting with objects in the world is the most common everyday activity they are well trained at. For example, handling physical objects may change their visual appearance, and touching, tapping or shaking them may produce acoustic responses they can use to learn about objects of interest. Perception of the world, action in it, and learning are tightly linked in human experience, as discussed in section 2.3. In artificial systems that model aspects of the world, from office software to multimodal display systems, or sonification systems in particular, interaction crucially determines how users experience such a system: whether they can achieve tasks correctly with it (effectiveness), whether they can do so in reasonable amounts of time (efficiency), and whether they enjoy the working process (positive user experience, pleasantness).

This section looks at potential usage situations of sonification designs and systems: the people working in these contexts (sonification users); the goals they will want to pursue by means of (or supported by) sonification; the kinds of tasks entailed in pursuing these goals; the kinds of interfaces and/or devices that may be useful for these goals; and some notions of how to go about matching all of these.

5.4.1 Background - related disciplines

Interaction is a field where a number of disciplines come into play. Human Computer Interaction (HCI) studies the alternatives for communication between humans and computers (from translating user actions into input for a computer system to rendering computer state into output for the human senses), sometimes to amazing depths of detail and variety (Buxton et al. (2008); Dix et al. (2004); Raskin (2000)).

Musical instruments are highly developed physical interfaces for creating highly differentiated acoustic sound, with a very long tradition; in electronic music performance, achieving similar degrees of control flexibility (or better, control intimacy) has long been desirable. While the mainstream music industry has focused on a rather restricted set of protocols (MIDI) and devices (mostly piano-like keyboards, simulated tape machines, and mixing desks), experimental interfaces that allow very specific, sometimes idiosyncratic ideas of musical control have been an interesting source of problems for interested engineers. The research done at institutions like STEIM [17] (see Ryan (1991)) and CNMAT [18] (Wessel (2006)) has made interface and instrument design its own computer music sub-discipline, with its own conference (NIME [19], or New Instruments/Interfaces for Musical Expression, since 2001).

[17] http://www.steim.nl
[18] http://cnmat.berkeley.edu
[19] http://www.nime.org

Computer game controllers tend to be highly ergonomic and very affordable; thus they have become a popular resource for artistic (re-)appropriation as cheap and expressive music controllers: gamepads, and more recently, Wii controllers, have both been adopted as is, and creatively rewired for specialised artistic uses. This has been part of an emerging movement toward more democratic electronic devices: beginning with precursors like Circuit Bending (Ghazala (2005), extending the design of sound devices by introducing controlled options for what engineers might consider malfunction), designers have created open-source hardware - such as the Arduino microcontroller board [20] - to simplify experimentation with electronic devices. With these developments, finding ways to create meaningful connections and new usage contexts for object-oriented hardware (Igoe (2007)) has become interesting for a much larger public than strictly electronics engineers and tinkerers.

[20] http://www.arduino.cc

5.4.2 Music interfaces and musical instruments

CD/DVD players or MP3 players tend to have rather simple interfaces: play the current piece, make it louder or softer, go to the next or previous track, use randomised or ordered playback of tracks. A piano has a simple interface for playing single notes: one key per note, ordered systematically, and hitting the key with more energy will make it louder. Thus, beginners can experience rather fast success at finding simple melodies on this instrument. Playing polyphonic music really well on piano is a different matter; as Mick Goodrick puts it, in music there is room for infinite refinement (Goodrick (1987)). On a violin, learning to produce good tone already takes a lot of practice; and playing in tune (for whichever musical culture one is in) requires at least as much practice again. (One is reminded of the joke where a neighbour asks: why can't your children spend more time practicing later, when they can already play better?) Instruments from non-western cultures may provide interesting challenges: playing nose flutes is a good example of an instrument that involves the coordination of unusual combinations of body parts, thus developing (in Western contexts) rather unique skills.

However, a violin allows very subtle physical interaction with musical sound while it is sounding, and in fact requires that skill for playing expressively. On piano, each note sounds by itself once it has been struck; thus the relations between the keys pressed, such as chord balance, micro-timing between notes, and agogics, are the main strategies for playing expressively on the piano.

In Electronic Music performance, mappings between user actions as registered by controllers (input devices like the ones HCI studies: buttons, sliders, velocity-sensitive keys, sensors for pressure, flexing, spatial position etc.) and the resulting sounds and musical structures are essentially arbitrary - there are no physical constraints as in physical instruments. Designing satisfying personal instruments with digital technology is an interesting research topic in music and media art; e.g. Armstrong (2006) bases his approach on a deep philosophical background, and discusses his example instrument in these terms; Jordà Puig (2005) provides much historical context of electronic instruments, and discusses an array of his own developments in that light. Thor Magnusson's (and others') ongoing work with ixi software [21] explores applying intentional constraints to interfaces for creating music in interesting ways.

5.4.3 Interactive sonification

The main researchers who have been raising awareness for interaction in sonification are Thomas Hermann and Andy Hunt, who started the series of Interactive Sonification workshops, or ISon [22]. In the introduction to a special issue of IEEE Multimedia resulting from ISon 2004, the editors give the following definition: We define interactive sonification as the use of sound within a tightly closed human-computer interface where the auditory signal provides information about data under analysis, or about the interaction itself, which is useful for refining the activity. (Hermann and Hunt (2005), p 20)

[21] http://www.ixi-software.net/
[22] http://interactive-sonification.org/

In keeping with Hermann's initial descriptions of Model-Based Sonification (Hermann (2002)), they maintain that learning to play a sonification design with physical interaction, as with a musical instrument, really helps users acquire an understanding of the nature of the perceptualisation processes involved and of the data to be explored. They find that there is not enough research on how learning in interactive contexts actually occurs. The Neuroinformatics group at University Bielefeld (Hermann's research group) has studied a number of very interactive interfaces in sonification contexts: recognizing hand postures to control data exploration (Hermann et al. (2002)), a malleable surface for interaction with model-based sonifications (Milczynski et al. (2006)), tangible data scanning using a physical object to control movement in model space (Bovermann et al. (2006)), and others.

At University of York, in Music Technology, Hunt has studied both musical interface design issues (e.g. Hunt et al. (2003)) and worked on a number of sonification projects, mainly with Sandra Pauletto (e.g. Hunt and Pauletto (2006)). Pauletto's PhD thesis, Interactive non-speech auditory display of multivariate data (Pauletto (2007)), discusses interaction and sonification in great detail (pp. 56-67), and studies central sonification issues with user experiments: the first two experiments compare listening to auditory displays of data (audifications of helicopter flight data, sonifications of EMG (electromyography) data) with their traditional analysis methods (visually reading spectra, signal processing analysis). In both cases, auditory display of large multivariate data sets turned out to be an effective choice of display.

Her third experiment directly studies the role of interaction in sonification: three alternative interaction methods are provided for exploring synthetic data sets to locate a given set of structures. A low interaction method allows selection of data range, playback speed, and play/stop commands. For the medium interaction method, a jog wheel and shuttle is used to navigate the sonification at different speeds and directions. The high interaction method lets the analyst navigate by moving the mouse over a screen area that corresponds to the data, like tape scrubbing. Both objective measurements and subjective opinions found the low interaction method less effective and efficient, and preferred the two higher interaction modes. Interestingly, users preferred the medium interaction mode for its option to quickly set the sonification parameters and then let it play while concentrating on listening; the high interaction method requires constant user activity to keep the sound going. It should be noted here that these results strictly apply only to the specific methods studied, and cannot be generalised; however, they do provide interesting background.

5.4.4 The Humane Interface and sonification

The field of Human Computer Interaction (HCI) is very wide and diverse, and cannot be covered here in depth. However, a rather specialised look at some examples of interfaces may suffice to provide enough context for discussing the main issues in designing sonification interfaces. Rather than attempting to cover the entire field, I will take a strong position statement by an expert in the field as a starting point: Jef Raskin was responsible for the Macintosh Human Interface design guidelines that set the de facto standard for best practice in HCI for a long time, and his book The Humane Interface (Raskin (2000)) is an interesting mix of best practice patterns and rather provocative ideas. Here is a brief overview of the main statements by chapter:

1. Background - The central criterion for interfaces is the quality of the interaction; it should be made as humane as possible. Humane means responsive to human needs, and considerate of human frailties. As one example, the user should always determine the pace of interaction.

2. Cognetics - Human beings only have a single locus of attention, and a single focus of attention, which in interactions with machines is nearly always on the task they try to achieve. (Raskin's motto for the chapter is a quote from Chris, a character in the TV series Northern Exposure: "I can't think of X if I'm thinking of Y.") Computers and interfaces should not distract users from their intentions. Human beings always tend to form habits; user interfaces should allow the formation of good habits, as through benign habituation competence becomes automatic. A possible measure of how well an interface supports benign habituation is to imagine whether a blind user could learn it. As a more general point, humans mostly use computers to get work done; here, user work is sacred, and user time is sacred.

3. Modes - Modes are system states where the same user gesture can have different effects, and are generally undesirable; one should eliminate modes where possible. The exception to the rule is physically maintained modes, which he calls quasi-modes (entered e.g. by holding down a special key, and reverted to normal when the key is released). Visible affordances should provide strong clues as to their operations. If modes cannot be entirely avoided, monotonic behaviour is the next best solution: a single gesture always causes the same single operation; and in a mode where the operation is not meaningful, the gesture should do nothing. It is worth keeping in mind that everyone is both expert and novice at the same time when different aspects of a system are considered.

4. Quantification - Interface efficiency can be measured, e.g. with the GOMS keystroke model. For most cases, back-of-the-envelope calculations give a good first indication of efficiency; standard times for hitting a key, pointing by mouse, moving between mouse and keyboard, and mentally preparing an action are sufficient for that (a worked example follows after this list). Finding the minimum combination for a given task is likely to make that task more pleasant to perform. Obviously, the time a user is kept waiting for software to respond should be as low as possible; while a user is busy with other things, s/he will not notice waiting times.

5. Unification - This chapter ranges far beyond the scope needed here, eventually making a case that operating systems and applications should disappear entirely. Fundamental actions are catalogued, and variants of computer-wide string search are discussed as one example of how system-wide unified behaviour should work.

6. Navigation - The adjectives intuitive and natural, when used for interfaces, generally translate to familiar. Navigation as with the ZoomWorld approach might be interesting for organising larger collections of sonification designs; for the context of the SonEnvir project these ideas were not applicable.

7. Interface issues outside the user interface - Programming environments are notoriously bad interfaces, and have actually been getting worse: On a 1984 computer, starting up, running Basic, and typing a line to evaluate 3 + 4 could be accomplished in maybe 30 seconds; on a current (2000) computer, every one of these steps takes much longer, even for expert users.
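As a hedged back-of-the-envelope example of such a GOMS keystroke-level calculation (using the commonly cited constants, which are not given in the text above: K = 0.2 s per keystroke, P = 1.1 s per pointing gesture, H = 0.4 s for moving a hand between mouse and keyboard, M = 1.35 s for mental preparation): setting a playback speed by clicking a number box and typing three digits plus return costs roughly M + P + K + H + 4K = 1.35 + 1.1 + 0.2 + 0.4 + 0.8 = 3.85 seconds, whereas dragging a slider to an approximate position costs only about M + P = 2.45 seconds. Which variant is preferable then depends on whether exact values are needed; such rough comparisons are usually all that is required to choose between interface variants.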

Relevance to sonification and the SonEnvir project

The notion most closely related to disappearing system software (chapter 5) is the Smalltalk heritage of SC3. Smalltalk folklore says that when Smalltalk was a little girl, she thought she was an operating system: one could do almost everything within Smalltalk, including one of Raskin's major desirables, namely, defining new operations by text at any time, which change or extend the ways things work in a given environment.

The question of what user content is actually being created is extremely important in sonification work: In sonification usage situations, content to keep can comprise uses of a particular data file, particular settings of the sonification design, perceptual phenomena observed with these data and settings, and text documentation, i.e. descriptions of all of the above, and possibly the user actions to take to cause certain phenomena to emerge. The text editor and code interface in SC3 is well suited for this: commands to invoke a sonification design (e.g. written as a class), code to access a specific data file, and notes of observations can be kept in a single document, as they are all just text. Across different sonification designs, SC3 behaves uniformly in this respect. Compared to most programming environments, the SC3 environment allows very fluid working styles. Documentation within program creation (literate programming, as Donald Knuth called it) is supported directly.
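As an illustration, a minimal sketch of such a self-documenting session file might look as follows; the class name SonifDesign and the data file path are hypothetical placeholders, not actual SonEnvir names:

// a single SC3 document: design invocation, data access, and observations
(
~data = TabFileReader.read("~/data/example_data.txt".standardizePath);
~sonif = SonifDesign.new(~data);    // SonifDesign: placeholder class name
)
~sonif.play;    // start listening
~sonif.stop;
/* notes, kept in the same document:
   at speedup x20, a repeating pattern appears around data row 1200;
   settings used: speed = 20, filtering off. */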

5.4.5 Goals, tasks, skills, context

From a pragmatic point of view, a number of compromises need to be balanced in interaction design, especially when it is just one of several aspects to be negotiated:

Simple designs are quicker to implement, test and improve than more complex designs. Given that one usually understands requirements much better by implementing and discussing sketches, simpler designs will often be better.

Exotic devices can be very interesting; however, they limit transferability to other users, and require extra costs and development time. Even when there is a strong reason to use a special interface device, including a fallback variant with standard UI devices is recommended.

Functions should be made clearly available to the users; usually that means making them visible affordances. (Buxton et al. (2008) argue here that the attitude "you can do that already, in some arcane way experts may know about" means that final users will not use that implemented function.)

Goals are firmly grounded in the application domain, and with the users. What do users want to achieve with the sonification design to be created? The goals will naturally be different for different domains, datasets, and contexts (e.g. research prototypes or applications for professional use); nevertheless, these examples may apply to most designs:

- experience the differences between comparable datasets of a given kind
- find phenomena of interest within a given dataset, e.g. at specific locations, with specific settings
- document such phenomena and their context, as they may become findings
- make situations in which phenomena of interest occurred repeatable for other users

The interaction design of a sonification design should allow the user's focus of attention to remain at least close to these top-level goals. Ideally, the design should add as little cognitive load as possible, to keep the user's attention free for the goals. The sonification design's interface should offer ways to achieve all necessary and useful actions toward these goals. The concepts for these actions should obviously be formulated in terms of the mental model the user has of the data and the domain they come from.

Tasks comprise all the actions users take to achieve their top-level goals. Tasks can be directly functional for attaining a goal, or necessary to change the system's state such that a desired function becomes available. Systems that often require complicated preparation to get things done tend to distract users from their goals, and are thus experienced as frustrating. Some example tasks that come up when using a sonification design are:

- load a sonification design of choice (out of several available)
- load a dataset to explore
- start the sonification
- compare with different datasets
- tune the sonification design while playing
- explore different regions of a dataset by moving through them
- look up documentation of the sonification design details
- start, repeat, stop sonification of different sections
- store a current context: a dataset, current selection, current sonification parameter settings, and accompanying text/explanation.

For all these tasks, there should be visible affordances that communicate to the user how the related tasks can be done. Ideally, a single task should be experienced as one clear sequence of individual actions (or subtasks). More complex tasks will be composed of a sequence of subtasks. As novice users acquire more expertise, they will form conceptual chunks of these operations that belong together. As long as these subtasks require meaningful decisions, it is preferable to keep them separate; if there is only a single course of action, one should consider making it available as a single task.

Skills are what users need to have or acquire to use an interface efficiently. These can include physical skills like manual dexterity, knowledge of operating systems, and other skills. In the HCI literature, two conflicting viewpoints can be found here: a. users already possess skills that should be re-used; one should add as little learning load as possible, and enable as many users as possible to use a design quickly; b. interfaces should allow for long-term improvement, and enable motivated users to eventually learn to do very complex things very elegantly. Which of these applies will depend on the context the sonification is designed for; in any case, it is advisable to consider well what one expects of users in terms of learning load. Some necessary knowledge and skills include:

- locating files (e.g. program files, data files)
- reading documentation files
- selecting and executing program text
- using program shortcuts (e.g. start, stop)
- using input devices like mice, trackballs, tablets

Context should be represented clearly to reduce cognitive load: all changeable settings should be visible, e.g. on a graphical user interface, such as the choice of data file, sonification parameter settings, current subset choice, and others. Often, the display elements for these can double as affordances that invite experimentation. In some cases, it can be useful to display the current data values graphically, or to double auditory events visually as they occur in realtime playback.

5.4.6 Two examples

EEG players

In the course of the SonEnvir project, most of the interaction design was done in collaborative sessions. One exception that required more formal procedures was redesigning the EEG Screener and Realtime Player (discussed in depth in chapter 9.1), as the intended expert users were not available for direct discussion. These designs went through a full design revision, with a task analysis that is identical for most of the interface. The informal wish list included: simple to use, start in very few steps, low effort, keep results reproducible; include a small example data file that can be played directly. The task analysis comprised these items:

Goals: quickly screen large EEG data files to find episodes to look at in detail

Tasks:
1. locate and load EEG files in edf format
2. select which EEG electrode channels will be audible
3. select data range for playback: which time segment within the file, speedup factor, filtering
4. play control: play, stop, pause, loop; feedback on current location
5. document the current state so it can be reproduced by others
6. include online documentation in German
7. later: prepare example files for different absences

All of these were addressed with the GUI shown in figure 9.2:

1. File selection is done with a Load EDF button and a regular system file dialog; for faster access, the edf file is converted to soundfiles in the background, and feedback is given when ready.

2. Initially, channel selection was only planned with popup menus and the electrode names; however, making a symbolic map of the electrode positions on the head and letting users drag-and-drop electrode labels to the listening locations (see figure 9.3) proved much appreciated by the users.

3. Time range within the file was realised in multiple ways: graphical selection within a soundfile view showing the entire file; providing the start time, duration, and end time as adjustable number boxes; and showing the selected time segment in a magnified second soundfile view. This largely follows sound editor designs, which EEG experts are typically not familiar with.

4. Play controls are implemented as buttons; play state is shown by button colour (white font when active) and by a moving cursor in both soundfile views. The cursor's location is also given numerically. Looping and filtering are also controlled by buttons; in looped mode, a click plays when the loop point is crossed. In filter mode, the volume controls for the individual bands are enabled; when filtering is off, these controls are disabled for clarity. Adjustable playback parameters are all available as named sliders, with the exact numerical values and units. (Recommended presets for different usage situations were planned, but eventually not realised.)

5. The current state can be documented with buttons and shortcuts: The Take Notes button opens a text window, which contains the current filename; the current time and playback settings can be pasted into it, so they can be reconstructed later.

6. The Help button opens a detailed help page in German.

The EEG Realtime Player re-uses this design with minimal extensions, as shown in figure 9.5; this reduces learning time for both designs, which are intended for the same group of users. The main differences are the use of different time units (seconds instead of minutes) and more parameter controls, as the synthesis concept is more elaborate.

Wahlgesänge

This design is described in detail in section 6.2; its GUI is shown in figure 6.5. As this design follows a Model-Based concept, the realtime interaction mode is central:

Goals: compare the geographical distribution of voters for ca. 12 parties in four elections in a region of Austria.

Tasks:
1. switch between a fixed range of elections and parties to explore
2. inject energy by interaction to excite the model at a visually chosen location
3. compare parties and elections quickly one after another
4. adjust free sonification parameters like timescale

1. Choosing which election and party results to explore is done with two groups of buttons which show all available choices. The currently active button has a white font.

2. As is common in Model-Based Sonification, this design requires much more interaction: to obtain sound, users must click on the geographical map. This causes a circular wave to emerge from the given location, which spreads over the entire extent of the map. Each data point is indexed by its spatial location on the map; when the expanding wave hits it, a sound is played based on its value for the current data channel (voter turnout for one of the parties).

3. For faster comparisons, switching to a new election or party plays the sonification for the new choice with the last spatial location; switching between parties can also be done by typing reasonably mnemonic characters as shortcuts.

4. The free sonification parameters, like the expansion speed of the wave, the number of data points to play (to reduce spatial range), etc., can be adjusted with sliders which also show the precise numerical values.

Full explanations are given in a long documentation section before the program code, which was deemed sufficient at the time. An interesting possible extension here would be the use of a graphics tablet to obtain a pressure value when clicking on the map; this would be equivalent to a velocity-sensitive MIDI keyboard. However, in the interest of easier transfer to other users, we preferred to keep the design independent of specific non-standard input devices.

5.5 Spatialisation Model

The most immediate quality of a sound event is its localisation: What direction did that sound come from? Is it near or far away? We often spontaneously turn toward an unexpected sound, even if we were not paying attention earlier. Spatial direction is also one of the stronger cues for stream separation or fusion (Bregman (1990), Snyder (2000), Moore (2004)); when sound events come from different directions, they are unlikely to be attributed to the same physical source. Music technology has developed a variety of solutions for spatialising synthesized sound, and both SuperCollider3 and the SonEnvir software environment support multiple approaches for different source characteristics and different reproduction setups. Sources can either be continuous or short single events; while continuous sources may have fixed or moving spatial positions, streams of individual events may have different spatial positions for each event. In effect, giving each individual sound event its own static position in space is a granular approach to spatialisation.

(1D) Stereo rendering over loudspeakers works well for a few parallel streams, where spatial location mainly serves to identify and disambiguate streams. The most common spatialisation method employed is amplitude panning, which relies on the illusion of phantom sound sources created between a pair of loudspeakers, with the perceived position depending on the ratio of signal levels between the two speakers. Panorama potentiometers (pan pots) on mixing desks employ this method. Sound localisation on such setups is of course compromised at listening positions outside the sweet spot.

(2D) Few-channel rendering is typically done with horizontal rings of 4-8 speakers. This has become easier in recent years with 5.1 (by now, up to 7.1) home audio systems, which can be used with external input from multichannel audio interfaces. Such systems can spatialise sources on the horizontal plane quite well, and can also be used as up to 7 static physical point sources.

(3D) Multichannel systems, such as the CUBE at IEM Graz with 24 speakers, or the Animax Multimedia Theater in Bonn with 40 speakers, are usually designed for symmetry, spreading a number of loudspeakers reasonably evenly over the surface of a sphere. This allows for good localisation of sources on the sphere, with common spatialisation approaches including vector based panning, Ambisonics, and Wave Field Synthesis. Source distances outside the sphere can be simulated well by reducing the level of the direct sound relative to the reverb signal, and lowpass filtering it.

(1D/3D) Headphones are a special case: they can be used to listen to stereo mixes for loudspeakers (and most listeners today are well trained at localising sounds with this kind of spatial information); and they can be used for binaural rendering, i.e. sound environments that feature the cues which allow for sound localisation in normal auditory perception. For music, this may be done with dummy head recordings; for auditory display, it is done with simulations of these cues applied to each sound source individually to create its spatial characteristics.

5.5.1 Speaker-based sound rendering

Physical sources

For multiple speaker setups, a simple and very effective strategy is to use individual speakers as real physical sources. The main advantage is that physics really helps in this case; when locations only serve to identify streams, as with few fixed sources, fixed single speakers work very well.

Amplitude Panning

The most thorough overview of amplitude panning methods is provided in Pulkki (2001). Note that all of the following methods work for both moving and static sources; code examples for all of them are given in Appendix B.1.

1D: In the simplest case of panning between two speakers, equal power stereo panning is the standard method.

2D: The most common case here is panning to a horizontal, symmetrical ring of n speakers by controlling azimuth; in many implementations, the width of how many speakers (at most) the energy is distributed over can be adjusted. In case the angles along the ring are not symmetrical, adjustments can be made by remapping, e.g. with a simple breakpoint lookup strategy. However, using the best geometrical symmetry attainable is always superior to compensating for asymmetries. Often it is necessary to mix multiple single-channel sources down to stereo: the most common technique for this is to create an array of pan positions (e.g. n steps from 80% left to 80% right), to pan every single channel to its own stereo position, and to sum these stereo signals. Mixing multi-channel sources into a ring of speakers can be done the same way; the array of positions then corresponds to (potentially compensated) equal angular distances around the ring. Larger numbers of channels can be panned into rings of fewer speakers, and vice versa.

3D: For simple geometrical arrangements of speakers, straightforward extensions of amplitude panning suffice. E.g., the CUBE setup at IEM consists of rings of 12, 8, and 4 speakers (bottom, middle, top); the setup at the Animax Multimedia Theater in Bonn adds a bottom ring of 16 speakers. For these systems, having 2 panning axes, one between the rings for elevation, and one for azimuth within each ring, works well. Again, the speaker setup should be as symmetrical as possible; compensation can be trickier here. Generally speaking, even where compensations for less symmetrical setups are mathematically plausible, spatial images will be worse outside the sweet spot. Maximum attainable physical symmetry cannot be fully substituted by more DSP math. Compensating overall vertical ring angles and individual horizontal speaker angles within each ring is straightforward with the remapping method described above. For placement deviations that are both horizontal and vertical, using fuller implementations of Vector Based Amplitude Panning is recommended (VBAP, see e.g. Pulkki (2001); implemented for SC3 in 2007 by Scott Wilson and colleagues, see http://scottwilson.ca/site/Software.html); however, this was not required within the context of the SonEnvir project, or this dissertation.
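A few minimal SC3 sketches of the panning strategies above (the sound sources here are arbitrary test signals; see Appendix B.1 for the full examples):

// 1D: equal power stereo panning of a single source
{ Pan2.ar(PinkNoise.ar(0.1), pos: MouseX.kr(-1, 1)) }.play;

// mixing 8 single-channel sources down to stereo at fixed pan positions
{ Splay.ar({ |i| SinOsc.ar(300 * (i + 1), 0, 0.05) } ! 8, spread: 0.8) }.play;

// 2D: panning a source around a horizontal ring of 8 speakers
{ PanAz.ar(8, PinkNoise.ar(0.1), pos: LFSaw.kr(0.1), width: 2) }.play;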

Ambisonics

Ambisonics is a multichannel reproduction system developed independently by several researchers in the 1970s (Cooper and Shiga (1972); Gerzon (1977a,b)), based on the idea that spherical harmonics can be used to encode and decode the directions from which sound energy comes; a good basic introduction to Ambisonics math is available online at http://www.york.ac.uk/inst/mustech/3d audio/ambis2.htm.

The simplest form of Ambisonics, first order, can be considered an extension of the classic Blumlein MS stereo microphone technique: in MS, one uses an omnidirectional microphone as a center channel (M for Mid), and a figure-of-8 mike to create a Side signal (S). By adding or subtracting the side signal from the center, one obtains Left and Right signals: L = M - S, R = M + S. By using figure-of-8 mikes for Left/Right, Front/Back, and Top/Bottom signals, one obtains a first order Ambisonic microphone, such as those made by the Soundfield company (http://www.soundfield.com). The channels are conventionally named W, X, Y, Z. Such an encoded recording can be decoded simply for speaker positions on a sphere. In the 1990s, the mathematics for 2nd and 3rd order Ambisonics were developed to achieve increasingly higher spatial resolution; these are formulated in Malham (1999), and also available online at http://www.york.ac.uk/inst/mustech/3d audio/secondor.html. Extensions to even higher orders were realised recently by IEM researchers (Musil et al. (2005); Noisternig et al. (2003)), with multiple DSP optimizations implemented as a PureData library. Using MATLab tools written by Thomas Musil, coefficients for encoding/decoding matrices for different speaker combinations and tradeoff choices can be calculated offline, and can then simply be read in from text files in the realtime platform of choice. The most complex use of this library so far has been the VARESE system (Zouhar et al. (2005)), a dynamic recreation of the acoustics of the Philips pavilion at the Brussels World Fair, for which Edgard Varèse's Poème Electronique (and Iannis Xenakis' Concret PH) was composed.

While some Ambisonics UGens previously existed in SuperCollider, the SonEnvir team decided to write a consistent new implementation of Ambisonics in SC3, based on a subset of the existing PureData libraries. This package was realised up to third order Ambisonics by Christopher Frauenberger as the AmbIEM package, available at http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/. It supports the main speaker setup of interest, the IEM Cube, as well as a setup for headphone rendering as described below.
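While AmbIEM provides the higher-order implementation, the basic principle can be sketched with the first-order 2D B-format UGens that ship with SC3 (source signal and speaker count are arbitrary choices here):

// encode a source at a slowly rotating azimuth into B-format (W, X, Y),
// then decode for a symmetrical horizontal ring of 4 speakers
(
{
    var src = PinkNoise.ar(0.2);
    var w, x, y;
    #w, x, y = PanB2.ar(src, azimuth: LFSaw.kr(0.05));
    DecodeB2.ar(4, w, x, y);
}.play;
)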

5.5.2 Headphones

For practical reasons, such as when working in one room with colleagues, scientists experimenting with sonifications are often required to use headphones. Many standard techniques work well for lateralising sounds, which can be entirely sufficient for making streams segregate or fuse as desired. In order to achieve perceptually credible simulations of auditory cues for full localisation, for example, making sounds appear to come from the front, or from above, more complex approaches are needed; the most common approach is to model the cues by means of which the human ear determines sound location.

Sound localisation in human hearing depends on the differences between the sound heard in the left and right ears; in principle, three kinds of cues are involved:

Interaural Level Difference (ILD), the level difference of a sound source between the ears, dependent on the source's direction. This can roughly be simulated with amplitude panning, which is however limited to left/right distinction in headphones (usually called lateralisation). Being so similar to amplitude panning, it is fully compatible with stereo speaker setups.

Interaural Time Difference (ITD), the difference in arrival time of a sound between the ears. This is on the order of at most 0.6 msec: at a speed of sound of 340 m/sec, this is the time equivalent of a typical ear distance of 21 cm. This can be simulated well for headphones; but because delay panning does not transfer reliably to speakers (one hardly ever sits exactly on the equidistant symmetry axis of one's loudspeaker pair), it is hardly used. Like amplitude panning, delay panning only creates lateralisation cues.

Head Related Transfer Functions - HRTF / HRIR

Head Related Transfer Functions (HRTFs) or, equivalently, Head Related Impulse Responses (HRIRs) capture the fact that both ITD and ILD are frequency-dependent: For every direction of sound incidence, the sound arriving at each ear is colored by reflections on the human pinna, head, and upper torso; such pairs of filters are quite characteristic of the particular direction they correspond to. Roughly speaking, localising a heard sound depends on extracting the effect of the pair of filters that colored it, and inferring the corresponding direction from the characteristics of this pair of filters; obviously, this works more reliably on known sources. HRTFs/HRIRs can be measured by recording known sounds from a set of directions with miniature microphones at the ear, and extracting the effect of the filters. Obviously, HRTF filters are different for every person (as are people's ears and heads), and every person is completely accustomed to decoding sound directions from her own HRTFs. Thus, there is no miracle HRTF curve that works perfectly for everyone; however, because some features in HRTFs are generalizable (such as the directional bands described in Blauert (1997)), the idea of using HRTFs to simulate sounds coming from different directions has become quite popular. The KEMAR set of HRIRs (see Gardner and Martin (1994); the data are available online at http://sound.media.mit.edu/resources/KEMAR/full.tar.Z) is based on recordings made with a dummy head, and is considered to work reasonably well for different listeners. IRCAM has also published individual HRIRs of ca. 50 people for the LISTEN project (Warusfel (2003), online at http://recherche.ircam.fr/equipes/salles/listen/), so one can try to find matches that suit a particular person's preferences well.

Implementing fixed HRIRs for fixed source locations is straightforward, as one only needs to convolve the sound source with one pair of HRIRs. However, this is not sufficient: static angles tend to sound like colouration (as caused by inferior audio equipment); in everyday life, we usually move our heads slightly, creating small changes in ITD, ILD and HRTF which quickly disambiguate any localisation uncertainties. Thus, creating convincing moving sources with HRTF spatialisation is required, which is not trivial: as a source's position changes, its impulse responses must be updated quickly and smoothly. There is no generally accepted scheme for efficient high-quality HRIR interpolation, and convolving every source separately is computationally expensive.
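For the static case, a hedged sketch using SC3's built-in convolution could look as follows; the HRIR file names are hypothetical placeholders for one measured left/right pair (e.g. from the KEMAR set):

// fixed-position binaural rendering: convolve a source with one HRIR pair
(
~hrirL = Buffer.read(s, "hrirs/L_az030_el00.wav");  // hypothetical file names
~hrirR = Buffer.read(s, "hrirs/R_az030_el00.wav");
)
(
{
    var src = Dust.ar(20, 0.5);    // test source: sparse impulses
    [
        Convolution2.ar(src, ~hrirL, framesize: 512),
        Convolution2.ar(src, ~hrirR, framesize: 512)
    ]
}.play;
)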
Ambisonics and Virtual Binaural Rendering

For complex changing scenes, the IEM has developed a very efficient approach to binaural rendering (Musil et al. (2005); Noisternig et al. (2003)): in effect, taking a virtual, symmetrical speaker setup, spatialising to that setup with Ambisonics, and then rendering these virtual speakers as point sources with their appropriate HRIRs, thus arriving at a binaural rendering. This provides the benefit that the Ambisonic field can be rotated as a whole, which is really useful when head movements of the listener are tracked, and the binaural rendering is designed to compensate for them. Also, the known problems with Ambisonics when listeners move outside the sweet zone disappear: when one carries a setup of virtual speakers around one's head, one is always right in the center of the sweet zone.

This approach has been ported to SC3 by C. Frauenberger; its main use is in the VirtualRoom class, which simulates moving sources within a rectangular box-shaped room. This class is especially useful for preparing spatialisation for multi-speaker setups by headphone simulation. Among other things, the submissions for the ICAD 2006 concert (http://www.dcs.qmul.ac.uk/research/imc/icad2006/concert.php, described also in section 4.3) were rendered from 8 channels to binaural for the reviewers, and for the web documentation (http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html). One can of course also spatialise sounds on the virtual speakers with any of the simpler panning strategies given above; this trades off easy rotation of the entire setup for better point source localisation. To support simple headtracking, C. Frauenberger also created the ARHeadTracker application, also available as a SuperCollider3 Quark.


5.5.3 Handling speaker imperfections

All standard spatialisation techniques work best when speaker setups are as symmetrical and well-controlled as possible. While it may not always be feasible to adjust the mechanical positions of speakers freely for very precise geometry, a number of factors can be measured and compensated for; this is supported by several utility classes written in SuperCollider, which are part of the SonEnvir framework.

Latency

The Latency class plays a test signal for a given number of audio channels, and waits for the signals to arrive back at an audio input. The resulting list of measured per-channel latencies can be used to create compensating delay lines, e.g. in the SpeakerAdjust class described below.

Spectralyzer

While inter-speaker latency differences are well known and very often addressed, we have found another common problem to be more distracting for multichannel sonification: Each individual channel of the reproduction chain, from D/A converter to amplifier, cable, loudspeaker, and speaker mounting location in the room, can sound quite different. When changes in sound timbre can encode meaning, this is potentially really confusing! To address this, the Spectralyzer class allows for simple analysis of a test signal as played into a room, with optional smoothing over several measurements, and then tuning compensating equalizers by hand for reasonable similarity across all speaker channels.

SpeakerAdjust

Once one has achieved usable EQ curves for every speaker channel, one can begin to compensate for volume differences between channels (with big timbral differences between channels, measuring volume or adjusting it by listening is rather pointless). The SpeakerAdjust class expects specifications for relative amplitude, (optionally) delay time, and (optionally) as many parametric EQ bands as needed for each channel. Thus, a speaker adjustment can be created that runs at the end of the signal chain and linearizes the given speaker setup as much as possible; of course, limiters for speaker and listener protection can be built into such a master effects unit as well.
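As a minimal sketch of what such a compensation stage does per channel (the actual SpeakerAdjust class takes lists of specifications per channel; the function name and parameter values here are illustrative only):

// per-channel compensation: gain, delay line, and one parametric EQ band
(
~adjustChannel = { |in, amp = 1, delay = 0, eqFreq = 1000, eqRq = 1, eqDb = 0|
    BPeakEQ.ar(DelayC.ar(in * amp, 0.05, delay), eqFreq, eqRq, eqDb)
};
// e.g. one channel: slightly quieter, delayed by 2.1 ms, 4 dB dip at 2 kHz
{ ~adjustChannel.(SoundIn.ar(0), 0.9, 0.0021, 2000, 1, -4) }.play;
)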

Chapter 6

Examples from Sociology


Though sociology was considered a promising field of application early on (Kramer (1994a)), sonification to date is not widely known within the social sciences. Thus, one purpose of collaborating with sociologists was to raise awareness of the potential benefits sonification can bring to social research. Three sonification designs and their research context are described and analysed as case studies here: the FRR Log Player, Wahlgesänge (election/whale songs), and the Social Data Explorer.

Social (or sociological) data generally show characteristics that make them promising for sonification: They are multi-dimensional, and they usually depict complex relations and interdependencies (de Campo and Egger de Campo (1999)). We consider the application of sonification to data depicting historical (or geographical) sequences the most promising area within the social sciences. The fact that sound is inherently time-bound is an advantage here, because sequential information can be conveyed very directly by mapping the sequences onto the implicit time axis of the sonification. In fact, social researchers are very often interested in events or actions in their temporal context. The importance of developmental questions is growing further due to the globalized notion of social change. Sequence analysis, the field methodologically concerned with these kinds of questions, assembles methodologies that are by now rather established, like event history analysis, and appropriate techniques to model causal relations over time (Abbott (1990, 1995); Blossfeld et al. (1986); Blossfeld and Rohwer (1995)).

Like most methods of quantitative (multivariate) data analysis, sequence analysis methods need to be based on an exploratory phase. The quality of the analysis process as a whole depends critically on the outcome of this exploratory phase. As the amount of social data is continuously increasing, effective exploratory methods are needed to screen these data. On higher aggregation levels (such as the global or UN member states level), social data have both a time (e.g. year) and a space dimension (e.g. nation), and can thus be understood both as time and as geographical sequences. The use of sonification to explore data of social sequences was the main focus of the sociological part of the SonEnvir project.

6.1 FRR Log Player

An earlier stage of this work was described in detail in a poster for ICAD 2005 (Dayé et al. (2005)); it is briefly documented in the SonEnvir sonification data collection at http://sonenvir.at/data/logdata1/, and the full code example is available from the SonEnvir code repository at https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/Soziologie/FRR Logs/.

Researchers in social fields, be they sociologists, psychologists or design researchers, sometimes face the problem of studying actions in an area which is not observable for ethical reasons. This was especially true in the context of the RTD project Friendly Rest Room (FRR, http://www.is.tuwien.ac.at/fortec/reha.e/projects/frr/frr.html; see Panek et al. (2005)), which was partly funded by the European Commission. The project's aim was to develop an easy-to-use toilet for older persons and persons with (physical) disabilities. In order to meet that objective, an interdisciplinary consortium was set up, bringing together specialists with various backgrounds, like industrial design, technical engineering, software engineering, user representation, and social science. In the final stage of the FRR project, a prototype of this toilet was installed at a day care center for patients with multiple sclerosis (MS) in Vienna, in order to validate the design concept in daily life use.

The sonification design described here was intended for sonifying the log data gathered during this validation phase, because difficulties had arisen with these analyses. Since observational data could not be gathered for ethical reasons, these log data are the only way to understand the actions taken by the user. The FRR researchers are interested in these data because they provide information on the users' interaction with the given technical equipment, and thus on the usability and everyday usefulness of the toilet system.

6.1.1 Technical background

The guests of this day care center are patients with varying degrees of Multiple Sclerosis (MS); some need support from nurses when using the toilet, while others can use it independently. Due to security considerations as well as for pragmatic reasons, not all components developed within the FRR project were selected for this field test (see Panek et al. (2005)). The main features of the installed conceptual prototype are:

- Actuators to change the height of the toilet seat, ranging from 40 to 70 cm.
- Actuators to change the seat's tilt, ranging from 0 to 7 degrees forward/down.
- Six buttons on a hand-held remote control to use these actuators: toilet up, toilet down, tilt up, tilt down, as well as flush and alarm triggers.
- Two horizontal support bars next to the toilet that can be folded up manually.
- A door handle of a new type, easier to use for people with physical disabilities, mounted on the outside of the entrance door.

Figure 6.1: The toilet prototype system used for the FRR field test.
Left to right: the door with specially designed handle, the toilet prototype as installed at the day center, and an illustration of the tilt and height changing functionality.

As direct observation of the users' interaction with the toilet system was out of the question, sensors were installed in the toilet area that continuously logged its current status. These sensors recorded:

- the height of the toilet seat (in cm, one variable),
- the tilt of the toilet seat (in degrees, one variable),
- the status of the remote control buttons (pressed/not pressed, six variables),
- the status of the entrance door (open/not open, one variable); and
- the presence of RFID-tagged smart cards (RFID mid-range technology) near the toilet seat, to identify any persons present.

The guests and the employees of the day care center were provided with such smart cards, and an RFID module in the toilet area registered the identities of up to four cards simultaneously.

The log data matrix recorded from these sensor data is quite unusual for sociological data, due to its maximum time resolution of about 0.1 sec (which is high for social data), and the sequential properties of the information captured by the data. One log entry consists of about 25 variables, of which 11 are relevant for our analysis: a timestamp for when an entry was logged, and the ten variables described above. Of these eleven variables, seven are binary. Each log file records the events of one day. In case there is no event for a longer time (e.g. during the night), a watchdog in the logging software creates a blank event every 18 minutes to show the system is still on.

In order to use these log files to understand what the users did, we needed to reconstruct sequences of actions of a user based on the events registered by the sensors. The technical data had to be interpreted in terms of the users' interaction with the equipment; otherwise the toilet prototype could not be evaluated. The technical data by themselves are not sufficient for a validation, as we need to validate whether or not the proposed technical solution results in an improvement of the users' quality of life, which is the eventual social phenomenon of interest here. Due to the sequential nature of the information contained in the log files, established routines from multivariate statistics could not be applied, as they usually do not account for the fundamental difference of data composed of events in temporal sequence.

6.1.2 Analysis steps

Graphical Screening

On a graphical display (which is what the FRR researchers used), it is not at all easy to follow the sequential order of the events, above all because such a sequence consists of several variables. Yet, as the first step of analysis, we relied on graphs for the purpose of identifying episodes. An episode in our context is defined by a single user's visit to the toilet. A prototypical minimum episode consists of the following logged events:

- door open
- door close
- tilt down (multiple events)
- tilt up (multiple events)
- button flush
- door open

Note that, in this specific episode, the height and the tilt of the toilet bowl are not adjusted via remote control by the user. Still, this episode is a very simple chain of events. Most of the logged events for tilt down and tilt up result only from the weight of the person sitting on the toilet seat.


Figure 6.2: Graphical display of one usage episode (Excel).

The first step in analysing the data material was to use graphical displays to look for sections that could be identified as one user's visit to the toilet prototype, and to chunk the data into such episodes, which formed our new entities of analysis. The episode displayed graphically in figure 6.2 is an example of a very simple, single episode. It is obvious that the graph is not easy to interpret due to its complexity (possibly additionally complicated on black and white printouts). The sequential character of the events can be read visually, if not very comfortably: One can see that the starting event is that the door opens, and then closes; followed by the event that the toilet bowl tilts forward (the tilting degree grows). We can assume that the person is now sitting on the toilet. Then the height is adjusted, and the tilt as well. After the tilt returns to a lower value (we can assume the weight has been removed, so we can infer that the person has stood up), the flush button is pressed, and the door opens and closes again. The other variables remain unchanged.

Investigating patterns of use

The FRR researchers were not interested primarily in the way a single person behaves in the Friendly Rest Room, but rather in whether different groups of people could be found who, for instance due to similar physical limitations, show similarities in interacting with the given technical equipment. Such typical action patterns of various user groups are interesting to cross-reference with data from other sources: Characteristics like sex, weight, and age of a person, her/his physical and cognitive limitations, and additional information like whether s/he is using a wheelchair or crutches, are important to deepen the interpretation and allow for causal inferences. For this purpose, an identification module was mounted behind the cover of the water container of the FRR prototype, which was intended to recognize users wearing RFID tags. To give just one example of how user identification can help: usually, people who use wheelchairs need more time than non-wheelchair users to open a door, enter the room, and close the door again. This is partly because of the need to manoeuvre around the door when sitting in a wheelchair, but mainly because standard door handles are hard to use, especially when, as is the case with MS, people have restricted mobility in their arms. Thus, if an analysis shows that the time needed by wheelchair users to enter the room is on average shorter than with a standard door, one can conclude that the FRR-designed door handle is a usability improvement for wheelchair users. Similarly, one can identify further patterns of use and possibly relate them to user characteristics as mentioned above. However, these patterns are not only important for the evaluation of the equipment, but also for figuring out user IDs that were accidentally not recorded.

Comparing anonymous episodes with patterns

Unfortunately, RFID tag recognition only worked within a range of about 50 cm around the toilet, and so not every person using the toilet was identified. Thus there are anonymous episodes which cannot be related to personal data from other sources. From a heuristic perspective, these anonymous data are nearly useless. As this applies to 53% of the 316 episodes, this was a serious concern for the validity of the results. Thus it was decided to study the episodes of identified users in order to find patterns that may allow for eventual identification of anonymous episodes. For some of the anonymous episodes, direct identification was possible. For others, most likely from users who did not use the prototype often, we had to rely on conjecture based on what we could derive from the episodes of identified users. By comparing them with the patterns identified in step 2, we made use of the anonymous episodes: we analysed them by approaching the problem with empirically found categories.

6.1.3 Sonification design

The repertoire of sounds for the FRR Log Player sonification design is:

- Door state (open or closed) is represented by coloured noise similar to diffuse ambient outside noise; this noise plays when the door is open and fades out when it is closed.
- Button presses on the remote control for height and tilt change (up or down) play short glissando events, up or down, identified as height or tilt by different basic pitch and timbre.
- Alarm button presses are rendered by a doorbell-like sound; this button is mostly used to inform nurses that assistance is needed; use for emergencies is rare.
- Flush button presses are represented by a decaying noise burst.
- The continuous states of height and tilt are both represented as soft background drones; when their values change, they move to the foreground, and when their values are static, they recede into the background again.

This design mixes discrete-event sonification (marker sounds for the button presses) and continuous sonification (tilt and height).
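As an illustration, the glissando markers for the button presses could be sketched like this (a simplified stand-in for the actual design; base frequencies, intervals, and envelope times are assumed):

// marker sound sketch: a short glissando, up or down;
// base pitch distinguishes the height buttons from the tilt buttons
(
SynthDef(\buttonGliss, { |freq = 400, ratio = 2, dur = 0.2, amp = 0.2|
    var env = EnvGen.kr(Env.perc(0.01, dur), doneAction: 2);
    var sig = SinOsc.ar(XLine.kr(freq, freq * ratio, dur));
    Out.ar(0, sig * env * amp ! 2);
}).add;
)
Synth(\buttonGliss, [\freq, 600, \ratio, 2]);    // e.g. "height up"
Synth(\buttonGliss, [\freq, 300, \ratio, 0.5]);  // e.g. "tilt down"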

Figure 6.3: FRR Log Player GUI and sounds mixer.

6.1.4 Interface design

The graphical user interface shown in figure 6.3 provides visual feedback and possibilities for interaction:

- A button allows for selection of different episode files to sonify; it shows the filename when a file has been selected. If a user ID tag has been recorded in the log, that is shown.
- For playing the sequence, Start/Stop buttons and a speed control are provided. Speed is the most useful control, as different patterns may appear on different timescales.
- A mixer for the levels of all the sound components is provided; a panel for tuning the details of each sound can be called up from a button (px mixer). This ProxyMixer window allows for storing all tuning details as code scripts, so that useful settings can be fully documented, communicated and reproduced.
- The binary state variables are all displayed visually as buttons, and allow for triggering from the GUI: a button for the door, and buttons for the remote control buttons turn red when activated in the log. When they are pressed on the GUI, they play their corresponding sound, so users can learn the repertoire of symbolic sounds very quickly.
- The continuous variables are all displayed: time within the log as hours:minutes:seconds; height and tilt of the seat as labeled numbers and as a line with variable height and tilt.
- Finally, the last 5 and the next 5 events in the log are shown as text; this was very useful for debugging, and it provides an extra layer of information to the users of the sonification design.

6.1.5 Evaluation for the research context

For the research context these data came from, this sonification design was successful: It represented time sequence data with several parallel streams of parameters and events such that they could be detected efficiently, and it was straightforward to learn and use. The researchers reported being able to use rather high speedups, and to achieve good recognition of different user categories. In fact, the time scaling was essential for understanding the meaning behind the sequential order and timing of events. Especially the times between events, the breaks, were instructive, as they may point to problems of the user with the equipment to be evaluated. In short, the sonification design solved the task at hand more efficiently than the other tools previously used by the researchers.

6.1.6 Evaluation in SDSM terms

Within the subset of 30 episodes used for design development (out of 316), the longest is 1660 lines and covers 32 minutes, while the shortest ones are ca. 180 lines and 5 minutes. The SDSM map shows data anchors for this variety of files, and marks for three different speedups of these 2 example files: original speed (x1) and speedups of x10 and x100. At lower speeds, one can leave the continuous sounds (tilt and height) on, while at high speedups, the rendering is clearer without them. The 8 (or 6 at higher speeds) data properties used are technically rendered as parallel streams; whether they are perceived as such is a question of the episode under study, the listening attitude, and the playback speed. For example, one could listen to each button sound individually, but usually the timed sequence of button presses would be heard as one stream of related, but disparate sound events.

Figure 6.4: SDS Map for the FRR Log Player.

While this design was created before the SDSM concept existed, it conforms to all basic SDSM recommendations, as well as to the secondary guidelines. Time is represented as time; time scaling, the central SDSM navigation strategy, is available as a user interaction parameter. Thus, users can experiment freely with different time scales to bring different event patterns into focus; in SDSM terms, the expected gestalt number (here, the data time rescaling) can be adjusted to fit into the echoic memory time frame. This is supported here by adaptive time-scaling of the sound events: as time is sped up, sound durations shorten by the square root of the speedup factor (see below).

Recorded (binary) events in time are represented by simple, easily recognized marker sounds; they either sound similar to the original events (the flush button, the alarm bell), or they employ straightforward metaphors consistently (glissando up is up for both tilt and height), thus minimizing learning load. Continuous state is represented by a background drone, which is turned up louder when changes happen; this jumping to the foreground amplifies the natural listening behaviour of tuning out constant background sounds and being alerted when the soundscape changes. For higher speedups, researchers reported that they often turned these components off completely, so the option to let users do that quickly was useful.

The time scaling of marker sounds is handled in a way that can be recommended for re-use: Constant sound durations create too much overlap at higher speeds, while scaling proportionally to the speedup factor deforms the symbolic marker sounds too much for easy recognition. So, the strategy invented for this sonification was to scale the durations of the marker sounds by 1/(timeScale^scaleExp), with scaleExp values between 0.0 (no duration scaling) and 1.0 (fully matching the sequence time scaling). For the time scaling range desired here, 1 to 100, scaling sound durations by the power of 0.5, i.e. the square root, has turned out to work well: the sounds are still easily recognized as transformations of their original type, and one can still follow dense sequences well.
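In code, this scaling rule is a one-liner; a sketch with assumed names and default values:

// marker duration scaling: dur = baseDur / (timeScale ** scaleExp)
(
~markerDur = { |baseDur = 0.2, timeScale = 1, scaleExp = 0.5|
    baseDur / (timeScale ** scaleExp)
};
)
~markerDur.(0.2, 1);    // -> 0.2 s at original speed
~markerDur.(0.2, 100);  // -> 0.02 s at speedup x100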

6.2 Wahlgesänge - Election Songs

This work is also described in de Campo et al. (2006a), and in the SonEnvir data collection at http://sonenvir.at/data/wahlgesaenge/. The SC3 code for running this design can be downloaded from the SonEnvir svn repository at https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/Soziologie/ElectionsDistMap/. It was designed by Christian Dayé and the author, and it is based on an example for the Model-Based Sonification concept called Sonogram, described by Hermann (2002); Hermann and Ritter (1999) (not to be confused with standard spectrograms or medical ultrasound-based imaging). The code example by Till Bovermann is available at http://www.techfak.uni-bielefeld.de/ tboverma/sc/tgz/MBS Sonogram.tgz.

With the sonification design presented here, we can explore geographical sequences. As a straightforward and familiar example of social data with geographical distributions, we use election results; in particular, from the Austrian province of Styria, for the provincial parliament elections in 2000 and 2005, and the national parliament election in 2006. (Styria is one of nine federal states in Austria. It consists of 542 communities grouped in 17 districts, and about 1 190 000 people live there. In autumn 2005, more than 700 000 Styrian voters elected their political representatives. The result of this election was politically remarkable: the ruling conservative party ÖVP (Österreichische Volkspartei: Austrian People's Party) was defeated for the first time since 1945 by the left social-democratic party SPÖ (Sozialdemokratische Partei Österreichs: Social Democratic Party of Austria).)

Our interest focused on displaying social data both in their geographical distribution, and at a higher spatial resolution than usual. Whereas most common displays of social data focus on the level of districts (here, 17), we wanted to design a sonification that displays spatial distances and similarities in the election results among neighbouring communities. The mind model is that of a journey through Styria. A journey can be defined as the transformation of a spatial distribution into a time distribution. A traveler who starts at community A passes the neighbouring communities first, and the longer she is on the way, the more space is between her and community A. Hence, in this sonification, the spatial distances between communities are mapped onto the time axis.

6.2.1 Interface and sonification design

The communities are displayed in a two-dimensional window on a computer screen (see figure 6.5). For each community, the coordinates of the community's administrative offices were determined and used as the geographical reference point of the respective community. The distances as well as the angles within our data thus correspond to the real distances and angles between the communities' administrative offices.

Figure 6.5: GUI Window for the Wahlgesänge Design.
The left hand panel allows switching between different election results (and district/community levels of aggregation), and between the parties to listen to. It also allows tuning some parameters of the sonification, and it displays a short description of the closest ten communities. The map window shows a map of Styria with the community borders; this map is the clicking interface.

This sonification design depends strongly on user interaction: like most Model-Based Sonifications, it needs to be played, like a musical instrument; without user actions, there is no sound. Clicking the mouse anywhere in the window initiates a circular wave that spreads in two-dimensional space. The propagation of this wave is shown in the window as a red circle. When the wave hits a data point, this point begins to sound in a way that reflects its data properties. In our case, these data properties are the election results within each community. Thus, the user first hears the data point nearest to the clicking point, from the proper spatial direction, with pitch being controlled by the turnout percentage of the currently selected party in that community (high pitch meaning high percentage); then the result for the second-nearest community, and so on.

The researcher can select different parties to listen to their results from the election under study. Further, the researcher can choose a direction in which to look and listen. In figure 6.5, this direction is North, indicated by the soft radial line within the circular wave. The line begins at the point where the researcher has initiated the wave, to provide visual feedback while listening, and to keep a trace of the initial location for which the current sound was generated. Data points along this line are heard from the front, others are panned to their appropriate directions.

While this sonification was designed for a ring of twelve speakers surrounding the listener, it can be used with standard stereo equipment as well: for stereo headphones, one changes to a ring of four, and listens to the front two channels. Then, data points along the main axis are heard from the center, and those on the left (or right) are panned accordingly, 90 degrees being all the way left or right.⁸ The points at more than 90 degrees off axis progressively fade out, and those above 135 degrees off axis are silent.

The GUI provides the following sonification parameter controls:

A distance exponent defines how much the loudness for a single data point decreases with increasing distance. For 2D spaces, 1/distance is physically correct, but stronger or weaker weightings are interesting to experiment with.

The velocity of the expanding wave in km/second. The default of 50 km/sec scales the entire area (when played from the centre) into a synakusis-like time scale of 3 seconds. Slower or higher speeds can be experimented with to zoom further in or out.

The maximum number of communities (Orte in German) that will be played. Selecting only the nearest 50 or so data points allows for exploration of smaller areas in more detail.

The decay time for each individual sound grain. At higher speeds, shorter decays create less overall overlap, and thus provide more clarity; for smaller sets and slower speeds, longer decay times allow for more detailed pitch perception and thus higher perceptual resolution.

The direction in which the wave is looking; in the sound, this determines which direction will be heard from the front. The direction can be rotated through North, West, South and East.

For more detailed information, the ten data point locations nearest to the clicked point are shown in a list view.

8 Note that for stereo speakers at +-30 degrees, the angles within +-90 degrees are scaled together to +-30 degrees, which we find preferable to keeping the angles intact and only hearing a 60 degree slice of all the data points (which could be done by leaving the setting at 12 channels and only using the first 2).
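To make the core mapping concrete, the following SuperCollider sketch schedules the grains for a single mouse click, reduced to stereo panning. It is a minimal sketch under stated assumptions: the community coordinates and turnout values are toy data, and all names (waveGrain, clickPoint, waveSpeed) are hypothetical rather than taken from the actual SonEnvir implementation.

(
// Sketch: an expanding circular wave triggers one grain per community.
SynthDef(\waveGrain, { |freq = 440, amp = 0.1, pan = 0, decay = 0.3|
    var env = EnvGen.kr(Env.perc(0.005, decay), doneAction: 2);
    Out.ar(0, Pan2.ar(SinOsc.ar(freq) * env, pan, amp));
}).add;
)

(
var clickPoint = 250 @ 300;     // where the user clicked, in km coordinates
var waveSpeed = 50;             // km per second, the GUI default
var distExp = 1;                // loudness falloff exponent (1/distance)
var communities = [             // [x, y, turnout percent] - toy data only
    [252, 310, 34.2], [260, 280, 41.7], [230, 320, 28.9]
];
communities.do { |c|
    var offset = (c[0] @ c[1]) - clickPoint;
    var onset = offset.rho / waveSpeed;                // wave arrival time in seconds
    var amp = 0.2 / (offset.rho.max(1) ** distExp);    // softer with distance
    var freq = c[2].linexp(0, 100, 200, 2000);         // turnout percent -> pitch
    SystemClock.sched(onset, {
        Synth(\waveGrain, [\freq, freq, \amp, amp, \pan, offset.theta / pi]);
        nil
    });
};
)

In the actual design, panning uses the twelve-channel ring (or the four-channel stereo reduction described above), and points far off the chosen axis are faded out.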

6.2.2 Evaluation

This sonification design is a good tool for outlier analysis. It works rather fast at a low level of aggregation (communities), and outliers are easily identified by tones that are higher than their surroundings. Typically, these are local outliers: in an area that has a local average value of say 30%, you can hear a 40% result sticking out; when analysing the entire dataset statistically, this may not show up as an outlier. A second strong feature is the ability to get a quick impression of distributions of a data dimension with their spatial order intact, so achieving the tricky task of developing an intuitive grasp of the details of one's data becomes more likely.

This sonification design is not restricted to election data: other social indicators that are assessed at the community level (unemployment rates, labor force participation rate of women, and others) can be included. Representing them in conjunction with e.g. election results promotes the investigation of local dependencies that might be hidden by higher aggregation levels or by the mathematical operations of correlation coefficients.

Finally, this sonification design is of course not restricted to the geographical borders of Styria. It can be used as an exploratory tool enabling researchers to quickly scan social data in their geographical distribution, at different aggregation levels. Given an interesting question to address at such higher levels, an adaptation to different geographical scales, i.e. European and global data distributions, is straightforward, e.g. with nations as the aggregation entity.

When considered from an SDSM perspective, this sonification design respects a number of SDSM recommendations: It shows the important role of interaction design, while the sound aspect of the sonification design itself remains rather basic. It also shows the central importance of time scaling/zooming between overview and details; in fact this design was the source for recommending this particular time-scaling strategy within the SDSM concept. The design also demonstrates the metaphorical simplicity recommended by SDSM. An SDSM graph shows that the sonification can render one data property of the entire set within the echoic memory time frame, and zoom into more detail by selecting subsets, or by slowing down the propagation speed.


Figure 6.6: SDS-Map for Wahlgesänge.

6.3 Social Data Explorer

6.3.1 Background

This sonification design is a study for mapping geographically distributed multidimensional social data to a multiparametric sonification, i.e. a classical parameter mapping sonification. It offers a number of interaction possibilities, so that sociologists (the intended user group) can experiment with changing the mappings freely. This serves both for learning sonification concepts by experimentation and for finding interesting mappings, for instance, mappings that confirm known correlations between parameters. The example data file contains the distribution of the working population of all 542 communities in Styria by sectors of economic activities, given in table 6.1. This data file is quite typical for geographically distributed social data.

Table 6.1: Sectors of economic activities

Agrarian, Wood-, and Fishing Industries
Mining
Production of commodities
Energy and Water Industries
Construction
Trade
Hotel and Restaurant Trade
Traffic and Communication
Credit and Insurance
Realty, Company Services
Public Administration, Social Security
Pedagogy
Health, Veterinary, and Social Services
Other Services
Private Households
Exterritorial Organisations
First-time seeking work

6.3.2 Interaction design

A number of interactions can be accessed from the user interface shown in figure 6.7:

Order allows sorting by a chosen parameter (alphabetically or numerically); up is ascending, dn is descending. The number-box is for choosing one data item to inspect by index in the sorted data, so e.g. 0 is the first data point of the current sorted order.

Every parameter of the sonification can be mapped by using the elements of a mapping line: For every synthesis or playback parameter, users can select a data dimension. The data range in minimum and maximum values is displayed. The data can have a warp property, i.e. whether the data should be considered linear, exponential, or have another characteristic mapping function. The arrow-button below pushes the range of the current data dimension to the editable number boxes, as this is the data scaling range (minmax) to use for parameter mapping. This range can be adjusted, in case this becomes necessary to experiment with a specific hypothesis. The second minmax range is the Synth parameter range, which is adjustable, as is the warp factor. Here, the arrow-button also pushes in the default parameter values. The range display that follows shows the default synthesis parameter range (e.g. 20-20000 for frequency), and the popup menu under Synth Param shows the name of the parameter chosen for that mapping line.

Setting playRange determines the range of data point indices to play within the current sorted data, with 0 being the first datapoint. post current range posts the current range in the current order.

The final group of elements, labeled styrData, allows for starting and stopping the sonification playback.

Figure 6.7: GUI Window for the Social Data Explorer.

The top line of elements is used for sorting data by criteria. The five element lines below are for mapping data dimensions to synth parameters, and for scaling the ranges flexibly. The bottom line allows selecting a range of interest within the sorted data, and sonification playback control.

6.3.3 Sonification design

The sonification design itself is quite a simple variant of discrete-event parameter mapping. Three different synthesis processes (synthdefs) are provided, all with control parameters for freq, amp, pan, and sustain. The synthdefs mainly vary in the envelope they use (one is quasi-gaussian, the other two percussive), and in the panning algorithm (sineAz is for multichannel ring-panning). Which of these sounds is used can be changed in the code. The player algorithm iterates over the chosen range of data indices. It maps the values of each data item to values for the synthesis event's parameters, based on the current mapping choices. If nothing is chosen for a given synthesis parameter, a default value is used (e.g. 0.1 seconds for the duration of the event).
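To illustrate this player algorithm, here is a minimal, hypothetical sketch of one mapping pass; the data items, the mapping table, and the single synthdef are simplified placeholders rather than the three synthdefs of the actual design.

(
// One synthesis process with the control parameters named above.
SynthDef(\mapGrain, { |freq = 440, amp = 0.1, pan = 0, sustain = 0.1|
    var env = EnvGen.kr(Env.sine(sustain), doneAction: 2);   // quasi-gaussian envelope
    Out.ar(0, Pan2.ar(SinOsc.ar(freq), pan, amp * env));
}).add;
)

(
var data = [ [1200, 34.1], [80, 61.5], [4300, 12.7] ];   // toy items: [workforce, percent]
var mappings = [
    // [synth param, data dimension, dataMin, dataMax, paramMin, paramMax, warp]
    [\freq, 1, 0, 100, 200, 2000, \exp],
    [\amp,  0, 0, 5000, 0.05, 0.3, \lin]
];
Routine {
    data.do { |item|
        var args = [\sustain, 0.1, \pan, 0];   // defaults for unmapped parameters
        mappings.do { |m|
            var val = item[m[1]].clip(m[2], m[3]);
            var mapped = if (m[6] == \exp) {
                val.linexp(m[2], m[3], m[4], m[5])
            } {
                val.linlin(m[2], m[3], m[4], m[5])
            };
            args = args ++ [m[0], mapped];
        };
        Synth(\mapGrain, args);
        0.12.wait;   // simple event spacing
    };
}.play;
)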


6.3.4 Evaluation

For experimenting with parameter mapping sonification, this design allows for similar rendering complexity as the Sonification Sandbox (Walker and Cothran (2003)), though without parallel streams of sounds. Both the user interface and the sonification itself are sketches rather than polished applications; e.g. the user interface could allow loading data files, switching between instruments, and deriving its initial display state from the current state of the model. Given more development time, it would benefit from multiple and more complex sound functions, from making more functionality available from GUI elements, and from a fuller visual representation of the ongoing sonification. Nevertheless, according to the sociologist colleague who experimented with it, it supported exploration of the particular type of data file well enough to confirm its viability.

While we intended to experiment with designs bridging between the Wahlgesänge design and the Social Data Explorer, this was not pursued, mainly due to time constraints, and because other ventures within the SonEnvir project were given higher priority.

Chapter 7

Examples from Physics


In the course of the SonEnvir project, we began with sonifications of quantum spectra, and later decided to shift the focus to statistical spin models as employed in computational physics, for various reasons given below.

Sonification has been used in physics rather intuitively, without referring to the term explicitly. The classical examples are the Geiger counter and the sonar, both monitoring devices for physical surroundings. An early example of research using sonification is the experiment of the inclined plane by Galileo Galilei. Following Drake (1980), it seems plausible that Galilei used auditory information to verify the quadratic law of falling bodies (see chapter 3, and figure 3.1.1). In reconstructing the experiment, Riess et al. (2005) found that time measuring devices of the 17th century (water clocks) were almost certainly not precise enough for these experiments, while rhythmic perception was.

In modern physics, sonification has already played a role: one example of audification is given in a paper by Pereverzev et al., where quantum oscillations between two weakly coupled reservoirs of superfluid helium 3 (predicted decades earlier) were found by listening:

Owing to vibration noise in the displacement transducer, an oscilloscope trace [...] exhibits no remarkable structure suggestive of the predicted quantum oscillations. But if the electrical output of the displacement transducer is amplified and connected to audio headphones, the listener makes a most remarkable observation. As the pressure across the array relaxes to zero there is a clearly distinguishable tone smoothly drifting from high to low frequency during the transient, which lasts for several seconds. This simple observation marks the discovery of coherent quantum oscillations between weakly coupled superfluids. (Pereverzev et al. (1997))

Next to sonification methods in physics, physics methods found their way into sonification, as in the model-based sonification approach by Hermann and Ritter (1999). For example, in so-called data sonograms, physical formalisms are used to explore high-dimensional data spaces; an adaptation of the data sonogram approach has been used in the Wahlgesänge sonification design described in section 6.2.


Physics and sonification

In physics, sonification has particular advantages. First of all, modern particle physics is usually described in a four-dimensional framework. For a three-dimensional space evolving in time, a complete static visualisation is not possible any more. This makes such systems harder to understand and thus very abstract; hence, in both didactics and research, sonification may be useful. In the auditory domain, many sound parameters may be used to display a four-dimensional space, maintaining symmetry between the four dimensions by comparing different rotations of their mappings. A feature of auditory dimensions that has to be taken into account is that these dimensions are generally not orthogonal, but could rather be compared to mathematical subspaces (see Hollander (1994)). This concept is very common in physics, and thus easily applicable.

Furthermore, many phenomena in physics are wave phenomena happening in time, just as sound is. Thus sonification provides a very direct mapping. While scientific graphs usually map the time direction of physical phenomena onto a geometrical axis, this is not necessary in a sonification, where physical time persists, and multiple parameters may be displayed in parallel.

While perceptualisation is not intended to replace classical analytical methods, but rather to complement them, there are examples where visual interpretation is superior to, or at least precedes, mathematical treatment. For instance, G. Marsaglia (2003) describes a battery of tests for the quality of numerical random number generators. One of these is the parking lot test, where mappings of randomly filled arrays in one plane are plotted and visually searched for regularities. He argues that visual tests are striking, but not feasible in higher dimensions. As nothing is known beforehand about the nature of patterns that may appear in less than ideal random number generators, there is no all-encompassing mathematical test for this task. Sonification is a logical continuation of such strategies which can be applied with multidimensional data from physical research contexts.

The major disadvantage of sonification we encountered is that physicists (and probably natural scientists in general) are not familiar with it. Visualisation techniques, and our learnt understanding of them, have been refined since the beginnings of modern science. For auditory perception especially, we were e.g. confronted with the opinion that the hearing process is just a Fourier transformation, and could be fully replaced by Fourier analysis. This illustrates that much work is required before sonification becomes standard practice in physics.


7.1 Quantum Spectra Sonification¹

Quantum spectra are essential to understand the structure and interactions of composite systems in such fields as condensed matter, molecular, atomic, and subatomic physics. Put very briefly, quantum spectra describe the particular energy states which different subatomic particles can assume; as these cannot be observed directly, competing models have been developed that predict the precise values and orderings of these energy levels. Quantum spectra provide an interesting field for auditory display due to the richness of their data sets, and their complex inner relations.

In our experiments (us referring to the physics group within SonEnvir), we were concerned with the sonification of quantum-mechanical spectra of baryons, the most fundamental particles of subatomic physics observed in nature. The data under investigation stem from different competing theoretical models designed for the description of baryon properties. This section reports our attempts at finding valid and useful strategies for displaying, comparing and exploring various model predictions in relation to experimentally measured data by means of sonification. We investigated the possibilities of sonification in order to develop them as a tool for classifying and explaining baryon properties in the context of present particle theory.

Baryons - most prominently among them the proton and the neutron - are considered as bound systems of three quarks, which are presently the ultimate known constituents. The forces governing their properties and behaviour are described within the theory of quantum chromodynamics (QCD). While up to now this theory is not yet exactly solvable for baryons (at low and intermediate energies), one resorts to effective models, such as constituent quark models (CQMs). CQMs have been suggested in different variants. Existing models differ mainly in which components they consider to constitute the forces binding the constituent quarks: All models include a so-called confinement component - as the distance between quarks expands, the forces between them grow, which keeps them confined - and a hyperfine interaction, which models interactions between quarks by particle exchange. As a result there is a variety of quantum-mechanical spectra for the ground and excited states of baryons. The characteristics of the spectra contain a wealth of information important for the understanding of baryon properties and interactions. Baryons are also classified by the combinations of quarks they are made up of, and by a number of other properties such as color, flavor, spin, parity, and angular momentum, which can be arranged in symmetrical orders. For more background on constituent quark models and baryon classification, please refer to Appendix C.1.
1 This section is based on material from two SonEnvir papers: de Campo et al. (2005d) and de Campo et al. (2006a).


7.1.1 Quantum spectra of baryons

The competing CQMs produce baryon spectra with characteristic differences due to the different underlying hyperfine interactions. In figure 7.1 the excitation spectra of the nucleon (N) and delta (Δ) particles are shown for three different classes of modern relativistic CQMs. While the ground states are practically the same (and agree with experiments) for all CQMs, the excited states show different energies and thus level orderings. (For instance, in the OGE CQM the first excitation above the N ground state is J^P = 1/2^-, whereas for the GBE CQM it is J^P = 1/2^+.) Evidently the predictions of the GBE CQM reach the best overall agreement with the available experimental data.

Figure 7.1: Excitation spectra of N (left) and Δ (right) particles.

In each column, the three entries left to right are the energies (in MeV, or mega-electronvolts) based on One-Gluon Exchange (Eidelman (2004)), Instanton-induced (Glozman et al. (1998); Loering et al. (2001)), and Goldstone-Boson Exchange (Glantschnig et al. (2005)) constituent quark models. The shaded boxes represent experimental data, or more precisely, the ranges of imprecision that measurements of these data currently have (Eidelman (2004)).

7.1.2 The Quantum Spectra Browser

Sonifying baryon mass spectra

The baryon spectra as visualised by patterns such as in Fig. 7.1 allow a discrimination of how well the CQMs describe experiment. One can also read off characteristic features of the different CQMs, such as the distinct level orderings. However, it is quite difficult to conjecture specific symmetries or other relevant properties in the dynamics of a given CQM by just looking at the spectra. Thus, there are a number of open research questions where we expected sonification to be helpful. We began by identifying phenomena that are likely to be discernible in sonification experiments:

Is it possible to distinguish e.g. the spectrum of an N 1/2^+ nucleon from, say, a delta 3/2^+ by listening only?

Is there a common family sound character for groups of particles, or for entire models?

In the confinement-only model, the intentionally absent hyperfine interaction causes data points to merge into one: is this clearly audible?

We studied the sonification of baryon spectra with three specific data sets. They contain the N as well as Δ ground state and excitation levels for three different dynamical situations: 1) the GBE CQM (Glozman et al. (1998)), 2) the OGE CQM (Theussl et al. (2001)), and 3) the case with confinement interaction only, i.e., omitting the hyperfine interaction component. Each one of these data files is made up of 20 lists, and each list contains the energy levels of a particular N or Δ multiplet J^P. The lists are different in length: depending on the given J^P multiplet they contain 2 - 22 entries, since we only take into account energy levels up to a certain limit.

Sonification design

For the sonification of baryon spectra, the most immediately interesting feature is the level spacing. The quantum-mechanical spectrum is bounded from below, and its absolute position is fixed by the N ground state (at 939 MeV); above that, spectral lines up to ca 3500 MeV appear for the excited states in the spectrum of each particle. As the study of these level spacings depends on the precise nature of the distances between these lines within and across particles, a sonification design demands high resolution for that parameter; thus we decided to map these differences between the energy levels to audible frequencies.

Several mapping strategies were tried for an auditory display of the spacings between the energy levels in the spectra: I) Mapping the mass spectra to frequency spectra directly, with tunable transposition together with optional linear frequency shift and spreading, and II) Mapping the (linear) mass spectra to a scalable pitch range, i.e. using perceptually linear pitch space as representation. Both of these approaches can be listened to as simultaneous static spectra (of one particle at a time) and as arpeggios with adjustable temporal spread against a soft background drone of the same spectrum.

Interface design

These models are implemented in SuperCollider3 scripts; for more flexible browsing, a GUI was designed (see figure 7.2). All the tuneable playback settings can be changed

while playing, and they can be saved for reproducibility and an exchange of settings between researchers. Some tuning options have been included in order to account for known data properties: e.g., the values calculated for higher excitations in the mass spectra are considered to be less and less reliable; we modeled this with a tuneable slope factor that reduces the amplitude of the sounds representing the higher excitation levels in all models.

Figure 7.2: The QuantumSpectraBrowser GUI.


The upper window allows for multiple selection of particles that will be iterated over in 2D loops; or alternatively, for direct playback of that particle by clicking. The lower window is for tuning all the parameters of the sonification design interactively.

For static data like these, flexible, interactive comparison between different subsets of the data is a key requirement; e.g. in order to find out whether discrimination by parity P is possible with auditory display, one will want to automatically play interleaved sequences alternating between selected particles with positive and negative parities.

The Quantum Spectra Browser window allows for the following interactions: The buttons Manual, Autoplay choose between manual mode (where a click on a button switches to the associated sound) and an autoplay mode that iterates over all the selected particles, either horizontally (line by line) or vertically (column by column). The buttons LoopStart, LoopStop start and stop this automatic loop; the numberbox stepTime sets for how many seconds each spectrum is presented. The three rows of buttons below Goldstone, OneGluon, Confinement allow for playing individual spectra, or for a multiple selection of which particles are heard in the loop.

The QSB Sound Editor allows for setting many synthesis/spatialisation parameters:

fixedFreq sets the frequency that corresponds to the ground state; the default value is 939 Hz (for 939 MeV).

fRangScale rescales the frequency range the other energy levels are mapped into: a scale of 1 is original values, 2 expands to twice the linear range. As this distorts proportions, we mostly left this control at 1.

transpose transposes the entire spectrum by semitones, so a value of -24 is two octaves down. This leaves proportions intact, and many listeners find this frequency range more comfortable to listen to.

slope determines how much the frequency components for higher energy levels are attenuated; this models the decreasing validity of higher energy levels. 0 is full level, 0.4 means each line is softer by a factor of 1 - 0.4 than the previous line. (The frequency-dependent sensitivity of human hearing is compensated for separately using the AmpComp unit generator.)

panSpread sets how much spectral lines are separated spatially. With a spread of 1, and stereo playback, the ground state is all the way left, and the highest excited state is all the way right; less than 1 means they are panned closer together. When using multichannel playback, this can expand over a series of up to 20 adjacent channels.

panCenter sets where the center line will be panned spatially - 0 is center, -1 is all left, 1 is all right.

The remaining parameters tune the details of an arpeggiation loop: essentially, a loop of spread-out impulses excites the spectral lines individually, and they ring until a minimum level is reached.

ringTime determines how long each component will take to decay (RT for -60 dB) after an impulse.

bgLevel maintains presence of the entire spectrum as one gestalt: the spectral line sounds will only decay to this minimum level and remain at that level.

attDelay determines when within the loop the first attack will play.

attSpread determines how spread out the attacks will be within the loop time.

loopTime determines the time for one cycle of impulses.
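To illustrate mapping strategy I) with some of these controls, the following sketch renders one spectrum as a static chord; the energy levels are made-up placeholder values, not actual CQM output.

(
// Energy levels (MeV) taken directly as frequencies (Hz), then transposed.
// slope attenuates the higher (less reliable) levels; AmpComp compensates
// the frequency-dependent sensitivity of hearing, as in the design above.
var levels = [939, 1450, 1535, 1650, 1720, 2090];   // illustrative values only
var transpose = -24;                                // two octaves down
var slope = 0.4;                                    // each line softer by (1 - slope)
var freqs = levels * (2 ** (transpose / 12));

{
    Mix(freqs.collect { |f, i|
        SinOsc.ar(f) * ((1 - slope) ** i) * AmpComp.kr(f)
    }).dup * 0.1
}.play;
)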

7.1.3 The Hyperfine Splitter

Addressing a more subtle issue, we then designed a Hyperfine Level Splitter, which allows for studying the so-called splittings of the energy levels due to a variable strength of the hyperfine interaction inherent in the CQMs. The hyperfine interaction is needed in order to describe the binding of three quarks more realistically, i.e. in closer accordance with experimental observation. When it is absent (in simulations), certain quantum states are degenerate, meaning that the corresponding energy levels of some particles coincide.

In the first demonstration example, we chose the excitation levels of two different particles (the Neutron n-1/2+ and the Delta d3/2+), calculated within the same CQM, the Goldstone-Boson Exchange model (GBE; Glozman et al. (1998)). These two particles are degenerate when there is no hyperfine interaction present.

Sonification design

Mapped into sound, this means that one hears a chord of three tones for the ground states and the first two excitation levels, which are the same for both particles. Here, auditory perception is more difficult than in the Quantum Browser, as the mass spectra are being played as continuous chords, and the hyperfine interaction may be turned up gradually (to 100 percent). Thereby, the energy levels are pulled apart, and one hears a complex chord of six tones. The two particles that are compared can now be distinguished acoustically, as when they are observed in experiments. With the Level Splitter, the dynamical ingredients leading to these energy splittings may be studied in detail, and likewise the quantitative differences between distinct CQMs.

The underlying sonification design is an extension of that for the Quantum Browser. Mainly, some parameters are added to control the number of spectral lines to be represented at once, and a balance control between the simultaneous or interleaved two channels that are compared.

Interface design

The Hyperfine Data Player window allows for the following interactions: The sets of pop-up menus labeled left and right select which model (GBE, OGE), which particle (Nukleon, Delta, etc.), which state (1/2, 3/2 etc.), and which parity (+, -, both) is chosen for sonification in that audio channel. The slider percent determines where to interpolate between the model points of choice and their corresponding points in the Confinement-only model; this is where the hyperfine interaction component of the model can be gradually turned on or off.

The graphical view below, labeled l3, l2, l1 - l1, l2, l3, shows the precise values for the first several energy states of the two particles chosen. The very bottom is the ground state (939 MeV); the visible range above goes up to 3500. In the state shown, a so-called level crossing can be seen (and heard): level 3 of the GBE nucleon 1/2 (both parities) crosses below level 2; by comparison, in OGE, the same particle has monotonically ascending spectral energy states. The bottom row of buttons stops and starts the sonification, posts the current interpolated values, and recalls a number of prepared demo settings.

The Hyperfine Editor allows for setting many synthesis/spatialisation parameters familiar from the Quantum Spectra Browser, as well as several more:


Figure 7.3: The Hyperfine Splitter GUI.

The left window is for selecting two particles by model, particle name, spin, and parity; the hyperfine component is faded in and out with the slider in the middle. The bottom area shows the audible spectral lines central-symmetrically. The window on the right side is the editor for the synthesis and spatialisation parameters of the sonification design.

balance sets the balance between left and right channels.

bgLevel sets the minimum level for arpeggiated settings, as above.

brightness adds harmonic overtones (by frequency modulation) to the individual lines so that judging their pitch becomes easier.

pitStretch rescales the pitch range the other energy levels are mapped into: a scale of 1 is original values, 2 expands to twice the intervallic range. (This is different from fRangScale above, which used linear scaling.)

transpose transposes the entire spectrum by semitones, as above.

melShift determines when within the loop the second channel's attack will play relative to the first. 0 means they play together, 3 means they are equally spaced apart (by 3 of 6 subdivisions); the maximum of 6 plays them in sync again.

melSpread determines how much the attacks within one channel are arpeggiated; 3 means they appear equally spread in time. Together, these two controls allow alternating the two spectra as groups, or interleaving the individual spectral lines across the two spectra.

ringAtt determines how fast the attack times are for both channels.

ringDecay sets the decay time for the spectral lines for both channels.

nMassesL, nMassesR are handled automatically when changing particle types and properties. This is the number of masses audible in the spectrum, which can be reduced by hand if desired.

ampGrid sets the volume for a reference grid (clicks) which can be turned on for orientation.
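The core of the splitter, interpolating each energy level between its confinement-only value and its full-model value, can be sketched as follows; all level values here are illustrative placeholders, not CQM data.

(
// pct = 0: confinement only (degenerate, three audible tones);
// pct = 1: full hyperfine interaction (six distinct tones).
var confOnly = [939, 1450, 1700];    // degenerate levels shared by both particles
var nucleon  = [939, 1410, 1660];    // full-model levels, first particle
var delta    = [1232, 1600, 1920];   // full-model levels, second particle
var scale = 2 ** (-24 / 12);         // transpose two octaves down

~splitChord = { |pct = 0|
    var levA = confOnly + (pct * (nucleon - confOnly));   // elementwise interpolation
    var levB = confOnly + (pct * (delta - confOnly));
    (levA ++ levB).collect { |mev|
        SinOsc.ar(mev * scale) * AmpComp.kr(mev * scale) * 0.08
    }.sum.dup
};

x = { ~splitChord.(0.0) }.play;    // the degenerate chord
// later: x.release; x = { ~splitChord.(1.0) }.play;   // the fully split chord
)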


7.1.4 Possible future work and conclusions

At this point, the physicists who had worked closely with these data in their own research agenda unfortunately had to leave the project, which meant that this line of experiments came to an end before more interesting ideas could be experimented with. These were intended to explore a number of further aspects that may ultimately be relevant in the scientific study of particle physics by sonification; for completeness, these loose ends are given here.

Comparison with experimental data

As can be seen from figure 7.1, there are several experimental data available for the energy levels. However, they are affected by experimental uncertainties. Consequently, their auditory display needs some adaptations. We intended to differentiate between (sharp) theoretical data as deduced from the CQMs and (spread) phenomenological data measured in experiment by adding narrow band modulation to spread-out data bands. It should be quite interesting to qualify the theoretical predictions vis-a-vis the experimental data.

Representing symmetries with spatial ordering

Much effort has gone into finding visual representations for the multiple symmetries between particle groups and families. Arranging the sound representations in 3D space with virtual acoustics, in a spatial order determined by symmetry properties between particle groups, may well be scientifically interesting; navigating such a symmetry space could become an experience that lets physicists acquire a more intuitive notion of the nature of these symmetries.

Temporal aspects

There are plenty of interesting time phenomena in quantum physics which could be made use of in numerous ways in further explorations. For example, there is an enormous variation in the half-life of different particles. This could be expressed quite directly in differentiated decay times for different spectral lines. In addition, including the probabilities for transitions between excited states and ground states will open promising possibilities for demonstrating the dynamical ingredients in the quark interactions inside baryons.

Conclusions

Our investigations have indicated that sonification is an interesting alternative and a promising complementary tool for analysing quantum-mechanical data. While many interesting design ideas came up in this line of research, which may well be useful for other contexts, the implemented sonification designs were not fully tested by domain experts in this quite specialised field. Given motivated domain science research partners, a number of good candidates for sonification approaches remain to be explored further in the context of quantum spectra.


7.2 Sonification of Spin models²

Spin models provide an interesting test case for sonification in physics, as they model complex systems that are dynamically evolving and not satisfactorily visualisable. While the theoretical background is largely understood, their phase transitions have been an interesting subject of study for decades, and results in this field can be applied to many scientific domains. While most classical methods of solving spin models rely on mean values, their most important feature, especially at the critical point of phase transition, are the spin fluctuations of single elements. Therefore we started out with the fluctuations of the spins, and provided auditory information that can be analysed qualitatively. The goal was to display three-dimensional dynamic systems, distinguish the different phases, and study the order of the phase transition. Audification and sonification approaches were implemented for the spin models studied, so that both realtime monitoring of the running model and analysis of prerecorded data sets are possible. Sound examples of these sonifications are described in Appendix C.2.

7.2.1 Physical background

Spin systems describe macroscopic properties of materials (such as ferromagnetism) by computational models of simple microscopic interactions between single elements of the material. The principal idea of modeling spin systems is to study complex systems in a controlled way, where they are theoretically tractable and mirror the behaviour of real compounds. From a theoretical perspective, these models are interesting because they allow studying the behaviour of universal properties in certain symmetry groups. This means that some properties do not depend on details like the kind of material, such as the so-called order parameters giving the order of the phase transition. Already in 1945, E. A. Guggenheim (cited in Yeomans (1992)) found that the phase diagrams of eight different fluids he studied show the very same coexistence curve³. A theoretical explanation is given by a classification in symmetry groups: all of these different fluids belonged to the same mathematical group.
2 This section is based on a SonEnvir ICAD conference paper, Vogt et al. (2007).
3 This becomes apparent when plotted in so-called reduced variables, the reduced temperature being T/Tcrit, the actual temperature relative to the critical one; pressure is treated likewise.


7.2.2 Ising model

One of the first spin models, the Ising model, was developed by Ernst Ising in 1924 in order to describe a ferromagnet. Since the development of computational methods, this model has become one of the best-studied models in statistical physics, and has been extended in various ways.

Figure 7.4: Schema of spins in the Ising model as an example for Spin models.
The lattice size here is 8 x 8. At each lattice location, the spin can have one of two possible values, or states (up or down).

Its interpretation as a ferromagnet involves a simplified notion of ferromagnetism.⁴ As shown in figure 7.4, it is assumed that the magnet consists of simple atoms on a quadratic (or, in three dimensions, cubic) lattice. At each lattice point an atom (here, a magnetic moment with a spin of up or down) is located. In the computational model, neighbouring spins try to align with each other, because this is energetically more favorable. On the other hand, the overall temperature causes random spin flips. At a critical temperature Tcrit, these processes are in a dynamic balance, and there are clusters of spins on all orders of magnitude. If the temperature is lowered from Tcrit, one spin orientation will prevail. (Which one is decided by the random initial setting.) Macroscopically, this is the magnetic phase (T < Tcrit). At T > Tcrit, the thermal fluctuations are too strong for uniform clusterings of spins. There is no macroscopic magnetisation, only thermal noise.
4 There are many different application fields for systems with next-neighbour interaction and random behaviour. Ising models have even been used to describe social systems, as e.g. in P. Fronczak (2006), though this is a disputed method in the field.


7.2.3 Potts model

A straightforward generalisation of this model is the admission of more spin states than just up and down. This was realised by Renfrey B. Potts in 1952, and the model was accordingly called the Potts model. Several other extensions of these models have been studied in the past. We worked with the q-state Potts model and its special case for q = 2, the Ising model, both being classical spin models. For the mathematical background, see Appendix C.2.

The order of the phase transition is defined by a discontinuity in the derivatives of the free energy (see figure 7.5). If there is a finite discontinuity in one of the first derivatives, the transition is called first order. If the first derivatives are continuous, but the second derivatives are discontinuous, it is a so-called continuous phase transition.

Figure 7.5: Schema of the orders of phase transitions in spin models.


The mean magnetisation is plotted vs. decreasing temperature. (a) shows a continuous phase transition and (b) a phase transition of first order. In the latter, the function is discontinuous at the critical temperature. The roughly dotted line gives an approximation on a finite system, e.g. a computational model. The bigger the system, the better this approximation models the discontinuous behaviour.

Nowadays, spin models are usually simulated with Monte Carlo algorithms, giving the most probable system states in the partition function (Yeomans, 1992, p. 96). We implemented a Monte Carlo simulation for an Ising and a Potts model in SuperCollider3 (see figure 7.6). The lattice is represented as a torus (see fig. 7.8) and continually updated: for each lattice point, a different spin state is proposed, and the new overall energy calculated. As shown in equation C.3, it depends on the neighbours' interactions (Si Sj) and the overall temperature (given by the coupling J ∝ 1/T). If the new energy is smaller than the old one, the new state is accepted. If not, there is still a certain chance that it is accepted, leading to random spin flips representing the overall temperature (a minimal sketch of this update rule follows at the end of this subsection). To observe the model and draw conclusions from it, usually mean values of observables are calculated from the Monte Carlo simulation, e.g. the overall magnetisation. The simulation needs time to equilibrate at each temperature in order to model physical reality, e.g. with small or large clusters. Big lattices with a length of e.g. 100 need many equilibration steps.

With a typical evolution of the model, critical values or the order of the phase transition can be deduced. This is not rigorously doable, as on a finite lattice a function will never be continuous, compare figure 7.5. In a quantised system, the jump in the observable will just look more sudden for a first order phase transition. This last point is both an argument for using sonification and a research goal for this study: by using more information than the mean values, the order of the phase transition can be more clearly distinguished. Also, we studied different phase transitions with the working hypothesis that there might be principal differences in the fluctuations, which can be better heard. (A Potts model with q ≤ 4 states has a continuous phase transition, whereas with q ≥ 5 states it has a phase transition of first order.) Thus researchers may gain a quick impression of the order of the phase transition.

Implementing spin models

In all the analytical approaches, the solving procedures of models are based on abstract mathematics. This gives great insight into the universal basics of critical phenomena, but often a quick glance at a graph complements classical analysis, as mentioned above. Thus, in areas where visualisation cannot be done, applying sonification can help to reach an intuitive understanding with relatively few underlying assumptions. Sonification tools can also serve as monitoring devices for highly complex and high-dimensional simulations. The phases and the behaviour at the critical temperature can be observed. Finally, we were particularly interested in the sonification of the critical fluctuations, with self-similar clusters on all orders of magnitude. We wanted to provide for a more or less direct observation of data on all levels of the analysis, both to verify assumptions and to not overlook new insights. This should be done by observing the dynamic evolution of the spins, not only mean values. Thus, the important characteristic of spin fluctuations can be studied and the entire system continuously observed.

Spin model data features

Spin models have several basic characteristics, which were used in different sonification approaches. These properties refer to the structure of the model, the theoretical background and its interpretation, and they were exploited for the sonification as follows:

The models are discrete in space by fixed lattice positions, and these are filled with discrete-valued spins. The data sets are rather big, on the order of a lattice size of 100 in two or three dimensions, and are dynamically evolving.


Figure 7.6: GUI for the running 4-state Potts Model in 2D.
The GUI shows the model in a state above critical temperature, where large clusters emerge. The lattice size is 64x64. The averages below the spin frame show the development of the mean magnetisation for the 4 spin parities over the last 50 configurations. As the temperature is constant and the system has been equilibrated before, these mean values are rather constant.

Because of the specifics of the modeling, the simulations are only correct on the statistical average, and many configurations have to be taken into account together for correct interpretation. Considering that a single auditory event has to have some minimum duration to display perceptually distinguishable characteristics, we explored two options for the auditory display: a fast audification approach, and omission, i.e. representing only a subset of all spins, using a granular approach.

The models are calculated by next-neighbour interaction aligning the spins on the one hand, and random fluctuations on the other. We aimed to preserve the next-neighbour property at least partially by different strategies of moving through the data frame: either along a conventional torus path, or along a Hilbert curve, see fig. 7.8 (in approaches 7.2.4, 7.2.5 and 7.2.7). For the lossy (omission) approach, the statistical nature of the model was preserved by picking random elements for the granular sonification.

There is a global symmetry in the spins; thus - in the absence of an exterior magnetic field - no spin orientation is preferred. This was mapped for the Ising model by choosing the octave for the two spin parities. In the audifications, every spin orientation is assigned a fixed value, and symmetry is preserved as the sound wave only depends on the relative difference between consecutive steps in the lattice.

At the critical point of phase transition, the clusters of spins become self-similar on all length scales. We tried to use this feature in order to generate a different sound quality at the point of phase transition. This would allow a clear distinction between the two phases and the (third) different behaviour at the critical temperature itself.
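As a concrete reference for the update rule described above, here is a minimal sketch of one Metropolis sweep for the 2D Ising case with periodic (torus) boundaries; lattice size and coupling are arbitrary example values.

(
// One Metropolis sweep over a 2D Ising lattice of +-1 spins.
// The coupling j absorbs 1/T as in the text; j of about 0.44 is critical in 2D.
var n = 32;
var j = 0.5;
var lattice = { { [-1, 1].choose } ! n } ! n;
var sweep = {
    n.do { |x|
        n.do { |y|
            var s = lattice[x][y];
            var nb = lattice[(x + 1) % n][y] + lattice[(x - 1) % n][y]
                + lattice[x][(y + 1) % n] + lattice[x][(y - 1) % n];
            var dE = 2 * j * s * nb;   // energy change if s were flipped
            // accept if the energy decreases, else with probability exp(-dE)
            if ((dE <= 0) or: { 1.0.rand < dE.neg.exp }) {
                lattice[x][y] = s.neg;
            };
        };
    };
};
sweep.value;   // repeat many times to equilibrate at this coupling
)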

7.2.4 Audification-based sonification

In this approach, we tried to utilise the full available information generated by the model. As the Sonification Design Space Map suggests audification for higher density auditory display, we interpreted the spins within each time instant as a waveform (see figure 7.7). This waveform can be listened to directly, or taken as a modulator of a sine wave.⁵ When the temperature is lowered, regular clusters emerge, changing only slowly from time step to time step. Thus, if the audification preserves locality, longer structures will emerge aurally as well, resulting in more tone-like sounds. When one spin dominates, there is silence, except for some random thermal fluctuations at non-zero temperature.
5 While this would not qualify as an audification by the strictest definition, such a simple modulation is still conceptually quite close.


Figure 7.7: Audification of a 4-state Potts model.

The first 3 milliseconds of the audio file of a model with 4 different states in the high temperature phase (noise).

Figure 7.8: Sequentialisation schemes for the lattice used for the audification.

The left scheme shows a torus sequentialisation, where spins at opposed borders are treated as neighbours. This treats a 2D grid like a torus (a doughnut shape), as it is read row by row. On the right side a Hilbert curve is shown.

While fig. 7.7 explains the handling of one line of data for the sonification, the question remains how to move through all of them. Different approaches to sequentialisation are shown in fig. 7.8. The model has periodic boundary conditions, so a torus path is possible. We also experimented with moving through the lattice along a Hilbert curve. This is a space-filling curve for quadratic geometries, reaching every point without intersecting itself. This was intended to make the audification insensitive to differences which arise depending on whether rows or columns are read first, which can occur in the case of symmetric clustering. Eventually, it turned out that symmetric clustering mainly depends on unfavorable starting conditions and occurs only rarely, so we mostly used a torus path, as the model does in the calculation. The sounds were recorded directly from the interactive model, using the GUI shown in fig. 7.6, for a specific temperature. In order to judge the phase of the system, this simple method is most efficient.

At the time of recording, the model has already been equilibrated: its state represents a typical physical configuration for the specific temperature. When the temperature is cooled down continually, the system needs several transition steps at each new temperature before the data represents the new physical state correctly. Thus, in a second approach, data was pre-recorded and stored as a sound file.

Contrary to our assumptions, the continuous phase transition is not very clearly distinguishable from the first order phase transition. This is partly due to the data - on a quantised lattice there are no truly continuous observables, so the distinction between first and second order transitions is fuzzy in principle. A fundamental problem is that the equilibration steps (which are not recorded!) between the stored configurations cut out the meaningful transitions between them. That these equilibration steps are needed at all is in fact a common drawback of the established computational spin models. When one considers every complete lattice state as one sequence of single audio samples (e.g. 32x32 = 1024 lattice sites), then with a sampling frequency of 44100 Hz, every 23 ms a potentially completely different state is rendered, instead of a continuously evolving system with only few changes in the cluster structures from one frame to the next. This makes it more difficult to understand the dynamic evolution of the transitions. We tried to leave out as few equilibration steps as possible, to stick closely to a physically relevant state and still keep the transitions understandable. Consequently, we recorded e.g. for a 32x32 lattice every 32nd step, and on the whole 10 different couplings (temperatures), each recorded 32 times. Thus, our sound files (described in appendix C.2) have (32 x 32) lattice sites x 10 couplings x 32 record steps = 327680 samples, and last 7.4 s. Still, when comparing a 4-state Potts model to one with 5 spin states, the change in the audio pattern is only slightly more sudden in the latter.
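Assuming ~configs holds such a series of recorded configurations (e.g. every 32nd step of a 32x32 run) as nested arrays of +-1 spins, the audification itself reduces to loading them into a buffer, sketched as follows.

(
// Audification sketch: spins become audio samples, row by row (torus path).
var samples = ~configs.flat * 0.5;   // sequentialise all configurations, scale amplitude
b = Buffer.loadCollection(s, samples, 1, { |buf|
    // at 44100 Hz, each 32x32 frame (1024 samples) lasts about 23 ms
    { PlayBuf.ar(1, buf, doneAction: 2) ! 2 }.play;
});
)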

7.2.5 Channel sonification

We refined the audification approach by recording data for each spin state separately. This concept is shown in figure 7.9. The whole lattice is sequentialised like a torus (see fig. 7.8) and read out for every spin state separately. When data of spin A is collected, only lattice sites with spin A are set to 1, all the others to 0. Conversely, when spin B data is collected, all lattice sites with spin A are set to 0, and spin B to 1; and so forth. Thus, the different spins are separate and can be played on different channels. One remaining problem is that the channels are highly correlated: in the Ising model with only 2 states, the 2 channels are exactly reciprocal. Thus there may be phase cancellations in the listening setup that make it harder to distinguish the channels. Still, the overall impression is clearer than with the simple audification, and this approach is the most promising regarding the order of the phase transition.
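The per-state separation can be sketched as a simple indicator encoding; the configuration here is a toy flattened lattice.

(
// Split one q-state configuration into q indicator channels:
// 1 where a site carries that spin state, 0 elsewhere.
var q = 3;
var config = { q.rand } ! 1024;   // toy flattened 32x32 lattice
var channels = q.collect { |state|
    config.collect { |spin| if (spin == state) { 1 } { 0 } }
};
// each channel can now be audified on its own speaker, e.g. via
// Buffer.loadCollection as in the audification sketch above
)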

Figure 7.9: A 3-state Potts model cooling down from super- to subcritical state.

The three states are recorded as audio channels, shown here with time from left to right. Toward the end, channel 2 dominates.

7.2.6 Granular sonification

In this approach, the data were pre-processed, which allowed for designing less fatiguing sounds. Also, more sophisticated considerations can be included in the sonification design. In a cloud sonification, we first sonified each individual spin as a very short sound grain, and played them at high temporal density. A 32x32 lattice (1024 points) can be played within one second; allowing some overlap, this leaves on the order of 3 ms for each sound grain. One second is a longer than desirable time for going through one entire time instant, but this is simply a trade-off between representing all the available data for that time instant, and moving forward in time fast enough.

For bigger lattices, this approach is too slow for practical use. Thus the next step was calculating local mean values. We took random averaged spin blocks in the Ising model⁶ (see figure 7.10), so the data was pre-processed for the sonification, and we did not use all available information.
6 In this sonification we stayed with the simpler Ising model due to realtime CPU limitations, but the results do transfer to the Potts model.


Figure 7.10: Granular sonification scheme for the Ising model.

The spatial location of each randomly chosen spin block within the grid determines its spatialisation, and its averaged value determines the pitch and noisiness of the corresponding grain.

At first, for each configuration a few lattice sites are chosen; then for each site, the average of its neighbouring region is calculated, giving a mean magnetic moment between -1 (all negative) and +1 (all positive), 0 meaning the ratio of spins is exactly half/half. This information is used to determine the pitch and the noisiness of a sound grain. The more the spins in one block are alike, the clearer the tone (either lower or higher); the less alike, the noisier the sound. The location of the block in 3D space is given by the spatial position of the sound grain.⁷ The sound grains are very short and played quickly after one another from different virtual regions. With this setting, a three-dimensional gestalt of the local state of a cubic lattice is generated around and above the listener. Without seeing the state of the model, a clear picture emerges from the granular sound texture, and also untrained listeners can easily distinguish the phases of the model.
7 This spatial aspect can only be properly reproduced with a multi-channel sound system. We adapted the settings for the CUBE, a multi-functional performance space with a permanent multichannel system at the IEM Graz. Using the VirtualRoom class described in section 5.5, one can also render this sonification for headphones.

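The grain mapping can be sketched as follows, reduced to stereo; the block averages and positions are random placeholders instead of values read from a running model.

(
// Grain sketch: local mean magnetisation (-1..+1) controls pitch and
// noisiness; block position controls panning (a stereo stand-in for 3D).
SynthDef(\spinGrain, { |mag = 0, pan = 0, amp = 0.1|
    var tone = SinOsc.ar(mag.linlin(-1, 1, 300, 1200));
    var noisiness = 1 - mag.abs;                 // mixed spins -> noisier grain
    var sig = XFade2.ar(tone, PinkNoise.ar, (noisiness * 2) - 1);
    var env = EnvGen.kr(Env.perc(0.001, 0.03), doneAction: 2);
    Out.ar(0, Pan2.ar(sig * env, pan, amp));
}).add;
)

(
Routine {
    200.do {
        Synth(\spinGrain, [\mag, 1.0.rand2, \pan, 1.0.rand2]);
        0.005.wait;   // high temporal density
    };
}.play;
)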

7.2.7 Sonification of self-similar structures

To study a detail aspect of the above approach, we looked at self-similar structures at the point of phase transition by sonification. Music has been considered to exhibit self-similar structures, beginning with Voss and Clarke (1975, 1978); later on, the general popularity of self-similarity within chaos theory also extended to computer music, and the hypothesis that self-similar structures may be audible has led to much experimentation and to compositions with such a conceptual background. In internal listening tests we tried to display structures on several orders of magnitude in parallel. These were calculated by a blockspin transformation, which returns essentially the spin orientation of the majority of points in a region of the lattice. It was our goal to make such structures of different orders of magnitude recognisable as similarly moving melodies, or as a unique sound stream with a special sound quality.

Figure 7.11: A self-similar structure as a state of an Ising model.

This is used as a test case for detecting self-similarity. Blockspins are determined by the majority of spins of a certain region.

In our design, three orders of magnitude in the Ising model were compared to each other, as shown in figure 7.11. The whole lattice (on the right side, with the least resolved blockspins) was displayed in the same time as a quarter of the middle structure, and as an eighth of the left blockspin structure (second on the left side). The original spins are shown on the left.

Comparing three simultaneous streams for similarities turned out to be a demanding cognitive task: trying to follow three streams and comparing their melodic behaviour at the same time is not trivial, even for trained musicians. Thus we experimented with an alternative: the three streams representing different orders of magnitude are interleaved quickly. When the streams are self-similar, one only hears a single (random) stream; as soon as one stream is recognisably different from the others, a triple grouping emerges. While this method works well with simple test data as shown in fig. 7.11, we could not verify self-similarities in noisy data of running spin models.

We suspect that self-similar structures do not persist long enough for detection in running models, but for time reasons we did not pursue this further.
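A minimal sketch of the blockspin transformation described above, for a square lattice of +-1 spins stored as nested arrays:

(
// Each 2x2 block reduces to its majority spin, halving the resolution;
// applying the function repeatedly yields the coarser orders of magnitude.
var blockspin = { |lat|
    var n = lat.size;
    (n div: 2).collect { |bx|
        (n div: 2).collect { |by|
            var sum = lat[2 * bx][2 * by] + lat[2 * bx + 1][2 * by]
                + lat[2 * bx][2 * by + 1] + lat[2 * bx + 1][2 * by + 1];
            if (sum >= 0) { 1 } { -1 }   // majority vote; ties resolved as +1 here
        }
    }
};
var lattice = { { [-1, 1].choose } ! 16 } ! 16;
var coarse = blockspin.(lattice);     // 8x8 blockspins
var coarser = blockspin.(coarse);     // 4x4 blockspins
)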

7.2.8 Evaluation

Domain Expert Opinions

A listening test with statistical analysis was not appropriate, as not enough subjects familiar with researching spin models were available. Thus, as a qualitative evaluation, we obtained opinions from experts in the field. These were four professors of Theoretical Physics in Graz, who were not directly involved in the sonification designs. The results were explained to them, and they were given a few questions on the applicability and usefulness of the results.

The overall attitude may be summed up as curious but rather sceptical, even if the opinions differed in the details. Asked whether they themselves would use the sonifications, all of them answered that they would do so only for didactic reasons or popular scientific talks. The possibility of identifying different phases was acknowledged, but was not seen as superior to other methods (e.g. studying graphs of observables, as would be the standard procedure). One subject remarked that, for research purposes, the aha-moment was missing. This might be due to the fact that the Ising and Potts models have both been studied for decades and are well understood. As the data is mainly thermal noise, there is only little information to extract. Our sonifications reveal no new physical findings for the models we chose.

A three-dimensional display seems interesting to the experts, even if the dimensions are not experienced explicitly (in the audification approach there is a sequentialisation for displaying one dimension), and the sound grain approach as implemented only applies to three physical dimensions. Another application that was discussed is a quick overview over large data sets: e.g. checking numerical parameters (that there are enough equilibration steps, for instance) or getting a first impression of the order of the phase transition. This seems plausible to all subjects, even if the standard procedure, e.g. a program for pattern recognition, would still be equivalent and - given the familiarity with such tools - preferable to them. The main point of criticism was the idea of a qualitative rather than quantifiable approach towards physics, which is seen as a possible didactics tool but not hard science. General sonification problems were discussed as well: it was noted that visualisation techniques play a more and more important role in science, and that they are tough competitors. Also for the current state of publishing, sonification is at a disadvantage.

Besides this expected scepticism, it can be remarked that all subjects immediately heard the differences in the sound qualities. Metaphors for the sounds came up spontaneously during the introduction, e.g. boiling water for the point of phase transition. The experts came up with several ideas for future projects to discuss; this kind of interest is an encouraging form of feedback.

Conclusions and Possible Future Work

Spin models are interesting test cases for studying sonification designs for running models. We implemented both Monte Carlo simulations of Potts and Ising models and sonification variants in SuperCollider3. These models produce dynamically evolving data whose main characteristics are the fluctuations of single spins; although analytically well defined, finite computational models can only reproduce a numerical approximation of the predicted behaviour, which has to be interpreted. A number of different sonifications were designed in order to study different aspects of these spin models. We created tools for the perceptualisation of lattice calculations which are extensible to higher dimensions and a higher number of states. They allow both observing running models, and analysing pre-recorded data to obtain a first impression of the order of the phase transition.

Experimenting with alternative sonification techniques for the same models, we found differing sets of advantages and drawbacks: Granular sonification of spin blocks gives a reliable classification of the phase the system is in, and allows observing running simulations, using the random behaviour of spin models. Audification-based tools allow us to make use of all the available data, and even to track each spin orientation separately in parallel. This tool is used to study the order of the phase transition. Additionally, we worked on sonifications of self-similar structures.

With this study, sonification was shown to be an interesting complementary data representation method for statistical physics. Useful future directions for extending this work would include increased data quality and choices of different input models, which would lead to classification tools for phase transitions that allow studying models of higher dimensionality. Continued work in this direction could lead to applications to current research questions in the field of computational physics. The research project QCDAudio, hosted at IEM Graz with SonEnvir participant Kathi Vogt as lead researcher, will explore some of these directions.

Chapter 8

Examples from Speech Communication and Signal Processing


The Signal Processing and Speech Communication Laboratory at TU Graz focuses on research in the area of non-linear signal processing methods, algorithm engineering, and applications thereof in speech communication and telecommunication. After investigating sonification approaches to the analysis of stochastic processes and wave propagation in ultra-wide-band communication (briefly mentioned in de Campo et al. (2006a)), the focus for the last phase of SonEnvir was on the analysis of time series data. In signal processing and speech communication, most of the data under study are sequences of values over time. There are many properties of time series data that interest the researcher: besides analysis in the frequency domain, the statistical distribution of values provides important information about the data at hand. With the Time Series Analyser, we investigated the use of sonification in analysing the statistical properties of amplitude distributions in time series data. From the domain science's point of view, this can be used as a method for the classification of signals of unknown origin, or for the classification of surrogate data to be used in experiments on telecommunication systems.

8.1 Time Series Analyser

(This section is based on the SonEnvir ICAD paper Frauenberger et al. (2007).)

The analysis of time series data plays a key role in many scientific disciplines. Time series may be the result of measurements, of unknown processes, or simply digitised signals of a variety of origins. Although usually visualised and analysed through statistics, their inherent relationship to time makes them particularly suitable for representation by means of sound.


8.1.1 Mathematical background

The statistical analysis of time series data is concerned with the distribution of values, without taking into account their sequence in time. As we will see later, changing the sequence of values in a time series completely destroys the frequency information while keeping the statistical properties intact. The best-known statistical properties of time series data are the arithmetic mean (8.1) and the variance (8.2):

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (8.1)

\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (8.2)

However, higher-order statistics provide more properties of time series data, describing the shape of the underlying probability function in more detail. They all derive from the statistical moments of a distribution, defined by

\mu_n = \sum_{i=1}^{n} (x_i - \mu)^n \, P(x) \qquad (8.3)

where n is the order of the moment, \mu the value around which the moment is taken, and P(x) the probability function. The moments are most commonly taken around the mean, which is equivalent to the first moment \mu_1. The second moment around the mean (or second central moment) is equivalent to the variance \sigma^2, and hence to the squared standard deviation \sigma. Higher-order moments define the skewness and kurtosis of the distribution. The skewness is a measure of the asymmetry of the probability function: a distribution has high skewness if its probability function has a more pronounced tail toward one end than toward the other. The skew is defined by

\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} \qquad (8.4)

with \mu_i being the i-th central moment. The kurtosis describes the peakedness of a probability function; the more pronounced the peak of the probability function, the higher the kurtosis of the distribution. It is defined by

\gamma_2 = \frac{\mu_4}{\mu_2^{2}} \qquad (8.5)

Both values distinguish time series data and are significant properties in signal processing. From the SDSM point of view, the inherent time line and the typically large number of data values in time series data suggest the most direct approach to auditory perceptualisation - audification. When interpreted as a sonic waveform, the statistical properties of time series data become acoustical dimensions which may be perceived: the variance corresponds directly to the power of the signal, and hence (though non-linearly) to its perceived loudness. The mean, however, is nothing more than an offset and is not perceivable. The question of interest is whether the skewness and the kurtosis of signals can be related to perceptible dimensions as well.
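For illustration, these quantities are easy to compute numerically with the empirical (sample) analogues of eq. (8.3). The following minimal Python sketch is not part of the original SonEnvir tools (which were written in SuperCollider); all names are illustrative:

```python
import numpy as np

def moment_stats(x):
    """Empirical mean, variance, skew (8.4) and kurtosis (8.5) of a series."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()                     # arithmetic mean, eq. (8.1)
    mu2 = ((x - mean) ** 2).mean()      # variance, eq. (8.2)
    mu3 = ((x - mean) ** 3).mean()      # third central moment
    mu4 = ((x - mean) ** 4).mean()      # fourth central moment
    skew = mu3 / mu2 ** 1.5             # eq. (8.4)
    kurtosis = mu4 / mu2 ** 2           # eq. (8.5)
    return mean, mu2, skew, kurtosis

# Scrambling a series leaves all of these values intact, as noted above:
rng = np.random.default_rng(0)
x = rng.standard_normal(44100)
assert np.allclose(moment_stats(x), moment_stats(rng.permutation(x)))
```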

8.1.2 Sonification tools

In order to investigate the statistical properties of time series data by audification, we first developed a simple tool that allows for defining arbitrary probability functions for noise. Subsequently, we built a more generic analysis tool that makes it possible to analyse any kind of signal. This tool was also used as the underlying framework for the experiment described in section 8.2.

8.1.3 The PDFShaper

The PDFShaper is an interactive audification tool that allows users to draw probability functions and listen to the resulting distribution as an audification in real time. Figure 8.1 shows the user interface. PDFShaper provides four graphs (top down): the probability function, the mapping function, the measured histogram, and the frequency spectrum of the time series synthesised as specified by the probability function. The tool allows the user to interactively draw in the first graph to create different kinds of amplitude distributions. It then calculates a mapping function defined by

C(x) = \int_0^x P(t)\,dt, \qquad g(x) = C^{-1}(x) \qquad (8.6)

where C(x) is the cumulative probability function and g(x) is a mapping function that, if applied to a uniform distribution y, produces values according to the probability function P(t). This mapping function essentially shapes values from a uniform distribution into any desired probability function P(t). In the screenshot shown, the probability function drawn into the top graph is a shifted exponential function. After applying the mapping function shown in the second graph to white noise, the third graph shows the real-time histogram of the result, which approximately resembles the target probability function. Note that both skew and kurtosis are relatively high in this example, as the probability function is shifted to the right and has a sharp peak.
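The inverse-CDF shaping of eq. (8.6) can be sketched compactly; the following hedged Python version (the actual PDFShaper is a SuperCollider program, and this table-lookup inversion is only one plausible realisation) shapes uniform noise according to a drawn curve:

```python
import numpy as np

def shape_noise(pdf_values, n_samples, seed=0):
    """Shape uniform noise to a drawn probability function via eq. (8.6)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(pdf_values, dtype=float)
    p = p / p.sum()                    # normalise the drawn curve to a PDF
    cdf = np.cumsum(p)                 # C(x), the cumulative probability
    u = rng.uniform(0.0, 1.0, n_samples)
    idx = np.searchsorted(cdf, u)      # g = C^-1, inverted by table lookup
    idx = np.minimum(idx, len(p) - 1)  # guard against rounding at the CDF end
    return idx / (len(p) - 1)          # values in [0, 1], distributed ~ P

# Example: a shifted exponential PDF with a sharp peak at the right,
# roughly as drawn in the screenshot
t = np.linspace(0.0, 1.0, 512)
noise = shape_noise(np.exp(-8.0 * (1.0 - t)), 44100)
```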


Figure 8.1: The PDFShaper interface

8.1.4 TSAnalyser

The TSAnalyser is a tool to load any time series data and analyse its statistical properties. Figure 8.2 shows the user interface. Besides providing statistical information about the file loaded (aiff format), it shows a histogram and a spectrum. Its main feature is the ability to scramble the signal.


Figure 8.2: The TSAnalyser interface

That is, it randomly re-orders the values in the time series and hence destroys all spectral information. When analysing amplitude distributions, the spectral information is often distracting; scrambling a signal results in a noise-like sound with the same statistical properties as the original. In the screenshot, the loaded file is a speech sample that comes with every SuperCollider installation. When scrambled, the spectrum at the bottom shows an almost uniform distribution in the frequency domain. Both PDFShaper and TSAnalyser are implemented in SuperCollider, and are available as part of the SonEnvir Framework via svn at https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Framework/.
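The core scrambling idea can be sketched in a few lines of Python; this is an illustration of the principle, not the TSAnalyser code itself:

```python
import numpy as np

rng = np.random.default_rng(1)
sr = 44100
t = np.arange(sr) / sr

# Any signal with strong spectral structure, e.g. a 440 Hz tone plus noise
tone = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)

scrambled = rng.permutation(tone)   # same amplitude statistics, new order

spec_orig = np.abs(np.fft.rfft(tone))
spec_scr = np.abs(np.fft.rfft(scrambled))
# spec_orig has a sharp 440 Hz peak; spec_scr is approximately flat,
# i.e. the scrambled version sounds noise-like
```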


8.2 Listening test

The experiment described here was designed to investigate whether the higher-order statistical properties of arbitrary time series data are perceptible when rendered by audification; and if so, which perceptual dimensions correlate with these properties, and what the just noticeable difference levels are.

8.2.1 Test data

The first challenge in designing the experiment was to create appropriate data: it should not contain any spectral information, and the statistical properties should be fully controllable, ideally independently. Unfortunately, defining probability functions with given statistical moments is a non-trivial, ill-defined problem. We settled on a random number generator for the Lévy skew alpha-stable distribution (Wikipedia (2007)). It was chosen because it features parameters that directly control the resulting skew and kurtosis, which can also be made atypically high. It is defined by the probability function

f(x; \alpha, \beta, c, \mu) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \varphi(t)\, e^{-itx}\, dt \qquad (8.7)

\varphi(t) = e^{\,it\mu - |ct|^{\alpha}\,(1 - i\beta\,\mathrm{sign}(t)\,\Phi)} \qquad (8.8)

\Phi = \tan(\pi\alpha/2) \qquad (8.9)

where \alpha is an exponent, \beta directly controls the skewness, and c and \mu are scaling parameters. There is no analytic solution to the integral, but there are special cases in which the distribution behaves in specific ways; for example, for \alpha = 2 the distribution reduces to a Gaussian distribution. Fortunately, the Lévy distribution is implemented as a number generator in the GNU Scientific Library GSL, see GSL Team (2007). It allows for generating sequences of numbers of any length for a distribution determined by the \alpha and \beta parameters. For the experiment we generated 24 signals with skew values ranging from -0.19 to 0.25 and kurtosis ranging from 0.17 to 14. It turned out to be impossible to completely decouple skew from kurtosis, so we decided to generate two sets: one with insignificant changes in skew but a range in kurtosis of 0.16 to 14, while the other covered the full range of skew and 0.15 to 5 for kurtosis. All signals were normalised to a variance of 0.001 and were 3 seconds long (at a sample rate of 44.1 kHz), with 0.2 second fade-in and fade-out times.
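A present-day equivalent of the GSL generator exists in SciPy; the following hedged Python sketch shows how such test signals could be produced (the parameter values are illustrative, not the ones actually used for the 24 signals):

```python
import numpy as np
from scipy import stats

def make_test_signal(alpha, beta, seconds=3.0, sr=44100, var=0.001,
                     fade=0.2, seed=0):
    """Alpha-stable noise, normalised and faded as in the experiment."""
    n = int(seconds * sr)
    x = stats.levy_stable.rvs(alpha, beta, size=n,
                              random_state=np.random.default_rng(seed))
    x -= x.mean()
    x *= np.sqrt(var / x.var())       # set the sample variance to 0.001
    nf = int(fade * sr)               # 0.2 s linear fade-in and fade-out
    env = np.ones(n)
    env[:nf] = np.linspace(0.0, 1.0, nf)
    env[-nf:] = np.linspace(1.0, 0.0, nf)
    return x * env

sig = make_test_signal(alpha=1.7, beta=0.5)  # heavier tails -> higher kurtosis
```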


8.2.2 Listening experiment

The experiment was designed as a similarity listening test. Participants listened to sequences of three signals and had to select the two signals they perceived as most similar. Each sequence was composed of the signal under investigation (each of the 24), a second randomly chosen signal out of the 24, and the first signal scrambled; the three signals were played in random order. It was pointed out to participants that they would not hear two exactly identical sounds within the sequence, but they were asked to select the two that sounded most similar. The signal under investigation and its scrambled counterpart were essentially different signals, but shared identical statistical properties. It was not specified which quality of the sound they should listen for to make this decision. This, together with the scrambling, was done to make sure that participants focused on a generic quality of the noise rather than on specific events within the signals. After a brief written introduction to the problem domain and the nature of the experiment, participants started off with a training phase of three sequences to learn the user interface. For this training phase, the signals with the largest differences in skew and kurtosis were chosen, to give people an idea of what to expect. Subsequently, each of the sets was played: set one with 9 sequences, set two with 15. The order of the sets alternated between participants. Participants were able to replay each sequence as often as they wished, and to adjust the volume to their taste. Figure 8.3 shows the user interface used.

Figure 8.3: The interface for the time series listening experiment.
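The triplet construction described above can be sketched as follows (an illustrative Python outline, not the actual SuperCollider test framework):

```python
import random

def make_trial(signals, i, rng=random.Random(0)):
    """Build one triplet: signal i, another random signal, and i scrambled."""
    j = rng.choice([k for k in range(len(signals)) if k != i])
    scrambled = list(signals[i])
    rng.shuffle(scrambled)             # same statistics, spectrum destroyed
    triplet = [("target", signals[i]), ("other", signals[j]),
               ("target-scrambled", scrambled)]
    rng.shuffle(triplet)               # present the three in random order
    return triplet                     # correct pick: the two "target" items
```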

A post-questionnaire asked for the sound quality participants had used to distinguish the signals, and asked them to assign three adjectives to describe this quality. Furthermore, participants were asked whether they could tell any difference between the sets, and whether they felt there was any learning effect, i.e. whether the task became easier during the experiment.


8.2.3 Experiment results

Eleven participants took part in the experiment, most of them colleagues or students at the institute. Four participants were members of the SonEnvir team and had a more substantial background on the topic, which, however, did not seem to have any impact on their results. The collected data shows a significant increase in the probability of choosing the correct signals as the difference in kurtosis and skew increases. Figure 8.4 shows the average probabilities in four different ranges of kurtosis. The skew in this set was nearly constant (0.001), so the resulting difference in correct answers is related to the change in kurtosis. While up to a difference of 5 in kurtosis the probability

Figure 8.4: Probability of correctness over kurtosis in set 1

is only insignificantly higher than 0.333 (the probability of random answers), and even decreases, there is a considerable increase thereafter, topping at over 70% for differences of around 11. This indicates that 5 is the just noticeable difference threshold for kurtosis. This is also supported by the results from set 2, as shown in figure 8.5. For skewness the matter was more difficult, as we had no independent control over it. Although the data from set 2 suggests an increase in probability with increasing difference in skew (as shown in figure 8.6), this might also be related to the difference in kurtosis. Looking at the probability of correctness over both the difference in kurtosis and the


Figure 8.5: Probability of correctness over kurtosis in set 2

Figure 8.6: Probability of correctness over skew in set 2

difference in skew (as in figure 8.7) reveals that it is unlikely that the increase is related to the change in skewness: while in every spine in which skew is constant the probability increases with increasing kurtosis, the reverse does not hold.

Figure 8.7: Probability of correctness over skew and kurtosis in set 2

Summarising, we found evidence that participants could reliably detect changes in kurtosis greater than 5, but we did not find comparable evidence for skewness. This may indicate that we need a different dataset with bigger differences in skew while keeping kurtosis values small; for this, however, another family of distributions must be found. The number of times participants used the replay option seemed to have no impact on their performance. Figure 8.8 shows the number of replays for all data points over kurtosis; red crosses indicate correct answers, black dots incorrect answers. Although participants replayed the sequence more often when the difference in kurtosis was small, there is no evidence that they were more successful when using more replays. The answers to the post-questionnaire must be seen in the light of the data analysis above: the quality participants said drove their decisions must be linked to the kurtosis rather than the skewness of the signal. The most common answers for this quality were crackling and the frequency of events; others included roughness and spikes. However, some participants also stated that they heard different colours of noise and other artefacts related to the frequency spectrum. This is a common effect of being exposed to noise signals for a longer period of time: even if the spectrum of the noise is not changing at all (as in our case), humans often start to imagine hearing tones


Figure 8.8: Number of replays over kurtosis in set 2

and other frequency-related patterns. Asked for adjectives to describe the quality, the participants provided cracking, clicking, sizzling, annoying, rhythmic, sharp, rough, and bright/dark. In retrospect, this correlates nicely with the kurtosis being the peakedness of the probability function. There was no agreement over which set was easier: most participants said there was hardly any difference, while some named one or the other. Finally, on average people felt that there was no learning curve involved, and that the examples were short enough for them not to get too tired of listening to them.

8.2.4 Conclusions

In this section we presented an approach for analysing statistical properties of time series data by auditory means. We provided some background on the mathematics involved and presented the tools developed for the audification of time series data. Subsequently, we described a listening test designed to investigate the perceptual dimensions that correlate with higher-order statistical properties like skew and kurtosis, and discussed the data chosen and the design of the experiment. The results show evidence that participants improved in distinguishing noise signals as the difference in kurtosis increased; the data suggests that in this setting the just noticeable difference was 5. However, for skew we were not able to find similar evidence. In a post-questionnaire we probed for the qualities that participants used to distinguish the signals, and obtained a set of related adjectives. Future work will have to investigate why nothing was found for skewness in the signals. It might be that our range of values did not allow for segregation by skew, and a different data source will have to be found to gain independent control over skew. However, it might also be the case that skew is not perceivable in direct audification, and a different sonification approach has to be chosen to make this property perceptible. In SDSM terms, the listening experiment respected the 3-second echoic memory time limit, maximising the number of data points to fit into that time frame by audifying at a sample rate of 44.1 kHz.

Chapter 9

Examples from Neurology

9.1 Auditory screening and monitoring of EEG data

This chapter describes two software implementations for EEG data screening and realtime monitoring by means of sonification. Both were designed in close collaboration with our partner institution, the University Clinic for Neurology at the Medical University Graz. Both tools were tested in depth with volunteers, and then tested with the expert users they are intended for, i.e. neurologists who work with EEG data daily. In the course of these tests, a number of improvements to the designs were realised; both the tests and the final versions of the tools are described in detail here. This scope of reported work is intended to provide an integrated description and analysis of all aspects of the design process, from sonification design issues and interaction choices to user acceptance and steps towards clinical use. This work is described with much more neurological background in the PhD thesis by Annette Wallisch (Wallisch (2007), in German). This chapter is based on a SonEnvir paper for ICAD 2007 (de Campo et al. (2007)), and the work is also briefly documented online in the SonEnvir data collection (http://sonenvir.at/data/eeg/), with accompanying sound examples.

9.1.1 EEG and sonification

As the general background on EEG and sonification is covered extensively in a number of papers (Baier and Hermann (2004); Hermann et al. (2006); Hinterberger and Baier (2005); Mayer-Kress (1994); Meinicke et al. (2002)), it is kept rather brief here. EEG is short for electroencephalogram, i.e. the registration of the electrical signals coming from the brain that can be measured on the human head. There are standard systems for locating electrodes on the head, called montages; e.g. the so-called 10-20 system, which spaces electrodes at similar distances over the head (see Ebe and Homma (2002) and many other EEG textbooks). The signal from a single electrode is often analysed in terms of its characteristic frequency band components: the useful frequency range is typically given as 1-30 Hz, sometimes extended a little higher and lower. Within this range, different frequency bands have been associated with particular activities and brain states; e.g. the alpha range, between 8 and 13 Hz, is associated with a general state of relaxedness and non-activity of the brain region for visual perception; thus alpha activity is most prominent with eyes closed. For both sonification designs presented, we split the EEG signal into frequency ranges which closely correspond to the traditional EEG bands, as shown in table 9.1. (The alpha band we employ is slightly wider than the common 8-13 Hz; we merge it with the slightly higher mu-rhythm band to maintain equal spacing.)

Table 9.1: Equally spaced EEG band ranges.

EEG band name   frequency range
deltaL(ow)      1 - 2 Hz
deltaH(igh)     2 - 4 Hz
theta           4 - 8 Hz
alpha (+ mu)    8 - 16 Hz
beta            16 - 32 Hz
gamma           32 - 64 Hz
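Such an octave band split can be illustrated with standard filter design tools; below is a hedged Python sketch (the SonEnvir players implement this with SuperCollider filters; the filter order and type here are assumptions):

```python
import numpy as np
from scipy.signal import butter, sosfilt

BANDS = {            # table 9.1, in Hz
    "deltaL": (1, 2), "deltaH": (2, 4), "theta": (4, 8),
    "alpha": (8, 16), "beta": (16, 32), "gamma": (32, 64),
}

def split_eeg_bands(signal, sr=250):
    """Split one EEG channel into the six octave bands of table 9.1."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        out[name] = sosfilt(sos, signal)
    return out
```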

9.1.2 Rapid screening of long-time EEG recordings

For a number of neurological problems, it is standard practice to record longer time stretches of brain activity. A stationary recording usually lasts more than 12 waking hours; night recordings are commonly even longer, up to 36 hours. For people with so-called absence epileptic seizures (often children), recordings with portable devices are made over similar stretches of time. These recordings are then visually screened, i.e. looked through in frames of 20-30 seconds at a time; this process is both demanding and slow. For the particular application toward absences, rapid auditory screening is ideal: these seizures tend to spread over the entire brain, so the risk of choosing only a few electrodes to screen acoustically is not critical; furthermore, the seizures have quite characteristic features, and are thus relatively easy to identify quickly by listening. For more general screening, finding time regions of interest quickly (by auditory screening) potentially reduces workload and increases overall diagnostic safety. With visual and auditory screening combined, the risk of failing to notice important events in the recorded brain activity is quite likely reduced.

9.1.3 Realtime monitoring during EEG recording sessions

A second scenario that benefits from sonification is realtime monitoring while recording EEG data. This is a long-term attention task: an assistant stays in a monitor room next to the room where the patient is being recorded; s/he watches both a video camera view of the patient and the incoming EEG data on two screens. In the event of atypical EEG activity (which must be noticed, so one can intervene if necessary), a patient may or may not show peculiar physical movements. Watching the video camera, one can easily miss atypical EEG activity for a while. Here, sonification is potentially very useful, because it can alleviate constant attention demands: one can easily habituate to a background soundscape which is known to represent "everything is normal". When changes in brain activity occur, the soundscape changes (in most cases, activity is increased, which increases both volume and brightness), and this change in the realtime-rendered soundscape automatically draws attention. A sonification design that renders EEG data in real time is also useful for studying brain activity as recorded by EEG devices at its natural speed: one can easily portray activity in the traditional EEG frequency bands acoustically; as many of the phenomena are commonly considered to be rhythmical, auditory presentation is particularly appropriate here, see Baier et al. (2006). Realtime uses of biosignals have other applications too, see e.g. Hinterberger and Baier (2005); Hunt and Pauletto (2006).

9.2 The EEG Screener

9.2.1 Sonification design

For rapid EEG data screening, there is little need for an elaborate sonification design. As the signal to be sonified is a time signal, and a signal speed of several tens of thousands of data points per second is deemed useful for screening, straightforward audification is the obvious choice recommended by the Sonification Design Space Map. Not doing any other processing keeps the rich detail of the signals entirely intact. With common EEG sampling rates around 250 Hz, a typical speedup factor is 60x faster than real time, which transposes our center band (alpha, 8-16 Hz) to 480-960 Hz, well in the middle of the audible range. For more time resolution, one can go down to 10x, or for more speedup, up to 360x. See figure 9.1 for the locations on the Sonification Design Space Map.
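The audification itself amounts to playing the sampled EEG back at a higher rate. A minimal Python sketch of this time scaling follows (illustrative only - the Screener itself is a SuperCollider program):

```python
import numpy as np
from scipy.io import wavfile

def audify(channel, eeg_sr=250, speed_up=60, path="screened.wav"):
    """Tape-style audification: play EEG samples at eeg_sr * speed_up Hz.

    At the default 60x, the alpha band (8-16 Hz) lands at 480-960 Hz,
    and one minute of EEG plays in one second.
    """
    x = np.asarray(channel, dtype=np.float32)
    x = x / max(np.abs(x).max(), 1e-12)   # normalise to full scale
    wavfile.write(path, int(eeg_sr * speed_up), x)
```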


Figure 9.1: The Sonification Design Space Map for both EEG Players.
As there is no total size for EEG files (they can be anything from a few minutes to 36 hours and more), Data Anchors are given for one minute and for one hour (center, and far right). The labels Scr x10, Scr x60, and Scr x360 show the map locations for the minimum, default, and maximum settings of speedUp, i.e. the time scaling of the EEGScreener (bottom right). The labels RTP 1band and RTP 6bands show the locations for a single band and all six bands of the EEGRealtimePlayer. Note that the use of two audio channels moves both of these designs inwards along the number-of-streams axis, which is not shown here for simplicity.

This allows for wide ranges of time scales of local structures in the data to be put into the optimum time window (the ca. 3-second window of echoic memory, see section 5.1 and de Campo (2007b)), while keeping the inner EEG bands well within the audible range; if needed, one can compensate for reduced auditory sensitivity to the outer bands by raising their relative amplitudes. A lowpass filter for the EEG signal is available from 12 to 75 Hz, with a default value of 30 Hz, to provide the equivalent of the visual smoothing used in EEG viewer software. Our users wanted that feature, and it is a simple way to reduce higher band activity, which (from a visual perspective, at least) is mostly considered noise. A choice is provided between the straight audified signal and a mix of six equal-bandwidth layers, which can all be individually controlled in volume. This allows both for focused listening to individual bands of interest, and for identification of the EEG band in which a particular audible component occurs. A further reason to include this


Figure 9.2: The EEGScreener GUI.


The top rows are for file, electrode, and time range selection. Below the row of playback and note-taking elements are the playback parameter controls, and the band filtering display and controls.

band-splitting was to introduce the concept in a simpler form, such that users could transfer the idea to their understanding of the realtime player.

9.2.2 Interface design

The task analysis for the Screener demanded that the graphical user interface be simple to use (low-effort, little training needed) and fast, and that it keep reproducible records of screening sessions. Furthermore, it should provide choices of what to listen to, and visual feedback of what exactly one is hearing, and how. The GUI elements are similar to those of sound file editors (which audio specialists are familiar with, but EEG specialists usually are not).

File, electrode, and range selection

The button Load EDF is for selecting a file to be screened. Currently, only .edf files (a common format for EEG files, see http://www.edfplus.info/) are supported, but other formats are easy to add if needed. The text views next to it (top line) provide file data feedback: file name, duration, and the montage type the file was recorded with. (As edf files do not store montage information, this is inferred from the number of EEG channels in the file; at our institution, all the raw data montage types have different numbers of channels.) The button Montage opens a separate GUI for choosing electrodes by location on the head (see figure 9.3).

Figure 9.3: The Montage Window.


It allows for selecting electrodes by their location on the head (seen from above, the triangle shape on top being the nose). One can drag the light gray labels and drop them on the white fields Left and Right.

The popup menus Left and Right let users choose which electrode to listen to on which of the two audio channels. Like many soundfile editors, the signal views Left and Right show a full-length overview of the signal of the chosen electrodes. During screening, the current playback position is indicated by a vertical cursor. The range slider Selection and the number boxes Start, Duration, End show the current selection, and allow for selecting a range within the entire file to be screened. The number box Cursor shows the current playback position numerically. The signal views Left Detail and Right Detail show the waveform of the currently selected electrodes, zoomed in to the current selection.

Playback and note taking

The buttons Play, Pause, Stop start, pause, and stop the sound. The button Looped/No Loop switches between once-only playback and looped playback (with a click to indicate when the loop restarts). The button Filters/Bypass switches playback between Bypass mode (the straight audified signal, only low-pass-filtered) and Filters mode, the mixable band-split signal. The button Take Notes opens a text window for taking notes during screening. The edf file name, the selected electrodes and time region, and the current date are pasted in as text automatically. The button Time adds the current playback time at the end of the notes window's text, and the button Settings adds the current playback settings (see below) to the notes window's text. To let the user concentrate on listening while screening a file, it is possible to stay on the notes window entirely: key shortcuts allow for pausing/resuming playback (e.g. to type a note), for adding the current time as text (so one can take notes for a specific time), and for adding the current playback settings as text.

Playback Controls

These control the parameters of the Screener's sound synthesis. speedUp sets the speedup factor, with a range of 10-360; the default value of 60 means that one minute of EEG is presented within one second. Note that this is straightforward tape-speed acceleration, which preserves full signal detail. The option to compare different time scalings of a signal segment allows for learning to distinguish mechanical artifacts (electrode movements) and electrical artifacts (muscle activity) from EEG signal components. lowPass sets the cutoff frequency for the lowpass filter, with a range of 12 to 75 Hz and a default of 30 Hz. clickVol sets the volume of the loop marker click, and volume sets the overall volume. In Bypass mode, only the meter views are visible in this section; they display the amount of energy present in each of the six frequency bands (deltaL, deltaH, theta, alpha, beta, gamma). In Filters mode, the controls become available, and one can raise the level of bands one wants to focus on, or turn down bands that distract from details in other bands. The buttons All On / All Off allow for quickly resetting all levels to defaults.

9.3 The EEG Realtime Player

The EEGRealtimePlayer allows listening to details of EEG data in real time (or up to 5x faster when playing back files), in order to follow temporal events in or near their original rhythmic contour. This design (and its eventual distribution as a tool) was developed in two stages. Stage one is a data player, which plays recorded EEG data files at realtime speed with the same sonification design (and the same adjustment facilities) as the final monitor application. This allows for familiarising users with the range of sounds the system can produce, for experimenting with a wide variety of EEG recordings, and for finding settings which work well for a particular situation and user. This stage is described here. Stage two is an add-on to the software used for EEG recording, diagnosis, and administration of patient histories at the institute. Currently, this stage is implemented as a custom version of the EEG recording software which simulates data being recorded now (by reading a data file) and sends the incoming data over the network to a special version of the Realtime Player (i.e. the sound engine and interface). Here, the incoming data is sonified with the same approach as in the player-only version. Eventually, this second program is meant to be implemented within the EEG software itself.

9.3.1 Sonification design

The sonification design for realtime monitoring is much more elaborate than the Screener's. It was prototyped by Robert Höldrich in MATLAB, and subsequently adapted and implemented for realtime interactive use in SC3 by the author. For a block diagram, see fig. 9.4. The EEG signal of each channel listened to is split into six bands of equal relative bandwidth (one octave each: 1-2, 2-4, ... 32-64 Hz). Each band is sonified with its own oscillator and a specific carrier frequency: based on a user-accessible fundamental frequency baseFreq, the carriers are by default multiples of baseFreq by the integers 1, 2, ... 6. If one wants to achieve more perceptual separation between the individual bands, one can deform this overtone pattern with a stretch factor harmonic, where 1 is pure overtone tuning:

carFreq_i = baseFreq \cdot i \cdot harmonic^{\,i-1} \qquad (9.1)

The carrier frequency in each band is modulated with the band-filtered EEG signal, thus creating a representation of the signal shape details as deviations from the center pitch. The amplitude of each oscillator band is determined by the amplitude extracted from the corresponding filter band, optionally stretched by an expansion factor contrast; this creates a stronger foreground/background effect between bands with low energy and bands with more activity. For realtime monitoring as a background task, a second option for emphasis exists: high activity levels activate an additional sideband modulation at carFreq * 0.25, which creates a new fundamental frequency two octaves lower. This should be difficult to miss even when not actively attending.
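As a rough illustration of eq. (9.1) and the per-band frequency modulation, here is a hedged numpy sketch; the actual design runs as SC3 synths, and the modulation depth scaling below is an assumption made purely for illustration:

```python
import numpy as np

def carrier_freqs(base_freq=120.0, harmonic=1.0, n_bands=6):
    """Eq. (9.1): carFreq_i = baseFreq * i * harmonic^(i-1), i = 1..6."""
    i = np.arange(1, n_bands + 1)
    return base_freq * i * harmonic ** (i - 1)

def render_band(band_sig, car_freq, amp, sr=44100, freq_mod=1.0, depth=100.0):
    """FM-render one band: the filtered EEG drives deviation from car_freq.

    depth (Hz of deviation per unit signal) is an assumed scaling factor,
    not a value taken from the original design.
    """
    inst_freq = car_freq + freq_mod * depth * np.asarray(band_sig, dtype=float)
    phase = 2 * np.pi * np.cumsum(inst_freq) / sr
    return amp * np.sin(phase)

print(carrier_freqs(120, 1.1))  # a slightly stretched overtone series
```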


Figure 9.4: EEG Realtime Sonification block diagram.


Figure 9.5: The EEG Realtime Player GUI.


Note the similarities to the EEGScreener GUI; the main difference is the larger number of synthesis control parameters.

Finally, for file playback, crossing the loop point of the current selection is acoustically marked with a bell-like tone.

9.3.2 Interface design

Most elements (buttons, text displays, signal views, notes window) have the same functions as in the EEGScreener. The main difference to the EEGScreener is that there are many more playback controls, since the sonification model (as described above) is much more complex. The playback controls are ordered by importance from top to bottom: contrast ranges from 1-4; values above 1 expand the dynamic range, making active bands louder and thus moving them to the foreground relative to average-activity bands. For background monitoring, levels between 2-3 are recommended. baseFreq is the fundamental frequency of the sonification, between 60-240 Hz; this can be tuned to user taste - and our users have in fact expressed strong preferences for their personal choice of baseFreq. freqMod is the depth of frequency modulation of the carrier for each band. At 0, one hears a pure harmonic tone with varying overtone amplitudes; at greater values, the pitch of the band is modulated up and down, driven by the filtered signal of that band. Thus the signal details of the activity in that band are rendered in high perceptual resolution. A value of 1 is normal deviation. emphasis fades in a new pitch two octaves below baseFreq for very high activity levels; this can be used for extra emphasis in background monitoring. harmonic is the harmonicity of the carrier frequencies: a setting of 1 means purely harmonic carrier frequencies, less compresses the spectrum, and more expands it; this can be used to achieve better perceptual band separation. clickVol sets the volume of the loop marker click, volume sets the overall volume of the sonification, and speed controls an optional speedup factor for file playback, with a range of 1-5, 1 being realtime; in live monitoring mode, this control is disabled.

Band Filter Controls and Views

The buttons All On and All Off allow for setting all levels to medium or zero. The meter views show the amount of energy present in each of the six frequency bands, and the sliders next to them set the volume of each frequency band.

9.4 Evaluation with user tests

9.4.1 EEG test data

For development and testing of the sonification players described, a variety of EEG recordings - containing typical epileptic events and seizures - was collected. This database was assembled at the Department for Epileptology and Neurophysiological Monitoring (University Clinic of Neurology, Medical University Graz), using the in-house archive system. It contains anonymised data of currently or recently treated patients. For the expert user tests, three data examples were chosen, suited to each player's special purpose. For the Screener, rather large data sets were selected, to test with a realistic usage example: two measurements of absences, and one day/night EEG with seizures localized in the temporal lobe. The Realtime Player was tested with three short data files: one normal EEG (containing eye movement artefacts and alpha waves), and two pathological EEGs (generalized epileptic potentials, and fronto-temporal seizures). The experts we worked with considered the use of audition in EEG diagnostics very unusual. We expected them to find it difficult to associate sounds with the events, so they did some preliminary sonification training: for all data examples, they could look at the data with their familiar EEG viewer software after having listened first, and try to match what they had heard with the visual graphs familiar to them.

9.4.2 Initial pre-tests

An initial round of tests was done to get a first impression of usability and data appropriateness; it also contained experimental tasks (learning to listen). In order to obtain independent and unbiased opinions, two interns were invited to test the first versions of the Screener and the Realtime Player by listening through the entire prepared database at their own pace. They were instructed to take detailed notes on the phenomena they heard (including inventing names for them), and where they occurred in which files; they spent roughly 40 hours on this task. The documentation of their listening experiments was then verified in internal re-listening and testing sessions. After these pre-tests, we decided to reduce some parameter ranges to prevent users from choosing too extreme settings, and we chose a smaller number of data sets for the second test round with expert users.

9.4.3 Tests with expert users

As the eventual success of these players depends on acceptance by the users in a clinical setting, it was essential to do an evaluation with medical specialists. This was done by means of two feedback trials; using the results of the primary expert test round, the players were then improved in many details. For both players we made pre/post comparisons of user ratings between the different versions. Even though we tested with the complete potential user group at our partner institution, the test group is rather small (n=4); thus we consider the tests, and especially the open question/personal interview section, as more qualitative than quantitative data. To prepare the four specialists for their separate test sessions, they were introduced to the new aspects of data evaluation and experience by sonification in a group session. For each EEG player a separate test session was scheduled, to avoid listening overload and potential confusion.

Questionnaire

The questionnaire contained the following 11 scales; the rating for each statement ranged from 1 (strongly disagree) to 5 (strongly agree).

Table 9.2: Questionnaire scales for EEG sonification designs

1   Usability
2   Clarity of interface functions
3   Visual design of interface
4   Adjustability of sound (to individual taste)
5   Freedom from irritation (caused by sounds)
6   Good sound experience (i.e. pleasing)
7   Allows for concentration
8   Recognizability of relevant events in data by listening
9   Comparability (of observations) with EEG-Viewer software
10  Practicality in Clinical Use (estimated)
11  Overall impression (personal liking)

In addition to the 11 standardized questions, space for individual documentation and description was provided. Moreover, an open question asked for further comments, observations, and suggestions.

Results of first expert tests

This initial round of tests resulted in a number of improvements to both players: elaborate EEG waveform display and data range selection was added to both; the visual layout was unified to emphasize elements common to both players; and the Screener was extended with band filtering, which is both useful in itself and a good mediating step toward the more complex realtime sonification design.

9.4.4 Analysis of expert user tests - EEG Screener 1 vs. 2

Optimizing the interface and interaction possibilities for version 2 of the Screener improved most of its ratings substantially: it was considered to offer more comfortable use (+1) and more attractive visual design (+1). The sound experience for the medical specialists improved somewhat (+0.5), while the freedom from irritation experienced improved very much (+2.0). While all other criteria improved substantially, recognizability of events, comparability with viewer software, and clinical practicality received lower ratings (between -0.5 and -0.25). We suspect that the better ratings in the first test round may have reflected enthusiasm about the novelty of the tool. Personal conversations with the expert users after the tests showed how strongly opinions differed: one user did not feel safe and comfortable with the Screener, and could not trust his own hearing skills enough to discriminate relevant information from (technical) artefacts by listening.


Figure 9.6: Expert user test ratings for both EEGScreener versions.

By contrast, the three others were quite relaxed and felt positively reassured that they had done their listening tasks properly and effectively. Furthermore, the users were probably less motivated to compare the EEG viewer to the listening result (which one question asked for), as they had already done that carefully in the first tests. Overall, all users reported much higher satisfaction with version 2 of the Screener (+1). The answers in the open comments section can be summarized as follows: all users confirmed the better usability, design, clarity, and transparency of version 2. Some improvements were suggested in the visualization of the selected EEG channels, in particular when larger files are analysed. Moreover, integration of the sonification into the real EEG viewer would be appreciated a lot. A plug-in version of the player for the EEG software used (NeuroSpeed by B.E.S.T. medical) was already in preparation before the tests; in effect, the expert users confirmed its expected usefulness.

9.4.5 Analysis of expert user tests - RealtimePlayer 1 vs. 2

The mean ratings for the second Realtime Player version show a positive shift in nearly all scales of the questionnaire. Moreover, the range of the ratings is smaller than before, so the answers were more consistent. The best ratings were given for visual design (+1),


Figure 9.7: Expert user test ratings for both RealtimePlayer versions.

adjustability of sound (+1), and comparability to viewer (+1.5), all rated good to very good. The overall impression was now rated as good (+1), as were usability (+0.5), clarity of interface (+0.5), and good sound experience (+1). The aspects recognizability of relevant EEG events (+1) and practical application (+1) were rated as similarly satisfying. The only item that remains at the same mean rating is freedom from irritation, rated as a little better than average. The same rating was given for allows for concentration (+1.5), which had improved very much. Probably, these two aspects correspond to each other: in spite of the improved control of irritating sounds and a learning effect, the users were still untrained in coping with the rather complex sound design. This sceptical position was taken in particular by two users, affecting items 5 to 9. All in all, the ratings indicate good progress in the Realtime Player's design. This may well have been influenced by the strong time constraints on these tests: as our experts have very tight schedules in clinical work, it was difficult to obtain enough time for reasonably unhurried, pressure-free testing. Comparing the ratings across the two first versions, the Realtime Player 1 was not rated as highly as the Screener 1. We attribute this to the higher complexity of the sound design (which did not come across very clearly under the time pressure given), the related non-transparency of some parameter controls, and to ensuing doubts about the practical benefit of this method of data analysis. Only the rating for irritation is better than for Screener 1, which indicates that the sound design is aesthetically viable for the users. All these concerns were addressed in the Realtime Player 2: in order to clarify the band-splitting technique, GUI elements indicate the amount of power present in each band, and allow for interactive choice of which bands to listen to; fewer parameter controls are made available to the user (version 1 had some visible controls mainly of interest to the developer), with simpler and clearer names. Much more detailed help info pages are also provided now. Finally, band-splitting (adapted to audification) was integrated into the Screener 2 as well, which gives users a clearer understanding of this concept across different sonification approaches.

9.4.6 Qualitative results for both players (versions 2)

For both players, all users mentioned easy handling (usability), good visual design, and transparency of functionality. Further positive comments on the Screener were the higher creativity afforded (by using the frequency controls), and that irritating sounds had nearly disappeared. One user explained this by a training effect, and we agree: it seems that as users learn to interpret the meaning of unpleasant sounds (such as muscle movements), the irritation disappears. Regarding the Realtime Player, users mentioned good visual correlation with the sound, because of the new visual presentation of the EEG on the GUI. One user noted that acoustic side-localisation of the recorded epileptic seizure works well. Further improvements were suggested: for both players, the main wish is synchronization of the sound and the visual EEG representation (within the familiar software); in the case of realtime monitoring, this would allow better comparison of the relevant activities. As far as screening is concerned, the visual representation of larger files on the GUI was considered not very satisfying. For the Realtime Player, presets of the complex parameters according to specific seizure types were suggested as very helpful. Moreover, usability could still be improved a bit more (however, no specific wishes were given), and irritating sounds should be further decreased. This wish may also be due to the fact that the offered parameter controls for reducing disturbing sounds may not have been used fully; this can likely be addressed by more training.

9.4.7 Conclusions from user tests

According to the experts' evaluation of the EEG Screener, intensive listening training will be essential for its effective use in clinical practice - in spite of the improved usability and acceptance of the second version. As the visual mode is still dominant in clinical EEG diagnostics and data analysis, the widespread use of sonification tools will require an alternative time and training management. After such training, our new tools may well help to successively reduce effort and time in data analysis, decrease clinical diagnostic risk, and, in the longer term, offer new ways of exploring EEG data.

9.4.8 Next steps

A number of obvious steps could be taken next (given follow-up research projects). For the Realtime Player, the top priority would be integration of the network connection for realtime monitoring during EEG recording sessions. Then, user tests in real-world long-term monitoring settings can be conducted; these tests should result in recommended synthesis parameter presets for different usage scenarios. For the sound design, we have experimented with an interesting variant which emphasizes the rhythmic nature of the individual EEG bands more (see Baier et al. (2006); Hinterberger and Baier (2005)). This feature can be made available as an added user parameter control (rhythmic), with a value of 0 maintaining the current sound design, and 1 accentuating the rhythmic features more strongly. For both Realtime Player and Screener, eventual integration into the EEG administration software used at our clinic was planned; however, this can only be done after another round of longer-term expert user tests, and when the ensuing design changes have been finalised.

9.4.9 Evaluation in SDSM terms

The main contributions to the Sonification Design Space Map concept resulting from work on the EEG players were the following lessons:
- adopt domain concepts and terminology wherever possible (band splitting)
- make interfaces as simple and user-friendly as possible
- provide lots of visual support for what is going on (here, show band amplitudes)
- provide opportunities to understand complex representations interactively, by providing options to take them apart (here, listening to single bands at a time)
- give users enough time to learn (this did not happen for the Realtime Player).

Chapter 10

Examples from the Science by Ear Workshop


For more background on the Science By Ear workshop, see section 4.2 and http://sonenvir.at/workshop/. The dataset LoadFlow and the experiments made with it in the SBE workshop are instructive basic examples; they are given as first illustrations of the Sonification Design Space Map in section 5.1. Other SBE datasets and topics (EEG, Ising, UltraWideband, Global Social Data) were elaborated in more depth in mainstream SonEnvir research activities, and are thus covered in the examples from the SonEnvir research domains. The remaining two datasets, RainData and Polysaccharides, are described briefly here for completeness.

10.1 Rainfall data

These data were provided and prepared by Susanne Schweitzer and Heimo Truhetz of the Wegener Center for Climate and Global Change, Graz. The data describe the precipitation per day over the European alpine region from 01.01.1980 to 01.01.1991. Additionally, associated orographic information (i.e. describing the average height of the area) was provided. Such data are quite common in climate physics research. The precipitation for 24 hours is measured as the total precipitation between 6:00 UTC (Coordinated Universal Time) and 6:00 UTC of the next day. The data were submitted in a single large binary file of the following format: each single number is precipitation data in mm/day over the European alpine region (latitude 49.5N-43N, longitude 4E-18E) with 78 x 108 grid points. The time range covers 11 years, 1980-1990, which equals 4018 days. The data is stored in 4018 consecutive arrays of 78 x 108 (rows x columns) values. The first array contains precipitation data over the selected geographic region for day 1 (1.1.1980), the second array for day 2 (2.1.1980), and so on. A visualisation of the

Figure 10.1: Precipitation in the Alpine region, 1980-1991.

average precipitation over the 11 years given is shown in figure 10.1. A second file provides associated information on the orography of the European alpine region, i.e. the terrain elevation in meters. This data is stored in one 78 x 108 array. General questions the domain scientists deemed interesting were whether it would be possible to hear all three dimensions (geographical distribution and time) simultaneously, and to find a meaningful representation of the distribution of precipitation in space and time. They also speculated that it might be relaxing to listen to a synthetic rendering of the sound of rain. As possible topics to investigate, they suggested:
- 10-year mean precipitation in the seasons
- variability of precipitation via standard deviations (i.e. do neighbouring regions more often swing together or against each other?)
- identification of regions with similar characteristics via covariances (do different regions sound different?)
- extreme values (does the rain fall regularly, or are there long droughts in some regions?)
- correlations in height (does precipitation behave similarly at similar orographic heights?)
- distribution of precipitation amounts (on how many days is the precipitation higher than 20mm, 19mm, 18mm, etc.?)
As a test of the proper geometry of the data format, the SC3 starting file for the sessions provided a graphical representation of the orographic dataset, with higher regions shown as brighter gray; see figure 10.2. We also provided example reading routines for the data file itself.
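Since the flat binary layout later caused the session teams some trouble, a sketch of an indexed read may be instructive. This is a hypothetical Python version, assuming 32-bit little-endian floats stored in (day, row, column) order - the actual element type and byte order are not recorded here:

```python
import numpy as np

ROWS, COLS, DAYS = 78, 108, 4018
ITEM = np.dtype("<f4")            # assumed: little-endian 32-bit floats

def read_location(path, row, col):
    """Read the full 11-year time series for one grid location,
    without loading the whole file (cf. team B's direct reads below)."""
    values = np.empty(DAYS)
    with open(path, "rb") as f:
        for day in range(DAYS):
            offset = ((day * ROWS + row) * COLS + col) * ITEM.itemsize
            f.seek(offset)
            values[day] = np.frombuffer(f.read(ITEM.itemsize), ITEM)[0]
    return values

def read_day(path, day):
    """Read one day's 78 x 108 precipitation grid."""
    grid = np.fromfile(path, ITEM, count=ROWS * COLS,
                       offset=day * ROWS * COLS * ITEM.itemsize)
    return grid.reshape(ROWS, COLS)
```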

Figure 10.2: Orography of the grid of regions.

Session team A

In the brainstorming phase, team A came up with the idea of using the spatial distribution for the definition of features like variability, entropy, etc., possibly using large regions such as quarters of the entire grid. The team agreed that the data should be used as time series, since rhythmical properties were expected to be present. The opinion was that the main interest lies in the deviations from the average yearly shape. Thus, the team decided to try using an acoustic representation of the data series conditioned on the average yearly shape as a benchmark curve, as follows: if the value in question is higher than average, high-pitched dust (single-sample crackle through a resonance filter) is audible; if the value is lower than average, lower-pitched dust is heard. The amplitude should scale with the absolute deviation from the average, and the dust density should scale with the absolute rain values. In this fashion, one could sonify different locations at the same time by assigning the sonifications of different locations to different audio channels. This should produce temporal rhythms if there are systematic dependencies between the locations. As data reading turned out to be more difficult than expected, the team began experimenting with dummy data to design the sounds and behaviour of the sonification, while the second programmer worked on data preparation. In the end, the team ran out of time before the real data became accessible enough for replacement.

Session team B

Team B discussed many options while brainstorming: as the data set was quite large, choosing data subsets, e.g. by regions; looking for possible correlations; maybe listening to the entire time range for a single location; maybe using a random walk as a reading trajectory; selecting locations by pointing (mouse); comparing a reference time-series sonification to the data subset under study. The team found a good solution for the data reading difficulties: they read only the data points of interest directly from the binary file, as this turned out to be fast enough for realtime use. The designs written explored comparing the time series for two single locations over ten years; the sound examples produced demonstrate these pairs played sequentially and simultaneously on the left and right channels. The sounds are produced with discrete events: each data point is rendered by a gabor grain with a center frequency determined by the amount of rain for the day and location of interest. In the final discussion, the team found that a comparison of different regions would be valuable, where the area over which to average should be flexible. Such averaging could also be considered conceptually similar to fuzzy indexing into the data; modulating the averaging range and providing fuzziness in three dimensions would be worth further exploration.

Session team C

Team C had the most difficulties getting the data loaded properly; this was certainly a deficiency in the preparation. After converting a subset of the data with Excel, they decided on comparing the data for January in all years, and listening for patterns and differences across different regions. Some uncertainty remained about whether the conversions were fully correct, but this was considered relatively unimportant in the experimental workshop context. The sonification design entailed calculating a mean January value for all locations, and comparing each individual day to the mean value. This was intended to show how the precipitation varies, and to identify extreme events. All 8424 stations are scanned along north/south lines, which slowly move from west to east. The ratio of the day's rainfall to the mean was mapped to the resonance value of a bandpass filter driven by white noise. The sound examples provided cover January 5, 15, and 25 for the years 1980 and 1981, scaled into 9 seconds; a much slower variant with only 190 stations is presented as well for comparison, and it shows a much smoother tendency. Varying filter resonance as rapidly as described above is not likely to be clearly audible.

Comparison in SDSM terms

The data set given is quite interesting from an SDSM perspective: it has 2 spatial indexing dimensions, with 78 * 108 = 8424 geographical locations, for which an orographic data dimension (average elevation above sea level) is also given. For each location, data are given for 1 (or maybe 2) time dimensions, namely 365 (resp. 366) days * 11 years = 4018 time steps (days). Thus, multiple locations are possible for its data anchor (see figure 10.3), depending on the viewpoint taken. From a temporal point of view, one would treat the 8424 locations as the data size, and create a day anchor at x: 8424, y: 1, and a month anchor at x: 8424, y: 30; the year anchor and the 11-year anchor are both outside the standard map size. For a single location, an anchor could be at x: 4018, y: 2. In any case, whatever one considers to be the unit size of this kind of data set is arbitrary, as both time and space dimensions could have different sizes and/or resolutions. Team A mapped one year into 7.3 seconds, and presented two streams of two mapped dimensions each (pitch label for deviation polarity, and intensity for deviation amount). These choices put its SDSM point at an expected gestalt size of 150 (x), dimensions at 2 (y), and streams at 2 (z). Continuous parameter mapping is a reasonable choice for this location on the map. Team B begins with 8424 data points per 4-second loop; this is a rather dense gestalt size of ca. 6000. The design choice of averaging over 9-10 values scales this to ca. 600, which seems well suited for granular synthesis, with a single data dimension used, mapped to the frequency parameter (y-axis), and using two parallel streams (z). Team C maps 8424 values into 9 seconds, which creates a gestalt size of ca. 3000 (label C1 on the map); this seems very fast for modulation synthesis of filter bandwidth, although it uses only a single stream and dimension, so the y and z values are both 1.
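The gestalt sizes quoted above follow from simple arithmetic, if (as section 5.1 suggests) the expected gestalt size counts the data points falling into one ca. 3-second echoic-memory window; a minimal sketch:

```python
def gestalt_size(n_points, duration_s, window_s=3.0):
    """Data points falling into one ca. 3-second echoic-memory window."""
    return n_points / duration_s * window_s

print(gestalt_size(365, 7.3))    # team A: one year in 7.3 s    -> 150
print(gestalt_size(8424, 4.0))   # team B: all locations in 4 s -> ca. 6300
print(gestalt_size(8424, 9.0))   # team C (C1): 9 s sweep       -> ca. 2800
print(gestalt_size(190, 9.0))    # team C (C2): slower variant  -> ca. 63
```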


Figure 10.3: SDSM map of Rainfall data set.

The slower version (190 values in 9 seconds, C2) is more within SDSM recommended practice, at a gestalt size of ca. 60. While the SDSM concept recommends making indexing dimensions available for interaction, this was too complex for the workshop setting.
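As an illustration, Team A's dust design (as described at the beginning of this section) might be sketched in SuperCollider roughly as follows; the data values and scaling factors are invented for illustration, and this is not the code written in the workshop:

(
SynthDef(\rainDust, { |out = 0, rain = 5, avg = 5|
    var dev = rain - avg;                       // deviation from the yearly benchmark curve
    var freq = Select.kr(dev > 0, [400, 4000]); // low-pitched dust below, high-pitched above average
    var dust = Dust.ar(rain.abs * 20);          // dust density follows the absolute rain value
    var snd = Ringz.ar(dust, freq, 0.02);       // single-sample crackle through a resonance filter
    Out.ar(out, snd * dev.abs.min(1));          // amplitude follows the absolute deviation
}).add;
)
// one synth per location, e.g. routed to different output channels:
x = Synth(\rainDust, [\rain, 8, \avg, 5]);
x.set(\rain, 2); // a drier-than-average day: sparse, low-pitched dust

Team C's scan of all stations could be paraphrased like this (again with invented ratio values, and a single fixed center frequency as an assumption):

(
var ratios = Array.fill(8424, { 3.0.rand }); // stand-in: day rainfall / January mean per station
var dur = 9; // all stations scanned within 9 seconds
{
    var buf = LocalBuf.newFrom(ratios);
    var index = Line.kr(0, ratios.size - 1, dur, doneAction: 2);
    var ratio = Index.kr(buf, index);
    var rq = ratio.linlin(0, 3, 1.0, 0.05); // stronger rain relative to the mean -> sharper resonance
    BPF.ar(WhiteNoise.ar(0.5), 1000, rq) ! 2;
}.play;
)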

10.2 Polysaccharides

This problem was worked on for two two-hour sessions, so the participants had more time to reflect and consider how to proceed. The data were submitted by Anton Huber of the Institute of Chemistry at the University of Graz.

10.2.1 Polysaccharides - Materials made by nature3

3 This was the title of Anton Huber's introductory talk.

Polysaccharides make up most of the biological substance of plant cells. Their molecular geometries, such as their symmetries, determine the physical properties of most plant-based materials. Even materials from trees of the same kind have different properties because of the environment they come from; so understanding the properties of a given

sample is of crucial importance to materials scientists. A typical question that occurs is: are the given datasets (which should be the same) somehow different?

In aqueous media, polysaccharides form so-called supermolecular structures. Very few of these molecules can structurise amazing amounts of water: water clusters can be several millimeters large. By comparison, the individual molecules are measured in nanometers, so there is a scale difference of six orders of magnitude! In a given measurement setup the materials are physically sorted by fraction: on the left side, particles with big molecules (high mol numbers) are found, on the right, small ones. Rather few bins (on the order of 30) of sizes and corresponding weights are conventionally considered sufficiently precise for classification, both in industry and science.

The data for this session were analysis data of four samples of plant materials: beech, birch, oat and rice. Three different measurements were given, along with their indexing axes: channel 1 is an index (corresponding to mol size) of the measurement at channel 2, channel 3 is an index of channel 4, and channel 5 is an index of channel 6. Channels 1 and 2 contain the measured delta-refraction index of electromagnetic radiation aimed at the material sample; i.e. how strongly light of a given wavelength is diverted from its direction by the size-ordered regions along the sample. (The exact wavelength used was not given.) Channels 3 and 4 contain the measured fluorescence index under electromagnetic radiation, again dependent on the size-ordered regions along the sample. Channels 5 and 6 contain the measured dispersion of the material sample under light, or more precisely, how much the dispersion differs from that of clear water, based on molecule size along the size-ordered axis of the sample.

10.2.2 Session notes

The notes for this session were reconstructed shortly after the workshop.

Brainstorming

One of the first observations made was that the data look like FFT analyses - so the team considered using FFT and convolution on the bins directly. An alternative could be a multi-filter resonator with e.g. 150 bands, maybe detuned from a harmonic series. As was noted several times in the workshop by those favouring audification, it seemed desirable to obtain rawer data as directly as possible from the measurements; these might be interesting to treat as impulse responses. The first idea the team decided to try was to create a signature sound of 1-2 seconds for one channel of each data file by parameter mapping to about 15-20 dimensions; a

second step should be to compare two such signature sounds (for two channels of the same file) binaurally.

Experimentation

A look at the data revealed that across all files, channel 1 seemed massively saturated in the upper half, so we decided to take only the undistorted part of channel 1, downsample it to e.g. 50 zones, and turn these into 50 resonators, which would ring differently for the different materials when excited. The resonator frequencies were scaled according to the index axis, which is roughly equal to particle size: small particles are represented by high sounds, and big particles by lower resonant frequencies. (A minimal sketch of this resonator approach appears at the end of this section.) Based on this scheme, we proceeded to make short sound signatures for the four materials, using delta-refraction index (channel 2) and fluorescence (channel 4) data, with two different exciter signals: noise and impulses. The sound examples provided here4 present all four materials in sequence:

Delta refraction index, impulse source: Materials1 Pulse BeechBirchOatRice.mp3
Delta refraction index, noise source: Materials1 Noise BeechBirchOatRice.mp3
Fluorescence, impulse source: Materials2 Pulse BeechBirchOatRice.mp3
Fluorescence, noise source: Materials2 Noise BeechBirchOatRice.mp3

The team also started making these playable from a MIDI drum pad for a more interactive interface, but did not have enough time to finish this approach.

Evaluation

The group agreed that having time for two sessions was much better for deeper discussion and more interesting results. Even so, more time would be desirable. In this particular session, the sound signatures made were easy to distinguish, so in principle, this approach works. What could be next steps? It would be useful to implement signatures of more than one channel to increase the reliability of properties tracking; e.g. for materials production monitoring, this could be a useful application. It would also be interesting to try a nonlinear complex sound generator (such as a feedback FM algorithm) and control its inputs from the data, using on the order of 20-30 dimensions; this holistic approach would be interesting from the perspective of sonification research, as it might lead to emergent audible properties without requiring detailed matching of individual data dimensions to specific sound parameters. While
4 http://sonenvir.at/workshop/problems/biomaterials/sound descr

there was no time to attempt this within the workshop setting, the idea would certainly warrant further research.

In SDSM terms, the dimensionality of each data point is unusually high here. The sonifications render each material (consisting of 680 measurements) to a reduced range of the data, downsampled to 50 values, as resonator specifications, i.e., intensity and ringtime of each band. Given an interactive design, such as one allowing tapping on the different materials or probes, one can easily compare on the order of 5-8 samples within short-term memory limits.
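As a rough sketch of the resonator-bank idea (with an invented stand-in curve for the measurement data, and invented frequency scaling), the signature sounds could be built along these lines:

(
// stand-in for the undistorted part of one measurement channel
var data = Array.fill(680, { |i| exp((i - 300).squared / -20000) });
var zones = data.resamp1(50);           // downsample to 50 zones
var freqs = Array.geom(50, 8000, 0.92); // small particles high, big ones low
{
    var exciter = Impulse.ar(1); // or WhiteNoise.ar(0.1) for the noise variant
    Klank.ar(`[freqs, zones.normalize(0.01, 1), 0.3 ! 50], exciter) * 0.1 ! 2;
}.play;
)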

Chapter 11

Examples from the ICAD 2006 Concert


The author was Concert Chair for the ICAD 2006 Conference at Queen Mary University London, and together with Christian Dayé and Christopher Frauenberger, organized the Concert Call, the review process for the submissions, and the concert itself (see section 4.3 for full details). This chapter discusses four of the eight pieces played in the concert, chosen for the diversity of the strategies used, and for clarity and completeness of documentation.

11.1 Life Expectancy - Tim Barrass

This section discusses a sonification piece created by Tim Barrass for the ICAD 2006 Concert, described in Barrass (2006), and available as a headphone-rendered audio file1. Life Expectancy is intended to allow listeners to find relationships between life expectancies and living conditions around the world. The sounds he chooses are quite literal representations of their meanings, making them relatively easy to read, even though the piece is quite dense in information. It is structured in three parts, beginning with a 20-second section which mainly provides spatial orientation, a long middle section representing living conditions for each country in a dense 2-second soundscape, and a short final section illuminating gender differences in life expectancy.

The opening section presents the spatial locations of all country capitals, ordered by ascending life expectancy. The speaker ring is treated as if it were a band around the equator, with the listener inside near the center of the globe. Each capital location is marked by a bell sound (which is easy to localise), spatialised in the ring of speakers according to the capital's longitude; latitude (distance to the equator, North or South) is represented by the bell's pitch, where North is higher. A whistling tone represents ascending life expectancy for each country, and as it is not spatialised, it is easy to
1 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/life.mp3


follow as one stream. Each country has roughly 0.1 seconds for its bell and whistle tone.

The main section of the piece is about six minutes long, and presents a rich, complex audio vignette for every country, at the length of a musical bar of two seconds. The most intriguing aspect here is the ordering of the countries: first we hear the country with the highest life expectancy, then the lowest, the second highest, the second lowest, and so on until the interleaved orders meet in the median. Each sound vignette consists of the following sound components:

Two bell sounds whose pitch indicates latitude, first of the equator, then of the country's capital, their horizontal spatial position being longitude.

A chorus speaking the country's name, with the number of voices representing the population number, and whose spatial extension represents the country's area. The capital name is also spoken, at its spatial location.

A fast ascending major scale represents life expectancy, once for male, once for female inhabitants of the country. The number of notes of the scale fragment represents the number of life decades, so a life expectancy of 75 years would be represented as a scale covering 8 steps (up to the octave) with the last note shortened by 50 percent. (A small sketch of this decade-scale mapping appears at the end of this section.) The gender differences between each pair of scales, and the alternation of extreme contrasts in the beginning of the sequence, articulate this aspect very interestingly.

Clinking coins signify economic aspects: average income by density of the coin sounds, while gross domestic product (GDP) is indicated by reverb size.

The sound of water filling a vessel indicates access to drinking water and sanitation: a full vessel indicates good access, an empty vessel little access. Three pulses of this sound provide total, rural, and urban values. Sanitation is rendered by adding distortion to the water pulses when sanitation values are low (suggesting dirty water).

The final short section of the piece focuses on gender differences in life expectancy. As the position bell moves from the North Pole to the South Pole, life expectancies for each country are represented with a tied note, going from the value for male to female (usually rising), and spatialised at the capital's location.

Tim Barrass is very modest in commenting on the piece (Barrass (2006)): "I have taken a straightforward and not particularly musical approach, in an attempt to gain a clear impression of the dataset. The sound mapping is brittle, designed specifically for the dataset. I would not expect this approach to provide a flexible base to explore the musical, sonic and informational possibilities of similar material, but it may at least serve as an example of one direction that has been tried."

While the piece may appear artless in representing so much of the dataset with apparently simplistic sound mappings, I find the piece extremely elegant, both as a sonification, and as a composition. The sound metaphors are so clear that they almost disappear, as

does the spatial representation. It is quite an achievement to create concurrent sound layers that are rich, complex, and dense enough to be demanding to listen to, yet transparent enough to allow for discovering different aspects as the piece proceeds. This piece certainly provided the richest information representation of all entries for the concert. The beginning and end sections work beautifully as frames for the piece, as orientation help, and as alternative perspectives on the same questions. For me, the questions that remain long after listening come from the strongest intervention in the piece: the idea of sorting the countries so as to begin with the most extreme contrasts in life expectancy, and moving toward the average-lifespan countries.
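The decade-scale mapping described above can be paraphrased in a few lines of SuperCollider; this is a hypothetical reading, not Barrass's actual code, and the note durations are invented:

(
var lifeExpectancy = 75;
var decades = lifeExpectancy / 10;                         // 7.5 decades
var numNotes = decades.ceil.asInteger;                     // one note per (started) decade: 8 notes
var lastDur = if(decades.frac > 0) { decades.frac } { 1 }; // final note shortened by the fraction
Pbind(
    \degree, Pseq((0..numNotes - 1)),                      // fast ascending major scale
    \dur, Pseq((0.1 ! (numNotes - 1)) ++ [0.1 * lastDur])
).play;
)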

11.2 Guernica 2006 - Guillaume Potard

This section discusses a piece created by Guillaume Potard for the ICAD 2006 Concert, described in Potard (2006), and available as a headphone-rendered audio file2. Guernica 2006 sonifies the evolution of world population and the wars that occurred between the years 1 and 2006. Going far beyond the data supplied with the concert call, Potard has compiled a comprehensive list of 507 documented wars, with geographical location, start and end year, and a flag indicating whether it was a civil war or not. He also located estimates for world population for the same time period.

The sonification design represents the temporal and geographical distribution chronologically. The temporal sequence follows historical time: the start year of each war determines when its representing sound begins. As many more wars have occurred toward the end of the period observed, the time axis was slowed down logarithmically in the course of the piece, so the duration of a year near the end of the piece is 4 times longer than at the beginning. (A plausible construction of such a time axis is sketched at the end of this section.) This maintains the overall tendency, but still provides a better balance of the listening experience. The years 1, 1000 and 2000 are marked by gong sounds for orientation. The entire piece is scaled to a time frame of five minutes.

The start time of each war is indicated by a weapon sound; the sounds chosen change with the evolution of weapon technology. In the beginning, horses, swords, and punches are heard, while after the invention of gunpowder, cannons, guns, and explosions dominate. Newer technology such as helicopters is heard only toward the end of the piece, after the year 1900. Civil wars are marked independently by the additional sound of breaking glass.

The spatial distribution of the sounds was handled by vector-based amplitude panning for the directions of the sound sources relative to the reference center, the geographical location of London. Sound distance was rendered by controlling the ratio of direct to reverberation sound.
2 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/guernica.mp3

The evolution of world population is sonified concurrently as a looping drone, with playback speed rising as population numbers rise.

Guernica 2006 was certainly the most directly dramatic piece in the concert. The use of samples communicates the intended context very clearly, without requiring much prior explanation. As Potard (2006) states, richer data representation with this approach would certainly be possible; he considers representing war durations, distinguishing more types of war, and related factors like population migrations in future versions of the piece.
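One plausible way to construct such a logarithmic time axis (not Potard's actual code; the exponential year weighting is an assumption chosen so that the last year lasts about 4 times as long as the first):

(
var numYears = 2006, totalDur = 300, ratio = 4;
// each year's duration grows exponentially, up to ratio times the first year's
var weights = numYears.collect { |y| ratio ** (y / numYears) };
var scale = totalDur / weights.sum;
var onsets = ([0] ++ weights.integrate) * scale; // onset time of each year in the piece
onsets[1500].postln; // when a war starting in year 1500 would begin
)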

11.3 Navegar É Preciso, Viver Não É Preciso

This section discusses a sonification piece created by Alberto de Campo and Christian Dayé for the ICAD 2006 Concert, described in de Campo and Dayé (2006), and available as a headphone-rendered audio file3. As this piece was co-written by the author of this dissertation, much more background can be provided than with the other pieces discussed. In this piece, we chose to combine the given dataset containing current (2005) social data of 190 nations with a time/space coordinates dataset of considerable historical significance: the route taken by the Magellan expedition to the Moluccan Islands from 1519-1522, which was the first circumnavigation of the globe.

11.3.1 Navigation

The world data provided by the ICAD 2006 Concert Call all report the momentary state for the year 2005, and are thus free of the idea of historical progression. Also, the choice of which variables to include in the sonification, and how, must be based on theoretical assumptions which are not trivial to formulate on a level of aggregation involving 6.513.045.982 individuals (the number of people estimated to have populated this planet on April 30, 2006, see U.S. Census Bureau (2006)). The data do provide detailed spatial information, so we decided to choose a familiar form of data organization that combines space and time: the journey.

Traveling can be defined as moving through both space and time. While the time dimension as we experience it is unimpressed by the desires of the traveler, s/he can decide where to move in space. The art and science that has enabled mankind to find out where one is, and in which direction to go to arrive somewhere specific, is known as navigation.

Navigation as a practice and as a knowledge system has exerted major influence on the
3 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/navegar.mp3

development of the world. The Western world has been changed drastically by the consequences of the journeys led by explorers like Christopher Columbus or Vasco da Gama. (The art of navigation outside Europe, especially in Polynesia, is covered very interestingly in Conner (2005), pp 41-58.) The first successful circumnavigation of the globe, led by Ferdinand Magellan, proved beyond all scholastic doubts that the earth in fact appears to be round. This would not have happened without the systematic cultivation of all the related sciences in the school for navigation, map-making and ship-building founded by Prince Henry the Navigator of Portugal in the 15th century. (Conner (2005) also describes their methods of knowledge acquisition vividly as mainly coercion, appropriation, and information hoarding; see chapter Blue Water Navigation, pp. 201.)

For all these reasons, Magellan's route became an interesting choice for temporal and spatial organization for our concert contribution.

11.3.2 The route

Leaving Seville on August 10, 1519, the five ships led by Magellan (called Trinidad, San Antonio, Concepción, Victoria, and Santiago) crossed the Atlantic Ocean to anchor near present-day Rio de Janeiro after five months (Pigafetta (1530, 2001); Wikipedia (2006b); Zweig (1983)). Looking for a passage into the ocean later called the Pacific, they moved further south, where the harsh winter and nearly incessant storms forced them to anchor and wait for almost six months. While exploring unknown waters for this passage, the Santiago sank in a sudden storm, and the San Antonio deserted back to Spain; the remaining three ships succeeded and found the passage in the southernmost part of South America which was later called the Magellan Straits, in late October 1520.

The ships then headed across the Mar del Sur, the ocean Magellan named the Pacific, towards the archipelago which is now the Philippines, where they arrived four months later. Seeking the mythical Spice Islands, Magellan and his crew visited several islands in this area (Limasawa, Cebu, Mactan, Palawan, Brunei, and Celebes); on Mactan, Magellan was killed in a battle, and a monument in Lapu-Lapu City marks the site where he died.

In spite of their leader's death, the crew decided to fulfil their mission. By now diminished to 115 persons on just two ships (Trinidad and Victoria), they finally managed to reach the Spice Islands on November 6, 1521. Due to a leak in the Trinidad, only the Victoria set sail via the Indian Ocean route home on December 21, 1521. By May 6, 1522, the Victoria, commanded by Juan Sebastián Elcano, rounded the Cape of Good Hope, with only rice for rations. Twenty crewmen died of starvation before Elcano put into Cape Verde, a Portuguese holding, where he abandoned 13 more crew on July 9 in fear of losing his cargo of 26 tons of spices (cloves and cinnamon) (Wikipedia (2006b)). On September 6, 1522, more than three years after she left Seville, Victoria reached the port of San Lucar in Spain with a crew of 18 left. One is reminded of a song by Caetano


Figure 11.1: Magellan's route in Antonio Pigafetta's travelogue (Primo Viaggio Intorno al Globo Terracqueo - First travel around the terracqueous globe, see Pigafetta (1530)).


Figure 11.2: Magellan's route, as reported in Wikipedia (http://wikipedia.org/Magellan).

Veloso, who, pondering the mentality and fate of the Argonauts, wrote: "Navegar é preciso, viver não é preciso" - Sea-faring is necessary, living is not (see appendix E).

11.3.3 Data choices

The explorers in the early 15th century were interested in spices (which Europe was massively addicted to at the time), gold, and the prestige earned by gaining access to good sources of both. Nowadays, other raw materials are considered premium goods. What would someone who undertakes such a journey today hope to gain for his or her exertions; what is as precious today as gold and spices were in the 16th century?

We imagine today's conquistadores (or globalizadores) would likely ask first about economic power: how rich is an area? Second, they would probably check geographical potential; and chances are that if any one resource will be as central to economic activity in the future as spices were centuries ago, it will be drinking water resources. Water might well become the new pepper, the new cinnamon, or even the new gold. (As the Gulf wars showed, oil would have been the obvious current choice; however, we found the future perspective more interesting.) Thus we chose to focus on two main dimensions: one depicting economic characteristics of every country we pass, and another informing us about its inhabitants' current access to drinking water.


11.3.4 Economic characteristics

The variable GDP per capita included in the given data set provides some insight into the overall economic performance of a country. Obviously, the GDP per capita variable lacks information about the distribution of the income; it only says how much money there would be per person if it were equally distributed. This is never the case; on the contrary, scientists find that the rich get richer and the poor get poorer, both in intranational and international contexts. E.g. in the US of 1980, the head of a company earned on average 42 times as much as an employee; by the year 1999, this ratio was more than ten times higher: a company leader earned 475 times as much as an average employee (Anonymous (2001)).

Figure 11.3: The countries of the world and their Gini coefficients (from http://en.wikipedia.org/wiki/Gini).

A measure that captures aspects of income distribution is the Gini coefficient on income inequality (Wikipedia (2006a)). Developed by Corrado Gini in the 1910s, the Gini coefficient is defined as the ratio of the area between the Lorenz curve of the distribution and the curve of the uniform distribution, to the area under the uniform distribution (see the formula below). More common is the Gini index, which is the Gini coefficient times 100. The higher the Gini index, the higher the income differences between the poorer and the richer parts of a society. A value of 0 means perfectly equal distribution, while 100 means that one person gets all the income of the country and the others have zero income. However, the Gini index does not report whether one country is richer or poorer than the other.

Our sonification tries to balance the limitations of these two variables by combining them: we include two factors that go into a Gini calculation; the ratio of the top and bottom 10 percentile of all incomes in a population, and the ratio of the top to bottom 20%. In Denmark, at Gini index rank 1 of 124 nations for which Gini data exist, the top 10% earn 4.5x as much as the bottom 10%; for the UK (rank 51), the ratio is 13.8:1;

the US (rank 91) ratio is 15.9:1; in Namibia, at rank 124, the ratio is 128.8:1. (In the sonification, missing values here are replaced by a dense cluster of near-center values, which is easy to distinguish acoustically from the known occurring distributions.)
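Written out, the definition above reads as follows (a restatement, with L(p) the Lorenz curve, A the area between the uniform-distribution line and the Lorenz curve, and B the area under the Lorenz curve):

    G = A / (A + B) = 1 - 2 \int_0^1 L(p) \, dp,    Gini index = 100 * G

since the area under the uniform-distribution line is A + B = 1/2.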

11.3.5 Access to drinking water

An interesting variable provided by the ICAD06 Concert data set is "Estimated percentage of population with access to improved drinking water sources, total". Being part of the so-called Social Indicators (UN Statistics Division (1975, 1989, 2006)), the data are reported to the UN Statistics Division by the national statistical agencies of the UN member states. Unfortunately, this indicator has a high percentage of missing values (46 of 190 countries, or 24.2%). This percentage can be reduced to 16.3% (31 countries) by excluding missing values from countries which are not touched by our Magellanian route. Still, the problem is fundamental and must be addressed. The strategy we chose was to estimate the missing values on the basis of the data values of the neighboring countries, being aware that this procedure does not satisfy scientific rigor. In most cases, though, we claim that our estimates are likely to match reality: for instance, it is very likely that in France and Germany (as in most EU countries), very close to 100% of the population have access to improved drinking water resources, and that this fact is considered too obvious to be statistically recorded.
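Such neighbour-based estimation can be illustrated like this (using array neighbours as a crude stand-in for geographic neighbours; the values are invented):

(
var values = [100, nil, 98, nil, nil, 85]; // nil marks a missing value
var filled = values.collect { |v, i|
    v ?? {
        var known = [i - 1, i + 1].collect { |j| values.clipAt(j) }.reject(_.isNil);
        if(known.isEmpty) { nil } { known.mean }
    }
};
filled.postln; // -> [ 100, 99, 98, 98, 85, 85 ]
)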

11.3.6 Mapping choices

We deliberately chose rather high display complexity; while this requires more listener concentration and attention for maximum retrieval of the represented information, hopefully a more complex piece invites repeated listening, as audiences tend to do with pieces of music they enjoy. Every country is represented by a complex sound stream composed of a group of five resonators; the central resonator is heard most often, the outer pairs of resonators (satellites) sound less often. All parameters of this sound stream are determined by (a) data properties of the associated country and (b) the navigation process, i.e. the ship's current distance and direction towards this country. At any time, the 15 countries nearest to the route point are heard simultaneously. (A toy sketch of this nearest-neighbour selection appears at the end of this section.) This is both to limit display complexity for the sake of clarity, and to keep the sonification within CPU limits for realtime interactive use. The mapping choices in detail are given in table 11.1.

In order to provide a better opportunity to learn this mapping, the author has written a patch which plays only a single sound source/country at a time, where it is possible to switch between the parameters for all 192 countries. This allows comparing the multidimensional changes as one switches from, say, Hongkong (very dense population, very rich) to Mongolia (very sparse population, poor). In public demonstrations and talks, this has proven to be quite appropriate for this relatively complex mapping.

Table 11.1: Navegar - Mappings of data to sound parameters

Population density of country -> Density of random resonator triggers
GDP per capita of country -> Central pitch of the resonator group
Ratio of top to bottom 10% -> Pitches of the outermost (top and bottom) satellite resonators
Ratio of top to bottom 20% -> Pitches of the inner two satellite resonators (missing values for these become dense clusters)
Water access -> Decay time of resonators (short tones mean dry)
Distance from ship -> Volume and attack time (far away is blurred)
Direction toward ship -> Spatial direction of the stream in the loudspeakers (direction North is always constant)
Ship speed, direction, winds -> Direction, timbre and volume of wind-like noise

When hearing the piece after experimenting for a while with an example of its main components, many listeners report understanding the sonification much more clearly. It has also been helpful to provide some points of orientation that can be identified while the piece unfolds, as listed in table 11.2.
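The selection of the 15 nearest countries could be sketched like this (coordinates invented; a crude planar distance is good enough for ranking here):

(
var ship = [10.5, -20.0]; // [lat, lon] of the current route point
var countries = 190.collect { [rrand(-60.0, 70.0), rrand(-180.0, 180.0)] };
var nearest = countries.collect { |pos, i|
    [i, (pos - ship).squared.sum.sqrt] // index and distance to the ship
}.sort { |a, b| a[1] < b[1] }.keep(15);
nearest.collect(_.first).postln; // indices of the 15 audible countries
)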

11.4 Terra Nullius - Julian Rohrhuber

This section discusses a sonification piece created by Julian Rohrhuber for the ICAD 2006 Concert. It is described in Rohrhuber (2006), and available as a headphone-rendered audio file4.

11.4.1 Missing values

The concept for Terra Nullius builds on a problem present (or actually, absent) in data from many different contexts: missing values. Rohrhuber (2006) states that in sonification, data are assumed to have implicit meaning, and that sonifications try to communicate such meaning. In the specific case of the data given for the concert, most data dimensions are quantitative; thus the data can be ordered along any such dimension, and the value for one dimension of a given data point can be mapped to a sonic property of a corresponding sound event. For example, one could order by population size, and map GDP per capita to the pitch of a short sound event. However, with missing values the situation becomes considerably more complicated:
4 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/terra.mp3

Table 11.2: Some stations along the timeline of Navegar

0:00-0:10  Very slow move from Sevilla to San Lucar
0:20-0:26  Cape Verde: very direct sound (i.e. near the capital), rather low, dense spectrum (poor country, unknown income distribution)
0:54-1:00  Uruguay/Rio de la Plata: very direct sound, passing close by.
1:05-2:40  Port San Julian, Patagonia: very long stasis, everything is far away, six months long winter break in Magellan's travel
2:45-3:00  Moving into Pacific Ocean: new streams, many dense spectra; unknown income distributions
3:20  Philippines: very direct sound (near capital), high satellites: unequal income distribution
4:00  Brunei: very direct, high, dense sound: very rich, unknown distribution ... towards Moluccan Islands
4:50  East Timor: direct, mostly clicking, only very low frequency resonances (very poor, little access to water, unknown income distribution)
5:15  into Indian Ocean: openness, sense of distance
5:50  approaching Africa: more lower centers, with very high satellites: poor, with very unequal distributions (but at least statistics available)
5:55  Pass Cape of Good Hope: similar to East Timor
6:10  Arrive back at San Lucar, Spain

Rohrhuber states that "These non-values break gaps into the continuity of evaluation - they belong to another dimension within their dimension." Missing data not only fail to belong to the dimension they are missing from, they also fail to belong in any uniform dimension of missing. Furthermore, one must consider that there are no fully valid strategies for dealing with missing values: removing data points with missing values distorts the comparisons in other data dimensions; substituting likely data values introduces possible errors and reduces data reliability; marking them by recognizably out-of-range values may be logically correct, but these special values can be quite distracting in a sonification rendering.

11.4.2 The piece

The piece consists of multiple cycles, each moving around the globe once. For every cycle, all countries within a zone parallel to the equator are selected and sonified one at a time in East to West order, as shown in figure 11.4. In the beginning, the zone contains latitudes similar to England, or actually London, as the capitals determine geographical position. The sound is spatialised accordingly in the ring of speakers, so one cycle around

the globe moves around the speaker ring once. With every cycle, the zone of latitudes widens until all countries are included.

Figure 11.4: Terra Nullius, latitude zones

To sonify the missing values in the 46 data dimensions given, a noise source is split into 46 frequency bands. When a value for a dimension is present, the corresponding band remains silent; the band only becomes audible when the value for that dimension in the current country is missing. (A minimal sketch of this band-masking idea appears at the end of this section.) After all countries are included in the cycle, the latitude zone narrows again over several cycles, and ends with the latitude and longitude of London. For this second half, the

filters have smaller bandwidths, so there is more separation between the dimensions. Gradually, constant decorrelated noise fades in on all speakers, which remains for a few seconds after the end of the last cycle.

Terra Nullius plays very elegantly with different orders of missingness, in fact creating what could be called second-order missing values of what is being sonified: "... A band of filtered noise is used for each dimension that is missing, i.e. the noisier it is, the less we know. In the end the missing itself seems quite rich of information - only about what?" (Rohrhuber (2006))

Personally, I find this the most intriguing work of art in the ICAD 2006 concert. Subtly shifting the discussion to recursively higher levels of consideration of what it is we do not know, it is an invitation to deeper reflection on many questions about meaning and representation.
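A minimal sketch of the band-masking idea might look like this in SuperCollider (the missing-value pattern and the frequency range are invented):

(
var numDims = 46;
var missing = { 0.25.coin }.dup(numDims); // true where this dimension's value is missing
{
    var src = PinkNoise.ar(1);
    var bands = numDims.collect { |i|
        var freq = (i / (numDims - 1)).linexp(0, 1, 60, 10000); // one band per data dimension
        BPF.ar(src, freq, 0.05) * missing[i].binaryValue        // audible only when missing
    };
    Splay.ar(bands, level: 4);
}.play;
)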

11.5 Comparison of the pieces

In order to study the variety of approaches that artists and sonifiers took in creating pieces, SDSM terminology and viewpoint turned out to be quite useful. For the dataset given, a clear anchor can be provided at 190 data points and 26 dimensions for the basic dataset, and 44 for the extended set (see figure 11.5).

Life Expectancy chooses a rather large set of data dimensions, and sonifies aspects of it in three distinct ways: an overview for spatial orientation, sorted by life expectancy (LE1), a long sequence of 2-second vignettes, densely packed with information (LE2), and a final sequence of life expectancies sorted North-South (LE3).

Orientation - LE1: Within 20 seconds, a signal sound is played for each country, ordered by total life expectancy; this renders 3 mapped dimensions (life expectancy, latitude, longitude).

Vignettes - LE2: Five streams make up each vignette: two bell sounds - 2 dimensions: latitude and longitude; spoken country and capital name - 6 dimensions: 2 names, spatial location again (2), population size, and area; scale fragment - 2 dimensions: life expectancy for males and females; clinking coins - 2 dimensions: average income over density and GDP; water vessel - 3x2 dimensions: 3 pulses with 2 values each, fullness and distortion. This combination of parallel/interlocked streams with [2, 6, 2, 2, 6] dimensions each renders a total of 16 dimensions per vignette of 2 seconds! While these could also be rendered visually as a sideways view of the SDS map (showing the Y and Z axes), they are shown here as 16 parallel dimensions for better comparability.


Figure 11.5: SDSM comparison of the ICAD 2006 concert pieces.

Ending - LE3: This section is again short (30 seconds with intro and ending clicks, 17 without), and compares the 2 life expectancy values for males and females, with the countries sorted North/South; including spatial location, it uses 4 dimensions.

Overall, the piece has very literal, easy-to-read mappings to sounds; it employs a really complex, differentiated soundscape, and it is very true to the concept of sonification.

Guernica uses its own data, thus requiring its own data anchor. The piece renders world population as one auditory stream with a single dimension (GUE+), while each war is its own stream of 3 dimensions; while the maximum number of simultaneous wars in Potard's data is around 35, the piece does not use war durations, so the maximum number of parallel streams is not documented. The order of the data is chronological, and at 507 wars within 300 seconds, it has an average event density of 5 within 3 seconds. Three dimensions are used for each event: the war's starting year, and its latitude/longitude. The parallelism of streams is roughly sketched with copies of the label GUE receding along the Z axis; as this is dynamically changing, there is no satisfying visual representation.

Like Life Expectancy, Guernica features very literal sound mappings (samples of fighting sounds); it is based on additional data collected on wars and population since the year 1,

which extend the starting dataset considerably; and it adds the notion of a timeline and historical evolution.

Navegar orders the data along a historical time/space route. Within 6 minutes, 134 countries are rendered (the others are too far away from the route to be touched), which puts the average data point density around 1 per 3 seconds. At any point, the nearest 15 countries are rendered as one stream each, with 7 dimensions per stream (NAV): latitude/longitude (with moving distance and direction), population density, GDP/capita, top 10 and top 20 richest-to-poorest ratios, and water access. The parallelism is again indicated symbolically as multiple NAV labels along the Z axis. Additionally, ship speed, direction, and weather conditions are represented, based on 76 timeline points (NAV+). Like Guernica, Navegar introduces a historical timeline; unlike it, it juxtaposes that with current social data. It uses metaphorically more indirect mappings than most of the other submissions. Uniquely within the concert context, it creates a soundscape of stationary sound sources with a subjective perspective: a moving observer (listener), and it also sonifies context (here, speed and travel conditions).

Terra Nullius organizes the data by two criteria: selection by latitude zone, and ordering by longitude. A maximum of all 46 dimensions is used throughout the piece, which sets its Y value on the SDSM map. Within 19 cycles, larger and larger data subsets are chosen; first, 14 countries within 18 seconds, putting it at a gestalt size of 2-3 (TN1 in the map). This speeds up to 190 countries at a rate of 100/sec, or ca 35 gestalt size (TN2), and returns to roughly the original rate eventually. What sets Terra Nullius apart from all other entries is that it assumes a meta-perspective on data perceptualisation in general by studying missing values exclusively.

Conclusion

While the SDSM map view of all four pieces shows the large differences between the approaches taken, it cannot fully capture or describe the radical differences in concepts manifested in this subset of the pieces submitted. On the one hand, that would be asking a lot of an overview-creating, orientational concept; on the other, it is interesting to find that even within rather tightly set constraints like a concert call, creativity easily defies straightforward categorisation.

Chapter 12

Conclusions
This work consists of three interdependent contributions to sonification research: a theoretical framework that is intended for systematic reasoning about design choices while experimenting with perceptualisations of scientific and other data; a software infrastructure that pragmatically supports the process of fluid iterative prototyping of such designs; and a body of sonifications realised using this infrastructure.

All these parts were created within one work process, in parallel, interleaved streams: design sketches suggested ideas for infrastructure that would be useful; observing and analysing design sessions led to deeper understanding which informed the theoretical framework; and both the growing framework and the theoretical models eventually led to a more effective design workflow. The body of sonifications created within this system, and the theoretical models derived from the analyses of this body of practical work (and a few selected other sonification designs of interest) form the permanent results of this dissertation. They contribute to the field of sonification research in the following respects:

The Sonification Design Space Map and the related models provide a sonification-specific alternative to TaDa Auditory Information Design, and they suggest a clearer, more systematic methodology for future sonification research, in particular for sonification design experimentation.

The SonEnvir framework provided the first large-scale in-depth test of Just In Time programming for scientific contexts, which was highly successful. The sonification community and other research communities have become aware of the flexibility and efficiency of this approach.

The theoretical models, the practical methodology and the individual solutions developed here may help to reduce the time spent to cover large design spaces, and thus contribute to more efficient and fruitful experimentation.

The work presented here was also employed in sonification workshop settings, and in numerous talks and demonstrations given by the author. It proved to be helpful in giving

interested non-experts a clear impression of the central issues in sonification design work, and has been received favourably by a number of experts in the field.

12.1 Further work

Within the SonEnvir project, many compromises had to be made due to time and capacity constraints. Also, given the breadth of the overall approach chosen, many ideas could not be fully explored, and would thus warrant further research.

In the theoretical models, the main desirable future research aims would be:

1. Integration of more analyses of the growing body of Model-Based Sonification designs.

2. Expansion of the user interaction model based on a deeper background in HCI research.

In the individual research domains, several areas would warrant continued exploration. Here, it is quite gratifying to see that one of the research strands has led to a direct follow-up project: the QCDaudio project hosted at IEM Graz continues and extends research begun by Kathi Vogt within SonEnvir. For the EEG research activities, two strategies seem potentially fruitful and thus worth pursuing: continuing the planned integration into the NeuroSpeed software, and starting closer collaborations with other EEG researchers, such as the Neuroinformatics group in Bielefeld, and individual experts in the field, such as Gerold Baier.

It is quite unfortunate that none of the designs created within this research context would be directly usable for visually impaired people. In my opinion, providing better access to scientific and other data for the visually impaired is one of the strongest motivations for developing a wider variety of sonification design approaches, and would be well worth pursuing more deeply. I hope the work presented will be found useful for future research in that direction.

For me personally, experimenting with different forms of sonification in artistic contexts has become even more intriguing than it was before embarking on this venture. As the entries for the ICAD concerts, as well as many current network art pieces show, creative minds find plenty of possibilities for experimentation with data representation by acoustic, visual and other means; creating work that is both aesthetically interesting and scientifically well-informed is still a fascinating activity. When more perceptual modalities are included in more interactive settings, the creative options and the possibility spaces to explore multiply once again.

Appendix A

The SonEnvir framework structure in subversion


This section describes which parts of the framework reside in which folders in the SonEnvir subversion repository. Note that the state reported below is temporary; pending discussion with the SC3 community, more SonEnvir work will move into the main distribution, as well as into general SC3 Quarks, or SonEnvir-specific Quarks.

A.1 The folder Framework

This folder contains the central SC3 classes written during the project, and their respective help files. The sub-folders are structured as follows:

Data: contains all the SC3 classes for different kinds of data (see the Data model discussion above), such as EEG data in .edf format; it also includes some applications written as classes: the TimeSeriesAnalyzer (described in section 8), and the EEGScreener and EEGRealTimePlayer (described in section 9).

Interaction: contains the MouseStrum class. Most of the user interface devices/interaction classes are covered by the JInT quark written by Till Bovermann, and available from the SC3 project site at sourceforge.

Patterns: contains the HilbertIndex, a pattern class that generates 2D and 3D indices along Hilbert space filling curves; note that for 4D Hilbert indices there is a quark package. It also includes support patterns for Hilbert index generation, and Pxnrand, a pattern that avoids repeating the last n values of its own output.

Rendering: contains two UGen classes, TorusPanAz and PanRingTop, and a utility for adjusting the individual speakers of multichannel systems for more balanced sound, SpeakerAdjust. See also section 5.5.

Synthesis: includes a reverb class (AdCVerb, used in the VirtualRoom class), several classes for cascaded filters, a UGen to indicate loop ends in buffer playback, PhasorClick (both are used in the EEG applications), and a dual band compressor.

Utilities: includes a model for QCD simulations, Potts2D, a library of singing voice formants, and various extension methods.

osx, linux, windows: these folders capture platform-specific development; of these, only the OSX folder is in use, for OSX-specific GUI classes. These will eventually be converted to a cross-platform scheme.

A.2 The folder SC3-Support

QtSC3GUI: contains GUIs written in Qt, which were considered an option for SC3 on Windows; this strand of development was dropped when sufficiently powerful versions of the cross-platform GUI extension package swingOSC became available.

SonEnvirClasses, SonEnvirHelp: these contain essentially obsolete variants of SonEnvir classes; they are kept mainly in case some users still need to run examples using these classes.

A.3 Other folders in the svn repository

CUBE: contains the QVicon2Osc application, which can connect the Vicon tracking system (which is in use at the IEM Cube) to any software that supports OpenSoundControl, and a test for that system using the SonEnvir VirtualRoom for binaural rendering.

Prototypes: contains all the sonification designs (prototypes) written, sorted by scientific domain. These are described extensively and analysed in the chapters on sonification designs for the domain sciences, 6 - 9.

Psychoacoustics: contains some demonstrations of perceptual principles written for the domain scientists.

SC3-Training: contains a short Introduction to SuperCollider for sonification; this was written for the domain scientists, both in German and in English.

SOS1, SOS2: contain demo versions of sonification designs for two presentations (called Sound of Science 1 and 2) at IEM Graz.

testData: contains anonymous EEG data files in .edf format, for testing purposes only.


A.4 Quarks-SonEnvir

This folder contains all the SC3 classes written in SonEnvir that have been migrated into Quarks packages for specific topics. Each folder can be downloaded and installed as a Quark.

QCD contains some Quantum Chromodynamics models implemented in SC3.

SGLib contains a port of a 3D graphics library for math operations on tracking data.

gui-addons contains platform-independent gui extensions to SC3.

hilbert contains a file reader for loading pre-computed 4D Hilbert curve indices from files.

rainData contains a data reader class for the rain data used in the SBE workshop (see section 10).

wavesets contains the Wavesets class, which analyses mono soundfiles into Wavesets, as defined by Trevor Wishart. This can also be used for applying granular synthesis methods on time series-like data.

A.5 Quarks-SuperCollider

These extension packages contain all the SC3 classes written in SonEnvir that have been migrated into Quarks packages for specific topics. They can be downloaded and installed from the sourceforge svn site of SuperCollider.

AmbIEM: this package for binaural sound rendering using Ambisonics has become an official SuperCollider extension package (Quark). ARHeadtracker is an interface class to a freeware tracking system.

The statistics methods implemented within SonEnvir have moved to the general SC3 quark MathLib, while others have become quarks themselves, such as the JustInTerface quark (JInT) written by Till Bovermann (within SonEnvir). Finally, the TUIO quark (Tangible User Interface Objects, also by Till Bovermann, of University Bielefeld) is of interest for sonification research with strongly interactive approaches.

Appendix B

Models - code examples

B.1 Spatialisation examples

B.1.1 Physical sources

For multiple speaker setups, a simple and very effective strategy is to use individual speakers as real physical sources. The main advantage is that physics really helps in this case; when locations only serve to identify streams, as with few fixed sources, fixed single speakers work very well. SuperCollider supports this directly with the Out UGen: it determines which bus a signal is written on, and thus, which audio hardware output it is heard on.
// a mono source playing out of channel 4 (indices start at 0)
{ Out.ar(3, Ringz.ar(Dust.ar(30), 400, 0.2)) }.play;

The JITLib library in SuperCollider3 supports a more flexible scheme: sound processes (in JITLib speak, NodeProxies) run on their own private busses by default; when they should be audible, they can be routed to the hardware outputs with the .play method.
~snd = { Ringz.ar(Dust.ar(30), 400, 0.2) }; // proxy inaudible, but plays
~snd.play(3); // listen to it on hardware output 4.

NodeProxies also support more flexible fixed multichannel mapping very simply: the .playN method lets one route each audio channel of the proxy to one or several hardware output channels, each with optional individual level controls.
// a 3 channel source
~snd3ch = { Ringz.ar(Dust.ar([1,1,1] * 30), [400, 550, 750], 0.2) };


// to individual speakers 1, 3, 5:
~snd3ch.playN([0, 2, 4]);
// to multiple speakers, with individual levels:
~snd3ch.playN(outs: [0, [1,2], [3,4]], amps: [1, 0.7, 0.7]);

B.1.2 Amplitude panning

All of the following methods work for both moving and static sources.

1D: In the simplest case, the Pan2 UGen is used for equal power stereo panning.
// mouse controlled pan position
{ Pan2.ar(Ringz.ar(Dust.ar(30), 400, 0.2), MouseX.kr(-1, 1)) }.play;

2D: The PanAz UGen pans a single channel to a symmetrical ring of n speakers by azimuth, with adjustable width over how many speakers (at most) the energy is distributed.
(
{
    var numChans = 5, width = 2;
    var pos = MouseX.kr(0, 2);
    var source = Ringz.ar(Dust.ar(30), 400, 0.2);
    PanAz.ar(numChans, source, pos, width);
}.play;
)

In case the ring is not quite symmetrical, adjustments can be made by remapping; however, using the best geometrical symmetry attainable is always superior to post-compensation. In order to remap dynamic spatial positions to a ring of speakers at unequal angles such that the resulting directions are correct, the following example shows the steps needed: given a five-speaker system, equal speaker angles would be [0, 0.4, 0.8, 1.2, 1.6, 2.0], with 2.0 being equal to 0.0 (this is the behaviour of the PanAz UGen); the actual unsymmetric speaker angles could be for example [0, 0.3, 0.7, 1, 1.5, 2.0]; so remapping should map a control value of 0.3 (where speaker 2 actually is) to a control value of 0.4 (the control value that positions this source directly in speaker 2). The full map of corresponding values is given in table B.1.
(
// remapping unequal speaker angles with asMapTable and PanAz:
a = [0, 0.3, 0.7, 1, 1.5, 2.0].asMapTable;
b = Buffer.sendCollection(s, a.asWavetable, 1);
{ |inpos = 0.0|
    var source = Ringz.ar(Dust.ar(30), 400, 0.2);
    var pos = Shaper.kr(b.bufnum, inpos.wrap(0, 2));
    PanAz.ar(a.size - 1, source, pos);
}.play;
)

Table B.1: Remapping spatial control values

desired spatial position (list of breakpoints): 0.0  0.3  0.7  1.0  1.5  2.0 (== 0.0)
mapped control values (equally spaced output):  0.0  0.4  0.8  1.2  1.6  2.0 (== 0.0)


Mixing multiple channel sources down to stereo: The Splay UGen mixes an array of channels down to 2 channels, at equal pan distances, with adjustable spread and center position. Internally, it uses a Pan2 UGen.
~snd3ch = { Ringz.ar(Dust.ar([1,1,1] * 30), [400, 550, 750], 0.2) };
~snd3pan = { Splay.ar(~snd3ch, spread: 0.8, level: 0.5, center: 0) };
~snd3pan.playN(0);

Mixing multiple channel sources into a ring of speakers: The SplayZ UGen pans an array of source channels into a number of output channels at equal distances; spread and center position can be adjusted. Both larger numbers of channels can be splayed into rings of fewer speakers, and vice versa. Internally, SplayZ uses a PanAz UGen.
// spreading 4 channels equally into a ring of 6 speakers
~snd4ch = { Ringz.ar(Dust.ar([1,1,1,1] * 30), [400, 550, 750, 900], 0.2) };
~snd4pan = { SplayZ.ar(6, ~snd4ch, spread: 1.0, level: 0.5, center: 0) };
~snd4pan.playN(0);

3D: The SonEnvir extension TorusPanAz does the same for setups with rings of rings of speakers. Again, the speaker setup should be as symmetrical as possible; compensation can be trickier here. (In general, even while compensations for less symmetrical setups seem mathematically possible, spatial images will be worse outside the sweet spot. Maximum attainable physical symmetry cannot be fully substituted by more DSP math.)

(
// panning to 3 rings of 12, 8, and 4 speakers, cf. IEM CUBE.
~snd = { Ringz.ar(Dust.ar(30), 550, 0.2) };
~toruspan = {
    var hAngle = MouseX.kr(0, 2);     // all the way around (2 == 0)
    var vAngle = MouseY.kr(0, 1.333); // limited to highest ring
    TorusPanAz.ar([12, 8, 4], ~snd.ar(1), hAngle, vAngle);
};
~toruspan.playN(0);
)

Compensating overall vertical ring angles and individual horizontal speaker angles within each ring is straightforward with the asMapTable method as shown above. For placement deviations that are both horizontal and vertical, it is preferable to have Vector Based Amplitude Panning in SC3, which has been implemented recently by Scott Wilson and colleagues1. However, this was not needed within the context of the SonEnvir project.

B.1.3 Ambisonics

While some Ambisonics UGens previously existed in SuperCollider, the SonEnvir team decided to write a consistent new implementation of Ambisonics in SC3, based on a subset of the existing PureData libraries. This package was realised up to third order Ambisonics by Christopher Frauenberger for the AmbIEM package, available here2. It supports the main speaker setup of interest (a half-sphere of 12, 8 and 4 speakers, the CUBE at IEM, with several coefficient sets for different tradeoff choices), and a setup with 1-4-7-4 speaker rings, mainly used as a more efficient lower resolution alternative for headphone rendering, as described below.
(
// panning two sources with 3rd order ambisonics into the CUBE sphere.
~snd0 = { Ringz.ar(Dust.ar(30), 400, 0.2) };
~snd1 = { Ringz.ar(Dust.ar(30), 550, 0.2) };
~pos0 = [0, 0.01]; // azimuth, elevation
~pos1 = [1, 0.01]; // azimuth, elevation
~encoded[0] = { PanAmbi3O.ar(~snd0.ar, ~pos0.kr(1, 0), ~pos0.kr(1, 1)) };
~encoded[1] = { PanAmbi3O.ar(~snd1.ar, ~pos1.kr(1, 0), ~pos1.kr(1, 1)) };
~decode24 = { DecodeAmbi3O.ar(~encoded.ar, \CUBE_basic) };
~decode24.play(0);
)
1 See http://scottwilson.ca/site/Software.html
2 http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/


B.1.4 Headphones

Ambisonics and Virtual Binaural Rendering

For complex changing scenes, the IEM has developed a very efficient approach for binaural rendering (Musil et al. (2005); Noisternig et al. (2003)): in effect, one takes a virtual, symmetrical speaker setup (such as 1-4-7-4) and spatializes to that setup with Ambisonics; these virtual speakers are then rendered as point sources with their appropriate HRIRs, thus arriving at a binaural rendering. This provides the benefit that the Ambisonic field can be rotated as a whole, which is really useful when head movements of the listener are tracked, and the binaural rendering is designed to compensate for them. Also, the known problems with Ambisonics when listeners move outside the sweet zone disappear; when one carries a setup of virtual speakers around one's head, one is always right in the center of the sweet zone.

This approach has been ported to SC3 by C. Frauenberger; its main use is in the VirtualRoom class, which simulates moving sources within a rectangular box-shaped room. This class has turned out to be very useful as a simple way to prepare both experiments and presentations for multi-speaker setups by relatively simple headphone simulation.
( // VirtualRoom example - adapted from help file.
// preparation: reserve more memory for delay lines, and boot the server
s.options.memSize_(8192 * 16)
    .numAudioBusChannels_(1024);
s.boot;
// make a proxyspace
p = ProxySpace.push;
// set the path for the folder with Kemar files.
VirtualRoom.kemarPath = "KemarHRTF/";
)

(
// create a virtual room
v = VirtualRoom.new;
// and start its binaural rendering
v.init;
// set the room properties (reverberation time and gain,
// hf damping on reverb and early reflections gain)
v.revTime = 0.1;

v.revGain = 0.1;
v.hfDamping = 0.5;
v.refGain = 0.8;
)

(
// set room dimension [x, y, z, x, y, z]:
// a room 8m wide (y), 5m deep (x) and 5m high (z)
// - nose is always along x
v.room = [0, 0, 0, 5, 8, 5];
// make it play to hardware stereo outs
v.out.play;
// listener is listener position, a control-rate nodeproxy;
// here movable by mouse.
v.listener.source = { [ MouseY.kr(5, 0), MouseX.kr(8, 0), 1.6, 0] };
)

// add three sources to the scene
(
// make three different sounds
~noisy = { Decay.ar(Impulse.ar(10, 2), 0.2) * PinkNoise.ar(1) };
~ringy = { Ringz.ar(Dust.ar(10), [400, 600, 950], [0.3, 0.2, 0.05]).sum };
~dusty = { Dust.ar(400) };
)

// add the three sources to the virtual room:
// source, name, xpos, ypos, zpos
v.addSource(~noisy, \noisy, 1, 2, 2.5);   // bottom right corner
v.addSource(~ringy, \ringy, 1.5, 7, 2.5); // bottom left
v.addSource(~dusty, \dusty, 4, 5, 2.5);   // top, left of center

v.sources[\noisy].set(\xpos, 4, \ypos, 6, \zpos, 2); // set noisy position
v.sources[\noisy].getKeysValues;                     // check its position values
v.sources[\ringy].set(\xpos, 2.5, \ypos, 4, \zpos, 2);

// remove the sources
v.removeSource(\noisy);
v.removeSource(\ringy);
v.removeSource(\dusty);

v.free;  // free the virtual room and its resources
p.pop;   // and clear and leave proxyspace

Among other things, the submissions for the ICAD 2006 concert (http://www.dcs.qmul.ac.uk/research/imc/icad2006/concert.php), described also in section 4.3, were rendered from 8 channels to binaural for the reviewers and for the web documentation (http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html). One can of course also spatialize sounds on the virtual speakers with any of the simpler panning strategies given above; this trades off easy rotation of the entire setup for better point source localisation. To support simple headtracking, C. Frauenberger also created the ARHeadTracker application, available as a package from the SonEnvir website (http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/).

B.1.5 Handling speaker imperfections

All standard spatialisation techniques work best when speaker setups are as symmetrical and well-controlled as possible. While it may not always be feasible to adjust the mechanical positions of speakers freely for very precise geometry, a number of factors can be measured and compensated for; this is supported by several utility classes written in SuperCollider, which are part of the SonEnvir framework.

Latency

The Latency class plays a test signal for a given number of audio channels, and waits for the signals to arrive back at an audio input. The resulting list of measured per-channel latencies can be used to create compensating delay lines, e.g. in the SpeakerAdjust class described below.
// test 2 channels, max delay expected 0.2 sec,
// take default server, mic is on AudioIn 1:
Latency.test(2, 0.2, Server.default, 1);

// stop measuring and post results
Latency.stop;

// results are posted like this:
// measured latencies:
// in samples: [ 1186.0, 1197.0 ]
// in seconds: [ 0.026893424036281, 0.027142857142857 ]
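The measured values can then be used to delay every channel up to the slowest one. A minimal sketch of this step (added here as an illustration; the SynthDef name is arbitrary, and the latency values are placeholders taken from the posted results above):

(
SynthDef(\latencyComp, {
    // per-channel latencies in seconds, as posted by Latency.test above
    var latencies = #[0.0269, 0.0271];
    var maxLat = latencies.maxItem;
    var ins = In.ar(0, 2);
    // delay each channel by its difference to the slowest channel;
    // maxdelaytime 0.2 sec, as in the test above
    ReplaceOut.ar(0, DelayC.ar(ins, 0.2, maxLat - latencies));
}).play(s, addAction: \addToTail);
)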

Spectralyzer

While inter-speaker latency differences are well-known and very often addressed, we have found another common problem to be more distracting for multichannel sonification: each individual channel of the reproduction chain, from D/A converter to amplifier, cable, loudspeaker, and speaker mounting location in the room, can sound quite different. When changes in sound timbre can encode meaning, this is potentially really confusing! To address this, the Spectralyzer class allows for simple analysis of a test signal as played into a room, with optional smoothing over several measurements, and then tuning compensating equalizers by hand for reasonable similarity across all speaker channels. While this could be written to run automatically, we consider it more of an art than an engineering task; a more detailed EQ intervention will make the frequency response flatter, but may color the sound more by smearing its impulse behaviour.
x = Spectralyzer.new;                 // make a new spectralyzer
x.start; x.makeWindow;                // start it, open its GUI
x.listenTo({ PinkNoise.ar });         // pink noise should look flat
x.listenTo({ AudioIn.ar(1) });        // should look similar from microphone.

Figure B.1: The Spectralyzer GUI window.


For full details see the Spectralyzer help file.

( // tuning 2 speakers for better linearity
p = ProxySpace.push;
~noyz = { PinkNoise.ar(1) };    // create a noise source
~noyz.play(0, vol: 0.5);
// filter it with two bands of parametric eq
~noyz.filter(5, { |in, f1 = 100, rq1 = 1, db1 = 0, f2 = 5000, rq2 = 1, db2 = 0|
    MidEQ.ar(MidEQ.ar(in, f1, rq1, db1), f2, rq2, db2);
});
)
// tweak the two bands for better acoustic linearity
~noyz.set(\f1, 1200, \rq1, 1, \db1, -5);    // take out low presence bump
~noyz.set(\f2, 150, \rq2, 0.6, \db2, 3);    // boost bass dip

~noyz.getKeysValues.drop(1).postcs;     // post settings.

// move on to speaker 2
~noyz.play(1, vol: 0.5);
// tweak the two bands again for speaker 2 - likely to be different from speaker 1.
~noyz.set(\f1, 1200, \rq1, 1, \db1, 0);
~noyz.set(\f2, 150, \rq2, 0.6, \db2, 0);
~noyz.getKeysValues.drop(1).postcs;     // post settings when done

SpeakerAdjust

Once one has achieved usable EQ curves for every speaker channel, one can begin to compensate for volume differences between channels (with big timbral differences between channels, measuring volume or adjusting it by listening is rather pointless). The SpeakerAdjust class expects simple specifications for each channel: amplitude (as a multiplication factor, typically below 1.0); optionally a delay time (in seconds, to be independent of the current samplerate); optionally eq1-frequency, eq1-gain, eq1-relative-bandwidth; optionally eq2-frequency, eq2-gain, eq2-relative-bandwidth; and so on for as many bands as desired.
// From SpeakerAdjust.help:
// adjustment for 2 channels, amp, dtime, eq specs;
// you can add as many triplets of eqspecs as you want.
(
var specs;
specs = [
    // amp, dtime, eq1: frq, db, rq; eq2: frq, db, rq
    [ 0.75, 0.0,   [ 250, 4, 0.5 ], [ 800, -4, 1 ] ],
    [ 1,    0.001, [ 250, 2, 0.5 ], [ 5000, 3, 1 ] ]
];
{
    var ins;
    ins = Pan2.ar(PinkNoise.ar(0.05), MouseX.kr(-1, 1));
    SpeakerAdjust.ar(ins, specs)
}.play;
)

Such a speaker adjustment can be created and added to the end of the signal chain to linearise the given speaker setup as much as possible; of course, limiters for speaker and listener protection can be built into such a master effects unit as well.
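A minimal sketch of such a master unit (added as an illustration; it assumes the two-channel specs from the example above, and the SynthDef name is arbitrary):

(
SynthDef(\masterAdjust, {
    var specs = [
        // amp, dtime, eq1: frq, db, rq; eq2: frq, db, rq
        [ 0.75, 0.0,   [ 250, 4, 0.5 ], [ 800, -4, 1 ] ],
        [ 1,    0.001, [ 250, 2, 0.5 ], [ 5000, 3, 1 ] ]
    ];
    var ins = In.ar(0, 2);  // read back the first two output channels
    var adjusted = SpeakerAdjust.ar(ins, specs);
    // limit to +/- 1 with 10 msec lookahead for speaker and listener protection
    ReplaceOut.ar(0, Limiter.ar(adjusted, 1, 0.01));
}).play(s, addAction: \addToTail);  // run last in the signal chain
)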

Appendix C

Physics Background

C.1 Constituent Quark Models

The concept of constituent quarks was introduced in the 1960s by Gell-Mann (1964) and Zweig (1964), based on symmetry considerations in the classification of hadrons, the strongly interacting elementary particles. The first CQMs for the description of hadron spectra were introduced in the early 1970s by de Rújula et al. (1975). The original CQMs relied on simple models for the confinement of constituent quarks (such as the harmonic oscillator potential) and employed rudimentary hyperfine interactions. Furthermore, they were set up in a completely nonrelativistic framework. In the meantime CQMs have undergone a vivid development. Over the years more and more notions deriving from QCD have been implemented, and CQMs are constructed within a relativistic formalism. Modern CQMs all use a confinement potential of linear form, as suggested by QCD. For the hyperfine interaction of the constituent quarks several competing dynamical concepts have been proposed: a prominent representative is the one-gluon-exchange (OGE) CQM, whose dynamics for the hyperfine interaction basically relies on the original ideas of Zweig (1964): the effective interaction between the constituent quarks is generated by the exchange of a single gluon. For the data we experimented with, we considered a relativistic variant of the OGE CQM as constructed by Theussl et al. (2001). A different approach is followed by the so-called instanton-induced (II) CQM (Loering et al. (2001)), whose hyperfine forces derive from the 't Hooft interaction. Several years ago the physics group at Graz University suggested a hyperfine interaction based on the exchange of Goldstone bosons. This type of dynamics is motivated by the spontaneous breaking of chiral symmetry (SBS), which is an essential property of QCD at low energies. The SBS is considered to be responsible for the quarks acquiring a (heavier) dynamical mass, and their interaction should then be generated by the exchange of Goldstone bosons, the latter being another consequence of SBS. The Goldstone-boson-exchange (GBE) CQM was originally suggested in a simplified version, based on the exchange of pseudoscalar bosons only (Glozman et al. (1998)). In the meantime an extended version has been formulated by Glantschnig et al. (2005).

Quantum-Mechanical Solution of Constituent Quark Models

Modern CQMs are constructed in the framework of relativistic quantum mechanics (RQM). They are characterised by a Hamiltonian operator H that represents the total energy of the system under consideration. For baryons, which are considered as bound states of three constituent quarks, the corresponding Hamiltonian reads

$$H = H_0 + \sum_{i<j} \left[ V_{\mathrm{conf}}(i,j) + V_{\mathrm{hf}}(i,j) \right], \qquad \mathrm{(C.1)}$$

The first term on the right-hand side denotes the relativistic kinetic energy of the system (of the three constituent quarks), and the sum includes all mutual quark-quark interactions. It consists of two parts, the confinement potential $V_{\mathrm{conf}}$ and the hyperfine interaction $V_{\mathrm{hf}}$. The confinement potential prevents the constituent quarks from escaping the volume of the baryon (being of the order of $10^{-15}$ m); no free quarks have ever been observed in nature. The hyperfine potential provides for the fine structure of the energy levels in the baryon spectra. Different dynamical models lead to distinct features in the excitation spectra of baryons.

In order to produce the baryon spectra of the CQMs one has to solve the eigenvalue problem of the Hamiltonian in equation C.1. Several methods are available to achieve solutions to any desired accuracy. The Graz group has applied both integral-equation (Krassnigg et al. (2000)) and differential-equation techniques (Suzuki and Varga (1998)). Upon solving the eigenvalue problem of the Hamiltonian one ends up with the eigenvalues (energy levels) and eigenstates (quantum-mechanical wave functions) of the baryons. They are characterised according to the conserved quantum numbers: the total angular momentum J (which is half-integer in the case of baryons) and the parity P (being positive or negative). The different baryons are distinguished by the flavor of their constituent quarks, which can be u, d, and s (for up, down, and strange). For example, the proton is uud, the neutron is udd, the $\Delta^{++}$ is uuu, and the $\Lambda^0$ is uds.

Classification of Baryons

The total baryon wave function $\Psi_{XSFC}$ is composed of spatial (X), spin (S), flavor (F), and color (C) degrees of freedom, corresponding to the product of symmetry spaces

$$\Psi_{XSFC} = \Psi_{XSF} \, \chi_C^{\mathrm{singlet}}, \qquad \mathrm{(C.2)}$$

It is antisymmetric under the exchange of any two particles, since baryons must obey Fermi statistics. There are several visual representations of the symmetries between the different baryons based on their combinations of quarks; figure C.1 shows one of them.

Figure C.1: Multiplet structure of the baryons as a decuplet.


In this ordering of baryon flavor symmetries, all the light and strange baryons are in the lowest layer.

Quarks are differentiated by the following properties:

Color The color quantum numbers are r, b, g (for red, blue, and green). Only white baryons are observed in experiment. Thus the color wave function corresponds to a color singlet state and is therefore completely antisymmetric. As a consequence the rest of the wave function (comprising spatial, spin, and flavor degrees of freedom) must be symmetric.

Flavor According to the Standard Model (SM) of particle physics there are six quark flavors: up, down, strange, charm, bottom, and top. Quarks of different flavors have different masses. Normal hadronic matter (i.e. atomic nuclei) is basically composed only of the so-called light flavors u and d. CQMs consider hadrons with flavors u, d, and s. These are also the ones that are most affected by the SBS. Correspondingly, one works in $SU(3)_F$ and deals with baryons classified within singlet, octet, and decuplet multiplets. For example, the nucleons (proton and neutron) are in an octet, together with the $\Lambda$, $\Sigma$, and $\Xi$ particles.

Spin All quarks have spin 1/2. The spin wave function of the three quarks is constructed within $SU(2)_S$ and is thus symmetric or mixed symmetric or mixed antisymmetric. The total spin of a baryon is denoted by S.

Orbital Angular Momentum and Parity The spatial wave function corresponds to a given orbital angular momentum L of the three-quark system. Its symmetry property under spatial reflections determines the parity P.

Total Angular Momentum The total angular momentum J is composed of the total orbital angular momentum L and the total spin S of the three-quark system according to the quantum-mechanical addition rules of angular momenta: J = L + S. It is always half-integer. The total angular momentum J is a conserved quantum number and, together with the parity P, serves for the distinction of baryon multiplets $J^P$.
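As a simple worked example of these rules (added here for illustration): the nucleon ground state has no orbital excitation, so

$$L = 0, \quad S = \tfrac{1}{2} \quad \Rightarrow \quad J = L + S = \tfrac{1}{2}, \quad P = (-1)^L = +1,$$

i.e. the proton and neutron belong to a $J^P = \frac{1}{2}^{+}$ multiplet.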

C.2 Potts model - theoretical background

In mathematical terms, the Hamilton function H defines the overall energy, which any physical system, and thus also a Potts model, will try to minimize:

$$H = -J \sum_{\langle i,j \rangle} S_i S_j - M \sum_i S_i \qquad \mathrm{(C.3)}$$

where J is the coupling parameter between spin $S_i$ and its neighbouring spin $S_j$; J is inversely proportional to the temperature. M is the field strength of an exterior magnetic field acting on each spin $S_i$. The first sum runs over nearest neighbours and describes the coupling term, which is responsible for the phase transition. If J = 0, only the second term remains, and the Hamiltonian describes a paramagnet, which is only magnetised in the presence of an exterior magnetic field. In our simulations, M was always 0.

When studying phase transitions macroscopically, the defining term is the free energy F:

$$F(T, H) = -k_B T \ln Z(T, H) \qquad \mathrm{(C.4)}$$

It is proportional to the logarithm of the so-called partition function Z of statistical physics, which sums over all possible spin configurations and weights them with a Boltzmann factor (involving the Boltzmann constant $k_B$). Energetically unfavorable states are less probable in the partition function than energetically favorable ones.

$$Z = \sum_{S_n} e^{-\frac{H}{k_B T}} \qquad \mathrm{(C.5)}$$

The partition function Z (eq. C.5) is not calculable in practice due to combinatorial explosion: a three-dimensional lattice with a side length of 100 and two possible spin states has $2^{100^3} = (2^{10})^{10^5} \approx 10^{300{,}000}$ configurations that would have to be summed up - at every time step of the simulation. Also in analytical deduction only few spin models have been solved exactly, and in three dimensions not even the simple Ising model is analytically solvable. Therefore classical treatment relies mainly on approximation methods, which allow partial estimates of critical exponents, and can be outlined briefly as follows:

Early theories addressing phase transitions, like Van der Waals' theory of fluids and Weiss' theory of magnetism, can be subsumed under Landau theory or mean-field theory. Mean-field theory assumes a mean value for the free energy. Landau derived a theory where the free energy is expanded as a power series in the order parameter, and only terms are included which are compatible with the symmetry of the system. The problem is that all of these approaches ignore fluctuations by relying only on mean values. (For a detailed review of phase transition theories please refer to Yeomans (1992).)

Renormalization group theory by K. G. Wilson (Wilson (1974)) solved many problems of critical phenomena, most importantly the understanding of why continuous phase transitions fall into universality classes. The basic idea is to do a transformation that changes the scale of the system but not its partition function. Only at the critical point will the properties of the system not change under such a transformation; it is then described by so-called fixed points in the parameter space of all Hamiltonians. This is why critical exponents are universal for different systems.
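To make the contrast concrete: evaluating H of equation C.3 for a single spin configuration is trivial; the partition function would require this sum over every possible configuration. A minimal sketch (added as an illustration, for a small 2D Ising lattice with M = 0; lattice size and coupling are arbitrary):

( // energy of one random configuration of an n x n Ising lattice
var n = 8, j = 1.0, energy = 0;
var spins = { { #[-1, 1].choose } ! n } ! n;    // random spins, +1 or -1
n.do { |x|
    n.do { |y|
        // count each nearest-neighbour bond once (right and down),
        // with periodic boundary conditions
        energy = energy - (j * spins[x][y] * spins[(x + 1) % n][y]);
        energy = energy - (j * spins[x][y] * spins[x][(y + 1) % n]);
    };
};
energy.postln;  // more negative = more aligned; the minimum is -2 * j * n * n
)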

C.2.1 Spin models: sound examples

The following audio files can be downloaded from http://sonenvir.at/downloads/spinmodels/. The first part describes sonifications that enable the listener to classify the phase of the model (sub-critical, critical, super-critical).

Granular sonifications: Random, averaged spin blocks were used to determine the sound grains. The spatial setting cannot be reproduced in this recording. But even without having a clear gestalt of the system, the different characteristics of IsingHot, IsingCritical and IsingCold may easily be distinguished.

Audification approaches: (Please consider that a few clicks in the audio files below are artifacts of the data management and buffering in the computer.)

1. Noise: NoiseA gives the audification of a 3-state Potts model at thermal noise (coupling J = 0.4).

NoiseB gives the same for the 5-state Potts model (J = 0.4); evidently the sound becomes smoother the more states are possible, but its overall character stays the same.

2. Critical behaviour: this example was recorded with a 4-state Potts model at and near the critical temperature: SuperCritical - near the critical point clusters emerge. These are rather big but homogeneous, hence a regularity is still perceivable. (J = 0.95) Critical - at the critical point itself, clusters of all orders of magnitude emerge, thus the sound is much more unstable and less pleasant. (J = 1.05)

3. SubCritical - as soon as the system is equilibrated in the subcritical domain (at $T < T_{crit}$), one spin orientation predominates, and only few random spin flips occur due to thermal fluctuations. (Recorded with the Ising model at J = 1.3.)

The next examples study the order of the phase transition. Direct audification displays only very subtle differences between the two types of phase transitions:

1. The 4-state Potts model is played in ContinousTransition.
2. A more sudden change can be perceived in FirstOrderTransition for the 5-state Potts model.

Audification with separate spin channels: For each spin orientation the lattice is sequentialised and the resulting audification is played on its own channel. The lattice size was 32x32, and the system was equilibrated at each step. The examples finish with one spin orientation prevailing, which means that only random clicks from a non-vanishing temperature remain.

1. The transitions in the 2-state Ising model and the 4-state Potts model are continuous; the change is smooth.
2. In the 5-state and 8-state models the phase transition is abrupt (the data is more distinct the more states are involved).

Appendix D

Science By Ear participants

The following people took part in the Science By Ear workshop:

SonEnvir members/moderators
Dayé, Christian
De Campo, Alberto
Eckel, Gerhard
Frauenberger, Christopher
Vogt, Katharina
Wallisch, Annette

Programming specialists
Bovermann, Till, Neuroinformatics Group, Bielefeld University
De Campo, Alberto
Frauenberger, Christopher
Pauletto, Sandra, Music Technology Group, York University
Musil, Thomas, Audio/DSP, Institute of Electronic Music (IEM) Graz
Rohrhuber, Julian, Academy of Media Arts (KHM) Cologne

Sonification experts
Baier, Gerold, Dynamical systems, University of Morelos, Mexico
Bonebright, Terri, Psychology/Perception, DePauw University
Bovermann, Till
Dombois, Florian, Transdisciplinarity, Y Institute, Arts Academy Berne
Hermann, Thomas, Neuroinformatics Group, Bielefeld University
Kramer, Gregory, Metta Organization
Pauletto, Sandra
Stockman, Tony, Computer Science, Queen Mary Univ. London

Domain scientists
Baier, Gerold
Dombois, Florian
Egger de Campo, Marianne, Sociology, Compass Graz
Fickert, Lothar, Electrical power systems, University of Technology (TU) Graz
Grond, Florian, Chemistry / media art, ZKM Karlsruhe
Grossegger, Dieter, EEG Software, NeuroSpeed Vienna
Hipp, Walter, Electrical power systems, TU Graz
Huber, Anton, Physical Chemistry, University of Graz
Markum, Harald, Atomic Institute of the Austrian Universities, TU Vienna
Plessas, Willibald, Physics Institute, University of Graz
Shutin, Dimitri, Electrical power systems, TU Graz
Schweitzer, Susanne, Wegener Center for Climate and Global Change, University of Graz
Witrisal, Klaus, Electrical power systems, TU Graz

Appendix E

Background on Navegar

The saying has a long history. Plutarch ascribes it to General Pompeius, who said this line to soldiers he sent off on a suicide mission, and Veloso may well have read it in a famous poem by Fernando Pessoa. Here are Veloso's lyrics:

Table E.1: Os Argonautas - Caetano Veloso

O barco, meu coração não aguenta
Tanta tormenta, alegria
Meu coração não contenta
O dia, o marco, meu coração, o porto, não
Navegar é preciso, viver não é preciso
O barco, noite no céu tão bonito
Sorriso solto perdido
Horizonte, madrugada
O riso, o arco, da madrugada
O porto, nada
Navegar é preciso, viver não é preciso
O barco, o automóvel brilhante
O trilho solto, o barulho
Do meu dente em tua veia
O sangue, o charco, barulho lento
O porto silêncio
Navegar é preciso, viver não é preciso

(Literal English translation: Alberto de Campo.)

the ship, my heart cannot handle it
so much torment, happiness
my heart is discontent
the day, the limit, my heart, the port, no
sea-faring is necessary, living is not
the ship, night in the beautiful sky
the free smile, lost
horizon, morning dawn
the laugh, the arc, of morning
the port, nothing
sea-faring is necessary, living is not
the ship, the brilliant automobile
the free track, the noise
of my tooth in your vein
the blood, the swamp, slow soft noise
the port - silence
sea-faring is necessary, living is not

Appendix F

Sound, meaning, language

Sounds can change their meanings in different contexts. This ambiguity has also been interesting for poetry, as this work by Ernst Jandl shows.

Ernst Jandl - Oberflächenübersetzung (Surface Translation)

mai hart lieb zapfen eibe hold
er rennbohr in sees kai.
so was sieht wenn mai läuft begehen,
so es sieht nahe emma mähen,
so biet wenn ärschel grollt
ohr leck mit ei!
seht steil dies fader rosse mähen,
in teig kurt wisch mai desto bier
baum deutsche deutsch bajonett schur alp eiertier.

Original poem by William Wordsworth

My heart leaps up when I behold
a rainbow in the sky.
so was it when my life began,
so is it now I am a man,
so be it when I shall grow old
or let me die!
The child is father of the man
and I could wish my days to be
bound each to each by natural piety.

Bibliography
Abbott, A. (1990). A Primer on Sequence Methods. Organization Science, 1(4):375–392.
Abbott, A. (1995). Sequence Analysis: New Methods for Old Ideas. Annual Review of Sociology, 21:93–113.
Anderson, M. L. (2003). Embodied cognition: A field guide. Artificial Intelligence, 149(1):91–130.
Anonymous (March 20, 2001). L'histoire: PDG surpayés. Libération.
Armstrong, N. (2006). An Enactive Approach to Digital Musical Instrument Design. PhD thesis, Princeton University.
Baier, G. and Hermann, T. (2004). The Sonification of Rhythms in Human Electroencephalogram. In Proc. Int. Conf. on Auditory Display (ICAD), Sydney, Australia.
Baier, G., Hermann, T., Sahle, S., and Ritter, H. (2006). Sonified Epileptic Rhythms. In Proc. Int. Conf. on Auditory Display (ICAD), London, UK.
Baier, G., Hermann, T., and Stephani, U. (2007). Event-based sonification of EEG rhythms in real time. Clinical Neurophysiology, 118(6).
Barnes, J. (2007). The Odd Couple. Review of That Sweet Enemy: The French and the British from the Sun King to the Present by Robert and Isabelle Tombs. New York Review of Books, LIV(5):49.
Barrass, S. (1997). Auditory Information Design. PhD thesis, Australian National University.
Barrass, S. and Adcock, M. (2004). Sonification Design Patterns. In Proc. Int. Conf. on Auditory Display (ICAD), Sydney, Australia.


Barrass, S., Whitelaw, M., and Bailes, F. (2006). Listening to the Mind Listening: An Analysis of Sonification Reviews, Designs and Correspondences. Leonardo Music Journal, 16:13–19.
Barrass, T. (2006). Description of Sonification for ICAD 2006 Concert: Life Expectancy. In Proc. Int. Conf. on Auditory Display (ICAD), London, UK.
Beck, U. (1992). Risk Society: Towards a New Modernity. Sage, New Delhi.
Ben-Tal, O., Berger, J., Cook, B., Daniels, M., Scavone, G., and Cook, P. (2002). SONART: The Sonification Application Research Toolbox. In Proc. ICAD, Kyoto, Japan.
Blauert, J. (1997). Spatial Hearing: The Psychophysics of Human Hearing. MIT Press.
Blossfeld, H.-P., Hamerle, A., and Mayer, K. U. (1986). Ereignisanalyse. Statistische Theorie und Anwendung in den Wirtschafts- und Sozialwissenschaften. Campus, Frankfurt.
Blossfeld, H.-P. and Rohwer, G. (1995). Techniques of event history modeling. New approaches to causal analysis. Lawrence Erlbaum Associates, Mahwah (N. J.).
Borges, J. L. (1980). The analytical language of John Wilkins. In Labyrinths. Penguin.

Boulanger, R. (2000). The Csound Book: Perspectives in Software Synthesis, Sound Design, Signal Processing, and Programming. MIT Press, Cambridge, MA, USA.
Bovermann, T. (2005). MBS-Sonogram. http://www.techfak.uni-bielefeld.de/~tboverma/sc/.

Bovermann, T., de Campo, A., Groten, J., and Eckel, G. (2007). Juggling Sounds. In Proceedings of Interactive Sonification Workshop ISon2007.
Bovermann, T., Hermann, T., and Ritter, H. (2006). Tangible data scanning sonification model. In Proc. of the International Conference on Auditory Display, London, UK.
Bregman, A. S. (1990). Auditory Scene Analysis. Bradford Books, MIT Press, Cambridge, MA.

Bruce, J. and Palmer, N. (2005). SIFT: Sonification Integrable Flexible Toolkit. In Proc. Int. Conf. on Auditory Display (ICAD), Limerick, Ireland.
Buxton, Bill with Billinghurst, M., Guiard, Y., Sellen, A., and Zhai, S. (2008). Human Input to Computer Systems: Theories, Techniques and Technology. http://www.billbuxton.com/inputManuscript.html.
Candey, R., Schertenleib, A., and Diaz Merced, W. (2006). xSonify: Sonification Tool for Space Physics. In Proc. Int. Conf. on Auditory Display (ICAD), London, UK.
Conner, C. D. (2005). A People's History of Science: Miners, Midwives and Low Mechanicks. Nation Books, New York, NY, USA.
Cooper, D. H. and Shiga, T. (1972). Discrete-Matrix Multichannel Stereo. J. Audio Eng. Soc., 20:344–360.
Cruz-Neira, C., Sandin, D. J., DeFanti, T. A., Kenyon, R. V., and Hart, J. C. (1992). The CAVE: Audio Visual Experience Automatic Virtual Environment. Commun. ACM, 35(6):64–72.
Dayé, C. and de Campo, A. (2006). Sounds sequential: Sonification in the Social Sciences. Interdisciplinary Science Reviews, 31(6):349–364.
Dayé, C., de Campo, A., and Egger de Campo, M. (2006). Sonifikationen in der wissenschaftlichen Datenanalyse. Angewandte Sozialforschung, 24(1/2):41–56.
Dayé, C., de Campo, A., Fleck, C., Frauenberger, C., and Edelmayer, G. (2005). Sonification as a tool to reconstruct users' actions in unobservable areas. In Proceedings of ICAD 2005, Limerick.
de Campo, A. (2007a). A Sonification Design Space Map. In Proceedings of Interactive Sonification Workshop ISon2007.
de Campo, A. (2007b). Toward a Sonification Design Space Map. In Proc. Int. Conf. on Auditory Display (ICAD), Montreal, Canada.
de Campo, A. and Dayé, C. (2006). Navegar É Preciso, Viver Não É Preciso. In Proc. Int. Conf. on Auditory Display (ICAD), London, UK.
de Campo, A. and Egger de Campo, M. (1999). Sonification of Social Data. In Proceedings of the 1999 International Computer Music Conference (ICMC), Beijing.

de Campo, A., Frauenberger, C., and Höldrich, R. (2004). Designing a Generalized Sonification Environment. In Proceedings of the ICAD 2004, Sydney.
de Campo, A., Frauenberger, C., and Höldrich, R. (2005a). SonEnvir - a progress report. In Proc. Int. Computer Music Conf. (ICMC), Barcelona, Spain.
de Campo, A., Frauenberger, C., Vogt, K., Wallisch, A., and Dayé, C. (2006a). Sonification as an Interdisciplinary Working Process. In Proceedings of ICAD 2006, London.
de Campo, A., Hörmann, N., Markum, H., Plessas, W., and Vogt, K. (2006b). Sonification of lattice data: Dirac spectrum and monopole condensation along the deconfinement transition. In Proceedings of the Miniconference in honor of Adriano Di Giacomo on the Sense of Beauty in Physics, Pisa, Italy.
de Campo, A., Hörmann, N., Markum, H., Plessas, W., and Sengl, B. (2005b). Sonification of Lattice Data: The Spectrum of the Dirac Operator Across the Deconfinement Transition. In Proc. XXIIIrd Int. Symposium on Lattice Field Theory, Trinity College, Dublin, Ireland.
de Campo, A., Hörmann, N., Markum, H., Plessas, W., and Sengl, B. (2005c). Sonification of Lattice Observables Across Phase Transitions. In International Workshop on Xtreme QCD, Swansea.
de Campo, A., Hörmann, N., Markum, H., Plessas, W., and Vogt, K. (2006c). Sonification of Monopoles and Chaos in QCD. In Proc. of ICHEP06 - the XXXIIIrd International Conference on High Energy Physics, Moscow, Russia.
de Campo, A., Sengl, B., Frauenberger, C., Melde, T., Plessas, W., and Höldrich, R. (2005d). Sonification of Quantum Spectra. In Proc. Int. Conf. on Auditory Display (ICAD), Limerick, Ireland.
de Campo, A., Wallisch, A., Höldrich, R., and Eckel, G. (2007). New Sonification Tools for EEG Data Screening and Monitoring. In Proc. Int. Conf. on Auditory Display (ICAD), Montreal, Canada.
de Rújula, A., Georgi, H., and Glashow, S. L. (1975). Hadron masses in a gauge theory. Phys. Rev., D12(147).
Dix, A. (1996). Closing the loop: Modelling action, perception and information. In Catarci, T., Costabile, M. F., Levialdi, S., and Santucci, G., editors, AVI'96 - Advanced Visual Interfaces, pages 20–28. ACM Press.

Dix, A., Finlay, J., Abowd, G., and Beale, R. (2004). Human-Computer Interaction. Prentice Hall, Harlow, 3rd edition.
Dombois, F. (2001). Using Audification in Planetary Seismology. In Proc. Int. Conf. on Auditory Display (ICAD), Espoo, Finland.
Drake, S. (1980). Galileo. Oxford University Press, New York.
Drori, G. S., Meyer, J. W., Ramirez, F. O., and Schofer, E. (2003). Science in the Modern World Polity: Institutionalization and Globalization. Stanford University Press, Stanford.
Ebe, M. and Homma, I. (2002). Leitfaden für die EEG-Praxis. Urban und Fischer bei Elsevier, 3rd edition.
Eidelman, S. et al. (2004). Review of Particle Physics. Phys. Lett., B592(1).
Fickert, L., Eckel, G., Nagler, W., de Campo, A., and Schmautzer, E. (2006). New developments of teaching concepts in multimedia learning for electrical power systems introducing sonification. In Proceedings of the 29th ICT International Convention MIPRO, Opatija, Croatia.
Fitch, T. and Kramer, G. (1994). Sonifying the Body Electric: Superiority of an Auditory over a Visual Display in a Complex Multivariate System. In Kramer, G., editor, Auditory Display. Addison-Wesley.
Frauenberger, C., de Campo, A., and Eckel, G. (2007). Analysing time series data. In Proc. Int. Conf. on Auditory Display (ICAD), Montreal, Canada.
Gardner, B. and Martin, K. (1994). HRTF measurements of a KEMAR dummy-head microphone. Online.
Gaver, W. W., Smith, R. B., and O'Shea, T. (1991). Effective Sounds in Complex Systems: The ARKola Simulation. In Proceedings of CHI 91, New Orleans, USA.
Gell-Mann, M. (1964). A Schematic Model of Baryons and Mesons. Phys. Lett., 8:214.
Gerzon, M. (1977a). Multi-System Ambisonic Decoder, Part 1: Basic Design Philosophy. Wireless World, 83(1499):43–47.

Gerzon, M. (1977b). Multi-System Ambisonic Decoder, Part 2: Main Decoder Circuits. Wireless World, 83(1500):69–73.
Ghazala, R. (2005). Circuit-Bending: Build Your Own Alien Instruments. Wiley, Hoboken, NJ.
Giddens, A. (1990). The Consequences of Modernity. Stanford University Press.
Giddens, A. (1999). Runaway World. A series of lectures on globalisation for the BBC. http://news.bbc.co.uk/hi/english/static/events/reith_99/.
Glantschnig, K., Kainhofer, R., Plessas, W., Sengl, B., and Wagenbrunn, R. F. (2005). Extended Goldstone-boson-exchange Constituent Quark Model. Eur. Phys. J. A.
Glaser, B. and Strauss, A. (1967). The Discovery of Grounded Theory. Aldine.
Glozman, L., Papp, Z., Plessas, W., Varga, K., and Wagenbrunn, R. F. (1998). Unified Description of Light- and Strange-Baryon Spectra. Phys. Rev., D58(094030).
Goodrick, M. (1987). The Advancing Guitarist. Hal Leonard.
GSL Team (2007). GNU Scientific Library. http://www.gnu.org/software/gsl/manual/gsl-ref.html.

Harrar, L. and Stockman, T. (2007). Designing Auditory Graph Overviews. In Proceedings of ICAD 2007, pages 306–311. McGill University.
Hayward, C. (1994). Listening to the Earth Sing. In Kramer, G., editor, Auditory Display, pages 369–404. Addison-Wesley, Reading, MA, USA.
Hermann, T. (2002). Sonification for Exploratory Data Analysis. PhD thesis, Bielefeld University, Bielefeld, Germany.
Hermann, T., Baier, G., Stephani, U., and Ritter, H. (2006). Vocal Sonification of Pathologic EEG Features. In Proceedings of ICAD 2006, London.
Hermann, T. and Hunt, A. (2005). Introduction to Interactive Sonification. IEEE Multimedia, Special Issue on Sonification, 12(2):20–24.

Hermann, T., Nölker, C., and Ritter, H. (2002). Hand postures for sonification control. In Wachsmuth, I. and Sowa, T., editors, Gesture and Sign Language in Human-Computer Interaction, Proc. Int. Gesture Workshop GW2001, pages 307–316. Springer.
Hermann, T. and Ritter, H. (1999). Listen to your Data: Model-Based Sonification for Data Analysis. In Advances in intelligent computing and multimedia systems, pages 189–194, Baden-Baden, Germany. Int. Inst. for Advanced Studies in System research and cybernetics.
Hinterberger, T. and Baier, G. (2005). POSER: Parametric Orchestral Sonification of EEG in Real-Time for the Self-Regulation of Brain States. IEEE Multimedia, Special Issue on Sonification, 12(2):70–79.
Hollander, A. (1994). An Exploration of Virtual Auditory Shape Perception. Master's thesis, Univ. of Washington.
Hunt, A. and Pauletto, S. (2006). The Sonification of EMG data. In Proceedings of the International Conference on Auditory Display (ICAD), London, UK.
Hunt, A. D., Paradis, M., and Wanderley, M. (2003). The importance of parameter mapping in electronic instrument design. Journal of New Music Research, 32(4):429–440.
Igoe, T. (2007). Making Things Talk. Practical Methods for Connecting Physical Objects. O'Reilly.
Jordà Puig, S. (2005). Digital Lutherie. Crafting musical computers for new musics performance and improvisation. PhD thesis, Departament de Tecnologia, Universitat Pompeu Fabra.
Joseph, A. J. and Lodha, S. K. (2002). MUSART: Musical Audio Transfer Function Real-time Toolkit. In Proc. Int. Conf. on Auditory Display (ICAD), Kyoto, Japan.
Kramer, G. (1994a). An Introduction to Auditory Display. In Kramer, G., editor, Auditory Display: Sonification, Audification, and Auditory Interfaces, chapter Introduction. Addison-Wesley.
Kramer, G., editor (1994b). Auditory Display: Sonification, Audification, and Auditory Interfaces. Addison-Wesley, Reading, Menlo Park.
Krassnigg, A., Papp, Z., and Plessas, W. (2000). Faddeev Approach to Confined Three-Quark Problems. Phys. Rev., C(62):044004.

Latour, B. and Woolgar, S. (1986). Laboratory Life: The Construction of Scientific Facts. Princeton University Press, Princeton, NJ. (Revised edition with an introduction by Jonas Salk and a new postscript by the authors.)
Leman, M. (2006). The State of Music Perception Research. Talk at Connecting Media conference, Hamburg.
Leman, M. and Camurri, A. (2006). Understanding musical expressiveness using interactive multimedia platforms. Musicae Scientiae, special issue.
Lodha, S. K., Beahan, J., Heppe, T., Joseph, A., and Zane-Ulman, B. (1997). MUSE: A Musical Data Sonification Toolkit. In Proc. Int. Conf. on Auditory Display (ICAD), Palo Alto, CA, USA.
Loering, U., Metsch, B. C., and Petry, H. R. (2001). The light baryon spectrum in a relativistic quark model with instanton-induced quark forces: The non-strange baryon spectrum and ground-states. Eur. Phys. J., A10:395.
Madhyastha, T. (1992). Porsonify: A Portable System for Data Sonification. Master's thesis, University of Illinois at Urbana-Champaign.
Malham, D. G. (1999). Higher Order Ambisonic Systems for the Spatialisation of Sound. In Proceedings of the ICMC, Beijing, China.
Marsaglia, G. (2003). DIEHARD: A Battery of Tests for Random Number Generators. http://www.csis.hku.hk/diehard/.
Mathews, M. and Miller, J. (1963). Music IV programmer's manual. Bell Telephone Laboratories, Murray Hill, NJ, USA.
Mayer-Kress, G. (1994). Sonification of Multiple Electrode Human Scalp Electroencephalogram. Poster presentation demo at ICAD 94, http://www.ccsr.uiuc.edu/People/gmk/Projects/EEGSound/.
McCartney, J. (2003-2007). SuperCollider3. http://supercollider.sourceforge.net.
McKusick, V. A., Sharpe, W. D., and Warner, A. O. (1957). Harvey Tercentenary: An Exhibition on the History of Cardiovascular Sound Including the Evolution of the Stethoscope. Bulletin of the History of Medicine, 31:463–487.

Meinicke, P., Hermann, T., Bekel, H., Müller, H. M., Weiss, S., and Ritter, H. (2002). Identification of Discriminative Features in EEG. Journal for Intelligent Data Analysis.
Milczynski, M., Hermann, T., Bovermann, T., and Ritter, H. (2006). A malleable device with applications to sonification-based data exploration. In Proc. of the International Conference on Auditory Display, London, UK.
Moore, B. C. (2004). An Introduction to the Psychology of Hearing. Elsevier, fifth edition.
Musil, T., Noisternig, M., and Höldrich, R. (2005). A Library for Realtime 3D Binaural Sound Reproduction in Pure Data (PD). In Proc. Int. Conf. on Digital Audio Effects (DAFX-05), Madrid, Spain.
Neuhoff, J. (2004). Ecological Psychoacoustics. Springer.
Noisternig, M., Musil, T., Sontacchi, A., and Höldrich, R. (June, 2003). A 3D Ambisonic based Binaural Sound Reproduction System. In Proc. Int. Conf. Audio Eng. Soc., Banff, Canada.
Fronczak, P., Fronczak, A., and Hołyst, J. A. (2006). Ferromagnetic fluid as a model of social impact. International Journal of Modern Physics, 17(8):1227–1235.
Panek, P., Dayé, C., Edelmayer, G., et al. (2005). Real Life Test with a Friendly Rest Room (FRR) Toilet Prototype in a Day Care Center in Vienna - An Interim Report. In Proc. 8th European Conference for the Advancement of Assistive Technologies in Europe, Lille.
Pauletto, S. (2007). Interactive non-speech auditory display of multivariate data. PhD thesis, University of York.
Pauletto, S. and Hunt, A. (2004). A Toolkit for Interactive Sonification. In Proceedings of ICAD 2004, Sydney.
Pelling, A. E., Sehati, S., Gralla, E. B., Valentine, J. S., and Gimzewski, J. K. (2004). Local Nanomechanical Motion of the Cell Wall of Saccharomyces cerevisiae. Science, 305(5687):1147–1150.
Pereverzev, S. V., Loshak, A., Backhaus, S., Davies, J., and Packard, R. E. (1997). Quantum Oscillations between two weakly coupled reservoirs of superfluid 3He. Nature, 388:449–451.

Piché, J. and Burton, A. (1998). Cecilia: A Production Interface to Csound. Computer Music Journal, 22(2):52–55.
Pigafetta, A. (1530). Primo Viaggio Intorno al Globo Terracqueo (First Voyage Around the Terraqueous World). Giuseppe Galeazzi, Milano.
Pigafetta, A. (2001). Mit Magellan um die Erde. (Magellan's Voyage: A Narrative Account of the First Circumnavigation). Edition Erdmann, Lenningen, Germany. (First edition Paris 1525.)
Potard, G. (2006). Guernica 2006: Sonification of 2006 Years of War and World Population Data. In Proc. Int. Conf. on Auditory Display (ICAD), London, UK.
Pulkki, V. (2001). Spatial Sound Generation and Perception by Amplitude Panning. PhD thesis, Helsinki University of Technology, Espoo.
Raskin, J. (2000). The Humane Interface. Addison-Wesley.
Rheinberger, H.-J. (2006). Experimentalsysteme und Epistemische Dinge (Experimental Systems and Epistemic Things). Suhrkamp, Germany.
Riess, F., Heering, P., and Nawrath, D. (2005). Reconstructing Galileo's Inclined Plane Experiments for Teaching Purposes. In Proc. of the International History, Philosophy, Sociology and Science Teaching Conference, Leeds, UK.
Roads, C. (2002). Microsound. MIT Press.
Rohrhuber, J. (2006). Terra Nullius. In Proc. Int. Conf. on Auditory Display (ICAD), London, UK.
Rohrhuber, J., de Campo, A., and Wieser, R. (2005). Algorithms Today - Notes on Language Design for Just In Time Programming. In Proceedings of the ICMC 2005, Barcelona.
Ryan, J. (1991). Some Remarks on Musical Instrument Design at STEIM. Contemporary Music Review, 6(1):3–17. Also available online: http://www.steim.org/steim/texts.phtml?id=3.
Saraiya, P., North, C., and Duca, K. (2005). An insight-based methodology for evaluating bioinformatics visualizations. Transactions on Visualization and Computer Graphics, 11(4):443–456.
Scaletti, C. (1994). Sound Synthesis Algorithms for Auditory Data Representations. In Kramer, G., editor, Auditory Display: Sonification, Audification, and Auditory Interfaces. Addison-Wesley.

Schaeffer, P. (1997). Traité des objets musicaux. Le Seuil, Paris.
Snyder, B. (2000). Music and Memory. MIT Press.
Speeth, S. D. (1961). Seismometer sounds. J. Acoust. Soc. Am., 33:909–916.
Stockman, T., Nickerson, L. V., and Hind, G. (2005). Auditory graphs: A summary of current experience and towards a research agenda. In Proc. ICAD 2005, Limerick.
Suzuki, Y. and Varga, K. (1998). Stochastic variational approach to quantum-mechanical few-body problems. Lecture Notes in Physics, m54.
TAP, ACM (2004). ACM Transactions on Applied Perception. New York, NY, USA.
Theussl, L., Wagenbrunn, R. F., Desplanques, B., and Plessas, W. (2001). Hadronic Decays of N and Delta Resonances in a Chiral Quark Model. Eur. Phys. J., A12:91.
UN Statistics Division (1975). Towards A System of Social Demographic Statistics. United Nations. Available online at UN Statistics Division (2006).
UN Statistics Division (1989). Handbook of Social Indicators. UN Statistics website.
UN Statistics Division (2006). Social Indicators. http://unstats.un.org/unsd/demographic/products/socind/default.htm.
Urick, R. J. (1967). Principles of Underwater Sound. McGraw-Hill, New York, NY, USA.
U.S. Census Bureau (2006). World POPClock Projection. http://www.census.gov/ipc/www/popclockworld.html.
Vercoe, B. (1986). CSOUND: A Manual for the Audio Processing System and Supporting Programs. M.I.T. Media Laboratory, Cambridge, MA, USA.
Vogt, K., de Campo, A., Frauenberger, C., Plessas, W., and Eckel, G. (2007). Sonification of Spin Models. Listening to Phase Transitions in the Ising and Potts Model. In Proc. Int. Conf. on Auditory Display (ICAD), Montreal, Canada.

Voss, R. and Clarke, J. (1975). 1/f noise in speech and music. Nature, (258):317–318.
Voss, R. and Clarke, J. (1978). 1/f Noise in Music: Music from 1/f Noise. J. Acoust. Soc. Am., 63:258–263.
Walker, B. (2000). Magnitude Estimation of Conceptual Data Dimensions for Use in Sonification. PhD thesis, Rice University, Houston.
Walker, B. and Cothran, J. (2003). Sonification Sandbox: A Graphical Toolkit for Auditory Graphs. In Proceedings of ICAD 2003, Boston.
Walker, B. N. and Kramer, G. (1996). Mappings and Metaphors in Auditory Displays: An Experimental Assessment. In Frysinger, S. and Kramer, G., editors, Proc. Int. Conf. on Auditory Display (ICAD), pages 71–74, Palo Alto, CA.
Walker, B. N. and Kramer, G. (2005a). Mappings and Metaphors in Auditory Displays: An Experimental Assessment. ACM Trans. Appl. Percept., 2(4):407–412.
Walker, B. N. and Kramer, G. (2005b). Sonification Design and Metaphors: Comments on Walker and Kramer, ICAD 1996. ACM Trans. Appl. Percept., 2(4):413–417.
Walker, B. N. and Kramer, G. (2006). International Encyclopedia of Ergonomics and Human Factors (2nd ed.), chapter Auditory Displays, Alarms, and Auditory Interfaces, pages 1021–1025. CRC Press, New York.
Wallisch, A. (2007). EEG plus Sonifikation. Sonifikation von EEG-Daten zur Epilepsiediagnostik im Rahmen des Projekts SonEnvir. PhD thesis, Medical University Graz, Graz, Austria.
Warusfel, O. (2002-2003). LISTEN HRTF database. http://recherche.ircam.fr/equipes/salles/listen/.

Wedensky, N. (1883). Die telefonische Wirkungen des erregten Nerven - The Telephonic Effects of the Excited Nerve. Centralblatt für medizinische Wissenschaften, (26).
Wessel, D. (2006). An Enactive Approach to Computer Music Performance. In GRAME, editor, Proc. of Rencontres Musicales Pluridisciplinaires, Lyon, France.

Wikipedia (2006a). Gini Coefficient. http://en.wikipedia.org/wiki/Gini_coefficient.

Wikipedia (2006b). Magellan. http://en.wikipedia.org/wiki/Magellan.
Wikipedia (2007). Levy skew alpha-stable distribution. http://en.wikipedia.org/wiki/Levy_skew_alpha-stable_distribution.
Williams, S. (1994). Perceptual Principles in Sound Grouping. In Kramer, G., editor, Auditory Display. Addison-Wesley.
Wilson, C. M. and Lodha, S. K. (1996). Listen: A Data Sonification Toolkit. In Proc. Int. Conf. on Auditory Display (ICAD), Santa Cruz, CA, USA.
Wilson, K. (1974). Renormalization group theory. Physics Reports, 75(12).
Worrall, D., Bylstra, M., Barrass, S., and Dean, R. (2007). SoniPy: The Design of an Extendable Software Framework for Sonification Research and Auditory Display. In Proc. Int. Conf. on Auditory Display (ICAD), Montreal, Canada.
Yeo, W. S., Berger, J., and Wilson, R. S. (2004). A Flexible Framework for Real-time Sonification with SonArt. In Proc. Int. Conf. on Auditory Display (ICAD), Sydney, Australia.
Yeomans, J. M. (1992). Statistical Mechanics of Phase Transitions. Oxford University Press.
Zouhar, V., Lorenz, R., Musil, T., Zmölnig, J. M., and Höldrich, R. (2005). Hearing Varèse's Poème Electronique inside a Virtual Philips Pavilion. In Proc. Int. Conf. on Auditory Display (ICAD), Limerick, Ireland.
Zweig, G. (1964). An SU(3) Model for Strong Interaction Symmetry and its Breaking. CERN Report Th.401/Th.412, page 8182/8419.
Zweig, S. (1983). Magellan - Der Mann und seine Tat. (Magellan - The Man and his Achievement). Fischer, Frankfurt am Main. (First ed. Vienna 1938).
Zwicker, E. and Fastl, H. (1999). Psychoacoustics - Facts and Models, 2nd Ed. Springer, Berlin.
