9. CONCLUSIONS
It is difficult to evaluate the role that CAC tools play in the compositional process; indeed, the influence of even the most inert music notation software on compositional thinking is hard to deny [2, 3, 4, 22]. ManuScore expands the field of CAC tools by augmenting common notation-based approaches with a more open conceptual design and with corpus-based, generative capabilities. Although further validation of ManuScore is required, the user and listener studies outlined in this paper suggest that our goal of providing an interactive CAC tool that enhances the compositional process without disrupting the composer's musical language has been at least provisionally achieved.
10. ACKNOWLEDGEMENTS
This work was made possible in part by the Social Sciences and Humanities Research Council, the Canada Council for the Arts, and the Natural Sciences and Engineering
Research Council of Canada.
11. REFERENCES
SMuSe: An Embodied Cognition Approach to Interactive Music

Name of author
Address - Line 1
Address - Line 2
Address - Line 3
ABSTRACT
Computer-based music systems have evolved from computer-aided composition, which transposed the traditional paradigms of music composition to the digital realm, to complex feedback systems that allow for rich multimodal interactions. Yet many interactive music systems still rely on principles that are outdated in light of modern situated cognitive systems design. Moreover, human emotional feedback, arguably an important feature of musical experience, is rarely taken into account in the interaction loop. We propose to address these limitations by introducing a novel situated synthetic interactive composition system called the SMuSe (Situated Music Server). The SMuSe is based on the principles of parallelism, situatedness, emergence and emotional feedback, and is built on a cognitively plausible architecture. It makes it possible to address questions at the intersection of music perception and cognition while serving as a creative tool for interactive music composition.
1. BACKGROUND
Interactivity has become a standard feature of many multimedia systems and plays a fundamental role in contemporary art practice. In particular, real-time human/machine interactive music systems are now omnipresent as both composition and live-performance tools. Yet the term "interactive music system" is often misused. The interaction that takes place between a human and a system is a process that includes both control and feedback, in which real-world actions are interpreted into the virtual domain of the system [4]. If some parts of the interaction loop are missing (for instance the cognitive level in Figure 1), the system is merely reactive rather than interactive. In fact, in most current human-computer musical systems the human agent interacts, whereas the machine, for lack of cognitive modeling, only reacts. Although the term "interactivity" is widely used in the new media arts, most systems are simply reactive [4]. Furthermore, the cognitive modeling of interactive multimedia systems, when it exists, often relies on a classical cognitive science approach to artificial systems where the
[Figure 1: the human-machine interaction loop. On the human side: senses, memory, cognition, effectors; on the machine side: sensors, memory, cognition, actuators.]
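To make the distinction concrete, here is a minimal sketch (our illustration, not from the paper) contrasting a purely reactive mapping with a system that closes the loop through memory and a toy cognitive stage; all names are hypothetical.

    # Illustrative sketch: reactive vs. interactive behavior.
    class ReactiveSystem:
        """Maps sensed input directly to output: no memory, no cognition."""
        def step(self, sensed: float) -> float:
            return 0.5 * sensed  # fixed input-to-output mapping

    class InteractiveSystem:
        """Closes the loop through memory and a (toy) cognitive stage."""
        def __init__(self) -> None:
            self.memory: list[float] = []

        def step(self, sensed: float) -> float:
            self.memory.append(sensed)                         # memory stage
            expectation = sum(self.memory) / len(self.memory)  # toy cognition
            # The response depends on the interaction history,
            # not just on the instantaneous input.
            return sensed - expectation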
A piece that gives a good illustration of the principles of situatedness, distributed processing and emergence is In C by Terry Riley. In this piece, the musicians are given a set of pitch sequences composed in advance, but each musician is left in charge of choosing when to start playing and when to repeat these sequences. The piece is formed by the combined decisions of the independent musicians, each of whom decides on the basis of the collective musical output that emerges from all the possible variations.
Following recent developments in our understanding of cognitive systems, we emphasize the crucial role of emergence, distributed processing and situatedness (as opposed to rule-based, serial, centralized internal models) in the design of interactive music composition systems.
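The logic of In C can be sketched in a few lines of code: each musician advances through the shared patterns independently, and the global form emerges from these local decisions. This is an illustrative simulation, not part of the SMuSe; the pattern names and probabilities are ours.

    import random

    PATTERNS = ["p1", "p2", "p3", "p4"]  # stand-ins for the composed pitch sequences

    class Musician:
        def __init__(self) -> None:
            self.index = 0  # current position in the shared pattern list

        def play(self) -> str:
            # Each musician independently decides to advance or to repeat.
            if self.index < len(PATTERNS) - 1 and random.random() < 0.3:
                self.index += 1
            return PATTERNS[self.index]

    ensemble = [Musician() for _ in range(6)]
    for beat in range(8):
        print([m.play() for m in ensemble])  # the emergent collective output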
2.3. Human-In-The-Loop
With the advent of new sensor technologies, the influence of the environment on music systems can be sensed via both explicit and implicit interfaces, which give access to the behavioral and physiological state of the user. Music is often referred to as the language of emotion, so human emotion seems a natural feedback channel to take into account in the design of a situated music system. We believe that, to be complete, the design of a situated music system should take the emotional aspects of music into consideration.
Music and its effect on the listener have long been a subject of fascination and scientific exploration, from the Greeks speculating on the acoustic properties of the voice [14] to Muzak researchers designing soothing elevator music. Music has become an omnipresent part of our day-to-day life. It is well known to affect human emotional states, and most people enjoy music because of the emotions it evokes. Yet, although emotions seem to be a crucial aspect of music listening and performance, the scientific literature on music and emotion is scarce compared with that on music cognition and perception [21, 10, 17, 5]. The relationship between specific musical parameters and time-varying emotional responses is still not clear. Biofeedback-based interactive music systems appear to be an ideal paradigm for exploring this complex relationship between emotion and music.
2.4. Perceptual and Cognitive Models of Musical Representation
What are the most relevant dimensions of music, and how should they be represented in the system? Here we take a cognitive psychology approach and define a set of parameters that are the most perceptually salient and the most cognitively meaningful. Music is a real-world stimulus that is meant to be appreciated by a human listener, and it involves a complex set of perceptual and cognitive processes
[Figure: SMuSe's hierarchical musical representation. Per voice, a musical event is specified by "when" (rhythm, register and pitch-class/chord agents, against a pulse), "what" (memory content), and expressive modulation (articulation and dynamics agents); each instrument renders it as an acoustic event (attack, brightness, flux, noisiness, inharmonicity) placed in a room/location (reverb, spatialization).]
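Read as a data structure, the figure suggests a layered, per-voice representation. The following dataclasses are one possible (hypothetical) rendering of that hierarchy, with field names taken from the figure; the default values are ours.

    from dataclasses import dataclass

    @dataclass
    class MusicalEvent:            # symbolic level, one event per voice
        rhythm: str = "4n"         # "when": rhythm agent
        register: int = 4          # "when": register agent
        pitch_class: int = 0       # "when": pitch-class/chord agent
        articulation: float = 1.0  # expressive modulation: articulation agent
        dynamics: int = 64         # expressive modulation: dynamics agent

    @dataclass
    class AcousticEvent:           # timbral level, rendered by an instrument
        attack: float = 0.01
        brightness: float = 0.5
        flux: float = 0.0
        noisiness: float = 0.0
        inharmonicity: float = 0.0

    @dataclass
    class RoomLocation:            # spatial level
        reverb: float = 0.2
        azimuth: float = 0.0       # spatialization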
Example voice-control messages (with note durations such as 16n 16n 4n):

/voice1/pitch/pattern 0 0 5 7 10 0
/voice1/pitch/register 4 5 4 4 4 5
/voice1/velocity 12 12 16 16 12 32
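Such messages can be sent from any OSC client; for example, with the python-osc package (our example, not from the paper; the host and port are placeholders):

    from pythonosc.udp_client import SimpleUDPClient

    # Send the voice-control messages shown above to a running SMuSe server.
    client = SimpleUDPClient("127.0.0.1", 7770)  # placeholder host/port
    client.send_message("/voice1/pitch/pattern", [0, 0, 5, 7, 10, 0])
    client.send_message("/voice1/pitch/register", [4, 5, 4, 4, 4, 5])
    client.send_message("/voice1/velocity", [12, 12, 16, 16, 12, 32])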
Musical sequence generation follows different selection principles, a term inspired by Koenig's reflections on serial music and algorithmic composition [19]. It refers to the actions taken by the system to generate musical events using the available short- and long-term memory content. These actions can be deterministic (e.g. playback of a stored sequence) or based on the probability of occurrence of specific events (series, Markov chains, random events). This allows for a hybrid approach to algorithmic composition in which complex stochastic processes are mixed with more deterministic repeating patterns (cf. Table 1). Expressivity parameters such as articulation, tempo and dynamics can be continuously accessed and modulated.
Selection Principle | Description

Sequence | The order of selection follows the initial sequence (the sequence in memory is played back).
Inverse | The elements of the original sequence are selected starting from the end (the sequence in memory is played backward).
Markov | The elements of a sequence are chosen based on state transition probabilities.
Series | Uniform random choice between elements of the pattern, without repetition: an element that has already been selected cannot be selected again until all the other elements have been selected.
Aleatory | Elements of the sequence are chosen randomly.

Table 1. The action selection agents choose to play specific musical elements stored in the current working STM following deterministic or stochastic selection principles.
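As an illustration, the five selection principles of Table 1 can be rendered in a few lines of code. This is a minimal sketch, assuming the stored sequence is a plain list; it is our reading of the table, not SMuSe code.

    import random

    def sequence(seq):        # Sequence: playback in the original order
        yield from seq

    def inverse(seq):         # Inverse: playback from the end
        yield from reversed(seq)

    def markov(transitions, start, n):  # Markov: state-transition probabilities
        state = start
        for _ in range(n):
            yield state
            nxt, probs = zip(*transitions[state].items())
            state = random.choices(nxt, weights=probs)[0]

    def series(seq):          # Series: uniform random, no repeats until exhausted
        pool = list(seq)
        random.shuffle(pool)
        yield from pool

    def aleatory(seq, n):     # Aleatory: uniform random with repetition
        for _ in range(n):
            yield random.choice(seq)

    print(list(series([0, 0, 5, 7, 10, 0])))  # e.g. one random permutation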
Figure 4. SMuSe's environment: the SMuSe can interact with its environment through different sensors (biosignals, camera, gazers, lasers, pressure-sensitive floor, MIDI, audio) and via OSC commands sent to the music server over the network from client applications such as a console terminal, IQR, the IanniX graphical score, or the Torque game engine.
The system interprets sensor data (which conveys information about the human participants interacting with the system) in real time and sends the interpreted data to the music generation processes through appropriate fixed or learned musical mappings. The initial musical material generated by SMuSe is amplified, transformed and nuanced as the interaction between the system and the participant evolves.
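A fixed mapping of this kind can be as simple as a linear rescaling of a sensor reading into a musical parameter range. The following is a hypothetical example; the sensor names and parameter ranges are ours.

    def map_range(x, in_lo, in_hi, out_lo, out_hi):
        """Clip, then linearly rescale sensor data to a parameter range."""
        x = min(max(x, in_lo), in_hi)
        return out_lo + (x - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)

    heart_rate, arousal = 72.0, 0.6  # example sensor readings
    tempo = map_range(heart_rate, 50, 120, 60, 180)        # bpm -> tempo
    velocity = int(map_range(arousal, 0.0, 1.0, 20, 110))  # arousal -> dynamics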
3.7. Emotional Mappings
[Figure (panels A-C): the reinforcement loop. A reinforcement agent selects actions and receives a reward; the participant, responding to SMuSe's audio feedback, provides the signal that closes the loop.]
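The figure suggests a loop in which the participant's responses to the audio feedback act as the reward for the agent's musical choices. Below is a minimal tabular sketch under that assumption; the action set, learning rate, and reward stand-in are ours, not the paper's.

    import random

    ACTIONS = ["pp", "mf", "ff"]             # toy action set: dynamics levels
    q = {a: 0.0 for a in ACTIONS}            # value estimate per action
    alpha, epsilon = 0.1, 0.2                # learning rate, exploration rate

    def choose_action():
        if random.random() < epsilon:        # explore
            return random.choice(ACTIONS)
        return max(q, key=q.get)             # exploit the best estimate

    def update(action, reward):
        q[action] += alpha * (reward - q[action])  # incremental value update

    for _ in range(100):
        a = choose_action()
        reward = {"pp": 0.2, "mf": 0.9, "ff": 0.4}[a]  # stand-in for emotional feedback
        update(a, reward)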
4. ARTISTIC REALIZATIONS
We have further explored the purposive construction of interactive installations and performances using the SMuSe system. To name but a few: in the VRoboser installation [2], the sensory inputs (motion, color, distance) of a 3D virtual Khepera1 robot living in a game-like environment modulated musical parameters in real time, creating a never-ending musical soundscape in the spirit of Brian Eno's Music for Airports. In another context, the SMuSe generated automatic soundscapes and music that reacted to and influenced the spatial behavior of humans and avatars in the mixed-reality space XIM (eXperience Induction Machine) [2], emphasizing the role of the environment and of interaction in the musical composition. Based on similar premises, Re(PER)curso, an interactive mixed-reality performance involving dance, percussion, interactive music and video, was presented in 2007 at the ArtFutura Festival and at the Museum of Modern Art in Barcelona. The performance was composed of several interlaced layers of artistic and technological activity. The music had three components: a predefined soundscape, a percussionist performing from a score, and the interactive composition system, synchronized by SMuSe. The physical actors, the percussionist and the dancer, were tracked by a video-based active tracking system that in turn controlled an array of moving lights illuminating the scene. The spatial information obtained from the stage by the tracking system was also projected onto the virtual world, where it modulated the avatar's behavior, allowing it to adjust its body position, posture and gaze to the physical world. In 2009, the Brain Orchestra, a multimodal performance using brain-computer interfaces, explored the creative potential of a collection of brains directly interfaced to the world. During the performance, four "brain musicians" controlled a string quartet generated by the SMuSe using their brain activity alone. The orchestra was conducted by an "emotional conductor", whose emotional reactions were recorded using biosignal interfaces and fed back to the
1 http://www.k-team.com/
5. CONCLUSIONS
Andrea Agostini
Daniele Ghisi
Freelance composer

ABSTRACT
1. INTRODUCTION
Since the advent of computers there has been great interest in how to take advantage of their superior precision, speed and power in music-related activities. Probably the best-known (and most commercially successful) direction has proven to be the generation and transformation of sound. In recent years, inexpensive personal computers (and lately even top-end mobile phones) have gained the ability to perform professional-quality audio transformation and generation in real time. On the other hand, several systems have been developed to process symbolic rather than acoustic data: notes rather than sounds.
These systems can be roughly divided into tools for computer-assisted music engraving (such as Finale, Sibelius, LilyPond...) and tools for computer-aided composition (CAC for short), which allow the generation and transformation of symbolic musical data (such as OpenMusic1 [1], PWGL2, Common Music3...). Moreover, at least two graphical programming environments, the closely related Max and Pure Data, have MIDI control and sound generation and transformation among their main focuses, but at the same time they are capable of dealing with arbitrary sets of data, input/output devices and video. Indeed, the boundaries between all these categories are fuzzy: music engraving systems often allow non-trivial data processing; some sequencers also provide high-quality graphical representation of musical scores as well as sound treatment; and modern CAC environments include tools for sound synthesis and transformation. It should nonetheless be remarked that Max and Pure Data have very crude native support for sequencing, and essentially none for symbolic musical notation.

1 http://repmus.ircam.fr/openmusic/home
2 http://www2.siba.fi/PWGL/
3 http://commonmusic.sourceforge.net/
Another, orthogonal distinction should be made between real-time systems, which react immediately to interface actions (such as Finale, Max/MSP, ProTools...), and non-real-time systems, in which these actions have no effect until a certain "refresh" operation is performed (such as LilyPond, OpenMusic, PWGL). The latter is the case for typical CAC environments; yet in some situations this is unnatural, and it might be argued that there is no deep reason why symbolic processing should not be performed in real time. This does not mean that every compositional process would benefit from a real-time data flow, but some might, as we shall exemplify at the end of the paper. Real time is a resource rather than an obligation. Yet the lack of this resource has, up to now, pushed the development of CAC techniques only in the off-line direction.
In our own experience, the real-time or non-real-time nature of an environment for music composition deeply affects the very nature of the compositional process. Composers working with sequencers, plug-ins and electronic instruments need them to react immediately as they change their parameters; likewise, composers working with symbolic data might want the machine to adapt quickly to new parameter configurations. As composers ourselves, we believe that the creation and modification of a musical score is not an out-of-time activity: it follows the composer's discovery process and develops accordingly.
This issue has been addressed by Miller Puckette in [11]:

While we have good paradigms for describing processes (such as in the Max or Pd programs as they stand today), and while much work has been done on representations of musical data (ranging from searchable databases of sound to Patchwork and OpenMusic, and including Pd's unfinished data editor), we lack a fluid mechanism for the two worlds to interoperate.
Arshia Cont in [5] adds:
The performers of computer music have been
faster to grab ideas in real time manipulations
and adopting them to their needs. Today, with
many exceptions, a wide majority of composed mixed instrumental and electronic pieces