
9. CONCLUSIONS
It is difficult to evaluate the role that CAC tools play in
the compositional process; indeed, the influence of even
the most inert music notation software on compositional
thinking is difficult to deny [2, 3, 4, 22]. ManuScore
expands the field of CAC tools by augmenting common
notation-based approaches with a more open conceptual
design and with the inclusion of corpus-based, generative
capabilities. Although further validation of ManuScore is
required, the user and listener studies outlined in this paper suggest that our goal of providing an interactive CAC
tool that enhances the compositional process without
disrupting the composer's musical language has been at
least provisionally achieved.
10. ACKNOWLEDGEMENTS
This work was made possible in part by the Social Sciences and Humanities Research Council, the Canada Council for the Arts, and the Natural Sciences and Engineering
Research Council of Canada.
11. REFERENCES

[1] G. Assayag, C. Rueda, M. Laurson, C. Agon, and O. Delerue, "Computer-assisted composition at IRCAM: From PatchWork to OpenMusic," Computer Music Journal, vol. 23, no. 3, pp. 59-72, 1999.
[2] A. Brown, "Modes of compositional engagement," Mikropolyphony, vol. 6, 2001.
[3] D. Collins, "A synthesis process model of creative thinking in music composition," Psychology of Music, vol. 33, no. 2, p. 193, 2005.
[4] D. Collins, "Real-time tracking of the creative music composition process," Digital Creativity, vol. 18, no. 4, pp. 239-256, 2007.
[5] D. Cope, "The Composer's Underscoring Environment: CUE," Computer Music Journal, vol. 21, no. 3, pp. 20-37, 1997.
[6] O. Laske, "Composition theory: An enrichment of music theory," Journal of New Music Research, vol. 18, no. 1, pp. 45-59, 1989.
[7] O. Laske, "The computer as the artist's alter ego," Leonardo, pp. 53-66, 1990.
[8] O. Laske, "Composition theory in Koenig's Project One and Project Two," Computer Music Journal, vol. 5, no. 4, pp. 54-65, 1981.
[9] O. Laske, "Toward an epistemology of composition," Interface, vol. 20, pp. 235-269, 1991.
[10] M. Laurson, M. Kuuskankare, and V. Norilo, "An overview of PWGL, a visual programming environment for music," Computer Music Journal, vol. 33, no. 1, pp. 19-31, 2009.
[11] J. Maxwell, P. Pasquier, and A. Eigenfeldt, "The closure-based cueing model: Cognitively-inspired learning and generation of musical sequences," in Proceedings of the 2011 Sound and Music Computing Conference, 2011.
[12] J. Maxwell, P. Pasquier, A. Eigenfeldt, and N. Thomas, "MusiCOG: A cognitive architecture for music learning and generation," in Proceedings of the 2012 Sound and Music Computing Conference, 2012.
[13] J. McCormack, P. McIlwain, A. Lane, and A. Dorin, "Generative composition with Nodal," in Workshop on Music and Artificial Life (part of ECAL 2007), Lisbon, Portugal, 2007.
[14] T. Potter, "All my children: A portrait of Sir Andrzej Panufnik based on conversations with Tully Potter," The Musical Times, vol. 132, no. 1778, pp. 186-191, 1991.
[15] C. Scaletti and R. Johnson, "An interactive environment for object-oriented music composition and sound synthesis," ACM SIGPLAN Notices, vol. 23, no. 11, pp. 222-233, 1988.
[16] I. Stravinsky, T. Palmer, N. Elliott, and M. Bragg, Once, at a Border...: Aspects of Stravinsky. Kultur International Films, 1980.
[17] H. Taube, "Common Music: A music composition language in Common Lisp and CLOS," Computer Music Journal, vol. 15, no. 2, pp. 21-32, 1991.
[18] J. Tenney and L. Polansky, "Temporal gestalt perception in music," Journal of Music Theory, vol. 24, no. 2, 1980.
[19] B. Truax, "The PODX system: Interactive compositional software for the DMX-1000," Computer Music Journal, pp. 29-38, 1985.
[20] J. Ventrella, "Evolving structure in Liquid Music," The Art of Artificial Evolution, pp. 269-288, 2008.
[21] G. Wang, P. Cook et al., "ChucK: A concurrent, on-the-fly audio programming language," in Proceedings of the International Computer Music Conference, 2003, pp. 219-226.
[22] J. Wiggins, "Compositional process in music," International Handbook of Research in Arts Education, pp. 453-476, 2007.
[23] D. Zicarelli, "M and Jam Factory," Computer Music Journal, vol. 11, no. 4, pp. 13-29, 1987.

THE SMUSE: AN EMBODIED COGNITION APPROACH TO INTERACTIVE MUSIC COMPOSITION

ABSTRACT
The evolution of computer-based music systems has gone
from computer-aided composition, which transposed the
traditional paradigms of music composition to the digital
realm, to complex feedback systems that allow for rich
multimodal interactions. Yet many interactive music
systems still rely on principles that are outdated in the light of
modern situated cognitive systems design. Moreover, the
role of human emotional feedback, arguably an important
feature of musical experience, is rarely taken into account
in the interaction loop. We propose to address these limitations by introducing a novel situated synthetic interactive composition system called the SMuSe (for Situated
Music Server). The SMuSe is based on the principles of
parallelism, situatedness, emergence and emotional feedback, and is built on a cognitively plausible architecture.
It makes it possible to address questions at the intersection of music
perception and cognition while serving as a creative
tool for interactive music composition.
1. BACKGROUND
Interactivity has now become a standard feature of many
multimedia systems and plays a fundamental role in contemporary art practice. Specifically, real-time human/machine
interactive music systems are now omnipresent as both
composition and live performance tools. Yet the term
"interactive music system" is often misused. The interaction that takes place between a human and a system is a
process that includes both control and feedback, where
real-world actions are interpreted into the virtual domain
of the system [4]. If some parts of the interaction loop are
missing (for instance the cognitive level in Figure 1), the
system is merely reactive rather than interactive.
In fact, in most current human-computer
musical systems the human agent interacts whereas the
machine, for lack of cognitive modeling, only reacts.
Although the term interactivity is widely used in the new
media arts, most systems are simply reactive systems [4].
Furthermore, the cognitive modeling of interactive multimedia systems, when it exists, often relies on a classical
cognitive science approach to artificial systems in which the
different modules (e.g. perception, memory, action) are
studied separately. This approach has since been challenged by modern cognitive science, which emphasizes
the crucial role of the perception-action loop, the building of cognitive artifacts, and the interaction of the
system with its environment [37]. In this paper we propose a novel approach to interactive music system design
informed by modern cognitive science and present an implementation of such a system called the SMuSe.
Figure 1. Human-machine interaction: both human and machine comprise sensing, memory, cognition and actuation stages linked in an interaction loop (adapted from [4]).


2. FROM EMBODIED COGNITIVE SCIENCE TO
MUSIC SYSTEMS DESIGN
2.1. Classical View
Looking at the evolution of our understanding of cognitive
systems in parallel with the evolution of music composition practices gives a particularly interesting perspective on some limitations of current interactive music systems.
The classical approach to cognitive science assumes that
external behavior is mediated by internal representations
[6] and that cognition is essentially the manipulation of these
mental representations by sets of rules. It mainly relies on
the sense-think-act framework [27], where future actions
are planned according to perceptual information.
Interestingly enough, a parallel can be drawn between classical cognitive science and the development of classical
music, which also relies heavily on formal structures. It puts the emphasis on internal processes (composition theory) to the detriment of the environment and the
body, with centralized control of the performance (the
conductor).


Disembodiment in classical music composition can be seen
at several levels. Firstly, by training, the composer is used
to composing in his or her head and translating these mental representations into an abstract musical representation: the score.
Secondly, the score is traditionally interpreted live by the
orchestra's conductor, who controls the main aspects of
the musical interpretation, whereas the orchestra musicians themselves are left with relatively little interpretative freedom. Moreover, the role of the audience as an
active participant in a musical performance is mostly neglected.

2.2. Modern View

An alternative to classical cognitive science is the connectionist approach, which tries to build biologically plausible
systems using neural networks. Unlike more traditional
digital computation models based on serial processing and
the explicit manipulation of symbols, connectionist networks
allow for fast parallel computation. Moreover, they do not
rely on explicit rules but on emergent phenomena stemming from the interaction between simple neural units.
Another related approach, called embodied cognitive science, puts the emphasis on the influence of the environment
on internal processes. In some sense it replaced the view
of cognition as representation with the view that cognition
is an active process involving an agent acting in its environment. Consequently, the complexity of a generated
structure is not the result of the complexity of the underlying system alone, but is partly due to the complexity of its
environment [34].
A piece that gives a good illustration of the principles of situatedness, distributed processing and emergence is In C by
Terry Riley. In this piece, musicians are given a set of
pitch sequences composed in advance, but each musician
is left in charge of choosing when to start playing and repeating these sequences. The piece is formed by the combination of the decisions of each independent musician, each of whom decides based on the collective musical output
that emerges from all the possible variations.
Following the recent evolution of our understanding of cognitive systems, we emphasize the crucial role of emergence,
distributed processes and situatedness (as opposed to rule-based, serial, central, internal models) in the design of interactive music composition systems.

2.3. Human-In-The-Loop

With the advent of new sensor technologies, the influence
of the environment on music systems can be sensed via
both explicit and implicit interfaces, which give access
to the behavioral and physiological state of the user. Music is often referred to as the language of emotion, hence
human emotion seems to be a natural feedback channel
to take into account in the design of a situated music system. We believe that, in order to be complete, the design
of a situated music system should take into consideration
the emotional aspects of music.

2.3.1. Explicit Gestural Interfaces

The advent of new sensing technologies has fostered the
development of new kinds of interfaces for musical expression. Graphical user interfaces, tangible interfaces and
gesture interfaces have now become omnipresent in the
design of live music performances and compositions [24].
Most of these are gesture-based interfaces that
require explicit, conscious body movements from the user.
They can give access to behavioral or self-reported information, but not to the implicit emotional states of the user.

2.3.2. Implicit Biosignal Interface

Thanks to the development of more robust and accurate
biosignal technologies, it is now possible to derive emotion-related information from physiological data and use it as
an input to interactive music systems. Although the idea
is not new [15, 30], the past few years have witnessed a
growing interest from the computer music community in
using physiological data such as heart rate, electrodermal
activity, electroencephalogram and respiration to generate
or transform sound and music. Providing emotion-based
physiological interfaces is highly relevant for a number of
applications including music therapy, diagnosis, interactive gaming, and emotion-aware musical instruments.

2.3.3. Emotional Mapping

Music and its effect on the listener have long been a subject
of fascination and scientific exploration, from the Greeks
speculating on the acoustic properties of the voice [14] to
Muzak researchers designing soothing elevator music. Music
has now become an omnipresent part of our day-to-day
life. It is well known for affecting human emotional
states, and most people enjoy music because of the emotions it evokes. Yet, although emotions seem to be a crucial aspect of music listening and performance, the scientific literature on music and emotion is scarce compared
to that on music cognition or perception [21, 10, 17, 5]. The relationship between specific musical parameters and time-varying emotional responses is still not clear. Biofeedback-driven
interactive music systems appear to be an ideal paradigm
for exploring the complex relationship between emotion and
music.

2.4. Perceptual and Cognitive Models of Musical Representation

What are the most relevant dimensions of music and how
should they be represented in the system? Here, we take a
cognitive psychology approach and define a set of parameters that are the most salient perceptually and the most
meaningful cognitively. Music is a real-world stimulus
that is meant to be appreciated by a human listener. It involves a complex set of perceptual and cognitive processes
that take place in the central nervous system. The fast advances in the neuroscience of music over the past twenty
years have taught us that these processes are partly interdependent, are integrated in time and involve memory as
well as emotional systems [16, 25, 26]. Their study sheds
light on the structures and features that are involved in
music processing and stand out as being perceptually and
cognitively relevant. Experimental studies have found that
musical perception happens at three different time scales:
the event fusion level, where basic musical events
such as pitch, intensity and timbre emerge (around 50 ms); the
melodic and rhythmic grouping level, where patterns of those basic events are perceived (around 5 s); and finally the form level
(from 5 s to 1 hour), which deals with large-scale sections of
music [35] (Figure 2). This hierarchy of three time scales
of music processing forms the basis on which we built
the SMuSe's music processing chain.

Figure 2. The different levels of sequential grouping for musical material: event fusion, melodic and rhythmic grouping, and formal sectioning (from [35]).

2.5. Music Processing Modules

Research on the brain substrates underlying music processing has shifted over the last twenty years from a classical view emphasizing a dichotomy between language
(supposedly processed in the left hemisphere) and music (in the right hemisphere) to a modular view [1]. There
is some evidence that music processing modules are organized into two parallel but largely independent sub-modules that deal with pitch content ("what?") and temporal content ("when?") respectively [26, 18]. This evidence suggests that they can be treated separately in a
computational framework. Additionally, studies involving music-related deficits in neurologically impaired individuals (e.g. subjects with amusia who can no longer recognize
melodies) have shown that the music faculty is composed of a set of neurally isolable processing components
for pitch, loudness and rhythm [25]. The common view is
that pitch, rhythm and loudness are first processed separately by the brain and later (around 25-50 ms) form the
impression of a unified musical object [20] (see [16] for a
review of the neural basis of music perception). This modularity, as well as the three different levels and time scales
of auditory memory (sound, groups, structure), forms a set
of basic principles for designing our bio-mimetic music
system.

3. A COMPUTATIONAL MODEL BASED ON A SOCIETY OF MUSICAL AGENTS

The architecture of the SMuSe is inspired by this neurological evidence. It follows a hierarchical and modular structure,
and has been implemented as a set of agents using data-flow programming.

3.1. The SMuSe's Architecture

The SMuSe is built on a hierarchical, bio-mimetic and modular architecture. The musical material is represented at
three different hierarchical levels, namely event fusion,
event grouping and structure, corresponding to different
memory constraints. From the generative point of view,
the SMuSe's modules are divided into time modules ("when")
that generate rhythmic patterns of events and content modules ("what") that, for each time event, choose musical material such as pitch and dynamics (Figure 3).

Figure 3. The SMuSe is based on a hierarchy of musical agents: a global reference and per-voice rhythm, register, pitch-class/chord, dynamics and articulation agents drive instruments whose acoustic events are shaped by parameters such as attack, brightness, flux, noisiness, inharmonicity, reverb and spatialization.

These cognitive and perceptual constraints influenced the
design of the SMuSe's architecture. At the low event fusion level, the SMuSe provides a set of synthesis techniques
validated by psychoacoustic tests [2] that give perceptual
control over the generation of timbre, as well as the use of
MIDI information to define basic musical material such as pitch, velocity and duration.

Inspired by previous work on musical performance modeling [7], the SMuSe can also modulate the expressiveness of the music generation
by varying parameters such as phrasing, articulation and
performance noise [2].
At the medium melodic and rhythmic grouping level, the
SMuSe implements various state-of-the-art algorithmic
composition tools (e.g. generation of tonal, Brownian and
serial series of pitches and rhythms, Markov chains, etc.).
The time scale of this mid-level of processing is on the
order of 5 s for a single grouping, i.e. the time limit of
auditory short-term memory.
The form level concerns large groupings of events over
a long period of time (longer than the span of short-term memory). It deals with entire sequences of music and relates
to the structure and limits of long-term memory. Influenced by experiments in synthetic epistemology and situated robotics, this longer-term structure is obtained
via the interaction with the environment [37, 38].
The modularity of the music processing chain is also reflected in the different SMuSe modules that specifically deal
with time ("when") or material ("what").
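To make this division of labor concrete, the following is a minimal, hypothetical Python sketch of how a "when" agent and several "what" agents could be combined into a stream of events; it is an illustration only, not the SMuSe implementation, which is built as Max/MSP agents (see Section 3.3), and all class and parameter names here are invented.

# Illustrative sketch (assumption: not the actual SMuSe code).
import random

class RhythmAgent:
    """'When' agent: decides the onset times of events (a simple fixed pulse here)."""
    def __init__(self, pulse=0.5):
        self.pulse = pulse  # seconds between events

    def next_onset(self, t):
        return t + self.pulse

class PitchAgent:
    """'What' agent: chooses pitch material for each time event."""
    def __init__(self, pitch_classes, register=4):
        self.pitch_classes = pitch_classes
        self.register = register

    def next_pitch(self):
        return 12 * self.register + random.choice(self.pitch_classes)

class DynamicsAgent:
    """'What' agent: chooses a velocity for each event."""
    def next_velocity(self):
        return random.choice([12, 16, 32, 64])

def generate_voice(n_events, rhythm, pitch, dynamics):
    """Combine one 'when' agent with several 'what' agents into (onset, pitch, velocity) events."""
    t, events = 0.0, []
    for _ in range(n_events):
        t = rhythm.next_onset(t)
        events.append((t, pitch.next_pitch(), dynamics.next_velocity()))
    return events

if __name__ == "__main__":
    voice = generate_voice(8, RhythmAgent(0.5), PitchAgent([0, 5, 7, 10]), DynamicsAgent())
    print(voice)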
3.2. Agency
The agent framework is based on the principle that complex tasks can be accomplished through a society of simple, cross-connected, self-contained agents [22]. Here, an
agent is understood as "anything that can be viewed as
perceiving its environment through sensors and acting upon
that environment through effectors" [32]. In the context of cognitive science, this paradigm takes a
stand against a unified theory of mind in which a diversity
of phenomena would be explained by a single set of rules.
The claim here is that surprising, complex and emergent
results can be obtained through the interaction of simple non-linear agents. The agent framework is particularly suited to building flexible real-time interactive musical systems based on the principles of modularity, real-time interaction and situatedness.
3.3. Data-flow Programming
We chose to implement this hierarchy of musical agents
in the SMuSe in a data-flow programming language, Max/MSP [40]. Data-flow programming conceptually models
a program as a directed graph of data flowing between
operations. This kind of model can easily represent parallel processing, which is common in biological systems,
and is also convenient for representing an agent-based modular architecture.
Interestingly enough, programming in Max/MSP encourages the programmer to think in a way that is loosely analogous to how
a brain might work. Firstly, since Max/MSP is based
on a data-flow paradigm, processes can operate in parallel
(e.g. pitch and rhythm processes). Secondly, thanks to the
concept of patch abstraction (a Max meta-patch that abstracts or includes another Max patch), one can easily
build several layers of processing units (somewhat similar to the different layers of the cortex). Finally,
each process can be connected to every other in a variety of ways (like neurons). Of course, the comparison to
neural processes holds only at a higher, organizational level.
3.4. Distributed Control
All the musical agents in the SMuSe are OSC-compatible [39],
which means they can be controlled and accessed from
anywhere (including over a network) at any time. This
gives great flexibility to the system and allows for shared
collaborative compositions in which several clients access
and modulate the music server. In this collaborative composition paradigm, every performer builds on what the
others have done. The result is a complex sound structure that keeps evolving as long as different performers
contribute changes to its current shape. A parallel can
be drawn with the stigmergic mechanisms of coordination found in social insects such as ants [34, 3, 11]. In ant colonies,
the pheromone trace left by one ant at a given time is
used as a means to communicate and to stimulate the action
of the others; in this way they manage to collectively build
complex networks of trails towards food sources. Similarly, in a collective music paradigm powered by an OSC
client/server architecture, one performer leaves a musical
trace in the shared composition, which in turn stimulates
the other co-performers to react and build on top of it.
3.5. Concurrent and On-the-fly Control of Musical Processes
We have proposed a biologically inspired memory and
process architecture for the SMuSe, as well as a computational
model based on software agents. The OSC communication protocol makes it easy to send text-based commands
to specific agents in the hierarchy, and allows for flexible
and intuitive time-based, concurrent and on-the-fly control of musical processes.
The musical agents in the SMuSe all have a specific ID/address at which they receive commands and data. The
addresses are divided into /global (affecting the whole hierarchy), /voice[n] (affecting specific voices), and /synth[n]
(affecting specific sound generators). The OSC syntax
supports regular expressions, which makes it possible to address several modules at once with a compact syntax.
Patterns of perceptually grounded musical features can be sent
to the short-term memory (STM) and long-term memory (LTM)
modules at any moment in time via specific commands.
/* Example: fill up the STM */
/voice1/rhythm/pattern 4n 4n 8n 16n 16n 4n
/voice1/pitch/pattern 0 0 5 7 10 0
/voice1/pitch/register 4 5 4 4 4 5
/voice1/velocity 12 12 16 16 12 32
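For illustration, the same commands could be issued over the network from any OSC client library; the sketch below uses the python-osc package and assumes a hypothetical host and port for the SMuSe server (the transport details are not specified here).

# Illustrative OSC client sketch (host/port are assumptions).
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 57120)

# Fill up the STM of voice 1, mirroring the textual example above.
client.send_message("/voice1/rhythm/pattern", ["4n", "4n", "8n", "16n", "16n", "4n"])
client.send_message("/voice1/pitch/pattern", [0, 0, 5, 7, 10, 0])
client.send_message("/voice1/pitch/register", [4, 5, 4, 4, 4, 5])
client.send_message("/voice1/velocity", [12, 12, 16, 16, 12, 32])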

Musical sequence generation follows different selection principles, a term inspired by Koenig's reflections on serial music and algorithmic composition [19]. A selection principle refers to
the actions taken by the system to generate musical events
using the available short- and long-term memory content.
These actions can be deterministic (e.g. playback of a
stored sequence) or based on the probability of occurrence
of specific events (series, Markov chains, random events).
This allows for a hybrid approach to algorithmic composition in which complex stochastic processes are mixed with
more deterministic repeating patterns (cf. Table 1). Expressivity parameters such as articulation, tempo and dynamics can be continuously accessed and modulated.
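The selection principles summarized in Table 1 below lend themselves to a compact formulation; the following is a minimal illustrative Python sketch of the five principles operating on a stored pattern (it is not the SMuSe code, and the Markov transition table is an invented example).

# Illustrative sketch of the selection principles (assumption: not the SMuSe code).
import random

def sequence(pattern):
    """Play back the stored sequence in its original order."""
    return list(pattern)

def inverse(pattern):
    """Play back the stored sequence starting from the end."""
    return list(reversed(pattern))

def series(pattern):
    """Uniform random choice without repetition until all elements have been used."""
    return random.sample(pattern, len(pattern))

def aleatory(pattern, n):
    """Pick n elements at random (repetitions allowed)."""
    return [random.choice(pattern) for _ in range(n)]

def markov(pattern, n, transitions):
    """Choose elements according to state-transition probabilities.
    `transitions` maps an element to a list of possible successors."""
    out = [random.choice(pattern)]
    for _ in range(n - 1):
        out.append(random.choice(transitions.get(out[-1], pattern)))
    return out

if __name__ == "__main__":
    stm = [0, 0, 5, 7, 10, 0]  # pitch-class pattern from the STM example above
    print(sequence(stm), inverse(stm), series(stm), aleatory(stm, 6), sep="\n")
    print(markov(stm, 6, {0: [5, 7], 5: [7], 7: [10], 10: [0]}))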
Sequence: The order of selection follows the initial sequence (the sequence in memory is played back).
Inverse: The elements of the original sequence are selected starting from the end (the sequence in memory is played backward).
Markov: The elements of a sequence are chosen based on state transition probabilities.
Series: Uniform random choice between elements of the pattern without repetition; once an element of the pattern has been selected, it cannot be selected again until all the other elements have been selected.
Aleatory: Elements of the sequence are chosen randomly.

Table 1. The action selection agents choose to play specific musical elements stored in the current working STM following deterministic or stochastic selection principles.

3.6. Human in the Loop

We tested the SMuSe within different sensing environments, ranging from physiology sensors that provide
implicit emotional user interaction (heart rate, electrodermal activity, electroencephalogram), to virtual and mixed-reality sensors for behavioral interaction (camera, gazers,
lasers, pressure-sensitive floors), and finally MIDI and audio (microphone) for direct musical interaction (Figure 4).
The SMuSe integrates sensory data from the environment (which
conveys information about the human participants interacting with the system) in real time and sends this interpreted data to the music generation processes after appropriate fixed or learned musical mappings. The initial
musical material generated by the SMuSe is amplified, transformed and nuanced as the interaction between the system
and the participant evolves.

Figure 4. The SMuSe's environment: the SMuSe can interact with its environment through different sensors such as
biosignals, camera, gazers, lasers, pressure-sensitive floor,
MIDI and audio, but also via OSC commands sent to the music server
over the network from client applications (such as a console terminal, IQR, the IanniX graphical score, the Torque game engine, a Wiimote, or the XIM mixed-reality space).

3.7. Emotional Mappings

We have built a situated cognitive music system that is
sensitive to its environment via musical, behavioral and
physiological sensors. Thanks to a flexible architecture,
the system is able to memorize, combine and generate
complex musical structures in real time. We have described various sensate environments that provide feedback information from the human interactor and make it possible to
close the interaction loop. As proposed previously, we
take an approach that focuses on emotion-related feedback.

3.7.1. Predefined Mappings

From advanced behavioral and physiological interfaces,
we can infer emotion-related information about the interaction with the SMuSe. One possible way to take this feedback into account is to design fixed mappings based on
previous results from experimental psychology studies that
have investigated the emotional responses to specific musical parameters. This a priori knowledge can be used to
drive the choice of musical parameters depending on the
difference between the goal emotion to be expressed or induced and the emotional state detected by the system via
its sensors. A number of reviews have proposed generic
relationships between sound parameters and emotional responses [9, 13, 12]. These results serve as the basis for
explicit emotional mappings (cf. Table 2) and have been
confirmed in the specific context of the SMuSe's parameter
space. This explicit design approach is detailed in [2].
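As an illustration of such fixed mappings, the sketch below encodes a few widely reported tendencies from the music-and-emotion literature (e.g. faster tempo and brighter, louder material for happiness; slow tempo and soft dynamics for sadness). The emotion labels and numeric values are placeholders, not the validated mappings of [2].

# Illustrative predefined emotion-to-parameter mappings (placeholder values,
# loosely inspired by the literature; not the SMuSe's validated mappings).
PREDEFINED_MAPPINGS = {
    "happy":  {"tempo_bpm": 140, "mode": "major", "articulation": "staccato", "dynamics": 0.8, "brightness": 0.9},
    "sad":    {"tempo_bpm": 60,  "mode": "minor", "articulation": "legato",   "dynamics": 0.3, "brightness": 0.3},
    "tense":  {"tempo_bpm": 120, "mode": "minor", "articulation": "staccato", "dynamics": 0.9, "brightness": 0.7},
    "serene": {"tempo_bpm": 80,  "mode": "major", "articulation": "legato",   "dynamics": 0.4, "brightness": 0.5},
}

def select_parameters(goal_emotion, detected_emotion):
    """Toy policy: if the detected emotion differs from the goal emotion,
    push the musical parameters toward the goal emotion's preset."""
    if detected_emotion == goal_emotion:
        return None  # nothing to change
    return PREDEFINED_MAPPINGS[goal_emotion]

print(select_parameters("serene", "tense"))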

Table 2. The relationship between musical parameters and perceived emotion: a summary of the main results (adapted from [10]).
3.7.2. Adaptive Mappings

In cases where the relationship between a change of
musical parameters and the emotional response has to be
learned, explicit mapping is not possible. One solution
in the SMuSe is to use a reinforcement learning (RL) agent
that learns to adapt the choice of musical parameters based
on the interaction of the system with its environment [36].
Reinforcement learning is a biologically plausible learning algorithm particularly suited to an explorative and adaptive approach to mapping, as it tries to find a sequence
of parameter changes that optimizes a reward function
(which can, for instance, relate to a state of emotional stress)
[2]. Unlike most previous uses of reinforcement learning in music systems, where the reward relates
to predefined musical rules or to the quality of an improvisation, we are interested in the emotional feedback from the
listener (Figure 5).
This approach contrasts with expert systems such as the
KTH rule system [7, 8], which can modulate the expressivity of music by applying a set of predefined rules inferred
from extensive prior music and performance analysis.
Here, we propose a paradigm in which the system learns to
autonomously tune its own parameters as a function of the
desired reward (some emotional feedback) without using any a priori musical rule.
Interestingly enough, the biological validity of RL is supported by numerous studies in psychology and neuroscience
that have found various examples of reinforcement learning in
animal behavior (e.g. the foraging behavior of bees [23] or the dopamine system in primate brains [33]).

Figure 5. The reinforcement-based music system is composed of three main components: the music engine (the SMuSe), the reinforcement learning agent, and the listener who provides the reward signal.
_370

We have further explored the purposive construction of interactive installations and performances using the SMuSe
system. To name but a few, during the VRoboser installation [2], the sensory inputs (motion, color, distance) of
a 3D virtual Khepera1 robot living in a game-like environment modulated musical parameters in real-time, thus
creating a never ending musical soundscape in the spirit
of Brain Enos Music for Airports. In another context
the SMuSe generated automatic soundscapes and music
which reacted to and influenced the spatial behavior of
human and avatars in the mixed-reality space called XIM
(for eXperience Induction Machine) [2] thus emphasizing
the role of the environment and interaction on the musical
composition. Based on similar premises, Re(PER)curso,
an interactive mixed reality performance involving dance,
percussion, interactive music and video was presented at
the ArtFutura Festival 07 and Museum of Modern Art in
Barcelona in the same year. The performance was composed by several interlaced layers of artistic and technological activities. The music controlled had three components: a predefined soundscape, the percussionist who
performed from a score and the interactive composition
system synchronized by SMuSe; the physical actors, the
percussionist and the dancer were tracked by a video based
active tracking system that in turn controlled an array of
moving lights that illuminated the scene. The spatial information from the stage obtained by the tracking system
was also projected onto the virtual world where it modulated the avatars behavior allowing it to adjust body position, posture and gaze to the physical world. In 2009, the
Brain Orchestra, a multimodal performance using brain
computer interfaces, explored the creative potential of a
collection of brains directly interfaced to the world. During the performance, four brain musicians were controlling a string quartet generated by the SMuSe using their
brain activity alone. The orchestra was conducted by an
emotional conductor, whose emotional reactions were
recorded using biosignal interfaces and fed back to the
1 http://www.k-team.com/

6. REFERENCES

system. The Brain Orchestra was premiered in Prague for


the FET 09 meeting organized by the European Commission. [2]. Finally, a live performance of a piece inspired
by Terry Rileys in C served as an illustration of the
principles of parallelism, situatedness and emergence exhibited by the SMuSe at the Ernst Strungmann Forum on
Language, Music and the Brain: a mysterious relationship.

5. CONCLUSIONS

The SMuSe illustrates a novel situated approach to music
composition systems. It is built on a cognitively plausible architecture that takes into account the different time
frames of music processing, and uses an agent framework
to model a society of simple distributed musical processes.
It takes advantage of its interaction with the environment
to go beyond the classic sense-think-act paradigm [31].
It combines cognitively relevant representations with perceptually grounded sound synthesis techniques and is based
on modern data-flow audio programming practices [28, 29]. This provides an intuitive, flexible and distributed
control environment that can easily generate complex musical structures in real time. The SMuSe can sense its environment via a variety of sensors, notably physiology-based
sensors. The analysis and extraction of relevant information from sensor data makes it possible to re-inject emotion-based
feedback into the system based on the responses of the human participant. The SMuSe provides a set of pre-wired
mappings from emotions to musical parameters
grounded in the literature on music and emotion, as well
as a reinforcement learning agent that performs online
adaptive mapping. It offers a well-grounded approach
towards the development of advanced synthetic aesthetic
systems and a further understanding of the fundamental
psychological processes on which they rely.

6. REFERENCES

[1] E. Altenmüller, "How many music centers are in the brain?" in Annals of the New York Academy of Sciences, vol. 930 (The Biological Foundations of Music). John Wiley & Sons, 2001, pp. 273-280.
[2] Anonymous, reference withheld for blind review.
[3] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, USA, 1999.
[4] B. Bongers, "Physical interfaces in the electronic arts: Interaction theory and interfacing techniques for real-time performance," Trends in Gestural Control of Music, pp. 41-70, 2000.
[5] M. M. Bradley and P. J. Lang, "Affective reactions to acoustic stimuli," Psychophysiology, vol. 37, no. 2, pp. 204-215, March 2000.
[6] J. Fodor, The Language of Thought. Harvard University Press, 1975.
[7] A. Friberg, R. Bresin, and J. Sundberg, "Overview of the KTH rule system for musical performance," Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, no. 2-3, pp. 145-161, 2006.
[8] A. Friberg, "pDM: An expressive sequencer with real-time control of the KTH music-performance rules," Computer Music Journal, vol. 30, no. 1, pp. 37-48, 2006.
[9] A. Gabrielsson and P. N. Juslin, "Emotional expression in music performance: Between the performer's intention and the listener's experience," Psychology of Music, vol. 24, no. 1, pp. 68-91, 1996.
[10] A. Gabrielsson and E. Lindström, "The influence of musical structure on emotional expression," in Music and Emotion: Theory and Research, ser. Series in Affective Science. New York: Oxford University Press, 2001.
[11] E. Hutchins, Cognition in the Wild. Cambridge, MA: MIT Press, 1995.
[12] P. N. Juslin and P. Laukka, "Communication of emotions in vocal expression and music performance: Different channels, same code?" Psychological Bulletin, vol. 129, no. 5, pp. 770-814, Sep. 2003.
[13] P. N. Juslin and J. A. Sloboda, Eds., Music and Emotion: Theory and Research. Oxford; New York: Oxford University Press, 2001.
[14] P. Kivy, Introduction to a Philosophy of Music. Oxford University Press, USA, 2002.

_371

Figure 5. The reinforcement-based music system is composed of three main components: the music engine (the SMuSe), the reinforcement learning agent, and the listener, who provides the reward signal.

Figure 6. Artistic realizations: A) Re(PER)curso (ArtFutura and MACBA, Barcelona, 2007); B) The Multimodal Brain Orchestra (FET, Prague, 2009); C) XIM sonification (2007).

Table 2. The relationship between musical parameters and perceived emotion: a summary of the main results (adapted from [10]).
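The table itself is not reproduced here. As a rough illustration of how such a pre-wired emotion-to-parameter mapping can be represented, consider the following sketch in Python. It is purely illustrative: the parameter names, values and the send_parameter callback are hypothetical and do not correspond to the actual SMuSe interface; the associations follow findings commonly reported in the music-and-emotion literature (for instance, fast tempo and major mode for happiness, slow tempo and minor mode for sadness).

# Illustrative sketch only: parameter names, values and the control channel
# are assumptions for this example, not the SMuSe implementation.
EMOTION_TO_PARAMS = {
    "happiness": {"tempo_bpm": 140, "mode": "major", "articulation": "staccato", "loudness": 0.8},
    "sadness":   {"tempo_bpm": 60,  "mode": "minor", "articulation": "legato",   "loudness": 0.3},
    "tension":   {"tempo_bpm": 120, "mode": "minor", "articulation": "staccato", "loudness": 0.9},
}

def apply_emotion(target_emotion, send_parameter):
    """Push the pre-wired parameter set for a target emotion to the music engine.
    `send_parameter(name, value)` stands in for whatever control channel is used
    in practice (for example, OSC messages to the synthesis layer)."""
    for name, value in EMOTION_TO_PARAMS[target_emotion].items():
        send_parameter(name, value)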
3.7.2. Adaptive Mappings

In cases where the relationship between a change of musical parameters and the emotional response has to be learned, an explicit mapping is not possible. One solution in the SMuSe is to use a Reinforcement Learning (RL) agent that learns to adapt its choice of musical parameters based on the interaction of the system with the environment [36]. Reinforcement learning is a biologically plausible learning algorithm that is particularly suited to an explorative and adaptive approach to mapping, as it tries to find a sequence of parameter changes that optimizes a reward function (which may, for instance, relate to a state of emotional stress) [2]. Unlike most previous uses of reinforcement learning in music systems, where the reward relates to predefined musical rules or to the quality of an improvisation, we are interested in the emotional feedback from the listener (Figure 5).

This approach contrasts with expert systems such as the KTH rule system [7, 8], which can modulate the expressivity of music by applying a set of predefined rules inferred from extensive prior analysis of music and performance. Here, we propose a paradigm where the system learns to autonomously tune its own parameters as a function of the desired reward (some form of emotional feedback), without relying on any a priori musical rule.

Interestingly, the biological validity of RL is supported by numerous studies in psychology and neuroscience that have found examples of reinforcement learning in animal behavior (e.g. the foraging behavior of bees [23] or the dopamine system in primate brains [33]).
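The sketch below illustrates the kind of learning loop described above in its simplest form: an epsilon-greedy agent that keeps a value estimate for each candidate parameter setting and updates it from a scalar reward derived from the listener's response. It is a toy, stateless bandit formulation, not the actual SMuSe agent (which performs online adaptive mapping and may condition its choices on musical context, cf. [36]); the candidate settings, the get_listener_reward callback and the learning-rate values are assumptions made for the example.

import random

# Candidate musical parameter settings the agent can choose between
# (hypothetical values, not the SMuSe parameter space).
ACTIONS = [
    {"tempo_bpm": 60,  "mode": "minor", "loudness": 0.3},
    {"tempo_bpm": 100, "mode": "major", "loudness": 0.6},
    {"tempo_bpm": 140, "mode": "major", "loudness": 0.8},
]

def run_episode(get_listener_reward, steps=100, epsilon=0.1, alpha=0.2):
    """Epsilon-greedy selection over parameter settings.
    `get_listener_reward(action)` stands in for the emotion-based feedback
    extracted from the participant's sensor data (higher means closer to the
    desired emotional state)."""
    values = [0.0] * len(ACTIONS)           # running value estimate per setting
    for _ in range(steps):
        if random.random() < epsilon:       # explore a random setting
            i = random.randrange(len(ACTIONS))
        else:                                # exploit the current best estimate
            i = max(range(len(ACTIONS)), key=lambda k: values[k])
        reward = get_listener_reward(ACTIONS[i])
        values[i] += alpha * (reward - values[i])
    return values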


4. ARTISTIC REALIZATIONS

We have further explored the purposive construction of interactive installations and performances using the SMuSe system. To name but a few: in the VRoboser installation [2], the sensory inputs (motion, color, distance) of a 3D virtual Khepera robot (http://www.k-team.com/) living in a game-like environment modulated musical parameters in real time, creating a never-ending musical soundscape in the spirit of Brian Eno's Music for Airports. In another context, the SMuSe generated automatic soundscapes and music that reacted to and influenced the spatial behavior of humans and avatars in the mixed-reality space called XIM (for eXperience Induction Machine) [2], thus emphasizing the role of the environment and of interaction in the musical composition. Based on similar premises, Re(PER)curso, an interactive mixed-reality performance involving dance, percussion, interactive music and video, was presented at the ArtFutura Festival '07 and at MACBA in Barcelona the same year. The performance was composed of several interlaced layers of artistic and technological activity. The music had three components: a predefined soundscape, a percussionist performing from a score, and the interactive composition system synchronized by the SMuSe. The physical actors, the percussionist and the dancer, were tracked by a video-based active tracking system that in turn controlled an array of moving lights illuminating the scene. The spatial information obtained from the stage by the tracking system was also projected onto the virtual world, where it modulated the avatar's behavior, allowing it to adjust its body position, posture and gaze to the physical world. In 2009, the Brain Orchestra, a multimodal performance using brain-computer interfaces, explored the creative potential of a collection of brains directly interfaced to the world. During the performance, four brain musicians controlled a string quartet generated by the SMuSe using their brain activity alone. The orchestra was conducted by an emotional conductor, whose emotional reactions were recorded using biosignal interfaces and fed back to the system. The Brain Orchestra was premiered in Prague for the FET '09 meeting organized by the European Commission [2]. Finally, a live performance of a piece inspired by Terry Riley's In C served as an illustration of the principles of parallelism, situatedness and emergence exhibited by the SMuSe at the Ernst Strüngmann Forum on Language, Music and the Brain: a mysterious relationship.

5. CONCLUSIONS

The SMuSe illustrates a novel, situated approach to music composition systems. It is built on a cognitively plausible architecture that takes into account the different time frames of music processing, and it uses an agent framework to model a society of simple distributed musical processes. It takes advantage of its interaction with the environment to go beyond the classic sense-think-act paradigm [31]. It combines cognitively relevant representations with perceptually grounded sound synthesis techniques, and it builds on modern data-flow audio programming practices [28, 29]. This provides an intuitive, flexible and distributed control environment that can easily generate complex musical structures in real time. The SMuSe can sense its environment via a variety of sensors, notably physiology-based sensors; the analysis and extraction of relevant information from the sensor data makes it possible to re-inject emotion-based feedback into the system based on the responses of the human participant. The SMuSe provides both a set of pre-wired mappings from emotions to musical parameters, grounded in the literature on music and emotion, and a reinforcement learning agent that performs online adaptive mapping. It offers a well-grounded approach towards the development of advanced synthetic aesthetic systems and a further understanding of the fundamental psychological processes on which it relies.

6. REFERENCES

[1] E. Altenmüller, "How many music centers are in the brain?" in Annals of the New York Academy of Sciences, vol. 930, The Biological Foundations of Music. John Wiley & Sons, 2001, pp. 273–280.

[2] Anonymous, for blind review purposes.

[3] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, USA, 1999.

[4] B. Bongers, "Physical interfaces in the electronic arts. Interaction theory and interfacing techniques for real-time performance," Trends in Gestural Control of Music, pp. 41–70, 2000.

[5] M. M. Bradley and P. J. Lang, "Affective reactions to acoustic stimuli," Psychophysiology, vol. 37, no. 2, pp. 204–215, March 2000.

[6] J. Fodor, The Language of Thought. Harvard University Press, 1975.

[7] A. Friberg, R. Bresin, and J. Sundberg, "Overview of the KTH rule system for musical performance," Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, no. 2-3, pp. 145–161, 2006.

[8] A. Friberg, "pDM: An expressive sequencer with real-time control of the KTH music-performance rules," Computer Music Journal, vol. 30, no. 1, pp. 37–48, 2006.

[9] A. Gabrielsson and P. N. Juslin, "Emotional expression in music performance: Between the performer's intention and the listener's experience," Psychology of Music, vol. 24, no. 1, pp. 68–91, 1996.

[10] A. Gabrielsson and E. Lindström, Music and Emotion: Theory and Research, ser. Series in Affective Science. New York: Oxford University Press, 2001, ch. The Influence of Musical Structure on Emotional Expression.

[11] E. Hutchins and G. Lintern, Cognition in the Wild. Cambridge, MA: MIT Press, 1995.

[12] P. N. Juslin and P. Laukka, "Communication of emotions in vocal expression and music performance: Different channels, same code?" Psychological Bulletin, vol. 129, no. 5, pp. 770–814, Sep. 2003.

[13] P. N. Juslin and J. A. Sloboda, Eds., Music and Emotion: Theory and Research. Oxford; New York: Oxford University Press, 2001.

[14] P. Kivy, Introduction to a Philosophy of Music. Oxford University Press, USA, 2002.

[15] B. R. Knapp and H. S. Lusted, "A bioelectric controller for computer music applications," Computer Music Journal, vol. 14, no. 1, pp. 42–47, 1990.

[16] S. Koelsch and W. Siebel, "Towards a neural basis of music perception," Trends in Cognitive Sciences, vol. 9, no. 12, pp. 578–584, 2005.

[17] C. Krumhansl, "An exploratory study of musical emotions and psychophysiology," Canadian Journal of Experimental Psychology, vol. 51, no. 4, pp. 336–353, 1997.

[18] C. Krumhansl, "Rhythm and pitch in music cognition," Psychological Bulletin, vol. 126, no. 1, pp. 159–179, 2000.

[19] O. Laske, "Composition theory in Koenig's Project One and Project Two," Computer Music Journal, pp. 54–65, 1981.

[20] D. Levitin and A. Tirovolas, "Current advances in the cognitive neuroscience of music," Annals of the New York Academy of Sciences, vol. 1156 (The Year in Cognitive Neuroscience 2009), pp. 211–231, 2009.

[21] L. B. Meyer, Emotion and Meaning in Music. The University of Chicago Press, 1956.

[22] M. Minsky, The Society of Mind. Simon and Schuster, 1988.

[23] P. Montague, P. Dayan, C. Person, and T. Sejnowski, "Bee foraging in uncertain environments using predictive Hebbian learning," Nature, vol. 377, no. 6551, pp. 725–728, 1995.

[24] J. Paradiso, "Electronic music: new ways to play," IEEE Spectrum, vol. 34, no. 12, pp. 18–30, 2002.

[25] I. Peretz and M. Coltheart, "Modularity of music processing," Nature Neuroscience, vol. 6, no. 7, pp. 688–691, 2003.

[26] I. Peretz and R. Zatorre, "Brain organization for music processing," Annual Review of Psychology, vol. 56, 2005.

[27] R. Pfeifer and C. Scheier, Understanding Intelligence. The MIT Press, 2001.

[28] M. Puckette, "Pure Data: another integrated computer music environment," in Proceedings of the Second Intercollege Computer Music Concerts, Tachikawa, Japan, 1996.

[29] M. Puckette, T. Apel et al., "Real-time audio analysis tools for Pd and MSP," 1998.

[30] D. Rosenboom, "Biofeedback and the arts: Results of early experiments," Computer Music Journal, vol. 13, no. 4, pp. 86–88, 1989.

[31] R. Rowe, Interactive Music Systems: Machine Listening and Composing. Cambridge, MA: MIT Press, 1993.

[32] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 2009.

[33] W. Schultz, P. Dayan, and P. Montague, "A neural substrate of prediction and reward," Science, vol. 275, no. 5306, p. 1593, 1997.

[34] H. Simon, The Sciences of the Artificial. Cambridge, MA, USA, 1981.

[35] B. Snyder, Music and Memory: An Introduction. The MIT Press, 2000.

[36] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning). The MIT Press, March 1998.

[37] P. F. M. J. Verschure, T. Voegtlin, and R. J. Douglas, "Environmentally mediated synergy between perception and behaviour in mobile robots," Nature, vol. 425, no. 6958, pp. 620–624, Oct. 2003.

[38] P. F. Verschure, "Synthetic epistemology: The acquisition, retention, and expression of knowledge in natural and synthetic systems," in Proceedings of the 1998 IEEE International Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence), vol. 1. IEEE, 1998, pp. 147–152.

[39] M. Wright, "Open Sound Control: an enabling technology for musical networking," Organised Sound, vol. 10, no. 3, pp. 193–200, 2005.

[40] D. Zicarelli, "How I learned to love a program that does nothing," Computer Music Journal, no. 26, pp. 44–51, 2002.

BACH: AN ENVIRONMENT FOR COMPUTER-AIDED COMPOSITION IN MAX

Andrea Agostini
Freelance composer

Daniele Ghisi
Composer - Casa de Velázquez

ABSTRACT

Environments for computer-aided composition (CAC for short), allowing the generation and transformation of symbolic musical data, are usually counterposed to real-time environments or sequencers. The counterposition is deeply methodological: in traditional CAC environments, interface changes have no effect until a certain refresh operation is performed, whereas real-time environments react immediately to user input. We shall present in this article a library for Max, named bach: automatic composer's helper, which adds highly refined capabilities for musical notation and symbolic processing to a typically real-time environment, in order to recompose the fracture between computer-aided composition and the real-time world.

1. INTRODUCTION

Since the advent of computers there has been great interest in how to take advantage of their superior precision, speed and power in music-related activities. Probably the best-known (and commercially most successful) direction has proven to be the generation and transformation of sound. In recent years, inexpensive personal computers (and lately even top-end mobile phones) have gained the ability to perform professional-quality audio transformation and generation in real time. On the other hand, several systems have been developed to process symbolic rather than acoustic data - notes rather than sounds. These systems can be roughly divided into tools for computer-assisted music engraving (such as Finale, Sibelius, Lilypond...) and tools for computer-aided composition (CAC for short), allowing the generation and transformation of symbolic musical data, such as OpenMusic [1] (http://repmus.ircam.fr/openmusic/home), PWGL (http://www2.siba.fi/PWGL/) and Common Music (http://commonmusic.sourceforge.net/). Moreover, at least two graphical programming environments, the closely related Max and PureData, have MIDI control and sound generation and transformation among their main focuses, but at the same time they are capable of dealing with arbitrary sets of data, input/output devices and video. Indeed, the boundaries between all these categories are fuzzy: music engraving systems often allow non-trivial data processing; some sequencers also provide high-quality graphical representation of musical scores and sound treatment; and modern CAC environments include tools for sound synthesis and transformation. It should be remarked, though, that Max and PureData have very crude native support for sequencing, and essentially none for symbolic musical notation.
Another, orthogonal distinction should be made between real-time systems, which react immediately to interface actions (such as Finale, MaxMSP, ProTools...), and non-real-time systems, where these actions have no effect until a certain refresh operation is performed (such as Lilypond, OpenMusic, PWGL). The latter is the case for typical CAC environments; yet in some cases this is unnatural, and it might be argued that there is no deep reason why symbolic processing should not be performed in real time. This does not mean that every compositional process would benefit from a real-time data flow, but some might, as we shall exemplify at the end of the paper. Real time is a resource, rather than an obligation. Yet the lack of this resource has, up to now, pushed the development of CAC techniques only in the off-line direction.
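As a toy illustration of this methodological difference (and not of bach's actual implementation), the following Python sketch contrasts the two evaluation models on a trivial symbolic operation, the transposition of a motif given as MIDI pitches: in the deferred model an edit has no effect until an explicit refresh, while in the reactive model every edit immediately re-evaluates the result. All class and method names are invented for the example.

# Toy illustration, not bach code: deferred vs. reactive evaluation of a
# symbolic transformation (transposition of a motif given as MIDI pitches).

class DeferredPatch:
    """Non-real-time style: edits accumulate, output changes only on refresh()."""
    def __init__(self, motif):
        self.motif = list(motif)
        self.transposition = 0
        self.output = list(motif)

    def set_transposition(self, semitones):
        self.transposition = semitones      # no effect on the output yet

    def refresh(self):
        self.output = [p + self.transposition for p in self.motif]
        return self.output

class ReactivePatch(DeferredPatch):
    """Real-time style: every edit immediately re-evaluates the output."""
    def set_transposition(self, semitones):
        self.transposition = semitones
        self.refresh()                      # output is always up to date

motif = [60, 62, 64, 67]                    # C D E G
deferred, reactive = DeferredPatch(motif), ReactivePatch(motif)
deferred.set_transposition(3)
reactive.set_transposition(3)
print(deferred.output)                      # [60, 62, 64, 67] (stale until refresh)
print(reactive.output)                      # [63, 65, 67, 70] (already updated)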
In our own experience, the real-time or non-real-time nature of an environment for music composition deeply affects the very nature of the compositional process. Composers working with sequencers, plug-ins and electronic instruments need them to react immediately as they change their parameters; likewise, composers working with symbolic data might want the machine to adapt quickly to new parameter configurations. As composers ourselves, we believe that the creation and modification of a musical score is not an out-of-time activity, but follows the composer's discovery process and develops accordingly.
This issue has been addressed by Miller Puckette in [11]:

"While we have good paradigms for describing processes (such as in the Max or Pd programs as they stand today), and while much work has been done on representations of musical data (ranging from searchable databases of sound to Patchwork and OpenMusic, and including Pd's unfinished data editor), we lack a fluid mechanism for the two worlds to interoperate."
Arshia Cont in [5] adds:
The performers of computer music have been
faster to grab ideas in real time manipulations
and adopting them to their needs. Today, with
many exceptions, a wide majority of composed mixed instrumental and electronic pieces
