Visualizing The Structure of Music
Job Savelsberg
Eindhoven University of Technology
Abstract—The analysis of music is generally done by reading musical scores or listening to the piece. However, these methods require
time and technical knowledge. Many visualization techniques have been devised that aim to make this task less tedious and more
inviting. The representations differ in their focus: the musical elements that are taken into account and the level of expertise
of the user. This paper aims to give a comprehensive overview of the different methods used in music structure visualization.
Index Terms—music, visualization, music structure, audio-visual systems
1 INTRODUCTION
Music is the arrangement of sounds into repeating complex patterns.
There are many theories as to why and how it came into existence,
yet much of this remains a mystery. Similarities between music from
independent origins reveal some underlying structural commonalities,
which point to an inherent sensibility humans have towards music.
This can be traced back to the oldest surviving piece of music, from
1400 BC [12]. This is thanks to the transcription of musical ideas to a
visual medium, serving as a great tool for the storage, and subsequently,
analysis of music.
The field of music theory and analysis aims to understand the underlying structure of music at the level of individual compositions, in relation to the composer's thought process and the experience of the listener. The focus is on extracting aspects such as rhythm and harmony and analyzing these in the context of the composition, aiming to answer the question: "How does this piece of music work?". The
process of analysis usually involves expanding the visual music information, the music score, with extra symbols denoting relationships, or condensing information [6]. The results of the analysis can be used for multiple goals, such as aiding composers in the composition process, or using visualizations of music structure as a tool for people, even without much theoretical knowledge, to analyze music.

Fig. 1. Some examples of the global structure of pop songs.

Visualization of music structure is not only useful for research and analysis; it is also interesting to consumers of music. Exploration and browsing of music could be made easier by providing comprehensive representations of the music, saving the consumer the time of listening to the entire piece to get the overall idea. Another application for users would be visualizations that deepen the listening experience. This is slightly beyond the scope of this paper, but some of the research in this field has contributed to the general music visualization field.

Performers of music could also benefit from a more intuitive overall representation of the music. In the setting of an orchestra, where a piece is subdivided into many instrument-specific scores, it would be useful for performers to get an overview of how the instruments come together in the music. Or, in the case of a piece that invites improvisation, visualizing the chord changes and harmonic structure of the song could give the improviser a better overview than a music score or lead sheet.

This paper provides an overview, taxonomy, analysis, and comparison of the different approaches in music structure visualization. First, some required background information is provided; the essential concepts in music theory are explained. We outline the psychological and neuroscientific research of both music structure and visualization. After this, the problem is stated more formally: what will be the input, and what are the extracted elements to be used in the visualization. Next, we provide a detailed analysis of the different approaches, organized by key characteristics of the visualization method. Finally, we draw conclusions about the state of the art in music structure visualization.

• Job Savelsberg is with the Eindhoven University of Technology. E-mail: [email protected].

2 BACKGROUND

2.1 Some music theory

Since most music structure visualizations are made for people in music analysis research fields, these visualizations might use concepts that are unfamiliar to non-experts. Therefore, some of the essential concepts that recur throughout this paper are briefly explained.

2.1.1 Basic music structure

For music to be more than a random collection of sounds, structure is introduced, similar to the structure in storytelling. Note that the structural elements are not rules; they are simply based on common structures found across all music.

First, music is often divided into sections whose length is some power of 2 times the smallest time unit. This is due to smaller phrases being repeated. A measure, or bar, is often 4 beats. A section is often 4, 8, 16, or 32 of these measures. This holds mostly for pop music; however, there are many exceptions.

A motive, or motif, is a short musical phrase that is often repeated throughout the music. Motives are considered the building blocks of a music piece. A theme is similar in definition to a motive, although it is usually seen as a longer fragment. A theme could also include motives.

The ordering of sections in music is not very strict and depends largely on the genre, culture, time period, and the composer's personal preference. An example of a common structuring of sections in popular music can be seen in Figure 1. Each of these sections has a distinct function. The difference between these sections is also important to keep the listener engaged despite the repetition.

2.1.2 Harmony

Harmony is the composite product of multiple notes that are played at the same time, hence it is also called the vertical aspect of music,
assuming time is denoted horizontally. A collection of three or more notes played simultaneously is called a chord. Sequencing different chords is a powerful tool for conveying emotion in music. Analysis of harmony involves looking at this sequencing of chords and the notes the chords are comprised of. A common tool in analysis, and therefore also in many visualizations, is the concept of the 'circle of fifths', see Figure 2. All twelve keys, or notes, are mapped onto a circle such that every adjacent note is exactly a fifth (seven chromatic steps) apart. A fifth, after the octave, is the most consonant interval. Keys that are close together on the circle of fifths will therefore also sound consonant, and will have the most notes in common. Keys that are distant from each other on the circle will sound more dissonant and have fewer notes in common.

Fig. 2. The circle of fifths.

2.1.3 Melody

Melody is the rhythmic succession of musical tones that is usually perceived by the listener as a single entity. It usually takes a leading or foreground role in the music. Melody also plays into the harmonic structure of the music.

2.1.4 Rhythm

Rhythm is the placement of sounds in time, usually as a pattern of beats of different strengths and characteristics. Rhythm is usually the most subject to repetition. The rhythm is also very dependent on the time signature, which defines the number of pulses in a measure. In popular Western music, it is often divided into four beats, but many exceptions exist, especially in other cultures. Some rhythmic structures are a compound of different time signatures; other rhythms consist of multiple different rhythms being played at the same time, called polyrhythms. Many different rhythms exist and some composers use odd rhythms as a base for experimentation.

2.1.5 Dynamics

The dynamics of a piece are the expression of loudness over time. In a music score, they can be defined as accents on specific notes, a loudness level for certain sections, and a crescendo (increasing) or decrescendo (decreasing). In audio, dynamics correspond to the amplitude.

2.2 Common techniques in audio and music processing

Since many of the approaches use common underlying techniques in audio processing, some will be explained in this section.

2.2.1 Discrete Fourier Transform (DFT)

When performing automated analysis on a raw audio signal, in the form of amplitude over time, it is hard to extract meaningful information about the musical contents. The Fourier Transform solves this problem by decomposing the signal into a function of frequency, meaning we can see exactly what the amplitude of any frequency is. Since the Fourier Transform is defined as a continuous function over an infinite range, in computer applications the Discrete Fourier Transform is used. Converting the function to a discrete version requires some parameters to be set to reach the desired resolution. Instead of a continuous function describing the amplitude of all frequencies, we use frequency bins; a simple example would be to discretize the bins in increments of one hertz. The sampling frequency of the input signal also has to be set. The highest frequency that can be detected, called the Nyquist frequency, will be half of the sampling frequency, since we need at least two data points to detect a sinusoidal wave. Often the term Fast Fourier Transform (FFT) is used interchangeably with DFT; however, FFT refers to any algorithm that can compute the DFT in Θ(n log n) time. In some applications the Constant-Q transform (CQT) is used over the generic Short-Time Fourier Transform. One of the advantages of this method is that in the high frequencies, it reduces frequency resolution while increasing temporal resolution. This corresponds to the perceived resolutions in human hearing.

Since sound in music will change over time, we often want to see how the frequency spectrum changes over time. Therefore we can create a so-called spectrogram. To create this we simply divide the input into windows of a certain size and apply the DFT to each of those; this is called the Short-Time Fourier Transform (STFT). Note that the window size will determine the lowest frequency that we can detect after the DFT. Often, overlapping windows are used with a taper that prevents so-called spectral leakage.

In the case of music, it is usually sufficient to only consider the frequencies that are defined by the equal-tempered scale. Classifying the frequency data into these 12 pitch classes results in a set of so-called chroma features. Just like the spectrogram, we can also create a chromagram, showing very precisely which notes are played at which time.

2.2.2 Mel Frequency Cepstral Coefficients (MFCC)

MFCC is a very popular way of compressing audio information by key characteristics. It is still considered the most powerful tool in the field of speech recognition applications as a means of representing the characteristics of sound. It lends itself very well to the analysis of instrument characteristics as well. In simple terms, a cepstral representation, or cepstrum, is a spectrum of a spectrum; more formally, the inverse Fourier Transform of the log of a mel-scaled spectrum. The reason this is useful is due to natural harmonics, which generate peaks in the frequency spectrum at regular intervals. The steps of acquiring the MFCCs are as follows:

1. The Discrete Fourier Transform is taken of the signal.

2. The powers are mapped onto the Mel scale, using triangular windows. Mel scaling is a key element of MFCC; instead of using linearly-spaced frequency bands, it spaces the bands such that they approximate the human auditory system's response more closely. To the human ear, it is much easier to distinguish a difference of a set number of Hz in the low frequencies compared to higher frequencies.

3. The log of the powers at each of the Mel frequencies is taken.

4. Either the DCT (discrete cosine transform) or the DFT is applied to the list of Mel log powers, resulting in the desired cepstral coefficients.

5. Optionally, to compress the dimensionality, higher-order MFC coefficients are discarded.

Note that at the end of this process, after compressing to a lower dimensionality, some of the exact pitch information might get lost. Nonetheless, in many applications where the timbre of an instrument is important, or where an efficient similarity measure is needed, MFCC is still among the best choices.
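The five steps above can be sketched in plain NumPy. This is a minimal illustration only: the frame size, hop, filter count, and the HTK-style mel formula used here are arbitrary choices for the sketch, not the parameterization of any system cited in this paper.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the (HTK-style) mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters, equally spaced on the mel scale (step 2)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    # Step 1: power spectrum of each windowed frame (an STFT).
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Step 2: map the powers onto the mel scale with triangular windows.
    mel_power = power @ mel_filterbank(n_filters, n_fft, sr).T
    # Step 3: log of the mel-band powers.
    log_mel = np.log(mel_power + 1e-10)
    # Step 4: DCT-II of the log powers gives the cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(n, (2 * n + 1) / (2 * n_filters)))
    cepstra = log_mel @ basis.T
    # Step 5: keep only the lower-order coefficients.
    return cepstra[:, :n_coeffs]

# A 440 Hz test tone, one second at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
coeffs = mfcc(np.sin(2 * np.pi * 440 * t), sr)
print(coeffs.shape)  # (n_frames, 13)
```

The hop of half a frame gives the overlapping windows mentioned above, and the Hann window is one common taper against spectral leakage.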
Table 1. User distinction

User                       Expertise                            Goals
Analyst                    Highly experienced                   Understanding complex structures. Make analysis less tedious.
Composer                   Experienced                          Aid in making compositional choices. Comparison with inspiration music.
Performer                  Experienced                          Getting an overview of the piece. Understanding how a part fits in the whole.
Student                    Moderately experienced               Understanding all aspects of musical analysis.
Novice                     Not experienced but interested       Getting familiar with song structure and music analysis.
Outsider                   Not experienced and not interested   Deeper listening experience. Gaining interest in musical theory and analysis.
(Music Platform) Listener  N/A                                  Thumbnailing, characterizing music for browsing and exploration.

3 PROBLEM FORMULATION

Visualizations of music will depend highly on the application they are used for. We can, however, come up with a generic question: "How can music be represented such that it depicts the relevant structural elements efficiently and comprehensibly?". The first ambiguous term here is "relevant structural elements". What elements music is comprised of will be explained in more detail in section 3.3. Not for every use case do all elements of music have to be presented.

The term efficiently, in this case, means scaling the amount of information to be as small as possible without losing the required information. There is a range of viewing levels possible when analyzing a piece of music. For instance, macro analysis would require the summarization and compression of the original information, being able to see the entire structure of a piece of music at one glance, while micro analysis requires the information to be expanded for in-depth analysis, for example showing relationships between notes in complex harmonies.

Comprehensibility is very dependent on the level of expertise of the visualization user. For example, using symbols and notation common to traditional music scores might be appreciated by music analysts, but the average person might prefer less abstract visualizations, using a more 'physical' representation of the information, like changes in size, shape, and color.

3.1 Users

The way these terms are further defined depends on the anticipated user of the visualization. One of the most distinctive characteristics that needs to be taken into account is the amount of knowledge and expertise a user has in musical analysis. The users mentioned in the research material can mostly be placed on a scale from most to least experienced. In Table 1 a general user distinction is made that highlights each user's level of expertise and goals.

While most of these users want to reach some kind of insight into the structure of the music, one group does not entirely fit in this list, namely the music platforms and their listeners. Some of the included papers mention how their visualization could help users to easily browse and explore songs based on the visual provided. Actual usage, and thus requirement definition, of these methods would likely come from music platforms having to provide users with a large set of music. So in theory both listeners and music platforms are users in this case. Moving up the table from here, the user's goal is already more steered towards actually paying attention to the music structure.

Outsiders will not actively try to extract any information from the visualization; they simply see it as a form of entertainment, while still learning about music structure. An example of this could be music videos that incorporate some type of visualization.

A novice is also not experienced, but could show interest in learning about song structure; for instance, someone taking music lessons for the first time, with visualizations aiding in understanding simple concepts.

A music student is already much more experienced, and would like to have a method of quickly analyzing music, or learning with practical examples of musical concepts.

Music performers might have different requirements depending on the genre of the music. Most of them have in common that they require some sort of overview. However, specifics might differ; for instance, orchestra players would want to know how their part fits in with the rest, when to play and when not to: a focused view of their own part, but still with some condensed information about the other parts. Players of more popular music might want an overview of the general structure in terms of sections such as verses and choruses. A jazz performer would like to view the changes in chords and keys to get an overview that is useful when improvising.

A music composer might want to look at the structure of other music to draw inspiration from, or to compare to their own composition. This could help the composer make decisions with regard to structure. If a composer is creating music with a stricter rule set, such as in some classical styles, visualizations could help warn the composer of any transgressions.

Professional music analysts do not rely on visualizations outside of music scores, so these visualizations would either have to offer something more, or they should automate a normally tedious task. Tedious tasks could be, for example, recognizing variations and repetitions of motives in a classical piece, or transcribing the chord degree for every measure. Visualizations that add something could show statistics that take into account multiple variables, for instance a graph showing complexity over time.

3.2 Input data

Many of the approaches rely on specific types of input data. Three main types are distinguished:

• Music Score
A music score is the oldest format and was, before audio recordings, the only way to "store" music information. Before sheet music printing was introduced, musical scores used to be written down by hand, and as a result, these came with a large variety of handwriting styles. After computers began to be used to write musical scores, older scores could be transformed into digital formats with optical music recognition systems, which is an active area of research [20]. Despite scores having also become a digital format, the style conventions may still differ between composers. Musical scores can contain many arbitrary symbols and sometimes contain additional remarks on the way the music should be played, for instance dynamics, accents, or instrument-specific cues. This makes it the most versatile medium, but also harder to use in automated systems. Music scores, combined with listening, are also the preferred source for theoretical analysis of music. The most used digital markup language for musical scores is MusicXML.

• MIDI
Musical Instrument Digital Interface (MIDI) is a technical standard that was invented in the early 80s as a means for electronic instruments to communicate. MIDI consists of a stream of event messages, where each message can consist of parameters such as: Note ON/OFF, Pitch, Velocity, Aftertouch, Tempo, Panning, and various other types of modulation. MIDI data can also be stored in a file format. The standard visualization method for displaying MIDI data is called a piano roll, named after the rolls of holed sheets used for self-playing pianos and other music machines. It
Fig. 3. An example of a piano roll containing different instruments, denoted by color.

Fig. 4. A taxonomy of the research covered in this paper.

shows pitch on the vertical axis and time on the horizontal axis, and sometimes shows note velocity, or instrument selection, as a color value in a gradient. An example can be seen in Figure 3.

• Audio
Audio storage is certainly the most common form of music data, for obvious reasons. Despite its considerably higher difficulty in automated analysis, many applications still prefer to use this form of data. The reason it is difficult for computers to analyze lies in the concepts of music theory: music theorists think of music in the form of arranged notes or pitches with their respective characteristics, such as timbre and amplitude over time, while a waveform is only a variable amplitude over time. Extracting these concepts from an audio waveform is not a straightforward task, as evidenced by the active research and literature on the subject [13]. Nowadays, most audio formats are dual-channel / stereo, meaning there is a separate waveform for the left and right channel, and multi-channel formats are gaining popularity. What also sets audio apart from the previous formats is that audio is a temporal medium and therefore does not allow the user to view it from different levels, as opposed to zooming in and out of scores and piano-roll visualizations.

All of these types are somewhat interchangeable, with slight amounts of information being lost upon conversion. Music score and MIDI are similar in terms of containing information about exact pitches, loudness, timings of notes, and instrumentation. Music score and MIDI can usually easily be converted to one another, and in many cases can be used in combination with software instruments to generate audio. Audio can be converted into MIDI or a music score by note recognition software. Section 2.2.1 explains how methods can extract pitch and instrument information from audio.

3.3 The elements of a musical composition

The structure of music is expressed using certain elements. From the provided input these elements should be extractable; some types of data lend themselves more to specific elements than others.

• Repetition
As a key concept in understanding structure, almost all visualizations incorporate repetition in some way. Most of the other elements in music are also analyzed on their repetition.

• Harmonic structure
Harmonic structure is the most commonly visualized element of music structure, and mainly the repetition of harmonies. In highly condensed types of visualizations it comes in the form of the change of large sections being denoted by a discrete value that changes over time. Multiple visualizations also try to visualize harmony at a more micro level, denoting small motives or extending traditional scores with extra information about note and chord relationships. The circle of fifths is often applied with a mapping of colors to denote the consonance or dissonance between keys, chords, and notes. An example of this can be seen in Figure 2.

• Rhythmic structure
Rhythm is quite a natural concept since it is also present in the natural world and mathematics. It is usually in the form of a cycle of events. This is also how rhythm, according to many, is naturally visualized: as a subdivided circle, where time runs along the circle at a constant speed. Yet many of the tools used in composing rhythms use a sequential visualization, where beats are placed linearly, such as in music scores or drum computers.

• Instrumentation
Instrumentation, which instruments play which role in the composition, is another element that is hard to find among music structure visualizations. It finds its use mainly in analyzing musical ensembles such as an orchestra, where the instrumentation structure is much more variable. This element is often seen as a gradient, where an instrument can go from being the leading instrument in the foreground to being a supporting instrument in the background. Some visualizations also use role assignment over time, with distinct roles such as: accompany, contrast, collaborate, dominant. The choice of roles is not universal and is usually influenced by the author's reasoning.

• Dynamics
Dynamics are in some cases, such as musical scores, already visualized by various symbols. So most visualizations of dynamics using scores as input will focus on creating a condensed view of the dynamics of the entire piece. Dynamics might also be combined with instrumentation, visualizing the dynamics of individual instruments. When dealing with audio as input, there are many different methods of considering and extracting dynamics, for example taking into account amplitude peaks, taking an average over time, or taking into account specific frequency ranges.

4 APPROACHES

Many approaches can be used to visualize the structure of music. To get a better overview, these approaches are classified by their most distinctive characteristics. We first divide the approaches into overview visualizations and detailed visualizations. These are further subdivided, as can be seen in Figure 4.

4.1 Time-domain structural overview

Approaches that contain a time axis are popular due to their similarity to standard visualizations of all the input types, such as traditional music notation, piano rolls, and waveforms; they all read from left to right in time with the music. Condensed, time-axis based visualizations of music structure mean that the visualization contains less information
Fig. 5. Example of Foote's self-similarity matrix with notes about music structure. Original caption: "Day Tripper by Lennon/McCartney, performed by the Beatles." [8]

Fig. 6. Example of MidiVis. Original caption: "The Beatles – The Magical Mystery Tour. All melodic layers visualized. The structure of the piece is easy to perceive." [27]

than the input, therefore giving the user a clearer overview of the piece.

One common element that is shown using this type of visualization is repetition, and change, over time. This extracts things such as functional sections, recurring melodies, and chord progressions from the music.

4.1.1 Self-similarity

A widely used method of extracting repetition and change from data over time is self-similarity matrices. Jonathan Foote has been at the forefront of using self-similarity as a means of visualizing music structure. His papers from 1999 [8] and 2001 [9] explain how, with both MIDI and audio as input, one can create a self-similarity matrix, and prove it can be used to extract useful information. Both the x and y values of the matrix increase with time, and the axes are denoted in seconds. The brightness of any point (i, j) in the matrix denotes the similarity between the segment at time i and the segment at time j. A simple similarity measure over the raw audio data would not work, since the amplitude at a certain point in time does not carry information about the musical content. This is why the segments are parameterized into Mel-frequency cepstral coefficients (MFCCs, explained in section 2.2.2). The MFCCs of two segments are compared using a measure based on vector autocorrelation; in this case, a simple scalar (dot) product of the vectors is taken. The size of the segments used in the similarity calculation can differ depending on the goals of the visualization. Figure 5 shows how a self-similarity matrix can be constructed for an entire piece of music. It is also possible to create a localized self-similarity matrix, for example showing the self-similarity of a single melodic line; here the segments need to be small enough that they do not concatenate a sequence of notes. As can be seen in Figure 5, despite the large amount of useful information contained in this graph, reading and understanding it is not an easy task. Moreover, the fact that the self-similarity matrix is inherently symmetric along the diagonal, thus duplicating the necessary information, does not aid in the efficient use of space or the general readability of the visualization.

Wolkowicz et al. build upon Foote's work to create an extended self-similarity matrix showing more information using colors [27]. One of the main differences is that this paper is focused on piano pieces, and therefore uses color to denote relations between patterns played by either the left or right hand. This has been generalized to multi-track MIDI, such as pieces for an ensemble. Due to this, the input data cannot be audio but must be MIDI. In Figure 6, at first glance, there seems to be a much more visible structure than in Foote's example in Figure 5. However, note that this might be due to the type of input, MIDI in this case, allowing for better performing similarity measures.

Foote also introduces the "beat spectrum" and "beat spectrogram" [10] [9]. Since rhythm is a feature that is defined by a structure in the direction of time, it is hard to imagine an intuitive representation of the change in rhythm over time. But using the self-similarity matrix, one can compute a so-called beat spectrum as a function of lag l: the time between two similar audio fragments. This is done by calculating the similarity between a point on the diagonal and a point on the diagonal that is l further in time, then summing this for all points on the diagonal, for all lag amounts. The resulting spectrum shows exactly how the strong and weak beats are divided over a measure, and the duration of a measure can also be easily extracted, see Figure 7. Calculating the beat spectrum of segments over time results in a beat spectrogram, which shows the change of rhythm and time signature over time, see Figure 8.

4.1.2 Repeating blocks representations

Because self-similarity matrices require some experience to extract meaningful data, attempts have been made to simplify their representation, in particular to show structure as a set of colored blocks over time. From 2010 up to 2012 Naoko Kosugi published four papers on the so-called "Misual" project [7]. Taking the self-similarity matrix approach explained before, they perform a transformation on the image of the matrix, intending to detect brighter diagonal lines parallel to the main diagonal. Each of these lines represents a repetition of a certain pattern in the music. The final visualization is done in the form of a 3D cylinder, with the radius at a certain position in time representing the RMS amplitude value at that time. The repetitive sections are given a unique color on the cylinder. Later in the project, Kosugi decided to add another visual: a graph with the different patterns on the Y-axis and the duration of the pattern on the X-axis, with the aim of better showing overlapping patterns. It also helps with distinguishing similar patterns.
Fig. 9. A misual of Satie’s Gnossienne No. 4. [7]
Fig. 7. The beat spectrum of the jazz composition Take Five, by Dave
Brubeck. [9]
Fig. 10. The schemagram of Beethoven’s Piano Sonata No. 1, Op. 2-1,
Mvt. 1. [3]
pitch range of the notes in the segment. The transparency denotes the
Cantareira et al. developed an approach to analyze single-instrument number of notes played. The horizontal line on each rectangle denotes
MIDI data called ”Music Overview, Stability, and Harmony Visualiza- the weighted average of the pitch of all the notes in the segment, where
tion” (MoshViz) [2]. They present two distinct views: an overview. the weight is based on the note’s length. They have also defined a
explained here, and a detail view, explained in section 4.3.1. The measure of instability, which is denoted by increasing saturation of or-
overview shows statistics of segments with a given size (in units of time ange/red color. This measure considers what notes are played and when
measures) along the length of the music, seen in Figure 16. Three color they are played, and take into account changes in tempo and rhythm.
bars show single value statistics; complexity, repetition, and interval The instability increases when dissonant harmonies and chaotic time
variation. Complexity is the estimation of overall musical clutter; seg- structures are detected. The blue x symbols denote a time signature
ments with a sequence of fast notes in varying pitches will be given a change, and the bar below this denotes tempo changes, where ”¡” and
higher complexity value. Repetition represents the number of times the the pink bar denotes a rise in tempo.
current segment reappears elsewhere in the piece. The interval variation This method achieves to generate a very extensive overview with nu-
shows how many intervals are being played concerning the base tone merous auxiliary statistics useful for in-depth analysis. And according
for a given segment. The authors employ their own heuristic methods to examples provided in the paper, it also allows for easy distinction of
for identifying the base tone, this aids in visualizing the consonance musical sections.
or dissonance of a given harmony. The rectangles at the top contain
information about the notes in that segment. The height denotes the 4.2 Static structural overview
When a person is asked to describe a piece of music, they will most likely not explain how the song progresses over time, but rather use adjectives to point out what sets it apart from other songs. This means that a song can be, and intuitively is, characterized by static descriptors. In this section, visualization methods are introduced that aim to statically characterize a piece of music, and in particular its structure. These visualizations might be helpful for people looking for a certain type of music by exploring and browsing visual representations.
4.2.1 Graph analysis
Wei-An Chen et al. [4] introduce a model called the Harmony Graph, which decomposes music into a social-network-like structure according to its harmonic progressions. The key reason for choosing such a representation is the ability to utilize the large body of research on social network analysis methods. The generic visualization of the directed graph already reveals a lot about the structure of the piece, as can be seen in Figure 17. Much of the information that humans use to distinguish the graphs can be calculated. Wei-An Chen et al. use several generic graph measures that can be translated back to musical concepts. For example, the average path length tells something about the harmonic complexity, or randomness, of the music. A new measure

Fig. 15. Two scapeplot representations with their manually inputted "ground truth" below. Left: The Beatles' "You Can't Do That". Right: Brahms' Hungarian Dance No. 5. [19]

Fig. 18. Original caption: "The spatial relationships of the patterns in Beethoven's Piano Sonata No. 1, Op. 2-1, Mvt. 1. Each dot indicates a pattern, and the size of the dots is proportional to the cardinality of the patterns." [3]

Fig. 19. An example of the two-level color-based approach. [14]
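To make the Harmony Graph idea concrete, the sketch below builds a directed graph of chord transitions and computes its average shortest-path length with a plain BFS. The function names and the toy four-chord progression are illustrative, not taken from the paper; a highly repetitive progression yields a short average path length, which in the Harmony Graph reading suggests low harmonic complexity.

```python
from collections import defaultdict, deque

def build_harmony_graph(progression):
    """Directed graph with an edge A -> B for every chord transition A, B."""
    graph = defaultdict(set)
    for chord in progression:
        graph[chord]                      # register every chord as a node
    for a, b in zip(progression, progression[1:]):
        if a != b:
            graph[a].add(b)
    return dict(graph)

def average_path_length(graph):
    """Mean BFS distance over all ordered, reachable node pairs."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(d for t, d in dist.items() if t != source)
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# A repetitive four-chord loop: every chord reaches every other
# in at most 3 steps, so the average path length is low.
loop = ["C", "G", "Am", "F"] * 4 + ["C"]
print(average_path_length(build_harmony_graph(loop)))  # -> 2.0
```

A through-composed piece with many distinct, rarely repeated progressions would produce a sparser, more chain-like graph and a noticeably larger value.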
4.3.2 Augmented piano-roll visualization
For many people who are new to music, a piano-roll might be easier to understand than musical notation. Therefore, similar to the approaches in the previous section, some methods extend the piano-roll representation with extra information.

Just like in adapted music score visualizations, a common approach is to use color to denote tonality. Ciuha et al. [5] use a cycle of thirds to map color to tonality. They add the radius vectors of all concurrent tones and normalize the resulting vector by dividing it by the sum of the vectors' lengths. As explained in Section 2.1.2, the further the distance on the cycle of fifths, the more dissonant the harmony will sound. In the case of this color mapping, vectors pointing in opposite directions add up to a vector pointing to the middle, which is an unsaturated color, intuitively representing an ambiguous or dissonant tonal character. Sudden changes of key can also be easily spotted using this method; see the purple section in Figure 22 (b).

The previously mentioned MoshViz [2] provides, next to an overview, also a detail view. Individual notes are visualized in a piano-roll style. This visualization offers quite a few useful auxiliary details, such as chord, time signature, tempo, and measure markings, shown above the notes in Figure 23. The notes themselves are visualized as spheres, with their size corresponding to the impact, which takes into account the loudness of the note and the strength of the beat at which it is placed. A green-to-red color scale corresponds to the note's consonance and dissonance, respectively. The number of the interval with respect to the base tone is written inside the sphere, with notes not belonging to the scale being marked as negative. The duration is signified by a line extending from the circle. Together with the overview visualization, MoshViz presents itself as a well-rounded tool for analysis.

Fig. 23. The detail view of MoshViz. [2]

Piano-roll visualization can also easily serve as a real-time accompaniment to the music. One person who specializes in this and has done extensive experimentation is Stephen Malinowski, with his Music Animation Machine [15]. Many of his experiments and his iterative invention process can be seen online; each of his visualizations aims to highlight different aspects of the music. These visualizations do not have analysis as their main goal, and therefore no formal research has been done on them. However, as evidenced by the popularity of the online videos, they do invite many people to listen more attentively to music and discover the

4.4 Real-time
As mentioned in the previous section, real-time music visualizations are very popular. Usually, their aim is directed more towards providing a deeper experience or sparking interest in analysis than towards being a fully-fledged analysis tool, although some do attempt to serve as analytical tools. Most of these approaches use a 2D, momentary representation of harmonic structure and animate it over time.

4.4.1 2 Dimensional
In 1739, Leonhard Euler devised a lattice diagram, called the Tonnetz, to represent tonal space and show traditional harmonic relationships. Bergstrom et al. use this representation in their visualization, "IsoChords" [1], to visualize MIDI data in real-time. The visualization conveys information about intervals and chords synchronously with music playback. It is comprehensible to novice users in that it shows the relations between the notes of the chords in a chord progression. It shows information about the complexity of chords, and whether a melody follows the notes of a chord or strays away from them. An example comparing a complex and a simpler harmony can be seen in Figure 24. One useful property of the Tonnetz representation is that key modulations do not change the movement of shapes in the progression, and common modulations, such as moving by fifths and thirds, show up as a small translation of one step. One can quickly identify major (upwards triangle) and minor (downwards triangle) chords, and combinations thereof in the case of extended chords.

Mardirossian and Chew used another tonal pitch space representation, by Fred Lerdahl, to visualize harmonic structure in real-time. They dynamically show the sequence of keys as growing translucent discs, which grow at a rate corresponding to how frequently the key is used. The visualization also uses color to distinguish between keys. The advantage of this method is that it shows the usage of keys in real-time but also shows the cumulative distribution of the keys at the end of the piece. Just like the Tonnetz representation, this tonal pitch space does not change with certain transformations of the input: octave translation and time-scaling do not affect the visualization, and pitch translation only shifts the position and color of the entire visualization, without affecting the relational information.

4.4.2 3 Dimensional
Three-dimensional representations are generally inconvenient when projected back to 2D, due to obstruction of information and the need to select viewpoints. But in some cases, three dimensions might be a more natural way to represent data, especially when the visualization allows the user to navigate the 3D space.
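The Tonnetz structure underlying IsoChords can be illustrated with a few lines of code. This is only a sketch, not the authors' implementation: on the Tonnetz, each pitch class has six lattice neighbors, reached by a perfect fifth, a major third, or a minor third in either direction, and the tones of a major or minor triad are pairwise neighbors, forming a triangle.

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# The six Tonnetz axes: perfect fifth (7 semitones), major third (4),
# minor third (3), each in both directions, modulo the octave.
STEPS = (7, -7, 4, -4, 3, -3)

def neighbors(pc):
    """The six Tonnetz neighbors of a pitch class (0 = C, ..., 11 = B)."""
    return {(pc + s) % 12 for s in STEPS}

def is_triad_triangle(pcs):
    """True when all three tones are pairwise Tonnetz neighbors,
    as is the case for major and minor triads."""
    a, b, c = pcs
    return b in neighbors(a) and c in neighbors(a) and c in neighbors(b)

print(sorted(NAMES[p] for p in neighbors(0)))  # C's six neighbors
print(is_triad_triangle([0, 4, 7]))   # C major -> True
print(is_triad_triangle([0, 3, 7]))   # C minor -> True
print(is_triad_triangle([0, 2, 7]))   # Csus2   -> False
```

Because the test only looks at intervals, transposing a chord never changes the result, mirroring the paper's observation that key modulations leave the shapes of the progression intact.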
Fig. 25. Frame-by-Frame Visualization of Beethoven’s WoO 80, First
Variation. [16]
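The vector-sum color mapping of Ciuha et al. described in Section 4.3.2 can be sketched as below. Note the simplifications: hue angles are assigned along the circle of fifths rather than the paper's exact cycle-of-thirds scheme, and the function names and example chords are illustrative only.

```python
import math

def tone_vector(pitch_class):
    """Unit vector for a pitch class, with hue angles laid out along the
    circle of fifths: each fifth (7 semitones) advances the hue by 30
    degrees. A simplification of Ciuha et al.'s mapping, not their scheme."""
    angle = math.radians(((pitch_class * 7) % 12) * 30)
    return (math.cos(angle), math.sin(angle))

def tonality_color(pitch_classes):
    """Sum the unit vectors of all concurrent tones. The direction of the
    resultant gives the hue; its length, normalized by the sum of the unit
    lengths, gives the saturation. Tones far apart on the circle cancel,
    yielding an unsaturated (ambiguous, dissonant) color."""
    vx = sum(tone_vector(p)[0] for p in pitch_classes)
    vy = sum(tone_vector(p)[1] for p in pitch_classes)
    saturation = math.hypot(vx, vy) / len(pitch_classes)
    hue = math.degrees(math.atan2(vy, vx)) % 360
    return hue, saturation

# A C major triad is tonally focused; a chromatic cluster is not.
print(tonality_color([0, 4, 7])[1])   # ~0.64: fairly saturated
print(tonality_color([0, 1, 6])[1])   # ~0.33: washed out
```

The same normalization is what makes sudden key changes visible in the original visualization: a shift of the prevailing hue marks the modulation, while a drop in saturation marks a tonally ambiguous passage.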