0% found this document useful (0 votes)

14 views10 pages

Música - Terapia Ocupacional Basada en Análisis de Emociones en Audio

This research article presents a semantic matching model for folk music in occupational therapy, focusing on audio emotion analysis. The model utilizes deep learning to extract semantic features from music and text, enhancing the accuracy of emotional recognition and retrieval. Experimental results indicate a high recognition rate, demonstrating the potential of this model to improve the application of folk music in therapeutic settings.

Uploaded by

normannchavarria

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views10 pages

Música - Terapia Ocupacional Basada en Análisis de Emociones en Audio

Uploaded by

normannchavarria

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

Hindawi

Occupational erapy International

Volume 2022, Article ID 6841445, 10 pages
https://doi.org/10.1155/2022/6841445

Research Article
Design of Semantic Matching Model of Folk Music in
Occupational Therapy Based on Audio Emotion Analysis

Wensi Ouyang
School of Music Shaanxi Normal University, Xi'an Shaanxi 710119, China

Correspondence should be addressed to Wensi Ouyang; [email protected]

Received 10 April 2022; Revised 17 May 2022; Accepted 26 May 2022; Published 16 June 2022

Academic Editor: Sheng Bin

Copyright © 2022 Wensi Ouyang. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The main semantic symbol systems for people to express their emotions include natural language and music. The analysis and
establishment of semantic association between language and music is helpful to provide more accurate retrieval and
recommendation services for text and music. Existing researches mainly focus on the surface symbolic features and association
of natural language and music, which limits the performance and interpretability of applications based on semantic association
of natural language and music. Emotion is the main meaning of music expression, and the semantic range of text expression
includes emotion. In this paper, the semantic features of music are extracted from audio features, and the semantic matching
model of audio emotion analysis is constructed to analyze ethnic music audio emotion through feature extraction ability of
deep structure. The model is based on the framework of emotional semantic matching technology and realizes the emotional
semantic matching of music fragments and words through semantic emotional recognition algorithm. Multiple experiments
show that when W = 0:65, the recognition rate of multichannel fusion model is 88.42%, and the model can reasonably realize
audio emotion analysis. When the spatial dimension of music data changes, the classiﬁcation accuracy reaches the highest
when the spatial dimension is 25. Analysing the semantic association of audio promotes the application of folk music in
occupational therapy.

1. Introduction the movie to make the audience and the characters in the movie
resonate and share a common mood, so that the audience can
Emotion analysis is to establish a harmonious man-machine have a “good impression” on the movie. Music and emotion
environment by giving computers the ability to recognize, are inextricably linked. Music can affect emotions and also
understand, express, and adapt to human emotions and make show emotions. Emotions can affect people’s perception of
computers have higher and more comprehensive intelligence. music when they listen to songs. What makes music meaning-
Sentiment analysis is an important content of sentiment com- ful is the emotional connection between the participant and the
puting, and it is also a problem that must be solved in the field listener. Emotion is the essential feature of music, and cognitive
of sentiment computing. Music is the carrier of emotion and an emotion has a corresponding relationship between acoustic
important aspect of affective computing. In the emotional anal- vibration and nonsemantic structure of music. Music not only
ysis of music, people usually like some sad music when they are provides entertainment but also has many social and psycho-
sad and upset, in a happy mood, they like to listen to cheerful logical applications. Emotion analysis has become one of the
and dynamic music, and when they are calm, they tend to most active research fields in natural language processing. It
choose soothing and smooth light music, which is conducive has also been extensively studied in the fields of data mining
to rationally solving problems and maintaining healthy psy- [2], text mining [3], content recommendation [4], and infor-
chology. These potential “emotional tendencies” can expand mation retrieval [5]. Because of its commercial value, academic
the corresponding emotions of music listeners, and these emo- value, and importance to society as a whole, sentiment analysis
tions will generate some “motivation” to solve the correspond- has expanded from computer science to other disciplines. The
ing problems [1]. Filmmakers like to add background music to Internet has become an important medium for people to
2 Occupational Therapy International

express their opinions and to get some information, and the directly allow the composer and listener to carry out emotional
information conveyed by others can influence our opinions. communication, and it can be said that without audio, there is
Therefore, an automatic sentiment analysis system is required, no music. Therefore, many researchers try to use audio pro-
which usually contains sentiment classification modules. cessing to understand and analyze the emotion of music. Gir-
According to the different symbols, sentiment analysis can be gin proposed that music emotion classification based on audio
divided into text sentiment analysis [6], video sentiment analy- data was an important method in music emotion analysis,
sis [7], and audio sentiment analysis [8]. In the research of which segmented audio data and extracted physical features.
emotion recognition of audio, it mainly deals with the audio Lyrics are the important information of a song and the carrier
signal. The widely used audio features are: sound quality fea- of the content of the song, which most intuitively expresses
ture, spectrum feature, rhythm feature, and so on. that the content of the song is the original intention of the song
With the development of science and technology, the model [11]. Although social tags briefly contain users’ understanding
has significantly enhanced the ability to extract effective features of the shared content, they are additional information about
from a large number of data, and the topic of music therapy has the semantic meaning of the shared content, which contains
attracted more and more people’s attention. As music therapy potential resource classification. Social tags can be applied to
workers, it is worth pondering and discussing to develop the music field, including music classification of social tags, con-
cause of music therapy by using rich local music resources. struction of semantic space, and other multiangle and multi-
The Chinese folk music culture is effectively applied to music level applications. Juslin et al. proposed to use social labels of
therapy, and the Chinese folk music culture is carried forward music resources to construct semantic space of music retrieval
through music therapy, so that it goes to the world. Guide the and use tag clustering to improve retrieval effect in tag space to
patient to sing, play, and write music for therapeutic purposes. obtain similar music [12]. If these tags can be well integrated
The application of folk music culture to music therapy, effective into an application system, tag will play an important role in
music therapy internationalization of folk music culture, the music information retrieval system. In music resource sharing
development of music therapy with Chinese characteristics. website, social tag is the system user’s understanding of music
Playing guqin cultivates the mind and forgets troubles, and and emotional interpretation is the free gene of music.
the healer is completely immersed in music, which brings good Music develops with the development of human society, but
practical results for occupational therapy. Folk music culture is music has timeliness. Music retrieval needs to express music in
broad and profound, with a long history, far-reaching connota- the form of music score according to the tempo and pitch of
tions, and rich repertoire, which can express various artistic melody and search music data according to the similarity of
conceptions and feelings. To apply folk music culture into melody. Gingras B. and others have provided four search means.
music therapy, one needs to know more about one’s own music Modern music retrieval system can process hundreds of mil-
culture, and music therapy can be applied in a very wide range lions of music data, but the new music retrieval system should
[9]. There are discrete model and dimensional model widely pay more attention to the performance in online music retrieval,
used in the field of music emotion cognition. The model so that users from different social backgrounds can have a
includes four emotional adjectives, which are used to describe higher experience in retrieving music in different ways [13].
different emotional attributes in the music field and are classi- The content of music is very rich. It can be the rhythm of music
fied into similar emotional types according to the categories. melody or a piece of music. Content-based music information
Each link in the emotional circle is connected with the left retrieval is a hot research topic at present. More and more online
and right links in the emotional logic that there is a progressive music retrieval systems use content-based music information
relationship, and the progressive relationship represents the retrieval technology. In the research of content-based music
regular change of human emotions. The affective cluster of retrieval, humming retrieval is one of the main research direc-
the affective ring, respectively, has been widely recognized and tions. Vaidya and Kalita based their humming music retrieval
applied in the field of musical emotion analysis. The selection on audio processing analysis and obtained the fundamental fre-
of emotion model depends on emotion recognition methods quency distribution of input sound waves through autocorrela-
and specific application scenarios [10]. Emotional words in tion measurement calculation [14]. Xu et al. proposed to extract
the model are extracted from people’s perception of music audio features with an improved music melody contour extrac-
and mainly consider the psychological feelings of listeners to tion algorithm to remove the influence of noise brought by the
the songs they listen to, which is in line with the actual situation environment and audio input equipment on the humming
of psychological interaction of musical emotions. But how to query [15]. Pandeya and Lee proposed the method of extracting
identify music nondescriptive query and music emotion analy- feature kernel for semantic music retrieval, combined audio
sis has become an important research topic for musicologists information and social context information for semantic music
and psychologists. information retrieval, and developed the corresponding music
retrieval system based on semantic understanding [16]. Based
2. Related Work on the rhythm and timbre of music, Belyk et al. constructed a
multilabel emotion recognition model based on principal com-
Musical emotion describes the inherent emotional expression ponent analysis, extracted audio features through multilevel
of musical data, which is widely used in music retrieval, music convolutional network, and learned the vector representation
understanding, and other music-related applications. Audio of music labels for automatic music labelling [17]. Traditional
data is the most important form of music data expression, music retrieval methods cannot meet the needs, so we need to
and a single note cannot show the beauty of music and cannot find new music retrieval methods and ways. With more and
Occupational Therapy International 3

Fourier transform spectrum diagram

–10

–20

Phase position –30

–40

–50

–60
1000 2000 3000 4000 5000 6000 7000

Frequency

Frequency spectrum

Sinusoidal frequency spectrum diagram

8000

7500
Frequency (Hz)

7000

6500

6000
0.5 1.0 1.5 2.0 2.5 3.0 3.5

Time (min)

Spectrogram

Figure 1: Spectrum diagram of the dispersion music signal.

more resources on the Internet, automatic text classification by expression of the audio signal in the frequency domain. The
computer is an important research topic of natural language spectrum diagram of the signal can be obtained by Fourier
processing and artificial intelligence. Related research is still in transform, and the result value of Fourier transform is usually
the stage, related technology is still to be improved, and it is expressed by amplitude and phase, as shown in Figure 1.
difficult to analyze music features and immature representation The figure shows the corresponding frequency-domain
of music, which affects the extraction of music features. features of musical notes played by instruments. All sound fre-
The main method of the audio model based on emo- quency positions are the same. Spectrum is an important fac-
tional semantic matching proposed in this paper is to tor determining sound quality or timbre. Complex sounds
describe the audio in the emotional semantic space and then contain relevant amplitude signals of different frequencies.
identify and match the audio through the emotional seman- For the sampled signal, compute the discrete Fourier trans-
tic, so as to give the music list that best meets its emotional form. Frequency spectrum analysis of audio signals is usually
demands in occupational therapy. carried out on a short segment, called a “frame,” by which
the short-time Fourier transform can capture the frequency
3. Semantic Matching Model of Folk Music content changes in time. The mathematical expression of this
Based on Audio Emotion Analysis transformation is the addition of a window function to the dis-
crete signal, which is generally bell shaped and stationary over
3.1. Feature Extraction of Ethnic Music Audio. The automated a short period of time.
description of music content is based on the feature extraction
of computable time-frequency domain signals. The concept 3.2. Semantic Emotion Recognition Algorithm. In this study, the
and extraction process of each feature will be described below. feature extraction of speech emotion recognition algorithm and
Frequency is a simple sine curve defined as the number of the learning method of training model are used for recognition,
cycles per second, or Hertz (Hz). For example, a sine wave as well as the calculation and judgment of speech emotion
has a frequency of 440 Hz, or 440 cycles per second. The recip- recognition through the query of emotion dictionary combined
rocal of frequency is the period, and the physical meaning is with sentence structure. When semantic understanding is
the number of seconds, that is, the time interval of sinusoidal carried out through emotion dictionary query, it is necessary
signal in a oscillation period [18]. In the time domain, the to establish an emotion dictionary library to obtain the annota-
analogy signal is sampled once every second to obtain the data tion value of emotion. In this paper, some famous emotion dic-
signal. The spectrum diagram of the time domain signal is the tionaries are sorted out to establish the emotion dictionary
4 Occupational Therapy International

Tone Movement Emotion Quality

Master template

Semantic level
Feature level
Alignment
Melody model
template
Imported Matching
templates Pitch calculation
Proper extraction Matching
vector results
Note
Feature segmentation
extraction

Subframe
Pretreatment add window

MIDI, MP3 Smoothing

Physical level

Figure 2: Semantic matching technology framework based on audio sentiment analysis.

library. In this paper, the dictionary of negative words used in tence is split into word sets. Text slice transformation is the
this paper is constructed by selecting common negative word basis of text information processing; emotion recognition tech-
dictionaries and adding popular negative descriptions on the nology cannot do without text slice transformation. Through
Internet. Emotion recognition is carried out by querying the the study of Chinese natural language processing, text cutting
emotion dictionary, and emotion words and judgment words conversion uses language rules to find the separation between
are marked with familiar words, and adverbs of degree are words and their own words and mark the parts of speech. Only
marked numerically. Different emotional words represent dif- through the multichannel fusion of multidomain knowledge
ferent emotional tendencies and sizes, so their annotations can we do a good job of word segmentation.
are also different. Different degree adverbs have different func-
tions and degrees, so determining the value of degree adverbs is 3.3. Semantic Matching Technology Framework Based on Audio
also an important part in the process of emotion recognition Emotion Analysis. In music retrieval, feature selection, repre-
[18]. In the process of emotion recognition, adverbs of degree sentation, and matching are the core techniques. Based on
need to be marked numerically. Through the modification of the research and analysis of music physical and perceptual
adverbs of degree, the intensity of emotion will also change. characteristics, this paper takes melody as the main feature
In this paper, degree adverbs are divided into four grades, and establishes melody representation model through pitch
and then by setting different emotional weights for each degree extraction and dynamic threshold segmentation algorithm to
adverb, degree words can modify the following emotional retrieve music data sets and input music samples. In order to
words through this processing. By sorting out and processing improve the retrieval accuracy, genetic algorithm was used to
the emotion word dictionary, degree word dictionary, and neg- align the template and correct the individual difference of
ative word dictionary, this paper builds an emotion dictionary matching input. The fusion Euclidean distance and similarity
base of common words on the basis of previous studies, which measure matching template is applied to enhance fault toler-
can be used for text emotion recognition in the future. Affective ance and generalization ability. Finally, the effectiveness of
orientation is the value of the valence direction in the dimen- the algorithm is verified by prototype system. Musical features
sional coordinate emotion model, which represents an emo- can be roughly divided into three levels—physical features,
tional orientation and represents the parameters of the acoustic features, and perceptual features. The semantic match-
positive and negative emotions and the positive and negative ing technical framework based on audio sentiment analysis is
degrees of the parties. The semantic emotion recognition sys- shown in Figure 2. Physical features mainly refer to the audio
tem based on emotion dictionary is quick to construct, with content recorded by physical carriers in a certain format, which
good recognition effect and fast recognition speed [19]. The is presented in the form of streaming media. Acoustic level fea-
proposed sentiment analysis algorithm is mainly composed of tures mainly include time and frequency domain features, such
text cutting and conversion, sentiment location, and sentiment as pitch frequency, short-term energy, zero crossing rate, LPC
aggregation. After obtaining the text information, the text infor- coefficient, and MFCC coefficient, which are the performance
mation needs to be converted into text cutting first, and the sen- characteristics of audio itself and are often used in each stage
Occupational Therapy International 5

Excitement
Audio feature
Audio ﬁle
extraction Sadness

National music
Fear
Preprocess Anxiety

Relief
Feature word Joy
Text lyrics
extraction

Figure 3: Structure diagram of speech emotion recognition system with semantic combination.

of speech recognition. Perceptual characteristics reflect people’s fusion after emotional tendencies. The multichannel fusion
descriptions of music feelings, such as pitch, rhythm, tone process of text emotion recognition and acoustic emotion
strength, and timbre. Perceptual characteristics can usually be recognition will be carried out in the judgment of emotion
extracted from physical characteristics, and it is more consis- orientation, namely, the valence axis in the dimensional emo-
tent with human recognition and judgment of music content. tion model.
Music is a discrete sequence of notes that changes over This gap is a relatively small difference in emotion recog-
time, yet feels like a complete entity of notes that change nition. When prosodic features are difficult to identify the
over time. The melody contour of music is the characteristic positive emotion, the method of believing the text emotion
of pitch changes with time, and pitch is determined by the orientation is adopted to adjust the emotion value [21],
pitch frequency of music. Therefore, melody contour can and the emotional tendency of the speech channel and the
be extracted and described by extracting pitch and describ- text channel is identified. The output data of speech channel
ing pitch with appropriate model. This paper proposes a are the result of recognition, the probability of positive emo-
melody representation model based on the standard tem- tional orientation, and the probability of negative emotional
plate and input template, extracts the pitch template of the orientation. The output data of the text channel is the affec-
audio file input by the user from the chord music file, and tive tendency value obtained through calculation. The two
establishes the relationship between the standard pitch fre- channel results of the model are used as input signals to
quency templates, because these two templates belong to the final model. The decision level fusion model is used to
the category of pitch frequency, and their appearance is sim- fuse the two recognition results. Finally, the recognition
ilar after normalization. On the basis of the above research, and classification of discrete emotion model based on
the melody representation model is further improved, and semantic combination is realized.
the appropriate matching algorithm is proposed to match
the final retrieval results. 3.5. Feature Model Training. This design sets up a four-layer
model, based on the model, respectively, using audio fea-
3.4. Construction of Semantic and Audio Emotion Recognition tures and lyric features to classify emotions. The input layer
Model. The model uses prosodic feature, spectral feature, in the network model is set as 100 and 130 nodes, respec-
voice quality, and audio feature to carry out emotion recogni- tively, according to the characteristic dimension of the input,
tion. With the development of speech denoising technology, and the learning rate is initialized as 0.05. The number of
the accuracy of speech recognition has been improved. The nodes in the hidden layer compresses the data layer by layer,
text recognized by speech has strong credibility, and the and the number of nodes in the hidden layer is 400, 200, and
semantic recognition by querying the dictionary can ulti- 150, and the number of iterations is 150. The SoftMax layer
mately preliminarily determine the semantic emotional state at the end of the network structure takes the emotional cat-
of speech information [20]. This paper establishes a speech egory of music as the output, with a total of 4 categories, so
emotion recognition system combining semantics, and the the number of nodes in the output layer is set to 4.
overall system structure is shown in Figure 3. Phonetic fea- Figure 4 shows the influence of iteration times on match-
tures have a high recognition rate in emotional intensity, ing error during network model training. By analysing the
but there is a certain limitation in the identification of emo- above figure, it can be clearly obtained that the relationship
tional orientation. Semantic emotion recognition, on the between the number of iterations and the matching error is
other hand, has a very good performance in identifying emo- that the latter gradually decreases with the increase of the
tional tendencies. Semantic combination of speech emotion former. When the number of iterations increases from 1 to
recognition model is set up, respectively, to identify the text 100, the classification error decreases rapidly. When the
emotion tendentiousness and voice acoustic characteristics number of iterations increases to 150 to 200, the variation
described in the emotional tendency and active situation, of classification error is no longer obvious. On the other
and then, the text recognition tendency and acoustic charac- hand, it can be concluded that the feature extraction ability
teristics identification of multimodal integration level of of deep confidence network with deep structure is stronger
emotional tendency to make decisions get a multimodal than that of shallow network.
6 Occupational Therapy International

Matching error
50

0 50 100 150 200

Iteration
Function loss

Figure 4: Network model training iteration error graph.

4. Experimental Results and Analysis culation of multichannel fusion channel is obtained by

The music samples in this paper are from 1200 songs in the W Wv
database, 1000 of which are used as training samples and the We = + : ð3Þ
Ws 1 − w
remaining 200 as test samples. The emotions of the sample
songs were divided into 6 categories: excitement, sadness,
In the formula, we output multichannel fusion results;
fear, anxiety, relaxation, and joy. The audio was in a unified
W e is the weight of text channel input in the weighted mul-
format, and 30 s of music fragments with the most emotional
tichannel fusion process, W s is the text channel input, and
representation were selected for the classification of musical
WV is the voice channel input.
emotions. One thousand songs were randomly divided into
Finally, the positive and negative values are used to judge
five groups, and then, the five groups were trained
the emotion recognition result after multichannel fusion. By
separately.
adjusting different weights, the recognition rate of each
4.1. Recognition Rate of Semantic Matching Model. The weight is counted, and the weight with the best recognition
experimental training set carries out multichannel fusion of result is selected as the parameter of the multichannel fusion
the two sets of data. In the recognition process, the recogni- channel of the multichannel model. Figure 5 shows the rec-
tion results of the speech channel need to be processed first ognition rate when different weights W are selected in the
and converted into the data that can be used as the input of semantic matching model.
multichannel fusion, which is converted into According to the figure, when W =0.65, the best recogni-
tion rate obtained by the multichannel fusion model is
88.4.2%. Under the optimal parameters, the recognition rate
W v = Pp − Pn , ð1Þ of emotion, sadness, fear, anxiety, ease, and joy is 80.3%,
82.6%, 87.5%, 83.4%, 81.2%, and 88.4%, respectively. The
where W v is voice channel input, Pp is the probability semantically integrated speech emotion recognition experi-
ment finally locates each sample to each specific emotion
that speech recognition results tend to be positive, and Pn by combining the recognition results of the activation coor-
is the speech recognition results that tend to be the negative dinate axis of each sample and the recognition results of the
probability. multichannel emotion orientation. The matching recogni-
The output of this formula is a value from minus one to tion rate was 85.3%, 82.5%, 80.3%, 79.5%, 76.2%, and
one, and the output of the text channel is also processed 81.4%, respectively.
accordingly. By multiplying the result of text emotion recog-
nition by a smaller weight, it is converted into the input data 4.2. Evaluation of Audio Spatial Feature Vectors. The
of the text channel and transformed into method of graph learning is used to make a series of spatial
feature vectors and original feature vectors by integrating
W s = 0:02 × As , ð2Þ audio and text information, which proves the validity of
the musical space representation. In the original space, due
to the heterogeneity between different modal features, in
where W s in the text channel input; As is the sentence the graph learning method, the test samples of each mode
emotion value. In this way, text channel input and voice can only search for training samples in their own space,
channel input are transformed to the same data format while the spatial representation method can make the test
and order of magnitude, and then, the weighted multichan- samples search for modes. The classification accuracy of spa-
nel fusion is carried out for the two groups of data. The cal- tial learning methods is shown in Figure 6.
Occupational Therapy International 7

75 78 81 84 79.8 81.9 84.0 86.1 75.9 79.2 82.5 85.8 79.5 81.0 82.5 84.0 75.4 78.3 81.2 84.1

85.8
83.6
Excitement
81.4
79.2
84 84
81 81
Sadness
78 78
75 75
86.1 86.1
84.0 84.0
Fear 81.9
81.9
79.8 79.8
85.8 85.8
82.5 82.5
Anxiety
79.2 79.2
75.9 75.9
84.0 84.0
82.5 82.5
81.0 Relief 81.0
79.5 79.5
84.1
81.2
78.3 Joy
75.4
79.2 81.4 83.6 85.8 75 78 81 84 79.8 81.9 84.0 86.1 75.9 79.2 82.5 85.8 79.5 81.0 82.5 84.0

Figure 5: Graph of recognition rate of semantic matching model.

It can be seen from the figure that the classification accu- on affective semantic relevance to solve the problem of nonde-
racy of using spatial features is better than that of original spa- scriptive music query processing in music retrieval system, in
tial features in most classes, which proves the validity of spatial order to verify the influence of different music matching
features. This validity is precisely because spatial features make methods on model recognition ability. The verification of the
better use of the correlation between modal data than original combined emotion recognition system is achieved by compar-
spatial features. When the spatial dimension of music data ing the experimental results obtained from three experiments,
changes, the classification accuracy reaches the highest when namely, the emotion recognition based on voice acoustic fea-
the spatial dimension is 25. It can be seen from W that these tures and the speech emotion recognition based on text emo-
two curves do not have the change rule of Ming Si. Such irreg- tion recognition and semantic combination. The experimental
ularity of single mode and result improvement in multimode results and data of the three experiments are shown in Figure 7.
fusion indirectly prove the effectiveness of spatial representa- The data in the graph is expressed in the form of matrix
tion in describing the correlation between modes. graph, and the recognition rate of the three emotion recogni-
tion algorithms can be seen intuitively. The recognition
4.3. Ability to Identify Emotional Tendencies. The audio in the accuracy of traditional support vector machine can be
data set is uniformly converted into WAV format. Each song improved slightly by dimensional-classification recognition
has a long audio time and often has the same music melody. method. On this basis, the recognition rate can be improved
Therefore, 30 music clips of 15-45 seconds per song were used significantly by combining semantic recognition results with
as audio data for emotional classification. Since all the audio multichannel fusion. Among them, the recognition rate of
files in the data set cannot be directly input as training data, it anxiety and fear with low activity increased greatly, indicat-
is necessary to extract representative music features from the ing that there is some complementarity between text infor-
audio files. The accuracy of audio emotion construction and mation and phonological features in emotion recognition.
the coverage of emotion will affect the result of emotion analy-
sis. In different scenarios, the effect of audio sentiment analysis 4.4. Emotional Semantic Matching Accuracy. Accuracy rate
may not be as expected. The main purpose of the experiment is and recall rate were used as evaluation indexes. Accuracy is
to verify the ability of the proposed music retrieval model based the number of correctly matched words divided by the total
8 Occupational Therapy International

0.6

14
15

12
16

11
0.5

10
18
0.4

9
19
0.3

8
20

7
0.2 21
0.1 6
22
0.0 5
23 4
3
24
Average accuracy

2
25
1

26
49
27 48

28 47
0.0 46
29
0.1 45
30
0.2

44
0.3 31

43
32

0.4

42
33

41
0.5
34

40
35

39
38
36

0.6
37

Matching accuracy

Figure 6: Statistical chart of accuracy of audio space feature vector analysis.

Recognition capability
94.90
Joy

89.92
Relief

Anxiety 84.94

Fear
79.96

Sadness
74.98

Excitement
70.00
17 34 51 68 85

Figure 7: Semantic matching speech emotion recognition experimental results data graph.

number of veriﬁed data. Precision is the ratio of the number A

R= : ð4Þ
of correctly matched topics to the number of topics judged A+C
by the model. The recall rate is the ratio of the number of
correctly matched topics to the number of topics that belong When random bars were selected as music units in the
to that topic. Recall rate formula is as follows: experiment, the reason why the two emotional themes of
Occupational Therapy International 9

Exact value 85

Excitement Sadness Fear Anxiety Relief Joy

Exact value Mean value

Center line Abnormal value

Figure 8: Adaptive matching eﬀect of music unit validation model.

sadness and joy were selected in the experiment was that the feasibility and advantage of research results. In the process
they had higher emotional differentiation. Compared with of participating in the establishment of emotional speech data-
the experimental results without emotion screening, the base, through the analysis of annotation data, the potential rela-
experimental results are significantly improved, especially tionship between emotion types and dimension coordinates is
the effective information in the sequence can be better cap- summarized, so that the expression of emotion can be more
tured. Module and attention mechanism are effective in detailed and comprehensive, and the changing trend of emotion
extracting sequence features. After training, they can effec- can be better reflected. By identifying the transformation model
tively distinguish different emotions and are less affected for training, the data amount of training set corresponding to
by the intensity of emotions. According to the experimental the training speed is increased effectively. The recognition accu-
results, the music climax segment is selected as the adaptive racy of traditional support vector machine can be improved
matching effect of the music unit validation model, and the slightly by dimensional-classification recognition method. On
experimental results are shown in Figure 8. this basis, the recognition rate can be improved significantly
Figure can be seen that using highlights data model in 6 by combining semantic recognition results with multichannel
emotional subject adaptability experiment accuracy is over fusion. However, there are some limitations in the research pro-
75%, in sadness and joy these two emotions obviously on cess, and further research is still needed in music classification.
the theme of the accuracy of more than 82%, therefore, Emotion recognition technology has a wide range of application
although the model has certain accuracy, but on the whole scenarios, many fields have extremely important influence and
model can be done more effectively music clips and words demand, and has great research value. In the future, I will con-
to match. The matching effect of excitement and ease is bet- tinue to improve the semantic matching model and study the
ter, and the accuracy rate is 83.35% and 84.81%, respectively. influence of emotional cognition on music retrieval system.
The matching effect of emotional anxiety is poor, and both
emotional focus and fear need to be improved. It performs
best in the results of emotional semantic matching between Data Availability
music and text. Audio features can describe the audio fea-
tures of songs more effectively, and the feature vector model The data used to support the findings of this study are avail-
has higher accuracy and the best analysis effect. able from the corresponding author upon request.

5. Conclusion Conflicts of Interest

With the development of the Internet, folk music retrieval is not The author declares no known competing financial interests
limited to text retrieval, but music retrieval through the audio or personal relationships that could have appeared to influ-
information of songs. Based on the music audio signal charac- ence the work reported in this paper.
teristic, the essential characteristics extracted from folk songs
of audio data, using artificial extraction sequence characteristics
as the training data and semantic matching algorithm, the pro- Acknowledgments
posed songs build audio semantic matching model of sentiment
analysis, and on this basis, the matching accuracy is used to This work was supported by the School of Music Shaanxi
identify the semantic combination of research and finally prove Normal University.
10 Occupational Therapy International

References [16] Y. R. Pandeya and J. Lee, “Deep learning-based late fusion of

multimodal information for emotion classification of music
[1] D. Johnston, H. Egermann, and G. Kearney, “Innovative com- video,” Multimedia Tools and Applications, vol. 80, no. 2,
puter technology in music-based interventions for individuals pp. 2887–2905, 2021.
with autism moving beyond traditional interactive music ther- [17] M. Belyk, S. Brown, J. Lim, and S. A. Kotz, “Convergence of
apy techniques,” Cogent Psychology, vol. 5, no. 1, p. 1554773, semantics and emotional expression within the IFG pars orbi-
2018. talis,” NeuroImage, vol. 156, pp. 240–248, 2017.
[2] M. Barthet, G. Fazekas, and A. Allik, “From interactive to [18] R. Lu, Z. Duan, and C. Zhang, “Listen and look: audio–visual
adaptive mood-based music listening experiences in social or matching assisted speech source separation,” IEEE Signal Pro-
personal contexts,” Journal of the Audio Engineering Society, cessing Letters, vol. 25, no. 9, pp. 1315–1319, 2018.
vol. 64, no. 9, pp. 673–682, 2016.
[19] X. Zhang, W. Lu, F. Li, R. Zhang, and J. Cheng, “A deep neural
[3] R. Abboud and J. Tekli, “Integration of nonparametric fuzzy
architecture for sentence semantic matching,” International
classification with an evolutionary-developmental framework
Journal of Computational Science and Engineering, vol. 21,
to perform music sentiment-based analysis and composition,”
no. 4, pp. 574–582, 2020.
Soft Computing, vol. 24, no. 13, pp. 9875–9925, 2020.
[4] J. R. Binder, L. L. Conant, C. J. Humphries et al., “Toward a [20] M. Carr and T. Nielsen, “Morning REM sleep naps facilitate
brain-based componential semantic representation,” Cognitive broad access to emotional semantic networks,” Sleep, vol. 38,
Neuropsychology, vol. 33, no. 3-4, pp. 130–174, 2016. no. 3, pp. 433–443, 2015.
[5] E. B. Lange and K. Frieler, “Challenges and opportunities of [21] M. Moffat, P. D. Siakaluk, D. M. Sidhu, and P. M. Pexman,
predicting musical emotions with perceptual and automatized “Situated conceptualization and semantic processing: effects
features,” Music Perception: An Interdisciplinary Journal, of emotionasl experience and context availability in semantic
vol. 36, no. 2, pp. 217–242, 2018. categorization and naming tasks,” Psychonomic Bulletin &
Review, vol. 22, no. 2, pp. 408–419, 2015.
[6] S. R. Livingstone and F. A. Russo, “The Ryerson audio-visual
database of emotional speech and song (RAVDESS): a
dynamic, multimodal set of facial and vocal expressions in
north American English,” PLoS One, vol. 13, no. 5, article
e0196391, 2018.
[7] J. K. Johnson and M. L. Chow, “Hearing and music in demen-
tia,” Handbook of Clinical Neurology, vol. 129, pp. 667–687,
2015.
[8] J. Lian, “Optimization of music teaching management system
for college students based on similarity distribution method,”
Mathematical Problems in Engineering, vol. 2022, 11 pages,
2022.
[9] G. Ragone, J. Good, and K. Howland, “How technology
applied to music-therapy and sound-based activities addresses
motor and social skills in autistic children,” Multimodal Tech-
nologies and Interaction, vol. 5, no. 3, p. 11, 2021.
[10] R. Vaudreuil, L. Avila, J. Bradt, and P. Pasquina, “Music ther-
apy applied to complex blast injury in interdisciplinary care: a
case report,” Disability and Rehabilitation, vol. 41, no. 19,
pp. 2333–2342, 2019.
[11] D. Girgin, “An investigation of the songs created by student-
teachers in music via an interdisciplinary approach based on
the Rasch measurement model and the Maxqda analysis pro-
gram,” International Online Journal of Education and Teach-
ing, vol. 7, no. 4, pp. 1661–1687, 2020.
[12] P. N. Juslin, G. Barradas, and T. Eerola, “From sound to signif-
icance: exploring the mechanisms underlying emotional reac-
tions to music,” The American Journal of Psychology, vol. 128,
no. 3, pp. 281–304, 2015.
[13] B. Gingras, M. T. Pearce, M. Goodchild, R. T. Dean,
G. Wiggins, and S. McAdams, “Linking melodic expectation
to expressive performance timing and perceived musical ten-
sion,” Journal of Experimental Psychology: Human Perception
and Performance, vol. 42, no. 4, pp. 594–609, 2016.
[14] G. Vaidya and P. C. Kalita, “Understanding emotions and their
role in the design of products: an integrative review,” Archives
of Design Research, vol. 34, no. 3, pp. 5–21, 2021.
[15] G. Xu, W. Li, and J. Liu, “A social emotion classification
approach using multi-model fusion,” Future Generation Com-
puter Systems, vol. 102, pp. 347–356, 2020.

Music and Emotions Thesis
100% (3)
Music and Emotions Thesis
5 pages
Emotion Classification For Musical Data Using Deep Learning Techniques
No ratings yet
Emotion Classification For Musical Data Using Deep Learning Techniques
8 pages
Feel The Beat Through Emotion Using Convolutional Neural Network
No ratings yet
Feel The Beat Through Emotion Using Convolutional Neural Network
5 pages
Emotify An AI-Powered Emotion-Based Music Recommendation System
No ratings yet
Emotify An AI-Powered Emotion-Based Music Recommendation System
6 pages
Music Emotion Recognition: Homer H. Chen National Taiwan University
No ratings yet
Music Emotion Recognition: Homer H. Chen National Taiwan University
54 pages
Music and Emotions PDF
No ratings yet
Music and Emotions PDF
30 pages
Music Emotion Classification and Context-Based Music Recommendation
No ratings yet
Music Emotion Classification and Context-Based Music Recommendation
28 pages
Music Emotion Recognition
No ratings yet
Music Emotion Recognition
12 pages
Smart Music Player Project
No ratings yet
Smart Music Player Project
28 pages
JETIR2202408
No ratings yet
JETIR2202408
7 pages
Automatic Mood Detection and Tracking of Music Audio Signals
No ratings yet
Automatic Mood Detection and Tracking of Music Audio Signals
14 pages
Modeling of Emotional Content
No ratings yet
Modeling of Emotional Content
6 pages
Examining Emotion Perception Agreement in Live Music Performance
No ratings yet
Examining Emotion Perception Agreement in Live Music Performance
19 pages
Music Emotion Recognition Intention of Composers-Performers Versus Perception of Musicians Non-Musicians and Listening Machines
No ratings yet
Music Emotion Recognition Intention of Composers-Performers Versus Perception of Musicians Non-Musicians and Listening Machines
12 pages
Facial Expression Based Music Recommendation System
No ratings yet
Facial Expression Based Music Recommendation System
10 pages
02 Extracting Emotions
No ratings yet
02 Extracting Emotions
10 pages
Automaticmusicemotionclassification PDF
No ratings yet
Automaticmusicemotionclassification PDF
14 pages
Design of Music Emotion Analysis and Creation Aid System Based On Machine Learning
No ratings yet
Design of Music Emotion Analysis and Creation Aid System Based On Machine Learning
12 pages
Music and Emotions
No ratings yet
Music and Emotions
10 pages
Fujipress - JACIII 24 7 8
No ratings yet
Fujipress - JACIII 24 7 8
10 pages
Emotion Based Music Recommendation System Using Machine Learning and AI
No ratings yet
Emotion Based Music Recommendation System Using Machine Learning and AI
8 pages
A Survey On Speech Emotion Based Music Recommendation System
No ratings yet
A Survey On Speech Emotion Based Music Recommendation System
4 pages
Acquiring Mood Information From Songs in Large Music Database PDF
No ratings yet
Acquiring Mood Information From Songs in Large Music Database PDF
7 pages
Smart Music Player Based On Emotion Recognition From Facial Expression
No ratings yet
Smart Music Player Based On Emotion Recognition From Facial Expression
3 pages
1 s2.0 S0378437122002291 Main
No ratings yet
1 s2.0 S0378437122002291 Main
20 pages
Dafx08 01
No ratings yet
Dafx08 01
6 pages
Zeroth Review 920&860
No ratings yet
Zeroth Review 920&860
16 pages
Music Recommendation System Using Facial Detection Based Emotion Analysis
No ratings yet
Music Recommendation System Using Facial Detection Based Emotion Analysis
6 pages
Music Recommendation System Based On Facial Expression
No ratings yet
Music Recommendation System Based On Facial Expression
5 pages
Lyrics, Music, and Emotions: Rada Mihalcea Carlo Strapparava
No ratings yet
Lyrics, Music, and Emotions: Rada Mihalcea Carlo Strapparava
10 pages
Shi24i Interspeech
No ratings yet
Shi24i Interspeech
5 pages
SSRN 4575257
No ratings yet
SSRN 4575257
9 pages
Aljanaki
No ratings yet
Aljanaki
149 pages
Đề xuất bài hát thông qua phân tích cảm xúc
No ratings yet
Đề xuất bài hát thông qua phân tích cảm xúc
14 pages
Disambiguating Music Emotion Using Software Agents: Dan Yang Wonsook Lee
No ratings yet
Disambiguating Music Emotion Using Software Agents: Dan Yang Wonsook Lee
6 pages
ANN Report
No ratings yet
ANN Report
14 pages
Emotion Based Music Recommendation System
No ratings yet
Emotion Based Music Recommendation System
4 pages
Emotion Based Music Recommendation System Using Deep Learning Model
No ratings yet
Emotion Based Music Recommendation System Using Deep Learning Model
6 pages
Music Player
No ratings yet
Music Player
5 pages
Novel Audio Features For Music Emotion Recognition - Renato Panda, Ricardo Malheiro, Rui Pedro Paiva
No ratings yet
Novel Audio Features For Music Emotion Recognition - Renato Panda, Ricardo Malheiro, Rui Pedro Paiva
13 pages
Emotion-Based Music Recommendation System
No ratings yet
Emotion-Based Music Recommendation System
5 pages
Mtech Paper Local 221014 132533
No ratings yet
Mtech Paper Local 221014 132533
5 pages
ANN Based Facial Emotion Detection and Music Selection
No ratings yet
ANN Based Facial Emotion Detection and Music Selection
5 pages
JETIR2305271
No ratings yet
JETIR2305271
5 pages
Free
No ratings yet
Free
3 pages
Jose Fornari CMMR Springer Article
No ratings yet
Jose Fornari CMMR Springer Article
18 pages
Cmmr2012 Submission 50
No ratings yet
Cmmr2012 Submission 50
8 pages
New Final Poster
No ratings yet
New Final Poster
1 page
Exploring Relationships Between Audio Features and Emotion in Music - Cyril Laurier, Olivier Lartillot, Tuomas Eerola, Petri Toiviainen
No ratings yet
Exploring Relationships Between Audio Features and Emotion in Music - Cyril Laurier, Olivier Lartillot, Tuomas Eerola, Petri Toiviainen
5 pages
V11i2 1171
No ratings yet
V11i2 1171
12 pages
IJCRT2106505
No ratings yet
IJCRT2106505
5 pages
Music Generation With NLP-5
No ratings yet
Music Generation With NLP-5
3 pages
Emotion Based Music Recomentation System
No ratings yet
Emotion Based Music Recomentation System
2 pages
Music Emotion Recognition Toward New Robust Standards in Personalized and Context-Sensitive Applications
No ratings yet
Music Emotion Recognition Toward New Robust Standards in Personalized and Context-Sensitive Applications
9 pages
Melodic Mood Emotion-Based Music Recommendation Sy
No ratings yet
Melodic Mood Emotion-Based Music Recommendation Sy
11 pages
Ref 3
No ratings yet
Ref 3
5 pages
IRJET V6I340320190826 49615 bg0qqz Libre
No ratings yet
IRJET V6I340320190826 49615 bg0qqz Libre
6 pages
Music System Based On Emotional Images
No ratings yet
Music System Based On Emotional Images
1 page
2024 Accounting Grade 10 Project - QP
No ratings yet
2024 Accounting Grade 10 Project - QP
5 pages
Senator Ron Johnson Interim Report - The Clinton Email Scandal and The FBI's Investigation of It
100% (6)
Senator Ron Johnson Interim Report - The Clinton Email Scandal and The FBI's Investigation of It
25 pages
The Fastest Indian Vegetarian Diet To Lose Weight - 7 Days GM Diet
50% (2)
The Fastest Indian Vegetarian Diet To Lose Weight - 7 Days GM Diet
14 pages
Se Ela Preguntar
No ratings yet
Se Ela Preguntar
8 pages
Elements of The Traditional Music of Thailand
80% (5)
Elements of The Traditional Music of Thailand
8 pages
Mitosis
No ratings yet
Mitosis
15 pages
MCA 301 Data Mining Notes
No ratings yet
MCA 301 Data Mining Notes
6 pages
MB-409 International Marketing All Unit Notes RGPV
No ratings yet
MB-409 International Marketing All Unit Notes RGPV
51 pages
2011 Design House Catalog
No ratings yet
2011 Design House Catalog
220 pages
E Mahesh PGT Mathematics
No ratings yet
E Mahesh PGT Mathematics
14 pages
Csta Standards Mapped To Commoncorestandards
No ratings yet
Csta Standards Mapped To Commoncorestandards
6 pages
Practical Experiments 1st Paper
No ratings yet
Practical Experiments 1st Paper
38 pages
Success Against The Odds
No ratings yet
Success Against The Odds
194 pages
GATE 2024 EC Memory-Based
No ratings yet
GATE 2024 EC Memory-Based
38 pages
Fashion Business Research - Assignment
No ratings yet
Fashion Business Research - Assignment
28 pages
Música - A Test To Evaluate The Musical Perception of People With Hearing Impairment
No ratings yet
Música - A Test To Evaluate The Musical Perception of People With Hearing Impairment
18 pages
Música y Salud
No ratings yet
Música y Salud
16 pages
Música - Uso de La Música para La Regulación de Las Emociones
No ratings yet
Música - Uso de La Música para La Regulación de Las Emociones
29 pages
USB Devices As VMFS Datastore in Vsphere ESXi 70 Virtennet
No ratings yet
USB Devices As VMFS Datastore in Vsphere ESXi 70 Virtennet
14 pages
Combinatorial Geometry - Po-Ru Loh - MOP (Black) 2010
No ratings yet
Combinatorial Geometry - Po-Ru Loh - MOP (Black) 2010
2 pages
Machine Learning Ai in Medical Devices
No ratings yet
Machine Learning Ai in Medical Devices
24 pages
Nursing in Research in Malawi
100% (1)
Nursing in Research in Malawi
28 pages
Música, Inteligencia y Ansiedad
No ratings yet
Música, Inteligencia y Ansiedad
14 pages
Música - Comprensión Multimodal Del Papel de
No ratings yet
Música - Comprensión Multimodal Del Papel de
32 pages
2nd Share Capital of A Company
No ratings yet
2nd Share Capital of A Company
34 pages
IDE Faith Sharing
No ratings yet
IDE Faith Sharing
9 pages
Música - La Formación Musical Con El Programa Démos Influye Positivamente en Las Funciones Cognitivas de Niños de Estrato Socioeconómico Bajo
No ratings yet
Música - La Formación Musical Con El Programa Démos Influye Positivamente en Las Funciones Cognitivas de Niños de Estrato Socioeconómico Bajo
21 pages
XML Tutorial For Beginners
No ratings yet
XML Tutorial For Beginners
28 pages
Música - Integrated Parent-Child Music Classes For Preschoolers With and
No ratings yet
Música - Integrated Parent-Child Music Classes For Preschoolers With and
19 pages
Música - Canciones de Amor y Serenatas, Revisión Teórica de La Música y Las Relaciones Románticas.
No ratings yet
Música - Canciones de Amor y Serenatas, Revisión Teórica de La Música y Las Relaciones Románticas.
21 pages
Zudio Bill 1
No ratings yet
Zudio Bill 1
3 pages
Paper Eng
No ratings yet
Paper Eng
21 pages
Holiday Homework 8th-1
No ratings yet
Holiday Homework 8th-1
4 pages
Nursing: A Concept-Based Approach To Learning: Volume One, Third Edition
No ratings yet
Nursing: A Concept-Based Approach To Learning: Volume One, Third Edition
32 pages
Música - Análisis de La Función de Configuración Psicológica de La Música.
No ratings yet
Música - Análisis de La Función de Configuración Psicológica de La Música.
1 page
Música - Implementación de Música de Piano Asistida Por Computadora
No ratings yet
Música - Implementación de Música de Piano Asistida Por Computadora
1 page
Música e Inteligencia
No ratings yet
Música e Inteligencia
1 page
Behaviour Management of An Anxious Child
No ratings yet
Behaviour Management of An Anxious Child
5 pages
Cardio (PP012) Quiz 1 Grades
No ratings yet
Cardio (PP012) Quiz 1 Grades
7 pages
Eaton Sure Lites Sel25 50 60 Spec PDF
No ratings yet
Eaton Sure Lites Sel25 50 60 Spec PDF
7 pages
CIN: U40109MH2005SGC153645: (A Govt. of Maharashtra Undertaking)
No ratings yet
CIN: U40109MH2005SGC153645: (A Govt. of Maharashtra Undertaking)
1 page
Homework: Level 3 BTEC Applied Science Unit 1 Past Paper Exam Questions
No ratings yet
Homework: Level 3 BTEC Applied Science Unit 1 Past Paper Exam Questions
3 pages

Música - Terapia Ocupacional Basada en Análisis de Emociones en Audio

Uploaded by

Música - Terapia Ocupacional Basada en Análisis de Emociones en Audio

Uploaded by

Hindawi

Occupational erapy International

Correspondence should be addressed to Wensi Ouyang; [email protected]

Academic Editor: Sheng Bin

Fourier transform spectrum diagram

Phase position –30

Sinusoidal frequency spectrum diagram

Figure 1: Spectrum diagram of the dispersion music signal.

Tone Movement Emotion Quality

MIDI, MP3 Smoothing

Figure 2: Semantic matching technology framework based on audio sentiment analysis.

0 50 100 150 200

Figure 4: Network model training iteration error graph.

4. Experimental Results and Analysis culation of multichannel fusion channel is obtained by

Figure 5: Graph of recognition rate of semantic matching model.

Figure 6: Statistical chart of accuracy of audio space feature vector analysis.

number of veriﬁed data. Precision is the ratio of the number A

Excitement Sadness Fear Anxiety Relief Joy

Exact value Mean value

Figure 8: Adaptive matching eﬀect of music unit validation model.

5. Conclusion Conflicts of Interest

References [16] Y. R. Pandeya and J. Lee, “Deep learning-based late fusion of

You might also like