
ECMA-418-2

2nd Edition / December 2022

Psychoacoustic metrics
for ITT equipment —
Part 2 (models based on
human perception)

Reference number
ECMA-123:2009

© Ecma International 2009

© Ecma International 2022

Contents

1 Scope
2 Conformance
3 Normative references
4 Terms and definitions
5 A hearing model approach to calculate psychoacoustic parameters
5.1 Psychoacoustic hearing model
5.1.1 Overview
5.1.2 Pre-processing of input data
5.1.3 Outer and middle/inner ear filtering
5.1.4 Auditory filtering bank
5.1.5 Segmentation
5.1.6 Rectification
5.1.7 Calculation of root-mean-square values
5.1.8 Nonlinearity to transform sound pressure into specific loudness
5.1.9 Consideration of threshold in quiet
6 Identification and evaluation of prominent tonalities using a psychoacoustic tonality calculation method
6.1 Determination of tonality
6.1.1 Tonalities and their relationships to the threshold of hearing
6.1.2 Multiple tones in a critical band, and time-variation of tonality due to their interaction
6.2 Psychoacoustic tonality calculation method
6.2.1 Overview
6.2.2 Autocorrelation function
6.2.3 Averaging of ACFs
6.2.4 Application of ACF window
6.2.5 Estimation of tonal loudness
6.2.6 Resampling to common time basis
6.2.7 Noise reduction
6.2.8 Calculation of time-dependent specific tonality
6.2.9 Calculation of averaged specific tonality
6.2.10 Calculation of time-dependent tonality
6.2.11 Calculation of representative values
6.3 Information to be recorded for prominent tonalities
7 Identification and evaluation of prominent roughness using a psychoacoustic roughness calculation method
7.1 Psychoacoustic roughness calculation method
7.1.1 Overview
7.1.2 Envelope calculation and downsampling
7.1.3 Calculation of scaled power spectrum
7.1.4 Noise reduction of the envelopes
7.1.5 Spectral weighting
7.1.6 Optional entropy weighting based on randomness of modulation rate
7.1.7 Calculation of time-dependent specific roughness
7.1.8 Calculation of representative values
7.1.9 Calculation of time-dependent roughness
7.1.10 Calculation of representative values
7.1.11 Calculation of roughness for binaural signals
7.2 Information to be recorded for prominent roughness
8 Improved identification and evaluation of loudness using psychoacoustic methods of tonal and noise loudness
8.1 Psychoacoustic loudness calculation method
8.1.1 Calculation of time-dependent specific loudness
8.1.2 Calculation of averaged specific loudness
8.1.3 Calculation of time-dependent loudness
8.1.4 Calculation of representative values
8.1.5 Calculation of loudness for binaural signals
8.2 Information to be recorded for loudness
Annex A (informative) Evaluation of the psychoacoustic hearing model
Annex B (informative) Evaluation of the psychoacoustic tonality calculation method
B.1 Application examples
B.2 Evaluation
Annex C (informative) Evaluation of the psychoacoustic roughness calculation method

Introduction
ECMA-418-2 specifies methods for identifying perceptually prominent components in airborne noise emitted by
information technology and telecommunications (ITT) equipment using models of human perception. The
content was originally published in ECMA-74 17th edition “Measurement of Airborne Noise emitted by
Information Technology and Telecommunications Equipment”. Psychoacoustic content of ECMA-74 was moved
to ECMA-418 Parts 1 and 2 to distinguish and separate it from the legacy prescriptions of microphone
position, equipment operation, and sound level processing, which remain in ECMA-74.

ECMA-418 Parts 1 and 2 are psychoacoustic standards and as such prescribe methods that represent the
perception of noise emitted by ITT equipment. Sound signals recorded by the procedures of ECMA-74 are
analysed using the psychoacoustic methods of ECMA-418 Parts 1 and 2. While intended for ITT equipment,
the methods may be useful for other applications as well.

The psychoacoustic methods in this standard, ECMA-418 Part 2, are based on a human hearing model of Sottek
that expresses specific loudness, which describes level- and frequency-dependent masking and threshold of
hearing. The model approximates the well-established Zwicker specific loudness method, but was extended by
using a modified Bark scale covering the entire audible frequency range and an improved nonlinear matching
of loudness at higher levels, which leads to a significant improvement of the prediction quality for several
loudness matching experiments using synthetic and technical sounds.

Additional models described in this standard use the specific loudness to express the strength of perceived
tonality and roughness. The models of this standard, Part 2, are more intricate than those of Part 1, which
considers sound pressure in narrow and critical bands and hearing threshold.

The first edition of ECMA-418-2 was issued in December 2020.

For the 2nd edition, there were several updates as follows:

− The hearing model, tonality, and roughness procedures of Clauses 5, 6, and 7 were refined, and the
descriptions of these procedures improved to assist implementation.

− In Clause 5, a figure showing auditory filter bank response of the hearing model of Sottek was added
to assist implementation.

− An entropy-weighted roughness based on the randomness of the modulation rate was added to Clause 7.1 for
applications in which measured rotational speed is available.

− Clause 8 was added to describe loudness of sounds with subcritical or larger bandwidths.

The ECMA-418 series consists of the following parts, under the general title “Psychoacoustic metrics for ITT
equipment”:

⎯ Part 1 (prominent discrete tones)

⎯ Part 2 (models based on human perception)

This Ecma Standard was developed by Technical Committee 26 and was adopted by the General
Assembly of December 2022.

COPYRIGHT NOTICE
© 2022 Ecma International
This document may be copied, published and distributed to others, and certain derivative works of it may
be prepared, copied, published, and distributed, in whole or in part, provided that the above copyright
notice and this Copyright License and Disclaimer are included on all such copies and derivative works.
The only derivative works that are permissible under this Copyright License and Disclaimer are:
(i) works which incorporate all or portion of this document for the purpose of providing commentary or
explanation (such as an annotated version of the document),
(ii) works which incorporate all or portion of this document for the purpose of incorporating features that
provide accessibility,
(iii) translations of this document into languages other than English and into different formats and
(iv) works by making use of this specification in standard conformant products by implementing (e.g. by
copy and paste wholly or partly) the functionality therein.
However, the content of this document itself may not be modified in any way, including by removing the
copyright notice or references to Ecma International, except as required to translate it into languages
other than English or into a different format.
The official version of an Ecma International document is the English language version on the Ecma
International website. In the event of discrepancies between a translated version and the official version,
the official version shall govern.
The limited permissions granted above are perpetual and will not be revoked by Ecma International or
its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and ECMA
INTERNATIONAL DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY
OWNERSHIP RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.

Psychoacoustic metrics for ITT equipment —
Part 2 (models based on human perception)

1 Scope

This standard describes the hearing model and psychoacoustic metrics dependent on the hearing model. The
inputs to the hearing model are sound signals recorded using the procedures of ECMA-74. The hearing model
expresses specific loudness [1]. Psychoacoustic models use the specific loudness to express the strength of any
tonalities or roughness in the sound generated by Information Technology and Telecommunications (ITT)
equipment. While developed for ITT equipment, the psychoacoustic methods of this standard may be relevant
to other applications like automobiles, consumer appliances, etc.

The tonality metric of this standard uses the auto-correlation function to describe causes of perceived tonality
such as individual or multiple steady or time-varying discrete tones, individual or multiple spectrally elevated
bands or slopes of noise, and combinations of these phenomena. A similar approach was published in 1998 to
determine “pitch salience” [2].

The roughness metric presented in this standard uses a spectrum of the sound signal envelope, refined by a
quadratic fit estimator, to describe roughness arising from sound signal envelope variations within a critical band
at modulation rates between 20 and around 300 Hz. For steady sounds, roughness perception peaks at
modulation rates of 70 Hz.

The loudness metric of this standard uses a nonlinear combination of tonal and noise loudness calculated as
intermediate results of the tonality algorithm to achieve a very good match of perceived loudness, especially for
sounds with a subcritical bandwidth (sounds containing tonal and noise components).

2 Conformance

Measurements are in conformity with this Standard if they meet the following requirements:

a) The measurements are taken in conformity with the Standard ECMA-74.

b) The measurements are carried out with a sampling rate of 48 kHz or they are resampled to a
sampling rate of 48 kHz if they were originally taken with a different sampling rate.
c) For the determination of prominent tonalities, the method specified in Clause 6 is used.
d) For the determination of prominent roughness, the method specified in Clause 7 is used.

3 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references, the
latest edition of the referenced document (including any amendments) applies.

ECMA-74, Measurement of Airborne Noise emitted by Information Technology and Telecommunications
Equipment, 19th edition (December 2021)

ISO 226:2003, Acoustics — Normal equal-loudness-level contours


4 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

NOTE If a definition is identical to that in another standard, that standard and definition number is given in brackets.

4.1
loudness
𝑁
perceived magnitude of a sound, which depends on the acoustic properties of the sound and the specific
listening conditions, as estimated by the average human listener with normal hearing.

NOTE 1 Loudness is expressed in sones.

NOTE 2 Loudness depends primarily upon the sound pressure level, although it also depends upon the frequency,
bandwidth, and duration of the sound.

NOTE 3 A sound that is twice as loud as another sound is characterized by doubling the number of sones.

NOTE 4 Adapted from ISO 532-1, 3.18

4.2
specific loudness
𝑁′
perceived magnitude or volume of sound in a critical band.

NOTE 1 The unit of specific loudness is expressed in terms of sone per Bark.

4.3
equal loudness contour
the sound pressure level (see footnote 1) for which the average human listener with normal hearing perceives
constant loudness when presented with a single-frequency (pure) tone.

NOTE 1 Equal loudness contour is parameterized by the sound pressure level and frequency of the presented tone.
See ISO 226:2003.

4.4
threshold of hearing
level of a sound at which, under specified conditions, a person gives 50 % of correct detection responses on
repeated trials.

[SOURCE: ISO 226:2003, 3.7]

4.5
critical band
filter within the human cochlea describing the frequency resolution of the auditory system with characteristics
that are usually estimated from the results of masking experiments.

[SOURCE: ISO 532-1 [4], 3.12]

4.6
critical bandwidth
bandwidth of a critical band.

NOTE 1 Each critical bandwidth has a width of one unit on the critical band rate scale.

1 The definition of sound pressure level is given in the terms and definitions of ECMA-74.


4.7
critical band rate scale
transformation of the frequency scale, constructed so that an increase in frequency equal to one critical
bandwidth leads to an increase of one unit on the critical band rate scale.

NOTE 1 Frequencies on the critical band rate scale are expressed in Bark.

NOTE 2 Adapted from ISO 532-1, 3.14

4.8
tonality
a characteristic of sound containing a single-frequency component or narrow-band components that emerge
audibly from the total sound.

NOTE 1 Tonality can arise from individual or multiple steady or time-varying discrete tones, individual or multiple
spectrally elevated bands or slopes of noise, and combinations of these phenomena.

4.9
envelope
the instantaneous amplitude of a signal.

NOTE 1 The instantaneous amplitude describes the low-frequency variations of the amplitude. It has a significantly
lower frequency than the carrier frequency of the signal.

4.10
roughness
a characteristic of sound with the quality of being uneven yet steady.

NOTE 1 Roughness can arise if the envelope of a sound signal within a critical band has temporal variation.

4.11
modulation
fluctuation of the envelope of a signal over time.

NOTE 1 Modulation is expressed in terms of its strength (modulation index) and the speed at which it changes
(modulation rate).

4.12
modulation rate
frequency of changes of the envelope of a signal.

NOTE 1 The modulation rate is expressed in Hertz.

NOTE 2 The word “rate” is used to avoid confusion with the sound frequency.


5 A hearing model approach to calculate psychoacoustic parameters

This clause describes a perception-model-based procedure for determining the specific loudness of a sound,
the hearing model of Sottek. Several loudness calculation procedures exist, such as the German standard
DIN 45631/A1 [3] and the international standard ISO 532-1 [4] (both based on Zwicker’s loudness model), the
Dynamic Loudness Model (by Chalupper and Fastl) [5], the Time Varying Loudness model (by Glasberg
and Moore) [6], and the loudness calculation algorithm based on the hearing model of Sottek. These procedures
allow the perceived loudness of time-varying sounds to be predicted in many cases (ISO 532-2 [7] only applies to
stationary sounds). However, previous studies of Rennies et al. [8], [9] showed that the predictions for some
time-varying sounds do not match the loudness ratings of normal-hearing listeners. To address this issue, the
influence of specific signal properties of the sounds on the assessment of loudness was examined in
Reference [1], focusing on impulsive sounds. On the basis of these experiments, it was studied to what extent
the hearing model approach to time-varying loudness according to Sottek can account for the specific signal
properties of these sounds. The hearing model approach to time-varying loudness was shown to perform
better than other existing loudness models: the hearing model, characterized especially by the
application of an improved nonlinearity and the steeper curve progression at higher levels, leads to a significant
improvement of the prediction quality for several loudness matching experiments using synthetic and technical
sounds. In addition, the auditory filter bank used is based on an extended Bark scale covering the entire audible
frequency range while matching the experimental results related to critical bandwidth better than other models.
Further, the hearing model is able to predict the nonlinear behaviour with respect to just-noticeable amplitude
differences and variations [1].

The hearing model described in this clause transforms sound pressure to loudness, where the unit of loudness
is sone_HMS, where HMS stands for “according to the Hearing Model of Sottek” and denotes that the loudness
differs from other definitions. The result of the hearing model can be used as the basis for further psychoacoustic
analyses.

5.1 Psychoacoustic hearing model

5.1.1 Overview

Figure 1 displays the basic hearing model structure for calculating specific loudness as the basis for determining
other psychoacoustic sensations. Subsequently, the different signal processing blocks of the hearing model are
briefly explained.

Figure 1 — Basic hearing model structure, including the auditory filter bank, where CBF is the number
of critical band filters in the filter bank.


The input signal is a discrete-time signal containing sound pressure values with a sampling rate of
𝑟s = 48 kHz (see footnote 2). The common sampling rate 𝑟s = 48 kHz is chosen to ensure that the entire audible
frequency range is covered.

5.1.2 Pre-processing of input data

Initially, the first 5 ms of the input signal (corresponding to 𝑛fade in = 0,005 ⋅ 48000 = 240 samples) are multiplied
by a trigonometric weighting function

$$w_{\mathrm{fade\,in}}(n) = 0{,}5 - 0{,}5 \cdot \cos\left(\frac{\pi n}{n_{\mathrm{fade\,in}}}\right) \tag{1}$$

with 𝑛 = 0, 1, … , 𝑛fade in − 1 in order to reduce artifacts due to filter oscillations in case of signals starting with
non-zero values.
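
As an illustration, the fade-in of Formula (1) might be sketched in Python as follows. This is a minimal sketch and not part of the standard: the function name `apply_fade_in` and its parameters are assumptions made for this example, and decimal points replace the decimal commas used in the text.

```python
import math


def apply_fade_in(signal, sample_rate=48000, fade_seconds=0.005):
    """Return a copy of `signal` with the weighting of Formula (1) applied.

    `signal` is a sequence of sound pressure samples. The names and default
    arguments are illustrative assumptions, not defined by the standard.
    """
    n_fade_in = int(round(fade_seconds * sample_rate))  # 240 samples at 48 kHz
    faded = list(signal)
    for n in range(min(n_fade_in, len(faded))):
        # w(n) = 0.5 - 0.5 * cos(pi * n / n_fade_in), rising from 0 towards 1
        weight = 0.5 - 0.5 * math.cos(math.pi * n / n_fade_in)
        faded[n] *= weight
    return faded
```

Applied to a constant signal, the first sample becomes zero, sample 120 is scaled by 0,5, and all samples from index 240 onwards are unchanged.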

Second, zero-padding on both ends of the signal shall be performed to facilitate later processing steps. The
number of zeros at the end 𝑛zeros,end is calculated as:

$$n_{\mathrm{zeros,end}} = n_{\mathrm{new}} - n_{\mathrm{samples}}, \tag{2}$$

where 𝑛samples is the number of samples of the signal and 𝑛new equals:

$$n_{\mathrm{new}} = s_{\mathrm{h,max}} \cdot \left(\operatorname{ceil}\left(\frac{n_{\mathrm{samples}} + s_{\mathrm{h,max}} + s_{\mathrm{b,max}}}{s_{\mathrm{h,max}}}\right) - 1\right), \tag{3}$$

where the ceil(𝑥) operator gives the smallest integer value higher than or equal to the number 𝑥. The
band-dependent block size 𝑠b (𝑧) and the hop size (see footnote 3) 𝑠h (𝑧) are defined in detail in Clause 5.1.5,
and 𝑠b,max and 𝑠h,max are the largest band-dependent block size and hop size of all used filter stages, which are
defined in Clause 6.2.2 for the tonality and in Clause 7.1.1 for the roughness. The number of zeros at the start
𝑛zeros,start shall be equal to 𝑠b,max . The zero-padded sound pressure signal is named 𝑝(𝑛).
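
The zero-padding of Formulae (2) and (3) might be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `zero_pad` and its argument names are hypothetical, and the values of 𝑠b,max and 𝑠h,max must be taken from Clause 6.2.2 or Clause 7.1.1 rather than from this example.

```python
def zero_pad(signal, block_size_max, hop_size_max):
    """Zero-pad `signal` at both ends following Formulae (2) and (3).

    `block_size_max` and `hop_size_max` stand for s_b,max and s_h,max.
    Their values are supplied by the caller (assumption of this sketch).
    """
    n_samples = len(signal)
    # Integer ceiling of (n_samples + s_h,max + s_b,max) / s_h,max
    total = n_samples + hop_size_max + block_size_max
    n_blocks = -(-total // hop_size_max)
    n_new = hop_size_max * (n_blocks - 1)   # Formula (3)
    n_zeros_end = n_new - n_samples         # Formula (2)
    n_zeros_start = block_size_max          # zeros prepended at the start
    return [0.0] * n_zeros_start + list(signal) + [0.0] * n_zeros_end
```

For a 1 s signal (48000 samples) and the assumed placeholder values 𝑠b,max = 8192 and 𝑠h,max = 2048, Formula (3) gives 𝑛new = 2048 ⋅ (29 − 1) = 57344, so 9344 zeros are appended and 8192 zeros are prepended.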

5.1.3 Outer and middle/inner ear filtering

5.1.3.1 Theory

The pre-processing consists of filtering the input signal 𝑝(𝑛) with transfer functions of the outer and of the
middle/inner ear. The transfer function of the outer ear was modelled based on measured head related transfer
functions (HRTFs). The transfer function of the middle/inner ear was chosen such that the filtering together with
the loudness threshold LTQ(𝑧) (as explained in Clause 5.1.9) leads to a loudness estimation emulating the
equal-loudness contours from 20 to 90 phon (with a step size of 10 phon) and the lower threshold of hearing (see footnote 4).
The middle/inner ear filter is optimized on the equal-loudness contours of ISO 226:2003.

2 If the input data is sampled at a different sampling rate than 48 kHz, a resampling to 48 kHz needs to be performed.

3 The hop size is the time shift to the next calculation block, smaller than block size if overlapping is used. It is related to the
percent overlap ov by 𝑠h (𝑧) = 𝑠b (𝑧) ∙ (100 − ov)/100.
4 In Zwicker’s loudness model [3] the influence of the outer and middle ear transfer functions is considered by the ear’s
transmission characteristic 𝑎0 .


Figure 2 — Equal loudness contours (ISO 226:2003) used as target for the filter transfer function
The lower threshold of hearing is also taken from ISO 226:2003. This also corresponds to the data of the lower
threshold of hearing published in ISO 389-7 [10]. The target equal-loudness contours are illustrated in Figure 2.
An evaluation of the hearing model showing the emulated equal-loudness contours is given in Annex A.

5.1.3.2 Implementation

The transfer function of the resulting filter is shown in Figure 3. The overall filter is composed of a filter modelling
the influence of the outer ear and a filter modelling the influence of the middle/inner ear. Those filters are also
shown in Figure 3.

Figure 3 — Transfer function of the outer and middle/inner ear filter

For numerical reasons, it is recommended to implement this high-order filter as 𝐾 = 8 serially-cascaded second-
order filters 𝐻𝑘 (𝑓). The filter function 𝐻(𝑓) is then defined as

$$H(f) = \prod_{k=1}^{K} H_k(f). \tag{4}$$

Each second-order filter 𝐻𝑘 (𝑓) can be implemented using the recursive Formula (5)

$$y(n) = \sum_{m=0}^{2} b_{mk}\, x(n-m) - \sum_{m=1}^{2} a_{mk}\, y(n-m) \tag{5}$$


with input 𝑥(𝑛) and output 𝑦(𝑛). The corresponding filter coefficients are given in Table 1. The first five filters
describe the influence of the outer ear, the last three describe the influence of the middle/inner ear. The filtering
results in a filtered signal 𝑝om (𝑛).

Table 1 — Filter coefficients of outer and middle/inner ear filter

Filter coefficients
k        𝑏0𝑘         𝑏1𝑘         𝑏2𝑘         𝑎1𝑘         𝑎2𝑘
1   1,015896   -1,925299    0,922118   -1,925299    0,938014
2   0,958943   -1,806088    0,876439   -1,806088    0,835382
3   0,961372   -1,763632    0,821788   -1,763632    0,783160
4   2,225804   -1,434650   -0,498204   -1,434650    0,727599
5   0,471735   -0,366092    0,244145   -0,366092   -0,284120
6   0,115267    0,000000   -0,115267   -1,796003    0,805838
7   0,988029   -1,912434    0,926132   -1,912434    0,914161
8   1,952238    0,162320   -0,667994    0,162320    0,284244
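
The cascade of Formulae (4) and (5) with the coefficients of Table 1 might be sketched in Python as follows. This is a minimal sketch and not a reference implementation: the names `SECTIONS`, `biquad` and `outer_middle_ear_filter` are assumptions of this example, and the decimal commas of Table 1 are written as decimal points. Python floats are double precision.

```python
# Table 1 coefficients as (b0, b1, b2, a1, a2) for k = 1 ... 8.
# Sections 1 to 5 model the outer ear, sections 6 to 8 the middle/inner ear.
SECTIONS = [
    (1.015896, -1.925299, 0.922118, -1.925299, 0.938014),
    (0.958943, -1.806088, 0.876439, -1.806088, 0.835382),
    (0.961372, -1.763632, 0.821788, -1.763632, 0.783160),
    (2.225804, -1.434650, -0.498204, -1.434650, 0.727599),
    (0.471735, -0.366092, 0.244145, -0.366092, -0.284120),
    (0.115267, 0.000000, -0.115267, -1.796003, 0.805838),
    (0.988029, -1.912434, 0.926132, -1.912434, 0.914161),
    (1.952238, 0.162320, -0.667994, 0.162320, 0.284244),
]


def biquad(x, b0, b1, b2, a1, a2):
    """Apply one second-order section using the recursion of Formula (5)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0  # delayed input and output samples
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def outer_middle_ear_filter(p):
    """Filter the zero-padded signal p(n) through all eight sections in series."""
    out = list(p)
    for section in SECTIONS:
        out = biquad(out, *section)
    return out
```

Running the sections in series realizes the product in Formula (4); the order of the sections does not change the overall transfer function.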

5.1.4 Auditory filtering bank

5.1.4.1 Theory

An auditory filter bank consisting of overlapping asymmetric filters models the frequency-dependent critical
bandwidths and the tuning curves of the frequency-to-place transform of the inner ear, which mediates the firing
of the auditory hair cells as the traveling wave from an incoming sound event progresses along the basilar
membrane. The shape of the auditory filters matches that of gammatone filters [11]. The amplitude is chosen such
that each filter has a gain of 0 dB at its centre frequency 𝐹(𝑧), with 𝑧 denoting the critical band rate scale. This
0 dB gain varies slightly for the first critical bands due to the influence of the negative frequencies, as seen in
Figure 4. The critical bandwidth ∆𝑓(𝑧) is chosen such that it corresponds to the equivalent rectangular
bandwidth (implementation details are given in Formulae (9) and (10)). Because the ratio of bandwidth to
frequency is not constant, the auditory filter bank provides a high frequency resolution at low frequencies and a
high time resolution at high frequencies, with a very small product of time and frequency resolution at all
frequencies. This enables, for example, human hearing’s recognition of short-duration low-frequency events. The
impulse responses of the auditory filters are chosen as modulated low-pass filters (𝑗 is the imaginary unit):

ℎ𝑧 (𝑡) = 2 ∙ Re( ℎLP,𝑧 (𝑡) ∙ exp(𝑗2𝜋𝐹(𝑧)𝑡)) = 2 ∙ ℎLP,𝑧 (𝑡) ∙ cos(2𝜋𝐹(𝑧)𝑡) . (6)

The filters are calculated using the low-pass function

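$$h_{\mathrm{LP},z}(t) = \varepsilon(t) \cdot \frac{1}{(k-1)!} \cdot \frac{1}{\tau(z)} \cdot \left(\frac{t}{\tau(z)}\right)^{k-1} \exp\left(-\frac{t}{\tau(z)}\right) \tag{7}$$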

where 𝑘 is the filter order (see footnote 5), 𝜀(𝑡) is the unit step function and the exclamation mark denotes the
factorial operation. 𝜏(𝑧) is a time constant, related to ∆𝑓(𝑧) by

$$\tau(z) = \frac{1}{2^{2k-1}} \cdot \binom{2k-2}{k-1} \cdot \frac{1}{\Delta f(z)}. \tag{8}$$

The centre frequencies 𝐹(𝑧) and corresponding bandwidths ∆𝑓(𝑧) of the filter bank are calculated as

$$F(z) = \frac{\Delta f(f=0)}{c} \sinh(cz) \tag{9}$$

5 Filter order 𝑘 = 5 is used.


and

$$\Delta f(z) = \sqrt{\left(\Delta f(f=0)\right)^2 + \left(c\,F(z)\right)^2}, \tag{10}$$

with 𝑧 denoting the critical band rate scale.

Values for 𝑧 are chosen from 0,5 to 26,5 with a step size of ∆𝑧 = 0,5. The parameters are
∆𝑓(𝑓 = 0) = 81,9289 Hz and 𝑐 = 0,1618. These functions and settings lead to a better matching to the Bark table
by Zwicker [12] than other existing formulae, as documented in detail in Reference [1]. The unit of the critical
band rate scale of this auditory filter bank is Bark_HMS, where HMS stands for “according to the Hearing Model of
Sottek” and denotes that the critical bands differ from other definitions.
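
The band parameters of Formulae (8), (9) and (10) might be tabulated with a few lines of Python. This is a minimal sketch: the function names are hypothetical, only the constants ∆𝑓(𝑓 = 0), 𝑐 and 𝑘 = 5 come from the text, and decimal points replace decimal commas.

```python
import math

DELTA_F0 = 81.9289  # bandwidth at f = 0 in Hz
C = 0.1618          # constant c of Formulae (9) and (10)
K = 5               # filter order k


def critical_band_rates():
    """z = 0.5, 1.0, ..., 26.5 (53 values)."""
    return [0.5 * i for i in range(1, 54)]


def centre_frequency(z):
    """F(z) in Hz, Formula (9)."""
    return DELTA_F0 / C * math.sinh(C * z)


def bandwidth(z):
    """Critical bandwidth in Hz, Formula (10)."""
    return math.sqrt(DELTA_F0 ** 2 + (C * centre_frequency(z)) ** 2)


def time_constant(z):
    """tau(z) in seconds, Formula (8); for k = 5 the factor is 70 / 512."""
    return math.comb(2 * K - 2, K - 1) / 2 ** (2 * K - 1) / bandwidth(z)
```

With these settings the lowest centre frequency (𝑧 = 0,5) comes out at about 41 Hz and the highest (𝑧 = 26,5) at about 18,4 kHz.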

As a discrete approximation of the low-pass filter,

$$h_{\mathrm{LP},z}(n) = \varepsilon(n) \cdot \frac{(1-d)^k}{\sum_{i=1}^{k-1} e_i d^i}\, n^{k-1} d^n, \tag{11}$$

with time index 𝑛 and 𝑑 = exp(−1/(𝑟s 𝜏(𝑧))), is used (see footnote 6); 𝑒𝑖 depends on the filter order 𝑘 and is
given below for a specific value of 𝑘. The band-pass filtering using ℎ𝑧 (𝑡) can be implemented using the discrete
approximation of the band-pass filter

$$h_z(n) = 2 \cdot \operatorname{Re}\left(h_{\mathrm{LP},z}(n) \cdot \exp\left(\frac{j 2\pi F(z) n}{r_{\mathrm{s}}}\right)\right) = 2 \cdot h_{\mathrm{LP},z}(n) \cdot \cos\left(\frac{2\pi F(z) n}{r_{\mathrm{s}}}\right). \tag{12}$$

5.1.4.2 Implementation

In the following, instructions for the implementation of the auditory filters are given: Digital filtering can be
implemented using the recursive Formula (13):

$$y(n) = \sum_{m=0}^{k-1} b_m\, x(n-m) - \sum_{m=1}^{k} a_m\, y(n-m). \tag{13}$$

For the discrete low-pass filter ℎLP,𝑧 (𝑛) as described in Formula (11), the real-valued filter coefficients are

$$a_m = (-d)^m \binom{k}{m}, \tag{14}$$

and

$$b_m = \frac{(1-d)^k}{\sum_{i=1}^{k-1} e_i d^i}\, d^m e_m. \tag{15}$$

With the filter order 𝑘 = 5 used here, the coefficients 𝑒𝑖 in Formula (11) and in Formula (15) are given as
𝑒0 = 0, 𝑒1 = 1, 𝑒2 = 11, 𝑒3 = 11, and 𝑒4 = 1. As explained above, 𝑑 = exp(−1/(𝑟s 𝜏(𝑧))) with 𝜏(𝑧) as defined in
Formula (8).

6 𝑟s = 48 kHz is the sampling rate.


The coefficients 𝑎𝑚 and 𝑏𝑚 can be used for the implementation of the discrete approximation of the low-pass
function given in Formulae (7) and (11). However, to obtain the discrete approximation of the band-pass filter in
Formula (12), the filter coefficients of the low-pass filter shall be modified to:

$$a'_m = a_m \exp\left(\frac{j2\pi F(z)m}{r_\mathrm{s}}\right) \tag{16}$$

and

$$b'_m = b_m \exp\left(\frac{j2\pi F(z)m}{r_\mathrm{s}}\right), \tag{17}$$

with a sampling rate of 𝑟s = 48 kHz. Using these modified filter coefficients in the recursive Formula (13) results
in a discrete implementation of the auditory filters. The filter results in a complex-valued band-pass signal with
a single-sided spectrum. Two times the even part of the spectrum of this signal corresponds to the real-valued
band-pass signal. Thus, the real-valued band-pass signal can be determined as the double real part of the
complex result.
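
The recursion in Formula (13) with the coefficients of Formulae (14) to (17) can be sketched as follows. This is an illustrative, unoptimised sketch: the function names are hypothetical, and 𝑑 is passed in directly because 𝜏(𝑧) is defined in Formula (8), which is not reproduced here.

```python
import cmath
import math

E_COEFFS = {3: [0, 1, 1], 5: [0, 1, 11, 11, 1]}  # e_i for k = 3 and k = 5 (from the text)


def lowpass_coefficients(d, k=5):
    """Formulae (14) and (15): real-valued coefficients a_m (a_0 = 1) and b_m."""
    e = E_COEFFS[k]
    norm = (1.0 - d) ** k / sum(e[i] * d ** i for i in range(1, k))
    a = [(-d) ** m * math.comb(k, m) for m in range(k + 1)]
    b = [norm * d ** m * e[m] for m in range(k)]
    return a, b


def bandpass_coefficients(d, centre_hz, rate_hz=48000.0, k=5):
    """Formulae (16) and (17): rotate each coefficient by exp(j*2*pi*F*m/r_s)."""
    a, b = lowpass_coefficients(d, k)

    def rotation(m):
        return cmath.exp(2j * math.pi * centre_hz * m / rate_hz)

    return ([am * rotation(m) for m, am in enumerate(a)],
            [bm * rotation(m) for m, bm in enumerate(b)])


def recursive_filter(x, a, b):
    """Formula (13), evaluated sample by sample in double precision."""
    y = []
    for n in range(len(x)):
        acc = sum(b[m] * x[n - m] for m in range(len(b)) if n - m >= 0)
        acc -= sum(a[m] * y[n - m] for m in range(1, len(a)) if n - m >= 0)
        y.append(acc)
    return y


def real_bandpass(x, d, centre_hz, rate_hz=48000.0, k=5):
    """Real-valued band-pass signal: twice the real part of the complex result."""
    a, b = bandpass_coefficients(d, centre_hz, rate_hz, k)
    return [2.0 * value.real for value in recursive_filter(x, a, b)]
```

Feeding a unit impulse through `recursive_filter` with the low-pass coefficients reproduces the impulse response of Formula (11), and `real_bandpass` reproduces Formula (12), which is a convenient self-check for an implementation.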

Figure 4 shows the magnitude of the transfer functions of the auditory filter bank, calculated by filtering a digital
Dirac pulse (sampling rate: 48000 Hz, duration: 1 s) using the filter coefficients⁷ defined in Formulae (16) and
(17), followed by a Fourier transform of the real-valued band-pass signal.

Figure 4 — Magnitude of the transfer functions of the auditory filter bank

The auditory filter bank results in CBF = 53 band-pass signals 𝑝om,𝑧 (𝑛) centred around the critical band rate
scale values 𝑧 ranging from 0,5 to 26,5, thus leading to an extension of the Bark scale for frequencies of the
entire audibility range up to approximately 20 kHz using 53 critical band filters with an overlap of 50%.

5.1.5 Segmentation

For further processing, segmentation into blocks needs to be performed and blockwise root-mean-square (RMS)
values need to be calculated. For the segmentation, the band-dependent block size 𝑠b (𝑧) and the hop size 𝑠h (𝑧)
can be chosen depending on the application. Values for 𝑠b (𝑧) and 𝑠h (𝑧) for the calculation of the psychoacoustic
tonality are given in Clause 6.2.2 and for the calculation of the psychoacoustic roughness in Clause 7.1.1.

7 Filtering shall be performed with double precision.

The segmentation can be described as:

$$p_{l,z}(n') = p_{\mathrm{om},z}\left(l \cdot s_\mathrm{h}(z) + i_\mathrm{start}(z) + n'\right) \tag{18}$$

with 0 ≤ 𝑛′ ≤ 𝑠b(𝑧) − 1, where the time index 𝑙 is the block number, starting with 𝑙 = 0
(corresponding to a time of 0 ms). 𝑖start(𝑧) is an index offset that guarantees that the first block of all stages
corresponds to the same time reference. It is defined as:

$$i_\mathrm{start}(z) = s_\mathrm{b}(0{,}5) - s_\mathrm{b}(z). \tag{19}$$

Thus, each block $p_{l,z}(n')$ ranges from 𝑛 = 𝑙 ∙ 𝑠h(𝑧) + 𝑖start(𝑧) to 𝑛 = 𝑙 ∙ 𝑠h(𝑧) + 𝑖start(𝑧) + 𝑠b(𝑧) − 1. The last
value of 𝑙, 𝑙last(𝑧), depends on the filter band and on the value 𝑛new defined in Formula (3):

$$l_\mathrm{last}(z) = \mathrm{ceil}\left(\frac{n_\mathrm{new} + s_\mathrm{h}(z)}{s_\mathrm{h}(z)}\right) - 1. \tag{20}$$
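
The segmentation of Formulae (18) to (20) can be sketched as below. The function name and argument names are illustrative. Treating samples beyond the end of the band-pass signal as zeros is an assumption made here for the sketch; the zero-padding that determines 𝑛new is specified in the pre-processing clause.

```python
import math


def segment(signal, block_size, hop_size, largest_block_size, n_new):
    """Split one band-pass signal into overlapping blocks.

    largest_block_size corresponds to s_b(0,5), the block size of the lowest band.
    """
    i_start = largest_block_size - block_size                 # Formula (19)
    l_last = math.ceil((n_new + hop_size) / hop_size) - 1     # Formula (20)
    blocks = []
    for l in range(l_last + 1):
        first = l * hop_size + i_start                        # Formula (18)
        # Assumption: indices past the end of the signal read as zero.
        block = [signal[n] if n < len(signal) else 0.0
                 for n in range(first, first + block_size)]
        blocks.append(block)
    return blocks
```

For example, with `block_size=1024` and `largest_block_size=8192` the offset is 7168 samples, so the first block of the highest bands ends at the same sample as the first block of the lowest band.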

5.1.6 Rectification

Subsequent half-wave rectification accounts for the fact that the auditory nerves fire only when the basilar
membrane vibrates in a specific direction [13]. The resulting band-pass signals are calculated as:

$$p_{\mathrm{rect},l,z}(n') = \begin{cases} p_{l,z}(n'), & p_{l,z}(n') > 0 \\ 0, & p_{l,z}(n') \le 0 \end{cases}. \tag{21}$$

5.1.7 Calculation of root-mean-square values

With the segmented and rectified blocks 𝑝rect,𝑙,𝑧 (𝑛′ ), the RMS-values are calculated for each block as:

$$\tilde{p}(l,z) = \sqrt{\frac{2}{s_\mathrm{b}(z)} \sum_{n'=0}^{s_\mathrm{b}(z)-1} p_{\mathrm{rect},l,z}^{2}(n')}. \tag{22}$$

The factor of 2 is necessary to compensate for the signal energy which was lost due to the half-wave rectification.
The dependency on the time index 𝑙 is dropped in the following, since the further processing steps are applied
to each time block in the same way.
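
Formulae (21) and (22) amount to a few lines of code. The sketch below uses illustrative function names and operates on a single block.

```python
import math


def half_wave_rectify(block):
    """Formula (21): keep positive samples, set all others to zero."""
    return [value if value > 0.0 else 0.0 for value in block]


def block_rms(rectified_block):
    """Formula (22): RMS value with the factor 2 that compensates the rectification."""
    n = len(rectified_block)
    return math.sqrt(2.0 / n * sum(value * value for value in rectified_block))
```

For a sinusoid of amplitude 𝐴 with a whole number of periods in the block, the result is 𝐴/√2, i.e. the RMS value of the unrectified sinusoid, which illustrates the purpose of the factor 2.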

5.1.8 Nonlinearity to transform sound pressure into specific loudness

The compressive nonlinearity of the auditory system is significant for the loudness perception. The specific
loudness distribution, resulting from the application of this nonlinearity to the excitation pattern, also forms the
basis for calculating other psychoacoustic parameters such as tonality, roughness or fluctuation. Such a
nonlinearity function has proven applicable to predict many phenomena like ratio loudness, just-noticeable
amplitude differences and modulation thresholds as well as the level dependence of roughness.

The nonlinearity between specific loudness and sound pressure was reconsidered in the hearing model
according to results of many listening tests [14]. Further improvements for higher levels above approximately
80 dB were achieved by introducing a nonlinearity function according to Formula (23):
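
$$A'(\tilde{p}) = c_\mathrm{N} \cdot \frac{\tilde{p}}{\tilde{p}_0} \cdot \prod_{i=1}^{M} \left(1 + \left(\frac{\tilde{p}}{\tilde{p}_{\mathrm{t}i}}\right)^{\alpha}\right)^{\frac{\nu_i - \nu_{i-1}}{\alpha}} \tag{23}$$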

with root-mean-square values of sound pressure 𝑝̃ and thresholds 𝑝̃t𝑖 in Pa, 𝑝̃0 = 20 µPa. The 𝑀 thresholds 𝑝̃t𝑖
can be derived from Table 2; 𝛼 is set to 1,5; 𝑐N = 0,0211668 is a calibration factor with the
unit soneHMS/BarkHMS⁸ to assure that the total loudness of a sinusoid having a frequency of 1 kHz and a sound
pressure level of 40 dB equals 1 soneHMS (using the method described in Clause 8.1)⁹. The 𝑀 = 8 exponents
𝜈𝑖 given in Table 2 were obtained by applying a nonlinear optimization procedure that minimizes the
root-mean-square error between the results of the loudness matching experiment and the results of the model
calculation. The initial exponent 𝜈0 is set to 1.

Table 2 — 𝑴 = 𝟖 thresholds and exponents for the nonlinearity function for Formula (23)
𝑖                            1       2       3       4       5       6       7       8
20 log10(𝑝̃t𝑖 ⁄ 𝑝̃0) [dB]     15      25      35      45      55      65      75      85
𝜈𝑖                           0,6602  0,0864  0,6384  0,0328  0,4068  0,2082  0,3994  0,6434

The nonlinearity is applied to 𝑝̃(𝑧) in each band 𝑧. The resulting variable

$$\tilde{N}'(z) = A'(\tilde{p}(z)) \tag{24}$$

can be interpreted as the specific loudness of the signal without consideration of the threshold in quiet.

The function according to Formula (23) results from an optimization procedure to fit the experimental data with
the lowest root-mean-square error [14]. It has a steep slope at high levels, which agrees with results of
experiments from Buus et al. [15] and Epstein et al. [16].
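
A minimal sketch of Formula (23) with the values of Table 2 is given below. The function name is illustrative, and the calibration factor is used as stated in the text without the tolerance adjustment mentioned in the footnote.

```python
P0 = 20e-6        # reference sound pressure in Pa
C_N = 0.0211668   # calibration factor in sone_HMS / Bark_HMS
ALPHA = 1.5
THRESHOLDS_DB = [15, 25, 35, 45, 55, 65, 75, 85]   # Table 2, row 2
# Exponents nu_0 .. nu_8 (nu_0 = 1, the others from Table 2, row 3)
NU = [1.0, 0.6602, 0.0864, 0.6384, 0.0328, 0.4068, 0.2082, 0.3994, 0.6434]


def specific_loudness_nonlinearity(p_rms):
    """Formula (23): map an RMS sound pressure in Pa to A'(p)."""
    result = C_N * p_rms / P0
    for i, level_db in enumerate(THRESHOLDS_DB, start=1):
        p_threshold = P0 * 10.0 ** (level_db / 20.0)   # threshold in Pa
        exponent = (NU[i] - NU[i - 1]) / ALPHA
        result *= (1.0 + (p_rms / p_threshold) ** ALPHA) ** exponent
    return result
```

Well below the first threshold every factor of the product is close to 1, so the function grows with the slope 𝜈0 = 1; above each threshold the slope on a double-logarithmic scale moves towards the next exponent 𝜈𝑖.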

5.1.9 Consideration of threshold in quiet

The specific loudness in each band 𝑧 is zero if it is at or below a critical-band-dependent specific loudness
threshold LTQ(𝑧). The band-specific loudness threshold LTQ(𝑧) is given for each used band number 𝑧 from 0,5
to 26,5 in Table 3. Figure 5 shows the loudness threshold LTQ(𝑧) as a function of the centre frequency of the
bands.

Table 3 — Specific loudness threshold 𝐋𝐓𝐐(𝒛) for each used value of 𝒛

𝑧 LTQ(𝑧) 𝑧 LTQ(𝑧) 𝑧 LTQ(𝑧) 𝑧 LTQ(𝑧) 𝑧 LTQ(𝑧)
0,5 0,3310 6,0 0,0151 11,5 0,0071 17,0 0,0122 22,5 0,0202
1,0 0,1625 6,5 0,0131 12,0 0,0072 17,5 0,0138 23,0 0,0217
1,5 0,1051 7,0 0,0115 12,5 0,0073 18,0 0,0157 23,5 0,0237
2,0 0,0757 7,5 0,0103 13,0 0,0074 18,5 0,0172 24,0 0,0263
2,5 0,0576 8,0 0,0093 13,5 0,0076 19,0 0,0180 24,5 0,0296
3,0 0,0453 8,5 0,0086 14,0 0,0079 19,5 0,0180 25,0 0,0339
3,5 0,0365 9,0 0,0081 14,5 0,0082 20,0 0,0177 25,5 0,0398
4,0 0,0298 9,5 0,0077 15,0 0,0086 20,5 0,0176 26,0 0,0485
4,5 0,0247 10,0 0,0074 15,5 0,0092 21,0 0,0177 26,5 0,0622
5,0 0,0207 10,5 0,0073 16,0 0,0100 21,5 0,0182
5,5 0,0176 11,0 0,0072 16,5 0,0109 22,0 0,0190

8 HMS stands for “according to the Hearing Model of Sottek” and denotes that the calculated loudness and the critical bands
differ from other definitions.
9 The calibration factor 𝑐N can be adjusted within a tolerance of 0,25 % to account for the effects of different implementations.

Figure 5 — Specific loudness threshold 𝐋𝐓𝐐(𝒛)
The lower threshold of hearing is applied by subtraction and a limiter:

$$N'_{\mathrm{basis}}(z) = \begin{cases} \tilde{N}'(z) - \mathrm{LTQ}(z), & \tilde{N}'(z) \ge \mathrm{LTQ}(z) \\ 0, & \tilde{N}'(z) < \mathrm{LTQ}(z) \end{cases}. \tag{25}$$

The result $N'_{\mathrm{basis}}(z)$ is the specific basis loudness of the signal. The specific basis loudness can be used as a basis
for other psychoacoustic parameters such as tonality (see Clause 6) and roughness (see Clause 7).

A signal is considered to be audible when its total basis loudness exceeds 0,01 soneHMS. The total basis
loudness is calculated by summing all specific basis loudness values, using ∆𝑧 = 0,5, as

$$N_{\mathrm{basis}} = \sum_{i=1}^{\mathrm{CBF}} N'_{\mathrm{basis}}\left(\frac{i}{2}\right) \cdot \Delta z. \tag{26}$$

Consideration of both total and specific basis loudness has the benefit of allowing loudness summation of
sounds consisting of multiple components near threshold.
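
Formulae (25) and (26) can be sketched as follows. The function names are illustrative; the lists are assumed to be ordered by band, with the thresholds taken from Table 3.

```python
def specific_basis_loudness(specific_loudness, ltq):
    """Formula (25): subtract the threshold in quiet and limit the result at zero."""
    return [n - t if n >= t else 0.0 for n, t in zip(specific_loudness, ltq)]


def total_basis_loudness(basis_loudness, delta_z=0.5):
    """Formula (26): sum over all bands, weighted by the band spacing."""
    return sum(basis_loudness) * delta_z


def is_audible(basis_loudness, limit=0.01):
    """Audibility criterion from the text: total basis loudness above 0,01 sone_HMS."""
    return total_basis_loudness(basis_loudness) > limit
```

As a usage example, the first three thresholds of Table 3 are 0,3310, 0,1625 and 0,1051; specific loudness values of 0,5, 0,1 and 0,1051 in these bands yield basis loudness values of 0,169, 0 and 0.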

Recent investigations showed that existing loudness procedures underestimate the loudness of tonal signals [17].
Clause 8.1 describes a new loudness algorithm based on a nonlinear weighting of the partial loudness of tonal
and non-tonal components derived in Clause 6.2.

6 Identification and evaluation of prominent tonalities using a psychoacoustic
tonality calculation method
This clause describes a perception-model-based procedure for determining whether or not noise emissions
contain prominent tonalities, and if present, their strengths: the psychoacoustic tonality calculation method. A
similar approach was published in 1998 for the determination of “pitch salience” [2]. The calculation is based on
the specific basis loudness as described in Clause 5.

Prominent perceived tonalities arise from a variety of causes, including but not limited to discrete tones,
non-pure tones, narrow elevated noise bands, combinations of tones and narrow elevated
noise bands, band-edges of various slopes terminating elevated noise bands of various bandwidths, and
combinations of these. This clause defines a procedure for identifying and ranking tonalities from any cause.

6.1 Determination of tonality

6.1.1 Tonalities and their relationships to the threshold of hearing

Discrete tones or other tonalities should only be classified as prominent if they are, in fact, audible in the noise
emissions of the equipment under test. For the tonality calculation methods described in ECMA-418 – Part 1:
Dominant discrete tones, a pre-calculation screening test concerning the audibility of the tonality is recommended.
Given calibrated acoustical time-domain measurement data, this step is not required with the psychoacoustic tonality
calculation method, regardless of proximity to the threshold of hearing, because the method inherently considers
the threshold of hearing and the psychoacoustic loudness of tonal and non-tonal components.

6.1.2 Multiple tones in a critical band, and time-variation of tonality due to their interaction

The noise emitted by a machine may contain multiple tones or narrowband tonalities, several of which may fall
within a single critical band. Besides the likelihood of increased overall tonality strength due to a plurality of
tones within one critical bandwidth, there is a strong likelihood of beating interference between or among the
plural tonalities causing time structure (amplitude modulation): periodic additions and cancellations affecting the
strength of the perceived tonality within that critical band. In this case the sound is often perceived as “rough”,
leading to the psychoacoustic sensation of “roughness”. A method for the identification of prominent roughness
is described in Clause 7.

6.2 Psychoacoustic tonality calculation method

6.2.1 Overview

Tonality perceptions arising from spectrally-elevated noise bands of various widths and slopes and from non-
pure tones as well as from discrete (pure) tones, and from combinations of these, can be mis-measured or
escape measure in “hybrid” sound pressure based tools and tools sensitive only to discrete tones. To address
such issues, a new psychoacoustically-based tonality calculation method based on the hearing model in
Clause 5 [18] was developed. The applicability of the model was investigated for technical sounds and compared
to established methods of tonality calculation [19], [20], [21]. The method automatically considers the threshold of
hearing because the hearing threshold is built into the hearing model [21].

Recent research results show a strong correlation between tonality perception and the partial loudness of tonal
sound components [22], [23], [24]. Therefore, the new hearing model approach to tonality on the basis of the
perceived loudness of tonal content has been developed. The new model evaluates the nonlinear and time-
dependent specific loudness of both tonal and broadband components, which are separated using the
autocorrelation function. This model has been validated by many sound situations and listening tests [19].

In early publications, Licklider assumed that human pitch perception is based on both spectral and temporal
cues [25]. According to Licklider, the neuronal processing in human hearing applies a running autocorrelation
analysis of the critical band signals. Under this assumption, psychoacoustic tonality phenomena like
difference-tone perception or the missing-fundamental phenomenon (“virtual pitch”) can be explained.

This work inspired the idea to use the sliding autocorrelation function as a processing block in the hearing model
for the calculation of roughness and fluctuation strength [20], [26], [27] and later for other psychoacoustic quantities
like tonality [19] and loudness [1]. The psychoacoustic tonality calculation is based on scaled ACFs $\varphi'_z(m)$ (see
Clause 6.2.2, with 𝑧 denoting the critical band rate scale and 𝑚 denoting the lag), which are calculated using
the specific basis loudness $N'_{\mathrm{basis}}(z)$ (see Formula (25)) and the CBF = 53 rectified band-pass signals 𝑝𝑧(𝑛)
(see Clause 5.1.6) as described in Clause 5. An evaluation of the psychoacoustic tonality method, including
application examples, is given in Annex B.

The further processing for tonality calculation is performed similarly as published in References [19], [20], and
[21] as shown in Figure 6 and described in detail as follows:

Figure 6 — Calculation of tonality based on the scaled ACFs as described in Reference [19], but with
frequency-dependent analysis window borders

6.2.2 Autocorrelation function

Recently, it was proposed to use the autocorrelation function of the band-pass signals to separate tonal content
from noise [1]. The autocorrelation function of white Gaussian noise is characterized by a Dirac impulse. Any
broadband noise signal has at least a non-periodic autocorrelation function with high values at low lags, whereas
the autocorrelation function (ACF) of periodic signals also shows a periodic structure [28]. Thus, the loudness of
the tonal component, and also the loudness of the remaining (noisy) part, can be estimated by analysing the ACF
over a certain range of the lag 𝑚.

The calculation of the sliding ACF is time-consuming. Therefore,
the sliding ACF is calculated block-wise using the discrete Fourier transform (DFT) to shorten computing time.
An overlap of 75 % is used for neighbouring blocks. There is a low-pass effect due to averaging over the block
length. The ACF is calculated on the same rectified blocks $p_{\mathrm{rect},l,z}(n')$ (see Formula (21)) of the overlapping
critical band signals from which the root-mean-square values were calculated in Clause 5.1.7.

For slowly varying low-frequency band-pass signals, a greater block length 𝑠b(𝑧) is necessary than for
higher-frequency bands. Thus, different block lengths are used, depending on the frequency band. The block length is
chosen corresponding to the bandwidth ∆𝑓(𝑧) of each frequency band as described in Formula (10). The given
values for the block size 𝑠b(𝑧) and the hop size 𝑠h(𝑧) also need to be used for the segmentation for the loudness
calculation (see Clause 5.1.5).

Table 4 — Block length 𝒔𝐛 (𝒛) and hop size 𝒔𝐡 (𝒛) for the calculation of the autocorrelation function
∆𝑓(𝑧) 0 − 85 Hz 85 − 170 Hz 170 − 340 Hz > 340 Hz
𝑧 0,5 − 1,5 2−8 8,5 − 12,5 ≥ 13
𝑠b (𝑧) 8192 4096 2048 1024
𝑠h (𝑧) 2048 1024 512 256

For each block of length 𝑠b(𝑧), an unscaled autocorrelation function $\varphi_{\mathrm{unscaled},l,z}(m)$ is calculated in two steps: first, a
2𝑠b-point DFT¹⁰ of $p_{\mathrm{rect},l,z}(n')$ is performed with zero padding, where 𝑠b(𝑧) is the block size given in Table 4,
followed by a calculation of the squared magnitude:

$$P_{\mathrm{rect},l,z}(k) = \left|\mathrm{DFT}_{2s_\mathrm{b}}\left(p_{\mathrm{rect},l,z}(n')\right)\right|^2, \quad 0 \le k < 2s_\mathrm{b}(z), \tag{27}$$

and second, the inverse discrete Fourier transform (IDFT¹¹) of $P_{\mathrm{rect},l,z}(k)$ is calculated¹²:

$$\varphi_{\mathrm{unscaled},l,z}(m) = \mathrm{IDFT}_{2s_\mathrm{b}}\left(P_{\mathrm{rect},l,z}(k)\right), \quad 0 \le m < 2s_\mathrm{b}(z). \tag{28}$$

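The next step is to compute a new estimate of an unbiased normalized autocorrelation function that
compensates for lower overlaps at higher lag 𝑚 (windowed, only values for $0 \le m < \tfrac{3}{4}s_\mathrm{b}(z)$ needed):¹³

$$\varphi_{l,z}(m) = \begin{cases} \dfrac{\varphi_{\mathrm{unscaled},l,z}(m)}{\sqrt{\sum_{n'=0}^{s_\mathrm{b}(z)-m-1} p_{\mathrm{rect},l,z}^{2}(n') \cdot \sum_{n'=0}^{s_\mathrm{b}(z)-m-1} p_{\mathrm{rect},l,z}^{2}(n'+m)} + \varepsilon}, & 0 \le m < \tfrac{3}{4}s_\mathrm{b}(z) \\ 0, & \tfrac{3}{4}s_\mathrm{b}(z) \le m < 2s_\mathrm{b}(z) \end{cases}, \tag{29}$$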

10 The 𝑁-point DFT is defined as $X(k) = \mathrm{DFT}_N(x(n)) = \sum_{n=0}^{N-1} x(n) \cdot e^{-j2\pi kn/N}$.

11 The 𝐾-point IDFT is defined as $x(n) = \mathrm{IDFT}_K(X(k)) = \frac{1}{K}\sum_{k=0}^{K-1} X(k) \cdot e^{+j2\pi kn/K}$.

12 The presented calculations use two-sided spectra. This must be considered in an implementation since some signal
processing libraries also use symmetry properties in their function calls to speed up the calculation and thus expect adjusted
call parameters.
13 A common problem in estimating a blockwise autocorrelation function is the decreasing overlap of the blocks with
increasing lag 𝑚. The unscaled autocorrelation

$$\varphi_z(m) = \sum_{n'=0}^{s_\mathrm{b}(z)-m-1} p_z(n')\,p_z(n'+m)$$

does not consider this problem and thus leads to decreasing values for higher lag values, even if the signal is perfectly
periodic. The commonly used approach for the unbiased autocorrelation, which aims to compensate for this problem, is

$$\varphi_z(m) = \frac{1}{s_\mathrm{b}(z) - |m|} \sum_{n'=0}^{s_\mathrm{b}(z)-m-1} p_z(n')\,p_z(n'+m).$$

However, this approach may lead to unwanted effects, since the result does not necessarily satisfy the condition
𝜑𝑧 (𝑚) ≤ 𝜑𝑧 (0), which is an essential property of the ACF. The new approach for the unbiased autocorrelation solves this
problem by considering the energies of the overlapping parts of the blocks [29]. A drawback of this approach is the
overestimation of the ACF of noise signals for higher lag values, but these values are neglected in further processing.

where the additive constant 𝜀 = 10−12 prevents division by zero 14 . The dependency on the time index 𝑙 is
dropped in the following, since the further processing steps are applied to each time block in the same way.
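
The block-wise ACF of Formulae (27) to (29) can be sketched as follows. The naive DFT below is only meant to keep the example self-contained and is far too slow for the block sizes of Table 4; a real implementation would use an FFT routine. Function names are illustrative, and placing 𝜀 outside the square root follows the reading of Formula (29) used here.

```python
import cmath
import math

EPS = 1e-12  # additive constant from the text


def dft(values, n_points, sign=-1.0):
    """Naive zero-padded DFT; sign=+1.0 gives the inverse transform without the 1/N factor."""
    padded = list(values) + [0.0] * (n_points - len(values))
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * n / n_points)
                for n, v in enumerate(padded))
            for k in range(n_points)]


def normalized_acf(block):
    """Normalized ACF of one rectified block; lags >= 3/4 of the block size are set to zero."""
    s_b = len(block)
    power = [abs(v) ** 2 for v in dft(block, 2 * s_b)]                     # Formula (27)
    unscaled = [v.real / (2 * s_b) for v in dft(power, 2 * s_b, +1.0)]     # Formula (28)
    squares = [v * v for v in block]
    acf = [0.0] * (2 * s_b)
    for m in range(3 * s_b // 4):                                          # Formula (29)
        energy_head = sum(squares[:s_b - m])   # sum of p^2(n') for n' = 0 .. s_b-m-1
        energy_tail = sum(squares[m:])         # sum of p^2(n'+m) for the same n'
        acf[m] = unscaled[m] / (math.sqrt(energy_head * energy_tail) + EPS)
    return acf
```

For a perfectly periodic block, the normalization keeps the ACF at lags equal to multiples of the period close to 1 instead of letting it decay with the shrinking overlap, which is the effect described in footnote 13.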

The autocorrelation function has to be calculated with two different block lengths for some frequency bands
to allow averaging over neighbouring bands in later processing steps, as explained in Clause 6.2.3.

The entire ACF is multiplied by the specific basis loudness of the signal¹⁵:

$$\varphi'_z(m) = N'_{\mathrm{basis}}(z) \cdot \varphi_z(m), \tag{30}$$

resulting in scaled¹⁶ ACFs $\varphi'_z(m)$ which can be used for further analysis of the tonality.

6.2.3 Averaging of ACFs

First, ACFs of neighbouring bands are averaged in order to reduce noise. Averaging is performed over 2𝑁𝐵 + 1
bands, i.e., each band is averaged with the neighbouring 𝑁𝐵 lower and 𝑁𝐵 higher frequency bands. The value
𝑁𝐵 is chosen depending on the block size as described in Table 5. Since averaging needs to be performed with
identical block size, it needs to be ensured that the autocorrelation function of neighbouring bands is available
in the same block size. Thus, for frequency bands close to block size changes, the autocorrelation function
needs to be calculated with two different block sizes. If not enough neighbouring frequency bands exist (for the
lower frequency bands), 𝑁𝐵 is reduced such that averaging is still performed symmetrically centred around the
particular frequency band. An exception is made for the lowest frequency band, which is averaged only with the
second-lowest frequency band. This is necessary because symmetric averaging is not possible without a lower
band, whereas omitting the averaging altogether would result in strong noise artefacts.

Table 5 — Number of bands to average 𝑵𝑩 depending on block size 𝒔𝐛

𝑠b 8192 4096 2048 1024

𝑁𝐵 2 2 1 0

In a next step, the ACFs are averaged over neighbouring blocks in time for further reduction of noise. This block
averaging is performed only for the block sizes 𝑠b = 8192 and 𝑠b = 4096, in which case the ACF in a given block
is averaged with the ACFs in the preceding and the subsequent blocks. The averaging is not performed for the
first and the last block because they have no preceding or subsequent block, respectively.

The outcome of the two averaging steps is a modified, noise-reduced scaled ACF $\bar{\varphi}'_z(m)$.
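
The two averaging steps can be sketched as below. This is a simplified illustration with hypothetical function names: it assumes that all ACFs passed to `average_over_bands` were already calculated with the same block size, and it does not treat the upper end of the band range, where 𝑁𝐵 = 0 applies according to Table 5.

```python
def band_neighbours(index, nb):
    """Indices of the bands averaged with band `index` (index 0 is the lowest band)."""
    if index == 0:
        # Exception from the text: lowest band is averaged with the second-lowest only.
        return [0, 1] if nb > 0 else [0]
    nb = min(nb, index)  # reduce NB so that the average stays symmetric
    return list(range(index - nb, index + nb + 1))


def average_over_bands(acfs, index, nb):
    """Average the ACF of one band with its neighbours, lag by lag."""
    group = [acfs[i] for i in band_neighbours(index, nb)]
    return [sum(values) / len(group) for values in zip(*group)]


def average_over_time(acf_blocks):
    """Average each block with its predecessor and successor; first and last stay unchanged."""
    out = [list(block) for block in acf_blocks]
    for l in range(1, len(acf_blocks) - 1):
        trio = acf_blocks[l - 1:l + 2]
        out[l] = [sum(values) / 3.0 for values in zip(*trio)]
    return out
```

According to the text, `average_over_time` would only be applied to bands with block sizes 8192 and 4096.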

6.2.4 Application of ACF window

A lag window with frequency-dependent limits (𝜏start (𝑧) and 𝜏end (𝑧)) according to Formulae (31) and (32) is
applied to the ACF 𝜑̅𝑧 ′(𝑚) to separate tonal from noisy content:

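$$\tau_\mathrm{start}(z) = \max\left(\frac{0{,}5}{\Delta f(z)},\ \tau_\mathrm{min}\right), \tag{31}$$

$$\tau_\mathrm{end}(z) = \max\left(\frac{4}{\Delta f(z)},\ \tau_\mathrm{start}(z) + 1\ \mathrm{ms}\right). \tag{32}$$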

14 The additive constant 𝜀 = 10−12 is used throughout the complete document to avoid division by zero in several formulae.

15 $N'_{\mathrm{basis}}(z)$ is the specific loudness calculated in Formula (25).

16 The ACF is scaled such that $\varphi'_z(0)$ represents the specific loudness $N'(z)$.

Here, ∆𝑓(𝑧) is the bandwidth of the critical band centred at 𝑧, and 𝜏min is 2 ms.

It can be shown that the autocorrelation function of a periodic signal is itself periodic [28]. In the case of a pure
tone, the period of the ACF equals the period of the tone. Consequently, the signal energy of a pure tone can
be identified at multiples of the signal period. For white Gaussian noise, the autocorrelation function is a Dirac
impulse, weighted by the power spectral density of the noise [28]. In case of broadband white noise, the
autocorrelation function converges towards a Dirac impulse.

[Figure 7: ACF plotted over the lag 𝑡 / ms, with the ACF window between 𝑡start and 𝑡end and the marker 𝑡0]

Figure 7 — Positioning of the ACF window for tonal content separation. This example shows the
autocorrelation function of a tone in pink background noise
Figure 7 visualizes the placement of the ACF window for the autocorrelation function of a tone in pink
background noise.17

From the calculated lag times, indices are calculated as

𝑚start (𝑧) = ceil(𝜏start (𝑧) ⋅ 𝑟𝑠 ) − 1, (33)

𝑚end (𝑧) = floor(𝜏end (𝑧) ⋅ 𝑟𝑠 ) − 1, (34)

where the ceil(𝑥) operator gives the smallest integer greater than or equal to 𝑥 and the floor(𝑥)
operator gives the greatest integer smaller than or equal to 𝑥. The window is applied by setting
all elements of $\bar{\varphi}'_z(m)$ except the ones from index 𝑚start(𝑧) to index 𝑚end(𝑧) to zero and subtracting the mean
of the windowed part of the ACF:

$$\varphi'_{z,\tau}(m) = \begin{cases} \bar{\varphi}'_z(m) - \dfrac{\sum_{m=m_\mathrm{start}(z)}^{m_\mathrm{end}(z)} \bar{\varphi}'_z(m)}{M}, & m_\mathrm{start}(z) \le m \le m_\mathrm{end}(z) \\ 0, & \text{else} \end{cases} \tag{35}$$

𝑀 = 𝑚end(𝑧) − 𝑚start(𝑧) + 1 is the number of samples in the window.
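
Formulae (31) to (35) can be sketched as follows, with times expressed in seconds. The function names are illustrative; the bandwidth ∆𝑓(𝑧) is passed in as a number in Hz.

```python
import math

RATE_HZ = 48000.0   # sampling rate r_s
TAU_MIN_S = 0.002   # tau_min = 2 ms


def window_indices(delta_f_hz):
    """Formulae (31) to (34): first and last lag index of the ACF window."""
    tau_start = max(0.5 / delta_f_hz, TAU_MIN_S)         # Formula (31)
    tau_end = max(4.0 / delta_f_hz, tau_start + 0.001)   # Formula (32)
    m_start = math.ceil(tau_start * RATE_HZ) - 1         # Formula (33)
    m_end = math.floor(tau_end * RATE_HZ) - 1            # Formula (34)
    return m_start, m_end


def apply_lag_window(acf, m_start, m_end):
    """Formula (35): zero outside the window, mean removed inside it."""
    count = m_end - m_start + 1                          # M
    mean = sum(acf[m_start:m_end + 1]) / count
    return [acf[m] - mean if m_start <= m <= m_end else 0.0
            for m in range(len(acf))]
```

For a band with ∆𝑓(𝑧) = 130 Hz, for example, this gives 𝑚start = 184 and 𝑚end = 1475, i.e. 𝑀 = 1292 samples.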

17 The motivation for the limits given in Formulae (31) and (32) is as follows: In Figure 7, the energy distribution at small
lags results from the noisy background and is disregarded by appropriately choosing the lower window border. Nevertheless,
narrow-band noise also causes a perception of tonality when the bandwidth is comparatively small (i.e., few critical bands).
This effect leads to a trade-off in the placement of the window borders: For a smaller bandwidth, the effect of the low-pass
filtered noise on the ACF reaches higher lags than for a larger bandwidth. Thus, the window needs to be moved to higher
lags for a lower bandwidth. On the other hand, higher lags are less reliable because they are calculated from a smaller
number of samples. Therefore, the upper limit of the window should not be chosen too large.

6.2.5 Estimation of tonal loudness

The specific loudness of the tonal component is estimated by evaluating the spectrum of the ACF inside the lag
window, $\varphi'_{z,\tau}(m')$. A 16384-point DFT¹⁸ of the 𝑀 samples is performed with zero-padding, where the number
16384 is chosen as two times the largest block size 𝑠b(𝑧) given in Table 4:

$$\Phi'_{z,\tau}(k) = \mathrm{DFT}_{16384}\left(\varphi'_{z,\tau}(m')\right). \tag{36}$$

The maximum magnitude of the spectrum is determined, i.e. the largest tonal content is extracted¹⁹:

$$\hat{N}'_{\mathrm{tonal}}(z) = \begin{cases} 2\,\dfrac{\max_k \left|\Phi'_{z,\tau}(k)\right|}{M/2}, & 2\,\dfrac{\max_k \left|\Phi'_{z,\tau}(k)\right|}{M/2} \le \bar{\varphi}'_z(0) \\ \bar{\varphi}'_z(0), & \text{else} \end{cases}. \tag{37}$$

$\hat{N}'_{\mathrm{tonal}}(z)$ is a first estimation of the specific loudness of the tonal component. The frequency 𝑓ton(𝑧) of this
component in the critical band centred around 𝑧 can be estimated by first finding the DFT index 𝑘max
corresponding to the maximum of $\Phi'_{z,\tau}(k)$,

$$k_\mathrm{max}(z) = \arg\max_k \left(\Phi'_{z,\tau}(k)\right), \tag{38}$$

and calculating the corresponding frequency

$$f_\mathrm{ton}(z) = k_\mathrm{max}(z) \cdot \frac{r_\mathrm{s}}{16384}. \tag{39}$$

While this approach is capable of analysing tonalities with a rather high frequency resolution, it might
underestimate tonal content when the corresponding frequency changes quickly inside of one block. This should
be considered, even though the adaptive block size with smaller blocks for high frequencies aims at reducing
this problem, since quickly varying frequencies usually occur at high frequencies.
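
The estimation in Formulae (36) to (39) can be sketched as below. Several points are assumptions of this sketch rather than statements of the standard: the function name, the direct (slow) evaluation of the DFT, the restriction of the peak search to the bins 0 to 𝑁/2 (the spectrum of a real-valued input is symmetric), and the use of the magnitude in the peak search. Only the 𝑀 samples inside the window are passed in, since the position of the window affects the phase of the DFT but not its magnitude.

```python
import cmath
import math


def tonal_loudness_estimate(window_samples, acf_zero_lag, rate_hz=48000.0, n_dft=16384):
    """Return (first tonal loudness estimate, estimated tone frequency in Hz)."""
    m_count = len(window_samples)                     # M
    best_k, best_magnitude = 0, 0.0
    for k in range(n_dft // 2 + 1):                   # assumption: lower half of the spectrum
        value = sum(v * cmath.exp(-2j * math.pi * k * m / n_dft)
                    for m, v in enumerate(window_samples))
        if abs(value) > best_magnitude:
            best_k, best_magnitude = k, abs(value)
    estimate = 2.0 * best_magnitude / (m_count / 2.0)
    # Formula (37): the estimate is limited by the ACF value at zero lag.
    loudness = estimate if estimate <= acf_zero_lag else acf_zero_lag
    return loudness, best_k * rate_hz / n_dft         # Formula (39)
```

The parameter `n_dft` is exposed only so that the sketch can be tried with small sizes; the text prescribes 16384 points.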

6.2.6 Resampling to common time basis

For the further processing, the time dependency of each processed block becomes important. Thus, the
time index 𝑙 (which was dropped in Clause 6.2.2) needs to be considered again. Since the results of the different bands
are on different time bases at this stage of the processing due to the different block lengths, the bands with a
larger block size are resampled to the time basis of the blocks calculated with the smallest block
size of 1024. The resampling is done by linear interpolation. Table 6 gives the interpolation factors 𝑖 for each critical
band 𝑧.

Table 6 — Interpolation factors for critical bands with different block size
𝑧 0,5 − 1,5 2−8 8,5 − 12,5 ≥ 13
𝑠b (𝑧) 8192 4096 2048 1024
𝑖 8 4 2 1

For all time-dependent variables, (𝑖 − 1) new samples are inserted between each pair of adjacent samples by
linear interpolation.
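
The interpolation with the factors of Table 6 can be sketched as follows. The function name is illustrative, and the handling of the sequence end (the last sample is kept without extrapolation) is an assumption of this sketch.

```python
def upsample_linear(values, factor):
    """Insert (factor - 1) linearly interpolated samples between adjacent values."""
    if factor == 1 or len(values) < 2:
        return list(values)
    out = []
    for left, right in zip(values, values[1:]):
        out.extend(left + (right - left) * step / factor for step in range(factor))
    out.append(values[-1])  # assumption: keep the final sample, no extrapolation
    return out
```

With a factor of 4 (block size 4096), the two samples 0 and 8 become 0, 2, 4, 6, 8.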

18 The 𝑁-point DFT is defined as $X(k) = \mathrm{DFT}_N(x(n)) = \sum_{n=0}^{N-1} x(n) \cdot e^{-j2\pi kn/N}$.

19 The normalization by 𝑀/2 is necessary to calculate the energy of the windowed ACF from the DFT result. The scaling factor
2 is necessary because of the half-wave rectified signal.

The final time index 𝑙 is the one corresponding to the original time index of the smallest block size. In the
following, $\hat{N}'_{\mathrm{tonal}}(l,z)$ denotes the estimation of the specific loudness of the tonal component in the critical band
centred around 𝑧 at time index 𝑙. The sampling rate of these estimations is $r_\mathrm{sd} = \frac{r_\mathrm{s}}{s_\mathrm{h,min}} = \frac{48000\ \mathrm{Hz}}{256} = 187{,}5\ \mathrm{Hz}$.
Here, the results belonging to the zero-padding done at the start of the processing need to be removed. Thus,
the last evaluated block shall be:

$$l_\mathrm{end} = \mathrm{ceil}\left(\frac{n_\mathrm{samples}}{r_\mathrm{s}} \cdot r_\mathrm{sd}\right). \tag{40}$$

6.2.7 Noise reduction

$\hat{N}'_{\mathrm{tonal}}(l,z)$ is a first estimation of the specific loudness of the tonal component. However, the specific loudness
of the tonal component is usually overestimated at this stage of the estimation process due to the tonal character
of noise in the narrow-band filtered bands. Thus, further noise reduction is necessary. This is done by applying
a nonlinear sigmoid weighting of tonal vs. noise components. $\hat{N}'_{\mathrm{tonal}}(l,z)$ is the tonal part of the specific loudness
of the complete band-pass signal. The corresponding specific loudness of the complete band-pass signal is
given by the autocorrelation function at zero lag:

$$N'_{\mathrm{signal}}(l,z) = \bar{\varphi}'_{l,z}(m = 0). \tag{41}$$

A first approximation of the signal-to-noise ratio in the band of interest can be derived as

$$\widehat{\mathrm{SNR}}(l,z) = \frac{\hat{N}'_{\mathrm{tonal}}(l,z)}{N'_{\mathrm{signal}}(l,z) - \hat{N}'_{\mathrm{tonal}}(l,z) + \varepsilon}. \tag{42}$$

Since the estimation of the tonal component might contain unsteady parts, low-pass filtering is performed over
the temporal dimension of $\hat{N}'_{\mathrm{tonal}}(l,z)$ and $\widehat{\mathrm{SNR}}(l,z)$. A cutoff frequency of 3,5 Hz is used.²⁰ Low-pass filters with
the same filter coefficients are used for all critical bands. The filter defined in Formula (11) is used with order
𝑘 = 3. The filter coefficients of the low-pass filter ℎLP(𝑙) can be calculated according to Formulae (14) and
(15).²¹ The filtered signals are then

$$\tilde{N}'_{\mathrm{tonal}}(l,z) = \hat{N}'_{\mathrm{tonal}}(l,z) * h_\mathrm{LP}(l) \tag{43}$$

and

$$\widetilde{\mathrm{SNR}}(l,z) = \widehat{\mathrm{SNR}}(l,z) * h_\mathrm{LP}(l), \tag{44}$$

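where ∗ denotes the convolution. These filtered signals are used for further processing in Formulae (45) and
(47).

Band-dependent noise reduction is achieved by weighting the filtered specific loudness $\tilde{N}'_{\mathrm{tonal}}(l,z)$ of the tonal
component by a sigmoid function

$$\mathrm{nr}(l,z) = \begin{cases} 1 - e^{-\alpha\left(\frac{\widetilde{\mathrm{SNR}}(l,z)}{g(z)} - \beta\right)}, & e^{-\alpha\left(\frac{\widetilde{\mathrm{SNR}}(l,z)}{g(z)} - \beta\right)} < 1 \\ 0, & e^{-\alpha\left(\frac{\widetilde{\mathrm{SNR}}(l,z)}{g(z)} - \beta\right)} \ge 1 \end{cases}, \tag{45}$$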

with parameters 𝛼 and 𝛽 as given in Table 7.

20 Please note that the bandwidth of the low-pass filter is twice as large as the cut-off frequency. Therefore, the variable 𝑑 in
Formulae (14) and (15) should be calculated using $\tau(z) = \frac{1}{32} \cdot 6 \cdot \frac{1}{7\ \mathrm{Hz}} = 0{,}0268\ \mathrm{s}$ for all critical bands according to
Formula (8).
21 For Formula (15), the following factors 𝑒𝑖 have to be used for a filter order of 𝑘 = 3: 𝑒0 = 0, 𝑒1 = 1, 𝑒2 = 1.

Table 7 — Parameters for the noise reduction function 𝐧𝐫(𝒍, 𝒛) (Formula (45))
Parameter 𝛼 𝛽
Value 20 0,07

Sigmoidal weighting significantly reduces wrongly-detected specific loudness of tonal components for
broadband signals. The frequency dependent factor 𝑔(𝑧) is calculated as

$$g(z) = \frac{c(s_\mathrm{b}(z))}{F(z)^{d(s_\mathrm{b}(z))}}, \tag{46}$$

where the parameters 𝑐 and 𝑑 are given in Table 8 depending on the block size 𝑠b (𝑧) (see Table 4). This
function mitigates frequency-dependent overestimations of the tonality estimation (due to the different block
sizes) such that SNR(𝑙, 𝑧)/𝑔(𝑧) is approximately constant over 𝑧 for pink noise signals.

Table 8 — Parameters for the frequency dependent factor 𝒈(𝒛) (Formula (46))
𝑠b (𝑧) 8192 4096 2048 1024
𝑐(𝑠b (𝑧)) 18,21 12,14 417,54 962,68
𝑑(𝑠b (𝑧)) 0,36 0,36 0,71 0,69

The specific loudness of the tonal component, $N'_{\mathrm{tonal}}(l,z)$, is then modelled as

$$N'_{\mathrm{tonal}}(l,z) = \mathrm{nr}(l,z) \cdot \tilde{N}'_{\mathrm{tonal}}(l,z). \tag{47}$$
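
The noise reduction of Formulae (42) and (45) to (47) can be sketched for a single band and time index as follows. The function names are illustrative, and the inputs of `tonal_loudness` are assumed to have been low-pass filtered already according to Formulae (43) and (44).

```python
import math

ALPHA, BETA = 20.0, 0.07   # Table 7
# Table 8: block size -> (c, d)
G_PARAMS = {8192: (18.21, 0.36), 4096: (12.14, 0.36),
            2048: (417.54, 0.71), 1024: (962.68, 0.69)}
EPS = 1e-12


def first_snr(n_tonal_hat, n_signal):
    """Formula (42): first approximation of the signal-to-noise ratio."""
    return n_tonal_hat / (n_signal - n_tonal_hat + EPS)


def g_factor(centre_hz, block_size):
    """Formula (46): frequency-dependent factor g(z)."""
    c, d = G_PARAMS[block_size]
    return c / centre_hz ** d


def noise_reduction_weight(snr_filtered, g):
    """Formula (45): sigmoid weight, limited at zero."""
    decay = math.exp(-ALPHA * (snr_filtered / g - BETA))
    return 1.0 - decay if decay < 1.0 else 0.0


def tonal_loudness(n_tonal_filtered, snr_filtered, centre_hz, block_size):
    """Formula (47): weighted specific loudness of the tonal component."""
    weight = noise_reduction_weight(snr_filtered, g_factor(centre_hz, block_size))
    return weight * n_tonal_filtered
```

The weight is zero as long as the scaled signal-to-noise ratio does not exceed 𝛽 and approaches 1 for clearly tonal bands.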

6.2.8 Calculation of time-dependent specific tonality

The perceived tonality is not only dependent on the tonal content in each band, but also on the signal-to-noise
ratio over all bands at each time instance 𝑙. Thus, to finally model the tonality of the signal, the overall loudness
signal-to-noise ratio is evaluated across all bands. First, a new estimation of the specific loudness of the noise
component is calculated, using the final estimation of the specific loudness of the tonal component:
$$N'_{\mathrm{noise}}(l,z) = N'_{\mathrm{signal}}(l,z) * h_\mathrm{LP}(l) - N'_{\mathrm{tonal}}(l,z). \tag{48}$$

The overall loudness signal-to-noise ratio is calculated as

$$\mathrm{SNR}(l) = \frac{\max_z N'_{\mathrm{tonal}}(l,z)}{\varepsilon + \sum_z N'_{\mathrm{noise}}(l,z)}. \tag{49}$$

A scaling factor

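$$q(l) = \begin{cases} 1 - e^{-A \cdot (\mathrm{SNR}(l) - B)}, & e^{-A \cdot (\mathrm{SNR}(l) - B)} < 1 \\ 0, & e^{-A \cdot (\mathrm{SNR}(l) - B)} \ge 1 \end{cases} \tag{50}$$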

is applied multiplicatively. The parameters 𝐴 and 𝐵 are given in Table 9.

Table 9 — Parameters for the scaling factor (Formula (50))

Parameter 𝐴 𝐵
Value 35 0,003

Thus, the final estimation of the time-dependent specific tonality is given as:

$$T'(l,z) = c_\mathrm{T} \cdot q(l) \cdot N'_{\mathrm{tonal}}(l,z), \tag{51}$$

where 𝑐T = 2,8785151 is a calibration factor. The time index 𝑙 can be mapped to the time 𝑡 in seconds as:

$$t = \frac{l}{r_\mathrm{sd}} = \frac{l}{187{,}5}\ \mathrm{s}. \tag{52}$$

The unit of the tonality calculated by the psychoacoustic tonality method is tuHMS (tonality units, where HMS stands
for “according to the Hearing Model of Sottek” described in Clause 5). The psychoacoustic tonality method is
calibrated using a 1 kHz tone with a sound pressure level of 40 dB. For this signal, the tonality value shall be
1 tuHMS²².
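
Formulae (48) to (52) can be sketched for one time index as follows. The function names are illustrative; the input lists hold one value per critical band, and the signal loudness is assumed to be low-pass filtered already, as required by Formula (48).

```python
import math

A, B = 35.0, 0.003     # Table 9
C_T = 2.8785151        # calibration factor
EPS = 1e-12
RATE_SD_HZ = 187.5     # r_sd


def overall_snr(n_tonal_bands, n_noise_bands):
    """Formula (49): largest tonal loudness over summed noise loudness."""
    return max(n_tonal_bands) / (EPS + sum(n_noise_bands))


def scaling_factor(snr):
    """Formula (50): sigmoid scaling factor q(l), limited at zero."""
    decay = math.exp(-A * (snr - B))
    return 1.0 - decay if decay < 1.0 else 0.0


def specific_tonality(n_tonal_bands, n_signal_filtered_bands):
    """Formulae (48) and (51): specific tonality T'(l, z) for all bands at one time index."""
    n_noise = [s - t for s, t in zip(n_signal_filtered_bands, n_tonal_bands)]
    q = scaling_factor(overall_snr(n_tonal_bands, n_noise))
    return [C_T * q * t for t in n_tonal_bands]


def block_time_s(l):
    """Formula (52): time in seconds belonging to time index l."""
    return l / RATE_SD_HZ
```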

6.2.9 Calculation of averaged specific tonality

The specific tonality 𝑇′(𝑧) is taken by averaging the time-dependent specific tonality 𝑇′(𝑙, 𝑧). The averaging is
performed as follows:

1. The first tonality values 𝑇′(𝑙, 𝑧) for 0 ≤ 𝑙 ≤ 56 (approximately corresponding to the first 300 ms of the
input signal) are discarded due to the transient responses of the digital filters.
2. Only values that exceed a specific tonality value of 0,02 tuHMS are used for averaging. This step
ensures that the single value is independent of parts of the signal without noticeable tonal
components.

This averaging can be described mathematically as

$$T'(z) = \frac{1}{\#(l'(z)) + \varepsilon} \sum_{l'} T'(l'(z), z), \qquad (53)$$

with

𝑙′(𝑧) = {57 ≤ 𝑙 ≤ 𝑙end | 𝑇′(𝑙, 𝑧) > 0,02 tu𝐻𝑀𝑆 }, (54)

using set notation²³. The frequencies 𝑓ton,z (𝑧) are calculated by averaging the frequency 𝑓ton (𝑙, 𝑧)
(see Formula (39)²⁴) accordingly over the corresponding time indices:

$$f_{\mathrm{ton},z}(z) = \frac{1}{\#(l'(z)) + \varepsilon} \sum_{l'} f_{\mathrm{ton}}(l'(z), z). \qquad (55)$$
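
The averaging in Formulae (53) to (55) can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name, the nested-list layout `t_spec[l][z]` and the value of 𝜀 are assumptions made here.

```python
L_START = 57       # indices 0..56 are discarded (transient responses)
THRESHOLD = 0.02   # tuHMS, see Formula (54)
EPS = 1e-12        # assumed value for the small constant epsilon


def averaged_specific_tonality(t_spec, f_ton, threshold=THRESHOLD,
                               l_start=L_START, eps=EPS):
    """Return (T'(z), f_ton,z(z)) as two lists over the bands z.

    t_spec[l][z] is the time-dependent specific tonality T'(l, z);
    f_ton[l][z] is the corresponding frequency f_ton(l, z).
    """
    n_bands = len(t_spec[0])
    t_avg, f_avg = [], []
    for z in range(n_bands):
        # Formula (54): time indices whose tonality exceeds the threshold.
        idx = [l for l in range(l_start, len(t_spec)) if t_spec[l][z] > threshold]
        denom = len(idx) + eps
        t_avg.append(sum(t_spec[l][z] for l in idx) / denom)   # Formula (53)
        f_avg.append(sum(f_ton[l][z] for l in idx) / denom)    # Formula (55)
    return t_avg, f_avg
```

Because of the constant 𝜀 in the denominator, a band without any value above the threshold yields an averaged tonality of 0 instead of a division by zero.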

6.2.10 Calculation of time-dependent tonality

The time-dependent tonality 𝑇(𝑙) is taken as the maximum of the time-dependent specific tonalities 𝑇 ′ (𝑙, 𝑧) over
all bands 𝑧. If the user is only interested in one specific tonal event, a user-defined frequency range [𝑓L , 𝑓H ] can
be specified. In this case, only critical bands whose critical band number 𝑧 fulfils the following requirements
are considered:

22 The calibration factor 𝑐T can be adjusted within a tolerance of 0,25 % to account for the effects of different
implementations.
23 In set notation, {𝑥 | Φ(𝑥)} denotes all elements 𝑥 with the property Φ(𝑥). #(𝐴) denotes the cardinality (i.e. the number
of elements) of a set 𝐴.
24 Note that 𝑓ton (𝑙, 𝑧) is denoted 𝑓ton (𝑧) in Formula (39), since the time index 𝑙 was neglected in this computation step.

$$16\,\mathrm{Hz} < f_{\mathrm{L}} < \frac{F(z) + F(z + 0{,}5)}{2} \qquad (56)$$

and

$$20\,\mathrm{kHz} > f_{\mathrm{H}} > \frac{F(z) + F(z - 0{,}5)}{2} \qquad (57)$$

leading to a range of critical bands between 𝑧L and 𝑧H . With this calculation procedure, the actually considered
frequency range is [𝑓L′ , 𝑓H′ ] with

$$f'_{\mathrm{L}} = \min_f R(z_{\mathrm{L}}) \qquad (58)$$

and

$$f'_{\mathrm{H}} = \max_f R(z_{\mathrm{H}}) \qquad (59)$$

with the frequency range 𝑅(𝑧)

$$R(z) = \left[F(z) - \frac{\Delta f(z)}{2},\; F(z) + \frac{\Delta f(z)}{2}\right]. \qquad (60)$$

All frequency bands between 𝑧L and 𝑧H are used for the maximum search:

$$T(l) = \max_{z \in [z_{\mathrm{L}}, z_{\mathrm{H}}]} T'(l, z). \qquad (61)$$

The corresponding frequency 𝑓ton,l (𝑙) is given as

𝑓ton,l (𝑙) = 𝑓ton (𝑙, 𝑧max (𝑙)). (62)

where 𝑧max (𝑙) is the band in which the maximum of the time-dependent specific tonality 𝑇 ′ (𝑙, 𝑧) was found for a
given time instance 𝑙.

6.2.11 Calculation of representative values

The single value 𝑇 of the tonality of the signal is obtained by averaging the time-dependent overall tonality 𝑇(𝑙).
The averaging is performed in the same way as described in Formula (53):

$$T = \frac{1}{\#(l')} \sum_{l'} T(l'), \qquad (63)$$

with

𝑙′ = {57 ≤ 𝑙 ≤ 𝑙end | 𝑇(𝑙) > 0,02 tu𝐻𝑀𝑆 }. (64)
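
Formulae (61) to (64) can be sketched as follows. The band limits are passed as list positions (an assumption of this sketch; the Standard identifies bands by the critical band number 𝑧), and returning 0 for an empty set 𝑙′ is also an assumption, since Formula (63) has no 𝜀 term.

```python
def time_dependent_tonality(t_spec, f_ton, z_lo=0, z_hi=None):
    """Formulae (61) and (62): maximum over bands and the frequency of that band.

    t_spec[l][z] and f_ton[l][z] are nested lists over time index and band.
    """
    t_of_l, f_of_l = [], []
    for row_t, row_f in zip(t_spec, f_ton):
        hi = len(row_t) - 1 if z_hi is None else z_hi
        z_max = max(range(z_lo, hi + 1), key=lambda z: row_t[z])
        t_of_l.append(row_t[z_max])
        f_of_l.append(row_f[z_max])
    return t_of_l, f_of_l


def single_value(t_of_l, threshold=0.02, l_start=57):
    """Formulae (63) and (64): mean of T(l) over indices >= 57 above the threshold."""
    vals = [t for t in t_of_l[l_start:] if t > threshold]
    # Assumption: an empty set yields 0 rather than a division by zero.
    return sum(vals) / len(vals) if vals else 0.0
```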

6.3 Information to be recorded for prominent tonalities

For stationary sounds, a tonal component in the critical band 𝑧tonal is identified as prominent, if the specific
tonality 𝑇′(𝑧tonal ) exceeds a value of 0,4 tuHMS and the specific tonality has a local maximum in 𝑧tonal .
Additionally, the frequency 𝑓ton,𝑧 (𝑧tonal ) needs to be in the range [𝐹(𝑧tonal − 1), 𝐹(𝑧tonal + 1)] for the component
to be identified as prominent. If the user is only interested in one specific tonal event, a user-defined frequency
range [𝑓L , 𝑓H ] can be specified. Then, only tonalities that are in the frequency range [𝑓L′ , 𝑓H′ ]²⁵ are considered.
For each tonal component that has been identified as prominent according to this standard, the following
information shall be recorded:

a) if a frequency range was defined, the resulting frequency range [𝑓L′ , 𝑓H′ ] for searching prominent tonalities
(Formulae (56) and (57));

b) the frequency, 𝑓ton,z (𝑧tonal ) , in hertz, of the tonality in the corresponding critical band 𝑧tonal (see
Formula (55));

c) details of the method used to evaluate the tonality (ECMA 418 – Part 2: Psychoacoustic metrics based on
the hearing model – Clause 6.2 Psychoacoustic tonality calculation method), together with a reference to this
Standard;

d) the psychoacoustic tonality value 𝑇′(𝑧tonal ) (see Formula (53)).

e) optionally, the time-dependent specific tonality 𝑇 ′ (𝑙, 𝑧) (see Formula (51)).

For non-stationary sounds, a signal is considered to contain prominent tonalities, if the time-independent single
value 𝑇 of the time-dependent tonality 𝑇(𝑙)²⁶ exceeds a value of 0,4 tuHMS (see Formula (63)). If the signal has
been identified to contain prominent tonalities according to this clause, the following information shall be
recorded:

a) if a frequency range was defined, the resulting frequency range [𝑓L′ , 𝑓H′ ] for searching prominent tonalities
(Formulae (56) and (57));

b) the time-dependent frequency, 𝑓ton,l (𝑙), in hertz (see Formula (62)) of the time-dependent tonality 𝑇(𝑙);

d) the time-dependent psychoacoustic tonality value 𝑇(𝑙) (see Formula (61));

e) the time-independent single value 𝑇 (see Formula (63));

f) optionally: the time-dependent specific tonality 𝑇 ′ (𝑙, 𝑧) (see Formula (51)).

NOTE The criterion for prominence of tonalities for the psychoacoustic tonality calculation method (Clause 6.2) is
independent of frequency: 0,4 tuHMS (HMS stands for tonality units “according to the Hearing Model of Sottek” described in
Clause 5).

25 [𝑓L′ , 𝑓H′ ] is calculated from [𝑓L , 𝑓H ] as explained in Formulae (56) to (60).

26 The time index 𝑙 can be mapped to a time in seconds according to Formula (52).

7 Identification and evaluation of prominent roughness using a psychoacoustic
roughness calculation method
This clause describes a perception-model-based procedure for determining whether or not noise emissions
contain prominent roughness, and if present, their strengths: the psychoacoustic roughness calculation method.
The calculation is based on the specific loudness as described in Clause 5.

The auditory sensation roughness describes, together with the auditory sensation fluctuation strength, the
perception of temporal variations of sounds. While fluctuation strength covers slow variations (typically below
20 Hz), roughness is produced by faster variations up to around 500 Hz. The maximum of the auditory sensation
is located at around 4 Hz modulation rate for fluctuation strength and 70 Hz modulation rate for roughness. Both
auditory sensations can be produced either by amplitude modulation or by frequency modulation. Generally,
periodic modulations produce higher values of fluctuation strength and roughness than stochastic variations.

Roughness is used for the subjective evaluation of sound characteristics as well as for sound design. With
increasing roughness, sounds increasingly attract attention and are perceived as increasingly aggressive
and annoying, even when neither the loudness nor the A-weighted sound pressure level changes.

The impression of roughness arises if a time-variant envelope is present in one critical band, for example tones
with a temporal structure because of a change in amplitude or frequency. If these variations are rather slow (for
example lower than 10 Hz), the auditory system is capable of following the changes and a perception of fluctuation
arises. With increasing modulation rates, sensations like R-roughness (around 20 Hz) arise and turn into actual
roughness, where the auditory system is not capable of resolving the temporal variations. Variations of the
envelope with modulation rates between 20 Hz and 300 Hz are perceived as “rough”. Roughness depends on
the center frequency, the modulation rate 𝑓mod , the degree of modulation 𝑚 and the sound pressure level.
Frequency-modulated sounds produce a roughness similar to that of amplitude-modulated sounds. The unit of
roughness is “asper”. The reference signal with 𝑅 = 1 asper is an amplitude-modulated sinusoid with a center
frequency of 1 kHz, 𝑚 = 1, 𝑓mod = 70 Hz and a sound pressure level of 60 dB.

Roughness originates, for example, from a multiplicative combination of two vibrations (such as the
gear mesh frequency and the rotational speed in a gear wheel) or from the superposition of two or more tonal or
narrowband sounds with similar frequencies. In practice, roughness often occurs in rotating components
(engines, gearboxes, fans).

7.1 Psychoacoustic roughness calculation method

7.1.1 Overview

The psychoacoustic roughness calculation is based on scaled envelope power spectra ΦE,𝑙,𝑧 (𝑘), which are
calculated using the specific basis loudness 𝑁′basis (𝑙, 𝑧) (see Formula (25)) and the envelope of the CBF = 53
segmented band-pass signals 𝑝𝑙,𝑧 (𝑛′) (see Clause 5.1.5) as described in Clause 5. For the calculation of these
values, a block size of 𝑠b = 16384 and a hop size of 𝑠h = 4096 shall be used for the segmentation in
Clause 5.1.5.

The further processing for roughness calculation is shown in Figure 8 and described in detail as follows:

Figure 8 — Calculation of roughness based on band pass signals and the specific basis loudness
calculated as described in Clause 5.

7.1.2 Envelope calculation and downsampling

The low-frequency envelopes are calculated from the segmented band-pass filtered sound pressure signals
𝑝𝑙,𝑧 (𝑛′) (see Clause 5.1.5) using the Hilbert transform. The envelopes 𝑝E,𝑙,𝑧 (𝑛′) are taken as the magnitude of the
analytic signals

𝑝E,𝑙,𝑧 (𝑛′) = |𝑝𝑙,𝑧 (𝑛′) + 𝑗ℋ(𝑝𝑙,𝑧 (𝑛′))|, (65)

with ℋ(∙) denoting the Hilbert transform. Since the envelopes only contain low modulation rates, they are
downsampled by a factor of 32. The resulting downsampled envelopes of the band-pass signals are denoted
𝑝E,𝑙,𝑧 (𝑛̃)²⁷. With this step, the sampling rate changes from 𝑟s = 48 kHz to 𝑟̃s = 1500 Hz. The block size 𝑠̃b = 512
and the hop size 𝑠̃h = 128 correspond to the block size 𝑠b = 16384 and the hop size 𝑠h = 4096 used for the
segmentation in Clause 5.1.5.
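
The envelope calculation of Formula (65) can be illustrated with the following sketch. It builds the analytic signal with a naive DFT so that it runs without external libraries; a practical implementation would use an FFT-based Hilbert transform. Taking every 32nd sample without an anti-aliasing filter is an assumption of this sketch, since the text above only states the downsampling factor.

```python
import cmath
import math


def analytic_signal(x):
    """Discrete analytic signal x + j*H(x), via a naive O(N^2) DFT."""
    n = len(x)
    spec = [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue                      # keep DC and Nyquist unchanged
        # double positive frequencies, remove negative frequencies
        spec[k] = 2 * spec[k] if k < (n + 1) // 2 else 0
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def envelope(x):
    """Formula (65): magnitude of the analytic signal."""
    return [abs(v) for v in analytic_signal(x)]


def downsample(x, factor=32):
    """Keep every factor-th sample (assumption: no anti-aliasing filter)."""
    return x[::factor]
```

With a factor of 32, a block of 16384 samples at 48 kHz becomes a block of 512 samples at 1500 Hz, matching the values given above.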

27 𝑛̃ refers to the index of the downsampled signal.

7.1.3 Calculation of scaled power spectrum

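The envelopes 𝑝E,𝑙,𝑧 (𝑛̃) are windowed with a von-Hann window²⁸ 𝑤Hann (𝑛̃), and a scaled power spectrum²⁹
ΦE,𝑙,𝑧 (𝑘) is generated by using

$$\Phi_{E,l,z}(k) = \begin{cases} 0, & N'_{\max}(l) \cdot \varphi_{E,l,z}(0) = 0 \\ \dfrac{\left(N'_{\mathrm{basis}}(l, z)\right)^2 \cdot \left(\frac{\mathrm{Bark_{HMS}}}{\mathrm{sone_{HMS}}}\right)}{N'_{\max}(l) \cdot \varphi_{E,l,z}(0)} \cdot \left|\mathrm{DFT}_{\tilde{s}_{\mathrm{b}}}\left(p_{E,l,z}(\tilde{n}) \cdot w_{\mathrm{Hann}}(\tilde{n})\right)\right|^2, & \text{else} \end{cases} \qquad (66)$$

where $\mathrm{DFT}_{\tilde{s}_{\mathrm{b}}}$ denotes the $\tilde{s}_{\mathrm{b}}$-point Discrete Fourier Transform³⁰, 𝑘 is the index corresponding to a modulation
frequency of $k \cdot \tilde{r}_{\mathrm{s}} / \tilde{s}_{\mathrm{b}}$, 𝑁′basis (𝑙, 𝑧) is the specific basis loudness, $N'_{\max}(l) = \max_z \left(N'_{\mathrm{basis}}(l, z)\right)$ and

$$\varphi_{E,l,z}(0) = \sum_{\tilde{n}=0}^{\tilde{s}_{\mathrm{b}}-1} \left(p_{E,l,z}(\tilde{n}) \cdot w_{\mathrm{Hann}}(\tilde{n})\right)^2. \qquad (67)$$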

The resulting quantities of ΦE,𝑙,𝑧 (𝑘) are without units.

This step considers the fact that the sensation of roughness changes nonlinearly with loudness. The results
are scaled envelope power spectra ΦE,𝑙,𝑧 (𝑘) which are used for further analysis of the roughness.

7.1.4 Noise reduction of the envelopes

Noise reduction of the envelopes is performed in two steps: First, the scaled power spectra of neighbouring
bands are averaged to reduce noise effects. Averaging is performed over 3 bands. Each band is averaged with
one higher and one lower band. This step results in averaged scaled power spectra $\bar{\Phi}_{E,l,z}(k)$.

Then, the sum of the averaged scaled power spectra,

$$s(l, k) = \sum_z \bar{\Phi}_{E,l,z}(k) \qquad (68)$$

is calculated, showing an overview of all the modulation patterns over time. Each band may contain fluctuations
even in the case of unmodulated noise due to the bandpass-filtering, but in this case the correlation between
neighbouring bands is very low, while for modulated noise, the correlation is very high. The summation of the
averaged scaled power spectra amplifies the correlated components (peaks) more strongly than the uncorrelated
ones. As a result, constant and/or time-varying peaks of the modulation spectrum become clearly visible. Now,
the averaged scaled power spectra are weighted with a noise suppression weighting factor 𝑤(𝑙, 𝑘) depending
on 𝑠(𝑙, 𝑘), that is applied to each individual critical band 𝑧, in order to distinguish between peaks related to the
roughness perception and the background noise of the envelope:

$$\hat{\Phi}_{E,l,z}(k) = \bar{\Phi}_{E,l,z}(k) \cdot w(l, k) \qquad (69)$$

28 Here, the scaled von-Hann window is defined as $w_{\mathrm{Hann}}(\tilde{n}) = \frac{0{,}5 - 0{,}5 \cos(2\pi\tilde{n}/512)}{\sqrt{0{,}375}}$. The scaling factor in the denominator ensures
a correct estimation of the magnitude of the power spectrum.
29 In the original version of the algorithm (22), the spectrum of the autocorrelation function $\varphi_{E,l,z}(m)$ of the envelope $p_{E,l,z}(n)$
was evaluated (𝑚: lag time), corresponding to the power spectrum of the envelope. It should be noted that ΦE,𝑙,𝑧 (𝑘) is not
the Fourier transform of 𝜑E,𝑙,𝑧 (𝑚) since the scaling of ΦE,𝑙,𝑧 (𝑘) is not part of the autocorrelation function 𝜑E,𝑙,𝑧 (𝑚).
30 The DFT of length 𝑁 is defined as $X(k) = \mathrm{DFT}_N(x(n)) = \sum_{n=0}^{N-1} x(n) \cdot \mathrm{e}^{-j 2\pi k n / N}$ with $k = 0, 1, \ldots, N - 1$.

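with the weighting factor

$$w(l, k) = \begin{cases} \mathrm{clip}\left(\tilde{w}(l, k) - 0{,}1407,\; 0,\; 1\right), & \tilde{w}(l, k) \geq 0{,}05 \cdot \max_{k = 2, \ldots, 255} \tilde{w}(l, k) \\ 0, & \text{else} \end{cases} \qquad (70)$$

where clip(𝑥, 𝑥min , 𝑥max ) returns the value of 𝑥 limited to the range between 𝑥min and 𝑥max . The quantity $\tilde{w}(l, k)$ is
calculated as

$$\tilde{w}(l, k) = 0{,}0856 \cdot \frac{s(l, k)}{\tilde{s}(l) + \delta} \cdot \mathrm{clip}\left(0{,}1891 \cdot \mathrm{e}^{0{,}0120 \cdot k},\; 0,\; 1\right) \qquad (71)$$

with the median $\tilde{s}(l)$ of 𝑠(𝑙, 𝑘) over 𝑘 = 2, … ,255, and an additional exponential weighting depending on the
modulation rate. The constant 𝛿 = 10⁻¹⁰ ensures a defined value of $\tilde{w}(l, k)$ if $\tilde{s}(l) = 0$.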

Note that for modulated signals the median value $\tilde{s}(l)$ is small compared to the peaks, whereas for unmodulated
signals $\tilde{s}(l)$ and the random peaks have almost the same magnitude. Modulated signals therefore lead to large
ratios $s(l, k)/\tilde{s}(l)$, so that 𝑤(𝑙, 𝑘) tends to 1, whereas for unmodulated signals 𝑤(𝑙, 𝑘) becomes 0. The
parameters in Formulae (70) and (71) were chosen such that for unmodulated white Gaussian noise with a
level of 80 dB all weighting values 𝑤(𝑙, 𝑘) become 0, consequently leading to a roughness value of 0 asper.
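
The noise suppression weighting of Formulae (70) and (71) can be sketched for one block 𝑙 as follows. The function names are illustrative, and leaving the weights for indices outside 𝑘 = 2, … ,255 at zero is an assumption of this sketch.

```python
import math
from statistics import median

DELTA = 1e-10  # constant delta of Formula (71)


def clip(x, lo, hi):
    """Limit x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def noise_weights(s):
    """Return w(l, k) for k = 0..255 from the summed spectrum s(l, k) of one block.

    s must provide at least 256 values, indexed by k.
    """
    ks = range(2, 256)
    s_med = median(s[k] for k in ks)
    # Formula (71): ratio to the median, with exponential weighting over k.
    w_tilde = {k: 0.0856 * s[k] / (s_med + DELTA)
                  * clip(0.1891 * math.exp(0.0120 * k), 0.0, 1.0)
               for k in ks}
    w_max = max(w_tilde.values())
    w = [0.0] * 256
    for k in ks:
        # Formula (70): keep only values of at least 5 % of the maximum.
        if w_tilde[k] >= 0.05 * w_max:
            w[k] = clip(w_tilde[k] - 0.1407, 0.0, 1.0)
    return w
```

For a flat spectrum every weight is 0, whereas a single strong peak, for example at 𝑘 = 24 (about 70 Hz), receives a weight of 1 while all other indices remain at 0.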

7.1.5 Spectral weighting

In this step, the amplitudes of the averaged scaled power spectra are weighted according to the perception of
roughness, which depends on the modulation rate. The spectral weighting is divided into four steps: First,
spectral peaks are identified, and the modulation rate of those peaks is estimated with high precision. The
amplitudes of peaks with a high modulation rate are weighted corresponding to the estimated modulation rate
in the second step. Since usually, more than one peak is found, a third step is performed to analyse the relation
of the different peaks. It is assumed that there is one dominant harmonic complex (a fundamental modulation
rate with harmonics at multiples of the fundamental modulation rate) which is the dominant cause for roughness
perception. The fundamental modulation rate of such a harmonic complex is estimated in the third step. In the
fourth step, the amplitudes of peaks with a low modulation rate are weighted corresponding to the estimated
fundamental modulation rate and summed to result in a first, uncalibrated estimation of the specific roughness.

7.1.5.1 Peak picking

In the peak picking steps, maxima of the averaged scaled power spectra are searched. To obtain a very precise
estimation of the modulation rates corresponding to these maxima, a quadratic fit of the envelope spectrum is
performed. Since the use of the von-Hann window in the calculation of the DFT does not lead to an exact
quadratic shape in the spectrum, an additional refinement step is performed to reduce this bias.

First, local maxima of the averaged scaled power spectra $\hat{\Phi}_{E,l,z}(k)$ for 𝑘 = 2, … ,255 are searched. For each
maximum, a corresponding prominence is calculated as the difference between the amplitude of the maximum
and the surrounding values. To measure the prominence of a peak, a horizontal line is first extended from the
peak to the left and to the right. The points where the line intersects the data on the left and right (either
at another peak or at the end of the data) are marked as the outer endpoints of the left and right intervals. Next,
the lowest valley is searched in each interval. The higher of these two valleys is taken, and the vertical distance
from that valley to the peak is measured. This distance is the prominence. Only the ten maxima with the highest
prominence are considered. The maxima are numbered with 𝑖, where 𝑖 = 1 is the maximum corresponding to
the lowest modulation rate.
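
The prominence measure described above can be sketched as follows. The handling of plateaus (a maximum is accepted if it is strictly higher than its left neighbour and not lower than its right neighbour) and of equal-valued samples during the horizontal search are assumptions of this sketch; the function names are illustrative.

```python
def local_maxima(y, k_lo=2, k_hi=255):
    """Indices k in [k_lo, k_hi] at which y has a local maximum."""
    last = min(k_hi, len(y) - 2)
    return [k for k in range(max(k_lo, 1), last + 1)
            if y[k - 1] < y[k] >= y[k + 1]]


def prominence(y, p):
    """Prominence of the local maximum of y at index p."""
    # Extend to the left until a higher sample or the start of the data is reached.
    i, left_min = p, y[p]
    while i > 0 and y[i - 1] <= y[p]:
        i -= 1
        left_min = min(left_min, y[i])
    # Extend to the right until a higher sample or the end of the data is reached.
    j, right_min = p, y[p]
    while j < len(y) - 1 and y[j + 1] <= y[p]:
        j += 1
        right_min = min(right_min, y[j])
    # The higher of the two valleys is the reference level.
    return y[p] - max(left_min, right_min)


def top_peaks(y, n_keep=10):
    """The n_keep most prominent maxima, sorted by ascending index."""
    peaks = sorted(local_maxima(y), key=lambda k: prominence(y, k), reverse=True)
    return sorted(peaks[:n_keep])
```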

Only maxima at a modulation rate fulfilling the condition

$$\hat{\Phi}_{E,l,z}\left(k_{\mathrm{p},i}(l, z)\right) > 0{,}05 \cdot \max_i \left(\hat{\Phi}_{E,l,z}\left(k_{\mathrm{p},i}(l, z)\right)\right) \qquad (72)$$

are considered, where 𝑘p,𝑖 (𝑙, 𝑧) describes the modulation rate index 𝑘 of the 𝑖th maximum.

Since the modulation rate index 𝑘 only provides a limited resolution of the modulation rate, a refinement step is
performed, which improves the spectral resolution of the estimated modulation rate and the corresponding
amplitudes of each peak. First, a quadratic fit coefficient vector 𝐂 = (𝑐0 , 𝑐1 , 𝑐2 )ᵀ is calculated for each maximum,
which contains three coefficients for a quadratic fit of the envelope spectrum around a centre modulation rate
index. The vector is calculated by solving the system of equations

$$\hat{\boldsymbol{\Phi}}_{E,l,z} = \mathbf{K} \cdot \mathbf{C} \qquad (73)$$

with

$$\hat{\boldsymbol{\Phi}}_{E,l,z} = \begin{pmatrix} \hat{\Phi}_{E,l,z}\left(k_{\mathrm{p},i}(l, z) - 1\right) \\ \hat{\Phi}_{E,l,z}\left(k_{\mathrm{p},i}(l, z)\right) \\ \hat{\Phi}_{E,l,z}\left(k_{\mathrm{p},i}(l, z) + 1\right) \end{pmatrix} \qquad (74)$$

and the modulation index matrix for the quadratic fit,

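$$\mathbf{K} = \begin{pmatrix} \left(k_{\mathrm{p},i}(l, z) - 1\right)^2 & k_{\mathrm{p},i}(l, z) - 1 & 1 \\ \left(k_{\mathrm{p},i}(l, z)\right)^2 & k_{\mathrm{p},i}(l, z) & 1 \\ \left(k_{\mathrm{p},i}(l, z) + 1\right)^2 & k_{\mathrm{p},i}(l, z) + 1 & 1 \end{pmatrix}. \qquad (75)$$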

From these coefficients, a first corrected modulation rate

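$$\tilde{f}_{\mathrm{p},i}(l, z) = -\frac{c_1}{2 c_0} \cdot \Delta f \qquad (76)$$

is calculated with the DFT resolution $\Delta f = \tilde{r}_{\mathrm{s}} / \tilde{s}_{\mathrm{b}}$ = 1500 Hz / 512 = 2,9297 Hz. The estimated modulation rate is
refined by applying a bias correction term $\rho\left(\tilde{f}_{\mathrm{p},i}(l, z)\right)$: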

𝑓p,𝑖 (𝑙, 𝑧) = 𝑓̃p,𝑖 (𝑙, 𝑧) + 𝜌 (𝑓̃p,𝑖 (𝑙, 𝑧)). (77)

The bias comes from approximating the spectrum of the von-Hann window with a quadratic function, when
estimating the true modulation rate from the peaks in the sampled spectrum. The bias adjustment term depends
almost only on the difference between the peak index and the corresponding exact modulation rate. This term
𝐸(𝜃) is calculated for 32 steps, covering a range of ∆𝑓 , using integer steps 𝜃 = 0, … , 32 to indicate the
corresponding sub-interval. A higher resolution of the modulation rate could be achieved by using more sub-
intervals. Another option is the linear interpolation of 𝐸(𝜃) as a function of 𝛽(𝜃), the theoretical error after
applying a correction, and 𝜃corr , the argument leading to the smallest error 𝛽(𝜃), as shown in the following:

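$$\rho\left(\tilde{f}_{\mathrm{p},i}(l, z)\right) = E(\theta_{\mathrm{corr}}) - \left(E(\theta_{\mathrm{corr}}) - E(\theta_{\mathrm{corr}} - 1)\right) \cdot \frac{\beta(\theta_{\mathrm{corr}} - 1)}{\beta(\theta_{\mathrm{corr}}) - \beta(\theta_{\mathrm{corr}} - 1)} \qquad (78)$$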

𝜃corr is determined from the set of possible integer 𝜃 values that lie between 0 and 32 (the value for 𝜃 = 33 in
Table 10 is given only to simplify the implementation, to avoid the use of additional conditions in Formula (81)).
For each possible value of 𝜃, 𝛽(𝜃) is calculated from:

$$\beta(\theta) = \left(\mathrm{floor}\left(\frac{\tilde{f}_{\mathrm{p},i}(l, z)}{\Delta f}\right) + \frac{\theta}{32}\right) \cdot \Delta f - \left(\tilde{f}_{\mathrm{p},i}(l, z) + E(\theta)\right) \qquad (79)$$

where floor(𝑥) gives the greatest integer smaller than or equal to 𝑥. 𝜃min is the 𝜃 value that
produces the smallest magnitude of 𝛽(𝜃):

$$\theta_{\min} = \underset{0 \leq \theta \leq 32}{\mathrm{argmin}}\, |\beta(\theta)|. \qquad (80)$$

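𝜃corr is then calculated from:

$$\theta_{\mathrm{corr}} = \begin{cases} \theta_{\min}, & \theta_{\min} > 0 \text{ and } \beta(\theta_{\min}) \cdot \beta(\theta_{\min} - 1) < 0 \\ \theta_{\min} + 1, & \text{else} \end{cases} \qquad (81)$$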

Table 10 and Formula (79) are used to calculate the parameters needed to calculate the bias term given in
Formula (78).

Table 10 – Error correction values 𝑬(𝜽)

𝜃 0 1 2 3 4 5 6 7 8
𝐸(𝜃)/Hz 0,0000 0,0457 0,0907 0,1346 0,1765 0,2157 0,2515 0,2828 0,3084
𝜃 9 10 11 12 13 14 15 16 17
𝐸(𝜃)/Hz 0,3269 0,3364 0,3348 0,3188 0,2844 0,2259 0,1351 0,0000 -0,1351
𝜃 18 19 20 21 22 23 24 25 26
𝐸(𝜃)/Hz -0,2259 -0,2844 -0,3188 -0,3348 -0,3364 -0,3269 -0,3084 -0,2828 -0,2515
𝜃 27 28 29 30 31 32 33
𝐸(𝜃)/Hz -0,2157 -0,1765 -0,1346 -0,0907 -0,0457 0,0000 0,0000

The amplitudes of the maxima are calculated as

$$A_i(l, z) = \sum \hat{\boldsymbol{\Phi}}_{E,l,z} = \sum_{m=-1}^{1} \hat{\Phi}_{E,l,z}\left(k_{\mathrm{p},i}(l, z) + m\right), \qquad (82)$$

where it is assumed that the energy of a peak is mainly distributed over the index of the maximum and the two
neighbouring indices due to the use of the von-Hann window in the DFT calculation.
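
The refinement of a single peak can be sketched as follows. The quadratic fit uses the closed-form solution of the 3 × 3 system of Formula (73) for three equally spaced points, and the bias term follows Formulae (78) to (81) and Table 10 as printed. Function names are illustrative, and the sketch assumes a strict maximum (𝑐0 ≠ 0).

```python
import math

DELTA_F = 1500.0 / 512.0  # DFT resolution in Hz
# Table 10: E(theta) in Hz for theta = 0..33
E_TABLE = [0.0000, 0.0457, 0.0907, 0.1346, 0.1765, 0.2157, 0.2515, 0.2828,
           0.3084, 0.3269, 0.3364, 0.3348, 0.3188, 0.2844, 0.2259, 0.1351,
           0.0000, -0.1351, -0.2259, -0.2844, -0.3188, -0.3348, -0.3364,
           -0.3269, -0.3084, -0.2828, -0.2515, -0.2157, -0.1765, -0.1346,
           -0.0907, -0.0457, 0.0000, 0.0000]


def quadratic_peak_rate(phi_m1, phi_0, phi_p1, k_p):
    """Formulae (73)-(76): vertex of c0*k^2 + c1*k + c2 through three points, in Hz."""
    c0 = (phi_m1 - 2.0 * phi_0 + phi_p1) / 2.0
    c1 = (phi_p1 - phi_m1) / 2.0 - 2.0 * c0 * k_p
    return -c1 / (2.0 * c0) * DELTA_F


def bias_correction(f_est):
    """Formulae (78)-(81): bias term rho for a first estimate f_est in Hz."""
    base = math.floor(f_est / DELTA_F)
    beta = [(base + th / 32.0) * DELTA_F - (f_est + E_TABLE[th])
            for th in range(34)]                                   # Formula (79)
    theta_min = min(range(33), key=lambda th: abs(beta[th]))       # Formula (80)
    if theta_min > 0 and beta[theta_min] * beta[theta_min - 1] < 0:
        theta_corr = theta_min                                     # Formula (81)
    else:
        theta_corr = theta_min + 1
    e_hi, e_lo = E_TABLE[theta_corr], E_TABLE[theta_corr - 1]
    return e_hi - (e_hi - e_lo) * beta[theta_corr - 1] / (
        beta[theta_corr] - beta[theta_corr - 1])                   # Formula (78)


def refined_rate(phi_m1, phi_0, phi_p1, k_p):
    """Formula (77): first estimate plus bias correction."""
    f_est = quadratic_peak_rate(phi_m1, phi_0, phi_p1, k_p)
    return f_est + bias_correction(f_est)


def peak_amplitude(phi, k_p):
    """Formula (82): sum over the maximum and its two neighbours."""
    return phi[k_p - 1] + phi[k_p] + phi[k_p + 1]
```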

7.1.5.2 Weighting of high modulation rates

In the next step, these amplitudes are weighted with a modulation-rate-dependent factor 𝐺𝑙,𝑧,𝑖 (𝑓p,𝑖 (𝑙, 𝑧)) and a
scaling factor 𝑟max (𝑧). This weighting (together with the weighting of low modulation rates described in 7.1.5.4)
considers the dependency of the perceived roughness on the modulation rate. The weighting parameters were
obtained by an optimization procedure, fitting the results of the roughness algorithm to the results of listening
tests for sinusoids of different carrier frequencies with different modulation rates from Reference [12]. Those
results are shown in the evaluation of the roughness algorithm in Annex C, Figure C.1 and also in Reference [30].

$$\tilde{A}_i(l, z) = \begin{cases} A_i(l, z) \cdot r_{\max}(z), & f_{\mathrm{p},i}(l, z) < f_{\max}(z) \\ G_{l,z,i}\left(f_{\mathrm{p},i}(l, z)\right) \cdot A_i(l, z) \cdot r_{\max}(z), & f_{\mathrm{p},i}(l, z) \geq f_{\max}(z) \end{cases} \qquad (83)$$

with

$$r_{\max}(z) = \frac{1}{1 + r_1 \cdot \left|\log_2\left(\frac{F(z)}{1\,\mathrm{kHz}}\right)\right|^{r_2}} \qquad (84)$$

and the corresponding parameters 𝑟1 and 𝑟2 as given in Table 11.

Table 11 – Parameters for 𝒓𝐦𝐚𝐱 (𝒛)

𝐹(𝑧) < 1 kHz 𝐹(𝑧) ≥ 1 kHz
𝑟1 0,3560 0,8024
𝑟2 0,8049 0,9333

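The weighting factor 𝐺𝑙,𝑧,𝑖 (𝑓p,𝑖 (𝑙, 𝑧)) is calculated as

$$G_{l,z,i}\left(f_{\mathrm{p},i}(l, z)\right) = \frac{1}{\left(1 + \left(\left(\frac{f_{\mathrm{p},i}(l, z)}{f_{\max}(z)} - \frac{f_{\max}(z)}{f_{\mathrm{p},i}(l, z)}\right) \cdot q_1\right)^2\right)^{q_2(z)}} \qquad (85)$$

where

$$f_{\max}(z) = 72{,}6937 \cdot \left(1 - 1{,}1739 \cdot \mathrm{e}^{-5{,}4583 \cdot \frac{F(z)}{1\,\mathrm{kHz}}}\right)\,\mathrm{Hz} \qquad (86)$$

is the modulation rate at which the weighting factor reaches the maximum of one. 𝐹(𝑧) is the center frequency
of the auditory filter bank as described in Clause 5. The parameter 𝑞1 = 1,2822 and 𝑞2 (𝑧) is calculated as

$$q_2(z) = \begin{cases} 0{,}2471, & \frac{F(z)}{1\,\mathrm{kHz}} < 2^{-3{,}4253} \\ 0{,}2471 + 0{,}0129 \cdot \left(\log_2\left(\frac{F(z)}{1\,\mathrm{kHz}}\right) + 3{,}4253\right)^2, & \frac{F(z)}{1\,\mathrm{kHz}} \geq 2^{-3{,}4253} \end{cases} \qquad (87)$$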
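
The weighting of Formulae (83) to (87) can be sketched as follows, with the center frequency 𝐹(𝑧) passed in hertz. The function names are illustrative and not part of this Standard.

```python
import math


def f_max(fc_hz):
    """Formula (86): modulation rate in Hz at which the weighting reaches one."""
    return 72.6937 * (1.0 - 1.1739 * math.exp(-5.4583 * fc_hz / 1000.0))


def r_max(fc_hz):
    """Formula (84) with the parameters of Table 11."""
    r1, r2 = (0.3560, 0.8049) if fc_hz < 1000.0 else (0.8024, 0.9333)
    return 1.0 / (1.0 + r1 * abs(math.log2(fc_hz / 1000.0)) ** r2)


def q2_high(fc_hz):
    """Formula (87): exponent q2(z) for the weighting of high modulation rates."""
    x = math.log2(fc_hz / 1000.0)
    if x < -3.4253:
        return 0.2471
    return 0.2471 + 0.0129 * (x + 3.4253) ** 2


def weight_g(f_mod, fc_hz, q1=1.2822, q2=None):
    """Formula (85): modulation-rate-dependent weighting factor G."""
    q2 = q2_high(fc_hz) if q2 is None else q2
    fm = f_max(fc_hz)
    return 1.0 / (1.0 + ((f_mod / fm - fm / f_mod) * q1) ** 2) ** q2


def weight_high_rates(amp, f_mod, fc_hz):
    """Formula (83): G is applied only at or above f_max; r_max is always applied."""
    scaled = amp * r_max(fc_hz)
    return scaled if f_mod < f_max(fc_hz) else weight_g(f_mod, fc_hz) * scaled
```

For a center frequency of 1 kHz, `r_max` evaluates to 1 and `f_max` to roughly 72 Hz, close to the 70 Hz modulation rate of the reference signal.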

7.1.5.3 Estimation of fundamental modulation rate

In this step, the maxima of the averaged scaled power spectra, which were found in 7.1.5.1 are further analysed.
It is assumed that there is one dominant harmonic complex (a fundamental modulation rate with harmonics at
multiples of the fundamental modulation rate) which is the dominant cause for roughness perception. The
fundamental modulation rate of such a harmonic complex is estimated in this step.

For each block 𝑙 and band 𝑧 , the fundamental modulation rate of the envelope is estimated in the next
processing step considering the modulation rate 𝑓p,𝑖 (𝑙, 𝑧) and the amplitude 𝐴̃𝑖 (𝑙, 𝑧) of the block. Since the
dependencies on 𝑙 and 𝑧 are not relevant for this processing step, the variables will be denoted only in
dependency of the index of the corresponding maximum, 𝑖, 𝑓p (𝑖) and 𝐴̃(𝑖) in the following to simplify the notation.

For each maximum with index 𝑖, it is tested whether the corresponding modulation rate 𝑓p (𝑖) is the best estimate
for the fundamental modulation rate of the envelope, by assuming that the sum over the harmonic complex
corresponding to the best estimate will result in the highest value. The exact procedure is described in the
following, where 𝑖0 describes the index of the currently tested maximum.

First, integer ratios of the modulation rates 𝑓p (𝑖) of all found maxima to the modulation rate 𝑓p (𝑖0 ) are calculated

$$R_{i_0}(i) = \mathrm{round}\left(\frac{f_{\mathrm{p}}(i)}{f_{\mathrm{p}}(i_0)}\right), \qquad (88)$$

by rounding to the nearest integer. If several 𝑖 result in the same integer ratio 𝑅𝑖0 (𝑖), it needs to be decided
which of the maxima is used further. In this case, the maximum with the index

$$i = \underset{i}{\mathrm{argmin}} \left|\frac{f_{\mathrm{p}}(i)}{R_{i_0}(i) \cdot f_{\mathrm{p}}(i_0)} - 1\right| \qquad (89)$$

is used, while the other maxima are discarded. From all remaining maxima, a set 𝐼𝑖0 of indices of all maxima
which belong to a harmonic complex with fundamental modulation rate 𝑓p (𝑖0 ) is defined (using a tolerance of
4 %):

$$I_{i_0} = \left\{ i \;\middle|\; \left|\frac{f_{\mathrm{p}}(i)}{R_{i_0}(i) \cdot f_{\mathrm{p}}(i_0)} - 1\right| < 0{,}04 \right\}. \qquad (90)$$

For this set of indices, the energy of the harmonic complex is calculated as

$$E_{i_0} = \sum_{i \in I_{i_0}} \tilde{A}(i). \qquad (91)$$

The index 𝑖0 leading to the highest energy is denoted 𝑖max in the following, the corresponding set of indices 𝐼𝑖0
is denoted 𝐼max . The fundamental modulation rate of the envelope is 𝑓p (𝑖max ).

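In the following, only peaks corresponding to the indices in 𝐼max are considered as part of the envelope. The
amplitudes of these peaks are weighted depending on the distance between the center of gravity of these peaks
and the modulation rate of the peak with the highest amplitude:

$$\hat{A}(i) = w_{\mathrm{peak}} \cdot \tilde{A}(i) \qquad (92)$$

with 𝑖 ∈ 𝐼max and

$$w_{\mathrm{peak}} = 1 + 0{,}1 \cdot \left|\frac{\sum_{i \in I_{\max}} \left(\frac{f_{\mathrm{p}}(i)}{\mathrm{Hz}} \cdot \tilde{A}(i)\right)}{\sum_{i \in I_{\max}} \tilde{A}(i)} - \frac{f_{\mathrm{p}}(i_{\mathrm{peak}})}{\mathrm{Hz}}\right|^{0{,}749} \qquad (93)$$

and

$$i_{\mathrm{peak}} = \underset{i \in I_{\max}}{\mathrm{argmax}}\, \tilde{A}(i). \qquad (94)$$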
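
The estimation of the fundamental modulation rate in Formulae (88) to (94) can be sketched as follows. Indices are list positions starting at 0. Skipping integer ratios of 0 and relying on Python's built-in `round` (which rounds exact halves to the nearest even integer) are assumptions of this sketch; the function names are illustrative.

```python
def harmonic_set(f_p, i0, tol=0.04):
    """Formulae (88)-(90): indices of maxima in the harmonic complex of f_p[i0]."""
    best = {}  # integer ratio -> (relative deviation, index)
    for i, f in enumerate(f_p):
        ratio = round(f / f_p[i0])                    # Formula (88)
        if ratio < 1:
            continue  # assumption: a ratio of 0 cannot belong to the complex
        dev = abs(f / (ratio * f_p[i0]) - 1.0)
        # Formula (89): keep the best-matching maximum per integer ratio.
        if ratio not in best or dev < best[ratio][0]:
            best[ratio] = (dev, i)
    return sorted(i for dev, i in best.values() if dev < tol)   # Formula (90)


def fundamental(f_p, amp):
    """Formula (91): candidate whose harmonic complex has the highest energy."""
    energies = [sum(amp[i] for i in harmonic_set(f_p, i0))
                for i0 in range(len(f_p))]
    i_max = max(range(len(f_p)), key=lambda i0: energies[i0])
    return i_max, harmonic_set(f_p, i_max)


def peak_weight(f_p, amp, i_set):
    """Formulae (93) and (94): weighting based on the center of gravity."""
    i_peak = max(i_set, key=lambda i: amp[i])
    centroid = sum(f_p[i] * amp[i] for i in i_set) / sum(amp[i] for i in i_set)
    return 1.0 + 0.1 * abs(centroid - f_p[i_peak]) ** 0.749
```

For example, with modulation rates of 35 Hz, 70 Hz, 105,5 Hz and 140 Hz, the 35 Hz maximum collects all four maxima in its harmonic complex and is selected as the fundamental, even if the 70 Hz maximum has the highest individual amplitude.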

7.1.5.4 Weighting of low modulation rates

In this next step, another weighting based on the fundamental modulation rate and a summation of amplitudes
is performed. The block index 𝑙 and the band index 𝑧 are reintroduced for this step. Thus, the weighted
amplitudes are denoted 𝐴̂𝑖 (𝑙, 𝑧), the corresponding fundamental modulation rates 𝑓p,𝑖max (𝑙, 𝑧) and the set of
relevant maxima 𝐼max (𝑙, 𝑧).

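The summation and weighting are performed as

$$A(l, z) = \begin{cases} \sum_{i \in I_{\max}(l, z)} G_{l,z,i}\left(f_{\mathrm{p},i_{\max}}(l, z)\right) \cdot \hat{A}_i(l, z), & f_{\mathrm{p},i_{\max}}(l, z) < f_{\max}(z) \\ \sum_{i \in I_{\max}(l, z)} \hat{A}_i(l, z), & f_{\mathrm{p},i_{\max}}(l, z) \geq f_{\max}(z) \end{cases} \qquad (95)$$

where 𝐺𝑙,𝑧,𝑖 (𝑓p,𝑖max (𝑙, 𝑧)) is calculated as described in Formula (85) but with parameters
𝑞1 = 0,7066 and

$$q_2(z) = 1{,}0967 - 0{,}0640 \cdot \log_2\left(\frac{F(z)}{1\,\mathrm{kHz}}\right). \qquad (96)$$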

The parameter 𝑓max (𝑧) in Formula (95) is calculated according to Formula (86).

Values of 𝐴(𝑙, 𝑧) that fall below a threshold of 0,074376 are set to zero.

7.1.6 Optional entropy weighting based on randomness of modulation rate

In an optional processing step, 𝐴(𝑙, 𝑧) is weighted depending on the randomness (measured using the entropy)
of the estimated modulation rates. This method has been shown to improve the estimation of the roughness [31][32].

For this processing step, a signal of rotational speed 𝑑(𝑛) (unit revolutions per minute) as reference variable
with the same sampling rate as the sound pressure signal 𝑝(𝑛) needs to be available.

First, the rotational speed signal is segmented in the same way as the sound pressure signal (see Clause 5.1.5,
with 𝑠b and 𝑠h as given in Clause 7.1.1). The result is a segmented rotational speed signal 𝑑S (𝑛′, 𝑙). In each time
block 𝑙, the median of 𝑑S (𝑛′, 𝑙) over 𝑛′ is calculated. The result $\tilde{d}_{\mathrm{S}}(l)$ is an estimation of one rotational speed
value for each block. This estimation is transformed to an estimation of the frequency of the rotational speed in
hertz:

$$f_{\mathrm{D}}(l) = \frac{\tilde{d}_{\mathrm{S}}(l)}{60\,\frac{\mathrm{R}}{\mathrm{min}}}\,\mathrm{Hz} \qquad (97)$$

Now the maxima of the modulation rate, which were found in Clause 7.1.5.1, are used to calculate a weighting
factor based on the entropy of these maxima. First, a set

𝐼f (𝑙, 𝑧) = {𝑖 | 𝑖 ∉ 𝐼max (𝑙, 𝑧) ∨ 𝑖 = 𝑖max } (98)

is defined. This set contains all indices of maxima which were not identified as corresponding to the harmonic
complex of the estimated fundamental frequency in Clause 7.1.5.3, and the index corresponding to the
fundamental frequency (but not the ones of the harmonics). For all 𝑖 ∈ 𝐼f (𝑙, 𝑧) an estimation of the order is
calculated as the ratio between the frequency of the maximum 𝑓p,𝑖 (𝑙, 𝑧) (see Clause 7.1.5.1) and the frequency
of the rotational speed:

$$o_i(l, z) = \begin{cases} 0, & f_{\mathrm{D}}(l) = 0 \\ \dfrac{f_{\mathrm{p},i}(l, z)}{f_{\mathrm{D}}(l)}, & \text{else} \end{cases} \qquad (99)$$

Now a histogram of all estimated orders is calculated for each time index 𝑙 and frequency band 𝑧 from all
maxima of the current time block and the three preceding and subsequent blocks³¹. In these histograms, 160
classes of constant width are used between the values 0,0625 and 20,625. The result is the histogram 𝐻(𝑏, 𝑙, 𝑧),
where 𝑏 is the class number and 𝐻(𝑏, 𝑙, 𝑧) contains the number of elements in the respective class. For
calculation of the entropy, probabilities of occurrence

$$P(b, l, z) = \begin{cases} 0, & \sum_b H(b, l, z) = 0 \\ \dfrac{H(b, l, z)}{\sum_b H(b, l, z)}, & \text{else} \end{cases} \qquad (100)$$

are calculated from the histogram for all classes. From this probability, the Shannon entropy

$$E(l, z) = \begin{cases} 0, & P(b, l, z) = 0 \\ -\sum_b \left(P(b, l, z) \cdot \log_2 P(b, l, z)\right), & \text{else} \end{cases} \qquad (101)$$

is calculated³². Finally, 𝐴(𝑙, 𝑧) is weighted with the entropy, if 𝐸(𝑙, 𝑧) > 1:

$$A_{\mathrm{E}}(l, z) = \frac{A(l, z)}{\max(E(l, z);\, 1)}. \qquad (102)$$

31 In the border regions, fewer preceding or subsequent blocks are used.

32 In the case of a probability of zero, a result of $0 \cdot \log_2 0 = 0$ is used according to the limit $\lim_{x \to 0} (x \log_2 x) = 0$.

If this optional weighting step is used, 𝐴E (𝑙, 𝑧) needs to be used instead of 𝐴(𝑙, 𝑧) in all following processing
steps of this algorithm.
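
The entropy weighting of Formulae (100) to (102) can be sketched as follows. The class limits are taken as printed above; treating each class as a half-open interval and ignoring orders outside the limits are assumptions of this sketch, and the function names are illustrative.

```python
import math


def order_histogram(orders, n_classes=160, lo=0.0625, hi=20.625):
    """Counts of the estimated orders in classes of constant width."""
    width = (hi - lo) / n_classes
    hist = [0] * n_classes
    for o in orders:
        if lo <= o < hi:   # assumption: values outside the limits are ignored
            b = min(int((o - lo) / width), n_classes - 1)
            hist[b] += 1
    return hist


def shannon_entropy(hist):
    """Formulae (100) and (101): entropy in bits, with 0*log2(0) taken as 0."""
    total = sum(hist)
    if total == 0:
        return 0.0
    probs = [h / total for h in hist]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def entropy_weighted(a, entropy):
    """Formula (102): A_E = A / max(E; 1)."""
    return a / max(entropy, 1.0)
```

Four orders falling into four different classes give an entropy of 2 bit and halve the amplitude, whereas orders concentrated in one class give an entropy of 0 and leave the amplitude unchanged.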

7.1.7 Calculation of time-dependent specific roughness

𝐴(𝑙, 𝑧) is interpolated to a sampling rate of 𝑟s50 = 50 Hz using piecewise cubic Hermite interpolation (temporal
resolution of 20 ms). The new time index is designated 𝑙50 . Subsequently, negative values resulting from the
interpolation are set to zero, resulting in a first, uncalibrated estimate of the specific roughness 𝑅′est (𝑙50 , 𝑧).

Here, the results belonging to the zero-padding done at the start of the processing need to be removed. Thus
the last evaluated block shall be:

$$l_{50,\mathrm{end}} = \mathrm{ceil}\left(\frac{n_{\mathrm{samples}}}{r_{\mathrm{s}}} \cdot r_{\mathrm{s50}}\right). \qquad (103)$$

The next step in calculating the specific roughness is a nonlinear transform, depending on the distribution of
𝑅′est (𝑙50 , 𝑧) over the critical bands 𝑧. This step is necessary to take into account that the roughness perception
differs for broad-band signals (i.e., signals with a broader distribution of 𝑅′est (𝑙50 , 𝑧) over the critical bands)
compared to narrow band signals such as modulated sinusoids (i.e., signals with a narrow distribution of
𝑅′est (𝑙50 , 𝑧) over the critical bands). With this step it is possible to model the roughness for very different kinds of
synthetic and technical sounds as described in Reference [30].

Together with the nonlinear transform, a calibration is performed, which ensures that the calibration signal
(amplitude modulated sinusoid, 60 dB SPL, 1 kHz carrier frequency, 70 Hz modulation rate) results in a
roughness of 1 asper³³.

$$\hat{R}'(l_{50}, z) = c_{\mathrm{R}} \cdot \left(R'_{\mathrm{est}}(l_{50}, z)\right)^{E(l_{50})} \qquad (104)$$

with the calibration factor $c_{\mathrm{R}} = 0{,}0180909\,\frac{\mathrm{asper}}{\mathrm{Bark_{HMS}}}$,

$$E(l_{50}) = 0{,}95555 \cdot \left(\tanh\left(1{,}6407 \cdot (B(l_{50}) - 2{,}5804)\right) + 1\right) \cdot 0{,}5 + 0{,}58449 \qquad (105)$$

and

$$B(l_{50}) = \begin{cases} \dfrac{\tilde{R}'_{\mathrm{est}}(l_{50})}{\bar{R}'_{\mathrm{est}}(l_{50})}, & \bar{R}'_{\mathrm{est}}(l_{50}) \neq 0 \\ 0, & \bar{R}'_{\mathrm{est}}(l_{50}) = 0 \end{cases} \qquad (106)$$

The quadratic mean $\tilde{R}'_{\mathrm{est}}(l_{50})$ and the linear mean $\bar{R}'_{\mathrm{est}}(l_{50})$ are defined as

$$\tilde{R}'_{\mathrm{est}}(l_{50}) = \sqrt{\frac{\sum_z \left(R'_{\mathrm{est}}(l_{50}, z)\right)^2}{\mathrm{CBF}}}, \qquad (107)$$

and

$$\bar{R}'_{\mathrm{est}}(l_{50}) = \frac{\sum_z R'_{\mathrm{est}}(l_{50}, z)}{\mathrm{CBF}} \qquad (108)$$

where CBF = 53 is the number of critical bands. The resulting estimate of the time-dependent specific
roughness, 𝑅̂ ′(𝑙50 , 𝑧), is smoothed by using a lowpass filter of order one with different time constants for rising
and falling slopes. This filtering considers the fact that the perception of sound events rises quickly with the

33 The calibration factor 𝑐R can be adjusted within a tolerance of 0,25 % to account for the effects of different
implementations.

beginning of the sound event, but only decays slowly when the sound event ends. A similar filtering is used in
the loudness model for time-varying sounds of Moore and Glasberg [6]. The filtering can be described as

$$R'(l_{50}, z) = \begin{cases} \hat{R}'(l_{50}, z), & l_{50} = 0 \\ \hat{R}'(l_{50}, z) \cdot \left(1 - \mathrm{e}^{-\frac{1}{r_{\mathrm{s50}} \cdot \tau(l_{50}, z)}}\right) + \hat{R}'(l_{50} - 1, z) \cdot \mathrm{e}^{-\frac{1}{r_{\mathrm{s50}} \cdot \tau(l_{50}, z)}}, & l_{50} \geq 1 \end{cases} \qquad (109)$$

with the different time constants for rising and falling slopes

$$\tau(l_{50}, z) = \begin{cases} 0{,}0625, & \hat{R}'(l_{50}, z) \geq \hat{R}'(l_{50} - 1, z) \\ 0{,}5000, & \hat{R}'(l_{50}, z) < \hat{R}'(l_{50} - 1, z) \end{cases}, \qquad (110)$$

resulting in the final estimate of the time-dependent specific roughness 𝑅′(𝑙50 , 𝑧).
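
The nonlinear transform and calibration of Formulae (104) to (108) can be sketched for one time index 𝑙50 as follows; the subsequent smoothing of Formulae (109) and (110) is not included. The function names are illustrative, and the input list is assumed to hold one non-negative value per critical band (CBF = 53 values).

```python
import math

C_R = 0.0180909   # calibration factor c_R in asper/Bark_HMS
CBF = 53          # number of critical bands


def exponent(r_est):
    """Formulae (105)-(108): exponent E(l50) from the spread of R'_est over the bands."""
    mean_lin = sum(r_est) / CBF                            # Formula (108)
    mean_sq = math.sqrt(sum(r * r for r in r_est) / CBF)   # Formula (107)
    b = mean_sq / mean_lin if mean_lin != 0.0 else 0.0     # Formula (106)
    return 0.95555 * (math.tanh(1.6407 * (b - 2.5804)) + 1.0) * 0.5 + 0.58449


def transform(r_est):
    """Formula (104): calibrated, nonlinearly transformed specific roughness."""
    e = exponent(r_est)
    return [C_R * r ** e for r in r_est]
```

A narrow distribution over the bands (a single non-zero band) gives a ratio 𝐵 of about 7,3 and an exponent close to 1,54, whereas a uniform distribution gives 𝐵 = 1 and an exponent close to 0,59.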

7.1.8 Calculation of representative values

The specific roughness 𝑅′(𝑧) is taken by averaging the time-dependent specific roughness 𝑅′(𝑙50 , 𝑧). For the
averaging, the first roughness values 𝑅′(𝑙50 , 𝑧) for 0 ≤ 𝑙50 ≤ 15 (approximately corresponding to the first 300 ms
of the input signal) are discarded due to the transient responses of the digital filters.

7.1.9 Calculation of time-dependent roughness

The time-dependent roughness 𝑅(𝑙50 ) is the integral of 𝑅′(𝑙50 , 𝑧) over 𝑧, approximated by summing over all
bands 𝑧 while considering the overlap ∆𝑧:

$$R(l_{50}) = \Delta z \sum_z R'(l_{50}, z). \qquad (111)$$

7.1.10 Calculation of representative values

The single value 𝑅 is calculated by taking the 90th percentile of the time-dependent roughness 𝑅(𝑙50 ), discarding
again the first roughness values 𝑅(𝑙50 ) for 0 ≤ 𝑙50 ≤ 15.

7.1.11 Calculation of roughness for binaural signals

For binaural signals, monaural time-dependent specific roughness values 𝑅L′ (𝑙50 , 𝑧) and 𝑅R′ (𝑙50 , 𝑧) of the left and
right channel shall be calculated separately for each channel (assuming diotic signals).

A combined binaural time-dependent specific roughness 𝑅B′ (𝑙50 , 𝑧) is calculated using the quadratic mean:

$$R'_{\mathrm{B}}(l_{50}, z) = \sqrt{\frac{\left(R'_{\mathrm{L}}(l_{50}, z)\right)^2 + \left(R'_{\mathrm{R}}(l_{50}, z)\right)^2}{2}}. \tag{112}$$

Formula (112) approximately corresponds to the formula for binaural inhibition from the binaural loudness model by Moore/Glasberg (ISO 532-2 [7], see also Reference [33]). If the roughness value of one channel is negligible, Formula (112) yields a roughness that is lower by a factor of √0,5 than that of the diotic presentation.

For binaural signals, the binaural time-dependent specific roughness 𝑅B′ (𝑙50 , 𝑧) shall be used as basis for the
calculation of the specific roughness 𝑅′(𝑧), the time-dependent roughness 𝑅(𝑙50 ) and the single value 𝑅 instead
of 𝑅′(𝑙50 , 𝑧) in Clauses 7.1.8, 7.1.9 and 7.1.10.
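The quadratic mean used for the binaural combination can be illustrated with a short Python sketch. The function name is hypothetical; the computation is the square root of the mean of the two squared channel values, which reproduces the factor √0,5 stated in the text when one channel is negligible.

```python
import math


def binaural_quadratic_mean(left, right):
    """Combine left and right values element by element (quadratic mean)."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    return [math.sqrt((l * l + r * r) / 2.0) for l, r in zip(left, right)]
```

For identical channels the combined value equals the monaural value, and for one silent channel it equals the other channel's value multiplied by √0,5.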

7.2 Information to be recorded for prominent roughness

A signal is considered to contain prominent roughness if the time-independent single value 𝑅 of the time-dependent roughness 𝑅(𝑙50 ) exceeds 0,2 asper. If the signal has been identified as containing prominent roughness according to this Standard, the following information shall be recorded:

a) details of the method used to evaluate the roughness (ECMA 418 – Part 2: Psychoacoustic metrics based
on the hearing model – Clause 7.1 Psychoacoustic roughness calculation method), together with a
reference to this Standard;

b) the time-dependent psychoacoustic roughness values 𝑅(𝑙50 ) (see Formula (111));

c) the time-independent single value 𝑅;

d) whether the optional entropy weighting was used;

e) optionally: the time-dependent specific roughness 𝑅′(𝑙50 , 𝑧).

8 Improved identification and evaluation of loudness using psychoacoustic methods of tonal and noise loudness

This clause describes a procedure based on a perceptual model for determining how loud a sound is perceived, taking into consideration that perception differs for tonal and noise signals. For narrowband signals with subcritical bandwidths, it is generally assumed that loudness depends only on the level, independent of the bandwidth. This assumption is also built into standardized loudness models such as ISO 532-1 (Zwicker) [3] and ISO 532-3 DIS (Moore, Glasberg, Schlittenlacher). Several published experimental studies [35]-[38], however, showed that this is not the case: tonal components are perceived as louder than narrowband (subcritical bandwidth) noise with the same level in the same band. Sottek et al. [39] have shown that a more accurate loudness estimate can be obtained by combining the tonal loudness and the noise loudness presented earlier. This calculation process is described in this clause.

8.1 Psychoacoustic loudness calculation method

The calculation process is simpler than in the preceding clauses, since most of the calculations have already been described in Clauses 5 and 6. An overview of the determination of the specific loudness is shown in Figure 8.

Figure 8 — Calculation of loudness based on specific tonal and noise loudness (see Clause 6).

8.1.1 Calculation of time-dependent specific loudness

To obtain a better estimation of the loudness, a power average of the specific tonal loudness 𝑁′tonal(𝑙, 𝑧) in Formula (47) and the weighted specific noise loudness 𝑁′noise(𝑙, 𝑧) in Formula (48) is performed to obtain the specific loudness 𝑁′(𝑙, 𝑧):

$$N'(l, z) = \left( \left(N'_{\mathrm{tonal}}(l, z)\right)^{e(z)} + \left(w_{\mathrm{n}} \cdot N'_{\mathrm{noise}}(l, z)\right)^{e(z)} \right)^{1/e(z)} \tag{113}$$

Here 𝑤n = 0,5331 and the exponent 𝑒(𝑧) is a function of the maximal specific basis loudness:

$$e(z) = \frac{a}{\max\limits_{z}\left( \dfrac{N'_{\mathrm{tonal}}(l, z) + N'_{\mathrm{noise}}(l, z)}{\mathrm{sone_{HMS}} / \mathrm{Bark_{HMS}}} \right) + \epsilon} + b \tag{114}$$

with the parameters 𝑎, 𝑏 and 𝜖 given in Table 12.

Table 12 – Parameters to define the exponent for the loudness power average (Formula (114))

Parameter    𝑎         𝑏         𝜖
Value        0,2918    0,5459    10⁻¹²
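For illustration, Formulae (113) and (114) could be evaluated for one time step 𝑙 as in the following Python sketch. The function and constant names are hypothetical; the numerical values are those given in the text and in Table 12. Because Formula (114) takes the maximum over 𝑧, the sketch computes one exponent per time step and applies it to all bands.

```python
W_N = 0.5331                      # noise weighting w_n
A, B, EPS = 0.2918, 0.5459, 1e-12  # parameters from Table 12


def combined_specific_loudness(n_tonal, n_noise):
    """Combine tonal and noise specific loudness for one time step.

    n_tonal[z], n_noise[z] -- specific loudness per band in
    sone_HMS/Bark_HMS (non-negative values are assumed).
    """
    if len(n_tonal) != len(n_noise) or not n_tonal:
        raise ValueError("inputs must be non-empty and of equal length")
    # Formula (114): exponent from the largest summed specific loudness.
    peak = max(t + n for t, n in zip(n_tonal, n_noise))
    e = A / (peak + EPS) + B
    # Formula (113): power combination of both parts in each band.
    return [(t ** e + (W_N * n) ** e) ** (1.0 / e)
            for t, n in zip(n_tonal, n_noise)]
```

In this sketch a band without noise loudness returns the tonal loudness unchanged, and a band without tonal loudness returns the noise loudness scaled by `W_N`.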

8.1.2 Calculation of averaged specific loudness

The specific loudness 𝑁′(𝑧) is obtained by averaging the time-dependent specific loudness 𝑁′(𝑙, 𝑧) over time. For the averaging, the first loudness values 𝑁′(𝑙, 𝑧) for 0 ≤ 𝑙 ≤ 56 (approximately corresponding to the first 300 ms of the input signal) are discarded because of the transient responses of the digital filters.

This averaging can be described mathematically as:

$$N'(z) = \left( \frac{1}{l_{\mathrm{end}} - 56} \sum_{l} \left(N'(l, z)\right)^{e} \right)^{1/e}, \tag{115}$$

where 𝑒 = 1/log10(2) and 57 ≤ 𝑙 ≤ 𝑙end. A power average is used here because it gives more weight to stronger components and correlates better with human loudness perception [39].
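The power average of Formula (115) could be sketched in Python as follows, for one band 𝑧. The function and constant names are hypothetical. The sketch assumes that the input list is indexed by 𝑙 starting at 0, so that the number of retained values equals 𝑙end − 56.

```python
import math

E_AVG = 1.0 / math.log10(2.0)  # exponent e, approximately 3.32


def power_average(values, n_discard=57):
    """Power average over time after dropping the blocks l = 0 .. 56."""
    kept = values[n_discard:]
    if not kept:
        raise ValueError("no values left after discarding the transient")
    mean_power = sum(v ** E_AVG for v in kept) / len(kept)
    return mean_power ** (1.0 / E_AVG)
```

For a constant input the result equals that constant; for a varying input it lies above the arithmetic mean, because larger values receive more weight.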

8.1.3 Calculation of time-dependent loudness

The time-dependent loudness 𝑁(𝑙) is calculated by integrating all specific loudness values, like Formula (26)
with ∆𝑧 = 0,5:

$$N(l) = \sum_{i=1}^{\mathrm{CBF}} N'\!\left(l, \frac{i}{2}\right) \cdot \Delta z. \tag{116}$$

The unit of the result is soneHMS /Bark HMS and no additional calibration is needed since the specific results were
already calibrated in Formula (23).

8.1.4 Calculation of representative values

The single value 𝑁 of the loudness of the signal is again obtained as a power average, here of the time-dependent loudness 𝑁(𝑙). As for the specific loudness, the values of 𝑁(𝑙) for 0 ≤ 𝑙 ≤ 56 (approximately corresponding to the first 300 ms of the input signal) are discarded because of the transient responses of the digital filters.

$$N = \left( \frac{1}{l_{\mathrm{end}} - 56} \sum_{l} \left(N(l)\right)^{e} \right)^{1/e}, \tag{117}$$

where 𝑙 > 56. The unit of the result is soneHMS. While this process does not significantly modify the loudness of pure tonal signals in comparison to the result in Formula (26), it improves the result for noise-like signals and for mixtures of tones and noise, for which the loudness of the noise components is otherwise overestimated [39].

8.1.5 Calculation of loudness for binaural signals

For binaural signals, monaural time-dependent specific loudness values 𝑁L′ (𝑙, 𝑧) and 𝑁R′ (𝑙, 𝑧) of the left and right
channel shall be calculated separately for each channel (assuming diotic signals).

A combined binaural time-dependent specific loudness 𝑁B′ (𝑙, 𝑧) is calculated using the quadratic mean:

$$N'_{\mathrm{B}}(l, z) = \sqrt{\frac{\left(N'_{\mathrm{L}}(l, z)\right)^2 + \left(N'_{\mathrm{R}}(l, z)\right)^2}{2}}. \tag{118}$$

Formula (118) approximately corresponds to the formula for binaural inhibition from the binaural loudness model by Moore/Glasberg (ISO 532-2 [7], see also Reference [33]). If the loudness value of one channel is negligible, Formula (118) yields a loudness that is lower by a factor of √0,5 than that of the diotic presentation.

For binaural signals, the binaural time-dependent specific loudness 𝑁B′ (𝑙, 𝑧) shall be used as basis for the
calculation of the specific loudness 𝑁′(𝑧), the time-dependent loudness 𝑁(𝑙) and the single value 𝑁 instead of
𝑁′(𝑙, 𝑧) in Clauses 8.1.2, 8.1.3 and 8.1.4.

8.2 Information to be recorded for loudness

The following information shall be recorded:

a) details of the method used to evaluate the loudness (ECMA 418 – Part 2: Psychoacoustic metrics based
on the hearing model – Clause 8.1 Psychoacoustic loudness calculation method), together with a reference
to this Standard;

b) the time-dependent psychoacoustic loudness values 𝑁(𝑙) (see Formula (116));

c) the time-independent single value 𝑁;

d) optionally: the time-dependent specific loudness 𝑁 ′ (𝑙, 𝑧).

Annex A
(informative)

Evaluation of the psychoacoustic hearing model

The psychoacoustic loudness calculation is evaluated by comparison with the target equal-loudness contours
as shown in Figure 2. The loudness was calculated for sinusoidal signals with a frequency of 1000 Hz and a
sound pressure level of 20 to 80 dB with a step size of 20 dB. For other frequencies, the level was varied to
match the loudness calculated for the 1000 Hz tone. The same procedure was performed for the lower threshold
of hearing. The results are shown in Figure A.1. The target equal-loudness contours are emulated well by the
results of the hearing model.
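The text does not state how the level was varied to match the loudness of the 1000 Hz tone. One possible approach is a bisection search, sketched here in Python. The function name, the search range and the callable `loudness_of(freq_hz, level_db)` are hypothetical; the callable stands in for a complete implementation of the hearing model and is assumed to increase monotonically with level.

```python
def match_level(loudness_of, target_loudness, freq_hz,
                lo_db=-10.0, hi_db=120.0, tol_db=0.01):
    """Find the level at freq_hz whose loudness equals target_loudness."""
    low_value = loudness_of(freq_hz, lo_db)
    high_value = loudness_of(freq_hz, hi_db)
    if not (low_value <= target_loudness <= high_value):
        raise ValueError("target loudness is outside the search range")
    # Halve the level interval until it is narrower than the tolerance.
    while hi_db - lo_db > tol_db:
        mid_db = 0.5 * (lo_db + hi_db)
        if loudness_of(freq_hz, mid_db) < target_loudness:
            lo_db = mid_db
        else:
            hi_db = mid_db
    return 0.5 * (lo_db + hi_db)
```

Repeating the search for each frequency and each reference level (20 dB to 80 dB at 1000 Hz) would give one point per frequency on each contour.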

Figure A.1 — Results for the equal-loudness contours. The dotted lines show the target equal-loudness contours, the solid lines are the equal-loudness contours obtained with the hearing model

Annex B
(informative)

Evaluation of the psychoacoustic tonality calculation method

B.1 Application examples

Figure B.1 shows analysis results for a frequency-modulated signal with a low modulation rate of 2 Hz and a modulation index of 150 at a frequency of 2 kHz, presented at a very low sound pressure level (L = 30 dB).

From top to bottom, it shows:

1. the spectrum (FFT size 65536, sampling rate 48 kHz), a smoothed spectrum (1/24th octave smoothed
FFT: the “background noise”, useful to show general shapes while not resolving pure tones), and a
1-critical-bandwidth peak-hold spectrum as “critical bandwidth ruler”;
2. the tone-to-noise ratio 34 (TNR) results along with the TNR tolerance line;
3. the prominence ratio 35 (PR) calculated as a full spectrum for each frequency of interest (specific
prominence ratio, SPR), both with and without recognition only of pure tones, along with the
PR tolerance line.

TNR and PR fail to indicate the tonality, since the corresponding tolerance lines are not exceeded. Only SPR shows a marginal value for a signal with a clearly prominent tonality (even though at a very low sound pressure level).

34 The calculation of the tone-to-noise ratio is described in ECMA-418-1.

35 The calculation of the prominence ratio is described in ECMA-418-1.

Figure B.1 — Top: different spectral representations (FFT and smoothed FFT) of a frequency-modulated tone; middle: corresponding TNR results (scale gives dB of tonal audibility); bottom: corresponding tones-only PR values (according to ECMA-418-1) using critical bands (CB), and complete SPR results not constrained to pure tones

Figure B.2 depicts the specific psychoacoustic tonality analysis of the same sound as in Figure B.1, which has a distinct tonal content. The location of the maximum of the specific tonality changes over time, but its magnitude is almost constant. This leads to a stable tonality prediction, based on the assumption that the perceived tonality is the maximum of the specific tonality, and corresponds well to the auditory impression.

Figure B.2 — Specific psychoacoustic tonality analysis of the same sound used as source for the results of the analyses shown in Figure B.1

B.2 Evaluation

The psychoacoustic tonality is evaluated by comparison with listening test results. As a reference, PR is also
added to the comparison. TNR values were also calculated. However, since they were very similar to the results
of the PR, they are not displayed in the results for reasons of clarity.

For the listening tests, mixtures of a sinusoidal tone with a frequency of 1000 Hz and pink noise were used, each at different levels. Thus, the effect of different signal-to-noise ratios can be evaluated for different levels. Five different tests were performed. In all five tests, the level of the pink noise was varied from 40 dB SPL to 80 dB SPL with a step size of 5 dB. The tests differed in the level of the sinusoidal tone, which was chosen from 55 dB SPL to 75 dB SPL with a step size of 5 dB.

The tests were performed with 16 test subjects. The test subjects were asked to rate the tonality of each sound
on a 13-point categorical scale (ranging from “0 - not tonal” to “12 - extremely tonal”). To compare the results of
the listening tests with the results of the psychoacoustic model, a linear scaling factor was used for the results
of the listening tests. Another scaling factor was used to map the results of the listening tests to the results of
the PR. The scaling factors were derived by minimizing the root-mean-square error between the mean ratings
of all participants and the calculated psychoacoustic tonality (or the PR, respectively) of all five experiments.
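A scaling factor of this kind has a closed-form solution, sketched here in Python. The sketch assumes a purely multiplicative factor 𝑘 without offset that is applied to the mean ratings; minimizing the root-mean-square error between 𝑘 times the ratings and the calculated values then gives 𝑘 = Σ(𝑟·𝑚) / Σ(𝑟²). The function name is hypothetical.

```python
def least_squares_scale(mean_ratings, model_values):
    """Return k that minimizes the RMS error of k * rating - model value."""
    if len(mean_ratings) != len(model_values):
        raise ValueError("inputs must have the same length")
    denominator = sum(r * r for r in mean_ratings)
    if denominator == 0.0:
        raise ValueError("ratings must not all be zero")
    numerator = sum(r * m for r, m in zip(mean_ratings, model_values))
    return numerator / denominator
```

Pooling the stimuli of all five experiments into the two input lists yields a single factor for the tonality comparison; the same function could be called with PR values to obtain the second factor.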

The results of the evaluation are shown in Figure B.3. The results illustrate one problem of the PR: it decreases
linearly for decreasing SNR. The tonality perception however does not decrease linearly according to the
experimental results. The results of the psychoacoustic hearing model fit much better to the perceived tonality.

Figure B.3 — Psychoacoustic tonality and prominence ratio compared to results of listening tests

Since experimental results are subject to statistical uncertainty, the variance of the results needs to be considered. Thus, an error measure was defined that takes into account the 95% confidence interval of the results. First, the results of the psychoacoustic tonality were scaled such that they are comparable to the tonality ratings of the listening tests. The experimental results are then compared to the scaled psychoacoustic tonality. If the psychoacoustic tonality lies within the 95% confidence interval, no error is assumed. If it lies outside the confidence interval, the error is taken as the distance to the nearest bound of the confidence interval. The root-mean-square of these errors is then calculated. An error for the PR was calculated in the same way, scaling the PR to make it comparable with the results of the listening tests.
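The error measure described above could be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that stimuli whose prediction lies inside the confidence interval enter the root-mean-square with an error of zero.

```python
import math


def ci_rms_error(scaled_model, ci_lower, ci_upper):
    """RMS of the distances between predictions and confidence intervals."""
    errors = []
    for value, lower, upper in zip(scaled_model, ci_lower, ci_upper):
        if value < lower:
            errors.append(lower - value)   # below the interval
        elif value > upper:
            errors.append(value - upper)   # above the interval
        else:
            errors.append(0.0)             # inside: no error assumed
    if not errors:
        raise ValueError("at least one stimulus is required")
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Calling the function once with the scaled psychoacoustic tonality and once with the scaled PR gives two values that can be compared on the same rating scale.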

The better performance of the psychoacoustic hearing model is also reflected in this error measure. Over all five experiments (related to the 13-point categorical scale), the error measure was 0,21 for the psychoacoustic tonality, 0,70 for the PR and 0,74 for the TNR (not shown in the figures).

Further application examples related to IT equipment can be found in Reference [34].

Annex C
(informative)

Evaluation of the psychoacoustic roughness calculation method

The psychoacoustic roughness is evaluated by comparison with listening test results and data from
Reference [12]. Figure C.1 shows results for amplitude modulated sinusoids with seven different carrier
frequencies (125 Hz, 250 Hz, 500 Hz, 1000 Hz, 2000 Hz, 4000 Hz, 8000 Hz) and different modulation rates.
The results from Reference [12] are idealized, smoothed curves that were fit to the results of jury tests. The
results of the model are close to these idealized curves and never exceed a tolerance of ±0,1 asper.

[Figure C.1: seven panels, one per carrier frequency. Horizontal axis: modulation rate in Hz (10 to 350); vertical axis: roughness in asper (0,1 to 1). Legend: roughness algorithm; data from Fastl/Zwicker; 0,1 asper tolerance.]

Figure C.1 — Results for modulated sinusoids with different carrier frequencies and modulation rates.
All sounds were modulated with 100% degree of modulation and a sound pressure level of 60 dB.

In order to investigate the applicability of the method to technical sounds, listening tests with the sounds
described in Table C.1 were carried out.

Table C.1 — Technical Sounds

ESA_02 Electrical Seat Adjuster

ETB_01 Electrical Toothbrush

GEN_02 Generator

HDD_07 Hard Disk Drive

HDD_09 Hard Disk Drive

SCOOT Pass-by of Scooter

SINUS Calibration Signal: Modulated Sinus Tone

TOF_03 Take-Off (Airplane)

In Figure C.2, the results of the psychoacoustic roughness model are compared with listening test results (mean values and 95% confidence intervals) for the seven technical sounds and a reference sound (SINUS), which was used as anchor. The calculated results all lie within the 95% confidence intervals of the listening test data, indicating that the algorithm performs well for technical sounds. More results can be found in Reference [30].

[Figure C.2: vertical axis: roughness in asper. Legend: roughness algorithm; jury test.]

Figure C.2 — Results of several technical sounds. The results of the listening tests are displayed: mean
values with 95% confidence intervals.

Bibliography

[1] R. Sottek: A Hearing Model Approach to Time-Varying Loudness, Acta Acustica united with Acustica,
vol. 102, no. 4, pp. 725-744, 2016.

[2] M. Slaney: Auditory toolbox. Interval Research Corporation, Tech. Rep 10 (1998), 1998.

[3] ISO 532-1: Acoustics – Methods for calculating loudness, Part 1: Zwicker method

[4] DIN 45631/A1:2010: Calculation of loudness level and loudness from the sound spectrum - Zwicker
method - Amendment 1: Calculation of the loudness of time-variant sound, Beuth Verlag, 2010.

[5] J. Chalupper, H. Fastl: Dynamic loudness model (DLM) for normal and hearing-impaired listeners, Acta
Acustica united with Acustica 88(3), pp. 378-386, 2002.

[6] B.R. Glasberg, B.C.J. Moore: A model of loudness applicable to time-varying sounds, Journal of the
Audio Engineering Society 50, pp. 331-341, 2002.

[7] ISO 532-2, Acoustics — Methods for calculating loudness — Part 2: Moore-Glasberg method

[8] J. Rennies, J.L. Verhey, J.E. Appell, B. Kollmeier: Loudness of complex time-varying sounds? A
challenge for current loudness models. Proceedings of Meetings on Acoustics, vol. 19, 050189, 2013.

[9] J. Rennies, M. Wächtler, J. Hots, J.L. Verhey: Spectro-temporal characteristics affecting the loudness
of technical sounds: data and model predictions, Acta Acustica united with Acustica, vol. 101(6), pp.
1145–1156, 2015.

[10] ISO 389-7, Acoustics — Reference zero for the calibration of audiometric equipment — Part 7:
Reference threshold of hearing under free-field and diffuse-field listening conditions

[11] A. M. H. J. Aertsen and P. I. M. Johannesma: Spectro-Temporal Receptive-Fields of Auditory Neurons
in the Grassfrog. I. Characterization of Tonal and Natural Stimuli, Biological Cybernetics, vol. 38, no. 4,
pp. 223-234, 1980.

[12] H. Fastl, E. Zwicker: Psychoacoustics. Facts and Models, Springer, Berlin, Heidelberg, New York, 2006.

[13] B. C. Moore: Basic auditory processes involved in the analysis of speech sounds. Philosophical
Transactions of the Royal Society of London B: Biological Sciences, vol. 363, no. 1493, pp. 947-963,
2008.

[14] T. Bierbaums, R. Sottek: Modellierung der zeitvarianten Lautheit mit einem Gehörmodel, Proc. DAGA
2012, Darmstadt, pp. 591-592, 2012.

[15] S. Buus, M. Florentine: Modifications to the power function for loudness. In: E. Summerfield, R. Kompass,
T. Lachmann (eds), Fechner Day 2001. Proceedings of the 17th Annual Meeting of the International
Society for Psychophysics. Berlin: Pabst, pp. 236-241, 2001.

[16] M. Epstein, M. Florentine: A test of the Equal-Loudness-Ratio hypothesis using cross-modality matching
functions, J. Acoust. Soc. Am., vol. 118(2), pp. 907-913, 2005.

[17] R. Sottek: Improvements in calculating the loudness of time varying sounds. Proc. Inter-Noise 2014,
Melbourne, 2014.

[18] R. Sottek: Modelle zur Signalverarbeitung im menschlichen Gehör, dissertation, RWTH Aachen, 1993.

[19] R. Sottek, F. Kamp, A. Fiebig: A new hearing model approach to tonality, Proc. Inter-Noise 2013,
Innsbruck, 2013.

[20] R. Sottek: Progress in calculating tonality of technical sounds, Proc. Inter-Noise 2014, Melbourne, 2014.

[21] R. Sottek: Calculating tonality of IT product sounds using a psychoacoustically-based model, Proc. Inter-
Noise 2015, San Francisco, 2015.

[22] H. Hansen, J.L. Verhey, R. Weber: The Magnitude of Tonal Content. A Review, Acta Acustica united
with Acustica, 97(3), pp. 355-363, 2011.

[23] H. Hansen, R. Weber: Zum Verhältnis von Tonhaltigkeit und der partiellen Lautheit der tonalen
Komponenten in Rauschen, Proc. DAGA 2010, Berlin, pp. 597-598, 2010.

[24] J.L. Verhey, S. Stefanowicz: Binaurale Tonhaltigkeit, Proc. DAGA 2011, Düsseldorf, pp. 827-828, 2011.

[25] J.C.R. Licklider: A Duplex Theory of Pitch Perception, Cellular and Molecular Life Sciences, vol. 7(4),
pp. 128-134, 1951.

[26] R. Sottek: Gehörgerechte Rauhigkeitsberechnung, Proc. DAGA 1994, Dresden, pp. 1197-1200, 1994.

[27] R. Sottek, P. Vranken, H.-J. Kaiser: Anwendung der gehörgerechten Rauhigkeitsberechnung, Proc.
DAGA 1994, Dresden, pp. 1201-1204, 1994.

[28] R. N. Bracewell: The Fourier transform and its applications. McGraw-Hill, New York,1986.

[29] J. Becker, R. Sottek: Psychoacoustic Tonality Analysis, Proc. Inter-Noise 2018, Chicago, 2018.

[30] R. Sottek, J. Becker, T. Lobato: Progress in Roughness Calculation, Proc. Inter-Noise 2020, Seoul, 2020.

[31] A. Oetjen: Threshold and Suprathreshold Phenomena in Auditory Modulation Perception, PhD thesis,
Oldenburg, 2018.

[32] A. Oetjen, U. Letens, S. van de Par, J. Verhey, R. Weber: Roughness calculation for randomly
modulated sounds, Proc. DAGA, Meran, 2013.

[33] B.C.J. Moore, B.R. Glasberg, A. Varathanathan, J. Schlittenlacher: A loudness model for time-varying
sounds incorporating binaural inhibition, Trends in Hearing, 20, 2331216516682698, 2016.

[34] HEAD acoustics GmbH: Using the new psychoacoustic tonality analyses Tonality (Hearing Model),
Application Note, 2018.

[35] Zwicker, E., Loudness and excitation patterns of strongly frequency modulated tones, in Sensation and
Measurement, papers in honor of S.S. Stevens, edited by H.R. Moskowitz, B. Scharf, and J.C. Stevens
(D. Reidel, Dordrecht, Netherlands), pp. 325–335, 1974.

[36] R. Sottek, Loudness models applied to technical sounds, Noise-Con 2010, 2010.

[37] Hots, J. et al., Loudness of sounds with a subcritical bandwidth: A challenge to current loudness models?
J. Acoust. Soc. Am. 134(4), EL334–EL339, 2013.

[38] Hots, J. et al., Loudness of subcritical sounds as a function of bandwidth, center frequency, and level.
J. Acoust. Soc. Am., 135(3), pp. 1313-1320, 2014.

[39] R. Sottek, T. Lobato, J. Becker: Loudness of sounds with a subcritical bandwidth: improved prediction
with the concept of tonal loudness, DAGA 2022, Stuttgart, 2022.
