Tech Talk Supplementary
A Modal Synthesis Background

We adopt tetrahedral finite element models to represent any given geometry [O'Brien et al. 2002]. The displacements, x ∈ R3N, in such a system can be calculated with the following linear deformation equation:

Mẍ + Cẋ + Kx = f,  (1)

where M, C, and K respectively represent the mass, damping, and stiffness matrices. We approximate the damping matrix with Rayleigh damping: C = αM + βK, which is a well-established practice. The system can be decoupled into the following form:

q̈ + (αI + βΛ)q̇ + Λq = U^T f,  (2)

where Λ is a diagonal matrix. The solution to Eqn. 2 is a bank of modes, i.e. damped sinusoidal waves. The i'th mode is

qi = ai e−di t sin(2πfi t + θi),  (3)

where fi is the frequency of the mode, di is the damping coefficient, ai is the excited amplitude, and θi is the initial phase. (fi, di, ai) together define the feature of mode i.

The values in Eqn. 3 depend on the material properties, the geometry, and the run-time interactions: ai and θi depend on the run-time excitation of the object, while fi and di depend on the geometry and the material properties:

di = (1/2)(α + βλi),  (4)

fi = (1/2π) √( λi − ((α + βλi)/2)² ),  (5)

where the eigenvalues λi's are calculated from M and K, which in turn depend on the mass density ρ, Young's modulus E, and Poisson's ratio ν.

B Feature Extraction

We extract the features {fi, di, ai} from the example audio using a time-varying frequency representation called the power spectrogram. A power spectrogram P for a time-domain signal s[n] is obtained by first breaking the signal into overlapping frames and then performing windowing and a Fourier transform on each frame:

P[m, ω] = | Σn s[n] w[n − m] e−jωn |²,  (6)

where w is the window applied to the original time-domain signal.

… is below a certain threshold, we collect it in the set of extracted features, shown as the red cross in the feature space (Figure 2c, where only the frequency f and damping d are shown).

C Parameter Estimation

C.1 Optimization Framework

The material parameters are estimated through an optimization framework. We first create a virtual object that is roughly the same size and geometry as the real-world object whose impact sound was recorded. We then calculate its mass matrix M and stiffness matrix K and find the assumed eigenvalues λ0i's using some initial values for the Young's modulus, mass density, and Poisson's ratio: E0, ρ0, and ν0. The eigenvalue λi for general E and ρ is just a multiple of λ0i:

λi = (γ/γ0) λ0i,  (7)

where γ = E/ρ is the ratio of Young's modulus to density, and γ0 = E0/ρ0 is the same ratio using the assumed values. Applying a unit impulse on the virtual object, at a point corresponding to the actual impact point in the example recording, gives an excitation pattern of the eigenvalues as in Eqn. 3, where the excitation amplitude of mode j is a0j. If the actual (unknown) impulse is not a unit impulse, then the excitation amplitude is just scaled by a factor σ:

aj = σa0j.  (8)

Combining Eqn. 4, Eqn. 5, Eqn. 7, and Eqn. 8, we obtain a mapping from an assumed eigenvalue and its excitation (λ0j, a0j) to an estimated mode with frequency f̃j, damping d̃j, and amplitude ãj:

(λ0j, a0j) −−{α,β,γ,σ}−→ (f̃j, d̃j, ãj).  (9)

The estimated sound s̃[n] is generated by mixing all the estimated modes,

s̃[n] = Σj ãj e−d̃j(n/Fs) sin(2πf̃j(n/Fs)),  (10)

where Fs is the sampling rate.

The estimated sound s̃[n] can then be compared against the example sound s[n], and a difference metric can be computed. An optimization process is used to find the parameter set with the minimal difference-metric value.

C.2 Psychoacoustic Metric
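To make the mapping of Eqn. 9 concrete, the Python sketch below composes Eqns. 4, 5, 7, and 8 into estimated mode features and mixes them per Eqn. 10. This is a minimal illustration only: the function names are ours and the inputs are arbitrary placeholders, not values from any real finite element model.

```python
import numpy as np

def estimated_modes(lam0, a0, alpha, beta, gamma_ratio, sigma):
    """The {alpha, beta, gamma, sigma} mapping of Eqn. 9: assumed
    eigenvalues lam0 and excitations a0 -> estimated (f~, d~, a~)."""
    lam = gamma_ratio * lam0                   # Eqn. 7: lam = (gamma / gamma0) * lam0
    d = 0.5 * (alpha + beta * lam)             # Eqn. 4: damping coefficient
    f = np.sqrt(np.maximum(lam - d ** 2, 0.0)) / (2.0 * np.pi)  # Eqn. 5: frequency
    a = sigma * a0                             # Eqn. 8: impulse-scaled amplitude
    return f, d, a

def mix_modes(f, d, a, fs, n_samples):
    """Mix all estimated modes into the estimated sound s~[n] (Eqn. 10)."""
    t = np.arange(n_samples) / fs
    return sum(aj * np.exp(-dj * t) * np.sin(2.0 * np.pi * fj * t)
               for fj, dj, aj in zip(f, d, a))
```

In the optimization framework, (α, β, γ, σ) would be the free variables, and the difference metric between s̃[n] and the recorded s[n] drives their update.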
Online Submission ID: 0028
… to the critical band rate as described previously. The damping is transformed to duration, which is proportional to the inverse of the damping value. Figure 3 shows the effect of the transformation. A matching score can then be computed between the transformed point sets.

[Figure 3 image: two point-set plots; (a) axes frequency (kHz) vs. damping (1/sec); (b) axes X(f) vs. Y(d).]

Figure 3: Point set matching problem in the feature domain: (a) in the original frequency and damping, (f, d)-space; (b) in the transformed, (x, y)-space, where x = X(f) and y = Y(d). The blue crosses and red circles are the reference and estimated feature points, respectively. The three features having the largest energies are labeled 1, 2, and 3.

D Residual Compensation

Figure 4 illustrates the residual computation process. From a recorded sound (Figure 4a), the reference features are extracted (Figure 4b), with frequencies, dampings, and energies depicted as the blue circles (Figure 4f). After parameter estimation, the synthesized sound is generated (Figure 4c), with the estimated features shown as the red crosses (Figure 4g), which all lie on a curve in the (f, d)-plane. Each reference feature may be approximated by one or more estimated features, and its match ratio number is shown. The represented sound is the summation of the reference features weighted by their match scores, shown as the solid blue circles (Figure 4h). Finally, the difference between the recorded sound's power spectrogram (Figure 4a) and the represented sound's (Figure 4d) is computed to obtain the residual (Figure 4e).

Figure 4: Residual computation.

D.2 Residual Transfer

As discussed in previous sections, modes transfer naturally with geometries in the modal analysis process, and they respond to excitations at runtime in a physical manner. In other words, the modal component of the synthesized sounds already provides transferability of sounds due to varying geometries and dynamics. Hence, we compute the transferred residual under the guidance of modes. Algorithm 1 shows the complete feature-guided residual transfer algorithm.

Algorithm 1: Feature-guided residual transfer
  Input: source modes Φs = {φs,i}, target modes Φt = {φt,j}, and source residual audio s_residual,s[n]
  Ψ ← DetermineModePairs(Φs, Φt)
  foreach mode pair (φs,k, φt,k) ∈ Ψ do
      Ps′ ← ShiftSpectrogram(Ps, ∆frequency)
      Ps′′ ← StretchSpectrogram(Ps′, damping ratio)
      A ← FindPixelScale(Pt, Ps′′)
      Presidual,s′ ← ShiftSpectrogram(Presidual,s, ∆frequency)
      Presidual,s′′ ← StretchSpectrogram(Presidual,s′, damping ratio)
      Presidual,t′′ ← MultiplyPixelScale(Presidual,s′′, A)
      (ωstart, ωend) ← FindFrequencyRange(φt,k−1, φt,k)
      Presidual,t[m, ωstart, . . . , ωend] ← Presidual,t′′[m, ωstart, . . . , ωend]
  end
  s_residual,t[n] ← IterativeInverseSTFT(Presidual,t)

Parameter estimation: We estimate the material parameters from various real-world audio recordings: a wood plate, a plastic plate, a metal plate, a porcelain plate, and a glass bowl. For each recording, the parameters are estimated using a virtual object that is of the same size and shape as the one used to record the audio clips. When the virtual object is hit at the same location as the real-world object, it produces a sound similar to the recorded audio, as shown in Fig. 5 and the supplementary video.

Fig. 6 compares the reference features of the real-world objects and the estimated features of the virtual objects as a result of the parameter estimation.

Transferred parameters and residual: The parameters estimated, as well as the residuals, can be transferred to virtual objects with different sizes and shapes, as shown in Fig. 7. From an example recording of a porcelain plate (a), the parameters for the porcelain material are estimated, and the residual computed (b). The parameters and residual are then transferred to a smaller porcelain plate (c) and a porcelain bunny (d).

Comparison with real recordings: Fig. 8 shows a comparison of the transferred results with the real recordings. From a recording of a glass bowl, the parameters for glass are estimated (column (a)) and transferred to other virtual glass bowls of different sizes. The synthesized sounds ((b), (c), (d), bottom row) are compared with the real-world audio for these different-sized glass bowls ((b), (c), (d), top row). More examples of transferring the material parameters as well as the residuals are demonstrated in the supplementary video.

F Perceptual Study

We also designed an experiment to evaluate the auditory perception of the synthesized sounds of five different materials. Each subject is presented with a series of 24 audio clips: 8 are audio recordings of sound generated from hitting a real-world object; 16 are synthesized using the techniques described in this paper. For each audio clip, …
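The psychoacoustic transform of Section C.2 maps each feature point to the (x, y)-space of Figure 3: frequency to the critical band rate, and damping to a duration proportional to 1/d. The exact X(f) used in the metric is not reproduced here; the sketch below substitutes Traunmüller's Bark-rate approximation as an assumed stand-in, purely for illustration.

```python
def critical_band_rate(f_hz):
    """Bark-scale critical band rate via Traunmueller's approximation
    (an assumed stand-in for the transform X(f))."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def transform_feature(f_hz, d):
    """Map a (frequency, damping) feature to the transformed (x, y)-space
    of Figure 3, with y = Y(d) proportional to the inverse damping 1/d."""
    return critical_band_rate(f_hz), 1.0 / d
```

Matching is then performed between point sets in this perceptually motivated space rather than in the raw (f, d)-plane.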
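Two of Algorithm 1's primitives, ShiftSpectrogram and StretchSpectrogram, admit simple array implementations. The sketch below is a crude stand-in (integer-bin frequency shift, nearest-neighbour time resampling) rather than the actual operators used in the residual transfer pipeline.

```python
import numpy as np

def shift_spectrogram(P, bin_shift):
    """Shift a power spectrogram along the frequency axis (axis 0) by an
    integer number of bins; vacated bins are zero-filled. A crude stand-in
    for Algorithm 1's ShiftSpectrogram."""
    out = np.zeros_like(P)
    if bin_shift >= 0:
        out[bin_shift:, :] = P[:P.shape[0] - bin_shift, :]
    else:
        out[:bin_shift, :] = P[-bin_shift:, :]
    return out

def stretch_spectrogram(P, ratio):
    """Stretch (ratio > 1) or compress (ratio < 1) a spectrogram along the
    time axis (axis 1) with nearest-neighbour resampling; a crude stand-in
    for Algorithm 1's StretchSpectrogram."""
    n_frames = max(1, int(round(P.shape[1] * ratio)))
    idx = np.minimum((np.arange(n_frames) / ratio).astype(int), P.shape[1] - 1)
    return P[:, idx]
```

In Algorithm 1, the shift amount comes from the frequency difference of a mode pair and the stretch ratio from the ratio of their dampings, so the transferred residual tracks the target object's modes.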
Figure 5: Parameter estimation for different materials. For each material, the material parameters are estimated using an example recorded
audio (top row). Applying the estimated parameters to a virtual object with the same geometry as the real object used in recording the audio
will produce a similar sound (bottom row).
(a) wood plate (b) plastic plate (c) metal plate (d) porcelain plate (e) glass bowl
Figure 6: Feature comparison of real and virtual objects. The blue circles represent the reference features extracted from the recordings of
the real objects. The red crosses are the features of the virtual objects using the estimated parameters. Because of the Rayleigh damping
model, all the features of a virtual object lie on the depicted red curve on the (f, d)-plane.
Table 1: Material Recognition Rate Matrix: Recorded Sounds

Table 2: Material Recognition Rate Matrix: Synthesized Sounds Using Our Method
References

O'BRIEN, J. F., SHEN, C., AND GATCHALIAN, C. M. 2002. Synthesizing sounds from rigid-body simulations. In ACM SIGGRAPH Symposium on Computer Animation.
Figure 7: Transferred material parameters and residual: from a real-world recording (a), the material parameters are estimated and the residual computed (b). The parameters and residual can then be applied to various objects made of the same material, including (c) a smaller object with similar shape; (d) an object with different geometry. The transferred modes and residuals are combined to form the final results (bottom row).
Figure 8: Comparison of transferred results with real-world recordings: from one recording (column (a), top), the optimal parameters and residual are estimated, and a similar sound is reproduced (column (a), bottom). The parameters and residual can then be applied to different objects of the same material ((b), (c), (d), bottom), and the results are comparable to the real-world recordings ((b), (c), (d), top).