Tech Talk Supplementary

This document describes a method for estimating material properties of objects from impact sounds. It extracts modal features from example impact sounds using spectrograms. An optimization framework is used to estimate material parameters by minimizing the difference between synthesized and example sounds based on a psychoacoustic metric.


Online Submission ID: 0028

A Modal Synthesis Background

We adopt tetrahedral finite element models to represent any given geometry [O'Brien et al. 2002]. The displacements, x ∈ R^{3N}, in such a system can be calculated with the following linear deformation equation:

    Mẍ + Cẋ + Kx = f,    (1)

where M, C, and K respectively represent the mass, damping, and stiffness matrices. We approximate the damping matrix with Rayleigh damping, C = αM + βK, which is a well-established practice. The system can be decoupled into the following form:

    q̈ + (αI + βΛ)q̇ + Λq = U^T f,    (2)

where Λ is a diagonal matrix. The solution to Eqn. 2 is a bank of modes, i.e., damped sinusoidal waves. The i'th mode is

    q_i = a_i e^{−d_i t} sin(2π f_i t + θ_i),    (3)

where f_i is the frequency of the mode, d_i is the damping coefficient, a_i is the excited amplitude, and θ_i is the initial phase. Together, (f_i, d_i, a_i) define the feature of mode i.

The values in Eqn. 3 depend on the material properties, the geometry, and the run-time interactions: a_i and θ_i depend on the run-time excitation of the object, while f_i and d_i depend on the geometry and the material properties:

    d_i = (1/2)(α + βλ_i),    (4)

    f_i = (1/2π) √( λ_i − ((α + βλ_i)/2)² ),    (5)

where the eigenvalues λ_i are calculated from M and K, which in turn depend on the mass density ρ, Young's modulus E, and Poisson's ratio ν.
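For illustration, the following minimal Python sketch evaluates Eqns. 4 and 5 for a vector of eigenvalues. The function name and the handling of overdamped modes are our own choices, not part of the paper:

    import numpy as np

    def mode_features(lambdas, alpha, beta):
        """Map eigenvalues to per-mode damping d_i (Eqn. 4) and
        frequency f_i (Eqn. 5) under Rayleigh damping C = alpha*M + beta*K."""
        lambdas = np.asarray(lambdas, dtype=float)
        d = 0.5 * (alpha + beta * lambdas)        # Eqn. (4)
        radicand = lambdas - d**2
        valid = radicand > 0.0                    # overdamped modes do not oscillate
        f = np.full_like(lambdas, np.nan)
        f[valid] = np.sqrt(radicand[valid]) / (2.0 * np.pi)  # Eqn. (5)
        return f, d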
B Feature Extraction

We extract the features {f_i, d_i, a_i} from the example audio using a time-varying frequency representation called the power spectrogram. A power spectrogram P for a time domain signal s[n] is obtained by first breaking the signal up into overlapping frames, and then performing windowing and a Fourier transform on each frame:

    P[m, ω] = | Σ_n s[n] w[n − m] e^{−jωn} |²,    (6)

where w is the window applied to the original time domain signal.
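A minimal sketch of Eqn. 6 using SciPy's STFT; the frame length, window, and overlap here are illustrative choices, not values from the paper:

    import numpy as np
    from scipy.signal import stft

    def power_spectrogram(s, fs, frame_len=1024, overlap=0.75):
        """Power spectrogram of Eqn. (6): windowed, overlapping
        STFT frames, squared magnitude."""
        f, t, S = stft(s, fs=fs, window='hann',
                       nperseg=frame_len,
                       noverlap=int(frame_len * overlap))
        return f, t, np.abs(S) ** 2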

Figure 2: Feature extraction process. (a) Peak detection in the power spectrogram; (b) local shape fitting; (c) an extracted feature (red cross) in (f, d)-space.

The features are then extracted from the power spectrogram through the process shown in Figure 2. First, a peak is detected in the power spectrogram at the location of a potential mode (Figure 2a, where f = frequency, t = time). Then a local shape fitting of the power spectrogram is performed to estimate the frequency, damping, and amplitude of the potential mode (Figure 2b). Finally, if the fitting error is below a certain threshold, we collect it in the set of extracted features, shown as the red cross in the feature space (Figure 2c, where only the frequency f and damping d are shown).

C Parameter Estimation

C.1 Optimization Framework

The material parameters are estimated through an optimization framework. We first create a virtual object that is roughly the same size and geometry as the real-world object whose impact sound was recorded. We then calculate its mass matrix M and stiffness matrix K and find the assumed eigenvalues λ0_i using some initial values for the Young's modulus, mass density, and Poisson's ratio: E0, ρ0, and ν0. The eigenvalue λ_i for general E and ρ is just a multiple of λ0_i:

    λ_i = (γ/γ0) λ0_i,    (7)

where γ = E/ρ is the ratio of Young's modulus to density, and γ0 = E0/ρ0 is the ratio using the assumed values. Applying a unit impulse on the virtual object at a point corresponding to the actual impact point in the example recording gives an excitation pattern of the eigenvalues as in Eqn. 3, where the excitation amplitude of mode j is a0_j. If the actual (unknown) impulse is not a unit impulse, then the excitation amplitude is simply scaled by a factor σ:

    a_j = σ a0_j.    (8)

Combining Eqn. 4, Eqn. 5, Eqn. 7, and Eqn. 8, we obtain a mapping from an assumed eigenvalue and its excitation (λ0_j, a0_j) to an estimated mode with frequency f̃_j, damping d̃_j, and amplitude ã_j:

    (λ0_j, a0_j)  --(α, β, γ, σ)-->  (f̃_j, d̃_j, ã_j).    (9)

The estimated sound s̃[n] is generated by mixing all the estimated modes,

    s̃[n] = Σ_j ã_j e^{−d̃_j (n/F_s)} sin(2π f̃_j (n/F_s)),    (10)

where F_s is the sampling rate.
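The mapping of Eqns. 7-9 and the mixing of Eqn. 10 can be sketched as follows. This is a schematic under our own naming; gamma_ratio stands for γ/γ0:

    import numpy as np

    def synthesize(lambdas0, a0, alpha, beta, gamma_ratio, sigma, fs, n_samples):
        """Estimated sound via Eqns. (7)-(10)."""
        lam = gamma_ratio * np.asarray(lambdas0, dtype=float)      # Eqn. (7)
        amp = sigma * np.asarray(a0, dtype=float)                  # Eqn. (8)
        d = 0.5 * (alpha + beta * lam)                             # Eqn. (4)
        f = np.sqrt(np.maximum(lam - d**2, 0.0)) / (2.0 * np.pi)   # Eqn. (5)
        t = np.arange(n_samples) / fs
        s = np.zeros(n_samples)
        for fj, dj, aj in zip(f, d, amp):
            if fj <= 0.0:
                continue  # skip overdamped modes that do not oscillate
            s += aj * np.exp(-dj * t) * np.sin(2.0 * np.pi * fj * t)  # Eqn. (10)
        return s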
The estimated sound s̃[n] can then be compared against the example sound s[n], and a difference metric can be computed. An optimization process is used to find the parameter set with the minimal difference metric value.
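This excerpt does not specify the optimizer. As one plausible realization, the search over (α, β, γ, σ) could be driven by a generic minimizer such as SciPy's, with the synthesize sketch above and a metric callable standing in for the psychoacoustic metric of Section C.2; the starting point below is hypothetical:

    import numpy as np
    from scipy.optimize import minimize

    def estimate_parameters(lambdas0, a0, s_example, fs, metric):
        """Search (alpha, beta, gamma_ratio, sigma) minimizing the
        difference metric between synthesized and example sounds."""
        def objective(params):
            alpha, beta, gamma_ratio, sigma = params
            s_est = synthesize(lambdas0, a0, alpha, beta,
                               gamma_ratio, sigma, fs, len(s_example))
            return metric(s_est, s_example)
        x0 = np.array([1.0, 1e-7, 1.0, 1.0])   # hypothetical initial guess
        res = minimize(objective, x0, method='Nelder-Mead')
        return res.x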
C.2 Psychoacoustic Metric

A combination of two metrics is used: an 'image domain metric' that evaluates the perceptual similarity of sound clips, and a 'feature domain metric' that measures the audio material resemblance.

Image Domain Metric: Given a reference sound s[n] and an estimated sound s̃[n], their power spectrograms are computed using Eqn. 6. The power spectrograms are transformed before the difference is taken. The frequency axis is transformed to the critical band rate z to account for humans' better ability to distinguish lower frequencies than higher frequencies [Zwicker and Fastl 1999]. The intensity is transformed from pressure to loudness, a perceptual value that measures the human sensation of sound intensity.

Feature Domain Metric: To measure the resemblance between extracted (real) features and estimated (synthesized) features, we use a point set matching metric. First the frequency and damping of the feature points, (f, d), are transformed. The frequency is transformed to the critical band rate as described previously. The damping is transformed to duration, which is proportional to the inverse of the damping value. Figure 3 shows the effect of the transformation. A matching score can then be computed between the transformed point sets.
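The excerpt does not give the exact frequency transform. A common closed-form approximation of the critical band rate appears in Zwicker and Fastl; whether the paper uses this particular formula is our assumption:

    import numpy as np

    def critical_band_rate(f_hz):
        """Zwicker's closed-form approximation of the critical band
        rate z (in Bark) for frequency f in Hz [Zwicker and Fastl 1999]."""
        f = np.asarray(f_hz, dtype=float)
        return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)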


Figure 3: Point set matching problem in the feature domain: (a) in the original frequency and damping, (f, d)-space; (b) in the transformed, (x, y)-space, where x = X(f) and y = Y(d). The blue crosses and red circles are the reference and estimated feature points, respectively. The three features having the largest energies are labeled 1, 2, and 3. (Axes: (a) frequency (kHz) vs. damping (1/sec); (b) X(f) vs. Y(d).)
D Residual Compensation

D.1 Residual Computation

Figure 4 illustrates the residual computation process. From a recorded sound (Figure 4a), the reference features are extracted (Figure 4b), with frequencies, dampings, and energies depicted as the blue circles (Figure 4f). After parameter estimation, the synthesized sound is generated (Figure 4c), with the estimated features shown as the red crosses (Figure 4g), which all lie on a curve in the (f, d)-plane. Each reference feature may be approximated by one or more estimated features, and its match ratio number is shown. The represented sound is the summation of the reference features weighted by their match scores, shown as the solid blue circles (Figure 4h). Finally, the difference between the recorded sound's power spectrogram (Figure 4a) and the represented sound's (Figure 4d) is computed to obtain the residual (Figure 4e).

Figure 4: Residual computation.
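The last step admits a one-line sketch. Clamping negative spectrogram differences to zero is our assumption, as this excerpt does not say how they are handled:

    import numpy as np

    def compute_residual(P_recorded, P_represented):
        """Residual spectrogram: the recorded sound's power spectrogram
        minus the represented sound's (Figure 4a minus 4d). Negative
        values are clamped to zero (an assumption)."""
        return np.maximum(P_recorded - P_represented, 0.0)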
D.2 Residual Transfer

As discussed in previous sections, modes transfer naturally with geometries in the modal analysis process, and they respond to excitations at runtime in a physical manner. In other words, the modal component of the synthesized sounds already provides transferability of sounds due to varying geometries and dynamics. Hence, we compute the transferred residual under the guidance of modes. Algorithm 1 shows the complete feature-guided residual transfer algorithm.

Algorithm 1: Residual Transformation at Runtime

    Input:  source modes Φ^s = {φ^s_i}, target modes Φ^t = {φ^t_j}, and
            source residual audio s^s_residual[n]
    Output: target residual audio s^t_residual[n]

    Ψ ← DetermineModePairs(Φ^s, Φ^t)
    foreach mode pair (φ^s_k, φ^t_k) ∈ Ψ do
        P'^s           ← ShiftSpectrogram(P^s, ∆frequency)
        P''^s          ← StretchSpectrogram(P'^s, damping ratio)
        A              ← FindPixelScale(P^t, P''^s)
        P'^s_residual  ← ShiftSpectrogram(P^s_residual, ∆frequency)
        P''^s_residual ← StretchSpectrogram(P'^s_residual, damping ratio)
        P''^t_residual ← MultiplyPixelScale(P''^s_residual, A)
        (ω_start, ω_end) ← FindFrequencyRange(φ^t_{k−1}, φ^t_k)
        P^t_residual[m, ω_start..ω_end] ← P''^t_residual[m, ω_start..ω_end]
    end
    s^t_residual[n] ← IterativeInverseSTFT(P^t_residual)
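Below is a condensed Python sketch of the per-pair loop of Algorithm 1. The operators (DetermineModePairs, FindPixelScale, IterativeInverseSTFT) are only named in the pseudocode, so the data layout, the wrap-around frequency shift, and the nearest-neighbor time stretch are all stand-in assumptions of ours:

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class ModePair:
        """Pairing of a source and a target mode; fields are assumptions."""
        freq_shift_bins: int   # frequency shift between paired modes, in bins
        damping_ratio: float   # target damping / source damping
        pixel_scale: float     # amplitude scale A between spectrogram patches
        target_bin: int        # frequency bin of the target mode

    def stretch_time(P, ratio):
        """Stretch a spectrogram along time by nearest-neighbor column
        resampling (a stand-in for StretchSpectrogram)."""
        n_frames = P.shape[1]
        src = np.clip((np.arange(n_frames) * ratio).astype(int), 0, n_frames - 1)
        return P[:, src]

    def transfer_residual(mode_pairs, P_s_residual):
        """Schematic of Algorithm 1: per mode pair, shift the source
        residual spectrogram in frequency (wrap-around roll as a simple
        stand-in for ShiftSpectrogram), stretch it in time, rescale it,
        and copy the band between consecutive target modes."""
        P_t_residual = np.zeros_like(P_s_residual)
        prev_bin = 0
        for pair in sorted(mode_pairs, key=lambda p: p.target_bin):
            P1 = np.roll(P_s_residual, pair.freq_shift_bins, axis=0)
            P2 = stretch_time(P1, pair.damping_ratio)
            P3 = pair.pixel_scale * P2
            band = slice(prev_bin, pair.target_bin)
            P_t_residual[band, :] = P3[band, :]
            prev_bin = pair.target_bin
        # The time-domain residual is then recovered with an iterative
        # inverse STFT (e.g., Griffin-Lim), as in the algorithm's last line.
        return P_t_residual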


E Results

Parameter estimation: We estimate the material parameters from various real-world audio recordings: a wood plate, a plastic plate, a metal plate, a porcelain plate, and a glass bowl. For each recording, the parameters are estimated using a virtual object that is of the same size and shape as the one used to record the audio clips. When the virtual object is hit at the same location as the real-world object, it produces a sound similar to the recorded audio, as shown in Fig. 5 and the supplementary video.

Fig. 6 compares the reference features of the real-world objects and the estimated features of the virtual objects as a result of the parameter estimation.

Transferred parameters and residual: The estimated parameters, as well as the residuals, can be transferred to virtual objects with different sizes and shapes, as shown in Fig. 7. From an example recording of a porcelain plate (a), the parameters for the porcelain material are estimated, and the residual is computed (b). The parameters and residual are then transferred to a smaller porcelain plate (c) and a porcelain bunny (d).

Comparison with real recordings: Fig. 8 shows a comparison of the transferred results with the real recordings. From a recording of a glass bowl, the parameters for glass are estimated (column (a)) and transferred to other virtual glass bowls of different sizes. The synthesized sounds ((b), (c), (d), bottom row) are compared with the real-world audio for these different-sized glass bowls ((b), (c), (d), top row). More examples of transferring the material parameters as well as the residuals are demonstrated in the supplementary video.

Figure 5: Parameter estimation for different materials. For each material, the material parameters are estimated using an example recorded audio (top row). Applying the estimated parameters to a virtual object with the same geometry as the real object used in recording the audio will produce a similar sound (bottom row).

Figure 6: Feature comparison of real and virtual objects, with panels (a) wood plate, (b) plastic plate, (c) metal plate, (d) porcelain plate, and (e) glass bowl; each panel plots f (Hz) against d (1/s). The blue circles represent the reference features extracted from the recordings of the real objects. The red crosses are the features of the virtual objects using the estimated parameters. Because of the Rayleigh damping model, all the features of a virtual object lie on the depicted red curve in the (f, d)-plane.

F Perceptual Study

We also designed an experiment to evaluate the auditory perception of the synthesized sounds of five different materials. Each subject is presented with a series of 24 audio clips: 8 are audio recordings of sounds generated by hitting a real-world object; 16 are synthesized using the techniques described in this paper. For each audio clip, the subject is asked to identify, among a set of 5 choices (wood, plastic, metal, porcelain, and glass), the material from which the sound came.

Table 1 presents the recognition rates of sounds from real-world materials, and Table 2 reflects the recognition rates of sounds from synthesized virtual materials. We found that the successful recognition rate of virtual materials using our synthesized sounds compares favorably to the recognition rate of real materials using recorded sounds.

Table 1: Material Recognition Rate Matrix: Recorded Sounds

    Recorded    |                Recognized Material
    Material    |  Wood (%)  Plastic (%)  Metal (%)  Porcelain (%)  Glass (%)
    ------------+------------------------------------------------------------
    Wood        |    50.7       47.9         0.0          0.0           1.4
    Plastic     |    37.5       37.5         6.3          0.0          18.8
    Metal       |     0.0        0.0        66.1          9.7          24.2
    Porcelain   |     0.0        0.0         1.2         15.1          83.7
    Glass       |     1.7        1.7         1.7         21.6          73.3

Table 2: Material Recognition Rate Matrix: Synthesized Sounds Using Our Method

    Synthesized |                Recognized Material
    Material    |  Wood (%)  Plastic (%)  Metal (%)  Porcelain (%)  Glass (%)
    ------------+------------------------------------------------------------
    Wood        |    52.8       43.5         0.0          0.0           3.7
    Plastic     |    43.0       52.7         0.0          2.2           2.2
    Metal       |     1.8        1.8        69.6         15.2          11.7
    Porcelain   |     0.0        1.1         7.4         29.8          61.7
    Glass       |     3.3        3.3         3.8         40.4          49.2

References

O'BRIEN, J. F., SHEN, C., AND GATCHALIAN, C. M. 2002. Synthesizing sounds from rigid-body simulations. In The ACM SIGGRAPH 2002 Symposium on Computer Animation, ACM Press, 175-181.

ZWICKER, E., AND FASTL, H. 1999. Psychoacoustics: Facts and Models, 2nd updated ed., vol. 254. Springer New York.


Figure 7: Transferred material parameters and residual: from a real-world recording (a), the material parameters are estimated and the residual computed (b). The parameters and residual can then be applied to various objects made of the same material, including (c) a smaller object with a similar shape and (d) an object with a different geometry. The transferred modes and residuals are combined to form the final results (bottom row).

Figure 8: Comparison of transferred results with real-world recordings: from one recording (column (a), top), the optimal parameters and residual are estimated, and a similar sound is reproduced (column (a), bottom). The parameters and residual can then be applied to different objects of the same material ((b), (c), (d), bottom), and the results are comparable to the real-world recordings ((b), (c), (d), top).
