
AUDIO-BASED CLASSIFICATION OF SPEAKER CHARACTERISTICS

Promiti Dutta and Alexander Haubold

Columbia University, New York, NY


{pd2049,ah297}@columbia.edu
ABSTRACT

The human voice is primarily a carrier of speech, but it also contains non-linguistic features unique to a speaker and indicative of various speaker demographics, e.g. gender, nativity, ethnicity. Such characteristics are helpful cues for audio/video search and retrieval. In this paper, we evaluate the effects of various low-, mid-, and high-level features for effective classification of speaker characteristics. Low-level signal-based features include MFCCs, LPCs, and six spectral features; mid-level statistical features model low-level features; and high-level semantic features are based on selected phonemes in addition to mid-level features. Our data set consists of approximately 76.4 hours of annotated audio with 2786 unique speaker segments used for classification. Quantitative evaluation of our method results in accuracy rates up to 98.6% on our test data for male/female classification using mid-level features and a linear kernel support vector machine. We determine that mid- and high-level features are optimal for identification of speaker characteristics.

Index Terms— audio signal processing, feature extraction, MFCC, LPC, classification, gender, ethnicity

Figure 1: Overview diagram of processing steps in our approach. Video Database (segmented speakers) → Speaker Class Annotation (VAST MM) → Silence Filter → Low-, Mid-, High-Level Feature Extraction → WEKA File Formatting → Weka: SMO algorithm (50% train, 50% test) → Recursive SMO with Ranker: pick top-k → Results.

1. INTRODUCTION

Searching through vast amounts of spoken audio collections is an arduous task without the availability of search cues. Audio transcripts generated by Automatic Speech Recognition (ASR) systems provide good content search cues, albeit with imperfect coverage and varying accuracy, especially for salient key terms [1,2]. Search for content can be improved significantly by re-ranking or filtering speech segments by known speaker characteristics.

In this paper we identify and evaluate classifiers for three characteristics: gender, nativity (native vs. non-native English), and ethnicity (African-American, Asian, Caucasian, Hispanic, South-east Asian). Using a large dataset of 2786 manually annotated speech segments from student presentation videos, we evaluate and train various low-, mid-, and high-level feature classifiers on the detection of voice characteristics (Figure 1). Through experimentation, we observe that low-level features are significantly less effective in determining characteristics than mid-level features. For gender classification, we achieve an accuracy of 67.3% using low-level signal-based features. Tzanetakis et al. report similar results (76%) on TRECVid 2003 data; however, their low-level features are computed on 20 ms windows, while we use 10 ms windows [3]. Our results for mid-level statistical features show significant improvement, leading to an overall accuracy of 90.1%-98.6% over varying speech window sizes.

2. DATA SET

Our dataset includes student final project presentation videos from a large university-level engineering design course with more than 150 students per semester. Each presentation team is comprised of 5-6 students who take turns presenting their team's project during a midterm and a final period in the semester. Our video data spans 5 years.

We perform data annotation to establish ground truth using the VAST MM (Video Audio Structure Text Multimedia) system (Figure 2) [4]. The VAST MM browser displays audio and visual cues, which are useful for distinguishing speaker segments. In an indexing step, the VAST MM indexing tool performs several content analysis processes, including automatic speaker segmentation based on Mel Frequency Cepstral Coefficient (MFCC) features and the Bayesian Information Criterion (BIC) [5]. Using the tool, we listen to and view short video clips from each speaker segment to correctly annotate each with appropriate classifications. Each speaker segment is classified according to gender, ethnicity, and familiarity of spoken English.

Table 1: Summary of classification for data set.

Group            Class              # of Segments   Time (hr)
Ethnicity        African-American             101        2.36
                 Asian                        776       20.15
                 Caucasian                   1233       33.34
                 Hispanic                      80        2.00
                 South-east Asian             295        7.74
Gender           Male                        1865       51.86
                 Female                       692       16.23
Spoken English   Native                      2197       58.81
                 Non-native                   327        8.53

Figure 2: VAST MM browser used to annotate speaker segments. Visual cues (key frames and streaming video) and the audio signal are displayed in the user interface for ease of annotation.

Table 1 summarizes the sample sizes of the annotated data set: we have annotated over 76.4 hours of audio with 2786 unique speaker segments. Each audio speaker segment is extracted from the original video for further analysis.

In a preprocessing step, audio speaker segments are filtered for silence. This step is crucial for removing a signal which would otherwise act as a similarity between speaker segments from different classes. Because the original video recordings were made with wired and wireless analog microphones, silent pauses in the audio track are practically low-amplitude noise. Their numerical representation as MFCC features is substantially different from actual speech: the zeroth MFCC feature, commonly referred to as a representation of signal amplitude, deviates most; higher-order MFCCs also reflect a significant difference due to the high frequency inherent to noise. We apply a simple heuristic which computes the absolute maximum amplitude A for a given speaker segment and filters out any short fixed audio sample window (256 samples) that does not pass a threshold measured as an empirically determined fraction of A.
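A minimal NumPy sketch of this silence heuristic follows; the threshold fraction of 0.05 is an illustrative placeholder, since the paper determines this value empirically.

```python
import numpy as np

def filter_silence(signal: np.ndarray, frac: float = 0.05, win: int = 256) -> np.ndarray:
    """Drop any 256-sample window whose peak amplitude falls below an
    empirically determined fraction of the segment's absolute maximum A."""
    A = np.max(np.abs(signal))        # absolute maximum amplitude A of the segment
    threshold = frac * A              # frac is a placeholder for the empirical fraction
    kept = [signal[i:i + win]
            for i in range(0, len(signal) - win + 1, win)
            if np.max(np.abs(signal[i:i + win])) >= threshold]
    return np.concatenate(kept) if kept else np.empty(0, dtype=signal.dtype)
```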
Key characteristics of the audio data include varying audio quality between student presentations. This is largely due to the different microphones that were used over the five years of course recordings. Also affecting audio quality is an individual speaker's use of the microphone, such as placement with respect to the speaker (hand-held vs. on-stand) and the presenter's activity (rigid pose vs. constant shifting). We notice a skew in the distribution between certain annotation classes. Specifically, in the engineering school we observe a 3:1 ratio of male to female students. Similarly, we find fewer speakers in some ethnic classes (African Americans and Hispanics) than in others (Asians, Caucasians, and South-east Asians). To avoid a bias due to unequal sample sizes, we down-sample the data set to comparable class sizes for classification.

3. FEATURE EXTRACTION

We extract low-, mid-, and high-level features from each audio speaker segment for varying time intervals. Low-level features are signal-based; mid-level features are statistical aggregates of low-level features; and high-level features include phonemes in addition to mid-level features.

3.1. Low-level: Signal Level

Low-level features include 13 MFCCs, 13 Linear Predictive Coefficients (LPCs), and 6 distinct spectral features, for a total of 32 distinct features from each 256-sample window (~0.01 sec in a 22 kHz sampled signal). The 13 MFCCs are a representation of the short-term power spectrum of a sound. LPCs analyze the speech signal by estimating formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The six spectral features are energy entropy, short-time energy, zero-crossing rate, spectral roll-off, spectral centroid, and spectral flux [6].
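For illustration, the sketch below computes a comparable per-window feature set in Python with librosa; the paper's own extraction uses Matlab code from [6], so librosa is a stand-in, and energy entropy and spectral flux are omitted for brevity (the row width therefore differs from the paper's 32).

```python
import numpy as np
import librosa

def low_level_features(y: np.ndarray, sr: int = 22050, win: int = 256) -> np.ndarray:
    """One feature row per non-overlapping 256-sample window (~0.01 s at 22 kHz):
    13 MFCCs, 13 LPCs, and a subset of the spectral features.
    Assumes y is a mono float signal with silence already filtered out."""
    n = len(y) // win
    y = y[:n * win]
    # 13 MFCCs per window; center=False aligns frames with our windows
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=win, hop_length=win, center=False)
    frames = y.reshape(n, win)
    # 13 LPCs per window; librosa.lpc returns order+1 coeffs, the first is always 1
    lpc = np.array([librosa.lpc(f, order=13)[1:] for f in frames])
    # spectral features: zero-crossing rate, centroid, roll-off, short-time energy
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=win, hop_length=win, center=False)
    cent = librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=win, hop_length=win, center=False)
    roll = librosa.feature.spectral_rolloff(y=y, sr=sr, n_fft=win, hop_length=win, center=False)
    energy = (frames ** 2).mean(axis=1, keepdims=True)
    return np.hstack([mfcc.T, lpc, zcr.T, cent.T, roll.T, energy])
```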
3.2. Mid-level: Statistical Aggregates from Signal Level

Mid-level features are statistical aggregates of the aforementioned 32 low-level features over longer samples. The low-level features underlie a Gaussian distribution with mean μ and variance σ². We model the aggregate of low-level MFCC and LPC features by their mean and covariance. The covariance matrices for MFCC and LPC are symmetric; we only use the covariance values from the upper triangular matrix and the diagonal, for a total of 91 values each for MFCCs and LPCs. We include 13 MFCC means, 13 LPC means, and respective statistical measures for the 6 spectral features [6]. The complete feature vector for mid-level features contains 214 features.
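A sketch of assembling this 214-dimensional vector (13 + 91 MFCC mean/covariance values, 13 + 91 LPC values, and 6 spectral statistics; the per-feature spectral statistic is assumed here to be the mean):

```python
import numpy as np

def mid_level_vector(mfcc: np.ndarray, lpc: np.ndarray, spectral: np.ndarray) -> np.ndarray:
    """Aggregate low-level features over a longer sample.
    mfcc, lpc: arrays of shape (n_windows, 13); spectral: (n_windows, 6)."""
    iu = np.triu_indices(13)                   # upper triangle incl. diagonal: 91 entries
    mfcc_cov = np.cov(mfcc, rowvar=False)[iu]  # 91 MFCC covariance values
    lpc_cov = np.cov(lpc, rowvar=False)[iu]    # 91 LPC covariance values
    return np.concatenate([
        mfcc.mean(axis=0), mfcc_cov,           # 13 + 91
        lpc.mean(axis=0), lpc_cov,             # 13 + 91
        spectral.mean(axis=0),                 # 6 (one statistic per spectral feature)
    ])                                         # total: 214 features
```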
3.3. High-level: Semantic Level

High-level feature vectors are derived from mid-level features. We include 12 additional features derived from phonemes, for a total of 226 features (91 MFCC covariances, 13 MFCC means, 91 LPC covariances, 13 LPC means, 6 spectral features, and 12 phonemes). We apply phoneme extraction to generate a frequency list of occurring phonemes in the audio signal. We apply our approach [7] to identify a selection of monophthongs, diphthongs, and fricatives (/AA/, /AE/, /AH/, /AO/, /EH/, /ER/, /IH/, /IY/, /S/, /SH/, /UH/, /UW/). This heuristic method models the vocal tract using an autoregressive model of the speech signal, in which the peaks of the frequency response correspond to resonant frequencies of the vocal tract (formants). The closest matching phoneme is determined by the Euclidean distance of a weighted difference between model and computed values, using a table of expected frequency values for formants F1, F2, and F3.
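A minimal sketch of this formant-matching step, assuming an order-13 LPC (autoregressive) model; the formant table entries and the weighting are illustrative placeholders, as the actual values follow the approach in [7].

```python
import numpy as np
import librosa

# Illustrative expected formant frequencies (F1, F2, F3) in Hz; the actual
# table and weighting follow the phoneme model of [7].
PHONEME_FORMANTS = {
    "/AA/": (730, 1090, 2440),
    "/IY/": (270, 2290, 3010),
    "/UW/": (300, 870, 2240),
    # ... entries for the remaining monophthongs, diphthongs, and fricatives
}
WEIGHTS = np.array([1.0, 1.0, 1.0])  # placeholder weighting of F1-F3 differences

def formants(frame: np.ndarray, sr: int = 22050, order: int = 13) -> np.ndarray:
    """Estimate F1-F3 as resonant peaks of the LPC frequency response."""
    a = librosa.lpc(frame, order=order)
    roots = [r for r in np.roots(a) if r.imag > 0]        # one root per conjugate pair
    freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
    return np.array(freqs[:3])                            # lowest three resonances

def closest_phoneme(frame: np.ndarray, sr: int = 22050) -> str:
    """Nearest phoneme by Euclidean distance of the weighted F1-F3 difference."""
    f = formants(frame, sr)
    return min(PHONEME_FORMANTS,
               key=lambda p: np.linalg.norm(WEIGHTS * (f - np.array(PHONEME_FORMANTS[p]))))
```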

Figure 3 (left): Male/female classification accuracy (%) over sample time (sec). The 256-sample window (~0.01 sec) using low-level features is our baseline accuracy (67.3%). Mid-level features (1 sec - 40 sec) exhibit significantly increasing classification accuracy. The same trend is apparent after applying classification with the top 5 and top 10 features selected by recursive feature selection.

Figure 4 (right): Distribution of non-overlapping sample sizes (number of samples per class) for male/female speaker segments at different sampling time durations.

4. CLASSIFICATION AND FEATURE SELECTION

Classification is performed using the Sequential Minimal Optimization (SMO) algorithm in Weka [8]. SMO is a computationally simpler method of solving the support vector machine (SVM) quadratic programming (QP) optimization problem without extra matrix storage and without numerical QP optimization steps. We use a linear kernel for the SMO unless otherwise noted. The output equation for a linear SVM (Equation 1) defines w as the normal vector to the hyperplane, x as the input vector, and u as the separating hyperplane. The linear kernel identifies the optimal separating hyperplane between the distributions by maximizing the margin m (Equation 2) using training examples. Prediction is performed on the test set. To avoid classification biases, cross-validation is obtained for the experiments using a 50% split for training and test sets.

    u = w · x − b        (Equation 1)

    m = 1 / ||w||₂       (Equation 2)
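For reference, a minimal scikit-learn sketch of this protocol; SVC is a stand-in for Weka's SMO (both solve the same SVM QP problem), and the scaler approximates Weka's default input normalization.

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def classify(X, y, seed: int = 0):
    """Linear-kernel SVM with a 50% train / 50% test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed)
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    clf.fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)  # trained model and test-set accuracy
```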
Additionally, we perform feature selection using the "SVMAttributeEval" method in Weka. "SVMAttributeEval" evaluates the weight of an attribute using a linear SVM. The "Ranker" search method then ranks each feature by the square of the weight assigned to it by the SVM. The selected features are used for classification to test the overall prediction of the dataset.
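The same ranking can be sketched with a linear SVM's learned weights; LinearSVC here is a scikit-learn analogue of SVMAttributeEval plus Ranker, not the paper's tooling.

```python
import numpy as np
from sklearn.svm import LinearSVC

def rank_features(X, y, k: int = 10) -> np.ndarray:
    """Rank features by the squared weight a linear SVM assigns to each
    attribute and return the indices of the top-k features."""
    w = LinearSVC(dual=False).fit(X, y).coef_.ravel()  # binary case: one weight vector
    return np.argsort(w ** 2)[::-1][:k]                # largest squared weight first
```

Classification would then be repeated using only the reduced feature set, e.g. X[:, rank_features(X, y, k=5)].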
5. EXPERIMENTS AND RESULTS

5.1. Low-level: Signal Level

We apply low-level features to non-overlapping 256-sample windows (~0.01 sec) for gender classification (male/female). Male samples are down-sampled to adjust for any classification bias due to mismatched sample sizes between the two classes. In total, we have 105,106 male speech samples compared to 94,555 female speech samples. The linear kernel SMO achieves 67.3% classification accuracy (Figure 3), consistent with the accuracy of 76% on 0.02 sec sample windows reported in [3].

5.2. Mid-level: Statistical Aggregates from Signal Level

5.2.1. Varying Sampling Times

We extract mid-level features for several non-overlapping sampling intervals: 1, 2, 3, 5, 10, 20, 30, and 40 seconds (Figure 4). Monologues longer than 40 seconds by a given speaker are rare in our dataset. We down-sample the male samples to create an equal distribution between male and female samples for classification.

We perform classification using all 214 features to determine the efficacy of mid-level features at varying sampling intervals. Classification accuracies range between 90.1% and 98.6% (Figure 3), where accuracy is logarithmically related to sample time. A 10-second time interval provides a reasonable baseline for analysis of high-level features.

Figure 5: Additive histograms of male samples shown in blue, female in red. (top) MFCC covariance 2_7. (bottom) MFCC covariance 3_8.

5.2.2. Feature Selection

Each feature vector contains 214 features for mid-level analysis. The use of excessive features can result in over-fitting. To determine whether the data was over-fitted, we perform feature selection to identify the 5 and 10 most significant features for our classification at each sampling interval (1, 2, 3, 5, 10, 20, 30, and 40 seconds). We determine that using fewer features provides comparable classification accuracy and is less computationally expensive (Figure 3).

For each of the sampling intervals, we obtain a distinct group of top 5 and top 10 significant features. We note that certain features are common to every sampling interval. Specifically, two MFCC covariances rank as the top two features for all classification performed with mid-level features. For each of these two MFCC covariances, the additive histogram contains two very distinct Gaussians for males and females with very different means (Figure 5).

5.3. High-level: Semantic Level

5.3.1. Spoken English Experiment

In the spoken English experiment, we classify native English speakers versus non-native English-accented speakers. The sample size is 2700 male and 2700 female feature segments. We obtain a 73.5% classification accuracy.

It is possible that the classifier is confounded by demographic data. We perform additional experiments in which we create sub-groups for classification, i.e. African American native English speakers versus African American non-native English speakers. Classification accuracy for these smaller groups rises to 80% and greater for each of the 5 smaller subgroups created.

5.3.2. Demographics Experiment

The demographics experiment is a multi-class classification task amongst five groups: African Americans, Asians, Caucasians, Hispanics, and South-east Asians. We sample each group to contain 600 samples of non-overlapping 10-second sampling windows. We obtain 48.5% classification accuracy using an empirically determined 5th-degree polynomial kernel. A linear kernel did not provide effective classification accuracies.

The demographic classification may be confounded by the inclusion of both native and non-native English speakers in the respective groups. We remove this bias by creating groups based on native and non-native speakers and their respective demographic classes. This increases classification accuracy to approximately 64.5% for each class. The classification confusion matrix indicates similarity between the Asian and South-east Asian groups, suggesting that better accuracy may be obtained by combining these two groups. A similar association is observed with the African American and Hispanic groups. We note that these results are significant compared to the probabilistic 20% accuracy achieved by random guessing.
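A sketch of this experiment under the same scikit-learn stand-in as above; the 5th-degree polynomial kernel follows the paper, while the scaler and split parameters are assumptions.

```python
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def demographics_experiment(X, y, seed: int = 0):
    """Five-class classification with a 5th-degree polynomial kernel.
    The confusion matrix shows which demographic groups are conflated."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed)
    clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=5))
    clf.fit(X_tr, y_tr)
    return clf.score(X_te, y_te), confusion_matrix(y_te, clf.predict(X_te))
```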

6. CONCLUSIONS

This paper presents a survey of the different levels of features that can be applied to the classification of speech. We demonstrate that low-level features perform poorly for classification, since the audio sampling is too short and therefore not representative of characteristic traits for the classification classes. We show that mid- and high-level features perform significantly better, because higher-order features more closely correspond to human perception of auditory characteristics. A human can identify characteristics best with an ample amount of information (longer speech segments) rather than short samples of speech. The main disadvantage of these types of features is the requirement of longer audio segments. However, given the domain of presentation and lecture videos, such audio segments are readily available and thus applicable to effective audio search methods for large multimedia collections. We propose further investigation into the high-level feature domain through the exploration of additional phonemes as well as semantic (vocabulary) usage.

7. REFERENCES

[1] B. Matthews, U. Chaudhari, and B. Ramabhadran, "Fast Audio Search Using Space Modeling," ASRU '07, Kyoto, Japan, 2007.
[2] C. González-Ferreras and V. Cardeñoso-Payo, "A System for Speech Driven Information Retrieval," ASRU '07, Kyoto, Japan, 2007.
[3] G. Tzanetakis and M.-Y. Chen, "Building Audio Classifiers for Broadcast News Retrieval," WIAMIS '04, Lisbon, Portugal, 2004.
[4] A. Haubold and J.R. Kender, "VAST MM: Multimedia Browser for Presentation Video," CIVR '07, Amsterdam, The Netherlands, 2007.
[5] A. Haubold and J.R. Kender, "Augmented Segmentation and Visualization for Presentation Videos," MM '05, Singapore, 2005.
[6] T. Giannakopoulos, "Some Basic Audio Features," Matlab File Exchange, March 16, 2008.
[7] A. Haubold and J.R. Kender, "Alignment of Speech to Highly Imperfect Text Transcriptions," ICME '07, 2007.
[8] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann, San Francisco, 2005.

