
Repositório ISCTE-IUL

Deposited in Repositório ISCTE-IUL:


2022-05-25

Deposited version:
Publisher Version

Peer-review status of attached file:


Peer-reviewed

Citation for published item:


Freitas, J., Teixeira, A. & Dias, J. (2014). Multimodal corpora for silent speech interaction. In Nicoletta
Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion
Moreno, Jan Odijk, Stelios Piperidis (Ed.), Proceedings of the Ninth International Conference on
Language Resources and Evaluation (LREC 2014). (pp. 4507-4511). Reykjavik: European Language
Resources Association (ELRA).

Further information on publisher's website:


--

Publisher's copyright statement:


This is the peer reviewed version of the following article: Freitas, J., Teixeira, A. & Dias, J. (2014).
Multimodal corpora for silent speech interaction. In Nicoletta Calzolari, Khalid Choukri, Thierry
Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios
Piperidis (Ed.), Proceedings of the Ninth International Conference on Language Resources and
Evaluation (LREC 2014). (pp. 4507-4511). Reykjavik: European Language Resources Association
(ELRA). This article may be used for non-commercial purposes in accordance with the Publisher's
Terms and Conditions for self-archiving.

Use policy

Creative Commons CC BY 4.0


The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or
charge, for personal research or study, educational, or not-for-profit purposes provided that:
• a full bibliographic reference is made to the original source
• a link is made to the metadata record in the Repository
• the full-text is not changed in any way
The full-text must not be sold in any format or medium without the formal permission of the copyright holders.

Serviços de Informação e Documentação, Instituto Universitário de Lisboa (ISCTE-IUL)


Av. das Forças Armadas, Edifício II, 1649-026 Lisboa Portugal
Phone: +(351) 217 903 024 | e-mail: [email protected]
https://repositorio.iscte-iul.pt
Multimodal Corpora for Silent Speech Interaction
João Freitas1,2, António Teixeira2, Miguel Sales Dias1,3
1 Microsoft Language Development Center, Lisboa, Portugal
2 Dep. Electronics Telecommunications & Informatics/IEETA, University of Aveiro, Portugal
3 ISCTE-Lisbon University Institute, Lisboa, Portugal
E-mail: [email protected], [email protected], [email protected]

Abstract
A Silent Speech Interface (SSI) allows speech communication to take place in the absence of an acoustic signal. This type of
interface is an alternative to conventional Automatic Speech Recognition, which is not adequate for users with certain speech
impairments or in the presence of environmental noise. The work presented here creates the conditions to explore and analyze
complex combinations of input modalities applicable in SSI research. Focusing on non-invasive and promising modalities, we have
selected the following sensing technologies used in human-computer interaction: Video and Depth input, Ultrasonic Doppler sensing
and Surface Electromyography. This paper describes a novel data collection methodology in which these independent streams of
information are synchronously acquired with the aim of supporting research and development of a multimodal SSI. The reported
recordings were divided into two rounds: a first one in which the prompts were silently uttered and a second round in which the
speakers pronounced the scripted prompts in an audible and normal tone. In the first round of recordings, a total of 53.94 minutes
were captured, of which 30.25% were estimated to be silent speech. In the second round of recordings, a total of 30.45 minutes were
obtained, of which 30.05% were audible speech.

Keywords: Silent Speech, Multimodal HCI, Data Collection

1. Introduction

Silent Speech designates the process of speech communication in the absence of an audible and intelligible acoustic signal (Denby et al., 2009). By extracting information from the human speech production process, an SSI is able to interpret and process the acquired data. Several SSIs based on different sensory types of data have been proposed in the literature (e.g. Electro-encephalographic sensors (Porbadnigk et al., 2009), Electromagnetic Articulography sensors (Fagan et al., 2008), etc.). Nonetheless, acquiring data from a single input modality limits the amount of useful information available for capture and further processing. Furthermore, in order to develop a multimodal SSI, it is necessary to collect data from multiple input modalities in a synchronous way, since no multimodal SSI data is currently available for research. However, satisfying the requirements and gathering all the necessary equipment for collecting such corpora is a complex and cumbersome task (Hueber et al., 2007). Hence, making multimodal corpora available to the community would not only increase the number of data resources accessible for further research, but would also pave the way for the development of a multimodal SSI, which could provide a more complete representation of the behavior of the speech production model during speech.

The work presented in this paper creates the conditions to explore and analyze more complex combinations of input modalities for SSI research. In exploring non-invasive and state-of-the-art modalities such as Ultrasonic Doppler (Srinivasan et al., 2010), we have selected several sensing technologies based on the following criteria: the possibility of being used in a natural manner, without complex medical procedures from the ethical and clinical perspectives; low cost; tolerance to noisy environments; and the ability to work with speech-handicapped users or elderly people, for whom speaking requires a substantial effort. Based on these requirements, we collected data from four SSI modalities with the following specifications: (1) video input, which captures the RGB color of each image pixel of the speakers' mouth region and its surroundings, including chin and cheeks; (2) depth input, which captures depth information for each pixel of the same areas (resulting, in this case, in a 3D point cloud in the sensor reference frame, represented by a grayscale image), providing useful information about mouth opening and, in some cases, tongue position; (3) surface EMG (sEMG) sensory data, which provides information about the myoelectric signal produced by the targeted facial muscles during speech movements; (4) Ultrasonic Doppler Sensing (UDS), a technique based on the emission of a pure tone in the ultrasound range towards the speaker's face, which is received by an ultrasound sensor tuned to the transmitted frequency. The reflected signal then contains Doppler frequency shifts that correlate with the movements of the speaker's face (Srinivasan et al., 2010).

Several studies combining two input modalities in addition to audio can be found in the literature (e.g. Denby and Stone (2004) and Tran et al. (2008)). Nonetheless, to the best of our knowledge, this is the first silent speech corpus that combines more than two input data types, and the first to synchronously combine the corresponding four modalities, thus providing the necessary information for future studies and research on multimodal SSIs.

2. Data Collection Setup

After assembling all the necessary data collection equipment, which, in the case of ultrasound, led us to develop custom-built equipment based on the work of Zhu (2008), we needed to create the necessary conditions to record all signals with adequate synchronization. The challenge of synchronizing all signals resided in the fact that a potential synchronization event would need to be captured simultaneously by all (four) input modalities. To that purpose, we selected the sEMG recording device, which had an available output channel, as the source that generates the alignment pulse for all the remaining modalities. After the data collection system setup was ready, the database described in this paper was collected for further analysis.

2.1 The individual data input modalities

The devices employed in this data collection, depicted in Figure 1, were: (1) a Microsoft Kinect for Windows, which acquires visual and depth information; (2) an sEMG sensor acquisition system from Plux (2014), which captures the myoelectric signal from the facial muscles; (3) a custom-built dedicated circuit board (referred to as the UDS device), which includes: 2 ultrasound transducers (400ST and 400SR, working at 40 kHz), a crystal oscillator at 7.2 MHz, frequency dividers to obtain 40 kHz and 36 kHz, and all the amplifiers and linear filters needed to process the echo signal (Freitas et al., 2012).

Figure 1: Acquisition devices and laptop with the data collection application running.

The Kinect sensor was placed at approximately 70cm from the speaker. It was configured, using Kinect SDK 1.5, to capture a color video stream with a resolution of 640x480 pixels, 24-bit RGB, at 30 frames per second, and a depth stream with a resolution of 640x480 pixels, 11 bits to code the Z dimension, at 30 frames per second. Kinect was also configured to use the Near Depth range (i.e. a range between 40cm and 300cm) and to track a seated skeleton.

The sEMG acquisition system consisted of 5 pairs of EMG surface electrodes connected to a device that communicates with a computer via Bluetooth. As depicted in Figure 2, the sensors were attached to the skin using single-use 2.50cm diameter clear plastic self-adhesive surfaces, with an approximate 2.00cm spacing between the electrode centers for bipolar configurations. Before placing the surface EMG sensors, the sensor locations were cleaned with alcohol. While uttering the prompts, no movement other than the one associated with speech production was made. The five electrode pairs were placed in order to capture the myoelectric signal from the following muscles: the zygomaticus major (channel 2); the tongue (channels 1 and 5); the anterior belly of the digastric (channel 1); the platysma (channel 4); and the last electrode pair was placed below the ear, between the mastoid process and the mandible. The sEMG channels 1 and 4 used a monopolar configuration (i.e. one of the electrodes of the respective pair was placed in a location with low or negligible muscle activity), with the reference electrodes placed on the mastoid portion of the temporal bone. The positioning of EMG electrodes 1, 2, 4 and 5 was based on previous work (e.g. Schultz and Wand, 2010), and the sEMG electrode from channel 3 was placed according to recent findings by the authors on the detection of nasality in SSIs (Freitas et al., 2014), a distinct characteristic of European Portuguese (EP) (Strevens, 1954).

Figure 2: Surface EMG electrodes positioning and the respective channels (1 to 5), plus the reference electrode (R).

The Ultrasonic Doppler sensing device was placed at approximately 40cm from the speaker and was connected to an external sound board (a Roland UA-25 EX in the first setup and a TASCAM US-1641 in the second setup), which in turn was connected to the laptop through a USB connection. Two recording channels of the external sound board were connected to the I/O channel of the sEMG recording device and to the UDS device. The Doppler echo and the synchronization signals were sampled at 44.1 kHz and, to facilitate signal processing, a frequency translation was applied to the carrier by modulating the echo signal with a sine wave and low-passing the result, obtaining a similar frequency-modulated signal centered at 4 kHz.
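The frequency translation just described can be illustrated with the following numpy sketch, which heterodynes a synthetic 40 kHz echo down to a 4 kHz centre frequency by mixing it with a 36 kHz sine and low-pass filtering. The paper does not specify whether this step is performed in the analog front end or in software, so the sample rate, filter order and modulation values below are illustrative assumptions only.

import numpy as np
from scipy.signal import butter, filtfilt

fs = 192_000                                   # illustration-only rate, high enough to represent 40 kHz
t = np.arange(0, 0.5, 1.0 / fs)

# Synthetic stand-in for the received echo: a 40 kHz carrier with a small
# Doppler-like frequency modulation (the real echo comes from the UDS device).
doppler_hz = 50 * np.sin(2 * np.pi * 3 * t)    # assumed +/-50 Hz excursion at a 3 Hz movement rate
echo = np.cos(2 * np.pi * 40_000 * t + 2 * np.pi * np.cumsum(doppler_hz) / fs)

# Frequency translation: mixing with a 36 kHz sine creates components at the
# difference (40 - 36 = 4 kHz) and sum (76 kHz) frequencies; the low-pass
# filter keeps only the difference band, i.e. the same FM signal centred at 4 kHz.
mixed = echo * np.sin(2 * np.pi * 36_000 * t)
b, a = butter(6, 8_000 / (fs / 2))             # cutoff well below the 76 kHz sum component
translated = filtfilt(b, a, mixed)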

2.2 Registration of all input modalities

In order to register all the mentioned input modalities via time alignment between the corresponding input streams, we used an I/O bit flag in the sEMG recording device, which has one input switch for debugging purposes and two output connections, as depicted in Figure 4. Synchronization occurs when the output of a synch signal, programmed to be automatically emitted by the sEMG device at the beginning of each prompt, is used to drive an LED and to feed an additional channel of an external sound card. Registration between the video and depth streams is ensured by the Kinect SDK.

Using the information from the LED and the auxiliary audio channel carrying the synch info, the signals were time-aligned offline. To align the RGB video and depth streams with the remaining modalities, we used an image template matching technique that automatically detects the LED position in each color frame.

For the UDS acquisition system, the activation of the output I/O flag of the sEMG recording device generates a small voltage peak on the signal of the first channel. To enhance and detect that peak, a second-degree derivative is applied to the signal, followed by an amplitude threshold. To be able to detect this peak, we previously configured the external sound board channel with maximum input sensitivity.

The time alignment of the EMG signals is ensured by the sEMG recording device, since the I/O flag is recorded synchronously with the samples of each channel.

Figure 4: Diagram of the time alignment scheme showing the I/O channel connected to the three outputs: debug switch, external sound card and a directional LED.
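A minimal sketch of the peak-detection step described above, assuming the synchronization channel has already been read from the recorded prompt wave; the threshold value and the channel layout used in the usage comment are assumptions, not values reported in the paper.

import numpy as np

def find_sync_pulse(sync_channel, threshold):
    """Locate the synchronization transient produced by the sEMG I/O flag on the
    sound-card channel: enhance it with a second-order difference (a discrete
    second-degree derivative) and apply an amplitude threshold."""
    second_diff = np.diff(sync_channel, n=2)
    candidates = np.flatnonzero(np.abs(second_diff) > threshold)
    return int(candidates[0]) if candidates.size else None

# Hypothetical usage: 'wav' holds the two-channel prompt recording sampled at
# 44.1 kHz, with the synchronization signal on the second channel.
# offset_samples = find_sync_pulse(wav[:, 1], threshold=0.1)
# The UDS stream can then be shifted by offset_samples so that time zero matches
# the sEMG I/O flag, as in the offline alignment described above.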
2.3 Acquisition Methodologies

The recordings took place in a quiet room with controlled illumination, with an assistant responsible for monitoring the data acquisition and for pushing a record/stop button in the recording tool interface in order to avoid unwanted muscle activity.

The data acquisition is divided into two distinct rounds, hereon referred to as the first and second rounds of recordings. The main difference between them is the acquisition of an audible acoustic signal (second round) versus silently articulating the words (first round).

The first round of our database contains the recordings of 9 sessions of 8 native EP speakers (one speaker recorded two sessions), 2 female and 6 male, with no history of hearing or speech disorders, an age range from 25 to 35 years old and an average age of 30 years. Due to hardware limitations and the differences found between silently articulated speech and audibly uttered speech, related with the lack of acoustic feedback (Wand and Schultz, 2011), in this first round we chose to record only silent speech. Thus, no audible acoustic signal was produced by the speakers during the recordings, and only one speaker had past experience with silent articulation.

In the second round, the previous sound card was replaced by a TASCAM US-1641, as depicted in Figure 3, and, for comparison purposes and to take advantage of the extra input channels provided by this device, we decided to collect a second round of recordings in which the audio channel from the UDS device is also synchronously acquired. As such, in this round we collected 3 speakers: one from the previous data collection and two elderly speakers with no known history of speech disorders, also native EP speakers. The first speaker was a 31-year-old male and the two elderly speakers were female, 65 and 71 years old, respectively. In this second stage of data collection, each speaker recorded two sessions without removing the EMG electrodes or changing the recording position.

Figure 3: TASCAM US-1641 device used in the second round of recordings.

Before each recording session, the participants received a 30-minute briefing that included instructions, speaker preparation and the voluntary signing of a consent form which accurately described the experiment, its duration and what kind of data was going to be collected. Each recording session took between 40 and 60 minutes, generating an average of 3.81GB of data per speaker, which includes: session metadata, such as device configuration; RGB and depth information of a 128x128 pixel square centered at the mouth center, and the coordinates of 100 facial points in the sensor reference frame, for each Kinect image; sEMG data from the 5 available channels; a two-channel wave file per prompt containing the UDS and synchronization signals; and a compressed video of the whole session. In the second round of recordings, we recorded a three-channel wave file containing the audio from the UDS device microphone, the ultrasonic signal and the synchronization signal.
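The per-session contents listed above can be pictured with the purely hypothetical manifest below; the actual file names and on-disk layout of the corpus are not specified in the paper.

# Hypothetical manifest of one recorded session (names are illustrative only).
session_contents = {
    "metadata": "session.xml",                  # device configuration and session information
    "kinect": {
        "mouth_roi": "prompt_###_kinect.bin",   # 128x128 RGB + depth square centred on the mouth
        "face_points": "prompt_###_points.csv", # 100 facial point coordinates, sensor reference frame
    },
    "semg": "prompt_###_emg.dat",               # 5 sEMG channels plus the recorded I/O flag
    "uds": "prompt_###_uds.wav",                # 2 channels in round 1, 3 channels in round 2
    "session_video": "session_overview.avi",    # compressed video of the whole session
}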
2.4 Corpora

For this data collection we selected a vocabulary of 32 EP words, which can be divided into 3 distinct sets. The first set, used in previous work for other languages (e.g. Srinivasan et al., 2010) and for EP in prior work of the authors (Freitas et al., 2012), consists of the 10 digits from zero to nine. The second set contains 4 minimal pairs of common EP words that only differ in the nasality of one of the phones (e.g. Cato/Canto [katu]/[kɐ̃tu] or Peta/Penta [petɐ]/[pẽtɐ]; see Freitas et al. (2011) for more details), and is directly related with previous investigation by the authors on the detection of nasality with SSIs. Table 1 shows the last (third) set, with 14 common EP words taken from context-free grammars of an Ambient Assisted Living (AAL) application that supports speech input, chosen based on past experience of the authors (Teixeira et al., 2012). A total of 99 scripted prompts per session were presented to the speaker (three additional silence prompts were also included at the beginning, middle and end of the session), in random order, with each prompt being pronounced individually in order to allow isolated word recognition. All prompts were repeated 3 times per recording session.

Ambient Assisted Living Word Set
Videos (Videos)       Ligar (Call/Dial)       Contatos (Contacts)         Mensagens (Messages)   Voltar (Back)
Pesquisar (Search)    Anterior (Previous)     Fotografias (Photographs)   Família (Family)       Ajuda (Help)
Seguinte (Next)       Lembretes (Reminders)   Calendário (Calendar)       E-Mail (E-Mail)        -

Table 1: Set of words of the EP vocabulary, extracted from AAL contexts.
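The prompt scripting described in this section can be sketched as follows; the exact randomization procedure of the recording tool is not described in the paper, so this is only an illustrative reading of the text (3 repetitions of the 32 words in random order, with silence prompts at the beginning, middle and end of the session).

import random

def build_session_script(vocabulary, repetitions=3, seed=None):
    """Assemble one session script: every word repeated `repetitions` times,
    shuffled, with silence prompts inserted at the beginning, middle and end."""
    rng = random.Random(seed)
    prompts = [word for word in vocabulary for _ in range(repetitions)]
    rng.shuffle(prompts)
    prompts.insert(0, "<silence>")
    prompts.insert(len(prompts) // 2, "<silence>")
    prompts.append("<silence>")
    return prompts

# Usage with the 32-word EP vocabulary of this section (digits, nasal minimal
# pairs and the AAL words of Table 1):
# script = build_session_script(ep_vocabulary)   # 96 word prompts + 3 silence prompts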
3. Characterization of the Acquired Database

In this section we present some statistics of the acquired data. In the first round of recordings no audio was collected, thus an automatic algorithm was used to estimate speech statistics. For the second round of recordings, audible utterances were recorded and the audio was used as auxiliary information for manually annotating the data.

3.1 First Round of Recordings

The data collected in the first round of recordings has a total elapsed duration of 56.11 minutes, with an average duration of 5.99 minutes per session and 3.74 seconds per utterance, not considering silence utterances. By applying a Voice Activity Detection (VAD) technique based on UDS alone, we estimate that 30.25% is silent speech (i.e. continuous facial movements) and that 69.75% is the silence before and after each utterance. The VAD algorithm uses the energy of the UDS pre-processed spectrum information around the carrier and a mean reference value, extracted from the silence prompts of each speaker, to distinguish silent articulation. Each session presents an average speech duration of 1.81 minutes and 4.18 minutes of non-speech. The female speakers had an average speech duration of 42.79% per session, while this figure for male speakers was only 23.29%. Table 2 details the audio duration of the collected data by word set.

Word Set        Total Recorded Duration (minutes)   Silent Speech   Non-Speech
Digits          15.28                               26.78%          73.22%
Nasal Pairs     13.02                               28.90%          71.10%
AAL             25.63                               33.00%          67.00%
All word sets   53.94                               30.25%          69.75%

Table 2: Audio duration, speech time and non-speech time distribution by word set (excluding silence utterances) for the first round of recordings.
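One plausible reading of the UDS-based VAD just described is sketched below: the energy in a band around the translated 4 kHz carrier is compared against a reference level taken from the speaker's silence prompts. The band width, STFT parameters and decision margin are assumptions for illustration; the paper does not report the values actually used.

import numpy as np
from scipy.signal import stft

def uds_vad(signal, silence_reference, fs=44_100, carrier=4_000, band=500, margin=1.5):
    """Frame-level silent-articulation decision from the energy around the carrier.
    `silence_reference` is the mean band energy measured on the speaker's silence
    prompts; frames whose band energy exceeds it by `margin` are marked as speech."""
    f, t, Z = stft(signal, fs=fs, nperseg=1024)
    in_band = (f >= carrier - band) & (f <= carrier + band)
    band_energy = (np.abs(Z[in_band, :]) ** 2).sum(axis=0)
    return t, band_energy > margin * silence_reference

# A refinement (also not detailed in the paper) would be to exclude the carrier
# bin itself, so that only the Doppler sidebands produced by facial movement
# contribute to the decision.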
3.2 Second Round of Recordings

In the second round of recordings, since synchronously acquired audio was available, the estimation of the speech and non-speech characteristics was performed based on the manual annotation of the speech signal by the first author. As described in Table 3, this second round has a total elapsed duration of 30.45 minutes, with an average duration of 5.07 minutes per session and 3.17 seconds per utterance.

Word Set        Total Recorded Duration (minutes)   Speech    Non-Speech
Digits          8.78                                28.13%    71.87%
Nasal Pairs     7.48                                26.87%    73.13%
AAL             14.20                               32.91%    67.09%
All Word Sets   30.45                               30.05%    69.95%

Table 3: Audio duration, speech time and non-speech time distribution by word set (excluding silence utterances) for the second round of recordings.

Table 4 presents the session statistics for the first and second rounds. Based on these values, a larger duration can be noticed for the sessions where only silent speech was considered. This suggests a slower articulation when no acoustic feedback is present; however, it might also be related to, or influenced by, the lack of experience of most speakers in articulating the words without any acoustic feedback.

Data Collection Stage   Average Duration per session (minutes)   Average Speech per session (minutes)   Average Non-Speech per session (minutes)
1st round               5.99                                     1.81                                   4.18
2nd round               5.07                                     1.52                                   3.55

Table 4: Average duration, speech time and non-speech time per session for the first and second rounds of recordings.

If, instead of estimating the characteristics of the first round with the automatic algorithm, we use the speech/non-speech distribution estimated in the second round and apply it to the average duration per session of the first round, we obtain 1.80 minutes of speech and 4.19 minutes of non-speech data, a result similar to the one obtained with the UDS algorithm.
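The figures quoted above follow directly from applying the second-round speech proportion to the first-round average session duration:

speech_ratio = 0.3005       # speech proportion in the second round (Table 3, all word sets)
session_minutes = 5.99      # average first-round session duration (Table 4)
speech_minutes = speech_ratio * session_minutes             # ~1.80 minutes
non_speech_minutes = (1 - speech_ratio) * session_minutes   # ~4.19 minutes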

4. Conclusion

This paper describes a multimodal data collection with 5 streams of data: Video, Depth, Surface EMG, Ultrasonic Doppler Sensing and audio. By using the surface EMG recording device we were able to synchronously combine these silent speech modalities and acquire information from multiple stages of the human speech production model. The data collection is divided into two rounds of recordings: in the first round only silent speech was recorded (i.e. no acoustic signal was produced by the speaker); in the second round, audible speech was captured in addition to the remaining modalities. We have also used an algorithm based on UDS energy to estimate total speech time in the absence of the acoustic signal, and report some statistics on how the data is distributed.

5. Future Work

The collected data opens several doors in terms of future research. It will potentially allow for the development of a multimodal SSI based on these modalities, where the strongest points of one modality can help to compensate for the weakest points of the other(s). It will also allow looking at other types of information, beyond the acoustic signal, for interesting research issues such as elderly speech characteristics and the production and recognition of nasal sounds.

6. Acknowledgements

This work was partially funded by Marie Curie Actions Golem (ref. 251415, FP7-PEOPLE-2009-IAPP) and IRIS (ref. 610986, FP7-PEOPLE-2013-IAPP), by FEDER through the Program COMPETE under the scope of QREN 5329 FalaGlobal, and by National Funds (FCT - Foundation for Science and Technology) in the context of IEETA Research Unit funding FCOMP-01-0124-FEDER-022682 (FCT-PEst-C/EEI/UI0127/2011). The authors would also like to thank the experiment participants.

7. References

Denby, B., Schultz, T., Honda, K., Hueber, T., Gilbert, J.M. and Brumberg, J.S. (2009). Silent speech interfaces. Speech Communication, 52(4), pp. 270--287.
Denby, B. and Stone, M. (2004). Speech synthesis from real time ultrasound images of the tongue. Internat. Conf. on Acoustics, Speech, and Signal Processing, Montreal, Canada, 1, pp. I685--I688.
Fagan, M.J., Ell, S.R., Gilbert, J.M., Sarrazin, E. and Chapman, P.M. (2008). Development of a (silent) speech recognition system for patients following laryngectomy. Med. Eng. Phys., 30(4), pp. 419--425.
Freitas, J., Teixeira, A., Dias, M.S. and Bastos, C. (2011). Towards a Multimodal Silent Speech Interface for European Portuguese. Speech Technologies, InTech.
Freitas, J., Teixeira, A., Vaz, F. and Dias, M.S. (2012). Automatic Speech Recognition based on Ultrasonic Doppler Sensing for European Portuguese. Advances in Speech and Language Technologies for Iberian Languages, vol. CCIS 328, Springer.
Freitas, J., Teixeira, A., Silva, S., Oliveira, C. and Dias, M.S. (2014). Velum Movement Detection based on Surface Electromyography for Speech Interface. Proceedings of Biosignals 2014, Angers, France.
Hueber, T., Chollet, G., Denby, B., Stone, M. and Zouari, L. (2007). Ouisper: Corpus Based Synthesis Driven by Articulatory Data. International Congress of Phonetic Sciences, Saarbrücken, pp. 2193--2196.
Plux Wireless Biosignals, Portugal (2014). Online: http://www.plux.info/, accessed on 17 March 2014.
Porbadnigk, A., Wester, M., Calliess, J. and Schultz, T. (2009). EEG-based speech recognition: impact of temporal effects. Biosignals 2009, Porto, Portugal, pp. 376--381.
Schultz, T. and Wand, M. (2010). Modeling coarticulation in large vocabulary EMG-based speech recognition. Speech Communication, 52(4), pp. 341--353.
Srinivasan, S., Raj, B. and Ezzat, T. (2010). Ultrasonic sensing for robust speech recognition. Internat. Conf. on Acoustics, Speech, and Signal Processing, pp. 5102--5105.
Strevens, P. (1954). Some observations on the phonetics and pronunciation of modern Portuguese. Rev. Laboratório Fonética Experimental, Coimbra II, pp. 5--29.
Tran, V.A., Bailly, G., Loevenbruck, H. and Jutten, C. (2008). Improvement to a NAM captured whisper-to-speech system. Proceedings of Interspeech 2008, pp. 1465--1468.
Wand, M. and Schultz, T. (2011). Investigations on Speaking Mode Discrepancies in EMG-based Speech Recognition. Proceedings of Interspeech 2011, Florence, Italy.
Zhu, B. (2008). Multimodal speech recognition with ultrasonic sensors. Master's thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts.

