Speech Application Language Tags
ABSTRACT:
Speech applications that enable users to speak and listen to a computer will greatly enhance
the ability of users to access computers at any time from nearly any place. SALT may be
used to develop telephony applications (speech input and output only) and multimodal
applications (speech input and output, as well as keyboard and mouse input and display
output). SALT and the host programming language provide control structures not available in
VoiceXML.
Every day, people ask questions. They give instructions. Speaking and listening are necessary
for learning and training, for selling and buying, for persuading and agreeing, and for most
social interactions. For the majority of people, speaking and understanding spoken language is
simply the most convenient and natural way of interacting with other people.
Emerging technology enables users to speak and listen to the computer now. Speech
recognition converts spoken words and phrases into text, and speech synthesis converts text
into human-like spoken words and phrases. While speech recognition and synthesis have long
been in the research stage, three recent advances have enabled speech recognition and
synthesis technologies to be used in real products and services: (1) faster, more powerful
computer technology, (2) improved algorithms using speech data captured from the real
world, and (3) improved strategies for using speech recognition and speech synthesis in
applications.
Speech applications enable users to speak and listen to a computer despite physical
impairments such as blindness or poor physical dexterity. Speaking enables impaired callers
to access computers. Callers with poor physical dexterity (who cannot type) can use speech to
enter requests to the computer. The sight-impaired can listen to the computer as it speaks.
When visual and/or mechanical interfaces are not an option, callers can perform transactions
by saying what they want done and supplying the appropriate information. If a person with
impairments can speak and listen, that person can use a computer to bypass the limitations of
small keyboards and screens. As devices become smaller, our fingers do not. Keys on the
keypad shrink, often to the point where people with thick fingers press two or more keys with
one finger stroke. The small screens on some cell phones may be difficult to see, especially
in extreme lighting conditions. Even PDAs with QWERTY keyboards are awkward.
(QWERTY is a sequence of six keys found on traditional keyboards used by most English and
Western-European language speakers.) Users hold the device with one hand and “hunt and
peck” with the forefinger of the other hand. It is impossible to use both hands to touch-type
and hold the device at the same time. By speaking, callers can bypass the keypad (except
possibly for entering private data in crowded or noisy environments). By speaking and
listening, callers can bypass the small screen of many handheld electronic devices.
Speech also enables users to converse with household appliances. For example, stoves,
refrigerators, and heating and air conditioning thermostats have no
keyboards. These appliances may have a small control panel with a couple of buttons and a
dial. The physical controls are good for turning the appliance on and off and adjusting its
temperature and time. Without speech, a user cannot specify complex instructions such as,
“turn the temperature in the oven to 350 degrees for 30 minutes, then change the temperature
to 250 degrees for 15 minutes, and finally leave the oven on warm.” Without speech, the
appliance cannot ask questions such as, “When on Saturday morning do you turn the heat
on?” Any sophisticated dialog with these appliances will require speech input and output.
Speech is especially useful in situations where the caller’s eyes and/or hands are busy. Drivers need to
keep their eyes on the road and their hands on the steering wheel. If they must use a computer
when driving, the interface should be speech only. Operators of machines that require their
hands on the controls and their eyes focused on the machine’s activities can also use speech
to communicate with a computer. (It is not, however, recommended that you hold and use a
cell phone while driving a car.) Mothers and caregivers with children in their arms may also
appreciate speaking and listening to a doctor’s Web page or medical service. If a person can
speak and listen to others while they work, they can speak and listen to a computer while they
work.
Many services staffed by human operators are available only during working hours.
Computers can automate much of this activity, such as
accepting messages, providing information, and answering callers’ questions. Callers can
access these automated services 24 hours a day, 7 days a week via a telephone by speaking
and listening to a computer. If a person can speak and listen, they can interact with a
computer anytime.
Callers become
frustrated when they hear “your call is very important to us” because this message means
they must wait. “Thanks for waiting, all of our operators are busy” means more waiting.
When using speech to interact with an application, there are no hold times. The computer
responds quickly. (However, computers can become saturated, which results in delays, but
these delays occur less frequently than waits for a human operator.) Because many callers
can be serviced by voice-enabled applications, human operators are freed to resolve more
difficult problems.
Some languages do not lend themselves to data entry using the traditional QWERTY
keyboard. Rather than force Asian language users to mentally translate their words and
phrases to phonetic sounds and then press the corresponding keys on the QWERTY
keyboard, a much better solution is to speak and listen. Speech and handwriting recognition
will be the key to enabling Asian language speakers to gain full use of computers. If a person
can speak and listen to an Asian language, they can interact with a computer using that
language.
TO CONVEY EMOTION:
Users of e-mail and text messaging frequently use emoticons (keyboard symbols that convey
emotions) to enhance their text messages. Example emoticons include :) for happy or a joke
and :( for sad. With speech,
these emotions can be conveyed naturally by changing the inflection, speed, and volume of
the spoken words.
Increasingly, users and computers exchange information by transferring it in the most
appropriate mode: speech for simple requests and simple answers, and GUIs for complex
requests and detailed answers.
This new environment led to the creation of VoiceXML, an XML-based declarative language
for describing the exchange of spoken information between users and computers, and related
languages. The related languages include the Speech Recognition Grammar Specification
(SRGS) for describing what words and phrases the computer should listen for and the Speech
Synthesis Markup Language (SSML) for describing how text should be rendered as verbal
speech. VoiceXML is widely used to develop voice-only user interfaces for telephones and
cell phones.
VoiceXML uses predefined control structures, enabling developers to specify what should be
spoken and heard, but not the low-level details of how those operations occur. As is the case
with many special-purpose declarative languages, developers sometimes prefer to write their
own procedural instructions. Speech Application Language Tags (SALT) was developed to
enable Web developers to use traditional Web development languages to specify the control
flow of a speech application and to use a small number of XML elements for managing
speech. In addition to telephony applications, SALT can also be used for multimodal
applications where people use speech together with a keyboard, mouse, pen, and display.
The SALT Forum, founded by Cisco, Comverse, Intel, Microsoft, Philips, and SpeechWorks
(now ScanSoft), published the initial specification in June 2002. This specification was
contributed to the World Wide Web
Consortium (W3C) in August of that year. Later, in June 2003, the SALT Forum contributed
an updated version of the specification.
The SALT specification contains a small number of XML elements enabling speech output to
the user, called prompts, and speech input from the user, called responses. SALT elements
include:
• <prompt>—specifies output to be spoken to the user; it contains a prompt queue and
commands for managing the presentation of prompts on the queue to the user.
• <listen>—recognizes spoken words and phrases. There are three listen modes: automatic,
single, and multiple. In automatic mode, the platform rather than the application controls
when to stop the recognition facility.
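As a rough illustration of these elements (the namespace prefix, grammar file name, element
ids, and result path are assumptions, not taken from the specification), a prompt/listen pair
embedded in an HTML page might look like this:

    <html xmlns:salt="http://www.saltforum.org/2002/SALT">
      <body>
        <!-- Text box that will receive the recognized value -->
        <input id="txtCity" type="text" />
        <!-- Question spoken to the user -->
        <salt:prompt id="askCity">
          Which city would you like the weather for?
        </salt:prompt>
        <!-- Recognize the answer; the grammar lists the cities to listen for -->
        <salt:listen id="recoCity" mode="automatic">
          <salt:grammar src="cities.grxml" />
          <!-- Copy the recognized city into the text box -->
          <salt:bind targetelement="txtCity" value="//city" />
        </salt:listen>
      </body>
    </html>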
SALT designers partitioned the SALT functionality into multiple profiles that can be implemented
and used independently of the remaining SALT modules. Various devices may use different
combinations of profiles. Devices with limited processor power or memory need not support
all features (for example, mobile devices do not need to support dictation). Devices may be
tailored to particular environments (for example, telephony support may not be necessary for
television set-top boxes). While full application portability is possible within devices using
the same profile, there is limited portability across devices with different profiles.
SALT has no control elements, such as <for> or <goto>, so developers embed SALT
elements into other languages, called host languages. For example, SALT elements may be
embedded into languages such as XHTML, SVG, and JavaScript. Developers use the host
language to specify application functions and execution control while the SALT elements
provide advanced input and output using speech recognition and speech synthesis.
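A minimal sketch of this division of labor, reusing the illustrative ids and grammar assumed
above: the host page (HTML plus JavaScript) decides when to speak and when to listen,
while the SALT elements perform the speech work.

    <salt:prompt id="welcome" oncomplete="recoCity.Start()">
      Welcome to the weather service.
    </salt:prompt>
    <salt:listen id="recoCity" onreco="showResult()">
      <salt:grammar src="cities.grxml" />
    </salt:listen>
    <script type="text/javascript">
      // Host-language code controls the flow: speak first, then listen.
      window.onload = function () { welcome.Start(); };
      function showResult() {
        // Use the recognition result here (for example, update the page).
      }
    </script>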
Figure 1 illustrates an architecture for speech applications in which callers use a telephone,
cell phone, or other mobile device with a microphone and speaker. The architecture
contains:
• Web server—contains HTML, SALT and embedded scripts. The scripts control the
dialog flow, such as the order for playing audio prompts to the caller.
• Telephony server—connects client devices to the speech server via the
telephone network.
• Speech server—contains a speech recognition engine which converts spoken words and
phrases into text, a speech synthesis engine which converts text to human-sounding
speech, and an audio subsystem for playing prompts and responses back to the user.
• Client devices—devices to which the user listens and speaks, such as a telephone,
cell phone, or PDA.
There are numerous variations for the architecture shown in Figure 1. A small speech
recognition engine could reside in the user device (for example, to recognize a small number
of command and control instructions), or it may be distributed across the device and speech
server (the device performs DSP functions on spoken speech, extracting “speech features”
that are transmitted to the speech server, which completes the speech recognition processing).
The various servers may be combined or replicated depending upon the workload.
Some mobile devices—and most desktop devices—have screens and input devices such as
keyboards, mice, and styluses. These devices support multimodal applications, which accept
more than one mode of input from the user, including keyed text, handwriting, pen gestures,
and speech.
Figure 2 illustrates a sample telephony application written with SALT elements embedded in
HTML. The bolded code in Figure 2 will be replaced by the bolded code in Figure 3, which
illustrates the same application as a multimodal application.
Figure 3 illustrates a typical multimodal application written with SALT embedded in HTML.
In this application, the user may either speak or type to enter values into the text boxes. Note
that the code in Figure 3 is somewhat different from the code in Figure 2. This is because
many telephony applications are system-directed (the system guides the user by asking
questions which the user answers), while multimodal applications, like visual-only
applications, are often user-directed (the user indicates which data will be entered by clicking
on or tapping the corresponding field before speaking or typing).
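Since Figures 2 and 3 are not reproduced here, the following sketch shows one common
user-directed pattern (the ids, grammar file, and result path are assumptions): recognition for
a field starts when the user clicks or taps that field.

    <!-- Clicking the text box starts recognition for that field -->
    <input id="txtDate" type="text" onclick="recoDate.Start()" />
    <salt:listen id="recoDate">
      <salt:grammar src="date.grxml" />
      <!-- Place the recognized date into the text box the user selected -->
      <salt:bind targetelement="txtDate" value="//date" />
    </salt:listen>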
Programming with SALT is different from programming traditional visual applications in the
following ways:
• If the developer does not like how the speech synthesizer renders text as human-
understandable voice, the developer may add Speech Synthesis Markup Language
(SSML) elements to the text to provide hints for the speech synthesis system. For
example, the developer could insert a <break time = "500ms"/> element to instruct the
speech synthesizer to remain silent for 500 milliseconds. SSML is a W3C standard. (A
short example appears after this list.)
• The developer must supply a grammar to describe the words and phrases users are
likely to say. Grammars help the speech recognition system recognize words faster
and more accurately. SALT (and VoiceXML 2.0/2.1) developers specify grammars
using the Speech Recognition Grammar Specification (SRGS). Developers must anticipate
the words frequently spoken by the user at each point in the dialog, as well as fine-tune
the wording of the prompts to encourage users to speak those words and phrases. (A
sample grammar appears after this list.)
• Speech recognition systems do not understand spoken speech perfectly. (Even the best
speech recognition engines fail to accurately recognize three to five percent of spoken
words.) Developers compensate for poor speech recognition by writing event handlers
that ask the user to speak again, often rephrasing the question so the user responds by saying
different words. Example event handlers are illustrated in Figure 2, lines 35–37 and
lines 38–40. Developers may spend as much as 30 to 40 percent of their time writing
event handlers, which are needed only occasionally but are essential when the speech
recognizer fails to understand the user. (A sketch of such handlers appears after this list.)
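The following three sketches illustrate the points above. They are illustrative only; the ids,
file names, wording, and result paths are assumptions.

First, SSML hints inside a prompt (namespace declarations omitted for brevity):

    <salt:prompt id="confirmOrder">
      Your order has been placed.
      <!-- Pause for half a second before continuing -->
      <break time="500ms" />
      <!-- Speak the closing phrase more slowly -->
      <prosody rate="slow">Thank you for calling.</prosody>
    </salt:prompt>

Second, a minimal SRGS grammar listing the words the recognizer should listen for:

    <grammar xmlns="http://www.w3.org/2001/06/grammar"
             version="1.0" xml:lang="en-US" root="city">
      <rule id="city" scope="public">
        <one-of>
          <item>Portland</item>
          <item>Seattle</item>
          <item>San Francisco</item>
        </one-of>
      </rule>
    </grammar>

Third, event handlers that re-prompt the user after a recognition failure or silence:

    <salt:prompt id="askAmount">How much would you like to transfer?</salt:prompt>
    <salt:prompt id="helpAmount">Please say a dollar amount, for example, fifty dollars.</salt:prompt>
    <salt:listen id="recoAmount" onreco="processAmount()"
                 onnoreco="retryAmount()" onsilence="retryAmount()">
      <salt:grammar src="amount.grxml" />
    </salt:listen>
    <script type="text/javascript">
      // onnoreco fires when speech was heard but did not match the grammar;
      // onsilence fires when the user says nothing within the timeout.
      function retryAmount() {
        helpAmount.Start();   // rephrase the question
        recoAmount.Start();   // listen again
      }
      function processAmount() { /* use the recognized amount */ }
    </script>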
SALT and VoiceXML enable very different approaches for developing speech applications.
SALT tags control the speech medium (speech synthesis, speech recognition, audio capture,
audio replay, and DTMF recognition). SALT tags are often embedded into another
language that specifies flow control and turn taking. On the other hand, VoiceXML is a
stand-alone language which controls the speech medium as well as flow control and turn
taking.
In VoiceXML, the details of flow control are managed by a special algorithm called the
Form Interpretation Algorithm. For this reason, many developers consider VoiceXML a
declarative language. SALT, on the other hand, is frequently embedded into host programming
languages that most developers consider procedural. It should be noted, however, that SALT can be used as
a stand-alone declarative language by using the assignment and conditional features of the
<bind> statement. Thus, SALT can be used in resource-scarce platforms such as cell phones
that cannot support a host language. For details, see section 2.6.1.3 in the SALT specification.
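A rough sketch of this declarative style, assuming a yes/no grammar and illustrative result
paths: the test and value attributes of <bind> perform the conditional check and the
assignment without any host-language script.

    <salt:listen id="recoConfirm">
      <salt:grammar src="yesno.grxml" />
      <!-- If the recognized answer is "yes", copy it into the confirmation field -->
      <salt:bind test="//answer = 'yes'"
                 targetelement="txtConfirm"
                 value="//answer" />
    </salt:listen>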
Although it is relatively easy to learn SALT, it is difficult to design a quality speech
application. An HTML programmer easily learns how to write SALT applications, but
designing a usable speech or multimodal application is still more of an art than a science.
Balentine and Morgan (1999) and Cohen, Giangola, and Balogh (2004) present guidelines
and heuristics for designing effective speech dialogs. A series of iterative designs and
usability tests is necessary to implement speech applications that users both enjoy and use
efficiently to complete their tasks.
CONCLUSION:
It is not clear, at the time this article was written, whether SALT will overtake and replace
VoiceXML as the most widely used language for writing telephony applications. It is also not
clear whether
SALT or some other language will become the preferred language for developing multimodal
applications. The availability of high-level design tools, code generators, and system
development environments that hide the choice of development language from the speech
application developer may make the underlying language less important.
FURTHER READING:
Balentine, B., & Morgan, D. P. (1999). How to Build a Speech Recognition Application: A
Style Guide for Telephony Dialogues (2nd edition). San Ramon, CA: Enterprise
Integration Group.
Cohen, M. H., Giangola, J. P., & Balogh, J. (2004). Voice User Interface Design.
Addison-Wesley.
Speech Synthesis Markup Language (SSML) Version 1.0, W3C Proposed Recommendation.