35 End-To-End Conversion Speed Analysis of An FPT - AI-based Text-to-Speech Application

The document summarizes a study analyzing the end-to-end conversion speed of a text-to-speech (TTS) application based on FPT.AI. The application converts Vietnamese text to speech using 7 voices through an API. Conversion time for 400-500 character text was around 10 seconds initially and under 1.8 seconds for subsequent conversions. The proposed system was found to have advantages over existing systems by allowing download of converted audio, supporting multiple user requests, and using the FPT.AI engine.
End-to-end Conversion Speed Analysis of an FPT.AI-based Text-to-Speech Application

In this project, an FPT.AI-based text-to-speech (TTS) application is developed that
converts Vietnamese text into spoken words. FPT stands for Financing and
Promoting Technology. The application is built with Django for Python as an
interactive web page that connects to an FPT.AI server through its application
programming interface (API). It supports conversion of text to speech in seven
different Vietnamese voices. Four of the seven voices can convert up to 500
characters in a single transaction, while the other three support up to 400
characters. Based on the results obtained, the first conversion of a 400-character
text takes up to 10 s, while subsequent conversions of the same text take under
1.8 s. This applies to all voices.

Speech synthesis is a fundamental component of many artificial intelligence
systems. The FPT Technology Innovation Department has worked for nearly 8 years
to launch FPT Speech Synthesis. Described by FPT as the best integrated
Vietnamese voice system on the market today, its new Vietnamese Speech
Synthesis API is open for free evaluation.
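
The per-voice length limits described above could be checked on the application side before a request is sent. The following is a minimal sketch under stated assumptions: the voice identifiers are hypothetical placeholders (the source does not name the seven voices), and only the 500/400 character limits and the four/three split come from the text.

```python
# Hypothetical voice identifiers; the source gives only the counts and limits.
VOICE_LIMITS = {
    "voice_1": 500, "voice_2": 500, "voice_3": 500, "voice_4": 500,
    "voice_5": 400, "voice_6": 400, "voice_7": 400,
}


def max_chars(voice: str) -> int:
    """Return the per-transaction character limit for a voice."""
    if voice not in VOICE_LIMITS:
        raise ValueError(f"unknown voice: {voice}")
    return VOICE_LIMITS[voice]


def fits_limit(text: str, voice: str) -> bool:
    """True if the text is non-empty and within the limit of the chosen voice."""
    return 0 < len(text) <= max_chars(voice)
```

A text that fails this check would have to be shortened or split into several transactions before conversion.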

EXISTING SYSTEM:
In existing systems, several parameters can be used to indicate the performance
of a TTS system. The most common is the mean opinion score (MOS), which is
widely used to measure the naturalness of the generated speech. However, MOS
alone is not enough to indicate the performance of a system from the customer's
perspective. End users only experience the end-to-end TTS conversion, so even if
the core engine is very fast, the intermediate communication media may affect
how fast the system performs the conversion.
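
For orientation, MOS is simply the arithmetic mean of listener ratings. The sketch below assumes the commonly used 1 (bad) to 5 (excellent) rating scale, which the source does not state, and the sample ratings are made up for illustration.

```python
def mean_opinion_score(ratings: list[int]) -> float:
    """Average listener ratings, assuming the common 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)


# Illustrative ratings only; these are not measurements from the study.
sample_mos = mean_opinion_score([4, 5, 3, 4])
```

Such a score describes how natural the speech sounds, but says nothing about how long the user waits for it, which is the gap the end-to-end analysis addresses.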
DISADVANTAGES OF EXISTING SYSTEM:
• No cloud-based solution is available for the MOS-based TTS.
• The converted MP3 file cannot be downloaded by the user.
• It uses the simple gTTS speech engine.
• Algorithm: mean opinion score (MOS)

PROPOSED SYSTEM:
In the proposed system, the end-to-end conversion speed of an FPT.AI-based TTS
application is analyzed, with the main focus on the relationship between the
length of the input text and its end-to-end conversion time. The main
contributions of this research are: (i) an FPT.AI-based TTS application using
Django for Python, and (ii) a performance analysis of the application for the
seven supported voices and several lengths of input text. In this work, the
FPT.AI API is used to interface between the local host and the remote FPT TTS
server. This represents the actual working condition of an external user who
needs the API to convert text to speech.
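
The measurement described above, end-to-end time as a function of input length with repeated conversions of the same text, could be sketched as follows. This is not the authors' measurement code: `measure_conversion`, `sweep_lengths` and `fake_convert` are hypothetical names, and `fake_convert` is an offline stand-in for the real API call so that the sketch runs without network access.

```python
import time
from typing import Callable


def measure_conversion(convert: Callable[[str], object], text: str,
                       repeats: int = 3) -> list[float]:
    """Return wall-clock durations in seconds of consecutive conversions of one text."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        convert(text)  # everything between request and response is timed
        durations.append(time.perf_counter() - start)
    return durations


def sweep_lengths(convert: Callable[[str], object], lengths: list[int],
                  filler: str = "a") -> dict[int, list[float]]:
    """Map each input length to the durations measured for a text of that length."""
    return {n: measure_conversion(convert, filler * n) for n in lengths}


def fake_convert(text: str) -> bytes:
    """Offline stand-in for the remote TTS call (assumption, not the real API)."""
    time.sleep(len(text) / 100_000)
    return b""


results = sweep_lengths(fake_convert, [100, 200, 400])
```

In a real run, `convert` would wrap the HTTP request to the TTS server, so the first entry of each list would correspond to the first conversion and the later entries to the subsequent conversions of the same text.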

ADVANTAGES OF PROPOSED SYSTEM:
• Each registered user is allowed to use the TTS service for multiple requests
amounting to a total of 10,000 characters monthly.
• There are three main input parameters that the user can key into the system:
the text to convert to speech, the desired speed of the generated speech, and
the voice type.
• For each request, the server returns a response to the host application. The
response is in JavaScript Object Notation (JSON) format and contains a static
HTTP link for downloading the converted audio file in *.mp3 format. In
addition, the response has an error-or-success indicator.
• Algorithm: FPT.AI core engine
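
The request parameters, the JSON response and the monthly quota listed above could be modelled roughly as below. This is a sketch under assumptions: the JSON keys `error` and `audio_url`, the convention that `error == 0` means success, and the integer encoding of the speed are placeholders chosen for illustration and are not taken from the FPT.AI documentation. Only the three input parameters, the download link, the error-or-success indicator and the 10,000-character quota come from the text.

```python
import json
from dataclasses import dataclass
from typing import Optional

MONTHLY_QUOTA = 10_000  # characters per registered user, as stated in the text


@dataclass
class TtsRequest:
    text: str    # text to convert to speech
    speed: int   # desired speech speed (hypothetical integer encoding)
    voice: str   # voice type (hypothetical identifier)


@dataclass
class TtsResult:
    ok: bool
    audio_url: Optional[str]  # static link to the *.mp3 file when ok is True


def parse_response(body: str) -> TtsResult:
    """Parse the JSON response; the keys 'error' and 'audio_url' are placeholders."""
    data = json.loads(body)
    ok = data.get("error", 1) == 0  # assumed convention: 0 means success
    return TtsResult(ok=ok, audio_url=data.get("audio_url") if ok else None)


def remaining_quota(used_chars: int, request: TtsRequest) -> int:
    """Characters left this month after the request; negative if it exceeds the quota."""
    return MONTHLY_QUOTA - used_chars - len(request.text)
```

A host application could call `remaining_quota` before sending a request and `parse_response` on the reply, then offer the returned link to the user for download.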

SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:

• System : Intel Core i3
• Hard Disk : 1 TB
• Monitor : 15" LED
• Input Devices : Keyboard, Mouse
• RAM : 8 GB

SOFTWARE REQUIREMENTS:

• Operating System : Windows 10
• Coding Language : Python
• Tool : PyCharm, Visual Studio Code
• Database : SQLite

REFERENCE:
Tran Duc Chung, Micheal Drieberg, Mohd Fadzil Bin Hassan, Alexandra
Khalyasmaa, "End-to-end Conversion Speed Analysis of an FPT.AI-based
Text-to-Speech Application," Global Conference on Life Sciences and
Technologies. Date added to IEEE Xplore: 30 April 2020. INSPEC Accession
Number: 19575875. DOI: 10.1109/LifeTech48969.2020.1570620448.
Author affiliations: Centre for Research and Data Science (CeRDaS); Computer
and Information Science Department; Automated Electrical Systems Department;
FPT University, Hoa Lac Hi-Tech Park, Hanoi, Vietnam.
