Neural Computing and Applications (2023) 35:24961–24970

https://doi.org/10.1007/s00521-023-08462-8

S.I.: Evolutionary Computation Based Methods and Applications for Data Processing

Exploration of English speech translation recognition based on the LSTM RNN algorithm
Qiwei Yuan1 • Yu Dai2 • Guangming Li3

Received: 10 October 2022 / Accepted: 3 March 2023 / Published online: 23 March 2023
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023

Abstract
In today’s information society, the demand for intelligence is increasing daily. English speech translation recognition
technology based on the LSTM (long short-term memory) recurrent neural network (RNN) algorithm is an important
manifestation of computer intelligence. In recent years, many scholars have conducted research on speech translation
recognition technology, including template matching and statistical pattern recognition. Each of these methods has its
drawbacks. This paper discusses English speech recognition techniques by utilizing the basic RNN principles. Moreover,
its application and construction in practice, which can provide some useful reference for future researchers, are analysed.
LSTM RNN is an intelligent system that is different from traditional pattern recognition methods. The greatest difference is
that it simulates the information processing of the human brain and realizes the intelligent information processing in a
distributed manner. It has a variety of automatic recognition and extraction functions, such as storage, association, and
retrieval, especially for speech translation and recognition problems with high perception ability. This new neural network
recognition system has a strong scientific nature and can store sound information in a decentralized manner, similar to the
human brain. The LSTM RNN has been widely used in the speech recognition field due to its excellent performance in
extraction and classification. The study found that the recognition accuracy of the original RNN was generally maintained
between 48 and 54%, and the data loss rate was relatively high. The accuracy rate of speech recognition based on LSTM
RNN was as high as 94%, and the information storage efficiency was high, which greatly avoided repetitive processes.
Voice data processing was completed in as little as 4.5 s, which plays an important role in terms of public
satisfaction and social development needs.

Keywords LSTM (long short-term memory) · Recurrent neural network algorithm · English voice translator · Translation recognition system

✉ Yu Dai
[email protected]

Qiwei Yuan
[email protected]

Guangming Li
[email protected]

1 College of Foreign Language, Shaoyang University, Shaoyang 422000, Hunan, China
2 College of International Education, Hunan University of Medicine, Huaihua 418000, Hunan, China
3 Department of Mechanical and Energy Engineering, Shaoyang University, Shaoyang 422000, Hunan, China

1 Introduction

Currently, the needs of human life are changing, especially in terms of computers. The initial simple applications can no longer meet people’s needs. RNN is a good neural network model that can process and predict sequence data. RNNs and their variant networks have been widely used in many tasks, especially with certain time dependencies. In speech recognition, machine translation, language modelling, text classification, word vector generation, and information retrieval, a model is needed to integrate and train data. However, RNNs are generally difficult to train. After repeated loops, the gradient gradually disappears, and
the problem of gradient explosion rarely occurs. Due to some problems in the practical application of RNNs, LSTM networks have attracted extensive attention as an effective method for saving data. Improvements in English speech translation recognition with LSTM RNN algorithms have also appeared one after another.

In addition, in the past two decades, with the advancement of technology, computer performance has become increasingly better. People’s understanding of speech translation is also becoming increasingly mature, and many of our dream functions can be applied in real life. Speech translation recognition technology is not only a science but also closely related to the fields of acoustics, intelligent computer recognition, and data processing. In daily research and analysis, three approaches are often analysed: template matching, random models, and probability grammar. In addition, more research methods are constantly being discovered.

At present, the English speech translation recognition system is still based on hidden Markov models for modelling the time series of speech signals in English. Its model parameters cannot be well trained, which seriously affects speech recognition and translation abilities. The English speech translation recognition system based on the LSTM RNN algorithm mentioned in this paper is a forward neural network with multiple hidden layers, and each layer is trained separately (including the classifier at the last level). This enables more comprehensive training of the model parameters under limited training data, and the advantages of the LSTM RNN in the English speech translation recognition task are supported by corresponding evidence.

English education is an important method for cultivating language talent, but in practice, English education development has encountered many difficulties, thus affecting the quality of final education. English voice translators are an indispensable tool in today’s teaching and can effectively solve various problems encountered in daily life. The core of the English translation system is speech recognition. Through optimized design and comprehensive application, it can improve the level of English teaching.

After research, this paper found that the recognition accuracy of the original RNN is usually maintained in the range of 48–54%, while the data loss rate is relatively high. Speech recognition using LSTM RNN has a 94% accuracy rate, effectively saves a large amount of data, and greatly reduces repetitive processing. The processing time of voice data is as little as 4.5 s, which plays a great role in meeting the satisfaction of the people and the needs of social development. The speech processing rate of the original RNN is relatively slow, whereas the LSTM recurrent neural network can finish in as little as 4.5 s. At the same time, the LSTM recurrent neural network will also become a new development trend for English speech translation.

2 Related work

At present, due to the development of science and technology, people’s demand for computer intelligence is increasing, and speech translation and recognition technology is one of the main intelligent technologies. Many scholars have conducted research on related aspects and proposed many translation recognition algorithms, but these methods have certain drawbacks (such as poor-quality translations, delays in data processing, and translations that are often very different from the original intent). Based on this, this paper discusses the recognition technology of speech translation from the perspective of neural networks and analyses its application and implementation in practice to meet the needs of today’s society.

For the purpose of English phonetic translation retrieval, an English listening translation retrieval system can be designed and implemented, and different English phonetic retrieval, query and translation implementation methods can be analysed. Du S proposed an English speech translation recognition system based on the LSTM recurrent neural network algorithm to solve the translation ambiguity problem in bilingual dictionary-based information retrieval systems [1]. The difference between Chinese and English is that English attaches great importance to syllable stress. Therefore, syllable recognition in English is a key link in English teaching. Chen X’s research was a study on English speech recognition based on LSTM RNN. On this basis, an acoustic model based on weighted transfer was proposed, that is, a one-step method and a two-step method. In English speech recognition, the convolutional neural network (CNN) and the LSTM neural network were compared and combined with the LSTM neural network to form a statistical chart. The experimental results showed that compared with other methods, the LSTM neural network has obvious advantages in English translation, and it is suitable for subsequently teaching English speech recognition models and developing real-time translation equipment [2]. Zenkel T introduced a suite of open source English speech translations. Although there are a variety of open source tools to perform basic speech translation, the purpose is to provide a convenient method for overall English speech translation. Therefore, Docker (open source application container engine) can be used, which includes the following components: a neural speech recognition system, a sentence segmentation system, and an attention conversion system. In addition, it also provides a pre-trained model for this purpose to promote the development of voice translation systems and encourage researchers to
improve overall translation recognition system performance [3]. Speech recognition or speech-to-text conversion has rapidly stimulated great interest in large organizations to simplify the human–computer communication process. Dharmale G found that optimizing the speech recognition process is crucial because real-time users want to perform actions based on voice input, and these actions sometimes define the user’s lifestyle, thus defining the speech process. Some existing speech recognition software from Google, Tencent and Microsoft tends to be more than 90% accurate in real-time speech detection. The English speech translation recognition system based on the LSTM recurrent neural network algorithm combines the speech recognition methods used by these software programs and language processing, which improves the overall process accuracy with the help of speech analysis [4]. In English feature recognition, the speech translation recognition technology accuracy is particularly prominent. Hou Q took the intelligent learning algorithm as the system algorithm and used linear classification and nonlinear classification methods to carry out relevant subjective identification. By making use of speech phase insensitivity, it can achieve the goal of noise reduction and better recognize and translate English [5]. With the continuous progress of society, RNNs based on long short-term memory are increasingly favoured by people, as they solve many historical problems for mankind and contribute to the development of the country. Therefore, English translation recognition technology based on the LSTM RNN algorithm has become an important direction in the field of human–computer interaction. In English, the use of speech recognition technology to help teachers correct pronunciation has a certain effect but can also help students not be limited by time and space. Duan R used the LSTM recurrent neural network algorithm to improve and analyse the speech recognition algorithm and used the effective algorithm as a systematic algorithm for the English speech translation recognition model. Additionally, the basic speech-cutting process was described. In addition, a control experiment was designed to verify and analyse the English speech translation recognition correction model based on the LSTM recurrent neural network algorithm [6]. Hai Y aimed to make use of English-specific syllables and prosodic features in spoken language data for English speech translation recognition and explored effective methods for speech detection and recognition systems. The method is based on a combination of classifiers and syllable classifiers combined with other speech features based on speech rate, intensity, formant and energy statistics, and articulation rate. Compared with syllable classifiers trained on specific syllables, it achieved better recognition rates. It was found that the recognition performance of the English speech translation recognition system was significantly better than that of the traditional model [7]. Zhu H proposed a language vocabulary classification type based on the LSTM RNN algorithm and used an English dictionary (WordNet) based on cognitive linguistics of synonyms, antonyms, and context words. The English speech translation recognition system based on the LSTM RNN algorithm can utilize the semantic relationship of WordNet to improve the ease of use of English speech translation recognition systems [8]. Worldwide, research on English speech translation recognition systems has turned to neural networks, which will be a hot topic for a long time in the future.

After a long period of development, the performance and role of speech translation and recognition systems have been greatly improved, but there are still many problems that need to be studied in the future. Although speech recognition technology based on neural networks has good performance, it is difficult to use widely in practice due to its unsatisfactory learning speed. With the continuous development of its algorithm and the simplification of the model, the recognition rate of the LSTM RNN will be further improved, and it will gradually become the mainstream in the market and occupy a place in various industries.

3 English speech translation recognition system based on the LSTM RNN algorithm

3.1 Technical basis for the design of an English speech translation recognition system

Currently, the trend of education informatization in China is becoming increasingly prominent. Both teachers and students need to build an information platform for English teaching activities, which has changed the shortcomings (such as indoctrination teaching, time and space constraints, poor classroom atmosphere, and insufficient human translation ability) of the previous English teaching mode. English speech recognition plays an important role in translation software, and its recognition process is shown in Fig. 1. It is mainly used for identifying multiple languages to help students quickly grasp English connotations. There are three main aspects of English speech translation recognition technology: feature extraction, pattern matching, and model training.

3.1.1 Feature extraction technology

The auxiliary function of the English language translation speech recognition system usually involves three aspects: collection, processing and transmission. In addition, computer language and natural language are very different. Therefore, correctly distinguishing the difference between
the two when translating is an urgent problem [9]. Feature extraction technology can extract features from the English language and transmit correct language signals to translators to improve computer translation accuracy.

Fig. 1 Identification flowchart (blocks: voice input, preprocessing, feature extraction, measure estimation, identification decision, recognition result output, model library)

3.1.2 Pattern matching technology

Speech recognition systems can help students and teachers quickly understand the meaning of language through pattern matching technology and avoid the trouble caused by artificial language errors. Pattern matching technology uses intelligent pattern recognition technology to automatically recognize and analyse speech input, which reduces the difficulty of manual translation. It can automatically select a matching translation mode according to the structure, grammar and application of English words and sentences. People can obtain the final translation result by executing the program command, which is of great help to both students and teachers.

3.2 Model training technology

To realize teaching informatization, a translation recognition system based on speech recognition is proposed. In English teaching, it can help teachers solve translation problems and improve students’ ability to understand English knowledge. After completing speech recognition, the translator conducts simulation training according to the actual situation to establish a virtual language training platform. The simulation training technology uses the design concept of man–machine integration. By combining translators and speech recognizers for training, it can quickly identify and judge the degree of English pronunciation and guide students to adjust their speech [10].

3.3 LSTM RNN model structure

An FNN (feedforward neural network) can also be considered a primitive neural network, which is composed of a series of simple neurons. Figure 2 is a simple FNN that includes the input layer, the hidden layer, and the output layer. There are no loops in the network, and there is no feedback between the output of the network and the model. Therefore, FNNs still have great drawbacks in many cases [11]. The greatest RNN feature is that its continuous network structure is well adapted to time series data and can maintain data correlation. Figure 3 shows the RNN network structure, which uses the loop on the hidden layer to reduce the neural network parameters to be trained.

In addition, due to the existence of shared parameters, data of different lengths can be extended, so the RNN input can be an indeterminately long sequence. For example, to train a fixed sentence, if an FNN is used, then each input feature will have independent parameters, while the RNN is completely the opposite. Although the original goal of RNNs is to learn long-term dependencies, extensive practice has proven that standard RNNs often struggle to preserve information for long periods of time [12]. Therefore, the application of an RNN in the initial stage is not universal. For this problem, this paper improved the traditional RNN. LSTM is the most effective method at present. Compared with the implicit RNN unit, the internal structure of the implicit LSTM unit is more complex and has more options. The LSTM RNN type is shown in Fig. 4.

In addition, the LSTM method is used for feature extraction, and the fully connected layer method is used for regression classification. The LSTM RNN system consists of 5 layers: two LSTM layers, two fully connected layers, and an output layer. The structure is shown in Fig. 5.

3.4 Development status of English speech recognition and translation technology

Currently, speech translation recognition has made great progress in theory and practice, which has greatly promoted communication and collaboration between people in different languages and cultural backgrounds. The summary is as follows:

First, the number of translated words is increasing. With the progress of technology, the number of speech translations is also growing exponentially. From the initial hundreds of English words to the current tens of thousands, translation efficiency has been greatly improved.

Second, speech translation technology has entered more fields. In the early stages, people usually standardized translations and sentence patterns. That is, whether the input sentence is lexical, grammatical or word order, it must strictly follow language norms and restrictions; otherwise, the quality of translation cannot be guaranteed [13]. Now, for some common languages, even if there are problems with word order, language disorders, and pronunciation, speech translation technology can be used to deal with them effectively, which greatly reduces the grammatical restrictions of input and improves the analysis ability of spoken languages [14].

Third, translation is complex and integrated. In speech recognition systems, the translation algorithm usually performs interactive processing through multiple translation methods. In this way, their respective advantages are complemented, and the problem of a single algorithm is overcome, thereby achieving the goal of a multiengine translation strategy. For example, translation software such as Google Translate and Youdao Dictionary all use this multiengine translation algorithm [15].

Fourth, a large amount of world knowledge and language expression environment knowledge is introduced into the speech recognition translation system. To improve translation accuracy, many research groups are working to introduce the knowledge of social roles, conversation scenes, body movements, and expression into speech recognition systems. Some research groups are also using television and image capture technology to aid speech recognition translation by collecting and analysing the speaker’s facial expressions, movements and environment. Even if the translation effect is not very good, the listener can roughly judge the meaning of the other party from other information [16].

Fifth, it begins the technological development from one language to multilingual, multicontext, two-way communication. The previous translation systems used a single language as the object to complete a single voice conversion. The current translation system is for multilanguage and multidomain two-way speech, which greatly promotes the exchange of information between the two parties. Table 1 shows an overview of the current state of development.

Fig. 2 Simple structure of FNN (input layer, hidden layer, output layer)

Fig. 3 Network structure of RNN (input layer, hidden layer, output layer; the hidden layer connects to the next time step)

Fig. 4 LSTM RNN model (input gate unit, forget gate unit, output gate unit, memory cells, input and output gating units, input and output extrusion units)

3.5 Difference between speech recognition based on LSTM neural network and original speech recognition

The development of English professional teaching in practice has encountered many difficulties, which have affected the final education quality. Therefore, the original teaching method should be changed to create a more scientific teaching platform [17]. English translation is an indispensable tool in today’s English teaching that can help teachers with many problems in daily life. The core of the English translation system is speech recognition, and the quality of English teaching can be improved through optimized design and comprehensive application [18].

The English speech translation recognition system is an auxiliary tool commonly used in English teaching. Figure 6 is the basic block diagram of the English speech translation recognition system. Figure 6 shows that the process is simple and easy to operate. In the case of high efficiency
and speed, it perfectly replaces many complex traditional RNN processes. It can not only conform to the trend of educational informatization but also greatly improve the quality of teachers’ classroom teaching. A “translator” is a digital tool that can realize automatic language information processing, which replaces artificial language translation and improves translation efficiency.

Fig. 5 Schematic diagram of LSTM-based translation recognition network structure (raw sequence data, time window data, two stacked LSTM layers, fully connected layer, result)

Table 1 Overview of development status
Development status                     Vocabulary  Translation quality (%)  Translation needs (%)
Translation volume increased           More        85.69                    89.62
Wide range of translation              More        80.99                    93.69
Translation diversity                  More        90.15                    95.22
Wide range of translation situations   More        92.66                    93.88

Fig. 6 Basic block diagram of identification system (English voice signal, preprocessing, feature extraction, pattern matching, post-processing, preliminary identification results, recognition result; resources: speech corpus, acoustic model, dictionary, language model, text corpus)

From the practical application of English teaching, the advantages of English speech translation recognition software based on the LSTM RNN algorithm are as follows:

3.5.1 Automation

The LSTM-based RNN is a cutting-edge achievement in the current information technology field. It uses computer data processing technology to replace manual operations and builds an automatic translation operation platform [19]. In English teaching, translation is a major problem. Translation software can help teachers teach in the classroom so that students can better understand English knowledge.

3.5.2 Humanization

The English speech recognition system adopts the LSTM recurrent neural network algorithm, which is also a human–computer interaction method. Through the interaction between natural languages, it solves the drawbacks of the traditional English teaching mode. “Human–machine” teaching uses multifunctional software as a platform to create a harmonious learning environment for English teaching, learning, and translation. This teaching mode establishes a man–machine integrated control system [9].

3.6 Evaluation of LSTM RNN Model construction

LSTM RNNs are different from feedforward neural networks (FNNs). They can store the previous information in hidden nodes in the middle, which can have a certain influence on the network output [20]. In traditional RNNs, parameters are trained using an algorithm that performs backpropagation through time. It is assumed that at each time t, the RNN has a piece of supervisory
information. The information loss is $d_t$, and the total loss is as follows:

$$d = \sum_{t=1}^{T} d_t \tag{1}$$

By using the chain rule, the gradient of the total loss $d$ with respect to $l$ can be found:

$$\frac{\partial d}{\partial l} = \sum_{t=1}^{T} \sum_{k=1}^{t} \frac{\partial h_k}{\partial l}\, f(h_i)\, \frac{\partial y_t}{\partial h_t}\, \frac{\partial d_t}{\partial y_t} \tag{2}$$

$$f(h_i) = \prod_{i=k+1}^{t} l^{T}\, \mathrm{diag}\left[f'(h_{i-1})\right] \tag{3}$$

Based on this, there are:

$$s = l^{T}\, \mathrm{diag}\left[f'(h_{i-1})\right] \tag{4}$$

With Formula (4), the product in Formula (3) is $s^{t-k}$. If $s < 1$, then as $(t-k) \to \infty$, $s^{t-k} \to 0$ and the gradient vanishes. Due to the long network transmission period, the update speed of the network weights is slow. Additionally, due to gradient explosion and disappearance in the parameter learning process, it cannot reflect the long-term memory effect of RNNs. Therefore, an RNN model based on LSTM is proposed.

In the LSTM neural network, a ring link network called “memory” is first used to replace the hidden layer nodes in the traditional network. Second, a threshold mechanism is employed to control the information accumulation rate, which provides a new function for writing, reading and resetting memory cells. The forget gate $f_t$ is used to control how much information each memory cell needs to forget. The input gate $i_t$ is used to control how much new information is added to each memory cell. The output gate $o_t$ is used to control how much information each memory cell outputs. The LSTM neural network can choose to forget previously accumulated information so that the LSTM model can learn long-term historical information. At time t, $C_t$ represents all historical information and is controlled by the input gate $i_t$, forget gate $f_t$, and output gate $o_t$. The LSTM operates as follows at time t:

$$i_t = u\left(W_i x_t + l_i h_{t-1} + V_i C_{t-1}\right) \tag{5}$$

$$f_t = u\left(W_f x_t + l_f h_{t-1} + V_f C_{t-1}\right) \tag{6}$$

$$o_t = u\left(W_o x_t + l_o h_{t-1} + V_o C_{t-1}\right) \tag{7}$$

$$\tilde{C}_t = \tanh\left(W_c x_t + l_c h_{t-1}\right) \tag{8}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{9}$$

$$h_t = o_t \odot \tanh\left(C_t\right) \tag{10}$$

It can be seen from this that the LSTM network is an RNN with a large amount of extended memory. In LSTM, the state of the current input information cannot affect the output information of the output gate. The increase in the connection of the current input gate to the output gate can better control the output data of each memory cell.

4 Comparison of original RNN and LSTM RNN

RNN is a special self-connected network in the field of learning that can complete the mapping between complex vectors to simple vectors. It has strong computing power and has the functions of association and memory. However, due to its difficulty in implementation, it was quickly replaced by other neural networks and traditional machine learning methods. A large number of practical applications also prove that RNNs have difficulty achieving long-term data storage. Therefore, an LSTM-based RNN was introduced to improve the original RNN model. The two are quite different in terms of information storage integrity rate, recognition accuracy rate, voice data processing speed, and mass satisfaction. Figure 7 shows the comparison of recognition accuracy.

Figure 7 is a comparison of the speech recognition accuracy of the original RNN and LSTM RNN. From Fig. 7a, it can be seen that the recognition accuracy of the original RNN was generally maintained between 48 and 54%. The accuracy rate of speech recognition based on LSTM RNN in Fig. 7b was as high as 94%, which plays an indispensable role in terms of mass satisfaction and social development needs. Figure 8 shows the comparison of the completeness rate of information storage.

According to the data analysis of the information storage rate of the original RNN in Fig. 8a, the storage rate was only up to 74%, which can also be said to be a relatively high churn rate. If the same information needs to be translated and recognized again, the data need to be re-entered, which makes it difficult to improve work efficiency. In Fig. 8b, the information storage rate of the LSTM RNN was as high as 97%, which greatly avoided repetitive processes and improved work efficiency. Figure 9 shows the comparison of voice data processing speed.

Figure 9 is a comparison of the voice data processing speed of the original RNN and LSTM recurrent neural network. It can be seen in Fig. 9a that the processing speed of the original RNN was relatively slow, and it required 8 s. In Fig. 9b, the voice data processing of the LSTM recurrent neural network was completed in as little as 4.5 s, which greatly facilitates the needs of social development in the science and technology era. Additionally, the LSTM recurrent neural network will also be a future research direction of the English voice translation field.
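
The LSTM update given as Formulas (5) to (10) in Sect. 3.6 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the gate activation u is assumed to be the logistic sigmoid, the output gate is assumed to have its own weights (named W_o, l_o, V_o here), the peephole weights V are assumed to act elementwise on the previous cell state, and the helper names `sigmoid`, `matvec`, `zero_params` and `lstm_step` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, assumed here for the gate activation u."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def zero_params(n_in, n_hid):
    """Build an all-zero parameter set (hypothetical helper for the demo)."""
    p = {}
    for g in ("i", "f", "o"):
        p["W_" + g] = [[0.0] * n_in for _ in range(n_hid)]
        p["l_" + g] = [[0.0] * n_hid for _ in range(n_hid)]
        p["V_" + g] = [0.0] * n_hid  # peephole weights, applied elementwise
    p["W_c"] = [[0.0] * n_in for _ in range(n_hid)]
    p["l_c"] = [[0.0] * n_hid for _ in range(n_hid)]
    return p


def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM time step: returns the new hidden state and cell state."""

    def gate(name):
        wx = matvec(p["W_" + name], x_t)
        lh = matvec(p["l_" + name], h_prev)
        vc = [v * c for v, c in zip(p["V_" + name], C_prev)]
        return [sigmoid(a + b + c) for a, b, c in zip(wx, lh, vc)]

    i_t = gate("i")  # input gate, Formula (5)
    f_t = gate("f")  # forget gate, Formula (6)
    o_t = gate("o")  # output gate, Formula (7)

    # Candidate cell state, Formula (8)
    wx = matvec(p["W_c"], x_t)
    lh = matvec(p["l_c"], h_prev)
    C_tilde = [math.tanh(a + b) for a, b in zip(wx, lh)]

    # New cell state, Formula (9): keep part of the old, add part of the new
    C_t = [f * c + i * g for f, c, i, g in zip(f_t, C_prev, i_t, C_tilde)]

    # New hidden state, Formula (10)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, C_t)]
    return h_t, C_t


if __name__ == "__main__":
    params = zero_params(n_in=3, n_hid=2)
    h, C = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
    print(h, C)
```

With all weights set to zero every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved at each step; this makes the role of the forget gate in Formula (9) easy to see before any training takes place.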


Fig. 7 Comparison of recognition accuracy. a Original RNN recognition accuracy; b LSTM RNN recognition accuracy (x-axis: first to sixth time; y-axis: recognition accuracy; series: goal one to goal four)

Fig. 8 Information storage ratio comparison. a Original RNN information storage rate; b LSTM RNN information storage rate (categories: first to sixth survey; axis: storage integrity; series: object three, object four)


Fig. 9 Comparison of voice data processing speed. a Primitive RNN data processing speed; b LSTM cyclic neural network data processing speed. (Vertical axis: data processing speed, 0–12 in panel a and 0–6 in panel b; series: goal one to goal four.)

5 Conclusions

The intelligent translation system for English voice recognition is a new translation technology built on information technology and intelligent technology. After decades of development, it has made great progress in both theory and practice and has realized the desire to communicate across languages. LSTM recurrent neural networks are an important direction in current learning research. They can process sequence data such as text, audio, and video, and they achieve significant results in many areas. Exploration of the components of the recurrent structure continues, with computing components being refined to improve performance. However, the current voice recognition translation system still has many defects. How to further improve translation quality is a problem that scientists now face, and it requires researchers to work together to create tomorrow's voice recognition translation technology.

Funding This work was supported by Shaoyang Science and Technology Planning Project (2021025ZD): Construction of College English online education Platform under the background of "Internet+".

Data availability statement Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.

Declarations

Conflict of interest There are no potential competing interests in our paper.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
