Neural Computing and Applications (2023) 35:24961–24970

https://doi.org/10.1007/s00521-023-08462-8

S.I.: EVOLUTIONARY COMPUTATION BASED METHODS AND APPLICATIONS FOR DATA PROCESSING

Exploration of English speech translation recognition based on the LSTM RNN algorithm
Qiwei Yuan1 • Yu Dai2 • Guangming Li3

Received: 10 October 2022 / Accepted: 3 March 2023 / Published online: 23 March 2023
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023

Abstract
In today’s information society, the demand for intelligence is increasing daily. English speech translation recognition
technology based on the LSTM (long short-term memory) recurrent neural network (RNN) algorithm is an important
manifestation of computer intelligence. In recent years, many scholars have conducted research on speech translation
recognition technology, including template matching and statistical pattern recognition. Each of these methods has its
drawbacks. This paper discusses English speech recognition techniques by utilizing the basic RNN principles. Moreover,
it analyses their application and construction in practice, which can provide a useful reference for future researchers.
LSTM RNN is an intelligent system that is different from traditional pattern recognition methods. The greatest difference is
that it simulates the information processing of the human brain and realizes the intelligent information processing in a
distributed manner. It has a variety of automatic recognition and extraction functions, such as storage, association, and
retrieval, especially for speech translation and recognition problems with high perception ability. This new neural network
recognition system has a strong scientific nature and can store sound information in a decentralized manner, similar to the
human brain. The LSTM RNN has been widely used in the speech recognition field due to its excellent performance in
extraction and classification. The study found that the recognition accuracy of the original RNN was generally maintained
between 48 and 54%, and the data loss rate was relatively high. The accuracy rate of speech recognition based on LSTM
RNN was as high as 94%, and the information storage efficiency was high, which greatly avoided repetitive processes.
Voice data processing can be completed in as little as 4.5 s, which plays an important role in terms of mass
satisfaction and social development needs.

Keywords LSTM (long short-term memory) · Recurrent neural network algorithm · English voice translator · Translation recognition system

✉ Yu Dai
[email protected]

Qiwei Yuan
[email protected]

Guangming Li
[email protected]

1 College of Foreign Language, Shaoyang University, Shaoyang 422000, Hunan, China
2 College of International Education, Hunan University of Medicine, Huaihua 418000, Hunan, China
3 Department of Mechanical and Energy Engineering, Shaoyang University, Shaoyang 422000, Hunan, China

1 Introduction

Currently, the needs of human life are changing, especially in terms of computers. The initial simple applications can no longer meet people’s needs. RNN is a good neural network model that can process and predict sequence data. RNNs and their variant networks have been widely used in many tasks, especially with certain time dependencies. In speech recognition, machine translation, language modelling, text classification, word vector generation, and information retrieval, a model is needed to integrate and train data. However, RNNs are generally difficult to train. After repeated loops, the gradient gradually disappears, and
the problem of gradient explosion rarely occurs. Due to some problems in the practical application of RNNs, LSTM networks have attracted extensive attention as an effective method for saving data. Improvements in English speech translation recognition with LSTM RNN algorithms have also appeared one after another.

In addition, in the past two decades, with the advancement of technology, computer performance has become increasingly better. People’s understanding of speech translation is also becoming increasingly mature, and many of our dream functions can be applied in real life. Speech translation recognition technology is not only a science but also closely related to the fields of acoustics, intelligent computer recognition, and data processing. In daily research and analysis, three aspects are often analysed: template matching, random models, and probabilistic grammar. In addition, more research methods are constantly being discovered.

At present, the English speech translation recognition system is still based on hidden Markov models for modelling the time series of speech signals in English. Its model parameters cannot be well trained, which seriously affects speech recognition and translation abilities. The English speech translation recognition system based on the LSTM RNN algorithm mentioned in this paper is a forward neural network with multiple hidden layers, and each layer is trained separately (including the classifier at the last level). This enables more comprehensive training of the model parameters under limited training data, and the advantages of the LSTM RNN in the English speech translation recognition task are supported by corresponding evidence.

English education is an important method for cultivating language talent, but in practice, English education development has encountered many difficulties, thus affecting the quality of final education. English voice translators are an indispensable tool in today’s teaching and can effectively solve various problems encountered in daily life. The core of the English translation system is speech recognition. Through optimized design and comprehensive application, it can improve the level of English teaching.

After research, this paper found that the recognition accuracy of the original RNN is usually maintained in the range of 48–54%, while the data loss rate is relatively high. Speech recognition using LSTM RNN has a 94% accuracy rate, effectively saves a large amount of data, and greatly reduces repetitive processing. The processing time of voice data can be as short as 4.5 s, which plays a great role in meeting the satisfaction of the people and the needs of social development. The speech processing rate of the original RNN is relatively slow, whereas the LSTM recurrent neural network can finish in as little as 4.5 s. At the same time, the LSTM recurrent neural network will also become a new development trend for English speech translation.

2 Related work

At present, due to the development of science and technology, people’s demand for computer intelligence is increasing, and speech translation and recognition technology is one of the main intelligent technologies. Many scholars have conducted research on related aspects and proposed many translation recognition algorithms, but these methods have certain drawbacks (such as poor-quality translations, delays in data processing, and translations that are often very different from the original intent). Based on this, this paper discusses the recognition technology of speech translation from the perspective of neural networks and analyses its application and implementation in practice to meet the needs of today’s society. For the purpose of English phonetic translation retrieval, an English listening translation retrieval system can be designed and implemented, and different English phonetic retrieval, query and translation implementation methods can be analysed. Du S proposed an English speech translation recognition system based on the LSTM recurrent neural network algorithm to solve the translation ambiguity problem in bilingual dictionary-based information retrieval systems [1]. The difference between Chinese and English is that English attaches great importance to syllable stress. Therefore, syllable recognition in English is a key link in English teaching. Chen X’s research was a study on English speech recognition based on LSTM RNN. On this basis, an acoustic model based on weighted transfer was proposed, that is, a one-step method and a two-step method. In English speech recognition, the convolutional neural network (CNN) and the LSTM neural network were compared and combined with the LSTM neural network to form a statistical chart. The experimental results showed that compared with other methods, the LSTM neural network has obvious advantages in English translation, and it is suitable for subsequently teaching English speech recognition models and developing real-time translation equipment [2]. Zenkel T introduced a suite of open source English speech translation tools. Although there are a variety of open source tools to perform basic speech translation, the purpose is to provide a convenient method for overall English speech translation. Therefore, Docker (open source application container engine) can be used, which includes the following components: a neural speech recognition system, a sentence segmentation system, and an attention conversion system. In addition, it also provides a pretrained model for this purpose to promote the development of voice translation systems and encourage researchers to
improve overall translation recognition system performance [3]. Speech recognition or speech-to-text conversion has rapidly stimulated great interest in large organizations to simplify the human–computer communication process. Dharmale G found that optimizing the speech recognition process is crucial because real-time users want to perform actions based on voice input, and these actions sometimes define the user’s lifestyle, thus defining the speech process. Some existing speech recognition software from Google, Tencent and Microsoft tends to be more than 90% accurate in real-time speech detection. The English speech translation recognition system based on the LSTM recurrent neural network algorithm combines the speech recognition methods used by these software programs and language processing, which improves the overall process accuracy with the help of speech analysis [4]. In English feature recognition, the speech translation recognition technology accuracy is particularly prominent. Hou Q took the intelligent learning algorithm as the system algorithm and used linear classification and nonlinear classification methods to carry out relevant subjective identification. By making use of speech phase insensitivity, it can achieve the goal of noise reduction and better recognize and translate English [5]. With the continuous progress of society, RNNs based on long short-term memory are increasingly favoured by people, as they solve many historical problems for mankind and contribute to the development of the country. Therefore, English translation recognition technology based on the LSTM RNN algorithm has become an important direction in the field of human–computer interaction. In English, the use of speech recognition technology to help teachers correct pronunciation not only has a certain effect but can also help students avoid being limited by time and space. Duan R used the LSTM recurrent neural network algorithm to improve and analyse the speech recognition algorithm and used the effective algorithm as a systematic algorithm for the English speech translation recognition model. Additionally, the basic speech-cutting process was described. In addition, a control experiment was designed to verify and analyse the English speech translation recognition correction model based on the LSTM recurrent neural network algorithm [6]. Hai Y aimed to make use of English-specific syllables and prosodic features in spoken language data for English speech translation recognition and explored effective methods for speech detection and recognition systems. The method is based on a combination of classifiers and syllable classifiers combined with other speech features based on speech rate, intensity, formant and energy statistics, and articulation rate. Compared with syllable classifiers trained on specific syllables, it achieved better recognition rates. It was found that the recognition performance of the English speech translation recognition system was significantly better than that of the traditional model [7]. Zhu H proposed a language vocabulary classification type based on the LSTM RNN algorithm and used an English dictionary (WordNet) based on cognitive linguistics of synonyms, antonyms, and context words. The English speech translation recognition system based on the LSTM RNN algorithm can utilize the semantic relationship of WordNet to improve the ease of use of English speech translation recognition systems [8]. Worldwide, research on English speech translation recognition systems has turned to neural networks, which will be a hot topic for a long time in the future.

After a long period of development, the performance and role of speech translation and recognition systems have been greatly improved, but there are still many problems that need to be studied in the future. Although speech recognition technology based on neural networks has good performance, it is difficult to use widely in practice due to its unsatisfactory learning speed. With the continuous development of its algorithm and the simplification of the model, the recognition rate of the LSTM RNN will be further improved, and it will gradually become the mainstream in the market and occupy a place in various industries.

3 English speech translation recognition system based on the LSTM RNN algorithm

3.1 Technical basis for the design of an English speech translation recognition system

Currently, the trend of education informatization in China is becoming increasingly prominent. Both teachers and students need to build an information platform for English teaching activities, which has changed the shortcomings (such as indoctrination teaching, time and space constraints, poor classroom atmosphere, and insufficient human translation ability) of the previous English teaching mode. English speech recognition plays an important role in translation software, and its recognition process is shown in Fig. 1. It is mainly used for identifying multiple languages to help students quickly grasp English connotations. There are three main aspects of English speech translation recognition technology: feature extraction, pattern matching, and model training.

3.1.1 Feature extraction technology

The auxiliary function of the English language translation speech recognition system usually involves three aspects: collection, processing and transmission. In addition, computer language and natural language are very different. Therefore, correctly distinguishing the difference between
the two when translating is an urgent problem [9]. Feature extraction technology can extract features from the English language and transmit correct language signals to translators to improve computer translation accuracy.

3.1.2 Pattern matching technology

Speech recognition systems can help students and teachers quickly understand the meaning of language through pattern matching technology and avoid the trouble caused by artificial language errors. Pattern matching technology uses intelligent pattern recognition technology to automatically recognize and analyse speech input, which reduces the difficulty of manual translation. It can automatically select a matching translation mode according to the structure, grammar and application of English words and sentences. People can obtain the final translation result by executing the program command, which is of great help to both students and teachers.

3.2 Model training technology

To realize teaching informatization, a translation recognition system based on speech recognition is proposed. In English teaching, it can help teachers solve translation problems and improve students’ ability to understand English knowledge. After completing speech recognition, the translator conducts simulation training according to the actual situation to establish a virtual language training platform. The simulation training technology uses the design concept of man–machine integration. By combining translators and speech recognizers for training, it can quickly identify and judge the degree of English pronunciation and guide students to adjust their speech [10].

3.3 LSTM RNN model structure

An FNN (feedforward neural network) can also be considered a primitive neural network, which is composed of a series of simple neurons. Figure 2 is a simple FNN that includes the input layer, the hidden layer, and the output layer. There are no loops in the network, and there is no feedback between the output of the network and the model. Therefore, FNNs still have great drawbacks in many cases [11]. The greatest RNN feature is that its continuous network structure is well adapted to time series data and can maintain data correlation. Figure 3 shows the RNN network structure, which uses the loop on the hidden layer to reduce the neural network parameters to be trained.

In addition, due to the existence of shared parameters, data of different lengths can be extended, so the RNN input can be an indeterminately long sequence. For example, to train a fixed sentence, if an FNN is used, then each input feature will have independent parameters, while the RNN is completely the opposite. Although the original goal of RNNs is to learn long-term dependencies, extensive practice has proven that standard RNNs often struggle to preserve information for long periods of time [12]. Therefore, the application of an RNN in the initial stage is not universal. For this problem, this paper improved the traditional RNN. LSTM is the most effective method at present. Compared with the hidden RNN unit, the internal structure of the hidden LSTM unit is more complex and has more options. The LSTM RNN type is shown in Fig. 4.

In addition, the LSTM method is used for feature extraction, and the fully connected layer method is used for regression classification. The LSTM RNN system consists of 5 layers: two LSTM layers, two fully connected layers, and an output layer. The structure is shown in Fig. 5.

3.4 Development status of English speech recognition and translation technology

Currently, speech translation recognition has made great progress in theory and practice, which has greatly promoted communication and collaboration between people in different languages and cultural backgrounds. The summary is as follows:

First, the number of translated words is increasing. With the progress of technology, the number of speech translations is also growing exponentially. From the initial hundreds of English words to the current tens of thousands, translation efficiency has been greatly improved.

Fig. 1 Identification flowchart (block labels: voice input, preprocessing, feature extraction, measure test estimation, identify decisions, model library, recognition result output)
Fig. 2 Simple structure of FNN (block labels: input layer, hidden layer, output layer)

Fig. 3 Network structure of RNN (block labels: input layer, hidden layer, output layer; the hidden layer connects to the next time step)

Fig. 4 LSTM RNN model (block labels: input gating unit, input gate unit, input extrusion unit, forget gate unit, memory cells, output gate unit, output gating unit, output extrusion unit)

Second, speech translation technology has entered more fields. In the early stages, people usually standardized translations and sentence patterns. That is, in vocabulary, grammar and word order, the input sentence must strictly follow language norms and restrictions; otherwise, the quality of translation cannot be guaranteed [13]. Now, for some common languages, even if there are problems with word order, language disorders, and pronunciation, speech translation technology can be used to deal with them effectively, which greatly reduces the grammatical restrictions of input and improves the analysis ability of spoken languages [14].

Third, translation is complex and integrated. In speech recognition systems, the translation algorithm usually performs interactive processing through multiple translation methods. In this way, their respective advantages are complemented, and the problem of a single algorithm is overcome, thereby achieving the goal of a multiengine translation strategy. For example, translation software such as Google Translate and Youdao Dictionary all use this multiengine translation algorithm [15].

Fourth, a large amount of world knowledge and language expression environment knowledge is introduced into the speech recognition translation system. To improve translation accuracy, many research groups are working to introduce the knowledge of social roles, conversation scenes, body movements, and expression into speech recognition systems. Some research groups are also using television and image capture technology to aid speech recognition translation by collecting and analysing the speaker’s facial expressions, movements and environment. Even if the translation effect is not very good, the listener can roughly judge the meaning of the other party from other information [16].

Fifth, it begins the technological development from one language to multilingual, multicontext, two-way communication. The previous translation systems used a single language as the object to complete a single voice conversion. The current translation system is for multilanguage and multidomain two-way speech, which greatly promotes the exchange of information between the two parties. Table 1 shows an overview of the current state of development.

3.5 Difference between speech recognition based on LSTM neural network and original speech recognition

The development of English professional teaching in practice has encountered many difficulties, which have affected the final education quality. Therefore, the original teaching method should be changed to create a more scientific teaching platform [17]. English translation is an indispensable tool in today’s English teaching that can help teachers with many problems in daily life. The core of the English translation system is speech recognition, and the quality of English teaching can be improved through optimized design and comprehensive application [18].

The English speech translation recognition system is an auxiliary tool commonly used in English teaching. Figure 6 is the basic block diagram of the English speech translation recognition system. Figure 6 shows that the process is simple and easy to operate. In the case of high efficiency
and speed, it perfectly replaces many complex traditional RNN processes. It can not only conform to the trend of educational informatization but also greatly improve the quality of teachers’ classroom teaching. A “translator” is a digital tool that can realize automatic language information processing, which replaces artificial language translation and improves translation efficiency.

Fig. 5 Schematic diagram of LSTM-based translation recognition network structure (block labels: raw sequence data, time window data, LSTM, fully connected layer, result)

Table 1 Overview of development status

Development status                     Vocabulary  Translation quality (%)  Translation needs (%)
Translation volume increased           More        85.69                    89.62
Wide range of translation              More        80.99                    93.69
Translation diversity                  More        90.15                    95.22
Wide range of translation situations   More        92.66                    93.88

Fig. 6 Basic block diagram of identification system (block labels: English voice signal, preprocessing, feature advance, pattern matching, post-processing, preliminary identification results, recognition result, speech corpus, acoustic model, dictionary, language model, text corpus)

From the practical application of English teaching, the advantages of English speech translation recognition software based on the LSTM RNN algorithm are as follows:

3.5.1 Automation

The LSTM-based RNN is a cutting-edge achievement in the current information technology field. It uses computer data processing technology to replace manual operations and builds an automatic translation operation platform [19]. In English teaching, translation is a major problem. Translation software can help teachers teach in the classroom so that students can better understand English knowledge.

3.5.2 Humanization

The English speech recognition system adopts the LSTM recurrent neural network algorithm, which is also a human–computer interaction method. Through the interaction between natural languages, it solves the drawbacks of the traditional English teaching mode. “Human–machine” teaching uses multifunctional software as a platform to create a harmonious learning environment for English teaching, learning, and translation. This teaching mode establishes a man–machine integrated control system [9].

3.6 Evaluation of LSTM RNN Model construction

LSTM RNNs are different from feedforward neural networks (FNNs). They can store the previous information in hidden nodes in the middle, which can have a certain influence on the network output [20]. In traditional RNNs, parameters are trained using backpropagation through time. It is assumed that at each time t, the RNN has a piece of supervisory
information. The loss at time t is $\delta_t$, and the total loss is as follows:

$$\delta = \sum_{t=1}^{T} \delta_t \qquad (1)$$

By using the chain rule, the gradient of the loss $\delta$ with respect to $\mu$ can be found:

$$\frac{\partial \delta}{\partial \mu} = \sum_{t=1}^{T} \sum_{k=1}^{t} \frac{\partial h_k}{\partial \mu}\, f(h_i)\, \frac{\partial y_t}{\partial h_t}\, \frac{\partial \delta_t}{\partial y_t} \qquad (2)$$

$$f(h_i) = \prod_{i=k+1}^{t} \mu^{T} \operatorname{diag}\left[f'(h_{i-1})\right] \qquad (3)$$

Based on this, there are:

$$\tau = \mu^{T} \operatorname{diag}\left[f'(h_{i-1})\right] \qquad (4)$$

Substituting Formula (4) into Formula (3) gives approximately $\tau^{t-k}$. If $\tau < 1$, then $\tau^{t-k} \to 0$ as $(t-k) \to \infty$, and the gradient disappears. Due to the long network transmission period, the update speed of the network weights is slow. Additionally, due to gradient explosion and disappearance in the parameter learning process, it cannot reflect the long-term memory effect of RNNs. Therefore, an RNN model based on LSTM is proposed.

In the LSTM neural network, a ring-connected unit called “memory” is first used to replace the hidden layer nodes in the traditional network. Second, a gating mechanism is employed to control the information accumulation rate, which provides a new function for writing, reading and resetting memory cells. The forget gate $f_t$ is used to control how much information each memory cell needs to forget. The input gate $i_t$ is used to control how much new information is added to each memory cell. The output gate $o_t$ is used to control how much information each memory cell outputs. The LSTM neural network can choose to forget previously accumulated information so that the LSTM model can learn long-term historical information. At time t, $C_t$ represents all historical information and is controlled by the input gate $i_t$, forget gate $f_t$, and output gate $o_t$. The LSTM operates as follows at time t:

$$i_t = \varphi\left(W_i x_t + \mu_i h_{t-1} + V_i C_{t-1}\right) \qquad (5)$$

$$f_t = \varphi\left(W_f x_t + \mu_f h_{t-1} + V_f C_{t-1}\right) \qquad (6)$$

$$o_t = \varphi\left(W_o x_t + \mu_o h_{t-1} + V_o C_{t-1}\right) \qquad (7)$$

$$\tilde{C}_t = \tanh\left(W_c x_t + \mu_c h_{t-1}\right) \qquad (8)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (9)$$

$$h_t = o_t \odot \tanh\left(C_t\right) \qquad (10)$$

It can be seen from this that the LSTM network is an RNN with a large amount of extended memory. In LSTM, the state of the current input information cannot affect the output information of the output gate. Adding a connection from the current input gate to the output gate allows better control of the output data of each memory cell.

4 Comparison of original RNN and LSTM RNN

RNN is a special self-connected network in the field of learning that can complete the mapping from complex vectors to simple vectors. It has strong computing power and has the functions of association and memory. However, due to its difficulty in implementation, it was quickly replaced by other neural networks and traditional machine learning methods. A large number of practical applications also prove that RNNs have difficulty achieving long-term data storage. Therefore, an LSTM-based RNN was introduced to improve the original RNN model. The two are quite different in terms of information storage integrity rate, recognition accuracy rate, voice data processing speed, and mass satisfaction. Figure 7 shows the comparison of recognition accuracy.

Figure 7 is a comparison of the speech recognition accuracy of the original RNN and LSTM RNN. From Fig. 7a, it can be seen that the recognition accuracy of the original RNN was generally maintained between 48 and 54%. The accuracy rate of speech recognition based on LSTM RNN in Fig. 7b was as high as 94%, which plays an indispensable role in terms of mass satisfaction and social development needs. Figure 8 shows the comparison of the completeness rate of information storage.

According to the data analysis of the information storage rate of the original RNN in Fig. 8a, the storage rate was only up to 74%, which can also be said to be a relatively high data loss rate. If the same information needs to be translated and recognized again, the data need to be re-entered, which makes it difficult to improve work efficiency. In Fig. 8b, the information storage rate of the LSTM RNN was as high as 97%, which greatly avoided repetitive processes and improved work efficiency. Figure 9 shows the comparison of voice data processing speed.

Figure 9 is a comparison of the voice data processing speed of the original RNN and LSTM recurrent neural network. It can be seen in Fig. 9a that the processing speed of the original RNN was relatively slow, and it required 8 s. In Fig. 9b, the voice data processing of the LSTM recurrent neural network was completed in as little as 4.5 s, which greatly facilitates the needs of social development in the science and technology era. Additionally, the LSTM recurrent neural network will also be a future research direction of the English voice translation field.
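
The LSTM update in Formulas (5)–(10) of Sect. 3.6 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the gate activation is assumed to be the logistic sigmoid, the cell-to-gate weights V are assumed to be diagonal (applied elementwise), the output gate is assumed to have its own weights and the same activation as the other two gates, and the layer sizes, initialisation, and function names (`init_params`, `lstm_step`, `run_sequence`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, assumed here for the gate activation."""
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Small random weights; key names mirror W, mu and V in Formulas (5)-(8)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("i", "f", "o", "c"):
        p["W_" + g] = rng.normal(0.0, 0.1, (n_hidden, n_in))       # input weights
        p["mu_" + g] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # recurrent weights
    for g in ("i", "f", "o"):
        p["V_" + g] = rng.normal(0.0, 0.1, n_hidden)  # cell-to-gate weights (diagonal, assumed)
    return p


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step: returns the new hidden state h_t and cell state C_t."""
    i_t = sigmoid(p["W_i"] @ x_t + p["mu_i"] @ h_prev + p["V_i"] * c_prev)  # (5) input gate
    f_t = sigmoid(p["W_f"] @ x_t + p["mu_f"] @ h_prev + p["V_f"] * c_prev)  # (6) forget gate
    o_t = sigmoid(p["W_o"] @ x_t + p["mu_o"] @ h_prev + p["V_o"] * c_prev)  # (7) output gate
    c_cand = np.tanh(p["W_c"] @ x_t + p["mu_c"] @ h_prev)                   # (8) candidate memory
    c_t = f_t * c_prev + i_t * c_cand                                       # (9) cell update
    h_t = o_t * np.tanh(c_t)                                                # (10) hidden output
    return h_t, c_t


def run_sequence(xs, p):
    """Feed a sequence of any length through the same shared parameters."""
    n_hidden = p["mu_i"].shape[0]
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
    return h, c
```

Because `run_sequence` reuses one parameter set at every step, the same weights handle sequences of any length, which is the parameter sharing described in Sect. 3.3.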

Fig. 7 Comparison of recognition accuracy. a Original RNN recognition accuracy; b LSTM RNN recognition accuracy (series: goal one to goal four; horizontal axis: first to sixth time; vertical axis: recognition accuracy, 45–55% in a and 86–95% in b)

Fig. 8 Information storage ratio comparison. a Original RNN information storage rate; b LSTM RNN information storage rate (series: object three and object four; vertical axis: first to sixth survey; horizontal axis: storage integrity, 66–75% in a and 91–98% in b)


Fig. 9 Comparison of voice data processing speed for goals one to four. a Original RNN data processing speed (axis range 0–12). b LSTM recurrent neural network data processing speed (axis range 0–6)
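As a worked check on the comparison in Fig. 9, the two processing times reported in the text (8 s for the original RNN and 4.5 s for the LSTM network) can be turned into a speed-up ratio and a relative time saving. The symbols $t_{\mathrm{RNN}}$ and $t_{\mathrm{LSTM}}$ are introduced here only for illustration, and the figures are taken at face value from the text.

```latex
\frac{t_{\mathrm{RNN}}}{t_{\mathrm{LSTM}}} = \frac{8\ \mathrm{s}}{4.5\ \mathrm{s}} \approx 1.78,
\qquad
\frac{t_{\mathrm{RNN}} - t_{\mathrm{LSTM}}}{t_{\mathrm{RNN}}} = \frac{8 - 4.5}{8} = 0.4375 \approx 43.8\%
```

On these numbers the LSTM network is roughly 1.8 times faster, which is a reduction in processing time of a little under 44%.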

5 Conclusions

This paper has shown that the intelligent translation system for English voice recognition is a new translation technology built on information technology and intelligent technology. After decades of development, it has made great progress in both theory and practice and has made communication across languages possible. LSTM recurrent neural networks are an important research direction in machine learning. They can process sequence data such as text, audio, and video and have achieved significant results in many areas. Work on the components of the recurrent structure is ongoing, with computing components being refined to improve performance. Nevertheless, current voice recognition translation systems still have many defects. How to further improve translation quality remains an open problem, one that requires researchers to work together to create tomorrow's voice recognition translation technology.

Funding This work was supported by Shaoyang Science and Technology Planning Project (2021025ZD): Construction of College English online education Platform under the background of "Internet+".

Data availability statement Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Declarations

Conflict of interest There are no potential competing interests in our paper.

References

1. Du S (2019) Optimization of speech recognition system of English education industry based on machine learning. Computer-Aided Des Appl 17(1):124–136
2. Chen X (2021) Simulation of English speech translation recognition based on transfer learning and CNN neural network. J Intell Fuzzy Syst 40(2):2349–2360
3. Zenkel T, Sperber M, Niehues J (2018) An open source toolkit for speech-to-English text translation. Prague Bull Math Ling 111(1):125–135
4. Dharmale G, Thakare VM, Patil DD (2019) Implementation of efficient speech recognition system on mobile device for Hindi and English language. Int J Adv Comput Sci Appl 10(2):83–87
5. Hou Q, Li C, Kang M (2020) Intelligent model for speech recognition based on SVM: a case study on English language. J Intell Fuzzy Syst 40(7):1–11
6. Duan R, Wang Y, Qin H (2020) A speech recognition model for correcting spoken English teaching. J Intell Fuzzy Syst 40(1):1–12
7. Hai Y (2020) Computer-aided teaching mode of oral English intelligent learning based on speech recognition and network assistance. J Intell Fuzzy Syst 39(4):5749–5760
8. Zhu H (2020) Construction of English spoken language system based on machine learning algorithm and natural language recognition. J Intell Fuzzy Syst 39(99):1–12
9. Sangeetha J, Jothilakshmi S (2017) Speech translation system for English to Dravidian languages. Appl Intell 46(3):534–550
10. Mott M, Midgley KJ, Holcomb PJ (2020) Speech recognition translation initiation and image effects in American Sign Language deaf and English listening learners. Biling Lang Cognit 23(5):1032–1044
11. Mendel LL, Poussen M, Bass JK (2019) English speech recognition threshold test for Spanish children. Am J Audiol 28(1):1–8
12. Long Y, Li Y, Zhang Q (2020) Acoustic data augmentation for Mandarin-English code-switching speech recognition. Appl Acoust 161(11):107–125
13. Feng X, Zhou Y (2021) English translation language retrieval based on adaptive English phonetic adjustment algorithm. Complexity 202(1):1–12
14. Cao D, Guo Y (2020) Algorithm research of spoken English assessment based on fuzzy measure and speech recognition technology. Int J Biom 12(1):120–131


15. Miller MK, Calandruccio L, Buss E (2019) Masked English speech recognition performance in younger and older Spanish–English bilingual and English monolingual children. J Speech Lang Hear Res 62(12):1–14
16. Yun Z (2017) Research on spoken English speech recognition technology in computer network environment. Boletin Tecnico/Tech Bull 55(16):445–449
17. Zhang Y, Liu L (2018) Using computer speech recognition technology to evaluate spoken English. Educ Sci Theory Pract 18(5):20–31
18. Hidayat R, Winursito A (2021) Improved MFCC robust English speech recognition based on wavelet denoising. Int J Intell Eng Syst 14(1):12–21
19. Pathak A, Pakray P, Bentham J (2019) English-Mizo machine translation using neural and statistical approaches. Neural Comput Appl 31:7615–7631
20. Bawa S (2021) A Sanskrit-to-English machine translation using hybridization of direct and rule-based approach. Neural Comput Appl 33:2819–2838

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
