Deep Learning in Solving Mathematical Equations
Department of Mathematics, Faculty of Science, Omar AL-Mokhtar University, Albaida – Libya
Keywords: mathematical equations, deep learning, neural networks, generalization, stacked auto-encoder.
Deep learning is considered a new branch of machine learning, one that is closer to artificial intelligence than previous, conventional neural network models [1]. In contrast to traditional shallow neural networks, deep networks build hierarchical representations of the data in order to capture its complexity and abstractions. In other words, deep learning networks are neural networks with many hidden layers (which is why they are called deep), allowing them to extract features from the input data at different levels of abstraction. The training and working principles vary from one deep network to another [2]. For example, the stacked auto-encoder, one type of deep network, has its own learning principle based on greedy layer-wise training, whereas a convolutional neural network has a different architecture and training algorithm from the stacked auto-encoder.
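To make the auto-encoder building block concrete, the following minimal sketch (our illustration, not code from the paper; the layer sizes, learning rate, and iteration count are arbitrary) trains a single auto-encoder layer in NumPy to reconstruct its own input. A stacked auto-encoder chains such layers, feeding each one the hidden representation learned by the previous layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# One auto-encoder layer: encode 8 inputs into 4 hidden units, then decode back to 8.
W_enc = rng.normal(scale=0.1, size=(8, 4))
W_dec = rng.normal(scale=0.1, size=(4, 8))
b_enc, b_dec = np.zeros(4), np.zeros(8)

X = rng.random((32, 8))   # toy batch standing in for normalized input samples
lr = 0.5

for _ in range(2000):                        # plain gradient descent on the reconstruction error
    H = sigmoid(X @ W_enc + b_enc)           # hidden (encoded) representation
    X_hat = sigmoid(H @ W_dec + b_dec)       # reconstruction of the input
    err = X_hat - X
    d_dec = err * X_hat * (1 - X_hat)        # backpropagate through the decoder
    d_enc = (d_dec @ W_dec.T) * H * (1 - H)  # backpropagate through the encoder
    W_dec -= lr * H.T @ d_dec / len(X)
    b_dec -= lr * d_dec.mean(axis=0)
    W_enc -= lr * X.T @ d_enc / len(X)
    b_enc -= lr * d_enc.mean(axis=0)

print("mean reconstruction error:", np.mean(err ** 2))
```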
Recently, deep learning has made progress in the field of mathematics. Neural networks were first applied to solving differential equations by Dissanayake and Phan-Thien [3]. Furthermore, the generality of a network layer was characterized by Yosinski et al. [4] as its capacity to perform well across a range of tasks. This notion stemmed from the observation that the initial layers of image-oriented convolutional neural networks (CNNs) converge towards comparable features across multiple network architectures and applications. These two ideas are related: when a particular representation supports strong performance across diverse tasks, networks trained on any one of these tasks are likely to discover similar representations.
To illustrate, Li et al. [5] introduced a methodology for gauging layer similarity by identifying correspondences between neurons that maximize the correlation between networks, while Singular Value Canonical Correlation Analysis (SVCCA) stands out as a more computationally streamlined alternative to such correspondence-based techniques. Additionally, Berg and Nyström [6] endeavored to unravel the composition of deep neural networks trained on differential equations by scrutinizing their weights and biases directly; however, their findings were impeded by the sensitivity of the outcomes to the local minima into which their networks converged.
Hence, in this work we employ a stacked auto-encoder, a deep network consisting of two hidden layers learned by the network itself, trained on mathematical equation solving tasks.
Let N represent a network comprising single-layer perceptrons. In this context, x_{t,i} denotes the i-th input of the t-th example, and w_{ij} denotes the weight connecting the i-th input to the j-th neuron. Furthermore, the symbol θ denotes the bias or threshold applied to the neuron, and Ø denotes the transfer or activation function characterizing the neuron's behavior. The term E_{t,j} represents the error incurred at the j-th neuron for the t-th example. Similarly, y_{t,j} signifies the desired (target) output that the network aims to reproduce, whereas y'_{t,j} denotes the actual output produced by the network's predictions, as delineated by Du and Swamy (2013) [8].
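As a concrete reading of this notation, the following minimal sketch (our illustration, not code from the paper; the function names and toy dimensions are ours) computes the actual outputs y'_{t,j} of a single-layer perceptron from the inputs x_{t,i}, weights w_{ij}, bias θ, and sigmoid transfer function Ø, and then the errors E_{t,j} against the desired outputs y_{t,j}.

```python
import numpy as np

def sigmoid(v):
    # The transfer (activation) function Ø of each neuron.
    return 1.0 / (1.0 + np.exp(-v))

def forward(x_t, W, theta):
    # Actual outputs y'_{t,j}: weighted sum of the inputs x_{t,i} through the
    # weights w_ij, shifted by the bias theta, passed through Ø.
    return sigmoid(x_t @ W + theta)

def errors(y_t, y_hat_t):
    # Errors E_{t,j} between desired outputs y_{t,j} and actual outputs y'_{t,j}.
    return 0.5 * (y_t - y_hat_t) ** 2

# Toy example: 3 inputs, 2 output neurons.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 2))      # weights w_ij
theta = np.zeros(2)              # bias per neuron
x_t = np.array([0.2, 0.5, 0.1])  # inputs of the t-th example
y_t = np.array([1.0, 0.0])       # desired outputs
print(errors(y_t, forward(x_t, W, theta)))
```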
3. Proposed Stacked Auto-Encoder for Solving Mathematical Equations
This paper introduces a novel approach that employs deep learning to solve mathematical equations. The method leverages a stacked auto-encoder (SAE) as its core mechanism for this purpose. The choice of an SAE was influenced by extensive research within the mathematical domain, in which the existing networks are predominantly backpropagation neural networks or convolutional neural networks.
Table 1 presents the number of data samples in the dataset and their division into training and testing sets, respectively.
The training parameters of the network during pre-training and fine-tuning are shown in
Table 2.
Table 2: Learning parameters of the network during pre-training and fine-tuning

Learning parameter            | Value (pre-training) | Value (fine-tuning)
Number of training images     | 100                  | 100
Number of output neurons      | 100                  | 3
Number of network layers      | 4                    | 4
Number of hidden layers       | 2                    | 2
Learning rate                 | 0.47                 | 0.18
Maximum number of iterations  | 500 / 500            | 323 / 400
Transfer function             | Sigmoid              | Sigmoid
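As a hedged sketch of how the parameters in Table 2 could be put to work (the paper does not publish its implementation; the hidden-layer sizes, the random placeholder data, and the use of scikit-learn here are our assumptions), each hidden layer can be pre-trained as an auto-encoder in a greedy layer-wise fashion, after which a network with the same topology is fine-tuned on the 3 equation classes:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor, MLPClassifier

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def pretrain_layer(X, n_hidden, lr, max_iter):
    """Unsupervised pre-training of one auto-encoder layer: learn to
    reconstruct X, then return the hidden (encoded) representation."""
    ae = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation='logistic',
                      learning_rate_init=lr, max_iter=max_iter, random_state=0)
    ae.fit(X, X)  # the target is the input itself
    return sigmoid(X @ ae.coefs_[0] + ae.intercepts_[0])

# Placeholders for the normalized equation data and the 3 class labels
# (the paper's actual features are not reproduced here).
X = np.random.rand(100, 20)
y = np.random.randint(0, 3, size=100)

# Greedy layer-wise pre-training of the two hidden layers (lr = 0.47, up to 500 iterations each).
H1 = pretrain_layer(X, n_hidden=50, lr=0.47, max_iter=500)
H2 = pretrain_layer(H1, n_hidden=25, lr=0.47, max_iter=500)
print("encoded feature shapes:", H1.shape, H2.shape)

# Fine-tuning: a supervised network with the same two hidden layers and sigmoid
# activations (lr = 0.18, up to 400 iterations). In a full SAE implementation the
# pre-trained weights would initialize this network; scikit-learn does not expose
# that directly, so this stage is shown as a fresh supervised fit.
clf = MLPClassifier(hidden_layer_sizes=(50, 25), activation='logistic',
                    learning_rate_init=0.18, max_iter=400, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```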
Figure 4 illustrates the learning trajectory of the network throughout its pre-training phase. The discernible trend shows a reduction in the network error as the iterations progress. However, it is noteworthy that the error does not converge to a very small value, plateauing at about 0.04, which indicates a residual level of imprecision in the network's predictions.
The performance of the designed SAE for solving mathematical equations is summarized in Table 3. As can be seen, the network was trained on 470 samples during pre-training and fine-tuning; however, its performance was not the same in the two stages. The accuracy of the network in predicting the different mathematical equations was 100% during fine-tuning, which is higher than that obtained during pre-training. Note that this accuracy was achieved in a shorter time (38 secs) than that of pre-training (250 secs). Moreover, it is important to mention that the network required a smaller number of iterations during fine-tuning to reach an error of 0.0009, which is smaller than that reached during pre-training (0.0402). The weaker pre-training performance cannot be considered a major problem, since the performance of the network is judged by its performance during fine-tuning, when it is actually learning to predict the data; during pre-training, the network is merely warming up and acquiring the ability to extract and identify features in the data.
3.2 Deep Models Testing
Once trained, the stacked auto-encoder (SAE) network is tested using 100 further samples. Note that these samples are different from those used for training the network; in addition, they include samples of all 3 classes. Table 4 shows the performance of the network during testing, in terms of accuracy.
Table 4 shows that the trained and tested SAE achieved good generalization when tested on previously unseen data, reaching an accuracy of 91.2%.
4. Results Discussion
A stacked auto-encoder based deep learning model is developed in this study for the purpose of solving mathematical equations. The designed SAE is first built from two auto-encoders and then pre-trained in order to learn to extract useful features from the input data, which then helps in predicting the answers of different mathematical equations more accurately. Networks of this kind are trained using greedy layer-wise training, an algorithm that trains the auto-encoders layer by layer in an unsupervised manner. The importance of this training technique (pre-training) is that it provides the network with good feature-extraction abilities and learned weights that are then used when fine-tuning the network.

In this study, the data are normalized before being fed into the network, and then pre-training starts. The same data are then fed into the network, which is also fine-tuned to predict the answers of mathematical equations belonging to 3 different classes. Table 5 shows the deep network's performance.
Table 5: Performance evaluation

Performance parameter               | SAE1
Number of training images           | 100
Number of testing images            | 100
Training recognition rate           | 100%
Testing recognition rate            | 91.2%
Minimum square error achieved (MSE) | 0.0009
Iterations required                 | 323
Training time                       | 288 secs
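The recognition rate (accuracy) and MSE figures reported in Table 5 are standard quantities; for reference, a minimal sketch (ours, with made-up labels and outputs) of how they can be computed from a set of predictions:

```python
import numpy as np

def accuracy(y_true, y_pred_labels):
    # Fraction of samples whose predicted class matches the target class.
    return np.mean(y_true == y_pred_labels)

def mse(targets, outputs):
    # Mean squared error between desired and actual network outputs.
    return np.mean((targets - outputs) ** 2)

y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 1, 2, 2, 0])
print(accuracy(y_true, y_pred))                  # 0.8

targets = np.array([1.0, 0.0, 0.0])
outputs = np.array([0.97, 0.02, 0.01])
print(mse(targets, outputs))
```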
The current investigation centers on the use of a deep network, specifically a stacked auto-encoder, as a tool for solving mathematical equations. Applying a deep network in this domain is inherently intricate, since the task demands that equations be solved precisely on the basis of their intrinsic data and attributes. This requirement for high accuracy is driven by the field's intolerance of even minor error margins. Thus, deploying a sophisticated predictive model such as a deep network can expedite and facilitate the work of mathematicians.
The chosen approach employs a stacked auto-encoder model, which is trained and tested using datasets of 100 samples each. The model is initially trained on normalized data derived directly from the equations. The primary aim of data normalization is to enhance and expose pertinent, distinctive attributes, enabling the model to discern the unique characteristics of each sample. This feature extraction step is crucial, as it contributes to the model's ability to categorize the data accurately; the resulting improvement in classification accuracy, in turn, speeds up the network's learning and strengthens its convergence. Subsequently, the model is assessed on a distinct set of 100 samples, with its performance evaluated in terms of accuracy, training time, and the error incurred.
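The paper does not state which normalization scheme is applied before pre-training; as one plausible, hedged illustration, a min-max rescaling of each input feature to [0, 1] (a common choice ahead of sigmoid networks) could look like the following:

```python
import numpy as np

def min_max_normalize(X, eps=1e-12):
    """Rescale each feature (column) of X to the range [0, 1].

    This is only one plausible choice; the paper does not state the exact
    normalization applied before pre-training the stacked auto-encoder.
    """
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min + eps)

# Example: normalize a small batch of raw, illustrative equation features.
raw = np.array([[2.0, 300.0, -1.0],
                [4.0, 150.0,  0.5],
                [6.0, 225.0,  2.0]])
print(min_max_normalize(raw))
```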
In summary, the assessment of the stacked auto-encoder's performance demonstrates its ability to extract pertinent features and deliver accurate predictions for previously unseen equations. This underscores the model's potential as an effective predictor for mathematical equations, given a preliminary preprocessing stage aimed at refining classification, while keeping within a stringent error margin.
References

[2] Helwan, A., & Uzun Ozsahin, D. (2017). Sliding window based machine learning system for the left ventricle localization in MR cardiac images. Applied Computational Intelligence and Soft Computing, 2017.
[4] Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pp. 3320–3328.
[5] Li, Y., Yosinski, J., Clune, J., Lipson, H., & Hopcroft, J. (2015). Convergent learning: Do different neural networks learn the same representations? In Feature Extraction: Modern Questions and Challenges, pp. 196–212.
[6] Berg, J., & Nyström, K. (2017). A unified deep artificial neural network approach to partial differential equations in complex geometries. arXiv preprint arXiv:1711.06464.
[7] Haykin, S. S. (2009). Neural Networks and Learning Machines (Vol. 3). Upper Saddle River, NJ, USA: Pearson.
[8] Du, K. L., & Swamy, M. N. (2013). Neural Networks and Statistical Learning. Springer Science & Business Media.
[9] Zhang, Y., Jiang, D., & Wang, J. (2002). A recurrent neural network for solving Sylvester equation with time-varying coefficients. IEEE Transactions on Neural Networks, 13(5), 1053–1063.
[10] Lagaris, I. E., Likas, A., & Fotiadis, D. I. (1998). Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5), 987–1000.