Earthquake Prediction Using Machine Learning Using Support Vector Machine Algorithm
Abstract
Earthquake forecasting is one of the most significant problems in Earth science because of the
devastating consequences of earthquakes. Current forecasting research focuses on three key questions:
when the disaster will occur, where it will occur, and how large it will be. Scientists can estimate where an
earthquake is likely to occur, but predicting when it will occur and how powerful it will be remains a major
challenge. This project addresses that gap: we predict the time remaining before laboratory earthquakes
occur from real-time seismic data. Such predictions have the potential to improve earthquake hazard
assessments, saving lives and billions of dollars in infrastructure.
1. Introduction
The prediction of earthquakes in general has proved to be an essentially intractable challenge.
With modern computing power, machine learning techniques, and a significantly narrowed focus,
however, some headway may be possible. To this end, we predict the occurrence of laboratory
earthquakes from seismic signals.
Earthquake prediction research projects in a variety of countries have been reviewed in light of
achievements across the disciplines involved in earthquake prediction science: geodetic work,
tide-gauge observation, continuous observation of crustal movement, seismic activity and seismological
networks, seismic wave frequency, geotectonic work, geomagnetic and geoelectric work, and laboratory
work and its field application. Present-day progress suggests that meaningful prediction of at least some
classes of earthquakes might become feasible within a few tens of years, provided that basic data can be
collected steadily.
The goal is to identify hidden signals preceding earthquakes by listening to the acoustic signal
generated by a laboratory fault. We use machine learning to recognize the telltale sounds — much like a
squeaky door — that indicate when a quake will occur.
Acoustic/seismic precursors to failure appear to be a nearly universal material phenomenon. For
example, failure in granular materials is often preceded by impulsive acoustic/seismic precursors, many
of which are very small. Precursors are found in laboratory faults, and for the corresponding natural
earthquakes they are commonly, but not routinely, detected. We argue that the magnitudes of seismic
precursors can be very small, so they often go unrecorded or unidentified.
Laboratory system
During the inter-event interval, the driving piston displaces at a nearly constant speed of 5 μm/s
and accelerates momentarily during slip. The acoustic emission (AE) emanating from the shearing layers
is recorded by an accelerometer. As the system approaches failure, the rate of impulsive precursors
accelerates.
Our goal is to estimate the remaining time before the next failure (Figure 1a, bottom) using only
statistical features computed over local, moving time windows of the AE data.
We compute a collection of about 100 potentially relevant statistical features from each time
window (e.g., mean, variance, kurtosis, and autocorrelation), then recursively select the most useful
features. A Random Forest (RF) uses these selected features to estimate the remaining time before the
next failure.
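The pipeline described above — statistical features computed from moving time windows, fed to a Random Forest — can be sketched as follows. The window length, the four features, and the synthetic signal are illustrative stand-ins, not the roughly 100 features or the actual laboratory data:

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.ensemble import RandomForestRegressor

def window_features(window):
    """Statistical summary of one acoustic time window."""
    return [
        window.mean(),            # mean amplitude
        window.var(),             # variance (fluctuation level)
        kurtosis(window),         # sensitivity to rare large impulses
        np.abs(window).max(),     # peak amplitude
    ]

# Synthetic stand-in for the acoustic signal: noise whose variance
# grows as failure approaches, mimicking the laboratory observation.
rng = np.random.default_rng(0)
n_windows, win_len = 200, 1000
time_to_failure = np.linspace(10.0, 0.0, n_windows)   # target, in seconds
signals = [rng.normal(scale=1.0 + 0.5 * (10.0 - t), size=win_len)
           for t in time_to_failure]

X = np.array([window_features(w) for w in signals])
y = time_to_failure

# Fit the RF to map window statistics -> remaining time before failure.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(X.shape)  # (200, 4)
```

Each row of `X` summarizes one window, so each prediction depends only on that window's statistics, matching the "single time window" constraint described below.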
Each prediction uses only the acoustic signal within a single time window.
Therefore, by listening to the acoustic signal the system is currently emitting, we estimate the
time remaining before it fails — a "now" prediction based on the system's instantaneous physical
state, without any use of its history.
We find that statistics quantifying the distribution of the signal amplitude (e.g., its
variance and higher-order moments) are highly effective in predicting failure. The variance, which
characterizes the fluctuation about the mean amplitude of the signal, is the strongest single feature early in
the cycle. As the system nears failure, other outlier-sensitive statistics such as the kurtosis and threshold
counts also become predictive.
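As a toy illustration of why these moments are informative: variance measures the overall fluctuation level, while kurtosis responds far more strongly to rare large-amplitude, precursor-like impulses. The signals below are made up purely to show the effect:

```python
import numpy as np
from scipy.stats import kurtosis

quiet = np.array([1.0, -1.0] * 50)        # steady low-level oscillation
spiky = np.concatenate([quiet, [25.0]])   # same background plus one impulse

# A single outlier barely moves the mean but inflates the variance
# and (far more dramatically) the kurtosis.
print(quiet.var(), spiky.var())
print(kurtosis(quiet), kurtosis(spiky))
```

This is why kurtosis becomes informative only near failure, when the impulsive precursors grow large enough to dominate the higher-order moments.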
Far from failure, the raw signal exhibits small modulations that defy recognition by eye yet
persist throughout the stress cycle. As failure approaches, these modulations grow in amplitude, as
reflected in the increasing signal variance.
Our ML-driven analysis suggests that the system emits a small but steadily growing amount of
energy throughout the stress cycle, before the stored energy is suddenly released in a slip event.
The fact that the RF can make timing predictions under conditions it has never seen implies that
the time-series signal captures fundamental physics underlying the prediction.
2. Dataset
An earthquake occurs when large blocks of the Earth, mostly near tectonic plate interfaces,
abruptly slip along fractures or faults. Friction — the same force that holds the rock in place under
pressure — builds to a point where the rocks suddenly and violently slip past each other, releasing energy
as seismic waves. In the laboratory, researchers at Los Alamos National Laboratory imitated a real
earthquake using steel blocks interacting with rocky material (fault gouge) to produce slip events that
generate seismic sounds. The seismic (acoustic) data are collected with a piezoceramic sensor that
produces a voltage when deformed by incoming seismic waves; the input seismic data are these voltages,
recorded as integers. The team acknowledges that the laboratory experiment's physical characteristics
(such as shear stresses and thermal properties) differ from those of the real world. The seismic data were
recorded in bins of 4096 samples, with a 12-microsecond gap between bins — an artifact of the recording
instrument. Each input segment is a 0.0375-second chunk of seismic data (ordered in time) sampled at
4 MHz, i.e., 150,000 data points, and the output is the time in seconds until the next laboratory
earthquake. Both the training and the test set derive from the same experiment.
The data section contains train and test folders. The training dataset has two columns: one is the
acoustic data, which is the seismic signal itself, and the other is the time to failure, the time in seconds
until the next laboratory earthquake. The seismic signal consists of the waves of energy moving through
the layers of the earth. There is no time-continuous correlation between the training and test sets.
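Loading and segmenting such data might look like the following sketch. The two-column layout (`acoustic_data`, `time_to_failure`) and the 150,000-sample segment length come from the description above; the actual file isn't available here, so a synthetic frame stands in for it:

```python
import numpy as np
import pandas as pd

SEG_LEN = 150_000  # samples per segment: 0.0375 s at 4 MHz

# Synthetic stand-in with the same two-column layout described above.
n = 3 * SEG_LEN
train = pd.DataFrame({
    "acoustic_data": np.random.default_rng(0)
                       .integers(-50, 50, n).astype(np.int16),
    "time_to_failure": np.linspace(8.0, 0.0, n),
})

# Split the continuous recording into non-overlapping segments; each
# segment's target is the time_to_failure at its final sample.
segments = [train.iloc[i:i + SEG_LEN] for i in range(0, n, SEG_LEN)]
targets = [seg["time_to_failure"].iloc[-1] for seg in segments]
print(len(segments), len(segments[0]))  # 3 150000
```

Taking the target at the last sample of each segment mirrors the "time remaining at the moment of prediction" framing used throughout this paper.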
3. Features
Training Data:
Acoustic_data - the seismic signal [int16].
Time_to_failure - the time in seconds until the next laboratory earthquake.
Test Data:
Seg_id - the test segment ids (one prediction per segment) for which predictions should be
made.
Acoustic_data - the seismic signal [int16] for which the forecast is made.
Test Data Instances: 2624 files, each file having 150,000 instances => 393,600,000
instances.
4. Literature review
Khawaja M. Asim has proposed seismic-indicator-based earthquake prediction using Genetic
Programming and AdaBoost classification. An earthquake prediction system is proposed in that study by
combining seismic indicators with Genetic Programming (GP) and a GP-based ensemble approach. The
seismic indicators are computed with a new approach in which the indicators are chosen to capture full
knowledge of the region's seismic situation. The computed seismic indicators are used to develop an
Earthquake Predictor (EP-GPBoost) framework based on a GP-AdaBoost algorithm. The system was
designed to forecast earthquakes of magnitude 5.0 and above 15 days before the event. The Hindukush,
Chilean, and Southern California regions were used for evaluation. Thanks to the combination of the
strong search and boosting capabilities of GP and AdaBoost, EP-GPBoost provides a significant
improvement in earthquake prediction. Compared with contemporary methods, the system shows
improved performance in terms of sensitivity, specificity, and Matthews Correlation Coefficient for the
three regions considered.
Mustafa Ulukavak has presented a study of ionospheric TEC anomalies preceding global
earthquakes during 2000-2019 with respect to earthquake magnitude. In this work, the relationship
between ionospheric TEC anomalies and specific earthquake magnitude classes was explored before
major shocks. To this end, 2942 global Mw ≥ 6.0 earthquakes from 2000 to 2019, and the potential
ionospheric TEC disturbances preceding them, were investigated while accounting for 13 different index
values of space weather conditions (geomagnetic storm indices and indices of solar activity). Anomalies
in ionospheric TEC changes were established for 15 days before and 4 days after each earthquake using a
15-day moving-median method. Earthquakes were first classified by magnitude, and then negative and
positive TEC variations were examined on the quiet days prior to the earthquakes. Such anomalies were
found for the magnitude classes 6.0 ≤ Mw < 6.5, 6.5 ≤ Mw < 7.0, 7.0 ≤ Mw < 7.5, 7.5 ≤ Mw < 8.0,
8.0 ≤ Mw < 8.5, 8.5 ≤ Mw < 9.0, and 9.0 ≤ Mw < 9.5. The mean of the changes in these groups' TEC
anomalies is 44.2 per cent TECU, and the number of positive anomalies in each group was found to be
greater than the number of negative anomalies. These analyses demonstrate that regular changes in TEC
anomalies can provide important short-term precursors for major shocks prior to global earthquakes
(Mw ≥ 6).
5. Methodology
Unlike previously proposed single-model earthquake prediction approaches, multiple prediction
models are used in this paper. Each model adds further evidence about the robustness of the approach,
improving the final prediction model.
1. CatBoost Regressor
"Cat" comes from "category", as the library handles different categories of data; "Boost" comes
from gradient boosting, the machine learning algorithm on which the library is based. Gradient boosting
is a powerful machine learning algorithm commonly applied to many kinds of business problems, such as
fraud detection, product recommendation, and forecasting, and it performs well. In contrast to deep
learning models, which need to learn from a large amount of data, it can return very good results with
relatively little data. Using the CatBoost Regressor on this dataset, the obtained mean absolute error is
2.008.
2.Random Forest Regressor
A Random Forest is an ensemble technique that uses multiple decision trees to perform both
regression and classification tasks, and a technique called Bootstrap Aggregation, commonly known as
bagging. The basic idea behind this is to combine multiple decision trees in order to determine the final
production rather than depending on individual decision trees. Random forest improves the algorithm's
predictive power and avoids overfitting. It provides a robust feature value estimate and offers efficient test
error estimates without incurring the expense of repeated model training related to cross-validation.Mean
absolute error obtained for Random Forest Regressor is 2.017.
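The built-in test-error estimate mentioned above is the out-of-bag (OOB) score: each tree is evaluated on the samples its bootstrap draw left out, so no separate cross-validation loop is needed. In scikit-learn this is enabled with `oob_score=True`; the data below are synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=400)

# oob_score=True scores each tree on the samples excluded from its
# bootstrap draw, giving a generalization estimate for free.
rf = RandomForestRegressor(n_estimators=200, oob_score=True,
                           random_state=0).fit(X, y)
print(round(rf.oob_score_, 3))  # R^2 on out-of-bag samples
```

Because every training point is out-of-bag for roughly a third of the trees, the OOB score behaves much like a held-out validation score at no extra training cost.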
3. Support Vector Regressor
A Support Vector Machine can also be used as a regression tool, preserving the main
characteristic (maximum margin) that defines the algorithm. Support Vector Regression (SVR) follows
the same principles as the SVM for classification, with only a few minor differences. Because the output
is a real number, there are infinitely many possible values, which makes exact prediction difficult; for
regression, a tolerance margin (epsilon) is therefore set around the approximation. The mean absolute
error obtained with SVR is 2.72.
4. XGBoost
XGBoost is a machine learning algorithm based on an ensemble of decision trees that uses a
gradient boosting framework. Artificial neural networks tend to outperform all other algorithms in
prediction problems involving unstructured data (images, text, etc.). However, for small-to-medium
structured/tabular data, decision-tree-based algorithms are currently considered best-in-class.
6. Results and Discussion
In this paper we used four different machine learning algorithms to predict the time to
failure. Of these, the mean absolute error was best for the CatBoost algorithm. Absolute error
is the amount of error in a measurement: the difference between the measured value and the "true"
value. For example, if a scale reads 90 pounds but the true weight is 89 pounds, the scale has an
absolute error of 90 lbs - 89 lbs = 1 lb. This can be caused by the scale not measuring the exact
quantity of interest. For example, a scale accurate only to the nearest pound may "round up" a true
weight of 89.6 lbs to 90 lbs; the absolute error is then 90 lbs - 89.6 lbs = 0.4 lbs.
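Formalizing the description above, the mean absolute error averages the absolute differences between predictions and true values. Reusing the two scale readings from the example:

```python
import math
from sklearn.metrics import mean_absolute_error

y_true = [89.0, 89.6]   # true weights from the example above (lbs)
y_pred = [90.0, 90.0]   # what the scale reported

# MAE = mean(|y_true - y_pred|) = (1.0 + 0.4) / 2 = 0.7
mae = mean_absolute_error(y_true, y_pred)
print(round(mae, 2))  # 0.7
```

Because MAE averages raw deviations rather than squared ones, it penalizes large errors less severely than mean squared error and is read in the same units as the target (here, pounds; in this paper, seconds to failure).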
In this research, the main approaches to applying machine learning methods to the problem of
earthquake prediction have been surveyed. The main open-source earthquake catalogs and databases are
described, and the main metrics used for performance evaluation are defined. A detailed review of
published work highlights how scientific methods in this area of research have developed. Finally, in
discussing the results achieved, further directions for research in the field of earthquake prediction are
proposed.