Evaluation of effect of coal chemical properties on coal swelling index using artificial neural networks
Keywords: Coal chemical properties; Free swelling index; Artificial neural networks (ANNs); Cokeability; Back propagation neural network (BPNN)

Abstract

In this research, the effect of the chemical properties of coals on the coal free swelling index (FSI) has been studied by artificial neural network (ANN) methods. An ANN was applied to more than 300 datasets to evaluate the free swelling index, using about 10 input parameters. To select the best model for this study, the outputs of several candidate models were compared. A network with two hidden layers was found to be optimum, with 10 and 4 neurons in the first and second hidden layers, respectively, and one neuron in the output layer. The squared correlation coefficients (R²) of the training and test data reached 0.99 and 0.92, respectively. Sensitivity analysis shows that nitrogen (dry), carbon (dry), hydrogen (dry), Btu (dry), volatile matter (dry) and fixed carbon (dry) have positive effects, while moisture, oxygen (dry), ash (dry) and total sulfur (dry) have negative effects on the FSI. The fixed carbon was found to have the smallest effect (0.0425) on the FSI.

© 2011 Elsevier Ltd. All rights reserved.
1. Introduction

Coals are organic sedimentary rocks that have their origin in a variety of plant materials and tissues deposited in more or less aquatic locations (Loison, Foch, & Boyer, 1989). A coal is characterized by a number of chemical, physical, physico-chemical and petrographic properties. In proximate analysis, moisture, ash, volatile matter and fixed carbon are determined. Cokeability is an important technological parameter of coals during the reducing process in the electric furnace method, and it is usually characterized by the free swelling index.

The simplest test to evaluate whether a coal is suitable for the production of coke is the free swelling index test. It involves heating a small sample of coal in a standardized crucible to around 800 °C (1500 °F). After heating for a specified time, or until all volatiles are driven off, a small coke button remains in the crucible. The cross-sectional profile of this coke button, compared with a set of standardized profiles, determines the free swelling index (Thomas, 2002).

The free swelling index in British Standards Institution (BSI) nomenclature (the crucible swelling number, CSN, in ISO nomenclature) is a measure of the increase in the volume of coal when heated, with the exclusion of air. This parameter is useful in evaluating coals for coking and combustion. Coals with a low free swelling index (0–2) are not suitable for coke manufacture. Coals with high swelling numbers (+8) cannot be used by themselves to produce coke either, as the resultant coke is usually weak and will not support the loads imposed within the blast furnace (Thomas, 2002).

When bituminous coals are heated, they develop plastic properties at about 350 °C and as a result exhibit fluidity, swelling, and expansion and contraction in volume; after carbonization they produce a coherent residue whose strength depends on the rank of the coal. This plastic property of coals is commonly indicated by the free swelling index, Gieseler plastometry, Ruhr dilatometry, the Audibert-Arnu dilatometer and the Gray-King assay tests.

The Gieseler plastometer and the Ruhr dilatometer are commonly used to study coals' plastic properties for coke making. In Gieseler plastometry, the softening temperature, re-solidification temperature and maximum fluidity of coals are determined to predict their cokeability. In Ruhr dilatometry, the coking capacity G, defined by Simonis as (Price & Grandsen, 1987)

G = [(E + V)/2] · [(c + d)/(V − c − E)]    (1)

is used to predict the cokeability of coals. When a coal particle is heated, its surface becomes plastic while devolatilization occurs from both inside and outside the particle.

Various parameters such as coal type, heating conditions and coal properties affect the free swelling index. For example, Kidena studied the effect of the hydrogen/carbon and oxygen/carbon ratios, volatile matter and heating conditions on the CSN (Kidena, 2007). In this work, the effect of coal chemical properties on the free swelling index was studied.

⇑ Corresponding author. E-mail address: [email protected] (S. Khoshjavan).
doi:10.1016/j.eswa.2011.04.084
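Eq. (1) is straightforward to transcribe into code. In the sketch below the dilatometer quantities E, V, c and d follow Price and Grandsen (1987), the denominator is read as V − c − E, and the function name is illustrative; treat both the reading of the formula and the sample numbers as assumptions rather than reference values.

```python
def coking_capacity(E, V, c, d):
    """Simonis coking capacity of Eq. (1): G = [(E + V)/2] * [(c + d)/(V - c - E)].

    E, V, c and d are the Ruhr dilatometer test quantities of
    Price & Grandsen (1987); the values passed in are illustrative only.
    """
    return ((E + V) / 2.0) * ((c + d) / (V - c - E))
```

For example, `coking_capacity(1.0, 10.0, 2.0, 3.0)` evaluates the two bracketed factors, (1 + 10)/2 and (2 + 3)/(10 − 2 − 1), and multiplies them.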
S. Khoshjavan et al. / Expert Systems with Applications 38 (2011) 12906–12912 12907
Artificial neural network (ANN) is an empirical modeling tool, which is analogous to the behavior of biological neural structures (Yao, Vuthaluru, Tade, & Djukanovic, 2005). Neural networks are powerful tools that have the ability to identify underlying highly complex relationships from input–output data only (Haykin, 1999). Over the last 10 years, artificial neural networks (ANNs), and in particular feed-forward artificial neural networks (FANNs), have been extensively studied as process models, and their use in industry has been growing rapidly (Ungar, Hartman, Keeler, & Martin, 1996). In this investigation, 10 input parameters were used: moisture, volatile matter (dry), fixed carbon (dry), ash (dry), total sulfur (organic and pyritic) (dry), Btu/lb (dry), carbon (dry), hydrogen (dry), nitrogen (dry) and oxygen (dry).

The procedure of ANN modeling usually comprises the following stages:

1. choosing the parameters of the ANN;
2. collecting the data;
3. pre-processing the database;
4. training the ANN;
5. simulation and prediction using the trained ANN.

In this paper, these stages were followed in developing the model.

2. Material and methods

2.1. Data set

The collected data were divided into training and testing datasets using a sorting method to maintain statistical consistency. Datasets for testing were extracted at regular intervals from the sorted database and the remaining datasets were used for training. The same datasets were used for all networks to make a comparable analysis of the different architectures. In the present study, more than 300 datasets were collected, among which 10% were chosen for testing. These data were collected from the Illinois state coal mines and geological database (http://www.isgs.illinois.edu/maps-data-pub/coal-maps/nonconf_masterfile.xls).

2.2. Input parameters

In the current study, the input parameters include moisture, ash (dry), volatile matter (dry), fixed carbon (dry), total sulfur (dry), Btu (dry), carbon (dry), hydrogen (dry), nitrogen (dry) and oxygen (dry) for predicting the FSI. The ranges of the input variables for FSI prediction for the 300 samples are shown in Table 1.

Table 1
The ranges of variables in coal samples (as determined).

Coal chemical properties    Max        Min       Mean       St. dev.
Moisture (%)                15.94      6.03      10.32      2.21224
Volatile matter, dry (%)    45.10      25.49     36.87      2.458445
Fixed carbon, dry (%)       60.39      30.70     50.58      4.152964
Ash, dry (%)                43.81      4.41      12.56      4.861197
Total sulfur, dry (%)       9.07       0.62      3.00       2.018264
Btu/lb, dry                 14076.00   8025.00   12631.08   841.5436
Carbon, dry (%)             79.32      44.03     70.43      5.026348
Hydrogen, dry (%)           5.36       3.39      4.78       0.310245
Nitrogen, dry (%)           3.03       0.35      1.40       0.290988
Oxygen, dry (%)             12.57      2.16      7.53       1.660288
Free swelling index         8.50       1.00      4.39       1.268707

2.3. Artificial neural network design and development

Artificial neural network models have been studied for two decades, with the objective of achieving human-like performance in many fields of knowledge engineering. Neural networks are powerful tools that have the ability to identify underlying highly complex relationships from input–output data only (Plippman, 1987). The study of neural networks is an attempt to understand the functionality of the brain. Essentially, an ANN is an approach to artificial intelligence in which a network of processing elements is designed, and the mathematics carries out information processing for problems whose solutions require knowledge that is difficult to describe (Stephen, 1990; Zeidenberg, 1990).

ANNs are derived from their biological counterparts; they are based on the concept that a highly interconnected system of simple processing elements (also called "nodes" or "neurons") can learn complex nonlinear interrelationships existing between the input and output variables of a data set (Vuthaluru, Brooke, Zhang, & Yan, 2003).

For developing an ANN model of a system, a feed-forward architecture, namely the multilayer perceptron (MLP), is most commonly used. This network usually consists of a hierarchical structure of three layers described as the input, hidden, and output layers, comprising I, J, and L processing nodes, respectively (Vuthaluru et al., 2003). A general MLP architecture with two hidden layers is shown in Fig. 1. When an input pattern is introduced to the neural network, the synaptic weights between the neurons are stimulated and these signals propagate through the layers until an output pattern is formed. Depending on how close the formed output pattern is to the expected output pattern, the weights between the layers and the neurons are modified in such a way that the next time the same input pattern is introduced, the neural network will provide an output pattern that is closer to the expected response (Patel et al., 2007).

Various algorithms are available for training neural networks. The feed-forward back-propagation algorithm is the most versatile and robust technique, providing the most efficient learning procedure for multilayer perceptron (MLP) neural networks. Moreover, the fact that the back-propagation algorithm is especially capable of solving predictive problems makes it popular. The network model presented in this article, developed in Matlab 7.1 using the neural network toolbox, is a supervised back-propagation neural network making use of the Levenberg–Marquardt approximation. This algorithm is more powerful than the commonly used gradient descent methods, because the Levenberg–Marquardt approximation makes training more accurate and faster near minima on the error surface (Lines & Treitel, 1984). The method is as follows:

ΔW = (JᵀJ + μI)⁻¹ Jᵀe    (2)

In Eq. (2) the weight adjustment ΔW is calculated using the Jacobian matrix J, its transpose Jᵀ, a constant multiplier μ, the identity matrix I and an error vector e. The Jacobian matrix contains the derivatives of the errors with respect to the weights:

J = [∂E/∂w_ij, …, ∂E/∂w_m]    (3)

If the scalar μ is very large, the Levenberg–Marquardt algorithm approximates the normal gradient descent method, while if it is small, the expression transforms into the Gauss–Newton method (Haykin, 1999). For more detailed information the reader is referred to Lines and Treitel (1984).
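The Levenberg–Marquardt update of Eq. (2) can be sketched in a few lines of code. The following is a minimal numerical illustration, not the Matlab toolbox routine used in the study: the Jacobian is estimated by finite differences, and all names are illustrative.

```python
import numpy as np

def lm_step(w, x, t, forward, mu):
    """One Levenberg-Marquardt update, Eq. (2): dW = (J^T J + mu*I)^-1 J^T e.

    w       : current weight vector
    x, t    : inputs and measured targets
    forward : model function, forward(w, x) -> predictions
    mu      : damping factor (large -> gradient descent, small -> Gauss-Newton)
    """
    e = t - forward(w, x)                      # error vector
    eps = 1e-6                                 # finite-difference step
    J = np.zeros((e.size, w.size))             # Jacobian of errors w.r.t. weights
    for i in range(w.size):
        dw = np.zeros_like(w)
        dw[i] = eps
        J[:, i] = ((t - forward(w + dw, x)) - e) / eps
    delta = np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
    return w - delta
```

For a linear model this single step with a small μ recovers the Gauss–Newton (least-squares) solution; increasing μ shrinks the step toward plain gradient descent, which is exactly the adaptation described in the text.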
Fig. 1. MLP architecture with two hidden layers (Patel et al., 2007).
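The forward pass through an MLP like the one in Fig. 1 can be sketched as follows. The layer sizes (10–10–4–1) mirror the optimum network reported later in this study, but the weights themselves, and the use of a logistic (LOGSIG) activation in every layer, are illustrative assumptions.

```python
import numpy as np

def logsig(z):
    """Logistic (LOGSIG) activation, cf. Eq. (13)."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers):
    """Forward pass through a feed-forward MLP.

    `layers` is a list of (W, b) pairs; each neuron computes the weighted
    sum of its inputs plus a bias (the net input of Eq. (4)) and then
    applies the activation.
    """
    a = x
    for W, b in layers:
        a = logsig(a @ W + b)   # net input -> activation
    return a
```

A 10–10–4–1 network is then just three (W, b) pairs, and its single output lies in (0, 1) because of the logistic activation; in practice the FSI target would be scaled into this range before training.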
After each successful step (lower errors) the constant μ is decreased, forcing the adjusted weight matrix to move as quickly as possible toward the Gauss–Newton solution. When the errors increase after a step, the constant μ is subsequently increased. The weights of the adjusted weight matrix (Eq. (2)) are used in the forward pass. The mathematics of both the forward and the backward pass is briefly explained in the following.

The net input (net_pj) of neuron j in a layer L and the output (o_pj) of the same neuron for the pth training pair (i.e. the inputs and the corresponding swelling index value of a sample) are calculated by:

net_pj = Σ_n w_jn · o_pn    (4)

where the sum runs over the neurons n of the preceding layer. The output of a neuron is obtained by passing this net input through a transfer function; the hyperbolic tangent (TANSIG) transfer function is

f = (eˣ − e⁻ˣ) / (eˣ + e⁻ˣ)    (9)

where x is the weighted sum of the inputs for a processing unit. The number of hidden layers and of neurons in each hidden layer determine to a great extent the final configuration and training constraints of the network (Haykin, 1999).

2.4. Training and testing of the model

As mentioned above, the input layer has ten neurons X_i, i = 1, 2, …, 10. The number of neurons in the hidden layer is supposed to be Y, the outputs of which are denoted P_j, j = 1, 2, …, Y. The output layer has one neuron, which denotes the free swelling index. It is assumed that the connection weight matrix between the input and hidden layers is W_ij, and the connection weight matrix between the hidden and output layers is W_jk.

The number of input and output neurons is the same as the number of input and output variables. For this research, different multilayer network architectures were examined (Table 2).

Table 2
Results of a comparison between some of the models.

No.   Transfer function            Model       SSE
1     TANSIG–LOGSIG                10–5–1      1.34
2     LOGSIG–LOGSIG                10–7–1      0.7
3     LOGSIG–LOGSIG–LOGSIG         10–4–3–1    1.21
4     TANSIG–TANSIG–LOGSIG         10–5–3–1    1.02
5     LOGSIG–LOGSIG–LOGSIG         10–6–4–1    0.46
6     LOGSIG–LOGSIG–LOGSIG         10–7–4–1    0.3
7     LOGSIG–LOGSIG–LOGSIG         10–8–4–1    0.15
8     LOGSIG–LOGSIG–LOGSIG         10–8–6–1    0.03
9     LOGSIG–LOGSIG–LOGSIG         10–10–4–1   0.014

During the design and development of the neural network for this study, it was determined that a four-layer network with 14 neurons in the hidden layers (two layers) would be most appropriate. The artificial neural network (ANN) architecture for predicting the free swelling index is shown in Fig. 5.

To determine the optimum network, the SSE was calculated for the various models by the following formula:

SSE = Σ (T_i − O_i)² / N    (10)

where T_i, O_i and N represent the measured output, the predicted output and the number of input–output data pairs, respectively (Haykin, 1999).

The learning rate of the network was adjusted so that the training time was minimized. During the training, several parameters had to be closely watched. It was important to train the network long enough so that it would learn all the examples that were provided. It was equally important to avoid over-training, which would cause memorization of the input data by the network. During the course of training, the network continuously tries to correct itself and achieve the lowest possible error (global minimum) for every example to which it is exposed. The network performance during the training process is shown in Fig. 6; as shown, the optimum number of training epochs was about 400.

For the evaluation of a model, a comparison between the predicted and measured values of the FSI can be performed. For this purpose, the mean absolute error (Ea) and the mean relative error (Er) can be used. Ea and Er are computed as follows (Haykin, 1999):

Ea = (1/N) Σ |T_i − O_i|    (11)

Er = (1/N) Σ |T_i − O_i| / T_i    (12)

where T_i and O_i represent the measured and predicted outputs. For the optimum model, Ea and Er were equal to 0.02627 and 0.006633, respectively. Comparisons between the measured and predicted free swelling index for the training and testing data are shown in Figs. 7 and 8, respectively. The correlations between the measured and predicted free swelling index for the training and testing data indicate that the network has a high ability for predicting the free swelling index (Figs. 9 and 10).

2.5. Sensitivity analysis

A useful concept has been proposed to identify the significance of each input factor on the outputs using a trained network. This enables us to hierarchically recognize the most sensitive factors affecting the coal swelling index. This is performed by incorporating values of the 'relative strength of effect' (RSE) (Kim, Bae, et al., 2001). After a BPNN has been trained successfully, the neural network is no longer allowed to adapt. The output of a one-hidden-layer network can be written as:

o_k = 1 / (1 + e^(−e_k))    (13)

where

e_k = Σ_j o_j w_jk + θ_k,   o_j = 1 / (1 + e^(−e_j))    (14)

e_j = Σ_i o_i w_ij + θ_j    (15)

where w is a connection weight, θ is a threshold and o_i is the value of input unit i. Thus, we have:
O_k = 1 / (1 + e^(−(Σ_j w_jk · (1 / (1 + e^(−(Σ_i w_ij o_i + θ_j)))) + θ_k)))    (16)

Since the activation function (Eq. (13)) is sigmoid, it is differentiable. The variation of O_k with a change of O_i for a network with n hidden layers can be calculated by differentiation of the following equation:

∂O_k/∂O_i = Σ_{j_n} Σ_{j_{n−1}} … Σ_{j_1} w_{j_n k} G(e_k) w_{j_{n−1} j_n} G(e_{j_n}) w_{j_{n−2} j_{n−1}} G(e_{j_{n−1}}) … w_{i j_1} G(e_{j_1})    (17)

where G(e_k) = e^(−e_k) / (1 + e^(−e_k))², and O_{j_n}, O_{j_{n−1}}, O_{j_{n−2}}, …, O_{j_1} denote the hidden units in the n, n−1, n−2, …, 1 hidden layers, respectively (Kim et al., 2001).

Obviously, no matter what the neural network approximates, all items on the right-hand side of Eq. (17) always exist. According to Eq. (17), a new parameter RSE_ki can be defined as the RSE for input unit i on output unit k (Kim et al., 2001).

Definition of RSE: For a given sample set S = {s1, s2, s3, …, sj, …, sr}, where s_j = {X, Y}, X = {x1, x2, x3, …, xp}, Y = {y1, y2, y3, …, yp}, if there is a neural network trained by the back-propagation algorithm with this set of samples, the RSE_ki exists as:

RSE_ki = C Σ_{j_n} Σ_{j_{n−1}} … Σ_{j_1} w_{j_n k} G(e_k) w_{j_{n−1} j_n} G(e_{j_n}) w_{j_{n−2} j_{n−1}} G(e_{j_{n−1}}) … w_{i j_1} G(e_{j_1})    (18)

where C is a normalizing constant which controls the maximum absolute value of RSE_ki to unity, and the function G denotes the differentiation of the activation function. G, w and e are all the same as in Eq. (17).

It should be noted that the scaling of the RSE is done with respect to the corresponding output unit, which means that all RSE values for every input unit on the corresponding output unit are scaled with the same scale coefficient. Hence, it is clear that the RSE ranges from −1 to 1 (Kim et al., 2001).

Compared with Eq. (17), the RSE is similar to the derivative except for its scaling value, but it is a different concept from the differentiation of the original mapping function. The RSE is a parameter which can be used to measure the relative importance of the input factors to the output units, and it shows only the relative dominance rather than the differentiation of one-to-one input and output. The larger the absolute value of the RSE, the greater the effect the corresponding input unit has on the output unit. Also, the sign of the RSE indicates the direction of influence: a positive action applies to the output when RSE > 0, and a negative action applies when RSE < 0. Here, a positive action denotes that the output increases with an increment of the corresponding input and decreases with a reduction of the corresponding input. On the contrary, a negative action indicates that the output decreases when the corresponding input increases and increases when the corresponding input decreases. The output has no relation to the input if RSE = 0. The RSE is a dynamic parameter which changes with the variance of the input factors. In the following, the RSE is used for a sensitivity analysis of the influence of the factors on the free swelling index predicted by the trained neural network.

Fig. 11 shows the average RSE values of the factors calculated for all of the 250 field data used in the previous sections. It can be seen in Fig. 11 that 'moisture' and 'nitrogen' are the most sensitive parameters. The remaining factors, Btu (dry), carbon (dry), fixed carbon (dry), hydrogen (dry), oxygen (dry), total sulfur (dry) and volatile matter (dry), were also studied with the neural network method. A positive value of the RSE indicates that if the corresponding input increases ('carbon', for example, has a positive RSE, see Fig. 11), the FSI will increase; the inverse effect takes place in the case of a negative RSE (e.g. 'ash').

Fig. 6. Network performance during the training process.
Fig. 7. Comparison of measured and predicted free swelling index for different samples of training data.
Fig. 8. Comparison of measured and predicted free swelling index for different samples for test data.
Fig. 9. Correlation between measured and predicted free swelling index for training data (R² = 0.9967).
Fig. 10. Correlation between measured and predicted free swelling index for testing data (R² = 0.9233).

3. Discussion

In this investigation, the effect of coal chemical properties on the free swelling index was studied. Results from the neural network showed that nitrogen (dry), carbon (dry), hydrogen (dry), Btu (dry), volatile matter (dry) and fixed carbon (dry) had positive effects on the FSI, in that order. The negative effects of the input parameters were related to moisture, oxygen (dry), ash (dry) and total sulfur (dry), respectively. Figs. 7 and 8 show that the measured and predicted free swelling index values are similar. The results of the artificial neural network show that the squared correlation coefficients (R²) of the training and test data reached 0.9967 and 0.9181, respectively.

Fig. 11. Sensitivity analysis between the free swelling index and the coal chemical properties. The average RSE values are:

Ash (dry)               −0.48
Btu (dry)                0.405
Carbon (dry)             0.525
Fixed carbon (dry)       0.0425
Hydrogen (dry)           0.43
Moisture                −0.665
Nitrogen (dry)           0.845
Oxygen (dry)            −0.545
Total sulfur (dry)      −0.4275
Volatile matter (dry)    0.375
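For a one-hidden-layer sigmoid network, Eqs. (13)–(18) reduce to a chain-rule product of weights and sigmoid derivatives, normalized so that the largest |RSE| equals 1. The sketch below implements exactly that case; the weights and data are illustrative, not the trained network of this study (which has two hidden layers), and biases are folded into the theta terms of Eqs. (14) and (15).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rse(W_ih, theta_h, w_ho, theta_o, X):
    """RSE of each input on the single output, Eqs. (13)-(18), one hidden layer.

    W_ih : (n_inputs, n_hidden) input-to-hidden weights
    w_ho : (n_hidden,) hidden-to-output weights
    X    : (n_samples, n_inputs) input patterns; RSE is averaged over samples
    """
    e_j = X @ W_ih + theta_h                        # hidden net inputs, Eq. (15)
    o_j = sigmoid(e_j)                              # hidden outputs, Eq. (14)
    e_k = o_j @ w_ho + theta_o                      # output net input, Eq. (14)
    G = lambda e: sigmoid(e) * (1.0 - sigmoid(e))   # sigmoid derivative, as in Eq. (17)
    # Chain rule of Eq. (17): dO_k/dx_i = G(e_k) * sum_j w_ho[j] G(e_j[j]) W_ih[i, j]
    d = G(e_k)[:, None] * ((G(e_j) * w_ho) @ W_ih.T)
    avg = d.mean(axis=0)
    return avg / np.max(np.abs(avg))                # normalize so max |RSE| = 1, Eq. (18)
```

The normalization plays the role of the constant C in Eq. (18): the dominant input gets RSE = ±1, and the sign of each entry gives the direction of influence described in the text.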
4. Conclusions

In this research, an artificial neural network approach was employed to evaluate the effects of the chemical properties of coal on the FSI. The input parameters were moisture, volatile matter (dry), fixed carbon (dry), ash (dry), total sulfur (organic and pyritic) (dry), Btu/lb (dry), carbon (dry), hydrogen (dry), nitrogen (dry) and oxygen (dry). According to the results obtained from this research, the optimum ANN architecture was found to have 10 and 4 neurons in the first and second hidden layers, respectively, and one neuron in the output layer. Higher nitrogen (dry), carbon (dry) and Btu (dry) contents in coal result in a higher free swelling index, while higher moisture, oxygen (dry) and ash (dry) contents result in a lower free swelling index. The results of the ANN method show that the squared correlation coefficients (R²) of the training and test data reached 0.9967 and 0.921, respectively.

The results of the sensitivity analysis show that nitrogen (dry), moisture, oxygen (dry), carbon (dry), ash (dry), total sulfur (dry), Btu (dry) and volatile matter (dry) were effective parameters on the free swelling index (Fig. 11). Regarding the network training performance, the error of the training network was minimized at about 400 epochs, after which a suitable performance was achieved. The results of the neural network showed that nitrogen (dry), carbon (dry), hydrogen (dry), Btu (dry), volatile matter (dry) and fixed carbon (dry) had positive effects on the free swelling index, in that order. The negative effects of the input parameters were related to moisture, oxygen (dry), ash (dry) and total sulfur (dry), respectively. The fixed carbon was found to have the smallest effect (0.0425) on the FSI.

References

Demuth, H., & Beale, M. (1994). Neural network toolbox user's guide. MA: The MathWorks Inc.
Haykin, S. (1999). Neural networks – A comprehensive foundation (2nd ed.). USA: Prentice-Hall.
Kidena, K. (2007). Prediction of thermal swelling behavior on rapid heating using basic analytical data. Energy & Fuels, 21(2), 1038–1041.
Kim, C. Y., Bae, G. J., et al. (2001). Neural network based prediction of ground surface settlements due to tunneling. Computers and Geotechnics, 28, 517–547.
Lines, L. R., & Treitel, S. (1984). A review of least-squares inversion and its application to geophysical problems. Geophysical Prospecting, 32.
Loison, R., Foch, P., & Boyer, A. (1989). Coke quality and production (2nd ed.). London: Butterworths.
Patel, S. U., et al. (2007). Estimation of gross calorific value of coals using artificial neural networks. Fuel, 86, 334–344.
Plippman, R. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, 4, 4–22.
Price, J. T., & Grandsen, J. F. (1987). Improving coke quality with Canadian coals. In Proceedings of the 1st international cokemaking congress, Essen, section E3, preprints.
Stephen, J. J. (1990). Neural network design and the complexity of learning (p. 20). Cambridge, MA: MIT Press.
Thomas, L. (2002). Coal geology. Wiley.
Ungar, L. H., Hartman, E. J., Keeler, J. D., & Martin, G. D. (1996). Process modelling and control using neural networks. In American Institute of Chemical Engineers symposium series (Vol. 92, pp. 57–66).
Vuthaluru, H. B., Brooke, R. J., Zhang, D. K., & Yan, H. M. (2003). Effects of moisture and coal blending on Hardgrove grindability index of Western Australian coal. Fuel Processing Technology, 67–76.
Yao, H. M., Vuthaluru, H. B., Tade, M. O., & Djukanovic, D. (2005). Artificial neural network-based prediction of hydrogen content of coal in power station boilers. Fuel, 84, 1535–1542.
Zeidenberg, M. (1990). Neural network models in artificial intelligence (p. 16). New York: E. Horwood.