Granger causality test with nonlinear neural-network-based methods: Python package and simulation study

Maciej Rosoł, Marcel Młyńczak, Gerard Cybulski

Faculty of Mechatronics, Institute of Metrology and Biomedical Engineering, Warsaw University of Technology, Warsaw, Poland

Computer Methods and Programs in Biomedicine 216 (2022). Contents lists available at ScienceDirect; journal homepage: www.elsevier.com/locate/cmpb. Revised 25 January 2022; accepted 26 January 2022.

Abstract

Background and objective: Causality as defined by Granger in 1969 is a widely used concept, particularly in neuroscience and economics. As there is increasing interest in nonlinear causality research, a Python package with a neural-network-based causality analysis approach was created. It allows performing causality tests using neural networks based on Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), or Multilayer Perceptron (MLP). The aim of this paper is to present the nonlinear method for causality analysis and the created Python package.

Methods: The created functions, together with autoregressive (AR) and Generalized Radial Basis Functions (GRBF) neural network models, were tested on simulated signals in two cases: with a nonlinear dependency, and with absence of causality from the Y to the X signal. A train-test split (70/30) was used. The prediction errors obtained on the test set were compared using the Wilcoxon signed-rank test to determine the presence of causality. For the chosen model, the proposed method of studying the change of causality over time was presented.

Results: In the case when X was a polynomial of Y, the nonlinear methods were able to detect the causality, while the AR model did not manage to indicate it. The best results (in terms of prediction accuracy) were obtained for the MLP for the lag of 150, with a lower MSE than the 0.041 and 0.036 obtained for AR and GRBF, respectively.
When there was no causality between the signals, none of the proposed and AR models indicated false causality, while it was detected by the GRBF models in one case. Only the proposed models gave the expected results in each of the tested scenarios.

Conclusions: The proposed methods appeared to be superior to the compared methods. They were able to detect nonlinear causality, make accurate forecasts, and not indicate false causality. The created package enables easy usage of neural networks to study the causal relationship between signals. The neural-network-based approach is a suitable method that allows the detection of a nonlinear causal relationship, which cannot be detected by the classical Granger method. Unlike other similar tools, the package allows for the study of changes in causality over time.

© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

1.1. Granger causality

The causality concept discussed in this paper was presented by Sir Clive Granger in 1969 [1]. Nowadays it is widely used in economics [2-5] and neuroscience [5-14]. Recently, there has also been growing interest in Granger causality in the field of physiology, where it can be used for searching for the cause of phenomena and even as a physiological marker [15-20]. In this approach, a causal relationship (where the time series Y is the cause of the time series X, denoted by Y→X) occurs if the variance of the prediction error of X based on the past of all available information (U) is statistically significantly smaller than the variance of the prediction error of X based on the past of all available information except for the time series Y (Eq. (1)). In practice, some researcher-defined number of lagged values of the X and Y time series is treated as all available information, while past lags of X alone are treated as all available information except for Y.
\sigma^2(X|U) < \sigma^2(X|U - Y)    (1)

To test hypotheses about causality, two linear autoregressive (AR) models and covariance-stationary time series X and Y are assumed. The first model predicts the current value of the time series X based only on p lagged values of X and Y (Eq. (2)); the second one predicts the current value of Y based on the same values but with different coefficients (Eq. (3)). In the equations shown below, A are the regression coefficients and E are the prediction errors:

X(t) = \sum_{i=1}^{p} A_{11,i} X(t-i) + \sum_{i=1}^{p} A_{12,i} Y(t-i) + E_X(t)    (2)

Y(t) = \sum_{i=1}^{p} A_{21,i} X(t-i) + \sum_{i=1}^{p} A_{22,i} Y(t-i) + E_Y(t)    (3)

If, in the first equation, the variance of the error E_X is statistically significantly smaller for the model taking into account the variable Y (coefficients A_{12} different from zero) than for the same model without the variable Y (coefficients A_{12} equal to zero), it means that the variable Y is the G-cause of the variable X [21]. Similarly, if the variance of the error E_Y in the second equation is statistically significantly smaller for the model including the variable X (coefficients A_{21} other than zero) than for a model in which the variable X was not included (coefficients A_{21} equal to zero), it means that the variable X is the G-cause of the variable Y.

To test whether the variable Y is G-causing the variable X, the F-test can be applied [22]. First, the residual sums of squares (RSS) are calculated for the model making predictions only from past values of X (Eq. (4)) and for the model that uses past values of both the X and Y time series for prediction (Eq. (5)) [23]:

RSS_X = \sum_t E_X(t)^2    (4)

RSS_{XY} = \sum_t E_{XY}(t)^2    (5)
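The restricted and full AR fits and their residual sums of squares (Eqs. (2), (4), (5)), together with the F statistic built from them, can be sketched in a few lines of NumPy/SciPy. This is an illustrative stand-in, not the package's implementation: the `lagged_design`, `ar_rss`, and `granger_f_test` helpers are hypothetical names, and plain OLS stands in for whatever estimation routine a real analysis would use.

```python
import numpy as np
from scipy import stats

def lagged_design(x, y, p, use_y=True):
    """Design matrix with p lags of X (Eq. (2)) and, optionally, p lags of Y."""
    cols = [x[p - i - 1:len(x) - i - 1] for i in range(p)]       # X(t-1)..X(t-p)
    if use_y:
        cols += [y[p - i - 1:len(y) - i - 1] for i in range(p)]  # Y(t-1)..Y(t-p)
    cols.append(np.ones(len(x) - p))                             # intercept
    return np.column_stack(cols)

def ar_rss(x, y, p, use_y=True):
    """OLS fit of the AR model; returns the residual sum of squares (Eqs. (4)-(5))."""
    design, target = lagged_design(x, y, p, use_y), x[p:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f_test(x, y, p):
    """F statistic comparing the restricted (X-only) and full (X and Y) models."""
    rss_x, rss_xy = ar_rss(x, y, p, use_y=False), ar_rss(x, y, p, use_y=True)
    n_pred = len(x) - p
    s1 = ((rss_x - rss_xy) / p) / (rss_xy / (n_pred - 2 * p - 1))
    return s1, stats.f.sf(s1, p, n_pred - 2 * p - 1)
```

On a toy pair where X is a delayed, noisy copy of Y, the full model's RSS drops sharply and the F-test rejects the null of no causality; on independent noise it does not.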
Based on this, the test statistic for the F-test (S_1) can be computed according to Eq. (6), where l is the number of considered lags and T is the number of predicted values [24]:

S_1 = \frac{(RSS_X - RSS_{XY}) / l}{RSS_{XY} / (T - 2l - 1)}    (6)

The value of the S_1 test statistic follows the Fisher distribution with l and (T - 2l - 1) degrees of freedom. To test whether Y causes X (Y→X), the F-test is performed under the null hypothesis that Y does not cause X and the alternative hypothesis that Y is causing X:

S_1 \sim F(l, T - 2l - 1)

Alternatively, the Chi-squared test may be applied, with the test statistic computed according to Eq. (7) [23]:

S_2 = \frac{T (RSS_X - RSS_{XY})}{RSS_{XY}}    (7)

Causality can not only be tested for occurrence but also quantified, according to Eq. (8), where \sigma^2_X denotes the variance of the error obtained from the model based only on the past values of X, and \sigma^2_{XY} is the variance of the error obtained from the model based on the past of both signals [6,7,25]:

F_{Y \to X} = \ln\left(\frac{\sigma^2_X}{\sigma^2_{XY}}\right)    (8)

If three time series (X, Y, Z) are available, a mutual Granger causality analysis performed pairwise on two variables at a time is not able to reveal some relationships between the data. For example, if Y is the signal causing Z, and Z is the signal causing X, but Y is not the signal causing X, then a two-variable analysis will be indistinguishable from the situation where Y is the signal causing both Z and X. This situation is presented in Fig. 1: with a causality analysis for two time series, for both presented cases the obtained result will suggest the presence of the relationship presented in part A of Fig. 1.

In order to distinguish such situations, a conditional Granger causality test can be used. It is able to indicate whether the past of the Y signal helps to reduce the variance of the prediction error of X predicted from the past of the X and Z time series.
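The conditional test described next (Eq. (9)) amounts to appending p lagged columns of the third series Z to the same kind of lagged design matrix used for the two-variable case. A minimal, illustrative sketch of building such a matrix (the `conditional_design` name and the column ordering are assumptions, not the package's API):

```python
import numpy as np

def conditional_design(x, y, z, p, use_y=True):
    """Design matrix with p lags of X and Z and, optionally, p lags of Y (Eq. (9)).

    Rows are aligned with the target x[p:]; the restricted model for the
    conditional test Y->X|Z is obtained with use_y=False.
    """
    series = [x, z] + ([y] if use_y else [])
    cols = [s[p - i - 1:len(s) - i - 1] for s in series for i in range(p)]
    cols.append(np.ones(len(x) - p))  # intercept
    return np.column_stack(cols)
```

Comparing the residual variances of models fitted on the `use_y=True` and `use_y=False` matrices then tests Y→X conditioned on Z, exactly as in the bivariate case.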
In the case of conditional causality analysis, the linear autoregressive equation presented in Eq. (2) is extended by the sum of the products of the respective coefficients (A_{13}) and the third variable Z, so that this variable is also used for forecasting the present value of X, as presented in Eq. (9):

X(t) = \sum_{i=1}^{p} A_{11,i} X(t-i) + \sum_{i=1}^{p} A_{12,i} Y(t-i) + \sum_{i=1}^{p} A_{13,i} Z(t-i) + E_X(t)    (9)

Similarly to the causality analysis for two time series, two models are created in the conditional causality analysis. The first one does not take into account the past of the variable Y (coefficients A_{12} equal to 0); the second one takes into account the past of all variables (coefficients A_{12} different from 0). If the error E_X is statistically significantly smaller for the second model, it means that Y is causing X conditioned on Z, which is written as Y→X|Z.

1.2. Nonlinear methods for Granger causality analysis

The main limitation of the approach presented by Granger is the usage of a linear model. Since many real data turn out to be non-stationary processes, this introduces a large limitation for the above-mentioned methods, which assume the stationarity of the tested time series. It is possible to overcome the issue of non-stationarity of the time series by the usage of vector error correction models; however, the modeling is still based on linear autoregression [26,27]. A further limitation coming from the usage of the AR model is that more complex causality dependencies may not be captured by this method [21,28]. Thus, many researchers use other models for prediction instead of linear autoregression. One of the approaches to nonlinear causality testing is Kernel Granger Causality (KGC) [29,30]. It is based on a transformation of the data using a specified kernel function: the inner product of the data and the kernel function is used to perform the linear regression.
The nonlinearity of this method is controlled by the choice of the kernel function. KGC is applicable to multivariate problems; it also allows for quantification of the causality and does not suffer from the overfitting problem, which is a common issue for many methods [31]. Another approach that is growing in popularity is to use neural networks as a forecasting method [14,32-37]. Proposed in 2017, the Causal Relationship Estimation by Artificial Neural Network (CREANN) method uses the weights of a Multilayer Perceptron to assess the causality of the individual lags [36]. This method is robust with respect to the signal-to-noise ratio and the model order; however, it does not make statistical inference about causality. Another recently proposed method for causality assessment is large-scale nonlinear Granger causality (lsNGC) [14]. In this approach, a Generalized Radial Basis Functions neural network is used as the prediction model. In the experiments conducted by Wismüller et al. [14], this method outperformed the other compared methods, including KGC. lsNGC was implemented in Python and made available for researchers to use [38]. Other existing Python solutions using neural networks to study causality use lasso regularization [39,40]. With this technique, some input weights are zeroed, and those Y lag values whose weights are not equal to zero are considered to be the values that cause the time series X. The results obtained using this approach do not allow for statistical inference using statistical tests, and they depend on the value of the lambda parameter corresponding to the lasso penalty.

The Python package presented in this paper was designed to test for causality with precise forecasting thanks to the usage of neural networks.
Through the usage of dropout and out-of-sample testing, the overfitting problem, which might lead to false causality detection, can be avoided. Moreover, the package allows the quantification of the change in causality over time (which is not enabled by any other available package). The created functions allow for forecasting using a Multilayer Perceptron (MLP) and recurrent neural networks, in particular Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). It also supports the usage of the ARIMA model, which is not in the scope of this paper. The approach proposed by the authors does not make the obtained results dependent on an additional parameter or a chosen kernel, and it allows for statistical inference. The package is designed to help scientists use more complex models for Granger causality in an easy, user-friendly way, without very specific programming knowledge, and to study causality changes over time, which is not provided by any other framework. It was designed to study the relationships between biological signals; however, it can be widely used in any field of science. The paper aims at presenting the method and the corresponding Python package, along with a simulation study showing its usage and relevance.

2. Methods

2.1. Used models

2.1.1. Multilayer perceptron

The MLP is a type of artificial neural network built from single perceptrons organized in layers. MLPs are the most popular artificial neural networks used for forecasting, and also in the field of physiology [41,42]. Each perceptron in the first layer takes the feature vector as input and calculates its output as presented in Eq. (23), where w are the weights and f is the activation function:

y = f(w_0 + w_1 x_1 + \dots + w_p x_p)    (23)

Neurons in successive layers take the output values y of the neurons from the previous layer as their input vector.
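The layer-by-layer application of Eq. (23) can be sketched as a plain NumPy forward pass. This is a generic illustration, not the package's code; it uses the same activation choices described next (ReLU in the hidden layer, linear output), and the weight shapes are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer MLP forward pass: each hidden neuron computes
    f(w0 + w1*x1 + ... + wp*xp) as in Eq. (23), with f = ReLU;
    the output layer is linear."""
    h = relu(x @ W1 + b1)  # hidden layer activations
    return h @ W2 + b2     # linear output
```

For a causality test, `x` would hold the lagged values of X (and, for the full model, of Y), and the scalar output would be the forecast of the current value of X.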
In the created package, the activation function in the hidden layers is the Rectified Linear Unit (ReLU) function, while in the output layer it is a linear function. The architecture of a simple MLP model, based only on past values of X and predicting the current value of X for the Granger causality test, is presented in Fig. 2.

(Fig. 2. MLP model forecasting the current value of X based on 3 past values of X, with the activation functions and example weights shown.)

2.1.2. Long short-term memory

The LSTM is a type of gated recurrent neural network. It is often used for sequence data analysis, such as text and speech recognition, time series forecasting, or physiological data analysis [43-45]. The LSTM network includes a system of gate units, thanks to which it is able to control the flow of information and thus "remember" or "forget" information from previous moments in time. Moreover, this type of network is not affected by the gradient vanishing or explosion problem, which is common for recurrent neural networks [46]. A big advantage of using the LSTM in causality testing is that, unlike linear regression models, these networks do not assume the stationarity of the predicted time series. The LSTM cell takes as input the input vector x(t), the value of the long-term state from the previous time point s(t-1), and the value of the short-term state from the previous time point h(t-1). In turn, it returns the values of the short-term h(t) and long-term s(t) states at the current moment in time. The first gate is the forget gate, which controls what part of the information from the past is "remembered" and what part is "forgotten". For this purpose, a sigmoid function is used, and the value passed on from the forget gate is calculated according to Eq. (24):

f(t) = \sigma(U^f x(t) + W^f h(t-1) + b^f)    (24)

Another important gate is the input gate.
It determines the degree of the state update at a given moment in time. For this, it uses the sigmoid function to select, from the input vector and the output vector of the previous time moment, the values that should be used to update the state. The formula describing the operation of the input gate is presented in Eq. (25), where U^i are the gate input weights, W^i are the recursive weights, and b^i is the bias of the input gate. For the state to be updated, the candidate values are also calculated, using the hyperbolic tangent of the input vector and the output vector of the previous time moment, taking into account the appropriate weights U^s and W^s and the bias b^s, as shown in Eq. (26):

i(t) = \sigma(U^i x(t) + W^i h(t-1) + b^i)    (25)

\tilde{s}(t) = \tanh(U^s x(t) + W^s h(t-1) + b^s)    (26)

Updating the LSTM cell state takes place by summing the product of the result obtained from the forget gate and the state value at the previous time point s(t-1) with the product of the result obtained from the input gate and the candidate values. The formula for updating the state is shown in Eq. (27):

s(t) = f(t) \circ s(t-1) + i(t) \circ \tilde{s}(t)    (27)

The last gate in the LSTM cell is the output gate, the operation of which is described in Eq. (28), where U^o are the gate input weights, W^o are the recursive weights, and b^o is the bias of the output gate. Like the other gates, it uses a sigmoid function, to control which state values at a given point in time will be included in the calculation of the final output of the network:

o(t) = \sigma(U^o x(t) + W^o h(t-1) + b^o)    (28)

The computation of the output of the LSTM cell is done by multiplying the hyperbolic tangent of the cell's state by the result obtained from the output gate, as shown in Eq. (29):

h(t) = o(t) \circ \tanh(s(t))    (29)
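One step of the LSTM cell described by Eqs. (24)-(29) can be sketched directly in NumPy. This is a didactic sketch, not the package's implementation (which relies on a deep-learning framework); the parameter dictionary `P` and its key names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, s_prev, P):
    """One LSTM cell step following Eqs. (24)-(29).

    P maps names to weight matrices U*, W* and bias vectors b*."""
    f = sigmoid(P["Uf"] @ x + P["Wf"] @ h_prev + P["bf"])       # forget gate, Eq. (24)
    i = sigmoid(P["Ui"] @ x + P["Wi"] @ h_prev + P["bi"])       # input gate,  Eq. (25)
    s_cand = np.tanh(P["Us"] @ x + P["Ws"] @ h_prev + P["bs"])  # candidate,   Eq. (26)
    s = f * s_prev + i * s_cand                                 # state,       Eq. (27)
    o = sigmoid(P["Uo"] @ x + P["Wo"] @ h_prev + P["bo"])       # output gate, Eq. (28)
    h = o * np.tanh(s)                                          # output,      Eq. (29)
    return h, s
```

With all weights and biases at zero, every gate outputs 0.5 and the candidate is 0, so the state is simply halved at each step, which is a convenient sanity check of the gating arithmetic.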
Thanks to the use of the three above-described gates, the LSTM neural network is able to control the flow of information between consecutive time moments by "remembering" or "forgetting" it. Moreover, LSTM networks recognize long-term dependencies more easily than simple recurrent network architectures [47].

2.1.3. Gated recurrent unit

The GRU neural network was presented by Cho et al. in 2014 [48]. The GRU, like the LSTM, is a kind of gated recurrent neural network. The GRU network is characterized by the use of two gates: a reset gate and an update gate. Such a recurrent neural network architecture also prevents the gradient vanishing and exploding effects [47]. In the GRU cell, the update gate is a kind of counterpart of the forget gate and input gate in the LSTM cell. The update gate controls which information is "forgotten" and which new information from a given point in time will be taken into account in further calculations. The operation of the update gate is based on calculating the value of the sigmoid function for the input data and the cell output from the previous time moment, taking into account the appropriate weights U^u and W^u and the bias b^u, as shown in Eq. (30) [49]:

u(t) = \sigma(U^u x(t) + W^u h(t-1) + b^u)    (30)

(Fig. 5. Visualization of signals X and Y.)

The learning rate was equal to 0.001 and 0.0001 for the first 50 epochs and the last 100 epochs, respectively. To obtain the most accurate models, the LSTM and GRU neural network models were created and trained 2 times (parameter run) and the MLP model 5 times. For the final causality testing, the models (one based on X and one based on both X and Y) with the smallest RSS were chosen. The difference in the value of the run parameter was due to the long time needed for LSTM and GRU training. The usage of the created functions is presented in Listing 2.

Regarding the detection of causality and the prediction performance, the presented methods were compared with the models used in two other methods for causality testing,
both of which allow for out-of-sample forecasting and are likewise implemented in Python. The first of these methods was the non-modified Granger test, which uses linear autoregression (AR) for prediction. The second one was large-scale nonlinear Granger causality (lsNGC) [14,38], which is based on a Generalized Radial Basis Functions (GRBF) neural network. Its parameters c_f and c_g, the numbers of hidden-layer neurons in the GRBF networks, were set to 25. All models were fitted on the training set and evaluated on the test set. The absolute values of the prediction errors obtained on the test set from the model based on the past of X and from the model based on the past of both signals were compared using the Wilcoxon signed-rank test to determine the presence of causality. The performance of all methods was quantified using the mean squared error (MSE), mean absolute error (MAE), and median absolute error (MedAE). Moreover, it was assessed whether the usage of the proposed models results in an improvement of the prediction accuracy in comparison to the AR or GRBF network. For this purpose, the Wilcoxon signed-rank test was used to compare the absolute values of the errors obtained from the corresponding models (e.g., the MLP model based on the past of X and the AR model based also on the past of X). The effect size of including the past of signal Y in the prediction for each method was assessed using Cohen's d. In order to better visualize the results, plots of the predicted values versus the true values of X, and plots of the prediction error versus the predicted X values, were prepared for each model. The steps of the entire process of the causality analysis are presented in Fig. 6.

In the next step of the analysis, the Y time series was replaced with random noise, to compare the results from the tests in the case where the causality relation should be detected with the case where there is no relation between the signals.
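The error-comparison step described above (Wilcoxon signed-rank test on the absolute test-set errors of the two models, plus Cohen's d for the effect size) can be sketched as follows. The helper name is hypothetical, and the one-sided alternative is an assumption; the excerpt does not state whether a one- or two-sided test was used.

```python
import numpy as np
from scipy.stats import wilcoxon

def compare_errors(err_x, err_xy):
    """Compare |errors| of the X-only model and the X+Y model.

    H1 (assumed one-sided): the model using both signals has smaller
    absolute errors, i.e. causality from Y to X is present."""
    abs_x, abs_xy = np.abs(err_x), np.abs(err_xy)
    stat, p_value = wilcoxon(abs_x, abs_xy, alternative="greater")
    # Cohen's d with a pooled standard deviation of the absolute errors
    pooled_sd = np.sqrt((abs_x.std(ddof=1) ** 2 + abs_xy.std(ddof=1) ** 2) / 2)
    cohens_d = (abs_x.mean() - abs_xy.mean()) / pooled_sd
    return p_value, cohens_d
```

A small p-value together with a positive d indicates that adding the past of Y reduced the prediction error, i.e. evidence for Y→X.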
This analysis was likewise performed using the first 70% of the signal as the training set and the last 30% as the test set. The same metrics and tests were computed as described in the previous paragraph.

The second type of function in the package can be used to examine the change of causality over time. In order to simulate such a change of the causality relation, the first 50% of the test Y signal was changed to random noise (Fig. 7), so that there would be no causal relation between X and Y in the first half of the test time series. To present this feature of the package, the MLP architecture that obtained the highest Cohen's d was chosen. The window values mentioned in Section 2.2 were set to 30 and 1 for w1 and w2, respectively. The example usage of the function for the assessment of the change of causality over time is presented in Listing 3.

The assumed significance level is equal to 0.05. All analyses were performed using Python version 3.7.10.

3. Results

3.1. Presence of causality from Y to X

Each of the designed functions generates a plot of the original testing signal X, the values predicted by the model based only on past values of X, and the values predicted by the model based on past values of X and Y. For each function based on neural networks, it is possible to obtain a history of the fitting from the output of the function and create the learning curve. Sample plots of the original and predicted data obtained from the function nonlincausalityNN for lag equal to 50 and 150 are presented in Fig. 8, while the learning curves of the models returned by this function are presented in Fig. 9. All error metrics calculated on the testing set for each model for lag equal to 50 and 150 are presented in Table 1. P-values obtained from the Wilcoxon signed-rank test used to assess the presence of causality from Y to X for each method and each lag are presented in Table 2.

In the case of lag equal to 50, all models from the created package (NN) obtained similar results in terms of error metrics.
The AR models had a similar accuracy of prediction, while the GRBF models tended to obtain the biggest error metrics for the model based only on the past values of X and the smallest ones for the model based on both signals, where the GRBF model based on both time series obtained slightly smaller error metrics than the LSTM, GRU, and AR models. For the lag of 150, NN and AR mostly obtained similar results, except for the MLP model based on X and Y, which outperformed all the other models and got the smallest error in the case of all three metrics. In the case of the bigger lag, the GRBF model had a drop in performance for the model based only on the X signal. In the case of the causality test, all nonlinear methods obtained a p-value smaller than the assumed significance level; thus, when using any of the presented models or GRBF, the causality relationship between the signals was detected, even for a lag smaller than the actual delay between the time series. The autoregressive model used in the state-of-the-art Granger method did not capture the causality for any given lag.

P-values obtained from the Wilcoxon signed-rank test used to assess the improvement in prediction due to the usage of neural networks over the autoregressive and GRBF models are presented in Table 3 and Table 4, respectively. Cohen's d values used to assess the effect size of incorporating the past of the Y signal into the models are presented in Table 5. Plots of predicted values versus actual values are presented in Figs. 10 and 11 for lag 50 and 150, respectively, and plots of the prediction error versus the predicted values are presented in Figs. 12 and 13.

The usage of neural networks over AR did not statistically significantly improve the prediction of the X signal when taking only past values of X as input. On the other hand, the usage of NN over GRBF with one signal as input always resulted in significantly better prediction accuracy. If the model was based on both the X and Y signals, the usage of neural networks instead of the autoregressive model improved the prediction in 5 out of 6 cases (only the MLP model for 50 lags did not obtain significantly better results). In the case of GRBF, only the MLP model for lag equal to 150 outperformed it. For both lags, the effect size of incorporating the past of Y is reported in Table 5.

(Listing 2. Usage of the created functions for causality testing.)

Fig. 6 presents the process scheme of the data analysis, from generating the data up to the causality and prediction assessment: generating the signals X and Y; dividing the data into training and test sets; creating two models of a chosen architecture (LSTM, GRU, MLP, or GRBF), one based only on past values of X and the second based on the past of both X and Y; fitting the models on the training set for lags of 50 and 150; forecasting the values for the testing set; calculating the p-value from the Wilcoxon test of causality from Y to X; calculating the error metrics (MSE, MAE, MedAE) on the test set; computing the effect size of including the past of Y; and assessing the improvement of performance due to the usage of neural networks over the AR and GRBF models in prediction. In the case of a real-world application the process would be the same, but X and Y would be the data obtained from the study.

(Listing 3. Usage of the function for measuring the causality change over time with MLP models.)
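The causality-over-time feature invoked in Listing 3 can be approximated by computing the log-ratio of error variances (as in Eq. (8)) over sliding windows of width w1 moved with step w2. This is a plausible sketch, not the package's exact windowing, which the garbled listing does not fully reveal; the function name is an assumption.

```python
import numpy as np

def causality_over_time(err_x, err_xy, w1=30, w2=1):
    """Sliding-window causality measure: log-ratio of error variances,
    in the spirit of Eq. (8), over windows of width w1 with step w2."""
    measures = []
    for start in range(0, len(err_x) - w1 + 1, w2):
        v_x = np.var(err_x[start:start + w1])    # X-only model errors
        v_xy = np.var(err_xy[start:start + w1])  # X-and-Y model errors
        measures.append(np.log(v_x / v_xy))
    return np.array(measures)
```

Where the measure stays near zero, the past of Y brings no gain (no causality in that window); a sustained positive value marks the region where Y→X holds, which is how the noise-corrupted first half of the test Y signal shows up in Figs. 14 and 15.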
(Fig. 7. Visualization of the test signals X and Y, with the first 50% of the test Y signal changed to random noise.)

(Fig. 8. The original test values of X and the predictions for lag 50 and lag 150 from the model based on the past values of X and from the model based on the past values of X and Y. If there is causality from Y to X, the absolute prediction error of the model based on both signals should be lower than the error of the model based only on X.)

(Fig. 9. Plots presenting the dependency between the loss and the number of epochs for lag equal to 50 and 150, for the model based on X and the model based on X and Y.)

(Table 1. Error metrics (MSE, MAE, and MedAE) obtained on the test set for each model, created based on 50 and 150 past values, in the case where Y→X.)

(Table 2. P-values for each model and each tested lag, obtained from the Wilcoxon signed-rank test in the case where Y→X, i.e., where a causal relation has been designed.)

(Figs. 10-13. Predicted values versus actual values, and prediction error versus predicted values, for each model, for lag 50 and 150.)

(Fig. 14. The measure of causality from signal Y to signal X (Y→X) and from signal X to signal Y (X→Y), along with the signals X and Y, for the analysis with 50 lags. The axis on the left represents the signal values and the axis on the right represents the causality value.)

(Fig. 15. The measure of causality from signal Y to signal X (Y→X) and from signal X to signal Y (X→Y), along with the signals X and Y, for the analysis with 150 lags. The axis on the left represents the signal values and the axis on the right represents the causality value.)

For the lag of 150, after the random noise, the measure for Y→X tends to be much higher than for X→Y (as there is no such causality). In the case of lag equal to 50, in the part of the plot where there is the random noise, the measures are similar for both X→Y and Y→X, while after this part the measure for Y→X seems to be slightly bigger than for X→Y (but smaller than for the lag of 150). In the middle of the signals, for both lags, there is an increase in causality from X to Y, which means that in that part the model was able to benefit from using the X signal in forecasting Y in terms of decreasing the prediction error. The maximum of the causality measure for the lag of 50 is equal to 0.324 and 0.139 for X→Y and Y→X, respectively, and for lag equal to 150 those values were equal to 0.318 and 0.520. The mean value of the causality measure for lag equal to 50 was equal to 0.009 and 0.014 for X→Y and Y→X, respectively, while for lag equal to 150 those values were equal to 0.007 for X→Y and 0.128 for Y→X. However, it should be emphasized that the mean and maximum values of the causality measure show the gain from incorporating the other variable into the prediction, and those values cannot be interpreted as the presence of causality between the signals on their own. As shown in the previous sections, the assessment of the presence of causality is performed using the Wilcoxon signed-rank test.

4. Discussion

The methods presented in this paper are focused on overcoming the main disadvantages of the non-modified Granger method based on linear autoregressive models, and on proposing an alternative to the existing nonlinear methods. The first weakness of the linear Granger causality test is that, by using AR models, the state-of-the-art Granger approach may not capture the more complex, nonlinear causality dependencies (polynomial, exponential, logarithmic, or others) that are visible in many biomedical, physiological, economic, and social measurements/signals.
What is more, the AR ‘models are assuming the stationarity of the signals, which is prob- Tematic as most real-life signals are not stationary (however this problem might be also addressed by usage of vector error corsec- ton model [2627]. but this approach is stl assuming the linearity of the dependency and is not implemented in the existing Python packages). Thanks to the usage of neural networks models itis possible not only to detect the causal relations, which are nonlin ‘ear but also to test the dependencies between nonstationary time series and obtain very accurate forecasting results In order to test, the proposed approach and the created Python package, two time series were simulated, where the signal X was a polynomial of the Y signal and was delayed in relation to it by 100-time steps. The data analysis was performed in 2 different ways. In the fist ap- proach, both models (based on the past of X and the past of X and YY) were tained on the first 70% of the data and testing for causal- Mesh Mz and cut Compute Methads and Programs Beedle 21 (2022) 105688 Lags:50 xy ig 44 The messre of cust (271) fom signal Yt signal X (V2) and ie signal X we ¥ (XY) alongwith signal X and or ans fo 50 as Theat on {he lel tepeesens the sgh vals andthe aon the ight represents the casa vale Lagsas0 = 10 20 =. zg q B co i oa state = th Aloo Se a0 Si ato anno urber of sample ig. 5. The measte of easly (C4 (27) om signal Vt signa X Y-0X) an to signal Xt ¥ (KY) long wt signal X and Y for ana for 150 ag. The axis an {he let epeesens the gpl values andthe aon the ight fepesets the casa ae ity on the remaining 30% to check the robustness of the methods. ‘The second analyses were similar tothe frst one, but the ¥ signal, ‘was changed (0 4 random noise so there was no causality relation between X and ¥. This change was made to test if the proposed methods do nat indicate the false causality relations. 
The proposed models were compared with the autoregressive model used in the traditional Granger method and with the Generalized Radial Basis Functions neural network used in the lsNGC approach [14].

The created functions based on neural networks were able to detect the causality relation between the time series for lags both smaller and greater than the actual delay between the two signals. The GRBF models also indicated the presence of true causality, unlike the AR ones, which were not able to detect this dependency. The biggest differences between the results for lags bigger and smaller than the actual delay between the X and Y signals were the p-value, Cohen's d, and the error metrics for the MLP models. The p-value was many orders of magnitude smaller for the lag equal to 150, while Cohen's d was much higher for this lag value. The MLP model based on both X and Y obtained much smaller error metrics in the case of the lag equal to 150 compared to the results for lag equal to 50. In the case of lag equal to 150, there was a large drop in the precision of the GRBF model based on the past of X compared to the same model for a smaller lag value. In the case of the other models, the performance was rather similar irrespective of the value of the lag. The plot of the predicted against the actual values was much more centered around the line y = x for the MLP model using both signals for lag equal to 150, which also indicates that the most accurate prediction was obtained for this model. In the case of the GRBF model using only the X signal for lag equal to 150, the points are the least concentrated around the mentioned line, which also confirms its lowest accuracy. When it comes to the plot of the prediction error against the predicted values, the MLP model
based on X and Y for lag equal to 150 is distinguished, as the error at higher predicted values is much smaller than in the case of the other models, while for GRBF based on X there is visibly more dispersion of the points, which indicates the lowest prediction performance. The best results in terms of Cohen's d and error metrics were obtained for the MLP for lag equal to 150 for models based on both the X and Y signals. The usage of NN instead of AR mostly resulted in more accurate results in the case of models based on both signals, while compared to GRBF the usage of NN was always more efficient for models based only on the past of X.

When the Y time series was changed to random noise to test whether the proposed methods indicate a false dependency when there is no causal relation between the examined time series, all neural networks obtained from the created package and the AR models did not detect any causality between the signals for any lag value. In the case of GRBF, false causality was detected for lag equal to 150. The error metrics and Cohen's d obtained by the NN and AR models were similar to each other, while for GRBF the prediction accuracy was the lowest, especially in the case of lag equal to 150.

Compared to the autoregressive models used in traditional Granger causality analysis, the proposed approach was characterized by better or similar prediction performance and by the discovery of nonlinear causality relations undetected by the AR models. In the comparison with the nonlinear approach using GRBF, the proposed solution was always superior in prediction based on the past of only one variable and did not indicate any false causality, which appeared to be an issue in one case for the GRBF models. The advantage of the proposed method over other approaches using nonlinear data transforms, such as KGC, is the lack of assumptions about the data transformation (the kernel used).
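The decision rule discussed above — a one-sided Wilcoxon signed-rank test on the paired test-set errors of the restricted (X-only) and full (X-and-Y) models, with Cohen's d as the effect size — can be sketched as below. `compare_models` is a hypothetical helper written for illustration, not the package's actual API:

```python
import numpy as np
from scipy.stats import wilcoxon

def compare_models(err_restricted, err_full, alpha=0.05):
    """Test whether the full model (past of X and Y) predicts X significantly
    better than the restricted model (past of X only).

    Both arguments are paired absolute prediction errors on the same test set.
    Returns the one-sided Wilcoxon p-value, Cohen's d for the paired
    differences, and the causality verdict at level `alpha`.
    """
    err_restricted = np.asarray(err_restricted, dtype=float)
    err_full = np.asarray(err_full, dtype=float)
    # H1: restricted-model errors are stochastically greater than full-model errors
    _, p_value = wilcoxon(err_restricted, err_full, alternative="greater")
    diff = err_restricted - err_full
    cohens_d = diff.mean() / diff.std(ddof=1)  # effect size of incorporating Y
    return p_value, cohens_d, p_value < alpha
```

A significant result only says that adding Y improves the prediction of X; as stressed above, the raw value of the causality measure alone is not evidence of causality.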
The prepared Python package is superior to other existing solutions that use neural networks with lasso regularization to study causality [39,40], as it allows for statistical inference and its results are independent of the lambda parameter used in the lasso regularization. As the neural networks applied in this research are already used in the analysis of physiological data [41,42,45], the created package can find wide application in the study of causal relationships in physiology, but also in other scientific fields.

To our knowledge, a novelty of the created package is the proposed method of studying the change in causality over time, which may allow for a better understanding of the causal relationships between the signals. For both lags, the causality values seem to be bigger for the part of the signal without random noise. As expected, the plots of the change of causality over time show much more causal dependency for the lag equal to 150 than for the lag equal to 50. For the higher lag, the moment when the causality relationship appears is very clearly visible. The value of the causality measure varies over time, probably due to the noise added to the signal and to the out-of-sample testing. This feature of the package can be very useful, especially in the case of signals whose dependence varies over time.

The limitations of the prepared package are the computational complexity and the time consumption involved in training neural networks. However, the benefits of using neural networks seem to outweigh these issues.
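The change-of-causality-over-time feature can be illustrated with a sliding-window sketch. The package's exact measure may differ; the log-ratio form below (2·ln of the RMSE ratio, echoing Geweke's linear measure) is an assumption made for this illustration:

```python
import numpy as np

def causality_over_time(err_restricted, err_full, window=30):
    """Sliding-window causality measure over a test set.

    For each window, the gain from adding Y is expressed as
    2 * ln(RMSE_restricted / RMSE_full); positive values mean that the
    full model predicted X better within that window.
    """
    err_r = np.asarray(err_restricted, dtype=float)
    err_f = np.asarray(err_full, dtype=float)
    values = []
    for start in range(len(err_r) - window + 1):
        rmse_r = np.sqrt(np.mean(err_r[start:start + window] ** 2))
        rmse_f = np.sqrt(np.mean(err_f[start:start + window] ** 2))
        values.append(2.0 * np.log(rmse_r / rmse_f))
    return np.array(values)
```

Plotting this series against time makes the moment when a causal link appears (or vanishes) directly visible, which is the behaviour described for the lag-150 analysis above.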
The limitation of the study is the use of simulated data, so further studies on real-world data are planned. In the near future, we plan to focus on using the created package to investigate the causal relationships between biomedical signals and their dependence on various vital parameters as a continuation of the research conducted by Młyńczak and Krysztofiak [16,17]. We plan, among others, to use the package in the analysis of the causality between the respiratory signal from impedance pneumography (a tidal volume equivalent) and the cardiological signal from the ECG (mainly RR intervals and the tachogram as their interpolation), and to investigate the causality phenomenon in various groups of patients (the network physiology paradigm [58]).

5. Conclusion

The usage of neural networks in causality testing allows capturing the nonlinear causal dependencies which are not detected by the AR model used in the state-of-the-art Granger method. In the case when there is no causal relation, the neural-network-based methods do not indicate false causality, which might be an issue while using the GRBF model for prediction. The usage of neural networks allowed providing better prediction results, especially in the case of the multilayer perceptron taking as an input the past values of both time series for a lag value greater than the actual delay. This model obtained the highest effect size of incorporating the past of the Y signal into the prediction model. The measure of the change of causality over time seems to be a valid feature that allows to better understand the detected dependencies between the signals. The created package (available on PyPI [56]) can be widely used in the analysis of signals in different scientific fields like neuroscience, physiology, or economy.
Thanks to the proposed method, it is possible to study nonlinear dependencies, to study causality changes over time, and, unlike similar nonlinear approaches, it is easily usable thanks to the package created in Python.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was not financially supported by any institution or organization. No ethical approval was required. The authors declare that they have no conflict of interests.

References

[1] C.W. Granger, Investigating causal relations by econometric models and cross-spectral methods, Econometrica 37 (1969) 424–438.
[2] R.P. Maradana, R.P. Pradhan, S. Dash, D.B. Zaki, K. Gaurav, M. Jayakumar, Innovation and economic growth in European Economic Area countries: the Granger causality approach, IIMB Manag. Rev. (2019).
[3] …, … factor in economic growth? Evidence …, Technol. Forecast. Soc. Change (2018).
[4] S. Hu, Y. Cao, J. Zhang, W. Kong, K. Yang, Y. Zhang, X. Li, More discussions for Granger causality and new causality measures, Cogn. Neurodyn. 6 (2012) 33–42.
[5] M. Ding, Y. Chen, S.L. Bressler, Granger causality: basic theory and application to neuroscience, in: Handbook of Time Series Analysis: Recent Theoretical Developments and Applications, Wiley-VCH Verlag GmbH & Co. KGaA, 2006.
[6] …, … causality measures in neuroscience, … (2018).
[7] …, Granger causality analysis in combination with directed network measures for classification of … patients and healthy controls using task-state fMRI, Comput. Biol. Med. (2019).
[8] Z. Abbasvandi, A.M. Nasrabadi, A self-organized recurrent neural network for estimating the effective connectivity and its application to EEG data, Comput. Biol. Med. (2019).
[11] M.G. Tana, R. Sclocco, A.M. Bianchi, GMAC: a Matlab toolbox for spectral Granger causality analysis of fMRI data, Comput. Biol. Med. (2012).
[12] …, S. Ozerdem, …, Determination of EEG information flow activity based on Granger causality …, Comput. Methods Programs Biomed. (2013).
[13] … Wang, …, … EEG … using Granger causality/transfer entropy analysis, J. Neurosci.
Methods (2021).
[14] A. Wismüller, A.M. DSouza, M.A. Vosoughi, A. Abidin, Large-scale nonlinear Granger causality for inferring directed dependence from short multivariate time-series data, Sci. Rep. 11 (2021).
[15] L. Faes, G. Nollo, A. Porta, Non-uniform multivariate embedding to assess the information transfer in cardiovascular and cardiorespiratory variability series, Comput. Biol. Med. 42 (2012).
[16] M. Młyńczak, H. Krysztofiak, Discovery of causal paths in cardiorespiratory parameters: a time-independent approach in elite athletes, Front. Physiol. 9 (2018) 1455.
[17] M. Młyńczak, H. Krysztofiak, Cardiorespiratory temporal causal links and the differences by sport or lack thereof, Front. Physiol. 10 (2019) 45.
[18] A.D. Orjuela-Cañón, …, Sleep apnea: tracking effects of … CPAP therapy by means of Granger causality, Comput. Methods Programs Biomed. (2020).
[19] …, A.D. Orjuela-Cañón, …, Brain and heart physiological networks analysis employing neural networks Granger causality, in: Proceedings of the … International IEEE/EMBS Conference on Neural Engineering (NER), 20….
[20] C. Corbier, F. Chouchou, F. Roche, J.C. Barthélémy, V. Pichot, Causal analyses to study autonomic regulation during acute head-out water immersion, head-down tilt and supine position, Exp. Physiol. (2020).
[21] A. Seth, Granger causality, Scholarpedia 2 (2007) 1667.
[22] C.A. Sims, Money, income, and causality, Am. Econ. Rev. 62 (1972) 540–552.
[23] Granger causality test, SAS (2021), https://support.sas.com (accessed December 11, 2021).
[24] S.L. Bressler, A.K. Seth, Wiener–Granger causality: a well established methodology, NeuroImage 58 (2011) 323–329.
[25] J. Geweke, Measurement of linear dependence and feedback between multiple time series, J. Am. Stat. Assoc. 77 (1982) 304–313.
[26] A.E. Obayelu, A.S. Salau, Agricultural response to prices and exchange rate in Nigeria: application of cointegration and vector error correction model (VECM), J. Agric. Sci. (2010).
[27] F.F.A.H. Asari, N.S. Baharuddin, N. Jusoh, Z. Mohamad, N. Shamsudin,
K. Jusoff, Vector error correction model (VECM) approach in explaining the relationship between interest rate and inflation towards exchange rate volatility in Malaysia, World Appl. Sci. J. (2011) 49–56.
[28] N. Ancona, D. Marinazzo, S. Stramaglia, Radial basis function approach to nonlinear Granger causality of time series, Phys. Rev. E 70 (2004) 056221.
[29] D. Marinazzo, M. Pellicoro, S. Stramaglia, Kernel method for nonlinear Granger causality, Phys. Rev. Lett. 100 (2008) 144103.
[30] D. Marinazzo, M. Pellicoro, S. Stramaglia, Kernel Granger causality and the analysis of dynamical networks, Phys. Rev. E 77 (2008) 056215.
[31] N. Nicolaou, T.G. Constandinou, A nonlinear causality estimator based on non-parametric multiplicative regression, Front. Neuroinform. 10 (2016).
[32] A. Montalto, S. Stramaglia, L. Faes, G. Tessitore, R. Prevete, D. Marinazzo, Neural networks with non-uniform embedding and explicit validation phase to assess Granger causality, Neural Netw. 71 (2015) 159–171.
[33] A. Attanasio, U. Triacca, Detecting human influence on climate using neural networks based Granger causality, Theor. Appl. Climatol. 103 (2011).
[34] …, S. Zhang, Analyzing brain connectivity in the mutual … using … Granger causality, … (2020).
[35] Y. Huang, Z. Fu, C.L.E. Franzke, Detecting causality from time series in a machine learning framework, Chaos 30 (2020) 063116.
[36] N. Talebi, A.M. Nasrabadi, I. Mohammad-Rezazadeh, Estimation of effective connectivity using multi-layer perceptron artificial neural network, Cogn. Neurodyn. (2018).
[37] N. Talebi, A.M. Nasrabadi, I. Mohammad-Rezazadeh, R. Coben, nCREANN: nonlinear causal relationship estimation by artificial neural network; applied for autism connectivity study, IEEE Trans. Med. Imaging (2019).
[38] Large-scale Nonlinear Granger Causality, GitHub (2021), https://github.com/… (accessed December 1, 2021).
[39] …, Granger-causal inference in time series for … molecular networks during sleep …, (accessed December 1, 2021).
[40] A. Tank, I. Covert, N. Foti,
A. Shojaie, E.B. Fox, Neural Granger causality, IEEE Trans. Pattern Anal. Mach. Intell. (2021).
[41] S. Kaushik, A. Choudhury, P.K. Sheron, N. Dasgupta, S. Natarajan, L.A. Pickett, V. Dutt, AI in healthcare: time-series forecasting using statistical, neural, and ensemble architectures, Front. Big Data 3 (2020).
[42] …, … physiological markers for stress detection using wearable devices, Sensors (20…).
[43] Y. Yu, X. Si, C. Hu, J. Zhang, A review of recurrent neural networks: LSTM cells and network architectures, Neural Comput. 31 (2019) 1235–1270.
[44] J. Cao, Z. Li, J. Li, Financial time series forecasting model based on CEEMDAN and LSTM, Phys. A Stat. Mech. Appl. (2019).
[45] …, … LSTM …, (2019).
[46] S. Xiang, Y. Qin, C. Zhu, Y. Wang, H. Chen, Long short-term memory neural network with weight amplification and its application into gear remaining useful life prediction, Eng. Appl. Artif. Intell. (2020).
[47] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[48] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder–decoder for statistical machine translation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014.
[49] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Gated feedback recurrent neural networks, in: Proceedings of the 32nd International Conference on Machine Learning, 2015.
[50] J.C. Heck, F.M. Salem, Simplified minimal gated unit variations for recurrent neural networks, in: Proceedings of the IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), 2017.
[51] …, Time series analysis and its applications: with R examples, 2007.
[52] F. Wilcoxon, Individual comparisons by ranking methods, Biom. Bull. 1 (1945) 80–83.
[53] The Wilcoxon signed-rank test – Python implementation, (2020), https://github.com/scipy/scipy (scipy/stats/morestats.py) (accessed December 11, 2021).
[54] R.H. Riffenburgh, Statistics in Medicine, Elsevier.
[55] M. Rosoł, Nonlincausality: Python package – GitHub, (2021), https://github.com/mrosol/nonlincausality (accessed December 2021).
[56] M. Rosoł, Nonlincausality:
Python package – PyPI, (2021), https://pypi.org/project/nonlincausality/ (accessed December 1, 2021).
[57] F. Chollet, and others, Keras, (2015), https://keras.io (accessed December 2021).
[58] P.C. Ivanov, R.P. Bartsch, Network physiology: mapping interactions between networks of physiologic systems, in: Networks of Networks: The Last Frontier of Complexity, Understanding Complex Systems, Springer, 2014.
