Hipotesis Parameter Deret Fourier (Fourier Series Parameter Hypothesis)
MethodsX
journal homepage: www.elsevier.com/locate/methodsx

Method name: Fourier series function, Likelihood ratio test

Keywords: Nonparametric regression, Fourier series function, Hypothesis testing, Likelihood ratio test, Return on asset

Abstract: The nonparametric regression model with the Fourier series approach was first introduced by Bilodeau in 1994. In later years, several researchers developed nonparametric regression models with the Fourier series approach. However, these studies are limited to parameter estimation, and there is no research related to parameter hypothesis testing. Parameter hypothesis testing is a statistical method used to test the significance of the parameters. In the nonparametric regression model with the Fourier series approach, parameter hypothesis testing is used to determine whether the estimated parameters have a significant influence on the model or not. Therefore, the purpose of this research is parameter hypothesis testing in the nonparametric regression model with the Fourier series approach. The method that we use for hypothesis testing is the Likelihood Ratio Test (LRT) method, which compares the likelihood functions under the parameter space of the null hypothesis and the parameter space of the hypothesis. By using the LRT method, we obtain the form of the statistical test and its distribution, as well as the rejection region of the null hypothesis. To apply the method, we use ROA data from 47 go public banks that are listed on the Indonesia stock exchange in 2020. The highlights of this research are:
Specifications table
∗ Corresponding author.
E-mail address: [email protected] (I.N. Budiantara).
https://doi.org/10.1016/j.mex.2023.102468
Received 2 October 2023; Accepted 29 October 2023; Available online 31 October 2023
2215-0161/© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
M. Ramli, I.N. Budiantara and V. Ratnasari MethodsX 11 (2023) 102468
Method details
The purpose of this research is to develop a method for parameter hypothesis testing in nonparametric regression with the Fourier
series approach using the Likelihood Ratio Test (LRT) method and we apply the method to Return on Asset (ROA) data of 47 go
public banks on the Indonesia stock exchange in 2020. Based on the hypothesis testing that will be carried out, we will obtain the
formula of the statistical test and its distribution as well as the rejection region for the null hypothesis. The method details used in
this research are given as follows.
Regression analysis is part of the statistical method used to model the relationship between predictor and response variables.
Suppose 𝑥𝑖 is the predictor variable and 𝑦𝑖 is the response variable for the 𝑖th observation, with 𝑖 = 1, 2, ..., 𝑛; the relationship between (𝑥𝑖, 𝑦𝑖) could be expressed as follows.
$$y_i = f(x_i) + \varepsilon_i, \tag{1}$$
where 𝑓 is the regression curve and 𝜀𝑖 is the error term, which we assume to be normally distributed with mean 0 and constant variance 𝜎². In regression analysis, there are several approaches for the model (1), namely the parametric regression model and
the nonparametric regression model [1]. If we assume 𝑓 as a known function, then model (1) could be approached using parametric
regression. However, if we assume 𝑓 as an unknown function, then the model (1) could be approached using nonparametric regression.
The assumption of 𝑓 as a known or unknown function could be seen by using a scatterplot [2]. In this research, we are assuming 𝑓
as an unknown function. Therefore, the model (1) is a nonparametric regression.
Nonparametric regression is a regression approach that is not bound by the assumption that the shape of the regression curve is known, and it is flexible since the function 𝑓 could adapt to the local nature of the data. Since 𝑓 is a nonparametric function, 𝑓 could be approached using one of the nonparametric estimators. One estimator that could be used to approximate 𝑓 is the Fourier series function. The Fourier series is a trigonometric polynomial containing cosine and sine functions, which Joseph Fourier first introduced. In 1977, Jong was the first researcher to conduct research related to the Fourier series, discussing the transformation of the Fourier series for smoothing of the density function in the spectral estimator [3]. In later years, this work was followed by several researchers, with the Fourier series given in the form $f(x) = \frac{\alpha}{2} + \sum_{k=1}^{K}(\gamma_k \cos(kx) + \delta_k \sin(kx))$ [4–7]. However, Bilodeau
in 1992 developed the Fourier series function for a smoothing model in nonparametric regression by modifying the function. Bilodeau modifies the function by using cosine functions only and adds 𝛽𝑥 as a trend term in the Fourier series function [8]. Therefore, the Fourier series function becomes $f(x) = \frac{1}{2}\alpha + \beta x + \sum_{k=1}^{K} \gamma_k \cos(kx)$. This type of Fourier series function was developed and used
in a nonparametric regression model [see, 9–12]. The advantage of using the Fourier series function in nonparametric regression is its ability to handle data with a recurring trend at certain intervals and its good statistical interpretation.
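As a small illustration (our own sketch, not code from the paper; the function name and argument order are assumptions), Bilodeau's modified Fourier series function can be evaluated directly:

```python
import numpy as np

def fourier_series(x, alpha, beta, gamma):
    """Evaluate Bilodeau's form f(x) = alpha/2 + beta*x + sum_{k=1}^K gamma_k cos(k x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = np.arange(1, len(gamma) + 1)  # oscillation indices 1..K
    return alpha / 2.0 + beta * x + np.cos(np.outer(x, k)) @ np.asarray(gamma, dtype=float)
```

At 𝑥 = 0 every cosine term equals 1, so 𝑓(0) = 𝛼/2 + (𝛾₁ + … + 𝛾𝐾), which gives a quick sanity check.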
The nonparametric regression model given in (1) is a nonparametric regression containing only one predictor variable (a univariable model). In this research, we take the number of predictor variables to be 𝑝 (a multivariable model). Suppose 𝑛 is the number of observations and 𝑝 is the number of predictor variables; the relationship between the predictor variables and the response variable (𝑥𝑖1, 𝑥𝑖2, ..., 𝑥𝑖𝑝; 𝑦𝑖) is assumed to follow the nonparametric regression model as follows.
$$y_i = f(x_{i1}, x_{i2}, \ldots, x_{ip}) + \varepsilon_i, \quad i = 1, 2, \ldots, n, \quad \varepsilon_i \sim N(0, \sigma^2) \tag{2}$$
If we assume that all the predictor variables are independent, or in other words that 𝑥1, 𝑥2, ..., 𝑥𝑝 are not correlated, then model (2) could be written in additive form as follows.
$$y_i = f(x_{i1}, x_{i2}, \ldots, x_{ip}) + \varepsilon_i = f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_p(x_{ip}) + \varepsilon_i = \sum_{j=1}^{p} f_j(x_{ij}) + \varepsilon_i. \tag{3}$$
Since (2) is a nonparametric regression model, the 𝑓𝑗 are unknown nonparametric regression curves. Let each 𝑓𝑗 be a continuous function with 𝑓𝑗 ∈ 𝐶(0, 𝜋); then 𝑓𝑗 could be approximated with the Fourier series function [8].
$$f_j(x_{ij}) = \frac{1}{2}\alpha_j + \beta_j x_{ij} + \sum_{k=1}^{K} \gamma_{kj} \cos(k x_{ij}), \tag{4}$$
where $\alpha_j$, $\beta_j$, and $\gamma_{kj}$ with $j = 1, 2, \ldots, p$ and $k = 1, 2, \ldots, K$ are the parameters in the model, and $K$ is the oscillation parameter, which represents the number of waves in the cosine function. Therefore, by substituting Eq. (4) into the regression curve $\sum_{j=1}^{p} f_j(x_{ij})$ in (3), we have
$$\sum_{j=1}^{p} f_j(x_{ij}) = \sum_{j=1}^{p}\left(\frac{1}{2}\alpha_j + \beta_j x_{ij} + \sum_{k=1}^{K} \gamma_{kj} \cos(k x_{ij})\right). \tag{5}$$
Furthermore, writing out (5) for $j = 1, 2, \ldots, p$ and $i = 1, 2, \ldots, n$, we obtain Eq. (5) in matrix and vector form as follows.
$$\tilde{f} = \mathbf{X}(K)\tilde{B}, \tag{6}$$

where $\tilde{f} = \sum_{j=1}^{p} \tilde{f}_j$, $\mathbf{X}(K) = [\mathbf{X}_1(K)\ \mathbf{X}_2(K)\ \cdots\ \mathbf{X}_p(K)]$, and $\tilde{B} = [\tilde{B}_1'\ \tilde{B}_2'\ \cdots\ \tilde{B}_p']'$, with $\tilde{f}_j = [f_j(x_{1j})\ f_j(x_{2j})\ \cdots\ f_j(x_{nj})]'$,

$$\mathbf{X}_j(K) = \begin{bmatrix} \frac{1}{2} & x_{1j} & \cos(x_{1j}) & \cos(2x_{1j}) & \cdots & \cos(Kx_{1j}) \\ \frac{1}{2} & x_{2j} & \cos(x_{2j}) & \cos(2x_{2j}) & \cdots & \cos(Kx_{2j}) \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{1}{2} & x_{nj} & \cos(x_{nj}) & \cos(2x_{nj}) & \cdots & \cos(Kx_{nj}) \end{bmatrix}, \quad j = 1, 2, \ldots, p,$$

and $\tilde{B}_j = [\alpha_j\ \beta_j\ \gamma_{1j}\ \gamma_{2j}\ \cdots\ \gamma_{Kj}]'$ for $j = 1, 2, \ldots, p$.
In general, for $i = 1, 2, \ldots, n$, the nonparametric regression model given in Eq. (3) could be written in matrix and vector form as follows.

$$\tilde{y} = \tilde{f} + \tilde{\varepsilon} = \mathbf{X}(K)\tilde{B} + \tilde{\varepsilon}, \tag{7}$$

where $\tilde{y} = [y_1\ y_2\ \cdots\ y_n]'$ is the response vector and $\tilde{\varepsilon} = [\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_n]'$ is the error vector, with $\tilde{\varepsilon} \sim N(\tilde{0}, \sigma^2\mathbf{I})$.
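To make the construction of 𝐗(𝐾) in (6) concrete, the sketch below (our own illustrative Python, not the authors' R package) stacks the per-predictor blocks 𝐗𝑗(𝐾), each with columns [1/2, 𝑥𝑗, cos(𝑥𝑗), …, cos(𝐾𝑥𝑗)]:

```python
import numpy as np

def design_matrix(X, K):
    """Build X(K) = [X_1(K) ... X_p(K)] from an n x p predictor matrix.

    Each block X_j(K) has the columns [1/2, x_j, cos(x_j), ..., cos(K x_j)],
    so the stacked matrix has p(K+2) columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    blocks = []
    for j in range(p):
        xj = X[:, j]
        cols = [np.full(n, 0.5), xj] + [np.cos(k * xj) for k in range(1, K + 1)]
        blocks.append(np.column_stack(cols))
    return np.hstack(blocks)
```

Because the 1/2 column is repeated in every block, 𝐗′(𝐾)𝐗(𝐾) is rank-deficient (rank 𝑝(𝐾+1)+1, matching 𝑑1 in Theorem 2), which is worth keeping in mind when inverting it numerically.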
Parameter estimation
Obtaining the estimate of the regression curve 𝑓̃ is equivalent to obtaining the estimates of the parameters. As in many nonparametric regression models, there are many methods to obtain the estimates of the parameters, such as the Penalized Least Squares (PLS) method if the regression curve 𝑓̃ is assumed to be a smooth function [8,10,13,14]. However, if the regression curve 𝑓̃ is assumed to be only an unknown function and is presented as a linear model as in (7), then we could use the Ordinary Least Squares (OLS) method, which minimizes the sum of squared errors. By the optimization of the OLS method, the estimate of 𝐵̃ could be obtained as follows [12,15-17].
$$\hat{\tilde{B}} = \arg\min_{\tilde{B} \in R^{p(K+2)}} \left\{\tilde{\varepsilon}'\tilde{\varepsilon}\right\} = \arg\min_{\tilde{B} \in R^{p(K+2)}} \left\{\left(\tilde{y} - \mathbf{X}(K)\tilde{B}\right)'\left(\tilde{y} - \mathbf{X}(K)\tilde{B}\right)\right\} = \mathbf{A}(K)\tilde{y}, \tag{8}$$

where $\hat{\tilde{B}} = [\hat{\tilde{B}}_1'\ \hat{\tilde{B}}_2'\ \cdots\ \hat{\tilde{B}}_p']' = [\hat{\alpha}_1\ \hat{\beta}_1\ \hat{\gamma}_{11}\ \cdots\ \hat{\gamma}_{K1}\ \ \hat{\alpha}_2\ \hat{\beta}_2\ \hat{\gamma}_{12}\ \cdots\ \hat{\gamma}_{K2}\ \cdots\ \hat{\alpha}_p\ \hat{\beta}_p\ \hat{\gamma}_{1p}\ \cdots\ \hat{\gamma}_{Kp}]'$, $\tilde{y} = [y_1\ y_2\ \cdots\ y_n]'$, and $\mathbf{A}(K) = \left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K)$.
Based on Eqs. (6) and (8), we obtain the estimated curve of the nonparametric regression with the Fourier series approach. Note that 𝑓̃ is the regression curve and 𝐵̃ is the parameter vector in the model, which we estimate by 𝐵̃̂ as in Eq. (8). Therefore, the estimate of the regression curve 𝑓̃ is 𝑓̃̂ as follows.
$$\hat{\tilde{f}} = \mathbf{X}(K)\hat{\tilde{B}} = \mathbf{X}(K)\mathbf{A}(K)\tilde{y} = \mathbf{V}(K)\tilde{y}, \tag{9}$$

where $\mathbf{V}(K) = \mathbf{X}(K)\mathbf{A}(K) = \mathbf{X}(K)\left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K)$.
The estimate of the regression curve in (9) is in matrix and vector form. In general, by analogy with Eqs. (6) and (4), the estimated regression curve in nonparametric regression with the Fourier series approach could be written as follows.
$$\hat{f}_j(x_{ij}) = \frac{1}{2}\hat{\alpha}_j + \hat{\beta}_j x_{ij} + \sum_{k=1}^{K} \hat{\gamma}_{kj} \cos(k x_{ij}). \tag{10}$$
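A minimal sketch of the OLS step in (8)–(10); we use a least-squares solve instead of the explicit inverse because the duplicated 1/2 columns of 𝐗(𝐾) make 𝐗′(𝐾)𝐗(𝐾) singular in practice (this numerical workaround is our own choice, not part of the paper):

```python
import numpy as np

def fit_fourier_ols(X_K, y):
    """Return the OLS estimate B_hat minimizing ||y - X(K) B||^2
    and the fitted curve f_hat = X(K) B_hat (Eq. (9))."""
    y = np.asarray(y, dtype=float)
    B_hat, *_ = np.linalg.lstsq(X_K, y, rcond=None)  # minimum-norm least squares
    return B_hat, X_K @ B_hat
```

When the response lies in the column space of 𝐗(𝐾), the fitted values reproduce it exactly, which makes for a simple correctness check.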
Furthermore, the parameter estimate 𝐵̃̂ obtained using the OLS method in (8) contains an unknown parameter, namely the oscillation parameter 𝐾. In nonparametric regression with the OLS method, there is always one such unknown parameter: the knot in the Spline function, the bandwidth in the Kernel function, and the oscillation parameter in the Fourier series function. Therefore, obtaining the best parameter estimate for the model amounts to obtaining the optimum number of 𝐾 in the Fourier series function. The methods which could be used to obtain the optimum number of 𝐾 are the Cross Validation (CV) and the Generalized Cross Validation (GCV) methods. The GCV method has been developed by many researchers [see, 18–19] for the Spline function and
Bilodeau for the Fourier series function [8]. The GCV formula for choosing the optimum number of the oscillation parameter 𝐾 is given as follows (the optimum number of 𝐾 is the one that minimizes the GCV value):

$$\mathrm{GCV}(K) = \frac{\mathrm{MSE}(K)}{\left(n^{-1}\,\mathrm{trace}[\mathbf{I} - \mathbf{V}(K)]\right)^2} = \frac{n^{-1}\tilde{y}'(\mathbf{I} - \mathbf{V}(K))'(\mathbf{I} - \mathbf{V}(K))\tilde{y}}{\left(n^{-1}\,\mathrm{trace}[\mathbf{I} - \mathbf{V}(K)]\right)^2}. \tag{11}$$
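The GCV criterion (11) can be sketched as follows (our own illustrative code; the hat matrix 𝐕(𝐾) is formed with a pseudoinverse for numerical stability):

```python
import numpy as np

def gcv(X_K, y):
    """GCV(K) = [n^{-1} y'(I-V)'(I-V)y] / [n^{-1} trace(I-V)]^2, with V = X(K) X(K)^+."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    V = X_K @ np.linalg.pinv(X_K)  # hat matrix V(K)
    r = y - V @ y                  # residual vector (I - V) y
    mse = (r @ r) / n
    return mse / (np.trace(np.eye(n) - V) / n) ** 2
```

The optimum 𝐾 would then be the candidate with the smallest GCV value, e.g. `min(range(1, K_max + 1), key=lambda K: gcv(design(K), y))` for some design-matrix builder `design` (a hypothetical helper).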
Parameter hypothesis testing plays an important role in modelling and is part of statistical inference, which is essential in regression analysis. Parameter hypothesis testing is used to determine whether the estimated parameters have a significant influence on the model or not. In the nonparametric regression model with the Fourier series approach, parameter hypothesis testing has not been carried out previously. Referring to previous research, this model was used by several researchers for modelling or even for prediction in various fields and datasets [see, 12,15-17]. However, these studies focused only on estimation and modelling. Therefore, it is essential to develop a method for parameter hypothesis testing in a nonparametric regression model with the Fourier series approach. One of the methods that could be used for parameter hypothesis testing is the LRT method. The LRT method compares the goodness of fit of two different models (the model under the null hypothesis and the model under the hypothesis). This method is widely used for parameter hypothesis testing in many regression analyses [see, 20-22].
According to Casella and Berger, the LRT method for hypothesis testing is related to Maximum Likelihood Estimation (MLE) [23]. Let $Z_1, Z_2, \ldots, Z_n$ be random samples from a population with the Probability Density Function (PDF) $f(z|\mu)$, where $\mu$ is a parameter ($\mu$ may also be a vector); then the likelihood function could be defined as follows.

$$L(\mu|z_1, z_2, \ldots, z_n) = L(\mu|z) = \prod_{i=1}^{n} f(z_i|\mu).$$
Definition 1. Suppose $\Theta$ is the parameter space; then the LRT statistic for testing $H_0: \mu \in \Theta_0$ versus $H_1: \mu \in \Theta_0^c$ is given as follows.

$$\Lambda = \frac{\sup_{\Theta_0} L(\mu|z)}{\sup_{\Theta} L(\mu|z)}, \quad 0 < \Lambda \le 1, \tag{12}$$

where $L(\mu|z)$ is the likelihood function with parameter $\mu$, and the LRT rejects the null hypothesis in the region $\{z_1, z_2, \ldots, z_n \,|\, \Lambda \le c\}$, where $c$ is a constant with $0 \le c \le 1$. Suppose $\hat{\mu}$ is the parameter estimate under the parameter space $\Theta$ and $\hat{\mu}_0$ is the parameter estimate under the parameter space $\Theta_0$, both obtained by MLE, i.e., by maximizing the likelihood function. Therefore, the LRT in (12) could be written as follows.

$$\Lambda = \frac{L(\hat{\mu}_0|z)}{L(\hat{\mu}|z)} = \frac{\max_{\Theta_0} L(\mu|z)}{\max_{\Theta} L(\mu|z)}, \quad 0 < \Lambda \le 1. \tag{13}$$
As the central objective of this research is to develop a method for parameter hypothesis testing in nonparametric regression
with the Fourier series approach, the initial step involves the formulation of the hypothesis. Suppose the hypothesis form is given as
follows.
$$H_0: \alpha_1 = \cdots = \alpha_p = \beta_1 = \cdots = \beta_p = \gamma_{11} = \cdots = \gamma_{1p} = \cdots = \gamma_{K1} = \cdots = \gamma_{Kp} = 0 \quad \text{vs}$$
$$H_1: \text{at least one of } \alpha_j \ne 0,\ \beta_j \ne 0,\ \gamma_{kj} \ne 0, \text{ for } j = 1, 2, \ldots, p \text{ and } k = 1, 2, \ldots, K. \tag{14}$$
The hypothesis given in (14) is a form of hypothesis which tests two different models (a model without parameters and a model containing at least one of the parameters). Mathematically, the hypothesis in (14) could be written in the following form.
$$H_0: E(y_i|x_{ij}) = \sum_{j=1}^{p} f_j(x_{ij}) = 0 \quad vs \quad H_1: E(y_i|x_{ij}) = \sum_{j=1}^{p} f_j(x_{ij}) \ne 0. \tag{15}$$
Under the assumption of model (2), the $\varepsilon_i$ are normally distributed with mean 0 and constant variance $\sigma^2$, so the PDF of $\varepsilon_i \sim N(0, \sigma^2)$ is

$$g(\varepsilon_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\varepsilon_i - 0)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\varepsilon_i^2}{2\sigma^2}\right). \tag{16}$$
Based on (3), where $\varepsilon_i = y_i - \sum_{j=1}^{p} f_j(x_{ij})$ with the $f_j$ Fourier series functions, we obtain the likelihood function of (16) as follows.

$$L(y_1, y_2, \ldots, y_n|\sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(y_i - \sum_{j=1}^{p} f_j(x_{ij})\right)^2}{2\sigma^2}\right) = \left(2\pi\sigma^2\right)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p} f_j(x_{ij})\right)^2\right). \tag{17}$$
Suppose $\omega$ is the parameter space under the null hypothesis and $\Omega$ is the parameter space under the hypothesis. Based on the hypothesis form in (14) and the likelihood function (17), we could define the parameter space under the null hypothesis, $\omega$, and the parameter space under the hypothesis, $\Omega$, as follows.
$$\omega = \left\{\sigma_\omega^2\right\} \quad \text{and} \quad \Omega = \left\{\alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_p, \gamma_{11}, \ldots, \gamma_{1p}, \ldots, \gamma_{K1}, \ldots, \gamma_{Kp}, \sigma_\Omega^2\right\}. \tag{18}$$
Based on Definition 1, the statistical test for testing the hypothesis form given in (14) could be obtained by comparing the maximum
likelihood under the parameter space of the null hypothesis (𝜔) and the parameter space under the hypothesis (Ω) which is given in
Theorem 1. However, before presenting Theorem 1, let’s first introduce Lemma 1 and Lemma 2. Lemma 1 provides a summary of
how to obtain the maximum likelihood under the parameter space of the null hypothesis (𝜔) and Lemma 2 provides a summary of
how to obtain the maximum likelihood under the parameter space of the hypothesis (Ω).
Lemma 1. Suppose $\omega$ is the parameter space under the null hypothesis (18); then the maximum of the likelihood function (17) is

$$\max_\omega L(\omega) = \left(2\pi\hat{\sigma}_\omega^2\right)^{-\frac{n}{2}} \exp\left(-\frac{n}{2}\right), \tag{19}$$

where $\hat{\sigma}_\omega^2 = \frac{\tilde{y}'\tilde{y}}{n}$.
Proof. In this case, the parameter space $\omega$ contains only the variance, since we define all the parameters under the null hypothesis to be zero. By the likelihood function (17) and the parameter space $\omega$ (18), we obtain the likelihood function under the parameter space $\omega$ as follows.

$$L(\omega) = \left(2\pi\sigma_\omega^2\right)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma_\omega^2}\sum_{i=1}^{n} y_i^2\right) = \left(2\pi\sigma_\omega^2\right)^{-\frac{n}{2}} \exp\left(-\frac{\tilde{y}'\tilde{y}}{2\sigma_\omega^2}\right), \tag{20}$$

where $\tilde{y}$ is the vector of the response variable. Furthermore, to obtain the maximum of (20) we estimate the parameter $\sigma_\omega^2$ by solving $\frac{\partial \ln L(\omega)}{\partial \sigma_\omega^2} = 0$. The natural logarithm of $L(\omega)$ in (20) is given as follows.

$$\ln L(\omega) = \ln\left(\left(2\pi\sigma_\omega^2\right)^{-\frac{n}{2}} \exp\left(-\frac{\tilde{y}'\tilde{y}}{2\sigma_\omega^2}\right)\right) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\left(\sigma_\omega^2\right) - \frac{\tilde{y}'\tilde{y}}{2\sigma_\omega^2}. \tag{21}$$

Taking the partial derivative of (21) with respect to $\sigma_\omega^2$ and setting it equal to zero gives

$$\frac{\partial \ln L(\omega)}{\partial \sigma_\omega^2} = 0 - \frac{n}{2\sigma_\omega^2} + \frac{\tilde{y}'\tilde{y}}{2\sigma_\omega^4} = -\frac{1}{2\sigma_\omega^2}\left(n - \frac{\tilde{y}'\tilde{y}}{\sigma_\omega^2}\right) = 0. \tag{22}$$

Solving (22) yields

$$\hat{\sigma}_\omega^2 = \frac{\tilde{y}'\tilde{y}}{n}. \tag{23}$$

Substituting (23) into (20) gives (19). $\square$
Lemma 2. Suppose $\Omega$ is the parameter space under the hypothesis (18); then the maximum of the likelihood function (17) is

$$\max_\Omega L(\Omega) = \left(2\pi\hat{\sigma}_\Omega^2\right)^{-\frac{n}{2}} \exp\left(-\frac{n}{2}\right), \tag{24}$$

where $\hat{\sigma}_\Omega^2 = \frac{\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)}{n}$ and $\hat{\tilde{B}}_\Omega = \left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K)\tilde{y}$.
Proof. Note that the parameter space $\Omega$ contains all the parameters in the model (the full model). Under the parameter space $\Omega$, the likelihood function (17) could be written as follows.

$$L(\Omega) = \left(2\pi\sigma_\Omega^2\right)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma_\Omega^2}\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p} f_j(x_{ij})\right)^2\right). \tag{25}$$

Since the $f_j$ are unknown functions that we approximate by the Fourier series function (4), the likelihood function (25) becomes

$$L(\Omega) = \left(2\pi\sigma_\Omega^2\right)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma_\Omega^2}\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p}\left(\frac{1}{2}\alpha_j + \beta_j x_{ij} + \sum_{k=1}^{K}\gamma_{kj}\cos(kx_{ij})\right)\right)^2\right)$$
$$= \left(2\pi\sigma_\Omega^2\right)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma_\Omega^2}\left(\tilde{y} - \mathbf{X}(K)\tilde{B}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\tilde{B}_\Omega\right)\right). \tag{26}$$
The likelihood function under the parameter space $\Omega$ (26) could be maximized by obtaining the estimates of $\tilde{B}_\Omega$ and $\sigma_\Omega^2$. The estimate of $\tilde{B}_\Omega$ is obtained by solving $\frac{\partial \ln L(\Omega)}{\partial \tilde{B}_\Omega} = 0$ as follows.

$$\frac{\partial \ln L(\Omega)}{\partial \tilde{B}_\Omega} = \frac{\partial\left(-\frac{n}{2}\ln\left(2\pi\sigma_\Omega^2\right) - \frac{1}{2\sigma_\Omega^2}\left(\tilde{y}'\tilde{y} - 2\tilde{B}_\Omega'\mathbf{X}'(K)\tilde{y} + \tilde{B}_\Omega'\mathbf{X}'(K)\mathbf{X}(K)\tilde{B}_\Omega\right)\right)}{\partial \tilde{B}_\Omega} = 0$$
$$\Rightarrow -2\mathbf{X}'(K)\tilde{y} + 2\mathbf{X}'(K)\mathbf{X}(K)\tilde{B}_\Omega = 0$$
$$\Rightarrow \hat{\tilde{B}}_\Omega = \left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K)\tilde{y}. \tag{27}$$
The estimate of $\sigma_\Omega^2$ is obtained in the same way as $\sigma_\omega^2$ in Lemma 1, by solving $\frac{\partial \ln L(\Omega)}{\partial \sigma_\Omega^2} = 0$ as follows.

$$\frac{\partial \ln L(\Omega)}{\partial \sigma_\Omega^2} = \frac{\partial\left(-\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\left(\sigma_\Omega^2\right) - \frac{1}{2\sigma_\Omega^2}\left(\tilde{y} - \mathbf{X}(K)\tilde{B}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\tilde{B}_\Omega\right)\right)}{\partial \sigma_\Omega^2} = 0$$
$$\Rightarrow -\frac{1}{2\sigma_\Omega^2}\left(n - \frac{1}{\sigma_\Omega^2}\left(\tilde{y} - \mathbf{X}(K)\tilde{B}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\tilde{B}_\Omega\right)\right) = 0$$
$$\Rightarrow \sigma_\Omega^2 = \frac{\left(\tilde{y} - \mathbf{X}(K)\tilde{B}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\tilde{B}_\Omega\right)}{n}. \tag{28}$$

Replacing $\tilde{B}_\Omega$ by its estimate $\hat{\tilde{B}}_\Omega$ in (27) gives

$$\hat{\sigma}_\Omega^2 = \frac{\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)}{n}. \tag{29}$$
Therefore, substituting $\hat{\tilde{B}}_\Omega$ (27) and $\hat{\sigma}_\Omega^2$ (29) into the likelihood function (26), we obtain the maximum of the likelihood function

$$\max_\Omega L(\Omega) = \left(2\pi\hat{\sigma}_\Omega^2\right)^{-\frac{n}{2}} \exp\left(-\frac{\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)}{2\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)/n}\right) = \left(2\pi\hat{\sigma}_\Omega^2\right)^{-\frac{n}{2}} \exp\left(-\frac{n}{2}\right). \quad \square$$
Furthermore, the statistical test for testing the hypothesis in (14) could be obtained using the LRT method, as given in Theorem 1.

Theorem 1. Given the nonparametric regression model (3) with $f_j$ approximated by the Fourier series function (4), by using the LRT method, the statistical test for testing the hypothesis in (14) is

$$\Lambda^* > c^*, \tag{30}$$

where $\Lambda^* = \frac{\left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y}/d_1}{\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)/d_2}$, with $d_1$ and $d_2$ given in Theorem 2, and $c^* = \left(c^{-\frac{2}{n}} - 1\right)\frac{d_2}{d_1}$ with $0 \le c \le 1$.
Proof. The error in (3) is assumed to be normally distributed with mean 0 and constant variance $\sigma^2$, and the hypothesis is given in (14), with $\omega$ the parameter space under the null hypothesis and $\Omega$ the parameter space under the hypothesis. Therefore, by Definition 1 of the LRT in (13), we obtain

$$\Lambda = \frac{\max_\omega L(\omega)}{\max_\Omega L(\Omega)}. \tag{31}$$
Since $\hat{\sigma}_\omega^2$ and $\hat{\sigma}_\Omega^2$ are given in (23) and (29), substituting the maximized likelihoods (19) and (24) into (31) yields

$$\Lambda = \left(\frac{\tilde{y}'\tilde{y}/n}{\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)/n}\right)^{-\frac{n}{2}} = \left(\frac{\tilde{y}'\tilde{y}}{\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)}\right)^{-\frac{n}{2}}. \tag{33}$$

Since $\hat{\tilde{B}}_\Omega = \left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K)\tilde{y}$ (27), then $\left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\mathbf{X}(K)\hat{\tilde{B}}_\Omega = \left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y}$. Thus,

$$\tilde{y}'\tilde{y} = \left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right) + \left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y}. \tag{34}$$
By Definition 1, the null hypothesis is rejected when $\Lambda \le c$. Using (34), we could write (33) as

$$\Lambda = \left(1 + \frac{\left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y}}{\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)}\right)^{-\frac{n}{2}} \le c \;\Longleftrightarrow\; \frac{\left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y}}{\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)} \ge c^{-\frac{2}{n}} - 1. \tag{36}$$

Let $d_1$ and $d_2$ be the degrees of freedom, given later in Theorem 2. Multiplying both sides of (36) by $\frac{d_2}{d_1}$, we obtain the statistical test for the hypothesis in (14) as follows.

$$\frac{\left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y}/d_1}{\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)/d_2} \ge \left(c^{-\frac{2}{n}} - 1\right)\frac{d_2}{d_1}$$
$$\Lambda^* > c^*. \quad \square$$
The form of the statistical test that we obtain in Theorem 1 is for testing the hypothesis form in (14). To determine whether the
null hypothesis presented in (14) is rejected or fails to be rejected by using the statistical test (30), we need to establish the rejection
region for the null hypothesis by determining the distribution of the statistical test. The distribution of the statistical test of Λ∗ is
given in Theorem 2. However, to support the proof of Theorem 2, it is necessary to simplify the statistical test of Λ∗ as provided in
Corollary 1 below.
Corollary 1. The statistical test $\Lambda^*$ in Theorem 1 could be written as $\Lambda^* = \frac{\tilde{y}'\mathbf{V}(K)\tilde{y}/d_1}{\tilde{y}'\mathbf{U}(K)\tilde{y}/d_2}$, where $\mathbf{U}(K) = \mathbf{I} - \mathbf{V}(K)$.

Proof. Note that the statistical test $\Lambda^*$ is given by Theorem 1. Let $M = \left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y}$ and $N = \left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)$. Since

$$N = \left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right) = \tilde{y}'\tilde{y} - 2\left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y} + \left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\mathbf{X}(K)\hat{\tilde{B}}_\Omega,$$

and, following (34), $\left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\mathbf{X}(K)\hat{\tilde{B}}_\Omega = \left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y}$, thus

$$N = \tilde{y}'\tilde{y} - \tilde{y}'\mathbf{X}(K)\left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K)\tilde{y} = \tilde{y}'\left(\mathbf{I} - \mathbf{X}(K)\left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K)\right)\tilde{y} = \tilde{y}'\mathbf{U}(K)\tilde{y}. \tag{39}$$

Similarly, $M = \left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y} = \tilde{y}'\mathbf{V}(K)\tilde{y}$. $\square$
Theorem 2. Let $\Lambda^*$ be the statistical test given by Theorem 1, simplified by Corollary 1; then the statistical test $\Lambda^*$ follows the $F_{(d_1, d_2)}$ distribution:

$$\Lambda^* = \frac{\left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y}/d_1}{\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)/d_2} \sim F_{(d_1, d_2)}, \tag{40}$$

where $d_1 = p(K+1) + 1$ and $d_2 = n - (p(K+1) + 1)$, if the following conditions are fulfilled.

i. The distribution of $\frac{\tilde{y}'\mathbf{V}(K)\tilde{y}}{\sigma^2}$ is $\chi^2_{(d_1)}$, where $d_1 = p(K+1) + 1$.
ii. The distribution of $\frac{\tilde{y}'\mathbf{U}(K)\tilde{y}}{\sigma^2}$ is $\chi^2_{(d_2)}$, where $d_2 = n - (p(K+1) + 1)$.
iii. $\mathbf{V}(K)$ and $\mathbf{U}(K)$ are independent.
Proof. Dividing the numerator and denominator of the statistical test $\Lambda^*$ by $\sigma^2$, by Corollary 1 we have $\Lambda^* = \frac{\tilde{y}'\mathbf{V}(K)\tilde{y}/(\sigma^2 d_1)}{\tilde{y}'\mathbf{U}(K)\tilde{y}/(\sigma^2 d_2)}$. We then obtain the distributions of $\frac{\tilde{y}'\mathbf{V}(K)\tilde{y}}{\sigma^2}$ and $\frac{\tilde{y}'\mathbf{U}(K)\tilde{y}}{\sigma^2}$.
i. Since $\frac{\tilde{y}'\mathbf{V}(K)\tilde{y}}{\sigma^2}$ is a quadratic form, its $\chi^2_{(d_1)}$ distribution follows from showing that $\mathbf{V}(K)$ is a symmetric matrix, $\mathbf{V}'(K) = \mathbf{V}(K)$, and an idempotent matrix, $\mathbf{V}^2(K) = \mathbf{V}(K)$.

$$\mathbf{V}'(K) = \left(\mathbf{X}(K)\left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K)\right)' = \mathbf{X}(K)\left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K) = \mathbf{V}(K). \tag{41}$$

$$\mathbf{V}^2(K) = \mathbf{V}'(K)\mathbf{V}(K) = \mathbf{X}(K)\left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K)\mathbf{X}(K)\left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K) = \mathbf{X}(K)\left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K) = \mathbf{V}(K). \tag{42}$$

Based on (41) and (42), the matrix $\mathbf{V}(K)$ is symmetric and idempotent; thus $\frac{\tilde{y}'\mathbf{V}(K)\tilde{y}}{\sigma^2} \sim \chi^2_{(d_1)}$. The degrees of freedom could be obtained as $d_1 = \mathrm{trace}(\mathbf{V}(K))$. Since $\mathbf{X}(K)$ is a matrix with $n$ rows and $p(K+2)$ columns in which the $l$th columns are identical for $l = h(K+1) + 1 + h$ with $h = 0, 1, 2, \ldots, p-1$ (see Eq. (6) for the column elements of $\mathbf{X}(K)$), then

$$d_1 = \mathrm{trace}(\mathbf{V}(K)) = \mathrm{trace}\left(\mathbf{X}(K)\left(\mathbf{X}'(K)\mathbf{X}(K)\right)^{-1}\mathbf{X}'(K)\right) = p(K+1) + 1.$$
ii. Following (i), we obtain the distribution of $\frac{\tilde{y}'\mathbf{U}(K)\tilde{y}}{\sigma^2}$ by showing that $\mathbf{U}(K)$ is a symmetric matrix, $\mathbf{U}'(K) = \mathbf{U}(K)$, and an idempotent matrix, $\mathbf{U}^2(K) = \mathbf{U}(K)$.

$$\mathbf{U}'(K) = (\mathbf{I} - \mathbf{V}(K))' = \mathbf{I}' - \mathbf{V}'(K) = \mathbf{I} - \mathbf{V}(K) = \mathbf{U}(K). \tag{43}$$

By (41) and (42), we know that $\mathbf{V}'(K) = \mathbf{V}(K)$ and $\mathbf{V}'(K)\mathbf{V}(K) = \mathbf{V}(K)$, so we obtain

$$\mathbf{U}^2(K) = \mathbf{I} - \mathbf{V}(K) = \mathbf{U}(K). \tag{44}$$
Table 1. Variable description.

Based on (43) and (44), the matrix $\mathbf{U}(K)$ is symmetric and idempotent; thus $\frac{\tilde{y}'\mathbf{U}(K)\tilde{y}}{\sigma^2} \sim \chi^2_{(d_2)}$. Since $\mathbf{I}$ is the identity matrix with dimension $n \times n$, and following $d_1$ in (i), we obtain:

$$d_2 = \mathrm{trace}(\mathbf{U}(K)) = \mathrm{trace}(\mathbf{I} - \mathbf{V}(K)) = \mathrm{trace}(\mathbf{I}) - \mathrm{trace}(\mathbf{V}(K)) = n - (p(K+1) + 1).$$
iii. $\mathbf{V}(K)$ and $\mathbf{U}(K)$ are independent if $\mathbf{V}(K)\mathbf{U}(K) = \mathbf{0}$. We have $\mathbf{V}(K)\mathbf{U}(K) = \mathbf{V}(K)(\mathbf{I} - \mathbf{V}(K))$. Since $\mathbf{V}'(K) = \mathbf{V}(K)$ and $\mathbf{V}'(K)\mathbf{V}(K) = \mathbf{V}(K)$, then $\mathbf{V}(K)\mathbf{V}(K) = \mathbf{V}(K)$; thus

$$\mathbf{V}(K)\mathbf{U}(K) = \mathbf{V}(K) - \mathbf{V}(K)\mathbf{V}(K) = \mathbf{V}(K) - \mathbf{V}(K) = \mathbf{0}.$$
Based on (i) and (ii), we have proved that $\frac{\tilde{y}'\mathbf{V}(K)\tilde{y}}{\sigma^2} \sim \chi^2_{(p(K+1)+1)}$ and $\frac{\tilde{y}'\mathbf{U}(K)\tilde{y}}{\sigma^2} \sim \chi^2_{(n-(p(K+1)+1))}$, and by (iii) $\mathbf{V}(K)$ and $\mathbf{U}(K)$ are independent. Therefore, the statistical test $\Lambda^*$ given in Theorem 2, followed by Corollary 1, is distributed as $F_{(d_1, d_2)}$:

$$\Lambda^* = \frac{\tilde{y}'\mathbf{V}(K)\tilde{y}/(\sigma^2 d_1)}{\tilde{y}'\mathbf{U}(K)\tilde{y}/(\sigma^2 d_2)} = \frac{\left(\mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\tilde{y}/d_1}{\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)'\left(\tilde{y} - \mathbf{X}(K)\hat{\tilde{B}}_\Omega\right)/d_2} \sim F_{(d_1, d_2)}.$$
Since the statistical test given in Theorem 1 has rejection region $\{(\tilde{y}, \mathbf{X}(K)) \,|\, \Lambda^* \ge c^*\}$, suppose $\alpha$ is a significance level with $0 < \alpha < 1$. By Theorem 2, $\Lambda^* \sim F_{(d_1, d_2)}$ with $d_1 = p(K+1)+1$ and $d_2 = n - (p(K+1)+1)$, and $c^* = \left(c^{-\frac{2}{n}} - 1\right)\frac{d_2}{d_1}$. Thus $c^*$ could be obtained by integrating the PDF of the $F$ distribution and equating the tail probability to $\alpha$, which gives $c^* = F_{table} = F_{(\alpha, p(K+1)+1, n-(p(K+1)+1))}$. Therefore, the null hypothesis has the rejection region

$$\alpha = P\left(\text{reject } H_0 \,|\, H_0 \text{ is true}\right) = P\left(\Lambda^* \ge c^* \,|\, \tilde{B} = \tilde{0}\right) = P\left(\Lambda^* \ge F_{(\alpha, p(K+1)+1, n-(p(K+1)+1))} \,|\, \tilde{B} = \tilde{0}\right). \tag{45}$$

Based on (45), the null hypothesis is rejected if $\Lambda^* \ge F_{(\alpha, p(K+1)+1, n-(p(K+1)+1))}$ (for given $\alpha$), or if the probability value is smaller than the significance level, $P(F \ge \Lambda^*) < \alpha$, where $P(F \ge \Lambda^*) = \int_{\Lambda^*}^{\infty} f(F)\,dF$ with $f(F)$ the PDF of the $F$ distribution. $\square$
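The test statistic Λ∗ and its degrees of freedom can be computed as in the sketch below (our own code, not the authors' R package; the critical value 𝐹(𝛼,𝑑1,𝑑2) or the p-value would come from an F table or a statistics library such as scipy.stats.f, which we do not assume here):

```python
import numpy as np

def lrt_statistic(X_K, y):
    """Lambda* = [(X(K) B_hat)' y / d1] / [residual sum of squares / d2],
    with d1 = trace(V(K)) and d2 = n - d1 (Theorem 2)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    V = X_K @ np.linalg.pinv(X_K)  # hat matrix V(K)
    fit = V @ y                    # X(K) B_hat
    d1 = int(round(np.trace(V)))   # p(K+1)+1 for the Fourier design
    d2 = n - d1
    lam = (fit @ y / d1) / ((y - fit) @ (y - fit) / d2)
    return lam, d1, d2
```

H0 is then rejected at level 𝛼 when Λ∗ ≥ 𝐹(𝛼,𝑑1,𝑑2), or equivalently when 𝑃(𝐹 ≥ Λ∗) < 𝛼.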
Method validation
We use secondary data to apply the method. The data we use are ROA data collected from the annual reports of 47 go public banks in 2020. The 47 go public banks are banks that carry out stock trading on the Indonesia stock exchange; the list of the 47 go public banks could be seen in Table A1 (see Appendix A). We use ROA data as the response variable (𝑦) and 5 predictor variables (𝑥); the details of the variables are described in Table 1. The data analysis steps are given as follows.
1. Create a scatterplot between the response and all the predictor variables.
2. Assume that the relationship between the response and all the predictor variables follows the nonparametric regression model with the Fourier series approach.
3. Choose the optimum number of 𝐾 when the number of 𝐾 is the same for all the predictor variables by using the GCV method (11).
4. Choose the optimum number of 𝐾 when the number of 𝐾 is different for each predictor variable by using the GCV method (11).
5. Select the best model between steps 3 and 4 based on the smallest GCV value.
6. Estimate the parameters.
7. Create the hypothesis form for testing the parameters.
8. Calculate the statistic value based on the statistical test and the probability value based on the distribution of the statistical
test.
9. Compare the value between the probability value and the significance level of 𝛼.
In this research, we use the R programming language for the exploration and analysis of ROA data. To streamline the implementa-
tion of the method detailed in this research, we have developed a package (syntax), which was created using R-Studio. This package
encompasses the data analysis steps described in this research. We have made this package publicly available to facilitate its applica-
tion to various datasets. The package can be accessed through the following link (https://rpubs.com/Authorsdataanalysis/1104036).
We create a scatterplot for each predictor variable versus the response variable to identify whether the relationship between the response variable and each predictor variable follows the nonparametric regression model; the scatterplots are given in Fig. 1.
Based on Fig. 1, we could observe that the relationship patterns between the response variable and the predictor variables 𝑥1, 𝑥2, 𝑥3, 𝑥5 do not exhibit specific patterns, while 𝑥4 exhibits a tendency toward linearity. However, the relationship pattern of 𝑥4 could not be definitively established as linear without further analysis. To address this, we conducted an analysis modelling 𝑥4 using linear parametric regression and obtained 𝑅² of 74.66 % with MSE of 1.898. We also conducted a comparison using nonparametric regression with the Fourier series approach. For the number of 𝐾 equal to 1, we obtained 𝑅² of 75.52 % with MSE of 1.834. Moreover, we conducted a trial setting the maximum number of 𝐾 to 10 and obtained an optimum 𝐾 of 7 based on the minimum GCV value of 1.803, with 𝑅² of 84.26 % and MSE of 1.179. Based on the results of these trial analyses of 𝑥4 using linear parametric regression and nonparametric regression with the Fourier series approach, we could conclude that, even though 𝑥4 initially demonstrates a tendency toward linearity in the scatterplot, upon further analysis 𝑥4 is better modelled using nonparametric regression with the Fourier series, whether the number of 𝐾 is 1 or 7 (as could be seen from the 𝑅² and MSE values). Therefore, in this research, we chose to model the ROA data using nonparametric regression with the Fourier series approach for all the predictor variables.
Since the Fourier series function depends on the number of 𝐾, we use the GCV method to obtain the optimum number of 𝐾. In this research, we carried out several trials related to the number of 𝐾, including when the number of 𝐾 is the same for all the predictor variables and when the number of 𝐾 is different for each predictor variable. In the analysis, we used a maximum number of 𝐾 of 5 and obtained the values of GCV, 𝑅², and MSE for the case where the number of 𝐾 is the same for all the predictor variables in Table 2.
We obtain the minimum GCV value of 1.772, which means that the optimum number of 𝐾 is 2. Although the value of 𝑅² is highest when the number of 𝐾 is 5, the corresponding GCV value of 2.507 is higher than the GCV value when the number of 𝐾 is 2. Therefore, the
Table 2. GCV values for the number of 𝐾 the same for all the predictor variables (columns: 𝐾, GCV, 𝑅², MSE).

Table 3. Minimum GCV values for the optimum combinations of 𝐾 (columns: 𝑥1, 𝑥2, 𝑥3, 𝑥4, 𝑥5).

Table 4. Parameter estimations.
best estimation model for the ROA data using nonparametric regression with the Fourier series approach is the one with the number of 𝐾 equal to 2 for all the predictor variables. Moreover, we evaluated several combinations of the number of 𝐾 for each predictor variable, taking the maximum number of 𝐾 as 2, 3, 4, and 5. For a maximum 𝐾 of 2 we have 32 combinations, for 3 we have 243 combinations, for 4 we have 1024 combinations, and for 5 we have 3125 combinations. Based on the results of the analysis, we obtain the minimum GCV value, as well as the 𝑅² and MSE values, for the combination of 𝐾 over the predictor variables when the maximum number of 𝐾 is 2, 3, 4, and 5.
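The combination counts quoted above are simply 𝐾max^𝑝 for 𝑝 = 5 predictors; a small sketch (our own code) enumerating the candidate 𝐾 combinations that would each be scored with the GCV criterion (11):

```python
import itertools

def k_combinations(p, K_max):
    """All candidate grids (K_1, ..., K_p) with 1 <= K_j <= K_max; K_max**p in total."""
    return list(itertools.product(range(1, K_max + 1), repeat=p))
```

For 𝑝 = 5 this reproduces 2⁵ = 32, 3⁵ = 243, 4⁵ = 1024, and 5⁵ = 3125 combinations.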
In Table 3 we show only the optimum combination of 𝐾 based on the minimum GCV value when the maximum number of 𝐾 is 2, 3, 4, and 5. For example, taking the maximum number of 𝐾 as 2, we have 32 combinations of 𝐾 over the predictor variables, and we obtain the minimum GCV value of 1.537 for the combination with 𝐾 of 1 for 𝑥1, 2 for 𝑥2, 1 for 𝑥3, 2 for 𝑥4, and 1 for 𝑥5. Over all the possible combinations of 𝐾, we obtain the minimum GCV value of 1.375 when the maximum number of 𝐾 is 5, with the combination of 𝐾 of 5 for 𝑥1, 2 for 𝑥2, 3 for 𝑥3, 2 for 𝑥4, and 1 for 𝑥5. Based on this combination of 𝐾 for each predictor variable, we obtain the general form of the nonparametric regression model with the Fourier series approach for the ROA data in 2020 as follows.
$$
y_i = \tfrac{1}{2}\alpha_1 + \beta_1 x_{i1} + \gamma_{11}\cos(x_{i1}) + \gamma_{21}\cos(2x_{i1}) + \gamma_{31}\cos(3x_{i1}) + \gamma_{41}\cos(4x_{i1}) + \gamma_{51}\cos(5x_{i1})
+ \tfrac{1}{2}\alpha_2 + \beta_2 x_{i2} + \gamma_{12}\cos(x_{i2}) + \gamma_{22}\cos(2x_{i2})
+ \tfrac{1}{2}\alpha_3 + \beta_3 x_{i3} + \gamma_{13}\cos(x_{i3}) + \gamma_{23}\cos(2x_{i3}) + \gamma_{33}\cos(3x_{i3})
+ \tfrac{1}{2}\alpha_4 + \beta_4 x_{i4} + \gamma_{14}\cos(x_{i4}) + \gamma_{24}\cos(2x_{i4})
+ \tfrac{1}{2}\alpha_5 + \beta_5 x_{i5} + \gamma_{15}\cos(x_{i5}) + \varepsilon_i. \quad (46)
$$
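Model (46) can be estimated by least squares on a Fourier design matrix. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' code: it merges the five constant terms (1/2)𝛼ⱼ into a single intercept column so the design matrix stays full rank, and it uses simulated data in place of the ROA data.

```python
import numpy as np

def fourier_design(X, K):
    """Design matrix for y = intercept + sum_j (beta_j x_j
    + sum_{k=1}^{K_j} gamma_kj cos(k x_j))."""
    n, p = X.shape
    cols = [np.ones(n)]                       # merged intercept (the alpha terms)
    cols += [X[:, j] for j in range(p)]       # linear trend terms beta_j x_j
    for j in range(p):
        for k in range(1, K[j] + 1):
            cols.append(np.cos(k * X[:, j]))  # Fourier terms gamma_kj cos(k x_j)
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, np.pi, size=(47, 5))  # stand-in for the 47 banks' predictors
y = rng.normal(size=47)                    # stand-in for the ROA response
K = [5, 2, 3, 2, 1]                        # optimum combination of K from Table 3
D = fourier_design(X, K)                   # 1 + 5 + (5+2+3+2+1) = 19 columns
theta, *_ = np.linalg.lstsq(D, y, rcond=None)
y_hat = D @ theta                          # fitted values
```

The parameter estimates of the actual model would be read off `theta` when `X` and `y` hold the real data.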
Furthermore, we obtain the estimates of all the parameters in model (46), as given in Table 4.
After obtaining the estimates of all the parameters in Table 4, we conducted parameter hypothesis testing to determine whether the estimated parameters have a significant influence on model (46). Based on model (46) and the hypothesis form (14), we define the null and alternative hypotheses for testing the parameters in model (46) as follows.
$$
H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \gamma_{11} = \gamma_{21} = \gamma_{31} = \gamma_{41} = \gamma_{51} = \gamma_{12} = \gamma_{22} = \gamma_{13} = \gamma_{23} = \gamma_{33} = \gamma_{14} = \gamma_{24} = \gamma_{15} = 0
$$
vs. $H_1$: at least one of the parameters $\neq 0$. \quad (47)
Based on the results of the analysis, using the test statistic in (30) with 𝑛 = 47, 𝑝 = 5, and the combination of 𝐾 given in Table 3, we obtain Λ* = 22.31. Since Λ* is distributed as 𝐹(𝑑1, 𝑑2) with 𝑑1 = 19 and 𝑑2 = 28, at the significance level 𝛼 = 0.05 we obtain 𝐹(0.05,19,28) = 1.97 and the probability value 𝑃(𝐹 ≥ Λ*) = 3.82 × 10⁻¹². Since Λ* > 𝐹(0.05,19,28), or equivalently 𝑃(𝐹 ≥ Λ*) < 𝛼, we reject the null hypothesis. Therefore, at least one of the parameters is not zero; that is, the parameters simultaneously have a significant influence on model (46).
Ethics statements
The data we use in this research are secondary data collected from the annual reports of 47 go-public banks listed on the Indonesia Stock Exchange in 2020. The data are available on request.
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement

Mustain Ramli: Conceptualization, Methodology, Software, Writing – original draft, Visualization. I Nyoman Budiantara: Conceptualization, Methodology, Writing – review & editing, Validation, Supervision. Vita Ratnasari: Conceptualization, Methodology, Writing – review & editing, Validation, Supervision.
Data Availability
Acknowledgments
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Appendix A
Table A1
List of 47 go public banks.