A Novel Family of Robust Hyperbolic Arctan Adaptive Filtering Algorithms
Definition 1: The process $u[i]$ is said to be a second-order (wide-sense) cyclostationary signal (WSCS) with period $T$ if and only if its mean and autocorrelation function are periodic with period $T$ [19], i.e.,

$$E\{u[i+\tau]\,u^{H}[i]\} = E\{u[i+mT+\tau]\,u^{H}[i]\} \quad \text{for all } i,\tau \in (-\infty,\infty) \qquad (2)$$

where $m$ is an integer.
Here we adopt the model used in [x] to define the input signal $u[i]$ as

$$u[i] = \sigma_u[i]\, s[i]$$

where $\sigma_u[i]$ is a deterministic periodic sequence with period $T$ and $s[i]$ is a zero-mean random sequence with unit variance. We note that $s[i]$ is often taken to be a coloured Gaussian sequence, as in [x]. Hence, $u[i]$ can be viewed as a discrete-time wide-sense white Gaussian cyclostationary process with autocorrelation matrix $R_u$.
We adopt the common model for the periodic sequence, $\sigma_u^{2}[i] = \beta\left(1+\gamma\sin(2\pi i/T)\right)$, with $|\gamma|<1$ and $\beta>0$ being the amplitude and scaling factor, respectively, as in [x]. The degree of cyclostationarity of the (correlated, or colored) input can be classified as weak, moderate, or strong for $M \ll T$, $M \approx T$, and $M \gg T$, respectively, depending on how the variation period compares to the filter memory, i.e., to the coefficient vector length $M$.
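To make the input model concrete, the following minimal NumPy sketch generates such a cyclostationary input; the function name and the choice of a white Gaussian $s[i]$ are illustrative assumptions, with $\beta$, $\gamma$, and $T$ as defined above.

```python
import numpy as np

rng = np.random.default_rng(0)

def wscs_input(n_samples, beta=1.0, gamma=0.5, T=100):
    """Generate u[i] = sigma_u[i] * s[i] with periodic power profile
    sigma_u^2[i] = beta * (1 + gamma * sin(2*pi*i/T))."""
    i = np.arange(n_samples)
    sigma2_u = beta * (1.0 + gamma * np.sin(2.0 * np.pi * i / T))
    s = rng.standard_normal(n_samples)      # zero-mean, unit-variance s[i]
    return np.sqrt(sigma2_u) * s

u = wscs_input(10_000)
# The local sample variance of u tracks the deterministic profile sigma_u^2[i].
```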
General framework of the stochastic gradient adaptive filtering algorithm
In order to derive the general framework, we consider the system identification model of an unknown time-variant system depicted in Fig. 1. The relation between its scalar desired signal $d[i]$ and the $1\times M$ input signal $u[i]$ is characterized by the linear model

$$d[i] = u[i]\,w^{o} + n[i] \qquad (5)$$
where $w^{o}$ is the unknown optimal coefficient vector to be defined later in (7a), and $n[i]$ represents the perturbation noise; the disturbance term $n[i]$ is uncorrelated with $u[i]$. We define the error signal

$$\epsilon[i] = d[i] - w^{T}[i]\,u[i] \qquad (6)$$

$$\epsilon[i] = \epsilon_a[i] + n[i], \qquad \epsilon_a[i] = \tilde{w}^{T}[i]\,u[i]$$

The term $\epsilon_a[i]$ measures the distance between $u[i]w^{o}$ and $w^{T}[i]u[i]$, i.e., it measures how close the estimator $w^{T}[i]u[i]$ is to the optimal linear estimator of $d[i]$. The weight-error vector is defined by

$$\tilde{w}[i] = w^{o} - w[i] \qquad (7a)$$
where $w[i]$ is the filter coefficient vector to be estimated. Furthermore, the time-varying parameter vector $w^{o}$ of the system is modelled as a random walk process [18]:

$$w^{o}[i+1] = w^{o}[i] + h[i] \qquad (7b)$$

where the random perturbation $h[i]$ is a zero-mean white Gaussian vector with covariance matrix $E\{h[i]h^{H}[i]\} = \sigma_h^{2} I$, and is assumed to be statistically independent of $u[i]$ and $n[i]$.
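As a quick illustration, the random-walk model (7b) can be simulated as below; the dimensions and the value of $\sigma_h$ are placeholder choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
M, n_iter, sigma_h = 4, 1000, 0.01   # placeholder dimensions and noise level

w_o = np.zeros((n_iter, M))
for i in range(n_iter - 1):
    # Random-walk model (7b): w_o[i+1] = w_o[i] + h[i], h[i] ~ N(0, sigma_h^2 I)
    w_o[i + 1] = w_o[i] + sigma_h * rng.standard_normal(M)
```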
Basically, we aim to determine accurate estimates of the coefficient vector $w[i]$. To pursue this objective and to facilitate the ensuing performance analysis, we first derive a generalized mathematical framework for the family of stochastic adaptation algorithms. To that aim, we use the form of the error $\epsilon[i]$ defined in (6) to design algorithms of the form [1]

$$w[i+1] = w[i] - \mu\,\check{\nabla}_{w[i]} \qquad (8)$$

where $\mu$ is the step-size parameter and $\check{\nabla}_{w[i]}$ is the gradient, with respect to the coefficient vector $w[i]$, of an error cost function $\phi(\epsilon[i])$, called the error criterion.
Examining the gradient $\check{\nabla}_{w[i]}$ used in (8), we obtain

$$\check{\nabla}_{w[i]} = \frac{\partial\phi(\epsilon[i])}{\partial w[i]} = \phi'(\epsilon[i])\,\frac{\partial\epsilon[i]}{\partial w[i]} = -\phi'(\epsilon[i])\,u[i] \qquad (9)$$
in which $\phi'(\cdot)$ denotes the first-order derivative of $\phi(\cdot)$. Letting $f(\epsilon[i]) = \phi'(\epsilon[i])$, we may define the general update

$$w[i+1] = w[i] + \mu f(\epsilon[i])\,u[i], \qquad \epsilon[i] = d[i] - w^{T}[i]\,u[i] \qquad (10)$$
It is noteworthy that (10) represents a generalized form of algorithms based on both mean-square and non-mean-square error criteria. However, adaptive algorithms of this form have been studied less frequently in the signal processing community, mainly because of the mathematical difficulties associated with their analysis. The authors in [9, 17, 20] examined this family of adaptive filtering algorithms and showed that it can improve the performance of adaptive filtering, especially when the perturbing noise distribution is non-Gaussian, which motivates further study of this family.
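To make the generic form (10) concrete, a minimal sketch of the update loop follows; the system-identification setup is a hypothetical example, and choosing $f(\epsilon)=\epsilon$ recovers the classical LMS recursion.

```python
import numpy as np

rng = np.random.default_rng(2)

def sg_filter(U, d, f, mu=0.01):
    """Generic stochastic-gradient adaptive filter (10):
    w[i+1] = w[i] + mu * f(eps[i]) * u[i], eps[i] = d[i] - w[i]^T u[i]."""
    n, M = U.shape
    w = np.zeros(M)
    for i in range(n):
        eps = d[i] - w @ U[i]          # a priori output error (6)
        w = w + mu * f(eps) * U[i]
    return w

# Hypothetical example: identify a length-4 system with f(e) = e (LMS).
M, n = 4, 5000
w_o = rng.standard_normal(M)
U = rng.standard_normal((n, M))
d = U @ w_o + 0.01 * rng.standard_normal(n)
w_hat = sg_filter(U, d, f=lambda e: e)
```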
Therefore, in this work we adopt the family of algorithms defined in (10) to propose a novel family of robust hyperbolic arctan adaptive filtering algorithms (FRHATAF) that are effective under impulsive (heavy-tailed) interference environments. To derive the proposed algorithms, we use a non-mean-square error criterion, namely the arctan function. In particular, we propose two categories of FRHATAF algorithms, depending on how the cost function is defined: 1) standard robust hyperbolic arctan adaptive filtering algorithms (SRHATAF) and 2) robust hyperbolic arctan adaptive filtering algorithms (RHATAF). It is worth noting that, in this work, we define robustness as the insensitivity of an algorithm to the impulsive noise encountered in practical applications.
The generalized hyperbolic arctan cost function is defined as

$$\phi(\epsilon[i]) = \frac{1}{p\alpha}\,E\left[\tan^{-1}\left(\alpha\,k(\epsilon[i])\right)\right] \qquad (11)$$

where $k(\epsilon[i])$ is a conventional cost function based on the nonlinear error function $\epsilon[i]$, e.g., $E[|\epsilon[i]|]$ and $E[|\epsilon[i]|^{2}]$. Generally, we consider the conventional cost function $k(\epsilon[i]) = |\epsilon[i]|^{p}$, where $p>0$ corresponds to the shape parameter. Thus, the generalized hyperbolic arctan cost function in (11) becomes

$$\phi(\epsilon[i]) = \frac{1}{p\alpha}\,E\left[\tan^{-1}\left(\alpha\,|\epsilon[i]|^{p}\right)\right] \qquad (12)$$
where $\alpha>0$ is the smoothing factor. Applying the stochastic gradient procedure of (9) to the cost function in (12) and substituting into (10), the SFRHATAF algorithm can be defined as

$$w[i+1] = w[i] + \mu\,R(\epsilon[i])\,u[i] \qquad (13)$$

$$R(\epsilon[i]) = \frac{|\epsilon[i]|^{p-1}\,\mathrm{sign}(\epsilon[i])}{1+\delta\,|\epsilon[i]|^{2p}}$$

where $\delta = \alpha^{2}$.
Some observations from (13):

a) When $p=1$, (13) becomes

$$w[i+1] = w[i] + \mu\,\frac{\mathrm{sign}(\epsilon[i])}{1+\delta\,|\epsilon[i]|^{2}}\,u[i] \qquad (14)$$

which is called the standard robust hyperbolic arctan least absolute deviation (SRHAT-LAD) algorithm.

b) When $p=2$, (13) becomes

$$w[i+1] = w[i] + \mu\,\frac{\epsilon[i]}{1+\delta\,|\epsilon[i]|^{4}}\,u[i] \qquad (15)$$

which is called the standard robust hyperbolic arctan least mean square (SRHAT-LMS) algorithm.
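A sketch of the SFRHATAF nonlinearity $R(\epsilon[i])$ in (13) follows, specializing to the SRHAT-LAD (14) and SRHAT-LMS (15) updates; it plugs directly into the generic loop shown after (10).

```python
import numpy as np

def R(eps, p=2, delta=1.0):
    """SFRHATAF gradient nonlinearity (13):
    |eps|^(p-1) * sign(eps) / (1 + delta * |eps|^(2p))."""
    a = np.abs(eps)
    return a ** (p - 1) * np.sign(eps) / (1.0 + delta * a ** (2 * p))

srhat_lad = lambda e: R(e, p=1)    # SRHAT-LAD nonlinearity, (14)
srhat_lms = lambda e: R(e, p=2)    # SRHAT-LMS nonlinearity, (15)
```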
A. Normalized Updates of SFRHATAF

In this part we introduce a normalized update equation for the SFRHATAF algorithms with respect to the input signal $u[i]$, in order to improve the performance of the proposed algorithms. We redefine the cost function (12) as

$$\phi(\epsilon[i]) = \frac{1}{p\alpha}\,E\left[\tan^{-1}\left(\frac{\alpha\,|\epsilon[i]|^{p}}{\|u[i]\|^{p}}\right)\right] \qquad (16)$$
The corresponding gradient is given by

$$\phi'(\epsilon[i]) = \frac{|\epsilon[i]|^{p-1}\,\mathrm{sign}(\epsilon[i])}{\|u[i]\|^{p} + \dfrac{\delta\,|\epsilon[i]|^{2p}}{\|u[i]\|^{p}}} \qquad (17)$$
Thus, the SRHAT-NLAD algorithm ($p=1$) can be described as

$$w[i+1] = w[i] + \mu\,\frac{\mathrm{sign}(\epsilon[i])}{\|u[i]\| + \dfrac{\delta\,|\epsilon[i]|^{2}}{\|u[i]\|}}\,u[i] \qquad (18)$$

and the SRHAT-NLMS algorithm ($p=2$) as

$$w[i+1] = w[i] + \mu\,\frac{\epsilon[i]}{\|u[i]\|^{2} + \dfrac{\delta\,|\epsilon[i]|^{4}}{\|u[i]\|^{2}}}\,u[i] \qquad (19)$$

It is worth pointing out that the last expression (19) has also been proposed as the arctan NLMS (Arc-NLMS) algorithm in [6].
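Under the same assumptions, the normalized gradient (17) can be sketched as follows; the small regularizer eps_reg is an addition of ours to guard against division by zero and is not part of the original derivation.

```python
import numpy as np

def R_normalized(eps, u, p=2, delta=1.0, eps_reg=1e-12):
    """Normalized SFRHATAF nonlinearity (17):
    |eps|^(p-1)*sign(eps) / (||u||^p + delta*|eps|^(2p)/||u||^p)."""
    norm_p = np.linalg.norm(u) ** p + eps_reg   # ||u||^p, regularized
    a = np.abs(eps)
    return a ** (p - 1) * np.sign(eps) / (norm_p + delta * a ** (2 * p) / norm_p)
```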
[Fig. 1. Gradient function $F_p(\epsilon[i])$ versus the error $\epsilon[i]$ for the FRHATAF algorithms with $p = 1$ and $p = 2$.]
2. Framework of the cost function of RHATAF

Alternatively, the cost function defined in (11) can also be defined as

$$\phi(\epsilon[i]) = E\left[\left|\tan^{-1}(\delta\,\epsilon[i])\right|^{p}\right] \qquad (20)$$
Thus, a stochastic-gradient-based adaptive filtering algorithm for (20) can be readily derived as

$$w[i+1] = w[i] - \mu\,\frac{\partial}{\partial w[i]}\phi(\epsilon[i]) \qquad (21)$$

$$w[i+1] = w[i] + \mu\,F(\epsilon[i])\,u[i], \qquad F(\epsilon[i]) = \frac{\left|\tan^{-1}(\delta\epsilon[i])\right|^{p-1}\,\mathrm{sign}\!\left(\tan^{-1}(\delta\epsilon[i])\right)}{1+\delta\,|\epsilon[i]|^{2}} \qquad (22)$$

When we consider $p=1$, (22) becomes

$$w[i+1] = w[i] + \frac{\mu}{1+\delta\,|\epsilon[i]|^{2}}\,\mathrm{sign}\!\left(\tan^{-1}(\delta\epsilon[i])\right)u[i] \qquad (23)$$
which is called the robust arctan least absolute deviation (RHAT-LAD) algorithm. When we consider $p=2$, (22) becomes

$$w[i+1] = w[i] + \mu\,\frac{\tan^{-1}(\delta\epsilon[i])}{1+\delta\,|\epsilon[i]|^{2}}\,u[i] \qquad (24)$$

which is called the robust arctan least mean square (RHAT-LMS) algorithm. To facilitate the analysis, we use the inequality

$$\tan^{-1}(x[i]) \le \frac{\pi}{2}\tanh(x[i]) \qquad (25)$$

Both sides of (25) become equal as the amplitude of $x[i]$ tends to large values, which is very convenient when performing the analysis under the assumption of an impulsive noise environment. Thus, applying (25) to (21), we obtain

$$\eta[i] = \mu\,\frac{\pi}{2}\,\frac{\left|\tanh(\delta\epsilon[i])\right|^{p-2}}{1+\delta\,|\epsilon[i]|^{2}}$$
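The inequality (25) is easy to verify numerically; the short sketch below checks it on a grid and shows both sides approaching $\pi/2$ for large arguments.

```python
import numpy as np

x = np.linspace(0.0, 20.0, 2001)
lhs = np.arctan(x)                        # left-hand side of (25)
rhs = (np.pi / 2) * np.tanh(x)            # right-hand side of (25)
print(bool(np.all(lhs <= rhs + 1e-12)))   # True: the bound holds on the grid
print(lhs[-1], rhs[-1])                   # both approach pi/2 as x grows
```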
Based on the simulation depicted in Fig. 1, it is easy to note that when $|\epsilon[i]| \to \infty$, both $R(\epsilon[i])$ and $F(\epsilon[i])$ tend to $0$ regardless of the value of the power $p$. Thus, for large error values, e.g., under heavy-tailed noise, the filter coefficients change very little. This demonstrates the robustness of the FRHATAF algorithms against the impulsive noise or outliers that cause high error peaks.
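This vanishing-gradient behaviour is easy to reproduce; the sketch below evaluates $R(\epsilon)$ from (13) and $F(\epsilon)$ from (22) on a wide error grid ($\delta = 1$ and $p = 2$ are arbitrary choices) and confirms that both decay toward zero as $|\epsilon|$ grows.

```python
import numpy as np

delta, p = 1.0, 2
eps = np.linspace(-10.0, 10.0, 2001)

# R from (13) and F from (22)
R_val = np.abs(eps) ** (p - 1) * np.sign(eps) / (1 + delta * np.abs(eps) ** (2 * p))
atn = np.arctan(delta * eps)
F_val = np.abs(atn) ** (p - 1) * np.sign(atn) / (1 + delta * np.abs(eps) ** 2)

# Both nonlinearities shrink toward zero for large |eps|, so an impulsive
# error sample barely perturbs the coefficient update.
print(R_val[-1], F_val[-1])   # values at eps = 10 are close to 0
```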
Remark 2

Through the proposed SRHATAF algorithms we can, in general, improve the convergence behavior of the different conventional objective functions. In environments where higher-order-error algorithms, e.g., the LMS and LMF algorithms, converge poorly (impulsive noise environments), the corresponding SRHAT-LAD and SRHAT-LMS algorithms remain robust against impulsive noise: SRHAT-LAD adds robustness to the LMS algorithm, and SRHAT-LMS overcomes the stability issues of the LMF algorithm. Moreover, by properly optimizing $\delta$, we can obtain further improvement in convergence performance.
Performance analysis
We study the convergence conditions of the filter coefficient vector $w[i]$, e.g., mean convergence and mean-square convergence, as well as the excess mean-square error (EMSE) of the adaptive filter, and finally determine the range of step-size values over which the coefficient error vector remains bounded. All of the analysis is performed under the assumption of a white Gaussian environment.
Statistical Assumptions
Assumption 1. The input regressor $u[i]$ is a zero-mean independent and identically distributed (i.i.d.) Gaussian random variable, and the perturbation noise $n[i]$ is also a zero-mean i.i.d. Gaussian random variable, uncorrelated with all $u[i]$.

Assumption 2. The a priori error $\epsilon_a[i]$ is assumed to be Gaussian and jointly Gaussian with the weighted a priori error $\epsilon_a^{\Sigma}[i]$ for any constant matrix $\Sigma$, where $\epsilon_a^{\Sigma}[i] = \tilde{w}^{H}\Sigma\,u[i]$ and $\Sigma$ is a symmetric positive-definite weighting matrix. This assumption holds for long filter lengths $M$ and sufficiently small step sizes $\mu$. Also, $\epsilon_a[i]$ is independent of $n[i]$.

Assumption 3. We assume that there exists an optimal solution vector $w^{o}$ such that the desired signal can be written as $d[i] = u[i]\,w^{o} + n[i]$.
Based on the general update equation of the stochastic gradient algorithm defined in (10), and using the definition of $\tilde{w}[i]$ in (7a), it follows that

$$\tilde{w}[i+1] = \tilde{w}[i] - \mu f(\epsilon[i])\,u[i] \qquad (21)$$

$$\tilde{w}[i+1] = \tilde{w}[i] - \mu\left(\frac{|\epsilon[i]|^{p-1}\,\mathrm{sign}(\epsilon[i])}{1+\delta\,|\epsilon[i]|^{2p}}\right)u[i] \qquad (22)$$

Taking expectations of both sides with $p=1$ (SRHAT-LAD) gives

$$E[\tilde{w}[i+1]] = E[\tilde{w}[i]] - \mu\,E\left[\left(\frac{\mathrm{sign}(\epsilon[i])}{1+\delta\,|\epsilon[i]|^{2}}\right)u[i]\right] \qquad (23)$$
The first term on the right-hand side of the last expression can be evaluated as

$$E\left[\frac{1}{1+\delta\,|\epsilon[i]|^{2}}\right] = \frac{1}{\sqrt{2\pi\delta}\,\sigma_\epsilon}\int_{-\infty}^{\infty}\frac{1}{1+x^{2}}\,\exp(-\lambda x^{2})\,dx \qquad (24)$$

where $x = \sqrt{\delta}\,\epsilon[i]$ and $\lambda = \dfrac{1}{2\delta\sigma_\epsilon^{2}}$.
The integral in (24) has been solved exactly in [11] as

$$\beta_1 = E\left[\frac{1}{1+\delta\,|\epsilon[i]|^{2}}\right] = \sqrt{\pi\lambda}\,\exp(\lambda)\left[1-\mathrm{erf}(\sqrt{\lambda})\right] \qquad (25)$$

where $\mathrm{erf}(\sqrt{\lambda}) = \dfrac{2}{\sqrt{\pi}}\displaystyle\int_{0}^{\sqrt{\lambda}}\exp(-t^{2})\,dt$.
Also, the second term on the right-hand side can be evaluated as

$$E\left[\mathrm{sign}(\epsilon[i])\,u[i]\right] = \sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma_\epsilon}\,R_u\,E[\tilde{w}[i]] \qquad (26)$$

so that

$$E[\tilde{w}[i+1]] = E[\tilde{w}[i]] - \mu\sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma_\epsilon}\,\beta_1 R_u\,E[\tilde{w}[i]] = \left(I - \mu\sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma_\epsilon}\,\beta_1 R_u\right)E[\tilde{w}[i]] \qquad (27)$$

Now it is easy to see that $E[\tilde{w}[i]]$ will converge if we choose the step size to satisfy

$$0 < \mu \le \frac{\sqrt{2\pi}\,\sigma_\epsilon}{\beta_1\,\lambda_{\max}(R_u)} \qquad (28)$$
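The constant $\beta_1$ in (25) and the resulting step-size bound (28) can be evaluated numerically; a sketch using scipy.special.erfc follows, where $\sigma_\epsilon$, $\delta$, and $\lambda_{\max}(R_u)$ are placeholder values.

```python
import numpy as np
from scipy.special import erfc

def beta1(delta, sigma_eps):
    """beta_1 = sqrt(pi*lam)*exp(lam)*erfc(sqrt(lam)), lam = 1/(2*delta*sigma_eps^2), per (25)."""
    lam = 1.0 / (2.0 * delta * sigma_eps ** 2)
    return np.sqrt(np.pi * lam) * np.exp(lam) * erfc(np.sqrt(lam))

delta, sigma_eps = 1.0, 0.5
lam_max = 2.0    # largest eigenvalue of R_u (placeholder)
mu_max = np.sqrt(2 * np.pi) * sigma_eps / (beta1(delta, sigma_eps) * lam_max)
print(mu_max)    # upper end of the stable step-size range in (28)
```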
Similarly, for $p=2$ (SRHAT-LMS), taking expectations yields

$$E[\tilde{w}[i+1]] = E[\tilde{w}[i]] - \mu\,E\left[\left(\frac{1}{1+\delta\,|\epsilon[i]|^{4}}\right)\epsilon[i]\,u[i]\right] \qquad (29)$$

It is known that as $i\to\infty$, $E[|\epsilon[i]|^{2}] \to \sigma_n^{2}$, which means that $E[|\epsilon[i]|^{4}]$ converges to the square of the noise variance, $(\sigma_n^{2})^{2}$; thus in this case

$$E\left[\left(\frac{1}{1+\delta\,|\epsilon[i]|^{4}}\right)\epsilon[i]\,u[i]\right] \approx E\left[\mathrm{sech}^{2}(\delta\epsilon[i])\,\epsilon[i]\,u[i]\right] \qquad (30b)$$

where, for large values of the noise amplitude, $E\left[\dfrac{1}{1+\delta\,|\epsilon[i]|^{4}}\right] \approx E\left[\mathrm{sech}^{2}(\delta\epsilon[i])\right] \approx 0$. Hence

$$E[\tilde{w}[i+1]] = E[\tilde{w}[i]] - \mu\,\psi[i]\,R_u\,E[\tilde{w}[i]] = \left(I-\mu\,\psi[i]\,R_u\right)E[\tilde{w}[i]] \qquad (31)$$

where $\psi[i] = E\left[\mathrm{sech}^{2}(\delta\epsilon[i])\right]$.
For the RHATAF family, using the approximation (25), the weight-error recursion becomes

$$\tilde{w}[i+1] = \tilde{w}[i] - \mu_t\,\frac{\left|\tanh(\delta\epsilon[i])\right|^{p-1}\,\mathrm{sign}\!\left(\tanh(\delta\epsilon[i])\right)}{1+\delta\,|\epsilon[i]|^{2}}\,u[i] \qquad (33)$$

where $\mu_t = \dfrac{\pi}{2}\,\mu$.
Mean Behavior of RHAT-LAD

After taking the statistical expectation of (33) with $p=1$, we have

$$E[\tilde{w}[i+1]] = E[\tilde{w}[i]] - \mu_t\,E\left[\left(\frac{\mathrm{sign}\!\left(\tanh(\delta\epsilon[i])\right)}{1+\delta\,|\epsilon[i]|^{2}}\right)u[i]\right] \qquad (34)$$
Since $\mathrm{sign}(\tanh(\delta\epsilon[i])) = \mathrm{sign}(\epsilon[i])$, the right-hand side of (34) is evaluated exactly as in (25) and (26), which yields

$$E[\tilde{w}[i+1]] = E[\tilde{w}[i]] - \mu_t\sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma_\epsilon}\,\beta_1 R_u\,E[\tilde{w}[i]] = E[\tilde{w}[i]] - \mu\sqrt{\frac{\pi}{2}}\,\frac{1}{\sigma_\epsilon}\,\beta_1 R_u\,E[\tilde{w}[i]] = \left(I-\mu\sqrt{\frac{\pi}{2}}\,\frac{1}{\sigma_\epsilon}\,\beta_1 R_u\right)E[\tilde{w}[i]] \qquad (36)$$
Now it is easy to see that $E[\tilde{w}[i]]$ will converge if we choose the step size to satisfy

$$0 < \mu \le \sqrt{\frac{8}{\pi}}\,\frac{\sigma_\epsilon}{\beta_1\,\lambda_{\max}(R_u)} \qquad (37)$$
Mean Behavior of RHAT-LMS

After taking the statistical expectation of (33) with $p=2$, we have

$$E[\tilde{w}[i+1]] = E[\tilde{w}[i]] - \mu_t\,E\left[\left(\frac{1}{1+\delta\,|\epsilon[i]|^{2}}\right)\tanh(\delta\epsilon[i])\,u[i]\right] \qquad (38)$$
To evaluate (38), we use the Taylor series expansion of the nonlinear function $g(\epsilon[i])$ with respect to $\epsilon_a[i]$ around the noise value $n[i]$, which can be written as [9, 20]

$$g(\epsilon[i]) = g(\epsilon_a[i]+n[i]) = g(n[i]) + g'(n[i])\,\epsilon_a[i] + \frac{1}{2}g''(n[i])\left(\epsilon_a[i]\right)^{2} + o\!\left(\epsilon_a^{2}[i]\right) \qquad (39)$$

$$E\left[g^{2}(\epsilon[i])\right] \approx E\left[g^{2}(n[i])\right] + E\left[\left(g''(n[i])\,g(n[i]) + \left|g'(n[i])\right|^{2}\right)\epsilon_a^{2}[i]\right] \qquad (40)$$

where $g'(n[i])$ and $g''(n[i])$ are the first- and second-order derivatives of the function $g(\cdot)$, and $o(\epsilon_a^{2}[i])$ denotes the third- and higher-order terms. The Taylor expansion in (39) holds for the class of functions $g(\cdot)$ that are differentiable up to third order over the whole range of $\epsilon[i]$ [9].
Applying (39) and (40) to (38) yields

$$E[\tilde{w}[i+1]] = E[\tilde{w}[i]] - \mu_t\,E\left[\delta\,\mathrm{sech}^{2}(\delta n[i])\right]\beta_1 R_u\,E[\tilde{w}[i]] \qquad (41)$$

The mean convergence for different noise distributions can be evaluated from (41). For the case where the noise distribution is known and uniform over the range $[-b,b]$, (41) evaluates as [10]

$$E[\tilde{w}[i+1]] = \left(I-\mu_t\,\frac{1}{b}\tanh(\delta b)\,\beta_1 R_u\right)E[\tilde{w}[i]] \qquad (42)$$
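The closed-form factor in (42) can be checked by Monte Carlo: for $n[i]$ uniform on $[-b,b]$, the sample mean of $\delta\,\mathrm{sech}^{2}(\delta n[i])$ should approach $\tanh(\delta b)/b$. A small sketch with arbitrary $\delta$ and $b$:

```python
import numpy as np

rng = np.random.default_rng(3)
delta, b = 1.5, 2.0   # arbitrary illustrative values

n = rng.uniform(-b, b, size=1_000_000)
mc = np.mean(delta / np.cosh(delta * n) ** 2)   # E[delta * sech^2(delta * n)]
closed_form = np.tanh(delta * b) / b
print(mc, closed_form)   # the two agree to Monte Carlo accuracy
```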
Mean-square convergence of SRHAT-LAD

Next, we form the weight-error covariance recursion:

$$E\left[\tilde{w}[i+1]\tilde{w}^{T}[i+1]\right] = E\left\{\left(\tilde{w}[i]-\mu f(n[i])u[i]\right)\left(\tilde{w}[i]-\mu f(n[i])u[i]\right)^{T}\right\}$$

$$= E\left[\tilde{w}[i]\tilde{w}^{T}[i]\right] - \mu E\left[\tilde{w}[i]f(n[i])\right] - \mu E\left[f(n[i])\tilde{w}[i]\right] + \mu^{2}E\left[\|u[i]\|^{2}\right]E\left[f^{2}(n[i])\right]$$

$$K[i+1] = K[i] - \mu E\left[\epsilon_a[i]f(n[i])\right] - \mu E\left[f(n[i])\epsilon_a[i]\right] + \mu^{2}E\left[\|u[i]\|^{2}\right]E\left[f^{2}(n[i])\right] \qquad (43)$$

For SRHAT-LAD, the cross term evaluates as

$$E\left[f(n[i])\epsilon_a[i]\right] = E\left[\left(\frac{\mathrm{sign}(\delta\epsilon[i])}{1+\delta\,|\epsilon[i]|^{2}}\right)\epsilon_a[i]\right] = \sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma_\epsilon}\,\beta_1 R_u K[i] \qquad (44)$$
The squared term evaluates as

$$E\left[f^{2}(n[i])\right] = E\left[\left(\frac{1}{1+\delta\,|\epsilon[i]|^{2}}\right)^{2}\right] = \frac{1}{\sigma_\epsilon^{2}}\,E\left[\frac{|\epsilon[i]|^{2}}{\left(1+\delta\,|\epsilon[i]|^{2}\right)^{2}}\right] \qquad (45)$$

$$\frac{1}{\sigma_\epsilon^{2}}\,E\left[\frac{|\epsilon[i]|^{2}}{\left(1+\delta\,|\epsilon[i]|^{2}\right)^{2}}\right] = \frac{1}{\sigma_\epsilon^{2}}\,E\left[\frac{-\partial}{\partial\delta}\,\frac{1}{1+\delta\,|\epsilon[i]|^{2}}\right] = \frac{-1}{\sigma_\epsilon^{2}}\,\frac{\partial}{\partial\delta}\,E\left[\frac{1}{1+\delta\,|\epsilon[i]|^{2}}\right]$$

where the interchange of differentiation and integration [11] is used in the last step, since $\vartheta(\epsilon[i],\delta) \triangleq \dfrac{1}{1+\delta\,|\epsilon[i]|^{2}}$ and $\dfrac{\partial\vartheta(\epsilon[i],\delta)}{\partial\delta}$ are both continuous in $\mathbb{R}^{2}$. Thus, using the result in (25) and after some mathematical manipulations, (45) becomes

$$E\left[f^{2}(n[i])\right] = (2\lambda+1)\left\{\lambda\sqrt{\pi\lambda}\,\exp(\lambda)\left[1-\mathrm{erf}(\sqrt{\lambda})\right]\right\} - 2\lambda^{2} \qquad (46)$$
Now, substituting the results of (44) and (46) into (43), we obtain the MSD relation of SRHAT-LAD as

$$K[i+1] = \left[1-2\mu\sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma_\epsilon}\,\beta_1 R_u\right]K[i] + \mu^{2}N\sigma_u^{2}\left[(2\lambda+1)\left\{\lambda\sqrt{\pi\lambda}\,\exp(\lambda)\left[1-\mathrm{erf}(\sqrt{\lambda})\right]\right\}-2\lambda^{2}\right] \qquad (47)$$
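A scalar sketch of the MSD recursion (47) follows, assuming a white input ($R_u = \sigma_u^2 I$, so $R_u$ acts as the scalar $\sigma_u^2$); the parameter values and the initial MSD are illustrative assumptions.

```python
import numpy as np
from scipy.special import erfc

# Illustrative parameters; white input so R_u acts as the scalar sigma_u2.
mu, delta, sigma_eps, sigma_u2, N = 0.01, 1.0, 0.5, 1.0, 16

lam = 1.0 / (2.0 * delta * sigma_eps ** 2)
beta1 = np.sqrt(np.pi * lam) * np.exp(lam) * erfc(np.sqrt(lam))   # (25)
noise_term = ((2 * lam + 1) * lam * np.sqrt(np.pi * lam) * np.exp(lam)
              * erfc(np.sqrt(lam)) - 2 * lam ** 2)                # (46)

K = float(N)   # assumed initial MSD
contraction = 1 - 2 * mu * np.sqrt(2 / np.pi) * beta1 * sigma_u2 / sigma_eps
for _ in range(5000):
    K = contraction * K + mu ** 2 * N * sigma_u2 * noise_term     # (47)
print(K)       # steady-state MSD predicted by (47)
```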
Mean-square convergence of SRHAT-LMS

To derive the mean-square deviation (MSD) of SRHAT-LMS, we recall (15), so that

$$E\left[f(n[i])\epsilon_a[i]\right] = E\left[\left(\frac{1}{1+\delta\,|\epsilon[i]|^{4}}\right)\epsilon[i]\,\epsilon_a[i]\right] \approx E\left[\mathrm{sech}^{2}(\delta\epsilon[i])\,\epsilon[i]\,\epsilon_a[i]\right] \qquad (48)$$

$$E\left[f^{2}(n[i])\right] = E\left[\frac{|\epsilon[i]|^{2}}{\left(1+\delta\,|\epsilon[i]|^{4}\right)^{2}}\right] = E\left[\frac{1}{\left(1+\delta\,|\epsilon[i]|^{4}\right)^{2}}\,|\epsilon[i]|^{2}\right] \qquad (49)$$
To evaluate (49), we consider the following bound:

$$E\left[f^{2}(n[i])\right] = E\left[\frac{1}{\left(1+\delta\,|\epsilon[i]|^{4}\right)^{2}}\,|\epsilon[i]|^{2}\right] \le \sqrt{\frac{4}{\pi}}\,E\left[\mathrm{sech}^{2}(\delta\epsilon[i])\right]E\left[|\epsilon[i]|^{2}\right] \qquad (50a)$$

Under the assumption of Gaussian noise with small variance, (49) can be evaluated as

$$E\left[f^{2}(n[i])\right] = E\left[\frac{1}{\left(1+\delta\,|\epsilon[i]|^{4}\right)^{2}}\,|\epsilon[i]|^{2}\right] \approx E\left[|\epsilon[i]|^{2}\right] \qquad (50b)$$
Now, substituting the results of (48) and (50a, b) into (43) yields the MSD relation of SRHAT-LMS. Similarly, for RHAT-LAD, since $\mathrm{sign}(\tanh(\delta\epsilon[i])) = \mathrm{sign}(\epsilon[i])$, the cross and squared terms evaluate exactly as in (44) and (46):

$$E\left[f(n[i])\epsilon_a[i]\right] = E\left[\left(\frac{1}{1+\delta\,|\epsilon[i]|^{2}}\right)\mathrm{sign}(\delta\epsilon[i])\,\epsilon_a[i]\right] = \sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma_\epsilon}\,\beta_1 R_u K[i]$$

$$E\left[f^{2}(n[i])\right] = E\left[\left(\frac{1}{1+\delta\,|\epsilon[i]|^{2}}\right)^{2}\right] = (2\lambda+1)\left\{\lambda\sqrt{\pi\lambda}\,\exp(\lambda)\left[1-\mathrm{erf}(\sqrt{\lambda})\right]\right\}-2\lambda^{2}$$

so that, with $\mu$ replaced by $\mu_t = \frac{\pi}{2}\mu$, the MSD relation of RHAT-LAD follows as

$$K[i+1] = \left[1-2\mu\sqrt{\frac{\pi}{2}}\,\frac{1}{\sigma_\epsilon}\,\beta_1 R_u\right]K[i] + \mu^{2}\left(\frac{\pi}{2}\right)^{2}N\sigma_u^{2}\left[(2\lambda+1)\left\{\lambda\sqrt{\pi\lambda}\,\exp(\lambda)\left[1-\mathrm{erf}(\sqrt{\lambda})\right]\right\}-2\lambda^{2}\right] \qquad (52)$$
Mean-square convergence of RHAT-LMS

To derive the mean-square deviation (MSD) of RHAT-LMS, we recall (24), so that

$$E\left[f(n[i])\epsilon_a[i]\right] = E\left[\left(\frac{1}{1+\delta\,|\epsilon[i]|^{2}}\right)\tanh(\delta\epsilon[i])\,\epsilon_a[i]\right] = E\left[\delta\,\mathrm{sech}^{2}(\delta n[i])\right]\beta_1 R_u K[i] \qquad (53)$$

$$E\left[f^{2}(n[i])\right] = E\left[\left(\frac{1}{1+\delta\,|\epsilon[i]|^{2}}\right)^{2}\right]E\left[\left(\tanh(\delta\epsilon[i])\right)^{2}\right] \qquad (54)$$
To evaluate (54), we let $g(\epsilon[i]) = \tanh(\delta\epsilon[i])$ and apply (40) to $E\left[\tanh^{2}(\delta\epsilon[i])\right]$, which leads to the MSD relation

$$K[i+1] = \left[1-2\mu\frac{\pi}{2}\,E\left[g'(\epsilon[i])\right]\beta_1 R_u + \mu^{2}\left(\frac{\pi}{2}\right)^{2}\beta_2\,E\left[g''(\epsilon[i])g(\epsilon[i])+\left|g'(\epsilon[i])\right|^{2}\right]\left[R_u \otimes R_u\right]\right]K[i] + \mu^{2}\left(\frac{\pi}{2}\right)^{2}\beta_2\,E\left[g^{2}(n[i])\right]N\sigma_u^{2}$$

where

$$\beta_2 = (2\lambda+1)\left\{\lambda\sqrt{\pi\lambda}\,\exp(\lambda)\left[1-\mathrm{erf}(\sqrt{\lambda})\right]\right\}-2\lambda^{2}$$

$$E\left[g'(\epsilon[i])\right] = E\left[\delta\,\mathrm{sech}^{2}(\delta n[i])\right]$$

$$E\left[g''(\epsilon[i])g(\epsilon[i])+\left|g'(\epsilon[i])\right|^{2}\right] = E\left[\delta^{2}\tanh^{2}(\delta n[i])\right] + E\left[\delta^{2}\mathrm{sech}^{4}(\delta n[i]) - \delta^{2}\tanh^{2}(\delta n[i])\,\mathrm{sech}^{2}(\delta n[i])\right]$$