
Skew-T Distribution

This document discusses efficient and scalable design of high-order portfolios using a parametric distribution approach. It introduces the mean-variance-skewness-kurtosis portfolio framework and challenges with existing non-parametric modeling of skewness and kurtosis. A parametric generalized hyperbolic skew-t distribution is proposed to allow faster computation of high-order moments and more efficient optimization algorithms are developed to solve the portfolio design problem.


IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 71, 2023

Efficient and Scalable Parametric High-Order Portfolios Design via the Skew-t Distribution

Xiwen Wang, Rui Zhou, Member, IEEE, Jiaxi Ying, and Daniel P. Palomar, Fellow, IEEE

Abstract—Since Markowitz's mean-variance framework, optimizing a portfolio that strikes a trade-off between maximizing profit and minimizing risk has been ubiquitous in the financial industry. Initially, profit and risk were measured by the first two moments of the portfolio's return, a.k.a. the mean and variance, which are sufficient to characterize a Gaussian distribution. However, it is broadly believed that the first two moments are not enough to capture the characteristics of the returns' behavior, which have been recognized to be asymmetric and heavy-tailed. Although portfolio designs involving the third and fourth moments, i.e., skewness and kurtosis, have been demonstrated to outperform the conventional mean-variance framework, they present non-trivial challenges. Specifically, in the classical framework, the memory and computational cost of computing the skewness and kurtosis grow sharply with the number of assets. To alleviate the difficulty in high-dimensional problems, we consider an alternative expression for high-order moments based on parametric representations via a generalized hyperbolic skew-t distribution. Then, we reformulate the high-order portfolio optimization problem as a fixed-point problem and propose a robust fixed-point acceleration algorithm that solves the problem in an efficient and scalable manner. Empirical experiments also attest to the efficiency of our proposed high-order portfolio optimization framework, which presents low complexity and significantly outperforms the state-of-the-art methods by 2 to 4 orders of magnitude.

Index Terms—High-order portfolios, generalized hyperbolic skew-t distribution, fixed point acceleration.

Manuscript received 1 June 2022; revised 13 February 2023; accepted 29 August 2023. Date of publication 12 October 2023; date of current version 20 October 2023. This work was supported in part by the Hong Kong GRF under Grant 16207820, in part by the National Nature Science Foundation of China (NSFC) under Grant 62201362, and in part by the Shenzhen Science and Technology Program under Grant RCBS20221008093126071. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Yue M. Lu. (Corresponding authors: Rui Zhou; Jiaxi Ying.)
Xiwen Wang is with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (e-mail: [email protected]).
Rui Zhou is with the Shenzhen Research Institute of Big Data, Shenzhen 518115, China (e-mail: [email protected]).
Jiaxi Ying is with the Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (e-mail: [email protected]).
Daniel P. Palomar is with the Department of Electronic and Computer Engineering and the Department of Industrial Engineering and Decision Analytics, Hong Kong University of Science and Technology, Kowloon, Hong Kong (e-mail: [email protected]).
Digital Object Identifier 10.1109/TSP.2023.3314278
1053-587X © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

MODERN portfolio theory (MPT), pioneered by Harry Markowitz [1], strives to reach a trade-off between minimizing the risk of the portfolio and maximizing its profit. For the convenience of modeling the profit and risk, the assets' returns are conventionally assumed to follow a Gaussian distribution. The Gaussian distribution was embraced in early research for a number of reasons [2]. First of all, it is straightforward to describe the data using the Gaussian distribution. The mean vector $\mu$ and covariance matrix $\Sigma$, which are the parameters of the Gaussian distribution, can be obtained via numerous estimation methods. Moreover, the mathematical expressions of profit and risk are then simple enough that the resultant portfolio designs are convenient from the perspective of optimization. However, the mean and variance, a.k.a. the first- and second-order moments, are usually not sufficient to capture the characteristics of the assets' returns [3], [4]. It is widely acknowledged that empirical observations of stock data exhibit asymmetry and fat tails that can barely be described by a Gaussian distribution [5], [6]. In light of these deficiencies, a body of empirical evidence advocates the incorporation of the high-order moments into portfolio design [7], [8].

The concerns of skewness and kurtosis, a.k.a. third- and fourth-order moments, have been raised for decades [9]. Typically, higher skewness is preferred as it reduces extreme values on the side of losses and increases them on the side of gains. Kurtosis, in contrast, measures dispersion, which is undesirable because it increases the uncertainty of returns [10], [11], [12]. A detailed discussion can be found in [11]. Therefore, portfolio designs should also aspire to achieve high skewness and low kurtosis. This trade-off was then naturally formulated as a mean-variance-skewness-kurtosis (MVSK) framework [13].

Although there are many compelling advantages of involving skewness and kurtosis [14], [15], solving high-order portfolio optimization problems is non-trivial. Given a problem formulation to specify the trade-off, a typical high-order portfolio design consists of a model to characterize the high-order moments and optimization algorithms to solve the problem. Each of these modules can be a limiting factor in the overall practicability of the framework. In this paper, we start from the classical MVSK problem formulation. Then, the first fundamental problem is how to model the skewness and kurtosis of the portfolio return. The conventional approach models the skewness and kurtosis via the vanilla co-skewness matrix $\Phi \in \mathbb{R}^{N \times N^2}$ and co-kurtosis
matrix $\Psi \in \mathbb{R}^{N \times N^3}$. However, this non-parametric modeling suffers a lot from the dimensionality problem [16], which might not be critical for the variance but is severely exacerbated when estimating skewness and kurtosis. For example, to obtain $\Phi$ and $\Psi$ when $N = 100$, we need to estimate more than 170 thousand and 4 million parameters, respectively. As the number of parameters is significantly larger than the number of samples, the estimation error is inevitably large [17]. In addition, the storage burden is also exceptionally heavy. Any mathematical manipulations involving $\Phi$ and $\Psi$ would demand prohibitive computational resources and are thus not applicable to high-dimensional problems.

Apart from the high computational cost due to the matrices $\Phi$ and $\Psi$, the third moment of the portfolio return is non-convex, making it difficult to optimize. The existing methods in the literature can be roughly classified into three major categories: zeroth-order, first-order, and second-order methods. The zeroth-order methods [18] usually iteratively improve the objective values via repetitive function evaluations. For instance, the differential evolution [19] and genetic algorithms [20] iteratively improve solutions via searching in the feasible region. Zeroth-order search algorithms are often criticized for their high computational cost. The first-order methods make use of the first-order derivative of the objective function. Some classical examples include the difference-of-convex (DC) algorithms [21], [22] and some Majorization-Minimization algorithms [23]. However, the first-order methods may need quite a large number of iterations to converge. In contrast, the second-order methods improve the description of the descent direction by incorporating the second-order derivative of the objective function. For example, the Q-MVSK algorithm [23] presents a significantly faster convergence rate than the first-order methods. However, the per-iteration cost of second-order methods is prohibitive as computing the Hessian has dramatically high complexity.

In summary, due to the computationally expensive modeling of high-order moments and the absence of practical optimization algorithms, the current MVSK framework can only produce high-order portfolios in low-dimensional problems. To address these limitations, in this paper, we present a novel high-order portfolio design framework that is both efficient and scalable. Our contributions are mainly twofold:
1) We adopt a parametric model to significantly reduce the memory and computational cost of obtaining the high-order moments of the portfolio return. The proposed method accommodates the high-dimensional scenarios by fitting the data via a generalized hyperbolic skew-t distribution.
2) We propose a practical algorithm based on a robust fixed point acceleration strategy to solve the high-order portfolios. The numerical experiments demonstrate that the proposed algorithms are significantly more efficient and scalable than the state-of-the-art solvers.

The structure of this paper is as follows. In Section II, we first introduce the high-order portfolio optimization problems and illustrate the current difficulties. In Section III, we present an efficient approach to compute the skewness and kurtosis using a generalized hyperbolic skew-t distribution. The parametric distribution allows a faster way of computing high-order moments of the portfolio. In Section IV, we propose efficient algorithms to solve the MVSK portfolios based on a fixed point acceleration strategy. Additionally, in Section V, we show that the proposed algorithm can be easily generalized to the MVSK-Tilting portfolio. Then, we elaborate on the performance of the proposed high-order portfolio design framework in Section VI. Finally, we summarize the conclusions in Section VII.

II. PROBLEM FORMULATIONS

A. MVSK Portfolios

Let $r \in \mathbb{R}^N$ denote the log-returns of $N$ assets and $w \in \mathbb{R}^N$ denote the portfolio weights. The classical mean-variance portfolio optimization problem is formulated as
$$\begin{array}{ll} \underset{w}{\text{minimize}} & -\phi_1(w) + \lambda \phi_2(w) \\ \text{subject to} & w \in \mathcal{W}, \end{array} \tag{1}$$
where $\phi_1(w)$ refers to the first moment, a.k.a. the mean of the portfolio return, i.e.,
$$\phi_1(w) = \mathbb{E}\left[w^T r\right], \tag{2}$$
$\phi_2(w)$ is the second central moment, which is the variance of the portfolio return, i.e.,
$$\phi_2(w) = \mathbb{E}\left[\left(w^T r - \mathbb{E}\left[w^T r\right]\right)^2\right], \tag{3}$$
$\lambda > 0$ is a risk-aversion coefficient, and $\mathcal{W}$ represents the feasible set of the portfolio weights. In this paper, we consider no-shorting. Therefore, $\mathcal{W}$ is a unit simplex denoted as
$$\mathcal{W} = \left\{ w \mid \mathbf{1}^T w = 1,\ w \geq 0 \right\}. \tag{4}$$
Now, we incorporate the third and fourth central moments of the portfolio return, i.e.,
$$\begin{aligned} \phi_3(w) &= \mathbb{E}\left[\left(w^T r - \mathbb{E}\left[w^T r\right]\right)^3\right], \\ \phi_4(w) &= \mathbb{E}\left[\left(w^T r - \mathbb{E}\left[w^T r\right]\right)^4\right], \end{aligned} \tag{5}$$
into the portfolio selection. This directly extends the mean-variance portfolio into a mean-variance-skewness-kurtosis (MVSK) portfolio, formulated as follows
$$\begin{array}{ll} \underset{w}{\text{minimize}} & f(w) = -\lambda_1 \phi_1(w) + \lambda_2 \phi_2(w) - \lambda_3 \phi_3(w) + \lambda_4 \phi_4(w) \\ \text{subject to} & w \in \mathcal{W}, \end{array} \tag{6}$$
where $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are the non-negative parameters controlling the relative importance of individual moments.

B. Current Difficulties

Among many difficulties regarding high-order portfolio designs, the most fundamental bottleneck is the prohibitive cost of computing high-order central moments using the non-parametric representation. Namely, the conventional way applies the following formulas to characterize the co-skewness and co-kurtosis matrices,
$$\Phi = \mathbb{E}\left[(r - \mu)(r - \mu)^T \otimes (r - \mu)^T\right],$$
$$\Psi = \mathbb{E}\left[(r - \mu)(r - \mu)^T \otimes (r - \mu)^T \otimes (r - \mu)^T\right], \tag{7}$$
where $\mu = \mathbb{E}[r]$. As shown in Table I, the costs for storing $\Phi$ and $\Psi$ have a high complexity. This means that we may not be able to set up these matrices when the problem dimension is large.

TABLE I
CONVENTIONAL NON-PARAMETRIC REPRESENTATIONS OF HIGH-ORDER MOMENTS

| | Number of Parameters to Estimate | Memory Complexity |
|---|---|---|
| Co-skewness $\Phi$ | $\frac{1}{6} N(N+1)(N+2)$ | $O(N^3)$ |
| Co-kurtosis $\Psi$ | $\frac{1}{24} N(N+1)(N+2)(N+3)$ | $O(N^4)$ |

In addition, the non-parametric approach also poses tremendous challenges in computing the objective values, gradients, and Hessians of the third and fourth central moments for a given portfolio [23]. Here, we exhibit the corresponding complexities in Table II. As a result, existing first-order methods cannot be efficient, as they often require many iterations to converge while the per-iteration cost is very high. On the other hand, existing second-order methods are not scalable because the complexity of computing $\nabla^2 \phi_4(w)$ is $O(N^5)$.

TABLE II
FORMULATIONS AND COMPUTATIONAL COMPLEXITY OF COMPUTING HIGH-ORDER MOMENTS IN NON-PARAMETRIC WAY

| | | Formulation | Complexity |
|---|---|---|---|
| 3rd central moment | $\phi_3(w)$ | $w^T \Phi (w \otimes w)$ | $O(N^3)$ |
| | $\nabla \phi_3(w)$ | $3\Phi (w \otimes w)$ | $O(N^3)$ |
| | $\nabla^2 \phi_3(w)$ | $6\Phi (I \otimes w)$ | $O(N^4)$ |
| 4th central moment | $\phi_4(w)$ | $w^T \Psi (w \otimes w \otimes w)$ | $O(N^4)$ |
| | $\nabla \phi_4(w)$ | $4\Psi (w \otimes w \otimes w)$ | $O(N^4)$ |
| | $\nabla^2 \phi_4(w)$ | $12\Psi (I \otimes w \otimes w)$ | $O(N^5)$ |

Therefore, in the next section, we present a parametric approach to model the skewness and kurtosis such that the concerns discussed above can be largely eliminated.

III. MODELING HIGH-ORDER MOMENTS USING GENERALIZED HYPERBOLIC MULTIVARIATE SKEW-t DISTRIBUTION

In this section, we illustrate how to apply a parametric distribution to model the data and derive the high-order moments from the parametric model. To be more specific, this approach assumes that the assets' returns follow a multivariate generalized hyperbolic skew-t distribution. Then, high-order moments can be represented using the parameters of the fitted distribution. To proceed, we will first present some preliminary knowledge of the generalized hyperbolic skew-t distribution, followed by the derivation of efficient methods for computing high-order moments based on this distribution.

A. ghMST Distribution

The generalized hyperbolic multivariate skew-t (ghMST) distribution [24], [25] is a sub-class of the generalized hyperbolic distribution [26], which is often used in economics to model data with skewness and heavy tails [27], [28], [29], [30].

Suppose that an $N$-dimensional random vector $x$ follows the ghMST distribution, i.e., $x \sim \mathrm{ghMST}(\mu, \Sigma, \gamma, \nu)$. It has the probability density function (pdf)
$$f_{\mathrm{ghMST}}(x \mid \mu, \Sigma, \gamma, \nu) = \frac{e^{(x-\mu)^T \Sigma^{-1} \gamma}}{(2\pi)^{\frac{N}{2}} |\Sigma|^{\frac{1}{2}}} \cdot \frac{2 \left(\frac{\nu}{2}\right)^{\frac{\nu}{2}}}{\Gamma\left(\frac{\nu}{2}\right)} \cdot \left(\frac{\nu + Q(x)}{\gamma^T \Sigma^{-1} \gamma}\right)^{-\frac{\nu+N}{4}} \cdot K_{-\frac{\nu+N}{2}}\left(\sqrt{(\nu + Q(x))\, \gamma^T \Sigma^{-1} \gamma}\right), \tag{8}$$
where $\nu \in \mathbb{R}_{++}$ is the degree of freedom, $\mu \in \mathbb{R}^N$ is the location vector, $\gamma \in \mathbb{R}^N$ is the skewness vector, $\Sigma \in \mathbb{R}^{N \times N}$ is the scatter matrix, $\Gamma$ is the gamma function, $Q(x) = (x - \mu)^T \Sigma^{-1} (x - \mu)$, and $K_\lambda$ is the modified Bessel function of the second kind with index $\lambda$ [31].

Remark 1: In the following, $\mu$ and $\Sigma$ refer to the parameters of the ghMST distribution, that is, to the location vector and scatter matrix and not to the mean vector and covariance matrix.

Interestingly, the ghMST distribution can be represented in a hierarchical structure as
$$\begin{aligned} x \mid \tau &\overset{\text{i.i.d.}}{\sim} \mathcal{N}\left(\mu + \frac{1}{\tau}\gamma,\ \frac{1}{\tau}\Sigma\right), \\ \tau &\overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}\left(\frac{\nu}{2}, \frac{\nu}{2}\right), \end{aligned} \tag{9}$$
where $\mathcal{N}(\tilde{\mu}, \tilde{\Sigma})$ denotes the multivariate Gaussian distribution with mean vector $\tilde{\mu}$ and covariance matrix $\tilde{\Sigma}$, and $\mathrm{Gamma}(a, b)$ represents the gamma distribution of shape $a$ and rate $b$.

Fig. 1 illustrates the skewness and fat-tailness under the ghMST distribution. When $\gamma$ is fixed, the higher the value of $\nu$, the thinner the tails. When $\nu$ is fixed, the larger the value of $\gamma$, the heavier the skewness. Hence, the third and fourth moments are naturally embedded into the parameters of the distribution.

Fig. 1. Illustrations for the univariate generalized hyperbolic skew-t distribution ($\mu = 0$, $\Sigma = 1$).

Footnote 1: Variants of rMST distribution include Gupta's skew-t [32], Pyne's skew-t [33], Branco's skew-t [34], and Azzalini's skew-t [35]. It can be shown that these variants have similar forms and can characterize the same distribution after some parametrization [36].

In the literature, some restricted multivariate skew-t (rMST) distributions (see footnote 1) [37] are also capable of modeling asymmetry and fat-tailness. In this paper, we choose to use the ghMST distribution for two reasons. Foremost, the ghMST distribution is the only skew-t distribution that we can fit within a
reasonable amount of time under high-dimensional settings [36], [38]. The details of fitting time are discussed in Section A of the Appendix. In short, existing implementations (see footnote 2) may take a number of minutes to fit the rMST distribution when $N \geq 30$. In contrast, existing EM algorithms can efficiently fit the ghMST distribution (see footnote 3) with thousands of assets in a few seconds [24], [41], [42]. When $N = 20$, fitting a ghMST distribution is over four orders of magnitude faster than fitting an rMST distribution. On the other hand, rMST distributions do not provide better out-of-sample fitting performance. To show this, we conduct a simple experiment as follows. In each realization, we randomly select $N$ assets from the SP500 stock list. Then, we randomly pick the data from $15N$ continuous trading days to form the data set $\mathcal{D}$. Without shuffling, $\mathcal{D}$ is split into a training set $\mathcal{D}_{\mathrm{train}}$ and a test set $\mathcal{D}_{\mathrm{test}}$ by assigning the first 2/3 of the data to the former and the remaining 1/3 to the latter. For each distribution, the optimal parameters are obtained via the training set
$$\Theta^\star = \arg\max_{\Theta} L\left(\mathcal{D}_{\mathrm{train}}; \Theta\right). \tag{10}$$
Then we compute the out-of-sample normalized log-likelihood on the test set as $\frac{1}{5N^2} L_{\mathrm{test}}\left(\mathcal{D}_{\mathrm{test}}; \Theta^\star\right)$. We repeat the experiments 50 times for each problem dimension. Fig. 2 shows that the ghMST distribution gives higher average likelihood values when $N$ becomes large. As it is difficult to differentiate their obtained likelihoods, the ghMST distribution appears to be the best choice for characterizing high-order moments due to its significantly more efficient estimation.

Fig. 2. Out-of-sample log-likelihood for the restricted skew-t (rMST) and generalized hyperbolic skew-t (ghMST) distributions.

Footnote 2: For fitting the rMST distribution, we apply the EM algorithm [39] implemented in the R package EMMIXskew [40].
Footnote 3: The ghMST distribution fitting process is carried out using the 'fit_mvst' function from the R package fitHeavyTail.

B. Computing High-Order Moments Under ghMST Distribution

Incorporating the ghMST distribution into the design of high-order portfolios also makes it convenient to manipulate the high-order moments, i.e., skewness and kurtosis. In this subsection, we highlight two advantages of using the parametric ghMST distribution. Firstly, it allows for a low-memory representation of the co-moments of the asset returns. Secondly, it provides more efficient computation of the skewness and kurtosis of the portfolio returns.

In the conventional framework, before we compute the high-order moments of portfolio returns, we need to store large matrices, including $\Phi$ and $\Psi$. Now, we suppose that a random vector $r$ follows a ghMST distribution. Then, according to Lemma 2, the high-order moments can be easily computed from the parameter set $\Theta = \{\mu, \Sigma, \gamma, \nu\}$, which is significantly smaller than $\Phi$ and $\Psi$. As a result, the memory complexity is reduced from $O(N^4)$ to $O(N^2)$.

Lemma 2: Assuming a random vector $r \sim \mathrm{ghMST}(\mu, \Sigma, \gamma, \nu)$, then the mean and covariance of $r$ are given as
$$\begin{aligned} \mathbb{E}[r] &= \mu + a_1 \gamma, \\ \mathrm{Cov}[r] &= a_{21} \Sigma + a_{22} \gamma\gamma^T, \end{aligned} \tag{11}$$
where $a_1 = a_{21} = \frac{\nu}{\nu-2}$ and $a_{22} = \frac{2\nu^2}{(\nu-2)^2(\nu-4)}$ are scalar coefficients decided by $\nu$. The third moment co-skewness matrix $\Phi$ is expressed as
$$\Phi_{i,(j-1)N+k} = a_{31} \gamma_i \gamma_j \gamma_k + \frac{a_{32}}{3}\left(\gamma_i \Sigma_{jk} + \gamma_j \Sigma_{ik} + \gamma_k \Sigma_{ij}\right). \tag{12}$$
The fourth moment co-kurtosis matrix $\Psi$ is expressed as
$$\Psi_{i,(j-1)N^2+(k-1)N+l} = a_{41} \gamma_i \gamma_j \gamma_k \gamma_l + \frac{a_{42}}{6} \underbrace{\left(\Sigma_{ij}\gamma_k\gamma_l + \cdots + \Sigma_{kl}\gamma_i\gamma_j\right)}_{6\ \text{items}} + \frac{a_{43}}{3}\left(\Sigma_{ij}\Sigma_{kl} + \Sigma_{ik}\Sigma_{jl} + \Sigma_{il}\Sigma_{jk}\right). \tag{13}$$
Here $a_{31} = \frac{16\nu^3}{(\nu-2)^3(\nu-4)(\nu-6)}$, $a_{32} = \frac{6\nu^2}{(\nu-2)^2(\nu-4)}$, $a_{41} = \frac{(12\nu+120)\nu^4}{(\nu-2)^4(\nu-4)(\nu-6)(\nu-8)}$, $a_{42} = \frac{6(2\nu+4)\nu^3}{(\nu-2)^3(\nu-4)(\nu-6)}$, and $a_{43} = \frac{3\nu^2}{(\nu-2)(\nu-4)}$ are coefficients determined by $\nu$.
Proof: See Section B of the Appendix.

Another advantage of using the ghMST model lies in the fast computation of the high-order central moments of the portfolio return. Specifically, though recovering the complete forms of $\Phi$ and $\Psi$ using Lemma 2 can be computationally expensive, the skewness and kurtosis of $w^T r$ can be efficiently derived.

Lemma 3: Assuming $r \sim \mathrm{ghMST}(\mu, \Sigma, \gamma, \nu)$, then the first-to-fourth central moments of $w^T r$, denoted as $\phi_i(w)$, $i = 1, \ldots, 4$, are given as follows
$$\begin{aligned} \phi_1(w) &= w^T \mu + a_1 w^T \gamma, \\ \phi_2(w) &= a_{21} w^T \Sigma w + a_{22} \left(w^T \gamma\right)^2, \\ \phi_3(w) &= a_{31} \left(w^T \gamma\right)^3 + a_{32} \left(w^T \gamma\right)\left(w^T \Sigma w\right), \\ \phi_4(w) &= a_{41} \left(w^T \gamma\right)^4 + a_{42} \left(w^T \gamma\right)^2 \left(w^T \Sigma w\right) + a_{43} \left(w^T \Sigma w\right)^2. \end{aligned} \tag{14}$$
Proof: See Section C of the Appendix.

Under the ghMST distribution, we can significantly speed up the computation of the objective value, gradient, and Hessian of high-order moments. Their exact expressions are listed in Section D of the Appendix, and their corresponding computational complexities are summarized in Table III. Note that the per-iteration complexity for first-order approaches has been reduced to $O(N^2)$. In response to this, in Section IV, we present an algorithm that mainly utilizes
gradient information. As a result, the proposed algorithm can exhibit superior scalability over state-of-the-art methods.

TABLE III
COMPUTATIONAL COMPLEXITY OF COMPUTING HIGH-ORDER MOMENTS USING ghMST DISTRIBUTION

| | Objective | Gradient | Hessian |
|---|---|---|---|
| 3-rd moment | $O(N^2)$ | $O(N^2)$ | $O(N^3)$ |
| 4-th moment | $O(N^2)$ | $O(N^2)$ | $O(N^3)$ |

IV. PROPOSED METHODS FOR SOLVING MVSK PORTFOLIOS

In this section, we explore new practical algorithms for solving Problem (6) under the ghMST distribution. The proposed method iteratively minimizes the objective values by searching for a fixed point of a projected gradient mapping. The section is organized as follows. We first recast the optimization problem (6) as a fixed-point problem. After that, we introduce a fixed-point acceleration scheme to find the fixed point more efficiently. To overcome the convergence issues caused by the acceleration scheme, we further enhance the robustness of the fixed-point acceleration method and complete our algorithm. Finally, we provide an analysis of the complexity and convergence of our proposed methods.

A. Constructing the Fixed-Point Problem

Considering a continuous vector-to-vector mapping $G: \mathbb{R}^N \to \mathbb{R}^N$, a point $w$ is a fixed point of function $G$ when it satisfies $w = G(w)$. In optimization, many iterative methods aim at generating a sequence $\{w^1, w^2, \ldots\}$ that is expected to converge to a stationary point via a designed update rule $w^{k+1} = G(w^k)$. As a result, when those algorithms converge, the obtained $w$ is also the fixed point of $G$. In this subsection, we will introduce the exact expression of the $G$ of interest and how solving Problem (6) can be transformed into finding a fixed point of function $G$.

The function $G$ we consider is selected as
$$G\left(w^k; \eta\right) \triangleq P_{\mathcal{W}}\left(w^k - \eta \nabla f\left(w^k\right)\right), \tag{15}$$
where $\eta > 0$ is the step size and the operator $P_{\mathcal{W}}$ is defined as a projection onto a unit simplex [43]
$$P_{\mathcal{W}}\left(w^k\right) = \arg\min_{w \in \mathcal{W}} \left\| w - w^k \right\|_2^2, \tag{16}$$
which is a continuous vector-valued function defined on $w \in \mathbb{R}^N$.

Remark 4: In fact, the choice of $G$ is not unique, but (15) is preferred because it is simple to manipulate. Instead of calling a quadratic programming solver, we can design a water-filling algorithm [44] to solve $G(w^k; \eta)$ efficiently. Details are elaborated in Section E of the Appendix. The simplicity of solving $G$ plays an important role in promoting the efficiency and scalability of the proposed algorithm.

Given any $\eta > 0$, the fixed point of $G$ is the stationary point of Problem (6). This is shown in Lemma 5.

Lemma 5: The set of fixed points of $G(\cdot; \eta)$, i.e., $w = G(w; \eta)$, coincides with that of the stationary points of Problem (6).

Proof: Let $w^\star \in \mathcal{W}$ be the fixed point of $G(\cdot; \eta)$, i.e., $w^\star = G(w^\star; \eta)$. Hence, $w^\star$ is the optimal solution to the following convex optimization problem:
$$\begin{array}{ll} \underset{w}{\text{minimize}} & \frac{1}{2} \left\| w - \left(w^\star - \eta \nabla f(w^\star)\right) \right\|_2^2 \\ \text{subject to} & w \in \mathcal{W}. \end{array} \tag{17}$$
Therefore, for any $y \in \mathcal{W}$, we have
$$\left(y - w^\star\right)^T \left(w^\star - \left(w^\star - \eta \nabla f(w^\star)\right)\right) = \eta \left(y - w^\star\right)^T \nabla f(w^\star) \geq 0, \tag{18}$$
which already indicates that $w^\star$ is the stationary point of Problem (6).

Using Lemma 5, we can recast Problem (6) into the following optimization problem
$$\text{find } w \in \mathcal{W}, \quad \text{subject to } w = G(w; \eta). \tag{19}$$
This well-known fixed-point problem can be solved by the fixed-point iteration method [45], which iterates the following update
$$w^{k+1} = G\left(w^k; \eta\right), \tag{20}$$
in which the function $G$ should be Lipschitz continuous with Lipschitz constant $L < 1$. In practice, this conventional approach is often criticized for slow convergence. Hence, in the rest of this section, we will introduce an acceleration scheme that significantly improves its convergence.

B. Fixed-Point Acceleration

We first reformulate the fixed-point problem as finding a root of a residual function $R: \mathbb{R}^N \to \mathbb{R}^N$
$$R(w; \eta) = G(w; \eta) - w. \tag{21}$$
If the problem is unconstrained, the non-smooth version of the Newton-Raphson method [46] solves the fixed-point problem via iterating the following update formula
$$w^{k+1} = w^k - M^{-1}\left(w^k; \eta\right) R\left(w^k; \eta\right), \tag{22}$$
where $M(w^k; \eta) \in \mathbb{R}^{N \times N}$ is an element of $\partial R(w^k; \eta)$, and $\partial R(w^k; \eta)$ is Clarke's generalized Jacobian of $R$ evaluated at $w = w^k$ [47]. However, (22) is not applicable in our case. On the one hand, the acceleration may render iterates infeasible, i.e., $w^{k+1} \notin \mathcal{W}$. To make up for it, a heuristic alternative to (22) is
$$w^{k+1} = P_{\mathcal{W}}\left(w^k - M^{-1}\left(w^k; \eta\right) R\left(w^k; \eta\right)\right). \tag{23}$$
On the other hand, $M(w^k; \eta)$ is generally intractable to obtain. But we notice that the classical directional derivative evaluated at $w = w^k$ still exists and is given as
$$D_d R\left(w^k; \eta\right) = \lim_{h \to 0} \frac{R\left(w^k + hd; \eta\right) - R\left(w^k; \eta\right)}{h}. \tag{24}$$
Then, according to [46, Lemma 2.2], for any direction $d$, there exists a matrix $M(w^k; \eta) \in \partial R(w^k; \eta)$ such that
$$D_d R\left(w^k; \eta\right) = M\left(w^k; \eta\right) d. \tag{25}$$
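As a rough illustration of the pieces introduced so far, the following NumPy sketch evaluates the objective (6) through the parametric moments (14), builds the mapping $G$ of (15), and runs the plain fixed-point iteration (20). Several parts are assumptions rather than the authors' implementation: the gradient is derived here by the chain rule from (14) (the expressions of Appendix D are not reproduced in this text), the projection uses a standard sort-based routine standing in for the water-filling algorithm of Remark 4, and the function names, stopping rule, and synthetic parameters are hypothetical.

```python
import numpy as np


def ghmst_coeffs(nu):
    """Scalar coefficients of Lemmas 2 and 3 (the fourth moment needs nu > 8)."""
    a1 = a21 = nu / (nu - 2)
    a22 = 2 * nu**2 / ((nu - 2) ** 2 * (nu - 4))
    a31 = 16 * nu**3 / ((nu - 2) ** 3 * (nu - 4) * (nu - 6))
    a32 = 6 * nu**2 / ((nu - 2) ** 2 * (nu - 4))
    a41 = (12 * nu + 120) * nu**4 / ((nu - 2) ** 4 * (nu - 4) * (nu - 6) * (nu - 8))
    a42 = 6 * (2 * nu + 4) * nu**3 / ((nu - 2) ** 3 * (nu - 4) * (nu - 6))
    a43 = 3 * nu**2 / ((nu - 2) * (nu - 4))
    return a1, a21, a22, a31, a32, a41, a42, a43


def mvsk_value_and_grad(w, mu, Sigma, gamma, nu, lam):
    """Objective (6) via the moments (14), plus its gradient (chain rule, our derivation)."""
    a1, a21, a22, a31, a32, a41, a42, a43 = ghmst_coeffs(nu)
    g = w @ gamma      # w^T gamma
    Sw = Sigma @ w     # Sigma w: the only O(N^2) operation
    s = w @ Sw         # w^T Sigma w
    phi1 = w @ mu + a1 * g
    phi2 = a21 * s + a22 * g**2
    phi3 = a31 * g**3 + a32 * g * s
    phi4 = a41 * g**4 + a42 * g**2 * s + a43 * s**2
    f = -lam[0] * phi1 + lam[1] * phi2 - lam[2] * phi3 + lam[3] * phi4
    # Gradients use d(w^T gamma)/dw = gamma and d(w^T Sigma w)/dw = 2 Sigma w.
    d1 = mu + a1 * gamma
    d2 = 2 * a21 * Sw + 2 * a22 * g * gamma
    d3 = (3 * a31 * g**2 + a32 * s) * gamma + 2 * a32 * g * Sw
    d4 = (4 * a41 * g**3 + 2 * a42 * g * s) * gamma + (2 * a42 * g**2 + 4 * a43 * s) * Sw
    grad = -lam[0] * d1 + lam[1] * d2 - lam[2] * d3 + lam[3] * d4
    return f, grad


def project_simplex(v):
    """Euclidean projection onto the unit simplex (4); sort-based stand-in for [44]."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def G(w, eta, value_and_grad):
    """Projected gradient mapping (15)."""
    return project_simplex(w - eta * value_and_grad(w)[1])


def fixed_point_iteration(w0, eta, value_and_grad, tol=1e-10, max_iter=10000):
    """Plain update (20); stops when the residual (21) is small (assumed stopping rule)."""
    w = w0
    for _ in range(max_iter):
        w_next = G(w, eta, value_and_grad)
        if np.linalg.norm(w_next - w) <= tol:
            return w_next
        w = w_next
    return w


if __name__ == "__main__":
    # Synthetic, hypothetical parameters; in practice they would come from a fitted model.
    rng = np.random.default_rng(0)
    N = 5
    A = rng.standard_normal((N, N))
    Sigma = 0.01 * (A @ A.T / N + 0.1 * np.eye(N))
    mu, gamma = 0.01 * rng.standard_normal(N), 0.01 * rng.standard_normal(N)
    vg = lambda w: mvsk_value_and_grad(w, mu, Sigma, gamma, 10.0, (1.0, 1.0, 1.0, 1.0))
    print(fixed_point_iteration(np.full(N, 1.0 / N), 0.5, vg))
```

Every quantity in `mvsk_value_and_grad` depends on $w$ only through $w^T\gamma$, $\Sigma w$, and $w^T \Sigma w$, which is why the cost per evaluation stays at $O(N^2)$ and neither $\Phi$ nor $\Psi$ is ever formed.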

Hence, by assigning h = 1 and d = G wk ; η − wk to (24), Meanwhile, we require another constraint
we can construct the secant equation at w = wk as  k 
     k  y1 − wk , y2k − y1k ≥ 0. (35)
M wk ; η R wk ; η ≈ V w ;η , (26)
We hope that the direction of first-level acceleration should
where the function V : RN → RN is defined as be similar to the direction of second-level acceleration. The
 Δ       inequality (35), which is equivalent to
V wk ; η = R G wk ; η ; η − R wk ; η   k     
      R w ; η , R wk ; η − αk V wk ; η ≥ 0, (36)
= G G wk ; η ; η − 2G wk ; η + wk . (27)
 
 k  another constraint for Nthe value of α , i.e., α ≥
k k
Here, we replace the matrix M wk ; η by the scaled identity provides
 k −1 b w , where the function b : R → R is denoted as
matrix α I such that the inverse of it can be easily derived.
 
The value of αk is therefore determined by approximating the b wk
following equation ⎧
⎨ R(wk ;η)22     
 k −1  k    if R wk ; η , V wk ; η < 0,
α R w ; η ≈ V wk ; η , (28) = R(wk ;η),V (wk ;η)
⎩−∞ otherwise.
whose details will be elaborated later. As a result, we have the (37)
formulation for the first-level fixed-point acceleration, i.e.,
Δ   Therefore, the value of αk is computed as the solution to the
y1k = wk − αk R wk ; η . (29) following constrained least square problem
    2 
Intuitively,as a replacement to (23), the projection of the new minimize R wk ; η − αV wk ; η  |α|
point PW y1k is expected to provide smaller residual values α  k (38)
subject to b w ≤ α < 0,
compared to wk .
Inspired by the ‘squared extrapolation method’ [48], we whose solution can be easily obtained as
introduce the second-level acceleration by defining

$$y_2^k \triangleq y_1^k - \alpha_k R(y_1^k; \eta). \tag{30}$$

This strategy, inspired by [49], can be seen as taking two successive first-level accelerations using the same step length. Interestingly, the value of $R(y_1^k; \eta)$ can be approximated by manipulating the secant equations. To be more specific, we assign different values of $d$ to construct the secant equations. In (26), $d$ is set to $d_1 = G(w^k; \eta) - w^k$. Now, we set

$$d_2 = -\alpha_k \left( G(w^k; \eta) - w^k \right) = -\alpha_k d_1. \tag{31}$$

This indicates that the approximation of $R(y_1^k; \eta) - R(w^k; \eta)$ can be obtained by multiplying $V(w^k; \eta)$ by the scaling factor $-\alpha_k$, i.e.,

$$R(y_1^k; \eta) - R(w^k; \eta) \approx -\alpha_k V(w^k; \eta). \tag{32}$$

Therefore, we obtain the closed-form approximation for $y_2^k$ as

$$\begin{aligned} y_2^k &\triangleq w^k - \alpha_k R(w^k; \eta) - \alpha_k \left( R(w^k; \eta) - \alpha_k V(w^k; \eta) \right) \\ &= w^k - 2\alpha_k R(w^k; \eta) + \alpha_k^2 V(w^k; \eta). \end{aligned} \tag{33}$$

Eventually, the update for $w$ is finalized as

$$w^{k+1} = P_{\mathcal{W}}\left( w^k - 2\alpha_k R(w^k; \eta) + \alpha_k^2 V(w^k; \eta) \right). \tag{34}$$

Now we introduce how to compute the value of $\alpha_k$. In the literature, $\alpha_k$ is usually estimated by minimizing a discrepancy measure based on the secant equation (28). From [50], we select $\| R(w^k; \eta) - \alpha V(w^k; \eta) \|_2^2 / |\alpha|$ as our discrepancy measure. In addition, because the term $R(w^k; \eta)$ in (29) can be seen as a direction to achieve small objective values, it is natural to impose the constraint $\alpha_k \le 0$ such that the acceleration is performed along a descent direction.

$$\alpha_k = \max\left\{ -\frac{\| R(w^k; \eta) \|_2}{\| V(w^k; \eta) \|_2},\; b(w^k) \right\}. \tag{39}$$

In principle, we can also simulate $y_{i+1}^k \triangleq y_i^k - \alpha_k R(y_i^k; \eta)$ for $i > 2$, but the formulations are typically more complicated to derive and more levels of approximation are more likely to produce invalid acceleration.

Compared to the conventional update (20), the proposed method only incurs a small extra computational cost at each iteration, while significantly improving the efficiency in practice. However, like many other fixed-point acceleration methods, directly iterating (34) may not yield robust results. In other words, we may obtain a sequence that does not converge. Hence, we provide a safeguard that improves the robustness of the proposed fixed-point acceleration.

C. A Robust Fixed Point Acceleration (RFPA) Algorithm

To establish a stable convergence, we require that the sequence $\{ f(w^k) \}$ be monotone, i.e.,

$$\forall k: \; f(w^{k+1}) \le f(w^k). \tag{40}$$

The strategy is as follows. When the fixed-point acceleration fails to improve the objective, i.e., $f(w^{k+1}) > f(w^k)$, we first set $w^{k+1} = G(w^k; \eta)$. Then, we keep decreasing the step size by $\eta \leftarrow \beta\eta$ with a scaling factor $\beta \in (0, 1)$ until the following condition is met:

$$f(w^{k+1}) \le f(w^k) + \nabla f(w^k)^T (w^{k+1} - w^k) + \frac{1}{2\eta} \| w^k - w^{k+1} \|_2^2. \tag{41}$$

Once the condition (41) holds, the sequence $\{ f(w^k) \}$ is monotone, with the details provided in Section F of the Appendix. Eventually, we summarize the proposed robust fixed point acceleration (RFPA) algorithm in Algorithm 1.
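The accelerated update (34), the step length (39), and the safeguard based on (41) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the residuals are taken as R(w; η) = G(w; η) − w and V(w; η) = R(G(w; η); η) − R(w; η), which is consistent with the expansion in (77); b(w) is taken as ‖R‖²/⟨R, V⟩ when ⟨R, V⟩ < 0 and −∞ otherwise, as described in the case analysis of Appendix G; W is assumed to be the simplex of Appendix E; and the function names, the bounded backtracking loop, and the toy quadratic objective are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {w : sum(w) = 1, w >= 0} via sorting."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def rfpa(f, grad, w0, eta, beta=0.5, max_iter=500, tol=1e-10):
    """Sketch of the RFPA loop: accelerated step (34) with a PGD fallback."""
    G = lambda w, step: project_simplex(w - step * grad(w))
    w = w0.copy()
    for _ in range(max_iter):
        g1 = G(w, eta)
        R = g1 - w                    # assumed R(w; eta) = G(w; eta) - w
        V = G(g1, eta) - g1 - R       # assumed V(w; eta) = R(G(w)) - R(w)
        nR, nV = np.linalg.norm(R), np.linalg.norm(V)
        if nR < tol:                  # w is (numerically) a fixed point of G
            break
        if nV > 0:
            rv = R @ V
            b = nR**2 / rv if rv < 0 else -np.inf
            alpha = max(-nR / nV, b)  # step length as in (39), alpha <= 0
            w_new = project_simplex(w - 2 * alpha * R + alpha**2 * V)
        else:
            w_new = g1
        if f(w_new) > f(w):           # acceleration rejected: fall back to PGD
            step = eta
            for _ in range(60):       # bounded backtracking on condition (41)
                w_new = G(w, step)
                d = w_new - w
                if f(w_new) <= f(w) + grad(w) @ d + d @ d / (2 * step):
                    break
                step *= beta
        if np.linalg.norm(w_new - w) <= tol:
            w = w_new
            break
        w = w_new
    return w

# Hypothetical toy problem: a convex quadratic over the simplex.
A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 3.0]])
c = np.array([0.3, 1.0, -0.5])
f = lambda w: 0.5 * w @ A @ w - c @ w
grad = lambda w: A @ w - c
w_hat = rfpa(f, grad, np.full(3, 1.0 / 3.0), eta=0.2)
```

The fallback branch is exactly a projected gradient step, so disabling the accelerated branch turns this sketch into plain PGD.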


Algorithm 1 Robust Fixed Point Acceleration (RFPA) algorithm for solving Problem (6).
1: Initialize $w^0 \in \mathcal{W}$, $\eta$, $\eta_0$, $\beta$
2: for $k = 0, 1, 2, \ldots$ do
3:   Compute $R(w^k; \eta)$, $V(w^k; \eta)$
4:   $\alpha_k = \max\left\{ -\| R(w^k; \eta) \|_2 / \| V(w^k; \eta) \|_2,\; b(w^k) \right\}$.
5:   $w^{k+1} = P_{\mathcal{W}}\left( w^k - 2\alpha_k R(w^k; \eta) + \alpha_k^2 V(w^k; \eta) \right)$.
6:   if $f(w^{k+1}) > f(w^k)$ then
7:     $\eta' = \eta_0$.
8:     Update $w^{k+1} = G(w^k; \eta')$.
9:     while (41) not satisfied do
10:      $\eta' \leftarrow \beta\eta'$, go to step 8.
11:    end while
12:  end if
13:  Terminate loop if converges.
14: end for

If no fixed-point acceleration is applied, i.e., we only iterate $w^{k+1} = G(w^k; \eta)$ with a step size that satisfies (41), the RFPA algorithm reduces to the projected gradient descent (PGD) method. The main motivation for executing projected gradient steps is to enlarge the difference between $w^{k+1}$ and $w^k$. Theoretically, whether the fixed-point acceleration significantly improves the convergence is decided by the numerical properties at $w^k$. Therefore, if the difference between $w^k$ and $w^{k+1}$ is not large enough while the fixed-point acceleration at $w^k$ is not successful, the algorithm tends to reject the fixed-point acceleration at $w^{k+1}$ as well, due to their similar numerical properties.

D. Complexity Analysis and Convergence Analysis

The overall complexity of the proposed RFPA algorithm is $O(N^2)$. Specifically, the per-iteration cost of the proposed RFPA algorithm comes from two parts: computing the gradient $\nabla f(w^k)$ and solving a projection problem $P_{\mathcal{W}}$. With the help of the parametric skew-t distribution, the computational complexity of computing the gradient is reduced to $O(N^2)$. For solving the projection problems, the computational complexity mainly depends on finding proper values of the dual variables via bisection. According to Section E of the Appendix, the primary cost of the water-filling algorithm is to sort an array of numbers. Therefore, the corresponding complexity is $O(N \log N)$. In conclusion, regardless of the number of outer iterations, the per-iteration complexity of the proposed RFPA algorithm is $O(N^2)$.

On the contrary, if we apply the non-parametric modeling of the high-order moments, then the bottleneck of all the algorithms would be the computation of the gradient or the Hessian, which are $O(N^4)$ or $O(N^5)$, respectively. After we assume the returns follow a parametric skew-t distribution, the complexity of the second-order methods, like the Q-MVSK algorithm and the sequential quadratic programming method, becomes $O(N^3)$ due to the complexity of evaluating $\nabla^2 \phi_4(w)$.

The convergence of the RFPA algorithm for MVSK portfolio optimization is given as Theorem 6. By solving for the fixed point of function $G$, we can obtain the stationary point of Problem (6).

Theorem 6: If $w^k = w^{k+1}$, then $w^k$ is a stationary point of Problem (6).

Proof: See Section G of the Appendix.

Theorem 6 indicates that the algorithm can obtain the stationary point of Problem (6) if it terminates with $w^k = w^{k+1}$, which always holds in empirical studies as shown in Section VI-C.

V. EXTENSION: SOLVING MVSK-TILTING PORTFOLIOS WITH GENERAL DETERIORATION MEASURE

Our proposed framework provides an efficient and scalable discipline for handling high-order moments, and therefore presents great potential for more advanced and sophisticated applications, like multi-period portfolio optimization problems [51], [52], [53], incorporating diversification into the high-order designs [54], [55], and increasing the robustness of the current MVSK formulation [56]. In this section, we explore an interesting example of extending our framework to other portfolios.

In portfolio theory, though the MVSK framework finds a solution on the efficient frontiers, choosing proper values for $\lambda$ may be difficult and the optimal weights are often concentrated into some positions, resulting in a greater idiosyncratic risk [57]. Therefore, we generalize the idea of the RFPA algorithm for solving another important high-order portfolio called the MVSK-Tilting problem with general deterioration measures. This MVSK-Tilting portfolio aims at improving a given portfolio that is not sufficiently optimal from the MVSK perspective by tilting it toward a direction that concurrently ameliorates all the objectives [58], [59].

The problem of interest is formulated as

$$\begin{array}{ll} \underset{w,\,\delta}{\text{minimize}} & -\delta + \lambda \cdot g_{\mathrm{det}}(w) \\ \text{subject to} & \phi_1(w) \ge \phi_1(w_0) + d_1 \delta, \\ & \phi_2(w) \le \phi_2(w_0) - d_2 \delta, \\ & \phi_3(w) \ge \phi_3(w_0) + d_3 \delta, \\ & \phi_4(w) \le \phi_4(w_0) - d_4 \delta, \\ & w \in \mathcal{W}, \end{array} \tag{42}$$

where $d = [d_1\; d_2\; d_3\; d_4]^T \ge 0$ represents the relative importance of each target, $g_{\mathrm{det}}(w)$ is a differentiable function that corresponds to an assigned deterioration measure with respect to $w_0$, and $\lambda$ is the regularization coefficient. For example, $g_{\mathrm{det}}(w)$ can represent a tracking error

$$g_{\mathrm{det}}(w) = (w - w_0)^T \, \mathrm{Cov}[r] \, (w - w_0). \tag{43}$$

Implicitly, the point $w_0$ refers to a reference portfolio that satisfies $w_0 = \arg\min_{w \in \mathcal{W}} g_{\mathrm{det}}(w)$, indicating that a penalty is imposed when we tilt $w$ away from $w_0$.

The key to the success of the RFPA algorithm is to form a separable function $G$ such that the fixed point of $G$ is the stationary point we want to obtain. The function $G$ corresponds to an optimization problem that has the following properties:
• The objective function of the optimization problem is separable.


• The constraint of the optimization problem is simple. In our case, we require that the constraint is just $w \in \mathcal{W}$.
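The requirement that the constraint be simple is what keeps the projection $P_{\mathcal{W}}$ cheap. As a minimal sketch, assuming $\mathcal{W}$ is the simplex $\{w \mid 1^T w = 1,\, w \ge 0\}$ used in Appendix E, the dual variable $\gamma$ can be found by bisection on the monotone function $\zeta(\gamma)$ of (69). The function name, the bracketing interval, and the tolerance below are illustrative choices, and the input `v` stands for the point $w^k - \eta\nabla f(w^k)$ to be projected.

```python
import numpy as np

def project_simplex_bisection(v, tol=1e-12, max_iter=200):
    """Project v onto {w : sum(w) = 1, w >= 0} by bisecting on the dual variable."""
    # zeta is continuous and non-increasing in gam, cf. (69).
    zeta = lambda gam: np.maximum(v - gam, 0.0).sum() - 1.0
    lo = v.min() - 1.0   # every term is >= 1 here, so zeta(lo) >= len(v) - 1 >= 0
    hi = v.max()         # every term is 0 here, so zeta(hi) = -1 < 0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if zeta(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    gam = 0.5 * (lo + hi)
    return np.maximum(v - gam, 0.0)   # primal solution as in (68)

# Hypothetical input point to be projected.
w_proj = project_simplex_bisection(np.array([0.8, 0.6, -0.3]))
```

Each evaluation of `zeta` costs O(N), so the projection stays far below the O(N²) cost of the gradient; a sort-based variant reaches the same point in O(N log N).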
Therefore, we first move the MVSK-Tilting constraints into
the objective, resulting in the following equivalent problem:
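$$\begin{array}{ll} \underset{w}{\text{minimize}} & \max\left[ \varphi(w) \right] + \lambda \cdot g_{\mathrm{det}}(w) \\ \text{subject to} & w \in \mathcal{W}, \end{array} \tag{44}$$

in which

$$\varphi(w) = \begin{bmatrix} \varphi_1(w) \\ \varphi_2(w) \\ \varphi_3(w) \\ \varphi_4(w) \end{bmatrix} = \begin{bmatrix} \frac{1}{d_1} \left[ \phi_1(w_0) - \phi_1(w) \right] \\ \frac{1}{d_2} \left[ \phi_2(w) - \phi_2(w_0) \right] \\ \frac{1}{d_3} \left[ \phi_3(w_0) - \phi_3(w) \right] \\ \frac{1}{d_4} \left[ \phi_4(w) - \phi_4(w_0) \right] \end{bmatrix}. \tag{45}$$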
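The step from (42) to (44) amounts to eliminating $\delta$. The short derivation below is a sketch of that reasoning; it additionally assumes that every $d_i$ is strictly positive (the text only states $d \ge 0$) so that the divisions in (45) are well defined.

```latex
% For a fixed w, each constraint in (42) is an upper bound on \delta
% (the fourth bound uses d_4, matching the scaling in (45)):
\begin{align*}
\delta &\le \tfrac{1}{d_1}\left[\phi_1(w) - \phi_1(w_0)\right] = -\varphi_1(w), &
\delta &\le \tfrac{1}{d_2}\left[\phi_2(w_0) - \phi_2(w)\right] = -\varphi_2(w), \\
\delta &\le \tfrac{1}{d_3}\left[\phi_3(w) - \phi_3(w_0)\right] = -\varphi_3(w), &
\delta &\le \tfrac{1}{d_4}\left[\phi_4(w_0) - \phi_4(w)\right] = -\varphi_4(w).
\end{align*}
% Minimizing -\delta pushes \delta up to the tightest of these bounds:
\[
\delta^\star(w) = \min_i \left\{ -\varphi_i(w) \right\} = -\max_i \varphi_i(w)
\quad\Longrightarrow\quad
-\delta^\star(w) + \lambda\, g_{\mathrm{det}}(w)
= \max\left[\varphi(w)\right] + \lambda\, g_{\mathrm{det}}(w).
\]
```

What remains is a problem in $w$ alone whose only constraint is $w \in \mathcal{W}$, which is the form required by the two properties listed above.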
To alleviate the difficulty caused by the non-smoothness of the max term, instead of directly solving Problem (44), we solve the relaxation of (44) via the $\ell_p$-norm smoothing approximation, i.e.,

$$\begin{array}{ll} \underset{w}{\text{minimize}} & g_p(w) = \| t\mathbf{1} + \varphi(w) \|_p + \lambda \cdot g_{\mathrm{det}}(w) \\ \text{subject to} & w \in \mathcal{W}, \end{array} \tag{46}$$

where $p$ is a positive integer, and $t$ is larger than any possible absolute value of the elements of $\varphi(w)$ such that

$$\lim_{p \to \infty} \| t\mathbf{1} + \varphi(w) \|_p - t = \max\left[ \varphi(w) \right]. \tag{47}$$

When the value of $p$ is large enough, the relaxed problem reduces to the original problem. As $g_p(w)$ is smooth, the gradient exists for any $w \in \mathcal{W}$, and we have

$$\frac{\partial \| t\mathbf{1} + \varphi(w) \|_p}{\partial w} = \left[ \left( \frac{t\mathbf{1} + \varphi(w)}{\| t\mathbf{1} + \varphi(w) \|_p} \right)^{p-1} \right]^T \begin{bmatrix} -\frac{1}{d_1} \nabla\phi_1(w)^T \\ \frac{1}{d_2} \nabla\phi_2(w)^T \\ -\frac{1}{d_3} \nabla\phi_3(w)^T \\ \frac{1}{d_4} \nabla\phi_4(w)^T \end{bmatrix}, \tag{48}$$

where the power $p - 1$ is applied elementwise. Hence, the relaxed problem is equivalent to finding the fixed point of the following function

$$G(w^k; \eta) \triangleq P_{\mathcal{W}}\left( w^k - \eta \nabla g_p(w^k) \right), \tag{49}$$

where $\eta$ is the step size and

$$\nabla g_p(w^k) = \left. \frac{\partial}{\partial w} \| t\mathbf{1} + \varphi(w) \|_p \right|_{w = w^k} + \lambda \left. \frac{\partial}{\partial w} g_{\mathrm{det}}(w) \right|_{w = w^k}. \tag{50}$$

By simply applying Algorithm 1 with this $G$, the MVSK-Tilting problem with a general deterioration measure can be solved by the RFPA algorithm.

VI. NUMERICAL SIMULATIONS

In this section, we conduct numerical experiments for evaluating our proposed high-order portfolio solving framework.⁴

⁴ We have released an R package highOrderPortfolios implementing our proposed algorithms at https://github.com/dppalomar/highOrderPortfolios.

Fig. 3. Errors of non-parametric and parametric approaches.

A. On Applying the ghMST Distribution

Portfolios based on the parametric representation of the high-order moments differ from those obtained from the traditional MVSK framework. In other words, given the same data and optimization problem, we can compute $\phi_i(w)$, $i = 1, 2, 3, 4$, using either the non-parametric sample moments $\Phi$ and $\Psi$ in (7) or the parametric $\Theta$ from the ghMST distribution in Lemma 3, resulting in different optimal portfolios.

Assume the data follows a ghMST distribution with the true parameter $\Theta_{\mathrm{true}}$. We generate the synthetic data set $\mathcal{D}$ based on $\Theta_{\mathrm{true}}$, then construct the high-order portfolios using either the non-parametric approach or the parametric skew-t approach. Here we consider an MVSK formulation with $\lambda = (1, 1, 1, 1)$ and with $w_{\mathrm{true}}$ as its optimal portfolio, i.e.,

$$w_{\mathrm{true}} = \arg\min_{w \in \mathcal{W}} f(w; \Theta_{\mathrm{true}}, \lambda). \tag{51}$$

Using the non-parametric approach, we first estimate $\Phi$ and $\Psi$ from $\mathcal{D}$, then obtain the optimal portfolio $w_{\mathrm{np}}$ as the solution to (6). With the parametric approach, we instead fit the ghMST distribution given $\mathcal{D}$, then solve for the optimal portfolio $w_{\mathrm{st}}$ based on the estimated parameters $\Theta$. Here, we denote the errors $\epsilon_{\mathrm{np}}$ and $\epsilon_{\mathrm{st}}$ as $\epsilon_{\mathrm{np}} = \| w_{\mathrm{np}} - w_{\mathrm{true}} \|^2$ and $\epsilon_{\mathrm{st}} = \| w_{\mathrm{st}} - w_{\mathrm{true}} \|^2$, respectively.

We repeatedly evaluate the errors from different data sets under different problem sizes. According to the result shown in Fig. 3, the parametric skew-t approach produces smaller errors than the non-parametric approach for every problem size tested.

B. On Solving MVSK Portfolio Using RFPA Algorithm

In this subsection, we conduct experiments to evaluate how applying the ghMST distribution accelerates the existing and proposed algorithms, and to assess the efficiency and scalability of our proposed RFPA algorithm. We mainly utilize real-world data for the experiments. The data is randomly selected from the S&P 500 stock index. The trading period is chosen from 2011-01-01 to 2020-12-31.

1) Comparing Non-Parametric and Parametric (ghMST) Approach: We first perform the comparison on the non-parametric and parametric modeling of the high-order moments. Given the data, we first estimate the parameter $\Theta$ for the ghMST distribution, then generate the sample moments, i.e.,


sample skewness matrix and kurtosis matrix, using Lemma 2. In this way, $\phi_i(w)$, $i = 1, 2, 3, 4$, will produce the same values under both non-parametric and parametric modeling.

Fig. 4. Convergence of algorithms for solving the MVSK portfolio optimization problems (6).

We list the benchmarks as the (first-order) MM algorithm [23], the projected gradient descent (PGD) method, the Q-MVSK (second-order SCA) algorithm [60], the nonlinear optimization solver 'Nlopt' [61], and our proposed RFPA algorithm. The inner solver for QP is selected as quadprog [62]. The weights $\lambda$ are determined according to the Constant Relative Risk Aversion utility function

$$\lambda^T = \left[ 1,\; \frac{\xi}{2},\; \frac{\xi(\xi + 1)}{6},\; \frac{\xi(\xi + 1)(\xi + 2)}{24} \right], \tag{52}$$

where $\xi \ge 0$ is a parameter to measure the risk aversion [63]. As suggested by [64], [65], [66], we set $\xi = 6$ in this experiment. We further choose $\eta = 5$, $\beta = 0.5$ and investigate the empirical convergence of all algorithms under two different dimensions $N = 100$ and $N = 400$. The gap is defined as the difference between the objective value at each iteration and the smallest objective value we obtained across all the methods. When $N = 400$, we cannot compare the performance of the non-parametric approaches due to the memory limit that renders them intractable.

We have the following observations according to the simulation results exhibited in Fig. 4. When $N = 100$, the time cost for Nlopt (Non-parametric) and Nlopt (skew-t) is 9.349 and 1.275 seconds, respectively. When $N = 400$, the number for Nlopt (skew-t) becomes 52.934 seconds. Note that all the non-parametric approaches, which model the high-order moments using sample moments, are not applicable in high dimension due to the memory limit. Besides, MM methods require computing $\eta$ that meets the condition $\frac{1}{\eta} \ge \sup_{w \in \mathcal{W}} \| \nabla f(w) \|_2$, which is computationally expensive to obtain in high-dimensional problems. From the numerical simulations we observe the following.
• By applying the parametric skew-t distribution, we can accelerate the MVSK portfolio design by one-to-two orders of magnitude given any optimization algorithm when $N = 100$.
• The per-iteration cost of the proposed RFPA and PGD algorithms is significantly smaller than that of other methods with the help of water-filling algorithms.
• The effect of using the parametric skew-t distribution tends to be algorithm-dependent. The acceleration is more noticeable for first-order algorithms like RFPA, which has negligible per-iteration cost.

Fig. 5. Comparison of algorithms with respect to the computational time under different data dimension.

2) Comparison on Efficiency: To better compare the efficiency of the proposed algorithms, we also conduct experiments using real-world data sets with different problem dimensions. For each problem size, we set $\eta = 5$, $\beta = 0.5$, and take 200 independent experiments with $\xi$ randomly drawn from the interval $[10^{-1}, 10]$. All the methods are initialized with the same starting point $w^0$. For Nlopt, the stopping criteria are set as the default. For Q-MVSK, PGD, and RFPA, the algorithms are regarded as converged when both of the following conditions are satisfied:

$$\left| w^{k+1} - w^k \right| \le 10^{-6} \left( \left| w^{k+1} \right| + \left| w^k \right| \right), \tag{53}$$

$$\left| f(w^{k+1}) - f(w^k) \right| \le 10^{-6} \left( \left| f(w^{k+1}) \right| + \left| f(w^k) \right| \right). \tag{54}$$

According to the numerical simulation results shown in Fig. 5, our proposed RFPA algorithm outperforms the state-of-the-art methods by one-to-two orders of magnitude when we assume the data follows a ghMST distribution. The difference seems to be enlarged when the problem dimension increases. Besides, the


RFPA algorithm appears to be more stable compared to the PGD method.

3) Comparison on Scalability: Interestingly, as implied by Fig. 5, first-order methods, including RFPA and PGD, appear to be more scalable than the second-order Q-MVSK algorithm. To better investigate this phenomenon, we compare these algorithms using a synthetic data set, where the parameter $\Theta$ is randomly generated.

Fig. 6. Investigation on the empirical complexity of RFPA and Q-MVSK algorithms.

As shown in Fig. 6, the proposed RFPA algorithm has a significantly lower complexity compared to the Q-MVSK algorithm, as none of its iterations contains procedures with high complexity. Meanwhile, the PGD method also enjoys the benefits of low complexity but its overall efficiency is worse than that of the RFPA method. We also fit the empirical orders of the four methods considered. The results are shown in Table IV. It turns out that the empirical computational complexity of our method is $O(N^2)$ and the complexity of the second-order method Q-MVSK is around $O(N^3)$. The results of numerical simulations coincide with the discussion in Section IV-D.

TABLE IV
EMPIRICAL ORDERS OF COMPLEXITY

Algorithm    Q-MVSK    Nlopt    PGD      RFPA
Order        2.864     3.827    1.976    1.944

C. Empirical Convergence of the Proposed RFPA Algorithm

According to Theorem 6, when $w^k = w^{k+1}$, the algorithm terminates at a stationary point of Problem (6). Though exact equality is often unattainable, empirically, the relative difference of $w$, denoted as

$$\text{Relative Error}(w^k) \triangleq \frac{\| w^k - w^{k-1} \|}{\| w^k \|}, \tag{55}$$

would tend to zero. To show this, we conduct experiments using real-world data sets with different problem dimensions. The values of (55) are computed at each iteration. From Fig. 7 we observe that the differences all reduce to very small numbers. Empirical studies show that the residual value $R(w^k; \eta)$ would tend to zero after 20 iterations and the final solution would converge to the stationary point of Problem (6).

Fig. 7. Median relative difference at each iteration.

VII. CONCLUSION

In this paper, we have proposed a high-order portfolio design framework with the help of the parametric skew-t distribution and a robust fixed point acceleration. The parametric approach is practical for modeling the skewness and kurtosis of portfolio returns in high-dimensional settings. By assuming the returns follow a ghMST distribution, we can alleviate the difficulties caused by the high complexity of traditional methods and accelerate all existing algorithms to a certain extent. Additionally, the proposed RFPA algorithm substantially cuts down the number of iterations for first-order methods. Numerical simulations have demonstrated the efficiency and scalability of our proposed framework over the state-of-the-art benchmarks.

APPENDIX

A. Computational Time of Different Estimation Methods

Fig. 8 depicts the computational time of different estimation methods. It can be observed that fitting the ghMST distribution is much more efficient than the others.

B. Proof for Lemma 2

The proof starts with the fact that the moments of a Gaussian variable $\tilde{X} \sim \mathcal{N}(\tilde{\mu}, \tilde{\Sigma})$ are given by

$$\begin{aligned} E[\tilde{X}_i] &= \tilde{\mu}_i, \\ E[\tilde{X}_i \tilde{X}_j] &= \tilde{\mu}_i \tilde{\mu}_j + \tilde{\Sigma}_{ij}, \\ E[\tilde{X}_i \tilde{X}_j \tilde{X}_k] &= \tilde{\mu}_i \tilde{\mu}_j \tilde{\mu}_k + \tilde{\mu}_i \tilde{\Sigma}_{jk} + \tilde{\mu}_j \tilde{\Sigma}_{ik} + \tilde{\mu}_k \tilde{\Sigma}_{ij}, \\ E[\tilde{X}_i \tilde{X}_j \tilde{X}_k \tilde{X}_l] &= \tilde{\mu}_i \tilde{\mu}_j \tilde{\mu}_k \tilde{\mu}_l + \underbrace{(\tilde{\Sigma}_{ij} \tilde{\mu}_k \tilde{\mu}_l + \cdots + \tilde{\Sigma}_{kl} \tilde{\mu}_i \tilde{\mu}_j)}_{6 \text{ items}} \\ &\quad + (\tilde{\Sigma}_{ij} \tilde{\Sigma}_{kl} + \tilde{\Sigma}_{ik} \tilde{\Sigma}_{jl} + \tilde{\Sigma}_{il} \tilde{\Sigma}_{jk}). \end{aligned} \tag{56}$$

Then, given the first term of the hierarchical structure $r | \tau \overset{\text{i.i.d.}}{\sim} \mathcal{N}\left( \mu + \frac{1}{\tau}\gamma, \frac{1}{\tau}\Sigma \right)$, we have

$$E[r_i | \tau] = \mu_i + \frac{1}{\tau} \gamma_i, \tag{57}$$


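$$E[r_i] = \mu_i + E\left[ \frac{1}{\tau} \right] \gamma_i = \mu_i + \frac{\nu}{\nu - 2} \gamma_i. \tag{58}$$

Meanwhile, the hierarchical structure can be further written as

$$\begin{aligned} r | \tau - E[r] &\overset{\text{i.i.d.}}{\sim} \mathcal{N}\left( \mu + \frac{1}{\tau}\gamma - E[r],\; \frac{1}{\tau}\Sigma \right), \\ \tau &\overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}\left( \frac{\nu}{2}, \frac{\nu}{2} \right), \end{aligned} \tag{59}$$

where $\mu + \frac{1}{\tau}\gamma - E[r] = \left( \frac{1}{\tau} - \frac{\nu}{\nu - 2} \right)\gamma$. Therefore, we can compute the central moments of $\tilde{r} | \tau = r | \tau - E[r]$ by regarding $\tilde{\mu} = \left( \frac{1}{\tau} - \frac{\nu}{\nu - 2} \right)\gamma$ and $\tilde{\Sigma} = \frac{1}{\tau}\Sigma$:

$$\begin{aligned} E[\tilde{r}_i \tilde{r}_j | \tau] &= \left( \frac{1}{\tau} - \frac{\nu}{\nu - 2} \right)^2 \gamma_i \gamma_j + \frac{1}{\tau} \Sigma_{ij}, \\ E[\tilde{r}_i \tilde{r}_j \tilde{r}_k | \tau] &= \left( \frac{1}{\tau} - \frac{\nu}{\nu - 2} \right)^3 \gamma_i \gamma_j \gamma_k + \frac{1}{\tau} \left( \frac{1}{\tau} - \frac{\nu}{\nu - 2} \right) \left[ \gamma_i \Sigma_{jk} + \gamma_j \Sigma_{ik} + \gamma_k \Sigma_{ij} \right], \\ E[\tilde{r}_i \tilde{r}_j \tilde{r}_k \tilde{r}_l | \tau] &= \left( \frac{1}{\tau} - \frac{\nu}{\nu - 2} \right)^4 \gamma_i \gamma_j \gamma_k \gamma_l + \left( \frac{1}{\tau} - \frac{\nu}{\nu - 2} \right)^2 \frac{1}{\tau} \underbrace{(\Sigma_{ij} \gamma_k \gamma_l + \cdots + \Sigma_{kl} \gamma_i \gamma_j)}_{6 \text{ items}} \\ &\quad + \frac{1}{\tau^2} (\Sigma_{ij} \Sigma_{kl} + \Sigma_{ik} \Sigma_{jl} + \Sigma_{il} \Sigma_{jk}). \end{aligned} \tag{60}$$

By taking the expectation with respect to $\tau$, using $E[\tau^{-1}] = \frac{\nu}{\nu - 2}$, $E[\tau^{-2}] = \frac{\nu^2}{(\nu - 2)(\nu - 4)}$, $E[\tau^{-3}] = \frac{\nu^3}{(\nu - 2)(\nu - 4)(\nu - 6)}$, and $E[\tau^{-4}] = \frac{\nu^4}{(\nu - 2)(\nu - 4)(\nu - 6)(\nu - 8)}$, Lemma 2 is obtained.

Fig. 8. Comparing computational time (seconds) of different estimation methods.

C. Proof for Lemma 3

Assume $r \sim \mathrm{ghMST}(\mu, \Sigma, \gamma, \nu)$, which indicates that the portfolio return $w^T r$ satisfies the following hierarchical structure:

$$\begin{aligned} w^T r | \tau &\overset{\text{i.i.d.}}{\sim} \mathcal{N}\left( w^T \mu + \frac{1}{\tau} w^T \gamma,\; \frac{1}{\tau} w^T \Sigma w \right), \\ \tau &\overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}\left( \frac{\nu}{2}, \frac{\nu}{2} \right). \end{aligned} \tag{61}$$

Then, according to (61), we have

$$w^T r \sim \mathrm{ghMST}\left( w^T \mu,\; w^T \Sigma w,\; w^T \gamma,\; \nu \right). \tag{62}$$

As $w^T r$ is a scalar, its high-order central moments, i.e., $\Phi$ and $\Psi$, are all scalars. Based on Lemma 2, we replace $\mu$, $\Sigma$, and $\gamma$ with $w^T \mu$, $w^T \Sigma w$, and $w^T \gamma$, respectively. Then we can obtain

$$\begin{aligned} \phi_3(w) = \Phi &= E\left[ \left( \frac{1}{\tau} - \frac{\nu}{\nu - 2} \right)^3 \right] (w^T \gamma)^3 + 3 E\left[ \frac{1}{\tau} \left( \frac{1}{\tau} - \frac{\nu}{\nu - 2} \right) \right] (w^T \Sigma w) \cdot w^T \gamma, \\ \phi_4(w) = \Psi &= E\left[ \left( \frac{1}{\tau} - \frac{\nu}{\nu - 2} \right)^4 \right] (w^T \gamma)^4 + 6 E\left[ \frac{1}{\tau} \left( \frac{1}{\tau} - \frac{\nu}{\nu - 2} \right)^2 \right] (w^T \gamma)^2 (w^T \Sigma w) \\ &\quad + \underbrace{3 E\left[ \frac{1}{\tau^2} \right]}_{a_{43}} (w^T \Sigma w)^2. \end{aligned} \tag{63}$$

Following the definition of $a$, Lemma 3 is proved.

D. Gradient and Hessian of the High-Order Moments

Based on Lemma 3, the gradient and Hessian of the skewness and kurtosis with respect to $w$ can be computed as

$$\begin{aligned} \nabla \phi_3(w) &= 3 a_{31} (w^T \gamma)^2 \gamma + a_{32} \left[ (w^T \Sigma w) \gamma + 2 (w^T \gamma) \Sigma w \right], \\ \nabla^2 \phi_3(w) &= 6 a_{31} (w^T \gamma) \gamma \gamma^T + 2 a_{32} \left( \gamma w^T \Sigma + \Sigma w \gamma^T + w^T \gamma \Sigma \right), \\ \nabla \phi_4(w) &= 4 a_{41} (w^T \gamma)^3 \gamma + 2 a_{42} \left[ (w^T \gamma)^2 \Sigma w + (w^T \Sigma w)(w^T \gamma) \gamma \right] + 4 a_{43} (w^T \Sigma w) \Sigma w, \\ \nabla^2 \phi_4(w) &= 12 a_{41} (w^T \gamma)^2 \gamma \gamma^T + 2 a_{42} \left[ 2 (w^T \gamma) \Sigma w \gamma^T + (w^T \gamma)^2 \Sigma + 2 (w^T \gamma) \gamma w^T \Sigma + (w^T \Sigma w) \gamma \gamma^T \right] \\ &\quad + 4 a_{43} \left[ 2 \Sigma w w^T \Sigma + (w^T \Sigma w) \Sigma \right]. \end{aligned} \tag{64}$$

E. Water-Filling Algorithm

Here we consider an optimization problem

$$\begin{array}{ll} \underset{w}{\text{minimize}} & \frac{1}{2} \left\| w - \left( w^k - \eta \nabla f(w^k) \right) \right\|_2^2 \\ \text{subject to} & w \in \mathcal{W}. \end{array} \tag{65}$$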



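Given $\mathcal{W} = \{ w \mid \mathbf{1}^T w = 1,\; w \ge 0 \}$, the Lagrangian of Problem (65) is

$$L(w, \psi, \gamma) = \frac{1}{2} \left\| w - \left( w^k - \eta \nabla f(w^k) \right) \right\|_2^2 - \psi^T w + \gamma \left( \mathbf{1}^T w - 1 \right), \tag{66}$$

where $\psi$ and $\gamma$ are dual variables associated with the constraints $w \ge 0$ and $\mathbf{1}^T w = 1$, respectively. The KKT conditions are

$$\begin{aligned} \eta \nabla f(w^k) + w - w^k - \psi + \gamma \mathbf{1} &= 0, \\ \psi \odot w &= 0. \end{aligned} \tag{67}$$

Hence, we have

$$w_i = \max\left\{ 0,\; w_i^k - \eta \left[ \nabla f(w^k) \right]_i - \gamma \right\}. \tag{68}$$

Define a continuous and monotone decreasing function $\zeta : \mathbb{R} \to \mathbb{R}$:

$$\zeta(\gamma) = \sum_{i=1}^{N} \max\left\{ 0,\; w_i^k - \eta \left[ \nabla f(w^k) \right]_i - \gamma \right\} - 1 \tag{69}$$

with $\zeta(-\infty) = +\infty$ and $\zeta(+\infty) = -1$, the root

$$\gamma^\star = \arg\left( \zeta(\gamma) = 0 \right) \tag{70}$$

exists and is unique. The root provides a dual optimal solution of the KKT system. We can easily solve for $\gamma$ and $w$ via bisection.

F. Monotonicity of the Sequence $\{ f(w^k) \}$

According to the projection theorem [67], i.e.,

$$\forall x, z: \; \langle z - x,\; P_{\mathcal{W}}(z) - P_{\mathcal{W}}(x) \rangle \ge \| P_{\mathcal{W}}(z) - P_{\mathcal{W}}(x) \|_2^2, \tag{71}$$

we apply $z = w^k - \eta \nabla f(w^k)$ and $x = w^k$ to obtain

$$\langle -\eta \nabla f(w^k),\; w^{k+1} - w^k \rangle \ge \| w^{k+1} - w^k \|_2^2, \tag{72}$$

or equivalently

$$\langle \nabla f(w^k),\; w^{k+1} - w^k \rangle \le -\frac{1}{\eta} \| w^{k+1} - w^k \|_2^2. \tag{73}$$

Hence, from the inequality (41) we have

$$\begin{aligned} f(w^{k+1}) &\le f(w^k) + \nabla f(w^k)^T (w^{k+1} - w^k) + \frac{1}{2\eta} \| w^k - w^{k+1} \|_2^2 \\ &\le f(w^k) - \frac{1}{\eta} \| w^{k+1} - w^k \|_2^2 + \frac{1}{2\eta} \| w^k - w^{k+1} \|_2^2 \\ &= f(w^k) - \frac{1}{2\eta} \| w^k - w^{k+1} \|_2^2 \le f(w^k), \end{aligned} \tag{74}$$

which indicates that the sequence $\{ f(w^k) \}$ is monotone.

G. Proof of Theorem 6

Proof: When $w^k = w^{k+1}$, we may have $w^{k+1} = P_{\mathcal{W}}\left( w^k - 2\alpha_k R(w^k; \eta) + \alpha_k^2 V(w^k; \eta) \right)$ or $w^{k+1} = G(w^k; \eta')$.

(i) We first analyze the first case where $w^{k+1} = P_{\mathcal{W}}(y^k)$, in which

$$y^k \triangleq w^k - 2\alpha_k R(w^k; \eta) + \alpha_k^2 V(w^k; \eta). \tag{75}$$

By applying the contraposition, we prove the following statement instead:

$$\forall w^k \in \mathcal{W}: \; R(w^k; \eta) \ne 0 \;\Rightarrow\; P_{\mathcal{W}}(y^k) \ne w^k. \tag{76}$$

For simplicity, we denote $\alpha = -\alpha_k > 0$. Note that $\alpha \ne 0$ as $R(w^k; \eta) \ne 0$.

(A) If $\alpha \in (0, 1]$, then we obtain

$$\begin{aligned} y^k &= (1 - 2\alpha + \alpha^2) w^k + (2\alpha - 2\alpha^2) G(w^k; \eta) + \alpha^2 G\left( G(w^k; \eta); \eta \right) \\ &\triangleq a_k w^k + b_k G(w^k; \eta) + c_k G\left( G(w^k; \eta); \eta \right) \end{aligned} \tag{77}$$

in which $a_k = 1 - 2\alpha + \alpha^2$, $b_k = 2\alpha - 2\alpha^2$, and $c_k = \alpha^2$. As $0 < \alpha \le 1$, we have $0 \le a_k < 1$, $0 \le b_k \le \frac{1}{2}$, $0 < c_k \le 1$, and $a_k + b_k + c_k = 1$. Hence, $y^k$ is a convex combination of $w^k$, $G(w^k; \eta)$, and $G(G(w^k; \eta); \eta)$. As a result, $y^k \in \mathcal{W}$ and the projection of $y^k$ onto $\mathcal{W}$ is itself, i.e., $P_{\mathcal{W}}(y^k) = y^k$. Consequently, we obtain

$$w^{k+1} = P_{\mathcal{W}}(y^k) \ne w^k. \tag{78}$$

(B) If $\alpha \in (1, \infty)$, we will first show that the following inequality holds for any $w^k$:

$$\xi \triangleq \langle R(w^k; \eta),\; R(w^k; \eta) + \alpha V(w^k; \eta) \rangle \ge 0. \tag{79}$$

In principle, we consider the following three cases based on the value of $\langle R(w^k; \eta), V(w^k; \eta) \rangle$.

(B.1) If $\langle R(w^k; \eta), V(w^k; \eta) \rangle \ge 0$, then $b(w^k) = -\infty$. (79) holds as $\forall \alpha > 1$:

$$\xi = \| R(w^k; \eta) \|^2 + \alpha \langle R(w^k; \eta), V(w^k; \eta) \rangle \ge 0. \tag{80}$$

(B.2) If $\langle R(w^k; \eta), V(w^k; \eta) \rangle < 0$ and $b(w^k) = \frac{\| R(w^k; \eta) \|_2^2}{\langle R(w^k; \eta), V(w^k; \eta) \rangle} \ne -\infty$, we have $\alpha \le -b(w^k)$. In this case, (79) holds as $\forall \alpha \in (1, -b(w^k)]$:

$$\begin{aligned} \xi &= \| R(w^k; \eta) \|^2 + \alpha \langle R(w^k; \eta), V(w^k; \eta) \rangle \\ &\ge \| R(w^k; \eta) \|^2 - b(w^k) \langle R(w^k; \eta), V(w^k; \eta) \rangle = 0. \end{aligned} \tag{81}$$

(B.3) If $\langle R(w^k; \eta), V(w^k; \eta) \rangle < 0$ but $b(w^k) \to -\infty$ due to $\| V(w^k; \eta) \| \to 0$, the value of $\alpha$ can be either $\| R(w^k; \eta) \| / \| V(w^k; \eta) \| \to \infty$ or $-b(w^k) \to \infty$. When $\alpha = \| R(w^k; \eta) \| / \| V(w^k; \eta) \|$, we suppose

$$\langle R(w^k; \eta), V(w^k; \eta) \rangle = \| R(w^k; \eta) \| \, \| V(w^k; \eta) \| \cos\theta_{R,V}, \tag{82}$$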


   
in which $\theta_{R,V}$ is the angle between $R(w^k; \eta)$ and $V(w^k; \eta)$. Hence, as $\cos\theta_{R,V} \in [-1, 1]$, we obtain

$$\begin{aligned} \xi &= \| R(w^k; \eta) \|^2 + \frac{\| R(w^k; \eta) \|}{\| V(w^k; \eta) \|} \langle R(w^k; \eta), V(w^k; \eta) \rangle \\ &= \| R(w^k; \eta) \|^2 \left( 1 + \cos\theta_{R,V} \right) \ge 0. \end{aligned} \tag{83}$$

When $\alpha = -b(w^k) = -\frac{\| R(w^k; \eta) \|_2^2}{\langle R(w^k; \eta), V(w^k; \eta) \rangle}$, it is obvious that

$$\xi = \| R(w^k; \eta) \|^2 - b(w^k) \langle R(w^k; \eta), V(w^k; \eta) \rangle = 0. \tag{84}$$

Therefore, (79) holds. As a consequence, we can compare the following two terms

$$\begin{aligned} y^k - w^k &= \alpha R(w^k; \eta) + \alpha \left( R(w^k; \eta) + \alpha V(w^k; \eta) \right), \\ y^k - G(w^k; \eta) &= (\alpha - 1) R(w^k; \eta) + \alpha \left( R(w^k; \eta) + \alpha V(w^k; \eta) \right), \end{aligned} \tag{85}$$

by evaluating the difference of their squared $\ell_2$ norms, i.e.,

$$\begin{aligned} \| y^k - w^k \|^2 - \| y^k - G(w^k; \eta) \|^2 &= (2\alpha - 1) \| R(w^k; \eta) \|^2 \\ &\quad + 2\alpha \langle R(w^k; \eta),\; R(w^k; \eta) + \alpha V(w^k; \eta) \rangle. \end{aligned} \tag{86}$$

Then, we obtain the following strict inequality

$$\| y^k - w^k \|^2 - \| y^k - G(w^k; \eta) \|^2 > 0 \tag{87}$$

as $\alpha > 1$ and $\| R(w^k; \eta) \| > 0$. Therefore, $P_{\mathcal{W}}(y^k) \ne w^k$ as there exists a feasible point $G(w^k; \eta) \in \mathcal{W}$ that is closer to $y^k$ compared to $w^k$.

Hence, we have shown that $P_{\mathcal{W}}(y^k) \ne w^k$ if $R(w^k; \eta) \ne 0$. As a result, we have obtained the following statement

$$P_{\mathcal{W}}(y^k) = w^k \;\Rightarrow\; R(w^k; \eta) = 0. \tag{88}$$

Then, $w^k$ is a stationary point of Problem (6) according to Lemma 5.

(ii) We then analyze the second case where

$$w^{k+1} = P_{\mathcal{W}}\left( w^k - \eta' \nabla f(w^k) \right). \tag{89}$$

Then, $w^k$ is a stationary point of Problem (6) when $w^{k+1} = w^k$, with the proof following directly from [68, Theorem 9.10].

In conclusion, once we obtain $w^{k+1} = w^k$ from the proposed RFPA algorithm, $w^k$ is a stationary point of Problem (6).

REFERENCES

[1] H. M. Markowitz, “Portfolio selection,” J. Finance, vol. 7, no. 1, pp. 77–91, 1952.
[2] H. M. Markowitz, “Foundations of portfolio theory,” J. Finance, vol. 46, no. 2, pp. 469–477, 1991.
[3] C. Adcock, M. Eling, and N. Loperfido, “Skewed distributions in finance and actuarial science: A review,” Eur. J. Finance, vol. 21, no. 13–14, pp. 1253–1281, 2015.
[4] S. I. Resnick, Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. New York, NY, USA: Springer Science & Business Media, 2007.
[5] P. N. Kolm, R. Tütüncü, and F. J. Fabozzi, “60 years of portfolio optimization: Practical challenges and current trends,” Eur. J. Oper. Res., vol. 234, no. 2, pp. 356–371, 2014.
[6] E. Jondeau and M. Rockinger, “Conditional volatility, skewness, and kurtosis: Existence, persistence, and comovements,” J. Econ. Dyn. Control, vol. 27, no. 10, pp. 1699–1737, 2003.
[7] B. O. Bradley and M. S. Taqqu, “Financial risk and heavy tails,” in Handbook of Heavy Tailed Distributions in Finance. Amsterdam, The Netherlands: Elsevier, 2003, pp. 35–103.
[8] J. V. Rosenberg and T. Schuermann, “A general approach to integrated risk management with skewed, fat-tailed risks,” J. Financial Econ., vol. 79, no. 3, pp. 569–614, 2006.
[9] K. Gaurav and P. Mohanty, “Effect of skewness on optimum portfolio selection,” IUP J. Appl. Finance, vol. 19, no. 3, p. 56, 2013.
[10] D. Maringer and P. Parpas, “Global optimization of higher order moments in portfolio selection,” J. Global Optim., vol. 43, no. 2, pp. 219–230, 2009.
[11] R. C. Scott and P. A. Horvath, “On the direction of preference for moments of higher order than the variance,” J. Finance, vol. 35, no. 4, pp. 915–919, 1980.
[12] L. T. DeCarlo, “On the meaning and use of kurtosis,” Psychol. Methods, vol. 2, no. 3, p. 292, 1997.
[13] E. Jondeau and M. Rockinger, “Optimal portfolio allocation under higher moments,” Eur. Financial Manage., vol. 12, no. 1, pp. 29–55, 2006.
[14] C. R. Harvey, J. C. Liechty, M. W. Liechty, and P. Müller, “Portfolio selection with higher moments,” Quant. Finance, vol. 10, no. 5, pp. 469–485, 2010.
[15] J. He, Q.-G. Wang, P. Cheng, J. Chen, and Y. Sun, “Multi-period mean-variance portfolio optimization with high-order coupled asset dynamics,” IEEE Trans. Autom. Control, vol. 60, no. 5, pp. 1320–1335, May 2015.
[16] L. Martellini and V. Ziemann, “Improved estimates of higher-order comoments and implications for portfolio selection,” Rev. Financial Stud., vol. 23, no. 4, pp. 1467–1502, 2010.
[17] E. Jondeau, “Asymmetry in tail dependence in equity portfolios,” Comput. Statist. Data Anal., vol. 100, pp. 351–368, 2016.
[18] S. Liu, P.-Y. Chen, B. Kailkhura, G. Zhang, A. O. Hero III, and P. K. Varshney, “A primer on zeroth-order optimization in signal processing and machine learning: Principals, recent advances, and applications,” IEEE Signal Process. Mag., vol. 37, no. 5, pp. 43–54, Sep. 2020.
[19] B. Babu and M. M. L. Jehan, “Differential evolution for multi-objective optimization,” in Proc. Congr. Evol. Comput. (CEC), vol. 4. Piscataway, NJ, USA: IEEE, 2003, pp. 2696–2703.
[20] S. Kshatriya and P. K. Prasanna, “Genetic algorithm-based portfolio optimization with higher moments in global stock markets,” J. Risk, vol. 20, no. 4, pp. 1–26, 2018.
[21] T. P. Dinh and Y.-S. Niu, “An efficient DC programming approach for portfolio decision with higher moments,” Comput. Optim. Appl., vol. 50, no. 3, pp. 525–554, 2011.
[22] Y.-S. Niu and Y.-J. Wang, “Higher-order moment portfolio optimization via the difference-of-convex programming and sums-of-squares,” 2019, arXiv:1906.01509.
[23] R. Zhou and D. P. Palomar, “Solving high-order portfolios via successive convex approximation algorithms,” IEEE Trans. Signal Process., vol. 69, pp. 892–904, 2021.
[24] K. Aas and I. H. Haff, “The generalized hyperbolic skew Student’s t-distribution,” J. Financial Econometrics, vol. 4, no. 2, pp. 275–309, 2006.
[25] Y. Wei, Y. Tang, and P. D. McNicholas, “Mixtures of generalized hyperbolic distributions and mixtures of skew t-distributions for model-based clustering with incomplete data,” Comput. Statist. Data Anal., vol. 130, pp. 18–41, 2019.
[26] O. Barndorff-Nielsen, “Exponentially decreasing distributions for the logarithm of particle size,” Proc. R. Soc. London A. Math. Phys. Sci., vol. 353, no. 1674, pp. 401–419, 1977.
[27] M. Hellmich and S. Kassberger, “Efficient and robust portfolio optimization in the multivariate generalized hyperbolic framework,” Quant. Finance, vol. 11, no. 10, pp. 1503–1516, 2011.
[28] W. Hu and A. Kercheval, “Risk management with generalized hyperbolic distributions,” in Proc. 4th IASTED Int. Conf. Financial Eng. Appl. Berkeley, CA, USA: ACTA Press, 2007, pp. 19–24.
[29] J. R. Birge and L. Chavez-Bedoya, “Portfolio optimization under a generalized hyperbolic skewed t-distribution and exponential utility,” Quant. Finance, vol. 16, no. 7, pp. 1019–1036, 2016.
[30] M. Haas and C. Pigorsch, “Financial economics, fat-tailed distributions,” in Encyclopedia of Complexity and Systems Science, New York, NY, USA: Springer, vol. 4, no. 1, pp. 3404–3435, 2009.


[31] O. E. Barndorff-Nielsen, T. Mikosch, and S. I. Resnick, Lévy Processes: Theory and Applications. Boston, MA, USA: Springer Science & Business Media, 2012.
[32] A. Gupta, “Multivariate skew t-distribution,” Statist.: J. Theor. Appl. Statist., vol. 37, no. 4, pp. 359–363, 2003.
[33] S. Pyne et al., “Automated high-dimensional flow cytometric data analysis,” Proc. Nat. Acad. Sci., vol. 106, no. 21, pp. 8519–8524, 2009.
[34] M. D. Branco and D. K. Dey, “A general class of multivariate skew-elliptical distributions,” J. Multivariate Anal., vol. 79, no. 1, pp. 99–113, 2001.
[35] A. Azzalini and A. Capitanio, “Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution,” J. R. Statist. Soc.: Ser. B (Statist. Methodol.), vol. 65, no. 2, pp. 367–389, 2003.
[36] S. X. Lee and G. J. McLachlan, “On mixtures of skew normal and skew t-distributions,” Adv. Data Anal. Classification, vol. 7, no. 3, pp. 241–266, 2013.
[37] S. K. Sahu, D. K. Dey, and M. D. Branco, “A new class of multivariate skew distributions with applications to Bayesian regression models,” Can. J. Statist., vol. 31, no. 2, pp. 129–150, 2003.
[38] S. Lee and G. J. McLachlan, “Finite mixtures of multivariate skew t-distributions: Some recent and new results,” Statist. Comput., vol. 24, no. 2, pp. 181–202, 2014.
[39] K. Wang, S.-K. Ng, and G. J. McLachlan, “Multivariate skew t mixture models: Applications to fluorescence-activated cell sorting data,” in Proc. Digit. Image Comput.: Techn. Appl. Piscataway, NJ, USA: IEEE, 2009, pp. 526–531.
[40] K. Wang, A. Ng, G. McLachlan, and M. S. Lee, “Package ’EMMIXskew’,” 2018. [Online]. Available: http://cran.nexr.com/web/packages/EMMIXskew/
[41] W. Breymann and D. Lüthi, “ghyp: A package on generalized hyperbolic distributions,” Manual for R Package ghyp, 2013.
[42] A. J. McNeil, R. Frey, and P. Embrechts, Quantitative Risk Management: Concepts, Techniques and Tools, Rev. ed. Princeton, NJ, USA: Princeton University Press, 2015.
[43] L. Condat, “Fast projection onto the simplex and the ℓ1 ball,” Math. Program., vol. 158, no. 1, pp. 575–585, 2016.
[44] D. P. Palomar and J. R. Fonollosa, “Practical algorithms for a family of waterfilling solutions,” IEEE Trans. Signal Process., vol. 53, no. 2, pp. 686–695, Feb. 2005.
[45] K. L. Judd, Numerical Methods in Economics. Cambridge, MA, USA: MIT Press, 1998.
[46] L. Qi and J. Sun, “A nonsmooth version of Newton’s method,” Math. Program., vol. 58, no. 1, pp. 353–367, 1993.
[47] F. H. Clarke, Optimization and Nonsmooth Analysis. Philadelphia, PA, USA: SIAM, 1990.
[48] R. Varadhan and C. Roland, “Squared extrapolation methods (SQUAREM): A new class of simple and efficient numerical schemes for accelerating the convergence of the EM algorithm,” Dept. of Biostatistics Working Papers, Johns Hopkins University, Working Paper 63, 2004.
[49] M. Raydan and B. F. Svaiter, “Relaxed steepest descent and Cauchy-Barzilai-Borwein method,” Comput. Optim. Appl., vol. 21, no. 2, pp. 155–167, 2002.
[50] R. Varadhan and C. Roland, “Simple and globally convergent methods for accelerating the convergence of any EM algorithm,” Scand. J. Statist., vol. 35, no. 2, pp. 335–353, 2008.
[51] M. W. Brandt, A. Goyal, P. Santa-Clara, and J. R. Stroud, “A simulation approach to dynamic portfolio choice with an application to learning about return predictability,” Rev. Financial Stud., vol. 18, no. 3, pp. 831–873, 2005.
[52] F. Cong and C. W. Oosterlee, “Multi-period mean–variance portfolio optimization based on Monte-Carlo simulation,” J. Econ. Dyn. Control, vol. 64, pp. 23–38, 2016.
[53] J. Skaf and S. Boyd, “Multi-period portfolio optimization with constraints and transaction costs,” unpublished, 2009.
[54] P. J. Mercurio, Y. Wu, and H. Xie, “An entropy-based approach to portfolio optimization,” Entropy, vol. 22, no. 3, p. 332, 2020.
[55] Y.-l. Kang, J.-S. Tian, C. Chen, G.-Y. Zhao, Y.-f. Li, and Y. Wei, “Entropy based robust portfolio,” Physica A: Stat. Mech. Appl., vol. 583, 2021, Art. no. 126260.
[56] C. Chen and Y.-S. Zhou, “Robust multiobjective portfolio with higher moments,” Expert Syst. Appl., vol. 100, pp. 165–181, 2018.
[57] A. J. Prakash, C.-H. Chang, and T. E. Pactwa, “Selecting a portfolio with skewness: Recent evidence from US, European, and Latin American equity markets,” J. Banking Finance, vol. 27, no. 7, pp. 1375–1390, 2003.
[58] E. Jurczenko and B. Maillet, Multi-Moment Asset Allocation and Pricing Models. Hoboken, NJ, USA: Wiley, 2006.
[59] K. Boudt, D. Cornilly, F. Van Holle, and J. Willems, “Algorithmic portfolio tilting to harvest higher moment gains,” Heliyon, vol. 6, no. 3, 2020, Art. no. e03516.
[60] R. Zhou and D. P. Palomar, “Solving high-order portfolios via successive convex approximation algorithms,” IEEE Trans. Signal Process., vol. 69, pp. 892–904, 2021.
[61] S. G. Johnson, “The NLopt nonlinear-optimization package,” 2014. [Online]. Available: http://ab-initio.mit.edu/nlopt
[62] B. A. Turlach and A. Weingessel, “Quadprog: Functions to solve quadratic programming problems,” R Package Version 1.7, 2020. [Online]. Available: https://CRAN.R-project.org/package=quadprog
[63] K. Boudt, W. Lu, and B. Peeters, “Higher order comoments of multifactor models and asset allocation,” Finance Res. Lett., vol. 13, pp. 225–233, 2015.
[64] A. Elminejad, T. Havranek, and Z. Irsova, “Relative risk aversion: A meta-analysis,” 2022.
[65] R. B. Barsky, F. T. Juster, M. S. Kimball, and M. D. Shapiro, “Preference parameters and behavioral heterogeneity: An experimental approach in the health and retirement study,” Quart. J. Econ., vol. 112, no. 2, pp. 537–579, 1997.
[66] G. G. Pennacchi, Theory of Asset Pricing. Boston, MA, USA: Pearson/Addison-Wesley, 2008.
[67] F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, 2003.
[68] A. Beck, Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications With MATLAB. Philadelphia, PA, USA: SIAM, 2014.

Xiwen Wang received the B.Sc. degree in electronic information of science and technology from Nanjing University, Nanjing, China in June 2019. He is currently pursuing the Ph.D. degree with the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology. His research interests include convex optimization and fast algorithms with applications in financial engineering, machine learning, and operations research.

Rui Zhou (Member, IEEE) received the B.Eng. degree in information engineering from Southeast University, Nanjing, China, in 2017, and the Ph.D. degree from the Hong Kong University of Science and Technology, Hong Kong, in 2021. He is currently a Research Scientist with the Shenzhen Research Institute of Big Data, Shenzhen, China. His research interests include optimization algorithms, statistical signal processing, machine learning, and financial engineering.


Jiaxi Ying received the Ph.D. degree from the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology, Hong Kong, in 2022. He is currently a Postdoctoral Fellow with the Department of Mathematics at the same university. He was the recipient of the Outstanding Master’s Thesis Award of Chinese Institute of Electronics, the Excellent Master Thesis in Fujian Province, and the HKUST RedBird Ph.D. Scholarship Program. His research interests are mainly on the intersection of optimization, machine learning, signal processing, and statistics.

Daniel P. Palomar (Fellow, IEEE) received the bachelor’s degree in electrical engineering and the Ph.D. degree from the Technical University of Catalonia (UPC), Barcelona, Spain, in 1998 and 2003, respectively. He was a Fulbright Scholar at Princeton University during 2004–2006. He is a Professor with the Department of Electronic & Computer Engineering and the Department of Industrial Engineering & Decision Analytics at the Hong Kong University of Science and Technology (HKUST), Hong Kong, where he joined in 2006. He had previously held several research appointments, namely, at King’s College London (KCL), London, U.K.; Stanford University, Stanford, CA, USA; Telecommunications Technological Center of Catalonia (CTTC), Barcelona, Spain; Royal Institute of Technology (KTH), Stockholm, Sweden; University of Rome “La Sapienza,” Rome, Italy; and Princeton University, Princeton, NJ, USA. His current research interests include applications of optimization theory, graph methods, and signal processing in financial systems and big data analytics. He was a recipient of a 2004/2006 Fulbright Research Fellowship, the 2004, 2015, and 2020 (co-author) Young Author Best Paper Awards by the IEEE Signal Processing Society, the 2015–2016 HKUST Excellence Research Award, the 2002/2003 best Ph.D. prize in information technologies and communications by the Technical University of Catalonia (UPC), the 2002/2003 Rosina Ribalta first prize for the Best Doctoral Thesis in information technologies and communications by the Epson Foundation, and the 2004 prize for the best Doctoral Thesis in advanced mobile communications by the Vodafone Foundation and COIT. He has been a Guest Editor of IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING 2016 special issue on “Financial Signal Processing and Machine Learning for Electronic Trading,” IEEE SIGNAL PROCESSING MAGAZINE 2010 special issue on “Convex Optimization for Signal Processing,” IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS 2008 special issue on “Game Theory in Communication Systems,” and IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS 2007 special issue on “Optimization of MIMO Transceivers for Realistic Communication Networks,” and an Associate Editor of IEEE TRANSACTIONS ON INFORMATION THEORY and IEEE TRANSACTIONS ON SIGNAL PROCESSING.
