18.A Deep Probabilistic Transfer Learning Framework For Soft Sensor Modeling With Missing Data
18.A Deep Probabilistic Transfer Learning Framework For Soft Sensor Modeling With Missing Data
Abstract— Soft sensors have been extensively developed and are difficult to obtain. On the contrary, with the availability
applied in the process industry. One of the main challenges of of massive process measurements, data-driven soft sensors
the data-driven soft sensors is the lack of labeled data and the have been extensively studied and successfully applied to the
need to absorb the knowledge from a related source operating
condition to enhance the soft sensing performance on the target process industry [4]–[6].
application. This article introduces deep transfer learning to Currently, many data-driven approaches have been
soft sensor modeling and proposes a deep probabilistic transfer established for industrial process modeling through machine
regression (DPTR) framework. In DPTR, a deep generative learning algorithms [7]–[10]. However, as most of them
regression model is first developed to learn Gaussian latent are developed from statistical approaches, an underlying
feature representations and model the regression relationship
under the stochastic gradient variational Bayes framework. Then, assumption is that the training and testing data are drawn
a probabilistic latent space transfer strategy is designed to reduce from the same distribution. In practice, because of the changes
the discrepancy between the source and target latent features in operating conditions, the distribution of the data collected
such that the knowledge from the source data can be explored from the new operating condition (target domain) may show
and transferred to enhance the target soft sensor performance. certain discrepancies in comparison with the data collected
Besides, considering the missing values in the process data in
the target operating condition, the DPTR is further extended to from the original condition (source domain) [11]–[13]. This
handle the missing data problem utilizing the strong generation discrepancy can lead to inferior soft sensing performance on
and reconstruction capability of the deep generative model. The the target application if using models built under the source
effectiveness of the proposed method is validated through an domain. Under such circumstances, the soft sensor model has
industrial multiphase flow process. to be rebuilt from scratch with the training data collected from
Index Terms— Deep learning, industrial processes, missing the new target domain. This strategy, on one hand, would
data, probabilistic transfer learning (TL), soft sensor. be costly in terms of computation. On the other hand, it is
generally expensive or even impossible to collect sufficient
I. I NTRODUCTION labeled data in a new operating condition within a short time
period.
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
suboptimal performances in comparison with the end-to-end modeling strategy with consideration of missing data is devel-
deep soft sensors [23]. oped under the TL framework to fully exploit the generation
In industrial applications, missing data are also a significant and reconstruction capacities of the deep model. This work
problem that should be considered for soft sensing across contributes in two aspects.
different operating conditions [3]. Due to possible failures or 1) A DPTR framework is developed to tackle the distribu-
in maintenance of hardware sensors or signal transmission tion discrepancy in developing soft sensors considering
errors, missing values are frequently encountered in certain data uncertainties. Thus, the probabilistic latent space
variables, leading to incomplete training samples [24]. In this and model knowledge in the source domain can be
case, most of the machine learning or TL methods become transferred for enhancing the performance of the soft
difficult to implement because of the assumption that the sensor in the target domain.
training samples should be structurally complete. To deal 2) The missing data are explored under the TL framework
with the missing data problem, the downsampling technique for industrial soft sensor modeling. With the capability
provides a simple solution by discarding or lifting the records of data generation and reconstruction, the proposed
with missing values. This solution, although it can be readily method can naturally impute the missing values that are
implemented in practice, can generally cause information prevalent in industrial applications.
loss or asymmetry. Thus, instead of discarding data directly, The remainder of this article starts with brief introductions
imputing the missing values has become a popular alterna- to the related works in Section II. In Section III, the detailed
tive [3] and various approaches have been developed for data description of the DPTR using both complete data and missing
imputation. Representative methods include principal compo- data is introduced. Section IV presents the illustrations on a
nent analysis (PCA)-based approaches, such as probabilistic multiphase flow process (MFP) to verify the efficacy of the
PCA [25]. In recent years, because of the efficacy in nonlinear DPTR. Finally, Section V concludes this article.
information processing and data reconstruction, deep models,
such as autoencoders (AEs), have been actively researched
for performing data imputation and soft sensor modeling [26]. II. R ELATED W ORKS
Despite the popularity, most of the deep models are developed A. Data-Driven Soft Sensors
in a deterministic fashion. This means that the deterministic Data-driven soft sensors have been widely developed in
feature representations are expressed, which do not contain recent years [3], [4]. To predict the key quality-relevant
uncertainty information and thus may lead to weak robust- variables in industrial processes, data-driven soft sensors gen-
ness of the soft sensors [27]. As a probabilistic counterpart erally build regressive models between those easy-to-measure
of the deterministic AE, the variational autoencoder (VAE) process variables and the hard-to-measure quality variables
[28] naturally provides an uncertain data modeling solution based on the offline training data [23]. With the advances
for industrial applications [29], [30]. Developed under the of machine learning techniques, variable approaches have
framework of stochastic gradient variational Bayes (SGVB), been proposed for soft sensor developments. The most
the VAE learns nonlinear reconstructive latent variables that representative machine learning-based soft sensors include
are expected to follow standard normal distributions. There- PLS [4], [7], support vector regression [8], and slow feature
fore, the capacities of characterization, reconstruction, and analysis [9], [31].
complex nonlinear feature extraction of uncertain data make In recent years, due to the capability of learning complex
it more suitable for probabilistic TL with missing data. feature representations, deep neural networks (DNNs) have
In this article, a deep probabilistic transfer regres- been extensively researched and applied in industrial appli-
sion (DPTR) framework is proposed for the TL problem in cations [32]. The widely used model structures for building
modeling soft sensors with missing data. First, formulating industrial soft sensors include AEs [6], [23], VAEs [27], [30],
the learning objective under the framework of SGVB, a deep and recurrent neural networks (NNs) [33], [34]. For example,
generative regression model (DGRM) is developed in an Yan et al. [32] pretrained multiple denoising AEs as the
end-to-end fashion, which is structured by deep EncoderNet, backbone, and then, an additional output layer is added for
PriorNet, and DecoderNet. The developed model can not only prediction of the oxygen content in flue gases. Yuan et al. [23]
extract distributions of nonlinear feature representations from developed a quality-driven AE-based soft sensor, and the effec-
raw inputs but also model the data regression relationship tiveness is validated through an industrial debutanizer column
for the labeled training data. Second, a probabilistic latent case. Despite these advances, the conventional data-driven and
space transfer strategy is designed to make the probabilistic machine learning-based soft sensors generally assume that the
latent features transferrable across different operating condi- training and testing data are drawn from the same distribution,
tions. The AdversaryNet is designed to reduce the discrep- which is challenged in many industrial applications due to the
ancy between the probabilistic representations under different changes in operating conditions.
operating conditions together with the sampling-based repara-
meterization trick. Thus, the DPTR model can be established
based on the DGRM and the probabilistic latent space transfer B. Domain Adaptation
strategy, which can then be deployed for the soft sensor task Domain adaptation aims to minimize the distribution dis-
under the target operating condition. Furthermore, considering crepancy between the source and the target domain such
missing values in certain sampling instances, a regression that the knowledge from the relevant source domain can
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
be adopted and transferred to the target domain [15], [16]. Based on these two models, which are also referred to as
As a representative technique of TL [14], domain adaptation probabilistic encoder and decoder in the literature, VAE aims
algorithms have been widely developed and applied to many to optimize the variational Evidence Lower BOund (ELBO)
fields, such as computer vision [16] and natural language on the marginal likelihood. Specifically, the data likelihood is
processing [17]. calculated as the sum over all the marginal likelihoods of the
Many conventional machine learning-based domain adapta- samples
tion algorithms have been developed [16], [35]. For example,
N
Pan et al. [15] designed a transfer component analysis method log pθ (x 1 , x 2 , . . . , x N ) = log pθ (x i ) (1)
for extracting transferrable components, and then, the clas- i=1
sifiers or regression models in the source domain can be
applied to the target domain. In the locality-preserving joint where N signifies the number of samples and each marginal
likelihood log pθ (x i ) is given by
transfer method, the feature and sample levels of knowledge
transfer are jointly considered and optimized to improve the log pθ (x i ) = DKL (qφ (z|x i )|| pθ (z|x i )) + L(θ, φ; x i ). (2)
performance [16]. In recent years, with the rapid progress
of deep learning techniques, the deep learning-based domain In the right-hand side (RHS) of (2), the first term indicates
adaption has attracted considerable attention. For example, the Kullback–Leibler (KL) divergence between the approx-
Li et al. [36] developed faster domain adaptation networks to imation and the true posterior. Because the KL term is
deal with the limited computing resource problem and acceler- nonnegative, the ELBO on log pθ (x i ) can then be formulated
ate the knowledge adaptation. Among these deep model-based as
domain adaptation methods, common solutions include opti- L(θ, φ; x i ) = Eqφ (z|x i ) [− log qφ (z|x i ) + log pθ (x i , z)]
mizing the distribution discrepancy metrics [11], [37] and
using the domain adversarial training (DAT) [38]. For the = −DKL [qφ (z|x i )|| pθ (z)] + Eqφ (z|x i ) [log pθ (x i |z)]
discrepancy metric-based methods, existing approaches gen- (3)
erally embed the metrics into the deep structures to learn
where θ and φ are optimized such that
transferable feature spaces. For example, by incorporating the
multikernel maximum mean discrepancy (MMD) metric into
N
the loss function of DNNs, Long et al. [39] proposed a deep θ ∗ , φ ∗ = arg max L(θ, φ; x i ). (4)
θ,φ
adaptation network to extract transferable features, and an i=1
optimal multikernel selection strategy is designed to further To accomplish the optimization, note that the second RHS
improve the feature matching performance. Recently, different term in (3) requires sampling a latent variable z ∼ qφ (z|x),
from the abovementioned solution, the DAT that inspired while this is problematic in practice as the random sampling is
by generative adversarial nets [40] is utilized to make the nondifferentiable in NN training. Typically, the approximation
feature representations from the source and target domains qφ (z|x) is designed as some parameterized distribution, and
unrecognizable. It draws wide research interests due to the it is possible to introduce an auxiliary variable ε to sample a
superior performance in comparison with some MMD-based deterministic z from x using z = gφ (ε, x). This reparameter-
works [38] and less need of specified hyperparameters, such ization trick yields a low-variance SGVB estimator, which is
as the kernel parameters in MMD [15], [37], [39]. Specifically, the learning objective of the VAE model
it has been applied to the manufacturing processes and demon-
strated prominent effectiveness. For example, Li et al. [41] L̃(θ, φ; x i ) = −DKL (qφ (z|x i )|| pθ (z))
1
L
developed a DAT-based method to leverage the knowledge
from different but related equipment to improve the diagnostic + Eqφ (z|x i ) log pθ x i |z li (5)
L l=1
performance in rotating machinery. By treating each grade as
a domain in the multigrade industrial processes, Liu et al. [22] where z li = gφ (εl , x i ) with εl sampled from the independent
used the DAT to extract transferrable representations, which marginal distribution p(ε) and L is the number of samplings.
benefits the performance of industrial quality inferring. Due Generally, VAE assumes the prior pθ (z) = N (0, I ) and the
to its advantages, popularities, and effectiveness in man- true posterior is a multivariate Gaussian distribution, and the
ufacturing process modeling, this article follows the DAT approximate posterior is also a multivariate Gaussian with an
approach and developed a probabilistic counterpart to model isotropic covariance qφ (z|x i ) = N (μi , (σ i )2 I ). Then, z i can
the cross-domain soft sensor of process data. be sampled using z li = μi + σ i εl , where εl ∼ N (0,I ).
indicates the elementwise multiplication.
To incorporate supervision information into the VAE model,
C. Variational Autoencoders
the conditional VAE (CVAE) [42]–[44] extends the distrib-
The VAE [28] is an NN that is composed of a recognition utions to dependence on external information, i.e., condition
model qφ (z|x) that approximates the intractable posterior label c. The ELBO of CVAE can be readily formulated through
pθ (z|x) and a generation model pθ (x|z) that provides a the following expression according to (3):
distribution for the generated x, where φ and θ signify the
variational and generative parameters, i.e., the parameters log pθ (x|c) ≥ −DKL [qφ (z|x, c)|| pθ (z|c)]
regarding the recognition and the generation processes [28]. + Eqφ (z|x,c) [log pθ (x|z, c)]. (6)
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
III. M ETHODOLOGY
In this section, a DGRM is first designed to obtain the
soft sensor in a supervised end-to-end fashion. Second, a
probabilistic latent space transfer strategy is developed to drive
the probabilistic features to be transferrable such that the
soft sensor built in the source domain can be adapted to the
application in the target operating condition. Then, a DPTR
Fig. 1. Graphical models of (a) plain VAE, (b) CVAE, (c) DGRM, and
model can be established to conduct soft sensing across (d) DGRM-MD. The black arrowed lines denote the generative process.
different operating conditions. Finally, taking the missing data The green and red arrowed lines indicate the conditional prior of z and the
problem into account, we extend the proposed method to the approximate inference of z, respectively.
TL scenario with missing data.
the recognition process qφ (z|x, y) in the DGRM. Specifically,
we refer qφ (z|x, y) as an EncoderNet, which is parameterized
A. Deep Generative Regression Model by W E . For the generative process pθ (y|z), it is realized using
In this section, we present the DGRM that extends the plain a DecoderNet parameterized by W D . The prior density pθ (z|x)
VAE to a probabilistic soft sensor model under the SGVB is provided by PriorNet parameterized by W P .
framework. The plain VAE is an unsupervised generative For analytically solving the KL term in (8), similar to the
model aiming at maximizing the data likelihood log pθ (x). plain VAE, we assume the prior density to be a parameterized
Considering the supervised regression task in soft sensing, Gaussian distribution N (μx , σx2 I ). The parameters μx and σx
we first deduce the DGRM under the SGVB framework, whose are estimated using the PriorNet in DGRM. The approximated
graphical model is shown in Fig. 1 in comparison with VAE posterior qφ (z|x, y) is multivariate Gaussian with isotropic
and CVAE. covariance N (μxy , σxy 2
I ), where μxy and σxy are realized
Specifically, given a labeled sample pair {x, y}, let using EncoderNet in DGRM. Denoting J as the dimensionality
the hidden variable z be conditioned on x in DGRM. of z, then the KL divergence in (8) is given by
In the inference phase, a hidden variable z is sampled from
the prior density pθ (z|x). Then, the output y can be generated −DKL qφ (z|x, y)|| pθ (z|x)
based on pθ (y|z). Given the description of the DGRM, whose N μxy , σxy 2
I
= − N μxy , σxy I log
2
dz
graphical model is shown in Fig. 1(c), the goal is to optimize N μx , σx2 I
the parameters such that the conditional likelihood pθ (y|x) is ⎡ 2 2 ⎤
( j) ( j) 2
maximized. To this end, for an individual sample pair {x, y}, 1 ⎢
J σ σ μxy − μx
xy xy
⎥
we first deduce the conditional log likelihood as follows: = ⎣log 2 − 2 − 2 + 1⎦.
2 j =1 ( j) ( j) ( j)
σx σx σx
log pθ (y|x)
(9)
pθ (x, y, z) qφ (z|x, y)
= Eqφ (z|x,y) log
pθ (z|x, y) pθ (x) qφ (z|x, y) According to (9), the SGVB lower bound estimator of the
DGRM in (8) can be rewritten as
pθ (y, z|x)
= Eqφ (z|x,y) log
qφ (z|x, y) LDGRM (W P , W E , W D ; x, y)
⎡ 2 2 ⎤
+ DKL qφ (z|x, y)|| pθ (z|x, y) ( j) ( j) 2
J σ σ μxy − μx
= ELBODGRM (θ, φ) + DKL qφ (z|x, y)|| pθ (z|x, y) . (7) 1 ⎢ xy xy
⎥
= ⎣log 2 − 2 − 2 + 1⎦
2 j =1 ( j) ( j) ( j)
In (7), the ELBO term indicates the variational lower bound σx σx σx
of log pθ (y|x). Following the model structure in Fig. 1(c) and 1
L
recalling that the output y is generated through the conditional + Eq (z|x,y) [log pθ (y|z l )] (10)
L l=1 φ
density pθ (y|z) in DRGM, the ELBO of the DGRM can be
further written as and the parameters W P , W E , and W D are optimized such that
ELBODGRM (θ, φ)
(
WP ,
WE , W D ) = arg max LDGRM (W P , W E , W D ; x, y).
W P ,W E ,W D
pθ (z|x)
= Eqφ (z|x,y) log + Eqφ (z|x,y) [log pθ (y|z)] (11)
qφ (z|x, y)
= −DKL qφ (z|x, y)|| pθ (z|x) + Eqφ (z|x,y) [log pθ (y|z)]. As the first RHS term in (10) forces qφ (z|x, y) to be close to
(8) the prior pθ (z|x), in the online testing phase, the latent variable
z can be obtained through the efficient mapping pθ (z|x) and
The ELBO of the DGRM method contains two parts. The the reparameterization technique directly [27], [44]. Moreover,
first RHS term performs regularization on the divergence in the proposed DGRM, the output y is generated through
between qφ (z|x, y) and pθ (z|x). The second RHS term is the probabilistic DecoderNet pθ (y|z). We note that this shares
an expected negative reconstruction error of y, given the a similar property as the conditional multimodal AE [45],
sampled hidden variable z. Multilayered perceptrons are used which naturally enables us to transfer the latent space z across
to realize the generative processes pθ (z|x) and pθ (y|z) and datasets collected under different operating conditions and thus
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
end, we first assign an operating condition label a i ∈ {0, 1} =− a log G A W A ; z li +(1 − a i ) log 1−G A W A ; z li
L l=1
to each x Si and x Ti to denote whether the sample belongs to
the source or target domain dataset, considering the operating (13)
condition information is readily available prior to the training where L is the number of samplings, z̃ i denotes the empirical
phase. Thus, a training sample is represented by {x i , y i , a i }. sampled z for the i th sample, and each z is achieved by
Then, the latent variables of both datasets can be trained
adversarially to enable the adaptation from source to target z li = μixy + εl σxy
i
, εl ∼ N (0, I ). (14)
domain motivated by the deterministic DAT idea proposed by
Ganin et al. [38]. Let G A (W A ; z i ) parameterized by W A be an C. DPTR-Based Soft Sensor
operating condition label predictor modeled by an NN, which In this section, we will present the DPTR model along with
is termed AdversaryNet in this article. Then, the logistic loss the corresponding soft sensing procedure. The basic idea is to
function can be used to measure the prediction loss of G A , learn a transferrable probabilistic latent space z for DGRM
which is formulated by such that the model knowledge from the source operating
condition can be adopted and transferred to the target operating
LDAT (W A ; z i ) = − a i log G A (W A ; z i )
condition where the labeled data are limited. To achieve
+ (1 − a i ) log 1 − G A W A ; z i . (12)
this goal, the DPTR integrates the EncoderNet, PriorNet,
The vanilla DAT strategy is motivated by GAN [40], which and DecoderNet in the DGRM and the AdversaryNet in the
aims to find a mapping from the input noises to the fake MC-DAT. Let d signifies the dimensionality of an input sample
samples by competing for the generator against the discrim- x, and a sketch of the DPTR is presented in Fig. 2.
inator. In the DAT framework, on one hand, the feature z i Based on the DGRM and the MC-DAT strategy, the overall
is optimized to fool the AdversaryNet G A (W A ; z i ), which objective function of the DPTR model is given by (15), as
thus makes the operating condition label of z i unrecognizable. shown at the bottom of the next page where the parameters
On the other hand, W A is optimized to make G A (W A ; z i ) W P , W E , and W D of the DGRM and W A of the AdversaryNet
accurately distinguish the operating condition label of an can be learned in an adversarial fashion such that
input z i . Such a competition is expected to achieve a Nash
WP ,
(
WE , W D ) = arg max LDPTR (W P , W E , W D , W A )
equilibrium to make the learned z i from both the source- and W P ,W E ,W D
target-domain datasets transferrable.
W A = arg min LDPTR (W P , W E , W D , W A ). (16)
WA
It is noted that the vanilla DAT is conducted on a
deterministic feature space. As we are interested in achiev- In the offline modeling phase, based on the avail-
ing probabilistic latent space transfer, a Monte Carlo DAT able data {X S , Y S } and {X T , Y T }, the DPTR model is
(MC-DAT) approach is designed in which the EncoderNet and trained according to (15) and (16) through gradient descent.
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Then, the well-trained PriorNet and DecoderNet are deployed = ELBODGRM−MD (θ, φ) + DKL qφ (z|x̄, y)|| pθ (z|x, y, x̄)
for online applications. ≥ −DKL qφ (z|x̄, y)|| pθ (z|x̄) + Eqφ (z|x̄,y) [log pθ (x, y|z)].
In the online application phase, denote x Ttest as the online
(17)
input sample in the target operating condition. The sample is
first passed through the PriorNet, and the latent variable can In comparison with the DGRM, note that these two model
then be obtained as z Ttest = E[z|x Ttest ]. Finally, the output is structures are designed for different scenarios. The DGRM is
estimated as ŷTtest for x Ttest with DecoderNet pθ (y|z Ttest ). developed for soft sensing using structurally complete data and
the DGRM-MD is used for soft sensing using incomplete data
with missing variables. Thus, for the ELBO of DGRM in (8),
D. Extension to Transferring Soft Sensor With Missing Data
the goal is to estimate the soft sensor output y based on the
A TL-based probabilistic regression model DPTR has been structurally complete process variables x, and the DecoderNet
built so far. Like most of the TL methods, it uses structurally is realized through pθ (y|z). In (17), as missing variables are
complete data samples to build the model. In industrial appli- contained in x, both the recovery of clean x and prediction of
cations, however, missing data are a common problem due to y are crucial to build a valid soft sensor, and thus, DecoderNet
hardware sensor failures or signal transmission errors. Previous is realized using pθ (x, y|z).
works [27], [48] have applied VAE for data imputation, Similar to the DGRM, the optimization objective of DPTR
which used complete data to train a network in advance with missing data (DPTR-MD) can be achieved by substitut-
and then input the incomplete data iteratively or directly ing (17) for the DGRM loss in (15)
input the incomplete data into the stacked model. However,
the complete training data in the target operating condition LDPTR−MD = LDGRM−MD + LMC−DAT (18)
are generally very limited to learn a valid VAE model for where LDGRM−MD is defined by (19), as shown at the bottom
imputation. Note that there are generally sufficient complete of the next page.
data in the source domain, which are labeled and provide a The aim of the DGRM-MD method can be regarded as the
potential to build a valid soft sensor with data imputation reconstruction of x and prediction of y given the incomplete
ability. Here, the “labeled data” in the source domain refers x̄ under the SGVB framework, both of which are crucial to
to as the source-domain samples having the corresponding build a valid soft sensor with missing data. Thus, besides the
soft sensor outputs y and thus can be used for soft sensor restriction on the latent feature z, the loss is summed over
training. Specifically, for a complete sample x in the source the recovery of x and the prediction of y, as shown in (19).
and target domain, let x̄ signify the incomplete version in Many sophisticated strategies can be adopted to optimize this.
which certain variables are intentionally missed and y denote In this article, a two-stage training strategy is conducted on
the corresponding output of x̄. Thus, unlike the existing the reconstruction task of x and the prediction task of y. In the
VAE-based imputation methods, extending from the DGRM first stage, the losses regarding the latent space regularization
in (7) and (8), we consider the DGRM with missing data and the reconstruction of x in (19) are optimized to impute
(DGRM-MD) where the conditional probability pθ (x, y|x̄) is the missing values first. With the structurally complete source
maximized. The graphical model of DGRM-MD is presented and target data, in the second stage, the DPTR model can be
in Fig. 1(d). With the model description, the corresponding applied to learn probabilistic transferable feature representa-
ELBO of the DGRM-MD on the source data is deduced as tions across the source and target data by optimizing the losses
follows: with respect to the prediction of y and the latent regularization,
and the cross-domain soft sensor can thus be established.
log pθ (x, y|x̄)
To assess the soft sensing performance, two metrics, includ-
pθ (x, y, z|x̄) qφ (z|x̄, y)
= Eqφ (z|x̄,y) log ing root-mean-squared error (RMSE) and mean absolute error
pθ (z|x, y, x̄) qφ (z|x̄, y) (MAE), are used for evaluation
pθ (x, y, z|x̄)
= Eqφ (z|x̄,y) log 1 T
n test 2
qφ (z|x̄, y) i,test i,test
RMSE = y T − ŷ T (20)
+ DKL qφ (z|x̄, y)|| pθ (z|x, y, x̄) n test
T i=1
LDPTR (W P , W E , W D , W A ; X S , YS , X T , YT )
= LDGRM (W P , W E , W D ; X S , YS , X T , YT ) + LMC−DAT (W A ; Z S , Z T )
⎡ ⎛ 2 2 ⎞
i,( j ) i,( j ) i
+n T
n S J σ σ μ − μ i 2
1 ⎢1 ⎜ xy xy
xy x ⎟
= ⎣ ⎝log 2 − 2 − 2 + 1⎠
(n S + n T ) i=1 2 j =1 i,( j ) i,( j ) i,( j )
σx σx σx
⎤
1
L
⎥
+ Eqφ (z|x,y) log pθ y i |z li − a i log G A W A ; z li + (1 − a i ) log 1 − G A W A ; z li ⎦ (15)
L l=1
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
T
n test
1 i,test i,test
TABLE I
MAE = test y T − ŷ T (21) D ESCRIPTION OF THE I NPUT VARIABLES FOR THE MFP S YSTEM [49]
nT i=1
where n test
T signifies the number of testing data in the target
operating condition.
A. Process Description
The MFP system is developed to supply a controlled and
measured multiphase flow comprised of water, oil, and air
to a pressurized facility [49]. For monitoring and control
purposes, some hardware sensors have been installed in the
system, as shown in the simplified sketch of the facility
in Fig. 3. As shown in the figure, the MFP system consists
of a gas–liquid separator, a three-phase separator, and sev-
eral coalescers and storage tanks that are connected through
pipelines with various sizes and geometries. A flow mixture
consisting of water, oil, and air can be used as the input of the
MPF system at required flowrates. The test area of the facility TABLE II
consists of pipelines with different bore sizes and geometries D ESCRIPTION OF THE T WO O PERATING C ONDITIONS
and a gas and liquid separator at the top of a high platform. FOR THE MFP P LANT
The mixtures are separated in the three-phase separator at the
ground level. According to the system design, the pressure in
the three-phase separator is selected as the output variable of
the soft sensor model and 16 related variables in the MFP are
selected as input variables, which are listed in Table I with the
corresponding tags and units. The sampling period of all the
variables in the MFP system is 1 s. testing samples in the target operating condition. Specifically,
The MFP is an industrial system working under varying 10%, 30%, and 50% of samples in the target dataset are
operational conditions [49]. Typically, the two set points of corrupted. For each corrupted sample, three randomly selected
the process, including the airflow rate and the waterflow rate, variables are missed and replaced by random noises sampled
can be tuned to generate different steady operating conditions. from N (0, 1). The three missing levels are denoted as lightly,
Thus, this process provides a suitable case for evaluating the mediumly, and heavily missing cases in this article.
soft sensing performance of the developed DPTR framework.
Among the working conditions, considering the appropriate B. Experimental Setup
number of training samples, two conditions are selected as To assess the performance of the developed DPTR frame-
the source and target domain, respectively. The details of the work, two comparison experiments are designed using com-
designed task are given in Table II. n S = 1000 labeled samples plete data or missing data, respectively.
collected from the source operating condition are available and For the first experiment with complete source- and target-
can be used for knowledge transfer. There are 700 samples domain datasets for training, two groups of comparison meth-
collected from the target operating condition, among which ods are designed to fully explore the advantage of the DPTR.
the first 200 samples are utilized for training and the rest First, we select the conventional methods with no TL capacity.
500 samples are utilized as testing data. Besides, to assess the The VAE [28] and multilayer NN are combined as the baseline
performance of the proposed DPTR-MD, we consider three structure, in which the VAE model is utilized to extract
levels of missing data ratio in the training samples and online feature representations from the inputs, and then, the NN
⎛⎡ 2 2 ⎞⎤
i,( j ) i,( j ) i
+n T
n S J σ σ μ − μ i 2
1 ⎢1 ⎜ x̄ y x̄ y x̄ y x̄ ⎟⎥
LDGRM−MD = ⎣ ⎝log 2 − 2 − 2 + 1⎠⎦
n S + n T i=1 2 j =1 i,( j ) i,( j ) i,( j )
σx̄ σx̄ σx̄
L
1
+ Eqφ (z|x̄,y) log pθ x i , y i |z li (19)
L l=1
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
model is further trained for prediction. Specifically, as both For the second experiment with missing data, following the
datasets are available for training, the model trained with target existing work [3], [48], three common solutions, including data
domain data only (VAE-T) and the model trained with com- deletion, mean imputation, and VAE imputation, are compared.
bined source- and target-domain data (VAE-C) are designed Specifically, these methods are utilized to handle the missing
as the comparing methods. Besides, the recently proposed data first, and then, the soft sensor performance using the
VAE-regression (VAE-R) method [50] is used as a comparing proposed DPTR is compared with the proposed DPTR-MD
method, which uses both the source- and target-domain data at the three missing levels.
for training. Then, the second group includes methods with
TL capacity. Specifically, to verify the efficacy of the designed C. Experimental Results and Discussion
probabilistic modeling mechanism of the proposed DPTR, this The prediction performance on the testing dataset for the
group of methods includes two deterministic TL approaches, complete data experiment is presented in Table III, in which
i.e., the recently proposed DAELM [20] and an AE-DAT the evaluation performance based on at least three runs of
model by integrating the deterministic AE, multilayer NN, and experiments is reported. There are several observations from
the deterministic version of DAT [38]. For fair comparisons, Table III. First, in comparison with VAE-T, the other methods
the basic network structures of the probabilistic deep models, show a significantly better performance, due to the incorpora-
including VAE-T, VAE-C, VAE-R, and the proposed DPTR, tion of the source-domain training data. Second, although both
are set to be the same. The EncoderNet and PriorNet structures the source- and target-domain datasets are utilized for learning
for the DPTR are {17, 8, 4} and {16, 8, 4}, respectively. The the DAELM model, the shallow structure makes it less com-
DecoderNet and AdversaryNet structure of DPTR is {4, 1}. petitive when learning complex feature representations and
The decoder structure of the plain VAE models is {4, 16}. regression relationships compared with VAE-C, VAE-R, and
The structures of the AE and DAT model in AE-DAT are {16, DPTR. Third, among the deep methods with both source-
8, 4, 8, 16} and {4, 1}. The additional NN structure in VAE-T, and target-training data, the AE-DAT shows inferior perfor-
VAE-C, and AE-DAT are {4, 8, 10, 1}. The main challenge in mance in comparison with those probabilistic deep methods.
adversarial training is to balance the competing components Also, due to its separated learning of representations and
of the network [17]. In the experiments, equal weights are regressor, it is not very competitive than DAELM. Then, ben-
used on the two losses in the proposed DPTR and AE-DAT efited from capturing the uncertainty distribution and simul-
to verify the effectiveness. Other sophisticated weighting taneously learn the regressed latent vector and reconstructed
schemes for the domain adversary loss have also been reported inputs, the VAE-R shows the second-best performance among
recently [38]. To train the deep models, the Adam optimizer the methods, illustrating the effectiveness of deep generative
with a 0.0001 weight decay value is used. The learning rate is models in soft sensor modeling. Finally, in comparison with
selected from {0.01, 0.001}. The rectified linear unit (ReLU) the other methods, the proposed DPTR shows clear improve-
activation function is used in the hidden layers. The number ment in both RMSE and MAE metrics. The potential reason
of hidden nodes of DAELM is searched from {4, 8, 12}. The is twofold.
ridge parameters λS and λT of DAELM are selected from 1) The discrepancy between the source and target prob-
{10−1 , 10−2 , 10−4 , 10−8 }. abilistic feature spaces rather than deterministic data
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
TABLE III
P REDICTION P ERFORMANCE FOR THE S IX M ETHODS
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 12. (a) Visualization of raw data. (b) Feature visualization of the
proposed method. (c) Convergence procedure of the proposed method.
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
After that, the DPTR is further extended to the case where [15] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, “Domain adaptation
missing data exist in the target domain dataset. Thus, recon- via transfer component analysis,” IEEE Trans. Neural Netw., vol. 22,
no. 2, pp. 199–210, Feb. 2011.
struction and prediction knowledge learned from the source [16] J. Li, M. Jing, K. Lu, L. Zhu, and H. T. Shen, “Locality preserving joint
operating condition with sufficient data can be transferred transfer for domain adaptation,” IEEE Trans. Image Process., vol. 28,
to the target operating condition. The overall framework is no. 12, pp. 6103–6115, Dec. 2019.
[17] F. Alam, S. Joty, and M. Imran, “Domain adaptation with adversarial
jointly trained in an end-to-end fashion to transfer the model training and graph embeddings,” in Proc. 56th Annu. Meeting Assoc.
knowledge and abstract representative probabilistic features. Comput. Linguistics, 2018, pp. 1077–1087.
The application on an industrial multiphase flow dataset [18] C. Sun, M. Ma, Z. Zhao, S. Tian, R. Yan, and X. Chen, “Deep transfer
learning based on sparse autoencoder for remaining useful life prediction
demonstrates that the DPTR method is superior to the tra- of tool in manufacturing,” IEEE Trans. Ind. Informat., vol. 15, no. 4,
ditional deterministic TL methods such as DAELM and the pp. 2416–2425, Apr. 2019.
probabilistic method such as VAE trained on a combination [19] Z. Chai, C. Zhao, and B. Huang, “Multisource-refined transfer net-
work for industrial fault diagnosis under domain and category incon-
of datasets. For the missing data problem, the effectiveness sistencies,” IEEE Trans. Cybern., early access, May 25, 2021, doi:
of the DPTR-MD is verified through different missing lev- 10.1109/TCYB.2021.3067786.
els. Considering the multiple historical operating conditions, [20] Y. Liu, C. Yang, K. Liu, B. Chen, and Y. Yao, “Domain adaptation
future work can extend the deep probabilistic TL method to transfer learning soft sensor for product quality prediction,” Chemomet-
ric Intell. Lab. Syst., vol. 192, Sep. 2019, Art. no. 103813.
fully explore soft sensors across multiple domains. Besides, [21] J. Wang and C. Zhao, “Mode-cloud data analytics based transfer learning
exploring the other optimization strategies and more state-of- for soft sensor of manufacturing industry with incremental learning
the-art discrepancy metrics [52] would be an interesting topic, ability,” Control Eng. Pract., vol. 98, May 2020, Art. no. 104392.
[22] Y. Liu, C. Yang, M. Zhang, Y. Dai, and Y. Yao, “Development of
which deserves deep investigation in the future. adversarial transfer learning soft sensor for multigrade processes,” Ind.
Eng. Chem. Res., vol. 59, no. 37, pp. 16330–16345, Aug. 2020.
[23] X. Yuan, C. Ou, Y. Wang, C. Yang, and W. Gui, “A layer-wise
R EFERENCES data augmentation strategy for deep learning networks and its soft
sensor application in an industrial hydrocracking process,” IEEE
[1] P. Zhou, D. Guo, H. Wang, and T. Chai, “Data-driven robust M-LS-
Trans. Neural Netw. Learn. Syst., early access, Dec. 13, 2020, doi:
SVR-based NARX modeling for estimation and control of molten iron
10.1109/TNNLS.2019.2951708.
quality indices in blast furnace ironmaking,” IEEE Trans. Neural Netw.
Learn. Syst., vol. 29, no. 9, pp. 4007–4021, Sep. 2018. [24] W. Yu and C. Zhao, “Low-rank characteristic and temporal corre-
[2] B. Huang, Y. Qi, and A. K. M. M. Murshed, Dynamic Modeling and lation analytics for incipient industrial fault detection with missing
Predictive Control in Solid Oxide Fuel Cells: First Principle and Data- data,” IEEE Trans. Ind. Informat., early access, Apr. 27, 2020, doi:
based Approaches. Hoboken, NJ, USA: Wiley, 2013. 10.1109/TII.2020.2990975.
[3] P. Kadlec, B. Gabrys, and S. Strandt, “Data-driven soft sensors in the [25] S. Dray and J. Josse, “Principal component analysis with missing
process industry,” Comput. Chem. Eng., vol. 33, no. 4, pp. 795–814, values: A comparative survey of methods,” Plant Ecol., vol. 216, no. 5,
Apr. 2009. pp. 657–667, May 2015.
[4] C. Zhao, “A quality-relevant sequential phase partition approach for [26] V. Miranda, J. Krstulovic, H. Keko, C. Moreira, and J. Pereira, “Recon-
regression modeling and quality prediction analysis in manufacturing structing missing data in state estimation with autoencoders,” IEEE
processes,” IEEE Trans. Autom. Sci. Eng., vol. 11, no. 4, pp. 983–991, Trans. Power Syst., vol. 27, no. 2, pp. 604–611, May 2012.
Oct. 2014. [27] R. Xie, N. M. Jan, K. Hao, L. Chen, and B. Huang, “Supervised
[5] Y. Liu, C. Yang, Z. Gao, and Y. Yao, “Ensemble deep kernel learn- variational autoencoders for soft sensor modeling with missing data,”
ing with application to quality prediction in industrial polymeriza- IEEE Trans. Ind. Informat., vol. 16, no. 4, pp. 2820–2828, Apr. 2020.
tion processes,” Chemometric Intell. Lab. Syst., vol. 174, pp. 15–21, [28] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in
Mar. 2018. Proc. Int. Conf. Learn. Represent., 2014, pp. 1–14.
[6] X. Yuan, J. Zhou, B. Huang, Y. Wang, C. Yang, and W. Gui, “Hierar- [29] R. D. Camino, C. A. Hammerschmidt, and R. State, “Improving missing
chical quality-relevant feature representation for soft sensor modeling: data imputation with deep generative models,” 2019, arXiv:1902.10666.
A novel deep learning strategy,” IEEE Trans. Ind. Informat., vol. 16, [Online]. Available: http://arxiv.org/abs/1902.10666
no. 6, pp. 3721–3730, Jun. 2020. [30] B. Shen, L. Yao, and Z. Ge, “Nonlinear probabilistic latent variable
[7] C. Zhao, F. Wang, Z. Mao, N. Lu, and M. Jia, “Quality prediction regression models for soft sensor application: From shallow to deep
based on phase-specific average trajectory for batch processes,” AIChE structure,” Control Eng. Pract., vol. 94, Jan. 2020, Art. no. 104198.
J., vol. 54, no. 3, pp. 693–705, Mar. 2008. [31] Y. Qin, C. Zhao, and B. Huang, “A new soft-sensor algorithm with con-
[8] C. Shang, X. Gao, F. Yang, and D. Huang, “Novel Bayesian framework current consideration of slowness and quality interpretation for dynamic
for dynamic soft sensor based on support vector machine with finite chemical process,” Chem. Eng. Sci., vol. 199, no. 18, pp. 28–39,
impulse response,” IEEE Trans. Control Syst. Technol., vol. 22, no. 4, May 2019.
pp. 1550–1557, Jul. 2014. [32] W. Yan, D. Tang, and Y. Lin, “A data-driven soft sensor modeling
[9] J. Corrigan and J. Zhang, “Integrating dynamic slow feature analysis method based on deep learning and its application,” IEEE Trans. Ind.
with neural networks for enhancing soft sensor performance,” Comput. Electron., vol. 64, no. 5, pp. 4237–4245, May 2017.
Chem. Eng., vol. 139, Aug. 2020, Art. no. 106842. [33] L. Feng, C. Zhao, and Y. Sun, “Dual attention-based encoder-decoder: A
[10] C. Zhao, W. Wang, C. Tian, and Y. Sun, “Fine-scale modelling and customized sequence-to-sequence learning for soft sensor development,”
monitoring of wide-range nonstationary batch processes with dynamic IEEE Trans. Neural Netw. Learn. Syst., early access, Aug. 24, 2020, doi:
analytics,” IEEE Trans. Ind. Electron., early access, Jul. 21, 2020, doi: 10.1109/TNNLS.2020.3015929.
10.1109/TIE.2020.3009564. [34] X. Yuan, L. Li, Y. A. W. Shardt, Y. Wang, and C. Yang, “Deep
[11] W. Lu, B. Liang, Y. Cheng, D. Meng, J. Yang, and T. Zhang, “Deep learning with spatiotemporal attention-based LSTM for industrial soft
model based domain adaptation for fault diagnosis,” IEEE Trans. Ind. sensor model development,” IEEE Trans. Ind. Electron., vol. 68, no. 5,
Electron., vol. 64, no. 3, pp. 2296–2305, Mar. 2017. pp. 4404–4414, May 2021.
[12] Z. Chai and C. Zhao, “A fine-grained adversarial network method for [35] J. Li, K. Lu, Z. Huang, L. Zhu, and H. T. Shen, “Heterogeneous domain
cross-domain industrial fault diagnosis,” IEEE Trans. Autom. Sci. Eng., adaptation through progressive alignment,” IEEE Trans. Neural Netw.
vol. 17, no. 3, pp. 1432–1442, Jul. 2020. Learn. Syst., vol. 30, no. 5, pp. 1381–1391, May 2019.
[13] C. Zhao, J. Chen, and H. Jing, “Condition-driven data analytics [36] J. Li, M. Jing, H. Su, K. Lu, L. Zhu, and H. T. Shen, “Faster domain
and monitoring for wide-range nonstationary and transient continuous adaptation networks,” IEEE Trans. Knowl. Data Eng., early access,
processes,” IEEE Trans. Autom. Sci. Eng., early access, Aug. 4, 2021, Feb. 19, 2021, doi: 10.1109/TKDE.2021.3060473.
doi: 10.1109/TASE.2020.3010536. [37] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola,
[14] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. “A kernel method for the two-sample problem,” in Proc. Adv. Neural
Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010. Inf. Process. Syst., 2007, pp. 513–520.
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
[38] Y. Ganin et al., “Domain-adversarial training of neural networks,” Chunhui Zhao (Senior Member, IEEE) received
J. Mach. Learn. Res., vol. 17, no. 1, pp. 2030–2096, 2016. the Ph.D. degree from Northeastern University,
[39] M. Long, Y. Cao, J. Wang, and M. Jordan, “Learning transferable Shenyang, China, in 2009.
features with deep adaptation networks,” in Proc. Int. Conf. Mach. From 2009 to 2012, she was a Post-Doctoral Fel-
Learn., 2015, pp. 97–105. low with The Hong Kong University of Science and
[40] I. Goodfellow et al., “Generative adversarial nets,” in Proc. Adv. Neural Technology, Hong Kong, and the University of Cal-
Inf. Process. Syst., 2014, pp. 2672–2680. ifornia at Santa Barbara, Santa Barbara, CA, USA.
[41] X. Li, W. Zhang, Q. Ding, and X. Li, “Diagnosing rotating machines Since January 2012, she has been a Professor with
with weakly supervised data using deep transfer learning,” IEEE Trans. the College of Control Science and Engineering,
Ind. Informat., vol. 16, no. 3, pp. 1688–1697, Mar. 2020. Zhejiang University, Hangzhou, China. Her research
[42] X. Yan, J. Yang, K. Sohn, and H. Lee, “Attribute2Image: Conditional interests include statistical machine learning and
image generation from visual attributes,” in Proc. Eur. Conf. Comput. data mining for industrial applications. She has authored or coauthored more
Vis., 2016, pp. 776–791. than 140 articles in peer-reviewed international journals.
[43] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation Dr. Zhao has served as a Senior Editor of Journal of Process Control and
using deep conditional generative models,” in Proc. Adv. Neural Inf. Associate Editor of two international journals, Control Engineering Practice
Process. Syst., 2015, pp. 3483–3491. and Neurocomputing.
[44] C. Du, B. Chen, B. Xu, D. Guo, and H. Liu, “Factorized discriminative
conditional variational auto-encoder for radar HRRP target recognition,”
Signal Process., vol. 158, pp. 176–189, May 2019.
[45] G. Pandey and A. Dukkipati, “Variational methods for conditional
multimodal deep learning,” in Proc. Int. Joint Conf. Neural Netw. Biao Huang (Fellow, IEEE) received the B.Sc. and
(IJCNN), May 2017, pp. 308–315. M.Sc. degrees in automatic control from the Beijing
[46] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, University of Aeronautics and Astronautics, Beijing,
“Correcting sample selection bias by unlabeled data,” in Proc. Adv. China, in 1983 and 1986, respectively, and the Ph.D.
Neural Inf. Process. Syst., 2006, pp. 601–608. degree in process control from the University of
[47] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and Alberta, Edmonton, AB, Canada, in 1997.
J. W. Vaughan, “A theory of learning from different domains,” Mach. In 1997, he joined the Department of Chemical
Learn., vol. 79, nos. 1–2, pp. 151–175, May 2010. and Materials Engineering, University of Alberta,
[48] J. T. McCoy, S. Kroon, and L. Auret, “Variational autoencoders for as an Assistant Professor, where he is currently
missing data imputation with application to a simulated milling circuit,” a Professor and the NSERC Industrial Research
IFAC-PapersOnLine, vol. 51, no. 21, pp. 141–146, 2018. Chair in Control of Oil Sands Processes. He has
[49] C. Ruiz-Cárcel, Y. Cao, D. Mba, L. Lao, and R. T. Samuel, “Statistical applied his expertise extensively in industrial practice. His current research
process monitoring of a multiphase flow facility,” Control Eng. Pract., interests include process control, system identification, control performance
vol. 42, pp. 74–88, Sep. 2015. assessment, Bayesian methods, and state estimation.
[50] Y. Yoo, S. Yun, H. J. Chang, Y. Demiris, and J. Y. Choi, “Variational Dr. Huang is a fellow of the Canadian Academy of Engineering and the
autoencoded regression: High dimensional regression of visual data on Chemical Institute of Canada. He was a recipient of the Germany’s Alexan-
complex manifold,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. der von Humboldt Research Fellowship, the Canadian Chemical Engineer
(CVPR), Jul. 2017, pp. 3674–3683. Society’s Syncrude Canada Innovation and D. G. Fisher Awards, the APEGA
[51] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Summit Research Excellence Award, the University of Alberta McCalla and
J. Mach. Learn. Res., vol. 9, nos. 2579–2605, p. 85, 2008. Killam Professorship Awards, the Petro-Canada Young Innovator Award, and
[52] J. Li, E. Chen, Z. Ding, L. Zhu, K. Lu, and H. T. Shen, “Maximum den- the Best Paper Award from the Journal of Process Control.
sity divergence for domain adaptation,” IEEE Trans. Pattern Anal. Mach.
Intell., early access, Apr. 28, 2020, doi: 10.1109/TPAMI.2020.2991050.
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on June 17,2022 at 15:26:41 UTC from IEEE Xplore. Restrictions apply.