CH 10: Advanced Deep Learning Methods for Time Series Analysis
Improvements to deep learning modules and training schemes
Neural network models for time series forecasting (taxonomy figure):
▪ Transformer-based: Informer, Autoformer, FEDformer, …; more recently PatchTST, CrossFormer, iTransformer
▪ CNN-based: TCN; more recently MICN, TimesNet, ModernTCN
▪ RNN-based: DeepAR, MQ-RNN; more recently SegRNN
▪ MLP-based: NHITS, Linear; more recently MTS-Mixers, TSMixer, DLinear, RLinear
Outline
1. Preprocessing for time series models
2. MLP models
3. RNN models
4. CNN models
5. Attention models
…
Outline (this part)
1. Linear models
2. Normalization
3. Channel Independence
DLinear
▪ Given a history of length $L$, predict the next $T$ steps
▪ History data: $\boldsymbol{Y}_{\mathrm{old}} \in \mathbb{R}^{L \times d}$
▪ Target data: $\boldsymbol{Y}_{\mathrm{new}} \in \mathbb{R}^{T \times d}$
▪ Model: $\boldsymbol{W} \in \mathbb{R}^{T \times L}$, with $\boldsymbol{Y}_{\mathrm{new}} = \boldsymbol{W}\,\boldsymbol{Y}_{\mathrm{old}}$
▪ LTSF-Linear
▪ DLinear: decompose the input into a Trend and a Remainder component, and apply a separate Linear layer to each
▪ NLinear: apply the Linear layer on top of the Naive1 baseline (subtract the last observed value from the input, then add it back to the output)
Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu. Are Transformers Effective for Time Series Forecasting? AAAI 2023.
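To make the DLinear idea concrete, here is a minimal PyTorch sketch (my own illustration, not the authors' released code; `seq_len`, `pred_len`, and the moving-average `kernel_size` are placeholder hyperparameters). It decomposes the input into a moving-average trend and a remainder, applies one linear map per component along the time axis, and sums the two forecasts.

```python
import torch
import torch.nn as nn

class DLinearSketch(nn.Module):
    """Illustrative DLinear-style model: trend/remainder decomposition + one linear map per component."""
    def __init__(self, seq_len: int, pred_len: int, kernel_size: int = 25):
        super().__init__()
        # Moving average over time extracts a smooth trend; padding keeps the sequence length unchanged.
        self.moving_avg = nn.AvgPool1d(kernel_size, stride=1, padding=kernel_size // 2,
                                       count_include_pad=False)
        self.linear_trend = nn.Linear(seq_len, pred_len)      # W_trend in R^{T x L}
        self.linear_remainder = nn.Linear(seq_len, pred_len)  # W_remainder in R^{T x L}

    def forward(self, x):                 # x: (batch, seq_len, d), channels last
        x = x.transpose(1, 2)             # (batch, d, seq_len): each channel handled independently
        trend = self.moving_avg(x)        # smooth trend component
        remainder = x - trend             # seasonal / residual component
        y = self.linear_trend(trend) + self.linear_remainder(remainder)
        return y.transpose(1, 2)          # (batch, pred_len, d)
```

NLinear would instead subtract the last observed value of each input window before a single linear layer and add it back to the prediction afterwards.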
Properties of The Linear Model
▪ The time-step-dependent linear model, despite its simplicity, proves to be highly effective in
modeling temporal patterns.
▪ Conversely, even though recurrent or attention architectures have high representational capacity,
achieving time-step independence is challenging for them. They usually overfit on the data
instead of solely considering the positions.
Si-An Chen et al. TSMixer: An all-MLP Architecture for Time Series Forecasting. TMLR (2023)
Properties of The Linear Model
▪ A single linear layer can also effectively learn periodic patterns
A linear mapping can predict periodic signals when the length of the input historical sequence
is not less than the period, although the learned solution is not unique.
Zhe Li et al. Revisiting Long-term Time Series Forecasting: An Investigation on Linear Mapping. CoRR abs/2305.10721 (2023)
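The claim is easy to verify numerically. The sketch below is my own toy experiment (the sine series, look-back length 48, and horizon 24 are assumptions): fitting a single linear map by least squares recovers the periodic signal essentially exactly once the look-back covers one full period.

```python
import numpy as np

# Synthetic periodic signal with period 24 (hypothetical example)
period, seq_len, pred_len = 24, 48, 24
t = np.arange(5000)
series = np.sin(2 * np.pi * t / period)

# Build (input window, target window) pairs
n = len(series) - seq_len - pred_len
X = np.stack([series[i:i + seq_len] for i in range(n)])
Y = np.stack([series[i + seq_len:i + seq_len + pred_len] for i in range(n)])

# One linear map W in R^{seq_len x pred_len}, fitted by ordinary least squares (no hidden layers)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("max abs forecast error:", np.abs(X @ W - Y).max())  # close to machine precision
```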
Properties of The Linear Model
▪ A single linear layer can also effectively learn periodic patterns
The linear model fits seasonality well but performs poorly on the trend.
Reversible Instance Normalization (RevIN)
The predictions of the baselines are inaccurately (a) shifted and (b) scaled
Taesung Kim et al. Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift. ICLR 2022
RevIN
The (a-3) non-stationary information includes statistical properties from the input data: mean $\mu$, variance $\sigma^2$, and learnable affine parameters $\gamma$, $\beta$. The normalization layer transforms the (b-1) original data distribution into a (b-2) mean-centered distribution, where the distribution discrepancy between different instances is reduced. Using $\hat{x}$, the model predicts the future values $\tilde{y}$ following the (b-3) distribution where non-stationary information is eliminated.
Taesung Kim et al. Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift. ICLR 2022
RevIN
▪ Let $K$, $T_x$ and $T_y$ denote the number of variables, the input sequence length, and the model prediction length; the model maps $x^{(i)} \in \mathbb{R}^{K \times T_x} \to y^{(i)} \in \mathbb{R}^{K \times T_y}$.
▪ For $x^{(i)}$,
$$\mathbb{E}_t\big[x^{(i)}_{kt}\big] = \frac{1}{T_x}\sum_{j=1}^{T_x} x^{(i)}_{kj} \quad\text{and}\quad \mathrm{Var}\big[x^{(i)}_{kt}\big] = \frac{1}{T_x}\sum_{j=1}^{T_x}\Big(x^{(i)}_{kj} - \mathbb{E}_t\big[x^{(i)}_{kt}\big]\Big)^2$$
▪ Normalize the input data $x^{(i)}$ as
$$\hat{x}^{(i)}_{kt} = \gamma_k\,\frac{x^{(i)}_{kt} - \mathbb{E}_t\big[x^{(i)}_{kt}\big]}{\sqrt{\mathrm{Var}\big[x^{(i)}_{kt}\big] + \epsilon}} + \beta_k$$
▪ Denormalize the model output $\tilde{y}^{(i)}$:
$$\hat{y}^{(i)}_{kt} = \sqrt{\mathrm{Var}\big[x^{(i)}_{kt}\big] + \epsilon}\cdot\frac{\tilde{y}^{(i)}_{kt} - \beta_k}{\gamma_k} + \mathbb{E}_t\big[x^{(i)}_{kt}\big]$$
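The formulas above translate almost line by line into code. A minimal RevIN-style module might look as follows (a sketch based on the equations, not the authors' released implementation; input shape (batch, K, T_x) is assumed):

```python
import torch
import torch.nn as nn

class RevINSketch(nn.Module):
    """Instance-wise normalization with learnable affine parameters, reversible at the output."""
    def __init__(self, num_variables: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_variables, 1))   # gamma_k
        self.beta = nn.Parameter(torch.zeros(num_variables, 1))   # beta_k

    def normalize(self, x):              # x: (batch, K, T_x)
        self.mean = x.mean(dim=-1, keepdim=True)                                   # E_t[x_k]
        self.std = torch.sqrt(x.var(dim=-1, keepdim=True, unbiased=False) + self.eps)
        return self.gamma * (x - self.mean) / self.std + self.beta

    def denormalize(self, y):            # y: (batch, K, T_y), the model output
        return self.std * (y - self.beta) / self.gamma + self.mean
```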
Slice-level Adaptive Normalization (SAN)
Zhiding Liu et al. Adaptive Normalization for Non-stationary Time Series Forecasting: A Temporal Slice Perspective. NeurIPS 2023
Slice-level Adaptive Normalization (SAN)
$$\hat{\boldsymbol{\sigma}}_i = \mathrm{MLP}(\boldsymbol{\sigma}_i, \bar{\boldsymbol{x}}_i)$$
▪ Directly applying normalization to input data may erase this statistical information and lead
to poor predictions;
▪ It is challenging to fit trend changes solely using a linear layer. Applying batch normalization
even induces worse results. Disentangling the simulated time series also does not work.
Zhe Li et al. Revisiting Long-term Time Series Forecasting: An Investigation on Linear Mapping. CoRR abs/2305.10721 (2023)
RevIN and Linear Classifier
▪ For the seasonal signal, RevIN scales the range but does not change the periodicity.
▪ For the trend signal, RevIN scales each segment into the same range and exhibits periodic
patterns. RevIN is capable of turning some trends into seasonality, making models better learn or
memorize trend terms.
RevIN and Linear Classifier
▪ RevIN converts continuously changing trends into multiple segments with a fixed and
similar trend, demonstrating periodic characteristics.
▪ As a result, errors in trend prediction caused by accumulated timesteps in the past can be
alleviated, leading to more accurate forecasting results.
Channel Independent
Lu Han, Han-Jia Ye, De-Chuan Zhan. The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for
Multivariate Time Series Forecasting. CoRR abs/2304.05206 (2023)
MAE Comparison
The Framework
▪ Normalization
▪ Temporal Module
▪ Even a randomly initialized temporal feature extractor with untrained parameters can yield
competitive or even better forecasting results.
RLinear
▪ RLinear: RevIN + Linear + CI (channel independence)
Zhe Li et al. Revisiting Long-term Time Series Forecasting: An Investigation on Linear Mapping. CoRR abs/2305.10721 (2023)
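Combining the pieces, a channel-independent linear forecaster with reversible instance normalization might be sketched as below (my illustration of the RLinear recipe; it reuses the hypothetical `RevINSketch` module from the earlier sketch, and all sizes are placeholders):

```python
import torch.nn as nn

class RLinearSketch(nn.Module):
    """RevIN + one linear map over time, shared across channels (channel independence)."""
    def __init__(self, num_variables: int, seq_len: int, pred_len: int):
        super().__init__()
        self.revin = RevINSketch(num_variables)      # normalization module from the earlier sketch
        self.linear = nn.Linear(seq_len, pred_len)   # one weight matrix shared by all channels

    def forward(self, x):                            # x: (batch, K, seq_len)
        x = self.revin.normalize(x)
        y = self.linear(x)                           # applied per channel along the time axis
        return self.revin.denormalize(y)             # (batch, K, pred_len)
```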
TSMixer
▪ The time-mixing MLPs are shared across all features and the feature-mixing MLPs are shared
across all of the time steps.
Si-An Chen et al. TSMixer: An all-MLP Architecture for Time Series Forecasting. TMLR (2023)
▪ Time-mixing MLP
▪ Feature-mixing MLP
▪ Temporal Projection
▪ Residual Connections
▪ Normalization
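A simplified mixing block in this spirit is sketched below (my reading of the description above, not the reference TSMixer implementation; dropout and some normalization details are omitted). The time-mixing MLP is shared across features and the feature-mixing MLP is shared across time steps, each with a residual connection.

```python
import torch
import torch.nn as nn

class MixerBlockSketch(nn.Module):
    """One TSMixer-style block: time-mixing MLP shared across features, feature-mixing MLP shared across time."""
    def __init__(self, seq_len: int, num_features: int, hidden: int = 64):
        super().__init__()
        self.time_norm = nn.LayerNorm(num_features)
        self.time_mlp = nn.Sequential(nn.Linear(seq_len, seq_len), nn.ReLU())
        self.feat_norm = nn.LayerNorm(num_features)
        self.feat_mlp = nn.Sequential(nn.Linear(num_features, hidden), nn.ReLU(),
                                      nn.Linear(hidden, num_features))

    def forward(self, x):                       # x: (batch, seq_len, num_features)
        # Time mixing: the same MLP acts on every feature's time axis (residual connection).
        h = self.time_mlp(self.time_norm(x).transpose(1, 2)).transpose(1, 2)
        x = x + h
        # Feature mixing: the same MLP acts on every time step's feature vector (residual connection).
        return x + self.feat_mlp(self.feat_norm(x))
```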
PatchTST
▪ Multivariate time series data is divided into different channels. They share the same
Transformer backbone, but the forward processes are independent
Forward Process. Denote the $i$-th univariate series of length $L$ starting at time index 1 as $\boldsymbol{x}^{(i)}_{1:L} = (x^{(i)}_1, \ldots, x^{(i)}_L)$, where $i = 1, \ldots, M$. The input $(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_L)$ is split into $M$ univariate series $\boldsymbol{x}^{(i)} \in \mathbb{R}^{1 \times L}$, each of which is fed independently into the Transformer backbone. The Transformer backbone then provides the prediction results $\hat{\boldsymbol{x}}^{(i)} = (\hat{x}^{(i)}_{L+1}, \ldots, \hat{x}^{(i)}_{L+T}) \in \mathbb{R}^{1 \times T}$.
Yuqi Nie et al. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. ICLR 2023
PatchTST
Patching. Each input univariate time series $\boldsymbol{x}^{(i)}$ is first divided into patches which can be either overlapped or non-overlapped. Denote the patch length as $P$ and the stride as $S$; the patching process then generates a sequence of patches $\boldsymbol{x}^{(i)}_p \in \mathbb{R}^{P \times N}$, where $N$ is the number of patches. With the use of patches, the number of input tokens is reduced from $L$ to approximately $L/S$.
Positional Encoding. A learnable additive positional encoding $\boldsymbol{W}_{\mathrm{pos}} \in \mathbb{R}^{D \times N}$ is applied to monitor the temporal order of the patches.
Yuqi Nie et al. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. ICLR 2023
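A minimal sketch of the patching and positional-encoding steps (an illustration under assumed shapes, not the official PatchTST code; the padding step that appends a repeated last patch is omitted):

```python
import torch
import torch.nn as nn

class PatchEmbedSketch(nn.Module):
    """Split each univariate series into patches, project to d_model, and add learnable positions."""
    def __init__(self, seq_len: int, patch_len: int, stride: int, d_model: int):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        num_patches = (seq_len - patch_len) // stride + 1
        self.proj = nn.Linear(patch_len, d_model)                   # patch -> token embedding
        self.pos = nn.Parameter(torch.zeros(num_patches, d_model))  # learnable W_pos

    def forward(self, x):                       # x: (batch, 1, seq_len), one channel at a time (CI)
        patches = x.unfold(2, self.patch_len, self.stride)          # (batch, 1, N, P)
        tokens = self.proj(patches.squeeze(1))                      # (batch, N, d_model)
        return tokens + self.pos                                    # ready for the Transformer backbone
```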
PatchTST
PatchTST
Yunhao Zhang, Junchi Yan. Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting. ICLR 2023
CrossFormer
Two-Stage Attention. Directly using MSA in the Cross-Dimension Stage to build the D-to-D connection results in $O(D^2)$ complexity.
Router mechanism: a small, fixed number $c$ of "routers" first gather information from all $D$ dimensions and then distribute the gathered information back to them. The complexity is reduced to $O(2cD) = O(D)$.
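A rough sketch of the router mechanism (my simplification; `num_routers` plays the role of the small constant $c$, and all sizes are placeholders): the routers first attend to all $D$ dimension tokens to gather information, then each dimension token attends to the routers, so no D-to-D attention is ever computed.

```python
import torch
import torch.nn as nn

class RouterAttentionSketch(nn.Module):
    """Two-step cross-dimension attention via a small set of routers: O(c*D) instead of O(D^2)."""
    def __init__(self, d_model: int, num_routers: int = 8, num_heads: int = 4):
        super().__init__()
        self.routers = nn.Parameter(torch.randn(num_routers, d_model))
        self.gather = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.scatter = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, dim_tokens):                      # dim_tokens: (batch, D, d_model)
        routers = self.routers.unsqueeze(0).expand(dim_tokens.size(0), -1, -1)
        # Step 1: the c routers gather information from all D dimension tokens.
        gathered, _ = self.gather(routers, dim_tokens, dim_tokens)
        # Step 2: each dimension token queries the routers to receive the distributed information.
        out, _ = self.scatter(dim_tokens, gathered, gathered)
        return out
```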
iTransformer
Yong Liu et al. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. CoRR abs/2310.06625 (2023)
iTransformer
▪ Transformer treats a time series like natural language, but the time-aligned embedding (one token per time step across all variates) may bring risks for multivariate series. The problem can be alleviated by expanding the receptive field.
▪ Although patching can be more fine-grained, it also brings higher computational complexity and potential interaction noise between time-unaligned patches.
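The inversion can be sketched as follows (my illustration, not the released iTransformer code; `d_model` and the layer count are placeholders): each variate's whole length-$L$ history is embedded as one token, a standard Transformer encoder models interactions across variates, and a linear head maps each token to its length-$T$ forecast.

```python
import torch
import torch.nn as nn

class InvertedTransformerSketch(nn.Module):
    """Tokens are variates: embed each channel's length-L history, attend across channels, project to length T."""
    def __init__(self, seq_len: int, pred_len: int, d_model: int = 128, num_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(seq_len, d_model)        # whole series of one variate -> one token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, pred_len)

    def forward(self, x):                               # x: (batch, seq_len, D)
        tokens = self.embed(x.transpose(1, 2))          # (batch, D, d_model): one token per variate
        tokens = self.encoder(tokens)                   # attention mixes information across variates
        return self.head(tokens).transpose(1, 2)        # (batch, pred_len, D)
```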
iTransformer
Cristian Challu et al. NHITS: Neural Hierarchical Interpolation for Time Series Forecasting. AAAI 2023: 6989-6997
iTransformer
iTransformer
Modern TCN
ModernTCN: A Modern Pure Convolution Structure for General Time Series Analysis. ICLR 2024.
SegRNN
Shengsheng Lin et al. SegRNN: Segment Recurrent Neural Network for Long-Term Time Series Forecasting. CoRR abs/2308.11200 (2023)