Abstract—This paper introduces a novel application of Kolmogorov-Arnold Networks (KANs) to time series forecasting, leveraging their adaptive activation functions for enhanced […]

The authors in [5] detailed this approach, providing a comprehensive methodology foundational for subsequent statistical forecasting methods. Extensions of ARIMA, like Seasonal […]
We have data for the six beams over one month. We use two weeks + 1 day for training and one week + 1 day for testing for all the different beams in the dataset. These test series were not seen by the network during training time. We train all the networks for 500 epochs with the Adam optimizer and a learning rate of […]. We evaluate the forecasts with error metrics that include the mean absolute error (MAE) of the values around the prediction length.

[Figure residue: context length → network structure → prediction length.]
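To make the setup concrete, the following is a minimal sketch of the per-beam split and windowing described above; the hourly resolution and the 168 → 24 context/prediction lengths are inferred from the [168, n, 24] network specification later in this section, and all helper names are ours, not the authors'.

import numpy as np

HOURS_PER_DAY = 24
CONTEXT, HORIZON = 168, 24  # one week in -> one day out; assumed from the [168, n, 24] spec below

def make_windows(series, context=CONTEXT, horizon=HORIZON):
    # Slide a (context -> horizon) window over one beam's hourly traffic series.
    X, y = [], []
    for t in range(len(series) - context - horizon + 1):
        X.append(series[t : t + context])
        y.append(series[t + context : t + context + horizon])
    return np.stack(X), np.stack(y)

# Two weeks + 1 day of training data and one week + 1 day of test data per beam.
train_hours = (14 + 1) * HOURS_PER_DAY  # 360 hourly samples
test_hours = (7 + 1) * HOURS_PER_DAY    # 192 hourly samples, unseen during training

beam = np.random.rand(train_hours + test_hours)  # stand-in for one normalized beam series
X_train, y_train = make_windows(beam[:train_hours])
X_test, y_test = make_windows(beam[train_hours:])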
Fig. 3: Satellite traffic over three different beams with their forecasted values using a 4-depth KAN and a 4-depth MLP. [Panels: (a) forecast over beam 1; (b) forecast with a zoomed-in view of the prediction horizon; (c) forecast over beam 3. Axes: normalized traffic vs. time step [hours]; legend: real traffic (past), real traffic (future), KAN (4-depth), MLP (4-depth).]

We report the error metrics together with the number of trainable parameters for each model. Analyzing the error metrics, it becomes clear that KANs outperform MLPs, with the KAN (4-depth) being the best in performance. Its lower MSE and RMSE values indicate a better ability to predict traffic volumes with lower deviation. Similarly, its lower MAE and MAPE values suggest that KANs not only provide more accurate predictions but also maintain consistency across different traffic volumes, which is crucial for practical traffic forecasting scenarios.
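For reference, the four error metrics used above can be computed as follows; this is a generic sketch, not the authors' evaluation code.

import numpy as np

def forecast_metrics(y_true, y_pred):
    # MSE, RMSE, MAE and MAPE over the prediction horizon.
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        # MAPE is undefined where y_true == 0; a small epsilon guards the division.
        "MAPE": float(100 * np.mean(np.abs(err) / (np.abs(y_true) + 1e-8))),
    }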
Furthermore, the parameter count reveals a significant difference in model complexity. KAN models are notably more parameter-efficient, with KAN (4-depth) utilizing only 109k parameters compared to 329k parameters for MLP (4-depth) or 238k for MLP (3-depth). This reduced complexity suggests that KANs can achieve comparable or higher forecasting accuracy with simpler and potentially lighter models.

Such efficiency is especially valuable in scenarios where computational resources are limited or where rapid model deployment is required. The results also show that an increase of 16k parameters in the KAN improves its performance significantly, whereas for the MLPs an increment of 91k parameters does not yield a significant improvement.
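The reported MLP counts are consistent with fully connected 168 → 24 baselines of hidden width 300; the layout below is our reconstruction from those counts, chosen because it reproduces them, and is not a detail given in the paper.

import torch.nn as nn

def count_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def mlp(depth, hidden=300):
    # 168 -> 24 MLP with `depth` hidden layers; the width of 300 is our assumption.
    layers, d_in = [], 168
    for _ in range(depth):
        layers += [nn.Linear(d_in, hidden), nn.ReLU()]
        d_in = hidden
    layers.append(nn.Linear(d_in, 24))
    return nn.Sequential(*layers)

print(f"MLP (3-depth): {count_params(mlp(3)) / 1e3:.1f}k")  # ~238.5k vs. reported 238k
print(f"MLP (4-depth): {count_params(mlp(4)) / 1e3:.1f}k")  # ~328.8k vs. reported 329k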
We further analyzed how different configurations of nodes and grid sizes affect the performance of KANs, particularly in the context of traffic forecasting. For this analysis, we designed 3 KANs (2-depth) [168, n, 24] with n ∈ {5, 10, 20} and varying grids G ∈ {5, 10, 20} for a k = 3 order B-spline; the results are reported during training time. Figure 4 shows a clear trend where increasing the number of nodes generally results in lower loss values. This indicates that higher node counts are more effective at capturing the complex patterns in the traffic data, thus improving performance. For instance, configurations with n = 20 demonstrate significantly lower losses across all grid sizes compared to those with fewer nodes.
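Under the reference pykan implementation of [16], the nine ablation configurations can be instantiated directly; the constructor arguments mirror the [168, n, 24], G, and k = 3 specification above, while the parameter print-out is our addition.

from kan import KAN  # reference implementation of [16] (pykan)

# Nine 2-depth configurations: width [168, n, 24], grid size G, cubic (k = 3) B-splines.
for n in (5, 10, 20):
    for G in (5, 10, 20):
        model = KAN(width=[168, n, 24], grid=G, k=3)
        n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
        print(f"n={n:2d}, G={G:2d} -> {n_params / 1e3:.1f}k trainable parameters")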
Similarly, the grid size within the splines of KANs has a notable impact on model performance. Larger grid sizes, when used with a significant number of nodes (n ∈ {10, 20}), consistently result in better performance. However, when the number of nodes is low (n = 5), the extra complexity of the grid size has the opposite effect. With a significant number of nodes, larger grids likely provide a more detailed basis for the spline functions, allowing the model to better accommodate variations in the data, which is crucial for capturing complex temporal traffic patterns.
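This trade-off follows directly from how KAN parameters scale. Per [16], the trainable-parameter count grows as O(N²L(G + k)), where N is the layer width, L the depth, G the grid size and k the spline order. For the 2-depth [168, n, 24] networks used here, each of the (168n + n·24) edges carries on the order of (G + k) spline coefficients, so, ignoring per-edge base weights and biases,

#params ≈ 192 n (G + k).

Increasing G therefore multiplies the cost of every edge, which only pays off when n is large enough to exploit the extra spline resolution.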
Fig. 4: Ablation comparison of KAN-specific parameters during training time. [Training loss vs. epochs for the nine n–G combinations, n ∈ {5, 10, 20} and G ∈ {5, 10, 20}.]
The best performance is observed in configurations that combine a high node count with a large grid size, such as the n = 20, G = 20 setup. This combination likely offers the highest degree of flexibility and learning capacity, making it particularly effective for modeling the intricate dependencies found in traffic data. However, this superior performance comes at the cost of potentially higher computational demands and longer training times, as more trainable parameters are included.

These findings imply that while increasing nodes and grid sizes can significantly enhance the performance of KANs, the benefits must be weighed against the increased computational requirements. For practical applications, particularly in real-time traffic management where timely responses are critical, it is essential to strike a balance: an effective approach could involve starting with moderate settings and gradually adjusting the nodes and grid sizes based on performance assessments and computational constraints. We also note that continual learning, a possibility mentioned in the original paper [16], was not assessed in this study.
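One way to operationalize this advice is a simple budget-aware sweep: start from a moderate configuration and only move to a larger one when it buys a clear validation improvement. The sketch below assumes the pykan constructor from [16]; train_and_validate is a hypothetical helper standing in for the 500-epoch Adam training loop used in this paper, and the 5% threshold is an arbitrary illustration.

from kan import KAN  # pykan [16]

def params_of(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Grow n and G from moderate settings; accept a larger model only if it cuts
# validation MSE by at least 5%.
best_mse, chosen = float("inf"), None
for n, G in [(5, 5), (10, 5), (10, 10), (20, 10), (20, 20)]:
    model = KAN(width=[168, n, 24], grid=G, k=3)
    mse = train_and_validate(model)  # hypothetical: 500 Adam epochs + validation MSE
    if mse < 0.95 * best_mse:
        best_mse, chosen = mse, (n, G, params_of(model))
print("selected (n, G, #params):", chosen)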
V. CONCLUSION

In this paper, we have performed an analysis of KANs and MLPs for satellite traffic forecasting. The results highlight several benefits of KANs, including superior forecasting performance and greater parameter efficiency: in our analysis, KANs consistently outperformed MLPs in terms of lower error metrics and achieved better results with fewer computational resources. Additionally, we explored the impact of KAN-specific parameters on performance, showing the importance of optimizing node counts and grid sizes. Given their effectiveness and efficiency, KANs appear to be a reasonable alternative to traditional MLPs in traffic management.

REFERENCES

[1] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, “Financial time series forecasting with deep learning: A systematic literature review: 2005–2019,” Applied Soft Computing, vol. 90, p. 106181, 2020.
[2] K. R. Prakarsha and G. Sharma, “Time series signal forecasting using artificial neural networks: An application on ECG signal,” Biomedical Signal Processing and Control, vol. 76, p. 103705, 2022.
[3] Z. Chen, M. Ma, T. Li, H. Wang, and C. Li, “Long sequence time-series forecasting with deep learning: A survey,” Information Fusion, vol. 97, p. 101819, 2023.
[4] X. Zhu, Y. Xiong, M. Wu, G. Nie, B. Zhang, and Z. Yang, “Weather2K: A multivariate spatio-temporal benchmark dataset for meteorological forecasting based on real-time observation data from ground weather stations,” arXiv preprint arXiv:2302.10493, 2023.
[5] G. E. Box et al., Time Series Analysis: Forecasting and Control. John Wiley & Sons, 2015.
[6] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice. OTexts, 2018.
[7] C. C. Holt, “Forecasting seasonals and trends by exponentially weighted moving averages,” International Journal of Forecasting, vol. 20, no. 1, pp. 5–10, 2004.
[8] P. R. Winters, “Forecasting sales by exponentially weighted moving averages,” Management Science, vol. 6, no. 3, pp. 324–342, 1960.
[9] B. Lim and S. Zohren, “Time-series forecasting with deep learning: a survey,” Philosophical Transactions of the Royal Society A, vol. 379, no. 2194, p. 20200209, 2021.
[10] J. F. Torres, D. Hadjout, A. Sebaa, F. Martínez-Álvarez, and A. Troncoso, “Deep learning for time series forecasting: a survey,” Big Data, vol. 9, no. 1, pp. 3–21, 2021.
[11] G. P. Zhang et al., “Neural networks for time-series forecasting,” Handbook of Natural Computing, vol. 1, p. 4, 2012.
[12] S. Hochreiter, “The vanishing gradient problem during learning recurrent neural nets and problem solutions,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 6, no. 02, pp. 107–116, 1998.
[13] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[14] A. Borovykh et al., “Conditional time series forecasting with convolutional neural networks,” arXiv preprint arXiv:1703.04691, 2017.
[15] G. Bachmann, S. Anagnostidis, and T. Hofmann, “Scaling MLPs: A tale of inductive bias,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[16] Z. Liu et al., “KAN: Kolmogorov-Arnold networks,” arXiv preprint arXiv:2404.19756, 2024.
[17] A. N. Kolmogorov, On the Representation of Continuous Functions of Several Variables by Superpositions of Continuous Functions of a Smaller Number of Variables. American Mathematical Society, 1961.
[18] J. Braun and M. Griebel, “On a constructive proof of Kolmogorov’s superposition theorem,” Constructive Approximation, vol. 30, pp. 653–675, 2009.
[19] J. Schmidt-Hieber, “The Kolmogorov–Arnold representation theorem revisited,” Neural Networks, vol. 137, pp. 119–126, 2021.
[20] I. E. Livieris, E. Pintelas, and P. Pintelas, “A CNN–LSTM model for gold price time-series forecasting,” Neural Computing and Applications, vol. 32, pp. 17351–17360, 2020.
[21] S. Mehtab and J. Sen, “Analysis and forecasting of financial time series using CNN and LSTM-based deep learning models,” in Advances in Distributed Computing and Machine Learning: Proceedings of ICADCML 2021, pp. 405–423, Springer, 2022.