
Kolmogorov-Arnold Networks (KANs) for Time Series Analysis


Cristian J. Vaca-Rubio, Luis Blanco, Roberto Pereira, Màrius Caus
Centre Tecnològic de Telecomunicacions de Catalunya (CTTC/CERCA), Castelldefels, Barcelona, Spain, 08860.
Emails: {cvaca, lblanco, rpereira, mcaus}@cttc.es

Abstract—This paper introduces a novel application of Kolmogorov-Arnold Networks (KANs) to time series forecasting, leveraging their adaptive activation functions for enhanced predictive modeling. Inspired by the Kolmogorov-Arnold representation theorem, KANs replace traditional linear weights with spline-parametrized univariate functions, allowing them to learn activation patterns dynamically. We demonstrate that KANs outperform conventional Multi-Layer Perceptrons (MLPs) in a real-world satellite traffic forecasting task, providing more accurate results with considerably fewer learnable parameters. We also provide an ablation study of the impact of KAN-specific parameters on performance. The proposed approach opens new avenues for adaptive forecasting models, emphasizing the potential of KANs as a powerful tool in predictive analytics.

Index terms—Kolmogorov-Arnold Networks, ML, Time series, Satellite

This work was funded by the European Commission under the "5G-STARDUST" Project, which received funding from the Smart Networks and Services Joint Undertaking (SNS JU) under the European Union's Horizon Europe research and innovation programme under Grant Agreement No. 101096573, and in part by the grant CHIST-ERA-20-SICT-004 (SONATA) by PCI2021-122043-2A/AEI/10.13039/501100011033. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
I. INTRODUCTION

Time series forecasting is a traditional problem that plays a key role in a wide range of fields, driving critical decision-making processes in finance, economics, medicine, meteorology, and biology, reflecting its wide applicability and significance across many domains [1]–[4]. It involves predicting future values based on previously observed data points. With this goal in mind, understanding the dynamics of time-dependent phenomena is essential and requires unveiling the patterns, trends, and dependencies hidden in the historical data. While conventional approaches have traditionally centered on parametric models grounded in domain-specific knowledge, such as autoregressive (AR), exponential smoothing, or structural time series models, contemporary Machine Learning (ML) techniques offer a pathway to discern temporal patterns solely from data-driven insights.

Non-ML methods traditionally tackle the time series forecasting problem and often rely on statistical methods to predict future values based on previously observed data. One of the most well-known techniques is the AutoRegressive Integrated Moving Average (ARIMA) model, which combines auto-regression, integration, and moving averages to forecast data. The authors in [5] detailed this approach, providing a comprehensive methodology that is foundational for subsequent statistical forecasting methods. Extensions of ARIMA, like Seasonal ARIMA (SARIMA), adapt the model to handle seasonality in data series, which is particularly useful in fields like retail and climatology [6]. Exponential Smoothing techniques constitute another popular set of traditional (non-ML-based) forecasting methods, characterized by their simplicity and effectiveness in handling data with trends and seasonality. A prominent member of this family is the so-called Holt-Winters seasonal technique, which adjusts the model parameters in response to changes in trend and seasonality within the time series data [7], [8]. These models have been widely used for their efficiency, interpretability, and ease of implementation.

More recently, ML models have significantly impacted the forecasting landscape by handling large datasets and capturing complex nonlinear relationships that traditional methods cannot. In recent years, Deep Learning (DL)-based forecasting models [9], [10] have gained popularity, motivated by their notable achievements in many fields. For instance, neural networks have been extensively studied due to their flexibility and adaptability. Simple Multi-Layer Perceptrons (MLPs) were among the first to be applied to forecasting problems, demonstrating significant potential in non-linear data modeling [3], [11].

Built upon these light models, more complex architectures have progressively expanded the capabilities of neural networks in time series forecasting. Typical examples are recurrent neural network architectures such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), which are designed to maintain information in memory for long periods without the risk of vanishing gradients, a common issue in traditional recurrent networks [12], [13]. On a related note, Convolutional Neural Networks (CNNs), which are fundamentally inspired by MLPs, are also extensively employed in time series forecasting. These architectures are particularly efficient at processing temporal sequences due to their strong spatial pattern recognition capabilities. The combination of CNNs with LSTMs has resulted in models that efficiently process both spatial and temporal dependencies, enhancing forecasting accuracy [14]. These models have started to outperform established benchmarks in complex forecasting tasks, motivating a significant shift towards more complex network structures. Unfortunately, as the majority of the models mentioned above are inspired by the MLP architecture, they tend to have poor scaling laws [15], i.e., the number of parameters in MLP networks does not scale linearly with the number of layers, and they often lack interpretability.

A recent study in reference [16], which caught the attention of the research community, introduces Kolmogorov-Arnold Networks (KANs), a novel neural network architecture designed to potentially replace traditional multilayer perceptrons. KANs represent a disruptive paradigm shift and, as a potential game changer, have recently attracted the interest of the AI community worldwide. They are inspired by the Kolmogorov-Arnold representation theorem [17]–[19]. Unlike MLPs, which are inspired by the universal approximation theorem, KANs take advantage of this representation theorem to generate a different architecture. They innovate by replacing linear weights with spline-based univariate functions along the edges of the network, which are structured as learnable activation functions. This design not only enhances the accuracy and interpretability of the networks, but also enables them to achieve comparable or superior results with smaller network sizes across various tasks, such as data fitting and solving partial differential equations. While KANs show promise in improving the efficiency and interpretability of neural network architectures, the study acknowledges the necessity for further research into their robustness when applied to diverse datasets and their compatibility with other deep learning architectures. These areas are crucial for understanding the full potential and limitations of KANs.

Our paper is a prospective study that investigates the application of KANs to time series forecasting. We aim to evaluate the practicality of KANs in real-world scenarios, which, to the best of the authors' knowledge, has not previously been explored in the literature, analyzing their efficiency in terms of the number of trainable parameters and discussing how the additional degrees of freedom might affect forecasting performance. Herein, we assess the performance using real-world satellite traffic data. This exploration seeks to further validate KANs as a versatile tool in advanced neural network design for time series forecasting, although more comprehensive studies are required to optimize their use across broader applications. Finally, we note that due to the early stage of KANs, it is fair to compare them as a potential alternative to MLPs, but further investigation is needed to develop more complex solutions that can compete with advanced architectures such as LSTMs, GRUs, and CNNs, which are already well established on top of MLP-based architectures [20], [21].

This paper is structured as follows. Section 2 presents the problem statement, providing fundamental background on the Kolmogorov-Arnold representation theorem, and describes our generalized KANs for time series forecasting. Section 3 introduces the experimental setup. Simulation results analyzing the performance of KANs on real-world datasets are shown in Section 4. Finally, concluding remarks are provided in Section 5.

II. PROBLEM STATEMENT

We formulate the traffic forecasting problem over a time series whose value at time t is represented by y_t. Our objective is to predict the future values of the series

$y_{t_0:T} = [y_{t_0}, y_{t_0+1}, \ldots, y_{t_0+T}]$   (1)

based solely on its historical values

$x_{t_0-c:t_0-1} = [x_{t_0-c}, \ldots, x_{t_0-2}, x_{t_0-1}],$   (2)

where t_0 denotes the starting point from which the future values y_t, t = t_0, ..., T, are to be predicted. We differentiate the historical time range [t_0 - c, t_0 - 1] and the forecast range [t_0, T] as the context and prediction lengths, respectively. Our approach focuses on generating point forecasts for each time step in the prediction length, aiming to achieve accurate and reliable forecasts. Figure 1 shows an exemplary time series.

[Fig. 1: Example of normalized satellite traffic series data with the conditioning and prediction lengths denoted in blue and red, respectively.]
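To make this setup concrete, the following minimal sketch (an illustration in Python/NumPy; the helper name make_windows and the synthetic series are hypothetical, not part of our pipeline) extracts context/prediction pairs from a single normalized series:

```python
import numpy as np

def make_windows(series: np.ndarray, c: int, T: int):
    """Slice a univariate series into (context, prediction) pairs.

    c: context length (historical values fed to the model)
    T: prediction length (future values to be forecast)
    """
    X, Y = [], []
    for t0 in range(c, len(series) - T + 1):
        X.append(series[t0 - c:t0])   # x_{t0-c:t0-1}, the context window
        Y.append(series[t0:t0 + T])   # the T future values to predict
    return np.stack(X), np.stack(Y)

# Hourly traffic with a one-week context and a one-day prediction length.
series = np.random.rand(24 * 30)      # placeholder for a real traffic trace
X, Y = make_windows(series, c=168, T=24)
print(X.shape, Y.shape)               # (n_windows, 168), (n_windows, 24)
```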
A. Kolmogorov-Arnold representation background

Contrary to MLPs, which are based on the universal approximation theorem, KANs rely on the Kolmogorov-Arnold representation theorem, also known as the Kolmogorov-Arnold superposition theorem, a fundamental result in the theory of dynamical systems and ergodic theory. It was independently formulated by Andrey Kolmogorov and Vladimir Arnold in the mid-20th century.

The theorem states that any multivariate continuous function f, which depends on x = [x_1, x_2, ..., x_n], on a bounded domain, can be represented as a finite composition of simpler continuous functions involving only one variable. Formally, a real, smooth, and continuous multivariate function f(x) : [0, 1]^n → R can be represented by the finite superposition of univariate functions [17]:

$f(\mathbf{x}) = \sum_{i=1}^{2n+1} \Phi_i \left( \sum_{j=1}^{n} \phi_{i,j}(x_j) \right),$   (3)

where Φ_i : R → R and ϕ_{i,j} : [0, 1] → R denote the so-called outer and inner functions, respectively. One might initially perceive this development as highly advantageous for ML: the task of learning a high-dimensional function simplifies to learning a polynomial number of one-dimensional functions. Nevertheless, these one-dimensional functions can exhibit non-smooth characteristics, rendering them potentially unlearnable in practical contexts. As a result of this problematic behavior, the Kolmogorov-Arnold representation theorem has traditionally been disregarded in machine learning circles, recognized as theoretically solid but ineffective in practice. Unexpectedly, the theoretical result in [16] has recently emerged as a potential game changer, paving the way for new network architectures inspired by the Kolmogorov-Arnold theorem.
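As a worked illustration of the structure of (3) (a toy example of ours, not taken from [16]): the product f(x_1, x_2) = x_1 x_2 on (0, 1]^2 admits a superposition with a single outer term, f = exp(log x_1 + log x_2), i.e., Φ = exp and ϕ_{1,j} = log:

```python
import math

# Toy Kolmogorov-Arnold-style decomposition of f(x1, x2) = x1 * x2:
# inner functions phi(x) = log(x), outer function Phi(s) = exp(s), so
# f(x) = Phi(phi(x1) + phi(x2)). The theorem guarantees that, in general,
# 2n + 1 outer terms suffice for any continuous f on a bounded domain.
phi, Phi = math.log, math.exp

def f(x1: float, x2: float) -> float:
    return Phi(phi(x1) + phi(x2))

assert abs(f(0.3, 0.7) - 0.3 * 0.7) < 1e-12
```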
B. Kolmogorov-Arnold network background

The authors in [16] note that equation (3) has two layers of non-linearities, with 2n + 1 terms in the middle layer. Thus, we only need to find the proper inner univariate functions ϕ_{i,j} and outer functions Φ_i that approximate the function. The one-dimensional inner functions ϕ_{i,j} can be approximated using B-splines. A spline is a smooth curve defined by a set of control points or knots. Splines are often used to interpolate or approximate data points in a smooth and continuous manner. A spline is defined by its order k (k = 3 is a common value), which refers to the degree of the polynomial functions used to interpolate or approximate the curve between control points. The number of intervals, denoted by G, refers to the number of segments or subintervals between adjacent control points. In spline interpolation, the data points are connected by these segments to form a smooth curve (of G + 1 grid points). Although splines other than B-splines could also be considered, this is the approach proposed in [16].
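The sketch below illustrates this parametrization (assuming SciPy's B-spline utilities; the uniform grid construction and random coefficients are our simplification of the scheme in [16]):

```python
import numpy as np
from scipy.interpolate import BSpline

k, G = 3, 5                          # spline order and number of grid intervals
# Uniform grid of G intervals (G + 1 grid points) on [0, 1], with k repeated
# boundary knots on each side, as is standard for a B-spline basis.
grid = np.linspace(0.0, 1.0, G + 1)
knots = np.concatenate([np.full(k, grid[0]), grid, np.full(k, grid[-1])])
n_basis = len(knots) - k - 1         # number of basis functions (here G + k)

# A learnable 1-D activation phi(x) is a linear combination of the basis;
# in a KAN, these coefficients are the trainable parameters of one edge.
coeffs = np.random.randn(n_basis)
phi = BSpline(knots, coeffs, k)

print(phi(np.linspace(0.0, 1.0, 7)))  # a smooth, piecewise-polynomial curve
```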
Equation (3) can be represented as a two-layer (or, analogously, 2-depth) network, with activation functions placed at the edges (instead of at the nodes) and nodes performing a simple summation. Such a two-layer network is too simplistic to effectively approximate any arbitrary function with smooth splines. For this reason, reference [16] extends the ideas discussed above by proposing a generalized architecture with wider and deeper KANs.

A KAN layer is defined by a matrix Φ [16] composed of univariate functions {ϕ_{i,j}(·)} with i = 1, ..., N_in and j = 1, ..., N_out, where N_in and N_out denote the number of inputs and the number of outputs, respectively, and ϕ_{i,j} are the trainable spline functions described above. Note that, according to the previous definition, the Kolmogorov-Arnold representation theorem presented in Section II-A can be expressed as a two-layer KAN: the inner functions constitute a KAN layer with N_in = n and N_out = 2n + 1, while the outer functions constitute another KAN layer with N_in = 2n + 1 and N_out = 1.

Let us define the shape of a KAN by [n_1, ..., n_{L+1}], where L denotes the number of layers of the KAN. It is worth noting that the Kolmogorov-Arnold theorem corresponds to a KAN of shape [n, 2n + 1, 1]. A generic deeper KAN can be expressed as the composition of L layers:

$\mathbf{y} = \mathrm{KAN}(\mathbf{x}) = (\Phi_L \circ \Phi_{L-1} \circ \cdots \circ \Phi_1)(\mathbf{x}).$   (4)
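A heavily simplified sketch of one such layer Φ follows (an illustrative stand-in, not the implementation of [16]; it assumes PyTorch, and Gaussian bumps on a fixed grid replace the B-spline basis to keep the code short):

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """One KAN layer: y_j = sum_i phi_{i,j}(x_i), where each edge function
    phi_{i,j} is a learnable combination of fixed 1-D basis functions."""

    def __init__(self, n_in: int, n_out: int, G: int = 5):
        super().__init__()
        # Fixed, uniform grid of basis-function centres on [0, 1].
        self.register_buffer("centers", torch.linspace(0.0, 1.0, G + 1))
        # One trainable coefficient per (input, output, basis function).
        self.coef = nn.Parameter(0.1 * torch.randn(n_in, n_out, G + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_in). Evaluate every basis function at every input.
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2) / 0.02)
        # Sum phi_{i,j}(x_i) over the inputs i for each output j.
        return torch.einsum("bif,iof->bo", basis, self.coef)

y = ToyKANLayer(4, 3)(torch.rand(8, 4))   # -> shape (8, 3)
```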
Notice that all the operations are differentiable. Consequently, KANs can be trained with backpropagation. Despite their elegant mathematical foundation, KANs are simply combinations of splines and MLPs, which effectively exploit each other's strengths while mitigating their respective weaknesses. Splines stand out for their accuracy on low-dimensional functions and allow transitions between various resolutions. Nevertheless, they suffer from a major dimensionality problem due to their inability to effectively exploit compositional structures. In contrast, MLPs suffer less from the dimensionality problem, owing to their ability to learn features, but exhibit lower accuracy than splines in low dimensions due to their inability to optimize univariate functions effectively. KANs have, by construction, two levels of degrees of freedom. Consequently, KANs possess the capability not only to acquire features, owing to their external resemblance to MLPs, but also to optimize these acquired features with a high degree of accuracy, facilitated by their internal resemblance to splines. To learn features accurately, KANs not only capture compositional structure (external degrees of freedom) but also effectively approximate univariate functions (internal degrees of freedom, via the splines). It should be noted that by increasing the number of layers L or the dimension of the grid G, we increase the number of parameters and, consequently, the complexity of the network. This approach constitutes an alternative to traditional DL models, which currently rely on MLP architectures, and it motivates our extension of this work.

C. KAN time series forecasting network

We frame our traffic forecasting problem as a supervised learning task over a training dataset with input-output pairs {x_{t_0-c:t_0-1}, y_{t_0:T}} in the condition and prediction lengths. We want to find f that approximates y_{t_0:T}, i.e., y_{t_0:T} ≈ f(x_{t_0-c:t_0-1}). For ease of notation, we describe our framework as a two-layer (2-depth) KAN [N_i, n, N_o] (note that, to comply with the notation of the original paper, the input layer is not counted as a layer per se). The output and input layers comprise N_o and N_i nodes, corresponding to the total number of time steps in (1) and (2), respectively, while the transformation/hidden layer has n nodes. The inner functions constitute a KAN layer with N_in = N_i and N_out = n, while the outer functions constitute another KAN layer with N_in = n and N_out = N_o. Our KAN can be expressed as the composition of two layers:

$\mathbf{y} = \mathrm{KAN}(\mathbf{x}) = (\Phi_2 \circ \Phi_1)(\mathbf{x}),$   (5)

where the output functions Φ_2 generate the N_o output values corresponding to (1) by transforming the outputs of the previous layer, i.e., we predict the T time steps. The proposed network can be used to forecast future traffic data in the prediction length, based solely on the context length.

Fig. 2 shows a generic representation of the flow of information for an arbitrary number of layers L, as presented in (4).

[Fig. 2: Example of the flow of information in the KAN network architecture for our traffic forecasting task (context length in, network structure, prediction length out). Learnable activations are represented inside square boxes.]
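For reference, such a 2-depth forecasting KAN can be instantiated directly with the pykan package released with [16] (a sketch assuming its published interface; the sizes N_i = 168, n = 40, N_o = 24 follow the configurations of Table I below):

```python
# pip install pykan   (reference implementation accompanying [16])
import torch
from kan import KAN

# 2-depth KAN of shape [N_i, n, N_o] = [168, 40, 24]: 168 context hours in,
# 24 forecast hours out, with B-splines of order k = 3 on G = 5 grid
# intervals parametrizing every edge activation.
model = KAN(width=[168, 40, 24], grid=5, k=3)

x = torch.rand(8, 168)    # a batch of normalized context windows
y_hat = model(x)          # -> forecasts of shape (8, 24)
```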
III. EXPERIMENTAL SETUP
TABLE I: Model configurations for satellite traffic forecasting

| Model         | Configuration                 | Time horizon (h)           | Spline details         | Activations  |
|---------------|-------------------------------|----------------------------|------------------------|--------------|
| MLP (3-depth) | [168, 300, 300, 300, 24]      | Context/Prediction: 168/24 | N/A                    | ReLU (fixed) |
| MLP (4-depth) | [168, 300, 300, 300, 300, 24] | Context/Prediction: 168/24 | N/A                    | ReLU (fixed) |
| KAN (3-depth) | [168, 40, 40, 24]             | Context/Prediction: 168/24 | B-spline, k = 3, G = 5 | learnable    |
| KAN (4-depth) | [168, 40, 40, 40, 24]         | Context/Prediction: 168/24 | B-spline, k = 3, G = 5 | learnable    |
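As a sanity check on these configurations, the MLP baselines are plain fixed-activation networks; a short sketch (our assumption of how such a baseline is built, in PyTorch) reproduces the parameter counts reported later in Table II:

```python
import torch.nn as nn

def mlp(sizes):
    """Fixed-activation MLP per Table I, e.g. sizes = [168, 300, 300, 300, 24]."""
    layers = []
    for a, b in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(a, b), nn.ReLU()]
    return nn.Sequential(*layers[:-1])       # no activation on the output layer

model = mlp([168, 300, 300, 300, 300, 24])   # the 4-depth configuration
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e3:.0f}k parameters")   # ~329k, matching Table II
```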

The dataset has been generated within the context of the European project 5G-STARDUST. The inputs are obtained from a satellite operator (SO) as a result of processing real information from a GEO satellite communication system, which provisions broadband services. The dataset is a long time series capturing aggregated traffic data. To preserve privacy, anonymous clients have been defined with more than 500 connected users, and the traffic has been normalized. The measurements are one month long, and the time granularity is 1 hour. The traffic has been extracted per satellite beam in Megabits per second (Mbps). Although the data has been collected using a GEO satellite communication system, it is expected that the user needs could be used to address LEO systems as well. It is worth emphasizing that the collected data can be used for AI-driven predictive analysis to forecast traffic conditions, which is essential to avoid congestion and to make efficient use of satellite resources. Endowing the network with intelligence will be beneficial to meet the different demands of satellite applications.

We aim to investigate the forecasting performance of different KAN and MLP architectures for predicting satellite traffic over a total of six beam areas. Concretely, we have a context length of 168 hours (one week) and a prediction length of 24 hours (one day). This translates to T = 24 and c = 168, with t_0 + T = 192 in (1) and (2). Our focus is on evaluating the efficacy of KAN models compared to traditional MLPs.¹ We designed our experiments to compare models with similar depths but varying architectures to analyze their impact on forecasting accuracy and parameter efficiency. Table I summarizes the parameters selected for this evaluation.

We have data for the six beams over one month. We use two weeks plus one day for training and one week plus one day for testing, for all the different beams in the dataset. These test series were not seen by the network during training. We train all the networks for 500 epochs with the Adam optimizer and a learning rate of 0.001. The selected loss function minimizes the mean absolute error (MAE) of the values over the prediction length.

¹ As KANs are in their infancy, we remark that this comparison is fair, as opposed to comparing against more complex architectures such as LSTMs.
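A minimal sketch of this training recipe (assuming PyTorch; the full-batch updates and variable names are our simplification, with model, X, and Y as in the earlier sketches):

```python
import torch
import torch.nn as nn

def train(model, X, Y, epochs=500, lr=1e-3):
    """Training setup reported above: Adam, lr = 0.001, 500 epochs,
    mean-absolute-error loss over the prediction length."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()                  # MAE
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), Y)        # X: (N, 168) contexts, Y: (N, 24)
        loss.backward()
        opt.step()
    return model
```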
IV. SIMULATION RESULTS

A. Performance analysis

We analyze the forecasting performance over the prediction length for different beams of the test set. Figures 3a-c depict the real traffic values used as input (in green) to the networks, the expected output over the prediction length (in blue), and the values predicted using a KAN (in red) and an MLP (in purple), both of depth 4; see Table I for details on the model configurations. In general, our results show that the predictions obtained using KANs approximate the real traffic values better than the predictions obtained using traditional MLPs.

[Fig. 3: Satellite traffic over three different beams with their forecasted values using a 4-depth KAN and a 4-depth MLP: (a) forecast over beam 1, (b) forecast over beam 2, (c) forecast over beam 3. Each panel shows the normalized traffic (past and future), both forecasts, and a zoomed-in view of the prediction horizon.]

This is particularly evident in Figure 3a. Here, the KAN accurately matches rapid changes in traffic volume, which the MLP models sometimes moderately over- or under-predict, as the last part of the forecast shows. This capability suggests that KANs are better suited to adapt to sudden shifts in traffic conditions, a critical aspect of effective traffic management. Additionally, the responsiveness of KANs is particularly noticeable in Figure 3b during fast-changing traffic conditions: the KAN rapidly adjusts its forecast to align closely with the actual traffic pattern. This is particularly noticeable in the last 6 hours of the prediction length, where the MLP exhibits a lag and fails to capture these immediate fluctuations, showing its weaker ability to capture dynamic traffic variations. Further analysis is shown in Figure 3c, where traffic conditions are more variable and intense, demonstrating the robustness of the KAN in maintaining high performance despite the complexity and higher volume. This robustness suggests that KANs can manage different scales and intensities of traffic data more effectively than MLPs, making them more reliable for deployment in varied traffic scenarios.

To further quantify the performance and advantages of using KANs for the satellite traffic forecasting task, we present Table II, which shows a detailed comparison of the different MLP and KAN architectures evaluated over all the beams.
TABLE II: Results summary

| Model         | MSE (×10⁻³) | RMSE (×10⁻²) | MAE (×10⁻²) | MAPE | Parameters |
|---------------|-------------|--------------|-------------|------|------------|
| MLP (3-depth) | 6.34        | 7.96         | 5.41        | 0.64 | 238k       |
| MLP (4-depth) | 6.12        | 7.82         | 5.55        | 1.05 | 329k       |
| KAN (3-depth) | 5.99        | 7.73         | 5.51        | 0.62 | 93k        |
| KAN (4-depth) | 5.08        | 7.12         | 5.06        | 0.52 | 109k       |
The table displays the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the number of trainable parameters for each model. Analyzing the error metrics, it becomes clear that KANs outperform MLPs, with the KAN (4-depth) performing best. Its lower values in MSE and RMSE indicate a better ability to predict traffic volumes with lower deviation. Similarly, its lower values in MAE and MAPE suggest that KANs not only provide more accurate predictions but also maintain consistency across different traffic volumes, which is crucial for practical traffic forecasting scenarios.
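These metrics can be computed from the test-set forecasts as follows (a straightforward NumPy sketch using the standard definitions, which we assume throughout; the epsilon guard is our addition):

```python
import numpy as np

def metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    return {
        "MSE":  mse,
        "RMSE": np.sqrt(mse),
        "MAE":  np.mean(np.abs(err)),
        # MAPE is undefined where y_true = 0; a small epsilon guards the
        # normalized traffic values (our safeguard, not from the paper).
        "MAPE": np.mean(np.abs(err) / np.maximum(np.abs(y_true), 1e-8)),
    }
```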
Furthermore, the parameter count reveals a significant difference in model complexity. KAN models are notably more parameter-efficient, with the KAN (4-depth) utilizing only 109k parameters compared to 329k parameters for the MLP (4-depth) or 238k for the MLP (3-depth). This reduced complexity suggests that KANs can achieve higher or comparable forecasting accuracy with simpler and potentially lighter models.

Such efficiency is especially valuable in scenarios where computational resources are limited or where rapid model deployment is required. The results also show that an increase of 16k parameters in the KAN significantly improves performance, whereas for the MLPs an increment of 91k parameters does not yield a significant improvement.

From a technical perspective, KANs leverage a theoretical foundation that provides an intrinsic advantage in modeling complex, non-linear patterns typical in traffic systems. This capability likely contributes to their flexibility and accuracy in traffic forecasting. The consistency in performance across diverse conditions also suggests that KANs have strong generalization capabilities, which is essential for models used in geographically varied locations under different traffic conditions. Moreover, besides obtaining lower error rates, our results also suggest that KANs can do so with a considerably smaller number of parameters than traditional MLP networks.

B. KANs parameter-specific analysis

We provide an analysis of how different configurations of nodes and grid sizes affect the performance of KANs, particularly in the context of traffic forecasting. For this analysis, we designed 2-depth KANs [168, n, 24] with n ∈ {5, 10, 20} and varying grid sizes G ∈ {5, 10, 20}, for a B-spline of order k = 3. These results are reported during training time.
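This ablation amounts to a simple sweep over the two KAN-specific hyperparameters, as sketched below (again assuming the pykan interface from the earlier sketch; here we only enumerate the configurations and their parameter counts, while the full experiment trains each one as described in Section III):

```python
from kan import KAN

# Sweep the 2-depth [168, n, 24] KANs over node counts and grid sizes
# (B-splines of order k = 3), mirroring the ablation of Section IV-B.
for n in (5, 10, 20):
    for G in (5, 10, 20):
        model = KAN(width=[168, n, 24], grid=G, k=3)
        n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
        # In the full experiment each configuration is trained for 500
        # epochs and its training loss is tracked, as shown in Figure 4.
        print(f"n={n:2d}, G={G:2d} -> {n_params} trainable parameters")
```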
Figure 4 shows a clear trend where increasing the number of nodes generally results in lower loss values. This indicates that higher node counts are more effective at capturing the complex patterns in traffic data, thus improving performance. For instance, configurations with n = 20 demonstrate significantly lower losses across all grid sizes compared to those with fewer nodes.

[Fig. 4: Ablation comparison of KAN-specific parameters during training time. The plot ("KANs: Loss across nodes and grid sizes") shows the training loss versus epochs for all combinations of n ∈ {5, 10, 20} and G ∈ {5, 10, 20}.]

Similarly, the grid size within the splines of KANs has a notable impact on model performance. Larger grid sizes, when used with a significant number of nodes (n ∈ {10, 20}), consistently result in better performance. However, when the number of nodes is low (n = 5), the extra complexity of the grid size has the opposite effect. With a significant number of nodes, larger grids likely provide a more detailed basis for the spline functions, allowing the model to better accommodate variations in the data, which is crucial for capturing complex temporal traffic patterns.
The best performance is observed in configurations that combine a high node count with a large grid size, such as the n = 20 and G = 20 setup. This combination likely offers the highest degree of flexibility and learning capacity, making it particularly effective for modeling the intricate dependencies found in traffic data. However, this superior performance comes at the cost of potentially higher computational demands and longer training times, as more trainable parameters are included.

These findings imply that while increasing nodes and grid sizes can significantly enhance the performance of KANs, these benefits must be weighed against the increased computational requirements. For practical applications, particularly in real-time traffic management where timely responses are critical, it is essential to strike a balance. An effective approach could involve starting with moderate settings and gradually adjusting the nodes and grid sizes based on performance assessments and computational constraints. Besides, we want to highlight that continual learning, a possibility mentioned in the original paper [16], was not assessed in this study.

V. CONCLUSION

In this paper, we have performed an analysis of KANs and
MLPs for satellite traffic forecasting. The results highlighted
several benefits of KANs, including superior forecasting per-
formance and greater parameter efficiency. In our analysis, we
showed that KANs consistently outperformed MLPs, achieving lower error metrics while requiring fewer computational resources. Additionally, we explored the impact of KAN-specific parameters on performance. This study
showcases the importance of optimizing node counts and grid
sizes to enhance model performance. Given their effectiveness
and efficiency, KANs appear to be a reasonable alternative to
traditional MLPs in traffic management.

REFERENCES

[1] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: A systematic literature review: 2005–2019," Applied Soft Computing, vol. 90, p. 106181, 2020.
[2] K. R. Prakarsha and G. Sharma, "Time series signal forecasting using artificial neural networks: An application on ECG signal," Biomedical Signal Processing and Control, vol. 76, p. 103705, 2022.
[3] Z. Chen, M. Ma, T. Li, H. Wang, and C. Li, "Long sequence time-series forecasting with deep learning: A survey," Information Fusion, vol. 97, p. 101819, 2023.
[4] X. Zhu, Y. Xiong, M. Wu, G. Nie, B. Zhang, and Z. Yang, "Weather2K: A multivariate spatio-temporal benchmark dataset for meteorological forecasting based on real-time observation data from ground weather stations," arXiv preprint arXiv:2302.10493, 2023.
[5] G. E. Box et al., Time Series Analysis: Forecasting and Control. John Wiley & Sons, 2015.
[6] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice. OTexts, 2018.
[7] C. C. Holt, "Forecasting seasonals and trends by exponentially weighted moving averages," International Journal of Forecasting, vol. 20, no. 1, pp. 5–10, 2004.
[8] P. R. Winters, "Forecasting sales by exponentially weighted moving averages," Management Science, vol. 6, no. 3, pp. 324–342, 1960.
[9] B. Lim and S. Zohren, "Time-series forecasting with deep learning: a survey," Philosophical Transactions of the Royal Society A, vol. 379, no. 2194, p. 20200209, 2021.
[10] J. F. Torres, D. Hadjout, A. Sebaa, F. Martínez-Álvarez, and A. Troncoso, "Deep learning for time series forecasting: a survey," Big Data, vol. 9, no. 1, pp. 3–21, 2021.
[11] G. P. Zhang et al., "Neural networks for time-series forecasting," Handbook of Natural Computing, vol. 1, p. 4, 2012.
[12] S. Hochreiter, "The vanishing gradient problem during learning recurrent neural nets and problem solutions," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 6, no. 2, pp. 107–116, 1998.
[13] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[14] A. Borovykh et al., "Conditional time series forecasting with convolutional neural networks," arXiv preprint arXiv:1703.04691, 2017.
[15] G. Bachmann, S. Anagnostidis, and T. Hofmann, "Scaling MLPs: A tale of inductive bias," Advances in Neural Information Processing Systems, vol. 36, 2024.
[16] Z. Liu et al., "KAN: Kolmogorov-Arnold networks," arXiv preprint arXiv:2404.19756, 2024.
[17] A. N. Kolmogorov, On the Representation of Continuous Functions of Several Variables by Superpositions of Continuous Functions of a Smaller Number of Variables. American Mathematical Society, 1961.
[18] J. Braun and M. Griebel, "On a constructive proof of Kolmogorov's superposition theorem," Constructive Approximation, vol. 30, pp. 653–675, 2009.
[19] J. Schmidt-Hieber, "The Kolmogorov–Arnold representation theorem revisited," Neural Networks, vol. 137, pp. 119–126, 2021.
[20] I. E. Livieris, E. Pintelas, and P. Pintelas, "A CNN–LSTM model for gold price time-series forecasting," Neural Computing and Applications, vol. 32, pp. 17351–17360, 2020.
[21] S. Mehtab and J. Sen, "Analysis and forecasting of financial time series using CNN and LSTM-based deep learning models," in Advances in Distributed Computing and Machine Learning: Proceedings of ICADCML 2021, pp. 405–423, Springer, 2022.