
Received 17 November 2024; accepted 9 December 2024; date of publication 16 December 2024; date of current version 24 December 2024.


Digital Object Identifier 10.1109/ACCESS.2024.3518205

Tabular Data Classification and Regression: XGBoost or Deep Learning With Retrieval-Augmented Generation
JONINDO PASARIBU, NOVANTO YUDISTIRA, AND WAYAN FIRDAUS MAHMUDY
Faculty of Computer Science, Brawijaya University, Malang, East Java 65145, Indonesia
Corresponding author: Novanto Yudistira ([email protected])

ABSTRACT Tabular data is the most prevalent form of structured data, necessitating robust models
for classification and regression tasks. Traditional models like eXtreme Gradient Boosting (XGBoost)
have gained popularity for their strong performance, while deep learning models such as Tabular
Retrieval-Augmented Generation (TabR) and TabNet offer innovative approaches. TabR uniquely employs
Retrieval-Augmented Generation (RAG) to reduce uncertainty and enhance predictive accuracy, whereas
TabNet relies on a sequential attention mechanism without incorporating RAG. This study systematically
compares TabR and TabNet against XGBoost in classification and regression tasks using benchmark datasets, with evaluations
based on accuracy, Area Under the Curve (AUC), Mean Absolute Error (MAE), Root Mean Squared Error
(RMSE), and R2 (Coefficient of Determination). The results reveal that TabR, with its RAG component,
outperforms XGBoost in classification, effectively managing uncertainty. However, in regression tasks,
XGBoost continues to excel over TabR. Meanwhile, TabNet performs comparably but lacks the performance
enhancement provided by RAG in TabR. These findings highlight the potential of RAG in deep learning
models for tabular data classification and suggest areas for further exploration in improving regression
performance.

INDEX TERMS Deep learning models, retrieval-augmented generation (RAG), equilibrium, machine
learning, TabNet, TabR, tabular data, XGBoost.

I. INTRODUCTION
In real-world applications, tabular data is the predominant form of structured data, necessitating the development of robust and accurate classification models to handle diverse tasks across various domains [1]. Currently, traditional classification models, such as eXtreme Gradient Boosting (XGBoost), are popular and widely used for handling classification tasks on tabular data compared to deep learning models. XGBoost’s ability to handle categorical features, manage missing values, and leverage ensemble learning often makes it a solid choice [2]. XGBoost is known as one of the most optimal models for tabular data classification [3] and has consistently outperformed several models—including traditional machine learning models [4], neural networks (NNs) [5], and deep learning models [2], [6]—in classification tasks. Additionally, XGBoost has been shown to enhance the performance of deep learning models when combined with them, either through hybrid models [7], [8], [9] or hybrid algorithms [10], [11], [12], which leverage the strengths of both approaches for improved accuracy and efficiency.
Tabular data typically consists of features that can be numerical, categorical, or a combination of various data types. The scale and distribution of these features can vary widely, encompassing binary, ordinal, and continuous data. As a result, deep learning models, which are generally more effective for data with strong spatial or temporal structures (such as images or text sequences), struggle to handle the diversity and scale variability in tabular data.

(The associate editor coordinating the review of this manuscript and approving it for publication was Vlad Diaconita.)

However, the emergence of deep learning models like TabR and TabNet challenges XGBoost’s dominance in terms of accuracy and adaptability to complex data structures. In fact, recent developments in these models—along with others like ConvXGB [13], Neural Oblivious Decision Ensembles (NODE) [14], and Self-Attention and Intersample Attention Transformer (SAINT) [15]—have demonstrated their capability to outperform XGBoost, proving their potential in handling complex tabular data scenarios.
The TabR deep learning model leverages the power of neural networks to extract more abstract feature representations from tabular data [16], while TabNet employs a sequential attention mechanism to identify and emphasize important features for processing at each decision-making step [17]. TabR is a newly developed model and is not yet widely adopted. However, there is research involving a similar mechanism to TabR, the retrieval mechanism, applied to a model called TabPFN (Tabular Prior-Formation Network) developed by Breejen et al. [18]. On the other hand, TabNet has been applied to tabular data classification in various studies. For example, in research by Chen et al. [19], TabNet was used for food safety risk prediction by integrating gray relational analysis (GRA) to estimate food safety risks more accurately. Another study by Jin et al. [20] demonstrated that TabNet could be used for Alzheimer’s disease classification with high accuracy, where it outperformed other machine learning models such as Support Vector Machine (SVM) and Random Forest (RF). Additionally, in research by McDonnell et al. [21], TabNet was used for insurance claim prediction using telematics data, showing superior performance compared to models like XGBoost and logistic regression in terms of accuracy and model interpretability.
TabR and TabNet bring architectural advantages that allow them to better handle the complexity and diversity of tabular data than conventional deep learning models. Features like retrieval augmentation, attention mechanisms, and sparse feature selection enable these models to overcome issues of interpretability, overfitting, and computational efficiency that typically hinder deep learning performance on tabular data. Therefore, TabR and TabNet emerge as more suitable choices for problems involving tabular data compared to other general deep learning models. However, despite the great potential these models offer, there is no consensus on which model is superior for tabular data classification tasks.
This study aims to compare the performance of TabR and TabNet in the context of tabular data classification using several datasets from Grinsztajn et al. [22] in their study titled ‘‘Why do tree-based models still outperform deep learning on tabular data?’’, which are commonly used as a benchmark for evaluating the performance of models on tabular data. By understanding and comparing the strengths and weaknesses of both models, this research seeks to provide clearer insights into the application of deep learning models on tabular data.
In line with the advancement of information technology and the growing need for more complex data processing, a deep understanding of the performance of deep learning models on tabular data is crucial. Therefore, this research is highly relevant in supporting the development of effective and efficient solutions to address the challenges of tabular data classification in today’s digital era.

A. CONTRIBUTIONS
Our research makes significant contributions to the field of tabular data classification using deep learning models by exploring both novel methodologies and enhancements to existing techniques. First, we conducted a comprehensive comparative analysis of three state-of-the-art models: XGBoost, TabR (Tabular Retrieval-Augmented Generation), and TabNet, to assess their performance across different benchmark datasets. This comparative study provided valuable insights into the strengths and weaknesses of each model, highlighting scenarios where one outperforms the others. Such insights are crucial for practitioners and researchers to make informed decisions about model selection based on the specific characteristics of their data.
We tested these models using both classification and regression datasets, allowing us to evaluate their versatility and adaptability across various types of tabular data tasks. By incorporating this dual approach, our study provides a thorough understanding of how each model handles different problem domains, demonstrating their relative strengths in either classification or regression. This dual evaluation approach illustrates the comprehensive capabilities of each model and underscores the contexts in which deep learning models can either complement or outperform traditional methods like XGBoost.
Furthermore, we introduced improvements to the TabR model by experimenting with modifications in its architecture. One notable enhancement was changing the activation function from Rectified Linear Unit (ReLU) to sigmoid. This adjustment resulted in better performance in certain cases, demonstrating the potential benefits of fine-tuning model configurations for specific tasks. The ability to switch between different activation functions provides flexibility in optimizing TabR’s performance across various types of tabular data, accommodating the unique patterns and relationships within each dataset.
In addition to architectural modifications, we integrated the deep equilibrium (DEQ) method into the TabR model. DEQ enables more robust feature representation and facilitates the handling of complex transformations, which are often required in tabular data with intricate feature interactions. The integration of DEQ showed promising results in enhancing model performance, though it also indicated the need for further refinement to fully leverage its capabilities. By incorporating DEQ, we introduced a novel approach to improving feature extraction and representation in deep


learning models for tabular data, paving the way for future exploration and optimization.
To comprehensively evaluate the performance of the models, we employed a range of evaluation metrics, including accuracy and AUC (Area Under the Curve) for classification tasks, as well as MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and R2 (Coefficient of Determination) for regression tasks. This multi-metric evaluation approach provided a holistic view of how each model performs under different conditions, offering a nuanced understanding of their strengths and limitations. By analyzing the models across various metrics, we not only assessed their predictive power but also their robustness and reliability in real-world applications.

II. RELATED WORKS
Recent advancements in deep learning have sparked a growing interest in their application to tabular data problems, which have traditionally been dominated by ensemble methods like gradient-boosted decision trees [2], [23]. Two prominent studies have explored the performance of deep learning models in this context, shedding light on their potential and limitations.
The first study, conducted by Gorishniy et al. [16], aimed to advance the position of deep learning models by benchmarking them against non-deep learning models, specifically gradient-boosted decision trees (GBDT), for tabular data. This comprehensive study utilized a total of 43 datasets, covering a wide range of tabular data scenarios to ensure a robust comparison. The findings of this study revealed that the TabR model outperformed XGBoost in classification tasks on 23 out of the 43 datasets. Furthermore, it matched the performance of XGBoost on 13 datasets, demonstrating its capability to handle complex tabular data effectively. However, TabR underperformed on 7 datasets, indicating areas where traditional ensemble models like XGBoost still hold a performance edge. These results underscore the potential of TabR as a competitive deep learning alternative for tabular data classification, especially given its relatively simpler architecture compared to traditional ensemble models.
The second study, conducted by Arik and Pfister [17], focused on developing a deep neural network (DNN) model specifically tailored for tabular data. This study introduced the TabNet model, which leveraged a unique sequential attention mechanism to enhance feature selection and interpretation in tabular data processing. The results demonstrated that TabNet outperformed traditional models, including XGBoost, across a diverse collection of tabular data from various domains. One of the key highlights of this study was the effectiveness of unsupervised pre-training, which significantly improved the model’s adaptation and overall performance. This finding emphasizes the importance of leveraging unsupervised learning techniques to initialize and fine-tune models to better capture the underlying patterns in tabular data. The attention mechanism not only improved accuracy but also facilitated better interpretability, making it easier to understand the contribution of individual features to the prediction outcome.
From the insights gathered from these foundational studies, it can be concluded that deep learning models like TabR and TabNet are capable of competing with, and in certain scenarios surpassing, traditional ensemble tree models that have long dominated tabular data applications. These findings prompted us to utilize both TabR and TabNet in our research to compare their performance against each other and with XGBoost, aiming to further evaluate their applicability and efficiency in both classification and regression tasks. This comparative analysis provides a comprehensive view of how deep learning models can be optimized and applied effectively to solve tabular data problems, laying the groundwork for future advancements in this field.

A. DEEP LEARNING
Deep learning, a branch of machine learning, utilizes algorithms inspired by the structure and function of the brain, known as artificial neural networks [24]. It is a technique for reducing high-dimensional data by using hierarchical layers that capture various features of the data to build high-dimensional predictors in an input-output model [25]. It involves a computer-based modeling approach consisting of multiple processing layers that are utilized to learn data representations [26].

B. TABR
TabR is a deep learning model specifically designed for tabular data, integrating a simple yet effective retrieval augmentation approach within a traditional feed-forward network. The concept of retrieval augmentation has already proven to be highly successful in natural language processing (NLP) tasks [27]. Recent research indicates that using retrieval strategies with small models, instead of growing parametric representations, can significantly reduce model size while maintaining competitive performance across various tasks [28]. Additionally, Retrieval-Augmented Language Modeling (RALM) methods have shown notable improvements in language modeling by conditioning on relevant documents from a grounding corpus. These methods not only boost performance but also help mitigate factually inaccurate text generation and provide a natural source attribution mechanism [29]. This underscores the efficiency and adaptability of retrieval augmentation in different domains. Table 1 presents a summary of studies showcasing the effectiveness of Retrieval-Augmented Generation (RAG) across various models and tasks. Wang et al. demonstrated that REMFlow combined with Qwen2.5-14B significantly outperformed traditional machine learning models in terms of predictive accuracy, leveraging retrieval augmentation to incorporate relevant context. However, this approach is limited by its high computational cost and sensitivity to noisy data, which can impact performance in less controlled environments.


FIGURE 1. TabR architecture.

Li et al. introduced Stock-Chain, which improved accuracy in financial forecasting by utilizing RAG, highlighting its advantage over foundational models. Nonetheless, the model’s limited generalizability across domains restricts its application outside the financial sector. Similarly, Chen et al. showed that integrating RAG with GPT-4 enhanced model accuracy, proving effective in generating contextually accurate outputs. Despite this, the method heavily relies on high-quality retrieval sources, making it less robust in scenarios where such data is unavailable or incomplete. Zakka et al. found that RAG-augmented text-davinci-003 outperformed ChatGPT by refining responses through relevant retrievals. However, the model is prone to hallucinations when dealing with ambiguous queries, which can lead to less reliable outputs in critical applications. Finally, Thompson et al. illustrated that Bison-001 with RAG surpassed rule-based methods, reinforcing RAG’s adaptability and utility across diverse applications. However, its limited explainability compared to rule-based approaches remains a challenge, particularly in fields requiring high transparency and interpretability.

TABLE 1. Summary of studies on models utilizing retrieval-augmented generation (RAG).

The process within TabR begins with encoding both the target and candidate objects. The retrieval module then enriches the target object’s representation by identifying and processing relevant objects from the pool of candidates. This retrieval-based enhancement aims to incorporate contextual information that improves the predictive power of the model. Finally, the enriched representation is passed to the predictor, which makes the final prediction [16].
A detailed schematic of the TabR architecture is shown in Fig. 1. The encoder (E) and predictor (P) in TabR contain blocks that consist of fully connected layers, layer normalization, ReLU activation functions, and dropout techniques. These components facilitate the extraction and refinement of features from the input data. NE and NP represent the number of ‘‘Blocks’’ in the encoder and predictor, respectively. When NE is greater than zero, indicating the presence of at least one block in the encoder, the normalized features x̃ and x̃i are subjected to layer normalization before moving to the retrieval module. The retrieval module identifies context objects based on their similarity to the target object, assigning weights using the softmax activation function to determine their relevance. The final output is an aggregation of these context objects, calculated using their values and associated weights. This approach enables TabR to handle complex tabular data effectively by leveraging contextual information during the prediction process.
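To make the retrieval step concrete, the following is a minimal PyTorch sketch of the softmax-weighted aggregation just described. It is a simplification of TabR’s actual retrieval module [16]: the dot-product similarity, the residual combination, and the random toy inputs are illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn.functional as F

def retrieve_and_enrich(x_tilde, cand_keys, cand_values):
    """Simplified TabR-style retrieval step.

    x_tilde:     (d,)   normalized encoding of the target object
    cand_keys:   (m, d) encodings of the m candidate (context) objects
    cand_values: (m, d) value representations of the same candidates
    """
    # Similarity of the target to every candidate (plain dot product here;
    # TabR uses a more elaborate learned similarity module).
    scores = cand_keys @ x_tilde                  # (m,)
    # Softmax turns similarities into relevance weights that sum to 1.
    weights = F.softmax(scores, dim=0)            # (m,)
    # Aggregate the context objects using their weights and values.
    context = weights @ cand_values               # (d,)
    # The enriched representation then feeds the predictor P.
    return x_tilde + context

# Toy usage with random encodings.
x = torch.randn(32)
enriched = retrieve_and_enrich(x, torch.randn(96, 32), torch.randn(96, 32))
```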
C. TABNET
TabNet is another deep learning model specifically designed for tabular data, distinguished by its use of a sequential attention mechanism. This mechanism allows TabNet to focus selectively on the most relevant features, thereby enhancing the interpretability of the model’s predictions [17]. By combining this attention mechanism with decision tree-like processing, TabNet overcomes some of the inherent challenges of tabular data, such as the need for interpretability and efficient feature selection. Table 2 provides a summary of studies exploring models that utilize attention mechanisms to enhance performance and interpretability. Zhu et al. demonstrated that integrating ARIMA with attention improved stock prediction accuracy, outperforming models like XGBoost. However, this approach struggled with scalability for high-dimensional data. Ullah et al. applied attention with Temporal Convolutional Networks (TCN), achieving increased accuracy across five diverse datasets, illustrating the versatility of attention in improving model adaptability. The downside, however, is the computational
information that improves the predictive power of the model. adaptability. The downside, however, is the computational


expense required for long sequences. Zheng et al. focused on recommendation systems, where RP-SANRec, an attention-based model, delivered superior recommendation accuracy, underscoring the effectiveness of attention mechanisms in various applications for capturing relevant features and boosting predictive precision. This model, however, was sensitive to sparsity in input data, limiting its performance in certain scenarios.

TABLE 2. Summary of studies on models utilizing attention mechanism.

The TabNet architecture consists of multiple decision blocks, each comprising a feature transformer, an attentive transformer, and feature masking. The feature transformer processes and transforms the input data, preparing it for subsequent stages. After transformation, the features are split by a split block to be used by the attentive transformer and to contribute to the overall output. At each decision step, feature masking plays a crucial role by providing feedback on which features are important, enabling the model to make informed predictions. The processed features then undergo decoding, involving fully connected layers and feature transformer blocks. The decoder’s layers include components such as batch normalization and the GLU (Gated Linear Unit) activation function [38] to maintain stability and improve learning.
The attentive transformer block is central to TabNet’s ability to implement its attention mechanism. This block includes a single-layer mapping to adjust features based on their prior usage scales, thereby maintaining a balance between the exploration of new features and the exploitation of known important features. The sparsemax activation function further aids in highlighting the most critical features by normalizing the coefficients, thus allowing the model to focus more on the features that significantly impact the prediction. This structured approach not only improves accuracy but also provides transparency into the model’s decision-making process, making TabNet a compelling choice for applications requiring both performance and interpretability in tabular data analysis.
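The sparse feature-selection step can be sketched as follows. This is a minimal re-implementation of sparsemax (Martins and Astudillo, 2016) together with a schematic attentive-transformer step; the single-layer mapping `fc`, the handling of the prior scales, and the omission of TabNet’s relaxation factor are simplifying assumptions rather than the reference implementation of [17].

```python
import torch

def sparsemax(z: torch.Tensor) -> torch.Tensor:
    """Projection onto the probability simplex that, unlike softmax,
    can assign exactly zero weight to uninformative features."""
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    k = torch.arange(1, z.shape[-1] + 1, device=z.device, dtype=z.dtype)
    cumsum = z_sorted.cumsum(dim=-1)
    support = 1 + k * z_sorted > cumsum           # which entries stay nonzero
    k_z = support.sum(dim=-1, keepdim=True)       # size of the support
    tau = (cumsum.gather(-1, k_z - 1) - 1) / k_z  # normalizing threshold
    return torch.clamp(z - tau, min=0.0)

def attentive_step(features, prior, fc):
    """One schematic attentive-transformer step: a single-layer mapping
    scaled by each feature's prior usage, then sparsemax masking."""
    mask = sparsemax(fc(features) * prior)
    new_prior = prior * (1.0 - mask)              # relaxation factor omitted
    return mask, new_prior
```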
III. METHODOLOGY
To effectively compare the performance of the models on tabular data classification and regression tasks, a structured methodology was designed. This framework, illustrated in Fig. 2, incorporates essential steps from data collection to model evaluation and comparison, ensuring a thorough analysis of each model’s strengths and limitations. Each stage is carefully outlined below to provide clarity on the processes undertaken, including data preprocessing, automated hyperparameter tuning, model training, and the evaluation metrics chosen to assess performance across diverse datasets.

FIGURE 2. Overview of the methodological framework for tabular data classification and regression.

A. METHODOLOGICAL FRAMEWORK
• Data Collection. We utilized secondary data sourced from the Hugging Face website, accessible at https://huggingface.co/datasets/puhsu/tabular-benchmarks. This resource provides a curated collection of benchmark datasets commonly used to evaluate models developed specifically for tabular data. The datasets are ready-to-use, having undergone preprocessing to handle missing values and duplicates and to ensure balanced class distributions. They also include pre-defined train, validation, and test splits, prepared as part of the research by Grinsztajn et al. [22]. From this extensive collection, we selected a subset of datasets relevant to the goals of this study.
• Data Preprocessing. Data preprocessing is a critical step to prepare the datasets for training. For numerical features, simple normalization was applied to bring them to a similar scale, reducing the impact of feature range discrepancies on model performance. Meanwhile, for categorical features, encoding methods such as one-hot encoding or label encoding were applied to convert categorical variables into a numerical format, allowing the models to process these features effectively. This ensures compatibility with models that cannot directly interpret categorical data (a minimal preprocessing sketch follows this framework list).


• Hyperparameter Tuning. Hyperparameter tuning was conducted automatically using Optuna, an efficient framework for hyperparameter optimization. By defining a search space for key parameters—such as learning rate, momentum, dropout rates, and batch size—Optuna explores a variety of configurations to find the best-performing set of hyperparameters for each model. This step was crucial to balance the trade-offs between underfitting and overfitting, helping to maximize model performance on both training and validation datasets. The automation with Optuna allows for a systematic search across numerous configurations, saving time and reducing the complexity associated with manual tuning.
• Model Training. Five models were used in this study to compare performance on tabular data tasks. Two baseline models were employed:
1) MLP (Multilayer Perceptron): A standard neural network model for tabular data.
2) MLP-PLR (lite): A model incorporating periodic embeddings, a linear layer, and ReLU activation, with a lite version used here. This lite version, introduced by Gorishniy et al. [16], simplifies the PLR embedding by sharing a single linear layer across all features, thereby reducing model complexity while retaining essential components of the PLR framework.
In addition to these baselines, three advanced models were trained:
3) XGBoost: A gradient-boosting model optimized for tabular data, known for its high performance in handling structured data.
4) TabR: A deep learning model that uses Retrieval-Augmented Generation (RAG) to incorporate contextual information, enhancing prediction accuracy in complex data scenarios.
5) TabNet: A deep learning model with a sequential attention mechanism, allowing it to focus on the most relevant features at each decision-making step, which improves interpretability and feature selection.
Each model was trained on the pre-defined training dataset using optimal hyperparameters obtained through Optuna. This setup enables a robust comparison across models, with configurations that support both accuracy and generalization.
• Model Evaluation. After training, each model was evaluated on the validation and test sets using specific metrics to assess performance across different tasks. Accuracy and AUC (Area Under the Curve) were used as metrics to evaluate classification models, measuring their predictive power and the quality of their classification decisions. Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R2 (Coefficient of Determination) were used for regression tasks. MAE and RMSE measure prediction errors, while R2 indicates the proportion of variance explained by the model. These metrics collectively offer a comprehensive understanding of the models’ accuracy, robustness, and ability to generalize across data.
• Model Comparison. In the final stage, a comprehensive comparison was conducted to evaluate the performance of the five models—MLP, MLP-PLR (lite), XGBoost, TabR, and TabNet—across various datasets. Each model’s strengths and weaknesses were analyzed based on the selected evaluation metrics.
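As a concrete illustration of the preprocessing step referenced above, the scikit-learn sketch below standardizes numerical features and one-hot encodes categorical ones. The file name, column names, and target label are hypothetical placeholders; the benchmark itself ships already-preprocessed splits, so this mirrors the described procedure rather than reproducing the exact pipeline used in this study.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical file and column names for the electricity task.
df = pd.read_csv("electricity_train.csv")
numeric_cols = ["nswprice", "nswdemand", "vicprice"]   # illustrative
categorical_cols = ["day"]                             # illustrative

preprocess = ColumnTransformer([
    # Bring numerical features to a similar scale.
    ("num", StandardScaler(), numeric_cols),
    # Convert categories into a numerical format the models can consume.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X = preprocess.fit_transform(df[numeric_cols + categorical_cols])
y = df["class"].to_numpy()                             # illustrative target
```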
B. OVERFITTING PREVENTION
To prevent overfitting and ensure that our models generalize well to unseen data, we employed several key techniques (a combined sketch follows this list):
• Data Splitting: train, validation, and test splits. The data was divided into separate training, validation, and test sets to maintain distinct datasets for training and evaluation, ensuring that performance metrics reflect generalization to unseen data.
• Hyperparameter Tuning: Optuna. Hyperparameter tuning was conducted using Optuna to find optimal settings for parameters such as learning rate, momentum, and dropout rates. This fine-tuning helps balance model performance and generalization.
• Optimizer with Regularization: AdamW optimizer. The AdamW optimizer, which includes weight decay (L2 regularization), was used to penalize large weights, helping the models generalize and reducing the risk of overfitting.
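A minimal sketch of how Optuna-driven tuning and AdamW regularization combine is shown below. The helper functions (`build_model`, `train_one_epoch`, `validation_auc`) and the search ranges are hypothetical stand-ins, not the authors’ actual tuning script.

```python
import optuna
import torch

def objective(trial):
    # Illustrative search space: learning rate, dropout, weight decay.
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    wd = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)

    model = build_model(dropout=dropout)   # hypothetical model factory
    # AdamW applies decoupled weight decay, penalizing large weights.
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)

    for _ in range(20):                    # short training budget per trial
        train_one_epoch(model, opt)       # hypothetical training helper
    return validation_auc(model)          # score on the held-out validation split

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```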
IV. EXPERIMENTS
In this research, we conducted all experiments on Google Colaboratory (Google Colab), utilizing a Tesla T4 GPU to ensure efficient training and testing of our deep learning models. This section details the datasets, model configurations, comparison methodologies, and analyses performed to evaluate the performance of each model on both classification and regression tasks. We further examine performance variations of the primary models—XGBoost, TabR, and TabNet—across datasets of differing sample sizes and feature counts to gain a comprehensive understanding of each model’s strengths and limitations.

A. DATASET
We used ten datasets [22] with diverse characteristics. These datasets are grouped into two categories based on task: binary classification and regression. The specific properties of each dataset are presented in Table 3.

B. COMPARISON
Testing was performed on the designated test data using each model with its optimal hyperparameters obtained from the tuning process. The performance of each model was evaluated and compared based on key metrics, allowing for a comprehensive analysis of predictive accuracy and generalization across both classification and regression tasks. For classification tasks, the results are illustrated in Fig. 3, while Table 4 presents the outcomes for regression tasks.

FIGURE 3. Results for binary classification tasks. ‘AUC’ represents the area under the curve, and ‘Acc.’ denotes accuracy. Higher values for
AUC and accuracy indicate better model performance.

TABLE 3. Properties of the datasets used in the study.

Fig. 3 illustrates the performance of five models—MLP, MLP-PLR, XGBoost, TabR, and TabNet—across five datasets, focusing on AUC and accuracy scores for binary classification tasks. Notably, TabR outperforms XGBoost on the electricity dataset, achieving an AUC of 0.98 and an accuracy of 0.93, which are both higher than XGBoost’s scores. The electricity dataset stands out as the only one containing both numeric and categorical features, which may contribute to TabR’s strong performance, as its retrieval-augmented approach is well-suited for handling diverse data types. TabR also performs slightly better on the MagicTelescope dataset, where it matches or exceeds XGBoost’s metrics. Across other datasets like bank-marketing and jannis, MLP, MLP-PLR, and TabNet show more consistent performance, though they lack the standout scores seen with TabR and XGBoost. The credit dataset, however, remains challenging for all models, showing lower scores overall.
The ROC curves, as shown in Fig. 4, further support these findings by highlighting TabR’s consistent strength in classification tasks. For the electricity dataset, TabR’s AUC significantly surpasses the other models, demonstrating its ability to handle mixed data types effectively. In contrast, while TabNet performs comparably on structured datasets such as MagicTelescope, it struggles on more complex datasets like credit and electricity, where its AUC scores drop to 0.79 and 0.88, respectively. XGBoost, as expected, remains a robust benchmark, showing competitive results across all datasets, though it is slightly outperformed by TabR in specific cases. MLP-PLR proves to be a strong baseline, consistently performing close to the state-of-the-art models, while MLP lags behind, particularly on datasets like credit, where it achieves the lowest AUC of 0.77. These visualizations underline the importance of dataset characteristics in influencing model performance, with TabR excelling on datasets requiring a deeper understanding of feature interactions.
Meanwhile, Table 4 compares the regression performance of MLP, MLP-PLR, XGBoost, TabR, and TabNet across five datasets, using RMSE, MAE, and R2 as metrics. Overall, MLP-PLR and XGBoost stand out, each excelling in different datasets. MLP-PLR performs exceptionally well on Ailerons and pol, achieving the best RMSE, MAE, and R2 scores. XGBoost, on the other hand, shows strong results on superconduct and wine_quality, with the lowest RMSE and MAE and the highest R2 values, highlighting its robustness. TabR also performs competitively, especially on the houses dataset, where it achieves the best RMSE and MAE scores. In contrast, TabNet shows relatively weaker performance across all datasets, with higher RMSE values and lower R2 scores, suggesting it may be less suited for these regression tasks.

1) COMPARISON BY TOTAL SAMPLES
The comparison of AUC and accuracy scores based on total sample size for classification tasks can be seen in Fig. 5.


FIGURE 4. Receiver operating characteristic (ROC) curves for binary classification tasks across five datasets. Each curve
represents the trade-off between true positive rate (TPR) and false positive rate (FPR) for a specific dataset, with the area under the
curve (AUC) indicating the performance of the classifier on that dataset.

TABLE 4. Results for regression tasks. ‘MAE’ represents mean absolute error, ‘RMSE’ denotes root mean square error, and R2 indicates the coefficient of determination. Lower values of RMSE and MAE suggest better model performance, while higher R2 values indicate a better fit to the data.

Both charts reveal that the three primary models—XGBoost, TabR, and TabNet—exhibit similar trends in AUC and accuracy, with rises and falls in performance as the total sample size increases. However, there is generally no consistent performance across the three models. This indicates that there is no clear performance pattern for the three models based on total samples in the binary classification tasks used.
Meanwhile, the comparison of MAE and RMSE values based on total samples for regression tasks can be seen in Fig. 6, while R2 scores can be seen in Fig. 7. It can be observed that all three models exhibit similar fluctuations in performance across datasets with differing sample sizes. Among the five datasets, the models show their highest MAE and RMSE values on the superconduct dataset, which has one of the largest sample sizes (21,263 samples). However, houses, with a comparable sample size (20,640 samples), yields significantly lower MAE and RMSE values across models, indicating that factors beyond sample size—such as feature complexity and dataset-specific characteristics—play a substantial role in influencing model performance. In terms of R2 scores, the models also show varied performance without a consistent pattern based on sample size. For instance, while XGBoost achieves a high R2 of 0.991 on the superconduct dataset, TabR and TabNet perform slightly lower with R2 values of 0.982 and 0.851, respectively. Notably, TabNet obtains a negative R2 score of -0.126 on the Ailerons dataset, indicating that its predictions are worse than a mean-based prediction for this dataset, despite its moderate sample size of 13,750.

FIGURE 5. Comparison of AUC and accuracy by total sample size.

FIGURE 6. Comparison of MAE and RMSE by total sample size.

FIGURE 7. Comparison of R2 by total sample size.

On smaller datasets like wine_quality (6,497 samples), the R2 scores vary, with XGBoost achieving 0.502, TabR reaching 0.472, and TabNet falling behind at 0.240. This lack of a consistent trend in R2, MAE, RMSE, AUC, and accuracy values across datasets with differing sample sizes indicates that there is no clear performance pattern for the three models based solely on total samples in both the regression and classification tasks analyzed.

2) COMPARISON BY TOTAL FEATURES
The comparison of AUC and accuracy scores based on the number of features for classification tasks can be seen in Fig. 8. In both charts, it can be observed that the results of the three models, in terms of both AUC and accuracy, exhibit similar characteristics. All three models experience similar rises and falls in performance as the number of features increases. However, there is generally no consistent performance across the three models. This indicates that there is no clear performance pattern for the three models based on the number of features in the binary classification tasks used. The comparison of MAE and RMSE values based on the number of features for regression tasks can be seen in Fig. 9, while R2 scores can be seen in Fig. 10. Among the five datasets, the superconduct dataset, which has the highest number of features (79), shows the worst MAE and RMSE values for all models, potentially due to its complex feature space. In terms of R2 scores, the models show mixed performance across datasets with varying feature counts, with no clear trend that ties performance directly to feature count alone. For instance, XGBoost achieves a high R2 of 0.991 on superconduct (79 features), while on the smaller houses dataset (8 features), it achieves an R2 of 0.854. Similarly, TabR performs well on pol (26 features) with an R2 of 0.982 but shows lower performance on wine_quality (11 features) with an R2 of 0.472.

FIGURE 8. Comparison of AUC and accuracy by total features.

TabNet, which struggles on most datasets, even produces a negative R2 score (-0.126) on Ailerons (33 features), highlighting its difficulty with certain feature configurations. Overall, as with classification tasks, there is no consistent pattern in MAE, RMSE, or R2 scores across models based solely on the number of features in the regression tasks analyzed. This suggests that feature complexity, rather than simply feature count, likely plays a significant role in influencing model performance.

C. ANALYSIS
From the performance comparison based on total samples and number of features, the following conclusions can be drawn:
• Classification Tasks - Superior Performance on Diverse Data Types: TabR and XGBoost exhibited the best performance (both in AUC and accuracy) on the electricity dataset, which has a relatively large sample size (38,474 samples) and a moderate number of features (8 features). Notably, this dataset contains more than one type of feature, including both numerical and categorical data. Despite the challenge posed by the mixture of data types, TabR demonstrated strong capability in effectively managing datasets with diverse feature types, outperforming XGBoost. This indicates that TabR’s architecture is particularly well-suited for handling datasets that combine different data types, showcasing its flexibility and robustness in real-world scenarios where such diversity is common.
• Classification Tasks - Challenges with Specific Datasets: In contrast, all three models—TabR, TabNet, and XGBoost—performed the worst on the credit dataset. This dataset does not have the smallest sample size or the highest number of features, yet the significant drop in performance across all three models suggests that it may present unique challenges, possibly due to issues such as high noise levels or complex feature interactions. This underperformance implies that standard approaches may not be sufficient for this dataset, highlighting the potential need for specialized preprocessing techniques, such as feature engineering, noise reduction, or advanced data augmentation methods, to achieve improved results.
• Regression Tasks - Substantial Differences Despite Similar Dataset Characteristics: Although the sample size and number of features of the Ailerons and pol datasets are not significantly different, the MAE, RMSE, and R2 values of each model on these two datasets show substantial differences. The similar behavior across TabR, TabNet, and XGBoost on these datasets, which both consist solely of numerical features, suggests that factors other than sample size and feature count, such as intrinsic data distribution, feature correlation, or variance, play a critical role in model performance. This discrepancy highlights that specific characteristics of the pol dataset may be better aligned with these models, while the Ailerons dataset may require tailored preprocessing or further analysis to optimize performance. These observations emphasize the importance of understanding underlying data characteristics in regression tasks to achieve optimal results.
• Regression Tasks - Impact of High Dimensionality and Large Total Sample Size: In regression tasks, all three models—XGBoost, TabR, and TabNet—exhibited their highest MAE and RMSE values on the superconduct dataset, which has the largest sample size and the highest feature count among the regression datasets. Despite these high error metrics, the relatively strong R2 scores across models indicate that they still capture a significant portion of the variance within this complex dataset. This suggests that while the models can recognize general patterns, the high dimensionality and sample size introduce challenges that increase prediction error. Handling such high-dimensional data may benefit from additional techniques, such as feature selection or dimensionality reduction, to help improve precision without sacrificing the broader trends identified by the models.


FIGURE 9. Comparison of MAE and RMSE by total features.

FIGURE 10. Comparison of R2 by total features.

• Consistency Across Models on Large Datasets: On datasets with a large sample size, such as electricity and jannis, and with a higher number of features, like superconduct, TabR and TabNet demonstrated that they can compete with XGBoost. This finding underscores the potential of these deep learning models to handle complex datasets without significant loss in performance. Their ability to remain competitive in scenarios traditionally dominated by ensemble tree-based models like XGBoost highlights the adaptability of TabR and TabNet, suggesting that with proper tuning, deep learning models can be a viable alternative for large-scale tabular data analysis.

V. OTHER EXPERIMENTS
A. CHANGING ACTIVATION FUNCTION
In this experiment, we explored the impact of changing the activation function in the TabR model from ReLU to sigmoid, resulting in a modified version named TabR_S. While ReLU is widely used in deep learning models due to its efficiency in handling complex data patterns, sigmoid is also a common choice, particularly in binary classification tasks, because it outputs values between 0 and 1, which can be interpreted as probabilities. By comparing the performance of TabR (using ReLU) and TabR_S (using sigmoid) across several binary classification datasets, we aim to determine whether this modification affects model effectiveness in terms of AUC and accuracy scores. The results of this comparison are presented in Fig. 11. Overall, changing the activation function in TabR from ReLU to sigmoid does not appear to significantly impact AUC or accuracy scores across the datasets. Both versions of TabR perform similarly to each other and generally match or outperform XGBoost, especially on the electricity dataset. The findings suggest that the choice between ReLU and sigmoid activation functions has minimal effect on TabR’s binary classification performance in these tasks.
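Schematically, the modification amounts to swapping one module in each block. The sketch below assumes a simplified TabR-style block (LayerNorm, linear layer, activation, dropout); the width and dropout values are placeholders.

```python
import torch.nn as nn

def make_block(d: int, dropout: float, activation: str = "relu") -> nn.Sequential:
    """One simplified encoder/predictor 'Block'; swapping the activation
    from ReLU to sigmoid yields the TabR_S variant described above."""
    act = nn.ReLU() if activation == "relu" else nn.Sigmoid()
    return nn.Sequential(nn.LayerNorm(d), nn.Linear(d, d), act, nn.Dropout(dropout))

tabr_block   = make_block(256, 0.1, activation="relu")     # baseline TabR
tabr_s_block = make_block(256, 0.1, activation="sigmoid")  # TabR_S
```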
B. DEEP EQUILIBRIUM MODELS (DEQ)
The Deep Equilibrium (DEQ) approach is a neural network technique designed to find and optimize a single fixed point (equilibrium) for an entire sequence of data, effectively achieving convergence [39]. DEQ has demonstrated significant advantages, particularly in processing high-dimensional image data. Its ability to efficiently manage complex data and achieve convergence makes it a powerful tool, especially in scenarios where traditional networks may struggle with issues such as vanishing gradients or the need for very deep layers. Given these strengths, we explored the potential of applying DEQ to tabular data—a domain where deep learning has traditionally been less dominant compared to methods like decision trees or gradient-boosting machines.

z∗ = fθ(z∗, x)  (1)

Here, x is the input fed into the network, while z∗ stands as its output. To investigate this potential, we integrated the DEQ approach (1) into the TabR model, a specialized framework for tabular data representation and prediction. We utilized TorchDEQ, a library that provides the necessary tools for implementing DEQ within the PyTorch framework. Our integration specifically targeted the predictor P blocks within the TabR model, aiming to leverage DEQ’s equilibrium-seeking capability to enhance the model’s predictive accuracy and stability when applied to tabular data. The results of this experiment are shown in Fig. 12 and Table 5.
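Equation (1) can be illustrated with a naive fixed-point iteration. TorchDEQ replaces such a loop with faster root-finding solvers and implicit differentiation of the equilibrium, so the sketch below, with a toy contraction standing in for fθ, is purely conceptual.

```python
import torch

def fixed_point(f, x, z0, n_iter=50, tol=1e-4):
    """Iterate z <- f(z, x) until it stabilizes at the equilibrium
    z* = f(z*, x) of Eq. (1)."""
    z = z0
    for _ in range(n_iter):
        z_next = f(z, x)
        if torch.norm(z_next - z) < tol:
            break
        z = z_next
    return z

# Toy contraction so the iteration provably converges.
layer = torch.nn.Linear(16, 16)
f = lambda z, x: torch.tanh(0.5 * layer(z) + x)
z_star = fixed_point(f, torch.randn(16), torch.zeros(16))
```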


FIGURE 11. Comparison of Binary Classification-Task results using TabR with sigmoid. ‘‘TabR_S’’ represents TabR with sigmoid
activation function.

FIGURE 12. Comparison of TabR and TabR+DEQ on binary classification-task datasets.

TABLE 5. Comparison of TabR and TabR+DEQ on Regression-task datasets.

Despite the theoretical advantages of DEQ, our empirical results did not align with expectations. The performance of the modified TabR model with DEQ integration was not only less impressive than the original version, but it


also raised concerns about the suitability of DEQ for this type of data. Unlike image data, where DEQ’s convergence properties can lead to substantial improvements, tabular data may not benefit in the same way. This disparity could be due to the inherent differences in data structure, where the simpler relationships in tabular data do not necessitate the complex equilibrium-seeking mechanisms that DEQ provides.

VI. FUTURE WORKS
Based on the research findings, the following development suggestions can be made:
1) Adding more datasets from various domains with varying sizes and numbers of features can provide a more comprehensive understanding of model performance.
2) Further optimizing the TabR and TabNet models with various model configurations and other parameter tuning to improve performance on different datasets.
3) Using data augmentation techniques to expand existing datasets can help address overfitting issues and enhance model generalization.
4) Conducting comparative studies with ensemble models, other newer deep learning models [40], [41], or other classic machine learning models with RAG to understand the strengths and weaknesses of each model in specific contexts.
5) Testing the models on various tasks other than classification and regression, such as anomaly detection or clustering, to evaluate the flexibility and utility of the models in various applications.
6) Investigating the application of RAG-based models like TabR for regression tasks.

VII. CONCLUSION
This study provides a comprehensive evaluation of TabR, a novel Retrieval-Augmented Generation (RAG)-based deep learning model, comparing it against established benchmarks like XGBoost and TabNet across a diverse range of tabular datasets. Our findings reveal a compelling narrative: TabR, empowered by its RAG capabilities, demonstrates superior performance in classification tasks, notably excelling on datasets like credit, electricity, and MagicTelescope, even surpassing XGBoost. This underscores the potential of integrating RAG within deep learning architectures for classification problems in tabular data. Interestingly, dataset characteristics such as size and dimensionality did not show a direct correlation with model performance.
These results underscore the importance of careful model selection in tabular data analysis, emphasizing that the optimal choice between RAG-augmented deep learning, traditional deep learning, and established machine learning techniques hinges on the specific task at hand, whether it be classification or regression. While the integration of RAG within deep learning architectures like TabR shows significant promise for classification tasks, further research is needed to explore its full potential and applicability in regression scenarios. This study lays the groundwork for future investigations into the nuanced interplay between RAG, deep learning architectures, and the inherent characteristics of tabular data, ultimately driving advancements within this domain.

REFERENCES
[1] V. Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, and G. Kasneci, ‘‘Deep neural networks and tabular data: A survey,’’ IEEE Trans. Neural Netw. Learn. Syst., pp. 1–21, 2022.
[2] R. Shwartz-Ziv and A. Armon, ‘‘Tabular data: Deep learning is not all you need,’’ Inf. Fusion, vol. 81, pp. 84–90, May 2022, doi: 10.1016/j.inffus.2021.11.011.
[3] T. Chen and C. Guestrin, ‘‘XGBoost: A scalable tree boosting system,’’ in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2016, pp. 785–794.
[4] S. Ramraj, N. Uzir, R. Sunil, and S. Banerjee, ‘‘Experimenting XGBoost algorithm for prediction and classification of different datasets,’’ Int. J. Control Theory Appl., vol. 9, no. 40, pp. 651–662, 2016.
[5] J. Wu, Y. Li, and Y. Ma, ‘‘Comparison of XGBoost and the neural network model on the class-balanced datasets,’’ in Proc. IEEE 3rd Int. Conf. Frontiers Technol. Inf. Comput. (ICFTIC), Nov. 2021, pp. 457–461.
[6] F. Giannakas, C. Troussas, A. Krouska, C. Sgouropoulou, and I. Voyiatzis, ‘‘XGBoost and deep neural network comparison: The case of teams’ performance,’’ in Proc. 17th Int. Conf. Intell. Tutoring Syst., Cham, Switzerland: Springer, Jan. 2021, pp. 343–349.
[7] Y. Zhao, G. Chetty, and D. Tran, ‘‘Deep learning with XGBoost for real estate appraisal,’’ in Proc. IEEE Symp. Ser. Comput. Intell. (SSCI), Dec. 2019, pp. 1396–1401.
[8] S. A. Fayaz, M. Zaman, S. Kaul, and M. A. Butt, ‘‘Is deep learning on tabular data enough? An assessment,’’ Int. J. Adv. Comput. Sci. Appl., vol. 13, no. 4, pp. 1–8, 2022.
[9] P. Devan and N. Khare, ‘‘An efficient XGBoost–DNN-based classification model for network intrusion detection system,’’ Neural Comput. Appl., vol. 32, no. 16, pp. 12499–12514, Aug. 2020.
[10] Y. Qiu, J. Zhou, M. Khandelwal, H. Yang, P. Yang, and C. Li, ‘‘Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration,’’ Eng. With Comput., vol. 38, no. S5, pp. 4145–4162, Dec. 2022.
[11] K. K. Yun, S. W. Yoon, and D. Won, ‘‘Prediction of stock price direction using a hybrid GA-XGBoost algorithm with a three-stage feature engineering process,’’ Expert Syst. Appl., vol. 186, Dec. 2021, Art. no. 115716.
[12] X. Y. Liew, N. Hameed, and J. Clos, ‘‘An investigation of XGBoost-based algorithm for breast cancer classification,’’ Mach. Learn. With Appl., vol. 6, Dec. 2021, Art. no. 100154.
[13] S. Thongsuwan, S. Jaiyen, A. Padcharoen, and P. Agarwal, ‘‘ConvXGB: A new deep learning model for classification problems based on CNN and XGBoost,’’ Nucl. Eng. Technol., vol. 53, no. 2, pp. 522–531, Feb. 2021.
[14] S. Popov, S. Morozov, and A. Babenko, ‘‘Neural oblivious decision ensembles for deep learning on tabular data,’’ 2019, arXiv:1909.06312.
[15] G. Somepalli, M. Goldblum, A. Schwarzschild, C. B. Bruss, and T. Goldstein, ‘‘SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training,’’ 2021, arXiv:2106.01342.
[16] Y. Gorishniy, I. Rubachev, N. Kartashev, D. Shlenskii, A. Kotelnikov, and A. Babenko, ‘‘TabR: Tabular deep learning meets nearest neighbors in 2023,’’ 2023, arXiv:2307.14338.
[17] S. O. Arik and T. Pfister, ‘‘TabNet: Attentive interpretable tabular learning,’’ 2019, arXiv:1908.07442.
[18] F. den Breejen, S. Bae, S. Cha, T.-Y. Kim, S. H. Koh, and S.-Y. Yun, ‘‘Fine-tuning the retrieval mechanism for tabular deep learning,’’ 2023, arXiv:2311.07343.
[19] Y. Chen, H. Li, H. Dou, H. Wen, and Y. Dong, ‘‘Prediction and visual analysis of food safety risk based on TabNet-GRA,’’ Foods, vol. 12, no. 16, p. 3113, Aug. 2023, doi: 10.3390/foods12163113.
[20] Y. Jin, Z. Ren, W. Wang, Y. Zhang, L. Zhou, X. Yao, and T. Wu, ‘‘Classification of Alzheimer’s disease using robust TabNet neural networks on genetic data,’’ Math. Biosci. Eng., vol. 20, no. 5, pp. 8358–8374, 2023, doi: 10.3934/mbe.2023366.

[21] K. McDonnell, F. Murphy, B. Sheehan, L. Masello, and G. Castignani, ‘‘Deep learning in insurance: Accuracy and model interpretability using TabNet,’’ Expert Syst. Appl., vol. 217, May 2023, Art. no. 119543, doi: 10.1016/j.eswa.2023.119543.
[22] L. Grinsztajn, E. Oyallon, and G. Varoquaux, ‘‘Why do tree-based models still outperform deep learning on tabular data?’’ 2022, arXiv:2207.08815.
[23] Y. LeCun, Y. Bengio, and G. E. Hinton, ‘‘Deep learning,’’ Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[24] S. Yogasudha, K. Mounika, P. R. Namitha, and K. M. R. Priya, ‘‘Deep learning,’’ Int. J. Res. Eng., Sci. Manage., Jan. 2018.
[25] N. G. Polson and V. O. Sokolov, ‘‘Deep learning,’’ 2018, arXiv:1807.07987.
[26] R. K. Mishra, G. Y. S. Reddy, and H. Pathak, ‘‘The understanding of deep learning: A comprehensive review,’’ Math. Problems Eng., vol. 2021, pp. 1–15, Apr. 2021, doi: 10.1155/2021/5548884.
[27] P. Lewis, ‘‘Retrieval-augmented generation for knowledge-intensive NLP tasks,’’ in Proc. Adv. Neural Inf. Process. Syst., Jan. 2020, pp. 9459–9474.
[28] A. Blattmann, R. Rombach, K. Oktay, J. Müller, and B. Ommer, ‘‘Retrieval-augmented diffusion models,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 35, 2022, pp. 15309–15324.
[29] O. Ram, Y. Levine, I. Dalmedigos, D. Muhlgay, A. Shashua, K. Leyton-Brown, and Y. Shoham, ‘‘In-context retrieval-augmented language models,’’ Trans. Assoc. Comput. Linguistics, vol. 11, pp. 1316–1331, Nov. 2023.
[30] G. Wang, Y. Liu, S. Liu, L. Zhang, and L. Yang, ‘‘REMFLOW: RAG-enhanced multi-factor rainfall flooding warning in sponge airports via large language model,’’ Tech. Rep., 2024.
[31] X. Li, Z. Li, C. Shi, Y. Xu, Q. Du, M. Tan, J. Huang, and W. Lin, ‘‘AlphaFin: Benchmarking financial analysis with retrieval-augmented stock-chain framework,’’ 2024, arXiv:2403.12582.
[32] X. Chen, M. You, L. Wang, W. Liu, Y. Fu, J. Xu, S. Zhang, G. Chen, K. Li, and J. Li, ‘‘Evaluating and enhancing large language models performance in domain-specific medicine: Osteoarthritis management with DocOA,’’ 2024, arXiv:2401.12998.
[33] C. Zakka, R. Shad, A. Chaurasia, A. R. Dalal, J. L. Kim, M. Moor, R. Fong, C. Phillips, K. Alexander, and E. Ashley, ‘‘Almanac—Retrieval-augmented language models for clinical medicine,’’ NEJM AI, vol. 1, no. 2, 2024, Art. no. AIoa2300068.
[34] W. E. Thompson, D. M. Vidmar, J. K. De Freitas, J. M. Pfeifer, B. K. Fornwalt, R. Chen, G. Altay, K. Manghnani, A. C. Nelsen, K. Morland, M. C. Stumpe, and R. Miotto, ‘‘Large language models with retrieval-augmented generation for zero-shot disease phenotyping,’’ 2023, arXiv:2312.06457.
[35] R. Zhu, Y. Yang, and J. Chen, ‘‘XGBoost and CNN-LSTM hybrid model with attention-based stock prediction,’’ in Proc. IEEE 3rd Int. Conf. Electron. Technol., Commun. Inf. (ICETCI), May 2023, pp. 359–365.
[36] W. Ullah, F. U. M. Ullah, Z. A. Khan, and S. W. Baik, ‘‘Sequential attention mechanism for weakly supervised video anomaly detection,’’ Expert Syst. Appl., vol. 230, Nov. 2023, Art. no. 120599.
[37] X. Zheng, X. Li, Z. Chen, L. Sun, Q. Yu, L. Guo, and Y. Luo, ‘‘Enhanced self-attention mechanism for long and short term sequential recommendation models,’’ IEEE Trans. Emerg. Topics Comput. Intell., vol. 8, no. 3, pp. 2457–2466, Jun. 2024.
[38] Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, ‘‘Language modeling with gated convolutional networks,’’ in Proc. 34th Int. Conf. Mach. Learn., vol. 70, D. Precup and Y. W. Teh, Eds., Aug. 2017, pp. 933–941.
[39] S. Bai, J. Z. Kolter, and V. Koltun, ‘‘Deep equilibrium models,’’ 2019, arXiv:1909.01377.
[40] A. Kotelnikov, D. Baranchuk, I. Rubachev, and A. Babenko, ‘‘TabDDPM: Modelling tabular data with diffusion models,’’ in Proc. Int. Conf. Mach. Learn., 2023, pp. 17564–17579.
[41] R. Richman and M. V. Wüthrich, ‘‘LocalGLMnet: Interpretable deep learning for tabular data,’’ Scandin. Actuarial J., vol. 2023, no. 1, pp. 71–95, Jan. 2023.

JONINDO PASARIBU was born in Bekasi, Indonesia, in July 2002. He is currently pursuing the bachelor’s degree in informatics engineering with the Faculty of Computer Science, Universitas Brawijaya, Malang, Indonesia.
He has a strong passion for machine learning, artificial intelligence, and data science. During his academic career, he has participated in various projects and internships that have enhanced his skills and knowledge in these fields. He has also been involved in several research studies focusing on the application of machine learning techniques in real-world problems. His current research interests include developing innovative algorithms and models to improve data processing and analysis.

NOVANTO YUDISTIRA received the bachelor’s degree in computer science and in informatics engineering from the Institut Teknologi Sepuluh Nopember (ITS), in November 2007, the Master of Science (M.Sc.) degree in computer science from Universiti Teknologi Malaysia (UTM), in 2011, and the Doctor of Engineering (Dr.Eng.) degree in information engineering from the Faculty of Engineering, Hiroshima University, Japan, in 2018. He is currently a Lecturer and a Researcher with the Faculty of Computer Science, Universitas Brawijaya, Indonesia. In 2016, he was involved in a research collaboration with the Mathematical Neuroinformatics Group, National Institute of Advanced Industrial Science and Technology (AIST), Japan. In 2018, he pursued a postdoctoral fellowship in informatics and data analytics for two years in collaboration with Japan’s largest scientific research institute, RIKEN, and Osaka University. His current research interests include deep learning, multi-modal learning, computer vision, medical informatics, and big data analytics. He is also actively involved as the Founder of Deep Learning Indonesia, a community that develops and studies advancements in deep learning algorithms in Indonesia. Additionally, he is currently conducting various research and community service activities in collaboration with various institutions, companies, and universities in Indonesia. His fields of expertise: deep learning, computer vision, data mining, and pattern recognition.

WAYAN FIRDAUS MAHMUDY was born in Gresik, East Java. He received the Bachelor of Science degree from the Department of Mathematics, Universitas Brawijaya, in 1995, the Master of Engineering degree in informatics from the Institut Teknologi Sepuluh Nopember, Surabaya, in 1999, and the Ph.D. degree in manufacturing engineering from the University of South Australia, in 2014. He has been a Lecturer with Universitas Brawijaya, since 1997, and has held several positions, including the Coordinator of the Diploma 3 Education Program in Informatics and Computer Engineering (MITEK), the Secretary of the Diploma 3 MITEK Program, and the Head of the Computer Science Study Program, Faculty of Mathematics and Natural Sciences (MIPA).

