AI-Driven Feature Extraction and Classification Algorithm for Large-Scale High-Dimensional Data
Puwen An1, Yiding Huang1, Chengkuan Liu1, Xianyue Chen1, Laizhuo Xiang2
1 Dalian University of Technology, Dalian, Liaoning, 116024 China
2 China Railway Guangzhou Group Co., Ltd. Guangzhou EMU Depot, Guangzhou, Guangdong, 510088 China
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract—In this paper, an AI-driven feature extraction and classification algorithm is studied to address the challenges of large-scale high-dimensional data processing. The proposed algorithm uses a self-attention mechanism and a Transformer network to capture the dependencies among the internal features of each sample in the feature extraction stage, and realizes end-to-end classification in the classification stage. Experimental validation on the MNIST dataset substantiates that the introduced algorithm surpasses conventional techniques across varying numbers of self-attention heads, exhibiting enhanced generalization capability. Furthermore, an assessment of the algorithm's efficacy across diverse data volumes reveals a consistent improvement in performance as the training dataset grows. The findings offer new perspectives and methodologies for tackling feature extraction and classification challenges within extensive high-dimensional datasets, contributing to both theoretical foundations and practical applications.

Keywords—AI, feature extraction and classification, large-scale high-dimensional data

I. INTRODUCTION

With the rapid development of information technology and the diversification of data generation methods, large-scale high-dimensional data has become an important resource in all fields of society. These data arise in many forms, from user behavior data on the Internet to remote sensing data in scientific research, and their high dimensionality and large scale bring unprecedented challenges to traditional data processing and analysis. In this context, how to extract effective features from large-scale high-dimensional data and classify them accurately has become a focus of both academia and industry.

Traditional methods face many challenges when dealing with large-scale high-dimensional data. One of the most prominent is the curse of dimensionality: when the dimensionality of the data becomes very high, traditional feature extraction, feature selection, and classification algorithms often cannot cope effectively [1-2]. In addition, owing to the complexity and diversity of the data, traditional algorithms are limited in their ability to extract latent features and classify them accurately. The rapid development of AI in recent years, however, provides new opportunities to solve these problems. As a branch of AI, deep learning has shown great potential in processing large-scale high-dimensional data [3]. By constructing deep neural networks, especially models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and autoencoders, more abstract, high-level feature representations can be learned directly from data, thereby achieving accurate classification [4-6]. In addition, emerging methods such as reinforcement learning, transfer learning, and meta-learning have provided new ideas and technical means for processing large-scale high-dimensional data.

This paper discusses the research status, methods, and applications of AI-driven feature extraction and classification algorithms for large-scale high-dimensional data. Through an in-depth discussion of the related theories, technologies, and practices, it reveals the advantages and limitations of AI in processing large-scale high-dimensional data and looks forward to future research directions and application prospects, so as to contribute to the development of this field.

II. OVERVIEW OF LARGE-SCALE HIGH-DIMENSIONAL DATA

With the continuous progress of science and technology and the rising level of informatization, large-scale high-dimensional data has become the norm in all fields of today's society. Because of its large scale and high dimensionality, such data poses unprecedented challenges to data processing and analysis.

Large-scale high-dimensional data usually refers to data sets containing a large number of samples with high-dimensional features. Here, "large-scale" refers to the huge size of the data set, which may contain billions of samples or more; "high-dimensional" means that the feature dimension of each sample is very high, possibly comprising hundreds or even thousands of features.

In biomedicine, fields such as genomics, proteomics, and neuroscience generate large volumes of high-dimensional data, including gene expression data, protein interaction network data, and brain imaging data. On the Internet and in social media, user behavior data, social network data, and text data also exhibit large-scale, high-dimensional characteristics [7]. In engineering and scientific research, remote sensing data, meteorological data, and seismic data are often large-scale and high-dimensional as well.

Large-scale high-dimensional data is therefore applied across many fields: data analysis and pattern recognition in biomedicine, genomics, proteomics, and medical imaging; user behavior analysis, recommendation systems, and public opinion analysis on the Internet and in social media [8]; and data processing and pattern recognition in engineering and scientific research, including remote sensing data analysis, meteorological prediction, and geological exploration.
Fig. 1. Feature extractor based on self-attention mechanism
Assume that the input data is a feature-vector sequence X = (x1, x2, ..., xn) of dimension d, where n is the length of the sample sequence. The self-attention representation is computed as

Z = \mathrm{softmax}\left( \frac{X W_Q (X W_K)^{T}}{\sqrt{d}} \right) X W_V        (1)

where W_Q, W_K, and W_V are the linear transformation matrices for the query, key, and value, respectively. Through the self-attention mechanism, a new feature representation Z is obtained in which the feature vector at each position takes the information of all positions in the sequence into account.

After the representation sequence Z is obtained in the feature extraction stage, it is fed into the Transformer network for classification (Figure 2). In this study, a Transformer classifier is designed whose structure is similar to the standard Transformer model, except that the output layer uses a softmax function for classification.
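The paper provides no reference implementation; the following is a minimal PyTorch sketch of Eq. (1), assuming that W_Q, W_K, and W_V are learned bias-free linear layers. The class name SelfAttentionExtractor and the tensor shapes are illustrative, not the authors' code.

```python
import math
import torch
import torch.nn as nn

class SelfAttentionExtractor(nn.Module):
    """Minimal sketch of Eq. (1): Z = softmax(XW_Q (XW_K)^T / sqrt(d)) XW_V.

    Assumes the projections W_Q, W_K, W_V are learned bias-free linear
    maps; the paper does not specify bias terms, so none are used here.
    """

    def __init__(self, d: int):
        super().__init__()
        self.d = d
        self.W_Q = nn.Linear(d, d, bias=False)
        self.W_K = nn.Linear(d, d, bias=False)
        self.W_V = nn.Linear(d, d, bias=False)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X: (batch, n, d) -- a length-n sequence of d-dimensional features.
        Q, K, V = self.W_Q(X), self.W_K(X), self.W_V(X)
        # Scaled dot-product attention weights over all sequence positions.
        A = torch.softmax(Q @ K.transpose(-2, -1) / math.sqrt(self.d), dim=-1)
        # Each output position is a weighted mix of every position's value.
        return A @ V  # Z: (batch, n, d)
```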
The calculation process of the Transformer network is as follows:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V        (2)

\mathrm{MultiHead} = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)\, W^{O}        (3)

\mathrm{Transformer}(X) = \mathrm{LayerNorm}\big( X + \mathrm{MultiHead}(X W_Q, X W_K, X W_V) \big)        (4)
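Eqs. (2)-(4) can be realized with PyTorch's built-in nn.MultiheadAttention, which internally performs the per-head attention of Eq. (2) and the concatenation with W^O of Eq. (3). A sketch follows; the residual connection X + MultiHead(...) inside Eq. (4) follows the standard Transformer and is an assumption here, since the extracted formula does not show the operator.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Sketch of Eqs. (2)-(4): multi-head self-attention plus LayerNorm."""

    def __init__(self, d: int, h: int = 4):
        super().__init__()
        # embed_dim must be divisible by num_heads (e.g., d=256, h=4).
        self.mha = nn.MultiheadAttention(embed_dim=d, num_heads=h)
        self.norm = nn.LayerNorm(d)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # nn.MultiheadAttention in PyTorch 1.8 expects (n, batch, d) inputs.
        Xs = X.transpose(0, 1)
        # Self-attention: queries, keys and values all come from X;
        # per-head attention (Eq. (2)) and W^O projection (Eq. (3)) happen inside.
        attn_out, _ = self.mha(Xs, Xs, Xs)
        # Eq. (4): LayerNorm over the (assumed) residual sum.
        return self.norm(X + attn_out.transpose(0, 1))
```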
\mathrm{Output}(X) = \mathrm{softmax}(X W^{O})        (5)

Here, Q, K, and V are the sequences obtained by linearly transforming the query, key, and value, respectively; W^O, W_Q, W_K, and W_V are linear transformation matrices; h is the number of attention heads; d_k is the dimension of each head; and Concat denotes the concatenation of the multi-head attention results.

Through the Transformer network, end-to-end feature extraction and classification can be realized without manually designed features or classifiers, which makes the algorithm more automated and intelligent.
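Putting the pieces together, a hypothetical end-to-end classifier, reusing the two sketches above, might look as follows. The mean-pooling step and the class name are assumptions; the paper does not specify how the sequence representation is reduced to a single prediction before the softmax output of Eq. (5).

```python
import torch
import torch.nn as nn

class TransformerClassifier(nn.Module):
    """End-to-end sketch: Eq. (1) extractor, Eqs. (2)-(4) block,
    and the softmax output layer of Eq. (5)."""

    def __init__(self, d: int = 256, h: int = 4, num_classes: int = 10):
        super().__init__()
        self.extractor = SelfAttentionExtractor(d)  # Eq. (1)
        self.block = TransformerBlock(d, h)         # Eqs. (2)-(4)
        self.W_O = nn.Linear(d, num_classes, bias=False)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        Z = self.block(self.extractor(X))  # (batch, n, d)
        z = Z.mean(dim=1)                  # assumed mean-pooling over positions
        # Eq. (5); for training with nn.CrossEntropyLoss one would instead
        # return the logits self.W_O(z) and let the loss apply log-softmax.
        return torch.softmax(self.W_O(z), dim=-1)
```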
V. EXPERIMENT AND RESULT ANALYSIS

To assess the efficacy of the proposed AI-based feature extraction and classification approach, a series of experiments is conducted and the outcomes are examined in detail. The benchmark MNIST dataset of handwritten digits serves as the testbed, providing 60,000 training images and 10,000 test images, each a 28x28-pixel grayscale image. The proposed algorithm is compared against conventional CNN and logistic regression techniques on standard performance metrics.

Experimental environment: CPU: Intel Core i7-8700K @ 3.70 GHz; GPU: NVIDIA GeForce RTX 2080 Ti; memory: 32 GB DDR4; storage: SSD. Software environment: operating system: Ubuntu 20.04 LTS; deep learning framework: PyTorch 1.8.1; Python version: 3.8.5.

Images are normalized so that their pixel intensities lie in the range 0 to 1. The self-attention mechanism is configured with four heads, each with a hidden dimensionality of 256. The batch size is set to 64 and the learning rate to 0.001, with the Adam optimizer used for training. Under the specified computational environment, training takes roughly 30 minutes to an hour, depending on the model's complexity and the size of the dataset.
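A minimal sketch of this training setup under the stated hyperparameters (four heads, hidden size 256, batch size 64, learning rate 0.001, Adam). Treating each 28x28 image as a sequence of 28 row vectors lifted to the hidden size via a learned embedding is an additional assumption, as is the epoch count; the paper does not specify either.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# transforms.ToTensor() already scales MNIST pixels into [0, 1],
# matching the stated preprocessing.
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

embed = torch.nn.Linear(28, 256)   # assumed: lift each 28-pixel row to d = 256
model = TransformerClassifier(d=256, h=4, num_classes=10)
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(embed.parameters()), lr=0.001)
loss_fn = torch.nn.NLLLoss()       # model outputs probabilities, so use NLL

for epoch in range(10):            # epoch count is an assumption
    for images, labels in train_loader:
        X = embed(images.squeeze(1))   # (64, 1, 28, 28) -> (64, 28, 256)
        probs = model(X)               # softmax outputs, Eq. (5)
        loss = loss_fn(torch.log(probs + 1e-9), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```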
Figure 3 illustrates how the algorithm's performance varies with the number of self-attention heads. As the number of heads increases, classification accuracy first improves and then declines, with the best performance and peak accuracy attained at four heads. This shows that, in the current design, choosing an appropriate number of self-attention heads yields the best performance while avoiding excessive model complexity and computational overhead.
With too few self-attention heads, the algorithm may fail to fully capture the intricate relationships and subtle attributes within the data, leading to reduced classification accuracy. As more heads are added, the algorithm exploits the information in the dataset more thoroughly, improving its classification precision. An excessive number of heads, however, introduces superfluous parameters and raises model complexity and computational cost, ultimately detracting from performance.
The proposed algorithm is then compared with the conventional CNN and logistic regression approaches. Empirical findings indicate that it substantially outperforms these traditional methods on the MNIST dataset. Table I summarizes the comparative results.
TABLE I. PERFORMANCE COMPARISON OF ALGORITHMS
Model                      | Accuracy | Precision | Recall | F1 score
Traditional CNN            | 0.98     | 0.98      | 0.98   | 0.98
Logistic regression        | 0.92     | 0.92      | 0.92   | 0.92
The proposed AI algorithm  | 0.99     | 0.99      | 0.99   | 0.99
Observations indicate that the conventional CNN performs well on the MNIST dataset, with accuracy, precision, recall, and F1 score all reaching 0.98. Logistic regression lags behind, scoring approximately 0.92 on the same measures. The proposed AI algorithm surpasses both across all criteria, achieving accuracy, precision, recall, and F1 score of 0.99 each, a notable improvement over the established methods.

Figure 4 shows how the generalization ability of the algorithm changes with the scale of the training data: as the training set grows, classification accuracy rises gradually.
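For reference, the four metrics reported in Table I can be computed with scikit-learn as sketched below. Macro-averaging over the ten digit classes is an assumption, since the paper does not state the averaging scheme.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def summarize(y_true, y_pred):
    """Return the four metrics reported in Table I for one model."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"Accuracy": acc, "Precision": prec, "Recall": rec, "F1 score": f1}
```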
When the volume of training data is limited, the algorithm may fail to adequately learn the traits and patterns of the samples and is prone to over-fitting, resulting in diminished accuracy on the test set. As the training set grows, the algorithm has access to richer sample information and can better learn the true distribution of the samples, reducing the risk of over-fitting and improving both classification precision and generalization. Beyond a certain scale, however, additional data mainly introduces noise and redundancy and increases model complexity, so the performance gains gradually weaken.

Comparing the experimental results, we find that the algorithm achieves its best performance and highest classification accuracy at a training data scale of 30,000. This shows that, in the current design, choosing an appropriate training data scale yields the best generalization ability, avoiding both under-fitting and over-fitting.
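A plausible protocol for this data-scale experiment, sketched under the assumption that nested random subsets of the training split are drawn with torch.utils.data.Subset (reusing train_set from the training sketch above); the specific scales and seed are illustrative.

```python
import torch
from torch.utils.data import DataLoader, Subset

# Hypothetical protocol: train a fresh model on nested random subsets of
# the MNIST training split of increasing size (the paper reports a peak
# in generalization at a scale of 30,000), evaluating each on the test set.
generator = torch.Generator().manual_seed(0)  # fixed seed for reproducibility
order = torch.randperm(len(train_set), generator=generator)

for scale in (5000, 10000, 30000, 60000):
    loader = DataLoader(Subset(train_set, order[:scale].tolist()),
                        batch_size=64, shuffle=True)
    # ... retrain the model on `loader`, then measure test-set accuracy ...
```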
VI. CONCLUSION

In this study, an AI-driven feature extraction and classification algorithm is proposed to meet the challenges of large-scale high-dimensional data processing. By introducing a self-attention mechanism and a Transformer network, an algorithm is designed that effectively captures the dependencies among the internal features of each sample in the feature extraction stage and realizes end-to-end classification in the classification stage. Empirical validation on the MNIST dataset corroborates that the proposed algorithm significantly outperforms conventional methods across varying numbers of self-attention heads, demonstrating superior generalization capability.
Furthermore, the investigation assesses the algorithm's performance under diverse data scales, revealing a consistent improvement in performance as the training data grows. In summary, our findings offer an innovative conceptual framework and solution for tackling feature extraction and classification challenges in large-scale, high-dimensional datasets, bearing substantial theoretical and applied relevance. Future work may refine and extend this algorithm to foster adaptability across a wider array of application contexts.

REFERENCES

[1] Wei, T., Liu, W. L., Zhong, J., & Gong, Y. J. (2020). Multiclass classification on high dimension and low sample size data using genetic programming. IEEE Transactions on Emerging Topics in Computing, 2020(99), 1-1.
[2] Shi, X., Qin, P., Zhu, J., Zhai, M., & Shi, W. (2020). Feature extraction and classification of lower limb motion based on sEMG signals. IEEE Access, 2020(99), 1-1.
[3] Hammad, M., Zhang, S., & Wang, K. (2019). A novel two-dimensional ECG feature extraction and classification algorithm based on convolution neural network for human authentication. Future Generation Computer Systems, 101(Dec.), 180-196.
[4] Zhang, J., Mei, K., Zheng, Y., & Fan, J. (2019). Exploiting mid-level semantics for large-scale complex video classification. IEEE Transactions on Multimedia, 2019(10), 21.
[5] Xue, Y., Zhao, Y., & Slowik, A. (2020). Classification based on brain storm optimization with feature selection. IEEE Access, 2020(99), 1-1.
[6] Zhou, H., Yu, K. M., Chen, Y. C., & Hsu, H. P. (2021). A hybrid feature selection method RFSTL for manufacturing quality prediction based on a high dimensional imbalanced dataset. IEEE Access, 2021(99), 1-1.
[7] Bai, Y., Zhang, Q., Lu, Z., & Zhang, Y. (2019). SSDC-DenseNet: a cost-effective end-to-end spectral-spatial dual-channel dense network for hyperspectral image classification. IEEE Access, 2019(99), 1-1.
[8] Alsenan, S. A., Alturaiki, I. M., & Hafez, A. M. (2020). Feature extraction methods in quantitative structure-activity relationship modeling: a comparative study. IEEE Access, 2020(99), 1-1.
[9] Gao, Y., Zhang, G., Zhang, C., Wang, J., & Zhao, Y. (2021). Federated tensor decomposition-based feature extraction approach for industrial IoT. IEEE Transactions on Industrial Informatics, 2021(99), 1-1.
[10] Wickramasinghe, C. S., Marino, D. L., & Manic, M. (2021). ResNet autoencoders for unsupervised feature learning from high-dimensional data: deep models resistant to performance degradation. IEEE Access, 2021(99), 1-1.
[11] Li, P., He, X., Cheng, X., Gao, X., & Li, Z. (2019). Object extraction from very high-resolution images using a convolutional neural network based on a noisy large-scale dataset. IEEE Access, 2019(99), 1-1.
[12] Huang, W., Yue, B., Chi, Q., & Liang, J. (2019). Integrating data-driven segmentation, local feature extraction and Fisher kernel encoding to improve time series classification. Neural Processing Letters, 49(1), 43-66.
[13] Gao, Y., Zhong, P., Tang, X., Hu, H., & Xu, P. (2021). Feature extraction of laser welding pool image and application in welding quality identification. IEEE Access, 2021(99), 1-1.