
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3511261

A Characteristics-Based Least Common Multiple Algorithm to Optimize Magnetic Field-Based Indoor Localization

Hamaad Rafique*, Davide Patti, Member, IEEE, Maurizio Palesi, Senior Member, IEEE, Gaetano Carmelo La Delfa
Department of Electrical, Electronics and Computer Engineering, University of Catania, Catania, Italy
[email protected], [email protected], [email protected], [email protected]

Abstract—Clustering is an unsupervised learning technique that groups data based on similarity criteria. Traditional methods like K-Means and agglomerative clustering often require predefined parameters, struggle with irregular cluster shapes, and fail to classify sub-cluster points in magnetic fingerprint-based indoor localization. This study proposes the Characteristics-Based Least Common Multiple (LCM) algorithm to address these challenges. This novel approach autonomously determines cluster number and shape while accurately classifying misclassified points based on characteristic similarities using LCM. We evaluated the proposed technique using state-of-the-art metrics and tested it in magnetic field-based indoor localization scenarios. Comparisons were made with real-time and benchmark datasets, alongside traditional clustering methods. Results demonstrate that LCM significantly enhances localization accuracy, achieving a mean absolute error of 0.1 m.

Index Terms—Indoor localization, novel clustering technique, least common multiple, machine learning, data clustering, automatic clustering

H. Rafique*, D. Patti, M. Palesi and G. C. La Delfa are with the Department of Electrical, Electronics and Computer Engineering, University of Catania, Catania, Italy. Email: [email protected] (corresponding author). The email addresses of the coauthors are [email protected], [email protected], and [email protected]. Copyright (c) 20xx IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

I. INTRODUCTION

Advancements in IoT technology have led to a sharp rise in data-intensive applications, including indoor navigation and localization within various industries. Indoor localization involves determining an approximate or precise user location within an indoor environment. While the Global Positioning System (GPS) is highly effective outdoors, its signals cannot penetrate thick building structures or basements. Therefore, researchers are developing alternative approaches for indoor localization systems (ILS) by leveraging data from sensors such as Wireless Fidelity (WiFi), Radio Frequency Identification (RFID), Bluetooth Low Energy (BLE), and Magnetic Field Signals (MFS) [1], [2].

Traditionally, researchers have utilized the angle of arrival or time of arrival of signals from various access points for localization [3]. However, these methods have limitations, such as signal blockages and reduced long-range accuracy, which affect WiFi, RFID, and BLE signals, leading to localization issues. To address these challenges, researchers are turning to MFS for localization, introducing a concept known as "fingerprinting." This method involves collecting MFS data from reference points (RPs) throughout the study area and using this data for real-time localization at random test points (TPs) [4], [5].

Fingerprinting is a key focus in indoor localization research. To advance this area, the authors of [6] propose DeepWiPos, a deep learning-based fingerprinting framework that converts unstable fingerprints into stable values using a Fingerprint Spatial Gradient (FSG). They fuse FSG and RSS with an LSTM and attention module to reduce spatial ambiguity. DeepWiPos reduces average positioning errors by over 22% on Bluetooth and Wi-Fi. Moreover, the authors of [7] proposed QSFDEW, i.e., Quadtree Search and Fractal Direction Entropy Weighting, a method to improve indoor localization by using a quadtree algorithm to divide the area into grids and locate reference points efficiently. It quickly searches neighbouring quadrants and combines entropy weighting to enhance accuracy. Its key advantages are low complexity and high efficiency, making it an effective solution compared to other positioning algorithms. Furthermore, the author of [8] proposed CALLOC, a framework designed to resist adversarial attacks and variations across environments and devices. It uses an adaptive curriculum learning approach with a lightweight neural network for resilience, specifically tailored for resource-constrained devices to secure fingerprinting. Similarly, the authors of [9] optimize localization using the geometric relationship between a 5G-enabled Metaverse user and 5G femtocells, which impacts indoor localization accuracy. Despite these advantages, fingerprinting presents challenges, particularly the time and effort required to collect and maintain well-annotated location-based data across extensive study areas [10]–[13].

Factors such as perturbations and ferromagnetic materials can introduce complexities and inaccuracies in MFS used in ILS. These issues can result in multiple predictions for distinct location points, especially under unpredictable test conditions characterized by non-linear relationships among variables. Machine learning techniques, including supervised and unsupervised methods, have emerged as valuable tools to tackle these challenges. Unsupervised clustering techniques such as K-means [14], PF Clust [15], graph-based spectral

Authorized licensed use limited to: Southeast University. Downloaded on March 03,2025 at 02:50:04 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.

approaches [16], and density-based methods such as the k-means kernel [17], DBSCAN [18], and OPTICS (Ordering Points To Identify the Clustering Structure) [19] have been proposed to manage large unsupervised datasets effectively. However, unsupervised clustering techniques often struggle to perform optimally on substantial localization datasets, leading to challenges in maintaining the generality and efficiency of the clustering process. Consequently, clustering outcomes may include sample points within a cluster that are physically distant but share similar characteristics, resulting in multiple predictions for distinct location points.

This motivates the development of a technique that can accurately identify and correctly classify data points or sub-clusters that share similar characteristics but are physically separated and incorrectly clustered. The proposed Characteristics-Based Least Common Multiple (LCM) Clustering Algorithm aims to address these limitations. This technique is valuable for applications such as asset tracking, user navigation, and context-aware services [20].

Fig. 1. Conceptual representation of five reference points with misplaced data points (dark and light blue samples) sharing similar characteristics (with cluster two) but being physically distant, leading to misplacement of samples into incorrect clusters. Bx, By and Bz are the three dimensions of MFS.

The proposed LCM technique overcomes the drawbacks of conventional clustering methods, such as reliance on user-defined parameters and difficulties with arbitrary cluster shapes. It also addresses the issue of misplaced sample points within clusters caused by overlapping MFS due to ferromagnetic materials in indoor environments. Figure 1 conceptually represents sample points or sub-clusters with similar characteristics to clusters one and three but physically distant, leading to misplacement. A detailed explanation of magnetic field features is provided in Section V-B. The key contributions of this work are as follows:

• We present the "Characteristics-Based Least Common Multiple Clustering Algorithm" to perform arbitrary-shape clustering to improve indoor localization.
• Novel LCM Approach: The algorithm uncovers distinct characteristics of MFS.
• Efficient Organization: The algorithm organizes individual samples into separate clusters, rectifying instances of mistaken grouping.
• Tunable Parameter: A tunable parameter improves the clustering process.
• Evaluation Metrics: The technique is evaluated using metrics such as the Silhouette score, Calinski-Harabasz Index, and Davies-Bouldin Index on benchmark datasets.
• Real-time Localization: The algorithm's effectiveness is tested using fully connected artificial neural network (ANN) models for real-time indoor localization.

The proposed clustering technique is a proof of concept specifically designed to optimize indoor localization using MFS. While its primary focus is on MFS data, we acknowledge this as a limitation in terms of broader applicability. However, our ongoing research aims to apply the proposed LCM clustering technique to diverse datasets, such as Wi-Fi signals, RFID, UWB and BLE data. These efforts are directed toward evaluating the algorithm's adaptability to various data types, particularly for sensor fusion in indoor localization. This exploration is expected to provide further insights into the generalizability of the technique, expanding its potential applications beyond MFS and improving clustering performance across different environments. We can refine the algorithm by testing it on additional datasets and assess whether the LCM approach offers comparable benefits in other domains. This would ensure a more versatile clustering technique with practical use in various scenarios.

The rest of the paper is organized as follows: Section II provides research motivation and supporting research. Section III provides preliminary knowledge for the proposed clustering technique. Section IV details the proposed clustering algorithm. Section V elaborates on the state-of-the-art evaluation techniques and the data used for evaluation. Section VI presents the results of the proposed clustering technique using state-of-the-art evaluation metrics. Section VII presents the results of real-time indoor localization achieved with the clustered dataset using the proposed technique. Section VIII presents the discussion of the study. Finally, Section IX addresses limitations and future work, and Section X concludes.

II. RELATED WORK

In the Internet of Things (IoT) era, technological advancements have significantly increased data-intensive applications across various industries. These IoT applications generate vast amounts of data in healthcare, transportation, agriculture, smart cities, security, and localization. Various clustering techniques have been proposed in the literature for better data understanding.

Clustering is a fundamental challenge in data mining and analysis, aiming to organize datasets into distinct groups based on their similarities [21]. Numerous clustering algorithms, classified as hierarchical and partitional, have been proposed [22]. Our proposed clustering technique falls within partitional clustering, where the method initially treats the entire dataset as a single cluster and then divides it into disjoint clusters. This process is guided by a criterion function that restricts the clustering based on dissimilarity, facilitating the minimization of dissimilarity within each cluster.

One of the pioneering and most commonly used techniques for clustering is k-means. It defines clusters based on their


proximity to the cluster centre, making it an efficient and straightforward technique widely used across multiple disciplines [14], [23]. However, despite its simplicity, k-Means has limitations. It requires a user-defined number of clusters, k, only produces spherical clusters, and may converge to a local rather than a global minimum [24], [25].

Clusters in a dataset often exist in fuzzy intervals, making it challenging to determine the k value. Consequently, the algorithm needs multiple executions with different numbers of clusters to identify the suitable number, balancing execution time and space. Techniques such as those proposed by [26], [27] address some limitations but still struggle with clustering arbitrary shapes due to their reliance on assigning data points to the closest centre. A graph-based spectral technique proposed by [16] recognizes arbitrary-shape clusters by grouping data points based on closely connected elements in the graph's structure. However, like k-Means, it requires input to define the cluster centres [28].

PF Clust (Parameter-Free Clustering) [15] automatically determines the number of clusters without user-defined parameters. It applies an agglomerative method to numerous randomly generated sub-datasets, assessing internal validity measures and setting thresholds based on the number of clusters. While PF Clust can outperform typical clustering methods, it is time-consuming and computationally expensive due to the need for repeated sampling and evaluation. In [24], researchers used an automated clustering technique with a force function to control the movement of objects. As the distance between two objects increases, the force between them decreases, causing each item to migrate towards its respective cluster centre. This force is calculated using a user-defined parameter called λ, which influences the number of clusters.

To address the issue of clustering shapes, density-based clustering techniques are well known for their ability to detect arbitrary-shaped clusters without user-defined cluster centres. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [18] is the most commonly used density-based technique. It identifies arbitrarily shaped clusters by using a specific density threshold ϵ, determined by MinPts and the radius of the neighbourhood. The authors of [26] proposed a novel clustering technique based on epsilon-radius neighbours, automatically identifying the number and shape of clusters [24]. Spectral clustering [18] and the k-Means kernel [17] also define clusters based on arbitrary shapes but require a predefined number of clusters. DBSCAN may merge clusters close to each other, a drawback addressed by its variant, OPTICS, which orders data points based on their density to reveal clusters of different densities [19].

Recent research suggests deep learning-based clustering, integrating deep learning with common clustering techniques to understand complex patterns in high-dimensional spaces. This approach is effective in tasks like image segmentation, document clustering, and speech recognition. In [29], the authors introduced a robust clustering technique using strong augmentation, achieving an overall accuracy of 76.5%. This method combines strong and weak augmentations, utilizing a backbone network with triply-shared weights for representation learning, and involves instance-level and cluster-level contrastive learning. Similarly, [21] proposes a multi-view deep clustering technique named unified and discrete bipartite graph learning to handle computational complexity, single-view graph learning, and embedding discretization using k-Means. Experiments demonstrate the efficiency and robustness of this approach on multi-view datasets. However, these techniques are more suitable for high-dimensional datasets and require large datasets for learning deep clustering parameters.

Time series-based clustering techniques are also explored in the literature. In [30], the authors address the geometrical and global topological properties of time series data with a method called "topological and geometric mixed distance." This technique extracts topology features using the persistence diagram of the point cloud of time series data and employs local geometric properties, specifically time correlations. Their results outperformed the baseline. Additionally, [31] presents a k-shape clustering technique for time series datasets, utilizing a normalized version of the cross-correlation measure to account for the shapes of time series. This method compares favourably with other clustering methods, demonstrating robustness and efficiency [32].

However, these techniques primarily focus on either clustering shapes or dealing with user-defined parameters. Deep and time series-based clustering methods handle high-dimensional complex patterns but overlook scenarios where data points with similar characteristics are physically distant and assigned to incorrect clusters. This omission is critical in indoor localization, where high accuracy is essential for effective resource management, improved security, safety, navigation, and overall user experience.

To overcome these limitations, we introduce a new clustering algorithm called the "Characteristics-Based Least Common Multiple Technique." This algorithm leverages the features of MFS to detect physically distant sample points, or sub-clusters, sharing similar characteristics and appropriately assigns them to their respective clusters.

The following section provides a foundational understanding of the clustering concepts used in the proposed algorithm.

III. PRELIMINARY KNOWLEDGE

This section outlines the essential concepts and operations involved in the clustering process, focusing on how datasets are partitioned into distinct groups using a clustering algorithm.

The clustering aims to divide the data set (DS) into K clusters C_K = {c_1, c_2, c_3, ..., c_k} [33]. Let's assume we have a data set DS = {dsp_1, dsp_2, ..., dsp_n} ∈ R^{n×df}, where df is the dimension of each sample point dsp_i, and n is the number of sample points. We denote the cluster to which a data point dsp_i belongs as c_{dsp_i}.

The algorithm produces a K × n partition matrix U(DS) = [U_{ki}], with K = {1, 2, ..., k} and i = {1, 2, 3, ..., n}. Each element U_{ki} indicates the membership degree of dsp_i to the assigned cluster c_{dsp_i}. In hard


clustering, the degree of membership is either 1 or 0, indicating whether a dsp belongs to a particular cluster. The membership function U_{ki} is defined using Eq. (1):

U_{ki} = \begin{cases} 1, & \text{if } dsp_i \in C_k \\ 0, & \text{otherwise} \end{cases} \quad (1)

To have proper clustering, the partitioning must satisfy Eq. (2), Eq. (3), and Eq. (4), where (i ≠ j; i, j = {1, 2, 3, ..., k}), ensuring that no sample belongs to more than one cluster:

DS = \bigcup_{i=1}^{k} C_i \quad (2)

C_i \neq \emptyset \quad (3)

C_i \cap C_j = \emptyset \quad (4)

In Euclidean space, the distance between two sample points dsp_i and dsp_j can be calculated as:

d_{ij} = \sqrt{\sum_{p=1}^{df} (dsp_{ip} - dsp_{jp})^2} \quad \forall i, j \quad (5)

This distance metric is commonly used to determine the similarity between sample points, which guides the formation of clusters based on proximity.

Fig. 2. Flowchart: Phase 1 initializes the data by computing the Euclidean distance (ED) on the input data, resulting in a symmetric matrix of distances. In Phase 2, clusters are formed. Initially, the algorithm computes the LCM to create clusters. Ci represents the main clusters, while Cj denotes clusters with single values. These are managed using ED to retain information, followed by post-processing to address repetitions Pc and clusters with shared characteristics Ic to achieve the final clusters.

IV. PROPOSED CLUSTERING ALGORITHM

This section presents a novel clustering technique to improve indoor localization by grouping MFS based on their inherent properties. Unlike conventional clustering methods, which typically focus on central points or dense regions, our approach prioritizes the data's distinctive characteristics.

The proposed algorithm, illustrated in Figure 2, consists of two main phases. Phase 1 begins with calculating the Euclidean Distance (ED) between data points, as shown in Eq. (6), which produces a symmetric distance matrix as shown in Eq. (7). This matrix is the key input for the subsequent clustering process. In Phase 2, clusters are formed by calculating the LCM from the distance matrix using Eq. (8), and the assignment of data points to clusters is determined by Eq. (9). Here, Ci represents the primary clusters, grouping data points that meet the specified criteria, while Cj contains outliers representing noise or distant samples.

In addition to noise, there are two other categories of Cj: Pc, which denotes repeated samples that form sub-clusters, and Ic, which are sub-clusters within larger clusters, containing samples with unique features distinct from their parent cluster. Post-processing steps, as described in Eq. (10) and Eq. (11), are then applied to ensure accurate and meaningful cluster assignments. This process preserves the integrity of the data by grouping it based on its natural characteristics.

The next sections explore the methodology in detail, following the order outlined in the flowchart for clarity and better understanding.
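The preliminaries of Section III (hard membership, Eq. (1), and the pairwise Euclidean distance, Eq. (5)) can be sketched in a few lines of NumPy. The sample values and labels below are illustrative placeholders, not data from the paper's datasets:

```python
import numpy as np

# Illustrative 3-D magnetic field samples (Bx, By, Bz); values are made up.
DS = np.array([
    [21.0, -3.5, 40.2],
    [21.1, -3.4, 40.0],
    [55.3, 10.2, 12.7],
    [55.0, 10.0, 12.9],
])
n = len(DS)

# Eq. (5): pairwise Euclidean distances d_ij between all sample points.
d = np.sqrt(((DS[:, None, :] - DS[None, :, :]) ** 2).sum(axis=-1))

# Eq. (1): hard membership matrix U (K x n); the labels are illustrative.
labels = np.array([0, 0, 1, 1])
K = labels.max() + 1
U = np.zeros((K, n), dtype=int)
U[labels, np.arange(n)] = 1

# Eqs. (2)-(4): clusters are non-empty, disjoint, and cover the dataset,
# so every column of U sums to exactly 1.
assert (U.sum(axis=0) == 1).all()
assert np.allclose(d, d.T) and np.allclose(np.diag(d), 0)
```

Note that the matrix d is symmetric with a zero diagonal, which is exactly the structure the diff matrix of Eq. (7) relies on in Phase 1.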

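Before turning to the individual phases, the end-to-end flow just described (Eqs. (6)–(9) and Algorithm 1) can be sketched in Python. The membership cases of Eq. (9) are hard to apply literally in isolation, so the test below — a new distance value joins a cluster when it divides the cluster's running LCM, with ϱ clamping zero distances — is one plausible reading rather than the authors' exact rule; `dsf` stands in for ϑ, and the post-processing of Eqs. (10)–(11) is omitted:

```python
import math
from functools import reduce

import numpy as np

def lcm(h, m):
    # Eq. (8): LCM(h, m) = |h*m| / gcd(h, m)
    return abs(h * m) // math.gcd(h, m) if h and m else 0

def lcm_clustering(DS, dsf=1.0, rho=1):
    """Illustrative sketch of the two-phase flow; not the authors' code."""
    DS = np.asarray(DS, dtype=float)
    n = len(DS)
    # Phase 1, Eqs. (6)-(7): DSF-scaled, integer-rounded distance matrix.
    d = np.sqrt(((DS[:, None, :] - DS[None, :, :]) ** 2).sum(-1))
    diff = np.rint(d * dsf).astype(int)

    assigned = [-1] * n
    clusters = []  # each cluster is a list of sample indices; first is the seed
    for i in range(n):
        if assigned[i] >= 0:
            continue
        found = False
        for j, cl in enumerate(clusters):
            seed = cl[0]
            nv = max(int(diff[i, seed]), rho)  # distance of sample i to the seed
            # Running LCM of the cluster's seed-to-member distances (rho-clamped).
            lcm_ci = reduce(lcm, (max(int(diff[seed, k]), rho) for k in cl[1:]), 1)
            # One reading of Eq. (9): nv joins if it leaves the cluster LCM unchanged.
            if lcm_ci % nv == 0:
                cl.append(i)
                assigned[i] = j
                found = True
                break
        if not found:
            # New-cluster branch of Algorithm 1: seed with i and absorb all
            # later samples whose scaled distance to i is within rho.
            idx = len(clusters)
            new_cluster = [i]
            assigned[i] = idx
            for k in range(i + 1, n):
                if diff[i, k] <= rho:
                    new_cluster.append(k)
                    assigned[k] = idx
            clusters.append(new_cluster)
    return clusters, assigned

# Two tight, well-separated groups of 2-D points (illustrative data).
clusters, assigned = lcm_clustering([[0, 0], [1, 0], [10, 0], [11, 0]])
# clusters == [[0, 1], [2, 3]]
```

As in Phase 1 of the paper, `dsf` controls how the continuous distances are mapped to the integers the LCM test operates on, and so tunes the granularity of the resulting clusters.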

A. Phase 1

This phase focuses on preparing the input data for the proposed clustering technique. Since the LCM function requires integer-based inputs, continuous data like geomagnetic intensity must first be transformed into a suitable format. This involves calculating pairwise Euclidean distances, scaling them, and rounding the results to integers for clustering.

We begin by calculating the ED between each pair of data points using Eq. (6). These distances are initially in continuous form and must be converted to integers. First, the distances are rounded to the nearest whole number. Next, these rounded values are scaled by a tunable Distance Scaling Factor (DSF), denoted as ϑ. The purpose of the DSF is to convert continuous distances into manageable integers, which are required for the LCM function to operate effectively.

The DSF plays a crucial role in controlling the quality and structure of the clusters. By adjusting ϑ, we can fine-tune the algorithm's sensitivity to the distances between data points. A higher scale factor results in fewer, more compact clusters, while a lower scale factor creates more numerous, less compact clusters. This flexibility ensures that the algorithm can adapt to the unique characteristics of different datasets, such as variations in data density and noise levels.

The total number of estimated distances, i.e., the number of distances between each dsp and all other dsp, equals the number of edges in a fully connected graph. This is calculated as n(n−1)/2, where n is the total number of dsp. The results from Eq. (6) are stored in a symmetric matrix of size n × n.

d_{ij} = \sqrt{\sum_{p=1}^{df} (dsp_{ip} - dsp_{jp})^2} \times \vartheta \quad \forall i, j \quad (6)

Once the distances are fine-tuned through the rounding and scaling process, they are stored in a symmetric matrix as in Eq. (7), representing the distance between each pair of dsp_i. This matrix serves as the primary input for the next phase of the clustering algorithm, where the LCM-based clustering is applied. Preparing the data in this structured format ensures the clustering process will be accurate and computationally efficient.

\text{diff}_{ij} = \begin{pmatrix} 0 & d_{12} & d_{13} & \cdots & d_{1n} \\ d_{12} & 0 & d_{23} & \cdots & d_{2n} \\ d_{13} & d_{23} & 0 & \cdots & d_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d_{1n} & d_{2n} & d_{3n} & \cdots & 0 \end{pmatrix} \quad (7)

Finally, a threshold-based assignment is performed. If the LCM of the distance between a dsp and a cluster satisfies the criterion defined in the next section, the dsp is assigned to that cluster. This approach ensures that clusters are formed in a meaningful way, enhancing both accuracy and computational feasibility.

B. Phase 2: Proposed Clustering Technique

The proposed clustering technique does not rely on the specific mobile phone model used for data collection. Instead, it processes the MFS uniformly across different devices, ensuring that device heterogeneity does not negatively impact the clustering process and that the clustering is based on the spatial relationship of samples rather than any discrepancies caused by different phone models. By focusing on the relative proximity of the sample points, the LCM clustering algorithm improves the robustness of the model, allowing it to handle the diversity of devices effectively. As a result, the heterogeneity of devices increases the accuracy of distance estimation without compromising the stability of the clusters. This method demonstrates universality by accommodating variations in sensor behaviour and ensuring consistent clustering performance across different devices. The next section elaborates on Phase 2 of the proposed technique.

1) Phase 2a: Least Common Multiple (LCM): The proposed algorithm begins by calculating the LCM, the smallest positive integer divisible by two or more given integers. It is calculated between each dsp_i and the existing clusters to assign each dsp to the relevant cluster. For two numbers h and m, the LCM can be computed using Eq. (8):

LCM(h, m) = \frac{|h \cdot m|}{\gcd(h, m)} \quad (8)

where gcd(h, m) indicates the greatest common divisor of h and m.

Algorithm 1: LCM Clustering Algorithm

for i in range(n) do
    if assigned[i] ≥ 0 then
        continue
    end
    found_cluster ← False
    for j in range(len(clusters)) do
        compute the LCM of cluster j
        compute the LCM of n_v with cluster j
        if the membership condition of Eq. (9) holds then
            assigned[i] ← j
            found_cluster ← True
            break
        end
    end
    if not found_cluster then
        new_cluster ← [i]
        for j in range(i + 1, n) do
            lcm ← diff[i, j]
            if lcm ≤ ϱ then
                append j to new_cluster
                assigned[j] ← len(clusters)
            end
        end
        clusters.append(new_cluster)
    end
end

Algorithm 1 illustrates the operation of the clustering technique. It begins by selecting the first value from Eq. (7) to form the first cluster. Subsequently, it evaluates each value of the distance vector from diff_ij. At each step, the algorithm


calculates and compares the LCM of existing clusters with the LCM of the new value nv. The decision to assign a sample to a cluster is governed by the membership condition mck_i as Eq. (9):

          1, if LCM_{nv+Ci} mod LCM_{Ci} == 0
mck_i =   1, if LCM_{Ci} ≠ 0                          (9)
          1, if LCM_{Ci} ≤ ϱ
          0, otherwise

Once the condition is satisfied, the algorithm adds nv to an existing cluster Ci. If the condition is not satisfied, a new cluster will be formed. This process continues until all samples in diff_ij are processed. The threshold ϱ prevents division by zero and typically defaults to 1. With these rules, the algorithm effectively groups similar nv into clusters. If nv does not satisfy the conditions, a cluster Cj is formed. It will represent the noise and can be eliminated from further analysis. However, we merge them by finding the cluster near Cj with the minimum distance to generalize the results.
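Read literally, the first branch of Eq. (9) is always satisfied, since extending a set can only multiply its LCM; a plausible reading is that nv joins a cluster when adding it does not change the cluster's LCM, i.e., nv already shares the cluster's factor structure. The sketch below implements that reading. The integer quantization of samples, the function names, and the default ϱ = 1 are assumptions made for illustration, not the authors' code:

```python
from math import gcd
from functools import reduce

def lcm_of(values):
    # LCM of a list of positive integers, via Euclid's gcd.
    return reduce(lambda a, b: a * b // gcd(a, b), values)

def membership(nv, cluster, rho=1):
    """One reading of Eq. (9): nv joins `cluster` when adding it leaves
    the cluster's LCM unchanged (nv already divides LCM(C_i)), or when
    the guard LCM(C_i) <= rho applies; 0 means a new cluster is opened."""
    lcm_c = lcm_of(cluster)
    if lcm_of(cluster + [nv]) == lcm_c:   # nv shares the factor structure
        return 1
    if lcm_c <= rho:                      # rho guards degenerate clusters
        return 1
    return 0

cluster = [4, 8, 16]                      # LCM = 16
print(membership(8, cluster))             # 8 divides 16 -> 1 (assign)
print(membership(7, cluster))             # 7 coprime to 16 -> 0 (new cluster)
```

In the full algorithm this test would be applied to each quantized difference in diff_ij, opening a new cluster whenever every existing cluster returns 0.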
2) Phase 2b1: Handling Repeating Values Pc: After clusters are formed, a post-processing step addresses the increase in sample size caused by repeated values nv appearing across different clusters. These repeats occur because nv values with similar LCMs tend to fall into the same Ci. To manage this, placement conditions Pc (see Eq. (10)) are used to assign these repeated values accurately to their respective clusters. First, the process compares the frequency of repeated values in clusters A and B. If a value appears more often in A than in B, it is moved from B to A. This check is applied to all clusters containing repeated values.

The second condition identifies nearby neighbours for each repeated value by calculating the differences between it and other points within the cluster. The algorithm prioritizes the first criterion unless it detects at least three neighbours close to the repeated value that fall within the threshold distance ς. For instance, if cluster A contains the repeated value 5 occurring 10 times and cluster B contains 5 appearing 8 times, then 5 will be assigned to cluster A based on the first criterion. However, if the second condition identifies at least three nearby neighbours for 5 in cluster B, then 5 will move from A to B.

       1, if the repeating value occurs more often in cluster A than in cluster B
Pc =   1, if the neighbours of the repeating value in cluster A ≥ ς              (10)
       0, otherwise

Moreover, if the number of neighbours is equivalent in both clusters, the minimum-value constraint of the nearest neighbour is incremented by one. This approach ensures fair treatment of repeating values, enhancing the coherence and robustness of the clustering methodology.

3) Phase 2b2: Handling Sub-Clusters Ic: Due to the influence of the indoor environment, certain MFS exhibit similar characteristics, leading to sub-clusters within clusters that reveal distinctive features. These sub-clusters, designated as independent sub-clusters Ic within the host cluster, exemplify the nested lists. Upon studying the characteristics of Ic, we identify shared characteristics with other clusters, a primary research focus discussed in Section I and illustrated in Figure 8. To address Ic, we calculate the LCM of all host clusters Ci excluding Ic and divide the LCM of Ic by that of Ci. If the result satisfies the conditions expressed in Eq. (11), then Ic will be assigned to the relevant host clusters with similar characteristics, where the range of ς is [0.5, 1.5].

       1, if LCM_{Ic} mod LCM_{Ci} == 0
Ic =   1, if LCM_{Ic} / LCM_{Ci} ∈ ς        (11)
       0, otherwise

This iterative process continues until the final cluster sets are obtained. Unlike traditional approaches that rely solely on distance-based metrics, the fundamental concept of this clustering strategy is to cluster autonomously based on the features of the magnetic field signals. Consequently, the algorithm autonomously determines the appropriate number of clusters and their shapes. These advantages distinguish our proposed clustering algorithm from conventional methods like DBSCAN, Agglomerative clustering, and K-means.

V. EVALUATION CRITERIA

Before proceeding with the evaluation, let's outline the process. First, we will evaluate the performance using established techniques. Second, we will utilize the clustered dataset to train a real-time indoor location prediction model.

A. State-of-the-Art Clustering Validity Index

The proposed clustering technique works unsupervised, without predefined labels, on arbitrary cluster shapes. Hence, according to the authors of [34]–[36], the Silhouette score (SS), Calinski-Harabasz Index (CH-I), and Davies-Bouldin Index (DB-I) are the three state-of-the-art metrics that can be used for arbitrary-shape clustering.

1) Silhouette score: It is mathematically defined as

SS = (1/dsp) Σ_{i=1}^{dsp} (b_i − a_i) / max(a_i, b_i)    (12)

where dsp denotes the total number of samples, a_i is the average distance between sample i and all other samples in the same cluster, and b_i is the average distance between sample i and all samples in the nearest neighbouring cluster.

2) Calinski-Harabasz Index: It is mathematically defined as

CHI(K) = [ Σ_{i=1}^{K} |C_i| d(v_i, v)² / (K − 1) ] / [ Σ_{i=1}^{K} Σ_{dsp∈C_i} d(dsp, v_i)² / (dsp − K) ]    (13)

where v_i is the centroid of the cluster C_i, and v is the global centroid of all the dsp in DS.
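The two indices above can be computed directly from Eqs. (12) and (13). The following plain-Python sketch on a toy 2-D dataset is illustrative only; in practice a library implementation such as scikit-learn's would normally be used:

```python
from math import dist
from collections import defaultdict

def silhouette_score(points, labels):
    """Eq. (12): mean of (b_i - a_i) / max(a_i, b_i) over all samples."""
    clusters = defaultdict(list)
    for i, lab in enumerate(labels):
        clusters[lab].append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(dist(points[i], points[j]) for j in idx) / len(idx)
                for other, idx in clusters.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

def calinski_harabasz(points, labels):
    """Eq. (13): between-cluster over within-cluster dispersion ratio."""
    n, dims = len(points), range(len(points[0]))
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    k = len(clusters)
    v = tuple(sum(p[j] for p in points) / n for j in dims)  # global centroid
    between = within = 0.0
    for pts in clusters.values():
        vi = tuple(sum(p[j] for p in pts) / len(pts) for j in dims)
        between += len(pts) * dist(vi, v) ** 2
        within += sum(dist(p, vi) ** 2 for p in pts)
    return (between / (k - 1)) / (within / (n - k))

# Two well-separated toy clusters: SS close to 1, CH-I in the hundreds.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labs = [0, 0, 0, 1, 1, 1]
print(silhouette_score(pts, labs), calinski_harabasz(pts, labs))
```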


3) Davies-Bouldin Index: This technique is mathematically defined as

DBI(K) = (1/K) Σ_{i=1}^{K} max_{j≠i} [ (avg(C_i) + avg(C_j)) / ξ(C_i, C_j) ]    (14)

where avg(C_i) is (1/|C_i|) Σ_{dsp∈C_i} d(dsp, v_i), v_i is the centroid of cluster C_i, and |C_i| is the number of samples in cluster C_i. ξ(C_i, C_j) = d(v_i, v_j), where v_i and v_j are the centroids of clusters C_i and C_j.

We evaluated our clustering algorithm using a small, real-time dataset collected on an iPhone 13 Pro Max. This assessment focused on the algorithm's ability to autonomously organize data into meaningful clusters, showcasing its adaptability and accuracy in determining the optimal number and shape of clusters without external guidance. The results, illustrated in Figure 3, demonstrate that the algorithm effectively groups data, validating its suitability for real-time applications.

B. Dataset and Study Area

In this study, the experimental environment was intentionally constrained to a small indoor space to evaluate the proof-of-concept of the proposed LCM clustering technique based on MFS. While this limited the immediate scalability to larger environments, it was necessary for controlled testing with a restricted set of RPs to demonstrate the feasibility of using geomagnetic intensity for indoor localization. Despite the limited initial environment, we extended our evaluation by testing the proposed clustering technique in three distinct research settings.

First, we utilized the raw and cleaned datasets from [1], which were collected in an open laboratory at the University of Catania using two different devices. This dataset covers a 60 m2 area, includes the MFS features Bx, By, and Bz, and involves a total of 27 RPs. Notably, this data was collected in an environment built of metal, including the walls and the roof, along with other ferromagnetic materials.

We gathered a new dataset from Building 13 at the University to assess the effectiveness and performance of the proposed clustering technique. It includes 50 strategically placed RPs, depicted in Figure 4. The highlighted red circle in the figure represents the RP in the corridor where MFS data were gathered. This building, particularly the first floor where data collection took place, is located above heavy-duty machinery used by the Civil Engineering Department on the ground floor, which generates strong electromagnetic interference. This interference resulted in varied geomagnetic intensities, contributing to irregular data spread. The new dataset demonstrates that the proposed LCM clustering technique can effectively manage magnetic field data influenced by environmental factors.

Subsequently, the proposed clustering technique was evaluated using benchmark datasets adopted from [37]. This dataset is well known in indoor positioning and navigation conferences, covering an area of 185 m2 in an indoor environment. Specifically, the evaluation focused on the corridor environment with 36 RPs comprising the MFS components Bx, By, and Bz.

The datasets from [1] were collected using the Huawei P8 Lite and the iPhone 13 Pro Max in an open lab setting. The MFS data from Building 13 were collected using a Redmi Note 11 Pro equipped with a "mag-akm0991" sensor operating at a sampling frequency of 100 Hz, and the dataset from [37] was collected using a Sony Xperia M2 mobile phone at a frequency of 10 Hz. The clustering technique was evaluated on all datasets, and the results are presented in Table I.

Fig. 3. Illustration of the Proposed Clustering Technique Using a Smaller Dataset for Enhanced Comprehension, where Bx, By, and Bz are the dimensions of the magnetic fields.

Fig. 4. Study Environment: Visual Depiction of the Second Floor of Building 13 at Unict. Data from this Building is used to Evaluate the Proposed Clustering Technique for Real-Time Localization.

VI. CLUSTER ASSESSMENT BASED ON STATE-OF-THE-ART EVALUATION INDICES

The following section will present the results obtained from the proposed clustering technique based on raw and processed datasets.

A. Cluster Representation

1) Noisy Dataset: The raw data collected from each device represents the noisy datasets, illustrated in Figure 5. Subsequently, these raw datasets were processed using the LCM clustering approach to assess the effectiveness of our proposed method in handling data noise. Figure 5a displays the clusters generated from the noisy dataset obtained from the Huawei


Fig. 5. Clustering Patterns on Noisy Raw Datasets from Open Lab: (a) Huawei Noisy Dataset; (b) iPhone Noisy Dataset.

Fig. 6. Clustering Patterns on Processed Datasets from Open Lab: (a) Huawei Clean Dataset; (b) iPhone Clean Dataset.

device. Figure 5b shows the clusters from the noisy dataset obtained from the iPhone. The LCM approach successfully formed balanced clusters with diverse shapes for both datasets.

2) Clean Data: The collected datasets underwent preprocessing to mitigate distortions, offsets, and noise in the magnetic field readings, resulting in clean and noise-free datasets. These clean datasets were then used to assess the effectiveness of the proposed clustering technique in improving clustering quality compared to the original noisy datasets. The resulting clusters from the processed datasets are depicted in Figure 6, visually representing how preprocessing influences clustering outcomes.

Furthermore, Figure 7 presents clusters derived from data collected within Building 13 of our university, consisting of over 25,000 MFS samples. Figure 7a shows clusters obtained from clean data, while Figure 7b displays clusters from the original noisy dataset.

B. Sub-Clusters with Shared Characteristics Ic

As discussed in Section IV-B3, Ic represents sub-clusters within clusters that share common characteristics with other clusters. These sub-clusters are identified when samples exhibit similar characteristics to those of another cluster, as determined by Eq. (11). Figure 8 illustrates the presence of Ic sub-clusters within the clusters of the Huawei and iPhone datasets, depicted in Figure 8a and Figure 8b, respectively.

Upon closer examination of the plots, distinct clusters are represented by different colours such as green, red, blue, orange, etc. Within each cluster, Ic sub-clusters are identifiable by sharing the same colour as the primary cluster they resemble. This observation underscores the existence of sub-clusters that share specific characteristics with other clusters while maintaining their unique cluster identities.

Furthermore, Figure 9 illustrates the relationship between the parameter ϑ and the number of clusters generated by the algorithm. For the clean datasets, depicted in Figure 9a, the algorithm produced 21 and 23 clusters. In contrast, Figure 9b shows the clustering results for the noisy dataset, where initially the number of clusters remained constant at 23. However, as the ϑ value increased, the number of clusters also increased.
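The sub-cluster test of Eq. (11) can be sketched as follows. The integer-scaled LCMs, the ratio reading of the second branch, and all names are illustrative assumptions; only the window ς = [0.5, 1.5] is taken from the text:

```python
from math import gcd
from functools import reduce

def lcm_of(values):
    # LCM of a list of positive integers, via Euclid's gcd.
    return reduce(lambda a, b: a * b // gcd(a, b), values)

def assign_subcluster(ic, hosts, window=(0.5, 1.5)):
    """Eq. (11) sketch: indices of host clusters that Ic may join.

    Ic joins a host when the host's LCM divides Ic's exactly, or when
    the ratio of the two LCMs falls inside the window sigma."""
    lcm_ic = lcm_of(ic)
    matches = []
    for i, host in enumerate(hosts):
        lcm_h = lcm_of(host)
        ratio = lcm_ic / lcm_h
        if lcm_ic % lcm_h == 0 or window[0] <= ratio <= window[1]:
            matches.append(i)
    return matches

hosts = [[3, 9], [5, 25], [4, 6]]           # LCMs: 9, 25, 12
print(assign_subcluster([18, 27], hosts))   # LCM(Ic) = 54, divisible by 9
```

Here the sub-cluster [18, 27] matches only the first host, since 54 is an exact multiple of 9 while the other ratios fall outside ς.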


Fig. 7. Clustering Patterns on the Dataset from Building 13: (a) Clean Dataset; (b) Noisy Dataset.

Fig. 8. Representation of Ic on both Datasets from Open Lab: (a) Ic of the Huawei Dataset; (b) Ic of the iPhone Dataset.

C. Fine Tuning the Distance Scale Factor ϑ

1) ϑ for Noisy Dataset: Experiments were conducted to find the optimal value of ϑ and to measure its impact on cluster quality using evaluation scores. From the experimental findings, ϑ values around 1 or 2 were identified as optimal, evident from the elbow point in Figure 10. Clusters generated with ϑ values of 1 or 2 exhibited favourable characteristics according to the evaluation metrics, showing an increase in the number of clusters with higher ϑ values.

For both noisy datasets, the number of clusters remained consistent at 23, accompanied by favourable evaluation results, as depicted in Figure 11. This consistency suggests that the algorithm effectively produced numerous clusters with precise evaluation scores at the optimal ϑ value. Figures 11a and 11b present the evaluation outcomes across different ϑ values, comprehensively analysing the algorithm's performance.

2) ϑ for Clean Dataset: For clean datasets, optimal values of ϑ were determined to fall within the range of 55 to 100 through extensive experimentation, as illustrated in Figure 12. The selection of ϑ plays a crucial role in adjusting the
TABLE I
CLUSTER EVALUATION MATRIX RESULTS ACROSS DIVERSE DATASETS, USING HETEROGENEOUS DEVICES AND STUDY ENVIRONMENTS, WITH THREE EVALUATION TECHNIQUES, I.E., SILHOUETTE SCORE [−1, 1], CALINSKI-HARABASZ INDEX (MAX), AND DAVIES-BOULDIN INDEX [0, 1], ON NOISY AND CLEAN DATASETS

Study Area      Mobile Devices      Data Sets    Total dsp   Silhouette Score   Calinski-Harabasz index   Davies-Bouldin index
Open Lab [1]    Huawei P8 Lite      Noisy Data   8920        0.83               229098.30                 0.21
Open Lab [1]    iPhone 13 Pro Max   Noisy Data   1882        0.91               229154.46                 0.11
Open Lab [1]    Huawei P8 Lite      Clean Data   8920        0.72               60817.75                  0.30
Open Lab [1]    iPhone 13 Pro Max   Clean Data   1882        0.91               94257.08                  0.11
Corridor [37]   Sony Xperia M2      Noisy Data   36795       0.99               72902.23                  0.21
Building 13     Redmi Note 11 Pro   Noisy Data   25000       0.84               1154545.40                0.18
Building 13     Redmi Note 11 Pro   Clean Data   25000       0.90               1474458.80                0.08
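The Davies-Bouldin values reported in Table I follow Eq. (14) of Section V-A; a compact plain-Python version on toy data (illustrative only, library implementations would normally be used) is:

```python
from math import dist
from collections import defaultdict

def davies_bouldin(points, labels):
    """Eq. (14): mean over clusters of the worst (avg_i + avg_j) / d(v_i, v_j)."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    dims = range(len(points[0]))
    cents, avgs = [], []
    for pts in groups.values():
        c = tuple(sum(p[j] for p in pts) / len(pts) for j in dims)
        cents.append(c)
        avgs.append(sum(dist(p, c) for p in pts) / len(pts))  # avg(C_i)
    k = len(cents)
    total = 0.0
    for i in range(k):
        total += max((avgs[i] + avgs[j]) / dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(davies_bouldin(pts, [0, 0, 0, 1, 1, 1]))  # well separated -> near 0
```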


data scale and optimizing clustering performance, considering the variability in data magnitudes during calibration. Figures 12a and 12b depict the variations in evaluation metrics as ϑ increases, demonstrating its impact on clustering quality. Conversely, Figure 13 illustrates the peculiar clustering behaviour observed in noisy datasets, emphasizing that a ϑ value of 2 consistently yields high evaluation scores.

Fig. 9. Exploration of the number of clusters vs. DSF ϑ for both clean and noisy datasets from Open Lab: (a) Clean Datasets; (b) Noisy Datasets.

Fig. 10. Close examination of DSF value ϑ vs. number of clusters for the noisy data as compared to Figure 9b.

Fig. 11. (a) Analysis of evaluation parameter variations across different DSF (ϑ) values in a noisy dataset. (b) Investigating the behavioural patterns of evaluation metrics with varying cluster numbers. These graphs provide insights derived from the noisy dataset of the Huawei P8 Lite, utilizing DSF values up to 4, enhancing our understanding of Figure 9b. The same outcomes will be observed using the iPhone dataset.

Table I summarizes the results of state-of-the-art evaluation techniques applied to various benchmark datasets. The achieved scores align with the predefined thresholds for each technique: a SS above 0.5 indicates robust clustering, with 1 being ideal; CH-I measures cluster dispersion relative to separation, favouring higher ratios; and DB-I assesses cluster compactness, with lower scores being preferable.

D. Assessment of Clustering Based on Evaluation Techniques

Table II and Figure 14 summarize the performance metrics of the models, including MAE, runtime, and error standard deviation. The results indicate that our proposed clustering technique achieves lower MAE and error standard deviation compared to DBSCAN and Agglomerative clustering, establishing it as a superior clustering method. The LCM method overcomes challenges such as irregular cluster shapes and the need for user-defined parameters, which are limitations of DBSCAN and Agglomerative clustering.

The datasets were evaluated using three intrinsic clustering evaluation techniques, SS, CH-I, and DB-I, which are specifically used to evaluate unsupervised clustering algorithms [34]–[36]. For the SS, which ranges from −1 to 1, a higher score closer to 1 indicates better-defined clusters, while −1 reflects poorly defined clusters.

Fig. 12. Exploration of evaluation techniques on the clean data set vs. DSF ϑ: (a) Huawei Dataset; (b) iPhone Dataset.

Fig. 13. Exploration of evaluation techniques on the noisy data set vs. DSF ϑ: (a) Huawei Dataset; (b) iPhone Dataset.

Our method consistently achieved favourable scores within this range, indicating better clustering performance.

The CH-I evaluates the dispersion ratio among clusters, with a higher value indicating superior clustering. Our proposed technique achieved significantly higher CH-I values compared to the other methods, demonstrating improved separation of clusters. Lastly, DB-I measures the distance between clusters, aiming for values closer to zero, with lower scores indicating more compact, better-separated clusters. Our clustering technique resulted in DB-I scores near zero, outperforming K-means, agglomerative clustering, and DBSCAN. Based on these three intrinsic evaluation metrics, our technique shows better performance in terms of clustering quality and cluster separation.

The proposed LCM clustering method exhibits comparable Mean Absolute Error (MAE) to K-means clustering, with a slightly higher standard deviation in positioning error. This difference arises primarily because the LCM method retains independent clusters that capture unique samples near existing clusters, as defined in Eq. (11). While these clusters help preserve important variations in the data, they can increase the error variance, leading to a higher standard deviation compared to K-means. In contrast, K-means exhibits more stable performance due to the clustering parameters derived from the LCM technique, which functions similarly to the elbow method. This refinement reduces the standard deviation and results in a faster runtime, giving K-means a slight edge in stability.

Despite this, the proposed LCM method consistently outperforms K-means in key evaluation metrics, as shown in Figure 14. The LCM-based technique is particularly effective in handling irregular data, even though the reference point distribution in this study had fewer irregularities. It performs better than traditional clustering methods like DBSCAN and Agglomerative Clustering, which show higher MAE and variability, particularly in environments with irregular sensor data or magnetic interference. While LCM incurs slightly more computational time, its enhanced ability to capture data variations makes it a more robust choice for real-time localization tasks.

VII. LOCALIZATION IN INDOOR ENVIRONMENT

The following section details the clustering technique's results on MFS data and describes how the resulting clusters will be integrated with MFS features to train artificial neural networks for enhanced indoor localization accuracy.

A. Experimental Setup in Real-Time Indoor Localization

To evaluate the proposed clustering algorithm's performance, it is essential to conduct real-time localization. Therefore, the characteristics-based least common multiple clustering technique with the basic artificial neural network


(a) Comparison based on SS   (b) Comparison based on CH-I   (c) Comparison based on DB-I

Fig. 14. Comparative Evaluation of the Proposed Clustering Technique, K-Means Clustering, and Agglomerative Clustering using (a) Silhouette Scores, (b) Calinski-Harabasz Index (CH-I), and (c) Davies-Bouldin Index (DB-I). Silhouette Scores range over [−1, 1], where 1 indicates optimal results. CH-I represents the dispersion ratio among defined clusters, with higher values indicating superior clustering. DB-I measures the distance between well-defined clusters, aiming for values close to 0 for optimal clustering performance, with a range of [0, 1].

TABLE II
COMPARISON OF RESULTS METRICS: THIS TABLE PRESENTS A COMPARATIVE ANALYSIS OF MEAN ABSOLUTE ERROR (MAE), PREDICTION TIME, AND STANDARD DEVIATION OF ERRORS OBTAINED FROM A SINGLE MODEL TRAINED ON FOUR DISTINCT DATASETS CLUSTERED USING VARIOUS TECHNIQUES, INCLUDING THE PROPOSED LCM CLUSTERING METHOD.

Clustering Technique         MAE    Run Time (s)   Standard Deviation of Error
LCM Clustering (Proposed)    0.06   0.14           0.04
DBSCAN                       0.10   0.05           0.10
Agglomerative                0.09   0.08           0.05
K-means                      0.06   0.13           0.03

underwent evaluation within the indoor environment of building 13 (Section V-B).

The MFS data from each RP are represented as m_i = {dsp_1, dsp_2, dsp_3, . . . , dsp_z} ∈ R^{z×df}, where z is the total number of data sample points dsp collected from each RP. A total of n_train = z_train × number of RPs = 25000 data sample points were collected across all RPs for training and clustering, while a total of n_test = z_test × number of RPs = 8000 were collected for testing at a rate of 100 Hz. All data were collected in a stationary manner. The RPs are depicted in Figure 4 as red dots.

After clustering, a new feature dimension was added to each MFS data row in the dataset, representing the cluster to which each data sample point belongs. Thus, the clustered dataset used for training can be represented as Eq. (15):

Cd_train = { {(dsp_1, cdsp_1), (dsp_2, cdsp_2), . . . , (dsp_{z_train}, cdsp_{z_train})}_{m_1}, . . . ,
             {(dsp_1, cdsp_1), (dsp_2, cdsp_2), . . . , (dsp_{z_train}, cdsp_{z_train})}_{m_{n_train}} }    (15)

Cd_train ∈ R^{n_train × (df + 1)}

with R^{n_train × (df + 1)} indicating the new dimensionality after clustering. Eq. (15) can also be written as:

Cd_train = {(dsp_i, cdsp_i) | i = 1, . . . , n_train}    (16)

Here dsp_i is the generic row of the feature vector used for training, and cdsp_i is the cluster to which dsp_i belongs. Different data sample points can belong to the same cluster.

Fig. 15. Illustration of the Real-Time Indoor Localization based on the Proposed Clustering Technique.

B. Model Training

For real-time localization, an ANN model was trained using Cd_train from Eq. (16) to evaluate location accuracy. Multiple experiments were conducted to determine the optimal configuration of the ANN model, focusing on node count reduction and hidden layer adjustments for multi-output regression. Mean Absolute Error (MAE) was used as the evaluation metric, with the dataset split into a 70% training set and a 30% test set. The chosen model architecture included four hidden layers, each with ten nodes, five inputs, and n output nodes. The training utilized the backpropagation algorithm with Adam optimization. Detailed parameters of the ANN model are provided in Table III. The evaluation results for indoor localization based on our proposed clustering technique are shown in Figure 15.
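In code, the construction of Cd_train in Eqs. (15) and (16) amounts to appending each sample's cluster id as one extra feature column, growing a row from df to df + 1 dimensions. A minimal sketch with invented sample values:

```python
def build_clustered_trainset(samples, cluster_ids):
    """Append the cluster label of each dsp as one extra feature column
    (Eq. (16)): each row grows from df to df + 1 dimensions."""
    assert len(samples) == len(cluster_ids)
    return [list(dsp) + [cid] for dsp, cid in zip(samples, cluster_ids)]

# Three MFS samples (Bx, By, Bz) and the clusters they were assigned to.
mfs = [(21.4, -3.2, 40.1), (21.6, -3.0, 39.8), (5.2, 12.7, 33.3)]
cd_train = build_clustered_trainset(mfs, [0, 0, 1])
print(cd_train[0])   # [21.4, -3.2, 40.1, 0] -> df + 1 = 4 features
```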


Our main objective is to assess the effectiveness of the proposed clustering technique. Therefore, we used a preprocessed dataset from building 13, focusing on demonstrating the impact of clustering using our characteristics-based LCM clustering technique. We compared these results with the same ANN model applied to datasets clustered with the K-means, Agglomerative, and DBSCAN methods. K-means and Agglomerative clustering require explicit input, whereas DBSCAN autonomously generates clusters, distinguishing them from our technique.

Figure 15 illustrates the location prediction outcomes using the ANN model based on clustered datasets from our proposed technique, while Figure 16 presents cumulative distribution values from the trained models. These figures demonstrate the superior performance of our clustering approach compared to the DBSCAN and Agglomerative methods. However, K-means shows slightly better performance due to its simplicity in cluster criteria. The proposed clustering method effectively determines cluster numbers and shapes, outperforming DBSCAN and providing competitive results against K-means.

Based on these findings, our proposed clustering technique significantly reduces MAE to 0.1 m for 90% of predictions, highlighting its superiority over other techniques.

Fig. 16. Comparison of Cumulative Distribution Functions (CDFs) of an ANN Model Using Clustered Data: The CDFs depict the performance evaluation of the ANN model with data clustered using various techniques, including the proposed LCM clustering along with DBSCAN, Agglomerative, and K-Means clustering.

TABLE III
PARAMETERS OF ANN MODEL

Parameters                Values
Trainable Parameters      522
Batch Size                32
No. of epochs             60
Early stopping            20
No. of Splits             20
Training Size             70%
Testing size              30%
Learning rate             0.001
Regularizer               L2
Optimizer                 Adam
Middle layer Activation   ReLU
Last layer Activation     Linear
Initializer               Glorot normal
Loss Function             Mean square error

VIII. DISCUSSION

A. Computational Cost

Our proposed approach has two parts: distance calculation and LCM computation to form the clustering.

1) Distance Matrix Computation: Let's perform the theoretical complexity analysis of the proposed clustering technique. We used the Euclidean distance among the pairs of samples in a dataset of size n, which requires n(n−1)/2 pairwise distances. However, each distance computation involves iterating over the d-dimensional features of the data. Hence, the time complexity of the distance matrix computation, including the pairwise calculations, is defined in Eq. (17):

O(n²)    (17)

2) LCM Clustering Computation: The LCM of two numbers is calculated using Eq. (8), and the complexity of calculating the LCM between two numbers is O(log(min(h, m))) due to the usage of the Euclidean algorithm for the gcd.

However, the LCM is calculated between distances, and for each pair the complexity of the LCM computation is O(log(d)), where d is the maximum distance in the symmetric matrix. It is calculated multiple times, potentially on the order of O(n²) comparisons for the distance-based clustering. Therefore, the total complexity for the LCM-based clustering is approximately as in Eq. (18):

O(n² log(d))    (18)

Finally, the overall complexity of the proposed clustering algorithm, combining both the distance matrix and clustering steps, can be defined as Eq. (19):

O(n² + n² log(d))    (19)

To further evaluate the empirical evidence of the computational cost, in terms of distance matrix and LCM clustering calculation time in seconds and memory usage in MB, we executed various datasets with the proposed LCM clustering technique and measured the actual runtime and memory usage. The outcome is presented in Table IV. It can be observed that the proposed technique takes more time and memory for bigger datasets. Hence, there is a research direction for further optimization of the technique.
Regularizer L2 B. Outliers Detection
Optimizer Adam LCM-based clustering approach works based on the unique
Middle layer Activation Relu
Last layer Activation Linear characteristics of magnetic field signals at each location.
Initializer Glorot normal These characteristics differ depending on the position and
Loss Function mean square error the surrounding environment. Even if a room is far from

Authorized licensed use limited to: Southeast University. Downloaded on March 03,2025 at 02:50:04 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3511261


TABLE IV
COMPUTATIONAL COST COMPARISON IN TERMS OF CALCULATION TIME AND MEMORY USAGE BY EACH STEP, WITH NOISY AND CLEAN DATA. CALCULATION TIME IN SECONDS AND MEMORY USAGE IN MB.

Study Area Mobile Devices Data Sets Distance Matrix Time Clustering Time Distance Matrix Memory Clustering Memory
Open Lab [1] Huawei P8 Lite Clean Data 0.80 0.47 607.04 27.14
Open Lab [1] iPhone 13 pro Max Clean Data 0.04 0.05 27.16 1.64
Open Lab [1] Huawei P8 Lite Noisy Data 0.77 0.46 606.79 28.39
Open Lab [1] iPhone 13 pro Max Noisy Data 0.04 0.05 27.23 1.57
Corridor [37] Sony Xperia M2 Noisy Data 6.97 6.59 498.93 270.54
Building 13 RedMi Note 11 pro Clean Data 5.43 5.43 477.40 253.33
Building 13 RedMi Note 11 pro Noisy Data 5.20 8.14 476.70 364.25
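Figures like the time and memory columns of Table IV can be collected with standard-library instrumentation. The sketch below is our own illustration (not the authors' benchmarking harness): it times a naive pairwise distance-matrix step and reports its peak allocation via `tracemalloc`.

```python
import time
import tracemalloc

def distance_matrix(points):
    """Naive O(n^2) pairwise Euclidean distance matrix over 3-D samples."""
    n = len(points)
    return [[sum((points[i][k] - points[j][k]) ** 2 for k in range(3)) ** 0.5
             for j in range(n)] for i in range(n)]

def profile(points):
    """Return (elapsed seconds, peak allocated memory in MB) for the step."""
    tracemalloc.start()
    t0 = time.perf_counter()
    distance_matrix(points)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak / (1024 * 1024)

# Synthetic stand-ins for magnetic field samples (x, y, z components)
samples = [(float(i), float(i % 7), float(i % 3)) for i in range(200)]
seconds, mb = profile(samples)
print(f"{seconds:.3f} s, {mb:.2f} MB")
```

Doubling the sample count roughly quadruples both figures here, mirroring the growth the table shows for the larger datasets.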

others, its magnetic signature will be unique and can be used to detect that it may belong to a separate, isolated region. In this way, magnetic field features act as a natural discriminator for identifying outliers in the form of far-away rooms. Furthermore, we used the Euclidean distance to generate the distance matrix for clustering. This matrix is used both to generate the clusters and to distinguish the outliers in our proposed technique. We used Eq. (11) to identify any significant gaps between data points and the nearby clusters. If a sample exhibits a markedly larger distance to the nearby cluster centroids, it is termed an independent or outlier cluster. Such clusters are then flagged for further analysis. However, if an independent cluster satisfies the conditions of Eq. (11), it is retained for further processing.
C. Shape-Agnostic Environments

The proposed clustering technique is shape-agnostic and designed to work based on the MFS distribution. The LCM-based clustering focuses on the distance between magnetic samples rather than on arbitrarily shaped rooms. Whether a room is diamond-shaped, irregular, or rectangular, the magnetic field at any specific location will have unique characteristics. The algorithm calculates the distances between points, converts these into integers, and applies the clustering conditions accordingly, ensuring that samples are organized into relevant clusters based on their magnetic profiles.

Moreover, the proposed technique has been evaluated in a controlled environment within the corridors of the considered building, utilizing 40 reference points. This setup allowed us to test the system's performance under specific conditions where the structure and the magnetic field variations are more predictable and easier to control. While this provided a suitable testbed for demonstrating the efficacy of the method, we acknowledge that this is a limited scenario. The number of reference points and the size of the area can be increased, and future work will explore the system's performance in larger and more complex environments, such as shopping malls or airports.
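The "convert distances into integers" step mentioned above can be sketched as follows; the scale factor of 100 is an assumption chosen for illustration rather than the paper's tuned distance scale factor.

```python
def to_integer_distances(distances, scale_factor=100):
    """Scale real-valued distances between magnetic samples and round
    them to integers so that LCM/GCD arithmetic applies.

    The floor of 1 avoids zero entries, which would break gcd-based
    LCM computation; the scale factor of 100 is an illustrative choice.
    """
    return [max(1, round(d * scale_factor)) for d in distances]

dists = [0.034, 0.517, 1.208]
print(to_integer_distances(dists))  # [3, 52, 121]
```

Because only distances are quantized, the room's outline never enters the computation, which is what makes the procedure shape-agnostic.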
D. DSF vs K-Means and Agglomerative Clustering

The distance scale factor (DSF) plays a crucial role in refining the clustering process, but it does not define the number of clusters. This distinction is what contributes to the progressiveness of our algorithm. Unlike K-means, which requires the user to define K upfront, our method adapts to the data's inherent structure, allowing the clustering process to evolve dynamically. The scale factor adjusts the distance metric, making the algorithm more flexible and capable of handling complex, heterogeneous datasets without a predefined number of clusters.

One of the significant advantages of this approach is its ability to handle variable data densities. K-means, for example, often struggles with datasets that contain clusters of varying shapes and densities because it assumes a uniform distribution of data points. This limitation is especially pronounced when the number of clusters k is improperly selected. Our algorithm, however, mitigates this issue by using the distance scale factor to adapt to the natural distribution of the data, making it more robust and effective in diverse scenarios.
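The contrast is visible in a toy example: K-means must be told k in advance, whereas a DSF-style rule lets the cluster count emerge from the data. The grouping rule below is a deliberately simplified illustration of that idea, not the paper's full algorithm.

```python
def dsf_group(values, scale_factor):
    """Group sorted 1-D samples: a sample joins the current cluster when
    its scaled, rounded gap to the previous sample is at most 1 integer
    unit -- a toy distance-scale-factor rule (illustrative only)."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if round((cur - prev) * scale_factor) <= 1:
            clusters[-1].append(cur)
        else:
            clusters.append([cur])  # a new cluster emerges from the data
    return clusters

data = [0.10, 0.12, 0.11, 2.00, 2.02, 5.50]
print(len(dsf_group(data, scale_factor=10)))  # 3 clusters, no k supplied
```

Raising the scale factor makes small gaps count as separations and yields more, tighter clusters; lowering it merges them, which is the flexibility the DSF provides.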


IX. LIMITATIONS AND FUTURE WORK

In the future, we aim to enhance the clustering process by integrating LCM with other benchmark techniques. This integration will enable us to mitigate uncertainties in indoor location prediction. Additionally, we recognize the need to address the following challenges:

• Scalability to dynamic environments: The current article discusses clustering in static indoor situations. Future research should look into the adaptability of the LCM-based clustering technique to dynamic indoor contexts where the properties of the environment and the data alter over time. Investigating the algorithm's performance in such settings might improve its practical usability.
• Robustness to noise and outliers: Outliers and noise are common obstacles in clustering tasks. It is critical to test the resilience of the LCM-based clustering technique against noisy data and outliers. Creating techniques to handle and limit the impact of noise and outliers on clustering results would improve the algorithm's overall performance.
• Exploration of parameter tuning: Certain parameters, such as the threshold for establishing clusters and the optimization criteria for merging clusters, are used in the LCM-based clustering technique. Exploring methods for automatic parameter adjustment and investigating the influence of different parameter values on clustering results may be beneficial for optimizing the algorithm's effectiveness.
• Real-world deployment and validation: Experiments and validations in real-world contexts, such as large-scale indoor spaces or varied building structures, would provide greater insight into the suggested technique's practical value. Field trials and user studies can aid in evaluating the algorithm's performance in real-world circumstances and in collecting input from users or stakeholders.
• Computational cost concern: The computational cost is a known trade-off for achieving high flexibility in handling irregular cluster shapes, as it might increase with a large dataset. Therefore, we can adopt dimensionality-reduction techniques such as PCA or t-SNE to reduce the number of features. Moreover, we can adopt parallelization or approximate methods such as locality-sensitive hashing or the Ball-Tree method to speed up the distance calculations.
• Generalizability concerns: It is critical to stress that our current technique is designed explicitly for grouping magnetic field datasets. As a result, the results' generalizability to other types of datasets may be limited. Future research should investigate the applicability of the LCM-based clustering technique to diverse data types.

By addressing these prospects and limits, we can progress the LCM-based clustering technique, improve its robustness, and broaden its usefulness in various domains that require precise indoor localization and localization analysis.
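As a toy illustration of the approximate-neighbor idea raised under the computational cost concern, a coarse bucketing scheme (our simplified stand-in for locality-sensitive hashing or a Ball-Tree, not part of the proposed technique) restricts candidate neighbors to adjacent cells instead of scanning all n−1 other samples:

```python
from collections import defaultdict

def bucket_neighbors(samples, cell=1.0):
    """Hash 1-D samples into coarse cells so that candidate neighbors are
    only sought in the same or an adjacent cell, pruning the full O(n^2)
    scan -- a toy stand-in for LSH / Ball-Tree style pruning."""
    buckets = defaultdict(list)
    for i, s in enumerate(samples):
        buckets[int(s // cell)].append(i)
    candidates = {}
    for i, s in enumerate(samples):
        c = int(s // cell)
        # Only the current cell and its two neighbors are inspected.
        candidates[i] = sorted(j for b in (c - 1, c, c + 1)
                               for j in buckets[b] if j != i)
    return candidates

cand = bucket_neighbors([0.1, 0.4, 5.2, 5.3, 9.9], cell=1.0)
print(cand[0])  # [1] -- only the nearby sample, not all n-1 others
```

The same pruning principle is what Ball-Tree and LSH implementations exploit at scale, trading exactness for far fewer distance computations.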
X. CONCLUSION

This study describes a unique clustering approach termed the characteristic-based least common multiple (LCM) clustering algorithm, which aims to improve the accuracy of indoor localization. The technique takes advantage of sample similarities to optimize the clustering effect. By considering the MFS properties, the method finds the overall number of clusters while identifying clusters with varied densities, forms, and sizes.

The LCM-based clustering procedure starts by calculating and building a symmetric matrix representing the differences between all data points. The method then clusters these locations by calculating the LCM of their attributes. If a new point does not fulfil the LCM-defined criteria, it is not allocated to an existing cluster; additional clusters are formed to accommodate it. Furthermore, the system identifies clusters of independent points with distinguishing characteristics. These independent clusters are blended with the neighbouring clusters based on the minimum-distance requirements.

Furthermore, the technique considers the case where separate clusters exist within larger clusters and portray themselves as distinct entities within the host cluster. This capacity distinguishes the proposed approach from existing strategies, since it allows for identifying and placing these independent clusters within the relevant clusters. This integration aids in the elimination of prediction ambiguity and enhances indoor localization accuracy. To correctly manage such scenarios, the algorithm integrates the conditions mentioned in Section IV-B3.

The benchmark datasets and the real-world setting were adopted for this study, utilizing three separate devices: the Huawei P8 Lite, the Redmi Note 11 Pro, and the iPhone 13 Pro Max. The clustering algorithm's performance was examined using two types of datasets: noisy datasets and clean datasets. In both situations, our proposed approach outperformed the competition by precisely defining the clusters.

The proposed technique was evaluated using the silhouette score (SS), the Calinski–Harabasz index (CH-I), and the Davies–Bouldin index (DB-I). Besides this, it was also evaluated in a real-time localization environment at building 13 of the university. Results show that the proposed technique effectively improves indoor localization, indicating that the method has potential for cluster identification in scenarios involving indoor localization based on MFS. Notably, our technique exhibits significant performance improvements when applied to datasets collected from stationary reference points, as opposed to those obtained during periods of ambulation. Furthermore, it gives relatively good accuracy in the test environment because of the handling of the magnetic signals based on their characteristics.

XI. ACKNOWLEDGMENT

This work has been supported by MUR project ARS01 00592 reCITY.

REFERENCES

[1] H. Rafique, D. Patti, M. Palesi, and V. Catania, “M-BMC: Exploration of magnetic field measurements for indoor positioning using mini-batch magnetometer calibration,” in 2023 First IEEE International Conference on Mobility: Operations, Services, and Technologies (MOST), pp. 55–61, IEEE, 2023.
[2] Y. Wang, J. Qian, M. Hassan, X. Zhang, T. Zhang, C. Yang, X. Zhou, and F. Jia, “Density peak clustering algorithms: A review on the decade 2014–2023,” Expert Systems with Applications, vol. 238, p. 121860, 2024.
[3] M. Mallik, S. Das, and C. Chowdhury, “Rank based iterative clustering (RBIC) for indoor localization,” Engineering Applications of Artificial Intelligence, vol. 121, p. 106061, 2023.
[4] H. Guo, H. Yin, S. Song, X. Zhu, and D. Ren, “Application of density clustering with noise combined with particle swarm optimization in UWB indoor positioning,” Scientific Reports, vol. 14, no. 1, p. 13121, 2024.
[5] E. Vinciguerra, E. Russo, M. Palesi, G. Ascia, and H. Rafique, “Improving LSTM-based indoor positioning via simulation-augmented geomagnetic field dataset,” in 2024 IEEE International Conference on Mobility, Operations, Services and Technologies (MOST), pp. 251–259, 2024.
[6] X. Yang, Y. Zhuang, F. Gu, M. Shi, X. Cao, Y. Li, B. Zhou, and L. Chen, “DeepWiPos: A deep learning-based wireless positioning framework to address fingerprint instability,” IEEE Transactions on Vehicular Technology, vol. 72, no. 6, pp. 8018–8034, 2023.
[7] Y. Huang, R. Ye, B. Yan, C. Zhang, and X. Zhou, “QSFDEW: A fingerprint positioning method based on quadtree search and fractal direction entropy weighting,” Wireless Networks, vol. 29, no. 1, pp. 437–448, 2023.
[8] D. Gufran and S. Pasricha, “CALLOC: Curriculum adversarial learning for secure and robust indoor localization,” in 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1–6, IEEE, 2024.
[9] A. Famili, T. O. Atalay, and A. Stavrou, “5GPS: 5G femtocell placement strategies for ultra-precise indoor localization in the metaverse,” in Proc. of 2024 International Conference on Computing, Networking and Communications: Wireless Communications, pp. 1132–1138, 2024.
[10] S. Sarfraz, V. Sharma, and R. Stiefelhagen, “Efficient parameter-free clustering using first neighbor relations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8934–8943, 2019.
[11] H. Rafique, D. Patti, M. Palesi, G. C. La Delfa, and V. Catania, “Optimization technique for indoor localization: A multi-objective approach to sampling time and error rate trade-off,” in 2023 IEEE Third International Conference on Signal, Control and Communication (SCC), pp. 01–06, 2023.
[12] A. S. Yaro, F. Maly, P. Prazak, and K. Malý, “Improved fingerprint-based localization based on sequential hybridization of clustering algorithms,” Emerging Science Journal, vol. 8, no. 2, pp. 394–406, 2024.
[13] Q. Zhang, X. Bo, J. Yang, and C. Cui, “Indoor positioning for the elderly based on fuzzy C-means clustering and KNN algorithm,” in 2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), pp. 236–240, IEEE, 2024.


[14] K. Berahmand, M. Mohammadi, A. Faroughi, and R. P. Mohammadiani, “A novel method of spectral clustering in attributed networks by constructing parameter-free affinity matrix,” Cluster Computing, pp. 1–20, 2022.
[15] L. Mavridis, N. Nath, and J. B. Mitchell, “Pfclust: a novel parameter
free clustering algorithm,” BMC bioinformatics, vol. 14, pp. 1–21, 2013.
[16] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and
computing, vol. 17, pp. 395–416, 2007.
[17] A. Anuwatkun, J. Sangthong, and S. Sang-Ngern, “A diff-based in-
door positioning system using fingerprinting technique and k-means
clustering algorithm,” in 2019 16th International Joint Conference on
Computer Science and Software Engineering (JCSSE), pp. 148–151,
IEEE, 2019.
[18] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al., “A density-based
algorithm for discovering clusters in large spatial databases with noise.,”
in kdd, vol. 96, pp. 226–231, 1996.
[19] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, “Optics:
Ordering points to identify the clustering structure,” ACM Sigmod
record, vol. 28, no. 2, pp. 49–60, 1999.
[20] G. Ouyang and K. Abed-Meraim, “Analysis of magnetic field measure-
ments for indoor positioning,” Sensors, vol. 22, no. 11, p. 4014, 2022.
[21] S.-G. Fang, D. Huang, X.-S. Cai, C.-D. Wang, C. He, and Y. Tang,
“Efficient multi-view clustering via unified and discrete bipartite graph
learning,” IEEE Transactions on Neural Networks and Learning Systems,
2023.
[22] Y. Xu, D. Huang, C.-D. Wang, and J.-H. Lai, “Deep image clustering
with contrastive learning and multi-scale graph convolutional networks,”
Pattern Recognition, vol. 146, p. 110065, 2024.
[23] J. Ren, Y. Wang, C. Niu, W. Song, and S. Huang, “A novel clustering
algorithm for wi-fi indoor positioning,” IEEE Access, vol. 7, pp. 122428–
122434, 2019.
[24] G. Junyi, S. Li, H. Xiongxiong, and C. Jiajia, “A novel clustering
algorithm by adaptively merging sub-clusters based on the normal-
neighbor and merging force,” Pattern Analysis and Applications, vol. 24,
no. 3, pp. 1231–1248, 2021.
[25] S. G. Lee and C. Lee, “Developing an improved fingerprint positioning
radio map using the k-means clustering algorithm,” in 2020 International
Conference on Information Networking (ICOIN), pp. 761–765, IEEE,
2020.
[26] T. Vo-Van, A. Nguyen-Hai, M. Tat-Hong, and T. Nguyen-Trang, “A
new clustering algorithm and its application in assessing the quality of
underground water,” Scientific Programming, vol. 2020, pp. 1–12, 2020.
[27] S. El Khediri, W. Fakhet, T. Moulahi, R. Khan, A. Thaljaoui, and
A. Kachouri, “Improved node localization using k-means clustering for
wireless sensor networks,” Computer Science Review, vol. 37, p. 100284,
2020.
[28] M. Singh and S. K. Soni, “Fuzzy based novel clustering technique by
exploiting spatial correlation in wireless sensor network,” Journal of
Ambient Intelligence and Humanized Computing, vol. 10, pp. 1361–
1378, 2019.
[29] X. Deng, D. Huang, D.-H. Chen, C.-D. Wang, and J.-H. Lai, “Strongly
augmented contrastive clustering,” Pattern Recognition, vol. 139,
p. 109470, 2023.
[30] Y. Zhang, Q. Shi, J. Zhu, J. Peng, and H. Li, “Time series clustering
with topological and geometric mixed distance,” Mathematics, vol. 9,
no. 9, p. 1046, 2021.
[31] J. Paparrizos and L. Gravano, “k-shape: Efficient and accurate clustering
of time series,” in Proceedings of the 2015 ACM SIGMOD international
conference on management of data, pp. 1855–1870, 2015.
[32] Z. Cui, X. Jing, P. Zhao, W. Zhang, and J. Chen, “A new subspace
clustering strategy for ai-based data analysis in iot system,” IEEE
Internet of Things Journal, vol. 8, no. 16, pp. 12540–12549, 2021.
[33] E. Zhu and R. Ma, “An effective partitional clustering algorithm based on
new clustering validity index,” Applied soft computing, vol. 71, pp. 608–
621, 2018.
[34] P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and
validation of cluster analysis,” Journal of computational and applied
mathematics, vol. 20, pp. 53–65, 1987.
[35] D. L. Davies and D. W. Bouldin, “A cluster separation measure,”
IEEE transactions on pattern analysis and machine intelligence, no. 2,
pp. 224–227, 1979.
[36] T. Caliński and J. Harabasz, “A dendrite method for cluster analysis,”
Communications in Statistics-theory and Methods, vol. 3, no. 1, pp. 1–
27, 1974.
[37] P. Barsocchi, A. Crivello, D. La Rosa, and F. Palumbo, “A multisource and multivariate dataset for indoor localization methods based on WLAN and geo-magnetic field fingerprinting,” in 2016 International Conference on Indoor Positioning and Indoor Navigation (IPIN), pp. 1–8, IEEE, 2016.
