A Characteristics-Based Least Common Multiple Algorithm to Optimize Magnetic Field-Based Indoor Localization
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2024.3511261. © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Abstract—Clustering is an unsupervised learning technique that groups data based on similarity criteria. Traditional methods like K-Means and agglomerative clustering often require predefined parameters, struggle with irregular cluster shapes, and fail to classify sub-cluster points in magnetic fingerprint-based indoor localization. This study proposes the Characteristics-Based Least Common Multiple (LCM) algorithm to address these challenges. This novel approach autonomously determines cluster number and shape while accurately classifying misclassified points based on characteristic similarities using the LCM. We evaluated the proposed technique using state-of-the-art metrics and tested it in magnetic field-based indoor localization scenarios. Comparisons were made with real-time and benchmark datasets, alongside traditional clustering methods. Results demonstrate that LCM significantly enhances localization accuracy, achieving a mean absolute error of 0.1 m.

Index Terms—Indoor localization, novel clustering technique, least common multiple, machine learning, data clustering, automatic clustering

I. INTRODUCTION

Advancements in IoT technology have led to a sharp rise in data-intensive applications, including indoor navigation and localization within various industries. Indoor localization involves determining an approximate or precise user location within an indoor environment. While the Global Positioning System (GPS) is highly effective outdoors, its signals cannot penetrate thick building structures or basements. Therefore, researchers are developing alternative approaches for indoor localization systems (ILS) by leveraging data from sensors such as Wireless Fidelity (WiFi), Radio Frequency Identification (RFID), Bluetooth Low Energy (BLE), and Magnetic Field Signals (MFS) [1], [2].

Traditionally, researchers have utilized angle-of-arrival or time-of-arrival signals from various access points for localization [3]. However, these methods have limitations, such as signal blockages and reduced long-range accuracy, which affect WiFi, RFID, and BLE signals, leading to localization issues. To address these challenges, researchers are turning to MFS for localization, introducing a concept known as "fingerprinting." This method involves collecting MFS data from reference points (RPs) throughout the study area and using this data for real-time localization at random test points (TPs) [4], [5].

Fingerprinting is a key focus in indoor localization research. To advance this area, [6] propose DeepWiPos, a deep learning-based fingerprinting framework that converts unstable fingerprints into stable values using a Fingerprint Spatial Gradient (FSG). They fuse FSG and RSS with an LSTM and attention module to reduce spatial ambiguity. DeepWiPos reduces average positioning errors by over 22% on Bluetooth and Wi-Fi. Moreover, the authors of [7] proposed QSFDEW, i.e., the Quadtree Search and Fractal Direction Entropy Weighting method, to improve indoor localization by using a quadtree algorithm to divide the area into grids and locate reference points efficiently. It quickly searches neighbouring quadrants and combines entropy weighting to enhance accuracy. Its key advantages are low complexity and high efficiency, making it an effective solution compared with other positioning algorithms. Furthermore, the author of [8] proposed CALLOC, a framework designed to resist adversarial attacks and variations across environments and devices. It uses an adaptive curriculum learning approach with a lightweight neural network for resilience, specifically tailored for resource-constrained devices to secure fingerprinting. Similarly, the authors of [9] optimize localization using the geometric relationship between a 5G-enabled Metaverse user and 5G femtocells, which impacts indoor localization accuracy. Despite these advantages, fingerprinting presents challenges, particularly the time and effort required to collect and maintain well-annotated location-based data across extensive study areas [10]–[13].

Factors such as perturbations and ferromagnetic materials can introduce complexities and inaccuracies in the MFS used in ILS. These issues can result in multiple predictions for distinct location points, especially under unpredictable test conditions characterized by non-linear relationships among variables. Machine learning techniques, including supervised and unsupervised methods, have emerged as valuable tools to tackle these challenges. Unsupervised clustering techniques such as K-means [14], PF Clust [15], graph-based spectral

H. Rafique*, D. Patti, M. Palesi and G. C. Iadelfa are with the Department of Electrical, Electronics and Computer Engineering, University of Catania, Catania, Italy. Email: [email protected] (corresponding author). The email addresses of coauthors are [email protected], [email protected], and [email protected]. Copyright (c) 20xx IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].
proximity to the cluster centre, making it an efficient and straightforward technique widely used across multiple disciplines [14], [23]. However, despite its simplicity, k-Means has limitations. It requires a user-defined number of clusters, k, and only produces spherical clusters, often converging to a local rather than a global minimum [24], [25].

Clusters in a dataset often exist in fuzzy intervals, making it challenging to determine the k value. Consequently, the algorithm needs multiple executions with different combinations of clustering numbers to identify the suitable number of clusters, balancing execution time and space. Techniques such as those proposed by [26], [27] address some limitations but still struggle with clustering arbitrary shapes due to their reliance on assigning data points to the closest centre. A graph-based spectral technique proposed by [16] recognizes arbitrary-shape clusters by grouping data points based on closely connected elements in the graph's structure. However, like k-Means, it requires input to define the cluster centres [28].

PF Clust (Parameter-Free Clustering) [15] automatically determines the number of clusters without user-defined parameters. PF Clust applies an agglomerative method to numerous randomly generated sub-datasets, assessing internal validity measures and setting thresholds based on the number of clusters. While PF Clust can outperform typical clustering methods, it is time-consuming and computationally expensive due to the need for repeated sampling and evaluation. In [24], researchers used an automated clustering technique with a force function to control the movement of objects. As the distance between two objects increases, the force between them decreases, causing each item to migrate towards its respective cluster centre. This force is calculated using a user-defined parameter called λ, which influences the number of clusters.

To address the issue of clustering shapes, density-based clustering techniques are well-known for their ability to detect arbitrary-shaped clusters without user-defined cluster centres. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [18] is the most commonly used density-based technique. It identifies arbitrarily shaped clusters by using a specific threshold value of the density, ϵ, determined by MinPts and the radius of the neighbourhood. The authors of [26] proposed a novel clustering technique based on epsilon-radius neighbours, automatically identifying the number and shape of clusters [24]. Spectral clustering [18] and kernel k-Means [17] also define clusters based on arbitrary shapes but require a predefined number of clusters. DBSCAN may merge clusters close to each other, a drawback addressed by its variant, OPTICS, which orders data points based on their density to reveal clusters of different densities [19].

Recent research suggests deep learning-based clustering, integrating deep learning and common clustering techniques to understand complex patterns in high-dimensional spaces. This approach is effective in tasks like image segmentation, document clustering, and speech recognition. In [29], the authors introduced a robust clustering technique using strong augmentation, achieving an overall accuracy of 76.5%. This method combines strong and weak augmentations, utilizing a backbone network with triply-shared weights for representation learning, and involves instance-level and cluster-level contrastive learning. Similarly, [21] proposes a multi-view deep clustering technique named unified and discrete bipartite graph learning to handle computational complexity, single-view graph learning, and embedding discretization using k-Means. Experiments demonstrate the efficiency and robustness of this approach on multi-view datasets. However, these techniques are more suitable for high-dimensional datasets and require large datasets for learning deep clustering parameters.

Time series-based clustering techniques are also explored in the literature. In [30], the authors address the geometrical and global topological properties of time series data with a method called "topological and geometric mixed distance." This technique extracts topology features using the persistence diagram of the point cloud of time series data and employs local geometric properties, specifically time correlations. Their results outperformed the baseline. Additionally, [31] presents a k-shape clustering technique for time series datasets, utilizing a normalized version of the cross-correlation measure to account for the shapes of time series. This method compares favourably with other clustering methods, demonstrating robustness and efficiency [32].

However, these techniques primarily focus on either clustering shapes or dealing with user-defined parameters. Deep and time series-based clustering methods handle high-dimensional complex patterns but overlook scenarios where data points with similar characteristics are physically distant and assigned to incorrect clusters. This omission is critical in indoor localization, where high accuracy is essential for effective resource management, improved security, safety, navigation, and overall user experience.

To overcome these limitations, we introduce a new clustering algorithm called the "Characteristic-Based Least Common Multiple Technique." This algorithm leverages the features of MFS to detect physically distant sample points, or sub-clusters, sharing similar characteristics and appropriately assigns them to their respective clusters.

The following section will provide a foundational understanding of the clustering techniques used in the proposed algorithm.

III. PRELIMINARY KNOWLEDGE

This section outlines the essential concepts and operations involved in the clustering process, focusing on how datasets are partitioned into distinct groups using a clustering algorithm.

Clustering aims to divide the data set (DS) into K clusters $C_K = \{c_1, c_2, c_3, \ldots, c_k\}$ [33]. Let's assume we have a data set $DS = \{dsp_1, dsp_2, \ldots, dsp_n\} \in \mathbb{R}^{n \times df}$, where $df$ is the dimension of each sample point $dsp_i$, and $n$ is the number of sample points. We denote the cluster to which a data point $dsp_i$ belongs as $c_{dsp_i}$.

The algorithm produces a $K \times n$ partition matrix $U(DS) = [U_{ki}]$, with $K = \{1, 2, \ldots, k\}$ and $i = \{1, 2, 3, \ldots, n\}$. Each element $U_{ki}$ indicates the membership degree of $dsp_i$ to the assigned cluster $c_{dsp_i}$. In hard
Fig. 2. Flowchart: Phase 1 initializes the data by computing the Euclidean distance (ED) on the input data, resulting in a symmetric matrix of distances. In
Phase 2, clusters are formed. Initially, the algorithm computes the LCM to create clusters. Ci represents the main clusters, while Cj denotes clusters with
single values. These are managed using ED to retain information, followed by post-processing to address repetitions Pc and clusters with shared characteristics
Ic to achieve the final clusters.
clustering, the degree of membership is either 1 or 0, indicating whether a dsp belongs to a particular cluster. The membership function $U_{ki}$ is defined using Eq. (1).

$$U_{ki} = \begin{cases} 1, & \text{if } dsp_i \in C_k \\ 0, & \text{otherwise} \end{cases} \quad (1)$$

To have proper clustering, the partitioning must satisfy Eq. (2), Eq. (3) and Eq. (4), where $(i \neq j,\ i, j = \{1, 2, 3, \ldots, k\})$ ensures that no sample belongs to more than one cluster.

$$DS = \bigcup_{i=1}^{k} C_i \quad (2)$$

$$C_i \neq \emptyset \quad (3)$$

$$C_i \cap C_j = \emptyset \quad (4)$$

In Euclidean space, the distance between two sample points $dsp_i$ and $dsp_j$ can be calculated as:

$$d_{ij} = \sqrt{\sum_{p=1}^{df} (dsp_{ip} - dsp_{jp})^2} \quad \forall i, j \quad (5)$$

This distance metric is commonly used to determine the similarity between sample points, which guides the formation of clusters based on proximity.

IV. PROPOSED CLUSTERING ALGORITHM

This section presents a novel clustering technique to improve indoor localization by grouping MFS based on their inherent properties. Unlike conventional clustering methods, which typically focus on central points or dense regions, our approach prioritizes the data's distinctive characteristics.

The proposed algorithm, illustrated in Figure 2, consists of two main phases. Phase 1 begins with calculating the Euclidean Distance (ED) between data points, as shown in Eq. (6), which produces a symmetric distance matrix as shown in Eq. (7). This matrix is the key input for the subsequent clustering process. In Phase 2, clusters are formed by calculating the LCM from the distance matrix using Eq. (8), and the assignment of data points to clusters is determined by Eq. (9). Here, $C_i$ represents the primary clusters, grouping data points that meet the specified criteria, while $C_j$ contains outliers representing noise or distant samples.

In addition to noise, there are two other categories of $C_j$: $P_c$, which denotes repeated samples that form sub-clusters $I_c$, and $I_c$, which are sub-clusters within larger clusters, containing samples with unique features distinct from their parent cluster. Post-processing steps, as described in Eq. (10) and Eq. (11), are then applied to ensure accurate and meaningful cluster assignments. This process preserves the integrity of the data by grouping it based on its natural characteristics.

The next sections will explore the methodology in detail, following the order outlined in the flowchart for clarity and better understanding.
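The two ingredients just described, the pairwise distance matrix of Phase 1 and the LCM computations of Phase 2, are straightforward to sketch. The snippet below is only an illustration, not the authors' implementation: Eqs. (6)-(8) are not reproduced in this excerpt, so the helper names (`distance_matrix`, `lcm_of`) and the scaling of real-valued distances to positive integers (rounding at one decimal here) are our own assumptions.

```python
import math
import numpy as np

def distance_matrix(points: np.ndarray) -> np.ndarray:
    """Phase 1: symmetric matrix of pairwise Euclidean distances, Eq. (5)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def lcm_of(values) -> int:
    """LCM of a collection of positive integers, as used throughout Phase 2."""
    out = 1
    for v in values:
        out = math.lcm(out, int(v))
    return out

# Toy data: three sample points with df = 3 magnetic-field dimensions.
pts = np.array([[45.0, 10.0, 30.0],
                [45.2, 10.1, 30.1],
                [20.0, 55.0,  5.0]])
D = distance_matrix(pts)
assert np.allclose(D, D.T) and np.allclose(np.diag(D), 0.0)  # symmetric, zero diagonal

# Integer LCM needs positive integers, so distances are scaled and rounded
# (the x10 scale is an assumption of this sketch, not taken from the paper).
ints = [max(1, int(round(float(d) * 10))) for d in D[np.triu_indices(len(pts), k=1)]]
print(lcm_of([4, 6]))  # -> 12
```

The divisibility test of Eq. (9) is then applied to LCMs of this kind; how distances are quantized before taking the LCM is a design choice the excerpt does not fully specify.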
calculates and compares the LCM of existing clusters with the LCM of the new value $n_v$. The decision to assign a sample to a cluster is governed by the membership condition $mc_{ki}$ as Eq. (9):

$$mc_{ki} = \begin{cases} 1, & \text{if } LCM_{n_v + C_i} \bmod LCM_{C_i} = 0 \text{ and } LCM_{C_i} \neq 0 \\ 1, & \text{if } LCM_{C_i} \leq \varrho \\ 0, & \text{otherwise} \end{cases} \quad (9)$$

Once the condition is satisfied, the algorithm adds $n_v$ to an existing cluster $C_i$. If the condition is not satisfied, a new cluster will be formed. This process continues until all samples in $diff_{ij}$ are processed. The threshold $\varrho$ prevents division by zero and typically defaults to 1. With these rules, the algorithm effectively groups similar $n_v$ into clusters. If $n_v$ does not satisfy the conditions, a cluster $C_j$ is formed. It represents the noise and can be eliminated from further analysis. However, we merge them by finding the cluster nearest to $C_j$ with the minimum distance to generalize the results.

2) Phase 2b1: Handling Repeating Values $P_c$: After clusters are formed, a post-processing step addresses the increase in sample size caused by repeated values $n_v$ appearing across different clusters. These repeats occur because $n_v$ values with similar LCM tend to fall into the same $C_i$. To manage this, placement conditions $P_c$ (see Eq. (10)) are used to assign these repeated values accurately to their respective clusters. First, the process compares the frequency of repeated values in clusters A and B. If a value appears more often in A than in B, it is moved from B to A. This check is applied to all clusters containing repeated values.

The second condition identifies nearby neighbours for each repeated value by calculating the differences between it and other points within the cluster. The algorithm prioritizes the first criterion unless it detects at least three neighbours close to the repeated value that fall within the threshold distance, $\varsigma$. For instance, if cluster A contains the repeated value 5 occurring 10 times and cluster B contains 5 appearing 8 times, then 5 will be assigned to cluster A based on the first criterion. However, if the second condition identifies at least three nearby neighbours for 5 in cluster B, then 5 will move from A to B.

$$P_c = \begin{cases} 1, & \text{if the repeating value occurs more often in cluster } A \text{ than in cluster } B \\ 1, & \text{if neighbours of the repeating value in cluster } A \geq \varsigma \\ 0, & \text{otherwise} \end{cases} \quad (10)$$

Moreover, if the number of neighbours is equivalent in both clusters, the minimum value constraint of the nearest neighbour is incremented by one. This approach ensures fair treatment of repeating values, enhancing the coherence and robustness of the clustering methodology.

3) Phase 2b2: Handling Sub-Clusters $I_c$: Due to the influence of the indoor environment, certain MFS exhibit similar characteristics, leading to sub-clusters within clusters that reveal distinctive features. These sub-clusters, designated as independent sub-clusters $I_c$ within the host cluster, exemplify the nested lists. Upon studying the characteristics of $I_c$, we identify shared characteristics with other clusters, a primary research focus discussed in Section I and illustrated in Figure 8. To address $I_c$, we calculate the LCM of all host clusters $C_i$ excluding $I_c$ and divide the LCM of $I_c$ by that of $C_i$. If this satisfies the conditions expressed in Eq. (11), then $I_c$ is assigned to the relevant host clusters with similar characteristics, where the range of $\varsigma$ is between [0.5, 1.5].

$$I_c = \begin{cases} 1, & \text{if } LCM_{I_c} \bmod LCM_{C_i} = 0 \\ 1, & \text{if } LCM_{I_c} \bmod LCM_{C_i} \in \varsigma \\ 0, & \text{otherwise} \end{cases} \quad (11)$$

This iterative process continues until the final cluster sets are obtained. Unlike traditional approaches that rely solely on distance-based metrics, the fundamental concept of this clustering strategy is to cluster autonomously based on the features of the magnetic field signals. Consequently, the algorithm autonomously determines the appropriate number of clusters and their shapes. These advantages distinguish our proposed clustering algorithm from conventional methods like DBSCAN, agglomerative clustering, and K-means.

V. EVALUATION CRITERIA

Before proceeding with the evaluation, let's outline the process. First, we will evaluate the performance using established techniques. Second, we will utilize the clustered dataset to train a real-time indoor location prediction model.

A. State-of-the-Art Clustering Validity Index

The proposed clustering technique works unsupervised, without predefined labels, for arbitrary shapes. Hence, according to the authors of [34]–[36], the Silhouette score (SS), Calinski-Harabasz Index (CH-I), and Davies-Bouldin Index (DB-I) are the three state-of-the-art metrics that can be used for arbitrary-shape clustering.

1) Silhouette score: It is mathematically defined as

$$SS = \frac{1}{dsp} \sum_{i=1}^{dsp} \frac{b_i - a_i}{\max(a_i, b_i)} \quad (12)$$

where $dsp$ denotes the total number of samples, $a_i$ is the average distance between sample $i$ and all other samples in the same cluster, and $b_i$ is the average distance between sample $i$ and all samples in the nearest neighbouring cluster.

2) Calinski-Harabasz Index: It is mathematically defined as

$$CHI(K) = \frac{\left[\sum_{i=1}^{K} |C_i| \, d(v_i, v)^2\right] / (K - 1)}{\left[\sum_{i=1}^{K} \sum_{dsp \in C_i} d(dsp, v_i)^2\right] / (dsp - K)} \quad (13)$$

where $v_i$ is the centroid of the cluster $C_i$, and $v$ is the global centroid of all the $dsp$ in $DS$.

3) Davies-Bouldin Index: This technique is mathematically defined as
Fig. 3. Illustration of the Proposed Clustering Technique Using a Smaller Dataset for Enhanced Comprehension, where Bx, By, and Bz are the Dimensions of the Magnetic Field.
Fig. 4. Study Environment: Visual Depiction of the Second Floor of Building 13 at Unict. Data from this Building is Used to Evaluate the Proposed Clustering Technique for Real-Time Localization.
TABLE I
Cluster Evaluation Matrix Results Across Diverse Datasets, Using Heterogeneous Devices and Study Environments, Using Three Evaluation Techniques, i.e., Silhouette Score [-1, 1], Calinski-Harabasz Index (max), and Davies-Bouldin Index [0, 1], on Noisy and Clean Datasets
Study Area Mobile Devices Data Sets Total dsp Silhouette Score Calinski-Harabasz index Davies-Bouldin index
Open Lab [1] Huawei P8 Lite Noisy Data 8920 0.83 229098.30 0.21
Open Lab [1] iPhone 13 Pro Max Noisy Data 1882 0.91 229154.46 0.11
Open Lab [1] Huawei P8 Lite Clean Data 8920 0.72 60817.75 0.30
Open Lab [1] iPhone 13 Pro Max Clean Data 1882 0.91 94257.08 0.11
Corridor [37] Sony Xperia M2 Noisy Data 36795 0.99 72902.23 0.21
Building 13 Red Mi note 11 pro Noisy Data 25000 0.84 1154545.40 0.18
Building 13 Red Mi note 11 pro Clean Data 25000 0.90 1474458.80 0.08
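All three indices reported in Table I have standard implementations. The following sketch shows how one such row of scores could be computed; it uses synthetic blobs rather than the paper's datasets, so the data and the random seed are purely illustrative.

```python
import numpy as np
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs standing in for clustered MFS samples
# (Bx, By, Bz), with their cluster labels already assigned.
X = np.vstack([rng.normal(0.0, 0.3, (100, 3)),
               rng.normal(5.0, 0.3, (100, 3))])
labels = np.array([0] * 100 + [1] * 100)

ss = silhouette_score(X, labels)          # in [-1, 1], higher is better
chi = calinski_harabasz_score(X, labels)  # unbounded above, higher is better
dbi = davies_bouldin_score(X, labels)     # >= 0, values near 0 are better
```

Note that scikit-learn's Davies-Bouldin score is bounded below by 0 but not above by 1; well-separated clusterings simply tend to land close to 0, as in the table.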
[Figures: "Number of Clusters vs. Distance Scale Factor" and "Normalized Evaluation Criteria vs. Distance Scale Factor / Number of Clusters", plotted for the Huawei P8 Lite and iPhone 13 Pro Max datasets using the Silhouette Score, Calinski-Harabasz Index, and Davies-Bouldin Index.]

technique: a SS above 0.5 indicates robust clustering, with
[Figures: "Normalized Evaluation Criteria vs. Distance Scale Factor" panels showing the Silhouette Score, Calinski-Harabasz Index, and Davies-Bouldin Index.]
Fig. 14. Comparative Evaluation of the Proposed Clustering Technique, K-Means Clustering, and Agglomerative Clustering based on (a) Silhouette Scores, (b) Calinski-Harabasz Index (CH-I), and (c) Davies-Bouldin Index (DB-I). Silhouette Scores range over [-1, 1], where 1 indicates optimal results. CH-I represents the dispersion ratio among defined clusters, with higher values indicating superior clustering. DB-I measures the distance between well-defined clusters, with values close to 0 in its [0, 1] range indicating optimal clustering performance.
TABLE II
Comparison of Results Metrics: This table presents a comparative analysis of Mean Absolute Error (MAE), prediction time, and standard deviation of errors obtained from a single model trained on four distinct datasets clustered using various techniques, including the proposed LCM clustering method.
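The three metrics in Table II are cheap to compute once a trained model emits position estimates. The sketch below uses hand-made coordinates in place of a real predictor, and it takes "MAE" to mean the mean Euclidean position error; whether the paper instead averages per-axis absolute errors is not stated in this excerpt.

```python
import time
import numpy as np

def localization_metrics(true_xy: np.ndarray, pred_xy: np.ndarray):
    """Per-sample Euclidean position errors, summarized as (mean, std)."""
    errors = np.linalg.norm(true_xy - pred_xy, axis=1)
    return errors.mean(), errors.std()

# Stand-in ground-truth and predicted positions in metres (illustrative only).
true_xy = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
pred_xy = true_xy + np.array([[0.1, 0.0], [0.0, -0.1], [0.1, 0.1]])

t0 = time.perf_counter()
mae, std = localization_metrics(true_xy, pred_xy)
pred_time = time.perf_counter() - t0  # evaluation time in seconds

print(round(mae, 3))  # -> 0.114
```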
VIII. DISCUSSION

A. Computational Cost
14
TABLE IV
Computational cost comparison in terms of computation time and memory usage for each step with noisy and clean data. Calculation times in seconds and memory usage in MB.
Study Area     Mobile Device       Data Set     Distance Matrix Time   Clustering Time   Distance Matrix Memory   Clustering Memory
Open Lab [1]   Huawei P8 Lite      Clean Data   0.80                   0.47              607.04                   27.14
Open Lab [1]   iPhone 13 Pro Max   Clean Data   0.04                   0.05              27.16                    1.64
Open Lab [1]   Huawei P8 Lite      Noisy Data   0.77                   0.46              606.79                   28.39
Open Lab [1]   iPhone 13 Pro Max   Noisy Data   0.04                   0.05              27.23                    1.57
Corridor [37]  Sony Xperia M2      Noisy Data   6.97                   6.59              498.93                   270.54
Building 13    Redmi Note 11 Pro   Clean Data   5.43                   5.43              477.40                   253.33
Building 13    Redmi Note 11 Pro   Noisy Data   5.20                   8.14              476.70                   364.25
others, its magnetic signature will be unique and can be used to detect that it may belong to a separate, isolated region. In this way, magnetic field features act as a natural discriminator for identifying outliers in the form of far-away rooms. Furthermore, we used the Euclidean distance to generate the distance matrix for clustering. This matrix is used to generate the clusters and to distinguish the outliers in our proposed technique. We used Eq. (11) to identify any significant gaps between data points and the nearby clusters. If a sample exhibits a significantly larger distance to the nearby cluster centroids, it is labelled as an independent or outlier cluster, and such clusters are flagged for further analysis. However, if an independent cluster satisfies the conditions of Eq. (11), it is retained for further processing.

C. Shape-Agnostic Environments

The proposed clustering technique is shape-agnostic and designed to work based on the MFS distribution. The LCM-based clustering focuses on the distances between magnetic samples rather than on arbitrarily shaped rooms. Whether a room is diamond-shaped, irregular, or rectangular, the magnetic field at any specific location will have unique characteristics. The algorithm calculates distances between points, converts these into integers, and applies the clustering conditions accordingly, ensuring that samples are organized into relevant clusters based on their magnetic profiles.

Moreover, the proposed technique has been evaluated in a controlled environment within the corridors of the considered building, utilizing 40 reference points. This setup allowed us to test the system's performance under specific conditions where the structure and the magnetic field variations are more predictable and easier to control. While this provided a suitable testbed for demonstrating the efficacy of the method, we acknowledge that this is a limited scenario. The number of reference points and the size of the area can be increased, and future work will explore the system's performance in larger and more complex environments, such as shopping malls or airports.

D. DSF vs K-Means and Agglomerative Clustering

The distance scale factor plays a crucial role in refining the clustering process, but it does not define the number of clusters. This distinction is what contributes to the progressiveness of our algorithm. Unlike K-Means, which requires the user to define K upfront, our method adapts to the data's inherent structure, allowing the clustering process to evolve dynamically. The scale factor adjusts the distance metric, making the algorithm more flexible and capable of handling complex, heterogeneous datasets without a predefined number of clusters.

One of the significant advantages of this approach is its ability to handle variable data densities. K-Means, for example, often struggles with datasets that contain clusters of varying shapes and densities because it assumes a uniform distribution of data points. This limitation is especially pronounced when the number of clusters k is improperly selected. Our algorithm, however, mitigates this issue by using the distance scale factor to adapt to the natural distribution of the data, making it more robust and effective in diverse scenarios.

IX. LIMITATIONS AND FUTURE WORK

In the future, we aim to enhance the clustering process by integrating LCM with other benchmark techniques. This integration will enable us to mitigate uncertainties in indoor location prediction. Additionally, we recognize the need to address the following challenges:
• Scalability to dynamic environments: The current article discusses clustering in static indoor situations. Future research should look into the adaptability of the LCM-based clustering technique to dynamic indoor contexts where the properties of the environment and data alter over time. Investigating the algorithm's performance in such settings might improve its practical usability.
• Robustness to noise and outliers: Outliers and noise are common obstacles in clustering tasks. It is critical to test the resilience of the LCM-based clustering technique against noisy data and outliers. Developing techniques to handle and limit the impact of noise and outliers on clustering results would improve the algorithm's overall performance.
• Exploration of parameter tuning: Certain parameters, such as the threshold for establishing clusters and the criteria for merging clusters, are used in the LCM-based clustering technique. Exploring methods for automatic parameter adjustment and investigating the influence of different parameter values on clustering results may be beneficial for optimizing the algorithm's effectiveness.
• Real-world deployment and validation: Experiments and validations in real-world contexts, such as large-scale indoor spaces or varied building structures, would provide greater insight into the suggested technique's practical value. Field trials and user studies can aid in evaluating
the algorithm's performance in real-world circumstances and collecting input from users or stakeholders.
• Computational cost concern: The computational cost is a known trade-off for achieving high flexibility in handling irregular cluster shapes, as it may increase with large datasets. Therefore, we can adopt dimensionality reduction techniques such as PCA or t-SNE to reduce the number of features. Moreover, we can adopt parallelization or approximate methods such as locality-sensitive hashing or the Ball-Tree method to speed up distance calculations.
• Generalizability concerns: It is critical to stress that our current technique is designed explicitly for grouping magnetic field datasets. As a result, the generalizability of the results to other types of datasets may be limited. Future research should investigate the applicability of the LCM-based clustering technique to diverse data types.

By addressing these prospects and limits, we can progress the LCM-based clustering technique, improve its robustness, and broaden its usefulness in various domains that require precise indoor localization and localization analysis.

X. CONCLUSION

This study describes a unique clustering approach termed the characteristics-based least common multiple (LCM) clustering algorithm, which attempts to improve the accuracy of indoor localization. The technique takes advantage of sample similarities to optimize the clustering effect. By considering the MFS properties, the method finds the overall number of clusters while identifying clusters with varied densities, forms, and sizes.

The LCM-based clustering procedure starts with calculating and building a symmetric matrix representing the differences between all data points. The method then clusters these locations by calculating the LCM of their attributes. If a new point does not fulfil the LCM-defined criteria, it is not allocated to an existing cluster; additional clusters are formed to accommodate it. Furthermore, the system identifies clusters of independent points with distinguishing characteristics. These independent clusters are blended with the neighbouring clusters based on the minimum distance requirements.

Furthermore, the technique considers the case where separate clusters exist within larger clusters and portray themselves as distinct entities within the host cluster. This capacity distinguishes the proposed approach from existing strategies since it allows for identifying and placing these independent clusters within the relevant clusters. This integration aids in the elimination of prediction ambiguity and enhances indoor localization accuracy. To correctly manage such scenarios, the algorithm integrates the conditions mentioned in Section IV-B3.

The benchmark datasets and the real-world setting were adopted for this study, utilizing three separate devices: the Huawei P8 Lite, the Redmi Note 11 Pro, and the iPhone 13 Pro Max. The clustering algorithm's performance was examined using two types of datasets: noisy datasets and clean datasets. In both situations, our proposed approach outperformed the competition by precisely defining the clusters.

The proposed technique was evaluated using the SS, the CH-I, and the DB-I. Besides this, it was also evaluated in a real-time localization environment at Building 13 of the university. Results show that the proposed technique effectively improves indoor localization, indicating that the method has potential for cluster identification in scenarios involving MFS-based indoor localization. Notably, our technique exhibits significant performance improvements when applied to datasets collected from stationary reference points, as opposed to those obtained during periods of ambulation. Furthermore, it gives relatively good accuracy in the test environment because of the handling of the magnetic signals based on their characteristics.

XI. ACKNOWLEDGMENT

This work has been supported by MUR project ARS01_00592 reCITY.

REFERENCES

[1] H. Rafique, D. Patti, M. Palesi, and V. Catania, "M-BMC: Exploration of magnetic field measurements for indoor positioning using mini-batch magnetometer calibration," in 2023 First IEEE International Conference on Mobility: Operations, Services, and Technologies (MOST), pp. 55–61, IEEE, 2023.
[2] Y. Wang, J. Qian, M. Hassan, X. Zhang, T. Zhang, C. Yang, X. Zhou, and F. Jia, "Density peak clustering algorithms: A review on the decade 2014–2023," Expert Systems with Applications, vol. 238, p. 121860, 2024.
[3] M. Mallik, S. Das, and C. Chowdhury, "Rank based iterative clustering (RBIC) for indoor localization," Engineering Applications of Artificial Intelligence, vol. 121, p. 106061, 2023.
[4] H. Guo, H. Yin, S. Song, X. Zhu, and D. Ren, "Application of density clustering with noise combined with particle swarm optimization in UWB indoor positioning," Scientific Reports, vol. 14, no. 1, p. 13121, 2024.
[5] E. Vinciguerra, E. Russo, M. Palesi, G. Ascia, and H. Rafique, "Improving LSTM-based indoor positioning via simulation-augmented geomagnetic field dataset," in 2024 IEEE International Conference on Mobility, Operations, Services and Technologies (MOST), pp. 251–259, 2024.
[6] X. Yang, Y. Zhuang, F. Gu, M. Shi, X. Cao, Y. Li, B. Zhou, and L. Chen, "DeepWiPos: A deep learning-based wireless positioning framework to address fingerprint instability," IEEE Transactions on Vehicular Technology, vol. 72, no. 6, pp. 8018–8034, 2023.
[7] Y. Huang, R. Ye, B. Yan, C. Zhang, and X. Zhou, "QSFDEW: A fingerprint positioning method based on quadtree search and fractal direction entropy weighting," Wireless Networks, vol. 29, no. 1, pp. 437–448, 2023.
[8] D. Gufran and S. Pasricha, "Calloc: Curriculum adversarial learning for secure and robust indoor localization," in 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1–6, IEEE, 2024.
[9] A. Famili, T. O. Atalay, and A. Stavrou, "5GPS: 5G femtocell placement strategies for ultra-precise indoor localization in the metaverse," in Proc. of 2024 International Conference on Computing, Networking and Communications: Wireless Communications, pp. 1132–1138, 2024.
[10] S. Sarfraz, V. Sharma, and R. Stiefelhagen, "Efficient parameter-free clustering using first neighbor relations," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8934–8943, 2019.
[11] H. Rafique, D. Patti, M. Palesi, G. C. La Delfa, and V. Catania, "Optimization technique for indoor localization: A multi-objective approach to sampling time and error rate trade-off," in 2023 IEEE Third International Conference on Signal, Control and Communication (SCC), pp. 01–06, 2023.
[12] A. S. Yaro, F. Maly, P. Prazak, and K. Malỳ, "Improved fingerprint-based localization based on sequential hybridization of clustering algorithms," Emerging Science Journal, vol. 8, no. 2, pp. 394–406, 2024.
[13] Q. Zhang, X. Bo, J. Yang, and C. Cui, "Indoor positioning for the elderly based on fuzzy c-means clustering and KNN algorithm," in 2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), pp. 236–240, IEEE, 2024.
[14] K. Berahmand, M. Mohammadi, A. Faroughi, and R. P. Mohammadiani, "A novel method of spectral clustering in attributed networks by constructing parameter-free affinity matrix," Cluster Computing, pp. 1–20, 2022.
[15] L. Mavridis, N. Nath, and J. B. Mitchell, "PFClust: A novel parameter free clustering algorithm," BMC Bioinformatics, vol. 14, pp. 1–21, 2013.
[16] U. Von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, pp. 395–416, 2007.
[17] A. Anuwatkun, J. Sangthong, and S. Sang-Ngern, "A diff-based indoor positioning system using fingerprinting technique and k-means clustering algorithm," in 2019 16th International Joint Conference on Computer Science and Software Engineering (JCSSE), pp. 148–151, IEEE, 2019.
[18] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al., "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proc. KDD, vol. 96, pp. 226–231, 1996.
[19] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, "OPTICS: Ordering points to identify the clustering structure," ACM SIGMOD Record, vol. 28, no. 2, pp. 49–60, 1999.
[20] G. Ouyang and K. Abed-Meraim, "Analysis of magnetic field measurements for indoor positioning," Sensors, vol. 22, no. 11, p. 4014, 2022.
[21] S.-G. Fang, D. Huang, X.-S. Cai, C.-D. Wang, C. He, and Y. Tang, "Efficient multi-view clustering via unified and discrete bipartite graph learning," IEEE Transactions on Neural Networks and Learning Systems, 2023.
[22] Y. Xu, D. Huang, C.-D. Wang, and J.-H. Lai, "Deep image clustering with contrastive learning and multi-scale graph convolutional networks," Pattern Recognition, vol. 146, p. 110065, 2024.
[23] J. Ren, Y. Wang, C. Niu, W. Song, and S. Huang, "A novel clustering algorithm for Wi-Fi indoor positioning," IEEE Access, vol. 7, pp. 122428–122434, 2019.
[24] G. Junyi, S. Li, H. Xiongxiong, and C. Jiajia, "A novel clustering algorithm by adaptively merging sub-clusters based on the normal-neighbor and merging force," Pattern Analysis and Applications, vol. 24, no. 3, pp. 1231–1248, 2021.
[25] S. G. Lee and C. Lee, "Developing an improved fingerprint positioning radio map using the k-means clustering algorithm," in 2020 International Conference on Information Networking (ICOIN), pp. 761–765, IEEE, 2020.
[26] T. Vo-Van, A. Nguyen-Hai, M. Tat-Hong, and T. Nguyen-Trang, "A new clustering algorithm and its application in assessing the quality of underground water," Scientific Programming, vol. 2020, pp. 1–12, 2020.
[27] S. El Khediri, W. Fakhet, T. Moulahi, R. Khan, A. Thaljaoui, and A. Kachouri, "Improved node localization using k-means clustering for wireless sensor networks," Computer Science Review, vol. 37, p. 100284, 2020.
[28] M. Singh and S. K. Soni, "Fuzzy based novel clustering technique by exploiting spatial correlation in wireless sensor network," Journal of Ambient Intelligence and Humanized Computing, vol. 10, pp. 1361–1378, 2019.
[29] X. Deng, D. Huang, D.-H. Chen, C.-D. Wang, and J.-H. Lai, "Strongly augmented contrastive clustering," Pattern Recognition, vol. 139, p. 109470, 2023.
[30] Y. Zhang, Q. Shi, J. Zhu, J. Peng, and H. Li, "Time series clustering with topological and geometric mixed distance," Mathematics, vol. 9, no. 9, p. 1046, 2021.
[31] J. Paparrizos and L. Gravano, "k-Shape: Efficient and accurate clustering of time series," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1855–1870, 2015.
[32] Z. Cui, X. Jing, P. Zhao, W. Zhang, and J. Chen, "A new subspace clustering strategy for AI-based data analysis in IoT system," IEEE Internet of Things Journal, vol. 8, no. 16, pp. 12540–12549, 2021.
[33] E. Zhu and R. Ma, "An effective partitional clustering algorithm based on new clustering validity index," Applied Soft Computing, vol. 71, pp. 608–621, 2018.
[34] P. J. Rousseeuw, "Silhouettes: A graphical aid to the interpretation and validation of cluster analysis," Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987.
[35] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 2, pp. 224–227, 1979.
[36] T. Caliński and J. Harabasz, "A dendrite method for cluster analysis," Communications in Statistics-Theory and Methods, vol. 3, no. 1, pp. 1–27, 1974.
[37] P. Barsocchi, A. Crivello, D. La Rosa, and F. Palumbo, "A multisource and multivariate dataset for indoor localization methods based on WLAN and geo-magnetic field fingerprinting," in 2016 International Conference on Indoor Positioning and Indoor Navigation (IPIN), pp. 1–8, IEEE, 2016.