jurnalku
Ulumuddin1
Abstract
Social media and modern approaches make it easier for offenders to commit crimes. This
paper explores a machine learning approach to crime prediction that classifies each type
of crime using K-Means optimized with Particle Swarm Optimization (PSO), based on data the
researcher collected from a past period. The clustering uses light, medium, and severe crime
categories, which contain light = 46, medium = 74, and severe = 30 records. According to the
experimental results, K-Means optimized with PSO yields 0.122387 using the SSE measure, while
standard K-Means yields 0.885.
Keywords:
Crime Prediction, K-Means, PSO
1. Introduction
Social media and modern approaches make it easier for offenders to commit
crimes. Social media, as a modern media form, has significant criminogenic potential,
influencing public perceptions of crime and contributing to the fear of crime. It shapes moral
thinking and stereotypes, impacting societal behavior and attitudes toward justice. The
interaction between crime and society is exacerbated by the dissemination of information
through social media, which can lead to the escalation of criminalization and victimization,
necessitating a criminological assessment of media representations of crime and justice.
Crime analysis and prediction is a systematic method for classifying and
examining crime patterns. A criminal offense, or crime, is a detrimental act that affects
not only individuals but also the community. It is essential to determine the
factors behind the possible occurrence of crimes and to find optimized ways to reduce
crime [19].
Traditional statistical methods in crime prediction often struggle due to the complex and
dynamic nature of crime, which involves numerous interacting factors. These methods, like
linear regression, often oversimplify relationships between variables, failing to capture the
intricate dynamics of crime patterns. Additionally, they may not effectively handle the
volume and variety of modern data sources, such as social media and real-time sensor
data. Furthermore, traditional models can perpetuate existing biases in historical data,
leading to skewed predictions and potentially discriminatory policing practices, and their
lack of adaptability hinders their ability to keep pace with evolving crime trends [18].
2. Related Works
Crime analysis and prediction is a growing research area with various novel techniques
including Machine Learning and Deep Learning. Machine learning has become a pivotal
tool in crime analysis and prediction, offering law enforcement agencies the ability to
forecast criminal activities and develop effective prevention strategies. Various machine
learning techniques have been explored to identify patterns and trends in crime data,
enhancing the predictive capabilities of these models. The integration of these technologies
into public safety frameworks is crucial for improving crime prevention measures and
ensuring community safety. These methods are extensively used to forecast crime by
identifying patterns and variables associated with criminal activities. They provide insights
that help law enforcement agencies develop strategies to deter crime [10].
A research paper investigates crime analysis and prediction using machine learning
techniques, specifically employing multi-year datasets from Chicago. It evaluates five
models: Polynomial Regression, Random Forest, LightGBM, XGBoost, and Linear
Regression. The findings indicate that XGBoost Regression outperforms the others in
prediction accuracy and data-generalization capabilities, while conventional models like
Polynomial and Linear Regression struggle with the complexity of crime data. The study
highlights the importance of selecting appropriate algorithms for effective crime prediction
and suggests avenues for further research [11].
Another work applies various machine learning algorithms, such as Naïve
Bayes, Random Forest, Decision Tree, Support Vector Machine, and Logistic Regression, to
reduce crime rates in India. The performance of the proposed models is compared using
measures such as precision, recall, accuracy, and F1 score, along with the confusion matrix.
The study also considers the error of each model using the mean absolute error.
The Naïve Bayes model gives the best result, 98.94%, compared
with the others [2].
For crime analysis and prediction, another paper explores the Radial Basis Function
(RBF) kernel with a Support Vector Machine (SVM). It enhances accuracy in predicting crime rates
by mapping input features into a higher-dimensional space to capture complex patterns.
The model, evaluated on the Chicago crime dataset, achieved high accuracy (0.89),
precision (0.80), recall (0.82), and F1-score (0.85), demonstrating its effectiveness
compared to existing techniques like Decision Tree and Long Short-Term Memory [12].
Another paper proposes techniques based on Random Forest, SVM, Logistic Regression,
and Linear Regression to produce accurate predictions [13].
Machine learning, particularly through algorithms like Random Forest, plays a crucial
role in crime analysis and prediction. It enables law enforcement to analyze extensive
datasets, uncover complex patterns, and manage high-dimensional data effectively.
Random Forest's advantages include feature selection, robustness to noise, and
scalability, making it suitable for predicting crime trends. By integrating these systems,
agencies can anticipate criminal activities, allocate resources strategically, and implement
targeted interventions, ultimately enhancing community safety through data-driven
approaches [14].
A recent study explores predictive crime analysis using modern learning techniques,
specifically employing a Recursive Neural Network in Deep Learning (RNNDL). This
approach processes large datasets of past criminal activities and demographic features to
identify patterns and forecast potential crime hotspots. By assigning greater weight to
recent predictions and less to older ones, the method enhances accuracy in predicting
future crimes, thereby aiding law enforcement in formulating effective countermeasures
against criminal activities [15]. According to several findings, the application of deep
learning models for crime prediction highlights their superiority over traditional statistical
and classical machine learning methods. It emphasizes the importance of utilizing external
features and capturing temporal dynamics in crime data [16].
In this paper, we explore the K-means algorithm with PSO optimization to enhance
efficiency and reduce time complexity in identifying crime patterns. The integration of the
K-Means clustering algorithm with Particle Swarm Optimization (PSO) enhances clustering
performance by addressing the limitations of K-Means, such as sensitivity to initial
centroids and convergence to local optima. K-Means efficiently partitions data into clusters
based on distance metrics, while PSO optimizes the centroid selection process by
leveraging swarm intelligence and global search capabilities. This hybrid approach
improves clustering accuracy, ensures better stability in results, and accelerates
convergence by reducing the likelihood of poor initializations. Additionally, PSO helps
escape local optima by exploring a wider solution space, making the K-Means-PSO
combination particularly effective for complex, high-dimensional datasets.
3. Proposed Method
3.1 K-Means Algorithm
In K-Means, a dataset $X = \{x_1, x_2, \ldots, x_n\}$ is partitioned into $k$ clusters.
Initialize $k$ centroids $C = \{c_1, c_2, \ldots, c_k\}$ and assign each data point $x_i$ to the
nearest centroid:
$$\arg\min_{j} \lVert x_i - c_j \rVert^2$$
Update centroids by computing the mean of assigned points, where $S_j$ is the set of points
assigned to centroid $c_j$:
$$c_j = \frac{1}{\lvert S_j \rvert} \sum_{x_i \in S_j} x_i$$
Evaluate fitness using the K-Means objective function and update best positions.
Iterate until convergence or max iterations.
K-Means is a widely used clustering algorithm that partitions a dataset into k clusters by
minimizing intra-cluster variance. The algorithm begins by initializing k centroids, then
iteratively assigns each data point to the nearest centroid based on Euclidean distance.
After the assignment, the centroids are recalculated as the mean of all points within a
cluster. This process continues until convergence, where centroids no longer change
significantly. However, K-Means has limitations, such as sensitivity to initial centroid
selection and the risk of getting trapped in local optima, which affects clustering
performance.
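The assign-and-update loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names `squared_distance` and `kmeans`, the toy `points`, the random initialization from data points, and the stopping rule (labels no longer change) are all choices made for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initialize centroids from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Toy data (hypothetical, not from the crime dataset): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

Because the result depends on the initial centroids, different seeds can give different partitions on less clearly separated data, which is the sensitivity that motivates the PSO step.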
Particle Swarm Optimization (PSO) enhances K-Means by optimizing centroid
selection, mitigating the algorithm’s dependency on initial placement. PSO is a bio-inspired
algorithm that mimics the collective behavior of swarms, such as birds or fish. It initializes
a population of candidate solutions (particles), each representing potential centroids. These
particles update their positions iteratively based on their own best position and the global
best found by the swarm, leading to an optimal centroid arrangement. By integrating PSO
with K-Means, the algorithm achieves better cluster compactness, faster convergence, and
improved robustness against local optima, making it more effective for clustering tasks.
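One way to picture the hybrid is to let each particle encode a full set of k centroids and to score it with the K-Means objective (SSE). The sketch below follows that idea under stated assumptions: the names `sse` and `pso_centroids`, the swarm size, the iteration count, and the coefficients `w`, `c1`, `c2` are illustrative values that are not taken from this study.

```python
import random

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids)
        for p in points
    )

def pso_centroids(points, k, n_particles=10, n_iter=50,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k centroids with PSO, using SSE as the fitness to minimize."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle is a candidate set of k centroids, seeded from data points.
    swarm = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    velocity = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in particle] for particle in swarm]
    pbest_fit = [sse(points, particle) for particle in swarm]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Velocity update: inertia + pull to personal best + pull to global best.
                    velocity[i][j][d] = (
                        w * velocity[i][j][d]
                        + c1 * r1 * (pbest[i][j][d] - swarm[i][j][d])
                        + c2 * r2 * (gbest[j][d] - swarm[i][j][d])
                    )
                    swarm[i][j][d] += velocity[i][j][d]
            fit = sse(points, swarm[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in swarm[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in swarm[i]], fit
    return gbest, gbest_fit

# Toy data (hypothetical): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best, best_fit = pso_centroids(points, k=2)
```

The centroids returned by the swarm can then be passed to a standard K-Means run as its starting point, so that K-Means only has to refine a good initialization.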
4. Experimental Setup
Below is the implementation of the K-Means algorithm to cluster types of criminal acts and
the implementation of PSO to optimize the K-Means algorithm.
1. K-Means
a. Determine the K value for the number of clusters to be formed. In this study, the
researchers used 200 crime records. From these 200 records, the researchers used 3 clusters
for crime mapping.
Table 1. Determination of the number of K
No   Location   Age   Gender   Crime   Cluster
1    66         32    117      102     Light
2    80         30    117      107     Light
3    65         30    118      103     Light
4    75         37    117      104     Medium
5    66         36    117      100     Medium
6    75         35    117      107     Medium
7    78         27    117      105     Heavy
8    69         31    118      106     Heavy
9    70         33    117      100     Heavy
10   79         34    118      103     Heavy
The information in the table above contains 4 attributes, namely location, age, gender and
type of crime, while there are 3 labels, namely light, medium, and heavy. Labels are used
as clusters for grouping criminal acts.
b. Determine the cluster center points at random. The second step is to pick values at
random to serve as the cluster center points; for example, the value to be used as a
cluster center point is drawn at random from all the data above. Here the researcher
takes the values in Table 2.
Table 2. Cluster center point
No Location Age Gender Crime
1 66 32 117 102
c. Calculate the distance from each object to each centroid. To calculate the
distance between an object and a centroid, the Euclidean distance can be used.
The Euclidean distance is a formula for calculating the distance between objects. For an
object $x$ and a centroid $c$ with $m$ attributes, it is
$$d(x, c) = \sqrt{\sum_{l=1}^{m} (x_l - c_l)^2}$$
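As a worked illustration of the Euclidean distance in step c, the snippet below computes the distance between record 2 of Table 1 and the cluster center point of Table 2. The function name `euclidean_distance` is introduced only for this example, and the raw attribute codes are used as-is, which assumes no scaling is applied beforehand.

```python
import math

def euclidean_distance(x, c):
    """Euclidean distance between an object x and a centroid c."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

centroid = (66, 32, 117, 102)  # Table 2: Location, Age, Gender, Crime
record_2 = (80, 30, 117, 107)  # Table 1, row 2

# Differences are (14, -2, 0, 5), so the squared sum is 196 + 4 + 0 + 25 = 225.
distance = euclidean_distance(record_2, centroid)
print(distance)  # 15.0
```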
e. Perform an iteration, then determine the position of the new centroid points by
applying the Canberra equation, for which the centroid points must first be
determined.
From the records above, values are then drawn at random to serve as particle values:
66 36 117
75 35 117
78 27 117
69 31 118
b. Find the SSE value obtained from the minimum distances. In the K-Means PSO
test, the SSE value must be known because it is used as the fitness
function.
Table 6. SE Value
Data SE
66 32 117 0.00346
80 30 117 0.034894
65 30 118 0.010567
75 37 117 0.006011
66 36 117 0.006071
75 35 117 0.051162
78 27 117 0.010222
69 31 118 0.010222
SSE 0.122387
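For reference, the SSE used as the fitness function is conventionally defined as the sum of squared distances between each point and the centroid of its cluster. The expression below uses the notation of Section 3.1; the source does not state which scaling or normalization produced the per-record values in Table 6, so this is the standard definition rather than a reconstruction of that calculation.

```latex
\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x_i \in S_j} \lVert x_i - c_j \rVert^2
```

A lower SSE means the points lie closer to their centroids, so PSO treats it as a quantity to minimize.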
c. After the SSE value is obtained, Pbest is determined, where Pbest is the cluster
average value:
Pbest = (66 + 66 + 66 + 66 + 66) / 5 = 66
d. After the Pbest values are obtained, the next step is to determine the Gbest value:
$F(x_2) = 0.00346$
$P_{best}(x_2) = (c_1, c_2, (66, 32, 117), c_3)$
This indicates that the personal best solution for $x_2$ consists of the
parameters $c_1$, $c_2$, $c_3$ and the specific vector $(66, 32, 117)$.
Global best $G_{best}$ representation:
$G_{best} = P_{best}(x_2) = (c_1, c_2, (66, 32, 117), c_3)$
After determining $G_{best}$, the next stage is to calculate the velocity and position
using equation (2.6) and equation (2.7). In this study, for example, the initial values
$V(0)$ and $X(0)$ are set to 0.
$$V_{id} = W \cdot V_{id} + C_1 \cdot \mathrm{rand}_1 \cdot (P_{id} - X_{id}) + C_2 \cdot \mathrm{rand}_2 \cdot (P_{gd} - X_{id})$$
$$V(1) = V(0) + 1 \cdot \mathrm{random} \cdot (66 - 66) + 1 \cdot \mathrm{random} \cdot (66 - 66)$$
$$V(1) = V(0)$$
$$X_{id} = X_{id} + V_{id}$$
$$X(1) = X(0) + V(1)$$
Once X(1) has been obtained, repeat from the second step in the same way, starting from
determining the particles randomly.
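The velocity and position updates can be sketched as a single function. This is an illustrative sketch, not the code used in this study: the name `pso_step` and its parameters are hypothetical, and the defaults `w = 1`, `c1 = 1`, `c2 = 1` are assumptions chosen to mirror the worked line for V(1), in which the particle, its personal best, and the global best all sit at 66.

```python
import random

def pso_step(x, v, pbest, gbest, w=1.0, c1=1.0, c2=1.0, rng=random):
    """One PSO update. Returns (new_position, new_velocity) as lists."""
    new_v = [
        # inertia + cognitive pull (personal best) + social pull (global best)
        w * vi + c1 * rng.random() * (pi - xi) + c2 * rng.random() * (gi - xi)
        for xi, vi, pi, gi in zip(x, v, pbest, gbest)
    ]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v

# Worked case: position, personal best, and global best all equal 66, V(0) = 0.
x1, v1 = pso_step(x=[66.0], v=[0.0], pbest=[66.0], gbest=[66.0])
print(x1, v1)  # [66.0] [0.0]
```

Both pull terms vanish when the particle already coincides with its personal and global best, so the velocity stays at V(0) regardless of the random factors, which matches V(1) = V(0) above.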
The results of classifying criminal acts, which contain 4 attributes with 3 labels, are
displayed in Table 7.
b. Label 2 is cluster 2, which has a value of 41; label 2 is the cluster of moderate
crimes.
c. Label 3 is cluster 3, which has a value of 50; label 3 is the cluster of minor
crimes.
Label 3 represents Cluster 3, which consists of 50 instances categorized as minor
crimes. This clustering indicates that these 50 cases share similarities in terms of offense
severity, distinguishing them from both moderate and serious crimes. Minor crimes typically
include petty theft, disorderly conduct, trespassing, or public nuisance offenses, which are
less severe and often result in fines or minimal legal consequences rather than
imprisonment.
By forming Cluster 3, the dataset helps in understanding the distribution of minor crimes,
allowing law enforcement or policymakers to focus on prevention strategies such as
community policing or awareness campaigns. If clustering was performed using K-Means
with Particle Swarm Optimization (PSO), the optimization process would enhance the
precision of centroid selection, ensuring that minor crimes are accurately grouped based
on key attributes like location, frequency, or crime type. This classification aids in crime
trend analysis and efficient resource allocation.
The results show that Particle Swarm Optimization (PSO) successfully enhances standard
K-Means clustering. The SSE value achieved through PSO optimization is 0.122387, which
indicates an improvement over the conventional K-Means approach. This enhancement
occurs because PSO refines the centroid initialization process, reducing the sensitivity of
K-Means to initial conditions and minimizing the risk of converging to local optima.
By integrating PSO with K-Means, the algorithm benefits from PSO's ability to explore a
broader solution space efficiently. The swarm intelligence mechanism of PSO enables
better centroid positioning, leading to improved cluster compactness and separation. This
optimization reduces intra-cluster variance and enhances the model’s performance in
classifying data accurately. As a result, the combination of PSO with K-Means proves to
be a more robust and reliable clustering technique, particularly in datasets where standard
K-Means struggles with local minima or unevenly distributed data points.
6. Conclusion
The integration of K-means with Particle Swarm Optimization (PSO) for crime prediction is
a promising approach that enhances clustering efficiency and accuracy. This method
leverages the strengths of both algorithms to address the limitations of traditional clustering
techniques in crime analysis. K-means is known for its simplicity and effectiveness in
clustering, while PSO optimizes the clustering process by improving convergence speed
and accuracy. This combination is particularly useful in identifying crime patterns and
predicting potential crime hotspots.
Acknowledgment
The researcher would like to thank all parties who have helped complete this research well
even though there are still many shortcomings. I hope my research is useful for all parties
and hopefully, future research can be even better.
References
[1] S. Krishnendu et al., “Crime Analysis and Prediction using Optimized K-Means
Algorithm,” in 2020 Fourth International Conference on Computing Methodologies and
Communication (ICCMC), 2020, pp. 915–918.
[2] R. S. Kumar et al., “Empirical Analysis on Crime Prediction using Machine
Learning,” in 2023 International Conference on Computer Communication and Informatics
(ICCCI), 2023, pp. 1–5.
[3] J. Agarwal et al., “Crime Analysis using K-Means Clustering,” International Journal of
Computer Applications, vol. 83, pp. 1–4, 2013.
[4] A. Malathi et al., “An Enhanced Algorithm to Predict a Future Crime using Data
Mining,” International Journal of Computer Applications, vol. 21, pp. 1–6, 2011.
[5] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. San
Francisco, CA, USA: Morgan Kaufmann, 2012.
[6] S. Sathyadevan and M. S. Devan, “Crime analysis and prediction using data mining,”
in Proc. Int. Conf. Netw. Comput. Secur. (NCS), 2014, pp. 406–412.
[7] F. de A. T. de Carvalho, Y. Lechevallier, and F. M. de Melo, “Partitioning hard clustering
algorithms based on multiple dissimilarity matrices,” Pattern Recognit., vol. 45, no. 1, pp.
447–464, Jan. 2012, doi: 10.1016/j.patcog.2011.07.007.
[8] M. R. Keyvanpour, M. Javideh, and M. R. Ebrahimi, “Detecting and investigating crime by
means of data mining: A general crime matching framework,” Procedia Comput. Sci., vol.
3, pp. 872–880, 2011, doi: 10.1016/j.procs.2010.12.143.
[9] L. Huang, Q. Ke, and X. Hongli, “Deep combination K-means and PSO clustering method,”
Sep. 12, 2017
[10] J. G. J., J. Jefrin, and S. Dhamodaran, “Advanced Crime Prediction and Analysis Using
Machine Learning and Quantum Networking,” Advances in computational intelligence and
robotics book series, pp. 89–102, Oct. 2024, doi: 10.4018/979-8-3693-9336-9.ch007
[11] A. Gangwar, D. S. Bisht, S. Choudhary, V. Chauhan, and V. Tomar, “An Analytical
Comparison of Crime Prediction using Machine Learning Techniques,” pp. 1–6, Aug.
2024, doi: 10.1109/iceect61758.2024.10739168.
[12] K. Ponugoti, K. Sarangam, K. A. N. Reddy, H. Ali, and A. C. Ramachandra, “Predictive
Analytics for Crime Prevention in Smart Cities Using Machine Learning,” pp. 1–4, Aug.
2024, doi: 10.1109/iacis61494.2024.10721948.
[13] P. Sajitha, “Crime Type and Occurrence Prediction Using Machine Learning Algorithm,”
International Journal For Science Technology And Engineering, vol. 12, no. 10, pp. 450–
456, Oct. 2024, doi: 10.22214/ijraset.2024.64542.
[14] M. G. Yadav, R. Nennuri, E. Reddy, M. Vishal, and G. P. Vishal, “The Role of Machine
Learning in Crime Analysis and Prediction,” pp. 885–890, Apr. 2024, doi:
10.1109/icoeca62351.2024.00157.
[15] V. Keerthika, A. Geetha, and D. M. D. Raj, “Predictive Crime Analysis: Statistical Approach
to Forecast Crime Hotspots Using Recursive Neural Network in Deep Learning,” pp. 1–7,
Jul. 2024, doi: 10.1109/icait61638.2024.10690551.
[16] R. Basak, M. N. Alif, Y. Rayhan, T. Hashem, and M. Ali, “Deep Learning Based Crime
Prediction Models: Experiments and Analysis,” Jul. 2024, doi: 10.48550/arxiv.2407.19324.
[17] S. Sathyadevan and M. S. Devan, “Crime analysis and prediction using data mining,”
in Proc. Int. Conf. Netw. Comput. Secur. (NCS), 2014, pp. 406–412.
[18] K. Ramanan et al., “A Novel Clustering & Machine Learning Algorithm for Crime Rate
Prediction and Analysis,” in 2023 Second International Conference on Augmented
Intelligence and Sustainable Systems (ICAISS), 2023, pp. 399–405.
[19] N. Fedchun, “Mediacriminology: media impact on criminalization and functional
opportunities of deterrence,” 2024. doi: 10.30525/978-9934-26-406-1-28.