oljira1
Submitted
By:____________________
February 2025
Nekemte, Ethiopia
APPROVAL SHEET
TITLE: DEVELOPING A CRIME PREDICTION MODEL USING DEEP
LEARNING TECHNIQUES: THE CASE OF NEKEMTE POLICE
HEADQUARTERS
SUBMITTED BY : ______________________________
Approved by :
February 2025
Nekemte, Ethiopia
DECLARATION
I declare that this research, titled “Developing a Crime Prediction Model Using
Deep Learning Techniques: The Case of Nekemte Police Headquarters,” is my
original work and has not been submitted as a partial requirement for the award of
any degree at this university or elsewhere.
ACKNOWLEDGEMENTS
First and foremost, I would like to thank God, who has helped me in all circumstances
to complete this research proposal. Next, I would like to thank my advisor, Dr. __________,
who helped me by providing the required resources and technical assistance. My advisor
guided me in writing and preparing a quality research proposal and encouraged me to
improve the work at each stage; without his help I could not have completed this proposal
within the given time and schedule. I am very lucky to have him as my advisor. I would
also like to thank my parents, who stood behind me throughout my MSc program by
providing financial resources, such as a computer and money for different services
outside the campus.
Abstract
Forecasting web traffic is crucial for data-driven decision-making in a variety of fields.
Nevertheless, previous research has frequently relied on Wikipedia datasets, which may not
adequately represent the unique characteristics of web traffic in other settings (Zhou, Wang,
Huang, & Liu, 2023). Furthermore, there is a tendency to prioritize traditional models while
neglecting potentially better ones. The absence of thorough comparisons between machine
learning models limits our understanding of their relative performance and of how datasets
affect their efficacy. Predicting future time series values is one of the most challenging
problems in the industry, and the time series field spans many tasks, from inference and
analysis to forecasting and classification (Tealab, 2018). Web traffic forecasting is a major
problem today, since poorly anticipated traffic can set back the operations of major websites.
Time series forecasting remains an active research topic. The purpose of this study is to use
local data and ensemble learning techniques to create accurate web traffic forecasting models
that ultimately improve decision-making. The research process comprises data gathering,
preprocessing, hyperparameter tuning, model training, prediction, and evaluation. The study
examines experiments carried out on web traffic data from the Awash Bank website. The dataset
is divided into training, validation, and testing sets. It contains web page visitor counts for
a period of about eight years, from January 1, 2017, to January 3, 2025, which is 2920 days.
This research work analyzed and predicted web traffic patterns using an ensemble of models:
Long Short-Term Memory (LSTM), ARIMA, bidirectional LSTM with attention, Gated Recurrent Unit
(GRU), RNN, and bidirectional GRU with attention. Overall, the study shows how ensemble
learning can optimize resource allocation, improve web service performance, and enable
data-driven decision-making in a variety of domains. Further work can concentrate on improving
and expanding this approach through dataset preparation, model architecture exploration, and
ensemble methods. The findings illustrate the potential of ensemble learning in this field and
advance our understanding of web traffic analysis. Because predicting web traffic as precisely
as possible usually requires large datasets, we designed the forecasting system to remain
accurate with limited data. We tested the proposed system on the new Wikipedia page dataset
and obtained high accuracy.
Keywords: ensemble learning, LSTM, ARIMA, web traffic, RNN, GRU, time series forecasting.
CHAPTER ONE
INTRODUCTION
Introduction
Web traffic refers to the amount of data moving across a network at a given point in time. In
networking, the term throughput means how much data was transferred from a source in
a specific duration (2020). Web traffic analysis is vital for businesses and societies that
rely on online platforms to carry out their activities and share information (Zhou, Wang,
Huang, & Liu, 2023). As more and more people worldwide gain access to the internet, traffic to
practically all websites will inevitably increase. Web traffic analysis is the process of
making sure a website works properly and offers users a positive experience (Bojer &
Meldgaard, 2020). It entails tracking and managing the number of visitors to a website over a
specified time period. It can be improved in a number of ways, including by examining how
users interact with the website, identifying and fixing problems that affect its performance,
and applying best practices to improve its functionality (W. W. L. H. a. B. L. K. Zhou, 2021).
With efficient web traffic management, website owners and administrators can accomplish goals
such as boosting conversion rates, increasing visitor traffic, and enhancing customer
satisfaction (Madan & SarathiMangipudi, 2018).
Web traffic forecasting is the process of projecting future website traffic or sessions using
historical and current data. It is crucial for a number of reasons, including load balancing,
ad delivery, security, and content provision (Yang, Lu, Zhao, & Ju, 2020). Accurate forecasts
can therefore improve user experience, lower operational costs, and boost revenue for websites
and platforms, and they support website optimization, planning, and quality improvement. Given
these advantages, many researchers have proposed models to improve forecasting performance.
Forecasting methods can broadly be divided into classical statistical models and artificial
intelligence (AI)-based models. Various researchers have used machine learning (ML) to
forecast and analyze web traffic. ML has become a useful technology, whereas traditionally
tasks were manually coded into computer systems. Several traditional machine learning methods,
such as K-Nearest Neighbors (KNN) and Decision Trees (DT), have been applied in previous work
to classify network traffic.
Because they can identify intricate, nonlinear patterns in data, deep learning (DL) techniques
have become a powerful tool for time series forecasting problems. For web traffic prediction,
these models have proven more accurate than conventional methods, which makes this an
interesting field for further study. Recurrent neural networks (RNNs) and LSTM networks, deep
learning models that can analyze sequential data and retain information from previous inputs,
are useful for web traffic forecasting tasks (Zhou, Wang, Huang, & Liu, 2023). Deep learning
models can learn long-term dependencies and handle sequential data effectively. Many
researchers have worked on web traffic forecasting, employing various models to predict future
web traffic patterns (Smagulova & James, 2020). However, most of these studies relied only on
traditional models, and even when deep learning models were used, they mostly depended on
LSTM, one particular kind of deep learning model. This neglects alternative models that show
promise and could produce better outcomes. Furthermore, the vast majority of these studies
used data from Wikipedia, which might not fully represent the distinctive features of web
traffic across various domains and geographical areas (Tian, Ma, Zhang, & Zhan, 2018). The
generalizability and applicability of these models are therefore limited, even though web
traffic forecasting has important implications for developing nations like Ethiopia. Further
research is thus needed on web traffic forecasting with different methods and datasets, with a
focus on developing nations.
The purpose of this study is to design and develop an ensemble deep learning-based platform to
analyze and forecast the web traffic of a given web server. The analysis emphasizes feature
extraction and pattern detection, which inform the design of Long Short-Term Memory (LSTM),
ARIMA, RNN, and GRU models to forecast the flow of page views on websites in the short and
medium term. The study aims to improve web traffic forecasting by developing ensemble deep
learning models that use a locally acquired dataset. Its goal is to determine whether ensemble
deep learning models are suitable and efficient for predicting web traffic in our particular
context (Rangapuram, Seeger, Gasthaus, Stella, Wang, & Januschowski, 2018). To achieve this
goal, we gathered a dataset from a local organization and applied recurrent neural network
models, including GRU, bidirectional Gated Recurrent Unit (biGRU), LSTM, bidirectional Long
Short-Term Memory (biLSTM), biLSTM with attention, and biGRU with attention (Chen, Gao, Liu,
Chen, Zhang, & Feng, 2019).
Motivation of the study
Web traffic forecasting and analysis using time series forecasting (TSF) is an important field
of study that covers many different topics (Makridakis, Spiliotis, & Assimakopoulos, 2020). It
is an active research area that has received the attention of scientists throughout the world
(Zhou, Wang, Huang, & Liu, 2020). In our country, despite its significance, web traffic
forecasting is rarely taken up as a research topic, and more research-based web traffic
analysis is needed. To make informed decisions, plan for future development, manage resources
effectively, and maximize their online presence, website owners, marketers, and organizations
must forecast web traffic. This motivated us to implement ensemble learning-based algorithms
that have the potential to provide precise forecasts and valuable information about patterns
in web traffic.
Most earlier studies relied on a dataset from Wikipedia, which might not fully capture the
traits and difficulties of web traffic in other organizations. As a result, more datasets
dedicated to web traffic forecasting are needed to help assess and apply current models.
Furthermore, thorough comparisons of how the various deep learning models in an ensemble
learning approach perform on a particular dataset are clearly absent, even though the dataset
influences the algorithms' efficacy. To provide precise and trustworthy web traffic forecasts,
it is vital to study and compare several deep learning models that are capable of efficiently
capturing complex patterns.
This proposal aims to fill the gap left by previous studies by analyzing the performance and
efficiency of eight recurrent neural network models in the context of web traffic forecasting
and analysis (A. Bhardwaj, 2023). The dataset used in the study was acquired from a local
organization, the Nekemte District of Awash Bank. The models compared in our ensemble learning
approach include LSTM, GRU, bidirectional LSTM, bidirectional GRU, bidirectional LSTM with
attention, and bidirectional GRU with attention (H. Nunnagoppula, 2023). The purpose of this
comparative analysis is to shed light on how well these models perform and whether they can be
used in our setting for precise web traffic forecasts. Forecasting web traffic is important
because it lets digital marketers plan efficiently and control risks.
Statement of the problem
Web traffic time series prediction is a vital task in areas that depend on data-driven
decision-making (T. Shelatkar, 2020). However, it faces many challenges because web traffic
patterns are complex and change irregularly. The deep learning models currently available have
not been sufficiently tested and compared for web traffic forecasting on local data. Local
data may exhibit different features and issues than global or regional data (Q. Kong et al.,
2020). Furthermore, the reproducibility and generalizability of current models are constrained
by the limited number of publicly accessible datasets for web traffic forecasting. For web
traffic prediction on local data, it is therefore necessary to investigate and compare several
recurrent neural network models and to produce and distribute a new dataset.
Research questions
1. What are the common features used for web traffic prediction and forecasting?
2. How do we build a model for web traffic prediction and forecasting using ensemble learning?
3. To what extent does an ensemble learning model better predict and forecast web traffic?
4. How do we evaluate the model's performance metrics using the web traffic dataset?
Objectives of the study
The general and specific objectives of this study are as follows.
General objective
The main objective of this study is to build a web traffic analysis and forecasting model
using ensemble learning time series techniques, in the case of the Nekemte District of Awash
Bank.
Specific objectives
To review related articles and papers on web traffic prediction and forecasting.
To identify possible factors that affect web traffic analysis and prediction.
To build a predictive model for web traffic prediction and forecasting.
To evaluate the performance of an ensemble learning model for web traffic forecasting.
Scope of the study
The scope of this proposal includes the development of a dataset intended specifically for web
traffic prediction. The study also uses the newly created dataset to assess and compare how
well various deep learning models predict web traffic. For this study, web traffic data will
be collected only from the internal communication web page of the Nekemte District of Awash
Bank. To ensure data quality, the data will be preprocessed using techniques such as cleaning,
standardization, and data splitting. Additionally, the study will conduct exploratory data
analysis to understand the traits and patterns of the data. Using the Awash Bank web traffic
dataset, the study will employ eight types of deep learning models (LSTM, GRU, bidirectional
LSTM, bidirectional GRU, CNN, RNN, bidirectional LSTM with attention, and bidirectional GRU
with attention) and evaluate their performance using the MSE, MAE, and RMSE metrics to
determine the accuracy and reliability of each model for web traffic forecasting and
prediction.
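The three evaluation metrics named above can be computed in a few lines. The following is a minimal sketch using only the Python standard library; the visitor counts are made-up illustrative values, not figures from the Awash Bank dataset, and the function names are chosen here for illustration.

```python
import math

def mse(actual, predicted):
    """Mean squared error: the average of the squared forecast errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error: the average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: the square root of MSE, in the data's units."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical daily visitor counts and forecasts (illustrative values only).
actual = [120, 135, 150, 160]
predicted = [118, 140, 145, 170]

print("MSE: ", mse(actual, predicted))
print("MAE: ", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

Because MSE and RMSE square the errors, they penalize large misses more heavily than MAE, which is one reason to report all three side by side.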
Predicting web traffic is of great significance to website owners, who need to make reliable
decisions for website users. This work can also help businesses make decisions about website
design, capacity planning, and marketing campaigns. Web traffic forecasting can be used for
numerous purposes, including estimating website traffic, understanding website trends, and
forecasting future traffic volumes (A. Subashini, 2019). Several methods can be used to
forecast web traffic, including time series and regression analysis. To determine the best
forecasting method, we must consider the type of data and the required degree of accuracy
(A. P. Wibawa, 2020). The availability of the dataset now creates opportunities to develop
models specifically for the Ethiopian context, as its previous unavailability impeded the
implementation of web traffic forecasting models adapted to local conditions.
First, this proposal contributes to the field of web traffic prediction and forecasting by
introducing a new dataset and thoroughly evaluating various deep learning models in this
particular context. Second, the study compares a variety of deep learning models, combined in
an ensemble learning approach, for web traffic analysis and prediction. While ensemble
learning methods have shown good performance in time series forecasting across numerous
domains, their application in web traffic prediction has not been analyzed as thoroughly.
Additionally, the results of web traffic forecasting will assist website owners with proactive
resource allocation, load balancing, and user experience optimization by offering insights into
traffic volume. System owners can automate their system and make it more engaging for users
by analyzing and forecasting web traffic. To meet the needs and preferences of the users, the
proposed system can provide personalized recommendations, relevant data, and an intuitive user
interface. Additionally, by making it easier and faster for users to find the information they
require, the system can boost user happiness and loyalty.
Research method
In the study, an ensemble learning approach built on LSTM and GRU models was developed, along
with a framework for forecasting website traffic as a time series using ARIMA, SARIMA,
Prophet, LSTM, CNN, and RNN models. The process involves data gathering, cleaning, splitting,
model training, evaluation, and comparison. Research methodology is a systematic and
scientific approach used to collect, analyze, and interpret data. To meet our goals, it is
essential to study the concepts and principles of data analysis methods. The research
methodology for web traffic analysis and forecasting combines several techniques, covering
data collection, preprocessing, model selection, training, time series forecasting, and
evaluation.
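The preprocessing steps can be illustrated with a short sketch. It assumes a 70/15/15 chronological split, standardization with statistics taken from the training portion only, and a 7-day input window; these particular values, the helper names, and the synthetic series are assumptions made for illustration and are not specified in this proposal.

```python
import statistics

def chronological_split(series, train_pct=70, val_pct=15):
    """Split a series in time order (no shuffling) into train/validation/test."""
    n = len(series)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return series[:train_end], series[train_end:val_end], series[val_end:]

def standardize(values, mean, std):
    """Scale values to zero mean and unit variance using the given statistics."""
    return [(v - mean) / std for v in values]

def make_windows(values, window):
    """Turn a series into (past `window` values -> next value) training pairs."""
    inputs, targets = [], []
    for i in range(len(values) - window):
        inputs.append(values[i:i + window])
        targets.append(values[i + window])
    return inputs, targets

# Synthetic daily counts with a weekly cycle (not real Awash Bank data).
series = [100 + (day % 7) * 10 for day in range(100)]

train, val, test = chronological_split(series)

# Fit the scaling statistics on the training set only, to avoid leaking
# information from the validation and test periods into training.
mean = statistics.fmean(train)
std = statistics.pstdev(train)
train_scaled = standardize(train, mean, std)
val_scaled = standardize(val, mean, std)
test_scaled = standardize(test, mean, std)

X_train, y_train = make_windows(train_scaled, window=7)
print(len(train), len(val), len(test), len(X_train))
```

Splitting in time order rather than at random keeps future observations out of the training set, which matters for any forecasting evaluation.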
Our proposed approach takes advantage of the Long Short-Term Memory (LSTM) variant of the RNN.
When a new piece of information is added to a plain RNN, it applies a function that completely
transforms the existing information. A recurrent layer with a feedback loop is present in both
kinds of network; it enables them to retain data and information in memory over time.
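The difference between the two update rules can be sketched with scalar states. The code below is a simplified illustration of the standard RNN and LSTM recurrences, not the architecture used in this study; all weights are hypothetical values chosen for the example, and the inputs stand in for standardized visitor counts.

```python
import math

def sigmoid(x):
    """Logistic function used by the LSTM gates (output between 0 and 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """Plain RNN step: the previous state is fully transformed through tanh."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def lstm_step(x, h_prev, c_prev, w):
    """Scalar LSTM step: gates decide how much old memory is kept or replaced."""
    f = sigmoid(w["fx"] * x + w["fh"] * h_prev + w["fb"])    # forget gate
    i = sigmoid(w["ix"] * x + w["ih"] * h_prev + w["ib"])    # input gate
    o = sigmoid(w["ox"] * x + w["oh"] * h_prev + w["ob"])    # output gate
    g = math.tanh(w["gx"] * x + w["gh"] * h_prev + w["gb"])  # candidate memory
    c = f * c_prev + i * g      # keep part of the old cell state, add new content
    h = o * math.tanh(c)        # expose part of the cell state as the output
    return h, c

# Hypothetical weights, for illustration only (a trained model learns these).
weights = {
    "fx": 0.1, "fh": 0.2, "fb": 1.0,
    "ix": 0.5, "ih": 0.1, "ib": 0.0,
    "ox": 0.3, "oh": 0.2, "ob": 0.0,
    "gx": 0.7, "gh": 0.4, "gb": 0.0,
}

inputs = [-1.5, -0.5, 0.5, 1.5]  # e.g. standardized daily visitor counts
h_rnn = 0.0
h_lstm, c_lstm = 0.0, 0.0
for x in inputs:
    # The feedback loop: each step receives the state produced by the last one.
    h_rnn = rnn_step(x, h_rnn)
    h_lstm, c_lstm = lstm_step(x, h_lstm, c_lstm, weights)

print("RNN hidden state: ", h_rnn)
print("LSTM hidden state:", h_lstm, "cell state:", c_lstm)
```

In the LSTM step the cell state is updated additively (`f * c_prev + i * g`), so earlier information can persist across many steps, whereas the plain RNN pushes its entire state through `tanh` at every step.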