A Malware Detection Method



A PROJECT REPORT ON

A MALWARE DETECTION METHOD FOR HEALTH SENSOR

DATA BASED ON MACHINE LEARNING
A Main Project Submitted to Jawaharlal Nehru Technological University, Kakinada in Partial
Fulfillment of the Requirements for the Award of the Degree of

BACHELOR OF TECHNOLOGY IN

COMPUTER SCIENCE AND ENGINEERING

Submitted By

MR.S. SRIRAM (20KT1A05B8)

Under the Esteemed Guidance of

Mrs.V.RAMA LAKSHMI

Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

POTTI SRIRAMULU CHALAVADI MALLIKARJUNA RAO

COLLEGE OF ENGINEERING & TECHNOLOGY
(Approved by AICTE New Delhi, Affiliated to JNTU-Kakinada)

KOTHAPET, VIJAYAWADA-520001, A.P

2020-2024

POTTI SRIRAMULU CHALAVADI MALLIKARJUNA RAO COLLEGE OF

ENGINEERING & TECHNOLOGY

KOTHAPET, VIJAYAWADA-520001.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

This is to certify that the project work entitled “A MALWARE DETECTION METHOD FOR
HEALTH SENSOR DATA BASED ON MACHINE LEARNING” is a bonafide work carried out by

MR. S. SRIRAM (20KT1A05B8) in partial fulfillment for the award of the degree of Bachelor of
Technology in COMPUTER SCIENCE AND ENGINEERING of Jawaharlal Nehru Technological University,
Kakinada during the year 2020-2024. It is certified that all corrections/suggestions indicated for
internal assessment have been incorporated in the report. The project report has been approved as it
satisfies the academic requirements in respect of project work prescribed for the above degree.

PROJECT GUIDE HEAD OF THE DEPARTMENT

EXTERNAL EXAMINER

ACKNOWLEDGEMENTS

I owe many thanks to the many people who helped, supported, and advised me at every step.

I am glad to have had the support of our principal, Dr. J. LAKSHMI NARAYANA, who
inspired me with his words of dedication and discipline towards work.

I express my gratitude to Dr. D. DURGA PRASAD, Professor & HoD of CSE, for
extending his support through training classes, which were a major resource in carrying out my
project.

I am very thankful to Mrs. V. RAMA LAKSHMI, Assistant Professor and guide of my
project, for guiding me and correcting various documents with attention and care. She took pains
to go through the project and make necessary corrections as and when needed.

Finally, I thank one and all who directly or indirectly helped me complete my project
successfully.

Project Associate

S. SRIRAM (20KT1A05B8)

DECLARATION

This is to declare that the project entitled “A MALWARE DETECTION METHOD FOR
HEALTH SENSOR DATA BASED ON MACHINE LEARNING”, submitted by me in partial
fulfillment of the requirements for the award of the degree of Bachelor of
Technology in Computer Science & Engineering at Potti Sriramulu Chalavadi
Mallikarjuna Rao College of Engineering and Technology, is a bonafide
record of project work carried out by me under the supervision and guidance
of Mrs. V. RAMA LAKSHMI, Assistant Professor of CSE. To the best of my
knowledge, the work has not been submitted to any other institute or
university for any other degree.

Project Associate

S. SRIRAM
(20KT1A05B8)

ABSTRACT
Traditional signature-based malware detection approaches are
sensitive to small changes in the malware code. Currently, most malware
programs are adapted from existing programs. Hence, they share some
common patterns but have different signatures. To protect health sensor
data, it is necessary to identify the malware pattern rather than only detect
the small changes. To detect malware programs in health sensor data in a
timely manner, we propose a fast detection strategy that detects the
patterns in the code with machine learning-based approaches. In
particular, XGBoost, LightGBM and Random Forests are used to analyze
the code from health sensor data. Terabytes of labeled programs,
including benign and malware programs, have been collected.
The challenges of this task are to select and extract the features, to adapt
the three models in order to train and test on the dataset, which consists of
health sensor data, and to evaluate the features and models. When a malware
program is detected by one model, its pattern is broadcast to the other
models, which effectively prevents the malware program from intruding.
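The claim that variants "share some common patterns but have different signatures" can be illustrated with a minimal sketch. It assumes a signature is a SHA-256 digest of the whole file and a pattern is a profile of byte 2-grams. Both are illustrative stand-ins, not the features used in this project, and the byte strings are hypothetical.

```python
import hashlib
from collections import Counter
from math import sqrt


def signature(code: bytes) -> str:
    """Exact-match signature: a SHA-256 digest of the whole byte string."""
    return hashlib.sha256(code).hexdigest()


def bigram_profile(code: bytes) -> Counter:
    """Count overlapping byte 2-grams as a crude structural pattern."""
    return Counter(code[i:i + 2] for i in range(len(code) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical byte strings standing in for an original sample and a variant.
original = bytes.fromhex("5589e583ec10c745fc00000000eb0a8345fc01") * 4
variant = original[:10] + b"\x90" + original[10:]  # one inserted byte

same_signature = signature(original) == signature(variant)
similarity = cosine(bigram_profile(original), bigram_profile(variant))
print(same_signature, round(similarity, 3))
```

The single inserted byte changes the digest completely, while the 2-gram profiles of the two samples remain almost identical. A learned model can exploit that kind of regularity where an exact signature cannot.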

TABLE OF CONTENTS

1.INTRODUCTION
1.1 Motivation
1.2 Existing System
1.3 Objective
1.4 Outcomes
1.5 Applications
1.6 Structure of Project (System Analysis)
1.6.1 Requisites Accumulating and Analysis
1.6.2 System Design
1.6.3 Implementation
1.6.4 Testing
1.6.5 Deployment of System and Maintenance
1.7 Functional Requirements
1.8 Non-Functional Requirements
1.8.1 Examples of Non-Functional Requirements
1.8.2 Advantages of Non-Functional Requirements
1.8.3 Disadvantages of Non-Functional Requirements
1.8.4 Key Learnings
2.LITERATURE SURVEY
3.PROBLEM IDENTIFICATION & OBJECTIVES
3.1 Existing Approach
3.2 Proposed System
3.3 Modules
3.4 Algorithms
4.SYSTEM DESIGN
5.IMPLEMENTATION
5.1 Flowchart
5.2 Code
6.TESTING
7.RESULTS AND DISCUSSIONS
8.CONCLUSION AND FUTURE SCOPE
9.REFERENCES

LIST OF FIGURES
S.NO NAME
1 Project SDLC
2 Use case diagram
3 Class diagram
4 Sequence diagram

1.INTRODUCTION
With the advent of the Internet of Things era, all kinds of sensors are used
to collect health sensor data. Inevitably, some malware or malicious code is concealed
in health sensor data; it intrudes into the target host computer and executes
according to the logic prescribed by a hacker. The categories of malicious
code in health sensor data include computer viruses, worms, trojans, botnets,
ransomware and so on [1]. Malware attacks can steal core data and sensitive
information and damage computer systems and networks. Malware is one of the greatest
threats to today's computer security [2, 3].

Malware analysis is usually performed in one of two ways [4-7]. (1) Static analysis
is usually accomplished by examining the different resources of a binary file without
executing it and studying each component. Binary files can also be disassembled (or
reverse engineered) using a disassembler (such as IDA). Machine code can sometimes be
translated into assembly code, which humans can read and understand. Malware analysts
can read the assembly instructions and get a picture of what the program is supposed
to do. Some modern malware is created using obfuscation techniques to defeat this
type of analysis, such as embedding syntactic code errors. These errors can confuse
the disassembler, but the code still works in actual execution. (2) Dynamic analysis is
performed by observing how the malware actually behaves when it runs on the host
system. Modern malware can employ a variety of anti-analysis techniques that are
designed to defeat dynamic analysis, including testing for virtual environments or
active debuggers, delaying the execution of malicious payloads, or requiring some form
of interactive user input [8-10].

In this paper, we mainly focus on static code analysis. The early static code analysis
methods mainly include feature matching and broad-spectrum signature scanning. Feature
matching simply uses feature string matching to complete the detection, while
broad-spectrum scanning scans the feature code and uses masked bytes to separate the
sections that need to be compared from those that do not. Since both methods need
malware samples to be obtained and features to be extracted before anything can be
detected, detection lags seriously behind new malware. Furthermore, with the
development of malware technology, malware has begun to mutate during transmission in
order to avoid being detected and removed, and the number of malware variants has
increased sharply. The variants change shape so much that it is difficult to extract
a piece of code as a malware signature.

1 This work was supported by the Qatar National Research Fund (a member of the Qatar
Foundation) under Grant NPRP10-1205-160012. The statements made herein are
solely the responsibility of the authors.

1.1 MOTIVATION
We mainly focus on static code analysis. The early static code analysis
methods mainly include feature matching and broad-spectrum signature scanning.
Feature matching simply uses feature string matching to complete the detection,
while broad-spectrum scanning scans the feature code and uses masked bytes
to separate the sections that need to be compared from those that do not.
Since both methods need malware samples to be obtained and features to be
extracted before anything can be detected, detection lags seriously behind new
malware. Furthermore, with the development of malware technology, malware has
begun to mutate during transmission in order to avoid being detected and removed,
and the number of malware variants has increased sharply. The variants change
shape so much that it is difficult to extract a piece of code as a malware signature.
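Broad-spectrum scanning with masked bytes, as described above, can be sketched as follows. The `??` wildcard notation, the function names, and the example byte pattern are assumptions made for illustration and are not taken from any particular scanner.

```python
from typing import List, Optional

# A signature is a list of byte values; None marks a masked (wildcard) byte
# that is skipped during comparison. The pattern used below is hypothetical.
Signature = List[Optional[int]]


def parse_signature(text: str) -> Signature:
    """Parse a string such as 'E8 ?? ?? 5D C3' into a masked signature."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def matches_at(data: bytes, offset: int, sig: Signature) -> bool:
    """Compare only the unmasked positions of the signature at one offset."""
    return all(b is None or data[offset + i] == b for i, b in enumerate(sig))


def scan(data: bytes, sig: Signature) -> List[int]:
    """Return every offset at which the masked signature matches."""
    last = len(data) - len(sig)
    return [off for off in range(last + 1) if matches_at(data, off, sig)]


sig = parse_signature("E8 ?? ?? 5D C3")
sample_a = bytes.fromhex("90e812345dc390")
sample_b = bytes.fromhex("e8abcd5dc3")     # same pattern, different masked bytes
plain = bytes.fromhex("e812345dc4")        # last byte differs, so no match

print(scan(sample_a, sig), scan(sample_b, sig), scan(plain, sig))
```

The mask lets one signature cover variants that differ only in the masked positions. A variant that changes any unmasked byte still escapes, which is the limitation this section describes.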
1.2 Existing System
Based on this situation, a natural idea is to apply machine learning-
based methods that use existing experience and knowledge to perform static
code analysis on unknown binary code and automatically classify malware.
Accordingly, this paper uses machine learning-based techniques and explores
their application to the classification of malware.
1.2.1 Limitations of existing system
• Basic knowledge is needed to perform static code analysis on unknown
binary code and automatically classify malware.
1.3 Objectives
The objective of the project is to develop a malware detection method for
health sensor data based on machine learning.
1.5 Applications
It can be used to detect malware.

1.6 STRUCTURE OF PROJECT (SYSTEM ANALYSIS)

Fig: 1 Project SDLC

• Project Requisites Accumulating and Analysis
• Application System Design
• Practical Implementation
• Manual Testing of My Application
• Application Deployment of System
• Maintenance of the Project
1.6.1 REQUISITES ACCUMULATING AND
ANALYSIS
This is the first and foremost stage of any project. As ours is an academic
project, for requirements gathering we consulted IEEE journals, collected
many IEEE-related papers, and finally selected a paper titled
“Individual web revisitation by setting and substance importance input”.
For the analysis stage, we took references from that paper, surveyed the
literature, and gathered all the requirements of the project.
1.6.2 SYSTEM DESIGN
System design is divided into three parts: GUI design, UML design, and
database design. UML design helps develop the project in an easy way: the
use case diagram shows the different actors and their use cases, the
sequence diagram shows the flow of the project, and the class diagram gives
information about the classes in the project and the methods to be used.
The third and most important part of system design is database design,
where we design the database based on the number of modules in our project.
1.6.3 IMPLEMENTATION
Implementation is the phase where we produce the practical output of the
work done in the design stage. Most of the coding in the business logic
layer comes into action in this stage, making it the main and crucial part
of the project.

1.6.4 TESTING
UNIT TESTING

It is done by the developer at every stage of the project. Fine-tuning of
bugs and modules is also done by the developer. Here we resolve all the
runtime errors.
MANUAL TESTING
As our project is academic, we cannot do any automated testing, so we
follow manual testing by trial-and-error methods.

1.6.5 DEPLOYMENT OF SYSTEM AND MAINTENANCE

Once the project is fully ready, we move to deployment on the client system
in the real world. As it is an academic project, we deployed it only in our
college lab, with all the needed software on Windows OS.
The maintenance of our project is a one-time process only.

1.7 FUNCTIONAL REQUIREMENTS

1.Data Collection

2.Data Preprocessing

3.Training And Testing

4.Modeling

5.Predicting
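
The five functional requirements above can be read as one pipeline. The sketch below is a toy version under loud assumptions: the data is synthetic, the features are coarse byte histograms, and a nearest-centroid classifier stands in for XGBoost, LightGBM and Random Forests so that the example needs only the standard library. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[bytes, int]  # (raw bytes, label): 1 = malware, 0 = benign


def collect(n: int, seed: int = 0) -> List[Sample]:
    """1. Data collection: synthesize labeled byte strings (toy stand-in)."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        label = i % 2
        # Toy assumption: 'malware' bytes skew high, 'benign' bytes skew low.
        low, high = (128, 255) if label else (0, 127)
        body = bytes(rng.randint(low, high) if rng.random() < 0.8
                     else rng.randint(0, 255) for _ in range(64))
        samples.append((body, label))
    return samples


def preprocess(code: bytes, bins: int = 8) -> List[float]:
    """2. Data preprocessing: normalized histogram over coarse byte ranges."""
    counts = Counter(b * bins // 256 for b in code)
    return [counts[i] / len(code) for i in range(bins)]


def split(data: List[Sample], ratio: float = 0.75, seed: int = 0):
    """3. Training and testing: shuffled hold-out split."""
    data = data[:]
    random.Random(seed).shuffle(data)
    cut = int(len(data) * ratio)
    return data[:cut], data[cut:]


def fit(train: List[Sample]) -> Dict[int, List[float]]:
    """4. Modeling: one mean feature vector (centroid) per class."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [preprocess(x) for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(model: Dict[int, List[float]], code: bytes) -> int:
    """5. Predicting: pick the class whose centroid is nearest."""
    feats = preprocess(code)
    return min(model, key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(feats, model[c])))


train, test = split(collect(200))
model = fit(train)
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print(f"hold-out accuracy: {accuracy:.2f}")
```

In a real run, `collect` would load labeled program files and `fit` would train the three tree-based models. The hold-out split and the accuracy check would keep the same shape.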

1.8 NON FUNCTIONAL REQUIREMENTS

A NON-FUNCTIONAL REQUIREMENT (NFR) specifies a quality
attribute of a software system. NFRs judge the software system based on
responsiveness, usability, security, portability and other non-functional
standards that are critical to the success of the software system. An example
of a non-functional requirement is “how fast does the website load?” Failing to
meet non-functional requirements can result in systems that fail to satisfy
user needs. Non-functional requirements allow you to impose constraints
or restrictions on the design of the system across the various agile backlogs.
For example, the site should load in 3 seconds when the number of simultaneous
users is > 10000. Describing non-functional requirements is just as
critical as describing functional requirements.

• Usability requirement
• Serviceability requirement
• Manageability requirement
• Recoverability requirement
• Security requirement
• Data Integrity requirement
• Capacity requirement
• Availability requirement
• Scalability requirement
• Interoperability requirement
• Reliability requirement
• Maintainability requirement
• Regulatory requirement
• Environmental requirement

1.8.1 EXAMPLES OF NON-FUNCTIONAL REQUIREMENTS

Here are some examples of non-functional requirements:
1.8.1.1 Users must upload a dataset.
1.8.1.2 The software should be portable, so moving from one OS to
another does not create any problem.
1.8.1.3 Privacy of information, the export of restricted technologies,
intellectual property rights, etc. should be audited.
1.8.2 ADVANTAGES OF NON-FUNCTIONAL
REQUIREMENT
Benefits/pros of non-functional requirements are:
• They ensure the software system follows legal and compliance rules.
• They ensure the reliability, availability, and performance of the
software system.
• They ensure good user experience and ease of operating the software.
• They help in formulating the security policy of the software system.

1.8.3 DISADVANTAGES OF NON-FUNCTIONAL
REQUIREMENT
Cons/drawbacks of non-functional requirements are:
• Non-functional requirements may affect various high-level
software subsystems.
• They require special consideration during the software
architecture/high-level design phase, which increases costs.
• Their implementation does not usually map to a specific software
sub-system.
• It is tough to modify non-functional requirements once you pass the
architecture phase.

1.8.4 KEY LEARNING

The character of the time period, the length of road, the weather, the bus
speed and the rate of road usage are adopted as input vectors in Support
Vector Machine

2.LITERATURE SURVEY
Based on this situation, a natural idea is to apply machine learning-based
methods that use existing experience and knowledge to perform static code
analysis on unknown binary code and automatically classify malware.
Accordingly, this paper uses machine learning-based techniques and explores
their application to the classification of malware [11-14]. The essence of
malware detection is a classification problem: the samples to be detected are
classified as malware or legitimate software. Host malware detection
therefore follows the core steps of a machine learning algorithm, and the
main research steps of this paper are as follows:
• Collect sufficient malware code samples and legitimate software samples.
• Perform effective data processing on the samples and extract the features.
• Further select the main features for classification.
• Train with machine learning algorithms and establish a classification model.
• Detect unknown samples using the trained classification model.
The ultimate goal is to find the most effective features and models in this
practical task. This chapter introduces the main research questions and
basic ideas. In the following, we will introduce:
[1]S. Su, Y. Sun, X. Gao, J. Qiu* and Z. Tian*. A Correlation-change
based Feature Selection Method for IoT Equipment Anomaly
Detection. Applied Sciences.
In the era of the fourth industrial revolution, there is a growing trend to
deploy sensors on industrial equipment, and analyze the industrial
equipment’s running status according to the sensor data. Thanks to the rapid
development of IoT technologies [1], sensor data could be easily fetched
from industrial equipment, and analyzed to produce further value for
industrial control at the edge of the network or at data centers. Due to the
considerable development of deep learning in recent years, a common
practice of such analysis is to conduct deep learning [2,3,4]. Such methods
select a subset of all fetched sensor data stream as the input features, and
generate equipment predictions. As a result, the performance of the learning
model was seriously impacted by the features selected, thus feature selection
plays a critical role for such methods.
To select an appropriate set of features for the learning model,
researchers aim to select the most relevant features to the prediction model
to improve the prediction performance, or to select the most informative
features to conduct data reduction. Unfortunately, both kinds of methods
have intrinsic drawbacks when applied in online scenarios. The former
kind of method depends heavily on predefined evaluation criteria, such
as feature relevance metrics [5] or a predefined learning model [6]. Thus,
such methods are limited to certain datasets, and are not suitable for online
scenarios which involve dynamic and unsupervised feature selection. The
latter kind of method fits online scenarios well. However, data
reduction mainly aims to improve the efficiency (but not the accuracy) of the
prediction model, which is not the most important factor in online
industrial equipment status analysis.
To relieve the dependency on predefined evaluation criteria,
researchers switch to selecting the features which can indicate the online sensor
data’s characteristics, such as features which are smoothest on the graph [7], or
the features with highest clusterability [8,9]. In this paper, we focus on the
features with correlation changes such as smoothness and clusterability,
which are important characteristics for traditional pattern recognition fields like
image processing and voice recognition [7,8,9]. We believe that correlation
changes can significantly pinpoint status changes in industrial environments.
As far as we know, this is the first work focusing on correlation changes for
online feature selection.
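
One way to picture selecting features by correlation change is to compare each sensor's pairwise correlations across two time windows and rank the sensors by how much they shift. The sketch below does that with a mean absolute change in Pearson correlation. This scoring rule, the function names, and the sensor readings are all assumptions for illustration, not the algorithm of the cited paper.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def correlation_change(old: Dict[str, List[float]],
                       new: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each sensor by how much its correlations with the others shift."""
    scores = {}
    for f in old:
        others = [g for g in old if g != f]
        scores[f] = sum(abs(pearson(new[f], new[g]) - pearson(old[f], old[g]))
                        for g in others) / len(others)
    return scores


# Hypothetical sensor windows: 'temp' and 'load' move together in both windows,
# while 'vib' tracks them in the old window and reverses in the new one.
old = {"temp": [1, 2, 3, 4, 5], "load": [2, 4, 6, 8, 10], "vib": [1, 2, 3, 4, 5]}
new = {"temp": [1, 2, 3, 4, 5], "load": [2, 4, 6, 8, 10], "vib": [5, 4, 3, 2, 1]}

scores = correlation_change(old, new)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], scores)
```

The sensor whose relationships to the others change most ranks first, and no labels or predefined learning model are needed to compute the ranking.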

[2] X. Yu, Z. Tian, J. Qiu, F. Jiang. A Data Leakage Prevention Method
Based on the Reduction of Confidential and Context Terms for Smart
Mobile Devices. Wireless Communications and Mobile Computing,
https://doi.org/10.1155/2018/5823439.

With the development of the Internet and information technology, smart mobile
devices have appeared in our daily lives, and the problem of information leakage
on smart mobile devices has followed and become more and more serious
[1, 2]. All kinds of private or sensitive information, such as intellectual
property and financial data, might be distributed to unauthorized entities
intentionally or accidentally. Moreover, it is impossible to stop confidential
information from spreading once it has leaked.

According to survey reports [3, 4], most of the threats to information
security are caused by internal data leakage. These internal threats consist
of approximately 29% accidental leakage of private or sensitive data,
approximately 16% theft of intellectual property, and approximately 15% other
thefts including customer information and financial data. Further, the
consensus of approximately 67% of organizations is that the damage caused
by internal threats is more serious than that from outside.

Although laws and regulations have been passed to punish various behaviors
of intentional data leakage, it is still hard to prevent data leakage effectively.
Confidential data can be easily disguised by rephrasing confidential contents
or embedding confidential contents in nonconfidential contents [5, 6]. In
order to avoid the problems arising from data leakage, lots of software and
hardware solutions have been developed which are discussed in the
following chapter.

In this paper, we present CBDLP, a data leakage prevention model based on
confidential terms and their context terms, which can detect rephrased
confidential contents effectively. In CBDLP, a graph structure with
confidential terms and their context involved is adopted to represent
documents of the same class, and then the confidentiality score of the
document to be detected is calculated to judge whether confidential
contents are involved or not. Based on the attribute reduction method from
rough set theory, we further propose a pruning method. According to the
importance of the confidential terms and their context, the graph structure
of each cluster is updated after pruning. The motivation of the paper is to
develop a solution which can effectively prevent intentional or accidental
data leakage by insiders. As mixed-confidential documents are very
common, it is very important to accurately detect the documents containing
confidential contents even when most of the confidential contents have been
rephrased.

[3] Y. Sun, M. Li, S. Su, Z. Tian, W. Shi, M. Han. Secure Data Sharing
Framework via Hierarchical Greedy Embedding in Darknets.
ACM/Springer Mobile Networks an

Geometric routing, which combines greedy embedding and greedy
forwarding, is a promising approach for efficient data sharing in darknets.
However, the security of data sharing using geometric routing in darknets is
still an issue that has not been fully studied. In this paper, we propose a
Secure Data Sharing framework (SeDS) for future darknets via hierarchical
greedy embedding. SeDS adopts a hierarchical topology and uses a set of
secure nodes to protect the whole topology. To support geometric routing in
the hierarchical topology, a two-level bit-string prefix embedding approach
(Prefix-T) is first proposed, and then a greedy forwarding strategy and a data
mapping approach are combined with Prefix-T for data sharing. SeDS
guarantees that the publication or request of a data item can always pass
through the corresponding secure node, such that security strategies can be
performed. The experimental results show that SeDS provides scalable and
efficient end-to-end communication and data sharing.
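
Greedy forwarding over a bit-string prefix embedding can be sketched on a small tree. The example assumes a plain single-level embedding in which each node's label is its root-to-node bit string. It is not the two-level Prefix-T scheme or the secure-node logic of SeDS, and the topology and function names are hypothetical.

```python
from typing import Dict, List


def tree_distance(a: str, b: str) -> int:
    """Hop distance between two tree nodes labeled by root-to-node bit strings."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def greedy_route(src: str, dst: str, neighbors: Dict[str, List[str]]) -> List[str]:
    """Forward hop by hop to the neighbor closest to the destination label."""
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        nxt = min(neighbors[here], key=lambda n: tree_distance(n, dst))
        if tree_distance(nxt, dst) >= tree_distance(here, dst):
            raise RuntimeError(f"greedy forwarding stuck at {here!r}")
        path.append(nxt)
    return path


# Hypothetical topology: a small binary tree whose labels are bit-string prefixes.
# "" is the root; each child appends one bit to its parent's label.
neighbors = {
    "": ["0", "1"],
    "0": ["", "00", "01"],
    "1": ["", "10"],
    "00": ["0"],
    "01": ["0"],
    "10": ["1"],
}

print(greedy_route("00", "10", neighbors))
```

Each node needs only its neighbors' labels to pick the next hop. On a tree embedded this way, some neighbor is always strictly closer to the destination, so the route cannot get stuck.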

[4] Z. Wang, C. Liu, J. Qiu, Z. Tian, C., Y. Dong, S. Su. Automatically
Traceback RDP-based Targeted Ransomware Attacks. Wireless
Communications and Mobile Computing. 2018.
https://doi.org/10.1155/2018/7943586.

With the popularization of new energy electric vehicles (EVs), the
recommendation algorithm is widely used in the relatively new field of
charge piles. At the same time, the construction of charging infrastructure is
facing increasing demand and more severe challenges. With the ubiquity of
Internet of vehicles (IoVs), inter-vehicle communication can share
information about the charging experience and traffic condition to help
achieving better charging recommendation and higher energy efficiency. The
recommendation of charging piles is of great value. However, the existing
methods related to such recommendation consider inadequate reference
factors and most of them are generalized for all users, rather than
personalized for specific populations. In this paper, we propose a
recommendation method based on dynamic charging area mechanism,
which recommends the appropriate initial charging area according to the
user’s warning level, and dynamically changes the charging area according
to the real-time state of EVs and charging piles. The recommendation
method based on a classification chain provides more personalized services
for users according to different charging needs and improves the utilization
ratio of charging piles. This satisfies users’ multilevel charging demands and
realizes a more effective charging planning, which is beneficial to overall
balance. The chained recommendation method mainly consists of three
modules: intention detection, warning levels classification, and chained
recommendation. The dynamic charging area mechanism reduces the
occurrence of recommendation conflict and provides more personalized
service for users according to different charging needs. Simulations and
computations validate the correctness and effectiveness of the proposed
method.

Keywords: Electric vehicle · Recommendation conflict · Chained
recommendation · Dynamic charging area mechanism
Mathematics Subject Classification: 68W40
T. Zhang et al., “A method of chained recommendation for charging piles in
internet…”; * Yu Jiang

1 Introduction
Recommendation for resources and services is a classic
problem. Especially in the current era of big data and cloud computing,
recommendation systems are widely used in a variety of fields.
Traditional recommendation algorithms are used in shopping, reading,
catering, accommodation, and other fields, which brings our daily lives great
convenience. With the increasing popularity of new energy vehicles, the
demands of charging electric vehicles (EVs) are becoming increasingly
obvious. However, as a relatively new application field, the development of
algorithms on recommendation of charging piles is not addressed well
enough to meet the increasing demands. Compared with the traditional gas
pile, the charging pile has the characteristics of longer charging time, less
stable price and limited service capacity. In addition, the charging pile and
the vehicle are required to be matched on some parameters. From users’
point of view, they usually find it difficult to make choices or they are simply
too lazy to make such decisions. Thus, most of them tend to follow other
people’s decisions blindly and gather at the most popular charging piles,
which may lead to the unbalanced utilization of charging resources. All of
the above mentioned bring great challenges to the recommendation of
charging piles [1].

The EV industry is growing rapidly, and the government
is also vigorously promoting the construction of charging infrastructure. As
of January 2019, the ratio of public charging piles to new energy vehicles in
China is about 1:7.6. Owners of EVs can select the idle electric piles or make
an appointment for charging through applications developed for
recommendation. However, due to the lack of suitable dispatching method
in existing charging pile platforms, owners need to choose idle charging
piles independently [2]. On one hand, it leads to a poor user experience.
Users have to make decisions to pick a charging pile independently to charge
or reserve from a large number of charging piles that can meet their
conditions. On the other hand, it generates lots of time fragmentation, which
could reduce the utilization rate of charging piles [3].

Recently, both
industrial and academic communities started to have great interest to EVs
and charging pile deployment. The mainly studied issues are the siting of
charging piles and the recommendation of EVs. For example, Tian et al. [4]
provided a real-time charging pile recommendation system for EV taxis via
large-scale GPS data mining. Jung et al. [5] used an activity-based model to
analyze the queue delay of charge piles and offer decision support for
choosing locations of undeployed charging piles. Besides, Gharbaoui et al.
[6] also used activity-based models and found that in urban areas, public
charging piles can be under-utilized and location selecting of charging piles
should be considered to reduce EV owners’ range anxiety [7, 8].

Mobility,
high density, sparse connectivity, and heterogeneity bring spatial challenges
for the vehicular Internet of Things. For emerging vehicular IoT
applications, distributed communication, data caching, and computing tasks
are conducted to provide more reliable and efficient communications in
various network environments [9, 10]. Edge computing provides high-class
intelligent services and computing capabilities at the edge of the networks,
and the constrained shortest distance (CSD) querying can also be applied to
recommendation algorithms [11, 12]. Feng et al. [13] introduced a
distributed vehicular edge computing solution named the autonomous
vehicular edge (AVE), which can share neighboring vehicles’ available
resources via vehicle-to-vehicle (V2V) communications. Feng et al. [14]
designed an ant colony optimization algorithm to schedule buffers based on
information collected on an adjacent vehicle. While the rapid development
of IoT devices is changing our daily lives, some particular issues hinder the
massive deployment of IoT devices. The cloud-based malware detection can
utilize the data sharing and powerful computational resources of secured
servers to improve the detection performance. Such methods provide good
technical supports for charging piles recommendation [15, 16].

The state of
charging piles and EVs are changing all the time. The recommendation
method based on charging intention takes charging needs of surrounding
users into account to make a more accurate recommendation list for the
served user. However, this method is not applicable for users with urgent
needs. To satisfy more users, it is vital to provide a personalized
recommendation list. And different from books, movies, and other items,
charging piles and EVs have strong regional characteristics. Users live or
work in a certain area are used to choose charging piles nearby. The
recommendation method based on preference can provide users with
recommendation lists in line with their charging habits, but this method is
not applicable to newly installed charging piles and new users.

The
recommendation of charging piles is a research direction of great value.
However, most of the existing researches related to such recommendation
tend to have some shortcomings [17]. On one hand, the existing methods
consider inadequate reference factors, which lead to the uncertainty and
unavailability of the recommendation results, and affect the accuracy of
recommendation. On the other hand, the existing recommendation methods
are generalized for all users, rather than personalized for specific
populations. People nowadays have partiality for personalized services,
because those services can customize the most suitable solutions for users
of different needs [18]. In this paper, we propose a chained recommendation
method for charging piles, which can reduce the occurrence of
recommendation conflict. Based on a user’s location
data and charging history, we can determine whether the user has charging intentions. If
so, he/she is marked as “a user to be charged”.
Based on the user’s profile, the endurance data of the EVs, and the distribution of charging
piles in the area, we divide the “users to be charged” into three
warning levels, and screen the target charging areas for users based on their
warning levels. Once a user enters the target charging area, the
recommendation list of charging piles is generated and sent to the user.
This method can satisfy users’ multilevel charging demands and improve
the utilization ratio of charging piles. It can realize a more effective allocation
of charging piles, which is beneficial to the overall balance of resource
utilization. The novelty of our method is mainly reflected in two aspects. On
the one hand, it applies a chained recommendation mechanism to make
recommendations for users with different charging needs. On the other hand,
a dynamic charging area mechanism is designed to detect recommendation
conflict and to alleviate it to a certain extent, or even eliminate it entirely. The
organization of this paper is as follows. In Sect. 2, we describe the
construction of the recommendation model according to different conditions.
In particular, we introduce three sub-modules in detail: the intention detection
module, the warning levels classification module, and the chained recommendation
module. In Sect. 3, we simulate and verify the feasibility of our
method. Finally, in Sect. 4, we conclude the paper and suggest future work.
2 Recommendation modeling
The recommendation model consists of three modules: intention detection,
warning levels classification, and chained
recommendation, as shown in Figs. 1 and 2. The intention detection module
mainly includes two sub-modules, data acquisition and state probability
calculation, and is responsible for detecting the user’s charging intentions. The
warning levels classification module is based on the situation of EVs and
charging piles, the surrounding environment, the user’s profile, and so on. It is
responsible for dividing the warning levels of users into three categories:
high-class, medium-class, and low-class. The warning levels
classification module prepares for the subsequent chained
recommendation module.

[Fig. 1 Modules of the recommendation model (labels: Warning Levels Classification Module, Intention Detection Module, Dynamic Charging Area Mechanism, Chained Recommendation Mechanism)]
[Fig. 2 Recommendation model]

A method of chained recommendation for charging piles in internet…

The chained recommendation module is
mainly based on the dynamic charging area mechanism and the chained
recommendation mechanism. According to the different warning levels, it is
divided into three sub-modules. By referring to a variety of factors, it
provides users with different levels of recommendation services, and
generates the final lists of recommended charging piles [19–23]. At the same
time, according to the dynamic parameters of vehicles’ battery status,
driving information, and charging demands, our system adjusts the chained
recommendation process and the recommendation strategy, and gives
real-time recommendation results for charging piles, so as to optimize the
charging efficiency and utilization.

2.1 Intention detection module
Based on the user’s current GPS location information and charging history, the
intention detection module detects the
user’s current charging intention. There are three types of charging
intention: on the way to a charging pile (S1), has reached a charging pile (S2),
and has no charging intention (S3) [4].
• Data Acquisition: it is responsible
for acquiring the user’s charging history, as well as the user’s current GPS location
information.
• State Probability Calculation: it is responsible for the statistical
analysis of the data, and the specific approach is as follows. First, we divide one
day into several periods, and then count the number of days on which a user is in
a certain state at a certain time, based on the charging history. For example,
by analyzing the charging history of the user over the past month (30 days),
we can detect the charging intention of the user at T-time [24]. Suppose that,
in the past month, the user was in state S1
at T-time on 5 days (N(S1) = 5), in state S2 on 5 days (N(S2) = 5), and
in state S3 on 20 days (N(S3) = 20). The user
has spent the most days in state S3 in the past month, so it can be
predicted that the user is most likely in state S3 at T-time,
which means that the user has no charging intention at T-time.
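
The day-counting estimate in the example above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names and the representation of the charging history as a list of daily state labels are assumptions made for the example.

```python
from collections import Counter

STATES = ("S1", "S2", "S3")  # on the way, has reached pile, no intention


def state_probabilities(daily_states):
    """Relative frequency of each state at T-time over the observed days.

    `daily_states` is an assumed input format: one label per day,
    e.g. ["S1", "S3", ...]. States that never occur are left out.
    """
    counts = Counter(daily_states)
    total = sum(counts[s] for s in STATES)
    if total == 0:
        raise ValueError("charging history contains no known states")
    return {s: counts[s] / total for s in STATES if counts[s] > 0}


def detect_intention(daily_states):
    """Pick the state with the highest estimated probability."""
    probs = state_probabilities(daily_states)
    return max(probs, key=probs.get)


# Worked example from the text: 5 days in S1, 5 in S2, 20 in S3.
history = ["S1"] * 5 + ["S2"] * 5 + ["S3"] * 20
probs = state_probabilities(history)
intention = detect_intention(history)
print(probs, intention)
```

With the 30-day history from the text, S3 has the largest share of days, so the sketch reports that the user has no charging intention at T-time.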

The probability calculation formula of intention detection is as follows:

$$P(S_i) = \frac{N(S_i)}{N(S_1) + N(S_2) + N(S_3)}, \quad N(S_i) > 0 \tag{1}$$

where N(S_i) denotes the total number of days that the EV is in the state S_i at
T-time. In addition, the charging intention is detected on the basis of the following
formula:

$$\text{Maximum}\left[P(S_i)\right] \tag{2}$$

T. Zhang et al.

2.2 Warning levels classification module
2.2.1 User’s profile
It is necessary to judge whether the user is
a VIP user or an ordinary user. Under the pay-for-service model,
users can become VIP users by paying a certain fee, which raises their
priority in recommendation services. VIP users have
the privilege of priority recommendation: their warning
level is considered to be high-class, and we recommend high-quality and
convenient piles to them first [25]. For ordinary users, the warning level is
determined by the situation of vehicles, piles, and the surrounding environment.
2.2.2 Surrounding environment
According to weather, air conditioning,
congestion idling, climbing, and other scenario information, we make
assumptions about the power consumption rate of batteries, and predict
how long the remaining power will last.
2.2.3 Situation of piles and EVs
It is
necessary to classify warning levels based on the remaining power, the ratio of
vehicles to piles, the pile density, and other elements. Note that the
less residual electricity a vehicle has, the more urgent its charging
demand and the higher its charging priority. Besides, we pay more
attention to areas with a lower distribution density of charging piles. In such
areas, strengthened recommendations and reminders are necessary.
2.3 Chained recommendation module
2.3.1 Dynamic charging area mechanism
When recommending charging piles, we may encounter situations where we
recommend the same charging pile to two or more users [26, 27]. Under
some circumstances, this leads to recommendation conflict if the number of
charging piles is insufficient. To mitigate recommendation conflict, the
dynamic charging area mechanism is applied. Under this
mechanism, we first recommend a dynamic charging area and
subsequently the specific charging piles. First of all, an initial charging area is
recommended for users. The recommendation system continuously
refreshes the optimized recommendation list while the EV proceeds to the
recommended charging pile. As the constraints change continuously, the
recommendation list changes accordingly [28, 29]. That is to say, the
chained recommendation mechanism described in the following section
provides algorithmic support for the dynamic charging area mechanism.
When the user arrives at the recommended charging area, our system
recommends a charging pile located in the recommended charging area for
the user. Thus, even if the same charging pile is recommended to two or
more users and the number of charging piles is insufficient, our system can
re-recommend suitable charging piles that are in the same charging area and
close to users. Recommendation conflict detection is introduced into the
dynamic charging area mechanism to detect whether there is a
recommendation conflict at each time node or location node. It is responsible
for detecting whether a charging pile in the recommendation list generated for
a user has already been occupied. When a
recommendation conflict occurs, the recommendation list is regenerated through
the chained recommendation mechanism [30–34]. The dynamic charging area
mechanism shown in Algorithm 1 adopts the idea of retrospective
recommendation. Even in the absence of recommendation conflict, it
continuously checks whether there is a more suitable charging pile based on the
change of the user’s location and time, and it updates the recommendation list
in real time. By adopting this mechanism, our recommendation method takes
responsibility for its recommendation results and supports continuous tracking.

Algorithm 1 Dynamic Charging Area Mechanism
input: The initial set of charging areas;
while Appropriate time node or location node is detected do
    if The EV arrives at the recommended charging area then
        Generate the recommendation list;
        break;
    else if Updating is needed then
        Update the set of charging areas;
    end if
end while
return The recommendation list;

2.3.2 Chained recommendation mechanism
It is
responsible for making hierarchical recommendations and providing
integrated and fair services for users with different warning levels. By
fusing multiple models, it can improve the accuracy of recommendation.
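
The control flow of Algorithm 1 (the dynamic charging area mechanism of Sect. 2.3.1) can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the authors' code: the four callables (`has_arrived`, `needs_update`, `update_areas`, `generate_list`) and the toy one-dimensional road model are hypothetical stand-ins for the conditions and actions that the pseudocode leaves abstract.

```python
def dynamic_charging_area(areas, nodes, has_arrived, needs_update,
                          update_areas, generate_list):
    """Sketch of Algorithm 1; all four callables are hypothetical hooks."""
    recommendation = []
    for node in nodes:  # one iteration per detected time/location node
        if has_arrived(node, areas):
            recommendation = generate_list(node, areas)
            break
        elif needs_update(node, areas):
            areas = update_areas(node, areas)
    return recommendation


# Toy 1-D road (assumed): each area is (start_km, end_km, pile names).
initial_areas = [(8, 10, ["pile_A", "pile_B"])]
positions = [0, 3, 6, 9]  # successive location nodes of the EV


def has_arrived(pos, areas):
    return any(start <= pos <= end for start, end, _ in areas)


def needs_update(pos, areas):
    return pos == 3  # assumed trigger: a closer area becomes available


def update_areas(pos, areas):
    return [(5, 7, ["pile_C"])] + areas


def generate_list(pos, areas):
    return [pile for start, end, piles in areas
            if start <= pos <= end for pile in piles]


result = dynamic_charging_area(initial_areas, positions, has_arrived,
                               needs_update, update_areas, generate_list)
print(result)
```

In this toy run the set of charging areas is updated at the second node, so the EV reaches the newly added area first and receives the list for that area. Passing the hooks in as parameters keeps the loop itself identical to the pseudocode while leaving the update and conflict rules open.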
2.3.2.1 Level-one filtering
For users at high-class warning levels,
charging needs are more urgent. We give these users the highest priority and
recommend charging piles for them in the most efficient way. In level-one
filtering, the most convenient recommendation is based only on
basic elements such as the user’s current location information, the EV’s
profile, the occupancy status of charging piles, and the parking space at
charging piles.
2.3.2.2 Level-two filtering
For users
at medium-class warning levels, level-two filtering is applied to
recommend charging piles. After level-one filtering, an initial
set of charging piles is prepared. The charging piles in this set should meet
some requirements, such as an appropriate distance (neither too close nor too far
[35]), an appropriate charging rate, and matching parameters
between charging piles and EVs. Afterwards, we introduce the improved
collaborative filtering algorithm in level-two filtering. It integrates users’
preferences, waiting time, price, and other factors to produce personalized,
socialized, and economical recommendation results.
• Calculating users’ preference for different charging piles: By analyzing users’ historical
charging behavior and combining it with the current distance between users and
charging piles, we can calculate users’ preference for different charging piles,
and the calculation formula is as follows:
where:
p_i ∈ ⟨p_1, p_2, …, p_i⟩ represents the current position of the i-th EV;
sta_j ∈ ⟨sta_1, sta_2, …, sta_j⟩ represents the j-th charging pile in the initial set of charging piles;
c_ij ∈ ⟨c_11, …, c_1j, c_21, …, c_ij⟩ represents the total number of times that
the i-th EV has charged at the j-th charging pile;
dist(p_i, sta_j) represents the
real-time distance between the i-th EV and the j-th charging pile at the
current moment.
In addition, recommendation based on preference is
measured on the basis of the following formula:
• Calculating the waiting
time when the user chooses different charging piles: In general, the waiting time
is equal to the sum of the time spent on the way to the charging pile and
the time spent on charging. If other users arrive at the designated charging pile
first and all charging piles are occupied, the waiting time is equal
to the sum of the time spent on the way, the time spent on
charging, and the queuing time (the remaining charging time of the previous
user). The criterion for recommendation based on waiting time is to select
the charging pile with the shortest waiting time.
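
The waiting-time criterion above lends itself to a small Python sketch. The class and field names and the numbers in the example are assumptions for illustration. The queuing time is taken literally from the text as the remaining charging time of the previous user, without any further refinement.

```python
from dataclasses import dataclass


@dataclass
class PileOption:
    """One candidate charging pile (field names are illustrative)."""
    name: str
    travel_min: float           # time spent on the way to the pile
    charging_min: float         # time spent on charging
    occupied: bool = False      # True if the user would have to queue
    remaining_min: float = 0.0  # remaining charging time of previous user


def waiting_time(option: PileOption) -> float:
    """Travel time plus charging time, plus queuing time if occupied."""
    total = option.travel_min + option.charging_min
    if option.occupied:
        total += option.remaining_min
    return total


def shortest_wait(options):
    """Select the charging pile with the shortest waiting time."""
    return min(options, key=waiting_time)


options = [
    PileOption("sta1", travel_min=10, charging_min=40),
    PileOption("sta2", travel_min=5, charging_min=40,
               occupied=True, remaining_min=15),
    PileOption("sta3", travel_min=20, charging_min=25),
]
best = shortest_wait(options)
print(best.name, waiting_time(best))
```

Here the nearest pile (`sta2`) is not selected, because its queuing time makes its total waiting time the longest of the three candidates.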

2.5 L. Xiao, Y. Li, X. Huang, X. Du, “Cloud-based Malware Detection
Game for Mobile Devices with Offloading”, IEEE Transactions on
Mobile Computing, Volume: 16, Issue: 10, Pages: 2742 – 2750, Oct.
2017. DOI: 10.1109/TMC.2017.2687918.

Abstract: Edge computing is a new paradigm to provide rich computing capability
at the edge of pervasive radio access networks close to users. A critical
research challenge of edge computing is to design an efficient offloading
strategy to decide which tasks can be offloaded to edge servers with limited
resources. Although many research efforts attempt to address this challenge,
they need centralized control, which is not practical because users are
rational individuals who seek to maximize their own benefits. In this
paper, we design a decentralized algorithm for computation
offloading, so that users can independently choose their offloading
decisions. Game theory has been applied in the algorithm design. Different
from existing work, we address the challenge that users may refuse to
expose their own information about network bandwidth and preference.
Therefore, our solution must make the offloading decision
without such knowledge. We formulate the problem as a partially
observable Markov decision process (POMDP), which is solved by a policy
gradient deep reinforcement learning (DRL) based approach. Extensive
simulation results show that our proposal significantly outperforms existing
solutions.

Keywords: Edge computing, computation offloading, Nash
equilibrium, partially observable Markov decision process (POMDP), deep
reinforcement learning (DRL).

1 INTRODUCTION
As mobile phones
are gaining enormous popularity, more and more mobile applications, such
as face recognition, natural language processing and augmented reality, are
emerging and attracting great attention [1–3]. These mobile applications
are typically resource-hungry, demanding intensive computation and high
energy consumption, which can hardly be supported by mobile phones with
limited computation resources and battery life. To overcome this limitation,
a novel computing paradigm, called edge computing, has been proposed as
a promising solution [4]. A number of modest-size computing servers have
been deployed at the edge of pervasive radio access networks close to users,
so that users can offload their computing tasks to these servers with low
latency. Although the edge-based computation offloading approach can
significantly augment computation capability of users, developing a
comprehensive and reliable edge computing system remains challenging.
Edge servers have limited hardware resources. If too many users choose to
offload their tasks simultaneously, it would exceed the capacity of edge
servers, leading to long task response time. Therefore, it is critical to design
an efficient offloading strategy to decide which tasks of users should be offloaded
to edge servers. This problem has been recognized as one of the most critical
challenges for edge computing, but most existing work needs centralized
control to achieve global optimal performance [5, 6]. Unfortunately, it is not
practical to force all users to act according to a centralized control because
they are individuals with rational choices in computation offloading.

• Y. Zhan and S. Guo are with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong.
• P. Li is with the School of Computer Science and Engineering, The University of Aizu, Japan.
• J. Zhang is with the School of Automation, Beijing Institute of Technology, Beijing, China.

Game theory is a powerful framework to analyze the interactions among multiple
players who act in their own interests. It can be used to design decentralized
mechanisms, such that no player has the incentive to deviate unilaterally.
Thanks to its great promise, recent research efforts have applied game theory
to the design of offloading algorithms for edge computing. For
example, Chen et al. [7, 8] have designed a decentralized computation
offloading game for mobile cloud computing. Jošilo et al. [9] have
proposed selfish decentralized computation offloading in dense wireless
networks where each user can offload its computation to multiple wireless
base stations. However, existing work can hardly be applied in practice
because of two weaknesses. First, they consider a discrete action model that
allows users to choose a limited number of actions. Although this model
works well in scenarios with a few users, it cannot handle large-scale
problems. A straightforward approach is to add more actions in the problem
formulation, but it leads to higher algorithm complexity. Second, existing
work has a strong assumption that all users should share their information,
e.g., quality of network connection and preference on energy efficiency, so
that they can make the best offloading decisions. However, users may be unwilling
to expose such personal information due to privacy and security concerns.

Authorized licensed use limited to: Northwestern University. Downloaded
on May 03,2020 at 12:20:46 UTC from IEEE Xplore. Restrictions apply.
0018-9340 (c) 2019 IEEE. Personal use is permitted, but
republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html
for more information. This article has been accepted for publication in a
future issue of this journal, but has not been fully edited. Content may
change prior to final publication. Citation information: DOI
10.1109/TC.2020.2969148, IEEE Transactions on Computers

In this paper, we address the above weaknesses by designing an
algorithm based on game theory enhanced by deep reinforcement learning
(DRL). Specifically, we consider a number of users who can connect to an
edge server via multiple access points (e.g., base stations or WiFi routers).
Each user can arbitrarily divide its task into smaller subtasks and choose to
offload a portion of them to the edge server. A challenge arises because of
partial offloading. It makes the model more flexible, but users should choose
their actions from a continuous space, which is different from discrete
models used by existing work that considers simple offloading decisions,
e.g., offloading the whole task or not [10]. We first study a simple scenario
in which users share their information, e.g., network bandwidth and preference,
and design an algorithm that is able to achieve Nash equilibrium. Based on
the insight provided by this algorithm, we then extend our work for
scenarios without information sharing. The problem is formulated as a
multi-agent partially observable Markov decision process (POMDP). To
address the challenges of network dynamics and continuous decision space,
we propose a decentralized approach based on deep reinforcement learning
(D-DRL) with policy gradient and differential neural computer (DNC). Our
approach can effectively learn the optimal offloading policy under high
network dynamics in a continuous decision space directly from computation
offloading game history without any prior knowledge about system models.
It has merits over model-based computation offloading game strategies in
that it is totally model-free and provides a general solution to computation
offloading problems. Thus, it can be applied to complex and unpredictable
situations where it is difficult to obtain the precise system models.
Moreover, the DNC, which is used in policy gradient DRL for the first time, is capable of
remembering past information and inferring the hidden states of
observations automatically. By incorporating the DNC into our framework,
the policy optimization process is accelerated significantly,
and users can learn a policy even when the network is time-varying and
uncertain. The main contributions of this paper are summarized as follows:
• We study the task offloading problem in edge computing and formulate it
as a decentralized computation offloading game in each time slot by taking
into account both communication and computation cost. We solve this
problem by proposing an algorithm that can achieve Nash equilibrium.
• We study the offloading problem without information sharing and formulate it
as a multi-agent POMDP. An algorithm based on DRL and DNC has been
proposed to solve this challenging problem.
• Simulation results
demonstrate the effectiveness of the proposed scheme by comparing it with
the state of the art.

The edge computing paradigm has attracted considerable
attention in both academia and industry over the past several years. Nokia
introduced the very first real-world edge computing platform in 2013 [11],
in which the computing platform called radio application cloud servers is
fully integrated with the Flexi Multiradio base stations. Saguna also
introduced their fully virtualized edge computing platform Open-RAN,
which can provide an open environment for running third-party edge
computing applications [12]. An industry specification group
has also been formed to standardize the adoption of edge computing within the RAN
[4]. Much existing work has studied the computation offloading problem
from the perspective of a single user. Redenko et al. [13] have shown that
computation offloading can save energy according to their experimental
results. In [14], an optimization scheme for energy-efficient application
execution has been proposed on the cloud-assisted mobile application
platform. Xian et al. [15] have proposed an adaptive timeout scheme for
computation offloading to improve energy saving. Huerta-Canepa et al.
[16] have proposed an adaptive application offloading scheme based on
current system conditions and the execution history of applications. Based
on Lyapunov optimization, the authors of [17] and [18] have studied a
dynamic computation offloading mechanism for minimizing computational
and communication energy consumption in real network environments.
Several works have investigated the computation offloading
problem in the multi-user case. Rodrigues et al. [6] have proposed a hybrid
method for minimizing service delay in edge computing through virtual
machine migration and transmission power control. Yang et al. [19] have
proposed a genetic algorithm to solve the partition problem of wireless
network bandwidth among multiple users, which achieves high throughput
of processing the streaming data. In [20], Zhao et al. have proposed a low-
complexity heuristic method to implement energy-efficient task offloading
for multi-user mobile cloud computing. In [21], an iterative algorithm has
been proposed to perform the joint optimization of radio and computational
resources for multi-cell edge computing under the budget constraints of
latency and power. You et al. [22] have studied a centralized offloading
framework for a multi-user edge computing system based on TDMA and
OFDMA aiming to minimize the user’s
energy consumption. Guo et al. [23] have provided an energy-efficient
dynamic offloading and resource scheduling policy to reduce energy
consumption and shorten application completion time. The consensus protocol
in blockchain networks is a computation-intensive process, so
computationally lightweight nodes such as mobile devices may be
prevented from directly participating in the consensus process. Xiong et al.
[24] have proposed a mining-task offloading approach to alleviate this
limitation. However, all the above works need centralized control, ignoring the
interactions among multiple users when they independently determine their
computation offloading strategies. Some recent works [7, 8, 25–28] have
modeled users as self-interested game players and proposed decentralized
schemes to solve the multi-user computation offloading problems.
However, they mainly focus on computation offloading problems in a
relatively static environment. In a real network environment, due to
time-varying wireless networks, the utility of each user changes dynamically,
and thus the Nash equilibrium of the static game model may
not be reached. In [29], the authors take into account the time-variant wireless
network, and model the computation offloading game as a stochastic game.
They assume that dedicated edge computing resources are allocated to each
user, so that users do not need to compete for computational resources at the
edge. However, this strong assumption is not practical in real computational
environment, and would lead to low utilization of edge computation
resources. Xiao et al. [30] have formulated the multi-user computation
offloading problem in time-variant wireless networks, where each user needs to
compete for computational resources. A Q-learning based approach has been
proposed to achieve the Nash equilibrium of the dynamic computation
offloading game. However, the users’ decision space is discrete in their
model and the proposed approach has high complexity in solving large-scale
problems. It is challenging to achieve Nash equilibrium in stochastic
games in a decentralized and dynamic environment. Multi-agent Nash Q-Learning
[31] has been proposed for discrete stochastic games. Lillicrap et
al. [32] have proposed the DDPG approach for the multi-agent Markov
decision process, where the environment is fully observable. In [33],
Srinivasan et al. have modeled the games as a partially observable Markov
decision process (POMDP), and examined the role of current policy gradient
and actor-critic algorithms. However, they focused on adversarial games. In
contrast to the previous research, our work in this paper formally addresses
the problems of partial computation offloading, dynamic environments, and
incomplete information sharing in edge computing. This is a non-trivial
problem because each user can obtain only partial observations and thus
cannot derive the optimal decision.

3. PROBLEM ANALYSIS
3.1 EXISTING APPROACH:
Based on this situation, a natural idea is to apply machine learning-based
methods that use existing experience and knowledge to perform static code
analysis on unknown binary code and automatically classify malware.
Following this idea, this paper uses
machine learning-based techniques and explores their application
to the classification of malware.
3.1.1 Drawbacks
• Basic knowledge is needed to perform static code analysis on
unknown binary code and to classify malware automatically.

3.2 Proposed System
In this paper, we mainly focus on static code analysis. Early static
code analysis methods mainly include feature matching and broad-spectrum
signature scanning. Feature matching simply uses feature
string matching to complete the detection, while broad-spectrum
scanning scans for the feature code and uses masked bytes to separate the
sections that need to be compared from those that do not.
Since both methods need to obtain malware samples and extract
features before they can detect anything, detection lags seriously behind
new malware (the hysteresis problem).
Furthermore, with the development of malware technology, malware
has begun to mutate during transmission in order to avoid being
detected and removed, and the number of
malware variants has increased sharply. The variants differ so much in form that it is
difficult to extract a piece of code as a malware signature.
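
The difference between plain feature string matching and broad-spectrum scanning with masked bytes can be illustrated with a short Python sketch. It is a toy example under stated assumptions rather than the scanner used in this project: the `??` wildcard notation, the function names, and the byte patterns are all made up for illustration.

```python
def parse_signature(text):
    """Turn "4D 5A ?? ?? 50 45" into [0x4D, 0x5A, None, None, 0x50, 0x45].

    "??" marks a masked byte that is not compared (assumed notation).
    """
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def find_signature(data, signature):
    """Return the offset of the first match in `data`, or -1 if none.

    Masked positions (None) match any byte; a signature with no masked
    bytes behaves like plain feature string matching.
    """
    n, m = len(data), len(signature)
    for start in range(n - m + 1):
        if all(sig is None or data[start + k] == sig
               for k, sig in enumerate(signature)):
            return start
    return -1


exact = parse_signature("4D 5A AB CD 50 45")   # plain feature string
masked = parse_signature("4D 5A ?? ?? 50 45")  # broad-spectrum signature

sample = bytes.fromhex("00114d5aabcd504599")   # original sample
variant = bytes.fromhex("00114d5a1234504599")  # variant: two bytes changed

print(find_signature(sample, exact))    # the exact string is found
print(find_signature(variant, exact))   # the exact string misses the variant
print(find_signature(variant, masked))  # the masked signature still matches
```

The exact signature misses the variant once two bytes change, while the masked signature still matches. Both approaches nevertheless depend on a previously extracted signature, which is the limitation described above.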

3.2.1 Advantages

• Feature matching simply uses feature string matching to complete the detection, while
broad-spectrum scanning uses masked bytes to separate the sections that need to be compared from those that do not.

3.3 Software And Hardware Requirements

SOFTWARE REQUIREMENTS
The functional requirements or the overall description documents
include the product perspective and features, operating system and operating
environment, graphics requirements, design constraints and user
documentation.
The appropriation of requirements and implementation constraints
gives the general overview of the project in regards to what the areas of
strength and deficit are and how to tackle them.

• Python IDLE 3.7 (or)
• Anaconda 3.7 (or)
• Jupyter (or)
• Google Colab

HARDWARE REQUIREMENTS
Minimum hardware requirements are very dependent on the particular
software being developed by a given Enthought Python / Canopy / VS Code
user. Applications that need to store large arrays/objects in memory will
require more RAM, whereas applications that need to perform numerous
calculations or tasks more quickly will require a faster processor.
• Operating system : Windows or Linux
• Processor : Intel i3 (minimum)
• RAM : 4 GB (minimum)
• Hard disk : 250 GB (minimum)

3.4 About Dataset

3.5 Algorithms
XGBoost, LightGBM and Random Forests

4. SYSTEM DESIGN

UML DIAGRAMS
The System Design Document describes the system requirements, operating
environment, system and subsystem architecture, files and database design,
input formats, output layouts, human-machine interfaces, detailed design,
processing logic, and external interfaces.
Global Use Case Diagrams:
Identification of actors:
Actor: Actor represents the role a user plays with respect to the system.
An actor interacts with, but has no control over the use cases.
Graphical representation:

<<Actor name>>
Actor

An actor is someone or something that:

• Interacts with or uses the system.
• Provides input to and receives information from the system.
• Is external to the system and has no control over the use cases.
Actors are discovered by examining:
• Who directly uses the system?
• Who is responsible for maintaining the system?
• External hardware used by the system.
• Other systems that need to interact with the system.
Questions to identify actors:
o Who is using the system? Or, who is affected by the system? Or, which
groups need help from the system to perform a task?
o Who affects the system? Or, which user groups are needed by the system
to perform its functions? These functions can be both main functions
and secondary functions such as administration.
o Which external hardware or systems (if any) use the system to perform
tasks?
o What problems does this application solve (that is, for whom)?
o And, finally, how do users use the system (use case)? What are they
doing with the system?
The actors identified in this system are:
a. System Administrator
b. Customer
c. Customer Care
Identification of usecases:
Usecase: A use case can be described as a specific way of using the
system from a user’s (actor’s) perspective.
Graphical representation:

A more detailed description might characterize a use case as:

• Pattern of behavior the system exhibits
• A sequence of related transactions performed by an actor and the
system
• Delivering something of value to the actor
Use cases provide a means to:
• capture system requirements
• communicate with the end users and domain experts
• test the system
Use cases are best discovered by examining the actors and defining what
the actor will be able to do with the system.
Guidelines for identifying use cases:
• For each actor, find the tasks and functions that the actor should be able
to perform or that the system needs the actor to perform. The use case should
represent a course of events that leads to a clear goal.
• Name the use cases.
• Describe the use cases briefly by applying terms with which the user is
familiar. This makes the description less ambiguous.
Questions to identify use cases:
• What are the tasks of each actor?
• Will any actor create, store, change, remove or read information in the
system?
• What use case will store, change, remove or read this information?
• Will any actor need to inform the system about sudden external
changes?
• Does any actor need to be informed about certain occurrences in the system?
• What use cases will support and maintain the system?
Flow of Events
A flow of events is a sequence of transactions (or events) performed by the
system. They typically contain very detailed information, written in terms
of what the system should do, not how the system accomplishes the task.
Flows of events are created as separate files or documents in your favorite
text editor and then attached or linked to a use case using the Files tab of a
model element.
A flow of events should include:
• When and how the use case starts and ends
• Use case/actor interactions
• Data needed by the use case
• Normal sequence of events for the use case
• Alternate or exceptional flows
Construction of Usecase diagrams:
Use-case diagrams graphically depict system behavior (use cases). These
diagrams present a high level view of how the system is used as viewed from
an outsider’s (actor’s) perspective. A use-case diagram may depict all or
some of the use cases of a system.
A use-case diagram can contain:

• Actors ("things" outside the system)
• Use cases (system boundaries identifying what the system should do)
• Interactions or relationships between actors and use cases in the system,
including the associations, dependencies, and generalizations.
Relationships in use cases:
1. Communication:
The communication relationship of an actor in a usecase is shown by
connecting the actor symbol to the usecase symbol with a solid path. The
actor is said to communicate with the usecase.
2. Uses:
A uses relationship between use cases is shown by a generalization
arrow from the use case.
3. Extends:
The extends relationship is used when one use case is similar to
another use case but does a bit more. In essence, it is like a subclass.
SEQUENCE DIAGRAMS
A sequence diagram is a graphical view of a scenario that shows object
interaction in a time-based sequence: what happens first, what happens
next. Sequence diagrams establish the roles of objects and help provide
essential information to determine class responsibilities and interfaces.
The main difference between sequence and collaboration diagrams is that
sequence diagrams show time-based object interaction while
collaboration diagrams show how objects associate with each other. A
sequence diagram has two dimensions: typically, vertical placement
represents time and horizontal placement represents different objects.
Object:
An object has state, behavior, and identity. The structure and behavior of
similar objects are defined in their common class. Each object in a diagram
indicates some instance of a class. An object that is not named is referred to
as a class instance.
The object icon is similar to a class icon except that the name is
underlined. An object's concurrency is defined by the concurrency of its
class.
Message:
A message is the communication carried between two objects that triggers
an event. A message carries information from the source focus of control
to the destination focus of control. The synchronization of a message can
be modified through the message specification. A synchronous message is
one where the sending object pauses to wait for results.
Link:
A link should exist between two objects, including class utilities, only if
there is a relationship between their corresponding classes. The existence
of a relationship between two classes symbolizes a path of communication
between instances of the classes: one object may send messages to another.
The link is depicted as a straight line between objects or objects and class
instances in a collaboration diagram. If an object links to itself, use the
loop version of the icon.

CLASS DIAGRAM:
Identification of analysis classes:
A class is a set of objects that share a common structure and common
behavior (the same attributes, operations, relationships and semantics). A
class is an abstraction of real-world items. There are 4 approaches for
identifying classes:
a. Noun phrase approach
b. Common class pattern approach
c. Use case driven sequence or collaboration approach
d. Classes, Responsibilities and Collaborators (CRC) approach
1. Noun Phrase Approach:
The guidelines for identifying the classes:
• Look for nouns and noun phrases in the usecases.
• Some classes are implicit or taken from general knowledge.
• All classes must make sense in the application domain. Avoid
computer implementation classes; defer them to the design stage.
• Carefully choose and define the class names.
After identifying the classes we have to eliminate the following types of
classes:
• Adjective classes.
2. Common class pattern approach:
The following are the patterns for finding the candidate classes:
• Concept class.
• Events class.
• Organization class
• People class
• Places class
• Tangible things and devices class.
3. Use case driven approach:
We have to draw the sequence diagram or collaboration diagram. If some
functionality is not covered by the existing classes, add new classes
that perform it.
4. CRC approach:
The process consists of the following steps:
• Identify classes’ responsibilities (and identify the classes)
• Assign the responsibilities
• Identify the collaborators.
Identification of responsibilities of each class:
The questions that should be answered to identify the attributes and methods
of a class respectively are:
a. What information about an object should we keep track of?
b. What services must a class provide?
Identification of relationships among the classes:
Three types of relationships among the objects are:
Association: How are objects associated?
Super-sub structure: How are objects organized into super classes and sub
classes?
Aggregation: What is the composition of the complex classes?
Association:
The questions that will help us to identify the associations are:
a. Is the class capable of fulfilling the required task by itself?
b. If not, what does it need?
c. From what other classes can it acquire what it needs?
Guidelines for identifying the tentative associations:
• A dependency between two or more classes may be an association.
Association often corresponds to a verb or prepositional phrase.
• A reference from one class to another is an association.
• Some associations are implicit or taken from general knowledge.
Some common association patterns are:
Location association, like part of, next to, contained in, ...
Communication association, like talk to, order to, ...
We have to eliminate unnecessary associations such as implementation
associations, ternary or n-ary associations and derived associations.
Super-sub class relationships:
Super-sub class hierarchy is a relationship between classes where one class
is the parent class of another class (derived class). This is based on
inheritance.
Guidelines for identifying the super-sub relationship, a generalization, are:
1. Top-down:
Look for noun phrases composed of various adjectives in a class name.
Avoid excessive refinement. Specialize only when the sub classes have
significant behavior.
2. Bottom-up:
Look for classes with similar attributes or methods. Group them by
moving the common attributes and methods to an abstract class. You may
have to alter the definitions a bit.
3. Reusability:
Move the attributes and methods as high as possible in the hierarchy.
4. Multiple inheritance:
Avoid excessive use of multiple inheritance. One way of getting the benefits
of multiple inheritance is to inherit from the most appropriate class and add
an object of another class as an attribute.
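The last guideline can be illustrated with a short Python sketch. The class names (Detector, Timer, ThresholdDetector) and the toy prediction rule are hypothetical and are not taken from the project code; the point is only that ThresholdDetector inherits from the most appropriate class and holds a Timer object as an attribute instead of inheriting from both.

```python
import time


class Detector:
    """Hypothetical base class: anything that can label a sample."""

    def predict(self, sample):
        raise NotImplementedError


class Timer:
    """Hypothetical helper class that measures elapsed wall-clock time."""

    def __init__(self):
        self.elapsed = 0.0

    def measure(self, func, *args):
        start = time.perf_counter()
        result = func(*args)
        self.elapsed = time.perf_counter() - start
        return result


class ThresholdDetector(Detector):
    """Inherits from Detector only; timing is obtained by holding a Timer."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.timer = Timer()  # composition instead of a second base class

    def predict(self, sample):
        # Toy rule: flag the sample when the sum of its features exceeds the threshold.
        return 1 if sum(sample) > self.threshold else 0

    def timed_predict(self, sample):
        return self.timer.measure(self.predict, sample)


detector = ThresholdDetector(threshold=100)
label = detector.timed_predict([16, 6, 324, 0])
```

With this arrangement the timing behaviour can be replaced or removed without touching the class hierarchy.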
Aggregation or a-part-of relationship:
It represents the situation where a class consists of several component
classes. A class that is composed of other classes doesn’t behave like its
parts. It behaves very differently. The major properties of this relationship
are transitivity and antisymmetry.
The questions whose answers will determine the distinction between the
part and whole relationships are:
• Does the part class belong to the problem domain?
• Is the part class within the system’s responsibilities?
• Does the part class capture more than a single value? (If not, then
simply include it as an attribute of the whole class.)
• Does it provide a useful abstraction in dealing with the problem
domain?
There are three types of aggregation relationships. They are:
Assembly:
It is constructed from its parts and an assembly-part situation physically
exists.
Container:
A physical whole encompasses but is not constructed from physical parts.
Collection member:
A conceptual whole encompasses parts that may be physical or conceptual.
The container and collection are represented by hollow diamonds, but
composition is represented by a solid diamond.

USE CASE DIAGRAM
A use case diagram in the Unified Modeling Language (UML) is a
type of behavioral diagram defined by and created from a Use-case analysis.
Its purpose is to present a graphical overview of the functionality provided
by a system in terms of actors, their goals (represented as use cases), and
any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor.
Roles of the actors in the system can be depicted.

Fig 1: Use Case Diagram (actor: User; use cases: Start, Upload data and import, Data Processing, Train And Test Model, Run Algorithm, Accuracy Graph, Exit)
CLASS DIAGRAM
In software engineering, a class diagram in the Unified
Modeling Language (UML) is a type of static structure diagram that
describes the structure of a system by showing the system's classes, their
attributes, operations (or methods), and the relationships among the classes.
It explains which class contains information.

Fig 2: Class Diagram
SEQUENCE DIAGRAM

A sequence diagram in Unified Modeling Language (UML) is a kind
of interaction diagram that shows how processes operate with one another
and in what order. It is a construct of a Message Sequence Chart. Sequence
diagrams are sometimes called event diagrams, event scenarios, and timing
diagrams.

Fig 3: Sequence Diagram (lifelines: User, System; messages: Upload data and import, Data Processing, Train And Test Model, Run Algorithm, Accuracy Graph)
5.IMPLEMENTATION

5.1 FLOW CHART:

1.3 Architecture

5.4 Code

import time
import tkinter
from tkinter import *
from tkinter import filedialog

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from genetic_selection import GeneticSelectionCV
from keras.models import Sequential
from keras.layers import Dense
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

main = tkinter.Tk()
main.title("Android Malware Detection")
main.geometry("1300x1200")

# Shared state, filled in by the button callbacks below.
filename = None
X_train = X_test = y_train = y_test = None
svm_acc = nn_acc = svmga_acc = annga_acc = None
svm_time = svmga_time = nn_time = nnga_time = None
svmga_classifier = None

def upload():
    """Ask the user for a dataset file and remember its path."""
    global filename
    filename = filedialog.askopenfilename(initialdir="dataset")
    pathlabel.config(text=filename)
    text.delete('1.0', END)
    text.insert(END, filename + " loaded\n")

def generateModel():
    """Load the CSV file and split it 80/20 into training and test sets."""
    global X_train, X_test, y_train, y_test
    text.delete('1.0', END)
    train = pd.read_csv(filename)
    features = train.shape[1] - 1  # the last column holds the class label
    X = train.values[:, 0:features]
    Y = train.values[:, features]
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.2, random_state=0)

    text.insert(END, "Dataset Length : " + str(len(X)) + "\n")
    text.insert(END, "Split Training Length : " + str(len(X_train)) + "\n")
    text.insert(END, "Split Test Length : " + str(len(X_test)) + "\n\n")

def prediction(X_test, cls):
    """Return the classifier's predictions for X_test, printing each one."""
    y_pred = cls.predict(X_test)
    for i in range(len(X_test)):
        print("X=%s, Predicted=%s" % (X_test[i], y_pred[i]))
    return y_pred


def cal_accuracy(y_test, y_pred, details):
    """Show accuracy, report and confusion matrix; return accuracy in percent."""
    cm = confusion_matrix(y_test, y_pred)
    accuracy = accuracy_score(y_test, y_pred) * 100
    text.insert(END, details + "\n\n")
    text.insert(END, "Accuracy : " + str(accuracy) + "\n\n")
    text.insert(END, "Report : " + str(classification_report(y_test, y_pred)) + "\n")
    text.insert(END, "Confusion Matrix : " + str(cm) + "\n\n\n\n\n")
    return accuracy

def runSVM():
    """Train an RBF-kernel SVM on all features; record accuracy and run time."""
    global svm_acc, svm_time
    start_time = time.time()
    text.delete('1.0', END)
    cls = svm.SVC(C=2.0, gamma='scale', kernel='rbf', random_state=2)
    cls.fit(X_train, y_train)
    prediction_data = prediction(X_test, cls)
    svm_acc = cal_accuracy(y_test, prediction_data, 'SVM Accuracy')
    svm_time = time.time() - start_time

def runSVMGenetic():
    """Train an SVM on features selected by a genetic algorithm."""
    global svmga_acc, svmga_classifier, svmga_time
    text.delete('1.0', END)
    estimator = svm.SVC(C=2.0, gamma='scale', kernel='rbf', random_state=2)
    svmga_classifier = GeneticSelectionCV(estimator,
                                          cv=5,
                                          verbose=1,
                                          scoring="accuracy",
                                          max_features=5,
                                          n_population=50,
                                          crossover_proba=0.5,
                                          mutation_proba=0.2,
                                          n_generations=40,
                                          crossover_independent_proba=0.5,
                                          mutation_independent_proba=0.05,
                                          tournament_size=3,
                                          n_gen_no_change=10,
                                          caching=True,
                                          n_jobs=-1)
    start_time = time.time()
    svmga_classifier = svmga_classifier.fit(X_train, y_train)
    prediction_data = prediction(X_test, svmga_classifier)
    svmga_acc = cal_accuracy(
        y_test, prediction_data,
        'SVM with GA Algorithm Accuracy, Classification Report & Confusion Matrix')
    # Measured wall-clock time for feature selection, training and prediction.
    svmga_time = time.time() - start_time

def runNN():
    """Train a feed-forward network on all features; record accuracy and run time."""
    global nn_acc, nn_time
    text.delete('1.0', END)
    start_time = time.time()
    n_features = X_train.shape[1]  # 215 for the dataset used in this project
    model = Sequential()
    model.add(Dense(4, input_dim=n_features, activation='relu'))
    model.add(Dense(n_features, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=50, batch_size=64)
    _, ann_acc = model.evaluate(X_test, y_test)
    nn_acc = ann_acc * 100
    text.insert(END, "ANN Accuracy : " + str(nn_acc) + "\n\n")
    nn_time = time.time() - start_time

def runNNGenetic():
    """Train the network on a reduced feature set; record accuracy and run time."""
    global annga_acc, nnga_time
    text.delete('1.0', END)
    train = pd.read_csv(filename)
    features = train.shape[1] - 1  # the last column holds the class label
    # Reduced feature set: the first 100 feature columns of the dataset.
    X = train.values[:, 0:100]
    Y = train.values[:, features]
    X_train1, X_test1, y_train1, y_test1 = train_test_split(
        X, Y, test_size=0.2, random_state=0)
    model = Sequential()
    model.add(Dense(4, input_dim=100, activation='relu'))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    start_time = time.time()
    model.fit(X_train1, y_train1)  # Keras defaults: one epoch, batch size 32
    nnga_time = time.time() - start_time
    _, ann_acc = model.evaluate(X_test1, y_test1)
    annga_acc = ann_acc * 100
    text.insert(END, "ANN with Genetic Algorithm Accuracy : " + str(annga_acc) + "\n\n")

def graph():
    """Bar chart comparing the accuracy (in percent) of the four models."""
    height = [svm_acc, nn_acc, svmga_acc, annga_acc]
    bars = ('SVM Accuracy', 'NN Accuracy', 'SVM Genetic Acc', 'NN Genetic Acc')
    y_pos = np.arange(len(bars))
    plt.bar(y_pos, height)
    plt.xticks(y_pos, bars)
    plt.show()


def timeGraph():
    """Bar chart comparing the execution time (in seconds) of the four models."""
    height = [svm_time, svmga_time, nn_time, nnga_time]
    bars = ('SVM Time', 'SVM Genetic Time', 'NN Time', 'NN Genetic Time')
    y_pos = np.arange(len(bars))
    plt.bar(y_pos, height)
    plt.xticks(y_pos, bars)
    plt.show()

font = ('times', 16, 'bold')

title = Label(main, text='Android Malware Detection Using Genetic Algorithm based '
                         'Optimized Feature Selection and Machine Learning')
title.config(font=font)
title.config(height=3, width=120)
title.place(x=0, y=5)

font1 = ('times', 14, 'bold')

# First row of controls: dataset upload and the selected file path.
uploadButton = Button(main, text="Upload Android Malware Dataset", command=upload)
uploadButton.place(x=50, y=100)
uploadButton.config(font=font1)

pathlabel = Label(main)
pathlabel.config(bg='brown', fg='white')
pathlabel.config(font=font1)
pathlabel.place(x=460, y=100)

# Second row: train/test split and the individual algorithms.
generateButton = Button(main, text="Generate Train & Test Model", command=generateModel)
generateButton.place(x=50, y=150)
generateButton.config(font=font1)

svmButton = Button(main, text="Run SVM Algorithm", command=runSVM)
svmButton.place(x=330, y=150)
svmButton.config(font=font1)

svmgaButton = Button(main, text="Run SVM with Genetic Algorithm", command=runSVMGenetic)
svmgaButton.place(x=540, y=150)
svmgaButton.config(font=font1)

nnButton = Button(main, text="Run Neural Network Algorithm", command=runNN)
nnButton.place(x=870, y=150)
nnButton.config(font=font1)

# Third row: GA-based neural network and the two comparison graphs.
nngaButton = Button(main, text="Run Neural Network with Genetic Algorithm",
                    command=runNNGenetic)
nngaButton.place(x=50, y=200)
nngaButton.config(font=font1)

graphButton = Button(main, text="Accuracy Graph", command=graph)
graphButton.place(x=460, y=200)
graphButton.config(font=font1)

timeButton = Button(main, text="Execution Time Graph", command=timeGraph)
timeButton.place(x=650, y=200)
timeButton.config(font=font1)

font1 = ('times', 12, 'bold')

# Output area for dataset information, accuracy figures and reports.
text = Text(main, height=20, width=150)
scroll = Scrollbar(text)
text.configure(yscrollcommand=scroll.set)
text.place(x=10, y=250)
text.config(font=font1)
main.mainloop()
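
The listing trains the "genetic" neural network on the first 100 columns of the dataset. As a possible alternative, the feature subset chosen by the genetic search could be reused for the network. The sketch below assumes a boolean mask with one entry per feature, such as the support_ attribute that GeneticSelectionCV is expected to expose after fitting (an assumption that should be checked against the installed sklearn-genetic version). The helper name select_columns and the toy data are hypothetical.

```python
def select_columns(rows, mask):
    """Keep only the columns of each row whose mask entry is True."""
    if any(len(row) != len(mask) for row in rows):
        raise ValueError("every row needs one value per mask entry")
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


# Toy data: 3 training samples and 1 test sample with 5 features each
# (hypothetical values, not taken from the project dataset).
X_train = [[1, 0, 7, 0, 2],
           [0, 1, 3, 1, 0],
           [1, 1, 9, 0, 5]]
X_test = [[0, 0, 4, 1, 1]]

# Hypothetical mask; in the project it could come from svmga_classifier.support_.
mask = [True, False, True, False, True]

X_train_selected = select_columns(X_train, mask)
X_test_selected = select_columns(X_test, mask)  # same mask for train and test
n_inputs = sum(mask)                            # value to pass as input_dim
```

Applying the same mask to the training and the test data keeps the columns aligned, and n_inputs gives the input dimension of the first network layer.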

6.TESTING
6.1 SOFTWARE TESTING
Testing

Testing is the process of executing a program with the aim of finding errors.
For our software to perform well it should be as free of errors as possible.
Successful testing uncovers errors so that they can be fixed; it cannot prove
that no errors remain.

6.1.1 Types of Testing

1. White Box Testing
2. Black Box Testing
3. Unit Testing
4. Integration Testing
5. Alpha Testing
6. Beta Testing
7. Performance Testing, and so on

White Box Testing

A testing technique based on knowledge of the internal logic of an
application's code. It includes tests such as coverage of code statements,
branches, paths and conditions. It is performed by software developers.

Black Box Testing

A method of software testing that verifies the functionality of an application
without specific knowledge of the application's code or internal structure.
Tests are based on requirements and functionality.

Unit Testing

A software verification and validation method in which a programmer tests
whether individual units of source code are fit for use. It is usually
conducted by the development team.

Integration Testing

The phase in software testing in which individual software modules are
combined and tested as a group. It is usually conducted by testing teams.

Alpha Testing

Testing of a software product or system conducted at the developer's
site. Usually it is performed by the end users.

Beta Testing

Final testing before releasing the application for commercial purposes. It is
typically done by end users or others.

Performance Testing

Testing conducted to evaluate the compliance of a system or component with
specified performance requirements. It is usually conducted by the
performance engineer.

Black Box Testing

Black box testing is testing the functionality of an application without
knowing the details of its implementation, including internal program
structure, data structures etc. Test cases for black box testing are created
based on the requirement specifications. Therefore, it is also called
specification-based testing. Fig.4.1 represents black box testing:

Fig.4.1: Black Box Testing

When applied to machine learning models, black box testing means testing
the models without knowing internal details such as the features of the
model or the algorithm used to create it. The challenge, however, is to
verify the test outcome against expected values that are known beforehand.

Fig.4.2: Black Box Testing for Machine Learning algorithms

Fig.4.2 above represents the black box testing procedure for machine
learning algorithms.

Table.4.1: Black box Testing

Input | Actual Output | Predicted Output
[16,6,324,0,0,0,22,0,0,0,0,0,0] | 0 | 0
[16,7,263,7,0,2,700,9,10,1153,832,9,2] | 1 | 1

The model gives the correct output for the different inputs listed in
Table 4.1. Therefore the program is said to execute as expected for these
test cases.
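
The procedure behind Table 4.1 can be sketched as a small harness that treats the model purely as a function from an input vector to a label. The predict function below is a hypothetical stand-in (a toy rule, not the trained model from this project), and the name run_black_box_tests is also an assumption; only the two input vectors and their actual outputs are taken from Table 4.1.

```python
def predict(features):
    """Hypothetical stand-in for the trained model (toy rule, not the real one)."""
    return 1 if sum(features[3:]) > 100 else 0


# (input vector, actual output) pairs taken from Table 4.1.
test_cases = [
    ([16, 6, 324, 0, 0, 0, 22, 0, 0, 0, 0, 0, 0], 0),
    ([16, 7, 263, 7, 0, 2, 700, 9, 10, 1153, 832, 9, 2], 1),
]


def run_black_box_tests(model, cases):
    """Compare the model's prediction with the known label, without inspecting the model."""
    results = []
    for features, actual in cases:
        predicted = model(features)
        results.append((features, actual, predicted, predicted == actual))
    return results


results = run_black_box_tests(predict, test_cases)
for features, actual, predicted, passed in results:
    print("PASS" if passed else "FAIL", "actual", actual, "predicted", predicted)
```

Because the harness only calls the model, the same test cases can be reused unchanged when the stand-in is swapped for a trained classifier's predict method.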

Test Case Id | Test Case Name | Test Case Description | Test Step | Expected | Actual | Test Case Status | Test Priority
01 | Start the Application | Host the application and test if it starts, making sure the required software is available | If it doesn't start | We cannot run the application. | The application is hosted successfully. | High | High
02 | Home Page | Check the deployment environment for properly loading the application. | If it doesn’t load. | We cannot access the application. | The application is running successfully. | High | High
03 | User Mode | Verify the working of the application in freestyle mode | If it doesn’t respond | We cannot use the Freestyle mode. | The application displays the Freestyle Page | High | High
04 | Data Input | Verify if the application takes input and updates | If it fails to take the input or store in the Database | We cannot proceed further | The application updates the input to application | High | High
7.RESULTS AND DISCUSSIONS

When the application is run, the screen shown above opens.

1. Click on “Upload data and import”.

2. Upload the data. Basic information about the data is read and shown on the screen.

3. Click on preprocessing. Basic preprocessing is performed.

4. Click on “Train and Test model”. The data is split into a training set and a test set; the
training set is used to train the models and the test set is used to measure their performance.


5. Click on “Run Algorithms”. The listed algorithms are run on the data.

6. Accuracy comparison for all the models.

Naive Bayes, which was added as an extension, performed better than the other algorithms.
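
Step 4 above (splitting the data into training and test sets) can be sketched without any library. The project code uses scikit-learn's train_test_split with test_size = 0.2; the function split_rows below is a hypothetical, simplified stand-in that shuffles the rows with a fixed seed and holds out 20% of them, and the toy rows are not taken from the project dataset.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out the given fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = [[i, i % 2] for i in range(10)]  # toy rows: [feature, label]
train_rows, test_rows = split_rows(rows)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Fixing the seed makes the split repeatable, so the accuracy figures of different algorithms are measured on the same held-out rows.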

8.CONCLUSION
With the increasing complexity of malware code concealed in health sensor data [27-30, 38, 40],
the application of machine learning algorithms to the detection of malicious code has been
increasingly valued by the academic community and numerous security vendors. Based on the
theory of machine learning, this paper combines the advantages of different models [31-33, 36-37]
and discusses static code analysis based on different machine learning algorithms and
different code features. This work can provide a reference for the future design and
implementation of machine-learning-based malware detection technology [34]. However, this area
is still at a developmental stage.
FUTURE WORK
There are still many future tasks and challenges, which are summarized below.
1. Lack of valuable data: A machine learning algorithm often requires tens of thousands of
samples [35] to be trained in order to get an effective model. The acquisition of this basic
data often requires manual operations and the speed cannot be guaranteed [36, 37].
2. Lack of interpretable results: The internal reason is that for many features, we only know
that they are effective and do not know why. The interpretation of this issue will be the most
important challenge for the future.

8.BIBLIOGRAPHY

[1] L. Wu, X. Du, W. Wang, B. Lin, “An Out-of-band Authentication Scheme for Internet of
Things Using Blockchain Technology,” in Proc. of IEEE ICNC 2018, Maui, Hawaii, USA, March
2018.
[2] M. Shen, B. Ma, L. Zhu, R. Mijumbi, X. Du, and J. Hu, “Cloud-Based Approximate
Constrained Shortest Distance Queries over Encrypted Graphs with Privacy Protection”, IEEE
Transactions on Information Forensics & Security, Volume: 13, Issue: 4, Page(s): 940 – 953, April
2018, DOI: 10.1109/TIFS.2017.2774451.
[3] P. Dong, X. Du, H. Zhang, and T. Xu, “A Detection Method for a Novel DDoS Attack against
SDN Controllers by Vast New Low-Traffic Flows,” in Proc. of the IEEE ICC 2016, Kuala Lumpur,
Malaysia, 2016.
[4] Z. Tian, Y. Cui, L. An, S. Su, X. Yin, L. Yin and X. Cui. A Real-Time
Correlation of Host-Level Events in Cyber Range Service for Smart Campus. IEEE Access. vol.
6, pp. 35355-35364, 2018. DOI: 10.1109/ACCESS.2018.2846590.
[5] Q. Tan, Y. Gao, J. Shi, X. Wang, B. Fang, and Z. Tian. Towards a Comprehensive Insight into
the Eclipse Attacks of Tor Hidden Services. IEEE Internet of Things Journal. 2018. DOI:
10.1109/JIOT.2018.2846624.
[6] Z. Wang, C. Liu, J. Qiu, Z. Tian, C., Y. Dong, S. Su.
Automatically Traceback RDP-based Targeted Ransomware Attacks. Wireless Communications
and Mobile Computing. 2018. https://doi.org/10.1155/2018/7943586.
[7] L. Xiao, Y. Li, X. Huang, X. Du, “Cloud-based Malware Detection Game for Mobile Devices
with Offloading”, IEEE Transactions on Mobile Computing, Volume: 16, Issue: 10, Pages: 2742
– 2750, Oct. 2017. DOI: 10.1109/TMC.2017.2687918.
[8] https://en.wikipedia.org/wiki/Malware_analysis
[9] Z. Tian, W. Shi, Y. Wang, C. Zhu, X. Du, et al., “Real-Time Lateral Movement Detection
Based on Evidence Reasoning Network for Edge Computing Environment”, IEEE Transactions
on Industrial Informatics, Volume: 15, Issue: 7, Page(s): 4285 – 4294, March 2019.
[10] L. Xiao, X. Wan, C. Dai, X. Du, X. Chen, M. Guizani, “Security in mobile edge caching with
reinforcement learning”, IEEE Wireless Communications Volume: 25, Issue: 3, pp. 116-122, June
2018, DOI: 10.1109/MWC.2018.1700291.

13 pages
Seminar
No ratings yet
Seminar
15 pages
WINE QUALITY PREDICTION RESEARCH PAPER 22
No ratings yet
WINE QUALITY PREDICTION RESEARCH PAPER 22
6 pages
Predicting Sentiment of Comments To News On Reddit
No ratings yet
Predicting Sentiment of Comments To News On Reddit
81 pages
Unit - 2 Data Minig Notes
No ratings yet
Unit - 2 Data Minig Notes
15 pages
Text, Web and Social Media Analytics: SE Computer, Sem VIII Academic Year: 2023 - 24
No ratings yet
Text, Web and Social Media Analytics: SE Computer, Sem VIII Academic Year: 2023 - 24
36 pages
Austin 2004
No ratings yet
Austin 2004
9 pages
Article 4
No ratings yet
Article 4
7 pages
Telecommunication Customer Churn (New)
100% (1)
Telecommunication Customer Churn (New)
23 pages
Bayesian and surroagte
No ratings yet
Bayesian and surroagte
12 pages
A Comparative Study Between Feature Selection Algorithms - Ok
No ratings yet
A Comparative Study Between Feature Selection Algorithms - Ok
10 pages
What Is Feature Selection
No ratings yet
What Is Feature Selection
9 pages
Program
No ratings yet
Program
51 pages
Multi Modal Hate Speech Detection Using Machine Learning
100% (1)
Multi Modal Hate Speech Detection Using Machine Learning
5 pages
Using Data Mining To Predict Student Performance
No ratings yet
Using Data Mining To Predict Student Performance
12 pages
Download Full (Ebook) Applied Machine Learning Using mlr3 in R by Bernd Bischl, Raphael Sonabend, Lars Kotthoff, Michel Lang ISBN 9781032515670, 1032515678 PDF All Chapters
100% (2)
Download Full (Ebook) Applied Machine Learning Using mlr3 in R by Bernd Bischl, Raphael Sonabend, Lars Kotthoff, Michel Lang ISBN 9781032515670, 1032515678 PDF All Chapters
81 pages
Glioblastoma and Primary Central Nervous System Lymphoma: Differentiation Using MRI Derived First-Order Texture Analysis - A Machine Learning Study
No ratings yet
Glioblastoma and Primary Central Nervous System Lymphoma: Differentiation Using MRI Derived First-Order Texture Analysis - A Machine Learning Study
9 pages