
Cyber-threat

The document presents a synopsis on exploring open source information for cyber threat intelligence, focusing on the use of machine learning techniques to detect and classify cyber threats from Twitter data. It outlines the goals, objectives, and features of the proposed system, which utilizes various algorithms such as SVM, Decision Trees, Naive Bayes, Random Forest, and Artificial Neural Networks for analysis. The project aims to provide meaningful insights into new patterns of cyber-attacks and security threats through the analysis of social media data.

DEPARTMENT OF COMPUTER ENGINEERING

G.H.Raisoni College Of Engineering & Management, Pune


2021-2022

SYNOPSIS ON

EXPLORING OPEN SOURCE INFORMATION FOR CYBER THREAT INTELLIGENCE
Submitted in partial fulfilment of the requirements for the S.Y. M.Tech in

COMPUTER ENGINEERING
Submitted By:

Aishwarya Bhagwat (2020ACRE2101001)

Sign of Guide
Abstract
Cyberspace is one of the most complicated systems ever created by humanity; many people use
cyber-technology resources on a daily basis, yet most of them have little understanding of it. The use of
social media cannot replace the need for security experts to conduct in-depth analyses of specific kinds
of attacks, such as detecting anomalies in network traffic, worms, and port scans. Analysing social media
data, on the other hand, can help discover new patterns of cyber threats and security threats, including
data theft, carding, and hijacking. In the proposed system, we use machine learning to predict cyber
threats. Models are trained on a Twitter cyber-threat dataset using the SVM, NB, DT, RF and ANN
algorithms. The best-performing model is then used to predict whether a tweet describes a cyber threat
and which category it belongs to.

Goals and Objectives


• To detect cyber threats using machine learning techniques.
• To train and classify a Twitter cyber-threat dataset using different machine learning
algorithms.
• To analyse social media data for meaningful insights into new patterns of cyber-attacks
and security threats such as data breaches, carding, and hijacking.
• To filter tweets by keywords that include the usernames of selected cybersecurity
organizations and a list of buzzwords related to cybersecurity terms (‘ciphertext’,
‘cryptography’, ‘hacked’, ‘breach’, ‘sniffer’, ‘firewall’, ‘hijacking’, ‘clickjacking’,
‘malware’, ‘spear phishing’, ‘virus’, and ‘vulnerability’) from cybersecurity domain experts.

Features of System:
• Preparing the dataset
• Data pre-processing
• Feature extraction
• Classification using machine learning algorithms
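The pre-processing and feature-extraction steps listed above could look roughly like the following sketch. The synopsis does not specify the cleaning rules or the feature representation, so the `clean_tweet` helper, the example tweets, and the choice of TF-IDF features are all assumptions made for illustration.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer


def clean_tweet(text: str) -> str:
    """Lowercase a tweet, drop URLs, and keep only letters (hypothetical rules)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # replace digits and punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


# Made-up tweets used only to exercise the steps.
raw_tweets = [
    "New #malware campaign spotted! Details: https://example.com/report",
    "@vendor patched the firewall vulnerability today",
    "Lovely weather for a walk in the park",
]

# Data pre-processing
cleaned = [clean_tweet(t) for t in raw_tweets]

# Feature extraction: one TF-IDF weighted row per tweet (assumed representation)
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(cleaned)

print(cleaned)
print(features.shape)
```

The resulting sparse matrix can be passed directly to any of the scikit-learn classifiers named in this synopsis.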

Technologies and Tools

• Python
• Scikit-learn
• Pandas
• SVM (Support Vector Machine)
• DT (Decision Tree)
• NB (Naive Bayes)
• RF (Random Forest)
• ANN (Artificial Neural Network)

Cyber Threat Intelligence project


The project is divided into three parts:

1. Creating the dataset
2. Training different ML models on the Twitter cyber-threat dataset
3. Predicting on tweets (displaying cyber-threat categories)
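The three parts above can be sketched end to end as follows. This is a minimal illustration only: the tweets, the category labels, the TF-IDF features, the 2-fold cross-validation, and all hyperparameters are assumptions and are not taken from the project's actual dataset or configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Part 1: a tiny, made-up dataset (hypothetical tweets and categories).
tweets = [
    "customer database leaked after massive data breach",
    "breach exposes millions of user passwords",
    "hackers leaked stolen records from the breach",
    "company confirms data breach of customer records",
    "new malware spreads through email attachments",
    "trojan malware infects android phones",
    "ransomware malware locks hospital computers",
    "virus and malware found in pirated software",
    "enjoying coffee and a good book today",
    "great football match last night",
    "traffic was terrible this morning",
    "trying a new pasta recipe tonight",
]
labels = ["data breach"] * 4 + ["malware"] * 4 + ["not a threat"] * 4

# Part 2: train the five model families and compare them.
models = {
    "SVM": SVC(kernel="linear"),
    "NB": MultinomialNB(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}
scores = {
    name: cross_val_score(
        make_pipeline(TfidfVectorizer(), model), tweets, labels, cv=2
    ).mean()
    for name, model in models.items()
}
best_name = max(scores, key=scores.get)

# Part 3: refit the best model on all data and predict a category for a new tweet.
best_model = make_pipeline(TfidfVectorizer(), models[best_name]).fit(tweets, labels)
prediction = best_model.predict(["trojan malware infects hospital computers"])[0]

print(scores)
print(best_name, "->", prediction)
```

With so few examples the scores are not meaningful; the point is only the shape of the workflow, in which the model with the highest mean accuracy is kept for prediction.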

Dataset:-
In the proposed system, we collected a Twitter dataset related to cyber threats from the Kaggle website.
For this dataset, a list of keywords was selected to filter the tweets retrieved from the stream listener.
These keywords include the usernames of selected cybersecurity organizations and a list of buzzwords
related to cybersecurity terms (‘ciphertext’, ‘cryptography’, ‘hacked’, ‘breach’, ‘sniffer’,
‘firewall’, ‘hijacking’, ‘clickjacking’, ‘malware’, ‘spear phishing’, ‘virus’, and ‘vulnerability’)
from cybersecurity domain experts.
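A keyword filter of the kind described above might be sketched as follows. The buzzword list comes from this synopsis, while the `matches_filter` function, the whole-word matching rule, and the `example_cert` username are hypothetical, since the actual organization usernames and matching logic are not given.

```python
import re

# Buzzwords listed in the synopsis (lowercased, with spelling normalised).
KEYWORDS = [
    "ciphertext", "cryptography", "hacked", "breach", "sniffer", "firewall",
    "hijacking", "clickjacking", "malware", "spear phishing", "virus",
    "vulnerability",
]

# Hypothetical placeholder; the real organization usernames are not given.
ORG_USERNAMES = {"example_cert"}

# Match keywords as whole words so that "antivirus" does not count as "virus".
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b"
)


def matches_filter(tweet_text: str, author: str = "") -> bool:
    """Return True if the tweet is from a tracked account or contains a keyword."""
    if author.lower() in ORG_USERNAMES:
        return True
    return KEYWORD_PATTERN.search(tweet_text.lower()) is not None


print(matches_filter("Our site got hacked last night"))
print(matches_filter("Nice weather today"))
```

Whether partial matches such as "antivirus" should count is a design choice; a plain substring test would accept them.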
Algorithm -
SVM (Support Vector Machine) :-
Support Vector Machine (SVM) is a supervised machine learning approach that is suitable for both
classification and regression problems, although it is used mostly for classification. In the SVM
algorithm, each data item is plotted as a point in n-dimensional space (where n is the number of
features), with the value of each feature being the value of a particular coordinate. Support vectors are
simply the coordinates of individual observations. The SVM classifier is the boundary (a hyperplane or
line) between the two classes, and classification is performed by finding the hyperplane that separates
the classes best.
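To make the geometry concrete, the sketch below fits a linear SVM to six made-up points in two dimensions and reads back the support vectors and the separating hyperplane w·x + b = 0. The points, labels, and the `C` value are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Six hypothetical observations with n = 2 features each.
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The hyperplane is w . x + b = 0; the side a point falls on decides its class.
w, b = clf.coef_[0], clf.intercept_[0]
new_points = np.array([[1.5, 1.5], [4.5, 4.5]])
sides = np.sign(new_points @ w + b)
predicted = clf.predict(new_points)

print("support vectors:\n", clf.support_vectors_)
print("w =", w, "b =", b)
print("predicted classes:", predicted)
```

In scikit-learn, `support_vectors_` holds only the training points that determine the boundary, which are a subset of the observations.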
DT (Decision Tree):
The goal of using a Decision Tree is to create a training model that can be used to predict the class or
value of the target variable by learning simple decision rules inferred from prior data (training data). In
Decision Trees, to predict a class label for a record, we start from the root of the tree and follow the
branch that matches the record's attribute value at each node until a leaf is reached.

NB (Naive Bayes):
The number of parameters required by Naive Bayes classifiers is linear in the number of variables
(features/predictors) in a learning problem. Maximum-likelihood training can be done in linear time by
evaluating a closed-form expression, rather than by the time-consuming iterative approximation required
by many other types of classifiers.
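The closed-form training step can be shown by computing the per-class word probabilities by hand and comparing them with scikit-learn's `MultinomialNB`. This is a sketch under assumptions: the tweets and labels are made up, and the multinomial variant with Laplace smoothing (`alpha = 1.0`) is one common choice that the synopsis does not specify.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data.
tweets = [
    "malware infects phones",
    "new malware breach reported",
    "sunny day at the beach",
    "watching a movie tonight",
]
labels = np.array(["threat", "threat", "safe", "safe"])

X = CountVectorizer().fit_transform(tweets).toarray()
classes = np.unique(labels)
alpha = 1.0  # Laplace smoothing (assumed setting)

# Closed form: count each word per class, smooth, and normalise per class.
counts = np.array([X[labels == c].sum(axis=0) for c in classes])
smoothed = counts + alpha
manual_log_prob = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

nb = MultinomialNB(alpha=alpha).fit(X, labels)

# One parameter per (class, feature) pair: linear in the number of features.
print(nb.feature_log_prob_.shape)
print(np.allclose(manual_log_prob, nb.feature_log_prob_))
```

Training needs a single pass over the data to accumulate counts, with no iterative optimisation.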
RF (Random Forest):
Random forest is a supervised learning algorithm that is used for both classification and regression,
although it is mainly used for classification problems. Just as a forest is made up of trees and more
trees make a more robust forest, the random forest algorithm builds decision trees on data samples,
gets a prediction from each of them, and finally selects the best solution by means of voting. It is an
ensemble method that is better than a single decision tree because it reduces over-fitting by averaging
the results.
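The voting step can be illustrated by asking each tree in a fitted forest for its own prediction and tallying the results. The tweets, labels, and `n_estimators` value below are assumptions; note also that scikit-learn's `RandomForestClassifier` combines trees by averaging their predicted class probabilities, so the manual tally is a simplified view that usually, but not necessarily always, agrees with `forest.predict`.

```python
from collections import Counter

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training data.
tweets = [
    "database breach leaked passwords",
    "malware infects phones",
    "firewall vulnerability exploited",
    "sunny day at the beach",
    "watching a movie tonight",
    "cooking dinner with friends",
]
labels = ["threat", "threat", "threat", "safe", "safe", "safe"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(tweets).toarray()
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, labels)

new_x = vectorizer.transform(["malware breach reported"]).toarray()

# Each fitted tree returns an index into forest.classes_; tally these as votes.
votes = Counter(
    forest.classes_[int(tree.predict(new_x)[0])] for tree in forest.estimators_
)
majority = votes.most_common(1)[0][0]
forest_prediction = forest.predict(new_x)[0]

print(votes)
print("majority vote:", majority, "| forest.predict:", forest_prediction)
```

Each tree sees a different bootstrap sample of the data, which is why individual votes can disagree even on a small dataset.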
ANN (Artificial Neural Network):
An Artificial Neural Network is an information processing technique that works in a way similar to how
the human brain processes information. An ANN consists of a large number of connected processing units
that work together to process information and generate meaningful results from it.
An artificial neural network is typically organized in layers. Layers are made up of many
interconnected ‘nodes’, each of which contains an ‘activation function’. A neural network may contain
the following three layers: (a) input layer, (b) hidden layer, and (c) output layer.
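The three-layer structure can be seen in a small multilayer perceptron. The sketch below uses scikit-learn's `MLPClassifier` with one hidden layer; the tweets, the three categories, the hidden-layer size of 8, and the ReLU activation are all assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

# Hypothetical training data with three categories.
tweets = [
    "database breach leaked passwords",
    "company confirms data breach",
    "malware infects phones",
    "ransomware malware locks computers",
    "sunny day at the beach",
    "watching a movie tonight",
]
labels = ["data breach", "data breach", "malware", "malware",
          "not a threat", "not a threat"]

X = TfidfVectorizer().fit_transform(tweets)

mlp = MLPClassifier(
    hidden_layer_sizes=(8,),  # one hidden layer with 8 nodes
    activation="relu",        # activation function used by the hidden nodes
    max_iter=1000,
    random_state=0,
).fit(X, labels)

# coefs_[0] connects input -> hidden, coefs_[1] connects hidden -> output.
print("layers:", mlp.n_layers_)
print("input -> hidden weights:", mlp.coefs_[0].shape)
print("hidden -> output weights:", mlp.coefs_[1].shape)
print("output activation:", mlp.out_activation_)
```

The input layer has one node per text feature and the output layer has one node per category, so only the hidden layer's size is a free design choice here.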

