UNIT-4
IMPLEMENTATION OF IOT USING RASPBERRY PI
Introduction to SDN
SDN architecture
Difference between SDN and conventional architecture
Key elements of SDN
Challenges to implement SDN
SDN for IOT
HARDWARE IMPLEMENTATION:
The system that implements the Internet of Things includes clusters of hardware components that we are familiar with. Firstly, we need a host, such as a personal computer or a mobile phone, that can be used to pass commands to a remotely operable device. As the brain of the system we use a Raspberry Pi, which controls the devices and obtains the desired results from them. The "things" that we use here are basically day-to-day objects such as a bulb, a fan, a washing machine, etc. Our intention is to demonstrate the operation of the Internet of Things in a concise way.
SOFTWARE IMPLEMENTATION:
Hardware without proper software is nothing but a brick. When it comes to the Raspberry Pi, an OS must be installed to control and configure it. In the case of the daughter board, Python scripts are to be coded to work with the "things". A communications platform for IoT devices, one that enables device setup and user interaction from mobile devices and the web, can be used to accomplish communication between the host device and the Raspberry Pi.
Implementation of IoT with Raspberry Pi
Program 1
IoT: Remote Data Logging
System Overview:
1. A network of temperature and humidity sensors connected with a Raspberry Pi
2. Read data from the sensor
3. Send it to a server
4. Save the data in the server
Requirements (a minimal logging script follows the list):
DHT sensor
4.7K ohm resistor
Jumper wires
Raspberry Pi
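The following minimal Python sketch ties the four steps together. The sensor type (a DHT22 on GPIO pin 4), the legacy Adafruit_DHT library, and the server URL are illustrative assumptions; the notes do not fix these choices.

import time
import requests
import Adafruit_DHT

SENSOR = Adafruit_DHT.DHT22                      # assumption: use Adafruit_DHT.DHT11 for a DHT11
GPIO_PIN = 4                                     # BCM pin wired to the sensor's data line
SERVER_URL = "http://example.com/api/readings"   # hypothetical server endpoint

while True:
    # Step 2: read temperature and humidity (read_retry retries on failed reads)
    humidity, temperature = Adafruit_DHT.read_retry(SENSOR, GPIO_PIN)
    if humidity is not None and temperature is not None:
        # Steps 3 and 4: send the reading to the server, which saves it
        payload = {"temperature": temperature, "humidity": humidity,
                   "timestamp": time.time()}
        try:
            requests.post(SERVER_URL, json=payload, timeout=5)
        except requests.RequestException as err:
            print("Upload failed:", err)
    time.sleep(60)                               # log one reading per minute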
Program 2
Data Processing Technique
Introduction to SDN
Software-defined networking (SDN) is an
architecture designed to make a network more flexible and
easier to manage. SDN centralizes management by
abstracting the control plane from the data forwarding
function in the discrete networking devices.
SDN elements
An SDN architecture delivers a centralized, programmable network and
consists of the following:
A controller, the core element of an SDN architecture, that
enables centralized management and control, automation, and
policy enforcement across physical and virtual network
environments
Southbound APIs that relay information between the controller and
the individual network devices (such as switches, access points,
routers, and firewalls)
Northbound APIs that relay information between the controller and
the applications and policy engines, to which an SDN looks like a
single logical network device
Network Planes
1. Data Plane
The data plane forwards the actual user traffic, moving packets from one interface to another based on the forwarding tables.
2. Control Plane
The control plane decides how traffic should be forwarded, for example by building the routing and forwarding tables.
3. Management Plane
The management plane is used for access and management of our network devices, for example accessing a device through Telnet, SSH, or the console port.
When discussing SDN, the control and data planes are the most important to keep in mind.
SDN Architecture
SDN attempts to create a network architecture that is simple, inexpensive, scalable, agile, and easy to manage. In the SDN architecture, the control plane and the data plane are decoupled and the network controller is centralized. The SDN controller maintains a unified view of the network and makes configuration, management, and provisioning simpler. The underlying infrastructure in SDN uses simple packet-forwarding hardware.
Network devices become simple with SDN as they do not require implementations of a large number of protocols. Network devices receive instructions from the SDN controller on how to forward packets. These devices can be simpler and cheaper as they can be built from standard hardware and software components.
The SDN architecture separates the network into three distinguishable layers: applications communicate with the control layer using northbound APIs, and the control layer communicates with the data plane using southbound APIs. The control layer is considered the brain of SDN. The intelligence of this layer is provided by the centralized SDN controller software. This controller resides on a server and manages policies and the flow of traffic throughout the network. The physical switches in the network constitute the infrastructure layer.
First, the application layer hosts the SDN applications that express the desired network behavior. Second, the control plane, i.e., the SDN controller, translates these requirements and pushes the different types of rules and policies to the infrastructure layer through the southbound interface [1].
Third, the data plane, also known as the infrastructure layer, represents the
forwarding devices on the network (routers, switches, load balancers, etc.). It
uses the southbound APIs to interact with the control plane by receiving the
forwarding rules and policies to apply them to the corresponding devices [1].
Fourth, the northbound interfaces that permit communication between the
control layer and the management layer are mainly a set of open source
application programming interfaces (APIs) [1].
Fifth, the east-west interfaces, which are not yet standardized, allow communication between multiple controllers. They use a system of notification and messaging, or a distributed routing protocol such as BGP or OSPF.
Sixth, the southbound interfaces allow interaction between the control plane
and the data plane, which can be defined summarily as protocols that permit
the controller to push policies to the forwarding plane. The OpenFlow protocol
is the most widely accepted and implemented southbound API for SDN-
enabled networks.
OpenFlow is standardized by the Open Networking Foundation (ONF). OpenFlow is just one instantiation of SDN, not SDN itself.
An OpenFlow switch can:
(1) identify and categorize packets from an ingress port based on various packet header fields;
(2) process the packets in various ways, including modifying the header.
Each flow entry contains the match fields, counters, and a set of instructions to apply to matching packets (a minimal controller sketch follows).
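As an illustration, here is a minimal sketch of a controller installing one flow entry over the southbound interface. The choice of the Ryu framework, OpenFlow 1.3, the ports, and the priority are assumptions for illustration, not part of the notes. Run it with ryu-manager against an OpenFlow-enabled switch.

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

class FlowEntryDemo(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def on_switch_features(self, ev):
        dp = ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser
        # Match fields: classify packets arriving on ingress port 1
        match = parser.OFPMatch(in_port=1)
        # Instructions: apply an action that forwards matching packets out port 2
        actions = [parser.OFPActionOutput(2)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        # The flow entry pairs the match with the instructions; the switch
        # maintains per-entry packet/byte counters automatically
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=10,
                                      match=match, instructions=inst))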
A rule placement solution deals with which rules must be deployed in the network and on which switches. Related design challenges include controller placement and provisioning a backup controller.
Types of Data
Structured data
1. Data that can be easily organized.
2. Usually stored in relational databases.
3. Structured Query Language (SQL) manages structured data in databases (see the sketch below).
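As a minimal illustration of structured data managed with SQL, the sketch below uses Python's built-in sqlite3 module; the table and values are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory relational database
conn.execute("CREATE TABLE readings (sensor TEXT, temperature REAL)")
conn.execute("INSERT INTO readings VALUES (?, ?)", ("dht22", 24.5))

# Structured Query Language retrieves the organized rows
for row in conn.execute("SELECT sensor, temperature FROM readings"):
    print(row)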
Volume
Quantity of data that is generated.
Sources of data are added continuously.
Examples of volume:
1. 30 TB of images will be generated every night from the Large Synoptic Survey Telescope (LSST).
2. 72 hours of video are uploaded to YouTube every minute.
Velocity
* Refers to the speed of generation of data.
* Data processing time is decreasing day by day in order to provide real-time services.
* Older batch-processing technology is unable to handle the high velocity of data.
Examples of velocity:
1. 140 million tweets per day on average (according to a survey conducted in 2011).
2. The New York Stock Exchange captures 1 TB of trade information during each trading session.
Variety
* Refers to the category to which the data belongs.
* No restriction over the input data formats.
* Data is mostly unstructured or semi-structured.
Example of variety:
Pure text, images, audio, video, web, GPS data, sensor data, SMS, documents, PDFs, flash, etc.
Variability
* Refers to data whose meaning is constantly changing.
* Meaning of the data depends on the context.
* Data can appear as an indecipherable mass without structure.
Examples:
Language processing, hashtags, geo-spatial data, multimedia, sensor events.
Veracity
* Veracity refers to the biases, noise, and abnormality in data.
Flow of data
Enterprise data
Online trading and analysis data.
Production and inventory data.
Sales and other financial data.
IoT data
Data from industry, agriculture, traffic, and transportation.
Medical-care data.
Data from public departments and families.
Bio-medical data
Masses of data generated by gene sequencing.
Data from medical clinics and medical R&Ds.
Other fields
Fields such as computational biology, astronomy, nuclear research, etc.
Data Acquisition
Data collection
Data is collected from sources such as log files or record files that are automatically generated by data sources to record activities for further analysis; sensory data such as sound waves, voice, vibration, automobile, chemical, current, weather, pressure, and temperature readings; and complex and varied data collected through mobile devices, e.g., geographical location, 2D barcodes, pictures, and videos.
Data transmission
1. After collection, data is transferred to a storage system for further processing and analysis.
2. Data transmission can be categorized as inter-DCN transmission and intra-DCN transmission (DCN: data center network).
Data pre-processing
1. Collected datasets suffer from noise, redundancy, inconsistency, etc.; thus, pre-processing of data is necessary.
2. Pre-processing of relational data mainly involves integration, cleaning, and redundancy mitigation.
3. Integration combines data from various sources and provides users with a uniform view of the data.
4. Cleaning identifies inaccurate, incomplete, or unreasonable data, and then modifies or deletes such data.
5. Redundancy mitigation eliminates data repetition through detection, filtering, and compression of data, to avoid unnecessary transmission.
Data Storage
Data can be stored in file systems or databases.
File system
1. Distributed file systems store massive data and ensure consistency, availability, and fault tolerance of data.
2. GFS is a notable example of a distributed file system that supports large-scale data storage, though its performance is limited in the case of small files.
3. Hadoop Distributed File System (HDFS) and Kosmosfs are other notable file systems, derived from the published design of GFS.
Databases
1. Non-traditional relational databases (NoSQL) emerged in order to deal with the characteristics that big data possesses, i.e., unstructured data.
2. NoSQL uses three different types of databases:
1. Key-value databases
2. Column-oriented databases
3. Document-oriented databases
Hadoop Ecosystem
The Hadoop framework was developed to store and process data with a simple programming model in a distributed data-processing environment. Data residing on many high-speed, low-cost machines can be stored and analyzed. Enterprises have widely adopted Hadoop as a big data technology for their data warehouse needs, and the trend seems set to continue and grow. Companies that have not explored Hadoop so far will most likely see its advantages and applications.
Hadoop Common – the libraries and utilities used by other Hadoop modules.
Hadoop Distributed File System (HDFS) – the Java-based scalable system that
stores data across multiple machines without prior organization.
HCatalog: A table and storage management layer that helps users share and access data.
Hive: A data warehousing and SQL-like query language that presents data in the form of tables. Hive programming is similar to database programming.
Pig: A platform for manipulating data stored in HDFS that includes a compiler for MapReduce programs and a high-level language called Pig Latin. It provides a way to perform data extractions, transformations, and loading, and basic analysis without having to write MapReduce programs.
Sqoop: A connection and transfer mechanism that moves data between Hadoop and relational databases.
Job Tracker:
Runs with the NameNode.
Receives the user's job.
Decides on how many tasks will run (number of mappers).
Decides on where to run each mapper.
Task Tracker:
Runs on each DataNode.
Receives the task from the Job Tracker.
Always in communication with the Job Tracker, reporting progress.
Hadoop Master/Slave Architecture
Hadoop uses a master-slave, shared-nothing architecture.
Master (NameNode):
Executes operations like opening, closing, and renaming files and directories.
Determines the mapping of blocks to DataNodes.
Slave (DataNode):
Serves read and write requests from the file system's clients.
Performs block creation, deletion, and replication as instructed by the NameNode.
YARN:
Yet Another Resource Negotiator, as the name implies, helps to manage the resources across the clusters. In short, it performs scheduling and resource allocation for the Hadoop system.
It consists of three major components:
1. Resource Manager
2. Node Manager
3. Application Manager
The Resource Manager has the privilege of allocating resources for the applications in the system, whereas Node Managers work on the allocation of resources such as CPU, memory, and bandwidth per machine, and later acknowledge the Resource Manager. The Application Manager works as an interface between the Resource Manager and the Node Managers and performs negotiations as per the requirements of the two.
MapReduce:
By making use of distributed and parallel algorithms, MapReduce makes it possible to carry over the processing logic and helps to write applications which transform big data sets into manageable ones.
MapReduce makes use of two functions, Map() and Reduce(), whose tasks are:
1. Map() performs sorting and filtering of data, thereby organizing the data in the form of groups. Map generates key-value-pair results which are later processed by Reduce().
2. Reduce() performs summarization by aggregating the mapped data.
A small word-count sketch of this idea follows.
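To make the two roles concrete, here is a tiny self-contained word-count sketch in plain Python. It illustrates the Map()/Reduce() idea only; it is not the Hadoop API, and the sample documents are made up.

from collections import defaultdict

def map_phase(document):
    # Map(): sort/filter raw input into (key, value) pairs, one per word
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    # Reduce(): summarize by aggregating the values for each key
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

docs = ["big data needs big tools", "hadoop processes big data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(pairs))   # e.g. {'big': 3, 'data': 2, ...}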
Data Analytics
What is Data Analytics
“Data analytics (DA) is the process of examining data sets
in order to draw conclusions about the information they contain, increasingly
with the aid of specialized systems and software. Data analytics technologies
and techniques are widely used in commercial industries to enable
organizations to make more informed business decisions and by scientists and
researchers to verify or disprove scientific models, theories and hypotheses.”
1. Statistical models
Statistical modeling is the process of applying
statistical analysis to a dataset. A statistical model is a
mathematical representation (or mathematical model) of
observed data.
2. Analysis of variance
Analysis of Variance (ANOVA) is a parametric statistical technique used to compare datasets.
ANOVA is best applied where more than two populations or samples are meant to be compared.
To perform an ANOVA, we must have a continuous response variable and at least one categorical factor (e.g. age, gender) with two or more levels (e.g. locations 1 and 2); a short sketch follows below.
ANOVA requires data from approximately normally distributed populations.
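A minimal sketch of a one-way ANOVA, assuming SciPy is available; the three location samples are made-up values for illustration.

from scipy import stats

# Hypothetical temperature samples from three locations (one categorical
# factor, "location", with three levels)
location1 = [23.1, 24.5, 22.8, 25.0, 23.7]
location2 = [26.2, 27.0, 25.5, 26.8, 27.4]
location3 = [22.0, 21.5, 23.2, 22.7, 21.9]

# f_oneway returns the F ratio and its p-value
f_ratio, p_value = stats.f_oneway(location1, location2, location3)
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")
# An F ratio well above 1.0 with a small p-value suggests the group
# means are not all equal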
Total sum of squares
In statistical data analysis, the total sum of squares (TSS or SST) is a quantity that appears as part of a standard way of presenting results of such analyses. It is defined as being the sum, over all observations, of the squared differences of each observation from the overall mean.
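In symbols (standard notation, not given in the notes), for observations $y_1, \dots, y_n$ with overall mean $\bar{y}$:

$$\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$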
F-ratio
Helps to understand the ratio of variance between two data sets.
The F ratio is approximately 1.0 when the null hypothesis is true and is greater than 1.0 when the null hypothesis is false.
Degrees of freedom
The number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.
3. Data dispersion
A measure of statistical dispersion is a nonnegative real number that is
zero if all the data are the same and increases as the data becomes more
diverse.
Examples of dispersion measures:
Range
Average absolute deviation
Variance and Standard deviation
Range
The range is calculated by simply taking the difference between the
maximum and minimum values in the data set.
Average absolute deviation
The average absolute deviation (or mean absolute deviation) of a
data set is the average of the absolute deviations from the mean.
Variance
Variance is the expectation of the squared deviation of a random
variable from its mean
Standard deviation
Standard deviation (SD) is a measure that is used to quantify the amount of variation or dispersion of a set of data values (a short computational sketch follows).
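A short computational sketch of these four measures, using Python's built-in statistics module on a made-up sample:

import statistics

data = [4.0, 7.0, 7.0, 9.0, 13.0]            # hypothetical sample

data_range = max(data) - min(data)           # Range: maximum minus minimum
mean = statistics.mean(data)
avg_abs_dev = sum(abs(x - mean) for x in data) / len(data)  # Average absolute deviation
variance = statistics.pvariance(data)        # Population variance
std_dev = statistics.pstdev(data)            # Population standard deviation

print(data_range, avg_abs_dev, variance, std_dev)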
5. Regression analysis
Statistical significance
1. Statistical significance is the likelihood that the difference in conversion rates between a given variation and the baseline is not due to random chance.
2. The statistical significance level reflects the risk tolerance and confidence level.
3. There are two key variables that go into determining statistical significance:
Sample size
Effect size
4. Sample size refers to the sample size of the experiment.
5. The larger your sample size, the more confident you can be in the result of the experiment (assuming that it is a randomized sample).
6. The effect size is just the standardized mean difference between the two groups.
7. If a particular experiment is replicated, the different effect size estimates from each study can easily be combined to give an overall best estimate of the effect size.
A short sketch combining these two quantities follows.
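The sketch below computes both quantities: a p-value from an independent two-sample t-test (statistical significance) and Cohen's d (effect size). The use of SciPy and the conversion-rate samples are illustrative assumptions.

import math
import statistics
from scipy import stats

baseline  = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.11]  # hypothetical
variation = [0.13, 0.15, 0.12, 0.14, 0.16, 0.13, 0.14, 0.15]  # hypothetical

# Statistical significance: p-value of an independent two-sample t-test
t_stat, p_value = stats.ttest_ind(variation, baseline)

# Effect size: standardized mean difference (Cohen's d, equal group sizes)
s1, s2 = statistics.stdev(baseline), statistics.stdev(variation)
pooled_sd = math.sqrt((s1 ** 2 + s2 ** 2) / 2)
cohens_d = (statistics.mean(variation) - statistics.mean(baseline)) / pooled_sd

print(f"p = {p_value:.4f}, d = {cohens_d:.2f}")
# At a 0.05 significance level, p < 0.05 suggests the difference is
# unlikely to be due to random chance alone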