IoT PI
INTERNET OF THINGS
BY
P.INDURANI, M.Sc.,M.Phil.,(Ph.D).,
2019-2020
INTERNET OF THINGS – (II M.Sc CS – SEM III)
UNIT - I
UNIT – II
IoT and M2M: Introduction-M2M-Difference between IoT and M2M-SDN and NFV for
IoT-IoT System Management with NETCONF-YANG: Need for IoT Systems Management-
Simple Network Management Protocol (SNMP)-Network Operator Requirements-NETCONF-
YANG-IoT Systems Management with NETCONF-YANG.
UNIT – III
UNIT – IV
IoT Physical Devices & Endpoints: What is an IoT Device-Exemplary Devices-About the
Board-Linux on Raspberry Pi-Raspberry Pi Interfaces-Raspberry Pi with Python-Other IoT
Devices. IoT Physical Servers & Cloud Offerings: Introduction to Cloud Storage Models and
Communication APIs-WAMP-AutoBahn for IoT-Xively Cloud for IoT-Python Web
Application Framework-Django-Designing a RESTful Web API-Amazon Web Services for IoT-
SkyNet IoT Messaging Platform.
UNIT – V
UNIT - I
The goal of IoT is to extend internet connectivity from standard devices such as computers,
mobile phones, and tablets to relatively simple devices such as a toaster. IoT makes virtually
everything "smart" by improving aspects of our lives through data collection, AI algorithms,
and networks. The "thing" in IoT can also be a person with a diabetes monitor implant, an animal
with a tracking device, etc.
The entire IoT process starts with the devices themselves, such as smartphones,
smartwatches, and electronic appliances like TVs and washing machines, which communicate
with the IoT platform.
1) Sensors/Devices: Sensors or devices are a key component that collects live data
from the surrounding environment. This data may have various levels of complexity. It
could come from a simple temperature sensor, or it could take the form of a video feed.
2) Connectivity: All the collected data is sent to a cloud infrastructure. The sensors must be
connected to the cloud through some communication medium, such as mobile or satellite
networks, Bluetooth, Wi-Fi, or WAN.
3) Data Processing: Once the data is collected and reaches the cloud, software processes
it. The processing can be as simple as checking the temperature reading on devices
such as ACs or heaters, or as complex as identifying objects in video using
computer vision.
4) User Interface: The information needs to be made available to the end user in some way, for
example by triggering alarms on their phone or sending a notification by email or
text message. The user may also need an interface to actively check on their IoT
system. For example, a user who has a camera installed at home may want to access the video
recordings and live feeds through a web server.
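The four stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Reading class, the process and notify functions, the device names, and the 18 to 27 degree range are all hypothetical and are not part of any IoT platform.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One sample produced by a sensor (stage 1)."""
    device_id: str
    temperature_c: float


def process(reading, low=18.0, high=27.0):
    """Stage 3: check whether the temperature is inside an assumed range."""
    if reading.temperature_c < low:
        return f"{reading.device_id}: too cold ({reading.temperature_c} C)"
    if reading.temperature_c > high:
        return f"{reading.device_id}: too hot ({reading.temperature_c} C)"
    return None  # in range, nothing to report


def notify(message, outbox):
    """Stage 4: stand-in for an email, text message, or phone alarm."""
    outbox.append(message)


outbox = []
# Stage 2 (connectivity) is reduced here to passing objects in memory.
for reading in [Reading("room-1", 22.5), Reading("room-2", 31.0)]:
    alert = process(reading)
    if alert is not None:
        notify(alert, outbox)

print(outbox)
```

In a real deployment the readings would arrive over a network and the notification step would call a messaging service; only the control flow is shown here.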
Challenges of IoT
Advantages of IoT
Key benefits of IoT technology are as follows:
Disadvantages of IoT
Physical design
Designing for the Internet of Things (IoT) means designing connected products. IoT
systems combine physical and digital components that collect data from physical devices and
deliver actionable, operational insights. These components include physical devices, sensors,
data extraction and secured communication, gateways, cloud servers, analytics, and dashboards.
For product and engineering teams designing IoT systems, the core challenge lies in
taking IoT use cases and turning them into a connected system with full integration, the right
IoT communication protocols, security, and a user-friendly look and feel. In industrial
manufacturing, IoT product design is also known as Industry 4.0 design.
Device: An IoT system comprises devices that provide sensing, actuation, monitoring, and
control functions.
Communication: This block handles the communication for the IoT system.
Services: An IoT system uses services for device monitoring, device control, data publishing,
and device discovery.
Management: This block provides various functions to govern the IoT system.
Security: This block secures the IoT system by providing functions such as authentication,
authorization, message and content integrity, and data security.
Application: This is the interface that users can use to control and monitor various aspects of
the IoT system. Applications also allow users to view the system status and to view or analyze the
processed data.
HTTP works as a request-response protocol between a client and server. A web browser
may be the client, and an application on a computer that hosts a web site may be the server.
Example: A client (browser) submits an HTTP request to the server; then the server
returns a response to the client. The response contains status information about the request and
may also contain the requested content.
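The request-response exchange can be tried locally with the Python standard library. The sketch below is a minimal illustration, not a production server: the handler name ReadingHandler and the response body are assumptions, and the server listens only on the local machine on a port chosen by the operating system.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class ReadingHandler(BaseHTTPRequestHandler):
    """Server side: answers every GET request with a small text body."""

    def do_GET(self):
        body = b"temperature=22.5"
        self.send_response(200)                      # status information
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                       # requested content

    def log_message(self, format, *args):
        pass  # keep the example quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), ReadingHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: send one request and read the response.
connection = HTTPConnection("127.0.0.1", port, timeout=5)
connection.request("GET", "/")
response = connection.getresponse()
status = response.status
content = response.read().decode()
connection.close()

server.shutdown()
server.server_close()
print(status, content)
```

Each request is independent: the server keeps no memory of earlier requests, which is why this model is described as stateless.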
Publish-Subscribe Model
Publish-Subscribe is a communication model that involves publishers, brokers and
consumers. Publishers are the source of data. Publishers send the data to the topics which are
managed by the broker. Publishers are not aware of the consumers. Consumers subscribe to the
topics which are managed by the broker. When the broker receives data for a topic from the
publisher, it sends the data to all the subscribed consumers.
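A minimal in-memory sketch of this model is shown below. The Broker class and the topic names are hypothetical; a real IoT deployment would normally use a broker reached over the network (for example an MQTT broker) rather than a Python object.

```python
from collections import defaultdict


class Broker:
    """Keeps the list of consumers for each topic and forwards data to them."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, data):
        # The publisher never sees the consumers; only the broker does.
        for callback in self._subscribers[topic]:
            callback(topic, data)


broker = Broker()
received_a, received_b = [], []

# Consumer A subscribes to one topic, consumer B to two.
broker.subscribe("home/temperature", lambda topic, data: received_a.append(data))
broker.subscribe("home/temperature", lambda topic, data: received_b.append(data))
broker.subscribe("home/humidity", lambda topic, data: received_b.append(data))

broker.publish("home/temperature", 22.5)
broker.publish("home/humidity", 40)

print(received_a, received_b)
```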
Push-Pull Model
Push-Pull is a communication model in which the data producers push the data to queues
and the consumers pull the data from the queues. Producers do not need to be aware of the
consumers. Queues help in decoupling the messaging between the producers and consumers.
Queues also act as a buffer, which helps when there is a mismatch between the rate
at which the producers push data and the rate at which the consumers pull data.
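The buffering role of the queue can be illustrated with Python's thread-safe queue module. This is a sketch under assumptions: the message names are invented, and the consumer is slowed down artificially to create a rate mismatch.

```python
import queue
import threading
import time

buffer = queue.Queue()   # decouples the producer from the consumer
consumed = []


def producer():
    for i in range(5):
        buffer.put(f"reading-{i}")   # push without waiting for the consumer
    buffer.put(None)                 # sentinel: no more data


def consumer():
    while True:
        item = buffer.get()          # pull the next item, blocking if empty
        if item is None:
            break
        time.sleep(0.01)             # the consumer is slower than the producer
        consumed.append(item)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(consumed)
```

The producer finishes almost immediately, while the items wait in the queue until the slower consumer pulls them, in the order they were pushed.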
Exclusive Pair Model
Exclusive Pair is a bidirectional, full-duplex communication model that uses a persistent
connection between the client and server. Once the connection is set up, it remains open until the client
sends a request to close the connection. Client and server can send messages to each other after
connection setup. Exclusive Pair is a stateful communication model, and the server is aware of all
the open connections.
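The idea of one persistent, two-way connection can be sketched with a pair of connected sockets. This is a simplified stand-in: IoT systems usually implement this model with a protocol such as WebSocket, whereas the example below uses socket.socketpair() from the standard library so that it runs without a network. The message contents are made up.

```python
import socket

# socketpair() returns two sockets that are already connected to each other.
client, server = socket.socketpair()

# Client to server over the open connection.
client.sendall(b"hello from client")
to_server = server.recv(1024)

# Server to client over the same connection, with no new setup.
server.sendall(b"hello from server")
to_client = client.recv(1024)

# The connection stays open for further messages until it is closed.
client.sendall(b"close")
last_message = server.recv(1024)

client.close()
server.close()

print(to_server, to_client, last_message)
```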
Routers are responsible for routing the data packets from end-nodes to the coordinator.
The coordinator collects the data from all the nodes.
Weather monitoring systems use WSNs in which the nodes collect temperature, humidity,
and other data, which is aggregated and analyzed.
Indoor air quality monitoring systems use WSNs to collect data on the indoor air quality
and the concentration of various gases.
Soil moisture monitoring systems use WSNs to monitor soil moisture at various locations.
Surveillance systems use WSNs for collecting surveillance data.
Smart grids use WSNs for monitoring the grid at various points.
Structural health monitoring systems use WSNs to monitor the health of structures by
collecting vibration data from sensor nodes deployed at various points in the structure.
Cloud Computing
Cloud computing is a transformative computing paradigm that involves delivering
applications and services over the Internet. Cloud computing involves provisioning of computing,
networking, and storage resources on demand and providing these resources as metered services
to the users, in a “pay as you go” model. Cloud computing resources can be provisioned on
demand by the users, without requiring interactions with the cloud service provider.
organizations to better understand the information contained within the data and will also help
identify the data that is most important to the business and future business decisions.
Analysts working with Big Data typically want the knowledge that comes from analyzing
the data.
Some examples of big data generated by IoT systems are described as follows:
Embedded Systems
As its name suggests, "embedded" means something that is attached to another thing. An
embedded system can be thought of as a computer hardware system having software embedded
in it. An embedded system can be an independent system or it can be part of a larger system. An
embedded system is a controller, programmed and controlled by a real-time operating system
(RTOS), with a dedicated function within a larger mechanical or electrical system, often with
real-time computing constraints. It is embedded as part of a complete device, often including
hardware and mechanical parts. Embedded systems control many devices in common use today.
Ninety-eight percent of all microprocessors are manufactured to serve as embedded system
components.
An embedded system has three components:
It has hardware.
It has application software.
It has a real-time operating system (RTOS) that supervises the application software and
provides a mechanism to let the processor run processes according to a schedule, following a plan to
control the latencies. The RTOS defines the way the system works. It sets the rules during the
execution of the application program. A small-scale embedded system may not have an RTOS.
We believe that the real effort in any validation process should be spent on
validating and analysing the result data, not on setting up the environment. Hence, an IoT
environment needs a large-scale network setup to test and demo the manager applications and to test
the interoperability of devices.
IOT Templates
With the new update, you need to spend very little time creating a
template. The template supports replaceable placeholder strings based on the client
names. In addition, the template definition allows you to add placeholders in the WILL topic,
subscription topics, publishing topics, and topics used in behavior patterns.
We can also use patterns in the data. However, the IoT simulator already
supports dynamic values for text and JSON-based messages. With templates, we can
create tens of thousands of unique devices with unique topics and messages within a few minutes.
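The general idea of a template with client-name placeholders can be illustrated with Python's string.Template. This is a hedged sketch only: the placeholder syntax, the topic layout, and the names topic_templates and make_device are assumptions made for illustration and do not reproduce the simulator's own template format.

```python
from string import Template

# One template per topic kind; ${client} is replaced by the client name.
topic_templates = {
    "will": Template("devices/${client}/status"),
    "subscribe": Template("devices/${client}/commands"),
    "publish": Template("devices/${client}/telemetry"),
}


def make_device(client_name):
    """Expand every topic template for one simulated device."""
    return {
        kind: template.substitute(client=client_name)
        for kind, template in topic_templates.items()
    }


# Ten thousand devices, each with its own unique set of topics.
devices = [make_device(f"sensor-{i:05d}") for i in range(10000)]

print(devices[0])
```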
Validating a manager application involves multiple steps, because we
need to review different functional implementations, performance, and so on. Keeping this in mind, we
have added options to store multiple simulated networks. These networks can be persisted and
reused on demand.
Making homes and cities smart with Internet of Things
Internet of Things forms the backbone of digital services that can help build smart homes
and smart cities. The Internet of Things (IoT) has given rise to always-connected smart devices that
are fast becoming a key part of the modern household. It is estimated that by 2025, over 55
billion devices will be in use globally across smart homes and businesses. These IoT-enabled
smart devices have changed the way we work and interact as a society.
sector has begun using domotics—a term used to define automation for a smart home. It
comprises a home automation system which is used to control the lights, entertainment systems,
appliances as well as the climate of the house.
Home Automation
IoT applications for smart homes:
- Smart Lighting
- Smart Appliances
- Intrusion Detection
- Smoke / Gas Detectors
Smart lighting achieves energy savings by sensing human movements and their
environments and controlling the lights accordingly.
Key enabling technologies for smart lighting include:
- Solid state lighting (such as LED lights)
- IP-enabled lights
Wireless-enabled and Internet-connected lights can be controlled remotely from
IoT applications such as a mobile or web application.
The paper "Energy-aware wireless sensor network with ambient intelligence for smart LED lighting
system control" presented a controllable LED lighting system that is embedded with ambient
intelligence gathered from a distributed smart WSN to optimize and control the lighting system
to be more efficient and user-oriented.
UNIT – II
IoT and M2M: Introduction-M2M-Difference between IoT and M2M-SDN and NFV for
IoT-IoT System Management with NETCONF-YANG: Need for IoT Systems Management-
Simple Network Management Protocol (SNMP)-Network Operator Requirements-NETCONF-
YANG-IoT Systems Management with NETCONF-YANG.
Understanding the differences between the Internet of Things and
Machine-to-Machine (M2M) communication, the underlying concept that gave rise to
the IoT as we know it, is essential to understanding what the IoT
is all about today.
While many people, some IT know-it-alls included, often treat both terms as synonyms
and use them interchangeably, it doesn’t necessarily take a rocket scientist to spot one basic
characteristic that differentiates them. As predecessor to the IoT, M2M has been used throughout
the decades as the standard technology in telemetry even before the invention of the Internet
itself, as it involved an interaction between two or more machines without human intervention.
The idea of the Internet of Things, on the other hand, having evolved on the foundations
laid down by M2M, aims at offering much more functionality. It uses Internet connectivity not
only to enable communication between a fleet of the same kind of machines, but also to unite
disparate devices and systems in an effort to marry different technology stacks and deliver
interactive and fully integrated networks across varying environments.
A real life example of this dissimilarity could be found in telemedicine. Let us imagine a
solution that connects a sensor monitoring the heart rate of a patient to an external application
which lets the doctor know the patient needs attention. Such kinds of solutions could easily be
provided by the M2M technology. On the other hand, if we take a sensor and integrate it with an
interactive pillbox that would advise the patient to take the medicine and, moreover, would be
able to send alerts to their family members’ smart phones that the medicine has not been taken
from the pillbox, it would definitely involve an IoT approach. To cut the story short, IoT could
be viewed as M2M, but acting in a wider context.
Bearing all that in mind, it seems that the whole discussion of differences could be quite
conveniently boiled down to one word: scale. And very rightly so, yet there is much more to it
than this.
Historically speaking, M2M technology harks back to the invention of two-way radio
networks in the beginnings of the 20th century which fuelled the development of telemetry, a
major industry area in which M2M has found numerous applications. With the rise of GSM data
connectivity in the 1990s, M2M entered a new phase in its development, to reach its prime only
in the first decade of the 21st century.
The break-neck speed at which the Internet technology has been developing since the
beginnings of the 2010s has recently forced M2M to give way to the IoT as the primary concept
that is shaping the way we think about the future of business and our everyday life.
Bearing all the aforesaid in mind, it must also be noted that, with new M2M protocols
such as Lightweight Machine-2-Machine (LwM2M) hitting the market and the new Internet-
based connectivity solutions they employ, the lines dividing the IoT from M2M are beginning to
blur. Thus, it may well be a matter of near future that we will see both terms acquiring the same
meaning and eventually becoming synonyms.
While it is true that the IoT has been largely based on the foundations provided by M2M
solutions, it must be added that it has been improving on them ever since it was established as
one of the main sources of innovation in the lives of individuals, businesses and whole societies
switches, servers, printers, etc. SNMP components include the Network Management Station (NMS),
the Managed Device, the Management Information Base (MIB), and the SNMP Agent that runs on the device.
SNMP components –
There are 3 components of SNMP:
1. SNMP Manager – It is a centralised system used to monitor the network. It is also known as
the Network Management Station (NMS).
2. SNMP Agent – It is a management software module installed on a managed device.
Managed devices can be network devices like PCs, routers, switches, servers, etc.
3. Management Information Base – The MIB consists of information about the resources that are to be
managed. This information is organised hierarchically. It consists of object instances, which are
essentially variables.
SNMP security levels – A security level defines the type of security algorithm applied to SNMP packets.
Security levels are used only in SNMPv3. There are 3 security levels:
1. noAuthNoPriv – This security level (no authentication, no privacy) uses only a username match for
authentication and no encryption for privacy.
2. authNoPriv – This security level (authentication, no privacy) uses HMAC with MD5 or SHA for
authentication, and no encryption is used for privacy.
3. authPriv – This security level (authentication, privacy) uses HMAC with MD5 or SHA
for authentication, and encryption uses the DES-56 algorithm.
SNMP messages – The different message types are:
1. GetRequest – The SNMP manager sends this message to request data from the SNMP agent. In
response, the SNMP agent replies with the requested value in a Response message.
2. GetNextRequest – This message can be sent to discover what data is available on an SNMP
agent. The SNMP manager can keep requesting the next item until no more data is left. In this
way, the SNMP manager can learn of all the data available on the SNMP agent.
3. GetBulkRequest – This message is used by the SNMP manager to retrieve a large amount of data
from the SNMP agent at once. It was introduced in SNMPv2c.
4. SetRequest – It is used by the SNMP manager to set the value of an object instance on the SNMP
agent.
5. Response – It is a message sent from the agent upon a request from the manager. When sent in
response to Get messages, it contains the data requested. When sent in response to a Set
message, it contains the newly set value as confirmation that the value has been set.
6. Trap – This is a message sent by the agent without being requested by the manager. It is
sent when a fault has occurred.
7. InformRequest – It was introduced in SNMPv2c and is used to confirm whether a notification has been
received by the manager. The agent can be configured to resend the notification until it
receives an acknowledgement. It is the same as a trap but adds the acknowledgement that a trap does not
provide.
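The Get, GetNext, and Set operations can be illustrated with a toy agent held entirely in memory. This is a teaching sketch, not an SNMP implementation: no packets are sent, the ToyAgent class is hypothetical, and the values stored in it are invented. The three OIDs are the standard MIB-II identifiers for sysDescr.0, sysUpTime.0, and sysName.0.

```python
class ToyAgent:
    """In-memory stand-in for an SNMP agent and its MIB."""

    def __init__(self, mib):
        # Store OIDs as integer tuples so they sort in MIB order.
        self._mib = {self._parse(oid): value for oid, value in mib.items()}

    @staticmethod
    def _parse(oid):
        return tuple(int(part) for part in oid.split("."))

    @staticmethod
    def _format(key):
        return ".".join(str(part) for part in key)

    def get(self, oid):
        """GetRequest: return the value stored at exactly this OID."""
        return self._mib.get(self._parse(oid))

    def get_next(self, oid):
        """GetNextRequest: return the first (OID, value) after this OID."""
        key = self._parse(oid)
        later = sorted(k for k in self._mib if k > key)
        if not later:
            return None  # end of the MIB
        return self._format(later[0]), self._mib[later[0]]

    def set(self, oid, value):
        """SetRequest: store the value and echo it back as confirmation."""
        self._mib[self._parse(oid)] = value
        return value


agent = ToyAgent({
    "1.3.6.1.2.1.1.1.0": "toy sensor node",  # sysDescr.0
    "1.3.6.1.2.1.1.5.0": "node-1",           # sysName.0
    "1.3.6.1.2.1.1.3.0": 12345,              # sysUpTime.0
})

# Manager side: repeat GetNext until no more data is left (a "walk").
walked = []
oid = "1.3.6.1.2.1.1"
while True:
    result = agent.get_next(oid)
    if result is None:
        break
    oid, value = result
    walked.append(oid)

confirmed = agent.set("1.3.6.1.2.1.1.5.0", "node-2")
print(walked, confirmed)
```

The walk visits the objects in numeric OID order, not in the order they were stored, which is why the OIDs are compared as tuples of integers rather than as strings.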
As noted in previous sections, the technical requirements of network operators are becoming
more onerous. The principal areas are:
The ability to stay connected and perhaps also contribute to system stability during
disturbances on the electricity system (including ‘fault ride-through’);
The ability to control reactive power generation/consumption in order to contribute towards
control of voltage; and
A general aim for the wind farm (not necessarily each wind turbine) to respond similarly to
conventional thermal generation, where possible.
These issues are considered in more detail in Part II: Grid Integration. However, the net
effect is to increase the arguments for variable-speed operation, and in particular for concepts
using a fully-rated electronic converter between the generator and the network. Concepts with
mechanical or hydraulic variable-speed operation and a synchronous generator, connected
directly to the network, are also suitable.
It should be noted that some of the required functions require fast response and
communications from the wind farm SCADA (Supervisory control and data acquisition) system.
It is also feasible to meet the requirements for reactive power control and for fault ride-
through by using fixed-speed or DFIG wind turbines, and additional power electronic converters.
This can be done by a single converter at the wind farm point of connection to the grid, or by
smaller units added to each turbine.
The NETCONF protocol has a simple layered architecture shown in Fig. 3. The core of
NETCONF is a simple remote procedure call (RPC) layer running over secure transports such as
SSH, TLS, SOAP, or BEEP. Secure Shell (SSH) transport is mandatory to implement as a means
of promoting interoperability.
The operations layer residing on top of the RPC layer provides specific operations to
manipulate configuration state. The configuration data itself forms the content layer residing
above the operations layer.
The NETCONF specification mainly deals with generic operations to retrieve and modify
configuration state. An additional document defines operations to subscribe to notification
streams and receive notifications.
Further operations are expected to be added in the future in order to support data-model-
specific management operations. NETCONF assumes that the configuration state of a device can
be represented as a structured document that can be retrieved and manipulated. In order to deal
with large configurations, the protocol supports filtering mechanisms that allow clients to
retrieve only a subset of the configuration.
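To make the layers concrete, the sketch below builds the XML for one NETCONF request with Python's standard xml.etree module: an rpc element (RPC layer) wrapping a get-config operation (operations layer) with a subtree filter that selects part of the configuration (content layer). The base namespace is the one defined for NETCONF; the filter content, its namespace urn:example:sensors, and the message-id value are hypothetical. The example only constructs the message and does not open an SSH session to a device.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
EXAMPLE_NS = "urn:example:sensors"   # hypothetical data model namespace

ET.register_namespace("nc", NETCONF_NS)
ET.register_namespace("ex", EXAMPLE_NS)


def nc(tag):
    """Qualify a tag name with the NETCONF base namespace."""
    return f"{{{NETCONF_NS}}}{tag}"


# RPC layer: every request is wrapped in an rpc element with a message-id.
rpc = ET.Element(nc("rpc"), {"message-id": "101"})

# Operations layer: read the running configuration.
get_config = ET.SubElement(rpc, nc("get-config"))
source = ET.SubElement(get_config, nc("source"))
ET.SubElement(source, nc("running"))

# Filtering: ask only for one subtree of the configuration.
subtree_filter = ET.SubElement(get_config, nc("filter"), {"type": "subtree"})
ET.SubElement(subtree_filter, f"{{{EXAMPLE_NS}}}sensors")

xml_text = ET.tostring(rpc, encoding="unicode")
print(xml_text)
```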
Since NETCONF uses XML to encode network management data, it may seem obvious
to use one of the existing XML schema languages to formally specify the format of these XML
documents.
While some parts of the industry favor the XML Schema Definition Language (XSD),
there has been significant uptake of RelaxNG in recent years.
But putting aside the differences between XSD and RelaxNG, it is clear that additional
NETCONF-specific information needs to be specified that goes well beyond the capabilities of
these XML schema languages. Both XSD and RelaxNG only address part of the problem to be
solved.
UNIT – III
Embedded systems are usually relatively small or medium computing platforms that are
self-sufficient. Such systems consist of all the software and hardware components which are
“embedded” inside the system so that complete applications can be implemented and run without
the aid of other external components or resources. Usually, embedded systems are found in
portable computing products such as PDAs, mobile and smart phones, and GPS receivers.
Larger systems such as microwave ovens and vehicle electronics also contain
embedded systems. Here, however, we consider embedded systems that can communicate
with each other by means of a wired or wireless communication protocol, such as Zigbee, the
IEEE 802.11 standard, or any of its derivatives. Therefore, special attention is paid here to
embedded Internet-of-Things (IoT) hardware, its design methodology, and implementation
requirements.
HLS Scheduling
The high-level synthesis (HLS) scheduling task falls into two major categories: time-constrained
scheduling and resource-constrained scheduling. Time-constrained scheduling aims
for the lowest area or number of functional units when the task is constrained by the
maximum number of control steps. Resource-constrained scheduling aims to produce the fastest
schedule when the maximum number of hardware resources or the hardware area is constrained.
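Resource-constrained scheduling can be illustrated with a simplified list scheduler. The sketch makes several assumptions that real HLS tools do not: every operation takes exactly one control step, there is a single type of functional unit, and ready operations are picked in alphabetical order rather than by a priority such as mobility. The function name list_schedule and the example dependency graph are hypothetical.

```python
def list_schedule(deps, max_units):
    """Assign each operation to a control step (1-based).

    deps maps an operation to the set of operations it depends on.
    At most max_units operations are placed in any one control step.
    """
    schedule = {}
    remaining = set(deps)
    step = 0
    while remaining:
        step += 1
        # An operation is ready when all predecessors ran in an earlier step.
        ready = sorted(
            op for op in remaining
            if all(schedule.get(pred, step) < step for pred in deps[op])
        )
        if not ready:
            raise ValueError("dependency cycle: no operation can be scheduled")
        for op in ready[:max_units]:   # respect the resource limit
            schedule[op] = step
            remaining.discard(op)
    return schedule


# d needs a and b; e needs c; f needs d and e.
deps = {
    "a": set(), "b": set(), "c": set(),
    "d": {"a", "b"}, "e": {"c"}, "f": {"d", "e"},
}

two_units = list_schedule(deps, max_units=2)
three_units = list_schedule(deps, max_units=3)
print(two_units, three_units)
```

With two functional units the example needs four control steps; with three units it needs three, which shows the trade-off between hardware resources and schedule length.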
The Internet of Things (IoT) has been adding value to products and applications in recent years.
Connecting IoT devices over a network has reduced power consumption and improved
robustness and access to data. IoT is powering many
frontiers of industry and is seen as a promising technology for taking Big Data analytics to a
higher level. Weather monitoring is a topic that has been widely addressed in the IoT research
community. A new weather monitoring system is developed using various
sensors connected to a Raspberry Pi.
The implementation and the visualization of the collected data are discussed in this
paper in detail. Weather parameters such as temperature, humidity, PM 2.5 and PM 10
concentrations, and the Air Quality Index (AQI) are monitored and visualized graphically,
using the Raspberry Pi as a server, with the data accessed over an intranet subnet or over
the Internet.
The data visualization is provided as the result and proves to be a robust framework for
analyzing weather parameters in any geographical location and for studying the effect of smog and PM
2.5 concentration.
Humidity, Temperature and Pressure are three basic parameters to build any Weather
Station and to measure environmental conditions. We have previously built a Digital
Thermometer web server using NodeMCU and this time we are extending it to Weather Station
using ESP12E NodeMCU. In this project, we will measure the Humidity, Temperature, and
Pressure parameters and display them on the web server, which makes it an IoT-based Weather
Station where the weather conditions can be monitored from anywhere using the Internet.
The use of Python as a mainstream programming language has exploded. Notable advantages of
Python over other languages include, but are not limited to:
1. It is a very simple language to learn and easy to implement and deploy, so you don't need to
spend a lot of time learning lots of formatting standards and compiling options.
2. It is portable, expandable and embeddable, so, it is not system dependent, and hence supports
a lot of single board computers on the market these days, irrespective of architecture and
operating system.
3. Most importantly, it has a huge community which provides a lot of support and libraries for
the language.
The Internet of Things is a big deal today. Some consider it to be a buzzword, others say
it's a phase, while many others, large industries included, stand by their belief that it is going to
be a game changer.
However, when all is said and done, the Internet of Things has no classical definition, and
as a result, its meaning is about as uniform as philosophical ideologies.
Because the Internet of Things paradigm has its hands in a lot of pies, the semantics of it
are largely dependent on your perspective. For me, the recipe for the Internet of Things is very
simple.
A 'thing', which could literally be anything, is fitted with an embedded system which
connects it to the internet, in other words, it has its own IP address. This thing may now interact
with other things, remote or local over the internet.
This infrastructure opens up many possible use cases. At this juncture,
IoT occupies a place of importance in Wireless Sensor Networks, Data Analytics,
Cyber-Physical Systems, Big Data, and Machine Learning. It is also very focused on real-time analytics
and processes.
Installing Python Data types
As the Internet of Things becomes more and more popular, so too does the need for
computing solutions that work across a network of smaller devices, rather than just centralized
processing servers.
To this end, developers are always challenged by hardware constraints, yet more
modern solutions, such as edge computing, are increasing our demands on such components.
However, it is often mistakenly assumed that programming edge computing on these
controllers is difficult, as they rely on older languages and have limited potential.
The wireless monitoring system can connect to up to five 3rd party sensors such as
piezometers and sends all sensor data to a nearby gateway using a long-range, low-power
radio (LoRa) network. The gateway displays this data on a website and sends it to the
cloud. Fastprk, the leading outdoor Parking Management System, includes parking sensors
which use magnetic and infrared technology for parking detection.
While both Carles and Héctor dived into how Python can work with firmware IoT
developments, Data Lead Pau Beltran gave insights into how Big Data and Python are
connected at Worldsensing:
IoT tutorial provides basic and advanced concepts of IoT. Our Internet of Things tutorial
is designed for beginners and professionals.
IoT stands for Internet of Things, which means accessing and controlling everyday
equipment and devices using the Internet.
Our IoT tutorial includes all topics of IoT such as introduction, features, advantage and
disadvantage, ecosystem, decision framework, architecture and domains, biometric, security
camera and door unlock system, devices, etc.
According to the structured program theorem, any computer program can be written using the
basic control structures. A control structure is a block of programming that analyses variables
and chooses a direction in which to go based on given parameters. Put simply, a control
structure is just a decision that the computer makes.
There are two basic aspects of computer programming: data and instructions.
To work with data, you need to understand variables and data types; to work with
instructions, you need to understand control structures and statements.
Flow of control through any given program is implemented with three basic types of control
structures: Sequential, Selection, and Repetition.
Sequential
Sequential execution is when statements are executed one after another in order. You
don't need to do anything more for this to happen.
Selection
Selection is used for decisions and branching, i.e. choosing between 2 or more alternative paths.
if
if...else
switch
Repetition
Repetition is used for looping, i.e. repeating a piece of code multiple times in a row.
while loop
do..while loop
for loop
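The three control structures look as follows in Python. Note that the lists above name constructs from C-like languages: Python has no do..while loop and traditionally no switch statement, so the sketch uses if/elif/else for multi-way selection and a while True loop with break to get do..while behaviour. The function name classify and the temperature limits are made up for the example.

```python
def classify(readings):
    labels = []                      # sequential: statements run in order
    for value in readings:           # repetition: for loop
        if value < 18:               # selection: if / elif / else
            labels.append("cold")
        elif value <= 27:
            labels.append("ok")
        else:
            labels.append("hot")
    return labels


labels = classify([15, 22, 30])

# Repetition with while; the body runs at least once, like do..while.
countdown = []
n = 3
while True:
    countdown.append(n)
    n -= 1
    if n == 0:
        break

print(labels, countdown)
```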
Writing modules
Modules in Python are simply Python files with a .py extension. The name of the module
will be the name of the file. A Python module can have a set of functions, classes or variables
defined and implemented.
Python Modules
A Python module can be defined as a Python program file which contains Python code,
including Python functions, classes, or variables. In other words, a Python code
file saved with the extension (.py) is treated as a module. We may have runnable code inside
the Python module. Modules in Python give us the flexibility to organize the code in a
logical way. To use the functionality of one module in another, we must import the
specific module.
We need to load the module in our Python code to use its functionality. Python provides
two types of statements for this: the import statement and the from-import statement.
The import statement is used to import all the functionality of one module into another.
Note that we can use the functionality of any Python source file by importing
that file as a module into another Python source file.
We can import multiple modules with a single import statement, but a module is loaded
only once, regardless of the number of times it has been imported into our file.
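The behaviour of the import statement can be checked with a small experiment. The sketch writes a module file into a temporary directory, adds that directory to the module search path, and imports it twice. The module name sensor_utils and its function to_fahrenheit are hypothetical and exist only for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny module file: the file name (without .py) is the module name.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "sensor_utils.py").write_text(
    "def to_fahrenheit(celsius):\n"
    "    return celsius * 9 / 5 + 32\n"
)

# Make the directory visible to the import system.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

import sensor_utils                  # first import: the file is loaded
first_import = sensor_utils

import sensor_utils                  # second import: the cached module is reused
same_object = first_import is sensor_utils

result = sensor_utils.to_fahrenheit(100)
print(result, same_object)
```

Loaded modules are kept in sys.modules, which is why the second import does not execute the file again.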
The from-import statement
Instead of importing the whole module into the namespace, Python provides the
flexibility to import only specific attributes of a module. This can be done by using the
from-import statement. Its general form is: from module_name import name1, name2, ...
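As a brief illustration of the from-import statement, the sketch below imports only two names from the standard math module. The variable names area and hypotenuse are chosen for the example.

```python
# Only pi and sqrt are brought into the namespace, not the whole module.
from math import pi, sqrt

area = pi * 2 ** 2                 # area of a circle with radius 2
hypotenuse = sqrt(3 ** 2 + 4 ** 2)

print(area, hypotenuse)
```

The imported names are used directly, without the math. prefix that a plain import statement would require.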
Python packages
Packages in Python help the developer organize application development
by providing a hierarchical directory structure in which a package contains sub-packages,
modules, and sub-modules. Packages are used to categorize application-level
code efficiently.
1. Create a directory named Employees on the path /home.
2. Create a Python source file named ITEmployees.py on the path /home/Employees.
3. Similarly, create one more python file with name BPOEmployees.py and create a
function getBPONames().
4. Now, the directory Employees which we have created in the first step contains two
python modules. To make this directory a package, we need to include one more file
here, that is __init__.py which contains the import statements of the modules defined in
this directory.
__init__.py
5. Now, the directory Employees has become a package containing two Python
modules. Note that we must create __init__.py inside a directory
to convert this directory into a package.
6. To use the modules defined inside the package Employees, we must import
it in our Python source file. Let's create a simple Python source file in our home
directory (/home) which uses the modules defined in this package.
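The steps above can be reproduced with the following sketch. It makes several assumptions: a temporary directory stands in for /home so that the example runs anywhere, the function name getITNames and the returned names are placeholders (only getBPONames is named in the steps), and __init__.py uses the explicit relative import form that Python 3 requires inside a package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())          # stands in for /home
package_dir = base / "Employees"
package_dir.mkdir()

# Two modules inside the directory.
(package_dir / "ITEmployees.py").write_text(
    "def getITNames():\n"
    "    return ['John', 'David']\n"     # placeholder data
)
(package_dir / "BPOEmployees.py").write_text(
    "def getBPONames():\n"
    "    return ['Nick', 'Martin']\n"    # placeholder data
)

# __init__.py turns the directory into a package and re-exports the functions.
(package_dir / "__init__.py").write_text(
    "from .ITEmployees import getITNames\n"
    "from .BPOEmployees import getBPONames\n"
)

# A program outside the package imports it by its directory name.
sys.path.insert(0, str(base))
importlib.invalidate_caches()

import Employees

names = Employees.getITNames() + Employees.getBPONames()
print(names)
```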
Python also supports file handling and allows users to read and write
files, along with many other file handling options. The concept of file
handling exists in various other languages, but the implementation is often either
complicated or lengthy; like other concepts in Python, file handling is easy and
short. Python treats files differently depending on whether they are text or binary, and this is important.
Each line of a text file is a sequence of characters. Each line of a
file is terminated with a special character, called the EOL or End of Line character, such as
the newline character. It ends the current line and tells the interpreter a new one has begun.
Let's start with reading and writing files.
With Python you can create a text file (guru99.txt) by using the code below, which
demonstrates how this can be done.
Step 1)
f= open("guru99.txt","w+")
We declared the variable f to open a file named guru99.txt. open() takes two arguments: the
file that we want to open and a string that represents the kind of permission or operation
we want to perform on the file.
Here we used the letter "w" in our argument, which indicates write; it creates the file if it
does not exist and empties it if it does. The plus sign means the file is also opened for reading.
The other available options besides "w" are "r" for read and "a" for append; "a", like "w",
creates the file if it is not there, while "r" requires the file to exist.
Step 2)
for i in range(10):
    # in text mode "\n" is translated to the platform's line ending
    f.write("This is line %d\n" % (i+1))
Step 3)
f.close()
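The three steps above can be combined into one runnable sketch that also reads the file back. It assumes a temporary directory as the location and uses the with statement in place of the explicit f.close(); both choices are illustrative rather than part of the original steps.

```python
import os
import tempfile

# Write into a temporary directory so the sketch leaves no file behind;
# the file name guru99.txt is the one used in the steps above.
path = os.path.join(tempfile.mkdtemp(), "guru99.txt")

# "w" creates the file (or truncates an existing one); the with-block
# closes the file automatically, replacing the explicit f.close().
with open(path, "w") as f:
    for i in range(10):
        f.write("This is line %d\n" % (i + 1))

# "r" opens the same file for reading.
with open(path, "r") as f:
    lines = f.read().splitlines()

print(len(lines))
print(lines[0])
```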
We don't usually store all of our files in our computer in the same location. We use a
well-organized hierarchy of directories for easier access.
Similar files are kept in the same directory, for example, we may keep all the songs in the
"music" directory. Analogous to this, Python has packages for directories and modules for files.
As our application program grows larger in size with a lot of modules, we place similar
modules in one package and different modules in different packages. This makes a project
(program) easy to manage and conceptually clear.
Similarly, just as a directory can contain sub-directories and files, a Python package can have
sub-packages and modules.
A directory must contain a file named __init__.py in order for Python to consider it as a
package. This file can be left empty but we generally place the initialization code for that
package in this file.
We can import modules from packages using the dot (.) operator.
For example, if we want to import the start module in the above example, it is done as follows.
import Game.Level.start
Python is an “object-oriented programming language.” This means that almost all the
code is implemented using a special construct called a class. Programmers use classes to keep
related things together. A class is defined using the keyword “class,” which introduces a grouping of
object-oriented constructs.
A class by itself is of no use unless there is some functionality associated with it.
Functionalities are defined by setting attributes, which act as containers for data and functions
related to those attributes. Those functions are called methods.
Attributes:
You can define the following class with the name Snake. This class will have an attribute
name.
class Snake:
    name = "python" # set an attribute `name` of the class
Methods:
Once there are attributes that “belong” to the class, you can define functions that will
access the class attribute. These functions are called methods. When you define a method, its
first parameter always receives the instance itself; by convention it is named self (a naming
convention, not a keyword).
class Snake:
    name = "python"
    def change_name(self, new_name): # note that the first argument is self
        self.name = new_name # set the attribute `name` on this instance through self
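A short usage sketch may help show what the method does. It repeats the Snake class so that it runs on its own; the instance name snake and the new value "anaconda" are illustrative assumptions.

```python
class Snake:
    name = "python"  # class attribute shared by all instances

    def change_name(self, new_name):
        # Assigning through self creates an attribute on this one instance.
        self.name = new_name


snake = Snake()
before = snake.name          # falls back to the class attribute
snake.change_name("anaconda")
after = snake.name           # now read from the instance

print(before, after, Snake.name)
```

Note that the class attribute Snake.name itself is left unchanged; only the one instance carries the new name.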
UNIT – IV
IoT Physical devices & Endpoints: What is an IoT Device-Exemplary Devices- About the
Board-Linux on Raspberry Pi-Raspberry Pi Interfaces- Raspberry Pi with Python- Other IoT
devices. IoT Physical Servers & Cloud Offerings: Introduction to Cloud Storage models and
communication APIs –WAMP- AutoBahn for IoT-Xively cloud for IoT –Python
web application framework- Django- Designing a RESTful Web API-Amazon web services for
IoT-SkyNet IoT-Messaging platform
IoT involves extending internet connectivity beyond standard devices, such as desktops,
laptops, smartphones and tablets, to any range of traditionally dumb or non-internet-enabled
physical devices and everyday objects. Embedded with technology, these devices can
communicate and interact over the internet, and they can be remotely monitored and controlled.
Connected devices are part of a scenario in which every device talks to other related
devices in an environment to automate home and industry tasks, and to communicate
usable sensor data to users, businesses and other interested parties. IoT devices are meant to
work in concert for people at home, in industry or in the enterprise. As such, the devices can be
categorized into three main groups: consumer, enterprise and industrial.
Consumer connected devices include smart TVs, smart speakers, toys, wearables and
smart appliances. Smart meters, commercial security systems and smart city technologies -- such
as those used to monitor traffic and weather conditions -- are examples of industrial and
enterprise IoT devices. Other technologies, including smart air conditioning, smart thermostats,
smart lighting and smart security, span home, enterprise and industrial uses.
In a smart home, for example, a user arrives home and his car communicates with the
garage to open the door. Once inside, the thermostat is already adjusted to his preferred
temperature, and the lighting is set to a lower intensity and his chosen color for relaxation, as his
pacemaker data indicates it has been a stressful day.
In the enterprise, smart sensors located in a conference room can help an employee locate
and schedule an available room for a meeting, ensuring the proper room type, size and features
are available. When meeting attendees enter the room, the temperature will adjust according to
the occupancy, and the lights will dim as the appropriate PowerPoint loads on the screen and the
speaker begins his presentation.
A number of challenges can hinder the successful deployment of an IoT system and its
connected devices, including security, interoperability, power/processing capabilities, scalability
and availability. Many of these can be addressed with IoT device management either by adopting
standard protocols or using services offered by a vendor.
Device management helps companies integrate, organize, monitor and remotely manage
internet-enabled devices at scale, offering features critical to maintaining the health, connectivity
and security of the IoT devices along their entire lifecycles. Such features include:
Device registration
Device authentication/authorization
Device configuration
Device provisioning
Device monitoring and diagnostics
Device troubleshooting
Available standardized device management protocols include the Open Mobile Alliance's
Device Management (OMA DM) and Lightweight Machine-to-Machine (OMA LwM2M).
IoT device management services and software are also available from vendors including
Amazon, Bosch Software Innovations GmbH, Microsoft, Software AG and Xively.
IoT device connectivity and networking
Communications protocols include CoAP, DTLS and MQTT, among others. Wireless
protocols include IPv6, LPWAN, Zigbee, Bluetooth Low Energy, Z-Wave, RFID and NFC.
Cellular, satellite, Wi-Fi and Ethernet can also be used.
Each option has its tradeoffs in terms of power consumption, range and bandwidth, all of
which must be considered when choosing connected devices and protocols for a particular IoT
application.
To share the sensor data they collect, IoT devices connect to an IoT gateway or another
edge device where data can either be analyzed locally or sent to the cloud for analysis.
Researchers have already demonstrated remote hacks on pacemakers and cars, and, in
October 2016, a large distributed denial-of-service attack dubbed Mirai affected DNS servers on
the east coast of the United States, disrupting services worldwide -- an issue traced back to
hackers infiltrating networks through IoT devices, including wireless routers and connected
cameras.
The Raspberry Pi is equipped with one SPI bus that has 2 chip selects.
The SPI master driver is disabled by default on Raspbian. To enable it, use raspi-config,
or ensure the line dtparam=spi=on isn't commented out in /boot/config.txt, and reboot. If the SPI
driver was loaded, you should see the device /dev/spidev0.0.
The SPI bus is available on the P1 Header:
Signal  Pin
MOSI    P1-19
MISO    P1-21
SCLK    P1-23
CE0     P1-24
GND     P1-25
CE1     P1-26
WiringPi
WiringPi includes a library which can make it easier to use the Raspberry Pi's on-board
SPI interface. Accesses the hardware registers directly.
bcm2835 library
This is a C library for Raspberry Pi (RPi). It provides access to GPIO and other IO
functions on the Broadcom BCM 2835 chip. Accesses the hardware registers directly.
point. See the Troubleshooting section. Uses the Linux spidev driver to access the bus.
Shell
# Write binary 1, 2 and 3
echo -ne "\x01\x02\x03" > /dev/spidev0.0
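The same write can be sketched in Python by treating the device node as an ordinary file, which is what the shell redirection above does. The helper name write_spi_bytes is hypothetical, and the sketch assumes the spidev driver is loaded so that /dev/spidev0.0 exists; it performs a write only and does not read any response from the bus.

```python
import os


def write_spi_bytes(device_path, payload):
    """Write raw bytes to a device node (or any file) and return the count."""
    # buffering=0 sends the bytes immediately instead of caching them.
    with open(device_path, "wb", buffering=0) as device:
        return device.write(payload)


# /dev/spidev0.0 exists only when the SPI driver is enabled on the Pi.
if os.path.exists("/dev/spidev0.0"):
    write_spi_bytes("/dev/spidev0.0", b"\x01\x02\x03")
```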
Hardware
The BCM2835 on the Raspberry Pi has 3 SPI Controllers. Only the SPI0 controller is
available on the header. Chapter 10 in the BCM2835 ARM Peripherals datasheet describes this
controller.
Master modes
Signal name abbreviations
SCLK - Serial CLocK
CE - Chip Enable (often called Chip Select)
MOSI - Master Out Slave In
MISO - Master In Slave Out
MOMI - Master Out Master In
Standard mode
In Standard SPI master mode the peripheral implements the standard 3 wire serial protocol
(SCLK, MOSI and MISO).
Bidirectional mode
In bidirectional SPI master mode the same SPI standard is implemented, except that a
single wire is used for data (MOMI) instead of the two used in standard mode (MISO and
MOSI). In this mode, the MOSI pin serves as MOMI pin.
Linux driver
The default Linux driver is spi-bcm2708.
The following information was valid 2014-07-05.
Speed
The driver supports the following speeds:
Supported bits per word
8 - Normal
9 - This is supported using LoSSI mode.
Transfer modes
Only interrupt mode is supported.
Deprecated warning
The following appears in the kernel log: unqueued, this is deprecated
SPI driver latency
This thread discusses latency problems.
R-Pi Configuration
1- The Raspberry Pi should be accessible remotely via an IP address. If it is not, there might be
several issues that can be easily fixed.
2- The Flask framework must be installed on the Raspberry Pi.
Python Program on Raspberry Pi
Circuit Construction
The circuit consists of the Pi, LEDs, and a resistor to limit the current that can flow
through the circuit.
All the grounds are connected to the ground (GND) pin of the RPi, which basically acts like the
negative or 0 volts terminal of a battery. On the other hand, the GPIO pins 17, 18 and 19 act like the
positive terminal of a battery. When the signal is HIGH, the LED will light up and when the
signal is LOW, the LED will stop glowing.
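The HIGH/LOW behaviour described above can be sketched with the RPi.GPIO library. This is a minimal sketch under stated assumptions: pins 17, 18 and 19 are read as BCM numbers, the helper set_leds is hypothetical, and the FakeGPIO class is an invented stand-in that lets the code run as a dry run on a machine without the library or the hardware.

```python
try:
    import RPi.GPIO as GPIO  # available on Raspberry Pi OS
except (ImportError, RuntimeError):
    GPIO = None  # not running on a Pi; use the stand-in below


class FakeGPIO:
    """Hypothetical stand-in that records pin states for a dry run."""
    BCM, OUT, HIGH, LOW = "BCM", "OUT", 1, 0

    def __init__(self):
        self.states = {}

    def setmode(self, mode):
        self.mode = mode

    def setup(self, pin, direction):
        self.states[pin] = self.LOW

    def output(self, pin, value):
        self.states[pin] = value

    def cleanup(self):
        self.states.clear()


LED_PINS = [17, 18, 19]  # assumed to be BCM numbers of the pins named above


def set_leds(gpio, pins, on):
    """Drive every pin HIGH (LED on) or LOW (LED off)."""
    for pin in pins:
        gpio.output(pin, gpio.HIGH if on else gpio.LOW)


gpio = GPIO if GPIO is not None else FakeGPIO()
gpio.setmode(gpio.BCM)
for pin in LED_PINS:
    gpio.setup(pin, gpio.OUT)

set_leds(gpio, LED_PINS, True)   # LEDs light up
set_leds(gpio, LED_PINS, False)  # LEDs stop glowing
gpio.cleanup()
```

On a real board a delay (for example time.sleep) between the two calls would be needed for the change to be visible.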
Session: Session is a conversation between two peers that runs over a transport.
Client: Clients are peers that can have one or more roles.
In the publish–subscribe model, the Client can have the following roles: –
Publisher: Publisher publishes events (including payload) to the topic maintained by the Broker.
Subscriber: Subscriber subscribes to the topics and receives the events including the payload.
In the RPC model, the Client can have the following roles:
Caller: Caller issues calls to the remote procedures along with call arguments.
Callee: Callee executes the procedures to which the calls are issued by the Caller and returns the
results to the Caller.
Router: Routers are peers that perform generic call and event routing.
Broker: Broker acts as a Router and routes messages published to a topic to all the subscribers
subscribed to the topic.
Dealer: Dealer acts as a Router and routes RPC calls from the Caller to the Callee and routes results
from the Callee to the Caller.
Application code: Application code runs on the Clients (Publisher, Subscriber, Callee or Caller).
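The roles listed above can be illustrated with a toy in-memory router. This is only a sketch of the two messaging patterns: it is not the WAMP wire protocol and not the AutoBahn API, and the class name Router, the topic name and the procedure name are all assumptions.

```python
class Router:
    """Toy in-memory router combining the Broker and Dealer roles."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of event handlers
        self.procedures = {}   # procedure name -> callee function

    # Broker role: publish-subscribe
    def subscribe(self, topic, handler):
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload):
        for handler in self.subscribers.get(topic, []):
            handler(payload)

    # Dealer role: remote procedure calls
    def register(self, name, procedure):
        self.procedures[name] = procedure

    def call(self, name, *args):
        return self.procedures[name](*args)


router = Router()

# Subscriber: collects events published to a topic.
received = []
router.subscribe("sensor.temperature", received.append)

# Callee: offers a procedure that Callers can invoke.
router.register("math.add", lambda a, b: a + b)

# Publisher and Caller.
router.publish("sensor.temperature", {"celsius": 21.5})
result = router.call("math.add", 2, 3)

print(received, result)
```

In a real deployment the Clients and the Router are separate peers connected by Sessions over a transport; here they share one process purely for illustration.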
WAMP
The acronym WAMP refers to a set of free (open source) applications, combined with
Microsoft Windows, which are commonly used in Web server environments.
The WAMP stack provides developers with the four key elements of a Web server:
an operating system, database, Web server and Web scripting software. The combined usage of
these programs is called a server stack.
In this stack, Microsoft Windows is the operating system (OS), Apache is the Web
server, MySQL handles the database components, while PHP, Python, or PERL represents the
dynamic scripting languages.
As a developer using a framework, you typically write code which conforms to some
kind of conventions that lets you "plug in" to the framework, delegating responsibility for the
communications, infrastructure and low-level stuff to the framework while concentrating on the
logic of the application in your own code. This "plugging in" aspect of Web development is
often seen as being in opposition to the classical distinction between programs and libraries, and
the notion of a "mainloop" dispatching events to application code is very similar to that found
in GUI programming.
Full-stack frameworks
Web Frameworks
A web framework is a code library that makes web development faster and easier by
providing common patterns for building reliable, scalable and maintainable web applications.
Since the early 2000s, professional web development projects have almost always used an existing web
framework except in very unusual situations.
These common operations include:
1. URL routing
2. Input form handling and validation
3. HTML, XML, JSON, and other output formats with a templating engine
4. Database connection configuration and persistent data manipulation through an ORM
5. Session storage and retrieval
6. Web security against Cross-site request forgery (CSRF), SQL Injection, Cross-site
Scripting (XSS) and other common malicious attacks
Django
Django is a high-level Python Web framework that encourages rapid development and
clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of
Web development, so you can focus on writing your app without needing to reinvent the wheel.
It’s free and open source.
Ridiculously fast.
Django was designed to help developers take applications from concept to completion as
quickly as possible.
Reassuringly secure.
Django takes security seriously and helps developers avoid many common security
mistakes.
Exceedingly scalable.
Some of the busiest sites on the Web leverage Django’s ability to quickly and flexibly
scale.
Testing Django applications.
Testing is vital. The articles on testing will introduce you to unit and integration testing
for your Django applications. You will also learn about the different packages and libraries
available to assist with writing and running test suites.
REST API
Learn how to create RESTful APIs using the Django REST Framework (DRF), an
application used for rapidly building RESTful APIs based on Django models.
Best practices
Learn Django best practices, recommended workflow, project structure and also how to
avoid common pitfalls when building Django projects.
Deployment
When your application is ready to leave the room and be deployed, the tutorials and
articles on deployment will cover deployment options available to you and how to deploy your
site to each one.
Identify Object Model
The very first step in designing a REST API based application is identifying the objects
which will be presented as resources. For a network-based application, object modeling is
fairly simple. There can be many things such as devices, managed entities, routers, modems etc.
For simplicity's sake, we will consider only two resources, i.e.
Devices
Configurations
Note that both objects/resources in our above model will have a unique identifier, which is
the integer id property.
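The object model above can be sketched as two small data classes and a lookup by resource path. Only the integer id comes from the text; every other field, the sample data, the helper name get and the path layout /<collection>/<id> are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class Device:
    id: int          # unique integer identifier
    name: str        # hypothetical field


@dataclass
class Configuration:
    id: int          # unique integer identifier
    device_id: int   # hypothetical link to the owning device
    setting: str     # hypothetical field


# In-memory stand-ins for a database, keyed by id.
devices = {1: Device(1, "router-a")}
configurations = {10: Configuration(10, 1, "hostname=router-a")}

collections = {"devices": devices, "configurations": configurations}


def get(path):
    """Resolve a GET on /<collection> or /<collection>/<id>."""
    parts = path.strip("/").split("/")
    items = collections[parts[0]]
    if len(parts) == 1:
        return [asdict(item) for item in items.values()]
    return asdict(items[int(parts[1])])


print(get("/devices"))
print(get("/configurations/10"))
```

Keeping one collection per resource and addressing a single item by its id is a common REST convention; a framework such as Django would replace the get helper with URL routing and views.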
The AWS IoT Device SDK helps you easily and quickly connect your hardware device
or your mobile application to AWS IoT Core. The AWS IoT Device SDK enables your devices
to connect, authenticate, and exchange messages with AWS IoT Core using the MQTT, HTTP,
or WebSockets protocols. The AWS IoT Device SDK supports C, JavaScript, and Arduino, and
includes the client libraries, the developer guide, and the porting guide for manufacturers. You
can also use an open source alternative or write your own SDK.
Device Gateway
The Device Gateway serves as the entry point for IoT devices connecting to AWS. The
Device Gateway manages all active device connections and implements semantics for multiple
protocols to ensure that devices are able to securely and efficiently communicate with AWS IoT
Core.
AWS IoT Core is a managed cloud service that lets connected devices easily and securely
interact with cloud applications and other devices. AWS IoT Core can support billions of devices
and trillions of messages, and can process and route those messages to AWS endpoints and to
other devices reliably and securely.
AWS IoT Core also makes it easy to use AWS services like AWS Lambda, Amazon
Kinesis, Amazon S3, Amazon SageMaker, Amazon DynamoDB, Amazon CloudWatch, AWS
CloudTrail, and Amazon QuickSight, to build IoT applications that gather, process, analyze and
act on data generated by connected devices, without having to manage any infrastructure.
SKYNET
Skynet offers a realtime websocket API as well as a Node.JS NPM module to make
event-driven IoT development fast and easy. When nodes and devices register with Skynet, they
are assigned a unique id known as a UUID along with a security token.
Upon connecting your node or device to Skynet, you can query and update devices on the
network and send machine-to-machine (M2M) messages in an RPC-style fashion.
IoT-Messaging platform
Here’s something new in the communication sector in the emerging field of the Internet
of Things (IoT). Skynet is an open source Instant Messaging (IM) service for connected devices
and services, launched recently.
Skynet, not to be confused with the Artificial Intelligence company that Google bought
last month, is a Cloud-based MQTT-powered network that scales to meet any needs whether the
nodes are smart home devices, sensors, cloud resources, drones, Arduinos, Raspberry Pis, among
others. It is powered by Node.JS, known for fast, event-driven operations, ideal for nodes and
devices such as RaspberryPi, Arduino, and Tessel.
The single SkyNet API supports the following IoT protocols: HTTP, REST,
WebSockets, MQTT (Message Queue Telemetry Transport), and CoAP (Constrained
Application Protocol) for guaranteed message delivery and low-bandwidth satellite
communications.
Every connected device is assigned a 36 character UUID and secret token that act as the
device’s strong credentials. Security permissions can be assigned to allow device discoverability,
configuration, and messaging.
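To show why the identifier is 36 characters long, the sketch below generates a UUID and a secret token locally with the Python standard library. It does not call SkyNet; how the service itself creates these values is not stated in the text, and the token length is an assumption.

```python
import secrets
import uuid

# A random (version 4) UUID in its canonical text form:
# 32 hexadecimal digits plus 4 hyphens = 36 characters.
device_uuid = str(uuid.uuid4())

# A random secret token; 20 bytes rendered as 40 hex characters.
# The token length is an assumption, not a documented SkyNet value.
device_token = secrets.token_hex(20)

print(device_uuid, len(device_uuid))
print(device_token)
```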
SkyNet recently released its IoT Hub which allows the user to connect smart devices with
and without IP addresses directly to SkyNet, including Nest, Philips Hue lightbulbs, Belkin
WeMo devices, Insteon devices, and other not-so-smart devices such as serial port devices and RF (radio
frequency) devices.
Not only does this allow any device to be connected to the Internet but it also allows
people to message smart devices without going through the manufacturers’ clouds and apps.
UNIT - V
IoT design
In the world of IoT, user research and service design are more crucial than ever. While
early adopters are eager to try out new technology, many others are reluctant to adopt it
and cautious about using it because they do not feel confident with it. For your IoT
solution to become widely adopted, you need to dig deep into users’ needs in order to find out
where a problem truly worth solving lies and what the real end-user value of the solution is. You
also need to understand what the barriers to adopting the new technology might be, in general and
for your solution specifically. For deciding on your feature set, you need research too. The features
that might be valuable and highly relevant for the tech early adopters may be uninteresting for
the majority of the users and vice versa, so you need to plan carefully what features to include
and in which order.
IoT solutions typically consist of multiple devices with different capabilities and both
physical and digital touchpoints. The solution may also be provided in co-operation with
multiple different service providers. It is not enough to design one of the touchpoints well;
instead you need to take a holistic look across the whole system, the role of each device and
service, and the conceptual model of how the user understands and perceives the system. The whole
system needs to work seamlessly together in order to create a meaningful experience.
As IoT solutions are placed in a real-world context, the consequences can be
serious when something goes wrong. At the same time, the users of IoT solutions may be
wary of using new technology, so building trust should be one of your main design drivers. Trust
is built slowly and lost easily, so you really need to make sure that every interaction with the
product/service builds trust rather than breaks it. What does this mean in practice? First of all, it
means understanding possible error situations related to the context of use, HW, SW and network as
well as to user interactions, and trying to prevent them. If error situations still
occur, it means appropriately informing the user about them and helping them to recover.
Secondly, it means considering data security & privacy as key elements of your design. It is
really important for users to feel that their private data is safe, that their home, working environment
and everyday objects cannot be hacked, and that their loved ones are not put at risk. Thirdly, quality
assurance is critical, and it should focus not only on testing the SW but on testing the end-to-end
system in a real-world context.
IoT solutions exist at the crossroads of the physical and digital worlds. Commands given
through digital interfaces may produce real world effects, but unlike digital commands, the
actions happening in the real world cannot necessarily be undone. In a real-world context, lots
of unexpected things can happen, and at the same time the user should be able to feel safe and in
control. The context also places other kinds of requirements on the design. Depending on the
physical context, the goal might be to minimize distraction of the user or, e.g., to design devices
that hold up against changing weather conditions. IoT solutions in homes, workplaces and public
areas are typically multi-user systems and thus less personal than, e.g., screen-based solutions
used in smartphones, which also brings into the picture the social context where the solution is used
and its requirements for the design.
Due to the real world context of the IoT solutions, regardless of how carefully you design
things and aim to build trust, something unexpected will happen at some point and your solution
is somehow going to fail. In these kinds of situations, it is of utmost importance that you have built
a strong brand that truly resonates with the end users. When they feel connected to your brand,
they will be more forgiving about the system failures and will still keep on using your solution.
While designing your brand, you must keep in mind, that trust should be a key element of the
brand, one of the core brand values. This core value should also be reflected in the rest of the
brand elements, like the choice of color, tone of voice, imagery etc.
Typically HW and SW have quite different lifespans, but as a successful IoT solution needs
both the HW and SW elements, the lifespans should be aligned. At the same time, IoT solutions
are hard to upgrade, because once the connected object is placed somewhere, it is not so easy to
replace it with a newer version, especially if the user would need to pay for the upgrade and even
the software within the connected object may be hard to update due to security and privacy
reasons. Due to these factors and to avoid costly hardware iterations, it’s crucial to get the
solution right, from the beginning of implementation. What this means from the design
perspective is that prototyping and rapid iteration of both the HW and the whole solution are
essential in the early stages of the project. New, more creative ways of prototyping and faking
the solution are needed.
IoT solutions can easily generate tons of data. However, the idea is not to hoard as much
data as possible, but instead to identify the data points that are needed to make the solution
functional and useful. Still, the amount of data may be vast, so it’s necessary for the designer to
understand the possibilities of data science and how to make sense of the data. Data science
provides a lot of opportunities to reduce user friction, i.e. reducing use of time, energy and
attention or diminishing stress. It can be used to automate repeated context dependent decisions,
to interpret intent from incomplete/inadequate input or to filter meaningful signals from noise.
Understanding what data is available and how it can be used to help the user is a key element in
designing successful IoT services.
IoT and data remain intrinsically linked together. Data consumed and produced keeps
growing at an ever expanding rate. This influx of data is fueling widespread IoT adoption as there
will be nearly 30.73 billion IoT connected devices by 2020. The Internet of Things (IoT) is an
interconnection of several devices, networks, technologies, and human resources to achieve a
common goal. A variety of IoT-based applications are being used in different sectors and
have succeeded in providing huge benefits to the users.
The data generated from IoT devices turns out to be of value only if it gets subjected to
analysis, which brings data analytics into the picture. Data Analytics (DA) is defined as a process,
which is used to examine big and small data sets with varying data properties to extract
meaningful conclusions and actionable insights. These conclusions are usually in the form of
trends, patterns, and statistics that aid business organizations in proactively engaging with data to
implement effective decision-making processes.
Data Analytics has a significant role to play in the growth and success of IoT applications
and investments. Analytics tools will allow the business units to make effective use of their
datasets as explained in the points listed below.
Volume: There are huge clusters of data sets that IoT applications make use of. The business
organizations need to manage these large volumes of data and need to analyze the same for
extracting relevant patterns.
Structure: IoT applications involve data sets that may have a varied structure as unstructured,
semi-structured and structured data sets. There may also be a significant difference in the data
formats and types.
Data analytics will allow the business executive to analyze all of these varying sets of data
using automated tools and software.
Driving Revenue: The use of data analytics in IoT investments will allow the business units to
gain an insight into customer preferences and choices. This would lead to the development of
services and offers as per the customer demands and expectations. This, in turn, will improve
the revenues and profits earned by the organizations.
Competitive Edge
IoT is a buzzword in the current era of technology and there are numerous IoT application
developers and providers present in the market. The use of data analytics in IoT investments will
enable a business unit to offer better services and will, therefore, provide the ability to gain a
competitive edge in the market.
Streaming Analytics: This form of data analytics is also referred to as event stream processing,
and it analyzes huge in-motion data sets.
Real-time data streams are analyzed in this process to detect urgent situations and trigger immediate
actions.
Spatial Analytics: This is the data analytics method that is used to analyze geographic patterns
to determine the spatial relationship between the physical objects.
Time Series Analytics: As the name suggests, this form of data analytics is based upon the
time-based data which is analyzed to reveal associated trends and patterns.
IoT applications, such as weather forecasting applications and health monitoring systems can
benefit from this form of data analytics method.
Prescriptive Analysis: This form of data analytics is the combination of descriptive and
predictive analysis. It is applied to understand the best steps of action that can be taken in a
particular situation.
Commercial IoT applications can make use of this form of data analytics to gain better
conclusions.
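Two of the analytics types above, time series and streaming analytics, can be illustrated on a handful of sensor readings. This is a minimal sketch with made-up temperatures; the function names, the window size and the alert limit are assumptions, and a production system would use a dedicated stream processing engine rather than plain Python.

```python
from collections import deque


def moving_average(values, window):
    """Time series view: average of the last `window` readings."""
    recent = deque(maxlen=window)
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


def urgent_readings(stream, limit):
    """Streaming view: yield (index, value) as soon as a reading exceeds limit."""
    for index, value in enumerate(stream):
        if value > limit:
            yield index, value


readings = [20.0, 22.0, 24.0, 40.0, 26.0]  # made-up temperatures
trend = moving_average(readings, window=2)
alerts = list(urgent_readings(readings, limit=35.0))

print(trend)   # [20.0, 21.0, 23.0, 32.0, 33.0]
print(alerts)  # [(3, 40.0)]
```

The moving average smooths the series to reveal its trend, while the generator reacts to each reading as it arrives, which is the essence of processing data in motion.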
IoT Analytics
The first thing to understand about analytics on IoT data is that it involves datasets
generated by sensors, which are now both cheap and sophisticated enough to support a
seemingly endless variety of use cases.
Apache Hadoop
Hadoop is an open source distributed processing framework that manages data processing
and storage for big data applications in scalable clusters of computer servers. It's at the center of
an ecosystem of big data technologies that are primarily used to support advanced analytics
initiatives, including predictive analytics, data mining and machine learning.
Hadoop systems can handle various forms of structured and unstructured data, giving
users more flexibility for collecting, processing and analyzing data than relational databases
and data warehouses provide.
Modules
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-
throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Hadoop Ozone: An object store for Hadoop.
Hadoop Submarine: A machine learning engine for Hadoop.
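The MapReduce idea named in the module list can be sketched in plain Python on a single machine. This does not use the Hadoop API and nothing here runs in parallel; it only shows the map, shuffle and reduce phases that Hadoop distributes across a cluster. The sensor records and function names are made up for illustration.

```python
from collections import defaultdict

# Made-up (sensor_id, temperature) records.
records = [("s1", 20), ("s2", 30), ("s1", 24), ("s2", 34), ("s3", 18)]


def map_phase(record):
    """Emit a (key, value) pair for each input record."""
    sensor_id, temperature = record
    return sensor_id, temperature


def shuffle(pairs):
    """Group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values of one key into a single result."""
    return key, max(values)


mapped = [map_phase(r) for r in records]
grouped = shuffle(mapped)
max_per_sensor = dict(reduce_phase(k, v) for k, v in grouped.items())

print(max_per_sensor)  # {'s1': 24, 's2': 34, 's3': 18}
```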
Despite the emergence of alternative options, especially in the cloud, Hadoop is still an
important and valuable technology for big data users for the following reasons:
It can store and process vast amounts of structured, semistructured and unstructured data,
quickly.
It protects application and data processing against hardware failures. If one node in a cluster
goes down, processing jobs are automatically redirected to other nodes to ensure applications
continue to run.
It doesn't require that data be preprocessed before being stored. Organizations can store raw
data in HDFS and decide later how to process and filter it for specific analytics uses.
It's scalable, so companies can easily add more nodes to enable their systems to handle more
data.
It can support real-time analytics to help drive better operational decision-making, as well as
batch workloads for historical analysis.
Apache Oozie
Apache Oozie is a workflow scheduler for Hadoop. It is a system which runs the
workflow of dependent jobs. Here, users are permitted to create Directed Acyclic
Graphs of workflows, which can be run in parallel and sequentially in Hadoop.
Oozie runs as a service in the cluster and clients submit workflow definitions for
immediate or later processing.
Oozie workflow consists of action nodes and control-flow nodes.
An action node represents a workflow task, e.g., moving files into HDFS, running a
MapReduce, Pig or Hive job, importing data using Sqoop, or running a shell script or a program
written in Java.
Control-flow nodes govern how execution moves between actions. Start Node, End Node, and
Error Node fall under this category of nodes.
Start Node, designates the start of the workflow job.
End Node, signals end of the job.
Error Node designates the occurrence of an error and corresponding error message to be
printed.
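The notion of a workflow as a Directed Acyclic Graph can be sketched with the graphlib module of the Python standard library (Python 3.9 or later). This is not an Oozie workflow definition, which is written in XML; the action names and their dependencies are hypothetical, and the sketch only computes the order in which actions could run.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow: each action lists the actions it depends on.
workflow = {
    "load_to_hdfs": set(),
    "pig_clean": {"load_to_hdfs"},
    "hive_report": {"pig_clean"},
    "sqoop_export": {"pig_clean"},
    "notify": {"hive_report", "sqoop_export"},
}


def run_workflow(dag):
    """Return the batches of actions in dependency order."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()              # raises CycleError if dag is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # these could run in parallel
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = run_workflow(workflow)
print(batches)
```

Each inner list holds actions whose dependencies are all satisfied, so hive_report and sqoop_export land in the same batch and could run in parallel, while the remaining actions run sequentially.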
Apache Spark
Speed − Spark helps to run an application in a Hadoop cluster up to 100 times faster in
memory and 10 times faster when running on disk. This is possible by reducing the number
of read/write operations to disk; Spark stores the intermediate processing data in memory.
Supports multiple languages − Spark provides built-in APIs in Java, Scala, and Python,
so you can write applications in different languages. Spark also comes with 80
high-level operators for interactive querying.
Advanced Analytics − Spark supports more than ‘Map’ and ‘Reduce’. It also supports SQL
queries, streaming data, machine learning (ML), and graph algorithms.
Apache Spark Core
Spark Core is the underlying general execution engine for the Spark platform that all other
functionality is built upon. It provides in-memory computing and the ability to reference
datasets in external storage systems.
Spark SQL
Spark SQL is a component on top of Spark Core that introduces a new data abstraction
called SchemaRDD (renamed DataFrame in later Spark releases), which provides support for
structured and semi-structured data.
Spark Streaming
Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming
analytics. It ingests data in mini-batches and performs RDD (Resilient Distributed Datasets)
transformations on those mini-batches of data.
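The mini-batch idea can be sketched in plain Python. This is not the Spark Streaming API; it only illustrates cutting a stream into small batches and applying the same transformations to each one. As a simplifying assumption, batches are cut by record count rather than by time interval, and the temperature readings, the 90 °F threshold, and the function names are all hypothetical.

```python
from itertools import islice


def mini_batches(stream, batch_size):
    """Group an incoming stream of records into mini-batches of at most batch_size."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the stream is exhausted
        yield batch


def transform(batch):
    """Map Celsius to Fahrenheit, then keep only the readings above 90 °F."""
    fahrenheit = [celsius * 9 / 5 + 32 for celsius in batch]
    return [value for value in fahrenheit if value > 90]


# Hypothetical sensor readings in degrees Celsius, arriving as a stream.
readings = [21, 22, 35, 23, 40, 22, 21]

results = [transform(batch) for batch in mini_batches(readings, 3)]
for number, hot_readings in enumerate(results, start=1):
    print(f"batch {number}: {hot_readings}")
```

Each batch is processed independently with the same map and filter steps, which is the pattern the paragraph above describes for transformations on mini-batches.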
GraphX
Apache Storm
Storm was originally created by Nathan Marz and his team at BackType, a social analytics
company. Later, BackType was acquired by Twitter, which open-sourced Storm. In a short
time, Apache Storm became a standard for distributed real-time processing, allowing you to
process large amounts of data, similar to Hadoop. Apache Storm is written in Java and
Clojure, and it continues to be a leader in real-time analytics. This tutorial will explore the
principles of Apache Storm, distributed messaging, installation, creating Storm topologies and
deploying them to a Storm cluster, the workflow of Trident, and real-time applications.
Storm is simple, and developers can write Storm topologies using any programming
language. The following characteristics make Storm ideal for real-time data processing
workloads. Storm is:
Fast – benchmarked as processing one million 100-byte messages per second per node
Fault-tolerant – when workers die, Storm will automatically restart them. If a node dies,
the worker will be restarted on another node.
Reliable – Storm guarantees that each unit of data (tuple) will be processed at least once
or exactly once. Messages are only replayed when there are failures.
Easy to operate – standard configurations are suitable for production on day one. Once
deployed, Storm is easy to operate.
Supervisor nodes – communicate with Nimbus through ZooKeeper, and start and stop
workers according to signals from Nimbus
The following key abstractions help to understand how Storm processes data:
Bolts – process input streams and produce output streams. They can: run functions; filter,
aggregate, or join data; or talk to databases.
Topologies – the overall calculation, represented visually as a network of spouts and bolts
(as in the following diagram)
Storm users define topologies for how to process the data when it comes streaming in
from the spout. When the data comes in, it is processed and the results are passed into Hadoop.
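The flow from a spout through bolts can be sketched in plain Python. This is not the Storm API (real topologies are submitted to a cluster and run in parallel across workers); it only shows how a spout feeds a stream of tuples into a chain of bolts. The device readings, the threshold, and every function name below are hypothetical.

```python
def sensor_spout():
    """Spout: the source of the stream. Emits (device_id, temperature) tuples."""
    yield from [("dev1", 21), ("dev2", 38), ("dev1", 40), ("dev3", 19)]


def filter_bolt(stream, threshold):
    """Bolt: consumes an input stream and emits only the tuples above the threshold."""
    for device_id, temperature in stream:
        if temperature > threshold:
            yield device_id, temperature


def count_bolt(stream):
    """Bolt: aggregates the incoming stream into a count of tuples per device."""
    counts = {}
    for device_id, _temperature in stream:
        counts[device_id] = counts.get(device_id, 0) + 1
    return counts


def run_topology():
    """Topology: wires the spout to the filter bolt, and the filter bolt to the count bolt."""
    return count_bolt(filter_bolt(sensor_spout(), threshold=30))


result = run_topology()
print(result)
```

In a real deployment the final bolt would write its results to a store such as Hadoop instead of returning a dictionary.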
Real-time Analytics
Real-time analytics is the use of all available enterprise data and resources when they are
needed. It consists of dynamic analysis and reporting based on data entered into a system
less than one minute before the actual time of use. Real-time analytics is also known as
real-time data analytics, real-time data integration, and real-time intelligence.
A Storm cluster has three sets of nodes: Nimbus, ZooKeeper, and Supervisor nodes. The master
is Nimbus, which runs on the master node or machine. It is responsible for distributing jobs
across the cluster. ZooKeeper is a distributed coordination service that has to be installed
separately from Storm; it keeps the cluster state so that the cluster stays in a running state.
Nimbus assigns the work, the Supervisor nodes start and stop the workers that run it, and
ZooKeeper coordinates between them; if a worker fails, the Supervisor restarts it.
Apache Storm
Apache Storm is a free and open source distributed realtime computation system. Apache
Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing
what Hadoop did for batch processing. Apache Storm is simple, can be used with any
programming language, and is a lot of fun to use!
Apache Storm has many use cases: realtime analytics, online machine learning,
continuous computation, distributed RPC, ETL, and more. Apache Storm is fast: a benchmark
clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant,
guarantees your data will be processed, and is easy to set up and operate.
____________________________________