UNIT 4: IoT
Data-Acquiring and Data-Storage Functions for IoT/M2M Devices Data and Messages
Data Generation
Data is generated at devices and later on transferred to the Internet through a gateway.
Services, business processes and business intelligence use data. Valid, useful and relevant data can be categorised into three categories for storage: data alone; data as well as results of processing; and only the results of data analytics.
Following are three cases for storage:
1. Data which needs to be repeatedly processed, referenced or audited in future, and therefore, data alone needs to be stored.
2. Data which needs processing only once, and the results are used at a later time using the analytics, and both the data and results of processing and analytics are stored. Advantages of this case are quick visualisation and report generation without reprocessing. Also, the data is available for reference or auditing in future.
3. Online, real-time or streaming data which need to be processed, and the results of this processing and analysis need storage.
4. Data from a large number of devices and sources categorises into a fourth category called Big data. Data is stored in databases at a server or in a data warehouse or on a cloud as Big data.
Data Store
A data store is a data repository of a set of objects which integrate into the store.
Features of a data store are:
• Objects in a data store are modelled using classes which are defined by the database schemas.
• A data store is a general concept. It includes data repositories such as database, relational database, flat file, spreadsheet, mail server, web server, directory services and VMware.
• A data store may be distributed over multiple nodes. Apache Cassandra is an example of a distributed data store.
• A data store may consist of multiple schemas or may consist of data in only one schema. An example of a single-schema data store is a relational database.
Consider goods with RFID tags. When goods move from one place to another, the IDs of the goods as well as their locations are needed in tracking or inventory control applications. Spatial storage is storage as a spatial database, which is optimised to store the data and later on answer queries from the applications.
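A minimal sketch of the idea follows, using Python's built-in sqlite3 module (the tag IDs, coordinates and table name are illustrative assumptions, not from the text; a real spatial database would add spatial indexes and richer spatial query operators):

    import sqlite3

    # Illustrative goods-tracking table: RFID tag ID plus last known location.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE goods (rfid TEXT PRIMARY KEY, lat REAL, lon REAL)")
    conn.executemany("INSERT INTO goods VALUES (?, ?, ?)",
                     [("tag-01", 28.61, 77.20), ("tag-02", 19.07, 72.87)])

    # Spatial-style query: which tagged goods lie inside a bounding box?
    in_region = conn.execute(
        "SELECT rfid FROM goods WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
        (25.0, 30.0, 75.0, 80.0)).fetchall()
    print(in_region)  # [('tag-01',)]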
ORGANISING THE DATA
Data can be organised in a number of ways, for example, as objects, files, data stores, databases, relational databases and object-oriented databases.
Databases
Required data values are organised as database(s) so that selected values can be retrieved later.
Database
One popular method of organising data is a database, which is a collection of data. This collection is organised into tables. A table provides a systematic way for access, management and updates.
Relational Database
A relational database is a collection of data in multiple tables which relate to each other through special fields, called keys (primary key, foreign key and unique key).
An Object Oriented Database (OODB) is a collection of objects, which saves the objects in object-oriented design.
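A minimal sketch of how keys relate tables, using Python's built-in sqlite3 module (the table and column names are illustrative assumptions): the primary key uniquely identifies each row, and the foreign key in one table references the primary key of another.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

    # Primary key: unique identity of each flavour row.
    conn.execute("""CREATE TABLE flavours (
        flavour_id INTEGER PRIMARY KEY,
        name       TEXT UNIQUE NOT NULL)""")

    # Foreign key: each sale row relates back to exactly one flavour row.
    conn.execute("""CREATE TABLE sales (
        sale_id    INTEGER PRIMARY KEY,
        flavour_id INTEGER NOT NULL REFERENCES flavours(flavour_id),
        quantity   INTEGER NOT NULL)""")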
Database Management System
A Database Management System (DBMS) is a software system which contains a set of programs specially designed for the creation and management of data stored in a database. Database transactions can be performed on a database or relational database.
Atomicity, Data Consistency, Data Isolation and Durability (ACID) Rules
The database transactions must maintain atomicity, data consistency, data isolation and durability during transactions.
Atomicity means a transaction must complete in full, treating it as indivisible.
Consistency means that data after the transactions should remain consistent. For example, the sum of chocolates sent should equal the sums of sold and unsold chocolates for each flavour after the transactions on the database.
Isolation means transactions between tables are isolated from each other.
Durability means that after completion of a transaction, the previous transaction cannot be recalled. Only a new transaction can affect any change.
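A minimal sketch of atomicity in practice, using Python's built-in sqlite3 module (the chocolate-stock table is an illustrative assumption): if any statement fails mid-transaction, the whole transaction rolls back, so no partial update becomes visible.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (flavour TEXT PRIMARY KEY, unsold INTEGER)")
    conn.execute("INSERT INTO stock VALUES ('milk', 100)")
    conn.commit()

    try:
        with conn:  # wraps the statements in one atomic transaction
            conn.execute("UPDATE stock SET unsold = unsold - 5 WHERE flavour = 'milk'")
            raise RuntimeError("simulated failure mid-transaction")
    except RuntimeError:
        pass  # transaction rolled back; no partial update is visible

    print(conn.execute("SELECT unsold FROM stock").fetchone())  # (100,)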
Distributed Database
A Distributed Database (DDB) is a collection of logically interrelated databases over a computer network. Distributed DBMS means a software system that manages a distributed database. A distributed database system has the ability to access remote sites and transmit queries. The features of a distributed database system are:
• DDB is a collection of databases which are logically related to each other.
• Cooperation exists between the databases in a transparent manner. Transparent means that each user within the system may access all of the data within all of the databases as if they were a single database.
• DDB should be 'location independent', which means the user is unaware of where the data is located, and it is possible to move the data from one physical location to another without affecting the user.
Consistency, Availability and Partition-Tolerance Theorem
The Consistency, Availability and Partition-Tolerance Theorem (CAP theorem) is a theorem for distributed computing systems. The theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the Consistency, Availability and Partition-tolerance (CAP) guarantees. This is due to the fact that a network failure can occur during communication among the distributed computing nodes. Partitioning of a network therefore needs to be tolerated. Hence, at all times, there will be either consistency or availability.
Consistency means 'Every read receives the most recent write or an error'. When a message or data is sought, the network generally issues a notification of time-out or read error. During an interval of a network failure, the notification may not reach the requesting node(s).
Availability means 'Every request receives a response, without guarantee that it contains the most recent version of the information'. Due to the interval of network failure, it may happen that the most recent version of the message or data requested is not available.
Partition tolerance means 'The system continues to operate despite an arbitrary number of messages being dropped by the network between the nodes'. The system continues to work even if a partition causes communication interruption between nodes. During the interval of a network failure, the network will have two separate sets of networked nodes. Since failure can always occur, the partitioning needs to be tolerated.
Query Processing
A query means an application seeking a specific data set from a database. Query processing means using a process and getting the results of the query made from a database. The process should use a correct as well as efficient execution strategy. Five steps in processing are:
1. Parsing and translation: This step translates the query into an internal form, into a relational algebraic expression, and then a parser checks the syntax and verifies the relations.
2. Decomposition of the complete query process into micro-operations using analysis (for the number of micro-operations required for the operations), conjunctive and disjunctive normalisation, and semantic analysis.
3. Optimisation, which means optimising the cost of processing. The cost means the number of micro-operations generated in processing.
4. Evaluation plan: A query-execution engine (software) takes a query-evaluation plan and executes that plan.
5. Returning the results of the query.
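A minimal sketch of these steps from the application side, using Python's built-in sqlite3 module (the table and index names are illustrative assumptions): EXPLAIN QUERY PLAN exposes the evaluation plan the engine chose (steps 1 to 4) before the query is executed and the results returned (step 5).

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device_id INTEGER, value REAL)")
    conn.execute("CREATE INDEX idx_device ON readings(device_id)")

    # Ask the engine for its chosen evaluation plan before execution.
    for row in conn.execute(
            "EXPLAIN QUERY PLAN SELECT value FROM readings WHERE device_id = 7"):
        print(row)  # e.g. '... SEARCH readings USING INDEX idx_device ...'

    # Execute the plan and return the results (step 5).
    results = conn.execute(
        "SELECT value FROM readings WHERE device_id = 7").fetchall()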
NOSQL
NOSQL stands for No-SQL or 'Not Only SQL', that is, data storage which does not integrate with applications that are based on SQL. NOSQL is used in cloud data stores. NOSQL may consist of a class of non-relational data storage systems, flexible data models and multiple schemas.
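A minimal illustration of this schema flexibility in plain Python (a stand-in for a document store, not a real NOSQL client API; the device records are made up): documents in the same store may carry different fields, unlike rows of a relational table.

    # A document store keeps schema-flexible records keyed by an ID;
    # each "document" may carry different fields, unlike relational rows.
    store = {}
    store["device:1"] = {"type": "sensor", "temp_c": 21.5}
    store["device:2"] = {"type": "actuator", "state": "open", "firmware": "2.1"}

    # Queries filter documents by whatever fields happen to exist.
    sensors = [d for d in store.values() if d.get("type") == "sensor"]
    print(sensors)  # [{'type': 'sensor', 'temp_c': 21.5}]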
Extract, Transform and Load
Extract, Transform and Load or ETL is a system which enables the usage of databases, especially the ones stored at a data warehouse. Extract means obtaining data from homogeneous or heterogeneous data sources. Transform means transforming and storing the data in an appropriate structure or format. Load means loading the structured data into the final target database, data store or data warehouse. All three phases can execute in parallel. ETL system usages are for integrating data from multiple applications (systems) hosted separately.
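A minimal single-process sketch of the three ETL phases in Python (the CSV fields, unit conversion and target table are illustrative assumptions); a production ETL system would run the phases in parallel, as noted above.

    import csv, io, sqlite3

    # Extract: obtain raw rows from a (here, in-memory) CSV source.
    raw = io.StringIO("device,temp_f\nacvm1,98.6\nacvm2,77.0\n")
    rows = list(csv.DictReader(raw))

    # Transform: convert units and restructure into the target schema.
    transformed = [(r["device"], (float(r["temp_f"]) - 32) * 5 / 9) for r in rows]

    # Load: insert the structured rows into the target data store.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE readings (device TEXT, temp_c REAL)")
    warehouse.executemany("INSERT INTO readings VALUES (?, ?)", transformed)
    warehouse.commit()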
BUSINESS PROCESSES
A business process (BP) consists of a series of activities which serve a particular specific result. The BP is a representation, process matrix or flowchart of a sequence of activities with interleaving decision points.
Business Intelligence
Business intelligence is a process which enables a business service to extract new facts and knowledge and then undertake better decisions. The new facts and knowledge follow from the earlier results of data processing, aggregation and then analysing those results.
Distribution of processes reduces complexity and communication costs, enables faster responses and gives a smaller processing load at the central system.
A Distributed Business Process System (DBPS) is a collection of logically interrelated business processes in an enterprise network. DBPS means a software system that manages the distributed BPs.
DBPS features are:
• DBPS is a collection of logically related BPs, like a DDBS. DBPS exists as cooperation between the BPs in a transparent manner. Transparent means that each user within the system may access all of the process decisions within all of the processes as if they were a single business process.
• DBPS should possess 'location independence', which means the enterprise BI is unaware of where the BPs are located. It is possible to move the results of analytics and knowledge from one physical location to another without affecting the user.
ANALYTICS
The Internet of Things can use analytics: new facts are found, and those facts enable taking decisions on new option(s) to maximise the profits from the machines. Analytics requires the data to be available and accessible. It uses arithmetic and statistical methods, data mining and advanced methods, such as machine learning, to find new parameters and information which add value to the data. Analytics enables building models based on the selection of the right data. Later, the models are tested and used for services and processes.
Analytics Phases
Analytics has three phases before deriving new facts and providing business intelligence. These are:
1. Descriptive analytics enables deriving additional value from visualisations and reports.
2. Predictive analytics is advanced analytics which enables extraction of new facts and knowledge, and then predicts or forecasts.
3. Prescriptive analytics enables derivation of additional value and undertaking better decisions for new option(s) to maximise the profits.
Descriptive Analytics
Descriptive analytics means finding the aggregates, frequencies of occurrences and mean values. Descriptive analytics enables the following:
• Actions, such as Online Analytical Processing (OLAP) for the analytics
• Reporting or generating spreadsheets
• Visualisations or dashboard displays of the analysed results
• Creation of indicators, called key performance indicators.
Descriptive Analytics Methods
• Spreadsheet-based reports and data visualisations: Results of descriptive analysis can be presented in a spreadsheet format before creating the data visuals for the user. A spreadsheet enables user visualisation of 'what-if' scenarios.
• Descriptive statistics-based reports and data visualisations: Descriptive analysis can also use descriptive statistics. Statistical analysis means finding peaks, minima, variance, probabilities and statistical parameters.
• Data mining and machine learning methods in analytics: Data mining analysis means the use of algorithms which extract hidden or unknown information or patterns from large amounts of data. Machine learning means modelling of the specific tasks.
• Online analytical processing (OLAP) in analytics: OLAP enables viewing of analysed data up to the desired granularity. OLAP enables obtaining summarised information and automated reports from a large-volume database.
OLAP is an interactive system to show different summaries of multidimensional data by interactively selecting the attributes in a multidimensional data cube.
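A minimal sketch of an OLAP-style summary, assuming the pandas library is available (the ACVM sales data are made up): a pivot table aggregates one measure over two chosen attributes, analogous to slicing a multidimensional data cube at the desired granularity.

    import pandas as pd

    # Made-up ACVM sales events: one row per sale.
    df = pd.DataFrame({
        "machine": ["acvm1", "acvm1", "acvm2", "acvm2"],
        "flavour": ["milk", "fruit", "milk", "milk"],
        "qty":     [3, 1, 2, 4],
    })

    # OLAP-style summary: one cell per (machine, flavour) combination,
    # like selecting two attributes of a multidimensional data cube.
    cube = pd.pivot_table(df, values="qty", index="machine",
                          columns="flavour", aggfunc="sum", fill_value=0)
    print(cube)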
Advanced Analytics: Predictive Analytics
Predictive analytics answers the question "What will happen?" Predictive analytics is advanced analytics. The user interprets the outputs from advanced analytics using descriptive analytics methods, such as data visualisation. For example, output predictions are visualised along with the yearly sales growth of the past five years, and predict the next two years' sales.
Predictive analytics uses algorithms, such as regression analysis, correlation, optimisation and multivariate statistics, and techniques such as modelling, simulation, machine learning and neural networks.
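A minimal sketch of one of the techniques named above, regression analysis, assuming NumPy is available (the yearly sales figures are made up): a least-squares line is fitted to five years of sales and extrapolated two years ahead.

    import numpy as np

    # Made-up yearly sales for the past five years.
    years = np.array([1, 2, 3, 4, 5])
    sales = np.array([110.0, 124.0, 139.0, 151.0, 166.0])

    # Fit a degree-1 least-squares line (simple linear regression).
    slope, intercept = np.polyfit(years, sales, 1)

    # Forecast the next two years from the fitted trend.
    for year in (6, 7):
        print(year, round(slope * year + intercept, 1))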
Prescriptive Analytics
This final phase suggests actions for deriving benefits from predictions, and shows the implications of the decision options, the optimal solutions, new resource allocation strategies or risk mitigation strategies. Prescriptive analytics suggests the best course of actions in the given state or set of inputs and rules.
Event Analytics
Event analytics use event data for events tracking and event reporting. Event analytics generate event reports using event metrics (event counts, events acted upon, events pending action, rate of new events generation) in each category of events.
An event has the following components:
• Category: an event of a chocolate purchase in an ACVM, for example, belongs to one category, and an event of reaching a predefined threshold of sales for a specific chocolate flavour belongs to another category
• Action: sending a message from the ACVM on completing the predefined sales is the action taken on the event
• Label (optional)
• Value (optional): on an event, messaging the number of chocolates of that flavour sold or remaining
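A minimal sketch of computing the event metrics named above over a made-up event stream, in plain Python:

    from collections import Counter

    # Made-up event stream: (category, acted_upon) pairs.
    events = [("purchase", True), ("purchase", True),
              ("threshold_reached", False), ("purchase", False)]

    counts = Counter(cat for cat, _ in events)            # event counts per category
    acted = Counter(cat for cat, done in events if done)  # events acted upon
    pending = {cat: counts[cat] - acted.get(cat, 0)       # events pending action
               for cat in counts}
    print(counts, acted, pending)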
Big data is multi-structured data while RDBMS maintains more structured data. The open source software Hadoop and MapReduce enable the storage and analysis of massive amounts of data. Hadoop Distributed File System (HDFS), Mahout (a library of machine learning algorithms) and HiveQL (a SQL-like scripting language) are used for Big data analytics in the Hadoop ecosystem.
MapReduce is a programming model. Large data sets are processed on a cluster of nodes using MapReduce. The same node runs the algorithm using the data sets at HDFS, and processing is done at that node itself.
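A minimal single-machine sketch of the MapReduce programming model in plain Python (this illustrates the map-shuffle-reduce pattern only; it is not the Hadoop API, which distributes the same pattern across cluster nodes):

    from collections import defaultdict
    from itertools import chain

    def map_phase(record):
        # Map: emit (key, 1) pairs from each input record.
        return [(word, 1) for word in record.split()]

    def shuffle(pairs):
        # Shuffle: group intermediate pairs by key.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(groups):
        # Reduce: combine each key's values into a final result.
        return {key: sum(values) for key, values in groups.items()}

    records = ["milk fruit milk", "fruit milk"]
    pairs = chain.from_iterable(map_phase(r) for r in records)
    print(reduce_phase(shuffle(pairs)))  # {'milk': 3, 'fruit': 2}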
Hadoop is an open-source framework. The framework stores and processes big data. The clusters of computing nodes process that data using simple programming models. Processing takes place in a distributed environment. Hadoop accesses data in a sequential manner and performs batch processing. HBase is a database for big data. Data access in HBase is random access.
Figure 5.5 Berkeley data analytics stack architecture
KNOWLEDGE ACQUIRING, MANAGING AND STORING PROCESSES
Three processes for knowledge are Acquiring process, Managing process and Storing process.
IoT data sources continuously generate data, which the applications or processes acquire, organise and integrate or enrich using analytics. Knowledge discovery tools provide the knowledge at a particular point of time as more and more data is processed and analysed. Knowledge is an important asset of an enterprise.
Knowledge Management
Knowledge management (KM) is managing knowledge as new knowledge is regularly acquired, processed and stored. Knowledge management also provisions for replacing the earlier gathered knowledge and managing the life cycle of stored knowledge. A KM tool has processes for discovering, using, sharing, replacing with new, creating and managing the knowledge database and information of the enterprise.
Figure 5.6(a) shows a reference architecture for knowledge management. Figure 5.6(b) shows correspondences with the ITU-T reference model four layers and the OSI model layers.
The lowest layer has sublayers for devices data and streaming data sources, which provide input for analytics and knowledge. Databases, Business Support Systems (BSSs) and Operational Support Systems (OSSs) data can also be additional inputs.
The next higher layer has data adaptation and enrichment sublayers. The adaptation and enrichment sublayers adapt the data from the lowest layer into appropriate forms, such as database, structured data and unstructured data, so that it can be used for analytics and processing.
The next higher layer has processing and analytics sublayers. These sublayers are input to information access tools and knowledge discovery tools.
The highest layer has sublayers for knowledge acquiring, managing, storing and knowledge life-cycle management. Knowledge is acquired from the use of information access tools and knowledge discovery tools.
Figure 5.6 (a) A reference architecture for knowledge management (left hand side) and (b) correspondence in terms of ITU-T reference model and OSI layers for IoT/M2M (middle and right hand side)