IoT Chapter 3: Everything Generates Data
Instructor Materials
© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 3
Chapter 3: Everything Generates Data
3.1 Big Data
What is Big Data?
▪ Data is information that comes from a variety of sources, such as people, pictures, text, sensors, websites, and technology devices.
▪ Three characteristics that indicate an
organization may be dealing with Big Data:
• A large amount of data that increasingly
requires more storage space (volume).
• An amount of data that is growing exponentially
fast (velocity).
• Data that is generated in different formats
(variety).
▪ Examples of data amounts collected by
sensors:
• One autonomous car can generate 4,000
gigabits (Gb) of data per day.
• One smart connected home can produce as
much as 1 gigabyte (GB) of information a week.
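The two figures above use different units: gigabits (Gb) for the car and gigabytes (GB) for the home. A minimal Python sketch, taking the slide's numbers at face value and assuming 8 bits per byte, puts both on a per-day gigabyte scale so they can be compared:

```python
# Bits vs. bytes: 8 bits = 1 byte, so 1 gigabyte (GB) = 8 gigabits (Gb).
BITS_PER_BYTE = 8
DAYS_PER_WEEK = 7

def gigabits_to_gigabytes(gigabits: float) -> float:
    """Convert gigabits (Gb) to gigabytes (GB)."""
    return gigabits / BITS_PER_BYTE

# Figures quoted on the slide.
car_gb_per_day = gigabits_to_gigabytes(4_000)  # 4,000 Gb/day -> 500 GB/day
home_gb_per_day = 1 / DAYS_PER_WEEK            # 1 GB/week -> ~0.14 GB/day

# How many smart homes produce as much data per day as one autonomous car?
ratio = car_gb_per_day / home_gb_per_day

print(f"Autonomous car: {car_gb_per_day:.0f} GB per day")
print(f"Smart home:     {home_gb_per_day:.2f} GB per day")
print(f"One car is roughly {ratio:,.0f} smart homes")
```

With these figures, one autonomous car produces 500 GB per day, about as much as 3,500 smart connected homes.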
Does the Business Generate Big Data?
Large Datasets
▪ Companies do not necessarily have to
generate their own Big Data.
▪ There are sources of free data sets
available, ready to be used and
analyzed.
Lab – Database Search
Where is Big Data Stored?
What are the Challenges of Big Data?
▪ IBM’s Big Data estimates conclude that
“each day we create 2.5 quintillion bytes
of data”.
▪ Five major storage problems with Big
Data:
• Management
• Security
• Redundancy
• Analytics
• Access
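To get a feel for the scale behind these storage problems, the IBM estimate can be converted into more familiar units. This short Python sketch assumes decimal (SI) prefixes, where 1 exabyte (EB) is 10^18 bytes and 1 terabyte (TB) is 10^12 bytes:

```python
# IBM's estimate quoted on the slide: 2.5 quintillion bytes per day.
BYTES_PER_DAY = 2.5e18          # 1 quintillion = 10**18
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Decimal (SI) units, not binary units such as EiB or TiB.
EXABYTE = 10**18
TERABYTE = 10**12

exabytes_per_day = BYTES_PER_DAY / EXABYTE
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / TERABYTE

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

That works out to 2.5 EB per day, or roughly 29 TB every second, a rate that helps explain why management, redundancy, and access become difficult at this volume.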
Where Can We Store Big Data?
▪ Big Data is typically stored on multiple servers in data centers.
▪ Fog computing utilizes end-user clients or
“edge” devices to do a substantial amount of
the pre-processing and storage.
• Data from that pre-processed analysis can be fed back into a company's systems to modify processes if required.
• Communications to and from the servers and devices are quicker and require less bandwidth than constantly going out to the cloud.
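The sketch below illustrates the fog computing idea in Python: an edge device reduces a window of raw sensor readings to a small summary and forwards only that summary, plus any readings above an alert threshold, toward the cloud. The function name `summarize_window`, the threshold, and the sample readings are hypothetical:

```python
from statistics import mean

def summarize_window(readings: list[float], threshold: float) -> dict:
    """Pre-process one window of raw sensor readings on the edge device.

    Instead of forwarding every reading to the cloud, the edge device
    sends a small summary plus any readings that exceed the threshold.
    """
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": round(mean(readings), 2),
        "alerts": [r for r in readings if r > threshold],
    }

# Hypothetical temperature readings (degrees C) sampled by one sensor.
raw = [21.0, 21.2, 21.1, 35.4, 21.3, 21.2]
summary = summarize_window(raw, threshold=30.0)
print(summary)
```

The summary's size depends on the number of alerts rather than on the number of readings in the window, so a sensor sampling many times per second still sends only a few values upstream.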
The Cloud and Cloud Computing
▪ The cloud is a collection of data centers or groups of connected
servers.
▪ Cloud services for individuals include:
• Storage of data, such as pictures, music, movies, and emails.
• Access to many applications without downloading them onto a local device.
• Access to data and applications anywhere, anytime, and on any device.
▪ Cloud services for an enterprise include:
• Access to organizational data anywhere and at any time.
• Streamlined IT operations.
• Elimination or reduction of the need for onsite IT equipment, maintenance, and management.
• Reduced costs for equipment, energy, physical plant requirements, and personnel training.
Distributed Processing
▪ Distributed data processing takes a large volume of data and breaks it into smaller pieces.
▪ These smaller pieces are distributed to many locations to be processed by many computers.
▪ Each computer in the distributed architecture
analyzes its part of the Big Data picture (horizontal
scaling).
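The split, analyze, and combine pattern described above can be sketched in Python. Here a thread pool stands in for the many computers of a real distributed architecture, so the example shows the structure of the work rather than a speedup; the function names and the choice of a mean as the analysis are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data: list[int], n_chunks: int) -> list[list[int]]:
    """Break a large dataset into at most n_chunks smaller pieces."""
    if not data:
        raise ValueError("data must not be empty")
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def analyze_chunk(chunk: list[int]) -> tuple[int, int]:
    """Each worker analyzes only its own piece (here: a sum and a count)."""
    return sum(chunk), len(chunk)

def distributed_mean(data: list[int], n_workers: int = 4) -> float:
    chunks = split_into_chunks(data, n_workers)
    # Workers process their chunks concurrently; in a real cluster each
    # chunk would be sent to a different computer.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(analyze_chunk, chunks))
    # Combine the small partial results into the overall answer.
    total = sum(s for s, _ in partial_results)
    count = sum(c for _, c in partial_results)
    return total / count

values = list(range(1, 1_000_001))  # stand-in for a large dataset
print(distributed_mean(values))     # 500000.5
```

Each worker returns only a partial sum and count, which are tiny compared with the chunks themselves, and adding workers (horizontal scaling) only changes how many pieces the data is split into.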
Supporting Business with Big Data
Sources of Information
▪ Data originates from sensors and anything that has been scanned, entered, and released to the Internet.
▪ Collected data can be categorized as structured or
unstructured.
▪ Structured data is created by applications that use “fixed” format input, such as spreadsheets. It may need to be manipulated into a common format, such as CSV (comma-separated values).
▪ Unstructured data is generated in a “freeform” style
such as audio, video, web pages, and tweets.
▪ Examples of tools to prepare unstructured data for
processing are:
• “Web scraping” tools automatically extract data from
HTML pages.
• RESTful application programming interfaces (APIs).
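As a rough illustration of web scraping, the following Python sketch uses the standard library's `html.parser.HTMLParser` to pull the cells out of an HTML table and then writes them as CSV, a common structured format. The `TableScraper` class and the sample HTML are hypothetical; real pages are usually far messier and would normally be downloaded first:

```python
import csv
import io
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            if not self.rows:             # tolerate cells with no <tr>
                self.rows.append([])
            self.rows[-1].append("")      # start a new, empty cell
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:                 # ignore whitespace between tags
            self.rows[-1][-1] += data.strip()

# Hypothetical page fragment.
html = """
<table>
  <tr><th>sensor</th><th>reading</th></tr>
  <tr><td>temp-01</td><td>21.4</td></tr>
  <tr><td>temp-02</td><td>22.9</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(html)

# Write the extracted rows in a common structured format: CSV.
buffer = io.StringIO()
csv.writer(buffer).writerows(scraper.rows)
print(buffer.getvalue())
```

The resulting CSV text can then be loaded into a spreadsheet or an analysis tool like any other structured data.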
Data Visualization
▪ Data mining is the process of turning raw data
into meaningful information.
▪ The mined data must be analyzed and
presented to managers and decision makers.
▪ The best visualization tool to use depends on the following:
• Number of variables
• Number of data points in each variable
• Whether the data represents a timeline
• Whether items require comparison
▪ Popular charts include line, column, bar, pie, and
scatter.
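The four criteria above can be turned into a rough decision helper. The rules and thresholds in this Python sketch (more than 10 items switches to a horizontal bar chart, 6 or fewer values allows a pie chart) are assumptions based on common rules of thumb, not a fixed standard:

```python
def suggest_chart(n_variables: int, n_points: int,
                  is_timeline: bool, compares_items: bool) -> str:
    """Suggest a chart type from the slide's criteria.

    The rules and thresholds are illustrative assumptions.
    """
    if is_timeline:
        return "line"     # values changing over time
    if compares_items:
        # Horizontal bars keep long lists of categories readable.
        return "bar" if n_points > 10 else "column"
    if n_variables >= 2:
        return "scatter"  # relationship between two numeric variables
    if n_points <= 6:
        return "pie"      # a few shares of a single total
    return "column"

print(suggest_chart(1, 12, is_timeline=True, compares_items=False))    # line
print(suggest_chart(1, 4, is_timeline=False, compares_items=True))     # column
print(suggest_chart(2, 200, is_timeline=False, compares_items=False))  # scatter
print(suggest_chart(1, 3, is_timeline=False, compares_items=False))    # pie
```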
Chart Types
Analyzing Big Data for Effective Use in Business
▪ Data analysis is the process of inspecting,
cleaning, transforming, and modeling data to
uncover useful information.
▪ Having a strategy helps a business determine
the type of analysis required and the best tool to
do the analysis.
▪ Tools and applications range from an Excel spreadsheet or Google Analytics for small to medium data samples to applications dedicated to manipulating and analyzing very large datasets.
▪ Examples include KNIME, OpenRefine, Orange, and RapidMiner.
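The four steps in the definition of data analysis (inspecting, cleaning, transforming, and modeling) can be walked through on a tiny dataset. In this Python sketch the monthly sales figures are hypothetical, and the model is an ordinary least-squares straight line used to forecast the next month, the same kind of linear trend that Excel's FORECAST.LINEAR function produces:

```python
# Hypothetical monthly sales figures: None marks a missing entry, and one
# value was typed in as text, as often happens in raw data.
raw_sales = [120, 135, None, "150", 162, 171]

# 1. Inspect: count the problems before changing anything.
missing = sum(1 for v in raw_sales if v is None)
as_text = sum(1 for v in raw_sales if isinstance(v, str))

# 2. Clean: drop missing entries but remember each value's month number.
points = [(month, v) for month, v in enumerate(raw_sales, start=1) if v is not None]

# 3. Transform: convert every month and value to a number.
xs = [float(month) for month, _ in points]
ys = [float(v) for _, v in points]

# 4. Model: fit a straight line y = a + b*x by ordinary least squares.
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

# Use the fitted line to forecast month 7.
forecast = a + b * 7
print(f"missing: {missing}, stored as text: {as_text}")
print(f"forecast for month 7: {forecast:.1f}")
```

With this data the fitted trend forecasts about 181 for month 7. Keeping the original month numbers during cleaning matters: renumbering the remaining points would shift the trend line.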
Excel lab: Forecasting
3.2 Chapter Summary
Chapter Summary
Summary
▪ Three characteristics of Big Data:
• large amount of data that increasingly requires more storage space (volume)
• growing exponentially fast (velocity)
• generated in different formats (variety)
▪ Fog computing utilizes end-user clients or “edge” devices to do pre-processing and storage.
• Designed to keep the data closer to the source for pre-processing.
▪ The cloud is a collection of data centers or groups of connected servers giving anywhere,
anytime access to software, storage, and services using a browser interface.
• Provides increased data storage and reduces the need for onsite IT equipment, maintenance, and management.
▪ Distributed data processing takes large volumes of data from a source, breaks them into smaller pieces, and distributes those pieces to many locations to be processed.
• Each computer in the distributed architecture analyzes its part of the Big Data picture.
Summary (Cont.)
▪ Businesses gain value by collecting and analyzing data to understand the impact of their
products and services, adjust their methods and goals, and provide their customers with
better products faster.
▪ Structured data is created by applications that use “fixed” format input such as spreadsheets or
medical forms.
▪ Unstructured data is generated in a “freeform” style such as audio, video, web pages, and tweets.
▪ Data mining is the process of turning raw data into meaningful information by discovering patterns
and relationships in large data sets.
▪ Data visualization is the process of taking the analyzed data and using charts such as line, column,
bar, pie, or scatter to present meaningful information.