
IoT Chapter 3 PDF

The document discusses big data including its sources, storage challenges and solutions, and how businesses analyze big data. It describes how everything from sensors to websites generates large amounts of diverse data. Big data is stored in data centers and the cloud, using techniques like distributed processing and Hadoop for scalability and fault tolerance. Businesses use data analytics to better understand customers and operations in order to improve products and services.

Uploaded by

Real Rajapaksha
Copyright
© All Rights Reserved

Chapter 3: Everything Generates Data
Instructor Materials

Introduction to the Internet of Things v2.0


Chapter 3: Everything Generates Data
Introduction to the Internet of Things v2.0 Planning Guide

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 3

Chapter 3 - Sections & Objectives
▪ 3.1 Big Data
• Explain the concept of Big Data.
• Describe the sources of Big Data.
• Explain the challenges and solutions to Big Data storage.
• Explain how Big Data analytics are used to support business.

3.1 Big Data

What is Big Data?
What is Big Data?
▪ Data is information that comes from a variety of sources, such as people, pictures, text, sensors, websites, and technology devices.
▪ Three characteristics that indicate an
organization may be dealing with Big Data:
• A large amount of data that increasingly
requires more storage space (volume).
• Data that is generated, and must be processed, at a rapidly increasing rate (velocity).
• Data that is generated in different formats
(variety).
▪ Examples of data amounts collected by
sensors:
• One autonomous car can generate 4,000
gigabits (Gb) of data per day.
• One smart connected home can produce as
much as 1 gigabyte (GB) of information a week.
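The sensor figures above mix units: gigabits (Gb) for the car and gigabytes (GB) for the home. A minimal Python sketch, assuming decimal prefixes (1 GB = 10^9 bytes, 8 bits per byte) and that each volume is spread evenly over its period, puts both on the same scale:

```python
# Sketch: convert the sensor data volumes quoted above into common units.
# Assumes decimal (SI) prefixes: 1 GB = 10**9 bytes, and 8 bits per byte.
BITS_PER_BYTE = 8
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY

def gigabits_to_gigabytes(gigabits: float) -> float:
    """Convert gigabits (Gb) to gigabytes (GB)."""
    return gigabits / BITS_PER_BYTE

def average_rate_bytes_per_second(total_bytes: float, seconds: float) -> float:
    """Average rate, assuming the volume is spread evenly over the period."""
    return total_bytes / seconds

# One autonomous car: 4,000 Gb per day (figure from the text).
car_gb_per_day = gigabits_to_gigabytes(4_000)
car_rate = average_rate_bytes_per_second(car_gb_per_day * 10**9, SECONDS_PER_DAY)

# One smart connected home: up to 1 GB per week (figure from the text).
home_rate = average_rate_bytes_per_second(1 * 10**9, SECONDS_PER_WEEK)

print(f"Car:  {car_gb_per_day:.0f} GB/day, about {car_rate / 10**6:.2f} MB/s")
print(f"Home: about {home_rate / 10**3:.2f} kB/s")
```

Under those assumptions the car works out to 500 GB per day (about 5.79 MB/s) and the home to about 1.65 kB/s. Real devices produce data in bursts, so their peak rates are higher than these averages.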
What is Big Data?
Does the Business Generate Big Data?

What is Big Data?
Large Datasets
▪ Companies do not necessarily have to
generate their own Big Data.
▪ Free datasets are available from many sources, ready to be used and analyzed.

What is Big Data?
Lab – Database Search

Where is Big Data Stored?
What are the Challenges of Big Data?
▪ IBM estimates that “each day we create 2.5 quintillion bytes of data”.
▪ Five major storage problems with Big
Data:
• Management
• Security
• Redundancy
• Analytics
• Access
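To give a sense of scale for the IBM estimate, the sketch below converts 2.5 quintillion (2.5 × 10^18) bytes per day into exabytes per day and terabytes per second, assuming decimal prefixes. The `copies = 3` value is a hypothetical redundancy level, included only to show how keeping redundant copies multiplies the storage that must be managed.

```python
# Sketch: put IBM's "2.5 quintillion bytes per day" estimate into larger units.
# Assumes SI prefixes: 1 EB = 10**18 bytes, 1 TB = 10**12 bytes.
BYTES_PER_DAY = 2.5e18          # 2.5 quintillion = 2.5 x 10**18
SECONDS_PER_DAY = 86_400

exabytes_per_day = BYTES_PER_DAY / 10**18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**12

# Hypothetical redundancy level: every byte stored three times.
copies = 3
stored_exabytes_per_day = exabytes_per_day * copies

print(f"{exabytes_per_day} EB per day")                        # 2.5 EB per day
print(f"about {terabytes_per_second:.1f} TB per second")       # about 28.9 TB per second
print(f"{stored_exabytes_per_day} EB per day with {copies} copies")
```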

Where is Big Data Stored?
Where Can We Store Big Data?
▪ Big data is typically stored on multiple servers in data centers.
▪ Fog computing utilizes end-user clients or “edge” devices to do a substantial amount of the pre-processing and storage.
• Data from that pre-processed analysis can be fed back into the company’s systems to modify processes if required.
• Communications to and from the servers and devices are quicker and require less bandwidth than constantly going out to the cloud.
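A minimal sketch of fog-style pre-processing, assuming a hypothetical edge device that buffers one window of temperature readings: instead of sending every raw reading to the cloud, it forwards a compact summary plus any readings above an alert threshold. The function name, threshold, and readings are illustrative and not part of any specific fog platform.

```python
from statistics import mean

def summarize_window(readings: list[float], threshold: float) -> dict:
    """Reduce a window of raw readings to a compact summary at the edge.

    Only this summary (including any out-of-range readings) would be sent
    upstream, instead of every raw reading.
    """
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": round(mean(readings), 2),
        "alerts": [r for r in readings if r > threshold],
    }

# Hypothetical raw readings from one sensor (e.g. one per second).
raw = [21.0, 21.2, 21.1, 35.5, 21.3, 21.2]
summary = summarize_window(raw, threshold=30.0)
print(summary)
```

Here six raw values become one small record; with longer windows the reduction in traffic to the cloud grows accordingly.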

Where is Big Data Stored?
The Cloud and Cloud Computing
▪ The cloud is a collection of data centers or groups of connected
servers.
▪ Cloud services for individuals include:
• Storage of data, such as pictures, music, movies, and emails.
• Access to many applications without downloading them onto a local device.
• Access to data and applications anywhere, anytime, and on any device.
▪ Cloud services for an enterprise include:
• Access to organizational data anywhere and at any time.
• Streamlined IT operations.
• Eliminated or reduced need for onsite IT equipment, maintenance, and management.
• Reduced costs for equipment, energy, physical plant requirements, and personnel training.
Where is Big Data Stored?
Distributed Processing
▪ Distributed data processing takes a large volume of data and breaks it into smaller pieces.
▪ These smaller pieces are distributed to many locations to be processed by many computers.
▪ Each computer in the distributed architecture
analyzes its part of the Big Data picture (horizontal
scaling).
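The split, process, and combine steps above can be sketched in Python. This is an illustration under simplifying assumptions: threads in one process stand in for separate computers, and the "analysis" each one performs is just a sum. Real frameworks such as Hadoop also send the work to the machines that already hold the data and recover from machine failures.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data: list[int], n_chunks: int) -> list[list[int]]:
    """Break a large dataset into at most n_chunks roughly equal pieces."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if not data:
        return []
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_chunk(chunk: list[int]) -> int:
    """The work one computer does on its piece (here, a simple sum)."""
    return sum(chunk)

def distributed_sum(data: list[int], workers: int) -> int:
    chunks = split_into_chunks(data, workers)
    # Each worker thread stands in for a separate computer in the cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    # Combine the partial results into one answer.
    return sum(partial_results)

print(distributed_sum(list(range(1, 1_000_001)), workers=4))  # 500000500000
```

Adding more workers (more computers, in the real setting) shrinks each piece rather than requiring one bigger machine, which is the horizontal scaling the slide describes.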

▪ Hadoop was created to deal with these Big Data volumes. It has two main features that have made it the industry standard:
• Scalability – Larger cluster sizes improve performance and provide higher data processing capabilities.
• Fault tolerance – Hadoop automatically replicates data across the nodes of a cluster, so the data remains available if a node fails.
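The fault-tolerance idea can be sketched as follows: each block of data is copied to several distinct nodes, so the data stays readable when a node fails. The round-robin placement here is a simplification chosen for illustration; HDFS, Hadoop's file system, uses its own placement policy (which can take rack topology into account) and a default replication factor of 3. The node and block names are hypothetical.

```python
def place_replicas(blocks: list[str], nodes: list[str], replication: int) -> dict[str, list[str]]:
    """Assign each block to `replication` distinct nodes using round-robin."""
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    placement = {}
    for i, block in enumerate(blocks):
        start = i % len(nodes)  # rotate the starting node to spread the load
        placement[block] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return placement

def readable_blocks(placement: dict[str, list[str]], failed_nodes: set[str]) -> set[str]:
    """Blocks that still have at least one copy on a healthy node."""
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }

nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk0", "blk1", "blk2", "blk3", "blk4"]

placement = place_replicas(blocks, nodes, replication=3)
print(placement["blk0"])                                      # ['node1', 'node2', 'node3']
print(readable_blocks(placement, {"node2"}) == set(blocks))   # True

# With only one copy per block, losing node2 makes blk1 unreadable.
single = place_replicas(blocks, nodes, replication=1)
print(sorted(set(blocks) - readable_blocks(single, {"node2"})))  # ['blk1']
```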
Supporting Business with Big Data
Why Do Businesses Analyze Data?
▪ Data analytics allows businesses to better
understand the impact of their products and
services, adjust their methods and goals, and
provide their customers with better products faster.
▪ Value comes from two primary types of processed
data, transactional and analytical.
▪ Transactional information is captured and
processed as events happen.
• Used to analyze daily sales reports and production
schedules to determine how much inventory to carry.
▪ Analytical information supports managerial analysis
tasks like determining whether the organization
should build a new manufacturing plant.
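A minimal sketch of turning transactional data into a daily sales report and an inventory decision. The sale events, product names, and the reorder rule (average daily demand plus a 20% safety margin) are hypothetical assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical transactional records, captured as each sale happens:
# (day, product, units sold)
sales_events = [
    ("Mon", "sensor-kit", 12),
    ("Mon", "gateway", 3),
    ("Tue", "sensor-kit", 9),
    ("Tue", "sensor-kit", 4),
    ("Tue", "gateway", 5),
]

def daily_sales_report(events):
    """Aggregate individual sale events into units sold per (day, product)."""
    report = defaultdict(int)
    for day, product, units in events:
        report[(day, product)] += units
    return dict(report)

def units_to_stock(report, product, safety_margin=1.2):
    """Hypothetical rule: stock average daily demand plus a safety margin."""
    daily_totals = [units for (_, p), units in report.items() if p == product]
    if not daily_totals:
        return 0
    return round(sum(daily_totals) / len(daily_totals) * safety_margin)

report = daily_sales_report(sales_events)
print(report)
print(units_to_stock(report, "sensor-kit"))  # 15
print(units_to_stock(report, "gateway"))     # 5
```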

Supporting Business with Big Data
Sources of Information
▪ Data originates from sensors and anything that has been scanned, entered, and released to the Internet.
▪ Collected data can be categorized as structured or unstructured.
▪ Structured data is created by applications that use “fixed” format input, such as spreadsheets. It may need to be manipulated into a common format such as CSV.
▪ Unstructured data is generated in a “freeform” style, such as audio, video, web pages, and tweets.
▪ Examples of tools to prepare unstructured data for processing are:
• “Web scraping” tools, which automatically extract data from HTML pages.
• RESTful application programming interfaces (APIs), which let programs request data from web services in a structured form.
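As a sketch of what a web-scraping tool does, Python's standard `html.parser` module can pull the cells out of an HTML table, and the `csv` module can write them in a common format. The `TableScraper` class and the HTML snippet are hypothetical; real pages are messier, and dedicated scraping tools handle far more cases.

```python
import csv
import io
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of each <th>/<td> cell in an HTML table, row by row."""

    def __init__(self):
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:          # tolerate a cell outside any <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()

# Hypothetical page content.
html_page = """
<table>
  <tr><th>city</th><th>temp_c</th></tr>
  <tr><td>Oslo</td><td>4</td></tr>
  <tr><td>Lima</td><td>19</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(html_page)
scraper.close()  # flush any buffered input

# Write the extracted rows in a common format (CSV).
buffer = io.StringIO()
csv.writer(buffer).writerows(scraper.rows)
print(buffer.getvalue())
```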
Supporting Business with Big Data
Data Visualization
▪ Data mining is the process of turning raw data
into meaningful information.
▪ The mined data must be analyzed and
presented to managers and decision makers.
▪ The best visualization tool to use depends on the following:
• Number of variables
• Number of data points in each variable
• Whether the data represents a timeline
• Whether items require comparison
▪ Popular charts include line, column, bar, pie, and
scatter.
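The selection criteria above can be expressed as a simple decision function. The mapping and its thresholds (more than 7 data points for bars rather than columns, at most 6 slices for a pie) are hypothetical rules of thumb, not a standard; the `parts_of_whole` flag is an extra assumption added to cover pie charts.

```python
def suggest_chart(n_variables: int, n_points: int, is_timeline: bool,
                  compare_items: bool, parts_of_whole: bool = False) -> str:
    """Suggest one of the popular chart types from the criteria above.

    Hypothetical rules of thumb, not a standard.
    """
    if is_timeline:
        return "line"      # values changing over time
    if n_variables >= 2:
        return "scatter"   # relationship between two numeric variables
    if parts_of_whole and n_points <= 6:
        return "pie"       # shares of one total, with few slices
    if compare_items and n_points > 7:
        return "bar"       # horizontal bars leave room for many labels
    return "column"        # compare a handful of categories side by side

print(suggest_chart(1, 12, is_timeline=True, compare_items=False))   # line
print(suggest_chart(1, 20, is_timeline=False, compare_items=True))   # bar
```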

Supporting Business with Big Data
Chart Types

Supporting Business with Big Data
Analyzing Big Data for Effective Use in Business
▪ Data analysis is the process of inspecting,
cleaning, transforming, and modeling data to
uncover useful information.
▪ Having a strategy helps a business determine
the type of analysis required and the best tool to
do the analysis.
▪ Tools and applications range from an Excel spreadsheet or Google Analytics for small to medium data samples, to applications dedicated to manipulating and analyzing very large datasets.
▪ Examples include Knime, OpenRefine, Orange,
and RapidMiner.
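The inspecting, cleaning, and transforming steps of data analysis can be sketched on a small, hypothetical export of temperature readings. The valid range (-40 to 85 °C), the error code, and the Celsius-to-Fahrenheit transformation are assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical raw export: text values with a gap, a label, and an error code.
raw_readings = ["21.5", "", "22.1", "n/a", "-999", "23.0", "22.4"]

def to_float(value: str):
    """Inspect one entry: return a number, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None

def clean(values, valid_range=(-40.0, 85.0)):
    """Clean: drop missing, non-numeric, and out-of-range readings."""
    low, high = valid_range
    numbers = (to_float(v) for v in values)
    return [x for x in numbers if x is not None and low <= x <= high]

def to_fahrenheit(celsius_values):
    """Transform: convert Celsius readings to Fahrenheit, one decimal place."""
    return [round(c * 9 / 5 + 32, 1) for c in celsius_values]

cleaned = clean(raw_readings)
print(f"kept {len(cleaned)} of {len(raw_readings)} entries")  # kept 4 of 7 entries
fahrenheit = to_fahrenheit(cleaned)
print(fahrenheit, round(mean(fahrenheit), 2))
```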

Supporting Business with Big Data
Excel lab: Forecasting
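Assuming the forecasting lab fits a linear trend (as Excel's FORECAST.LINEAR function does), the same least-squares calculation looks like this in Python. The function mirrors Excel's argument order (x, known_y's, known_x's) but is a hypothetical re-implementation, and the monthly sales figures are made up.

```python
def forecast_linear(x: float, known_ys: list[float], known_xs: list[float]) -> float:
    """Least-squares straight-line forecast: y = intercept + slope * x."""
    n = len(known_xs)
    if n != len(known_ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    if sxx == 0:
        raise ValueError("known_xs must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(known_xs, known_ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x

months = [1, 2, 3, 4, 5, 6]
units = [120, 135, 149, 160, 178, 190]   # hypothetical monthly sales
print(round(forecast_linear(7, units, months), 1))  # 204.3
```

For these figures the fitted line has a slope of 14 units per month, so the forecast for month 7 is about 204.3.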

3.2 Chapter Summary

Chapter Summary
Summary
▪ Three characteristics of Big Data:
• large amount of data that increasingly requires more storage space (volume)
• generated, and must be processed, at a rapidly increasing rate (velocity)
• generated in different formats (variety)
▪ Fog computing utilizes end-user clients or “edge” devices to do pre-processing and storage.
• Designed to keep the data closer to the source for pre-processing.
▪ The cloud is a collection of data centers or groups of connected servers giving anywhere,
anytime access to software, storage, and services using a browser interface.
• Provides increased data storage and reduces the need for onsite IT equipment, maintenance, and management.
▪ Distributed data processing takes large volumes of data from a source, breaks them into smaller pieces, and distributes those pieces to many locations to be processed.
• Each computer in the distributed architecture analyzes its part of the Big Data picture.
Chapter Summary
Summary (Cont.)
▪ Businesses gain value by collecting and analyzing data to understand the impact of their
products and services, adjust their methods and goals, and provide their customers with
better products faster.
▪ Structured data is created by applications that use “fixed” format input such as spreadsheets or
medical forms.
▪ Unstructured data is generated in a “freeform” style such as audio, video, web pages, and tweets.

▪ Both forms of data need to be manipulated into a common format to be analyzed.

▪ Data mining is the process of turning raw data into meaningful information by discovering patterns
and relationships in large data sets.
▪ Data visualization is the process of taking the analyzed data and using charts such as line, column,
bar, pie, or scatter to present meaningful information.

