Data Analyst Chapter 3

Instructor Materials

Chapter 3: Data Analysis

Big Data & Analytics

Presentation_ID © 2008 Cisco Systems, Inc. All rights reserved. Cisco Confidential 1
Chapter 3 - Sections & Objectives
 3.1 Analyzing Data
• Analyze data using basic statistics.
 3.2 Preparation for Chapter 3 Internet Meter Lab
• Configure data for analysis.
 3.3 Summary
• Summarize the concepts presented in this chapter.

3.1 Analyzing Data

Analyzing Data
Preliminaries

 Data is changed from its raw format into information after it has been
gathered, prepared, analyzed, and presented in a usable format.
 Exploratory data analysis is a set of procedures designed to produce
descriptive and graphical summaries of data with the notion that the
results may reveal interesting patterns.

Analyzing Data
Preliminaries cont…
 IoT Concerns
• IoT data may come in large volume and in different forms.
• IoT data may require more advanced analytic tools for structured and
unstructured data.
• IoT data is frequently streaming in real time or nearly real time.
 Observations, Variables, and Values
• A variable is anything that varies from one instance to another and
can be measured, manipulated, or controlled.
• An observation is the recording of the values, patterns, and
occurrences for a set of variables.
• The set of values for a specific observation is called a data point.
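
One way to picture these terms is a pandas data frame, where each column is a variable and each row is an observation. The sketch below is illustrative only; the sensor names and readings are hypothetical and not part of the course material.

```python
import pandas as pd

# Hypothetical sensor readings: each row is one observation,
# each column is one variable.
readings = pd.DataFrame(
    {
        "sensor_id": ["A", "B", "C"],
        "temperature_c": [21.5, 22.1, 19.8],
        "door_open": [False, True, False],
    }
)

variables = list(readings.columns)         # the variables being recorded
observation_count = len(readings)          # number of observations (rows)
first_point = readings.iloc[0].to_dict()   # values of one observation (a data point)

print(variables)
print(observation_count)
print(first_point)
```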

Analyzing Data
Preliminaries cont…
 Categorical variables include:
• Nominal – Two or more categories or names that identify the object
• Ordinal – Two or more categories in which the order of the values matters
 Numerical variables include:
• Continuous – Quantitative along a continuum or range of values
• Ratio – Interval variables where zero (0) means none
• Discrete – Quantitative with a specific value from a finite set of values
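
As a rough sketch of how these variable types might be represented in pandas: an unordered categorical for nominal data, an ordered categorical for ordinal data, and integer or float series for discrete and continuous data. The device names, signal ratings, and numbers are hypothetical.

```python
import pandas as pd

# Nominal: categories with no inherent order (hypothetical device types).
device = pd.Categorical(["camera", "thermostat", "camera"])

# Ordinal: categories whose order matters (hypothetical signal ratings).
signal = pd.Categorical(
    ["low", "high", "medium"],
    categories=["low", "medium", "high"],
    ordered=True,
)

# Discrete numerical: counts. Continuous numerical: measurements on a range.
packets = pd.Series([3, 7, 5])
speed_mbps = pd.Series([12.4, 48.9, 30.2])

print(device.ordered, signal.ordered)
print(signal.min(), signal.max())   # min/max are only defined for ordered categories
print(packets.dtype, speed_mbps.dtype)
```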

Analyzing Data
Statistical Analysis

 Statistics is the collection and analysis of data using mathematical
techniques.
 Sample and Population
• A population is a group of similar entities such as people, objects,
or events that share some common set of characteristics.
• A sample is a representative group from the population.
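
A minimal sketch of drawing a sample from a population with pandas. Simple random sampling is only one way to aim for a representative sample, and the population here is made-up data.

```python
import pandas as pd

# Hypothetical population: one speed value (Mbps) for each of 1,000 households.
population = pd.Series(range(1, 1001), name="speed_mbps")

# Draw a simple random sample of 100; random_state makes the draw repeatable.
sample = population.sample(n=100, random_state=42)

print(len(population), len(sample))
print(population.mean(), sample.mean())
```

The sample mean will usually be close to, but not exactly equal to, the population mean.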

Analyzing Data
Statistical Analysis cont…

 Descriptive statistics
• Describe or summarize the values and observations of a data set.

 Inferential statistics
• The process of collecting, analyzing, and interpreting data gathered
from a sample to make generalizations or predictions about a population.
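
The difference can be sketched in a few lines. The descriptive part summarizes the sample itself; the inferential part uses the sample to estimate a range for the population mean. The speed values are hypothetical, and the 1.96 multiplier assumes a normal approximation, which is a simplification for a sample this small.

```python
import pandas as pd

# Hypothetical sample of measured download speeds (Mbps).
sample = pd.Series([48.2, 51.7, 50.3, 49.1, 52.4, 47.9, 50.8, 49.6])

# Descriptive statistics: summarize the sample itself.
summary = sample.describe()

# Inferential statistics: generalize from the sample to the population.
# Rough 95% interval for the population mean using a normal approximation;
# a t-distribution would be more appropriate for so few observations.
mean = sample.mean()
margin = 1.96 * sample.sem()   # sem() is the standard error of the mean
interval = (mean - margin, mean + margin)

print(summary)
print(interval)
```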

Analyzing Data
Characteristics of Samples

 Distribution
• The values of a variable and their frequency or probability

 Centrality
• The mean, median, and mode

 Dispersion
• The variability in the distribution
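
These three characteristics can each be computed with a pandas call or two. The following is a small sketch on made-up values, not data from the lab.

```python
import pandas as pd

values = pd.Series([2, 3, 3, 4, 4, 4, 5, 5, 6])

# Distribution: how often each value occurs (counts and proportions).
distribution = values.value_counts().sort_index()
relative = values.value_counts(normalize=True).sort_index()

# Centrality: mean, median, and mode.
mean = values.mean()
median = values.median()
mode = values.mode().tolist()   # mode() returns every value tied for most frequent

# Dispersion: range and sample standard deviation.
value_range = values.max() - values.min()
std_dev = values.std()

print(distribution)
print(mean, median, mode)
print(value_range, std_dev)
```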

Analyzing Data
Analysis Using Descriptive Statistics

 Pandas
• Open source library for Python that adds high-performance data
structures and tools for analysis of large data sets
• Import data from files
• Import data from web
• Descriptive statistics in pandas
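
A minimal sketch of importing data and viewing descriptive statistics. An in-memory string stands in for a CSV file so the example is self-contained; the column names and values are hypothetical. `read_csv` also accepts a file path or a URL in place of the file-like object.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the column names are hypothetical.
csv_text = """timestamp,ping_ms,download_mbps,upload_mbps
2016-11-24 13:36:25,26.99,91.80,14.31
2016-11-24 13:36:55,24.53,88.49,14.18
2016-11-24 13:37:25,20.23,59.87,14.11
"""

# read_csv accepts a file path, a file-like object, or a URL string.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# describe() reports count, mean, std, min, quartiles, and max per column.
stats = df[["ping_ms", "download_mbps", "upload_mbps"]].describe()
print(stats)
```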

Analyzing Data
Analysis Using Correlation
 “Correlation does not imply causation”
• Causation is a relationship in which one thing changes, or is created,
directly because of something else.
• Correlation is a relationship between phenomena in which two or more
things change at a similar rate.
• Correlations can be positive or negative.
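
Positive and negative correlation can be seen with `Series.corr`, which computes the Pearson correlation coefficient by default. The numbers below are invented for illustration, and a strong coefficient here says nothing about which variable, if either, causes the other.

```python
import pandas as pd

# Hypothetical measurements taken over five hours of device use.
hours = pd.Series([1, 2, 3, 4, 5])
data_used_gb = pd.Series([2.1, 3.9, 6.2, 8.0, 9.8])
battery_pct = pd.Series([90, 82, 69, 61, 50])

# Close to +1: the two variables rise together.
positive = hours.corr(data_used_gb)

# Close to -1: one variable falls as the other rises.
negative = hours.corr(battery_pct)

print(positive, negative)
```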

Analyzing Data
Analysis Using Correlation cont…

 Correlations can be calculated for multiple variables simultaneously
 Heat map
• Shows how the values of the correlation coefficients relate to one
another
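
For several variables at once, `DataFrame.corr` returns a matrix of pairwise correlation coefficients. The sketch below uses hypothetical column names and values and stops at the matrix itself.

```python
import pandas as pd

# Hypothetical measurements; the column names are illustrative only.
df = pd.DataFrame(
    {
        "ping_ms": [20, 25, 30, 35, 40],
        "download_mbps": [95, 90, 80, 72, 60],
        "upload_mbps": [14, 13, 13, 12, 11],
    }
)

# Pairwise Pearson correlation for every pair of numeric columns.
corr_matrix = df.corr()
print(corr_matrix.round(2))
```

A heat map is this same matrix with each cell colored by its value; plotting functions such as `matplotlib.pyplot.imshow` or `seaborn.heatmap` can draw it if those libraries are installed.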

3.2 Preparation for Chapter 3 Internet Meter Lab

Preparation for Chapter 3 Internet Meter Lab
Basic Analysis with pandas

 More often than not, the data sets that you work with will have
incompatibilities.
 Cleaning data can involve removing missing or unwanted values, or
altering the format of the values to make them consistent.
 NaN (Not a Number) values are used to represent data that is undefined
or cannot be represented. pandas refers to missing data as NaN values.
• NaTs are used for timestamps
 Pandas has many built-in functions for:
• converting the datatypes
• manipulating data frames
• running statistical analysis on data sets.
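
A small sketch of this kind of cleaning, assuming a raw table where numbers arrive as text and some entries are missing. The column names and values are hypothetical and are not the lab's data.

```python
import pandas as pd

# Hypothetical raw data: numbers stored as text, with missing entries.
raw = pd.DataFrame(
    {
        "timestamp": ["2016-11-24 13:36:25", None, "2016-11-24 13:37:25"],
        "download_mbps": ["91.8", "n/a", "59.87"],
    }
)

# Convert the datatypes; entries that cannot be parsed become NaT or NaN.
converted = raw.copy()
converted["timestamp"] = pd.to_datetime(converted["timestamp"], errors="coerce")
converted["download_mbps"] = pd.to_numeric(converted["download_mbps"], errors="coerce")

# Count missing values per column, then drop rows that contain any.
missing_per_column = converted.isna().sum()
clean = converted.dropna()

print(missing_per_column)
print(clean.dtypes)
print(clean["download_mbps"].mean())
```

Dropping rows is only one option; depending on the analysis, filling missing values with `fillna` may be preferable.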
3.3 Summary

Chapter Summary
Summary
 Exploratory data analysis produces descriptive and graphical summaries
of data with the notion that the results may reveal interesting patterns.
 IoT data may be structured or unstructured and data must be organized
in real time.
 Observations, variables, and values are critical to an analysis.
 Variables include Numerical (Continuous and Discrete) and Categorical
(Nominal and Ordinal).
 Statistics is the collection and analysis of data using mathematical
techniques.
• The interpretation of data and the presentation of findings.
• The discovery of patterns or relationships between variables.
 Statistics uses samples and populations.
 Statistical analysis includes descriptive and inferential statistics.

Chapter Summary
Summary cont…
 Distribution is a simple association between a value and the number or
percentage of times it appears in a data sample.
 Centrality includes the mean, median, and mode.
• Values that are closer to the center of the distribution occur with
greater frequency.
 Dispersion is the variability in the distribution.
 Pandas is an open source library for Python with tools for analysis of
large data sets.
• Importing data from files
• Importing data from the web
• Viewing descriptive statistics
 “Correlation does not imply causation”
 Data commonly needs cleaning, converting, and manipulating before
data analysis.
