Data Analyst Chapter 3
Presentation_ID © 2008 Cisco Systems, Inc. All rights reserved. Cisco Confidential 1
Chapter 3: Data Analysis
Chapter 3 - Sections & Objectives
3.1 Analyzing Data
• Analyze data using basic statistics.
3.2 Preparation for Chapter 3 Internet Meter Lab
• Configure data for analysis.
3.3 Summary
• Summarize the concepts presented in this chapter.
3.1 Analyzing Data
Analyzing Data
Preliminaries
Analyzing Data
Preliminaries cont…
Categorical variables include:
• Nominal – Two or more categories or names that identify the object, with no inherent order
• Ordinal – Two or more categories in which the order of the values matters
Numerical variables include:
• Continuous – Quantitative values along a continuum or range of values
• Ratio – Interval variables in which zero (0) means none
• Discrete – Quantitative values, each taken from a finite set of specific values
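The variable types above can be sketched in pandas. This is a minimal illustration, not part of the course material: the column names and readings are hypothetical, and the category order for signal_quality is an assumption made for the example.

```python
import pandas as pd

# Hypothetical device observations; every name and value here is illustrative.
df = pd.DataFrame({
    "sensor_type": ["temperature", "humidity", "temperature", "pressure"],  # nominal
    "signal_quality": ["low", "high", "medium", "high"],                    # ordinal
    "reading": [21.5, 48.2, 22.1, 1013.2],                                  # continuous
    "packets_sent": [12, 7, 15, 9],                                         # discrete
})

# Nominal: categories with no inherent order.
df["sensor_type"] = pd.Categorical(df["sensor_type"])

# Ordinal: categories with an explicit order (assumed: low < medium < high).
df["signal_quality"] = pd.Categorical(
    df["signal_quality"],
    categories=["low", "medium", "high"],
    ordered=True,
)

# Ordered categories support comparisons such as max(); unordered ones do not.
best_quality = df["signal_quality"].max()
print(df.dtypes)
print("Best signal quality:", best_quality)
```

Declaring the order explicitly is what lets pandas sort and compare ordinal values by rank rather than alphabetically.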
Analyzing Data
Statistical Analysis
Analyzing Data
Statistical Analysis cont…
Descriptive statistics
• Describe or summarize the values and observations of a data set.
Inferential statistics
• The process of collecting, analyzing, and interpreting data gathered from a sample to make generalizations or predictions about a population.
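The difference between the two can be sketched with Python's standard library. The population below is simulated and hypothetical, and the 1.96 multiplier assumes a roughly normal sampling distribution; treat this as an illustration rather than a prescribed method.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated measurements.
population = [random.gauss(50, 10) for _ in range(10_000)]

# In practice only a sample is usually available.
sample = random.sample(population, 200)

# Descriptive statistics: summarize the sample itself.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential statistics: use the sample to estimate the population mean.
# Approximate 95% confidence interval (normal approximation, an assumption).
standard_error = sample_sd / len(sample) ** 0.5
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"Sample mean: {sample_mean:.2f}, sample std dev: {sample_sd:.2f}")
print(f"Estimated population mean: between {ci_low:.2f} and {ci_high:.2f}")
```

The first two values only describe the 200 sampled points; the interval is a statement about the whole population.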
Analyzing Data
Characteristics of Samples
Distribution
• A variable and the frequency or probability of each of its values
Centrality
• The mean, median, and mode
Dispersion
• The variability in the distribution
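These three characteristics can be computed for a small sample in pandas. The values below are made up for illustration, and standard deviation and range are used here as two common measures of dispersion.

```python
import pandas as pd

# Hypothetical sample of nine observations.
s = pd.Series([2, 3, 3, 4, 4, 4, 5, 5, 6])

# Distribution: each value with its frequency and relative frequency.
freq = s.value_counts().sort_index()
rel_freq = s.value_counts(normalize=True).sort_index()

# Centrality: mean, median, and mode.
mean = s.mean()
median = s.median()
mode = s.mode().iloc[0]  # mode() returns a Series because ties are possible

# Dispersion: sample standard deviation and range.
std = s.std()
value_range = s.max() - s.min()

print(freq)
print(f"mean={mean}, median={median}, mode={mode}")
print(f"std={std:.3f}, range={value_range}")
```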
Analyzing Data
Analysis Using Descriptive Statistics
Pandas
• Open source library for Python that adds high-performance data structures and tools for analysis of large data sets
• Import data from files
• Import data from the web
• Descriptive statistics in pandas
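A minimal sketch of importing data and summarizing it with pandas follows. The CSV content and its column names are hypothetical, and an in-memory buffer stands in for a real file so that the example is self-contained; pd.read_csv also accepts a file path or a URL in place of the buffer.

```python
import io

import pandas as pd

# Hypothetical measurements; in practice this would be a file path or URL.
csv_text = """timestamp,download_mbps,upload_mbps
2024-01-01 00:00,90.0,10.0
2024-01-01 01:00,100.0,20.0
2024-01-01 02:00,80.0,30.0
2024-01-01 03:00,110.0,40.0
"""

# Import the data, parsing the timestamp column as dates.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Descriptive statistics for the numeric columns:
# count, mean, std, min, quartiles, and max.
summary = df[["download_mbps", "upload_mbps"]].describe()
print(summary)
```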
Analyzing Data
Analysis Using Correlation
“Correlation does not imply causation”
• Causation is a relationship in which one thing changes, or is created, directly because of something else.
• Correlation is a relationship between phenomena in which two or more things change at a similar rate.
• Correlations can be positive or negative.
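Positive and negative correlation can be illustrated with the Pearson correlation coefficient, which is the default method of pandas Series.corr. The data below is invented for the example and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical paired observations.
hours = pd.Series([1, 2, 3, 4, 5])
rises_with_hours = pd.Series([2, 4, 6, 8, 10])   # increases as hours increases
falls_with_hours = pd.Series([10, 8, 6, 4, 2])   # decreases as hours increases

# Pearson correlation coefficient ranges from -1 to +1.
r_positive = hours.corr(rises_with_hours)
r_negative = hours.corr(falls_with_hours)

print(f"Positive correlation: {r_positive:.2f}")
print(f"Negative correlation: {r_negative:.2f}")
```

A coefficient near +1 or -1 shows only that the two series move together; it says nothing about whether one causes the other.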
Analyzing Data
Analysis Using Correlation cont…
Correlations can be calculated for multiple variables simultaneously
Heat map
• Shows how the values of the correlation coefficients relate to one another
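Correlations for several variables at once can be sketched with pandas DataFrame.corr, which returns a matrix of pairwise coefficients. The columns and values below are hypothetical.

```python
import pandas as pd

# Hypothetical measurements for three variables.
df = pd.DataFrame({
    "download_mbps": [90, 100, 80, 110, 95],
    "upload_mbps": [18, 20, 16, 22, 19],
    "latency_ms": [30, 22, 38, 15, 27],
})

# Pairwise Pearson correlation coefficients for every pair of columns.
corr = df.corr()
print(corr.round(2))
```

The resulting matrix is symmetric with ones on the diagonal. It is the kind of table a heat map colors, for example by passing it to a plotting library such as matplotlib.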
3.2 Preparation for Chapter 3 Internet Meter Lab
Preparation for Chapter 3 Internet Meter Lab
Basic Analysis with pandas
Chapter Summary
Summary
Exploratory data analysis produces descriptive and graphical summaries
of data with the notion that the results may reveal interesting patterns.
IoT data may be structured or unstructured and data must be organized
in real time.
Observations, variables, and values are critical to an analysis.
Variables include numerical (continuous and discrete) and categorical
(nominal and ordinal) types.
Statistics is the collection and analysis of data using mathematical
techniques. It also includes:
• The interpretation of data and the presentation of findings.
• The discovery of patterns or relationships between variables.
Chapter Summary
Summary cont…
Distribution is a simple association between a value and the number or
percentage of times it appears in a data sample.
Centrality includes the mean, median, and mode.
• Values that are closer to the center of the distribution occur with greater
frequency.