Lecture 08 - Histograms
Histograms
Announcements
● HW02 is due today
● Lab03 is posted on the data science server
○ Due this coming Wednesday before class
● HW03 is posted on the data science server
○ Due this coming Friday before class
Weekly Goals
● Monday
○ Table review
○ Working with Census data
● Wednesday
○ Visualizing data
○ Line plots, scatter plots, bar charts
● Today
○ Visualizing two kinds of distributions
○ Proportions as areas
Distributions
Terminology
● Individuals: those whose attributes are recorded
● Variable: an attribute (column)
○ can be numerical or categorical
○ has different values
■ each individual has exactly one value
○ has a distribution:
■ For each different value of the variable, the
frequency of individuals that have that value
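As a rough illustration of the definition above, the sketch below tallies a distribution for one categorical variable. The `flavors` data and the `distribution` helper are hypothetical names made up for this example; they are not from the lecture demo.

```python
from collections import Counter

# Hypothetical data: one value of the variable "flavor" per individual.
flavors = ["chocolate", "vanilla", "chocolate",
           "strawberry", "chocolate", "vanilla"]

def distribution(values):
    """Return {value: (count, percent)} for each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {value: (count, 100 * count / total)
            for value, count in counts.items()}

flavor_dist = distribution(flavors)
for value, (count, percent) in flavor_dist.items():
    print(f"{value:<12}{count:>3}{percent:>8.1f}%")
```

Because each individual contributes exactly one value, the counts sum to the number of individuals and the percents sum to 100.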
A Distribution
Each individual is in exactly one category. Percents add up to 100.
Each respondent can pick more than one answer, so percents need not add up to 100.
(Demo)
Bar Chart
To display all the values of the variable along with all their
frequencies
● Bar chart
○ One bar for each category
○ You can choose the order of the bars
○ Length of bar is the percent (or count) of individuals in
that category
(Demo)
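The bar chart rules above can be sketched without any plotting library by drawing text bars. This is only an illustration: the `responses` data, the `bar_lengths` helper, and the choice to order bars from most to least frequent are assumptions made for this example.

```python
from collections import Counter

def bar_lengths(values, width=40):
    """Map each category to a bar length proportional to its share.

    Bars are ordered from most to least frequent; this is one possible
    ordering, since the order of the bars is up to you.
    """
    counts = Counter(values)
    total = len(values)
    return {category: round(width * n / total)
            for category, n in counts.most_common()}

# Hypothetical survey responses, one category per individual.
responses = ["A"] * 5 + ["B"] * 3 + ["C"] * 2

bars = bar_lengths(responses, width=20)
for category, length in bars.items():
    print(f"{category} | {'#' * length}")
```

Each category gets exactly one bar, and a category with twice the share gets a bar twice as long.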
Numerical Distributions
Grouping Numerical Values: Binning
Binning is counting the number of numerical values that lie
within ranges, called bins.
● Bins are defined by their lower bounds (inclusive)
● The upper bound is the lower bound of the next bin
Example values: 188, 170, 189, 163, 183, 171, 185, 168, 173, ... Of the values shown, 188, 189, and 185 fall in the [185, 190) bin.
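A minimal sketch of binning, using only the values shown above (the original list continues past them). The bin bounds and the `bin_counts` helper are assumptions for illustration. Every bin here is treated as half-open, [lower, upper); some libraries treat the last bin differently.

```python
from bisect import bisect_right

# Values shown on the slide; the full list is longer.
heights = [188, 170, 189, 163, 183, 171, 185, 168, 173]

# Assumed lower bounds of the bins, plus the upper end of the last bin.
bounds = [160, 165, 170, 175, 180, 185, 190]

def bin_counts(values, bounds):
    """Count the values in each half-open bin [bounds[i], bounds[i+1])."""
    counts = [0] * (len(bounds) - 1)
    for v in values:
        if bounds[0] <= v < bounds[-1]:
            # bisect_right finds the first bound greater than v, so the
            # bin containing v starts one position earlier.
            counts[bisect_right(bounds, v) - 1] += 1
    return counts

counts = bin_counts(heights, bounds)
for lower, upper, n in zip(bounds, bounds[1:], counts):
    print(f"[{lower}, {upper}): {n}")
```

Note that 170 lands in [170, 175) and 185 lands in [185, 190), because lower bounds are inclusive and upper bounds are not.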
Caption: The new iPad battery is 70% bigger than the previous iPad.
Area Principle
Areas should be proportional to the values they represent.
For example
● If you represent 20% of a population by