Unit 2 Notes
UNIT-2
SUMMARIZATION OF DATA
Descriptive Statistics:
Descriptive statistics involve summarizing and describing data to understand its main
features.
Example: Consider the heights (in cm) of a group of students - {150, 155, 160, 165,
170, 175, 180}.
Mean: (150 + 155 + 160 + 165 + 170 + 175 + 180) / 7 = 1155 / 7 = 165 cm
Median: Arrange data in ascending order - {150, 155, 160, 165, 170, 175, 180}.
Since there are 7 data points, the median is the middle value, which is 165 cm.
Mode: The mode is the most frequent value. Here every value occurs exactly once,
so this data set has no mode.
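The three measures for the heights example can be checked with a short Python sketch. The helper name `modes` is an assumption made for illustration; it returns an empty list when all distinct values are equally frequent, which is one common convention for "no mode".

```python
from collections import Counter
from statistics import mean, median

# Heights in cm, taken from the example above.
heights = [150, 155, 160, 165, 170, 175, 180]

def modes(values):
    """Return the most frequent value(s), or [] if every distinct value ties."""
    counts = Counter(values)
    highest = max(counts.values())
    winners = [v for v, c in counts.items() if c == highest]
    # When several distinct values all tie, no value stands out as a mode.
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners

print("mean:", mean(heights))
print("median:", median(heights))
print("modes:", modes(heights))
```

For a data set such as [1, 2, 2, 3], the same helper returns [2].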
A relative frequency table shows the proportion or percentage of each data value
relative to the total number of observations.
Example: For a set of 15 test scores, the relative frequency table is:
Test Score    Relative Frequency (%)
75            (3 / 15) * 100 = 20%
80            (3 / 15) * 100 = 20%
85            (7 / 15) * 100 ≈ 47%
90            (2 / 15) * 100 ≈ 13%
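The percentages in the table can be reproduced with a minimal Python sketch. The frequencies (3, 3, 7, 2) are read off the table; the function name `relative_frequencies` is an assumption made for illustration.

```python
# Frequency of each test score, read from the table above (15 observations).
score_counts = {75: 3, 80: 3, 85: 7, 90: 2}

def relative_frequencies(counts):
    """Convert raw frequencies to percentages of the total."""
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.items()}

rel = relative_frequencies(score_counts)
for score, pct in rel.items():
    print(f"{score}: {pct:.0f}%")
```

The unrounded percentages always sum to 100; the rounded ones may be off by a point.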
Ungrouped data is the data you first gather from an experiment or study. The data
is raw: it has not yet been sorted into categories, classified, or otherwise grouped.
BAR GRAPH
A bar graph represents categorical or discrete data using rectangular bars of equal
width. The height of each bar corresponds to the frequency or relative frequency of
the category it represents.
Bar graphs consist of two axes. On a vertical bar graph, the horizontal axis (or
x-axis) shows the data categories, and the vertical axis (or y-axis) shows the
frequency or relative frequency of each category.
Bar graphs have three key attributes:
• A bar diagram makes it easy to compare sets of data between different groups
at a glance.
• The graph represents categories on one axis and a discrete value on the other.
The goal is to show the relationship between the two axes.
• Bar charts can also show big changes in data over time.
PIE CHART
A pie chart, also known as a circle chart, is a circular statistical graphic divided
into sectors (slices) to illustrate numerical proportions. Each sector denotes a
proportionate part of the whole.
1. Pie charts are a visual way of displaying data that might otherwise be given in a
small table.
2. Pie charts are useful for displaying data that are classified into nominal or
ordinal categories. Nominal data are categorised according to descriptive or
qualitative information such as county of birth or type of pet owned. Ordinal data
are similar but the different categories can also be ranked, for example in a survey
people may be asked to say whether they classed something as very poor, poor,
fair, good, very good.
3. Pie charts are generally used to show percentage or proportional data and
usually the percentage represented by each category is provided next to the
corresponding slice of pie.
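Since each sector is a proportionate part of the whole, its percentage and its angle follow directly from the category counts. The sketch below uses a hypothetical pet survey (the counts are invented for illustration, not taken from these notes).

```python
# Hypothetical survey: type of pet owned (nominal data); counts are made up.
pet_counts = {"dog": 10, "cat": 6, "fish": 4}

def pie_sectors(counts):
    """Return {category: (percentage, angle_in_degrees)} for each slice."""
    total = sum(counts.values())
    return {
        category: (100 * n / total, 360 * n / total)
        for category, n in counts.items()
    }

sectors = pie_sectors(pet_counts)
for category, (pct, angle) in sectors.items():
    print(f"{category}: {pct:.0f}% of the pie, {angle:.0f} degrees")
```

The angles of all sectors add up to 360 degrees, just as the percentages add up to 100.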
Line Graph
A line graph is a graph that utilizes points and lines to represent change over time.
It is a chart that shows a line joining several points or a line that shows the relation
between the points. The graph represents quantitative data between two changing
variables with a line or curve that joins a series of successive data points. Line
graphs compare these two variables on a vertical axis and a horizontal axis.
A line graph is useful when:
• You want to show trends. For example, how your investments change over
time or how food prices have increased over time.
• You want to make predictions. A line graph can be extrapolated beyond the
data at hand, which lets you estimate values outside the observed range.
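One simple way to extrapolate a line graph is to fit a straight line through the points and extend it. The sketch below uses an ordinary least-squares fit, which is an assumption chosen for illustration (the notes do not prescribe a method), and the price data are hypothetical.

```python
# Hypothetical data: price of an item observed in years 1 to 4.
years = [1, 2, 3, 4]
prices = [10, 12, 14, 16]

def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(years, prices)
predicted_year_5 = slope * 5 + intercept  # extrapolate one year past the data
print(predicted_year_5)
```

Extrapolation assumes the observed trend continues, so predictions far beyond the data should be treated with caution.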
What is a Histogram?
A histogram is an area diagram. It can be defined as a set of rectangles with bases
along the intervals between class boundaries and with areas proportional to the
frequencies in the corresponding classes. In such representations, all the rectangles
are adjacent, since each base covers the interval between class boundaries. When the
classes have equal widths, the heights of the rectangles are proportional to the
corresponding frequencies; when the class widths differ, the heights are
proportional to the corresponding frequency densities (frequency divided by class
width).
In other words, a histogram is a diagram consisting of rectangles whose area is
proportional to the frequency of a variable and whose width is equal to the class
interval.
A histogram is used to summarize discrete or continuous data. In other words, it
provides a visual interpretation of numerical data by showing the number of data
points that fall within a specified range of values (called “bins”). It is similar to a
vertical bar graph. However, a histogram, unlike a vertical bar graph, shows no
gaps between the bars.
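Counting how many data points fall into each bin can be sketched in a few lines of Python. The bin start, width, and number of bins below are assumptions chosen for illustration, and the data are the heights from the descriptive statistics example.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start, start + width), and so on."""
    counts = [0] * n_bins
    for x in values:
        index = int((x - start) // width)
        if 0 <= index < n_bins:  # values outside the bin range are ignored
            counts[index] += 1
    return counts

heights = [150, 155, 160, 165, 170, 175, 180]  # cm
bin_width = 10
counts = bin_counts(heights, start=150, width=bin_width, n_bins=4)

# Frequency density = frequency / class width (bar height for unequal bins).
densities = [c / bin_width for c in counts]

for i, c in enumerate(counts):
    low = 150 + i * bin_width
    print(f"{low}-{low + bin_width}: {'#' * c}")
```

Each bin includes its lower boundary and excludes its upper boundary, so a value of 160 is counted in the 160-170 bin.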
Frequency Polygon
A frequency polygon is closely related to a histogram. It is used to compare sets
of data or to display a cumulative frequency distribution, and it uses a line graph
to represent quantitative data.
Frequency polygons are a visually effective method of representing quantitative
data and its frequencies. Let us discuss how to construct a frequency polygon.
• Step 1- Choose the class intervals and mark the values on the horizontal axis.
• Step 2- Mark the mid value of each interval on the horizontal axis.
• Step 3- Mark the frequency of each class on the vertical axis.
• Step 4- Corresponding to the frequency of each class interval, mark a point at
that height above the middle of the class interval.
• Step 5- Connect these points using line segments.
• Step 6- The obtained representation is a frequency polygon.
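The steps above amount to pairing each class midpoint with its frequency. A minimal sketch, using illustrative class intervals and frequencies that are not taken from these notes:

```python
# Illustrative class intervals (lower, upper) and their frequencies.
intervals = [(150, 160), (160, 170), (170, 180), (180, 190)]
frequencies = [2, 2, 2, 1]

def polygon_points(class_intervals, freqs):
    """Return (midpoint, frequency) pairs: the vertices of the polygon."""
    return [((low + high) / 2, f) for (low, high), f in zip(class_intervals, freqs)]

points = polygon_points(intervals, frequencies)
print(points)
```

Joining consecutive points with straight line segments, for example in a plotting library, gives the frequency polygon.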
Boxplot:
In descriptive statistics, a boxplot is a method for graphically depicting groups of
numerical data through their quartiles.
A boxplot is a standardized way of displaying the distribution of data based on a
five number summary (“minimum”, first quartile (Q1), median, third quartile (Q3),
and “maximum”). It can tell you about your outliers and what their values are. It
can also tell you if your data is symmetrical, how tightly your data is grouped, and
if and how your data is skewed.
For some distributions/datasets, you will find that you need more information
than the measures of central tendency (median, mean, and mode). You need to
have information on the variability or dispersion of the data. A boxplot is a graph
that gives you a good indication of how the values in the data are spread out.
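The five-number summary behind a boxplot can be computed as sketched below. Quartile conventions differ between textbooks and software; this sketch assumes the "median of each half" method, excluding the middle value when the number of observations is odd. The 1.5 × IQR rule for flagging outliers is a common convention and is likewise an assumption here.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + (n % 2):]  # skip the middle value when n is odd
    return min(data), median(lower), median(data), median(upper), max(data)

heights = [150, 155, 160, 165, 170, 175, 180]  # cm
minimum, q1, med, q3, maximum = five_number_summary(heights)

iqr = q3 - q1                      # interquartile range: width of the box
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [h for h in heights if h < low_fence or h > high_fence]

print(minimum, q1, med, q3, maximum, outliers)
```

Here the median sits exactly midway between Q1 and Q3, which is what a symmetrical data set looks like in a boxplot.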
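Stem-and-Leaf Plot:
A stem-and-leaf plot splits each value into a stem (the leading digits) and a leaf
(the ones digit). Once the stems are listed, add the ones place from each of the
values next to its stem.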
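Building the plot can be sketched as follows, assuming non-negative whole numbers where the stem is the value without its ones digit and the leaf is the ones digit. The data values are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group the ones digit (leaf) of each value under its stem (value // 10)."""
    plot = defaultdict(list)
    for x in sorted(values):       # sorting keeps the leaves in ascending order
        plot[x // 10].append(x % 10)
    return dict(plot)

values = [23, 12, 38, 21, 15, 30, 23, 34]  # hypothetical data
plot = stem_and_leaf(values)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Each row reads as stem followed by its leaves, so a stem of 2 with leaves 1, 3, 3 stands for the values 21, 23, and 23.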