ADDB - Week 1

Statistics is the methodology of extracting useful information from data. Descriptive statistics organizes and summarizes data, while inferential statistics draws conclusions about populations from samples. A sample is a subset of a population. Qualitative variables use nominal or ordinal scales to categorize data, while quantitative variables use interval or ratio scales to measure numerical values. Frequency distributions organize data into classes to summarize both qualitative and quantitative variables. Graphs like pie charts and bar charts can also visually summarize qualitative data.

Uploaded by

Little A

CHAPTER 1

STATISTICS & DATA


The relevance of statistics: Daily sales record of a seller at Tokopedia – Example 1

What can you tell from the picture?


The relevance of statistics: Product record of a seller at Tokopedia – Example 2

What conclusions can you draw from the table?


Definition of Statistics

• Statistics is “the methodology of extracting useful information from a data set” (Jaggia & Kelly, 2019)
Division of Statistics:

• Descriptive statistics – collecting, organizing, and presenting the data

• Inferential statistics – drawing conclusions about the characteristics of a population by using a sample
What is your understanding of sampling and inference?
Population & Sample

• Population: all members of a specified group. Ex: 2,700,500 male Indonesians aged 75+

• Sample: a smaller set of data drawn from a specified group. Ex: 270,005 male Indonesians aged 75+

A sample statistic is a value computed from a sample and used to describe a characteristic of the population.
Discussions
• An accounting professor wants to know the average GPA of
the students enrolled in her class. She looks up information
on Blackboard about the students enrolled in her class and
computes the average GPA as 3.29.
• Describe the relevant population.
• Does the value 3.29 represent the population parameter
or the sample statistic?
Reasons to draw samples (students are asked to write
down their answers)

• 1. Resources (time, funds) are limited

• 2. The data set is too large


Types of data (time of data collection)

• Cross section: data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time.

• Time series: data collected over several time periods focusing on certain
groups of people, specific events, or objects. Time series can include hourly,
daily, weekly, monthly, quarterly, or annual observations.
Types of data (Data format)

• Structured data: generally refers to data that has a well-defined length and
format. Structured data reside in a predefined row-column format.
Structured data generally consist of numerical information that is objective.

• Unstructured data: (or un-modeled data) do not conform to a predefined row-column format.
Big Data

• A catchphrase meaning a massive volume of both structured and unstructured data that are extremely difficult to manage, process, and analyze using traditional data processing tools.

• Can you give an example?


Types of variables
• A quantitative variable assumes meaningful numerical values.
• Discrete: assumes a countable number of values.
• Continuous: is characterized by uncountable values within an interval.
• A qualitative variable uses labels or names to identify the distinguishing characteristic of each observation.
• Example: type of business, type of cars
Discussion:

• Which of the following variables are qualitative and which are quantitative?
If the variable is quantitative, then specify whether the variable is discrete or
continuous.
• Points scored in a football game.
• Colors of cars in a mall parking lot.
• Heights of 15-year-olds.
Types of measurement scales

Quantitative variables    Qualitative variables
• Interval scale          • Nominal scale
• Ratio scale             • Ordinal scale
Measurement scales for qualitative variables

• Nominal: categorizing or grouping the data by giving a label or name.
• Ex: 0 = Male; 1 = Female

• Ordinal: categorizing and ranking the data with respect to some characteristic or trait.
• Ex: Olympic Championships: gold medal champion, silver medal champion, and bronze medal champion
Measurement scales for quantitative variables

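• Interval: the differences between scale values are meaningful. Thus, the arithmetic operations of addition and subtraction are meaningful. The value of zero is arbitrary: it does not indicate the absence of what is measured, so ratios of values are not meaningful.
• Example: Temperature (in degrees Celsius or Fahrenheit)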

• Ratio: Ratio data have all the characteristics of interval data as well as a true
zero point, which allows us to interpret the ratios of values. A ratio scale is used
to measure many types of data in business analysis.
• Ex: Sales, profits, weights
Exercise

• In each of the following scenarios, define the type of measurement scale.


• A kindergarten teacher marks whether each student is a boy or a girl.
• A ski resort records the daily temperature during the month of January.
• A restaurant surveys its customers about the quality of its waiting staff
on a scale of 1 to 4, where 1 is poor and 4 is excellent.
CHAPTER 2
Tabular & Graphical Methods
Summarizing qualitative data

• Frequency distribution groups data into categories and records the number
of observations that fall into each category
• A brief outlook on automotive brands sold in February 2021 is given below:
Toyota Toyota Toyota Toyota Honda Nissan Toyota
Toyota Toyota Nissan Toyota Toyota Toyota Nissan
Toyota Toyota Toyota Nissan Toyota Nissan Toyota
Nissan Toyota Toyota Toyota Nissan Toyota Toyota
Summarizing qualitative data

BRANDS    FREQUENCY    RELATIVE FREQUENCY
HONDA     1            1/28 = 0.036
TOYOTA    20           20/28 = 0.714
NISSAN    7            7/28 = 0.250
Total     28           1.00

We calculate each category’s relative frequency by dividing the respective category’s frequency by the total number of observations.
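As a minimal sketch, the table can be reproduced from the brand list with Python's standard library. The variable names are illustrative and are not part of the original slides.

```python
from collections import Counter

# The 28 brands exactly as listed on the slide, row by row.
raw = """
Toyota Toyota Toyota Toyota Honda Nissan Toyota
Toyota Toyota Nissan Toyota Toyota Toyota Nissan
Toyota Toyota Toyota Nissan Toyota Nissan Toyota
Nissan Toyota Toyota Toyota Nissan Toyota Toyota
"""
brands = raw.split()

# Frequency: number of observations in each category.
frequency = Counter(brands)
total = len(brands)

# Relative frequency: category frequency divided by the total number of observations.
relative_frequency = {brand: count / total for brand, count in frequency.items()}

for brand, count in frequency.most_common():
    print(f"{brand:<8}{count:>4}{relative_frequency[brand]:>8.3f}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped or counted twice.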
Pie Chart

• A pie chart is a segmented circle whose segments portray the relative frequencies of the categories of some qualitative variable.
Bar chart

• A bar chart depicts the frequency or the relative frequency for each category of the qualitative variable as a series of horizontal or vertical bars, the lengths of which are proportional to the values that are to be depicted.
Discussions

• A local restaurant is committed to providing its patrons with the best dining experience possible. On a
recent survey, the restaurant asked patrons to rate the quality of their entrées. The responses ranged
from 1 to 5, where 1 indicated a disappointing entrée and 5 indicated an exceptional entrée. The
results of the survey are as follows:

a. Construct frequency and relative frequency distributions that summarize the survey’s results.
b. Are patrons generally satisfied with the quality of their entrées? Explain.
Summarizing quantitative data

• A frequency distribution groups data into classes or intervals.
• Constructing a frequency distribution requires:
• Classes must be mutually exclusive: classes do not overlap. Ex.: a class can be defined as 400 ≤ Price < 500 or as 400 < Price ≤ 500; in the second case the value 400 is included in the previous class interval.
• Classes must be exhaustive: together, the classes cover the entire sample (or population).
Summarizing quantitative data

• The number of classes usually ranges from 5 to 20. This is a guideline, not an
absolute rule.
• Approximating the class width:
What is the price range?
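The slide's own formula did not survive extraction. A common textbook guideline, assumed here, is to divide the data range by the number of classes and then round to a convenient value. In the sketch below the function name is hypothetical, and 300 and 800 are stand-in values taken from the class limits used later in this chapter, not the actual smallest and largest prices.

```python
def approximate_class_width(smallest, largest, n_classes):
    """Rough class width: the range of the data divided by the number of classes."""
    if n_classes <= 0:
        raise ValueError("n_classes must be positive")
    return (largest - smallest) / n_classes

# Stand-in values (assumption): prices between 300 and 800, in $1,000s, and 5 classes.
width = approximate_class_width(300, 800, 5)
print(width)  # 100.0
```

In practice the result is usually rounded up to a value that is easy to read, such as 100.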
For quantitative data, a frequency distribution groups data into intervals called classes and records the number of observations that fall into each class.

A cumulative frequency distribution records the number of observations that fall below the upper limit of each class.

Class (in $1,000s)    Frequency    Cumulative Frequency
300 up to 400         4            4
400 up to 500         11           4 + 11 = 15
500 up to 600         14           4 + 11 + 14 = 29
600 up to 700         5            4 + 11 + 14 + 5 = 34
700 up to 800         2            4 + 11 + 14 + 5 + 2 = 36
Total                 36
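The running totals in the table can be sketched with a cumulative sum. The class labels and frequencies are copied from the table; the variable names are illustrative.

```python
from itertools import accumulate

classes = ["300 up to 400", "400 up to 500", "500 up to 600",
           "600 up to 700", "700 up to 800"]
frequencies = [4, 11, 14, 5, 2]

# Each cumulative frequency is the sum of the frequencies up to and including that class.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(classes, frequencies, cumulative):
    print(f"{label:<16}{freq:>4}{cum:>6}")
```

The last cumulative frequency equals the total number of observations (36).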
Relative & cumulative relative frequency distribution
A relative frequency distribution identifies the proportion (or the fraction) of observations that fall into each class; that is, class relative frequency = class frequency / total number of observations.

A cumulative relative frequency distribution records the proportion (or the fraction) of observations that fall below the upper limit of each class.
Example of relative and cumulative relative frequency distribution

Class (in $1,000s)    Frequency    Relative Frequency    Cumulative Relative Frequency
300 up to 400         4            4/36 = 0.11           0.11
400 up to 500         11           11/36 = 0.31          0.11 + 0.31 = 0.42
500 up to 600         14           14/36 = 0.39          0.11 + 0.31 + 0.39 = 0.81
600 up to 700         5            5/36 = 0.14           0.11 + 0.31 + 0.39 + 0.14 = 0.95
700 up to 800         2            2/36 = 0.06           0.11 + 0.31 + 0.39 + 0.14 + 0.06 ≈ 1.0
Total                 36           1.0
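The table adds up relative frequencies that were already rounded, which is why the last row is only approximately 1.0. A small sketch, assuming the same five class frequencies, computes the cumulative proportions from the cumulative counts instead, so rounding happens only when printing. Variable names are illustrative.

```python
from itertools import accumulate

frequencies = [4, 11, 14, 5, 2]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total number of observations.
relative = [f / total for f in frequencies]

# Cumulative relative frequency: cumulative count divided by the total.
cumulative_relative = [c / total for c in accumulate(frequencies)]

for rel, cum in zip(relative, cumulative_relative):
    print(f"{rel:.2f}  {cum:.2f}")
```

Computed this way, the fourth class gives 34/36 ≈ 0.94 rather than the 0.95 obtained by summing rounded values, and the last class gives exactly 1.00.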
Histograms
(students are asked to tell the difference between two provided histograms)

• A histogram is a series of rectangles where the width and height of each rectangle represent the class width and frequency (or relative frequency) of the respective class.
How many types of Histograms?

Can you tell the difference? Explain!


Polygon

A polygon connects a series of neighboring points where each point represents the midpoint of a particular class and its associated frequency or relative frequency.
Ogive

• An ogive connects a series of neighboring points where each point represents the
upper limit of a particular class and its associated cumulative frequency or
cumulative relative frequency.

• An ogive differs from a polygon in that it uses the upper limit of each class as the x-coordinate and the cumulative frequency or cumulative relative frequency of the corresponding class as the y-coordinate. To close the ogive, we close only the lower end, by intersecting the x-axis at the lower limit of the first class.
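To make the difference concrete, the sketch below lists the points that a polygon and an ogive would connect for the price classes used earlier in this chapter (300 up to 800, in $1,000s, with frequencies 4, 11, 14, 5, 2). It only computes coordinates; plotting is left out, and the variable names are illustrative.

```python
from itertools import accumulate

lower_limits = [300, 400, 500, 600, 700]
upper_limits = [400, 500, 600, 700, 800]
frequencies = [4, 11, 14, 5, 2]

# Polygon: x = class midpoint, y = class frequency.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]
polygon_points = list(zip(midpoints, frequencies))

# Ogive: x = class upper limit, y = cumulative frequency,
# closed at the lower end at the lower limit of the first class.
cumulative = list(accumulate(frequencies))
ogive_points = [(lower_limits[0], 0)] + list(zip(upper_limits, cumulative))

print(polygon_points)  # [(350.0, 4), (450.0, 11), (550.0, 14), (650.0, 5), (750.0, 2)]
print(ogive_points)    # [(300, 0), (400, 4), (500, 15), (600, 29), (700, 34), (800, 36)]
```

Because cumulative frequencies never decrease, the ogive rises from left to right, while the polygon follows the shape of the frequency distribution.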
Example of Ogive
Stem & Leaf Diagram

A stem-and-leaf diagram is constructed by separating each value of a data set into two parts: a stem, which consists of the leftmost digits, and a leaf, which consists of the last digit.
Is the stem-and-leaf diagram symmetric? Explain!
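The construction described above can be sketched in a few lines. The function name and the data values are made up for illustration and are not from the slides; the sketch assumes non-negative whole numbers, and stems with no observations are simply not printed.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leftmost digits) to the sorted list of its leaves (last digit)."""
    diagram = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 47 -> stem 4, leaf 7
        diagram[stem].append(leaf)
    return dict(diagram)

# Hypothetical data set (assumption, for illustration only).
ages = [36, 48, 41, 52, 36, 45, 59, 41, 47, 50]

for stem, leaves in stem_and_leaf(ages).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Reading the rows as sideways bars gives a rough picture of the shape of the distribution, which is what the symmetry question asks about.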
Exercise

Consider the following data set


5.4 4.6 3.5 2.8 2.6 5.5 5.5 2.3 3.2 4.2
4.0 3.0 3.6 4.5 4.7 4.2 3.3 3.2 4.2 3.4

Construct a stem & leaf diagram. Is the distribution symmetric? Explain!


Scatterplots
• Displaying the relationship between two variables.
• Incomes vary with education.
• Sales vary with advertising expenditures.
• Price varies with reliability

• A scatterplot is a graphical tool that helps in determining whether or not two quantitative variables are related in some systematic way. Each point in the diagram represents a pair of observed values of the two variables.
Types of Scatterplot (Linear)

Linear scatterplot with a negative slope:
• The two variables have a negative linear relationship
• As the value of X increases, the value of Y decreases
Linear Scatter Plot

Linear Scatter plot with a positive slope


Types of Scatterplot (Non-Linear)

• Non-linear scatterplot with a positive slope: as the value of X increases, the value of Y increases at an increasing or decreasing rate
Types of Scatterplot (No-relationship)

• A no-relationship scatterplot occurs when there is no relationship between the two variables: the points show no systematic pattern.
