unit 1 DS vs IS
Statistics
Descriptive statistics provide a summary of the features or attributes of a dataset, while inferential
statistics enable hypothesis testing and evaluation of the applicability of the data to a larger population.
The key differences between descriptive and inferential statistics are outlined below.
Inferential Statistics helps to draw conclusions and make predictions based on a data set. It is done
using several techniques, methods, and types of calculations. Some of the most important types of
inferential statistics calculations are:
1. Regression Analysis
Regression models show the relationship between a set of independent variables and a dependent
variable. This statistical method lets you predict the value of the dependent variable based on
different values of the independent variables. Hypothesis tests are incorporated to determine whether
the relationships observed in sample data actually exist in the population.
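As a minimal sketch of the idea, the following fits a straight line by ordinary least squares and uses it to predict the dependent variable for a new value of the independent variable. The data (hours studied and exam scores) and the function name fit_line are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (independent) and exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)

# Predict the score for a student who studies 6 hours.
predicted = intercept + slope * 6
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

The slope here describes only the sample; deciding whether the relationship holds in the population would still require a hypothesis test on the slope.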
2. Hypothesis Tests
Hypothesis testing is used to compare entire populations or assess relationships between variables
using samples. Hypotheses or predictions are tested using statistical tests so as to draw valid
inferences.
3. Confidence Intervals
The main goal of inferential statistics is to estimate population parameters, which are mostly
unknown or unknowable values. A confidence interval uses the variability in a sample statistic to
produce an interval estimate for a parameter. Confidence intervals take uncertainty and sampling
error into account to create a range of values within which the actual population value is estimated
to fall. Each confidence interval is associated with a confidence level, which indicates the percentage
of such intervals that would contain the population parameter if you repeated the study many times.
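The calculation can be sketched as follows for a population mean. This is an illustration under stated assumptions, not a definitive method: it uses the normal approximation (reasonable for larger samples; a t-based interval would be more appropriate for small ones), and the sample is simulated, hypothetical data.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return centre - margin, centre + margin


# Hypothetical sample of 50 measurements (simulated for illustration).
rng = random.Random(0)
sample = [rng.gauss(100, 15) for _ in range(50)]

low_95, high_95 = mean_confidence_interval(sample, 0.95)
low_99, high_99 = mean_confidence_interval(sample, 0.99)
print(f"95% CI: ({low_95:.2f}, {high_95:.2f})")
print(f"99% CI: ({low_99:.2f}, {high_99:.2f})")
```

A higher confidence level gives a wider interval for the same sample, which is the trade-off between confidence and precision.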
Descriptive statistics are used to enumerate and explain a dataset's key characteristics.
Measures like mean, median, mode, range, variance, and standard deviation are some examples. For
instance, if you wanted to summarize the ages of a group of individuals, you could use descriptive
statistics to determine the average age, the age distribution, and the standard deviation of the ages.
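The age example could look like this in practice. The list of ages is hypothetical, and the sketch uses only Python's standard statistics module.

```python
import statistics

# Hypothetical ages of a group of ten individuals.
ages = [23, 25, 25, 29, 31, 34, 35, 35, 35, 42]

summary = {
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "range": max(ages) - min(ages),
    "variance": statistics.variance(ages),  # sample variance (n - 1 denominator)
    "std_dev": statistics.stdev(ages),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

These numbers describe only the ten people in the list; nothing is claimed about any larger group.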
Using a sample of data, inferential statistics is used to draw conclusions or generalizations about a
broader population. Examples include regression analysis, confidence intervals, and hypothesis testing.
For instance, if you want to know whether a new drug is effective, you could use inferential statistics
to assess whether there is a significant difference between the outcomes of patients who receive the
drug and those who receive a placebo.
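One possible way to run such a comparison is a two-proportion z-test, sketched below. The choice of test, the function name, and the patient counts are all assumptions made for illustration; the text does not prescribe a specific test, and a real trial would choose its analysis based on the study design.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical trial: 60 of 100 patients improved on the drug,
# 45 of 100 improved on the placebo.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A p-value below the chosen significance level (commonly 0.05) would be read as evidence that the improvement rates differ in the population, not just in this sample.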
Introduction
Hypothesis testing is one of the most important techniques applied in various fields such as
statistics, economics, pharmaceuticals, mining, and manufacturing. Suppose we want to know
whether something took place, whether certain medicines are effective, whether groups differ
from each other, or whether one variable predicts another variable.
All in all, we want to determine whether one set of collected data differs from another to a statistically
significant degree. This article is for anyone who wants to know and understand the concept of hypothesis
testing, which is a significant component of inferential statistics. The 5 steps taken to conduct the
hypothesis testing have been explained in detail.
Hypothesis testing is an inferential statistical method that uses sample data to test assumptions
about a population parameter (a characteristic that describes a population).
Unlike inferential statistics, descriptive statistics simply describes a data set without helping to draw
inferences. In this sense, inferential statistics goes beyond descriptive statistics. It is particularly
useful when it is not possible to examine each data point of the population.
Sampling error can also arise here. This error occurs when the sample
drawn does not represent the entire population. To reduce this error, it is
recommended to collect a random sample before applying inferential statistics.
Inferential statistics requires logical reasoning to arrive at the results. The procedure
of reaching the outcomes is stated as follows:
a. A sample is chosen from the population that needs to be studied. The chosen sample must
reflect the nature and characteristics of the population.
b. The tools of inferential statistics are applied to the sample to assess its behavior. These
include the regression models and the hypothesis testing models. The former consists of
linear regression, nominal regression, logistic regression, etc., while the latter consists of the
z-test, t-test, f-test, analysis of variance (ANOVA), etc.
c. Inferences are drawn from the sample chosen in the first step. The inferences are
assumptions or estimations related to the entire population.
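The three steps above can be sketched with a simulated population. Everything here is hypothetical: the population of heights is generated only so that the estimate can be compared with the true value, which is normally unknown in practice.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical population: heights in cm of 100,000 people.
population = [rng.gauss(170, 10) for _ in range(100_000)]

# Step a: choose a random sample from the population.
sample = rng.sample(population, 200)

# Step b: apply a tool to the sample (here, simply the sample mean).
sample_mean = mean(sample)

# Step c: use the sample result as an estimate for the whole population.
population_mean = mean(population)
error = abs(sample_mean - population_mean)
print(f"estimate = {sample_mean:.2f}, true mean = {population_mean:.2f}, error = {error:.2f}")
```

Random selection in step a is what makes the sample likely to reflect the population, so the estimate in step c lands close to the true mean.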
Types
Let us go through the types of tools used under inferential statistics.
#1 – Regression Analysis
It measures the change in one variable with respect to the other variable. Linear
regression is popularly used in inferential statistics.
#2 – Hypothesis Testing
a) Z-test
The z-test is used when the sample size is greater than or equal to 30 and the data set
follows a normal distribution. The population variance is known to the researcher.
The formulas are given as follows:
Null hypothesis: H0: μ = μ0
Test statistic: z = (x̄ − μ0) / (σ / √n)
where,
x̄ = sample mean
μ = population mean (μ0 is its hypothesized value)
σ = standard deviation of the population
n = sample size
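A minimal sketch of a one-sample z-test follows, using the standard test statistic z = (x̄ − μ0) / (σ / √n). The function name and the input numbers are hypothetical, and the two-sided p-value is one common choice of alternative hypothesis rather than something the text specifies.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test with known population standard deviation."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical inputs: sample mean 52 from n = 36 observations,
# hypothesized mean 50, known population standard deviation 6.
z, p_value = one_sample_z_test(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these numbers the standard error is 6 / √36 = 1, so the sample mean lies two standard errors above the hypothesized mean.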
b) T-test
The t-test is used when the sample size is less than 30 and the data are approximately normally
distributed; the test statistic then follows a t-distribution. The population variance is not known
to the researcher. The formulas are given as follows:
Null hypothesis: H0: μ = μ0
Test statistic: t = (x̄ − μ0) / (s / √n)
The representations x̄, μ, and n are the same as stated for the z-test. The letter “s”
represents the standard deviation of the sample.
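A one-sample t-test can be sketched as below, using the standard statistic t = (x̄ − μ0) / (s / √n) with n − 1 degrees of freedom. The sample values and function name are hypothetical. Because Python's standard library has no t-distribution, the sketch compares against a critical value taken from a standard t-table (2.262 for 9 degrees of freedom at the two-sided 5% level) instead of computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_statistic(sample, mu0):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # s / sqrt(n), s = sample std dev
    t_stat = (mean(sample) - mu0) / standard_error
    return t_stat, n - 1


# Hypothetical sample of 10 measurements; hypothesized mean is 12.0.
sample = [12.1, 11.8, 12.4, 12.0, 12.3, 11.9, 12.2, 12.5, 12.0, 12.3]
t_stat, df = one_sample_t_statistic(sample, mu0=12.0)

# Two-sided critical value for df = 9 at the 5% level, from a t-table.
T_CRITICAL_DF9 = 2.262
reject_null = abs(t_stat) > T_CRITICAL_DF9
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_null}")
```

The only structural difference from the z-test is that the sample standard deviation s replaces the population value σ, which is why the heavier-tailed t-distribution supplies the critical value.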
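c) F-test
The F-test checks whether the variances of two samples or populations differ. The test statistic is
F = s1² / s2²
where,
s1² = variance of the first sample
s2² = variance of the second sample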
d) Confidence interval
It gives the range within which the population parameter is estimated to fall, based on
the sample. A narrow interval at a high confidence level suggests that the sample
results closely reflect the behavior of the population.
Example
Let us consider an example of inferential statistics.
Mr. A wants to open a coffee shop in New York, USA. To design the appropriate
menu, a survey of 300 residents is conducted with the aim of understanding their
tastes and preferences. The survey includes people of different age groups, genders,
and income classes. After applying the tools of inferential statistics, the results are
stated as follows:
70% of women like the caramel macchiato.
50% of the total residents like café mocha.
Almost 100% of the adults like Americano coffee.
25% of teenagers like café latte.
With these outcomes, Mr. A is confident that including all the above varieties of
coffee will bring diverse customers to his shop. Moreover, Mr. A also wants to add
new, innovative flavors to give a rich drinking experience to his customers.
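To illustrate how far such a survey result can be generalized, the sketch below puts a 95% confidence interval around the café mocha figure (50% of 300 residents). It assumes the 300 respondents are a simple random sample of the residents, which the example does not state, and it uses the normal approximation for a proportion; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Survey figures from the example: 50% of 300 residents like café mocha.
low, high = proportion_confidence_interval(0.50, 300)
print(f"95% CI for café mocha preference: ({low:.3f}, {high:.3f})")
```

Under these assumptions the interval runs from roughly 44% to 57%, so the survey supports "about half of residents" rather than exactly 50%.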