Unit 5 SP(Notes Questionbank)
Statistical analysis in various fields (business, healthcare, social sciences), Design and interpretation
of experiments, Quality control and process improvement, Introduction to regression analysis.
Contents
Statistical analysis in various fields (business, healthcare, social sciences)
Statistical Analysis Methods for Business
Statistical analysis in healthcare
Statistical analysis in social science
Design and interpretation of experiments
Quality control and process improvement
Introduction to regression analysis
Why do we use Regression Analysis?
Linear Regression
Regression Equation
Regression equations using regression coefficient (actual values of X and Y series)
Question Bank
Statistical analysis in various fields (business, healthcare, social sciences)
Statistical analysis is the process of collecting and analyzing data in order to discern patterns and trends. It is a method for reducing bias in the evaluation of data by employing numerical analysis. This technique is useful for interpreting research results, developing statistical models, and planning surveys and studies.
Statistical analysis is a scientific tool used in AI and ML that helps collect and analyze large amounts of data to identify common patterns and trends and convert them into meaningful information. In simple words, statistical analysis is a data analysis tool that helps draw meaningful conclusions from raw and unstructured data.
The conclusions drawn using statistical analysis facilitate decision-making and help businesses make predictions about the future on the basis of past trends. Statistical analysis can be defined as the science of collecting and analyzing data to identify trends and patterns and presenting the results. It involves working with numbers and is used by businesses and other institutions to derive meaningful information from data.
Descriptive Analysis
Descriptive statistical analysis involves collecting, interpreting, analyzing, and summarizing data and presenting it in the form of charts, graphs, and tables. Rather than drawing conclusions, it simply makes complex data easy to read and understand.
Inferential Analysis
Inferential statistical analysis focuses on drawing meaningful conclusions from the data analyzed. It studies the relationships between different variables or makes predictions about the whole population based on a sample.
Predictive Analysis
Predictive statistical analysis analyzes data to identify past trends and predict future events on the basis of them. It uses machine learning algorithms, data mining, data modelling, and artificial intelligence to carry out the analysis.
Prescriptive Analysis
Prescriptive analysis analyzes data and prescribes the best course of action based on the results. It is a type of statistical analysis that helps you make an informed decision.
Exploratory Analysis
Exploratory analysis is similar to inferential analysis, but it involves exploring unknown associations in the data. It analyzes the potential relationships within the data.
Causal Analysis
Causal statistical analysis focuses on determining the cause-and-effect relationship between different variables within the raw data. In simple words, it determines why something happens and what effect it has on other variables. Businesses can use this methodology to determine the reasons for a failure.
Statistical analysis eliminates unnecessary information and catalogs important data in an uncomplicated manner, making the enormous task of organizing inputs far more manageable. Once the data has been collected, statistical analysis can be used for a variety of purposes. Some of them are listed below:
Statistical analysis helps summarize enormous amounts of data into easily digestible chunks.
Statistical analysis aids in the effective design of laboratory, field, and survey investigations.
Statistical analysis supports sound and efficient planning in any field of study.
Statistical analysis aids in establishing broad generalizations and in forecasting how much of something will occur under particular conditions.
Statistical methods, which are effective tools for interpreting numerical data, are applied in practically every field of study. Statistical approaches have been developed for, and are increasingly applied in, the physical and biological sciences, such as genetics.
Statistical approaches are used in the work of businesspeople, manufacturers, and researchers. Statistics departments can be found in banks, insurance companies, and government agencies.
A modern administrator, whether in the public or private sector, relies on statistical data to make correct decisions.
Politicians can use statistics to support and validate their claims and to explain the issues they address.
Multiple Regression
Whereas simple (single-variable) linear regression studies the relationship between two variables, a dependent variable and an independent variable, multiple regression analysis investigates the relationship between a dependent variable and multiple independent variables.
Forecasting with multiple regression analysis is similar to forecasting with simple linear regression. However, instead of entering only one value for an independent variable, a value is entered for each independent variable.
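To make the forecasting step concrete, here is a minimal Python sketch, assuming a model with two independent variables of the form y = a + b1·x1 + b2·x2 fitted by ordinary least squares through its normal equations. The function names `fit_multiple_regression` and `solve_linear_system` are hypothetical, and so is the data: it was constructed from y = 1 + 2·x1 + 3·x2 so that the fitted coefficients can be checked by eye.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b]; copying rows leaves the inputs unchanged.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_multiple_regression(x1, x2, y):
    """Fit y = a + b1*x1 + b2*x2 by solving the least-squares normal equations."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    A = [[n, s1, s2], [s1, s11, s12], [s2, s12, s22]]
    rhs = [sy, s1y, s2y]
    return solve_linear_system(A, rhs)  # [a, b1, b2]


if __name__ == "__main__":
    # Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
    x1 = [1, 2, 3, 4, 5]
    x2 = [2, 1, 4, 3, 5]
    y = [9, 8, 19, 18, 26]
    a, b1, b2 = fit_multiple_regression(x1, x2, y)
    # Forecast: supply one value for EACH independent variable.
    print(round(a + b1 * 6 + b2 * 4, 6))
```

In practice a library routine such as NumPy's `numpy.linalg.lstsq` would usually replace the hand-written elimination; it is spelled out here only to mirror the normal-equation approach used for simple regression in these notes.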
2. Clinical Trials
3. Patient Outcomes
Measures and analyzes patient recovery rates, mortality, and satisfaction.
4. Predictive Analytics
Example: Analyzing patient admission data to improve scheduling and reduce wait times.
6. Genomic Studies
Common Statistical Techniques in Healthcare
Descriptive Statistics: Summarizes data using measures like mean, median, and mode.
Inferential Statistics: Makes predictions or inferences about a population based on sample data.
Regression Analysis: Identifies relationships between variables (e.g., age and disease risk).
Machine Learning and AI: Advanced models for analyzing complex datasets.
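Inferential statistics, as listed above, draws conclusions about a population from a sample. One common example is a confidence interval for a population mean. The Python sketch below is illustrative only: it assumes the normal approximation (a t-based interval is usually preferred for small samples), and the function name `mean_confidence_interval` and the blood-pressure readings are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for a population mean.

    Uses the normal approximation (z critical value); for small samples a
    t-based interval would usually be more appropriate.
    """
    n = len(sample)
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se


if __name__ == "__main__":
    # Hypothetical systolic blood pressure readings (mmHg) from a patient sample.
    readings = [120, 125, 130, 118, 122, 128, 135, 121]
    low, high = mean_confidence_interval(readings)
    print(f"95% CI for the mean: {low:.1f} to {high:.1f}")
```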
2. Education Research
Example: Assessing the relationship between class size and academic achievement.
3. Sociology
Example: Investigating income inequality and its correlation with education levels.
4. Psychology
Analyzes behavioral data to understand mental health, cognition, and emotions.
Example: Evaluating the effectiveness of therapy techniques using experimental designs.
o Summarizes data using measures like mean, median, standard deviation, and frequency distributions.
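The descriptive measures named above can all be computed with Python's standard `statistics` module. This is a minimal sketch; the helper name `describe` and the survey responses are hypothetical.

```python
import statistics
from collections import Counter


def describe(data):
    """Return common descriptive summaries of a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        # Frequency distribution: value -> number of occurrences, sorted by value.
        "frequencies": dict(sorted(Counter(data).items())),
    }


if __name__ == "__main__":
    # Hypothetical survey responses on a 1-5 scale.
    responses = [3, 4, 4, 2, 5, 4, 3, 1, 4, 5]
    print(describe(responses))
```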
2. Inferential Statistics:
3. Regression Analysis:
4. Factor Analysis:
5. Cluster Analysis:
6. Longitudinal Analysis:
Design and interpretation of experiments
Design of experiments (DOE) is a systematic, efficient method that enables scientists and engineers to study the relationship between multiple input variables (also called factors) and key output variables (also called responses). It is a structured approach to collecting data and making discoveries.
Ronald Fisher first introduced four enduring principles of DOE in 1926: the factorial principle, randomization, replication, and blocking. In the past, generating and analyzing these designs relied primarily on hand calculation; only relatively recently have practitioners started using computer-generated designs for more effective and efficient DOE.
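Three of Fisher's principles can be seen in a small computer-generated design: the factorial principle (every combination of factor levels is run), replication (each combination is repeated), and randomization (the run order is shuffled). The Python sketch below is an illustration under those assumptions only; blocking is not modelled, and the function name `full_factorial` and the factor names and levels are hypothetical.

```python
import itertools
import random


def full_factorial(factors, replicates=1, seed=None):
    """Build a randomized full factorial design.

    factors: dict mapping each factor name to a tuple of its levels.
    Every combination of levels (factorial principle) appears
    `replicates` times (replication), and the run order is shuffled
    (randomization). Blocking is not modelled in this sketch.
    """
    names = list(factors)
    runs = [
        dict(zip(names, combo))
        for combo in itertools.product(*(factors[name] for name in names))
        for _ in range(replicates)
    ]
    random.Random(seed).shuffle(runs)
    return runs


if __name__ == "__main__":
    # Hypothetical process factors, each at a low and a high level.
    design = full_factorial(
        {"temperature": (150, 180), "pressure": (1, 2), "time": (10, 20)},
        replicates=2,
        seed=7,
    )
    for run_number, settings in enumerate(design, start=1):
        print(run_number, settings)
```

With three two-level factors and two replicates this yields 2³ × 2 = 16 runs in random order.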
DOE is useful:
To run trials that span the potential experimental region for our factors.
The simplest form of experimental research design in statistics is the pre-experimental research design. In this method, one or more groups are kept under observation after some factors have been identified as possible causes of an effect. This method is usually conducted to understand whether further investigation of the targeted group is needed, which is why it is considered cost-effective. It is classified into three types: the one-shot case study, the one-group pretest-posttest design, and the static-group comparison.
True Experimental Research Design
This is the most accurate form of experimental research design, as it relies on statistical analysis to support or reject a hypothesis. It is the most commonly used method in the physical sciences. True experimental research design is the only method that establishes a cause-and-effect relationship within the groups. The factors that need to be satisfied in this method are:
Random assignment (participants are assigned to groups at random)
Control Group (a group of participants similar to the experimental group, but to whom the experimental conditions are not applied)
Experimental Group (research participants to whom the experimental conditions are applied)
Quasi-Experimental Design
In a true experimental design, participants are assigned to groups randomly, so every unit has an equal chance of getting into the experimental group.
In a quasi-experimental design, participants are not randomly assigned to groups, so the researcher cannot draw firm cause-and-effect conclusions. This design is typically used when it is not possible to assign participants to groups at random.
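The random assignment that distinguishes a true experiment from a quasi-experiment can be sketched in a few lines of Python: shuffling the participant list uniformly gives every participant the same chance of landing in the experimental group. The function name `randomly_assign` and the participant IDs are hypothetical, and the even split into two halves is an assumption for illustration.

```python
import random


def randomly_assign(participants, seed=None):
    """Randomly split participants into experimental and control groups.

    A uniform shuffle gives every participant the same chance of ending up
    in either group; the first half becomes the experimental group.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


if __name__ == "__main__":
    # Hypothetical participant IDs P01 to P10.
    people = [f"P{i:02d}" for i in range(1, 11)]
    print(randomly_assign(people, seed=42))
```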
Quality control and process improvement
1. Consistency: Ensures that products and processes meet predefined quality standards.
2. Customer Satisfaction: Maintains and improves product quality to meet or exceed customer expectations.
3. Cost Reduction: Minimizes waste, defects, and operational inefficiencies.
o Defining acceptable limits or tolerances for product attributes (e.g., weight, size, performance).
Control Charts:
o Types:
P and C Charts: For attribute data (e.g., P charts for the proportion of defective units or pass/fail rates, C charts for the number of defects per unit).
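For a P chart, the usual three-sigma limits for samples of equal size n are p̄ ± 3√(p̄(1 − p̄)/n), where p̄ is the overall proportion defective, with the lower limit clipped at zero because a proportion cannot be negative. A minimal Python sketch, assuming equal sample sizes; the function name `p_chart` and the inspection counts are hypothetical.

```python
import math


def p_chart(defectives, sample_size):
    """Compute p-chart limits for equal-size samples and flag out-of-control points."""
    proportions = [d / sample_size for d in defectives]
    p_bar = sum(defectives) / (sample_size * len(defectives))  # center line
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    ucl = p_bar + 3 * sigma
    lcl = max(0.0, p_bar - 3 * sigma)  # a proportion cannot be negative
    # Indices of samples whose proportion falls outside the control limits.
    out_of_control = [i for i, p in enumerate(proportions) if p > ucl or p < lcl]
    return p_bar, lcl, ucl, out_of_control


if __name__ == "__main__":
    # Hypothetical defective counts in 10 samples of 100 units each.
    counts = [4, 6, 5, 3, 7, 5, 20, 4, 6, 5]
    p_bar, lcl, ucl, flagged = p_chart(counts, 100)
    print(f"CL={p_bar:.3f} LCL={lcl:.3f} UCL={ucl:.3f} flagged samples={flagged}")
```

Here the sample with 20 defectives (index 6) exceeds the upper control limit and would be investigated for a special cause.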
Assesses whether a process can consistently produce products within specification limits.
Key Metrics:
Tools:
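Process capability is commonly summarized by the indices Cp = (USL − LSL)/(6σ) and Cpk = min(USL − μ, μ − LSL)/(3σ), where USL and LSL are the upper and lower specification limits. Cp compares the specification width with the process spread, while Cpk also penalizes a process that is off-center. The sketch below assumes σ is estimated by the overall sample standard deviation (control-chart practice often uses a within-subgroup estimate instead); the function name `capability`, the specification limits, and the measurements are hypothetical.

```python
import statistics


def capability(measurements, lsl, usl):
    """Return (Cp, Cpk) using the sample mean and sample standard deviation."""
    mu = statistics.mean(measurements)
    sigma = statistics.stdev(measurements)
    cp = (usl - lsl) / (6 * sigma)  # potential capability (ignores centering)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)  # actual capability (accounts for centering)
    return cp, cpk


if __name__ == "__main__":
    # Hypothetical packet weights (g) with specification limits 495 g to 505 g.
    weights = [499, 501, 500, 502, 498, 500, 501, 499]
    cp, cpk = capability(weights, lsl=495, usl=505)
    print(f"Cp={cp:.2f} Cpk={cpk:.2f}")
```

Because these hypothetical weights are centered on 500 g, Cp and Cpk coincide; shifting the process mean away from the center lowers Cpk but not Cp.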
4. Hypothesis Testing
Example: Determining whether a new material reduces defect rates compared to the current material.
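The defect-rate example above can be framed as a one-sided two-proportion z-test of H0: p_new = p_current against H1: p_new < p_current, using the pooled proportion for the standard error. This Python sketch assumes samples large enough for the normal approximation; the function name `two_proportion_z_test` and the defect counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(defects_new, n_new, defects_old, n_old):
    """One-sided test of H0: p_new = p_old against H1: p_new < p_old."""
    p_new = defects_new / n_new
    p_old = defects_old / n_old
    # Pooled proportion under H0 (both materials share one defect rate).
    pooled = (defects_new + defects_old) / (n_new + n_old)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    z = (p_new - p_old) / se
    p_value = NormalDist().cdf(z)  # chance of a z this low or lower under H0
    return z, p_value


if __name__ == "__main__":
    # Hypothetical results: 18 defects in 500 units (new) vs 40 in 500 (current).
    z, p = two_proportion_z_test(18, 500, 40, 500)
    print(f"z = {z:.2f}, one-sided p-value = {p:.4f}")
```

A p-value below the chosen significance level (for example 0.05) would lead to rejecting H0 in favour of the new material having a lower defect rate.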
Systematically investigates the effects of multiple variables on a process.
DMAIC:
7. Reliability Analysis
Challenges in Quality Control and Process Improvement
2. Resistance to Change:
3. Complex Processes:
4. Balancing Costs:
Introduction to regression analysis
Example: Suppose there is a marketing company A that runs various advertisements every year and earns sales from them. The list below shows the amount the company spent on advertising in each of the last 5 years and the corresponding sales:
Now, the company wants to spend $200 on advertising in 2019 and wants a prediction of its sales for that year. To solve this type of prediction problem in machine learning, we need regression analysis.
Why do we use Regression Analysis?
o Regression estimates the relationship between the target (dependent) variable and the independent variables.
o By performing regression, we can determine the most important factor, the least important factor, and how each factor affects the target variable.
Linear Regression:
o Linear regression is a statistical regression method used for predictive analysis.
o It is one of the simplest regression algorithms and shows the relationship between continuous variables.
o Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), the model is called simple linear regression; if there is more than one input variable, it is called multiple linear regression.
o The relationship between the variables in a linear regression model can be illustrated with the image below, in which an employee's salary is predicted on the basis of years of experience.
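o Below is the mathematical equation for linear regression:
$$Y = a + bX$$
where a is the intercept and b is the slope (regression coefficient). The constants a and b are obtained from the normal equations below.
Regression Equations of X on Y (X = a + bY)
$$\sum x = Na + b\sum y$$
$$\sum xy = a\sum y + b\sum y^2$$
Regression Equations of Y on X (Y = a + bX)
$$\sum y = Na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$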
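Both sets of normal equations are the same 2 × 2 linear system with the roles of x and y swapped, so one small function can fit either line. This Python sketch solves the system in closed form (Cramer's rule) and applies it to the data from the worked questions in these notes (X = 1..5, Y = 2, 5, 3, 8, 7); the function name `normal_equations` is hypothetical.

```python
def normal_equations(u, v):
    """Fit v = a + b*u by solving the two least-squares normal equations:
         sum(v)   = N*a      + b*sum(u)
         sum(u*v) = a*sum(u) + b*sum(u**2)
    """
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(p * q for p, q in zip(u, v))
    suu = sum(p * p for p in u)
    det = n * suu - su * su  # determinant of the 2x2 system
    b = (n * suv - su * sv) / det
    a = (sv - b * su) / n
    return a, b


x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
a_yx, b_yx = normal_equations(x, y)  # Y on X: Y = a + bX
a_xy, b_xy = normal_equations(y, x)  # X on Y: X = a + bY
print(f"Y = {a_yx:.2f} + {b_yx:.2f}X")
print(f"X = {a_xy:.2f} + {b_xy:.2f}Y")
```

This prints `Y = 1.10 + 1.30X` and `X = 0.50 + 0.50Y`, the same lines obtained by hand from the regression coefficients.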
Question: Calculate the regression equation of X on Y from the following data by the method of least squares:
X 1 2 3 4 5
Y 2 5 3 8 7
Solution:
Note: If a question says to solve using the “least squares” method, it means solving with the normal equations.
$$\sum x = Na + b\sum y \qquad (1)$$
$$\sum xy = a\sum y + b\sum y^2 \qquad (2)$$
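Substituting the column sums of the data (N = 5, ∑x = 15, ∑y = 25, ∑xy = 88, ∑y² = 151) into equations (1) and (2) and solving them simultaneously is one way to complete the solution:

$$
\begin{aligned}
15 &= 5a + 25b && (1)\\
88 &= 25a + 151b && (2)\\
75 &= 25a + 125b && (3) = (1) \times 5\\
13 &= 26b \;\Rightarrow\; b = 0.5 && (2) - (3)\\
15 &= 5a + 25(0.5) \;\Rightarrow\; a = 0.5 && \text{from } (1)
\end{aligned}
$$

$$X = 0.5 + 0.5Y$$

This agrees with the X on Y line obtained from the regression coefficient for the same data, x = 0.5y + 0.5.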
Homework:
Obtain the regression equation of Y on X by the least squares method for the following data. Also estimate the value of y when x = 10.
X 1 2 3 4 5
Y 9 9 10 12 11
Question:
Calculate the regression equations of X on Y and Y on X from the following data using regression coefficients:
X 1 2 3 4 5
Y 2 5 3 8 7
Solution:
X on Y
$$(x - x') = b_{xy}(y - y')$$
where $b_{xy}$ is the regression coefficient of X on Y:
$$b_{xy} = \frac{N\sum xy - \sum x \sum y}{N\sum y^2 - (\sum y)^2}$$
X    Y    XY    Y²    X²
1    2    2     4     1
2    5    10    25    4
3    3    9     9     9
4    8    32    64    16
5    7    35    49    25
∑x = 15    ∑y = 25    ∑xy = 88    ∑y² = 151    ∑x² = 55
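$$x' = \frac{\sum x}{N} = \frac{15}{5} = 3$$
$$y' = \frac{\sum y}{N} = \frac{25}{5} = 5$$
$$(\sum y)^2 = 25^2 = 625$$
$$b_{xy} = \frac{5 \times 88 - 15 \times 25}{5 \times 151 - 625} = \frac{65}{130} = 0.5$$
$$(x - x') = b_{xy}(y - y')$$
After placing the values, the equation becomes:
$$x - 3 = 0.5(y - 5) \;\Rightarrow\; x = 0.5y + 0.5$$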
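Y on X
$$(y - y') = b_{yx}(x - x')$$
$$b_{yx} = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - (\sum x)^2}$$
$$b_{yx} = \frac{5 \times 88 - 15 \times 25}{5 \times 55 - 225} = \frac{65}{50} = 1.3$$
$$y - 5 = 1.3(x - 3) \;\Rightarrow\; y = 1.3x + 1.1$$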
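The regression-coefficient method above can be checked with a short Python sketch that applies the same computational formulas for b_xy and b_yx and then rearranges each line around the means x' and y'. It also uses the standard property that the correlation coefficient r is the geometric mean of the two regression coefficients, carrying their common sign. The function names are hypothetical.

```python
import math


def regression_coefficients(x, y):
    """Return (b_xy, b_yx) using the computational formulas from the notes."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    syy = sum(q * q for q in y)
    numerator = n * sxy - sx * sy
    b_xy = numerator / (n * syy - sy ** 2)  # regression coefficient of X on Y
    b_yx = numerator / (n * sxx - sx ** 2)  # regression coefficient of Y on X
    return b_xy, b_yx


def regression_lines(x, y):
    """Return ((intercept, slope) of X on Y, (intercept, slope) of Y on X)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n  # x' and y' in the notes
    b_xy, b_yx = regression_coefficients(x, y)
    # (x - x') = b_xy (y - y')  rearranges to  x = (x' - b_xy*y') + b_xy*y
    x_on_y = (x_mean - b_xy * y_mean, b_xy)
    # (y - y') = b_yx (x - x')  rearranges to  y = (y' - b_yx*x') + b_yx*x
    y_on_x = (y_mean - b_yx * x_mean, b_yx)
    return x_on_y, y_on_x


def correlation_from_coefficients(x, y):
    """r is the geometric mean of b_xy and b_yx, carrying their common sign."""
    b_xy, b_yx = regression_coefficients(x, y)
    return math.copysign(math.sqrt(b_xy * b_yx), b_yx)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 5, 3, 8, 7]
    (a1, b1), (a2, b2) = regression_lines(x, y)
    print(f"X on Y: x = {b1:.2f}y + {a1:.2f}")
    print(f"Y on X: y = {b2:.2f}x + {a2:.2f}")
    print(f"r = {correlation_from_coefficients(x, y):.3f}")
```

For this data it prints x = 0.50y + 0.50 and y = 1.30x + 1.10, matching the hand calculation, and r = √(0.5 × 1.3) ≈ 0.806.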
Question (H/W):
Calculate the regression equations of X on Y and Y on X from the following data using regression coefficients:
X -1 5 3 2 1 1 7 3
Y -6 1 0 0 1 2 1 5
Question Bank
X 1 2 3 4 5
Y 3 6 4 9 8
6. Obtain the regression equation of Y on X by the least squares method for the following data. Also estimate the value of y when x = 10. (3 marks)
X 1 2 3 4 5
Y 9 9 10 12 11
7. Simple linear regression vs. multiple linear regression. (3 marks)
8. Calculate the regression equations of X on Y and Y on X from the following data using regression coefficients: (5 marks)
X -1 5 3 2 1 1 7 3
Y -6 1 0 0 1 2 1 5
9. Calculate the regression equations of X on Y and Y on X from the following data using regression coefficients: (5 marks)
X 1 2 3 4 5
Y 2 5 3 8 7
10. Explain regression and its uses. (5 marks)