
Regression Modeling

Linear regression models represent a continuous response variable as a function of one or more predictor variables. There are simple and multiple linear regression models. Hypothesis testing involves formulating null and alternative hypotheses, determining a significance level, choosing a test statistic like z, t or F, calculating the test statistic value, and making a decision to reject or fail to reject the null hypothesis based on the p-value.

Uploaded by

workvasudha18
What are linear regression models?

A linear regression model is a function approximation that represents a continuous response variable as a function of one or more predictor variables. When building a linear regression model, the goal is to identify the linear equation that best predicts or models the relationship between the response (dependent) variable and one or more predictor (independent) variables.
There are two kinds of linear regression models:

• Simple or univariate linear regression models: These model a linear relationship between one response (dependent) variable and one predictor (independent) variable. The equation of a simple linear regression model is Y = mX + b, where m is the coefficient of the predictor variable and b is the bias. On the regression line, m is the slope and b is the intercept.
• Multiple or multivariate linear regression models: These model a linear relationship between one response (dependent) variable and more than one predictor (independent) variable. The equation of a multiple linear regression model is Y = b0 + b1X1 + b2X2 + … + bnXn, where b0 is the intercept and bi is the coefficient of the ith predictor variable. Each predictor variable has its own coefficient, which is used to calculate the predicted value of the response variable.
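The two equations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fit_simple` and `predict_multiple` are hypothetical, the data are made up, and the slope and intercept are estimated with ordinary least squares, which the text does not name explicitly.

```python
from statistics import mean

def fit_simple(xs, ys):
    """Ordinary least-squares estimates of m and b in Y = mX + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx            # slope
    b = y_bar - m * x_bar    # intercept (bias)
    return m, b

def predict_multiple(b0, coefs, xs):
    """Evaluate Y = b0 + b1*X1 + ... + bn*Xn for one observation."""
    return b0 + sum(b_i * x_i for b_i, x_i in zip(coefs, xs))

# Made-up data lying exactly on the line y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, b = fit_simple(xs, ys)
print(m, b)  # 2.0 1.0

# Hypothetical coefficients for a model with two predictors
y_hat = predict_multiple(1.0, [2.0, 0.5], [3, 4])
print(y_hat)  # 9.0
```

Fitting a multiple regression model needs a linear solver for the coefficients, so only the prediction step is shown for that case.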
Hypothesis testing uses sample data to check a claim about the population within a chosen error level. It tells us whether the sample provides enough statistical evidence to reject that claim.

The key steps in a hypothesis test are as follows:
1. Formulate the hypotheses
2. Determine the significance level
3. Determine the type of test
4. Calculate the test statistic and the p-value
5. Make a decision
Formulating the hypothesis
The first step is to formulate the following two hypotheses:
The null hypothesis, written H₀, is the initial claim, based on the prevailing belief about the population.
The alternate hypothesis, written H₁, is the challenge to the null hypothesis. It is the claim we would like to show is supported by the data.
When formulating the two hypotheses, keep in mind that the null hypothesis always aims to confirm the existing notion. Hence it carries the sign =, ≥ or ≤, while the alternate hypothesis carries >, < or ≠.
Determine the significance level (also known as alpha or α) for hypothesis testing
The significance level is the probability of rejecting the null hypothesis when it is actually true. Equivalently, it is the share of the test statistic's sampling distribution under the null hypothesis that falls in the critical region. It is usually set to 5% (0.05), which means there is a 5% chance that we would accept the alternate hypothesis even when the null hypothesis is true.

Depending on how critical the requirement is, we can also choose a lower significance level such as 1%.
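To make the link between α and the critical region concrete, the sketch below computes the cut-off of a standard normal test statistic for a given significance level. The helper name `z_critical` is hypothetical, and the example assumes a Z-based test; other test statistics have their own critical values.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Cut-off beyond which a standard normal statistic lies in the critical region."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

print(round(z_critical(0.05), 2))                    # 1.96
print(round(z_critical(0.01), 2))                    # 2.58
print(round(z_critical(0.05, two_tailed=False), 2))  # 1.64
```

Lowering α from 5% to 1% pushes the cut-off further out, so stronger evidence is needed before the null hypothesis is rejected.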
Determine the test statistic and calculate its value for hypothesis testing
Hypothesis testing uses a test statistic: a numerical summary that reduces the data set to a single value that can be used to perform the test.
Select the type of hypothesis test
We choose the test statistic based on the type of predictor variable: quantitative or categorical. A few commonly used tests are listed below.

Type of predictor variable | Distribution type | Desired test | Attributes
Quantitative | Normal distribution | Z-test | Large sample size; population standard deviation known
Quantitative | T distribution | T-test | Sample size less than 30; population standard deviation unknown
Quantitative | Positively skewed distribution | F-test | When you want to compare 3 or more variables
Quantitative | Negatively skewed distribution | NA | Requires feature transformation to perform a hypothesis test
Categorical | NA | Chi-Square test | Test of independence; goodness of fit
Z-statistic – Z Test
The Z-statistic is used when the sample follows a normal distribution. It is calculated from population parameters such as the mean and standard deviation.
A one-sample Z test compares a sample mean with a population mean.
A two-sample Z test compares the means of two samples.
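A one-sample Z test can be sketched as follows. The function name `one_sample_z_test` and the data are made up for illustration; the sample is kept short for readability and is assumed to come from a normal population whose standard deviation (3) is known. The p-value is two-tailed.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, pop_mean, pop_sd):
    """Return the Z-statistic and two-tailed p-value for H0: mean == pop_mean."""
    n = len(sample)
    z = (mean(sample) - pop_mean) / (pop_sd / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

sample = [52, 48, 55, 51, 49, 53, 50, 54, 56]  # sample mean is 52
z, p_value = one_sample_z_test(sample, pop_mean=50, pop_sd=3)
print(round(z, 2), round(p_value, 4))  # 2.0 0.0455
```

With α = 0.05 this p-value is just below the threshold, so the null hypothesis that the population mean is 50 would be rejected.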

T-statistic – T-Test
The T-statistic is used when the sample follows a T distribution and the population parameters are unknown. The T distribution is similar to a normal distribution, but it has a lower peak and heavier tails.
If the sample size is less than 30 and the population parameters are not known, we use the T distribution. Here too, we can use a one-sample T-test or a two-sample T-test.
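The one-sample T-statistic differs from the Z-statistic only in using the sample standard deviation in place of the population value. The sketch below assumes made-up data and a hypothetical helper `one_sample_t_statistic`. Because the Python standard library has no T distribution, the decision is made by comparing against the tabulated two-tailed critical value for 9 degrees of freedom at α = 0.05 (2.262) instead of computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t_statistic(sample, hypothesized_mean):
    """Return the T-statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev() is the sample standard deviation
    t = (mean(sample) - hypothesized_mean) / standard_error
    return t, n - 1

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.2, 5.9, 5.5]
t, df = one_sample_t_statistic(sample, hypothesized_mean=5.0)

T_CRITICAL_DF9 = 2.262  # two-tailed, alpha = 0.05, df = 9 (from a T table)
reject_null = abs(t) > T_CRITICAL_DF9
print(round(t, 2), df, reject_null)
```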
F-statistic – F test
For samples involving three or more groups, we prefer the F test. Running T-tests on every pair of groups increases the chance of a Type 1 error, so ANOVA is used in such cases.
Analysis of variance (ANOVA) can determine whether the means of three or more groups differ. ANOVA uses F-tests to statistically test the equality of means.
The F-statistic follows an F distribution, which takes only non-negative values and is skewed right (positively skewed).
F = variation between the sample means / variation within the samples
For negatively skewed data, we would need to perform a feature transformation first.
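The ratio above can be computed directly for a one-way ANOVA. In this sketch the "variation" terms are taken to be mean squares (sum of squares divided by degrees of freedom), which is the usual ANOVA convention; the helper name `one_way_anova_f` and the three groups are made up. The critical value 3.89 is the tabulated F value for (2, 12) degrees of freedom at α = 0.05.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F-statistic with its between-group and within-group degrees of freedom."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation between the sample means
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within the samples
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = len(all_values) - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

groups = [
    [4, 5, 6, 5, 5],
    [6, 7, 8, 7, 7],
    [9, 10, 11, 10, 10],
]
f, df_between, df_within = one_way_anova_f(groups)

F_CRITICAL_2_12 = 3.89  # alpha = 0.05, df = (2, 12), from an F table
print(round(f, 2), df_between, df_within, f > F_CRITICAL_2_12)  # 63.33 2 12 True
```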
Chi-Square Test
For categorical variables, we perform a Chi-Square test. There are two types:
1. Chi-squared test of independence: determines whether there is a significant relationship between two categorical variables.
2. Chi-squared goodness of fit: determines whether the sample data match the distribution expected for the population.
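A test of independence compares the observed counts in a contingency table with the counts expected if the two variables were unrelated. The sketch below uses a made-up 2×2 table and a hypothetical helper `chi_square_independence`; it applies no continuity correction, and the decision uses the tabulated critical value for 1 degree of freedom at α = 0.05 (3.841).

```python
def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: two groups; columns: two outcome categories (made-up counts)
observed = [
    [30, 10],
    [20, 40],
]
chi2, df = chi_square_independence(observed)

CHI2_CRITICAL_DF1 = 3.841  # alpha = 0.05, df = 1, from a chi-square table
print(round(chi2, 2), df, chi2 > CHI2_CRITICAL_DF1)  # 16.67 1 True
```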

The decision about your model
The test statistic is then used to calculate the p-value. The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed; the smaller it is, the stronger the evidence against the null hypothesis.
If the p-value < α, we have statistically significant evidence against the null hypothesis, so we reject the null hypothesis and accept the alternate hypothesis.
If the p-value ≥ α, we do not have statistically significant evidence against the null hypothesis, so we fail to reject the null hypothesis.
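The decision rule is a single comparison, as the short sketch below shows. The function name `decide` and the default α of 0.05 are illustrative choices, not part of any library.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when the p-value is below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.03))              # reject H0
print(decide(0.03, alpha=0.01))  # fail to reject H0
print(decide(0.20))              # fail to reject H0
```

The same p-value can lead to different decisions under different significance levels, which is why α should be fixed before the test is run.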
Errors while making decisions
There are two possible types of error we could commit while performing hypothesis testing.

1) Type 1 error: occurs when the null hypothesis is true but we reject it. The probability of a Type 1 error is denoted by alpha (α) and equals the significance level of the hypothesis test.
2) Type 2 error: occurs when the null hypothesis is false but we fail to reject it. The probability of a Type 2 error is denoted by beta (β).
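Both error rates can be estimated by simulation. The sketch below repeatedly draws samples and runs a two-tailed Z test at α = 0.05. When the null hypothesis is true, every rejection is a Type 1 error; when it is false, every failure to reject is a Type 2 error. All settings (sample size 25, known standard deviation 1, true mean 0.5 in the second run, 4000 trials, fixed seed) are assumptions chosen for illustration, and `rejection_rate` is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(true_mean, null_mean=0.0, sd=1.0, n=25,
                   alpha=0.05, trials=4000, seed=0):
    """Fraction of simulated two-tailed Z tests that reject H0: mean == null_mean."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        z = (sum(sample) / n - null_mean) / (sd / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# H0 is true (true mean equals the null mean): rejections are Type 1 errors
type1_rate = rejection_rate(true_mean=0.0)

# H0 is false (true mean is 0.5): failures to reject are Type 2 errors
type2_rate = 1 - rejection_rate(true_mean=0.5)

print(type1_rate, type2_rate)
```

The estimated Type 1 rate should land close to α, while the Type 2 rate depends on how far the true mean is from the null value and on the sample size.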
