Final Answer Bank
Here's a breakdown:
- **Regression Model**: Suppose you have a set of data where you're trying to predict one variable
(let's call it y) based on one or more other variables (x1, x2, etc.).
- **Fitted Values**: Once you fit your regression model to this data (which means finding the best
relationship between x and y), the fitted values are the predicted values of y that the model estimates
for each corresponding x value in your dataset.
- **Interpretation**: Fitted values represent what your model predicts the y values should be based
on the x values and the estimated relationship between them.
### Residuals:
In the context of regression analysis, "residuals" are the differences between the observed values of
the dependent variable (y) and the corresponding fitted values predicted by your regression model.
For example:
- If your regression model predicts that y should be 10 for a particular set of x values, but the actual
observed y value in your dataset is 8, then the residual for that data point would be \( 8 - 10 = -2 \).
- Residuals essentially tell you how well your model is performing. Ideally, the residuals should be
small and randomly distributed around zero. Patterns or trends in residuals can indicate issues with
the model (like underfitting or overfitting).
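The fitted-value and residual definitions above can be sketched in a few lines of Python. The data points and the closed-form least-squares fit below are illustrative assumptions, not values from the original text:

```python
# Sketch: fitted values and residuals for a simple linear regression.
# The data points here are made-up illustrative values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares: slope = cov(x, y) / var(x)
slope_num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope_den = sum((xi - mean_x) ** 2 for xi in x)
slope = slope_num / slope_den
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * xi for xi in x]        # model's predictions
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # observed - fitted

print(residuals)  # small values scattered around zero
```

Note that the residuals of an OLS fit with an intercept always sum to zero (up to floating-point error), which is one quick sanity check on a fit.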
b. **Cross Validation**:
Cross validation is a technique used to assess how well a model generalizes to new, unseen
data. Instead of just evaluating a model's performance on the training data, cross validation
involves splitting the dataset into multiple subsets. The model is trained on some subsets
(training set) and then tested on another subset (validation set). This process is repeated
multiple times with different combinations of training and validation sets to get a more reliable
estimate of the model's performance.
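The splitting-and-averaging loop described above can be sketched as follows. The dataset and the trivial "predict the training mean" model are toy assumptions; a real workflow would fit an actual model on each training split:

```python
# Sketch of k-fold cross validation with a trivial "mean predictor" model.
data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
k = 4
fold_size = len(data) // k

fold_errors = []
for i in range(k):
    # Hold out fold i as the validation set; train on the rest.
    val = data[i * fold_size:(i + 1) * fold_size]
    train = data[:i * fold_size] + data[(i + 1) * fold_size:]
    model = sum(train) / len(train)          # "training" = just the mean
    mse = sum((v - model) ** 2 for v in val) / len(val)
    fold_errors.append(mse)

cv_estimate = sum(fold_errors) / k           # averaged validation error
print(cv_estimate)
```

Averaging over all k folds is what makes the estimate more reliable than a single train/validation split.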
c. **R² (R-squared)**:
R-squared is a statistical measure that represents the proportion of the variance in the
dependent variable (target variable) that is predictable from the independent variables (predictor
variables) in a regression model. It's a measure of how well the variation in the dependent
variable is explained by the independent variables. R-squared ranges from 0 to 1, where 0
indicates that the model does not explain any variability in the dependent variable, and 1
indicates perfect prediction.
d. **Residuals**:
Residuals are the differences between the observed values of the dependent variable (actual
data points) and the predicted values (fitted values) from a regression model. In other words, a
residual is the error or the distance between the observed data points and the regression line.
Residuals are used to assess how well a regression model fits the data. Ideally, residuals should
be random and evenly distributed around zero; patterns or trends in residuals can indicate
issues like non-linearity or heteroscedasticity in the model.
3. Normal distribution of residuals: The residuals should follow a normal distribution with a mean equal to zero or close to zero. This check is done to verify whether the selected line is actually the line of best fit. If the error terms are non-normally distributed, it suggests that there are a few unusual data points that must be studied closely to build a better model.
4. The equal variance of residuals: The error terms must have constant variance across the range of fitted values. This phenomenon is known as homoscedasticity.
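A rough numeric version of these two checks might look like the following. The residuals list is a made-up assumption; in practice these assumptions are usually checked graphically (a Q-Q plot for normality, a residuals-vs-fitted plot for homoscedasticity):

```python
# Sketch: quick numeric checks of the residual assumptions above.
residuals = [0.2, -0.1, 0.05, -0.15, 0.1, -0.05, 0.0, -0.05]

# Check 1: the mean of the residuals should be (close to) zero.
mean_res = sum(residuals) / len(residuals)

# Check 2 (homoscedasticity): the spread should be similar across the
# range; here we compare the mean squared residual (~variance when the
# mean is near zero) of the first and second halves.
half = len(residuals) // 2
var_first = sum(r ** 2 for r in residuals[:half]) / half
var_second = sum(r ** 2 for r in residuals[half:]) / half

print(mean_res, var_first, var_second)
```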
R-squared is a number that quantifies the amount of variation in the data that is explained/captured by the developed model. It always ranges between 0 and 1. Overall, the higher the value of R-squared, the better the model fits the data.
\[ R^2 = 1 - \frac{RSS}{TSS} \]
where RSS is the residual sum of squares and TSS is the total sum of squares.
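The formula can be computed directly; the observed/predicted numbers below are illustrative assumptions:

```python
# Sketch: R² = 1 - RSS/TSS for observed vs. predicted values.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.3, 6.9, 9.1]

mean_obs = sum(observed) / len(observed)
rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
tss = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
r_squared = 1 - rss / tss
print(round(r_squared, 4))  # close to 1: the predictions track the data well
```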
The Root Mean Squared Error (RMSE) is the square root of the mean of the squared residuals. It specifies the absolute fit of the model to the data, i.e., how close the observed data points are to the predicted values. Mathematically, it can be represented as:
\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \]
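A minimal sketch of the same computation, with illustrative numbers:

```python
import math

# Sketch: RMSE from observed and predicted values (illustrative data).
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.3, 6.9, 9.1]

squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
print(round(rmse, 4))
```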
1. t statistic: It is used to determine the p-value for each coefficient and hence helps in determining whether that coefficient is statistically significant or not.
2. F statistic: It is used to assess whether the overall model fit is significant or not.
Generally, the higher the value of the F-statistic, the more significant a model turns out to
be.
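Given R², the number of predictors, and the sample size, the overall F statistic can be computed with the standard identity F = (R²/k) / ((1−R²)/(n−k−1)). The numbers below are illustrative assumptions:

```python
# Sketch: overall F statistic from R², number of predictors k, sample size n.
r_squared = 0.9  # illustrative assumption
n = 30           # observations (assumed)
k = 3            # predictors (assumed)

f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
print(round(f_stat, 2))  # a large F suggests the overall fit is significant
```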
Stepwise regression:
Stepwise regression is a technique used in statistical modeling to automatically select a subset
of predictor variables (independent variables) from a larger pool of potential predictors. The aim is
to build a regression model that includes only the most significant variables, removing those that
do not contribute significantly to the model's predictive power. Stepwise regression proceeds by
iteratively adding or removing predictors based on specific criteria until a stopping condition is
met.
Forward Selection:
Approach: Forward selection starts with an empty model and gradually adds predictors one at a
time based on their individual contribution to improving the model's fit. At each step, the
predictor that provides the greatest improvement in the model's performance, according to a
predefined criterion (such as p-value or information criterion like AIC or BIC), is added to the
model.
Backward Elimination:
Approach: Backward elimination begins with a model that includes all potential predictors and
iteratively removes the least significant predictors based on a predefined criterion. The process
continues until removing additional predictors significantly worsens the model's fit.
Bidirectional Elimination:
Approach: Bidirectional elimination combines both of the above methods: at each step a variable may be added (as in forward selection) or removed (as in backward elimination), depending on its significance, until no single addition or removal improves the model.
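The stepwise loop can be sketched as a skeleton. Here `score_model` is a stand-in assumption that just sums made-up per-variable "gains"; in a real application it would fit a regression on the chosen predictors and return, e.g., adjusted R² or a negated AIC:

```python
# Skeleton of forward selection with a toy scoring function (assumption).
CANDIDATES = ["x1", "x2", "x3", "x4"]
GAINS = {"x1": 0.40, "x2": 0.25, "x3": 0.02, "x4": 0.10}  # made-up values

def score_model(predictors):
    # Stand-in for fitting a model and scoring it (e.g., adjusted R²).
    return sum(GAINS[p] for p in predictors)

def forward_selection(candidates, min_improvement=0.05):
    selected, best_score = [], 0.0
    while True:
        # Try each remaining candidate and keep the best-scoring one.
        trials = [(score_model(selected + [c]), c)
                  for c in candidates if c not in selected]
        if not trials:
            break
        score, best_c = max(trials)
        if score - best_score < min_improvement:
            break  # stopping condition: no meaningful improvement left
        selected.append(best_c)
        best_score = score
    return selected

print(forward_selection(CANDIDATES))  # x3's tiny gain never justifies adding it
```

Backward elimination is the mirror image: start with all candidates selected and repeatedly drop the variable whose removal hurts the score least, stopping when any removal would worsen it too much.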
Logistic Regression:
The logistic response function and the logit, explained in simple terms:
**Explanation**:
- **Input**: The logistic response function takes the linear combination of predictor variables
(let's call this \( z \)), which is represented as:
\[ z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k \]
Here, \( \beta_0, \beta_1, \ldots, \beta_k \) are coefficients of the regression model, and \( x_1,
x_2, \ldots, x_k \) are the predictor variables.
- **Output**: The logistic response function transforms \( z \) into a probability value \( p \) that
the outcome belongs to a specific category (e.g., \( Y = 1 \)):
\[ p = \frac{1}{1 + e^{-z}} \]
Here, \( e \) is the base of the natural logarithm (approximately equal to 2.718).
**Interpretation**:
- The logistic response function maps any real-valued number \( z \) to a probability \( p \)
between 0 and 1.
- As \( z \) increases, \( p \) approaches 1 (higher probability of belonging to category 1).
- As \( z \) decreases, \( p \) approaches 0 (lower probability of belonging to category 1).
**Explanation**:
- **Input**: Suppose you have a probability \( p \) that an event occurs (e.g., \( p = P(Y = 1) \)).
- **Output**: The logit function (denoted as \( \text{logit}(p) \)) transforms \( p \) into the log-odds
scale \( \text{logit}(p) \):
\[ \text{logit}(p) = \log\left(\frac{p}{1 - p}\right) \]
**Interpretation**:
- The logit function maps probabilities (ranging from 0 to 1) to real numbers (ranging from \(
-\infty \) to \( +\infty \)).
- \( \text{logit}(p) \) represents the logarithm of the odds of the event occurring (\( Y = 1 \)) versus
not occurring (\( Y = 0 \)).
In summary, the logistic response function converts linear model outputs to probabilities, and the
logit function provides a way to interpret and transform these probabilities back into the linear
scale for regression analysis.
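The two functions can be written down directly to confirm that they are inverses of each other:

```python
import math

# Sketch: the logistic response function and the logit are inverses.
def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))  # maps any real z to (0, 1)

def logit(p):
    return math.log(p / (1.0 - p))     # maps a probability back to the reals

z = 0.8                 # illustrative value
p = logistic(z)
print(p)                # a probability between 0 and 1
print(logit(p))         # recovers z (up to floating-point error)
```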
Below are some types of datasets and the corresponding distributions, which help in constructing the model for a particular type of data (the term "data" here refers to the output data, or the labels of the dataset).
2. **Link Function**: The linear predictor \( \eta \) is related to the expected value of the
response variable \( Y \) through a link function \( g(\cdot) \).
\[ g(\mu) = \eta \]
Here,
- \( g(\cdot) \) is the link function,
- \( \mu \) is the expected value of \( Y \) given the predictors.
- **Types of GLMs**:
- **Binary Outcome (Logistic Regression)**: Use a binomial distribution with a logit link function
for binary response variables (e.g., yes/no).
- **Count Data (Poisson Regression)**: Use a Poisson distribution with a log link function for
count data (e.g., number of events).
- **Continuous Outcome (Gamma Regression)**: Use a gamma distribution with a log link
function for continuous positive outcomes (e.g., insurance claims).
### Example:
Let's consider an example of using a GLM for a binary outcome (logistic regression):
**Problem**: Predicting whether a student passes (1) or fails (0) an exam based on study hours
(continuous predictor).
**Model**:
\[ \text{logit}(p) = \beta_0 + \beta_1 \times \text{study\_hours} \]
where \( p = P(\text{pass} = 1) \) and \( \text{logit}(p) = \log\left(\frac{p}{1 - p}\right) \).
- **Link Function**: The logit link function (\( g(\mu) = \log\left(\frac{\mu}{1 - \mu}\right) \)) is used
to map the linear predictor to probabilities.
In summary, a Generalized Linear Model is a versatile statistical framework that extends linear
regression to handle diverse data types and relationships, making it a powerful tool for modeling
and inference in various real-world scenarios.
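A minimal sketch of the pass/fail model above; the coefficient values are made-up assumptions, not fitted estimates:

```python
import math

# Sketch of the pass/fail logistic model with assumed coefficients.
beta_0, beta_1 = -4.0, 1.5   # illustrative values, not fitted estimates

def prob_pass(study_hours):
    z = beta_0 + beta_1 * study_hours    # linear predictor (eta)
    return 1.0 / (1.0 + math.exp(-z))    # inverse logit link -> probability

for hours in (1, 2, 3, 4):
    print(hours, round(prob_pass(hours), 3))
```

As study hours increase, the linear predictor grows and the predicted pass probability rises toward 1, exactly the S-shaped behavior the logistic response function describes.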
Module 5:
- Explanation: If you have a table of information saved in a CSV file, this function helps you bring it into R so you can work with it.
1. **Data Import**:
   - R offers various functions to import data from different file formats including CSV, Excel, JSON, XML, and databases like MySQL, SQLite, etc.
   - Commonly used functions for importing data include `read.csv()`, `read.table()`, `read_excel()` (from the `readxl` package), and `read_json()` (from the `jsonlite` package).
   - Users can also import data directly from URLs using functions like `read.csv()` or `read.table()`.
2. **Data Export**:
   - After processing or analyzing data in R, it's often necessary to export the results or modified datasets for further use.
   - R provides functions like `write.csv()`, `write.table()`, and `write_xlsx()` (from the `writexl` package) to export data to CSV, text files, and Excel files respectively.
   - For exporting data to databases, packages like `DBI` and `RMySQL` can be used.
4. **Data Frame Structure**:
   - R typically imports data into a data frame, a tabular structure where rows represent observations and columns represent variables.
   - Data frames are versatile and allow for easy manipulation and analysis of data.
5. **Data Cleaning**:
   - Importing data into R often involves cleaning and preprocessing steps to handle missing values, incorrect data types, or inconsistent formatting.
   - Functions like `na.omit()`, `na.exclude()`, and `complete.cases()` are commonly used for handling missing data.
6. **File Paths**:
   - When importing data from local files, users need to specify the file path correctly. Relative or absolute paths can be used depending on the location of the file.
7. **Data Exploration**:
   - Before proceeding with analysis, it's essential to explore the imported data using functions like `head()`, `summary()`, `str()`, and `dim()` to understand its structure and characteristics.