Assignment 2 Completed
The price ($) of commuting in a country is observed to depend on the age of the passenger, the duration of the trip, and the
distance travelled.
Age of the passenger   Duration (in minutes)   Distance (in miles)   Price (in $)
61                     16                      3.2                   22.3
24                     4                       1.5                   12.5
47                     29                      5                     29
32                     33                      5.8                   36.2
23                     14                      2.3                   19.1
82                     30                      6.1                   36.5
57                     56                      12                    66.9
36                     11                      1.9                   17.3
42                     2                       0.8                   8
47                     14                      1.7                   18
29                     15                      2.8                   24.3
27                     15                      2.5                   22.1
19                     45                      4.2                   44.3
45                     19                      3.6                   23
39                     49                      10.2                  61.1
33                     31                      6                     35
1. State the dependent (output) and independent (input) variables in the dataset provided.
Dependent Variable (Output): Price (in $)
Independent Variables (Inputs): Age of the passenger, Duration (in minutes), Distance (in miles)
2. Conduct a univariate analysis for Price vs Age of the passenger. State the univariate regression function,
calculate the error %, and calculate the model accuracy. Predict the price for age = 37.
The regression function based on the output would be:
Price = 21.3917+0.2074×Age
The R² value is 0.0426816, or 4.27%. This means that approximately 4.27% of the variance in the Price can be explained by the Age
of the passenger. An R² value of 4.27% is quite low, suggesting that the Age of the passenger is not a strong predictor of the Price in
this model.
Using the regression function, the predicted price for age = 37 is:
Price = 21.3917 + 0.2074 × 37 ≈ $29.06 (29.064 when the unrounded slope, 0.20736, is used)
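The fit above can be reproduced directly from the data table. The following is a minimal sketch in plain Python, assuming ordinary least squares on the 16 rows as listed; the helper name fit_line is illustrative and not part of the assignment.

```python
# Age and Price columns copied from the data table in this assignment.
ages = [61, 24, 47, 32, 23, 82, 57, 36, 42, 47, 29, 27, 19, 45, 39, 33]
prices = [22.3, 12.5, 29, 36.2, 19.1, 36.5, 66.9, 17.3,
          8, 18, 24.3, 22.1, 44.3, 23, 61.1, 35]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared). The name is hypothetical.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation coefficient
    return intercept, slope, r_squared


intercept, slope, r_squared = fit_line(ages, prices)
price_at_37 = intercept + slope * 37

print(f"Price = {intercept:.4f} + {slope:.4f} * Age")
print(f"R^2 = {r_squared:.6f}")
print(f"Predicted price at age 37: ${price_at_37:.2f}")
```

Because the prediction here uses the unrounded coefficients, it should agree with the $29.06 reported above rather than with the value obtained from the four-decimal coefficients.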
3. Conduct a univariate analysis for Price vs Duration. State the univariate regression function, calculate the error %,
and calculate the model accuracy. Predict the price for duration = 22.
Univariate Regression Function: The regression function based on the output would be:
Price = 5.5011 + 1.01197 × Duration
Model Accuracy: The R-squared value represents the model accuracy, which in this case is 0.95556, or 95.56%. This high R-squared
value indicates a strong linear relationship between Duration and Price.
Predict the Price for Duration 22: Using the regression equation provided, we predict the price for a duration of 22 minutes as
follows: Price = 5.5011 + 1.01197 × 22 ≈ $27.76
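The question also asks for an error %, which the answer above does not list. One way to obtain it is sketched below, assuming the stated coefficients (5.5011 and 1.01197) and defining the percentage error of each row as |actual − predicted| / actual × 100. That definition, and the use of the mean of those values as a summary, are assumptions rather than something the assignment specifies.

```python
# Duration and Price columns copied from the data table in this assignment.
durations = [16, 4, 29, 33, 14, 30, 56, 11, 2, 14, 15, 15, 45, 19, 49, 31]
prices = [22.3, 12.5, 29, 36.2, 19.1, 36.5, 66.9, 17.3,
          8, 18, 24.3, 22.1, 44.3, 23, 61.1, 35]

# Coefficients as stated in the answer: Price = 5.5011 + 1.01197 * Duration.
INTERCEPT = 5.5011
SLOPE = 1.01197


def predict(duration):
    """Evaluate the stated regression function for one duration (minutes)."""
    return INTERCEPT + SLOPE * duration


# One row per observation: duration, actual, predicted, absolute error, % error.
rows = []
for duration, actual in zip(durations, prices):
    predicted = predict(duration)
    abs_error = abs(actual - predicted)
    pct_error = 100 * abs_error / actual  # assumed definition of "error %"
    rows.append((duration, actual, predicted, abs_error, pct_error))

# Mean absolute percentage error across all 16 rows (assumed summary metric).
mape = sum(row[4] for row in rows) / len(rows)
price_at_22 = predict(22)

for duration, actual, predicted, abs_error, pct_error in rows:
    print(f"{duration:>3} {actual:>6.1f} {predicted:>7.2f} "
          f"{abs_error:>6.2f} {pct_error:>6.1f}%")
print(f"Mean absolute percentage error: {mape:.1f}%")
print(f"Predicted price at 22 minutes: ${price_at_22:.2f}")
```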
4. Conduct a univariate analysis for Price vs Distance. State the univariate regression function, calculate the error %,
and calculate the model accuracy. Predict the price for distance = 3.9.
The R-squared value, which provides the model accuracy, is 0.92205, or 92.21%. This suggests that the model explains 92.21% of the
variance in the Price based on the Distance alone, indicating a strong linear relationship.
Distance (in miles) Actual Price ($) Predicted Price ($) Absolute Error ($) Percentage Error (%)
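The rows of the error table and the regression function for Distance are not reproduced in this answer. They can be regenerated from the data, as in the sketch below, which assumes ordinary least squares on the 16 rows and the same percentage-error definition (|actual − predicted| / actual × 100). The variable names are illustrative.

```python
# Distance and Price columns copied from the data table in this assignment.
distances = [3.2, 1.5, 5, 5.8, 2.3, 6.1, 12, 1.9,
             0.8, 1.7, 2.8, 2.5, 4.2, 3.6, 10.2, 6]
prices = [22.3, 12.5, 29, 36.2, 19.1, 36.5, 66.9, 17.3,
          8, 18, 24.3, 22.1, 44.3, 23, 61.1, 35]

n = len(distances)
mean_d = sum(distances) / n
mean_p = sum(prices) / n

# Least-squares slope and intercept for Price = intercept + slope * Distance.
slope = (sum((d - mean_d) * (p - mean_p) for d, p in zip(distances, prices))
         / sum((d - mean_d) ** 2 for d in distances))
intercept = mean_p - slope * mean_d

# Columns match the table header: distance, actual, predicted, abs error, % error.
table = []
for d, actual in zip(distances, prices):
    predicted = intercept + slope * d
    abs_error = abs(actual - predicted)
    table.append((d, actual, predicted, abs_error, 100 * abs_error / actual))

# R^2 as 1 - SSE/SST, which equals the squared correlation for this fit.
sse = sum(row[3] ** 2 for row in table)
sst = sum((p - mean_p) ** 2 for p in prices)
r_squared = 1 - sse / sst
price_at_3_9 = intercept + slope * 3.9

print("Distance  Actual  Predicted  AbsError  PctError")
for d, actual, predicted, abs_error, pct_error in table:
    print(f"{d:>8.1f} {actual:>7.1f} {predicted:>10.2f} "
          f"{abs_error:>9.2f} {pct_error:>8.1f}%")
print(f"Price = {intercept:.4f} + {slope:.4f} * Distance, R^2 = {r_squared:.5f}")
print(f"Predicted price at 3.9 miles: ${price_at_3_9:.2f}")
```

The R² printed by this sketch should match the 0.92205 quoted above, which is a useful check that the data were entered correctly.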
5. Conduct a multivariate regression analysis. [Include all inputs/independent variables. Do not exclude any
inputs even if the p-values are not significant (> 0.05).]
Take a snapshot of the summary output of regression that shows R square, coefficients for the intercept and
the independent variables
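Where the spreadsheet summary output is not at hand, the intercept, coefficients and R square can be recomputed from the data. The sketch below assumes ordinary least squares with an intercept, solved through the normal equations with a small Gaussian elimination routine. The function names solve, fit_ols and predict are hypothetical, and p-values are not computed here.

```python
# All four columns copied from the data table in this assignment.
ages = [61, 24, 47, 32, 23, 82, 57, 36, 42, 47, 29, 27, 19, 45, 39, 33]
durations = [16, 4, 29, 33, 14, 30, 56, 11, 2, 14, 15, 15, 45, 19, 49, 31]
distances = [3.2, 1.5, 5, 5.8, 2.3, 6.1, 12, 1.9,
             0.8, 1.7, 2.8, 2.5, 4.2, 3.6, 10.2, 6]
prices = [22.3, 12.5, 29, 36.2, 19.1, 36.5, 66.9, 17.3,
          8, 18, 24.3, 22.1, 44.3, 23, 61.1, 35]


def solve(a, b):
    """Solve the square system a * x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(columns, y):
    """Least squares with an intercept via the normal equations.

    Returns [intercept, coef_1, ..., coef_k] in the order of `columns`.
    """
    rows = [[1.0] + list(values) for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, values):
    """Evaluate the fitted function for one row of input values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], values))


inputs = [ages, durations, distances]
coefs = fit_ols(inputs, prices)

residuals = [actual - predict(coefs, values)
             for values, actual in zip(zip(*inputs), prices)]
mean_price = sum(prices) / len(prices)
r_squared = 1 - (sum(e ** 2 for e in residuals)
                 / sum((p - mean_price) ** 2 for p in prices))

for name, value in zip(["Intercept", "Age", "Duration", "Distance"], coefs):
    print(f"{name:<10} {value:>10.4f}")
print(f"R square   {r_squared:>10.5f}")
```

Adding inputs can never lower R square in a least-squares fit, so the value printed here should be at least the 0.95556 obtained from Duration alone.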
6. From the summary output above, was there any independent variable (input) that was not statistically
significant (p value > 0.05)? If so, which one?
The independent variable that was not statistically significant (p-value > 0.05) is the "Age of the
passenger". The p-value for age is 0.282423278, which is greater than the usual significance level of 0.05.
7. Remove the statistically insignificant input (p value > 0.05) from the dataset and rerun the regression again.
4.1. If you are using a Word file, take a snapshot of the summary output of the regression that
shows R square and the coefficients for the intercept and the variables.
4.3. Conduct model evaluation (Hint: Predicted vs Actual, Absolute of Predicted vs Actual
and Variance)
Actual Value   Predicted Value   Absolute Difference   Absolute Residual
8. From your multivariate analysis, calculate the predicted value for the following inputs (Age = 30, distance
= 15 and duration = 4)
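A possible way to carry out the rerun, the model evaluation and this prediction is sketched below. It assumes the reduced model (Duration and Distance only, with Age removed as in item 7) is the one intended, so Age = 30 does not enter the calculation. It also assumes that "variance" in the hint means the mean squared residual. The two-input least-squares solution is written in closed form from centered sums, and all names are illustrative.

```python
# Duration, Distance and Price columns copied from the data table.
durations = [16, 4, 29, 33, 14, 30, 56, 11, 2, 14, 15, 15, 45, 19, 49, 31]
distances = [3.2, 1.5, 5, 5.8, 2.3, 6.1, 12, 1.9,
             0.8, 1.7, 2.8, 2.5, 4.2, 3.6, 10.2, 6]
prices = [22.3, 12.5, 29, 36.2, 19.1, 36.5, 66.9, 17.3,
          8, 18, 24.3, 22.1, 44.3, 23, 61.1, 35]

n = len(prices)
mean_t = sum(durations) / n
mean_d = sum(distances) / n
mean_p = sum(prices) / n


def centered_sum(u, mean_u, v, mean_v):
    """Sum of products of deviations from the means."""
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))


s_tt = centered_sum(durations, mean_t, durations, mean_t)
s_dd = centered_sum(distances, mean_d, distances, mean_d)
s_td = centered_sum(durations, mean_t, distances, mean_d)
s_tp = centered_sum(durations, mean_t, prices, mean_p)
s_dp = centered_sum(distances, mean_d, prices, mean_p)
s_pp = centered_sum(prices, mean_p, prices, mean_p)

# Closed-form least squares for two inputs (Cramer's rule on the
# centered normal equations).
det = s_tt * s_dd - s_td ** 2
b_duration = (s_dd * s_tp - s_td * s_dp) / det
b_distance = (s_tt * s_dp - s_td * s_tp) / det
intercept = mean_p - b_duration * mean_t - b_distance * mean_d


def predict(duration, distance):
    """Evaluate the reduced model for one trip."""
    return intercept + b_duration * duration + b_distance * distance


# Model evaluation: actual value, predicted value, absolute difference.
evaluation = []
for t, d, actual in zip(durations, distances, prices):
    predicted = predict(t, d)
    evaluation.append((actual, predicted, abs(actual - predicted)))

# Assumed meaning of "variance": mean of the squared residuals.
residual_variance = sum(diff ** 2 for _, _, diff in evaluation) / n
r_squared = 1 - n * residual_variance / s_pp

# Inputs from item 8: distance = 15 miles, duration = 4 minutes.
price_item_8 = predict(duration=4, distance=15)

print(f"Price = {intercept:.4f} + {b_duration:.4f} * Duration "
      f"+ {b_distance:.4f} * Distance")
print(f"R square = {r_squared:.5f}, residual variance = {residual_variance:.3f}")
for actual, predicted, diff in evaluation:
    print(f"{actual:>6.1f} {predicted:>8.2f} {diff:>7.2f}")
print(f"Predicted price for item 8: ${price_item_8:.2f}")
```

Note that a 15-mile trip lasting 4 minutes lies outside the range of the observed data (the longest distance is 12 miles), so the prediction is an extrapolation and should be read with that in mind.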