
Assignment-1_IMTS

Uploaded by

alok kumar
Copyright
© All Rights Reserved

GATI SHAKTI VISHWAVIDYALAYA

INTRODUCTION TO MULTIMODAL TRANSPORT SYSTEM


ASSIGNMENT-1 (TRIP GENERATION MODEL)

SUBMITTED TO: PROF. Dr. AGNIVESH

Submitted By:
Jaivendra Singh Vais
M. Tech. (Railway Engineering)
Sem-1
Roll No. 24MTRE-W006
CONTENTS

Sr. No.  Description

1. Introduction
2. Assignment Details
3. Problem Statement 1 and its Solution
4. Problem Statement 2 and its Solution
5. Problem Statement 3 and its Solution
6. Problem Statement 4 and its Solution
7. Problem Statement 5 and its Solution
8. Problem Statement 6 and its Solution
9. Problem Statement 7 and its Solution
10. Problem Statement 8 and its Solution

01. INTRODUCTION

A trip production model is a predictive tool in transportation planning that estimates the number
of trips generated from specific origin zones, usually based on socio-economic and demographic
characteristics of the residents in these areas. The concept of trip production refers to the outbound
trips initiated from a given location, typically a household or a traffic analysis zone, reflecting
daily travel demand patterns. These models are instrumental in analysing how factors such as
household income, size, employment status, and accessibility to amenities influence travel
behaviours, helping planners anticipate infrastructure needs and prioritize transportation projects.
The primary purpose of trip production models is to provide insight into travel demand at a
localized level, informing policies and interventions aimed at improving urban mobility and
reducing congestion.
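In the linear-regression form used for the models in Section 05, a trip production model can be written as below. This is a general statement of ordinary least squares, given here for illustration: T_i is the number of trips made by respondent i, X_ik is the k-th socio-economic predictor (for example income, age or car ownership), the β's are the coefficients, and ε_i is the error term; the symbols are introduced here and do not appear in the original report.

```latex
T_i = \beta_0 + \sum_{k=1}^{p} \beta_k X_{ik} + \varepsilon_i,
\qquad
\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta} \sum_{i=1}^{n}
\Big( T_i - \beta_0 - \sum_{k=1}^{p} \beta_k X_{ik} \Big)^2
```

Each fitted coefficient β_k is read as the change in predicted trips for a one-unit change in X_ik, holding the other predictors constant.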
While trip production models serve as valuable tools for estimating and analysing travel demand,
they also have certain limitations. A significant challenge is the reliance on past data to forecast
future behaviour, which may not always capture evolving travel trends, such as shifts in travel
modes or the impact of emerging technologies like ride-sharing services. Additionally, these
models are often static, assuming that socio-economic and demographic factors affecting travel
behaviour remain constant over time, which may not account for rapid urbanization or economic
shifts. Trip production models may also struggle with capturing complex interdependencies
between variables, such as the influence of lifestyle changes or remote working trends on travel
demand. Despite these limitations, trip production models remain essential for understanding
travel demand and supporting data-driven planning in urban environments.

02. Assignment Details

The attached dataset presents home interview survey data collected from 1500 respondents in the Pune Metropolitan Area by the Pune Metropolitan Region Development Authority (PMRDA). The dataset contains the following fields:
1. Day_of_Survey
2. Month_of_Survey
3. Age
4. Job
5. Marital Status
6. Education
7. Trips
8. Income (INR)
9. Household Expenditure (INR)
10. Household Area
11. Distance to City Centre
12. Household Size
13. Number of Employed HH Members
14. Accessibility Index
15. Number of Elderly
16. Distance to Metro Station
17. Metro Card Holders
18. Car Ownership
As a transportation planner, you are entrusted with developing trip production models for Pune. The models must be validated on 12% of the data, held out (unused) during model development. As a general practice, univariate outliers should be removed before the data are partitioned into (a) a training dataset and (b) a validation dataset. Once the outliers are removed (maximum limit = 6% of the data), the following tasks are to be performed as part of this assignment.
1. Specify the details of the training dataset and the validation dataset (i.e., descriptive statistics of the dependent variable and selected independent variables) and explain some of your key observations.
2. Provide graphical representations of the variability in the dependent variable and selected independent variables using Minitab.
3. Develop a set of 5 models for predicting trip production in the study city. Compare the models using standard goodness-of-fit measures and justify their inclusion.
4. Conduct external validation of the set of 5 models and compare their error percentages.
5. How do you justify that the regression model results are statistically valid? Demonstrate the validity of the various assumptions of regression analysis and discuss them briefly (for any one model).
6. Develop a cross-classification analysis table to understand the variation in mean trips across car ownership levels (0, 1 and 2) within different job categories (Students, Professionals, Self-Employed, Technician, Unemployed, Housemaids, Retired, Entrepreneurs). Try to show the variation graphically in Minitab.
7. Do you think that having a metro card influences the number of trips made by a person? Justify your answer using statistical visualizations or a model.
8. Do you think that the education level (4th standard, 6th standard, 9th standard, high school, professional course, university degree) of a person has a positive influence on his/her trip production? Justify your answer using statistical visualizations or a model.

The assignment uses home interview survey data collected from 1500 respondents across the Pune Metropolitan Area by the Pune Metropolitan Region Development Authority (PMRDA). The dataset covers socio-economic and travel behaviour attributes of households, such as age, job type, education level, income, household size, travel distance, metro card ownership and car ownership.
As a transportation planner, I have to develop trip production models from this data to predict and analyse travel demand within Pune. These models support urban transportation planning by showing which factors influence how often people travel. Before modelling, the data are pre-processed to ensure accuracy and reliability: univariate outliers are identified and removed, up to a maximum of 6% of the dataset. After outlier treatment, the dataset is divided into a training set for model development and a validation set, consisting of 12% of the remaining records held out from model development, for performance assessment.
This approach will provide a robust basis for generating predictive models, which will assist in
making informed decisions to enhance Pune's transportation planning and infrastructure
development.
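The performance assessment on the validation set (the "error percentage" in Task 4) can be computed once each model's predictions are available. The sketch below assumes mean absolute percentage error (MAPE) as the error percentage and also reports root mean square error (RMSE); the assignment does not specify a measure, so both choices are assumptions. It uses the Model 1 equation from Section 05 only as an example predictor, and the three validation records are hypothetical.

```python
import math


def predict_model1(income_inr):
    """Model 1 from Section 05: Trips = 14.8997 + 0.00002096 * Income (INR)."""
    return 14.8997 + 0.00002096 * income_inr


def validation_errors(observed, predicted):
    """Return (MAPE in percent, RMSE) for paired observed/predicted trip counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is 0")
    n = len(observed)
    mape = 100 * sum(abs(o - p) / o for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return mape, rmse


# Hypothetical validation records: (income in INR, observed trips)
records = [(50_000, 16), (200_000, 20), (0, 12)]
obs = [trips for _, trips in records]
pred = [predict_model1(income) for income, _ in records]
mape, rmse = validation_errors(obs, pred)
print(f"MAPE = {mape:.1f}%, RMSE = {rmse:.2f} trips")
```

Applying the same function to each of the five models on the 180 validation records gives directly comparable error percentages.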

03. PROBLEM PART – 1 AND ITS SOLUTIONS

1. Specify the details of training dataset and validation dataset (i.e., descriptive statistics of
dependent variable and selected independent variables) and explain some of your key
observations.
The first step was to remove outliers from the 1500-record dataset. Trips was taken as the dependent variable, and a box plot of Trips (Fig. 1) was used to identify outliers. A total of 4 outliers were found and removed, about 0.3% of the records and well within the 6% limit. Next, 12% of the remaining records were set aside as the validation dataset. This gave 180 records in the validation dataset and, by this count, 1316 in the training dataset, although the Minitab output below reports N = 1315 for the training dataset. Descriptive statistics were then computed for the dependent variable (Trips) and four independent variables (Income, Age, Accessibility Index and Number of Cars Owned) for both the training and validation datasets.
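The outlier screen and hold-out split described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run in Minitab: it assumes the usual box-plot rule (points beyond 1.5 × IQR from the quartiles), quartiles placed at (n + 1)p positions via Python's `statistics.quantiles(..., method="exclusive")` (assumed to match Minitab's quartile convention), and a seeded random shuffle for the 12% hold-out. The function names and the record layout (a list of dicts keyed by column name) are hypothetical.

```python
import random
from statistics import quantiles


def iqr_outlier_mask(values, k=1.5):
    """Return True for each value lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="exclusive" places the quartiles at (n + 1)p positions
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def remove_outliers_and_split(rows, key="Trips", max_outlier_share=0.06,
                              validation_share=0.12, seed=1):
    """Drop box-plot outliers in rows[i][key], then hold out a random validation set.

    Returns (training_rows, validation_rows, number_of_outliers_removed).
    """
    mask = iqr_outlier_mask([r[key] for r in rows])
    n_out = sum(mask)
    if n_out > max_outlier_share * len(rows):
        raise ValueError(f"{n_out} outliers exceed the {max_outlier_share:.0%} limit")
    kept = [r for r, is_out in zip(rows, mask) if not is_out]
    shuffled = kept[:]
    random.Random(seed).shuffle(shuffled)        # seeded so the split is reproducible
    n_val = round(validation_share * len(kept))  # e.g. round(0.12 * 1496) = 180
    return shuffled[n_val:], shuffled[:n_val], n_out
```

With 1496 records remaining after screening, round(0.12 × 1496) = 180, which matches the validation N reported in the Minitab output. The seed only makes the split repeatable; any random hold-out of the same size serves the same purpose.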

Fig. 1 Box Plot of Trips

➢ Descriptive Statistics for Training Data Set

Results for: Worksheet 3

Descriptive Statistics: Trips, Income (INR), Age, Accessibility, Number of Cars Owned

Variable                N     N*  Mean      SE Mean   TrMean    StDev     Variance
Trips                   1315  0   17.108    0.220     17.082    7.968     63.486
Income (INR)            1315  0   105365    6525      64240     236634    55995463786
Age                     1315  0   42.988    0.273     42.597    9.887     97.746
Accessibility Index     1315  0   0.26942   0.00618   0.25313   0.22393   0.05015
Number of Cars Owned    1315  0   0.4844    0.0160    0.4353    0.5789    0.3352

Variable                CoefVar  Sum         Sum of Squares  Minimum  Q1
Trips                   46.57    22497.000   468299.000      1.000    10.000
Income (INR)            224.58   138554847   8.81769E+13     0        64
Age                     23.00    56529.000   2558497.000     20.000   36.000
Accessibility Index     83.12    354.28113   161.33988       0.00007  0.08711
Number of Cars Owned    119.52   637.0000    749.0000        0.0000   0.0000

Variable                Median   Q3       Maximum  Range    IQR      Mode
Trips                   17.000   24.000   43.000   42.000   14.000   11
Income (INR)            1985     57864    1344445  1344445  57801    *
Age                     42.000   50.000   78.000   58.000   14.000   35, 36
Accessibility Index     0.20581  0.39875  0.99403  0.99395  0.31164  *
Number of Cars Owned    0.0000   1.0000   2.0000   2.0000   1.0000   0

Variable                N for Mode  Skewness  Kurtosis
Trips                   64          0.06      -1.04
Income (INR)            0           2.81      7.57
Age                     62          0.64      0.46
Accessibility Index     0           0.96      0.17
Number of Cars Owned    734         0.72      -0.47

➢ Key Observations in Descriptive Statistics of Training Dataset:
The descriptive statistics give an overview of the key variables: Trips, Income (INR), Age, Accessibility Index and Number of Cars Owned. They include measures of central tendency (mean, median, mode), variability (standard deviation, range, interquartile range) and shape (skewness and kurtosis), which together describe the distribution of each variable.
1. Trips: The average number of trips is 17.1 with a standard deviation of 8.0 (coefficient of variation 46.6%), indicating moderate variability. The distribution is nearly symmetric (skewness 0.06, with the mean of 17.1 close to the median of 17.0) and flatter than a normal distribution (kurtosis -1.04), i.e. it has light tails with few extreme values.
2. Income (INR): Mean income is 105,365 INR, but the standard deviation is 236,634 INR and the median is only 1,985 INR, reflecting large income disparity. The distribution is strongly right-skewed (skewness 2.81) and leptokurtic (kurtosis 7.57), with a heavy upper tail of very high incomes that may act as outliers.
3. Age: With a mean age of 43 (median 42), most respondents are middle-aged. The distribution shows mild positive skewness (0.64) and kurtosis close to that of a normal distribution (0.46), i.e. it is approximately normal with a somewhat longer tail toward older ages.
4. Accessibility Index: The mean Accessibility Index is low (0.27) and varies considerably relative to its mean (standard deviation 0.22, coefficient of variation 83%). The index is positively skewed (0.96), so most households likely have limited access to transport options or city centre facilities, with relatively few highly accessible households.
5. Number of Cars Owned: The mean is 0.48 cars per respondent and the mode is 0; 734 of the 1315 respondents (about 56%) own no car. Skewness is positive (0.72) and kurtosis is -0.47, so the distribution is concentrated at low car ownership.
These statistics show that income and car ownership vary considerably across respondents, which may affect trip production. The skewness and kurtosis of Income point to extreme values, which may need to be addressed (for example by transformation) to improve model accuracy. Understanding these distributions helps in selecting appropriate transformations or normalization techniques when developing the trip production models.
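The shape measures quoted above follow bias-adjusted sample formulas. A minimal pure-Python sketch of these summary statistics is shown below, assuming Minitab reports the adjusted skewness G1 = n/((n-1)(n-2)) Σ z³ and the adjusted excess kurtosis G2 = n(n+1)/((n-1)(n-2)(n-3)) Σ z⁴ - 3(n-1)²/((n-2)(n-3)), where z = (x - mean)/s. The function name `describe` is illustrative, and the trimmed mean and quartiles are omitted for brevity.

```python
import math
from statistics import mean, stdev


def describe(values):
    """Minitab-style summary for one variable (mean, spread and shape).

    Skewness and kurtosis use the bias-adjusted sample formulas; kurtosis is
    reported as excess kurtosis, so a normal distribution scores about 0.
    """
    x = list(values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations for kurtosis")
    m = mean(x)
    s = stdev(x)                                # sample SD, n - 1 denominator
    z3 = sum(((v - m) / s) ** 3 for v in x)     # sum of cubed standardized values
    z4 = sum(((v - m) / s) ** 4 for v in x)     # sum of fourth powers
    skewness = n / ((n - 1) * (n - 2)) * z3
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "N": n,
        "Mean": m,
        "SE Mean": s / math.sqrt(n),
        "StDev": s,
        "Variance": s ** 2,
        "CoefVar": 100 * s / m,                 # undefined when the mean is 0
        "Skewness": skewness,
        "Kurtosis": kurtosis,
    }
```

For example, describe([1, 2, 3, 4, 5]) gives skewness 0 and kurtosis -1.2: a flat, light-tailed shape of the same kind as the Trips distribution (kurtosis -1.04).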

➢ Descriptive Statistics for Validation Data Set

Results for: Worksheet 4

Descriptive Statistics: Trips, Income (INR), Age, Accessibility, Number of Cars Owned

Variable                N    N*  Mean     SE Mean  TrMean   StDev    Variance
Trips                   180  0   15.972   0.564    15.741   7.565    57.223
Income (INR)            180  0   77045    15795    35326    211913   44907212157
Age                     180  0   41.633   0.711    41.278   9.535    90.915
Accessibility Index     180  0   0.2561   0.0156   0.2399   0.2099   0.0440
Number of Cars Owned    180  0   0.4000   0.0399   0.3642   0.5348   0.2860

Variable                CoefVar  Sum        Sum of Squares  Minimum  Q1
Trips                   47.36    2875.000   56163.000       4.000    10.000
Income (INR)            275.05   13868105   9.10686E+12     3        49
Age                     22.90    7494.000   328274.000      24.000   35.000
Accessibility Index     81.94    46.1011    19.6903         0.0024   0.0892
Number of Cars Owned    133.71   72.0000    80.0000         0.0000   0.0000

Variable                Median   Q3       Maximum  Range    IQR      Mode  N for Mode
Trips                   16.000   21.000   45.000   41.000   11.000   16    14
Income (INR)            1078     19059    1153734  1153731  19010    *     0
Age                     41.000   47.000   78.000   54.000   12.000   41    13
Accessibility Index     0.2126   0.3590   0.9591   0.9567   0.2698   *     0
Number of Cars Owned    0.0000   1.0000   2.0000   2.0000   1.0000   0     112

Variable                Skewness  Kurtosis
Trips                   0.52      -0.00
Income (INR)            3.43      11.39
Age                     0.81      1.25
Accessibility Index     1.03      0.66
Number of Cars Owned    0.85      -0.41

➢ Key Observations in Descriptive Statistics of Validation Dataset:

The descriptive statistics for the validation dataset of 180 respondents give an overview of Trips, Income (INR), Age, Accessibility Index and Number of Cars Owned, covering central tendency, variability and distribution shape.

1. Trips: The mean number of trips is 15.97 with a standard deviation of 7.57, slightly lower than in the training dataset (mean 17.11, standard deviation 7.97). The distribution has a moderate positive skew (0.52) and kurtosis of about zero (-0.00), so its tails are close to those of a normal distribution.
2. Income (INR): The mean income is 77,045 INR, substantially lower than in the training dataset (105,365 INR), and the median is only 1,078 INR, suggesting that this subset includes proportionally more lower-income respondents. Income is strongly right-skewed (3.43) with high kurtosis (11.39): most incomes are low, with a few very high values that could distort model results if the variable is not transformed.
3. Age: The mean age is 41.6, close to that of the training dataset, with moderate variability (standard deviation 9.54). The skewness (0.81) and kurtosis (1.25) indicate a mildly right-skewed distribution with somewhat heavier tails than a normal distribution.
4. Accessibility Index: The mean Accessibility Index is 0.26, close to that of the training dataset, with a standard deviation of 0.21. The skewness of 1.03 indicates a right-skewed distribution, meaning that few households have high accessibility, and the kurtosis (0.66) indicates tails slightly heavier than normal.
5. Number of Cars Owned: Average car ownership remains low (mean 0.40); 112 of the 180 respondents (about 62%) own no car. The variable is positively skewed (0.85) with slightly negative kurtosis (-0.41), reflecting a concentration at zero, with few households owning more than one car.

The distributions in the validation subset, particularly of Income and Number of Cars Owned, are similar in shape to those in the training dataset, though slightly more concentrated at lower values. These findings emphasize the need to address the strong skewness of income in both the training and validation datasets, for example through transformations or robust modelling techniques. Addressing these differences will help produce a trip production model that generalizes well across population segments in Pune’s metropolitan area.
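Whether the lower mean number of trips in the validation dataset (15.97 versus 17.11) reflects more than sampling noise can be checked from the summary statistics alone. The sketch below uses Welch's two-sample t statistic with Welch-Satterthwaite degrees of freedom; this test is chosen here for illustration and is not part of the original analysis. The inputs are the means, variances and sample sizes from the two descriptive-statistics tables.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary stats."""
    a, b = var1 / n1, var2 / n2                 # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Trips: training vs validation (mean, variance, N from the two tables)
t, df = welch_t(17.108, 63.486, 1315, 15.972, 57.223, 180)
print(f"t = {t:.2f}, df = {df:.0f}")
```

With these figures t is about 1.88 on roughly 237 degrees of freedom, which is below the two-sided 5% critical value of about 1.97. On this check the difference in mean trips between the two subsets would not be significant at the 5% level.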

Fig. 2 Histogram of Trips

04. PROBLEM PART – 2 AND ITS SOLUTIONS

2. Provide graphical representations of variability in dependent variable and selected independent variables using Minitab.

Fig. 3 Scatter Plot Trips Vs Accessibility Index
Fig. 4 Scatter Plot Trips Vs Age
Fig. 5 Scatter Plot Trips Vs Distance to Metro Station
Fig. 6 Scatter Plot Trips Vs Employed HH Members

Fig. 7 Scatter Plot Trips Vs Household Size
Fig. 8 Scatter Plot Trips Vs Income
Fig. 9 Scatter Plot Trips Vs Number of Cars Owned
Fig. 10 Scatter Plot Trips Vs Distance to City Centroid

05. PROBLEM PART – 3 AND ITS SOLUTIONS

3. Develop a set of 5 models for predicting trip production from the study city. Compare the
models using standard goodness of fit measures and justify their inclusion.

MODEL-1 Regression Analysis: Trips versus Income (INR)

The regression equation is


Trips = 14.9 + 0.000021 Income (INR)

Predictor Coef SE Coef T P


Constant 14.8997 0.1883 79.11 0.000
Income (INR) 0.00002096 0.00000073 28.82 0.000

S = 6.23848 R-Sq = 38.7% R-Sq(adj) = 38.7%

Analysis of Variance

Source DF SS MS F P
Regression 1 32321 32321 830.47 0.000
Residual Error 1313 51100 39
Total 1314 83421

Unusual Observations

Income
Obs (INR) Trips Fit SE Fit Residual St Resid
5 742509 28.000 30.462 0.494 -2.462 -0.40 X
6 931554 29.000 34.424 0.625 -5.424 -0.87 X
13 1 2.000 14.900 0.188 -12.900 -2.07R
26 1087723 29.000 37.697 0.735 -8.697 -1.40 X
44 1344445 43.000 43.078 0.917 -0.078 -0.01 X
47 651750 28.000 28.560 0.433 -0.560 -0.09 X
52 6 36.000 14.900 0.188 21.100 3.38R
88 1153594 29.000 39.078 0.782 -10.078 -1.63 X
96 977848 30.000 35.394 0.657 -5.394 -0.87 X
140 843218 30.000 32.572 0.564 -2.572 -0.41 X
145 1136587 29.000 38.721 0.769 -9.721 -1.57 X
150 663231 39.000 28.800 0.441 10.200 1.64 X
155 1007629 28.000 36.018 0.678 -8.018 -1.29 X
156 1201624 30.000 40.084 0.816 -10.084 -1.63 X
164 708551 20.000 29.750 0.471 -9.750 -1.57 X
169 1074752 31.000 37.425 0.726 -6.425 -1.04 X
180 897187 29.000 33.704 0.601 -4.704 -0.76 X
186 4 39.000 14.900 0.188 24.100 3.86R
190 692499 30.000 29.414 0.460 0.586 0.09 X
202 839285 29.000 32.490 0.561 -3.490 -0.56 X
231 648174 29.000 28.485 0.431 0.515 0.08 X
234 1112628 30.000 38.219 0.752 -8.219 -1.33 X
257 837630 29.000 32.455 0.560 -3.455 -0.56 X
264 6000 40.000 15.025 0.187 24.975 4.01R
300 339444 36.000 22.014 0.242 13.986 2.24R
343 316884 4.000 21.541 0.231 -17.541 -2.81R
360 1070223 31.000 37.330 0.723 -6.330 -1.02 X
385 807308 31.000 31.820 0.539 -0.820 -0.13 X
389 1144390 28.000 38.885 0.775 -10.885 -1.76 X
395 1892 29.000 14.939 0.188 14.061 2.25R
414 1176573 31.000 39.559 0.798 -8.559 -1.38 X
437 997068 30.000 35.797 0.671 -5.797 -0.93 X
484 834715 29.000 32.394 0.558 -3.394 -0.55 X
491 1008852 30.000 36.044 0.679 -6.044 -0.97 X

497 1112253 30.000 38.211 0.752 -8.211 -1.33 X
498 1024388 30.000 36.370 0.690 -6.370 -1.03 X
503 933827 31.000 34.472 0.627 -3.472 -0.56 X
524 673795 28.000 29.022 0.448 -1.022 -0.16 X
544 863442 30.000 32.996 0.578 -2.996 -0.48 X
555 1072503 29.000 37.378 0.724 -8.378 -1.35 X
558 766346 30.000 30.961 0.511 -0.961 -0.15 X
586 803332 31.000 31.737 0.536 -0.737 -0.12 X
596 908554 28.000 33.942 0.609 -5.942 -0.96 X
646 718936 31.000 29.968 0.478 1.032 0.17 X
657 1112177 30.000 38.210 0.752 -8.210 -1.33 X
660 857351 29.000 32.869 0.573 -3.869 -0.62 X
671 747332 29.000 30.563 0.498 -1.563 -0.25 X
711 1157129 30.000 39.152 0.784 -9.152 -1.48 X
755 984640 30.000 35.536 0.662 -5.536 -0.89 X
779 968958 27.000 35.208 0.651 -8.208 -1.32 X
789 1161625 30.000 39.246 0.787 -9.246 -1.49 X
798 908983 27.000 33.951 0.609 -6.951 -1.12 X
845 1155254 31.000 39.112 0.783 -8.112 -1.31 X
851 965089 31.000 35.127 0.648 -4.127 -0.67 X
856 1000069 31.000 35.860 0.673 -4.860 -0.78 X
870 1140903 28.000 38.812 0.773 -10.812 -1.75 X
900 698365 29.000 29.537 0.464 -0.537 -0.09 X
913 709938 29.000 29.779 0.472 -0.779 -0.13 X
931 740117 29.000 30.412 0.493 -1.412 -0.23 X
943 768013 30.000 30.996 0.512 -0.996 -0.16 X
944 899583 28.000 33.754 0.603 -5.754 -0.93 X
960 965136 30.000 35.128 0.649 -5.128 -0.83 X
967 773643 28.000 31.114 0.516 -3.114 -0.50 X
987 867677 28.000 33.085 0.580 -5.085 -0.82 X
1004 1134879 30.000 38.685 0.768 -8.685 -1.40 X
1023 1084226 29.000 37.624 0.732 -8.624 -1.39 X
1028 831167 29.000 32.320 0.555 -3.320 -0.53 X
1053 1 2.000 14.900 0.188 -12.900 -2.07R
1082 765691 30.000 30.948 0.510 -0.948 -0.15 X
1119 709633 29.000 29.773 0.472 -0.773 -0.12 X
1124 669718 28.000 28.936 0.445 -0.936 -0.15 X
1140 1104685 31.000 38.052 0.747 -7.052 -1.14 X
1155 655946 27.000 28.647 0.436 -1.647 -0.26 X
1168 889466 29.000 33.542 0.596 -4.542 -0.73 X
1172 783838 32.000 31.328 0.523 0.672 0.11 X
1179 711943 30.000 29.821 0.474 0.179 0.03 X
1183 1090714 29.000 37.760 0.737 -8.760 -1.41 X
1193 1140652 29.000 38.806 0.772 -9.806 -1.58 X
1203 980837 30.000 35.457 0.660 -5.457 -0.88 X
1234 788111 28.000 31.417 0.526 -3.417 -0.55 X
1242 637723 30.000 28.266 0.424 1.734 0.28 X
1273 0 1.000 14.900 0.188 -13.900 -2.23R
1274 663958 29.000 28.815 0.441 0.185 0.03 X
1285 29 2.000 14.900 0.188 -12.900 -2.07R
1311 821948 29.000 32.127 0.549 -3.127 -0.50 X

R denotes an observation with a large standardized residual.


X denotes an observation whose X value gives it large leverage.
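The R and X flags in the listing above can be reproduced from the printed columns. For a least-squares fit, SE Fit equals S × sqrt(h), where h is the observation's leverage, so h = (SE Fit / S)² and the standardized residual is Residual / (S × sqrt(1 - h)). The sketch assumes the cut-offs Minitab documents for these flags, |standardized residual| > 2 for R and leverage > 3p/n for X (p = number of coefficients, here 2); treat those thresholds as assumptions to confirm against your Minitab version. The function name is illustrative.

```python
import math


def flag_observation(residual, se_fit, s, n, p):
    """Recover leverage and standardized residual from Minitab's printed columns.

    Assumes an ordinary least-squares fit, for which SE Fit = S * sqrt(h).
    p is the number of coefficients including the constant. The cut-offs
    (|St Resid| > 2 for R, h > 3p/n for X) are assumed Minitab defaults.
    """
    h = (se_fit / s) ** 2
    st_resid = residual / (s * math.sqrt(1 - h))
    flags = ("R" if abs(st_resid) > 2 else "") + ("X" if h > 3 * p / n else "")
    return h, st_resid, flags


# Model 1: S = 6.23848, n = 1315 training records, p = 2 (constant + Income)
S, N, P = 6.23848, 1315, 2
rows = {52: (21.100, 0.188), 44: (-0.078, 0.917), 300: (13.986, 0.242)}  # Obs: (Residual, SE Fit)
results = {obs: flag_observation(r, se, S, N, P) for obs, (r, se) in rows.items()}
for obs, (h, st_resid, flags) in results.items():
    print(f"Obs {obs}: h = {h:.5f}, St Resid = {st_resid:.2f} {flags}")
```

For these three rows the sketch reproduces the printed values and flags (3.38 R, -0.01 X and 2.24 R). Most X observations are high-income respondents, whose incomes sit far from the mean income.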

Model Interpretation:
1. Intercept (Constant): The intercept is 14.9, so a respondent reporting zero income is predicted to make about 14.9 trips. Because the training data include incomes at or near zero (minimum 0 INR, median 1,985 INR), this can be read as a baseline of roughly 15 trips for low-income respondents.
2. Income Coefficient: The coefficient for Income (INR) is 0.000021 (0.00002096 unrounded), so each additional INR of income is associated with about 0.000021 more trips, or roughly 2.1 more trips per 100,000 INR. The per-INR effect looks small only because of the scale of the income variable; the relationship is positive, with trips increasing as income increases.
3. Significance: The t-value for Income is 28.82 with a p-value of 0.000 (i.e. below 0.0005), so Income (INR) is a statistically significant predictor of Trips at conventional significance levels.
Model Fit:
• Standard Error (S): The standard error of the estimate is 6.23848, i.e. observed trips typically deviate from the predicted values by about 6.2 trips.
• R-Squared (R²): The model explains 38.7% of the variability in Trips (R² = 38.7%, adjusted R² = 38.7%), a moderate fit. Income alone accounts for a substantial, but far from complete, share of the variation in trip production; variables not included in this model likely account for much of the remaining variation.
Analysis of Variance (ANOVA):
• Regression Sum of Squares (SS): The regression SS is 32,321, the portion of the total variation in Trips (83,421) explained by Income.
• Residual Sum of Squares (SS): The residual SS is 51,100, the variation in Trips not explained by the model.
• F-statistic: The F-statistic (830.47) with a p-value of 0.000 shows that the model explains significantly more variation than an intercept-only model. With a single predictor, F equals the square of Income's t-value (28.82² ≈ 830.6).
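The fit measures quoted above can all be recomputed from the ANOVA table, which gives a quick consistency check when comparing the models. A minimal sketch using the standard formulas S = sqrt(MS residual), R² = SS regression / SS total, adjusted R² = 1 - MS residual / MS total and F = MS regression / MS residual; the SS values are those printed in the ANOVA tables for Models 1 to 3 (training data, n = 1315), and the function name is illustrative.

```python
import math


def fit_stats(ss_reg, ss_res, n, k):
    """Goodness-of-fit measures from an ANOVA table.

    n = number of observations, k = number of predictors (excluding the constant).
    """
    df_reg, df_res, df_tot = k, n - k - 1, n - 1
    ms_res = ss_res / df_res
    ss_tot = ss_reg + ss_res
    return {
        "S": math.sqrt(ms_res),                        # residual standard deviation
        "R-Sq": 100 * ss_reg / ss_tot,
        "R-Sq(adj)": 100 * (1 - ms_res / (ss_tot / df_tot)),
        "F": (ss_reg / df_reg) / ms_res,
    }


# SS values as printed in the three ANOVA tables (training data, n = 1315)
models = {"Model 1": (32321, 51100, 1), "Model 2": (43437, 39984, 3), "Model 3": (50475, 32945, 4)}
for name, (ssr, sse, k) in models.items():
    stats = fit_stats(ssr, sse, 1315, k)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The recomputed values agree with the Minitab output to rounding (for example S = 6.238 and R-Sq(adj) = 38.7% for Model 1). Adjusted R² and S, rather than plain R², are the fairer basis for comparing models with different numbers of predictors.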
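This regression model shows that income is a significant factor in trip production. However, the moderate R² suggests that other factors, such as household size, car ownership or accessibility, also play an important role. The Unusual Observations list also shows that the high-leverage (X) respondents, all with incomes above 600,000 INR, mostly have negative residuals, which suggests that the straight-line income effect over-predicts trips at the top of the income range. Expanding the model to include additional predictors could improve its explanatory power and give a more complete understanding of trip production in the Pune Metropolitan Area.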

MODEL-2 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned

The regression equation is


Trips = 10.8 + 0.000017 Income (INR) + 0.0482 Age + 5.20 Number of Cars Owned

Predictor Coef SE Coef T P


Constant 10.7641 0.6843 15.73 0.000
Income (INR) 0.00001666 0.00000068 24.41 0.000
Age 0.04817 0.01546 3.12 0.002
Number of Cars Owned 5.1978 0.2793 18.61 0.000

S = 5.52257 R-Sq = 52.1% R-Sq(adj) = 52.0%

Analysis of Variance

Source DF SS MS F P
Regression 3 43437 14479 474.74 0.000
Residual Error 1311 39984 30
Total 1314 83421

Source DF Seq SS
Income (INR) 1 32321
Age 1 550
Number of Cars Owned 1 10566

Unusual Observations

Income
Obs (INR) Trips Fit SE Fit Residual St Resid
6 931554 29.000 33.648 0.555 -4.648 -0.85 X
26 1087723 29.000 31.822 0.788 -2.822 -0.52 X
40 56615 26.000 13.827 0.200 12.173 2.21R
44 1344445 43.000 35.761 0.924 7.239 1.33 X
52 6 36.000 11.969 0.334 24.031 4.36R
88 1153594 29.000 37.684 0.706 -8.684 -1.59 X
96 977848 30.000 29.703 0.692 0.297 0.05 X
120 200652 28.000 16.804 0.307 11.196 2.03R
140 843218 30.000 37.133 0.568 -7.133 -1.30 X
145 1136587 29.000 37.593 0.708 -8.593 -1.57 X
155 1007629 28.000 34.385 0.626 -6.385 -1.16 X
156 1201624 30.000 43.392 0.743 -13.392 -2.45RX
169 1074752 31.000 36.177 0.649 -5.177 -0.94 X
173 215933 27.000 15.903 0.284 11.097 2.01R
180 897187 29.000 33.557 0.557 -4.557 -0.83 X
183 193222 27.000 15.862 0.231 11.138 2.02R
185 610148 29.000 32.769 0.529 -3.769 -0.69 X
186 4 39.000 13.173 0.233 25.827 4.68R
202 839285 29.000 27.250 0.595 1.750 0.32 X
209 126048 26.000 14.598 0.230 11.402 2.07R
234 1112628 30.000 31.225 0.760 -1.225 -0.22 X
264 6000 40.000 18.470 0.255 21.530 3.90R
300 339444 36.000 23.158 0.296 12.842 2.33R
343 316884 4.000 18.162 0.275 -14.162 -2.57R
355 62015 25.000 13.869 0.199 11.131 2.02R
360 1070223 31.000 35.524 0.658 -4.524 -0.83 X
364 157104 26.000 14.826 0.289 11.174 2.03R
389 1144390 28.000 31.369 0.800 -3.369 -0.62 X
395 1892 29.000 13.541 0.303 15.459 2.80R
414 1176573 31.000 32.387 0.801 -1.387 -0.25 X
437 997068 30.000 34.305 0.612 -4.305 -0.78 X
440 4817 18.000 19.751 0.570 -1.751 -0.32 X
491 1008852 30.000 30.315 0.720 -0.315 -0.06 X

497 1112253 30.000 36.369 0.677 -6.369 -1.16 X
498 1024388 30.000 40.199 0.652 -10.199 -1.86 X
503 933827 31.000 38.450 0.621 -7.450 -1.36 X
509 155141 28.000 15.516 0.217 12.484 2.26R
536 87055 25.000 13.708 0.268 11.292 2.05R
555 1072503 29.000 36.044 0.646 -7.044 -1.28 X
558 766346 30.000 26.324 0.580 3.676 0.67 X
596 908554 28.000 28.067 0.625 -0.067 -0.01 X
611 69104 26.000 13.505 0.247 12.495 2.26R
657 1112177 30.000 36.608 0.671 -6.608 -1.21 X
664 66222 26.000 14.758 0.337 11.242 2.04R
700 487803 27.000 32.465 0.549 -5.465 -0.99 X
711 1157129 30.000 32.063 0.788 -2.063 -0.38 X
755 984640 30.000 34.532 0.589 -4.532 -0.83 X
779 968958 27.000 34.030 0.583 -7.030 -1.28 X
789 1161625 30.000 37.432 0.703 -7.432 -1.36 X
798 908983 27.000 37.795 0.633 -10.795 -1.97 X
845 1155254 31.000 41.897 0.754 -10.897 -1.99 X
848 97534 27.000 13.882 0.269 13.118 2.38R
851 965089 31.000 34.592 0.590 -3.592 -0.65 X
856 1000069 31.000 34.596 0.601 -3.596 -0.66 X
861 67077 26.000 13.519 0.238 12.481 2.26R
870 1140903 28.000 36.846 0.696 -8.846 -1.61 X
877 549628 29.000 23.099 0.535 5.901 1.07 X
906 601938 29.000 32.536 0.542 -3.536 -0.64 X
943 768013 30.000 36.410 0.536 -6.410 -1.17 X
944 899583 28.000 27.243 0.648 0.757 0.14 X
960 965136 30.000 39.164 0.625 -9.164 -1.67 X
966 156806 27.000 15.833 0.251 11.167 2.02R
967 773643 28.000 35.926 0.541 -7.926 -1.44 X
983 62925 27.000 13.306 0.266 13.694 2.48R
987 867677 28.000 27.627 0.607 0.373 0.07 X
1004 1134879 30.000 36.986 0.686 -6.986 -1.27 X
1023 1084226 29.000 32.583 0.909 -3.583 -0.66 X
1053 1 2.000 13.076 0.219 -11.076 -2.01R
1055 72206 26.000 13.412 0.277 12.588 2.28R
1064 63162 25.000 13.454 0.238 11.546 2.09R
1065 12183 22.000 19.922 0.583 2.078 0.38 X
1082 765691 30.000 25.061 0.561 4.939 0.90 X
1091 9427 18.000 14.679 0.586 3.321 0.60 X
1096 260929 27.000 24.066 0.570 2.934 0.53 X
1108 37089 23.000 20.337 0.579 2.663 0.48 X
1110 4915 20.000 19.801 0.584 0.199 0.04 X
1119 709633 29.000 26.343 0.732 2.657 0.49 X
1140 1104685 31.000 36.243 0.672 -5.243 -0.96 X
1144 4159 19.000 14.591 0.587 4.409 0.80 X
1148 174351 27.000 22.624 0.569 4.376 0.80 X
1168 889466 29.000 28.423 0.658 0.577 0.11 X
1172 783838 32.000 35.999 0.552 -3.999 -0.73 X
1179 711943 30.000 31.579 0.668 -1.579 -0.29 X
1183 1090714 29.000 35.962 0.666 -6.962 -1.27 X
1193 1140652 29.000 41.365 0.780 -12.365 -2.26RX
1197 4027 19.000 19.786 0.584 -0.786 -0.14 X
1203 980837 30.000 34.469 0.586 -4.469 -0.81 X
1205 5580 19.000 14.615 0.587 4.385 0.80 X
1227 10500 24.000 12.529 0.246 11.471 2.08R
1235 150946 27.000 15.543 0.224 11.457 2.08R
1273 0 1.000 12.306 0.256 -11.306 -2.05R
1274 663958 29.000 33.713 0.539 -4.713 -0.86 X
1281 547229 29.000 31.624 0.529 -2.624 -0.48 X
1285 29 2.000 13.510 0.303 -11.510 -2.09R

R denotes an observation with a large standardized residual.


X denotes an observation whose X value gives it large leverage.

Model Interpretation:
1. Intercept (Constant): The intercept is 10.8, the predicted number of trips for a respondent with zero income, age 0 and no car. Since no respondent in the training data is younger than 20, this value is an extrapolation; it serves as the starting point of the prediction equation rather than as a meaningful baseline.
2. Income Coefficient: The coefficient for Income (INR) is 0.000017, so each additional INR of income is associated with 0.000017 more trips (about 1.7 trips per 100,000 INR), holding age and car ownership constant. The effect is statistically significant (t = 24.41, p = 0.000), and Income remains a strong predictor of trip production.
3. Age Coefficient: The coefficient for Age is 0.0482, meaning that each additional year of age is associated with about 0.048 more trips (roughly 0.5 trips per decade). The effect is smaller than that of the other predictors but still statistically significant (p = 0.002), so age contributes modestly to trip production.
4. Number of Cars Owned Coefficient: The coefficient for Number of Cars Owned is 5.20, meaning that each additional car owned is associated with about 5.2 more trips, the largest per-unit effect in the model. Coefficients measured in different units cannot be ranked directly, however: by t-value, Income (t = 24.41) is the most significant predictor, followed by Number of Cars Owned (t = 18.61). Car ownership nonetheless has a strong effect on the number of trips, likely because vehicle access brings greater mobility and convenience.
Model Fit:
• Standard Error (S): The standard error of the estimate is 5.52257, down from 6.23848 in Model 1, so observed trips lie closer to the predictions on average.
• R-Squared (R²): The model explains 52.1% of the variability in Trips (R² = 52.1%), a substantial improvement over the single-variable model (38.7%). Adding Age and Number of Cars Owned has considerably increased the model's explanatory power.
Analysis of Variance (ANOVA):
• Regression Sum of Squares (SS): The regression SS is 43,437, the portion of the variation in Trips explained by the model.
• Residual Sum of Squares (SS): The residual SS is 39,984, the variation in Trips left unexplained after including the predictors.
• F-statistic: The F-statistic (474.74) with a p-value of 0.000 indicates that the model is statistically significant overall.
Sequential Sum of Squares (each term entered in the order listed):
• Income (INR), entered first, explains 32,321 of the variation in Trips, confirming its role as a key predictor.
• Age adds a further 550, a small contribution.
• Number of Cars Owned adds a further 10,566, confirming its strong effect on trip production.
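Because each sequential SS is the extra variation explained when that term is added after the terms listed above it, the sequential SS must add up to the regression SS. For the last-entered term, the partial F statistic (its sequential SS divided by the residual mean square) should also equal the square of its t-value. The short check below uses only figures printed in the Model 2 output and the Model 3 output that follows; the function name is illustrative.

```python
def check_sequential_ss(seq_ss, ss_reg, ss_res, df_res, t_last):
    """Compare sum(seq SS) with the regression SS, and the last term's
    partial F (its seq SS over the residual mean square) with t**2."""
    ms_res = ss_res / df_res
    partial_f_last = seq_ss[-1] / ms_res        # the last-entered term has 1 df
    return sum(seq_ss), ss_reg, partial_f_last, t_last ** 2


# Model 2: Income, Age, then Number of Cars Owned (t = 18.61)
m2 = check_sequential_ss([32321, 550, 10566], 43437, 39984, 1311, 18.61)
# Model 3: the same three terms, then Accessibility Index (t = 16.73)
m3 = check_sequential_ss([32321, 550, 10566, 7038], 50475, 32945, 1310, 16.73)
print(m2)
print(m3)
```

Dividing a sequential SS by the total SS (83,421) gives that term's increase in R². Adding Number of Cars Owned after Income and Age raises R² by about 12.7 percentage points (10566/83421), and adding Accessibility Index in Model 3 raises it by a further 8.4 points (7038/83421).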

This extended model, which includes Income (INR), Age and Number of Cars Owned, predicts trip production more completely than Model 1. The large coefficient for Number of Cars Owned suggests that vehicle access is a major driver of trip generation, while income and age also contribute positively. With an adjusted R² of 52.0% (versus 38.7% for Model 1), this model fits considerably better than the single-variable model and offers a more accurate basis for predicting travel behaviour in Pune.
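To make the coefficients concrete, the fitted Model 2 equation can be evaluated for a sample respondent. The respondent profile below (income 50,000 INR, age 40, one car) is hypothetical and chosen only for illustration; the coefficients are the unrounded values from the Model 2 output, and the function name is illustrative.

```python
MODEL2 = {  # unrounded coefficients from the Model 2 Minitab output
    "const": 10.7641,
    "income_inr": 0.00001666,
    "age": 0.04817,
    "cars": 5.1978,
}


def predict_trips_model2(income_inr, age, cars, coef=MODEL2):
    """Predicted trips from Model 2: Trips = b0 + b1*Income + b2*Age + b3*Cars."""
    return (coef["const"] + coef["income_inr"] * income_inr
            + coef["age"] * age + coef["cars"] * cars)


base = predict_trips_model2(50_000, 40, 1)
extra_car = predict_trips_model2(50_000, 40, 2) - base   # equals the cars coefficient
print(f"predicted trips: {base:.1f}; effect of a second car: {extra_car:.2f}")
```

Predictions for ages outside the observed 20 to 78 range, or for incomes above the observed maximum of 1,344,445 INR, are extrapolations and should be treated with caution.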

MODEL-3 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned, Accessibility Index

The regression equation is


Trips = 8.89 + 0.000013 Income (INR) + 0.0388 Age + 4.24 Number of Cars Owned
+ 11.5 Accessibility Index

Predictor Coef SE Coef T P


Constant 8.8910 0.6314 14.08 0.000
Income (INR) 0.00001324 0.00000065 20.28 0.000
Age 0.03878 0.01405 2.76 0.006
Number of Cars Owned 4.2446 0.2599 16.33 0.000
Accessibility Index 11.5039 0.6877 16.73 0.000

S = 5.01490 R-Sq = 60.5% R-Sq(adj) = 60.4%

Analysis of Variance

Source DF SS MS F P
Regression 4 50475 12619 501.76 0.000
Residual Error 1310 32945 25
Total 1314 83421

Source DF Seq SS
Income (INR) 1 32321
Age 1 550
Number of Cars Owned 1 10566
Accessibility Index 1 7038

Unusual Observations

Income
Obs (INR) Trips Fit SE Fit Residual St Resid
13 1 2.000 20.650 0.571 -18.650 -3.74RX
26 1087723 29.000 29.893 0.725 -0.893 -0.18 X
44 1344445 43.000 29.674 0.915 13.326 2.70RX
52 6 36.000 10.047 0.325 25.953 5.19R
88 1153594 29.000 36.138 0.648 -7.138 -1.44 X
96 977848 30.000 34.086 0.681 -4.086 -0.82 X
145 1136587 29.000 40.428 0.665 -11.428 -2.30RX
155 1007629 28.000 38.117 0.611 -10.117 -2.03RX
156 1201624 30.000 40.597 0.695 -10.597 -2.13RX
164 708551 20.000 31.242 0.392 -11.242 -2.25R
169 1074752 31.000 35.284 0.591 -4.284 -0.86 X
180 897187 29.000 27.678 0.616 1.322 0.27 X
183 193222 27.000 13.077 0.268 13.923 2.78R
185 610148 29.000 28.643 0.540 0.357 0.07 X
186 4 39.000 11.903 0.225 27.097 5.41R
202 839285 29.000 22.245 0.618 6.755 1.36 X

209 126048 26.000 13.792 0.214 12.208 2.44R
213 608918 29.000 28.972 0.577 0.028 0.01 X
234 1112628 30.000 32.074 0.692 -2.074 -0.42 X
257 837630 29.000 26.126 0.570 2.874 0.58 X
264 6000 40.000 17.253 0.243 22.747 4.54R
287 138668 25.000 14.856 0.218 10.144 2.02R
300 339444 36.000 18.901 0.370 17.099 3.42R
335 47623 23.000 11.385 0.269 11.615 2.32R
343 316884 4.000 21.851 0.333 -17.851 -3.57R
360 1070223 31.000 30.926 0.658 0.074 0.01 X
364 157104 26.000 12.783 0.289 13.217 2.64R
371 49475 23.000 11.871 0.210 11.129 2.22R
389 1144390 28.000 26.476 0.783 1.524 0.31 X
395 1892 29.000 11.784 0.294 17.216 3.44R
414 1176573 31.000 36.915 0.776 -5.915 -1.19 X
437 997068 30.000 36.641 0.573 -6.641 -1.33 X
440 4817 18.000 22.734 0.547 -4.734 -0.95 X
473 96177 25.000 12.627 0.211 12.373 2.47R
483 572022 29.000 26.928 0.547 2.072 0.42 X
491 1008852 30.000 28.793 0.660 1.207 0.24 X
497 1112253 30.000 37.148 0.617 -7.148 -1.44 X
498 1024388 30.000 35.622 0.652 -5.622 -1.13 X
503 933827 31.000 31.439 0.703 -0.439 -0.09 X
544 863442 30.000 27.077 0.584 2.923 0.59 X
551 85821 25.000 12.039 0.242 12.961 2.59R
555 1072503 29.000 39.313 0.618 -10.313 -2.07RX
558 766346 30.000 29.224 0.554 0.776 0.16 X
586 803332 31.000 25.978 0.543 5.022 1.01 X
596 908554 28.000 25.942 0.582 2.058 0.41 X
611 69104 26.000 14.392 0.230 11.608 2.32R
657 1112177 30.000 38.179 0.617 -8.179 -1.64 X
664 66222 26.000 13.682 0.313 12.318 2.46R
701 296671 28.000 16.863 0.248 11.137 2.22R
711 1157129 30.000 30.606 0.721 -0.606 -0.12 X
755 984640 30.000 29.195 0.623 0.805 0.16 X
781 22831 22.000 10.902 0.278 11.098 2.22R
789 1161625 30.000 35.000 0.655 -5.000 -1.01 X
798 908983 27.000 41.658 0.620 -14.658 -2.95RX
840 79334 24.000 11.623 0.236 12.377 2.47R
845 1155254 31.000 37.655 0.730 -6.655 -1.34 X
851 965089 31.000 34.280 0.536 -3.280 -0.66 X
856 1000069 31.000 29.676 0.620 1.324 0.27 X
862 62079 25.000 13.999 0.188 11.001 2.20R
870 1140903 28.000 37.279 0.632 -9.279 -1.87 X
883 53398 24.000 12.236 0.213 11.764 2.35R
921 30120 26.000 15.653 0.285 10.347 2.07R
944 899583 28.000 26.437 0.591 1.563 0.31 X
960 965136 30.000 33.551 0.659 -3.551 -0.71 X
983 62925 27.000 11.912 0.256 15.088 3.01R
987 867677 28.000 31.027 0.587 -3.027 -0.61 X
1004 1134879 30.000 40.532 0.658 -10.532 -2.12RX
1005 585979 29.000 28.599 0.539 0.401 0.08 X
1023 1084226 29.000 28.695 0.858 0.305 0.06 X
1033 10463 22.000 10.812 0.222 11.188 2.23R
1055 72206 26.000 13.391 0.252 12.609 2.52R
1059 5 5.000 19.233 0.433 -14.233 -2.85R
1082 765691 30.000 28.726 0.555 1.274 0.26 X
1091 9427 18.000 16.319 0.541 1.681 0.34 X
1101 273147 28.000 17.274 0.231 10.726 2.14R
1108 37089 23.000 22.540 0.542 0.460 0.09 X
1119 709633 29.000 31.031 0.722 -2.031 -0.41 X
1140 1104685 31.000 34.048 0.624 -3.048 -0.61 X
1144 4159 19.000 12.120 0.553 6.880 1.38 X
1145 339838 29.000 18.338 0.264 10.662 2.13R
1148 174351 27.000 19.777 0.544 7.223 1.45 X
1168 889466 29.000 31.042 0.617 -2.042 -0.41 X
1179 711943 30.000 32.269 0.608 -2.269 -0.46 X
1183 1090714 29.000 33.602 0.621 -4.602 -0.92 X

1193 1140652 29.000 39.635 0.716 -10.635 -2.14RX
1199 372577 28.000 17.594 0.301 10.406 2.08R
1205 5580 19.000 17.196 0.555 1.804 0.36 X
1242 637723 30.000 19.716 0.469 10.284 2.06R
1261 9272 22.000 11.805 0.195 10.195 2.03R
1271 337082 30.000 19.123 0.382 10.877 2.18R

R denotes an observation with a large standardized residual.


X denotes an observation whose X value gives it large leverage.

Model Interpretation:
1. Intercept (Constant): The intercept is 8.89, meaning that a household with zero values for
Income, Age, Number of Cars Owned, and Accessibility Index would be predicted to make 8.89
trips. This is a baseline level of travel rather than a realistic household, since these
predictors rarely equal zero together.
2. Income (INR): The coefficient for Income is 0.000013, a positive but small effect per rupee:
each additional INR of income adds 0.000013 predicted trips, or about 1.3 trips per
INR 100,000. This effect is statistically significant (p < 0.001).
3. Age: The coefficient for Age is 0.0388, so each additional year of age increases the
predicted number of trips by 0.0388. The positive relationship suggests that older
individuals may make slightly more trips, possibly because of family responsibilities or
work-related commuting. The effect is statistically significant (p = 0.006).
4. Number of Cars Owned: With a coefficient of 4.24, each additional car owned is associated
with approximately 4.24 more trips, holding the other predictors constant, reflecting the
role of private vehicle ownership in travel behaviour. This variable is highly significant
(p < 0.001).
5. Accessibility Index: The coefficient for Accessibility Index is 11.5, so a one-unit
increase in accessibility is associated with 11.5 additional trips, emphasizing the
importance of connectivity in shaping travel demand. This predictor is also highly
significant (p < 0.001).
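The coefficients above can be made concrete by plugging one household into the Model 3 equation. The household in this Python sketch (INR 500,000 income, age 40, one car, Accessibility Index 0.5) is hypothetical and chosen only to illustrate the arithmetic; in particular, the range of the Accessibility Index is not stated in this section, so 0.5 is an assumption.

```python
# Model 3 coefficients as printed in the Minitab output (training dataset).
MODEL_3 = {
    "constant": 8.8910,
    "income_inr": 0.00001324,
    "age": 0.03878,
    "cars_owned": 4.2446,
    "accessibility_index": 11.5039,
}

def predict_trips(income_inr: float, age: float, cars_owned: float,
                  accessibility_index: float) -> float:
    """Apply the Model 3 linear equation to one household."""
    c = MODEL_3
    return (c["constant"]
            + c["income_inr"] * income_inr
            + c["age"] * age
            + c["cars_owned"] * cars_owned
            + c["accessibility_index"] * accessibility_index)

# Hypothetical household, for illustration only.
base = predict_trips(income_inr=500_000, age=40, cars_owned=1, accessibility_index=0.5)
extra_car = predict_trips(income_inr=500_000, age=40, cars_owned=2, accessibility_index=0.5)
print(round(base, 2))              # 27.06
print(round(extra_car - base, 4))  # 4.2446, the Number of Cars Owned coefficient
```

Because the model is linear, changing one predictor while holding the others fixed always shifts the prediction by exactly that predictor's coefficient, which is what "holding the other predictors constant" means in the interpretation.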

Model Fit:

• Standard Error (S): S = 5.01490 is the estimated standard deviation of the residuals (the
square root of the residual mean square), so observed trips typically deviate from
predicted trips by about 5 trips.
• R-Squared (R²): The model explains 60.5% of the variability in Trips, a considerable
improvement over the income-only model (R² = 38.7%). The adjusted R² (60.4%) is almost
identical, which indicates that the gain is not merely an artefact of adding predictors.

Analysis of Variance (ANOVA):

• Regression SS: The regression sum of squares (50,475) is the portion of the total
variability in Trips (83,421) explained by the model.
• Residual SS: The residual SS (32,945) is the unexplained variability, down from 51,100 in
the income-only model.
• F-Statistic: The F-statistic (501.76) with p < 0.001 indicates that the model is
statistically significant overall.
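The S and F values reported for Model 3 follow directly from its ANOVA table: S is the square root of the residual mean square, and F is the regression mean square divided by the residual mean square. This minimal Python check (the helper name is illustrative) reproduces both; any difference in the last printed digit comes from Minitab rounding the sums of squares it displays.

```python
import math

def anova_summary(ss_reg: float, ss_res: float, df_reg: int, df_res: int) -> dict:
    """Derive S and the overall F-statistic from an ANOVA table."""
    ms_reg = ss_reg / df_reg   # mean square for the regression
    ms_res = ss_res / df_res   # mean square error (residual variance estimate)
    return {
        "S": math.sqrt(ms_res),  # residual standard deviation
        "F": ms_reg / ms_res,    # overall F-statistic
    }

# Model 3 training ANOVA: Regression DF 4, Residual DF 1310
summary = anova_summary(ss_reg=50475, ss_res=32945, df_reg=4, df_res=1310)
print(f"S = {summary['S']:.3f}, F = {summary['F']:.2f}")  # S = 5.015, F = 501.76
```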

This multiple regression model demonstrates that Income (INR), Age, Number of Cars Owned,
and Accessibility Index significantly influence trip production in the Pune Metropolitan Area. The
inclusion of additional predictors improves the explanatory power of the model, with Number of
Cars Owned and Accessibility Index playing particularly strong roles. Future efforts could explore
interaction effects or incorporate additional socio-economic variables to further refine the model.

MODEL – 4 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned,
Accessibility Index, Distance to City Centroid

The regression equation is


Trips = 11.5 + 0.000013 Income (INR) + 0.0356 Age + 3.52 Number of Cars Owned
+ 9.18 Accessibility Index - 0.159 Distance to City Centroid

Predictor Coef SE Coef T P


Constant 11.5478 0.5940 19.44 0.000
Income (INR) 0.00001305 0.00000059 22.05 0.000
Age 0.03563 0.01274 2.80 0.005
Number of Cars Owned 3.5166 0.2397 14.67 0.000
Accessibility Index 9.1757 0.6388 14.36 0.000
Distance to City Centroid -0.159004 0.009441 -16.84 0.000

S = 4.54822 R-Sq = 67.5% R-Sq(adj) = 67.4%

Analysis of Variance

Source DF SS MS F P
Regression 5 56342 11268 544.73 0.000
Residual Error 1309 27078 21
Total 1314 83421

Source DF Seq SS
Income (INR) 1 32321
Age 1 550
Number of Cars Owned 1 10566
Accessibility Index 1 7038
Distance to City Centroid 1 5867

Unusual Observations

Income
Obs (INR) Trips Fit SE Fit Residual St Resid
13 1 2.000 4.751 1.077 -2.751 -0.62 X
26 1087723 29.000 30.668 0.659 -1.668 -0.37 X
44 1344445 43.000 26.966 0.845 16.034 3.59RX
52 6 36.000 7.901 0.321 28.099 6.19R
82 165868 25.000 -1.227 1.329 26.227 6.03RX
88 1153594 29.000 35.884 0.588 -6.884 -1.53 X
96 977848 30.000 33.807 0.618 -3.807 -0.84 X
128 2 3.000 2.678 0.549 0.322 0.07 X
145 1136587 29.000 39.163 0.608 -10.163 -2.25RX
150 663231 39.000 29.629 0.326 9.371 2.07R
155 1007629 28.000 37.359 0.556 -9.359 -2.07RX
156 1201624 30.000 39.535 0.634 -9.535 -2.12RX
164 708551 20.000 30.823 0.357 -10.823 -2.39R
169 1074752 31.000 35.372 0.536 -4.372 -0.97 X
180 897187 29.000 28.378 0.560 0.622 0.14 X
183 193222 27.000 15.161 0.273 11.839 2.61R

186 4 39.000 8.366 0.293 30.634 6.75R
202 839285 29.000 23.929 0.569 5.071 1.12 X
209 126048 26.000 15.485 0.219 10.515 2.31R
234 1112628 30.000 32.618 0.629 -2.618 -0.58 X
264 6000 40.000 18.193 0.227 21.807 4.80R
300 339444 36.000 19.422 0.337 16.578 3.66R
335 47623 23.000 13.207 0.266 9.793 2.16R
343 316884 4.000 22.105 0.302 -18.105 -3.99R
360 1070223 31.000 31.253 0.597 -0.253 -0.06 X
364 157104 26.000 14.352 0.278 11.648 2.57R
371 49475 23.000 13.670 0.218 9.330 2.05R
389 1144390 28.000 27.898 0.715 0.102 0.02 X
395 1892 29.000 13.589 0.288 15.411 3.40R
414 1176573 31.000 36.484 0.704 -5.484 -1.22 X
473 96177 25.000 14.347 0.217 10.653 2.34R
491 1008852 30.000 29.446 0.600 0.554 0.12 X
497 1112253 30.000 36.320 0.561 -6.320 -1.40 X
498 1024388 30.000 35.548 0.592 -5.548 -1.23 X
503 933827 31.000 31.812 0.638 -0.812 -0.18 X
510 1 3.000 -0.323 0.712 3.323 0.74 X
530 204 13.000 -6.445 1.083 19.445 4.40RX
539 160 12.000 -19.028 2.155 31.028 7.75RX
544 863442 30.000 28.248 0.535 1.752 0.39 X
551 85821 25.000 13.911 0.246 11.089 2.44R
555 1072503 29.000 38.466 0.563 -9.466 -2.10RX
596 908554 28.000 27.307 0.534 0.693 0.15 X
611 69104 26.000 15.412 0.218 10.588 2.33R
625 1 3.000 -0.293 0.740 3.293 0.73 X
657 1112177 30.000 37.430 0.561 -7.430 -1.65 X
664 66222 26.000 15.179 0.297 10.821 2.38R
701 296671 28.000 18.483 0.244 9.517 2.10R
711 1157129 30.000 31.316 0.655 -1.316 -0.29 X
755 984640 30.000 29.764 0.566 0.236 0.05 X
781 22831 22.000 12.914 0.279 9.086 2.00R
789 1161625 30.000 34.764 0.594 -4.764 -1.06 X
798 908983 27.000 39.687 0.574 -12.687 -2.81RX
840 79334 24.000 13.280 0.236 10.720 2.36R
845 1155254 31.000 37.033 0.663 -6.033 -1.34 X
856 1000069 31.000 30.593 0.565 0.407 0.09 X
862 62079 25.000 15.486 0.192 9.514 2.09R
870 1140903 28.000 36.963 0.574 -8.963 -1.99 X
883 53398 24.000 14.163 0.225 9.837 2.17R
944 899583 28.000 27.079 0.537 0.921 0.20 X
960 965136 30.000 33.380 0.598 -3.380 -0.75 X
983 62925 27.000 13.541 0.251 13.459 2.96R
987 867677 28.000 30.903 0.533 -2.903 -0.64 X
1004 1134879 30.000 39.325 0.601 -9.325 -2.07RX
1020 2 3.000 1.154 0.628 1.846 0.41 X
1023 1084226 29.000 29.859 0.781 -0.859 -0.19 X
1033 10463 22.000 12.569 0.227 9.431 2.08R
1042 2 3.000 1.891 0.551 1.109 0.25 X
1053 1 2.000 -5.145 1.016 7.145 1.61 X
1055 72206 26.000 15.046 0.248 10.954 2.41R
1059 5 5.000 15.008 0.466 -10.008 -2.21R
1101 273147 28.000 18.684 0.226 9.316 2.05R
1119 709633 29.000 30.866 0.655 -1.866 -0.41 X
1140 1104685 31.000 34.301 0.567 -3.301 -0.73 X
1145 339838 29.000 19.565 0.250 9.435 2.08R
1168 889466 29.000 31.162 0.560 -2.162 -0.48 X
1179 711943 30.000 31.865 0.552 -1.865 -0.41 X
1183 1090714 29.000 33.558 0.563 -4.558 -1.01 X
1193 1140652 29.000 38.564 0.653 -9.564 -2.12RX
1271 337082 30.000 20.075 0.351 9.925 2.19R
1273 0 1.000 -11.485 1.344 12.485 2.87RX
1285 29 2.000 12.262 0.283 -10.262 -2.26R

R denotes an observation with a large standardized residual.


X denotes an observation whose X value gives it large leverage.

Model Interpretation:

1. Intercept (Constant): The intercept of 11.5 suggests that a household with all predictor
variables at zero would be expected to make 11.5 trips. This represents a baseline level
of travel independent of the other factors.
2. Income (INR): Each additional INR of income increases predicted trips by 0.000013. The
effect per rupee is small because of the scale of the income variable, but it is
significant, as shown by the high t-value (22.05) and p < 0.001.
3. Age: For each additional year of age, trips increase by 0.0356. This positive relationship
suggests that older individuals or households with older members contribute slightly
more to trip production. The effect is statistically significant (p = 0.005).
4. Number of Cars Owned: Each additional car owned by a household increases predicted trips
by 3.52, indicating a substantial influence of car ownership on mobility. The effect is
highly significant (p < 0.001).
5. Accessibility Index: A unit increase in the accessibility index is associated with an
increase of 9.18 trips. This indicates that better access to amenities and transport
significantly enhances trip production.
6. Distance to City Centroid: For every unit increase in distance from the city centre,
trips decrease by 0.159. This negative relationship suggests that households farther from
the city centre tend to make fewer trips, reflecting a distance decay effect.

Model Fit:

• Standard Error (S): S = 4.55 is the typical deviation of observed trips from predicted
trips. It is lower than for the simpler models (6.24 for the income-only model, 5.52 for
Model 2, and 5.01 for Model 3), indicating improved prediction accuracy.
• R-Squared (R²): The model explains 67.5% of the variability in Trips, with an adjusted
R² of 67.4%, indicating that the model has strong explanatory power and that the
predictors collectively explain a large portion of trip production variability.

Analysis of Variance (ANOVA):

• Regression Sum of Squares (SS): The regression SS is 56,342, showing the variation
in trips explained by the predictors.
• Residual SS: The residual SS is 27,078, representing the unexplained variability.
• F-statistic: The F-value of 544.73 with p < 0.001 indicates that the model is highly
significant overall.

Sequential Sum of Squares (Seq SS):

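This shows the additional variance explained by each variable when it is added after the
variables listed above it. The values therefore depend on the order of entry and do not
measure each variable's unique contribution:

• Income (INR): Accounts for the largest share of explained variance (32,321), partly
because it is entered first and is credited with all the variation it shares with later
predictors.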
• Number of Cars Owned: Contributes significantly (10,566), highlighting its
importance.
• Accessibility Index and Distance to City Centroid: Also play substantial roles, with
Seq SS of 7,038 and 5,867, respectively.
• Age: Has a smaller but significant contribution (550).
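Because sequential SS are additive, their running total reproduces the R² of each model in the nested sequence. In the Python sketch below, which only re-uses the printed values, the first, third, fourth and fifth running totals match the R² of Models 1, 2, 3 and 4 (38.7%, 52.1%, 60.5%, 67.5%); the second step (Income plus Age) does not correspond to any of the five models compared in Problem 4.

```python
# Sequential sums of squares for Model 4, in the order Minitab entered the predictors.
seq_ss = [
    ("Income (INR)", 32321),
    ("Age", 550),
    ("Number of Cars Owned", 10566),
    ("Accessibility Index", 7038),
    ("Distance to City Centroid", 5867),
]
SS_TOTAL = 83421

running = 0
cumulative_r2 = []
for name, ss in seq_ss:
    running += ss  # SS explained once this predictor has been added
    cumulative_r2.append(round(100 * running / SS_TOTAL, 1))
    print(f"+ {name:<27} R-Sq = {cumulative_r2[-1]:.1f}%")

# The sequential SS add up to the Regression SS in the Model 4 ANOVA table.
assert running == 56342
```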

This multiple regression model provides a comprehensive explanation of trip production by
incorporating diverse socio-economic and spatial factors. With an adjusted R² of 67.4% and
significant predictors, the model is robust and captures critical drivers of travel behaviour in Pune.
Including accessibility, car ownership, and distance to the city centre improves the explanatory
power compared to simpler models. This highlights the importance of a multi-faceted approach
in trip production analysis for effective transportation planning.

MODEL – 5 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned,
Accessibility Index, Distance to Metro Station (km)

The regression equation is


Trips = 9.24 + 0.000013 Income (INR) + 0.0380 Age + 4.25 Number of Cars Owned
+ 11.5 Accessibility Index - 0.0175 Distance to Metro Station (km)

Predictor Coef SE Coef T P


Constant 9.2381 0.6869 13.45 0.000
Income (INR) 0.00001326 0.00000065 20.32 0.000
Age 0.03804 0.01406 2.71 0.007
Number of Cars Owned 4.2506 0.2599 16.36 0.000
Accessibility Index 11.4602 0.6883 16.65 0.000
Distance to Metro Station (km) -0.01748 0.01365 -1.28 0.200

S = 5.01367 R-Sq = 60.6% R-Sq(adj) = 60.4%

Analysis of Variance

Source DF SS MS F P
Regression 5 50516 10103 401.93 0.000
Residual Error 1309 32904 25
Total 1314 83421

Source DF Seq SS
Income (INR) 1 32321
Age 1 550
Number of Cars Owned 1 10566
Accessibility Index 1 7038
Distance to Metro Station (km) 1 41

Unusual Observations

Income
Obs (INR) Trips Fit SE Fit Residual St Resid
13 1 2.000 20.909 0.605 -18.909 -3.80RX
26 1087723 29.000 29.624 0.754 -0.624 -0.13 X
44 1344445 43.000 29.748 0.916 13.252 2.69RX
52 6 36.000 9.809 0.374 26.191 5.24R
88 1153594 29.000 36.326 0.664 -7.326 -1.47 X
96 977848 30.000 34.345 0.710 -4.345 -0.88 X
145 1136587 29.000 40.562 0.673 -11.562 -2.33RX
155 1007629 28.000 38.201 0.614 -10.201 -2.05RX
156 1201624 30.000 40.505 0.699 -10.505 -2.12RX
164 708551 20.000 31.228 0.392 -11.228 -2.25R
169 1074752 31.000 35.020 0.626 -4.020 -0.81 X
180 897187 29.000 27.987 0.661 1.013 0.20 X
183 193222 27.000 13.293 0.317 13.707 2.74R
186 4 39.000 11.974 0.231 27.026 5.40R

202 839285 29.000 22.507 0.651 6.493 1.31 X
209 126048 26.000 13.836 0.217 12.164 2.43R
213 608918 29.000 29.184 0.600 -0.184 -0.04 X
234 1112628 30.000 31.812 0.721 -1.812 -0.37 X
264 6000 40.000 17.415 0.274 22.585 4.51R
300 339444 36.000 19.237 0.454 16.763 3.36R
335 47623 23.000 11.564 0.302 11.436 2.29R
343 316884 4.000 21.771 0.339 -17.771 -3.55R
360 1070223 31.000 30.904 0.658 0.096 0.02 X
364 157104 26.000 12.840 0.292 13.160 2.63R
371 49475 23.000 11.613 0.291 11.387 2.27R
389 1144390 28.000 26.332 0.791 1.668 0.34 X
395 1892 29.000 11.966 0.327 17.034 3.40R
414 1176573 31.000 36.672 0.799 -5.672 -1.15 X
473 96177 25.000 12.360 0.297 12.640 2.53R
491 1008852 30.000 28.951 0.671 1.049 0.21 X
497 1112253 30.000 36.980 0.630 -6.980 -1.40 X
498 1024388 30.000 35.421 0.671 -5.421 -1.09 X
503 933827 31.000 31.707 0.733 -0.707 -0.14 X
544 863442 30.000 27.148 0.587 2.852 0.57 X
551 85821 25.000 12.005 0.243 12.995 2.60R
555 1072503 29.000 39.615 0.661 -10.615 -2.14RX
611 69104 26.000 14.101 0.324 11.899 2.38R
657 1112177 30.000 38.387 0.637 -8.387 -1.69 X
664 66222 26.000 13.407 0.380 12.593 2.52R
701 296671 28.000 16.900 0.249 11.100 2.22R
711 1157129 30.000 30.347 0.748 -0.347 -0.07 X
755 984640 30.000 29.103 0.627 0.897 0.18 X
781 22831 22.000 10.616 0.356 11.384 2.28R
789 1161625 30.000 34.769 0.679 -4.769 -0.96 X
798 908983 27.000 41.741 0.623 -14.741 -2.96RX
840 79334 24.000 11.776 0.265 12.224 2.44R
845 1155254 31.000 37.860 0.747 -6.860 -1.38 X
856 1000069 31.000 29.621 0.622 1.379 0.28 X
862 62079 25.000 14.162 0.227 10.838 2.16R
870 1140903 28.000 37.372 0.636 -9.372 -1.88 X
883 53398 24.000 12.178 0.218 11.822 2.36R
921 30120 26.000 15.896 0.343 10.104 2.02R
944 899583 28.000 26.172 0.626 1.828 0.37 X
960 965136 30.000 33.466 0.662 -3.466 -0.70 X
983 62925 27.000 11.657 0.324 15.343 3.07R
987 867677 28.000 30.754 0.625 -2.754 -0.55 X
1004 1134879 30.000 40.277 0.687 -10.277 -2.07RX
1023 1084226 29.000 28.900 0.872 0.100 0.02 X
1033 10463 22.000 10.964 0.252 11.036 2.20R
1055 72206 26.000 13.189 0.297 12.811 2.56R
1059 5 5.000 19.156 0.437 -14.156 -2.83R
1101 273147 28.000 17.064 0.284 10.936 2.18R
1119 709633 29.000 31.073 0.722 -2.073 -0.42 X
1140 1104685 31.000 34.087 0.625 -3.087 -0.62 X
1145 339838 29.000 18.486 0.288 10.514 2.10R
1168 889466 29.000 31.306 0.651 -2.306 -0.46 X
1179 711943 30.000 32.374 0.613 -2.374 -0.48 X
1183 1090714 29.000 33.606 0.620 -4.606 -0.93 X
1193 1140652 29.000 39.625 0.716 -10.625 -2.14RX
1199 372577 28.000 17.898 0.383 10.102 2.02R
1205 5580 19.000 17.455 0.590 1.545 0.31 X
1234 788111 28.000 26.314 0.589 1.686 0.34 X
1242 637723 30.000 19.698 0.469 10.302 2.06R
1271 337082 30.000 18.881 0.426 11.119 2.23R

R denotes an observation with a large standardized residual.


X denotes an observation whose X value gives it large leverage.

Model Interpretation:

1. Intercept (Constant):
The intercept is 9.24, indicating that if all predictors are zero, the baseline number of
trips would be approximately 9.24. While hypothetical (since these predictors rarely
equal zero in practice), this represents the baseline trip production.
2. Income (INR):
For each additional INR of income, the predicted number of trips increases by 0.000013.
This positive relationship suggests that higher income is associated with more trip-making,
though the effect per rupee is small because of the scale of income.
3. Age:
Each year of increase in age adds approximately 0.038 trips, reflecting a slight positive
influence of age on trip production, possibly due to varying mobility needs at different
life stages.
4. Number of Cars Owned:
Households owning an additional car produce 4.25 more trips on average. This
substantial coefficient highlights the role of vehicle ownership in facilitating mobility
and increasing trip production.
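5. Accessibility Index:
A unit increase in the Accessibility Index is associated with approximately 11.5 additional
trips, the largest predictor coefficient in the model. Coefficient sizes depend on each
variable's units, however; judged by t-value, Income (t = 20.32) is the strongest predictor,
followed by Accessibility Index (t = 16.65). The result still suggests that better access to
amenities or transport infrastructure significantly drives trip production.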
6. Distance to Metro Station (km):
The negative coefficient (-0.0175) implies that as the distance from the metro station
increases, trip production slightly decreases. However, this effect is not statistically
significant (p = 0.200), suggesting that it may not play a critical role in this model.

Model Fit:

• Standard Error (S): S = 5.01367, so actual trips typically deviate from the predicted
values by about 5 trips.
• R-Squared (R²): The model explains 60.6% of the variation in Trips (adjusted R² = 60.4%).
This is a clear improvement over the single-variable model (R² = 38.7%), but it is
virtually identical to Model 3 (R² = 60.5%, adjusted R² = 60.4%), so adding Distance to
Metro Station contributes almost no explanatory power (Seq SS = 41).

Analysis of Variance (ANOVA):

• Regression Sum of Squares (SS): The regression SS is 50,516, representing the portion
of variability in trips explained by the predictors.
• Residual Sum of Squares (SS): The residual SS is 32,904, representing unexplained
variation.
• F-statistic: The F-statistic of 401.93 with p < 0.001 confirms that the overall model is
statistically significant.

Predictor Significance:

• All predictors except Distance to Metro Station (km) have statistically significant p-
values (<0.05), indicating their relevance to the model.
• The insignificance of Distance to Metro Station (p = 0.200) suggests its effect on trip
production may not be substantial or could be overshadowed by other variables.
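The T and P columns in the Minitab output can be checked by hand: T is the coefficient divided by its standard error, and P is the two-sided tail probability. The Python sketch below uses a standard normal approximation to the t-distribution, an assumption that is adequate only because the residual degrees of freedom (1,309) are large, and reproduces the reported values for Distance to Metro Station and Age.

```python
import math

def t_and_p(coef: float, se: float) -> tuple[float, float]:
    """t = coef / SE and an approximate two-sided p-value.

    Uses the standard normal distribution in place of the t-distribution,
    which is reasonable only when the residual degrees of freedom are large.
    """
    t = coef / se
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p

t_metro, p_metro = t_and_p(-0.01748, 0.01365)  # Distance to Metro Station (km)
t_age, p_age = t_and_p(0.03804, 0.01406)       # Age
print(f"metro: t = {t_metro:.2f}, p = {p_metro:.3f}")  # t = -1.28, p = 0.200
print(f"age:   t = {t_age:.2f}, p = {p_age:.3f}")      # t = 2.71, p = 0.007
```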

This multiple regression model explains a substantial portion of the variability in trip production
(R² = 60.6%), demonstrating that factors like income, age, car ownership, and accessibility are
significant contributors. While Distance to Metro Station does not appear statistically significant,
the model as a whole provides a robust foundation for predicting trip-making behaviour in Pune’s
metropolitan area. Future iterations could explore interaction terms or additional variables to
further enhance explanatory power.
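Whether adding one predictor to an otherwise identical model is worthwhile can be tested formally with a partial (nested-model) F-test, F = [(SSE_reduced - SSE_full) / q] / [SSE_full / df_full]. This test is not part of the original analysis; the Python sketch below simply applies it to the residual SS values reported for Models 3, 4 and 5. With one added predictor (q = 1), F equals the square of that predictor's t-value: about 283.6 for Distance to City Centroid (16.84² ≈ 283.6) and about 1.63 for Distance to Metro Station (1.28² ≈ 1.64, up to rounding), consistent with the conclusion that the metro variable adds little.

```python
def partial_f(sse_reduced: float, sse_full: float, df_full: int, q: int = 1) -> float:
    """Partial F-statistic for adding q predictors to a nested model.

    sse_reduced: residual SS of the smaller model,
    sse_full: residual SS of the larger model,
    df_full: residual degrees of freedom of the larger model.
    """
    return ((sse_reduced - sse_full) / q) / (sse_full / df_full)

# Residual SS values from the training-data ANOVA tables.
f_centroid = partial_f(sse_reduced=32945, sse_full=27078, df_full=1309)  # Model 3 -> Model 4
f_metro = partial_f(sse_reduced=32945, sse_full=32904, df_full=1309)     # Model 3 -> Model 5
print(f"{f_centroid:.1f} {f_metro:.2f}")  # 283.6 1.63
```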

06.PROBLEM PART – 4 AND ITS SOLUTIONS

4. Conduct external validation of the set of 5 models and compare their error percentage

MODEL – 1 Regression Analysis: Trips versus Income (INR)


Results for: Worksheet 3 (Training Dataset)
The regression equation is
Trips = 14.9 + 0.000021 Income (INR)

Predictor Coef SE Coef T P


Constant 14.8997 0.1883 79.11 0.000
Income (INR) 0.00002096 0.00000073 28.82 0.000

S = 6.23848 R-Sq = 38.7% R-Sq(adj) = 38.7%

Analysis of Variance

Source DF SS MS F P
Regression 1 32321 32321 830.47 0.000
Residual Error 1313 51100 39
Total 1314 83421

MODEL – 1 Regression Analysis: Trips versus Income (INR)


Results for: Worksheet 4 (Validation Dataset)

The regression equation is


Trips = 14.3 + 0.000021 Income (INR)

Predictor Coef SE Coef T P


Constant 14.3479 0.4856 29.54 0.000
Income (INR) 0.00002108 0.00000216 9.76 0.000

S = 6.12137 R-Sq = 34.9% R-Sq(adj) = 34.5%

Analysis of Variance

Source DF SS MS F P
Regression 1 3573.0 3573.0 95.35 0.000
Residual Error 178 6669.9 37.5
Total 179 10242.9

MODEL-2 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned
Results for: Worksheet 3 (Training Dataset)
The regression equation is
Trips = 10.8 + 0.000017 Income (INR) + 0.0482 Age + 5.20 Number of Cars Owned

Predictor Coef SE Coef T P


Constant 10.7641 0.6843 15.73 0.000
Income (INR) 0.00001666 0.00000068 24.41 0.000
Age 0.04817 0.01546 3.12 0.002
Number of Cars Owned 5.1978 0.2793 18.61 0.000

S = 5.52257 R-Sq = 52.1% R-Sq(adj) = 52.0%

Analysis of Variance

Source DF SS MS F P
Regression 3 43437 14479 474.74 0.000
Residual Error 1311 39984 30
Total 1314 83421

Source DF Seq SS
Income (INR) 1 32321
Age 1 550
Number of Cars Owned 1 10566

MODEL-2 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned
Results for: Worksheet 4 (Validation Dataset)
The regression equation is
Trips = 15.0 + 0.000014 Income (INR) - 0.0592 Age + 5.93 Number of Cars Owned

Predictor Coef SE Coef T P


Constant 14.961 1.838 8.14 0.000
Income (INR) 0.00001432 0.00000215 6.67 0.000
Age -0.05924 0.04275 -1.39 0.168
Number of Cars Owned 5.9341 0.8513 6.97 0.000

S = 5.43922 R-Sq = 49.2% R-Sq(adj) = 48.3%

Analysis of Variance

Source DF SS MS F P
Regression 3 5035.9 1678.6 56.74 0.000
Residual Error 176 5207.0 29.6
Total 179 10242.9

Source DF Seq SS
Income (INR) 1 3573.0
Age 1 25.4
Number of Cars Owned 1 1437.5

MODEL – 3 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned,
Accessibility Index
Results for: Worksheet 3 (Training Dataset)
The regression equation is
Trips = 8.89 + 0.000013 Income (INR) + 0.0388 Age + 4.24 Number of Cars Owned
+ 11.5 Accessibility Index

Predictor Coef SE Coef T P


Constant 8.8910 0.6314 14.08 0.000
Income (INR) 0.00001324 0.00000065 20.28 0.000
Age 0.03878 0.01405 2.76 0.006
Number of Cars Owned 4.2446 0.2599 16.33 0.000
Accessibility Index 11.5039 0.6877 16.73 0.000

S = 5.01490 R-Sq = 60.5% R-Sq(adj) = 60.4%

Analysis of Variance

Source DF SS MS F P
Regression 4 50475 12619 501.76 0.000
Residual Error 1310 32945 25
Total 1314 83421

Source DF Seq SS
Income (INR) 1 32321
Age 1 550
Number of Cars Owned 1 10566
Accessibility Index 1 7038

MODEL – 3 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned,
Accessibility Index
Results for: Worksheet 4 (Validation Dataset)
The regression equation is
Trips = 12.7 + 0.000011 Income (INR) - 0.0592 Age + 4.69 Number of Cars Owned
+ 11.7 Accessibility Index

Predictor Coef SE Coef T P


Constant 12.696 1.729 7.34 0.000
Income (INR) 0.00001112 0.00000204 5.44 0.000
Age -0.05916 0.03920 -1.51 0.133
Number of Cars Owned 4.6923 0.8090 5.80 0.000
Accessibility Index 11.735 2.005 5.85 0.000

S = 4.98828 R-Sq = 57.5% R-Sq(adj) = 56.5%

Analysis of Variance

Source DF SS MS F P
Regression 4 5888.3 1472.1 59.16 0.000
Residual Error 175 4354.5 24.9
Total 179 10242.9

Source DF Seq SS
Income (INR) 1 3573.0
Age 1 25.4
Number of Cars Owned 1 1437.5
Accessibility Index 1 852.5

MODEL – 4 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned,
Accessibility Index, Distance to City Centroid
Results for: Worksheet 3 (Training Dataset)

The regression equation is


Trips = 11.5 + 0.000013 Income (INR) + 0.0356 Age + 3.52 Number of Cars Owned
+ 9.18 Accessibility Index - 0.159 Distance to City Centroid

Predictor Coef SE Coef T P


Constant 11.5478 0.5940 19.44 0.000
Income (INR) 0.00001305 0.00000059 22.05 0.000
Age 0.03563 0.01274 2.80 0.005
Number of Cars Owned 3.5166 0.2397 14.67 0.000
Accessibility Index 9.1757 0.6388 14.36 0.000
Distance to City Centroid -0.159004 0.009441 -16.84 0.000

S = 4.54822 R-Sq = 67.5% R-Sq(adj) = 67.4%

Analysis of Variance

Source DF SS MS F P
Regression 5 56342 11268 544.73 0.000
Residual Error 1309 27078 21
Total 1314 83421

Source DF Seq SS
Income (INR) 1 32321
Age 1 550
Number of Cars Owned 1 10566
Accessibility Index 1 7038
Distance to City Centroid 1 5867

MODEL – 4 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned,
Accessibility Index, Distance to City Centroid
Results for: Worksheet 4 (Validation Dataset)
The regression equation is
Trips = 16.9 + 0.000012 Income (INR) - 0.0457 Age + 3.33 Number of Cars Owned
+ 6.69 Accessibility Index - 0.340 Distance to City Centroid

Predictor Coef SE Coef T P


Constant 16.854 1.526 11.04 0.000
Income (INR) 0.00001174 0.00000171 6.85 0.000
Age -0.04571 0.03289 -1.39 0.166
Number of Cars Owned 3.3255 0.6961 4.78 0.000
Accessibility Index 6.693 1.778 3.76 0.000
Distance to City Centroid -0.33963 0.03916 -8.67 0.000

S = 4.18013 R-Sq = 70.3% R-Sq(adj) = 69.5%

Analysis of Variance

Source DF SS MS F P
Regression 5 7202.5 1440.5 82.44 0.000
Residual Error 174 3040.4 17.5
Total 179 10242.9

Source DF Seq SS
Income (INR) 1 3573.0
Age 1 25.4
Number of Cars Owned 1 1437.5
Accessibility Index 1 852.5
Distance to City Centroid 1 1314.1

MODEL – 5 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned,
Accessibility Index, Distance to Metro Station (km)
Results for: Worksheet 3 (Training Dataset)

The regression equation is


Trips = 9.24 + 0.000013 Income (INR) + 0.0380 Age + 4.25 Number of Cars Owned
+ 11.5 Accessibility Index - 0.0175 Distance to Metro Station (km)

Predictor Coef SE Coef T P

Constant 9.2381 0.6869 13.45 0.000
Income (INR) 0.00001326 0.00000065 20.32 0.000
Age 0.03804 0.01406 2.71 0.007
Number of Cars Owned 4.2506 0.2599 16.36 0.000
Accessibility Index 11.4602 0.6883 16.65 0.000
Distance to Metro Station (km) -0.01748 0.01365 -1.28 0.200

S = 5.01367 R-Sq = 60.6% R-Sq(adj) = 60.4%

Analysis of Variance

Source DF SS MS F P
Regression 5 50516 10103 401.93 0.000
Residual Error 1309 32904 25
Total 1314 83421

Source DF Seq SS
Income (INR) 1 32321
Age 1 550
Number of Cars Owned 1 10566
Accessibility Index 1 7038
Distance to Metro Station (km) 1 41

MODEL – 5 Regression Analysis: Trips versus Income (INR), Age, Number of Cars Owned,
Accessibility Index, Distance to Metro Station (km)
Results for: Worksheet 4 (Validation Dataset)
The regression equation is
Trips = 12.6 + 0.000011 Income (INR) - 0.0591 Age + 4.69 Number of Cars Owned
+ 11.7 Accessibility Index + 0.0026 Distance to Metro Station (km)

Predictor Coef SE Coef T P


Constant 12.650 1.860 6.80 0.000
Income (INR) 0.00001112 0.00000205 5.42 0.000
Age -0.05910 0.03933 -1.50 0.135
Number of Cars Owned 4.6892 0.8127 5.77 0.000
Accessibility Index 11.741 2.013 5.83 0.000
Distance to Metro Station (km) 0.00260 0.03833 0.07 0.946

S = 5.00253 R-Sq = 57.5% R-Sq(adj) = 56.3%

Analysis of Variance

Source DF SS MS F P
Regression 5 5888.5 1177.7 47.06 0.000
Residual Error 174 4354.4 25.0
Total 179 10242.9

Source DF Seq SS
Income (INR) 1 3573.0
Age 1 25.4
Number of Cars Owned 1 1437.5
Accessibility Index 1 852.5
Distance to Metro Station (km) 1 0.1

COMPARISON TABLE:

Sr.  Regression Model                        Training     Validation   Error
No.                                          Dataset R²   Dataset R²   Percentage
1.   Trips vs Income                         38.7%        34.9%         3.8%
2.   Trips vs Income, Age, Number of Cars    52.1%        49.2%         2.9%
     Owned
3.   Trips vs Income, Age, Number of Cars    60.5%        57.5%         3.0%
     Owned, Accessibility Index
4.   Trips vs Income, Age, Number of Cars    67.5%        70.3%        -2.8%
     Owned, Accessibility Index, Distance
     to City Centroid
5.   Trips vs Income, Age, Number of Cars    60.6%        57.5%         3.1%
     Owned, Accessibility Index, Distance
     to Metro Station

Error Percentage = Training R² - Validation R², in percentage points.

The data provide an evaluation of five regression models for predicting trip production using
both training and validation datasets. The models vary in complexity, incorporating different
combinations of predictors: Income, Age, Number of Cars Owned, Accessibility Index, Distance to
City Centroid, and Distance to Metro Station. The training dataset R² values range from 38.7% to
67.5%, while the validation dataset R² values range from 34.9% to 70.3%. Error percentages,
calculated as the difference between training and validation R² (in percentage points), indicate
how well each model's fit carries over to new data. Note that each validation R² comes from
refitting the model on the validation data, so it measures fit on that sample rather than the
prediction error of the training equation.
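The error percentages in the comparison table are simple differences between the two R² columns. A short Python sketch of that calculation, using the values from the table:

```python
# R² values (%) from the comparison table: (training, validation)
r2_pairs = {
    "Model 1": (38.7, 34.9),
    "Model 2": (52.1, 49.2),
    "Model 3": (60.5, 57.5),
    "Model 4": (67.5, 70.3),
    "Model 5": (60.6, 57.5),
}

# "Error percentage" as defined in this assignment: training R² minus
# validation R², in percentage points (negative = validation fit is better).
error_pp = {m: round(tr - va, 1) for m, (tr, va) in r2_pairs.items()}
for model, err in error_pp.items():
    print(f"{model}: {err:+.1f} pp")
```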
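• Best Model:
Model 4 (Trips vs. Income, Age, Number of Cars Owned, Accessibility Index, Distance to City
Centroid) stands out with the highest validation R² of 70.3%, exceeding its training R² (67.5%)
by 2.8 percentage points. This negative error percentage suggests that the model generalizes
well and benefits from the additional variable, "Distance to City Centroid", although the
coefficient on that variable is about twice as large in the validation fit (-0.340) as in the
training fit (-0.159).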
• Worst Model:
Model 1 (Trips vs. Income) exhibits the lowest R² values for both training (38.7%) and
validation (34.9%). The error percentage of 3.8% reflects poorer generalization compared
to other models, likely due to oversimplification and insufficient explanatory variables.
• Other Models:
Models 2, 3, and 5 show moderate performance, with validation R² values ranging from
49.2% to 57.5%. The error percentages for these models (2.9%, 3.0%, and 3.1%, respectively)
indicate reasonable but not exceptional generalization.

Model 4 is the best-performing model, with the highest predictive power and good generalizability. Model 1 is the weakest because it uses only Income as a predictor. This analysis shows the value of including relevant explanatory variables, such as Accessibility Index and Distance to City Centroid, to improve model performance.

07.PROBLEM PART – 5 AND ITS SOLUTIONS

5. How do you justify that the regression model results are statistically valid? Demonstrate the
validity of various assumptions of regression analysis and discuss them briefly. (For any one
model)

I have considered the Trips vs. Income model.


(1) Linearity - The scatter plot of Trips against Income shows a straight-line trend, so a linear model is an appropriate form for this relationship. Linearity ensures that the model accurately represents the relationship between the variables. It is a precondition for the regression results to be statistically valid.
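Each diagnostic in this section starts from a fitted line and its residuals. The sketch below shows a simple least-squares fit in Python, written out by hand for clarity. The function names and the five data points are hypothetical and are not taken from the assignment dataset.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def fitted_and_residuals(x, y):
    """Fitted values and residuals (observed minus fitted) for the OLS line."""
    b0, b1 = fit_simple_ols(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


# Hypothetical illustration data, not taken from the assignment dataset.
income = [100000, 200000, 300000, 400000, 500000]
trips = [17, 18, 22, 23, 25]

b0, b1 = fit_simple_ols(income, trips)
fitted, residuals = fitted_and_residuals(income, trips)
print(f"Trips = {b0:.2f} + {b1:.7f} Income")
```

A residual plot with no curved pattern is consistent with the linearity assumption.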

(2) Homoscedasticity
The plot of residuals versus fitted values shows points randomly scattered around the horizontal line (y = 0). This suggests that the variance of the residuals is constant across all levels of fitted values, which indicates homoscedasticity.
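The homoscedasticity judgement above is visual. A rough numerical companion is to compare the residual variance in the upper and lower halves of the fitted values. The sketch below is a simplified screening heuristic, related to but not the same as the Goldfeld-Quandt test. The function name and the residual values are hypothetical.

```python
from statistics import variance


def residual_spread_ratio(fitted, residuals):
    """Variance of residuals in the upper half of fitted values divided by
    the variance in the lower half. A ratio near 1 is consistent with
    constant variance; a large ratio suggests a funnel shape.
    Rough screening heuristic only, not a formal test."""
    pairs = sorted(zip(fitted, residuals))      # order by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]        # middle point dropped if n is odd
    return variance(high) / variance(low)


# Hypothetical residuals: constant spread versus a widening (funnel) spread.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
even_spread = [1, -1, 1, -1, 1, -1, 1, -1]
funnel = [0.5, -0.5, 0.5, -0.5, 2, -2, 2, -2]

print(residual_spread_ratio(fitted, even_spread))
print(residual_spread_ratio(fitted, funnel))
```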

(3) Multicollinearity
Regression Analysis: Trips versus Income (INR)

The regression equation is
Trips = 14.9 + 0.000021 Income (INR)

Predictor Coef SE Coef T P VIF
Constant 14.8997 0.1883 79.11 0.000
Income (INR) 0.00002096 0.00000073 28.82 0.000 1.000

S = 6.23848 R-Sq = 38.7% R-Sq(adj) = 38.7%

PRESS = 51257.7 R-Sq(pred) = 38.56%

Analysis of Variance

Source DF SS MS F P
Regression 1 32321 32321 830.47 0.000
Residual Error 1313 51100 39
Total 1314 83421
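The summary statistics in the output above can be derived from the ANOVA table and the PRESS value. The Python sketch below reproduces R-Sq, R-Sq(adj), S, F and R-Sq(pred) from those printed numbers. Small differences in the last digit arise because Minitab works from unrounded sums of squares.

```python
from math import sqrt

# Values read off the Minitab ANOVA table and PRESS line above.
ss_regression = 32321
ss_residual = 51100
ss_total = 83421
df_regression, df_residual = 1, 1313
press = 51257.7

ms_residual = ss_residual / df_residual                 # mean square error
ms_total = ss_total / (df_regression + df_residual)     # total SS / total DF
r_sq = ss_regression / ss_total                         # R-Sq
r_sq_adj = 1 - ms_residual / ms_total                   # R-Sq(adj)
s = sqrt(ms_residual)                                   # S, standard error of the regression
f_stat = (ss_regression / df_regression) / ms_residual  # F statistic
r_sq_pred = 1 - press / ss_total                        # R-Sq(pred)

print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, S = {s:.3f}")
print(f"F = {f_stat:.1f}, R-Sq(pred) = {r_sq_pred:.2%}")
```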

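A VIF of 1 indicates no correlation between an independent variable and the other independent variables. The regression output shows VIF = 1.000 for Income (the independent variable), so there is no collinearity. Because this model has only one independent variable, VIF is 1 by definition and multicollinearity cannot arise. This check becomes informative for the models with several predictors.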
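The VIF for predictor j is 1 / (1 - R²ⱼ), where R²ⱼ comes from regressing predictor j on all the other predictors. A short Python sketch of that formula follows. The 0.8 value is a hypothetical example, not a result from the assignment.

```python
def vif(r_sq_j: float) -> float:
    """Variance inflation factor for predictor j, where r_sq_j is the R² from
    regressing predictor j on all the other predictors."""
    if not 0 <= r_sq_j < 1:
        raise ValueError("r_sq_j must be in [0, 1)")
    return 1.0 / (1.0 - r_sq_j)


# Trips vs Income has no other predictors, so the auxiliary R² is 0.
print(vif(0.0))   # 1.0
# Hypothetical: a predictor whose variation is 80% explained by the others.
print(vif(0.8))
```

Values well above 1 (commonly used thresholds are 5 or 10) would signal collinearity in the multi-variable models.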

(4) Normality of Residuals
A. Histogram - The histogram of residuals is bell-shaped, which suggests that the residuals are normally distributed.

B. Q-Q Plot

The points closely follow the diagonal line, which also suggests that the residuals are normally distributed.
Together with the linearity, homoscedasticity, and multicollinearity checks above, this supports the conclusion that the regression results are reliable and statistically valid.
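A normal Q-Q plot pairs each sorted, standardized residual with the matching quantile of a standard normal distribution. The Python sketch below computes those pairs with the plotting positions (i - 0.5) / n. This choice is an assumption; Minitab's own plotting positions may differ slightly. The residual values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """(theoretical normal quantile, standardized residual) pairs for a
    normal Q-Q plot, using plotting positions p_i = (i - 0.5) / n."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    ordered = sorted((r - m) / s for r in residuals)
    standard_normal = NormalDist()              # mean 0, standard deviation 1
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals for illustration.
example = [-2.1, -0.8, -0.3, 0.1, 0.4, 0.9, 1.8]
for q, z in qq_points(example):
    print(f"{q:6.2f}  {z:6.2f}")
```

If the residuals are normal, the printed pairs lie close to the line where both columns are equal.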

08.PROBLEM PART – 6 AND ITS SOLUTIONS

6. Develop a cross-classification analysis table to understand the mean variation of trips among
car ownership levels (0,1 and 2) within different job categories (Students, Professionals, Self-
Employed, Technician, Unemployed, Housemaids, Retired, Entrepreneurs). Try to show the
variation graphically in Minitab.

The following steps are used to develop the cross-classification analysis table:

• Sort the data by Car Ownership Levels and Job Categories for easier segmentation:
  o Go to Data > Sort.
  o Select Car Ownership Levels as the primary column and Job Categories as the secondary column.
• Use Descriptive Statistics to calculate the mean trips for each combination:
  o Go to Stat > Tables > Cross Tabulation and Chi-Square.
  o In the dialog box:
    - Select Car Ownership Levels as the first categorical variable.
    - Select Job Categories as the second categorical variable.
    - Choose the Trips column as the "Summary" variable to calculate the mean.
  o Configure Display Descriptive Statistics and check Mean.
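The core of a cross-classification table is a mean of Trips for every combination of two categorical variables. The Python sketch below shows that computation. The record layout, the column names ("cars", "job", "trips") and the values are hypothetical. The same function works if the second variable is education level instead of job category.

```python
from collections import defaultdict


def cross_tab_means(rows, row_key, col_key, value_key):
    """Mean of value_key for every (row_key, col_key) combination."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        cell = (r[row_key], r[col_key])
        totals[cell] += r[value_key]
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


# Hypothetical records; column names are assumptions for illustration.
records = [
    {"cars": 0, "job": "Student", "trips": 10},
    {"cars": 0, "job": "Student", "trips": 14},
    {"cars": 1, "job": "Professional", "trips": 22},
    {"cars": 2, "job": "Professional", "trips": 28},
    {"cars": 1, "job": "Professional", "trips": 20},
]

means = cross_tab_means(records, "cars", "job", "trips")
for (cars, job), m in sorted(means.items()):
    print(f"cars={cars:<2} {job:<13} mean trips = {m:.2f}")
```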

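The cross-classification counts below are tabulated by Number of Cars Owned and Education Level. Each count equals the N reported for that group in the descriptive statistics that follow.

Number of Cars Owned Education Level Count of Respondents (All Rows)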
0 basic.4y 102
0 basic.6y 80
0 basic.9y 114
0 high.school 160
0 professional.course 90
0 university.degree 149
0 unknown 39
1 basic.4y 74
1 basic.6y 48
1 basic.9y 84
1 high.school 111
1 professional.course 52
1 university.degree 115
1 unknown 41
2 basic.4y 11
2 basic.6y 4
2 basic.9y 5
2 high.school 16
2 professional.course 8
2 university.degree 6
2 unknown 6

Descriptive Statistics: Trips

Results for Number of Cars Owned = 0

Variable Education N N* Mean SE Mean TrMean StDev
Trips basic.4y 102 0 14.186 0.749 13.837 7.562
basic.6y 80 0 11.725 0.767 11.278 6.864
basic.9y 114 0 13.386 0.646 13.088 6.902
high.school 160 0 13.088 0.591 12.688 7.475
professional.course 90 0 13.811 0.769 13.425 7.298
university.degree 149 0 13.523 0.674 13.185 8.223
unknown 39 0 14.92 1.26 14.69 7.90

Variable Education Variance CoefVar Sum Sum of Squares Minimum
Trips basic.4y 57.183 53.30 1447.000 26303.000 3.000
basic.6y 47.113 58.54 938.000 14720.000 2.000
basic.9y 47.637 51.56 1526.000 25810.000 3.000
high.school 55.879 57.12 2094.000 36290.000 2.000
professional.course 53.256 52.84 1243.000 21907.000 4.000
university.degree 67.616 60.80 2015.000 37257.000 1.000
unknown 62.34 52.91 582.00 11054.00 4.00

Variable Education Q1 Median Q3 Maximum Range IQR
Trips basic.4y 8.000 13.000 19.000 39.000 36.000 11.000
basic.6y 6.250 10.000 16.000 29.000 27.000 9.750
basic.9y 8.000 11.000 19.000 30.000 27.000 11.000
high.school 7.000 11.000 18.000 43.000 41.000 11.000
professional.course 8.000 11.000 20.000 31.000 27.000 12.000
university.degree 7.000 11.000 19.000 36.000 35.000 12.000
unknown 9.00 13.00 23.00 30.00 26.00 14.00

Variable Education Mode N for Mode Skewness Kurtosis
Trips basic.4y 9 11 0.72 0.07
basic.6y 7, 8 8 0.93 -0.08
basic.9y 11 10 0.62 -0.61
high.school 7 16 0.86 0.52
professional.course 8, 10 8 0.65 -0.74
university.degree 7, 9 13 0.68 -0.65
unknown 11, 14 5 0.64 -0.90

The descriptive statistics for trips with Number of Cars Owned = 0 show distinct patterns across education levels. The key insights are summarized below:
➢ Mean and Median Trends:
• The mean number of trips is highest for individuals with unknown education level (14.92) and lowest for those with basic.6y education (11.725).
• In every category the median is lower than the mean, which indicates slight positive skewness.
➢ Spread and Variability:
• Standard deviations range from 6.86 (basic.6y) to 8.22 (university.degree), suggesting greater variation in trips for individuals with university degrees.
• The coefficient of variation (CV) supports this finding. university.degree shows the highest relative variability (60.80%), and basic.9y shows the lowest (51.56%).
➢ Distribution and Skewness:
• Skewness values are positive but modest, ranging from 0.62 to 0.93, which indicates right-tailed distributions in all categories.
• Kurtosis values range from -0.90 to 0.52. The distributions are therefore close to normal or somewhat flatter, with no extreme peaks.
➢ Range and Interquartile Range (IQR):
• The maximum number of trips is observed in the high.school category (43 trips), while the minimum (1 trip) occurs for individuals with a university.degree.
• The IQR is broadest for unknown education (14), reflecting a wide spread in the middle 50% of the data.
➢ Mode and Common Values:
• The modes are the most frequently occurring trip counts, for example 9 trips for basic.4y and 7 and 9 trips for university.degree.

The analysis shows some variation in travel behaviour across educational backgrounds. Individuals with basic.6y education make the fewest trips on average, possibly because of limited accessibility or lower engagement in activities that require travel. Those with unknown (14.92) or basic.4y (14.19) education make the most trips, possibly reflecting diverse socioeconomic factors or occupational demands. However, the gap between the highest and lowest group means (about 3.2 trips) is small compared with the within-group standard deviations of roughly 7 to 8 trips. The relationship between education and trip frequency in this group is therefore weak, although the pattern is still useful for tailoring transportation strategies to educational demographics.
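The CV comparison can be checked directly from the Minitab output, because CV = 100 × StDev / Mean. The Python sketch below recomputes it from the displayed means and standard deviations. Second-decimal differences from Minitab's CoefVar column come from rounding of the displayed values.

```python
# Mean and standard deviation of trips for Number of Cars Owned = 0,
# copied from the Minitab output above.
stats_zero_cars = {
    "basic.4y": (14.186, 7.562),
    "basic.6y": (11.725, 6.864),
    "basic.9y": (13.386, 6.902),
    "high.school": (13.088, 7.475),
    "professional.course": (13.811, 7.298),
    "university.degree": (13.523, 8.223),
    "unknown": (14.92, 7.90),
}

# Coefficient of variation in percent: CV = 100 * StDev / Mean.
cv = {group: 100 * sd / m for group, (m, sd) in stats_zero_cars.items()}

lowest = min(cv, key=cv.get)
highest = max(cv, key=cv.get)
print(f"lowest CV: {lowest} ({cv[lowest]:.2f}%)")
print(f"highest CV: {highest} ({cv[highest]:.2f}%)")
```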
Results for Number of Cars Owned = 1

Variable Education N N* Mean SE Mean TrMean StDev
Trips basic.4y 74 0 20.405 0.668 20.409 5.746
basic.6y 48 0 21.563 0.797 21.636 5.519
basic.9y 84 0 20.798 0.651 20.882 5.965
high.school 111 0 21.568 0.598 21.434 6.305
professional.course 52 0 21.000 0.822 21.109 5.928
university.degree 115 0 21.574 0.489 21.699 5.242
unknown 41 0 20.537 0.728 20.595 4.659

Variable Education Variance CoefVar Sum Sum of Squares Minimum
Trips basic.4y 33.011 28.16 1510.000 33222.000 10.000
basic.6y 30.464 25.60 1035.000 23749.000 10.000
basic.9y 35.585 28.68 1747.000 39287.000 9.000
high.school 39.757 29.24 2394.000 56006.000 10.000
professional.course 35.137 28.23 1092.000 24724.000 11.000
university.degree 27.475 24.30 2481.000 56657.000 9.000
unknown 21.705 22.69 842.000 18160.000 10.000

Variable Education Q1 Median Q3 Maximum Range IQR
Trips basic.4y 15.750 20.000 25.000 31.000 21.000 9.250
basic.6y 18.250 21.500 25.750 31.000 21.000 7.500
basic.9y 15.250 21.000 26.000 31.000 22.000 10.750
high.school 17.000 22.000 26.000 40.000 30.000 9.000
professional.course 17.000 20.500 27.000 30.000 19.000 10.000
university.degree 18.000 22.000 26.000 31.000 22.000 8.000
unknown 17.500 20.000 24.500 29.000 19.000 7.000

Variable Education Mode N for Mode Skewness Kurtosis
Trips basic.4y 24 7 -0.03 -1.16
basic.6y 19, 25 5 -0.20 -0.70
basic.9y 26 8 -0.23 -1.15
high.school 17, 22 8 0.22 -0.17
professional.course 28 7 -0.13 -1.19
university.degree 19, 25 11 -0.27 -0.70
unknown 18 6 -0.08 -0.48

The results for trips taken by individuals who own 1 car show the following patterns across education levels:
➢ Central Tendency (Mean and Median):
o Mean trips range from 20.405 (basic.4y) to 21.574 (university.degree).
o Medians are close to their respective means, indicating a roughly symmetric distribution for most education levels.
o University degree, high school, and basic.6y show slightly higher mean trips (about 21.5).
➢ Variability:
o The standard deviation is lowest for unknown education (4.659) and highest for high school (6.305). Trip counts are therefore most consistent in the unknown group and most variable in the high school group.
o The coefficient of variation (CV) follows the same pattern. It is lowest for unknown education (22.69%), followed by university degree (24.30%), and highest for high school (29.24%), which indicates the most variability relative to the mean.
➢ Distribution:
o The skewness values are close to zero across all education levels, suggesting near-symmetric distributions of trips. High school has a slight positive skew (0.22), indicating a small number of higher-than-average trip counts.
o Kurtosis is negative for all categories. The distributions are flatter than a normal distribution, with fewer extreme values.
➢ Range and IQR:
o The range of trips is widest for high school (30 trips) and narrowest for unknown and professional.course (19 trips each).
o The interquartile range (IQR) varies from 7 (unknown) to 10.75 (basic.9y), showing differences in the spread of the middle 50% of the data.
➢ Modal Analysis:
o The modes are the most frequent trip counts. Some categories, such as basic.6y and university degree, have two tied modal values (19 and 25 trips).
➢ Summary:
o Individuals with a university degree take slightly more trips on average and behave consistently, with a low CV and a narrow IQR of 8. High school graduates have a similar mean but the most variable trip counts.
o Those with lower education (e.g., basic.4y) take slightly fewer trips on average and vary more than university graduates. This may reflect differing travel needs or patterns.
o The unknown education group shows the least variability and the narrowest spread of trips, which may indicate homogeneous travel patterns in this group.

This analysis can be used to understand travel behaviour in relation to education and to support transportation planning or targeted interventions for different groups. For instance, initiatives to optimize travel for lower education groups might focus on addressing variability in trip demands.

Results for Number of Cars Owned = 2

Variable Education N N* Mean SE Mean TrMean StDev Variance
Trips basic.4y 11 0 28.455 0.474 28.444 1.572 2.473
basic.6y 4 0 26.50 1.44 * 2.89 8.33
basic.9y 5 0 27.600 0.872 * 1.949 3.800
high.school 16 0 27.188 0.356 27.214 1.424 2.029
professional.course 8 0 28.000 0.964 * 2.726 7.429
university.degree 6 0 28.333 0.333 * 0.816 0.667
unknown 6 0 27.500 0.922 * 2.258 5.100

Variable Education CoefVar Sum Sum of Squares Minimum Q1
Trips basic.4y 5.53 313.000 8931.000 26.000 27.000
basic.6y 10.89 106.00 2834.00 23.00 23.75
basic.9y 7.06 138.000 3824.000 25.000 26.000
high.school 5.24 435.000 11857.000 25.000 26.000
professional.course 9.73 224.000 6324.000 23.000 26.250
university.degree 2.88 170.000 4820.000 27.000 27.750
unknown 8.21 165.000 4563.000 25.000 25.750

Variable Education Median Q3 Maximum Range IQR Mode
Trips basic.4y 29.000 30.000 31.000 5.000 3.000 27, 29
basic.6y 26.50 29.25 30.00 7.00 5.50 *
basic.9y 27.000 29.500 30.000 5.000 3.500 27
high.school 27.000 28.750 29.000 4.000 2.750 26, 29
professional.course 28.500 29.750 32.000 9.000 3.500 29
university.degree 28.500 29.000 29.000 2.000 1.250 29
unknown 27.000 29.500 31.000 6.000 3.750 26

Variable Education N for Mode Skewness Kurtosis
Trips basic.4y 3 0.01 -1.07
basic.6y 0 0.00 0.91
basic.9y 2 -0.08 -0.82
high.school 4 -0.06 -1.33
professional.course 2 -0.56 0.79
university.degree 3 -0.86 -0.30
unknown 2 0.63 -0.75
For respondents who own 2 cars, the average number of trips varies slightly across educational levels. The lowest mean is for individuals with basic.6y education (26.50 trips). The highest is for basic.4y (28.455 trips), followed closely by university degree holders (28.333 trips). The standard deviations are generally low, indicating minimal variation within groups. The smallest is for university graduates (0.816) and the largest for basic.6y (2.89), followed by professional course participants (2.726). These groups are small (N = 4 to 16), so their statistics are less precise than those for 0 and 1 car.
1. Trip consistency: University graduates show the lowest variability (Coefficient of Variation = 2.88%), suggesting more predictable travel patterns.
2. Range and spread: The largest range of trips is observed among professional course participants (9 trips), while the smallest range (2 trips) is for university graduates.
3. Median comparison: The median trips are comparable across education levels, ranging from 26.5 (basic.6y) to 29 (basic.4y), with most groups at 27 to 28.5 trips.
4. Distribution shape: The skewness values are close to zero for basic.4y, basic.6y, basic.9y, and high.school, implying near-symmetric distributions. The university.degree (-0.86) and professional.course (-0.56) groups are left-skewed, while the unknown group (0.63) is right-skewed.
In summary, individuals with a university degree have the most consistent trip counts and a mean close to the highest. The stability of their travel behaviour may reflect structured routines or access to better commuting options. The variability in other education levels might reflect diverse lifestyles or mobility constraints.
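The precision caveat for the 2-car groups follows from the standard error of the mean, SE Mean = StDev / √N, which Minitab reports in its SE Mean column. The short Python sketch below reproduces that column for three groups, using values copied from the output above. It shows why the N = 4 basic.6y mean is the least certain.

```python
from math import sqrt


def se_mean(sd: float, n: int) -> float:
    """Standard error of the mean: StDev / sqrt(N)."""
    return sd / sqrt(n)


# Number of Cars Owned = 2: (StDev, N) copied from the Minitab output above.
groups = {
    "basic.6y": (2.89, 4),
    "high.school": (1.424, 16),
    "university.degree": (0.816, 6),
}

for name, (sd, n) in groups.items():
    print(f"{name:<18} N={n:<3} SE Mean = {se_mean(sd, n):.3f}")
```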

09.PROBLEM PART – 7 AND ITS SOLUTIONS

7. Do you think that having a metro card influences the number of trips made by a person?
Justify your answer using statistical visualizations or a model.

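The boxplot clearly shows that people with a metro card make more trips than people without one. Therefore, we can infer that having a metro card influences the number of trips a person makes. Strictly, the boxplot shows an association. It does not rule out the possibility that people who already travel more are simply more likely to own a metro card.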
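A two-sample test can back up the boxplot with a model-based comparison of mean trips. The Python sketch below computes Welch's t statistic, which does not assume equal variances, and its Welch-Satterthwaite degrees of freedom. The trip counts are hypothetical placeholders for the Trips column split by metro-card ownership. The standard library has no t-distribution CDF, so a p-value would come from Minitab's 2-Sample t or from `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for
    the difference in means of two independent samples (unequal variances)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical trip counts; replace with the Trips column split by metro card.
with_card = [22, 25, 19, 28, 24, 26]
without_card = [15, 18, 12, 20, 16, 14]

t, df = welch_t(with_card, without_card)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large positive t supports the conclusion that card holders make more trips on average.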

10.PROBLEM PART – 8 AND ITS SOLUTIONS

8. Do you think that the education level (4th standard, 6th standard, 9th standard, high school,
professional course, university degree) of a person has a positive influence on his/her trip
production? Justify your answer using statistical visualizations or a model.

The boxplot above shows that education level has little influence on the number of trips. Education therefore does not appear to have a positive influence on trip production.
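A one-way ANOVA is a model-based way to test whether mean trips differ across education levels. Its F statistic is the between-group mean square divided by the within-group mean square. An F near 1, with a large p-value from the F(k - 1, n - k) distribution, would support the "little influence" reading of the boxplot. The Python sketch below computes F. The three groups of trip counts are hypothetical. In practice, Minitab's One-Way ANOVA or `scipy.stats.f_oneway` would also report the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square divided by
    within-group mean square. `groups` is a list of samples (lists of trips)."""
    all_values = [v for g in groups for v in g]
    k, n = len(groups), len(all_values)
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    ss_within = sum((v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical trip counts for three education levels.
basic_4y = [10, 12, 14]
high_school = [11, 13, 15]
university = [10, 13, 16]

f_value = one_way_anova_f([basic_4y, high_school, university])
print(f"F = {f_value:.3f}")
```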
