Math EE IB
Mathematics
Table of Contents
Introduction
Methodology
Conclusion
Works Cited
Introduction
disasters, or mistakes that occur later in life. There are many different types of
insurance, ranging from life insurance (a payout to your family in case of a death) to
fire insurance (a payout in the event of a destructive fire). Usually, a person will pay a
monthly fee, known as a premium, to retain coverage from an insurance provider, on the
understanding that the provider will help pay for future repairs or damage as needed. The main type I
will be focusing on in this essay is automobile insurance. Auto insurance covers
costs incurred while driving. When deciding on a premium, insurance
companies will often hire actuaries to determine the risk of an individual applying for
insurance. Many different factors can affect the cost of insurance, such as location,
annual mileage, credit record, and type of vehicle. The higher the risk of a person to
factors is extremely important to determining the best price for insurance applicants. If a
risk individual more money per month. If a person has a very good driving record, it is
important for an insurance company to charge a low premium so the customer is likely
In the process of creating a model for age and insurance pricing, it is important to
determine the significance of the discovered relationship between age and insurance
pricing. To do this, the slope of the least squares regression line (LSRL) will be tested in
and the probability p is less than the given significance level (\(\alpha\)), then \(H_0\) can be
result can be generalized to a population. If the probability of a result is less than the
significance level, it can be concluded that the sampling error is not responsible for the
results and they can be generalized to the whole population and therefore more useful
for predictions. If the probability is higher than the significance level, then there was not
1.2 Errors
There are two types of errors: type I error and type II error. A type I error occurs
when the null hypothesis is true, but it is rejected. This denotes that a false positive has
occurred: we assert a new claim that is not true, when \(H_0\) actually holds. To
minimize type I error, an investigation would use a low significance level to limit the
room for detecting a false positive. The other type of error, type II, occurs when \(H_0\) is
false but is not rejected. This would imply that a false negative has occurred, or that a
real relationship exists but the results were not statistically significant, so \(H_0\) was not rejected. To minimize a type II
error, a higher significance level should be used as it will allow for more statistically
Since type I and type II error require the opposite conditions to be minimized, the
significance level needs to be chosen in the context of which error is worse should it
occur. Choosing a higher significance level would imply that the type II error would be
worse should it occur, while a low significance level implies that a type I error would be
worse if it occurred.
In the context of a company providing car insurance, a type I error would mean
there is no relationship between age and annual price of insurance, but it is concluded
that there is a relationship. A type II error would imply that there is a relationship
between age and the price of insurance, but it is concluded that there is no relationship.
Since a company is the most likely to be doing an analysis of age and annual price, the
point of view of the company providing insurance will be used. For a theoretical
company, a type I error is worse because it means that the company would be giving
estimates that are largely inaccurate and based on the results that were due to random
chance, hurting their reputation. To minimize this, a lower significance level of 0.05 will be
used.
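The link between the significance level and the type I error rate can be illustrated with a short simulation. The sketch below is illustrative only: it assumes that when \(H_0\) is true the p-value of a test is uniformly distributed between 0 and 1 (a standard property of continuous test statistics), and the function name and trial count are hypothetical choices, not part of the analysis in this essay.

```python
import random


def type_one_error_rate(alpha, trials=100_000, seed=0):
    """Estimate how often a true null hypothesis is rejected at level alpha.

    Assumption: under a true H0 the p-value is uniform on [0, 1], so each
    simulated p-value is drawn with rng.random().
    """
    rng = random.Random(seed)
    rejections = sum(1 for _ in range(trials) if rng.random() < alpha)
    return rejections / trials


if __name__ == "__main__":
    for alpha in (0.01, 0.05, 0.10):
        rate = type_one_error_rate(alpha)
        print(f"alpha = {alpha:.2f}: simulated type I error rate = {rate:.4f}")
```

The simulated rejection rate tracks the chosen significance level, so a lower level rejects a true null hypothesis less often.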
relationship. If the relationship is significant, this means that the equation found is useful
between the independent variable x and the dependent variable y must be linear and
1.4 Equations
These three equations can be used to look for relationships within data:
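\[ \hat{y} = b_1 + b_0 x \]
\[ b_0 = \frac{r S_y}{S_x} \]
\[ b_1 = \bar{y} - b_0 \bar{x} \]
\(\hat{y}\) is the predicted insurance annual cost
\(b_1\) is the y-intercept
\(b_0\) is the slope
\[ S_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \]
\[ S_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]
\[ t = \frac{b - \beta}{SE_b} \]
\[ SE_b = \frac{s}{\sqrt{n}} \left( \frac{1}{S_x} \right) \]
\[ b \pm t^* \left( \frac{s}{\sqrt{n}} \right) \]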
All above equations from The Practice of Statistics. Equation of standard error from
Methodology
I will create multiple LSRLs to determine the relationship between age and
old (35-85).
test to determine the significance of the slope. I will interpret the meaning of the slopes
for each age group and what use they could have.
In the 16-25 year old age group, there is a strong, negative, linear relationship
between age and average insurance price. The LSRL between age and annual
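\[ S_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \]
\[ S_y = \sqrt{\frac{\sum_{i=1}^{10} (y_i - 3567.620)^2}{10-1}} \]
\[ S_y = \sqrt{\frac{30{,}503{,}371.68}{9}} \]
\[ S_y = 1840.995 \]
\[ S_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]
\[ S_x = \sqrt{\frac{\sum_{i=1}^{10} (x_i - 20.5)^2}{9}} \]
\[ S_x = \sqrt{\frac{82.5}{9}} \]
\[ S_x = 3.028 \]
\[ b_0 = \frac{r S_y}{S_x} \]
\[ b_0 = \frac{(-0.938)(1840.995)}{3.028} \]
\[ b_0 = -570.295 \]
\[ b_1 = \bar{y} - b_0 \bar{x} \]
\[ b_1 = 3567.620 + (570.295)(20.5) \]
\[ b_1 = 15258.668 \]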
Which leads to an LSRL of \(\hat{y} = 15258.668 - 570.295x\), as shown below with
\(r = -0.938\)
\(r^2 = 0.8798\)
year of increased age, the predicted average cost of insurance decreases by about
$570.30.
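As a check on the hand calculation, the slope and intercept can be recomputed from the summary statistics quoted above. This is a minimal sketch; the function name is hypothetical, and the naming follows this essay's convention that b0 is the slope and b1 is the y-intercept.

```python
def lsrl_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (slope b0, intercept b1) of the LSRL from summary statistics."""
    b0 = r * s_y / s_x          # slope: b0 = r * S_y / S_x
    b1 = y_bar - b0 * x_bar     # intercept: b1 = y_bar - b0 * x_bar
    return b0, b1


if __name__ == "__main__":
    # Values for the 16-25 age group as reported in the essay.
    b0, b1 = lsrl_from_summary(r=-0.938, s_x=3.028, s_y=1840.995,
                               x_bar=20.5, y_bar=3567.620)
    print(f"slope b0 = {b0:.3f}, intercept b1 = {b1:.3f}")
```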
affecting the price of insurance that are not related to age, such as gender and type of
vehicle. As there are not as many factors to consider for younger individuals, age
acts as an important factor explaining car insurance pricing, reflected in the high \(r^2\)
value. Although the line is not necessarily accurate, a test will still be done, with the
The null hypothesis is there is no relationship between age and insurance pricing
When calculating the probability of the slope being significant the standard error
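\[ SE_b = \frac{s}{\sqrt{n}} \left( \frac{1}{S_x} \right) \]
\[ SE_b = \frac{674.591}{\sqrt{10}} \left( \frac{1}{3.028} \right) \]
\[ SE_b = 70.451 \]
\[ t = \frac{b - \beta}{SE_b} \]
\[ t = \frac{-570.295 - 0}{70.451} \]
\[ t = -8.095 \]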
p-value of \(2.005 \times 10^{-5}\), or approximately 0, is found. Since the p-value is less than 0.05,
we can reject \(H_0\) and say that there is a relationship between age and the annual price
of insurance.
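The standard error and test statistic for this group can be reproduced in a few lines. The sketch below follows the standard error formula exactly as written in this essay, (s/√n)(1/S_x), which is taken from the text rather than offered as a general-purpose formula; the function names are hypothetical.

```python
import math


def slope_standard_error(s, n, s_x):
    """Standard error of the slope using the essay's formula (s / sqrt(n)) * (1 / S_x)."""
    return (s / math.sqrt(n)) * (1 / s_x)


def slope_t_statistic(b, se_b, beta=0.0):
    """Test statistic t = (b - beta) / SE_b, where beta is the slope under H0."""
    return (b - beta) / se_b


if __name__ == "__main__":
    # Values for the 16-25 age group as reported in the essay.
    se_b = slope_standard_error(s=674.591, n=10, s_x=3.028)
    t = slope_t_statistic(b=-570.295, se_b=se_b)
    print(f"SE_b = {se_b:.3f}, t = {t:.3f}")
```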
Using the found LSRL, we can make predictions about the possible price of a
\[ \hat{y} = 15258.668 - 12{,}546.49 \]
\[ \hat{y} = 2712.178 \]
In this example, we can predict that at age 22, a person would have an annual
insurance premium of $2712.18. This is not the same as the average of the collected
prices for 22 year olds ($2323.96) possibly because of external factors affecting the
Since the results are significant, creating a 95% confidence interval can help
show the relationship between age and premium price. Using a critical value of 1.860
\[ b \pm t^* \left( \frac{s}{\sqrt{n}} \right) \]
\[ -570.295 \pm 396.783 \]
\[ (-967.078,\ -173.512) \]
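The interval arithmetic above can be sketched as follows. The margin of error uses the formula as written in this essay, t*(s/√n), with the critical value of 1.860 quoted in the text; the function name is hypothetical.

```python
import math


def slope_interval(b, t_star, s, n):
    """Return (lower, upper) for b +/- t_star * (s / sqrt(n)), the essay's formula."""
    margin = t_star * (s / math.sqrt(n))
    return b - margin, b + margin


if __name__ == "__main__":
    # Values for the 16-25 age group as reported in the essay.
    lower, upper = slope_interval(b=-570.295, t_star=1.860, s=674.591, n=10)
    print(f"({lower:.3f}, {upper:.3f})")
```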
The 95% confidence interval means that we can be 95% confident that the true
slope of the relationship between age and annual insurance premiums is between -967.078 and
-173.512. Since -570.295 is within this interval, it is a possible relationship between age
Another important pattern to note is the change seen in the average and
standard deviation of the annual prices with increasing age. The average annual price
stays above $5000 until the age of 18, but is followed by a large fall to about $3500 at
19, and consistent decreases down to $1770.96 per year. Standard deviation remains
above $1400 until the age of 18, then drops below $1000 at 19, then decreases down to
$454.82. The largest drops in both of these values within this age group occur at the
age where students are leaving high school and becoming adults, which could be a
In the 35-85 age group there remains a strong, negative, linear relationship
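\[ S_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \]
\[ S_y = \sqrt{\frac{\sum_{i=1}^{6} (y_i - 1602.330)^2}{5}} \]
\[ S_y = \sqrt{\frac{263{,}924.498}{5}} \]
\[ S_y = 229.750 \]
\[ S_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]
\[ S_x = \sqrt{\frac{\sum_{i=1}^{6} (x_i - 60)^2}{5}} \]
\[ S_x = \sqrt{\frac{1750}{5}} \]
\[ S_x = 18.708 \]
\[ b_0 = \frac{r S_y}{S_x} \]
\[ b_0 = \frac{(0.611)(229.750)}{18.708} \]
\[ b_0 = 7.504 \]
\[ b_1 = \bar{y} - b_0 \bar{x} \]
\[ b_1 = 1602.330 - (7.504)(60) \]
\[ b_1 = 1152.09 \]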
The resulting LSRL, \(\hat{y} = 1152.09 + 7.504x\), is pictured below. For the purposes of
remaining in the age group, the domain of the ages will be limited to \(25 < x \leq 85\).
\(r = 0.611\)
\(r^2 = 0.373\)
year of increased age, the predicted average cost of insurance increases by about
$7.50.
car insurance. This is more apparent in the older age group because there are more
factors to consider when determining prices, such as the driver’s record. A test and
confidence interval will still be made with the LSRL in spite of these issues.
The null and alternative hypotheses will remain the same as in the young age
group, \(H_0: \beta = 0\) and \(H_a: \beta \neq 0\). The standard error of the sample slope and the test statistic
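\[ SE_b = \frac{s}{\sqrt{n}} \left( \frac{1}{S_x} \right) \]
\[ SE_b = \frac{214.835}{\sqrt{6}} \left( \frac{1}{18.708} \right) \]
\[ SE_b = 4.851 \]
\[ t = \frac{b - \beta}{SE_b} \]
\[ t = \frac{7.504 - 0}{4.851} \]
\[ t = 1.547 \]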
Using this test statistic and a degrees of freedom value of 4, a p-value of 0.197 is obtained.
Since the p-value is greater than the significance level of 0.05, we cannot reject \(H_0\) and
cannot conclude that there is a relationship between age and the annual price of insurance.
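The p-value can be checked without statistical tables by integrating the Student's t density numerically. This is a sketch using only the standard library; the function names and the number of Simpson intervals are illustrative choices, and the result is a two-sided p-value.

```python
import math


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, intervals=2000):
    """Two-sided p-value: 1 minus the area under the density between -|t| and |t|.

    The area is approximated with Simpson's rule (intervals must be even).
    """
    a, b = -abs(t), abs(t)
    h = (b - a) / intervals
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, intervals):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(a + i * h, df)
    return 1 - total * h / 3


if __name__ == "__main__":
    # Test statistic and degrees of freedom for the 35-85 age group.
    print(f"p = {two_sided_p_value(1.547, 4):.3f}")
```

For t = 1.547 with 4 degrees of freedom this gives approximately 0.197, in line with the value reported above.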
One reason that the results from the older age group are not significant is
the greater variety of factors influencing their price. A given individual’s
driving record, financial situation, or credit score might all have an unforeseen impact on
the price of insurance. Because of the high probability that the results are due to
Conclusion
Overall, the equation to predict the price of insurance based on age was more
accurate and useful for individuals between ages 16 and 25 while age did not act as a
good predictor for price on individuals in the older age bracket. This is largely due to the
increase in confounding variables affecting the data of older individuals, such as their
Although the older age model is not the best for making a price prediction, the
model for younger drivers could be used by a car insurance company in order to give a
As type I error is worse for a company providing car insurance, I used a smaller
significance level during my analysis. This was to minimize the likelihood that a car
company would use these results to make predictions when the results might have been
based on random chance. Although a type II error fails to provide a potentially usable
model for a car company, the potential of using an incorrect model poses a much
It is interesting to note that the results with the higher correlation coefficient
yielded a lower p-value. Further investigation into the relationship between correlation
Works Cited
Starnes, Daren S., et al. The Practice of Statistics. 5th ed., 2015.