Bivariate Data Analysis Olympics Project Lef-2
Mrs. Jones
27 August 2021
Year and winning distance in meters (women's shot put):
1948 13.75
1952 15.28
1956 16.59
1960 17.32
1964 18.14
1968 19.61
1972 21.03
1976 21.16
1980 22.41
1984 20.48
1988 22.24
1992 21.06
1996 20.56
2000 20.56
2004 19.59
2008 20.56
2012 20.7
2016 20.63
2020 20.58
Sources:
https://olympics.com/tokyo-2020/olympic-games/en/results/athletics/result-women-s-shot-put-fnl-000100-.htm
https://olympics.com/en/olympic-games/olympic-results
The relationship between the year and the winning distance is moderate, positive, and nonlinear.
Sources: https://istats.shinyapps.io/LinearRegression/
A linear model is not appropriate for the relationship between the year of the Olympic Games and the
winning distance because the distance increases at first and then fluctuates around the same value in
the later years. In the earlier Games, from 1948 through 1980, each winning distance exceeded the one
before, but after peaking at 22.41 meters in 1980, the winning distance stopped increasing and settled
around 20-21 meters. The scatterplot shows a slightly curved, nonlinear shape, and the residual plot
shows an even more clearly curved pattern. Therefore a linear model is not a good fit for these data.
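One way to check the curved residual pattern described above is to refit the line and look at the sign of each residual. The following is only a sketch in plain Python: the data are retyped from the table in this report, and the variable names (years, distances, signs) are my own and not part of the original analysis.

```python
# Data retyped from the table in this report (assumed to be transcribed correctly).
years = list(range(1948, 2021, 4))
distances = [13.75, 15.28, 16.59, 17.32, 18.14, 19.61, 21.03, 21.16, 22.41, 20.48,
             22.24, 21.06, 20.56, 20.56, 19.59, 20.56, 20.70, 20.63, 20.58]

n = len(years)
x_mean = sum(years) / n
y_mean = sum(distances) / n

# Least squares slope and intercept from the sums of deviations.
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, distances))
sxx = sum((x - x_mean) ** 2 for x in years)
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Residual = observed distance - predicted distance.
residuals = [y - (intercept + slope * x) for x, y in zip(years, distances)]

# One character per Games: "+" if the point is above the line, "-" if below.
signs = "".join("+" if res > 0 else "-" for res in residuals)
print(signs)
```

If the residuals come out negative for the earliest Games, positive in the middle, and negative again for the most recent Games, that run of signs is the curved pattern that suggests a straight line is a poor fit.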
The winning distance at the 2020 Olympic Games (held in 2021) was 20.58 meters, slightly less than
the 20.63 meters at the 2016 Games. I don't think the delay of the Games due to COVID-19 affected my
event much, statistically. This year's value follows the trend of the winning distance staying around
20-21 meters. An extra year gave the athletes more time to prepare and train for the event, but
COVID-19 also made it difficult to train in a normal setting.
Ŷ = -121.16 + 0.07x
Slope interpretation:
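For every additional year, the model predicts an increase of 0.07 meters in the winning distance,
or about 0.28 meters between consecutive Games held four years apart.
Y-intercept interpretation:
When the year is 0, the model predicts a winning distance of -121.16 meters. This value has no
practical meaning: a distance cannot be negative, and year 0 lies far outside the 1948 to 2020 range
of the data, long before the Olympic Games existed.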
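The slope and intercept above can be reproduced from the table with a few lines of plain Python. This is a minimal sketch, not the tool used for the original analysis: the helper name least_squares is hypothetical, and the data are retyped from this report.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Data retyped from the table in this report.
years = list(range(1948, 2021, 4))
distances = [13.75, 15.28, 16.59, 17.32, 18.14, 19.61, 21.03, 21.16, 22.41, 20.48,
             22.24, 21.06, 20.56, 20.56, 19.59, 20.56, 20.70, 20.63, 20.58]

intercept, slope = least_squares(years, distances)
print(f"y-hat = {intercept:.2f} + {slope:.4f}x")

# Predicted change between consecutive Games, which are 4 years apart.
print(f"change per Games: {4 * slope:.2f} m")

# Extrapolating to year 0 simply returns the intercept, a negative distance.
print(f"prediction at year 0: {intercept + slope * 0:.2f} m")
```

Printing the slope to four decimals makes it visible that 0.07 is a rounded value, which matters whenever the equation is used with x values in the thousands.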
20.56 = observed y
This residual is positive, meaning the observed value lies above the LSR line. A residual of 2 is a small residual,
r-value:
r = 0.686
This r value indicates that the relationship between the years of the Olympic Games and the winning
women's shot put distances is moderate in strength. A moderate correlation has an r value
r^2 = 0.471
The r-squared value tells you what proportion of the variability in the response variable is
explained by the LSR line. Here, 47.1% of the variability in the winning distances is explained by
the least squares regression line relating the years to the distances.
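To make the "variability explained" reading of r-squared concrete, the sketch below computes r from the sums of deviations and then computes r-squared a second way, as 1 minus the ratio of leftover (residual) variation to total variation. It assumes plain Python with data retyped from this report; the variable names are illustrative only.

```python
import math

# Data retyped from the table in this report.
years = list(range(1948, 2021, 4))
distances = [13.75, 15.28, 16.59, 17.32, 18.14, 19.61, 21.03, 21.16, 22.41, 20.48,
             22.24, 21.06, 20.56, 20.56, 19.59, 20.56, 20.70, 20.63, 20.58]

n = len(years)
x_mean = sum(years) / n
y_mean = sum(distances) / n

sxx = sum((x - x_mean) ** 2 for x in years)
syy = sum((y - y_mean) ** 2 for y in distances)   # total variation in y
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, distances))

# Correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

# r-squared as explained variability: 1 - (residual variation / total variation).
slope = sxy / sxx
intercept = y_mean - slope * x_mean
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, distances))
r_squared = 1 - sse / syy

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

The two routes to r-squared (squaring r, or comparing residual variation to total variation) agree for a least squares line, which is why the percentage can be read as variability explained by the line.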
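Explanatory (year):
Mean = 1984
Standard deviation (Sx) = 22.51
Response (winning distance, in meters):
Mean = 19.59
Standard deviation (Sy) = 2.33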
Proving Slope:
b = 0.07
b = r(Sy/Sx)
b = 0.686(2.33/22.51) ≈ 0.071, which rounds to 0.07
0.07 = 0.07
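The identity b = r(Sy/Sx) can also be checked without rounding. The sketch below uses Python's standard statistics module for the sample standard deviations and compares the result with the slope computed directly from the sums of deviations. The data are retyped from this report, and names such as b_from_r and b_direct are my own.

```python
import statistics

# Data retyped from the table in this report.
years = list(range(1948, 2021, 4))
distances = [13.75, 15.28, 16.59, 17.32, 18.14, 19.61, 21.03, 21.16, 22.41, 20.48,
             22.24, 21.06, 20.56, 20.56, 19.59, 20.56, 20.70, 20.63, 20.58]

n = len(years)
x_mean = statistics.mean(years)
y_mean = statistics.mean(distances)
sx = statistics.stdev(years)        # sample standard deviation of the years
sy = statistics.stdev(distances)    # sample standard deviation of the distances

# Sample correlation: sum of products of z-scores, divided by n - 1.
r = sum(((x - x_mean) / sx) * ((y - y_mean) / sy)
        for x, y in zip(years, distances)) / (n - 1)

# Slope from the identity b = r * (Sy / Sx).
b_from_r = r * sy / sx

# Slope computed directly from the sums of deviations, for comparison.
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, distances))
sxx = sum((x - x_mean) ** 2 for x in years)
b_direct = sxy / sxx

print(f"Sx = {sx:.2f}, Sy = {sy:.2f}, r = {r:.3f}")
print(f"b from r*(Sy/Sx) = {b_from_r:.5f}, b direct = {b_direct:.5f}")
```

Both slopes should match to many decimal places, so any small difference in the hand calculation comes from rounding r, Sy, and Sx before multiplying.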
X mean = 1984
Y mean = 19.59
Ŷ = -121.16 + 0.07x
Ŷ = -121.16 + 0.07(1984)
Ŷ = 17.72
17.72 ≠ 19.59
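This gap comes from rounding the slope to 0.07, not from the data. A least squares regression line
always passes through the point (X mean, Y mean), and with the unrounded slope (about 0.07094) the
prediction at 1984 is 19.59, so the LSRL does pass through (1984, 19.59). The gap is therefore not
evidence about whether there is a linear relationship between the years of the Olympic Games and the winning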