AP Statistics Sampling Project Example

This study investigates the correlation between average individual income and average individual expenses across U.S. states, revealing a weak positive correlation with a correlation coefficient of approximately 0.215. The least squares regression analysis indicates that only 4.6% of the variability in expenses can be explained by income, suggesting that other factors also play significant roles. Limitations include a small sample size and potential data inaccuracies, highlighting the need for further research incorporating additional variables.

Uploaded by 25groppec
Copyright © All Rights Reserved

A. Introduction
The goal of this study is to determine whether there is a positive correlation between average
individual income and average individual expenses across different states. I am interested in this
topic because understanding the relationship between income and expenses can provide valuable
insights into consumer behavior and financial planning. By investigating this relationship, I hope to
learn whether higher individual income leads to higher individual expenses, and to what extent this
correlation holds true across different states. The population of this study is the fifty U.S. states
plus the District of Columbia, as these are the geographic units within the United States for which
the income and expense data is collected. The individuals in this study are therefore the 50 states
and the District of Columbia, and a random sample of 20 of them will be analyzed to investigate the
correlation between average individual income and average individual expenses. The explanatory
variable in this study is average individual income by state, because I expect it to help explain or
predict changes in the response variable, which is average individual expenses.

The study "The Impact of Labor Freedom on Geographic Cost of Living Differentials" (Cebula et al.)
examines how varying levels of labor market freedom across U.S. states influence state-level cost of living.
Using cross-sectional data from 2016, the researchers analyzed the impact of labor market
freedom (measured by a combination of factors such as employment regulations and union
density) on cost of living indices. Key findings include an inverse relationship between labor market
freedom and cost of living, indicating that as labor market freedom increases, the overall cost of
living decreases.

B. Data Collection
The population of interest for this study consists of all 50 U.S. states and the District of Columbia. A
list of these 51 jurisdictions was created to serve as the sampling frame. The target sample size is 20
states. To ensure randomness, a random number generator was used to select 20 states from the
sampling frame. Each state was assigned a unique number from 1 to 51, and the random number
generator picked 20 unique numbers corresponding to the states. The randomly selected states
are: Alaska, Hawaii, Idaho, Illinois, Indiana, Kansas, Maine, Maryland, Minnesota, Missouri,
Montana, Nebraska, Ohio, Oregon, Pennsylvania, Rhode Island, South Dakota, Texas, Washington,
and West Virginia.
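The selection step can be sketched in Python as a simple random sample drawn without replacement. This is only an illustration: the project does not say which random number generator was used or how the numbers 1 to 51 were assigned, so the alphabetical numbering and the seed value below are assumptions, and the sketch will not reproduce the exact 20 states listed above.

```python
import random

# Sampling frame (assumed alphabetical order): 50 states plus the District of Columbia.
# Position i in the list corresponds to label i + 1, so labels run from 1 to 51.
FRAME = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "District of Columbia", "Florida", "Georgia",
    "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky",
    "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire",
    "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota",
    "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island",
    "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont",
    "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming",
]
SAMPLE_SIZE = 20


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame without replacement."""
    rng = random.Random(seed)  # hypothetical seed, used only to make runs repeatable
    labels = rng.sample(range(1, len(frame) + 1), n)  # n unique labels from 1..51
    return sorted(frame[label - 1] for label in labels)


sample = simple_random_sample(FRAME, SAMPLE_SIZE, seed=2024)
print(sample)
```

Because `random.sample` never repeats a label, every state appears at most once, which matches the "20 unique numbers" described above.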

The data for this study was sourced from two places: the Federal Reserve Bank of St. Louis and
GoBankingRates. The Federal Reserve Economic Data (FRED) database is continuously updated. It
gathers data from various sources across the United States, including government agencies, private
sector reports, and international organizations, with the purpose of providing comprehensive economic
data for analysis and decision-making. Once collected, the data is compiled, standardized, and made
available through the FRED database. GoBankingRates, on the other hand, is a financial news and
information website that provides data on the cost of living in different states. Its data is updated
periodically, with the most recent update in October 2024. GoBankingRates collects information from
sources such as the Missouri Economic Research and Information Center and the Bureau of Labor
Statistics, with the aim of providing consumers with detailed information on the cost of living. The
data is collected through surveys, government reports, and other financial data sources, and analyzed
to compile a comprehensive cost of living index for each state.

C. Data Presentation
Sample Data List:

State           Per Capita Personal Income    Annual Cost of Living
Alaska          $71,611.00                    $91,355.00
Hawaii          $66,175.00                    $131,560.00
Idaho           $59,385.00                    $71,945.00
Illinois        $72,245.00                    $67,203.00
Indiana         $61,243.00                    $66,400.00
Kansas          $66,115.00                    $63,554.00
Maine           $65,105.00                    $80,191.00
Maryland        $75,391.00                    $85,007.00
Minnesota       $72,557.00                    $68,662.00
Missouri        $62,604.00                    $64,576.00
Montana         $64,989.00                    $75,083.00
Nebraska        $71,347.00                    $66,327.00
Ohio            $61,495.00                    $69,100.00
Oregon          $67,838.00                    $83,693.00
Pennsylvania    $68,945.00                    $69,756.00
Rhode Island    $67,562.00                    $80,774.00
South Dakota    $72,466.00                    $67,422.00
Texas           $66,252.00                    $67,640.00
Washington      $80,930.00                    $84,642.00
West Virginia   $52,826.00                    $63,992.00

Scatterplot with no line and description:


The scatterplot above displays the relationship between average individual income and average
individual expenses. The data points are scattered without a clear linear pattern, so the form shows
no distinct linear trend, and the wide dispersion of the points indicates that the relationship between
average individual income and expenses is very weak. Although the relationship is weak, its direction
is slightly positive: as average individual income increases, average individual expenses also tend to
increase, albeit marginally. The scatterplot also shows a few potential outliers, the most extreme
being Hawaii, where average individual income is about $66,000 and average individual expenses are
about $131,600. This point deviates significantly from the overall trend and may influence the analysis.
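One way to check the suspected outlier is the 1.5 × IQR rule applied to each variable. The Python sketch below is an illustrative check rather than part of the original analysis; it assumes quartiles are computed as the medians of the lower and upper halves of the sorted data, and other quartile conventions can move the fences slightly.

```python
from statistics import median

# (state, per capita personal income, annual cost of living) from the sample table
DATA = [
    ("Alaska", 71611, 91355), ("Hawaii", 66175, 131560),
    ("Idaho", 59385, 71945), ("Illinois", 72245, 67203),
    ("Indiana", 61243, 66400), ("Kansas", 66115, 63554),
    ("Maine", 65105, 80191), ("Maryland", 75391, 85007),
    ("Minnesota", 72557, 68662), ("Missouri", 62604, 64576),
    ("Montana", 64989, 75083), ("Nebraska", 71347, 66327),
    ("Ohio", 61495, 69100), ("Oregon", 67838, 83693),
    ("Pennsylvania", 68945, 69756), ("Rhode Island", 67562, 80774),
    ("South Dakota", 72466, 67422), ("Texas", 66252, 67640),
    ("Washington", 80930, 84642), ("West Virginia", 52826, 63992),
]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]  # for odd n, the middle value is excluded
    return median(lower), median(upper)


def iqr_outliers(rows, column):
    """Return the states whose value in `column` lies outside Q1 - 1.5*IQR to Q3 + 1.5*IQR."""
    values = [row[column] for row in rows]
    q1, q3 = quartiles(values)
    step = 1.5 * (q3 - q1)
    low, high = q1 - step, q3 + step
    return [row[0] for row in rows if not low <= row[column] <= high]


print("income outliers:", iqr_outliers(DATA, 1))
print("cost-of-living outliers:", iqr_outliers(DATA, 2))
```

With the table values, the fences for annual cost of living come out near $43,650 and $105,380, so only Hawaii is flagged, and no income value falls outside its own fences.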

D. Modeling
Least Squares Regression Line (LSRL):

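Equation: 𝑦̂ = 40017.9165 + 0.5334𝑥 (𝑥 = per capita personal income, 𝑦̂ = predicted annual cost of living)
Sample size: n = 20
Residual standard deviation: s = 15,562.611
Coefficient of determination: r² = 0.046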
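The summary statistics above can be recomputed from the sample data with the usual least-squares formulas, b = Sxy / Sxx and a = ȳ - b·x̄. The Python sketch below is a hedged illustration that assumes the 20 rows in the data table are exactly the values used for the regression; the helper name least_squares is hypothetical.

```python
from math import sqrt

# (state, per capita personal income, annual cost of living) from the sample table
DATA = [
    ("Alaska", 71611, 91355), ("Hawaii", 66175, 131560),
    ("Idaho", 59385, 71945), ("Illinois", 72245, 67203),
    ("Indiana", 61243, 66400), ("Kansas", 66115, 63554),
    ("Maine", 65105, 80191), ("Maryland", 75391, 85007),
    ("Minnesota", 72557, 68662), ("Missouri", 62604, 64576),
    ("Montana", 64989, 75083), ("Nebraska", 71347, 66327),
    ("Ohio", 61495, 69100), ("Oregon", 67838, 83693),
    ("Pennsylvania", 68945, 69756), ("Rhode Island", 67562, 80774),
    ("South Dakota", 72466, 67422), ("Texas", 66252, 67640),
    ("Washington", 80930, 84642), ("West Virginia", 52826, 63992),
]


def least_squares(rows):
    """Fit y-hat = a + b*x by least squares and return (a, b, s, r_squared)."""
    n = len(rows)
    xs = [income for _, income, _ in rows]
    ys = [cost for _, _, cost in rows]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx              # slope
    a = y_bar - b * x_bar        # intercept: the line passes through (x_bar, y_bar)
    sse = s_yy - b * s_xy        # sum of squared residuals
    s = sqrt(sse / (n - 2))      # residual standard deviation
    r_squared = 1 - sse / s_yy   # share of the variation in y explained by x
    return a, b, s, r_squared


a, b, s, r_squared = least_squares(DATA)
print(f"y-hat = {a:.4f} + {b:.4f}x   n = {len(DATA)}   s = {s:.3f}   r^2 = {r_squared:.3f}")
```

Running it on the table reproduces the reported slope, intercept, s, and r² to the precision shown, which is a quick way to confirm the table and the regression output agree.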

New Scatterplot with Regression Line:


Equation: 𝑦̂ = 40017.9165 + 0.5334𝑥

Residual Plot:

The slope of the regression line (0.5334) indicates that for each additional dollar of average
individual income, the predicted average individual expenses increase by about $0.53. This positive
slope suggests that as income rises, expenses also tend to increase. The y-intercept (40017.9165)
is the predicted average individual expenses when average individual income is zero. Because no state
has an income anywhere near $0, this value is an extrapolation that anchors the line rather than a
meaningful prediction.
The residual plot shows residuals that are widely spread but do not form a clear pattern. The lack
of a pattern suggests that a linear form is not unreasonable, while the wide spread (s ≈ $15,563)
shows that the line's predictions can miss actual expenses by a large margin. The correlation
coefficient (r) is approximately 0.215 (the positive square root of r², since the slope is positive),
which indicates a weak positive correlation between average individual income and expenses.
Additionally, the coefficient of determination (r²) is 0.046, meaning that only about 4.6% of the
variability in expenses can be explained by income, suggesting that other factors also play a
significant role in determining expenses.
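The correlation coefficient can also be computed directly from standardized values, r = Σ(z_x · z_y) / (n - 1). This gives a consistency check: with one explanatory variable, r² is just r squared, so r should be close to √0.046 ≈ 0.21 and positive like the slope. The Python sketch below recomputes r from the table and lists residuals using the reported LSRL coefficients; the variable and function names are illustrative assumptions.

```python
from statistics import mean, stdev

# (state, per capita personal income, annual cost of living) from the sample table
DATA = [
    ("Alaska", 71611, 91355), ("Hawaii", 66175, 131560),
    ("Idaho", 59385, 71945), ("Illinois", 72245, 67203),
    ("Indiana", 61243, 66400), ("Kansas", 66115, 63554),
    ("Maine", 65105, 80191), ("Maryland", 75391, 85007),
    ("Minnesota", 72557, 68662), ("Missouri", 62604, 64576),
    ("Montana", 64989, 75083), ("Nebraska", 71347, 66327),
    ("Ohio", 61495, 69100), ("Oregon", 67838, 83693),
    ("Pennsylvania", 68945, 69756), ("Rhode Island", 67562, 80774),
    ("South Dakota", 72466, 67422), ("Texas", 66252, 67640),
    ("Washington", 80930, 84642), ("West Virginia", 52826, 63992),
]


def correlation(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


incomes = [income for _, income, _ in DATA]
costs = [cost for _, _, cost in DATA]
r = correlation(incomes, costs)

# Residual = observed - predicted, using the reported LSRL coefficients.
residuals = {
    state: cost - (40017.9165 + 0.5334 * income) for state, income, cost in DATA
}
largest = max(residuals, key=lambda state: abs(residuals[state]))

print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")
print(f"largest residual: {largest} ({residuals[largest]:+,.0f})")
```

On the table data this gives r of about 0.215 and identifies Hawaii as the state with the largest residual, roughly $56,000 above the line.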

E. Predictions
An interpolative prediction involves estimating a value within the range of the existing data points.
Using this model, we can predict the average individual expenses for a state with an average
individual income of $65,000, which falls within the range of our sample data.

Given the LSRL equation:

𝑦̂ = 40017.9165 + 0.5334𝑥
Plugging in x = 65,000:

𝑦̂ = 40017.9165 + 0.5334 × 65,000 = 40017.9165 + 34,671 = 74,688.9165


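For a state with an average individual income of $65,000, the model predicts average individual
expenses of approximately $74,688.92. Because $65,000 lies within the observed income range
($52,826 to $80,930), this interpolation is reasonable, although with s ≈ $15,563 an individual state
could differ from it substantially.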
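The interpolation above amounts to evaluating the LSRL and confirming that x lies inside the sampled income range. A minimal Python sketch, assuming the reported coefficients and the smallest and largest incomes in the data table (West Virginia, $52,826; Washington, $80,930); the function name predict is hypothetical.

```python
# Reported LSRL: predicted annual cost of living = 40017.9165 + 0.5334 * income
INTERCEPT = 40017.9165
SLOPE = 0.5334

# Smallest and largest per capita incomes in the sample table.
X_MIN, X_MAX = 52826, 80930


def predict(income):
    """Return (predicted cost of living, True if income is inside the sampled range)."""
    y_hat = INTERCEPT + SLOPE * income
    return y_hat, X_MIN <= income <= X_MAX


y_hat, inside = predict(65_000)
print(f"${y_hat:,.2f}", "(interpolation)" if inside else "(extrapolation)")
```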

An extrapolative prediction involves estimating a value outside the range of the existing data points.
Using this model, we can predict the average individual expenses for a state with an average
individual income of $100,000, which is above the highest income in our sample ($80,930, Washington).

Given the LSRL equation:

𝑦̂ = 40017.9165 + 0.5334𝑥
Plugging in x = 100,000:

𝑦̂ = 40017.9165 + 0.5334 × 100,000 = 40017.9165 + 53,340 = 93,357.9165


For a state with an average individual income of $100,000, the model predicts average individual
expenses to be approximately $93,357.92. While this prediction is mathematically consistent with
the model, it should be approached with caution as extrapolating beyond the observed data range
may not always yield accurate predictions, especially given the weak correlation in the data.
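For an extrapolation, it can help to report how far x lies beyond the sampled range alongside the prediction. The Python sketch below is illustrative only; predict_with_gap is a hypothetical helper, and the range endpoints are taken from the data table.

```python
# Reported LSRL coefficients and the sampled income range from the data table.
INTERCEPT = 40017.9165
SLOPE = 0.5334
X_MIN, X_MAX = 52826, 80930


def predict_with_gap(income):
    """Return (predicted cost of living, distance of income outside the sampled range)."""
    y_hat = INTERCEPT + SLOPE * income
    if income < X_MIN:
        gap = X_MIN - income
    elif income > X_MAX:
        gap = income - X_MAX
    else:
        gap = 0  # inside the range: an interpolation
    return y_hat, gap


y_hat, gap = predict_with_gap(100_000)
print(f"prediction ${y_hat:,.2f}, ${gap:,} beyond the sampled incomes")
```

For x = 100,000 the sketch returns the same $93,357.92 prediction with a gap of $19,070 past the largest sampled income, a reminder of how far beyond the data the line is being pushed.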

F. Discussion
This study aimed to investigate the relationship between average individual income and average
individual expenses across different U.S. states. The analysis revealed a weak positive correlation
between these two variables, indicated by a correlation coefficient (r) of approximately 0.215. The
least squares regression line (LSRL) provided a model for predicting expenses based on income, with a
slope of 0.5334 and a y-intercept of 40017.9165. However, the coefficient of determination (r²) was
0.046, suggesting that only about 4.6% of the variability in expenses can be explained by income.
It's important to note that correlation does not imply causation. While the study identifies a positive
relationship between income and expenses, it does not establish that higher income directly
causes higher expenses. Other factors, such as tax policies and economic conditions, may also play
significant roles in determining individual expenses.

Several limitations must also be acknowledged in this study. Since the study used a sample of only
20 states, it might not capture the full variability of the entire population. Data accuracy is another
potential issue, as there may be measurement errors and variations in data collection
methodologies. The model itself has limitations, given the weak correlation and the presence of
outliers that may affect the regression model's accuracy. Unaccounted-for variables were also a
concern, since several potentially important factors, such as tax policies, were not included in the
model. Lastly, the study identifies a correlation but does not establish causation, which is a
critical distinction.

To build on these findings, future research could incorporate additional variables, such as tax
policies, healthcare costs, and education and childcare expenses. Demographic factors, including age
distribution and household size, should also be considered to better understand their impact on
expenses, as should geographic and economic factors such as urban or rural location, employment
rates, inflation rates, and economic growth. Finally, behavioral factors, such as spending habits and
financial literacy, and policy analysis of government programs and minimum wage policies could also
be examined.

G. Reflection
Reflecting on this project, I learned the importance of systematically collecting, organizing, and
analyzing data. I gained a deeper understanding of statistical tools and techniques, such as
scatterplots, correlation coefficients, and regression analysis. The project reinforced the need to
distinguish correlation from causation and to consider multiple factors when analyzing financial
data. Presenting complex statistical concepts in a clear and accessible manner was another
valuable lesson. Overall, this project has enhanced my critical thinking and research skills, and
sparked my curiosity about other variables influencing individual expenses.

Works Cited
Cebula, Richard, et al. "The Impact of Labor Freedom on Geographic Cost of Living Differentials."
Journal of Entrepreneurship and Public Policy, vol. 6, no. 3, 2017, pp. 385-395. ProQuest,
https://www.proquest.com/scholarly-journals/impact-labor-freedom-on-geographic-cost-living/docview/1955915967/se-2,
doi:10.1108/JEPP-D-17-00015.

Federal Reserve Bank of St. Louis. "Federal Reserve Economic Data (FRED)." Federal Reserve Bank
of St. Louis, https://fred.stlouisfed.org/.

GoBankingRates. "Here's the Cost of Living in Every State." GoBankingRates, 28 Oct. 2024,
https://www.gobankingrates.com/saving-money/home/cost-of-living-by-state/.
