Econometrics_1_

Uploaded by

Tran Minh Tri
Copyright
© © All Rights Reserved

1. An Introduction to Econometrics
“There are two things you are better off not watching in the making: sausages and
econometric estimates” (Leamer, 1983)
In the following sections, we try to understand the concept of econometrics, the main steps in
empirical analysis, and the types of data.

1.1. What is Econometrics? What are the goals of econometrics?

Econometrics is the application of statistical methods for estimating economic relationships,
creating new economic models, testing existing theories, and evaluating economic policies.

Econometrics is about how we can use theory and data from economics, business, and the social
sciences, along with tools from statistics, to answer ‘‘how much’’ questions (Hill et al., 2011:3).

By using econometrics, we try to understand, characterise and measure economic and
socioeconomic phenomena. What kind of questions can arise in this field? Among others:
- What are the main factors influencing the gross domestic product?
- How should we describe the demand for a given product?
- What are the key determinants of consumer choice?
- What is the relationship between inflation and unemployment?
- What is the effect of increasing price on total revenue?

Some of the questions above belong to the field of microeconomics, and some to macroeconomics.
It is therefore important to distinguish between microeconometrics and macroeconometrics.
Microeconometrics focuses on the behaviour of individual decision-making units, such as
consumers, firms, and workers, based on the analysis of cross-sectional and panel data.
In the field of microeconometrics, professionals generally use the main
tools of microeconomics:
- Constrained optimisation,
- Equilibrium analysis,
- Comparative statics.

Constrained optimisation is an analytical tool for making the optimal choice, taking into account
limitations or restrictions on choice (Besanko-Braeutigam, 2011: 6).
Equilibrium analysis is used to analyse and describe a condition or state that could continue
indefinitely in a system, or at least until there is a change in some exogenous variable (Besanko-
Braeutigam, 2011: 12).
Comparative statics is an analysis used to examine how a change in a certain exogenous variable
affects the level of a certain endogenous variable in an economic system (Besanko-Braeutigam,
2011: 20).
Macroeconomics deals with the operation of an economy as a whole. Macroeconomics treats
issues and problems related to economic growth, sustainable development, employment,
unemployment, inflation, economic and social institutional systems, and the business cycle,
among others. Certain elements of macroeconometrics rely on the analysis of time-series data
and panel data.

However, econometrics cannot be divided neatly into micro- and macroeconometrics only: there
are no clear dividing lines between the categories, and other fields of economics, for example
finance (financial econometrics), also rely on econometric methods.

1.2. Types of data and economic data

To carry out empirical analysis and to draw inferences, we must have data. How can we
collect and derive data?

Types of data

We can distinguish between experimental data and non-experimental data.

Experimental data
Experimental data are often collected under controlled circumstances, where the analyst is able
to fix the values of independent variables at a predetermined value, and the experiment can be
repeated several times. Using experimental data is very rare in social sciences; however, these
data are often collected in natural sciences. If we would like to analyse the impact of
unemployment benefits on the level of unemployment, or we would like to know the main
determinants of Gross Domestic Product (GDP), we can use data from previous years and from
different countries. We are not easily able to change the number of employees or the amount of
human capital in an economy in order to analyse a change in the unemployment rate and GDP.
It is almost impossible to create laboratory environments to analyse the market mechanism and
the operation of an economy.

Non-experimental data

Analysts do not have full control over the conditions under which the data are collected.
They cannot set the values of the independent variables, and the experiment cannot be
repeated. Non-experimental data come from population surveys or other sample surveys, and
from administrative records, among other sources. Non-experimental data are sometimes called
observational data (or retrospective data).
Economic data

Economic data can be:


- Cross-sectional data,
- Time-series data,
- Panel data.

Cross-sectional data

Cross-sectional data are collected across sample units at a given point of time or a particular
time period. Sample units may be individuals, consumers, households, firms, employees,
countries, and other economic units. For example, firms draw up and submit to the National
Statistical Office in Hungary, on a monthly basis, a definitive statistical report of their activity;
we would view these data for one year as a cross-sectional data set. Another example: assume
that we would like to analyse the internal rate of return to education. Therefore, we collect data
on wages, education, experience, gender and other characteristics by randomly drawing 1 000
individuals from the active population at a given time period. The collected data set can be seen
as a cross-sectional data set.

Time-series data

“A time-series data set is collected over discrete intervals of time. Macroeconomic data are
usually reported in monthly, quarterly, or annual terms. Financial data, such as stock prices,
can be recorded daily, or at even higher frequencies. The key feature of time-series data is
that the same economic quantity is recorded at a regular time interval” (Hill et al., 2011).

The time-series data in Table 1.1 show the Hungarian GDP per capita between 2000 and 2015.

Table 1.1 Gross Domestic Product per capita in Hungary between 2000 and 2015
Year Gross Domestic Product per capita (HUF)
2000 1 304 629.2
2001 1 510 019.9
2002 1 714 957.1
2003 1 883 335.8
2004 2 080 075.9
2005 2 227 684.9
2006 2 398 186.5
2007 2 541 859.6
2008 2 696 887.9
2009 2 623 798.4
2010 2 708 583.8
2011 2 824 597.6
2012 2 889 059.8
2013 3 045 294.7
2014 3 283 864.9
2015 3 454 121.2
Source: Hungarian Central Statistical Office (2017a)

Time series data are data collected over several time periods.
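
As a small illustration of working with a time series, the year-on-year percentage change can be computed from the values in Table 1.1. The following Python sketch is only one possible way to do this; the function name annual_growth is hypothetical, and the figures are copied from the table.

```python
# GDP per capita in Hungary (HUF), copied from Table 1.1.
gdp_per_capita = {
    2000: 1_304_629.2, 2001: 1_510_019.9, 2002: 1_714_957.1, 2003: 1_883_335.8,
    2004: 2_080_075.9, 2005: 2_227_684.9, 2006: 2_398_186.5, 2007: 2_541_859.6,
    2008: 2_696_887.9, 2009: 2_623_798.4, 2010: 2_708_583.8, 2011: 2_824_597.6,
    2012: 2_889_059.8, 2013: 3_045_294.7, 2014: 3_283_864.9, 2015: 3_454_121.2,
}


def annual_growth(series):
    """Return {year: percentage change compared with the previous year}."""
    years = sorted(series)
    return {
        year: (series[year] / series[previous] - 1.0) * 100.0
        for previous, year in zip(years, years[1:])
    }


growth = annual_growth(gdp_per_capita)
for year, rate in growth.items():
    print(f"{year}: {rate:+.2f}%")
```

In the tabulated values, 2009 is the only year in which GDP per capita is lower than in the preceding year.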

The number of college and university students in Hungary between 1990 and 2017 can be seen
as another time-series data set (Table 1.2; Fig. 1.1).
Table 1.2 The number of college and university students in Hungary between 1990 and 2017
School year   Number of students (total)   Of which on full-time courses
1990/1991     108 376                      76 601
1991/1992     114 690                      83 191
1992/1993     125 874                      92 382
1993/1994     144 560                      105 240
1994/1995     169 940                      118 847
1995/1996     195 586                      132 997
1996/1997     215 115                      145 843
1997/1998     254 693                      156 904
1998/1999     279 397                      168 183
1999/2000     305 702                      177 654
2000/2001     327 289                      183 876
2001/2002     349 301                      192 974
2002/2003     381 560                      203 379
2003/2004     409 075                      216 296
2004/2005     421 520                      225 512
2005/2006     424 161                      231 482
2006/2007     416 348                      238 674
2007/2008     397 704                      242 893
2008/2009     381 033                      242 928
2009/2010     370 331                      242 701
2010/2011     361 347                      240 727
2011/2012     359 824                      241 614
2012/2013     338 467                      233 678
2013/2014     320 124                      223 604
2014/2015     306 524                      217 248
2015/2016     295 316                      210 103
2016/2017     287 018                      205 560
Source: Hungarian Central Statistical Office (2017b)

Figure 1.1 The annual change in the number of college and university students in Hungary
between 1990 and 2017

Source: Hungarian Central Statistical Office (2017b)


Graphs such as Fig. 1.1 help researchers to understand how a variable changed in the past
and to identify trends over time; researchers may also forecast future values of the data by
using different econometric methods.

Panel data
Panel data, or longitudinal data, are observations of individual economic units which are
followed over time. Panel data include the corresponding data for the same cross-sectional
units in each period, for example, the same set of countries, firms or employees.
If we collect data on the number of students in tertiary education and the amount of GDP for
different countries and years, we get a panel data set. Table 1.3 shows the first part (educational
data) of the mentioned panel data.

Table 1.3 Students in tertiary education as a percentage of those aged 20-24 years in the
population (%)
Country                                             2013    2014    2015
Belgium                                             69.2    70.4    72.2
Bulgaria                                            62.1    65.4    68.9
Czech Republic                                      64.8    64.9    63.4
Denmark                                             81.0    81.8    83.2
Germany (until 1990 former territory of the FRG)    56.9    62.5    64.9
Estonia                                             70.3    70.0    70.3
Ireland                                             75.8    82.3    90.9
Greece                                              106.8   -       117.8
Spain                                               80.6    83.5    84.7
France                                              59.3    61.6    63.4
Croatia                                             64.9    66.5    65.7
Italy                                               60.7    59.5    59.3
Cyprus                                              45.0    49.9    56.0
Latvia                                              65.8    66.8    69.0
Lithuania                                           74.3    69.6    -
Luxembourg                                          19.9    20.1    19.9
Hungary                                             57.0    52.5    49.2
Malta                                               41.6    41.9    44.8
Netherlands                                         63.9    66.1    78.8
Austria                                             78.5    77.6    78.6
Poland                                              71.2    68.0    66.4
Portugal                                            64.9    64.3    61.0
Romania                                             48.5    48.5    48.0
Slovenia                                            83.2    83.1    79.5
Slovakia                                            54.0    52.1    50.2
Finland                                             90.9    89.8    88.4
Sweden                                              65.4    63.9    63.8
United Kingdom                                      55.2    54.6    54.0
Iceland                                             78.4    80.4    75.6
Liechtenstein                                       37.1    36.8    33.6
Norway                                              75.3    77.3    77.9
Switzerland                                         56.2    58.1    59.1
Former Yugoslav Republic of Macedonia, the          38.2    38.9    41.3
Serbia                                              55.2    57.3    57.6
Turkey                                              -       88.1    96.8

Source: Eurostat (2017a)

Let us assume that the personal income tax system will change at the beginning of the next year.
We would like to analyse the effect of the change in personal income tax on household
consumption. We survey a random sample this year and another random sample two years later,
after the introduction of the new personal income tax system, in order to carry out the
examination. The data set collected from the two surveys is a pooled cross-section
rather than a panel data set, since the two data sets are taken from different random samples.
As we have mentioned, in the case of panel data, the sample units are the same in every period.
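
The distinction between a panel and a pooled cross-section can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function classify_dataset and the household identifiers are hypothetical, and a data set is treated as a panel only if exactly the same units appear in every period.

```python
def classify_dataset(units_by_period):
    """Classify a data set from the unit identifiers observed in each period.

    units_by_period maps a period label to the set of sampled unit identifiers.
    """
    unit_sets = list(units_by_period.values())
    if len(unit_sets) < 2:
        # Only one period (or none): there is no time dimension to compare.
        return "cross-section"
    if all(units == unit_sets[0] for units in unit_sets[1:]):
        # The same units are followed over time.
        return "panel"
    # Different samples were drawn in different periods.
    return "pooled cross-section"


# Same countries observed in each year, as in Table 1.3.
countries = {"Belgium", "Austria", "Hungary"}
education = {2013: countries, 2014: countries, 2015: countries}

# Two independent random samples of households (hypothetical identifiers).
tax_surveys = {
    "this year": {"h001", "h002", "h003"},
    "two years later": {"h104", "h105", "h106"},
}

print(classify_dataset(education))    # panel
print(classify_dataset(tax_surveys))  # pooled cross-section
```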

1.3. The main application areas of econometrics, and the main elements of empirical analysis

The main application areas of econometrics are the following, among others:
- estimating economic links, examining the relationship among variables, specifying
models,
- testing hypotheses,
- forecasting variables.

Examination of economic relationships


Economists generally want to know whether relationships between certain economic variables
might exist. For example,
- What is the relationship between working experience and wages?
- What is the effect of increasing the price on total revenue?
- What is the relationship between tax rates and tax revenues?
- Is there any relationship between the tax burden and GDP?
- What is the relationship between supply chain and firm performance?

To understand how a change in price affects total revenue, firm managers need to know the
relationship between price and quantity demanded. From our previous
microeconomic studies, we know that according to the law of demand, there is a negative
relationship between the price and the quantity demanded of a product, when all other
influencing factors remain the same. This means that if the price increases, the quantity
demanded decreases; however, the total revenue may either increase or decrease. The change in
total revenue depends on how strongly the quantity demanded responds to the change in price.
The price elasticity of demand shows by what percentage the quantity demanded of a product
changes if its price changes by one per cent.
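
The link between price elasticity and total revenue can be sketched numerically in Python. The linear demand curve Q = 100 - 2P below is a hypothetical example chosen only for illustration, and the elasticity is computed with the midpoint (arc) formula, which is one of several possible definitions.

```python
def quantity_demanded(price):
    # Hypothetical linear demand curve: Q = 100 - 2 * P.
    return 100.0 - 2.0 * price


def revenue(price):
    # Total revenue is price times quantity demanded.
    return price * quantity_demanded(price)


def arc_elasticity(p0, p1):
    """Midpoint price elasticity of demand between prices p0 and p1."""
    q0, q1 = quantity_demanded(p0), quantity_demanded(p1)
    pct_change_q = (q1 - q0) / ((q0 + q1) / 2.0)
    pct_change_p = (p1 - p0) / ((p0 + p1) / 2.0)
    return pct_change_q / pct_change_p


for p0, p1 in [(10.0, 12.0), (40.0, 42.0)]:
    e = arc_elasticity(p0, p1)
    change = revenue(p1) - revenue(p0)
    kind = "inelastic" if abs(e) < 1 else "elastic"
    print(f"P {p0} -> {p1}: elasticity {e:.2f} ({kind}), revenue change {change:+.0f}")
```

In this sketch the same price increase raises total revenue where demand is inelastic (low prices) and lowers it where demand is elastic (high prices).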

Testing hypotheses

In the field of economics, there are many theoretical models and theories. However, the
economy changes rapidly over time, and some theories and models remain valid while others
do not. Because economic relationships and the economic environment change,
hypothesis testing remains very important even where evidence already exists. In hypothesis
testing, we examine whether the given sample provides support for the examined theory or
provides evidence against it.

Forecasting of variables

If we have information about the relationship between variables, we would like to know the
future values of the examined variables. We try to estimate the expected values of the examined
variables based on their values in the past.
Firm managers would like to know the expected demand for the firm’s products, its production
costs, and its expected share price. Useful information for the government may be the expected
level of unemployment or the forecasted value of the Gross Domestic Product.
The main elements of empirical analysis

The main steps of empirical analysis are the following (Fig. 1.2):

1. Specifying the model,


2. Collecting data,
3. Estimating the model,
4. Testing the hypothesis,
5. Evaluating and interpreting the results.

Figure 1.2 The main steps of an empirical analysis

[Flow chart: Specifying the model → Collecting data → Estimating the model → Testing hypothesis → Interpretation of results, with a feedback loop labelled "Remodelling"]

1. Specifying the model

The model is specified by using an equation or equations which describe(s) the behaviour of
economic units or variables, and the relationship between economic variables. The development
of the initial model is based on economic theories, studies, or previous empirical results. Let us
assume we would like to analyse our students’ performance in econometrics. First of all, we
choose the main influencing factors, so we try to describe the Student’s PERFormance in
ECONometrics (sperfecon) as a function of various different factors:

Sperfecon = f(lect, stud, abil, prev, gend), (1.1)

where lect is the number of econometrics lectures and seminars attended by the students, stud
is the number of hours they have studied, abil characterises the students’ ability, prev represents
their previous econometrics knowledge, and gend is the students’ gender. The specification of
the function (1.1) is based on our intuition, because we do not know of any similar
research. We have to choose the functional form before carrying out the econometric
analysis. Let us assume that the functional form of (1.1) is linear; in this case, the model can
be written as follows:

Sperfecon = β0 + β1 ∙ lect + β2 ∙ stud + β3 ∙ abil + β4 ∙ prev + β5 ∙ gend + u, (1.2)


where u is the error term, which includes the unobserved factors.

2. Collecting data

In the next step, we collect data in order to test our model. As mentioned in the previous
subchapter, the data can be experimental or non-experimental. We can use data from
previous surveys, or we can carry out surveys ourselves. At this stage, it is
important to decide how to handle unobservable variables. For example, a student’s ability
is difficult to observe; in this case, we can choose other variables as proxies for a student’s
ability, such as mathematics test scores.

3. Estimating the model and testing the model

In this step, we try to quantify our model. We estimate the unknown parameters (β0, β1, …, β5)
of the econometric model. By estimating the unknown parameters, we can quantify the
relationship between the dependent variable (Sperfecon) and the independent
variables (lect, stud, abil, prev, gend). We test our model to check the underlying
assumptions and the validity of the chosen functional form and parameters. It may be necessary
to modify our model in the light of the test results.
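
The estimation step can be sketched in Python with NumPy. The sketch below is an illustration under strong assumptions, not the authors' procedure: the data are simulated, the "true" coefficients and the value ranges of the variables are hypothetical, and least squares (introduced in the next chapter) is used as the estimation method.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 500  # number of simulated students (hypothetical)

# Simulated independent variables of model (1.2); all ranges are assumptions.
lect = rng.integers(0, 29, size=n)       # lectures and seminars attended
stud = rng.uniform(0.0, 100.0, size=n)   # hours studied
abil = rng.normal(100.0, 15.0, size=n)   # ability proxy, e.g. a test score
prev = rng.uniform(0.0, 10.0, size=n)    # previous knowledge
gend = rng.integers(0, 2, size=n)        # gender dummy (0 or 1)

# Hypothetical "true" parameters beta_0 ... beta_5 used to generate the data.
true_beta = np.array([5.0, 1.0, 0.5, 0.2, 2.0, 0.0])

# Design matrix with a column of ones for the intercept beta_0.
X = np.column_stack([np.ones(n), lect, stud, abil, prev, gend])
u = rng.normal(0.0, 5.0, size=n)         # error term (unobserved factors)
sperfecon = X @ true_beta + u

# Least squares estimates of beta_0 ... beta_5.
beta_hat, *_ = np.linalg.lstsq(X, sperfecon, rcond=None)
residuals = sperfecon - X @ beta_hat

for name, b in zip(["const", "lect", "stud", "abil", "prev", "gend"], beta_hat):
    print(f"{name:>5}: {b:8.3f}")
```

Because the data are simulated, the estimates can be compared with the known values in true_beta; with real survey data the true parameters are of course unknown.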

4. Evaluation and interpretation of results

In the last step, we evaluate and interpret our model before finally drawing conclusions. We
can verify previous economic statements, or we can explore new relationships between
economic variables, among others.

Example 1.1
Let us assume that we would like to analyse the quantity of ice cream demanded. Give the main
variables of the quantity of ice cream demanded; explain your reasoning and give the direction
of the relationship.
From our previous microeconomics studies, we know that the demand for a good depends on
the price of the good, the price of substitute goods, the consumers’ tastes, the consumers’
income, and the price of complementary goods. The quantity of ice cream demanded as a
dependent variable can be given as a function of different independent variables:

Quantity of ice cream demanded = f(price ic, price sg, tastes, income, price cg),

where
price ic = the price of ice cream,
price sg = price of substitute goods,
tastes = consumers’ tastes,
income = consumers’ income,
price cg = price of complementary goods.

According to the law of demand, there is a negative relationship between the price of ice cream
and the quantity demanded. This means that if the price of ice cream increases, consumers buy
less ice cream, all other things being equal.
Let us assume that the price of a substitute good for ice cream, for example frozen yoghurt,
decreases. According to the law of demand, the quantity of the substitute good demanded
increases, and the demand for ice cream probably decreases, all other things being equal.
If the consumers’ income decreases, they have less to spend on different kinds of goods, and the
quantity of ice cream demanded decreases if all other influencing factors remain the same. We
must mention that the relationship between quantity demanded and consumers’ income is not
the same for all goods. It depends on the income elasticity of the good. Goods can
be normal, inferior or superior (luxury) goods. If ice cream can be seen as a normal good, the
quantity demanded is positively related to the consumers’ income.
Finally, suppose that the price of ice cream cones increases. What is the effect of the change in
the price of ice cream cones on the quantity of ice cream demanded? Ice cream and ice cream
cones are complementary goods, which are consumed together. When cones become more
expensive, eating ice cream in a cone costs more overall, so the quantity of ice cream
demanded decreases. The price of ice cream cones and the quantity of ice cream demanded
therefore move in opposite directions, ceteris paribus.
The model of the quantity of ice cream demanded can be specified as follows:

Quantity of ice cream demanded = β0 + β1 ∙ price ic + β2 ∙ price sg + β3 ∙ tastes + β4 ∙ income + β5 ∙ price cg + u,

where u is the error term, and βi are the parameters of the independent variables.

We have repeatedly mentioned the following expressions: all other things being equal, ceteris
paribus, all other influencing factors remain the same. Ceteris paribus means that other things
are equal or other influencing factors are equal. If we would like to analyse the effect of a
change in an independent variable on the dependent variable, it is necessary to assume that only
the given independent variable changes and other factors remain the same.
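
In a linear model, the ceteris paribus effect of one variable is simply its coefficient. The Python sketch below illustrates this with the ice cream model; all coefficient values and the baseline values of the variables are hypothetical and were chosen only so that the signs agree with the reasoning in Example 1.1.

```python
# Hypothetical coefficients of the linear ice cream demand model.
beta = {
    "intercept": 50.0,
    "price_ic": -4.0,   # own price: negative effect (law of demand)
    "price_sg": 2.0,    # price of substitute goods: positive effect
    "tastes": 1.5,      # taste index: positive effect
    "income": 0.01,     # income: positive effect for a normal good
    "price_cg": -3.0,   # price of complementary goods: negative effect
}


def predicted_quantity(values):
    """Predicted quantity demanded for given values of the independent variables."""
    return beta["intercept"] + sum(beta[name] * x for name, x in values.items())


baseline = {"price_ic": 2.0, "price_sg": 2.5, "tastes": 6.0,
            "income": 1000.0, "price_cg": 0.5}

# Ceteris paribus: raise only the price of ice cream by one unit.
changed = dict(baseline, price_ic=baseline["price_ic"] + 1.0)

effect = predicted_quantity(changed) - predicted_quantity(baseline)
print(effect)  # equals the coefficient of price_ic
```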
1.4. Terms and Questions
ceteris paribus,
complementary goods,
cross-sectional data set,
econometrics,
experimental data,
forecasting of variables,
inferior goods,
law of demand,
luxury goods,
macroeconometrics,
microeconometrics,
non-experimental data,
normal goods,
observational data,
panel data,
pooled cross sectional data set,
specifying the model,
substitute goods,
superior goods,
testing hypothesis,
time series data set.

Problems

Theoretical questions

1. What is the difference between experimental data and non-experimental data?

2. What is econometrics?

3. Give the definition of a cross sectional data set.

4. How can we specify a model?

5. Explain what the relationship is between specifying the model and testing the model.
6. What is the difference between a cross-sectional data set and a pooled cross sectional
data set? Give an example of each.

7. Give the main steps of empirical research.

8. Explain what the difference is between microeconometrics and macroeconometrics.

9. Give an example of a time series data set.

10. Give the definition of a panel data set.

11. Give an example of an econometric model.

12. What is the difference between statistics and econometrics?

Calculation exercise

1.
Suppose that you would like to analyse students’ height at Debrecen University. Give the height
as a function of independent variables. List the main independent variables.

2.
Which of the following pairs can be seen as dependent and independent variables (separately)?

a) The growth rate of the price level and the real interest rate.
b) The amount of GDP in an economy and the level of human capital in the same economy.
c) Nationality of students and their height.
d) The aggregate net investment and the amount of aggregate income.
e) The quantity of pencils demanded and the number of primary school pupils in a country.
f) The quality of the lecture and the number of students.

3.
Give an example of a dependent variable and at least two independent variables which affect
the dependent variable. Give the direction of the relationship.

4.
You would like to analyse the price of flats in Debrecen. The price is the dependent variable of
your econometric model. Give the list of independent variables and the direction of the
relationship separately.

5.
Assume that your lecturer has collected data on the performance of students. These data are
summarised in the Table below. The required minimum level of the econometrics test was 100
points, and the maximum score was 200 points.

Students’ performance
Observation Test points Time spent on econometrics studies (minutes)
1 195 684
2 200 720
3 58 144
4 82 216
5 74 192
6 200 960
7 32 66
8 194 552
9 200 714
10 200 746.4
11 80 294
12 180 576
13 94 300
14 52 144
15 88 343.2

a) Find the smallest and largest values of the test points.


b) Find the average of the test points in the sample.
c) Find the average of the time spent on studying in the sample.
d) How many students have passed?
e) Which one is the dependent variable?
f) Does it make sense to think that the students’ performance depends on the quality of the
teacher’s lectures?
g) Give the direction of the relationship between the two variables (dependent variable and
independent variable).
h) List other independent variables which can have an impact on the dependent variable.
Explain your reasoning and give the direction of the relationship.

6.
An estate agent collected data on 20 flats in Debrecen in 2017.

Data of flats
Number Price (million HUF) Floor area (square metres) Age (years)
1 15.6 48 2
2 26.4 55 2
3 13.6 71 3
4 26.8 82 0
5 28.8 100 3
6 38.4 85 1
7 19.2 70 8
8 17.6 73 9
9 17.2 74 10
10 15.6 66 7
11 7.6 35 47
12 13.6 53 18
13 15.2 73 18
14 8 39 63
15 12.8 67 23
16 8.8 48 63
17 10.8 51 31
18 21.6 61 5
19 10 53 40
20 9.2 54 78
a) Give the types of the given data set.
b) Find the smallest and largest values of the prices.
c) Find the average of the prices in the sample.
d) Which one is the dependent variable?
e) What is the relationship between the price and the floor area?
f) Is the relationship between price and age positive or negative?
g) Give an independent variable which has a positive effect on the price.
h) Give an independent variable which has a negative effect on the price.

7.
Assume that you built a model of the price of the ith flat as a function of the floor area (FLA),
and the result of your estimation is the following:
$$ \widehat{Price}_i = 3.2 + 0.4 \cdot FLA_i . $$
a) Give the graph of the estimated function. (Excel)
b) Give the mathematical meaning of the estimated coefficients.
c) Interpret the economic meaning of the estimated coefficients.
d) You have the possibility to add another independent variable to the equation. Give your
choice and explain it.

8.
Assume that you built a model of price of the ith flat as a function of floor area (FLA), age
(AGE), and the number of rooms (ROOM):
𝑃𝑟𝑖𝑐𝑒𝑖 = β0 + β1 ∙ FLAi + β2 ∙ AGEi + β3 ∙ ROOMi + ui .

a) What is the meaning of the β1 coefficient?


b) What is the meaning of the β2 coefficient?
c) What is the meaning of the β3 coefficient?
d) You have the possibility to add another independent variable to the equation. Give your
choice and explain it.
2. Simple Linear Regression Analysis
In the subsequent sections, we will build a model in order to study the relationship between
only two variables, and we will try to interpret the estimated model. This is the first step in an
econometric analysis. The first part of this chapter deals with the linear regression model with
only two variables, and in the subsequent sections, we will study how to test the goodness of
the model and evaluate the results.

2.1. An Introduction – correlation versus regression

In this section, we introduce the concept of a linear regression model, show several varieties of
such a model, and explain the estimation method (least squares) that is generally applied with
regression models. Regression Analysis is the process used to describe the relationship between
two or more variables. If we would like to analyse only the strength of the relationship between
two variables, we can use a correlation coefficient, such as the Pearson correlation coefficient.

Correlation
The correlation between variables is a measure of the nature and degree of association
between the variables.
According to our previous studies, we know that the Pearson correlation coefficient (linear
correlation coefficient) can be given by the following equation:

$$ r_{xy} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})\cdot(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\cdot\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}, \qquad (2.1) $$

where $X$ and $Y$ are the two variables, $\bar{X}$ ($\bar{Y}$) is the mean of variable $X$ ($Y$), and $n$ is the
number of observations. The value of the linear correlation coefficient lies between -1 and 1:
$-1 \le r_{xy} \le 1$.
There is a positive correlation between two variables if the two variables change in the same
direction. This means that if one variable increases (decreases), the other variable also increases
(decreases). In the case of a positive association, the coefficient is positive:
$0 < r_{xy}$.
A perfect positive (negative) correlation exists when the value of the coefficient is equal to
1 (-1). If there is no linear correlation between the two variables, the correlation coefficient is
zero. In this case the two variables are linearly uncorrelated; this does not necessarily mean that
they are independent, because a non-linear relationship between them may still exist.
The Pearson correlation coefficient measures the linear correlation between two variables.

Pearson's correlation coefficient is equal to the ratio of the covariance of the two variables to
the product of their standard deviations.
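
The connection between this statement and equation (2.1) can be written out as a short derivation. The symbols $\operatorname{cov}(X,Y)$, $s_X$ and $s_Y$ are notation introduced here for the covariance and the standard deviations; the divisor $n$ is an assumption, and the result is the same with $n-1$ as long as it is used consistently.

```latex
\operatorname{cov}(X,Y) = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y}), \qquad
s_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2}, \qquad
s_Y = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar{Y})^2}

r_{xy} = \frac{\operatorname{cov}(X,Y)}{s_X \cdot s_Y}
       = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}
              {\frac{1}{n}\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2 \cdot \sum_{i=1}^{n}(Y_i-\bar{Y})^2}}
       = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}
              {\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2 \cdot \sum_{i=1}^{n}(Y_i-\bar{Y})^2}}
```

The factors $1/n$ cancel, which gives exactly the form of equation (2.1).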
The correlation measures the nature and degree of association between the variables; however
it does not show the functional relationship between the variables and the causality.
Figure 2.1 The Pearson correlation coefficient

Example 2.1
Consider the following time-series data on the total expenditure for advertising and the total
revenue of a given firm. Examine the association between the two variables.

Year Total expenditure on advertising (million HUF) Total revenue (million HUF)
1. 8 20
2. 7 16
3. 4 15
4. 3 14
5. 5 19
6. 4 12
7. 5 18
8. 7 24
9. 3 16
10. 5 22
11. 9 28
12. 6 25
SUM 66 229
Mean 5.5 19.08
To examine the association between total expenditure on advertising and the total revenue, we
calculate the Pearson correlation coefficient:

Year   $X_i-\bar{X}$   $Y_i-\bar{Y}$   $(X_i-\bar{X})\cdot(Y_i-\bar{Y})$
1.     2.5     0.92    2.29
2.     1.5     -3.08   -4.63
3.     -1.5    -4.08   6.13
4.     -2.5    -5.08   12.71
5.     -0.5    -0.08   0.04
6.     -1.5    -7.08   10.63
7.     -0.5    -1.08   0.54
8.     1.5     4.92    7.38
9.     -2.5    -3.08   7.71
10.    -0.5    2.92    -1.46
11.    3.5     8.92    31.21
12.    0.5     5.92    2.96
Sum                            75.5
Sum of Squares 41      260.92

$$ r_{xy} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})\cdot(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\cdot\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} = \frac{75.5}{\sqrt{41\cdot 260.92}} = \frac{75.5}{\sqrt{10\,697.72}} = 0.73 $$

According to the result, we can see that there is a relatively strong association between the two
variables. The correlation coefficient is positive; this refers to the positive correlation between
total expenditure on advertising and the total revenue. However, there is no information on the
causation. We do not know whether the change in expenditure on advertising causes the change
in total revenue or vice versa.
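
The calculation in Example 2.1 can be checked with a few lines of Python that apply equation (2.1) directly. The function name pearson_r is hypothetical; the data are those of the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient computed as in equation (2.1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of the products of the deviations from the means.
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / sqrt(ss_x * ss_y)


# Data of Example 2.1 (million HUF).
advertising = [8, 7, 4, 3, 5, 4, 5, 7, 3, 5, 9, 6]
revenue = [20, 16, 15, 14, 19, 12, 18, 24, 16, 22, 28, 25]

r = pearson_r(advertising, revenue)
print(round(r, 2))  # 0.73
```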

Example 2.2
Examine the Pearson correlation coefficient for the following data:

Observations 𝑿𝒊 𝒀𝒊

1. 1 11
2. 2 13
3. 3 15
4. 4 17
5. 5 19
6. 6 21
Mean 3.5 16

Observations   $X_i-\bar{X}$   $Y_i-\bar{Y}$   $(X_i-\bar{X})\cdot(Y_i-\bar{Y})$
1.     -2.5    -5      12.5
2.     -1.5    -3      4.5
3.     -0.5    -1      0.5
4.     0.5     1       0.5
5.     1.5     3       4.5
6.     2.5     5       12.5
Sum                            35
Sum of Squares 17.5    70

$$ r_{xy} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})\cdot(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\cdot\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} = \frac{35}{\sqrt{17.5\cdot 70}} = \frac{35}{\sqrt{1225}} = 1. $$

There is a perfect positive correlation between the two variables. If we illustrate the given data,
we can see the perfect linear association:

[Figure: scatter plot of Y against X for the six observations; all points lie on a straight line]

The equation of the function is:

$$ f(x) = 2 \cdot x + 9. $$

Example 2.3
Given the following function:
$$ f(x) = x^2 + 3. $$

The following Table contains y values when x goes from 1 to 6.


Observations 𝑿𝒊 𝒀𝒊

1. 1 4
2. 2 7
3. 3 12
4. 4 19
5. 5 28
6. 6 39
Mean 3.5 18.17

$$ r_{xy} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})\cdot(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\cdot\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} = \frac{122.5}{\sqrt{17.5\cdot 894.8}} = 0.9789. $$

We might expect a perfect correlation, because there is a functional relationship between X and
Y; however, the value of the coefficient is less than 1. The reason for the result comes from the
main characteristic of the indicator. As we mentioned earlier, the Pearson coefficient measures
the degree of linear association. The relationship between the dependent (y) and independent
variables (x) is other than linear since the dependent variable is given by a parabolic function
of the independent variable:
$$ y = x^2 + 3. $$
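
Examples 2.2 and 2.3 can be compared side by side with a short Python sketch. It uses NumPy's corrcoef function as one convenient way to obtain the Pearson coefficient; the variable names are hypothetical.

```python
import numpy as np

x = np.arange(1, 7)          # X takes the values 1, 2, ..., 6

y_linear = 2 * x + 9         # Example 2.2: f(x) = 2x + 9
y_parabola = x ** 2 + 3      # Example 2.3: f(x) = x^2 + 3

# corrcoef returns the 2x2 correlation matrix; the off-diagonal
# element is the Pearson correlation between the two inputs.
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_parabola = np.corrcoef(x, y_parabola)[0, 1]

print(f"linear:   r = {r_linear:.4f}")
print(f"parabola: r = {r_parabola:.4f}")
```

Both relationships are exact functions of X, but only the linear one yields a coefficient of 1, which illustrates that the Pearson coefficient measures linear association only.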
