
Dummy Variable

Econometrics

Uploaded by

vani14iips
Copyright
© All Rights Reserved
CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS

In Chapter 1 we discussed briefly the four types of variables that one generally encounters in empirical analysis. These are: ratio scale, interval scale, ordinal scale, and nominal scale. The types of variables that we have encountered in the preceding chapters were essentially ratio scale. But this should not give the impression that regression models can deal only with ratio scale variables. Regression models can also handle other types of variables mentioned previously. In this chapter, we consider models that may involve not only ratio scale variables but also nominal scale variables. Such variables are also known as indicator variables, categorical variables, qualitative variables, or dummy variables.¹

9.1 THE NATURE OF DUMMY VARIABLES

In regression analysis the dependent variable, or regressand, is frequently influenced not only by ratio scale variables (e.g., income, output, prices, costs, height, temperature) but also by variables that are essentially qualitative, or nominal scale, in nature, such as sex, race, color, religion, nationality, geographical region, political upheavals, and party affiliation. For example, holding all other factors constant, female workers are found to earn less than their male counterparts or nonwhite workers are found to earn less than whites.² This pattern may result from sex or racial discrimination, but whatever the reason, qualitative variables such as sex and race seem to influence the regressand and clearly should be included among the explanatory variables, or the regressors.

¹ We will discuss ordinal scale variables in Chap. 15.
² For a review of the evidence on this subject, see Bruce E. Kaufman and Julie L. Hotchkiss, The Economics of Labor Market, 5th ed., Dryden Press, New York, 2000.

Since such variables usually indicate the presence or absence of a "quality" or an attribute, such as male or female, black or white, Catholic or non-Catholic, Democrat or Republican, they are essentially nominal scale variables. One way we could "quantify" such attributes is by constructing artificial variables that take on values of 1 or 0, 1 indicating the presence (or possession) of that attribute and 0 indicating the absence of that attribute. For example, 1 may indicate that a person is a female and 0 may designate a male; or 1 may indicate that a person is a college graduate, and 0 that the person is not, and so on. Variables that assume such 0 and 1 values are called dummy variables.³ Such variables are thus essentially a device to classify data into mutually exclusive categories such as male or female.

Dummy variables can be incorporated in regression models just as easily as quantitative variables. As a matter of fact, a regression model may contain regressors that are all exclusively dummy, or qualitative, in nature. Such models are called Analysis of Variance (ANOVA) models.⁴

9.2 ANOVA MODELS

To illustrate the ANOVA models, consider the following example.

EXAMPLE 9.1 PUBLIC SCHOOL TEACHERS' SALARIES BY GEOGRAPHICAL REGION

Table 9.1 gives data on average salary (in dollars) of public school teachers in 50 states and the District of Columbia for the year 1985. These 51 areas are classified into three geographical regions: (1) Northeast and North Central (21 states in all), (2) South (17 states in all), and (3) West (13 states in all). For the time being, do not worry about the format of the table and the other data given in the table.

Suppose we want to find out if the average annual salary (AAS) of public school teachers differs among the three geographical regions of the country.
If you take the simple arithmetic average of the average salaries of the teachers in the three regions, you will find that these averages for the three regions are as follows: $24,424.14 (Northeast and North Central), $22,894 (South), and $26,158.62 (West). These numbers look different, but are they ...

³ It is not absolutely essential that dummy variables take the values of 0 and 1. The pair (0, 1) can be transformed into any other pair by a linear function such that $Z = a + bD$ ($b \neq 0$), where $a$ and $b$ are constants and where $D = 1$ or 0. When $D = 1$, we have $Z = a + b$, and when $D = 0$, we have $Z = a$. Thus the pair (0, 1) becomes $(a, a + b)$. For example, if $a = 1$ and $b = 2$, the dummy variables will be (1, 3). This expression shows that qualitative, or dummy, variables do not have a natural scale of measurement. That is why they are described as nominal scale variables.
⁴ ANOVA models are used to assess the statistical significance of the relationship between a quantitative regressand and qualitative or dummy regressors. They are often used to compare the differences in the mean values of two or more groups or categories, and are therefore more general than the t test, which can be used to compare the means of two groups or categories only.

EXAMPLE 9.1 (Continued)

FIGURE 9.1 Average salary (in dollars) of public school teachers in three regions. West: $\beta_1$ = $26,158; Northeast and North Central: $(\beta_1 + \beta_2)$ = $24,424; South: $(\beta_1 + \beta_3)$ = $22,894.

Differences in educational levels, in cost of living indexes, in gender and race may have some effect on the observed differences. Therefore, unless we take into account all the other variables that may affect a teacher's salary, we will not be able to pin down the causes of the differences.
From the preceding discussion, it is clear that all one has to do is see if the coefficients attached to the various dummy variables are individually statistically significant. This example also shows how easy it is to incorporate qualitative, or dummy, regressors in the regression models.

Caution in the Use of Dummy Variables

Although they are easy to incorporate in the regression models, one must use the dummy variables carefully. In particular, consider the following aspects:

1. In Example 9.1, to distinguish the three regions, we used only two dummy variables, $D_2$ and $D_3$. Why did we not use three dummies to distinguish the three regions? Suppose we do that and write the model (9.2.1) as:

$Y_i = \alpha + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i$   (9.2.6)

where $D_{1i}$ takes a value of 1 for states in the West and 0 otherwise. Thus, we now have a dummy variable for each of the three geographical regions. Using the data in Table 9.1, if you were to run the regression (9.2.6), the computer will "refuse" to run the regression (try it).⁵ Why? The reason is that in the setup of (9.2.6), where you have a dummy variable for each category or group and also an intercept, you have a case of perfect collinearity, that is, exact linear relationships among the variables. Why? Refer to Table 9.1. Imagine that we now add the $D_1$ column, taking the value of 1 whenever a state is in the West and 0 otherwise. Now if you add the three $D$ columns horizontally, you will obtain a column that has 51 ones in it. But since the value of the intercept $\alpha$ is (implicitly) 1 for each observation, you will have a column that also contains 51 ones. In other words, the sum of the three $D$ columns will simply reproduce the intercept column, thus leading to perfect collinearity. In this case, estimation of the model (9.2.6) is impossible.

⁵ Actually you will get a message saying that the data matrix is singular.

The message here is: If a qualitative variable has $m$ categories, introduce only $(m - 1)$ dummy variables. In our example, since the qualitative variable "region" has three categories, we introduced only two dummies.
If you do not follow this rule, you will fall into what is called the dummy variable trap, that is, the situation of perfect collinearity or perfect multicollinearity, if there is more than one exact relationship among the variables. This rule also applies if we have more than one qualitative variable in the model, an example of which is presented later. Thus we should restate the preceding rule as: For each qualitative regressor the number of dummy variables introduced must be one less than the categories of that variable. Thus, if in Example 9.1 we had information about the gender of the teacher, we would use an additional dummy variable (but not two) taking a value of 1 for female and 0 for male or vice versa.

2. The category for which no dummy variable is assigned is known as the base, benchmark, control, comparison, reference, or omitted category. And all comparisons are made in relation to the benchmark category.

3. The intercept value ($\beta_1$) represents the mean value of the benchmark category. In Example 9.1, the benchmark category is the Western region. Hence, in the regression (9.2.5) the intercept value of about 26,159 represents the mean salary of teachers in the Western states.

4. The coefficients attached to the dummy variables in (9.2.1) are known as the differential intercept coefficients because they tell by how much the value of the intercept that receives the value of 1 differs from the intercept coefficient of the benchmark category. For example, in (9.2.5), the value of about -1734 tells us that the mean salary of teachers in the Northeast or North Central is smaller by about $1734 than the mean salary of about $26,159 for the benchmark category, the West.

5. If a qualitative variable has more than one category, as in our illustrative example, the choice of the benchmark category is strictly up to the researcher.
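Sometimes the choice of the benchmark is dictated by the particular problem at hand. In our illustrative example, we could have chosen the South as the benchmark category. In that case the regression results given in (9.2.5) will change, because now all comparisons are made in relation to the South. Of course, this will not change the overall conclusion of our example (why?). In this case, the intercept value will be about $22,894, which is the mean salary of teachers in the South.

6. We warned above about the dummy variable trap. There is a way to circumvent this trap by introducing as many dummy variables as the number of categories of that variable, provided we do not introduce the intercept in such a model. Thus, if we drop the intercept term from (9.2.6), and consider the following model,

$Y_i = \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + u_i$   (9.2.7)

we do not fall into the dummy variable trap, as there is no longer perfect collinearity. But make sure that when you run this regression, you use the no-intercept option in your regression package.

How do we interpret regression (9.2.7)? If you take the expectation of (9.2.7), you will find that:

$\beta_1$ = mean salary of teachers in the West
$\beta_2$ = mean salary of teachers in the Northeast and North Central
$\beta_3$ = mean salary of teachers in the South

In other words, with the intercept suppressed and a dummy variable for each category, the dummy coefficients are the mean values of the various categories. For our illustrative example the estimated regression is:

$\hat{Y}_i = 26{,}158.62 D_{1i} + 24{,}424.14 D_{2i} + 22{,}894 D_{3i}$
se = ( ... ) ( ... ) (986.8645)
t = ( ... )* (27.5072)* (23.1987)*
$R^2 = 0.0901$

where * indicates that the p values of these t ratios are very small. As you can see, the dummy coefficients give directly the mean (salary) values in the three regions, West, Northeast and North Central, and South.

Which is a better method of introducing a dummy variable: (1) introduce a dummy for each category and omit the intercept term or (2) include the intercept term and introduce only $(m - 1)$ dummies, where $m$ is the number of categories of the dummy variable? As Kennedy notes, most researchers find the equation with an intercept more convenient, because it lets them address more easily whether the categorization makes a difference and, if so, by how much. If the categorization does make a difference, the size of that difference is measured directly by the dummy variable coefficient estimates, and testing whether the categorization is relevant can be done by running a t test on a dummy coefficient or, to be more general, an F test on the appropriate set of dummy coefficients.

9.3 ANOVA MODELS WITH TWO QUALITATIVE VARIABLES

In the previous section we considered an ANOVA model with one qualitative variable with three categories. In this section we consider another ANOVA model, but with two qualitative variables, and bring out some additional points about dummy variables.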
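The cautions listed under Section 9.2 (the dummy variable trap, the meaning of the benchmark intercept, and the no-intercept alternative) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the salary figures are made up for illustration and are not the Table 9.1 data, and all variable names are hypothetical. It uses only `np.linalg.matrix_rank` and `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical salaries (in dollars) for eight states; NOT the Table 9.1 data.
salary = np.array([26000.0, 27000.0, 25500.0,   # West
                   24000.0, 25000.0,            # Northeast / North Central
                   22000.0, 23500.0, 23000.0])  # South
region = np.array(["W", "W", "W", "NE", "NE", "S", "S", "S"])

ones = np.ones(len(salary))                 # intercept column
d_west = (region == "W").astype(float)      # D1
d_ne = (region == "NE").astype(float)       # D2
d_south = (region == "S").astype(float)     # D3

# (a) Intercept plus one dummy per category: the dummy variable trap.
# D1 + D2 + D3 reproduces the intercept column, so the rank is 3, not 4.
X_trap = np.column_stack([ones, d_west, d_ne, d_south])
rank_trap = np.linalg.matrix_rank(X_trap)

# (b) Intercept plus (m - 1) dummies, with the West as the benchmark.
X_base = np.column_stack([ones, d_ne, d_south])
b_base, *_ = np.linalg.lstsq(X_base, salary, rcond=None)

# (c) No intercept, one dummy per category, in the spirit of (9.2.7).
X_noint = np.column_stack([d_west, d_ne, d_south])
b_noint, *_ = np.linalg.lstsq(X_noint, salary, rcond=None)

group_means = {r: salary[region == r].mean() for r in ("W", "NE", "S")}

print("rank with intercept and 3 dummies:", rank_trap)
print("benchmark mean and differential intercepts:", b_base)
print("no-intercept coefficients (group means):", b_noint)
```

In this sketch the coefficients from (b) are the West mean followed by the differences of the other two regional means from it, while the coefficients from (c) are the three regional means themselves, which is the interpretation given in points 3, 4, and 6.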
EXAMPLE 9.2 HOURLY WAGES IN RELATION TO MARITAL STATUS AND REGION OF RESIDENCE

The following regression results were obtained:⁸

$\hat{Y}_i \approx 8.81 + 1.10 D_{2i} - 1.67 D_{3i}$   (9.3.1)
$R^2 = 0.0322$

where Y = hourly wage ($)
$D_2$ = married status; 1 = married, 0 = otherwise
$D_3$ = region of residence; 1 = South, 0 = otherwise

In this example we have two qualitative regressors, each with two categories. Hence we have assigned a single dummy variable for each category.

Which is the benchmark category here? Obviously, it is unmarried, non-South residence. In other words, unmarried persons who do not live in the South are the omitted category. Therefore, all comparisons are made in relation to this group. The mean hourly wage in this benchmark is about $8.81. Compared with this, the average hourly wage of those who are married is higher by about $1.10, for an actual average wage of $9.91 (= 8.81 + 1.10). By contrast, for those who live in the South, the average hourly wage is lower by about $1.67, for an actual average hourly wage of $7.14.

Are the preceding average hourly wages statistically different compared to the base category? They are, for all the differential intercepts are statistically significant, as their p values are quite low.

The point to note about this example is this: Once you go beyond one qualitative variable, you have to pay close attention to the category that is treated as the base category, since all comparisons are made in relation to that category. This is especially important when you have several qualitative regressors, each with several categories. But the mechanics of introducing several qualitative variables should be clear by now.

⁸ The data are obtained from the data disk in Arthur S. Goldberger, Introductory Econometrics, Harvard University Press, Cambridge, Mass., 1998. We have already considered these data in Chap. 2.

9.4 REGRESSION WITH A MIXTURE OF QUANTITATIVE AND QUALITATIVE REGRESSORS: THE ANCOVA MODELS

ANOVA models of the type discussed in the preceding two sections, although common in fields such as sociology, psychology, education, and market research, are not that common in economics. Typically, in most economic research a regression model contains some explanatory variables that are quantitative and some that are qualitative. Regression models containing a mix of quantitative and qualitative variables are called analysis of covariance (ANCOVA) models. ANCOVA models are an extension of the ANOVA models in that they provide a method of statistically controlling the effects of quantitative regressors, called covariates or control variables, in a model that includes both quantitative and qualitative, or dummy, regressors. We now illustrate the ANCOVA models.

To motivate the analysis, let us reconsider Example 9.1 by maintaining that the average salary of public school teachers may not be different in the three regions if we take into account any variables that cannot be standardized across the regions. Consider, for example, the variable expenditure on public schools by local authorities, as public education is primarily a local and state question. To see if this is the case, we develop the following model:

$Y_i = \beta_1 + \beta_2 D_{2i} + \beta_3 D_{3i} + \beta_4 X_i + u_i$   (9.4.1)

where $Y_i$ = average annual salary of public school teachers in state ($)
$X_i$ = spending on public school per pupil ($)
$D_{2i}$ = 1, if the state is in the Northeast or North Central
     = 0, otherwise
$D_{3i}$ = 1, if the state is in the South
     = 0, otherwise

The data on X are given in Table 9.1. Keep in mind that we are treating the West as the benchmark category. Also, note that besides the two qualitative regressors, we have a quantitative variable, X, which in the context of the ANCOVA models is known as a covariate, as noted earlier.

EXAMPLE 9.3 TEACHER'S SALARY IN RELATION TO REGION AND SPENDING ON PUBLIC SCHOOL PER PUPIL

From the data in Table 9.1, the results of the model (9.4.1) are as follows:

$\hat{Y}_i = 13{,}269.11 - 1673.514 D_{2i} - 1144.157 D_{3i} + 3.2889 X_i$
se = (1395.056) (801.1703) (861.1182) (0.3176)
t = (9.5115)* (-2.0889)* (-1.3286)** (10.3539)*
$R^2 = 0.7266$

where * indicates p values less than 5 percent, and ** indicates p values greater than 5 percent.

As these results suggest, ceteris paribus: as public expenditure goes up by a dollar, on average, a public school
teacher's salary goes up by about $3.29. Controlling for spending on education, we now see that the differential intercept coefficient is significant for the Northeast and North-Central region, but not for the South. These results are different from those of (9.2.5). But this should not be surprising, for in (9.2.5) we did not account for the covariate, differences in per pupil public spending on education. Diagrammatically, we have the situation shown in Figure 9.2.

Note that although we have shown three regression lines for the three regions, statistically the regression lines are the same for the West and the South. Also note that the three regression lines are drawn parallel (why?).

FIGURE 9.2 Public school teacher's salary (Y) in relation to per pupil expenditure on education (X).

9.5 THE DUMMY VARIABLE ALTERNATIVE TO THE CHOW TEST⁹

In Section 8.8 we discussed the Chow test to examine the structural stability of a regression model. The example we discussed there related to the relationship between savings and income in the United States over the period 1970-1995. We divided the sample period into two, 1970-1981 and 1982-1995, and showed on the basis of the Chow test that there was a difference in the regression of savings on income between the two periods.

However, we could not tell whether the difference in the two regressions was because of differences in the intercept terms or the slope coefficients or both. Very often this knowledge itself is very useful.

Referring to Eqs. (8.8.1) and (8.8.2), we see that there are four possibilities, which we illustrate in Figure 9.3.

1. Both the intercept and the slope coefficients are the same in the two regressions. This, the case of coincident regressions, is shown in Figure 9.3a.
2. Only the intercepts in the two regressions are different but the slopes are the same. This is the case of parallel regressions, which is shown in Figure 9.3b.
3. The intercepts in the two regressions are the same, but the slopes are different. This is the situation of concurrent regressions (Figure 9.3c).
4. Both the intercepts and slopes in the two regressions are different. This is the case of dissimilar regressions, which is shown in Figure 9.3d.

⁹ The material in this section draws on the author's articles, "Use of Dummy Variables in Testing for Equality between Sets of Coefficients in Two Linear Regressions: A Note," and "Use of Dummy Variables ... A Generalization," both published in the American Statistician, vol. 24, nos. 1 and 5, 1970, pp. 50-52 and 18-21.

FIGURE 9.3 Plausible savings-income regressions (savings plotted against income): (a) coincident regressions; (b) parallel regressions; (c) concurrent regressions; (d) dissimilar regressions.

The multistep Chow test procedure discussed in Section 8.8, as noted earlier, tells us only if two (or more) regressions are different without telling us what is the source of the difference. The source of difference, if any, can be pinned down by pooling all the observations (26 in all) and running just one multiple regression as shown below:¹⁰

$Y_t = \alpha_1 + \alpha_2 D_t + \beta_1 X_t + \beta_2 (D_t X_t) + u_t$   (9.5.1)

where Y = savings
X = income
t = time
D = 1 for observations in 1982-1995
  = 0, otherwise (i.e., for observations in 1970-1981)

¹⁰ As in the Chow test, the pooling technique assumes homoscedasticity, that is, $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

TABLE 9.2 SAVINGS AND INCOME DATA, UNITED STATES, 1970-1995
Note: Dum = 1 for observations beginning in 1982; 0 otherwise.

Table 9.2 shows the structure of the data matrix.
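The structure of the data matrix for model (9.5.1) can be sketched in a few lines. This is only an illustration under stated assumptions: the income figures below are hypothetical and are not the Table 9.2 data, and the helper name `design_row` is made up for this sketch. Each row holds the intercept column, the dummy $D$, income $X$, and the interaction $D \cdot X$.

```python
# Hypothetical (year, income) pairs; illustrative only, NOT the Table 9.2 data.
observations = [(1980, 1950.0), (1981, 2120.0), (1982, 2250.0), (1983, 2430.0)]


def design_row(year, income, break_year=1982):
    """Return [1, D, X, D*X] for one observation of model (9.5.1)."""
    # D = 1 for observations from 1982 onward, 0 for the earlier period.
    d = 1 if year >= break_year else 0
    return [1, d, income, d * income]


rows = [design_row(year, income) for year, income in observations]

for (year, _), row in zip(observations, rows):
    print(year, row)
```

For the earlier period the last column is zero throughout, so only $\alpha_1$ and $\beta_1$ matter there; for the later period the dummy and interaction columns switch on and bring in $\alpha_2$ and $\beta_2$.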
To see the implications of (9.5.1), and, assuming, as usual, that $E(u_t) = 0$, we obtain:

Mean savings function for 1970-1981:
$E(Y_t \mid D_t = 0, X_t) = \alpha_1 + \beta_1 X_t$   (9.5.2)

Mean savings function for 1982-1995:
$E(Y_t \mid D_t = 1, X_t) = (\alpha_1 + \alpha_2) + (\beta_1 + \beta_2) X_t$   (9.5.3)

In (9.5.1), $\alpha_2$ is the differential intercept, as previously, and $\beta_2$ is the differential slope coefficient (also called the slope drifter), indicating by how much the slope coefficient of the second period's savings function (the category that receives the dummy value of 1) differs from that of the first period. Notice how the introduction of the dummy variable D in the interactive, or multiplicative, form (D multiplied by X) enables us to differentiate between slope coefficients of the two periods, just as the introduction of the dummy variable in the additive form enabled us to distinguish between the intercepts of the two periods.

EXAMPLE 9.4 STRUCTURAL DIFFERENCES IN THE U.S. SAVINGS-INCOME REGRESSION, THE DUMMY VARIABLE APPROACH

Before we proceed further, let us first present the regression results of model (9.5.1) applied to the U.S. savings-income data.

$\hat{Y}_t = 1.0161 + 152.4786 D_t + 0.0803 X_t - 0.0655 (D_t X_t)$   (9.5.4)
se = (20.1648) (33.0824) (0.0144) (0.0159)
t = (0.0504)** (4.6090)* (5.5413)* (-4.0963)*
$R^2 = 0.8819$

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.

As these regression results show, both the differential intercept and slope coefficients are statistically significant, strongly suggesting that the savings-income regressions for the two time periods are different, as in Figure 9.3d.

From (9.5.4), we can derive equations (9.5.2) and (9.5.3), which are:

Savings-income regression, 1970-1981:
$\hat{Y}_t = 1.0161 + 0.0803 X_t$   (9.5.5)

Savings-income regression, 1982-1995:
$\hat{Y}_t = (1.0161 + 152.4786) + (0.0803 - 0.0655) X_t = 153.4947 + 0.0148 X_t$   (9.5.6)

These are precisely the results we obtained in (8.8.1a) and (8.8.2a), which should not be surprising. These regressions are already shown in Figure 8.3.
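The step from the pooled regression (9.5.4) to the two period-specific lines (9.5.5) and (9.5.6) is simple arithmetic, and a short sketch makes it explicit. The coefficient values are those reported in (9.5.4); the function name `period_line` is an assumption introduced for this illustration.

```python
# Estimated coefficients of the pooled regression (9.5.4), as reported in the text.
a1, a2 = 1.0161, 152.4786   # intercept and differential intercept
b1, b2 = 0.0803, -0.0655    # slope and differential slope


def period_line(d):
    """Intercept and slope of the savings-income line when the dummy equals d."""
    return a1 + a2 * d, b1 + b2 * d


int_70s, slope_70s = period_line(0)   # 1970-1981, D = 0
int_80s, slope_80s = period_line(1)   # 1982-1995, D = 1

print("1970-1981:", round(int_70s, 4), round(slope_70s, 4))
print("1982-1995:", round(int_80s, 4), round(slope_80s, 4))
```

Setting the dummy to 0 leaves the base coefficients untouched, and setting it to 1 adds the differential intercept and the differential slope, which is exactly how (9.5.2) and (9.5.3) follow from (9.5.1).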
The advantages of the dummy variable technique [i.e., estimating (9.5.1)] over the Chow test [i.e., estimating the three regressions (8.8.1), (8.8.2), and (8.8.3)] can now be seen readily:

1. We need to run only a single regression because the individual regressions can easily be derived from it in the manner indicated by equations (9.5.2) and (9.5.3).
2. The single regression (9.5.1) can be used to test a variety of hypotheses. Thus if the differential intercept coefficient $\alpha_2$ is statistically insignificant, we may accept the hypothesis that the two regressions have the same intercept, that is, the two regressions are concurrent (see Figure 9.3c). Similarly, if the differential slope coefficient $\beta_2$ is statistically insignificant but $\alpha_2$ is significant, we may not reject the hypothesis that the two regressions have the same slope, that is, the two regression lines are parallel (cf. Figure 9.3b). The test of the stability of the entire regression (i.e., $\alpha_2 = \beta_2 = 0$, simultaneously) can be made by the usual F test (recall the restricted least-squares F test). If this hypothesis is not rejected, the regression lines will be coincident, as shown in Figure 9.3a.
3. The Chow test does not explicitly tell us which coefficient, intercept or slope, is different, or whether (as in this example) both are different in the two periods. That is, one can obtain a significant Chow test because the slope only is different or the intercept only is different, or both are different. In other words, we cannot tell, via the Chow test, which one of the four possibilities depicted in Figure 9.3 exists in a given instance. In this respect, the dummy variable approach has a distinct advantage, for it not only tells if the two are different but also pinpoints the source(s) of the difference, whether it is due to the intercept or the slope or both.
In practice, the knowledge that two regressions differ in this or that coefficient is as important as, if not more than, the plain knowledge that they are different.
4. Finally, since pooling (i.e., including all the observations in one regression) increases the degrees of freedom, it may improve the relative precision of the estimated parameters. Of course, keep in mind that every addition of a dummy variable will consume one degree of freedom.

9.6 INTERACTION EFFECTS USING DUMMY VARIABLES

Dummy variables are a flexible tool that can handle a variety of interesting problems. To see this, consider the following model:

$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$   (9.6.1)

where Y = hourly wage in dollars
X = education (years of schooling)
$D_2$ = 1 if female, 0 otherwise
$D_3$ = 1 if nonwhite and non-Hispanic, 0 otherwise

In this model gender and race are qualitative regressors and education is a quantitative regressor.¹¹ Implicit in this model is the assumption that the differential effect of the gender dummy $D_2$ is constant across the two categories of race and the differential effect of the race dummy $D_3$ is also constant across the two sexes. That is to say, if the mean salary is higher for males than for females, this is so whether they are nonwhite/non-Hispanic or not. Likewise, if, say, nonwhite/non-Hispanics have lower mean wages, this is so whether they are females or males.

In many applications such an assumption may be untenable. A female nonwhite/non-Hispanic may earn lower wages than a male nonwhite/non-Hispanic. In other words, there may be interaction between the two qualitative variables $D_2$ and $D_3$. Therefore their effect on mean Y may not be simply additive as in (9.6.1) but multiplicative as well, as in the following model.

$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i$   (9.6.2)

where the variables are as defined for model (9.6.1).
From (9.6.2), we obtain:

$E(Y_i \mid D_{2i} = 1, D_{3i} = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i$   (9.6.3)

which is the mean hourly wage function for female nonwhite/non-Hispanic workers. Observe that

$\alpha_2$ = differential effect of being a female
$\alpha_3$ = differential effect of being a nonwhite/non-Hispanic
$\alpha_4$ = differential effect of being a female nonwhite/non-Hispanic

which shows that the mean hourly wages of female nonwhite/non-Hispanics are different (by $\alpha_4$) from the mean hourly wages of females or nonwhite/non-Hispanics. If, for instance, all the three differential dummy coefficients are negative, this would imply that female nonwhite/non-Hispanic workers earn much lower mean hourly wages than female or nonwhite/non-Hispanic workers as compared with the base category, which in the present example is male white or Hispanic.

Now the reader can see how the interaction dummy (i.e., the product of the two qualitative or dummy variables) modifies the effect of the two attributes considered individually (i.e., additively).

¹¹ If we were to define education as less than high school, high school, and more than high school, we could then use two dummies to represent the three classes.

EXAMPLE 9.5 AVERAGE HOURLY EARNINGS IN RELATION TO EDUCATION, GENDER, AND RACE

Let us first present the regression results based on model (9.6.1). Using the data that were used to estimate regression (9.3.1), we obtained the following results:

$\hat{Y}_i = -0.2610 - 2.3606 D_{2i} - 1.7327 D_{3i} + 0.8028 X_i$   (9.6.4)
t = (-0.2357)** (-5.4873)* (-2.1803)* (9.9094)*

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.

The reader can check that the differential intercept coefficients are statistically significant, that they have the expected signs (why?), and that education has a strong positive effect on hourly wage, an unsurprising finding.

As (9.6.4) shows, ceteris paribus, the average hourly earnings of females are lower by about $2.36, and the average hourly earnings of nonwhite non-Hispanic workers are also lower by about $1.73.

We now consider the results of model (9.6.2), which includes the interaction dummy.

$\hat{Y}_i = -0.2610 - 2.3606 D_{2i} - 1.7327 D_{3i} + 2.1289 D_{2i} D_{3i} + 0.8028 X_i$   (9.6.5)
t = (-0.2357)** (-5.4873)* (-2.1803)* (1.7420)** (9.9095)*
$R^2 = 0.2032$, n = 528

where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.

As you can see, the two additive dummies are still statistically significant, but the interactive dummy is not at the conventional 5 percent level; the actual p value of the interaction dummy is about the 8 percent level. If you think this is a low enough probability, then the results of (9.6.5) can be interpreted as follows: Holding the level of education constant, if you add the three dummy coefficients you will obtain -1.964 (= -2.3605 - 1.7327 + 2.1289), which means that mean hourly wages of nonwhite/non-Hispanic female workers are lower by about $1.96, which is between the value of -2.3605 (gender difference alone) and -1.7327 (race difference alone).

The preceding example clearly reveals the role of interaction dummies when two or more qualitative regressors are included in the model. It is important to note that in the model (9.6.5) we are assuming that the rate of increase of hourly earnings with respect to education (of about 80 cents per additional year of schooling) remains constant across gender and race. But this need not be the case. If you want to test for this, you will have to introduce differential slope coefficients (see exercise 9.25).

9.7 THE USE OF DUMMY VARIABLES IN SEASONAL ANALYSIS

Many economic time series based on monthly or quarterly data exhibit seasonal patterns (regular oscillatory movements).
Examples are sales of department stores at Christmas and other major holiday times, demand for money (or cash balances) by households at holiday times, demand for ice cream and soft drinks during summer, prices of crops right after harvesting season, demand for air travel, etc. Often it is desirable to remove the seasonal factor, or component, from a time series so that one can concentrate on the other components, such as the trend.¹² The process of removing the seasonal component from a time series is known as deseasonalization or seasonal adjustment, and the time series thus obtained is called the deseasonalized, or seasonally adjusted, time series. Important economic time series, such as the unemployment rate, the consumer price index (CPI), the producer's price index (PPI), and the index of industrial production, are usually published in seasonally adjusted form.

There are several methods of deseasonalizing a time series, but we will consider only one of these methods, namely, the method of dummy variables. To illustrate how the dummy variables can be used to deseasonalize economic time series, consider the data given in Table 9.3. This table gives quarterly data for the years 1978-1985 on the sale of four major appliances, dishwashers, garbage disposers, refrigerators, and washing machines, all data in thousands of units. The table also gives data on durable goods expenditure in 1982 billions of dollars.

To illustrate the dummy technique, we will consider only the sales of refrigerators over the sample period. But first let us look at the data, which are shown in Figure 9.4. This figure suggests that perhaps there is a seasonal pattern in the data associated with the various quarters. To see if this is the case, consider the following model:

$Y_t = \alpha_1 D_{1t} + \alpha_2 D_{2t} + \alpha_3 D_{3t} + \alpha_4 D_{4t} + u_t$   (9.7.1)

where $Y_t$ = sales of refrigerators (in thousands) and the D's are the dummies, taking a value of 1 in the relevant quarter and 0 otherwise. Note that
Note that TA time series may contain four components: 2 seasonal, a cyclical, trend, and one that is strictly random. For the various methods of 5 Elements of Forecasting, 2 ed., Soul seasonal adjustment, see, for instance, Franets X. Diebod. ‘eQvestern Publishers, 2001, Chap. 5 &T ONE TABLE 8.3 FIGURE 9.4 SINGLE-EQUATION REGRESSION MODELS QUARTERLY DATA ON APPLIANCE SALES (IN THOUSANDS) AND EXPENDITURE ON DURABLE GOODS (1978-1 TO 1985:1V) DISH DISPFRIG. WASH DUR DISH DISPFRIG wasn eee eee s 798 13171271 252.6 480 708 943 tg 837 1615 1295724 530 582 1175 tog 821 1662 1313—270.9 587 G59. 1269 igay 858 1205 «1150 273.9 602 897973 ig 837 12711289 268.9 658 86711021437 898 15851245 _262.9 749 860 1344 tg 632 1639 ©1270 270.9 827 91816411239 gig 1238 «« 1103 263.4 a58 1017122599 868 ©1277 -««1273-—«S «2606 aos = 108314291396 623 1258 «1031 231.9 840 955 16991228 66214171143 242.7 893 973 1749 t297 22 1185 1101 248.6 950 109611171199 a7i 11961181 258.7 838 1086. 12421299 7oi 14101116 248.4 884 990 1684 1349 759 «14171190 255.5 905 102817641323, 7349191125240. 909 © 100313281274 ‘Note: DISH = dishwashers: DISP = garbage dlsposers; FRIG = refrigerators: WASH = washing naan. DUR = durable goods expenditure, bitions of 1992 dollars. ‘Source: Business Statistics and Survey of Current Business, Deparimont of Commerce (vanous sues) 1800 ‘Thousands of units 8 1000 800 4 7 79 +80 81 82 83 Bd BS Ro Year Sales of refrigerators 1978-1985 (quarterly). to avoid the dunmny variable trap, we are assigning a dummy fo each qual of the year, but omitting the intercept term. Tf there is any seasonal ellectt) given quarter, that will be indicated by a statistically significant! value ott dummy coefficient for that quarter.'* "Note a technical point. This method of assigning a dummy to each quater the sonal factor, if present, is deterministic and not stochastic. 
We will revisit this topic when we discuss time series econometrics in Part V of this book.

TABLE 9.4 REFRIGERATOR SALES REGRESSION: ACTUAL, FITTED, AND RESIDUAL VALUES (EQ. 9.7.3)

Quarter     Actual   Fitted     Residuals
1978-I      1317     1222.12      94.875
1978-II     1615     1467.50     147.500
1978-III    1662     1569.75      92.250
1978-IV     1295     1160.00     135.000
1979-I      1271     1222.12      48.875
1979-II     1555     1467.50      87.500
1979-III    1639     1569.75      69.250
1979-IV     1238     1160.00      78.000
1980-I      1277     1222.12      54.875
1980-II     1258     1467.50    -209.500
1980-III    1417     1569.75    -152.750
1980-IV     1185     1160.00      25.000
1981-I      1196     1222.12     -26.125
1981-II     1410     1467.50     -57.500
1981-III    1417     1569.75    -152.750
1981-IV      919     1160.00    -241.000
1982-I       943     1222.12    -279.125
1982-II     1175     1467.50    -292.500
1982-III    1269     1569.75    -300.750
1982-IV      973     1160.00    -187.000
1983-I      1102     1222.12    -120.125
1983-II     1344     1467.50    -123.500
1983-III    1641     1569.75      71.250
1983-IV     1225     1160.00      65.000
1984-I      1429     1222.12     206.875
1984-II     1699     1467.50     231.500
1984-III    1749     1569.75     179.250
1984-IV     1117     1160.00     -43.000
1985-I      1242     1222.12      19.875
1985-II     1684     1467.50     216.500
1985-III    1764     1569.75     194.250
1985-IV     1328     1160.00     168.000

Again, keep in mind that we are treating the first quarter as our base. As in (9.7.3), we see that the differential intercept coefficients for the second and third quarters are statistically different from that of the first quarter, but the intercepts of the fourth quarter and the first quarter are statistically about the same. The coefficient of X (durable goods expenditure) of about 2.77 tells us that, allowing for seasonal effects, if expenditure on durable goods goes up by a dollar, on average, sales of refrigerators go up by about 2.77 units, that is, approximately 3 units; bear in mind that refrigerators are in thousands of units and X is in (1982) billions of dollars.
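The fitted column of Table 9.4 takes only four distinct values, one per quarter. A short sketch shows why: when sales are regressed on quarterly dummies alone, whether without an intercept as in (9.7.1) or with an intercept and three dummies, each observation is fitted by the sample mean of its own quarter. The sketch below assumes the refrigerator series is recovered from Table 9.4 as fitted plus residual; the variable names are illustrative.

```python
# Refrigerator sales (thousands of units), 1978-I to 1985-IV, taken as
# fitted + residual from Table 9.4 (an assumption of this sketch).
frig = [
    1317, 1615, 1662, 1295,   # 1978
    1271, 1555, 1639, 1238,   # 1979
    1277, 1258, 1417, 1185,   # 1980
    1196, 1410, 1417,  919,   # 1981
     943, 1175, 1269,  973,   # 1982
    1102, 1344, 1641, 1225,   # 1983
    1429, 1699, 1749, 1117,   # 1984
    1242, 1684, 1764, 1328,   # 1985
]

# With one dummy per quarter and no intercept, the least-squares coefficient
# on each dummy is simply the mean of sales in that quarter.
quarter_means = [sum(frig[q::4]) / len(frig[q::4]) for q in range(4)]

# Fitted value = mean of the observation's quarter; residual = actual - fitted.
fitted = [quarter_means[i % 4] for i in range(len(frig))]
residuals = [actual - fit for actual, fit in zip(frig, fitted)]

print(quarter_means)
print(residuals[:4])
```

The four quarterly means computed this way match the fitted values shown in Table 9.4, and the residuals sum to zero within each quarter, which is what one would expect from a regression on mutually exclusive dummies.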
