
Exploring Mexican Firm-Level Data

Leonardo Iacovone

April 12, 2008

Abstract

This paper discusses the firm-level data collected by INEGI and their content.
It also documents how the different surveys can be merged and their main
limitations.1

Keywords: Mexico, Plant-level data, INEGI, microdata

1 Introduction

The past ten years have witnessed an explosion in the number of studies using
micro-level data. This has been made possible by an unprecedented increase in the
availability of firm and household level longitudinal datasets. Mexico has a long
tradition of collecting firm-level data and this paper describes some of the principal
firm-level surveys produced by INEGI.2 For each survey we will discuss its content,
sampling methodology, and the principal problems encountered when linking

University of Sussex and World Bank. Arts Building E, Falmer, Brighton BN1 9SN, United
Kingdom. Email: [email protected] and [email protected].
1
The authors are grateful to Gerardo Leyva and Abigail Duran for granting access to INEGI
data at the offices of INEGI in Aguascalientes under the commitment of complying with the
confidentiality requirements set by the laws of Mexico. We would like to thank all the INEGI
employees who helped during the work at Aguascalientes, and express our special gratitude to
Alejandro Cano and Gabriel Romero whose patience and camaraderie helped and supported us
during this work. We also thank Araceli Martinez, Armando Arellanes, Ramon Sanchez, Otoniel
Soto, Candido Aguilar, Adriana Ramirez for their valuable comments and help. Leonardo Iacovone
gratefully acknowledges the ESRC and LENTISCO financial support, and Alan Winters, Gustavo
Crespi and Sherman Robinson for their guidance and support.
2
Mexican authorities are committed to developing sound and complete statistical information
that is useful for analysing social and economic dynamics, as well as for evaluating the impact of
specific policies. INEGI, Instituto Nacional de Estadistica, Geografia e Informatica, is the institution
responsible for fulfilling this task. INEGI's work provides an impressive amount of high quality data.

them. We will also carefully explain the procedures used for cleaning and deflating
the data. The main objective of this paper is to serve as a background to the rest
of the thesis, as the data described here are used in all the subsequent empirical
analysis. Furthermore, we also hope this paper will serve to enhance knowledge
about the Mexican sources of information available for further research and raise
awareness about the remarkable work done by INEGI.

During several months spent at INEGI in Aguascalientes, I had the chance to work
directly with various INEGI analysts, interview a number of INEGI's experts, consult
various internal methodologies for collecting, processing and revising the data, as well
as analyse the data directly. Some of the material reviewed is published by INEGI and
available online.3 However, a large part of it is not processed for public use and is only
accessible at INEGI premises in Aguascalientes (INEGI 2002a, INEGI 2002b). INEGI does a
remarkable amount of work, with a large number of analysts involved in monitoring
the quality of the information and processing it. Working directly with the data,
supported by various INEGI analysts, ensured that I could adequately use this
information and understand the limitations of the data. This document is a distilled
result of this work and of the work of INEGI's analysts, on which we strongly relied.

The paper is divided into two sections, each one covering a different survey.

2 EIA

The Annual Industrial Survey (EIA4 henceforth) is the main, and oldest, survey
covering the manufacturing sector5 and was originally started in 1963, when it only
included 622 plants spread over 29 classes of activity.6 The original coverage was
increased in 1976, 1987 and 1994 when the classes of activities covered were respec-
tively expanded to 57, 129 and 205, and the number of plants surveyed enlarged to
1338, 3218 and 6867 respectively (see Table 1). Normally the expansion of the cov-
erage coincided with a new industrial census, at which point an updated and more
3
See INEGI website (http://www.inegi.gob.mx).
4
Encuesta Industrial Anual (or EIA).
5
It is important to notice that the maquiladoras are excluded from the EIA and their infor-
mation is collected by a specific, and separate, survey (i.e. Encuesta de la Industria Maquiladora
de Exportacion).
6
A class of activity is the most disaggregated level of industrial classification and is defined at
6 digits.

complete picture of the manufacturing sector was developed, which allowed new
activities and firms to emerge and be included, and which also allowed those that
were no longer important to be excluded. As the development of the Mexican
economy has naturally led to a diversification of manufacturing activities in
Mexico, the coverage of EIA has also expanded over time. For this
reason, after the latest industrial census carried out in 2003, the latest EIA saw
a further expansion of the classes of activity covered to 231.7 It is important to
explain this point a little further. There are two distinct reasons that drive the
expansion in the number of clases covered. The first driver is linked to the updating
of the industrial classification system, with new classes of activity introduced and
previously existing groups split into more detailed classes of activity. The second
driver, already mentioned above, is the diversification of the economy and
the emergence of new manufacturing activities previously not important enough to
justify their inclusion in the sampling.

Table 1: EIA's historical evolution (Source: INEGI)

Year   N. of manufacturing activities covered   N. of firms covered   Based on Industrial Census
1962   29                                       622                   1961
1975   57                                       1,338                 1976
1986   129                                      3,218                 1986
1993   205                                      6,867                 1993

The unit of observation is the plant, described as the manufacturing establishment
where the production takes place, and each plant is classified in its respective class of
activity based on its principal product (at the 6-digit level, based on the CMAP8).

2.1 The old EIA

The old EIA covers the period 1984-1994. The number of classes of activity9
encompassed is 129 and the number of plants is 3,300. This is a balanced panel, exiting
firms having been excluded from the sample, and the questionnaire applied was
kept constant during the entire period.

7
When my fieldwork was carried out and I visited INEGI in Aguascalientes during the second
semester of 2005 the latest available year for which the EIA information had been processed was
2002. The new new EIA spanning over 231 classes started in 2003 and is therefore not analysed
here.
8
Mexican System of Classification for Productive Activities.
9
The system of classification is the CMAE75 (Clasificacion Mexicana de Actividades Economi-
cas).

The variables captured by the old EIA, similar to many other industrial surveys,
include the various inputs used by manufacturing plants (labour split into white
and blue collars, raw materials, intermediate inputs, energy consumption, industrial
services and maquila services, non-industrial services like technology transfers) and
the principal output indicators (value of production, value of sales, inventory,
revenues derived from industrial services like maquila and non-industrial services like
technology transfers). The old EIA also captures variables related to the existing initial
capital stock at book value (divided into machinery, buildings, land, and transport
equipment), depreciation of the capital stock during the year (also divided by type
of fixed asset), and investments in new and used assets.
The principal difference between the old and new EIA is the absence of trade
related variables. In particular, in the old EIA there is no information on the
quantity of imported intermediate inputs or fixed assets. However, for certain years,
the World Bank financed a special effort to collect exports information for the firms
covered by the EIA. For this reason, we have information on the value of exports
for the period 1986-1990.

The sampling method is deterministic and aims at capturing the most represen-
tative classes of activities and the larger establishments, while the 1986 Industrial
Census was used as sampling frame.10

When we tried to link the old EIA with the new EIA, only a subset of the plants
appeared to be linkable. Out of the 3,300 firms present in the old EIA and about
6,800 plants from the new EIA, only 2,300 can be followed and linked across the
two surveys.11 We perform our analysis using only the new EIA, for several
reasons. First, our analysis focuses mostly on trade related variables. Second, if
we were to use the linked dataset, we would be starting our sample with one third
of the plants captured by the new EIA. Third, while disaggregated (six-digit) tariff
data are available for the period 1993-2002, for the previous period tariff
information is more limited and we could only obtain tariffs at 2 or 3 digits of
disaggregation. Finally, the old EIA contains no detailed product-level
information, which is crucial in our analysis.

10
See section 2.2.1 for more details on the sampling methods.
11
There could be various reasons for this mismatch: (a) differences in the coverage of the two
surveys, (b) problems with the use of the plant identifier.

2.2 The new EIA

The new EIA started in 1993 (coinciding with a new economic census)
and represented a very important improvement over the old EIA in both
quantitative and qualitative terms.

The system of classification used for the new EIA is the CMAP9412 (Clasificacion
Mexicana de Actividades y Productos). The first digit indicates the sector (the EIA
only includes firms which fall in sector 3, corresponding to the manufacturing
sector). The first two digits indicate the division (sub-sector), e.g. 31
indicates food products, beverages and tobacco. The first four digits identify
the rama (branch) of activity. Finally, the CMAP at six digits indicates
the clase of activity (e.g. 311203 indicates preparation of condensed,
evaporated and powdered milk). The classification level that is used for the EIA
is the CMAP at 6 digits, which allows us to identify the respective activities at a
high degree of disaggregation.
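
The nesting of the CMAP levels can be illustrated with a short sketch. The function and field names below are hypothetical and are not part of any INEGI tool; the only thing taken from the text is the digit structure (1 digit for the sector, 2 for the division, 4 for the rama, 6 for the clase).

```python
from typing import NamedTuple


class CmapCode(NamedTuple):
    """Nested levels of a 6-digit CMAP class code (illustrative names)."""
    sector: str    # first digit, e.g. "3" for manufacturing
    division: str  # first two digits, e.g. "31"
    rama: str      # first four digits, e.g. "3112"
    clase: str     # all six digits, e.g. "311203"


def parse_cmap(code: str) -> CmapCode:
    """Split a 6-digit CMAP code into sector, division, rama and clase."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit code, got {code!r}")
    return CmapCode(code[:1], code[:2], code[:4], code)


# Example from the text: 311203 is the condensed/evaporated/powdered milk class.
parsed = parse_cmap("311203")
```

Keeping the code as a string rather than an integer makes the prefix structure explicit and avoids losing leading digits when codes are read from text files.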

2.2.1 Coverage and sampling structure

As previously mentioned, the new EIA spans the period 1993-2002 and is very
similar to the old EIA in terms of the sampling methodology. Here we describe it in
detail.

First, INEGI selected the manufacturing activities13 (clases) to be included in the
following way:

1. Based on the industrial census of 1993, the various classes of activity are ranked
in decreasing order based on their total value of production measured at the
factory gate price.

2. The most important activities, which jointly represent 85 percent of the total
manufacturing output, are selected.

3. Finally, some other classes of special interest for defining the national accounts
are added, even if their contribution in terms of industrial output does not
justify their inclusion.
12
This system of classification can be harmonised with international systems such as the SITC,
regional systems such as SCIAN, and other Mexican systems adopted in the past such as the SCNM.
For this purpose INEGI has developed appropriate tables of concordance.
13
Each manufacturing activity is captured by a specific six digits code.

Second, INEGI proceeds to the selection of the plants within each one of the already
chosen activity classes:

1. Plants are ranked in decreasing order based on their total production value,
measured at the factory gate price.

2. Plants are added to the sample until the set of the selected plants covers
approximately 85 percent of the respective class's output value.

3. All plants with 100 or more employees are included automatically, even if
the 85 percent threshold has already been reached.14

4. For the highly disaggregated classes,15 whenever the normal sampling procedure
implies that more than 120 plants need to be surveyed to reach the 85
percent threshold, the number of plants surveyed is kept to a maximum of 120.
In fact, for highly disaggregated sectors the actual coverage is about 60
percent of the total manufacturing output of the respective class.

5. For the highly concentrated classes,16 where the 85 percent threshold is reached
by covering fewer than 15 plants, all the plants are included.
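
The within-class selection steps above can be sketched as follows. This is only an illustration under stated assumptions, not INEGI's actual procedure: the `Plant` fields and function name are hypothetical, and the text does not say in which order the special cases are applied or how the 120-plant cap interacts with the automatic inclusion of plants with 100 or more employees. Here the concentrated-class rule is checked first, then the cap, then the size rule.

```python
from dataclasses import dataclass


@dataclass
class Plant:
    plant_id: str
    output: float   # production value at factory gate prices
    employees: int


def select_plants(plants, threshold=0.85, max_plants=120,
                  min_plants=15, auto_size=100):
    """Deterministic plant selection within one class of activity (a sketch)."""
    ranked = sorted(plants, key=lambda p: p.output, reverse=True)
    total = sum(p.output for p in ranked)

    # Steps 1-2: add the largest plants until ~85% of class output is covered.
    core, covered = [], 0.0
    for p in ranked:
        if total > 0 and covered / total >= threshold:
            break
        core.append(p)
        covered += p.output

    # Step 5 (assumed to take precedence): concentrated class, survey all plants.
    if len(core) < min_plants:
        return ranked

    # Step 4 (assumed to override step 3): disaggregated class, cap at 120 plants.
    if len(core) > max_plants:
        return ranked[:max_plants]

    # Step 3: plants with 100+ employees enter even beyond the threshold.
    chosen = {p.plant_id for p in core}
    extra = [p for p in ranked
             if p.employees >= auto_size and p.plant_id not in chosen]
    return core + extra
```

Because the rule is deterministic and ranks plants by output, the resulting sample is not random, which is why the survey is skewed towards larger plants.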

As already mentioned, the new EIA covers 205 of the 309 6-digit classes of the
CMAP-1994. It covers all 50 ramas and all 9 divisions (subsectors) included in
the CMAP 1994.
The number of firms covered in 1993 is 6,861 and it decreases over time because of
attrition. Furthermore, it is important to note that entering firms are captured in a
non-systematic way (see section 2.4.3 for more details on entry).

As a consequence of this sampling method, the EIA is clearly skewed towards larger
firms. In fact, while the 1993 census covered 106,748 plants, the number of plants
covered by the EIA is equal to 6.5 percent of the total number of plants covered
in the Census. Nevertheless, this represents about 85 percent of the total Mexican
industrial output.
As reported in Table 2, the average plant in the EIA has 188 employees, and about
a quarter of the plants surveyed are large and have on average 423 employees.

14
This means that the EIA is in reality a census for plants with more than 100 employees.
15
These are classes of activities characterised by plants with small size and a high number of
manufacturing establishments (e.g. textile, footwear, etc.).
16
These are classes of activities characterised by a reduced number of large plants or, in other
words, sectors where the industrial concentration is higher (e.g. chemical, machinery, etc.).

Table 2: Average Size and Stratification

Stratum   Mean No. Employees   Median No. Employees   No. Plants
Small     50                   48                     3354
Medium    160                  152                    1908
Large     423                  541                    1463
All       188                  101                    6725

Notes
- Based on EIA 1993
- Based on INEGI stratification small plants have fewer than 100 and
more than 15 employees, medium plants have more than 100 and fewer
than 250 employees, and large plants have more than 250 employees.
- We are excluding extreme observations and missing values.

Once a firm is included in the sample, it is classified on the basis of its principal
product. Each establishment then fills in one questionnaire. However, another
possibility is that, in a few cases, the owner of a plant is also the owner of other
plants producing the same product. In this case, the owner can request to aggregate the
information of the different plants into a single questionnaire, as if they were one
large plant.17 Since 1997, INEGI has started to identify those plants that
concentrate information of multiple establishments. It is possible that there
are concentradoras in the years before 1997 that remain in the sample but that
we are unable to identify. The same applies to concentradas plants:
some of these could also be present before 1997. However, these plants are in
principle easier to identify because they would appear with all zero values but still be
maintained in the sample. We identify both the concentradoras and
concentradas plants and are able to exclude them in our robustness checks.

2.3 Content

The EIA, similarly to other industrial surveys, contains the following variables re-
lated to labour force, inputs and costs, investment, output and revenues. In de-
tail:

- Labour force related variables
    - Total number of workers
    - Total wages and, separately, total social contributions paid
    - Total hours worked

- Costs related variables
    - Costs of intermediate goods and materials, split between domestic and imported ones
    - Costs of packaging
    - Costs of fuel and lubricants
    - Costs for industrial services including maintenance and repair, as well as maquila services18
    - Costs for non-industrial services such as commissions paid to retailers and merchants, transport and distribution costs
    - Expenditures for technology transfers
    - Marketing and advertisement costs
    - R&D expenditures
    - Energy expenditures and quantity of energy consumed

- Revenues related variables
    - Domestic and export sales
    - Total production value evaluated at average gate prices
    - Revenues for maquila services
    - Revenues for services of maintenance and repair
    - Revenues for technology transfers

- Inventories of intermediate goods, raw materials, finished and semi-finished products

- Book value of fixed assets, and investments split into different categories
    - Machinery acquired domestically and imported, split between new and second-hand
    - Buildings
    - Transport equipment
    - Others (i.e. office equipment)
    - Equipment for reducing and controlling pollution
    - Land

17
These plants are known as concentradoras while the sub-units that are aggregated to this
plant are known as concentradas.
18
Maquila services are a special type of sub-contracting service in which the sub-contracted firm
receives all the inputs and materials to be processed.

2.4 Additional Information

2.4.1 Ownership

It is important to note that while all the variables are obtained annually, the
information on foreign ownership was collected only in 1994,19 with the Industrial
Census, and subsequently dropped from the questionnaire. For 1993, we therefore
have the ownership share and the nationality of each plant's owners.

Table 3: Firms' Ownership

Ownership              Number   Percentage
Domestic               5,979    87.3
Foreign Participated   317      4.6
Foreign Owned          553      8.1

Notes:
Foreign participation is defined for plants with a foreign share smaller than or
equal to 50 percent.
Foreign ownership is defined for plants with a foreign share larger than 50 percent.

Unfortunately, because of its questionnaire design the EIA does not allow us to
identify plants that are part of a multi-plant complex, because there is no question
concerning ownership.20
19
The information is collected in 1994 but refers to 1993.
20
However a special project carried out in 2005 by the department administering the Monthly
Industrial Survey (EIM henceforth) identified those plants that were part of multi-plant firms (see
section 3.3.1 for more details).

2.4.2 Capital stock

Obtaining a correct measure of the capital stock is especially important, both when
we focus our attention on labour productivity and need to control for the capital
intensity of a plant, and when estimating total factor productivity. In order to obtain
a correct measure of the capital stock we proceeded in the following way.21 First,
from the 199422 Industrial Census we obtained the value of the capital stock at its
replacement value.23 However, the matching between the EIA and the Census is
imperfect and we were unable to match about 14 percent of the plants in the EIA.
For these plants we used the book value of their capital stock.

An alternative method explored to evaluate the initial capital stock is described
in appendix A. When comparing the capital stock estimated using this method
with the book value capital stock, we observe a correlation equal to 0.92. The
distributions of the two capital stock measures are presented in Figure 1, where K2 is
the estimated capital stock while K3 is the capital stock at book value.

Because of the similarity between the two series of capital stock and because previ-
ous papers opted for using the capital stock calculated at book values we used the
book value capital stock (Verhoogen 2008, Lopez-Cordova 2003).

A question worth asking is if the subset of plants for which the initial capital stock
at its replacement value is missing is a random subset. We can try to answer this
question by comparing the capital stock, evaluated at its book value, for the two
groups of plants: the group for which we do not have the replacement value (dashed
line) and the group for which we do have its replacement value (continuous line) as
in Figure 2. The plants for which the capital stock at its replacement value is
missing appear to have a smaller stock of initial capital.24

21
In my work I have tried to improve on previous studies using the Mexican data, which either
used only the plants for which the replacement value of initial capital stock is available or simply
used, for all plants, the initial capital stock calculated at its book value. A second improvement with
respect to previous studies is the use of specific capital asset deflators.
22
Also in this case the information is collected in 1994 but refers to 1993.
23
While in the EIA plants are asked to indicate the historical value, or book value, of their capital
stock, in the census the question explicitly asks for market replacement value of the capital stock.
24
Unfortunately, based on discussions and interviews with INEGI's experts, we could not find a
reason for this, as INEGI officers insisted that the missing links between the 1993 industrial census
and the EIA can be considered random.

Figure 1: Distribution of estimated capital stock and book value capital stock
[Kernel density plot of lnK2 and lnK3.]

Figure 2: Distribution of capital stock at its book value
[Density plot; horizontal axis: Log of Capital Stock in 1993. Series: Book Value for Plants with Missing Replacement Value; Book Value for Plants with Existing Replacement Value.]

Once we have obtained the initial value of capital stock for all plants, we can calculate
the value of the capital stock in the following years using the perpetual inventory
method formula

\[ k_{t+1} = k_t (1 - \delta) + I_t, \qquad t \in [1993, 2001] \tag{1} \]

The depreciation rates, $\delta$, chosen are equal to the mean of the depreciation band
offered by fiscal authorities (see Table 4).

Table 4: Depreciation Rates

Type of Fixed Assets            Fiscal Depreciation Band   Applied Depreciation Rate
Machineries and equipment       5-15%                      10%
Buildings                       3-8%                       5.5%
Transport equipment             15-25%                     20%
Office equipment and others     7-35%                      21%
Anti-Contamination equipment    5-50%                      27.5%
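
As an illustration of equation (1), the following minimal sketch applies the perpetual inventory recursion to one asset type, using the applied depreciation rates of Table 4. The dictionary keys, the function name and the example plant are hypothetical; the initial stock and investment figures are invented for illustration and are not taken from the EIA.

```python
# Applied annual depreciation rates from Table 4, by asset type.
DEPRECIATION = {
    "machinery": 0.10,
    "buildings": 0.055,
    "transport": 0.20,
    "office_and_others": 0.21,
    "anti_contamination": 0.275,
}


def capital_path(k_initial, investments, delta):
    """Apply k_{t+1} = k_t * (1 - delta) + I_t year by year.

    `investments` lists I_t for consecutive years starting with the year of
    the initial stock; the result has one more element than `investments`.
    """
    path = [k_initial]
    for inv in investments:
        path.append(path[-1] * (1 - delta) + inv)
    return path


# Hypothetical plant: machinery stock of 1,000 in 1993, investing 100 and then 50.
machinery = capital_path(1000.0, [100.0, 50.0], DEPRECIATION["machinery"])
```

In practice the recursion would be run separately for each asset type and the results summed, since each type has its own depreciation rate.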

Finally, we need to deflate these nominal values and transform them into constant
1994 peso prices. For this purpose, we applied asset-specific deflators obtained
from Banco de Mexico for each one of the five different types of fixed assets:
machinery, buildings, office equipment, transportation equipment, land.

2.4.3 Entry and Exit

INEGI tries to refresh the EIA sample by including new firms that are created.
However, the identification of these firms, new entries into the EIA panel, is not done in
a systematic way. There is one specific department within INEGI that is in charge
of updating the sample and this has traditionally been done by relying mostly on
local and national media. Also, whenever the number of firms included in a class of
activities fell below 8, this department actively looked for newly established
firms in order to satisfy the confidentiality requirements of the survey.
Unfortunately, there is no formal agreement between INEGI and other Mexican institutions
maintaining an updated administrative register of existing firms that would allow a
continuous refreshing of the sample and also give a picture of new entries.25
25
Since 2005, INEGI has started to make use of the administrative registries of the Ministry of Finance
to refresh the sampling framework and try to better capture entry. However, this new method is
applied only to the new new EIA survey starting in 2003.

A unique and interesting feature of the new EIA is the way it captures exits.
The plants are not eliminated from the sample at the precise moment when they
become inactive, but are kept as suspended for 2 years in a type of stand-by
mode.26 After two years of suspension, the plant exits the sample and the causes
of this exit are recorded in detail.27 This information on exit is kept separately and
we merged it with the main panel.

26
The rationale is that the suspension could just be temporary.
27
The causes behind an exit can be: merger, switching of class of activity, change of activity,
change of trade name, disappeared, information reported by another plant, duplicated,
administrative merger, strike, liquidation, export maquila, domestic maquila, bankruptcy, unwilling to
provide information, accident, suspension of operations.

2.5 Data Management

2.5.1 Linking multiple waves and building a panel

Each plant surveyed by the EIA was assigned in 1993 an identifier composed of its
6-digit class of activity and an additional 4-digit code (folio). Jointly, these two
codes allow us to uniquely identify each plant and follow it over time. We build the
panel using this 10-digit unique plant identifier.

Whenever a plant closes down, its identifier disappears and is not used again.
Analogously, whenever a new plant is included in the sample, it is assigned the
corresponding 6-digit class code, based on what it produces, and also assigned a
new 4-digit folio.

2.5.2 Deflating Variables

All variables reported in the EIA are in current nominal values, so it was
necessary to transform them into constant real values. In order to do this appropriately
we used different deflators and transformed all nominal values into constant 1994
pesos.28

The domestic sales were deflated using the producer price index at 6 digits
provided by Banco de Mexico.29 Similarly, net inventories and maquila revenues were
deflated using the same producer price index at 6 digits.

28
Most price deflators, except the ones for fixed assets, can be directly downloaded from Banco
de Mexico (www.banxico.org.mx).

The export sales were deflated using the export-producer index at 2 digits provided
by Banco de Mexico.

The labour costs were deflated using the consumer price index provided by Banco
de Mexico with base year 1994.30

The domestic intermediate inputs were deflated using the 4-digit intermediate
inputs price index published by Banco de Mexico.

To deflate the imported intermediate inputs we used the US intermediate inputs
price deflator for exported non-agricultural supplies and materials (excluding fuels
and building materials), adjusted for exchange rate fluctuations.31
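
The deflation step described in this section amounts to dividing each nominal value by the relevant price index, rescaled so that the base year equals 100. The sketch below shows this for a single series; the index values and sales figures are invented for illustration, and the function name is hypothetical.

```python
def deflate(nominal, index, base_index=100.0):
    """Convert a nominal value into constant prices of the base year."""
    return nominal * base_index / index


# Hypothetical producer price index for one 6-digit class, with 1994 = 100.
ppi = {1994: 100.0, 1995: 135.0, 1996: 180.0}
# Hypothetical nominal domestic sales of one plant, in current pesos.
nominal_sales = {1994: 500.0, 1995: 810.0, 1996: 900.0}

# Real sales in constant 1994 pesos.
real_sales = {year: deflate(value, ppi[year])
              for year, value in nominal_sales.items()}
```

The same function applies to every variable; only the index changes (for example a class-level producer price index for domestic sales, or the consumer price index for labour costs).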

2.5.3 Data cleaning

We already mentioned that the EIA does not include the export maquiladoras.32
Nevertheless, some maquiladoras may have been included in the sample by mistake.
To address this potential error we defined as maquiladora any plant that exports
all its production and imports all its inputs. We identify and exclude from the
panel all the firms that appear as potential maquiladoras, even just for one year.
In total, only 15 plants are identified as potential maquiladoras.
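
A minimal sketch of this screening rule follows. The record field names are hypothetical, and the sketch assumes that "exports all its production" can be approximated by zero domestic sales and "imports all its inputs" by zero domestic intermediate inputs; the text does not specify the exact variables used.

```python
def is_potential_maquiladora(rows, tol=1e-9):
    """Flag a plant if, in any year, it exports everything and imports all inputs.

    `rows` is a list of yearly records (dicts) for a single plant.
    """
    for r in rows:
        sales = r["domestic_sales"] + r["export_sales"]
        inputs = r["domestic_inputs"] + r["imported_inputs"]
        if sales <= 0 or inputs <= 0:
            continue  # the rule is undefined without sales or inputs
        exports_all = r["domestic_sales"] <= tol
        imports_all = r["domestic_inputs"] <= tol
        if exports_all and imports_all:
            return True
    return False


# Hypothetical plant that looks like a maquiladora in its second year only.
plant = [
    {"domestic_sales": 40.0, "export_sales": 60.0,
     "domestic_inputs": 10.0, "imported_inputs": 30.0},
    {"domestic_sales": 0.0, "export_sales": 120.0,
     "domestic_inputs": 0.0, "imported_inputs": 50.0},
]
flagged = is_potential_maquiladora(plant)
```

A single qualifying year is enough to flag the plant, matching the "even just for one year" criterion above.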

As previously explained, some firms are included in the sample as entrants. However,
this is not done in a systematic way. For this reason we identify these plants in order
to be able to exclude them when evaluating the robustness of our results.

Another issue to be resolved is the presence of extreme values. To resolve this

29
Banco de Mexico classifies the economic activities using the CMAE, Mexican classification of
economic activities. Therefore it was necessary to first match the 6-digit CMAE clases with the
respective CMAP clases. This was possible because INEGI has developed an appropriate conversion
table.
30
Ideally we would have preferred to use wholesale prices, because this would avoid incorporating
into the deflators issues related to imperfect competition and market power of the retail sector, but
we were unable to obtain this price index.
31
This can be downloaded from the Bureau of Labor Statistics.
32
These are firms that benefit from a special system of tax exemptions because they import most of
their intermediate inputs and export most of their output.

problem the common solution is some type of trimming. The two most common
options in the literature discussed by Angrist and Krueger (1999) are winsorizing
and truncating. Winsorizing consists of setting the observations in the tails of the
distribution, for instance those below the 5th or above the 95th percentile, equal
to the value of the observation at the 5th or 95th percentile respectively.
Truncating consists of eliminating altogether the observations in the tails. As a
general rule of thumb, Angrist and Krueger (1999) suggest that winsorizing should
be preferred when the extreme values are exaggerated versions of the true values,
but the true values still lie in the tails, whilst truncating should be used when
extreme values are pure mistakes that do not bear any resemblance to the true
values. During the period spent at INEGI,
I had the chance to discuss these issues with those responsible for the survey, as
well as with the analysts in charge of ensuring the reliability of the data. Based
on these discussions, the evaluation of the internal data revision process, and the
direct analysis of the data we came to the conclusion that the quality of information
collected by INEGI could be considered reliable. However, because we were
concerned with the possibility of extreme values being due to mistakes in collecting or
inputting the data, we identified potential extreme values as the observations in the
top and bottom 1 percent of the distribution. These observations were then flagged
and excluded during the robustness checks.
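
The difference between the two trimming options can be made concrete with a small sketch. It uses a simple nearest-rank percentile on integer percentile values, which is an assumption made for illustration; statistical packages typically offer several interpolation rules and may give slightly different cut-offs.

```python
def percentile(sorted_values, q):
    """Nearest-rank percentile of a sorted list, for an integer q in 1..100."""
    n = len(sorted_values)
    rank = max(1, (q * n + 99) // 100)  # integer ceiling of q * n / 100
    return sorted_values[rank - 1]


def winsorize(values, lower=5, upper=95):
    """Pull observations in both tails back to the cut-off values."""
    s = sorted(values)
    lo, hi = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo), hi) for v in values]


def truncate(values, lower=5, upper=95):
    """Drop observations in both tails altogether."""
    s = sorted(values)
    lo, hi = percentile(s, lower), percentile(s, upper)
    return [v for v in values if lo <= v <= hi]


data = list(range(1, 101))      # illustrative values 1, 2, ..., 100
winsorized = winsorize(data)    # same length, tails set to 5 and 95
truncated = truncate(data)      # shorter list, tails removed
```

Winsorizing keeps the sample size unchanged, whereas truncating reduces it; flagging the tail observations, as done in this paper, leaves both options open for robustness checks.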

Finally, in order to confirm that there were no mistakes in the data we ran a set of
identity checks to confirm that:

- the value of total sales is equal to the sum of domestic plus export sales;

- the value of total intermediates is equal to the sum of domestic plus imported
intermediates; and

- the value of total costs is equal to the sum of each of the individual costs.

Whenever any of these identity checks failed we discussed the problem with INEGI's
analysts and, when they could not provide a solution or an explanation, we flagged
the observation in order to exclude it during our robustness checks.33
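
The identity checks listed above can be sketched as a function that returns the names of the identities a record fails. The field names are hypothetical and the tolerance is an assumption made here to allow for rounding; the text does not say how exact the match was required to be.

```python
import math


def failed_identities(row, rel_tol=1e-6):
    """Return the names of the accounting identities that a record violates."""
    checks = {
        "sales": (row["total_sales"],
                  row["domestic_sales"] + row["export_sales"]),
        "intermediates": (row["total_intermediates"],
                          row["domestic_intermediates"]
                          + row["imported_intermediates"]),
        "costs": (row["total_costs"], sum(row["cost_items"])),
    }
    return [name for name, (reported, computed) in checks.items()
            if not math.isclose(reported, computed,
                                rel_tol=rel_tol, abs_tol=1e-9)]


# Hypothetical record in which reported total costs do not match the components.
record = {
    "total_sales": 150.0, "domestic_sales": 100.0, "export_sales": 50.0,
    "total_intermediates": 80.0,
    "domestic_intermediates": 60.0, "imported_intermediates": 20.0,
    "total_costs": 130.0, "cost_items": [80.0, 30.0, 10.0],
}
problems = failed_identities(record)
```

A record with a non-empty result would be the kind of observation that is discussed with the analysts and, failing an explanation, flagged.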
33
A remarkable characteristic of the way the EIA is administered is that each analyst is allocated
a certain number of plants (on average 150) to follow up and every year is in charge of analysing
the responses of the same plants. Whenever the responses appear out of line with the previous
years or with what would be reasonable, the analyst calls the plant and confirms the results. Indeed,
in one case, when running some checks on the data, we found that the employment of a plant
had dropped by more than 50 percent from one year to another and thought this was a mistake.
However, when we were able to meet the responsible analyst he explained to me that this was not a
mistake but was due to a strike that had paralysed the plant for more than six months.

3 EIM

The EIM34 is a monthly survey that is collected by INEGI to monitor short-term
trends. Traditionally, the survey has been run in parallel with the EIA and covers
the same plants. The principal differences from the EIA are its periodicity and the
variables collected. Also, within INEGI two different departments are responsible
for collecting, processing and analysing the two surveys.

3.1 Historical Evolution and Coverage

Similar to the EIA, the Mexican authorities started collecting monthly industrial
data in 1964. However, the number of manufacturing activities covered was initially
limited. This was the case until 1987 when the EIM was expanded to cover 129
clases and an initial sample of 3,218 firms. This was expanded even further in
1994 with an initial sample35 of 6,884 firms covering 205 clases.

The number of firms decreases over time because of attrition. For details see Table 5.

Table 5: Number of firms in the EIM 1994-2002

Year              1994   1995   1996   1997   1998   1999   2000   2001   2002
Number of firms   6,711  6,683  6,608  6,350  6,008  5,753  5,551  5,378  5,173

There are some mismatches between the EIA and the EIM in the number of plants
covered because of the timing of these two surveys (see section 3.4.1
for more details).

As for the EIA, INEGI runs a number of filters on the EIM to check the data obtained
from the respondents. Each analyst is responsible for analysing the same plants (about
150 plants) every month and, whenever the responses fall outside the expected ranges,
calls the respondent to double-check the information. In certain cases, when errors and
inconsistencies are discovered after some delay, the information provided in the
previous months is revised and updated.
34
Encuesta Industrial Mensual
35
As for the EIA, the sample base is provided by the 1993 industrial census and the sample covers
about 85 percent of the included industrial activities. For more detail about the sampling process see
section 2.2.1, as the sample structure of these two surveys is identical.

3.2 Content

The EIM contains fundamentally two groups of variables: labour-force related and
output related variables. In detail:

Labour-force related variables

Total number of workers broken down into blue collars (obreros) and
white collars (empleados)

Total wages, net of social contributions, with the same breakdown as the labour
force

Social contributions paid to workers, with the same breakdown as the labour force

Total hours worked, with the same breakdown as the labour force

Output related variables

Revenues from maquila services

Revenues from services of maintenance and assistance

Total production

Net sales

Export sales

Installed capacity usage

It is important to make two remarks with respect to the variables capturing
production, sales and exports. First, the plants are asked to report both values and
quantities, therefore an implicit average unit price can be calculated. Second, for
these variables the plant is requested to distinguish each one of its products, so
these variables are reported product by product. In 1993, INEGI defined a list of
products for each 6-digit class36 from which the plant can choose. If the
product is not in the list, it is recorded as "other non-generic products" or
"residues and sub-products". The weight of these two residual categories
is negligible for most firms (i.e. less than 2 percent on average). In table 6 we
show in the first two columns the average and the median weights of these residuals
for sold products, exported products and produced products. Even for those plants
having a relatively high share of residual products, this is not a major
problem: for plants in the 90th and 95th percentile the residuals are never above
8 percent of the total output. Only for plants above the 99th percentile does this
appear to be a serious issue, because the weight of their residual products is equal
to about one third of their output. These plants are identified and excluded in our
robustness checks.
36
This list was developed based on the census and previous surveys.

Table 6: Weight of residual products

                    Mean   Median   90th Pctile   95th Pctile   99th Pctile
Sold Products       1%     0        2%            8%            27%
Exported Products   1%     0        0             6%            33%
Produced Products   1%     0        2%            8%            27%
Source: EIM, INEGI
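To make the two remarks on product-level variables concrete, the sketch below computes an implicit unit value (value divided by quantity) and the weight of the residual categories in a plant's output. The product codes and the labels in `RESIDUAL_CODES` are hypothetical placeholders, not INEGI's actual product codes.

```python
# Hypothetical labels for the two residual categories
RESIDUAL_CODES = {"other_non_generic", "residues_subproducts"}


def unit_value(value, quantity):
    """Implicit average unit price; None when no quantity is reported."""
    if not quantity:
        return None
    return value / quantity


def residual_share(product_values):
    """Share of the residual categories in a plant's total value.

    `product_values` maps a product code to the value sold (or produced,
    or exported) by one plant in one period.
    """
    total = sum(product_values.values())
    if total == 0:
        return None
    residual = sum(v for code, v in product_values.items()
                   if code in RESIDUAL_CODES)
    return residual / total


plant_sales = {"311101_01": 60.0, "311101_02": 30.0, "other_non_generic": 10.0}
share = residual_share(plant_sales)   # 10 out of 100
price = unit_value(60.0, 12.0)        # value 60 for 12 units
```

Plants whose residual share is very high (in the text, those above the 99th percentile) would then be flagged for exclusion in the robustness checks.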

3.3 Additional Information

3.3.1 Ownership

Normally, the EIM questionnaire does not contain any information regarding
ownership. However, during 2005-2006 a special module was run and plants were asked
whether they were part of a multi-plant complex or were single-plant firms. This
information can be linked to the main EIM variables using the plant identifiers. We do
so for 2003 and the number of multi-plant and single-plant firms is detailed in Table 7:
there are 458 multi-plant firms with an average of about three plants per firm, and
3,791 single-plant firms.

Table 7: Multiple- and single-plant firms

                     No. of Firms   No. of Plants   Average No. of Plants
Multiplant Firms     458            1245            3
Single-Plant Firms   3791           3791            1
Notes
- Based on 2003 data
- Based on trimmed data

3.4 Data Management

3.4.1 Panel Creation

In the EIM, as in the yearly industrial survey (EIA), a firm can be tracked over time
using a unique plant identifier. This is built in the same way as for the EIA, and
actually coincides with it (see section 2.5.1 for details). Based on these identifiers a
panel using the individual monthly EIM can be built.

Having built the panel, we dropped the observations relating to the residual
categories (see previous sub-section). The result of this is a panel with 187,533
observations where we have multiple observations per plant, given that plants normally
produce multiple products. The number of plants and products present in the panel is
reported in the table.

Having built the panel, we annualised the information provided by the EIM in or-
der to link this panel with the EIA panel. We are able to link these two surveys
using the same plant identifier. However, the matching between the two surveys is
imperfect because of their timing. The information for the EIA is collected in the
following year during the period between April and July, while the information for
the EIM is collected during the following month.37 The resulting panel that merges
the information from the EIA and EIM will be our main dataset and we report in
table 8 the number of plants and products present in this panel.
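The annualisation and linking step can be illustrated with a short sketch. It assumes a simplified layout in which each EIM record is a tuple `(plant_id, year, month, sales)` and the EIA panel is keyed by `(plant_id, year)`; these layouts and the function names are assumptions for the example, not the structure of the INEGI files.

```python
from collections import defaultdict


def annualise(monthly_rows):
    """Sum monthly EIM values into plant-year totals."""
    totals = defaultdict(float)
    for plant_id, year, _month, sales in monthly_rows:
        totals[(plant_id, year)] += sales
    return dict(totals)


def link_panels(eim_annual, eia_annual):
    """Match plant-years on the shared plant identifier and year.

    Returns the matched records plus the plant-years found in only one
    survey, which is where timing mismatches show up.
    """
    matched_keys = eim_annual.keys() & eia_annual.keys()
    merged = {k: (eim_annual[k], eia_annual[k]) for k in matched_keys}
    eim_only = sorted(eim_annual.keys() - eia_annual.keys())
    eia_only = sorted(eia_annual.keys() - eim_annual.keys())
    return merged, eim_only, eia_only


# Plant p1 closes early in 2000: it appears in the EIM but in no EIA round
rows = [("p1", 1999, m, 10.0) for m in range(1, 13)]
rows += [("p1", 2000, 1, 10.0), ("p1", 2000, 2, 10.0)]
rows += [("p2", 1999, m, 5.0) for m in range(1, 13)]
eim = annualise(rows)
eia = {("p2", 1999): 58.0}
merged, eim_only, eia_only = link_panels(eim, eia)
```

Reporting the unmatched plant-years explicitly, rather than silently dropping them, makes it easy to verify that the imperfect matching is due to survey timing.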

3.4.2 Data Cleaning

As described for the EIA, we apply an analogous trimming to the main variables
of interest obtained from the EIM:

Domestic unit values, equal to the ratio of domestic sales revenues to domestic
quantities sold

Export unit values, equal to the ratio of export sales revenues to export
quantities sold
37
For example, a plant can be operating until March 2000 and then close, in which case it
would be captured by the EIM during all of 1999 and the first two months of 2000 but would not
appear in the data of either the EIA-1999 or the EIA-2000, as the information for these two is
captured between April and July of 2000 and 2001 respectively.

Table 8: Number of plants and products - merged EIA and EIM panel

        No. of Plants        No. of Products
Year    All      Exporting   Sold      Exported
1994    6,299    1,586       19,314    2,857
1995    6,070    1,880       19,284    3,526
1996    5,786    2,061       18,229    3,989
1997    5,572    2,161       17,325    4,186
1998    5,400    2,106       16,761    4,269
1999    5,255    1,967       16,226    3,962
2000    5,118    1,914       15,522    3,796
2001    4,952    1,780       14,924    3,555
2002    4,782    1,696       14,404    3,357

Domestic sales, export sales, total sales per product

Number of employees and wages paid to blue collars

Number of employees and wages paid to white collars

We flag these observations and exclude them in our robustness checks. In the case
of product-level unit values, because we are particularly concerned with noise and
errors at such a level of disaggregation, we also flag those cases where their yearly
increase is larger than 300 percent or their yearly decrease larger than 65 percent
(basically a boom, or a drop to about one third of the unit value).
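The additional rule for product-level unit values can be sketched as below. The thresholds come from the text; reading them as proportional year-on-year changes between consecutive years, and the function name itself, are assumptions of this example.

```python
def flag_unit_value_jumps(unit_values, max_increase=3.0, max_decrease=0.65):
    """Return the years whose unit value changed too much from the year before.

    `unit_values` maps year -> unit value for one plant-product pair.
    A year is flagged when the proportional change from the previous year
    exceeds +300 percent or is below -65 percent.
    """
    flagged = set()
    years = sorted(unit_values)
    for prev, curr in zip(years, years[1:]):
        if curr != prev + 1:
            continue  # compare consecutive years only
        old, new = unit_values[prev], unit_values[curr]
        if old <= 0:
            continue  # change is undefined without a positive base
        change = (new - old) / old
        if change > max_increase or change < -max_decrease:
            flagged.add(curr)
    return flagged


series = {1994: 10.0, 1995: 45.0, 1996: 40.0, 1997: 12.0}
suspicious_years = flag_unit_value_jumps(series)
```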

4 Conclusion

In this paper I have described the sources of information that will be used in the
subsequent empirical analysis. As emerged from this description, the Mexican
plant-level data provide an extremely rich dataset with detailed information not
only at the plant level but also at the product level.

References

Angrist, J. D., and A. B. Krueger (1999): "Empirical Strategies in Labor Economics,"
in Handbook of Labor Economics, vol. 3A, pp. 1277-1366. Elsevier
Science, North-Holland, New York and Oxford.

INEGI (2002a): Síntesis Metodológica de la Encuesta Industrial Anual.

INEGI (2002b): Síntesis Metodológica de la Encuesta Industrial Mensual.

Lopez-Cordova, E. (2003): "NAFTA and Manufacturing Productivity in Mexico,"
Economia: Journal of the Latin American and Caribbean Economic Association,
4(1), 55-88.

Verhoogen, E. A. (2008): "Trade, Quality Upgrading and Wage Inequality in
the Mexican Manufacturing Sector," Quarterly Journal of Economics, 123(2).

A Methodology to calculate the initial capital stock
using the perpetual inventory method

An alternative method used to calculate the capital stock, in the absence of the
initial capital stock, exploits the perpetual inventory method (henceforth PIM).

Based on the PIM the capital stock at time t is equal to

$$K_t = I_t + (1 - \delta) K_{t-1} \tag{2}$$

Consequently, in order to calculate the capital stock at time t we need three
variables: $K_{t-1}$, $\delta$ (depreciation rate), and $I_t$. Normally $I_t$ is reported in the survey.
$\delta$ is a parameter and it is given exogenously.
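The recursion in equation (2) is straightforward to apply once an initial stock is available. The sketch below is a minimal illustration; the function name and the numbers in the example are hypothetical and are not taken from the EIA.

```python
def capital_series(k_initial, investments, delta):
    """Apply K_t = I_t + (1 - delta) * K_{t-1} year by year.

    k_initial   : capital stock in the year before the first investment
    investments : investment flows I_1, I_2, ... in chronological order
    delta       : depreciation rate, taken as given
    """
    stocks = []
    k = k_initial
    for inv in investments:
        k = inv + (1 - delta) * k
        stocks.append(k)
    return stocks


# With investment exactly offsetting depreciation the stock stays constant
path = capital_series(k_initial=100.0, investments=[10.0, 10.0, 10.0], delta=0.10)
```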

Complications arise in obtaining $K_{t-1}$. In the case of the EIA, we acquire this
variable from the Industrial Census for most of the plants.

When there is a large number of firms for which the capital stock is never reported,
it must be imputed in some manner. One possible methodology is the following:

1. Calculate investment at sectoral level38 using the aggregate investment series
for all the available years (in our case for the period 1993-2002)

2. On the basis of this investment series we can get the initial sectoral capital
stock as

$$K_{0j} = \frac{I_{0j}}{\delta + g_j} \tag{3}$$

where j is the 4-digit industry, 0 is the initial year for which we have the
investment, and $g_j$ is the growth rate of the capital stock over the entire period
1993-2002

$$g_j = \frac{1}{t} \, \frac{I_{tj} - I_{0j}}{I_{0j}} \tag{4}$$
38
At 4 digits

3. We obtain $g_j$ from the regression

$$\ln I_{ijt} = \alpha + g_j t \tag{5}$$

where $\ln I_{ijt}$ is the log of the investment of plant i belonging to sector j.

4. Once we have estimated the initial capital stock for each sector we need to
assign the appropriate capital stock to each individual plant i in sector j. We
do so by applying an appropriate weight, which in most cases is either
the electricity or the intermediate inputs consumed. Because in the EIA we
observe more missing values for the electricity variable, we opted for the value
of intermediate inputs consumed ($m_{0i}$)

$$K_{0ij} = K_{0j} \, w_{0i}, \quad \text{where } w_{0i} = \frac{m_{0i}}{\sum_{p \in j} m_{0p}} \tag{6}$$
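The four steps above can be sketched as follows. This is an illustration under simplifying assumptions, not the author's implementation: the growth rate is estimated by ordinary least squares on the log of the sectoral aggregate investment series rather than on the plant-level observations of equation (5), and all function names and numbers are hypothetical.

```python
import math


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def initial_sector_capital(investment_by_year, delta):
    """Steps 1-3: estimate g_j from ln(I) = alpha + g*t, then K_0j = I_0j / (delta + g_j)."""
    years = sorted(investment_by_year)
    t = [y - years[0] for y in years]
    log_inv = [math.log(investment_by_year[y]) for y in years]
    g = ols_slope(t, log_inv)
    if delta + g <= 0:
        raise ValueError("delta + g must be positive for the formula to apply")
    return investment_by_year[years[0]] / (delta + g), g


def allocate_to_plants(k0_sector, intermediates):
    """Step 4: split the sectoral stock using intermediate-input shares as weights."""
    total = sum(intermediates.values())
    return {plant: k0_sector * m / total for plant, m in intermediates.items()}


# Sectoral investment growing at 5 percent a year (made-up numbers)
inv = {1993 + s: 100.0 * math.exp(0.05 * s) for s in range(3)}
k0, g = initial_sector_capital(inv, delta=0.10)
plant_k0 = allocate_to_plants(k0, {"plant_a": 30.0, "plant_b": 70.0})
```

Equation (3) follows from equation (2) when capital grows at a constant rate: if $K_t = (1 + g_j) K_{t-1}$, then $I_t = (\delta + g_j) K_{t-1}$, which is why the sketch refuses a non-positive $\delta + g_j$.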
