DE Assignment
Posted on February 20, 2024
Due on March 22, 2024 by 11:59 PM
Data Description:
The data uploaded on Google Classroom contain election results for state assembly
elections during the period 1978-2007. Each row gives information about one
candidate running in a constituency in a state in a given election year. state_name
is the name of the state, year is the election year, constituency_no is an id for the
constituency, candidate contains the name of the candidate, party gives his/her
party affiliation, and position gives the candidate's rank in the election based on
votes. Therefore, the winner in the election has position = 1, the runner-up has
position = 2, and so on. constituency_type tells us whether the constituency is
open (“GEN”) or reserved for some group.¹
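The description above fixes the columns of each row. As a minimal sketch, assuming the file is a CSV with a header row using exactly these column names (the handout does not state the file format), the rows can be read into typed records with the Python standard library. The type name CandidateRow, the function parse_rows and the file name in the comment are hypothetical; the sample rows and the party label "PTY" are made up.

```python
import csv
import io
from collections.abc import Iterable
from typing import TypedDict


class CandidateRow(TypedDict):
    # One row of the data: one candidate in one constituency-election.
    state_name: str
    year: int
    constituency_no: int
    candidate: str
    party: str
    position: int
    constituency_type: str


def parse_rows(lines: Iterable[str]) -> list[CandidateRow]:
    """Parse CSV text, assuming a header row with the column names above."""
    rows: list[CandidateRow] = []
    for rec in csv.DictReader(lines):
        rows.append(
            CandidateRow(
                state_name=rec["state_name"].strip(),
                year=int(rec["year"]),
                constituency_no=int(rec["constituency_no"]),
                candidate=rec["candidate"].strip(),
                party=rec["party"].strip(),
                position=int(rec["position"]),
                constituency_type=rec["constituency_type"].strip(),
            )
        )
    return rows


# Usage with made-up sample text. With the real file you would pass
# open("elections.csv", newline="") instead (hypothetical file name).
sample = io.StringIO(
    "state_name,year,constituency_no,candidate,party,position,constituency_type\n"
    "State A,1996,12,Cand X,IND,2,GEN\n"
    "State A,1996,12,Cand Y,PTY,1,GEN\n"
)
rows = parse_rows(sample)
winners = [r for r in rows if r["position"] == 1]  # the winner has position == 1
```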
Data Preparation:
Step 1 Create a dummy variable open that takes the value one if a constituency in an
election is open (constituency_type is “GEN”) and zero if it is reserved for any
population group (SC, ST, etc.).
Step 2 Create a dummy variable post that takes the value one if the year is 1996 or
later and zero otherwise.
Step 3 Create a dummy variable indcand that takes the value one if the candidate in an
election is an independent candidate and zero otherwise. The variable party takes
the value “IND” for independent candidates.
Step 4 For each constituency in each election, calculate the total number of
independent candidates. Store this information in a variable n_indcand. (This is
the only hard step in this exercise.)
Step 5 For each constituency in each election, calculate the total number of
candidates. Store this information in a variable n_cand.
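Steps 1-5 can be sketched in plain Python as below. This is one possible approach, not the required method. It assumes a constituency-election is identified by (state_name, year, constituency_no), since constituency_no may repeat across states, and it adds n_partycand = n_cand - n_indcand as an assumed definition for question 1.4. The key pitfall in Step 4 is that constituencies with no independents must get n_indcand = 0 rather than disappear; a Counter returns 0 for missing keys, which handles this. The function names and sample rows are hypothetical.

```python
from collections import Counter

IND = "IND"        # party label for independent candidates (from the handout)
OPEN_TYPE = "GEN"  # constituency_type value for open constituencies
POST_YEAR = 1996   # first year of the post period


def election_key(row):
    """Identify one constituency in one election.

    Assumption: constituency_no may repeat across states, so the state is
    part of the key.
    """
    return (row["state_name"], row["year"], row["constituency_no"])


def prepare(rows):
    """Return copies of the candidate rows with the Step 1-5 variables added."""
    # Steps 4 and 5: count candidates per constituency-election.
    n_cand = Counter(election_key(r) for r in rows)
    n_indcand = Counter(election_key(r) for r in rows if r["party"] == IND)
    prepared = []
    for r in rows:
        k = election_key(r)
        prepared.append({
            **r,
            "open": int(r["constituency_type"] == OPEN_TYPE),  # Step 1
            "post": int(r["year"] >= POST_YEAR),               # Step 2
            "indcand": int(r["party"] == IND),                 # Step 3
            # A Counter returns 0 for a missing key, so constituencies
            # with no independents get n_indcand = 0 instead of vanishing.
            "n_indcand": n_indcand[k],                         # Step 4
            "n_cand": n_cand[k],                               # Step 5
            # Assumed definition for 1.4: candidates with a party label.
            "n_partycand": n_cand[k] - n_indcand[k],
        })
    return prepared


# Hypothetical rows (only the columns prepare() uses).
rows = [
    {"state_name": "S1", "year": 1991, "constituency_no": 1, "party": "IND", "constituency_type": "GEN"},
    {"state_name": "S1", "year": 1991, "constituency_no": 1, "party": "P1", "constituency_type": "GEN"},
    {"state_name": "S1", "year": 1991, "constituency_no": 1, "party": "IND", "constituency_type": "GEN"},
    {"state_name": "S1", "year": 1996, "constituency_no": 2, "party": "P1", "constituency_type": "SC"},
    {"state_name": "S1", "year": 1996, "constituency_no": 2, "party": "P2", "constituency_type": "SC"},
]
prepared = prepare(rows)
```

With pandas, the same per-group counts are usually computed with a groupby on the same key columns; the same zero-count pitfall applies if you filter to independent rows before grouping.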
Data Analysis:
1.1 Plot the average number of total candidates (n_cand) and of independent
candidates (n_indcand) per constituency over the years. In the graph, the x-axis
will show the election year and the y-axis the number of candidates. There will
be two plots (lines), one for each of the two variables.
1.2 Compute the average number of independent candidates in reserved and open
constituencies in the pre period (before 1996). Compute the same averages in the
post period (1996 and later). You will have four averages to report.
1.3 What is the difference-in-difference estimate that you get using these averages?
Show the work using a table as described in the lecture slides.
1.4 Using the same approach, compute the difference-in-difference estimate for the
number of party candidates (n_partycand, i.e., candidates who are not independents:
n_cand − n_indcand).
¹ Type of constituency is a feature of the constituency; therefore, its value is
repeated for all rows (candidates) that correspond to a particular constituency.
1.5 Run the following regression and report the output:
$$
\text{n\_indcand}_{ct} = \alpha + \beta_1\,\text{open}_c + \beta_2\,\text{post}_t + \beta_3\,(\text{open}_c \times \text{post}_t) + \epsilon_{ct}
$$
where $\text{n\_indcand}_{ct}$ is the number of independent candidates in
constituency $c$ and election year $t$.
1.6 Why does $\text{open}_c$ not have the $t$ subscript? Why does $\text{post}_t$
not have the $c$ subscript?
1.7 What is the estimate of $\beta_3$? What is its interpretation? How does it line
up with your answer in [1.3]?
1.8 Run the following regression and report the output:
$$
\text{n\_partycand}_{ct} = \alpha + \beta_1\,\text{open}_c + \beta_2\,\text{post}_t + \beta_3\,(\text{open}_c \times \text{post}_t) + \epsilon_{ct}
$$
1.9 What is the estimate of $\beta_3$ now? How does it compare with your answer in
[1.4]?
1.10 Does the exercise tell you anything about the effectiveness of the policy in
removing “frivolous” candidates?
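The analysis questions above can be checked with a short standard-library sketch. It assumes obs already holds one record per constituency-election as (year, open, outcome); averaging over candidate rows instead would weight each constituency by its number of candidates. The numbers are made up for illustration and are not results from the data. yearly_means gives the series for the plot in 1.1 (plotting itself is not shown), cell_means gives the four averages of 1.2, did gives the 2×2 estimate of 1.3, and ols solves the normal equations for the regression in 1.5. Because the regression is saturated when all four cells are populated, $\beta_3$ reproduces the table estimate exactly. This sketch computes point estimates only; a regression package such as statsmodels will also report standard errors. Replacing the outcome with n_partycand covers 1.4 and 1.8. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical constituency-election records: (year, open, n_indcand).
# Made-up numbers for illustration only, NOT results from the real data.
obs = [
    (1991, 1, 6), (1991, 1, 4), (1991, 0, 3), (1991, 0, 5),
    (1998, 1, 2), (1998, 1, 3), (1998, 0, 3), (1998, 0, 2),
]


def yearly_means(obs):
    """Average outcome per election year (the series plotted in 1.1)."""
    by_year = defaultdict(list)
    for year, _open, y in obs:
        by_year[year].append(y)
    return {year: mean(ys) for year, ys in sorted(by_year.items())}


def cell_means(obs):
    """The four averages of 1.2, keyed by (open, post)."""
    cells = defaultdict(list)
    for year, open_, y in obs:
        post = int(year >= 1996)
        cells[(open_, post)].append(y)
    return {k: mean(v) for k, v in cells.items()}


def did(m):
    """(open post - open pre) - (reserved post - reserved pre)."""
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


def ols(X, y):
    """OLS coefficients: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Design matrix for 1.5: constant, open_c, post_t, open_c * post_t.
X, y = [], []
for year, open_, n in obs:
    post = int(year >= 1996)
    X.append([1, open_, post, open_ * post])
    y.append(n)
alpha, b1, b2, b3 = ols(X, y)
```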
Output:
Upload a single PDF file containing the code, the output of the regressions, and
answers to the questions asked. Please write your name and division in the document.