The policy evaluator’s toolbox: Presenting your next regression discontinuity analysis

When evaluating public policies, regression discontinuity designs are impossible to live without. Here is how to do one.
Categories: statistical analysis, policy evaluation
Author: Paw Hansen
Published: June 1, 2023

In the previous post of my “policy evaluator’s toolbox” series, I showed you two basic ways of plotting results from randomized experiments. A randomized experiment should be the design you are going for whenever possible since it allows for easy estimation and clean causal inference.

In practice, though, randomized experiments may not be feasible. Policymakers are often unwilling to assign policies at random, so the question becomes: what can we do in those cases if we still want to evaluate whether a policy had an effect?

Today, we’ll have a look at one widely-used approach: the regression discontinuity design. Here is the definition I got when asking ChatGPT:

“A regression discontinuity design (RDD) is a quasi-experimental research design used to estimate causal effects by taking advantage of a discontinuity in the treatment assignment process. The basic logic behind RDD is that when a certain threshold or cutoff point is used to determine whether an individual receives a treatment or not, individuals on either side of the cutoff are expected to be similar in all other relevant aspects, except for the treatment itself. By comparing the outcomes of individuals just above and just below the cutoff, researchers can attribute any differences to the treatment rather than other confounding factors.”

This is not a bad definition, but I do want to emphasize one core assumption: the assignment variable should be continuous, or "smooth," around the cutoff. If units can precisely manipulate which side of the cutoff they land on, the as-if-random comparison breaks down.

In my field of public welfare, many such cutoff variables exist.

One popular threshold is test scores, which is what we will work with today. Specifically, we'll use the Chile Schools data from (the highly recommended) Regression and Other Stories. Let's load our packages and the data:

Packages used in this post
library(tidyverse)   # data wrangling and plotting (includes readr and ggplot2)
library(broom)       # tidy() for turning model objects into data frames
library(ggsci)       # scale_color_jama() color palette
library(kableExtra)  # kbl() for formatted tables

theme_set(theme_minimal(base_size = 12))

chile <- read_csv("chile.csv")
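
Before going further, it is worth taking a quick look at the data. A minimal check (assuming the CSV contains the columns used throughout this post: rule2, read92, read88, and math88):

Code
glimpse(chile)

# Quick numeric summary of the key variables
chile %>% 
  select(rule2, read92, read88, math88) %>% 
  summary()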

The Chile Schools data is a good example of something you will often encounter when doing policy evaluations. In 1992, the Chilean government wanted to improve poor-performing schools. To do so, the government assigned extra resources (teacher training and after-school tutoring) to all schools scoring lower than a predefined cutoff on the average fourth-grade test score. This assignment is clearly not random. In fact, we can be quite certain that the schools that received the extra resources are different from those that did not (that is why those schools received extra help!).

However, as we move closer to the cutoff, it seems more and more likely that the schools falling on either side are not that different. For example, say the cutoff was 60 points on a 100-point scale. Two schools with an average score of 30 and 70 are clearly different, and it makes no sense to compare those. But what about two schools scoring 58 and 62? Do we really believe that those schools differ fundamentally from each other? The key assumption we make in a regression discontinuity design is that the units we look at on either side of the cutoff are as-if random.
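
We can give this assumption a rough check before fitting anything. The sketch below is my own addition; it uses the 1988 reading score, a pre-treatment variable that also appears later in this post. Similar means on either side of the cutoff are reassuring, though they do not prove the assumption:

Code
# Compare a pre-treatment variable for schools close to the cutoff.
# If assignment is as-if random near the threshold, the two groups
# should look similar before the program started.
chile %>% 
  filter(abs(rule2) < 5) %>% 
  mutate(side = ifelse(rule2 < 0, "Below cutoff (treated)", "Above cutoff (control)")) %>% 
  group_by(side) %>% 
  summarise(mean_read88 = mean(read88, na.rm = TRUE),
            n = n())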

Plot 1: Showing the raw data

A good place to start is to show the raw data on either side of the cutoff. In Figure 1, all units below the threshold (which has been scaled to zero) received extra help, while those above did not.

Code
# Make a variable saying if schools received the treatment or not
chile <-
  chile %>% 
  mutate(condition = ifelse(rule2 < 0, "Treatment", "Control")) 


chile %>% 
  ggplot(aes(rule2, read92)) + 
  geom_vline(xintercept = 0, linewidth = 1, color = "grey50", linetype = "dashed") + 
  geom_point(aes(color = condition), alpha = .6) + 
  scale_color_jama() + 
  labs(x = "Assignment variable",
       y = "Reading test score") + 
  theme(legend.position = "none")

Figure 1: Reading test scores in 1992 by assignment variable. The assignment variable is centered reading test scores in 1988. Having a score less than 0 meant that schools would be included in the program.

However, since we want the schools we compare to be "alike" (in terms of potential outcomes), we should probably focus the graph on units close to the threshold. Let's restrict attention to a window from -5 to 5 around the cutoff and make a new plot (Figure 2):

Code
chile_sub <- 
  chile %>% 
  filter(rule2 < 5, rule2 > -5) 

chile_sub %>% 
  ggplot(aes(rule2, read92)) + 
  geom_vline(xintercept = 0, 
             linewidth = 1, 
             color = "grey50", 
             linetype = "dashed") + 
  geom_point(aes(color = condition)) + 
  scale_color_jama() + 
  labs(x = "Assignment variable",
       y = "Reading test score") + 
  theme(legend.position = "none")

Figure 2: Reading test scores in 1992 by assignment variable. In this version of the graph, only schools close to the cutoff (-5 through 5) are plotted.

And we can even make this plot a bit clearer if we use binned averages instead of showing all data points:

Code
chile_sub %>% 
  ggplot(aes(rule2, read92)) + 
  geom_vline(xintercept = 0, 
             linewidth = 1, 
             color = "grey50", 
             linetype = "dashed") + 
  stat_summary_bin(aes(color = condition), 
                   fun = "mean", 
                   size = 2, 
                   geom = "point") + 
  scale_color_jama() + 
  labs(x = "Assignment variable",
       y = "Reading test score") + 
  theme(legend.position = "none")

Figure 3: Reading test scores in 1992 by assignment variable. In this version of the graph, schools are plotted using binned averages on the assignment variable.
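
One caveat: the look of a binned plot depends on how many bins you use. stat_summary_bin() defaults to 30 bins, so it is worth checking that the apparent pattern survives a coarser binning, for example:

Code
# Same plot as Figure 3, but with fewer, coarser bins
chile_sub %>% 
  ggplot(aes(rule2, read92)) + 
  geom_vline(xintercept = 0, linewidth = 1, color = "grey50", linetype = "dashed") + 
  stat_summary_bin(aes(color = condition), 
                   fun = "mean", 
                   bins = 10, 
                   size = 2, 
                   geom = "point") + 
  scale_color_jama() + 
  labs(x = "Assignment variable",
       y = "Reading test score") + 
  theme(legend.position = "none")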

Plot 2: Adding the model

Showing the data (to others, but also to yourself) is always a good place to start and a key part of exploratory data analysis. However, to answer the question of interest (did assigning extra help to schools improve reading?), we need a model.

Luckily, the modeling part of a regression discontinuity analysis is straightforward: We simply include the condition and the assignment variable as predictors in our regression model:

Code
chile_rs <- 
  lm(read92 ~ condition + rule2, data = chile_sub) 

chile_rs %>% 
  tidy(conf.int = TRUE) %>% 
  select(-c("statistic", "std.error", "p.value")) %>% 
  mutate(term = c("Intercept", "In program", "Assignment variable")) %>% 
  kbl(col.names = c("Term", "Estimate", "Lower bound", "Upper bound"),
      digits = 2)
Table 1: Effect of being in policy program

Term                  Estimate   Lower bound   Upper bound
Intercept                59.83         58.80         60.85
In program                2.15          0.27          4.04
Assignment variable       0.95          0.61          1.28

Thus, the estimated effect of being in the program is about 2.15 points.
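
If you need that number programmatically (say, for inline reporting), you can pull it straight from the tidied model. Note that lm() names the term after the factor level, here conditionTreatment:

Code
chile_rs %>% 
  tidy(conf.int = TRUE) %>% 
  filter(term == "conditionTreatment") %>% 
  pull(estimate)  # about 2.15, as in Table 1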

To understand this result better, and to present it in an intuitive way to stakeholders, we can plot the model on either side of the cutoff (i.e. the regression line):

Code
chile_sub %>% 
  ggplot(aes(rule2, read92)) + 
  geom_vline(xintercept = 0, linewidth = 1, color = "grey50", linetype = "dashed") + 
  geom_point(aes(color = condition), alpha = .6) + 
  geom_smooth(
    data = chile_sub %>% filter(rule2 < 0),
    method = "lm",
    color = "black",
    se = FALSE) +
  geom_smooth(
    data = chile_sub %>% filter(rule2 > 0),
    method = "lm",
    color = "black",
    se = FALSE) +
  annotate(geom="text", 
           x  = -2.5, 
           y = 33, 
           color = "#DF8F44FF", 
           label = "Included in program") + 
    annotate(geom="text", 
           x  = 2.5, 
           y = 33, 
           color = "#374E55FF", 
           label = "Not included in program") + 
  scale_color_jama() + 
  labs(x = "Assignment variable",
       y = "Reading test score") + 
  theme(legend.position = "none")

Figure 4: Reading test scores in 1992 by assignment variable. In this version of the graph, the fitted model on either side of the threshold is overlaid on the raw data.

The estimated effect of 2.15 points corresponds to the jump (the discontinuity) between the two lines at the cutoff.
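
Another way to see this is to predict the reading score exactly at the cutoff (rule2 = 0) for a treated and an untreated school; the difference between the two predictions equals the "In program" coefficient. A quick sketch:

Code
# Predicted reading scores at the cutoff, with and without the program
at_cutoff <- tibble(rule2 = 0, condition = c("Treatment", "Control"))
preds <- predict(chile_rs, newdata = at_cutoff)
preds[1] - preds[2]  # about 2.15: the jump at the cutoff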

Building the final model

Finally, in most real-world applications, it makes sense to include pre-treatment variables that strongly predict the outcome, since this can make our estimates more precise. In our example, we have schools' math and reading scores from 1988, which are probably useful for predicting reading scores in 1992. Let's include those:

Code
chile_rs_final <- 
  lm(read92 ~ condition + rule2 + read88 + math88, data = chile_sub) 

chile_rs_final %>% 
  tidy(conf.int = TRUE) %>% 
  select(-c("statistic", "std.error", "p.value")) %>% 
  mutate(term = c("Intercept", 
                  "In program", 
                  "Assignment variable",
                  "Reading score in 1988",
                  "Math score in 1988")) %>% 
  kbl(col.names = c("Term", "Estimate", "Lower bound", "Upper bound"),
      digits = 2) 
Table 2: Effect of being in policy program, covariate adjusted

Term                    Estimate   Lower bound   Upper bound
Intercept                  23.35         14.84         31.85
In program                  2.08          0.28          3.88
Assignment variable         0.14         -0.22          0.51
Reading score in 1988       0.61          0.46          0.75
Math score in 1988          0.16         -0.01          0.32

As can be seen, our confidence intervals are now slightly narrower, reflecting the extra information we have included in our analysis.
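
If you want to put a number on that gain in precision, one option (a sketch) is to compare the width of the confidence interval for the program effect across the two models:

Code
# Compare confidence interval widths for the program effect
bind_rows(
  tidy(chile_rs, conf.int = TRUE) %>% mutate(model = "Baseline"),
  tidy(chile_rs_final, conf.int = TRUE) %>% mutate(model = "With covariates")
) %>% 
  filter(term == "conditionTreatment") %>% 
  mutate(ci_width = conf.high - conf.low) %>% 
  select(model, estimate, ci_width)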

Final thoughts: plotting your next regression discontinuity analysis

Regression discontinuity designs (RDD) are a popular way of evaluating policies when randomization by the researcher is not an option. When doing RDD, we look for naturally occurring cutoffs that split units into a treatment and a control group.

The cutoff variable should be continuous around the threshold. Finally, be aware that RDD is quasi-experimental: it is not a randomized experiment, and the burden is on you to show that its assumptions hold.
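
One common robustness check, not part of the analysis above but worth knowing (a sketch of the general idea), is a placebo cutoff: re-run the model at a threshold where nothing actually changed. The estimated "effect" there should be close to zero:

Code
# Placebo check: among untreated schools only (rule2 > 0), pretend the
# cutoff was at rule2 = 5. Since no real treatment changes at that point,
# the estimated "effect" should be near zero.
placebo <- chile %>% 
  filter(rule2 > 0, rule2 < 10) %>% 
  mutate(fake_program = ifelse(rule2 < 5, "Yes", "No"),
         rule2_centered = rule2 - 5)

lm(read92 ~ fake_program + rule2_centered, data = placebo) %>% 
  tidy(conf.int = TRUE)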

Cool! Where can I learn more?

  • Gelman, A., Hill, J., & Vehtari, A. (2020). Regression and other stories. Cambridge University Press.
  • Cunningham, S. (2021). Causal inference: The mixtape. Yale University Press.