Bayesian analysis of a randomized controlled trial II: Defining and validating the model

Part two of my three-part series on Bayesian analysis of randomized controlled trials. You’ve built a model, but is it any good?
Bayesian modeling
Author

Paw Hansen

Published

July 5, 2023

Welcome to the second post in my brief series on getting started with Bayesian modeling in R. In my last post, we covered specifying priors that take our prior knowledge into account.

Today, we’ll try to validate the models we built. As we are working with logistic regression, we’ll focus on three questions from Johnson, Ott, and Dogucu (2022):

  1. Have our simulations stabilized?
  2. How wrong is the model?
  3. How accurate are the model’s posterior classifications?
Packages used in this post
library(tidyverse)
library(rstanarm)
library(bayesrules)
library(tidybayes)
library(bayesplot)
library(kableExtra)
library(patchwork)

theme_set(theme_minimal(base_size = 12))

Recall our two models from my previous post:
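I won’t repeat the full setup here, but the two fits looked roughly like the sketch below. Note that the outcome and predictor names (fail, treatment) and the prior values are placeholders, not the exact ones from the previous post; only the object names (fail_model_weakinf, fail_model_evidence) and the data set fake are reused later on.

Code
# Sketch of the two logistic regressions from the previous post.
# NOTE: formula and prior values are placeholders; see part one for the
# actual specification. Only the object names and the `fake` data are real.
fail_model_weakinf <- stan_glm(
  fail ~ treatment,                          # hypothetical variable names
  data = fake,
  family = binomial(link = "logit"),
  prior = normal(0, 2.5, autoscale = TRUE),  # weakly informative prior
  prior_intercept = normal(0, 2.5, autoscale = TRUE),
  seed = 2307
)

fail_model_evidence <- stan_glm(
  fail ~ treatment,
  data = fake,
  family = binomial(link = "logit"),
  prior = normal(-0.5, 0.3),                 # placeholder evidence-based prior
  prior_intercept = normal(0, 1),            # placeholder
  seed = 2307
)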

Check #1: Have our simulations stabilized?

Before moving on, we should check the stability of our simulations.

This is easy to do using mcmc_trace() and its companions mcmc_dens_overlay() and mcmc_acf() from the bayesplot package.

Let’s first check the model using a weakly informative prior:

Code
# MCMC trace, density, & autocorrelation plots - weakly informative prior 
mcmc_trace(fail_model_weakinf) + scale_x_continuous(breaks = c(0, 5000))
mcmc_dens_overlay(fail_model_weakinf)
mcmc_acf(fail_model_weakinf)

Figure 1: Diagnostic plots for the stability of our simulations for the model with weakly informative priors: (a) traceplot, (b) density of posterior simulated coefficients, (c) autocorrelation. All looks good.

All looks good. Now for the evidence-based model…

Code
# MCMC trace, density, & autocorrelation plots - evidence based model
mcmc_trace(fail_model_evidence) + scale_x_continuous(breaks = c(0, 5000))
mcmc_dens_overlay(fail_model_evidence)
mcmc_acf(fail_model_evidence)

Figure 2: Diagnostic plots for the stability of our simulations for the model with evidence-based priors: (a) traceplot, (b) density of posterior simulated coefficients, (c) autocorrelation. All looks good.

Same story here: these simulations have also stabilized.
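If you want numbers to back up the visual checks, R-hat values and effective sample size ratios are easy to pull out as well. A small sketch (these helpers come from bayesplot, with methods for rstanarm models; this step was not part of the original post):

Code
# Numeric convergence diagnostics to complement the plots:
# R-hat values close to 1 and effective sample size ratios that are not
# tiny (say, above 0.1) suggest the simulations have stabilized.
rhat(fail_model_weakinf)
neff_ratio(fail_model_weakinf)

rhat(fail_model_evidence)
neff_ratio(fail_model_evidence)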

All set, let’s get on with model validation!

Check #2: How well does the model fit the data?

To see this, we simulate 100 data sets from our posterior distribution. For each simulated data set, we then calculate the share of failed tests to see whether it matches the share in the original data.

Code
# Helper: share of failed tests (outcome coded 1) in a vector of outcomes
calc_prop_fail <- function(x) {
  mean(x == 1)
}
Code
pp_check_model_weakinf <- 
  pp_check(fail_model_weakinf, 
           plotfun = "stat", 
           stat = "calc_prop_fail",
           seed = 2307) + 
  xlab("Share of failed reading tests") +
  xlim(0,1) + 
  theme(legend.position = "none")

pp_check_model_evidence <-
  pp_check(fail_model_evidence, 
           plotfun = "stat", 
           stat = "calc_prop_fail",
           seed = 2307) + 
  xlab("Share of failed reading tests") +
  xlim(0,1) + 
  theme(legend.position = "none")

pp_check_model_weakinf / pp_check_model_evidence + plot_annotation(tag_levels = 'A')

Posterior predictive checks of the two models. The histograms show the proportion of failed students in the simulated data sets from each model, while the dark blue lines mark the real proportion of failed students in the data. (a) With a weakly informative prior, the simulated data sets say little about the share of failed reading tests. (b) In contrast, using an evidence-based prior results in posterior simulations that resemble the original data well.
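Under the hood, the stat-type pp_check() is doing something close to the following: draw replicated data sets with posterior_predict() and compute the test statistic for each. A minimal sketch (not part of the original post; the commented line assumes you know the name of the 0/1 outcome column in fake):

Code
# Manual version of the posterior predictive check (sketch):
# draw 100 replicated data sets and compute the share of failed tests in each.
set.seed(2307)
yrep <- posterior_predict(fail_model_weakinf, draws = 100)

prop_fail_rep <- apply(yrep, 1, calc_prop_fail)  # one share per simulated data set
summary(prop_fail_rep)

# Compare with the observed share in the original data
# (replace `outcome` with the actual 0/1 outcome column used in part one):
# calc_prop_fail(fake$outcome)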

Check #3: How well does the model fit new data?

Because we are working with a binary outcome, each posterior classification is either right or wrong. The question is: how often are we right? Let’s start with how well the models classify the data we already have.

Code
set.seed(0407)

class_sum_weakinf <- 
  classification_summary(model = fail_model_weakinf, data = fake, cutoff = 0.5) 

class_sum_evidence <- 
  classification_summary(model = fail_model_evidence, data = fake, cutoff = 0.5) 

The classification summaries report three measures:

  • overall accuracy: the proportion of all Y observations that are accurately classified
  • sensitivity (true positive rate): the proportion of Y = 1 observations that are accurately classified
  • specificity (true negative rate): the proportion of Y = 0 observations that are accurately classified

The results for both models are collected in the table below (the sketch after this list shows one way to pull them out of the classification summaries).
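This sketch assumes the bayesrules output structure: a list whose accuracy_rates element is a one-column data frame with the three measures as row names (check str(class_sum_weakinf) if in doubt).

Code
# Sketch: collect the accuracy rates from both models into one table.
# Assumes each summary stores its rates in a one-column data frame called
# accuracy_rates, with the three measures as row names.
rates <- tibble(
  measure  = rownames(class_sum_weakinf$accuracy_rates),
  weakinf  = class_sum_weakinf$accuracy_rates[[1]],
  evidence = class_sum_evidence$accuracy_rates[[1]]
)

rates |>
  kbl(digits = 2,
      col.names = c("Measure", "Weakly informative priors", "Evidence-based priors")) |>
  kable_styling()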
How well do the two models classify the data? (cutoff = 0.5)

Measure                            Weakly informative priors   Evidence-based priors
Sensitivity (true positive rate)                        0.57                    0.54
Specificity (true negative rate)                        0.43                    0.54
Overall accuracy                                        0.50                    0.54

At the 0.5 cutoff, the evidence-based priors give slightly better specificity and overall accuracy, though the differences are modest.
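Note that these rates are computed on the same data we used to fit the models. To get a sense of how the classifications would hold up for new students, we could cross-validate them instead. The bayesrules package has classification_summary_cv() for this; a sketch (not run in this post, and I’m assuming the cross-validated rates sit in an element called cv):

Code
# Cross-validated classification accuracy (sketch; assumes bayesrules'
# classification_summary_cv() and its `cv` output element).
set.seed(0407)
cv_weakinf  <- classification_summary_cv(model = fail_model_weakinf,
                                         data = fake, cutoff = 0.5, k = 10)
cv_evidence <- classification_summary_cv(model = fail_model_evidence,
                                         data = fake, cutoff = 0.5, k = 10)

cv_weakinf$cv
cv_evidence$cv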

References

Johnson, Alicia A., Miles Q. Ott, and Mine Dogucu. 2022. “Logistic Regression.” In Bayes Rules! An Introduction to Applied Bayesian Modeling, 329–54. Chapman & Hall/CRC. https://doi.org/10.1201/9780429288340-13.