Bayesian analysis of a randomized controlled trial II: Defining and validating the model

Part two of my three-part series on Bayesian analysis of randomized controlled trials. You’ve built a model, but is it any good?
Bayesian modeling
Author

Paw Hansen

Published

July 5, 2023

Welcome to the second post in my brief series on getting started with Bayesian modeling in R. In my last post, we covered specifying priors that take our prior knowledge into account.

Today, we’ll try to validate the models we built. As we are working with logistic regression, we’ll focus on three questions from Johnson, Ott, and Dogucu (2022):

  1. Have our simulations stabilized?
  2. How wrong is the model?
  3. How accurate are the model’s posterior classifications?
Packages used in this post
library(tidyverse)
library(rstanarm)
library(bayesrules)
library(tidybayes)
library(bayesplot)
library(kableExtra)
library(patchwork)

theme_set(theme_minimal(base_size = 12))

Recall our two models from my previous post:
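I won’t repeat the full setup here, but the two fits looked roughly like the sketch below. Note that the outcome and predictor names (fail, treatment) and the prior values are placeholders, not the exact ones from the previous post; only the object names (fail_model_weakinf, fail_model_evidence) and the data set fake are reused later on.

Code
# Sketch of the two logistic regressions from the previous post.
# NOTE: formula and prior values are placeholders; see part one for the
# actual specification. Only the object names and the `fake` data are real.
fail_model_weakinf <- stan_glm(
  fail ~ treatment,                          # hypothetical variable names
  data = fake,
  family = binomial(link = "logit"),
  prior = normal(0, 2.5, autoscale = TRUE),  # weakly informative prior
  prior_intercept = normal(0, 2.5, autoscale = TRUE),
  seed = 2307
)

fail_model_evidence <- stan_glm(
  fail ~ treatment,
  data = fake,
  family = binomial(link = "logit"),
  prior = normal(-0.5, 0.3),                 # placeholder evidence-based prior
  prior_intercept = normal(0, 1),            # placeholder
  seed = 2307
)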

Check #1: Have our simulations stabilized?

Before moving on, we should check the stability of our simulations.

This is easy to do using mcmc_trace() and its companions mcmc_dens_overlay() and mcmc_acf() from the bayesplot package.

Let’s first check the model using a weakly informative prior:

Code
# MCMC trace, density, & autocorrelation plots - weakly informative prior 
mcmc_trace(fail_model_weakinf) + scale_x_continuous(breaks = c(0, 5000))
mcmc_dens_overlay(fail_model_weakinf)
mcmc_acf(fail_model_weakinf)

Figure 1: Diagnostic plots for the stability of our simulations for the model with weakly informative priors: (a) traceplot, (b) density of posterior simulated coefficients, (c) autocorrelation. All looks good.

All looks good. Now for the evidence-based model…

Code
# MCMC trace, density, & autocorrelation plots - evidence based model
mcmc_trace(fail_model_evidence) + scale_x_continuous(breaks = c(0, 5000))
mcmc_dens_overlay(fail_model_evidence)
mcmc_acf(fail_model_evidence)

Figure 2: Diagnostic plots for the stability of our simulations for the model with evidence-based priors: (a) traceplot, (b) density of posterior simulated coefficients, (c) autocorrelation. All looks good.

Same story here: these simulations have also stabilized.
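If you want numbers to back up the visual checks, R-hat values and effective sample size ratios are easy to pull out as well. A small sketch (these helpers come from bayesplot, with methods for rstanarm models; this step was not part of the original post):

Code
# Numeric convergence diagnostics to complement the plots:
# R-hat values close to 1 and effective sample size ratios that are not
# tiny (say, above 0.1) suggest the simulations have stabilized.
rhat(fail_model_weakinf)
neff_ratio(fail_model_weakinf)

rhat(fail_model_evidence)
neff_ratio(fail_model_evidence)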

All set, let’s get on with model validation!

Check #2: How well does the model fit the data?

To see this, we simulate 100 data sets from our posterior distribution. For each simulated data set, we then calculate the share of failed tests to see whether it matches the share in the original data.

Code
# Helper: share of failed tests (outcome coded 1) in a vector of outcomes
calc_prop_fail <- function(x) {
  mean(x == 1)
}
Code
pp_check_model_weakinf <- 
  pp_check(fail_model_weakinf, 
           plotfun = "stat", 
           stat = "calc_prop_fail",
           seed = 2307) + 
  xlab("Share of failed reading tests") +
  xlim(0,1) + 
  theme(legend.position = "none")

pp_check_model_evidence <-
  pp_check(fail_model_evidence, 
           plotfun = "stat", 
           stat = "calc_prop_fail",
           seed = 2307) + 
  xlab("Share of failed reading tests") +
  xlim(0,1) + 
  theme(legend.position = "none")

pp_check_model_weakinf / pp_check_model_evidence + plot_annotation(tag_levels = 'A')

Posterior predictive checks of the two models. The histograms show the proportion of failed students in the simulated data sets from each model, while the dark blue lines mark the real proportion of failed students in the data. (a) With a weakly informative prior, the simulated data sets say little about the share of failed reading tests. (b) In contrast, using an evidence-based prior results in posterior simulations that resemble the original data well.
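Under the hood, the stat-type pp_check() is doing something close to the following: draw replicated data sets with posterior_predict() and compute the test statistic for each. A minimal sketch (not part of the original post; the commented line assumes you know the name of the 0/1 outcome column in fake):

Code
# Manual version of the posterior predictive check (sketch):
# draw 100 replicated data sets and compute the share of failed tests in each.
set.seed(2307)
yrep <- posterior_predict(fail_model_weakinf, draws = 100)

prop_fail_rep <- apply(yrep, 1, calc_prop_fail)  # one share per simulated data set
summary(prop_fail_rep)

# Compare with the observed share in the original data
# (replace `outcome` with the actual 0/1 outcome column used in part one):
# calc_prop_fail(fake$outcome)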

Check #3: How well does the model fit new data?

Because we are working with a binary outcome, each posterior classification is either right or wrong. The question is: how often are we right? Let’s start with how well the models classify the data we already have.

Code
set.seed(0407)

class_sum_weakinf <- 
  classification_summary(model = fail_model_weakinf, data = fake, cutoff = 0.5) 

class_sum_evidence <- 
  classification_summary(model = fail_model_evidence, data = fake, cutoff = 0.5) 

The classification summaries report three measures:

  • overall accuracy: the proportion of all Y observations that are accurately classified
  • sensitivity (true positive rate): the proportion of Y = 1 observations that are accurately classified
  • specificity (true negative rate): the proportion of Y = 0 observations that are accurately classified

The results for both models are collected in the table below (the sketch after this list shows one way to pull them out of the classification summaries).
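This sketch assumes the bayesrules output structure: a list whose accuracy_rates element is a one-column data frame with the three measures as row names (check str(class_sum_weakinf) if in doubt).

Code
# Sketch: collect the accuracy rates from both models into one table.
# Assumes each summary stores its rates in a one-column data frame called
# accuracy_rates, with the three measures as row names.
rates <- tibble(
  measure  = rownames(class_sum_weakinf$accuracy_rates),
  weakinf  = class_sum_weakinf$accuracy_rates[[1]],
  evidence = class_sum_evidence$accuracy_rates[[1]]
)

rates |>
  kbl(digits = 2,
      col.names = c("Measure", "Weakly informative priors", "Evidence-based priors")) |>
  kable_styling()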
How well do the two models classify the data? (cutoff = 0.5)

Measure                            Weakly informative priors   Evidence-based priors
Sensitivity (true positive rate)                        0.57                    0.54
Specificity (true negative rate)                        0.43                    0.54
Overall accuracy                                        0.50                    0.54

At the 0.5 cutoff, the evidence-based priors give slightly better specificity and overall accuracy, though the differences are modest.
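Note that these rates are computed on the same data we used to fit the models. To get a sense of how the classifications would hold up for new students, we could cross-validate them instead. The bayesrules package has classification_summary_cv() for this; a sketch (not run in this post, and I’m assuming the cross-validated rates sit in an element called cv):

Code
# Cross-validated classification accuracy (sketch; assumes bayesrules'
# classification_summary_cv() and its `cv` output element).
set.seed(0407)
cv_weakinf  <- classification_summary_cv(model = fail_model_weakinf,
                                         data = fake, cutoff = 0.5, k = 10)
cv_evidence <- classification_summary_cv(model = fail_model_evidence,
                                         data = fake, cutoff = 0.5, k = 10)

cv_weakinf$cv
cv_evidence$cv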

References

Johnson, Alicia A., Miles Q. Ott, and Mine Dogucu. 2022. “Logistic Regression.” In Bayes Rules! An Introduction to Applied Bayesian Modeling, 329–54. Chapman & Hall/CRC. https://doi.org/10.1201/9780429288340-13.