Packages used in this post
library(tidyverse)
library(rstanarm)
library(bayesrules)
library(tidybayes)
library(bayesplot)
library(kableExtra)
library(patchwork)
theme_set(theme_minimal(base_size = 12))
Paw Hansen
July 5, 2023
Welcome to the second post in my brief series on getting started with Bayesian modeling in R. In my last post, we covered how to specify priors that reflect any prior knowledge we might have.
Today, we'll try to validate the models we built. Since we are working with logistic regression, we'll focus on two questions (Johnson, Ott, and Dogucu 2022):
1. How wrong is the model? That is, do data simulated from the posterior look like the data we actually observed?
2. How accurate are the model's posterior classifications?
Recall our two models from my previous post:
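The exact data, predictors, and prior values are described in that post; purely to fix ideas here, the two fits look roughly like the sketch below, where `reading_tests`, `failed_test`, and `hours_read` are placeholder names and the prior values are illustrative, not the ones actually used.

# Rough sketch of the two fits (data, predictor, and prior values are placeholders)
fail_model_weakinf <- stan_glm(
  failed_test ~ hours_read,
  data = reading_tests,
  family = binomial(link = "logit"),  # logistic regression
  seed = 2307                         # rstanarm's default priors are weakly informative
)

fail_model_evidence <- stan_glm(
  failed_test ~ hours_read,
  data = reading_tests,
  family = binomial(link = "logit"),
  prior_intercept = normal(-1, 0.5),  # illustrative evidence-based priors,
  prior = normal(0.5, 0.25),          # not the values from the previous post
  seed = 2307
)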
Before moving on, we should check the stability of our simulations. This is easy to do using mcmc_trace() from the bayesplot package.
Let’s first check the model using a weakly informative prior:
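A single call on the fitted rstanarm object is enough:

# Trace plots for the weakly informative model: each chain should look like
# stationary noise, with the chains overlapping
mcmc_trace(fail_model_weakinf)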
All looks good. Now let's run the same check for the evidence-based model:
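# Same trace-plot check for the evidence-based model
mcmc_trace(fail_model_evidence)

Again, the chains mix well and look stable.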
All set, let’s get on with model validation!
To answer the first question, we simulate 100 data sets from the posterior distribution. For each simulated data set, we calculate the share of failed tests and compare it with the share in the original data.
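The `stat` argument below points to a small summary function whose definition isn't shown in this section; presumably it is something along these lines, assuming a failed test is coded as 1:

# Hypothetical helper: share of failed tests (1s) in a vector of outcomes
calc_prop_fail <- function(x) {
  mean(x)
}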
# Posterior predictive check: distribution of the simulated share of failed tests,
# with the observed share marked by a vertical line (weakly informative model)
pp_check_model_weakinf <-
  pp_check(fail_model_weakinf,
           plotfun = "stat",
           stat = "calc_prop_fail",
           seed = 2307) +
  xlab("Share of failed reading tests") +
  xlim(0, 1) +
  theme(legend.position = "none")

# Same check for the evidence-based model
pp_check_model_evidence <-
  pp_check(fail_model_evidence,
           plotfun = "stat",
           stat = "calc_prop_fail",
           seed = 2307) +
  xlab("Share of failed reading tests") +
  xlim(0, 1) +
  theme(legend.position = "none")

# Stack the two plots and label the panels A and B (patchwork)
pp_check_model_weakinf / pp_check_model_evidence + plot_annotation(tag_levels = 'A')
Because we are working with a binary outcome, each posterior classification is either right or wrong. The question is: how often are we right?
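One way to get these numbers is classification_summary() from bayesrules (loaded above). The sketch below assumes the data frame from the previous post is called `reading_tests` (a placeholder name) and uses the default 0.5 classification cutoff:

# Posterior classification accuracy for each model
# ('reading_tests' is a placeholder data name)
classification_summary(model = fail_model_weakinf,
                       data = reading_tests,
                       cutoff = 0.5)

classification_summary(model = fail_model_evidence,
                       data = reading_tests,
                       cutoff = 0.5)

The key rates for the two models: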
| Measure | Weakly informative priors | Evidence-based priors |
|---|---|---|
| Sensitivity (true positive rate) | 0.57 | 0.54 |
| Specificity (true negative rate) | 0.43 | 0.54 |
| Overall accuracy | 0.50 | 0.54 |