Tutorial 7.5b - Analysis of Covariance (Bayesian)
12 Jan 2018
Overview
Previous tutorials have concentrated on designs for either continuous (Regression) or categorical (ANOVA) predictor variables. Analysis of covariance (ANCOVA) models are essentially ANOVA models that incorporate one or more continuous and categorical variables (covariates). Although the relationship between a response variable and a covariate may itself be of substantial biological interest, typically covariate(s) are incorporated to reduce the amount of unexplained variability in the model (analogous to blocking - see tutorials ) and thereby increase the power of any treatment effects.
Linear model
The generic linear model is presented here purely for revisory purposes. If, it is unfamiliar to you or you are unsure about what parameters are to be estimated and tested, you are strongly advised to review the the tutorial on frequentist analysis of covariance.
The linear models for two and three factor design are: $$y_{ijk}=\mu+\alpha_i + \beta_{j} + \varepsilon_{ijk}$$ where $\mu$ is the overall mean, $\alpha$ is the effect of Factor A, $\beta$ is the effect of the covariate (Factor B), and $\varepsilon$ is the random unexplained or residual component.
ANCOVA in R
Data and scenario
Consider an experimental design aimed at exploring the effects of a categorical variable with three levels (Group A, Group B and Group C) on a response. From previous studies, we know that the response is influenced by another variable (covariate). Unfortunately, it was not possible to ensure that all sampling units were the same degree of the covariate. Therefore, in an attempt to account for this anticipated extra source of variability, we measured the level of the covariate for each sampling unit. Actually, in allocating treatments to the various treatment groups, we tried to ensure a similar mean and range of the covariate within each group.
- the sample size per treatment=10
- the categorical $x$ variable with 3 levels
- the first treatment group has a population mean of 40.
- the other two treatments reduce the mean by 15 and 20 units respectively
- the data are drawn from normal distributions with a mean of 0 and standard deviation of 4 ($\sigma^2=16$)
- the covariate (B) is a continuous variable with a mean of 20 and a standard deviation of 15
set.seed(3) n <- 10 p <- 3 A.eff <- c(40, -15, -20) beta <- -0.45 sigma <- 4 B <- rnorm(n * p, 0, 15) A <- gl(p, n, lab = paste("Group", LETTERS[1:3])) mm <- model.matrix(~A + B) data <- data.frame(A = A, B = B, Y = as.numeric(c(A.eff, beta) %*% t(mm)) + rnorm(n * p, 0, 4)) data$B <- data$B + 20 head(data)
A B Y 1 Group A 5.570999 50.09555 2 Group A 15.612114 45.38163 3 Group A 23.881823 41.16404 4 Group A 2.718022 50.72290 5 Group A 22.936742 37.26995 6 Group A 20.451859 42.61873
Assumptions
As ANCOVA designs are essentially regular ANOVA designs that are first adjusted (centered) for the covariate(s), ANCOVA designs inherit all of the underlying assumptions of the appropriate ANOVA design. Readers should also eventually consult Tutorial 7.6a, Tutorial 9.2a, Tutorial 9.3a and Tutorial 9.4a. Specifically, hypothesis tests assume that:
- the appropriate residuals are normally distributed. Boxplots using the appropriate scale of replication (reflecting the appropriate residuals/F-ratio denominator, see the above tables) should be used to explore normality. Scale transformations are often useful.
- the appropriate residuals are equally varied. Boxplots and plots of means against variance (using the appropriate scale of replication) should be used to explore the spread of values. Residual plots should reveal no patterns. Scale transformations are often useful.
- the appropriate residuals are independent of one another.
- the relationship between the response variable and the covariate should be linear. Linearity can be explored using scatterplots and residual plots should reveal no patterns.
- for repeated measures and other designs in which treatment levels within blocks can not be be randomly ordered, the variance/covariance matrix is assumed to display sphericity.
- for designs that utilize blocking, it is assumed that there are no block by within block interactions
- homogeneity of slopes. Since a single slope is estimated for the covariate (rather than a separate slope per treatment), it is important that the trends between the response and the covariate is similar for each Group.
Exploratory data analysis
library(car) scatterplot(Y ~ B | A, data = data)
boxplot(Y ~ A, data)
# OR via ggplot library(ggplot2) ggplot(data, aes(y = Y, x = B, group = A)) + geom_point() + geom_smooth(method = "lm")
ggplot(data, aes(y = Y, x = A)) + geom_boxplot()
- there is no evidence of obvious non-normality
- the assumption of linearity seems reasonable
- the variability of the three groups seems approximately equal
- the slopes (Y vs B trends) appear broadly similar for each treatment group
Homogeneity of slopes
We can explore inferential evidence of unequal slopes by examining estimated effects of the interaction between the categorical variable and the covariate. Note, pay no attention to the main effects - only the interaction. Even though I intend to illustrate Bayesian analyses here, for such a simple model, it is considerably simpler to use traditional OLS for testing for the presence of an interaction..
anova(lm(Y ~ B * A, data = data))
Analysis of Variance Table Response: Y Df Sum Sq Mean Sq F value Pr(>F) B 1 354.80 354.80 23.9691 5.414e-05 *** A 2 2772.56 1386.28 93.6531 4.609e-12 *** B:A 2 55.08 27.54 1.8606 0.1773 Residuals 24 355.26 14.80 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model fitting or statistical analysis
Consistent with Tutorial 7.2b we will explore Bayesian modelling of ANCOVA using a variety of tools (such as MCMCpack, JAGS, RSTAN, RSTANARM and BRMS). Whilst JAGS and RSTAN are extremely flexible and thus allow models to be formulated that contain not only the simple model, but also additional derivatives, the other approaches are more restrictive. Consequently, I will mostly restrict models to just the minimum necessary and all derivatives will instead be calculated in R itself from the returned posteriors.
The observed response ($y_i$) are assumed to be drawn from a normal distribution with a given mean ($\mu$) and standard deviation ($\sigma$). The expected values ($\mu$) are themselves determined by the linear predictor ($\mathbf{X}\boldsymbol{\beta}$). In this case, $\boldsymbol{\beta}$ represents the vector of $\beta$'s - the intercept associated with the first group, the (effects) differences between this intercept and the intercepts for each other group as well as the slope associated with the continuous covariate. $\mathbf{X}$ is the model matrix.
MCMC sampling requires priors on all parameters. We will employ weakly informative priors. Specifying 'uninformative' priors is always a bit of a balancing act. If the priors are too vague (wide) the MCMC sampler can wander off into nonscence areas of likelihood rather than concentrate around areas of highest likelihood (desired when wanting the outcomes to be largely driven by the data). On the other hand, if the priors are too strong, they may have an influence on the parameters. In such a simple model, this balance is very forgiving - it is for more complex models that prior choice becomes more important.
For this simple model, we will go with zero-centered Gaussian (normal) priors with relatively large standard deviations (100) for both the intercept and the treatment effect and a wide half-cauchy (scale=5) for the standard deviation. $$ \begin{align} y_{ij} &\sim{} N(\mu_{ij}, \sigma)\\ \mu_{ij} &= \beta_0 + \mathbf{X}\boldsymbol{\beta}\\[1em] \beta_0 &\sim{} N(0,100)\\ \beta &\sim{} N(0,10)\\ \sigma &\sim{} cauchy(0,5)\\ \end{align} $$ Note, exploratory data analysis suggests that while the intercept (intercept of Group A) and categorical predictor effects (differences between intercepts of each of the Group and Group A's intercept) could be drawn from a similar distribution (with mean in the 10's and variances in the 100's), the slope (effect associated with Group A linear relationship) is likely to be an order of magnitude less. We might therefore be tempted to provide different priors for the intercept, categorical effects and slope effect. For a simple model such as this, it is unlikely to be necessary. However, for more complex models, where prior specification becomes more critical, separate priors would probably be necessary.
library(MCMCpack) data.mcmcpack <- MCMCregress(Y ~ A + B, data = data)
Define the model
modelString = " model { #Likelihood for (i in 1:n) { y[i]~dnorm(mean[i],tau) mean[i] <- inprod(beta[],X[i,]) } #Priors for (i in 1:ngroups) { beta[i] ~ dnorm(0, 1.0E-6) } sigma ~ dunif(0, 100) tau <- 1 / (sigma * sigma) } "
Define the data
Arrange the data as a list (as required by BUGS). As input, JAGS will need to be supplied with:
- the response variable (y)
- the predictor model matrix (X)
- the total number of observed items (n)
- the number of predictor terms (nX)
X <- model.matrix(~A + B, data) data.list <- with(data, list(y = Y, X = X, n = nrow(data), ngroups = ncol(X)))
Define the MCMC chain parameters
Next we should define the behavioural parameters of the MCMC sampling chains. Include the following:
- the nodes (estimated parameters) to monitor (return samples for)
- the number of MCMC chains (3)
- the number of burnin steps (1000)
- the thinning factor (10)
- the number of MCMC iterations - determined by the number of samples to save, the rate of thinning and the number of chains
params <- c("beta", "sigma") nChains = 3 burnInSteps = 3000 thinSteps = 10 numSavedSteps = 15000 #across all chains nIter = ceiling(burnInSteps + (numSavedSteps * thinSteps)/nChains) nIter
[1] 53000
Fit the model
Now run the JAGS code via the R2jags interface. Note that the first time jags is run after the R2jags package is loaded, it is often necessary to run any kind of randomization function just to initiate the .Random.seed variable.
library(R2jags)
data.r2jags <- jags(data = data.list, inits = NULL, parameters.to.save = params, model.file = textConnection(modelString), n.chains = nChains, n.iter = nIter, n.burnin = burnInSteps, n.thin = thinSteps)
Compiling model graph Resolving undeclared variables Allocating nodes Graph information: Observed stochastic nodes: 30 Unobserved stochastic nodes: 5 Total graph size: 232 Initializing model
print(data.r2jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10 n.sims = 15000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] 50.201 1.803 46.617 49.021 50.207 51.404 53.741 1.001 8400 beta[2] -17.141 1.879 -20.832 -18.393 -17.149 -15.910 -13.402 1.001 8400 beta[3] -22.917 1.882 -26.640 -24.161 -22.922 -21.682 -19.181 1.001 5900 beta[4] -0.414 0.065 -0.543 -0.457 -0.414 -0.372 -0.285 1.001 15000 sigma 4.175 0.619 3.178 3.735 4.108 4.531 5.587 1.001 7200 deviance 169.232 3.548 164.517 166.636 168.502 171.075 178.013 1.001 15000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 6.3 and DIC = 175.5 DIC is an estimate of expected predictive error (lower deviance is better).
data.mcmc.list <- as.mcmc(data.r2jags)
Whilst Gibbs sampling provides an elegantly simple MCMC sampling routine, very complex hierarchical models can take enormous numbers of iterations (often prohibitory large) to converge on a stable posterior distribution.
To address this, Andrew Gelman (and other collaborators) have implemented a variation on Hamiltonian Monte Carlo (HMC: a sampler that selects subsequent samples in a way that reduces the correlation between samples, thereby speeding up convergence) called the No-U-Turn (NUTS) sampler. All of these developments are brought together into a tool called Stan ("Sampling Through Adaptive Neighborhoods").
By design (to appeal to the vast BUGS users), Stan models are defined in a manner reminiscent of BUGS. Stan first converts these models into C++ code which is then compiled to allow very rapid computation.
Consistent with the use of C++, the model must be accompanied by variable declarations for all inputs and parameters.
One important difference between Stan and JAGS is that whereas BUGS (and thus JAGS) use precision rather than variance, Stan uses variance.
Stan itself is a stand-alone command line application. However, conveniently, the authors of Stan have also developed an R interface to Stan called Rstan which can be used much like R2jags.
Model matrix formulation
The minimum model in Stan required to fit the above simple regression follows. Note the following modifications from the model defined in JAGS:- the normal distribution is defined by variance rather than precision
- rather than using a uniform prior for sigma, I am using a half-Cauchy
We now translate the likelihood model into STAN code.
$$\begin{align}
y_{ij}&\sim{}N(\mu_{ij}, \sigma)\\
\mu_{ij} &= \mathbf{X}\boldsymbol{\beta}\\
\beta_0&\sim{}N(0,100)\\
\beta&\sim{}N(0,10)\\
\sigma&\sim{}Cauchy(0,5)\\
\end{align}
$$
Define the model
modelString = " data { int<lower=1> n; int<lower=1> nX; vector [n] y; matrix [n,nX] X; } parameters { vector[nX] beta; real<lower=0> sigma; } transformed parameters { vector[n] mu; mu = X*beta; } model { #Likelihood y~normal(mu,sigma); #Priors beta ~ normal(0,100); sigma~cauchy(0,5); } generated quantities { vector[n] log_lik; for (i in 1:n) { log_lik[i] = normal_lpdf(y[i] | mu[i], sigma); } } "
Define the data
Arrange the data as a list (as required by BUGS). As input, JAGS will need to be supplied with:
- the response variable (y)
- the predictor model matrix (X)
- the total number of observed items (n)
- the number of predictor terms (nX)
Xmat <- model.matrix(~A + B, data) data.list <- with(data, list(y = Y, X = Xmat, nX = ncol(Xmat), n = nrow(data)))
Fit the model
Now run the JAGS code via the R2jags interface. Note that the first time jags is run after the R2jags package is loaded, it is often necessary to run any kind of randomization function just to initiate the .Random.seed variable.
## load the rstan package library(rstan)
data.rstan <- stan(data = data.list, model_code = modelString, chains = 3, iter = 2000, warmup = 500, thin = 3)
In file included from /usr/local/lib/R/site-library/BH/include/boost/config.hpp:39:0, from /usr/local/lib/R/site-library/BH/include/boost/math/tools/config.hpp:13, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/var.hpp:7, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core.hpp:12, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4, from file4be61ee4e612.cpp:8: /usr/local/lib/R/site-library/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined # define BOOST_NO_CXX11_RVALUE_REFERENCES ^ <command-line>:0:0: note: this is the location of the previous definition SAMPLING FOR MODEL '3d2414c9dcf4b5e12be870eadd2c894a' NOW (CHAIN 1). Gradient evaluation took 1.5e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.062651 seconds (Warm-up) 0.061975 seconds (Sampling) 0.124626 seconds (Total) SAMPLING FOR MODEL '3d2414c9dcf4b5e12be870eadd2c894a' NOW (CHAIN 2). Gradient evaluation took 7e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.053706 seconds (Warm-up) 0.057327 seconds (Sampling) 0.111033 seconds (Total) SAMPLING FOR MODEL '3d2414c9dcf4b5e12be870eadd2c894a' NOW (CHAIN 3). Gradient evaluation took 7e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.054613 seconds (Warm-up) 0.064711 seconds (Sampling) 0.119324 seconds (Total)
print(data.rstan, par = c("beta", "sigma"))
Inference for Stan model: 3d2414c9dcf4b5e12be870eadd2c894a. 3 chains, each with iter=2000; warmup=500; thin=3; post-warmup draws per chain=500, total post-warmup draws=1500. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 50.22 0.06 1.79 46.60 49.03 50.24 51.40 53.83 1027 1 beta[2] -17.12 0.06 1.92 -20.88 -18.31 -17.14 -15.96 -13.19 1210 1 beta[3] -22.88 0.05 1.93 -26.78 -24.11 -22.86 -21.58 -19.24 1407 1 beta[4] -0.42 0.00 0.06 -0.54 -0.46 -0.42 -0.37 -0.29 1282 1 sigma 4.11 0.02 0.58 3.12 3.70 4.05 4.45 5.37 1353 1 Samples were drawn using NUTS(diag_e) at Fri Nov 3 15:05:11 2017. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
The STAN team has put together pre-compiled modules (functions) to make specifying and applying STAN models much simpler. Each function offers a consistent interface that is Also reminiscent of major frequentist linear modelling routines in R.
Whilst it is not necessary to specify priors when using rstanarm functions (as defaults will be generated), there is no guarantee that the routines for determining these defaults will persist over time. Furthermore, it is always better to define your own priors if for no other reason that it forces you to thing about what you re doing. Consistent with the pure STAN version, we will employ the following priors:
- weakly informative Gaussian prior for the intercept $\beta_0 \sim{} N(0, 100)$
- weakly informative Gaussian prior for the treatment effect $\beta_1 \sim{} N(0, 100)$
- half-cauchy prior for the variance $\sigma \sim{} Cauchy(0, 5)$
Note, I am using the refresh=0 option so as to suppress the larger regular output in the interest of keeping output to what is necessary for this tutorial. When running outside of a tutorial context, the regular verbose output is useful as it provides a way to gauge progress.
library(rstanarm) library(broom) library(coda)
data.rstanarm = stan_glm(Y ~ A + B, data = data, iter = 2000, warmup = 500, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 100), prior = normal(0, 100), prior_aux = cauchy(0, 2))
Gradient evaluation took 5.5e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.55 seconds. Adjust your expectations accordingly! Elapsed Time: 0.149605 seconds (Warm-up) 0.181861 seconds (Sampling) 0.331466 seconds (Total) Gradient evaluation took 1.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds. Adjust your expectations accordingly! Elapsed Time: 0.120154 seconds (Warm-up) 0.198657 seconds (Sampling) 0.318811 seconds (Total) Gradient evaluation took 1.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds. Adjust your expectations accordingly! Elapsed Time: 0.182133 seconds (Warm-up) 0.174642 seconds (Sampling) 0.356775 seconds (Total)
print(data.rstanarm)
stan_glm family: gaussian [identity] formula: Y ~ A + B ------ Estimates: Median MAD_SD (Intercept) 50.2 1.8 AGroup B -17.2 1.9 AGroup C -22.9 1.9 B -0.4 0.1 sigma 4.1 0.6 Sample avg. posterior predictive distribution of y (X = xbar): Median MAD_SD mean_PPD 30.0 1.1 ------ For info on the priors used see help('prior_summary.stanreg').
tidyMCMC(data.rstanarm, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 50.2286902 1.9365801 46.4428738 54.1263008 2 AGroup B -17.2114440 1.9386491 -21.0066484 -13.4338089 3 AGroup C -22.9037572 1.9679361 -26.6757211 -19.0189639 4 B -0.4138748 0.0681432 -0.5451839 -0.2852376 5 sigma 4.1990238 0.6317351 3.0581470 5.4328442
The brms package serves a similar goal to the rstanarm package - to provide a simple user interface to STAN. However, unlike the rstanarm implementation, brms simply converts the formula, data, priors and family into STAN model code and data before executing stan with those elements.
Whilst it is not necessary to specify priors when using brms functions (as defaults will be generated), there is no guarantee that the routines for determining these defaults will persist over time. Furthermore, it is always better to define your own priors if for no other reason that it forces you to thing about what you are doing. Consistent with the pure STAN version, we will employ the following priors:
- weakly informative Gaussian prior for the intercept $\beta_0 \sim{} N(0, 100)$
- weakly informative Gaussian prior for the treatment effect $\beta_1 \sim{} N(0, 100)$
- half-cauchy prior for the variance $\sigma \sim{} Cauchy(0, 5)$
Note, I am using the refresh=0. option so as to suppress the larger regular output in the interest of keeping output to what is necessary for this tutorial. When running outside of a tutorial context, the regular verbose output is useful as it provides a way to gauge progress.
library(brms) library(broom) library(coda)
data.brms = brm(Y ~ A + B, data = data, iter = 2000, warmup = 500, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 100), class = "Intercept"), prior(normal(0, 100), class = "b"), prior(cauchy(0, 5), class = "sigma")))
Gradient evaluation took 1.8e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds. Adjust your expectations accordingly! Elapsed Time: 0.036403 seconds (Warm-up) 0.043672 seconds (Sampling) 0.080075 seconds (Total) Gradient evaluation took 8e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds. Adjust your expectations accordingly! Elapsed Time: 0.033241 seconds (Warm-up) 0.041044 seconds (Sampling) 0.074285 seconds (Total) Gradient evaluation took 7e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds. Adjust your expectations accordingly! Elapsed Time: 0.036154 seconds (Warm-up) 0.041079 seconds (Sampling) 0.077233 seconds (Total)
print(data.brms)
Family: gaussian(identity) Formula: Y ~ A + B Data: data (Number of observations: 30) Samples: 3 chains, each with iter = 2000; warmup = 500; thin = 2; total post-warmup samples = 2250 ICs: LOO = NA; WAIC = NA; R2 = NA Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 50.20 1.76 46.79 53.75 2177 1 AGroupB -17.17 1.83 -20.81 -13.68 2158 1 AGroupC -22.89 1.85 -26.43 -19.25 2127 1 B -0.41 0.06 -0.54 -0.29 2129 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 4.11 0.59 3.14 5.41 1854 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
tidyMCMC(data.brms, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 b_Intercept 50.1970335 1.75540856 46.6017008 53.5440980 2 b_AGroupB -17.1728328 1.83135058 -20.6803108 -13.5582289 3 b_AGroupC -22.8915861 1.85449337 -26.3846325 -19.2178870 4 b_B -0.4122714 0.06370931 -0.5416812 -0.2961189 5 sigma 4.1081576 0.58644550 3.1075609 5.3375706
MCMC diagnostics
In addition to the regular model diagnostic checks (such as residual plots), for Bayesian analyses, it is necessary to explore the characteristics of the MCMC chains and the sampler in general. Recall that the purpose of MCMC sampling is to replicate the posterior distribution of the model likelihood and priors by drawing a known number of samples from this posterior (thereby formulating a probability distribution). This is only reliable if the MCMC samples accurately reflect the posterior.
Unfortunately, since we only know the posterior in the most trivial of circumstances, it is necessary to rely on indirect measures of how accurately the MCMC samples are likely to reflect the likelihood. I will breifly outline the most important diagnostics, however, please refer to Tutorial 4.3, Secton 3.1: Markov Chain Monte Carlo sampling for a discussion of these diagnostics.
- Traceplots for each parameter illustrate the MCMC sample values after each successive
iteration along the chain. Bad chain mixing (characterized by any sort of pattern) suggests
that the MCMC sampling chains may not have completely traversed all features of the posterior
distribution and that more iterations are required to ensure the distribution has been accurately
represented.
- Autocorrelation plot for each paramter illustrate the degree of correlation between
MCMC samples separated by different lags. For example, a lag of 0 represents the degree of
correlation between each MCMC sample and itself (obviously this will be a correlation of 1).
A lag of 1 represents the degree of correlation between each MCMC sample and the next sample along the Chain
and so on. In order to be able to generate unbiased estimates of parameters, the MCMC samples should be
independent (uncorrelated). In the figures below, this would be violated in the top autocorrelation plot and met in the bottom
autocorrelation plot.
- Rhat statistic for each parameter provides a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05. If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or overly vague priors that permit sampling in otherwise nonscence parameter space.
Prior to inspecting any summaries of the parameter estimates, it is prudent to inspect a range of chain convergence diagnostics.
- Trace plots
View trace plots
library(MCMCpack) plot(data.mcmcpack)
- Raftery diagnostic
View Raftery diagnostic
library(MCMCpack) raftery.diag(data.mcmcpack)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) (Intercept) 2 3834 3746 1.020 AGroup B 2 3771 3746 1.010 AGroup C 2 3962 3746 1.060 B 2 3741 3746 0.999 sigma2 2 3771 3746 1.010
- Autocorrelation diagnostic
View autocorrelations
library(MCMCpack) autocorr.diag(data.mcmcpack)
(Intercept) AGroup B AGroup C B sigma2 Lag 0 1.000000000 1.000000e+00 1.000000000 1.000000000 1.000000000 Lag 1 -0.003346420 -4.064355e-03 -0.002480710 0.002480390 0.132659644 Lag 5 0.016500991 1.299476e-02 -0.003939996 0.001939800 -0.018030548 Lag 10 -0.005233676 -6.125242e-06 -0.013564685 -0.006005238 0.006407515 Lag 50 0.006330549 -9.343650e-03 -0.011855223 -0.002673821 0.007502186
Again, prior to examining the summaries, we should have explored the convergence diagnostics.
library(coda) data.mcmc = as.mcmc(data.r2jags)
- Trace plots
plot(data.mcmc)
When there are a lot of parameters, this can result in a very large number of traceplots. To focus on just certain parameters (such as $\beta$s)
preds <- c("beta[1]", "beta[2]", "beta[3]", "beta[4]") plot(as.mcmc(data.r2jags)[, preds])
- Raftery diagnostic
raftery.diag(data.mcmc)
[[1]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta[1] 20 36200 3746 9.66 beta[2] 20 38660 3746 10.30 beta[3] 20 38030 3746 10.20 beta[4] 20 36200 3746 9.66 deviance 20 37410 3746 9.99 sigma 20 38030 3746 10.20 [[2]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta[1] 20 39300 3746 10.50 beta[2] 20 38660 3746 10.30 beta[3] 20 39950 3746 10.70 beta[4] 20 37410 3746 9.99 deviance 20 39300 3746 10.50 sigma 20 36800 3746 9.82 [[3]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta[1] 20 38030 3746 10.20 beta[2] 20 36800 3746 9.82 beta[3] 20 37410 3746 9.99 beta[4] 20 38030 3746 10.20 deviance 20 36800 3746 9.82 sigma 20 36200 3746 9.66
- Autocorrelation diagnostic
autocorr.diag(data.mcmc)
beta[1] beta[2] beta[3] beta[4] deviance sigma Lag 0 1.0000000000 1.0000000000 1.000000000 1.000000000 1.0000000000 1.0000000000 Lag 10 -0.0086789407 0.0002409218 -0.012834845 -0.014876564 -0.0155077394 -0.0009823887 Lag 50 -0.0133637002 -0.0146591155 -0.022266022 0.007955582 -0.0131845968 -0.0066436459 Lag 100 0.0007452402 0.0128511117 -0.001800657 -0.007853232 0.0008253937 -0.0101001901 Lag 500 -0.0029682531 0.0011976990 0.001848702 -0.005150388 -0.0016034261 -0.0118212907
Again, prior to examining the summaries, we should have explored the convergence diagnostics. There are numerous ways of working with STAN model fits (for exploring diagnostics and summarization).
- extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
- use the numerous routines that come with the rstan package
- use the routines that come with the bayesplot package
- explore the diagnostics interactively via shinystan
- via coda
- Traceplots
- Autocorrelation
library(coda) s = as.array(data.rstan) wch = grep("beta", dimnames(s)$parameters) s = s[, , wch] mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) plot(mcmc)
library(coda) s = as.array(data.rstan) wch = grep("beta", dimnames(s)$parameters) s = s[, , wch] mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) autocorr.diag(mcmc)
beta[1] beta[2] beta[3] Lag 0 1.000000000 1.0000000000 1.000000000 Lag 1 0.064013692 0.0605531343 0.026397084 Lag 5 0.007321517 0.0009337118 0.006726571 Lag 10 -0.016376224 -0.0098527765 -0.009376572 Lag 50 -0.017683148 -0.0290390559 0.010863075
- via rstan
- Traceplots
stan_trace(data.rstan)
- Raftery diagnostic
raftery.diag(data.rstan)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
- Autocorrelation diagnostic
stan_ac(data.rstan)
- Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05.
If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this
mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling
in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or
overly vague priors that permit sampling in otherwise nonscence parameter space.
stan_rhat(data.rstan)
- Another measure of sampling efficiency is Effective Sample Size (ess).
ess indicate the number samples (or proportion of samples that the sampling algorithm deamed effective. The sampler rejects samples
on the basis of certain criterion and when it does so, the previous sample value is used. Hence while the MCMC sampling chain
may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
stan_ess(data.rstan)
- Traceplots
- via bayesplot
- Trace plots and density plots
library(bayesplot) mcmc_trace(as.matrix(data.rstan), regex_pars = "beta|sigma")
library(bayesplot) mcmc_combo(as.matrix(data.rstan), regex_pars = "beta|sigma")
- Density plots
library(bayesplot) mcmc_dens(as.matrix(data.rstan), regex_pars = "beta|sigma")
- Trace plots and density plots
- via shinystan
library(shinystan) launch_shinystan(data.rstan)
- It is worth exploring the influence of our priors.
Again, prior to examining the summaries, we should have explored the convergence diagnostics. There are numerous ways of working with STANARM model fits (for exploring diagnostics and summarization).
- extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
- use the numerous routines that come with the rstan package
- use the routines that come with the bayesplot package
- explore the diagnostics interactively via shinystan
- via coda
- Traceplots
- Autocorrelation
library(coda) s = as.array(data.rstanarm) mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) plot(mcmc)
library(coda) s = as.array(data.rstanarm) mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) autocorr.diag(mcmc)
(Intercept) AGroup B AGroup C B Lag 0 1.0000000000 1.000000000 1.000000000 1.000000000 Lag 1 0.0290559273 0.001379705 -0.023247646 -0.018452103 Lag 5 -0.0035230620 -0.003667234 0.004820692 0.008991159 Lag 10 0.0180539117 -0.016292304 0.028751812 0.024889288 Lag 50 0.0008045354 -0.013900306 0.008315425 0.009651236
- via rstan
- Traceplots
stan_trace(data.rstanarm)
- Raftery diagnostic
raftery.diag(data.rstanarm)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
- Autocorrelation diagnostic
stan_ac(data.rstanarm)
- Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05.
If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this
mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling
in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or
overly vague priors that permit sampling in otherwise nonscence parameter space.
stan_rhat(data.rstanarm)
- Another measure of sampling efficiency is Effective Sample Size (ess).
ess indicate the number samples (or proportion of samples that the sampling algorithm deamed effective. The sampler rejects samples
on the basis of certain criterion and when it does so, the previous sample value is used. Hence while the MCMC sampling chain
may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
stan_ess(data.rstanarm)
- Traceplots
- via bayesplot
- Trace plots and density plots
library(bayesplot) mcmc_trace(as.array(data.rstanarm), regex_pars = "Intercept|x|sigma")
mcmc_combo(as.array(data.rstanarm))
- Density plots
mcmc_dens(as.array(data.rstanarm))
- Trace plots and density plots
- via rstanarm
The rstanarm package provides additional posterior checks.- Posterior vs Prior - this compares the posterior estimate for each parameter against the associated prior.
If the spread of the priors is small relative to the posterior, then it is likely that the priors are too influential.
On the other hand, overly wide priors can lead to computational issues.
library(rstanarm) posterior_vs_prior(data.rstanarm, color_by = "vs", group_by = TRUE, facet_args = list(scales = "free_y"))
Gradient evaluation took 3.6e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.36 seconds. Adjust your expectations accordingly! Elapsed Time: 0.318686 seconds (Warm-up) 0.056968 seconds (Sampling) 0.375654 seconds (Total) Gradient evaluation took 1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds. Adjust your expectations accordingly! Elapsed Time: 0.371641 seconds (Warm-up) 0.056263 seconds (Sampling) 0.427904 seconds (Total)
- Posterior vs Prior - this compares the posterior estimate for each parameter against the associated prior.
If the spread of the priors is small relative to the posterior, then it is likely that the priors are too influential.
On the other hand, overly wide priors can lead to computational issues.
- via shinystan
library(shinystan) launch_shinystan(data.rstanarm))
Again, prior to examining the summaries, we should have explored the convergence diagnostics. Rather than dublicate this for both additive and multiplicative models, we will only explore the multiplicative model. There are numerous ways of working with STAN model fits (for exploring diagnostics and summarization).
- extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
- use the numerous routines that come with the rstan package
- use the routines that come with the bayesplot package
- explore the diagnostics interactively via shinystan
- via coda
- Traceplots
- Autocorrelation
library(coda) mcmc = as.mcmc(data.brms) plot(mcmc)
library(coda) mcmc = as.mcmc(data.brms) autocorr.diag(mcmc)
Error in ts(x, start = start(x), end = end(x), deltat = thin(x)): invalid time series parameters specified
- via rstan
- Traceplots
stan_trace(data.brms$fit)
- Raftery diagnostic
raftery.diag(data.brms)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
- Autocorrelation diagnostic
stan_ac(data.brms$fit)
- Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05.
If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this
mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling
in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or
overly vague priors that permit sampling in otherwise nonscence parameter space.
stan_rhat(data.brms$fit)
- Another measure of sampling efficiency is Effective Sample Size (ess).
ess indicate the number samples (or proportion of samples that the sampling algorithm deamed effective. The sampler rejects samples
on the basis of certain criterion and when it does so, the previous sample value is used. Hence while the MCMC sampling chain
may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
stan_ess(data.brms$fit)
- Traceplots
Model validation
Model validation involves exploring the model diagnostics and fit to ensure that the model is broadly appropriate for the data. As such, exploration of the residuals should be routine.
For more complex models (those that contain multiple effects, it is also advisable to plot the residuals against each of the individual predictors. For sampling designs that involve sample collection over space or time, it is also a good idea to explore whether there are any temporal or spatial patterns in the residuals.
There are numerous situations (e.g. when applying specific variance-covariance structures to a model) where raw residuals do not reflect the interior workings of the model. Typically, this is because they do not take into account the variance-covariance matrix or assume a very simple variance-covariance matrix. Since the purpose of exploring residuals is to evaluate the model, for these cases, it is arguably better to draw conclusions based on standardized (or studentized) residuals.
Unfortunately the definitions of standardized and studentized residuals appears to vary and the two terms get used interchangeably. I will adopt the following definitions:
Standardized residuals: | the raw residuals divided by the true standard deviation of the residuals (which of course is rarely known). | |
Studentized residuals: | the raw residuals divided by the standard deviation of the residuals. Note that externally studentized residuals are calculated by dividing the raw residuals by a unique standard deviation for each observation that is calculated from regressions having left each successive observation out. | |
Pearson residuals: | the raw residuals divided by the standard deviation of the response variable. |
The mark of a good model is being able to predict well. In an ideal world, we would have sufficiently large sample size as to permit us to hold a fraction (such as 25%) back thereby allowing us to train the model on 75% of the data and then see how well the model can predict the withheld 25%. Unfortunately, such a luxury is still rare in ecology.
The next best option is to see how well the model can predict the observed data. Models tend to struggle most with the extremes of trends and have particular issues when the extremes approach logical boundaries (such as zero for count data and standard deviations). We can use the fitted model to generate random predicted observations and then explore some properties of these compared to the actual observed data.
Residuals are not computed directly within MCMCpack. However, we can calculate them manually form the posteriors.
mcmc = as.data.frame(data.mcmcpack) # generate a model matrix newdata = data Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates head(mcmc)
(Intercept) AGroup B AGroup C B sigma2 1 50.18839 -18.23379 -22.87414 -0.4394640 14.11917 2 49.39556 -15.64877 -20.98086 -0.4528605 13.22004 3 49.44854 -16.36367 -23.66886 -0.3887300 15.57976 4 51.46891 -16.26642 -24.26313 -0.4709924 12.01597 5 50.55617 -17.54613 -22.44262 -0.4252965 16.35805 6 50.73340 -17.54078 -23.75846 -0.4598987 10.84731
coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$Y - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
mcmc = as.data.frame(data.mcmcpack) # generate a model matrix newdata = newdata Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$Y - fit newdata = newdata %>% cbind(fit, resid) ggplot(newdata) + geom_point(aes(y = resid, x = A))
ggplot(newdata) + geom_point(aes(y = resid, x = B))
And now for studentized residuals
mcmc = as.data.frame(data.mcmcpack) # generate a model matrix newdata = data Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$Y - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
mcmc = as.matrix(data.mcmcpack) # generate a model matrix Xmat = model.matrix(~A + B, data) ## get median parameter estimates coefs = mcmc[, 1:4] fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data), fit[i, ], sqrt(mcmc[i, "sigma2"]))) newdata = data.frame(A = data$A, B = data$B, yRep) %>% gather(key = Sample, value = Value, -A, -B) ggplot(newdata) + geom_violin(aes(y = Value, x = A, fill = "Model"), alpha = 0.5) + geom_violin(data = data, aes(y = Y, x = A, fill = "Obs"), alpha = 0.5) + geom_point(data = data, aes(y = Y, x = A), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata) + geom_violin(aes(y = Value, x = B, fill = "Model", group = B, color = A), alpha = 0.5) + geom_point(data = data, aes(y = Y, x = B, group = B, color = A))
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data.mcmcpack), regex_pars = "Intercept|^A|^B|sigma")
mcmc_areas(as.matrix(data.mcmcpack), regex_pars = "Intercept|^A|^B|sigma")
Residuals are not computed directly within JAGS. However, we can calculate them manually form the posteriors.
mcmc = data.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata = data Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$Y - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
mcmc = data.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata = newdata Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$Y - fit newdata = newdata %>% cbind(fit, resid) ggplot(newdata) + geom_point(aes(y = resid, x = A))
ggplot(newdata) + geom_point(aes(y = resid, x = B))
And now for studentized residuals
mcmc = data.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata = data Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$Y - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
mcmc = data.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix Xmat = model.matrix(~A + B, data) ## get median parameter estimates coefs = mcmc[, 1:4] fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data), fit[i, ], mcmc[i, "sigma"])) newdata = data.frame(A = data$A, B = data$B, yRep) %>% gather(key = Sample, value = Value, -A, -B) ggplot(newdata) + geom_violin(aes(y = Value, x = A, fill = "Model"), alpha = 0.5) + geom_violin(data = data, aes(y = Y, x = A, fill = "Obs"), alpha = 0.5) + geom_point(data = data, aes(y = Y, x = A), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata) + geom_violin(aes(y = Value, x = B, fill = "Model", group = B, color = A), alpha = 0.5) + geom_point(data = data, aes(y = Y, x = B, group = B, color = A))
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(data.r2jags$BUGSoutput$sims.matrix, regex_pars = "beta|sigma")
mcmc_areas(data.r2jags$BUGSoutput$sims.matrix, regex_pars = "beta|sigma")
Residuals are not computed directly within RSTAN. However, we can calculate them manually form the posteriors.
mcmc = as.data.frame(data.rstan) %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata = data Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$Y - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
mcmc = as.data.frame(data.rstan) %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata = newdata Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$Y - fit newdata = newdata %>% cbind(fit, resid) ggplot(newdata) + geom_point(aes(y = resid, x = A))
ggplot(newdata) + geom_point(aes(y = resid, x = B))
And now for studentized residuals
mcmc = as.data.frame(data.rstan) %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata = data Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates coefs = apply(mcmc[, 1:4], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data$Y - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
mcmc = as.data.frame(data.rstan) %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix Xmat = model.matrix(~A + B, data) ## get median parameter estimates coefs = mcmc[, 1:4] fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data), fit[i, ], mcmc[i, "sigma"])) newdata = data.frame(A = data$A, B = data$B, yRep) %>% gather(key = Sample, value = Value, -A, -B) ggplot(newdata) + geom_violin(aes(y = Value, x = A, fill = "Model"), alpha = 0.5) + geom_violin(data = data, aes(y = Y, x = A, fill = "Obs"), alpha = 0.5) + geom_point(data = data, aes(y = Y, x = A), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata) + geom_violin(aes(y = Value, x = B, fill = "Model", group = B, color = A), alpha = 0.5) + geom_point(data = data, aes(y = Y, x = B, group = B, color = A))
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data.rstan), regex_pars = "beta|sigma")
mcmc_areas(as.matrix(data.rstan), regex_pars = "beta|sigma")
Residuals are not computed directly within RSTANARM. However, we can calculate them manually form the posteriors.
resid = resid(data.rstanarm) fit = fitted(data.rstanarm) ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
resid = resid(data.rstanarm) dat = data %>% mutate(resid = resid) ggplot(dat) + geom_point(aes(y = resid, x = A))
ggplot(dat) + geom_point(aes(y = resid, x = B))
And now for studentized residuals
resid = resid(data.rstanarm) sresid = resid/sd(resid) fit = fitted(data.rstanarm) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
y_pred = posterior_predict(data.rstanarm) newdata = data %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -Y:-B) ggplot(newdata) + geom_violin(aes(y = Value, x = A, fill = "Model"), alpha = 0.5) + geom_violin(data = data, aes(y = Y, x = A, fill = "Obs"), alpha = 0.5) + geom_point(data = data, aes(y = Y, x = A), position = position_jitter(width = 0.1, height = 0), color = "black")
Error: Aesthetics must be either length 1 or the same as the data (67530): x, y, fill
ggplot(newdata) + geom_violin(aes(y = Value, x = B, fill = "Model", group = B, color = A), alpha = 0.5) + geom_point(data = data, aes(y = Y, x = B, group = B, color = A))
Error: Aesthetics must be either length 1 or the same as the data (67530): x, y, fill, group, colour
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data.rstanarm), regex_pars = "Intercept|^A|^B|sigma")
mcmc_areas(as.matrix(data.rstanarm), regex_pars = "Intercept|^A|^B|sigma")
Residuals are not computed directly within BRMS. However, we can calculate them manually form the posteriors.
resid = resid(data.brms)[, "Estimate"] fit = fitted(data.brms)[, "Estimate"] ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
resid = resid(data.brms)[, "Estimate"] dat = data %>% mutate(resid = resid) ggplot(dat) + geom_point(aes(y = resid, x = A))
ggplot(dat) + geom_point(aes(y = resid, x = B))
And now for studentized residuals
resid = resid(data.brms)[, "Estimate"] sresid = resid/sd(resid) fit = fitted(data.brms)[, "Estimate"] ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
y_pred = posterior_predict(data.brms) newdata = data %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -Y:-B) ggplot(newdata) + geom_violin(aes(y = Value, x = A, fill = "Model"), alpha = 0.5) + geom_violin(data = data, aes(y = Y, x = A, fill = "Obs"), alpha = 0.5) + geom_point(data = data, aes(y = Y, x = A), position = position_jitter(width = 0.1, height = 0), color = "black")
Error: Aesthetics must be either length 1 or the same as the data (67530): x, y, fill
ggplot(newdata) + geom_violin(aes(y = Value, x = B, fill = "Model", group = B, color = A), alpha = 0.5) + geom_point(data = data, aes(y = Y, x = B, group = B, color = A))
Error: Aesthetics must be either length 1 or the same as the data (67530): x, y, fill, group, colour
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data.brms), regex_pars = "b_|sigma")
mcmc_areas(as.matrix(data.brms), regex_pars = "b_|sigma")
Parameter estimates (posterior summaries)
Although all parameters in a Bayesian analysis are considered random and are considered a distribution, rarely would it be useful to present tables of all the samples from each distribution. On the other hand, plots of the posterior distributions are do have some use. Nevertheless, most workers prefer to present simple statistical summaries of the posteriors. Popular choices include the median (or mean) and 95% credibility intervals.
library(coda) mcmcpvalue <- function(samp) { ## elementary version that creates an empirical p-value for the ## hypothesis that the columns of samp have mean zero versus a general ## multivariate distribution with elliptical contours. ## differences from the mean standardized by the observed ## variance-covariance factor ## Note, I put in the bit for single terms if (length(dim(samp)) == 0) { std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - mean(samp), transpose = TRUE) sqdist <- colSums(std * std) sum(sqdist[-1] > sqdist[1])/length(samp) } else { std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - colMeans(samp), transpose = TRUE) sqdist <- colSums(std * std) sum(sqdist[-1] > sqdist[1])/nrow(samp) } }
Matrix model (MCMCpack)
summary(data.mcmcpack)
Iterations = 1001:11000 Thinning interval = 1 Number of chains = 1 Sample size per chain = 10000 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE (Intercept) 50.2015 1.7846 0.017846 0.0178458 AGroup B -17.1433 1.8782 0.018782 0.0187821 AGroup C -22.9133 1.8682 0.018682 0.0186817 B -0.4138 0.0637 0.000637 0.0006543 sigma2 17.0079 5.0889 0.050889 0.0581569 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% (Intercept) 46.7714 49.0274 50.2019 51.3437 53.7977 AGroup B -20.8797 -18.3726 -17.1445 -15.8884 -13.4552 AGroup C -26.6231 -24.1304 -22.9030 -21.6955 -19.1900 B -0.5397 -0.4556 -0.4134 -0.3714 -0.2885 sigma2 9.7749 13.4022 16.1334 19.6823 29.3438
# OR library(broom) tidyMCMC(data.mcmcpack, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 50.2014506 1.78458442 46.7660798 53.7766054 2 AGroup B -17.1433463 1.87821127 -20.7503649 -13.3485818 3 AGroup C -22.9132573 1.86817189 -26.6020778 -19.1852142 4 B -0.4137879 0.06369625 -0.5369301 -0.2860994 5 sigma2 17.0078626 5.08890812 8.9411967 27.1577316
- the intercept of the first group (Group A) is
50.2014506
- the mean of the second group (Group B) is
-17.1433463
units greater than (A) - the mean of the third group (Group C) is
-22.9132573
units greater than (A) - a one unit increase in B in Group A is associated with a
-0.4137879
units increase in Y
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(data.mcmcpack[, 2]) # effect of (B-A = 0)
[1] 0
mcmcpvalue(data.mcmcpack[, 3]) # effect of (C-A = 0)
[1] 0
mcmcpvalue(data.mcmcpack[, 4]) # effect of (slope = 0)
[1] 0
mcmcpvalue(data.mcmcpack[, 2:4]) # effect of (model)
[1] 0
There is evidence that the reponse differs between the groups.
Matrix model (JAGS)
print(data.r2jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10 n.sims = 15000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] 50.201 1.803 46.617 49.021 50.207 51.404 53.741 1.001 8400 beta[2] -17.141 1.879 -20.832 -18.393 -17.149 -15.910 -13.402 1.001 8400 beta[3] -22.917 1.882 -26.640 -24.161 -22.922 -21.682 -19.181 1.001 5900 beta[4] -0.414 0.065 -0.543 -0.457 -0.414 -0.372 -0.285 1.001 15000 sigma 4.175 0.619 3.178 3.735 4.108 4.531 5.587 1.001 7200 deviance 169.232 3.548 164.517 166.636 168.502 171.075 178.013 1.001 15000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 6.3 and DIC = 175.5 DIC is an estimate of expected predictive error (lower deviance is better).
# OR library(broom) tidyMCMC(as.mcmc(data.r2jags), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 beta[1] 50.2007407 1.80285405 46.5994039 53.6923687 2 beta[2] -17.1407725 1.87934256 -20.8989661 -13.5038133 3 beta[3] -22.9165431 1.88243441 -26.5907610 -19.1480417 4 beta[4] -0.4139048 0.06529536 -0.5434197 -0.2853424 5 deviance 169.2319386 3.54753309 163.9376050 176.1387130 6 sigma 4.1749569 0.61875758 3.0490746 5.3897463
- the intercept of the first group (Group A) is
50.2007407
- the mean of the second group (Group B) is
-17.1407725
units greater than (A) - the mean of the third group (Group C) is
-22.9165431
units greater than (A) - a one unit increase in B in Group A is associated with a
-0.4139048
units increase in Y
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(data.r2jags$BUGSoutput$sims.matrix[, "beta[2]"]) # effect of (B-A = 0)
[1] 0
mcmcpvalue(data.r2jags$BUGSoutput$sims.matrix[, "beta[3]"]) # effect of (C-A = 0)
[1] 0
mcmcpvalue(data.r2jags$BUGSoutput$sims.matrix[, "beta[4]"]) # effect of (slope = 0)
[1] 0
mcmcpvalue(data.r2jags$BUGSoutput$sims.matrix[, 2:4]) # effect of (model)
[1] 0
There is evidence that the reponse differs between the groups.
Matrix model (RSTAN)
print(data.rstan, pars = c("beta", "sigma"))
Inference for Stan model: 3d2414c9dcf4b5e12be870eadd2c894a. 3 chains, each with iter=2000; warmup=500; thin=3; post-warmup draws per chain=500, total post-warmup draws=1500. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 50.22 0.06 1.79 46.60 49.03 50.24 51.40 53.83 1027 1 beta[2] -17.12 0.06 1.92 -20.88 -18.31 -17.14 -15.96 -13.19 1210 1 beta[3] -22.88 0.05 1.93 -26.78 -24.11 -22.86 -21.58 -19.24 1407 1 beta[4] -0.42 0.00 0.06 -0.54 -0.46 -0.42 -0.37 -0.29 1282 1 sigma 4.11 0.02 0.58 3.12 3.70 4.05 4.45 5.37 1353 1 Samples were drawn using NUTS(diag_e) at Fri Nov 3 15:05:11 2017. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
# OR library(broom) tidyMCMC(data.rstan, conf.int = TRUE, conf.method = "HPDinterval", pars = c("beta", "sigma"))
term estimate std.error conf.low conf.high 1 beta[1] 50.2161150 1.78767398 46.4168478 53.5737947 2 beta[2] -17.1206004 1.91600172 -20.7216779 -13.0233814 3 beta[3] -22.8794576 1.92793878 -26.8184381 -19.3130173 4 beta[4] -0.4157079 0.06337683 -0.5361929 -0.2891039 5 sigma 4.1094504 0.58022288 3.0779992 5.3039276
- the intercept of the first group (Group A) is
50.216115
- the mean of the second group (Group B) is
-17.1206004
units greater than (A) - the mean of the third group (Group C) is
-22.8794576
units greater than (A) - a one unit increase in B in Group A is associated with a
-0.4157079
units increase in Y
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(as.matrix(data.rstan)[, "beta[2]"]) # effect of (B-A = 0)
[1] 0
mcmcpvalue(as.matrix(data.rstan)[, "beta[3]"]) # effect of (C-A = 0)
[1] 0
mcmcpvalue(as.matrix(data.rstan)[, "beta[4]"]) # effect of (slope = 0)
[1] 0
mcmcpvalue(as.matrix(data.rstan)[, 2:4]) # effect of (model)
[1] 0
There is evidence that the reponse differs between the groups.
library(loo) (full = loo(extract_log_lik(data.rstan)))
Computed from 1500 by 30 log-likelihood matrix Estimate SE elpd_loo -87.3 3.5 p_loo 4.6 1.0 looic 174.5 7.1 Pareto k diagnostic values: Count Pct (-Inf, 0.5] (good) 29 96.7% (0.5, 0.7] (ok) 1 3.3% (0.7, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0% All Pareto k estimates are ok (k < 0.7) See help('pareto-k-diagnostic') for details.
# now fit a model without main factor modelString = " data { int<lower=1> n; int<lower=1> nX; vector [n] y; matrix [n,nX] X; } parameters { vector[nX] beta; real<lower=0> sigma; } transformed parameters { vector[n] mu; mu = X*beta; } model { #Likelihood y~normal(mu,sigma); #Priors beta ~ normal(0,1000); sigma~cauchy(0,5); } generated quantities { vector[n] log_lik; for (i in 1:n) { log_lik[i] = normal_lpdf(y[i] | mu[i], sigma); } } " Xmat <- model.matrix(~1, data) data.list <- with(data, list(y = Y, X = Xmat, n = nrow(data), nX = ncol(Xmat))) data.rstan.red <- stan(data = data.list, model_code = modelString, chains = 3, iter = 2000, warmup = 500, thin = 3)
In file included from /usr/local/lib/R/site-library/BH/include/boost/config.hpp:39:0, from /usr/local/lib/R/site-library/BH/include/boost/math/tools/config.hpp:13, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/var.hpp:7, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core.hpp:12, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4, from file12361ebf982d.cpp:8: /usr/local/lib/R/site-library/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined # define BOOST_NO_CXX11_RVALUE_REFERENCES ^ <command-line>:0:0: note: this is the location of the previous definition SAMPLING FOR MODEL '3b057d3d81cbed2078ce678376a94574' NOW (CHAIN 1). Gradient evaluation took 1.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.018509 seconds (Warm-up) 0.041875 seconds (Sampling) 0.060384 seconds (Total) SAMPLING FOR MODEL '3b057d3d81cbed2078ce678376a94574' NOW (CHAIN 2). Gradient evaluation took 7e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.017436 seconds (Warm-up) 0.046029 seconds (Sampling) 0.063465 seconds (Total) SAMPLING FOR MODEL '3b057d3d81cbed2078ce678376a94574' NOW (CHAIN 3). Gradient evaluation took 6e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.012316 seconds (Warm-up) 0.026582 seconds (Sampling) 0.038898 seconds (Total)
(reduced = loo(extract_log_lik(data.rstan.red)))
Computed from 1500 by 30 log-likelihood matrix Estimate SE elpd_loo -116.0 3.2 p_loo 1.6 0.4 looic 232.0 6.4 All Pareto k estimates are good (k < 0.5) See help('pareto-k-diagnostic') for details.
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
compare_models(full, reduced)
Error in discrete == discrete[1]: comparison of these types is not implemented
Matrix model (RSTANARM)
summary(data.rstanarm)
Model Info: function: stan_glm family: gaussian [identity] formula: Y ~ A + B algorithm: sampling priors: see help('prior_summary') sample: 2250 (posterior sample size) num obs: 30 Estimates: mean sd 2.5% 25% 50% 75% 97.5% (Intercept) 50.2 1.9 46.4 49.0 50.2 51.5 54.1 AGroup B -17.2 1.9 -21.0 -18.5 -17.2 -15.9 -13.3 AGroup C -22.9 2.0 -26.6 -24.2 -22.9 -21.6 -19.0 B -0.4 0.1 -0.5 -0.5 -0.4 -0.4 -0.3 sigma 4.2 0.6 3.2 3.8 4.1 4.5 5.6 mean_PPD 30.0 1.1 27.8 29.3 30.0 30.7 32.2 log-posterior -98.3 1.8 -103.1 -99.1 -97.9 -96.9 -95.9 Diagnostics: mcse Rhat n_eff (Intercept) 0.0 1.0 1980 AGroup B 0.0 1.0 2145 AGroup C 0.0 1.0 2250 B 0.0 1.0 2250 sigma 0.0 1.0 1200 mean_PPD 0.0 1.0 1720 log-posterior 0.1 1.0 960 For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# OR library(broom) tidyMCMC(data.rstanarm$stanfit, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 50.2286902 1.9365801 46.4428738 54.1263008 2 AGroup B -17.2114440 1.9386491 -21.0066484 -13.4338089 3 AGroup C -22.9037572 1.9679361 -26.6757211 -19.0189639 4 B -0.4138748 0.0681432 -0.5451839 -0.2852376 5 sigma 4.1990238 0.6317351 3.0581470 5.4328442 6 mean_PPD 30.0266842 1.1171901 27.8860300 32.2422092 7 log-posterior -98.2605466 1.8364590 -101.7802647 -95.7415132
- the intercept of the first group (Group A) is
50.2286902
- the mean of the second group (Group B) is
-17.211444
units greater than (A) - the mean of the third group (Group C) is
-22.9037572
units greater than (A) - a one unit increase in B in Group A is associated with a
-0.4138748
units increase in Y
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero head(as.matrix(data.rstanarm))
parameters iterations (Intercept) AGroup B AGroup C B sigma [1,] 49.25161 -18.13674 -20.71374 -0.3963549 4.384061 [2,] 50.28149 -20.18446 -23.60224 -0.3823662 4.601748 [3,] 47.22865 -13.74348 -20.42610 -0.3439257 3.559008 [4,] 50.36383 -18.72387 -23.75338 -0.3776768 4.211066 [5,] 49.33028 -18.88190 -20.94786 -0.3634085 3.830714 [6,] 49.08087 -14.40159 -22.87480 -0.4144611 3.793997
mcmcpvalue(as.matrix(data.rstanarm)[, "AGroup B"]) # effect of (B-A = 0)
[1] 0
mcmcpvalue(as.matrix(data.rstanarm)[, "AGroup C"]) # effect of (C-A = 0)
[1] 0
mcmcpvalue(as.matrix(data.rstanarm)[, "B"]) # effect of (slope = 0)
[1] 0
mcmcpvalue(as.matrix(data.rstanarm)[, 2:4]) # effect of (model)
[1] 0
There is evidence that the reponse differs between the groups. There is very little evidence that the response of group D differs from that of group A
library(loo) (full = loo(data.rstanarm))
Computed from 2250 by 30 log-likelihood matrix Estimate SE elpd_loo -87.2 3.3 p_loo 4.3 0.9 looic 174.4 6.7 All Pareto k estimates are good (k < 0.5) See help('pareto-k-diagnostic') for details.
data.rstanarm.red = update(data.rstanarm, . ~ 1)
Gradient evaluation took 3.8e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.38 seconds. Adjust your expectations accordingly! Elapsed Time: 0.027383 seconds (Warm-up) 0.039991 seconds (Sampling) 0.067374 seconds (Total) Gradient evaluation took 1.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds. Adjust your expectations accordingly! Elapsed Time: 0.0218 seconds (Warm-up) 0.039113 seconds (Sampling) 0.060913 seconds (Total) Gradient evaluation took 1.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds. Adjust your expectations accordingly! Elapsed Time: 0.02501 seconds (Warm-up) 0.043676 seconds (Sampling) 0.068686 seconds (Total)
(reduced = loo(data.rstanarm.red))
Computed from 2250 by 30 log-likelihood matrix Estimate SE elpd_loo -116.0 3.1 p_loo 1.5 0.4 looic 232.0 6.1 All Pareto k estimates are good (k < 0.5) See help('pareto-k-diagnostic') for details.
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
compare_models(full, reduced)
elpd_diff se -28.8 4.5
Matrix model (BRMS)
summary(data.brms)
Family: gaussian(identity) Formula: Y ~ A + B Data: data (Number of observations: 30) Samples: 3 chains, each with iter = 2000; warmup = 500; thin = 2; total post-warmup samples = 2250 ICs: LOO = NA; WAIC = NA; R2 = NA Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 50.20 1.76 46.79 53.75 2177 1 AGroupB -17.17 1.83 -20.81 -13.68 2158 1 AGroupC -22.89 1.85 -26.43 -19.25 2127 1 B -0.41 0.06 -0.54 -0.29 2129 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 4.11 0.59 3.14 5.41 1854 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
# OR library(broom) tidyMCMC(data.brms$fit, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 b_Intercept 50.1970335 1.75540856 46.6017008 53.5440980 2 b_AGroupB -17.1728328 1.83135058 -20.6803108 -13.5582289 3 b_AGroupC -22.8915861 1.85449337 -26.3846325 -19.2178870 4 b_B -0.4122714 0.06370931 -0.5416812 -0.2961189 5 sigma 4.1081576 0.58644550 3.1075609 5.3375706
- the intercept of the first group (Group A) is
50.1970335
- the mean of the second group (Group B) is
-17.1728328
units greater than (A) - the mean of the third group (Group C) is
-22.8915861
units greater than (A) - a one unit increase in B in Group A is associated with a
-0.4122714
units increase in Y
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero head(as.matrix(data.brms))
parameters iterations b_Intercept b_AGroupB b_AGroupC b_B sigma lp__ [1,] 46.14429 -17.89799 -23.20626 -0.1747196 4.234569 -112.8789 [2,] 50.46466 -17.00736 -21.94628 -0.4189975 3.964478 -105.7635 [3,] 49.51423 -16.74688 -22.42555 -0.4566537 3.784797 -106.7988 [4,] 52.03476 -20.63035 -26.45616 -0.3795796 4.637355 -108.6522 [5,] 49.58570 -18.21573 -23.70766 -0.4119922 4.153369 -106.9716 [6,] 51.87136 -19.76110 -26.12697 -0.4639165 3.485067 -109.3362
mcmcpvalue(as.matrix(data.brms)[, "b_AGroupB"]) # effect of (B-A = 0)
[1] 0
mcmcpvalue(as.matrix(data.brms)[, "b_AGroupC"]) # effect of (C-A = 0)
[1] 0
mcmcpvalue(as.matrix(data.brms)[, "b_B"]) # effect of (slope = 0)
[1] 0
mcmcpvalue(as.matrix(data.brms)[, 2:4]) # effect of (model)
[1] 0
There is evidence that the reponse differs between the groups. There is very little evidence that the response of group D differs from that of group A
library(loo) (full = loo(data.brms))
LOOIC SE 174.34 7.05
data.brms.red = update(data.brms, . ~ 1)
SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 1). Gradient evaluation took 1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 600 / 2000 [ 30%] (Warmup) Iteration: 800 / 2000 [ 40%] (Warmup) Iteration: 1000 / 2000 [ 50%] (Warmup) Iteration: 1001 / 2000 [ 50%] (Sampling) Iteration: 1200 / 2000 [ 60%] (Sampling) Iteration: 1400 / 2000 [ 70%] (Sampling) Iteration: 1600 / 2000 [ 80%] (Sampling) Iteration: 1800 / 2000 [ 90%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.019211 seconds (Warm-up) 0.014546 seconds (Sampling) 0.033757 seconds (Total) SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 2). Gradient evaluation took 4e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 600 / 2000 [ 30%] (Warmup) Iteration: 800 / 2000 [ 40%] (Warmup) Iteration: 1000 / 2000 [ 50%] (Warmup) Iteration: 1001 / 2000 [ 50%] (Sampling) Iteration: 1200 / 2000 [ 60%] (Sampling) Iteration: 1400 / 2000 [ 70%] (Sampling) Iteration: 1600 / 2000 [ 80%] (Sampling) Iteration: 1800 / 2000 [ 90%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.015666 seconds (Warm-up) 0.017803 seconds (Sampling) 0.033469 seconds (Total) SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 3). Gradient evaluation took 4e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 600 / 2000 [ 30%] (Warmup) Iteration: 800 / 2000 [ 40%] (Warmup) Iteration: 1000 / 2000 [ 50%] (Warmup) Iteration: 1001 / 2000 [ 50%] (Sampling) Iteration: 1200 / 2000 [ 60%] (Sampling) Iteration: 1400 / 2000 [ 70%] (Sampling) Iteration: 1600 / 2000 [ 80%] (Sampling) Iteration: 1800 / 2000 [ 90%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.016449 seconds (Warm-up) 0.016509 seconds (Sampling) 0.032958 seconds (Total)
(reduced = loo(data.brms.red))
LOOIC SE 232.15 6.38
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
compare_models(full, reduced)
Error in discrete == discrete[1]: comparison of these types is not implemented
Graphical summaries
A nice graphic is often a great accompaniment to a statistical analysis. Although there are no fixed assumptions associated with graphing (in contrast to statistical analyses), we often want the graphical summaries to reflect the associated statistical analyses. After all, the sample is just one perspective on the population(s). What we are more interested in is being able to estimate and depict likely population parameters/trends.
Thus, whilst we could easily provide a plot displaying the raw data along with simple measures of location and spread, arguably, we should use estimates that reflect the fitted model. In this case, it would be appropriate to plot the credibility interval associated with each group.
Matrix model (MCMCpack)
mcmc = data.mcmcpack ## Calculate the fitted values newdata = expand.grid(A = levels(data$A), B = seq(min(data$B), max(data$B), len = 100)) Xmat = model.matrix(~A + B, newdata) coefs = mcmc[, 1:4] fit = coefs %*% t(Xmat) newdata = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ggplot(newdata, aes(y = estimate, x = B, fill = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
As this is simple single factor ANOVA, we can simple add the raw data to this figure. For more complex designs with additional predictors, it is necessary to plot partial residuals.
## Calculate partial residuals fitted values fdata = rdata = data fMat = rMat = model.matrix(~A + B, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$Y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) ggplot(newdata, aes(y = estimate, x = B, fill = A)) + geom_point(data = rdata, aes(y = partial.resid, x = B, color = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
Matrix model (JAGS)
mcmc = data.r2jags$BUGSoutput$sims.matrix ## Calculate the fitted values newdata = expand.grid(A = levels(data$A), B = seq(min(data$B), max(data$B), len = 100)) Xmat = model.matrix(~A + B, newdata) coefs = mcmc[, c("beta[1]", "beta[2]", "beta[3]", "beta[4]")] fit = coefs %*% t(Xmat) newdata = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ggplot(newdata, aes(y = estimate, x = B, fill = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
As this is simple single factor ANOVA, we can simple add the raw data to this figure. For more complex designs with additional predictors, it is necessary to plot partial residuals.
## Calculate partial residuals fitted values fdata = rdata = data fMat = rMat = model.matrix(~A + B, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$Y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) ggplot(newdata, aes(y = estimate, x = B, fill = A)) + geom_point(data = rdata, aes(y = partial.resid, x = B, color = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
Matrix model (RSTAN)
mcmc = as.matrix(data.rstan) ## Calculate the fitted values newdata = expand.grid(A = levels(data$A), B = seq(min(data$B), max(data$B), len = 100)) Xmat = model.matrix(~A + B, newdata) coefs = mcmc[, c("beta[1]", "beta[2]", "beta[3]", "beta[4]")] fit = coefs %*% t(Xmat) newdata = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ggplot(newdata, aes(y = estimate, x = B, fill = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
As this is simple single factor ANOVA, we can simple add the raw data to this figure. For more complex designs with additional predictors, it is necessary to plot partial residuals.
## Calculate partial residuals fitted values fdata = rdata = data fMat = rMat = model.matrix(~A + B, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data$Y - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = resid + fit) ggplot(newdata, aes(y = estimate, x = B, fill = A)) + geom_point(data = rdata, aes(y = partial.resid, x = B, color = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
Matrix model (RSTANARM)
## Calculate the fitted values newdata = expand.grid(A = levels(data$A), B = seq(min(data$B), max(data$B), len = 100)) fit = posterior_linpred(data.rstanarm, newdata = newdata) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) ggplot(newdata, aes(y = estimate, x = B, fill = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
As this is simple single factor ANOVA, we can simple add the raw data to this figure. For more complex designs with additional predictors, it is necessary to plot partial residuals.
## Calculate partial residuals rdata = data pp = posterior_linpred(data.rstanarm, newdata = rdata) fit = as.vector(apply(pp, 2, median)) resid = resid(data.rstanarm) rdata = rdata %>% mutate(partial.resid = resid + fit) ggplot(newdata, aes(y = estimate, x = B, fill = A)) + geom_point(data = rdata, aes(y = partial.resid, x = B, color = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
Matrix model (BRMS)
Although we could calculated the fitted values via matrix multiplication of the coefficients and the model matrix (as for MCMCpack, RJAGS and RSTAN), for more complex models, it is more convenient to use the marginal_effects function that comes with brms.
plot(marginal_effects(data.brms), points = TRUE)
# OR eff = plot(marginal_effects(data.brms), points = TRUE, plot = FALSE) eff
$A
$B
## Calculate the fitted values newdata = expand.grid(A = levels(data$A), B = seq(min(data$B), max(data$B), len = 100)) fit = fitted(data.brms, newdata = newdata, summary = FALSE) newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) ggplot(newdata, aes(y = estimate, x = B, fill = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
As this is simple single factor ANOVA, we can simple add the raw data to this figure. For more complex designs with additional predictors, it is necessary to plot partial residuals.
## Calculate partial residuals rdata = data fit = fitted(data.brms, summary = TRUE)[, "Estimate"] resid = resid(data.brms)[, "Estimate"] rdata = rdata %>% mutate(partial.resid = resid + fit) ggplot(newdata, aes(y = estimate, x = B, fill = A)) + geom_point(data = rdata, aes(y = partial.resid, x = B, color = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
Posteriors
In frequentist statistics, when we have more than two groups, we are typically not only interested in whether there is evidence for an overall "effect" of a factor - we are also interested in how various groups compare to one another.
To explore these trends, we either compare each group to each other in a pairwise manner (controlling for family-wise Type I error rates) or we explore an independent subset of the possible comparisons. Although these alternate approaches can adequately address a specific research agenda, often they impose severe limitations and compromises on the scope and breadth of questions that can be asked of your data. The reason for these limitations is that in a frequentist framework, any single hypothesis carries with it a (nominally) 5% chance of a false rejection (since it is based on long-run frequency). Thus, performing multiple tests are likely to compound this error rate. The point is, that each comparison is compared to its own probability distribution (and each carries a 5% error rate).
By contrast, in Bayesian statistics, all comparisons (contrasts) are drawn from the one (hopefully stable and convergent) posterior distribution and this posterior is invariant to the type and number of comparisons drawn. Hence, the theory clearly indicates that having generated our posterior distribution, we can then query this distribution in any way that we wish thereby allowing us to explore all of our research questions simultaneously.
Bayesian "contrasts" can be performed either:
- within the Bayesian sampling model or
- construct them from the returned MCMC samples (they are drawn from the posteriors)
In order to allow direct comparison to the frequentist equivalents, I will explore the same set of planned and "Tukey's" test comparisons described here. For the "planned comparison" we defined two contrasts:
- group B vs group C
- group A vs the average of groups B and C
Lets start by comparing each group to each other group in a pairwise manner. Arguably the most elegant way to do this is to generate a Tukey's contrast matrix. This is a model matrix specific to comparing each group to each other group. Again, since the lines are parallel, it does not really matter what level of B we estimate these efffects at - so lets use the mean B.
mcmc = data.mcmcpack coefs <- as.matrix(mcmc)[, 1:4] newdata = expand.grid(A = levels(data$A), B = mean(data$B)) # A Tukeys contrast matrix library(multcomp) tuk.mat <- contrMat(n = table(newdata$A), type = "Tukey") tuk.mat
Multiple Comparisons of Means: Tukey Contrasts Group A Group B Group C Group B - Group A -1 1 0 Group C - Group A -1 0 1 Group C - Group B 0 -1 1
Xmat <- model.matrix(~A + B, data = newdata) Xmat
(Intercept) AGroup B AGroup C B 1 1 0 0 16.50175 2 1 1 0 16.50175 3 1 0 1 16.50175 attr(,"assign") [1] 0 1 1 2 attr(,"contrasts") attr(,"contrasts")$A [1] "contr.treatment"
pairwise.mat <- tuk.mat %*% Xmat pairwise.mat
(Intercept) AGroup B AGroup C B Group B - Group A 0 1 0 0 Group C - Group A 0 0 1 0 Group C - Group B 0 -1 1 0
mcmc_areas(coefs %*% t(pairwise.mat))
(comps = tidyMCMC(coefs %*% t(pairwise.mat), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Group B - Group A -17.143346 1.878211 -20.750365 -13.348582 2 Group C - Group A -22.913257 1.868172 -26.602078 -19.185214 3 Group C - Group B -5.769911 1.821653 -9.552934 -2.271914
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
With a couple of modifications, we could also express this as percentage changes. A percentage change represents the change (difference between groups) divided by one of the groups (determined by which group you want to express the percentage change to). Hence, we generate an additional mcmc matrix that represents the cell means for the divisor group (group we want to express change relative to).
Since the tuk.mat defines comparisons as -1 and 1 pairs, if we simply replace all the -1 with 0, the eventual matrix multiplication will result in estimates of the divisor cell means instread of the difference. We can then divide the original mcmc matrix above with this new divisor mcmc matrix to yeild a mcmc matrix of percentage change.
# Modify the tuk.mat to replace -1 with 0. This will allow us to get a # mcmc matrix of .. tuk.mat[tuk.mat == -1] = 0 comp.mat <- tuk.mat %*% Xmat comp.mat
(Intercept) AGroup B AGroup C B Group B - Group A 1 1 0 16.50175 Group C - Group A 1 0 1 16.50175 Group C - Group B 1 0 1 16.50175
comp.mcmc = 100 * (coefs %*% t(pairwise.mat))/coefs %*% t(comp.mat) (comps = tidyMCMC(comp.mcmc, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Group B - Group A -65.78138 9.862131 -84.99925 -46.617429 2 Group C - Group A -112.84960 15.200206 -144.80969 -84.547452 3 Group C - Group B -28.70578 10.358636 -50.05632 -9.004512
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size (%)") + scale_x_discrete("") + coord_flip() + theme_classic()
And now for the specific planned comparisons (Group B vs Group C as well as Group A vs the average of Groups B and C). This is achieved by generating our own contrast matrix (defining the contributions of each group to each contrast).
c.mat = rbind(c(0, 1, -1), c(1/2, -1/3, -1/3)) c.mat
[,1] [,2] [,3] [1,] 0.0 1.0000000 -1.0000000 [2,] 0.5 -0.3333333 -0.3333333
mcmc = data.mcmcpack coefs <- as.matrix(mcmc)[, 1:4] newdata <- data.frame(A = levels(data$A), B = mean(data$B)) Xmat <- model.matrix(~A + B, data = newdata) c.mat = c.mat %*% Xmat c.mat
(Intercept) AGroup B AGroup C B [1,] 0.0000000 1.0000000 -1.0000000 0.000000 [2,] -0.1666667 -0.3333333 -0.3333333 -2.750292
(comps = tidyMCMC(as.mcmc(coefs %*% t(c.mat)), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 var1 5.769911 1.8216528 2.271914 9.552934 2 var2 6.123330 0.9187204 4.344498 7.965326
Lets start by comparing each group to each other group in a pairwise manner. Arguably the most elegant way to do this is to generate a Tukey's contrast matrix. This is a model matrix specific to comparing each group to each other group. Again, since the lines are parallel, it does not really matter what level of B we estimate these efffects at - so lets use the mean B.
mcmc = data.r2jags$BUGSoutput$sims.matrix coefs <- as.matrix(mcmc)[, 1:4] newdata <- data.frame(A = levels(data$A), B = mean(data$B)) # A Tukeys contrast matrix library(multcomp) tuk.mat <- contrMat(n = table(newdata$A), type = "Tukey") Xmat <- model.matrix(~A + B, data = newdata) pairwise.mat <- tuk.mat %*% Xmat pairwise.mat
(Intercept) AGroup B AGroup C B Group B - Group A 0 1 0 0 Group C - Group A 0 0 1 0 Group C - Group B 0 -1 1 0
mcmc_areas(coefs %*% t(pairwise.mat))
(comps = tidyMCMC(coefs %*% t(pairwise.mat), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Group B - Group A -17.140773 1.879343 -20.898966 -13.503813 2 Group C - Group A -22.916543 1.882434 -26.590761 -19.148042 3 Group C - Group B -5.775771 1.880763 -9.402033 -1.979244
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
With a couple of modifications, we could also express this as percentage changes. A percentage change represents the change (difference between groups) divided by one of the groups (determined by which group you want to express the percentage change to). Hence, we generate an additional mcmc matrix that represents the cell means for the divisor group (group we want to express change relative to).
Since the tuk.mat defines comparisons as -1 and 1 pairs, if we simply replace all the -1 with 0, the eventual matrix multiplication will result in estimates of the divisor cell means instread of the difference. We can then divide the original mcmc matrix above with this new divisor mcmc matrix to yeild a mcmc matrix of percentage change.
# Modify the tuk.mat to replace -1 with 0. This will allow us to get a # mcmc matrix of .. tuk.mat[tuk.mat == -1] = 0 comp.mat <- tuk.mat %*% Xmat comp.mat
(Intercept) AGroup B AGroup C B Group B - Group A 1 1 0 16.50175 Group C - Group A 1 0 1 16.50175 Group C - Group B 1 0 1 16.50175
comp.mcmc = 100 * (coefs %*% t(pairwise.mat))/coefs %*% t(comp.mat) (comps = tidyMCMC(comp.mcmc, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Group B - Group A -65.77396 9.861029 -85.38307 -46.803881 2 Group C - Group A -112.95081 15.533036 -144.44128 -83.512153 3 Group C - Group B -28.79007 10.740187 -50.07490 -8.241323
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size (%)") + scale_x_discrete("") + coord_flip() + theme_classic()
And now for the specific planned comparisons (Group B vs Group C as well as Group A vs the average of Groups B and C). This is achieved by generating our own contrast matrix (defining the contributions of each group to each contrast).
c.mat = rbind(c(0, 1, -1), c(1/2, -1/3, -1/3)) c.mat
[,1] [,2] [,3] [1,] 0.0 1.0000000 -1.0000000 [2,] 0.5 -0.3333333 -0.3333333
mcmc = data.r2jags$BUGSoutput$sims.matrix coefs <- as.matrix(mcmc)[, 1:4] newdata <- data.frame(A = levels(data$A), B = mean(data$B)) Xmat <- model.matrix(~A + B, data = newdata) c.mat = c.mat %*% Xmat c.mat
(Intercept) AGroup B AGroup C B [1,] 0.0000000 1.0000000 -1.0000000 0.000000 [2,] -0.1666667 -0.3333333 -0.3333333 -2.750292
(comps = tidyMCMC(as.mcmc(coefs %*% t(c.mat)), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 var1 5.775771 1.880763 1.979244 9.402033 2 var2 6.124007 0.913995 4.278228 7.879558
Lets start by comparing each group to each other group in a pairwise manner. Arguably the most elegant way to do this is to generate a Tukey's contrast matrix. This is a model matrix specific to comparing each group to each other group. Again, since the lines are parallel, it does not really matter what level of B we estimate these efffects at - so lets use the mean B.
mcmc = data.rstan coefs <- as.matrix(mcmc)[, 1:4] newdata <- data.frame(A = levels(data$A), B = mean(data$B)) # A Tukeys contrast matrix library(multcomp) tuk.mat <- contrMat(n = table(newdata$A), type = "Tukey") Xmat <- model.matrix(~A + B, data = newdata) pairwise.mat <- tuk.mat %*% Xmat pairwise.mat
(Intercept) AGroup B AGroup C B Group B - Group A 0 1 0 0 Group C - Group A 0 0 1 0 Group C - Group B 0 -1 1 0
mcmc_areas(coefs %*% t(pairwise.mat))
(comps = tidyMCMC(coefs %*% t(pairwise.mat), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Group B - Group A -17.120600 1.916002 -20.721678 -13.023381 2 Group C - Group A -22.879458 1.927939 -26.818438 -19.313017 3 Group C - Group B -5.758857 1.908307 -9.465031 -1.972144
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
With a couple of modifications, we could also express this as percentage changes. A percentage change represents the change (difference between groups) divided by one of the groups (determined by which group you want to express the percentage change to). Hence, we generate an additional mcmc matrix that represents the cell means for the divisor group (group we want to express change relative to).
Since the tuk.mat defines comparisons as -1 and 1 pairs, if we simply replace all the -1 with 0, the eventual matrix multiplication will result in estimates of the divisor cell means instread of the difference. We can then divide the original mcmc matrix above with this new divisor mcmc matrix to yeild a mcmc matrix of percentage change.
# Modify the tuk.mat to replace -1 with 0. This will allow us to get a # mcmc matrix of .. tuk.mat[tuk.mat == -1] = 0 comp.mat <- tuk.mat %*% Xmat comp.mat
(Intercept) AGroup B AGroup C B Group B - Group A 1 1 0 16.50175 Group C - Group A 1 0 1 16.50175 Group C - Group B 1 0 1 16.50175
comp.mcmc = 100 * (coefs %*% t(pairwise.mat))/coefs %*% t(comp.mat) (comps = tidyMCMC(comp.mcmc, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Group B - Group A -65.68005 9.938618 -86.57138 -46.874022 2 Group C - Group A -112.65860 15.767824 -145.18904 -84.337691 3 Group C - Group B -28.69047 10.825916 -48.80312 -6.685448
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size (%)") + scale_x_discrete("") + coord_flip() + theme_classic()
And now for the specific planned comparisons (Group B vs Group C as well as Group A vs the average of Groups B and C). This is achieved by generating our own contrast matrix (defining the contributions of each group to each contrast).
c.mat = rbind(c(0, 1, -1), c(1/2, -1/3, -1/3)) c.mat
[,1] [,2] [,3] [1,] 0.0 1.0000000 -1.0000000 [2,] 0.5 -0.3333333 -0.3333333
mcmc = data.rstan coefs <- as.matrix(mcmc)[, 1:4] newdata <- data.frame(A = levels(data$A), B = mean(data$B)) Xmat <- model.matrix(~A + B, data = newdata) c.mat = c.mat %*% Xmat c.mat
(Intercept) AGroup B AGroup C B [1,] 0.0000000 1.0000000 -1.0000000 0.000000 [2,] -0.1666667 -0.3333333 -0.3333333 -2.750292
(comps = tidyMCMC(as.mcmc(coefs %*% t(c.mat)), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 var1 5.758857 1.9083068 1.972144 9.465031 2 var2 6.107318 0.9312412 4.373877 8.159077
Lets start by comparing each group to each other group in a pairwise manner. Arguably the most elegant way to do this is to generate a Tukey's contrast matrix. This is a model matrix specific to comparing each group to each other group. Again, since the lines are parallel, it does not really matter what level of B we estimate these efffects at - so lets use the mean B.
mcmc = data.rstanarm coefs <- as.matrix(mcmc)[, 1:4] newdata <- data.frame(A = levels(data$A), B = mean(data$B)) # A Tukeys contrast matrix library(multcomp) tuk.mat <- contrMat(n = table(newdata$A), type = "Tukey") Xmat <- model.matrix(~A + B, data = newdata) pairwise.mat <- tuk.mat %*% Xmat pairwise.mat
(Intercept) AGroup B AGroup C B Group B - Group A 0 1 0 0 Group C - Group A 0 0 1 0 Group C - Group B 0 -1 1 0
mcmc_areas(coefs %*% t(pairwise.mat))
(comps = tidyMCMC(coefs %*% t(pairwise.mat), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Group B - Group A -17.211444 1.938649 -21.006648 -13.433809 2 Group C - Group A -22.903757 1.967936 -26.675721 -19.018964 3 Group C - Group B -5.692313 1.892174 -9.159957 -1.799432
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
With a couple of modifications, we could also express this as percentage changes. A percentage change represents the change (difference between groups) divided by one of the groups (determined by which group you want to express the percentage change to). Hence, we generate an additional mcmc matrix that represents the cell means for the divisor group (group we want to express change relative to).
Since the tuk.mat defines comparisons as -1 and 1 pairs, if we simply replace all the -1 with 0, the eventual matrix multiplication will result in estimates of the divisor cell means instread of the difference. We can then divide the original mcmc matrix above with this new divisor mcmc matrix to yeild a mcmc matrix of percentage change.
# Modify the tuk.mat to replace -1 with 0. This will allow us to get a # mcmc matrix of .. tuk.mat[tuk.mat == -1] = 0 comp.mat <- tuk.mat %*% Xmat comp.mat
(Intercept) AGroup B AGroup C B Group B - Group A 1 1 0 16.50175 Group C - Group A 1 0 1 16.50175 Group C - Group B 1 0 1 16.50175
comp.mcmc = 100 * (coefs %*% t(pairwise.mat))/coefs %*% t(comp.mat) (comps = tidyMCMC(comp.mcmc, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Group B - Group A -66.16030 10.09871 -86.76507 -46.675359 2 Group C - Group A -112.70201 15.95929 -144.71321 -82.068118 3 Group C - Group B -28.33604 10.75415 -50.15437 -8.304887
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size (%)") + scale_x_discrete("") + coord_flip() + theme_classic()
And now for the specific planned comparisons (Group B vs Group C as well as Group A vs the average of Groups B and C). This is achieved by generating our own contrast matrix (defining the contributions of each group to each contrast).
c.mat = rbind(c(0, 1, -1), c(1/2, -1/3, -1/3)) c.mat
[,1] [,2] [,3] [1,] 0.0 1.0000000 -1.0000000 [2,] 0.5 -0.3333333 -0.3333333
mcmc = data.rstanarm coefs <- as.matrix(mcmc)[, 1:4] newdata <- data.frame(A = levels(data$A), B = mean(data$B)) Xmat <- model.matrix(~A + B, data = newdata) c.mat = c.mat %*% Xmat c.mat
(Intercept) AGroup B AGroup C B [1,] 0.0000000 1.0000000 -1.0000000 0.000000 [2,] -0.1666667 -0.3333333 -0.3333333 -2.750292
(comps = tidyMCMC(as.mcmc(coefs %*% t(c.mat)), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 var1 5.692313 1.8921744 1.799432 9.159957 2 var2 6.138562 0.9558204 4.235113 7.985014
Lets start by comparing each group to each other group in a pairwise manner. Arguably the most elegant way to do this is to generate a Tukey's contrast matrix. This is a model matrix specific to comparing each group to each other group. Again, since the lines are parallel, it does not really matter what level of B we estimate these efffects at - so lets use the mean B.
mcmc = data.brms coefs <- as.matrix(mcmc)[, 1:4] newdata <- data.frame(A = levels(data$A), B = mean(data$B)) # A Tukeys contrast matrix library(multcomp) tuk.mat <- contrMat(n = table(newdata$A), type = "Tukey") Xmat <- model.matrix(~A + B, data = newdata) pairwise.mat <- tuk.mat %*% Xmat pairwise.mat
(Intercept) AGroup B AGroup C B Group B - Group A 0 1 0 0 Group C - Group A 0 0 1 0 Group C - Group B 0 -1 1 0
mcmc_areas(coefs %*% t(pairwise.mat))
(comps = tidyMCMC(coefs %*% t(pairwise.mat), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Group B - Group A -17.172833 1.831351 -20.680311 -13.558229 2 Group C - Group A -22.891586 1.854493 -26.384632 -19.217887 3 Group C - Group B -5.718753 1.846079 -9.169838 -2.065845
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
With a couple of modifications, we could also express this as percentage changes. A percentage change represents the change (difference between groups) divided by one of the groups (determined by which group you want to express the percentage change to). Hence, we generate an additional mcmc matrix that represents the cell means for the divisor group (group we want to express change relative to).
Since the tuk.mat defines comparisons as -1 and 1 pairs, if we simply replace all the -1 with 0, the eventual matrix multiplication will result in estimates of the divisor cell means instread of the difference. We can then divide the original mcmc matrix above with this new divisor mcmc matrix to yeild a mcmc matrix of percentage change.
# Modify the tuk.mat to replace -1 with 0. This will allow us to get a # mcmc matrix of .. tuk.mat[tuk.mat == -1] = 0 comp.mat <- tuk.mat %*% Xmat comp.mat
(Intercept) AGroup B AGroup C B Group B - Group A 1 1 0 16.50175 Group C - Group A 1 0 1 16.50175 Group C - Group B 1 0 1 16.50175
comp.mcmc = 100 * (coefs %*% t(pairwise.mat))/coefs %*% t(comp.mat) (comps = tidyMCMC(comp.mcmc, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Group B - Group A -65.90339 9.650503 -84.05345 -46.582017 2 Group C - Group A -112.55948 15.341280 -143.66383 -84.530670 3 Group C - Group B -28.43318 10.498519 -47.72283 -7.691681
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size (%)") + scale_x_discrete("") + coord_flip() + theme_classic()
And now for the specific planned comparisons (Group B vs Group C as well as Group A vs the average of Groups B and C). This is achieved by generating our own contrast matrix (defining the contributions of each group to each contrast).
c.mat = rbind(c(0, 1, -1), c(1/2, -1/3, -1/3)) c.mat
[,1] [,2] [,3] [1,] 0.0 1.0000000 -1.0000000 [2,] 0.5 -0.3333333 -0.3333333
mcmc = data.brms coefs <- as.matrix(mcmc)[, 1:4] newdata <- data.frame(A = levels(data$A), B = mean(data$B)) Xmat <- model.matrix(~A + B, data = newdata) c.mat = c.mat %*% Xmat c.mat
(Intercept) AGroup B AGroup C B [1,] 0.0000000 1.0000000 -1.0000000 0.000000 [2,] -0.1666667 -0.3333333 -0.3333333 -2.750292
(comps = tidyMCMC(as.mcmc(coefs %*% t(c.mat)), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 var1 5.718753 1.8460785 2.065845 9.169838 2 var2 6.122501 0.8984429 4.344599 7.808558
Finite Population Standard Deviations
Variance components, the amount of added variance attributed to each influence, are traditionally estimated for so called random effects. These are the effects for which the levels employed in the design are randomly selected to represent a broader range of possible levels. For such effects, effect sizes (differences between each level and a reference level) are of little value. Instead, the 'importance' of the variables are measured in units of variance components.
On the other hand, regular variance components for fixed factors (those whose measured levels represent the only levels of interest) are not logical - since variance components estimate variance as if the levels are randomly selected from a larger population. Nevertheless, in order to compare and contrast the scale of variability of both fixed and random factors, it is necessary to measure both on the same scale (sample or population based variance).
Finite-population variance components (Gelman, 2005)
assume that the levels of all
factors (fixed and random) in the design are all the possible levels available.
In other words, they are assumed to represent finite populations of levels.
Sample (rather than population) statistics are then used to calculate these
finite-population variances (or standard deviations).
Since standard deviation (and variance) are bound at zero, standard deviation posteriors are typically non-normal. Consequently, medians and HPD intervals are more robust estimates.
library(broom) mcmc = data.mcmcpack head(mcmc)
Markov Chain Monte Carlo (MCMC) output: Start = 1001 End = 1007 Thinning interval = 1 (Intercept) AGroup B AGroup C B sigma2 [1,] 50.18839 -18.23379 -22.87414 -0.4394640 14.11917 [2,] 49.39556 -15.64877 -20.98086 -0.4528605 13.22004 [3,] 49.44854 -16.36367 -23.66886 -0.3887300 15.57976 [4,] 51.46891 -16.26642 -24.26313 -0.4709924 12.01597 [5,] 50.55617 -17.54613 -22.44262 -0.4252965 16.35805 [6,] 50.73340 -17.54078 -23.75846 -0.4598987 10.84731 [7,] 51.16774 -17.41150 -22.41515 -0.3996571 17.57170
# A wch = grep("^A", colnames(mcmc)) # Get the rowwise standard deviations between effects parameters sd.A = apply(mcmc[, wch], 1, sd) # B wch = grep("^B", colnames(mcmc)) sd.B = sd(data$B) * abs(mcmc[, wch]) # generate a model matrix newdata = data Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates wch = grep("(Intercept)|^A|^B", colnames(mcmc)) coefs = mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$Y, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.A, sd.B, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.A 4.080665 1.2858143 1.606486 6.754944 2 sd.B 5.030454 0.7743607 3.478134 6.527505 3 sd.resid 3.983878 0.1906139 3.761980 4.356994
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.A 31.28801 7.123578 16.55312 44.15266 2 sd.B 38.61193 5.120548 28.29820 48.32862 3 sd.resid 30.05025 4.065060 24.60465 38.59845
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_hline(yintercept = 0, linetype = "dashed") + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
Conclusions:
Approximately 31.3%
of the total finite population standard deviation is due to x.
library(broom) mcmc = data.r2jags$BUGSoutput$sims.matrix head(mcmc)
beta[1] beta[2] beta[3] beta[4] deviance sigma [1,] 52.60917 -18.82408 -23.38489 -0.4880702 167.0879 4.283265 [2,] 53.75233 -18.89312 -24.92630 -0.5191816 171.4824 5.124059 [3,] 51.15722 -16.81356 -23.25841 -0.5336914 170.4625 3.705732 [4,] 50.32853 -16.98682 -23.41316 -0.4149829 165.9603 4.545555 [5,] 48.78829 -15.18438 -24.19762 -0.4395482 172.1889 4.458376 [6,] 51.55891 -17.64261 -22.37656 -0.5040106 166.7893 4.155680
# A wch = grep("beta.[2-3]", colnames(mcmc)) # Get the rowwise standard deviations between effects parameters sd.A = apply(mcmc[, wch], 1, sd) # B wch = grep("beta.4", colnames(mcmc)) sd.B = sd(data$B) * abs(mcmc[, wch]) # generate a model matrix newdata = data Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates wch = grep("beta.[1-4]", colnames(mcmc)) coefs = mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$Y, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.A, sd.B, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.A 4.085929 1.3242287 1.399537 6.648241 2 sd.B 5.031876 0.7938012 3.468931 6.606399 3 sd.resid 3.991447 0.1970424 3.762837 4.379866
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.A 31.33381 7.376887 15.57164 43.97278 2 sd.B 38.65910 5.313671 27.75097 48.62132 3 sd.resid 29.97544 4.188131 24.58803 39.19370
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_hline(yintercept = 0, linetype = "dashed") + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
Conclusions:
Approximately 31.3%
of the total finite population standard deviation is due to x.
library(broom) mcmc = as.matrix(data.rstan) head(mcmc)
parameters iterations beta[1] beta[2] beta[3] beta[4] sigma mu[1] mu[2] mu[3] mu[4] [1,] 52.68797 -15.89902 -22.48872 -0.5325084 4.190521 49.72136 44.37439 39.97070 51.24060 [2,] 51.04073 -14.86065 -23.63103 -0.4404722 3.869872 48.58686 44.16402 40.52145 49.84351 [3,] 52.08091 -17.19621 -23.79866 -0.4785798 4.179663 49.41474 44.60927 40.65155 50.78012 [4,] 51.54719 -19.52916 -24.99191 -0.3788141 4.283020 49.43681 45.63310 42.50041 50.51756 [5,] 49.24231 -15.18013 -21.04165 -0.3906899 3.719325 47.06578 43.14281 39.91192 48.18041 [6,] 49.65178 -16.39404 -19.21428 -0.4629938 3.691510 47.07244 42.42347 38.59465 48.39336 parameters iterations mu[5] mu[6] mu[7] mu[8] mu[9] mu[10] mu[11] mu[12] mu[13] [1,] 40.47396 41.79718 41.35552 33.11874 51.77358 31.91453 32.08782 35.17453 31.86079 [2,] 40.93773 42.03225 41.66692 34.85374 50.28437 33.85767 32.29147 34.84469 32.10367 [3,] 41.10385 42.29307 41.89613 34.49351 51.25912 33.41126 30.65966 33.43378 30.45562 [4,] 42.85842 43.79973 43.48554 37.62609 50.89671 36.76944 28.67375 30.86957 28.51224 [5,] 40.28116 41.25197 40.92793 34.88479 48.57144 34.00129 30.61307 32.87772 30.44650 [6,] 39.03221 40.18270 39.79869 32.63715 48.85676 31.59015 29.17031 31.85408 28.97291 parameters iterations mu[14] mu[15] mu[16] mu[17] mu[18] mu[19] mu[20] mu[21] mu[22] [1,] 24.12069 24.92430 28.59623 33.75113 31.31670 16.35942 24.54276 24.16979 27.07583 [2,] 25.70134 26.36605 29.40334 33.66730 31.65363 19.28149 26.05046 22.42234 24.82611 [3,] 23.49939 24.22161 27.52168 32.15453 29.96664 16.52413 23.87872 22.86342 25.47515 [4,] 23.00612 23.57779 26.18991 29.85699 28.12520 17.48493 23.30637 22.26606 24.33334 [5,] 24.76775 25.35734 28.05136 31.83340 30.04731 19.07348 25.07742 23.77698 25.90907 [6,] 22.24322 22.94193 26.13451 30.61648 28.49985 15.49512 22.61020 25.19514 27.72182 parameters iterations mu[23] mu[24] mu[25] mu[26] mu[27] mu[28] mu[29] mu[30] log_lik[1] [1,] 21.17639 32.86026 23.41873 25.46849 10.27852 11.46507 20.12482 28.62927 -2.355750 [2,] 19.94630 29.61079 21.80108 23.49658 10.93196 11.91344 19.07648 26.11106 -2.348154 [3,] 20.17316 30.67378 22.18841 24.03059 10.37895 11.44533 19.22809 26.87127 -2.362435 [4,] 20.13662 28.44826 21.73177 23.18992 12.38413 13.22821 19.38856 25.43843 -2.385424 [5,] 21.58078 30.15298 23.22594 24.72981 13.58525 14.45580 20.80927 27.04880 -2.564270 [6,] 22.59250 32.75114 24.54212 26.32431 13.11726 14.14891 21.67820 29.07247 -2.560301 parameters iterations log_lik[2] log_lik[3] log_lik[4] log_lik[5] log_lik[6] log_lik[7] log_lik[8] log_lik[9] [1,] -2.380651 -2.392311 -2.359395 -2.644059 -2.370981 -2.656101 -2.358963 -3.937789 [2,] -2.321658 -2.285946 -2.297979 -2.721302 -2.283643 -2.564277 -2.439362 -3.463785 [3,] -2.366243 -2.356686 -2.349263 -2.769866 -2.352204 -2.562282 -2.450072 -3.731235 [4,] -2.375320 -2.422274 -2.374746 -3.224846 -2.411614 -2.408977 -3.057791 -3.556057 [5,] -2.413647 -2.289148 -2.466128 -2.560216 -2.299999 -2.726477 -2.418549 -2.888813 [6,] -2.546048 -2.467202 -2.424089 -2.338922 -2.442708 -3.079584 -2.224991 -2.983441 parameters iterations log_lik[10] log_lik[11] log_lik[12] log_lik[13] log_lik[14] log_lik[15] log_lik[16] [1,] -2.560221 -2.385313 -2.758740 -2.999655 -2.808351 -2.357355 -5.539627 [2,] -2.291578 -2.298123 -2.669736 -2.956470 -3.313616 -2.305454 -6.602203 [3,] -2.391007 -2.530008 -2.468268 -3.440631 -2.676762 -2.386746 -4.935821 [4,] -2.499489 -2.925431 -2.381090 -4.170168 -2.601230 -2.460896 -4.195091 [5,] -2.246331 -2.469401 -2.312064 -3.614919 -3.014530 -2.232484 -5.873203 [6,] -2.561867 -2.812913 -2.232745 -4.376755 -2.390970 -2.440833 -4.643851 parameters iterations log_lik[17] log_lik[18] log_lik[19] log_lik[20] log_lik[21] log_lik[22] log_lik[23] [1,] -2.602428 -2.541644 -2.411276 -2.925312 -2.550560 -2.796703 -2.397480 [2,] -2.549710 -2.440502 -2.909103 -3.472430 -2.915533 -2.369028 -2.480363 [3,] -2.402926 -2.791773 -2.423397 -2.767720 -2.795438 -2.507550 -2.496696 [4,] -2.397022 -3.282272 -2.553796 -2.661806 -2.936898 -2.413541 -2.518648 [5,] -2.272281 -2.768729 -2.857917 -3.144357 -2.565447 -2.513088 -2.259383 [6,] -2.226004 -3.294590 -2.237378 -2.464606 -2.320908 -3.001039 -2.225789 parameters iterations log_lik[24] log_lik[25] log_lik[26] log_lik[27] log_lik[28] log_lik[29] log_lik[30] [1,] -4.438382 -3.307734 -2.499837 -2.496503 -3.808212 -2.814545 -2.829593 [2,] -3.213944 -2.854593 -2.275338 -2.554512 -3.772549 -3.133717 -2.355333 [3,] -3.512033 -2.945362 -2.369487 -2.507912 -3.821285 -3.044316 -2.505695 [4,] -2.842706 -2.833433 -2.373597 -2.891791 -3.165134 -2.993189 -2.395956 [5,] -3.470845 -3.366605 -2.318398 -3.350387 -2.858380 -2.637408 -2.461301 [6,] -4.845740 -3.980841 -2.585876 -3.176836 -2.957508 -2.450304 -2.981159 parameters iterations lp__ [1,] -56.99328 [2,] -56.17719 [3,] -54.73138 [4,] -55.42088 [5,] -54.94832 [6,] -56.99186
# A wch = grep("beta.[2-3]", colnames(mcmc)) # Get the rowwise standard deviations between effects parameters sd.A = apply(mcmc[, wch], 1, sd) # B wch = grep("beta.[4]", colnames(mcmc)) sd.B = sd(data$B) * abs(mcmc[, wch]) # generate a model matrix newdata = data Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates wch = grep("beta", colnames(mcmc)) coefs = mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$Y, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.A, sd.B, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.A 4.072432 1.3484539 1.394517 6.692788 2 sd.B 5.053796 0.7704775 3.514660 6.518542 3 sd.resid 3.993504 0.1978560 3.764060 4.389092
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.A 31.42998 7.687620 14.88253 43.87950 2 sd.B 38.57359 5.482235 27.76693 48.93887 3 sd.resid 30.05401 4.027620 25.10582 39.01512
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_hline(yintercept = 0, linetype = "dashed") + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
Conclusions:
Approximately 31.4%
of the total finite population standard deviation is due to x.
library(broom) mcmc = as.matrix(data.rstanarm) head(mcmc)
parameters iterations (Intercept) AGroup B AGroup C B sigma [1,] 49.25161 -18.13674 -20.71374 -0.3963549 4.384061 [2,] 50.28149 -20.18446 -23.60224 -0.3823662 4.601748 [3,] 47.22865 -13.74348 -20.42610 -0.3439257 3.559008 [4,] 50.36383 -18.72387 -23.75338 -0.3776768 4.211066 [5,] 49.33028 -18.88190 -20.94786 -0.3634085 3.830714 [6,] 49.08087 -14.40159 -22.87480 -0.4144611 3.793997
# A wch = grep("^A", colnames(mcmc)) # Get the rowwise standard deviations between effects parameters sd.A = apply(mcmc[, wch], 1, sd) # B wch = grep("^B", colnames(mcmc)) sd.B = sd(data$B) * abs(mcmc[, wch]) # generate a model matrix newdata = data Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates wch = grep("(Intercept)|^A|^B", colnames(mcmc)) coefs = mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$Y, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.A, sd.B, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.A 4.026941 1.3323358 1.272390 6.477067 2 sd.B 5.031510 0.8284226 3.467658 6.627846 3 sd.resid 4.004512 0.2119432 3.762475 4.420304
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.A 31.14949 7.543440 13.88979 43.33205 2 sd.B 38.63719 5.482952 27.23552 49.17710 3 sd.resid 30.23369 4.336975 24.98450 40.19062
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_hline(yintercept = 0, linetype = "dashed") + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
Conclusions:
Approximately 31.1%
of the total finite population standard deviation is due to x.
library(broom) mcmc = as.matrix(data.brms) head(mcmc)
parameters iterations b_Intercept b_AGroupB b_AGroupC b_B sigma lp__ [1,] 46.14429 -17.89799 -23.20626 -0.1747196 4.234569 -112.8789 [2,] 50.46466 -17.00736 -21.94628 -0.4189975 3.964478 -105.7635 [3,] 49.51423 -16.74688 -22.42555 -0.4566537 3.784797 -106.7988 [4,] 52.03476 -20.63035 -26.45616 -0.3795796 4.637355 -108.6522 [5,] 49.58570 -18.21573 -23.70766 -0.4119922 4.153369 -106.9716 [6,] 51.87136 -19.76110 -26.12697 -0.4639165 3.485067 -109.3362
# A wch = grep("^b_A", colnames(mcmc)) # Get the rowwise standard deviations between effects parameters sd.A = apply(mcmc[, wch], 1, sd) # B wch = grep("^b_B", colnames(mcmc)) sd.B = sd(data$B) * abs(mcmc[, wch]) # generate a model matrix newdata = data Xmat = model.matrix(~A + B, newdata) ## get median parameter estimates wch = c(grep("(b_Intercept)|^b_", colnames(mcmc))) coefs = mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$Y, "-") sd.resid = apply(resid, 1, sd) sd.all = cbind(sd.A, sd.B, sd.resid) (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.A 4.043769 1.3053747 1.460773 6.484054 2 sd.B 5.012018 0.7745195 3.599942 6.585264 3 sd.resid 3.982644 0.1855119 3.761674 4.356373
# OR expressed as a percentage (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 sd.A 31.29667 7.332669 15.34081 43.31182 2 sd.B 38.43989 5.149650 28.75407 49.05275 3 sd.resid 30.14396 4.349195 24.45494 39.31055
## we can even plot this as a Bayesian ANOVA table ggplot(fpsd, aes(y = estimate, x = term)) + geom_hline(yintercept = 0, linetype = "dashed") + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate), vjust = -1)) + scale_y_continuous("Finite population standard deviation") + scale_x_discrete() + coord_flip() + theme_classic()
Conclusions:
Approximately 31.3%
of the total finite population standard deviation is due to x.
$R^2$
In a frequentist context, the $R^2$ value is seen as a useful indicator of goodness of fit. Whilst it has long been acknowledged that this measure is not appropriate for comparing models (for such purposes information criterion such as AIC are more appropriate), it is nevertheless useful for estimating the amount (percent) of variance explained by the model.
In a frequentist context, $R^2$ is calculated as the variance in predicted values divided by the variance in the observed (response) values.
Unfortunately, this classical formulation does not translate simply into a Bayesian context since
the equivalently calculated numerator can be larger than the an equivalently calculated denominator - thereby resulting in an $R^2$
greater than 100%. Gelman, Goodrich, Gabry, and Ali (2017)
proposed an alternative
formulation in which the denominator comprises the sum of the explained variance and the variance of the residuals.
So in the standard regression model notation of: $$ \begin{align} y_i \sim{}& N(\mu_i, \sigma)\\ \mu_i =& \mathbf{X}\boldsymbol{\beta} \end{align} $$ The $R^2$ could be formulated as: $$ R^2 = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_e} $$ where $\sigma^2_f = var(\mu)$, ($\mu = \mathbf{X}\boldsymbol{\beta})$) and for Gaussian models $\sigma^2_e = var(y-\mu)$
library(broom) mcmc <- data.mcmcpack Xmat = model.matrix(~A + B, data) wch = grep("(Intercept)|^A|^B", colnames(mcmc)) coefs = mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$Y, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.8712662 0.02114025 0.8316521 0.8958785
# for comparison with frequentist summary(lm(Y ~ A + B, data))
Call: lm(formula = Y ~ A + B, data = data) Residuals: Min 1Q Median 3Q Max -8.671 -2.911 0.328 2.188 7.407 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 50.20598 1.71450 29.283 < 2e-16 *** AGroup B -17.15033 1.78618 -9.602 4.91e-10 *** AGroup C -22.91188 1.79775 -12.745 1.09e-12 *** B -0.41403 0.06143 -6.740 3.76e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 3.973 on 26 degrees of freedom Multiple R-squared: 0.884, Adjusted R-squared: 0.8706 F-statistic: 66.05 on 3 and 26 DF, p-value: 2.719e-12
library(broom) mcmc <- data.r2jags$BUGSoutput$sims.matrix Xmat = model.matrix(~A + B, data) wch = grep("beta", colnames(mcmc)) coefs = mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$Y, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.8708952 0.02144096 0.8290317 0.8960092
# for comparison with frequentist summary(lm(Y ~ A + B, data))
Call: lm(formula = Y ~ A + B, data = data) Residuals: Min 1Q Median 3Q Max -8.671 -2.911 0.328 2.188 7.407 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 50.20598 1.71450 29.283 < 2e-16 *** AGroup B -17.15033 1.78618 -9.602 4.91e-10 *** AGroup C -22.91188 1.79775 -12.745 1.09e-12 *** B -0.41403 0.06143 -6.740 3.76e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 3.973 on 26 degrees of freedom Multiple R-squared: 0.884, Adjusted R-squared: 0.8706 F-statistic: 66.05 on 3 and 26 DF, p-value: 2.719e-12
library(broom) mcmc <- as.matrix(data.rstan) Xmat = model.matrix(~A + B, data) wch = grep("beta", colnames(mcmc)) coefs = mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$Y, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.8706419 0.02178395 0.8297436 0.8959157
# for comparison with frequentist summary(lm(Y ~ A + B, data))
Call: lm(formula = Y ~ A + B, data = data) Residuals: Min 1Q Median 3Q Max -8.671 -2.911 0.328 2.188 7.407 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 50.20598 1.71450 29.283 < 2e-16 *** AGroup B -17.15033 1.78618 -9.602 4.91e-10 *** AGroup C -22.91188 1.79775 -12.745 1.09e-12 *** B -0.41403 0.06143 -6.740 3.76e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 3.973 on 26 degrees of freedom Multiple R-squared: 0.884, Adjusted R-squared: 0.8706 F-statistic: 66.05 on 3 and 26 DF, p-value: 2.719e-12
library(broom) mcmc <- as.matrix(data.rstanarm) Xmat = model.matrix(~A + B, data) wch = grep("Intercept|^A|^B", colnames(mcmc)) coefs = mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$Y, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.8699886 0.0241677 0.8256175 0.8960312
# for comparison with frequentist summary(lm(Y ~ A + B, data))
Call: lm(formula = Y ~ A + B, data = data) Residuals: Min 1Q Median 3Q Max -8.671 -2.911 0.328 2.188 7.407 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 50.20598 1.71450 29.283 < 2e-16 *** AGroup B -17.15033 1.78618 -9.602 4.91e-10 *** AGroup C -22.91188 1.79775 -12.745 1.09e-12 *** B -0.41403 0.06143 -6.740 3.76e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 3.973 on 26 degrees of freedom Multiple R-squared: 0.884, Adjusted R-squared: 0.8706 F-statistic: 66.05 on 3 and 26 DF, p-value: 2.719e-12
library(broom) mcmc <- as.matrix(data.brms) Xmat = model.matrix(~A + B, data) wch = grep("b_", colnames(mcmc)) coefs = mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, data$Y, "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.8712322 0.02007052 0.8322119 0.8956157
# for comparison with frequentist summary(lm(Y ~ A + B, data))
Call: lm(formula = Y ~ A + B, data = data) Residuals: Min 1Q Median 3Q Max -8.671 -2.911 0.328 2.188 7.407 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 50.20598 1.71450 29.283 < 2e-16 *** AGroup B -17.15033 1.78618 -9.602 4.91e-10 *** AGroup C -22.91188 1.79775 -12.745 1.09e-12 *** B -0.41403 0.06143 -6.740 3.76e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 3.973 on 26 degrees of freedom Multiple R-squared: 0.884, Adjusted R-squared: 0.8706 F-statistic: 66.05 on 3 and 26 DF, p-value: 2.719e-12
Dealing with heterogeneous slopes
- the sample size per treatment=10
- the categorical $x$ variable with 3 levels
- the first treatment group has a population mean of 40.
- the other two treatments reduce the mean by 15 and 20 units respectively
- the data are drawn from normal distributions with a mean of 0 and standard deviation of 4 ($\sigma^2=16$)
- the covariate (B) is a continuous variable with a mean of 20 and a standard deviation of 15
set.seed(6) n <- 10 p <- 3 A.eff <- c(40, -15, -20) beta <- c(-0.45, -0.1, 0.5) sigma <- 4 B <- rnorm(n * p, 0, 15) A <- gl(p, n, lab = paste("Group", LETTERS[1:3])) mm <- model.matrix(~A * B) data1 <- data.frame(A = A, B = B, Y = as.numeric(c(A.eff, beta) %*% t(mm)) + rnorm(n * p, 0, 4)) data1$B <- data1$B + 20 head(data1)
A B Y 1 Group A 24.04409 35.49432 2 Group A 10.55022 46.22216 3 Group A 33.02990 29.41898 4 Group A 45.90793 24.10656 5 Group A 20.36281 44.38834 6 Group A 25.52038 36.87477
Exploratory data analysis
library(car) scatterplot(Y ~ B | A, data = data1)
boxplot(Y ~ A, data1)
# OR via ggplot library(ggplot2) ggplot(data1, aes(y = Y, x = B, group = A)) + geom_point() + geom_smooth(method = "lm")
ggplot(data1, aes(y = Y, x = A)) + geom_boxplot()
- there is no evidence of obvious non-normality
- the assumption of linearity seems reasonable
- the variability of the three groups seems approximately equal
- the slopes (Y vs B trends) do appear to differ between treatment groups - in particular, Group C seems to portray a different trend to Groups A and B.
Homogeneity of slopes
We can explore inferential evidence of unequal slopes by examining estimated effects of the interaction between the categorical variable and the covariate. Note, pay no attention to the main effects - only the interaction. Even though I intend to illustrate Bayesian analyses here, for such a simple model, it is considerably simpler to use traditional OLS for testing for the presence of an interaction..
anova(lm(Y ~ B * A, data = data1))
Analysis of Variance Table Response: Y Df Sum Sq Mean Sq F value Pr(>F) B 1 1257.14 1257.14 98.520 5.685e-10 *** A 2 2042.02 1021.01 80.015 2.420e-11 *** B:A 2 510.02 255.01 19.985 7.778e-06 *** Residuals 24 306.25 12.76 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model fitting or statistical analysis
The multiplicative model is constructed and specified in the same way as the additive model. The difference is that the model matrix ($\mathbf{X}$) contains the interaction terms.
$$ \begin{align} y_{ij} &\sim{} N(\mu_{ij}, \sigma)\\ \mu_{ij} &= \beta_0 + \mathbf{X}\boldsymbol{\beta}\\[1em] \beta_0 &\sim{} N(0,100)\\ \beta &\sim{} N(0,10)\\ \sigma &\sim{} cauchy(0,5)\\ \end{align} $$
library(MCMCpack) data1.mcmcpack <- MCMCregress(Y ~ A * B, data = data1)
Define the model
modelString = " model { #Likelihood for (i in 1:n) { y[i]~dnorm(mean[i],tau) mean[i] <- inprod(beta[],X[i,]) } #Priors for (i in 1:ngroups) { beta[i] ~ dnorm(0, 1.0E-6) } sigma ~ dunif(0, 100) tau <- 1 / (sigma * sigma) } "
Define the data
Arrange the data as a list (as required by BUGS). As input, JAGS will need to be supplied with:
- the response variable (y)
- the predictor model matrix (X)
- the total number of observed items (n)
- the number of predictor terms (nX)
X <- model.matrix(~A * B, data1) data1.list <- with(data1, list(y = Y, X = X, n = nrow(data1), ngroups = ncol(X)))
Define the MCMC chain parameters
Next we should define the behavioural parameters of the MCMC sampling chains. Include the following:
- the nodes (estimated parameters) to monitor (return samples for)
- the number of MCMC chains (3)
- the number of burnin steps (1000)
- the thinning factor (10)
- the number of MCMC iterations - determined by the number of samples to save, the rate of thinning and the number of chains
params <- c("beta", "sigma") nChains = 3 burnInSteps = 3000 thinSteps = 10 numSavedSteps = 15000 #across all chains nIter = ceiling(burnInSteps + (numSavedSteps * thinSteps)/nChains) nIter
[1] 53000
Fit the model
Now run the JAGS code via the R2jags interface. Note that the first time jags is run after the R2jags package is loaded, it is often necessary to run any kind of randomization function just to initiate the .Random.seed variable.
library(R2jags)
data1.r2jags <- jags(data = data1.list, inits = NULL, parameters.to.save = params, model.file = textConnection(modelString), n.chains = nChains, n.iter = nIter, n.burnin = burnInSteps, n.thin = thinSteps)
Compiling model graph Resolving undeclared variables Allocating nodes Graph information: Observed stochastic nodes: 30 Unobserved stochastic nodes: 7 Total graph size: 298 Initializing model
print(data1.r2jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10 n.sims = 15000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] 49.553 2.336 45.049 48.003 49.538 51.092 54.162 1.001 15000 beta[2] -12.227 3.093 -18.248 -14.285 -12.228 -10.182 -6.142 1.001 11000 beta[3] -28.270 2.993 -34.198 -30.236 -28.275 -26.304 -22.340 1.001 15000 beta[4] -0.495 0.093 -0.680 -0.556 -0.494 -0.433 -0.313 1.001 15000 beta[5] -0.121 0.115 -0.347 -0.197 -0.122 -0.046 0.105 1.001 15000 beta[6] 0.410 0.111 0.193 0.336 0.410 0.482 0.636 1.001 15000 sigma 3.772 0.585 2.827 3.354 3.707 4.106 5.105 1.001 15000 deviance 163.101 4.379 156.875 159.924 162.329 165.512 173.639 1.001 15000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 9.6 and DIC = 172.7 DIC is an estimate of expected predictive error (lower deviance is better).
data1.mcmc.list <- as.mcmc(data1.r2jags)
Model matrix formulation
The minimum model in Stan required to fit the above simple regression follows. Note the following modifications from the model defined in JAGS:- the normal distribution is defined by variance rather than precision
- rather than using a uniform prior for sigma, I am using a half-Cauchy
We now translate the likelihood model into STAN code.
$$\begin{align}
y_{ij}&\sim{}N(\mu_{ij}, \sigma)\\
\mu_{ij} &= \mathbf{X}\boldsymbol{\beta}\\
\beta_0&\sim{}N(0,100)\\
\beta&\sim{}N(0,10)\\
\sigma&\sim{}Cauchy(0,5)\\
\end{align}
$$
Define the model
modelString = " data { int<lower=1> n; int<lower=1> nX; vector [n] y; matrix [n,nX] X; } parameters { vector[nX] beta; real<lower=0> sigma; } transformed parameters { vector[n] mu; mu = X*beta; } model { #Likelihood y~normal(mu,sigma); #Priors beta ~ normal(0,100); sigma~cauchy(0,5); } generated quantities { vector[n] log_lik; for (i in 1:n) { log_lik[i] = normal_lpdf(y[i] | mu[i], sigma); } } "
Define the data
Arrange the data as a list (as required by BUGS). As input, JAGS will need to be supplied with:
- the response variable (y)
- the predictor model matrix (X)
- the total number of observed items (n)
- the number of predictor terms (nX)
Xmat <- model.matrix(~A * B, data1) data1.list <- with(data1, list(y = Y, X = Xmat, nX = ncol(Xmat), n = nrow(data1)))
Fit the model
Now run the JAGS code via the R2jags interface. Note that the first time jags is run after the R2jags package is loaded, it is often necessary to run any kind of randomization function just to initiate the .Random.seed variable.
## load the rstan package library(rstan)
data1.rstan <- stan(data = data1.list, model_code = modelString, chains = 3, iter = 2000, warmup = 500, thin = 3)
SAMPLING FOR MODEL '3d2414c9dcf4b5e12be870eadd2c894a' NOW (CHAIN 1). Gradient evaluation took 2.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.22 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.099447 seconds (Warm-up) 0.151706 seconds (Sampling) 0.251153 seconds (Total) SAMPLING FOR MODEL '3d2414c9dcf4b5e12be870eadd2c894a' NOW (CHAIN 2). Gradient evaluation took 7e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.106983 seconds (Warm-up) 0.116191 seconds (Sampling) 0.223174 seconds (Total) SAMPLING FOR MODEL '3d2414c9dcf4b5e12be870eadd2c894a' NOW (CHAIN 3). Gradient evaluation took 7e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.142543 seconds (Warm-up) 0.116691 seconds (Sampling) 0.259234 seconds (Total)
print(data1.rstan, par = c("beta", "sigma"))
Inference for Stan model: 3d2414c9dcf4b5e12be870eadd2c894a. 3 chains, each with iter=2000; warmup=500; thin=3; post-warmup draws per chain=500, total post-warmup draws=1500. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 49.50 0.08 2.34 44.55 48.01 49.58 51.01 54.11 902 1 beta[2] -12.29 0.09 3.04 -18.60 -14.30 -12.17 -10.22 -6.70 1094 1 beta[3] -28.21 0.10 2.97 -33.80 -30.23 -28.34 -26.40 -22.04 957 1 beta[4] -0.49 0.00 0.09 -0.67 -0.55 -0.49 -0.44 -0.31 978 1 beta[5] -0.12 0.00 0.11 -0.34 -0.19 -0.12 -0.05 0.11 1164 1 beta[6] 0.41 0.00 0.11 0.18 0.34 0.41 0.48 0.62 953 1 sigma 3.73 0.02 0.58 2.81 3.32 3.66 4.05 5.07 1271 1 Samples were drawn using NUTS(diag_e) at Sat Nov 4 07:45:04 2017. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
library(rstanarm) library(broom) library(coda)
data1.rstanarm = stan_glm(Y ~ A * B, data = data1, iter = 2000, warmup = 500, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 100), prior = normal(0, 100), prior_aux = cauchy(0, 2))
Gradient evaluation took 7.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.71 seconds. Adjust your expectations accordingly! Elapsed Time: 0.586557 seconds (Warm-up) 0.34573 seconds (Sampling) 0.932287 seconds (Total) Gradient evaluation took 2.5e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds. Adjust your expectations accordingly! Elapsed Time: 0.478284 seconds (Warm-up) 0.353797 seconds (Sampling) 0.832081 seconds (Total) Gradient evaluation took 1.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds. Adjust your expectations accordingly! Elapsed Time: 0.438293 seconds (Warm-up) 0.338371 seconds (Sampling) 0.776664 seconds (Total)
print(data1.rstanarm)
stan_glm family: gaussian [identity] formula: Y ~ A * B ------ Estimates: Median MAD_SD (Intercept) 49.6 2.3 AGroup B -12.3 3.1 AGroup C -28.3 3.0 B -0.5 0.1 AGroup B:B -0.1 0.1 AGroup C:B 0.4 0.1 sigma 3.7 0.6 Sample avg. posterior predictive distribution of y (X = xbar): Median MAD_SD mean_PPD 26.9 1.0 ------ For info on the priors used see help('prior_summary.stanreg').
tidyMCMC(data1.rstanarm, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 49.5438430 2.38328655 44.8579446 54.1707410 2 AGroup B -12.3097610 3.13510315 -18.8059933 -6.6018889 3 AGroup C -28.2751561 3.02810337 -33.8195233 -21.9065319 4 B -0.4940998 0.09381214 -0.6922853 -0.3200262 5 AGroup B:B -0.1204605 0.11724241 -0.3514843 0.1104638 6 AGroup C:B 0.4111006 0.11271108 0.1810014 0.6214744 7 sigma 3.7727701 0.59136185 2.7322401 4.9499110
library(brms) library(broom) library(coda)
data1.brms = brm(Y ~ A * B, data = data1, iter = 2000, warmup = 500, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 100), class = "Intercept"), prior(normal(0, 100), class = "b"), prior(cauchy(0, 5), class = "sigma")))
Gradient evaluation took 2.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.29 seconds. Adjust your expectations accordingly! Elapsed Time: 0.094494 seconds (Warm-up) 0.100503 seconds (Sampling) 0.194997 seconds (Total) Gradient evaluation took 8e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds. Adjust your expectations accordingly! Elapsed Time: 0.076777 seconds (Warm-up) 0.104894 seconds (Sampling) 0.181671 seconds (Total) Gradient evaluation took 9e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds. Adjust your expectations accordingly! Elapsed Time: 0.091052 seconds (Warm-up) 0.09965 seconds (Sampling) 0.190702 seconds (Total)
print(data1.brms)
Family: gaussian(identity) Formula: Y ~ A * B Data: data1 (Number of observations: 30) Samples: 3 chains, each with iter = 2000; warmup = 500; thin = 2; total post-warmup samples = 2250 ICs: LOO = NA; WAIC = NA; R2 = NA Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 49.60 2.32 44.97 54.25 1182 1 AGroupB -12.33 3.06 -18.49 -6.37 1216 1 AGroupC -28.29 2.97 -34.18 -22.34 1246 1 B -0.49 0.09 -0.67 -0.32 1160 1 AGroupB:B -0.12 0.11 -0.35 0.11 1186 1 AGroupC:B 0.41 0.11 0.19 0.62 1248 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 3.71 0.54 2.81 4.98 1423 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
tidyMCMC(data1.brms, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 b_Intercept 49.6027270 2.3173028 44.8624049 54.0547491 2 b_AGroupB -12.3262056 3.0553552 -18.7775957 -6.7594663 3 b_AGroupC -28.2870152 2.9659002 -33.8736952 -22.1495177 4 b_B -0.4926334 0.0908813 -0.6696454 -0.3177777 5 b_AGroupB:B -0.1212027 0.1126979 -0.3497945 0.1053154 6 b_AGroupC:B 0.4067397 0.1097094 0.1878784 0.6203292 7 sigma 3.7129053 0.5394484 2.6776034 4.7406944
MCMC diagnostics
In addition to the regular model diagnostic checks (such as residual plots), for Bayesian analyses, it is necessary to explore the characteristics of the MCMC chains and the sampler in general. Recall that the purpose of MCMC sampling is to replicate the posterior distribution of the model likelihood and priors by drawing a known number of samples from this posterior (thereby formulating a probability distribution). This is only reliable if the MCMC samples accurately reflect the posterior.
Unfortunately, since we only know the posterior in the most trivial of circumstances, it is necessary to rely on indirect measures of how accurately the MCMC samples are likely to reflect the likelihood. I will breifly outline the most important diagnostics, however, please refer to Tutorial 4.3, Secton 3.1: Markov Chain Monte Carlo sampling for a discussion of these diagnostics.
- Traceplots for each parameter illustrate the MCMC sample values after each successive
iteration along the chain. Bad chain mixing (characterized by any sort of pattern) suggests
that the MCMC sampling chains may not have completely traversed all features of the posterior
distribution and that more iterations are required to ensure the distribution has been accurately
represented.
- Autocorrelation plot for each paramter illustrate the degree of correlation between
MCMC samples separated by different lags. For example, a lag of 0 represents the degree of
correlation between each MCMC sample and itself (obviously this will be a correlation of 1).
A lag of 1 represents the degree of correlation between each MCMC sample and the next sample along the Chain
and so on. In order to be able to generate unbiased estimates of parameters, the MCMC samples should be
independent (uncorrelated). In the figures below, this would be violated in the top autocorrelation plot and met in the bottom
autocorrelation plot.
- Rhat statistic for each parameter provides a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05. If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or overly vague priors that permit sampling in otherwise nonscence parameter space.
Prior to inspecting any summaries of the parameter estimates, it is prudent to inspect a range of chain convergence diagnostics.
- Trace plots
View trace plots
library(MCMCpack) plot(data1.mcmcpack)
- Raftery diagnostic
View Raftery diagnostic
library(MCMCpack) raftery.diag(data1.mcmcpack)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) (Intercept) 2 3929 3746 1.050 AGroup B 2 3650 3746 0.974 AGroup C 2 3710 3746 0.990 B 2 3650 3746 0.974 AGroup B:B 2 3962 3746 1.060 AGroup C:B 2 3865 3746 1.030 sigma2 2 3929 3746 1.050
- Autocorrelation diagnostic
View autocorrelations
library(MCMCpack) autocorr.diag(data1.mcmcpack)
(Intercept) AGroup B AGroup C B AGroup B:B AGroup C:B sigma2 Lag 0 1.00000000 1.000000000 1.000000000 1.000000000 1.000000000 1.0000000000 1.000000000 Lag 1 -0.01234075 -0.008277569 -0.011438936 -0.003198364 -0.011825247 -0.0003818285 0.217142614 Lag 5 0.01859622 0.019122855 -0.003823357 0.013434001 0.026688474 0.0020263629 0.005952516 Lag 10 0.01991699 0.021417875 0.011460174 0.013885559 0.022966909 0.0030683915 0.023030028 Lag 50 -0.00543317 -0.009493830 -0.007385782 -0.005974147 -0.008253629 0.0009582506 0.011223549
Again, prior to examining the summaries, we should have explored the convergence diagnostics.
library(coda) data1.mcmc = as.mcmc(data1.r2jags)
- Trace plots
plot(data1.mcmc)
When there are a lot of parameters, this can result in a very large number of traceplots. To focus on just certain parameters (such as $\beta$s)
preds <- c("beta[1]", "beta[2]", "beta[3]", "beta[4]") plot(as.mcmc(data1.r2jags)[, preds])
- Raftery diagnostic
raftery.diag(data1.mcmc)
[[1]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta[1] 20 37410 3746 9.99 beta[2] 20 38660 3746 10.30 beta[3] 20 36800 3746 9.82 beta[4] 20 38660 3746 10.30 beta[5] 20 37410 3746 9.99 beta[6] 20 38030 3746 10.20 deviance 20 39000 3746 10.40 sigma 20 36200 3746 9.66 [[2]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta[1] 20 37410 3746 9.99 beta[2] 20 38030 3746 10.20 beta[3] 20 38050 3746 10.20 beta[4] 10 37410 3746 9.99 beta[5] 20 36800 3746 9.82 beta[6] 20 36800 3746 9.82 deviance 20 38030 3746 10.20 sigma 20 36800 3746 9.82 [[3]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta[1] 20 36200 3746 9.66 beta[2] 20 37410 3746 9.99 beta[3] 20 39300 3746 10.50 beta[4] 20 39300 3746 10.50 beta[5] 20 36800 3746 9.82 beta[6] 20 36800 3746 9.82 deviance 20 39300 3746 10.50 sigma 20 36810 3746 9.83
- Autocorrelation diagnostic
autocorr.diag(data1.mcmc)
beta[1] beta[2] beta[3] beta[4] beta[5] beta[6] deviance Lag 0 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000 1.0000000000 Lag 10 0.004437399 0.002545394 0.005457633 0.002408673 0.005032132 0.003931065 -0.0005090962 Lag 50 -0.005470768 -0.006470215 -0.008316906 -0.004031177 -0.009725834 -0.003744444 0.0069346156 Lag 100 -0.006510539 -0.018290658 -0.009018695 0.003875208 -0.005127466 0.004179197 -0.0033519319 Lag 500 -0.002091585 -0.006610883 0.001302301 -0.000777923 -0.003703581 0.005995650 0.0081995570 sigma Lag 0 1.000000000 Lag 10 0.002181234 Lag 50 -0.002582261 Lag 100 0.004125329 Lag 500 0.008170166
Again, prior to examining the summaries, we should have explored the convergence diagnostics. There are numerous ways of working with STAN model fits (for exploring diagnostics and summarization).
- extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
- use the numerous routines that come with the rstan package
- use the routines that come with the bayesplot package
- explore the diagnostics interactively via shinystan
- via coda
- Traceplots
- Autocorrelation
library(coda) s = as.array(data1.rstan) wch = grep("beta", dimnames(s)$parameters) s = s[, , wch] mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) plot(mcmc)
library(coda) s = as.array(data1.rstan) wch = grep("beta", dimnames(s)$parameters) s = s[, , wch] mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) autocorr.diag(mcmc)
beta[1] beta[2] beta[3] beta[4] beta[5] Lag 0 1.00000000 1.000000000 1.000000000 1.000000000 1.000000000 Lag 1 0.19962255 0.134452951 0.162486507 0.181592704 0.125717018 Lag 5 -0.01473278 0.013954338 -0.020209597 -0.007642484 0.007016602 Lag 10 -0.01802321 -0.008248189 -0.007702265 -0.019093851 -0.002760047 Lag 50 -0.03711757 -0.035737496 -0.030485873 -0.020661680 -0.014791965
- via rstan
- Traceplots
stan_trace(data1.rstan)
- Raftery diagnostic
raftery.diag(data1.rstan)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
- Autocorrelation diagnostic
stan_ac(data1.rstan)
- Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05.
If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this
mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling
in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or
overly vague priors that permit sampling in otherwise nonscence parameter space.
stan_rhat(data1.rstan)
- Another measure of sampling efficiency is Effective Sample Size (ess).
ess indicate the number samples (or proportion of samples that the sampling algorithm deamed effective. The sampler rejects samples
on the basis of certain criterion and when it does so, the previous sample value is used. Hence while the MCMC sampling chain
may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
stan_ess(data1.rstan)
- Traceplots
- via bayesplot
- Trace plots and density plots
library(bayesplot) mcmc_trace(as.matrix(data1.rstan), regex_pars = "beta|sigma")
library(bayesplot) mcmc_combo(as.matrix(data1.rstan), regex_pars = "beta|sigma")
- Density plots
library(bayesplot) mcmc_dens(as.matrix(data1.rstan), regex_pars = "beta|sigma")
- Trace plots and density plots
- via shinystan
library(shinystan) launch_shinystan(data1.rstan)
- It is worth exploring the influence of our priors.
Again, prior to examining the summaries, we should have explored the convergence diagnostics. There are numerous ways of working with STANARM model fits (for exploring diagnostics and summarization).
- extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
- use the numerous routines that come with the rstan package
- use the routines that come with the bayesplot package
- explore the diagnostics interactively via shinystan
- via coda
- Traceplots
- Autocorrelation
library(coda) s = as.array(data1.rstanarm) mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) plot(mcmc)
library(coda) s = as.array(data1.rstanarm) mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc)) autocorr.diag(mcmc)
(Intercept) AGroup B AGroup C B AGroup B:B AGroup C:B Lag 0 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000 Lag 1 0.051997304 0.022839434 0.049122699 0.029332271 0.018600498 0.039303026 Lag 5 -0.009030666 -0.023573121 -0.021521616 0.002608995 -0.008629405 -0.008541973 Lag 10 0.022740083 0.005983251 -0.006472736 0.009874178 -0.009109182 -0.041524169 Lag 50 0.023718877 0.022629378 0.043072824 0.029933371 0.029119110 0.025530832
- via rstan
- Traceplots
stan_trace(data1.rstanarm)
- Raftery diagnostic
raftery.diag(data1.rstanarm)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
- Autocorrelation diagnostic
stan_ac(data1.rstanarm)
- Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05.
If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this
mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling
in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or
overly vague priors that permit sampling in otherwise nonscence parameter space.
stan_rhat(data1.rstanarm)
- Another measure of sampling efficiency is Effective Sample Size (ess).
ess indicate the number samples (or proportion of samples that the sampling algorithm deamed effective. The sampler rejects samples
on the basis of certain criterion and when it does so, the previous sample value is used. Hence while the MCMC sampling chain
may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
stan_ess(data1.rstanarm)
- Traceplots
- via bayesplot
- Trace plots and density plots
library(bayesplot) mcmc_trace(as.array(data1.rstanarm), regex_pars = "Intercept|x|sigma")
mcmc_combo(as.array(data1.rstanarm))
- Density plots
mcmc_dens(as.array(data1.rstanarm))
- Trace plots and density plots
- via rstanarm
The rstanarm package provides additional posterior checks.- Posterior vs Prior - this compares the posterior estimate for each parameter against the associated prior.
If the spread of the priors is small relative to the posterior, then it is likely that the priors are too influential.
On the other hand, overly wide priors can lead to computational issues.
library(rstanarm) posterior_vs_prior(data1.rstanarm, color_by = "vs", group_by = TRUE, facet_args = list(scales = "free_y"))
Gradient evaluation took 6.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.69 seconds. Adjust your expectations accordingly! Elapsed Time: 0.403063 seconds (Warm-up) 0.077008 seconds (Sampling) 0.480071 seconds (Total) Gradient evaluation took 1.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds. Adjust your expectations accordingly! Elapsed Time: 0.280175 seconds (Warm-up) 0.104319 seconds (Sampling) 0.384494 seconds (Total)
- Posterior vs Prior - this compares the posterior estimate for each parameter against the associated prior.
If the spread of the priors is small relative to the posterior, then it is likely that the priors are too influential.
On the other hand, overly wide priors can lead to computational issues.
- via shinystan
library(shinystan) launch_shinystan(data1.rstanarm))
Again, prior to examining the summaries, we should have explored the convergence diagnostics. Rather than duplicate this for both additive and multiplicative models, we will only explore the multiplicative model. There are numerous ways of working with STAN model fits (for exploring diagnostics and summarization).
- extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
- use the numerous routines that come with the rstan package
- use the routines that come with the bayesplot package
- explore the diagnostics interactively via shinystan
- via coda
- Traceplots
- Autocorrelation
library(coda) mcmc = as.mcmc(data1.brms) plot(mcmc)
library(coda) mcmc = as.mcmc(data1.brms) autocorr.diag(mcmc)
Error in ts(x, start = start(x), end = end(x), deltat = thin(x)): invalid time series parameters specified
- via rstan
- Traceplots
stan_trace(data1.brms$fit)
- Raftery diagnostic
raftery.diag(data1.brms)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
- Autocorrelation diagnostic
stan_ac(data1.brms$fit)
- Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05.
If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this
mean that the sampler was potentiall slower than it could have been, more importantly, it could indicate that the sampler spent time sampling
in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or
overly vague priors that permit sampling in otherwise nonscence parameter space.
stan_rhat(data1.brms$fit)
- Another measure of sampling efficiency is Effective Sample Size (ess).
ess indicate the number samples (or proportion of samples that the sampling algorithm deamed effective. The sampler rejects samples
on the basis of certain criterion and when it does so, the previous sample value is used. Hence while the MCMC sampling chain
may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
stan_ess(data1.brms$fit)
- Traceplots
Model validation
Model validation involves exploring the model diagnostics and fit to ensure that the model is broadly appropriate for the data. As such, exploration of the residuals should be routine.
For more complex models (those that contain multiple effects, it is also advisable to plot the residuals against each of the individual predictors. For sampling designs that involve sample collection over space or time, it is also a good idea to explore whether there are any temporal or spatial patterns in the residuals.
There are numerous situations (e.g. when applying specific variance-covariance structures to a model) where raw residuals do not reflect the interior workings of the model. Typically, this is because they do not take into account the variance-covariance matrix or assume a very simple variance-covariance matrix. Since the purpose of exploring residuals is to evaluate the model, for these cases, it is arguably better to draw conclusions based on standardized (or studentized) residuals.
Unfortunately the definitions of standardized and studentized residuals appears to vary and the two terms get used interchangeably. I will adopt the following definitions:
Standardized residuals: | the raw residuals divided by the true standard deviation of the residuals (which of course is rarely known). | |
Studentized residuals: | the raw residuals divided by the standard deviation of the residuals. Note that externally studentized residuals are calculated by dividing the raw residuals by a unique standard deviation for each observation that is calculated from regressions having left each successive observation out. | |
Pearson residuals: | the raw residuals divided by the standard deviation of the response variable. |
The mark of a good model is being able to predict well. In an ideal world, we would have sufficiently large sample size as to permit us to hold a fraction (such as 25%) back thereby allowing us to train the model on 75% of the data and then see how well the model can predict the withheld 25%. Unfortunately, such a luxury is still rare in ecology.
The next best option is to see how well the model can predict the observed data. Models tend to struggle most with the extremes of trends and have particular issues when the extremes approach logical boundaries (such as zero for count data and standard deviations). We can use the fitted model to generate random predicted observations and then explore some properties of these compared to the actual observed data.
Residuals are not computed directly within MCMCpack. However, we can calculate them manually form the posteriors.
mcmc = as.data.frame(data1.mcmcpack) # generate a model matrix newdata1 = data1 Xmat = model.matrix(~A * B, newdata1) ## get median parameter estimates head(mcmc)
(Intercept) AGroup B AGroup C B AGroup B:B AGroup C:B sigma2 1 48.70652 -11.241113 -25.43591 -0.4057086 -0.1929317 0.2966322 9.505405 2 43.17996 -8.274010 -22.66362 -0.3300430 -0.1959714 0.2629181 13.109720 3 49.48138 -10.480517 -27.46155 -0.4829662 -0.1457857 0.3635239 14.426140 4 47.75332 -11.387909 -27.18387 -0.3879251 -0.2055644 0.3181976 10.130435 5 49.43988 -11.918492 -29.20874 -0.4436741 -0.2074017 0.3961033 9.917436 6 46.51075 -6.677204 -22.78787 -0.3225533 -0.3883363 0.1657746 11.947826
coefs = apply(mcmc[, 1:6], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data1$Y - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
mcmc = as.data.frame(data1.mcmcpack) # generate a model matrix newdata1 = newdata1 Xmat = model.matrix(~A * B, newdata1) ## get median parameter estimates coefs = apply(mcmc[, 1:6], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data1$Y - fit newdata1 = newdata1 %>% cbind(fit, resid) ggplot(newdata1) + geom_point(aes(y = resid, x = A))
ggplot(newdata1) + geom_point(aes(y = resid, x = B))
And now for studentized residuals
mcmc = as.data.frame(data1.mcmcpack) # generate a model matrix newdata1 = data1 Xmat = model.matrix(~A * B, newdata1) ## get median parameter estimates coefs = apply(mcmc[, 1:6], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data1$Y - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
mcmc = as.matrix(data1.mcmcpack) # generate a model matrix Xmat = model.matrix(~A * B, data1) ## get median parameter estimates coefs = mcmc[, 1:6] fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data1), fit[i, ], sqrt(mcmc[i, "sigma2"]))) newdata1 = data.frame(A = data1$A, B = data1$B, yRep) %>% gather(key = Sample, value = Value, -A, -B) ggplot(newdata1) + geom_violin(aes(y = Value, x = A, fill = "Model"), alpha = 0.5) + geom_violin(data = data1, aes(y = Y, x = A, fill = "Obs"), alpha = 0.5) + geom_point(data = data1, aes(y = Y, x = A), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata1) + geom_violin(aes(y = Value, x = B, fill = "Model", group = B, color = A), alpha = 0.5) + geom_point(data = data1, aes(y = Y, x = B, group = B, color = A))
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data1.mcmcpack), regex_pars = "Intercept|^A|^B|sigma")
mcmc_areas(as.matrix(data1.mcmcpack), regex_pars = "Intercept|^A|^B|sigma")
Residuals are not computed directly within JAGS. However, we can calculate them manually form the posteriors.
mcmc = data1.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata1 = data1 Xmat = model.matrix(~A * B, newdata1) ## get median parameter estimates coefs = apply(mcmc[, 1:6], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data1$Y - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
mcmc = data1.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata1 = newdata1 Xmat = model.matrix(~A * B, newdata1) ## get median parameter estimates coefs = apply(mcmc[, 1:6], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data1$Y - fit newdata1 = newdata1 %>% cbind(fit, resid) ggplot(newdata1) + geom_point(aes(y = resid, x = A))
ggplot(newdata1) + geom_point(aes(y = resid, x = B))
And now for studentized residuals
mcmc = data1.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata1 = data1 Xmat = model.matrix(~A * B, newdata1) ## get median parameter estimates coefs = apply(mcmc[, 1:6], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data1$Y - fit sresid = resid/sd(resid) ggplot() + geom_point(data1 = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
mcmc = data1.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix Xmat = model.matrix(~A * B, data1) ## get median parameter estimates coefs = mcmc[, 1:6] fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data1), fit[i, ], mcmc[i, "sigma"])) newdata1 = data.frame(A = data1$A, B = data1$B, yRep) %>% gather(key = Sample, value = Value, -A, -B) ggplot(newdata1) + geom_violin(aes(y = Value, x = A, fill = "Model"), alpha = 0.5) + geom_violin(data = data1, aes(y = Y, x = A, fill = "Obs"), alpha = 0.5) + geom_point(data = data1, aes(y = Y, x = A), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata1) + geom_violin(aes(y = Value, x = B, fill = "Model", group = B, color = A), alpha = 0.5) + geom_point(data = data1, aes(y = Y, x = B, group = B, color = A))
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(data1.r2jags$BUGSoutput$sims.matrix, regex_pars = "beta|sigma")
mcmc_areas(data1.r2jags$BUGSoutput$sims.matrix, regex_pars = "beta|sigma")
Residuals are not computed directly within RSTAN. However, we can calculate them manually form the posteriors.
mcmc = as.data.frame(data1.rstan) %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata1 = data1 Xmat = model.matrix(~A * B, newdata1) ## get median parameter estimates coefs = apply(mcmc[, 1:6], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data1$Y - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
mcmc = as.data.frame(data1.rstan) %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata1 = newdata1 Xmat = model.matrix(~A * B, newdata1) ## get median parameter estimates coefs = apply(mcmc[, 1:6], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data1$Y - fit newdata1 = newdata1 %>% cbind(fit, resid) ggplot(newdata1) + geom_point(aes(y = resid, x = A))
ggplot(newdata1) + geom_point(aes(y = resid, x = B))
And now for studentized residuals
mcmc = as.data.frame(data1.rstan) %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix newdata1 = data1 Xmat = model.matrix(~A * B, newdata1) ## get median parameter estimates coefs = apply(mcmc[, 1:6], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = data1$Y - fit sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
mcmc = as.data.frame(data1.rstan) %>% dplyr:::select(contains("beta"), sigma) %>% as.matrix # generate a model matrix Xmat = model.matrix(~A * B, data1) ## get median parameter estimates coefs = mcmc[, 1:6] fit = coefs %*% t(Xmat) ## draw samples from this model yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data1), fit[i, ], mcmc[i, "sigma"])) newdata1 = data.frame(A = data1$A, B = data1$B, yRep) %>% gather(key = Sample, value = Value, -A, -B) ggplot(newdata1) + geom_violin(aes(y = Value, x = A, fill = "Model"), alpha = 0.5) + geom_violin(data = data1, aes(y = Y, x = A, fill = "Obs"), alpha = 0.5) + geom_point(data = data1, aes(y = Y, x = A), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata1) + geom_violin(aes(y = Value, x = B, fill = "Model", group = B, color = A), alpha = 0.5) + geom_point(data = data1, aes(y = Y, x = B, group = B, color = A))
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data1.rstan), regex_pars = "beta|sigma")
mcmc_areas(as.matrix(data1.rstan), regex_pars = "beta|sigma")
Residuals are not computed directly within RSTANARM. However, we can calculate them manually form the posteriors.
resid = resid(data1.rstanarm) fit = fitted(data1.rstanarm) ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
resid = resid(data1.rstanarm) dat = data1 %>% mutate(resid = resid) ggplot(dat) + geom_point(aes(y = resid, x = A))
ggplot(dat) + geom_point(aes(y = resid, x = B))
And now for studentized residuals
resid = resid(data1.rstanarm) sresid = resid/sd(resid) fit = fitted(data1.rstanarm) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
y_pred = posterior_predict(data1.rstanarm) newdata1 = data1 %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -A:-Y) ggplot(newdata1) + geom_violin(aes(y = Value, x = A, fill = "Model"), alpha = 0.5) + geom_violin(data = data1, aes(y = Y, x = A, fill = "Obs"), alpha = 0.5) + geom_point(data = data1, aes(y = Y, x = A), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata1) + geom_violin(aes(y = Value, x = B, fill = "Model", group = B, color = A), alpha = 0.5) + geom_point(data = data1, aes(y = Y, x = B, group = B, color = A))
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data1.rstanarm), regex_pars = "Intercept|^A|^B|sigma")
mcmc_areas(as.matrix(data1.rstanarm), regex_pars = "Intercept|^A|^B|sigma")
Residuals are not computed directly within BRMS. However, we can calculate them manually form the posteriors.
resid = resid(data1.brms)[, "Estimate"] fit = fitted(data1.brms)[, "Estimate"] ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
Residuals against predictors
resid = resid(data1.brms)[, "Estimate"] dat = data1 %>% mutate(resid = resid) ggplot(dat) + geom_point(aes(y = resid, x = A))
ggplot(dat) + geom_point(aes(y = resid, x = B))
And now for studentized residuals
resid = resid(data1.brms)[, "Estimate"] sresid = resid/sd(resid) fit = fitted(data1.brms)[, "Estimate"] ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).
Lets see how well data simulated from the model reflects the raw data
y_pred = posterior_predict(data1.brms) newdata1 = data1 %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -A:-Y) ggplot(newdata1) + geom_violin(aes(y = Value, x = A, fill = "Model"), alpha = 0.5) + geom_violin(data = data1, aes(y = Y, x = A, fill = "Obs"), alpha = 0.5) + geom_point(data = data1, aes(y = Y, x = A), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata1) + geom_violin(aes(y = Value, x = B, fill = "Model", group = B, color = A), alpha = 0.5) + geom_point(data = data1, aes(y = Y, x = B, group = B, color = A))
Conclusions the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals as we are seeking intervals within which we can predict individual observations rather than means.
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(data1.brms), regex_pars = "b_|sigma")
mcmc_areas(as.matrix(data1.brms), regex_pars = "b_|sigma")
Parameter estimates (posterior summaries)
Although all parameters in a Bayesian analysis are considered random and are considered a distribution, rarely would it be useful to present tables of all the samples from each distribution. On the other hand, plots of the posterior distributions are do have some use. Nevertheless, most workers prefer to present simple statistical summaries of the posteriors. Popular choices include the median (or mean) and 95% credibility intervals.
library(coda) mcmcpvalue <- function(samp) { ## elementary version that creates an empirical p-value for the ## hypothesis that the columns of samp have mean zero versus a general ## multivariate distribution with elliptical contours. ## differences from the mean standardized by the observed ## variance-covariance factor ## Note, I put in the bit for single terms if (length(dim(samp)) == 0) { std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - mean(samp), transpose = TRUE) sqdist <- colSums(std * std) sum(sqdist[-1] > sqdist[1])/length(samp) } else { std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - colMeans(samp), transpose = TRUE) sqdist <- colSums(std * std) sum(sqdist[-1] > sqdist[1])/nrow(samp) } }
Matrix model (MCMCpack)
summary(data1.mcmcpack)
Iterations = 1001:11000 Thinning interval = 1 Number of chains = 1 Sample size per chain = 10000 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE (Intercept) 49.5353 2.29334 0.0229334 0.0226186 AGroup B -12.2262 3.09179 0.0309179 0.0309179 AGroup C -28.2439 2.93282 0.0293282 0.0293282 B -0.4938 0.09038 0.0009038 0.0008764 AGroup B:B -0.1211 0.11358 0.0011358 0.0011178 AGroup C:B 0.4089 0.10829 0.0010829 0.0010829 sigma2 13.8964 4.43200 0.0443200 0.0552651 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% (Intercept) 45.0690 48.0206 49.5282 51.01720 54.0757 AGroup B -18.3801 -14.2357 -12.1642 -10.19136 -6.1832 AGroup C -34.0521 -30.1795 -28.2026 -26.30389 -22.5493 B -0.6743 -0.5528 -0.4940 -0.43428 -0.3178 AGroup B:B -0.3412 -0.1958 -0.1209 -0.04575 0.1030 AGroup C:B 0.1948 0.3382 0.4086 0.47902 0.6240 sigma2 7.7536 10.8320 13.0991 16.09205 24.8023
# OR library(broom) tidyMCMC(data1.mcmcpack, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 49.5353232 2.29333768 45.1289511 54.1244471 2 AGroup B -12.2262443 3.09178506 -18.4166155 -6.2375566 3 AGroup C -28.2438752 2.93282206 -33.9185333 -22.4861115 4 B -0.4938007 0.09037721 -0.6733914 -0.3173760 5 AGroup B:B -0.1211118 0.11358441 -0.3414655 0.1027879 6 AGroup C:B 0.4088582 0.10828868 0.1931050 0.6219178 7 sigma2 13.8964411 4.43200264 7.0646504 22.9045959
- the intercept of the first group (Group A) is
49.5353232
- the mean of the second group (Group B) is
-12.2262443
units greater than (A) - the mean of the third group (Group C) is
-28.2438752
units greater than (A) - a one unit increase in B in Group A is associated with a
-0.4938007
units increase in Y - difference in slope between Group B and Group A
-0.1211118
- difference in slope between Group C and Group A
0.4088582
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(data1.mcmcpack[, 2]) # effect of (B-A = 0)
[1] 0.001
mcmcpvalue(data1.mcmcpack[, 3]) # effect of (C-A = 0)
[1] 0
mcmcpvalue(data1.mcmcpack[, 4]) # effect of (slope = 0)
[1] 0
mcmcpvalue(data1.mcmcpack[, 5]) # effect of (slopeB - slopeA = 0)
[1] 0.2763
mcmcpvalue(data1.mcmcpack[, 6]) # effect of (slopeC - slopeA = 0)
[1] 7e-04
mcmcpvalue(data1.mcmcpack[, 2:6]) # effect of (model)
[1] 0
There is evidence that the reponse differs between the groups.
Matrix model (JAGS)
print(data1.r2jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10 n.sims = 15000 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] 49.553 2.336 45.049 48.003 49.538 51.092 54.162 1.001 15000 beta[2] -12.227 3.093 -18.248 -14.285 -12.228 -10.182 -6.142 1.001 11000 beta[3] -28.270 2.993 -34.198 -30.236 -28.275 -26.304 -22.340 1.001 15000 beta[4] -0.495 0.093 -0.680 -0.556 -0.494 -0.433 -0.313 1.001 15000 beta[5] -0.121 0.115 -0.347 -0.197 -0.122 -0.046 0.105 1.001 15000 beta[6] 0.410 0.111 0.193 0.336 0.410 0.482 0.636 1.001 15000 sigma 3.772 0.585 2.827 3.354 3.707 4.106 5.105 1.001 15000 deviance 163.101 4.379 156.875 159.924 162.329 165.512 173.639 1.001 15000 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 9.6 and DIC = 172.7 DIC is an estimate of expected predictive error (lower deviance is better).
# OR library(broom) tidyMCMC(as.mcmc(data1.r2jags), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 beta[1] 49.5534512 2.33556926 45.0234105 54.1328491 2 beta[2] -12.2268098 3.09333864 -18.3836687 -6.3089149 3 beta[3] -28.2700616 2.99256629 -34.2921382 -22.4881054 4 beta[4] -0.4947167 0.09264266 -0.6838619 -0.3183081 5 beta[5] -0.1214410 0.11480484 -0.3469758 0.1054090 6 beta[6] 0.4099855 0.11145489 0.1877435 0.6289609 7 deviance 163.1011311 4.37873657 156.1255607 171.7611813 8 sigma 3.7720815 0.58508086 2.7189187 4.9291545
- the intercept of the first group (Group A) is
49.5534512
- the mean of the second group (Group B) is
-12.2268098
units greater than (A) - the mean of the third group (Group C) is
-28.2700616
units greater than (A) - a one unit increase in B in Group A is associated with a
-0.4947167
units increase in Y - difference in slope between Group B and Group A
-0.121441
- difference in slope between Group C and Group A
0.4099855
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(data1.r2jags$BUGSoutput$sims.matrix[, "beta[2]"]) # effect of (B-A = 0)
[1] 0.0003333333
mcmcpvalue(data1.r2jags$BUGSoutput$sims.matrix[, "beta[3]"]) # effect of (C-A = 0)
[1] 0
mcmcpvalue(data1.r2jags$BUGSoutput$sims.matrix[, "beta[4]"]) # effect of (slope = 0)
[1] 0
mcmcpvalue(data1.r2jags$BUGSoutput$sims.matrix[, "beta[5]"]) # effect of (slopeB - slopeA = 0)
[1] 0.28
mcmcpvalue(data1.r2jags$BUGSoutput$sims.matrix[, "beta[6]"]) # effect of (slopeC - slopeA = 0)
[1] 0.0007333333
mcmcpvalue(data1.r2jags$BUGSoutput$sims.matrix[, 2:6]) # effect of (model)
[1] 0
There is evidence that the reponse differs between the groups.
Matrix model (RSTAN)
print(data1.rstan, pars = c("beta", "sigma"))
Inference for Stan model: 3d2414c9dcf4b5e12be870eadd2c894a. 3 chains, each with iter=2000; warmup=500; thin=3; post-warmup draws per chain=500, total post-warmup draws=1500. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat beta[1] 49.50 0.08 2.34 44.55 48.01 49.58 51.01 54.11 902 1 beta[2] -12.29 0.09 3.04 -18.60 -14.30 -12.17 -10.22 -6.70 1094 1 beta[3] -28.21 0.10 2.97 -33.80 -30.23 -28.34 -26.40 -22.04 957 1 beta[4] -0.49 0.00 0.09 -0.67 -0.55 -0.49 -0.44 -0.31 978 1 beta[5] -0.12 0.00 0.11 -0.34 -0.19 -0.12 -0.05 0.11 1164 1 beta[6] 0.41 0.00 0.11 0.18 0.34 0.41 0.48 0.62 953 1 sigma 3.73 0.02 0.58 2.81 3.32 3.66 4.05 5.07 1271 1 Samples were drawn using NUTS(diag_e) at Sat Nov 4 07:45:04 2017. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).
# OR library(broom) tidyMCMC(data1.rstan, conf.int = TRUE, conf.method = "HPDinterval", pars = c("beta", "sigma"))
term estimate std.error conf.low conf.high 1 beta[1] 49.5013545 2.34230542 44.9503609 54.24168440 2 beta[2] -12.2889364 3.04350709 -18.5232918 -6.66905473 3 beta[3] -28.2129289 2.96719159 -33.8515230 -22.12391688 4 beta[4] -0.4929557 0.09152697 -0.6703121 -0.30402421 5 beta[5] -0.1195490 0.11233322 -0.3536692 0.08719164 6 beta[6] 0.4082751 0.11091872 0.1750039 0.61796175 7 sigma 3.7255865 0.57520983 2.6288615 4.77664381
- the intercept of the first group (Group A) is
49.5013545
- the mean of the second group (Group B) is
-12.2889364
units greater than (A) - the mean of the third group (Group C) is
-28.2129289
units greater than (A) - a one unit increase in B in Group A is associated with a
-0.4929557
units increase in Y - difference in slope between Group B and Group A
-0.119549
- difference in slope between Group C and Group A
0.4082751
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero mcmcpvalue(as.matrix(data1.rstan)[, "beta[2]"]) # effect of (B-A = 0)
[1] 0.0006666667
mcmcpvalue(as.matrix(data1.rstan)[, "beta[3]"]) # effect of (C-A = 0)
[1] 0
mcmcpvalue(as.matrix(data1.rstan)[, "beta[4]"]) # effect of (slope = 0)
[1] 0
mcmcpvalue(as.matrix(data1.rstan)[, "beta[5]"]) # effect of (slopeB - slopeA = 0)
[1] 0.278
mcmcpvalue(as.matrix(data1.rstan)[, "beta[6]"]) # effect of (slopeC - slopeA = 0)
[1] 0.0006666667
mcmcpvalue(as.matrix(data1.rstan)[, 2:6]) # effect of (model)
[1] 0
There is evidence that a model that includes an interaction is better than an additive model.
library(loo) (full = loo(extract_log_lik(data1.rstan)))
Computed from 1500 by 30 log-likelihood matrix Estimate SE elpd_loo -85.3 4.0 p_loo 6.3 1.6 looic 170.6 7.9 Pareto k diagnostic values: Count Pct (-Inf, 0.5] (good) 26 86.7% (0.5, 0.7] (ok) 4 13.3% (0.7, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0% All Pareto k estimates are ok (k < 0.7) See help('pareto-k-diagnostic') for details.
# now fit a model without main factor modelString = " data { int<lower=1> n; int<lower=1> nX; vector [n] y; matrix [n,nX] X; } parameters { vector[nX] beta; real<lower=0> sigma; } transformed parameters { vector[n] mu; mu = X*beta; } model { #Likelihood y~normal(mu,sigma); #Priors beta ~ normal(0,1000); sigma~cauchy(0,5); } generated quantities { vector[n] log_lik; for (i in 1:n) { log_lik[i] = normal_lpdf(y[i] | mu[i], sigma); } } " Xmat <- model.matrix(~A + B, data1) data1.list <- with(data1, list(y = Y, X = Xmat, n = nrow(data1), nX = ncol(Xmat))) data1.rstan.red <- stan(data = data1.list, model_code = modelString, chains = 3, iter = 2000, warmup = 500, thin = 3)
SAMPLING FOR MODEL '3b057d3d81cbed2078ce678376a94574' NOW (CHAIN 1). Gradient evaluation took 4.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.42 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.138805 seconds (Warm-up) 0.122949 seconds (Sampling) 0.261754 seconds (Total) SAMPLING FOR MODEL '3b057d3d81cbed2078ce678376a94574' NOW (CHAIN 2). Gradient evaluation took 1.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.106299 seconds (Warm-up) 0.113143 seconds (Sampling) 0.219442 seconds (Total) SAMPLING FOR MODEL '3b057d3d81cbed2078ce678376a94574' NOW (CHAIN 3). Gradient evaluation took 1.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 501 / 2000 [ 25%] (Sampling) Iteration: 700 / 2000 [ 35%] (Sampling) Iteration: 900 / 2000 [ 45%] (Sampling) Iteration: 1100 / 2000 [ 55%] (Sampling) Iteration: 1300 / 2000 [ 65%] (Sampling) Iteration: 1500 / 2000 [ 75%] (Sampling) Iteration: 1700 / 2000 [ 85%] (Sampling) Iteration: 1900 / 2000 [ 95%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.120129 seconds (Warm-up) 0.110436 seconds (Sampling) 0.230565 seconds (Total)
(reduced = loo(extract_log_lik(data1.rstan.red)))
Computed from 1500 by 30 log-likelihood matrix Estimate SE elpd_loo -97.9 3.3 p_loo 5.0 1.1 looic 195.8 6.7 Pareto k diagnostic values: Count Pct (-Inf, 0.5] (good) 28 93.3% (0.5, 0.7] (ok) 2 6.7% (0.7, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0% All Pareto k estimates are ok (k < 0.7) See help('pareto-k-diagnostic') for details.
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
compare_models(full, reduced)
Error in discrete == discrete[1]: comparison of these types is not implemented
Matrix model (RSTANARM)
summary(data1.rstanarm)
Model Info: function: stan_glm family: gaussian [identity] formula: Y ~ A * B algorithm: sampling priors: see help('prior_summary') sample: 2250 (posterior sample size) num obs: 30 Estimates: mean sd 2.5% 25% 50% 75% 97.5% (Intercept) 49.5 2.4 44.9 48.0 49.6 51.1 54.3 AGroup B -12.3 3.1 -18.3 -14.4 -12.3 -10.2 -6.1 AGroup C -28.3 3.0 -34.3 -30.3 -28.3 -26.3 -22.3 B -0.5 0.1 -0.7 -0.6 -0.5 -0.4 -0.3 AGroup B:B -0.1 0.1 -0.3 -0.2 -0.1 0.0 0.1 AGroup C:B 0.4 0.1 0.2 0.3 0.4 0.5 0.6 sigma 3.8 0.6 2.8 3.3 3.7 4.1 5.2 mean_PPD 26.9 1.0 24.9 26.2 26.9 27.5 28.9 log-posterior -97.2 2.2 -102.5 -98.4 -96.8 -95.6 -94.1 Diagnostics: mcse Rhat n_eff (Intercept) 0.1 1.0 1938 AGroup B 0.1 1.0 1974 AGroup C 0.1 1.0 1902 B 0.0 1.0 2126 AGroup B:B 0.0 1.0 2068 AGroup C:B 0.0 1.0 1989 sigma 0.0 1.0 1226 mean_PPD 0.0 1.0 1497 log-posterior 0.1 1.0 1030 For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# OR library(broom) tidyMCMC(data1.rstanarm$stanfit, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 49.5438430 2.38328655 44.8579446 54.1707410 2 AGroup B -12.3097610 3.13510315 -18.8059933 -6.6018889 3 AGroup C -28.2751561 3.02810337 -33.8195233 -21.9065319 4 B -0.4940998 0.09381214 -0.6922853 -0.3200262 5 AGroup B:B -0.1204605 0.11724241 -0.3514843 0.1104638 6 AGroup C:B 0.4111006 0.11271108 0.1810014 0.6214744 7 sigma 3.7727701 0.59136185 2.7322401 4.9499110 8 mean_PPD 26.8713702 1.00446504 24.9737303 29.0017404 9 log-posterior -97.2208223 2.19450449 -101.6130115 -93.7209496
- the intercept of the first group (Group A) is
49.543843
- the mean of the second group (Group B) is
-12.309761
units greater than (A) - the mean of the third group (Group C) is
-28.2751561
units greater than (A) - a one unit increase in B in Group A is associated with a
-0.4940998
units increase in Y - difference in slope between Group B and Group A
-0.1204605
- difference in slope between Group C and Group A
0.4111006
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero head(as.matrix(data1.rstanarm))
parameters iterations (Intercept) AGroup B AGroup C B AGroup B:B AGroup C:B sigma [1,] 49.10544 -12.03203 -30.24254 -0.5408720 -0.07177445 0.5097346 3.368864 [2,] 49.57108 -10.46682 -27.74189 -0.5167232 -0.15566016 0.4159785 3.202435 [3,] 52.79242 -16.30169 -31.83083 -0.5749994 -0.08438700 0.4699279 3.849197 [4,] 50.85134 -14.46853 -28.92998 -0.5535872 0.03970759 0.4676661 4.339336 [5,] 52.41561 -18.30582 -30.55315 -0.4583573 -0.10973607 0.3507038 4.801621 [6,] 52.90121 -18.31527 -30.77699 -0.6350685 0.13775093 0.4858842 3.541416
mcmcpvalue(as.matrix(data1.rstanarm)[, "AGroup B"]) # effect of (B-A = 0)
[1] 0
mcmcpvalue(as.matrix(data1.rstanarm)[, "AGroup C"]) # effect of (C-A = 0)
[1] 0
mcmcpvalue(as.matrix(data1.rstanarm)[, "B"]) # effect of (slope = 0)
[1] 0
mcmcpvalue(as.matrix(data1.rstanarm)[, "AGroup B:B"]) # effect of (slopeB - slopeA = 0)
[1] 0.2951111
mcmcpvalue(as.matrix(data1.rstanarm)[, "AGroup C:B"]) # effect of (slopeC - slopeA = 0)
[1] 0.001777778
mcmcpvalue(as.matrix(data1.rstanarm)[, 2:6]) # effect of (model)
[1] 0
There is evidence that the reponse differs between the groups. There is very little evidence that the response of group D differs from that of group A
library(loo) (full = loo(data1.rstanarm))
Computed from 2250 by 30 log-likelihood matrix Estimate SE elpd_loo -85.9 4.1 p_loo 6.8 1.8 looic 171.9 8.2 Pareto k diagnostic values: Count Pct (-Inf, 0.5] (good) 24 80.0% (0.5, 0.7] (ok) 5 16.7% (0.7, 1] (bad) 1 3.3% (1, Inf) (very bad) 0 0.0% See help('pareto-k-diagnostic') for details.
data1.rstanarm.red = update(data1.rstanarm, . ~ A + B)
Gradient evaluation took 4.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.41 seconds. Adjust your expectations accordingly! Elapsed Time: 0.252022 seconds (Warm-up) 0.130327 seconds (Sampling) 0.382349 seconds (Total) Gradient evaluation took 1.9e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds. Adjust your expectations accordingly! Elapsed Time: 0.182518 seconds (Warm-up) 0.152707 seconds (Sampling) 0.335225 seconds (Total) Gradient evaluation took 1.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds. Adjust your expectations accordingly! Elapsed Time: 0.164992 seconds (Warm-up) 0.158132 seconds (Sampling) 0.323124 seconds (Total)
(reduced = loo(data1.rstanarm.red))
Computed from 2250 by 30 log-likelihood matrix Estimate SE elpd_loo -97.7 3.1 p_loo 4.7 1.0 looic 195.5 6.3 Pareto k diagnostic values: Count Pct (-Inf, 0.5] (good) 27 90.0% (0.5, 0.7] (ok) 3 10.0% (0.7, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0% All Pareto k estimates are ok (k < 0.7) See help('pareto-k-diagnostic') for details.
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
compare_models(full, reduced)
elpd_diff se -11.8 4.2
Matrix model (BRMS)
summary(data1.brms)
Family: gaussian(identity) Formula: Y ~ A * B Data: data1 (Number of observations: 30) Samples: 3 chains, each with iter = 2000; warmup = 500; thin = 2; total post-warmup samples = 2250 ICs: LOO = NA; WAIC = NA; R2 = NA Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 49.60 2.32 44.97 54.25 1182 1 AGroupB -12.33 3.06 -18.49 -6.37 1216 1 AGroupC -28.29 2.97 -34.18 -22.34 1246 1 B -0.49 0.09 -0.67 -0.32 1160 1 AGroupB:B -0.12 0.11 -0.35 0.11 1186 1 AGroupC:B 0.41 0.11 0.19 0.62 1248 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 3.71 0.54 2.81 4.98 1423 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
# OR library(broom) tidyMCMC(data1.brms$fit, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 b_Intercept 49.6027270 2.3173028 44.8624049 54.0547491 2 b_AGroupB -12.3262056 3.0553552 -18.7775957 -6.7594663 3 b_AGroupC -28.2870152 2.9659002 -33.8736952 -22.1495177 4 b_B -0.4926334 0.0908813 -0.6696454 -0.3177777 5 b_AGroupB:B -0.1212027 0.1126979 -0.3497945 0.1053154 6 b_AGroupC:B 0.4067397 0.1097094 0.1878784 0.6203292 7 sigma 3.7129053 0.5394484 2.6776034 4.7406944
- the intercept of the first group (Group A) is
49.602727
- the mean of the second group (Group B) is
-12.3262056
units greater than (A) - the mean of the third group (Group C) is
-28.2870152
units greater than (A) - a one unit increase in B in Group A is associated with a
-0.4926334
units increase in Y - difference in slope between Group B and Group A
-0.1212027
- difference in slope between Group C and Group A
0.4067397
While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.
## since values are less than zero head(as.matrix(data1.brms))
parameters iterations b_Intercept b_AGroupB b_AGroupC b_B b_AGroupB:B b_AGroupC:B sigma lp__ [1,] 47.70303 -10.255010 -25.32093 -0.3742985 -0.17137635 0.2600049 3.092922 -116.2110 [2,] 50.55262 -14.941232 -30.12870 -0.4643934 -0.13460334 0.4086522 3.230887 -114.2718 [3,] 48.22410 -9.538243 -25.80487 -0.4858593 -0.10874505 0.4073568 4.423214 -116.0591 [4,] 48.07606 -10.397199 -26.96966 -0.4404635 -0.16751208 0.3346061 2.849359 -113.2790 [5,] 49.43749 -10.676002 -28.03187 -0.5099977 -0.11303887 0.4276380 3.777916 -113.2435 [6,] 50.64586 -13.340054 -27.57699 -0.5712098 -0.02596519 0.4449288 2.659352 -115.2269
mcmcpvalue(as.matrix(data1.brms)[, "b_AGroupB"]) # effect of (B-A = 0)
[1] 0.0004444444
mcmcpvalue(as.matrix(data1.brms)[, "b_AGroupC"]) # effect of (C-A = 0)
[1] 0
mcmcpvalue(as.matrix(data1.brms)[, "b_B"]) # effect of (slope = 0)
[1] 0
mcmcpvalue(as.matrix(data1.brms)[, "b_AGroupB:B"]) # effect of (slopeB - slopeA = 0)
[1] 0.264
mcmcpvalue(as.matrix(data1.brms)[, "b_AGroupC:B"]) # effect of (slopeB - slopeA = 0)
[1] 0.001333333
mcmcpvalue(as.matrix(data1.brms)[, 2:6]) # effect of (model)
[1] 0
There is evidence that the reponse differs between the groups. There is very little evidence that the response of group D differs from that of group A
library(loo) (full = loo(data1.brms))
LOOIC SE 170.84 8.04
data1.brms.red = update(data1.brms, . ~ A + B)
SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 1). Gradient evaluation took 2.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 600 / 2000 [ 30%] (Warmup) Iteration: 800 / 2000 [ 40%] (Warmup) Iteration: 1000 / 2000 [ 50%] (Warmup) Iteration: 1001 / 2000 [ 50%] (Sampling) Iteration: 1200 / 2000 [ 60%] (Sampling) Iteration: 1400 / 2000 [ 70%] (Sampling) Iteration: 1600 / 2000 [ 80%] (Sampling) Iteration: 1800 / 2000 [ 90%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.059643 seconds (Warm-up) 0.028751 seconds (Sampling) 0.088394 seconds (Total) SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 2). Gradient evaluation took 8e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 600 / 2000 [ 30%] (Warmup) Iteration: 800 / 2000 [ 40%] (Warmup) Iteration: 1000 / 2000 [ 50%] (Warmup) Iteration: 1001 / 2000 [ 50%] (Sampling) Iteration: 1200 / 2000 [ 60%] (Sampling) Iteration: 1400 / 2000 [ 70%] (Sampling) Iteration: 1600 / 2000 [ 80%] (Sampling) Iteration: 1800 / 2000 [ 90%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.05915 seconds (Warm-up) 0.028504 seconds (Sampling) 0.087654 seconds (Total) SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 3). Gradient evaluation took 8e-06 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds. Adjust your expectations accordingly! Iteration: 1 / 2000 [ 0%] (Warmup) Iteration: 200 / 2000 [ 10%] (Warmup) Iteration: 400 / 2000 [ 20%] (Warmup) Iteration: 600 / 2000 [ 30%] (Warmup) Iteration: 800 / 2000 [ 40%] (Warmup) Iteration: 1000 / 2000 [ 50%] (Warmup) Iteration: 1001 / 2000 [ 50%] (Sampling) Iteration: 1200 / 2000 [ 60%] (Sampling) Iteration: 1400 / 2000 [ 70%] (Sampling) Iteration: 1600 / 2000 [ 80%] (Sampling) Iteration: 1800 / 2000 [ 90%] (Sampling) Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.056265 seconds (Warm-up) 0.027707 seconds (Sampling) 0.083972 seconds (Total)
(reduced = loo(data1.brms.red))
LOOIC SE 195.67 6.7
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3) plot(full, label_points = TRUE) plot(reduced, label_points = TRUE)
compare_models(full, reduced)
Error in discrete == discrete[1]: comparison of these types is not implemented
Graphical summaries
A nice graphic is often a great accompaniment to a statistical analysis. Although there are no fixed assumptions associated with graphing (in contrast to statistical analyses), we often want the graphical summaries to reflect the associated statistical analyses. After all, the sample is just one perspective on the population(s). What we are more interested in is being able to estimate and depict likely population parameters/trends.
Thus, whilst we could easily provide a plot displaying the raw data along with simple measures of location and spread, arguably, we should use estimates that reflect the fitted model. In this case, it would be appropriate to plot the credibility interval associated with each group.
Matrix model (MCMCpack)
mcmc = data1.mcmcpack ## Calculate the fitted values newdata1 = expand.grid(A = levels(data1$A), B = seq(min(data1$B), max(data1$B), len = 100)) Xmat = model.matrix(~A * B, newdata1) coefs = mcmc[, 1:6] fit = coefs %*% t(Xmat) newdata1 = newdata1 %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ggplot(newdata1, aes(y = estimate, x = B, fill = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
As this is simple single factor ANOVA, we can simple add the raw data to this figure. For more complex designs with additional predictors, it is necessary to plot partial residuals.
## Calculate partial residuals fitted values fdata1 = rdata1 = data1 fMat = rMat = model.matrix(~A * B, fdata1) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data1$Y - apply(coefs, 2, median) %*% t(rMat)) rdata1 = rdata1 %>% mutate(partial.resid = resid + fit) ggplot(newdata1, aes(y = estimate, x = B, fill = A)) + geom_point(data = rdata1, aes(y = partial.resid, x = B, color = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
Matrix model (JAGS)
mcmc = data1.r2jags$BUGSoutput$sims.matrix ## Calculate the fitted values newdata1 = expand.grid(A = levels(data1$A), B = seq(min(data1$B), max(data1$B), len = 100)) Xmat = model.matrix(~A * B, newdata1) coefs = mcmc[, c("beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) newdata1 = newdata1 %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ggplot(newdata1, aes(y = estimate, x = B, fill = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
As this is simple single factor ANOVA, we can simple add the raw data to this figure. For more complex designs with additional predictors, it is necessary to plot partial residuals.
## Calculate partial residuals fitted values fdata1 = rdata1 = data1 fMat = rMat = model.matrix(~A * B, fdata1) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data1$Y - apply(coefs, 2, median) %*% t(rMat)) rdata1 = rdata1 %>% mutate(partial.resid = resid + fit) ggplot(newdata1, aes(y = estimate, x = B, fill = A)) + geom_point(data = rdata1, aes(y = partial.resid, x = B, color = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
Matrix model (RSTAN)
mcmc = as.matrix(data1.rstan) ## Calculate the fitted values newdata1 = expand.grid(A = levels(data1$A), B = seq(min(data1$B), max(data1$B), len = 100)) Xmat = model.matrix(~A * B, newdata1) coefs = mcmc[, c("beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")] fit = coefs %*% t(Xmat) newdata1 = newdata1 %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) ggplot(newdata1, aes(y = estimate, x = B, fill = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
As this is simple single factor ANOVA, we can simple add the raw data to this figure. For more complex designs with additional predictors, it is necessary to plot partial residuals.
## Calculate partial residuals fitted values fdata1 = rdata1 = data1 fMat = rMat = model.matrix(~A * B, fdata1) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(data1$Y - apply(coefs, 2, median) %*% t(rMat)) rdata1 = rdata1 %>% mutate(partial.resid = resid + fit) ggplot(newdata1, aes(y = estimate, x = B, fill = A)) + geom_point(data = rdata1, aes(y = partial.resid, x = B, color = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
Matrix model (RSTANARM)
## Calculate the fitted values newdata1 = expand.grid(A = levels(data1$A), B = seq(min(data1$B), max(data1$B), len = 100)) fit = posterior_linpred(data1.rstanarm, newdata = newdata1) newdata1 = newdata1 %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) ggplot(newdata1, aes(y = estimate, x = B, fill = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
As this is simple single factor ANOVA, we can simple add the raw data to this figure. For more complex designs with additional predictors, it is necessary to plot partial residuals.
## Calculate partial residuals rdata1 = data1 pp = posterior_linpred(data1.rstanarm, newdata = rdata1) fit = as.vector(apply(pp, 2, median)) resid = resid(data1.rstanarm) rdata1 = rdata1 %>% mutate(partial.resid = resid + fit) ggplot(newdata1, aes(y = estimate, x = B, fill = A)) + geom_point(data = rdata1, aes(y = partial.resid, x = B, color = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
Matrix model (BRMS)
Although we could calculated the fitted values via matrix multiplication of the coefficients and the model matrix (as for MCMCpack, RJAGS and RSTAN), for more complex models, it is more convenient to use the marginal_effects function that comes with brms.
plot(marginal_effects(data1.brms), points = TRUE)
# OR eff = plot(marginal_effects(data1.brms), points = TRUE, plot = FALSE) eff
$A
$B
$`B:A`
## Calculate the fitted values newdata1 = expand.grid(A = levels(data1$A), B = seq(min(data1$B), max(data1$B), len = 100)) fit = fitted(data1.brms, newdata = newdata1, summary = FALSE) newdata1 = newdata1 %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) ggplot(newdata1, aes(y = estimate, x = B, fill = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
As this is simple single factor ANOVA, we can simple add the raw data to this figure. For more complex designs with additional predictors, it is necessary to plot partial residuals.
## Calculate partial residuals rdata1 = data1 fit = fitted(data1.brms, summary = TRUE)[, "Estimate"] resid = resid(data1.brms)[, "Estimate"] rdata1 = rdata1 %>% mutate(partial.resid = resid + fit) ggplot(newdata1, aes(y = estimate, x = B, fill = A)) + geom_point(data = rdata1, aes(y = partial.resid, x = B, color = A)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) + geom_line() + scale_y_continuous("Y") + scale_x_continuous("B") + theme_classic()
References
Gelman, A. (2005). “Analysis of Variance - Why it is More Important Than Ever”. In: The Annals of Statistics 33.1, pp. 1–53.
Gelman, A., B. Goodrich, J. Gabry, et al. (2017). “R-squared for Bayesian regression models”.
Worked Examples
- McCarthy (2007) - Chpt 6
- Kery (2010) - Chpt 10
- Gelman & Hill (2007) - Chpt 4
- Logan (2010) - Chpt 12
- Quinn & Keough (2002) - Chpt 9
Homogeneous slopes
To investigate the impacts of sexual activity on the fruitfly longevity, Partridge and Farquhar (1981), measured the longevity of male fruitflies with access to either one virgin female (potential mate), eight virgin females, one pregnant female (not a potential mate), eight pregnant females or no females. The pool of available male fruitflies varied in size and since size is known to impact longevity, the researchers also measured thorax length as a covariate.
Download Partridge1 data setFormat of partridge1.csv data file | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
partridge <- read.csv("../downloads/data/partridge1.csv", strip.white = T) head(partridge)
TREATMENT THORAX LONGEV 1 Preg8 0.64 35 2 Preg8 0.68 37 3 Preg8 0.68 49 4 Preg8 0.72 46 5 Preg8 0.72 63 6 Preg8 0.76 39
A subset of these data were analysed in Tutorial 7.4b Q3 (the same data, yet without the continuous covariate). On that occasion we decided that although the response were counts (and thus a good match for a Poisson distribution), the magnitude of the observations would probably mean that a Gaussian distribution would be adequate.
The addition of a continuous covariate is likely to complicate this somewhat. Now we not only assume homogeneity of variance between treatments, we also assume homogeneity of variance with thorax length.
- Perform basic exploratory data analysis to examined.
- normality
- homogeneity of variance
- homogeneity of slopes (and equality of covariate ranges)
ggplot(partridge, aes(y = LONGEV, x = THORAX, color = TREATMENT)) + geom_smooth() + geom_point()
plot(lm(LONGEV ~ THORAX + TREATMENT, partridge), which = 1)
There is definitely evidence of non-homogeneity of variance and perhaps even non-linearity. As already indicated, this is perhaps not unsurprising. Arguably, the most appropriate starting point would be to begin with a Poisson distribution which would not only expect the non-homogeneity of variance, since the Poisson uses a log-link, the relationship on a response scale would be expected to be non-linear.
Nevertheless, we will researve the use of a Poisson distribution until Tutorial 9.4b and for now, simply use a log (base 10) transformation to be consistent with the original authors. Note, as there are no zeros in these data, and (perhaps more importantly), the response does not approach zero, a logarithmic transformation is likely to be reasonable.
ggplot(partridge, aes(y = log10(LONGEV), x = THORAX, color = TREATMENT)) + geom_smooth() + geom_point()
plot(lm(log10(LONGEV) ~ THORAX + TREATMENT, partridge), which = 1)
That does appear more reasonable..
And now to explore the range of the covariate.
ggplot(partridge, aes(y = THORAX, x = TREATMENT)) + geom_boxplot()
summary(lm(THORAX ~ TREATMENT, partridge))
Call: lm(formula = THORAX ~ TREATMENT, data = partridge) Residuals: Min 1Q Median 3Q Max -0.1960 -0.0456 0.0144 0.0544 0.1344 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.83600 0.01543 54.194 <2e-16 *** TREATMENTPreg1 -0.01040 0.02182 -0.477 0.634 TREATMENTPreg8 -0.03040 0.02182 -1.393 0.166 TREATMENTVirg1 0.00160 0.02182 0.073 0.942 TREATMENTVirg8 -0.03600 0.02182 -1.650 0.102 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.07713 on 120 degrees of freedom Multiple R-squared: 0.04032, Adjusted R-squared: 0.008335 F-statistic: 1.261 on 4 and 120 DF, p-value: 0.2893
Conclusions - no evidence that the range of thorax length varies substantially between treatments. Furthermore, homogeneity of slopes seems reasonable and so we will use an additive model.
- Fit the appropriate Bayesian model to explore the effect of treatment and thorax
length on male fruitfly longenvity.
$$
\begin{align}
y_{ij} &\sim{} N(\mu_{ij}, \sigma)\\
\mu_{ij} &= \beta_0 + \mathbf{X}\boldsymbol{\beta}\\[1em]
\beta_0 &\sim{} N(0,10)\\
\beta &\sim{} N(0,10)\\
\sigma &\sim{} cauchy(0,5)\\
\end{align}
$$
library(MCMCpack) partridge.mcmcpack = MCMCregress(log10(LONGEV) ~ TREATMENT + THORAX, data = partridge)
modelString = " model { #Likelihood for (i in 1:n) { y[i]~dnorm(mu[i],tau) mu[i] <- beta0 + inprod(beta[],X[i,]) } #Priors beta0 ~ dnorm(0.01,1.0E-6) for (j in 1:nX) { beta[j] ~ dnorm(0.01,1.0E-6) } tau <- 1 / (sigma * sigma) sigma~dunif(0,100) } " X = model.matrix(~TREATMENT + THORAX, data = partridge) partridge.list <- with(partridge, list(y = log10(LONGEV), X = X[, -1], nX = ncol(X) - 1, n = nrow(partridge))) params <- c("beta0", "beta", "sigma") burnInSteps = 3000 nChains = 3 numSavedSteps = 15000 thinSteps = 10 nIter = ceiling((numSavedSteps * thinSteps)/nChains) partridge.r2jags <- jags(data = partridge.list, inits = NULL, parameters.to.save = params, model.file = textConnection(modelString), n.chains = nChains, n.iter = nIter, n.burnin = burnInSteps, n.thin = thinSteps)
Compiling model graph Resolving undeclared variables Allocating nodes Graph information: Observed stochastic nodes: 125 Unobserved stochastic nodes: 7 Total graph size: 998 Initializing model
modelString=" data { int
n; // total number of observations vector[n] Y; // response variable int nX; // number of effects matrix[n, nX] X; // model matrix } transformed data { matrix[n, nX - 1] Xc; // centered version of X vector[nX - 1] means_X; // column means of X before centering for (i in 2:nX) { means_X[i - 1] = mean(X[, i]); Xc[, i - 1] = X[, i] - means_X[i - 1]; } } parameters { vector[nX-1] beta; // population-level effects real cbeta0; // center-scale intercept real sigma; // residual SD } transformed parameters { } model { vector[n] mu; mu = Xc * beta + cbeta0; // prior specifications beta ~ normal(0, 10); cbeta0 ~ normal(0, 10); sigma ~ cauchy(0, 5); // likelihood contribution Y ~ normal(mu, sigma); } generated quantities { real beta0; // population-level intercept vector[n] log_lik; beta0 = cbeta0 - dot_product(means_X, beta); for (i in 1:n) { log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma); } } " X = model.matrix(~TREATMENT + THORAX, data = partridge) partridge.list <- with(partridge, list(Y = log10(LONGEV), X = X, nX = ncol(X), n = nrow(partridge))) library(rstan) partridge.rstan <- stan(data = partridge.list, model_code = modelString, chains = 3, iter = 5000, warmup = 500, thin = 2)
In file included from /usr/local/lib/R/site-library/BH/include/boost/config.hpp:39:0, from /usr/local/lib/R/site-library/BH/include/boost/math/tools/config.hpp:13, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/var.hpp:7, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core.hpp:12, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4, from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4, from file6f0539983e2.cpp:8: /usr/local/lib/R/site-library/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined # define BOOST_NO_CXX11_RVALUE_REFERENCES ^ <command-line>:0:0: note: this is the location of the previous definition SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 1). Gradient evaluation took 3.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.31 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 501 / 5000 [ 10%] (Sampling) Iteration: 1000 / 5000 [ 20%] (Sampling) Iteration: 1500 / 5000 [ 30%] (Sampling) Iteration: 2000 / 5000 [ 40%] (Sampling) Iteration: 2500 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.077105 seconds (Warm-up) 0.380042 seconds (Sampling) 0.457147 seconds (Total) SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 2). Gradient evaluation took 1.6e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 501 / 5000 [ 10%] (Sampling) Iteration: 1000 / 5000 [ 20%] (Sampling) Iteration: 1500 / 5000 [ 30%] (Sampling) Iteration: 2000 / 5000 [ 40%] (Sampling) Iteration: 2500 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.067367 seconds (Warm-up) 0.394947 seconds (Sampling) 0.462314 seconds (Total) SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 3). Gradient evaluation took 1.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds. Adjust your expectations accordingly! Iteration: 1 / 5000 [ 0%] (Warmup) Iteration: 500 / 5000 [ 10%] (Warmup) Iteration: 501 / 5000 [ 10%] (Sampling) Iteration: 1000 / 5000 [ 20%] (Sampling) Iteration: 1500 / 5000 [ 30%] (Sampling) Iteration: 2000 / 5000 [ 40%] (Sampling) Iteration: 2500 / 5000 [ 50%] (Sampling) Iteration: 3000 / 5000 [ 60%] (Sampling) Iteration: 3500 / 5000 [ 70%] (Sampling) Iteration: 4000 / 5000 [ 80%] (Sampling) Iteration: 4500 / 5000 [ 90%] (Sampling) Iteration: 5000 / 5000 [100%] (Sampling) Elapsed Time: 0.067727 seconds (Warm-up) 0.42144 seconds (Sampling) 0.489167 seconds (Total)
partridge.rstanarm = stan_glm(log10(LONGEV) ~ TREATMENT + THORAX, data = partridge, iter = 5000, warmup = 500, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 10), prior = normal(0, 10), prior_aux = cauchy(0, 5))
Gradient evaluation took 0.000161 seconds 1000 transitions using 10 leapfrog steps per transition would take 1.61 seconds. Adjust your expectations accordingly! Elapsed Time: 0.109844 seconds (Warm-up) 0.657854 seconds (Sampling) 0.767698 seconds (Total) Gradient evaluation took 2.1e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds. Adjust your expectations accordingly! Elapsed Time: 0.08606 seconds (Warm-up) 0.507298 seconds (Sampling) 0.593358 seconds (Total) Gradient evaluation took 3e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.3 seconds. Adjust your expectations accordingly! Elapsed Time: 0.084726 seconds (Warm-up) 0.50021 seconds (Sampling) 0.584936 seconds (Total)
partridge.brm = brm(log10(LONGEV) ~ TREATMENT + THORAX, data = partridge, iter = 5000, warmup = 500, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 10), class = "Intercept"), prior(normal(0, 10), class = "b"), prior(cauchy(0, 5), class = "sigma")))
Gradient evaluation took 2.5e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds. Adjust your expectations accordingly! Elapsed Time: 0.060042 seconds (Warm-up) 0.340743 seconds (Sampling) 0.400785 seconds (Total) Gradient evaluation took 1.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds. Adjust your expectations accordingly! Elapsed Time: 0.061698 seconds (Warm-up) 0.346451 seconds (Sampling) 0.408149 seconds (Total) Gradient evaluation took 1.4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds. Adjust your expectations accordingly! Elapsed Time: 0.064057 seconds (Warm-up) 0.336682 seconds (Sampling) 0.400739 seconds (Total)
- Explore MCMC diagnostics
library(MCMCpack) plot(partridge.mcmcpack)
raftery.diag(partridge.mcmcpack)
Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) (Intercept) 2 3865 3746 1.030 TREATMENTPreg1 2 3710 3746 0.990 TREATMENTPreg8 2 3741 3746 0.999 TREATMENTVirg1 2 3710 3746 0.990 TREATMENTVirg8 2 3771 3746 1.010 THORAX 2 3710 3746 0.990 sigma2 2 3994 3746 1.070
autocorr.diag(partridge.mcmcpack)
(Intercept) TREATMENTPreg1 TREATMENTPreg8 TREATMENTVirg1 TREATMENTVirg8 THORAX Lag 0 1.000000000 1.000000000 1.000000e+00 1.000000000 1.000000000 1.000000e+00 Lag 1 -0.006504213 0.005715657 -1.260043e-03 -0.008276112 -0.008425989 -6.216742e-03 Lag 5 0.020132882 0.012739945 -2.900389e-02 -0.001338169 -0.015262191 2.106973e-02 Lag 10 0.005066356 0.011293213 -8.019851e-03 -0.003100969 -0.013767490 5.865134e-03 Lag 50 -0.001937426 -0.024530392 8.454565e-05 0.004761843 -0.008212958 -3.677482e-05 sigma2 Lag 0 1.000000e+00 Lag 1 5.041996e-02 Lag 5 7.127112e-05 Lag 10 1.653846e-02 Lag 50 -7.121217e-03
library(R2jags) library(coda) partridge.mcmc = as.mcmc(partridge.r2jags) plot(partridge.mcmc)
raftery.diag(partridge.mcmc)
[[1]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 10 37660 3746 10.10 beta[1] 20 37020 3746 9.88 beta[2] 20 37020 3746 9.88 beta[3] 20 37020 3746 9.88 beta[4] 20 35750 3746 9.54 beta[5] 20 37020 3746 9.88 deviance 20 37020 3746 9.88 sigma 10 37660 3746 10.10 [[2]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 10 37660 3746 10.10 beta[1] 10 37660 3746 10.10 beta[2] 30 40390 3746 10.80 beta[3] 20 37020 3746 9.88 beta[4] 10 37660 3746 10.10 beta[5] 20 38330 3746 10.20 deviance 20 38330 3746 10.20 sigma 20 36380 3746 9.71 [[3]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 Burn-in Total Lower bound Dependence (M) (N) (Nmin) factor (I) beta0 20 37020 3746 9.88 beta[1] 10 37660 3746 10.10 beta[2] 20 39000 3746 10.40 beta[3] 20 37020 3746 9.88 beta[4] 20 37020 3746 9.88 beta[5] 10 37660 3746 10.10 deviance 20 37020 3746 9.88 sigma 20 38330 3746 10.20
autocorr.diag(partridge.mcmc)
beta0 beta[1] beta[2] beta[3] beta[4] beta[5] deviance Lag 0 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000 Lag 10 -0.013921949 0.005149797 0.004975120 -0.001853710 -0.014382555 -0.012575832 -0.008892071 Lag 50 -0.003696594 -0.002835295 0.001318796 0.010156118 -0.011928061 -0.003560853 0.010943677 Lag 100 0.003584436 -0.001331310 0.003826374 0.009171373 0.008765503 0.003492416 -0.008883727 Lag 500 0.016409182 0.013260824 0.009091810 0.001029463 -0.003070906 0.015760118 0.002411228 sigma Lag 0 1.0000000000 Lag 10 -0.0092198712 Lag 50 0.0059702667 Lag 100 -0.0008803947 Lag 500 0.0077816203
library(rstan) library(coda) s = as.array(partridge.rstan) partridge.mcmc <- do.call(mcmc.list, plyr:::alply(s[, , c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "sigma")], 2, as.mcmc)) plot(partridge.mcmc)
raftery.diag(partridge.mcmc)
$`1` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`2` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`3` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
autocorr.diag(partridge.mcmc)
beta0 beta[1] beta[2] beta[3] beta[4] beta[5] sigma Lag 0 1.0000000000 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000 1.0000000000 Lag 1 0.0362085746 0.091738066 0.093666654 0.065303202 0.083647790 0.029088441 0.0297993560 Lag 5 0.0075315830 -0.007179393 -0.013474037 0.019288850 -0.002217306 0.007947353 -0.0114935490 Lag 10 0.0154634582 0.004801029 0.020335742 0.010920588 0.025187625 0.014189952 0.0007989306 Lag 50 -0.0001803206 -0.001614632 -0.005646201 0.003760134 -0.019317695 0.003126408 -0.0024392021
library(rstan) library(coda) stan_ac(partridge.rstan, pars = c("beta0", "beta", "sigma"))
stan_rhat(partridge.rstan, pars = c("beta0", "beta", "sigma"))
stan_ess(partridge.rstan, pars = c("beta0", "beta", "sigma"))
# using Bayeseplot library(bayesplot) detach("package:reshape") mcmc_trace(as.array(partridge.rstan), regex_par = "beta|sigma")
mcmc_trace(as.array(partridge.rstan), regex_pars = "beta|sigma")
mcmc_dens(as.array(partridge.rstan), regex_par = "beta|sigma")
detach("package:reshape") library(bayesplot) mcmc_combo(as.array(partridge.rstan), regex_par = "beta|sigma")
library(rstanarm) library(coda) s = as.array(partridge.rstanarm) colnames(as.matrix(partridge.rstanarm))
[1] "(Intercept)" "TREATMENTPreg1" "TREATMENTPreg8" "TREATMENTVirg1" "TREATMENTVirg8" [6] "THORAX" "sigma"
partridge.mcmc <- do.call(mcmc.list, plyr:::alply(s[, , c("(Intercept)", "TREATMENTPreg1", "TREATMENTPreg8", "TREATMENTVirg1", "TREATMENTVirg8", "THORAX", "sigma")], 2, as.mcmc)) plot(partridge.mcmc)
raftery.diag(partridge.mcmc)
$`1` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`2` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s $`3` Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
autocorr.diag(partridge.mcmc)
(Intercept) TREATMENTPreg1 TREATMENTPreg8 TREATMENTVirg1 TREATMENTVirg8 THORAX Lag 0 1.000000000 1.000000000 1.000000000 1.0000000000 1.000000000 1.0000000000 Lag 1 0.075353592 0.177654851 0.188350285 0.1751439722 0.185668547 0.0617834129 Lag 5 0.003490198 0.005324381 0.000440865 -0.0045602106 -0.002274830 0.0006088013 Lag 10 -0.003447542 0.012745177 -0.005106027 0.0115046730 -0.010358881 -0.0033071860 Lag 50 -0.004644673 -0.009315829 0.011216715 -0.0005416519 -0.007338707 -0.0043183473 sigma Lag 0 1.000000000 Lag 1 0.048851176 Lag 5 -0.011582830 Lag 10 -0.008910899 Lag 50 0.003726101
library(rstanarm) library(coda) stan_ac(partridge.rstanarm, regex_pars = "Intercept|TREATMENT|THORAX|sigma")
stan_rhat(partridge.rstanarm, regex_pars = "Intercept|TREATMENT|THORAX|sigma")
stan_ess(partridge.rstanarm, regex_pars = "Intercept|TREATMENT|THORAX|sigma")
# using Bayeseplot library(bayesplot) detach("package:reshape") mcmc_trace(as.array(partridge.rstanarm), regex_par = "Intercept|TREATMENT|THORAX|sigma")
mcmc_trace(as.array(partridge.rstanarm), regex_pars = "Intercept|TREATMENT|THORAX|sigma")
mcmc_dens(as.array(partridge.rstanarm), regex_pars = "Intercept|TREATMENT|THORAX|sigma")
detach("package:reshape") library(bayesplot) mcmc_combo(as.array(partridge.rstanarm), regex_par = "Intercept|TREATMENT|THORAX|sigma")
library(rstanarm) posterior_vs_prior(partridge.rstanarm, color_by = "vs", group_by = TRUE, facet_args = list(scales = "free_y"))
Gradient evaluation took 4.7e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.47 seconds. Adjust your expectations accordingly! Elapsed Time: 0.064874 seconds (Warm-up) 0.139432 seconds (Sampling) 0.204306 seconds (Total) Gradient evaluation took 4e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.4 seconds. Adjust your expectations accordingly! Elapsed Time: 0.074223 seconds (Warm-up) 0.240174 seconds (Sampling) 0.314397 seconds (Total)
library(coda) library(brms) partridge.mcmc = as.mcmc(partridge.brm) plot(partridge.mcmc)
raftery.diag(partridge.mcmc)
[[1]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s [[2]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s [[3]] Quantile (q) = 0.025 Accuracy (r) = +/- 0.005 Probability (s) = 0.95 You need a sample size of at least 3746 with these values of q, r and s
autocorr.diag(partridge.mcmc)
Error in ts(x, start = start(x), end = end(x), deltat = thin(x)): invalid time series parameters specified
library(coda) stan_ac(partridge.brm$fit)
stan_rhat(partridge.brm$fit)
stan_ess(partridge.brm$fit)
- Perform model validation
library(MCMCpack) partridge.mcmc = as.data.frame(partridge.mcmcpack) # generate a model matrix newdata = partridge Xmat = model.matrix(~TREATMENT + THORAX, newdata) ## get median parameter estimates wch = grep("Intercept|TREATMENT|THORAX", colnames(partridge.mcmc)) coefs = apply(partridge.mcmc[, wch], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = log10(partridge$LONGEV) - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
# Plot residuals against treatment level newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, TREATMENT, THORAX) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scales = "free")
# Standardized residuals sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
library(MCMCpack) partridge.mcmc = as.matrix(partridge.mcmcpack) # generate a model matrix Xmat = model.matrix(~TREATMENT + THORAX, partridge) ## get median parameter estimates wch = grep("Intercept|TREATMENT|THORAX", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] fit = 10^(coefs %*% t(Xmat)) ## draw samples from this model yRep = sapply(1:nrow(partridge.mcmc), function(i) rnorm(nrow(partridge), fit[i, ], sqrt(partridge.mcmc[i, "sigma2"]))) newdata = data.frame(TREATMENT = partridge$TREATMENT, THORAX = partridge$THORAX, yRep) %>% gather(key = Sample, value = Value, -TREATMENT, -THORAX) ggplot(newdata) + geom_violin(aes(y = Value, x = TREATMENT, fill = "Model"), alpha = 0.5) + geom_violin(data = partridge, aes(y = LONGEV, x = TREATMENT, fill = "Obs"), alpha = 0.5) + geom_point(data = partridge, aes(y = LONGEV, x = TREATMENT), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata) + geom_violin(aes(y = Value, x = THORAX, fill = "Model", group = THORAX, color = TREATMENT), alpha = 0.5) + geom_point(data = partridge, aes(y = LONGEV, x = THORAX, group = THORAX, color = TREATMENT))
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(partridge.mcmcpack), regex_pars = "TREATMENT|THORAX")
mcmc_areas(as.matrix(partridge.mcmcpack), regex_pars = "TREATMENT|THORAX")
partridge.mcmc = partridge.r2jags$BUGSoutput$sims.matrix # generate a model matrix newdata = partridge Xmat = model.matrix(~TREATMENT + THORAX, newdata) ## get median parameter estimates wch = c(which(colnames(partridge.mcmc) == "beta0"), grep("beta\\[", colnames(partridge.mcmc))) coefs = apply(partridge.mcmc[, wch], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = log10(partridge$LONGEV) - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
# Plot residuals against treatment level newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, TREATMENT, THORAX) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scales = "free")
# Standardized residuals sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
partridge.mcmc = partridge.r2jags$BUGSoutput$sims.matrix # generate a model matrix Xmat = model.matrix(~TREATMENT + THORAX, partridge) ## get median parameter estimates wch = c(which(colnames(partridge.mcmc) == "beta0"), grep("beta\\[", colnames(partridge.mcmc))) coefs = partridge.mcmc[, wch] fit = 10^(coefs %*% t(Xmat)) ## draw samples from this model yRep = sapply(1:nrow(partridge.mcmc), function(i) rnorm(nrow(partridge), fit[i, ], partridge.mcmc[i, "sigma"])) newdata = data.frame(TREATMENT = partridge$TREATMENT, THORAX = partridge$THORAX, yRep) %>% gather(key = Sample, value = Value, -TREATMENT, -THORAX) ggplot(newdata) + geom_violin(aes(y = Value, x = TREATMENT, fill = "Model"), alpha = 0.5) + geom_violin(data = partridge, aes(y = LONGEV, x = TREATMENT, fill = "Obs"), alpha = 0.5) + geom_point(data = partridge, aes(y = LONGEV, x = TREATMENT), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata) + geom_violin(aes(y = Value, x = THORAX, fill = "Model", group = THORAX, color = TREATMENT), alpha = 0.5) + geom_point(data = partridge, aes(y = LONGEV, x = THORAX, group = THORAX, color = TREATMENT))
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(partridge.r2jags$BUGSoutput$sims.matrix, regex_pars = "beta|sigma")
mcmc_areas(partridge.r2jags$BUGSoutput$sims.matrix, regex_pars = "beta|sigma")
partridge.mcmc = as.matrix(partridge.rstan) # generate a model matrix newdata = partridge Xmat = model.matrix(~TREATMENT + THORAX, newdata) ## get median parameter estimates wch = c(which(colnames(partridge.mcmc) == "beta0"), grep("beta\\[", colnames(partridge.mcmc))) coefs = apply(partridge.mcmc[, wch], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = log10(partridge$LONGEV) - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
# Plot residuals against treatment level newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, TREATMENT, THORAX) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scales = "free")
# Standardized residuals sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
partridge.mcmc = as.matrix(partridge.rstan) # generate a model matrix Xmat = model.matrix(~TREATMENT + THORAX, partridge) ## get median parameter estimates wch = c(which(colnames(partridge.mcmc) == "beta0"), grep("beta\\[", colnames(partridge.mcmc))) coefs = partridge.mcmc[, wch] fit = 10^(coefs %*% t(Xmat)) ## draw samples from this model yRep = sapply(1:nrow(partridge.mcmc), function(i) rnorm(nrow(partridge), fit[i, ], partridge.mcmc[i, "sigma"])) newdata = data.frame(TREATMENT = partridge$TREATMENT, THORAX = partridge$THORAX, yRep) %>% gather(key = Sample, value = Value, -TREATMENT, -THORAX) ggplot(newdata) + geom_violin(aes(y = Value, x = TREATMENT, fill = "Model"), alpha = 0.5) + geom_violin(data = partridge, aes(y = LONGEV, x = TREATMENT, fill = "Obs"), alpha = 0.5) + geom_point(data = partridge, aes(y = LONGEV, x = TREATMENT), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata) + geom_violin(aes(y = Value, x = THORAX, fill = "Model", group = THORAX, color = TREATMENT), alpha = 0.5) + geom_point(data = partridge, aes(y = LONGEV, x = THORAX, group = THORAX, color = TREATMENT))
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(partridge.rstan), regex_pars = "beta|sigma")
mcmc_areas(as.matrix(partridge.rstan), regex_pars = "beta|sigma")
partridge.mcmc = as.matrix(partridge.rstanarm) # generate a model matrix newdata = partridge Xmat = model.matrix(~TREATMENT + THORAX, newdata) ## get median parameter estimates wch = grep("Intercept|TREATMENT|THORAX", colnames(partridge.mcmc)) coefs = apply(partridge.mcmc[, wch], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = log10(partridge$LONGEV) - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
# Plot residuals against treatment level newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, TREATMENT, THORAX) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scales = "free")
# Standardized residuals sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
partridge.mcmc = as.matrix(partridge.rstanarm) # generate a model matrix Xmat = model.matrix(~TREATMENT + THORAX, partridge) ## get median parameter estimates wch = grep("Intercept|TREATMENT|THORAX", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] fit = 10^(coefs %*% t(Xmat)) ## draw samples from this model yRep = sapply(1:nrow(partridge.mcmc), function(i) rnorm(nrow(partridge), fit[i, ], partridge.mcmc[i, "sigma"])) newdata = data.frame(TREATMENT = partridge$TREATMENT, THORAX = partridge$THORAX, yRep) %>% gather(key = Sample, value = Value, -TREATMENT, -THORAX) ggplot(newdata) + geom_violin(aes(y = Value, x = TREATMENT, fill = "Model"), alpha = 0.5) + geom_violin(data = partridge, aes(y = LONGEV, x = TREATMENT, fill = "Obs"), alpha = 0.5) + geom_point(data = partridge, aes(y = LONGEV, x = TREATMENT), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata) + geom_violin(aes(y = Value, x = THORAX, fill = "Model", group = THORAX, color = TREATMENT), alpha = 0.5) + geom_point(data = partridge, aes(y = LONGEV, x = THORAX, group = THORAX, color = TREATMENT))
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(partridge.rstanarm), regex_pars = "Intercept|TREATMENT|THORAX|sigma")
mcmc_areas(as.matrix(partridge.rstanarm), regex_pars = "Intercept|TREATMENT|THORAX|sigma")
partridge.mcmc = as.matrix(partridge.brm) # generate a model matrix newdata = partridge Xmat = model.matrix(~TREATMENT + THORAX, newdata) ## get median parameter estimates wch = grep("b_", colnames(partridge.mcmc)) coefs = apply(partridge.mcmc[, wch], 2, median) fit = as.vector(coefs %*% t(Xmat)) resid = log10(partridge$LONGEV) - fit ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
# Plot residuals against treatment level newdata = newdata %>% cbind(fit, resid) newdata.melt = newdata %>% gather(key = Pred, value = Value, TREATMENT, THORAX) ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scales = "free")
# Standardized residuals sresid = resid/sd(resid) ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
partridge.mcmc = as.matrix(partridge.brm) # generate a model matrix Xmat = model.matrix(~TREATMENT + THORAX, partridge) ## get median parameter estimates wch = grep("b_", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] fit = 10^(coefs %*% t(Xmat)) ## draw samples from this model yRep = sapply(1:nrow(partridge.mcmc), function(i) rnorm(nrow(partridge), fit[i, ], partridge.mcmc[i, "sigma"])) newdata = data.frame(TREATMENT = partridge$TREATMENT, THORAX = partridge$THORAX, yRep) %>% gather(key = Sample, value = Value, -TREATMENT, -THORAX) ggplot(newdata) + geom_violin(aes(y = Value, x = TREATMENT, fill = "Model"), alpha = 0.5) + geom_violin(data = partridge, aes(y = LONGEV, x = TREATMENT, fill = "Obs"), alpha = 0.5) + geom_point(data = partridge, aes(y = LONGEV, x = TREATMENT), position = position_jitter(width = 0.1, height = 0), color = "black")
ggplot(newdata) + geom_violin(aes(y = Value, x = THORAX, fill = "Model", group = THORAX, color = TREATMENT), alpha = 0.5) + geom_point(data = partridge, aes(y = LONGEV, x = THORAX, group = THORAX, color = TREATMENT))
We can also explore the posteriors of each parameter.
library(bayesplot) mcmc_intervals(as.matrix(partridge.brm), regex_pars = "b_|sigma")
mcmc_areas(as.matrix(partridge.brm), regex_pars = "b_|sigma")
- Explore parameter estimates
library(MCMCpack) summary(partridge.mcmcpack)
Iterations = 1001:11000 Thinning interval = 1 Number of chains = 1 Sample size per chain = 10000 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE (Intercept) 0.79153 0.0862441 8.624e-04 8.643e-04 TREATMENTPreg1 0.02226 0.0238689 2.387e-04 2.387e-04 TREATMENTPreg8 0.03650 0.0241417 2.414e-04 2.414e-04 TREATMENTVirg1 -0.05404 0.0240102 2.401e-04 2.401e-04 TREATMENTVirg8 -0.18198 0.0241462 2.415e-04 2.415e-04 THORAX 1.19335 0.1009666 1.010e-03 1.006e-03 sigma2 0.00713 0.0009452 9.452e-06 9.942e-06 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% (Intercept) 0.625379 0.732743 0.791865 0.849579 0.961182 TREATMENTPreg1 -0.025297 0.006360 0.022323 0.038120 0.068806 TREATMENTPreg8 -0.010526 0.020301 0.036510 0.052720 0.083886 TREATMENTVirg1 -0.101254 -0.069767 -0.053919 -0.038007 -0.006695 TREATMENTVirg8 -0.230098 -0.197744 -0.181983 -0.165952 -0.134440 THORAX 0.992453 1.124938 1.193252 1.262005 1.389023 sigma2 0.005491 0.006455 0.007057 0.007723 0.009211
library(broom) tidyMCMC(partridge.mcmcpack, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 (Intercept) 0.791534756 0.0862440749 0.627282417 0.962326127 2 TREATMENTPreg1 0.022262435 0.0238688705 -0.025800765 0.068161673 3 TREATMENTPreg8 0.036498778 0.0241417477 -0.012028238 0.082087102 4 TREATMENTVirg1 -0.054043317 0.0240102293 -0.101607483 -0.007550831 5 TREATMENTVirg8 -0.181978738 0.0241461559 -0.231333679 -0.135891212 6 THORAX 1.193347232 0.1009665978 0.995580301 1.390356975 7 sigma2 0.007129683 0.0009451969 0.005343271 0.008958414
mcmcpvalue(partridge.mcmcpack[, "TREATMENTPreg1"])
[1] 0.3491
mcmcpvalue(partridge.mcmcpack[, "TREATMENTPreg8"])
[1] 0.133
mcmcpvalue(partridge.mcmcpack[, "TREATMENTVirg1"])
[1] 0.0271
mcmcpvalue(partridge.mcmcpack[, "TREATMENTVirg8"])
[1] 0
mcmcpvalue(partridge.mcmcpack[, "THORAX"])
[1] 0
wch = grep("TREATMENT|THORAX", colnames(partridge.mcmcpack)) mcmcpvalue(partridge.mcmcpack[, wch])
[1] 0
## Frequentist for comparison summary(lm(log10(LONGEV) ~ TREATMENT + THORAX, partridge))
Call: lm(formula = log10(LONGEV) ~ TREATMENT + THORAX, data = partridge) Residuals: Min 1Q Median 3Q Max -0.226736 -0.058445 -0.003469 0.059961 0.170389 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.79095 0.08443 9.368 5.89e-16 *** TREATMENTPreg1 0.02260 0.02368 0.954 0.3419 TREATMENTPreg8 0.03648 0.02385 1.530 0.1287 TREATMENTVirg1 -0.05381 0.02366 -2.275 0.0247 * TREATMENTVirg8 -0.18165 0.02392 -7.592 7.79e-12 *** THORAX 1.19385 0.09900 12.060 < 2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.08364 on 119 degrees of freedom Multiple R-squared: 0.7055, Adjusted R-squared: 0.6932 F-statistic: 57.02 on 5 and 119 DF, p-value: < 2.2e-16
print(partridge.r2jags)
Inference for Bugs model at "5", fit using jags, 3 chains, each with 50000 iterations (first 3000 discarded), n.thin = 10 n.sims = 14100 iterations saved mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff beta[1] 0.022 0.024 -0.025 0.006 0.022 0.039 0.069 1.001 5000 beta[2] 0.037 0.024 -0.010 0.020 0.037 0.053 0.083 1.001 11000 beta[3] -0.054 0.024 -0.101 -0.070 -0.054 -0.038 -0.007 1.001 14000 beta[4] -0.182 0.024 -0.229 -0.198 -0.182 -0.166 -0.134 1.001 6500 beta[5] 1.194 0.100 0.995 1.128 1.194 1.262 1.389 1.001 7500 beta0 0.791 0.085 0.626 0.733 0.790 0.848 0.960 1.001 6800 sigma 0.085 0.006 0.074 0.081 0.084 0.088 0.096 1.001 14000 deviance -264.409 3.899 -269.889 -267.273 -265.103 -262.305 -254.937 1.002 2600 For each parameter, n.eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor (at convergence, Rhat=1). DIC info (using the rule, pD = var(deviance)/2) pD = 7.6 and DIC = -256.8 DIC is an estimate of expected predictive error (lower deviance is better).
library(broom) tidyMCMC(partridge.r2jags$BUGSoutput$sims.matrix, conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 beta[1] 0.02222087 0.024107038 -0.02387136 7.028915e-02 2 beta[2] 0.03655562 0.024131694 -0.01101267 8.252448e-02 3 beta[3] -0.05390009 0.023948614 -0.09989605 -6.049349e-03 4 beta[4] -0.18170503 0.024261555 -0.22861460 -1.338749e-01 5 beta[5] 1.19380034 0.100331683 1.00121995 1.392995e+00 6 beta0 0.79102584 0.085462563 0.62940375 9.630549e-01 7 deviance -264.40914257 3.898646906 -270.71869433 -2.567381e+02 8 sigma 0.08453005 0.005586101 0.07400793 9.562832e-02
partridge.mcmc = partridge.r2jags$BUGSoutput$sims.matrix mcmcpvalue(partridge.mcmc[, "beta[1]"])
[1] 0.3587234
mcmcpvalue(partridge.mcmc[, "beta[2]"])
[1] 0.1301418
mcmcpvalue(partridge.mcmc[, "beta[3]"])
[1] 0.02468085
mcmcpvalue(partridge.mcmc[, "beta[4]"])
[1] 0
mcmcpvalue(partridge.mcmc[, "beta[5]"])
[1] 0
wch = grep("beta\\[", colnames(partridge.mcmc)) mcmcpvalue(partridge.mcmc[, wch])
[1] 0
# summary(partridge.rstan) library(broom) partridge.mcmc = as.matrix(partridge.rstan) tidyMCMC(partridge.rstan, conf.int = TRUE, conf.method = "HPDinterval", pars = c("beta0", "beta", "sigma"), ess = TRUE, rhat = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 beta0 0.79128931 0.086061774 0.62566968 0.961482006 1.000007 5827 2 beta[1] 0.02299254 0.024083810 -0.02571685 0.067810084 1.000434 5184 3 beta[2] 0.03628671 0.024376240 -0.01049910 0.083787879 1.000117 5274 4 beta[3] -0.05368096 0.023662283 -0.10081880 -0.009464246 1.000751 5262 5 beta[4] -0.18108684 0.024518494 -0.22869622 -0.132604319 1.000599 5420 6 beta[5] 1.19318572 0.100771538 1.00337859 1.396651732 1.000058 6383 7 sigma 0.08455309 0.005593207 0.07383390 0.095473807 1.000092 6376
mcmcpvalue(partridge.mcmc[, "beta[1]"])
[1] 0.3402963
mcmcpvalue(partridge.mcmc[, "beta[2]"])
[1] 0.1362963
mcmcpvalue(partridge.mcmc[, "beta[3]"])
[1] 0.02237037
mcmcpvalue(partridge.mcmc[, "beta[4]"])
[1] 0
mcmcpvalue(partridge.mcmc[, "beta[5]"])
[1] 0
wch = grep("beta\\[", colnames(partridge.mcmc)) mcmcpvalue(partridge.mcmc[, wch])
[1] 0
summary(partridge.rstanarm)
Model Info: function: stan_glm family: gaussian [identity] formula: log10(LONGEV) ~ TREATMENT + THORAX algorithm: sampling priors: see help('prior_summary') sample: 6750 (posterior sample size) num obs: 125 Estimates: mean sd 2.5% 25% 50% 75% 97.5% (Intercept) 0.8 0.1 0.6 0.7 0.8 0.9 1.0 TREATMENTPreg1 0.0 0.0 0.0 0.0 0.0 0.0 0.1 TREATMENTPreg8 0.0 0.0 0.0 0.0 0.0 0.1 0.1 TREATMENTVirg1 -0.1 0.0 -0.1 -0.1 -0.1 0.0 0.0 TREATMENTVirg8 -0.2 0.0 -0.2 -0.2 -0.2 -0.2 -0.1 THORAX 1.2 0.1 1.0 1.1 1.2 1.3 1.4 sigma 0.1 0.0 0.1 0.1 0.1 0.1 0.1 mean_PPD 1.7 0.0 1.7 1.7 1.7 1.7 1.8 log-posterior 122.3 1.9 117.6 121.3 122.6 123.7 125.0 Diagnostics: mcse Rhat n_eff (Intercept) 0.0 1.0 5726 TREATMENTPreg1 0.0 1.0 4536 TREATMENTPreg8 0.0 1.0 4695 TREATMENTVirg1 0.0 1.0 4830 TREATMENTVirg8 0.0 1.0 4579 THORAX 0.0 1.0 5922 sigma 0.0 1.0 6149 mean_PPD 0.0 1.0 6728 log-posterior 0.0 1.0 4744 For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
library(broom) partridge.mcmc = as.matrix(partridge.rstanarm) tidyMCMC(partridge.rstanarm$stanfit, conf.int = TRUE, conf.method = "HPDinterval", ess = TRUE, rhat = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 (Intercept) 0.79293103 0.085741974 0.63442374 0.967936705 0.9998648 5726 2 TREATMENTPreg1 0.02234996 0.024074877 -0.02338140 0.070865192 0.9999204 4536 3 TREATMENTPreg8 0.03641433 0.024403496 -0.01013371 0.086076915 1.0000061 4695 4 TREATMENTVirg1 -0.05403215 0.023548978 -0.09925124 -0.006906492 0.9997467 4830 5 TREATMENTVirg8 -0.18226103 0.023984235 -0.22966723 -0.136666158 1.0002442 4579 6 THORAX 1.19165812 0.100358278 0.98759586 1.382129994 0.9999683 5922 7 sigma 0.08462483 0.005556366 0.07493118 0.096523655 0.9996408 6149 8 mean_PPD 1.73567551 0.010668382 1.71548076 1.757712481 1.0000841 6728 9 log-posterior 122.29436093 1.937479394 118.54507644 125.359663348 0.9998040 4744
mcmcpvalue(partridge.mcmc[, "TREATMENTPreg1"])
[1] 0.354963
mcmcpvalue(partridge.mcmc[, "TREATMENTPreg8"])
[1] 0.1346667
mcmcpvalue(partridge.mcmc[, "TREATMENTVirg1"])
[1] 0.02222222
mcmcpvalue(partridge.mcmc[, "TREATMENTVirg8"])
[1] 0
mcmcpvalue(partridge.mcmc[, "THORAX"])
[1] 0
wch = grep("TREATMENT|THORAX", colnames(partridge.mcmc)) mcmcpvalue(partridge.mcmc[, wch])
[1] 0
summary(partridge.brm)
Family: gaussian(identity) Formula: log10(LONGEV) ~ TREATMENT + THORAX Data: partridge (Number of observations: 125) Samples: 3 chains, each with iter = 5000; warmup = 500; thin = 2; total post-warmup samples = 6750 ICs: LOO = NA; WAIC = NA; R2 = NA Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 0.79 0.09 0.63 0.96 6165 1 TREATMENTPreg1 0.02 0.02 -0.02 0.07 5774 1 TREATMENTPreg8 0.04 0.02 -0.01 0.08 5790 1 TREATMENTVirg1 -0.05 0.02 -0.10 -0.01 5633 1 TREATMENTVirg8 -0.18 0.02 -0.23 -0.13 5810 1 THORAX 1.19 0.10 1.00 1.39 6155 1 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 0.08 0.01 0.07 0.1 6265 1 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
library(broom) partridge.mcmc = as.matrix(partridge.brm) tidyMCMC(partridge.brm$fit, conf.int = TRUE, conf.method = "HPDinterval", ess = TRUE, rhat = TRUE)
term estimate std.error conf.low conf.high rhat ess 1 b_Intercept 0.79087403 0.085559712 0.628005663 0.963051668 0.9999102 6165 2 b_TREATMENTPreg1 0.02240460 0.024011041 -0.024874580 0.068487474 0.9997021 5774 3 b_TREATMENTPreg8 0.03620256 0.023843389 -0.008463557 0.084372278 0.9999173 5790 4 b_TREATMENTVirg1 -0.05404743 0.023815863 -0.100910410 -0.008313408 0.9997183 5633 5 b_TREATMENTVirg8 -0.18198728 0.023869345 -0.228420724 -0.135256737 0.9997294 5810 6 b_THORAX 1.19414513 0.100754965 0.995884101 1.389117902 0.9998769 6155 7 sigma 0.08463568 0.005584061 0.074166370 0.095825344 1.0004425 6265
mcmcpvalue(partridge.mcmc[, "b_TREATMENTPreg1"])
[1] 0.3511111
mcmcpvalue(partridge.mcmc[, "b_TREATMENTPreg8"])
[1] 0.128
mcmcpvalue(partridge.mcmc[, "b_TREATMENTVirg1"])
[1] 0.024
mcmcpvalue(partridge.mcmc[, "b_TREATMENTVirg8"])
[1] 0
wch = grep("b_TREATMENT|b_THORAX", colnames(partridge.mcmc)) mcmcpvalue(partridge.mcmc[, wch])
[1] 0
- Generate graphical summaries
library(MCMCpack) partridge.mcmc = partridge.mcmcpack ## Calculate the fitted values newdata = with(partridge, expand.grid(TREATMENT = levels(TREATMENT), THORAX = seq(min(THORAX), max(THORAX), len = 100))) Xmat = model.matrix(~TREATMENT + THORAX, newdata) wch = grep("Intercept|TREATMENT|THORAX", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] fit = 10^(coefs %*% t(Xmat)) newdata.95 = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.level = 0.95, conf.method = "HPDinterval")) newdata.80 = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.level = 0.8, conf.method = "HPDinterval")) newdata.95 %>% head
TREATMENT THORAX estimate std.error conf.low conf.high 1 None 0.6400000 35.98025 2.167746 31.85706 40.31875 2 Preg1 0.6400000 37.86609 2.167869 33.72704 42.22615 3 Preg8 0.6400000 39.12340 2.159726 34.98930 43.45699 4 Virg1 0.6400000 31.77057 1.920246 28.06600 35.52507 5 Virg8 0.6400000 23.65556 1.281681 21.25357 26.25122 6 None 0.6430303 36.27993 2.166315 32.15499 40.61696
ggplot(newdata.95, aes(y = estimate, x = THORAX, color = TREATMENT)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = TREATMENT), color = NA, alpha = 0.2) + geom_ribbon(data = newdata.80, aes(ymin = conf.low, ymax = conf.high, fill = TREATMENT), color = NA, alpha = 0.4) + geom_line() + scale_y_continuous("Male fruitfly longevity (days)") + scale_fill_discrete("Substrate type", breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners")) + scale_color_discrete("Substrate type", breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners")) + theme_classic()
# And now with partial residuals fdata = rdata = partridge fMat = rMat = model.matrix(~TREATMENT + THORAX, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(log10(partridge$LONGEV) - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = 10^(resid + fit)) br <- log10(pretty(partridge$LONGEV)) ggplot(newdata.95, aes(y = estimate, x = THORAX, linetype = TREATMENT)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "grey", alpha = 0.2) + geom_point(data = rdata, aes(y = partial.resid, shape = TREATMENT, fill = TREATMENT)) + scale_y_continuous("Longevity (days)") + scale_x_continuous("Thorax length (mm)") + scale_linetype_manual("Substrate type", values = 1:5, breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + scale_shape_manual("Substrate type", values = c(23, 22, 21, 22, 21), breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + scale_fill_manual("Substrate type", values = c("black", "black", "black", "white", "white"), breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + theme_classic() + theme(legend.justification = c(0, 1), legend.position = c(0.1, 1), axis.title.y = element_text(vjust = 2, size = rel(1.25)), axis.title.x = element_text(vjust = -2, size = rel(1.25)), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))
partridge.mcmc = partridge.r2jags$BUGSoutput$sims.matrix ## Calculate the fitted values newdata = with(partridge, expand.grid(TREATMENT = levels(TREATMENT), THORAX = seq(min(THORAX), max(THORAX), len = 100))) Xmat = model.matrix(~TREATMENT + THORAX, newdata) wch = c(which(colnames(partridge.mcmc) == "beta0"), grep("beta\\[", colnames(partridge.mcmc))) coefs = partridge.mcmc[, wch] fit = 10^(coefs %*% t(Xmat)) newdata.95 = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.level = 0.95, conf.method = "HPDinterval")) newdata.80 = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.level = 0.8, conf.method = "HPDinterval")) newdata.95 %>% head
TREATMENT THORAX estimate std.error conf.low conf.high 1 None 0.6400000 35.96099 2.150108 31.84513 40.20482 2 Preg1 0.6400000 37.84543 2.204559 33.43408 42.08953 3 Preg8 0.6400000 39.10798 2.144250 34.96804 43.38181 4 Virg1 0.6400000 31.76434 1.908763 28.00014 35.46843 5 Virg8 0.6400000 23.65820 1.274451 21.22775 26.21792 6 None 0.6430303 36.26065 2.148823 32.15733 40.50511
ggplot(newdata.95, aes(y = estimate, x = THORAX, color = TREATMENT)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = TREATMENT), color = NA, alpha = 0.2) + geom_ribbon(data = newdata.80, aes(ymin = conf.low, ymax = conf.high, fill = TREATMENT), color = NA, alpha = 0.4) + geom_line() + scale_y_continuous("Male fruitfly longevity (days)") + scale_fill_discrete("Substrate type", breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners")) + scale_color_discrete("Substrate type", breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners")) + theme_classic()
# And now with partial residuals fdata = rdata = partridge fMat = rMat = model.matrix(~TREATMENT + THORAX, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(log10(partridge$LONGEV) - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = 10^(resid + fit)) br <- log10(pretty(partridge$LONGEV)) ggplot(newdata.95, aes(y = estimate, x = THORAX, linetype = TREATMENT)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "grey", alpha = 0.2) + geom_point(data = rdata, aes(y = partial.resid, shape = TREATMENT, fill = TREATMENT)) + scale_y_continuous("Longevity (days)") + scale_x_continuous("Thorax length (mm)") + scale_linetype_manual("Substrate type", values = 1:5, breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + scale_shape_manual("Substrate type", values = c(23, 22, 21, 22, 21), breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + scale_fill_manual("Substrate type", values = c("black", "black", "black", "white", "white"), breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + theme_classic() + theme(legend.justification = c(0, 1), legend.position = c(0.1, 1), axis.title.y = element_text(vjust = 2, size = rel(1.25)), axis.title.x = element_text(vjust = -2, size = rel(1.25)), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))
partridge.mcmc = as.matrix(partridge.rstan) ## Calculate the fitted values newdata = with(partridge, expand.grid(TREATMENT = levels(TREATMENT), THORAX = seq(min(THORAX), max(THORAX), len = 100))) Xmat = model.matrix(~TREATMENT + THORAX, newdata) wch = c(which(colnames(partridge.mcmc) == "beta0"), grep("beta\\[", colnames(partridge.mcmc))) coefs = partridge.mcmc[, wch] fit = 10^(coefs %*% t(Xmat)) newdata.95 = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.level = 0.95, conf.method = "HPDinterval")) newdata.80 = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.level = 0.8, conf.method = "HPDinterval")) newdata.95 %>% head
TREATMENT THORAX estimate std.error conf.low conf.high 1 None 0.6400000 35.95150 2.171163 32.04491 40.56730 2 Preg1 0.6400000 37.90240 2.225653 33.73523 42.38996 3 Preg8 0.6400000 39.07261 2.150087 34.97066 43.26534 4 Virg1 0.6400000 31.77152 1.919648 28.18076 35.68269 5 Virg8 0.6400000 23.68545 1.287215 21.32823 26.26502 6 None 0.6430303 36.25091 2.169872 32.34034 40.84279
ggplot(newdata.95, aes(y = estimate, x = THORAX, color = TREATMENT)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = TREATMENT), color = NA, alpha = 0.2) + geom_ribbon(data = newdata.80, aes(ymin = conf.low, ymax = conf.high, fill = TREATMENT), color = NA, alpha = 0.4) + geom_line() + scale_y_continuous("Male fruitfly longevity (days)") + scale_fill_discrete("Substrate type", breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners")) + scale_color_discrete("Substrate type", breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners")) + theme_classic()
# And now with partial residuals fdata = rdata = partridge fMat = rMat = model.matrix(~TREATMENT + THORAX, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(log10(partridge$LONGEV) - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = 10^(resid + fit)) br <- log10(pretty(partridge$LONGEV)) ggplot(newdata.95, aes(y = estimate, x = THORAX, linetype = TREATMENT)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "grey", alpha = 0.2) + geom_point(data = rdata, aes(y = partial.resid, shape = TREATMENT, fill = TREATMENT)) + scale_y_continuous("Longevity (days)") + scale_x_continuous("Thorax length (mm)") + scale_linetype_manual("Substrate type", values = 1:5, breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + scale_shape_manual("Substrate type", values = c(23, 22, 21, 22, 21), breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + scale_fill_manual("Substrate type", values = c("black", "black", "black", "white", "white"), breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + theme_classic() + theme(legend.justification = c(0, 1), legend.position = c(0.1, 1), axis.title.y = element_text(vjust = 2, size = rel(1.25)), axis.title.x = element_text(vjust = -2, size = rel(1.25)), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))
partridge.mcmc = as.matrix(partridge.rstanarm) ## Calculate the fitted values newdata = with(partridge, expand.grid(TREATMENT = levels(TREATMENT), THORAX = seq(min(THORAX), max(THORAX), len = 100))) Xmat = model.matrix(~TREATMENT + THORAX, newdata) wch = grep("Intercept|TREATMENT|THORAX", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] fit = 10^(coefs %*% t(Xmat)) newdata.95 = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.level = 0.95, conf.method = "HPDinterval")) newdata.80 = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.level = 0.8, conf.method = "HPDinterval")) newdata.95 %>% head
TREATMENT THORAX estimate std.error conf.low conf.high 1 None 0.6400000 36.00583 2.159989 31.81425 40.09867 2 Preg1 0.6400000 37.90180 2.179165 33.48592 42.07839 3 Preg8 0.6400000 39.14320 2.141342 35.04090 43.33835 4 Virg1 0.6400000 31.79228 1.886028 27.94081 35.30833 5 Virg8 0.6400000 23.65624 1.258929 21.30229 26.22091 6 None 0.6430303 36.30531 2.158638 32.06754 40.35430
ggplot(newdata.95, aes(y = estimate, x = THORAX, color = TREATMENT)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = TREATMENT), color = NA, alpha = 0.2) + geom_ribbon(data = newdata.80, aes(ymin = conf.low, ymax = conf.high, fill = TREATMENT), color = NA, alpha = 0.4) + geom_line() + scale_y_continuous("Male fruitfly longevity (days)") + scale_fill_discrete("Substrate type", breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners")) + scale_color_discrete("Substrate type", breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners")) + theme_classic()
# And now with partial residuals fdata = rdata = partridge fMat = rMat = model.matrix(~TREATMENT + THORAX, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(log10(partridge$LONGEV) - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = 10^(resid + fit)) br <- log10(pretty(partridge$LONGEV)) ggplot(newdata.95, aes(y = estimate, x = THORAX, linetype = TREATMENT)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "grey", alpha = 0.2) + geom_point(data = rdata, aes(y = partial.resid, shape = TREATMENT, fill = TREATMENT)) + scale_y_continuous("Longevity (days)") + scale_x_continuous("Thorax length (mm)") + scale_linetype_manual("Substrate type", values = 1:5, breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + scale_shape_manual("Substrate type", values = c(23, 22, 21, 22, 21), breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + scale_fill_manual("Substrate type", values = c("black", "black", "black", "white", "white"), breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + theme_classic() + theme(legend.justification = c(0, 1), legend.position = c(0.1, 1), axis.title.y = element_text(vjust = 2, size = rel(1.25)), axis.title.x = element_text(vjust = -2, size = rel(1.25)), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))
partridge.mcmc = as.matrix(partridge.brm) ## Calculate the fitted values newdata = with(partridge, expand.grid(TREATMENT = levels(TREATMENT), THORAX = seq(min(THORAX), max(THORAX), len = 100))) Xmat = model.matrix(~TREATMENT + THORAX, newdata) wch = grep("b_", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] fit = 10^(coefs %*% t(Xmat)) newdata.95 = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.level = 0.95, conf.method = "HPDinterval")) newdata.80 = newdata %>% cbind(tidyMCMC(fit, conf.int = TRUE, conf.level = 0.8, conf.method = "HPDinterval")) newdata.95 %>% head
TREATMENT THORAX estimate std.error conf.low conf.high 1 None 0.6400000 35.96536 2.125997 31.82643 40.13090 2 Preg1 0.6400000 37.86659 2.190934 33.72999 42.27928 3 Preg8 0.6400000 39.08282 2.151109 34.95758 43.39081 4 Virg1 0.6400000 31.75761 1.891817 27.93170 35.27973 5 Virg8 0.6400000 23.64734 1.288577 21.28669 26.30815 6 None 0.6430303 36.26515 2.124425 32.13702 40.42999
ggplot(newdata.95, aes(y = estimate, x = THORAX, color = TREATMENT)) + geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = TREATMENT), color = NA, alpha = 0.2) + geom_ribbon(data = newdata.80, aes(ymin = conf.low, ymax = conf.high, fill = TREATMENT), color = NA, alpha = 0.4) + geom_line() + scale_y_continuous("Male fruitfly longevity (days)") + scale_fill_discrete("Substrate type", breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners")) + scale_color_discrete("Substrate type", breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners")) + theme_classic()
# And now with partial residuals fdata = rdata = partridge fMat = rMat = model.matrix(~TREATMENT + THORAX, fdata) fit = as.vector(apply(coefs, 2, median) %*% t(fMat)) resid = as.vector(log10(partridge$LONGEV) - apply(coefs, 2, median) %*% t(rMat)) rdata = rdata %>% mutate(partial.resid = 10^(resid + fit)) br <- log10(pretty(partridge$LONGEV)) ggplot(newdata.95, aes(y = estimate, x = THORAX, linetype = TREATMENT)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "grey", alpha = 0.2) + geom_point(data = rdata, aes(y = partial.resid, shape = TREATMENT, fill = TREATMENT)) + scale_y_continuous("Longevity (days)") + scale_x_continuous("Thorax length (mm)") + scale_linetype_manual("Substrate type", values = 1:5, breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + scale_shape_manual("Substrate type", values = c(23, 22, 21, 22, 21), breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + scale_fill_manual("Substrate type", values = c("black", "black", "black", "white", "white"), breaks = c("None", "Preg1", "Preg8", "Virg1", "Virg8"), labels = c("No\npartners", "1 pregnant\npartner", "8 pregnant\npartners", "1 virgin\npartner", "8 virgin\npartners"), guide = guide_legend(ncol = 2)) + theme_classic() + theme(legend.justification = c(0, 1), legend.position = c(0.1, 1), axis.title.y = element_text(vjust = 2, size = rel(1.25)), axis.title.x = element_text(vjust = -2, size = rel(1.25)), plot.margin = unit(c(0.5, 0.5, 2, 2), "lines"))
- We have established that the male fruitfly longevity varies across the treatments.
The effects model directly compared each of the partner type to the control treatment with no partners.
We might also be interested in describing the difference in male fruitfly longevity between other
combinations of partner types. Lets compare each treatment group to each other in a pairwise manner.
Note, as we elected to fit the model on logarithmic (base ten) transformed longevity, we have to be a
bit careful in how we process the results. Without any back transforms, the comparisons will be on a log
scale. If we wish to express them on the natural response scale (difference in days, we will need to do a
bit more manipulation and remember our log laws ($log(A) - log(B) = log(A)/log(B)$). Thus if we back transform
the contrasts, we will get results that reflect the factor change.
library(MCMCpack) partridge.mcmc = partridge.mcmcpack wch = grep("Intercept|TREATMENT|THORAX", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] newdata = with(partridge, expand.grid(TREATMENT = levels(TREATMENT), THORAX = mean(THORAX, na.rm = TRUE))) # A Tukeys contrast matrix library(multcomp) tuk.mat <- contrMat(n = table(newdata$TREATMENT), type = "Tukey") Xmat <- model.matrix(~TREATMENT + THORAX, data = newdata) pairwise.mat <- tuk.mat %*% Xmat pairwise.mat
(Intercept) TREATMENTPreg1 TREATMENTPreg8 TREATMENTVirg1 TREATMENTVirg8 THORAX Preg1 - None 0 1 0 0 0 0 Preg8 - None 0 0 1 0 0 0 Virg1 - None 0 0 0 1 0 0 Virg8 - None 0 0 0 0 1 0 Preg8 - Preg1 0 -1 1 0 0 0 Virg1 - Preg1 0 -1 0 1 0 0 Virg8 - Preg1 0 -1 0 0 1 0 Virg1 - Preg8 0 0 -1 1 0 0 Virg8 - Preg8 0 0 -1 0 1 0 Virg8 - Virg1 0 0 0 -1 1 0
mcmc_areas(coefs %*% t(pairwise.mat))
(comps = tidyMCMC(coefs %*% t(pairwise.mat), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Preg1 - None 0.02226244 0.02386887 -0.02580077 0.068161673 2 Preg8 - None 0.03649878 0.02414175 -0.01202824 0.082087102 3 Virg1 - None -0.05404332 0.02401023 -0.10160748 -0.007550831 4 Virg8 - None -0.18197874 0.02414616 -0.23133368 -0.135891212 5 Preg8 - Preg1 0.01423634 0.02396522 -0.03512377 0.059624253 6 Virg1 - Preg1 -0.07630575 0.02391487 -0.12156718 -0.028062534 7 Virg8 - Preg1 -0.20424117 0.02371341 -0.25045812 -0.156732472 8 Virg1 - Preg8 -0.09054210 0.02433891 -0.13738602 -0.041521436 9 Virg8 - Preg8 -0.21847752 0.02396819 -0.26613085 -0.171726858 10 Virg8 - Virg1 -0.12793542 0.02426936 -0.17484823 -0.080668677
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
## With back transformed mcmc_areas(10^(coefs %*% t(pairwise.mat)))
(comps1 = tidyMCMC(10^(coefs %*% t(pairwise.mat)), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Preg1 - None 1.0541883 0.05796784 0.9423218 1.1699348 2 Preg8 - None 1.0893557 0.06060224 0.9684462 1.2032111 3 Virg1 - None 0.8843419 0.04891830 0.7906505 0.9820142 4 Virg8 - None 0.6587068 0.03662892 0.5855194 0.7297431 5 Preg8 - Preg1 1.0348974 0.05712792 0.9223085 1.1471607 6 Virg1 - Preg1 0.8401419 0.04629954 0.7512269 0.9324124 7 Virg8 - Preg1 0.6257574 0.03418221 0.5595466 0.6945545 8 Virg1 - Preg8 0.8130922 0.04560181 0.7219419 0.9014199 9 Virg8 - Preg8 0.6055967 0.03343095 0.5375230 0.6689410 10 Virg8 - Virg1 0.7460068 0.04173700 0.6685775 0.8304841
ggplot(comps1, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 1, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
library(MCMCpack) partridge.mcmc = partridge.r2jags$BUGSoutput$sims.matrix wch = c(which(colnames(partridge.mcmc) == "beta0"), grep("beta\\[", colnames(partridge.mcmc))) coefs = partridge.mcmc[, wch] newdata = with(partridge, expand.grid(TREATMENT = levels(TREATMENT), THORAX = mean(THORAX, na.rm = TRUE))) # A Tukeys contrast matrix library(multcomp) tuk.mat <- contrMat(n = table(newdata$TREATMENT), type = "Tukey") Xmat <- model.matrix(~TREATMENT + THORAX, data = newdata) pairwise.mat <- tuk.mat %*% Xmat pairwise.mat
(Intercept) TREATMENTPreg1 TREATMENTPreg8 TREATMENTVirg1 TREATMENTVirg8 THORAX Preg1 - None 0 1 0 0 0 0 Preg8 - None 0 0 1 0 0 0 Virg1 - None 0 0 0 1 0 0 Virg8 - None 0 0 0 0 1 0 Preg8 - Preg1 0 -1 1 0 0 0 Virg1 - Preg1 0 -1 0 1 0 0 Virg8 - Preg1 0 -1 0 0 1 0 Virg1 - Preg8 0 0 -1 1 0 0 Virg8 - Preg8 0 0 -1 0 1 0 Virg8 - Virg1 0 0 0 -1 1 0
mcmc_areas(coefs %*% t(pairwise.mat))
(comps = tidyMCMC(coefs %*% t(pairwise.mat), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Preg1 - None 0.02222087 0.02410704 -0.02387136 0.070289150 2 Preg8 - None 0.03655562 0.02413169 -0.01101267 0.082524482 3 Virg1 - None -0.05390009 0.02394861 -0.09989605 -0.006049349 4 Virg8 - None -0.18170503 0.02426155 -0.22861460 -0.133874873 5 Preg8 - Preg1 0.01433475 0.02402038 -0.03409306 0.060470474 6 Virg1 - Preg1 -0.07612096 0.02380513 -0.12371581 -0.030310117 7 Virg8 - Preg1 -0.20392590 0.02419627 -0.25216901 -0.157132792 8 Virg1 - Preg8 -0.09045571 0.02415687 -0.13813097 -0.043683249 9 Virg8 - Preg8 -0.21826065 0.02403122 -0.26479914 -0.171535168 10 Virg8 - Virg1 -0.12780494 0.02419583 -0.17496673 -0.080363292
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
## With back transformed mcmc_areas(10^(coefs %*% t(pairwise.mat)))
(comps1 = tidyMCMC(10^(coefs %*% t(pairwise.mat)), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Preg1 - None 1.0541196 0.05855850 0.9400143 1.1681953 2 Preg8 - None 1.0894963 0.06054854 0.9744850 1.2087235 3 Virg1 - None 0.8846269 0.04881624 0.7922053 0.9837108 4 Virg8 - None 0.6591326 0.03686490 0.5880233 0.7316295 5 Preg8 - Preg1 1.0351401 0.05731642 0.9245000 1.1493981 6 Virg1 - Preg1 0.8404883 0.04613413 0.7521149 0.9325881 7 Virg8 - Preg1 0.6262506 0.03492673 0.5595398 0.6964135 8 Virg1 - Preg8 0.8132350 0.04527198 0.7270732 0.9038008 9 Virg8 - Preg8 0.6059049 0.03357668 0.5435016 0.6736973 10 Virg8 - Virg1 0.7462239 0.04162240 0.6678105 0.8304426
ggplot(comps1, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 1, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
library(MCMCpack) partridge.mcmc = as.matrix(partridge.rstan) Xmat = model.matrix(~TREATMENT + THORAX, newdata) wch = c(which(colnames(partridge.mcmc) == "beta0"), grep("beta\\[", colnames(partridge.mcmc))) coefs = partridge.mcmc[, wch] newdata = with(partridge, expand.grid(TREATMENT = levels(TREATMENT), THORAX = mean(THORAX, na.rm = TRUE))) # A Tukeys contrast matrix library(multcomp) tuk.mat <- contrMat(n = table(newdata$TREATMENT), type = "Tukey") Xmat <- model.matrix(~TREATMENT + THORAX, data = newdata) pairwise.mat <- tuk.mat %*% Xmat pairwise.mat
(Intercept) TREATMENTPreg1 TREATMENTPreg8 TREATMENTVirg1 TREATMENTVirg8 THORAX Preg1 - None 0 1 0 0 0 0 Preg8 - None 0 0 1 0 0 0 Virg1 - None 0 0 0 1 0 0 Virg8 - None 0 0 0 0 1 0 Preg8 - Preg1 0 -1 1 0 0 0 Virg1 - Preg1 0 -1 0 1 0 0 Virg8 - Preg1 0 -1 0 0 1 0 Virg1 - Preg8 0 0 -1 1 0 0 Virg8 - Preg8 0 0 -1 0 1 0 Virg8 - Virg1 0 0 0 -1 1 0
mcmc_areas(coefs %*% t(pairwise.mat))
(comps = tidyMCMC(coefs %*% t(pairwise.mat), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Preg1 - None 0.02299254 0.02408381 -0.02571685 0.067810084 2 Preg8 - None 0.03628671 0.02437624 -0.01049910 0.083787879 3 Virg1 - None -0.05368096 0.02366228 -0.10081880 -0.009464246 4 Virg8 - None -0.18108684 0.02451849 -0.22869622 -0.132604319 5 Preg8 - Preg1 0.01329417 0.02395326 -0.03264648 0.062360678 6 Virg1 - Preg1 -0.07667351 0.02383856 -0.12247183 -0.030528752 7 Virg8 - Preg1 -0.20407938 0.02424689 -0.25291361 -0.156602516 8 Virg1 - Preg8 -0.08996768 0.02416122 -0.13611621 -0.041840729 9 Virg8 - Preg8 -0.21737355 0.02391817 -0.26345342 -0.170901602 10 Virg8 - Virg1 -0.12740587 0.02428197 -0.17411665 -0.079437471
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
## With back transformed mcmc_areas(10^(coefs %*% t(pairwise.mat)))
(comps1 = tidyMCMC(10^(coefs %*% t(pairwise.mat)), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Preg1 - None 1.0559907 0.05858701 0.9383382 1.1646937 2 Preg8 - None 1.0888563 0.06114756 0.9758079 1.2124721 3 Virg1 - None 0.8850402 0.04819619 0.7928321 0.9784435 4 Virg8 - None 0.6600920 0.03724069 0.5898256 0.7359186 5 Preg8 - Preg1 1.0326541 0.05703137 0.9215931 1.1470933 6 Virg1 - Preg1 0.8394233 0.04615294 0.7542723 0.9321188 7 Virg8 - Preg1 0.6260333 0.03498437 0.5583422 0.6969889 8 Virg1 - Preg8 0.8141491 0.04529563 0.7309435 0.9081535 9 Virg8 - Preg8 0.6071349 0.03348412 0.5447600 0.6741852 10 Virg8 - Virg1 0.7469190 0.04185802 0.6680023 0.8307720
ggplot(comps1, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 1, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
library(MCMCpack) partridge.mcmc = as.matrix(partridge.rstanarm) Xmat = model.matrix(~TREATMENT + THORAX, newdata) wch = grep("Intercept|TREATMENT|THORAX", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] newdata = with(partridge, expand.grid(TREATMENT = levels(TREATMENT), THORAX = mean(THORAX, na.rm = TRUE))) # A Tukeys contrast matrix library(multcomp) tuk.mat <- contrMat(n = table(newdata$TREATMENT), type = "Tukey") Xmat <- model.matrix(~TREATMENT + THORAX, data = newdata) pairwise.mat <- tuk.mat %*% Xmat pairwise.mat
(Intercept) TREATMENTPreg1 TREATMENTPreg8 TREATMENTVirg1 TREATMENTVirg8 THORAX Preg1 - None 0 1 0 0 0 0 Preg8 - None 0 0 1 0 0 0 Virg1 - None 0 0 0 1 0 0 Virg8 - None 0 0 0 0 1 0 Preg8 - Preg1 0 -1 1 0 0 0 Virg1 - Preg1 0 -1 0 1 0 0 Virg8 - Preg1 0 -1 0 0 1 0 Virg1 - Preg8 0 0 -1 1 0 0 Virg8 - Preg8 0 0 -1 0 1 0 Virg8 - Virg1 0 0 0 -1 1 0
mcmc_areas(coefs %*% t(pairwise.mat))
(comps = tidyMCMC(coefs %*% t(pairwise.mat), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Preg1 - None 0.02234996 0.02407488 -0.02338140 0.070865192 2 Preg8 - None 0.03641433 0.02440350 -0.01013371 0.086076915 3 Virg1 - None -0.05403215 0.02354898 -0.09925124 -0.006906492 4 Virg8 - None -0.18226103 0.02398424 -0.22966723 -0.136666158 5 Preg8 - Preg1 0.01406437 0.02412565 -0.02968572 0.065601507 6 Virg1 - Preg1 -0.07638211 0.02392343 -0.12328881 -0.029989351 7 Virg8 - Preg1 -0.20461099 0.02415189 -0.25341139 -0.159071623 8 Virg1 - Preg8 -0.09044648 0.02433552 -0.13815607 -0.042642378 9 Virg8 - Preg8 -0.21867536 0.02387648 -0.26542574 -0.170323352 10 Virg8 - Virg1 -0.12822888 0.02393159 -0.17425325 -0.080769365
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
## With back transformed mcmc_areas(10^(coefs %*% t(pairwise.mat)))
(comps1 = tidyMCMC(10^(coefs %*% t(pairwise.mat)), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Preg1 - None 1.0544289 0.05852064 0.9411695 1.1700444 2 Preg8 - None 1.0891800 0.06122090 0.9669837 1.2075192 3 Virg1 - None 0.8843145 0.04804840 0.7933975 0.9816892 4 Virg8 - None 0.6582655 0.03637684 0.5892950 0.7300185 5 Preg8 - Preg1 1.0345108 0.05759814 0.9221479 1.1490921 6 Virg1 - Preg1 0.8399958 0.04635241 0.7520730 0.9324562 7 Virg8 - Preg1 0.6252592 0.03476928 0.5579414 0.6933115 8 Virg1 - Preg8 0.8132717 0.04565201 0.7240084 0.9027741 9 Virg8 - Preg8 0.6053146 0.03332360 0.5402115 0.6728358 10 Virg8 - Virg1 0.7454694 0.04105743 0.6646395 0.8252720
ggplot(comps1, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 1, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
library(MCMCpack) partridge.mcmc = as.matrix(partridge.brm) Xmat = model.matrix(~TREATMENT + THORAX, newdata) wch = grep("b_", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] newdata = with(partridge, expand.grid(TREATMENT = levels(TREATMENT), THORAX = mean(THORAX, na.rm = TRUE))) # A Tukeys contrast matrix library(multcomp) tuk.mat <- contrMat(n = table(newdata$TREATMENT), type = "Tukey") Xmat <- model.matrix(~TREATMENT + THORAX, data = newdata) pairwise.mat <- tuk.mat %*% Xmat pairwise.mat
(Intercept) TREATMENTPreg1 TREATMENTPreg8 TREATMENTVirg1 TREATMENTVirg8 THORAX Preg1 - None 0 1 0 0 0 0 Preg8 - None 0 0 1 0 0 0 Virg1 - None 0 0 0 1 0 0 Virg8 - None 0 0 0 0 1 0 Preg8 - Preg1 0 -1 1 0 0 0 Virg1 - Preg1 0 -1 0 1 0 0 Virg8 - Preg1 0 -1 0 0 1 0 Virg1 - Preg8 0 0 -1 1 0 0 Virg8 - Preg8 0 0 -1 0 1 0 Virg8 - Virg1 0 0 0 -1 1 0
mcmc_areas(coefs %*% t(pairwise.mat))
(comps = tidyMCMC(coefs %*% t(pairwise.mat), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Preg1 - None 0.02240460 0.02401104 -0.024874580 0.068487474 2 Preg8 - None 0.03620256 0.02384339 -0.008463557 0.084372278 3 Virg1 - None -0.05404743 0.02381586 -0.100910410 -0.008313408 4 Virg8 - None -0.18198728 0.02386935 -0.228420724 -0.135256737 5 Preg8 - Preg1 0.01379796 0.02421139 -0.033271153 0.061604483 6 Virg1 - Preg1 -0.07645203 0.02403209 -0.123225559 -0.028926763 7 Virg8 - Preg1 -0.20439189 0.02427000 -0.250323885 -0.155967121 8 Virg1 - Preg8 -0.09024999 0.02394566 -0.136639855 -0.043432356 9 Virg8 - Preg8 -0.21818985 0.02415905 -0.265948399 -0.171503685 10 Virg8 - Virg1 -0.12793985 0.02412511 -0.176962850 -0.081758344
ggplot(comps, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 0, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
## With back transformed mcmc_areas(10^(coefs %*% t(pairwise.mat)))
(comps1 = tidyMCMC(10^(coefs %*% t(pairwise.mat)), conf.int = TRUE, conf.method = "HPDinterval"))
term estimate std.error conf.low conf.high 1 Preg1 - None 1.0545530 0.05836951 0.9443336 1.1708128 2 Preg8 - None 1.0885725 0.05986066 0.9797715 1.2133891 3 Virg1 - None 0.8843113 0.04850064 0.7926648 0.9810397 4 Virg8 - None 0.6586722 0.03629164 0.5909888 0.7323914 5 Preg8 - Preg1 1.0338855 0.05765372 0.9262513 1.1524033 6 Virg1 - Preg1 0.8398717 0.04652671 0.7490572 0.9312345 7 Virg8 - Preg1 0.6255845 0.03497534 0.5603392 0.6965585 8 Virg1 - Preg8 0.8135973 0.04483752 0.7239839 0.8980175 9 Virg8 - Preg8 0.6060138 0.03377457 0.5403503 0.6717646 10 Virg8 - Virg1 0.7459856 0.04150406 0.6613482 0.8240819
ggplot(comps1, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + geom_hline(yintercept = 1, linetype = "dashed") + scale_y_continuous("Effect size") + scale_x_discrete("") + coord_flip() + theme_classic()
So (for example), the longevity of male fruitflies with 8 virgin partners is approximately
0.61
that of male fruitflies with 8 pregnant partners. Similar contrasts to those defined in Tutorial 7.4b, Q3 could be defined to explore specific comparisons. - Explore $R^2$
library(MCMCpack) library(broom) partridge.mcmc <- partridge.mcmcpack Xmat = model.matrix(~TREATMENT + THORAX, data = partridge) wch = grep("Intercept|TREATMENT|THORAX", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, log10(partridge$LONGEV), "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.6984863 0.02600139 0.6472273 0.7451868
# for comparison with frequentist summary(lm(log10(LONGEV) ~ TREATMENT + THORAX, data = partridge))
Call: lm(formula = log10(LONGEV) ~ TREATMENT + THORAX, data = partridge) Residuals: Min 1Q Median 3Q Max -0.226736 -0.058445 -0.003469 0.059961 0.170389 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.79095 0.08443 9.368 5.89e-16 *** TREATMENTPreg1 0.02260 0.02368 0.954 0.3419 TREATMENTPreg8 0.03648 0.02385 1.530 0.1287 TREATMENTVirg1 -0.05381 0.02366 -2.275 0.0247 * TREATMENTVirg8 -0.18165 0.02392 -7.592 7.79e-12 *** THORAX 1.19385 0.09900 12.060 < 2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.08364 on 119 degrees of freedom Multiple R-squared: 0.7055, Adjusted R-squared: 0.6932 F-statistic: 57.02 on 5 and 119 DF, p-value: < 2.2e-16
partridge.mcmc = partridge.r2jags$BUGSoutput$sims.matrix wch = c(which(colnames(partridge.mcmc) == "beta0"), grep("beta\\[", colnames(partridge.mcmc))) coefs = partridge.mcmc[, wch] Xmat = model.matrix(~TREATMENT + THORAX, data = partridge) fit = coefs %*% t(Xmat) resid = sweep(fit, 2, log10(partridge$LONGEV), "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.6983588 0.02576383 0.6474857 0.7445594
# for comparison with frequentist summary(lm(log10(LONGEV) ~ TREATMENT + THORAX, data = partridge))
Call: lm(formula = log10(LONGEV) ~ TREATMENT + THORAX, data = partridge) Residuals: Min 1Q Median 3Q Max -0.226736 -0.058445 -0.003469 0.059961 0.170389 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.79095 0.08443 9.368 5.89e-16 *** TREATMENTPreg1 0.02260 0.02368 0.954 0.3419 TREATMENTPreg8 0.03648 0.02385 1.530 0.1287 TREATMENTVirg1 -0.05381 0.02366 -2.275 0.0247 * TREATMENTVirg8 -0.18165 0.02392 -7.592 7.79e-12 *** THORAX 1.19385 0.09900 12.060 < 2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.08364 on 119 degrees of freedom Multiple R-squared: 0.7055, Adjusted R-squared: 0.6932 F-statistic: 57.02 on 5 and 119 DF, p-value: < 2.2e-16
partridge.mcmc = as.matrix(partridge.rstan) Xmat = model.matrix(~TREATMENT + THORAX, newdata) wch = c(which(colnames(partridge.mcmc) == "beta0"), grep("beta\\[", colnames(partridge.mcmc))) coefs = partridge.mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, log10(partridge$LONGEV), "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.376688 0.148472 0.1275442 0.6879338
# for comparison with frequentist summary(lm(log10(LONGEV) ~ TREATMENT + THORAX, data = partridge))
Call: lm(formula = log10(LONGEV) ~ TREATMENT + THORAX, data = partridge) Residuals: Min 1Q Median 3Q Max -0.226736 -0.058445 -0.003469 0.059961 0.170389 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.79095 0.08443 9.368 5.89e-16 *** TREATMENTPreg1 0.02260 0.02368 0.954 0.3419 TREATMENTPreg8 0.03648 0.02385 1.530 0.1287 TREATMENTVirg1 -0.05381 0.02366 -2.275 0.0247 * TREATMENTVirg8 -0.18165 0.02392 -7.592 7.79e-12 *** THORAX 1.19385 0.09900 12.060 < 2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.08364 on 119 degrees of freedom Multiple R-squared: 0.7055, Adjusted R-squared: 0.6932 F-statistic: 57.02 on 5 and 119 DF, p-value: < 2.2e-16
partridge.mcmc = as.matrix(partridge.rstanarm) Xmat = model.matrix(~TREATMENT + THORAX, newdata) wch = grep("Intercept|TREATMENT|THORAX", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, log10(partridge$LONGEV), "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.3773228 0.1478388 0.1241807 0.6818132
# for comparison with frequentist summary(lm(log10(LONGEV) ~ TREATMENT + THORAX, data = partridge))
Call: lm(formula = log10(LONGEV) ~ TREATMENT + THORAX, data = partridge) Residuals: Min 1Q Median 3Q Max -0.226736 -0.058445 -0.003469 0.059961 0.170389 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.79095 0.08443 9.368 5.89e-16 *** TREATMENTPreg1 0.02260 0.02368 0.954 0.3419 TREATMENTPreg8 0.03648 0.02385 1.530 0.1287 TREATMENTVirg1 -0.05381 0.02366 -2.275 0.0247 * TREATMENTVirg8 -0.18165 0.02392 -7.592 7.79e-12 *** THORAX 1.19385 0.09900 12.060 < 2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.08364 on 119 degrees of freedom Multiple R-squared: 0.7055, Adjusted R-squared: 0.6932 F-statistic: 57.02 on 5 and 119 DF, p-value: < 2.2e-16
partridge.mcmc = as.matrix(partridge.brm) Xmat = model.matrix(~TREATMENT + THORAX, newdata) wch = grep("b_", colnames(partridge.mcmc)) coefs = partridge.mcmc[, wch] fit = coefs %*% t(Xmat) resid = sweep(fit, 2, log10(partridge$LONGEV), "-") var_f = apply(fit, 1, var) var_e = apply(resid, 1, var) R2 = var_f/(var_f + var_e) tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
term estimate std.error conf.low conf.high 1 var1 0.3771413 0.1483151 0.1227373 0.6817201
# for comparison with frequentist summary(lm(log10(LONGEV) ~ TREATMENT + THORAX, data = partridge))
Call: lm(formula = log10(LONGEV) ~ TREATMENT + THORAX, data = partridge) Residuals: Min 1Q Median 3Q Max -0.226736 -0.058445 -0.003469 0.059961 0.170389 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.79095 0.08443 9.368 5.89e-16 *** TREATMENTPreg1 0.02260 0.02368 0.954 0.3419 TREATMENTPreg8 0.03648 0.02385 1.530 0.1287 TREATMENTVirg1 -0.05381 0.02366 -2.275 0.0247 * TREATMENTVirg8 -0.18165 0.02392 -7.592 7.79e-12 *** THORAX 1.19385 0.09900 12.060 < 2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.08364 on 119 degrees of freedom Multiple R-squared: 0.7055, Adjusted R-squared: 0.6932 F-statistic: 57.02 on 5 and 119 DF, p-value: < 2.2e-16
The chains appear well mixed and have converged on what is likely to be a stable posterior.
The model validation diagnostics all seem reasonable suggesting that the model is likely to be reliable.
Heterogeneous slopes
Constable (1993) compared the inter-radial suture widths of urchins maintained on one of three food regimes (Initial: no additional food supplied above what was in the initial sample, low: food supplied periodically and high: food supplied ad libitum. In an attempt to control for substantial variability in urchin sizes, the initial body volume of each urchin was measured as a covariate.
Download Partridge1 data setFormat of constable.csv data file | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
constable <- read.csv("../downloads/data/constable.csv", strip.white = T) head(constable)
TREAT IV SUTW 1 Initial 3.5 0.010 2 Initial 5.0 0.020 3 Initial 8.0 0.061 4 Initial 10.0 0.051 5 Initial 13.0 0.041 6 Initial 13.0 0.061
- Perform basic exploratory data analysis to examined.
- normality
- homogeneity of variance
- homogeneity of slopes (and equality of covariate ranges)
ggplot(constable, aes(y = SUTW, x = IV, color = TREAT)) + geom_smooth() + geom_point() + theme_classic()
# Or with linear smoother ggplot(constable, aes(y = SUTW, x = IV, color = TREAT)) + geom_smooth(method = "lm") + geom_point() + theme_classic()
- To obtain a quick estimation of the likely residuals,
lets explore the residuals from a simple linear model fitted in a frequentist
framework.
plot(lm(SUTW ~ IV * TREAT, data = constable))
- Given the likely heterogeneous slopes, it is arguably best
to fit a model that permits heterogeneous slopes (that is, includes the
IV by TREAT interaction).
Fit the appropriate Bayesian model to explore the effect of treatment and inital volume
(and their interaction) on the width of sutures.
$$
\begin{align}
y_{ij} &\sim{} N(\mu_{ij}, \sigma)\\
\mu_{ij} &= \beta_0 + \mathbf{X}\boldsymbol{\beta}\\[1em]
\beta_0 &\sim{} N(0,10)\\
\beta &\sim{} N(0,10)\\
\sigma &\sim{} cauchy(0,5)\\
\end{align}
$$
library(MCMCpack) constable.mcmcpack = MCMCregress(SUTW ~ I(IV^(1/3)) * TREAT, data = constable)
It would appear that it is likely that homogeneity of slopes has been violated - the slope associated with the Low treatment appears less steep than the slopes associated with either High or Initial. Although the range of the continuous covariate (IV) is slightly less for the High treatment (compared to the Initial and Low treatments), the difference is not dramatic (and appears broadly symmetrical).
Constable applied a third root transform on IV to improve normality and linearity. For consistency sake, we will do the same.
plot(lm(SUTW ~ I(IV^(1/3)) * TREAT, data = constable))