

Tutorial 7.3b - Multiple linear regression (Bayesian)

12 Jan 2018

Multiple and complex regression analyses can be useful for situations in which patterns in a response variable cannot be adequately described by a single straight line resulting from a single predictor and/or a simple linear equation.

As the multiple linear regression design is very much consistent between frequentist and Bayesian approaches, you are advised to review the tutorial on frequentist multiple linear regression. Many of the important assumptions and exploratory data analysis issues discussed in that tutorial are also relevant in a Bayesian framework, yet for brevity they will not be repeated here.

General form of linear models

Additive model

$$y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+...+\beta_jx_{ij}+\epsilon_i$$ where $\beta_0$ is the population y-intercept (value of $y$ when all partial slopes equal zero), $\beta_1$, $\beta_2$, etc are the partial population slopes of $Y$ on $X_1$, $X_2$, etc respectively holding the other $X$ constant. $\epsilon_i$ is the random unexplained error or residual component.

The additive model assumes that the effect of one predictor variable (partial slope) is independent of the levels of the other predictor variables.

Multiplicative model

$$y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i1}x_{i2}+...+\epsilon_i$$ where $\beta_3x_{i1}x_{i2}$ is the interactive effect of $X_1$ and $X_2$ on $Y$ and it examines the degree to which the effect of one of the predictor variables depends on the levels of the other predictor variable(s).

Scenario and Data

Let's say we had set up a natural experiment in which we measured a response ($y$) from each of 100 sampling units ($n=100$) across a landscape. At the same time, we also measured two other continuous covariates ($x1$ and $x2$) from each of the sampling units. As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and can be safely skipped. Consequently, I have folded (toggled) this section away.

set.seed(3)                       # for reproducibility
n = 100                           # number of sampling units
intercept = 5                     # (defined but not used below)
temp = runif(n)                   # first covariate (x1)
nitro = runif(n) + 0.8 * temp     # second covariate (x2), correlated with x1
int.eff = 2                       # intercept (also reused as the interaction effect)
temp.eff <- 0.85                  # partial slope for temp
nitro.eff <- 0.5                  # partial slope for nitro
res = rnorm(n, 0, 1)              # random (residual) noise
coef <- c(int.eff, temp.eff, nitro.eff, int.eff)
mm <- model.matrix(~temp * nitro) # model matrix including the interaction
y <- t(coef %*% t(mm)) + res      # response = linear predictor + noise
data <- data.frame(y, x1 = temp, x2 = nitro, cx1 = scale(temp,
    scale = F), cx2 = scale(nitro, scale = F))
head(data)
         y        x1        x2         cx1
1 3.513305 0.1680415 0.9007709 -0.31604197
2 5.090382 0.8075164 1.3281453  0.32343291
3 4.036943 0.3849424 0.5170847 -0.09914114
4 4.006436 0.3277343 0.9741312 -0.15634918
5 5.381677 0.6021007 1.0869787  0.11801718
6 4.530071 0.6043941 0.8240744  0.12031056
          cx2
1  0.02986272
2  0.45723717
3 -0.35382350
4  0.10322304
5  0.21607055
6 -0.04683372

With these sorts of data, we are primarily interested in investigating whether there is a relationship between the continuous response variable and the components of the linear predictor (the continuous predictors). We could model the relationship via either:

  • an additive model in which the effects of each predictor contribute in an additive way to the response - we do not allow for an interaction as we consider an interaction either not of great importance or likely to be absent.
  • a multiplicative model in which the effects of each predictor and their interaction contribute to the response - we allow for the impact of one predictor to vary across the range of the other predictor.

Centering data

When a linear model contains a covariate (continuous predictor variable) in addition to another predictor (continuous or categorical), it is nearly always advisable that the continuous predictor variables be centered prior to the analysis. Centering is a process by which the mean of a variable is subtracted from each of the values such that the scale of the variable is shifted so as to be centered around 0. Hence the mean of the new centered variable will be 0, yet it will retain the same variance.

(Figures: plots of the raw data and of the centred data)

There are multiple reasons for this:

  • Firstly, it provides some biological meaning to the y-intercept. Recall that the y-intercept is the value of Y when X is equal to zero. If X is centered, then the y-intercept represents the value of Y at the mid-point of the X range. The y-intercept of an un-centered X typically represents an unrealistic value of Y (as an X of 0 is often beyond the reasonable range of values).
  • Secondly, in multiplicative models (in which predictors and their interactions are included), main effects and interaction terms built from centered predictors will not be correlated to one another (see below)
  • Thirdly, for more complex models, centering the covariates can increase the likelihood that the modelling engine converges (arrives at a numerically stable and reliable outcome).
Note, centering will not affect the slope estimates.
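
As a quick demonstration of the second point (using the simulated data generated above), we can compare the correlation between a predictor and the interaction term before and after centering:

## correlation between x1 and the x1:x2 interaction term (raw predictors)
with(data, cor(x1 * x2, x1))
## the same correlation after centering - much closer to zero
with(data, cor(cx1 * cx2, cx1))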

In R, centering is easily achieved with the scale function. Note, the scale() function centers and scales (divides by standard deviation) data. We only really need to center the data, so we provide the argument scale=FALSE. Also note that the scale() function attaches the pre-centered mean (and standard deviation if scaling performed) as attributes to the scaled data in order to facilitate back-scaling to the original scale. While these attributes are often convenient, they do cause issues for some of the Bayesian routines and so we will strip these attributes using the as.numeric() function. Instead, we will create separate scalar variables to store the pre-scaled means.

data <- within(data, {
    cx1 <- as.numeric(scale(x1, scale = FALSE))
    cx2 <- as.numeric(scale(x2, scale = FALSE))
})
head(data)
         y        x1        x2         cx1
1 3.513305 0.1680415 0.9007709 -0.31604197
2 5.090382 0.8075164 1.3281453  0.32343291
3 4.036943 0.3849424 0.5170847 -0.09914114
4 4.006436 0.3277343 0.9741312 -0.15634918
5 5.381677 0.6021007 1.0869787  0.11801718
6 4.530071 0.6043941 0.8240744  0.12031056
          cx2
1  0.02986272
2  0.45723717
3 -0.35382350
4  0.10322304
5  0.21607055
6 -0.04683372
mean.x1 = mean(data$x1)
mean.x2 = mean(data$x2)
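
These stored means can then be used to shift any new predictor values (expressed on the original scale) onto the centered scale before making predictions. A minimal sketch with made-up values:

## hypothetical new observation on the original (uncentered) scale
newdata = data.frame(x1 = 0.5, x2 = 1)
## convert to the centered scale used in the models below
newdata = within(newdata, {
    cx1 <- x1 - mean.x1
    cx2 <- x2 - mean.x2
})
newdata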

Assumptions

The assumptions are:
  1. All of the observations are independent - this must be addressed at the design and collection stages
  2. The response variable (and thus the residuals) should be normally distributed. A boxplot of the entire variable is usually useful for diagnosing major issues with normality.
  3. The response variable should be equally varied (variance should not be related to mean as these are supposed to be estimated separately). Scatterplots with linear smoothers can be useful for exploring the spread of observations around the trendline. The spread of observations around the trendline should not increase (or decrease) along its length.
  4. The predictor variables should be uniformly or normally distributed. Again, boxplots can be useful.
  5. The relationships between the linear predictors (right hand side of the regression formula) and the response variable should be linear. Scatterplots with smoothers can be useful for identifying possible non-linearity.
  6. (Multi)collinearity - see below
  7. The number of predictor variables must be less than the number of observations otherwise the linear model will be over-parameterized (more parameters to estimate than there are independent data from which estimations are calculated).

(Multi)collinearity - a predictor variable must not be correlated to the combination of other predictor variables (known collectively as the linear predictor). Multicollinearity has major detrimental effects on model fitting:

  • instability of the estimated partial regression slopes (small changes in the data or variable inclusion can cause dramatic changes in parameter estimates)
  • inflated standard errors and confidence intervals of model parameters, thereby increasing the type II error rate (reducing power) of parameter hypothesis tests
Multicollinearity can be diagnosed with the following:
  • investigate pairwise correlations between all the predictor variables either by a correlation matrix or a scatterplot matrix
  • calculate tolerance ($1-r^2$ of the relationship between a predictor variable and all the other predictor variables) for each of the predictor variables. Tolerance is a measure of the degree of collinearity: values $<0.2$ should be viewed with caution and values $<0.1$ given serious attention. Variance inflation factors (VIF) are the inverse of tolerance, and thus values greater than 5, or worse, 10 indicate collinearity (a brief sketch of these first two diagnostics is given after this list).
  • PCA (principal components analysis) eigenvalues (from a correlation matrix for all the predictor variables) close to zero indicate collinearity, and component loadings may be useful in determining which predictor variables cause collinearity.
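
To illustrate, the first two diagnostics could be computed for the current data along the following lines (a sketch only; it assumes the car package, which supplies a vif() function, is installed):

## pairwise correlation between the (centered) predictors
cor(data[, c("cx1", "cx2")])
## variance inflation factors from an equivalent frequentist fit
library(car)
vif(lm(y ~ cx1 + cx2, data = data))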
There are several approaches to dealing with collinearity (however the first two of these are likely to result in biased parameter estimates):
  • remove the highly correlated predictor variable(s), starting with the least biologically interesting variable(s)
  • PCA (principal components analysis) regression - regress the response variable against the principal components resulting from a correlation matrix of all the predictor variables. Each of these principal components is by definition completely independent of the others, but the resulting parameter estimates must be back-calculated in order to have any biological meaning.
  • apply a regression tree - regression trees recursively partition (subset) the data according to the individual variables that explain the greatest remaining variance. Since at each iteration each predictor variable is effectively evaluated in isolation, (multi)collinearity is not an issue.

These assumptions should be explored using the techniques highlighted here

Model fitting or statistical analysis

Consistent with Tutorial 7.2b, we will explore Bayesian modelling of multiple linear regression using a variety of tools (such as MCMCpack, JAGS, RSTAN, RSTANARM and BRMS). Whilst JAGS and RSTAN are extremely flexible and thus allow models to be formulated that contain not only the simple model, but also additional derivatives, the other approaches are more restrictive. Consequently, I will mostly restrict models to just the minimum necessary and all derivatives will instead be calculated in R itself from the returned posteriors. Hence for each model, I will generate an mcmc list (data.mcmc.list) containing the mcmc sample matrix for each chain. This mcmc list will be considered the standard starting point for all other manipulations.

Multiple linear regression models can include predictors (terms) that are incorporated additively (no interactions) or multiplicatively (with interactions). As such we will explore these separately for each modelling tool.

The observed response ($y_i$) are assumed to be drawn from a normal distribution with a given mean ($\mu$) and standard deviation ($\sigma$). The expected values ($\mu$) are themselves determined by the linear predictor ($\beta_0 + \beta X_i$). In this case, $\beta_0$ represents the y-intercept (value of $y$ when all of the $x$'s are equal to zero) and the set of $\beta$'s represent the rates of change in $y$ for every unit change in each $x$ (the effect) holding each other $x$ constant.

Note that since we should always center all predictors (by subtracting the mean of each $x$ from the respective values of each $x$), the y-intercept represents the value of $y$ at the average value of each $x$.

MCMC sampling requires priors on all parameters. We will employ weakly informative priors. Specifying 'uninformative' priors is always a bit of a balancing act. If the priors are too vague (wide) the MCMC sampler can wander off into nonsense areas of likelihood rather than concentrate around areas of highest likelihood (desired when wanting the outcomes to be largely driven by the data). On the other hand, if the priors are too strong, they may have an undue influence on the parameters. In such a simple model, this balance is very forgiving - it is for more complex models that prior choice becomes more important.

For this simple model, we will go with zero-centered Gaussian (normal) priors with relatively large standard deviations (100) for both the intercept and the treatment effect and a wide half-cauchy (scale=5) for the standard deviation. $$ \begin{align} y_i &\sim{} N(\mu_i, \sigma)\\ \mu_i &= \beta_0 + \beta X_i\\[1em] \beta_0 &\sim{} N(0,100)\\ \beta &\sim{} N(0,100)\\ \sigma &\sim{} cauchy(0,5)\\ \end{align} $$

Additive model

library(MCMCpack)
data.mcmcpack.add <- MCMCregress(y ~ cx1 + cx2, data = data)

Multiplicative model

library(MCMCpack)
data.mcmcpack.mult <- MCMCregress(y ~ cx1 * cx2, data = data)
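
MCMCregress() returns a coda mcmc object, so the posterior samples can be summarised (or extracted as a matrix for further manipulation) straight away; a quick sketch:

## posterior summaries for the multiplicative model
summary(data.mcmcpack.mult)
## the underlying samples as a matrix (one column per parameter)
head(as.matrix(data.mcmcpack.mult))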

If we define the model in terms of matrices, the JAGS model definition is identical for both additive and multiplicative models. $$\begin{align} y_i&\sim{}N(\mu_i, \tau)\\ \mu_i &= \beta_0 + \beta X_i\\ \beta_0&\sim{}N(0,1.0{E-6}) \hspace{1cm}\mathsf{non-informative~prior~for~intercept}\\ \beta_j&\sim{}N(0,1.0{E-6}) \hspace{1cm}\mathsf{non-informative~prior~for~partial~slopes}\\ \tau &= 1/\sigma^2\\ \sigma&\sim{}U(0,100)\\ \end{align} $$

Define the model

modelString = "
  model {
  #Likelihood
  for (i in 1:n) {
  y[i]~dnorm(mu[i],tau)
  mu[i] <- beta0 + inprod(beta[],X[i,])
  }
  #Priors
  beta0 ~ dnorm(0.01,1.0E-6)
  for (j in 1:nX) {
  beta[j] ~ dnorm(0.01,1.0E-6)
  }
  tau <- 1 / (sigma * sigma)
  sigma~dunif(0,100)
  }
  "

Additive model

Define the data list

Arrange the data as a list (as required by BUGS). As input, JAGS will need to be supplied with:

  • the response variable (y)
  • the predictor matrix (X)
  • the number of predictor variables (nX)
  • the total number of observed items (n)
This all needs to be contained within a list object. We will create two data lists, one for each of the hypotheses.

X = model.matrix(~cx1 + cx2, data = data)
data.list <- with(data, list(y = y, X = X[, -1], nX = ncol(X) -
    1, n = nrow(data)))

Define the MCMC chain parameters

Next we should define the behavioural parameters of the MCMC sampling chains. Include the following:

  • the nodes (estimated parameters) to monitor (return samples for)
  • the number of MCMC chains (3)
  • the number of burnin steps (3000)
  • the thinning factor (10)
  • the number of MCMC iterations - determined by the number of samples to save, the rate of thinning and the number of chains

params <- c("beta0", "beta", "sigma")
nChains = 3
burnInSteps = 3000
thinSteps = 10
numSavedSteps = 15000  #across all chains
nIter = ceiling(burnInSteps + (numSavedSteps * thinSteps)/nChains)
nIter
[1] 53000

Fit the model

Now run the JAGS code via the R2jags interface. Note that the first time jags is run after the R2jags package is loaded, it is often necessary to run any kind of randomization function just to initiate the .Random.seed variable.

## load the R2jags package
library(R2jags)
data.r2jags.add <- jags(data = data.list, inits = NULL, parameters.to.save = params,
    model.file = textConnection(modelString), n.chains = nChains, n.iter = nIter,
    n.burnin = burnInSteps, n.thin = thinSteps)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 100
   Unobserved stochastic nodes: 4
   Total graph size: 618

Initializing model
print(data.r2jags.add)
Inference for Bugs model at "5", fit using jags,
 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10
 n.sims = 15000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta[1]    3.028   0.501   2.037   2.692   3.032   3.365   4.009 1.001 15000
beta[2]    1.389   0.432   0.545   1.101   1.394   1.675   2.235 1.001 15000
beta0      3.823   0.115   3.597   3.745   3.822   3.901   4.048 1.001  8000
sigma      1.146   0.083   1.001   1.088   1.140   1.198   1.322 1.001 15000
deviance 309.526   2.883 305.925 307.427 308.888 310.918 316.903 1.002  3100

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 4.2 and DIC = 313.7
DIC is an estimate of expected predictive error (lower deviance is better).
data.mcmc.list.add <- as.mcmc(data.r2jags.add)
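
As flagged earlier, this mcmc list will be the common starting point for any derived quantities. For example, the chains can be pooled into a single matrix of posterior samples from which summaries are easily computed (a minimal sketch):

## pool the three chains into one matrix of posterior samples
data.mcmc.add = as.matrix(data.mcmc.list.add)
## e.g. posterior mean and 95% quantile-based interval for the first partial slope
mean(data.mcmc.add[, "beta[1]"])
quantile(data.mcmc.add[, "beta[1]"], c(0.025, 0.975))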

Multiplicative model

Define the data list

Arrange the data as a list (as required by BUGS). As input, JAGS will need to be supplied with:

  • the response variable (y)
  • the predictor matrix (X)
  • the number of predictor terms (nX)
  • the total number of observed items (n)
This all needs to be contained within a list object. We will create two data lists, one for each of the hypotheses.

X = model.matrix(~cx1 * cx2, data = data)
data.list <- with(data, list(y = y, X = X[, -1], nX = ncol(X) - 1, n = nrow(data)))

Define the MCMC chain parameters

Next we should define the behavioural parameters of the MCMC sampling chains. Include the following:

  • the nodes (estimated parameters) to monitor (return samples for)
  • the number of MCMC chains (3)
  • the number of burnin steps (3000)
  • the thinning factor (10)
  • the number of MCMC iterations - determined by the number of samples to save, the rate of thinning and the number of chains

params <- c("beta0", "beta", "sigma")
nChains = 3
burnInSteps = 3000
thinSteps = 10
numSavedSteps = 15000  #across all chains
nIter = ceiling(burnInSteps + (numSavedSteps * thinSteps)/nChains)
nIter
[1] 53000

Fit the model

Now run the JAGS code via the R2jags interface. Note that the first time jags is run after the R2jags package is loaded, it is often necessary to run any kind of randomization function just to initiate the .Random.seed variable.

## load the R2jags package
library(R2jags)
data.r2jags.mult <- jags(data = data.list, inits = NULL, parameters.to.save = params,
    model.file = textConnection(modelString), n.chains = nChains, n.iter = nIter,
    n.burnin = burnInSteps, n.thin = thinSteps)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 100
   Unobserved stochastic nodes: 5
   Total graph size: 721

Initializing model
print(data.r2jags.mult)
Inference for Bugs model at "5", fit using jags,
 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10
 n.sims = 15000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta[1]    2.931   0.499   1.969   2.593   2.929   3.265   3.902 1.001 15000
beta[2]    1.344   0.426   0.508   1.053   1.343   1.631   2.181 1.001 15000
beta[3]    2.675   1.256   0.182   1.845   2.669   3.509   5.158 1.001 15000
beta0      3.671   0.134   3.413   3.580   3.671   3.760   3.932 1.001 12000
sigma      1.126   0.082   0.981   1.070   1.120   1.176   1.307 1.001  8000
deviance 305.828   3.280 301.532 303.441 305.137 307.457 314.045 1.001 15000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 5.4 and DIC = 311.2
DIC is an estimate of expected predictive error (lower deviance is better).
data.mcmc.list.mult <- as.mcmc(data.r2jags.mult)

Whilst Gibbs sampling provides an elegantly simple MCMC sampling routine, very complex hierarchical models can take enormous numbers of iterations (often prohibitively large) to converge on a stable posterior distribution.

To address this, Andrew Gelman (and other collaborators) have implemented a variation on Hamiltonian Monte Carlo (HMC: a sampler that selects subsequent samples in a way that reduces the correlation between samples, thereby speeding up convergence) called the No-U-Turn (NUTS) sampler. All of these developments are brought together into a tool called Stan ("Sampling Through Adaptive Neighborhoods").

By design (to appeal to the vast number of BUGS users), Stan models are defined in a manner reminiscent of BUGS. Stan first converts these models into C++ code, which is then compiled to allow very rapid computation.

Consistent with the use of C++, the model must be accompanied by variable declarations for all inputs and parameters.

One important difference between Stan and JAGS is that whereas BUGS (and thus JAGS) parameterize the normal distribution in terms of precision, Stan uses the standard deviation.

Stan itself is a stand-alone command line application. However, conveniently, the authors of Stan have also developed an R interface to Stan called Rstan which can be used much like R2jags.

Define the model

The minimum model in Stan required to fit the above multiple regression follows. Note the following modifications from the model defined in JAGS:
  • the normal distribution is defined by the standard deviation rather than the precision
  • rather than using a uniform prior for sigma, I am using a half-Cauchy

We now translate the likelihood model into STAN code. As with the JAGS model, this code is stored in an R character string (again called modelString, overwriting the earlier JAGS definition) so that it can be passed to stan() via the model_code argument.
$$\begin{align} y_i&\sim{}N(\mu_i, \sigma)\\ \mu_i &= \beta_0+\beta X_i\\ \beta_0&\sim{}N(0,100)\\ \beta&\sim{}N(0,100)\\ \sigma&\sim{}Cauchy(0,5)\\ \end{align} $$

modelString = "
data {
  int<lower=1> n;   // total number of observations
  vector[n] Y;      // response variable
  int<lower=1> nX;  // number of effects
  matrix[n, nX] X;  // model matrix
}
transformed data {
  matrix[n, nX - 1] Xc;    // centered version of X
  vector[nX - 1] means_X;  // column means of X before centering

  for (i in 2:nX) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[nX-1] beta;    // population-level effects
  real cbeta0;          // center-scale intercept
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
}
model {
  vector[n] mu;
  mu = Xc * beta + cbeta0;
  // prior specifications
  beta ~ normal(0, 100);
  cbeta0 ~ normal(0, 100);
  sigma ~ cauchy(0, 5);
  // likelihood contribution
  Y ~ normal(mu, sigma);
}
generated quantities {
  real beta0;  // population-level intercept
  vector[n] log_lik;
  beta0 = cbeta0 - dot_product(means_X, beta);
  for (i in 1:n) {
    log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma);
  }
}
"

Additive model

Define the data list

Arrange the data as a list (as required by Stan). As input, Stan will need to be supplied with:

  • the response variable (y)
  • the predictor matrix (X)
  • the number of predictor variables (nX)
  • the total number of observed items (n)
This all needs to be contained within a list object. We will create two data lists, one for each of the hypotheses.

X = model.matrix(~cx1 + cx2, data = data)
data.list <- with(data, list(Y = y, X = X, nX = ncol(X), n = nrow(data)))

Define the MCMC chain parameters

Next we should define the behavioural parameters of the No-U-Turn sampling chains. Include the following:

  • the nodes (estimated parameters) to monitor (return samples for)
  • the number of MCMC chains (3)
  • the number of burnin steps (1000)
  • the thinning factor (3)
  • the number of MCMC iterations - determined by the number of samples to save, the rate of thinning and the number of chains

nChains = 3
burnInSteps = 1000
thinSteps = 3
numSavedSteps = 3000  #across all chains
nIter = ceiling(burnInSteps + (numSavedSteps * thinSteps)/nChains)
nIter
[1] 4000

Fit the model

Now compile and run the Stan code via the rstan interface. Note that the first time stan is run after the rstan package is loaded, it is often necessary to run some kind of randomization function just to initiate the .Random.seed variable.

During the warmup stage, the No-U-Turn Sampler (NUTS) attempts to determine the optimum stepsize - the stepsize that achieves the target acceptance rate (0.8 or 80% by default) without divergence (which occurs when the stepsize is too large relative to the curvature of the log posterior, resulting in approximations that are likely to diverge and be biased) and without hitting the maximum treedepth (10). At each iteration of the NUTS algorithm, the number of leapfrog steps doubles (as the treedepth increases) and the algorithm only terminates when either the NUTS criterion is satisfied or the tree depth reaches the maximum (10 by default).

## load the rstan package
library(rstan)
data.rstan.add <- stan(data = data.list, model_code = modelString, chains = nChains,
    iter = nIter, warmup = burnInSteps, thin = thinSteps, save_dso = TRUE)
In file included from /usr/local/lib/R/site-library/BH/include/boost/config.hpp:39:0,
                 from /usr/local/lib/R/site-library/BH/include/boost/math/tools/config.hpp:13,
                 from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/var.hpp:7,
                 from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5,
                 from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core.hpp:12,
                 from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4,
                 from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4,
                 from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4,
                 from file2b18778a1a1f.cpp:8:
/usr/local/lib/R/site-library/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined
 #  define BOOST_NO_CXX11_RVALUE_REFERENCES
 ^
<command-line>:0:0: note: this is the location of the previous definition

SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 1).

Gradient evaluation took 2.5e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds.
Adjust your expectations accordingly!


Iteration:    1 / 4000 [  0%]  (Warmup)
Iteration:  400 / 4000 [ 10%]  (Warmup)
Iteration:  800 / 4000 [ 20%]  (Warmup)
Iteration: 1001 / 4000 [ 25%]  (Sampling)
Iteration: 1400 / 4000 [ 35%]  (Sampling)
Iteration: 1800 / 4000 [ 45%]  (Sampling)
Iteration: 2200 / 4000 [ 55%]  (Sampling)
Iteration: 2600 / 4000 [ 65%]  (Sampling)
Iteration: 3000 / 4000 [ 75%]  (Sampling)
Iteration: 3400 / 4000 [ 85%]  (Sampling)
Iteration: 3800 / 4000 [ 95%]  (Sampling)
Iteration: 4000 / 4000 [100%]  (Sampling)

 Elapsed Time: 0.073416 seconds (Warm-up)
               0.143677 seconds (Sampling)
               0.217093 seconds (Total)


SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 2).

Gradient evaluation took 2.4e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds.
Adjust your expectations accordingly!


Iteration:    1 / 4000 [  0%]  (Warmup)
Iteration:  400 / 4000 [ 10%]  (Warmup)
Iteration:  800 / 4000 [ 20%]  (Warmup)
Iteration: 1001 / 4000 [ 25%]  (Sampling)
Iteration: 1400 / 4000 [ 35%]  (Sampling)
Iteration: 1800 / 4000 [ 45%]  (Sampling)
Iteration: 2200 / 4000 [ 55%]  (Sampling)
Iteration: 2600 / 4000 [ 65%]  (Sampling)
Iteration: 3000 / 4000 [ 75%]  (Sampling)
Iteration: 3400 / 4000 [ 85%]  (Sampling)
Iteration: 3800 / 4000 [ 95%]  (Sampling)
Iteration: 4000 / 4000 [100%]  (Sampling)

 Elapsed Time: 0.060339 seconds (Warm-up)
               0.177215 seconds (Sampling)
               0.237554 seconds (Total)


SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 3).

Gradient evaluation took 1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
Adjust your expectations accordingly!


Iteration:    1 / 4000 [  0%]  (Warmup)
Iteration:  400 / 4000 [ 10%]  (Warmup)
Iteration:  800 / 4000 [ 20%]  (Warmup)
Iteration: 1001 / 4000 [ 25%]  (Sampling)
Iteration: 1400 / 4000 [ 35%]  (Sampling)
Iteration: 1800 / 4000 [ 45%]  (Sampling)
Iteration: 2200 / 4000 [ 55%]  (Sampling)
Iteration: 2600 / 4000 [ 65%]  (Sampling)
Iteration: 3000 / 4000 [ 75%]  (Sampling)
Iteration: 3400 / 4000 [ 85%]  (Sampling)
Iteration: 3800 / 4000 [ 95%]  (Sampling)
Iteration: 4000 / 4000 [100%]  (Sampling)

 Elapsed Time: 0.0688 seconds (Warm-up)
               0.149029 seconds (Sampling)
               0.217829 seconds (Total)
data.rstan.add
Inference for Stan model: 4e0a54cd22440d9847d7705bdb1ff803.
3 chains, each with iter=4000; warmup=1000; thin=3; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

               mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
beta[1]        3.02    0.01 0.51   2.03   2.69   3.02   3.34   4.05  2747    1
beta[2]        1.38    0.01 0.43   0.52   1.10   1.39   1.68   2.22  2854    1
cbeta0         3.83    0.00 0.12   3.59   3.75   3.83   3.91   4.06  2317    1
sigma          1.15    0.00 0.08   0.99   1.09   1.14   1.20   1.32  2806    1
beta0          3.83    0.00 0.12   3.59   3.75   3.83   3.91   4.06  2317    1
log_lik[1]    -1.21    0.00 0.11  -1.46  -1.28  -1.20  -1.13  -1.02  2200    1
log_lik[2]    -1.11    0.00 0.09  -1.31  -1.17  -1.11  -1.05  -0.96  2916    1
log_lik[3]    -1.45    0.00 0.14  -1.74  -1.54  -1.44  -1.35  -1.21  2829    1
log_lik[4]    -1.16    0.00 0.09  -1.35  -1.22  -1.16  -1.10  -1.00  2219    1
log_lik[5]    -1.37    0.00 0.10  -1.59  -1.44  -1.37  -1.30  -1.19  2327    1
log_lik[6]    -1.12    0.00 0.08  -1.28  -1.17  -1.12  -1.07  -0.98  2313    1
log_lik[7]    -1.27    0.00 0.16  -1.63  -1.36  -1.24  -1.15  -1.02  2806    1
log_lik[8]    -2.43    0.01 0.25  -2.95  -2.59  -2.42  -2.26  -1.99  2248    1
log_lik[9]    -1.42    0.00 0.16  -1.77  -1.51  -1.40  -1.31  -1.15  2796    1
log_lik[10]   -1.09    0.00 0.08  -1.24  -1.14  -1.09  -1.04  -0.95  2402    1
log_lik[11]   -1.21    0.00 0.08  -1.39  -1.27  -1.21  -1.15  -1.05  2839    1
log_lik[12]   -1.16    0.00 0.10  -1.37  -1.22  -1.15  -1.09  -0.98  2774    1
log_lik[13]   -1.22    0.00 0.10  -1.44  -1.28  -1.21  -1.14  -1.03  2784    1
log_lik[14]   -1.81    0.00 0.20  -2.24  -1.94  -1.80  -1.67  -1.46  2828    1
log_lik[15]   -1.14    0.00 0.11  -1.40  -1.20  -1.13  -1.07  -0.96  2934    1
log_lik[16]   -1.12    0.00 0.09  -1.32  -1.17  -1.11  -1.06  -0.96  2917    1
log_lik[17]   -1.07    0.00 0.08  -1.24  -1.12  -1.07  -1.02  -0.93  2768    1
log_lik[18]   -1.07    0.00 0.08  -1.23  -1.12  -1.07  -1.02  -0.93  2581    1
log_lik[19]   -1.07    0.00 0.08  -1.24  -1.12  -1.07  -1.02  -0.92  2737    1
log_lik[20]   -1.06    0.00 0.07  -1.21  -1.11  -1.06  -1.01  -0.92  2809    1
log_lik[21]   -1.98    0.00 0.22  -2.45  -2.12  -1.96  -1.82  -1.58  2405    1
log_lik[22]   -1.16    0.00 0.12  -1.44  -1.22  -1.15  -1.08  -0.97  2216    1
log_lik[23]   -1.17    0.00 0.11  -1.42  -1.23  -1.16  -1.09  -0.98  2231    1
log_lik[24]   -1.07    0.00 0.08  -1.25  -1.13  -1.07  -1.02  -0.92  2452    1
log_lik[25]   -1.24    0.00 0.10  -1.45  -1.30  -1.23  -1.17  -1.07  2104    1
log_lik[26]   -1.37    0.00 0.12  -1.63  -1.45  -1.36  -1.29  -1.16  2896    1
log_lik[27]   -1.09    0.00 0.08  -1.24  -1.14  -1.09  -1.04  -0.94  2487    1
log_lik[28]   -1.27    0.00 0.13  -1.55  -1.35  -1.26  -1.18  -1.06  2881    1
log_lik[29]   -2.30    0.01 0.31  -2.98  -2.50  -2.28  -2.08  -1.75  2682    1
log_lik[30]   -1.40    0.00 0.12  -1.67  -1.48  -1.39  -1.32  -1.18  2285    1
log_lik[31]   -1.22    0.00 0.09  -1.40  -1.27  -1.21  -1.16  -1.06  2306    1
log_lik[32]   -1.06    0.00 0.07  -1.21  -1.11  -1.06  -1.01  -0.92  2518    1
log_lik[33]   -2.19    0.00 0.27  -2.77  -2.36  -2.18  -2.01  -1.73  2877    1
log_lik[34]   -2.26    0.00 0.24  -2.78  -2.42  -2.25  -2.09  -1.83  2785    1
log_lik[35]   -1.07    0.00 0.08  -1.22  -1.12  -1.07  -1.02  -0.93  2802    1
log_lik[36]   -3.41    0.01 0.37  -4.20  -3.64  -3.39  -3.15  -2.74  2146    1
log_lik[37]   -1.53    0.00 0.21  -2.00  -1.65  -1.51  -1.38  -1.18  2829    1
log_lik[38]   -1.22    0.00 0.13  -1.50  -1.29  -1.20  -1.13  -1.01  2842    1
log_lik[39]   -1.86    0.00 0.15  -2.17  -1.95  -1.85  -1.75  -1.59  2213    1
log_lik[40]   -1.32    0.00 0.11  -1.55  -1.39  -1.31  -1.24  -1.12  2662    1
log_lik[41]   -1.58    0.00 0.13  -1.85  -1.67  -1.58  -1.49  -1.35  2326    1
log_lik[42]   -2.93    0.01 0.36  -3.71  -3.15  -2.91  -2.68  -2.30  2290    1
log_lik[43]   -1.43    0.00 0.14  -1.73  -1.51  -1.42  -1.32  -1.18  2520    1
log_lik[44]   -1.09    0.00 0.08  -1.26  -1.14  -1.08  -1.03  -0.94  2925    1
log_lik[45]   -1.31    0.00 0.13  -1.60  -1.39  -1.30  -1.22  -1.09  2624    1
log_lik[46]   -1.16    0.00 0.09  -1.36  -1.22  -1.16  -1.10  -1.00  2348    1
log_lik[47]   -1.47    0.00 0.17  -1.84  -1.57  -1.45  -1.34  -1.18  2530    1
log_lik[48]   -1.23    0.00 0.13  -1.52  -1.31  -1.22  -1.14  -1.03  2654    1
log_lik[49]   -2.42    0.00 0.24  -2.94  -2.57  -2.40  -2.24  -2.00  2770    1
log_lik[50]   -1.07    0.00 0.08  -1.23  -1.12  -1.07  -1.02  -0.93  2752    1
log_lik[51]   -1.78    0.00 0.21  -2.21  -1.91  -1.76  -1.63  -1.42  2456    1
log_lik[52]   -4.28    0.01 0.65  -5.68  -4.69  -4.24  -3.82  -3.13  2861    1
log_lik[53]   -1.33    0.00 0.18  -1.75  -1.43  -1.31  -1.21  -1.05  2878    1
log_lik[54]   -1.30    0.00 0.15  -1.64  -1.38  -1.28  -1.19  -1.05  2271    1
log_lik[55]   -1.59    0.00 0.19  -2.01  -1.70  -1.58  -1.46  -1.28  2922    1
log_lik[56]   -1.07    0.00 0.08  -1.23  -1.12  -1.07  -1.02  -0.92  2743    1
log_lik[57]   -1.07    0.00 0.08  -1.22  -1.12  -1.07  -1.02  -0.92  2799    1
log_lik[58]   -1.29    0.00 0.15  -1.62  -1.38  -1.27  -1.19  -1.05  2840    1
log_lik[59]   -1.52    0.00 0.16  -1.86  -1.62  -1.51  -1.41  -1.25  2484    1
log_lik[60]   -3.72    0.01 0.59  -5.02  -4.09  -3.68  -3.30  -2.69  2836    1
log_lik[61]   -1.14    0.00 0.11  -1.40  -1.20  -1.13  -1.07  -0.96  2936    1
log_lik[62]   -1.63    0.00 0.20  -2.07  -1.75  -1.61  -1.49  -1.30  2649    1
log_lik[63]   -1.30    0.00 0.11  -1.55  -1.37  -1.29  -1.23  -1.10  2260    1
log_lik[64]   -2.03    0.01 0.31  -2.71  -2.23  -2.01  -1.81  -1.52  2685    1
log_lik[65]   -2.47    0.01 0.33  -3.19  -2.67  -2.44  -2.24  -1.90  2972    1
log_lik[66]   -1.21    0.00 0.10  -1.43  -1.27  -1.20  -1.14  -1.03  2514    1
log_lik[67]   -1.81    0.00 0.27  -2.43  -1.97  -1.78  -1.61  -1.36  2935    1
log_lik[68]   -1.75    0.00 0.18  -2.15  -1.87  -1.74  -1.62  -1.44  2830    1
log_lik[69]   -1.18    0.00 0.10  -1.40  -1.24  -1.18  -1.11  -1.01  2569    1
log_lik[70]   -1.85    0.00 0.22  -2.33  -1.99  -1.84  -1.69  -1.47  2684    1
log_lik[71]   -1.07    0.00 0.07  -1.22  -1.12  -1.07  -1.02  -0.93  2542    1
log_lik[72]   -1.30    0.00 0.14  -1.61  -1.39  -1.28  -1.20  -1.07  2787    1
log_lik[73]   -1.11    0.00 0.09  -1.30  -1.16  -1.11  -1.05  -0.95  2444    1
log_lik[74]   -1.09    0.00 0.09  -1.28  -1.14  -1.08  -1.03  -0.94  2834    1
log_lik[75]   -1.25    0.00 0.10  -1.47  -1.32  -1.24  -1.17  -1.06  2795    1
log_lik[76]   -1.46    0.00 0.10  -1.67  -1.52  -1.46  -1.39  -1.28  2265    1
log_lik[77]   -2.93    0.01 0.37  -3.73  -3.17  -2.91  -2.66  -2.29  2806    1
log_lik[78]   -2.23    0.01 0.30  -2.86  -2.42  -2.21  -2.02  -1.70  2431    1
log_lik[79]   -2.47    0.01 0.32  -3.19  -2.67  -2.44  -2.25  -1.91  2812    1
log_lik[80]   -1.54    0.00 0.14  -1.84  -1.62  -1.52  -1.44  -1.29  2938    1
log_lik[81]   -1.07    0.00 0.08  -1.23  -1.12  -1.07  -1.02  -0.92  2751    1
log_lik[82]   -1.54    0.00 0.13  -1.82  -1.62  -1.53  -1.45  -1.31  2380    1
log_lik[83]   -1.11    0.00 0.09  -1.30  -1.17  -1.11  -1.06  -0.96  2357    1
log_lik[84]   -1.45    0.00 0.22  -1.97  -1.58  -1.42  -1.29  -1.10  2814    1
log_lik[85]   -1.22    0.00 0.13  -1.51  -1.29  -1.20  -1.13  -1.02  2712    1
log_lik[86]   -1.27    0.00 0.12  -1.52  -1.34  -1.26  -1.19  -1.07  2698    1
log_lik[87]   -1.84    0.00 0.24  -2.36  -2.00  -1.82  -1.67  -1.42  2501    1
log_lik[88]   -1.17    0.00 0.10  -1.37  -1.23  -1.16  -1.10  -1.00  2797    1
log_lik[89]   -2.54    0.00 0.26  -3.07  -2.71  -2.52  -2.36  -2.08  2900    1
log_lik[90]   -1.19    0.00 0.11  -1.45  -1.26  -1.18  -1.11  -1.01  2699    1
log_lik[91]   -1.42    0.00 0.17  -1.80  -1.53  -1.41  -1.31  -1.14  2488    1
log_lik[92]   -1.31    0.00 0.13  -1.59  -1.39  -1.29  -1.21  -1.09  2640    1
log_lik[93]   -1.17    0.00 0.08  -1.34  -1.22  -1.17  -1.12  -1.02  2268    1
log_lik[94]   -1.29    0.00 0.12  -1.55  -1.36  -1.28  -1.21  -1.08  2884    1
log_lik[95]   -4.17    0.01 0.62  -5.49  -4.55  -4.13  -3.75  -3.06  2957    1
log_lik[96]   -1.72    0.00 0.16  -2.05  -1.82  -1.70  -1.60  -1.43  2421    1
log_lik[97]   -1.07    0.00 0.08  -1.22  -1.12  -1.07  -1.02  -0.93  2860    1
log_lik[98]   -1.11    0.00 0.09  -1.31  -1.17  -1.11  -1.05  -0.96  2847    1
log_lik[99]   -1.54    0.00 0.15  -1.86  -1.63  -1.53  -1.44  -1.28  2492    1
log_lik[100]  -1.38    0.00 0.12  -1.63  -1.45  -1.37  -1.30  -1.17  2828    1
lp__         -62.83    0.03 1.45 -66.53 -63.52 -62.53 -61.76 -61.01  2476    1

Samples were drawn using NUTS(diag_e) at Thu Aug 17 15:32:58 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
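
Should we wish to compute derived quantities from this fit, the posterior samples can be pooled across chains into a matrix (a brief sketch using rstan's as.matrix method):

## pooled posterior samples for the parameters of interest
stan.add.samples = as.matrix(data.rstan.add, pars = c("beta0", "beta", "sigma"))
head(stan.add.samples)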

Multiplicative model

Define the data list

Arrange the data as a list (as required by Stan). As input, Stan will need to be supplied with:

  • the response variable (y)
  • the predictor matrix (X)
  • the number of predictor variables (nX)
  • the total number of observed items (n)
This all needs to be contained within a list object. We will create two data lists, one for each of the hypotheses.

X = model.matrix(~cx1 * cx2, data = data)
data.list <- with(data, list(Y = y, X = X, nX = ncol(X), n = nrow(data)))

Define the MCMC chain parameters

Next we should define the behavioural parameters of the No-U-Turn sampling chains. Include the following:

  • the nodes (estimated parameters) to monitor (return samples for)
  • the number of MCMC chains (3)
  • the number of burnin steps (1000)
  • the thinning factor (3)
  • the number of MCMC iterations - determined by the number of samples to save, the rate of thinning and the number of chains

nChains = 3
burnInSteps = 1000
thinSteps = 3
numSavedSteps = 3000  #across all chains
nIter = ceiling(burnInSteps + (numSavedSteps * thinSteps)/nChains)
nIter
[1] 4000

Fit the model

Now compile and run the Stan code via the rstan interface. Note that the first time stan is run after the rstan package is loaded, it is often necessary to run some kind of randomization function just to initiate the .Random.seed variable.

During the warmup stage, the No-U-Turn Sampler (NUTS) attempts to determine the optimum stepsize - the stepsize that achieves the target acceptance rate (0.8 or 80% by default) without divergence (which occurs when the stepsize is too large relative to the curvature of the log posterior, resulting in approximations that are likely to diverge and be biased) and without hitting the maximum treedepth (10). At each iteration of the NUTS algorithm, the number of leapfrog steps doubles (as the treedepth increases) and the algorithm only terminates when either the NUTS criterion is satisfied or the tree depth reaches the maximum (10 by default).

## load the rstan package
library(rstan)
data.rstan.mult <- stan(data = data.list, model_code = modelString, chains = nChains,
    iter = nIter, warmup = burnInSteps, thin = thinSteps, save_dso = TRUE)
SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 1).

Gradient evaluation took 1.6e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
Adjust your expectations accordingly!


Iteration:    1 / 4000 [  0%]  (Warmup)
Iteration:  400 / 4000 [ 10%]  (Warmup)
Iteration:  800 / 4000 [ 20%]  (Warmup)
Iteration: 1001 / 4000 [ 25%]  (Sampling)
Iteration: 1400 / 4000 [ 35%]  (Sampling)
Iteration: 1800 / 4000 [ 45%]  (Sampling)
Iteration: 2200 / 4000 [ 55%]  (Sampling)
Iteration: 2600 / 4000 [ 65%]  (Sampling)
Iteration: 3000 / 4000 [ 75%]  (Sampling)
Iteration: 3400 / 4000 [ 85%]  (Sampling)
Iteration: 3800 / 4000 [ 95%]  (Sampling)
Iteration: 4000 / 4000 [100%]  (Sampling)

 Elapsed Time: 0.075668 seconds (Warm-up)
               0.14075 seconds (Sampling)
               0.216418 seconds (Total)


SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 2).

Gradient evaluation took 2e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds.
Adjust your expectations accordingly!


Iteration:    1 / 4000 [  0%]  (Warmup)
Iteration:  400 / 4000 [ 10%]  (Warmup)
Iteration:  800 / 4000 [ 20%]  (Warmup)
Iteration: 1001 / 4000 [ 25%]  (Sampling)
Iteration: 1400 / 4000 [ 35%]  (Sampling)
Iteration: 1800 / 4000 [ 45%]  (Sampling)
Iteration: 2200 / 4000 [ 55%]  (Sampling)
Iteration: 2600 / 4000 [ 65%]  (Sampling)
Iteration: 3000 / 4000 [ 75%]  (Sampling)
Iteration: 3400 / 4000 [ 85%]  (Sampling)
Iteration: 3800 / 4000 [ 95%]  (Sampling)
Iteration: 4000 / 4000 [100%]  (Sampling)

 Elapsed Time: 0.058992 seconds (Warm-up)
               0.139282 seconds (Sampling)
               0.198274 seconds (Total)


SAMPLING FOR MODEL '4e0a54cd22440d9847d7705bdb1ff803' NOW (CHAIN 3).

Gradient evaluation took 1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
Adjust your expectations accordingly!


Iteration:    1 / 4000 [  0%]  (Warmup)
Iteration:  400 / 4000 [ 10%]  (Warmup)
Iteration:  800 / 4000 [ 20%]  (Warmup)
Iteration: 1001 / 4000 [ 25%]  (Sampling)
Iteration: 1400 / 4000 [ 35%]  (Sampling)
Iteration: 1800 / 4000 [ 45%]  (Sampling)
Iteration: 2200 / 4000 [ 55%]  (Sampling)
Iteration: 2600 / 4000 [ 65%]  (Sampling)
Iteration: 3000 / 4000 [ 75%]  (Sampling)
Iteration: 3400 / 4000 [ 85%]  (Sampling)
Iteration: 3800 / 4000 [ 95%]  (Sampling)
Iteration: 4000 / 4000 [100%]  (Sampling)

 Elapsed Time: 0.062121 seconds (Warm-up)
               0.147839 seconds (Sampling)
               0.20996 seconds (Total)
data.rstan.mult
Inference for Stan model: e7d4e08d9f1bbcf9d6d79f126f9e56c7.
3 chains, each with iter=4000; warmup=1000; thin=3; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

          mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
beta[1]   2.94    0.01 0.51   1.96   2.60   2.94   3.29   3.95  2830    1
beta[2]   1.34    0.01 0.43   0.51   1.05   1.33   1.62   2.19  2755    1
beta[3]   2.70    0.02 1.24   0.28   1.88   2.70   3.52   5.17  2701    1
cbeta0    3.82    0.00 0.11   3.60   3.75   3.82   3.90   4.05  2854    1
sigma     1.13    0.00 0.08   0.98   1.07   1.12   1.18   1.31  3000    1
beta0     3.67    0.00 0.14   3.40   3.58   3.67   3.76   3.94  2515    1
lp__    -60.99    0.03 1.67 -65.06 -61.83 -60.64 -59.76 -58.82  2502    1

Samples were drawn using NUTS(diag_e) at Tue Aug 15 15:21:02 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

The STAN team has put together pre-compiled modules (functions) - the rstanarm package - to make specifying and applying STAN models much simpler. Each function offers a consistent interface that is also reminiscent of the major frequentist linear modelling routines in R.

Whilst it is not necessary to specify priors when using rstanarm functions (as defaults will be generated), there is no guarantee that the routines for determining these defaults will persist over time. Furthermore, it is always better to define your own priors, if for no other reason than it forces you to think about what you are doing. Consistent with the pure STAN version, we will employ the following priors:

  • weakly informative Gaussian prior for the intercept $\beta_0 \sim{} N(0, 100)$
  • weakly informative Gaussian prior for the treatment effect $\beta_1 \sim{} N(0, 100)$
  • half-cauchy prior for the standard deviation $\sigma \sim{} Cauchy(0, 5)$

Note, I am using the refresh=0 option so as to suppress the larger regular output in the interest of keeping output to what is necessary for this tutorial. When running outside of a tutorial context, the regular verbose output is useful as it provides a way to gauge progress.

Additive model

library(rstanarm)
library(broom)
library(coda)
data.rstanarm.add = stan_glm(y ~ cx1 + cx2, data = data, iter = 2000, warmup = 200,
    chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 100),
    prior = normal(0, 100), prior_aux = cauchy(0, 2))
Gradient evaluation took 3.9e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.39 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.198223 seconds (Warm-up)
               0.678626 seconds (Sampling)
               0.876849 seconds (Total)


Gradient evaluation took 1.5e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.202665 seconds (Warm-up)
               0.536397 seconds (Sampling)
               0.739062 seconds (Total)


Gradient evaluation took 1.6e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.160257 seconds (Warm-up)
               0.56306 seconds (Sampling)
               0.723317 seconds (Total)
print(data.rstanarm.add)
stan_glm
 family:  gaussian [identity]
 formula: y ~ cx1 + cx2
------

Estimates:
            Median MAD_SD
(Intercept) 3.8    0.1   
cx1         3.0    0.5   
cx2         1.4    0.4   
sigma       1.1    0.1   

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 3.8    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
tidyMCMC(data.rstanarm.add, conf.int = TRUE, conf.method = "HPDinterval")
         term estimate  std.error  conf.low conf.high
1 (Intercept) 3.825270 0.11429998 3.6128940  4.066603
2         cx1 3.021411 0.50685407 1.9964692  3.978835
3         cx2 1.381486 0.43201840 0.5227742  2.206361
4       sigma 1.148550 0.08466719 0.9874055  1.319120
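
The posterior samples underlying these summaries can also be pulled straight out of the stanreg object, or summarised with rstanarm's own helpers (a quick sketch):

## pooled posterior samples (one column per parameter)
head(as.matrix(data.rstanarm.add))
## central 95% credible intervals
posterior_interval(data.rstanarm.add, prob = 0.95)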

Multiplicative model

data.rstanarm.mult = stan_glm(y ~ cx1 * cx2, data = data, iter = 2000,
    warmup = 200, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0,
        100), prior = normal(0, 100), prior_aux = cauchy(0, 2))
Gradient evaluation took 3.4e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.112049 seconds (Warm-up)
               0.580318 seconds (Sampling)
               0.692367 seconds (Total)


Gradient evaluation took 1.4e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.059281 seconds (Warm-up)
               0.420321 seconds (Sampling)
               0.479602 seconds (Total)


Gradient evaluation took 1.5e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.210373 seconds (Warm-up)
               0.70337 seconds (Sampling)
               0.913743 seconds (Total)
print(data.rstanarm.mult)
stan_glm
 family:  gaussian [identity]
 formula: y ~ cx1 * cx2
------

Estimates:
            Median MAD_SD
(Intercept) 3.7    0.1   
cx1         2.9    0.5   
cx2         1.3    0.4   
cx1:cx2     2.7    1.2   
sigma       1.1    0.1   

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 3.8    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
tidyMCMC(data.rstanarm.mult, conf.int = TRUE, conf.method = "HPDinterval")
         term estimate  std.error  conf.low conf.high
1 (Intercept) 3.668833 0.13766864 3.3846348  3.921644
2         cx1 2.933656 0.49365987 1.9342004  3.875018
3         cx2 1.335035 0.41932624 0.5170813  2.187972
4     cx1:cx2 2.703573 1.23876604 0.3719395  5.211188
5       sigma 1.128580 0.08523699 0.9605405  1.293866

The brms package serves a similar goal to the rstanarm package - to provide a simple user interface to STAN. However, unlike the rstanarm implementation, brms simply converts the formula, data, priors and family into STAN model code and data before executing stan with those elements.
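
This conversion can be inspected directly: make_stancode() (and its companion make_standata()) will show the Stan program (and data list) that brms would generate for a given formula and set of priors. For example, a sketch for the additive model:

library(brms)
## Stan program that brms would generate for the additive model
make_stancode(y ~ cx1 + cx2, data = data, prior = c(prior(normal(0, 100),
    class = "Intercept"), prior(normal(0, 100), class = "b"), prior(cauchy(0,
    5), class = "sigma")))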

Whilst it is not necessary to specify priors when using brms functions (as defaults will be generated), there is no guarantee that the routines for determining these defaults will persist over time. Furthermore, it is always better to define your own priors, if for no other reason than it forces you to think about what you are doing. Consistent with the pure STAN version, we will employ the following priors:

  • weakly informative Gaussian prior for the intercept $\beta_0 \sim{} N(0, 100)$
  • weakly informative Gaussian prior for the treatment effect $\beta_1 \sim{} N(0, 100)$
  • half-cauchy prior for the standard deviation $\sigma \sim{} Cauchy(0, 5)$

Note, I am using the refresh=0 option so as to suppress the larger regular output in the interest of keeping output to what is necessary for this tutorial. When running outside of a tutorial context, the regular verbose output is useful as it provides a way to gauge progress.

Additive model

library(brms)
library(broom)
library(coda)
data.brms.add = brm(y ~ cx1 + cx2, data = data, iter = 2000, warmup = 200,
    chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 100),
        class = "Intercept"), prior(normal(0, 100), class = "b"), prior(cauchy(0,
        5), class = "sigma")))
Gradient evaluation took 2.2e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.22 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.013242 seconds (Warm-up)
               0.072156 seconds (Sampling)
               0.085398 seconds (Total)


Gradient evaluation took 1.1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.012669 seconds (Warm-up)
               0.068201 seconds (Sampling)
               0.08087 seconds (Total)


Gradient evaluation took 2.1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.013731 seconds (Warm-up)
               0.065669 seconds (Sampling)
               0.0794 seconds (Total)
print(data.brms.add)
 Family: gaussian(identity) 
Formula: y ~ cx1 + cx2 
   Data: data (Number of observations: 100) 
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; 
         total post-warmup samples = 2700
    ICs: LOO = Not computed; WAIC = Not computed
 
Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept     3.82      0.12     3.59     4.04       2700    1
cx1           3.04      0.52     2.04     4.08       2427    1
cx2           1.38      0.44     0.48     2.24       2463    1

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     1.15      0.08     0.99     1.33       2405    1

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
tidyMCMC(data.brms.add, conf.int = TRUE, conf.method = "HPDinterval")
         term estimate  std.error  conf.low conf.high
1 b_Intercept 3.818513 0.11580124 3.6098266  4.062734
2       b_cx1 3.042654 0.51964636 2.0187823  4.042107
3       b_cx2 1.380825 0.44437714 0.5271996  2.258048
4       sigma 1.147234 0.08407413 0.9784836  1.302088

Multiplicative model

data.brms.mult = brm(y ~ cx1 * cx2, data = data, iter = 2000, warmup = 200,
    chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 100),
        class = "Intercept"), prior(normal(0, 100), class = "b"), prior(cauchy(0,
        5), class = "sigma")))
Gradient evaluation took 1.9e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.023432 seconds (Warm-up)
               0.10201 seconds (Sampling)
               0.125442 seconds (Total)


Gradient evaluation took 9e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.021031 seconds (Warm-up)
               0.075103 seconds (Sampling)
               0.096134 seconds (Total)


Gradient evaluation took 9e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.019726 seconds (Warm-up)
               0.07707 seconds (Sampling)
               0.096796 seconds (Total)
print(data.brms.mult)
 Family: gaussian(identity) 
Formula: y ~ cx1 * cx2 
   Data: data (Number of observations: 100) 
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; 
         total post-warmup samples = 2700
    ICs: LOO = Not computed; WAIC = Not computed
 
Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept     3.67      0.13     3.42     3.94       2534    1
cx1           2.92      0.49     1.94     3.91       2700    1
cx2           1.35      0.43     0.54     2.19       2528    1
cx1:cx2       2.66      1.27     0.11     5.16       2411    1

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     1.12      0.08     0.98      1.3       2585    1

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
tidyMCMC(data.brms.mult, conf.int = TRUE, conf.method = "HPDinterval")
         term estimate  std.error   conf.low conf.high
1 b_Intercept 3.671568 0.13285306 3.43612342  3.943035
2       b_cx1 2.923696 0.49283728 1.89910831  3.857202
3       b_cx2 1.353976 0.42606897 0.59537098  2.231514
4   b_cx1:cx2 2.659722 1.26670106 0.04200689  5.067596
5       sigma 1.123292 0.08287744 0.97447809  1.290851

MCMC diagnostics

In addition to the regular model diagnostic checks (such as residual plots), for Bayesian analyses, it is necessary to explore the characteristics of the MCMC chains and the sampler in general. Recall that the purpose of MCMC sampling is to replicate the posterior distribution of the model likelihood and priors by drawing a known number of samples from this posterior (thereby formulating a probability distribution). This is only reliable if the MCMC samples accurately reflect the posterior.

Unfortunately, since we only know the posterior in the most trivial of circumstances, it is necessary to rely on indirect measures of how accurately the MCMC samples are likely to reflect the posterior. I will briefly outline the most important diagnostics; however, please refer to Tutorial 4.3, Section 3.1: Markov Chain Monte Carlo sampling for a discussion of these diagnostics.

  • Traceplots for each parameter illustrate the MCMC sample values after each successive iteration along the chain. Bad chain mixing (characterized by any sort of pattern) suggests that the MCMC sampling chains may not have completely traversed all features of the posterior distribution and that more iterations are required to ensure the distribution has been accurately represented.
    plot of chunk sim7
  • Autocorrelation plots for each parameter illustrate the degree of correlation between MCMC samples separated by different lags. For example, a lag of 0 represents the degree of correlation between each MCMC sample and itself (obviously this will be a correlation of 1). A lag of 1 represents the degree of correlation between each MCMC sample and the next sample along the chain, and so on. In order to be able to generate unbiased estimates of parameters, the MCMC samples should be independent (uncorrelated). In the figures below, this would be violated in the top autocorrelation plot and met in the bottom autocorrelation plot.
    plot of chunk sim7 plot of chunk sim9
  • Rhat statistic for each parameter provides a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05. If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this mean that the sampler was potentially slower than it could have been, more importantly, it could indicate that the sampler spent time sampling in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or overly vague priors that permit sampling in otherwise nonsense parameter space. (A sketch of how Rhat can be computed from an mcmc list follows this list.)
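
For fits that return an mcmc list (such as the JAGS fits above), the Rhat statistic can be computed directly with coda's gelman.diag() function; a minimal sketch:

library(coda)
## potential scale reduction factors for the multiplicative JAGS model
gelman.diag(data.mcmc.list.mult)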

Prior to inspecting any summaries of the parameter estimates, it is prudent to inspect a range of chain convergence diagnostics. Rather than duplicate this for both additive and multiplicative models, we will only explore the multiplicative model.

  • Trace plots
    View trace plots
    library(MCMCpack)
    plot(data.mcmcpack.mult)
    
    plot of chunk tut7_3bMCMCpackTrace
    plot of chunk tut7_3bMCMCpackTrace
    Trace plots show no evidence that the chains have not reasonably traversed the entire multidimensional parameter space.
  • Raftery diagnostic
    View Raftery diagnostic
    library(MCMCpack)
    raftery.diag(data.mcmcpack.mult)
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                       
                 Burn-in  Total Lower bound  Dependence
                 (M)      (N)   (Nmin)       factor (I)
     (Intercept) 2        3834  3746         1.020     
     cx1         2        3834  3746         1.020     
     cx2         2        3650  3746         0.974     
     cx1:cx2     2        3680  3746         0.982     
     sigma2      2        3710  3746         0.990     
    
    The Raftery diagnostics estimate that we would require about 3900 samples to reach the specified level of confidence in convergence. As we have 10,000 samples, we can be confident that convergence has occurred.
  • Autocorrelation diagnostic
    View autocorrelations
    library(MCMCpack)
    autocorr.diag(data.mcmcpack.mult)
    
            (Intercept)           cx1         cx2       cx1:cx2       sigma2
    Lag 0   1.000000000  1.0000000000  1.00000000  1.0000000000  1.000000000
    Lag 1   0.005947655  0.0005726118 -0.03747573 -0.0009129132  0.026538088
    Lag 5   0.004657780  0.0004847322  0.01289516 -0.0066712267 -0.001533080
    Lag 10 -0.014139624 -0.0055672420 -0.01006061  0.0089793972  0.006885994
    Lag 50 -0.005874175 -0.0037488027  0.01129924 -0.0125086637  0.018083774
    
    There is negligible autocorrelation even at a lag of 1, so the retained samples can be treated as effectively independent (a thinning sketch follows this list for situations where this is not the case).
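
Had substantial autocorrelation been evident, the retained samples could be thinned before summarization. A minimal sketch using coda's window() (the thinning interval of 10 is purely illustrative):

library(coda)
# retain every 10th sample to reduce autocorrelation (illustrative only)
data.mcmcpack.mult.thin = window(data.mcmcpack.mult, thin = 10)
autocorr.diag(data.mcmcpack.mult.thin)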

Again, prior to examining the summaries, we should have explored the convergence diagnostics. Rather than duplicate this for both additive and multiplicative models, we will only explore the multiplicative model.

library(coda)
data.mcmc = as.mcmc(data.r2jags.mult)
  • Trace plots
    plot(data.mcmc)
    
    plot of chunk tut7.3bJAGSTrace
    plot of chunk tut7.3bJAGSTrace
    Trace plots show no evidence that the chains have not reasonably traversed the entire multidimensional parameter space.

    When there are a lot of parameters, this can result in a very large number of traceplots. To focus on just certain parameters (such as the $\beta$s):

    preds <- c("beta0", "beta[1]", "beta[2]")
    plot(as.mcmc(data.r2jags.mult)[, preds])
    
    plot of chunk tut7.3bJAGSTrace1
  • Raftery diagnostic
    raftery.diag(data.mcmc)
    
    [[1]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                    
              Burn-in  Total Lower bound  Dependence
              (M)      (N)   (Nmin)       factor (I)
     beta0    20       35610 3746          9.51     
     beta[1]  20       36200 3746          9.66     
     beta[2]  20       38660 3746         10.30     
     beta[3]  20       37410 3746          9.99     
     deviance 20       38660 3746         10.30     
     sigma    20       37410 3746          9.99     
    
    
    [[2]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                    
              Burn-in  Total Lower bound  Dependence
              (M)      (N)   (Nmin)       factor (I)
     beta0    20       38030 3746         10.20     
     beta[1]  20       39950 3746         10.70     
     beta[2]  20       36800 3746          9.82     
     beta[3]  20       38030 3746         10.20     
     deviance 20       38030 3746         10.20     
     sigma    10       37410 3746          9.99     
    
    
    [[3]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                    
              Burn-in  Total Lower bound  Dependence
              (M)      (N)   (Nmin)       factor (I)
     beta0    20       37410 3746          9.99     
     beta[1]  20       37410 3746          9.99     
     beta[2]  20       38660 3746         10.30     
     beta[3]  20       38030 3746         10.20     
     deviance 20       37410 3746          9.99     
     sigma    20       36800 3746          9.82     
    
    The Raftery diagnostics for each chain estimate that in the order of 35,000 to 40,000 samples (dependence factors of around 10) would be required to reach the specified level of confidence in convergence. Since this exceeds the number of samples retained, running the chains for longer (or thinning less) would be prudent before placing full confidence in convergence.
  • Autocorrelation diagnostic
    autocorr.diag(data.mcmc)
    
                   beta0      beta[1]       beta[2]      beta[3]     deviance       sigma
    Lag 0    1.000000000  1.000000000  1.000000e+00 1.0000000000  1.000000000 1.000000000
    Lag 10  -0.003559426  0.008596973 -6.002651e-04 0.0009684002 -0.011764207 0.003113188
    Lag 50   0.007319792 -0.001489343 -1.080306e-02 0.0012955417 -0.008450594 0.005073925
    Lag 100 -0.015299928  0.001915914  5.393375e-05 0.0082757392  0.001525074 0.017887984
    Lag 500  0.002792247  0.015376731  8.762674e-03 0.0065778151 -0.009101738 0.010133970
    
    A lag (thinning interval) of 10 appears to be sufficient to avoid autocorrelation (see also the additional coda diagnostics sketched after this list).
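
Since data.mcmc is a three-chain mcmc.list, the Gelman-Rubin statistic and effective sample sizes can also be obtained directly via coda. A minimal sketch:

library(coda)
gelman.diag(data.mcmc)    # Rhat; values close to 1 indicate convergence
effectiveSize(data.mcmc)  # effective number of independent samples per parameter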

Again, prior to examining the summaries, we should have explored the convergence diagnostics. Rather than duplicate this for both additive and multiplicative models, we will only explore the multiplicative model. There are numerous ways of working with STAN model fits (for exploring diagnostics and summarization).

  • extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
  • use the numerous routines that come with the rstan package
  • use the routines that come with the bayesplot package
  • explore the diagnostics interactively via shinystan

We will explore all of these:
  • via coda
    • Traceplots
    • library(coda)
      s = as.array(data.rstan.mult)
      wch = grep("beta", dimnames(s)$parameters)
      s = s[, , wch]
      mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc))
      plot(mcmc)
      
      plot of chunk tut7.3bSTANcodaTraceplots
      Trace plots show no evidence that the chains have not reasonably traversed the entire multidimensional parameter space.
    • Autocorrelation
    • library(coda)
      s = as.array(data.rstan.mult)
      wch = grep("beta", dimnames(s)$parameters)
      s = s[, , wch]
      mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc))
      autocorr.diag(mcmc)
      
                  beta[1]     beta[2]      beta[3]        cbeta0
      Lag 0   1.000000000  1.00000000  1.000000000  1.0000000000
      Lag 1   0.006483455  0.02858605  0.033585670  0.0182224466
      Lag 5  -0.008761706  0.01281171 -0.013686626  0.0147104379
      Lag 10  0.005060245 -0.01900523 -0.015181308 -0.0212298669
      Lag 50 -0.028188168 -0.01532768  0.005026116  0.0007438316
      
      There is no evidence of autocorrelation amongst the retained MCMC samples (see also the As.mcmc.list() note at the end of this list).
  • via rstan
    • Traceplots
      stan_trace(data.rstan.mult)
      
      plot of chunk tut7.3bSTANTrace
      Trace plots show no evidence that the chains have not reasonably traversed the entire multidimensional parameter space.
    • Raftery diagnostic
      raftery.diag(data.rstan.mult)
      
      Quantile (q) = 0.025
      Accuracy (r) = +/- 0.005
      Probability (s) = 0.95 
      
      You need a sample size of at least 3746 with these values of q, r and s
      
      The Raftery diagnostic could not be computed here: each chain contains fewer than the minimum 3746 samples required for these values of q, r and s. This diagnostic is of limited use for the relatively short chains typically retained from NUTS samplers.
    • Autocorrelation diagnostic
      stan_ac(data.rstan.mult)
      
      plot of chunk tut7.3bSTANAuto
      A lag of 2 appears broadly sufficient to avoid autocorrelation (poor mixing).
    • Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05. If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this mean that the sampler was potentially slower than it could have been, more importantly, it could indicate that the sampler spent time sampling in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or overly vague priors that permit sampling in otherwise nonsense parameter space.
      stan_rhat(data.rstan.mult)
      
      plot of chunk tut7.3bSTANRhat
      In this instance, all rhat values are well below 1.05 (a good thing).
    • Another measure of sampling efficiency is effective sample size (ESS). ESS indicates the number (or proportion) of samples that the sampling algorithm deemed effective. The sampler rejects proposals on the basis of certain criteria and, when it does so, the previous sample value is retained. Hence, while the MCMC sampling chain may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
      stan_ess(data.rstan.mult)
      
      plot of chunk tut7.3bSTANess
      In this instance, most of the parameters have reasonably high effective sample sizes and thus there is likely to be a good range of values from which to estimate parameter properties.
  • via bayesplot
    • Trace plots and density plots
      library(bayesplot)
      mcmc_trace(as.matrix(data.rstan.mult), regex_pars = "beta|sigma")
      
      plot of chunk tut7.3bSTANMCMCTrace
      library(bayesplot)
      mcmc_combo(as.matrix(data.rstan.mult), regex_pars = "beta|sigma")
      
      plot of chunk tut7.3bSTANTrace1
      Trace plots show no evidence that the chains have not reasonably traversed the entire multidimensional parameter space.
    • Density plots
      library(bayesplot)
      mcmc_dens(as.matrix(data.rstan.mult), regex_pars = "beta|sigma")
      
      plot of chunk tut7.3bSTANdens
      Density plots suggest that either the mean or the median would be appropriate to describe the fixed effects posteriors, and the median is appropriate for the sigma posterior.
  • via shinystan
    library(shinystan)
    launch_shinystan(data.rstan.mult)
    
  • It is worth exploring the influence of our priors.
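
Returning briefly to the coda conversion used above: recent versions of rstan provide As.mcmc.list(), which avoids the manual array manipulation. A minimal sketch (the parameter names assume the model coded earlier):

library(rstan)
library(coda)
mcmc = As.mcmc.list(data.rstan.mult, pars = c("beta0", "beta", "sigma"))
plot(mcmc)
autocorr.diag(mcmc)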

Again, prior to examining the summaries, we should have explored the convergence diagnostics. Rather than duplicate this for both additive and multiplicative models, we will only explore the multiplicative model. There are numerous ways of working with STAN model fits (for exploring diagnostics and summarization).

  • extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
  • use the numerous routines that come with the rstan package
  • use the routines that come with the bayesplot package
  • explore the diagnostics interactively via shinystan

We will explore all of these:
  • via coda
    • Traceplots
    • library(coda)
      s = as.array(data.rstanarm.mult)
      mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc))
      plot(mcmc)
      
      plot of chunk tut7.3bRSTANARMcodaTraceplots
      Trace plots show no evidence that the chains have not reasonably traversed the entire multidimensional parameter space.
    • Autocorrelation
    • library(coda)
      s = as.array(data.rstanarm.mult)
      mcmc <- do.call(mcmc.list, plyr:::alply(s[, , -(length(s[1, 1, ]))], 2, as.mcmc))
      autocorr.diag(mcmc)
      
             (Intercept)         cx1          cx2      cx1:cx2
      Lag 0   1.00000000 1.000000000  1.000000000  1.000000000
      Lag 1   0.20527489 0.003647371  0.039860083 -0.020218444
      Lag 5  -0.04746531 0.008339269 -0.016665899 -0.024428764
      Lag 10  0.01484256 0.033294835  0.002029290  0.027973299
      Lag 50 -0.01554411 0.012833299 -0.005277095  0.002660752
      
      Apart from mild lag-1 autocorrelation in the intercept, there is little evidence of autocorrelation amongst the retained MCMC samples.
  • via rstan
    • Traceplots
      stan_trace(data.rstanarm.mult)
      
      plot of chunk tut7.3bRSTANARMTrace
      Trace plots show no evidence that the chains have not reasonably traversed the entire multidimensional parameter space.
    • Raftery diagnostic
      raftery.diag(data.rstanarm.mult)
      
      Quantile (q) = 0.025
      Accuracy (r) = +/- 0.005
      Probability (s) = 0.95 
      
      You need a sample size of at least 3746 with these values of q, r and s
      
      The Raftery diagnostic could not be computed here: each chain contains fewer than the minimum 3746 samples required for these values of q, r and s. This diagnostic is of limited use for the relatively short chains typically retained from NUTS samplers.
    • Autocorrelation diagnostic
      stan_ac(data.rstanarm.mult)
      
      plot of chunk tut7.3bRSTANARMAuto
      A lag of 2 appears broadly sufficient to avoid autocorrelation (poor mixing).
    • Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05. If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this mean that the sampler was potentially slower than it could have been, more importantly, it could indicate that the sampler spent time sampling in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or overly vague priors that permit sampling in otherwise nonsense parameter space.
      stan_rhat(data.rstanarm.mult)
      
      plot of chunk tut7.3bRSTANARMRhat
      In this instance, all rhat values are well below 1.05 (a good thing).
    • Another measure of sampling efficiency is effective sample size (ESS). ESS indicates the number (or proportion) of samples that the sampling algorithm deemed effective. The sampler rejects proposals on the basis of certain criteria and, when it does so, the previous sample value is retained. Hence, while the MCMC sampling chain may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
      stan_ess(data.rstanarm.mult)
      
      plot of chunk tut7.3bRSTANARMess
      In this instance, most of the parameters have reasonably high effective sample sizes and thus there is likely to be a good range of values from which to estimate parameter properties.
  • via bayesplot
    • Trace plots and density plots
      library(bayesplot)
      mcmc_trace(as.array(data.rstanarm.mult), regex_pars = "Intercept|x|sigma")
      
      plot of chunk tut7.3bRSTANARMMCMCTrace
      library(bayesplot)
      mcmc_combo(as.array(data.rstanarm.mult))
      
      plot of chunk tut7.3bRSTANARMTrace1
      Trace plots show no evidence that the chains have not reasonably traversed the entire multidimensional parameter space.
    • Density plots
      library(bayesplot)
      mcmc_dens(as.array(data.rstanarm.mult))
      
      plot of chunk tut7.3bRSTANARMdens
      Density plots suggest that either the mean or the median would be appropriate to describe the fixed effects posteriors, and the median is appropriate for the sigma posterior.
  • via rstanarm
    The rstanarm package provides additional posterior checks.
    • Posterior vs Prior - this compares the posterior estimate for each parameter against the associated prior. If the spread of the priors is small relative to the posterior, then it is likely that the priors are too influential. On the other hand, overly wide priors can lead to computational issues.
      library(rstanarm)
      posterior_vs_prior(data.rstanarm.mult, color_by = "vs", group_by = TRUE,
          facet_args = list(scales = "free_y"))
      
      Gradient evaluation took 6.5e-05 seconds
      1000 transitions using 10 leapfrog steps per transition would take 0.65 seconds.
      Adjust your expectations accordingly!
      
      
      
       Elapsed Time: 0.103395 seconds (Warm-up)
                     0.112204 seconds (Sampling)
                     0.215599 seconds (Total)
      
      
      Gradient evaluation took 1.4e-05 seconds
      1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds.
      Adjust your expectations accordingly!
      
      
      
       Elapsed Time: 0.155166 seconds (Warm-up)
                     0.139936 seconds (Sampling)
                     0.295102 seconds (Total)
      
      plot of chunk tut7.3bRSTANARMposterorvsprior
  • via shinystan
    library(shinystan)
    launch_shinystan(data.rstanarm.mult)
    
  • It is worth exploring the influence of our priors (a prior_summary() sketch follows this list).
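
One quick way to see exactly which priors rstanarm actually used (including any internal autoscaling) is prior_summary(). A minimal sketch:

library(rstanarm)
prior_summary(data.rstanarm.mult)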

Again, prior to examining the summaries, we should have explored the convergence diagnostics. Rather than duplicate this for both additive and multiplicative models, we will only explore the multiplicative model. There are numerous ways of working with STAN model fits (for exploring diagnostics and summarization).

  • extract the mcmc samples and convert them into a mcmc.list to leverage the various coda routines
  • use the numerous routines that come with the rstan package
  • use the routines that come with the bayesplot package
  • explore the diagnostics interactively via shinystan

We will explore all of these:
  • via coda
    • Traceplots
    • library(coda)
      mcmc = as.mcmc(data.brms.mult)
      plot(mcmc)
      
      plot of chunk tut7.3bBRMScodaTraceplots
      plot of chunk tut7.3bBRMScodaTraceplots
      Trace plots show no evidence that the chains have not reasonably traversed the entire multidimensional parameter space.
    • Autocorrelation
    • library(coda)
      mcmc = as.mcmc(data.brms.mult)
      autocorr.diag(mcmc)
      
      Error in ts(x, start = start(x), end = end(x), deltat = thin(x)): invalid time series parameters specified
      
      autocorr.diag() fails on the mcmc.list returned by as.mcmc() for this brms fit (apparently because the converted chains lack valid time-series attributes); autocorrelation is instead explored via stan_ac() on the underlying stanfit below.
  • via rstan
    • Traceplots
      stan_trace(data.brms.mult$fit)
      
      plot of chunk tut7.3bBRMSTrace
      Trace plots show no evidence that the chains have not reasonably traversed the entire multidimensional parameter space.
    • Raftery diagnostic
      raftery.diag(data.brms.mult)
      
      Quantile (q) = 0.025
      Accuracy (r) = +/- 0.005
      Probability (s) = 0.95 
      
      You need a sample size of at least 3746 with these values of q, r and s
      
      The Raftery diagnostic could not be computed here: each chain contains fewer than the minimum 3746 samples required for these values of q, r and s. This diagnostic is of limited use for the relatively short chains typically retained from NUTS samplers.
    • Autocorrelation diagnostic
      stan_ac(data.brms.mult$fit)
      
      plot of chunk tut7.3bBRMSAuto
      A lag of 2 appears broadly sufficient to avoid autocorrelation (poor mixing).
    • Rhat values. These values are a measure of sampling efficiency/effectiveness. Ideally, all values should be less than 1.05. If there are values of 1.05 or greater it suggests that the sampler was not very efficient or effective. Not only does this mean that the sampler was potentially slower than it could have been, more importantly, it could indicate that the sampler spent time sampling in a region of the likelihood that is less informative. Such a situation can arise from either a misspecified model or overly vague priors that permit sampling in otherwise nonsense parameter space.
      stan_rhat(data.brms.mult$fit)
      
      plot of chunk tut7.3bBRMSRhat
      In this instance, all rhat values are well below 1.05 (a good thing).
    • Another measure of sampling efficiency is effective sample size (ESS). ESS indicates the number (or proportion) of samples that the sampling algorithm deemed effective. The sampler rejects proposals on the basis of certain criteria and, when it does so, the previous sample value is retained. Hence, while the MCMC sampling chain may contain 1000 samples, if there are only 10 effective samples (1%), the estimated properties are not likely to be reliable.
      stan_ess(data.brms.mult$fit)
      
      plot of chunk tut7.3bBRMSess
      In this instance, most of the parameters have reasonably high effective sample sizes and thus there is likely to be a good range of values from which to estimate parameter properties.

Model validation

Model validation involves exploring the model diagnostics and fit to ensure that the model is broadly appropriate for the data. As such, exploration of the residuals should be routine.

For more complex models (those that contain multiple effects), it is also advisable to plot the residuals against each of the individual predictors. For sampling designs that involve sample collection over space or time, it is also a good idea to explore whether there are any temporal or spatial patterns in the residuals.

There are numerous situations (e.g. when applying specific variance-covariance structures to a model) where raw residuals do not reflect the interior workings of the model. Typically, this is because they do not take into account the variance-covariance matrix or assume a very simple variance-covariance matrix. Since the purpose of exploring residuals is to evaluate the model, for these cases, it is arguably better to draw conclusions based on standardized (or studentized) residuals.

Unfortunately, the definitions of standardized and studentized residuals appear to vary and the two terms get used interchangeably. I will adopt the following definitions:


  • Standardized residuals: the raw residuals divided by the true standard deviation of the residuals (which of course is rarely known).
  • Studentized residuals: the raw residuals divided by the standard deviation of the residuals. Note that externally studentized residuals are calculated by dividing the raw residuals by a unique standard deviation for each observation, obtained from regressions that leave each successive observation out.
  • Pearson residuals: the raw residuals divided by the standard deviation of the response variable.
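
A minimal sketch of these three flavours, assuming the vector of fitted values (fit) computed from the posterior medians as in the code that follows:

resid = data$y - fit         # raw residuals
sresid = resid/sd(resid)     # studentized residuals (per the definition above)
presid = resid/sd(data$y)    # Pearson residuals (scaled by the response standard deviation)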

The mark of a good model is being able to predict well. In an ideal world, we would have a sufficiently large sample size to permit us to hold a fraction (such as 25%) back, thereby allowing us to train the model on 75% of the data and then see how well the model can predict the withheld 25%. Unfortunately, such a luxury is still rare in ecology.

The next best option is to see how well the model can predict the observed data. Models tend to struggle most with the extremes of trends and have particular issues when the extremes approach logical boundaries (such as zero for count data and standard deviations). We can use the fitted model to generate random predicted observations and then explore some properties of these compared to the actual observed data.

Rather than duplicate this for both additive and multiplicative models, we will only explore the multiplicative model.

Residuals are not computed directly within MCMCpack. However, we can calculate them manually from the posteriors.

mcmc = as.data.frame(data.mcmcpack.mult)
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## get median parameter estimates
coefs = apply(mcmc[, 1:4], 2, median)
fit = as.vector(coefs %*% t(Xmat))
resid = data$y - fit
ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
plot of chunk tut7.3bMCMCpackresid

Residuals against predictors

mcmc = as.data.frame(data.mcmcpack.mult)
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## get median parameter estimates
coefs = apply(mcmc[, 1:4], 2, median)
fit = as.vector(coefs %*% t(Xmat))
resid = data$y - fit
newdata = newdata %>% cbind(fit, resid)
newdata.melt = newdata %>% gather(key = Pred, value = Value, cx1:cx2)
ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
plot of chunk tut7.3bMCMCpackresid1

And now for studentized residuals

mcmc = as.data.frame(data.mcmcpack.mult)
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## get median parameter estimates
coefs = apply(mcmc[, 1:4], 2, median)
fit = as.vector(coefs %*% t(Xmat))
resid = data$y - fit
sresid = resid/sd(resid)
ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
plot of chunk tut7.3bMCMCpackresid2

Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).

Let's see how well data simulated from the model reflect the raw data.

mcmc = as.matrix(data.mcmcpack.mult)
# generate a model matrix
Xmat = model.matrix(~cx1 * cx2, data)
## get median parameter estimates
coefs = mcmc[, 1:4]
fit = coefs %*% t(Xmat)
## draw samples from this model
yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data), fit[i,
    ], sqrt(mcmc[i, "sigma2"])))
ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep),
    fill = "Model"), alpha = 0.5) + geom_density(data = data,
    aes(x = y, fill = "Obs"), alpha = 0.5)
plot of chunk tut7.3bMCMCpackFit
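
As a complementary check, we could compute a simple posterior predictive p-value: the proportion of simulated data sets whose mean exceeds the observed mean (values near 0.5 suggest the model reproduces this aspect of the data well). A minimal sketch using the yRep matrix generated above:

# proportion of simulated data sets whose mean exceeds the observed mean
mean(apply(yRep, 2, mean) > mean(data$y))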

We can also explore the posteriors of each parameter.

library(bayesplot)
mcmc_intervals(as.matrix(data.mcmcpack.mult), regex_pars = "Intercept|cx|sigma")
plot of chunk tut7.3bMCMCpackArea
mcmc_areas(as.matrix(data.mcmcpack.mult), regex_pars = "Intercept|cx|sigma")
plot of chunk tut7.3bMCMCpackArea
Rather than duplicate this for both additive and multiplicative models, we will only explore the multiplicative model.

Residuals are not computed directly within JAGS. However, we can calculate them manually from the posteriors.

mcmc = data.r2jags.mult$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(beta0,
    contains("beta"), sigma) %>% as.matrix
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## get median parameter estimates
coefs = apply(mcmc[, 1:4], 2, median)
fit = as.vector(coefs %*% t(Xmat))
resid = data$y - fit
ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
plot of chunk tut7.3bJAGSresid

Residuals against predictors

mcmc = data.r2jags.mult$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(beta0,
    contains("beta"), sigma) %>% as.matrix
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## get median parameter estimates
coefs = apply(mcmc[, 1:4], 2, median)
fit = as.vector(coefs %*% t(Xmat))
resid = data$y - fit
newdata = newdata %>% cbind(fit, resid)
newdata.melt = newdata %>% gather(key = Pred, value = Value, cx1:cx2)
ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
plot of chunk tut7.3bJAGSresid1

And now for studentized residuals

mcmc = data.r2jags.mult$BUGSoutput$sims.matrix %>% as.data.frame %>% dplyr:::select(beta0,
    contains("beta"), sigma) %>% as.matrix
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## get median parameter estimates
coefs = apply(mcmc[, 1:4], 2, median)
fit = as.vector(coefs %*% t(Xmat))
resid = data$y - fit
sresid = resid/sd(resid)
ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
plot of chunk tut7.3bJAGSresid2

Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).

Let's see how well data simulated from the model reflect the raw data.

mcmc = data.r2jags.mult$BUGSoutput$sims.matrix %>% as.data.frame %>%
    dplyr:::select(beta0, contains("beta"), sigma) %>% as.matrix
# generate a model matrix
Xmat = model.matrix(~cx1 * cx2, data)
## get median parameter estimates
coefs = mcmc[, 1:4]
fit = coefs %*% t(Xmat)
## draw samples from this model
yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data), fit[i,
    ], mcmc[i, "sigma"]))
ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep),
    fill = "Model"), alpha = 0.5) + geom_density(data = data,
    aes(x = y, fill = "Obs"), alpha = 0.5)
plot of chunk tut7.3bJAGSFit
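
The bayesplot package offers a convenience function for exactly this kind of density overlay. A minimal sketch (ppc_dens_overlay() expects a draws-by-observations matrix, hence the transpose; only 100 draws are overlaid to keep the plot legible):

library(bayesplot)
ppc_dens_overlay(y = data$y, yrep = t(yRep)[1:100, ])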

We can also explore the posteriors of each parameter.

library(bayesplot)
mcmc_intervals(data.r2jags.mult$BUGSoutput$sims.matrix, regex_pars = "beta|sigma")
plot of chunk tut7.3bJAGSArea
mcmc_areas(data.r2jags.mult$BUGSoutput$sims.matrix, regex_pars = "beta|sigma")
plot of chunk tut7.3bJAGSArea
Rather than duplicate this for both additive and multiplicative models, we will only explore the multiplicative model.

Residuals are not computed directly within RSTAN. However, we can calculate them manually from the posteriors.

mcmc = as.data.frame(data.rstan.mult) %>% dplyr:::select(beta0, starts_with("beta"),
    sigma) %>% as.matrix
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## get median parameter estimates
coefs = apply(mcmc[, 1:4], 2, median)
fit = as.vector(coefs %*% t(Xmat))
resid = data$y - fit
ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
plot of chunk tut7.3bRSTANresid

Residuals against predictors

mcmc = as.data.frame(data.rstan.mult) %>% dplyr:::select(beta0, starts_with("beta"),
    sigma) %>% as.matrix
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## get median parameter estimates
coefs = apply(mcmc[, 1:4], 2, median)
fit = as.vector(coefs %*% t(Xmat))
resid = data$y - fit
newdata = newdata %>% cbind(fit, resid)
newdata.melt = newdata %>% gather(key = Pred, value = Value, cx1:cx2)
ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
plot of chunk tut7.3bRSTANresid1

And now for studentized residuals

mcmc = as.data.frame(data.rstan.mult) %>% dplyr:::select(beta0, starts_with("beta"),
    sigma) %>% as.matrix
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## get median parameter estimates
coefs = apply(mcmc[, 1:4], 2, median)
fit = as.vector(coefs %*% t(Xmat))
resid = data$y - fit
sresid = resid/sd(resid)
ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
plot of chunk tut7.3bRSTANresid2

Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).

Let's see how well data simulated from the model reflect the raw data.

mcmc = as.data.frame(data.rstan.mult) %>% dplyr:::select(beta0,
    starts_with("beta"), sigma) %>% as.matrix
# generate a model matrix
Xmat = model.matrix(~cx1 * cx2, data)
## get median parameter estimates
coefs = mcmc[, 1:4]
fit = coefs %*% t(Xmat)
## draw samples from this model
yRep = sapply(1:nrow(mcmc), function(i) rnorm(nrow(data), fit[i,
    ], mcmc[i, "sigma"]))
ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep),
    fill = "Model"), alpha = 0.5) + geom_density(data = data,
    aes(x = y, fill = "Obs"), alpha = 0.5)
plot of chunk tut7.3bRSTANFit

We can also explore the posteriors of each parameter.

library(bayesplot)
mcmc_intervals(as.matrix(data.rstan.mult), regex_pars = "beta|sigma")
plot of chunk tut7.3bRSTANArea
mcmc_areas(as.matrix(data.rstan.mult), regex_pars = "beta|sigma")
plot of chunk tut7.3bRSTANArea

Residuals can be computed directly within RSTANARM.

resid = resid(data.rstanarm.mult)
fit = fitted(data.rstanarm.mult)
ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
plot of chunk tut7.3bRSTANARMresid

Residuals against predictors

resid = resid(data.rstanarm.mult)
dat.melt = data %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cx1:cx2)
ggplot(dat.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
plot of chunk tut7.3bRSTANARMresid1

And now for studentized residuals

resid = resid(data.rstanarm.mult)
sresid = resid/sd(resid)
fit = fitted(data.rstanarm.mult)
ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
plot of chunk tut7.3bRSTANARMresid2

Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).

Let's compare draws (predictions) from the posterior (in blue) with the distributions of observed data (red) via violin plots. Violin plots are similar to boxplots except that they display more visual information about the density distribution.

y_pred = posterior_predict(data.rstanarm.mult)
newdata = data %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value",
    -y:-cx2)
newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val",
    cx1:cx2)
data.melt = data %>% gather(key = "Pred", value = "Pred_val",
    cx1:cx2)
ggplot(newdata.melt, aes(Value, x = Pred_val)) + geom_violin(color = "blue",
    fill = "blue", alpha = 0.5) + geom_violin(data = data.melt,
    aes(y = y, x = Pred_val), fill = "red", color = "red", alpha = 0.5) +
    facet_wrap(~Pred)
plot of chunk tut7.3bRSTANARMposteriorpredict
Conclusions - the posterior predictions match the observed data very well.
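
rstanarm also provides pp_check() as a one-line shortcut for these kinds of posterior predictive checks. A minimal sketch (the second call compares the observed mean against the distribution of simulated means):

library(rstanarm)
pp_check(data.rstanarm.mult)
pp_check(data.rstanarm.mult, plotfun = "stat", stat = "mean")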

Yet another way to approach validation is to explore trends in posteriors in the context of the observed data.

## Calculate the fitted values
newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE),
    max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0), data.frame(cx1 = 0,
    cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE),
        len = 100)))
fit = posterior_predict(data.rstanarm.mult, newdata = newdata)
newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
    conf.method = "HPDinterval"))
newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
    cx1:cx2) %>% filter(Value != 0)
data.melt = data %>% gather(key = "Pred", value = "Value", cx1:cx2)
ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = data.melt,
    aes(y = y)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X") + theme_classic() + facet_wrap(~Pred)
plot of chunk tut7.3bRSTANARMGraphicalSummariesPosteriors

Conclusions: the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals, as we are seeking intervals within which we can predict individual observations rather than means (a sketch of the corresponding intervals for the mean follows).
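
If credible intervals around the mean trend were required instead, posterior_linpred() could be substituted for posterior_predict(). A minimal sketch for the cx1 trend (holding cx2 at 0):

library(broom)
library(coda)
newdata.grid = data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = 0)
fit.mu = posterior_linpred(data.rstanarm.mult, newdata = newdata.grid)
newdata.grid = cbind(newdata.grid, tidyMCMC(as.mcmc(fit.mu), conf.int = TRUE,
    conf.method = "HPDinterval"))
head(newdata.grid)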

We can also explore the posteriors of each parameter.

library(bayesplot)
mcmc_intervals(as.matrix(data.rstanarm.mult), regex_pars = "Intercept|cx|sigma")
plot of chunk tut7.3bRSTANARMArea
mcmc_areas(as.matrix(data.rstanarm.mult), regex_pars = "Intercept|cx|sigma")
plot of chunk tut7.3bRSTANARMArea

Residuals can be computed directly within BRMS. By default, the residuals and fitted extractor functions in brms return summarized versions (means, SE and credibility intervals). We are only interested in the mean (Estimate) column.

resid = resid(data.brms.mult)[, "Estimate"]
fit = fitted(data.brms.mult)[, "Estimate"]
ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
plot of chunk tut7.3bBRMSresid

Residuals against predictors

resid = resid(data.brms.mult)[, "Estimate"]
dat.melt = data %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cx1:cx2)
ggplot(dat.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
plot of chunk tut7.3bBRMSresid1

And now for studentized residuals

resid = resid(data.brms.mult)[, "Estimate"]
sresid = resid/sd(resid)
fit = fitted(data.brms.mult)[, "Estimate"]
ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
plot of chunk tut7.3bBRMSresid2

Conclusions: for this simple model, the studentized residuals yield the same pattern as the raw residuals (or the Pearson residuals for that matter).

Let's compare draws (predictions) from the posterior (in blue) with the distributions of observed data (red) via violin plots. Violin plots are similar to boxplots except that they display more visual information about the density distribution.

y_pred = posterior_predict(data.brms.mult)
newdata = data %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value",
    -y:-cx2)
newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val",
    cx1:cx2)
data.melt = data %>% gather(key = "Pred", value = "Pred_val",
    cx1:cx2)
ggplot(newdata.melt, aes(Value, x = Pred_val)) + geom_violin(color = "blue",
    fill = "blue", alpha = 0.5) + geom_violin(data = data.melt,
    aes(y = y, x = Pred_val), fill = "red", color = "red", alpha = 0.5) +
    facet_wrap(~Pred)
plot of chunk tut7.3bBRMSposteriorpredict
Conclusions - the posterior predictions match the observed data very well.
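
brms likewise offers pp_check() for one-line posterior predictive checks. A minimal sketch (in the brms versions current at the time of writing, the nsamples argument limits the number of overlaid draws):

library(brms)
pp_check(data.brms.mult, nsamples = 100)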

Yet another way to approach validation is to explore trends in posteriors in the context of the observed data.

## Calculate the fitted values
newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE),
    max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0), data.frame(cx1 = 0,
    cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE),
        len = 100)))
fit = posterior_predict(data.brms.mult, newdata = newdata)
newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
    conf.method = "HPDinterval"))
newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
    cx1:cx2) %>% filter(Value != 0)
data.melt = data %>% gather(key = "Pred", value = "Value", cx1:cx2)
ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = data.melt,
    aes(y = y)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X") + theme_classic() + facet_wrap(~Pred)
plot of chunk tut7.3bBRMSGraphicalSummariesPosteriors

Conclusions: the predicted trends do encapsulate the actual data, suggesting that the model is a reasonable representation of the underlying processes. Note, these are prediction intervals rather than confidence intervals, as we are seeking intervals within which we can predict individual observations rather than means. The trends can also be generated more directly within brms (see the sketch below).
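
The fitted trends (and their credible intervals) can also be obtained directly from brms via marginal_effects() (renamed conditional_effects() in later brms versions). A minimal sketch:

library(brms)
plot(marginal_effects(data.brms.mult), points = TRUE)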

We can also explore the posteriors of each parameter.

library(bayesplot)
mcmc_intervals(as.matrix(data.brms.mult), regex_pars = "b_|sigma")
plot of chunk tut7.3bBRMSArea
mcmc_areas(as.matrix(data.brms.mult), regex_pars = "b_|sigma")
plot of chunk tut7.3bBRMSArea

Parameter estimates (posterior summaries)

Although all parameters in a Bayesian analysis are considered random and are described by a distribution, rarely would it be useful to present tables of all the samples from each distribution. On the other hand, plots of the posterior distributions do have some use. Nevertheless, most workers prefer to present simple statistical summaries of the posteriors. Popular choices include the median (or mean) and 95% credibility intervals.

library(coda)
mcmcpvalue <- function(samp) {
    ## elementary version that creates an empirical p-value for the
    ## hypothesis that the columns of samp have mean zero versus a general
    ## multivariate distribution with elliptical contours.

    ## differences from the mean standardized by the observed
    ## variance-covariance factor

    ## Note, I put in the bit for single terms
    if (length(dim(samp)) == 0) {
        std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - mean(samp),
            transpose = TRUE)
        sqdist <- colSums(std * std)
        sum(sqdist[-1] > sqdist[1])/length(samp)
    } else {
        std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - colMeans(samp),
            transpose = TRUE)
        sqdist <- colSums(std * std)
        sum(sqdist[-1] > sqdist[1])/nrow(samp)
    }

}
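
A complementary, and arguably more natural, Bayesian summary is the posterior probability that a coefficient exceeds zero, which can be read directly off the MCMC samples. A minimal sketch using the MCMCpack multiplicative fit from earlier (column names as in its summary below):

mcmc = as.matrix(data.mcmcpack.mult)
# posterior probability that each partial slope is greater than zero
apply(mcmc[, c("cx1", "cx2", "cx1:cx2")], 2, function(x) mean(x > 0))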

Additive model (MCMCpack)

summary(data.mcmcpack.add)
Iterations = 1001:11000
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 10000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

             Mean     SD Naive SE Time-series SE
(Intercept) 3.823 0.1140 0.001140       0.001145
cx1         3.011 0.5045 0.005045       0.005045
cx2         1.397 0.4354 0.004354       0.004419
sigma2      1.306 0.1921 0.001921       0.001968

2. Quantiles for each variable:

              2.5%   25%   50%   75% 97.5%
(Intercept) 3.6041 3.746 3.823 3.899 4.046
cx1         2.0282 2.671 3.018 3.345 4.010
cx2         0.5306 1.106 1.399 1.693 2.252
sigma2      0.9839 1.168 1.286 1.419 1.734
# OR
library(broom)
tidyMCMC(data.mcmcpack.add, conf.int = TRUE, conf.method = "HPDinterval")
         term estimate std.error  conf.low conf.high
1 (Intercept) 3.822627 0.1139509 3.6048428  4.046831
2         cx1 3.011318 0.5044755 2.0435576  4.015099
3         cx2 1.396664 0.4353955 0.5283247  2.245911
4      sigma2 1.305520 0.1920807 0.9492838  1.677171
Conclusions:
  • when cx2 is held constant, a one unit increase in cx1 is associated with a 3.011318 change in y. That is, y increases at a rate of 3.011318 per unit increase in cx1 when standardized for cx2.
  • when cx1 is held constant, a one unit increase in cx2 is associated with a 1.396664 change in y. That is, y increases at a rate of 1.396664 per unit increase in cx2 when standardized for cx1.
Note, as this is an additive model, the rates associated with cx1 are assumed to be constant throughout the range of cx2 and vice versa. The 95% confidence interval for each partial slope does not overlap with 0, implying significant effects of cx1 and cx2 on y.

While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.

## since values are less than zero
mcmcpvalue(data.mcmcpack.add[, 2])
[1] 0
mcmcpvalue(data.mcmcpack.add[, 3])
[1] 0.0019

With p-values of essentially 0, we would conclude that there is almost no evidence that either slope is equal to zero, suggesting that y is related to both cx1 and cx2.

Multiplicative model (MCMCpack)

summary(data.mcmcpack.mult)
Iterations = 1001:11000
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 10000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

             Mean     SD Naive SE Time-series SE
(Intercept) 3.671 0.1330 0.001330       0.001330
cx1         2.928 0.4994 0.004994       0.004994
cx2         1.342 0.4194 0.004194       0.004040
cx1:cx2     2.667 1.2398 0.012398       0.012398
sigma2      1.256 0.1843 0.001843       0.001877

2. Quantiles for each variable:

              2.5%   25%   50%   75% 97.5%
(Intercept) 3.4147 3.581 3.672 3.759 3.938
cx1         1.9555 2.597 2.929 3.260 3.910
cx2         0.4986 1.072 1.343 1.622 2.161
cx1:cx2     0.2512 1.828 2.669 3.495 5.080
sigma2      0.9460 1.125 1.238 1.369 1.660
# OR
library(broom)
tidyMCMC(data.mcmcpack.mult, conf.int = TRUE, conf.method = "HPDinterval")
         term estimate std.error  conf.low conf.high
1 (Intercept) 3.671103 0.1330200 3.4112189  3.933302
2         cx1 2.928409 0.4993563 1.9354791  3.878062
3         cx2 1.341735 0.4193631 0.4912293  2.150196
4     cx1:cx2 2.667311 1.2398248 0.3215486  5.133875
5      sigma2 1.256300 0.1843042 0.9336364  1.632568
  • at the average level of cx2 (=0), a one unit increase in cx1 is associated with a 2.9284093 change in y. That is, y increases at a rate of 2.9284093 per unit increase in cx1 when standardized for cx2.
  • at the average level of cx1 (=0), a one unit increase in cx2 is associated with a 1.3417354 change in y. That is, y increases at a rate of 1.3417354 per unit increase in cx2 when standardized for cx1.
  • the degree to which the rate of change in response associated with a one unit change in cx1 changes over the range of cx2 (and vice versa) is 2.6673106.

The 95% confidence interval for the interaction partial slope does not overlap with 0, implying a significant interaction between cx1 and cx2. This suggests that the nature of the relationship between y and cx1 depends on the level of cx2 (and vice versa). The estimates of the effect of cx1 are therefore only appropriate when cx2 = 0, etc.
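
To make this concrete, the posterior for the cx1 slope at particular (centered) values of cx2 can be derived directly from the MCMC samples. A minimal sketch, evaluating the slope at one standard deviation below and above the mean of cx2 (column names as in the MCMCpack output above):

library(broom)
library(coda)
mcmc = as.matrix(data.mcmcpack.mult)
cx2.vals = c(-sd(data$cx2), 0, sd(data$cx2))
# slope of y on cx1 at each value of cx2: beta_cx1 + beta_cx1:cx2 * cx2
slopes = sapply(cx2.vals, function(z) mcmc[, "cx1"] + mcmc[, "cx1:cx2"] * z)
colnames(slopes) = paste0("cx2 = ", round(cx2.vals, 2))
tidyMCMC(as.mcmc(slopes), conf.int = TRUE, conf.method = "HPDinterval")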

While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.

## since values are less than zero
mcmcpvalue(data.mcmcpack.mult[, 2])
[1] 0
mcmcpvalue(data.mcmcpack.mult[, 3])
[1] 0.0022
mcmcpvalue(data.mcmcpack.mult[, 4])
[1] 0.0311

With p-values of essentially 0 for the partial slopes and approximately 0.03 for the interaction, we would conclude that there is very little evidence that any of these terms are equal to zero, suggesting relationships with both predictors as well as an interaction.

Additive model (JAGS)

print(data.r2jags.add)
Inference for Bugs model at "5", fit using jags,
 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10
 n.sims = 15000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta[1]    3.028   0.501   2.037   2.692   3.032   3.365   4.009 1.001 15000
beta[2]    1.389   0.432   0.545   1.101   1.394   1.675   2.235 1.001 15000
beta0      3.823   0.115   3.597   3.745   3.822   3.901   4.048 1.001  8000
sigma      1.146   0.083   1.001   1.088   1.140   1.198   1.322 1.001 15000
deviance 309.526   2.883 305.925 307.427 308.888 310.918 316.903 1.002  3100

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 4.2 and DIC = 313.7
DIC is an estimate of expected predictive error (lower deviance is better).
# OR
library(broom)
tidyMCMC(as.mcmc(data.r2jags.add), conf.int = TRUE, conf.method = "HPDinterval")
      term   estimate  std.error    conf.low  conf.high
1    beta0   3.822567 0.11519912   3.6077546   4.057015
2  beta[1]   3.028129 0.50095089   2.0517612   4.018021
3  beta[2]   1.388584 0.43169505   0.5412069   2.229200
4 deviance 309.526490 2.88347247 305.5279931 315.076927
5    sigma   1.145686 0.08280216   0.9979548   1.318215
Conclusions:
  • when cx2 is held constant, a one unit increase in cx1 is associated with a 3.0281293 change in y. That is, y increases at a rate of 3.0281293 per unit increase in cx1 when standardized for cx2.
  • when cx1 is held constant, a one unit increase in cx2 is associated with a 1.388584 change in y. That is, y increases at a rate of 1.388584 per unit increase in cx2 when standardized for cx1.
Note, as this is an additive model, the rates associated with cx1 are assumed to be constant throughout the range of cx2 and vice versa. The 95% confidence interval for each partial slope does not overlap with 0, implying significant effects of cx1 and cx2 on y.

While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.

## since values are less than zero
mcmcpvalue(data.r2jags.add$BUGSoutput$sims.matrix[, "beta[1]"])
[1] 0
mcmcpvalue(data.r2jags.add$BUGSoutput$sims.matrix[, "beta[2]"])
[1] 0.001866667

With p-values of essentially 0, we would conclude that there is almost no evidence that either slope is equal to zero, suggesting that y is related to both cx1 and cx2.

Multiplicative model (JAGS)

print(data.r2jags.mult)
Inference for Bugs model at "5", fit using jags,
 3 chains, each with 53000 iterations (first 3000 discarded), n.thin = 10
 n.sims = 15000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta[1]    2.931   0.499   1.969   2.593   2.929   3.265   3.902 1.001 15000
beta[2]    1.344   0.426   0.508   1.053   1.343   1.631   2.181 1.001 15000
beta[3]    2.675   1.256   0.182   1.845   2.669   3.509   5.158 1.001 15000
beta0      3.671   0.134   3.413   3.580   3.671   3.760   3.932 1.001 12000
sigma      1.126   0.082   0.981   1.070   1.120   1.176   1.307 1.001  8000
deviance 305.828   3.280 301.532 303.441 305.137 307.457 314.045 1.001 15000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 5.4 and DIC = 311.2
DIC is an estimate of expected predictive error (lower deviance is better).
# OR
library(broom)
tidyMCMC(as.mcmc(data.r2jags.mult), conf.int = TRUE, conf.method = "HPDinterval")
      term   estimate  std.error    conf.low  conf.high
1    beta0   3.670651 0.13361307   3.4145748   3.933344
2  beta[1]   2.930546 0.49865289   1.9650810   3.893332
3  beta[2]   1.344297 0.42580446   0.5242358   2.194335
4  beta[3]   2.675152 1.25623900   0.1764777   5.139000
5 deviance 305.827579 3.27978099 301.0348960 312.277506
6    sigma   1.126052 0.08177241   0.9662253   1.287832
  • at the average level of cx2 (=0), a one unit increase in cx1 is associated with a 2.9305463 change in y. That is, y increases at a rate of 2.9305463 per unit increase in cx1 when standardized for cx2.
  • at the average level of cx1 (=0), a one unit increase in cx2 is associated with a 1.3442967 change in y. That is, y increases at a rate of 1.3442967 per unit increase in cx2 when standardized for cx1.
  • the degree to which the rate of change in response associated with a one unit change in cx1 changes over the range of cx2 (and vice versa) is 2.6751525.

The 95% confidence interval for the interaction partial slope does not overlap with 0, implying a significant interaction between cx1 and cx2. This suggests that the nature of the relationship between y and cx1 depends on the level of cx2 (and vice versa). The estimates of the effect of cx1 are therefore only appropriate when cx2 = 0, etc.

While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.

## since values are less than zero
mcmcpvalue(data.r2jags.mult$BUGSoutput$sims.matrix[, "beta[1]"])
[1] 0
mcmcpvalue(data.r2jags.mult$BUGSoutput$sims.matrix[, "beta[2]"])
[1] 0.001933333
mcmcpvalue(data.r2jags.mult$BUGSoutput$sims.matrix[, "beta[3]"])
[1] 0.0366

With p-values of essentially 0 for the partial slopes and approximately 0.04 for the interaction, we would conclude that there is very little evidence that any of these terms are equal to zero, suggesting relationships with both predictors as well as an interaction.

Additive model (RSTAN)

print(data.rstan.add, pars = c("beta0", "beta", "sigma"))
Inference for Stan model: 4e0a54cd22440d9847d7705bdb1ff803.
3 chains, each with iter=4000; warmup=1000; thin=3; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
beta0   3.83    0.00 0.12 3.59 3.75 3.83 3.91  4.06  2317    1
beta[1] 3.02    0.01 0.51 2.03 2.69 3.02 3.34  4.05  2747    1
beta[2] 1.38    0.01 0.43 0.52 1.10 1.39 1.68  2.22  2854    1
sigma   1.15    0.00 0.08 0.99 1.09 1.14 1.20  1.32  2806    1

Samples were drawn using NUTS(diag_e) at Thu Aug 17 15:32:58 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
# OR
library(broom)
tidyMCMC(data.rstan.add, conf.int = TRUE, conf.method = "HPDinterval",
    pars = c("beta0", "beta", "sigma"))
     term estimate  std.error  conf.low conf.high
1   beta0 3.826110 0.11727211 3.5953559  4.058548
2 beta[1] 3.022736 0.50695556 2.0124349  4.027372
3 beta[2] 1.384733 0.43249804 0.5363925  2.228436
4   sigma 1.145670 0.08415868 0.9808821  1.301693
Conclusions:
  • when cx2 is held constant, a one unit increase in cx1 is associated with a 3.022736 change in y. That is, y increases at a rate of 3.022736 per unit increase in cx1 when standardized for cx2.
  • when cx1 is held constant, a one unit increase in cx2 is associated with a 1.384733 change in y. That is, y increases at a rate of 1.384733 per unit increase in cx2 when standardized for cx1.
Note, as this is an additive model, the rates associated with cx1 are assumed to be constant throughout the range of cx2 and vice versa. The 95% confidence interval for each partial slope does not overlap with 0, implying significant effects of cx1 and cx2 on y.

While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.

## since values are less than zero
mcmcpvalue(as.matrix(data.rstan.add)[, "beta[1]"])
[1] 0
mcmcpvalue(as.matrix(data.rstan.add)[, "beta[2]"])
[1] 0.0006666667

With p-values of essentially 0, we would conclude that there is almost no evidence that either slope is equal to zero, suggesting that y is related to both cx1 and cx2.

Multiplicative model (RSTAN)

print(data.rstan.mult, pars = c("beta0", "beta", "sigma"))
Inference for Stan model: 4e0a54cd22440d9847d7705bdb1ff803.
3 chains, each with iter=4000; warmup=1000; thin=3; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

        mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
beta0   3.67    0.00 0.14 3.40 3.58 3.67 3.76  3.94  2515    1
beta[1] 2.94    0.01 0.51 1.96 2.60 2.94 3.29  3.95  2830    1
beta[2] 1.34    0.01 0.43 0.51 1.05 1.33 1.62  2.19  2755    1
beta[3] 2.70    0.02 1.24 0.28 1.88 2.70 3.52  5.17  2701    1
sigma   1.13    0.00 0.08 0.98 1.07 1.12 1.18  1.31  3000    1

Samples were drawn using NUTS(diag_e) at Wed Aug 16 16:06:53 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
# OR
library(broom)
tidyMCMC(data.rstan.mult, conf.int = TRUE, conf.method = "HPDinterval",
    pars = c("beta0", "beta", "sigma"))
     term estimate  std.error  conf.low conf.high
1   beta0 3.667709 0.13512216 3.3976772  3.923658
2 beta[1] 2.944087 0.50916320 1.9886013  3.974140
3 beta[2] 1.335580 0.43089574 0.4927849  2.175954
4 beta[3] 2.695545 1.23745247 0.2529844  5.066406
5   sigma 1.127368 0.08492246 0.9646372  1.292453
  • at the average level of cx2 (=0), a one unit increase in cx1 is associated with a 2.944087 change in y. That is, y increases at a rate of 2.944087 per unit increase in cx1 when standardized for cx2.
  • at the average level of cx1 (=0), a one unit increase in cx2 is associated with a 1.33558 change in y. That is, y increases at a rate of 1.33558 per unit increase in cx2 when standardized for cx1.
  • the degree to which the rate of change in response associated with a one unit change in cx1 changes over the range of cx2 (and vice versa) is 2.695545.

The 95% confidence interval for the interaction partial slope does not overlap with 0, implying a significant interaction between cx1 and cx2. This suggests that the nature of the relationship between y and cx1 depends on the level of cx2 (and vice versa). The estimates of the effect of cx1 are therefore only appropriate when cx2 = 0, etc.

While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.

## since values are less than zero
mcmcpvalue(as.matrix(data.rstan.mult)[, "beta[1]"])
[1] 0
mcmcpvalue(as.matrix(data.rstan.mult)[, "beta[2]"])
[1] 0.001666667
mcmcpvalue(as.matrix(data.rstan.mult)[, "beta[3]"])
[1] 0.032

With p-values of essentially 0 for the partial slopes and approximately 0.03 for the interaction, we would conclude that there is very little evidence that any of these terms are equal to zero, suggesting relationships with both predictors as well as an interaction.

An alternative way of quantifying the impact of an interaction is to compare models with and without the interaction. In a Bayesian context, this can be achieved by comparing leave-one-out cross-validation statistics. Leave-one-out (LOO) cross-validation explores how well a series of models can predict withheld values (Vehtari, Gelman, and Gabry, 2016b). The LOO Information Criterion (LOOIC) is analogous to the AIC except that the LOOIC takes priors into consideration, does not assume that the posterior distribution is drawn from a multivariate normal and integrates over parameter uncertainty so as to yield a distribution of LOOIC values rather than just a point estimate. The LOOIC does however assume that all observations are equally influential (it does not matter which observations are left out). This assumption can be examined via the Pareto k estimates (values greater than 0.5, or more conservatively 0.75, are considered overly influential).

library(loo)
(full = loo(extract_log_lik(data.rstan.mult)))
Computed from 3000 by 100 log-likelihood matrix

         Estimate   SE
elpd_loo   -155.3  6.2
p_loo         4.4  0.7
looic       310.5 12.4

All Pareto k estimates are good (k < 0.5)
See help('pareto-k-diagnostic') for details.
(reduced = loo(extract_log_lik(data.rstan.add)))
Computed from 3000 by 100 log-likelihood matrix

         Estimate   SE
elpd_loo   -156.8  6.9
p_loo         3.8  0.7
looic       313.6 13.7

All Pareto k estimates are good (k < 0.5)
See help('pareto-k-diagnostic') for details.
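
The Pareto k values themselves can also be tabulated and the indices of any overly influential observations extracted. A minimal sketch, assuming a recent version of the loo package (which exports pareto_k_table() and pareto_k_ids()):

pareto_k_table(full)  ## tally of k values by diagnostic category
pareto_k_ids(full, threshold = 0.7)  ## indices of overly influential observations (if any)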
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3)
plot(full, label_points = TRUE)
plot(reduced, label_points = TRUE)
plot of chunk tut7.3bRSTANloo
compare_models(full, reduced)
Error in discrete == discrete[1]: comparison of these types is not implemented
Conclusions: the expected out-of-sample predictive accuracy is very similar (slightly lower) for the additive model compared to the multiplicative model (model containing the interaction). This might be used to suggest that the inferential evidence for an interaction is low.
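
Note that compare_models() comes from rstanarm and expects loo objects derived from stanreg fits, hence the error above. For loo objects computed directly from a rstan fit (as here), the comparison function from the loo package itself can be used instead. A minimal sketch, assuming the loo objects created above (the function is compare() in older versions of loo and loo_compare() in loo >= 2.0):

loo::compare(full, reduced)
## or, in more recent versions of the loo package:
## loo_compare(full, reduced)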

Additive model (RSTANARM)

summary(data.rstanarm.add)
Model Info:

 function:  stan_glm
 family:    gaussian [identity]
 formula:   y ~ cx1 + cx2
 algorithm: sampling
 priors:    see help('prior_summary')
 sample:    2700 (posterior sample size)
 num obs:   100

Estimates:
                mean   sd     2.5%   25%    50%    75%    97.5%
(Intercept)      3.8    0.1    3.6    3.7    3.8    3.9    4.1 
cx1              3.0    0.5    2.0    2.7    3.0    3.4    4.0 
cx2              1.4    0.4    0.5    1.1    1.4    1.7    2.2 
sigma            1.1    0.1    1.0    1.1    1.1    1.2    1.3 
mean_PPD         3.8    0.2    3.5    3.7    3.8    3.9    4.1 
log-posterior -165.0    1.5 -168.8 -165.8 -164.6 -163.9 -163.2 

Diagnostics:
              mcse Rhat n_eff
(Intercept)   0.0  1.0  1591 
cx1           0.0  1.0  2700 
cx2           0.0  1.0  2700 
sigma         0.0  1.0  1641 
mean_PPD      0.0  1.0  2101 
log-posterior 0.0  1.0  1335 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# OR
library(broom)
tidyMCMC(data.rstanarm.add$stanfit, conf.int = TRUE, conf.method = "HPDinterval")
           term    estimate  std.error     conf.low   conf.high
1   (Intercept)    3.825270 0.11429998    3.6128940    4.066603
2           cx1    3.021411 0.50685407    1.9964692    3.978835
3           cx2    1.381486 0.43201840    0.5227742    2.206361
4         sigma    1.148550 0.08466719    0.9874055    1.319120
5      mean_PPD    3.823862 0.15921259    3.5193260    4.137062
6 log-posterior -164.988433 1.48402493 -167.9667057 -162.954149
Conclusions:
  • when cx2 is held constant, a one unit increase in cx1 is associated with a 3.021411 change in y. That is, y increases at a rate of 3.021411 per unit increase in cx1 when standardized for cx2.
  • when cx1 is held constant, a one unit increase in cx2 is associated with a 1.3814856 change in y. That is, y increases at a rate of 1.3814856 per unit increase in cx2 when standardized for cx1.
Note, as this is an additive model, the rates associated with cx1 are assumed to be constant throughout the range of cx2 and vice versa. The 95% confidence interval for each partial slope does not overlap with 0, implying significant effects of cx1 and cx2 on y.

While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.

## since values are less than zero
mcmcpvalue(as.matrix(data.rstanarm.add)[, "cx1"])
[1] 0
mcmcpvalue(as.matrix(data.rstanarm.add)[, "cx2"])
[1] 0.001481481

With p-values of essentially 0, we would conclude that there is very little evidence that either partial slope is equal to zero, suggesting a relationship with each predictor.

Multiplicative model (RSTANARM)

summary(data.rstanarm.mult)
Model Info:

 function:  stan_glm
 family:    gaussian [identity]
 formula:   y ~ cx1 * cx2
 algorithm: sampling
 priors:    see help('prior_summary')
 sample:    2700 (posterior sample size)
 num obs:   100

Estimates:
                mean   sd     2.5%   25%    50%    75%    97.5%
(Intercept)      3.7    0.1    3.4    3.6    3.7    3.8    3.9 
cx1              2.9    0.5    1.9    2.6    2.9    3.3    3.9 
cx2              1.3    0.4    0.5    1.1    1.3    1.6    2.2 
cx1:cx2          2.7    1.2    0.3    1.9    2.7    3.5    5.1 
sigma            1.1    0.1    1.0    1.1    1.1    1.2    1.3 
mean_PPD         3.8    0.2    3.5    3.7    3.8    3.9    4.1 
log-posterior -164.1    1.7 -168.2 -164.9 -163.7 -162.8 -161.9 

Diagnostics:
              mcse Rhat n_eff
(Intercept)   0.0  1.0  1784 
cx1           0.0  1.0  2661 
cx2           0.0  1.0  2474 
cx1:cx2       0.0  1.0  2700 
sigma         0.0  1.0  1134 
mean_PPD      0.0  1.0  2086 
log-posterior 0.0  1.0  1374 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# OR
library(broom)
tidyMCMC(data.rstanarm.mult$stanfit, conf.int = TRUE, conf.method = "HPDinterval")
           term    estimate  std.error     conf.low   conf.high
1   (Intercept)    3.668833 0.13766864    3.3846348    3.921644
2           cx1    2.933656 0.49365987    1.9342004    3.875018
3           cx2    1.335035 0.41932624    0.5170813    2.187972
4       cx1:cx2    2.703573 1.23876604    0.3719395    5.211188
5         sigma    1.128580 0.08523699    0.9605405    1.293866
6      mean_PPD    3.823219 0.15952737    3.5108288    4.130149
7 log-posterior -164.059918 1.68184862 -167.3759390 -161.738628
  • at the average level of cx2 (=0), a one unit increase in cx1 is associated with a 2.9336564 change in y. That is, y increases at a rate of 2.9336564 per unit increase in cx1 when standardized for cx2.
  • at the average level of cx1 (=0), a one unit increase in cx2 is associated with a 1.3350352 change in y. That is, y increases at a rate of 1.3350352 per unit increase in cx2 when standardized for cx1.
  • the degree to which the rate of change in response associated with a one unit change in cx1 changes over the range of cx2 (and vice versa) is 2.7035727.

The 95% confidence interval for the interaction partial slope does not overlap with 0, implying a significant interaction between cx1 and cx2. This suggests that the nature of the relationship between y and cx1 depends on the level of cx2 (and vice versa). Note that the estimates of the effect of cx1 are only appropriate when cx2 = 0 (and similarly, the effect of cx2 applies only when cx1 = 0).

While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.

## since values are less than zero
mcmcpvalue(as.matrix(data.rstanarm.mult)[, "cx1"])
[1] 0
mcmcpvalue(as.matrix(data.rstanarm.mult)[, "cx2"])
[1] 0.002222222
mcmcpvalue(as.matrix(data.rstanarm.mult)[, "cx1:cx2"])
[1] 0.03185185

With p-values of essentially 0 for the partial slopes and less than 0.05 for the interaction, we would conclude that there is very little evidence that these parameters are equal to zero, suggesting relationships are present, including the interaction.

Alternatively we can generate posterior intervals for each parameter.

posterior_interval(data.rstanarm.mult, prob = 0.95)
                 2.5%    97.5%
(Intercept) 3.3933953 3.940445
cx1         1.9391715 3.879278
cx2         0.5196027 2.194059
cx1:cx2     0.2548959 5.109150
sigma       0.9753235 1.314384
Conclusions: the 95% confidence intervals for the effects of cx1, cx2 and the interaction do not overlap with 0 implying significant effects.

An alternative way of quantifying the impact of an interaction is to compare models with and without the interaction. In a Bayesian context, this can be achieved by comparing the leave-one-out cross-validation statistics. Leave-one-out (LOO) cross-validation explores how well a series of models can predict withheld values (Vehtari, Gelman, and Gabry, 2016b). The LOO Information Criterion (LOOIC) is analogous to the AIC, except that the LOOIC takes priors into consideration, does not assume that the posterior distribution is drawn from a multivariate normal and integrates over parameter uncertainty so as to yield a distribution of LOOIC rather than just a point estimate. The LOOIC does however assume that all observations are equally influential (it does not matter which observations are left out). This assumption can be examined via the Pareto k estimate (values greater than 0.5, or more conservatively 0.75, are considered overly influential).

(full = loo(data.rstanarm.mult))
Computed from 2700 by 100 log-likelihood matrix

         Estimate   SE
elpd_loo   -155.3  6.2
p_loo         4.4  0.7
looic       310.5 12.4

All Pareto k estimates are good (k < 0.5)
See help('pareto-k-diagnostic') for details.
(reduced = loo(data.rstanarm.add))
Computed from 2700 by 100 log-likelihood matrix

         Estimate   SE
elpd_loo   -156.8  6.8
p_loo         3.8  0.7
looic       313.6 13.7

All Pareto k estimates are good (k < 0.5)
See help('pareto-k-diagnostic') for details.
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3)
plot(full, label_points = TRUE)
plot(reduced, label_points = TRUE)
plot of chunk tut7.3bRSTANARMloo
compare_models(full, reduced)
elpd_diff        se 
     -1.5       1.8 
Conclusions: the expected out-of-sample predictive accuracy is very similar (slightly lower) for the additive model compared to the multiplicative model (model containing the interaction); the elpd_diff of -1.5 (elpd of the reduced model minus that of the full model) is small relative to its standard error of 1.8. This might be used to suggest that the inferential evidence for an interaction is low.

Additive model (BRMS)

summary(data.brms.add)
 Family: gaussian(identity) 
Formula: y ~ cx1 + cx2 
   Data: data (Number of observations: 100) 
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; 
         total post-warmup samples = 2700
    ICs: LOO = Not computed; WAIC = Not computed
 
Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept     3.82      0.12     3.59     4.04       2700    1
cx1           3.04      0.52     2.04     4.08       2427    1
cx2           1.38      0.44     0.48     2.24       2463    1

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     1.15      0.08     0.99     1.33       2405    1

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
# OR
library(broom)
tidyMCMC(data.brms.add$fit, conf.int = TRUE, conf.method = "HPDinterval")
         term estimate  std.error  conf.low conf.high
1 b_Intercept 3.818513 0.11580124 3.6098266  4.062734
2       b_cx1 3.042654 0.51964636 2.0187823  4.042107
3       b_cx2 1.380825 0.44437714 0.5271996  2.258048
4       sigma 1.147234 0.08407413 0.9784836  1.302088
Conclusions:
  • when cx2 is held constant, a one unit increase in cx1 is associated with a 3.0426537 change in y. That is, y increases at a rate of 3.0426537 per unit increase in cx1 when standardized for cx2.
  • when cx1 is held constant, a one unit increase in cx2 is associated with a 1.3808245 change in y. That is, y increases at a rate of 1.3808245 per unit increase in cx2 when standardized for cx1.
Note, as this is an additive model, the rates associated with cx1 are assumed to be constant throughout the range of cx2 and vice versa. The 95% confidence interval for each partial slope does not overlap with 0, implying significant effects of cx1 and cx2 on y.

While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.

## since values are less than zero
mcmcpvalue(as.matrix(data.brms.add)[, "b_cx1"])
[1] 0
mcmcpvalue(as.matrix(data.brms.add)[, "b_cx2"])
[1] 0.002592593

With p-values of essentially 0, we would conclude that there is very little evidence that either partial slope is equal to zero, suggesting a relationship with each predictor.

Multiplicative model (BRMS)

summary(data.brms.mult)
 Family: gaussian(identity) 
Formula: y ~ cx1 * cx2 
   Data: data (Number of observations: 100) 
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; 
         total post-warmup samples = 2700
    ICs: LOO = Not computed; WAIC = Not computed
 
Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept     3.67      0.13     3.42     3.94       2534    1
cx1           2.92      0.49     1.94     3.91       2700    1
cx2           1.35      0.43     0.54     2.19       2528    1
cx1:cx2       2.66      1.27     0.11     5.16       2411    1

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     1.12      0.08     0.98      1.3       2585    1

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
# OR
library(broom)
tidyMCMC(data.brms.mult$fit, conf.int = TRUE, conf.method = "HPDinterval")
         term estimate  std.error   conf.low conf.high
1 b_Intercept 3.671568 0.13285306 3.43612342  3.943035
2       b_cx1 2.923696 0.49283728 1.89910831  3.857202
3       b_cx2 1.353976 0.42606897 0.59537098  2.231514
4   b_cx1:cx2 2.659722 1.26670106 0.04200689  5.067596
5       sigma 1.123292 0.08287744 0.97447809  1.290851
  • at the average level of cx2 (=0), a one unit increase in cx1 is associated with a 2.923696 change in y. That is, y increases at a rate of 2.923696 per unit increase in cx1 when standardized for cx2.
  • at the average level of cx1 (=0), a one unit increase in cx2 is associated with a 1.3539765 change in y. That is, y increases at a rate of 1.3539765 per unit increase in cx2 when standardized for cx1.
  • the degree to which the rate of change in response associated with a one unit change in cx1 changes over the range of cx2 (and vice versa) is 2.6597222.

The 95% confidence interval for the interaction partial slope does not overlap with 0, implying a significant interaction between cx1 and cx2. This suggests that the nature of the relationship between y and cx1 depends on the level of cx2 (and vice versa). Note that the estimates of the effect of cx1 are only appropriate when cx2 = 0 (and similarly, the effect of cx2 applies only when cx1 = 0).

While workers attempt to become comfortable with a new statistical framework, it is only natural that they like to evaluate and comprehend new structures and output alongside more familiar concepts. One way to facilitate this is via Bayesian p-values that are somewhat analogous to the frequentist p-values for investigating the hypothesis that a parameter is equal to zero.

## since values are less than zero
mcmcpvalue(as.matrix(data.brms.mult)[, "b_cx1"])
[1] 0
mcmcpvalue(as.matrix(data.brms.mult)[, "b_cx2"])
[1] 0.0007407407
mcmcpvalue(as.matrix(data.brms.mult)[, "b_cx1:cx2"])
[1] 0.04

With p-values of essentially 0 for the partial slopes and less than 0.05 for the interaction, we would conclude that there is very little evidence that these parameters are equal to zero, suggesting relationships are present, including the interaction.

Alternatively we can generate posterior intervals for each parameter.

posterior_interval(as.matrix(data.brms.mult), prob = 0.95)
                   2.5%      97.5%
b_Intercept   3.4249254   3.935567
b_cx1         1.9378137   3.907302
b_cx2         0.5370439   2.190452
b_cx1:cx2     0.1139071   5.162096
sigma         0.9772463   1.299092
lp__        -64.9204721 -58.831482
Conclusions: the 95% confidence intervals for the effects of cx1, cx2 and the interaction do not overlap with 0 implying significant effects.

An alternative way of quantifying the impact of an interaction is to compare models with and without the interaction. In a Bayesian context, this can be achieved by comparing the leave-one-out cross-validation statistics. Leave-one-out (LOO) cross-validation explores how well a series of models can predict withheld values (Vehtari, Gelman, and Gabry, 2016b). The LOO Information Criterion (LOOIC) is analogous to the AIC, except that the LOOIC takes priors into consideration, does not assume that the posterior distribution is drawn from a multivariate normal and integrates over parameter uncertainty so as to yield a distribution of LOOIC rather than just a point estimate. The LOOIC does however assume that all observations are equally influential (it does not matter which observations are left out). This assumption can be examined via the Pareto k estimate (values greater than 0.5, or more conservatively 0.75, are considered overly influential).

(full = loo(data.brms.mult))
  LOOIC    SE
 310.51 12.48
(reduced = loo(data.brms.add))
  LOOIC   SE
 313.62 13.7
par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3)
plot(full, label_points = TRUE)
plot(reduced, label_points = TRUE)
plot of chunk tut7.3bBRMSloo
compare_models(full, reduced)
Error in discrete == discrete[1]: comparison of these types is not implemented
Conclusions: the expected out-of-sample predictive accuracy is very similar (slightly lower) for the additive model compared to the multiplicative model (model containing the interaction). This might be used to suggest that the inferential evidence for an interaction is low.
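
As above, compare_models() comes from rstanarm, hence the error. For brms fits, a minimal sketch of the comparison, assuming a brms version contemporary with this tutorial (which provided compare_ic(); newer releases use loo_compare() instead):

compare_ic(full, reduced)
## or recompute and compare in a single call:
## loo(data.brms.mult, data.brms.add)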

Graphical summaries

A nice graphic is often a great accompaniment to a statistical analysis. Although there are no fixed assumptions associated with graphing (in contrast to statistical analyses), we often want the graphical summaries to reflect the associated statistical analyses. After all, the sample is just one perspective on the population(s). What we are more interested in is being able to estimate and depict likely population parameters/trends.

Thus, whilst we could easily provide a plot displaying the raw data along with simple measures of location and spread, arguably, we should use estimates that reflect the fitted model. In this case, it would be appropriate to plot the credibility intervals associated with the fitted trends.

Additive model (MCMCpack)

With appropriate use of model matrices and data wrangling, it is possible to produce a single prediction data set and use ggplot() syntax to generate a multi-panel figure.

mcmc = data.mcmcpack.add
## Calculate the fitted values
newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = 0, Pred = 1), data.frame(cx1 = 0,
    cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE),
        len = 100), Pred = 2))
Xmat = model.matrix(~cx1 + cx2, newdata)
coefs = mcmc[, 1:3]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>%
    mutate(x = dplyr:::recode(Pred, x1, x2))

ggplot(newdata, aes(y = estimate, x = x)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X") + theme_classic() + facet_wrap(~Pred)
plot of chunk tut7.3bMCMCpackGraphicalSummaries

We cannot simply add the raw data to this figure. The reason for this is that the trends represent the effect of one predictor holding the other variable constant. Therefore, the observations we represent on the figure must likewise be standardized. We can achieve this by adding the partial residuals to the figure. Partial residuals are the fitted values plus the residuals.

## Calculate partial residuals fitted values
fdata = rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1), data.frame(cx1 = 0,
    cx2 = data$cx2, Pred = 2))
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred, x1,
    x2))

ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid),
    color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high),
    fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + theme_classic() +
    facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" *
        .(Pred))) + theme(axis.title.x = element_blank(), strip.background = element_blank(),
    strip.placement = "outside")
plot of chunk tut7.3bMCMCpackGraphicalSummaries2

However, this method (whilst partially elegant) does become overly opaque if we need more extensive axis labels, since the x-axis labels are actually strip labels (which must largely be defined outside of the ggplot structure). The alternative is to simply produce each partial plot separately before arranging them together in the one figure.

mcmc = data.mcmcpack.add
## Calculate the fitted values
newdata = data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = 0)
Xmat = model.matrix(~cx1 + cx2, newdata)
coefs = mcmc[, 1:3]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
## Now the partial residuals
fdata = rdata = data.frame(cx1 = data$cx1, cx2 = 0)
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
g1 = ggplot(newdata, aes(y = estimate, x = x1)) + geom_point(data = rdata,
    aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X1") + theme_classic()

newdata = data.frame(cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2,
    na.rm = TRUE), len = 100), cx1 = 0)
Xmat = model.matrix(~cx1 + cx2, newdata)
coefs = mcmc[, 1:3]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
## Now the partial residuals
fdata = rdata = data.frame(cx1 = 0, cx2 = data$cx2)
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
g2 = ggplot(newdata, aes(y = estimate, x = x2)) + geom_point(data = rdata,
    aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X2") + theme_classic()

grid.arrange(g1, g2, ncol = 2)
plot of chunk tut7.3bMCMCpackGraphicalSummaries3

Multiplicative model (MCMCpack)

For the multiplicative model, we could elect to split the trends up so as to explore the effects of one predictor at several set levels of another predictor. In this example, we will explore the effects of x1 when x2 is equal to its mean in the original data as well as one and two standard deviations below and above this mean.

mcmc = data.mcmcpack.mult
## Calculate the fitted values
newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = mean(data$cx2) + sd(data$cx2) %*%
    -2:2)
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, 1:4]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>%
    mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma")))
## Partial residuals
fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) *
    -2:2)
fMat = rMat = model.matrix(~cx1 * cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
## Partition the partial residuals such that each x1 trend only includes
## x2 data that is within that range in the observed data
findNearest = function(x, y) {
    ff = fields:::rdist(x, y)
    apply(ff, 1, function(x) which(x == min(x)))
}
fn = findNearest(x = data[, c("x1", "x2")], y = rdata[, c("x1", "x2")])
rdata = rdata[fn, ] %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2,
    -1, 0, 1, 2), "*sigma")))
ggplot(newdata, aes(y = estimate, x = x1)) + geom_line() + geom_blank(aes(y = 9)) +
    geom_point(data = rdata, aes(y = partial.resid), color = "grey") +
    geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
        alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") +
    facet_wrap(~x2, labeller = label_parsed, nrow = 1, scales = "free_y") +
    theme_classic() + theme(strip.background = element_blank())
plot of chunk tut7.3bMCMCpackGraphicalSummaries.mult

Alternatively, we could explore the interaction by plotting a two dimensional surface as a heat map. In the example below, relative confidence (upper - lower) is indicated via contours.

mcmc = data.mcmcpack.mult
## Calculate the fitted values
newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2,
    na.rm = TRUE), len = 100))
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, 1:4]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))

## Partial residuals
fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) *
    -2:2)
fMat = rMat = model.matrix(~cx1 * cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)

ggplot(newdata, aes(y = x1, x = x2)) + geom_tile(aes(fill = estimate)) +
    geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("Y",
    colors = heat.colors(10)) + geom_point(data = data, aes(size = y)) +
    scale_y_continuous("X1") + scale_x_continuous("X2") + theme_classic()
plot of chunk tut7.3bMCMCpackGraphicalSummaries.mult2

Additive model (JAGS)

With appropriate use of model matrices and data wrangling, it is possible to produce a single prediction data set and use ggplot() syntax to generate a multi-panel figure.

mcmc = data.r2jags.add$BUGSoutput$sims.matrix
## Calculate the fitted values
newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = 0, Pred = 1), data.frame(cx1 = 0,
    cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE),
        len = 100), Pred = 2))
Xmat = model.matrix(~cx1 + cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>%
    mutate(x = dplyr:::recode(Pred, x1, x2))

ggplot(newdata, aes(y = estimate, x = x)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X") + theme_classic() + facet_wrap(~Pred)
plot of chunk tut7.3bJAGSGraphicalSummaries

We cannot simply add the raw data to this figure. The reason for this is that the trends represent the effect of one predictor holding the other variable constant. Therefore, the observations we represent on the figure must likewise be standardized. We can achieve this by adding the partial residuals to the figure. Partial residuals are the fitted values plus the residuals.

## Calculate partial residuals fitted values
fdata = rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1), data.frame(cx1 = 0,
    cx2 = data$cx2, Pred = 2))
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred, x1,
    x2))

ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid),
    color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high),
    fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + theme_classic() +
    facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" *
        .(Pred))) + theme(axis.title.x = element_blank(), strip.background = element_blank(),
    strip.placement = "outside")
plot of chunk tut7.3bJAGSGraphicalSummaries2

However, this method (whilst partially elegant) does become overly opaque if we need more extensive axis labels, since the x-axis labels are actually strip labels (which must largely be defined outside of the ggplot structure). The alternative is to simply produce each partial plot separately before arranging them together in the one figure.

mcmc = data.r2jags.add$BUGSoutput$sims.matrix
## Calculate the fitted values
newdata = data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = 0)
Xmat = model.matrix(~cx1 + cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
## Now the partial residuals
fdata = rdata = data.frame(cx1 = data$cx1, cx2 = 0)
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
g1 = ggplot(newdata, aes(y = estimate, x = x1)) + geom_point(data = rdata,
    aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X1") + theme_classic()

newdata = data.frame(cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2,
    na.rm = TRUE), len = 100), cx1 = 0)
Xmat = model.matrix(~cx1 + cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
## Now the partial residuals
fdata = rdata = data.frame(cx1 = 0, cx2 = data$cx2)
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
g2 = ggplot(newdata, aes(y = estimate, x = x2)) + geom_point(data = rdata,
    aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X2") + theme_classic()

grid.arrange(g1, g2, ncol = 2)
plot of chunk tut7.3bJAGSGraphicalSummaries3

Multiplicative model (JAGS)

For the multiplicative model, we could elect to split the trends up so as to explore the effects of one predictor at several set levels of another predictor. In this example, we will explore the effects of x1 when x2 is equal to its mean in the original data as well as one and two standard deviations below and above this mean.

mcmc = data.r2jags.mult$BUGSoutput$sims.matrix
## Calculate the fitted values
newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = mean(data$cx2) + sd(data$cx2) %*%
    -2:2)
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>%
    mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma")))
## Partial residuals
fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) *
    -2:2)
fMat = rMat = model.matrix(~cx1 * cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
## Partition the partial residuals such that each x1 trend only includes
## x2 data that is within that range in the observed data
findNearest = function(x, y) {
    ff = fields:::rdist(x, y)
    apply(ff, 1, function(x) which(x == min(x)))
}
fn = findNearest(x = data[, c("x1", "x2")], y = rdata[, c("x1", "x2")])
rdata = rdata[fn, ] %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2,
    -1, 0, 1, 2), "*sigma")))
ggplot(newdata, aes(y = estimate, x = x1)) + geom_line() + geom_blank(aes(y = 9)) +
    geom_point(data = rdata, aes(y = partial.resid), color = "grey") +
    geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
        alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") +
    facet_wrap(~x2, labeller = label_parsed, nrow = 1, scales = "free_y") +
    theme_classic() + theme(strip.background = element_blank())
plot of chunk tut7.3bJAGSGraphicalSummaries.mult

Alternatively, we could explore the interaction by plotting a two dimensional surface as a heat map. In the example below, relative confidence (upper - lower) is indicated via contours.

mcmc = data.r2jags.mult$BUGSoutput$sims.matrix
## Calculate the fitted values
newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2,
    na.rm = TRUE), len = 100))
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))

## Partial residuals
fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) *
    -2:2)
fMat = rMat = model.matrix(~cx1 * cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)

ggplot(newdata, aes(y = x1, x = x2)) + geom_tile(aes(fill = estimate)) +
    geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("Y",
    colors = heat.colors(10)) + geom_point(data = data, aes(size = y)) +
    scale_y_continuous("X1") + scale_x_continuous("X2") + theme_classic()
plot of chunk tut7.3bJAGSGraphicalSummaries.mult2

Additive model (RSTAN)

With appropriate use of model matrices and data wrangling, it is possible to produce a single prediction data set and use ggplot() syntax to generate a multi-panel figure.

mcmc = as.matrix(data.rstan.add)
## Calculate the fitted values
newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = 0, Pred = 1), data.frame(cx1 = 0,
    cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE),
        len = 100), Pred = 2))
Xmat = model.matrix(~cx1 + cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>%
    mutate(x = dplyr:::recode(Pred, x1, x2))

ggplot(newdata, aes(y = estimate, x = x)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X") + theme_classic() + facet_wrap(~Pred)
plot of chunk tut7.3bRSTANGraphicalSummaries

We cannot simply add the raw data to this figure. The reason for this is that the trends represent the effect of one predictor holding the other variable constant. Therefore, the observations we represent on the figure must likewise be standardized. We can achieve this by adding the partial residuals to the figure. Partial residuals are the fitted values plus the residuals.

## Calculate partial residuals fitted values
fdata = rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1), data.frame(cx1 = 0,
    cx2 = data$cx2, Pred = 2))
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred, x1,
    x2))

ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid),
    color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high),
    fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + theme_classic() +
    facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" *
        .(Pred))) + theme(axis.title.x = element_blank(), strip.background = element_blank(),
    strip.placement = "outside")
plot of chunk tut7.3bRSTANGraphicalSummaries2

However, this method (whilst partially elegant) does become overly opaque if we need more extensive axis labels, since the x-axis labels are actually strip labels (which must largely be defined outside of the ggplot structure). The alternative is to simply produce each partial plot separately before arranging them together in the one figure.

mcmc = as.matrix(data.rstan.add)
## Calculate the fitted values
newdata = data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = 0)
Xmat = model.matrix(~cx1 + cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
## Now the partial residuals
fdata = rdata = data.frame(cx1 = data$cx1, cx2 = 0)
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
g1 = ggplot(newdata, aes(y = estimate, x = x1)) + geom_point(data = rdata,
    aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X1") + theme_classic()

newdata = data.frame(cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2,
    na.rm = TRUE), len = 100), cx1 = 0)
Xmat = model.matrix(~cx1 + cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
## Now the partial residuals
fdata = rdata = data.frame(cx1 = 0, cx2 = data$cx2)
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
g2 = ggplot(newdata, aes(y = estimate, x = x2)) + geom_point(data = rdata,
    aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X2") + theme_classic()

grid.arrange(g1, g2, ncol = 2)
plot of chunk tut7.3bRSTANGraphicalSummaries3

Multiplicative model (RSTAN)

For the multiplicative model, we could elect to split the trends up so as to explore the effects of one predictor at several set levels of another predictor. In this example, we will explore the effects of x1 when x2 is equal to its mean in the original data as well as one and two standard deviations below and above this mean.

mcmc = as.matrix(data.rstan.mult)
## Calculate the fitted values
newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = mean(data$cx2) + sd(data$cx2) %*%
    -2:2)
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>%
    mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma")))
## Partial residuals
fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) *
    -2:2)
fMat = rMat = model.matrix(~cx1 * cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
## Partition the partial residuals such that each x1 trend only includes
## x2 data that is within that range in the observed data
findNearest = function(x, y) {
    ff = fields:::rdist(x, y)
    apply(ff, 1, function(x) which(x == min(x)))
}
fn = findNearest(x = data[, c("x1", "x2")], y = rdata[, c("x1", "x2")])
rdata = rdata[fn, ] %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2,
    -1, 0, 1, 2), "*sigma")))
ggplot(newdata, aes(y = estimate, x = x1)) + geom_line() + geom_blank(aes(y = 9)) +
    geom_point(data = rdata, aes(y = partial.resid), color = "grey") +
    geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
        alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") +
    facet_wrap(~x2, labeller = label_parsed, nrow = 1, scales = "free_y") +
    theme_classic() + theme(strip.background = element_blank())
plot of chunk tut7.3bRSTANGraphicalSummaries.mult

Alternatively, we could explore the interaction by plotting a two dimensional surface as a heat map. In the example below, relative confidence (upper - lower) is indicated via contours.

mcmc = as.matrix(data.rstan.mult)
## Calculate the fitted values
newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2,
    na.rm = TRUE), len = 100))
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))

## Partial residuals
fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) *
    -2:2)
fMat = rMat = model.matrix(~cx1 * cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)

ggplot(newdata, aes(y = x1, x = x2)) + geom_tile(aes(fill = estimate)) +
    geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("Y",
    colors = heat.colors(10)) + geom_point(data = data, aes(size = y)) +
    scale_y_continuous("X1") + scale_x_continuous("X2") + theme_classic()
plot of chunk tut7.3bRSTANGraphicalSummaries.mult2

Additive model (RSTANARM)

Although we could calculate the fitted values via matrix multiplication of the coefficients and the model matrix (as for MCMCpack, RJAGS and RSTAN), for more complex models, it is more convenient to use the posterior_linpred function that comes with rstantools.
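
As a minimal illustration of what posterior_linpred() returns (assuming the additive rstanarm fit above), the result is a matrix with one row per posterior draw and one column per row of newdata:

pp = posterior_linpred(data.rstanarm.add, newdata = data.frame(cx1 = 0, cx2 = 0))
dim(pp)  ## number of draws x 1 prediction point
median(pp[, 1])  ## posterior median of the linear predictor at cx1 = cx2 = 0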

## Calculate the fitted values
newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE),
    max(data$cx1, na.rm = TRUE), len = 100), cx2 = 0, Pred = 1),
    data.frame(cx1 = 0, cx2 = seq(min(data$cx2, na.rm = TRUE),
        max(data$cx2, na.rm = TRUE), len = 100), Pred = 2))
fit = posterior_linpred(data.rstanarm.add, newdata = newdata)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) %>%
    mutate(x = dplyr:::recode(Pred, x1, x2))
## Partial residual
rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1),
    data.frame(cx1 = 0, cx2 = data$cx2, Pred = 2))
pp = posterior_linpred(data.rstanarm.add, newdata = rdata)
fit = as.vector(apply(pp, 2, median))
resid = resid(data.rstanarm.add)
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred,
    x1, x2))

ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata,
    aes(y = partial.resid), color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    theme_classic() + facet_wrap(~Pred, strip.position = "bottom",
    labeller = label_bquote("x" * .(Pred)), scales = "free") +
    theme(axis.title.x = element_blank(), strip.background = element_blank(),
        strip.placement = "outside")

Alternatively, we can compute the fitted values and partial residuals via matrix multiplication of the coefficients and the model matrix (as for MCMCpack, RJAGS and RSTAN). Partial residuals are the fitted values plus the residuals. In this simple case, that equates to exactly the same as the raw observations since $$resid = obs - fitted$$ and the fitted values depend only on the single predictor we are interested in.

mcmc = as.matrix(data.rstanarm.add)
newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = 0, Pred = 1), data.frame(cx1 = 0,
    cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE),
        len = 100), Pred = 2))
Xmat = model.matrix(~cx1 + cx2, data = newdata)
coefs = mcmc[, c("(Intercept)", "cx1", "cx2")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) %>%
    mutate(x = dplyr:::recode(Pred, x1, x2))

## Calculate partial residuals fitted values
fdata = rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1), data.frame(cx1 = 0,
    cx2 = data$cx2, Pred = 2))
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred, x1,
    x2))

ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid),
    color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high),
    fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + theme_classic() +
    facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" *
        .(Pred)), scales = "free") + theme(axis.title.x = element_blank(),
    strip.background = element_blank(), strip.placement = "outside")
plot of chunk tut7.3bRSTANARMGraphicalSummaries2

However, this method (whilst partially elegant) does become overly opaque if we need more extensive axis labels, since the x-axis labels are actually strip labels (which must largely be defined outside of the ggplot structure). The alternative is to simply produce each partial plot separately before arranging them together in the one figure.

mcmc = as.matrix(data.rstanarm.add)
coefs = mcmc[, c("(Intercept)", "cx1", "cx2")]
## Calculate the fitted values
newdata = data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = 0)
Xmat = model.matrix(~cx1 + cx2, newdata)
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
## Now the partial residuals
fdata = rdata = data.frame(cx1 = data$cx1, cx2 = 0)
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
g1 = ggplot(newdata, aes(y = estimate, x = x1)) + geom_point(data = rdata,
    aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X1") + theme_classic()

newdata = data.frame(cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2,
    na.rm = TRUE), len = 100), cx1 = 0)
Xmat = model.matrix(~cx1 + cx2, newdata)
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
## Now the partial residuals
fdata = rdata = data.frame(cx1 = 0, cx2 = data$cx2)
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
g2 = ggplot(newdata, aes(y = estimate, x = x2)) + geom_point(data = rdata,
    aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X2") + theme_classic()

grid.arrange(g1, g2, ncol = 2)
plot of chunk tut7.3bRSTANARMGraphicalSummaries3

Multiplicative model (RSTANARM)

For the multiplicative model, we could elect to split the trends up so as to explore the effects of one predictor at several set levels of another predictor. In this example, we will explore the effects of x1 when x2 is equal to its mean in the original data as well as one and two standard deviations below and above this mean.

mcmc = as.matrix(data.rstanarm.mult)
## Calculate the fitted values
newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = mean(data$cx2) + sd(data$cx2) %*%
    -2:2)
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>%
    mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma")))
## Partial residuals
fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) *
    -2:2)
fMat = rMat = model.matrix(~cx1 * cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
## Partition the partial residuals such that each x1 trend only includes
## x2 data that is within that range in the observed data
findNearest = function(x, y) {
    ff = fields:::rdist(x, y)
    apply(ff, 1, function(x) which(x == min(x)))
}
fn = findNearest(x = data[, c("x1", "x2")], y = rdata[, c("x1", "x2")])
rdata = rdata[fn, ] %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2,
    -1, 0, 1, 2), "*sigma")))
ggplot(newdata, aes(y = estimate, x = x1)) + geom_line() + geom_blank(aes(y = 9)) +
    geom_point(data = rdata, aes(y = partial.resid), color = "grey") +
    geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
        alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") +
    facet_wrap(~x2, labeller = label_parsed, nrow = 1, scales = "free_y") +
    theme_classic() + theme(strip.background = element_blank())
plot of chunk tut7.3bRSTANARMGraphicalSummaries.mult

Alternatively, we could explore the interaction by plotting a two dimensional surface as a heat map. In the example below, relative confidence (upper - lower) is indicated via contours.

mcmc = as.matrix(data.rstanarm.mult)
## Calculate the fitted values
newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2,
    na.rm = TRUE), len = 100))
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))

## Partial residuals
fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) *
    -2:2)
fMat = rMat = model.matrix(~cx1 * cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)

ggplot(newdata, aes(y = x1, x = x2)) + geom_tile(aes(fill = estimate)) +
    geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("Y",
    colors = heat.colors(10)) + geom_point(data = data, aes(size = y)) +
    scale_y_continuous("X1") + scale_x_continuous("X2") + theme_classic()
plot of chunk tut7.3bRSTANARMGraphicalSummaries.mult2

Additive model (BRMS)

Although we could calculate the fitted values via matrix multiplication of the coefficients and the model matrix (as for MCMCpack, RJAGS and RSTAN), for more complex models, it is more convenient to use the marginal_effects function that comes with brms.

plot(marginal_effects(data.brms.add), points = TRUE)
plot of chunk tut7.3bBRMSGraphicalSummaries.add
plot of chunk tut7.3bBRMSGraphicalSummaries.add
# OR
eff = plot(marginal_effects(data.brms.add), points = TRUE, plot = FALSE)
do.call("grid.arrange", c(eff, nrow = 1))
plot of chunk tut7.3bBRMSGraphicalSummaries.add
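
If only a single partial effect is required, marginal_effects() can be restricted via its effects argument. A minimal sketch, assuming the same additive brms fit:

plot(marginal_effects(data.brms.add, effects = "cx1"), points = TRUE)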

This is a great way of producing a quick plot. However, notice that the x axes are still on the scale of the centered predictors. A more general solution would be to add the partial residuals to the figure. Partial residuals are the fitted values plus the residuals. In this simple case, that equates to exactly the same as the raw observations since $$resid = obs - fitted$$ and the fitted values depend only on the single predictor we are interested in.

mcmc = as.matrix(data.brms.add)
newdata = rbind(data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = 0, Pred = 1), data.frame(cx1 = 0,
    cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2, na.rm = TRUE),
        len = 100), Pred = 2))
Xmat = model.matrix(~cx1 + cx2, data = newdata)
coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE, conf.method = "HPDinterval")) %>%
    mutate(x = dplyr:::recode(Pred, x1, x2))

## Calculate partial residuals fitted values
fdata = rdata = rbind(data.frame(cx1 = data$cx1, cx2 = 0, Pred = 1), data.frame(cx1 = 0,
    cx2 = data$cx2, Pred = 2))
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2) %>% mutate(x = dplyr:::recode(Pred, x1,
    x2))

ggplot(newdata, aes(y = estimate, x = x)) + geom_point(data = rdata, aes(y = partial.resid),
    color = "gray") + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high),
    fill = "blue", alpha = 0.3) + scale_y_continuous("Y") + theme_classic() +
    facet_wrap(~Pred, strip.position = "bottom", labeller = label_bquote("x" *
        .(Pred)), scales = "free") + theme(axis.title.x = element_blank(),
    strip.background = element_blank(), strip.placement = "outside")
plot of chunk tut7.3bBRMSGraphicalSummaries2
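Because the fitted values and residuals above are derived from exactly the same model matrix (fMat and rMat are identical), the partial residuals in this additive model are simply the raw observations recycled once per predictor panel. A quick sanity check (which should return TRUE):

all.equal(rdata$partial.resid, rep(data$y, 2))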

However, this method (whilst partially elegant) does become overly opaque if we need more extensive axis labels, since the x-axis labels are actually strip labels (which must largely be defined outside of the ggplot structure). The alternative is to simply produce each partial plot separately before arranging them together in the one figure.

mcmc = as.matrix(data.brms.add)
coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2")]
## Calculate the fitted values
newdata = data.frame(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = 0)
Xmat = model.matrix(~cx1 + cx2, newdata)
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
## Now the partial residuals
fdata = rdata = data.frame(cx1 = data$cx1, cx2 = 0)
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
g1 = ggplot(newdata, aes(y = estimate, x = x1)) + geom_point(data = rdata,
    aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X1") + theme_classic()

newdata = data.frame(cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2,
    na.rm = TRUE), len = 100), cx1 = 0)
Xmat = model.matrix(~cx1 + cx2, newdata)
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
## Now the partial residuals
fdata = rdata = data.frame(cx1 = 0, cx2 = data$cx2)
fMat = rMat = model.matrix(~cx1 + cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
g2 = ggplot(newdata, aes(y = estimate, x = x2)) + geom_point(data = rdata,
    aes(y = partial.resid), color = "grey") + geom_line() + geom_ribbon(aes(ymin = conf.low,
    ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Y") +
    scale_x_continuous("X2") + theme_classic()

grid.arrange(g1, g2, ncol = 2)
plot of chunk tut7.3bBRMSGraphicalSummaries3

Multiplicative model (BRMS)

For the multiplicative model, we could elect to split the trends up so as to explore the effects of one predictor at several set levels of another predictor. In this example, we will explore the effects of x1 when x2 is equal to its mean in the original data as well as one and two standard deviations below and above this mean.

We will first explore the simple built-in marginal_effects() function.

plot(marginal_effects(data.brms.mult, effects = "cx1:cx2"), points = TRUE)
plot of chunk tut7.3bBRMSGraphicalSummaries.multSimple
# OR Define a function that will calculate mean plus or minus
# 2 and 1 standard deviations
msd2 = function(x) {
    means = mean(x, na.rm = TRUE)
    sd = sd(x, na.rm = TRUE)
    means + (-2:2) * sd
}
plot(marginal_effects(data.brms.mult, effects = "cx1:cx2", int_conditions = list(cx2 = msd2)),
    points = TRUE)
plot of chunk tut7.3bBRMSGraphicalSummaries.multSimple
# OR we could arrange the effect of cx1 separately for
# different values of cx2 (mean plus or minus 1 and 2
# standard deviations)
cond = data.frame(cx2 = msd2(data$cx2), row.names = paste0("cx2: mean ",
    -2:2, "*sd"))
plot(marginal_effects(data.brms.mult, effects = "cx1", conditions = cond,
    select_points = 0.1), points = TRUE)
plot of chunk tut7.3bBRMSGraphicalSummaries.multSimple
## Yet another way would be as a 2D surface
plot(marginal_effects(data.brms.mult, effects = "cx1:cx2", surface = TRUE),
    points = TRUE, stype = "raster")
plot of chunk tut7.3bBRMSGraphicalSummaries.multSimple
mcmc = as.matrix(data.brms.mult)
## Calculate the fitted values
newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = mean(data$cx2) + sd(data$cx2) *
    -2:2)
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2", "b_cx1:cx2")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")) %>%
    mutate(x2 = factor(x2, labels = paste("X2:~", c(-2, -1, 0, 1, 2), "*sigma")))
## Partial residuals
fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) *
    -2:2)
fMat = rMat = model.matrix(~cx1 * cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)
## Partition the partial residuals such that each x1 trend only includes
## x2 data that is within that range in the observed data
findNearest = function(x, y) {
    ff = fields:::rdist(x, y)
    apply(ff, 1, function(x) which(x == min(x)))
}
fn = findNearest(x = data[, c("x1", "x2")], y = rdata[, c("x1", "x2")])
rdata = rdata[fn, ] %>% mutate(x2 = factor(x2, labels = paste("X2:~", c(-2,
    -1, 0, 1, 2), "*sigma")))
ggplot(newdata, aes(y = estimate, x = x1)) + geom_line() + geom_blank(aes(y = 9)) +
    geom_point(data = rdata, aes(y = partial.resid), color = "grey") +
    geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
        alpha = 0.3) + scale_y_continuous("Y") + scale_x_continuous("X1") +
    facet_wrap(~x2, labeller = label_parsed, nrow = 1, scales = "free_y") +
    theme_classic() + theme(strip.background = element_blank())
plot of chunk tut7.3bBRMSGraphicalSummaries.mult

Alternatively, we could explore the interaction by plotting a two dimensional surface as a heat map. In the example below, relative confidence (upper - lower) is indicated via contours.

mcmc = as.matrix(data.brms.mult)
## Calculate the fitted values
newdata = expand.grid(cx1 = seq(min(data$cx1, na.rm = TRUE), max(data$cx1,
    na.rm = TRUE), len = 100), cx2 = seq(min(data$cx2, na.rm = TRUE), max(data$cx2,
    na.rm = TRUE), len = 100))
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2", "b_cx1:cx2")]
fit = coefs %*% t(Xmat)
newdata = newdata %>% mutate(x1 = cx1 + mean.x1, x2 = cx2 + mean.x2) %>%
    cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))

## Partial residuals
fdata = rdata = expand.grid(cx1 = data$cx1, cx2 = mean(data$cx2) + sd(data$cx2) *
    -2:2)
fMat = rMat = model.matrix(~cx1 * cx2, fdata)
fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
resid = as.vector(data$y - apply(coefs, 2, median) %*% t(rMat))
rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(x1 = cx1 +
    mean.x1, x2 = cx2 + mean.x2)

ggplot(newdata, aes(y = x1, x = x2)) + geom_tile(aes(fill = estimate)) +
    geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("Y",
    colors = heat.colors(10)) + geom_point(data = data, aes(size = y)) +
    scale_y_continuous("X1") + scale_x_continuous("X2") + theme_classic()
plot of chunk tut7.3bBRMSGraphicalSummaries.mult2

Effect sizes

In addition to deriving the distribution of the slope parameters, we could make use of the Bayesian framework to derive the distribution of the effect size. In so doing, effect size could be considered as either the rate of change or, alternatively, the difference between pairs of values along the predictor gradient. For the latter case, there are multiple ways of calculating an effect size; the most common are:

Raw effect size
the difference between the expected values at two points along the predictor (as already calculated)
Cohen's D
the effect size standardized by division by the pooled standard deviation $$ D = (\mu_A - \mu_B)/\sigma $$
Percent effect size
the effect size expressed as a percentage of one of the pair of values. That is, whether you are expressing a percentage increase or a percentage decline depends on which of the pair of values is considered the reference value. Care must be exercised to ensure no division by zero occurs.

For simple linear models, effect size based on a rate is essentially the same as above except that it is expressed per unit of the predictor. Of course, in many instances a one unit change in the predictor represents too subtle a shift in the underlying gradient to be likely to yield any ecologically meaningful or appreciable change in the response.
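To make these definitions concrete, here is a minimal sketch of how each effect size falls out of the posterior draws. The vectors fit.ref, fit.comp and sigma are purely hypothetical stand-ins for draws of the expected value of y at a reference and a comparison value of the predictor and for draws of the residual standard deviation; the real versions are extracted from the fitted models below.

## Hypothetical posterior draws (for illustration only)
fit.ref = rnorm(1000, 3, 0.2)  # expected y at the reference predictor value
fit.comp = rnorm(1000, 6, 0.2)  # expected y at the comparison predictor value
sigma = runif(1000, 0.9, 1.2)  # residual standard deviation
## Raw effect size
RES = fit.comp - fit.ref
## Cohen's D (standardized by the residual standard deviation)
cohenD = (fit.comp - fit.ref)/sigma
## Percent effect size (relative to the reference value)
ESp = 100 * (fit.comp - fit.ref)/fit.ref
## Each is a full posterior distribution, so we can summarize with medians and HPD intervals
library(broom)
library(coda)
tidyMCMC(as.mcmc(cbind(RES, cohenD, ESp)), conf.int = TRUE, conf.method = "HPDinterval")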

Let's explore a range of effect sizes:

  • Raw effect size between the largest and smallest x
  • Cohen's D
  • Percentage change between the largest and smallest x
  • Fractional change between the largest and smallest x

We will also calculate the probability that a change in x1 is associated with greater than a 50% increase in y at various levels of x2. Clearly, in order to explore this inference, we must first express the change in y as a percentage. This in turn requires us to calculate start and end points from which to calculate the magnitude of the effect (the amount of increase in y) as well as the percentage change. Hence, we start by predicting the distribution of y at the lowest and highest values of x1 at five levels of x2 (two standard deviations below the cx2 mean, one standard deviation below, the cx2 mean itself, one standard deviation above and two standard deviations above the mean).

For this exercise we will only use the multiplicative model. Needless to say, the process would be very similar for the additive model.

mcmc = data.mcmcpack.mult
newdata = expand.grid(cx1 = c(min(data$cx1), max(data$cx1)), cx2 = (-2:2) *
    sd(data$cx2))
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")]
fit = coefs %*% t(Xmat)
s1 = seq(1, 9, b = 2)
s2 = seq(2, 10, b = 2)
## Raw effect size
(RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error   conf.low conf.high
1    2 1.131538 0.9834683 -0.8617237  2.992687
2    4 2.007505 0.6623440  0.6986248  3.288608
3    6 2.883473 0.4916937  1.9057792  3.818553
4    8 3.759440 0.6135136  2.5552802  4.941969
5   10 4.635408 0.9179371  2.8090733  6.391762
## Cohen's D
cohenD = (fit[, s2] - fit[, s1])/sqrt(mcmc[, "sigma2"])
(cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error   conf.low conf.high
1    2 1.018302 0.8809005 -0.6266923  2.824726
2    4 1.805760 0.6063423  0.6348380  3.010847
3    6 2.593218 0.4784125  1.6946060  3.568473
4    8 3.380676 0.5997235  2.2250809  4.547363
5   10 4.168134 0.8717915  2.4901356  5.850737
# Percentage change (relative to the fitted value at the smallest x1)
ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1]
(PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
  term  estimate std.error  conf.low conf.high
1    2  60.59249  60.30857 -36.51167  183.4926
2    4  93.25863  40.13721  18.95693  172.6912
3    6 130.69428  37.49792  62.55793  204.8880
4    8 175.48589  68.84283  74.16229  312.6143
5   10 237.13768 368.17308  52.26804  489.9922
# Probability that the effect is greater than 50% (an increase of >50%)
(p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
     2      4      6      8     10 
0.5051 0.8749 0.9978 0.9985 0.9966 
## fractional change
(FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error  conf.low conf.high
1    2 1.605925 0.6030857 0.6348833  2.834926
2    4 1.932586 0.4013721 1.1895693  2.726912
3    6 2.306943 0.3749792 1.6255793  3.048880
4    8 2.754859 0.6884283 1.7416229  4.126143
5   10 3.371377 3.6817308 1.5226804  5.899922

Conclusions:

  • On average, when x2 is equal to its mean, Y increases by 2.8834728 over the observed range of x1. We are 95% confident that the increase is between 1.9057792 and 3.8185531.
  • The Cohen's D associated with this change over the observed range of x1 (at average x2) is 2.5932182.
  • On average, Y increases by 130.694% over the observed range of x1 (at average x2). We are 95% confident that the increase is between 62.558% and 204.888%.
  • The probability that Y increases by more than 50% over the observed range of x1 (average x2) is 0.998.
  • On average, Y increases by a factor of 2.307 over the observed range of x1 (average x2). We are 95% confident that this factor is between 1.626 and 3.049.

Let's explore a range of effect sizes:

  • Raw effect size between the largest and smallest x
  • Cohen's D
  • Percentage change between the largest and smallest x
  • Fractional change between the largest and smallest x

We will also calculate the probability that a change in x1 is associated with greater than a 50% increase in y at various levels of x2. Clearly, in order to explore this inference, we must first express the change in y as a percentage. This in turn requires us to calculate start and end points from which to calculate the magnitude of the effect (the amount of increase in y) as well as the percentage change. Hence, we start by predicting the distribution of y at the lowest and highest values of x1 at five levels of x2 (two standard deviations below the cx2 mean, one standard deviation below, the cx2 mean itself, one standard deviation above and two standard deviations above the mean).

For this exercise we will only use the multiplicative model. Needless to say, the process would be very similar for the additive model.

mcmc = data.r2jags.mult$BUGSoutput$sims.matrix
newdata = expand.grid(cx1 = c(min(data$cx1), max(data$cx1)), cx2 = (-2:2) *
    sd(data$cx2))
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
fit = coefs %*% t(Xmat)
s1 = seq(1, 9, b = 2)
s2 = seq(2, 10, b = 2)
## Raw effect size
(RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error   conf.low conf.high
1    2 1.128491 0.9881108 -0.8113367  3.042310
2    4 2.007034 0.6622062  0.7082004  3.281786
3    6 2.885577 0.4910011  1.9349269  3.833589
4    8 3.764120 0.6197240  2.5853036  5.011755
5   10 4.642663 0.9313671  2.7349541  6.390688
## Cohen's D
## the JAGS samples store the residual standard deviation as 'sigma' rather than the variance 'sigma2'
cohenD = (fit[, s2] - fit[, s1])/mcmc[, "sigma"]
(cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error   conf.low conf.high
1    2 1.018302 0.8809005 -0.6266923  2.824726
2    4 1.805760 0.6063423  0.6348380  3.010847
3    6 2.593218 0.4784125  1.6946060  3.568473
4    8 3.380676 0.5997235  2.2250809  4.547363
5   10 4.168134 0.8717915  2.4901356  5.850737
# Percentage change (relative to the fitted value at the smallest x1)
ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1]
(PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
  term  estimate  std.error  conf.low conf.high
1    2  61.20075   64.56737 -41.40202  180.6583
2    4  93.37902   40.80821  19.36101  172.7273
3    6 130.88092   37.48191  64.85883  205.7658
4    8 176.42530   71.66748  67.72479  311.9882
5   10 262.64539 2612.68958  50.93741  509.6312
# Probability that the effect is greater than 50% (an increase of >50%)
(p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
        2         4         6         8        10 
0.5030000 0.8716667 0.9978667 0.9990667 0.9964667 
## fractional change
(FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate  std.error  conf.low conf.high
1    2 1.612008  0.6456737 0.5859798  2.806583
2    4 1.933790  0.4080821 1.1936101  2.727273
3    6 2.308809  0.3748191 1.6485883  3.057658
4    8 2.764253  0.7166748 1.6772479  4.119882
5   10 3.626454 26.1268958 1.5093741  6.096312

Conclusions:

  • On average, when x2 is equal to its mean, Y increases by 2.885577 over the observed range of x1. We are 95% confident that the increase is between 1.9349269 and 3.8335886.
  • The Cohen's D associated with this change over the observed range of x1 (at average x2) is 2.5932182.
  • On average, Y increases by 130.881% over the observed range of x1 (at average x2). We are 95% confident that the increase is between 64.859% and 205.766%.
  • The probability that Y increases by more than 50% over the observed range of x1 (average x2) is 0.998.
  • On average, Y increases by a factor of 2.309 over the observed range of x1 (average x2). We are 95% confident that this factor is between 1.649 and 3.058.

Let's explore a range of effect sizes:

  • Raw effect size between the largest and smallest x
  • Cohen's D
  • Percentage change between the largest and smallest x
  • Fractional change between the largest and smallest x

We will also calculate the probability that a change in x1 is associated with greater than a 50% increase in y at various levels of x2. Clearly, in order to explore this inference, we must first express the change in y as a percentage. This in turn requires us to calculate start and end points from which to calculate the magnitude of the effect (the amount of increase in y) as well as the percentage change. Hence, we start by predicting the distribution of y at the lowest and highest values of x1 at five levels of x2 (two standard deviations below the cx2 mean, one standard deviation below, the cx2 mean itself, one standard deviation above and two standard deviations above the mean).

For this exercise we will only use the multiplicative model. Needless to say, the process would be very similar for the additive model.

mcmc = as.matrix(data.rstan.mult)
newdata = expand.grid(cx1 = c(min(data$cx1), max(data$cx1)), cx2 = (-2:2) *
    sd(data$cx2))
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
fit = coefs %*% t(Xmat)
s1 = seq(1, 9, b = 2)
s2 = seq(2, 10, b = 2)
## Raw effect size
(RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error   conf.low conf.high
1    2 1.378868 0.8964917 -0.3259671  3.191336
2    4 2.138889 0.6370163  0.8738894  3.372097
3    6 2.898910 0.5013501  1.9580862  3.913157
4    8 3.658931 0.5834243  2.5187370  4.817305
5   10 4.418952 0.8202961  2.7627498  5.970749
## Cohen's D
## the Stan samples store the residual standard deviation as 'sigma' rather than the variance 'sigma2'
cohenD = (fit[, s2] - fit[, s1])/mcmc[, "sigma"]
(cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error   conf.low conf.high
1    2 1.241077 0.7956905 -0.2242797  2.896785
2    4 1.917148 0.5758933  0.8032340  3.056407
3    6 2.593218 0.4784125  1.6946060  3.568473
4    8 3.269289 0.5699119  2.1779064  4.378729
5   10 3.945360 0.7870300  2.4733235  5.523548
# Percentage change (relative to the fitted value at the smallest x1)
ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1]
(PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
  term  estimate std.error  conf.low conf.high
1    2  69.38721  54.62118 -23.99487  177.2210
2    4  98.87413  39.22694  26.59480  175.7967
3    6 132.15307  38.36353  67.01962  213.2552
4    8 171.39922  67.04252  66.87743  295.4455
5   10 229.60617 482.03370  58.23209  432.2243
# Probability that the effect is greater than 50% (an increase of >50%)
(p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
        2         4         6         8        10 
0.5980000 0.9163333 0.9973333 0.9993333 0.9973333 
## fractional change
(FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error  conf.low conf.high
1    2 1.693872 0.5462118 0.7600513  2.772210
2    4 1.988741 0.3922694 1.2659480  2.757967
3    6 2.321531 0.3836353 1.6701962  3.132552
4    8 2.713992 0.6704252 1.6687743  3.954455
5   10 3.296062 4.8203370 1.5823209  5.322243

Conclusions:

  • On average, when x2 is equal to its mean, Y increases by 2.8989101 over the observed range of x1. We are 95% confident that the increase is between 1.9580862 and 3.9131569.
  • The Cohen's D associated with this change over the observed range of x1 (at average x2) is 2.5932182.
  • On average, Y increases by 132.153% over the observed range of x1 (at average x2). We are 95% confident that the increase is between 67.020% and 213.255%.
  • The probability that Y increases by more than 50% over the observed range of x1 (average x2) is 0.997.
  • On average, Y increases by a factor of 2.322 over the observed range of x1 (average x2). We are 95% confident that this factor is between 1.670 and 3.133.

Let's explore a range of effect sizes:

  • Raw effect size between the largest and smallest x
  • Cohen's D
  • Percentage change between the largest and smallest x
  • Fractional change between the largest and smallest x

We will also calculate the probability that a change in x1 is associated with greater than a 50% increase in y at various levels of x2. Clearly, in order to explore this inference, we must first express the change in y as a percentage. This in turn requires us to calculate start and end points from which to calculate the magnitude of the effect (the amount of increase in y) as well as the percentage change. Hence, we start by predicting the distribution of y at the lowest and highest values of x1 at five levels of x2 (two standard deviations below the cx2 mean, one standard deviation below, the cx2 mean itself, one standard deviation above and two standard deviations above the mean).

For this exercise we will only use the multiplicative model. Needless to say, the process would be very similar for the additive model.

mcmc = as.matrix(data.rstanarm.mult)
newdata = expand.grid(cx1 = c(min(data$cx1), max(data$cx1)), cx2 = (-2:2) *
    sd(data$cx2))
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")]
fit = coefs %*% t(Xmat)
s1 = seq(1, 9, b = 2)
s2 = seq(2, 10, b = 2)
## Raw effect size
(RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error   conf.low conf.high
1    2 1.364070 0.8886508 -0.3986485  3.095723
2    4 2.126355 0.6252966  0.9360712  3.408929
3    6 2.888639 0.4860847  1.9045201  3.815556
4    8 3.650924 0.5705679  2.5904201  4.829344
5   10 4.413208 0.8116676  2.8424069  5.966333
## Cohen's D
## rstanarm stores the residual standard deviation as 'sigma' rather than the variance 'sigma2'
cohenD = (fit[, s2] - fit[, s1])/mcmc[, "sigma"]
(cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error   conf.low conf.high
1    2 1.241077 0.7956905 -0.2242797  2.896785
2    4 1.917148 0.5758933  0.8032340  3.056407
3    6 2.593218 0.4784125  1.6946060  3.568473
4    8 3.269289 0.5699119  2.1779064  4.378729
5   10 3.945360 0.7870300  2.4733235  5.523548
# Percentage change (relative to the fitted value at the smallest x1)
ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1]
(PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
  term  estimate std.error  conf.low conf.high
1    2  68.56476  54.43507 -26.75850  173.5301
2    4  97.87279  38.40045  29.88173  174.5653
3    6 131.01647  36.90149  60.00110  199.9777
4    8 169.96282  62.60247  70.59416  293.3944
5   10 221.68966 160.07276  62.33684  446.4973
# Probability that the effect is greater than 50% (an increase of >50%)
(p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
        2         4         6         8        10 
0.5988889 0.9155556 0.9974074 0.9992593 0.9981481 
## fractional change
(FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error conf.low conf.high
1    2 1.685648 0.5443507 0.732415  2.735301
2    4 1.978728 0.3840045 1.298817  2.745653
3    6 2.310165 0.3690149 1.600011  2.999777
4    8 2.699628 0.6260247 1.705942  3.933944
5   10 3.216897 1.6007276 1.623368  5.464973

Conclusions:

  • On average, when x2 is equal to its mean, Y increases by 2.8886394 over the observed range of x1. We are 95% confident that the increase is between 1.9045201 and 3.8155558.
  • The Cohen's D associated with this change over the observed range of x1 (at average x2) is 2.5932182.
  • On average, Y increases by 131.016% over the observed range of x1 (at average x2). We are 95% confident that the increase is between 60.001% and 199.978%.
  • The probability that Y increases by more than 50% over the observed range of x1 (average x2) is 0.997.
  • On average, Y increases by a factor of 2.310 over the observed range of x1 (average x2). We are 95% confident that this factor is between 1.600 and 3.000.

Let's explore a range of effect sizes:

  • Raw effect size between the largest and smallest x
  • Cohen's D
  • Percentage change between the largest and smallest x
  • Fractional change between the largest and smallest x

We will also calculate the probability that a change in x1 is associated with greater than a 50% increase in y at various levels of x2. Clearly, in order to explore this inference, we must first express the change in y as a percentage. This in turn requires us to calculate start and end points from which to calculate the magnitude of the effect (the amount of increase in y) as well as the percentage change. Hence, we start by predicting the distribution of y at the lowest and highest values of x1 at five levels of x2 (two standard deviations below the cx2 mean, one standard deviation below, the cx2 mean itself, one standard deviation above and two standard deviations above the mean).

For this exercise we will only use the multiplicative model. Needless to say, the process would be very similar for the additive model.

mcmc = as.matrix(data.brms.mult)
newdata = expand.grid(cx1 = c(min(data$cx1), max(data$cx1)), cx2 = (-2:2) *
    sd(data$cx2))
Xmat = model.matrix(~cx1 * cx2, newdata)
coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2", "b_cx1:cx2")]
fit = coefs %*% t(Xmat)
s1 = seq(1, 9, b = 2)
s2 = seq(2, 10, b = 2)
## Raw effect size
(RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error   conf.low conf.high
1    2 1.378991 0.9190167 -0.3987054  3.229530
2    4 2.128911 0.6422492  0.7482612  3.300124
3    6 2.878832 0.4852747  1.8699665  3.798013
4    8 3.628753 0.5600127  2.5331742  4.766964
5   10 4.378673 0.8042712  2.8863192  6.043824
## Cohen's D
## brms stores the residual standard deviation as 'sigma' rather than the variance 'sigma2'
cohenD = (fit[, s2] - fit[, s1])/mcmc[, "sigma"]
(cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error   conf.low conf.high
1    2 1.241077 0.7956905 -0.2242797  2.896785
2    4 1.917148 0.5758933  0.8032340  3.056407
3    6 2.593218 0.4784125  1.6946060  3.568473
4    8 3.269289 0.5699119  2.1779064  4.378729
5   10 3.945360 0.7870300  2.4733235  5.523548
# Percentage change (relative to the fitted value at the smallest x1)
ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1]
(PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
  term  estimate std.error  conf.low conf.high
1    2  70.11175  56.09519 -28.50460  177.9270
2    4  98.42547  39.60222  26.48813  179.2685
3    6 130.18361  36.80878  62.53690  203.1577
4    8 166.87273  58.53950  68.10948  279.8138
5   10 213.35276 125.34446  58.92578  414.6204
# Probability that the effect is greater than 50% (an increase of >50%)
(p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
        2         4         6         8        10 
0.5988889 0.9074074 0.9985185 1.0000000 0.9985185 
## fractional change
(FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
  term estimate std.error conf.low conf.high
1    2 1.701117 0.5609519 0.714954  2.779270
2    4 1.984255 0.3960222 1.264881  2.792685
3    6 2.301836 0.3680878 1.625369  3.031577
4    8 2.668727 0.5853950 1.681095  3.798138
5   10 3.133528 1.2534446 1.589258  5.146204

Conclusions:

  • On average, when x2 is equal to its mean, Y increases by 2.8788319 over the observed range of x1. We are 95% confident that the increase is between 1.8699665 and 3.7980128.
  • The Cohen's D associated with this change over the observed range of x1 (at average x2) is 2.5932182.
  • On average, Y increases by 130.184% over the observed range of x1 (at average x2). We are 95% confident that the increase is between 62.537% and 203.158%.
  • The probability that Y increases by more than 50% over the observed range of x1 (average x2) is 0.999.
  • On average, Y increases by a factor of 2.302 over the observed range of x1 (average x2). We are 95% confident that this factor is between 1.625 and 3.032.

Posteriors

Finite Population Standard Deviations

Variance components, the amount of added variance attributable to each influence, are traditionally estimated for so-called random effects. These are the effects for which the levels employed in the design are randomly selected to represent a broader range of possible levels. For such effects, effect sizes (differences between each level and a reference level) are of little value. Instead, the 'importance' of the variables is measured in units of variance components.

On the other hand, regular variance components for fixed factors (those whose measured levels represent the only levels of interest) are not logical - since variance components estimate variance as if the levels are randomly selected from a larger population. Nevertheless, in order to compare and contrast the scale of variability of both fixed and random factors, it is necessary to measure both on the same scale (sample or population based variance).

Finite-population variance components (Gelman, 2005) assume that the levels of all factors (fixed and random) in the design are all the possible levels available. In other words, they are assumed to represent finite populations of levels. Sample (rather than population) statistics are then used to calculate these finite-population variances (or standard deviations).

Since standard deviation (and variance) are bound at zero, standard deviation posteriors are typically non-normal. Consequently, medians and HPD intervals are more robust estimates.
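In the code below, the finite-population standard deviation attributable to each regression term is calculated, for every MCMC draw, as the absolute value of the term's coefficient multiplied by the standard deviation of the corresponding column of the model matrix, whilst the residual standard deviation is calculated directly from that draw's residuals: $$ sd_{x_j} = |\beta_j|\times sd(x_j), \hspace{1cm} sd_{resid} = sd(y_i - \mu_i) $$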

library(broom)
mcmc = data.mcmcpack.mult
Xmat = model.matrix(~cx1 * cx2, data = data)
sd.x1 = abs(mcmc[, "cx1"]) * sd(Xmat[, "cx1"])
sd.x2 = abs(mcmc[, "cx2"]) * sd(Xmat[, "cx2"])
sd.x1x2 = abs(mcmc[, "cx1:cx2"]) * sd(Xmat[, "cx1:cx2"])
sd.x = sd.x1 + sd.x2 + sd.x1x2
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## extract the posterior draws of the model coefficients
coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")]
fit = coefs %*% t(Xmat)
resid = sweep(fit, 2, data$y, "-")
sd.resid = apply(resid, 1, sd)

sd.all = cbind(sd.x1, sd.x2, sd.x1x2, sd.resid)
(fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
      term  estimate  std.error   conf.low conf.high
1    sd.x1 0.8385457 0.14298995 0.55422160 1.1104773
2    sd.x2 0.4476210 0.13949494 0.16383800 0.7171473
3  sd.x1x2 0.2463365 0.11067311 0.02945066 0.4554639
4 sd.resid 1.1106423 0.01421401 1.09355250 1.1387527
# OR expressed as a percentage
(fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
    conf.int = TRUE, conf.method = "HPDinterval"))
      term  estimate std.error  conf.low conf.high
1    sd.x1 31.816643  5.126155 21.433069  41.56264
2    sd.x2 16.976661  5.066772  6.574339  26.59877
3  sd.x1x2  9.300278  3.880973  1.426584  16.43793
4 sd.resid 41.898287  2.536467 37.537311  47.24753
## we can even plot this as a Bayesian ANOVA table
ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
    ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
    vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
    scale_x_discrete() + coord_flip() + theme_classic()
plot of chunk tut7.3bMCMCpackFinitePopulation

Conclusions: Approximately 58.1% of the total finite population standard deviation is due to x1, x2 and their interaction.

mcmc = data.r2jags.mult$BUGSoutput$sims.matrix
Xmat = model.matrix(~cx1 * cx2, data = data)
sd.x1 = abs(mcmc[, "beta[1]"]) * sd(Xmat[, "cx1"])
sd.x2 = abs(mcmc[, "beta[2]"]) * sd(Xmat[, "cx2"])
sd.x1x2 = abs(mcmc[, "beta[3]"]) * sd(Xmat[, "cx1:cx2"])
sd.x = sd.x1 + sd.x2 + sd.x1x2
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## extract the posterior draws of the model coefficients
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
fit = coefs %*% t(Xmat)
resid = sweep(fit, 2, data$y, "-")
sd.resid = apply(resid, 1, sd)

sd.all = cbind(sd.x1, sd.x2, sd.x1x2, sd.resid)
(fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
      term  estimate  std.error    conf.low conf.high
1    sd.x1 0.8391576 0.14278852 0.562698059 1.1148498
2    sd.x2 0.4484404 0.14175853 0.174846530 0.7318690
3  sd.x1x2 0.2471354 0.11203956 0.005098644 0.4380983
4 sd.resid 1.1109865 0.01441035 1.093559848 1.1388625
# OR expressed as a percentage
(fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
    conf.int = TRUE, conf.method = "HPDinterval"))
      term estimate std.error   conf.low conf.high
1    sd.x1 31.78126  5.102172 21.6079476  41.43618
2    sd.x2 17.00764  5.135077  6.8692354  26.99571
3  sd.x1x2  9.29281  3.926346  0.7970709  16.21722
4 sd.resid 41.87848  2.540247 37.4649686  47.15805
## we can even plot this as a Bayesian ANOVA table
ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
    ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
    vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
    scale_x_discrete() + coord_flip() + theme_classic()
plot of chunk tut7.3bJAGSFinitePopulation

Conclusions: Approximately 58.1% of the total finite population standard deviation is due to x1, x2 and their interaction.

mcmc = as.matrix(data.rstan.mult)
Xmat = model.matrix(~cx1 * cx2, data = data)
sd.x1 = abs(mcmc[, "beta[1]"]) * sd(Xmat[, "cx1"])
sd.x2 = abs(mcmc[, "beta[2]"]) * sd(Xmat[, "cx2"])
sd.x1x2 = abs(mcmc[, "beta[3]"]) * sd(Xmat[, "cx1:cx2"])
sd.x = sd.x1 + sd.x2 + sd.x1x2
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## extract the posterior draws of the model coefficients
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
fit = coefs %*% t(Xmat)
resid = sweep(fit, 2, data$y, "-")
sd.resid = apply(resid, 1, sd)

sd.all = cbind(sd.x1, sd.x2, sd.x1x2, sd.resid)
(fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
      term  estimate std.error   conf.low conf.high
1    sd.x1 0.8430350 0.1457981 0.56943304 1.1379891
2    sd.x2 0.4455197 0.1435024 0.16435682 0.7257383
3  sd.x1x2 0.2486064 0.1111387 0.01755885 0.4479512
4 sd.resid 1.1110181 0.0144178 1.09365822 1.1400552
# OR expressed as a percentage
(fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
    conf.int = TRUE, conf.method = "HPDinterval"))
      term  estimate std.error   conf.low conf.high
1    sd.x1 32.023023  5.259373 21.2694513  41.69414
2    sd.x2 16.829913  5.180634  6.4358085  26.67465
3  sd.x1x2  9.386689  3.882299  0.9254482  16.13724
4 sd.resid 41.924201  2.522040 37.2404261  46.86077
## we can even plot this as a Bayesian ANOVA table
ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
    ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
    vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
    scale_x_discrete() + coord_flip() + theme_classic()
plot of chunk tut7.3bRSTANFinitePopulation

Conclusions: Approximately 58.2% of the total finite population standard deviation is due to x1, x2 and their interaction.

mcmc = as.matrix(data.rstanarm.mult)
Xmat = model.matrix(~cx1 * cx2, data = data)
sd.x1 = abs(mcmc[, "cx1"]) * sd(Xmat[, "cx1"])
sd.x2 = abs(mcmc[, "cx2"]) * sd(Xmat[, "cx2"])
sd.x1x2 = abs(mcmc[, "cx1:cx2"]) * sd(Xmat[, "cx1:cx2"])
sd.x = sd.x1 + sd.x2 + sd.x1x2
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## extract the posterior draws of the model coefficients
coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")]
fit = coefs %*% t(Xmat)
resid = sweep(fit, 2, data$y, "-")
sd.resid = apply(resid, 1, sd)

sd.all = cbind(sd.x1, sd.x2, sd.x1x2, sd.resid)
(fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
      term  estimate  std.error   conf.low conf.high
1    sd.x1 0.8400482 0.14135878 0.55385543 1.1096057
2    sd.x2 0.4453660 0.13954895 0.17246030 0.7297465
3  sd.x1x2 0.2495511 0.11078771 0.03275807 0.4602081
4 sd.resid 1.1105389 0.01428812 1.09360418 1.1390562
# OR expressed as a percentage
(fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
    conf.int = TRUE, conf.method = "HPDinterval"))
      term  estimate std.error  conf.low conf.high
1    sd.x1 31.854366  5.075864 21.965836  41.59395
2    sd.x2 16.876390  5.018949  7.012028  26.68624
3  sd.x1x2  9.352654  3.874954  1.516877  16.38607
4 sd.resid 41.903312  2.489766 37.960454  47.38992
## we can even plot this as a Bayesian ANOVA table
ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
    ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
    vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
    scale_x_discrete() + coord_flip() + theme_classic()
plot of chunk tut7.3bRSTANARMFinitePopulation

Conclusions: Approximately 58.1% of the total finite population standard deviation is due to x1, x2 and their interaction.

mcmc = as.matrix(data.brms.mult)
Xmat = model.matrix(~cx1 * cx2, data = data)
sd.x1 = abs(mcmc[, "b_cx1"]) * sd(Xmat[, "cx1"])
sd.x2 = abs(mcmc[, "b_cx2"]) * sd(Xmat[, "cx2"])
sd.x1x2 = abs(mcmc[, "b_cx1:cx2"]) * sd(Xmat[, "cx1:cx2"])
sd.x = sd.x1 + sd.x2 + sd.x1x2
# generate a model matrix
newdata = data
Xmat = model.matrix(~cx1 * cx2, newdata)
## extract the posterior draws of the model coefficients
coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2", "b_cx1:cx2")]
fit = coefs %*% t(Xmat)
resid = sweep(fit, 2, data$y, "-")
sd.resid = apply(resid, 1, sd)

sd.all = cbind(sd.x1, sd.x2, sd.x1x2, sd.resid)
(fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
      term  estimate  std.error    conf.low conf.high
1    sd.x1 0.8371960 0.14112323 0.543806866 1.1045040
2    sd.x2 0.4515870 0.14210529 0.198571997 0.7442691
3  sd.x1x2 0.2458671 0.11272300 0.003856768 0.4310691
4 sd.resid 1.1105389 0.01428812 1.093604181 1.1390562
# OR expressed as a percentage
(fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
    conf.int = TRUE, conf.method = "HPDinterval"))
      term  estimate std.error   conf.low conf.high
1    sd.x1 31.725295  5.188656 21.6538435  41.74478
2    sd.x2 16.987658  5.079056  7.2092581  26.68051
3  sd.x1x2  9.270729  3.962796  0.3883622  15.83973
4 sd.resid 42.019970  2.593114 37.2740427  47.27606
## we can even plot this as a Bayesian ANOVA table
ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
    ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
    vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
    scale_x_discrete() + coord_flip() + theme_classic()
plot of chunk tut7.3bBRMSFinitePopulation

Conclusions: Approximately 58.0% of the total finite population standard deviation is due to x1, x2 and their interaction.

$R^2$

In a frequentist context, the $R^2$ value is seen as a useful indicator of goodness of fit. Whilst it has long been acknowledged that this measure is not appropriate for comparing models (for such purposes information criteria such as AIC are more appropriate), it is nevertheless useful for estimating the amount (percent) of variance explained by the model.

In a frequentist context, $R^2$ is calculated as the variance in predicted values divided by the variance in the observed (response) values. Unfortunately, this classical formulation does not translate simply into a Bayesian context since the equivalently calculated numerator can be larger than the equivalently calculated denominator - thereby resulting in an $R^2$ greater than 100%. Gelman, Goodrich, Gabry, and Ali (2017) proposed an alternative formulation in which the denominator comprises the sum of the explained variance and the variance of the residuals.

So in the standard regression model notation of: $$ \begin{align} y_i \sim{}& N(\mu_i, \sigma)\\ \mu_i =& \mathbf{X}\boldsymbol{\beta} \end{align} $$ the $R^2$ can be formulated as: $$ R^2 = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_e} $$ where $\sigma^2_f = var(\mu)$ (with $\mu = \mathbf{X}\boldsymbol{\beta}$) and, for Gaussian models, $\sigma^2_e = var(y-\mu)$.
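As an aside, reasonably recent versions of rstanarm and brms bundle this formulation in a bayes_R2() function (availability and output format depend on the installed versions), which can serve as a cross-check on the manual calculations below.

## assumes bayes_R2() is available in the installed rstanarm/brms versions
median(bayes_R2(data.rstanarm.mult))  # rstanarm returns a vector of posterior R2 draws
bayes_R2(data.brms.mult)  # brms returns a posterior summary by default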

library(broom)
mcmc <- data.mcmcpack.mult
Xmat = model.matrix(~cx1 * cx2, data)
coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")]
fit = coefs %*% t(Xmat)
resid = sweep(fit, 2, data$y, "-")
var_f = apply(fit, 1, var)
var_e = apply(resid, 1, var)
R2 = var_f/(var_f + var_e)
tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
  term  estimate  std.error  conf.low conf.high
1 var1 0.5514478 0.04547871 0.4617169 0.6348432
# for comparison with frequentist
summary(lm(y ~ cx1 * cx2, data))
Call:
lm(formula = y ~ cx1 * cx2, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.34877 -0.85435  0.06905  0.71265  2.57068 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.6710     0.1315  27.924  < 2e-16 ***
cx1           2.9292     0.4914   5.961 4.15e-08 ***
cx2           1.3445     0.4207   3.196  0.00189 ** 
cx1:cx2       2.6651     1.2305   2.166  0.03281 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.111 on 96 degrees of freedom
Multiple R-squared:  0.5577,	Adjusted R-squared:  0.5439 
F-statistic: 40.35 on 3 and 96 DF,  p-value: < 2.2e-16
library(broom)
mcmc <- data.r2jags.mult$BUGSoutput$sims.matrix
Xmat = model.matrix(~cx1 * cx2, data)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
fit = coefs %*% t(Xmat)
resid = sweep(fit, 2, data$y, "-")
var_f = apply(fit, 1, var)
var_e = apply(resid, 1, var)
R2 = var_f/(var_f + var_e)
tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
  term estimate  std.error conf.low conf.high
1 var1  0.55194 0.04571563 0.460811 0.6348946
# for comparison with frequentist
summary(lm(y ~ cx1 * cx2, data))
Call:
lm(formula = y ~ cx1 * cx2, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.34877 -0.85435  0.06905  0.71265  2.57068 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.6710     0.1315  27.924  < 2e-16 ***
cx1           2.9292     0.4914   5.961 4.15e-08 ***
cx2           1.3445     0.4207   3.196  0.00189 ** 
cx1:cx2       2.6651     1.2305   2.166  0.03281 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.111 on 96 degrees of freedom
Multiple R-squared:  0.5577,	Adjusted R-squared:  0.5439 
F-statistic: 40.35 on 3 and 96 DF,  p-value: < 2.2e-16
library(broom)
mcmc <- as.matrix(data.rstan.mult)
Xmat = model.matrix(~cx1 * cx2, data)
coefs = mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
fit = coefs %*% t(Xmat)
resid = sweep(fit, 2, data$y, "-")
var_f = apply(fit, 1, var)
var_e = apply(resid, 1, var)
R2 = var_f/(var_f + var_e)
tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
  term  estimate  std.error  conf.low conf.high
1 var1 0.5527234 0.04505697 0.4577562 0.6298844
# for comparison with frequentist
summary(lm(y ~ cx1 * cx2, data))
Call:
lm(formula = y ~ cx1 * cx2, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.34877 -0.85435  0.06905  0.71265  2.57068 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.6710     0.1315  27.924  < 2e-16 ***
cx1           2.9292     0.4914   5.961 4.15e-08 ***
cx2           1.3445     0.4207   3.196  0.00189 ** 
cx1:cx2       2.6651     1.2305   2.166  0.03281 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.111 on 96 degrees of freedom
Multiple R-squared:  0.5577,	Adjusted R-squared:  0.5439 
F-statistic: 40.35 on 3 and 96 DF,  p-value: < 2.2e-16
library(broom)
mcmc <- as.matrix(data.rstanarm.mult)
Xmat = model.matrix(~cx1 * cx2, data)
coefs = mcmc[, c("(Intercept)", "cx1", "cx2", "cx1:cx2")]
fit = coefs %*% t(Xmat)
resid = sweep(fit, 2, data$y, "-")
var_f = apply(fit, 1, var)
var_e = apply(resid, 1, var)
R2 = var_f/(var_f + var_e)
tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
  term  estimate  std.error conf.low conf.high
1 var1 0.5517803 0.04514623 0.462438 0.6348679
# for comparison with frequentist
summary(lm(y ~ cx1 * cx2, data))
Call:
lm(formula = y ~ cx1 * cx2, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.34877 -0.85435  0.06905  0.71265  2.57068 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.6710     0.1315  27.924  < 2e-16 ***
cx1           2.9292     0.4914   5.961 4.15e-08 ***
cx2           1.3445     0.4207   3.196  0.00189 ** 
cx1:cx2       2.6651     1.2305   2.166  0.03281 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.111 on 96 degrees of freedom
Multiple R-squared:  0.5577,	Adjusted R-squared:  0.5439 
F-statistic: 40.35 on 3 and 96 DF,  p-value: < 2.2e-16
library(broom)
mcmc <- as.matrix(data.brms.mult)
Xmat = model.matrix(~cx1 * cx2, data)
coefs = mcmc[, c("b_Intercept", "b_cx1", "b_cx2", "b_cx1:cx2")]
fit = coefs %*% t(Xmat)
resid = sweep(fit, 2, data$y, "-")
var_f = apply(fit, 1, var)
var_e = apply(resid, 1, var)
R2 = var_f/(var_f + var_e)
tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
  term  estimate  std.error  conf.low conf.high
1 var1 0.5519288 0.04630979 0.4565537 0.6331142
# for comparison with frequentist
summary(lm(y ~ cx1 * cx2, data))
Call:
lm(formula = y ~ cx1 * cx2, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.34877 -0.85435  0.06905  0.71265  2.57068 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.6710     0.1315  27.924  < 2e-16 ***
cx1           2.9292     0.4914   5.961 4.15e-08 ***
cx2           1.3445     0.4207   3.196  0.00189 ** 
cx1:cx2       2.6651     1.2305   2.166  0.03281 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.111 on 96 degrees of freedom
Multiple R-squared:  0.5577,	Adjusted R-squared:  0.5439 
F-statistic: 40.35 on 3 and 96 DF,  p-value: < 2.2e-16

Bayesian model selection (Sparsity)

A statistical model is by definition a low-dimensional (i.e. over-simplified) representation of what is really likely to be a very complex system. As a result, no model is right. Some models, however, can provide useful insights into some of the processes operating on the system.

Frequentist statistics have various methods (model selection, dredging, lasso, cross validation) for selecting parsimonious models. These are models that provide a good compromise between minimizing unexplained patterns and minimizing model complexity. The basic premise is that since no model can hope to capture the full complexity of a system with all its subtleties, only the very major patterns can be estimated. Overly complex models are likely to be representing artificial complexity present only in the specific observed data (not the general population).

The Bayesian approach is to apply priors to the non-variance parameters such that parameters close to zero are further shrunk towards zero whilst priors on parameters further away from zero are less affected. The most popular form of prior for sparsity is the horseshoe prior, so called because the shape of a component of this prior resembles a horseshoe (with most of the mass either close to 0 or close to 1).

Rather than apply weakly informative Gaussian priors on parameters as: $$ \beta_j \sim{} N(0, \sigma^2) $$ the horseshoe prior is defined as: $$ \begin{align} \beta_j &\sim{} N(0, \tau^2\lambda_j^2)\\ \tau &\sim{} cauchy(0,1)\\ \lambda_j &\sim{} cauchy(0,1),\hspace{0.5cm}j=1,...,D\\ \end{align} $$ where $D$ is the number of (non-intercept or variance) parameters. $\tau$ represents the global scale that weights or shrinks all parameters towards zero and $\lambda_j$ are thick tailed local scales that allow some of the $j$ parameters to escape shrinkage.
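To see where the name comes from, we can simulate the shrinkage factor implied by this prior in a simplified (unit-information) setting, $\kappa_j = 1/(1+\tau^2\lambda_j^2)$, where $\kappa_j = 1$ corresponds to a coefficient that is completely shrunk to zero and $\kappa_j = 0$ to one that escapes shrinkage entirely. Holding the global scale at 1 purely for illustration, the distribution of $\kappa_j$ piles up at both ends - the 'horseshoe'.

## Illustrative simulation of the horseshoe shrinkage profile
set.seed(123)
lambda = abs(rcauchy(10000))  # local scales: half-Cauchy(0, 1)
tau = 1  # global scale held fixed for illustration
kappa = 1/(1 + tau^2 * lambda^2)  # shrinkage factor (1 = fully shrunk, 0 = unshrunk)
hist(kappa, breaks = 50, main = "", xlab = "shrinkage factor (kappa)")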

More recently, Piironen and Vehtari (2017) have argued that whilst the above horseshoe priors do guarantee that strong effects (parameters) will not be over-shrunk, there is the potential for weakly identified effects (those based on relatively little data) to be misrepresented in the posteriors. As an alternative, Piironen and Vehtari (2017) advocated the use of regularized horseshoe priors in which the amount of shrinkage applied to the largest effects can be controlled.

$$ \begin{align} \beta_j &\sim{} N(0, \tau^2\tilde{\lambda_j}^2)\\ \tau &\sim{} cauchy(0,1)\\ \tilde{\lambda_j}^2 &=\frac{c^2\lambda_j^2}{c^2+\tau^2\lambda_j^2}\\ \lambda_j &\sim{} cauchy(0,1),\hspace{0.5cm}j=1,...,D\\ \end{align} $$ where $c$ (slab width, actually variance) is a constant. For small effects (when $\tau^2\lambda_j^2 \ll c^2$), the prior approaches a regular prior. However, for large effects (when $\tau^2\lambda_j^2 \gg c^2$), the prior approaches $N(0,c^2)$.

Piironen and Vehtari (2017) recommend applying an inverse-gamma prior on $c^2$: $$ \begin{align} c^2 &\sim{} Inv-Gamma(\alpha,\beta)\\ \alpha &=v/2\\ \beta &=vs^2/2\\ \end{align} $$

Not available for MCMCpack

Not available for JAGS

The following code is a slight modification of the code presented in Appendix C.1 of Piironen and Vehtari (2017).

data {
  int<lower=0> n;              // number of observations
  int<lower=0> nX;             // number of predictors
  vector[n] Y;                 // outputs
  matrix[n, nX] X;             // inputs
  real<lower=0> scale_icept;   // prior std for the intercept
  real<lower=0> scale_global;  // scale for the half-t prior for tau
  real<lower=1> nu_global;     // degrees of freedom for the half-t prior for tau
  real<lower=1> nu_local;      // degrees of freedom for the half-t priors for lambdas
  real<lower=0> slab_scale;    // slab scale for the regularized horseshoe
  real<lower=0> slab_df;       // slab degrees of freedom for the regularized horseshoe
}
transformed data {
  matrix[n, nX - 1] Xc;        // centered version of X
  vector[nX - 1] means_X;      // column means of X before centering
  for (i in 2:nX) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  real logsigma;
  real cbeta0;
  vector[nX - 1] z;
  real<lower=0> tau;                 // global shrinkage parameter
  vector<lower=0>[nX - 1] lambda;    // local shrinkage parameters
  real<lower=0> caux;
}
transformed parameters {
  real<lower=0> sigma;                     // noise std
  vector<lower=0>[nX - 1] lambda_tilde;    // 'truncated' local shrinkage parameters
  real<lower=0> c;                         // slab scale
  vector[nX - 1] beta;                     // regression coefficients
  vector[n] mu;                            // latent function values
  sigma = exp(logsigma);
  c = slab_scale * sqrt(caux);
  lambda_tilde = sqrt(c^2 * square(lambda) ./ (c^2 + tau^2 * square(lambda)));
  beta = z .* lambda_tilde * tau;
  mu = cbeta0 + Xc * beta;
}
model {
  // half-t priors for lambdas and tau, and inverse-gamma for c^2
  z ~ normal(0, 1);
  lambda ~ student_t(nu_local, 0, 1);
  tau ~ student_t(nu_global, 0, scale_global * sigma);
  caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
  cbeta0 ~ normal(0, scale_icept);
  Y ~ normal(mu, sigma);
}
generated quantities {
  real beta0;           // population-level intercept
  vector[n] log_lik;
  beta0 = cbeta0 - dot_product(means_X, beta);
  for (i in 1:n) {
    log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma);
  }
}
X = model.matrix(~cx1 + cx2, data = data)
data.list <- with(data, list(Y = y, X = X, nX = ncol(X), n = nrow(data),
    scale_icept = 100, scale_global = 1, nu_global = 1, nu_local = 1, slab_scale = 2,
    slab_df = 4))

data.rstan.sparsity <- stan(data = data.list, model_code = modelString,
    chains = 3, iter = 4000, warmup = 2000, thin = 3, save_dso = TRUE)
In file included from /usr/local/lib/R/site-library/BH/include/boost/config.hpp:39:0,
                 from /usr/local/lib/R/site-library/BH/include/boost/math/tools/config.hpp:13,
                 from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/var.hpp:7,
                 from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5,
                 from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core.hpp:12,
                 from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4,
                 from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4,
                 from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4,
                 from file4d7250064089.cpp:8:
/usr/local/lib/R/site-library/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined
 #  define BOOST_NO_CXX11_RVALUE_REFERENCES
 ^
<command-line>:0:0: note: this is the location of the previous definition

SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 1).

Gradient evaluation took 3.4e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds.
Adjust your expectations accordingly!


Iteration:    1 / 4000 [  0%]  (Warmup)
Iteration:  400 / 4000 [ 10%]  (Warmup)
Iteration:  800 / 4000 [ 20%]  (Warmup)
Iteration: 1200 / 4000 [ 30%]  (Warmup)
Iteration: 1600 / 4000 [ 40%]  (Warmup)
Iteration: 2000 / 4000 [ 50%]  (Warmup)
Iteration: 2001 / 4000 [ 50%]  (Sampling)
Iteration: 2400 / 4000 [ 60%]  (Sampling)
Iteration: 2800 / 4000 [ 70%]  (Sampling)
Iteration: 3200 / 4000 [ 80%]  (Sampling)
Iteration: 3600 / 4000 [ 90%]  (Sampling)
Iteration: 4000 / 4000 [100%]  (Sampling)

 Elapsed Time: 0.421584 seconds (Warm-up)
               0.418319 seconds (Sampling)
               0.839903 seconds (Total)


SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 2).

Gradient evaluation took 1.7e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
Adjust your expectations accordingly!


Iteration:    1 / 4000 [  0%]  (Warmup)
Iteration:  400 / 4000 [ 10%]  (Warmup)
Iteration:  800 / 4000 [ 20%]  (Warmup)
Iteration: 1200 / 4000 [ 30%]  (Warmup)
Iteration: 1600 / 4000 [ 40%]  (Warmup)
Iteration: 2000 / 4000 [ 50%]  (Warmup)
Iteration: 2001 / 4000 [ 50%]  (Sampling)
Iteration: 2400 / 4000 [ 60%]  (Sampling)
Iteration: 2800 / 4000 [ 70%]  (Sampling)
Iteration: 3200 / 4000 [ 80%]  (Sampling)
Iteration: 3600 / 4000 [ 90%]  (Sampling)
Iteration: 4000 / 4000 [100%]  (Sampling)

 Elapsed Time: 0.532437 seconds (Warm-up)
               0.429048 seconds (Sampling)
               0.961485 seconds (Total)


SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 3).

Gradient evaluation took 1.7e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
Adjust your expectations accordingly!


Iteration:    1 / 4000 [  0%]  (Warmup)
Iteration:  400 / 4000 [ 10%]  (Warmup)
Iteration:  800 / 4000 [ 20%]  (Warmup)
Iteration: 1200 / 4000 [ 30%]  (Warmup)
Iteration: 1600 / 4000 [ 40%]  (Warmup)
Iteration: 2000 / 4000 [ 50%]  (Warmup)
Iteration: 2001 / 4000 [ 50%]  (Sampling)
Iteration: 2400 / 4000 [ 60%]  (Sampling)
Iteration: 2800 / 4000 [ 70%]  (Sampling)
Iteration: 3200 / 4000 [ 80%]  (Sampling)
Iteration: 3600 / 4000 [ 90%]  (Sampling)
Iteration: 4000 / 4000 [100%]  (Sampling)

 Elapsed Time: 0.434669 seconds (Warm-up)
               0.388257 seconds (Sampling)
               0.822926 seconds (Total)
tidyMCMC(data.rstan.sparsity, pars = c("beta[1]", "beta[2]"), conf.int = TRUE,
    conf.type = "HPDinterval", rhat = TRUE, ess = TRUE)
     term estimate std.error  conf.low conf.high      rhat  ess
1 beta[1] 2.895646 0.5093930 1.9116555  3.920760 1.0001824 2001
2 beta[2] 1.375823 0.4562855 0.4740256  2.260056 0.9992755 1865
library(bayesplot)
mcmc_areas(as.matrix(data.rstan.sparsity), regex_pars = "beta")
plot of chunk tut7.3bRSTANSparsity.2

Obviously, these data are not really appropriate for model selection as there are only two predictors, and both predictors have substantial posterior mass away from zero.
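To put a rough number on that statement, a minimal sketch (reusing the data.rstan.sparsity fit and the beta[1]/beta[2] parameter names from above) is to calculate the proportion of posterior draws for each coefficient that lie above zero:

# proportion of posterior draws above zero for each coefficient
beta.draws = as.matrix(data.rstan.sparsity, pars = c("beta[1]", "beta[2]"))
apply(beta.draws, 2, function(x) mean(x > 0))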

The RSTANARM implementation of horseshoe priors closely follows the recommendations of Piironen and Vehtari (2017). In particular, the global scale parameter is supplied as the ratio of the expected number of non-zero coefficients to the expected number of zero coefficients, divided by the square root of the number of observations (here 1/(2 - 1)/sqrt(100) = 0.1).

n = nrow(data)
nX = 2
p0 = 1
global_scale = p0/(nX - p0)/sqrt(n)
data.rstanarm.sparsity = stan_glm(y ~ cx1 + cx2, data = data, iter = 2000,
    warmup = 200, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0,
        100), prior = hs(df = 1, global_df = 1, global_scale = global_scale),
    prior_aux = cauchy(0, 2))
Gradient evaluation took 6e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.6 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.759242 seconds (Warm-up)
               3.96208 seconds (Sampling)
               4.72132 seconds (Total)


Gradient evaluation took 2e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.656009 seconds (Warm-up)
               3.67586 seconds (Sampling)
               4.33187 seconds (Total)


Gradient evaluation took 1.8e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.743072 seconds (Warm-up)
               2.80689 seconds (Sampling)
               3.54996 seconds (Total)
print(data.rstanarm.sparsity)
stan_glm
 family:  gaussian [identity]
 formula: y ~ cx1 + cx2
------

Estimates:
            Median MAD_SD
(Intercept) 3.8    0.1   
cx1         3.0    0.6   
cx2         1.2    0.5   
sigma       1.1    0.1   

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 3.8    0.2   

------
For info on the priors used see help('prior_summary.stanreg').
tidyMCMC(data.rstanarm.sparsity$stanfit, conf.int = TRUE, conf.method = "HPDinterval",
    rhat = TRUE, ess = TRUE)
           term    estimate  std.error     conf.low   conf.high      rhat  ess
1   (Intercept)    3.824113 0.11437874    3.5946995    4.042791 0.9993167 2673
2           cx1    3.053599 0.55226650    1.9556994    4.081797 1.0007825 2423
3           cx2    1.209863 0.49005692    0.1433270    2.096745 1.0024060 2196
4         sigma    1.154752 0.08662372    0.9797404    1.310512 0.9995021 2389
5      mean_PPD    3.826130 0.16242530    3.4951206    4.121557 0.9996932 2652
6 log-posterior -179.717342 2.43657813 -184.8057338 -175.828604 1.0013223 1438
library(bayesplot)
mcmc_areas(as.matrix(data.rstanarm.sparsity), regex_pars = "cx")
plot of chunk tut7.3bRSTANARMSparsity

Obviously, these data are not really appropriate for model selection as there are only two predictors, and both predictors have substantial posterior mass away from zero.

The BRMS implementation of horseshoe priors closely follows the recommendations of Piironen and Vehtari (2017). In particular, the global scale is controlled via the par_ratio argument, the ratio of the expected number of non-zero coefficients to the expected number of zero coefficients (here 1/(2 - 1) = 1).

nX = 2
p0 = 1
par_ratio = p0/(nX - p0)
data.brms.sparsity = brm(y ~ cx1 + cx2, data = data, iter = 2000, warmup = 200,
    chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 100),
        class = "Intercept"), prior(horseshoe(df = 1, par_ratio = par_ratio),
        class = "b"), prior(cauchy(0, 5), class = "sigma")))
Gradient evaluation took 3.7e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.37 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.116006 seconds (Warm-up)
               0.635486 seconds (Sampling)
               0.751492 seconds (Total)


Gradient evaluation took 1.7e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.122749 seconds (Warm-up)
               0.606301 seconds (Sampling)
               0.72905 seconds (Total)


Gradient evaluation took 1.9e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
Adjust your expectations accordingly!



 Elapsed Time: 0.12116 seconds (Warm-up)
               0.50536 seconds (Sampling)
               0.62652 seconds (Total)
print(data.brms.sparsity)
 Family: gaussian(identity) 
Formula: y ~ cx1 + cx2 
   Data: data (Number of observations: 100) 
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; 
         total post-warmup samples = 2700
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept     3.82      0.11     3.60     4.05       2419    1
cx1           2.96      0.53     1.95     4.03       2437    1
cx2           1.31      0.47     0.29     2.17       1815    1

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     1.15      0.08        1     1.33       2301    1

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
tidyMCMC(data.brms.sparsity$fit, conf.int = TRUE, conf.method = "HPDinterval",
    rhat = TRUE, ess = TRUE)
         term estimate  std.error  conf.low conf.high      rhat  ess
1 b_Intercept 3.818970 0.11359657 3.6086274  4.052044 0.9995747 2419
2       b_cx1 2.958484 0.52869532 1.9568866  4.031718 0.9999373 2437
3       b_cx2 1.305360 0.46931653 0.3701380  2.200090 1.0011298 1815
4       sigma 1.152482 0.08273292 0.9900069  1.317621 1.0017566 2301
5       hs_c2 2.183763 2.35770828 0.2967781  6.056975 1.0020866 1340
library(bayesplot)
mcmc_areas(as.matrix(data.brms.sparsity), regex_pars = "cx")
plot of chunk tut7.3bBRMSSparsity

Obviously, these data are not really appropriate for model selection as there are only two predictors, and both predictors have substantial posterior mass away from zero.

References

Gelman, A. (2005). “Analysis of Variance - Why it is More Important Than Ever”. In: The Annals of Statistics 33.1, pp. 1–53.

Gelman, A., B. Goodrich, J. Gabry, et al. (2017). “R-squared for Bayesian regression models”.

Piironen, J. and A. Vehtari (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. URL: http://arxiv.org/abs/1707.01694.

Vehtari, A., A. Gelman and J. Gabry (2016b). “Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC”. In: Statistics and Computing.




Worked Examples

Multiple and non-linear regression references
  • Logan (2010) - Chpt 9
  • Quinn & Keough (2002) - Chpt 6

Multiple Linear Regression

Paruelo & Lauenroth (1996) analyzed the geographic distribution and the effects of climate variables on the relative abundance of a number of plant functional types (PFTs), including shrubs, forbs, succulents (e.g. cacti), C3 grasses and C4 grasses. They used data from 73 sites across temperate central North America (see paruelo.csv) and calculated the relative abundance of C3 grasses at each site as a response variable.

Download Paruelo data set
Format of paruelo.csv data file

C3      LAT     LONG    MAP     MAT     JJAMAP  DJFMAP
...     ...     ...     ...     ...     ...     ...

C3      Relative abundance of C3 grasses at each site - response variable
LAT     Latitude in centesimal degrees - predictor variable
LONG    Longitude in centesimal degrees - predictor variable
MAP     Mean annual precipitation (mm) - predictor variable
MAT     Mean annual temperature (°C) - predictor variable
JJAMAP  Proportion of MAP that fell in June, July and August - predictor variable
DJFMAP  Proportion of MAP that fell in December, January and February - predictor variable

Open the paruelo data file. HINT.
paruelo <- read.table("../downloads/data/paruelo.csv", header = T, sep = ",", strip.white = T)
head(paruelo)
    C3   LAT   LONG MAP  MAT JJAMAP DJFMAP
1 0.65 46.40 119.55 199 12.4   0.12   0.45
2 0.65 47.32 114.27 469  7.5   0.24   0.29
3 0.76 45.78 110.78 536  7.2   0.24   0.20
4 0.75 43.95 101.87 476  8.2   0.35   0.15
5 0.33 46.90 102.82 484  4.8   0.40   0.14
6 0.03 38.87  99.38 623 12.0   0.40   0.11
  1. Perform exploratory data analysis to help guide what sort of analysis will be suitable and whether the various assumptions are likely to be met.
    # via car's scatterplotMatrix function
    library(car)
    scatterplotMatrix(~C3 + LAT + LONG + MAP + MAT + JJAMAP + DJFMAP, data = paruelo,
        diagonal = "boxplot")
    
    plot of chunk tut7.3bQ1.1
    # via lattice
    library(lattice)
    splom.lat <- splom(paruelo, type = c("p", "r"))
    print(splom.lat)
    
    plot of chunk tut7.3bQ1.1
    # via ggplot2 - warning these are slow!
    library(GGally)
    ggpairs(paruelo, lower = list(continuous = "smooth"), diag = list(continuous = "density"),
        axisLabels = "none")
    
    plot of chunk tut7.3bQ1.1
    # splom.gg <- plotmatrix(paruelo)+geom_smooth(method='lm')
    # print(splom.gg)
    
  2. C3 abundance is clearly non-normal. Since C3 abundance is a relative abundance (which logically must range from 0 to 1), arguably the most appropriate approach would be to model these data with a binomial (or perhaps beta) distribution. Indeed, this is the approach that we will take in Tutorial 10.4 and Tutorial 10.5a. A simpler approach that can be applied within ordinary OLS regression is to attempt to normalize the response variable via a scale transformation.

    Since the C3 relative abundances include values of zero, the authors elected to perform a square-root transformation. Generally speaking, this can be a very dangerous course of action if back-transformations from the fitted model are required, because squaring a set of numbers that mixes negative and positive values (or values below and above 1) does not preserve their original order.
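    As a small, purely illustrative sketch of the problem (the numbers below are made up), squaring a set of predictions on the square-root scale that includes negative values does not preserve their order once back-transformed:

    # hypothetical predictions on the square-root scale
    pred.sqrt = c(-0.3, -0.1, 0.2, 0.5)
    # back-transformed by squaring: 0.09 0.01 0.04 0.25 - the original ordering is lost
    pred.sqrt^2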

    This case therefore potentially serves as a good illustration of the dangers of root transformations. Try applying a temporary square root transformation (HINT). Does this improve some of these specific assumptions (y or n)?

    Whilst many model fitting and graphing routines are able to perform transformations inline, for more complex examples it is often advisable to also create transformed versions of the variables.

    # via car's scatterplotMatrix function
    library(car)
    scatterplotMatrix(~sqrt(C3) + LAT + LONG + MAP + MAT + JJAMAP + log10(DJFMAP),
        data = paruelo, diagonal = "boxplot")
    
    plot of chunk tut7.3bQ1.1a
    # via ggplot2 - warning these are slow!
    library(GGally)
    library(dplyr)
    paruelo = paruelo %>% mutate(sqrtC3 = sqrt(C3), lDJFMAP = log10(DJFMAP))
    paruelo %>% dplyr:::select(sqrtC3, LAT, LONG, MAP, MAT, lDJFMAP) %>% ggpairs(lower = list(continuous = "smooth"),
        diag = list(continuous = "density"), axisLabels = "none")
    
    plot of chunk tut7.3bQ1.1a
  3. The scatterplot matrices suggest that some of the predictors might be correlated with one another. Although the above diagnostics are useful for identifying potential (multi)collinearity issues, they do not examine collinearity directly. (Multi)collinearity can be diagnosed more directly via tolerance and variance inflation factor (VIF) measures.

    1. Calculate the VIF values for each of the predictor variables (note, this is typically done in a frequentist framework to save time HINT).
    2. Calculate the tolerance values for each of the predictor variables (HINT).
      library(car)
      vif(lm(sqrt(C3) ~ LAT + LONG + MAP + MAT + JJAMAP + log10(DJFMAP), data = paruelo))
      
                LAT          LONG           MAP           MAT        JJAMAP log10(DJFMAP) 
           3.560103      4.988318      2.794157      3.752353      3.194724      5.467330 
      
      library(car)
      1/vif(lm(sqrt(C3) ~ LAT + LONG + MAP + MAT + JJAMAP + log10(DJFMAP), data = paruelo))
      
                LAT          LONG           MAP           MAT        JJAMAP log10(DJFMAP) 
          0.2808908     0.2004684     0.3578897     0.2664995     0.3130161     0.1829046 
      
    3. Obviously, this model will suffer from collinearity: it is highly likely that LAT and LONG will each be correlated with the LAT:LONG interaction term. It turns out that if we center the variables, the individual terms will no longer be correlated with the interaction. Center the LAT and LONG variables (HINT) and (HINT); a quick VIF comparison of raw versus centered predictors is sketched after the code below.
      paruelo = paruelo %>% mutate(cLAT = as.vector(scale(paruelo$LAT, scale = F)),
          cLONG = as.vector(scale(paruelo$LONG, scale = F)))
      mean.LAT = mean(paruelo$LAT)
      mean.LONG = mean(paruelo$LONG)
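      As a quick check of this claim (a sketch only, reusing the objects defined above), we could compare the VIFs from the interaction model fitted with the raw versus the centered predictors - centering should substantially reduce the apparent collinearity between the main effects and the interaction:

      library(car)
      # VIFs with raw predictors: main effects strongly related to the interaction
      vif(lm(sqrt(C3) ~ LAT * LONG, data = paruelo))
      # VIFs with centered predictors: collinearity with the interaction greatly reduced
      vif(lm(sqrt(C3) ~ cLAT * cLONG, data = paruelo))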
      
  4. Fit the appropriate Bayesian model.
    library(MCMCpack)
    paruelo.mcmcpack = MCMCregress(sqrt(C3) ~ cLAT * cLONG, data = paruelo)
    
    modelString = "
      model {
      #Likelihood
      for (i in 1:n) {
      y[i]~dnorm(mu[i],tau)
      mu[i] <- beta0 + inprod(beta[],X[i,])
      }
      #Priors
      beta0 ~ dnorm(0.01,1.0E-6)
      for (j in 1:nX) {
      beta[j] ~ dnorm(0.01,1.0E-6)
      }
      tau <- 1 / (sigma * sigma)
      sigma~dunif(0,100)
      }
      "
    
    X = model.matrix(~cLAT * cLONG, data = paruelo)
    paruelo.list <- with(paruelo, list(y = sqrt(C3), X = X[, -1], nX = ncol(X) -
        1, n = nrow(paruelo)))
    
    params <- c("beta0", "beta", "sigma")
    burnInSteps = 3000
    nChains = 3
    numSavedSteps = 15000
    thinSteps = 10
    nIter = ceiling((numSavedSteps * thinSteps)/nChains)
    
    paruelo.r2jags <- jags(data = paruelo.list, inits = NULL, parameters.to.save = params,
        model.file = textConnection(modelString), n.chains = nChains, n.iter = nIter,
        n.burnin = burnInSteps, n.thin = thinSteps)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
    Graph information:
       Observed stochastic nodes: 73
       Unobserved stochastic nodes: 5
       Total graph size: 530
    
    Initializing model
    
    					  modelString=" 
    					  data { 
    					  int n;   // total number of observations 
    					  vector[n] Y;      // response variable 
    					  int nX;  // number of effects 
    					  matrix[n, nX] X;   // model matrix 
    					  } 
    					  transformed data { 
    					  matrix[n, nX - 1] Xc;  // centered version of X 
    					  vector[nX - 1] means_X;  // column means of X before centering 
    					  
    					  for (i in 2:nX) { 
    					  means_X[i - 1] = mean(X[, i]); 
    					  Xc[, i - 1] = X[, i] - means_X[i - 1]; 
    					  }  
    					  } 
    					  parameters { 
    					  vector[nX-1] beta;  // population-level effects 
    					  real cbeta0;  // center-scale intercept 
    					  real sigma;  // residual SD 
    					  } 
    					  transformed parameters { 
    					  } 
    					  model { 
    					  vector[n] mu; 
    					  mu = Xc * beta + cbeta0; 
    					  // prior specifications 
    					  beta ~ normal(0, 10); 
    					  cbeta0 ~ normal(0, 10); 
    					  sigma ~ cauchy(0, 5); 
    					  // likelihood contribution 
    					  Y ~ normal(mu, sigma); 
    					  } 
    					  generated quantities { 
    					  real beta0;  // population-level intercept 
    					  vector[n] log_lik;
    					  beta0 = cbeta0 - dot_product(means_X, beta);
    					  for (i in 1:n) {
    					  log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma);
    					  }
    					  }
    					  "
    
    X = model.matrix(~cLAT * cLONG, data = paruelo)
    paruelo.list <- with(paruelo, list(Y = sqrt(C3), X = X, nX = ncol(X), n = nrow(paruelo)))
    
    library(rstan)
    paruelo.rstan <- stan(data = paruelo.list, model_code = modelString, chains = 3, iter = 5000, warmup = 500,
        thin = 2)
    
    In file included from /usr/local/lib/R/site-library/BH/include/boost/config.hpp:39:0,
                     from /usr/local/lib/R/site-library/BH/include/boost/math/tools/config.hpp:13,
                     from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/var.hpp:7,
                     from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core/gevv_vvv_vari.hpp:5,
                     from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/core.hpp:12,
                     from /usr/local/lib/R/site-library/StanHeaders/include/stan/math/rev/mat.hpp:4,
                     from /usr/local/lib/R/site-library/StanHeaders/include/stan/math.hpp:4,
                     from /usr/local/lib/R/site-library/StanHeaders/include/src/stan/model/model_header.hpp:4,
                     from file48e38d66a6e.cpp:8:
    /usr/local/lib/R/site-library/BH/include/boost/config/compiler/gcc.hpp:186:0: warning: "BOOST_NO_CXX11_RVALUE_REFERENCES" redefined
     #  define BOOST_NO_CXX11_RVALUE_REFERENCES
     ^
    <command-line>:0:0: note: this is the location of the previous definition
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 1).
    
    Gradient evaluation took 2.9e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.29 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration:  501 / 5000 [ 10%]  (Sampling)
    Iteration: 1000 / 5000 [ 20%]  (Sampling)
    Iteration: 1500 / 5000 [ 30%]  (Sampling)
    Iteration: 2000 / 5000 [ 40%]  (Sampling)
    Iteration: 2500 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.139951 seconds (Warm-up)
                   0.643681 seconds (Sampling)
                   0.783632 seconds (Total)
    
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 2).
    
    Gradient evaluation took 1.1e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration:  501 / 5000 [ 10%]  (Sampling)
    Iteration: 1000 / 5000 [ 20%]  (Sampling)
    Iteration: 1500 / 5000 [ 30%]  (Sampling)
    Iteration: 2000 / 5000 [ 40%]  (Sampling)
    Iteration: 2500 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.108938 seconds (Warm-up)
                   0.497058 seconds (Sampling)
                   0.605996 seconds (Total)
    
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 3).
    
    Gradient evaluation took 9e-06 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration:  501 / 5000 [ 10%]  (Sampling)
    Iteration: 1000 / 5000 [ 20%]  (Sampling)
    Iteration: 1500 / 5000 [ 30%]  (Sampling)
    Iteration: 2000 / 5000 [ 40%]  (Sampling)
    Iteration: 2500 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.099245 seconds (Warm-up)
                   0.667992 seconds (Sampling)
                   0.767237 seconds (Total)
    
    paruelo.rstanarm = stan_glm(sqrt(C3) ~ cLAT * cLONG, data = paruelo, iter = 5000,
        warmup = 500, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0,
            10), prior = normal(0, 10), prior_aux = cauchy(0, 5))
    
    Gradient evaluation took 3.9e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.39 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.043872 seconds (Warm-up)
                   0.263119 seconds (Sampling)
                   0.306991 seconds (Total)
    
    
    Gradient evaluation took 1.3e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.13 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.042429 seconds (Warm-up)
                   0.251179 seconds (Sampling)
                   0.293608 seconds (Total)
    
    
    Gradient evaluation took 1.5e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.045349 seconds (Warm-up)
                   0.273638 seconds (Sampling)
                   0.318987 seconds (Total)
    
    paruelo.brm = brm(sqrt(C3) ~ cLAT * cLONG, data = paruelo, iter = 5000,
        warmup = 500, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0,
            10), class = "Intercept"), prior(normal(0, 10), class = "b"), prior(cauchy(0,
            5), class = "sigma")))
    
    Gradient evaluation took 3.4e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.097905 seconds (Warm-up)
                   0.553687 seconds (Sampling)
                   0.651592 seconds (Total)
    
    
    Gradient evaluation took 9e-06 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.09545 seconds (Warm-up)
                   0.488293 seconds (Sampling)
                   0.583743 seconds (Total)
    
    
    Gradient evaluation took 1.9e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.099694 seconds (Warm-up)
                   0.486619 seconds (Sampling)
                   0.586313 seconds (Total)
    
  5. Explore MCMC diagnostics
    library(MCMCpack)
    plot(paruelo.mcmcpack)
    
    plot of chunk tut7.3bQ1.6a
    plot of chunk tut7.3bQ1.6a
    raftery.diag(paruelo.mcmcpack)
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                       
                 Burn-in  Total Lower bound  Dependence
                 (M)      (N)   (Nmin)       factor (I)
     (Intercept) 2        3834  3746         1.020     
     cLAT        2        3865  3746         1.030     
     cLONG       2        3741  3746         0.999     
     cLAT:cLONG  2        3741  3746         0.999     
     sigma2      2        3711  3746         0.991     
    
    autocorr.diag(paruelo.mcmcpack)
    
            (Intercept)         cLAT        cLONG    cLAT:cLONG        sigma2
    Lag 0   1.000000000  1.000000000  1.000000000  1.0000000000  1.0000000000
    Lag 1   0.004983757  0.003311830 -0.022298086 -0.0095335071  0.0453269793
    Lag 5   0.004865672 -0.003696004 -0.010582477 -0.0091698711 -0.0007370183
    Lag 10 -0.003554399 -0.006906004 -0.008250076  0.0070361094  0.0060836234
    Lag 50  0.006350222  0.002874780  0.003297339 -0.0005618509  0.0095939327
    
    library(R2jags)
    library(coda)
    paruelo.mcmc = as.mcmc(paruelo.r2jags)
    plot(paruelo.mcmc)
    
    plot of chunk tut7.3bQ1.6b
    plot of chunk tut7.3bQ1.6b
    raftery.diag(paruelo.mcmc)
    
    [[1]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                    
              Burn-in  Total Lower bound  Dependence
              (M)      (N)   (Nmin)       factor (I)
     beta0    20       39680 3746         10.6      
     beta[1]  20       39000 3746         10.4      
     beta[2]  10       37660 3746         10.1      
     beta[3]  10       37660 3746         10.1      
     deviance 10       37660 3746         10.1      
     sigma    20       38330 3746         10.2      
    
    
    [[2]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                    
              Burn-in  Total Lower bound  Dependence
              (M)      (N)   (Nmin)       factor (I)
     beta0    10       37660 3746         10.10     
     beta[1]  20       37020 3746          9.88     
     beta[2]  10       37660 3746         10.10     
     beta[3]  20       39680 3746         10.60     
     deviance 20       39680 3746         10.60     
     sigma    10       37670 3746         10.10     
    
    
    [[3]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                    
              Burn-in  Total Lower bound  Dependence
              (M)      (N)   (Nmin)       factor (I)
     beta0    20       36380 3746          9.71     
     beta[1]  20       37020 3746          9.88     
     beta[2]  10       37660 3746         10.10     
     beta[3]  20       38330 3746         10.20     
     deviance 20       38330 3746         10.20     
     sigma    10       37660 3746         10.10     
    
    autocorr.diag(paruelo.mcmc)
    
                   beta0      beta[1]      beta[2]      beta[3]     deviance        sigma
    Lag 0    1.000000000  1.000000000  1.000000000  1.000000000  1.000000000  1.000000000
    Lag 10   0.010786700  0.005332811 -0.009595787  0.002435309  0.009380278  0.006592760
    Lag 50   0.004340032 -0.002413859 -0.008135064 -0.005542328 -0.003773973 -0.002048602
    Lag 100  0.003071248 -0.003459770 -0.005733956  0.007969559  0.004984708  0.006517454
    Lag 500 -0.016464095 -0.017355263  0.004811424 -0.011833396 -0.004070388  0.017952813
    
    library(rstan)
    library(coda)
    s = as.array(paruelo.rstan)
    paruelo.mcmc <- do.call(mcmc.list, plyr:::alply(s[, , c("beta0", "beta[1]", "beta[2]", "sigma")], 2,
        as.mcmc))
    plot(paruelo.mcmc)
    
    plot of chunk tut7.3bQ1.6c
    raftery.diag(paruelo.mcmc)
    
    $`1`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    $`2`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    $`3`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    autocorr.diag(paruelo.mcmc)
    
                  beta0      beta[1]      beta[2]        sigma
    Lag 0   1.000000000  1.000000000  1.000000000  1.000000000
    Lag 1   0.115904479  0.006334995  0.007386692  0.112083619
    Lag 5  -0.039772609  0.006114161 -0.012930676 -0.015401901
    Lag 10  0.009567958 -0.004416943 -0.001078328  0.001545453
    Lag 50  0.024857904  0.027040761 -0.004307904 -0.026240662
    
    library(rstan)
    library(coda)
    stan_ac(paruelo.rstan, pars = c("beta", "sigma"))
    
    plot of chunk tut7.3bQ1.6c1
    stan_rhat(paruelo.rstan, pars = c("beta", "sigma"))
    
    plot of chunk tut7.3bQ1.6c1
    stan_ess(paruelo.rstan, pars = c("beta", "sigma"))
    
    plot of chunk tut7.3bQ1.6c1
    # using bayesplot
    library(bayesplot)
    detach("package:reshape")
    mcmc_trace(as.array(paruelo.rstan), regex_pars = "beta|sigma")
    
    plot of chunk tut7.3bQ1.6c2
    mcmc_dens(as.array(paruelo.rstan), regex_pars = "beta|sigma")
    
    plot of chunk tut7.3bQ1.6c2
    detach("package:reshape")
    library(bayesplot)
    mcmc_combo(as.array(paruelo.rstan), regex_pars = "beta|sigma")
    
    plot of chunk tut7.3bQ1.6c3
    library(rstanarm)
    library(coda)
    s = as.array(paruelo.rstanarm)
    paruelo.mcmc <- do.call(mcmc.list, plyr:::alply(s[, , c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG",
        "sigma")], 2, as.mcmc))
    plot(paruelo.mcmc)
    
    plot of chunk tut7.3bQ1.6d
    plot of chunk tut7.3bQ1.6d
    raftery.diag(paruelo.mcmc)
    
    $`1`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    $`2`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    $`3`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    autocorr.diag(paruelo.mcmc)
    
            (Intercept)        cLAT        cLONG  cLAT:cLONG        sigma
    Lag 0   1.000000000  1.00000000  1.000000000  1.00000000  1.000000000
    Lag 1   0.017538076  0.03784572  0.039597169  0.01947417  0.013387926
    Lag 5  -0.022630503 -0.01672235 -0.006182728 -0.03032264 -0.002027127
    Lag 10  0.031736657  0.01612416  0.015318241 -0.01570669 -0.022527541
    Lag 50 -0.005001375  0.02067267 -0.015023796 -0.02282986  0.008570300
    
    library(rstanarm)
    library(coda)
    stan_ac(paruelo.rstanarm, pars = c("Intercept", "cL", "sigma"))
    
    Error in data.frame(value = unlist(x[[i]], use.names = FALSE), parameter = rep(names(x[[i]]), : arguments imply differing number of rows: 2249, 0, 1
    
    stan_rhat(paruelo.rstanarm, pars = c("Intercept", "cL", "sigma"))
    
    Error in check_pars(allpars, pars): no parameter Intercept, cL
    
    stan_ess(paruelo.rstanarm, pars = c("Intercept", "cL", "sigma"))
    
    Error in check_pars(allpars, pars): no parameter Intercept, cL
    
    # using bayesplot
    library(bayesplot)
    detach("package:reshape")
    mcmc_trace(as.array(paruelo.rstanarm), regex_pars = "Intercept|cL|sigma")
    
    plot of chunk tut7.3bQ1.6d2
    mcmc_dens(as.array(paruelo.rstanarm), regex_pars = "Intercept|cL|sigma")
    
    plot of chunk tut7.3bQ1.6d2
    detach("package:reshape")
    library(bayesplot)
    mcmc_combo(as.array(paruelo.rstanarm), regex_pars = "Intercept|cL|sigma")
    
    plot of chunk tut7.3bQ1.6d3
    library(coda)
    library(brms)
    paruelo.mcmc = as.mcmc(paruelo.brm)
    plot(paruelo.mcmc)
    
    plot of chunk tut7.3bQ1.6e
    plot of chunk tut7.3bQ1.6e
    raftery.diag(paruelo.mcmc)
    
    [[1]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    [[2]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    [[3]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    autocorr.diag(paruelo.mcmc)
    
    Error in ts(x, start = start(x), end = end(x), deltat = thin(x)): invalid time series parameters specified
    
    library(coda)
    stan_ac(paruelo.brm$fit)
    
    plot of chunk tut7.2bQ1.3e1
    stan_rhat(paruelo.brm$fit)
    
    plot of chunk tut7.2bQ1.3e1
    stan_ess(paruelo.brm$fit)
    
    plot of chunk tut7.2bQ1.3e1
  6. Perform model validation
    library(MCMCpack)
    paruelo.mcmc = as.data.frame(paruelo.mcmcpack)
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = apply(paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")], 2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = sqrt(paruelo$C3) - fit
    ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
    
    plot of chunk tut7.3bQ1.7a1
    library(MCMCpack)
    paruelo.mcmc = as.data.frame(paruelo.mcmcpack)
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = apply(paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")], 2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = sqrt(paruelo$C3) - fit
    newdata = newdata %>% cbind(fit, resid)
    newdata.melt = newdata %>% gather(key = Pred, value = Value, cLAT:cLONG)
    ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7a2
    library(MCMCpack)
    paruelo.mcmc = as.data.frame(paruelo.mcmcpack)
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = apply(paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")], 2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = sqrt(paruelo$C3) - fit
    sresid = resid/sd(resid)
    ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
    
    plot of chunk tut7.3bQ1.7a3
    library(MCMCpack)
    paruelo.mcmc = as.data.frame(paruelo.mcmcpack)
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = as.matrix(paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")])
    fit = coefs %*% t(Xmat)
    ## draw samples from this model
    yRep = sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(paruelo), fit[i,
        ], sqrt(paruelo.mcmc[i, "sigma2"])))
    ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"),
        alpha = 0.5) + geom_density(data = paruelo, aes(x = sqrt(C3), fill = "Obs"),
        alpha = 0.5)
    
    plot of chunk tut7.3bQ1.7a4

    We can also explore the posteriors of each parameter.

    library(bayesplot)
    mcmc_intervals(as.matrix(paruelo.mcmcpack), regex_pars = "cL")
    
    plot of chunk tut7.3bQ1.7a5
    mcmc_areas(as.matrix(paruelo.mcmcpack), regex_pars = "cL")
    
    plot of chunk tut7.3bQ1.7a5
    library(R2jags)
    paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = sqrt(paruelo$C3) - fit
    ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
    
    plot of chunk tut7.3bQ1.7b1
    paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = sqrt(paruelo$C3) - fit
    newdata = newdata %>% cbind(fit, resid)
    newdata.melt = newdata %>% gather(key = Pred, value = Value, cLAT:cLONG)
    ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7b2
    paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = sqrt(paruelo$C3) - fit
    sresid = resid/sd(resid)
    ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
    
    plot of chunk tut7.3bQ1.7b3
    paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = as.matrix(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")])
    fit = coefs %*% t(Xmat)
    ## draw samples from this model
    yRep = sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(paruelo), fit[i,
        ], paruelo.mcmc[i, "sigma"]))
    ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"),
        alpha = 0.5) + geom_density(data = paruelo, aes(x = sqrt(C3), fill = "Obs"),
        alpha = 0.5)
    
    plot of chunk tut7.3bQ1.7b4

    Let's see how well data simulated from the model reflect the raw data.

    paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>%
        dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix
    coefs = paruelo.mcmc[, 1:4]
    newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT),
        max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG),
        max(cLONG), len = 100))))
    Xmat = model.matrix(~cLAT * cLONG, data = newdata)
    fit = coefs %*% t(Xmat)
    # add noise for prediction instead of confidence
    fit = t(sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(newdata),
        fit[i, ], paruelo.mcmc[i, "sigma"])))
    newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
        conf.method = "HPDinterval"))
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG) %>% filter(Value != 0)
    paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG)
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt,
        aes(y = sqrt(C3))) + geom_line() + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7b5

    We can also explore the posteriors of each parameter.

    library(bayesplot)
    paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix
    mcmc_intervals(paruelo.mcmc, regex_pars = "beta")
    
    plot of chunk tut7.3bQ1.7b6
    mcmc_areas(paruelo.mcmc, regex_pars = "beta")
    
    plot of chunk tut7.3bQ1.7b6
    paruelo.mcmc = as.data.frame(paruelo.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>%
        as.matrix
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = sqrt(paruelo$C3) - fit
    ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
    
    plot of chunk tut7.3bQ1.7c1
    paruelo.mcmc = as.data.frame(paruelo.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>%
        as.matrix
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = sqrt(paruelo$C3) - fit
    newdata = newdata %>% cbind(fit, resid)
    newdata.melt = newdata %>% gather(key = Pred, value = Value, cLAT:cLONG)
    ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7c2
    paruelo.mcmc = as.data.frame(paruelo.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>%
        as.matrix
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = apply(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")], 2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = sqrt(paruelo$C3) - fit
    sresid = resid/sd(resid)
    ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
    
    plot of chunk tut7.3bQ1.7c3
    paruelo.mcmc = as.data.frame(paruelo.rstan) %>% dplyr:::select(beta0, starts_with("beta"),
        sigma) %>% as.matrix
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = as.matrix(paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")])
    fit = coefs %*% t(Xmat)
    ## draw samples from this model
    yRep = sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(paruelo), fit[i,
        ], paruelo.mcmc[i, "sigma"]))
    ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"),
        alpha = 0.5) + geom_density(data = paruelo, aes(x = sqrt(C3), fill = "Obs"),
        alpha = 0.5)
    
    plot of chunk tut7.3bQ1.7c4

    Let's see how well data simulated from the model reflect the raw data.

    paruelo.mcmc = as.data.frame(paruelo.rstan) %>% dplyr:::select(beta0,
        starts_with("beta"), sigma) %>% as.matrix
    coefs = paruelo.mcmc[, 1:4]
    newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT),
        max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG),
        max(cLONG), len = 100))))
    Xmat = model.matrix(~cLAT * cLONG, data = newdata)
    fit = coefs %*% t(Xmat)
    # add noise for prediction instead of confidence
    fit = t(sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(newdata),
        fit[i, ], paruelo.mcmc[i, "sigma"])))
    newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
        conf.method = "HPDinterval"))
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG) %>% filter(Value != 0)
    paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG)
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt,
        aes(y = sqrt(C3))) + geom_line() + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7c5

    And on a natural scale (back-transformed)

    newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT),
        max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG),
        max(cLONG), len = 100))))
    fit = fit^2
    newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
        conf.method = "HPDinterval"))
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG) %>% filter(Value != 0)
    paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG)
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt,
        aes(y = C3)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7c5a

    We can also explore the posteriors of each parameter.

    library(bayesplot)
    paruelo.mcmc = as.matrix(paruelo.rstan)
    mcmc_intervals(paruelo.mcmc, regex_pars = "beta\\[")
    
    plot of chunk tut7.3bQ1.7c6
    mcmc_areas(paruelo.mcmc, regex_pars = "beta\\[")
    
    plot of chunk tut7.3bQ1.7c6
    resid = resid(paruelo.rstanarm)
    fit = fitted(paruelo.rstanarm)
    ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
    
    plot of chunk tut7.3bQ1.7d1
    resid = resid(paruelo.rstanarm)
    paruelo.melt = paruelo %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cLAT:cLONG)
    ggplot(paruelo.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7d2
    resid = resid(paruelo.rstanarm)
    sresid = resid/sd(resid)
    fit = fitted(paruelo.rstanarm)
    ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
    
    plot of chunk tut7.3bQ1.7d3
    y_pred = posterior_predict(paruelo.rstanarm)
    newdata = paruelo %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -C3:-cLONG)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val", cLAT:cLONG)
    paruelo.melt = paruelo %>% gather(key = "Pred", value = "Pred_val", cLAT:cLONG)
    ggplot(newdata.melt, aes(Value, x = Pred_val)) + geom_violin(color = "blue", fill = "blue", alpha = 0.5) +
        geom_violin(data = paruelo.melt, aes(y = sqrt(C3), x = Pred_val), fill = "red", color = "red", alpha = 0.5) +
        facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7d4a
    paruelo.mcmc = as.data.frame(paruelo.rstanarm) %>% dplyr:::select(matches("Inter"),
        starts_with("cL"), sigma) %>% as.matrix
    coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    fit = coefs %*% t(Xmat)
    ## draw samples from this model
    yRep = sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(paruelo), fit[i,
        ], paruelo.mcmc[i, "sigma"]))
    ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"),
        alpha = 0.5) + geom_density(data = paruelo, aes(x = sqrt(C3), fill = "Obs"),
        alpha = 0.5)
    
    plot of chunk tut7.3bQ1.7d4

    Let's see how well data simulated from the model reflect the raw data.

    newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT),
        max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG),
        max(cLONG), len = 100))))
    fit = posterior_predict(paruelo.rstanarm, newdata = newdata)
    newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
        conf.method = "HPDinterval"))
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG) %>% filter(Value != 0)
    paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG)
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt,
        aes(y = sqrt(C3))) + geom_line() + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7d5

    And on a natural scale (back-transformed)

    newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT),
        max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG),
        max(cLONG), len = 100))))
    fit = posterior_predict(paruelo.rstanarm, newdata = newdata)^2
    newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
        conf.method = "HPDinterval"))
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG) %>% filter(Value != 0)
    paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG)
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt,
        aes(y = C3)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7d5a

    We can also explore the posteriors of each parameter.

    library(bayesplot)
    paruelo.mcmc = as.matrix(paruelo.rstanarm)
    mcmc_intervals(paruelo.mcmc, regex_pars = "cL")
    
    plot of chunk tut7.3bQ1.7d6
    mcmc_areas(paruelo.mcmc, regex_pars = "cL")
    
    plot of chunk tut7.3bQ1.7d6
    resid = resid(paruelo.brm)[, "Estimate"]
    fit = fitted(paruelo.brm)[, "Estimate"]
    ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
    
    plot of chunk tut7.3bQ1.7e1
    resid = resid(paruelo.brm)[, "Estimate"]
    paruelo.melt = paruelo %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cLAT:cLONG)
    ggplot(paruelo.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7e2
    resid = resid(paruelo.brm)[, "Estimate"]
    sresid = resid/sd(resid)
    fit = fitted(paruelo.brm)[, "Estimate"]
    ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
    
    plot of chunk tut7.3bQ1.7e3
    y_pred = posterior_predict(paruelo.brm)
    newdata = paruelo %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value", -C3:-cLONG)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val", cLAT:cLONG)
    paruelo.melt = paruelo %>% gather(key = "Pred", value = "Pred_val", cLAT:cLONG)
    ggplot(newdata.melt, aes(Value, x = Pred_val)) + geom_violin(color = "blue", fill = "blue", alpha = 0.5) +
        geom_violin(data = paruelo.melt, aes(y = sqrt(C3), x = Pred_val), fill = "red", color = "red", alpha = 0.5) +
        facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7e4a
    paruelo.mcmc = as.data.frame(paruelo.brm) %>% dplyr:::select(matches("Inter"),
        starts_with("b_"), sigma) %>% as.matrix
    coefs = paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]
    
    Error in paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]: subscript out of bounds
    
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    fit = coefs %*% t(Xmat)
    ## draw samples from this model
    yRep = sapply(1:nrow(paruelo.mcmc), function(i) rnorm(nrow(paruelo), fit[i,
        ], paruelo.mcmc[i, "sigma"]))
    ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"),
        alpha = 0.5) + geom_density(data = paruelo, aes(x = sqrt(C3), fill = "Obs"),
        alpha = 0.5)
    
    plot of chunk tut7.3bQ1.7e4

    Let's see how well data simulated from the model reflect the raw data.

    newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT),
        max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG),
        max(cLONG), len = 100))))
    fit = posterior_predict(paruelo.brm, newdata = newdata)
    newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
        conf.method = "HPDinterval"))
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG) %>% filter(Value != 0)
    paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG)
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt,
        aes(y = sqrt(C3))) + geom_line() + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7e5

    And on a natural scale (back-transformed)

    newdata = with(paruelo, rbind(data.frame(cLAT = seq(min(cLAT),
        max(cLAT), len = 100), cLONG = 0), data.frame(cLAT = 0, cLONG = seq(min(cLONG),
        max(cLONG), len = 100))))
    fit = posterior_predict(paruelo.brm, newdata = newdata)^2
    newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
        conf.method = "HPDinterval"))
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG) %>% filter(Value != 0)
    paruelo.melt = paruelo %>% gather(key = "Pred", value = "Value",
        cLAT:cLONG)
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = paruelo.melt,
        aes(y = C3)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("") + theme_classic() + facet_wrap(~Pred)
    
    plot of chunk tut7.3bQ1.7e5a

    We can also explore the posteriors of each parameter.

    library(bayesplot)
    paruelo.mcmc = as.matrix(paruelo.brm)
    mcmc_intervals(paruelo.mcmc, regex_pars = "cL")
    
    plot of chunk tut7.3bQ1.7e6
    mcmc_areas(paruelo.mcmc, regex_pars = "cL")
    
    plot of chunk tut7.3bQ1.7e6
  7. Whilst there are no real issues with the residuals, the violin plots and predicted trends should raise alarm bells. On the square-root scale, predicted values associated with low cLAT are less than 0. When a mixture of estimates above and below zero is back-transformed onto the natural scale, negative values become positive, and so the order of the data is not preserved by the back-transform. This is not good.

    Although an odd-order root transformation (e.g. a cube root) would preserve the sign, and hence the order, of the estimates, it is of course not logical for C3 abundance to be less than 0. We will ignore this issue for now, but note again that the Gaussian approach is probably inappropriate.
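    For illustration only, the following sketch shows how an odd-order (cube) root keeps the sign, and therefore the ordering, of values through a back-transform. Note that in R a fractional exponent applied to a negative base returns NaN, so the sign must be handled explicitly:

    x = c(-0.3, -0.1, 0.2, 0.5)
    # signed cube root (x^(1/3) alone would return NaN for negative x)
    cube.root = sign(x) * abs(x)^(1/3)
    cube.root^3  # back-transformed: -0.3 -0.1 0.2 0.5 - order and sign preserved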

  8. Explore parameter estimates
    library(MCMCpack)
    summary(paruelo.mcmcpack)
    
    Iterations = 1001:11000
    Thinning interval = 1 
    Number of chains = 1 
    Sample size per chain = 10000 
    
    1. Empirical mean and standard deviation for each variable,
       plus standard error of the mean:
    
                     Mean        SD  Naive SE Time-series SE
    (Intercept)  0.428262 0.0238363 2.384e-04      2.384e-04
    cLAT         0.043690 0.0049880 4.988e-05      4.988e-05
    cLONG       -0.002911 0.0036879 3.688e-05      3.607e-05
    cLAT:cLONG   0.002282 0.0007591 7.591e-06      7.591e-06
    sigma2       0.040774 0.0071573 7.157e-05      7.416e-05
    
    2. Quantiles for each variable:
    
                      2.5%       25%       50%        75%    97.5%
    (Intercept)  0.3823311  0.412255  0.428315  0.4438954 0.476045
    cLAT         0.0339328  0.040366  0.043678  0.0469825 0.053461
    cLONG       -0.0101961 -0.005383 -0.002868 -0.0004278 0.004209
    cLAT:cLONG   0.0008195  0.001773  0.002275  0.0027832 0.003791
    sigma2       0.0291154  0.035663  0.039970  0.0450441 0.056792
    
    library(broom)
    tidyMCMC(paruelo.mcmcpack, conf.int = TRUE, conf.method = "HPDinterval")
    
             term     estimate    std.error      conf.low   conf.high
    1 (Intercept)  0.428262064 0.0238363306  0.3812121953 0.474471951
    2        cLAT  0.043690358 0.0049880391  0.0337928327 0.053291476
    3       cLONG -0.002910852 0.0036879275 -0.0103440322 0.003966760
    4  cLAT:cLONG  0.002282089 0.0007591007  0.0008117709 0.003776196
    5      sigma2  0.040774221 0.0071572530  0.0278700035 0.054891530
    
    mcmcpvalue(paruelo.mcmcpack[, "cLAT"])
    
    [1] 0
    
    mcmcpvalue(paruelo.mcmcpack[, "cLONG"])
    
    [1] 0.4271
    
    mcmcpvalue(paruelo.mcmcpack[, "cLAT:cLONG"])
    
    [1] 0.0037
    
    print(paruelo.r2jags)
    
    Inference for Bugs model at "5", fit using jags,
     3 chains, each with 50000 iterations (first 3000 discarded), n.thin = 10
     n.sims = 14100 iterations saved
             mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
    beta[1]    0.044   0.005   0.034   0.040   0.044   0.047   0.054 1.001 14000
    beta[2]   -0.003   0.004  -0.010  -0.005  -0.003   0.000   0.005 1.001 14000
    beta[3]    0.002   0.001   0.001   0.002   0.002   0.003   0.004 1.001  5800
    beta0      0.428   0.024   0.381   0.412   0.428   0.445   0.476 1.001 14000
    sigma      0.203   0.018   0.172   0.190   0.201   0.214   0.241 1.001  8500
    deviance -27.267   3.349 -31.694 -29.726 -27.961 -25.557 -19.074 1.001 13000
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 5.6 and DIC = -21.7
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    library(broom)
    tidyMCMC(as.mcmc(paruelo.r2jags), conf.int = TRUE, conf.method = "HPDinterval")
    
          term      estimate    std.error      conf.low     conf.high
    1    beta0   0.428244156 0.0241248235  3.828510e-01   0.477842988
    2  beta[1]   0.043754800 0.0049544131  3.357956e-02   0.053028339
    3  beta[2]  -0.002930712 0.0037809737 -1.045071e-02   0.004455156
    4  beta[3]   0.002282551 0.0007658955  7.668375e-04   0.003760403
    5 deviance -27.267374046 3.3494006696 -3.232140e+01 -20.820160276
    6    sigma   0.202889499 0.0177803063  1.708334e-01   0.239513349
    
    mcmcpvalue(paruelo.r2jags$BUGSoutput$sims.matrix[, "beta[1]"])
    
    [1] 0
    
    mcmcpvalue(paruelo.r2jags$BUGSoutput$sims.matrix[, "beta[2]"])
    
    [1] 0.431844
    
    mcmcpvalue(paruelo.r2jags$BUGSoutput$sims.matrix[, "beta[3]"])
    
    [1] 0.003262411
    
    print(paruelo.rstan, pars = c("beta0", "beta", "sigma"))
    
    Inference for Stan model: d98dbf6a02725fc3fce11306b77873e9.
    3 chains, each with iter=5000; warmup=500; thin=2; 
    post-warmup draws per chain=2250, total post-warmup draws=6750.
    
            mean se_mean   sd  2.5%   25%  50%  75% 97.5% n_eff Rhat
    beta0   0.43       0 0.02  0.38  0.41 0.43 0.44  0.48  5240    1
    beta[1] 0.04       0 0.00  0.03  0.04 0.04 0.05  0.05  6527    1
    beta[2] 0.00       0 0.00 -0.01 -0.01 0.00 0.00  0.00  6349    1
    beta[3] 0.00       0 0.00  0.00  0.00 0.00 0.00  0.00  6288    1
    sigma   0.20       0 0.02  0.17  0.19 0.20 0.21  0.24  5209    1
    
    Samples were drawn using NUTS(diag_e) at Mon Aug 21 16:38:35 2017.
    For each parameter, n_eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor on split chains (at 
    convergence, Rhat=1).
    
    library(broom)
    tidyMCMC(paruelo.rstan, conf.int = TRUE, conf.method = "HPDinterval", pars = c("beta0", "beta", "sigma"),
        ess = TRUE, rhat = TRUE)
    
         term     estimate    std.error      conf.low   conf.high      rhat  ess
    1   beta0  0.427836523 0.0240303191  0.3819939833 0.475662944 1.0010406 5240
    2 beta[1]  0.043632430 0.0049722257  0.0332021094 0.052911579 1.0001943 6527
    3 beta[2] -0.002863611 0.0037983372 -0.0099487399 0.004980100 0.9997762 6349
    4 beta[3]  0.002294422 0.0007688886  0.0008689837 0.003920697 0.9998934 6288
    5   sigma  0.203118147 0.0177639850  0.1699683473 0.238532986 0.9998482 5209
    
    mcmcpvalue(as.matrix(paruelo.rstan)[, "beta[1]"])
    
    [1] 0
    
    mcmcpvalue(as.matrix(paruelo.rstan)[, "beta[2]"])
    
    [1] 0.4474074
    
    mcmcpvalue(as.matrix(paruelo.rstan)[, "beta[3]"])
    
    [1] 0.00237037
    
    # let's explore the support for the interaction via loo
    library(loo)
    (full = loo(extract_log_lik(paruelo.rstan)))
    
    Computed from 6750 by 73 log-likelihood matrix
    
             Estimate   SE
    elpd_loo     11.7  5.2
    p_loo         3.6  0.6
    looic       -23.4 10.4
    
    All Pareto k estimates are good (k < 0.5)
    See help('pareto-k-diagnostic') for details.
    
    X = model.matrix(~cLAT + cLONG, data = paruelo)
    paruelo.list <- with(paruelo, list(Y = sqrt(C3), X = X, nX = ncol(X), n = nrow(paruelo)))
    paruelo.rstan.red <- stan(data = paruelo.list, model_code = modelString, chains = 3, iter = 4000, warmup = 1000,
        thin = 3, save_dso = TRUE)
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 1).
    
    Gradient evaluation took 2e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 4000 [  0%]  (Warmup)
    Iteration:  400 / 4000 [ 10%]  (Warmup)
    Iteration:  800 / 4000 [ 20%]  (Warmup)
    Iteration: 1001 / 4000 [ 25%]  (Sampling)
    Iteration: 1400 / 4000 [ 35%]  (Sampling)
    Iteration: 1800 / 4000 [ 45%]  (Sampling)
    Iteration: 2200 / 4000 [ 55%]  (Sampling)
    Iteration: 2600 / 4000 [ 65%]  (Sampling)
    Iteration: 3000 / 4000 [ 75%]  (Sampling)
    Iteration: 3400 / 4000 [ 85%]  (Sampling)
    Iteration: 3800 / 4000 [ 95%]  (Sampling)
    Iteration: 4000 / 4000 [100%]  (Sampling)
    
     Elapsed Time: 0.045721 seconds (Warm-up)
                   0.109229 seconds (Sampling)
                   0.15495 seconds (Total)
    
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 2).
    
    Gradient evaluation took 1.1e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 4000 [  0%]  (Warmup)
    Iteration:  400 / 4000 [ 10%]  (Warmup)
    Iteration:  800 / 4000 [ 20%]  (Warmup)
    Iteration: 1001 / 4000 [ 25%]  (Sampling)
    Iteration: 1400 / 4000 [ 35%]  (Sampling)
    Iteration: 1800 / 4000 [ 45%]  (Sampling)
    Iteration: 2200 / 4000 [ 55%]  (Sampling)
    Iteration: 2600 / 4000 [ 65%]  (Sampling)
    Iteration: 3000 / 4000 [ 75%]  (Sampling)
    Iteration: 3400 / 4000 [ 85%]  (Sampling)
    Iteration: 3800 / 4000 [ 95%]  (Sampling)
    Iteration: 4000 / 4000 [100%]  (Sampling)
    
     Elapsed Time: 0.04781 seconds (Warm-up)
                   0.124076 seconds (Sampling)
                   0.171886 seconds (Total)
    
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 3).
    
    Gradient evaluation took 1.1e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 4000 [  0%]  (Warmup)
    Iteration:  400 / 4000 [ 10%]  (Warmup)
    Iteration:  800 / 4000 [ 20%]  (Warmup)
    Iteration: 1001 / 4000 [ 25%]  (Sampling)
    Iteration: 1400 / 4000 [ 35%]  (Sampling)
    Iteration: 1800 / 4000 [ 45%]  (Sampling)
    Iteration: 2200 / 4000 [ 55%]  (Sampling)
    Iteration: 2600 / 4000 [ 65%]  (Sampling)
    Iteration: 3000 / 4000 [ 75%]  (Sampling)
    Iteration: 3400 / 4000 [ 85%]  (Sampling)
    Iteration: 3800 / 4000 [ 95%]  (Sampling)
    Iteration: 4000 / 4000 [100%]  (Sampling)
    
     Elapsed Time: 0.055341 seconds (Warm-up)
                   0.117843 seconds (Sampling)
                   0.173184 seconds (Total)
    
    (reduced = loo(extract_log_lik(paruelo.rstan.red)))
    
    Computed from 3000 by 73 log-likelihood matrix
    
             Estimate  SE
    elpd_loo      7.8 4.3
    p_loo         3.2 0.5
    looic       -15.7 8.5
    
    All Pareto k estimates are good (k < 0.5)
    See help('pareto-k-diagnostic') for details.
    
    par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3)
    plot(full, label_points = TRUE)
    plot(reduced, label_points = TRUE)
    
    plot of chunk tut7.3bQ1.8c1
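
    We could also formalise this comparison directly from the two loo objects. A minimal sketch using the loo package's compare() function (available in the loo versions current at the time of writing; check ?compare for the exact sign convention in your version):

    library(loo)
    ## elpd_diff is the difference in expected predictive accuracy between the two
    ## models; a negative value here indicates the reduced (additive) model is
    ## expected to predict worse than the full (interaction) model
    compare(full, reduced)
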
    summary(paruelo.rstanarm)
    
    Model Info:
    
     function:  stan_glm
     family:    gaussian [identity]
     formula:   sqrt(C3) ~ cLAT * cLONG
     algorithm: sampling
     priors:    see help('prior_summary')
     sample:    6750 (posterior sample size)
     num obs:   73
    
    Estimates:
                    mean   sd   2.5%   25%   50%   75%   97.5%
    (Intercept)   0.4    0.0  0.4    0.4   0.4   0.4   0.5    
    cLAT          0.0    0.0  0.0    0.0   0.0   0.0   0.1    
    cLONG         0.0    0.0  0.0    0.0   0.0   0.0   0.0    
    cLAT:cLONG    0.0    0.0  0.0    0.0   0.0   0.0   0.0    
    sigma         0.2    0.0  0.2    0.2   0.2   0.2   0.2    
    mean_PPD      0.4    0.0  0.4    0.4   0.4   0.5   0.5    
    log-posterior 5.8    1.6  1.7    5.0   6.1   7.0   8.0    
    
    Diagnostics:
                  mcse Rhat n_eff
    (Intercept)   0.0  1.0  6520 
    cLAT          0.0  1.0  6276 
    cLONG         0.0  1.0  6208 
    cLAT:cLONG    0.0  1.0  6187 
    sigma         0.0  1.0  6511 
    mean_PPD      0.0  1.0  6514 
    log-posterior 0.0  1.0  5064 
    
    For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
    
    library(broom)
    tidyMCMC(paruelo.rstanarm$stanfit, conf.int = TRUE, conf.method = "HPDinterval", ess = TRUE, rhat = TRUE)
    
               term     estimate    std.error      conf.low   conf.high      rhat  ess
    1   (Intercept)  0.428725348 0.0237030588  0.3817818425 0.474550972 1.0001020 6520
    2          cLAT  0.043720845 0.0049584396  0.0339531664 0.053464776 0.9999700 6276
    3         cLONG -0.002864137 0.0037665520 -0.0101579021 0.004520961 0.9999069 6208
    4    cLAT:cLONG  0.002286517 0.0007621962  0.0007272455 0.003698621 1.0002028 6187
    5         sigma  0.202789016 0.0174313569  0.1701918889 0.236954391 1.0001987 6511
    6      mean_PPD  0.436065205 0.0334535901  0.3707492034 0.502129845 0.9998652 6514
    7 log-posterior  5.799906303 1.6257986198  2.5983796407 8.199551919 1.0001926 5064
    
    mcmcpvalue(as.matrix(paruelo.rstanarm)[, "cLAT"])
    
    [1] 0
    
    mcmcpvalue(as.matrix(paruelo.rstanarm)[, "cLONG"])
    
    [1] 0.4432593
    
    mcmcpvalue(as.matrix(paruelo.rstanarm)[, "cLAT:cLONG"])
    
    [1] 0.004444444
    
    # let's explore the support for the interaction via loo
    library(loo)
    (full = loo(paruelo.rstanarm))
    
    Computed from 6750 by 73 log-likelihood matrix
    
             Estimate   SE
    elpd_loo     11.8  5.2
    p_loo         3.6  0.6
    looic       -23.5 10.4
    
    All Pareto k estimates are good (k < 0.5)
    See help('pareto-k-diagnostic') for details.
    
    paruelo.rstanarm.red = update(paruelo.rstanarm, . ~ cLAT + cLONG)
    
    Gradient evaluation took 3.3e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.33 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.036207 seconds (Warm-up)
                   0.244547 seconds (Sampling)
                   0.280754 seconds (Total)
    
    
    Gradient evaluation took 1.6e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.037839 seconds (Warm-up)
                   0.244601 seconds (Sampling)
                   0.28244 seconds (Total)
    
    
    Gradient evaluation took 1.7e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.038992 seconds (Warm-up)
                   0.24547 seconds (Sampling)
                   0.284462 seconds (Total)
    
    (reduced = loo(paruelo.rstanarm.red))
    
    Computed from 6750 by 73 log-likelihood matrix
    
             Estimate  SE
    elpd_loo      7.9 4.3
    p_loo         3.2 0.5
    looic       -15.9 8.6
    
    All Pareto k estimates are good (k < 0.5)
    See help('pareto-k-diagnostic') for details.
    
    par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3)
    plot(full, label_points = TRUE)
    plot(reduced, label_points = TRUE)
    
    plot of chunk tut7.3bQ1.8d1
    compare_models(full, reduced)
    
    elpd_diff        se 
         -3.8       2.4 
    
    summary(paruelo.brm)
    
     Family: gaussian(identity) 
    Formula: sqrt(C3) ~ cLAT * cLONG 
       Data: paruelo (Number of observations: 73) 
    Samples: 3 chains, each with iter = 5000; warmup = 500; thin = 2; 
             total post-warmup samples = 6750
        ICs: LOO = NA; WAIC = NA; R2 = NA
     
    Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept      0.43      0.02     0.38     0.47       5783    1
    cLAT           0.04      0.00     0.03     0.05       6366    1
    cLONG          0.00      0.00    -0.01     0.00       6661    1
    cLAT:cLONG     0.00      0.00     0.00     0.00       6750    1
    
    Family Specific Parameters: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sigma      0.2      0.02     0.17     0.24       5323    1
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    library(broom)
    tidyMCMC(paruelo.brm$fit, conf.int = TRUE, conf.method = "HPDinterval", ess = TRUE, rhat = TRUE)
    
              term     estimate    std.error      conf.low   conf.high      rhat  ess
    1  b_Intercept  0.428165324 0.0240007939  0.3796997054 0.473389433 1.0000025 5783
    2       b_cLAT  0.043616675 0.0049670799  0.0338118263 0.053212235 0.9997456 6366
    3      b_cLONG -0.002874484 0.0037617204 -0.0102170990 0.004536065 0.9999475 6661
    4 b_cLAT:cLONG  0.002266867 0.0007727372  0.0007333648 0.003811004 0.9997885 6750
    5        sigma  0.202951074 0.0174078684  0.1699554940 0.237294437 1.0002295 5323
    
    mcmcpvalue(as.matrix(paruelo.brm)[, "b_cLAT"])
    
    [1] 0
    
    mcmcpvalue(as.matrix(paruelo.brm)[, "b_cLONG"])
    
    [1] 0.4368889
    
    mcmcpvalue(as.matrix(paruelo.brm)[, "b_cLAT:cLONG"])
    
    [1] 0.004296296
    
    # let's explore the support for the interaction via loo
    library(loo)
    (full = loo(paruelo.brm))
    
     LOOIC   SE
     -23.5 10.4
    
    paruelo.brm.red = update(paruelo.brm, . ~ cLAT + cLONG)
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 1).
    
    Gradient evaluation took 1.7e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.087868 seconds (Warm-up)
                   0.083064 seconds (Sampling)
                   0.170932 seconds (Total)
    
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 2).
    
    Gradient evaluation took 1e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.083951 seconds (Warm-up)
                   0.080144 seconds (Sampling)
                   0.164095 seconds (Total)
    
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 3).
    
    Gradient evaluation took 9e-06 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.10254 seconds (Warm-up)
                   0.092983 seconds (Sampling)
                   0.195523 seconds (Total)
    
    (reduced = loo(paruelo.brm.red))
    
      LOOIC   SE
     -15.73 8.56
    
    par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3)
    plot(full, label_points = TRUE)
    plot(reduced, label_points = TRUE)
    
    plot of chunk tut7.3bQ1.8e1
  9. There is some support for retaining the interaction: the Bayesian p-values for the cLAT:cLONG term are consistently small (roughly 0.002-0.004 across the different engines), and the LOO comparisons consistently favour the full model (elpd_loo of about 11.7-11.8 versus 7.8-7.9 for the additive model; compare_models reports an elpd difference of -3.8, SE 2.4, for the reduced model relative to the full model).
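
    As a rough sanity check (not part of the original workflow), we can treat that elpd difference as approximately normal with the reported standard error and ask how much of its distribution favours the full model:

    ## crude normal approximation: elpd_diff (reduced minus full) = -3.8, SE = 2.4;
    ## the probability mass below zero approximates the support for the full model
    pnorm(0, mean = -3.8, sd = 2.4)

    which comes out at roughly 0.94.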

  10. Generate graphical summaries
    library(MCMCpack)
    paruelo.mcmc = paruelo.mcmcpack
    ## Calculate the fitted values
    newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE),
        max(cLAT, na.rm = TRUE), len = 100), cLONG = mean(cLONG) + sd(cLONG) *
        -2:2))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>%
        cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) %>%
        mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1,
            2), "*sigma")))
    
    ## Partial residuals
    fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) +
        sd(cLONG) * -2:2))
    fMat = rMat = model.matrix(~cLAT * cLONG, fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2)
    rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT +
        mean.LAT, LONG = cLONG + mean.LONG)
    
    ## Partition the partial residuals such that each x1 trend only includes
    ## x2 data that is within that range in the observed data
    findNearest = function(x, y) {
        ff = fields:::rdist(x, y)
        apply(ff, 1, function(x) which(x == min(x)))
    }
    fn = findNearest(x = paruelo[, c("LAT", "LONG")], y = rdata[, c("LAT",
        "LONG")])
    rdata = rdata[unlist(fn), ] %>% mutate(LONG = factor(LONG, labels = paste("LONG:~",
        c(-2, -1, 0, 1, 2), "*sigma")))
    ggplot(newdata, aes(y = estimate, x = LAT)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("Latitude") + facet_wrap(~LONG, labeller = label_parsed,
        nrow = 1, scales = "fixed") + theme_classic() + theme(strip.background = element_blank())
    
    plot of chunk tut7.3bQ1.9a1
    # Note, the curvature is purely an artifact of the transformation
    # applied.
    
    paruelo.mcmc = paruelo.mcmcpack
    ## Calculate the fitted values
    newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE),
        max(cLAT, na.rm = TRUE), len = 100), cLONG = seq(min(cLONG, na.rm = TRUE),
        max(cLONG, na.rm = TRUE), len = 100)))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>%
        cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) +
        sd(cLONG) * -2:2))
    fMat = rMat = model.matrix(~cLAT * cLONG, fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2)
    rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT +
        mean.LAT, LONG = cLONG + mean.LONG)
    
    ggplot(newdata, aes(y = LAT, x = LONG)) + geom_tile(aes(fill = estimate)) +
        geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("C3",
        colors = heat.colors(10)) + geom_point(data = paruelo, aes(size = C3)) +
        scale_y_continuous("Latitude") + scale_x_continuous("Longitude") +
        theme_classic()
    
    plot of chunk tut7.3bQ1.9a2
    paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix
    ## Calculate the fitted values
    newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE),
        max(cLAT, na.rm = TRUE), len = 100), cLONG = mean(cLONG) + sd(cLONG) *
        -2:2))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>%
        cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) %>%
        mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1,
            2), "*sigma")))
    
    ## Partial residuals
    fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) +
        sd(cLONG) * -2:2))
    fMat = rMat = model.matrix(~cLAT * cLONG, fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2)
    rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT +
        mean.LAT, LONG = cLONG + mean.LONG)
    
    ## Partition the partial residuals such that each x1 trend only includes
    ## x2 data that is within that range in the observed data
    findNearest = function(x, y) {
        ff = fields:::rdist(x, y)
        apply(ff, 1, function(x) which(x == min(x)))
    }
    fn = findNearest(x = paruelo[, c("LAT", "LONG")], y = rdata[, c("LAT",
        "LONG")])
    rdata = rdata[unlist(fn), ] %>% mutate(LONG = factor(LONG, labels = paste("LONG:~",
        c(-2, -1, 0, 1, 2), "*sigma")))
    ggplot(newdata, aes(y = estimate, x = LAT)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("Latitude") + facet_wrap(~LONG, labeller = label_parsed,
        nrow = 1, scales = "fixed") + theme_classic() + theme(strip.background = element_blank())
    
    plot of chunk tut7.3bQ1.9b1
    # Note, the curvature is purely an artifact of the transformation
    # applied.
    
    paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix
    ## Calculate the fitted values
    newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE),
        max(cLAT, na.rm = TRUE), len = 100), cLONG = seq(min(cLONG, na.rm = TRUE),
        max(cLONG, na.rm = TRUE), len = 100)))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>%
        cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) +
        sd(cLONG) * -2:2))
    fMat = rMat = model.matrix(~cLAT * cLONG, fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2)
    rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT +
        mean.LAT, LONG = cLONG + mean.LONG)
    
    ggplot(newdata, aes(y = LAT, x = LONG)) + geom_tile(aes(fill = estimate)) +
        geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("C3",
        colors = heat.colors(10)) + geom_point(data = paruelo, aes(size = C3)) +
        scale_y_continuous("Latitude") + scale_x_continuous("Longitude") +
        theme_classic()
    
    plot of chunk tut7.3bQ1.9b2
    paruelo.mcmc = as.matrix(paruelo.rstan)
    ## Calculate the fitted values
    newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE),
        max(cLAT, na.rm = TRUE), len = 100), cLONG = mean(cLONG) + sd(cLONG) *
        -2:2))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>%
        cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) %>%
        mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1,
            2), "*sigma")))
    
    ## Partial residuals
    fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) +
        sd(cLONG) * -2:2))
    fMat = rMat = model.matrix(~cLAT * cLONG, fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2)
    rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT +
        mean.LAT, LONG = cLONG + mean.LONG)
    
    ## Partition the partial residuals such that each x1 trend only includes
    ## x2 data that is within that range in the observed data
    findNearest = function(x, y) {
        ff = fields:::rdist(x, y)
        apply(ff, 1, function(x) which(x == min(x)))
    }
    fn = findNearest(x = paruelo[, c("LAT", "LONG")], y = rdata[, c("LAT",
        "LONG")])
    rdata = rdata[unlist(fn), ] %>% mutate(LONG = factor(LONG, labels = paste("LONG:~",
        c(-2, -1, 0, 1, 2), "*sigma")))
    ggplot(newdata, aes(y = estimate, x = LAT)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("Latitude") + facet_wrap(~LONG, labeller = label_parsed,
        nrow = 1, scales = "fixed") + theme_classic() + theme(strip.background = element_blank())
    
    plot of chunk tut7.3bQ1.9c1
    # Note, the curvature is purely an artifact of the transformation
    # applied.
    
    paruelo.mcmc = as.matrix(paruelo.rstan)
    ## Calculate the fitted values
    newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE),
        max(cLAT, na.rm = TRUE), len = 100), cLONG = seq(min(cLONG, na.rm = TRUE),
        max(cLONG, na.rm = TRUE), len = 100)))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>%
        cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) +
        sd(cLONG) * -2:2))
    fMat = rMat = model.matrix(~cLAT * cLONG, fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2)
    rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT +
        mean.LAT, LONG = cLONG + mean.LONG)
    
    ggplot(newdata, aes(y = LAT, x = LONG)) + geom_tile(aes(fill = estimate)) +
        geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("C3",
        colors = heat.colors(10)) + geom_point(data = paruelo, aes(size = C3)) +
        scale_y_continuous("Latitude") + scale_x_continuous("Longitude") +
        theme_classic()
    
    plot of chunk tut7.3bQ1.9c2
    paruelo.mcmc = as.matrix(paruelo.rstanarm)
    ## Calculate the fitted values
    newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE),
        max(cLAT, na.rm = TRUE), len = 100), cLONG = mean(cLONG) + sd(cLONG) *
        -2:2))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>%
        cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) %>%
        mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1,
            2), "*sigma")))
    
    ## Partial residuals
    fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) +
        sd(cLONG) * -2:2))
    fMat = rMat = model.matrix(~cLAT * cLONG, fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2)
    rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT +
        mean.LAT, LONG = cLONG + mean.LONG)
    
    ## Partition the partial residuals such that each x1 trend only includes
    ## x2 data that is within that range in the observed data
    findNearest = function(x, y) {
        ff = fields:::rdist(x, y)
        apply(ff, 1, function(x) which(x == min(x)))
    }
    fn = findNearest(x = paruelo[, c("LAT", "LONG")], y = rdata[, c("LAT",
        "LONG")])
    rdata = rdata[unlist(fn), ] %>% mutate(LONG = factor(LONG, labels = paste("LONG:~",
        c(-2, -1, 0, 1, 2), "*sigma")))
    ggplot(newdata, aes(y = estimate, x = LAT)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("Latitude") + facet_wrap(~LONG, labeller = label_parsed,
        nrow = 1, scales = "fixed") + theme_classic() + theme(strip.background = element_blank())
    
    plot of chunk tut7.3bQ1.9d1
    # Note, the curvature is purely an artifact of the transformation
    # applied.
    
    paruelo.mcmc = as.matrix(paruelo.rstanarm)
    ## Calculate the fitted values
    newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE),
        max(cLAT, na.rm = TRUE), len = 100), cLONG = seq(min(cLONG, na.rm = TRUE),
        max(cLONG, na.rm = TRUE), len = 100)))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>%
        cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) +
        sd(cLONG) * -2:2))
    fMat = rMat = model.matrix(~cLAT * cLONG, fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2)
    rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT +
        mean.LAT, LONG = cLONG + mean.LONG)
    
    ggplot(newdata, aes(y = LAT, x = LONG)) + geom_tile(aes(fill = estimate)) +
        geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("C3",
        colors = heat.colors(10)) + geom_point(data = paruelo, aes(size = C3)) +
        scale_y_continuous("Latitude") + scale_x_continuous("Longitude") +
        theme_classic()
    
    plot of chunk tut7.3bQ1.9d2
    plot(marginal_effects(paruelo.brm, effects = "cLAT:cLONG"), points = TRUE)
    
    plot of chunk tut7.3bQ1.9e1a
    # OR Define a function that will calculate mean plus or minus 2 and 1
    # standard deviations
    msd2 = function(x) {
        means = mean(x, na.rm = TRUE)
        sd = sd(x, na.rm = TRUE)
        means + (-2:2) * sd
    }
    plot(marginal_effects(paruelo.brm, effects = "cLAT:cLONG", int_conditions = list(cLONG = msd2)),
        points = TRUE)
    
    plot of chunk tut7.3bQ1.9e1a
    # OR we could arrange the effect of cLAT separately for different
    # values of cLONG (mean plus or minus 1 and 2 standard deviations)
    cond = data.frame(cLONG = msd2(paruelo$cLONG), row.names = paste0("cLONG: mean ",
        -2:2, "*sd"))
    plot(marginal_effects(paruelo.brm, effects = "cLAT", conditions = cond,
        select_points = 0.1), points = TRUE)
    
    plot of chunk tut7.3bQ1.9e1a
    ## Yet another way would be as a 2D surface
    plot(marginal_effects(paruelo.brm, effects = "cLAT:cLONG", surface = TRUE),
        points = TRUE, stype = "raster")
    
    plot of chunk tut7.3bQ1.9e1a
    paruelo.mcmc = as.matrix(paruelo.brm)
    ## Calculate the fitted values
    newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE),
        max(cLAT, na.rm = TRUE), len = 100), cLONG = mean(cLONG) + sd(cLONG) *
        -2:2))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("b_Intercept)", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]
    
    Error in paruelo.mcmc[, c("b_Intercept)", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]: subscript out of bounds
    
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>%
        cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval")) %>%
        mutate(LONG = factor(LONG, labels = paste("LONG:~", c(-2, -1, 0, 1,
            2), "*sigma")))
    
    ## Partial residuals
    fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) +
        sd(cLONG) * -2:2))
    fMat = rMat = model.matrix(~cLAT * cLONG, fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2)
    rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT +
        mean.LAT, LONG = cLONG + mean.LONG)
    
    ## Partition the partial residuals such that each x1 trend only includes
    ## x2 data that is within that range in the observed data
    findNearest = function(x, y) {
        ff = fields:::rdist(x, y)
        apply(ff, 1, function(x) which(x == min(x)))
    }
    fn = findNearest(x = paruelo[, c("LAT", "LONG")], y = rdata[, c("LAT",
        "LONG")])
    rdata = rdata[unlist(fn), ] %>% mutate(LONG = factor(LONG, labels = paste("LONG:~",
        c(-2, -1, 0, 1, 2), "*sigma")))
    ggplot(newdata, aes(y = estimate, x = LAT)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata, aes(y = partial.resid), color = "grey") + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("C3") +
        scale_x_continuous("Latitude") + facet_wrap(~LONG, labeller = label_parsed,
        nrow = 1, scales = "fixed") + theme_classic() + theme(strip.background = element_blank())
    
    plot of chunk tut7.3bQ1.9e1
    # Note, the curvature is purely an artifact of the transformation
    # applied.
    
    paruelo.mcmc = as.matrix(paruelo.brm)
    ## Calculate the fitted values
    newdata = with(paruelo, expand.grid(cLAT = seq(min(cLAT, na.rm = TRUE),
        max(cLAT, na.rm = TRUE), len = 100), cLONG = seq(min(cLONG, na.rm = TRUE),
        max(cLONG, na.rm = TRUE), len = 100)))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(LAT = cLAT + mean.LAT, LONG = cLONG + mean.LONG) %>%
        cbind(tidyMCMC(fit^2, conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    fdata = rdata = with(paruelo, expand.grid(cLAT = cLAT, cLONG = mean(cLONG) +
        sd(cLONG) * -2:2))
    fMat = rMat = model.matrix(~cLAT * cLONG, fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(paruelo$C3 - (apply(coefs, 2, median) %*% t(rMat))^2)
    rdata = rdata %>% mutate(partial.resid = resid + fit^2) %>% mutate(LAT = cLAT +
        mean.LAT, LONG = cLONG + mean.LONG)
    
    ggplot(newdata, aes(y = LAT, x = LONG)) + geom_tile(aes(fill = estimate)) +
        geom_contour(aes(z = conf.high - conf.low)) + scale_fill_gradientn("C3",
        colors = heat.colors(10)) + geom_point(data = paruelo, aes(size = C3)) +
        scale_y_continuous("Latitude") + scale_x_continuous("Longitude") +
        theme_classic()
    
    plot of chunk tut7.3bQ1.9e2
  11. Explore effect sizes - change in C3 associated with a change in Latitude from 35 to 45 at various levels of Longitude.
    library(MCMCpack)
    paruelo.mcmc = paruelo.mcmcpack
    newdata = with(paruelo, expand.grid(cLAT = c(35 - mean.LAT, 45 - mean.LAT),
        cLONG = (-2:2) * sd(cLONG)))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]
    fit = (coefs %*% t(Xmat))^2
    s1 = seq(1, 9, by = 2)
    s2 = seq(2, 10, by = 2)
    ## Raw effect size
    (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term  estimate  std.error    conf.low conf.high
    1    2 0.1342806 0.08713327 -0.03361161 0.3080649
    2    4 0.2579431 0.05349590  0.15235971 0.3620157
    3    6 0.3700630 0.04574632  0.27681216 0.4559970
    4    8 0.4706405 0.07515740  0.32157620 0.6152214
    5   10 0.5596755 0.12343050  0.32232714 0.8075261
    
    ## Cohen's D
    cohenD = (fit[, s2] - fit[, s1])/sqrt(paruelo.mcmc[, "sigma2"])
    (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term estimate std.error   conf.low conf.high
    1    2 0.672703 0.4352213 -0.1968762  1.500608
    2    4 1.291933 0.2878937  0.7507301  1.880575
    3    6 1.853297 0.2771987  1.2898298  2.375413
    4    8 2.356796 0.4226718  1.5261221  3.175460
    5   10 2.802429 0.6547949  1.5759322  4.114309
    
    # Percentage change (note: the upper estimates can explode when the fitted
    # value at latitude 35 approaches zero)
    ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1]
    (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term     estimate    std.error  conf.low    conf.high
    1    2 1.040822e+02 9.068098e+01 -36.33409     270.4147
    2    4 3.100988e+02 1.254237e+02 107.26130     558.3306
    3    6 9.916857e+02 5.155349e+02 313.55369    1894.5857
    4    8 4.620501e+06 2.972336e+08 240.87719   78225.0598
    5   10 1.294870e+10 1.289270e+12 174.88995 2118666.8085
    
    # Probability that the effect is greater than 50% (an increase of >50%)
    (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
    
         2      4      6      8     10 
    0.7152 0.9999 1.0000 1.0000 1.0000 
    
    ## fractional change
    (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term     estimate    std.error  conf.low    conf.high
    1    2 2.040822e+00 9.068098e-01 0.6366591     3.704147
    2    4 4.100988e+00 1.254237e+00 2.0726130     6.583306
    3    6 1.091686e+01 5.155349e+00 4.1355369    19.945857
    4    8 4.620601e+04 2.972336e+06 3.4087719   783.250598
    5   10 1.294870e+08 1.289270e+10 2.7488995 21187.668085
    
    paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix
    newdata = with(paruelo, expand.grid(cLAT = c(35 - mean.LAT, 45 - mean.LAT),
        cLONG = (-2:2) * sd(cLONG)))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
    fit = (coefs %*% t(Xmat))^2
    s1 = seq(1, 9, by = 2)
    s2 = seq(2, 10, by = 2)
    ## Raw effect size
    (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term  estimate  std.error    conf.low conf.high
    1    2 0.1349178 0.08794489 -0.04041036 0.3059156
    2    4 0.2585383 0.05370009  0.15709832 0.3672346
    3    6 0.3705840 0.04565108  0.28155083 0.4597610
    4    8 0.4710550 0.07610010  0.32434150 0.6226791
    5   10 0.5599513 0.12596962  0.32306584 0.8133294
    
    ## Cohen's D
    cohenD = (fit[, s2] - fit[, s1])/sqrt(paruelo.mcmc[, "sigma"])
    (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term  estimate std.error    conf.low conf.high
    1    2 0.3003534 0.1948831 -0.07593415 0.6900725
    2    4 0.5755871 0.1213214  0.33833739 0.8140084
    3    6 0.8250532 0.1070657  0.60836234 1.0276555
    4    8 1.0487518 0.1743212  0.70416978 1.3896697
    5   10 1.2466827 0.2838059  0.70502259 1.8109759
    
    # Percentage change
    ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1]
    (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term     estimate    std.error  conf.low    conf.high
    1    2 1.053393e+02 9.348612e+01 -28.95269     289.1523
    2    4 3.119978e+02 1.294950e+02 103.90816     566.0258
    3    6 9.959446e+02 5.039227e+02 323.90140    1919.1223
    4    8 8.040494e+08 9.437749e+10 228.89724   82953.0847
    5   10 3.366204e+07 9.964292e+08 243.68269 2220207.1191
    
    # Probability that the effect is greater than 50% (an increase of >50%)
    (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
    
            2         4         6         8        10 
    0.7144681 0.9998582 1.0000000 1.0000000 1.0000000 
    
    ## fractional change
    (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term     estimate    std.error  conf.low    conf.high
    1    2 2.053393e+00 9.348612e-01 0.7104731     3.891523
    2    4 4.119978e+00 1.294950e+00 2.0390816     6.660258
    3    6 1.095945e+01 5.039227e+00 4.2390140    20.191223
    4    8 8.040495e+06 9.437749e+08 3.2889724   830.530847
    5   10 3.366214e+05 9.964292e+06 3.4368269 22203.071191
    
    paruelo.mcmc = as.matrix(paruelo.rstan)
    newdata = with(paruelo, expand.grid(cLAT = c(35 - mean.LAT, 45 - mean.LAT),
        cLONG = (-2:2) * sd(cLONG)))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
    fit = (coefs %*% t(Xmat))^2
    s1 = seq(1, 9, by = 2)
    s2 = seq(2, 10, by = 2)
    ## Raw effect size
    (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term  estimate  std.error    conf.low conf.high
    1    2 0.1319641 0.08789611 -0.04389211 0.3016103
    2    4 0.2563267 0.05386641  0.15163208 0.3590931
    3    6 0.3692837 0.04628573  0.28427916 0.4645861
    4    8 0.4708352 0.07698113  0.31926392 0.6211574
    5   10 0.5609811 0.12716248  0.31719799 0.8079195
    
    ## Cohen's D
    cohenD = (fit[, s2] - fit[, s1])/sqrt(paruelo.mcmc[, "sigma"])
    (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term  estimate std.error    conf.low conf.high
    1    2 0.2935319 0.1942803 -0.09896972 0.6595833
    2    4 0.5702913 0.1212148  0.34901086 0.8156394
    3    6 0.8216624 0.1080325  0.61471403 1.0312566
    4    8 1.0476449 0.1758745  0.69492968 1.3808014
    5   10 1.2482390 0.2859291  0.67872851 1.7890767
    
    # Percentage change
    ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1]
    (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term     estimate    std.error  conf.low    conf.high
    1    2 1.042273e+02 1.170636e+02 -38.27722     282.6902
    2    4 3.092962e+02 1.348801e+02 109.35838     565.3589
    3    6 9.873882e+02 4.955407e+02 323.40689    1873.6455
    4    8 1.600200e+06 1.080694e+08 320.12706   78699.1297
    5   10 7.245863e+08 4.756900e+10  24.83881 1632540.3890
    
    # Probability that the effect is greater than 50% (an increase of >50%)
    (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
    
            2         4         6         8        10 
    0.6991111 1.0000000 1.0000000 1.0000000 0.9998519 
    
    ## fractional change
    (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term     estimate    std.error  conf.low    conf.high
    1    2 2.042273e+00 1.170636e+00 0.6172278     3.826902
    2    4 4.092962e+00 1.348801e+00 2.0935838     6.653589
    3    6 1.087388e+01 4.955407e+00 4.2340689    19.736455
    4    8 1.600300e+04 1.080694e+06 4.2012706   787.991297
    5   10 7.245864e+06 4.756900e+08 1.2483881 16326.403890
    
    paruelo.mcmc = as.matrix(paruelo.rstanarm)
    newdata = with(paruelo, expand.grid(cLAT = c(35 - mean.LAT, 45 - mean.LAT),
        cLONG = (-2:2) * sd(cLONG)))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]
    fit = (coefs %*% t(Xmat))^2
    s1 = seq(1, 9, by = 2)
    s2 = seq(2, 10, by = 2)
    ## Raw effect size
    (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term  estimate  std.error    conf.low conf.high
    1    2 0.1341501 0.08848599 -0.03243586 0.3107923
    2    4 0.2581415 0.05442864  0.15686023 0.3675871
    3    6 0.3707618 0.04583440  0.28341930 0.4623395
    4    8 0.4720110 0.07541079  0.32685749 0.6220711
    5   10 0.5618890 0.12473091  0.32547913 0.8094347
    
    ## Cohen's D
    cohenD = (fit[, s2] - fit[, s1])/sqrt(paruelo.mcmc[, "sigma"])
    (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term  estimate std.error    conf.low conf.high
    1    2 0.2986746 0.1958789 -0.07998586 0.6760735
    2    4 0.5747869 0.1226097  0.33480854 0.8135894
    3    6 0.8255451 0.1070300  0.61139499 1.0319382
    4    8 1.0509493 0.1721461  0.69999432 1.3776946
    5   10 1.2509993 0.2800144  0.71471192 1.8024770
    
    # Percentage change
    ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1]
    (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term     estimate    std.error  conf.low    conf.high
    1    2 1.056659e+02 1.257457e+02 -30.97074     287.1470
    2    4 3.105262e+02 1.329256e+02 108.88708     558.6118
    3    6 9.851546e+02 4.843524e+02 318.35471    1857.7700
    4    8 2.244783e+07 1.687360e+09 254.92501   80921.6730
    5   10 3.250875e+07 1.501876e+09 209.17051 1786965.3351
    
    # Probability that the effect is greater than 50% (an increase of >50%)
    (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
    
            2         4         6         8        10 
    0.7134815 0.9998519 1.0000000 1.0000000 1.0000000 
    
    ## fractional change
    (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term     estimate    std.error  conf.low    conf.high
    1    2 2.056659e+00 1.257457e+00 0.6902926     3.871470
    2    4 4.105262e+00 1.329256e+00 2.0888708     6.586118
    3    6 1.085155e+01 4.843524e+00 4.1835471    19.577700
    4    8 2.244793e+05 1.687360e+07 3.5492501   810.216730
    5   10 3.250885e+05 1.501876e+07 3.0917051 17870.653351
    
    paruelo.mcmc = as.matrix(paruelo.brm)
    newdata = with(paruelo, expand.grid(cLAT = c(35 - mean.LAT, 45 - mean.LAT),
        cLONG = (-2:2) * sd(cLONG)))
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    coefs = paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]
    fit = (coefs %*% t(Xmat))^2
    s1 = seq(1, 9, by = 2)
    s2 = seq(2, 10, by = 2)
    ## Raw effect size
    (RES = tidyMCMC(as.mcmc(fit[, s2] - fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term  estimate  std.error    conf.low conf.high
    1    2 0.1352462 0.08867666 -0.04429386 0.3055298
    2    4 0.2579365 0.05387510  0.15274263 0.3620875
    3    6 0.3693866 0.04588539  0.27546608 0.4539318
    4    8 0.4695965 0.07691804  0.32720996 0.6250321
    5   10 0.5585662 0.12721779  0.31538627 0.8060951
    
    ## Cohen's D
    cohenD = (fit[, s2] - fit[, s1])/sqrt(paruelo.mcmc[, "sigma"])
    (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term  estimate std.error   conf.low conf.high
    1    2 0.3010112 0.1962995 -0.0918150 0.6830484
    2    4 0.5740773 0.1213968  0.3380522 0.8117176
    3    6 0.8221476 0.1071816  0.6197105 1.0355653
    4    8 1.0452221 0.1758657  0.7327157 1.4183406
    5   10 1.2433007 0.2864037  0.7091841 1.8161433
    
    # Percentage change
    ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1]
    (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term     estimate    std.error  conf.low    conf.high
    1    2 1.065653e+02 9.500259e+01 -29.80102     292.8927
    2    4 3.122440e+02 1.301541e+02 107.32351     566.7632
    3    6 9.873246e+02 5.411930e+02 304.56631    1864.4427
    4    8 9.556796e+06 6.550140e+08 236.86783   79799.0072
    5   10 1.402617e+07 3.718325e+08 167.62419 1638648.6131
    
    # Probability that the effect is greater than 50% (an increase of >50%)
    (p50 = apply(ESp, 2, function(x) sum(x > 50)/length(x)))
    
            2         4         6         8        10 
    0.7179259 0.9997037 1.0000000 1.0000000 1.0000000 
    
    ## fractional change
    (FES = tidyMCMC(as.mcmc(fit[, s2]/fit[, s1]), conf.int = TRUE, conf.method = "HPDinterval"))
    
      term     estimate    std.error  conf.low    conf.high
    1    2 2.065653e+00 9.500259e-01 0.7019898     3.928927
    2    4 4.122440e+00 1.301541e+00 2.0732351     6.667632
    3    6 1.087325e+01 5.411930e+00 4.0456631    19.644427
    4    8 9.556896e+04 6.550140e+06 3.3686783   798.990072
    5   10 1.402627e+05 3.718325e+06 2.6762419 16387.486131
    
  12. Explore finite-population standard deviations
    library(MCMCpack)
    library(broom)
    paruelo.mcmc = paruelo.mcmcpack
    Xmat = model.matrix(~cLAT * cLONG, data = paruelo)
    sd.LAT = abs(paruelo.mcmc[, "cLAT"]) * sd(Xmat[, "cLAT"])
    sd.LONG = abs(paruelo.mcmc[, "cLONG"]) * sd(Xmat[, "cLONG"])
    sd.LATLONG = abs(paruelo.mcmc[, "cLAT:cLONG"]) * sd(Xmat[, "cLAT:cLONG"])
    sd.x = sd.LAT + sd.LONG + sd.LATLONG
    
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit^2, 2, sqrt(paruelo$C3), "-")
    sd.resid = apply(resid, 1, sd)
    
    sd.all = cbind(sd.LAT, sd.LONG, sd.LATLONG, sd.resid)
    (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
    
            term   estimate   std.error     conf.low  conf.high
    1     sd.LAT 0.23169015 0.026451592 1.792035e-01 0.28260491
    2    sd.LONG 0.02439035 0.017868035 4.028548e-06 0.05853145
    3 sd.LATLONG 0.07925537 0.026142083 2.817931e-02 0.13091661
    4   sd.resid 0.21542613 0.006389563 2.071466e-01 0.22865064
    
    # OR expressed as a percentage
    (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
        conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1     sd.LAT 42.182310  2.898362 36.118457692  47.42229
    2    sd.LONG  3.846411  3.084873  0.000724435  10.19369
    3 sd.LATLONG 14.411731  3.858801  6.333041215  21.35791
    4   sd.resid 39.018325  3.094226 33.791482239  45.61601
    
    ## we can even plot this as a Bayesian ANOVA table
    ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
        ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
        vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
        scale_x_discrete() + coord_flip() + theme_classic()
    
    plot of chunk tut7.3bQ1.11a1
    paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix
    Xmat = model.matrix(~cLAT * cLONG, data = paruelo)
    sd.LAT = abs(paruelo.mcmc[, "beta[1]"]) * sd(Xmat[, "cLAT"])
    sd.LONG = abs(paruelo.mcmc[, "beta[2]"]) * sd(Xmat[, "cLONG"])
    sd.LATLONG = abs(paruelo.mcmc[, "beta[3]"]) * sd(Xmat[, "cLAT:cLONG"])
    sd.x = sd.LAT + sd.LONG + sd.LATLONG
    
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit^2, 2, sqrt(paruelo$C3), "-")
    sd.resid = apply(resid, 1, sd)
    
    sd.all = cbind(sd.LAT, sd.LONG, sd.LATLONG, sd.resid)
    (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
    
            term   estimate  std.error     conf.low  conf.high
    1     sd.LAT 0.23203188 0.02627327 1.780725e-01 0.28120950
    2    sd.LONG 0.02490470 0.01809705 3.322881e-08 0.05925782
    3 sd.LATLONG 0.07922763 0.02651085 2.660959e-02 0.13048759
    4   sd.resid 0.21553417 0.00653920 2.067978e-01 0.22860010
    
    # OR expressed as a percentage
    (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
        conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1     sd.LAT 42.126265  2.898268 3.633944e+01  47.62775
    2    sd.LONG  3.960719  3.115583 6.436877e-06  10.32277
    3 sd.LATLONG 14.415918  3.921531 6.384791e+00  21.62824
    4   sd.resid 38.967055  3.080186 3.369378e+01  45.44767
    
    ## we can even plot this as a Bayesian ANOVA table
    ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
        ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
        vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
        scale_x_discrete() + coord_flip() + theme_classic()
    
    plot of chunk tut7.3bQ1.11b1
    paruelo.mcmc = as.matrix(paruelo.rstan)
    Xmat = model.matrix(~cLAT * cLONG, data = paruelo)
    sd.LAT = abs(paruelo.mcmc[, "beta[1]"]) * sd(Xmat[, "cLAT"])
    sd.LONG = abs(paruelo.mcmc[, "beta[2]"]) * sd(Xmat[, "cLONG"])
    sd.LATLONG = abs(paruelo.mcmc[, "beta[3]"]) * sd(Xmat[, "cLAT:cLONG"])
    sd.x = sd.LAT + sd.LONG + sd.LATLONG
    
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit^2, 2, sqrt(paruelo$C3), "-")
    sd.resid = apply(resid, 1, sd)
    
    sd.all = cbind(sd.LAT, sd.LONG, sd.LATLONG, sd.resid)
    (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
    
            term   estimate  std.error     conf.low conf.high
    1     sd.LAT 0.23138295 0.02636773 1.760709e-01 0.2805903
    2    sd.LONG 0.02485541 0.01786845 2.194655e-05 0.0588445
    3 sd.LATLONG 0.07965332 0.02657340 3.020573e-02 0.1360499
    4   sd.resid 0.21560997 0.00654553 2.071048e-01 0.2286751
    
    # OR expressed as a percentage
    (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
        conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error    conf.low conf.high
    1     sd.LAT 42.049248  2.938491 36.18841075  47.75280
    2    sd.LONG  3.921904  3.081730  0.00064015  10.26762
    3 sd.LATLONG 14.478042  3.927237  6.46152709  21.81337
    4   sd.resid 38.961952  3.125724 33.92858577  46.03353
    
    ## we can even plot this as a Bayesian ANOVA table
    ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
        ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
        vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
        scale_x_discrete() + coord_flip() + theme_classic()
    
    plot of chunk tut7.3bQ1.11c1
    paruelo.mcmc = as.matrix(paruelo.rstanarm)
    Xmat = model.matrix(~cLAT * cLONG, data = paruelo)
    sd.LAT = abs(paruelo.mcmc[, "cLAT"]) * sd(Xmat[, "cLAT"])
    sd.LONG = abs(paruelo.mcmc[, "cLONG"]) * sd(Xmat[, "cLONG"])
    sd.LATLONG = abs(paruelo.mcmc[, "cLAT:cLONG"]) * sd(Xmat[, "cLAT:cLONG"])
    sd.x = sd.LAT + sd.LONG + sd.LATLONG
    
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit^2, 2, sqrt(paruelo$C3), "-")
    sd.resid = apply(resid, 1, sd)
    
    sd.all = cbind(sd.LAT, sd.LONG, sd.LATLONG, sd.resid)
    (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
    
            term   estimate   std.error     conf.low  conf.high
    1     sd.LAT 0.23185182 0.026294626 1.800538e-01 0.28352393
    2    sd.LONG 0.02468265 0.017832763 1.694408e-06 0.05882593
    3 sd.LATLONG 0.07937612 0.026349295 2.523573e-02 0.12834373
    4   sd.resid 0.21552224 0.006501481 2.069207e-01 0.22867590
    
    # OR expressed as a percentage
    (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
        conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1     sd.LAT 42.131235  2.873188 3.623640e+01  47.60497
    2    sd.LONG  3.918416  3.073268 3.293841e-04  10.13401
    3 sd.LATLONG 14.463484  3.913939 6.280235e+00  21.41574
    4   sd.resid 38.973321  3.089661 3.408651e+01  45.92105
    
    ## we can even plot this as a Bayesian ANOVA table
    ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
        ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
        vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
        scale_x_discrete() + coord_flip() + theme_classic()
    
    plot of chunk tut7.3bQ1.11d1
    paruelo.mcmc = as.matrix(paruelo.brm)
    Xmat = model.matrix(~cLAT * cLONG, data = paruelo)
    sd.LAT = abs(paruelo.mcmc[, "b_cLAT"]) * sd(Xmat[, "cLAT"])
    sd.LONG = abs(paruelo.mcmc[, "b_cLONG"]) * sd(Xmat[, "cLONG"])
    sd.LATLONG = abs(paruelo.mcmc[, "b_cLAT:cLONG"]) * sd(Xmat[, "cLAT:cLONG"])
    sd.x = sd.LAT + sd.LONG + sd.LATLONG
    
    # generate a model matrix
    newdata = paruelo
    Xmat = model.matrix(~cLAT * cLONG, newdata)
    ## get median parameter estimates
    coefs = paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit^2, 2, sqrt(paruelo$C3), "-")
    sd.resid = apply(resid, 1, sd)
    
    sd.all = cbind(sd.LAT, sd.LONG, sd.LATLONG, sd.resid)
    (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
    
            term   estimate   std.error     conf.low  conf.high
    1     sd.LAT 0.23129941 0.026340445 1.793043e-01 0.28218470
    2    sd.LONG 0.02462705 0.017936072 2.508594e-05 0.05878528
    3 sd.LATLONG 0.07870679 0.026680297 2.537974e-02 0.13204085
    4   sd.resid 0.21548437 0.006463628 2.071177e-01 0.22864105
    
    # OR expressed as a percentage
    (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
        conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1     sd.LAT 42.161471  2.925826 3.627886e+01  47.58413
    2    sd.LONG  3.914417  3.097391 1.754179e-04  10.13780
    3 sd.LATLONG 14.349954  3.967145 5.640723e+00  21.34444
    4   sd.resid 39.046627  3.137516 3.387698e+01  45.82190
    
    ## we can even plot this as a Bayesian ANOVA table
    ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
        ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
        vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
        scale_x_discrete() + coord_flip() + theme_classic()
    
    plot of chunk tut7.2bQ1.11e1
  13. Explore $R^2$
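    For each MCMC sample, $R^2$ can be calculated as the proportion of the total variance that is attributable to the fitted (expected) values,
    $$R^2 = \frac{\mathrm{var}(\hat{y})}{\mathrm{var}(\hat{y}) + \mathrm{var}(y-\hat{y})}$$
    where $\hat{y}$ are the fitted values for that sample and $y-\hat{y}$ are the corresponding residuals. This is what the code below computes (var_f and var_e) before summarizing the resulting posterior of $R^2$.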
    library(MCMCpack)
    library(broom)
    paruelo.mcmc <- paruelo.mcmcpack
    Xmat = model.matrix(~cLAT * cLONG, data = paruelo)
    coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, sqrt(paruelo$C3), "-")
    var_f = apply(fit, 1, var)
    var_e = apply(resid, 1, var)
    R2 = var_f/(var_f + var_e)
    tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
    
      term  estimate  std.error  conf.low conf.high
    1 var1 0.5324033 0.05603256 0.4218392 0.6325557
    
    # for comparison with frequentist
    summary(lm(sqrt(C3) ~ cLAT * cLONG, data = paruelo))
    
    Call:
    lm(formula = sqrt(C3) ~ cLAT * cLONG, data = paruelo)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -0.51312 -0.13427 -0.01134  0.14086  0.38940 
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  0.4282658  0.0234347  18.275  < 2e-16 ***
    cLAT         0.0436937  0.0048670   8.977 3.28e-13 ***
    cLONG       -0.0028773  0.0036842  -0.781   0.4375    
    cLAT:cLONG   0.0022824  0.0007471   3.055   0.0032 ** 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 0.1991 on 69 degrees of freedom
    Multiple R-squared:  0.5403,	Adjusted R-squared:  0.5203 
    F-statistic: 27.03 on 3 and 69 DF,  p-value: 1.128e-11
    
    paruelo.mcmc = paruelo.r2jags$BUGSoutput$sims.matrix
    Xmat = model.matrix(~cLAT * cLONG, data = paruelo)
    coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, sqrt(paruelo$C3), "-")
    var_f = apply(fit, 1, var)
    var_e = apply(resid, 1, var)
    R2 = var_f/(var_f + var_e)
    tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
    
      term  estimate  std.error  conf.low conf.high
    1 var1 0.5331754 0.05548993 0.4212902 0.6298881
    
    # for comparison with frequentist
    summary(lm(sqrt(C3) ~ cLAT * cLONG, data = paruelo))
    
    Call:
    lm(formula = sqrt(C3) ~ cLAT * cLONG, data = paruelo)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -0.51312 -0.13427 -0.01134  0.14086  0.38940 
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  0.4282658  0.0234347  18.275  < 2e-16 ***
    cLAT         0.0436937  0.0048670   8.977 3.28e-13 ***
    cLONG       -0.0028773  0.0036842  -0.781   0.4375    
    cLAT:cLONG   0.0022824  0.0007471   3.055   0.0032 ** 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 0.1991 on 69 degrees of freedom
    Multiple R-squared:  0.5403,	Adjusted R-squared:  0.5203 
    F-statistic: 27.03 on 3 and 69 DF,  p-value: 1.128e-11
    
    paruelo.mcmc = as.matrix(paruelo.rstan)
    Xmat = model.matrix(~cLAT * cLONG, data = paruelo)
    coefs = paruelo.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, sqrt(paruelo$C3), "-")
    var_f = apply(fit, 1, var)
    var_e = apply(resid, 1, var)
    R2 = var_f/(var_f + var_e)
    tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
    
      term  estimate std.error  conf.low conf.high
    1 var1 0.5316571 0.0559445 0.4233242 0.6339557
    
    # for comparison with frequentist
    summary(lm(sqrt(C3) ~ cLAT * cLONG, data = paruelo))
    
    Call:
    lm(formula = sqrt(C3) ~ cLAT * cLONG, data = paruelo)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -0.51312 -0.13427 -0.01134  0.14086  0.38940 
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  0.4282658  0.0234347  18.275  < 2e-16 ***
    cLAT         0.0436937  0.0048670   8.977 3.28e-13 ***
    cLONG       -0.0028773  0.0036842  -0.781   0.4375    
    cLAT:cLONG   0.0022824  0.0007471   3.055   0.0032 ** 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 0.1991 on 69 degrees of freedom
    Multiple R-squared:  0.5403,	Adjusted R-squared:  0.5203 
    F-statistic: 27.03 on 3 and 69 DF,  p-value: 1.128e-11
    
    paruelo.mcmc = as.matrix(paruelo.rstanarm)
    Xmat = model.matrix(~cLAT * cLONG, data = paruelo)
    coefs = paruelo.mcmc[, c("(Intercept)", "cLAT", "cLONG", "cLAT:cLONG")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, sqrt(paruelo$C3), "-")
    var_f = apply(fit, 1, var)
    var_e = apply(resid, 1, var)
    R2 = var_f/(var_f + var_e)
    tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
    
      term estimate  std.error  conf.low conf.high
    1 var1   0.5327 0.05578058 0.4222702 0.6314585
    
    # for comparison with frequentist
    summary(lm(sqrt(C3) ~ cLAT * cLONG, data = paruelo))
    
    Call:
    lm(formula = sqrt(C3) ~ cLAT * cLONG, data = paruelo)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -0.51312 -0.13427 -0.01134  0.14086  0.38940 
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  0.4282658  0.0234347  18.275  < 2e-16 ***
    cLAT         0.0436937  0.0048670   8.977 3.28e-13 ***
    cLONG       -0.0028773  0.0036842  -0.781   0.4375    
    cLAT:cLONG   0.0022824  0.0007471   3.055   0.0032 ** 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 0.1991 on 69 degrees of freedom
    Multiple R-squared:  0.5403,	Adjusted R-squared:  0.5203 
    F-statistic: 27.03 on 3 and 69 DF,  p-value: 1.128e-11
    
    paruelo.mcmc = as.matrix(paruelo.brm)
    Xmat = model.matrix(~cLAT * cLONG, data = paruelo)
    coefs = paruelo.mcmc[, c("b_Intercept", "b_cLAT", "b_cLONG", "b_cLAT:cLONG")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, sqrt(paruelo$C3), "-")
    var_f = apply(fit, 1, var)
    var_e = apply(resid, 1, var)
    R2 = var_f/(var_f + var_e)
    tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
    
      term  estimate  std.error  conf.low conf.high
    1 var1 0.5316392 0.05603751 0.4212598 0.6302732
    
    # for comparison with frequentist
    summary(lm(sqrt(C3) ~ cLAT * cLONG, data = paruelo))
    
    Call:
    lm(formula = sqrt(C3) ~ cLAT * cLONG, data = paruelo)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -0.51312 -0.13427 -0.01134  0.14086  0.38940 
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  0.4282658  0.0234347  18.275  < 2e-16 ***
    cLAT         0.0436937  0.0048670   8.977 3.28e-13 ***
    cLONG       -0.0028773  0.0036842  -0.781   0.4375    
    cLAT:cLONG   0.0022824  0.0007471   3.055   0.0032 ** 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 0.1991 on 69 degrees of freedom
    Multiple R-squared:  0.5403,	Adjusted R-squared:  0.5203 
    F-statistic: 27.03 on 3 and 69 DF,  p-value: 1.128e-11
    
  14. Although not overly useful in the case of only two main effects and an interaction, explore sparsity.
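    The Stan code below implements a regularized horseshoe prior on the regression coefficients: each coefficient receives its own local shrinkage parameter ($\lambda_j$) in addition to a global shrinkage parameter ($\tau$), so that coefficients with little support are shrunk strongly towards zero whilst substantial effects largely escape the shrinkage. The scale of the half-t prior on $\tau$ is supplied as data (scale_global, multiplied by $\sigma$); a common recommendation - reflected in the global_scale and par_ratio values computed for the rstanarm and brms fits further down - bases this scale on a prior guess ($p_0$) of the number of non-zero coefficients,
    $$\tau_0 = \frac{p_0}{nX-p_0}\frac{\sigma}{\sqrt{n}}$$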
    					  modelString="
    					  data {
    					  int < lower =0 > n; # number of observations
    					  int < lower =0 > nX; # number of predictors
    					  vector [ n] Y; # outputs
    					  matrix [n ,nX] X; # inputs
    					  real < lower =0 > scale_icept ; # prior std for the intercept
    					  real < lower =0 > scale_global ; # scale for the half -t prior for tau
    					  real < lower =1 > nu_global ; # degrees of freedom for the half -t priors for tau
    					  real < lower =1 > nu_local ; # degrees of freedom for the half - t priors for lambdas
    					  real < lower =0 > slab_scale ; # slab scale for the regularized horseshoe
    					  real < lower =0 > slab_df ; # slab degrees of freedom for the regularized horseshoe
    					  }
    					  transformed data {
    					  matrix[n, nX - 1] Xc;  // centered version of X 
    					  vector[nX - 1] means_X;  // column means of X before centering 
    					  for (i in 2:nX) { 
    					  means_X[i - 1] = mean(X[, i]); 
    					  Xc[, i - 1] = X[, i] - means_X[i - 1]; 
    					  }  
    					  }
    					  parameters {
    					  real logsigma ;
    					  real cbeta0 ;
    					  vector [ nX-1] z;
    					  real < lower =0 > tau ; # global shrinkage parameter
    					  vector < lower =0 >[ nX-1] lambda ; # local shrinkage parameter
    					  real < lower =0 > caux ;
    					  }
    					  transformed parameters {
    					  real < lower =0 > sigma ; # noise std
    					  vector < lower =0 >[ nX-1] lambda_tilde ; # ’ truncated ’ local shrinkage parameter
    					  real < lower =0 > c; # slab scale
    					  vector [ nX-1] beta ; # regression coefficients
    					  vector [ n] mu; # latent function values
    					  sigma = exp ( logsigma );
    					  c = slab_scale * sqrt ( caux );
    					  lambda_tilde = sqrt ( c ^2 * square ( lambda ) ./ (c ^2 + tau ^2* square ( lambda )) );
    					  beta = z .* lambda_tilde * tau ;
    					  mu = cbeta0 + Xc* beta ;
    					  }
    					  model {
    					  # half -t priors for lambdas and tau , and inverse - gamma for c ^2
    					  z ~ normal (0 , 1);
    					  lambda ~ student_t ( nu_local , 0, 1);
    					  tau ~ student_t ( nu_global , 0 , scale_global * sigma );
    					  caux ~ inv_gamma (0.5* slab_df , 0.5* slab_df );
    					  cbeta0 ~ normal (0 , scale_icept );
    					  Y ~ normal (mu , sigma );
    					  }
    					  generated quantities { 
    					  real beta0;  // population-level intercept 
    					  vector[n] log_lik;
    					  beta0 = cbeta0 - dot_product(means_X, beta);
    					  for (i in 1:n) {
    					  log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma);
    					  }
    					  }"
    
    X = model.matrix(~cLAT * cLONG, data = paruelo)
    paruelo.list <- with(paruelo, list(Y = sqrt(C3), X = X, nX = ncol(X), n = nrow(paruelo), 
        scale_icept = 100, scale_global = 1, nu_global = 1, nu_local = 1, slab_scale = 2, 
        slab_df = 4))
    
    paruelo.rstan.sparsity <- stan(data = paruelo.list, model_code = modelString, 
        chains = 3, iter = 4000, warmup = 2000, thin = 3, save_dso = TRUE)
    
    SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 1).
    
    Gradient evaluation took 6.3e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.63 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 4000 [  0%]  (Warmup)
    Iteration:  400 / 4000 [ 10%]  (Warmup)
    Iteration:  800 / 4000 [ 20%]  (Warmup)
    Iteration: 1200 / 4000 [ 30%]  (Warmup)
    Iteration: 1600 / 4000 [ 40%]  (Warmup)
    Iteration: 2000 / 4000 [ 50%]  (Warmup)
    Iteration: 2001 / 4000 [ 50%]  (Sampling)
    Iteration: 2400 / 4000 [ 60%]  (Sampling)
    Iteration: 2800 / 4000 [ 70%]  (Sampling)
    Iteration: 3200 / 4000 [ 80%]  (Sampling)
    Iteration: 3600 / 4000 [ 90%]  (Sampling)
    Iteration: 4000 / 4000 [100%]  (Sampling)
    
     Elapsed Time: 1.72439 seconds (Warm-up)
                   1.36468 seconds (Sampling)
                   3.08907 seconds (Total)
    
    
    SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 2).
    
    Gradient evaluation took 2.2e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.22 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 4000 [  0%]  (Warmup)
    Iteration:  400 / 4000 [ 10%]  (Warmup)
    Iteration:  800 / 4000 [ 20%]  (Warmup)
    Iteration: 1200 / 4000 [ 30%]  (Warmup)
    Iteration: 1600 / 4000 [ 40%]  (Warmup)
    Iteration: 2000 / 4000 [ 50%]  (Warmup)
    Iteration: 2001 / 4000 [ 50%]  (Sampling)
    Iteration: 2400 / 4000 [ 60%]  (Sampling)
    Iteration: 2800 / 4000 [ 70%]  (Sampling)
    Iteration: 3200 / 4000 [ 80%]  (Sampling)
    Iteration: 3600 / 4000 [ 90%]  (Sampling)
    Iteration: 4000 / 4000 [100%]  (Sampling)
    
     Elapsed Time: 2.13692 seconds (Warm-up)
                   1.76853 seconds (Sampling)
                   3.90545 seconds (Total)
    
    
    SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 3).
    
    Gradient evaluation took 2.1e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 4000 [  0%]  (Warmup)
    Iteration:  400 / 4000 [ 10%]  (Warmup)
    Iteration:  800 / 4000 [ 20%]  (Warmup)
    Iteration: 1200 / 4000 [ 30%]  (Warmup)
    Iteration: 1600 / 4000 [ 40%]  (Warmup)
    Iteration: 2000 / 4000 [ 50%]  (Warmup)
    Iteration: 2001 / 4000 [ 50%]  (Sampling)
    Iteration: 2400 / 4000 [ 60%]  (Sampling)
    Iteration: 2800 / 4000 [ 70%]  (Sampling)
    Iteration: 3200 / 4000 [ 80%]  (Sampling)
    Iteration: 3600 / 4000 [ 90%]  (Sampling)
    Iteration: 4000 / 4000 [100%]  (Sampling)
    
     Elapsed Time: 1.83401 seconds (Warm-up)
                   2.12364 seconds (Sampling)
                   3.95765 seconds (Total)
    
    tidyMCMC(paruelo.rstan.sparsity, pars = c("beta[1]", "beta[2]", "beta[3]"), 
        conf.int = TRUE, conf.method = "HPDinterval", rhat = TRUE, ess = TRUE)
    
         term     estimate    std.error      conf.low   conf.high     rhat  ess
    1 beta[1]  0.042527103 0.0050097190  0.0327966631 0.052135937 1.000417 2001
    2 beta[2] -0.001716738 0.0031593104 -0.0087583583 0.004329824 1.007750  302
    3 beta[3]  0.002134985 0.0007772265  0.0005952987 0.003651370 1.005599 1344
    
    library(bayesplot)
    mcmc_areas(as.matrix(paruelo.rstan.sparsity), pars = c("beta[1]", "beta[2]", 
        "beta[3]"))
    
    plot of chunk tut7.2bQ1.13a2
    n = nrow(paruelo)
    nX = 2
    p0 = 1
    global_scale = p0/(nX - p0)/sqrt(n)
    paruelo.rstanarm.sparsity = stan_glm(sqrt(C3) ~ cLAT * cLONG, data = paruelo, 
        iter = 2000, warmup = 200, chains = 3, thin = 2, refresh = 0, prior_intercept = normal(0, 
            100), prior = hs(df = 1, global_df = 1, global_scale = global_scale), 
        prior_aux = cauchy(0, 2))
    
    Gradient evaluation took 6e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.6 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 1.0696 seconds (Warm-up)
                   3.66112 seconds (Sampling)
                   4.73072 seconds (Total)
    
    
    Gradient evaluation took 2.5e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.25 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.751543 seconds (Warm-up)
                   4.86999 seconds (Sampling)
                   5.62153 seconds (Total)
    
    
    Gradient evaluation took 1.7e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.915796 seconds (Warm-up)
                   4.15483 seconds (Sampling)
                   5.07063 seconds (Total)
    
    print(paruelo.rstanarm.sparsity)
    
    stan_glm
     family:  gaussian [identity]
     formula: sqrt(C3) ~ cLAT * cLONG
    ------
    
    Estimates:
                Median MAD_SD
    (Intercept) 0.4    0.0   
    cLAT        0.0    0.0   
    cLONG       0.0    0.0   
    cLAT:cLONG  0.0    0.0   
    sigma       0.2    0.0   
    
    Sample avg. posterior predictive 
    distribution of y (X = xbar):
             Median MAD_SD
    mean_PPD 0.4    0.0   
    
    ------
    For info on the priors used see help('prior_summary.stanreg').
    
    tidyMCMC(paruelo.rstanarm.sparsity$stanfit, conf.int = TRUE, conf.method = "HPDinterval", 
        rhat = TRUE, ess = TRUE)
    
               term      estimate    std.error      conf.low    conf.high     rhat  ess
    1   (Intercept)   0.427902575 0.0236414863   0.384058534  0.476529216 1.000818 2231
    2          cLAT   0.042106547 0.0051212043   0.031640909  0.052108292 1.000424 2700
    3         cLONG  -0.001616742 0.0030434350  -0.008075258  0.004223560 1.001325 2422
    4    cLAT:cLONG   0.002078546 0.0007825663   0.000561742  0.003633725 1.000547 2164
    5         sigma   0.201559149 0.0171438985   0.169816432  0.235290681 1.000151 2307
    6      mean_PPD   0.434643137 0.0335247480   0.370934109  0.501939647 1.000614 2414
    7 log-posterior -13.229908688 2.9708421088 -18.809548038 -7.664415828 1.001692 1304
    
    library(bayesplot)
    mcmc_areas(as.matrix(paruelo.rstanarm.sparsity), regex_par = "cL")
    
    plot of chunk tut7.2bQ1.13b1
    n = nrow(paruelo)
    nX = 2
    p0 = 1
    par_ratio = p0/(nX - p0)  # expected ratio of non-zero to zero coefficients (used by horseshoe() below)
    paruelo.brms.sparsity = brm(sqrt(C3) ~ cLAT * cLONG, data = paruelo, iter = 2000, 
        warmup = 200, chains = 3, thin = 2, refresh = 0, prior = c(prior(normal(0, 
            100), class = "Intercept"), prior(horseshoe(df = 1, par_ratio = par_ratio), 
            class = "b"), prior(cauchy(0, 5), class = "sigma")))
    
    Gradient evaluation took 4.2e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.42 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.241516 seconds (Warm-up)
                   0.957428 seconds (Sampling)
                   1.19894 seconds (Total)
    
    
    Gradient evaluation took 1.9e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.226307 seconds (Warm-up)
                   1.43232 seconds (Sampling)
                   1.65862 seconds (Total)
    
    
    Gradient evaluation took 1.4e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.250383 seconds (Warm-up)
                   1.40852 seconds (Sampling)
                   1.6589 seconds (Total)
    
    print(paruelo.brms.sparsity)
    
     Family: gaussian(identity) 
    Formula: sqrt(C3) ~ cLAT * cLONG 
       Data: paruelo (Number of observations: 73) 
    Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; 
             total post-warmup samples = 2700
        ICs: LOO = NA; WAIC = NA; R2 = NA
     
    Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept      0.43      0.02     0.38     0.48       2509    1
    cLAT           0.04      0.01     0.03     0.05       1931    1
    cLONG          0.00      0.00    -0.01     0.00       2543    1
    cLAT:cLONG     0.00      0.00     0.00     0.00       2376    1
    
    Family Specific Parameters: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sigma      0.2      0.02     0.17     0.24       2269    1
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    tidyMCMC(paruelo.brms.sparsity$fit, conf.int = TRUE, conf.method = "HPDinterval", 
        rhat = TRUE, ess = TRUE)
    
              term     estimate    std.error     conf.low   conf.high      rhat  ess
    1  b_Intercept  0.429311590 0.0240182589  0.382581777 0.476884717 0.9997970 2509
    2       b_cLAT  0.042280287 0.0050651053  0.032019965 0.051715046 0.9992565 1931
    3      b_cLONG -0.001707310 0.0029955918 -0.007777912 0.004060562 1.0005126 2543
    4 b_cLAT:cLONG  0.002088576 0.0008262995  0.000429252 0.003678067 0.9998758 2376
    5        sigma  0.202876614 0.0179247843  0.170190244 0.239294598 0.9998154 2269
    6        hs_c2  1.907347192 2.9752114577  0.238132959 5.487135251 0.9994273 1930
    
    library(bayesplot)
    mcmc_areas(as.matrix(paruelo.brms.sparsity), regex_par = "cL")
    
    plot of chunk tut7.2bQ1.13c1

Multiple Linear Regression

Loyn (1987) modeled the abundance of forest birds with six predictor variables (patch area, distance to nearest patch, distance to nearest larger patch, grazing intensity, altitude and years since the patch had been isolated).

Download Loyn data set
Format of loyn.csv data file
ABUND  DIST  LDIST  AREA  GRAZE  ALT  YR.ISOL
...    ...   ...    ...   ...    ...  ...

ABUND    Abundance of forest birds in patch - response variable
DIST     Distance to nearest patch - predictor variable
LDIST    Distance to nearest larger patch - predictor variable
AREA     Size of the patch - predictor variable
GRAZE    Grazing intensity (1 to 5, representing light to heavy) - predictor variable
ALT      Altitude - predictor variable
YR.ISOL  Number of years since the patch was isolated - predictor variable

Open the loyn data file. HINT.
loyn <- read.table("../downloads/data/loyn.csv", header = T, sep = ",", strip.white = T)
head(loyn)
  ABUND AREA YR.ISOL DIST LDIST GRAZE ALT
1   5.3  0.1    1968   39    39     2 160
2   2.0  0.5    1920  234   234     5  60
3   1.5  0.5    1900  104   311     5 140
4  17.1  1.0    1966   66    66     3 160
5  13.8  1.0    1918  246   246     5 140
6  14.1  1.0    1965  234   285     3 130
  1. Perform exploratory data analysis to help guide what sort of analysis will be suitable and whether the various assumptions are likely to be met.
    # via car's scatterplotMatrix function
    library(car)
    scatterplotMatrix(~ABUND + DIST + LDIST + AREA + GRAZE + ALT + YR.ISOL,
        data = loyn, diagonal = "boxplot")
    
    plot of chunk tut7.3bQ2.1
    # via lattice
    library(lattice)
    splom.lat <- splom(loyn, type = c("p", "r"))
    print(splom.lat)
    
    plot of chunk tut7.3bQ2.1
    # via ggplot2 - warning these are slow!
    library(GGally)
    ggpairs(loyn, lower = list(continuous = "smooth"), diag = list(continuous = "density"),
        axisLabels = "none")
    
    plot of chunk tut7.3bQ2.1
  2. ABUND (bird abundance) seems reasonably normal; however, the same cannot be said for AREA, DIST and LDIST. Try applying temporary logarithmic (base 10) transformations to these variables (HINT). Does this improve some of these specific assumptions (y or n)? A quick informal check is sketched below.
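    As a quick check (a sketch only, using the loyn data frame loaded above and base graphics), the raw and log10-transformed distributions can be compared side by side; if the transformations help, the transformed versions should appear far less skewed.
    # compare raw and log10-transformed distributions for the skewed predictors
    par(mfrow = c(2, 3))
    hist(loyn$AREA, main = "AREA")
    hist(loyn$DIST, main = "DIST")
    hist(loyn$LDIST, main = "LDIST")
    hist(log10(loyn$AREA), main = "log10(AREA)")
    hist(log10(loyn$DIST), main = "log10(DIST)")
    hist(log10(loyn$LDIST), main = "log10(LDIST)")
    par(mfrow = c(1, 1))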

  3. Whilst many model fitting and graphing routines are able to perform transformations inline, for more complex examples it is often advisable to also create transformed versions of the variables (a minimal sketch of this follows the plots below).
    # via car's scatterplotMatrix function
    library(car)
    scatterplotMatrix(~ABUND + log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE +
        ALT + YR.ISOL, data = loyn, diagonal = "boxplot")
    
    plot of chunk tut7.3bQ2.1b
    ggpairs(with(loyn, data.frame(logDIST = log10(DIST), logLDIST = log10(LDIST),
        logAREA = log10(AREA), GRAZE, ALT, YR.ISOL)), lower = list(continuous = "smooth"),
        diag = list(continuous = "density"), axisLabels = "none")
    
    plot of chunk tut7.3bQ2.1b
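    As a minimal sketch of creating permanent transformed versions (assuming dplyr is loaded; the column names logDIST, logLDIST and logAREA and the object loyn.trans are arbitrary names used purely for illustration):
    library(dplyr)
    # store log10-transformed copies of the skewed predictors alongside the originals
    loyn.trans = loyn %>% mutate(logDIST = log10(DIST), logLDIST = log10(LDIST),
        logAREA = log10(AREA))
    head(loyn.trans)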
  4. Explore (multi)collinearity.
    loyn.lm <- lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT + YR.ISOL, data = loyn)
    vif(loyn.lm)
    
     log10(DIST) log10(LDIST)  log10(AREA)        GRAZE          ALT      YR.ISOL 
        1.654553     2.009749     1.911514     2.524814     1.467937     1.804769 
    
    1/vif(loyn.lm)
    
     log10(DIST) log10(LDIST)  log10(AREA)        GRAZE          ALT      YR.ISOL 
       0.6043930    0.4975746    0.5231454    0.3960688    0.6812282    0.5540876 
    
  5. Despite the apparent correlation between DIST and LDIST, this does not appear to manifest as a collinearity problem - all of the variance inflation factors above are well below commonly used thresholds (see also the quick check below).
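    As an additional quick check (a sketch only), the pairwise correlation between the two log10-transformed distance measures can be inspected directly:
    with(loyn, cor(log10(DIST), log10(LDIST)))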

  6. In preparation for the Bayesian regression models, we should center each of the predictor variables. Note that transformations must occur prior to centering...
    mean.DIST = mean(log10(loyn$DIST))
    mean.LDIST = mean(log10(loyn$LDIST))
    mean.AREA = mean(log10(loyn$AREA))
    mean.GRAZE = mean(loyn$GRAZE)
    mean.ALT = mean(loyn$ALT)
    mean.YR.ISOL = mean(loyn$YR.ISOL)
    loyn = loyn %>% dplyr:::mutate(cDIST = log10(DIST), cDIST = cDIST - mean(cDIST), cLDIST = log10(LDIST),
        cLDIST = cLDIST - mean(cLDIST), cAREA = log10(AREA), cAREA = cAREA - mean(cAREA), cGRAZE = GRAZE -
            mean(GRAZE), cALT = ALT - mean(ALT), cYR.ISOL = YR.ISOL - mean(YR.ISOL))
    
  7. Fit the appropriate Bayesian model to explore the effect of the various predictors on the Abundance of forest birds.
    library(MCMCpack)
    loyn.mcmcpack = MCMCregress(ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn)
    
    modelString = "
      model {
      #Likelihood
      for (i in 1:n) {
      y[i]~dnorm(mu[i],tau)
      mu[i] <- beta0 + inprod(beta[],X[i,])
      }
      #Priors
      beta0 ~ dnorm(0.01,1.0E-6)
      for (j in 1:nX) {
      beta[j] ~ dnorm(0.01,1.0E-6)
      }
      tau <- 1 / (sigma * sigma)
      sigma~dunif(0,100)
      }
      "
    
    X = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn)
    loyn.list <- with(loyn, list(y = ABUND, X = X[, -1], nX = ncol(X) - 1,
        n = nrow(loyn)))
    
    params <- c("beta0", "beta", "sigma")
    burnInSteps = 3000
    nChains = 3
    numSavedSteps = 15000
    thinSteps = 10
    nIter = ceiling((numSavedSteps * thinSteps)/nChains)
    
    loyn.r2jags <- jags(data = loyn.list, inits = NULL, parameters.to.save = params,
        model.file = textConnection(modelString), n.chains = nChains, n.iter = nIter,
        n.burnin = burnInSteps, n.thin = thinSteps)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
    Graph information:
       Observed stochastic nodes: 56
       Unobserved stochastic nodes: 8
       Total graph size: 590
    
    Initializing model
    
    					  modelString=" 
    					  data { 
    					  int n;   // total number of observations 
    					  vector[n] Y;      // response variable 
    					  int nX;  // number of effects 
    					  matrix[n, nX] X;   // model matrix 
    					  } 
    					  transformed data { 
    					  matrix[n, nX - 1] Xc;  // centered version of X 
    					  vector[nX - 1] means_X;  // column means of X before centering 
    					  
    					  for (i in 2:nX) { 
    					  means_X[i - 1] = mean(X[, i]); 
    					  Xc[, i - 1] = X[, i] - means_X[i - 1]; 
    					  }  
    					  } 
    					  parameters { 
    					  vector[nX-1] beta;  // population-level effects 
    					  real cbeta0;  // center-scale intercept 
    					  real sigma;  // residual SD 
    					  } 
    					  transformed parameters { 
    					  } 
    					  model { 
    					  vector[n] mu; 
    					  mu = Xc * beta + cbeta0; 
    					  // prior specifications 
    					  beta ~ normal(0, 10); 
    					  cbeta0 ~ normal(0, 10); 
    					  sigma ~ cauchy(0, 5); 
    					  // likelihood contribution 
    					  Y ~ normal(mu, sigma); 
    					  } 
    					  generated quantities { 
    					  real beta0;  // population-level intercept 
    					  vector[n] log_lik;
    					  beta0 = cbeta0 - dot_product(means_X, beta);
    					  for (i in 1:n) {
    					  log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma);
    					  }
    					  }
    					  "
    
    X = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn)
    loyn.list <- with(loyn, list(Y = ABUND, X = X, nX = ncol(X), n = nrow(loyn)))
    
    library(rstan)
    loyn.rstan <- stan(data = loyn.list, model_code = modelString, chains = 3, iter = 5000, warmup = 500,
        thin = 2)
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 1).
    
    Gradient evaluation took 3.9e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.39 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration:  501 / 5000 [ 10%]  (Sampling)
    Iteration: 1000 / 5000 [ 20%]  (Sampling)
    Iteration: 1500 / 5000 [ 30%]  (Sampling)
    Iteration: 2000 / 5000 [ 40%]  (Sampling)
    Iteration: 2500 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.138507 seconds (Warm-up)
                   0.270008 seconds (Sampling)
                   0.408515 seconds (Total)
    
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 2).
    
    Gradient evaluation took 9e-06 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration:  501 / 5000 [ 10%]  (Sampling)
    Iteration: 1000 / 5000 [ 20%]  (Sampling)
    Iteration: 1500 / 5000 [ 30%]  (Sampling)
    Iteration: 2000 / 5000 [ 40%]  (Sampling)
    Iteration: 2500 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.122783 seconds (Warm-up)
                   0.247383 seconds (Sampling)
                   0.370166 seconds (Total)
    
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 3).
    
    Gradient evaluation took 9e-06 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration:  501 / 5000 [ 10%]  (Sampling)
    Iteration: 1000 / 5000 [ 20%]  (Sampling)
    Iteration: 1500 / 5000 [ 30%]  (Sampling)
    Iteration: 2000 / 5000 [ 40%]  (Sampling)
    Iteration: 2500 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.141817 seconds (Warm-up)
                   0.28119 seconds (Sampling)
                   0.423007 seconds (Total)
    
    loyn.rstanarm = stan_glm(ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT +
        cYR.ISOL, data = loyn, iter = 5000, warmup = 500, chains = 3, thin = 2,
        refresh = 0, prior_intercept = normal(0, 10), prior = normal(0, 10),
        prior_aux = cauchy(0, 5))
    
    Gradient evaluation took 3.8e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.38 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.099257 seconds (Warm-up)
                   0.455713 seconds (Sampling)
                   0.55497 seconds (Total)
    
    
    Gradient evaluation took 1.6e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.105746 seconds (Warm-up)
                   0.410567 seconds (Sampling)
                   0.516313 seconds (Total)
    
    
    Gradient evaluation took 1.5e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.096972 seconds (Warm-up)
                   0.416244 seconds (Sampling)
                   0.513216 seconds (Total)
    
    loyn.brm = brm(ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        data = loyn, iter = 5000, warmup = 500, chains = 3, thin = 2, refresh = 0,
        prior = c(prior(normal(0, 10), class = "Intercept"), prior(normal(0,
            10), class = "b"), prior(cauchy(0, 5), class = "sigma")))
    
    Gradient evaluation took 1.7e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.126585 seconds (Warm-up)
                   0.269009 seconds (Sampling)
                   0.395594 seconds (Total)
    
    
    Gradient evaluation took 9e-06 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.125164 seconds (Warm-up)
                   0.245459 seconds (Sampling)
                   0.370623 seconds (Total)
    
    
    Gradient evaluation took 9e-06 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.120374 seconds (Warm-up)
                   0.273265 seconds (Sampling)
                   0.393639 seconds (Total)
    
  8. Explore MCMC diagnostics
    library(MCMCpack)
    plot(loyn.mcmcpack)
    
    plot of chunk tut7.3bQ2.6a
    plot of chunk tut7.3bQ2.6a
    raftery.diag(loyn.mcmcpack)
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                       
                 Burn-in  Total Lower bound  Dependence
                 (M)      (N)   (Nmin)       factor (I)
     (Intercept) 2        3741  3746         0.999     
     cDIST       2        3802  3746         1.010     
     cLDIST      2        3771  3746         1.010     
     cAREA       2        3929  3746         1.050     
     cGRAZE      2        3771  3746         1.010     
     cALT        2        3802  3746         1.010     
     cYR.ISOL    2        3771  3746         1.010     
     sigma2      2        3929  3746         1.050     
    
    autocorr.diag(loyn.mcmcpack)
    
            (Intercept)         cDIST       cLDIST        cAREA       cGRAZE          cALT     cYR.ISOL
    Lag 0   1.000000000  1.0000000000  1.000000000  1.000000000  1.000000000  1.0000000000  1.000000000
    Lag 1   0.003859081 -0.0031673984 -0.019186754 -0.011659408  0.006409519 -0.0038454362 -0.006161199
    Lag 5   0.001559142  0.0112382480 -0.002730253 -0.021259031 -0.016927056 -0.0144856450 -0.020155560
    Lag 10 -0.005782004 -0.0008454841  0.002457735  0.015401555  0.006719719 -0.0064410769  0.002482767
    Lag 50  0.014999125  0.0108067185 -0.005566923 -0.008437947 -0.007984261 -0.0005289131  0.002831566
                  sigma2
    Lag 0   1.0000000000
    Lag 1   0.1473528757
    Lag 5   0.0002854281
    Lag 10 -0.0046914506
    Lag 50 -0.0058806412
    
    library(R2jags)
    library(coda)
    loyn.mcmc = as.mcmc(loyn.r2jags)
    plot(loyn.mcmc)
    
    plot of chunk tut7.3bQ2.6b
    plot of chunk tut7.3bQ2.6b
    plot of chunk tut7.3bQ2.6b
    raftery.diag(loyn.mcmc)
    
    [[1]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                    
              Burn-in  Total Lower bound  Dependence
              (M)      (N)   (Nmin)       factor (I)
     beta0    20       37020 3746          9.88     
     beta[1]  20       38330 3746         10.20     
     beta[2]  20       38330 3746         10.20     
     beta[3]  20       37020 3746          9.88     
     beta[4]  20       35750 3746          9.54     
     beta[5]  10       37660 3746         10.10     
     beta[6]  20       38330 3746         10.20     
     deviance 20       37020 3746          9.88     
     sigma    20       37020 3746          9.88     
    
    
    [[2]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                    
              Burn-in  Total Lower bound  Dependence
              (M)      (N)   (Nmin)       factor (I)
     beta0    20       36380 3746          9.71     
     beta[1]  10       37660 3746         10.10     
     beta[2]  20       38330 3746         10.20     
     beta[3]  20       39680 3746         10.60     
     beta[4]  20       39000 3746         10.40     
     beta[5]  20       39000 3746         10.40     
     beta[6]  20       36380 3746          9.71     
     deviance 20       38330 3746         10.20     
     sigma    20       39000 3746         10.40     
    
    
    [[3]]
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
                                                    
              Burn-in  Total Lower bound  Dependence
              (M)      (N)   (Nmin)       factor (I)
     beta0    20       37020 3746          9.88     
     beta[1]  20       37020 3746          9.88     
     beta[2]  10       37660 3746         10.10     
     beta[3]  20       36380 3746          9.71     
     beta[4]  20       39000 3746         10.40     
     beta[5]  20       39730 3746         10.60     
     beta[6]  20       38330 3746         10.20     
     deviance 20       39000 3746         10.40     
     sigma    20       35750 3746          9.54     
    
    autocorr.diag(loyn.mcmc)
    
                    beta0      beta[1]       beta[2]      beta[3]      beta[4]       beta[5]
    Lag 0    1.0000000000  1.000000000  1.000000e+00  1.000000000 1.0000000000  1.0000000000
    Lag 10  -0.0002701973  0.005792608  4.096609e-03 -0.002465089 0.0064268862 -0.0056341418
    Lag 50  -0.0024666264 -0.003008229  5.990082e-03 -0.002866591 0.0170218192 -0.0052958457
    Lag 100 -0.0055030580  0.007310639  6.781602e-05 -0.005953953 0.0147580101  0.0060628956
    Lag 500  0.0012328414 -0.005326759 -6.548372e-03  0.000738738 0.0005113358  0.0001472076
                  beta[6]     deviance        sigma
    Lag 0    1.0000000000  1.000000000  1.000000000
    Lag 10  -0.0011343808  0.006388500  0.012755663
    Lag 50   0.0092845920 -0.014166264  0.009013642
    Lag 100 -0.0065748440  0.001447733 -0.006410172
    Lag 500  0.0005790938 -0.011828114 -0.021543097
    
    library(rstan)
    library(coda)
    s = as.array(loyn.rstan)
    loyn.mcmc <- do.call(mcmc.list, plyr:::alply(s[, , c("beta0", "beta[1]", "beta[2]", "beta[3]", "sigma")],
        2, as.mcmc))
    plot(loyn.mcmc)
    
    plot of chunk tut7.3bQ2.6c
    plot of chunk tut7.3bQ2.6c
    raftery.diag(loyn.mcmc)
    
    $`1`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    $`2`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    $`3`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    autocorr.diag(loyn.mcmc)
    
                  beta0      beta[1]      beta[2]       beta[3]         sigma
    Lag 0   1.000000000  1.000000000  1.000000000  1.0000000000  1.0000000000
    Lag 1   0.029133427  0.037125861  0.061232330  0.0361683227  0.0743461011
    Lag 5  -0.011825015  0.002171057 -0.001297878 -0.0270706359  0.0264250864
    Lag 10 -0.021725986 -0.003345725  0.022047199 -0.0101347535  0.0001568589
    Lag 50  0.004095666  0.004849422  0.012866721  0.0006091856 -0.0001450329
    
    library(rstan)
    library(coda)
    stan_ac(loyn.rstan, pars = c("beta", "sigma"))
    
    plot of chunk tut7.3bQ2.6c1
    stan_rhat(loyn.rstan, pars = c("beta", "sigma"))
    
    plot of chunk tut7.3bQ2.6c1
    stan_ess(loyn.rstan, pars = c("beta", "sigma"))
    
    plot of chunk tut7.3bQ2.6c1
    # using bayesplot
    library(bayesplot)
    detach("package:reshape")
    mcmc_trace(as.array(loyn.rstan), regex_pars = "beta|sigma")
    
    plot of chunk tut7.3bQ2.6c2
    mcmc_dens(as.array(loyn.rstan), regex_pars = "beta|sigma")
    
    plot of chunk tut7.3bQ2.6c2
    detach("package:reshape")
    library(bayesplot)
    mcmc_combo(as.array(loyn.rstan), regex_pars = "beta|sigma")
    
    plot of chunk tut7.3bQ2.6c3
    s = as.array(loyn.rstanarm)
    loyn.mcmc <- do.call(mcmc.list, plyr:::alply(s[, , c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL", "sigma")], 2, as.mcmc))
    
    plot(loyn.mcmc)
    
    plot of chunk tut7.3bQ2.6d
    plot of chunk tut7.3bQ2.6d
    raftery.diag(loyn.mcmc)
    
    $`1`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    $`2`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    $`3`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    autocorr.diag(loyn.mcmc)
    
                  beta0      beta[1]      beta[2]       beta[3]         sigma
    Lag 0   1.000000000  1.000000000  1.000000000  1.0000000000  1.0000000000
    Lag 1   0.029133427  0.037125861  0.061232330  0.0361683227  0.0743461011
    Lag 5  -0.011825015  0.002171057 -0.001297878 -0.0270706359  0.0264250864
    Lag 10 -0.021725986 -0.003345725  0.022047199 -0.0101347535  0.0001568589
    Lag 50  0.004095666  0.004849422  0.012866721  0.0006091856 -0.0001450329
    
    library(rstanarm)
    library(coda)
    stan_ac(loyn.rstanarm, pars = c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL",
        "sigma"))
    
    plot of chunk tut7.3bQ2.6d1
    stan_rhat(loyn.rstanarm, pars = c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL",
        "sigma"))
    
    plot of chunk tut7.3bQ2.6d1
    stan_ess(loyn.rstanarm, pars = c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL",
        "sigma"))
    
    plot of chunk tut7.3bQ2.6d1
    # using bayesplot
    library(bayesplot)
    detach("package:reshape")
    mcmc_trace(as.array(loyn.rstanarm), regex_pars = "^c|sigma")
    
    plot of chunk tut7.3bQ2.6d2
    mcmc_dens(as.array(loyn.rstanarm), regex_pars = "^c|sigma")
    
    plot of chunk tut7.3bQ2.6d2
    detach("package:reshape")
    library(bayesplot)
    mcmc_combo(as.array(loyn.rstanarm), regex_pars = "^c|sigma")
    
    plot of chunk tut7.3bQ2.6d3
    loyn.mcmc = as.mcmc(loyn.brm)
    
    plot(loyn.mcmc)
    
    plot of chunk tut7.3bQ2.6e
    plot of chunk tut7.3bQ2.6e
    raftery.diag(loyn.mcmc)
    
    $`1`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    $`2`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    $`3`
    
    Quantile (q) = 0.025
    Accuracy (r) = +/- 0.005
    Probability (s) = 0.95 
    
    You need a sample size of at least 3746 with these values of q, r and s
    
    autocorr.diag(loyn.mcmc)
    
                  beta0      beta[1]      beta[2]       beta[3]         sigma
    Lag 0   1.000000000  1.000000000  1.000000000  1.0000000000  1.0000000000
    Lag 1   0.029133427  0.037125861  0.061232330  0.0361683227  0.0743461011
    Lag 5  -0.011825015  0.002171057 -0.001297878 -0.0270706359  0.0264250864
    Lag 10 -0.021725986 -0.003345725  0.022047199 -0.0101347535  0.0001568589
    Lag 50  0.004095666  0.004849422  0.012866721  0.0006091856 -0.0001450329
    
    library(coda)
    stan_ac(loyn.brm$fit)
    
    plot of chunk tut7.3bQ2.6e1
    stan_rhat(loyn.brm$fit)
    
    plot of chunk tut7.3bQ2.6e1
    stan_ess(loyn.brm$fit)
    
    plot of chunk tut7.3bQ2.6e1
    # using bayesplot
    library(bayesplot)
    detach("package:reshape")
    mcmc_trace(as.array(loyn.brm), regex_pars = "^b|sigma")
    
    plot of chunk tut7.3bQ2.6e2
    mcmc_dens(as.array(loyn.brm), regex_pars = "^b|sigma")
    
    plot of chunk tut7.3bQ2.6e2
    detach("package:reshape")
    library(bayesplot)
    mcmc_combo(as.array(loyn.brm), regex_pars = "^b|sigma")
    
    plot of chunk tut7.3bQ2.6e3
  9. Perform model validation
    library(MCMCpack)
    loyn.mcmc = as.data.frame(loyn.mcmcpack)
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata)
    ## get median parameter estimates
    coefs = apply(loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")],
        2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = loyn$ABUND - fit
    ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
    
    plot of chunk tut7.3bQ2.7a1
    library(MCMCpack)
    loyn.mcmc = as.data.frame(loyn.mcmcpack)
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata)
    ## get median parameter estimates
    coefs = apply(loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")],
        2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = loyn$ABUND - fit
    newdata = newdata %>% cbind(fit, resid)
    newdata.melt = newdata %>% gather(key = Pred, value = Value, cDIST:cYR.ISOL)
    ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scale = "free_x")
    
    plot of chunk tut7.3bQ2.7a2
    library(MCMCpack)
    loyn.mcmc = as.data.frame(loyn.mcmcpack)
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata)
    ## get median parameter estimates
    coefs = apply(loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")],
        2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = loyn$ABUND - fit
    sresid = resid/sd(resid)
    ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
    
    plot of chunk tut7.3bQ2.7a3
    library(MCMCpack)
    loyn.mcmc = as.data.frame(loyn.mcmcpack)
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    ## get the parameter estimates (all MCMC samples, as a matrix)
    coefs = as.matrix(loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL")])
    fit = coefs %*% t(Xmat)
    ## draw samples from this model
    yRep = sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(loyn), fit[i, ],
        sqrt(loyn.mcmc[i, "sigma2"])))
    
    ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"),
        alpha = 0.5) + geom_density(data = loyn, aes(x = ABUND, fill = "Obs"),
        alpha = 0.5)
    
    plot of chunk tut7.3bQ2.7a4
    library(bayesplot)
    mcmc_intervals(as.matrix(loyn.mcmcpack), regex_pars = "^c")
    
    plot of chunk tut7.3bQ2.7a5
    mcmc_areas(as.matrix(loyn.mcmcpack), regex_pars = "^c")
    
    plot of chunk tut7.3bQ2.7a5
    loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata)
    ## get median parameter estimates
    coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")],
        2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = loyn$ABUND - fit
    ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
    
    plot of chunk tut7.3bQ2.7b1
    loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata)
    ## get median parameter estimates
    coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")],
        2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = loyn$ABUND - fit
    newdata = newdata %>% cbind(fit, resid)
    newdata.melt = newdata %>% gather(key = Pred, value = Value, cDIST:cYR.ISOL)
    ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scale = "free_x")
    
    plot of chunk tut7.3bQ2.7b2
    loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata)
    ## get median parameter estimates
    coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")],
        2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = loyn$ABUND - fit
    sresid = resid/sd(resid)
    ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
    
    plot of chunk tut7.3bQ2.7b3
    loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    ## get median parameter estimates
    coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]",
        "beta[5]", "beta[6]")]
    fit = coefs %*% t(Xmat)
    ## draw samples from this model
    yRep = sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(loyn), fit[i, ],
        loyn.mcmc[i, "sigma"]))
    ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"),
        alpha = 0.5) + geom_density(data = loyn, aes(x = ABUND, fill = "Obs"),
        alpha = 0.5)
    
    plot of chunk tut7.3bQ2.7b4
    loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix %>% as.data.frame %>%
        dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix
    coefs = loyn.mcmc[, 1:7]
    # generate prediction matrix
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    # OR newdata = rbind(new_data(loyn, seq='cDIST', len=100),
    # new_data(loyn, seq='cLDIST', len=100), new_data(loyn,
    # seq='cAREA', len=100), new_data(loyn, seq='cGRAZE',
    # len=100), new_data(loyn, seq='cALT', len=100),
    # new_data(loyn, seq='cYR.ISOL', len=100))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT +
        cYR.ISOL, data = newdata)
    fit = coefs %*% t(Xmat)
    # add noise for prediction instead of confidence
    fit = t(sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(newdata),
        fit[i, ], loyn.mcmc[i, "sigma"])))
    newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
        conf.method = "HPDinterval"))
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
        cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0)
    loyn.melt = loyn %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL)
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = loyn.melt,
        aes(y = ABUND)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") +
        scale_x_continuous("") + theme_classic() + facet_wrap(~Pred,
        scale = "free_x")
    
    plot of chunk tut7.3bQ2.7b5
    library(bayesplot)
    mcmc_intervals(loyn.r2jags$BUGSoutput$sims.matrix, regex_pars = "^beta")
    
    plot of chunk tut7.3bQ2.7b6
    mcmc_areas(loyn.r2jags$BUGSoutput$sims.matrix, regex_pars = "^beta")
    
    plot of chunk tut7.3bQ2.7b6
    loyn.mcmc = as.data.frame(loyn.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata)
    ## get median parameter estimates
    coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")],
        2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = loyn$ABUND - fit
    ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
    
    plot of chunk tut7.3bQ2.7c1
    loyn.mcmc = as.data.frame(loyn.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata)
    ## get median parameter estimates
    coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")],
        2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = loyn$ABUND - fit
    newdata = newdata %>% cbind(fit, resid)
    newdata.melt = newdata %>% gather(key = Pred, value = Value, cDIST:cYR.ISOL)
    ggplot(newdata.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scale = "free_x")
    
    plot of chunk tut7.3bQ2.7c2
    loyn.mcmc = as.data.frame(loyn.rstan) %>% dplyr:::select(beta0, starts_with("beta"), sigma) %>% as.matrix
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, newdata)
    ## get median parameter estimates
    coefs = apply(loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]", "beta[5]", "beta[6]")],
        2, median)
    fit = as.vector(coefs %*% t(Xmat))
    resid = loyn$ABUND - fit
    sresid = resid/sd(resid)
    ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
    
    plot of chunk tut7.3bQ2.7c3
    loyn.mcmc = as.data.frame(loyn.rstan) %>% dplyr:::select(beta0, starts_with("beta"),
        sigma) %>% as.matrix
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    ## get median parameter estimates
    coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]",
        "beta[5]", "beta[6]")]
    fit = coefs %*% t(Xmat)
    ## draw samples from this model
    yRep = sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(loyn), fit[i, ],
        loyn.mcmc[i, "sigma"]))
    ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"),
        alpha = 0.5) + geom_density(data = loyn, aes(x = ABUND, fill = "Obs"),
        alpha = 0.5)
    
    plot of chunk tut7.3bQ2.7c4
    loyn.mcmc = as.data.frame(loyn.rstan) %>% dplyr:::select(beta0,
        starts_with("beta"), sigma) %>% as.matrix
    coefs = loyn.mcmc[, 1:7]
    # generate prediction matrix
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    # OR newdata = rbind(new_data(loyn, seq='cDIST', len=100),
    # new_data(loyn, seq='cLDIST', len=100), new_data(loyn,
    # seq='cAREA', len=100), new_data(loyn, seq='cGRAZE',
    # len=100), new_data(loyn, seq='cALT', len=100),
    # new_data(loyn, seq='cYR.ISOL', len=100))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT +
        cYR.ISOL, data = newdata)
    fit = coefs %*% t(Xmat)
    # add residual noise so these are prediction (rather than confidence) intervals
    fit = t(sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(newdata),
        fit[i, ], loyn.mcmc[i, "sigma"])))
    newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
        conf.method = "HPDinterval"))
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
        cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0)
    loyn.melt = loyn %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL)
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = loyn.melt,
        aes(y = ABUND)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") +
        scale_x_continuous("") + theme_classic() + facet_wrap(~Pred,
        scales = "free_x")
    
    plot of chunk tut7.3bQ2.7c5
    library(bayesplot)
    loyn.mcmc = as.matrix(loyn.rstan)
    mcmc_intervals(loyn.mcmc, regex_pars = "^beta")
    
    plot of chunk tut7.3bQ2.7c6
    mcmc_areas(loyn.mcmc, regex_pars = "^beta")
    
    plot of chunk tut7.3bQ2.7c6
    resid = resid(loyn.rstanarm)
    fit = fitted(loyn.rstanarm)
    ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
    
    plot of chunk tut7.3bQ2.7d1
    resid = resid(loyn.rstanarm)
    loyn.melt = loyn %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cDIST:cYR.ISOL)
    ggplot(loyn.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scales = "free_x")
    
    plot of chunk tut7.3bQ2.7d2
    resid = resid(loyn.rstanarm)
    sresid = resid/sd(resid)
    fit = fitted(loyn.rstanarm)
    ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
    
    plot of chunk tut7.3bQ2.7d3
    y_pred = posterior_predict(loyn.rstanarm)
    newdata = loyn %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value",
        -ABUND:-cYR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val", cDIST:cYR.ISOL)
    loyn.melt = loyn %>% gather(key = "Pred", value = "Pred_val", cDIST:cYR.ISOL)
    ggplot(newdata.melt, aes(y = Value, x = Pred_val)) + geom_violin(color = "blue",
        fill = "blue", alpha = 0.5) + geom_violin(data = loyn.melt, aes(y = ABUND,
        x = Pred_val), fill = "red", color = "red", alpha = 0.5) + facet_wrap(~Pred,
        scales = "free_x")
    
    plot of chunk tut7.3bQ2.7d4
    loyn.mcmc = as.data.frame(loyn.rstanarm) %>% dplyr:::select(matches("Inter"),
        starts_with("c"), sigma) %>% as.matrix
    coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL")]
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    ## get median parameter estimates
    fit = coefs %*% t(Xmat)
    ## draw samples from this model
    yRep = sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(loyn), fit[i, ],
        loyn.mcmc[i, "sigma"]))
    ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"),
        alpha = 0.5) + geom_density(data = loyn, aes(x = ABUND, fill = "Obs"),
        alpha = 0.5)
    
    plot of chunk tut7.3bQ2.7d4a
    # generate prediction matrix
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    # OR newdata = rbind(new_data(loyn, seq='cDIST', len=100),
    # new_data(loyn, seq='cLDIST', len=100), new_data(loyn,
    # seq='cAREA', len=100), new_data(loyn, seq='cGRAZE',
    # len=100), new_data(loyn, seq='cALT', len=100),
    # new_data(loyn, seq='cYR.ISOL', len=100))
    fit = posterior_predict(loyn.rstanarm, newdata = newdata)
    newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
        conf.method = "HPDinterval"))
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
        cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0)
    loyn.melt = loyn %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL)
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = loyn.melt,
        aes(y = ABUND)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") +
        scale_x_continuous("") + theme_classic() + facet_wrap(~Pred,
        scales = "free_x")
    
    plot of chunk tut7.3bQ2.7d5
    library(bayesplot)
    loyn.mcmc = as.matrix(loyn.rstanarm)
    mcmc_intervals(loyn.mcmc, regex_pars = "^c")
    
    plot of chunk tut7.3bQ2.7d6
    mcmc_areas(loyn.mcmc, regex_pars = "^c")
    
    plot of chunk tut7.3bQ2.7d6
    resid = resid(loyn.brm)[, "Estimate"]
    fit = fitted(loyn.brm)[, "Estimate"]
    ggplot() + geom_point(data = NULL, aes(y = resid, x = fit))
    
    plot of chunk tut7.3bQ2.7e1
    resid = resid(loyn.brm)[, "Estimate"]
    loyn.melt = loyn %>% mutate(resid = resid) %>% gather(key = Pred, value = Value, cDIST:cYR.ISOL)
    ggplot(loyn.melt) + geom_point(aes(y = resid, x = Value)) + facet_wrap(~Pred, scales = "free_x")
    
    plot of chunk tut7.3bQ2.7e2
    resid = resid(loyn.brm)[, "Estimate"]
    sresid = resid/sd(resid)
    fit = fitted(loyn.brm)[, "Estimate"]
    ggplot() + geom_point(data = NULL, aes(y = sresid, x = fit))
    
    plot of chunk tut7.3bQ2.7e3
    y_pred = posterior_predict(loyn.brm)
    newdata = loyn %>% cbind(t(y_pred)) %>% gather(key = "Rep", value = "Value",
        -ABUND:-cYR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Pred_val", cDIST:cYR.ISOL)
    loyn.melt = loyn %>% gather(key = "Pred", value = "Pred_val", cDIST:cYR.ISOL)
    ggplot(newdata.melt, aes(y = Value, x = Pred_val)) + geom_violin(color = "blue",
        fill = "blue", alpha = 0.5) + geom_violin(data = loyn.melt, aes(y = ABUND,
        x = Pred_val), fill = "red", color = "red", alpha = 0.5) + facet_wrap(~Pred,
        scales = "free_x")
    
    plot of chunk tut7.3bQ2.7e4
    loyn.mcmc = as.data.frame(loyn.brm) %>% dplyr:::select(starts_with("b_"),
        sigma) %>% as.matrix
    # brms prefixes the population-level coefficients with 'b_'
    coefs = loyn.mcmc[, c("b_Intercept", "b_cDIST", "b_cLDIST", "b_cAREA",
        "b_cGRAZE", "b_cALT", "b_cYR.ISOL")]
    
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    ## get median parameter estimates
    fit = coefs %*% t(Xmat)
    ## draw samples from this model
    yRep = sapply(1:nrow(loyn.mcmc), function(i) rnorm(nrow(loyn), fit[i, ],
        loyn.mcmc[i, "sigma"]))
    ggplot() + geom_density(data = NULL, aes(x = as.vector(yRep), fill = "Model"),
        alpha = 0.5) + geom_density(data = loyn, aes(x = ABUND, fill = "Obs"),
        alpha = 0.5)
    
    plot of chunk tut7.3bQ2.7e4a
    # generate prediction matrix
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    # OR newdata = rbind(new_data(loyn, seq='cDIST', len=100),
    # new_data(loyn, seq='cLDIST', len=100), new_data(loyn,
    # seq='cAREA', len=100), new_data(loyn, seq='cGRAZE',
    # len=100), new_data(loyn, seq='cALT', len=100),
    # new_data(loyn, seq='cYR.ISOL', len=100))
    fit = posterior_predict(loyn.brm, newdata = newdata)
    newdata = newdata %>% cbind(tidyMCMC(as.mcmc(fit), conf.int = TRUE,
        conf.method = "HPDinterval"))
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value",
        cDIST:cYR.ISOL) %>% filter(round(Value, 5) != 0)
    loyn.melt = loyn %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL)
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_point(data = loyn.melt,
        aes(y = ABUND)) + geom_line() + geom_ribbon(aes(ymin = conf.low,
        ymax = conf.high), fill = "blue", alpha = 0.3) + scale_y_continuous("Abundance") +
        scale_x_continuous("") + theme_classic() + facet_wrap(~Pred,
        scales = "free_x")
    
    plot of chunk tut7.3bQ2.7e5
    library(bayesplot)
    loyn.mcmc = as.matrix(loyn.brm)
    mcmc_intervals(loyn.mcmc, regex_pars = "^b_")
    
    plot of chunk tut7.3bQ2.7e6
    mcmc_areas(loyn.mcmc, regex_pars = "^b_")
    
    plot of chunk tut7.3bQ2.7e6
  10. Explore parameter estimates
    library(MCMCpack)
    summary(loyn.mcmcpack)
    
    Iterations = 1001:11000
    Thinning interval = 1 
    Number of chains = 1 
    Sample size per chain = 10000 
    
    1. Empirical mean and standard deviation for each variable,
       plus standard error of the mean:
    
                    Mean      SD  Naive SE Time-series SE
    (Intercept) 19.52399 0.86649 0.0086649      0.0086649
    cDIST       -0.91832 2.74520 0.0274520      0.0255319
    cLDIST      -0.61941 2.17855 0.0217855      0.0213725
    cAREA        7.45165 1.49424 0.0149424      0.0149424
    cGRAZE      -1.67804 0.95447 0.0095447      0.0095447
    cALT         0.01962 0.02438 0.0002438      0.0002405
    cYR.ISOL     0.07350 0.04690 0.0004690      0.0004690
    sigma2      42.58137 8.92244 0.0892244      0.1035069
    
    2. Quantiles for each variable:
    
                    2.5%       25%      50%      75%    97.5%
    (Intercept) 17.81405 18.959661 19.51913 20.09181 21.22761
    cDIST       -6.35916 -2.741141 -0.88385  0.89585  4.45512
    cLDIST      -4.88865 -2.063250 -0.59703  0.83050  3.63505
    cAREA        4.52004  6.457833  7.43912  8.43842 10.40461
    cGRAZE      -3.54347 -2.308868 -1.68057 -1.04469  0.18256
    cALT        -0.02837  0.003564  0.01963  0.03575  0.06727
    cYR.ISOL    -0.01960  0.042925  0.07330  0.10428  0.16627
    sigma2      28.49295 36.278224 41.47093 47.58073 63.37241
    
    library(broom)
    tidyMCMC(loyn.mcmcpack, conf.int = TRUE, conf.method = "HPDinterval")
    
             term    estimate  std.error    conf.low   conf.high
    1 (Intercept) 19.52399358 0.86649050 17.84303378 21.25142949
    2       cDIST -0.91831714 2.74520058 -6.46277160  4.33066565
    3      cLDIST -0.61941346 2.17855377 -4.86177798  3.64754513
    4       cAREA  7.45164638 1.49423574  4.48112045 10.35905122
    5      cGRAZE -1.67804492 0.95446814 -3.57512530  0.13921553
    6        cALT  0.01962116 0.02437745 -0.02696416  0.06833682
    7    cYR.ISOL  0.07349636 0.04690323 -0.02117776  0.16423954
    8      sigma2 42.58136989 8.92244191 26.73620812 60.24660624
    
    mcmcpvalue(loyn.mcmcpack[, "cDIST"])
    
    [1] 0.7335
    
    mcmcpvalue(loyn.mcmcpack[, "cLDIST"])
    
    [1] 0.7705
    
    mcmcpvalue(loyn.mcmcpack[, "cAREA"])
    
    [1] 0
    
    mcmcpvalue(loyn.mcmcpack[, "cGRAZE"])
    
    [1] 0.0764
    
    mcmcpvalue(loyn.mcmcpack[, "cALT"])
    
    [1] 0.4132
    
    mcmcpvalue(loyn.mcmcpack[, "cYR.ISOL"])
    
    [1] 0.1175
    
    print(loyn.r2jags)
    
    Inference for Bugs model at "5", fit using jags,
     3 chains, each with 50000 iterations (first 3000 discarded), n.thin = 10
     n.sims = 14100 iterations saved
             mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
    beta[1]   -0.925   2.769  -6.342  -2.786  -0.903   0.920   4.474 1.001 14000
    beta[2]   -0.645   2.192  -4.999  -2.106  -0.655   0.817   3.653 1.001 14000
    beta[3]    7.470   1.509   4.480   6.458   7.459   8.478  10.441 1.001 14000
    beta[4]   -1.666   0.967  -3.560  -2.317  -1.668  -1.026   0.239 1.001 14000
    beta[5]    0.020   0.025  -0.029   0.003   0.020   0.036   0.068 1.001 14000
    beta[6]    0.074   0.047  -0.020   0.043   0.074   0.106   0.166 1.001 14000
    beta0     19.517   0.887  17.802  18.920  19.519  20.100  21.273 1.001 14000
    sigma      6.553   0.680   5.382   6.076   6.498   6.973   8.038 1.001 14000
    deviance 367.930   4.422 361.502 364.691 367.249 370.363 378.574 1.001 11000
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 9.8 and DIC = 377.7
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    library(broom)
    tidyMCMC(as.mcmc(loyn.r2jags), conf.int = TRUE, conf.method = "HPDinterval")
    
          term     estimate  std.error     conf.low    conf.high
    1    beta0  19.51671537 0.88653929  17.80044107  21.27151905
    2  beta[1]  -0.92516084 2.76914318  -6.31725094   4.47780930
    3  beta[2]  -0.64512795 2.19237959  -4.87230718   3.74680583
    4  beta[3]   7.46983136 1.50921976   4.49154376  10.45213033
    5  beta[4]  -1.66558727 0.96744947  -3.51363528   0.28381275
    6  beta[5]   0.01970088 0.02497929  -0.02931751   0.06843318
    7  beta[6]   0.07398983 0.04714504  -0.01834409   0.16712090
    8 deviance 367.92987558 4.42245372 360.47334022 376.40993885
    9    sigma   6.55252461 0.68025713   5.29232571   7.90205315
    
    mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[1]"])
    
    [1] 0.7329787
    
    mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[2]"])
    
    [1] 0.7687234
    
    mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[3]"])
    
    [1] 0
    
    mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[4]"])
    
    [1] 0.08553191
    
    mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[5]"])
    
    [1] 0.4195035
    
    mcmcpvalue(loyn.r2jags$BUGSoutput$sims.matrix[, "beta[6]"])
    
    [1] 0.1158156
    
    print(loyn.rstan, pars = c("beta0", "beta", "sigma"))
    
    Inference for Stan model: d98dbf6a02725fc3fce11306b77873e9.
    3 chains, each with iter=5000; warmup=500; thin=2; 
    post-warmup draws per chain=2250, total post-warmup draws=6750.
    
             mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
    beta0   19.36    0.01 0.87 17.63 18.79 19.36 19.95 21.05  6389    1
    beta[1] -0.87    0.03 2.64 -5.97 -2.67 -0.84  0.89  4.26  6167    1
    beta[2] -0.54    0.03 2.10 -4.64 -1.95 -0.56  0.84  3.62  6014    1
    beta[3]  7.33    0.02 1.47  4.40  6.36  7.32  8.33 10.23  6293    1
    beta[4] -1.67    0.01 0.94 -3.48 -2.32 -1.70 -1.04  0.19  6149    1
    beta[5]  0.02    0.00 0.02 -0.03  0.00  0.02  0.04  0.07  6240    1
    beta[6]  0.07    0.00 0.05 -0.02  0.04  0.07  0.11  0.17  5908    1
    sigma    6.46    0.01 0.66  5.34  5.99  6.40  6.87  7.90  5582    1
    
    Samples were drawn using NUTS(diag_e) at Mon Aug 28 12:58:03 2017.
    For each parameter, n_eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor on split chains (at 
    convergence, Rhat=1).
    
    library(broom)
    tidyMCMC(loyn.rstan, conf.int = TRUE, conf.method = "HPDinterval", pars = c("beta0", "beta", "sigma"))
    
         term    estimate  std.error    conf.low   conf.high
    1   beta0 19.35781914 0.87419049 17.63200046 21.04667110
    2 beta[1] -0.87296466 2.64087863 -6.00306466  4.21472882
    3 beta[2] -0.53619405 2.10350367 -4.59917757  3.65478091
    4 beta[3]  7.33464753 1.46590498  4.38357023 10.20416316
    5 beta[4] -1.67449356 0.94137372 -3.47370216  0.19084411
    6 beta[5]  0.02026565 0.02432515 -0.02633551  0.07000986
    7 beta[6]  0.07487073 0.04626295 -0.01566025  0.16666197
    8   sigma  6.46001116 0.66131687  5.23776603  7.75854853
    
    mcmcpvalue(as.matrix(loyn.rstan)[, "beta[1]"])
    
    [1] 0.7416296
    
    mcmcpvalue(as.matrix(loyn.rstan)[, "beta[2]"])
    
    [1] 0.7917037
    
    mcmcpvalue(as.matrix(loyn.rstan)[, "beta[3]"])
    
    [1] 0
    
    mcmcpvalue(as.matrix(loyn.rstan)[, "beta[4]"])
    
    [1] 0.07155556
    
    mcmcpvalue(as.matrix(loyn.rstan)[, "beta[5]"])
    
    [1] 0.390963
    
    mcmcpvalue(as.matrix(loyn.rstan)[, "beta[6]"])
    
    [1] 0.1060741
    
    # let's explore the support for GRAZE via loo
    library(loo)
    (full = loo(extract_log_lik(loyn.rstan)))
    
    Computed from 6750 by 56 log-likelihood matrix
    
             Estimate   SE
    elpd_loo   -188.6  6.2
    p_loo         8.1  1.8
    looic       377.1 12.5
    
    Pareto k diagnostic values:
                             Count  Pct 
    (-Inf, 0.5]   (good)     54    96.4%
     (0.5, 0.7]   (ok)        2     3.6%
       (0.7, 1]   (bad)       0     0.0%
       (1, Inf)   (very bad)  0     0.0%
    
    All Pareto k estimates are ok (k < 0.7)
    See help('pareto-k-diagnostic') for details.
    
    X = model.matrix(~cDIST + cLDIST + cAREA + cALT + cYR.ISOL, data = loyn)
    loyn.list <- with(loyn, list(Y = ABUND, X = X, nX = ncol(X), n = nrow(loyn)))
    loyn.rstan.red <- stan(data = loyn.list, model_code = modelString, chains = 3, iter = 5000, warmup = 2500,
        thin = 3, save_dso = TRUE)
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 1).
    
    Gradient evaluation took 2.1e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.210041 seconds (Warm-up)
                   0.099843 seconds (Sampling)
                   0.309884 seconds (Total)
    
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 2).
    
    Gradient evaluation took 9e-06 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.09 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.212176 seconds (Warm-up)
                   0.101142 seconds (Sampling)
                   0.313318 seconds (Total)
    
    
    SAMPLING FOR MODEL 'd98dbf6a02725fc3fce11306b77873e9' NOW (CHAIN 3).
    
    Gradient evaluation took 1e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.201244 seconds (Warm-up)
                   0.098536 seconds (Sampling)
                   0.29978 seconds (Total)
    
    (reduced = loo(extract_log_lik(loyn.rstan.red)))
    
    Computed from 2502 by 56 log-likelihood matrix
    
             Estimate   SE
    elpd_loo   -189.5  6.3
    p_loo         7.5  1.7
    looic       378.9 12.5
    
    Pareto k diagnostic values:
                             Count  Pct 
    (-Inf, 0.5]   (good)     54    96.4%
     (0.5, 0.7]   (ok)        2     3.6%
       (0.7, 1]   (bad)       0     0.0%
       (1, Inf)   (very bad)  0     0.0%
    
    All Pareto k estimates are ok (k < 0.7)
    See help('pareto-k-diagnostic') for details.
    
    par(mfrow = c(1, 2), mar = c(5, 3.8, 1, 0) + 0.1, las = 3)
    plot(full, label_points = TRUE)
    plot(reduced, label_points = TRUE)
    
    plot of chunk tut7.3bQ2.8c1
    summary(loyn.rstanarm)
    
    Model Info:
    
     function:  stan_glm
     family:    gaussian [identity]
     formula:   ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL
     algorithm: sampling
     priors:    see help('prior_summary')
     sample:    6750 (posterior sample size)
     num obs:   56
    
    Estimates:
                    mean   sd     2.5%   25%    50%    75%    97.5%
    (Intercept)     19.5    0.9   17.8   18.9   19.5   20.1   21.3 
    cDIST           -0.9    2.7   -6.3   -2.8   -1.0    0.9    4.4 
    cLDIST          -0.6    2.2   -5.0   -2.0   -0.6    0.9    3.7 
    cAREA            7.5    1.5    4.6    6.5    7.5    8.5   10.4 
    cGRAZE          -1.7    1.0   -3.5   -2.3   -1.7   -1.0    0.3 
    cALT             0.0    0.0    0.0    0.0    0.0    0.0    0.1 
    cYR.ISOL         0.1    0.0    0.0    0.0    0.1    0.1    0.2 
    sigma            6.5    0.7    5.4    6.1    6.5    6.9    8.0 
    mean_PPD        19.5    1.2   17.1   18.6   19.5   20.3   21.9 
    log-posterior -198.3    2.1 -203.4 -199.5 -198.0 -196.7 -195.1 
    
    Diagnostics:
                  mcse Rhat n_eff
    (Intercept)   0.0  1.0  6455 
    cDIST         0.0  1.0  6241 
    cLDIST        0.0  1.0  6750 
    cAREA         0.0  1.0  6364 
    cGRAZE        0.0  1.0  5911 
    cALT          0.0  1.0  6505 
    cYR.ISOL      0.0  1.0  6436 
    sigma         0.0  1.0  6137 
    mean_PPD      0.0  1.0  6412 
    log-posterior 0.0  1.0  4445 
    
    For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
    
    library(broom)
    tidyMCMC(loyn.rstanarm$stanfit, conf.int = TRUE, conf.method = "HPDinterval", ess = TRUE, rhat = TRUE)
    
                term      estimate  std.error      conf.low     conf.high      rhat  ess
    1    (Intercept)   19.51470858 0.87397606   17.80774674   21.28378216 0.9999715 6455
    2          cDIST   -0.93894131 2.74304984   -6.21251435    4.50342466 1.0001898 6241
    3         cLDIST   -0.61542432 2.18690279   -5.04889534    3.58168450 0.9999507 6750
    4          cAREA    7.47880465 1.48748763    4.51755530   10.30705743 1.0003404 6364
    5         cGRAZE   -1.66200918 0.95323779   -3.54291988    0.15651234 0.9998401 5911
    6           cALT    0.01927526 0.02446506   -0.02782742    0.06755766 1.0002757 6505
    7       cYR.ISOL    0.07368280 0.04607753   -0.01021086    0.16906692 1.0001234 6436
    8          sigma    6.53502840 0.68217042    5.26894439    7.88647953 0.9999148 6137
    9       mean_PPD   19.48796549 1.24135073   17.19598615   21.98422036 0.9999042 6412
    10 log-posterior -198.30715262 2.14292966 -202.55736971 -194.83677821 0.9999104 4445
    
    mcmcpvalue(as.matrix(loyn.rstanarm)[, "cDIST"])
    
    [1] 0.7238519
    
    mcmcpvalue(as.matrix(loyn.rstanarm)[, "cLDIST"])
    
    [1] 0.7751111
    
    mcmcpvalue(as.matrix(loyn.rstanarm)[, "cAREA"])
    
    [1] 0
    
    mcmcpvalue(as.matrix(loyn.rstanarm)[, "cGRAZE"])
    
    [1] 0.08014815
    
    mcmcpvalue(as.matrix(loyn.rstanarm)[, "cALT"])
    
    [1] 0.4260741
    
    mcmcpvalue(as.matrix(loyn.rstanarm)[, "cYR.ISOL"])
    
    [1] 0.1056296
    
    # let's explore the support for GRAZE via loo
    library(loo)
    (full = loo(loyn.rstanarm))
    
    Computed from 6750 by 56 log-likelihood matrix
    
             Estimate   SE
    elpd_loo   -188.6  6.1
    p_loo         8.0  1.8
    looic       377.3 12.2
    
    Pareto k diagnostic values:
                             Count  Pct 
    (-Inf, 0.5]   (good)     55    98.2%
     (0.5, 0.7]   (ok)        1     1.8%
       (0.7, 1]   (bad)       0     0.0%
       (1, Inf)   (very bad)  0     0.0%
    
    All Pareto k estimates are ok (k < 0.7)
    See help('pareto-k-diagnostic') for details.
    
    loyn.rstanarm.red <- update(loyn.rstanarm, . ~ . - cGRAZE)
    
    Gradient evaluation took 2.6e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.26 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.102453 seconds (Warm-up)
                   0.362637 seconds (Sampling)
                   0.46509 seconds (Total)
    
    
    Gradient evaluation took 2.8e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.28 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.079963 seconds (Warm-up)
                   0.342621 seconds (Sampling)
                   0.422584 seconds (Total)
    
    
    Gradient evaluation took 1.4e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.076559 seconds (Warm-up)
                   0.326624 seconds (Sampling)
                   0.403183 seconds (Total)
    
    (reduced = loo(loyn.rstanarm.red))
    
    Computed from 6750 by 56 log-likelihood matrix
    
             Estimate   SE
    elpd_loo   -189.6  6.2
    p_loo         7.6  1.7
    looic       379.1 12.4
    
    Pareto k diagnostic values:
                             Count  Pct 
    (-Inf, 0.5]   (good)     55    98.2%
     (0.5, 0.7]   (ok)        1     1.8%
       (0.7, 1]   (bad)       0     0.0%
       (1, Inf)   (very bad)  0     0.0%
    
    All Pareto k estimates are ok (k < 0.7)
    See help('pareto-k-diagnostic') for details.
    
    par(mfrow = c(1, 2), mar = c(5, 3.8, 1, 0) + 0.1, las = 3)
    plot(full, label_points = TRUE)
    plot(reduced, label_points = TRUE)
    
    plot of chunk tut7.3bQ2.8d1
    summary(loyn.brm)
    
     Family: gaussian(identity) 
    Formula: ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL 
       Data: loyn (Number of observations: 56) 
    Samples: 3 chains, each with iter = 5000; warmup = 500; thin = 2; 
             total post-warmup samples = 6750
        ICs: LOO = NA; WAIC = NA; R2 = NA
     
    Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept    19.38      0.87    17.61    21.09       6062    1
    cDIST        -0.84      2.59    -5.84     4.26       6718    1
    cLDIST       -0.56      2.08    -4.72     3.47       6305    1
    cAREA         7.30      1.47     4.46    10.16       6451    1
    cGRAZE       -1.70      0.94    -3.55     0.14       5912    1
    cALT          0.02      0.02    -0.03     0.07       6540    1
    cYR.ISOL      0.07      0.05    -0.02     0.16       6378    1
    
    Family Specific Parameters: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sigma     6.47      0.66     5.34     7.92       5867    1
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    library(broom)
    tidyMCMC(loyn.brm$fit, conf.int = TRUE, conf.method = "HPDinterval", ess = TRUE, rhat = TRUE)
    
             term    estimate  std.error    conf.low   conf.high      rhat  ess
    1 b_Intercept 19.37880736 0.87350422 17.59486668 21.04523042 1.0002504 6062
    2     b_cDIST -0.84118727 2.59215760 -5.93414106  4.14410963 0.9997645 6718
    3    b_cLDIST -0.55827202 2.08479070 -4.75640558  3.42477995 1.0006374 6305
    4     b_cAREA  7.30182513 1.46762773  4.44967625 10.14571910 1.0001524 6451
    5    b_cGRAZE -1.69718041 0.93533654 -3.56234009  0.11421872 0.9997029 5912
    6      b_cALT  0.02050336 0.02460275 -0.02782871  0.06897146 0.9996017 6540
    7  b_cYR.ISOL  0.07385555 0.04541507 -0.01317674  0.16516260 1.0001454 6378
    8       sigma  6.47151516 0.65910492  5.29794394  7.82266145 0.9998090 5867
    
    mcmcpvalue(as.matrix(loyn.brm)[, "b_cDIST"])
    
    [1] 0.7465185
    
    mcmcpvalue(as.matrix(loyn.brm)[, "b_cLDIST"])
    
    [1] 0.7779259
    
    mcmcpvalue(as.matrix(loyn.brm)[, "b_cAREA"])
    
    [1] 0
    
    mcmcpvalue(as.matrix(loyn.brm)[, "b_cGRAZE"])
    
    [1] 0.07037037
    
    mcmcpvalue(as.matrix(loyn.brm)[, "b_cALT"])
    
    [1] 0.3955556
    
    mcmcpvalue(as.matrix(loyn.brm)[, "b_cYR.ISOL"])
    
    [1] 0.1042963
    
    # let's explore the support for GRAZE via loo
    library(loo)
    (full = loo(loyn.brm))
    
      LOOIC    SE
     376.84 12.38
    
    loyn.brm.red <- update(loyn.brm, . ~ . - cGRAZE)
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 1).
    
    Gradient evaluation took 1.8e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.194386 seconds (Warm-up)
                   0.098192 seconds (Sampling)
                   0.292578 seconds (Total)
    
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 2).
    
    Gradient evaluation took 8e-06 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.17107 seconds (Warm-up)
                   0.097613 seconds (Sampling)
                   0.268683 seconds (Total)
    
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 3).
    
    Gradient evaluation took 1.9e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 0.176611 seconds (Warm-up)
                   0.099638 seconds (Sampling)
                   0.276249 seconds (Total)
    
    (reduced = loo(loyn.brm.red))
    
      LOOIC    SE
     379.13 12.66
    
    par(mfrow = c(1, 2), mar = c(5, 3.8, 1, 0) + 0.1, las = 3)
    plot(full, label_points = TRUE)
    plot(reduced, label_points = TRUE)
    
    plot of chunk tut7.3bQ2.8e1

    Across all of the modelling frameworks, the full model yields only a marginally lower LOO estimate than the reduced model (without GRAZE), so there is not much (if any) support for GRAZE. We will explore this more thoroughly when we look at sparsity.
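
    As a minimal sketch, the two models can also be contrasted directly from the full and reduced loo objects computed above; compare() is used here, although newer versions of the loo package provide loo_compare() with the same information.

    library(loo)
    # difference in elpd_loo (and its standard error) between the two models
    compare(full, reduced)
    # a difference that is small relative to its standard error indicates
    # little support for retaining cGRAZE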

  11. Generate graphical summaries
    library(MCMCpack)
    loyn.mcmc = loyn.mcmcpack
    ## Calculate the fitted values
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST +
        mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE,
        ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit,
        conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>%
        mutate_at(Vars[!Vars %in% x], mean)))
    
    fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat)))
    rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(DIST = exp(cDIST +
        mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA),
        GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL +
            mean.YR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>%
        filter(round(Value, 5) != 0)
    rdata.melt = rdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>%
        filter(round(Value, 5) != 0)
    
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") +
        geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
            alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") +
        facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() +
        theme(strip.background = element_blank(), strip.placement = "outside")
    
    plot of chunk tut7.3bQ2.9a1
    library(MCMCpack)
    loyn.mcmc = loyn.mcmcpack
    ## Calculate the fitted values
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>%
        mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST),
            AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT +
                mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit,
        conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>%
        mutate_at(Vars[!Vars %in% x], mean)))
    fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat)))
    rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate_all(funs(ifelse(round(.,
        12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST +
        mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE,
        ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value", DIST,
        LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0)
    rdata.melt = rdata %>% gather(key = "Pred", value = "Value", DIST, LDIST,
        AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0)
    
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") +
        geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
            alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") +
        facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() +
        theme(strip.background = element_blank(), strip.placement = "outside")
    
    plot of chunk tut7.3bQ2.9a2
    loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix
    ## Calculate the fitted values
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]",
        "beta[5]", "beta[6]")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST +
        mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE,
        ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit,
        conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>%
        mutate_at(Vars[!Vars %in% x], mean)))
    
    fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat)))
    rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(DIST = exp(cDIST +
        mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA),
        GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL +
            mean.YR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>%
        filter(round(Value, 5) != 0)
    rdata.melt = rdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>%
        filter(round(Value, 5) != 0)
    
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") +
        geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
            alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") +
        facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() +
        theme(strip.background = element_blank(), strip.placement = "outside")
    
    plot of chunk tut7.3bQ2.9b1
    loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix
    ## Calculate the fitted values
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]",
        "beta[5]", "beta[6]")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>%
        mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST),
            AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT +
                mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit,
        conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>%
        mutate_at(Vars[!Vars %in% x], mean)))
    fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat)))
    rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate_all(funs(ifelse(round(.,
        12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST +
        mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE,
        ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value", DIST,
        LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0)
    rdata.melt = rdata %>% gather(key = "Pred", value = "Value", DIST, LDIST,
        AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0)
    
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") +
        geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
            alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") +
        facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() +
        theme(strip.background = element_blank(), strip.placement = "outside")
    
    plot of chunk tut7.3bQ2.9b2
    loyn.mcmc = as.matrix(loyn.rstan)
    ## Calculate the fitted values
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]",
        "beta[5]", "beta[6]")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST +
        mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE,
        ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit,
        conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>%
        mutate_at(Vars[!Vars %in% x], mean)))
    
    fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat)))
    rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(DIST = exp(cDIST +
        mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA),
        GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL +
            mean.YR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>%
        filter(round(Value, 5) != 0)
    rdata.melt = rdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>%
        filter(round(Value, 5) != 0)
    
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") +
        geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
            alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") +
        facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() +
        theme(strip.background = element_blank(), strip.placement = "outside")
    
    plot of chunk tut7.3bQ2.9c1
    loyn.mcmc = as.matrix(loyn.rstan)
    ## Calculate the fitted values
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]",
        "beta[5]", "beta[6]")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>%
        mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST),
            AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT +
                mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit,
        conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>%
        mutate_at(Vars[!Vars %in% x], mean)))
    fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat)))
    rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate_all(funs(ifelse(round(.,
        12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST +
        mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE,
        ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value", DIST,
        LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0)
    rdata.melt = rdata %>% gather(key = "Pred", value = "Value", DIST, LDIST,
        AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0)
    
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") +
        geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
            alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") +
        facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() +
        theme(strip.background = element_blank(), strip.placement = "outside")
    
    plot of chunk tut7.3bQ2.9c2
    loyn.mcmc = as.matrix(loyn.rstanarm)
    ## Calculate the fitted values
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST +
        mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE,
        ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit,
        conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>%
        mutate_at(Vars[!Vars %in% x], mean)))
    
    fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat)))
    rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(DIST = exp(cDIST +
        mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA),
        GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL +
            mean.YR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>%
        filter(round(Value, 5) != 0)
    rdata.melt = rdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>%
        filter(round(Value, 5) != 0)
    
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") +
        geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
            alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") +
        facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() +
        theme(strip.background = element_blank(), strip.placement = "outside")
    
    plot of chunk tut7.3bQ2.9d1
    loyn.mcmc = as.matrix(loyn.rstanarm)
    ## Calculate the fitted values
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>%
        mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST),
            AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT +
                mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit,
        conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>%
        mutate_at(Vars[!Vars %in% x], mean)))
    fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat)))
    rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate_all(funs(ifelse(round(.,
        12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST +
        mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE,
        ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value", DIST,
        LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0)
    rdata.melt = rdata %>% gather(key = "Pred", value = "Value", DIST, LDIST,
        AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0)
    
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") +
        geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
            alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") +
        facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() +
        theme(strip.background = element_blank(), strip.placement = "outside")
    
    plot of chunk tut7.3bQ2.9d2
    plot(marginal_effects(loyn.brm), points = TRUE)
    
    plot of chunk tut7.3bQ2.9e1 (one marginal-effects panel for each of the six predictors)
    loyn.mcmc = as.matrix(loyn.brm)
    ## Calculate the fitted values
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    coefs = loyn.mcmc[, c("b_Intercept)", "b_cDIST", "b_cLDIST", "b_cAREA",
        "b_cGRAZE", "b_cALT", "b_cYR.ISOL")]
    
    Error in loyn.mcmc[, c("b_Intercept)", "b_cDIST", "b_cLDIST", "b_cAREA", : subscript out of bounds
    
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate(DIST = exp(cDIST + mean.DIST), LDIST = exp(cLDIST +
        mean.LDIST), AREA = exp(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE,
        ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit,
        conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>%
        mutate_at(Vars[!Vars %in% x], mean)))
    
    fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat)))
    rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate(DIST = exp(cDIST +
        mean.DIST), LDIST = exp(cLDIST + mean.LDIST), AREA = exp(cAREA + mean.AREA),
        GRAZE = cGRAZE + mean.GRAZE, ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL +
            mean.YR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>%
        filter(round(Value, 5) != 0)
    rdata.melt = rdata %>% gather(key = "Pred", value = "Value", cDIST:cYR.ISOL) %>%
        filter(round(Value, 5) != 0)
    
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") +
        geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
            alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") +
        facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() +
        theme(strip.background = element_blank(), strip.placement = "outside")
    
    plot of chunk tut7.3bQ2.9e1
    loyn.mcmc = as.matrix(loyn.brm)
    ## Calculate the fitted values
    Vars = c("cDIST", "cLDIST", "cAREA", "cGRAZE", "cALT", "cYR.ISOL")
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    library(newdata)
    newdata = do.call(rbind, lapply(Vars, function(x) new_data(loyn.list[[x]],
        seq = x, len = 100)))
    
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    coefs = loyn.mcmc[, c("b_Intercept", "b_cDIST", "b_cLDIST", "b_cAREA",
        "b_cGRAZE", "b_cALT", "b_cYR.ISOL")]
    fit = coefs %*% t(Xmat)
    newdata = newdata %>% mutate_all(funs(ifelse(round(., 12) == 0, NA, .))) %>%
        mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST + mean.LDIST),
            AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE, ALT = cALT +
                mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL) %>% cbind(tidyMCMC(fit,
        conf.int = TRUE, conf.method = "HPDinterval"))
    
    ## Partial residuals
    loyn.list = rep(list(loyn), length(Vars))
    names(loyn.list) <- Vars
    rdata = fdata = do.call(rbind, lapply(Vars, function(x) loyn.list[[x]] %>%
        mutate_at(Vars[!Vars %in% x], mean)))
    fMat = rMat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        fdata)
    fit = as.vector(apply(coefs, 2, median) %*% t(fMat))
    resid = as.vector(loyn$ABUND - (apply(coefs, 2, median) %*% t(rMat)))
    rdata = rdata %>% mutate(partial.resid = resid + fit) %>% mutate_all(funs(ifelse(round(.,
        12) == 0, NA, .))) %>% mutate(DIST = 10^(cDIST + mean.DIST), LDIST = 10^(cLDIST +
        mean.LDIST), AREA = 10^(cAREA + mean.AREA), GRAZE = cGRAZE + mean.GRAZE,
        ALT = cALT + mean.ALT, YR.ISOL = cYR.ISOL + mean.YR.ISOL)
    newdata.melt = newdata %>% gather(key = "Pred", value = "Value", DIST,
        LDIST, AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0)
    rdata.melt = rdata %>% gather(key = "Pred", value = "Value", DIST, LDIST,
        AREA, GRAZE, ALT, YR.ISOL) %>% filter(round(Value, 5) != 0)
    
    ggplot(newdata.melt, aes(y = estimate, x = Value)) + geom_line() + # geom_blank(aes(y=9)) +
    geom_point(data = rdata.melt, aes(y = partial.resid), color = "grey") +
        geom_ribbon(aes(ymin = conf.low, ymax = conf.high), fill = "blue",
            alpha = 0.3) + scale_y_continuous("Abundance") + scale_x_continuous("") +
        facet_wrap(~Pred, scales = "free_x", strip.position = "bottom") + theme_classic() +
        theme(strip.background = element_blank(), strip.placement = "outside")
    
    plot of chunk tut7.3bQ2.9e2
  12. Explore effect sizes - the change in Abundance associated with increasing each predictor from its 20th to its 80th percentile, holding the other predictors constant.
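    On the (possibly $\log_{10}$-transformed and centred) scale on which each predictor enters the model, this effect size is the difference between the predicted abundances at the two percentiles. With the remaining centred predictors held at zero (their means), it reduces to the partial slope multiplied by the inter-percentile distance:
    $$ES_j = E(y \mid x_j = q_{0.8}) - E(y \mid x_j = q_{0.2}) = \beta_j\,\{q_{0.8}(x_j) - q_{0.2}(x_j)\}$$
    Because $\beta_j$ is available as MCMC samples, $ES_j$ has a full posterior distribution, and the code below also summarises it as a standardised effect (Cohen's D, dividing by $\sigma$), a percentage change and a fractional change relative to the prediction at the 20th percentile.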
    library(MCMCpack)
    loyn.mcmc = loyn.mcmcpack
    
    newdata = with(loyn, rbind(data.frame(cDIST = log10(quantile(DIST, p = c(0.2,
        0.8))) - log10(mean.DIST), cLDIST = 0, cAREA = 0, cGRAZE = 0, cALT = 0,
        cYR.ISOL = 0), data.frame(cDIST = 0, cLDIST = log10(quantile(LDIST,
        p = c(0.2, 0.8))) - log10(mean.LDIST), cAREA = 0, cGRAZE = 0, cALT = 0,
        cYR.ISOL = 0), data.frame(cDIST = 0, cLDIST = 0, cAREA = log10(quantile(AREA,
        p = c(0.2, 0.8))) - log10(mean.AREA), cGRAZE = 0, cALT = 0, cYR.ISOL = 0),
        data.frame(cDIST = 0, cLDIST = 0, cAREA = 0, cGRAZE = quantile(GRAZE,
            p = c(0.2, 0.8)) - mean.GRAZE, cALT = 0, cYR.ISOL = 0), data.frame(cDIST = 0,
            cLDIST = 0, cAREA = 0, cGRAZE = 0, cALT = quantile(ALT, p = c(0.2,
                0.8)) - mean.ALT, cYR.ISOL = 0), data.frame(cDIST = 0, cLDIST = 0,
            cAREA = 0, cGRAZE = 0, cALT = 0, cYR.ISOL = quantile(YR.ISOL, p = c(0.2,
                0.8)) - mean.YR.ISOL)))
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL")]
    fit = (coefs %*% t(Xmat))
    s1 = seq(1, 12, by = 2)
    s2 = seq(2, 12, by = 2)
    ## Raw effect size
    RES = fit[, s2] - fit[, s1]
    colnames(RES) = c("DIST", "LDIST", "AREA", "GRAZE", "ALT", "YR.ISOL")
    mcmc_intervals(as.mcmc(RES))
    
    plot of chunk tut7.3bQ2.10a1
    (RES = tidyMCMC(as.mcmc(RES), conf.int = TRUE, conf.method = "HPDinterval"))
    
         term   estimate std.error    conf.low  conf.high
    1    DIST -0.6653429  1.988964  -4.6824338  3.1376717
    2   LDIST -0.6136897  2.158422  -4.8168520  3.6138394
    3    AREA  9.6948155  1.944046   5.8300721 13.4774364
    4   GRAZE -6.7121797  3.817873 -14.3005012  0.5568621
    5     ALT  1.5696924  1.950196  -2.1571331  5.4669455
    6 YR.ISOL  3.4543288  2.204452  -0.9953545  7.7192585
    
    ## Cohen's D
    cohenD = (fit[, s2] - fit[, s1])/sqrt(loyn.mcmc[, "sigma2"])
    colnames(cohenD) = c("DIST", "LDIST", "AREA", "GRAZE", "ALT", "YR.ISOL")
    (cohenDES = tidyMCMC(as.mcmc(cohenD), conf.int = TRUE, conf.method = "HPDinterval"))
    
         term    estimate std.error   conf.low  conf.high
    1    DIST -0.10369055 0.3047738 -0.7095491 0.47470403
    2   LDIST -0.09515195 0.3312163 -0.7532847 0.53591205
    3    AREA  1.50909802 0.3342992  0.8440978 2.14860590
    4   GRAZE -1.04398623 0.5932122 -2.2274001 0.08168777
    5     ALT  0.24466323 0.3007082 -0.3211053 0.85235582
    6 YR.ISOL  0.53775976 0.3411211 -0.1620817 1.18748656
    
    # Percentage change
    ESp = 100 * (fit[, s2] - fit[, s1])/fit[, s1]
    colnames(ESp) = c("DIST", "LDIST", "AREA", "GRAZE", "ALT", "YR.ISOL")
    mcmc_intervals(as.mcmc(ESp))
    
    plot of chunk tut7.3bQ2.10a1
    (PES = tidyMCMC(as.mcmc(ESp), conf.int = TRUE, conf.method = "HPDinterval"))
    
         term   estimate std.error   conf.low conf.high
    1    DIST  -6.960431 17.227980 -36.350869 15.480668
    2   LDIST  -6.347939 16.321089 -35.412372 17.405606
    3    AREA  43.964662  8.040105  28.102975 59.620387
    4   GRAZE -28.207372 14.782815 -56.032146  0.454167
    5     ALT   8.877497 10.924913 -11.728873 30.844127
    6 YR.ISOL  21.144322 14.796056  -6.154603 51.715683
    
    # Probability that the effect is greater than various percentages
    (p0 = apply(ESp, 2, function(x, f = 0) ifelse(mean(x) > 0, sum(x > f)/length(x),
        sum(-1 * x > f)/length(x))))
    
       DIST   LDIST    AREA   GRAZE     ALT YR.ISOL 
     0.6321  0.6121  1.0000  0.9623  0.7931  0.9414 
    
    (p5 = apply(ESp, 2, function(x, f = 5) ifelse(mean(x) > 0, sum(x > f)/length(x),
        sum(-1 * x > f)/length(x))))
    
       DIST   LDIST    AREA   GRAZE     ALT YR.ISOL 
     0.4529  0.4400  1.0000  0.9325  0.6244  0.8760 
    
    (p10 = apply(ESp, 2, function(x, f = 10) ifelse(mean(x) > 0, sum(x > f)/length(x),
        sum(-1 * x > f)/length(x))))
    
       DIST   LDIST    AREA   GRAZE     ALT YR.ISOL 
     0.3096  0.3089  1.0000  0.8875  0.4376  0.7773 
    
    (p20 = apply(ESp, 2, function(x, f = 20) ifelse(mean(x) > 0, sum(x > f)/length(x),
        sum(-1 * x > f)/length(x))))
    
       DIST   LDIST    AREA   GRAZE     ALT YR.ISOL 
     0.1451  0.1449  0.9972  0.7366  0.1511  0.4961 
    
    (p50 = apply(ESp, 2, function(x, f = 50) ifelse(mean(x) > 0, sum(x > f)/length(x),
        sum(-1 * x > f)/length(x))))
    
       DIST   LDIST    AREA   GRAZE     ALT YR.ISOL 
     0.0187  0.0184  0.2223  0.0516  0.0008  0.0383 
    
    ## fractional change
    FES = fit[, s2]/fit[, s1]
    colnames(FES) = c("DIST", "LDIST", "AREA", "GRAZE", "ALT", "YR.ISOL")
    (FES = tidyMCMC(as.mcmc(FES), conf.int = TRUE, conf.method = "HPDinterval"))
    
         term  estimate  std.error  conf.low conf.high
    1    DIST 0.9303957 0.17227980 0.6364913  1.154807
    2   LDIST 0.9365206 0.16321089 0.6458763  1.174056
    3    AREA 1.4396466 0.08040105 1.2810298  1.596204
    4   GRAZE 0.7179263 0.14782815 0.4396785  1.004542
    5     ALT 1.0887750 0.10924913 0.8827113  1.308441
    6 YR.ISOL 1.2114432 0.14796056 0.9384540  1.517157
    
  13. Explore finite-population standard deviations
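    For each MCMC sample, the finite-population standard deviation attributable to a predictor is obtained by scaling the absolute value of its partial slope by the standard deviation of that predictor, and this is compared with the standard deviation of the residuals:
    $$s_{x_j} = |\beta_j| \times \mathrm{sd}(x_j), \qquad s_{resid} = \mathrm{sd}(y_i - \hat{y}_i)$$
    Expressing each component as a percentage of their sum provides a Bayesian analogue of an ANOVA table, indicating the relative contribution of each predictor (and the residuals) to the total variability.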
    library(MCMCpack)
    library(broom)
    loyn.mcmc = loyn.mcmcpack
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        data = loyn)
    sd.DIST = abs(loyn.mcmc[, "cDIST"]) * sd(Xmat[, "cDIST"])
    sd.LDIST = abs(loyn.mcmc[, "cLDIST"]) * sd(Xmat[, "cLDIST"])
    sd.AREA = abs(loyn.mcmc[, "cAREA"]) * sd(Xmat[, "cAREA"])
    sd.GRAZE = abs(loyn.mcmc[, "cGRAZE"]) * sd(Xmat[, "cGRAZE"])
    sd.ALT = abs(loyn.mcmc[, "cALT"]) * sd(Xmat[, "cALT"])
    sd.YR.ISOL = abs(loyn.mcmc[, "cYR.ISOL"]) * sd(Xmat[, "cYR.ISOL"])
    sd.x = sd.DIST + sd.LDIST + sd.AREA + sd.GRAZE + sd.ALT + sd.YR.ISOL
    
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    ## get median parameter estimates
    coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, loyn$ABUND, "-")
    sd.resid = apply(resid, 1, sd)
    
    sd.all = cbind(sd.DIST, sd.LDIST, sd.AREA, sd.GRAZE, sd.ALT, sd.YR.ISOL,
        sd.resid)
    mcmc_intervals(sd.all)
    
    plot of chunk tut7.3bQ2.11a1
    (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1    sd.DIST 0.9516963 0.7273438 1.758422e-04  2.348979
    2   sd.LDIST 1.0318149 0.7940909 3.821824e-04  2.560397
    3    sd.AREA 6.0538065 1.2139350 3.640516e+00  8.415817
    4   sd.GRAZE 2.5133971 1.3210367 1.124092e-03  4.760110
    5     sd.ALT 1.1069382 0.7937846 1.267090e-06  2.609109
    6 sd.YR.ISOL 1.9459572 1.0898622 2.102330e-04  3.873274
    7   sd.resid 6.3977840 0.2256042 6.058299e+00  6.840074
    
    # OR expressed as a percentage
    mcmc_intervals(100 * sd.all/rowSums(sd.all))
    
    plot of chunk tut7.3bQ2.11a1
    (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
        conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1    sd.DIST  4.040253  3.296537  0.003643070  10.88095
    2   sd.LDIST  4.389845  3.575600  0.001932600  11.79132
    3    sd.AREA 30.594181  5.619669 18.862557405  40.73962
    4   sd.GRAZE 12.496277  6.574856  0.005641009  23.69286
    5     sd.ALT  4.829754  3.866615  0.001394094  12.91534
    6 sd.YR.ISOL  9.517571  5.286090  0.001119749  18.77406
    7   sd.resid 32.019220  2.571167 27.477591094  37.42906
    
    ## we can even plot this as a Bayesian ANOVA table
    ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
        ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
        vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
        scale_x_discrete() + coord_flip() + theme_classic()
    
    plot of chunk tut7.3bQ2.11a1
    loyn.mcmc = loyn.r2jags$BUGSoutput$sims.matrix
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        data = loyn)
    sd.DIST = abs(loyn.mcmc[, "beta[1]"]) * sd(Xmat[, "cDIST"])
    sd.LDIST = abs(loyn.mcmc[, "beta[2]"]) * sd(Xmat[, "cLDIST"])
    sd.AREA = abs(loyn.mcmc[, "beta[3]"]) * sd(Xmat[, "cAREA"])
    sd.GRAZE = abs(loyn.mcmc[, "beta[4]"]) * sd(Xmat[, "cGRAZE"])
    sd.ALT = abs(loyn.mcmc[, "beta[5]"]) * sd(Xmat[, "cALT"])
    sd.YR.ISOL = abs(loyn.mcmc[, "beta[6]"]) * sd(Xmat[, "cYR.ISOL"])
    sd.x = sd.DIST + sd.LDIST + sd.AREA + sd.GRAZE + sd.ALT + sd.YR.ISOL
    
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    ## get median parameter estimates
    coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]",
        "beta[5]", "beta[6]")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, loyn$ABUND, "-")
    sd.resid = apply(resid, 1, sd)
    
    sd.all = cbind(sd.DIST, sd.LDIST, sd.AREA, sd.GRAZE, sd.ALT, sd.YR.ISOL,
        sd.resid)
    mcmc_intervals(sd.all)
    
    plot of chunk tut7.3bQ2.11b1
    (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1    sd.DIST 0.9596003 0.7339694 6.403065e-07  2.370494
    2   sd.LDIST 1.0462352 0.7945808 5.629702e-06  2.568776
    3    sd.AREA 6.0685802 1.2261081 3.648984e+00  8.491435
    4   sd.GRAZE 2.5039995 1.3252001 6.243016e-03  4.765300
    5     sd.ALT 1.1249923 0.8074863 2.584672e-05  2.656941
    6 sd.YR.ISOL 1.9570094 1.0987181 2.994709e-03  3.864309
    7   sd.resid 6.4058753 0.2282916 6.057727e+00  6.852572
    
    # OR expressed as a percentage
    mcmc_intervals(100 * sd.all/rowSums(sd.all))
    
    plot of chunk tut7.3bQ2.11b1
    (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
        conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1    sd.DIST  4.051554  3.319205 3.635510e-06  10.92768
    2   sd.LDIST  4.451107  3.578625 3.084334e-05  11.83866
    3    sd.AREA 30.556324  5.647921 1.910566e+01  41.03840
    4   sd.GRAZE 12.324617  6.590258 2.584667e-02  23.67757
    5     sd.ALT  4.916221  3.909358 8.387666e-04  12.93600
    6 sd.YR.ISOL  9.535497  5.318853 2.178405e-02  18.74905
    7   sd.resid 31.931299  2.560712 2.734591e+01  37.31862
    
    ## we can even plot this as a Bayesian ANOVA table
    ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
        ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
        vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
        scale_x_discrete() + coord_flip() + theme_classic()
    
    plot of chunk tut7.3bQ2.11b1
    loyn.mcmc = as.matrix(loyn.rstan)
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        data = loyn)
    sd.DIST = abs(loyn.mcmc[, "beta[1]"]) * sd(Xmat[, "cDIST"])
    sd.LDIST = abs(loyn.mcmc[, "beta[2]"]) * sd(Xmat[, "cLDIST"])
    sd.AREA = abs(loyn.mcmc[, "beta[3]"]) * sd(Xmat[, "cAREA"])
    sd.GRAZE = abs(loyn.mcmc[, "beta[4]"]) * sd(Xmat[, "cGRAZE"])
    sd.ALT = abs(loyn.mcmc[, "beta[5]"]) * sd(Xmat[, "cALT"])
    sd.YR.ISOL = abs(loyn.mcmc[, "beta[6]"]) * sd(Xmat[, "cYR.ISOL"])
    sd.x = sd.DIST + sd.LDIST + sd.AREA + sd.GRAZE + sd.ALT + sd.YR.ISOL
    
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    ## get median parameter estimates
    coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]",
        "beta[5]", "beta[6]")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, loyn$ABUND, "-")
    sd.resid = apply(resid, 1, sd)
    
    sd.all = cbind(sd.DIST, sd.LDIST, sd.AREA, sd.GRAZE, sd.ALT, sd.YR.ISOL,
        sd.resid)
    mcmc_intervals(sd.all)
    
    plot of chunk tut7.3bQ2.11c1
    (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1    sd.DIST 0.9148496 0.6983213 8.785653e-04  2.256061
    2   sd.LDIST 0.9907171 0.7587319 3.052998e-05  2.438256
    3    sd.AREA 5.9587552 1.1909187 3.561265e+00  8.289984
    4   sd.GRAZE 2.5103440 1.2965350 2.970500e-02  4.676791
    5     sd.ALT 1.1171491 0.8069985 3.085684e-05  2.650259
    6 sd.YR.ISOL 1.9678262 1.0938962 5.873276e-04  3.872703
    7   sd.resid 6.3890485 0.2190489 6.054454e+00  6.823333
    
    # OR expressed as a percentage
    mcmc_intervals(100 * sd.all/rowSums(sd.all))
    
    plot of chunk tut7.3bQ2.11c1
    (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
        conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1    sd.DIST  3.907519  3.212341 4.097890e-03  10.56099
    2   sd.LDIST  4.273942  3.452200 1.731688e-04  11.44000
    3    sd.AREA 30.236970  5.580443 1.926755e+01  40.74984
    4   sd.GRAZE 12.657728  6.582046 7.654405e-02  23.72766
    5     sd.ALT  4.918719  3.946802 1.728940e-04  13.04556
    6 sd.YR.ISOL  9.710659  5.318334 8.498579e-03  18.89806
    7   sd.resid 32.163828  2.610195 2.776468e+01  37.80383
    
    ## we can even plot this as a Bayesian ANOVA table
    ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
        ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
        vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
        scale_x_discrete() + coord_flip() + theme_classic()
    
    plot of chunk tut7.3bQ2.11c1
    loyn.mcmc = as.matrix(loyn.rstanarm)
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        data = loyn)
    sd.DIST = abs(loyn.mcmc[, "cDIST"]) * sd(Xmat[, "cDIST"])
    sd.LDIST = abs(loyn.mcmc[, "cLDIST"]) * sd(Xmat[, "cLDIST"])
    sd.AREA = abs(loyn.mcmc[, "cAREA"]) * sd(Xmat[, "cAREA"])
    sd.GRAZE = abs(loyn.mcmc[, "cGRAZE"]) * sd(Xmat[, "cGRAZE"])
    sd.ALT = abs(loyn.mcmc[, "cALT"]) * sd(Xmat[, "cALT"])
    sd.YR.ISOL = abs(loyn.mcmc[, "cYR.ISOL"]) * sd(Xmat[, "cYR.ISOL"])
    sd.x = sd.DIST + sd.LDIST + sd.AREA + sd.GRAZE + sd.ALT + sd.YR.ISOL
    
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    ## get median parameter estimates
    coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, loyn$ABUND, "-")
    sd.resid = apply(resid, 1, sd)
    
    sd.all = cbind(sd.DIST, sd.LDIST, sd.AREA, sd.GRAZE, sd.ALT, sd.YR.ISOL,
        sd.resid)
    mcmc_intervals(sd.all)
    
    plot of chunk tut7.3bQ2.11d1
    (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1    sd.DIST 0.9571352 0.7232895 2.055808e-04  2.354083
    2   sd.LDIST 1.0359987 0.7951729 8.080357e-05  2.614886
    3    sd.AREA 6.0758702 1.2084527 3.670116e+00  8.373577
    4   sd.GRAZE 2.4987310 1.3029521 2.308358e-02  4.733729
    5     sd.ALT 1.1035495 0.7875110 5.217236e-04  2.587163
    6 sd.YR.ISOL 1.9408230 1.0841879 5.716389e-04  3.812901
    7   sd.resid 6.4001331 0.2267968 6.059646e+00  6.842554
    
    # OR expressed as a percentage
    mcmc_intervals(100 * sd.all/rowSums(sd.all))
    
    plot of chunk tut7.3bQ2.11d1
    (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
        conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1    sd.DIST  4.069556  3.279479 1.849442e-03  10.96721
    2   sd.LDIST  4.405925  3.579056 4.602495e-04  11.99038
    3    sd.AREA 30.607918  5.599275 1.952106e+01  41.27510
    4   sd.GRAZE 12.275375  6.481209 1.274328e-01  23.66851
    5     sd.ALT  4.895469  3.812649 8.188199e-04  12.55481
    6 sd.YR.ISOL  9.483868  5.262687 6.561997e-02  18.64995
    7   sd.resid 31.999916  2.553375 2.742948e+01  37.42199
    
    ## we can even plot this as a Bayesian ANOVA table
    ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
        ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
        vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
        scale_x_discrete() + coord_flip() + theme_classic()
    
    plot of chunk tut7.3bQ2.11d1
    loyn.mcmc = as.matrix(loyn.brm)
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        data = loyn)
    sd.DIST = abs(loyn.mcmc[, "b_cDIST"]) * sd(Xmat[, "cDIST"])
    sd.LDIST = abs(loyn.mcmc[, "b_cLDIST"]) * sd(Xmat[, "cLDIST"])
    sd.AREA = abs(loyn.mcmc[, "b_cAREA"]) * sd(Xmat[, "cAREA"])
    sd.GRAZE = abs(loyn.mcmc[, "b_cGRAZE"]) * sd(Xmat[, "cGRAZE"])
    sd.ALT = abs(loyn.mcmc[, "b_cALT"]) * sd(Xmat[, "cALT"])
    sd.YR.ISOL = abs(loyn.mcmc[, "b_cYR.ISOL"]) * sd(Xmat[, "cYR.ISOL"])
    sd.x = sd.DIST + sd.LDIST + sd.AREA + sd.GRAZE + sd.ALT + sd.YR.ISOL
    
    # generate a model matrix
    newdata = loyn
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        newdata)
    ## get median parameter estimates
    coefs = loyn.mcmc[, c("b_Intercept", "b_cDIST", "b_cLDIST", "b_cAREA",
        "b_cGRAZE", "b_cALT", "b_cYR.ISOL")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, loyn$ABUND, "-")
    sd.resid = apply(resid, 1, sd)
    
    sd.all = cbind(sd.DIST, sd.LDIST, sd.AREA, sd.GRAZE, sd.ALT, sd.YR.ISOL,
        sd.resid)
    mcmc_intervals(sd.all)
    
    plot of chunk tut7.3bQ2.11e1
    (fpsd = tidyMCMC(sd.all, conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1    sd.DIST 0.8991315 0.6805757 4.123211e-04  2.214328
    2   sd.LDIST 0.9788072 0.7623775 2.009903e-04  2.452555
    3    sd.AREA 5.9320899 1.1923183 3.614970e+00  8.242503
    4   sd.GRAZE 2.5338467 1.3051533 2.385887e-03  4.772074
    5     sd.ALT 1.1308774 0.8151568 3.568906e-05  2.674572
    6 sd.YR.ISOL 1.9432391 1.0691568 3.139418e-04  3.786755
    7   sd.resid 6.3850130 0.2137011 6.066638e+00  6.811799
    
    # OR expressed as a percentage
    mcmc_intervals(100 * sd.all/rowSums(sd.all))
    
    plot of chunk tut7.3bQ2.11e1
    (fpsd.p = tidyMCMC(100 * sd.all/rowSums(sd.all), estimate.method = "median",
        conf.int = TRUE, conf.method = "HPDinterval"))
    
            term  estimate std.error     conf.low conf.high
    1    sd.DIST  3.876120  3.148269 4.860050e-03  10.31386
    2   sd.LDIST  4.179393  3.472956 1.031670e-03  11.37728
    3    sd.AREA 30.279843  5.607016 1.885392e+01  40.38044
    4   sd.GRAZE 12.742637  6.568014 1.273447e-02  24.05760
    5     sd.ALT  5.035547  4.001749 1.836134e-04  13.09358
    6 sd.YR.ISOL  9.607205  5.239434 1.359604e-02  18.65769
    7   sd.resid 32.247312  2.542148 2.769044e+01  37.58520
    
    ## we can even plot this as a Bayesian ANOVA table
    ggplot(fpsd, aes(y = estimate, x = term)) + geom_pointrange(aes(ymin = conf.low,
        ymax = conf.high)) + geom_text(aes(label = sprintf("%.2f%%", fpsd.p$estimate),
        vjust = -1)) + scale_y_continuous("Finite population standard deviation") +
        scale_x_discrete() + coord_flip() + theme_classic()
    
    plot of chunk tut7.3bQ2.11e1
  14. Explore $R^2$
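    A Bayesian $R^2$ can be calculated for each MCMC sample as the variance of the fitted values relative to the sum of the variance of the fitted values and the variance of the residuals:
    $$R^2 = \frac{\mathrm{var}(\hat{y})}{\mathrm{var}(\hat{y}) + \mathrm{var}(y - \hat{y})}$$
    This yields a full posterior distribution for $R^2$, which can be summarised by its mean (or median) and an HPD credible interval and compared with the frequentist $R^2$.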
    library(MCMCpack)
    library(broom)
    loyn.mcmc <- loyn.mcmcpack
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        data = loyn)
    coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, loyn$ABUND, "-")
    var_f = apply(fit, 1, var)
    var_e = apply(resid, 1, var)
    R2 = var_f/(var_f + var_e)
    tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
    
      term estimate  std.error  conf.low conf.high
    1 var1 0.665933 0.04516778 0.5803855 0.7428253
    
    # for comparison with frequentist
    summary(lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT +
        YR.ISOL, data = loyn))
    
    Call:
    lm(formula = ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + 
        GRAZE + ALT + YR.ISOL, data = loyn)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -15.6506  -2.9390   0.5289   2.5353  15.2842 
    
    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  -125.69725   91.69228  -1.371   0.1767    
    log10(DIST)    -0.90696    2.67572  -0.339   0.7361    
    log10(LDIST)   -0.64842    2.12270  -0.305   0.7613    
    log10(AREA)     7.47023    1.46489   5.099 5.49e-06 ***
    GRAZE          -1.66774    0.92993  -1.793   0.0791 .  
    ALT             0.01951    0.02396   0.814   0.4195    
    YR.ISOL         0.07387    0.04520   1.634   0.1086    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 6.384 on 49 degrees of freedom
    Multiple R-squared:  0.6849,	Adjusted R-squared:  0.6464 
    F-statistic: 17.75 on 6 and 49 DF,  p-value: 8.443e-11
    
    loyn.mcmc <- loyn.r2jags$BUGSoutput$sims.matrix
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        data = loyn)
    coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]",
        "beta[5]", "beta[6]")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, loyn$ABUND, "-")
    var_f = apply(fit, 1, var)
    var_e = apply(resid, 1, var)
    R2 = var_f/(var_f + var_e)
    tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
    
      term  estimate  std.error  conf.low conf.high
    1 var1 0.6659416 0.04498095 0.5786503 0.7411309
    
    # for comparison with frequentist
    summary(lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT +
        YR.ISOL, data = loyn))
    
    Call:
    lm(formula = ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + 
        GRAZE + ALT + YR.ISOL, data = loyn)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -15.6506  -2.9390   0.5289   2.5353  15.2842 
    
    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  -125.69725   91.69228  -1.371   0.1767    
    log10(DIST)    -0.90696    2.67572  -0.339   0.7361    
    log10(LDIST)   -0.64842    2.12270  -0.305   0.7613    
    log10(AREA)     7.47023    1.46489   5.099 5.49e-06 ***
    GRAZE          -1.66774    0.92993  -1.793   0.0791 .  
    ALT             0.01951    0.02396   0.814   0.4195    
    YR.ISOL         0.07387    0.04520   1.634   0.1086    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 6.384 on 49 degrees of freedom
    Multiple R-squared:  0.6849,	Adjusted R-squared:  0.6464 
    F-statistic: 17.75 on 6 and 49 DF,  p-value: 8.443e-11
    
    loyn.mcmc <- as.matrix(loyn.rstan)
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        data = loyn)
    coefs = loyn.mcmc[, c("beta0", "beta[1]", "beta[2]", "beta[3]", "beta[4]",
        "beta[5]", "beta[6]")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, loyn$ABUND, "-")
    var_f = apply(fit, 1, var)
    var_e = apply(resid, 1, var)
    R2 = var_f/(var_f + var_e)
    tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
    
      term  estimate  std.error conf.low conf.high
    1 var1 0.6642038 0.04535031 0.575571 0.7392861
    
    # for comparison with frequentist
    summary(lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT +
        YR.ISOL, data = loyn))
    
    Call:
    lm(formula = ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + 
        GRAZE + ALT + YR.ISOL, data = loyn)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -15.6506  -2.9390   0.5289   2.5353  15.2842 
    
    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  -125.69725   91.69228  -1.371   0.1767    
    log10(DIST)    -0.90696    2.67572  -0.339   0.7361    
    log10(LDIST)   -0.64842    2.12270  -0.305   0.7613    
    log10(AREA)     7.47023    1.46489   5.099 5.49e-06 ***
    GRAZE          -1.66774    0.92993  -1.793   0.0791 .  
    ALT             0.01951    0.02396   0.814   0.4195    
    YR.ISOL         0.07387    0.04520   1.634   0.1086    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 6.384 on 49 degrees of freedom
    Multiple R-squared:  0.6849,	Adjusted R-squared:  0.6464 
    F-statistic: 17.75 on 6 and 49 DF,  p-value: 8.443e-11
    
    loyn.mcmc <- as.matrix(loyn.rstanarm)
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        data = loyn)
    coefs = loyn.mcmc[, c("(Intercept)", "cDIST", "cLDIST", "cAREA", "cGRAZE",
        "cALT", "cYR.ISOL")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, loyn$ABUND, "-")
    var_f = apply(fit, 1, var)
    var_e = apply(resid, 1, var)
    R2 = var_f/(var_f + var_e)
    tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
    
      term  estimate  std.error  conf.low conf.high
    1 var1 0.6657499 0.04470819 0.5755614 0.7379351
    
    # for comparison with frequentist
    summary(lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT +
        YR.ISOL, data = loyn))
    
    Call:
    lm(formula = ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + 
        GRAZE + ALT + YR.ISOL, data = loyn)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -15.6506  -2.9390   0.5289   2.5353  15.2842 
    
    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  -125.69725   91.69228  -1.371   0.1767    
    log10(DIST)    -0.90696    2.67572  -0.339   0.7361    
    log10(LDIST)   -0.64842    2.12270  -0.305   0.7613    
    log10(AREA)     7.47023    1.46489   5.099 5.49e-06 ***
    GRAZE          -1.66774    0.92993  -1.793   0.0791 .  
    ALT             0.01951    0.02396   0.814   0.4195    
    YR.ISOL         0.07387    0.04520   1.634   0.1086    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 6.384 on 49 degrees of freedom
    Multiple R-squared:  0.6849,	Adjusted R-squared:  0.6464 
    F-statistic: 17.75 on 6 and 49 DF,  p-value: 8.443e-11
    
    loyn.mcmc <- as.matrix(loyn.brm)
    Xmat = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL,
        data = loyn)
    coefs = loyn.mcmc[, c("b_Intercept", "b_cDIST", "b_cLDIST", "b_cAREA",
        "b_cGRAZE", "b_cALT", "b_cYR.ISOL")]
    fit = coefs %*% t(Xmat)
    resid = sweep(fit, 2, loyn$ABUND, "-")
    var_f = apply(fit, 1, var)
    var_e = apply(resid, 1, var)
    R2 = var_f/(var_f + var_e)
    tidyMCMC(as.mcmc(R2), conf.int = TRUE, conf.method = "HPDinterval")
    
      term  estimate  std.error  conf.low conf.high
    1 var1 0.6639469 0.04520351 0.5769229 0.7408735
    
    # for comparison with frequentist
    summary(lm(ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + GRAZE + ALT +
        YR.ISOL, data = loyn))
    
    Call:
    lm(formula = ABUND ~ log10(DIST) + log10(LDIST) + log10(AREA) + 
        GRAZE + ALT + YR.ISOL, data = loyn)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -15.6506  -2.9390   0.5289   2.5353  15.2842 
    
    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  -125.69725   91.69228  -1.371   0.1767    
    log10(DIST)    -0.90696    2.67572  -0.339   0.7361    
    log10(LDIST)   -0.64842    2.12270  -0.305   0.7613    
    log10(AREA)     7.47023    1.46489   5.099 5.49e-06 ***
    GRAZE          -1.66774    0.92993  -1.793   0.0791 .  
    ALT             0.01951    0.02396   0.814   0.4195    
    YR.ISOL         0.07387    0.04520   1.634   0.1086    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 6.384 on 49 degrees of freedom
    Multiple R-squared:  0.6849,	Adjusted R-squared:  0.6464 
    F-statistic: 17.75 on 6 and 49 DF,  p-value: 8.443e-11
    
  15. We might expect that some of the predictors have no effect, so we could explore sparsity.
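    One way to do this is to place a regularised horseshoe prior on the partial slopes. Each coefficient receives its own local shrinkage parameter ($\lambda_j$) in addition to a global shrinkage parameter ($\tau$), so that coefficients with little support are shrunk strongly towards zero while well-supported coefficients largely escape the shrinkage (up to a limit controlled by the slab scale $c$):
    $$\beta_j \mid \lambda_j, \tau, c \sim N(0,\ \tau^2\tilde{\lambda}_j^2), \qquad \tilde{\lambda}_j^2 = \frac{c^2\lambda_j^2}{c^2 + \tau^2\lambda_j^2}, \qquad \lambda_j \sim t^{+}_{\nu_{local}}(0, 1), \qquad \tau \sim t^{+}_{\nu_{global}}(0,\ \text{scale\_global}\times\sigma)$$
    This is the prior implemented in the Stan model below and, via the hs() and horseshoe() functions, in the rstanarm and brms fits that follow.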
    modelString = "
    data {
      int<lower=0> n;               // number of observations
      int<lower=0> nX;              // number of columns in the model matrix (intercept + predictors)
      vector[n] Y;                  // outputs
      matrix[n, nX] X;              // inputs
      real<lower=0> scale_icept;    // prior std for the intercept
      real<lower=0> scale_global;   // scale for the half-t prior for tau
      real<lower=1> nu_global;      // degrees of freedom for the half-t prior for tau
      real<lower=1> nu_local;       // degrees of freedom for the half-t priors for lambdas
      real<lower=0> slab_scale;     // slab scale for the regularized horseshoe
      real<lower=0> slab_df;        // slab degrees of freedom for the regularized horseshoe
    }
    transformed data {
      matrix[n, nX - 1] Xc;         // centered version of X
      vector[nX - 1] means_X;       // column means of X before centering
      for (i in 2:nX) {
        means_X[i - 1] = mean(X[, i]);
        Xc[, i - 1] = X[, i] - means_X[i - 1];
      }
    }
    parameters {
      real logsigma;
      real cbeta0;
      vector[nX - 1] z;
      real<lower=0> tau;                     // global shrinkage parameter
      vector<lower=0>[nX - 1] lambda;        // local shrinkage parameters
      real<lower=0> caux;
    }
    transformed parameters {
      real<lower=0> sigma;                   // noise std
      vector<lower=0>[nX - 1] lambda_tilde;  // 'truncated' local shrinkage parameters
      real<lower=0> c;                       // slab scale
      vector[nX - 1] beta;                   // regression coefficients
      vector[n] mu;                          // latent function values
      sigma = exp(logsigma);
      c = slab_scale * sqrt(caux);
      lambda_tilde = sqrt(c^2 * square(lambda) ./ (c^2 + tau^2 * square(lambda)));
      beta = z .* lambda_tilde * tau;
      mu = cbeta0 + Xc * beta;
    }
    model {
      // half-t priors for lambdas and tau, and inverse-gamma for c^2
      z ~ normal(0, 1);
      lambda ~ student_t(nu_local, 0, 1);
      tau ~ student_t(nu_global, 0, scale_global * sigma);
      caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
      cbeta0 ~ normal(0, scale_icept);
      Y ~ normal(mu, sigma);
    }
    generated quantities {
      real beta0;                            // population-level intercept
      vector[n] log_lik;
      beta0 = cbeta0 - dot_product(means_X, beta);
      for (i in 1:n) {
        log_lik[i] = normal_lpdf(Y[i] | Xc[i] * beta + cbeta0, sigma);
      }
    }"
    
    X = model.matrix(~cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL, data = loyn)
    loyn.list <- with(loyn, list(Y = ABUND, X = X, nX = ncol(X), n = nrow(loyn), 
        scale_icept = 100, scale_global = 1, nu_global = 1, nu_local = 1, slab_scale = 2, 
        slab_df = 4))
    
    loyn.rstan.sparsity <- stan(data = loyn.list, model_code = modelString, 
        chains = 3, iter = 5000, warmup = 2500, thin = 3, save_dso = TRUE)
    
    SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 1).
    
    Gradient evaluation took 3.2e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.32 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 1.62127 seconds (Warm-up)
                   1.88662 seconds (Sampling)
                   3.50789 seconds (Total)
    
    
    SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 2).
    
    Gradient evaluation took 1.6e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 1.50273 seconds (Warm-up)
                   2.34926 seconds (Sampling)
                   3.85199 seconds (Total)
    
    
    SAMPLING FOR MODEL '00bfb1e363378528725b0dadb922f0fc' NOW (CHAIN 3).
    
    Gradient evaluation took 1.8e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds.
    Adjust your expectations accordingly!
    
    
    Iteration:    1 / 5000 [  0%]  (Warmup)
    Iteration:  500 / 5000 [ 10%]  (Warmup)
    Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Iteration: 5000 / 5000 [100%]  (Sampling)
    
     Elapsed Time: 2.34557 seconds (Warm-up)
                   0.915766 seconds (Sampling)
                   3.26134 seconds (Total)
    
    tidyMCMC(loyn.rstan.sparsity, pars = c("beta[1]", "beta[2]", "beta[3]", 
        "beta[4]", "beta[5]", "beta[6]"), conf.int = TRUE, conf.method = "HPDinterval", 
        rhat = TRUE, ess = TRUE)
    
         term     estimate  std.error    conf.low  conf.high     rhat  ess
    1 beta[1] -0.102450119 1.34718829 -3.26288544 2.53029434 1.003982 1076
    2 beta[2]  0.005498916 1.13020409 -2.38645159 2.53589963 1.000971 1809
    3 beta[3]  6.015072135 1.49222583  3.00689451 8.89352035 1.000410  912
    4 beta[4] -1.627923126 0.98545534 -3.56946777 0.05998661 1.005908  248
    5 beta[5]  0.026963370 0.02298719 -0.01398106 0.07309717 1.000758 1800
    6 beta[6]  0.083831429 0.04710366 -0.00398308 0.17670472 1.003063 1396
    
    library(bayesplot)
    mcmc_areas(as.matrix(loyn.rstan.sparsity), pars = c("beta[1]", "beta[2]", 
        "beta[3]", "beta[4]", "beta[5]", "beta[6]"))
    
    plot of chunk tut7.2bQ2.13a2
    n = nrow(loyn)
    nX = 6  # number of predictor terms
    p0 = 1  # prior guess at the number of non-zero coefficients
    global_scale = p0/(nX - p0)/sqrt(n)
    loyn.rstanarm.sparsity = stan_glm(ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + 
        cALT + cYR.ISOL, data = loyn, iter = 5000, warmup = 2500, chains = 3, 
        thin = 2, refresh = 0, prior_intercept = normal(0, 100), prior = hs(df = 1, 
            global_df = 1, global_scale = global_scale), prior_aux = cauchy(0, 
            2))
    
    Gradient evaluation took 0.000103 seconds
    1000 transitions using 10 leapfrog steps per transition would take 1.03 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 7.38882 seconds (Warm-up)
                   24.3185 seconds (Sampling)
                   31.7073 seconds (Total)
    
    
    Gradient evaluation took 1.8e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 8.66478 seconds (Warm-up)
                   5.28659 seconds (Sampling)
                   13.9514 seconds (Total)
    
    
    Gradient evaluation took 2.4e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 6.80974 seconds (Warm-up)
                   4.8305 seconds (Sampling)
                   11.6402 seconds (Total)
    
    print(loyn.rstanarm.sparsity)
    
    stan_glm
     family:  gaussian [identity]
     formula: ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL
    ------
    
    Estimates:
                Median MAD_SD
    (Intercept) 19.6    0.9  
    cDIST        0.0    0.3  
    cLDIST       0.0    0.3  
    cAREA        7.4    1.4  
    cGRAZE      -0.7    1.0  
    cALT         0.0    0.0  
    cYR.ISOL     0.1    0.0  
    sigma        6.4    0.6  
    
    Sample avg. posterior predictive 
    distribution of y (X = xbar):
             Median MAD_SD
    mean_PPD 19.6    1.2  
    
    ------
    For info on the priors used see help('prior_summary.stanreg').
    
    tidyMCMC(loyn.rstanarm.sparsity$stanfit, conf.int = TRUE, conf.method = "HPDinterval", 
        rhat = TRUE, ess = TRUE)
    
                term      estimate  std.error      conf.low     conf.high      rhat  ess
    1    (Intercept)   19.53351641 0.86432405  1.788227e+01   21.26985950 1.0001766 3542
    2          cDIST   -0.13838979 0.95322116 -2.455377e+00    1.73041549 0.9997868 3271
    3         cLDIST   -0.13903438 0.82075933 -2.276093e+00    1.38848409 1.0005991 3360
    4          cAREA    7.40356992 1.40493339  4.760695e+00   10.27762748 0.9999890 3140
    5         cGRAZE   -0.90631856 0.93853532 -2.858119e+00    0.38289343 0.9993849 2692
    6           cALT    0.02593069 0.02211978 -1.382799e-02    0.06813137 1.0000430 3138
    7       cYR.ISOL    0.09427198 0.04620134 -4.486496e-04    0.17767418 0.9996034 3331
    8          sigma    6.48209753 0.64324825  5.333988e+00    7.81244883 1.0006806 3274
    9       mean_PPD   19.55320518 1.21915903  1.717871e+01   21.98696666 0.9996938 3578
    10 log-posterior -232.08622059 4.07046967 -2.399648e+02 -224.28703690 1.0007980 1745
    
    library(bayesplot)
    mcmc_areas(as.matrix(loyn.rstanarm.sparsity), regex_par = "^c")
    
    plot of chunk tut7.2bQ2.13b1
    n = nrow(loyn)
    nX = 6  # number of predictor terms
    p0 = 1  # prior guess at the number of non-zero coefficients
    par_ratio = p0/(nX - p0)
    loyn.brms.sparsity = brm(ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + 
        cYR.ISOL, data = loyn, iter = 2000, warmup = 200, chains = 3, thin = 2, 
        refresh = 0, prior = c(prior(normal(0, 100), class = "Intercept"), 
            prior(horseshoe(df = 1, par_ratio = par_ratio), class = "b"), prior(cauchy(0, 
                5), class = "sigma")))
    
    Gradient evaluation took 3.4e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.220903 seconds (Warm-up)
                   0.853136 seconds (Sampling)
                   1.07404 seconds (Total)
    
    
    Gradient evaluation took 2.4e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.302515 seconds (Warm-up)
                   1.02843 seconds (Sampling)
                   1.33094 seconds (Total)
    
    
    Gradient evaluation took 2.4e-05 seconds
    1000 transitions using 10 leapfrog steps per transition would take 0.24 seconds.
    Adjust your expectations accordingly!
    
    
    
     Elapsed Time: 0.236384 seconds (Warm-up)
                   1.42325 seconds (Sampling)
                   1.65963 seconds (Total)
    
    print(loyn.brms.sparsity)
    
     Family: gaussian(identity) 
    Formula: ABUND ~ cDIST + cLDIST + cAREA + cGRAZE + cALT + cYR.ISOL 
       Data: loyn (Number of observations: 56) 
    Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; 
             total post-warmup samples = 2700
        ICs: LOO = NA; WAIC = NA; R2 = NA
     
    Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept    19.52      0.82    17.88    21.19       1754 1.00
    cDIST        -0.07      0.96    -2.50     1.92       2079 1.00
    cLDIST       -0.05      0.83    -1.87     1.77       2173 1.00
    cAREA         6.45      1.47     3.52     9.21       1560 1.01
    cGRAZE       -1.28      1.03    -3.31     0.18        315 1.01
    cALT          0.03      0.02    -0.01     0.07       2094 1.00
    cYR.ISOL      0.09      0.05     0.00     0.18       1197 1.00
    
    Family Specific Parameters: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sigma     6.41      0.66      5.3     7.85        506    1
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    tidyMCMC(loyn.brms.sparsity$fit, conf.int = TRUE, conf.method = "HPDinterval", 
        rhat = TRUE, ess = TRUE)
    
             term    estimate  std.error     conf.low   conf.high      rhat  ess
    1 b_Intercept 19.52226633 0.82160697 17.981431835 21.26754566 1.0017875 1754
    2     b_cDIST -0.07125559 0.96185771 -2.513628278  1.90341363 0.9995230 2079
    3    b_cLDIST -0.04643137 0.82792388 -1.978043084  1.63693893 0.9994019 2173
    4     b_cAREA  6.45028569 1.46750185  3.505470412  9.20117108 1.0059270 1560
    5    b_cGRAZE -1.28136463 1.03167979 -3.220460816  0.22999487 1.0056027  315
    6      b_cALT  0.02799159 0.02226846 -0.011726251  0.07226876 0.9994930 2094
    7  b_cYR.ISOL  0.08941109 0.04761456 -0.002998981  0.17472440 1.0024133 1197
    8       sigma  6.40947392 0.66387866  5.187831810  7.67092283 1.0017656  506
    9       hs_c2  5.54353010 8.79605100  0.372670519 17.86665059 1.0025060 1249
    
    library(bayesplot)
    mcmc_areas(as.matrix(loyn.brms.sparsity), regex_par = "^b_c")
    
    plot of chunk tut7.2bQ2.13c1