

Tutorial 9.6b - Factorial ANOVA (Bayesian)

14 Jan 2014

Overview

Factorial designs are an extension of single factor ANOVA designs in which additional factors are added such that each level of one factor is applied to all levels of the other factor(s) and these combinations are replicated.

For example, we might design an experiment in which the effects of temperature (high vs low) and fertilizer (added vs not added) on the growth rate of seedlings are investigated by growing seedlings under the different temperature and fertilizer combinations.

The following diagram depicts a very simple factorial design in which there are two factors (Shape and Color) each with two levels (Shape: square and circle; Color: blue and white) and each combination has 3 replicates.


In addition to investigating the impacts of the main factors, factorial designs allow us to investigate whether the effects of one factor are consistent across levels of another factor. For example, is the effect of temperature on growth rate the same for both fertilized and unfertilized seedlings and similarly, does the impact of fertilizer treatment depend on the temperature under which the seedlings are grown?

Arguably, these interactions give more sophisticated insights into the dynamics of the system we are investigating. Hence, we could add additional main effects, such as soil pH, amount of water etc, along with all the two-way (temp:fert, temp:pH, temp:water, etc), three-way (temp:fert:pH, temp:pH:water, etc), four-way etc interactions in order to explore how these various factors interact with one another to affect the response.

However, the more interactions, the more complex the model becomes to specify, compute and interpret - not to mention the rate at which the number of required observations increases.
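
Each additional factor (and its associated interactions) multiplies the number of parameters that must be estimated. A quick way to appreciate this is to count the columns of the corresponding model matrices (the factors below are hypothetical two-level factors, purely for illustration):

## Count the effects parameters required for increasingly complex factorial designs
dummy <- expand.grid(temp  = factor(c("high", "low")),
                     fert  = factor(c("added", "control")),
                     pH    = factor(c("acid", "neutral")),
                     water = factor(c("wet", "dry")))
ncol(model.matrix(~temp * fert, dummy))                # 4 parameters
ncol(model.matrix(~temp * fert * pH, dummy))           # 8 parameters
ncol(model.matrix(~temp * fert * pH * water, dummy))   # 16 parameters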

To appreciate the interpretation of interactions, consider the following figure that depicts fictitious two factor (temperature and fertilizer) designs.


It is clear from the top-right figure that whether or not there is an observed effect of adding fertilizer depends on whether we are focused on seedlings grown under high or low temperatures. Fertilizer is only important for seedlings grown under high temperatures. In this case, it is not possible to simply state that there is an effect of fertilizer, as it depends on the level of temperature. Similarly, the magnitude of the effect of temperature depends on whether fertilizer has been added or not.

Such interactions are represented by plots in which lines either intersect or converge. The top-left and bottom-left figures both depict parallel lines, which are indicative of no interaction. That is, the effects of temperature are similar for both fertilized and unfertilized seedlings and vice versa. Whilst the former displays an effect of both fertilizer and temperature, in the latter only fertilizer is important.

Finally, the bottom-right figure represents a strong interaction that would mask the main effects of temperature and fertilizer (since the nature of the effect of temperature is very different for the different fertilizer treatments and vice versa).

In a frequentist framework, factorial designs can consist of:

  • entirely of crossed fixed factors (Model I ANOVA - most common) in which conclusions are restricted to the specific combinations of levels selected for the experiment,
  • entirely of crossed random factors (Model II ANOVA) or
  • a mixture of crossed fixed and random factors (Model III ANOVA).
Bayesians of course make no distinction between fixed and random factors - all factors are random. Hence, the above constructs (Model I, II and III) are irrelevant in a Bayesian framework.

The tutorial on frequentist factorial ANOVA described procedures used to further investigate models in the presence of significant interactions as well as the complications that arise with linear models fitted to unbalanced designs. Again, these issues largely evaporate in a Bayesian framework. Consequently, we will not really dwell on these complications in this tutorial. At most, we will model some unbalanced data, yet it should be noted that we will not need to make any special adjustments in order to do so.

Linear model

The generic linear model is presented here purely for revisory purposes. If it is unfamiliar to you, or you are unsure about what parameters are to be estimated and tested, you are strongly advised to review the tutorial on frequentist factorial ANOVA.

The linear models for two and three factor designs are: $$y_{ijk}=\mu+\alpha_i + \beta_{j} + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$ $$y_{ijkl}=\mu+\alpha_i + \beta_{j} + \gamma_{k} + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}$$ where $\mu$ is the overall mean, $\alpha$ is the effect of Factor A, $\beta$ is the effect of Factor B, $\gamma$ is the effect of Factor C and $\varepsilon$ is the random unexplained or residual component.
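
For the 2x3 design used in the remainder of this tutorial, the two-factor model comprises six estimable effects parameters (the overall mean, one contrast for Factor A, two for Factor B and two for the interaction), and these map directly onto the columns of a treatment-contrast model matrix. A quick sketch (the factor labels mirror those generated in the next section):

## Columns correspond to mu, alpha_2, beta_2, beta_3, (alpha beta)_22 and (alpha beta)_23
colnames(model.matrix(~A * B, expand.grid(A = factor(c("a1", "a2")), B = factor(c("b1", "b2", "b3")))))
## [1] "(Intercept)" "Aa2" "Bb2" "Bb3" "Aa2:Bb2" "Aa2:Bb3"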

Scenario and Data

Imagine we had designed an experiment in which we measured a response ($y$) under a combination of two different potential influences (Factor A: levels $a1$ and $a2$; and Factor B: levels $b1$, $b2$ and $b3$), with each combination replicated 10 times ($n=10$). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and can be safely skipped. Consequently, I have folded (toggled) this section away.

Random data incorporating the following trends (effect parameters)
  • the sample size per treatment=10
  • factor A with 2 levels
  • factor B with 3 levels
  • the 6 effects parameters are 40, 15, 5, 0, -15, 10 ($\mu_{a1b1}=40$, $\mu_{a2b1}=40+15=55$, $\mu_{a1b2}=40+5=45$, $\mu_{a1b3}=40+0=40$, $\mu_{a2b2}=40+15+5-15=45$ and $\mu_{a2b3}=40+15+0+10=65$)
  • the residuals are drawn from a normal distribution with a mean of 0 and a standard deviation of 3 ($\sigma^2=9$)
> set.seed(1)
> nA <- 2  #number of levels of A
> nB <- 3  #number of levels of B
> nsample <- 10  #number of reps in each
> A <- gl(nA, 1, nA, lab = paste("a", 1:nA, sep = ""))
> B <- gl(nB, 1, nB, lab = paste("b", 1:nB, sep = ""))
> data <- expand.grid(A = A, B = B, n = 1:nsample)
> X <- model.matrix(~A * B, data = data)
> eff <- c(40, 15, 5, 0, -15, 10)
> sigma <- 3  #residual standard deviation
> n <- nrow(data)
> eps <- rnorm(n, 0, sigma)  #residuals
> data$y <- as.numeric(X %*% eff + eps)
> head(data)  #print out the first six rows of the data set
   A  B n     y
1 a1 b1 1 38.12
2 a2 b1 1 55.55
3 a1 b2 1 42.49
4 a2 b2 1 49.79
5 a1 b3 1 40.99
6 a2 b3 1 62.54
> with(data, interaction.plot(A, B, y))
plot of chunk tut9.6aS1.1
> 
> ## ALTERNATIVELY, we could supply the population means and get the effect parameters from these.  To
> ## correspond to the model matrix, enter the population means in the order of: a1b1, a2b1, a1b2,
> ## a2b2, a1b3, a2b3
> pop.means <- as.matrix(c(40, 55, 45, 45, 40, 65), byrow = F)
> ## Generate a minimum model matrix for the effects
> XX <- model.matrix(~A * B, expand.grid(A = factor(1:2), B = factor(1:3)))
> ## Use the solve() function to solve what are effectively simultaneous equations
> (eff <- as.vector(solve(XX, pop.means)))
[1]  40  15   5   0 -15  10
> data$y <- as.numeric(X %*% eff + eps)

With these sorts of data, we are primarily interested in investigating whether there is a relationship between the continuous response variable and the treatment type. Does treatment type affect the response?

Assumptions

The assumptions are:
  1. All of the observations are independent - this must be addressed at the design and collection stages. Importantly, to be considered independent replicates, the replicates must be made at the same scale at which the treatment is applied. For example, if the experiment involves subjecting organisms housed in tanks to different water temperatures, then the unit of replication is the individual tanks, not the individual organisms in the tanks. The individuals in a tank are strictly not independent with respect to the treatment.
  2. The response variable (and thus the residuals) should be normally distributed for each sampled population (combination of factors). Boxplots of each treatment combination are useful for diagnosing major issues with normality.
  3. The response variable should be equally varied (variance should not be related to the mean, as these are supposed to be estimated separately) for each combination of treatments. Again, boxplots are useful.

Exploratory data analysis

Normality and Homogeneity of variance

> boxplot(y ~ A * B, data)
plot of chunk tut9.4bS1.2

Conclusions:

  • there is no evidence that the response variable is consistently non-normal across all populations - each boxplot is approximately symmetrical
  • there is no evidence that variance (as estimated by the height of the boxplots) differs between the six populations. More importantly, there is no evidence of a relationship between mean and variance - the height of the boxplots does not increase with increasing position along the y-axis. Hence there is no evidence of non-homogeneity of variance
Obvious violations could be addressed by:
  • transforming the scale of the response variable (to address normality etc). Note that transformations should be applied to the entire response variable (not just those populations that are skewed) - see the sketch below
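
For example, had the response been strongly right-skewed with variance increasing with the mean, a log transformation applied to the whole response could be assessed by re-drawing the boxplots on the transformed scale (purely illustrative - the simulated data above do not require transformation):

## Re-assess normality and homogeneity of variance on a log scale (illustrative only)
boxplot(log(y) ~ A * B, data)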

Model fitting or statistical analysis

It is possible to build all sorts of specific comparisons into the JAGS or STAN model statement (as we did for single factor ANOVA). Likewise, it is possible to define specific main effects type tests within the model statement. However, we will just define the minimum required model and derive all other posterior quantities from the MCMC samples using R. This way, the techniques can be applied no matter which of the Bayesian modelling routines (JAGS, STAN, MCMCpack etc) was used.

JAGS

Effects model

$$y_{i}\sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i = \mathbf{X}_i\boldsymbol{\beta}\\ \boldsymbol{\beta}\sim \mathcal{N}(0, 10^{6})\\ \sigma\sim \mathcal{U}(0,100) $$

Here $\mathbf{X}$ is the model matrix encoding the overall mean, the main effects of A and B and their interaction ($\mu$, $\alpha_i$, $\beta_j$ and $(\alpha\beta)_{ij}$ from the linear model above), and $\boldsymbol{\beta}$ is the corresponding vector of effects parameters. The normal priors are written here in terms of variance; the JAGS code below specifies them via the equivalent precision ($10^{-6}$) and substitutes a vague gamma prior on $\sigma$, whereas the Stan code uses the uniform prior shown above.

Bayesian factorial models can be parameterized either as effects models or as cell means models. Furthermore, the models can be defined either in terms of a fully specified linear predictor (mirroring the right-hand side of the linear model), or via a model matrix. Adopting the model matrix notation dramatically simplifies the model specification (particularly if there are more than a few levels of each factor and/or more than two factors). Not only does this allow a more compact and generic model statement, it also makes model specification less error prone.
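
To see the difference between the two parameterizations, compare the model matrix of the effects model with that of the corresponding cell means model (which drops the intercept and estimates one mean per A:B combination) - a quick sketch using the data generated above:

## Effects parameterization: intercept, treatment contrasts and interaction contrasts
head(model.matrix(~A * B, data), 3)
## Cell means parameterization: one column (and hence one parameter) per A:B cell
head(model.matrix(~A:B - 1, data), 3)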

> modelString="
model {
   #Likelihood
   for (i in 1:n) {
      y[i]~dnorm(mean[i],tau)
      mean[i] <- inprod(beta[],X[i,])
   }
   #Priors and derivatives
   for (i in 1:p) {
     beta[i] ~ dnorm(0, 1.0E-6) #prior
   }
   sigma ~ dgamma(0.001, 0.001)
   tau <- 1 / (sigma * sigma)
 }
"
> writeLines(modelString,con="../downloads/BUGSscripts/tut9.6bS6.1.txt")

Actually, you might have noticed that this is the same model specification that was used for the single factor ANOVA via the model matrix. That is the point. The one model specification accommodates a huge range of linear models.
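
To illustrate the point, the same JAGS model file could be recycled for simpler models simply by swapping the formula supplied to model.matrix() - a sketch (only the full factorial version is actually fitted in this tutorial):

X1 <- model.matrix(~A, data)       # single factor model
X2 <- model.matrix(~A + B, data)   # additive two-factor model
X3 <- model.matrix(~A * B, data)   # full factorial model (used below)
sapply(list(X1, X2, X3), ncol)     # number of beta parameters required: 2 4 6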

In creating a data list to pass to this JAGS model, we need to define a model matrix that reflects our intended linear model. This can be achieved simply via the model.matrix() function.

> X <- model.matrix(~A*B, data)
> data.list <- with(data,
+         list(y=y,
+                  X=X,
+          n=nrow(data),
+          p=ncol(X)
+          )
+ )
> 
> params <- c("beta","sigma")
> adaptSteps = 1000
> burnInSteps = 2000
> nChains = 3
> numSavedSteps = 50000
> thinSteps = 10
> nIter = ceiling((numSavedSteps * thinSteps)/nChains)
> library(R2jags)
> jags.effects.simple.time <- system.time(
+ data.r2jags <- jags(data=data.list,
+           inits=NULL,
+           parameters.to.save=params,
+           model.file="../downloads/BUGSscripts/tut9.6bS6.1.txt",
+           n.chains=3,
+           n.iter=nIter,
+           n.burnin=burnInSteps,
+       n.thin=thinSteps
+           )
+ )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 502

Initializing model
> print(data.r2jags)
Inference for Bugs model at "../downloads/BUGSscripts/tut9.6bS6.1.txt", fit using jags,
 3 chains, each with 166667 iterations (first 2000 discarded), n.thin = 10
 n.sims = 49401 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta[1]   41.098   0.837  39.450  40.538  41.090  41.654  42.758 1.001 36000
beta[2]   14.649   1.184  12.330  13.859  14.652  15.435  16.989 1.001 49000
beta[3]    4.639   1.184   2.298   3.849   4.640   5.431   6.969 1.001 49000
beta[4]   -0.753   1.188  -3.094  -1.545  -0.753   0.043   1.589 1.001 30000
beta[5]  -15.714   1.672 -19.024 -16.807 -15.718 -14.595 -12.448 1.001 49000
beta[6]    9.335   1.673   6.045   8.225   9.338  10.450  12.614 1.001 49000
sigma      2.636   0.260   2.185   2.453   2.617   2.796   3.207 1.001 49000
deviance 286.016   3.984 280.339 283.105 285.318 288.177 295.700 1.001 49000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 7.9 and DIC = 294.0
DIC is an estimate of expected predictive error (lower deviance is better).
> data.mcmc.list <- as.mcmc(data.r2jags)
> Data.mcmc.list <- data.mcmc.list  # keep a copy of the samples for use in later sections
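
Although the output is not reproduced here, the usual coda checks of chain mixing and autocorrelation could be applied to the returned mcmc.list before proceeding (a sketch):

library(coda)
plot(data.mcmc.list)           # trace and density plots for each parameter
autocorr.diag(data.mcmc.list)  # autocorrelation among the retained samples
raftery.diag(data.mcmc.list)   # Raftery and Lewis run-length diagnostic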

Rstan

Model matrix

> rstanString="
data {
   int<lower=0> n;
   int<lower=0> p;
   vector[n] y;
   matrix [n,p] X;
}
parameters {
   vector[p] beta;
   real<lower=0> sigma;
}
model {
   vector[n] mu;

   #Priors
   beta ~ normal(0.01, 100000);
   sigma~ uniform(0,100);
   
   #Likelihood
   mu <- X*beta;
   y ~ normal(mu, sigma);
 }
"
> X <- model.matrix(~A*B,data)
> data.list <- with(data, list(y = y, X = X, n = nrow(data), p = ncol(X)))
> 
> nChains = 3
> burnInSteps = 3000
> thinSteps = 10
> numSavedSteps = 15000 #across all chains
> nIter = ceiling(burnInSteps+(numSavedSteps * thinSteps)/nChains)
> 
> library(rstan)
> rstan.simple.time <- system.time(
+ data.rstan <- stan(data=data.list,
+            model_code=rstanString,
+            chains=nChains,
+            iter=nIter,
+            warmup=burnInSteps,
+                    thin=thinSteps,
+            save_dso=TRUE
+            )
+ )
TRANSLATING MODEL 'rstanString' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'rstanString' NOW.
SAMPLING FOR MODEL 'rstanString' NOW (CHAIN 1).

Iteration:     1 / 53000 [  0%]  (Warmup)
Iteration:  5300 / 53000 [ 10%]  (Sampling)
Iteration: 10600 / 53000 [ 20%]  (Sampling)
Iteration: 15900 / 53000 [ 30%]  (Sampling)
Iteration: 21200 / 53000 [ 40%]  (Sampling)
Iteration: 26500 / 53000 [ 50%]  (Sampling)
Iteration: 31800 / 53000 [ 60%]  (Sampling)
Iteration: 37100 / 53000 [ 70%]  (Sampling)
Iteration: 42400 / 53000 [ 80%]  (Sampling)
Iteration: 47700 / 53000 [ 90%]  (Sampling)
Iteration: 53000 / 53000 [100%]  (Sampling)
Elapsed Time: 0.28 seconds (Warm-up)
              5.04 seconds (Sampling)
              5.32 seconds (Total)

SAMPLING FOR MODEL 'rstanString' NOW (CHAIN 2).

Iteration:     1 / 53000 [  0%]  (Warmup)
Iteration:  5300 / 53000 [ 10%]  (Sampling)
Iteration: 10600 / 53000 [ 20%]  (Sampling)
Iteration: 15900 / 53000 [ 30%]  (Sampling)
Iteration: 21200 / 53000 [ 40%]  (Sampling)
Iteration: 26500 / 53000 [ 50%]  (Sampling)
Iteration: 31800 / 53000 [ 60%]  (Sampling)
Iteration: 37100 / 53000 [ 70%]  (Sampling)
Iteration: 42400 / 53000 [ 80%]  (Sampling)
Iteration: 47700 / 53000 [ 90%]  (Sampling)
Iteration: 53000 / 53000 [100%]  (Sampling)
Elapsed Time: 0.3 seconds (Warm-up)
              5.42 seconds (Sampling)
              5.72 seconds (Total)

SAMPLING FOR MODEL 'rstanString' NOW (CHAIN 3).

Iteration:     1 / 53000 [  0%]  (Warmup)
Iteration:  5300 / 53000 [ 10%]  (Sampling)
Iteration: 10600 / 53000 [ 20%]  (Sampling)
Iteration: 15900 / 53000 [ 30%]  (Sampling)
Iteration: 21200 / 53000 [ 40%]  (Sampling)
Iteration: 26500 / 53000 [ 50%]  (Sampling)
Iteration: 31800 / 53000 [ 60%]  (Sampling)
Iteration: 37100 / 53000 [ 70%]  (Sampling)
Iteration: 42400 / 53000 [ 80%]  (Sampling)
Iteration: 47700 / 53000 [ 90%]  (Sampling)
Iteration: 53000 / 53000 [100%]  (Sampling)
Elapsed Time: 0.27 seconds (Warm-up)
              5.22 seconds (Sampling)
              5.49 seconds (Total)
> print(data.rstan)
Inference for Stan model: rstanString.
3 chains, each with iter=53000; warmup=3000; thin=10; 
post-warmup draws per chain=5000, total post-warmup draws=15000.

         mean se_mean  sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta[1]  41.1       0 0.8  39.4  40.5  41.1  41.7  42.8 14709    1
beta[2]  14.7       0 1.2  12.3  13.8  14.6  15.5  17.0 14783    1
beta[3]   4.6       0 1.2   2.3   3.8   4.6   5.4   7.1 14579    1
beta[4]  -0.7       0 1.2  -3.1  -1.5  -0.8   0.0   1.6 14910    1
beta[5] -15.7       0 1.7 -19.1 -16.8 -15.7 -14.6 -12.5 14678    1
beta[6]   9.3       0 1.7   6.1   8.2   9.3  10.5  12.6 14504    1
sigma     2.7       0 0.3   2.2   2.5   2.6   2.8   3.2 13829    1
lp__    -86.9       0 2.0 -91.7 -88.0 -86.6 -85.5 -84.1 15000    1

Samples were drawn using NUTS(diag_e) at Tue Feb 25 11:25:41 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
> data.rstan.df <-as.data.frame(extract(data.rstan))
> head(data.rstan.df)
  beta.1 beta.2 beta.3  beta.4 beta.5 beta.6 sigma   lp__
1  39.75  16.40  5.971  1.4991 -17.75  6.653 2.424 -85.73
2  39.74  14.05  6.798  1.6368 -16.85  8.913 2.649 -89.32
3  41.14  15.08  3.830 -0.9698 -16.53  9.511 2.477 -85.15
4  40.48  14.11  5.297  0.5224 -14.56 10.878 2.355 -88.23
5  40.12  15.32  5.918  0.3923 -17.59 10.278 2.557 -87.22
6  40.69  12.81  5.234 -1.2517 -14.62 13.137 2.549 -89.05
> 
> data.mcmc<-rstan:::as.mcmc.list.stanfit(data.rstan)
> 
>       
> plyr:::adply(as.matrix(data.rstan.df),2,MCMCsum)
      X1   Median       X0.    X25.     X50.     X75.   X100.   lower   upper lower.1  upper.1
1 beta.1  41.0955  37.51459  40.535  41.0955  41.6528  44.401  39.407  42.761  40.270  41.9263
2 beta.2  14.6357  10.10496  13.848  14.6357  15.4512  19.707  12.395  17.072  13.409  15.7779
3 beta.3   4.6413   0.04425   3.843   4.6413   5.4344   9.675   2.308   7.038   3.461   5.8125
4 beta.4  -0.7649  -6.85967  -1.538  -0.7649   0.0419   4.021  -3.168   1.519  -1.986   0.3503
5 beta.5 -15.7054 -22.86702 -16.819 -15.7054 -14.5661  -9.244 -19.014 -12.430 -17.337 -13.9972
6 beta.6   9.3374   2.84353   8.208   9.3374  10.4894  16.531   6.049  12.620   7.746  11.0858
7  sigma   2.6423   1.87165   2.477   2.6423   2.8290   4.157   2.177   3.191   2.372   2.8800
8   lp__ -86.6107 -99.28779 -88.015 -86.6107 -85.4866 -83.393 -90.856 -83.833 -87.947 -84.4797
>  
> comparisonTab <- cbind(comparisonTab,'Stan'=comTab(data.mcmc,cols="beta|sigma",lag=10,rstan.simple.time))
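
Note that MCMCsum() (used above and throughout the remainder of this tutorial) and comTab() are small helper functions defined elsewhere in this tutorial series rather than functions from any package; comTab() assembles the comparison table presented further down. A minimal sketch of the sort of summary MCMCsum() produces (median, range, quartiles, and 95% and 50% HPD intervals) might look like the following - the exact version used to generate the output above may differ:

library(coda)
MCMCsum <- function(x) {
   x <- as.vector(x)
   data.frame(Median = median(x),
              t(quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))),
              HPDinterval(as.mcmc(x)),               # 95% HPD interval -> lower, upper
              HPDinterval(as.mcmc(x), prob = 0.5))   # 50% HPD interval -> lower.1, upper.1
}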

MCMCregress (MCMCpack)

> library(MCMCpack)
> MCMCpack.additive.time <- system.time({
+ data.MCMCregress.1 <- MCMCregress(y~A*B, data=data, burnin=3000, thin=10, mcmc=50000, seed=1234)
+ data.MCMCregress.2 <- MCMCregress(y~A*B, data=data, burnin=3000, thin=10, mcmc=50000, seed=5678)
+ data.MCMCregress.3 <- MCMCregress(y~A*B, data=data, burnin=3000, thin=10, mcmc=50000, seed=910111)
+ })
> 
> data.mcmc.list <- as.mcmc.list(list(data.MCMCregress.1,data.MCMCregress.2,data.MCMCregress.3))
> summary(data.mcmc.list)
Iterations = 3001:52991
Thinning interval = 10 
Number of chains = 3 
Sample size per chain = 5000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

               Mean    SD Naive SE Time-series SE
(Intercept)  41.106 0.835  0.00682        0.00669
Aa2          14.652 1.185  0.00967        0.00986
Bb2           4.634 1.187  0.00969        0.00962
Bb3          -0.762 1.190  0.00972        0.00964
Aa2:Bb2     -15.731 1.679  0.01371        0.01376
Aa2:Bb3       9.337 1.676  0.01369        0.01389
sigma2        7.013 1.409  0.01151        0.01130

2. Quantiles for each variable:

              2.5%    25%     50%      75%  97.5%
(Intercept)  39.46  40.55  41.104  41.6671  42.76
Aa2          12.33  13.86  14.654  15.4412  16.99
Bb2           2.28   3.86   4.632   5.4232   6.96
Bb3          -3.09  -1.56  -0.773   0.0303   1.58
Aa2:Bb2     -19.05 -16.84 -15.731 -14.6393 -12.38
Aa2:Bb3       6.02   8.24   9.337  10.4622  12.60
sigma2        4.79   6.01   6.825   7.8067  10.28
> 
> nIter=150000
> thinSteps=10
> comparisonTab <- cbind(comparisonTab,'MCMCregress'=comTab(data.mcmc.list,cols="Intercept|A|B|sigma",lag=10,MCMCpack.additive.time))
>  

MCMCglmm

> library(MCMCglmm)
> MCMCglmm.time <- system.time({
+ data.MCMCglmm.1 <- MCMCglmm(y~A*B, data=data, burnin=3000, thin=10, nitt=50000, verbose=FALSE)
+ data.MCMCglmm.2 <- MCMCglmm(y~A*B, data=data, burnin=3000, thin=10, nitt=50000, verbose=FALSE)
+ data.MCMCglmm.3 <- MCMCglmm(y~A*B, data=data, burnin=3000, thin=10, nitt=50000, verbose=FALSE)
+ })
> 
> data.mcmc.list<-mcmc.list(
+ as.mcmc(cbind(data.MCMCglmm.1$Sol,data.MCMCglmm.1$VCV)),
+ as.mcmc(cbind(data.MCMCglmm.2$Sol,data.MCMCglmm.2$VCV)),
+ as.mcmc(cbind(data.MCMCglmm.3$Sol,data.MCMCglmm.3$VCV))
+ )
> 
> summary(data.mcmc.list)
Iterations = 1:4700
Thinning interval = 1 
Number of chains = 3 
Sample size per chain = 4700 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

              Mean    SD Naive SE Time-series SE
(Intercept)  41.08 0.838  0.00706        0.00715
Aa2          14.66 1.190  0.01002        0.01038
Bb2           4.65 1.171  0.00986        0.00986
Bb3          -0.73 1.189  0.01001        0.00984
Aa2:Bb2     -15.73 1.664  0.01402        0.01417
Aa2:Bb3       9.31 1.674  0.01410        0.01410
units         7.01 1.410  0.01187        0.01187

2. Quantiles for each variable:

              2.5%    25%     50%      75%  97.5%
(Intercept)  39.43  40.52  41.086  41.6365  42.72
Aa2          12.35  13.87  14.663  15.4562  17.03
Bb2           2.35   3.88   4.653   5.4261   6.96
Bb3          -3.07  -1.51  -0.739   0.0645   1.62
Aa2:Bb2     -19.00 -16.84 -15.733 -14.6049 -12.43
Aa2:Bb3       5.97   8.21   9.324  10.4196  12.57
units         4.77   6.01   6.834   7.8265  10.22
> 
> nIter=150000
> thinSteps=10
> comparisonTab <- cbind(comparisonTab,'MCMCglmm'=comTab(data.mcmc.list,cols="Intercept|A|B|unit",lag=10,MCMCglmm.time))
>  

rcppbugs

> library(rcppbugs)
> 
> b <- mcmc.normal(rnorm(6),mu=0,tau=0.0001)
> tau.y <- mcmc.gamma(runif(1),alpha=0.1,beta=0.1)
> X <- model.matrix(~A*B, data)
> y.hat <- linear(X,b)
> #or
> y.hat <- deterministic(function(X,b) { X %*% b }, X, b)
> 
> y.lik <- mcmc.normal(data$y,mu=y.hat,tau=tau.y,observed=TRUE)
> m <- create.model(b, tau.y, y.hat, y.lik)
> 
> ## then to run the model:
> cat("running model...\n")
running model...
> rcppbugs.time <- system.time(
+ ans <- run.model(m, iterations=500000, burn=30000, adapt=1e3L, thin=500)
+ )
> ## and view the 'trace' of b
> #print(ans[["b"]])
> 
> summary(as.mcmc(ans$b))
Iterations = 1:1000
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 1000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

        Mean   SD Naive SE Time-series SE
[1,]  41.138 0.83   0.0262         0.0295
[2,]  14.671 1.20   0.0379         0.0440
[3,]   4.586 1.13   0.0358         0.0428
[4,]  -0.812 1.20   0.0380         0.0416
[5,] -15.765 1.64   0.0519         0.0643
[6,]   9.341 1.68   0.0531         0.0610

2. Quantiles for each variable:

       2.5%    25%     50%      75%  97.5%
var1  39.54  40.61  41.148  41.7188  42.75
var2  12.35  13.84  14.663  15.4575  17.13
var3   2.43   3.82   4.541   5.3699   6.82
var4  -3.30  -1.62  -0.826   0.0182   1.64
var5 -19.12 -16.81 -15.768 -14.6479 -12.63
var6   5.86   8.19   9.418  10.4815  12.44
> data.mcmc <- as.mcmc(cbind(ans$b, 1/ans$tau.y))  # note: 1/tau.y is the residual variance, not sigma
> colnames(data.mcmc) <- c("beta1","beta2","beta3","beta4","beta5","beta6","sigma")
> 
> nIter=500000
> thinSteps=500
> comparisonTab <- cbind(comparisonTab,'rcppbugs'=comTab(data.mcmc,cols="beta|sigma",lag=1,rcppbugs.time))
> 

Python

> X <- model.matrix(~A * B, data = data)
> dataP <- cbind(y = data$y, X)  # name the response column so it can be referenced as data.y in Python
> write.csv(dataP, "../downloads/data/factorialAnova.csv")
import pandas as pd
import numpy as np
from pymc import *
   
pd.set_option('html', False)
data = pd.read_csv('../downloads/data/factorialAnova.csv')
#print data.head()

Y = data.y
X = data.iloc[:,2:8]   # the 6 model matrix columns (column 0 holds row names, column 1 the response)

beta = Normal('beta_%s', mu=0.0, tau=100000**-1, value=np.zeros(6))  # one coefficient per model matrix column
sigma = Uniform('sigma', lower=0, upper=1000)
tau = Lambda('tau', lambda sd=sigma: sd**-2)

#  Model
mu = Lambda('mu', lambda b=beta: np.dot(X,b))
#mu = LinearCombination('mu',[X],[beta],doc='mu=Xb')

# Likelihood
Yi = Normal('Yi',mu=mu,tau=tau,observed=True,value=Y)

#alpha = beta[1]+beta[0]   # unused below

mdl= Model([Yi,tau,sigma,beta])
M = MCMC(mdl)

M.sample(50000,40000,thin=100,progress_bar=False)

print M.stats()

print M.summary()

Matplot.plot(M)

Quick glance Bayesian regression model specification



                    JAGS | Stan | MCMCregress | MCMCglmm | rcppbugs
Iterations          166667 | 53000 | 150000 | 150000 | 500000
Thinning            10 | 10 | 10 | 10 | 500
Samples             49401 | 15000 | 15000 | 14100 | 1000
Maximum ACF         0.0079 | 0.0169 | 0.0183 | 0.0191 | 0.16
(Intercept)         41.10 [39.46, 42.77] | 41.10 [39.41, 42.76] | 41.10 [39.49, 42.77] | 41.09 [39.43, 42.72] | 41.15 [39.59, 42.78]
beta2               14.65 [12.37, 17.02] | 14.64 [12.39, 17.07] | 14.65 [12.34, 16.99] | 14.66 [12.26, 16.94] | 14.66 [12.48, 17.17]
beta3               4.63 [2.35, 7.00] | 4.64 [2.31, 7.04] | 4.63 [2.29, 6.97] | 4.65 [2.43, 7.01] | 4.54 [2.27, 6.66]
beta4               -0.75 [-3.14, 1.52] | -0.76 [-3.17, 1.52] | -0.77 [-3.02, 1.64] | -0.74 [-3.06, 1.62] | -0.83 [-3.00, 1.86]
beta5               -15.71 [-18.97, -12.41] | -15.71 [-19.01, -12.43] | -15.73 [-19.10, -12.46] | -15.73 [-19.05, -12.51] | -15.77 [-19.12, -12.61]
beta6               9.34 [5.94, 12.58] | 9.34 [6.05, 12.62] | 9.34 [6.16, 12.72] | 9.32 [6.03, 12.60] | 9.42 [6.04, 12.59]
sigma               2.62 [2.15, 3.16] | 2.64 [2.18, 3.19] | 6.82 [4.51, 9.78] | 6.83 [4.45, 9.75] | 6.77 [4.72, 10.12]
Time (user self)    23.09 | 17.05 | 0.79 | 3.00 | 2.80
Time (system self)  0.00 | 0.05 | 0.00 | 0.00 | 0.00
Time (elapsed)      23.12 | 45.46 | 0.80 | 3.04 | 2.81
Time (user child)   0.00 | 27.43 | 0.00 | 0.00 | 0.00
Time (system child) 0.00 | 0.65 | 0.00 | 0.00 | 0.00

Note that the sigma row is not directly comparable across all five routines: JAGS and Stan report the residual standard deviation (sigma), whereas MCMCregress (sigma2), MCMCglmm (units) and rcppbugs report the residual variance (approximately 6.8, which is roughly 2.6 squared).
Model specification (matrix effects parameterization)

JAGS:
model {
   #Likelihood
   for (i in 1:n) {
      y[i]~dnorm(mean[i],tau)
      mean[i] <- inprod(beta[],X[i,])
   }
   #Priors and derivatives
   for (i in 1:p) {
     beta[i] ~ dnorm(0, 1.0E-6) #prior
   }
   sigma ~ dgamma(0.001, 0.001)
   tau <- 1 / (sigma * sigma)
 }

Stan:

data {
   int<lower=0> n;
   int<lower=0> p;
   vector[n] y;
   matrix [n,p] X;
}
parameters {
   vector[p] beta;
   real<lower=0> sigma;
}
model {
   vector[n] mu;

   #Priors
   beta ~ normal(0.01, 100000);
   sigma~ uniform(0,100);
   
   #Likelihood
   mu <- X*beta;
   y ~ normal(mu, sigma);
 }

Finite population standard deviations

Prior to creating code to do all sorts of matrix manipulations, we will first define a new function to calculate finite-population standard deviation.

The sd() function estimates the population standard deviation from a sample according to the regular formula: $$\hat\sigma = \sqrt{\frac{\sum{(x_i - \bar{x})^2}}{n-1}}$$ where $n$ is the sample size. Division by $n-1$ is intended to reduce the bias in the estimated population standard deviation given that the mean is also calculated from the same sample. The finite-population standard deviation instead divides by $n$, since the observed levels are treated as the entire population of interest - this is what the sdP() function below does.

In order to be able to sensibly compare standard deviations between 'fixed' and 'random' factors, all factors need to be considered as 'fixed' for the purpose of standard deviation calculations. That is, we need to assume that we have sampled from all possible levels available.

> sdP <- function(x) {
+     m <- mean(x)
+     sqrt((sum((x - m)^2))/length(x))
+ }
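
As a quick check, sdP() divides by $n$ rather than $n-1$ and will therefore always return a slightly smaller value than sd() for the same sample:

set.seed(2)                    # arbitrary seed, purely for illustration
x <- rnorm(10)
c(sd = sd(x), sdP = sdP(x))    # sdP() is slightly smaller
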
  • Method 1 - create specific model matrices to generate specific cell means for each of the main effects and interactions from which we can calculate the finite-population standard deviation.
    > data.fpsd <- NULL
    > # Get the contrasts used for each Factor
    > contr <- attr(model.matrix(~A * B, data), "contrasts")
    > ## Get the effects parameter samples
    > library(coda)
    > coefs <- as.matrix(Data.mcmc.list)[, 1:6]
    > 
    > # A
    > dt <- data
    > dt$B <- "b1"
    > cc <- model.frame(~A * B, dt, xlev = list(A = levels(data$A), B = levels(data$B)))
    > X1 <- model.matrix(~A * B, cc, contrasts = contr)
    > a <- coefs[, 1:6] %*% t(X1)
    > data.fpsd <- rbind(data.fpsd, A = MCMCsum(apply(a, 1, sdP)))
    > # MCMCsum(apply(t(apply(a,1,function(x) x-t(data$y))), 1, sdP))
    > 
    > dt <- data
    > dt$A <- "a1"
    > cc <- model.frame(~A * B, dt, xlev = list(A = levels(data$A), B = levels(data$B)))
    > X1 <- model.matrix(~A * B, cc, contrasts = contr)
    > a <- coefs[, 1:6] %*% t(X1)
    > data.fpsd <- rbind(data.fpsd, B = MCMCsum(apply(a, 1, sdP)))
    > # MCMCsum(apply(t(apply(a,1,function(x) x-t(data$y))), 1, sdP))
    > 
    > 
    > dt <- data  #expand.grid(A=levels(data$A), B=levels(data$B))
    > cc <- model.frame(~A * B, dt, xlev = list(A = levels(data$A), B = levels(data$B)))
    > X1 <- model.matrix(~A * B, cc, contrasts = contr)
    > a <- coefs[, 1:6] %*% t(X1)
    > data.fpsd <- rbind(data.fpsd, `A:B` = MCMCsum(apply(a, 1, sdP)))
    > 
    > # Residuals
    > data.Xmat <- model.matrix(~A * B, data)
    > a <- coefs[, 1:6] %*% t(data.Xmat)
    > data.fpsd <- rbind(data.fpsd, Resid = MCMCsum(apply(t(apply(a, 1, function(x) x - t(data$y))), 1, sdP)))
    > 
    > data.fpsd
    
          Median    X0.  X25.  X50.  X75.  X100. lower upper lower.1 upper.1
    A      7.326 4.3800 6.929 7.326 7.718  9.987 6.161 8.490   6.707   7.876
    B      2.432 0.5893 2.113 2.432 2.749  4.788 1.487 3.362   1.974   2.919
    A:B    8.653 7.0894 8.425 8.653 8.882 10.126 7.997 9.344   8.312   8.983
    Resid  2.565 2.4662 2.526 2.565 2.618  3.409 2.473 2.729   2.488   2.609
    
  • Method 2 - incorporate the finite population standard deviation calculations as derived quantities (posteriors) into the Bayesian model.
    > modelString = "
    model {
       #Likelihood
       for (i in 1:n) {
          y[i]~dnorm(mean[i],tau)
          mean[i] <- inprod(beta[],X[i,])
          y.err[i] <- y[i] - mean[i]
       }
       #Priors and derivatives
       for (i in 1:p) {
         beta[i] ~ dnorm(0, 1.0E-6) #prior
       }
       sigma ~ dgamma(0.001, 0.001)
       tau <- 1 / (sigma * sigma)

       sd.A <- abs(beta[2])*sd(X[,2])
       sd.B <- sd(beta[3:4])
       sd.AB <- sd(beta[5:6])
       sd.resid <- sd(y.err)
     }
    "
    > writeLines(modelString, con = "../downloads/BUGSscripts/junk.txt")
    > X <- model.matrix(~A * B, data)
    > data.list <- with(data, list(y = y, X = X, n = nrow(data), p = ncol(X)))
    > 
    > params <- c("beta", "sigma", "sd.A", "sd.B", "sd.AB", "sd.resid")
    > adaptSteps = 1000
    > burnInSteps = 2000
    > nChains = 3
    > numSavedSteps = 50000
    > thinSteps = 10
    > nIter = ceiling((numSavedSteps * thinSteps)/nChains)
    > library(R2jags)
    > rnorm(1)
    
    [1] 1.594
    
    > jags.effects.simple.time <- system.time(data.r2jagsA <- jags(data = data.list, inits = NULL, parameters.to.save = params, model.file = "../downloads/BUGSscripts/junk.txt", n.chains = 3, n.iter = nIter,
    +     n.burnin = burnInSteps, n.thin = thinSteps))
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 572
    
    Initializing model
    
    > print(data.r2jagsA)
    
    Inference for Bugs model at "../downloads/BUGSscripts/junk.txt", fit using jags,
     3 chains, each with 166667 iterations (first 2000 discarded), n.thin = 10
     n.sims = 49401 iterations saved
             mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
    beta[1]   41.098   0.835  39.444  40.544  41.100  41.654  42.752 1.001 49000
    beta[2]   14.649   1.185  12.316  13.856  14.644  15.438  16.997 1.001 49000
    beta[3]    4.641   1.180   2.317   3.862   4.642   5.424   6.957 1.001 49000
    beta[4]   -0.747   1.185  -3.082  -1.541  -0.747   0.040   1.593 1.001 49000
    beta[5]  -15.718   1.672 -19.025 -16.826 -15.712 -14.600 -12.442 1.001 49000
    beta[6]    9.335   1.678   6.040   8.215   9.333  10.450  12.636 1.001 49000
    sd.A       7.325   0.593   6.158   6.928   7.322   7.719   8.498 1.001 49000
    sd.AB     12.527   0.842  10.874  11.961  12.529  13.085  14.177 1.001 49000
    sd.B       2.694   0.591   1.531   2.300   2.693   3.088   3.855 1.001 32000
    sd.resid   2.580   0.075   2.484   2.526   2.564   2.618   2.769 1.001 49000
    sigma      2.635   0.261   2.187   2.451   2.613   2.795   3.208 1.001 49000
    deviance 286.007   3.987 280.342 283.077 285.321 288.168 295.572 1.001 49000
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 7.9 and DIC = 294.0
    DIC is an estimate of expected predictive error (lower deviance is better).
    
  • Method 3 - calculating the finite-population standard deviations from the parameter samples.
    > data.fpsd1 <- NULL
    > 
    > ## Get the effects parameter samples
    > library(coda)
    > coefs <- as.matrix(Data.mcmc.list)[, 1:6]
    > # sd A
    > data.fpsd1 <- rbind(data.fpsd1, A = MCMCsum(apply(cbind(0, coefs[, 2]), 1, sdP)))
    > # sd B
    > data.fpsd1 <- rbind(data.fpsd1, B = MCMCsum(apply(coefs[, 3:4], 1, sdP)))
    > # sd AB
    > data.fpsd1 <- rbind(data.fpsd1, `A:B` = MCMCsum(apply(coefs[, 5:6], 1, sdP)))
    > # sd Resid
    > data.Xmat <- model.matrix(~A * B, data)
    > a <- coefs[, 1:6] %*% t(data.Xmat)
    > data.fpsd1 <- rbind(data.fpsd1, Resid = MCMCsum(apply(t(apply(a, 1, function(x) x - t(data$y))), 1, sdP)))
    > 
    > data.fpsd1
    
          Median     X0.   X25.   X50.   X75.  X100.  lower  upper lower.1 upper.1
    A      7.326 4.38004  6.929  7.326  7.718  9.987  6.161  8.490   6.707   7.876
    B      2.696 0.04553  2.299  2.696  3.093  5.303  1.547  3.869   2.119   3.289
    A:B   12.526 9.05699 11.965 12.526 13.083 16.195 10.868 14.169  11.704  13.353
    Resid  2.565 2.46622  2.526  2.565  2.618  3.409  2.473  2.729   2.488   2.609
    
Via specific matrices: plot of chunk tut9.6bS8.5
Via MCMC samples: plot of chunk tut9.6bS8.6

Conclusions - there is clear evidence of a strong interaction between Factor A and Factor B. There is very little residual variability. In the presence of a strong interaction, it is not really appropriate to make statements about the main effects (similar to frequentist ANOVA).

Dealing with interactions

  • Start by generating posteriors for the cell means (means of each factor combination) using the MCMC effects parameter samples.
    > ## Get the effects parameter samples
    > library(coda)
    > coefs <- as.matrix(Data.mcmc.list)[, 1:6]
    > ## Calculate the model matrix
    > dt <- expand.grid(A = gl(2, 1, 2, lab = c("a1", "a2")), B = gl(3, 1, 3, lab = paste("b", 1:3, sep = "")))
    > Xmat <- model.matrix(~A * B, dt)
    > # OR using actual data frame----------------------------------------------
    > Xmat <- model.matrix(~A * B, data)
    > library(plyr)
    > Xmat <- as.matrix(ddply(data.frame(Xmat), ~interaction(data$A, data$B), colwise(mean))[, -1])
    > ## ------------------------------------------------------------------------
    > cellmeans.mcmc <- coefs %*% t(Xmat)
    > colnames(cellmeans.mcmc) <- unique(interaction(data$A, data$B))
    > head(cellmeans.mcmc)
    
         a1.b1 a2.b1 a1.b2 a2.b2 a1.b3 a2.b3
    [1,] 42.56 56.64 45.55 42.79 40.65 63.92
    [2,] 41.48 54.76 45.32 44.87 39.80 64.60
    [3,] 40.54 57.91 45.63 43.47 40.97 62.15
    [4,] 41.17 55.94 45.90 43.60 40.21 65.58
    [5,] 41.41 56.90 45.40 45.92 40.06 63.52
    [6,] 40.21 56.56 45.50 43.65 40.25 63.34
    
  • Work out which means you want to compare for any particular comparison.
    > ## a2 vs a1 at b1 (columns 2 vs 1)
    > comps <- cellmeans.mcmc[, 2] - cellmeans.mcmc[, 1]
    > MCMCsum(comps)
    
         Median  X0.  X25.  X50.  X75. X100. lower upper lower.1 upper.1
    var1  14.65 8.76 13.86 14.65 15.44 19.97 12.32 16.98   13.41   15.75
    
    > 
    > comps <- cellmeans.mcmc[, c(2, 4)] - cellmeans.mcmc[, c(1, 3)]
    > library(plyr)
    > adply(comps, 2, MCMCsum)
    
         X1 Median    X0.   X25.   X50.    X75.  X100. lower  upper lower.1 upper.1
    1 a2.b1 14.652  8.760 13.859 14.652 15.4352 19.975 12.32 16.980  13.415 15.7510
    2 a2.b2 -1.064 -6.575 -1.852 -1.064 -0.2744  4.457 -3.35  1.301  -2.193  0.1391
    

Alternatively, we could generate a new model matrix to enter in the matrix multiplication.

> dt <- expand.grid(A = gl(2, 1, 2, lab = c("a1", "a2")), B = gl(3, 1, 3, lab = paste("b", 1:3, sep = "")))
> Xmat <- model.matrix(~A * B, dt)
> contr <- attr(Xmat, "contrasts")
> cc <- model.frame(~A * B, expand.grid(A = "a1", B = levels(dt$B)), xlev = list(A = levels(dt$A), B = levels(dt$B)))
> X1 <- model.matrix(~A * B, cc, contrasts = contr)
> cc <- model.frame(~A * B, expand.grid(A = "a2", B = levels(dt$B)), xlev = list(A = levels(dt$A), B = levels(dt$B)))
> X2 <- model.matrix(~A * B, cc, contrasts = contr)
> Xmat <- X2 - X1
> 
> comps <- coefs %*% t(Xmat)
> adply(comps, 2, MCMCsum)
  X1 Median    X0.   X25.   X50.    X75.  X100. lower  upper lower.1 upper.1
1  1 14.652  8.760 13.859 14.652 15.4352 19.975 12.32 16.980  13.415 15.7510
2  2 -1.064 -6.575 -1.852 -1.064 -0.2744  4.457 -3.35  1.301  -2.193  0.1391
3  3 23.986 18.693 23.188 23.986 24.7809 28.781 21.71 26.336  22.754 25.0940

We could also setup a contrast matrix to compare every factor combination to every other combination. Of course, many of these comparisons would be of very little value, yet since they come at no statistical cost and virtually no CPU cost, we might as well compute them.

> library(multcomp)
> dt <- expand.grid(A = gl(2, 1, 2, lab = c("a1", "a2")), B = gl(3, 1, 3, lab = paste("b", 1:3, sep = "")))
> Xmat <- contrMat(n = rep(1, 6), type = "Tukey") %*% model.matrix(~A * B, dt)
> 
> wch <- ldply(strsplit(rownames(Xmat), " - "), as.numeric)
> rownames(Xmat) <- paste(interaction(dt$A, dt$B)[wch[, 1]], "-", interaction(dt$A, dt$B)[wch[, 2]])
> pairwise.comps <- coefs %*% t(Xmat)
> adply(pairwise.comps, 2, MCMCsum)
              X1  Median      X0.    X25.    X50.      X75.     X100.   lower   upper lower.1  upper.1
1  a2.b1 - a1.b1  14.652   8.7601  13.859  14.652  15.43517  19.97466  12.323  16.980  13.415  15.7510
2  a1.b2 - a1.b1   4.640  -0.4678   3.849   4.640   5.43066  10.41009   2.359   7.015   3.468   5.7973
3  a2.b2 - a1.b1   3.584  -1.4835   2.784   3.584   4.36833   9.21229   1.213   5.890   2.426   4.7643
4  a1.b3 - a1.b1  -0.753  -5.7112  -1.545  -0.753   0.04267   4.60223  -3.119   1.560  -1.985   0.3477
5  a2.b3 - a1.b1  23.236  16.9283  22.443  23.236  24.02545  28.58819  20.960  25.625  22.096  24.4334
6  a1.b2 - a2.b1 -10.008 -15.1043 -10.800 -10.008  -9.21427  -3.02951 -12.345  -7.689 -11.152  -8.8161
7  a2.b2 - a2.b1 -11.069 -16.6425 -11.859 -11.069 -10.29079  -5.84443 -13.428  -8.765 -12.230  -9.9030
8  a1.b3 - a2.b1 -15.403 -20.1645 -16.197 -15.403 -14.60353 -10.06412 -17.740 -13.056 -16.609 -14.2743
9  a2.b3 - a2.b1   8.583   2.5642   7.788   8.583   9.36127  14.47940   6.276  10.928   7.427   9.7446
10 a2.b2 - a1.b2  -1.064  -6.5748  -1.852  -1.064  -0.27435   4.45700  -3.350   1.301  -2.193   0.1391
11 a1.b3 - a1.b2  -5.393 -10.6058  -6.185  -5.393  -4.59728  -0.09107  -7.738  -3.094  -6.577  -4.2387
12 a2.b3 - a1.b2  18.592  13.1093  17.796  18.592  19.37964  23.68861  16.230  20.900  17.439  19.7637
13 a1.b3 - a2.b2  -4.330  -9.4606  -5.117  -4.330  -3.53601   1.28821  -6.676  -2.047  -5.516  -3.1866
14 a2.b3 - a2.b2  19.663  14.0185  18.875  19.663  20.43900  24.98024  17.356  22.009  18.514  20.8492
15 a2.b3 - a1.b3  23.986  18.6933  23.188  23.986  24.78088  28.78113  21.709  26.336  22.754  25.0940

Unbalanced designs

Unlike in frequentist modelling, in Bayesian statistics design imbalance does not present any additional challenges (beyond the potential impacts on variance homogeneity). In order to illustrate this, we will modify the current data set by removing a few replicates from specific cells.

> data1 <- data[c(-3, -6, -12), ]
> with(data1, table(A, B))
    B
A    b1 b2 b3
  a1 10  9 10
  a2 10 10  8
> ## If the following function returns a list (implying that the number of replicates is different for different combinations of factors), then the design is NOT balanced
> replications(~A * B, data1)
$A
A
a1 a2 
29 28 

$B
B
b1 b2 b3 
20 19 18 

$`A:B`
    B
A    b1 b2 b3
  a1 10  9 10
  a2 10 10  8
> modelString="
model {
   #Likelihood
   for (i in 1:n) {
      y[i]~dnorm(mean[i],tau)
      mean[i] <- inprod(beta[],X[i,])
   }
   #Priors and derivatives
   for (i in 1:p) {
     beta[i] ~ dnorm(0, 1.0E-6) #prior
   }
   sigma ~ dgamma(0.001, 0.001)
   tau <- 1 / (sigma * sigma)
 }
"
> writeLines(modelString,con="../downloads/BUGSscripts/tut9.6bS9.2.txt")
> X <- model.matrix(~A*B, data1)
> data.list1 <- with(data1,
+         list(y=y,
+                  X=X,
+          n=nrow(data1),
+          p=ncol(X)
+          )
+ )
> 
> params <- c("beta","sigma")
> adaptSteps = 1000
> burnInSteps = 2000
> nChains = 3
> numSavedSteps = 50000
> thinSteps = 10
> nIter = ceiling((numSavedSteps * thinSteps)/nChains)
> 
> library(R2jags)
> rnorm(1)
[1] 0.9133
> 
> jags.effects.simple.time <- system.time(
+ data.r2jags1 <- jags(data=data.list1,
+           inits=NULL,
+           parameters.to.save=params,
+           model.file="../downloads/BUGSscripts/tut9.6bS9.2.txt",
+           n.chains=3,
+           n.iter=nIter,
+           n.burnin=burnInSteps,
+       n.thin=thinSteps
+           )
+ )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 478

Initializing model
> print(data.r2jags1)
Inference for Bugs model at "../downloads/BUGSscripts/tut9.6bS9.2.txt", fit using jags,
 3 chains, each with 166667 iterations (first 2000 discarded), n.thin = 10
 n.sims = 49401 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta[1]   41.090   0.842  39.428  40.531  41.088  41.651  42.742 1.001 49000
beta[2]   14.656   1.192  12.302  13.861  14.653  15.451  16.994 1.001 49000
beta[3]    5.002   1.224   2.586   4.181   5.007   5.825   7.403 1.001 46000
beta[4]   -0.745   1.189  -3.093  -1.538  -0.746   0.048   1.592 1.001 49000
beta[5]  -16.074   1.713 -19.443 -17.212 -16.077 -14.936 -12.714 1.001 49000
beta[6]    9.325   1.734   5.903   8.177   9.320  10.484  12.754 1.001 49000
sigma      2.648   0.268   2.189   2.458   2.626   2.813   3.233 1.001 31000
deviance 272.110   3.989 266.444 269.176 271.433 274.274 281.681 1.001 19000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 8.0 and DIC = 280.1
DIC is an estimate of expected predictive error (lower deviance is better).
> data.mcmc.list1 <- as.mcmc(data.r2jags1)
> 
> ## Get the effects parameter samples
> library(coda)
> coefs <- as.matrix(data.mcmc.list1)[, 1:6]
> ## Calculate the model matrix
> dt <- expand.grid(A = gl(2, 1, 2, lab = c("a1", "a2")), B = gl(3, 1, 3, lab = paste("b", 1:3, sep = "")))
> Xmat <- model.matrix(~A * B, dt)
> # OR using actual data frame----------------------------------------------
> Xmat <- model.matrix(~A * B, data1)
> library(plyr)
> Xmat <- as.matrix(ddply(data.frame(Xmat), ~interaction(data1$A, data1$B), colwise(mean))[, -1])
> ## ------------------------------------------------------------------------
> cellmeans.mcmc1 <- coefs %*% t(Xmat)
> colnames(cellmeans.mcmc1) <- unique(interaction(data1$A, data1$B))
> adply(cellmeans.mcmc1, 2, MCMCsum)
     X1 Median   X0.  X25.  X50.  X75. X100. lower upper lower.1 upper.1
1 a1.b1  41.09 37.38 40.53 41.09 41.65 44.65 39.43 42.75   40.27   41.93
2 a2.b1  55.74 51.82 55.18 55.74 56.31 59.86 54.08 57.38   54.91   56.56
3 a2.b2  46.09 41.90 45.50 46.09 46.68 50.13 44.28 47.80   45.23   46.98
4 a1.b3  44.68 40.72 44.11 44.68 45.23 48.31 43.01 46.34   43.77   45.44
5 a1.b2  40.35 36.98 39.79 40.35 40.90 44.12 38.70 42.00   39.53   41.18
6 a2.b3  64.33 60.00 63.70 64.33 64.96 68.71 62.51 66.21   63.42   65.28

Conclusions - these values are practically the same as the previous estimates based on the full, balanced design!
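
This can be confirmed by lining up the posterior median cell means from the balanced and unbalanced fits (a sketch assuming the cellmeans.mcmc and cellmeans.mcmc1 objects created above are still in the workspace; columns are matched by name because the two objects store the cells in different orders):

bal   <- apply(cellmeans.mcmc, 2, median)    # balanced design (full data set)
unbal <- apply(cellmeans.mcmc1, 2, median)   # unbalanced design (data1)
round(cbind(Balanced = bal, Unbalanced = unbal[names(bal)]), 2)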