Jump to main navigation


Tutorial 10.6b - Poisson regression and log-linear models (Bayesian)

12 Sep 2016

Overview

Whilst in many instances, count data can be approximated reasonably well by a normal distribution (particularly if the counts are all above zero and the mean count is greater than about 20), more typically, when count data are modelled via normal distribution certain undesirable characteristics arise that are a consequence of the nature of discrete non-negative data.

  • Expected (predicted) values and confidence bands less than zero are illogical, yet these are entirely possible from a normal distribution
  • The distribution of count data are often skewed when their mean is low (in part because the distribution is truncated to the left by zero) and variance usually increases with increasing mean (variance is typically proportional to mean in count data). By contrast, the Gaussian (normal) distribution assumes that mean and variance are unrelated and thus estimates (particularly of standard error) might well be reasonable inaccurate.

Poisson regression is a type of generalized linear model (GLM) in which a non-negative integer (natural number) response is modelled against a linear predictor via a specific link function. The linear predictor is typically a linear combination of effects parameters (e.g. $\beta_0 + \beta_1x_x$). The role of the link function is to transform the expected values of the response y (which is on the scale of (0,$\infty$), as is the Poisson distribution from which expectations are drawn) into the scale of the linear predictor (which is $-\infty,\infty$).

As implied in the name of this group of analyses, a Poisson rather than Gaussian (normal) distribution is used to represent the errors (residuals). Like count data (number of individuals, species etc), the Poisson distribution encapsulates positive integers and is bound by zero at one end. Consequently, the degree of variability is directly related the expected value (equivalent to the mean of a Gaussian distribution). Put differently, the variance is a function of the mean. Repeated observations from a Poisson distribution located close to zero will yield a much smaller spread of observations than will samples drawn from a Poisson distribution located a greater distance from zero. In the Poisson distribution, the variance has a 1:1 relationship with the mean.

The canonical link function for the Poisson distribution is a log-link function.

Whilst the expectation that the mean=variance ($\mu=\sigma$) is broadly compatible with actual count data (that variance increases at the same rate as the mean), under certain circumstances, this might not be the case. For example, when there are other unmeasured influences on the response variable, the distribution of counts might be somewhat clumped which can result in higher than expected variability (that is $\sigma\gt\mu$). The variance increases more rapidly than does the mean. This is referred to as overdispersion. The degree to which the variability is greater than the mean (and thus the expected degree of variability) is called dispersion. Effectively, the Poisson distribution has a dispersion parameter (or scaling factor) of 1.

It turns out that overdispersion is very common for count data and it typically underestimates variability, standard errors and thus deflated p-values. There are a number of ways of overcoming this limitation, the effectiveness of which depend on the causes of overdispersion.

  • Quasi-Poisson models - these introduce the dispersion parameter ($\phi$) into the model. This approach does not utilize an underlying error distribution to calculate the maximum likelihood (there is no quasi-Poisson distribution). Instead, if the Newton-Ralphson iterative reweighting least squares algorithm is applied using a direct specification of the relationship between mean and variance ($var(y)=\phi\mu$), the estimates of the regression coefficients are identical to those of the maximum likelihood estimates from the Poisson model. This is analogous to fitting ordinary least squares on symmetrical, yet not normally distributed data - the parameter estimates are the same, however they won't necessarily be as efficient. The standard errors of the coefficients are then calculated by multiplying the Poisson model coefficient standard errors by $\sqrt{\phi}$.

    Unfortunately, because the quasi-poisson model is not estimated via maximum likelihood, properties such as AIC and log-likelihood cannot be derived. Consequently, quasi-poisson and Poisson model fits cannot be compared via either AIC or likelihood ratio tests (nor can they be compared via deviance as uasi-poisson and Poisson models have the same residual deviance). That said, quasi-likelihood can be obtained by dividing the likelihood from the Poisson model by the dispersion (scale) factor.

  • Negative binomial model - technically, the negative binomial distribution is a probability distribution for the number of successes before a specified number of failures. However, the negative binomial can also be defined (parameterized) in terms of a mean ($\mu$) and scale factor ($\omega$), $$p(y_i)=\frac{\Gamma(y_i+\omega)}{\Gamma(\omega)y!}\times\frac{\mu_i^{y_i}\omega^\omega}{(\mu_i+\omega)^{\mu_i+\omega}}$$ where the expectected value of the values $y_i$ (the means) are ($\mu_i$) and the variance is $y_i=\frac{\mu_i+\mu_i^2}{\omega}$. In this way, the negative binomial is a two-stage hierarchical process in which the response is modeled against a Poisson distribution whose expected count is in turn modeled by a Gamma distribution with a mean of $\mu$ and constant scale parameter ($\omega$).

    Strictly, the negative binomial is not an exponential family distribution (unless $\omega$ is fixed as a constant), and thus negative binomial models cannot be fit via the usual GLM iterative reweighting algorithm. Instead estimates of the regression parameters along with the scale factor ($\omega$) are obtained via maximum likelihood.

    The negative binomial model is useful for accommodating overdispersal when it is likely caused by clumping (due to the influence of other unmeasured factors) within the response.

  • Zero-inflated Poisson model - overdispersion can also be caused by the presence of a greater number of zero's than would otherwise be expected for a Poisson distribution. There are potentially two sources of zero counts - genuine zeros and false zeros. Firstly, there may genuinely be no individuals present. This would be the number expected by a Poisson distribution. Secondly, individuals may have been present yet not detected or may not even been possible. These are false zero's and lead to zero inflated data (data with more zeros than expected).

    For example, the number of joeys accompanying an adult koala could be zero because the koala has no offspring (true zero) or because the koala is male or infertile (both of which would be examples of false zeros). Similarly, zero counts of the number of individual in a transect are due either to the absence of individuals or the inability of the observer to detect them. Whilst in the former example, the latent variable representing false zeros (sex or infertility) can be identified and those individuals removed prior to analysis, this is not the case for the latter example. That is, we cannot easily partition which counts of zero are due to detection issues and which are a true indication of the natural state.

    Consistent with these two sources of zeros, zero-inflated models combine a binary logistic regression model (that models count membership according to a latent variable representing observations that can only be zeros - not detectable or male koalas) with a Poisson regression (that models count membership according to a latent variable representing observations whose values could be 0 or any positive integer - fertile female koalas).

Summary of important equations

Many ecologists are repulsed or frightened by statistical formulae. In part this is due to the use of Greek letters to represent concepts, constants and functions with the assumption that the readers are familiar with their meaning. Hence, the issue is largely that many ecologists are familiar with certain statistical concepts, yet are not overly familiar with the notation used to represent those concepts.

As a typed language, R exposes users to a rich variety of statistical opportunities. Whilst many of the common procedures are wrapped into simple to use functions (and so all of the calculations etc underlying a GLM need not be performed by hand by the user), not all procedures have been packaged into functions. For such cases, it is useful to have the formulas handy.

Hence, in this section I am going to summarize and compare equations for common procedures associated with Poisson and Negative Binomial models. Feel free to skip this section until it is useful to you.

PoissonNegative Binomial (p,r)
Density function $f(y|\lambda)=\frac{\lambda^{y}e^{-\lambda}}{y!}$ $f(y|r,p)=\frac{\Gamma(y+r)}{\Gamma(r)\Gamma(y+1)}p^r(1-p)^y$
Expected value $E(Y)=\mu=\lambda$ $E(Y)=\mu=\frac{r(1-p)}{p}$
Variance $var(Y)=\lambda$ $var(Y)=\frac{r(1-p)}{p^2}$
log-likelihood $\mathcal{LL}(\lambda\mid y) = \sum\limits^n_{i=1}y_i log(\lambda_i)-\lambda_i - log\Gamma(y_i+1)$ $\begin{align}\mathcal{LL}(r,p\mid y) = \sum\limits^n_{i=1}&log\Gamma(y_1 + r) - log\Gamma(r) - log\Gamma(y_i + 1) + \\&r.log(p) + y_i log(1-p)\end{align}$
Negative Binomial (a,b)
Density function $f(y \mid a,b) = \frac{b^{a}y^{b-1}e^{-b y}}{\Gamma(a)}$
Expected value $E(Y)=\mu=\frac{r}{p}$
Variance
log-likelihood $\mathcal{LL}(\lambda;y)=\sum\limits^n_{i=1} log\Gamma(y_i + a) - log\Gamma(y_i+1) - log\Gamma(a) + \alpha(log(b_i) - log(b_i+1)) - y_i.log(b_i+1)$
Zero-inflated Poisson
Density function $p(y \mid \theta,\lambda) = \left\{ \begin{array}{l l} \theta + (1-\theta)\times \text{Pois}(0|\lambda) & \quad \text{if $y_i=0$ and}\\ (1-\theta)\times \text{Pois}(y_i|\lambda) & \quad \text{if $y_i>0$} \end{array} \right.$
Expected value $E(Y)=\mu=\lambda\times(1-theta)$
Variance $var(Y)=\lambda\times(1-\theta)\times(1+\theta\times\lambda^2)$
log-likelihood $\mathcal{LL}(\theta,\lambda\mid y) = \left\{ \begin{array}{l l} \sum\limits^n_{i=1} (1-\theta)\times log(\theta + \lambda_i)\\ \sum\limits^n_{i=1}y_i log(1-\theta)\times y_i\lambda_i - exp(\lambda_i)\\ \end{array} \right. $
ProcedureEquation
Residuals $\varepsilon_i = y_i - \hat{y}$
Pearson's Residuals $\frac{y_i - \hat{y}}{\sqrt{var(y)}}$
Residual Sums of Squares $RSS = \sum \varepsilon_i^2$
Dispersion statistic $\phi=\frac{RSS}{df}$
$R^2$ $R^2 = 1-\frac{RSS_{model}}{RSS_{null}}$
McFadden's $R^2$ $R_{McF}^2 = 1 - \frac{\mathcal{LL}_{model}}{\mathcal{LL}_{null}}$
Deviance $dev = -2*\mathcal{LL}$
pD $pD = $
DIC $DIC = -pD$
AIC $AIC = min(dev+(2pD))$

Poisson regression

The following equations are provided since in Bayesian modelling, it is occasionally necessary to directly define the log-likelihood calculations (particularly for zero-inflated models and other mixture models). Feel free initially gloss over these equations until such time when your models require them ;)

Density function: $$ f(y\mid\lambda)=\frac{\lambda^{y}e^{-\lambda}}{y!}\\ E(Y)=Var(Y)=\lambda\\ $$ where $\lambda$ is the mean.

Likelihood: $$ \begin{align} \mathcal{L}(\lambda\mid y) &= \prod\limits_{i=1}^n \dfrac{\lambda^{y_i}e^{-\lambda}}{y_i!}\\ %&= \dfrac{\lambda^{\sum\limits^n_{i=1}y_i} e^{-n\lambda}}{y_1!y_2! \cdots y_n!}\\ \end{align} $$ Log-likelihood: $$ \begin{array}{rcl} \mathcal{LL}(\lambda\mid y)&=&\sum\limits^n_{i=1}log(\lambda^{y_i}e^{-\lambda_i})-log(y_i!)\\ \mathcal{LL}(\lambda\mid y)&=&\sum\limits^n_{i=1}log(\lambda^{y_i})+log(e^{-\lambda_i})-log(y_i!)\\ \mathcal{LL}(\lambda\mid y)&=&\sum\limits^n_{i=1}y_i log(\lambda_i)-\lambda_i - log(y_i!) \end{array} $$

Scenario and Data

Lets say we wanted to model the abundance of an item ($y$) against a continuous predictor ($x$). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details are optional and can be safely skipped. Consequently, I have folded (toggled) this section away.

Random data incorporating the following trends (effect parameters)
  • the sample size = 20
  • the continuous $x$ variable is a random uniform spread of measurements between 1 and 20
  • the rate of change in log $y$ per unit of $x$ (slope) = 0.1.
  • the value of $x$ when log$y$ equals 0 (when $y$=1)
  • to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor (created by calculating the outer product of the model matrix and the regression parameters). These expected values are then transformed into a scale mapped by (0,$\infty$) by using the log function $e^{linear~predictor}$
  • finally, we generate $y$ values by using the expected $y$ values ($\lambda$) as probabilities when drawing random numbers from a Poisson distribution. This step adds random noise to the expected $y$ values and returns only 0's and positive integers.
set.seed(8)
#The number of samples
n.x <- 20
#Create x values that at uniformly distributed throughout the rate of 1 to 20
x <- sort(runif(n = n.x, min = 1, max =20))
mm <- model.matrix(~x)
intercept <- 0.6
slope=0.1
#The linear predictor
linpred <- mm %*% c(intercept,slope)
#Predicted y values
lambda <- exp(linpred)
#Add some noise and make binomial
y <- rpois(n=n.x, lambda=lambda)
dat <- data.frame(y,x)

With these sort of data, we are primarily interested in investigating whether there is a relationship between the positive integer response variable and the linear predictor (linear combination of one or more continuous or categorical predictors).

Exploratory data analysis and initial assumption checking

The assumptions are:
  1. All of the observations are independent - this must be addressed at the design and collection stages
  2. The response variable (and thus the residuals) should be matched by an appropriate distribution (in the case of positive integers response - a Poisson is appropriate).
  3. All observations are equally influential in determining the trends - or at least no observations are overly influential. This is most effectively diagnosed via residuals and other influence indices and is very difficult to diagnose prior to analysis
  4. the relationship between the linear predictor (right hand side of the regression formula) and the link function should be linear. A scatterplot with smoother can be useful for identifying possible non-linearity.
  5. The dispersion factor is close to 1

There are at least five main potential models we could consider fitting to these data:

  1. Ordinary least squares regression (general linear model) - assumes normality of residuals
  2. Poisson regression - assumes mean=variance (dispersion=1)
  3. Quasi-poisson regression - a general solution to overdispersion. Assumes variance is a function of mean, dispersion estimated, however likelihood based statistics unavailable
  4. Negative binomial regression - a specific solution to overdispersion caused by clumping (due to an unmeasured latent variable). Scaling factor ($\omega$) is estimated along with the regression parameters.
  5. Zero-inflation model - a specific solution to overdispersion caused by excessive zeros (due to an unmeasured latent variable). Mixture of binomial and Poisson models.

Confirm non-normality and explore clumping

When counts are all very large (not close to 0) and their ranges do not span orders of magnitude, they take on very Gaussian properties (symmetrical distribution and variance independent of the mean). Given that models based on the Gaussian distribution are more optimized and recognized than Generalized Linear Models, it can be prudent to adopt Gaussian models for such data. Hence it is a good idea to first explore whether a Poisson model is likely to be more appropriate than a standard Gaussian model.

The potential for overdispersion can be explored by adding a rug to boxplot. The rug is simply tick marks on the inside of an axis at the position corresponding to an observation. As multiple identical values result in tick marks drawn over one another, it is typically a good idea to apply a slight amount of jitter (random displacement) to the values used by the rug.

hist(dat$y)
plot of chunk tut11.5bS1.2
boxplot(dat$y, horizontal=TRUE)
rug(jitter(dat$y), side=1)
plot of chunk tut11.5bS1.2

There is definitely signs of non-normality that would warrant Poisson models. The rug applied to the boxplots does not indicate a series degree of clumping and there appears to be few zero. Thus overdispersion is unlikely to be an issue.

Confirm linearity

Lets now explore linearity by creating a histogram of the predictor variable ($x$) and a scatterplot of the relationship between the response ($y$) and the predictor ($x$)

hist(dat$x)
plot of chunk tut11.5bS1.3
#now for the scatterplot
plot(y~x, dat, log="y")
with(dat, lines(lowess(y~x)))
plot of chunk tut11.5bS1.3

Conclusions: the predictor ($x$) does not display any skewness or other issues that might lead to non-linearity. The lowess smoother on the scatterplot does not display major deviations from a straight line and thus linearity is satisfied. Violations of linearity could be addressed by either:

  • define a non-linear linear predictor (such as a polynomial, spline or other non-linear function)
  • transform the scale of the predictor variables

Explore zero inflation

Although we have already established that there are few zeros in the data (and thus overdispersion is unlikely to be an issue), we can also explore this by comparing the number of zeros in the data to the number of zeros that would be expected from a Poisson distribution with a mean equal to the mean count of the data.

#proportion of 0's in the data
dat.tab<-table(dat$y==0)
dat.tab/sum(dat.tab)
FALSE 
    1 
#proportion of 0's expected from a Poisson distribution
mu <- mean(dat$y)
cnts <- rpois(1000, mu)
dat.tab <- table(cnts == 0)
dat.tab/sum(dat.tab)
FALSE  TRUE 
0.997 0.003 
In the above, the value under FALSE is the proportion of non-zero values in the data and the value under TRUE is the proportion of zeros in the data. In this example, there are no zeros in the observed data which corresponds closely to the very low proportion expected (0.003).

Model fitting or statistical analysis

JAGS

$$ \begin{align} Y_i&\sim{}P(\lambda) & (\text{response distribution})\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,10000) & (\text{diffuse Bayesian prior})\\ \end{align} $$
dat.list <- with(dat,list(Y=y, X=x,N=nrow(dat)))
modelString="
model {
  for (i in 1:N) {
     Y[i] ~ dpois(lambda[i])
     log(lambda[i]) <- beta0 + beta1*X[i]
  }
  beta0 ~ dnorm(0,1.0E-06)
  beta1 ~ dnorm(0,1.0E-06)
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS1.4.bug')
library(R2jags)
system.time(
dat.P.jags <- jags(model='../downloads/BUGSscripts/tut11.5bS1.4.bug', data=dat.list, inits=NULL,
          param=c('beta0','beta1'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 105

Initializing model
   user  system elapsed 
  1.852   0.004   1.861 
Xmat <- model.matrix(~x, dat)
nX <- ncol(Xmat)
dat.list1 <- with(dat,list(Y=y, X=Xmat,N=nrow(dat), nX=nX))
modelString="
model {
  for (i in 1:N) {
     Y[i] ~ dpois(lambda[i])
     eta[i] <- inprod(beta[], X[i,])
     log(lambda[i]) <- eta[i]
  }
  for (i in 1:nX) {
    beta[i] ~ dnorm(0,1.0E-06)
  }
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS1.41.bug')
library(R2jags)
system.time(
dat.P.jags1 <- jags(model='../downloads/BUGSscripts/tut11.5bS1.41.bug', data=dat.list1, inits=NULL,
          param=c('beta'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 127

Initializing model
   user  system elapsed 
  1.836   0.004   1.844 
print(dat.P.jags1)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS1.41.bug", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta[1]    0.546   0.256  0.019  0.384  0.552  0.719  1.018 1.002  1700
beta[2]    0.112   0.019  0.077  0.099  0.112  0.124  0.149 1.002  1800
deviance  88.428   4.999 86.376 86.891 87.680 89.036 93.531 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 12.5 and DIC = 100.9
DIC is an estimate of expected predictive error (lower deviance is better).

Or arguably better still, use a multivariate normal prior. If we have a $k$ regression parameters ($\beta_k$), then the multivariate normal priors are defined as: $$ \boldsymbol{\beta}\sim{}\mathcal{N_k}(\boldsymbol{\mu}, \mathbf{\Sigma}) $$ where $$\boldsymbol{\mu}=[E[\beta_1],E[\beta_2],...,E[\beta_k]] = \left(\begin{array}{c}0\\\vdots\\0\end{array}\right)$$ $$ \mathbf{\Sigma}=[Cov[\beta_i, \beta_j]] = \left(\begin{array}{ccc}1000^2&0&0\\0&\ddots&0\\0&0&1000^2\end{array} \right) $$ hence, along with the response and predictor matrix, we need to supply $\boldsymbol{\mu}$ (a vector of zeros) and $\boldsymbol{\Sigma}$ (a covariance matrix with $1000^2$ in the diagonals).

Xmat <- model.matrix(~x, dat)
nX <- ncol(Xmat)
dat.list2 <- with(dat,list(Y=y, X=Xmat,N=nrow(dat), mu=rep(0,nX),Sigma=diag(1.0E-06,nX)))
modelString="
model {
  for (i in 1:N) {
     Y[i] ~ dpois(lambda[i])
     eta[i] <- inprod(beta[], X[i,])
     log(lambda[i]) <- eta[i]
  }
  beta ~ dmnorm(mu[],Sigma[,])
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS1.42.bug')
library(R2jags)
system.time(
dat.P.jags2 <- jags(model='../downloads/BUGSscripts/tut11.5bS1.42.bug', data=dat.list2, inits=NULL,
          param=c('beta'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 130

Initializing model
   user  system elapsed 
  0.460   0.000   0.463 
print(dat.P.jags2)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS1.42.bug", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta[1]    0.561   0.241  0.084  0.405  0.571  0.724  1.034 1.002  1500
beta[2]    0.111   0.018  0.074  0.099  0.111  0.123  0.145 1.002  1300
deviance  88.184   1.850 86.372 86.854 87.596 88.898 92.951 1.007   900

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 1.7 and DIC = 89.9
DIC is an estimate of expected predictive error (lower deviance is better).

BRMS

library(brms)
dat.brm <- brm(y~x, data=dat, family='poisson',
  prior = c(set_prior("normal(0,100)", class="b")),
  chains=3, iter=2000, warmup=1000, thin=2)
SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 1).

Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.03963 seconds (Warm-up)
#                0.03519 seconds (Sampling)
#                0.07482 seconds (Total)
# 

SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 2).

Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.038022 seconds (Warm-up)
#                0.035344 seconds (Sampling)
#                0.073366 seconds (Total)
# 

SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 3).

Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.038597 seconds (Warm-up)
#                0.037356 seconds (Sampling)
#                0.075953 seconds (Total)
# 

Chain mixing and Model validation

Prior to exploring the model parameters, it is prudent to confirm that the model did indeed fit the assumptions and was an appropriate fit to the data as well as that the MCMC sampling chain was adequately mixed and the retained samples independent.

Whilst I will only demonstrate this for the logit model, the procedure would be identical for exploring the probit and clog-log models.

  • We will start by exploring the mixing of the MCMC chains via traceplots.
    plot(as.mcmc(dat.P.jags))
    
    plot of chunk tut11.5bS2
    library(gridExtra)
    grid.arrange(stan_trace(dat.brm$fit, ncol=1),
                 stan_dens(dat.brm$fit, separate_chains=TRUE,ncol=1),
                 ncol=2)
    
    plot of chunk tut11.5bS2BRMS

    The chains appear well mixed and stable

  • Next we will explore correlation amongst MCMC samples.
    autocorr.diag(as.mcmc(dat.P.jags))
    
                 beta0    beta1  deviance
    Lag 0    1.0000000  1.00000  1.000000
    Lag 10   0.1553164  0.11400  0.009912
    Lag 50   0.0228548  0.01439  0.004464
    Lag 100 -0.0008548 -0.01320  0.001207
    Lag 500 -0.0339699 -0.02674 -0.007545
    

    The level of auto-correlation at the nominated lag of 10 is higher than we would generally like. It is worth increasing the thinning rate from 10 to 50. Obviously, to support this higher thinning rate, we would also increase the number of iterations.

    library(R2jags)
    dat.P.jags <- jags(data=dat.list,model.file='../downloads/BUGSscripts/tut11.5bS1.4.bug',
                       param=c('beta0','beta1'),
                       n.chains=3, n.iter=100000, n.burnin=20000, n.thin=50)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 105
    
    Initializing model
    
    print(dat.P.jags)
    
    Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS1.4.bug", fit using jags,
     3 chains, each with 1e+05 iterations (first 20000 discarded), n.thin = 50
     n.sims = 4800 iterations saved
             mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
    beta0      0.545   0.257  0.040  0.375  0.545  0.718  1.039 1.001  4800
    beta1      0.112   0.019  0.075  0.100  0.112  0.124  0.148 1.001  4800
    deviance  88.379   3.712 86.368 86.871 87.664 89.058 93.778 1.002  4800
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 6.9 and DIC = 95.3
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    plot(as.mcmc(dat.P.jags))
    
    plot of chunk tut11.5bS4
    autocorr.diag(as.mcmc(dat.P.jags))
    
                 beta0    beta1 deviance
    Lag 0    1.0000000 1.000000 1.000000
    Lag 50   0.0100259 0.009642 0.003845
    Lag 250  0.0133849 0.010277 0.001847
    Lag 500  0.0056560 0.003987 0.001649
    Lag 2500 0.0006608 0.003896 0.007031
    

    Conclusions: the samples are now less auto-correlated and the chains are also arguably mixed a little better.

    stan_ac(dat.brm$fit)
    
    plot of chunk tut11.5bS4BRMS
  • Explore the step size characteristics (STAN only)
    summary(do.call(rbind, args = get_sampler_params(dat.brm$fit, inc_warmup = FALSE)), digits = 2)
    
     accept_stat__    stepsize__    treedepth__   n_leapfrog__ n_divergent__
     Min.   :0.24   Min.   :0.64   Min.   :1.0   Min.   :1.0   Min.   :0    
     1st Qu.:0.88   1st Qu.:0.64   1st Qu.:2.0   1st Qu.:3.0   1st Qu.:0    
     Median :0.96   Median :0.72   Median :2.0   Median :3.0   Median :0    
     Mean   :0.92   Mean   :0.70   Mean   :2.1   Mean   :3.7   Mean   :0    
     3rd Qu.:1.00   3rd Qu.:0.73   3rd Qu.:3.0   3rd Qu.:5.0   3rd Qu.:0    
     Max.   :1.00   Max.   :0.73   Max.   :3.0   Max.   :7.0   Max.   :0    
    
    stan_diag(dat.brm$fit)
    
    plot of chunk tut11.5bS4BRMSstepSize
    stan_diag(dat.brm$fit, information = "stepsize")
    
    plot of chunk tut11.5bS4BRMSstepSize
    stan_diag(dat.brm$fit, information = "treedepth")
    
    plot of chunk tut11.5bS4BRMSstepSize
    stan_diag(dat.brm$fit, information = "divergence")
    
    plot of chunk tut11.5bS4BRMSstepSize
    library(gridExtra)
    grid.arrange(stan_rhat(dat.brm$fit) + theme_classic(8),
                 stan_ess(dat.brm$fit) + theme_classic(8),
                 stan_mcse(dat.brm$fit) + theme_classic(8),
                 ncol = 2)
    
    plot of chunk tut11.5bS4BRMSstepSize
    Conclusions: acceptance ratio is very good, the lot posterior is relatively robust to step size and tree depth, rhat values are all below 1.05, the effective sample size is always above 0.5 and the ratio of MCMC error to posterior sd is very small.
  • One very important model validation procedure is to examine a plot of residuals against predicted or fitted values (the residual plot). Ideally, residual plots should show a random scatter of points without outliers. That is, there should be no patterns in the residuals. Patterns suggest inappropriate linear predictor (or scale) and/or inappropriate residual distribution/link function.

    The residuals used in such plots should be standardized (particularly if the model incorporated any variance-covariance structures - such as an autoregressive correlation structure) Pearsons's residuals standardize residuals by division with the square-root of the variance.

    We can generate Pearson's residuals within the JAGS model. Alternatively, we could use the parameters to generate the residuals outside of JAGS. Pearson's residuals are calculated according to: $$ \varepsilon = \frac{y_i - \mu}{\sqrt{var(y)}} $$ where $\mu$ is the expected value of $Y$ ($=\lambda$ for Poisson) and $var(y)$ is the variance of $Y$ ($=\lambda$ for Poisson).
    #extract the samples for the two model parameters
    coefs <- dat.P.jags$BUGSoutput$sims.matrix[,1:2]
    Xmat <- model.matrix(~x, data=dat)
    #expected values on a log scale
    eta<-coefs %*% t(Xmat)
    #expected value on response scale
    lambda <- exp(eta)
    #Expected value and variance are both equal to lambda
    expY <- varY <- lambda
    #sweep across rows and then divide by lambda
    Resid <- -1*sweep(expY,2,dat$y,'-')/sqrt(varY)
    #plot residuals vs expected values
    plot(apply(Resid,2,mean)~apply(eta,2,mean))
    
    plot of chunk tut11.5bS5
    #Calculate residuals 
    Resid.brm <- residuals(dat.brm, type='pearson')
    Fitted.brm <- fitted(dat.brm, scale='linear')
    ggplot(data=NULL, aes(y=Resid.brm[,'Estimate'], x=Fitted.brm[,'Estimate'])) + geom_point()
    
    plot of chunk tut11.5bS5BRM

    There is one residual that is substantially larger in magnitude than all the others. However, there are no other patterns in the residuals.

  • Now we will compare the sum of squared residuals to the sum of squares residuals that would be expected from a Poisson distribution matching that estimated by the model. Essentially this is estimating how well the Poisson distribution, the log-link function and the linear model approximates the observed data.
    SSres<-apply(Resid^2,1,sum)
    
    set.seed(2)
    #generate a matrix of draws from a poisson distribution
    # the matrix is the same dimensions as lambda and uses the probabilities of lambda
    YNew <- matrix(rpois(length(lambda),lambda=lambda),nrow=nrow(lambda))
    
    Resid1<-(lambda - YNew)/sqrt(lambda)
    SSres.sim<-apply(Resid1^2,1,sum)
    mean(SSres.sim>SSres)
    
    [1] 0.484375
    
    dat.list1 <- with(dat,list(Y=y, X=x,N=nrow(dat)))
    modelString="
    model {
      for (i in 1:N) {
        #likelihood function
        Y[i] ~ dpois(lambda[i])
        eta[i] <- beta0+beta1*X[i] #linear predictor
        log(lambda[i]) <- eta[i]   #link function
    
        #E(Y) and var(Y)
        expY[i] <- lambda[i]
        varY[i] <- lambda[i]
    
        # Calculate RSS
        Resid[i] <- (Y[i] - expY[i])/sqrt(varY[i])
        RSS[i] <- pow(Resid[i],2)
    
        #Simulate data from a Poisson distribution
        Y1[i] ~ dpois(lambda[i])
        #Calculate RSS for simulated data
        Resid1[i] <- (Y1[i] - expY[i])/sqrt(varY[i])
        RSS1[i] <-pow(Resid1[i],2) 
      }
      #Priors
      beta0 ~ dnorm(0,1.0E-06)
      beta1 ~ dnorm(0,1.0E-06)
      #Bayesian P-value
      Pvalue <- mean(sum(RSS1)>sum(RSS))
    } 
    "
    writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS6.41.bug')
    library(R2jags)
    system.time(
    dat.P.jags1 <- jags(model='../downloads/BUGSscripts/tut11.5bS6.41.bug', data=dat.list1, inits=NULL,
              param=c('beta0','beta1','Pvalue'),
              n.chain=3,
              n.iter=100000, n.thin=50, n.burnin=20000)
      )
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 272
    
    Initializing model
    
       user  system elapsed 
     16.189   0.004  16.209 
    
    print(dat.P.jags1)
    
    Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS6.41.bug", fit using jags,
     3 chains, each with 1e+05 iterations (first 20000 discarded), n.thin = 50
     n.sims = 4800 iterations saved
             mu.vect sd.vect   2.5%    25%    50%    75% 97.5%  Rhat n.eff
    Pvalue     0.481   0.500  0.000  0.000  0.000  1.000  1.00 1.001  4800
    beta0      0.544   0.261  0.017  0.374  0.550  0.720  1.04 1.002  1200
    beta1      0.112   0.019  0.075  0.099  0.112  0.125  0.15 1.002  1600
    deviance  88.466   3.891 86.372 86.931 87.758 89.180 93.91 1.001  4800
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 7.6 and DIC = 96.0
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    Resid.brm <- residuals(dat.brm, type='pearson', summary=FALSE)
    SSres.brm <- apply(Resid.brm^2,1,sum)
    lambda.brm = fitted(dat.brm, scale='response', summary=FALSE)
    YNew.brm <- matrix(rpois(length(lambda.brm), lambda=lambda.brm),nrow=nrow(lambda.brm))
    
    Resid1.brm<-(lambda.brm - YNew.brm)/sqrt(lambda.brm)
    SSres.sim.brm<-apply(Resid1.brm^2,1,sum)
    mean(SSres.sim.brm>SSres.brm)
    
    [1] 0.6026667
    

    Conclusions: the Bayesian p-value is approximately 0.5, suggesting that there is a good fit of the model to the data.

  • Unfortunately, unlike with linear models (Gaussian family), the expected distribution of data (residuals) varies over the range of fitted values for numerous (often competing) ways that make diagnosing (and attributing causes thereof) miss-specified generalized linear models from standard residual plots very difficult. The use of standardized (Pearson) residuals or deviance residuals can partly address this issue, yet they still do not offer completely consistent diagnoses across all issues (miss-specified model, over-dispersion, zero-inflation).

    An alternative approach is to use simulated data from the model posteriors to calculate an empirical cumulative density function from which residuals are are generated as values corresponding to the observed data along the density function.

    #extract the samples for the two model parameters
    coefs <- dat.P.jags$BUGSoutput$sims.matrix[,1:2]
    Xmat <- model.matrix(~x, data=dat)
    #expected values on a log scale
    eta<-coefs %*% t(Xmat)
    #expected value on response scale
    lambda <- exp(eta)
    
    simRes <- function(lambda, data,n=250, plot=T, family='poisson') {
     require(gap)
     N = nrow(data)
     sim = switch(family,
        'poisson' = matrix(rpois(n*N,apply(lambda,2,mean)),ncol=N, byrow=TRUE)
     )
     a = apply(sim + runif(n,-0.5,0.5),2,ecdf)
     resid<-NULL
     for (i in 1:nrow(data)) resid<-c(resid,a[[i]](data$y[i] + runif(1 ,-0.5,0.5)))
     if (plot==T) {
       par(mfrow=c(1,2))
       gap::qqunif(resid,pch = 2, bty = "n",
       logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals",
       cex.main = 1, las=1)
       plot(resid~apply(lambda,2,mean), xlab='Predicted value', ylab='Standardized residual', las=1)
     }
     resid
    }
    
    simRes(lambda,dat, family='poisson')
    
    plot of chunk tut10.6bS6.simulatedResiduals
     [1] 0.264 0.280 0.556 0.416 0.584 0.988 0.068 0.576 0.524 0.756 0.080 0.544 0.888 0.044 0.288 0.340 0.796 0.832 0.164 0.704
    
    The trend (black symbols) in the qq-plot does not appear to be overly non-linear (matching the ideal red line well), suggesting that the model is not overdispersed. The spread of standardized (simulated) residuals in the residual plot do not appear overly non-uniform. That is there is not trend in the residuals. Furthermore, there is not a concentration of points close to 1 or 0 (which would imply overdispersion).
    lambda.brm = fitted(dat.brm, scale='response', summary=FALSE)
    
    simRes <- function(lambda, data,n=250, plot=T, family='poisson') {
     require(gap)
     N = nrow(data)
     sim = switch(family,
        'poisson' = matrix(rpois(n*N,apply(lambda,2,mean)),ncol=N, byrow=TRUE)
     )
     a = apply(sim + runif(n,-0.5,0.5),2,ecdf)
     resid<-NULL
     for (i in 1:nrow(data)) resid<-c(resid,a[[i]](data$y[i] + runif(1 ,-0.5,0.5)))
     if (plot==T) {
       par(mfrow=c(1,2))
       gap::qqunif(resid,pch = 2, bty = "n",
       logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals",
       cex.main = 1, las=1)
       plot(resid~apply(lambda,2,mean), xlab='Predicted value', ylab='Standardized residual', las=1)
     }
     resid
    }
    
    simRes(lambda.brm, dat, family='poisson')
    
    plot of chunk tut11.5bS6.BRMSsimulatedResiduals
     [1] 0.500 0.564 0.676 0.184 0.632 0.996 0.072 0.568 0.588 0.764 0.128 0.580 0.940 0.100 0.264 0.216 0.780 0.880 0.148 0.716
    
  • Recall that the Poisson regression model assumes that variance=mean ($var=\mu\phi$ where $\phi=1$) and thus dispersion ($\phi=\frac{var}{\mu}=1$). However, we can also calculate approximately what the dispersion factor would be by using sum square of the residuals as a measure of variance and the model residual degrees of freedom as a measure of the mean (since the expected value of a Poisson distribution is the same as its degrees of freedom). $$\phi=\frac{RSS}{df}$$ where $df=n-k$ and $k$ is the number of estimated model coefficients.
    Resid <- -1*sweep(lambda,2,dat$y,'-')/sqrt(lambda)
    RSS<-apply(Resid^2,1,sum)
    (df<-nrow(dat)-ncol(coefs))
    
    [1] 18
    
    Disp <- RSS/df
    data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)),
               HPDinterval(as.mcmc(Disp),p=0.5))
    
           Median     Mean     lower    upper   lower.1  upper.1
    var1 1.044197 1.109181 0.9299409 1.452832 0.9299853 1.044238
    
    dat.list <- with(dat,list(Y=y, X=x,N=nrow(dat)))
    modelString="
    model {
      for (i in 1:N) {
         Y[i] ~ dpois(lambda[i])
         eta[i] <- beta0 + beta1*X[i]
         log(lambda[i]) <- eta[i]
         expY[i] <- lambda[i]
         varY[i] <- lambda[i]
    	 Resid[i] <- (Y[i] - expY[i])/sqrt(varY[i]) 
      }
      beta0 ~ dnorm(0,1.0E-06)
      beta1 ~ dnorm(0,1.0E-06)
      RSS <- sum(pow(Resid,2))
      df <- N-2
      phi <- RSS/df
    } 
    "
    writeLines(modelString,con='tut11.5bS1.40.bug')
    library(R2jags)
    system.time(
    dat.P.jags <- jags(model='tut11.5bS1.40.bug', data=dat.list, inits=NULL,
              param=c('beta0','beta1','phi'),
              n.chain=3,
              n.iter=100000, n.thin=50, n.burnin=20000)
      )
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 171
    
    Initializing model
    
       user  system elapsed 
     14.865   0.004  14.886 
    
    print(dat.P.jags)
    
    Inference for Bugs model at "tut11.5bS1.40.bug", fit using jags,
     3 chains, each with 1e+05 iterations (first 20000 discarded), n.thin = 50
     n.sims = 4800 iterations saved
             mu.vect sd.vect   2.5%   25%    50%    75%  97.5%  Rhat n.eff
    beta0      0.542   0.258  0.013  0.37  0.544  0.716  1.042 1.001  4800
    beta1      0.112   0.019  0.075  0.10  0.112  0.125  0.149 1.001  4800
    phi        1.112   0.400  0.934  0.98  1.052  1.173  1.570 1.001  4800
    deviance  88.433   3.859 86.373 86.89 87.727 89.159 93.839 1.001  4800
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 7.4 and DIC = 95.9
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    Resid.brm <- residuals(dat.brm, type='pearson', summary=FALSE)
    SSres.brm <- apply(Resid.brm^2,1,sum)
    (df <- nrow(dat) - nrow(coef(dat.brm)))
    
    [1] 18
    
    Disp <- SSres.brm/df
    data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)),
               HPDinterval(as.mcmc(Disp)))
    
            Median      Mean     lower    upper   lower.1  upper.1
    var1 0.9666143 0.9944079 0.8964392 1.168199 0.8964392 1.168199
    

    The dispersion statistic $\phi$ is close to 1 and thus there is no evidence that the data were overdispersed. The Poisson distribution was therefore appropriate.

Exploring the model parameters, test hypotheses

If there was any evidence that the assumptions had been violated or the model was not an appropriate fit, then we would need to reconsider the model and start the process again. In this case, there is no evidence that the test will be unreliable so we can proceed to explore the test statistics.

print(dat.P.jags)
Inference for Bugs model at "tut11.5bS1.40.bug", fit using jags,
 3 chains, each with 1e+05 iterations (first 20000 discarded), n.thin = 50
 n.sims = 4800 iterations saved
         mu.vect sd.vect   2.5%   25%    50%    75%  97.5%  Rhat n.eff
beta0      0.542   0.258  0.013  0.37  0.544  0.716  1.042 1.001  4800
beta1      0.112   0.019  0.075  0.10  0.112  0.125  0.149 1.001  4800
phi        1.112   0.400  0.934  0.98  1.052  1.173  1.570 1.001  4800
deviance  88.433   3.859 86.373 86.89 87.727 89.159 93.839 1.001  4800

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 7.4 and DIC = 95.9
DIC is an estimate of expected predictive error (lower deviance is better).
library(plyr)
adply(dat.P.jags$BUGSoutput$sims.matrix[,1:2], 2, function(x) {
  data.frame(Median=median(x), Mean=mean(x), HPDinterval(as.mcmc(x)), HPDinterval(as.mcmc(x),p=0.5))
})
     X1 Median   Mean   lower  upper lower.1 upper.1
1 beta0 0.5445 0.5423 0.02752 1.0480  0.3617  0.7056
2 beta1 0.1123 0.1122 0.07599 0.1491  0.1001  0.1251

Actually, many find it more palatable to express the estimates in the original scale of the observations rather than on a log scale.

library(plyr)
adply(exp(dat.P.jags$BUGSoutput$sims.matrix[,1:2]), 2, function(x) {
  data.frame(Median=median(x), Mean=mean(x), HPDinterval(as.mcmc(x)), HPDinterval(as.mcmc(x),p=0.5))
})
     X1 Median  Mean  lower upper lower.1 upper.1
1 beta0  1.724 1.778 0.9645 2.709   1.342   1.915
2 beta1  1.119 1.119 1.0790 1.161   1.105   1.133

Conclusions: We would reject the null hypothesis of no effect of $x$ on $y$. An increase in x is associated with a significant linear increase (positive slope) in the abundance of $y$. Every 1 unit increase in $x$ results in a log 0.1123415 unit increase in $y$. We usually express this in terms of abundance rather than log abundance, so every 1 unit increase in $x$ results in a ($e^{ 0.1123415 } = 1.1188949 $) 1.1188949 unit increase in the abundance of $y$.

summary(dat.brm)
 Family: poisson (log) 
Formula: y ~ x 
   Data: dat (Number of observations: 20) 
Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 2; 
         total post-warmup samples = 1500
   WAIC: Not computed
 
Fixed Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept     0.54      0.25     0.04     1.01        864    1
x             0.11      0.02     0.08     0.15        962    1

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(coef(dat.brm))
              mean
Intercept 1.710068
x         1.119290
coefs.brm <- as.matrix(as.data.frame(rstan:::extract(dat.brm$fit)))
coefs.brm <- coefs.brm[,grep('b', colnames(coefs.brm))]
plyr:::adply(exp(coefs.brm), 2, function(x) {
  data.frame(Mean=mean(x), median=median(x), HPDinterval(as.mcmc(x)))
  })
           X1     Mean   median    lower    upper
1 b_Intercept 1.763680 1.710650 0.950453 2.630181
2         b_x 1.119476 1.118679 1.082259 1.161513
marginal_effects(dat.brm)
plot of chunk tut11.5bS1.7BRMS

Further explorations of the trends

A measure of the strength of the relationship can be obtained according to: $$R^2 = 1 - \frac{RSS_{model}}{RSS_{null}}$$

Xmat <- model.matrix(~x, data=dat)
#expected values on a log scale
eta<-coefs %*% t(Xmat)
#expected value on response scale
lambda <- exp(eta)
#calculate the raw SS residuals
SSres <- apply((-1*(sweep(lambda,2,dat$y,'-')))^2,1,sum)
SSres.null <- sum((dat$y - mean(dat$y))^2)
#OR 
SSres.null <- crossprod(dat$y - mean(dat$y))
#calculate the model r2
1-mean(SSres)/SSres.null
       [,1]
[1,] 0.6572

Conclusions: 65.72% of the variation in $y$ abundance can be explained by its relationship with $x$.

dat.list <- with(dat,list(Y=y, X=x,N=nrow(dat)))
modelString="
model {
  for (i in 1:N) {
     Y[i] ~ dpois(lambda[i])
     eta[i] <- beta0 + beta1*X[i]
     log(lambda[i]) <- eta[i]
     res[i] <- Y[i] - lambda[i]
     resnull[i] <- Y[i] - meanY
  }
  meanY <- mean(Y)
  beta0 ~ dnorm(0,1.0E-06)
  beta1 ~ dnorm(0,1.0E-06)
  RSS <- sum(res^2)
  RSSnull <- sum(resnull^2)
  r2 <- 1-RSS/RSSnull
} 
"
writeLines(modelString,con='tut11.5bS1.40.bug')
library(R2jags)
system.time(
dat.P.jags <- jags(model='tut11.5bS1.40.bug', data=dat.list, inits=NULL,
          param=c('beta0','beta1','r2'),
          n.chain=3,
          n.iter=100000, n.thin=50, n.burnin=20000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 150

Initializing model
   user  system elapsed 
 14.909   0.008  14.933 
print(dat.P.jags)
Inference for Bugs model at "tut11.5bS1.40.bug", fit using jags,
 3 chains, each with 1e+05 iterations (first 20000 discarded), n.thin = 50
 n.sims = 4800 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta0      0.550   0.253  0.052  0.384  0.554  0.723  1.028 1.001  4800
beta1      0.112   0.019  0.076  0.099  0.112  0.124  0.149 1.001  4800
r2         0.655   0.066  0.509  0.640  0.673  0.691  0.701 1.001  4800
deviance  88.418   4.483 86.372 86.885 87.718 89.117 93.725 1.003  4800

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 10.1 and DIC = 98.5
DIC is an estimate of expected predictive error (lower deviance is better).
## calculate the expected values on the response scale
lambda.brm = fitted(dat.brm, scale='response', summary=FALSE)
## calculate the raw SSresid
SSres.brm <- apply((-1*(sweep(lambda.brm,2,dat$y,'-')))^2,1,sum)
SSres.null <- sum((dat$y - mean(dat$y))^2)
#OR 
SSres.null <- crossprod(dat$y - mean(dat$y))
#calculate the model r2
1-mean(SSres.brm)/SSres.null
          [,1]
[1,] 0.6570747

Finally, we will create a summary plot.

par(mar = c(4, 5, 0, 0))
plot(y ~ x, data = dat, type = "n", ann = F, axes = F)
points(y ~ x, data = dat, pch = 16)
xs <- seq(min(dat$x,na.rm=TRUE),max(dat$x,na.rm=TRUE), l = 1000)
Xmat <- model.matrix(~xs)
eta<-coefs %*% t(Xmat)
ys <- exp(eta)
library(plyr)
library(coda)
data.tab <- adply(ys,2,function(x) {
  data.frame(Median=median(x), HPDinterval(as.mcmc(x)))
})
data.tab <- cbind(x=xs,data.tab)
points(Median ~ x, data=data.tab,col = "black", type = "l")
lines(lower ~ x, data=data.tab,col = "black", type = "l", lty = 2)
lines(upper ~ x, data=data.tab,col = "black", type = "l", lty = 2)

axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")
plot of chunk tut11.5bS1.10
newdata = data.frame(x=seq(min(dat$x), max(dat$x), len=100))
Xmat = model.matrix(~x, data=newdata)
coefs <- as.matrix(as.data.frame(rstan:::extract(dat.brm$fit)))
coefs <- coefs[,grep('b', colnames(coefs))]
fit = exp(coefs %*% t(Xmat))
newdata = cbind(newdata,
                plyr:::adply(fit, 2, function(x) {
                                  data.frame(Mean=mean(x), Median=median(x),
                                             HPDinterval(as.mcmc(x)))
                                })
)

ggplot(newdata, aes(y=Mean, x=x)) +
  geom_point(data=dat, aes(y=y)) +
  geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue',alpha=0.2) +
  geom_line() +
  scale_x_continuous('X') +
  scale_y_continuous('Abundance of Y') +
  theme_classic() +
  theme(axis.line.x=element_line(),axis.line.y=element_line())
plot of chunk tut11.5bS1.10BRMS

Defining full log-likelihood function

Now lets try it by specifying log-likelihood and the zero trick. When applying this trick, we need to manually calculate the deviance as the inbuilt deviance will be based on the log-likelihood of estimating the zeros (as part of the zero trick) rather than the deviance of the intended model..

The one advantage of the zero trick is that the Deviance and thus DIC, AIC provided by R2jags will be incorrect. Hence, they too need to be manually defined within jags. I suspect that the AIC calculation I have used is incorrect...

Xmat <- model.matrix(~x, dat)
nX <- ncol(Xmat)
dat.list2 <- with(dat,list(Y=y, X=Xmat,N=nrow(dat), mu=rep(0,nX),
                  Sigma=diag(1.0E-06,nX), zeros=rep(0,nrow(dat)), C=10000))
modelString="
model {
  for (i in 1:N) {
     zeros[i] ~ dpois(zeros.lambda[i])
     zeros.lambda[i] <- -ll[i] + C     
     ll[i] <- Y[i]*log(lambda[i]) - lambda[i] - loggam(Y[i]+1)
     eta[i] <- inprod(beta[], X[i,])
     log(lambda[i]) <- eta[i]
    llm[i] <- Y[i]*log(meanlambda) - meanlambda - loggam(Y[i]+1)
  }
  meanlambda <- mean(lambda)
  beta ~ dmnorm(mu[],Sigma[,])
  dev <- sum(-2*ll)
  pD <- mean(dev)-sum(-2*llm)
  AIC <- min(dev+(2*pD))
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS1.42.bug')
library(R2jags)
system.time(
dat.P.jags3 <- jags(model='../downloads/BUGSscripts/tut11.5bS1.42.bug', data=dat.list2, inits=NULL,
          param=c('beta','dev','AIC'),
          n.chain=3,
          n.iter=50000, n.thin=50, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 353

Initializing model
   user  system elapsed 
  1.936   0.004   1.942 
print(dat.P.jags3)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS1.42.bug", fit using jags,
 3 chains, each with 50000 iterations (first 10000 discarded), n.thin = 50
 n.sims = 2400 iterations saved
           mu.vect sd.vect      2.5%       25%       50%       75%     97.5%  Rhat n.eff
AIC      1.328e+01   4.076 9.634e+00 1.050e+01 1.200e+01 1.458e+01 2.438e+01 1.001  2400
beta[1]  5.510e-01   0.245 6.200e-02 3.860e-01 5.540e-01 7.160e-01 1.008e+00 1.000  2400
beta[2]  1.120e-01   0.018 7.800e-02 9.900e-02 1.120e-01 1.240e-01 1.470e-01 1.001  2400
dev      8.821e+01   1.868 8.637e+01 8.686e+01 8.764e+01 8.897e+01 9.294e+01 1.002  2400
deviance 4.001e+05   1.868 4.001e+05 4.001e+05 4.001e+05 4.001e+05 4.001e+05 1.000     1

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 1.7 and DIC = 400090.0
DIC is an estimate of expected predictive error (lower deviance is better).

Negative binomial

The following equations are provided since in Bayesian modelling, it is occasionally necessary to directly define the log-likelihood calculations (particularly for zero-inflated models and other mixture models). Feel free initially gloss over these equations until such time when your models require them ;)

prob, sizealpha, beta (gamma-poisson)
$$ f(y|r,p)=\frac{\Gamma(y+r)}{\Gamma(r)\Gamma(y+1)}p^r(1-p)^y $$ where $p$ is the probability of $y$ successes until $r$ failures. If, we make $p=\frac{size}{size+\mu}$, then we can define the function in terms of $\mu$ $$ f(y|r,\mu)=\frac{\Gamma(y+r)}{\Gamma(r)\Gamma(y+1)}\left(\frac{r}{\mu+r}\right)^r\left(1-\frac{r}{\mu+r}\right)^y $$ where $p$ is the probability of $y$ successes until $r$ failures. $$\mu = \frac{r(1-p)}{p}\\ E(Y)=\mu, Var(Y)=\mu+\frac{\mu^2}{r} $$ $$ f(y \mid \alpha, \beta) = \frac{\beta^{\alpha}y^{\alpha-1}e^{-\beta y}}{\Gamma(\alpha)} $$ where $$ \begin{align} l(\lambda;y)=&\sum\limits^n_{i=1} log\Gamma(y_i + \alpha) - log\Gamma(y_i+1) - log\Gamma(\alpha) + \\ & \alpha(log(\beta_i) - log(\beta_i+1)) - y_i.log(\beta_i+1) \end{align} $$

Scenario and Data

Lets say we wanted to model the abundance of an item ($y$) against a continuous predictor ($x$). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details are optional and can be safely skipped. Consequently, I have folded (toggled) this section away.

Random data incorporating the following trends (effect parameters)
  • the sample size = 20
  • the continuous $x$ variable is a random uniform spread of measurements between 1 and 20
  • the rate of change in log $y$ per unit of $x$ (slope) = 0.1.
  • the value of $x$ when log$y$ equals 0 (when $y$=1)
  • to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor (created by calculating the outer product of the model matrix and the regression parameters). These expected values are then transformed into a scale mapped by (0,$\infty$) by using the log function $e^{linear~predictor}$
  • finally, we generate $y$ values by using the expected $y$ values ($\lambda$) as probabilities when drawing random numbers from a Poisson distribution. This step adds random noise to the expected $y$ values and returns only 0's and positive integers.
set.seed(37) #16 #35
#The number of samples
n.x <- 20
#Create x values that at uniformly distributed throughout the rate of 1 to 20
x <- sort(runif(n = n.x, min = 1, max =20))
mm <- model.matrix(~x)
intercept <- 0.6
slope=0.1
#The linear predictor
linpred <- mm %*% c(intercept,slope)
#Predicted y values
lambda <- exp(linpred)
#Add some noise and make binomial
y <- rnbinom(n=n.x, mu=lambda, size=1)
dat.nb <- data.frame(y,x)

Exploratory data analysis and initial assumption checking

The assumptions are:
  1. All of the observations are independent - this must be addressed at the design and collection stages
  2. The response variable (and thus the residuals) should be matched by an appropriate distribution (in the case of positive integers response - a Poisson is appropriate).
  3. All observations are equally influential in determining the trends - or at least no observations are overly influential. This is most effectively diagnosed via residuals and other influence indices and is very difficult to diagnose prior to analysis
  4. the relationship between the linear predictor (right hand side of the regression formula) and the link function should be linear. A scatterplot with smoother can be useful for identifying possible non-linearity.
  5. Dispersion is either 1 or overdispersion is otherwise accounted for in the model
  6. The number of zeros is either not excessive or else they are specifically addressed by the model

When counts are all very large (not close to 0) and their ranges do not span orders of magnitude, they take on very Gaussian properties (symmetrical distribution and variance independent of the mean). Given that models based on the Gaussian distribution are more optimized and recognized than Generalized Linear Models, it can be prudent to adopt Gaussian models for such data. Hence it is a good idea to first explore whether a Poisson or Negative Binomial model is likely to be more appropriate than a standard Gaussian model.

Recall from Poisson regression, there are five main potential models that we could consider fitting to these data.

There are five main potential models we could consider fitting to these data:

  1. Ordinary least squares regression (general linear model) - assumes normality of residuals
  2. Poisson regression - assumes mean=variance (dispersion=1)
  3. Quasi-poisson regression - a general solution to overdispersion. Assumes variance is a function of mean, dispersion estimated, however likelihood based statistics unavailable
  4. Negative binomial regression - a specific solution to overdispersion caused by clumping (due to an unmeasured latent variable). Scaling factor ($\omega$) is estimated along with the regression parameters.
  5. Zero-inflation model - a specific solution to overdispersion caused by excessive zeros (due to an unmeasured latent variable). Mixture of binomial and Poisson models.

Confirm non-normality and explore clumping

Check the distribution of the $y$ abundances

hist(dat.nb$y)
plot of chunk tut11.5bS3.2
boxplot(dat.nb$y, horizontal=TRUE)
rug(jitter(dat.nb$y))
plot of chunk tut11.5bS3.2
There is definitely signs of non-normality that would warrant Poisson models. Further to that, the rug on the boxplot is suggestive of clumping (observations are not evenly distributed and well spaced out). Together the, skewness and clumping could point to an overdispersed Poisson. Negative binomial models are one of the most effective means of modeling such data.

Confirm linearity

Lets now explore linearity by creating a histogram of the predictor variable ($x$) and a scatterplot of the relationship between the response ($y$) and the predictor ($x$)

hist(dat.nb$x)
plot of chunk tut11.5bS3.3
#now for the scatterplot
plot(y~x, dat.nb, log="y")
with(dat.nb, lines(lowess(y~x)))
plot of chunk tut11.5bS3.3

Conclusions: the predictor ($x$) does not display any skewness or other issues that might lead to non-linearity. The lowess smoother on the scatterplot does not display major deviations from a straight line and thus linearity is satisfied. Violations of linearity could be addressed by either:

  • define a non-linear linear predictor (such as a polynomial, spline or other non-linear function)
  • transform the scale of the predictor variables

Explore zero inflation

Although we have already established that there are few zeros in the data (and thus overdispersion is unlikely to be an issue), we can also explore this by comparing the number of zeros in the data to the number of zeros that would be expected from a Poisson distribution with a mean equal to the mean count of the data.

#proportion of 0's in the data
dat.nb.tab<-table(dat.nb$y==0)
dat.nb.tab/sum(dat.nb.tab)
FALSE  TRUE 
 0.95  0.05 
#proportion of 0's expected from a Poisson distribution
mu <- mean(dat.nb$y)
cnts <- rpois(1000, mu)
dat.nb.tabE <- table(cnts == 0)
dat.nb.tabE/sum(dat.nb.tabE)
FALSE 
    1 
In the above, the value under FALSE is the proportion of non-zero values in the data and the value under TRUE is the proportion of zeros in the data. In this example, the proportion of zeros observed is similar to the proportion expected. Indeed, there was only a single zero observed. Hence it is likely that if there is overdispersion it is unlikely to be due to excessive zeros.

Model fitting or statistical analysis

The boxplot of $y$ with the axis rug suggested that there might be some clumping (possibly due to some other unmeasured influence). It is therefore likely that the data are overdispersed.

dat.nb.list <- with(dat.nb,list(Y=y, X=x,N=nrow(dat.nb)))
modelString="
model {
  for (i in 1:N) {
     Y[i] ~ dpois(lambda[i])
     eta[i] <- beta0 + beta1*X[i]
     log(lambda[i]) <- eta[i]
  }
  beta0 ~ dnorm(0,1.0E-06)
  beta1 ~ dnorm(0,1.0E-06)
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS3.4.bug')
library(R2jags)
system.time(
dat.nb.P.jags <- jags(model='../downloads/BUGSscripts/tut11.5bS3.4.bug', data=dat.nb.list, inits=NULL,
          param=c('beta0','beta1'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 105

Initializing model
   user  system elapsed 
  1.788   0.000   1.791 
print(dat.nb.P.jags)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS3.4.bug", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%  97.5%  Rhat n.eff
beta0      0.555   0.278   0.002   0.365   0.568   0.736   1.08 1.003  1000
beta1      0.111   0.020   0.072   0.098   0.111   0.125   0.15 1.002  1400
deviance 135.489   4.131 133.269 133.849 134.697 136.261 141.07 1.010  2700

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 8.5 and DIC = 144.0
DIC is an estimate of expected predictive error (lower deviance is better).
#extract the samples for the two model parameters
coefs <- dat.nb.P.jags$BUGSoutput$sims.matrix[,1:2]
Xmat <- model.matrix(~x, data=dat)
#expected values on a log scale
eta<-coefs %*% t(Xmat)
#expected value on response scale
lambda <- exp(eta)

Resid <- -1*sweep(lambda,2,dat.nb$y, '-')/sqrt(lambda)
RSS <- apply(Resid^2, 1, sum)
Disp <- RSS/(nrow(dat.nb)-ncol(coefs))
data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)), HPDinterval(as.mcmc(Disp),p=0.5))
     Median  Mean lower upper lower.1 upper.1
var1  3.147 3.243  2.63 4.002   2.786   3.252

The dispersion parameter was 3.2425024, indicating over three times more variability than would be expected for a Poisson distribution. The data are thus over-dispersed. Given that this is unlikely to be due to zero inflation and the rug plot did suggest some level of clumping, negative binomial regression would seem reasonable.

Model fitting or statistical analysis

JAGS

$$ \begin{align} Y_i&\sim{}NB(p_i,size) & (\text{response distribution})\\ p_i&=size/(size+\lambda_i)\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,10000) & (\text{diffuse Bayesian prior})\\ size &\sim{}\mathcal{U}(0.001,1000)\\ \end{align} $$
dat.nb.list <- with(dat.nb,list(Y=y, X=x,N=nrow(dat.nb)))
modelString="
model {
  for (i in 1:N) {
     Y[i] ~ dnegbin(p[i],size)
     p[i] <- size/(size+lambda[i])
     log(lambda[i]) <- beta0 + beta1*X[i]
  }
  beta0 ~ dnorm(0,1.0E-06)
  beta1 ~ dnorm(0,1.0E-06)
  size ~ dunif(0.001,1000)
  theta <- pow(1/mean(p),2)
  scaleparam <- mean((1-p)/p) 
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS4.1.bug')
library(R2jags)
system.time(
dat.NB.jags <- jags(model='../downloads/BUGSscripts/tut11.5bS4.1.bug', data=dat.nb.list, inits=NULL,
          param=c('beta0','beta1', 'size','theta','scaleparam'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 157

Initializing model
   user  system elapsed 
 17.430   0.008  17.455 
Xmat <- model.matrix(~x, dat.nb)
nX <- ncol(Xmat)
dat.nb.list1 <- with(dat.nb,list(Y=y, X=Xmat,N=nrow(dat.nb), nX=nX))
modelString="
model {
  for (i in 1:N) {
     Y[i] ~ dnegbin(p[i],size)
     p[i] <- size/(size+lambda[i])
     eta[i] <- inprod(beta[], X[i,])
     log(lambda[i]) <- max(-20,min(20,eta[i]))
  }
  for (i in 1:nX) {
    beta[i] ~ dnorm(0,1.0E-06)
  }
  size ~ dunif(0.001,10000)
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS4.11.bug')
library(R2jags)
system.time(
dat.NB.jags1 <- jags(model='../downloads/BUGSscripts/tut11.5bS4.11.bug', data=dat.nb.list1, inits=NULL,
          param=c('beta'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 212

Initializing model
   user  system elapsed 
  17.01    0.00   17.02 
print(dat.NB.jags1)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS4.11.bug", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta[1]    0.733   0.415  -0.069   0.465   0.725   1.005   1.564 1.001  3000
beta[2]    0.097   0.033   0.032   0.075   0.097   0.118   0.163 1.001  3000
deviance 113.046   2.682 110.080 111.088 112.321 114.261 120.171 1.002  2000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 3.6 and DIC = 116.6
DIC is an estimate of expected predictive error (lower deviance is better).

Or arguably better still, use a multivariate normal prior. If we have a $k$ regression parameters ($\beta_k$), then the multivariate normal priors are defined as: $$ \boldsymbol{\beta}\sim{}\mathcal{N_k}(\boldsymbol{\mu}, \mathbf{\Sigma}) $$ where $$\boldsymbol{\mu}=[E[\beta_1],E[\beta_2],...,E[\beta_k]] = \left(\begin{array}{c}0\\\vdots\\0\end{array}\right)$$ $$ \mathbf{\Sigma}=[Cov[\beta_i, \beta_j]] = \left(\begin{array}{ccc}1000^2&0&0\\0&\ddots&0\\0&0&1000^2\end{array} \right) $$ hence, along with the response and predictor matrix, we need to supply $\boldsymbol{\mu}$ (a vector of zeros) and $\boldsymbol{\Sigma}$ (a covariance matrix with $1000^2$ in the diagonals).

Xmat <- model.matrix(~x, dat.nb)
nX <- ncol(Xmat)
dat.nb.list2 <- with(dat.nb,list(Y=y, X=Xmat,N=nrow(dat.nb), mu=rep(0,nX),Sigma=diag(1.0E-06,nX)))
modelString="
model {
  for (i in 1:N) {
     Y[i] ~ dnegbin(p[i],size)
     p[i] <- size/(size+lambda[i])
     eta[i] <- inprod(beta[], X[i,])
     log(lambda[i]) <- eta[i]
  }
  beta ~ dmnorm(mu[],Sigma[,])
  size ~ dunif(0.001,10000)
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS4.12.bug')
library(R2jags)
system.time(
dat.NB.jags2 <- jags(model='../downloads/BUGSscripts/tut11.5bS4.12.bug', data=dat.nb.list2, inits=NULL,
          param=c('beta', 'size'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 173

Initializing model
   user  system elapsed 
  6.353   0.000   6.361 
print(dat.NB.jags2)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS4.12.bug", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta[1]    0.737   0.421  -0.097   0.460   0.736   1.013   1.555 1.002  3000
beta[2]    0.097   0.033   0.030   0.076   0.097   0.118   0.164 1.001  3000
size       3.102   1.931   1.048   1.885   2.615   3.749   7.885 1.001  3000
deviance 113.067   2.628 110.066 111.150 112.407 114.255 120.059 1.002  1600

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 3.5 and DIC = 116.5
DIC is an estimate of expected predictive error (lower deviance is better).
dat.nb.brm <- brm(y~x, data=dat.nb, family='negbinomial',
   prior=c(set_prior("normal(0,100)", class="b"),
           set_prior("student_t(3,0,5)", class="shape")),
   chains=3, iter=2000, warmup=1000, thin=2)
SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 1).

Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.129428 seconds (Warm-up)
#                0.100482 seconds (Sampling)
#                0.22991 seconds (Total)
# 

SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 2).

Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.11 seconds (Warm-up)
#                0.091671 seconds (Sampling)
#                0.201671 seconds (Total)
# 

SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 3).

Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.108511 seconds (Warm-up)
#                0.074716 seconds (Sampling)
#                0.183227 seconds (Total)
# 

Chain mixing and Model validation

Prior to exploring the model parameters, it is prudent to confirm that the model did indeed fit the assumptions and was an appropriate fit to the data as well as that the MCMC sampling chain was adequately mixed and the retained samples independent.

  • We will start by exploring the mixing of the MCMC chains via traceplots.
    plot(as.mcmc(dat.NB.jags))
    
    plot of chunk tut11.5bS4.2
    plot of chunk tut11.5bS4.2
    library(gridExtra)
    grid.arrange(stan_trace(dat.nb.brm$fit, ncol=1),
                 stan_dens(dat.nb.brm$fit, separate_chains=TRUE,ncol=1),
                 ncol=2)
    
    plot of chunk tut11.5bS4.2BRMS

    The chains appear well mixed and stable

  • Next we will explore correlation amongst MCMC samples.
    autocorr.diag(as.mcmc(dat.NB.jags))
    
               beta0    beta1   deviance scaleparam     size     theta
    Lag 0    1.00000  1.00000  1.0000000  1.0000000  1.00000  1.000000
    Lag 10   0.30859  0.30371  0.0455964  0.0180221  0.08759  0.028477
    Lag 50  -0.05370 -0.05565  0.0011639 -0.0206865 -0.01812 -0.022056
    Lag 100 -0.01028 -0.01914 -0.0003804 -0.0075196  0.01651 -0.004871
    Lag 500  0.02130  0.01817 -0.0441559  0.0001324  0.01337  0.022265
    

    The level of auto-correlation at the nominated lag of 10 is higher than we would generally like. It is worth increasing the thinning rate from 10 to 50. Obviously, to support this higher thinning rate, we would also increase the number of iterations. Typically for a negative binomial, it is worth having a large burnin (approximately half of the iterations).

    library(R2jags)
    dat.NB.jags <- jags(data=dat.nb.list,model.file='../downloads/BUGSscripts/tut11.5bS4.1.bug',
                       param=c('beta0','beta1','size'),
                       n.chains=3, n.iter=50000, n.burnin=25000, n.thin=50)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 157
    
    Initializing model
    
    print(dat.NB.jags)
    
    Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS4.1.bug", fit using jags,
     3 chains, each with 50000 iterations (first 25000 discarded), n.thin = 50
     n.sims = 1500 iterations saved
             mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
    beta0      0.730   0.421  -0.074   0.452   0.725   0.996   1.602 1.000  1500
    beta1      0.097   0.033   0.030   0.077   0.098   0.119   0.157 1.001  1500
    size       3.167   2.185   1.086   1.911   2.641   3.704   8.551 1.002  1100
    deviance 113.089   2.760 110.053 111.122 112.340 114.179 120.468 1.004   780
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 3.8 and DIC = 116.9
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    plot(as.mcmc(dat.NB.jags))
    
    plot of chunk tut11.5bS4.4
    autocorr.diag(as.mcmc(dat.NB.jags))
    
                 beta0     beta1  deviance     size
    Lag 0     1.000000  1.000000  1.000000  1.00000
    Lag 50    0.002132  0.007780  0.002472  0.00480
    Lag 250  -0.014601 -0.013641 -0.012261 -0.06051
    Lag 500   0.019129  0.015262 -0.023785  0.01291
    Lag 2500  0.039030  0.008195 -0.013073 -0.01248
    

    Conclusions: the samples are now less auto-correlated. Ideally, we should probably collect even more samples. Whilst the traceplots are reasonably noisy, there is more of a signal or pattern than there ideally should be.

    stan_ac(dat.nb.brm$fit)
    
    plot of chunk tut11.5bS4.4BRMS
  • Explore the step size characteristics (STAN only)
    summary(do.call(rbind, args = get_sampler_params(dat.nb.brm$fit, inc_warmup = FALSE)), digits = 2)
    
     accept_stat__     stepsize__    treedepth__   n_leapfrog__  n_divergent__
     Min.   :0.012   Min.   :0.63   Min.   :1.0   Min.   : 1.0   Min.   :0    
     1st Qu.:0.850   1st Qu.:0.63   1st Qu.:2.0   1st Qu.: 3.0   1st Qu.:0    
     Median :0.937   Median :0.75   Median :2.0   Median : 3.0   Median :0    
     Mean   :0.884   Mean   :0.76   Mean   :2.2   Mean   : 3.8   Mean   :0    
     3rd Qu.:0.990   3rd Qu.:0.90   3rd Qu.:3.0   3rd Qu.: 5.0   3rd Qu.:0    
     Max.   :1.000   Max.   :0.90   Max.   :4.0   Max.   :15.0   Max.   :0    
    
    stan_diag(dat.nb.brm$fit)
    
    plot of chunk tut11.5bS4.4BRMSstepSize
    stan_diag(dat.nb.brm$fit, information = "stepsize")
    
    plot of chunk tut11.5bS4.4BRMSstepSize
    stan_diag(dat.nb.brm$fit, information = "treedepth")
    
    plot of chunk tut11.5bS4.4BRMSstepSize
    stan_diag(dat.nb.brm$fit, information = "divergence")
    
    plot of chunk tut11.5bS4.4BRMSstepSize
    library(gridExtra)
    grid.arrange(stan_rhat(dat.nb.brm$fit) + theme_classic(8),
                 stan_ess(dat.nb.brm$fit) + theme_classic(8),
                 stan_mcse(dat.nb.brm$fit) + theme_classic(8),
                 ncol = 2)
    
    plot of chunk tut11.5bS4.4BRMSstepSize
    Conclusions: acceptance ratio is very good, the lot posterior is relatively robust to step size and tree depth, rhat values are all below 1.05, the effective sample size is always above 0.5 and the ratio of MCMC error to posterior sd is very small.
  • We now explore the goodness of fit of the models via the residuals and deviance. We could calculate the Pearsons's residuals within the JAGS model. Alternatively, we could use the parameters to generate the residuals outside of JAGS.
    #extract the samples for the two model parameters
    coefs <- dat.NB.jags$BUGSoutput$sims.matrix[,1:2]
    size <- dat.NB.jags$BUGSoutput$sims.matrix[,'size']
    Xmat <- model.matrix(~x, data=dat.nb)
    #expected values on a log scale
    eta<-coefs %*% t(Xmat)
    #expected value on response scale
    lambda <- exp(eta)
    varY <- lambda + (lambda^2)/size
    #sweep across rows and then divide by lambda
    Resid <- -1*sweep(lambda,2,dat.nb$y,'-')/sqrt(varY)
    #plot residuals vs expected values
    plot(apply(Resid,2,mean)~apply(eta,2,mean))
    
    plot of chunk tut11.5bS4.5
    #Calculate residuals 
    Resid.nb.brm <- residuals(dat.nb.brm, type='pearson')
    Fitted.nb.brm <- fitted(dat.nb.brm, scale='linear')
    ggplot(data=NULL, aes(y=Resid.nb.brm[,'Estimate'], x=Fitted.nb.brm[,'Estimate'])) + geom_point()
    
    plot of chunk tut11.5bS4.5BRM

    There are no real patterns in the residuals.

  • Now we will compare the sum of squared residuals to the sum of squares residuals that would be expected from a Poisson distribution matching that estimated by the model. Essentially this is estimating how well the Poisson distribution, the log-link function and the linear model approximates the observed data.
    SSres<-apply(Resid^2,1,sum)
    
    set.seed(2)
    #generate a matrix of draws from a negative binomial distribution
    # the matrix is the same dimensions as pi and uses the probabilities of pi
    YNew <- matrix(rnbinom(length(lambda),mu=lambda, size=size),nrow=nrow(lambda))
    Resid1<-(lambda - YNew)/sqrt(varY)
    SSres.sim<-apply(Resid1^2,1,sum)
    mean(SSres.sim>SSres)
    
    [1] 0.4127
    
    Xmat <- model.matrix(~x, dat.nb)
    nX <- ncol(Xmat)
    dat.nb.list2 <- with(dat.nb,list(Y=y, X=Xmat,N=nrow(dat.nb), mu=rep(0,nX),Sigma=diag(1.0E-06,nX)))
    modelString="
    model {
      for (i in 1:N) {
         Y[i] ~ dnegbin(p[i],size)
         p[i] <- size/(size+lambda[i])
         eta[i] <- inprod(beta[], X[i,])
         log(lambda[i]) <- eta[i]
    
         Y1[i] ~ dnegbin(p[i],size)
    	 varY[i] <- lambda[i] + pow(lambda[i],2)/size
    	 Resid[i] <- (Y[i] - lambda[i])/sqrt(varY[i])
         Resid1[i] <- (Y1[i] - lambda[i])/sqrt(varY[i])
         RSS[i] <- pow(Resid[i],2)
         RSS1[i] <-pow(Resid1[i],2) 
      }
      beta ~ dmnorm(mu[],Sigma[,])
      size ~ dunif(0.001,10000)
      Pvalue <- mean(sum(RSS1)>sum(RSS))
    } 
    "
    writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS4.61.bug')
    library(R2jags)
    system.time(
    dat.NB.jags3 <- jags(model='../downloads/BUGSscripts/tut11.5bS4.61.bug', data=dat.nb.list2, inits=NULL,
              param=c('beta','Pvalue'),
              n.chain=3,
              n.iter=20000, n.thin=10, n.burnin=10000)
      )
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 400
    
    Initializing model
    
       user  system elapsed 
      6.717   0.000   6.726 
    
    print(dat.NB.jags3)
    
    Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS4.61.bug", fit using jags,
     3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
     n.sims = 3000 iterations saved
             mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
    Pvalue     0.421   0.494   0.000   0.000   0.000   1.000   1.000 1.002  1000
    beta[1]    0.763   0.422  -0.035   0.482   0.746   1.044   1.619 1.002  1400
    beta[2]    0.095   0.033   0.030   0.072   0.095   0.117   0.160 1.002  1500
    deviance 113.064   2.596 110.071 111.143 112.361 114.276 119.730 1.001  3000
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 3.4 and DIC = 116.4
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    Resid.nb.brm <- residuals(dat.nb.brm, type='pearson', summary=FALSE)
    SSres.nb.brm <- apply(Resid.nb.brm^2,1,sum)
    lambda.nb.brm = fitted(dat.nb.brm, scale='response', summary=FALSE)
    YNew.nb.brm <- matrix(rpois(length(lambda.nb.brm), lambda=lambda.nb.brm),nrow=nrow(lambda.nb.brm))
    
    Resid1.nb.brm<-(lambda.nb.brm - YNew.nb.brm)/sqrt(lambda.nb.brm)
    SSres.sim.nb.brm<-apply(Resid1.nb.brm^2,1,sum)
    mean(SSres.sim.nb.brm>SSres.nb.brm)
    
    [1] 0.8286667
    

    Conclusions: the Bayesian p-value is not far from 0.5, suggesting that there is a good fit of the model to the data.

  • Unfortunately, unlike with linear models (Gaussian family), the expected distribution of data (residuals) varies over the range of fitted values for numerous (often competing) ways that make diagnosing (and attributing causes thereof) miss-specified generalized linear models from standard residual plots very difficult. The use of standardized (Pearson) residuals or deviance residuals can partly address this issue, yet they still do not offer completely consistent diagnoses across all issues (miss-specified model, over-dispersion, zero-inflation).

    An alternative approach is to use simulated data from the model posteriors to calculate an empirical cumulative density function from which residuals are are generated as values corresponding to the observed data along the density function.

    #extract the samples for the two model parameters
    coefs <- dat.NB.jags$BUGSoutput$sims.matrix[,1:2]
    size <- dat.NB.jags$BUGSoutput$sims.matrix[,'size']
    Xmat <- model.matrix(~x, data=dat.nb)
    #expected values on a log scale
    eta<-coefs %*% t(Xmat)
    #expected value on response scale
    lambda <- exp(eta)
    
    simRes <- function(lambda, data,n=250, plot=T, family='negbin', size=NULL) {
     require(gap)
     N = nrow(data)
     sim = switch(family,
        'poisson' = matrix(rpois(n*N,apply(lambda,2,mean)),ncol=N, byrow=TRUE),
        'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2,mean),size),ncol=N, byrow=TRUE)
     )
     a = apply(sim + runif(n,-0.5,0.5),2,ecdf)
     resid<-NULL
     for (i in 1:nrow(data)) resid<-c(resid,a[[i]](data$y[i] + runif(1 ,-0.5,0.5)))
     if (plot==T) {
       par(mfrow=c(1,2))
       gap::qqunif(resid,pch = 2, bty = "n",
       logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals",
       cex.main = 1, las=1)
       plot(resid~apply(lambda,2,mean), xlab='Predicted value', ylab='Standardized residual', las=1)
     }
     resid
    }
    
    simRes(lambda,dat.nb, family='negbin', size=mean(size))
    
    plot of chunk tut10.6bS4.6.simulatedResiduals
     [1] 0.208 0.956 0.428 0.688 0.080 0.888 0.124 0.728 0.144 0.824 0.460 0.352 0.100 0.172 0.684 0.208 0.004 0.688 0.892 0.876
    
    The trend (black symbols) in the qq-plot does not appear to be overly non-linear (matching the ideal red line well), suggesting that the model is not overdispersed. The spread of standardized (simulated) residuals in the residual plot do not appear overly non-uniform. That is there is not trend in the residuals. Furthermore, there is not a concentration of points close to 1 or 0 (which would imply overdispersion).
    lambda.nb.brm = fitted(dat.nb.brm, scale='response', summary=FALSE)
    size <- as.matrix(as.data.frame(rstan:::extract(dat.nb.brm$fit, par='shape')))
    
    simRes <- function(lambda, data,n=250, plot=T, family='poisson', size=NULL) {
     require(gap)
     N = nrow(data)
     sim = switch(family,
        'poisson' = matrix(rpois(n*N,apply(lambda,2,mean)),ncol=N, byrow=TRUE),
        'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2,mean),size),ncol=N, byrow=TRUE)
     )
     a = apply(sim + runif(n,-0.5,0.5),2,ecdf)
     resid<-NULL
     for (i in 1:nrow(data)) resid<-c(resid,a[[i]](data$y[i] + runif(1 ,-0.5,0.5)))
     if (plot==T) {
       par(mfrow=c(1,2))
       gap::qqunif(resid,pch = 2, bty = "n",
       logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals",
       cex.main = 1, las=1)
       plot(resid~apply(lambda,2,mean), xlab='Predicted value', ylab='Standardized residual', las=1)
     }
     resid
    }
    
    simRes(lambda.nb.brm, dat.nb, family='negbin', size=mean(size))
    
    plot of chunk tut11.5bS4.6.BRMSsimulatedResiduals
     [1] 0.332 0.924 0.420 0.748 0.128 0.888 0.108 0.680 0.184 0.824 0.480 0.464 0.092 0.180 0.672 0.212 0.000 0.628 0.884 0.844
    

Exploring the model parameters, test hypotheses

If there was any evidence that the assumptions had been violated or the model was not an appropriate fit, then we would need to reconsider the model and start the process again. In this case, there is no evidence that the test will be unreliable so we can proceed to explore the test statistics.

As with most Bayesian models, it is best to base conclusions on medians rather than means.

print(dat.NB.jags)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS4.1.bug", fit using jags,
 3 chains, each with 50000 iterations (first 25000 discarded), n.thin = 50
 n.sims = 1500 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta0      0.730   0.421  -0.074   0.452   0.725   0.996   1.602 1.000  1500
beta1      0.097   0.033   0.030   0.077   0.098   0.119   0.157 1.001  1500
size       3.167   2.185   1.086   1.911   2.641   3.704   8.551 1.002  1100
deviance 113.089   2.760 110.053 111.122 112.340 114.179 120.468 1.004   780

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 3.8 and DIC = 116.9
DIC is an estimate of expected predictive error (lower deviance is better).
library(plyr)
adply(dat.NB.jags$BUGSoutput$sims.matrix, 2, function(x) {
  data.frame(Median=median(x), Mean=mean(x), HPDinterval(as.mcmc(x)), HPDinterval(as.mcmc(x),p=0.5))
})
        X1    Median      Mean     lower    upper   lower.1  upper.1
1    beta0   0.72460   0.72989  -0.08683   1.5612   0.40701   0.9444
2    beta1   0.09763   0.09734   0.03644   0.1629   0.07544   0.1171
3 deviance 112.34047 113.08851 109.87327 118.6993 110.36278 112.7765
4     size   2.64119   3.16733   0.64845   7.0820   1.21209   2.7610

Actually, many find it more palatable to express the estimates in the original scale of the observations rather than on a log scale.

library(plyr)
adply(exp(dat.NB.jags$BUGSoutput$sims.matrix[,1:2]), 2, function(x) {
  data.frame(Median=median(x), Mean=mean(x), HPDinterval(as.mcmc(x)), HPDinterval(as.mcmc(x),p=0.5))
})
     X1 Median  Mean  lower upper lower.1 upper.1
1 beta0  2.064 2.274 0.7971 4.352   1.353   2.368
2 beta1  1.103 1.103 1.0307 1.170   1.078   1.124

Conclusions: We would reject the null hypothesis of no effect of $x$ on $y$. An increase in x is associated with a significant linear increase (positive slope) in the abundance of $y$. Every 1 unit increase in $x$ results in a log 0.0976275 unit increase in $y$. We usually express this in terms of abundance rather than log abundance, so every 1 unit increase in $x$ results in a ($e^{0.0976275}=1.1025521$) 1.1025521 unit increase in the abundance of $y$.

summary(dat.nb.brm)
 Family: negbinomial (log) 
Formula: y ~ x 
   Data: dat.nb (Number of observations: 20) 
Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 2; 
         total post-warmup samples = 1500
   WAIC: Not computed
 
Fixed Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept     0.75      0.42    -0.04     1.59       1124    1
x             0.10      0.03     0.03     0.16       1110    1

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
shape     2.64      1.27     0.93     5.76       1214    1

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(coef(dat.nb.brm))
              mean
Intercept 2.114221
x         1.100268
coefs.nb.brm <- as.matrix(as.data.frame(rstan:::extract(dat.nb.brm$fit)))
coefs.nb.brm <- coefs.nb.brm[,grep('b', colnames(coefs.nb.brm))]
plyr:::adply(exp(coefs.nb.brm), 2, function(x) {
  data.frame(Mean=mean(x), median=median(x), HPDinterval(as.mcmc(x)))
  })
           X1     Mean   median     lower    upper
1 b_Intercept 2.312289 2.123695 0.8215863 4.460667
2         b_x 1.100884 1.100589 1.0281742 1.173267
marginal_effects(dat.nb.brm)
plot of chunk tut11.5bS4.8BRMS

Further explorations of the trends

A measure of the strength of the relationship can be obtained according to: $$R^2 = 1 - \frac{RSS_{model}}{RSS_{null}}$$

Xmat <- model.matrix(~x, data=dat.nb)
#expected values on a log scale
eta<-coefs %*% t(Xmat)
#expected value on response scale
lambda <- exp(eta)
#calculate the raw SS residuals
SSres <- apply((-1*(sweep(lambda,2,dat.nb$y,'-')))^2,1,sum)
SSres.null <- sum((dat.nb$y - mean(dat.nb$y))^2)
#OR 
SSres.null <- crossprod(dat.nb$y - mean(dat.nb$y))
#calculate the model r2
1-mean(SSres)/SSres.null
       [,1]
[1,] 0.2719

Conclusions: 27.19% of the variation in $y$ abundance can be explained by its relationship with $x$.

Xmat <- model.matrix(~x, dat.nb)
nX <- ncol(Xmat)
dat.nb.list2 <- with(dat.nb,list(Y=y, X=Xmat,N=nrow(dat.nb), mu=rep(0,nX),Sigma=diag(1.0E-06,nX)))
modelString="
model {
  for (i in 1:N) {
     Y[i] ~ dnegbin(p[i],size)
     p[i] <- size/(size+lambda[i])
     eta[i] <- inprod(beta[], X[i,])
     log(lambda[i]) <- eta[i]

     res[i] <- Y[i] - lambda[i]
     resnull[i] <- Y[i] - meanY
  }
  meanY <- mean(Y)
  beta ~ dmnorm(mu[],Sigma[,])
  size ~ dunif(0.001,10000)
  RSS <- sum(res^2)
  RSSnull <- sum(resnull^2)
  r2 <- 1-RSS/RSSnull
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS4.9.bug')
library(R2jags)
system.time(
dat.NB.jags4 <- jags(model='../downloads/BUGSscripts/tut11.5bS4.9.bug', data=dat.nb.list2, inits=NULL,
          param=c('beta','r2'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 217

Initializing model
   user  system elapsed 
  6.181   0.000   6.191 
print(dat.NB.jags4)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS4.9.bug", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta[1]    0.727   0.405  -0.073   0.463   0.730   0.989   1.529 1.001  3000
beta[2]    0.097   0.032   0.034   0.076   0.097   0.117   0.163 1.002  1500
r2         0.276   0.148  -0.029   0.245   0.310   0.352   0.390 1.001  3000
deviance 113.051   2.625 110.048 111.134 112.366 114.231 120.033 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 3.4 and DIC = 116.5
DIC is an estimate of expected predictive error (lower deviance is better).
## calculate the expected values on the response scale
lambda.nb.brm = fitted(dat.nb.brm, scale='response', summary=FALSE)
## calculate the raw SSresid
SSres.nb.brm <- apply((-1*(sweep(lambda.nb.brm,2,dat.nb$y,'-')))^2,1,sum)
SSres.null <- sum((dat.nb$y - mean(dat.nb$y))^2)
#OR 
SSres.null <- crossprod(dat.nb$y - mean(dat.nb$y))
#calculate the model r2
1-mean(SSres.nb.brm)/SSres.null
          [,1]
[1,] 0.2601319

Finally, we will create a summary plot.

par(mar = c(4, 5, 0, 0))
plot(y ~ x, data = dat.nb, type = "n", ann = F, axes = F)
points(y ~ x, data = dat.nb, pch = 16)
xs <- seq(min(dat.nb$x,na.rm=TRUE),max(dat.nb$x,na.rm=TRUE), l = 1000)
Xmat <- model.matrix(~xs)
eta<-coefs %*% t(Xmat)
ys <- exp(eta)
library(plyr)
library(coda)
data.tab <- adply(ys,2,function(x) {
  data.frame(Median=median(x), HPDinterval(as.mcmc(x)))
})
data.tab <- cbind(x=xs,data.tab)
points(Median ~ x, data=data.tab,col = "black", type = "l")
lines(lower ~ x, data=data.tab,col = "black", type = "l", lty = 2)
lines(upper ~ x, data=data.tab,col = "black", type = "l", lty = 2)

axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")
plot of chunk tut11.5bS4.13
newdata = data.frame(x=seq(min(dat.nb$x), max(dat.nb$x), len=100))
Xmat = model.matrix(~x, data=newdata)
coefs <- as.matrix(as.data.frame(rstan:::extract(dat.nb.brm$fit)))
coefs <- coefs[,grep('b', colnames(coefs))]
fit = exp(coefs %*% t(Xmat))
newdata = cbind(newdata,
                plyr:::adply(fit, 2, function(x) {
                                  data.frame(Mean=mean(x), Median=median(x),
                                             HPDinterval(as.mcmc(x)))
                                })
)

ggplot(newdata, aes(y=Mean, x=x)) +
  geom_point(data=dat.nb, aes(y=y)) +
  geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue',alpha=0.2) +
  geom_line() +
  scale_x_continuous('X') +
  scale_y_continuous('Abundance of Y') +
  theme_classic() +
  theme(axis.line.x=element_line(),axis.line.y=element_line())
plot of chunk tut11.5bS4.13BRMS

Defining full log-likelihood function

Now lets try it by specifying log-likelihood and the zero trick. When applying this trick, we need to manually calculate the deviance as the inbuilt deviance will be based on the log-likelihood of estimating the zeros (as part of the zero trick) rather than the deviance of the intended model..

The one advantage of the zero trick is that the Deviance and thus DIC, AIC provided by R2jags will be incorrect. Hence, they too need to be manually defined within jags. I suspect that the AIC calculation I have used is incorrect...

Xmat <- model.matrix(~x, dat.nb)
nX <- ncol(Xmat)
dat.nb.list2 <- with(dat.nb,list(Y=y, X=Xmat,N=nrow(dat.nb), mu=rep(0,nX),
                  Sigma=diag(1.0E-06,nX), zeros=rep(0,nrow(dat.nb)), C=10000))
modelString="
model {
  for (i in 1:N) {
     zeros[i] ~ dpois(zeros.lambda[i])
     zeros.lambda[i] <- -ll[i] + C     
     ll[i] <- loggam(Y[i]+size) - loggam(Y[i]+1) - loggam(size) + size*(log(p[i]) - log(p[i]+1)) - 
              Y[i]*log(p[i]+1)
     p[i] <- size/lambda[i]
     eta[i] <- inprod(beta[], X[i,])
     log(lambda[i]) <- eta[i]
  }
  beta ~ dmnorm(mu[],Sigma[,])
  size ~ dunif(0.001,1000)
  dev <- sum(-2*ll)
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS4.14.bug')
library(R2jags)
system.time(
dat.NB.jags3 <- jags(model='../downloads/BUGSscripts/tut11.5bS4.14.bug', data=dat.nb.list2, inits=NULL,
          param=c('beta','dev'),
          n.chain=3,
          n.iter=50000, n.thin=50, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 453

Initializing model
   user  system elapsed 
 16.996   0.000  17.014 
print(dat.NB.jags3)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS4.14.bug", fit using jags,
 3 chains, each with 50000 iterations (first 10000 discarded), n.thin = 50
 n.sims = 2400 iterations saved
            mu.vect sd.vect       2.5%        25%        50%        75%      97.5%  Rhat n.eff
beta[1]       0.741   0.417     -0.074      0.461      0.738      1.020      1.559 1.002   990
beta[2]       0.096   0.033      0.033      0.074      0.097      0.118      0.160 1.003   640
dev         113.085   2.662    110.073    111.179    112.398    114.194    120.071 1.000  2400
deviance 400113.085   2.662 400110.073 400111.179 400112.398 400114.194 400120.071 1.000     1

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 3.5 and DIC = 400116.6
DIC is an estimate of expected predictive error (lower deviance is better).

Zero-inflated Poisson (ZIP) regression

Zero-Inflation Poisson (ZIP) mixture model is defined as: $$ p(y_i|\theta,\lambda) = \left\{ \begin{array}{l l} \theta + (1-\theta)\times \text{Pois}(0|\lambda) & \quad \text{if $y_i=0$ and}\\ (1-\theta)\times \text{Pois}(y_i|\lambda) & \quad \text{if $y_i>0$} \end{array} \right. $$ where $\theta$ is the probability of false values (zeros).

Hence there is essentially two models coupled together (a mixture model) to yield an overall probability:

  • when an observed response is zero ($y_i=0$), it is the probability of getting a false value (zero) plus the probability of a true value multiplied probability of drawing a value of zero from a Poisson distribution of $lambda$
  • when an observed response is greater than 0, it is the probability of a true value multiplied probability of drawing that value from a Poisson distribution of $lambda$

The above formulation indicates the same $\lambda$ for both the zeros and non-zeros components. In the model of zero values, we are essentially investigating whether the likelihood of false zeros is related to the linear predictor and then the greater than zero model investigates whether the counts are related to the linear predictor.

However, we are typically less interested in modelling determinants of false zeros. Indeed, it is better that the likelihood of false zeros be unrelated to the linear predictor. For example, if excess (false zeros) are due to issues of detectability (individuals are present, just not detected), it is better that the detectability is not related to experimental treatments. Ideally, any detectability issues should be equal across all treatment levels.

The expected value of $Y$ and the variance in $Y$ for a ZIP model are: $$ E(y_i) = \lambda\times(1-\theta)\\ Var(y_i) = \lambda\times(1-\theta)\times(1+\theta\times\lambda^2) $$

Scenario and Data

Lets say we wanted to model the abundance of an item ($y$) against a continuous predictor ($x$). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details are optional and can be safely skipped. Consequently, I have folded (toggled) this section away.

Random data incorporating the following trends (effect parameters)
  • the sample size = 20
  • the continuous $x$ variable is a random uniform spread of measurements between 1 and 20
  • the rate of change in log $y$ per unit of $x$ (slope) = 0.1.
  • the value of $x$ when log$y$ equals 0 (when $y$=1)
  • to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor (created by calculating the outer product of the model matrix and the regression parameters). These expected values are then transformed into a scale mapped by (0,$\infty$) by using the log function $e^{linear~predictor}$
  • finally, we generate $y$ values by using the expected $y$ values ($\lambda$) as probabilities when drawing random numbers from a Poisson distribution. This step adds random noise to the expected $y$ values and returns only 0's and positive integers.
set.seed(37) #34.5  #4 #10 #16 #17 #26
#The number of samples
n.x <- 20
#Create x values that at uniformly distributed throughout the rate of 1 to 20
x <- sort(runif(n = n.x, min = 1, max =20))
mm <- model.matrix(~x)
intercept <- 0.6
slope=0.1
#The linear predictor
linpred <- mm %*% c(intercept,slope)
#Predicted y values
lambda <- exp(linpred)
#Add some noise and make binomial
library(gamlss.dist)
#fixed latent binomial
y<- rZIP(n.x,lambda, 0.4)
#latent binomial influenced by the linear predictor 
#y<- rZIP(n.x,lambda, 1-exp(linpred)/(1+exp(linpred)))
dat.zip <- data.frame(y,x)

summary(glm(y~x, dat.zip, family="poisson"))
Call:
glm(formula = y ~ x, family = "poisson", data = dat.zip)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.5415  -2.3769  -0.9753   1.1736   3.6380  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.67048    0.33018   2.031   0.0423 *
x            0.02961    0.02663   1.112   0.2662  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 85.469  on 19  degrees of freedom
Residual deviance: 84.209  on 18  degrees of freedom
AIC: 124.01

Number of Fisher Scoring iterations: 6
plot(glm(y~x, dat.zip, family="poisson"))
plot of chunk tut11.5bS5.1
plot of chunk tut11.5bS5.1
plot of chunk tut11.5bS5.1
plot of chunk tut11.5bS5.1
library(pscl)
summary(zeroinfl(y ~ x | 1, dist = "poisson", data = dat.zip))
Call:
zeroinfl(formula = y ~ x | 1, data = dat.zip, dist = "poisson")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-1.0015 -0.9556 -0.3932  0.9663  1.6195 

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.70474    0.31960   2.205 0.027449 *  
x            0.08734    0.02532   3.449 0.000563 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.2292     0.4563  -0.502    0.615
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 13 
Log-likelihood: -36.17 on 3 Df
plot(resid(zeroinfl(y ~ x | 1, dist = "poisson", data = dat.zip))~fitted(zeroinfl(y ~ x | 1, dist = "poisson", data = dat.zip)))
plot of chunk tut11.5bS5.1
library(gamlss)
summary(gamlss(y~x,data=dat.zip, family=ZIP))
GAMLSS-RS iteration 1: Global Deviance = 72.4363 
GAMLSS-RS iteration 2: Global Deviance = 72.3428 
GAMLSS-RS iteration 3: Global Deviance = 72.3428 
*******************************************************************
Family:  c("ZIP", "Poisson Zero Inflated") 

Call:  gamlss(formula = y ~ x, family = ZIP, data = dat.zip) 

Fitting method: RS() 

-------------------------------------------------------------------
Mu link function:  log
Mu Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.70582    0.31958   2.209  0.04123 * 
x            0.08719    0.02533   3.442  0.00311 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

-------------------------------------------------------------------
Sigma link function:  logit
Sigma Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.2292     0.4563  -0.502    0.622

-------------------------------------------------------------------
No. of observations in the fit:  20 
Degrees of Freedom for the fit:  3
      Residual Deg. of Freedom:  17 
                      at cycle:  3 
 
Global Deviance:     72.34282 
            AIC:     78.34282 
            SBC:     81.33002 
*******************************************************************
predict(gamlss(y~x,data=dat.zip, family=ZIP), se.fit=TRUE, what="mu")
GAMLSS-RS iteration 1: Global Deviance = 72.4363 
GAMLSS-RS iteration 2: Global Deviance = 72.3428 
GAMLSS-RS iteration 3: Global Deviance = 72.3428 
$fit
        1         2         3         4         5         6         7         8         9        10        11        12        13        14        15        16        17        18        19        20 
0.7966905 0.9236189 1.0263933 1.0369823 1.1305358 1.3763304 1.4054417 1.4299603 1.4951229 1.6161339 1.7035853 1.7629994 1.8678543 1.9838052 1.9951833 2.1187071 2.1249555 2.1431616 2.1837285 2.3064727 

$se.fit
        1         2         3         4         5         6         7         8         9        10        11        12        13        14        15        16        17        18        19        20 
0.3517894 0.3089432 0.2758980 0.2726011 0.2445792 0.1856062 0.1807322 0.1770893 0.1696661 0.1656458 0.1710016 0.1783138 0.1973009 0.2251987 0.2282363 0.2637860 0.2656903 0.2712879 0.2840032 0.3241532 

Exploratory data analysis and initial assumption checking

The assumptions are:
  1. All of the observations are independent - this must be addressed at the design and collection stages
  2. The response variable (and thus the residuals) should be matched by an appropriate distribution (in the case of positive integers response - a Poisson is appropriate).
  3. All observations are equally influential in determining the trends - or at least no observations are overly influential. This is most effectively diagnosed via residuals and other influence indices and is very difficult to diagnose prior to analysis
  4. the relationship between the linear predictor (right hand side of the regression formula) and the link function should be linear. A scatterplot with smoother can be useful for identifying possible non-linearity.
  5. Dispersion is either 1 or overdispersion is otherwise accounted for in the model
  6. The number of zeros is either not excessive or else they are specifically addressed by the model

Confirm non-normality and explore clumping

Check the distribution of the $y$ abundances

hist(dat.zip$y)
plot of chunk tut11.5bS5.2
boxplot(dat.zip$y, horizontal=TRUE)
rug(jitter(dat.zip$y))
plot of chunk tut11.5bS5.2
There is definitely signs of non-normality that would warrant Poisson models. Further to that, there appears to be a large number of zeros that are likely to be the cause of overdispersion A zero-inflated Poisson model is likely to be one of the most effective for modeling these data.

Confirm linearity

Lets now explore linearity by creating a histogram of the predictor variable ($x$). Note, it is difficult to directly assess issues of linearity. Indeed, a scatterplot with lowess smoother will be largely influenced by the presence of zeros. One possible way of doing so is to explore the trend in the non-zero data.

hist(dat.zip$x)
plot of chunk tut11.5bS5.3
#now for the scatterplot
plot(y~x, dat.zip)
with(subset(dat.zip,y>0), lines(lowess(y~x)))
plot of chunk tut11.5bS5.3

Conclusions: the predictor ($x$) does not display any skewness or other issues that might lead to non-linearity. The lowess smoother on the non-zero data cloud does not display major deviations from a straight line and thus linearity is likely to be satisfied. Violations of linearity (whilst difficult to be certain about due to the unknown influence of the zeros) could be addressed by either:

  • define a non-linear linear predictor (such as a polynomial, spline or other non-linear function)
  • transform the scale of the predictor variables

Explore zero inflation

Although we have already established that there are few zeros in the data (and thus overdispersion is unlikely to be an issue), we can also explore this by comparing the number of zeros in the data to the number of zeros that would be expected from a Poisson distribution with a mean equal to the mean count of the data.

#proportion of 0's in the data
dat.zip.tab<-table(dat.zip$y==0)
dat.zip.tab/sum(dat.zip.tab)
FALSE  TRUE 
 0.55  0.45 
#proportion of 0's expected from a Poisson distribution
mu <- mean(dat.zip$y)
cnts <- rpois(1000, mu)
dat.zip.tabE <- table(cnts == 0)
dat.zip.tabE/sum(dat.zip.tabE)
FALSE  TRUE 
0.921 0.079 
In the above, the value under FALSE is the proportion of non-zero values in the data and the value under TRUE is the proportion of zeros in the data. In this example, the proportion of zeros observed (45%) far exceeds that that would have been expected (7.9%). Hence it is highly likely that any models will be zero-inflated.

Model fitting or statistical analysis

The exploratory data analyses have suggested that a zero-inflated Poisson model might be the most appropriate for these data.

dat.zip.list <- with(dat.zip,list(Y=y, X=x,N=nrow(dat.zip)))
modelString="
model {
  for (i in 1:N) {
     Y[i] ~ dpois(lambda[i])
     eta[i] <- beta0 + beta1*X[i]
     log(lambda[i]) <- eta[i]
  }
  beta0 ~ dnorm(0,1.0E-06)
  beta1 ~ dnorm(0,1.0E-06)
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS5.4.bug')
library(R2jags)
system.time(
dat.zip.P.jags <- jags(model='../downloads/BUGSscripts/tut11.5bS5.4.bug', data=dat.zip.list, inits=NULL,
          param=c('beta0','beta1'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 105

Initializing model
   user  system elapsed 
  1.476   0.000   1.477 
print(dat.zip.P.jags)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS5.4.bug", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta0      0.635   0.326  -0.061   0.431   0.644   0.857   1.250 1.001  2400
beta1      0.032   0.025  -0.018   0.014   0.031   0.048   0.084 1.001  2800
deviance 121.928   2.039 120.059 120.548 121.311 122.606 127.122 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 2.1 and DIC = 124.0
DIC is an estimate of expected predictive error (lower deviance is better).
#extract the samples for the two model parameters
coefs <- dat.zip.P.jags$BUGSoutput$sims.matrix[,1:2]
Xmat <- model.matrix(~x, data=dat)
#expected values on a log scale
eta<-coefs %*% t(Xmat)
#expected value on response scale
lambda <- exp(eta)

Resid <- -1*sweep(lambda,2,dat.zip$y, '-')/sqrt(lambda)
RSS <- apply(Resid^2, 1, sum)
Disp <- RSS/(nrow(dat.zip)-ncol(coefs))
data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)), HPDinterval(as.mcmc(Disp),p=0.5))
     Median  Mean lower upper lower.1 upper.1
var1  4.195 4.325 3.181 5.642   3.605   4.426

The dispersion parameter was 4.3250848, indicating over three times more variability than would be expected for a Poisson distribution. The data are thus over-dispersed. Given the large number of zeros in the response, it would seem likely that the overdispersion is as a result of the excessive zeros and thus zero-inflated Poisson model would seem reasonable. Note, if this model is still overdispersed (possibly due to clumpiness in the non-zero values), then a zero-inflated negative binomial model might be worth exploring.

JAGS

$$ Y_i = \left\{ \begin{array}{lrcl} 0~\text{with}~P(y_i) = 1-\mu & z_i&\sim&\text{Binom}(1-\theta)\\ & logit(\theta)&=&\gamma_0\\ &\gamma_0&\sim{}&\mathcal{N}(0,10000)\\ >0 & Y_i&\sim&\text{Pois}(\lambda_i)\\ &\lambda_i&=&z_i + \eta_i\\ &log(\eta_i)&=&\beta_0+\beta_1x_1\\ &\beta_0, \beta_1&\sim{}&\mathcal{N}(0,10000)\\ \end{array} \right. $$ Or in shorthand: $$ \begin{align} Y_i&\sim{}ZIP(\lambda,\theta) & (\text{response distribution})\\ logit(\theta)&=\gamma_0 & (\text{link function/linear predictor - zero component})\\ log(\lambda_i)&=\eta_i & (\text{link function - count component})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor - count component})\\ \beta_0, \beta_1, \gamma_0&\sim{}\mathcal{N}(0,10000) & (\text{diffuse Bayesian priors})\\ \end{align} $$
dat.zip.list <- with(dat.zip,list(Y=y, X=x,N=nrow(dat.nb), z=ifelse(y==0,0,1)))
modelString="
model {
  for (i in 1:N) {
     z[i] ~ dbern(one.minus.theta)
     Y[i] ~ dpois(lambda[i])
     lambda[i] <- z[i]*eta[i]
     log(eta[i]) <- beta0 + beta1*X[i]
  }
  one.minus.theta <- 1-theta
  logit(theta) <- gamma0
  beta0 ~ dnorm(0,1.0E-06)
  beta1 ~ dnorm(0,1.0E-06)
  gamma0 ~ dnorm(0,1.0E-06)
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS5.5.bug')
library(R2jags)
system.time(
dat.zip.jags <- jags(model='../downloads/BUGSscripts/tut11.5bS5.5.bug', data=dat.zip.list, inits=NULL,
          param=c('beta0','beta1', 'gamma0','theta'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 149

Initializing model
   user  system elapsed 
  2.492   0.004   2.503 
Xmat <- model.matrix(~x, dat.zip)
nX <- ncol(Xmat)
dat.zip.list1 <- with(dat.zip,list(Y=y, X=Xmat,N=nrow(dat.zip), nX=nX, z=ifelse(y==0,0,1)))
modelString="
model {
  for (i in 1:N) {
     z[i] ~ dbern(one.minus.theta)
     Y[i] ~ dpois(lambda[i])
     lambda[i] <- z[i]*eta[i]
     log(eta[i]) <- inprod(beta[], X[i,])
  }
  one.minus.theta <- 1-theta
  logit(theta) <- gamma0
  gamma0 ~ dnorm(0,1.0E-06)
  for (i in 1:nX) {
    beta[i] ~ dnorm(0,1.0E-06)
  }
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS5.6.bug')
library(R2jags)
system.time(
dat.zip.jags1 <- jags(model='../downloads/BUGSscripts/tut11.5bS5.6.bug', data=dat.zip.list1, inits=NULL,
          param=c('beta','gamma0'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 171

Initializing model
   user  system elapsed 
  2.465   0.004   2.470 
print(dat.zip.jags1)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS5.6.bug", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta[1]    0.709   0.316  0.059  0.495  0.723  0.922  1.304 1.001  3000
beta[2]    0.086   0.025  0.039  0.070  0.085  0.103  0.135 1.001  3000
gamma0    -0.197   0.462 -1.097 -0.508 -0.198  0.113  0.724 1.001  3000
deviance  75.675   2.578 72.834 73.815 74.979 76.794 82.470 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 3.3 and DIC = 79.0
DIC is an estimate of expected predictive error (lower deviance is better).

Or arguably better still, use a multivariate normal prior. If we have a $k$ regression parameters ($\beta_k$), then the multivariate normal priors are defined as: $$ \boldsymbol{\beta}\sim{}\mathcal{N_k}(\boldsymbol{\mu}, \mathbf{\Sigma}) $$ where $$\boldsymbol{\mu}=[E[\beta_1],E[\beta_2],...,E[\beta_k]] = \left(\begin{array}{c}0\\\vdots\\0\end{array}\right)$$ $$ \mathbf{\Sigma}=[Cov[\beta_i, \beta_j]] = \left(\begin{array}{ccc}1000^2&0&0\\0&\ddots&0\\0&0&1000^2\end{array} \right) $$ hence, along with the response and predictor matrix, we need to supply $\boldsymbol{\mu}$ (a vector of zeros) and $\boldsymbol{\Sigma}$ (a covariance matrix with $1000^2$ in the diagonals).

Xmat <- model.matrix(~x, dat.zip)
nX <- ncol(Xmat)
dat.zip.list2 <- with(dat.zip,list(Y=y, X=Xmat,N=nrow(dat.zip), nX=nX,
                      mu=rep(0,nX),Sigma=diag(1.0E-06,nX), z=ifelse(y==0,0,1)))
modelString="
model {
  for (i in 1:N) {
     z[i] ~ dbern(one.minus.theta)
     Y[i] ~ dpois(lambda[i])
     lambda[i] <- z[i]*eta[i]
     log(eta[i]) <- inprod(beta[], X[i,])
  }
  one.minus.theta <- 1-theta
  logit(theta) <- gamma0
  gamma0 ~ dnorm(0,1.0E-06)
  beta ~ dmnorm(mu[],Sigma[,])
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS5.7.bug')
library(R2jags)
system.time(
dat.zip.jags2 <- jags(model='../downloads/BUGSscripts/tut11.5bS5.7.bug', data=dat.zip.list2, inits=NULL,
          param=c('beta', 'gamma0'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 176

Initializing model
   user  system elapsed 
  0.912   0.004   0.923 
print(dat.zip.jags2)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS5.7.bug", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta[1]    0.700   0.321  0.054  0.484  0.702  0.924  1.301 1.001  3000
beta[2]    0.087   0.025  0.038  0.070  0.087  0.104  0.136 1.001  3000
gamma0    -0.217   0.462 -1.155 -0.524 -0.201  0.094  0.674 1.001  3000
deviance  75.688   2.477 72.857 73.895 75.026 76.816 82.064 1.002  2400

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 3.1 and DIC = 78.8
DIC is an estimate of expected predictive error (lower deviance is better).

Note, the n.eff indicates that we probably have an issue with chain mixing and/or autocorrelation. We probably should increase the number of iterations and the thinning rate.

dat.zip.brm <- brm(y~0+spec + main + main:x, data=dat.zip, family='zero_inflated_poisson',
  prior = c(set_prior("normal(0,100)", class="b")),
  chains=3, iter=2000, warmup=1000, thin=2)
SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 1).

Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.184441 seconds (Warm-up)
#                0.109929 seconds (Sampling)
#                0.29437 seconds (Total)
# 

SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 2).

Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.18517 seconds (Warm-up)
#                0.121613 seconds (Sampling)
#                0.306783 seconds (Total)
# 

SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 3).

Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.183662 seconds (Warm-up)
#                0.125256 seconds (Sampling)
#                0.308918 seconds (Total)
# 
## Or if we allow the zero-inflation model and the count model to vary with x
dat.zip.brm1 <- brm(y~trait:x, data=dat.zip, family='zero_inflated_poisson',
  prior = c(set_prior("normal(0,100)", class="b")),
  chains=3, iter=2000, warmup=1000, thin=2)
SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 1).

Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.140589 seconds (Warm-up)
#                0.212475 seconds (Sampling)
#                0.353064 seconds (Total)
# 

SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 2).

Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.154223 seconds (Warm-up)
#                0.130932 seconds (Sampling)
#                0.285155 seconds (Total)
# 

SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 3).

Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
#  Elapsed Time: 0.139585 seconds (Warm-up)
#                0.132922 seconds (Sampling)
#                0.272507 seconds (Total)
# 

Chain mixing and Model validation

Prior to exploring the model parameters, it is prudent to confirm that the model did indeed fit the assumptions and was an appropriate fit to the data as well as that the MCMC sampling chain was adequately mixed and the retained samples independent.

  • We will start by exploring the mixing of the MCMC chains via traceplots.
    plot(as.mcmc(dat.zip.jags))
    
    plot of chunk tut11.5bS5.8
    plot of chunk tut11.5bS5.8

    The chains appear well mixed and stable

    library(gridExtra)
    grid.arrange(stan_trace(dat.zip.brm$fit, ncol=1),
                 stan_dens(dat.zip.brm$fit, separate_chains=TRUE,ncol=1),
                 ncol=2)
    
    plot of chunk tut11.5bS5.8BRMS
  • Next we will explore correlation amongst MCMC samples.
    autocorr.diag(as.mcmc(dat.zip.jags))
    
                beta0      beta1  deviance     gamma0     theta
    Lag 0    1.000000  1.0000000  1.000000  1.0000000  1.000000
    Lag 10   0.275638  0.2908784  0.043162  0.0469457  0.048194
    Lag 50  -0.006762  0.0226283 -0.007629  0.0035146  0.002608
    Lag 100  0.016017  0.0046681 -0.008336  0.0003516 -0.000948
    Lag 500 -0.001439 -0.0009426 -0.003084 -0.0085311 -0.010254
    

    The level of auto-correlation at the nominated lag of 10 is higher than we would generally like. It is worth increasing the thinning rate from 10 to 50. Obviously, to support this higher thinning rate, we would also increase the number of iterations.

    library(R2jags)
    dat.zip.jags <- jags(data=dat.zip.list,model.file='../downloads/BUGSscripts/tut11.5bS5.5.bug',
                       param=c('beta0','beta1','gamma0','theta'),
                       n.chains=3, n.iter=100000, n.burnin=50000, n.thin=50)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 149
    
    Initializing model
    
    print(dat.zip.jags)
    
    Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS5.5.bug", fit using jags,
     3 chains, each with 1e+05 iterations (first 50000 discarded), n.thin = 50
     n.sims = 3000 iterations saved
             mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
    beta0      0.706   0.319  0.043  0.494  0.716  0.928  1.301 1.001  2800
    beta1      0.086   0.026  0.037  0.069  0.086  0.103  0.137 1.001  2300
    gamma0    -0.220   0.466 -1.127 -0.541 -0.212  0.107  0.670 1.002  1800
    theta      0.448   0.110  0.245  0.368  0.447  0.527  0.661 1.001  2000
    deviance  75.685   2.453 72.827 73.883 75.054 76.878 81.846 1.001  2300
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 3.0 and DIC = 78.7
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    plot(as.mcmc(dat.zip.jags))
    
    plot of chunk tut11.5bS5.10
    plot of chunk tut11.5bS5.10
    autocorr.diag(as.mcmc(dat.zip.jags))
    
                 beta0     beta1  deviance    gamma0     theta
    Lag 0     1.000000  1.000000  1.000000  1.000000  1.000000
    Lag 50   -0.003957 -0.029631  0.016977 -0.019662 -0.019085
    Lag 250   0.020470  0.011623 -0.016871  0.009786  0.011709
    Lag 500   0.010446  0.003702  0.003925 -0.048037 -0.047746
    Lag 2500 -0.026443  0.002784 -0.021067 -0.005668 -0.003888
    

    Conclusions: the samples are now less auto-correlated and the chains are all well mixed and appear stable.

    stan_ac(dat.zip.brm$fit)
    
    plot of chunk tut11.5bS5.10BRMS
  • Explore the step size characteristics (STAN only)
    summary(do.call(rbind, args = get_sampler_params(dat.zip.brm$fit, inc_warmup = FALSE)), digits = 2)
    
     accept_stat__    stepsize__    treedepth__   n_leapfrog__  n_divergent__
     Min.   :0.19   Min.   :0.29   Min.   :1.0   Min.   : 1.0   Min.   :0    
     1st Qu.:0.89   1st Qu.:0.29   1st Qu.:2.0   1st Qu.: 3.0   1st Qu.:0    
     Median :0.96   Median :0.35   Median :3.0   Median : 7.0   Median :0    
     Mean   :0.92   Mean   :0.33   Mean   :2.9   Mean   : 7.6   Mean   :0    
     3rd Qu.:0.99   3rd Qu.:0.36   3rd Qu.:4.0   3rd Qu.:11.0   3rd Qu.:0    
     Max.   :1.00   Max.   :0.36   Max.   :4.0   Max.   :15.0   Max.   :0    
    
    stan_diag(dat.zip.brm$fit)
    
    plot of chunk tut11.5bS5.10BRMSstepSize
    stan_diag(dat.zip.brm$fit, information = "stepsize")
    
    plot of chunk tut11.5bS5.10BRMSstepSize
    stan_diag(dat.zip.brm$fit, information = "treedepth")
    
    plot of chunk tut11.5bS5.10BRMSstepSize
    stan_diag(dat.zip.brm$fit, information = "divergence")
    
    plot of chunk tut11.5bS5.10BRMSstepSize
    library(gridExtra)
    grid.arrange(stan_rhat(dat.zip.brm$fit) + theme_classic(8),
                 stan_ess(dat.zip.brm$fit) + theme_classic(8),
                 stan_mcse(dat.zip.brm$fit) + theme_classic(8),
                 ncol = 2)
    
    plot of chunk tut11.5bS5.10BRMSstepSize
    Conclusions: acceptance ratio is very good, the lot posterior is relatively robust to step size and tree depth, rhat values are all below 1.05, the effective sample size is not too bad (we might be able to achieve better acceptance ratios with tighter priors) and the ratio of MCMC error to posterior sd is very small.
  • We now explore the goodness of fit of the models via the residuals and deviance. We could calculate the Pearsons's residuals within the JAGS model. Alternatively, we could use the parameters to generate the residuals outside of JAGS.
    #extract the samples for the two model parameters
    coefs <- dat.zip.jags$BUGSoutput$sims.matrix[,1:2]
    theta <- dat.zip.jags$BUGSoutput$sims.matrix[,'theta']
    Xmat <- model.matrix(~x, data=dat.zip)
    #expected values on a log scale
    lambda<-coefs %*% t(Xmat)
    #expected value on response scale
    eta <- exp(lambda)
    expY <- sweep(eta,1,(1-theta),"*")
    varY <- eta+sweep(eta^2,1,theta,"*")
    varY <- sweep(varY,1,(1-theta),'*')
    #sweep across rows and then divide by lambda
    Resid <- -1*sweep(expY,2,dat.zip$y,'-')/sqrt(varY)
    #plot residuals vs expected values
    plot(apply(Resid,2,mean)~apply(eta,2,mean))
    
    plot of chunk tut11.5bS5.11
    #Calculate residuals 
    Resid.zip.brm <- residuals(dat.zip.brm, type='pearson')[,'Estimate']
    #Extract the fitted values.  Note, since a zero-inflated model is really two models 
    #(the binary process and the poisson process), there are fitted values associated 
    #with both processes.  The first half of the fitted values are associated with
    #the count process (the ones we are interested in here), the second half are
    #associated with the binomial process. 
    #Furthermore, dont be tempted to use fitted(dat.zip.brm, scale='response',...) with
    #zero inflated models, they seem to be a bit odd...
    Fitted.zip.brm <- exp(fitted(dat.zip.brm, scale='linear')[1:nrow(dat.zip),'Estimate'])
    ggplot(data=NULL, aes(y=Resid.zip.brm, x=Fitted.zip.brm)) + geom_point()
    
    plot of chunk tut11.5bS5.11BRM
    ## Or the x varying zero-inflated and count model
    Resid.zip.brm <- residuals(dat.zip.brm1, type='pearson')[,'Estimate']
    Fitted.zip.brm <- exp(fitted(dat.zip.brm1, scale='linear')[1:nrow(dat.zip),'Estimate'])
    ggplot(data=NULL, aes(y=Resid.zip.brm, x=Fitted.zip.brm)) + geom_point()
    
    plot of chunk tut11.5bS5.11BRM

    There are no real patterns in the residuals.

  • Now we will compare the sum of squared residuals to the sum of squares residuals that would be expected from a Poisson distribution matching that estimated by the model. Essentially this is estimating how well the Poisson distribution, the log-link function and the linear model approximates the observed data.

    When doing so, we need to consider the expected value and variance of the zero-inflated poisson. $$ E(y_i) = \lambda\times(1-\theta)\\ Var(y_i) = \lambda\times(1-\theta)\times(1+\theta\times\lambda^2) $$

    SSres<-apply(Resid^2,1,sum)
    
    set.seed(2)
    #generate a matrix of draws from a zero-inflated poisson (ZIP) distribution
    # the matrix is the same dimensions as lambda
    library(gamlss.dist)
    #YNew <- matrix(rZIP(length(lambda),eta, theta),nrow=nrow(lambda))
    lambda <- sweep(eta,1,ifelse(dat.zip$y==0,0,1),'*')
    YNew <- matrix(rpois(length(lambda),lambda),nrow=nrow(lambda))
    Resid1<-(expY - YNew)/sqrt(varY)
    SSres.sim<-apply(Resid1^2,1,sum)
    mean(SSres.sim>SSres)
    
    [1] 0.4346667
    

    Whilst not ideal (as we would prefer a Bayesian P-value of around 0.5), this value is not wildly bad and does not constitute overwhelming evidence of a lack of fit.

    Xmat <- model.matrix(~x, dat.zip)
    nX <- ncol(Xmat)
    dat.zip.list2 <- with(dat.zip,list(Y=y, X=Xmat,N=nrow(dat.zip), nX=nX,
                          mu=rep(0,nX),Sigma=diag(1.0E-06,nX), z=ifelse(y==0,0,1)))
    modelString="
    model {
      for (i in 1:N) {
         z[i] ~ dbern(one.minus.theta)
         Y[i] ~ dpois(lambda[i])
         lambda[i] <- z[i]*eta[i]
         log(eta[i]) <- max(-20,min(20,inprod(beta[], X[i,])))
    
         expY[i] <- eta[i]*(1-theta)
         varY[i] <- (1-theta)*(eta[i]*theta*pow(eta[i],2))
         Resid[i] <- (Y[i] - expY[i])/sqrt(varY[i])
         Y1[i] ~ dpois(lambda[i])
         Resid1[i] <- (Y1[i] - expY[i])/sqrt(varY[i])
         RSS[i] <- pow(Resid[i],2)
         RSS1[i] <-pow(Resid1[i],2) 
      }
      one.minus.theta <- 1-theta
      logit(theta) <- gamma0
      gamma0 ~ dnorm(0,1.0E-06)
      beta ~ dmnorm(mu[],Sigma[,])
      Pvalue <- mean(sum(RSS1)>sum(RSS))
    } 
    "
    writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS5.13.bug')
    library(R2jags)
    system.time(
    dat.zip.jags3 <- jags(model='../downloads/BUGSscripts/tut11.5bS5.13.bug', data=dat.zip.list2, inits=NULL,
              param=c('beta', 'gamma0','Pvalue'),
              n.chain=3,
              n.iter=20000, n.thin=10, n.burnin=10000)
      )
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 465
    
    Initializing model
    
       user  system elapsed 
      1.321   0.004   1.328 
    
    print(dat.zip.jags3)
    
    Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS5.13.bug", fit using jags,
     3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
     n.sims = 3000 iterations saved
             mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
    Pvalue     0.455   0.498  0.000  0.000  0.000  1.000  1.000 1.001  3000
    beta[1]    0.718   0.320  0.079  0.512  0.729  0.921  1.328 1.001  3000
    beta[2]    0.085   0.025  0.034  0.069  0.085  0.101  0.136 1.001  3000
    gamma0    -0.197   0.465 -1.096 -0.512 -0.195  0.122  0.711 1.002  1600
    deviance  75.686   2.594 72.853 73.845 74.987 76.779 82.738 1.006  1500
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 3.4 and DIC = 79.0
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    Resid.zip.brm <- residuals(dat.zip.brm, type='pearson', summary=FALSE)
    SSres.zip.brm <- apply(Resid.zip.brm^2,1,sum)
    lambda.zip.brm = fitted(dat.brm, scale='response', summary=FALSE)
    YNew.zip.brm <- matrix(rpois(length(lambda.zip.brm), lambda=lambda.zip.brm),
                           nrow=nrow(lambda.zip.brm))
    
    Resid1.zip.brm<-(lambda.zip.brm - YNew.zip.brm)/sqrt(lambda.zip.brm)
    SSres.sim.zip.brm<-apply(Resid1.zip.brm^2,1,sum)
    mean(SSres.sim.zip.brm>SSres.zip.brm)
    
    [1] 0.482
    

    Conclusions: the Bayesian p-value is very close to 0.5, suggesting that there is a good fit of the model to the data.

  • Since it is difficult to diagnose many issues from the typical residuals we will now explore simulated residuals.
    #extract the samples for the two model parameters
    coefs <- dat.zip.jags$BUGSoutput$sims.matrix[,1:2]
    theta <- dat.zip.jags$BUGSoutput$sims.matrix[,'theta']
    Xmat <- model.matrix(~x, data=dat.zip)
    #expected values on a log scale
    eta<-coefs %*% t(Xmat)
    #expected value on response scale
    lambda <- exp(eta)
    
    simRes <- function(lambda, data,n=250, plot=T, family='negbin', size=NULL,theta=NULL) {
     require(gap)
     N = nrow(data)
     sim = switch(family,
        'poisson' = matrix(rpois(n*N,apply(lambda,2,mean)),ncol=N, byrow=TRUE),
        'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2,mean),size),ncol=N, byrow=TRUE),
            'zip' = matrix(gamlss.dist:::rZIP(n*N,apply(lambda,2,mean),theta),ncol=N, byrow=TRUE)
     )
     a = apply(sim + runif(n,-0.5,0.5),2,ecdf)
     resid<-NULL
     for (i in 1:nrow(data)) resid<-c(resid,a[[i]](data$y[i] + runif(1 ,-0.5,0.5)))
     if (plot==T) {
       par(mfrow=c(1,2))
       gap::qqunif(resid,pch = 2, bty = "n",
       logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals",
       cex.main = 1, las=1)
       plot(resid~apply(lambda,2,mean), xlab='Predicted value', ylab='Standardized residual', las=1)
     }
     resid
    }
    
    simRes(lambda,dat.zip, family='zip',theta=theta)
    
    plot of chunk tut10.6bS5.13simulatedResiduals
     [1] 0.272 0.856 0.480 0.828 0.944 0.468 0.828 0.932 0.616 0.164 0.348 0.268 0.384 0.508 0.340 0.196 0.848 0.284 0.124 0.828
    
    The trend (black symbols) in the qq-plot does not appear to be overly non-linear (matching the ideal red line well), suggesting that the model is not overdispersed. The spread of standardized (simulated) residuals in the residual plot do not appear overly non-uniform. That is there is not trend in the residuals. Furthermore, there is not a concentration of points close to 1 or 0 (which would imply overdispersion). Hence, once zero-inflation is accounted for, the model does not display overdispersion.

    Although there is a slight hint of non-linearity in that the residuals are high for low and high fitted values and lower in the middle, this might well be an artifact of the small data set size. By change, most of the observed values in the middle range of the predictor were zero.

  • lambda.zip.brm = exp(fitted(dat.zip.brm, scale='linear', summary=FALSE))[,1:nrow(dat.zip)]
    theta = binomial()$linkinv(mean(rstan:::extract(dat.zip.brm$fit, 'b_spec')[[1]]))
    simRes <- function(lambda, data,n=250, plot=T, family='negbin', size=NULL,theta=NULL) {
     require(gap)
     N = nrow(data)
     sim = switch(family,
        'poisson' = matrix(rpois(n*N,apply(lambda,2,mean)),ncol=N, byrow=TRUE),
        'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2,mean),size),ncol=N, byrow=TRUE),
            'zip' = matrix(gamlss.dist:::rZIP(n*N,apply(lambda,2,mean),theta),ncol=N, byrow=TRUE)
     )
     a = apply(sim + runif(n,-0.5,0.5),2,ecdf)
     resid<-NULL
     for (i in 1:nrow(data)) resid<-c(resid,a[[i]](data$y[i] + runif(1 ,-0.5,0.5)))
     if (plot==T) {
       par(mfrow=c(1,2))
       gap::qqunif(resid,pch = 2, bty = "n",
       logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals",
       cex.main = 1, las=1)
       plot(resid~apply(lambda,2,mean), xlab='Predicted value', ylab='Standardized residual', las=1)
     }
     resid
    }
    
    simRes(lambda.zip.brm, dat.zip, family='zip', theta=theta)
    
    plot of chunk tut11.5bS5.13BRMSsimulatedResiduals
     [1] 0.332 0.820 0.592 0.892 0.940 0.548 0.732 0.900 0.648 0.396 0.272 0.256 0.204 0.464 0.024 0.236 0.764 0.180 0.256 0.824
    

Exploring the model parameters, test hypotheses

If there was any evidence that the assumptions had been violated or the model was not an appropriate fit, then we would need to reconsider the model and start the process again. In this case, there is no evidence that the test will be unreliable so we can proceed to explore the test statistics.

As with most Bayesian models, it is best to base conclusions on medians rather than means.

print(dat.zip.jags)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS5.5.bug", fit using jags,
 3 chains, each with 1e+05 iterations (first 50000 discarded), n.thin = 50
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta0      0.706   0.319  0.043  0.494  0.716  0.928  1.301 1.001  2800
beta1      0.086   0.026  0.037  0.069  0.086  0.103  0.137 1.001  2300
gamma0    -0.220   0.466 -1.127 -0.541 -0.212  0.107  0.670 1.002  1800
theta      0.448   0.110  0.245  0.368  0.447  0.527  0.661 1.001  2000
deviance  75.685   2.453 72.827 73.883 75.054 76.878 81.846 1.001  2300

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 3.0 and DIC = 78.7
DIC is an estimate of expected predictive error (lower deviance is better).
library(plyr)
adply(dat.zip.jags$BUGSoutput$sims.matrix, 2, function(x) {
  data.frame(Median=median(x), Mean=mean(x), HPDinterval(as.mcmc(x)), HPDinterval(as.mcmc(x),p=0.5))
})
        X1   Median     Mean    lower   upper  lower.1 upper.1
1    beta0  0.71587  0.70638  0.06835  1.3198  0.57561  0.9971
2    beta1  0.08556  0.08604  0.03607  0.1359  0.07058  0.1042
3 deviance 75.05425 75.68540 72.64080 80.6794 73.02722 75.3436
4   gamma0 -0.21225 -0.21991 -1.14123  0.6553 -0.49398  0.1432
5    theta  0.44714  0.44798  0.24209  0.6582  0.37853  0.5353

Actually, many find it more palatable to express the estimates in the original scale of the observations rather than on a log scale.

library(plyr)
adply(exp(dat.zip.jags$BUGSoutput$sims.matrix[,1:2]), 2, function(x) {
  data.frame(Median=median(x), Mean=mean(x), HPDinterval(as.mcmc(x)), HPDinterval(as.mcmc(x),p=0.5))
})
     X1 Median Mean  lower upper lower.1 upper.1
1 beta0  2.046 2.13 0.9547 3.495   1.536   2.392
2 beta1  1.089 1.09 1.0351 1.144   1.073   1.110

Conclusions: We would reject the null hypothesis of no effect of $x$ on $y$. An increase in x is associated with a significant linear increase (positive slope) in the abundance of $y$. Every 1 unit increase in $x$ results in a log 0.0855612 unit increase in $y$. We usually express this in terms of abundance rather than log abundance, so every 1 unit increase in $x$ results in a ($e^{0.0855612}=1.0893282$) 1.0893282 unit increase in the abundance of $y$.

summary(dat.zip.brm)
 Family: zero_inflated_poisson (log) 
Formula: y ~ 0 + spec + main + main:x 
   Data: dat.zip (Number of observations: 20) 
Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 2; 
         total post-warmup samples = 1500
   WAIC: Not computed
 
Fixed Effects: 
       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
spec      -0.25      0.49    -1.27     0.65       1060    1
main       0.70      0.32     0.08     1.30        896    1
main:x     0.09      0.03     0.04     0.13        877    1

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(coef(dat.zip.brm))
            mean
spec   0.7810716
main   2.0043001
main:x 1.0907019
coefs.zip.brm <- as.matrix(as.data.frame(rstan:::extract(dat.zip.brm$fit)))
coefs.zip.brm <- coefs.zip.brm[,grep('b', colnames(coefs.zip.brm))]
plyr:::adply(exp(coefs.zip.brm), 2, function(x) {
  data.frame(Mean=mean(x), median=median(x), HPDinterval(as.mcmc(x)))
  })
        X1      Mean    median     lower    upper
1   b_spec 0.8761436 0.7914704 0.2026179 1.690390
2   b_main 2.1094730 2.0088827 0.9788579 3.463095
3 b_main.x 1.0910609 1.0910342 1.0360825 1.141645
marginal_effects(dat.zip.brm)
plot of chunk tut11.5bS5.15BRMS

Further explorations of the trends

A measure of the strength of the relationship can be obtained according to: $$R^2 = 1 - \frac{RSS_{model}}{RSS_{null}}$$

Alternatively, we could use McFadden's psuedo $$R^2 = 1-\frac{\mathcal{LL}(Model_{full})}{\mathcal{LL}(Model_{reduced})}$$ [[http://www.statisticalhorizons.com/category/uncategorized]] - about a quarter of the way down the page.

Xmat <- model.matrix(~x, dat=dat.zip)
#expected values on a log scale
neta<-coefs %*% t(Xmat)
#expected value on response scale
eta <- exp(neta)
lambda <- sweep(eta,2,ifelse(dat.zip$y==0,0,1),'*')
theta <- dat.zip.jags$BUGSoutput$sims.matrix[,'theta']
expY <- sweep(lambda,2,1-theta,'*')
#calculate the raw SS residuals
SSres <- apply((-1*(sweep(expY,2,dat.zip$y,'-')))^2,1,sum)
mean(SSres)
[1] 112.5743
SSres.null <- sum((dat.zip$y - mean(dat.zip$y))^2)
#calculate the model r2
1-mean(SSres)/SSres.null
[1] 0.4978847

Conclusions: 49.79% of the variation in $y$ abundance can be explained by its relationship with $x$.

Xmat <- model.matrix(~x, dat.zip)
nX <- ncol(Xmat)
dat.zip.list2 <- with(dat.zip,list(Y=y, X=Xmat,N=nrow(dat.zip), nX=nX,
                      mu=rep(0,nX),Sigma=diag(1.0E-06,nX), z=ifelse(y==0,0,1)))
modelString="
model {
  for (i in 1:N) {
     z[i] ~ dbern(one.minus.theta)
     Y[i] ~ dpois(lambda[i])
     lambda[i] <- z[i]*eta[i]
     log(eta[i]) <- max(-20,min(20,inprod(beta[], X[i,])))

	 expY[i] <- lambda[i]*(1-theta)
     res[i] <- Y[i] - expY[i]
     resnull[i] <- Y[i] - meanY
  }
  one.minus.theta <- 1-theta
  logit(theta) <- gamma0
  gamma0 ~ dnorm(0,1.0E-06)
  beta ~ dmnorm(mu[],Sigma[,])

  meanY <- mean(Y)
  RSS <- sum(res^2)
  RSSnull <- sum(resnull^2)
  r2 <- 1-RSS/RSSnull
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS5.17.bug')
library(R2jags)
system.time(
dat.ZIP.jags4 <- jags(model='../downloads/BUGSscripts/tut11.5bS5.17.bug', data=dat.zip.list2, inits=NULL,
          param=c('beta','r2'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 277

Initializing model
   user  system elapsed 
  1.072   0.000   1.075 
print(dat.ZIP.jags4)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.5bS5.17.bug", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta[1]    0.724   0.313  0.064  0.522  0.737  0.941  1.304 1.003   880
beta[2]    0.085   0.025  0.037  0.067  0.084  0.101  0.134 1.002  1200
r2         0.492   0.176  0.113  0.377  0.504  0.622  0.787 1.002  1800
deviance  75.612   2.516 72.833 73.816 74.930 76.635 82.063 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 3.2 and DIC = 78.8
DIC is an estimate of expected predictive error (lower deviance is better).
Xmat <- model.matrix(~x, dat=dat.zip)
#expected values on a log scale
coefs.zip <- as.matrix(as.data.frame(rstan:::extract(dat.zip.brm$fit)))
coefs.zip <- coefs.zip[,grep('b.main', colnames(coefs.zip))]
neta<-coefs.zip %*% t(Xmat)
#expected value on response scale
eta <- exp(neta)

lambda <- sweep(eta,2,ifelse(dat.zip$y==0,0,1),'*')
theta <- dat.zip.jags$BUGSoutput$sims.matrix[,'theta']
expY <- sweep(lambda,2,1-theta,'*')
#calculate the raw SS residuals
SSres <- apply((-1*(sweep(expY,2,dat.zip$y,'-')))^2,1,sum)

SSres.null <- sum((dat.zip$y - mean(dat.zip$y))^2)
#OR 
SSres.null <- crossprod(dat.zip$y - mean(dat.zip$y))
#calculate the model r2
1-mean(SSres)/SSres.null
         [,1]
[1,] 0.497461

Finally, we will create a summary plot.

par(mar = c(4, 5, 0, 0))
plot(y ~ x, data = dat.zip, type = "n", ann = F, axes = F)
points(y ~ x, data = dat.zip, pch = 16)
xs <- seq(min(dat.zip$x,na.rm=TRUE),max(dat.zip$x,na.rm=TRUE), l = 1000)
Xmat <- model.matrix(~xs)
eta<-coefs %*% t(Xmat)
ys <- exp(eta)
library(plyr)
library(coda)
data.tab <- adply(ys,2,function(x) {
  data.frame(Median=median(x), HPDinterval(as.mcmc(x)))
})
data.tab <- cbind(x=xs,data.tab)
points(Median ~ x, data=data.tab,col = "black", type = "l")
lines(lower ~ x, data=data.tab,col = "black", type = "l", lty = 2)
lines(upper ~ x, data=data.tab,col = "black", type = "l", lty = 2)

axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")
plot of chunk tut11.5bS5.18
newdata = data.frame(x=seq(min(dat.zip$x), max(dat.zip$x), len=100))
Xmat = model.matrix(~x, data=newdata)
coefs <- as.matrix(as.data.frame(rstan:::extract(dat.zip.brm$fit)))
coefs <- coefs[,grep('b_main', colnames(coefs))]
fit = exp(coefs %*% t(Xmat))
newdata = cbind(newdata,
                plyr:::adply(fit, 2, function(x) {
                                  data.frame(Mean=mean(x), Median=median(x),
                                             HPDinterval(as.mcmc(x)))
                                })
)

ggplot(newdata, aes(y=Mean, x=x)) +
  geom_point(data=dat.zip, aes(y=y)) +
  geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue',alpha=0.2) +
  geom_line() +
  scale_x_continuous('X') +
  scale_y_continuous('Abundance of Y') +
  theme_classic() +
  theme(axis.line.x=element_line(),axis.line.y=element_line())
plot of chunk tut11.5bS1.18BRMS

Defining full log-likelihood function

Now lets try it by specifying log-likelihood and the zero trick. When applying this trick, we need to manually calculate the deviance as the inbuilt deviance will be based on the log-likelihood of estimating the zeros (as part of the zero trick) rather than the deviance of the intended model..

The one advantage of the zero trick is that the Deviance and thus DIC, AIC provided by R2jags will be incorrect. Hence, they too need to be manually defined within jags. I suspect that the AIC calculation I have used is incorrect...

Xmat <- model.matrix(~x, dat.zip)
nX <- ncol(Xmat)
dat.zip.list2 <- with(dat.zip,list(Y=y, X=Xmat,N=nrow(dat.zip), mu=rep(0,nX),
                  Sigma=diag(1.0E-06,nX), zeros=rep(0,nrow(dat)), C=10000))
modelString="
model {
  for (i in 1:N) {
     zeros[i] ~ dpois(zeros.lambda[i])
     zeros.lambda[i] <- -ll[i] + C     
     ll[i] <- Y[i]*log(lambda[i]) - lambda[i] - loggam(Y[i]+1)
     eta[i] <- inprod(beta[], X[i,])
     log(lambda[i]) <- eta[i]
    llm[i] <- Y[i]*log(meanlambda) - meanlambda - loggam(Y[i]+1)
  }
  meanlambda <- mean(lambda)
  beta ~ dmnorm(mu[],Sigma[,])
  dev <- sum(-2*ll)
  pD <- mean(dev)-sum(-2*llm)
  AIC <- min(dev+(2*pD))
} 
"
writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS1.42.bug')
library(R2jags)
system.time(
dat.ZIP.jags3 <- jags(model=textConnection(modelString),
          data=dat.zip.list2, inits=NULL,
          param=c('beta','dev','AIC'),
          n.chain=3,
          n.iter=50000, n.thin=50, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 328

Initializing model
   user  system elapsed 
  1.832   0.000   1.839 
print(dat.ZIP.jags3)
Inference for Bugs model at "5", fit using jags,
 3 chains, each with 50000 iterations (first 10000 discarded), n.thin = 50
 n.sims = 2400 iterations saved
           mu.vect sd.vect       2.5%       25%       50%       75%     97.5%  Rhat n.eff
AIC      1.215e+02   4.532    117.557 1.185e+02 1.200e+02 1.229e+02 1.336e+02 1.001  2400
beta[1]  6.440e-01   0.336     -0.045 4.270e-01 6.580e-01 8.750e-01 1.252e+00 1.005   480
beta[2]  3.000e-02   0.027     -0.022 1.100e-02 3.000e-02 4.700e-02 8.600e-02 1.005   480
dev      1.220e+02   2.038    120.044 1.206e+02 1.214e+02 1.228e+02 1.274e+02 1.001  2400
deviance 4.001e+05   2.038 400120.044 4.001e+05 4.001e+05 4.001e+05 4.001e+05 1.000     1

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 2.1 and DIC = 400124.1
DIC is an estimate of expected predictive error (lower deviance is better).

Zero-inflated negative binomial (ZINB)

Scenario and Data

Lets say we wanted to model the abundance of an item ($y$) against a continuous predictor ($x$). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details are optional and can be safely skipped. Consequently, I have folded (toggled) this section away.

Random data incorporating the following trends (effect parameters)
  • the sample size = 20
  • the continuous $x$ variable is a random uniform spread of measurements between 1 and 20
  • the rate of change in log $y$ per unit of $x$ (slope) = 0.1.
  • the value of $x$ when log$y$ equals 0 (when $y$=1)
  • to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor (created by calculating the outer product of the model matrix and the regression parameters). These expected values are then transformed into a scale mapped by (0,$\infty$) by using the log function $e^{linear~predictor}$
  • finally, we generate $y$ values by using the expected $y$ values ($\lambda$) as probabilities when drawing random numbers from a Poisson distribution. This step adds random noise to the expected $y$ values and returns only 0's and positive integers.
set.seed(37) #34.5  #4 #10 #16 #17 #26
#The number of samples
n.x <- 20
#Create x values that at uniformly distributed throughout the rate of 1 to 20
x <- sort(runif(n = n.x, min = 1, max =20))
mm <- model.matrix(~x)
intercept <- 0.6
slope=0.1
#The linear predictor
linpred <- mm %*% c(intercept,slope)
#Predicted y values
lambda <- exp(linpred)
#Add some noise and make binomial
library(gamlss.dist)
#fixed latent binomial
y<- rZINBI(n.x,lambda, 0.4)
#latent binomial influenced by the linear predictor 
#y<- rZINB(n.x,lambda, 1-exp(linpred)/(1+exp(linpred)))
dat.zinb <- data.frame(y,x)

summary(dat.glm.nb<-glm.nb(y~x, dat.zinb))
Call:
glm.nb(formula = y ~ x, data = dat.zinb, init.theta = 0.4646673144, 
    link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3578  -1.3455  -0.5069   0.3790   1.1809  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.914191   0.796804   1.147    0.251
x           0.009149   0.067713   0.135    0.893

(Dispersion parameter for Negative Binomial(0.4647) family taken to be 1)

    Null deviance: 20.303  on 19  degrees of freedom
Residual deviance: 20.282  on 18  degrees of freedom
AIC: 90.365

Number of Fisher Scoring iterations: 1


              Theta:  0.465 
          Std. Err.:  0.218 

 2 x log-likelihood:  -84.365 
plot(glm.nb(y~x, dat.zinb))
plot of chunk tut11.5bS6.1
plot of chunk tut11.5bS6.1
plot of chunk tut11.5bS6.1
plot of chunk tut11.5bS6.1
library(pscl)
summary(dat.zeroinfl<-zeroinfl(y ~ x | 1, dist = "negbin", data = dat.zinb))
Call:
zeroinfl(formula = y ~ x | 1, data = dat.zinb, dist = "negbin")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-0.9609 -0.9268 -0.4446  1.0425  1.7556 

Count model coefficients (negbin with log link):
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  0.92733    0.32507   2.853  0.00433 **
x            0.06870    0.02755   2.494  0.01263 * 
Log(theta)   3.36066    3.59739   0.934  0.35020   

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.2250     0.4559  -0.494    0.622
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 28.8082 
Number of iterations in BFGS optimization: 17 
Log-likelihood: -38.54 on 4 Df
plot(resid(zeroinfl(y ~ x | 1, dist = "negbin", data = dat.zinb))~fitted(zeroinfl(y ~ x | 1, dist = "negbin", data = dat.zinb)))
plot of chunk tut11.5bS6.1
vuong(dat.glm.nb, dat.zeroinfl)
Vuong Non-Nested Hypothesis Test-Statistic: 
(test-statistic is asymptotically distributed N(0,1) under the
 null that the models are indistinguishible)
-------------------------------------------------------------
              Vuong z-statistic             H_A p-value
Raw                  -1.2809521 model2 > model1 0.10011
AIC-corrected        -0.9296587 model2 > model1 0.17627
BIC-corrected        -0.7547616 model2 > model1 0.22520
library(gamlss)
summary(gamlss(y~x, data=dat.zinb, family='ZINBI'))
GAMLSS-RS iteration 1: Global Deviance = 81.436 
GAMLSS-RS iteration 2: Global Deviance = 78.1917 
GAMLSS-RS iteration 3: Global Deviance = 77.0798 
GAMLSS-RS iteration 4: Global Deviance = 77.0726 
GAMLSS-RS iteration 5: Global Deviance = 77.0725 
*******************************************************************
Family:  c("ZINBI", "Zero inflated negative binomial type I") 

Call:  gamlss(formula = y ~ x, family = "ZINBI", data = dat.zinb) 

Fitting method: RS() 

-------------------------------------------------------------------
Mu link function:  log
Mu Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.92653    0.32502   2.851   0.0116 *
x            0.06880    0.02753   2.499   0.0237 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

-------------------------------------------------------------------
Sigma link function:  log
Sigma Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -3.363      3.603  -0.933    0.365

-------------------------------------------------------------------
Nu link function:  logit 
Nu Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.2250     0.4559  -0.494    0.628

-------------------------------------------------------------------
No. of observations in the fit:  20 
Degrees of Freedom for the fit:  4
      Residual Deg. of Freedom:  16 
                      at cycle:  5 
 
Global Deviance:     77.0725 
            AIC:     85.0725 
            SBC:     89.05543 
*******************************************************************
summary(gamlss(y~x, nu.fo=y~x,data=dat.zinb, family='ZINBI'))
GAMLSS-RS iteration 1: Global Deviance = 78.2478 
GAMLSS-RS iteration 2: Global Deviance = 74.2622 
GAMLSS-RS iteration 3: Global Deviance = 73.8329 
GAMLSS-RS iteration 4: Global Deviance = 73.8305 
GAMLSS-RS iteration 5: Global Deviance = 73.8305 
*******************************************************************
Family:  c("ZINBI", "Zero inflated negative binomial type I") 

Call:  gamlss(formula = y ~ x, nu.formula = y ~ x, family = "ZINBI",      data = dat.zinb) 

Fitting method: RS() 

-------------------------------------------------------------------
Mu link function:  log
Mu Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.84246    0.35267   2.389   0.0305 *
x            0.07481    0.02933   2.550   0.0222 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

-------------------------------------------------------------------
Sigma link function:  log
Sigma Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2.982      2.844  -1.048    0.311

-------------------------------------------------------------------
Nu link function:  logit 
Nu Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.4988     1.8283  -1.367    0.192
x             0.1996     0.1417   1.408    0.179

-------------------------------------------------------------------
No. of observations in the fit:  20 
Degrees of Freedom for the fit:  5
      Residual Deg. of Freedom:  15 
                      at cycle:  5 
 
Global Deviance:     73.83046 
            AIC:     83.83046 
            SBC:     88.80912 
*******************************************************************
library(INLA)
summary(inla(y~x,data=dat.zinb, family='zeroinflatednbinomial1'))
Call:
"inla(formula = y ~ x, family = \"zeroinflatednbinomial1\", data = dat.zinb)"

Time used:
 Pre-processing    Running inla Post-processing           Total 
         0.1669          0.0427          0.0285          0.2380 

Fixed effects:
              mean     sd 0.025quant 0.5quant 0.975quant   mode kld
(Intercept) 0.9374 0.7147    -0.4199   0.9152     2.4282 0.8816   0
x           0.0259 0.0625    -0.0982   0.0257     0.1508 0.0254   0

The model has no random effects

Model hyperparameters:
                                                           mean     sd 0.025quant 0.5quant 0.975quant   mode
size for nbinomial zero-inflated observations            0.6180 0.3059     0.1785   0.5699     1.3497 0.4477
zero-probability parameter for zero-inflated nbinomial_1 0.1743 0.1284     0.0142   0.1456     0.4896 0.0431

Expected number of effective parameters(std dev): 2.00(0.00)
Number of equivalent replicates : 9.999 

Marginal log-Likelihood:  -49.48 

Exploratory data analysis and initial assumption checking

The assumptions are:
  1. All of the observations are independent - this must be addressed at the design and collection stages
  2. The response variable (and thus the residuals) should be matched by an appropriate distribution (in the case of positive integers response - a Poisson is appropriate).
  3. All observations are equally influential in determining the trends - or at least no observations are overly influential. This is most effectively diagnosed via residuals and other influence indices and is very difficult to diagnose prior to analysis
  4. the relationship between the linear predictor (right hand side of the regression formula) and the link function should be linear. A scatterplot with smoother can be useful for identifying possible non-linearity.
  5. Dispersion is either 1 or overdispersion is otherwise accounted for in the model
  6. The number of zeros is either not excessive or else they are specifically addressed by the model

Confirm non-normality and explore clumping

Check the distribution of the $y$ abundances

hist(dat.zinb$y)
plot of chunk tut10.6bS6.2
boxplot(dat.zinb$y, horizontal=TRUE)
rug(jitter(dat.zinb$y))
plot of chunk tut10.6bS6.2
There is definitely signs of non-normality that would warrant Poisson or negative binomial models. Further to that, there appears to be a large number of zeros and a possible clumpiness that are likely to be the cause of overdispersion A zero-inflated negative binomial model is likely to be one of the most effective for modeling these data.

Confirm linearity

Lets now explore linearity by creating a histogram of the predictor variable ($x$). Note, it is difficult to directly assess issues of linearity. Indeed, a scatterplot with lowess smoother will be largely influenced by the presence of zeros. One possible way of doing so is to explore the trend in the non-zero data.

hist(dat.zinb$x)
plot of chunk tut10.5bS6.3
#now for the scatterplot
plot(y~x, dat.zinb, log="y")
with(subset(dat.zinb,y>0), lines(lowess(y~x)))
plot of chunk tut10.5bS6.3

Conclusions: the predictor ($x$) does not display any skewness or other issues that might lead to non-linearity. The lowess smoother on the non-zero data cloud does not display major deviations from a straight line and thus linearity is likely to be satisfied. Violations of linearity (whilst difficult to be certain about due to the unknown influence of the zeros) could be addressed by either:

  • define a non-linear linear predictor (such as a polynomial, spline or other non-linear function)
  • transform the scale of the predictor variables

Explore zero inflation

Although we have already established that there are few zeros in the data (and thus overdispersion is unlikely to be an issue), we can also explore this by comparing the number of zeros in the data to the number of zeros that would be expected from a Poisson distribution with a mean equal to the mean count of the data.

#proportion of 0's in the data
dat.zinb.tab<-table(dat.zinb$y==0)
dat.zinb.tab/sum(dat.zinb.tab)
FALSE  TRUE 
 0.55  0.45 
#proportion of 0's expected from a Poisson distribution
mu <- mean(dat.zinb$y)
v <- var(dat.zinb$y)
size <- mu + (mu^2)/v
cnts <- rnbinom(1000, mu=mu, size=size)
dat.zinb.tabE <- table(cnts == 0)
dat.zinb.tabE/sum(dat.zinb.tabE)
FALSE  TRUE 
 0.86  0.14 
In the above, the value under FALSE is the proportion of non-zero values in the data and the value under TRUE is the proportion of zeros in the data. In this example, the proportion of zeros observed (45%) far exceeds that that would have been expected (14%). Hence it is highly likely that any models will be zero-inflated.

Model fitting or statistical analysis

The exploratory data analyses have suggested that a zero-inflated Poisson model might be the most appropriate for these data.

dat.zinb.list <- with(dat.zinb,list(Y=y, X=x,N=nrow(dat.zinb)))
modelString="
model {
  for (i in 1:N) {
     Y[i] ~ dnegbin(p[i],size)
     p[i] <- size/(size+mu[i])
     eta[i] <- beta0 + beta1*X[i]
     log(mu[i]) <- eta[i]
  }
  size ~ dunif(0.001, 5)
  beta0 ~ dnorm(0,1.0E-06)
  beta1 ~ dnorm(0,1.0E-06)
} 
"

#writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS5.4.bug')
library(R2jags)
system.time(
dat.zinb.jags <- jags(model.file=textConnection(modelString), data=dat.zinb.list, inits=NULL,
          param=c('beta0','beta1', 'size', 'theta'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 148

Initializing model
   user  system elapsed 
 11.001   0.004  11.021 
print(dat.zinb.jags)
Inference for Bugs model at "5", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta0      1.009   0.826 -0.441  0.450  0.953  1.492  2.859 1.001  2300
beta1      0.010   0.069 -0.125 -0.036  0.010  0.052  0.148 1.001  2000
size       0.545   0.297  0.183  0.348  0.478  0.674  1.237 1.002  1500
deviance  87.531   2.572 84.591 85.630 86.886 88.682 94.252 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 3.3 and DIC = 90.8
DIC is an estimate of expected predictive error (lower deviance is better).
#extract the samples for the two model parameters
coefs <- dat.zinb.jags$BUGSoutput$sims.matrix[,1:2]
Xmat <- model.matrix(~x, data=dat)
#expected values on a log scale
eta<-coefs %*% t(Xmat)
#expected value on response scale
lambda <- exp(eta)

Resid <- -1*sweep(lambda,2,dat.zinb$y, '-')/sqrt(lambda)
RSS <- apply(Resid^2, 1, sum)
Disp <- RSS/(nrow(dat.zinb)-ncol(coefs))
data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)), HPDinterval(as.mcmc(Disp),p=0.5))
       Median    Mean    lower    upper  lower.1  upper.1
var1 5.057138 5.85636 3.303402 10.79281 3.304139 5.058315

JAGS

$$ Y_i = \left\{ \begin{array}{lrcl} 0~\text{with}~P(y_i) = 1-\mu & z_i&\sim&\text{Binom}(1-\theta)\\ & logit(\theta)&=&\gamma_0\\ &\gamma_0&\sim{}&\mathcal{N}(0,10000)\\ >0 & Y_i&\sim&\text{NB}(\lambda_i)\\ &\lambda_i&=&z_i + \eta_i\\ &log(\eta_i)&=&\beta_0+\beta_1x_1\\ &\beta_0, \beta_1&\sim{}&\mathcal{N}(0,10000)\\ \end{array} \right. $$ Or in shorthand: $$ \begin{align} Y_i&\sim{}ZINB(\lambda,\theta) & (\text{response distribution})\\ logit(\theta)&=\gamma_0 & (\text{link function/linear predictor - zero component})\\ log(\lambda_i)&=\eta_i & (\text{link function - count component})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor - count component})\\ \beta_0, \beta_1, \gamma_0&\sim{}\mathcal{N}(0,10000) & (\text{diffuse Bayesian priors})\\ \end{align} $$
dat.zinb.list <- with(dat.zinb,list(Y=y, X=x,N=nrow(dat.zinb),z=ifelse(y==0,0,1)))
modelString="
model {
  for (i in 1:N) {
     z[i] ~ dbern(psi.min)
     Y[i] ~ dnegbin(p[i],size)
     p[i] <- size/(size+mu.eff[i])
     mu.eff[i] <- z[i]*mu[i]
     eta[i] <- beta0 + beta1*X[i]
     log(mu[i]) <- eta[i]
  }
  gamma ~ dnorm(0,0.001)
  psi.min <- min(0.9999, max(0.00001, (1-psi)))
  logit(psi) <- max(-20, min(20, gamma))
  size ~ dunif(0.001, 5)
  theta <- pow(1/mean(p),2)
  beta0 ~ dnorm(0,1.0E-06)
  beta1 ~ dnorm(0,1.0E-06)
} 
"

system.time(
dat.zinb.jags <- jags(model.file=textConnection(modelString), data=dat.zinb.list, inits=NULL,
          param=c('beta0','beta1', 'size', 'theta'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 205

Initializing model
   user  system elapsed 
 12.241   0.000  12.273 
print(dat.zinb.jags)
Inference for Bugs model at "5", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta0      0.961   0.449  0.077  0.661  0.962  1.258  1.851 1.001  2300
beta1      0.067   0.041 -0.009  0.039  0.066  0.095  0.152 1.001  3000
size       3.503   1.010  1.402  2.781  3.647  4.348  4.930 1.001  3000
theta      2.196   0.367  1.718  1.938  2.109  2.363  3.145 1.001  3000
deviance  82.682   2.814 79.127 80.610 82.086 84.011 89.900 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 4.0 and DIC = 86.6
DIC is an estimate of expected predictive error (lower deviance is better).
Xmat <- model.matrix(~x, dat.zip)
nX <- ncol(Xmat)
dat.zip.list1 <- with(dat.zip,list(Y=y, X=Xmat,N=nrow(dat.zip), nX=nX, z=ifelse(y==0,0,1)))
modelString="
model {
  for (i in 1:N) {
     z[i] ~ dbern(psi.min)
     Y[i] ~ dnegbin(p[i],size)
     p[i] <- size/(size+mu.eff[i])
     mu.eff[i] <- z[i]*mu[i]
     log(mu[i]) <- inprod(beta[], X[i,])
  }
  gamma ~ dnorm(0,0.001)
  psi.min <- min(0.9999, max(0.00001, (1-psi)))
  logit(psi) <- max(-20, min(20, gamma))
  size ~ dunif(0.001, 5)
  theta <- pow(1/mean(p),2)
  for (i in 1:nX) {
    beta[i] ~ dnorm(0,1.0E-06)
  }
} 
"
library(R2jags)
system.time(
dat.zip.jags1 <- jags(model.file=textConnection(modelString), data=dat.zip.list1, inits=NULL,
          param=c('beta','gamma0'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 227

Initializing model
   user  system elapsed 
 12.389   0.004  12.411 
print(dat.zip.jags1)
Inference for Bugs model at "5", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta[1]    0.753   0.453 -0.143  0.448  0.743  1.069  1.633 1.001  3000
beta[2]    0.084   0.041  0.006  0.056  0.084  0.111  0.167 1.001  3000
deviance  80.373   2.924 76.666 78.234 79.724 81.807 87.799 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 4.3 and DIC = 84.7
DIC is an estimate of expected predictive error (lower deviance is better).

Or arguably better still, use a multivariate normal prior. If we have a $k$ regression parameters ($\beta_k$), then the multivariate normal priors are defined as: $$ \boldsymbol{\beta}\sim{}\mathcal{N_k}(\boldsymbol{\mu}, \mathbf{\Sigma}) $$ where $$\boldsymbol{\mu}=[E[\beta_1],E[\beta_2],...,E[\beta_k]] = \left(\begin{array}{c}0\\\vdots\\0\end{array}\right)$$ $$ \mathbf{\Sigma}=[Cov[\beta_i, \beta_j]] = \left(\begin{array}{ccc}1000^2&0&0\\0&\ddots&0\\0&0&1000^2\end{array} \right) $$ hence, along with the response and predictor matrix, we need to supply $\boldsymbol{\mu}$ (a vector of zeros) and $\boldsymbol{\Sigma}$ (a covariance matrix with $1000^2$ in the diagonals).

Xmat <- model.matrix(~x, dat.zip)
nX <- ncol(Xmat)
dat.zip.list2 <- with(dat.zip,list(Y=y, X=Xmat,N=nrow(dat.zip), nX=nX,
                      mu=rep(0,nX),Sigma=diag(1.0E-06,nX), z=ifelse(y==0,0,1)))
modelString="
model {
  for (i in 1:N) {
     z[i] ~ dbern(psi.min)
     Y[i] ~ dnegbin(p[i],size)
     p[i] <- size/(size+mu.eff[i])
     mu.eff[i] <- z[i]*eta[i]
     log(eta[i]) <- inprod(beta[], X[i,])
  }
  gamma ~ dnorm(0,0.001)
  psi.min <- min(0.9999, max(0.00001, (1-psi)))
  logit(psi) <- max(-20, min(20, gamma))
  size ~ dunif(0.001, 5)
  theta <- pow(1/mean(p),2)
  beta ~ dmnorm(mu[],Sigma[,])
} 
"
library(R2jags)
system.time(
dat.zip.jags2 <- jags(model.file=textConnection(modelString), data=dat.zip.list2, inits=NULL,
          param=c('beta','gamma','size','theta'),
          n.chain=3,
          n.iter=20000, n.thin=10, n.burnin=10000)
  )
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 231

Initializing model
   user  system elapsed 
  5.220   0.004   5.234 
print(dat.zip.jags2)
Inference for Bugs model at "5", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta[1]    0.757   0.453 -0.155  0.453  0.772  1.063  1.620 1.001  3000
beta[2]    0.084   0.041  0.007  0.056  0.084  0.112  0.168 1.001  3000
gamma     -0.201   0.455 -1.131 -0.505 -0.186  0.104  0.655 1.003   940
size       3.661   0.954  1.563  3.006  3.830  4.462  4.946 1.002  1300
theta      2.104   0.320  1.673  1.884  2.039  2.249  2.938 1.003   850
deviance  80.435   2.902 76.659 78.327 79.800 81.843 87.741 1.002  1300

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 4.2 and DIC = 84.6
DIC is an estimate of expected predictive error (lower deviance is better).
dat.zinb.brm <- brm(y~0+spec + main + main:x, data=dat.zinb, family='zero_inflated_negbinomial',
  prior = c(set_prior("normal(0,10)", class="b", coef="main"),
            set_prior("normal(0,100)", class="b", coef="spec"),
            set_prior("student_t(3,0,5)", class='shape')),
  chains=3, iter=5000, warmup=2500, thin=2)
SAMPLING FOR MODEL 'zero_inflated_negbinomial(log) brms-model' NOW (CHAIN 1).

Chain 1, Iteration:    1 / 5000 [  0%]  (Warmup)
Chain 1, Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 1, Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 1, Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 1, Iteration: 2000 / 5000 [ 40%]  (Warmup)
Chain 1, Iteration: 2500 / 5000 [ 50%]  (Warmup)
Chain 1, Iteration: 2501 / 5000 [ 50%]  (Sampling)
Chain 1, Iteration: 3000 / 5000 [ 60%]  (Sampling)
Chain 1, Iteration: 3500 / 5000 [ 70%]  (Sampling)
Chain 1, Iteration: 4000 / 5000 [ 80%]  (Sampling)
Chain 1, Iteration: 4500 / 5000 [ 90%]  (Sampling)
Chain 1, Iteration: 5000 / 5000 [100%]  (Sampling)# 
#  Elapsed Time: 1.52902 seconds (Warm-up)
#                0.555669 seconds (Sampling)
#                2.08469 seconds (Total)
# 

SAMPLING FOR MODEL 'zero_inflated_negbinomial(log) brms-model' NOW (CHAIN 2).

Chain 2, Iteration:    1 / 5000 [  0%]  (Warmup)
Chain 2, Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 2, Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 2, Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 2, Iteration: 2000 / 5000 [ 40%]  (Warmup)
Chain 2, Iteration: 2500 / 5000 [ 50%]  (Warmup)
Chain 2, Iteration: 2501 / 5000 [ 50%]  (Sampling)
Chain 2, Iteration: 3000 / 5000 [ 60%]  (Sampling)
Chain 2, Iteration: 3500 / 5000 [ 70%]  (Sampling)
Chain 2, Iteration: 4000 / 5000 [ 80%]  (Sampling)
Chain 2, Iteration: 4500 / 5000 [ 90%]  (Sampling)
Chain 2, Iteration: 5000 / 5000 [100%]  (Sampling)# 
#  Elapsed Time: 1.33141 seconds (Warm-up)
#                0.635069 seconds (Sampling)
#                1.96648 seconds (Total)
# 

SAMPLING FOR MODEL 'zero_inflated_negbinomial(log) brms-model' NOW (CHAIN 3).

Chain 3, Iteration:    1 / 5000 [  0%]  (Warmup)
Chain 3, Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 3, Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 3, Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 3, Iteration: 2000 / 5000 [ 40%]  (Warmup)
Chain 3, Iteration: 2500 / 5000 [ 50%]  (Warmup)
Chain 3, Iteration: 2501 / 5000 [ 50%]  (Sampling)
Chain 3, Iteration: 3000 / 5000 [ 60%]  (Sampling)
Chain 3, Iteration: 3500 / 5000 [ 70%]  (Sampling)
Chain 3, Iteration: 4000 / 5000 [ 80%]  (Sampling)
Chain 3, Iteration: 4500 / 5000 [ 90%]  (Sampling)
Chain 3, Iteration: 5000 / 5000 [100%]  (Sampling)# 
#  Elapsed Time: 1.19223 seconds (Warm-up)
#                2.45475 seconds (Sampling)
#                3.64698 seconds (Total)
# 
## Or if we allow the zero-inflation model and the count model to vary with x
dat.zinb.brm1 <- brm(y~trait:x, data=dat.zinb, family='zero_inflated_negbinomial',
  prior = c(set_prior("normal(0,100)", class="Intercept"),
            set_prior("normal(0,10)", class="b", coef="traity:x"),
            set_prior("normal(0,10)", class="b", coef="traitzi_y:x"),
            set_prior("student_t(3,0,5)", class='shape')),
  chains=3, iter=5000, warmup=2500, thin=2)
SAMPLING FOR MODEL 'zero_inflated_negbinomial(log) brms-model' NOW (CHAIN 1).

Chain 1, Iteration:    1 / 5000 [  0%]  (Warmup)
Chain 1, Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 1, Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 1, Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 1, Iteration: 2000 / 5000 [ 40%]  (Warmup)
Chain 1, Iteration: 2500 / 5000 [ 50%]  (Warmup)
Chain 1, Iteration: 2501 / 5000 [ 50%]  (Sampling)
Chain 1, Iteration: 3000 / 5000 [ 60%]  (Sampling)
Chain 1, Iteration: 3500 / 5000 [ 70%]  (Sampling)
Chain 1, Iteration: 4000 / 5000 [ 80%]  (Sampling)
Chain 1, Iteration: 4500 / 5000 [ 90%]  (Sampling)
Chain 1, Iteration: 5000 / 5000 [100%]  (Sampling)# 
#  Elapsed Time: 8.43611 seconds (Warm-up)
#                12.464 seconds (Sampling)
#                20.9001 seconds (Total)
# 

SAMPLING FOR MODEL 'zero_inflated_negbinomial(log) brms-model' NOW (CHAIN 2).

Chain 2, Iteration:    1 / 5000 [  0%]  (Warmup)
Chain 2, Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 2, Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 2, Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 2, Iteration: 2000 / 5000 [ 40%]  (Warmup)
Chain 2, Iteration: 2500 / 5000 [ 50%]  (Warmup)
Chain 2, Iteration: 2501 / 5000 [ 50%]  (Sampling)
Chain 2, Iteration: 3000 / 5000 [ 60%]  (Sampling)
Chain 2, Iteration: 3500 / 5000 [ 70%]  (Sampling)
Chain 2, Iteration: 4000 / 5000 [ 80%]  (Sampling)
Chain 2, Iteration: 4500 / 5000 [ 90%]  (Sampling)
Chain 2, Iteration: 5000 / 5000 [100%]  (Sampling)# 
#  Elapsed Time: 5.60244 seconds (Warm-up)
#                8.82955 seconds (Sampling)
#                14.432 seconds (Total)
# 

SAMPLING FOR MODEL 'zero_inflated_negbinomial(log) brms-model' NOW (CHAIN 3).

Chain 3, Iteration:    1 / 5000 [  0%]  (Warmup)
Chain 3, Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 3, Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 3, Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 3, Iteration: 2000 / 5000 [ 40%]  (Warmup)
Chain 3, Iteration: 2500 / 5000 [ 50%]  (Warmup)
Chain 3, Iteration: 2501 / 5000 [ 50%]  (Sampling)
Chain 3, Iteration: 3000 / 5000 [ 60%]  (Sampling)
Chain 3, Iteration: 3500 / 5000 [ 70%]  (Sampling)
Chain 3, Iteration: 4000 / 5000 [ 80%]  (Sampling)
Chain 3, Iteration: 4500 / 5000 [ 90%]  (Sampling)
Chain 3, Iteration: 5000 / 5000 [100%]  (Sampling)# 
#  Elapsed Time: 8.60118 seconds (Warm-up)
#                18.1576 seconds (Sampling)
#                26.7588 seconds (Total)
# 

Chain mixing and Model validation

Prior to exploring the model parameters, it is prudent to confirm that the model did indeed fit the assumptions and was an appropriate fit to the data as well as that the MCMC sampling chain was adequately mixed and the retained samples independent.

  • We will start by exploring the mixing of the MCMC chains via traceplots.
    plot(as.mcmc(dat.zinb.jags))
    
    plot of chunk tut10.6bS6.8
    plot of chunk tut10.6bS6.8
    library(gridExtra)
    grid.arrange(stan_trace(dat.zinb.brm$fit, ncol=1),
                 stan_dens(dat.zinb.brm$fit, separate_chains=TRUE,ncol=1),
                 ncol=2)
    
    plot of chunk tut10.6bS6.8BRMS

    The chains appear well mixed and stable

  • Next we will explore correlation amongst MCMC samples.
    autocorr.diag(as.mcmc(dat.zinb.jags))
    
                   beta0       beta1     deviance         size       theta
    Lag 0    1.000000000 1.000000000  1.000000000  1.000000000  1.00000000
    Lag 10   0.157707113 0.164120975  0.046293478  0.005759824 -0.01817507
    Lag 50  -0.000258746 0.023917373 -0.002086921  0.005145297 -0.01124115
    Lag 100  0.001276269 0.002424773 -0.017995644 -0.035551481 -0.03951204
    Lag 500 -0.007191282 0.005616019 -0.021469518 -0.044121841 -0.05450130
    

    The level of auto-correlation at the nominated lag of 10 is higher than we would generally like. It is worth increasing the thinning rate from 10 to 50. Obviously, to support this higher thinning rate, we would also increase the number of iterations.

    library(R2jags)
    modelString="
    model {
      for (i in 1:N) {
         z[i] ~ dbern(psi.min)
         Y[i] ~ dnegbin(p[i],size)
         p[i] <- size/(size+mu.eff[i])
         mu.eff[i] <- z[i]*mu[i]
         eta[i] <- beta0 + beta1*X[i]
         log(mu[i]) <- eta[i]
      }
      gamma ~ dnorm(0,0.001)
      psi.min <- min(0.9999, max(0.00001, (1-psi)))
      logit(psi) <- max(-20, min(20, gamma))
      size ~ dunif(0.001, 5)
      theta <- pow(1/mean(p),2)
      beta0 ~ dnorm(0,1.0E-06)
      beta1 ~ dnorm(0,1.0E-06)
    } 
    "
    dat.zinb.jags <- jags(data=dat.zinb.list,model.file=textConnection(modelString),
                       param=c('beta0','beta1','gamma','theta','psi'),
                       n.chains=3, n.iter=100000, n.burnin=50000, n.thin=50)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 205
    
    Initializing model
    
    print(dat.zinb.jags)
    
    Inference for Bugs model at "5", fit using jags,
     3 chains, each with 1e+05 iterations (first 50000 discarded), n.thin = 50
     n.sims = 3000 iterations saved
             mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
    beta0      0.973   0.453  0.058  0.679  0.961  1.269  1.882 1.001  3000
    beta1      0.066   0.042 -0.014  0.037  0.065  0.094  0.151 1.001  3000
    gamma     -0.210   0.462 -1.133 -0.521 -0.202  0.090  0.674 1.002  1200
    psi        0.450   0.109  0.244  0.373  0.450  0.522  0.662 1.003  1000
    theta      2.190   0.363  1.724  1.939  2.102  2.360  3.133 1.001  3000
    deviance  82.759   2.824 79.157 80.658 82.132 84.199 89.605 1.001  3000
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 4.0 and DIC = 86.7
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    plot(as.mcmc(dat.zinb.jags))
    
    plot of chunk tut10.6bS6.10
    plot of chunk tut10.6bS6.10
    autocorr.diag(as.mcmc(dat.zinb.jags))
    
                    beta0         beta1     deviance        gamma          psi         theta
    Lag 0     1.000000000  1.0000000000  1.000000000  1.000000000  1.000000000  1.0000000000
    Lag 50    0.008803828  0.0071072063  0.009745184  0.041114460  0.041195791 -0.0009151277
    Lag 250   0.003371794  0.0007720164 -0.006544352  0.012836461  0.014694728  0.0298401324
    Lag 500  -0.027820267 -0.0299358791  0.004795827 -0.005996009 -0.007053143  0.0094320909
    Lag 2500  0.015181957  0.0200111646 -0.024418362 -0.019640184 -0.020398216 -0.0171605926
    
    stan_ac(dat.zinb.brm$fit)
    
    plot of chunk tut10.6bS6.10BRMS
    #these samples are highly correlated, lets thin more...
    dat.zinb.brm <- brm(y~0+spec + main + main:x, data=dat.zinb, family='zero_inflated_negbinomial',
      prior = c(set_prior("normal(0,10)", class="b", coef="main"),
                set_prior("normal(0,100)", class="b", coef="spec"),
                set_prior("student_t(3,0,5)", class='shape')),
      chains=3, iter=5000, warmup=2500, thin=5)
    
    SAMPLING FOR MODEL 'zero_inflated_negbinomial(log) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 5000 [  0%]  (Warmup)
    Chain 1, Iteration:  500 / 5000 [ 10%]  (Warmup)
    Chain 1, Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Chain 1, Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Chain 1, Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Chain 1, Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Chain 1, Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Chain 1, Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Chain 1, Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Chain 1, Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Chain 1, Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Chain 1, Iteration: 5000 / 5000 [100%]  (Sampling)# 
    #  Elapsed Time: 1.51449 seconds (Warm-up)
    #                0.541013 seconds (Sampling)
    #                2.05551 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'zero_inflated_negbinomial(log) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 5000 [  0%]  (Warmup)
    Chain 2, Iteration:  500 / 5000 [ 10%]  (Warmup)
    Chain 2, Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Chain 2, Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Chain 2, Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Chain 2, Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Chain 2, Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Chain 2, Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Chain 2, Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Chain 2, Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Chain 2, Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Chain 2, Iteration: 5000 / 5000 [100%]  (Sampling)# 
    #  Elapsed Time: 1.15206 seconds (Warm-up)
    #                0.525064 seconds (Sampling)
    #                1.67712 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'zero_inflated_negbinomial(log) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 5000 [  0%]  (Warmup)
    Chain 3, Iteration:  500 / 5000 [ 10%]  (Warmup)
    Chain 3, Iteration: 1000 / 5000 [ 20%]  (Warmup)
    Chain 3, Iteration: 1500 / 5000 [ 30%]  (Warmup)
    Chain 3, Iteration: 2000 / 5000 [ 40%]  (Warmup)
    Chain 3, Iteration: 2500 / 5000 [ 50%]  (Warmup)
    Chain 3, Iteration: 2501 / 5000 [ 50%]  (Sampling)
    Chain 3, Iteration: 3000 / 5000 [ 60%]  (Sampling)
    Chain 3, Iteration: 3500 / 5000 [ 70%]  (Sampling)
    Chain 3, Iteration: 4000 / 5000 [ 80%]  (Sampling)
    Chain 3, Iteration: 4500 / 5000 [ 90%]  (Sampling)
    Chain 3, Iteration: 5000 / 5000 [100%]  (Sampling)# 
    #  Elapsed Time: 1.09119 seconds (Warm-up)
    #                2.19041 seconds (Sampling)
    #                3.2816 seconds (Total)
    # 
    
    stan_ac(dat.zinb.brm$fit)
    
    plot of chunk tut10.6bS6.10BRMS

    Conclusions: the samples are now less auto-correlated and the chains are all well mixed and appear stable.

  • Explore the step size characteristics (STAN only)
    summary(do.call(rbind, args = get_sampler_params(dat.zip.brm$fit, inc_warmup = FALSE)), digits = 2)
    
     accept_stat__    stepsize__    treedepth__   n_leapfrog__  n_divergent__
     Min.   :0.19   Min.   :0.29   Min.   :1.0   Min.   : 1.0   Min.   :0    
     1st Qu.:0.89   1st Qu.:0.29   1st Qu.:2.0   1st Qu.: 3.0   1st Qu.:0    
     Median :0.96   Median :0.35   Median :3.0   Median : 7.0   Median :0    
     Mean   :0.92   Mean   :0.33   Mean   :2.9   Mean   : 7.6   Mean   :0    
     3rd Qu.:0.99   3rd Qu.:0.36   3rd Qu.:4.0   3rd Qu.:11.0   3rd Qu.:0    
     Max.   :1.00   Max.   :0.36   Max.   :4.0   Max.   :15.0   Max.   :0    
    
    stan_diag(dat.zinb.brm$fit)
    
    plot of chunk tut10.6bS6.10BRMSstepSize
    stan_diag(dat.zinb.brm$fit, information = "stepsize")
    
    plot of chunk tut10.6bS6.10BRMSstepSize
    stan_diag(dat.zinb.brm$fit, information = "treedepth")
    
    plot of chunk tut10.6bS6.10BRMSstepSize
    stan_diag(dat.zinb.brm$fit, information = "divergence")
    
    plot of chunk tut10.6bS6.10BRMSstepSize
    library(gridExtra)
    grid.arrange(stan_rhat(dat.zinb.brm$fit) + theme_classic(8),
                 stan_ess(dat.zinb.brm$fit) + theme_classic(8),
                 stan_mcse(dat.zinb.brm$fit) + theme_classic(8),
                 ncol = 2)
    
    plot of chunk tut10.6bS6.10BRMSstepSize
    Conclusions: acceptance ratio is very good, the lot posterior is relatively robust to step size and tree depth, rhat values are all below 1.05, the effective sample size is not too bad (we might be able to achieve better acceptance ratios with tighter priors) and the ratio of MCMC error to posterior sd is very small.
  • We now explore the goodness of fit of the models via the residuals and deviance. We could calculate the Pearsons's residuals within the JAGS model. Alternatively, we could use the parameters to generate the residuals outside of JAGS.
    #extract the samples for the two model parameters
    coefs <- dat.zinb.jags$BUGSoutput$sims.matrix[,1:2]
    theta <- dat.zinb.jags$BUGSoutput$sims.matrix[,'psi']
    Xmat <- model.matrix(~x, data=dat.zinb)
    #expected values on a log scale
    lambda<-coefs %*% t(Xmat)
    #expected value on response scale
    eta <- exp(lambda)
    expY <- sweep(eta,1,(1-theta),"*")
    varY <- eta+sweep(eta^2,1,theta,"*")
    head(varY)
    
                 1         2         3         4         5         6         7         8         9        10       11       12       13        14        15        16        17        18        19        20
    [1,]  5.343355  6.056455  6.710940  6.782673  7.454279  9.591883  9.886422 10.142129 10.857296 12.334728 13.53727 14.42609 16.15161  18.32106  18.55017  21.24417  21.39110  21.82541  22.82726  26.16783
    [2,]  6.136290  8.213971 10.463062 10.730368 13.440247 24.739116 26.636896 28.354937 33.515421 45.900953 57.78469 67.65739 89.58710 122.59671 126.45037 177.25918 180.32847 189.58587 212.00514 297.82859
    [3,] 11.086488 11.972272 12.745761 12.828478 13.585024 15.812014 16.100779 16.348390 17.026571 18.367469 19.40679 20.14872 21.53308  23.18303  23.35205  25.27383  25.37541  25.67387  26.35240  28.52397
    [4,]  5.390940  5.903767  6.358283  6.407248  6.858209  8.216792  8.396190  8.550587  8.976105  9.828478 10.49885 10.98238 11.89530  13.00082  13.11511  14.42760  14.49762  14.70375  15.17440  16.69918
    [5,]  5.011368  6.014766  6.990161  7.100137  8.158381 11.852450 12.398287 12.879089 14.257358 17.255958 19.84063 21.83053 25.87864  31.29833  31.89134  39.14718  39.55753  40.77912  43.64564  53.67698
    [6,] 20.534217 22.189351 23.632622 23.786853 25.196463 29.335264 29.870784 30.329775 31.585938 34.065499 35.98370 37.35109 39.89831  42.92726  43.23714  46.75510  46.94078  47.48620  48.72534  52.68347
    
    varY <- sweep(varY,1,(1-theta),'*')
    #sweep across rows and then divide by lambda
    Resid <- -1*sweep(expY,2,dat.zinb$y,'-')/sqrt(varY)
    #plot residuals vs expected values
    plot(apply(Resid,2,mean)~apply(eta,2,mean))
    
    plot of chunk tut10.6bS6.11
    #Calculate residuals 
    coefs.zinb <- as.matrix(as.data.frame(rstan:::extract(dat.zinb.brm$fit)))
    coefs.zinb <- coefs.zinb[,grep('b_main', colnames(coefs.zinb))]
    shape <- as.matrix(as.data.frame(rstan:::extract(dat.zinb.brm$fit,'shape')))
    Xmat <- model.matrix(~x, data=dat.zinb)
    
    lambda.zinb<-coefs.zinb %*% t(Xmat)
    #expected value on response scale
    eta.zinb <- exp(lambda.zinb)
    expY.zinb <- sweep(eta.zinb,1,(1-shape),"*")
    varY.zinb <- eta.zinb+sweep(eta.zinb^2,1,shape,"*")
    
    varY.zinb <- sweep(varY.zinb,1,(1-theta),'*')
    #sweep across rows and then divide by lambda
    Resid.zinb <- -1*sweep(expY.zinb,2,dat.zinb$y,'-')/sqrt(varY.zinb)
    
    ggplot(data=NULL, aes(y=apply(Resid.zinb,2,mean), x=apply(eta.zinb,2,mean))) + geom_point()
    
    plot of chunk tut10.6bS6.11BRM

    There are no real patterns in the residuals.

  • Now we will compare the sum of squared residuals to the sum of squares residuals that would be expected from a Poisson distribution matching that estimated by the model. Essentially this is estimating how well the Poisson distribution, the log-link function and the linear model approximates the observed data.

    When doing so, we need to consider the expected value and variance of the zero-inflated poisson. $$ E(y_i) = \lambda\times(1-\theta)\\ Var(y_i) = \lambda\times(1-\theta)\times(1+\theta\times\lambda^2) $$

    SSres<-apply(Resid^2,1,sum)
    
    set.seed(2)
    #generate a matrix of draws from a zero-inflated poisson (ZINB) distribution
    # the matrix is the same dimensions as lambda
    library(gamlss.dist)
    #YNew <- matrix(rZINB(length(lambda),eta, theta),nrow=nrow(lambda))
    lambda <- sweep(eta,1,ifelse(dat.zinb$y==0,0,1),'*')
    YNew <- matrix(rpois(length(lambda),lambda),nrow=nrow(lambda))
    Resid1<-(expY - YNew)/sqrt(varY)
    SSres.sim<-apply(Resid1^2,1,sum)
    mean(SSres.sim>SSres)
    
    [1] 0.349
    

    Whilst not ideal (as we would prefer a Bayesian P-value of around 0.5), this value is not wildly bad and does not constitute overwhelming evidence of a lack of fit.

    Xmat <- model.matrix(~x, dat.zinb)
    nX <- ncol(Xmat)
    dat.zinb.list2 <- with(dat.zinb,list(Y=y, X=Xmat,N=nrow(dat.zinb), nX=nX,
                          mu=rep(0,nX),Sigma=diag(1.0E-06,nX), z=ifelse(y==0,0,1)))
    modelString="
    model {
      for (i in 1:N) {
         z[i] ~ dbern(psi.min)
         Y[i] ~ dnegbin(p[i],size)
         p[i] <- size/(size+mu.eff[i])
         mu.eff[i] <- z[i]*eta[i]
         log(eta[i]) <- beta0 + beta1*X[i]
         
         expY[i] <- eta[i]*(1-psi)
         varY[i] <- (1-psi)*(eta[i]*psi*pow(eta[i],2))
         Resid[i] <- (Y[i] - expY[i])/sqrt(varY[i])
         Y1[i] ~ dnegbin(p[i], size)
         Resid1[i] <- (Y1[i] - expY[i])/sqrt(varY[i])
         RSS[i] <- pow(Resid[i],2)
         RSS1[i] <-pow(Resid1[i],2) 
      }
      gamma ~ dnorm(0,0.001)
      psi.min <- min(0.9999, max(0.00001, (1-psi)))
      logit(psi) <- max(-20, min(20, gamma))
      size ~ dunif(0.001, 5)
      theta <- pow(1/mean(p),2)
    
      beta ~ dmnorm(mu[],Sigma[,])
      Pvalue <- mean(sum(RSS1)>sum(RSS))
    } 
    "
    writeLines(modelString,con='../downloads/BUGSscripts/tut11.5bS5.13.bug')
    library(R2jags)
    system.time(
    dat.zinb.jags3 <- jags(model='../downloads/BUGSscripts/tut11.5bS5.13.bug', data=dat.zinb.list2, inits=NULL,
              param=c('beta', 'gamma0','Pvalue'),
              n.chain=3,
              n.iter=20000, n.thin=10, n.burnin=10000)
      )
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
    Deleting model
    
    Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, : RUNTIME ERROR:
    Compilation error on line 5.
    Unable to resolve node p[1]
    This may be due to an undefined ancestor node or a directed cycle in the graph
    
    Timing stopped at: 0.012 0 0.009 
    
    print(dat.zinb.jags3)
    
    Error in print(dat.zinb.jags3): error in evaluating the argument 'x' in selecting a method for function 'print': Error: object 'dat.zinb.jags3' not found
    
    SSres.zinb<-apply(Resid.zinb^2,1,sum)
    
    set.seed(2)
    #generate a matrix of draws from a zero-inflated poisson (ZINB) distribution
    # the matrix is the same dimensions as lambda
    library(gamlss.dist)
    #YNew <- matrix(rZINB(length(lambda),eta, theta),nrow=nrow(lambda))
    lambda.zinb <- sweep(eta.zinb,1,ifelse(dat.zinb$y==0,0,1),'*')
    YNew.zinb <- matrix(MASS:::rnegbin(length(lambda.zinb),lambda.zinb, theta=mean(shape)),
                        nrow=nrow(lambda.zinb))
    Resid1.zinb<-(expY.zinb - YNew.zinb)/sqrt(varY.zinb)
    SSres.sim<-apply(Resid1.zinb^2,1,sum)
    mean(SSres.sim>SSres.zinb)
    
    [1] 0.3206667
    

    Conclusions: the Bayesian p-value is very close to 0.5, suggesting that there is a good fit of the model to the data.

  • Since it is difficult to diagnose many issues from the typical residuals we will now explore simulated residuals.
    #extract the samples for the two model parameters
    colnames(dat.zinb.jags$BUGSoutput$sims.matrix)
    
    [1] "beta0"    "beta1"    "deviance" "gamma"    "psi"      "theta"   
    
    coefs <- dat.zinb.jags$BUGSoutput$sims.matrix[,1:2]
    theta <- dat.zinb.jags$BUGSoutput$sims.matrix[,'theta']
    size <- dat.zinb.jags$BUGSoutput$sims.matrix[,'psi']
    Xmat <- model.matrix(~x, data=dat.zinb)
    #expected values on a log scale
    eta<-coefs %*% t(Xmat)
    #expected value on response scale
    lambda <- exp(eta)
    
    simRes <- function(lambda, data,n=250, plot=T, family='negbin', size=NULL,theta=NULL) {
     require(gap)
     N = nrow(data)
     sim = switch(family,
        'poisson' = matrix(rpois(n*N,apply(lambda,2,mean)),ncol=N, byrow=TRUE),
        'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2,mean),size),ncol=N, byrow=TRUE),
        'zip' = matrix(gamlss.dist:::rZIP(n*N,apply(lambda,2,mean),theta),ncol=N, byrow=TRUE),
        'zinb' = matrix(gamlss.dist:::rZINBI(n*N,apply(lambda,2,mean),sigma=theta,nu=size),ncol=N,
                        byrow=TRUE)
     )
     a = apply(sim + runif(n,-0.5,0.5),2,ecdf)
     resid<-NULL
     for (i in 1:nrow(data)) resid<-c(resid,a[[i]](data$y[i] + runif(1 ,-0.5,0.5)))
     if (plot==T) {
       par(mfrow=c(1,2))
       gap::qqunif(resid,pch = 2, bty = "n",
       logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals",
       cex.main = 1, las=1)
       plot(resid~apply(lambda,2,mean), xlab='Predicted value', ylab='Standardized residual', las=1)
     }
     resid
    }
    
    simRes(lambda,dat.zinb, family='zinb',theta=theta, size=size)
    
    plot of chunk tut10.6bS6.13simulatedResiduals
     [1] 0.5533333 0.8466667 0.6400000 0.8333333 0.8733333 0.6266667 0.9333333 0.8933333 0.7200000 0.1600000 0.4933333 0.3866667 0.3866667 0.7800000 0.0200000 0.2000000 0.8533333 0.2133333 0.3800000 0.8600000
    
    The trend (black symbols) in the qq-plot does not appear to be overly non-linear (matching the ideal red line well), suggesting that the model is not overdispersed. The spread of standardized (simulated) residuals in the residual plot do not appear overly non-uniform. That is there is not trend in the residuals. Furthermore, there is not a concentration of points close to 1 or 0 (which would imply overdispersion). Hence, once zero-inflation is accounted for, the model does not display overdispersion.

    Although there is a slight hint of non-linearity in that the residuals are high for low and high fitted values and lower in the middle, this might well be an artifact of the small data set size. By change, most of the observed values in the middle range of the predictor were zero.

Exploring the model parameters, inference

If there was any evidence that the assumptions had been violated or the model was not an appropriate fit, then we would need to reconsider the model and start the process again. In this case, there is no evidence that the test will be unreliable so we can proceed to explore the test statistics.

As with most Bayesian models, it is best to base conclusions on medians rather than means.

print(dat.zinb.jags)
Inference for Bugs model at "5", fit using jags,
 3 chains, each with 1e+05 iterations (first 50000 discarded), n.thin = 50
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
beta0      0.973   0.453  0.058  0.679  0.961  1.269  1.882 1.001  3000
beta1      0.066   0.042 -0.014  0.037  0.065  0.094  0.151 1.001  3000
gamma     -0.210   0.462 -1.133 -0.521 -0.202  0.090  0.674 1.002  1200
psi        0.450   0.109  0.244  0.373  0.450  0.522  0.662 1.003  1000
theta      2.190   0.363  1.724  1.939  2.102  2.360  3.133 1.001  3000
deviance  82.759   2.824 79.157 80.658 82.132 84.199 89.605 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 4.0 and DIC = 86.7
DIC is an estimate of expected predictive error (lower deviance is better).
library(plyr)
adply(dat.zinb.jags$BUGSoutput$sims.matrix, 2, function(x) {
  data.frame(Median=median(x), Mean=mean(x), HPDinterval(as.mcmc(x)), HPDinterval(as.mcmc(x),p=0.5))
})
        X1      Median        Mean       lower      upper     lower.1     upper.1
1    beta0  0.96103123  0.97276291  0.13961096  1.9515909  0.66801095  1.25524602
2    beta1  0.06467899  0.06570763 -0.01525967  0.1486955  0.03618379  0.09232641
3 deviance 82.13216622 82.75885652 78.74846503 88.3394029 79.66753681 82.68026059
4    gamma -0.20172349 -0.21028082 -1.12438237  0.6754259 -0.52792406  0.07915285
5      psi  0.44973945  0.45021792  0.24269534  0.6600966  0.37100120  0.51977789
6    theta  2.10185552  2.18996772  1.62902894  2.9316514  1.83725582  2.19604146

Actually, many find it more palatable to express the estimates in the original scale of the observations rather than on a log scale.

library(plyr)
adply(exp(dat.zinb.jags$BUGSoutput$sims.matrix[,1:2]), 2, function(x) {
  data.frame(Median=median(x), Mean=mean(x), HPDinterval(as.mcmc(x)), HPDinterval(as.mcmc(x),p=0.5))
})
     X1   Median     Mean     lower   upper  lower.1  upper.1
1 beta0 2.614391 2.933031 0.8237093 5.78519 1.615310 3.052375
2 beta1 1.066817 1.068866 0.9848562 1.16032 1.034969 1.094738

Conclusions: We would reject the null hypothesis of no effect of $x$ on $y$. An increase in x is associated with a significant linear increase (positive slope) in the abundance of $y$. Every 1 unit increase in $x$ results in a log 0.064679 unit increase in $y$. We usually express this in terms of abundance rather than log abundance, so every 1 unit increase in $x$ results in a (e^{0.064679}=1.0668165) 1.0668165 unit increase in the abundance of $y$.

Further explorations of the trends

A measure of the strength of the relationship can be obtained according to: $$R^2 = 1 - \frac{RSS_{model}}{RSS_{null}}$$

Alternatively, we could use McFadden's psuedo $$R^2 = 1-\frac{\mathcal{LL}(Model_{full})}{\mathcal{LL}(Model_{reduced})}$$ [[http://www.statisticalhorizons.com/category/uncategorized]] - about a quarter of the way down the page.

Xmat <- model.matrix(~x, dat=dat.zinb)
#expected values on a log scale
neta<-coefs %*% t(Xmat)
#expected value on response scale
eta <- exp(neta)
lambda <- sweep(eta,2,ifelse(dat.zinb$y==0,0,1),'*')
expY <- sweep(lambda,2,1-theta,'*')
#calculate the raw SS residuals
SSres <- apply((-1*(sweep(expY,2,dat.zinb$y,'-')))^2,1,sum)
SSres.null <- sum((dat.zinb$y - mean(dat.zinb$y))^2)
#calculate the model r2
1-mean(SSres)/SSres.null
[1] 0.3869893

Conclusions: 38.7% of the variation in $y$ abundance can be explained by its relationship with $x$.




Worked Examples

Basic χ2 references
  • Logan (2010) - Chpt 16-17
  • Quinn & Keough (2002) - Chpt 13-14

Poisson t-test

A marine ecologist was interested in examining the effects of wave exposure on the abundance of the striped limpet Siphonaria diemenensis on rocky intertidal shores. To do so, a single quadrat was placed on 10 exposed (to large waves) shores and 10 sheltered shores. From each quadrat, the number of Siphonaria diemenensis were counted.

Download Limpets data set
Format of limpets.csv data files
CountShore
1sheltered
3sheltered
2sheltered
1sheltered
4sheltered
......

ShoreCategorical description of the shore type (sheltered or exposed) - Predictor variable
CountNumber of limpets per quadrat - Response variable
turf

Open the limpet data set.

Show code
limpets <- read.table('../downloads/data/limpets.csv', header=T, sep=',', strip.white=T)
limpets
   Count     Shore
1      2 sheltered
2      2 sheltered
3      0 sheltered
4      6 sheltered
5      0 sheltered
6      3 sheltered
7      3 sheltered
8      0 sheltered
9      1 sheltered
10     2 sheltered
11     5   exposed
12     3   exposed
13     6   exposed
14     3   exposed
15     4   exposed
16     7   exposed
17    10   exposed
18     3   exposed
19     5   exposed
20     2   exposed
  1. Is there any evidence that these assumptions have been violated?
    Show code
    boxplot(Count~Shore, data=limpets)
    
    plot of chunk ws10.6bQ1_1
    library(vcd)
    fit <- goodfit(limpets$Count, type='poisson')
    summary(fit)
    
    	 Goodness-of-fit test for poisson distribution
    
                          X^2 df   P(> X^2)
    Likelihood Ratio 14.03661  7 0.05053407
    
    rootogram(fit)
    
    plot of chunk ws10.6bQ1_1
    fit <- goodfit(limpets$Count, type='nbinom')
    summary(fit)
    
    	 Goodness-of-fit test for nbinomial distribution
    
                          X^2 df  P(> X^2)
    Likelihood Ratio 9.105475  6 0.1677325
    
    rootogram(fit)
    
    plot of chunk ws10.6bQ1_1
    Ord_plot(limpets$Count, tol=0.2)
    
    plot of chunk ws10.6bQ1_1
    distplot(limpets$Count, type='poisson')
    
    plot of chunk ws10.6bQ1_1
    ## Although we could argue that NB more appropriate than Poisson, this is possibly 
    ## due to the small sample size.  Small samples are more varied than the populations 
    ## from which they are drawn
    
    1. The assumption of normality has been violated?
    2. The assumption of homogeneity of variance has been violated?
    3. Notwithstanding the very small sample size, is a Poisson distribution appropriate?
  2. At this point we could transform the data in an attempt to satisfy normality and homogeneity of variance. However, the analyses are then not on the scale of the data and thus the conclusions also pertain to a different scale. Furthermore, linear modelling on transformed count data is generally not as effective as modelling count data with a Poisson distribution.

  3. Lets instead we fit the same model with a Poisson distribution. $$ \begin{align} Y_i&\sim{}P(\lambda) & (\text{response distribution})\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,100) & (\text{week Bayesian prior})\\ \end{align} $$
    Show BRMS code
    limpets.brm <- brm(Count~Shore, family=poisson, data=limpets,
      prior = c(set_prior("normal(0,100)", class="Intercept"),
                set_prior("normal(0,100)", class="b")),
      chains=3, iter=2000, warmup=1000, thin=2)
    
    SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.037372 seconds (Warm-up)
    #                0.036072 seconds (Sampling)
    #                0.073444 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.038755 seconds (Warm-up)
    #                0.038426 seconds (Sampling)
    #                0.077181 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.045705 seconds (Warm-up)
    #                0.038072 seconds (Sampling)
    #                0.083777 seconds (Total)
    # 
    
  4. Explore the chain mixing diagnostics.
    1. Trace plots
      Show code
      library(gridExtra)
      grid.arrange(stan_trace(limpets.brm$fit, ncol=1),
                   stan_dens(limpets.brm$fit, separate_chains=TRUE,ncol=1),
                   ncol=2)
      
      plot of chunk ws10.6bQ1_0a
    2. Autocorrelation
      Show code
      stan_ac(limpets.brm$fit)
      
      plot of chunk ws10.6bQ1_0b
    3. Step size characteristics (STAN only)
      Show code
      summary(do.call(rbind, args = get_sampler_params(limpets.brm$fit, inc_warmup = FALSE)), digits = 2)
      
       accept_stat__    stepsize__    treedepth__   n_leapfrog__ n_divergent__
       Min.   :0.28   Min.   :0.60   Min.   :1.0   Min.   :1.0   Min.   :0    
       1st Qu.:0.91   1st Qu.:0.60   1st Qu.:2.0   1st Qu.:3.0   1st Qu.:0    
       Median :0.97   Median :0.74   Median :2.0   Median :3.0   Median :0    
       Mean   :0.93   Mean   :0.70   Mean   :2.2   Mean   :3.8   Mean   :0    
       3rd Qu.:1.00   3rd Qu.:0.76   3rd Qu.:3.0   3rd Qu.:5.0   3rd Qu.:0    
       Max.   :1.00   Max.   :0.76   Max.   :3.0   Max.   :7.0   Max.   :0    
      
      stan_diag(limpets.brm$fit)
      
      plot of chunk ws10.6bQ1_0c
      stan_diag(limpets.brm$fit, information = "stepsize")
      
      plot of chunk ws10.6bQ1_0c
      stan_diag(limpets.brm$fit, information = "treedepth")
      
      plot of chunk ws10.6bQ1_0c
      stan_diag(limpets.brm$fit, information = "divergence")
      
      plot of chunk ws10.6bQ1_0c
      library(gridExtra)
      grid.arrange(stan_rhat(limpets.brm$fit) + theme_classic(8),
                   stan_ess(limpets.brm$fit) + theme_classic(8),
                   stan_mcse(limpets.brm$fit) + theme_classic(8),
                   ncol = 2)
      
      plot of chunk ws10.6bQ1_0c
  5. Explore the model fit diagnostics.
    1. Explore the patterns in simulated residuals:
      Show code
      lambda.brm = fitted(limpets.brm, scale='response', summary=FALSE)
      
      simRes <- function(lambda, Y,n=250, plot=T, family='negbin', size=NULL,theta=NULL) {
       require(gap)
       N = length(Y)
       sim = switch(family,
          'poisson' = matrix(rpois(n*N,apply(lambda,2,mean)),ncol=N, byrow=TRUE),
          'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2,mean),size),ncol=N, byrow=TRUE),
          'zip' = matrix(gamlss.dist:::rZIP(n*N,apply(lambda,2,mean),theta),ncol=N, byrow=TRUE),
          'zinb' = matrix(gamlss.dist:::rZINBI(n*N,apply(lambda,2,mean),sigma=theta,nu=size),ncol=N,
                          byrow=TRUE)
       )
       a = apply(sim + runif(n,-0.5,0.5),2,ecdf)
       resid<-NULL
       for (i in 1:length(Y)) resid<-c(resid,a[[i]](Y[i] + runif(1 ,-0.5,0.5)))
       if (plot==T) {
         par(mfrow=c(1,2))
         gap::qqunif(resid,pch = 2, bty = "n",
         logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals",
         cex.main = 1, las=1)
         plot(resid~apply(lambda,2,mean), xlab='Predicted value', ylab='Standardized residual', las=1)
       }
       resid
      }
      
      simRes(lambda.brm, limpets$Count, family='poisson')
      
      plot of chunk ws10.6bQ1_1b
       [1] 0.636 0.584 0.060 1.000 0.096 0.724 0.736 0.100 0.320 0.672 0.640 0.276 0.736 0.220 0.440 0.772 0.988 0.216 0.520 0.100
      
    2. Goodness of fit test:
      Show code
      Resid.brm <- residuals(limpets.brm, type='pearson', summary=FALSE)
      SSres.brm <- apply(Resid.brm^2,1,sum)
      lambda.brm = fitted(limpets.brm, scale='response', summary=FALSE)
      YNew.brm <- matrix(rpois(length(lambda.brm), lambda=lambda.brm),nrow=nrow(lambda.brm))
      
      Resid1.brm<-(lambda.brm - YNew.brm)/sqrt(lambda.brm)
      SSres.sim.brm<-apply(Resid1.brm^2,1,sum)
      mean(SSres.sim.brm>SSres.brm)
      
      [1] 0.1686667
      
    3. Fitted and observed values (plot):
      Show code
      newdata <- data.frame(Shore=levels(limpets$Shore))
      tempdata = fitted(limpets.brm, scale='response',newdata=newdata)
      tempdata <- cbind(newdata, Mean=tempdata[,'Estimate'], Lower=tempdata[,'2.5%ile'],
                      Upper=tempdata[,'97.5%ile'])
      tempdata
      
            Shore     Mean    Lower    Upper
      1   exposed 4.808977 3.582597 6.194391
      2 sheltered 1.891051 1.172518 2.870922
      
      library(ggplot2)
      ggplot(limpets, aes(y=Count, x=Shore)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=Lower, ymax=Upper), color='red')
      
      plot of chunk ws10.6bQ1_1d
      ##OR the manual way
      Xmat <- model.matrix(~Shore, data=newdata)
      coefs <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))[,1:2]
      fit = exp(coefs %*% t(Xmat))
      tempdata <- cbind(newdata, plyr:::adply(fit, 2, function(x) {
       data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
      }))
      ggplot(limpets, aes(y=Count, x=Shore)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=lower, ymax=upper), color='red')
      
      plot of chunk ws10.6bQ1_1d
    4. Check for overdispersion:
      Show code
      coefs <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))[,1:2]
      Resid <- residuals(limpets.brm, type='pearson',summary=FALSE)
      RSS <- apply(Resid^2,1,sum)
      Disp <- RSS/(nrow(limpets)-ncol(coefs))
      data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
      
             Median     Mean    lower    upper
      var1 1.384552 1.415549 1.319196 1.619498
      
  6. Explore the parameter estimates
    Show code
    summary(limpets.brm)
    
     Family: poisson (log) 
    Formula: Count ~ Shore 
       Data: limpets (Number of observations: 20) 
    Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 2; 
             total post-warmup samples = 1500
       WAIC: Not computed
     
    Fixed Effects: 
                   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept          1.56      0.14     1.28     1.82       1361    1
    Shoresheltered    -0.95      0.26    -1.45    -0.43        952    1
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    exp(coef(limpets.brm))
    
                       mean
    Intercept      4.761504
    Shoresheltered 0.387136
    
    coefs.brm <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))
    coefs.brm <- coefs.brm[,grep('b', colnames(coefs.brm))]
    plyr:::adply(exp(coefs.brm), 2, function(x) {
      data.frame(Mean=mean(x), median=median(x), HPDinterval(as.mcmc(x)))
      })
    
                    X1      Mean    median     lower     upper
    1      b_Intercept 4.8089767 4.7877066 3.4711793 6.0637796
    2 b_Shoresheltered 0.4008196 0.3880695 0.2186746 0.6188941
    
    marginal_effects(limpets.brm)
    
    plot of chunk ws10.6bQ1_2a
  7. Estimate the strength of the relationship
    Show code
    ## calculate the expected values on the response scale
    lambda.brm = fitted(limpets.brm, scale='response', summary=FALSE)
    ## calculate the raw SSresid
    SSres.brm <- apply((-1*(sweep(lambda.brm,2,limpets$Count,'-')))^2,1,sum)
    SSres.null <- sum((limpets$Count - mean(limpets$Count))^2)
    #OR 
    #calculate the model r2
    1-mean(SSres.brm)/SSres.null
    
    [1] 0.2857745
    
    #OR manually
    coefs <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))
    coefs <- coefs.brm[,grep('b', colnames(coefs))]
    Xmat <- model.matrix(~Shore, data=limpets)
    lambda.brm = exp(coefs %*% t(Xmat))
    
    ## calculate the raw SSresid
    SSres.brm <- apply((-1*(sweep(lambda.brm,2,limpets$Count,'-')))^2,1,sum)
    SSres.null <- crossprod(limpets$Count - mean(limpets$Count))
    #calculate the model r2
    1-mean(SSres.brm)/SSres.null
    
              [,1]
    [1,] 0.2857745
    
    ## Alternatively..
    R.var = apply(residuals(limpets.brm, summary=FALSE),1,var)
    X.var = apply(fitted(limpets.brm, summary=FALSE),1,var)
    R2.marginal <- X.var/(X.var + R.var)
    data.frame(Median=median(R2.marginal), coda:::HPDinterval(as.mcmc(R2.marginal)))
    
            Median     lower     upper
    var1 0.3386231 0.1113502 0.5261667
    
  8. Construct a summary plot
    Show code
    newdata=data.frame(Shore=levels(limpets$Shore))
    coefs <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))
    coefs <- coefs.brm[,grep('b', colnames(coefs))]
    Xmat <- model.matrix(~Shore, data=newdata)
    fit = exp(coefs %*% t(Xmat))
    newdata = cbind(newdata, plyr:::adply(fit, 2, function(x) {
      data.frame(Mean=mean(x), median=median(x), HPDinterval(as.mcmc(x)))
    }))
    
    ggplot(limpets, aes(y=Count, x=Shore)) +
      geom_point(color='grey') +
      geom_pointrange(data=newdata, aes(y=Mean, ymin=lower, ymax=upper)) +
      scale_y_continuous('Abundance of limpets') +
      scale_x_discrete('Shore type', breaks=c('exposed','sheltered'), labels=c('Exposed','Sheltered')) +
      theme_classic() +
      theme(axis.line.x=element_line(), axis.line.y=element_line())
    
    plot of chunk ws10.6bQ1_4a

Poisson ANOVA (regression)

We once again return to a modified example from Quinn and Keough (2002). In Exercise 1 of Workshop 9.4a, we introduced an experiment by Day and Quinn (1989) that examined how rock surface type affected the recruitment of barnacles to a rocky shore. The experiment had a single factor, surface type, with 4 treatments or levels: algal species 1 (ALG1), algal species 2 (ALG2), naturally bare surfaces (NB) and artificially scraped bare surfaces (S). There were 5 replicate plots for each surface type and the response (dependent) variable was the number of newly recruited barnacles on each plot after 4 weeks.

Download Day data set
Format of day.csv data files
TREATBARNACLE
ALG127
....
ALG224
....
NB9
....
S12
....
TREATCategorical listing of surface types. ALG1 = algal species 1, ALG2 = algal species 2, NB = naturally bare surface, S = scraped bare surface.
BARNACLEThe number of newly recruited barnacles on each plot after 4 weeks.
Six-plated barnacle

Open
the day data file.
Show code
day <- read.table('../downloads/data/day.csv', header=T, sep=',', strip.white=T)
head(day)
  TREAT BARNACLE
1  ALG1       27
2  ALG1       19
3  ALG1       18
4  ALG1       23
5  ALG1       25
6  ALG2       24

We previously analysed these data with via a classic linear model (ANOVA) as the data appeared to conform to the linear model assumptions. Alternatively, we could analyse these data with a generalized linear model (Poisson error distribution). Note that as the boxplots were all fairly symmetrical and equally varied, and the sample means are well away from zero (in fact there are no zero's in the data) we might suspect that whether we fit the model as a linear or generalized linear model is probably not going to be of great consequence. Nevertheless, it does then provide a good comparison between the two frameworks.

  1. Using boxplots re-examine the assumptions of normality and homogeneity of variance. Note that when sample sizes are small (as is the case with this data set), these ANOVA assumptions cannot reliably be checked using boxplots since boxplots require at least 5 replicates (and preferably more), from which to calculate the median and quartiles. As with regression analysis, it is the assumption of homogeneity of variances (and in particular, whether there is a relationship between the mean and variance) that is of most concern for ANOVA.
    Show code
    boxplot(BARNACLE~TREAT, data=day)
    
    plot of chunk ws10.6bQ2_1
    library(vcd)
    fit <- goodfit(day$BARNACLE, type='poisson')
    summary(fit)
    
    	 Goodness-of-fit test for poisson distribution
    
                          X^2 df   P(> X^2)
    Likelihood Ratio 32.19209 17 0.01424181
    
    rootogram(fit)
    
    plot of chunk ws10.6bQ2_1
    fit <- goodfit(day$BARNACLE, type='nbinom')
    summary(fit)
    
    	 Goodness-of-fit test for nbinomial distribution
    
                          X^2 df  P(> X^2)
    Likelihood Ratio 18.41836 16 0.2999741
    
    rootogram(fit)
    
    plot of chunk ws10.6bQ2_1
    distplot(day$BARNACLE, type='poisson')
    
    plot of chunk ws10.6bQ2_1
    ## Poisson would appear appropriate
    
  2. Fit the generalized linear model (GLM) relating the number of newly recruited barnacles to substrate type. $$ \begin{align} Y_i&\sim{}P(\lambda) & (\text{response distribution})\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ \end{align} $$
    Show BRMS code
    day.brm <- brm(BARNACLE~TREAT, family=poisson, data=day,
      prior = c(set_prior("normal(0,1000)", class="Intercept"),
                set_prior("normal(0,1000)", class="b")),
      chains=3, iter=2000, warmup=1000, thin=2)
    
    SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.04357 seconds (Warm-up)
    #                0.041621 seconds (Sampling)
    #                0.085191 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.055115 seconds (Warm-up)
    #                0.04102 seconds (Sampling)
    #                0.096135 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.052148 seconds (Warm-up)
    #                0.045466 seconds (Sampling)
    #                0.097614 seconds (Total)
    # 
    
  3. Explore the chain mixing diagnostics.
    1. Trace plots
      Show code
      library(gridExtra)
      grid.arrange(stan_trace(day.brm$fit, ncol=1),
                   stan_dens(day.brm$fit, separate_chains=TRUE,ncol=1),
                   ncol=2)
      
      plot of chunk ws10.6bQ2_0a
    2. Autocorrelation
      Show code
      stan_ac(day.brm$fit)
      
      plot of chunk ws10.6bQ2_0b
    3. Step size characteristics (STAN only)
      Show code
      summary(do.call(rbind, args = get_sampler_params(day.brm$fit, inc_warmup = FALSE)), digits = 2)
      
       accept_stat__    stepsize__    treedepth__   n_leapfrog__ n_divergent__
       Min.   :0.17   Min.   :0.62   Min.   :1.0   Min.   :1.0   Min.   :0    
       1st Qu.:0.86   1st Qu.:0.62   1st Qu.:2.0   1st Qu.:3.0   1st Qu.:0    
       Median :0.95   Median :0.64   Median :2.0   Median :3.0   Median :0    
       Mean   :0.91   Mean   :0.64   Mean   :2.4   Mean   :4.5   Mean   :0    
       3rd Qu.:0.99   3rd Qu.:0.66   3rd Qu.:3.0   3rd Qu.:7.0   3rd Qu.:0    
       Max.   :1.00   Max.   :0.66   Max.   :3.0   Max.   :7.0   Max.   :0    
      
      stan_diag(day.brm$fit)
      
      plot of chunk ws10.6bQ2_0c
      stan_diag(day.brm$fit, information = "stepsize")
      
      plot of chunk ws10.6bQ2_0c
      stan_diag(day.brm$fit, information = "treedepth")
      
      plot of chunk ws10.6bQ2_0c
      stan_diag(day.brm$fit, information = "divergence")
      
      plot of chunk ws10.6bQ2_0c
      library(gridExtra)
      grid.arrange(stan_rhat(day.brm$fit) + theme_classic(8),
                   stan_ess(day.brm$fit) + theme_classic(8),
                   stan_mcse(day.brm$fit) + theme_classic(8),
                   ncol = 2)
      
      plot of chunk ws10.6bQ2_0c
  4. Explore the model fit diagnostics.
    1. Explore the patterns in simulated residuals:
      Show code
      lambda.brm = fitted(day.brm, scale='response', summary=FALSE)
      
      simRes <- function(lambda, Y,n=250, plot=T, family='negbin', size=NULL,theta=NULL) {
       require(gap)
       N = length(Y)
       sim = switch(family,
          'poisson' = matrix(rpois(n*N,apply(lambda,2,mean)),ncol=N, byrow=TRUE),
          'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2,mean),size),ncol=N, byrow=TRUE),
          'zip' = matrix(gamlss.dist:::rZIP(n*N,apply(lambda,2,mean),theta),ncol=N, byrow=TRUE),
          'zinb' = matrix(gamlss.dist:::rZINBI(n*N,apply(lambda,2,mean),sigma=theta,nu=size),ncol=N,
                          byrow=TRUE)
       )
       a = apply(sim + runif(n,-0.5,0.5),2,ecdf)
       resid<-NULL
       for (i in 1:length(Y)) resid<-c(resid,a[[i]](Y[i] + runif(1 ,-0.5,0.5)))
       if (plot==T) {
         par(mfrow=c(1,2))
         gap::qqunif(resid,pch = 2, bty = "n",
         logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals",
         cex.main = 1, las=1)
         plot(resid~apply(lambda,2,mean), xlab='Predicted value', ylab='Standardized residual', las=1)
       }
       resid
      }
      
      simRes(lambda.brm, day$BARNACLE, family='poisson')
      
      plot of chunk ws10.6bQ2_1b
       [1] 0.804 0.280 0.192 0.572 0.676 0.224 0.816 0.416 0.332 0.780 0.040 0.368 0.724 0.484 0.944 0.376 0.116 0.672 0.952 0.320
      
    2. Goodness of fit test:
      Show code
      Resid.brm <- residuals(day.brm, type='pearson', summary=FALSE)
      SSres.brm <- apply(Resid.brm^2,1,sum)
      lambda.brm = fitted(day.brm, scale='response', summary=FALSE)
      YNew.brm <- matrix(rpois(length(lambda.brm), lambda=lambda.brm),nrow=nrow(lambda.brm))
      
      Resid1.brm<-(lambda.brm - YNew.brm)/sqrt(lambda.brm)
      SSres.sim.brm<-apply(Resid1.brm^2,1,sum)
      mean(SSres.sim.brm>SSres.brm)
      
      [1] 0.5986667
      
    3. Fitted and observed values (plot):
      Show code
      newdata <- data.frame(TREAT=levels(day$TREAT))
      tempdata = fitted(day.brm, scale='response',newdata=newdata)
      tempdata <- cbind(newdata, Mean=tempdata[,'Estimate'], Lower=tempdata[,'2.5%ile'],
                      Upper=tempdata[,'97.5%ile'])
      tempdata
      
        TREAT     Mean    Lower    Upper
      1  ALG1 22.42653 18.50291 26.69970
      2  ALG2 28.34577 23.75776 33.21707
      3    NB 14.93645 11.56363 18.37661
      4     S 13.15741 10.15435 16.54420
      
      library(ggplot2)
      ggplot(day, aes(y=BARNACLE, x=TREAT)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=Lower, ymax=Upper), color='red')
      
      plot of chunk ws10.6bQ2_1d
      ##OR the manual way
      Xmat <- model.matrix(~TREAT, data=newdata)
      coefs <- as.matrix(as.data.frame(rstan:::extract(day.brm$fit)))
      coefs <- coefs[,grep('^b_', colnames(coefs))]
      
      fit = exp(coefs %*% t(Xmat))
      tempdata <- cbind(newdata, plyr:::adply(fit, 2, function(x) {
       data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
      }))
      ggplot(day, aes(y=BARNACLE, x=TREAT)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=lower, ymax=upper), color='red')
      
      plot of chunk ws10.6bQ2_1d
    4. Check for overdispersion:
      Show code
      coefs <- as.matrix(as.data.frame(rstan:::extract(day.brm$fit)))
      coefs <- coefs[,grep('^b_', colnames(coefs))]
      Resid <- residuals(day.brm, type='pearson',summary=FALSE)
      RSS <- apply(Resid^2,1,sum)
      Disp <- RSS/(nrow(day)-ncol(coefs))
      data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
      
             Median     Mean     lower   upper
      var1 1.072787 1.110258 0.9095692 1.39219
      
  5. Explore the parameter estimates
    Show code
    summary(day.brm)
    
     Family: poisson (log) 
    Formula: BARNACLE ~ TREAT 
       Data: day (Number of observations: 20) 
    Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 2; 
             total post-warmup samples = 1500
       WAIC: Not computed
     
    Fixed Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept     3.11      0.09     2.92     3.28       1099 1.00
    TREATALG2     0.23      0.13    -0.01     0.48       1026 1.00
    TREATNB      -0.41      0.15    -0.72    -0.12        818 1.01
    TREATS       -0.54      0.16    -0.84    -0.22        721 1.00
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    exp(coef(day.brm))
    
                    mean
    Intercept 22.3302880
    TREATALG2  1.2647170
    TREATNB    0.6643044
    TREATS     0.5845021
    
    coefs.brm <- as.matrix(as.data.frame(rstan:::extract(day.brm$fit)))
    coefs.brm <- coefs.brm[,grep('b', colnames(coefs.brm))]
    plyr:::adply(exp(coefs.brm), 2, function(x) {
      data.frame(Mean=mean(x), median=median(x), HPDinterval(as.mcmc(x)))
      })
    
               X1       Mean     median      lower      upper
    1 b_Intercept 22.4265258 22.3563112 18.7801517 26.9246921
    2 b_TREATALG2  1.2750621  1.2633289  0.9820844  1.5952722
    3   b_TREATNB  0.6718217  0.6643929  0.4654365  0.8588770
    4    b_TREATS  0.5919046  0.5845745  0.4160054  0.7742567
    
    marginal_effects(day.brm)
    
    plot of chunk ws10.6bQ2_2a
  6. Estimate the strength of the relationship
    Show code
    ## calculate the expected values on the response scale
    lambda.brm = fitted(day.brm, scale='response', summary=FALSE)
    ## calculate the raw SSresid
    SSres.brm <- apply((-1*(sweep(lambda.brm,2,day$BARNACLE,'-')))^2,1,sum)
    SSres.null <- sum((day$BARNACLE - mean(day$BARNACLE))^2)
    #OR 
    #calculate the model r2
    1-mean(SSres.brm)/SSres.null
    
    [1] 0.635007
    
    #OR manually
    coefs <- as.matrix(as.data.frame(rstan:::extract(day.brm$fit)))
    coefs <- coefs[,grep('b', colnames(coefs))]
    Xmat <- model.matrix(~TREAT, data=day)
    lambda.brm = exp(coefs %*% t(Xmat))
    
    ## calculate the raw SSresid
    SSres.brm <- apply((-1*(sweep(lambda.brm,2,day$BARNACLE,'-')))^2,1,sum)
    SSres.null <- crossprod(day$BARNACLE - mean(day$BARNACLE))
    #calculate the model r2
    1-mean(SSres.brm)/SSres.null
    
             [,1]
    [1,] 0.635007
    
    ## Alternatively..
    R.var = apply(residuals(day.brm, summary=FALSE),1,var)
    X.var = apply(fitted(day.brm, summary=FALSE),1,var)
    R2.marginal <- X.var/(X.var + R.var)
    data.frame(Median=median(R2.marginal), coda:::HPDinterval(as.mcmc(R2.marginal)))
    
            Median     lower     upper
    var1 0.6985856 0.5149793 0.7759515
    
  7. Although we have now established that there is a statistical difference between the group means, we do not yet know which group(s) are different from which other(s). To address this we can explore all pairwise contrasts from the posterior. Note, careful when back transforming from log. Recall that log laws indicate that $log(a) - log(b) = log(a/b)$.
    Show code
    coefs <-as.matrix(as.data.frame(rstan:::extract(day.brm$fit)))
    coefs <- coefs[,grep('b_', colnames(coefs))]
    newdata <- data.frame(TREAT=levels(day$TREAT))
    Xmat <- model.matrix(~TREAT, data=newdata)
    library(multcomp)
    tuk.mat <- contrMat(n=table(newdata$TREAT), type="Tukey")
    Xmat <- model.matrix(~TREAT, data=newdata)
    Xmat <- tuk.mat %*% Xmat
    fit <- exp(coefs %*% t(Xmat))
    library(coda)
    MCMCsum <- function(x) {
       data.frame(Median=median(x, na.rm=TRUE), t(quantile(x,na.rm=TRUE)),
                  HPDinterval(as.mcmc(x)))
    }
    
    plyr:::adply(as.matrix(fit),2,MCMCsum)
    
               X1    Median       X0.      X25.      X50.      X75.     X100.     lower     upper
    1 ALG2 - ALG1 1.2633289 0.8452016 1.1549126 1.2633289 1.3888569 2.0054558 0.9820844 1.5952722
    2   NB - ALG1 0.6643929 0.4142791 0.5990942 0.6643929 0.7348131 1.0752521 0.4654365 0.8588770
    3    S - ALG1 0.5845745 0.3666083 0.5252027 0.5845745 0.6472396 0.9985889 0.4160054 0.7742567
    4   NB - ALG2 0.5246730 0.3142344 0.4775387 0.5246730 0.5817067 0.8599918 0.3790918 0.6773199
    5    S - ALG2 0.4627503 0.2666343 0.4155122 0.4627503 0.5136620 0.7633502 0.3366329 0.6175163
    6      S - NB 0.8804989 0.5160321 0.7789036 0.8804989 0.9970252 1.6316219 0.6258814 1.2183909
    
    ## Due to log laws, these effects are multipliers rather than differences
    
    ## If you want to express the outcomes in terms of differences then it is necessary to 
    ## estimate the cell posteriors and subtract those - this cannot easily be done Frequentist!
    Xmat <- model.matrix(~TREAT, data=newdata)
    fit <- exp(coefs %*% t(Xmat))
    library(multcomp)
    tuk.mat <- contrMat(n=table(newdata$TREAT), type="Tukey")
    Xmat <- tuk.mat
    fit <- fit %*% t(Xmat)
    plyr:::adply(as.matrix(fit),2,MCMCsum)
    
               X1     Median        X0.       X25.       X50.         X75.       X100.        lower     upper
    1 ALG2 - ALG1   5.895130  -3.772229   3.619496   5.895130   8.21805575 17.17097099  -0.03095788 12.031425
    2   NB - ALG1  -7.483685 -16.063734  -9.424278  -7.483685  -5.67358521  1.43404144 -12.96977305 -2.324986
    3    S - ALG1  -9.257110 -19.048803 -11.112041  -9.257110  -7.47521694 -0.02471089 -14.60523050 -4.132667
    4   NB - ALG2 -13.383835 -25.127509 -15.420303 -13.383835 -11.30591077 -3.08899312 -19.21054550 -7.484513
    5    S - ALG2 -15.182663 -26.816258 -17.194247 -15.182663 -13.16149810 -5.56128213 -20.74073778 -9.168501
    6      S - NB  -1.807393  -8.789820  -3.462725  -1.807393  -0.04160081  6.74548604  -6.36710762  2.861639
    
  8. Alternatively, it might be interesting to explore specific contrasts
    • compare the two algal surfaces
    • compare the two bare surfaces
    • compare the algal and bare surfaces
    Show code
    ## Effect as a multiplier
    coefs <-as.matrix(as.data.frame(rstan:::extract(day.brm$fit)))
    coefs <- coefs[,grep('b_', colnames(coefs))]
    newdata <- data.frame(TREAT=levels(day$TREAT))
    Xmat <- model.matrix(~TREAT, data=newdata)
    
    comp.mat <- rbind(c(1,-1,0,0), c(0,0,1,-1), c(1/2,1/2,-1/2,-1/2))
    Xmat <- cbind(0,comp.mat %*% contr.treatment(4))
    fit <- exp(coefs %*% t(Xmat))
    MCMCsum <- function(x) {
       data.frame(Median=median(x, na.rm=TRUE), t(quantile(x,na.rm=TRUE)),
                  HPDinterval(as.mcmc(x)))
    }
    plyr:::adply(as.matrix(fit),2,MCMCsum)
    
      X1    Median       X0.      X25.      X50.      X75.    X100.     lower     upper
    1  1 0.7915595 0.4986398 0.7200166 0.7915595 0.8658664 1.183150 0.6083706 0.9906496
    2  2 1.1357198 0.6128871 1.0029837 1.1357198 1.2838560 1.937864 0.7969842 1.5466949
    3  3 1.8067288 1.2633552 1.6705385 1.8067288 1.9401542 2.563707 1.4381279 2.1821891
    
    ## Effect as a difference
    Xmat <- model.matrix(~TREAT, data=newdata)
    fit <- exp(coefs %*% t(Xmat))
    comp.mat <- rbind(c(1,-1,0,0), c(0,0,1,-1), c(1/2,1/2,-1/2,-1/2))
    fit <- fit %*% t(comp.mat)
    plyr:::adply(as.matrix(fit),2,MCMCsum)
    
      X1    Median        X0.        X25.      X50.      X75.     X100.      lower       upper
    1  1 -5.895130 -17.170971 -8.21805575 -5.895130 -3.619496  3.772229 -12.031425  0.03095788
    2  2  1.807393  -6.745486  0.04160081  1.807393  3.462725  8.789820  -2.861639  6.36710762
    3  3 11.296350   4.225970  9.88922295 11.296350 12.684859 18.425620   7.272778 14.98168802
    
  9. Produce a suitable summary plot
    Show code
    coefs <-as.matrix(as.data.frame(rstan:::extract(day.brm$fit)))
    coefs <- coefs[,grep('b_', colnames(coefs))]
    newdata <- data.frame(TREAT=levels(day$TREAT))
    Xmat <- model.matrix(~TREAT, data=newdata)
    fit = exp(coefs %*% t(Xmat))
    
    newdata = cbind(newdata, plyr:::adply(fit, 2, function(x) {
      data.frame(Mean=mean(x, na.rm=TRUE),Median=median(x, na.rm=TRUE), HPDinterval(as.mcmc(x)))
    }))
    
    ggplot(day, aes(y=BARNACLE, x=TREAT)) +
      geom_point(color='grey') +
      geom_pointrange(data=newdata, aes(y=Mean, ymin=lower, ymax=upper)) +
      scale_y_continuous('Number of newly recruited Barnacles') +
      scale_x_discrete('Substrate type', breaks=c('ALG1','ALG2','NB','S'),
             labels=c('Algae 1','Algae 2', 'Naturally\nBare','Scraped\nBare')) +
      theme_classic() +
      theme(axis.line.x=element_line(), axis.line.y=element_line())
    
    plot of chunk ws10.6bQ2_5

Poisson ANOVA (regression)

To investigate habitat differences in moth abundances, researchers from the Australian National University (ANU) in Canberra counted the number of individuals of two species of moth (recorded as A and P) along transects throughout a landscape comprising eight different habitat types ('Bank', 'Disturbed', 'Lowerside', 'NEsoak', 'NWsoak', 'SEsoak', 'SWsoak', 'Upperside'). For the data presented here, one of the habitat types ('Bank') has been ommitted as no moths were encounted in that habitat.

Although each transect was approximately the same length, each transect passed through multiple habitat types. Consequently, each transect was divided into sections according to habitat and the number of moths observed in each section recorded. Clearly, the number of observed moths in a section would be related to the length of the transect in that section. Therefore, the researchers also recorded the length of each habitat section.

Download moths data set
Format of moth.csv data files
METERS A P HABITAT
25 9 8 NWsoak
37 3 20 SWsoak
109 7 9 Lowerside
10 0 2 Lowerside
133 9 1 Upperside
26 3 18 Disturbed
METERSThe length of the section of transect
AThe number of moth species A observed in section of transect
PThe number of moth species P observed in section of transect
HABITATCategorical listing of the habitat type within section of transect.
Six-plated barnacle

Open
the day data file.
Show code
moths <- read.csv('../downloads/data/moths.csv', header=T, strip.white=T)
head(moths)
  METERS A  P   HABITAT
1     25 9  8    NWsoak
2     37 3 20    SWsoak
3    109 7  9 Lowerside
4     10 0  2 Lowerside
5    133 9  1 Upperside
6     26 3 18 Disturbed

The primary focus of this question will be to investigate the effect of habitat on the abundance of moth species A.

  1. To standardize the moth counts for transect section length, we could convert the counts into densities (by dividing the A by METERS. Create such as variable (DENSITY and explore the usual ANOVA assumptions.
    Show code
    moths <- within(moths, DENSITY<-A/METERS)
    boxplot(DENSITY~HABITAT, data=moths)
    
    plot of chunk ws10.6bQ2a_1
    library(vcd)
    fit <- goodfit(moths$A, type='poisson')
    summary(fit)
    
    	 Goodness-of-fit test for poisson distribution
    
                          X^2 df     P(> X^2)
    Likelihood Ratio 170.2708 11 1.037761e-30
    
    rootogram(fit)
    
    plot of chunk ws10.6bQ2a_1
    fit <- goodfit(moths$A, type='nbinom')
    summary(fit)
    
    	 Goodness-of-fit test for nbinomial distribution
    
                         X^2 df     P(> X^2)
    Likelihood Ratio 32.1636 10 0.0003760562
    
    rootogram(fit)
    
    plot of chunk ws10.6bQ2a_1
    Ord_plot(moths$A, tol=0.2)
    
    plot of chunk ws10.6bQ2a_1
    distplot(moths$A, type='poisson')
    
    plot of chunk ws10.6bQ2a_1
    distplot(moths$A, type='nbinom')
    
    plot of chunk ws10.6bQ2a_1
  2. Clearly, the there is an issue with normality and homogeneity of variance. Perhaps it would be worth transforming density in an attempt to normalize these data. Given that there are densities of zero, a straight logarithmic transformation would not be appropriate. Alternatives could inlude a square-root transform, a forth-root transform and a log plus 1 transform.
    Show code
    moths <- within(moths, DENSITY<-A/METERS)
    boxplot(sqrt(DENSITY)~HABITAT, data=moths)
    
    plot of chunk ws10.6bQ2a_2
    boxplot((DENSITY)^0.25~HABITAT, data=moths)
    
    plot of chunk ws10.6bQ2a_2
    boxplot(log(DENSITY+1)~HABITAT, data=moths)
    
    plot of chunk ws10.6bQ2a_2
  3. Arguably, non of the above transformations have improved the data's adherence to the linear modelling assumptions. Count data (such as the abundance of moth species A) is unlikely to follow a normal or even log-normal distribution. Count data are usually more appropriately modelled via a Poisson or Negative binomial distributions. Note, dividing by transect length is unlikely to alter the underlying nature of the data distribution as it still based on counts. Fit the generalized linear model (GLM) relating the number of moths to habitat. $$ \begin{align} Y_i&\sim{}P(\lambda) & (\text{response distribution})\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ \end{align} $$ OR $$ \begin{align} Y_i&\sim{}NB(p, size) & (\text{response distribution})\\ p &= size/(size + \lambda)\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ size &\sim{}T3,0,5)& (\text{week student t prior})\\ \end{align} $$
    Show code
    ## Poisson model 
    moths.brm <- brm(A~HABITAT, family=poisson, data=moths,
      prior = c(set_prior("normal(0,1000)", class="Intercept"),
                set_prior("normal(0,1000)", class="b")),
      chains=3, iter=2000, warmup=1000, thin=4)
    
    SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.115726 seconds (Warm-up)
    #                0.081034 seconds (Sampling)
    #                0.19676 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.09259 seconds (Warm-up)
    #                0.099851 seconds (Sampling)
    #                0.192441 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.092985 seconds (Warm-up)
    #                0.104814 seconds (Sampling)
    #                0.197799 seconds (Total)
    # 
    
    ## Negative Binomial model
    moths.brm2 <- brm(A~HABITAT, family=negbinomial, data=moths,
      prior = c(set_prior("normal(0,1000)", class="Intercept"),
                set_prior("normal(0,1000)", class="b"),
                            set_prior("student_t(3,0,5)", class="shape")),
      chains=3, iter=2000, warmup=1000, thin=4)
    
    SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.289452 seconds (Warm-up)
    #                0.317398 seconds (Sampling)
    #                0.60685 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.305892 seconds (Warm-up)
    #                0.265276 seconds (Sampling)
    #                0.571168 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.285935 seconds (Warm-up)
    #                0.228923 seconds (Sampling)
    #                0.514858 seconds (Total)
    # 
    
  4. Actually, the above models fail to account for the length of the section of transect. We could do this a couple of ways:
    1. Add METERS as a covariate. Note, since we are modelling against a Poission distribution with a log link function, we should also log the covariate - in order to maintain linearity between the expected value of species A abundance and transect length. $$ \begin{align} Y_i&\sim{}P(\lambda) & (\text{response distribution})\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 Habitat_i + \beta_2 log(Meters_i) & (\text{linear predictor})\\ \beta_0, \beta_1, \beta_2 &\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ \end{align} $$ OR $$ \begin{align} Y_i&\sim{}NB(p, size) & (\text{response distribution})\\ p &= size/(size + \lambda)\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 Habitat_i + \beta_2 log(Meters_i) & (\text{linear predictor})\\ \beta_0, \beta_1, \beta_2 &\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ size &\sim{}T3,0,5)& (\text{week student t prior})\\ \end{align} $$ $$log(\mu) = \beta_0+\beta_1 Habitat_i + \beta_2 log(meters) +\epsilon_i, \hspace{1cm} \epsilon \sim Poisson(\lambda)\\ log(\mu) = \beta_0+\beta_1 Habitat_i + \beta_2 log(meters) +\epsilon_i, \hspace{1cm} \epsilon \sim NB(n,p) $$
      Show code
      ## Poisson model 
      moths.brmP <- brm(A~log(METERS)+HABITAT, family=poisson, data=moths,
        prior = c(set_prior("normal(0,1000)", class="Intercept"),
                  set_prior("normal(0,1000)", class="b")),
        chains=3, iter=2000, warmup=1000, thin=4)
      
      SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 1).
      
      Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.144395 seconds (Warm-up)
      #                0.112145 seconds (Sampling)
      #                0.25654 seconds (Total)
      # 
      
      SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 2).
      
      Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.118709 seconds (Warm-up)
      #                0.164646 seconds (Sampling)
      #                0.283355 seconds (Total)
      # 
      
      SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 3).
      
      Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.219836 seconds (Warm-up)
      #                0.141561 seconds (Sampling)
      #                0.361397 seconds (Total)
      # 
      
      ## Negative Binomial model
      moths.brmNB <- brm(A~log(METERS)+HABITAT, family=negbinomial, data=moths,
        prior = c(set_prior("normal(0,1000)", class="Intercept"),
                  set_prior("normal(0,1000)", class="b"),
                              set_prior("student_t(3,0,5)", class="shape")),
        chains=3, iter=2000, warmup=1000, thin=4)
      
      SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 1).
      
      Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.451177 seconds (Warm-up)
      #                0.373002 seconds (Sampling)
      #                0.824179 seconds (Total)
      # 
      
      SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 2).
      
      Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.299282 seconds (Warm-up)
      #                0.312715 seconds (Sampling)
      #                0.611997 seconds (Total)
      # 
      
      SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 3).
      
      Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.397361 seconds (Warm-up)
      #                0.293748 seconds (Sampling)
      #                0.691109 seconds (Total)
      # 
      
    2. Add METERS as an offset in which case $\beta_2$ is assumed to be 1 (a reasonable assumption in this case). The advantage is that it does not sacrifice any residual degrees of freedom. Again to maintain linearity, the offset should include the log of transect length. $$ \begin{align} Y_i&\sim{}P(\lambda) & (\text{response distribution})\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 Habitat_i + log(Meters_i)& (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ \end{align} $$ OR $$ \begin{align} Y_i&\sim{}NB(p, size) & (\text{response distribution})\\ p &= size/(size + \lambda)\\ log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 Habitat_i + log(Meters_i)& (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ size &\sim{}T3,0,5)& (\text{week student t prior})\\ \end{align} $$ $$log(\mu) = \beta_0+\beta_1 Habitat_i + log(meters) +\epsilon_i, \hspace{1cm} \epsilon \sim Poisson(\lambda)\\ log(\mu) = \beta_0+\beta_1 Habitat_i + log(meters) +\epsilon_i, \hspace{1cm} \epsilon \sim NB(n,p)$$
      Show code
      ## Poisson model
      moths.brmPO <- brm(A~HABITAT + offset(log(METERS)), family=poisson, data=moths,
        prior = c(set_prior("normal(0,1000)", class="Intercept"),
                  set_prior("normal(0,1000)", class="b")),
        chains=3, iter=2000, warmup=1000, thin=4)
      
      SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 1).
      
      Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.093852 seconds (Warm-up)
      #                0.090044 seconds (Sampling)
      #                0.183896 seconds (Total)
      # 
      
      SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 2).
      
      Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.166358 seconds (Warm-up)
      #                0.179634 seconds (Sampling)
      #                0.345992 seconds (Total)
      # 
      
      SAMPLING FOR MODEL 'poisson(log) brms-model' NOW (CHAIN 3).
      
      Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.12022 seconds (Warm-up)
      #                0.099833 seconds (Sampling)
      #                0.220053 seconds (Total)
      # 
      
      ## Negative Binomial model
      moths.brmNBO <- brm(A~HABITAT + offset(log(METERS)), family=negbinomial, data=moths,
        prior = c(set_prior("normal(0,1000)", class="Intercept"),
                  set_prior("normal(0,1000)", class="b"),
                              set_prior("student_t(3,0,5)", class="shape")),
        chains=3, iter=2000, warmup=1000, thin=4)
      
      SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 1).
      
      Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.2179 seconds (Warm-up)
      #                0.275275 seconds (Sampling)
      #                0.493175 seconds (Total)
      # 
      
      SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 2).
      
      Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.300552 seconds (Warm-up)
      #                0.254311 seconds (Sampling)
      #                0.554863 seconds (Total)
      # 
      
      SAMPLING FOR MODEL 'negbinomial(log) brms-model' NOW (CHAIN 3).
      
      Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
      Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
      Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
      Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
      Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
      Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
      Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
      Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
      Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
      Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
      Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
      Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
      #  Elapsed Time: 0.239755 seconds (Warm-up)
      #                0.225719 seconds (Sampling)
      #                0.465474 seconds (Total)
      # 
      
  5. Explore the chain mixing diagnostics.
    1. Trace plots
      Show code
      library(gridExtra)
      grid.arrange(stan_trace(moths.brmP$fit, ncol=1),
                   stan_dens(moths.brmP$fit, separate_chains=TRUE,ncol=1),
                   ncol=2)
      
      plot of chunk ws10.6bQ2a_5
      grid.arrange(stan_trace(moths.brmNB$fit, ncol=1),
                   stan_dens(moths.brmNB$fit, separate_chains=TRUE,ncol=1),
                   ncol=2)
      
      plot of chunk ws10.6bQ2a_5
      grid.arrange(stan_trace(moths.brmPO$fit, ncol=1),
                   stan_dens(moths.brmPO$fit, separate_chains=TRUE,ncol=1),
                   ncol=2)
      
      plot of chunk ws10.6bQ2a_5
      grid.arrange(stan_trace(moths.brmNBO$fit, ncol=1),
                   stan_dens(moths.brmNBO$fit, separate_chains=TRUE,ncol=1),
                   ncol=2)
      
      plot of chunk ws10.6bQ2a_5
    2. Autocorrelation
      Show code
      stan_ac(moths.brmP$fit)
      
      plot of chunk ws10.6bQ2a_5b
      stan_ac(moths.brmNB$fit)
      
      plot of chunk ws10.6bQ2a_5b
      stan_ac(moths.brmPO$fit)
      
      plot of chunk ws10.6bQ2a_5b
      stan_ac(moths.brmNBO$fit)
      
      plot of chunk ws10.6bQ2a_5b
    3. Step size characteristics (STAN only)
      Show code
      summary(do.call(rbind, args = get_sampler_params(moths.brmP$fit, inc_warmup = FALSE)), digits = 2)
      
       accept_stat__    stepsize__    treedepth__   n_leapfrog__ n_divergent__
       Min.   :0.39   Min.   :0.26   Min.   :2.0   Min.   : 3    Min.   :0    
       1st Qu.:0.86   1st Qu.:0.26   1st Qu.:3.0   1st Qu.: 7    1st Qu.:0    
       Median :0.94   Median :0.26   Median :3.0   Median : 7    Median :0    
       Mean   :0.90   Mean   :0.27   Mean   :3.3   Mean   :10    Mean   :0    
       3rd Qu.:0.98   3rd Qu.:0.30   3rd Qu.:4.0   3rd Qu.:15    3rd Qu.:0    
       Max.   :1.00   Max.   :0.30   Max.   :5.0   Max.   :31    Max.   :0    
      
      stan_diag(moths.brmP$fit)
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmP$fit, information = "stepsize")
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmP$fit, information = "treedepth")
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmP$fit, information = "divergence")
      
      plot of chunk ws10.6bQ2a_5c
      library(gridExtra)
      grid.arrange(stan_rhat(moths.brmP$fit) + theme_classic(8),
                   stan_ess(moths.brmP$fit) + theme_classic(8),
                   stan_mcse(moths.brmP$fit) + theme_classic(8),
                   ncol = 2)
      
      plot of chunk ws10.6bQ2a_5c
      summary(do.call(rbind, args = get_sampler_params(moths.brmNB$fit, inc_warmup = FALSE)), digits = 2)
      
       accept_stat__    stepsize__    treedepth__   n_leapfrog__  n_divergent__
       Min.   :0.43   Min.   :0.33   Min.   :2.0   Min.   : 3.0   Min.   :0    
       1st Qu.:0.86   1st Qu.:0.33   1st Qu.:3.0   1st Qu.: 7.0   1st Qu.:0    
       Median :0.94   Median :0.34   Median :3.0   Median : 7.0   Median :0    
       Mean   :0.90   Mean   :0.34   Mean   :3.1   Mean   : 8.5   Mean   :0    
       3rd Qu.:0.98   3rd Qu.:0.36   3rd Qu.:3.0   3rd Qu.: 7.0   3rd Qu.:0    
       Max.   :1.00   Max.   :0.36   Max.   :5.0   Max.   :31.0   Max.   :0    
      
      stan_diag(moths.brmNB$fit)
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmNB$fit, information = "stepsize")
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmNB$fit, information = "treedepth")
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmNB$fit, information = "divergence")
      
      plot of chunk ws10.6bQ2a_5c
      library(gridExtra)
      grid.arrange(stan_rhat(moths.brmNB$fit) + theme_classic(8),
                   stan_ess(moths.brmNB$fit) + theme_classic(8),
                   stan_mcse(moths.brmNB$fit) + theme_classic(8),
                   ncol = 2)
      
      plot of chunk ws10.6bQ2a_5c
      summary(do.call(rbind, args = get_sampler_params(moths.brmPO$fit, inc_warmup = FALSE)), digits = 2)
      
       accept_stat__    stepsize__    treedepth__  n_leapfrog__  n_divergent__
       Min.   :0.36   Min.   :0.28   Min.   :1    Min.   : 1.0   Min.   :0    
       1st Qu.:0.85   1st Qu.:0.28   1st Qu.:2    1st Qu.: 3.0   1st Qu.:0    
       Median :0.94   Median :0.31   Median :3    Median : 7.0   Median :0    
       Mean   :0.90   Mean   :0.31   Mean   :3    Mean   : 7.6   Mean   :0    
       3rd Qu.:0.98   3rd Qu.:0.32   3rd Qu.:3    3rd Qu.: 7.0   3rd Qu.:0    
       Max.   :1.00   Max.   :0.32   Max.   :5    Max.   :31.0   Max.   :0    
      
      stan_diag(moths.brmPO$fit)
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmPO$fit, information = "stepsize")
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmPO$fit, information = "treedepth")
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmPO$fit, information = "divergence")
      
      plot of chunk ws10.6bQ2a_5c
      library(gridExtra)
      grid.arrange(stan_rhat(moths.brmPO$fit) + theme_classic(8),
                   stan_ess(moths.brmPO$fit) + theme_classic(8),
                   stan_mcse(moths.brmPO$fit) + theme_classic(8),
                   ncol = 2)
      
      plot of chunk ws10.6bQ2a_5c
      summary(do.call(rbind, args = get_sampler_params(moths.brmNBO$fit, inc_warmup = FALSE)), digits = 2)
      
       accept_stat__    stepsize__    treedepth__   n_leapfrog__  n_divergent__
       Min.   :0.39   Min.   :0.41   Min.   :1.0   Min.   : 1.0   Min.   :0    
       1st Qu.:0.88   1st Qu.:0.41   1st Qu.:3.0   1st Qu.: 7.0   1st Qu.:0    
       Median :0.96   Median :0.44   Median :3.0   Median : 7.0   Median :0    
       Mean   :0.91   Mean   :0.43   Mean   :2.8   Mean   : 6.6   Mean   :0    
       3rd Qu.:0.99   3rd Qu.:0.44   3rd Qu.:3.0   3rd Qu.: 7.0   3rd Qu.:0    
       Max.   :1.00   Max.   :0.44   Max.   :4.0   Max.   :15.0   Max.   :0    
      
      stan_diag(moths.brmNBO$fit)
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmNBO$fit, information = "stepsize")
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmNBO$fit, information = "treedepth")
      
      plot of chunk ws10.6bQ2a_5c
      stan_diag(moths.brmNBO$fit, information = "divergence")
      
      plot of chunk ws10.6bQ2a_5c
      library(gridExtra)
      grid.arrange(stan_rhat(moths.brmNBO$fit) + theme_classic(8),
                   stan_ess(moths.brmNBO$fit) + theme_classic(8),
                   stan_mcse(moths.brmNBO$fit) + theme_classic(8),
                   ncol = 2)
      
      plot of chunk ws10.6bQ2a_5c
  6. Explore the model fit diagnostics.
    1. Explore the patterns in simulated residuals:
      Show code
      simRes <- function(lambda, Y,n=250, plot=T, family='negbin', size=NULL,theta=NULL) {
       require(gap)
       N = length(Y)
       sim = switch(family,
          'poisson' = matrix(rpois(n*N,apply(lambda,2,mean)),ncol=N, byrow=TRUE),
          'negbin' = matrix(MASS:::rnegbin(n*N,apply(lambda,2,mean),mean(size)),ncol=N, byrow=TRUE),
          'zip' = matrix(gamlss.dist:::rZIP(n*N,apply(lambda,2,mean),theta),ncol=N, byrow=TRUE),
          'zinb' = matrix(gamlss.dist:::rZINBI(n*N,apply(lambda,2,mean),sigma=theta,nu=mean(size)),ncol=N,
                          byrow=TRUE)
       )
       a = apply(sim + runif(n,-0.5,0.5),2,ecdf)
       resid<-NULL
       for (i in 1:length(Y)) resid<-c(resid,a[[i]](Y[i] + runif(1 ,-0.5,0.5)))
       if (plot==T) {
         par(mfrow=c(1,2))
         gap::qqunif(resid,pch = 2, bty = "n",
         logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals",
         cex.main = 1, las=1)
         plot(resid~apply(lambda,2,mean), xlab='Predicted value', ylab='Standardized residual', las=1)
       }
       resid
      }
      
      lambda.brm = fitted(moths.brmP, scale='response', summary=FALSE)
      simRes(lambda.brm, moths$A, family='poisson')
      
      plot of chunk ws10.6bQ2a_6b
       [1] 0.000 0.092 0.756 0.008 0.952 0.952 0.024 0.076 0.024 1.000 0.648 1.000 0.740 0.140 0.024 0.540 0.424 0.716 0.624 0.832 0.624 0.848 0.100 0.332 0.440 0.964 0.232 0.772 0.192 0.060 0.064 0.488 0.220 0.928 1.000 0.056 0.204 0.976 0.052 0.888
      
      lambda.brm = fitted(moths.brmNB, scale='response', summary=FALSE)
      size = rstan:::extract(moths.brmNB$fit, 'shape')[[1]]
      simRes(lambda.brm, moths$A, family='negbin', size=size)
      
      plot of chunk ws10.6bQ2a_6b
       [1] 0.100 0.116 0.500 0.020 0.744 0.848 0.136 0.228 0.132 0.796 0.552 0.984 0.644 0.208 0.108 0.528 0.552 0.800 0.512 0.652 0.640 0.800 0.148 0.460 0.448 0.900 0.416 0.676 0.228 0.128 0.296 0.508 0.312 0.812 0.900 0.148 0.080 0.884 0.140 0.804
      
      lambda.brm = fitted(moths.brmPO, scale='response', summary=FALSE)
      simRes(lambda.brm, moths$A, family='poisson')
      
      plot of chunk ws10.6bQ2a_6b
       [1] 0.000 0.064 0.000 0.120 0.984 0.312 0.080 0.176 0.336 0.956 0.984 1.000 0.860 0.888 0.668 0.632 0.960 0.792 1.000 0.460 1.000 1.000 0.016 0.104 0.816 1.000 0.396 0.640 0.340 0.480 0.044 0.420 0.100 0.788 1.000 0.004 0.192 0.688 0.004 0.808
      
      lambda.brm = fitted(moths.brmNBO, scale='response', summary=FALSE)
      size = rstan:::extract(moths.brmNB$fit, 'shape')[[1]]
      simRes(lambda.brm, moths$A, family='negbin', size=size)
      
      plot of chunk ws10.6bQ2a_6b
       [1] 0.008 0.048 0.000 0.012 0.856 0.208 0.048 0.392 0.300 0.264 0.784 0.972 0.248 0.588 0.112 0.460 0.864 0.676 0.356 0.188 0.884 0.916 0.004 0.172 0.780 0.992 0.300 0.492 0.180 0.444 0.064 0.372 0.196 0.424 0.988 0.024 0.008 0.472 0.012 0.340
      
    2. Goodness of fit test:
      Show code
      ## Poisson
      Resid.brm <- residuals(moths.brmP, type='pearson', summary=FALSE)
      SSres.brm <- apply(Resid.brm^2,1,sum)
      lambda.brm = fitted(moths.brm, scale='response', summary=FALSE)
      YNew.brm <- matrix(rpois(length(lambda.brm), lambda=lambda.brm),nrow=nrow(lambda.brm))
      
      Resid1.brm<-(lambda.brm - YNew.brm)/sqrt(lambda.brm)
      SSres.sim.brm<-apply(Resid1.brm^2,1,sum)
      mean(SSres.sim.brm>SSres.brm)
      
      [1] 0.005333333
      
      ## Negative Binomial
      Resid.brm <- residuals(moths.brmNB, type='pearson', summary=FALSE)
      SSres.brm <- apply(Resid.brm^2,1,sum)
      lambda.brm = fitted(moths.brm, scale='response', summary=FALSE)
      YNew.brm <- matrix(rpois(length(lambda.brm), lambda=lambda.brm),nrow=nrow(lambda.brm))
      
      Resid1.brm<-(lambda.brm - YNew.brm)/sqrt(lambda.brm)
      SSres.sim.brm<-apply(Resid1.brm^2,1,sum)
      mean(SSres.sim.brm>SSres.brm)
      
      [1] 0.8346667
      
      ## Poisson with offset
      Resid.brm <- residuals(moths.brmPO, type='pearson', summary=FALSE)
      SSres.brm <- apply(Resid.brm^2,1,sum)
      lambda.brm = fitted(moths.brm, scale='response', summary=FALSE)
      YNew.brm <- matrix(rpois(length(lambda.brm), lambda=lambda.brm),nrow=nrow(lambda.brm))
      
      Resid1.brm<-(lambda.brm - YNew.brm)/sqrt(lambda.brm)
      SSres.sim.brm<-apply(Resid1.brm^2,1,sum)
      mean(SSres.sim.brm>SSres.brm)
      
      [1] 0
      
      ## Negative Binomial with offset
      Resid.brm <- residuals(moths.brmNBO, type='pearson', summary=FALSE)
      SSres.brm <- apply(Resid.brm^2,1,sum)
      lambda.brm = fitted(moths.brm, scale='response', summary=FALSE)
      YNew.brm <- matrix(rpois(length(lambda.brm), lambda=lambda.brm),nrow=nrow(lambda.brm))
      
      Resid1.brm<-(lambda.brm - YNew.brm)/sqrt(lambda.brm)
      SSres.sim.brm<-apply(Resid1.brm^2,1,sum)
      mean(SSres.sim.brm>SSres.brm)
      
      [1] 0.872
      
    3. Fitted and observed values (plot):
      Show code
      ## Poisson
      newdata <- data.frame(HABITAT=levels(moths$HABITAT), METERS=mean(moths$METERS))
      tempdata = fitted(moths.brmP, scale='response',newdata=newdata)
      tempdata <- cbind(newdata, Mean=tempdata[,'Estimate'], Lower=tempdata[,'2.5%ile'],
                      Upper=tempdata[,'97.5%ile'])
      tempdata
      
          HABITAT METERS      Mean      Lower     Upper
      1 Disturbed   45.5  1.474522  0.5637094  3.031730
      2 Lowerside   45.5  5.457930  3.5429458  7.981774
      3    NEsoak   45.5  2.377828  1.2604467  3.745447
      4    NWsoak   45.5 26.209140 19.6452112 34.361091
      5    SEsoak   45.5  5.851585  4.0600126  8.137415
      6    SWsoak   45.5  6.959453  4.2331902 10.367149
      7 Upperside   45.5  4.761405  2.9405168  6.993508
      
      library(ggplot2)
      ggplot(moths, aes(y=A, x=HABITAT)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=Lower, ymax=Upper), color='red')
      
      plot of chunk ws10.6bQ2a_6d
      ##OR the manual way
      Xmat <- model.matrix(~log(METERS) + HABITAT, data=newdata)
      coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmP$fit)))
      coefs <- coefs[,grep('^b_', colnames(coefs))]
      fit = exp(coefs %*% t(Xmat))
      tempdata <- cbind(newdata, plyr:::adply(fit, 2, function(x) {
       data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
      }))
      ggplot(moths, aes(y=A, x=HABITAT)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=lower, ymax=upper), color='red')
      
      plot of chunk ws10.6bQ2a_6d
      ## Negative Binomial
      newdata <- data.frame(HABITAT=levels(moths$HABITAT), METERS=mean(moths$METERS))
      tempdata = fitted(moths.brmNB, scale='response',newdata=newdata)
      tempdata <- cbind(newdata, Mean=tempdata[,'Estimate'], Lower=tempdata[,'2.5%ile'],
                      Upper=tempdata[,'97.5%ile'])
      library(ggplot2)
      ggplot(moths, aes(y=A, x=HABITAT)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=Lower, ymax=Upper), color='red')
      
      plot of chunk ws10.6bQ2a_6d
      ##OR the manual way
      Xmat <- model.matrix(~log(METERS) + HABITAT, data=newdata)
      coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmNB$fit)))
      coefs <- coefs[,grep('^b_', colnames(coefs))]
      
      fit = exp(coefs %*% t(Xmat))
      tempdata <- cbind(newdata, plyr:::adply(fit, 2, function(x) {
       data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
      }))
      ggplot(moths, aes(y=A, x=HABITAT)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=lower, ymax=upper), color='red')
      
      plot of chunk ws10.6bQ2a_6d
      ## Poisson with offset
      newdata <- data.frame(HABITAT=levels(moths$HABITAT), METERS=mean(moths$METERS))
      tempdata = fitted(moths.brmPO, scale='response',newdata=newdata)
      tempdata <- cbind(newdata, Mean=tempdata[,'Estimate'], Lower=tempdata[,'2.5%ile'],
                      Upper=tempdata[,'97.5%ile'])
      library(ggplot2)
      ggplot(moths, aes(y=A, x=HABITAT)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=Lower, ymax=Upper), color='red')
      
      plot of chunk ws10.6bQ2a_6d
      ##OR the manual way - dont forget to account for the offset
      Xmat <- model.matrix(~HABITAT, data=newdata)
      coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmPO$fit)))
      coefs <- coefs[,grep('^b_', colnames(coefs))]
      fit = exp(coefs %*% t(Xmat) + log(mean(moths$METERS)))
      tempdata <- cbind(newdata, plyr:::adply(fit, 2, function(x) {
       data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
      }))
      ggplot(moths, aes(y=A, x=HABITAT)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=lower, ymax=upper), color='red')
      
      plot of chunk ws10.6bQ2a_6d
      ## Negative Binomial with offset
      newdata <- data.frame(HABITAT=levels(moths$HABITAT), METERS=mean(moths$METERS))
      tempdata = fitted(moths.brmNBO, scale='response',newdata=newdata)
      tempdata <- cbind(newdata, Mean=tempdata[,'Estimate'], Lower=tempdata[,'2.5%ile'],
                      Upper=tempdata[,'97.5%ile'])
      library(ggplot2)
      ggplot(moths, aes(y=A, x=HABITAT)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=Lower, ymax=Upper), color='red')
      
      plot of chunk ws10.6bQ2a_6d
      ##OR the manual way
      Xmat <- model.matrix(~HABITAT, data=newdata)
      coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmNBO$fit)))
      coefs <- coefs[,grep('^b_', colnames(coefs))]
      
      fit = exp(coefs %*% t(Xmat) + log(mean(moths$METERS)))
      tempdata <- cbind(newdata, plyr:::adply(fit, 2, function(x) {
       data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
      }))
      ggplot(moths, aes(y=A, x=HABITAT)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=lower, ymax=upper), color='red')
      
      plot of chunk ws10.6bQ2a_6d
    4. Check for overdispersion:
      Show code
      ## Poisson
      coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmP$fit)))
      coefs <- coefs[,grep('^b_', colnames(coefs))]
      Resid <- residuals(moths.brmP, type='pearson',summary=FALSE)
      RSS <- apply(Resid^2,1,sum)
      Disp <- RSS/(nrow(moths)-ncol(coefs))
      data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
      
            Median     Mean    lower    upper
      var1 2.42229 2.442048 2.273809 2.671013
      
      ## Negative Binomial
      coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmNB$fit)))
      coefs <- coefs[,grep('^b_', colnames(coefs))]
      Resid <- residuals(moths.brmNB, type='pearson',summary=FALSE)
      RSS <- apply(Resid^2,1,sum)
      Disp <- RSS/(nrow(moths)-ncol(coefs))
      data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
      
              Median      Mean     lower    upper
      var1 0.8497883 0.9132843 0.7213029 1.304696
      
      ## Poisson with offset
      coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmPO$fit)))
      coefs <- coefs[,grep('^b_', colnames(coefs))]
      Resid <- residuals(moths.brmPO, type='pearson',summary=FALSE)
      RSS <- apply(Resid^2,1,sum)
      Disp <- RSS/(nrow(moths)-ncol(coefs))
      data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
      
             Median     Mean   lower    upper
      var1 8.512843 8.533245 8.35746 8.765346
      
      ## Negative Binomial with offset
      coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmNBO$fit)))
      coefs <- coefs[,grep('^b_', colnames(coefs))]
      Resid <- residuals(moths.brmNBO, type='pearson',summary=FALSE)
      RSS <- apply(Resid^2,1,sum)
      Disp <- RSS/(nrow(moths)-ncol(coefs))
      data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
      
              Median      Mean     lower   upper
      var1 0.7153655 0.9899389 0.6280123 1.46994
      
  7. At this stage we will conclude that the offset models are poor fits. The Negative Binomial model appears to be a better fit than the Poisson, the latter of which is overdispersed. Explore the parameter estimates for the Negative Binomial model.
    Show code
    summary(moths.brmNB)
    
     Family: negbinomial (log) 
    Formula: A ~ log(METERS) + HABITAT 
       Data: moths (Number of observations: 40) 
    Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 4; 
             total post-warmup samples = 750
       WAIC: Not computed
     
    Fixed Effects: 
                     Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept           -0.16      0.50    -1.17     0.76        651 1.01
    logMETERS            0.14      0.15    -0.14     0.42        566 1.00
    HABITATLowerside     1.37      0.52     0.38     2.32        556 1.00
    HABITATNEsoak        0.51      0.66    -0.75     1.85        492 1.00
    HABITATNWsoak        2.95      0.61     1.79     4.17        466 1.00
    HABITATSEsoak        1.40      0.55     0.36     2.50        431 1.00
    HABITATSWsoak        1.62      0.69     0.37     2.98        523 1.00
    HABITATUpperside     1.18      0.81    -0.27     2.75        464 1.00
    
    Family Specific Parameters: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    shape     3.27      1.41     1.31      6.8        649    1
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    exp(coef(moths.brmNB))
    
                           mean
    Intercept         0.8530651
    logMETERS         1.1495789
    HABITATLowerside  3.9311608
    HABITATNEsoak     1.6688071
    HABITATNWsoak    19.0893054
    HABITATSEsoak     4.0741209
    HABITATSWsoak     5.0406178
    HABITATUpperside  3.2528957
    
    coefs.brm <- as.matrix(as.data.frame(rstan:::extract(moths.brmNB$fit)))
    coefs.brm <- coefs.brm[,grep('b', colnames(coefs.brm))]
    plyr:::adply(exp(coefs.brm), 2, function(x) {
      data.frame(Mean=mean(x), median=median(x), HPDinterval(as.mcmc(x)))
      })
    
                      X1       Mean     median     lower     upper
    1        b_Intercept  0.9640476  0.8667087 0.2455259  1.924904
    2        b_logMETERS  1.1622245  1.1454692 0.8537851  1.493643
    3 b_HABITATLowerside  4.4979287  4.0014549 1.0605590  8.794885
    4    b_HABITATNEsoak  2.0883054  1.6512687 0.2406724  5.097136
    5    b_HABITATNWsoak 23.0365909 19.1029295 2.6240569 52.580216
    6    b_HABITATSEsoak  4.7417832  4.1066968 0.7076325  9.973949
    7    b_HABITATSWsoak  6.4842791  4.9137160 0.8596906 15.789222
    8 b_HABITATUpperside  4.5218465  3.1861182 0.3221937 12.337032
    
    marginal_effects(moths.brmNB, effects='HABITAT')
    
    plot of chunk ws10.6bQ2a_7
  8. Estimate the strength of the relationship
    Show code
    ## calculate the expected values on the response scale
    lambda.brm = fitted(moths.brmNB, scale='response', summary=FALSE)
    ## calculate the raw SSresid
    SSres.brm <- apply((-1*(sweep(lambda.brm,2, moths$A,'-')))^2,1,sum)
    SSres.null <- sum((moths$A - mean(moths$A))^2)
    #OR 
    #calculate the model r2
    1-mean(SSres.brm)/SSres.null
    
    [1] 0.3798634
    
    #OR manually
    coefs <- as.matrix(as.data.frame(rstan:::extract(moths.brmNB$fit)))
    coefs <- coefs[,grep('^b_', colnames(coefs))]
    Xmat <- model.matrix(~log(METERS) + HABITAT, data=moths)
    lambda.brm = exp(coefs %*% t(Xmat))
    
    ## calculate the raw SSresid
    SSres.brm <- apply((-1*(sweep(lambda.brm,2,moths$A,'-')))^2,1,sum)
    SSres.null <- crossprod(moths$A - mean(moths$A))
    #calculate the model r2
    1-mean(SSres.brm)/SSres.null
    
              [,1]
    [1,] 0.3798634
    
    ## Alternatively..
    R.var = apply(residuals(moths.brmNB, summary=FALSE),1,var)
    X.var = apply(fitted(moths.brmNB, summary=FALSE),1,var)
    R2.marginal <- X.var/(X.var + R.var)
    data.frame(Median=median(R2.marginal), coda:::HPDinterval(as.mcmc(R2.marginal)))
    
            Median     lower     upper
    var1 0.6292283 0.3216714 0.7238538
    
  9. Although we have now established that there is a statistical difference between the group means, we do not yet know which group(s) are different from which other(s). To address this we can explore all pairwise contrasts from the posterior. Note, careful when back transforming from log. Recall that log laws indicate that $log(a) - log(b) = log(a/b)$.
    Show code
    coefs <-as.matrix(as.data.frame(rstan:::extract(moths.brmNB$fit)))
    coefs <- coefs[,grep('^b_', colnames(coefs))]
    newdata <- data.frame(HABITAT=levels(moths$HABITAT), METERS=mean(moths$METERS))
    Xmat <- model.matrix(~log(METERS) + HABITAT, data=newdata)
    library(multcomp)
    tuk.mat <- contrMat(n=table(newdata$HABITAT), type="Tukey")
    Xmat <- model.matrix(~log(METERS)+HABITAT, data=newdata)
    Xmat <- tuk.mat %*% Xmat
    fit <- exp(coefs %*% t(Xmat))
    library(coda)
    MCMCsum <- function(x) {
       data.frame(Median=median(x, na.rm=TRUE), t(quantile(x,na.rm=TRUE)),
                  HPDinterval(as.mcmc(x)))
    }
    
    plyr:::adply(as.matrix(fit),2,MCMCsum)
    
                          X1     Median        X0.       X25.       X50.       X75.       X100.      lower      upper
    1  Lowerside - Disturbed  4.0014549 0.75058230  2.6927014  4.0014549  5.7029870  34.0094273 1.06055902  8.7948850
    2     NEsoak - Disturbed  1.6512687 0.17191247  1.0657961  1.6512687  2.5321403  15.0458543 0.24067244  5.0971364
    3     NWsoak - Disturbed 19.1029295 2.62405689 12.8337712 19.1029295 27.8190728 183.3046958 2.62405689 52.5802164
    4     SEsoak - Disturbed  4.1066968 0.59830817  2.8466216  4.1066968  5.7985525  32.5714541 0.70763248  9.9739487
    5     SWsoak - Disturbed  4.9137160 0.33815372  3.1774973  4.9137160  8.0255117  55.9231149 0.85969057 15.7892225
    6  Upperside - Disturbed  3.1861182 0.32219367  1.8454384  3.1861182  5.6032516  36.7013276 0.32219367 12.3370325
    7     NEsoak - Lowerside  0.4302645 0.09341336  0.3011472  0.4302645  0.5996800   1.7768091 0.12722450  0.9474301
    8     NWsoak - Lowerside  4.7596871 1.40687834  3.5717100  4.7596871  6.6442824  22.2970521 1.40687834 10.2087762
    9     SEsoak - Lowerside  1.0230626 0.30087781  0.8127477  1.0230626  1.3137698   3.8331690 0.49229150  2.0169133
    10    SWsoak - Lowerside  1.2891018 0.16286740  0.8997906  1.2891018  1.8098058   7.7462736 0.16286740  3.0991319
    11 Upperside - Lowerside  0.8160707 0.15518045  0.5505156  0.8160707  1.2557586   5.5328028 0.15518045  2.2845561
    12       NWsoak - NEsoak 11.3275794 2.38413610  7.9713011 11.3275794 16.4057151  74.0606945 3.52147567 27.5885849
    13       SEsoak - NEsoak  2.4364756 0.49206697  1.7481048  2.4364756  3.4164552   8.6936643 0.70461238  5.4812849
    14       SWsoak - NEsoak  2.9909563 0.49728521  2.0435734  2.9909563  4.3208633  22.0843886 0.64089541  7.6826544
    15    Upperside - NEsoak  1.9914847 0.17379858  1.4008755  1.9914847  2.7697335  13.3831349 0.39992138  4.8920306
    16       SEsoak - NWsoak  0.2155425 0.05343626  0.1554463  0.2155425  0.2892431   0.9090968 0.08251825  0.4751569
    17       SWsoak - NWsoak  0.2691229 0.04348677  0.1833171  0.2691229  0.3791270   1.4805610 0.06255997  0.6595744
    18    Upperside - NWsoak  0.1718971 0.02995233  0.1149770  0.1718971  0.2468979   0.9966972 0.02995233  0.4584054
    19       SWsoak - SEsoak  1.2630026 0.14038551  0.8964932  1.2630026  1.7209516   8.8383401 0.27528176  2.8265154
    20    Upperside - SEsoak  0.7956164 0.13375965  0.5553199  0.7956164  1.1470159   5.5829958 0.13375965  2.0120800
    21    Upperside - SWsoak  0.6492371 0.10376765  0.4309988  0.6492371  0.9598559   5.3098336 0.11219507  1.7852750
    
    ## Due to log laws, these effects are multipliers rather than differences
    
    ## If you want to express the outcomes in terms of differences then it is necessary to 
    ## estimate the cell posteriors and subtract those - this cannot easily be done Frequentist!
    Xmat <- model.matrix(~log(METERS) + HABITAT, data=newdata)
    fit <- exp(coefs %*% t(Xmat))
    library(multcomp)
    tuk.mat <- contrMat(n=table(newdata$HABITAT), type="Tukey")
    
    fit <- fit %*% t(tuk.mat)
    plyr:::adply(as.matrix(fit),2,MCMCsum)
    
                          X1      Median        X0.        X25.        X50.        X75.       X100.       lower      upper
    1  Lowerside - Disturbed   4.0689569  -2.202499   3.0265807   4.0689569   5.3089091 14.57838684   1.2409552  8.1448004
    2     NEsoak - Disturbed   0.9017660  -7.312479   0.1340833   0.9017660   1.6467105  6.94889870  -1.8583399  3.6882766
    3     NWsoak - Disturbed  25.4062806   9.963564  20.2980553  25.4062806  32.7550632 85.26925867  10.6408161 51.6049801
    4     SEsoak - Disturbed   4.2722895  -2.581770   3.3043011   4.2722895   5.4839030 14.10709025   1.0868378  8.2307350
    5     SWsoak - Disturbed   5.7266432  -3.546058   3.8621775   5.7266432   8.1649238 31.40750951   0.2218100 13.8002744
    6  Upperside - Disturbed   3.0285388  -5.831655   1.6724786   3.0285388   4.9985728 16.28502930  -0.8236566  9.9717337
    7     NEsoak - Lowerside  -3.1348894 -14.535877  -4.6852567  -3.1348894  -1.9589919  3.03830738  -8.6001836  0.6197858
    8     NWsoak - Lowerside  21.3923404   3.912020  15.6498096  21.3923404  28.6200122 78.99186890   4.2933904 46.4128749
    9     SEsoak - Lowerside   0.1404985 -10.703890  -1.1670427   0.1404985   1.5726694 11.06519437  -5.7490841  4.1488093
    10    SWsoak - Lowerside   1.7227120 -10.186649  -0.5544462   1.7227120   3.9412710 27.85100117  -5.1579731 12.1380466
    11 Upperside - Lowerside  -1.0124225 -12.908841  -3.0498762  -1.0124225   1.2095389 12.87205248  -7.8457054  6.6083564
    12       NWsoak - NEsoak  24.7864257   7.193430  19.1047828  24.7864257  32.1681075 85.29390321   7.1934299 50.4455294
    13       SEsoak - NEsoak   3.3941761  -3.446630   2.1103394   3.3941761   4.7262440 12.75358000  -0.2609212  8.8197256
    14       SWsoak - NEsoak   4.6908240  -2.163572   2.9268259   4.6908240   7.1472431 30.53130201  -0.3405123 14.5425090
    15    Upperside - NEsoak   2.3346951  -9.374825   1.0356404   2.3346951   3.7770028 15.27627974  -2.4397183  7.7411171
    16       SEsoak - NWsoak -20.9607257 -77.798077 -28.8479819 -20.9607257 -15.6463026 -1.08124784 -47.7083030 -6.1904502
    17       SWsoak - NWsoak -19.5044667 -81.107760 -26.6646773 -19.5044667 -13.4849094  9.43627683 -47.2339163 -0.3334677
    18    Upperside - NWsoak -22.5288701 -82.524769 -29.9152465 -22.5288701 -16.3370635 -0.03711989 -50.4070421 -4.8827190
    19       SWsoak - SEsoak   1.5127864 -11.093903  -0.7095885   1.5127864   3.9371915 28.36110107  -5.4554441 10.4675256
    20    Upperside - SEsoak  -1.1427166 -12.535355  -3.0425532  -1.1427166   0.7945388 14.15899246  -6.9809698  6.0547850
    21    Upperside - SWsoak  -2.3541702 -23.471257  -5.2799897  -2.3541702  -0.2025978 12.69251435 -12.3680295  5.9271043
    
  10. Produce a suitable summary plot
    Show code
    coefs <-as.matrix(as.data.frame(rstan:::extract(moths.brmNB$fit)))
    coefs <- coefs[,grep('^b_', colnames(coefs))]
    newdata <- data.frame(HABITAT=levels(moths$HABITAT), METERS=mean(moths$METERS))
    Xmat <- model.matrix(~log(METERS) + HABITAT, data=newdata)
    fit = exp(coefs %*% t(Xmat))
    
    newdata = cbind(newdata, plyr:::adply(fit, 2, function(x) {
      data.frame(Mean=mean(x, na.rm=TRUE),Median=median(x, na.rm=TRUE), HPDinterval(as.mcmc(x)))
    }))
    
    ggplot(moths, aes(y=A, x=HABITAT)) +
      geom_point(color='grey') +
      geom_pointrange(data=newdata, aes(y=Mean, ymin=lower, ymax=upper)) +
      scale_y_continuous('Number of Moths') +
      scale_x_discrete('HABITAT') +
      theme_classic() +
      theme(axis.line.x=element_line(), axis.line.y=element_line())
    
    plot of chunk ws10.6bQ2a_9

Poisson t-test with zero-inflation

Yet again our rather fixated marine ecologist has returned to the rocky shore with an interest in examining the effects of wave exposure on the abundance of yet another intertidal marine limpet Patelloida latistrigata on rocky intertidal shores. It would appear that either this ecologist is lazy, is satisfied with the methodology or else suffers from some superstition disorder (that prevents them from deviating too far from a series of tasks), and thus, yet again, a single quadrat was placed on 10 exposed (to large waves) shores and 10 sheltered shores. From each quadrat, the number of scaley limpets (Patelloida latistrigata) were counted. Initially, faculty were mildly interested in the research of this ecologist (although lets be honest, most academics usually only display interest in the work of the other members of their school out of politeness - Oh I did not say that did I?). Nevertheless, after years of attending seminars by this ecologist in which the only difference is the target species, faculty have started to display more disturbing levels of animosity towards our ecologist. In fact, only last week, the members of the school's internal ARC review panel (when presented with yet another wave exposure proposal) were rumored to take great pleasure in concocting up numerous bogus applications involving hedgehogs and balloons just so they could rank our ecologist proposal outside of the top 50 prospects and ... Actually, I may just have digressed!

Download LimpetsScaley data set
Format of limpetsScaley.csv data files
CountShore
4sheltered
1sheltered
2sheltered
0sheltered
4sheltered
......

ShoreCategorical description of the shore type (sheltered or exposed) - Predictor variable
CountNumber of limpets per quadrat - Response variable
turf

Open the limpetsSmooth data set.

Show code
limpets <- read.table('../downloads/data/limpetsScaley.csv', header=T, sep=',', strip.white=T)
limpets
   Count     Shore
1      2 sheltered
2      1 sheltered
3      3 sheltered
4      1 sheltered
5      0 sheltered
6      0 sheltered
7      1 sheltered
8      0 sheltered
9      2 sheltered
10     1 sheltered
11     4   exposed
12     9   exposed
13     3   exposed
14     1   exposed
15     3   exposed
16     0   exposed
17     0   exposed
18     7   exposed
19     8   exposed
20     5   exposed
  1. As part of the routine exploratory data analysis:
    1. Explore the distribution of counts from each population
      Show code
      boxplot(Count~Shore,data=limpets)
      rug(jitter(limpets$Count), side=2)
      
      plot of chunk ws10.6bQ4_1a
    2. Compare the expected number of zeros to the actual number of zeros to determine whether the model might be zero inflated.
      • Calculate the proportion of zeros in the data
        Show code
        limpets.tab<-table(limpets$Count==0)
        limpets.tab/sum(limpets.tab)
        
        FALSE  TRUE 
         0.75  0.25 
        
      • Now work out the proportion of zeros we would expect from a Poisson distribution with a lambda equal to the mean count of limpets in this study.
        Show code
        limpets.mu <- mean(limpets$Count)
        cnts <-rpois(1000,limpets.mu)
        limpets.tab<-table(cnts==0)
        limpets.tab/sum(limpets.tab)
        
        FALSE  TRUE 
         0.91  0.09 
        
        Clearly there are substantially more zeros than expected so the data are likely to be zero inflated.
  2. Lets explore a zero-inflated Poisson (ZIP) model. $$ p(y_i|\theta,\lambda) = \left\{ \begin{array}{l l} \theta + (1-\theta)\times \text{Pois}(0|\lambda) & \quad \text{if $y_i=0$ and}\\ (1-\theta)\times \text{Pois}(y_i|\lambda) & \quad \text{if $y_i>0$} \end{array} \right.\\ \begin{align} log(\lambda_i)&=\eta_i & (\text{link function})\\ \eta_i&=\beta_0+\beta_1 X_i & (\text{linear predictor})\\ \beta_0, \beta_1&\sim{}\mathcal{N}(0,1000) & (\text{week Bayesian prior})\\ \end{align} $$
    Show code
    limpets.brm <- brm(Count~0+spec+main+main:Shore, family='zero_inflated_poisson', data=limpets,
      prior = c(set_prior("normal(0,1000)", class="b")),
      chains=3, iter=2000, warmup=1000, thin=4)
    
    SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.161603 seconds (Warm-up)
    #                0.167162 seconds (Sampling)
    #                0.328765 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.567222 seconds (Warm-up)
    #                0.092628 seconds (Sampling)
    #                0.65985 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'zero_inflated_poisson(log) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
    Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
    Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
    Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
    Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
    Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
    Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
    Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.35152 seconds (Warm-up)
    #                0.126483 seconds (Sampling)
    #                0.478003 seconds (Total)
    # 
    
  3. Explore the chain mixing diagnostics.
    1. Trace plots
      Show code
      library(gridExtra)
      grid.arrange(stan_trace(limpets.brm$fit, ncol=1),
                   stan_dens(limpets.brm$fit, separate_chains=TRUE,ncol=1),
                   ncol=2)
      
      plot of chunk ws10.6bQ4_5
    2. Autocorrelation
      Show code
      stan_ac(limpets.brm$fit)
      
      plot of chunk ws10.6bQ4_5b
    3. Step size characteristics (STAN only)
      Show code
      summary(do.call(rbind, args = get_sampler_params(limpets.brm$fit, inc_warmup = FALSE)), digits = 2)
      
       accept_stat__    stepsize__    treedepth__   n_leapfrog__  n_divergent__ 
       Min.   :0.00   Min.   :0.22   Min.   :1.0   Min.   : 1.0   Min.   :0.00  
       1st Qu.:0.80   1st Qu.:0.22   1st Qu.:3.0   1st Qu.: 4.0   1st Qu.:0.00  
       Median :0.93   Median :0.22   Median :3.0   Median : 7.0   Median :1.00  
       Mean   :0.84   Mean   :0.28   Mean   :3.1   Mean   : 7.8   Mean   :0.51  
       3rd Qu.:1.00   3rd Qu.:0.40   3rd Qu.:4.0   3rd Qu.:11.0   3rd Qu.:1.00  
       Max.   :1.00   Max.   :0.40   Max.   :5.0   Max.   :24.0   Max.   :1.00  
      
      stan_diag(limpets.brm$fit)
      
      plot of chunk ws10.6bQ4_6a
      stan_diag(limpets.brm$fit, information = "stepsize")
      
      plot of chunk ws10.6bQ4_6a
      stan_diag(limpets.brm$fit, information = "treedepth")
      
      plot of chunk ws10.6bQ4_6a
      stan_diag(limpets.brm$fit, information = "divergence")
      
      plot of chunk ws10.6bQ4_6a
      library(gridExtra)
      grid.arrange(stan_rhat(limpets.brm$fit) + theme_classic(8),
                   stan_ess(limpets.brm$fit) + theme_classic(8),
                   stan_mcse(limpets.brm$fit) + theme_classic(8),
                   ncol = 2)
      
      plot of chunk ws10.6bQ4_6a
  4. Explore the model fit diagnostics.
    1. Explore the patterns in simulated residuals:
      Show code
      lambda.brm = exp(fitted(limpets.brm, scale='linear', summary=FALSE))[,1:nrow(limpets)]
      theta = binomial()$linkinv(mean(rstan:::extract(limpets.brm$fit, 'b_spec')[[1]]))
      simRes(lambda.brm, limpets$Count, family='zip', theta=theta)
      
      plot of chunk ws10.6bQ4_7a
       [1] 0.660 0.484 0.944 0.464 0.288 0.196 0.736 0.312 0.768 0.636 0.592 0.996 0.376 0.056 0.420 0.008 0.000 0.884 0.944 0.708
      
    2. Goodness of fit test:
      Show code
      ## Poisson
      Resid.brm <- residuals(limpets.brm, type='pearson', summary=FALSE)
      SSres.brm <- apply(Resid.brm^2,1,sum)
      lambda.brm = fitted(limpets.brm, scale='response', summary=FALSE)
      YNew.brm <- matrix(rpois(length(lambda.brm), lambda=lambda.brm),nrow=nrow(lambda.brm))
      
      Resid1.brm<-(lambda.brm - YNew.brm)/sqrt(lambda.brm)
      SSres.sim.brm<-apply(Resid1.brm^2,1,sum)
      mean(SSres.sim.brm>SSres.brm)
      
      [1] 0.07466667
      
    3. Fitted and observed values (plot):
      Show code
      newdata <- data.frame(Shore=levels(limpets$Shore))
      tempdata = fitted(limpets.brm, scale='response',newdata=newdata)
      tempdata <- cbind(newdata, Mean=tempdata[,'Estimate'], Lower=tempdata[,'2.5%ile'],
                      Upper=tempdata[,'97.5%ile'])
      library(ggplot2)
      ggplot(limpets, aes(y=Count, x=Shore)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=Lower, ymax=Upper), color='red')
      
      plot of chunk ws10.6bQ4_6d
      ##OR the manual way
      Xmat <- model.matrix(~Shore, data=newdata)
      coefs <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))
      coefs <- coefs[,grep('^b_main', colnames(coefs))]
      fit = exp(coefs %*% t(Xmat))
      tempdata <- cbind(newdata, plyr:::adply(fit, 2, function(x) {
       data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
      }))
      ggplot(limpets, aes(y=Count, x=Shore)) + geom_boxplot() +
       geom_pointrange(data=tempdata,aes(y=Mean, ymin=lower, ymax=upper), color='red')
      
      plot of chunk ws10.6bQ4_6d
    4. Check for overdispersion:
      Show code
      coefs <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))
      coefs <- coefs[,grep('^b_', colnames(coefs))]
      Resid <- residuals(limpets.brm, type='pearson',summary=FALSE)
      RSS <- apply(Resid^2,1,sum)
      Disp <- RSS/(nrow(limpets)-ncol(coefs))
      data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
      
             Median     Mean    lower    upper
      var1 1.750797 1.789582 1.686107 2.004651
      
  5. Given that the model seems to be well specified and there are no diagnostics that would question the reliability of the model, lets explore the model parameters.
    Show code
    summary(limpets.brm)
    
     Family: zero_inflated_poisson (log) 
    Formula: Count ~ 0 + spec + main + main:Shore 
       Data: limpets (Number of observations: 20) 
    Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 4; 
             total post-warmup samples = 750
       WAIC: Not computed
     
    Fixed Effects: 
                        Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    spec                 -787.73    602.86 -2246.33   -35.06        459    1
    main                    1.37      0.16     1.03     1.68        520    1
    main:Shoresheltered    -1.33      0.34    -2.00    -0.70        590    1
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    exp(coef(limpets.brm))
    
                             mean
    spec                0.0000000
    main                3.9513940
    main:Shoresheltered 0.2637503
    
    coefs <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))
    coefs <- coefs[,grep('^b_', colnames(coefs))]
    plyr:::adply(exp(coefs), 2, function(x) {
      data.frame(Mean=mean(x), median=median(x), HPDinterval(as.mcmc(x)))
      })
    
                         X1         Mean        median     lower        upper
    1                b_spec 5.746643e-08 3.422828e-287 0.0000000 1.358137e-28
    2                b_main 4.000980e+00  3.973522e+00 2.8246056 5.369929e+00
    3 b_main.Shoresheltered 2.787194e-01  2.671325e-01 0.1166521 4.613737e-01
    
    marginal_effects(limpets.brm)
    
    plot of chunk ws10.6bQ4_6
  6. Estimate the strength of the relationship
    Show code
    ## calculate the expected values on the response scale
    lambda.brm = fitted(limpets.brm, scale='response', summary=FALSE)
    ## calculate the raw SSresid
    SSres.brm <- apply((-1*(sweep(lambda.brm,2, limpets$Count,'-')))^2,1,sum)
    SSres.null <- sum((limpets$Count - mean(limpets$Count))^2)
    #OR 
    #calculate the model r2
    1-mean(SSres.brm)/SSres.null
    
    [1] 0.2554819
    
    #OR manually
    coefs <- as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))
    coefs <- coefs[,grep('^b_main', colnames(coefs))]
    Xmat <- model.matrix(~Shore, data=limpets)
    lambda.brm = exp(coefs %*% t(Xmat))
    
    ## calculate the raw SSresid
    SSres.brm <- apply((-1*(sweep(lambda.brm,2,limpets$Count,'-')))^2,1,sum)
    SSres.null <- crossprod(limpets$Count - mean(limpets$Count))
    #calculate the model r2
    1-mean(SSres.brm)/SSres.null
    
              [,1]
    [1,] 0.2554819
    
    ## Alternatively..
    R.var = apply(residuals(limpets.brm, summary=FALSE),1,var)
    X.var = apply(fitted(limpets.brm, summary=FALSE),1,var)
    R2.marginal <- X.var/(X.var + R.var)
    data.frame(Median=median(R2.marginal), coda:::HPDinterval(as.mcmc(R2.marginal)))
    
            Median     lower     upper
    var1 0.2907571 0.1057379 0.4574874
    
  7. Produce a suitable summary plot
    Show code
    coefs <-as.matrix(as.data.frame(rstan:::extract(limpets.brm$fit)))
    coefs <- coefs[,grep('^b_main', colnames(coefs))]
    newdata <- data.frame(Shore=levels(limpets$Shore))
    Xmat <- model.matrix(~Shore, data=newdata)
    fit = exp(coefs %*% t(Xmat))
    
    newdata = cbind(newdata, plyr:::adply(fit, 2, function(x) {
      data.frame(Mean=mean(x, na.rm=TRUE),Median=median(x, na.rm=TRUE), HPDinterval(as.mcmc(x)))
    }))
    
    ggplot(limpets, aes(y=Count, x=Shore)) +
      geom_point(color='grey') +
      geom_pointrange(data=newdata, aes(y=Mean, ymin=lower, ymax=upper)) +
      scale_y_continuous('Abundance of limpets') +
      scale_x_discrete('Shore type') +
      theme_classic() +
      theme(axis.line.x=element_line(), axis.line.y=element_line())
    
    plot of chunk ws10.6bQ4_9

Exponential family of distributions

The exponential distributions are a class of continuous distribution which can be characterized by two parameters. One of these parameters (the location parameter) is a function of the mean and the other (the dispersion parameter) is a function of the variance of the distribution. Note that recent developments have further extended generalized linear models to accommodate other non-exponential residual distributions.

End of instructions