

Tutorial 11.2a - Generalized linear mixed effects models

12 Mar 2016

Generalized linear mixed effects models

Tutorials 9.1, 9.2a, 9.3a and 9.4a introduced hierarchical (or mixed effects) designs and models, in which sampling units are arranged in space and time so as to reduce associated sources of unexplained variability and the models incorporate provisions for the resulting dependency structure. Tutorials 10.4, 10.5a, and 11.5a introduced linear models that accommodate residual distributions other than the Gaussian (normal) distribution.

Parameter estimation

In some respects, Generalized Linear Mixed effects Models (GLMM) are a hierarchical extension of Generalized Linear Models (GLM) in a similar manner to which Linear Mixed effects Models (LMM) are a hierarchical extension of Linear Models (LM). However, whilst the Gaussian (normal) distribution facilitates a relatively straightforward way of generating the marginal likelihood of the observed response by integrating likelihoods across all possible (and unobserved) levels of a random effect to yield parameter estimates, the same cannot be said for other distributions. Consequently, various approximations have been developed to estimate the fixed and random parameters for GLMM's:

Penalized quasi-likelihood (PQL)

This method approximates the quasi-likelihood by iteratively fitting (re)weighted linear mixed effects models based on the fit of a GLM. Specifically, it estimates the fixed effects parameters by fitting a GLM that incorporates a correlation (variance-covariance) structure resulting from a LMM, and then refits a LMM to re-estimate the variance-covariance structure using the variance structure from the previous GLM. The cycle continues to iterate until either the fit improvement is below a threshold or a defined number of iterations has occurred.

Whilst this is a relatively simple approach that enables us to leverage methodologies for accommodating heterogeneity and spatial/temporal autocorrelation, it is known to perform poorly (estimates biased towards large variance) for Poisson distributions when the expected value is less than 5 and for binary data when the expected number of successes or failures is less than 5. Moreover, as it approximates the quasi-likelihood rather than the likelihood, likelihood-based inference and information criterion methods (such as likelihood ratio tests and AIC) are not appropriate with this approach. Instead, Wald tests are required for inference.

Laplace approximation

This approach utilizes a second-order Taylor series expansion (a mathematical technique for approximating the properties of a function around a point by taking multiple derivatives of the function and summing them together) to approximate the likelihood function. If we assume that the likelihood function is approximately normal, and thus a quadratic function on a log scale, we can use a second-order Taylor series expansion to approximate this likelihood.
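As a rough sketch of the idea (for a single Gaussian random effect $u$ with variance $\sigma^2$, writing $h(u)=\log\left[f(y|u)\,\phi(u;0,\sigma^2)\right]$ for the log of the integrand), the marginal likelihood is approximated by expanding $h(u)$ to second order about its mode $\hat{u}$:
$$
\int f(y|u)\,\phi(u;0,\sigma^2)\,du = \int e^{h(u)}\,du \approx e^{h(\hat{u})}\sqrt{\frac{2\pi}{-h''(\hat{u})}}
$$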

Whilst this approach is considered to be more accurate than PQL, it is considerably slower and unable to accommodate alternative variance and correlation structures.

Gauss-Hermite quadrature (GHQ)

This approach approximates the marginal likelihood by evaluating the integrand at a set of specific points (quadratures) and summing the weighted results. The technique can be further adapted (adaptive quadrature) by allowing the number and placement of the quadrature points and their weights to be optimized via a set of rules.
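In essence (again only a sketch), the integrals that arise with Gaussian random effects are approximated by a weighted sum of the integrand evaluated at $K$ quadrature points $x_k$ (the roots of the $K$th Hermite polynomial) with associated weights $w_k$:
$$
\int_{-\infty}^{\infty} e^{-x^2}\,g(x)\,dx \approx \sum_{k=1}^{K} w_k\, g(x_k)
$$
The Laplace approximation corresponds to the special case of a single quadrature point ($K=1$).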

Markov Chain Monte Carlo (MCMC)

This takes a brute force approach, recreating the likelihood by traversing the likelihood function with sequential sampling proportional to the likelihood. Although this approach is very robust (when the posteriors have converged), it is computationally very intensive. Interestingly, some (including Andrew Gelman) argue that PQL, Laplace and GHQ do not yield estimates; rather, they yield only approximations of estimates. By contrast, as MCMC methods are able to integrate over all levels by brute force, the resulting parameters are indeed true estimates.

Inference (hypothesis) testing

Inference (and hypothesis testing) in linear mixed effects models has become a very volatile topic over the last 5-10 years. The purists claim that in hierarchical models, determining the appropriate denominator degrees of freedom to use when calculating F-ratio significance is without an exact (or even approximate) solution. Whilst some LME (and thus GLMM) routines do provide p-values, these are based on the traditional techniques of establishing the appropriate error mean squares (and degrees of freedom - the so-called 'inner-outer' or 'between-within' method) for each level of the hierarchy. However, as just stated, these methods are now considered potentially inappropriate. The essence of the argument is that the effective number of parameters (and thus degrees of freedom) used by a random factor is somewhere between 1 (a single standard deviation estimate) and $N$ (the number of random-effect levels).

For linear mixed effects models (LMM), \citet{Bolker-2008-127} suggest performing likelihood-ratio tests. These tests involve comparing successive models with and without the terms of interest and comparing -2 times the log-likelihood ratio to a $\chi^2$ distribution. However, this approach is not recommended for generalized linear mixed models (particularly those estimated by penalized quasi-likelihood) \citep{Pinheiro-2000-2000}.

For GLMM's without over-dispersion, \citet{Bolker-2008-127} recommend the use of Wald $\chi^2$ tests for hypothesis testing, yet indicate that Wald $F$ tests are more appropriate when over-dispersion exists, as they account for uncertainty in the estimates of over-dispersion. Unfortunately, this brings us almost full circle, since in order to perform Wald $F$ tests, estimates of residual degrees of freedom are required!

There are a number of partially accepted current methods of estimating these degrees of freedom. To date, the major R players in this field are unsatisfied with all of them and thus take the (rather non-pragmatic) approach of not supporting any of these techniques. This is fine if this is your field of research. However, if you are a research scientist trying to use these tools to analyse your data and get research past editors and reviewers (many of whom have just enough statistical knowledge to know that there is a problem, but do not have solutions themselves), these philosophical high-grounds are of no use.

In previous years, the most popular means of performing GLMM in R was with the glmmPQL (MASS) function. This uses a penalized quasi-likelihood method and was thought to provide a good compromise between speed and accuracy. More recent attention has been drawn to the development of the glmer (lme4) function. This function uses either Laplace or Gauss-Hermite approximations of the likelihood instead of the quasi-likelihood and is therefore thought to yield more accurate estimates (and thus inferences). This function is apparently supposed to reflect the current state of thought and statistical theory in the field of generalized linear mixed effects models. To that end, this function represents the bleeding edge of GLMM. The developers do not support Wald tests (for the reasons discussed above), and alternative variance-covariance structures (including spatial/temporal correlation) are yet to be incorporated. Unfortunately, such is the promise of this approach that the development of many other routines (such as Wood's new generalized additive mixed effects modelling) is contingent on the stability, progress and appropriateness of the glmer build.

Recently, Ben Bolker performed some simulations that demonstrated serious errors with glmer (lme4) when modelling with quasi families. As a result, the lme4 dev team (Douglas Bates) have now (temporarily!) forbidden the use of "quasi" families in glmer. In the eyes of many, this drastically reduces the usefulness of this routine for GLMM. However, even more recently, the lme4 developers (and others) have advocated the use of observation-level random effects as a means of accounting for over-dispersion in glmer. This approach constrains the residuals of the model to have magnitudes that are consistent with a theoretical variance of 1 (and thus a dispersion parameter of 1), thereby restoring some of the merits of the glmer function.

Inference testing options vary depending on the estimation approximation method used (PQL, Laplace or GHQ) as well as the nature of the fitted model (whether or not it is overdispersed).

Wald $Z$ and $\chi^2$ tests

Wald $Z$ and $\chi^2$ tests compare parameter estimates (scaled by their standard errors) against a value of zero, yet are only suitable in the absence of overdispersion. As parameters associated with random effects typically pertain to standard deviations (estimating the amount of variability due to the random effect), and a standard deviation cannot be less than zero, hypothesis tests that compare against a value of zero are said to be testing at the boundary and tend to be overly conservative (inflated Type II error). Consequently, Wald tests are not appropriate for testing inferences about random effects.
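As a minimal sketch of what this involves in R (using the data.hp.glmerL Poisson GLMM fitted later in this tutorial), a Wald $z$ test simply scales each fixed effects estimate by its standard error and compares the result to a standard normal distribution:

cc <- coef(summary(data.hp.glmerL))       # fixed effects estimates, standard errors, z and p values
z <- cc[, "Estimate"]/cc[, "Std. Error"]  # Wald z statistics
p <- 2*pnorm(abs(z), lower.tail=FALSE)    # two-tailed p-values
cbind(z, p)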

Wald $F$ and $t$ tests

Similarly, Wald $F$ and $t$ tests compare parameter estimates (scaled by their standard errors) against a value of zero, however they are more suitable when the model is overdispersed. $F$ and $t$ tests require estimates of the residual degrees of freedom (which supposedly reflect the effective number of parameters in the random effect). For mixed effects models, the effective number of parameters in a random effect lies somewhere between $1$ and $N-1$ (where $N$ is the number of random-effect levels). As a result of the obscurity of the appropriate residual degrees of freedom, Wald $F$ and $t$ tests should be used with caution.

Likelihood ratio test (LRT)

Likelihood ratio tests (LRT) compare the fit (as measured via the deviance = $-2\times \text{log-likelihood}$) of a model with and without a factor (fixed or random) and are preferred over Wald tests for testing random effects. LRT are only considered reliable for fixed effects when the sample sizes are very large relative to the number of fixed and random factor levels. Whilst there is no firm threshold for what constitutes 'large', ecological models would rarely be considered large enough. Moreover, LRT are inappropriate for models fit via PQL.
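As a minimal sketch (again using the data.hp.glmerL model fitted later in this tutorial), a likelihood ratio test of the treat effect refits the model without that term and compares the change in deviance to a $\chi^2$ distribution:

data.hp.glmer0 <- update(data.hp.glmerL, .~.-treat)  # reduced model without the treat term
anova(data.hp.glmer0, data.hp.glmerL)                # likelihood ratio test
#or equivalently by hand
LR <- -2*(logLik(data.hp.glmer0) - logLik(data.hp.glmerL))
pchisq(as.numeric(LR), df=2, lower.tail=FALSE)       # treat contributes 2 parameters (df=2)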

Summary of analysis options

The number of GLMM estimation and inference routines and associated R packages is sufficiently large and confusing to warrant a tabular summary. I will start by contrasting the various approximation routines.

Approximation | Characteristics | Associated inference | R function
Penalized Quasi-likelihood (PQL) | Fast and simple, accommodates heterogeneity and dependency structures, biased for small samples | Wald tests only | glmmPQL (MASS)
Laplace | More accurate (less biased), slower, does not accommodate heterogeneity and dependency structures | LRT | glmer (lme4), glmmadmb (glmmADMB)
Gauss-Hermite quadrature | Even more accurate (less biased), even slower, does not accommodate heterogeneity and dependency structures | LRT | glmer (lme4)?? *Does not seem to work*
Markov Chain Monte Carlo (MCMC) | Bayesian, very flexible and accurate, yet very slow and complex | Bayesian credibility intervals, Bayesian P-values | Numerous (see this tutorial)

Let's now contrast the various R functions.

Feature | glmmPQL (MASS) | glmer (lme4) | glmmadmb (glmmADMB)
Variance and covariance structures | Yes | - | not yet
Overdispersed (Quasi) families | Yes | - | -
Complex nesting | Yes | Yes | Yes
Zero-inflation | - | - | Yes
Resid degrees of freedom | Between-Within | - | -
Parameter tests | Wald $t$ | Wald $z$ | Wald $z$
Marginal tests (fixed effects) | Wald $F$,$\chi^2$ | Wald $F$,$\chi^2$ | Wald $F$,$\chi^2$
Marginal tests (Random effects) | Wald $F$,$\chi^2$ | LRT | LRT
Information criterion | - | Yes | Yes

Finally, it might also be worth capturing the essence of some of this information in a combination of a decision tree and tables.

It should also go without saying that other linear modelling issues that were individually addressed in previous tutorials are also potentially important within GLMM's.

The current tutorial focuses on the issues and techniques that arise when these two broad concepts interact in models that combine hierarchical structures with a broader set of residual distributions.

Hierarchical Poisson regression

Scenario and Data

Consider again the experimental design in which we wished to measure the abundance of individuals of a species (or number of incidents etc.) in response to some categorical influence (such as level of exposure to some disturbance(s) - 'high', 'medium', 'low'). In an attempt to minimize the expected extent of unexplained variability (due to highly heterogeneous underlying conditions across the landscape), each of the exposure 'treatments' was applied within a relatively small spatial scale (individual sites) - that is, they were blocked. In all, we will employ 20 randomly (or haphazardly) selected sites and the treatments will be applied randomly (or haphazardly) within sites.

Random data incorporating the following trends (effect parameters)
  • the number of sites ($sites$) = 20
  • the categorical treatment $treat$ variable comprising of three levels ('High', 'Medium' and 'Low')
  • the mean value of $y$ of the 'low' treatment is 10
  • the standard deviation in 'low' treatment $y$ values across the 20 sites is 5
  • the mean effect of the 'Medium' treatment is -2
  • the mean effect of the 'High' treatment is -4
  • to generate the values of $y$ expected at each $treat$ level within each of the $sites$, we evaluate the linear predictor (created by calculating the outer product of the model matrix and the regression parameters). These expected values are then transformed onto a scale bounded by (0,$\infty$) by applying the exponential (inverse-log) function, $e^{linear~predictor}$
  • finally, we generate $y$ values by using the expected $y$ values ($\lambda$) as the means when drawing random numbers from a Poisson distribution. This step adds random noise to the expected $y$ values and returns only 0's and positive integers.
set.seed(5) #3 #7
#number of sites
n.sites <- 20
n.treat <- 3
n <- n.sites*n.treat
#factors identifying the random (sites) and fixed (treat) effects
sites <- gl(n=n.sites, k=n.treat, lab=paste("Site",1:n.sites,sep=''))
treat <- gl(n=n.treat,1,n, lab=c('Low','Medium','High'))
#model matrix with a separate intercept per site plus the treatment effects
Xmat <- model.matrix(~-1+sites+treat)

intercept.mean <- 10 # mu alpha
intercept.sd <- 5 # sigma alpha
treatM.effect <- -2
treatH.effect <- -4
#random site intercepts drawn from a normal distribution
intercept.effects <- rnorm(n=n.sites, mean=intercept.mean, sd=intercept.sd)
all.effects <- c(intercept.effects, treatM.effect, treatH.effect)  # Put them all together
#linear predictor (expected values)
lin.pred <- Xmat[,] %*% all.effects
#map any negative expected values onto the (0,Inf) scale
lin.pred[lin.pred<0] <- exp(lin.pred[lin.pred<0])
#add Poisson noise around the expected values
y <- rpois(n=n, lin.pred)
data <- data.frame(y,treat,sites)
head(data)
   y  treat sites
1  8    Low Site1
2  2 Medium Site1
3  3   High Site1
4 17    Low Site2
5 15 Medium Site2
6  7   High Site2
data.hp <- data
write.table(data.hp, file='../downloads/data/data.hp.csv', quote=F, row.names=F, sep=',')

To some extent this is a classic randomized complete block design. However, since the responses are effectively drawn from a Poisson distribution, the residuals may not follow a normal distribution and importantly the variances may not be independent of the means. With all count data, we should be prepared to fit models against distributions such as a Poisson distribution.

Exploratory data analysis and initial assumption checking

The assumptions are:
  1. All of the observations are independent - the blocking design will introduce a dependency structure (observations within each block are not strictly independent) that we need to address via mixed effects modelling.
  2. The response variable (and thus the residuals) should be matched by an appropriate distribution (in the case of positive integers response - a Poisson is appropriate).
  3. All observations are equally influential in determining the trends - or at least no observations are overly influential. This is most effectively diagnosed via residuals and other influence indices and is very difficult to diagnose prior to analysis
  4. the relationship between the linear predictor (right hand side of the regression formula) and the link function should be linear. A scatterplot with smoother can be useful for identifying possible non-linearity.
  5. If using a Poisson distribution, the dispersion factor is close to 1

There are (at least) six main potential models we could consider fitting to these data:

  1. Ordinary least squares ANOVA (generalized least squares model) - assumes normality and homogeneity of variance of residuals as well as an absence of a dependency structure
  2. Linear mixed effects model - assumes normality and homogeneity of variance of residuals
  3. Poisson mixed effects model - assumes mean=variance (dispersion=1)
  4. Quasi-poisson mixed effects model - a general solution to overdispersion. Assumes variance is a function of mean, dispersion estimated, however likelihood based statistics unavailable
  5. Negative binomial mixed effects model - a specific solution to overdispersion caused by clumping (due to an unmeasured latent variable). Scaling factor ($\omega$) is estimated along with the regression parameters.
  6. Zero-inflation mixed effects model - a specific solution to overdispersion caused by excessive zeros (due to an unmeasured latent variable). Mixture of binomial and Poisson mixed effects models.

Confirm non-normality and explore clumping

When counts are all very large (not close to 0) and their ranges do not span orders of magnitude, they take on very Gaussian-like properties (symmetrical distribution and variance independent of the mean). Given that models based on the Gaussian distribution are more optimized and recognized than Generalized Linear Mixed Effects Models (GLMM), it can be prudent to adopt Gaussian models for such data. Hence it is a good idea to first explore whether a Poisson model is likely to be more appropriate than a standard Gaussian model.

The potential for overdispersion can be explored by adding a rug to a boxplot. The rug is simply tick marks on the inside of an axis at the positions corresponding to observations. As multiple identical values result in tick marks drawn over one another, it is typically a good idea to apply a slight amount of jitter (random displacement) to the values used by the rug.

library(ggplot2)
ggplot(data.hp, aes(x=y))+geom_histogram()+facet_wrap(~treat)
ggplot(data.hp, aes(y=y, x=1))+geom_boxplot()+geom_rug(position='jitter', sides='l')+facet_grid(~treat)
ggplot(data.hp, aes(y=y, x=treat))+geom_point()+facet_wrap(~sites)

There are signs of non-normality that would warrant Poisson models. The rug applied to the boxplots does not indicate a serious degree of clumping and there appear to be few zeros. Thus overdispersion is unlikely to be an issue.

Explore zero inflation

Although we have already established that there are few zeros in the data (and thus overdispersion is unlikely to be an issue), we can also explore this by comparing the number of zeros in the data to the number of zeros that would be expected from a Poisson distribution with a mean equal to the mean count of the data.

#proportion of 0's in the data
data.hp.tab<-table(data.hp$y==0)
data.hp.tab/sum(data.hp.tab)
     FALSE       TRUE 
0.93333333 0.06666667 
#proportion of 0's expected from a Poisson distribution
mu <- mean(data.hp$y)
cnts <- rpois(1000, mu)
data.hp.tab <- table(cnts == 0)
data.hp.tab/sum(data.hp.tab)
FALSE  TRUE 
0.999 0.001 

In the above, the value under FALSE is the proportion of non-zero values in the data and the value under TRUE is the proportion of zeros in the data. In this example, the observed proportion of zeros (6.7%) is low and not of a magnitude that suggests zero inflation (a Poisson distribution with the same mean yields an expected proportion of roughly 0.1%).

Model fitting or statistical analysis

As introduced at the start of this tutorial, Generalized Linear Mixed effects Models (GLMM) can be fit via numerous alternative processes. In a frequentist paradigm, the three most popular estimation routines are Penalized Quasi-likelihood (PQL), Laplace approximation and Gauss-Hermite quadrature (GHQ).

In R, the first of these (PQL) is provided by the glmmPQL() function in the MASS package, whereas the latter two are provided by the glmer() function in the lme4 package. As these functions are quite different from one another, I will tackle each separately.

PQL - glmmPQL
The most important (=commonly used) parameters/arguments of glmmPQL() are:
  • fixed: the linear model relating the response variable to the fixed components of the linear predictor
  • random: the linear model relating the response variable to the random components of the linear predictor
  • family: specification of the error distribution (and link function). Can be specified as either a string or as one of the family functions (which allows for the explicit declaration of the link function). For examples:
    • family="poisson" or equivalently family=poisson(link="log")
  • data: the data frame containing the data

library(MASS)
data.hp.glmmPQL <- glmmPQL(y~treat, random=~1|sites, data=data.hp, family="poisson")
Laplace approximation - glmer
The most important (=commonly used) parameters/arguments of glmer() are:
  • formula: the linear model relating the response variable to the linear predictor (fixed and random factors)
  • family: specification of the error distribution (and link function). Can be specified as either a string or as one of the family functions (which allows for the explicit declaration of the link function). For examples:
    • family="poisson" or equivalently family=poisson(link="log")
  • data: the data frame containing the data
  • nAGQ=1 indicates only a single quadrature point and therefore Laplace approximation (note, this is also the default and thus does not actually need to be specified).
library(lme4)
data.hp.glmerL <- glmer(y~treat+(1|sites), data=data.hp, family='poisson') #Laplacian approximation
Gauss-Hermite quadrature - glmer
The most important (=commonly used) parameters/arguments of glmer() are:
  • formula: the linear model relating the response variable to the linear predictor (fixed and random factors)
  • family: specification of the error distribution (and link function). Can be specified as either a string or as one of the family functions (which allows for the explicit declaration of the link function). For examples:
    • family="poisson" or equivalently family=poisson(link="log")
  • data: the data frame containing the data
  • nAGQ>1 specifies the number of quadrature points to use - the higher the number the more accurate the approximations will be.

Note that for models with more than one random factor, glmer does not yet allow GHQ. This is an issue that is right at the frontier of statistical development. Ironically, when more than 1 random effect is incorporated, PQL might actually yield less biased variance approximations than Laplace approximation. Arguably, if you are fitting a model that potentially is suffering from the above issues, it might be worth switching over to Bayesian estimation.

library(lme4)
data.hp.glmerGHQ <- glmer(y~treat+(1|sites), data=data.hp, family='poisson', nAGQ=7) #Adaptive Gaussian Quadrature (L=7) gives same as Laplace here

Model evaluation

Expected values > 5
predict(data.hp.glmmPQL, newdata=data.frame(treat=gl(3,1,3,c('Low','Medium','High'))),type='response',level=0)
[1] 7.552102 6.050263 4.162238
attr(,"label")
[1] "Predicted values"
predict(data.hp.glmerL, newdata=data.frame(treat=gl(3,1,3,c('Low','Medium','High'))), REform=NA, type='response')
       1        2        3 
7.261257 5.817254 4.001939 
Conclusions: the expected values are all greater than 5 and thus PQL is likely to yield reasonably good approximations.
Residuals

Lets explore the diagnostics - particularly the residuals

#glmmPQL
plot(data.hp.glmmPQL)
##OR
plot(residuals(data.hp.glmmPQL, type='pearson') ~ predict(data.hp.glmmPQL, type='link'))
plot(residuals(data.hp.glmmPQL, type='pearson') ~ as.numeric(data.hp$treat))
#Laplace
#plot(data.hp.glmerL)
plot(residuals(data.hp.glmerL, type='pearson') ~ predict(data.hp.glmerL, type='link'))
plot(residuals(data.hp.glmerL, type='pearson') ~ as.numeric(data.hp$treat))
#GHQ
#plot(data.hp.glmerGHQ)
plot(residuals(data.hp.glmerGHQ, type='pearson') ~ predict(data.hp.glmerGHQ, type='link'))
plot(residuals(data.hp.glmerGHQ, type='pearson') ~ as.numeric(data.hp$treat))
Conclusions: there are no immediately obvious patterns in the residuals. All three approximation techniques yield very similar residuals.
Goodness of fit (and overdispersion) of the model

Estimating the dispersion parameter in GLMM is a reasonably tricky process. Essentially it involves calculating the ratio of the deviance (or the sum of squared residuals) to the residual degrees of freedom. However, glmmPQL() (being an implementation of quasi-likelihood) strictly does not approximate the likelihood (and thus deviance), and the developers have gone to lengths to make the deviance difficult to obtain. On the other hand, the glmer() developers do not provide estimates of residual degrees of freedom.

Similarly, in order to estimate the goodness of fit and the degree of overdispersion, we need to compare the sum of the squared residuals to a $\chi^2$ distribution. However, determining which $\chi^2$ distribution to use depends on having an estimate of the residual degrees of freedom. For PQL fits, the residual degrees of freedom are estimated via the 'Between-Within' method. For Laplace and GHQ fits (in R), no estimates of residual degrees of freedom are available.

Consequently, it is necessary to estimate this ourselves. A pragmatic approach is to calculate the degrees of freedom as the number of data rows minus the number of fixed effects minus one (for the residual term). This method considers the random effects as part of the residuals. Alternatively, the random effects could be considered as parameters in which case we could also subtract the number of levels of the random effects. We will adopt the former.

Ben Bolker has put together a FAQ [http://glmm.wikidot.com/faq] that provides numerous explanations and code snippets relating to issues of fitted GLMM's in R. On this page, he includes a small function that he has written that calculates the chi squared statistic, estimated residual degrees of freedom (assuming that the random effects are part of the residuals) and associated p-value for assessing overdispersion from models fit via glmer. This function also provides a measure of the dispersion statistic (ratio of sum of squared residuals to residual degrees of freedom). We can also explore the goodness of the fit of the model via:

  • Pearson's $\chi^2$ residuals - explores whether there are any significant patterns remaining in the residuals and in particular whether there is any evidence of overdispersion
    #PQL
    data.hp.resid <- sum(resid(data.hp.glmmPQL, type = "pearson")^2)
    1 - pchisq(data.hp.resid, data.hp.glmmPQL$fixDF$X[2])
    
    [1] 0.2620208
    
    #OR
    rdf <- length(data.hp.glmmPQL$data$y)-length(fixef(data.hp.glmmPQL))-1
    rdf
    
    [1] 56
    
    1 - pchisq(data.hp.resid, rdf)
    
    [1] 0.8966153
    
    data.hp.resid /rdf
    
    [1] 0.7697555
    
    #Laplace
    #scale parameter (That quantity is the square root of the penalized residual sum of
    #squares divided by n, the number of observations)
    sqrt( sum(c(residuals(data.hp.glmerL), data.hp.glmerL@u) ^2) / length(residuals(data.hp.glmerL)))
    
    [1] 1.090584
    
    data.hp.resid <- sum(resid(data.hp.glmerL, type = "pearson")^2)
    rdf <- nrow(model.frame(data.hp.glmerL))-length(fixef(data.hp.glmerL))-1
    rdf
    
    [1] 56
    
    1 - pchisq(data.hp.resid, rdf)
    
    [1] 0.7720175
    
    #OR using Ben Bolker's function
    overdisp_fun <- function(model) {
      ## number of variance parameters in 
      ##   an n-by-n variance-covariance matrix
      vpars <- function(m) {
        nrow(m)*(nrow(m)+1)/2
      }
      model.df <- sum(sapply(VarCorr(model),vpars))+length(fixef(model))
      rdf <- nrow(model.frame(model))-model.df
      rp <- residuals(model,type="pearson")
      Pearson.chisq <- sum(rp^2)
      prat <- Pearson.chisq/rdf
      pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
      c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
    }
    
    overdisp_fun(data.hp.glmerL)
    
         chisq      ratio        rdf          p 
    47.8656555  0.8547438 56.0000000  0.7720175 
    
Conclusions: the Pearson's residuals do not indicate a lack of fit or evidence of overdispersion for either model (p-values greater than 0.05).

Exploring the model parameters, test hypotheses

If there was any evidence that the assumptions had been violated or the model was not an appropriate fit, then we would need to reconsider the model and start the process again. In this case, there is no evidence that the test will be unreliable so we can proceed to explore the test statistics. The main statistic of interest is the Wald statistic ($t$ for PQL, $z$ for Laplace) for the fixed effects parameters.

Along with the parameter estimates (and their associated Wald tests), we can also investigate the influence of the factor as a whole. This is an approach analogous to ANOVA that explores the main effects. Since likelihood ratio tests (LRT) are inappropriate for PQL models, and LRT for fixed factors should only be based on models that employ REML (which is not currently available for GLMM in R), testing main effects is best done via Wald tests and the wald.test() function in the aod package.

The wald.test() function uses the fixed effects parameter estimates (b) along with the variance-covariance matrix (Sigma) and a specification of which fixed factor terms (Terms) to combine for the Wald statistic. In this case, to explore the overall effect of treatment, we need to combine factor effects 2 (the effect of the 'Medium' treatment) and 3 (the effect of the 'High' treatment).
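If the aod package is unavailable (as it is in the output below), the same multi-parameter Wald $\chi^2$ statistic can be computed by hand from the fixed effects estimates and their variance-covariance matrix - a minimal sketch:

b <- fixef(data.hp.glmmPQL)[2:3]            # the 'Medium' and 'High' treatment effects
V <- vcov(data.hp.glmmPQL)[2:3, 2:3]        # their variance-covariance block
W <- as.numeric(t(b) %*% solve(V) %*% b)    # Wald chi-square statistic
pchisq(W, df=length(b), lower.tail=FALSE)   # p-value on 2 degrees of freedom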

summary(data.hp.glmmPQL)
Linear mixed-effects model fit by maximum likelihood
 Data: data.hp 
  AIC BIC logLik
   NA  NA     NA

Random effects:
 Formula: ~1 | sites
        (Intercept) Residual
StdDev:   0.6235177 1.066625

Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects: y ~ treat 
                 Value Std.Error DF   t-value p-value
(Intercept)  2.0218259 0.1677375 38 12.053512  0.0000
treatMedium -0.2217241 0.1236839 38 -1.792668  0.0810
treatHigh   -0.5957730 0.1383848 38 -4.305190  0.0001
 Correlation: 
            (Intr) trtMdm
treatMedium -0.328       
treatHigh   -0.293  0.398

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.7731484 -0.5340131 -0.1107496  0.4204912  2.4987085 

Number of Observations: 60
Number of Groups: 20 
library(aod)
Error in library(aod): there is no package called 'aod'
wald.test(b=fixef(data.hp.glmmPQL),
 Sigma=vcov(data.hp.glmmPQL),Terms=2:3)
Error in eval(expr, envir, enclos): could not find function "wald.test"
intervals(data.hp.glmmPQL)
Approximate 95% confidence intervals

 Fixed effects:
                 lower       est.       upper
(Intercept)  1.6908571  2.0218259  2.35279471
treatMedium -0.4657691 -0.2217241  0.02232094
treatHigh   -0.8688250 -0.5957730 -0.32272101
attr(,"label")
[1] "Fixed effects:"

 Random Effects:
  Level: sites 
                    lower      est.     upper
sd((Intercept)) 0.4122913 0.6235177 0.9429602

 Within-group standard error:
    lower      est.     upper 
0.8468539 1.0666254 1.3434310 
library(gmodels)
ci(data.hp.glmmPQL)
              Estimate   CI lower    CI upper Std. Error DF      p-value
(Intercept)  2.0218259  1.6822591  2.36139273  0.1677375 38 1.490138e-14
treatMedium -0.2217241 -0.4721090  0.02866083  0.1236839 38 8.098973e-02
treatHigh   -0.5957730 -0.8759185 -0.31562757  0.1383848 38 1.130374e-04
VarCorr(data.hp.glmmPQL)
sites = pdLogChol(1) 
            Variance  StdDev   
(Intercept) 0.3887743 0.6235177
Residual    1.1376898 1.0666254
summary(data.hp.glmerL)
Generalized linear mixed model fit by maximum
  likelihood (Laplace Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: y ~ treat + (1 | sites)
   Data: data.hp

     AIC      BIC   logLik deviance df.resid 
   326.0    334.4   -159.0    318.0       56 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.8914 -0.5527 -0.1060  0.4859  2.6610 

Random effects:
 Groups Name        Variance Std.Dev.
 sites  (Intercept) 0.435    0.6596  
Number of obs: 60, groups:  sites, 20

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.9826     0.1695  11.699  < 2e-16 ***
treatMedium  -0.2217     0.1120  -1.980   0.0477 *  
treatHigh    -0.5958     0.1253  -4.754 1.99e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) trtMdm
treatMedium -0.294       
treatHigh   -0.263  0.398
library(aod)
Error in library(aod): there is no package called 'aod'
wald.test(b=fixef(data.hp.glmerL),
 Sigma=vcov(data.hp.glmerL),Terms=2:3)
Error in eval(expr, envir, enclos): could not find function "wald.test"
anova(data.hp.glmerL)
Analysis of Variance Table
      Df Sum Sq Mean Sq F value
treat  2 22.148  11.074  11.074
confint(data.hp.glmerL)
                 2.5 %        97.5 %
.sig01       0.4485283  1.0175130142
(Intercept)  1.6170509  2.3176813864
treatMedium -0.4443751 -0.0008815724
treatHigh   -0.8469135 -0.3505664810
VarCorr(data.hp.glmerL)
 Groups Name        Std.Dev.
 sites  (Intercept) 0.65956 

Conclusions: Via either method we would reject the null hypothesis (p<0.05 from the global Wald test). The individual effects parameters suggest that the response is significantly higher for the 'Low' treatment effect than the 'High' treatment effect, yet was not found to be different between the 'Low' and 'Medium' treatments.

Note, the effect sizes (provided in the Fixed effects coefficients tables) are on a log scale since by default the Poisson distribution uses a log-link function.
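If effects on the natural (count) scale are easier to interpret, the coefficients and their confidence intervals can be back-transformed by exponentiation. A minimal sketch using the glmer fit (exponentiated coefficients represent multiplicative changes in expected counts):

exp(fixef(data.hp.glmerL))                                 # multiplicative effect of each treatment level
exp(confint(data.hp.glmerL, parm="beta_", method="Wald"))  # back-transformed Wald confidence intervals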

Coefficient of determination ($R^2$) analogue

Unlike simple linear regression in which the $R^2$ value (ratio of explained variability to total variability) neatly encapsulates a property that we can interpret as the proportion of variability explained by a model, simple analogues for GLM and LMM (and thus GLMM) are not available. In part this is because it is difficult to know how to include all the sources of random variation (random effects and residuals).

Nevertheless, we can generate a quasi-$R^2$ value as one minus the ratio of the variance of the (Pearson) residuals to the total variance (the variance of the residuals plus the fitted values on the link scale):

#PQL
#totalss <- var(resid(data.hp.glmmPQL)+fitted(data.hp.glmmPQL))
totalss <- var(resid(data.hp.glmmPQL,type='pearson')+predict(data.hp.glmmPQL, type='link'))
1-var(residuals(data.hp.glmmPQL, type='pearson'))/(totalss)
[1] 0.5101296
#Laplace
#totalss <- var(resid(data.hp.glmerL)+fitted(data.hp.glmerL))
totalss <- var(resid(data.hp.glmerL, type='pearson')+predict(data.hp.glmerL, type='link'))
1-var(residuals(data.hp.glmerL, type='pearson'))/(totalss)
[1] 0.4974269
Also look at MuMIn::r.squaredGLMM() and Nakagawa and Schielzeth (2013).
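For example (a sketch only, assuming the MuMIn package is installed):

library(MuMIn)
r.squaredGLMM(data.hp.glmerL)  # marginal (fixed effects only) and conditional (fixed plus random) R2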

Summary plot

Note, plotting the raw data is of little value when the random effects (sites) have large effects, since the spread of points will say more about the variability between sites than the variability due to the treatments. If you wish to plot something in addition to the modelled summaries (means and confidence intervals), one sensible option would be to plot the result of adding the residuals to the fitted values.

par(mar = c(4, 5, 0, 1))
res <- exp(predict(data.hp.glmmPQL, level=0)+residuals(data.hp.glmmPQL))
plot.default(res ~ treat, data=data,xlim=c(0.5,3.5),type = "n", ann = F, axes = F)
points(res ~ treat, data = data.hp, pch = 16, col='grey')
coefs <- fixef(data.hp.glmmPQL)
pred <- data.frame(treat=gl(3,1,3,c('Low','Medium','High')))
mm <- model.matrix(~treat, data=pred)
fit <- as.vector(coefs %*% t(mm))
pred$fit <- exp(fit)
se <- sqrt(diag(mm %*% vcov(data.hp.glmmPQL) %*% t(mm)))
pred$lwr <- exp(fit-qt(0.975,data.hp.glmmPQL$fixDF$X[2])*se)
pred$upr <- exp(fit+qt(0.975,data.hp.glmmPQL$fixDF$X[2])*se)
points(fit~treat, data=pred, pch=16)
with(pred, arrows(as.numeric(treat),lwr,as.numeric(treat),upr, code=3, angle=90,length=0.1))
axis(1, at=1:3, lab=c('Low','Medium','High'))
mtext("Treatment", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")

Hierarchical Quasi-Poisson regression

Scenario and Data

Consider an experimental design in which we wished to measure the abundance of individuals of a species (or number of incidents etc.) in response to some categorical influence (such as level of exposure to some disturbance(s) - 'high', 'medium', 'low'). In an attempt to minimize the expected extent of unexplained variability (due to highly heterogeneous underlying conditions across the landscape), each of the exposure 'treatments' was applied within a relatively small spatial scale (individual sites) - that is, they were blocked. In all, we will employ 20 randomly (or haphazardly) selected sites and the treatments will be applied randomly (or haphazardly) within sites.

Random data incorporating the following trends (effect parameters)
  • the number of sites ($sites$) = 20
  • the categorical treatment $treat$ variable comprising of three levels ('High', 'Medium' and 'Low')
  • the mean value of $y$ of the 'low' treatment is 12
  • the standard deviation in 'low' treatment $y$ values across the 20 sites is 5
  • the mean effect of the 'Medium' treatment is -3
  • the mean effect of the 'High' treatment is -5
  • to generate the values of $y$ expected at each $treat$ level within each of the $sites$, we evaluate the linear predictor (created by calculating the outer product of the model matrix and the regression parameters). These expected values are then transformed onto a scale bounded by (0,$\infty$) by applying the exponential (inverse-log) function, $e^{linear~predictor}$
  • finally, we generate $y$ values by using the expected $y$ values ($\lambda$) as the means when drawing random numbers from a quasi-Poisson distribution with a dispersion parameter ($\phi$) of 6. This step adds random noise to the expected $y$ values and returns only 0's and positive integers.
#Quasi-Poisson distribution (simulated by drawing from a negative binomial)
rqpois <- function(n, lambda, phi) {
  mu = lambda
  r = rnbinom(n, size=mu/phi/(1-1/phi),prob=1/phi)
  return(r)
}
set.seed(321) #4
#number of sites
n.sites <- 20
n.treat <- 3
n<-n.sites*n.treat
sites <- gl(n=n.sites, k=n.treat, lab=paste("Site",1:n.sites,sep=''))
treat <- gl(n=n.treat,1,n, lab=c('Low','Medium','High'))
#Xmat <- model.matrix(~sites*treat-1-treat)
Xmat <- model.matrix(~-1+sites+treat)

intercept.mean <- 12 # mu alpha
intercept.sd <- 5 # sigma alpha
#effect.sd <- 1.1
treatM.effect <- -3
treatH.effect <- -5
intercept.effects<-rnorm(n=n.sites, mean=intercept.mean, sd=intercept.sd)
#treatM.effects <- rnorm(n=n.sites, mean=treatM.effect, sd=effect.sd)
#treatH.effects <- rnorm(n=n.sites, mean=treatH.effect, sd=effect.sd)
all.effects <- c(intercept.effects, treatM.effect, treatH.effect)  # Put them all together
lin.pred <- Xmat[,] %*% all.effects
lin.pred[lin.pred<0] <- exp(lin.pred[lin.pred<0])
y <- rqpois(n=n, lambda=lin.pred, phi=6) #as.integer(lin.pred+eps)
data.hqp<-data.frame(y,treat,sites)
head(data.hqp)
   y  treat sites
1 27    Low Site1
2 19 Medium Site1
3  1   High Site1
4  0    Low Site2
5  0 Medium Site2
6  0   High Site2
write.table(data.hqp, file='../downloads/data/data.hqp.csv', quote=F, row.names=F, sep=',')

Exploratory data analysis and initial assumption checking

The assumptions are:
  1. All of the observations are independent - the blocking design will introduce a dependency structure (observations within each block are not strictly independent) that we need to address via mixed effects modelling.
  2. The response variable (and thus the residuals) should be matched by an appropriate distribution (in the case of positive integers response - a Poisson is appropriate).
  3. All observations are equally influential in determining the trends - or at least no observations are overly influential. This is most effectively diagnosed via residuals and other influence indices and is very difficult to diagnose prior to analysis
  4. the relationship between the linear predictor (right hand side of the regression formula) and the link function should be linear. A scatterplot with smoother can be useful for identifying possible non-linearity.
  5. If using a Poisson distribution, the dispersion factor is close to 1

There are (at least) six main potential models we could consider fitting to these data:

  1. Ordinary least squares ANOVA (generalized least squares model) - assumes normality and homogeneity of variance of residuals as well as an absence of a dependency structure
  2. Linear mixed effects model - assumes normality and homogeneity of variance of residuals
  3. Poisson mixed effects model - assumes mean=variance (dispersion=1)
  4. Quasi-poisson mixed effects model - a general solution to overdispersion. Assumes variance is a function of mean, dispersion estimated, however likelihood based statistics unavailable
  5. Negative binomial mixed effects model - a specific solution to overdispersion caused by clumping (due to an unmeasured latent variable). Scaling factor ($\omega$) is estimated along with the regression parameters.
  6. Zero-inflation mixed effects model - a specific solution to overdispersion caused by excessive zeros (due to an unmeasured latent variable). Mixture of binomial and Poisson mixed effects models.

Confirm non-normality and explore clumping

When counts are all very large (not close to 0) and their ranges do not span orders of magnitude, they take on very Gaussian-like properties (symmetrical distribution and variance independent of the mean). Given that models based on the Gaussian distribution are more optimized and recognized than Generalized Linear Mixed Effects Models (GLMM), it can be prudent to adopt Gaussian models for such data. Hence it is a good idea to first explore whether a Poisson model is likely to be more appropriate than a standard Gaussian model.

The potential for overdispersion can be explored by adding a rug to a boxplot. The rug is simply tick marks on the inside of an axis at the positions corresponding to observations. As multiple identical values result in tick marks drawn over one another, it is typically a good idea to apply a slight amount of jitter (random displacement) to the values used by the rug.

library(ggplot2)
ggplot(data.hqp, aes(x=y))+geom_histogram()+facet_wrap(~treat)
ggplot(data.hqp, aes(y=y, x=1))+geom_boxplot()+geom_rug(position='jitter', sides='l')+facet_grid(~treat)
ggplot(data.hqp, aes(y=y, x=treat))+geom_point()+facet_wrap(~sites)

There are signs of non-normality that would warrant Poisson models. The rug applied to the boxplots does not indicate a serious degree of clumping and there appear to be few zeros. So if there is overdispersion, it is unlikely that it is due to either clumping or excessive zeros.

Explore zero inflation

Although we have already established that there are few zeros in the data (and thus overdispersion is unlikely to be an issue), we can also explore this by comparing the number of zeros in the data to the number of zeros that would be expected from a Poisson distribution with a mean equal to the mean count of the data.

#proportion of 0's in the data1
data.hqp.tab<-table(data.hqp$y==0)
data.hqp.tab/sum(data.hqp.tab)
    FALSE      TRUE 
0.8833333 0.1166667 
#proportion of 0's expected from a Poisson distribution
mu <- mean(data.hqp$y)
cnts <- rpois(1000, mu)
data.hqp.tab <- table(cnts == 0)
data.hqp.tab/sum(data.hqp.tab)
FALSE 
    1 
In the above, the value under FALSE is the proportion of non-zero values in the data and the value under TRUE is the proportion of zeros in the data. In this example, the observed proportion (11.7%) is not substantially higher than the expected number of zeros (0.0%).

Model fitting or statistical analysis

As introduced at the start of this tutorial, Generalized Linear Mixed effects Models (GLMM) can be fit via numerous alternative processes. In a frequentist paradigm, the three most popular estimation routines are Penalized Quasi-likelihood (PQL), Laplace approximation and Gauss-Hermite quadrature (GHQ).

At this point, we could fit a Poisson mixed effects model and explore the fit of the model (particularly with respect to overdispersion and residuals). For the purpose of this tutorial, we will assume that the data are overdispersed. However, if you expand the following section, you will be able to see the data fit with a Poisson model and the diagnostics.

#PQL
library(MASS)
data.hqp.glmmPQL <- glmmPQL(y~treat, random=list(~1|sites), data=data.hqp, family="poisson")
#Laplace
library(lme4)
data.hqp.glmerL <- glmer(y~treat+(1|sites), data=data.hqp, family='poisson') #Laplacian approximation
#Laplace (via glmmADMB)
library(glmmADMB)
data.hqp.admb <- glmmadmb(y ~ treat + (1|sites), family = "poisson", data = data.hqp)
#Note that glmer and glmmadmb for simple Laplace balanced GLMM are identical, therefore I will cease to present the glmmADMB model

Model evaluation

Expected values > 5
#from PQL
predict(data.hqp.glmmPQL, newdata=data.frame(treat=gl(3,1,3,c('Low','Medium','High'))),type='response',level=0)
[1] 14.041862 10.722061  5.742359
attr(,"label")
[1] "Predicted values"
#from Laplace
predict(data.hqp.glmerL, newdata=data.frame(treat=gl(3,1,3,c('Low','Medium','High'))),REform=NA,type='response')
        1         2         3 
12.195000  9.311861  4.987078 
Conclusions: the expected values are all greater than 5 and thus PQL is likely to yield reasonably good approximations.

Lets explore the diagnostics - particularly the residuals

plot(data.hqp.glmmPQL)
#OR
plot(residuals(data.hqp.glmmPQL, type='pearson') ~ predict(data.hqp.glmmPQL, type='link'))
plot(residuals(data.hqp.glmmPQL, type='pearson') ~ as.numeric(data.hqp$treat))
#Laplace
#plot(data.hqp.glmerL)
plot(residuals(data.hqp.glmerL, type='pearson') ~ predict(data.hqp.glmerL, type='link'))
plot(residuals(data.hqp.glmerL, type='pearson') ~ as.numeric(data.hqp$treat))
Conclusions: there are no immediately obvious patterns in the residuals. Both approximation techniques yield very similar residuals.
Goodness of fit (and overdispersion) of the model
#PQL
data.hqp.resid <- sum(resid(data.hqp.glmmPQL, type = "pearson")^2)
rdf <- length(data.hqp.glmmPQL$data$y)-length(fixef(data.hqp.glmmPQL))-1
1 - pchisq(data.hqp.resid, rdf)
[1] 0.8418667
data.hqp.resid /rdf
[1] 0.8117981
#Laplace
data.hqp.resid <- sum(resid(data.hqp.glmerL, type = "pearson")^2)
rdf <- nrow(model.frame(data.hqp.glmerL))-length(fixef(data.hqp.glmerL))-1
1 - pchisq(data.hqp.resid, rdf)
[1] 4.42726e-08
#OR using Ben Bolker's function
overdisp_fun <- function(model) {
  ## number of variance parameters in 
  ##   an n-by-n variance-covariance matrix
  vpars <- function(m) {
    nrow(m)*(nrow(m)+1)/2
  }
  model.df <- sum(sapply(VarCorr(model),vpars))+length(fixef(model))
  rdf <- nrow(model.frame(model))-model.df
  rp <- residuals(model,type="pearson")
  Pearson.chisq <- sum(rp^2)
  prat <- Pearson.chisq/rdf
  pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
  c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
}

overdisp_fun(data.hqp.glmerL)
Error in VarCorr(model): could not find symbol "rdig" in environment of the generic function
Conclusions: there is some evidence that the data are overdispersed and that a regular Poisson distribution does not adequately capture all of the variance in the data.

The focus of this demonstration will be on fitting a Quasi-Poisson mixed effects model. Simulations by Ben Bolker have indicated that glmer does not consistently perform correctly with Quasi family distributions. Instead, lme4 developers are now advocating the use of observation-level random effects (the focus of the next major section in the tutorial).

In R, quasi-Poisson GLMM's are fit via the PQL approach using the glmmPQL() function in the MASS package (with family=quasipoisson).

library(MASS)
data.hqp.glmmPQL <- glmmPQL(y~treat, random=list(~1|sites), data=data.hqp, family=quasipoisson(link=log))

Model evaluation

Expected values > 5
predict(data.hqp.glmmPQL, newdata=data.frame(treat=gl(3,1,3,c('Low','Medium','High'))),type='response',level=0)
[1] 14.041862 10.722061  5.742359
attr(,"label")
[1] "Predicted values"
Conclusions: the expected values are all greater than 5 and thus PQL is likely to yield reasonably good approximations.

Lets explore the diagnostics - particularly the residuals

plot(data.hqp.glmmPQL)
plot(residuals(data.hqp.glmmPQL, type='pearson') ~ as.numeric(data.hqp$treat))
Conclusions: there are no immediately obvious patterns in the residuals.
Goodness of fit (and overdispersion) of the model
  • Pearson's $\chi^2$ residuals - explores whether there are any significant patterns remaining in the residuals and in particular whether there is any evidence of overdispersion
    #PQL
    data.hqp.resid <- sum(resid(data.hqp.glmmPQL, type = "pearson")^2)
    #str(data.glmmPQL)
    rdf <- length(data.hqp.glmmPQL$data$y)-length(fixef(data.hqp.glmmPQL))-1
    rdf
    
    [1] 56
    
    1 - pchisq(data.hqp.resid, rdf)
    
    [1] 0.8418667
    
    data.hqp.resid/(nrow(data.hqp)-length(fixef(data.hqp.glmmPQL))-1)
    
    [1] 0.8117981
    
Conclusions: the Pearson's residuals do not indicate a lack of fit or evidence of overdispersion (p-value greater than 0.05).

Exploring the model parameters, test hypotheses

If there was any evidence that the assumptions had been violated or the model was not an appropriate fit, then we would need to reconsider the model and start the process again. In this case, there is no evidence that the test will be unreliable so we can proceed to explore the test statistics. The main statistic of interest is the Wald statistic ($t$ for PQL) for the fixed effects parameters.

Along with the parameter estimates (and their associated Wald tests), we can also investigate the influence of the factor as a whole. This is an approach analogous to ANOVA that explores the main effects. Since likelihood ratio tests (LRT) are inappropriate for PQL models, and LRT for fixed factors should only be based on models that employ REML (which is not currently available for GLMM in R), testing main effects is best done via Wald tests and the wald.test() function in the aod package.

The wald.test() function uses the fixed effects parameter estimates (b) along with the variance-covariance matrix (Sigma) and a specification of which fixed factor terms (Terms) to combine for the Wald statistic. In this case, to explore the overall effect of treatment, we need to combine factor effects 2 (the effect of the 'Medium' treatment) and 3 (the effect of the 'High' treatment).

summary(data.hqp.glmmPQL)
Linear mixed-effects model fit by maximum likelihood
 Data: data.hqp 
  AIC BIC logLik
   NA  NA     NA

Random effects:
 Formula: ~1 | sites
        (Intercept) Residual
StdDev:   0.5574376 1.840991

Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects: y ~ treat 
                 Value Std.Error DF   t-value p-value
(Intercept)  2.6420430 0.1692550 38 15.609837  0.0000
treatMedium -0.2697396 0.1622323 38 -1.662675  0.1046
treatHigh   -0.8941729 0.1981441 38 -4.512741  0.0001
 Correlation: 
            (Intr) trtMdm
treatMedium -0.415       
treatHigh   -0.340  0.354

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.66945893 -0.79020482 -0.09365499  0.49892375  2.36923548 

Number of Observations: 60
Number of Groups: 20 
library(aod)
Error in library(aod): there is no package called 'aod'
wald.test(b = fixef(data.hqp.glmmPQL), Sigma = vcov(data.hqp.glmmPQL), Terms = 2:3)
Error in eval(expr, envir, enclos): could not find function "wald.test"
intervals(data.hqp.glmmPQL)
Approximate 95% confidence intervals

 Fixed effects:
                lower       est.       upper
(Intercept)  2.308080  2.6420430  2.97600603
treatMedium -0.589846 -0.2697396  0.05036669
treatHigh   -1.285138 -0.8941729 -0.50320783
attr(,"label")
[1] "Fixed effects:"

 Random Effects:
  Level: sites 
                    lower      est.     upper
sd((Intercept)) 0.3419561 0.5574376 0.9087035

 Within-group standard error:
   lower     est.    upper 
1.461895 1.840991 2.318393 
library(gmodels)
ci(data.hqp.glmmPQL)
              Estimate   CI lower    CI upper Std. Error DF      p-value
(Intercept)  2.6420430  2.2994042  2.98468184  0.1692550 38 4.072748e-18
treatMedium -0.2697396 -0.5981618  0.05868253  0.1622323 38 1.046056e-01
treatHigh   -0.8941729 -1.2952946 -0.49305121  0.1981441 38 6.004093e-05
VarCorr(data.hqp.glmmPQL)
Error in VarCorr(data.hqp.glmmPQL): could not find symbol "rdig" in environment of the generic function

Conclusions: Via either method we would reject the null hypothesis (p<0.05 from the global Wald test). The individual effects parameters suggest that the response is significantly higher for the 'Low' treatment effect than the 'High' treatment effect, yet was not found to be different between the 'Low' and 'Medium' treatments.

Note, the effect sizes (provided in the Fixed effects coefficients tables) are on a log scale since by default the Poisson distribution uses a log-link function.

Coefficient of determination ($R^2$) analogue

Unlike simple linear regression in which the $R^2$ value (ratio of explained variability to total variability) neatly encapsulates a property that we can interpret as the proportion of variability explained by a model, simple analogues for GLM and LMM (and thus GLMM) are not available. In part this is because it is difficult to know how to include all the sources of random variation (random effects and residuals).

Nevertheless, we can generate a quasi-$R^2$ value as one minus the ratio of the variance of the (Pearson) residuals to the total variance (the variance of the residuals plus the fitted values on the link scale):

totalss <- var(resid(data.hqp.glmmPQL, type='pearson')+predict(data.hqp.glmmPQL, type='link'))
1-var(residuals(data.hqp.glmmPQL, type='pearson'))/(totalss)
[1] 0.4697712

Summary plot

par(mar = c(4, 5, 0, 1))
res <- exp(predict(data.hqp.glmmPQL, level=0)+residuals(data.hqp.glmmPQL))
plot.default(res ~ treat, data=data.hqp,xlim=c(0.5,3.5),type = "n", ann = F, axes = F)
points(res ~ treat, data = data.hqp, pch = 16, col='grey')
coefs <- fixef(data.hqp.glmmPQL)
pred <- data.frame(treat=gl(3,1,3,c('Low','Medium','High')))
mm <- model.matrix(~treat, data=pred)
fit <- as.vector(coefs %*% t(mm))
pred$fit <- exp(fit)
se <- sqrt(diag(mm %*% vcov(data.hqp.glmmPQL) %*% t(mm)))
pred$lwr <- exp(fit-qt(0.975,data.hqp.glmmPQL$fixDF$X[2])*se)
pred$upr <- exp(fit+qt(0.975,data.hqp.glmmPQL$fixDF$X[2])*se)
points(fit~treat, data=pred, pch=16)
with(pred, arrows(as.numeric(treat),lwr,as.numeric(treat),upr, code=3, angle=90,length=0.1))
axis(1, at=1:3, lab=c('Low','Medium','High'))
mtext("Treatment", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")

Hierarchical Poisson regression with observation-level random effects

We will use the same data from above and thus the data exploration is identical to that illustrated above.

Model fitting or statistical analysis

data.hob <- data.hqp
data.hob$Obs <- factor(1:nrow(data.hob))
#PQL
library(MASS)
data.hob.glmmPQL <- glmmPQL(y~treat, random=list(~1|sites, ~1|Obs), data=data.hob, family=quasipoisson(link=log))
#Laplace
library(lme4)
data.hob.glmerL <- glmer(y~treat+(1|sites)+(1|Obs), data=data.hob, family='poisson')
#ADMB
library(glmmADMB)
data.hob.admb <- glmmadmb(y~treat+(1|sites)+(1|Obs), data=data.hob, family='poisson')
Expected values > 5
#from PQL
predict(data.hob.glmmPQL, newdata=data.frame(treat=gl(3,1,3,c('Low','Medium','High'))),type='response',level=0)
[1] 13.589916 10.366879  5.685048
attr(,"label")
[1] "Predicted values"
#from Laplace
predict(data.hob.glmerL, newdata=data.frame(treat=gl(3,1,3,c('Low','Medium','High'))),REform=NA,type='response')
        1         2         3 
11.570854  8.283443  4.494939 
#from glmmADMB
predict(data.hob.admb, newdata=data.frame(treat=gl(3,1,3,c('Low','Medium','High'))),REform=NA,type='response')
[1] 11.569336  8.280933  4.491226
Conclusions: the expected values are all greater than 5 and thus PQL is likely to yield reasonably good approximations.

Let's explore the diagnostics - particularly the residuals

plot(data.hob.glmmPQL)
plot of chunk tut11.2aS3.5a
plot(residuals(data.hob.glmmPQL, type='pearson') ~ predict(data.hob.glmmPQL, type='link'))
plot of chunk tut11.2aS3.5a
plot(residuals(data.hob.glmmPQL, type='pearson') ~ as.numeric(data.hob$treat))
plot of chunk tut11.2aS3.5a
#Laplace
plot(data.hob.glmerL)
plot of chunk tut11.2aS3.5a
plot(residuals(data.hob.glmerL, type='pearson') ~ predict(data.hob.glmerL, type='link'))
plot of chunk tut11.2aS3.5a
plot(residuals(data.hob.glmerL, type='pearson') ~ as.numeric(data.hob$treat))
plot of chunk tut11.2aS3.5a
#ADMB
plot(resid(data.hob.admb, type='pearson')~predict(data.hob.admb, type='link'))
plot of chunk tut11.2aS3.5a
plot(residuals(data.hob.admb, type='pearson') ~ as.numeric(data.hob$treat))
plot of chunk tut11.2aS3.5a
Conclusions: there are no immediately obvious patterns in the residuals. All three approximation techniques yield very similar residuals.
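As a further check that the observation-level random effect has soaked up the overdispersion, we can calculate a simple dispersion statistic - the sum of the squared Pearson residuals divided by the residual degrees of freedom - for which values substantially greater than 1 indicate remaining overdispersion. A minimal sketch for the Laplace (glmer) fit:

#dispersion statistic: sum of squared Pearson residuals / residual degrees of freedom
rp <- residuals(data.hob.glmerL, type = "pearson")
sum(rp^2)/df.residual(data.hob.glmerL)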

Exploring the model parameters, test hypotheses

summary(data.hob.glmmPQL)
Linear mixed-effects model fit by maximum likelihood
 Data: data.hob 
  AIC BIC logLik
   NA  NA     NA

Random effects:
 Formula: ~1 | sites
        (Intercept)
StdDev:   0.5526872

 Formula: ~1 | Obs %in% sites
        (Intercept)     Residual
StdDev:   0.4772213 0.0002005456

Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects: y ~ treat 
                 Value Std.Error DF   t-value
(Intercept)  2.6093280 0.1675212 38 15.576111
treatMedium -0.2707120 0.1548310 38 -1.748435
treatHigh   -0.8714885 0.1548310 38 -5.628642
            p-value
(Intercept)  0.0000
treatMedium  0.0885
treatHigh    0.0000
 Correlation: 
            (Intr) trtMdm
treatMedium -0.462       
treatHigh   -0.462  0.500

Standardized Within-Group Residuals:
          Min            Q1           Med 
-2.455701e-04 -8.954641e-05 -8.499374e-06 
           Q3           Max 
 7.130705e-05  2.604165e-04 

Number of Observations: 60
Number of Groups: 
         sites Obs %in% sites 
            20             60 
library(aod)
Error in library(aod): there is no package called 'aod'
wald.test(b=fixef(data.hob.glmmPQL),
 Sigma=vcov(data.hob.glmmPQL),Terms=2:3)
Error in eval(expr, envir, enclos): could not find function "wald.test"
intervals(data.hob.glmmPQL)
Error in intervals.lme(data.hob.glmmPQL): cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance
 Consider 'which = "fixed"'
library(gmodels)
ci(data.hob.glmmPQL)
              Estimate  CI lower    CI upper
(Intercept)  2.6093280  2.270199  2.94845685
treatMedium -0.2707120 -0.584151  0.04272708
treatHigh   -0.8714885 -1.184928 -0.55804941
            Std. Error DF      p-value
(Intercept)  0.1675211 38 4.373987e-18
treatMedium  0.1548310 38 8.846431e-02
treatHigh    0.1548310 38 1.839856e-06
VarCorr(data.hob.glmmPQL)
Error in VarCorr(data.hob.glmmPQL): could not find symbol "rdig" in environment of the generic function
summary(data.hob.glmerL)
Generalized linear mixed model fit by maximum
  likelihood (Laplace Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: y ~ treat + (1 | sites) + (1 | Obs)
   Data: data.hob

     AIC      BIC   logLik deviance df.resid 
   410.9    421.4   -200.5    400.9       55 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.65726 -0.46666  0.01637  0.23927  0.63063 

Random effects:
 Groups Name        Variance Std.Dev.
 Obs    (Intercept) 0.2850   0.5338  
 sites  (Intercept) 0.5992   0.7741  
Number of obs: 60, groups:  Obs, 60; sites, 20

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   2.4485     0.2230  10.982  < 2e-16
treatMedium  -0.3342     0.1998  -1.673   0.0944
treatHigh    -0.9455     0.2099  -4.505 6.65e-06
               
(Intercept) ***
treatMedium .  
treatHigh   ***
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) trtMdm
treatMedium -0.419       
treatHigh   -0.396  0.459
library(aod)
Error in library(aod): there is no package called 'aod'
wald.test(b=fixef(data.hob.glmerL),
 Sigma=vcov(data.hob.glmerL),Terms=2:3)
Error in eval(expr, envir, enclos): could not find function "wald.test"
anova(data.hob.glmerL)
Analysis of Variance Table
      Df Sum Sq Mean Sq F value
treat  2 20.436  10.218  10.218
confint(data.hob.glmerL)
                 2.5 %      97.5 %
.sig01       0.3694369  0.76088114
.sig02       0.4720298  1.24556719
(Intercept)  1.9749694  2.89330161
treatMedium -0.7533367  0.06494574
treatHigh   -1.3846038 -0.52753781
VarCorr(data.hob.glmerL)
Error in VarCorr(data.hob.glmerL): could not find symbol "rdig" in environment of the generic function

Coefficient of determination ($R^2$) analogue

#PQL
totalss <- var(resid(data.hob.glmmPQL, type='pearson')+predict(data.hob.glmmPQL, type='link'))
1-var(residuals(data.hob.glmmPQL, type='pearson'))/(totalss)
[1] 1
#Laplace
totalss <- var(resid(data.hob.glmerL)+fitted(data.hob.glmerL))
1-var(residuals(data.hob.glmerL))/(totalss)
[1] 0.9947502
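Note that the PQL value of 1 is largely an artefact of the observation-level random effect: the Obs term has absorbed virtually all of the residual variation (the within-group residual standard deviation above is around 0.0002), so the Pearson residuals are essentially zero. A more interpretable alternative is the marginal and conditional $R^2$ of Nakagawa and Schielzeth, which separate the variance explained by the fixed effects alone from that explained by the fixed plus random effects. A minimal sketch for the glmer fit, assuming the MuMIn package is installed:

#marginal R2 (fixed effects only) and conditional R2 (fixed plus random effects)
library(MuMIn)
r.squaredGLMM(data.hob.glmerL)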

Summary plot

Note that plotting the raw data is of little value when the random effects (sites) are large, since the spread of points will say more about the variability between sites than about the variability due to the treatments. If you wish to plot something in addition to the modelled summaries (means and confidence intervals), one sensible option is to plot the result of adding the residuals to the fitted values.

par(mar = c(4, 5, 0, 1))
res <- exp(predict(data.hob.glmmPQL, level=0)+residuals(data.hob.glmmPQL))
plot.default(res ~ treat, data=data.hob,xlim=c(0.5,3.5),type = "n", ann = F, axes = F)
points(res ~ treat, data = data.hob, pch = 16, col='grey')
coefs <- fixef(data.hob.glmmPQL)
pred <- data.frame(treat=gl(3,1,3,c('Low','Medium','High')))
mm <- model.matrix(~treat, data=pred)
fit <- as.vector(coefs %*% t(mm))
pred$fit <- exp(fit)
se <- sqrt(diag(mm %*% vcov(data.hob.glmmPQL) %*% t(mm)))
pred$lwr <- exp(fit-qt(0.975,data.hob.glmmPQL$fixDF$X[2])*se)
pred$upr <- exp(fit+qt(0.975,data.hob.glmmPQL$fixDF$X[2])*se)
points(fit~treat, data=pred, pch=16)
with(pred, arrows(as.numeric(treat),lwr,as.numeric(treat),upr, code=3, angle=90,length=0.1))
axis(1, at=1:3, lab=c('Low','Medium','High'))
mtext("Treatment", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")
plot of chunk tut11.2aS3.9

Hierarchical negative binomial

Consider again the experimental design in which we wished to measure the abundance of individuals of a species (or the number of incidents, etc.) in response to some categorical influence (such as the level of exposure to some disturbance(s) - 'high', 'medium', 'low'). In an attempt to minimize the expected extent of unexplained variability (due to highly heterogeneous underlying conditions across the landscape), each of the exposure 'treatments' was applied within a relatively small spatial scale (individual sites) - that is, they were blocked. In all, we will employ 20 randomly (or haphazardly) selected sites and the treatments will be applied randomly (or haphazardly) within sites.

Random data incorporating the following trends (effect parameters)
  • the number of sites ($sites$) = 20
  • the categorical treatment variable ($treat$) comprising three levels ('Low', 'Medium' and 'High')
  • the mean value of $y$ for the 'Low' treatment is 15
  • the standard deviation in 'Low' treatment $y$ values across the 20 sites is 5
  • the mean effect of the 'Medium' treatment is -4
  • the mean effect of the 'High' treatment is -8
  • to generate the values of $y$ expected at each $treat$ level within each of the $sites$, we evaluate the linear predictor (created by calculating the outer product of the model matrix and the effects parameters); any negative expected values are replaced by their exponent ($e^{value}$) so that all expected values lie in (0,$\infty$)
  • finally, we generate the observed $y$ values by drawing random numbers from a negative binomial distribution with means ($\mu$) equal to the expected values and a dispersion (size) parameter of 1. This step adds random (overdispersed) noise to the expected $y$ values and returns only zeros and positive integers.
set.seed(1) #3 #7
#number of sites
n.sites <- 20
n.treat <- 3
n<-n.sites*n.treat
sites <- gl(n=n.sites, k=n.treat, lab=paste("Site",1:n.sites,sep=''))
treat <- gl(n=n.treat,1,n, lab=c('Low','Medium','High'))
#Xmat <- model.matrix(~sites*treat-1-treat)
Xmat <- model.matrix(~-1+sites+treat)

intercept.mean <- 15 # mu alpha
intercept.sd <- 5 # sigma alpha
#effect.sd <- 1
treatM.effect <- -4
treatH.effect <- -8
intercept.effects<-rnorm(n=n.sites, mean=intercept.mean, sd=intercept.sd)
#treatM.effects <- rnorm(n=n.sites, mean=treatM.effect, sd=effect.sd)
#treatH.effects <- rnorm(n=n.sites, mean=treatH.effect, sd=effect.sd)
all.effects <- c(intercept.effects, treatM.effect, treatH.effect)  # Put them all together
lin.pred <- Xmat[,] %*% all.effects
lin.pred[lin.pred<0] <- exp(lin.pred[lin.pred<0])
eps <- rpois(n=n, n.treat)
#y <- rpois(n=n, lin.pred) #as.integer(lin.pred+eps)
y <- rnbinom(n=n, mu=lin.pred, size=1)
data.hnb<-data.frame(y,treat,sites)
head(data.hnb)
   y  treat sites
1  8    Low Site1
2 24 Medium Site1
3 14   High Site1
4 15    Low Site2
5  5 Medium Site2
6  0   High Site2
library(ggplot2)
ggplot(data.hnb, aes(y=y, x=treat))+geom_point()+facet_wrap(~sites)
plot of chunk tut11.7aaS4.1
write.table(data.hnb, file='../downloads/data/data.hnb.csv', quote=F, row.names=F, sep=',')

Exploratory data analysis and initial assumption checking

Confirm non-normality and explore clumping

When counts are all very large (not close to 0) and their ranges do not span orders of magnitude, they take on very Gaussian-like properties (symmetrical distribution and variance independent of the mean). Given that models based on the Gaussian distribution are better optimized and more widely recognized than Generalized Linear Mixed Effects Models (GLMM), it can be prudent to adopt Gaussian models for such data. Hence it is a good idea to first explore whether a Poisson model is likely to be more appropriate than a standard Gaussian model.

The potential for overdispersion can be explored by adding a rug to a boxplot. The rug is simply a set of tick marks on the inside of an axis at the positions corresponding to the observations. As multiple identical values result in tick marks drawn over one another, it is typically a good idea to apply a slight amount of jitter (random displacement) to the values used by the rug.

library(ggplot2)
ggplot(data.hnb, aes(x=y))+geom_histogram()+facet_wrap(~treat)
plot of chunk tut11.2aS4.2
ggplot(data.hnb, aes(y=y, x=1))+geom_boxplot()+geom_rug(position='jitter', sides='l')+facet_grid(~treat)
plot of chunk tut11.2aS4.2
ggplot(data.hnb, aes(y=y, x=treat))+geom_point()+facet_wrap(~sites)
plot of chunk tut11.2aS4.2
There are signs of non-normality that would warrant a Poisson or negative binomial model. The rug applied to the boxplots suggests some clumping, and the data contain a number of zeros, so overdispersion may well be an issue - hence the negative binomial models fitted below.
Explore zero inflation

We can explore the potential for zero inflation (and thus overdispersion) by comparing the proportion of zeros in the data to the proportion of zeros that would be expected from a Poisson distribution with a mean equal to the mean count of the data.

#proportion of 0's in the data
data.tab<-table(data.hnb$y==0)
data.tab/sum(data.tab)
    FALSE      TRUE 
0.8833333 0.1166667 
#proportion of 0's expected from a Poisson distribution
mu <- mean(data.hnb$y)
cnts <- rpois(1000, mu)
data.tab <- table(cnts == 0)
data.tab/sum(data.tab)
FALSE 
    1 
In the above, the value under FALSE is the proportion of non-zero values in the data and the value under TRUE is the proportion of zeros in the data. In this example, the observed proportion of zeros (11.7%) is noticeably higher than the proportion expected from a Poisson distribution (effectively 0% - none of the 1000 simulated counts were zero), consistent with the data being more clumped (overdispersed) than a Poisson distribution would imply.
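Rather than simulating counts, the expected proportion of zeros under a Poisson distribution can also be calculated directly, since $P(Y=0)=e^{-\mu}$. A quick sketch:

#expected proportion of zeros under a Poisson with mean equal to the observed mean count
mu <- mean(data.hnb$y)
dpois(0, lambda=mu)  #equivalently exp(-mu)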

Model fitting or statistical analysis

#PQL
library(MASS)
# first estimate the dispersion parameter (theta) from a simple negative binomial GLM
a <- glm.nb(y~treat, data=data.hnb)
theta <- a$theta
data.hnb.glmmPQL <- glmmPQL(y~treat, random=list(~1|sites), data=data.hnb, family=negative.binomial(theta))

#Laplace
library(lme4)
data.hnb.glmerL <- glmer(y~treat+(1|sites), data=data.hnb, family=negative.binomial(theta))
#ADMB
library(glmmADMB)
data.hnb.admb <- glmmadmb(y~treat+(1|sites), data=data.hnb, family='nbinom')
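As an aside, rather than fixing the dispersion parameter ($\theta$) at the value estimated from a preliminary glm.nb() fit, lme4 also provides glmer.nb(), which estimates $\theta$ as part of the mixed model fit. A minimal sketch (not run in this tutorial; the object name is arbitrary):

#negative binomial GLMM with theta estimated internally by lme4
library(lme4)
data.hnb.glmerNB <- glmer.nb(y~treat+(1|sites), data=data.hnb)
summary(data.hnb.glmerNB)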
Expected values > 5
#from PQL
predict(data.hnb.glmmPQL, newdata=data.frame(treat=gl(3,1,3,c('Low','Medium','High'))),type='response',level=0)
[1] 14.90 10.70  7.95
attr(,"label")
[1] "Predicted values"
#from Laplace
predict(data.hnb.glmerL, newdata=data.frame(treat=gl(3,1,3,c('Low','Medium','High'))),REform=NA,type='response')
    1     2     3 
14.90 10.70  7.95 
#from glmmADMB
predict(data.hnb.admb, newdata=data.frame(treat=gl(3,1,3,c('Low','Medium','High'))),REform=NA,type='response')
[1] 14.899875 10.699925  7.949929
Conclusions: the expected values are all greater than 5 and thus PQL is likely to yield reasonably good approximations.

Let's explore the diagnostics - particularly the residuals

plot(data.hnb.glmmPQL)
plot of chunk tut11.2aS4.5a
plot(residuals(data.hnb.glmmPQL, type='pearson') ~ predict(data.hnb.glmmPQL, type='link'))
plot of chunk tut11.2aS4.5a
plot(residuals(data.hnb.glmmPQL, type='pearson') ~ as.numeric(data.hnb$treat))
plot of chunk tut11.2aS4.5a
#Laplace
#plot(data.hnb.glmerL)
plot(residuals(data.hnb.glmerL, type='pearson') ~ predict(data.hnb.glmerL, type='link'))
plot of chunk tut11.2aS4.5a
plot(residuals(data.hnb.glmerL, type='pearson') ~ as.numeric(data.hnb$treat))
plot of chunk tut11.2aS4.5a
Conclusions: there are no immediately obvious patterns in the residuals. Both approximation techniques plotted here yield very similar residuals.

Exploring the model parameters, test hypotheses

summary(data.hnb.glmmPQL)
Linear mixed-effects model fit by maximum likelihood
 Data: data.hnb 
  AIC BIC logLik
   NA  NA     NA

Random effects:
 Formula: ~1 | sites
         (Intercept)  Residual
StdDev: 3.656718e-05 0.8249279

Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects: y ~ treat 
                 Value Std.Error DF   t-value
(Intercept)  2.7013612 0.2083966 38 12.962595
treatMedium -0.3311175 0.2963138 38 -1.117456
treatHigh   -0.6281893 0.2982612 38 -2.106172
            p-value
(Intercept)  0.0000
treatMedium  0.2708
treatHigh    0.0419
 Correlation: 
            (Intr) trtMdm
treatMedium -0.703       
treatHigh   -0.699  0.491

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3 
-1.1008610 -0.9399167 -0.2389896  0.9088510 
       Max 
 2.0761204 

Number of Observations: 60
Number of Groups: 20 
library(aod)
Error in library(aod): there is no package called 'aod'
wald.test(b=fixef(data.hnb.glmmPQL),
 Sigma=vcov(data.hnb.glmmPQL),Terms=2:3)
Error in eval(expr, envir, enclos): could not find function "wald.test"
intervals(data.hnb.glmmPQL)
Error in intervals.lme(data.hnb.glmmPQL): cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance
 Consider 'which = "fixed"'
library(gmodels)
ci(data.hnb.glmmPQL)
              Estimate   CI lower    CI upper
(Intercept)  2.7013612  2.2794843  3.12323815
treatMedium -0.3311175 -0.9309733  0.26873839
treatHigh   -0.6281893 -1.2319874 -0.02439112
            Std. Error DF      p-value
(Intercept)  0.2083966 38 1.593290e-15
treatMedium  0.2963138 38 2.708150e-01
treatHigh    0.2982612 38 4.185028e-02
VarCorr(data.hnb.glmmPQL)
Error in VarCorr(data.hnb.glmmPQL): could not find symbol "rdig" in environment of the generic function
summary(data.hnb.glmerL)
Generalized linear mixed model fit by maximum
  likelihood (Laplace Approximation) [glmerMod]
 Family: Negative Binomial(0.873)  ( log )
Formula: y ~ treat + (1 | sites)
   Data: data.hnb

     AIC      BIC   logLik deviance df.resid 
   420.9    431.3   -205.4    410.9       55 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.9081 -0.7754 -0.1971  0.7497  1.7127 

Random effects:
 Groups Name        Variance Std.Dev.
 sites  (Intercept) 0        0       
Number of obs: 60, groups:  sites, 20

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   2.7014     0.2462  10.971   <2e-16
treatMedium  -0.3311     0.3501  -0.946   0.3443
treatHigh    -0.6282     0.3524  -1.783   0.0747
               
(Intercept) ***
treatMedium    
treatHigh   .  
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) trtMdm
treatMedium -0.703       
treatHigh   -0.699  0.491
library(aod)
Error in library(aod): there is no package called 'aod'
wald.test(b=fixef(data.hnb.glmerL),
 Sigma=vcov(data.hnb.glmerL),Terms=2:3)
Error in eval(expr, envir, enclos): could not find function "wald.test"
anova(data.hnb.glmerL)
Analysis of Variance Table
      Df Sum Sq Mean Sq F value
treat  2  3.184   1.592   1.592
confint(data.hnb.glmerL)
                2.5 %     97.5 %
.sig01       0.000000 0.43054685
(Intercept)  2.250621 3.22162510
treatMedium -1.022885 0.36054120
treatHigh   -1.324450 0.06779285
VarCorr(data.hnb.glmerL)
Error in VarCorr(data.hnb.glmerL): could not find symbol "rdig" in environment of the generic function
summary(data.hnb.admb)
Call:
glmmadmb(formula = y ~ treat + (1 | sites), data = data.hnb, 
    family = "nbinom")

AIC: 420.9 

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    2.701      0.246   10.97   <2e-16
treatMedium   -0.331      0.350   -0.95    0.344
treatHigh     -0.628      0.352   -1.78    0.075
               
(Intercept) ***
treatMedium    
treatHigh   .  
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of observations: total=60, sites=20 
Random effect variance(s):
Error in VarCorr(x): could not find symbol "rdig" in environment of the generic function
library(aod)
Error in library(aod): there is no package called 'aod'
wald.test(b=fixef(data.hnb.admb),
 Sigma=vcov(data.hnb.admb),Terms=2:3)
Error in eval(expr, envir, enclos): could not find function "wald.test"
anova(data.hnb.admb)
Error in anova.glmmadmb(data.hnb.admb): Two or more model fits required.
confint(data.hnb.admb)
                2.5 %     97.5 %
(Intercept)  2.218751 3.18395473
treatMedium -1.017299 0.35506729
treatHigh   -1.318901 0.06252105
VarCorr(data.hnb.admb)
Error in VarCorr(data.hnb.admb): could not find symbol "rdig" in environment of the generic function
#install.packages("coefplot2",repos="http://www.math.mcmaster.ca/bolker/R")
library(coefplot2)
Error in library(coefplot2): there is no package called 'coefplot2'
coefplot2(list('glmmPQL (NB)'=data.hnb.glmmPQL, 'Laplace (NB)'=data.hnb.glmerL, 'ADMB (NB)'=data.hnb.admb), varnames=c('High vs Low', 'Medium vs Low'),legend=TRUE,legend.x='right')
Error in eval(expr, envir, enclos): could not find function "coefplot2"

Zero-inflation Poisson (ZIP) regression

Scenario and Data

Let's say we wished to model the abundance of an item ($y$) in the same hierarchical design as above (a categorical exposure treatment applied within each of 20 sites), but this time with counts that contain an excess of zeros. As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and can be safely skipped. Consequently, I have folded (toggled) this section away.

Random data incorporating the following trends (effect parameters)
  • the number of sites ($sites$) = 20
  • the categorical treatment variable ($treat$) comprising three levels ('Low', 'Medium' and 'High')
  • the mean value of $y$ for the 'Low' treatment is 10
  • the standard deviation in 'Low' treatment $y$ values across the 20 sites is 5
  • the mean effect of the 'Medium' treatment is -2
  • the mean effect of the 'High' treatment is -4
  • to generate the values of $y$ expected at each $treat$ level within each of the $sites$, we evaluate the linear predictor (created by calculating the outer product of the model matrix and the effects parameters); any negative expected values are replaced by their exponent ($e^{value}$) so that all expected values lie in (0,$\infty$)
  • finally, we generate the observed $y$ values by drawing random numbers from a zero-inflated Poisson distribution (rZIP from the gamlss.dist package) with the expected $y$ values ($\lambda$) as the Poisson means and a zero-inflation probability of 0.4. This step adds random noise (including excess zeros) to the expected $y$ values and returns only zeros and positive integers.
set.seed(123) #3 #7
#number of sites
n.sites <- 20
n.treat <- 3
n<-n.sites*n.treat
sites <- gl(n=n.sites, k=n.treat, lab=paste("Site",1:n.sites,sep=''))
treat <- gl(n=n.treat,1,n, lab=c('Low','Medium','High'))
#Xmat <- model.matrix(~sites*treat-1-treat)
Xmat <- model.matrix(~-1+sites+treat)

intercept.mean <- 10 # mu alpha
intercept.sd <- 5 # sigma alpha
#effect.sd <- 1
treatM.effect <- -2
treatH.effect <- -4
intercept.effects<-rnorm(n=n.sites, mean=intercept.mean, sd=intercept.sd)
#treatM.effects <- rnorm(n=n.sites, mean=treatM.effect, sd=effect.sd)
#treatH.effects <- rnorm(n=n.sites, mean=treatH.effect, sd=effect.sd)
all.effects <- c(intercept.effects, treatM.effect, treatH.effect)  # Put them all together
lin.pred <- Xmat[,] %*% all.effects
lin.pred[lin.pred<0] <- exp(lin.pred[lin.pred<0])
#eps <- rpois(n=n, n.treat)
#Add some noise and make binomial
library(gamlss.dist)
#fixed latent binomial
y<- rZIP(n,lin.pred, 0.4)
data.hzip <- data.frame(y,treat,sites)
head(data.hzip)
  y  treat sites
1 0    Low Site1
2 1 Medium Site1
3 0   High Site1
4 0    Low Site2
5 0 Medium Site2
6 0   High Site2
write.table(data.hzip, file='../downloads/data/data.hzip.csv', quote=F, row.names=F, sep=',')

Exploratory data analysis and initial assumption checking

Confirm non-normality and explore clumping

When counts are all very large (not close to 0) and their ranges do not span orders of magnitude, they take on very Gaussian-like properties (symmetrical distribution and variance independent of the mean). Given that models based on the Gaussian distribution are better optimized and more widely recognized than Generalized Linear Mixed Effects Models (GLMM), it can be prudent to adopt Gaussian models for such data. Hence it is a good idea to first explore whether a Poisson model is likely to be more appropriate than a standard Gaussian model.

The potential for overdispersion can be explored by adding a rug to a boxplot. The rug is simply a set of tick marks on the inside of an axis at the positions corresponding to the observations. As multiple identical values result in tick marks drawn over one another, it is typically a good idea to apply a slight amount of jitter (random displacement) to the values used by the rug.

library(ggplot2)
ggplot(data.hzip, aes(x=y))+geom_histogram()+facet_wrap(~treat)
plot of chunk tut11.2aS5.2
ggplot(data.hzip, aes(y=y, x=1))+geom_boxplot()+geom_rug(position='jitter', sides='l')+facet_grid(~treat)
plot of chunk tut11.2aS5.2
ggplot(data.hzip, aes(y=y, x=treat))+geom_point()+facet_wrap(~sites)
plot of chunk tut11.2aS5.2
There are signs of non-normality that would warrant a Poisson model. However, the histograms and the rugs applied to the boxplots suggest a large number of zeros, so zero inflation (and the overdispersion that accompanies it) is likely to be an issue.
Explore zero inflation

We can explore the degree of zero inflation by comparing the proportion of zeros in the data to the proportion of zeros that would be expected from a Poisson distribution with a mean equal to the mean count of the data.

#proportion of 0's in the data
data.tab<-table(data.hzip$y==0)
data.tab/sum(data.tab)
FALSE  TRUE 
  0.5   0.5 
#proportion of 0's expected from a Poisson distribution
mu <- mean(data.hzip$y)
cnts <- rpois(1000, mu)
data.tab <- table(cnts == 0)
data.tab/sum(data.tab)
FALSE  TRUE 
0.988 0.012 
In the above, the value under FALSE is the proportion of non-zero values in the data and the value under TRUE is the proportion of zeros in the data. In this example, the observed proportion of zeros (50%) greatly exceeds the very low proportion expected from a Poisson distribution (approximately 1.2%), indicating clear zero inflation.

Model fitting or statistical analysis

#ADMB ZIP
library(glmmADMB)
data.hzip.zip <- glmmadmb(y~treat+(1|sites), data=data.hzip, family='poisson', zeroInflation=TRUE,admb.opts=admbControl(shess=FALSE,noinit=FALSE))
#ADMB ZINB
data.hzip.zinb <- glmmadmb(y~treat+(1|sites), data=data.hzip, family='nbinom', zeroInflation=TRUE,admb.opts=admbControl(shess=FALSE,noinit=FALSE))
Error in glmmadmb(y ~ treat + (1 | sites), data = data.hzip, family = "nbinom", : The function maximizer failed (couldn't find parameter file) Troubleshooting steps include (1) run with 'save.dir' set and inspect output files; (2) change run parameters: see '?admbControl';(3) re-run with debug=TRUE for more information on failure mode
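When glmmADMB fails like this, one option is to follow the troubleshooting steps suggested in the error message (e.g. setting save.dir or re-running with debug=TRUE). Another option, not otherwise used in this tutorial, is to fit the zero-inflated negative binomial with the glmmTMB package (assuming it is installed), which offers a similar formula interface. A minimal sketch (the object name is arbitrary):

#zero-inflated negative binomial via glmmTMB as an alternative to glmmADMB
library(glmmTMB)
data.hzip.zinb.tmb <- glmmTMB(y~treat+(1|sites), ziformula=~1, family=nbinom2(), data=data.hzip)
summary(data.hzip.zinb.tmb)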

Let's explore the diagnostics - particularly the residuals

plot(resid(data.hzip.zip, type='pearson')~predict(data.hzip.zip, type='link'))
plot of chunk tut11.2aS5.5a
plot(resid(data.hzip.zinb)~predict(data.hzip.zinb, type='link'))
Error in resid(data.hzip.zinb): object 'data.hzip.zinb' not found
Conclusions: there are no immediately obvious patterns in the residuals of the ZIP model. (The ZINB model failed to fit, so its residuals could not be examined.)

Exploring the model parameters, test hypotheses

summary(data.hzip.zip)
Call:
glmmadmb(formula = y ~ treat + (1 | sites), data = data.hzip, 
    family = "poisson", zeroInflation = TRUE, admb.opts = admbControl(shess = FALSE, 
        noinit = FALSE))

AIC: 272.9 

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    2.273      0.186   12.24   <2e-16
treatMedium   -0.147      0.180   -0.82    0.414
treatHigh     -0.497      0.168   -2.96    0.003
               
(Intercept) ***
treatMedium    
treatHigh   ** 
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of observations: total=60, sites=20 
Random effect variance(s):
Error in VarCorr(x): could not find symbol "rdig" in environment of the generic function
library(aod)
Error in library(aod): there is no package called 'aod'
wald.test(b=fixef(data.hzip.zip),
 Sigma=vcov(data.hzip.zip),Terms=2:3)
Error in eval(expr, envir, enclos): could not find function "wald.test"
intervals(data.hzip.zip)
Error in UseMethod("intervals"): no applicable method for 'intervals' applied to an object of class "glmmadmb"
library(gmodels)
ci(data.hzip.zip)
Error in UseMethod("ci"): no applicable method for 'ci' applied to an object of class "glmmadmb"
VarCorr(data.hzip.zip)
Error in VarCorr(data.hzip.zip): could not find symbol "rdig" in environment of the generic function
summary(data.hzip.zinb)
Error in summary(data.hzip.zinb): error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 'data.hzip.zinb' not found
library(aod)
Error in library(aod): there is no package called 'aod'
wald.test(b=fixef(data.hzip.zinb),
 Sigma=vcov(data.hzip.zinb),Terms=2:3)
Error in eval(expr, envir, enclos): could not find function "wald.test"
anova(data.hzip.zinb)
Error in anova(data.hzip.zinb): object 'data.hzip.zinb' not found
confint(data.hzip.zinb)
Error in confint(data.hzip.zinb): object 'data.hzip.zinb' not found
VarCorr(data.hzip.zinb)
Error in VarCorr(data.hzip.zinb): error in evaluating the argument 'x' in selecting a method for function 'VarCorr': Error: object 'data.hzip.zinb' not found

Exponential family of distributions

The exponential family of distributions is a class of continuous and discrete distributions whose probability functions can be characterized by two parameters. One of these parameters (the location, or canonical, parameter) is a function of the mean and the other (the dispersion parameter) is a function of the variance of the distribution. Note that recent developments have further extended generalized linear models to accommodate other, non-exponential residual distributions.
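For reference, a distribution belongs to the exponential family if its probability (density or mass) function can be written in the general form
$$f(y;\theta,\phi)=\exp\left\{\frac{y\theta-b(\theta)}{a(\phi)}+c(y,\phi)\right\}$$
where $\theta$ is the location (canonical) parameter and $\phi$ is the dispersion parameter, with $E(Y)=b'(\theta)$ and $Var(Y)=b''(\theta)\,a(\phi)$. For example, for the Poisson distribution $\theta=\log\mu$, $b(\theta)=e^{\theta}$ and $a(\phi)=1$, so the mean and variance are both equal to $\mu$.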

End of instructions