Tutorial 11.4b - Logistic regression

05 Sep 2016

Binary data

Scenario and Data

Logistic regression is a type of generalized linear model (GLM) that models a binary response against a linear predictor via a specific link function. The linear predictor is typically a linear combination of effects parameters (e.g. $\beta_0 + \beta_1x_1$). The role of the link function is to transform the expected values of the response $y$ (which are on the scale of (0,1), as are expectations drawn from a binomial distribution) onto the scale of the linear predictor (which is $(-\infty,\infty)$). GLMs transform the expected values (via a link) whereas LMs transform the observed data. Thus GLMs operate on the scale of the original data and yet also on a scale appropriate to the residuals, whereas LMs do neither.

There are many transformations that can map values on the (0,1) scale onto the ($-\infty,\infty$) scale; the three most common (illustrated in the short sketch after this list) are:

  • logit: $log\left(\frac{\pi}{1-\pi}\right)$ - log odds ratio
  • probit: $\phi^{-1}(\pi)$ where $\phi^{-1}$ is the inverse of the standard normal cumulative distribution function
  • complementary log-log: $log(-log(1-\pi))$
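
As a quick base R sketch of these mappings: each link transforms probabilities on (0,1) onto the real line and has an exact inverse. Note also that the observed 0/1 responses themselves cannot be transformed (logit(0) and logit(1) are infinite), which is why GLMs transform the expectation rather than the data.

p <- c(0.1, 0.5, 0.9)
#logit and inverse logit
qlogis(p)               #log(p/(1-p))
plogis(qlogis(p))       #recovers p
#probit and inverse probit
qnorm(p)                #inverse normal cumulative distribution function
pnorm(qnorm(p))         #recovers p
#complementary log-log and its inverse (written out manually)
cloglog <- function(p) log(-log(1 - p))
inv.cloglog <- function(eta) 1 - exp(-exp(eta))
inv.cloglog(cloglog(p)) #recovers p
qlogis(c(0, 1))         #-Inf Inf - raw 0/1 data cannot be logit transformed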

Let's say we wanted to model the presence/absence of an item ($y$) against a continuous predictor ($x$). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details is optional and can be safely skipped. Consequently, I have folded (toggled) this section away.

Random data incorporating the following trends (effect parameters)
  • the sample size = 20
  • the continuous $x$ variable is a random uniform spread of measurements between 1 and 20
  • the rate of change in log odds ratio per unit change in x (slope) = 0.5. The magnitude of the slope is an indicator of how abruptly the log odds ratio changes. A very abrupt change would occur if there was very little variability in the trend. That is, a large slope would be indicative of a threshold effect. A lower slope indicates a gradual change in odds ratio and thus a less obvious and precise switch between the likelihood of 1 vs 0.
  • the inflection point (point where the slope of the line is greatest) = 10. The inflection point indicates the value of the predictor ($x$) where the probability of success is 50% (i.e. the log odds are zero, switching between 1 being more likely and 0 being more likely). This is also known as the LD50.
  • the intercept (which has no real interpretation) is equal to the negative of the slope multiplied by the inflection point
  • to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor. These expected values are then back-transformed onto the (0,1) scale using the inverse logit function $\frac{e^{linear~predictor}}{1+e^{linear~predictor}}$
  • finally, we generate $y$ values by using the expected $y$ values as probabilities when drawing random numbers from a binomial distribution. This step adds random noise to the expected $y$ values and returns only 1's and 0's.
set.seed(8)
#The number of samples
n.x <- 20
#Create x values that are uniformly distributed throughout the range of 1 to 20
x <- sort(runif(n = n.x, min = 1, max =20))
#The slope is the rate of change in log odds ratio for each unit change in x
# the smaller the slope, the slower the change (more variability in data too)
slope=0.5
#Inflection point is where the slope of the line is greatest
#this is also the LD50 point
inflect <- 10
#Intercept (no interpretation)
intercept <- -1*(slope*inflect)
#The linear predictor
linpred <- intercept+slope*x
#Predicted y values
y.pred <- exp(linpred)/(1+exp(linpred))
n.y <- rbinom(n = n.x, 20, prob = 0.9) #draw of group sizes - unused in this binary example
#Add some noise and make binary (0s and 1s)
y <- rbinom(n = n.x, size = 1, prob = y.pred)
dat <- data.frame(y,x)

With these sorts of data, we are primarily interested in investigating whether there is a relationship between the binary response variable and the linear predictor (a linear combination of one or more continuous or categorical predictors).

Exploratory data analysis and initial assumption checking

The assumptions are:
  1. All of the observations are independent - this must be addressed at the design and collection stages
  2. The response variable (and thus the residuals) should be matched by an appropriate distribution (in the case of a binary response - a binomial is appropriate).
  3. All observations are equally influential in determining the trends - or at least no observations are overly influential. This is most effectively diagnosed via residuals and other influence indices and is very difficult to diagnose prior to analysis
  4. The relationship between the linear predictor (right-hand side of the regression formula) and the response on the link scale should be linear. A scatterplot with smoother can be useful for identifying possible non-linearity.

So let's explore linearity by creating a histogram of the predictor variable ($x$) and a scatterplot of the relationship between the response ($y$) and the predictor ($x$).

hist(dat$x)
plot of chunk tut11.4aS1.2
#now for the scatterplot
plot(y~x, dat)
with(dat, lines(lowess(y~x)))
plot of chunk tut11.4aS1.2

Conclusions: the predictor ($x$) does not display any skewness or other issues that might lead to non-linearity. The lowess smoother on the scatterplot does not display major deviations from a standard sigmoidal curve and thus linearity appears satisfied. Violations of linearity could be addressed by either (a brief sketch follows this list):

  • defining a non-linear predictor (such as a polynomial, spline or other non-linear function)
  • transforming the scale of the predictor variables
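
As a brief sketch of these remedies (neither is required for the current data; these are just quick frequentist glm() checks):

#hypothetical remedies for non-linearity (not needed for these data)
glm(y ~ poly(x, 2), data = dat, family = binomial) #polynomial linear predictor
glm(y ~ log(x), data = dat, family = binomial)     #transformed predictor scale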

JAGS

Effects model

Note that in order to prevent arithmetic overflows (particularly with the clog-log model), I am going to constrain the estimated linear predictor to between -20 and 20. Values outside this range back-transform to probabilities vanishingly close to 0 and 1 respectively (see the brief illustration below).
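
To illustrate why this constraint is reasonable (a quick base R check; plogis is the inverse logit):

plogis(c(-20, 20)) #inverse logit at -20 and 20: essentially 0 and 1 already
exp(exp(20))       #the term inside the inverse clog-log overflows to Inf
1 - exp(-exp(20))  #inverse clog-log at 20: indistinguishable from 1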

I will demonstrate logistic regression with a range of possible link functions (each of which yield different parameter interpretations):

  • logit $$ \begin{array}{rcl} y &\sim& \operatorname{Bern}(\pi)\\ log\left(\frac{\pi}{1-\pi}\right)&=&\beta_0+\beta_1x_1\\ \beta_0, \beta_1&\sim&\mathcal{N}(0, 10^6)\\ \end{array} $$
    dat.list <- with(dat, list(y=y, x=x, N=nrow(dat)))
    modelString="
    model{
      for (i in 1:N) {
        y[i] ~ dbern(p[i])
        logit(p[i]) <- max(-20,min(20,beta0+beta1*x[i]))
      }
      beta0 ~ dnorm(0,1.0E-06)
      beta1 ~ dnorm(0,1.0E-06)
    }
    "
    writeLines(modelString, con='../downloads/BUGSscripts/tut11.4bS1.bug')
    library(R2jags)
    dat.logit.jags <- jags(data=dat.list,model.file='../downloads/BUGSscripts/tut11.4bS1.bug',
                       param=c('beta0','beta1'),
                       n.chains=3, n.iter=20000, n.burnin=10000, n.thin=10)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 147
    
    Initializing model
    
  • probit $$ \begin{array}{rcl} y &\sim& \operatorname{Bern}(\pi)\\ \phi^{-1}\left(\pi\right)&=&\beta_0+\beta_1x_1\\ \beta_0, \beta_1&\sim&\mathcal{N}(0, 10^6)\\ \end{array} $$
    dat.list <- with(dat, list(y=y, x=x, N=nrow(dat)))
    modelString="
    model{
      for (i in 1:N) {
        y[i] ~ dbern(p[i])
        probit(p[i]) <- max(-20,min(20,beta0+beta1*x[i]))
      }
      beta0 ~ dnorm(0,1.0E-06)
      beta1 ~ dnorm(0,1.0E-06)
    }
    "
    writeLines(modelString, con='../downloads/BUGSscripts/tut11.4bS1.3b.bug')
    library(R2jags)
    dat.probit.jags <- jags(data=dat.list,model.file='../downloads/BUGSscripts/tut11.4bS1.3b.bug',
                       param=c('beta0','beta1'),
                       n.chains=3, n.iter=20000, n.burnin=10000, n.thin=10)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 147
    
    Initializing model
    
  • complementary log-log $$ \begin{array}{rcl} y &\sim& \operatorname{Bern}(\pi)\\ log(-log(1-\pi))&=&\beta_0+\beta_1x_1\\ \beta_0, \beta_1&\sim&\mathcal{N}(0, 10^6)\\ \end{array} $$
    dat.list <- with(dat, list(y=y, x=x, N=nrow(dat)))
    modelString="
    model{
      for (i in 1:N) {
        y[i] ~ dbern(p[i])
        cloglog(p[i]) <- max(-20,min(20,beta0+beta1*x[i]))
      }
      beta0 ~ dnorm(0,1.0E-06)
      beta1 ~ dnorm(0,1.0E-06)
    }
    "
    writeLines(modelString, con='../downloads/BUGSscripts/tut11.4bS1.3c.bug')
    library(R2jags)
    dat.cloglog.jags <- jags(data=dat.list,model.file='../downloads/BUGSscripts/tut11.4bS1.3c.bug',
                       param=c('beta0','beta1'),
                       n.chains=3, n.iter=20000, n.burnin=10000, n.thin=10)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 147
    
    Initializing model
    
    #frequentist clog-log fit, for comparison with the JAGS estimates
    dat.glmC <- glm(y~x, data=dat, family=binomial(link="cloglog"))
    

Chain mixing and Model validation

Prior to exploring the model parameters, it is prudent to confirm that the model did indeed fit the assumptions and was an appropriate fit to the data, and that the MCMC sampling chains were adequately mixed and the retained samples approximately independent.

Whilst I will only demonstrate this for the logit model, the procedure would be identical for exploring the probit and clog-log models.

  • We will start by exploring the mixing of the MCMC chains via traceplots.
    plot(as.mcmc(dat.logit.jags))
    
    plot of chunk tut11.4aS2

    Other than the large spike in one of the chains (not ideal), the chains appear well mixed. Nevertheless, it is clear that the parameters are not symmetrically distributed. Hence, medians might be better descriptors of parameter estimates than means.

  • Next we will explore correlation amongst MCMC samples.
    autocorr.diag(as.mcmc(dat.logit.jags))
    
              beta0   beta1 deviance
    Lag 0   1.00000 1.00000  1.00000
    Lag 10  0.73921 0.74013  0.38068
    Lag 50  0.23896 0.23575  0.12992
    Lag 100 0.02871 0.04098  0.04769
    Lag 500 0.01573 0.01148  0.01980
    

    It seems that the level of auto-correlation at the nominated lag of 10 is extremely high. Ideally, the level of auto-correlation should be less than 0.1. To achieve this, we need a lag of 100. Consequently, we will resample at a lag of 100 and obviously we are going to need more iterations to ensure that we retain a large enough sample from which to derive estimates.

    In order to support a thinning rate of 100, the number of iterations is going to need to be quite high. Hence, the following might take considerable time to run.

    library(R2jags)
    dat.logit.jags <- jags(data=dat.list,model.file='../downloads/BUGSscripts/tut11.4bS1.bug',
                       param=c('beta0','beta1'),
                       n.chains=3, n.iter=20000, n.burnin=10000, n.thin=100)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 147
    
    Initializing model
    
    plot(as.mcmc(dat.logit.jags))
    
    plot of chunk tut11.4bS4
    autocorr.diag(as.mcmc(dat.logit.jags))
    
                beta0    beta1  deviance
    Lag 0     1.00000  1.00000  1.000000
    Lag 100   0.09970  0.12296  0.131321
    Lag 500  -0.01581 -0.02810 -0.003366
    Lag 1000  0.06601  0.05309 -0.008802
    Lag 5000 -0.01542 -0.03085 -0.013968
    

    Conclusions: the samples are now substantially less auto-correlated and the chains are arguably better mixed.

  • We now explore the goodness of fit of the models via the residuals and deviance. We could calculate the Pearson residuals within the JAGS model. Alternatively, we could use the parameters to generate the residuals outside of JAGS.
    library(boot) # provides inv.logit
    coefs <- dat.logit.jags$BUGSoutput$sims.matrix[,1:2]
    Xmat <- model.matrix(~x, data=dat)
    eta <- coefs %*% t(Xmat)
    pi <- inv.logit(eta)
    #subtract observed y values from expected values (columns), then standardize (Pearson residuals)
    Resid <- -1*sweep(pi,2,dat$y,'-')/sqrt(pi*(1-pi))
    plot(apply(Resid,2,mean)~apply(eta,2,mean))
    
    plot of chunk tut11.4bS5
  • Now we will compare the sum of squared residuals to the sum of squared residuals that would be expected from a Bernoulli distribution matching that estimated by the model. Essentially this estimates how well the Bernoulli distribution and linear model approximate the observed data.
    SSres<-apply(Resid^2,1,sum)
    
    set.seed(2)
    #generate a matrix of draws from a binomial distribution
    # the matrix is the same dimensions as pi and uses the probabilities of pi
    YNew <- matrix(rbinom(length(pi),prob=pi,size=1),nrow=nrow(pi))
    
    Resid1<-(pi - YNew)/sqrt(pi*(1-pi))
    SSres.sim<-apply(Resid1^2,1,sum)
    mean(SSres.sim>SSres)
    
    [1] 0.23
    
    Alternatively, we could generate the new samples and calculate the sums of squared residuals etc. all within JAGS.
    dat.list <- with(dat, list(y=y, x=x, N=nrow(dat)))
    modelString="
    model{
      for (i in 1:N) {
        y[i] ~ dbern(p[i])
        logit(p[i]) <- max(-20,min(20,eta[i]))
        eta[i] <- beta0+beta1*x[i]
        YNew[i] ~dbern(p[i])
        varY[i] <- p[i]*(1-p[i])
        PRes[i] <- (y[i] - p[i]) / sqrt(varY[i])
        PResNew[i] <- (YNew[i] - p[i]) / sqrt(varY[i])
        D[i] <- pow(PRes[i],2)
        DNew[i] <- pow(PResNew[i],2)
      }
      Fit <- sum(D[1:N])
      FitNew <-sum(DNew[1:N]) 
      beta0 ~ dnorm(0,1.0E-06)
      beta1 ~ dnorm(0,1.0E-06)
      pvalue <- mean(FitNew>Fit)
    }
    "
    writeLines(modelString, con='../downloads/BUGSscripts/tut11.4bS5.bug')
    library(R2jags)
    dat.logit.jags1 <- jags(data=dat.list,model.file='../downloads/BUGSscripts/tut11.4bS5.bug',
                       param=c('beta0','beta1','Fit','FitNew'),
                       n.chains=3, n.iter=20000, n.burnin=10000, n.thin=10)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 346
    
    Initializing model
    
    print(dat.logit.jags1)
    
    Inference for Bugs model at "../downloads/BUGSscripts/tut11.4bS5.bug", fit using jags,
     3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 10
     n.sims = 3000 iterations saved
             mu.vect sd.vect    2.5%     25%    50%
    Fit       92.725 700.607   9.931  13.217 20.734
    FitNew    17.011  45.964   1.607   4.049  7.838
    beta0     -9.721   4.274 -19.899 -11.975 -9.110
    beta1      0.915   0.402   0.323   0.630  0.860
    deviance  14.004   2.309  11.710  12.378 13.323
                75%   97.5%  Rhat n.eff
    Fit      40.679 381.579 1.016   300
    FitNew   16.322  78.923 1.003   760
    beta0    -6.620  -3.430 1.011   330
    beta1     1.128   1.892 1.008   310
    deviance 14.908  20.337 1.012   390
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 2.7 and DIC = 16.7
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    out <- dat.logit.jags1$BUGSoutput
    mean(out$sims.list$FitNew > out$sims.list$Fit)
    
    [1] 0.244
    

    Conclusions: although the Bayesian p-value is quite a bit lower than 0.5, suggesting that there is more variability in the data than would be expected from this simple logistic regression model, the value is also not close to 0 (a value that would indicate that the model does not fit the data at all well). Thus we might conclude that, whilst not ideal, the model is adequate.

Exploring the model parameters, test hypotheses

If there was any evidence that the assumptions had been violated or the model was not an appropriate fit, then we would need to reconsider the model and start the process again. In this case, there is no evidence that the test will be unreliable so we can proceed to explore the test statistics.

print(dat.logit.jags)
Inference for Bugs model at "../downloads/BUGSscripts/tut11.4bS1.bug", fit using jags,
 3 chains, each with 20000 iterations (first 10000 discarded), n.thin = 100
 n.sims = 300 iterations saved
         mu.vect sd.vect    2.5%     25%    50%
beta0     -9.898   4.468 -20.384 -12.214 -9.110
beta1      0.941   0.425   0.324   0.638  0.859
deviance  14.073   2.294  11.736  12.364 13.398
            75%  97.5%  Rhat n.eff
beta0    -6.796 -3.839 1.022   100
beta1     1.152  1.913 1.016   120
deviance 14.988 19.460 1.016   100

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 2.6 and DIC = 16.7
DIC is an estimate of expected predictive error (lower deviance is better).
library(plyr)
adply(dat.logit.jags$BUGSoutput$sims.matrix[,1:2], 2, function(x) {
  data.frame(Median=median(x), Mean=mean(x), HPDinterval(as.mcmc(x)), HPDinterval(as.mcmc(x),p=0.5))
})
     X1  Median    Mean    lower  upper  lower.1
1 beta0 -9.1101 -9.8980 -18.3513 -2.815 -10.2669
2 beta1  0.8593  0.9405   0.2807  1.778   0.5729
  upper.1
1  -5.658
2   1.013

Conclusions: We would reject the null hypothesis (p<0.05). An increase in $x$ is associated with a significant linear increase (positive slope) in the log odds of $y$ success. Every 1 unit increase in $x$ results in a 0.8593198 unit increase in the log odds-ratio. We usually express this in terms of the odds-ratio rather than the log odds-ratio, so every 1 unit increase in $x$ results in a $e^{0.8593198}=2.3615537$ fold increase in the odds.
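
The full posterior of the odds-ratio can be summarised directly by exponentiating the slope samples (a sketch using the coefs matrix extracted earlier, whose columns are beta0 and beta1):

library(coda)
odds.ratio <- exp(coefs[, 2]) #per-sample odds-ratios for a 1 unit increase in x
data.frame(Median = median(odds.ratio), Mean = mean(odds.ratio),
           HPDinterval(as.mcmc(odds.ratio)))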

Further explorations of the trends

We might also be interested in the LD50 - the value of $x$ at which the predicted probability of success is 0.5 (the point at which the more likely outcome switches between 1 and 0). LD50 is calculated as: $$LD50 = - \frac{intercept}{slope}$$

library(coda)
summary(as.mcmc(-coefs[,1]/coefs[,2]))
Iterations = 1:300
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 300 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean             SD       Naive SE 
      10.59179        1.09415        0.06317 
Time-series SE 
       0.04197 

2. Quantiles for each variable:

  2.5%    25%    50%    75%  97.5% 
 8.684  9.898 10.495 11.223 12.970 
#OR
LD50 <- -coefs[,1]/coefs[,2]
data.frame(Median=median(LD50), Mean=mean(LD50), HPDinterval(as.mcmc(LD50)), HPDinterval(as.mcmc(LD50),p=0.5))
       Median     Mean    lower    upper  lower.1
var1 10.49471 10.59179 8.495029 12.73212 9.646387
      upper.1
var1 10.87609
Conclusions: the LD50 is 10.4947098.

Finally, we will create a summary plot.

par(mar = c(4, 5, 0, 0))
plot(y ~ x, data = dat, type = "n", ann = F, axes = F)
points(y ~ x, data = dat, pch = 16)
xs <- seq(0, 20, l = 1000)

Xmat <- model.matrix(~xs)
eta<-coefs %*% t(Xmat)
library(boot)
ys <- inv.logit(eta)
library(plyr)
data.tab <- adply(ys,2,function(x) {
  data.frame(Median=median(x), HPDinterval(as.mcmc(x)))
})
data.tab <- cbind(x=xs,data.tab)

points(Median ~ x, data=data.tab,col = "black", type = "l")
lines(lower ~ x, data=data.tab,col = "black", type = "l", lty = 2)
lines(upper ~ x, data=data.tab,col = "black", type = "l", lty = 2)
axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Y", 2, cex = 1.5, line = 3)
box(bty = "l")
plot of chunk tut11.4aS1.10

Grouped binary data (and proportional data)

Scenario and Data

In the previous demonstration, the response variable represented the state of a single item per level of the predictor variable ($x$). That single item could be observed having a value of either 1 or 0. Another common situation is to observe the number of items in one of two states (typically dead or alive) for each level of a treatment. For example, you could tally up the number of germinated and non-germinated seeds out of a bank of 10 seeds at each of 8 temperature or nutrient levels.

Recall that the binomial distribution represents the probability of each possible number of successes (germinations) out of a total of $N$ items (seeds). Hence the binomial distribution is also a suitable error distribution for such grouped binary data.

For this demonstration, we will model the number of successes against a uniformly distributed predictor ($x$). The number of trials in each group (level of the predictor) will vary slightly (yet randomly) so as to mimic complications that inevitably occur in real experiments.

Random data incorporating the following trends (effect parameters)
  • the number of levels of $x$ = 10
  • the continuous $x$ variable is a random uniform spread of measurements between 10 and 20
  • the rate of change in log odds ratio per unit change in x (slope) = -0.25. The magnitude of the slope is an indicator of how abruptly the log odds ratio changes. A very abrupt change would occur if there was very little variability in the trend. That is, a large slope would be indicative of a threshold effect. A lower slope indicates a gradual change in odds ratio and thus a less obvious and precise switch between the likelihood of 1 vs 0.
  • the inflection point (point where the slope of the line is greatest) = 15. The inflection point indicates the value of the predictor ($x$) where the probability of success is 50% (i.e. the log odds are zero, switching between 1 being more likely and 0 being more likely). This is also known as the LD50.
  • the intercept (which has no real interpretation) is equal to the negative of the slope multiplied by the inflection point
  • to generate the values of $y$ expected at each $x$ value, we evaluate the linear predictor and back-transform it onto the (0,1) scale via the inverse logit
  • generate random numbers of trials per group by drawing integers out of a binomial distribution with a total size of 20 and probability of 0.9. Hence the maximum number of trials per group will be 20, yet some will be less than that.
  • the numbers of successes per group are drawn from a binomial distribution with a total size equal to the trial sizes generated in the previous step and probabilities equal to the expected $y$ values
  • finally, the number of failures per group are calculated as the difference between the total trial size and the number of successes per group.
set.seed(8)
#The number of levels of x
n.x <- 10
#Create x values that are uniformly distributed throughout the range of 10 to 20
x <- sort(runif(n = n.x, min = 10, max =20))
#The slope is the rate of change in log odds ratio for each unit change in x
# the smaller the slope, the slower the change (more variability in data too)
slope=-.25
#Inflection point is where the slope of the line is greatest
#this is also the LD50 point
inflect <- 15
#Intercept (no interpretation)
intercept <- -1*(slope*inflect)
#The linear predictor
linpred <- intercept+slope*x
#Predicted y values
y.pred <- exp(linpred)/(1+exp(linpred))
#Generate random numbers of trials per group (maximum of 20)
n.trial <- rbinom(n=n.x, 20, prob=0.9)
success <- rbinom(n = n.x, size = n.trial,prob = y.pred)
failure <- n.trial - success
dat <- data.frame(success,failure,x)
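
As a quick sanity check on the generated data (a frequentist sketch only, not part of the Bayesian analysis that follows), the grouped binomial representation and the fully expanded Bernoulli (0/1) representation yield identical estimates:

#grouped binomial fit
fit.grouped <- glm(cbind(success, failure) ~ x, data = dat, family = binomial)
#expand each group into its individual 0/1 trials
dat.long <- with(dat, data.frame(
  y = rep(rep(c(1, 0), times = nrow(dat)), times = as.vector(t(cbind(success, failure)))),
  x = rep(x, times = success + failure)))
fit.binary <- glm(y ~ x, data = dat.long, family = binomial)
rbind(grouped = coef(fit.grouped), binary = coef(fit.binary)) #identical coefficients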

Exploratory data analysis and initial assumption checking

The assumptions are:
  1. All of the observations are independent - this must be addressed at the design and collection stages
  2. The response variable (and thus the residuals) should be matched by an appropriate distribution (in the case of a binary response - a binomial is appropriate).
  3. All observations are equally influential in determining the trends - or at least no observations are overly influential. This is most effectively diagnosed via residuals and other influence indices and is very difficult to diagnose prior to analysis
  4. The relationship between the linear predictor (right-hand side of the regression formula) and the response on the link scale should be linear. A scatterplot with smoother can be useful for identifying possible non-linearity.

So let's explore linearity by creating a histogram of the predictor variable ($x$) and a scatterplot of the relationship between either the number of successes ($success$) or the number of failures ($failure$) and the predictor ($x$). Note that this will not account for the differences in trial size per group, and so a scatterplot of the proportion of successes (successes divided by the total number of trials) against the predictor ($x$) might be more appropriate.

hist(dat$x)
plot of chunk tut11.4aS2.2
#now for the scatterplot
plot(success~x, dat)
with(dat, lines(lowess(success~x)))
plot of chunk tut11.4aS2.2
#scatterplot standardized for trial size
plot(success/(success+failure)~x, dat)
with(dat, lines(lowess(success/(success+failure)~x)))
plot of chunk tut11.4aS2.2

Conclusions: the predictor ($x$) does not display any skewness (although it is not all that uniform - random data hey!) or other issues that might lead to non-linearity. The lowess smoother on either scatterplot does not display major deviations from a standard sigmoidal curve and thus linearity is likely to be satisfied. Violations of linearity could be addressed by either:

  • defining a non-linear predictor (such as a polynomial, spline or other non-linear function)
  • transforming the scale of the predictor variables

Model fitting or statistical analysis - JAGS

Clearly the number of successes is also dependent on the number of trials. Larger numbers of trials might be expected to yield higher numbers of successes.

dat.list <- with(dat, list(success=success, total=success+failure, x=x, N=nrow(dat)))
modelString="
model{
  for (i in 1:N) {
    success[i] ~ dbin(p[i],total[i])
    logit(p[i]) <- max(-20,min(20,beta0+beta1*x[i]))
  }
  beta0 ~ dnorm(0,1.0E-06)
  beta1 ~ dnorm(0,1.0E-06)
}
"
writeLines(modelString, con='../downloads/BUGSscripts/tut11.4bS2.bug')
library(R2jags)
dat.logit.jags <- jags(data=dat.list,model.file='../downloads/BUGSscripts/tut11.4bS2.bug',
                   param=c('beta0','beta1'),
                   n.chains=3, n.iter=200000, n.burnin=100000, n.thin=100)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 87

Initializing model

As with the logistic regression presented earlier, we could alternatively use probit or clog-log link functions.
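
As a quick frequentist cross-check of these link alternatives (a sketch, mirroring the dat.glmC comparison shown earlier):

glm(cbind(success, failure) ~ x, data = dat, family = binomial(link = "logit"))
glm(cbind(success, failure) ~ x, data = dat, family = binomial(link = "probit"))
glm(cbind(success, failure) ~ x, data = dat, family = binomial(link = "cloglog"))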

Model evaluation

Prior to exploring the model parameters, it is prudent to confirm that the model did indeed fit the assumptions and was an appropriate fit to the data.

  • We will start by exploring the mixing of the MCMC chains via traceplots.
    plot(as.mcmc(dat.logit.jags))
    
    plot of chunk tut11.4aS2.2a

    Conclusions: the chains appear well mixed and stable.

  • Next we will explore correlation amongst MCMC samples.
    autocorr.diag(as.mcmc(dat.logit.jags))
    
                    beta0        beta1     deviance
    Lag 0     1.000000000  1.000000000  1.000000000
    Lag 100   0.237750449  0.243869665  0.025807991
    Lag 500   0.005225247  0.007250363  0.008398106
    Lag 1000 -0.007840742 -0.009143387 -0.028934638
    Lag 5000 -0.004896295 -0.004932879 -0.027304575
    

    It seems that the level of auto-correlation at the nominated lag of 100 is still quite high (0.24). Ideally, the level of auto-correlation should be less than 0.1. To achieve this, we need a lag of 1000. Consequently, we will resample at a lag of 1000 and obviously we are going to need more iterations to ensure that we retain a large enough sample from which to derive estimates.

    library(R2jags)
    dat.logit.jags <- jags(data=dat.list,model.file='../downloads/BUGSscripts/tut11.4bS2.bug',
                       param=c('beta0','beta1'),
                       n.chains=3, n.iter=2000000, n.burnin=10000, n.thin=1000)
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 87
    
    Initializing model
    
    plot(as.mcmc(dat.logit.jags))
    
    plot of chunk tut11.4bS2.4a
    autocorr.diag(as.mcmc(dat.logit.jags))
    
                     beta0        beta1     deviance
    Lag 0      1.000000000  1.000000000  1.000000000
    Lag 1000  -0.009933648 -0.010646679  0.006179711
    Lag 5000   0.011433698  0.010182990  0.005045334
    Lag 10000 -0.012786987 -0.016708011  0.026831468
    Lag 50000  0.006382612  0.003937007 -0.004550967
    
  • Let's explore the diagnostics - particularly the residuals

    inv.logit <- binomial()$linkinv
    #Calculate residuals
    coefs <- dat.logit.jags$BUGSoutput$sims.matrix[,1:2]
    Xmat <- model.matrix(~x, data=dat)
    eta<-coefs %*% t(Xmat)
    pi <- inv.logit(eta)
    #subtract observed proportions from expected values (columns), then standardize (Pearson residuals)
    Resid <- -1*sweep(pi,2,dat$success/(dat$success+dat$failure),'-')/sqrt(pi*(1-pi))
    plot(apply(Resid,2,mean)~apply(eta,2,mean))
    lines(lowess(apply(Resid,2,mean)~apply(eta,2,mean)))
    
    plot of chunk tut11.4aS2.5a
    Conclusions: there are no obvious patterns in the residuals, or at least there are no obvious trends remaining that would be indicative of non-linearity.
  • Now we will compare the sum of squared residuals to the sum of squared residuals that would be expected from a binomial distribution matching that estimated by the model. Essentially this estimates how well the binomial distribution and linear model approximate the observed data.
    SSres<-apply(Resid^2,1,sum)
    
    set.seed(2)
    #generate a matrix of draws from a binomial distribution
    #the matrix is the same dimensions as pi and uses the probabilities of pi
    YNew <- matrix(rbinom(length(pi),prob=pi,size=(dat$success+dat$failure)),nrow=nrow(pi))
    Resid1 <- 1*(pi-YNew/(dat$success+dat$failure))/sqrt(pi*(1-pi))
    SSres.sim<-apply(Resid1^2,1,sum)
    mean(SSres.sim>SSres)
    
    [1] 0.6445561
    

    Conclusions: this Bayesian p-value is reasonably close to 0.5. Therefore we would conclude that there was no strong evidence for a lack of fit of the model.

Further explorations of the trends

We might also be interested in the LD50 - the value of $x$ at which the predicted probability of success is 0.5 (the point at which the more likely outcome switches between 1 and 0). LD50 is calculated as: $$LD50 = - \frac{intercept}{slope}$$

library(coda)
summary(as.mcmc(-coefs[,1]/coefs[,2]))
Iterations = 1:5970
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 5970 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

          Mean             SD       Naive SE 
     15.672795       0.689132       0.008919 
Time-series SE 
      0.007972 

2. Quantiles for each variable:

 2.5%   25%   50%   75% 97.5% 
14.28 15.27 15.68 16.09 16.95 
#OR
LD50 <- -coefs[,1]/coefs[,2]
data.frame(Median=median(LD50), Mean=mean(LD50),
           HPDinterval(as.mcmc(LD50)), HPDinterval(as.mcmc(LD50),p=0.5))
       Median     Mean    lower    upper  lower.1
var1 15.67809 15.67279 14.39209 17.04647 15.27153
      upper.1
var1 16.09027
Conclusions: the LD50 is 15.6780855.

Finally, we will create a summary plot.

par(mar = c(4, 5, 0, 0))
plot(success/(success+failure) ~ x, data = dat, type = "n", ann = F, axes = F)
points(success/(success+failure) ~ x, data = dat, pch = 16)
xs <- seq(min(dat$x, na.rm=TRUE),max(dat$x, na.rm=TRUE), l = 1000)

Xmat <- model.matrix(~xs)
eta<-coefs %*% t(Xmat)
library(boot)
ys <- inv.logit(eta)
library(plyr)
data.tab <- adply(ys,2,function(x) {
  data.frame(Median=median(x), HPDinterval(as.mcmc(x)))
})
data.tab <- cbind(x=xs,data.tab)

points(Median ~ x, data=data.tab,col = "black", type = "l")
with(data.tab,polygon(c(x,rev(x)),c(lower,rev(upper)), col="#0000ff60", border=NA))
#lines(lower ~ x, data=data.tab,col = "black", type = "l", lty = 2)
#lines(upper ~ x, data=data.tab,col = "black", type = "l", lty = 2)
axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Probability of success", 2, cex = 1.5, line = 3)
box(bty = "l")
plot of chunk tut11.4aS2.10
Or via ggplot2
xs <- seq(min(dat$x, na.rm=TRUE),max(dat$x, na.rm=TRUE), l = 1000)
Xmat <- model.matrix(~xs)
eta<-coefs %*% t(Xmat)
library(boot)
ys <- inv.logit(eta)
library(plyr)
data.tab <- adply(ys,2,function(x) {
  data.frame(Median=median(x), HPDinterval(as.mcmc(x)))
})
data.tab <- cbind(x=xs,data.tab)

library(ggplot2)
library(grid)
dat$p <- with(dat, success/(success+failure))
p1 <- ggplot(data.tab,aes(y=Median, x=x)) + geom_point(data=dat,aes(y=p, x=x),color="gray40")+
             geom_smooth(aes(ymin=lower, ymax=upper), stat="identity")+
                         scale_x_continuous("X")+scale_y_continuous("Probability of success")
p1+theme(panel.grid.major=element_blank(),
         panel.grid.minor=element_blank(),
         panel.border=element_blank(),
         panel.background=element_blank(),
                 axis.title.y=element_text(size=15,vjust=0,angle=90),
                 axis.title.x=element_text(size=15,vjust=-1),
                 axis.text.y=element_text(size=12),
                 axis.text.x=element_text(size=12),
                 axis.line=element_line(),
                 plot.margin=unit(c(0.5,0.5,2,2), "lines"))
plot of chunk tut11.4aS2.11



Worked Examples

Basic χ2 references
  • Logan (2010) - Chpt 16-17
  • Quinn & Keough (2002) - Chpt 13-14

Logistic regression

Polis et al. (1998) were interested in modelling the presence/absence of lizards (Uta sp.) against the perimeter to area ratio of 19 islands in the Gulf of California.

Download Polis data set
Format of polis.csv data file
ISLAND       RATIO   PA
Bota         15.41   1
Cabeza        5.63   1
Cerraja      25.92   1
Coronadito   15.17   0
...

ISLAND   Categorical listing of the name of the 19 islands used - variable not used in analysis.
RATIO    Ratio of perimeter to area of the island.
PA       Presence (1) or absence (0) of Uta lizards on island.


Open the polis data file (HINT).
Show code
polis <- read.table('../downloads/data/polis.csv', header=T, sep=',', strip.white=T)
head(polis)
      ISLAND RATIO PA
1       Bota 15.41  1
2     Cabeza  5.63  1
3    Cerraja 25.92  1
4 Coronadito 15.17  0
5     Flecha 13.04  1
6   Gemelose 18.85  0
  1. Fit the appropriate logistic regression model
    Show JAGS code
    modelString="
    model
    {
      for (i in 1:n) {
    	pa[i] ~ dbin(p[i],1)
    	logit(p[i]) <- alpha+beta*ratio[i]
      }
    	alpha~dnorm(0,1.0E-6)
    	beta ~ dnorm(0,1.0E-6)
    } 
    "
    ## write the model to a text file (I suggest you alter the path to somewhere more relevant 
    ## to your system!)  
    writeLines(modelString,con="../downloads/BUGSscripts/ws10.5bQ3.1.txt")
    
    polis.list <- with(polis,
                             list(pa=PA,
                             ratio=RATIO,
                             n=nrow(polis))
    )
    
    
    params <- c("alpha","beta")
    
    adaptSteps = 1000
    burnInSteps = 2000
    nChains = 3
    numSavedSteps = 50000
    thinSteps = 10
    nIter = ceiling((numSavedSteps * thinSteps)/nChains)
    
    polis.r2jags <- jags(data=polis.list,
              inits=NULL,#list(inits,inits,inits), #or inits=list(inits,inits,inits) # since there are three chains
              parameters.to.save=params,
              model.file="../downloads/BUGSscripts/ws10.5bQ3.1.txt",
              n.chains=3,
              n.iter=nIter,
              n.burnin=burnInSteps,
          n.thin=10
              )
    
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 101
    
    Initializing model
    
    print(polis.r2jags)
    
    Inference for Bugs model at "../downloads/BUGSscripts/ws10.5bQ3.1.txt", fit using jags,
     3 chains, each with 166667 iterations (first 2000 discarded), n.thin = 10
     n.sims = 49401 iterations saved
             mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
    alpha      4.616   2.034  1.411  3.166  4.345  5.782  9.336 1.001 19000
    beta      -0.283   0.122 -0.565 -0.352 -0.267 -0.196 -0.092 1.001 11000
    deviance  16.422   2.167 14.279 14.861 15.768 17.274 22.378 1.001 13000
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 2.3 and DIC = 18.8
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    Show BRM code
    library(brms)
    
    polis.brm <- brm(PA ~ RATIO, data = polis, family = binomial, prior = c(set_prior("normal(0,100)", class = "b")),
        chains = 3, iter = 2000, warmup = 500, thin = 2)
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 1, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 1, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 1, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 1, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 1, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 1, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 1, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.033453 seconds (Warm-up)
    #                0.080412 seconds (Sampling)
    #                0.113865 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 2, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 2, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 2, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 2, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 2, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 2, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 2, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.034379 seconds (Warm-up)
    #                0.104578 seconds (Sampling)
    #                0.138957 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 3, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 3, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 3, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 3, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 3, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 3, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 3, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.031965 seconds (Warm-up)
    #                0.082741 seconds (Sampling)
    #                0.114706 seconds (Total)
    # 
    
    We could alternatively center the perimeter to area ratio predictor first. This will help with model convergence and autocorrelation.
    Show the JAGS code
    modelString="
    data {
      cratio <- ratio-mean(ratio)
    }
    model
    {
      for (i in 1:n) {
    	pa[i] ~ dbern(p[i])
    	logit(p[i]) <- alpha+beta*cratio[i]
      }
    	alpha~dnorm(0.001,1.0E-6)
    	beta ~ dnorm(0.001,1.0E-6)
    
    	#Put the intercept back in the original scale
    	intercept <- alpha - beta*mean(ratio)
    
        #odds ratio
        odds_ratio <- exp(beta)
    
    	#LD50
    	LD50<- (-1*intercept)/beta 
    } 
    "
    ## write the model to a text file (I suggest you alter the path to somewhere more relevant 
    ## to your system!)  
    writeLines(modelString,con="../downloads/BUGSscripts/ws10.5bQ3.1b.txt")
    
    polis.list <- with(polis,
                             list(pa=PA,
                             ratio=RATIO,
                             n=nrow(polis))
    )
    
    
    params <- c("alpha","beta", "intercept","odds_ratio","LD50")
    
    adaptSteps = 1000
    burnInSteps = 2000
    nChains = 3
    numSavedSteps = 50000
    thinSteps = 10
    nIter = ceiling((numSavedSteps * thinSteps)/nChains)
    
    polis.r2jags3 <- jags(data=polis.list,
              inits=NULL,#list(inits,inits,inits), #or inits=list(inits,inits,inits) # since there are three chains
              parameters.to.save=params,
              model.file="../downloads/BUGSscripts/ws10.5bQ3.1b.txt",
              n.chains=3,
              n.iter=nIter,
              n.burnin=burnInSteps,
          n.thin=10
              )
    
    Compiling data graph
       Resolving undeclared variables
       Allocating nodes
       Initializing
       Reading data back into data table
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
       Graph Size: 128
    
    Initializing model
    
    print(polis.r2jags3)
    
    Inference for Bugs model at "../downloads/BUGSscripts/ws10.5bQ3.1b.txt", fit using jags,
     3 chains, each with 166667 iterations (first 2000 discarded), n.thin = 10
     n.sims = 49401 iterations saved
               mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
    LD50        16.489   3.371 10.655 14.547 16.333 18.251 23.217 1.001 34000
    alpha       -0.688   0.833 -2.470 -1.202 -0.638 -0.119  0.817 1.001 49000
    beta        -0.284   0.122 -0.567 -0.353 -0.268 -0.197 -0.093 1.001  4200
    intercept    4.628   2.043  1.407  3.173  4.356  5.795  9.382 1.001  4200
    odds_ratio   0.758   0.088  0.568  0.703  0.765  0.821  0.912 1.001  4200
    deviance    16.418   2.165 14.277 14.860 15.754 17.283 22.262 1.001  5900
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 2.3 and DIC = 18.8
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    Show BRM code
    library(brms)
    
    center <- function(x) {
        m = mean(x, na.rm = TRUE)
        x = x - m
        attr(x, "mean") <- m
        x
    }
    polis$cRATIO <- center(polis$RATIO)
    polis.brm3 <- brm(PA ~ cRATIO, data = polis, family = binomial, prior = c(set_prior("normal(0,100)",
        class = "b")), chains = 3, iter = 2000, warmup = 500, thin = 2)
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 1, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 1, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 1, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 1, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 1, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 1, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 1, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.050354 seconds (Warm-up)
    #                0.111418 seconds (Sampling)
    #                0.161772 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 2, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 2, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 2, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 2, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 2, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 2, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 2, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.048 seconds (Warm-up)
    #                0.107309 seconds (Sampling)
    #                0.155309 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 3, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 3, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 3, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 3, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 3, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 3, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 3, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.046641 seconds (Warm-up)
    #                0.102475 seconds (Sampling)
    #                0.149116 seconds (Total)
    # 
    
  2. Let's now explore the chain mixing characteristics
    • Trace plots
      Show JAGS code
      plot(as.mcmc(polis.r2jags))
      
      plot of chunk ws10.5bQ3.1ChainMixingJAGS
      Show BRM code
      library(gridExtra)
      grid.arrange(stan_trace(polis.brm$fit, ncol=1),
                   stan_dens(polis.brm$fit, separate_chains=TRUE,ncol=1),
                   ncol=2)
      
      plot of chunk ws10.5bQ3.1ChainMixingBRMS
    • Autocorrelation
      Show JAGS code
      autocorr.diag(as.mcmc(polis.r2jags))
      
                     alpha        beta     deviance
      Lag 0   1.0000000000 1.000000000 1.0000000000
      Lag 10  0.3486302850 0.411327730 0.2095816015
      Lag 50  0.0077084413 0.015038682 0.0021926870
      Lag 100 0.0089742845 0.009909846 0.0009875869
      Lag 500 0.0009197652 0.006464200 0.0015197919
      
      ## and the centered version
      autocorr.diag(as.mcmc(polis.r2jags3))
      
                     alpha         beta    deviance    intercept         LD50    odds_ratio
      Lag 0   1.0000000000  1.000000000 1.000000000  1.000000000  1.000000000  1.000000e+00
      Lag 10  0.1064366797  0.419052223 0.223137076  0.355734236  0.009911351  3.947151e-01
      Lag 50  0.0036116763  0.016284778 0.015703330  0.011656509 -0.001278621  1.288787e-02
      Lag 100 0.0041638195  0.001087960 0.005615787  0.001867140  0.002471861  6.922861e-05
      Lag 500 0.0007728067 -0.002840246 0.002862181 -0.002974905  0.009877353 -4.708218e-03
      
      Show BRM code
      stan_ac(polis.brm$fit)
      
      plot of chunk ws10.5bQ3.1AutocorrelationBRMS
      ## and the centered version
      stan_ac(polis.brm3$fit)
      
      plot of chunk ws10.5bQ3.1AutocorrelationBRMS
      Conclusions: there is evidence of autocorrelation (particularly in the JAGS model). Note that the samples from the centered models are substantially less correlated. Nevertheless, the JAGS model requires more thinning.
      Show the JAGS code
      modelString="
      data {
        cratio <- ratio-mean(ratio)
      }
      model
      {
        for (i in 1:n) {
      	pa[i] ~ dbern(p[i])
      	logit(p[i]) <- alpha+beta*cratio[i]
        }
      	alpha~dnorm(0.001,1.0E-6)
      	beta ~ dnorm(0.001,1.0E-6)
      
      	#Put the intercept back in the original scale
      	intercept <- alpha - beta*mean(ratio)
      
          #odds ratio
          odds_ratio <- exp(beta)
      
      	#LD50
      	LD50<- (-1*intercept)/beta 
      } 
      "
      
      
      polis.list <- with(polis,
                               list(pa=PA,
                               ratio=RATIO,
                               n=nrow(polis))
      )
      
      
      params <- c("alpha","beta", "intercept","odds_ratio","LD50")
      
      adaptSteps = 1000
      burnInSteps = 2000
      nChains = 3
      numSavedSteps = 50000
      thinSteps = 10
      nIter = ceiling((numSavedSteps * thinSteps)/nChains)
      
      polis.r2jags3 <- jags(data=polis.list,
                inits=NULL,#list(inits,inits,inits), #or inits=list(inits,inits,inits) # since there are three chains
                parameters.to.save=params,
                model.file=textConnection(modelString),
                n.chains=3,
                n.iter=nIter,
                n.burnin=burnInSteps,
            n.thin=100
                )
      
      Compiling data graph
         Resolving undeclared variables
         Allocating nodes
         Initializing
         Reading data back into data table
      Compiling model graph
         Resolving undeclared variables
         Allocating nodes
         Graph Size: 128
      
      Initializing model
      
      autocorr.diag(as.mcmc(polis.r2jags3))
      
                      alpha        beta     deviance   intercept        LD50  odds_ratio
      Lag 0     1.000000000 1.000000000  1.000000000 1.000000000  1.00000000 1.000000000
      Lag 100   0.006012842 0.003251699 -0.001141558 0.009113843  0.01058293 0.006021338
      Lag 500  -0.019434918 0.015808090  0.009210730 0.004478822 -0.04252710 0.014713028
      Lag 1000  0.021803394 0.015814476 -0.020245532 0.008612682  0.01246592 0.019784439
      Lag 5000  0.007324124 0.003770143 -0.007788371 0.014764148  0.01070551 0.005464544
      
      Conclusions: that is better.
    • Step size characteristics (STAN only)
      Show BRM code
      summary(do.call(rbind, args = get_sampler_params(polis.brm3$fit, inc_warmup = FALSE)), digits = 2)
      
       accept_stat__       stepsize__    treedepth__   n_leapfrog__ n_divergent__
       Min.   :6.1e-05   Min.   :0.55   Min.   :1.0   Min.   :1     Min.   :0    
       1st Qu.:8.9e-01   1st Qu.:0.55   1st Qu.:2.0   1st Qu.:3     1st Qu.:0    
       Median :9.6e-01   Median :0.57   Median :2.0   Median :3     Median :0    
       Mean   :9.0e-01   Mean   :0.61   Mean   :2.2   Mean   :4     Mean   :0    
       3rd Qu.:9.9e-01   3rd Qu.:0.70   3rd Qu.:3.0   3rd Qu.:7     3rd Qu.:0    
       Max.   :1.0e+00   Max.   :0.70   Max.   :3.0   Max.   :7     Max.   :0    
      
      stan_diag(polis.brm3$fit)
      
      plot of chunk ws10.5bQ3.1StepBRMS
      stan_diag(polis.brm3$fit, information = "stepsize")
      
      plot of chunk ws10.5bQ3.1StepBRMS
      stan_diag(polis.brm3$fit, information = "treedepth")
      
      plot of chunk ws10.5bQ3.1StepBRMS
      stan_diag(polis.brm3$fit, information = "divergence")
      
      plot of chunk ws10.5bQ3.1StepBRMS
      library(gridExtra)
      grid.arrange(stan_rhat(polis.brm3$fit) + theme_classic(8),
                   stan_ess(polis.brm3$fit) + theme_classic(8),
                   stan_mcse(polis.brm3$fit) + theme_classic(8),
                   ncol = 2)
      
      plot of chunk ws10.5bQ3.1StepBRMS
  3. Now we should check the model fit diagnostics
      Check the model for a lack of fit via:
      Show JAGS code
      inv.logit <- binomial()$linkinv
      #Calculate residuals
      coefs <- polis.r2jags3$BUGSoutput$sims.matrix[,c('alpha','beta')]
      Xmat <- model.matrix(~cRATIO, data=polis)
      eta<-coefs %*% t(Xmat)
      pi <- inv.logit(eta)
      #subtract observed responses from expected values (columns), then standardize (Pearson residuals)
      Resid <- -1*sweep(pi,2,polis$PA,'-')/sqrt(pi*(1-pi))
      plot(apply(Resid,2,mean)~apply(eta,2,mean))
      lines(lowess(apply(Resid,2,mean)~apply(eta,2,mean)))
      
      plot of chunk ws10.5bQ3.1GOFPearsonJAGS
      SSres<-apply(Resid^2,1,sum)
      set.seed(2)
      #generate a matrix of draws from a binomial distribution
      # the matrix is the same dimensions as pi and uses the probabilities of pi
      YNew <- matrix(rbinom(length(pi),prob=pi,size=1),nrow=nrow(pi))
      
      Resid1<-(pi - YNew)/sqrt(pi*(1-pi))
      SSres.sim<-apply(Resid1^2,1,sum)
      mean(SSres.sim>SSres)
      
      [1] 0.2493422
      
      Show BRM code
      inv.logit <- binomial()$linkinv
      #Calculate residuals 
      Resid <- residuals(polis.brm3, type='pearson')
      Fitted <- fitted(polis.brm3, scale='linear')
      ggplot(data=NULL, aes(y=Resid[,'Estimate'], x=Fitted[,'Estimate'])) + geom_point()
      
      plot of chunk ws10.5bQ3.1GOFPearsonBRMS
      SSres <- sum(Resid[,'Estimate']^2)
      pi <- fitted(polis.brm3, scale='response', summary=FALSE) #posterior draws of fitted probabilities (one row per MCMC sample)
      YNew <- matrix(rbinom(length(pi),prob=pi,size=1),nrow=nrow(pi))
      
      Resid1<-(pi - YNew)/sqrt(pi*(1-pi))
      SSres.sim<-apply(Resid1^2,1,sum)
      mean(SSres.sim>SSres)
      
      [1] 0
      
    • Check overdispersion
      Show JAGS code
      #extract the samples for the two model parameters
      coefs <- polis.r2jags3$BUGSoutput$sims.matrix[,c('alpha','beta')]
      Xmat <- model.matrix(~cRATIO, data=polis)
      #expected values on a logit scale
      eta <- coefs %*% t(Xmat)
      #expected value on response scale
      pi <- inv.logit(eta)
      
      Resid <- -1*sweep(pi,2,polis$PA, '-')/sqrt(pi*(1-pi))
      RSS <- apply(Resid^2, 1, sum)
      Disp <- RSS/(nrow(polis)-ncol(coefs))
      data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)), HPDinterval(as.mcmc(Disp),p=0.5))
      
             Median     Mean     lower    upper   lower.1  upper.1
      var1 49.37055 37914.67 0.2159034 9435.719 0.2159034 49.37055
      
      Show BRM code
      Resid <- residuals(polis.brm, type='pearson',summary=FALSE)
      RSS <- apply(Resid^2,1,sum) #sum(Resid[,'Estimate']^2)
      Disp <- RSS/(nrow(polis)-ncol(coefs))
      data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
      
             Median     Mean     lower    upper
      var1 1.021486 1.077333 0.9623044 1.318911
      
  4. Explore the parameter estimates
    Show JAGS code
    print(polis.r2jags3)
    
    Inference for Bugs model at "5", fit using jags,
     3 chains, each with 166667 iterations (first 2000 discarded), n.thin = 100
     n.sims = 4941 iterations saved
               mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
    LD50        16.481   3.314 10.612 14.556 16.316 18.239 23.413 1.002  2800
    alpha       -0.705   0.855 -2.568 -1.227 -0.645 -0.119  0.801 1.002  1600
    beta        -0.288   0.125 -0.588 -0.356 -0.271 -0.196 -0.099 1.002  1400
    intercept    4.684   2.090  1.540  3.157  4.410  5.866  9.737 1.002  2400
    odds_ratio   0.756   0.090  0.555  0.700  0.762  0.822  0.906 1.002  1400
    deviance    16.491   2.243 14.274 14.894 15.803 17.385 22.644 1.002  1900
    
    For each parameter, n.eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
    
    DIC info (using the rule, pD = var(deviance)/2)
    pD = 2.5 and DIC = 19.0
    DIC is an estimate of expected predictive error (lower deviance is better).
    
    coefs <- polis.r2jags3$BUGSoutput$sims.matrix[,c('alpha','beta')]
    plyr:::adply(exp(coefs), 2, function(x) {
      data.frame(Mean=mean(x), median=median(x), HPDinterval(as.mcmc(x)))
    }
    )
    
         X1      Mean    median       lower     upper
    1 alpha 0.6882555 0.5246897 0.007931256 1.8319500
    2  beta 0.7558136 0.7623621 0.577557742 0.9232893
    
    Show BRM code
    print(polis.brm3)
    
     Family: binomial (logit) 
    Formula: PA ~ cRATIO 
       Data: polis (Number of observations: 19) 
    Samples: 3 chains, each with iter = 2000; warmup = 500; thin = 2; 
             total post-warmup samples = 2250
       WAIC: Not computed
     
    Fixed Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept    -0.68      0.86    -2.49     0.85       1157    1
    cRATIO       -0.28      0.12    -0.55    -0.09       1130    1
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    coefs <- as.matrix(as.data.frame(rstan:::extract(polis.brm3$fit)))
    coefs <- coefs[,grep('b', colnames(coefs))]
    plyr:::adply(exp(coefs), 2, function(x) {
      data.frame(Mean=mean(x), median=median(x), HPDinterval(as.mcmc(x)))
      })
    
               X1      Mean    median       lower     upper
    1 b_Intercept 0.7094591 0.5522342 0.007441255 1.8576976
    2    b_cRATIO 0.7618033 0.7665764 0.598009286 0.9276557
    
    marginal_effects(polis.brm3)
    
    plot of chunk ws10.5bQ3.1OEstimatesBRM
    Conclusion: Note the 'slope' parameter is on a log odds ratio scale. Typically we would prefer to interpret the model parameters on a natural scale (hence the exponentiated posteriors).
    • At the average perimeter to area ratio, the ratio of odds of the lizard being present to the odds of being absent is 0.71
    • For each 1 unit increase in perimeter to area ratio, the ratio of the odds of being present to the odds of being absent changes by a factor of 0.76. Hence, for each 1 unit increase in perimeter to area ratio, the odds of the lizard being present decline by nearly 25% (see the sketch below).
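
    The 'nearly 25%' figure is simply the posterior odds-ratio re-expressed as a percentage change (a sketch, assuming the exponentiated coefs matrix from the BRM block above):

    or <- exp(coefs[, 'b_cRATIO']) #posterior odds-ratios for a 1 unit increase
    100 * (median(or) - 1)         #roughly -23: the odds decline by nearly a quarter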
  5. Construct a scatterplot of the presence/absence of Uta lizards against perimeter to area ratio for the 19 islands
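    The strategy is the same for both fits: construct a prediction grid, evaluate the linear predictor for every MCMC draw (coefs %*% t(Xmat)), back-transform through the inverse logit, and then summarise the draws at each grid point into a median and HPD interval.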
    Show JAGS code
    newdata <- with(polis, data.frame(cRATIO=seq(min(cRATIO), max(cRATIO), len=1000)))
    newdata$RATIO <- newdata$cRATIO+attr(polis$cRATIO, 'mean')
    Xmat <- model.matrix(~cRATIO, data=newdata)
    coefs <- polis.r2jags3$BUGSoutput$sims.matrix[,c('alpha','beta')]
    fit <- inv.logit(coefs %*% t(Xmat))
    newdata = cbind(newdata, plyr:::adply(fit, 2, function(x) {
       data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
    }))
    
    ggplot(newdata, aes(y=Median, x=RATIO)) +
      geom_point(data=polis, aes(y=PA)) +
      geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue', alpha=0.2) +
      geom_line() +
      scale_x_continuous('Perimeter to Area ratio')+
      scale_y_continuous(expression(italic(Uta)~lizard~presence/absence))+
      theme_classic()+
      theme(axis.line.y=element_line(),axis.line.x=element_line())
    
    plot of chunk ws10.5bQ3.1OScatterplotJAGS
    Show BRM code
    newdata <- with(polis, data.frame(cRATIO=seq(min(cRATIO), max(cRATIO), len=1000)))
    newdata$RATIO <- newdata$cRATIO+attr(polis$cRATIO, 'mean')
    Xmat <- model.matrix(~cRATIO, data=newdata)
    coefs <- as.matrix(as.data.frame(rstan:::extract(polis.brm3$fit)))
    coefs <- coefs[,grep('b', colnames(coefs))]
    fit <- inv.logit(coefs %*% t(Xmat))
    newdata = cbind(newdata, plyr:::adply(fit, 2, function(x) {
       data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
    }))
    
    ggplot(newdata, aes(y=Median, x=RATIO)) +
      geom_point(data=polis, aes(y=PA)) +
      geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue', alpha=0.2) +
      geom_line() +
      scale_x_continuous('Perimeter to Area ratio')+
      scale_y_continuous(expression(italic(Uta)~lizard~presence/absence))+
      theme_classic()+
      theme(axis.line.y=element_line(),axis.line.x=element_line())
    
    plot of chunk ws10.5bQ3.1OScatterplotBRM
  6. Calculate the LD50 (in this case, the perimeter to area ratio with a predicted probability of 0.5) from the fitted model
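    The LD50 corresponds to the point where the linear predictor equals zero: $0 = \alpha + \beta x$, giving $x = -\alpha/\beta$. Because the model was fitted to the centered predictor (cRATIO), the mean of RATIO must be added back on, as the code below does.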
    Show JAGS code
    coefs <- polis.r2jags3$BUGSoutput$sims.matrix[,c('alpha','beta')]
    
    ld50 <- (-coefs[,1]/coefs[,2]) + attr(polis$cRATIO, 'mean')
    data.frame(Mean=mean(ld50), Median=median(ld50),HPDinterval(as.mcmc(ld50)))
    
            Mean   Median    lower    upper
    var1 16.4807 16.31564 10.20908 22.62921
    
    Show BRM code
    coefs <- as.matrix(as.data.frame(rstan:::extract(polis.brm3$fit)))
    coefs <- coefs[,grep('b', colnames(coefs))]
    
    ld50 <- (-coefs[,1]/coefs[,2]) + attr(polis$cRATIO, 'mean')
    data.frame(Mean=mean(ld50), Median=median(ld50),HPDinterval(as.mcmc(ld50)))
    
             Mean  Median    lower    upper
    var1 16.60847 16.4525 10.29575 23.00646
    
  7. Calculate the R2 value
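    Here R² is calculated per MCMC draw as the proportion of total variance attributable to the model, $R^2 = \frac{var(\hat{y})}{var(\hat{y}) + var(y - \hat{y})}$, where $\hat{y}$ are the fitted probabilities. This variance-based (marginal) formulation extends naturally across posterior draws, yielding a full posterior for R² rather than a single deviance-based value.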
    Show JAGS code
    inv.logit <- binomial()$linkinv
    #Calculate residuals
    coefs <- polis.r2jags3$BUGSoutput$sims.matrix[,c('alpha','beta')]
    Xmat <- model.matrix(~cRATIO, data=polis)
    eta<-coefs %*% t(Xmat)
    pi <- inv.logit(eta)
    #compute residuals (observed - fitted) for each MCMC draw
    Resid <- -1*sweep(pi,2,polis$PA,'-')
    R.var <- apply(Resid,1,var)
    X.var <- apply(pi,1,var)
    R2.marginal <- X.var/(X.var + R.var)
    data.frame(Median=median(R2.marginal), coda:::HPDinterval(as.mcmc(R2.marginal)))
    
            Median     lower     upper
    var1 0.5376924 0.3574792 0.6184803
    
    Show BRM code
    R.var = apply(residuals(polis.brm3, summary=FALSE),1,var)
    X.var = apply(fitted(polis.brm3, summary=FALSE),1,var)
    R2.marginal <- X.var/(X.var + R.var)
    data.frame(Median=median(R2.marginal), coda:::HPDinterval(as.mcmc(R2.marginal)))
    
            Median     lower     upper
    var1 0.5362865 0.3439372 0.6173302
    

Proportional data

Arrington et al. (2002) explored patterns in the frequency with which fishes have empty stomachs. In particular, they were interested in investigating whether the frequency of empty stomachs differed between trophic classifications. Live captured fish were classified and the frequency of individuals in each taxon with empty stomachs was tallied.

Download full Arrington data set
Format of arrington_full.csv data file
...  Trophic      Individuals  Empty  Empty_perc
...  Invertivore           37      4       0.108
...  Invertivore           83     14       0.169
...  Invertivore           46      2       0.043
...  Invertivore           11      0       0.000
...  Invertivore           47      0       0.000
...  Omnivore              17      0       0.000

Trophic      Classification of trophic group (Algiv./Detritiv, Invertivore, Omnivore, Piscivore).
Individuals  Total number of fish individuals captured of the taxa.
Empty        Number of individuals of the taxa with empty stomachs.
Empty_perc   Proportion (percentage) of individuals of the taxa with empty stomachs.

Open the arrington data file (HINT).
Show code
arrington <- read.csv('../downloads/data/arrington_full.csv', strip.white=T)
head(arrington)
              Order     Family                      Species Location Behaviour     Trophic
1 Osteoglossiformes Mormyridae Hippopotamyrus discorhynchus   Africa Nocturnal Invertivore
2 Osteoglossiformes Mormyridae   Marcusenius macrolepidotus   Africa Nocturnal Invertivore
3 Osteoglossiformes Mormyridae             Mormyrus lacerda   Africa Nocturnal Invertivore
4 Osteoglossiformes Mormyridae      Petrocephalus catostoma   Africa Nocturnal Invertivore
5 Osteoglossiformes Mormyridae        Pollimyrus castelnaui   Africa Nocturnal Invertivore
6     Cypriniformes Cyprinidae             Barbus annectens   Africa   Diurnal    Omnivore
  Individuals Empty Empty_perc
1          37     4 0.10810811
2          83    14 0.16867470
3          46     2 0.04347826
4          11     0 0.00000000
5          47     0 0.00000000
6          17     0 0.00000000
  1. We might be tempted to fit a simple linear (anova) model to relate the frequency (percentage) of empty stomachs to trophic group. Let's then pursue the regular exploratory data analyses.
    Show code
    library(ggplot2)
    ggplot(arrington, aes(y=Empty_perc, x=Trophic)) + geom_boxplot()
    
    plot of chunk ws10.5bQ4.1

    Clearly, these data are not normal and there is some suggestion of a relationship between mean and variance. Furthermore, as there are numerous percentages of zero or close to zero, it is possible that predictions will yield expectations or confidence intervals less than zero. This is clearly illogical - it is not possible to have a frequency (percentage) less than zero.

    These data present an opportunity to explore a range of analysis options (the two transformations are sketched after the list below).

    • fit a linear model on arcsin transformed percentage of empty stomachs
    • fit a linear model on logit transformed percentage of empty stomachs
    • fit a generalized linear model (binomial distribution) on frequency (count) of empty stomachs
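    For reference, the first two options rely on these transformations (a minimal sketch; qlogis() is the base-R logit, and a small offset would be needed for proportions of exactly 0 or 1):

    p <- c(0.043, 0.108, 0.169)   # example proportions from the data above
    asin(sqrt(p))                 # arcsin (angular) transform
    qlogis(p)                     # logit transform: log(p/(1-p))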

  2. Fit a linear model on arcsin transformed percentage of empty stomachs
  3. Fit a linear model on arcsin transformed percentage of empty stomachs
    Show BRM code
    #arrington$aEmpty_perc <- asin(sqrt(arrington$Empty_perc))
    
    arrington.brm1 <- brm(asin(sqrt(Empty_perc)) ~ Trophic, data=arrington,
          prior=c(set_prior('normal(0,1000)', class='b'),
                  set_prior('cauchy(0,4)', class='sigma')),
          chains=3, iter=2000, warmup=500, thin=2)
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 1, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 1, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 1, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 1, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 1, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 1, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 1, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.202954 seconds (Warm-up)
    #                0.473325 seconds (Sampling)
    #                0.676279 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 2, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 2, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 2, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 2, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 2, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 2, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 2, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.165517 seconds (Warm-up)
    #                0.399017 seconds (Sampling)
    #                0.564534 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 3, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 3, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 3, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 3, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 3, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 3, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 3, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.165979 seconds (Warm-up)
    #                0.455872 seconds (Sampling)
    #                0.621851 seconds (Total)
    # 
    
  4. Check the chain mixing characteristics
    Show BRM code for traceplots and autocorrelation
    library(gridExtra)
    grid.arrange(stan_trace(arrington.brm1$fit, ncol=1),
                 stan_dens(arrington.brm1$fit, separate_chains=TRUE,ncol=1),
                 ncol=2)
    
    plot of chunk ws10.5bQ4.2bBRMS
    stan_ac(arrington.brm1$fit)
    
    plot of chunk ws10.5bQ4.2bBRMS
    Show BRM code for step characteristics
    summary(do.call(rbind, args = get_sampler_params(arrington.brm1$fit, inc_warmup = FALSE)), digits = 2)
    
     accept_stat__    stepsize__    treedepth__   n_leapfrog__  n_divergent__
     Min.   :0.36   Min.   :0.46   Min.   :1.0   Min.   : 1.0   Min.   :0    
     1st Qu.:0.87   1st Qu.:0.46   1st Qu.:2.0   1st Qu.: 3.0   1st Qu.:0    
     Median :0.95   Median :0.46   Median :3.0   Median : 7.0   Median :0    
     Mean   :0.91   Mean   :0.48   Mean   :2.7   Mean   : 6.3   Mean   :0    
     3rd Qu.:0.99   3rd Qu.:0.52   3rd Qu.:3.0   3rd Qu.: 7.0   3rd Qu.:0    
     Max.   :1.00   Max.   :0.52   Max.   :4.0   Max.   :15.0   Max.   :0    
    
    stan_diag(arrington.brm1$fit)
    
    plot of chunk ws10.5bQ4.2cBRMS
    stan_diag(arrington.brm1$fit, information = "stepsize")
    
    plot of chunk ws10.5bQ4.2cBRMS
    library(gridExtra)
    grid.arrange(stan_rhat(arrington.brm1$fit) + theme_classic(8),
                 stan_ess(arrington.brm1$fit) + theme_classic(8),
                 stan_mcse(arrington.brm1$fit) + theme_classic(8),
                 ncol = 2)
    
    plot of chunk ws10.5bQ4.2cBRMS
  5. Check model fit diagnostics
    Show BRM code
    inv.logit <- binomial()$linkinv
    #Calculate residuals   
    Resid <- residuals(arrington.brm1, type='pearson')
    Fitted <- fitted(arrington.brm1, scale='linear')
    ggplot(data=NULL, aes(y=Resid[,'Estimate'], x=Fitted[,'Estimate'])) + geom_point()
    
    plot of chunk ws10.5cQ4.2dBRMS
    SSres <- sum(Resid[,'Estimate']^2)
    pi = fitted(arrington.brm1, scale='response')
    YNew <- matrix(rbinom(length(pi),prob=pi,size=1),nrow=nrow(pi))
    
    Resid1<-(pi - YNew)/sqrt(pi*(1-pi))
    SSres.sim<-apply(Resid1^2,1,sum)
    mean(SSres.sim>SSres)
    
    [1] 0
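    (This figure is a Bayesian p-value: the proportion of simulated discrepancies that exceed the observed discrepancy, so values near 0 or 1 would ordinarily flag a lack of fit. Note, however, that this check simulates Bernoulli data from the arcsine-scale fitted values, so it is arguably not an appropriate diagnostic for a Gaussian model and should not be over-interpreted.)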
    
    ## overdispersion
    Resid <- residuals(arrington.brm1, type='pearson',summary=FALSE)
    RSS <- apply(Resid^2,1,sum) #sum(Resid[,'Estimate']^2)
    # residual df: n minus the number of fixed effects in *this* model
    # (derived from the design matrix rather than the stale coefs object)
    Disp <- RSS/(nrow(arrington)-ncol(model.matrix(~Trophic, data=arrington)))
    data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
    
            Median      Mean     lower    upper
    var1 0.9832764 0.9861594 0.9714418 1.007431
    
  6. There are no major concerns with these diagnostics, so we can now explore the model parameters.
    Show BRM code
    print(arrington.brm1)
    
     Family: gaussian (identity) 
    Formula: asin(sqrt(Empty_perc)) ~ Trophic 
       Data: arrington (Number of observations: 254) 
    Samples: 3 chains, each with iter = 2000; warmup = 500; thin = 2; 
             total post-warmup samples = 2250
       WAIC: Not computed
     
    Fixed Effects: 
                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept              0.22      0.04     0.13     0.31       1208    1
    TrophicInvertivore     0.09      0.05    -0.01     0.19       1222    1
    TrophicOmnivore        0.00      0.05    -0.11     0.10       1297    1
    TrophicPiscivore       0.36      0.05     0.25     0.46       1277    1
    
    Family Specific Parameters: 
                      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sigma(Empty_perc)     0.23      0.01     0.21     0.25       1732    1
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    WAIC(arrington.brm1)
    
       WAIC    SE
     -23.65 20.66
    
    marginal_effects(arrington.brm1)
    
    plot of chunk ws10.5eQ4.2dBRMS
  7. Fit a linear model on logit transformed percentage of empty stomachs
  8. Fit a linear model on logit transformed percentage of empty stomachs
    Show BRM code
    logit <- binomial()$linkfun
    logit.inv <- binomial()$linkinv
    # add the smallest non-zero proportion so that zeros do not map to logit(0) = -Inf
    arrington$Perc <- arrington$Empty_perc + min(arrington$Empty_perc[arrington$Empty_perc>0])
    arrington.brm2 <- brm(logit(Perc) ~ Trophic, data=arrington,
          prior=c(set_prior('normal(0,100)', class='b'),
                  set_prior('cauchy(0,2)', class='sigma')),
          chains=3, iter=2000, warmup=500, thin=2)
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 1, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 1, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 1, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 1, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 1, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 1, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 1, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.185868 seconds (Warm-up)
    #                0.47553 seconds (Sampling)
    #                0.661398 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 2, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 2, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 2, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 2, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 2, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 2, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 2, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.202127 seconds (Warm-up)
    #                0.438679 seconds (Sampling)
    #                0.640806 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'gaussian(identity) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 3, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 3, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 3, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 3, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 3, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 3, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 3, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.189506 seconds (Warm-up)
    #                0.463606 seconds (Sampling)
    #                0.653112 seconds (Total)
    # 
    
  9. Check the chain mixing characteristics
    Show BRM code for traceplots and autocorrelation
    library(gridExtra)
    grid.arrange(stan_trace(arrington.brm2$fit, ncol=1),
                 stan_dens(arrington.brm2$fit, separate_chains=TRUE,ncol=1),
                 ncol=2)
    
    plot of chunk ws10.5bQ4.3bBRMS
    stan_ac(arrington.brm2$fit)
    
    plot of chunk ws10.5bQ4.3bBRMS
    Show BRM code for step characteristics
    summary(do.call(rbind, args = get_sampler_params(arrington.brm2$fit, inc_warmup = FALSE)), digits = 2)
    
     accept_stat__    stepsize__    treedepth__   n_leapfrog__  n_divergent__
     Min.   :0.37   Min.   :0.43   Min.   :1.0   Min.   : 1.0   Min.   :0    
     1st Qu.:0.87   1st Qu.:0.43   1st Qu.:2.0   1st Qu.: 3.0   1st Qu.:0    
     Median :0.95   Median :0.43   Median :3.0   Median : 7.0   Median :0    
     Mean   :0.91   Mean   :0.45   Mean   :2.8   Mean   : 6.5   Mean   :0    
     3rd Qu.:0.99   3rd Qu.:0.50   3rd Qu.:3.0   3rd Qu.: 7.0   3rd Qu.:0    
     Max.   :1.00   Max.   :0.50   Max.   :4.0   Max.   :15.0   Max.   :0    
    
    stan_diag(arrington.brm2$fit)
    
    plot of chunk ws10.5bQ4.3cBRMS
    stan_diag(arrington.brm2$fit, information = "stepsize")
    
    plot of chunk ws10.5bQ4.3cBRMS
    library(gridExtra)
    grid.arrange(stan_rhat(arrington.brm2$fit) + theme_classic(8),
                 stan_ess(arrington.brm2$fit) + theme_classic(8),
                 stan_mcse(arrington.brm2$fit) + theme_classic(8),
                 ncol = 2)
    
    plot of chunk ws10.5bQ4.3cBRMS
  10. Check model fit diagnostics
    Show BRM code
    inv.logit <- binomial()$linkinv
    #Calculate residuals   
    Resid <- residuals(arrington.brm2, type='pearson')
    Fitted <- fitted(arrington.brm2, scale='linear')
    ggplot(data=NULL, aes(y=Resid[,'Estimate'], x=Fitted[,'Estimate'])) + geom_point()
    
    plot of chunk ws10.5cQ4.3dBRMS
    SSres <- sum(Resid[,'Estimate']^2)
    pi = fitted(arrington.brm2, scale='response')
    YNew <- matrix(rbinom(length(pi),prob=pi,size=1),nrow=nrow(pi))
    
    Resid1<-(pi - YNew)/sqrt(pi*(1-pi))
    SSres.sim<-apply(Resid1^2,1,sum)
    mean(SSres.sim>SSres)
    
    [1] NA
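    (The NA is unsurprising: for this Gaussian model the 'response'-scale fitted values are on the logit scale and include negative values, which rbinom() cannot treat as probabilities. As above, this check is really only meaningful for the binomial models fitted below.)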
    
    ## overdispersion
    Resid <- residuals(arrington.brm2, type='pearson',summary=FALSE)
    RSS <- apply(Resid^2,1,sum) #sum(Resid[,'Estimate']^2)
    # residual df: n minus the number of fixed effects in *this* model
    Disp <- RSS/(nrow(arrington)-ncol(model.matrix(~Trophic, data=arrington)))
    data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
    
            Median      Mean     lower    upper
    var1 0.9881678 0.9906284 0.9756028 1.011514
    
  11. There are no major concerns with these diagnostics, so we can now explore the model parameters.
    Show BRM code
    print(arrington.brm2)
    
     Family: gaussian (identity) 
    Formula: logit(Perc) ~ Trophic 
       Data: arrington (Number of observations: 254) 
    Samples: 3 chains, each with iter = 2000; warmup = 500; thin = 2; 
             total post-warmup samples = 2250
       WAIC: Not computed
     
    Fixed Effects: 
                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept             -3.25      0.31    -3.86    -2.64       1183    1
    TrophicInvertivore     0.62      0.35    -0.04     1.30       1213    1
    TrophicOmnivore       -0.17      0.38    -0.94     0.58       1341    1
    TrophicPiscivore       2.18      0.38     1.46     2.94       1397    1
    
    Family Specific Parameters: 
                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sigma(Perc)     1.65      0.07     1.51     1.81       1704    1
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    WAIC(arrington.brm2)
    
       WAIC    SE
     979.13 20.75
    
    marginal_effects(arrington.brm2)
    
    Error in logit(Perc): REAL() can only be applied to a 'numeric', not a 'logical'
    
  12. Fit a generalized linear model (binomial distribution) on frequency (count) of empty stomachs
  13. Rather than transform the raw data in order to satisfy the requirements of linear regression, it is arguably better to fit a model that is more appropriate for the response variable. In this case, since the response is the frequency (proportion) of empty stomachs from a set of fish, we could argue that this is a good match for a binomial distribution. Proportions are bound to the range [0,1] and we might expect a certain degree of relationship between location and scale (mean and variance).

    The glm() function allows us to work with either proportions (with weights) or frequencies (the actual counts), and the two can be considered identical alternatives. In brms, the counts form is expressed by supplying the number of trials (Empty | trials(Individuals)), which is what we fit below.
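    For comparison, a minimal frequentist sketch of those two equivalent glm() specifications (illustrative only; it is not used elsewhere in this tutorial):

    # frequency (count) form: successes and failures
    arrington.glm <- glm(cbind(Empty, Individuals - Empty) ~ Trophic,
                         data=arrington, family=binomial)
    # proportion form: proportions weighted by the number of trials
    arrington.glmW <- glm(Empty_perc ~ Trophic, data=arrington,
                          family=binomial, weights=Individuals)
    # the two fits yield identical coefficients
    cbind(coef(arrington.glm), coef(arrington.glmW))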

    Show code
    arrington.brm3 <- brm(Empty|trials(Individuals) ~ Trophic, data=arrington, family=binomial,
           prior=c(set_prior('normal(0,100)', class='b')),
           chains=3, iter=2000, warmup=500, thin=2)
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 1, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 1, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 1, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 1, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 1, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 1, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 1, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.387319 seconds (Warm-up)
    #                1.1419 seconds (Sampling)
    #                1.52921 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 2, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 2, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 2, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 2, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 2, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 2, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 2, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.374409 seconds (Warm-up)
    #                1.05255 seconds (Sampling)
    #                1.42696 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 3, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 3, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 3, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 3, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 3, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 3, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 3, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.387403 seconds (Warm-up)
    #                0.965986 seconds (Sampling)
    #                1.35339 seconds (Total)
    # 
    
  14. Check the chain mixing characteristics
    Show BRM code for traceplots and autocorrelation
    library(gridExtra)
    grid.arrange(stan_trace(arrington.brm3$fit, ncol=1),
                 stan_dens(arrington.brm3$fit, separate_chains=TRUE,ncol=1),
                 ncol=2)
    
    plot of chunk ws10.5bQ4.5bBRMS
    stan_ac(arrington.brm3$fit)
    
    plot of chunk ws10.5bQ4.5bBRMS
    Show BRM code for step characteristics
    summary(do.call(rbind, args = get_sampler_params(arrington.brm3$fit, inc_warmup = FALSE)), digits = 2)
    
     accept_stat__    stepsize__    treedepth__  n_leapfrog__  n_divergent__
     Min.   :0.37   Min.   :0.29   Min.   :1    Min.   : 1.0   Min.   :0    
     1st Qu.:0.90   1st Qu.:0.29   1st Qu.:2    1st Qu.: 3.0   1st Qu.:0    
     Median :0.96   Median :0.33   Median :3    Median : 7.0   Median :0    
     Mean   :0.93   Mean   :0.33   Mean   :3    Mean   : 7.9   Mean   :0    
     3rd Qu.:0.99   3rd Qu.:0.38   3rd Qu.:4    3rd Qu.:11.0   3rd Qu.:0    
     Max.   :1.00   Max.   :0.38   Max.   :5    Max.   :31.0   Max.   :0    
    
    stan_diag(arrington.brm3$fit)
    
    plot of chunk ws10.5bQ4.5cBRMS
    stan_diag(arrington.brm3$fit, information = "stepsize")
    
    plot of chunk ws10.5bQ4.5cBRMS
    library(gridExtra)
    grid.arrange(stan_rhat(arrington.brm3$fit) + theme_classic(8),
                 stan_ess(arrington.brm3$fit) + theme_classic(8),
                 stan_mcse(arrington.brm3$fit) + theme_classic(8),
                 ncol = 2)
    
    plot of chunk ws10.5bQ4.5cBRMS
  15. Check model fit diagnostics
    Show BRM code
    inv.logit <- binomial()$linkinv
    #Calculate residuals   
    Resid <- residuals(arrington.brm3, type='pearson')
    Fitted <- fitted(arrington.brm3, scale='linear')
    ggplot(data=NULL, aes(y=Resid[,'Estimate'], x=Fitted[,'Estimate'])) + geom_point()
    
    plot of chunk ws10.5cQ4.5dBRMS
    SSres <- sum(Resid[,'Estimate']^2)
    pi = fitted(arrington.brm3, scale='response')
    YNew <- matrix(rbinom(length(pi),prob=pi,size=1),nrow=nrow(pi))
    
    Resid1<-(pi - YNew)/sqrt(pi*(1-pi))
    SSres.sim<-apply(Resid1^2,1,sum)
    mean(SSres.sim>SSres)
    
    [1] NA
    
    ## overdispersion
    Resid <- residuals(arrington.brm3, type='pearson',summary=FALSE)
    RSS <- apply(Resid^2,1,sum) #sum(Resid[,'Estimate']^2)
    # residual df: n minus the number of fixed effects in *this* model
    Disp <- RSS/(nrow(arrington)-ncol(model.matrix(~Trophic, data=arrington)))
    data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
    
           Median     Mean    lower    upper
    var1 24.51102 24.51353 24.49827 24.53387
    
  16. The dispersion statistic above is far greater than 1, so these diagnostics are clearly of concern. Nevertheless, let's first look at the model parameters before addressing the overdispersion.
    Show BRM code
    print(arrington.brm3)
    
     Family: binomial (logit) 
    Formula: Empty | trials(Individuals) ~ Trophic 
       Data: arrington (Number of observations: 254) 
    Samples: 3 chains, each with iter = 2000; warmup = 500; thin = 2; 
             total post-warmup samples = 2250
       WAIC: Not computed
     
    Fixed Effects: 
                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept             -2.74      0.07    -2.88    -2.61        826 1.01
    TrophicInvertivore     0.89      0.07     0.75     1.03        841 1.01
    TrophicOmnivore        0.13      0.08    -0.02     0.30        888 1.00
    TrophicPiscivore       1.97      0.07     1.83     2.12        876 1.01
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    WAIC(arrington.brm3)
    
        WAIC     SE
     6915.29 682.35
    
  17. The model is clearly overdispersed... There are numerous ways of addressing this:
    • overdispersion is often the result of the model failing to adequately capture the full variability in the data. Data might be more variable than expected under a model (and its distribution) because of latent effects (additional influences that are not measured or incorporated in the model). If so, then adding a unit (or observation) level random effect (a latent variable) can address overdispersion by soaking up the additional variability. In this relatively simple example, a unit-level random effect is simply a variable with the numbers 1 through to the number of rows in the data. Hence, each observation has a unique value (level). In more complex examples, it might be necessary to nest the number sequence within blocks.
    • fit a model against a distribution that does not assume the dispersion is 1 and is able to estimate both location and scale (such as a beta-binomial).
  18. Let's first try fitting an observation-level random effect in a hierarchical generalized linear model
    Show BRM code
    arrington$Obs <- 1:nrow(arrington)
    arrington.brm4 <- brm(Empty|trials(Individuals) ~ Trophic +(1|Obs), data=arrington, family=binomial,
           prior=c(set_prior('normal(0,100)', class='b'),
                   set_prior('cauchy(0,2)', class='sd')),
           chains=3, iter=2000, warmup=500, thin=10)
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 1, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 1, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 1, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 1, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 1, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 1, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 1, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 4.52238 seconds (Warm-up)
    #                9.73738 seconds (Sampling)
    #                14.2598 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 2, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 2, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 2, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 2, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 2, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 2, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 2, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 4.65143 seconds (Warm-up)
    #                9.25017 seconds (Sampling)
    #                13.9016 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 3, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 3, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 3, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 3, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 3, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 3, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 3, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 4.40266 seconds (Warm-up)
    #                9.61914 seconds (Sampling)
    #                14.0218 seconds (Total)
    # 
    
  19. Check the chain mixing characteristics
    Show BRM code for traceplots and autocorrelation
    library(gridExtra)
    grid.arrange(stan_trace(arrington.brm4$fit, ncol=1),
                 stan_dens(arrington.brm4$fit, separate_chains=TRUE,ncol=1),
                 ncol=2)
    
    plot of chunk ws10.6bQ4.5bBRMS
    stan_ac(arrington.brm4$fit)
    
    plot of chunk ws10.6bQ4.5bBRMS
    Show BRM code for step characteristics
    summary(do.call(rbind, args = get_sampler_params(arrington.brm4$fit, inc_warmup = FALSE)), digits = 2)
    
     accept_stat__    stepsize__     treedepth__  n_leapfrog__ n_divergent__
     Min.   :0.52   Min.   :0.072   Min.   :5    Min.   :31    Min.   :0    
     1st Qu.:0.89   1st Qu.:0.072   1st Qu.:6    1st Qu.:63    1st Qu.:0    
     Median :0.95   Median :0.074   Median :6    Median :63    Median :0    
     Mean   :0.92   Mean   :0.075   Mean   :6    Mean   :62    Mean   :0    
     3rd Qu.:0.98   3rd Qu.:0.080   3rd Qu.:6    3rd Qu.:63    3rd Qu.:0    
     Max.   :1.00   Max.   :0.080   Max.   :6    Max.   :63    Max.   :0    
    
    stan_diag(arrington.brm4$fit)
    
    plot of chunk ws10.6bQ4.5cBRMS
    stan_diag(arrington.brm4$fit, information = "stepsize")
    
    plot of chunk ws10.6bQ4.5cBRMS
    library(gridExtra)
    grid.arrange(stan_rhat(arrington.brm4$fit) + theme_classic(8),
                 stan_ess(arrington.brm4$fit) + theme_classic(8),
                 stan_mcse(arrington.brm4$fit) + theme_classic(8),
                 ncol = 2)
    
    plot of chunk ws10.6bQ4.5cBRMS
  20. Check model fit diagnostics
    Show BRM code
    ## rather than explore residuals plots, we should plot the random effects against the 
    ## predicted values from the fixed effect component of the model and check for no trend:
    ran = ranef(arrington.brm4)$Obs
    Xmat = model.matrix(~Trophic, data=arrington)
    fit = Xmat%*%fixef(arrington.brm4)
    ggplot(data=NULL, aes(y=ran, x=fit)) + geom_point()
    
    plot of chunk ws10.6cQ4.5dBRMS
    ## looks good..
    
    
    
    ## overdispersion
    Resid <- residuals(arrington.brm4, type='pearson',summary=FALSE)
    RSS <- apply(Resid^2,1,sum) #sum(Resid[,'Estimate']^2)
    # residual df: n minus the number of fixed effects in *this* model
    Disp <- RSS/(nrow(arrington)-ncol(model.matrix(~Trophic, data=arrington)))
    data.frame(Median=median(Disp), Mean=mean(Disp), HPDinterval(as.mcmc(Disp)))
    
            Median      Mean     lower     upper
    var1 0.5608449 0.5629726 0.4400656 0.6863668
    
  21. There are no major concerns with these diagnostics, so we can now explore the model parameters.
    Show BRM code
    print(arrington.brm4)
    
     Family: binomial (logit) 
    Formula: Empty | trials(Individuals) ~ Trophic + (1 | Obs) 
       Data: arrington (Number of observations: 254) 
    Samples: 3 chains, each with iter = 2000; warmup = 500; thin = 10; 
             total post-warmup samples = 450
       WAIC: Not computed
     
    Random Effects: 
    ~Obs (Number of levels: 254) 
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sd(Intercept)     1.46      0.08     1.32     1.64        363    1
    
    Fixed Effects: 
                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept             -3.38      0.32    -3.94    -2.78        450 1.00
    TrophicInvertivore     0.87      0.34     0.16     1.51        450 0.99
    TrophicOmnivore        0.07      0.38    -0.69     0.82        375 1.01
    TrophicPiscivore       2.38      0.37     1.65     3.12        334 1.00
    
    Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
    is a crude measure of effective sample size, and Rhat is the potential 
    scale reduction factor on split chains (at convergence, Rhat = 1).
    
    WAIC(arrington.brm4)
    
        WAIC    SE
     1279.27 27.91
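    Note that this WAIC (1279.27) is dramatically lower than that of the model without the observation-level random effect (6915.29), consistent with the random effect having absorbed the previously unmodelled variability.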
    
  22. Definitely worth a summary figure.
    Show BRM code
    newdata = data.frame(Trophic = levels(arrington$Trophic))
    Xmat <- model.matrix(~Trophic, data=newdata)
    coefs <- as.matrix(as.data.frame(rstan:::extract(arrington.brm4$fit)))
    coefs <- coefs[,grep('^b_', colnames(coefs))]
    inv.logit <- binomial()$linkinv
    fit = inv.logit(coefs %*% t(Xmat))
    newdata = cbind(newdata, plyr:::adply(fit, 2, function(x) {
       data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
    }))
    
    ggplot(newdata, aes(y=Median, x=Trophic)) +
      geom_blank()+
      geom_pointrange(aes(ymin=lower, ymax=upper, x=as.numeric(Trophic))) +
      scale_y_continuous('Proportion of individuals with empty stomachs')+
      scale_x_discrete('Trophic Group')+
      theme_classic() +
      theme(axis.line.x=element_line(),axis.line.y=element_line(),
            axis.title.x=element_text(margin=margin(t=2, unit='lines')),
            axis.title.y=element_text(margin=margin(r=2, unit='lines')))
    
    plot of chunk ws10.6eQ4.6aBRMS
  23. Fit a beta-binomial on frequency (count) of empty stomachs
  24. Give it a go...
    Show BRM code
    # Re-fit the simple fixed-effects binomial model (note: this overwrites the earlier
    # hierarchical arrington.brm4) purely as a template from which to extract Stan
    # code and data (via stancode() and standata()) for a beta-binomial modification.
    arrington.brm4 <- brm(Empty|trials(Individuals) ~ Trophic, data=arrington, family=binomial,
           prior=c(set_prior('normal(0,100)', class='b')),
           chains=3, iter=2000, warmup=500, thin=2)
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 1).
    
    Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 1, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 1, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 1, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 1, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 1, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 1, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 1, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 1, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.395135 seconds (Warm-up)
    #                1.12591 seconds (Sampling)
    #                1.52105 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 2).
    
    Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 2, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 2, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 2, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 2, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 2, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 2, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 2, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 2, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.365576 seconds (Warm-up)
    #                1.10701 seconds (Sampling)
    #                1.47258 seconds (Total)
    # 
    
    SAMPLING FOR MODEL 'binomial(logit) brms-model' NOW (CHAIN 3).
    
    Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
    Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
    Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
    Chain 3, Iteration:  501 / 2000 [ 25%]  (Sampling)
    Chain 3, Iteration:  700 / 2000 [ 35%]  (Sampling)
    Chain 3, Iteration:  900 / 2000 [ 45%]  (Sampling)
    Chain 3, Iteration: 1100 / 2000 [ 55%]  (Sampling)
    Chain 3, Iteration: 1300 / 2000 [ 65%]  (Sampling)
    Chain 3, Iteration: 1500 / 2000 [ 75%]  (Sampling)
    Chain 3, Iteration: 1700 / 2000 [ 85%]  (Sampling)
    Chain 3, Iteration: 1900 / 2000 [ 95%]  (Sampling)
    Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)# 
    #  Elapsed Time: 0.382143 seconds (Warm-up)
    #                0.926586 seconds (Sampling)
    #                1.30873 seconds (Total)
    # 
    
    stancode(arrington.brm4)
    
    functions { 
    } 
    data { 
      int<lower=1> N;  // number of observations 
      int Y[N];  // response variable 
      int<lower=1> K;  // number of fixed effects 
      matrix[N, K] X;  // FE design matrix 
      vector[K] X_means;  // column means of X 
      int trials[N];  // number of trials 
    } 
    transformed data { 
    } 
    parameters { 
      real temp_Intercept;  // temporary Intercept 
      vector[K] b;  // fixed effects 
    } 
    transformed parameters { 
      vector[N] eta;  // linear predictor 
      // compute linear predictor 
      eta <- X * b + temp_Intercept; 
    } 
    model { 
      // prior specifications 
      b ~ normal(0,100); 
      // likelihood contribution 
      Y ~ binomial_logit(trials, eta); 
    } 
    generated quantities { 
      real b_Intercept;  // fixed effects intercept 
      b_Intercept <- temp_Intercept - dot_product(X_means, b); 
    } 
    
    modelString="
    data { 
      int<lower=1> N;  // number of observations 
      int Y[N];  // response variable 
      int<lower=1> K;  // number of fixed effects 
      matrix[N, K] X;  // FE design matrix 
      vector[K] X_means;  // column means of X 
      int trials[N];  // number of trials 
    } 
    transformed data { 
    } 
    parameters { 
      vector[K] b;  // fixed effects 
      real intercept;
      real phi;  // overdispersion (precision) parameter; NB it should arguably be
                 // declared real<lower=0> and given an explicit prior (see note below)
    } 
    transformed parameters { 
      vector[N] eta;  // linear predictor 
      vector[N] alpha; //first shape parameter
      vector[N] beta;  //second shape parameter
      // compute linear predictor 
      for (i in 1:N) eta[i] <- inv_logit(X[i,] * b + intercept); 
      alpha <- eta*phi;
      beta <- (1-eta)*phi;
    } 
    model { 
      // prior specifications
      intercept ~ normal(0,100); 
      b ~ normal(0,100);
      // likelihood contribution 
      Y ~ beta_binomial(trials, alpha,beta); 
    } 
    generated quantities { 
    }
    "
    
    
    library(rstan)
    arrington.rstan <- stan(data=standata(arrington.brm4),
                        model_code=modelString,
                                    chains=3,
                                    iter=1000,
                                    warmup=500,
                                    thin=2,
                                    save_dso=TRUE
                                    )
    
    SAMPLING FOR MODEL '0f53c5d078293f511e328ffcf4d8078a' NOW (CHAIN 1).
    
    Chain 1, Iteration:   1 / 1000 [  0%]  (Warmup)
    Chain 1, Iteration: 100 / 1000 [ 10%]  (Warmup)
    Chain 1, Iteration: 200 / 1000 [ 20%]  (Warmup)
    Chain 1, Iteration: 300 / 1000 [ 30%]  (Warmup)
    Chain 1, Iteration: 400 / 1000 [ 40%]  (Warmup)
    Chain 1, Iteration: 500 / 1000 [ 50%]  (Warmup)
    Chain 1, Iteration: 501 / 1000 [ 50%]  (Sampling)
    Chain 1, Iteration: 600 / 1000 [ 60%]  (Sampling)
    Chain 1, Iteration: 700 / 1000 [ 70%]  (Sampling)
    Chain 1, Iteration: 800 / 1000 [ 80%]  (Sampling)
    Chain 1, Iteration: 900 / 1000 [ 90%]  (Sampling)
    Chain 1, Iteration: 1000 / 1000 [100%]  (Sampling)# 
    #  Elapsed Time: 407.543 seconds (Warm-up)
    #                404.442 seconds (Sampling)
    #                811.985 seconds (Total)
    # 
    
    SAMPLING FOR MODEL '0f53c5d078293f511e328ffcf4d8078a' NOW (CHAIN 2).
    
    Chain 2, Iteration:   1 / 1000 [  0%]  (Warmup)
    Chain 2, Iteration: 100 / 1000 [ 10%]  (Warmup)
    Chain 2, Iteration: 200 / 1000 [ 20%]  (Warmup)
    Chain 2, Iteration: 300 / 1000 [ 30%]  (Warmup)
    Chain 2, Iteration: 400 / 1000 [ 40%]  (Warmup)
    Chain 2, Iteration: 500 / 1000 [ 50%]  (Warmup)
    Chain 2, Iteration: 501 / 1000 [ 50%]  (Sampling)
    Chain 2, Iteration: 600 / 1000 [ 60%]  (Sampling)
    Chain 2, Iteration: 700 / 1000 [ 70%]  (Sampling)
    Chain 2, Iteration: 800 / 1000 [ 80%]  (Sampling)
    Chain 2, Iteration: 900 / 1000 [ 90%]  (Sampling)
    Chain 2, Iteration: 1000 / 1000 [100%]  (Sampling)# 
    #  Elapsed Time: 363.687 seconds (Warm-up)
    #                484.161 seconds (Sampling)
    #                847.848 seconds (Total)
    # 
    
    SAMPLING FOR MODEL '0f53c5d078293f511e328ffcf4d8078a' NOW (CHAIN 3).
    
    Chain 3, Iteration:   1 / 1000 [  0%]  (Warmup)
    Chain 3, Iteration: 100 / 1000 [ 10%]  (Warmup)
    Chain 3, Iteration: 200 / 1000 [ 20%]  (Warmup)
    Chain 3, Iteration: 300 / 1000 [ 30%]  (Warmup)
    Chain 3, Iteration: 400 / 1000 [ 40%]  (Warmup)
    Chain 3, Iteration: 500 / 1000 [ 50%]  (Warmup)
    Chain 3, Iteration: 501 / 1000 [ 50%]  (Sampling)
    Chain 3, Iteration: 600 / 1000 [ 60%]  (Sampling)
    Chain 3, Iteration: 700 / 1000 [ 70%]  (Sampling)
    Chain 3, Iteration: 800 / 1000 [ 80%]  (Sampling)
    Chain 3, Iteration: 900 / 1000 [ 90%]  (Sampling)
    Chain 3, Iteration: 1000 / 1000 [100%]  (Sampling)# 
    #  Elapsed Time: 405.924 seconds (Warm-up)
    #                444.75 seconds (Sampling)
    #                850.673 seconds (Total)
    # 
    
    print(arrington.rstan, pars=c('intercept','b','phi'))
    
    Inference for Stan model: 0f53c5d078293f511e328ffcf4d8078a.
    3 chains, each with iter=1000; warmup=500; thin=2; 
    post-warmup draws per chain=250, total post-warmup draws=750.
    
               mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
    intercept -1.46    0.26 0.32 -1.84 -1.72 -1.62 -1.02 -1.01     2 5.83
    b[1]       0.62    0.10 0.22  0.13  0.46  0.68  0.77  0.99     4 1.39
    b[2]       0.13    0.04 0.22 -0.29  0.03  0.10  0.26  0.60    23 1.20
    b[3]       1.42    0.23 0.33  1.04  1.07  1.43  1.71  1.99     2 2.06
    phi        3.58    1.26 1.59  1.39  1.42  4.23  5.04  5.40     2 7.19
    
    Samples were drawn using NUTS(diag_e) at Mon Sep  5 09:08:26 2016.
    For each parameter, n_eff is a crude measure of effective sample size,
    and Rhat is the potential scale reduction factor on split chains (at 
    convergence, Rhat=1).
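    Note that the Rhat values well above 1 and the tiny effective sample sizes show that these chains have not converged, so the beta-binomial estimates above should not be trusted as they stand. Declaring phi with a lower bound of zero (real<lower=0> phi;), giving it an explicit weakly informative prior, and running substantially longer chains would be the obvious next steps.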
    
    newdata = data.frame(Trophic = levels(arrington$Trophic))
    Xmat <- model.matrix(~Trophic, data=newdata)
    coefs <-as.matrix(as.data.frame(extract(arrington.rstan, pars=c('intercept','b'))))
    inv.logit <- binomial()$linkinv
    fit = inv.logit(coefs %*% t(Xmat))
    newdata = cbind(newdata, plyr:::adply(fit, 2, function(x) {
       data.frame(Mean=mean(x), Median=median(x), HPDinterval(as.mcmc(x)))
    }))
    
    ggplot(newdata, aes(y=Median, x=Trophic)) +
      geom_blank()+
      geom_pointrange(aes(ymin=lower, ymax=upper, x=as.numeric(Trophic))) +
      scale_y_continuous('Proportion of individuals with empty stomachs')+
      scale_x_discrete('Trophic Group')+
      theme_classic() +
      theme(axis.line.x=element_line(),axis.line.y=element_line(),
            axis.title.x=element_text(margin=margin(t=2, unit='lines')),
            axis.title.y=element_text(margin=margin(r=2, unit='lines')))
    
    plot of chunk ws10.7aQ4.6aBRMS

Exponential family of distributions

The exponential family is a class of distributions (both continuous and discrete) that can be characterized by two parameters. One of these parameters (the location parameter) is a function of the mean and the other (the dispersion parameter) is a function of the variance of the distribution. Note that recent developments have further extended generalized linear models to accommodate other, non-exponential residual distributions.
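For example, the binomial distribution used throughout this tutorial is a member of this family: its mean is $n\pi$ and its variance is $n\pi(1-\pi)$, so its dispersion is fixed by the mean rather than freely estimated - which is precisely why overdispersion had to be handled above via a latent variable or a beta-binomial.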

End of instructions