Other data types

Binary - only 0 and 1 (dead/alive) (present/absent)
Proportional abundance - range from 0 to 100
Count data - min of zero

General linear models

\[ Residual_i = y_i - E(Y_i) \]

General linear models

\[\underbrace{E(Y)}_{Link~~function} = \underbrace{\beta_0 + \beta_1x_1~+~...~+~\beta_px_p}_{Systematic} + \varepsilon, ~~\varepsilon\sim Dist(...)\]

General linear models

\[\underbrace{E(Y)}_{Link~~function} = \underbrace{\beta_0 + \beta_1x_1~+~...~+~\beta_px_p}_{Systematic}~~\underbrace{~+~e}_{Random}\]

Random component. \[E(Y_i) \sim{} N(\mu_i, \sigma^2)\]
Systematic component \[\beta_0 + \beta_1x_1~+~...~+~\beta_px_p\]
Link function \[\mu = \beta_0 + \beta_1x_1~+~...~+~\beta_px_p\]

Generalized linear models

\[\underbrace{g(\mu)}_{Link~~function} = \underbrace{\beta_0 + \beta_1x_1~+~...~+~\beta_px_p}_{Systematic}~~\underbrace{~+~e}_{Random}\]

Random component. A nominated distribution (Gaussian, Poisson, Binomial, Gamma, Beta,…)
Systematic component
Link function

Generalized linear models

Response variable	Probability distribution	Link function	Model name
Continuous measurements	Gaussian	identity: \(\mu\)	Linear regression
Binary	Binomial	logit: \(log\left(\frac{\pi}{1-\pi}\right)\)	Logistic regression
		probit: \(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\alpha+\beta.X} exp\left(-\frac{1}{2}Z^2\right)dZ\)	Probit regression
		Complimentary log-log: \(log(-log(1-\pi))\)	Logistic regression
Counts	Poisson	log: \(log \mu\)	Poisson regression log-linear model
	Negative binomial	\(log\left(\frac{\mu}{\mu+\theta}\right)\)	Negative biomial regression
	Quasi-poisson	\(log\mu\)	Poisson regression

OLS

Maximum Likelihood

Logistic regression

Binary data

Link: \(log\left(\frac{\pi}{1-\pi}\right)\)

Transform scale of linear predictor (\(-\infty,\infty\)) into that of the response (0,1)

Logistic regression

\[E(Y) = \left(\begin{array}{c} n\\x \end{array}\right)p^{x}(1-p)^{n-x}\]

Dispersion

Spread assumed to be equal to mean. (\(\phi = 1\))

Dispersion

Over-dispersion

Sample more varied than expected from its mean

variability due to other unmeasured influences
- quasi- model
due to more zeros than expected
- zero-inflated model

Logistic regression

Binary data

Fit model

dat.glmL <- glm(y ~ x, data = dat, family = "binomial")

Logistic regression

Binary data

Fit model

dat.glmL <- glm(y ~ x, data = dat, family = "binomial")

Explore residuals

plot(dat.glmL)

Logistic regression

Binary data

Fit model

dat.glmL <- glm(y ~ x, data = dat, family = "binomial")

Explore residuals

plot(dat.glmL)

Explore goodness of fit
- Pearson’s \(\chi^2\) residuals
- Deviance (\(G^2\))

Logistic regression

Binary data

Explore model parameters

r summary(data.glmL} Slope parameter is on log odds-ratio scale

Logistic regression

Binary data

Explore model parameters

r summary(data.glmL} Slope parameter is on log odds-ratio scale

Quasi \(R^2\) \[quasi R^2 = 1-\left(\frac{deviance}{null~deviance}\right)\]

Logistic regression

Binary data

LD50 \[LD50 = - \frac{intercept}{slope}\]

Worked Example

polis <- read.csv('../data/polis.csv', strip.white=T)
head(polis)

      ISLAND RATIO PA
1       Bota 15.41  1
2     Cabeza  5.63  1
3    Cerraja 25.92  1
4 Coronadito 15.17  0
5     Flecha 13.04  1
6   Gemelose 18.85  0

ggplot(polis, aes(y=PA, x=RATIO))+geom_point()+
    geom_smooth(method='glm', formula=y~x, family='binomial')

Error: could not find function "ggplot"

plot(PA~RATIO, data=polis )
abline(lm(PA~RATIO, data=polis))
with(polis,lines(lowess(PA~RATIO)))

polis.glm <- glm(PA ~ RATIO, family=binomial, data=polis)

par(mfrow=c(2,3))
plot(polis.glm)

##Check the model for lack of fit via:
##Pearson chisq
polis.resid <- sum(resid(polis.glm, type="pearson")^2)
1-pchisq(polis.resid, polis.glm$df.resid)

[1] 0.5715

#No evidence of a lack of fit

#Deviance
1-pchisq(polis.glm$deviance, polis.glm$df.resid)

[1] 0.6514

#No evidence of a lack of fit

#Estimate dispersal
polis.resid/polis.glm$df.resid

[1] 0.9019

#OR
polis.glm$deviance/polis.glm$df.resid

[1] 0.8365

#Number of zeros
polis.tab<-table(polis$PA==0)
polis.tab/sum(polis.tab)


 FALSE   TRUE 
0.5263 0.4737

summary(polis.glm)


Call:
glm(formula = PA ~ RATIO, family = binomial, data = polis)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.607  -0.638   0.237   0.433   2.099  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.606      1.695    2.13    0.033
RATIO         -0.220      0.101   -2.18    0.029
             
(Intercept) *
RATIO       *
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 26.287  on 18  degrees of freedom
Residual deviance: 14.221  on 17  degrees of freedom
AIC: 18.22

Number of Fisher Scoring iterations: 6

exp(-0.296)

[1] 0.7438

#R2
1-(polis.glm$deviance/polis.glm$null)

[1] 0.459

anova(polis.glm, test='Chisq')

Analysis of Deviance Table

Model: binomial, link: logit

Response: PA

Terms added sequentially (first to last)

      Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                     18       26.3         
RATIO  1     12.1        17       14.2  0.00051
         
NULL     
RATIO ***
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

par(mar=c(4,5,0,0))
plot(PA~RATIO, data=polis,type="n",ann=F,axes=F)
points(PA~RATIO, data=polis,pch=16)
xs <- seq(min(polis$RATIO),max(polis$RATIO),len=1000)
ys <- predict(polis.glm,newdata=data.frame(RATIO=xs),
        type='link',se=T)

ys$lwr <- polis.glm$family$linkinv(ys$fit - 2*ys$se.fit)
ys$upr <- polis.glm$family$linkinv(ys$fit + 2*ys$se.fit)
ys$fit <- polis.glm$family$linkinv(ys$fit)

lines(ys$fit~xs, col='black')
lines(ys$lwr~xs, col='black',lty=2)
lines(ys$upr~xs, col='black',lty=2)
axis(1)
mtext("Island perimeter to area ratio",1,cex=1.5,line=3)
axis(2,las=2)
mtext(expression(paste(italic(Uta)," lizard presence/absence")),2,cex=1.5,line=3)
box(bty="l")

#LD50
ld50<--polis.glm$coef[1]/polis.glm$coef[2]

newdata <- data.frame(RATIO=xs, fit=ys$fit, lower=ys$lwr, upper=ys$upr)
ggplot(newdata, aes(y=fit, x=RATIO))+geom_line()+
    geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue',alpha=0.2)+
    geom_point(data=polis, aes(y=PA, x=RATIO))

Error: could not find function "ggplot"

Poisson regression

\[ p(Y_i) = \frac{e^{-\lambda}\lambda^x}{x!} \]

\[log(\mu)=\beta_0+\beta_1x_i+...+\beta_px_p\]

Dispersion

Spread assumed to be equal to mean. (\(\phi = 1\))

Dispersion

Over-dispersion

Sample more varied than expected from its mean

variability due to other unmeasured influences
- quasi-poisson model

clumpiness
- negative binomial model

due to more zeros than expected
- zero-inflated model

Worked Example

data.pois <- read.csv('../data/data.pois.csv', strip.white=T)
head(data.pois)

library(car)
scatterplot(y~x,data=data.pois)

scatterplot(y~x,data=data.pois, log='y')
rug(jitter(data.pois$y), side=2)

hist(data.pois$y)

boxplot(data.pois$y, horizontal=TRUE)
rug(jitter(data.pois$y), side=1)

#plot(y~x, data.pois, log="y")
#with(data.pois, lines(lowess(y~x)))
#library(car)
#scatterplot(y~x,data=data.pois, log='y')
#rug(jitter(data.pois$y), side=2)


#Zero inflation
#proportion of 0's in the data
data.pois.tab<-table(data.pois$y==0)
data.pois.tab/sum(data.pois.tab)


FALSE 
    1

#proportion of 0's expected from a Poisson distribution
mu <- mean(data.pois$y)
cnts <- rpois(1000, mu)
data.pois.tab <- table(cnts == 0)
data.pois.tab/sum(data.pois.tab)


FALSE  TRUE 
0.999 0.001

#fit model
data.pois.glm <- glm(y~x, data=data.pois, family="poisson")
data.pois.glm <- glm(y~x, data=data.pois, family=poisson(link='log'))

                                        #Model validation
par(mfrow=c(2,3))
plot(data.pois.glm, ask=F, which=1:6)

#goodness of fit
data.pois.resid <- sum(resid(data.pois.glm, type = "pearson")^2)
1 - pchisq(data.pois.resid, data.pois.glm$df.resid)

[1] 0.4897

#Deviance based
1-pchisq(data.pois.glm$deviance, data.pois.glm$df.resid)

[1] 0.5076

data.pois.glmG <- glm(y~x, data=data.pois, family="gaussian")
library(MuMIn)
AICc(data.pois.glm, data.pois.glmG)

               df  AICc
data.pois.glm   2 91.02
data.pois.glmG  3 99.78

#Poisson deviance
data.pois.glm$deviance

[1] 17.23

#Gaussian deviance
data.pois.glmG$deviance

[1] 118.1

data.pois.resid/data.pois.glm$df.resid

[1] 0.9717

data.pois.glm$deviance/data.pois.glm$df.resid

[1] 0.957

#Or alternatively, via Pearson's residuals
Resid <- resid(data.pois.glm, type = "pearson")
sum(Resid^2) / (nrow(data.pois) - length(coef(data.pois.glm)))

[1] 0.9717

summary(data.pois.glm)


Call:
glm(formula = y ~ x, family = poisson(link = "log"), data = data.pois)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.635  -0.705   0.044   0.562   2.046  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.5600     0.2539    2.21    0.027
x             0.1115     0.0186    6.00    2e-09
               
(Intercept) *  
x           ***
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 55.614  on 19  degrees of freedom
Residual deviance: 17.226  on 18  degrees of freedom
AIC: 90.32

Number of Fisher Scoring iterations: 4

library(gmodels)
ci(data.pois.glm)

            Estimate CI lower CI upper Std. Error
(Intercept)   0.5600  0.02649   1.0935    0.25395
x             0.1115  0.07247   0.1506    0.01858
              p-value
(Intercept) 2.744e-02
x           1.971e-09

1-(data.pois.glm$deviance/data.pois.glm$null)

[1] 0.6903

par(mfrow=c(1,1), mar = c(4, 5, 0, 0))
plot(y ~ x, data = data.pois, type = "n", ann = F, axes = F)
points(y ~ x, data = data.pois, pch = 16)
xs <- seq(min(data.pois$x), max(data.pois$x), len = 1000)
#ys <- predict(data.pois.glm, newdata = data.frame(x = xs), type = "response", se = T)
ys <- predict(data.pois.glm, newdata = data.frame(x = xs), type = "link", se = T)

ys$lwr <- exp(ys$fit - 2*ys$se.fit)
ys$upr <- exp(ys$fit + 2*ys$se.fit)
ys$fit <- exp(ys$fit)

lines(ys$fit ~ xs, col = "black")
lines(ys$lwr ~ xs, col = "black", lty = 2)
lines(ys$upr ~ xs, col = "black", lty = 2)

polygon(c(xs,rev(xs)), c(ys$lwr,rev(ys$upr)), border=NA, col="#0000FF50")
#lines(ys$fit - 2 * ys$se.fit ~ xs, col = "black", type = "l", lty = 2)
#lines(ys$fit + 2 * ys$se.fit ~ xs, col = "black", type = "l", lty = 2)
axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")

Quasi-Poisson

specify \(var(y) = \phi \mu\)
\(\phi\) (scale parameter) estimated
parameter estimates same
Standard errors \(=\sqrt{\phi}\times SE\)

Worked Example

data.qp <- read.table('../data/data.qp.csv', header=T, sep=',', strip.white=T)
head(data.qp)

ggplot(data.qp, aes(y=y, x=x))+geom_point()+geom_rug()+
    geom_smooth(method='lm')

Error: could not find function "ggplot"

## library(car)
## scatterplot(y~x,data=data.qp)
## scatterplot(y~x,data=data.qp, log='y')
plot(y~x, data.qp)
with(data.qp, lines(lowess(y~x)))
rug(jitter(data.qp$y), side=2)

plot(y~x, data.qp, log="y")
with(data.qp, lines(lowess(y~x)))
rug(jitter(data.qp$y), side=2)

hist(data.qp$y)

boxplot(data.qp$y, horizontal=TRUE)
rug(jitter(data.qp$y), side=1)

#Zero inflation
#proportion of 0's in the data
data.qp.tab<-table(data.qp$y==0)
data.qp.tab/sum(data.qp.tab)


FALSE  TRUE 
 0.85  0.15

#proportion of 0's expected from a Poisson distribution
mu <- mean(data.qp$y)
cnts <- rpois(1000, mu)
data.qp.tab <- table(cnts == 0)
data.qp.tab/sum(data.qp.tab)


FALSE 
    1

#fit model
data.qp.glm<-glm(y~x, data=data.qp, family="poisson")

data.qp.glm2 <- glm(y~x, data=data.qp, family="quasipoisson")
AICc(data.qp.glm, data.qp.glm2)

             df  AICc
data.qp.glm   2 135.6
data.qp.glm2  2    NA

anova(data.qp.glm, data.qp.glm2, test='Chisq')

Analysis of Deviance Table

Model 1: y ~ x
Model 2: y ~ x
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        18         70                     
2        18         70  0        0

                                        #Model validation
par(mfrow=c(2,3))
plot(data.qp.glm, ask=F, which=1:6)

data.qp.glm$deviance/data.qp.glm$df.resid

[1] 3.888

#Or alternatively, via Pearson's residuals
Resid <- resid(data.qp.glm, type = "pearson")
sum(Resid^2) / (nrow(data.qp) - length(coef(data.qp.glm)))

[1] 3.699

summary(data.qp.glm)


Call:
glm(formula = y ~ x, family = "poisson", data = data.qp)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-3.366  -1.736  -0.124   0.744   3.933  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.6997     0.2468    2.84   0.0046
x             0.1005     0.0184    5.47  4.4e-08
               
(Intercept) ** 
x           ***
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 101.521  on 19  degrees of freedom
Residual deviance:  69.987  on 18  degrees of freedom
AIC: 134.9

Number of Fisher Scoring iterations: 5

library(gmodels)
ci(data.qp.glm)

            Estimate CI lower CI upper Std. Error
(Intercept)   0.6997  0.18121    1.218    0.24677
x             0.1005  0.06192    0.139    0.01835
              p-value
(Intercept) 4.579e-03
x           4.389e-08

1-(data.qp.glm$deviance/data.qp.glm$null)

[1] 0.3106

par(mar = c(4, 5, 0, 0))
plot(y ~ x, data = data.qp, type = "n", ann = F, axes = F)
points(y ~ x, data = data.qp, pch = 16)
xs <- seq(0, 20, l = 1000)
ys <- predict(data.pois.glm, newdata = data.frame(x = xs), type = "link", se = T)
ys$lwr <- exp(ys$fit - 2*ys$se.fit)
ys$upr <- exp(ys$fit + 2*ys$se.fit)
ys$fit <- exp(ys$fit)
lines(ys$fit ~ xs, col = "black")
lines(ys$lwr ~ xs, col = "black", lty = 2)
lines(ys$upr ~ xs, col = "black", lty = 2)
axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")

Negative Binomial

\[p(y_i)=\frac{\Gamma(y_i+\omega)}{\Gamma(\omega)y!}\times\frac{\mu_i^{y_i}\omega^\omega}{(\mu_i+\omega)^{\mu_i+\omega}}\]

Negative Binomial

\[p(y_i)=\frac{\Gamma(y_i+\omega)}{\Gamma(\omega)y!}\times\frac{\mu_i^{y_i}\omega^\omega}{(\mu_i+\omega)^{\mu_i+\omega}}\]

mixture model
\(\eta_i \sim Gamma(\mu_i, \omega)\)
\(E(y) \sim Poisson(\eta_i)\)

Worked Example

data.nb <- read.table('../data/data.nb.csv', header=T, sep=',', strip.white=T)
head(data.nb)

scatterplot(y~x, data.nb, log='y')

Error: NA/NaN/Inf in 'y'

plot(y~x, data.nb)
with(data.nb, lines(lowess(y~x)))

plot(y~x, data.nb, log="y")
with(data.nb, lines(lowess(y~x)))
rug(jitter(data.nb$y), side=2)

hist(data.nb$y, br=10)

boxplot(data.nb$y, horizontal=TRUE)
rug(jitter(data.nb$y), side=1)

#Zero inflation
#proportion of 0's in the data
data.nb.tab<-table(data.nb$y==0)
data.nb.tab/sum(data.nb.tab)


FALSE  TRUE 
 0.95  0.05

#proportion of 0's expected from a Poisson distribution
mu <- mean(data.nb$y)
cnts <- rpois(1000, mu)
hist(cnts)

tab <- table(cnts == 0)
tab/sum(tab)


FALSE  TRUE 
0.999 0.001

#fit model
library(MASS)
data.nb.glm <- glm.nb(y~x, data=data.nb)

                                        #Model validation
par(mfrow=c(2,3))
plot(data.nb.glm, ask=F,which=1:6)

data.ss <- sum(resid(data.nb1, type = "pearson")^2)

Error: object 'data.nb1' not found

data.ss/data.nb1$df.resid

Error: object 'data.ss' not found

data.nb1 <- glm(y~x, data=data.nb, family='poisson')
plot(data.nb1, ask=F,which=1:6)

1-pchisq(data.nb1$deviance, data.nb1$df.resid)

[1] 8.6e-08

#goodness of fit
data.nb.resid <- sum(resid(data.nb.glm, type = "pearson")^2)
1 - pchisq(data.nb.resid, data.nb.glm$df.resid)

[1] 0.5825

#Deviance based
1-pchisq(data.nb.glm$deviance, data.nb.glm$df.resid)

[1] 0.2407

data.nb.glmG <- glm(y~x, data=data.nb, family="gaussian")
AIC(data.nb.glm, data.nb.glmG)

             df   AIC
data.nb.glm   3 115.8
data.nb.glmG  3 128.2

#Poisson deviance
data.nb.glm$deviance

[1] 21.81

#Gaussian deviance
data.nb.glmG$deviance

[1] 527.6

summary(data.nb.glm)


Call:
glm.nb(formula = y ~ x, data = data.nb, init.theta = 2.359878187, 
    link = log)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.787  -0.894  -0.317   0.442   1.366  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.7454     0.4253    1.75   0.0797
x             0.0949     0.0344    2.76   0.0058
              
(Intercept) . 
x           **
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(2.36) family taken to be 1)

    Null deviance: 30.443  on 19  degrees of freedom
Residual deviance: 21.806  on 18  degrees of freedom
AIC: 115.8

Number of Fisher Scoring iterations: 1

              Theta:  2.36 
          Std. Err.:  1.10 

 2 x log-likelihood:  -109.85

library(gmodels)
ci(data.nb.glm)

            Estimate CI lower CI upper Std. Error
(Intercept)  0.74543 -0.14818   1.6390    0.42534
x            0.09494  0.02265   0.1672    0.03441
             p-value
(Intercept) 0.079682
x           0.005792

1-(data.nb.glm$deviance/data.nb.glm$null)

[1] 0.2837

par(mfrow=c(1,1),mar = c(4, 5, 0, 0))
plot(y ~ x, data = data.nb, type = "n", ann = F, axes = F)
points(y ~ x, data = data.nb, pch = 16)
xs <- seq(0, 20, l = 1000)
ys <- predict(data.nb.glm, newdata = data.frame(x = xs), type = "link", se = T)

ys$lwr <- exp(ys$fit - 2*ys$se.fit)
ys$upr <- exp(ys$fit + 2*ys$se.fit)
ys$fit <- exp(ys$fit)

lines(ys$fit ~ xs, col = "black")
lines(ys$lwr ~ xs, col = "black", lty = 2)
lines(ys$upr ~ xs, col = "black", lty = 2)

axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")

Zero-inflated model

Two sources of zeros
- those expected by Poisson (or binomial)
- false zeros

Mixture model
- binary logistic (rate of false zeros)
- Poisson (trend in counts)

Worked Example

data.zip <- read.csv('../data/data.zip.csv', strip.white=T)
head(data.zip)

ggplot(data.zip, aes(y=y, x=x))+geom_point()

Error: could not find function "ggplot"

plot(y~x, data.zip)
with(data.zip, lines(lowess(y~x)))

plot(y~x, data.zip, log="y")
with(data.zip, lines(lowess(y~x)))
rug(jitter(data.pois$y), side=2)

hist(data.zip$y, br=20)

boxplot(data.zip$y, horizontal=TRUE)
rug(jitter(data.zip$y), side=1)

#Zero inflation
#proportion of 0's in the data
data.zip.tab<-table(data.zip$y==0)
data.zip.tab/sum(data.zip.tab)


FALSE  TRUE 
 0.55  0.45

#proportion of 0's expected from a Poisson distribution
mu <- mean(data.zip$y)
cnts <- rpois(1000, mu)
data.zip.tab <- table(cnts == 0)
data.zip.tab/sum(data.zip.tab)


FALSE  TRUE 
0.929 0.071

summary(glm(y~x, data.zip, family="poisson"))


Call:
glm(formula = y ~ x, family = "poisson", data = data.zip)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.541  -2.377  -0.975   1.174   3.638  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.6705     0.3302    2.03    0.042
x             0.0296     0.0266    1.11    0.266
             
(Intercept) *
x            
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 85.469  on 19  degrees of freedom
Residual deviance: 84.209  on 18  degrees of freedom
AIC: 124

Number of Fisher Scoring iterations: 6

plot(glm(y~x, data.zip, family="poisson"))

plot of chunk unnamed-chunk-12

library(pscl)

data.zip.zip <- zeroinfl(y ~ x | 1, dist = "poisson", data = data.zip)
plot(resid(data.zip.zip)~fitted(data.zip.zip))

summary(data.zip.zip)


Call:
zeroinfl(formula = y ~ x | 1, data = data.zip, 
    dist = "poisson")

Pearson residuals:
   Min     1Q Median     3Q    Max 
-1.001 -0.956 -0.393  0.966  1.619 

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.7047     0.3196    2.21  0.02745
x             0.0873     0.0253    3.45  0.00056
               
(Intercept) *  
x           ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.229      0.456    -0.5     0.62
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 13 
Log-likelihood: -36.2 on 3 Df

data.zip.zip1 <- zeroinfl(y ~ x | x, dist = "poisson", data = data.zip)
plot(resid(data.zip.zip1)~fitted(data.zip.zip1))

summary(data.zip.zip1)


Call:
zeroinfl(formula = y ~ x | x, data = data.zip, 
    dist = "poisson")

Pearson residuals:
   Min     1Q Median     3Q    Max 
-1.276 -0.781 -0.593  0.908  2.132 

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.6111     0.3300    1.85  0.06404
x             0.0939     0.0257    3.65  0.00027
               
(Intercept) .  
x           ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -2.731      1.890   -1.44     0.15
x              0.217      0.146    1.49     0.14
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 17 
Log-likelihood: -34.4 on 4 Df

AIC(data.zip.zip, data.zip.zip1)

              df   AIC
data.zip.zip   3 78.34
data.zip.zip1  4 76.81

par(mar = c(4, 5, 0, 0))
plot(y ~ x, data = data.zip, type = "n", ann = F, axes = F)
points(y ~ x, data = data.zip, pch = 16)
xs <- seq(0, 20, l = 1000)
#ys <- predict(data.pois.glm, newdata = data.frame(x = xs), type = "response", se = T)

coefs <- coef(data.zip.zip)[1:2]
mm <- model.matrix(~x, data=data.frame(x=xs))

fit <- coefs %*% t(mm)
se <- sqrt(diag(mm %*% vcov(data.zip.zip)[-3,-3] %*% t(mm)))
lwr <- as.vector(exp(fit - 2*se))
upr <- as.vector(exp(fit + 2*se))
fit <- as.vector(exp(fit))
lines(fit ~ xs, col = "black")
lines(lwr ~ xs, col = "black", lty = 2)
lines(upr ~ xs, col = "black", lty = 2)

polygon(c(xs,rev(xs)), c(lwr,rev(upr)), border=NA, col="#0000FF50")
#lines(ys$fit - 2 * ys$se.fit ~ xs, col = "black", type = "l", lty = 2)
#lines(ys$fit + 2 * ys$se.fit ~ xs, col = "black", type = "l", lty = 2)
axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")

Generalized Linear Mixed effects Models (GLMM)

Workshop 11.1: Generalized linear models

Murray Logan

26-011-2013

Other data types

General linear models

General linear models

General linear models

General linear models

General linear models

Generalized linear models

Generalized linear models

OLS

Maximum Likelihood

Logistic regression

Binary data

Logistic regression

Dispersion

Dispersion

Over-dispersion

Logistic regression

Binary data

Logistic regression

Binary data

Logistic regression

Binary data

Logistic regression

Binary data

Logistic regression

Binary data

Logistic regression

Binary data

Worked Example

Poisson regression

Dispersion

Dispersion

Over-dispersion

Worked Example

Quasi-Poisson

Worked Example

Negative Binomial

Negative Binomial

Worked Example

Zero-inflated model

Worked Example

Generalized Linear Mixed effects Models (GLMM)