Jump to main navigation


Tutorial 15.2 - Q-mode inference testing

12 Mar 2015

View R code for preliminaries.
> library(vegan)
> library(ggplot2)
> library(grid)
> #define my common ggplot options
> murray_opts <- opts(panel.grid.major=theme_blank(),
+                            panel.grid.minor=theme_blank(),
+                            panel.border = theme_blank(),
+                            panel.background = theme_blank(),
+                            axis.title.y=theme_text(size=15, vjust=0,angle=90),
+                            axis.text.y=theme_text(size=12),
+                            axis.title.x=theme_text(size=15, vjust=-1),
+                            axis.text.x=theme_text(size=12),
+                            axis.line = theme_segment(),
+                            plot.margin=unit(c(0.5,0.5,1,2),"lines")
+ )
Error: Use 'theme' instead. (Defunct;
last used in version 0.9.1)
> coenocline <- function(x,A0,m,r,a,g, int=T, noise=T) {
+ #x is the environmental range
+ #A0 is the maximum abundance of the species at the optimum environmental conditions
+ #m is the value of the environmental gradient that represents the optimum conditions for the species
+ #r the species range over the environmental gradient (niche width)
+ #a and g are shape parameters representing the skewness and kurtosis
+ # when a=g, the distribution is symmetrical
+ # when a>g - negative skew (large left tail)
+ # when a<g - positive skew (large right tail)
+ #int - indicates whether the responses should be rounded to integers (=T)
+ #noise - indicates whether or not random noise should be added (reflecting random sampling)  
+ #NOTE.  negative numbers converted to 0
+          b <- a/(a+g)
+          d <- (b^a)*(1-b)^g
+          cc <- (A0/d)*((((x-m)/r)+b)^a)*((1-(((x-m)/r)+b))^g)
+          if (noise) {n <- A0/10; n[n<0]<-0; cc<-cc+rnorm(length(cc),0,n)}
+          cc[cc<0] <- 0
+          cc[is.na(cc)]<-0
+          if (int) cc<-round(cc,0)
+          cc
+ }
> #plot(coenocline(0:100,40,40,20,1,1, int=T, noise=T), ylim=c(0,100))
> 
> dummy <- function(x) {
+   nms <- colnames(x)
+   ff <- eval(parse(text=paste("~",paste(nms,collapse="+"))))
+   mm <- model.matrix(ff,x)
+   nms <- colnames(mm)
+   mm <- as.matrix(mm[,-1])
+   colnames(mm) <- nms[-1]
+   mm
+ }

To assist with demonstrating Multidimensional Scaling (MDS), we will return to the fabricated species abundance data introduced in Tutorial 13.2. This data set comprises the abundances of 10 species within 10 sites located along a transect that extends in a northerly direction over a mountain range.


> set.seed(1)
> x <- seq(0,50,l=10)
> n <- 10
> sp1<-coenocline(x=x,A0=5,m=0,r=2,a=1,g=1,int=T, noise=T)
> sp2<-coenocline(x=x,A0=70,m=7,r=30,a=1,g=1,int=T, noise=T)
> sp3<-coenocline(x=x,A0=50,m=15,r=30,a=1,g=1,int=T, noise=T)
> sp4<-coenocline(x=x,A0=7,m=25,r=20,a=0.4,g=0.1,int=T, noise=T)
> sp5<-coenocline(x=x,A0=40,m=30,r=30,a=0.6,g=0.5,int=T, noise=T)
> sp6<-coenocline(x=x,A0=15,m=35,r=15,a=0.2,g=0.3,int=T, noise=T)
> sp7<-coenocline(x=x,A0=20,m=45,r=25,a=0.5,g=0.9,int=T, noise=T)
> sp8<-coenocline(x=x,A0=5,m=45,r=5,a=1,g=1,int=T, noise=T)
> sp9<-coenocline(x=x,A0=20,m=45,r=15,a=1,g=1,int=T, noise=T)
> sp10<-coenocline(x=x,A0=30,m=50,r=5,a=1,g=1,int=T, noise=T)
> X <- cbind(sp1, sp10,sp9,sp2,sp3,sp8,sp4,sp5,sp7,sp6)
> #X<-X[c(1,10,9,2,3,8,4,5,7,6),] 
> colnames(X) <- paste("Sp",1:10,sep="")
> rownames(X) <- paste("Site", c(1,10,9,2,3,8,4,5,7,6), sep="")
> X <- X[c(1,4,5,7,8,10,9,6,3,2),]
> data <- data.frame(Sites=factor(rownames(X),levels=rownames(X)), X)
Sites Sp1 Sp2 Sp3 Sp4 Sp5 Sp6 Sp7 Sp8 Sp9 Sp10
Site1 5 0 0 65 5 0 0 0 0 0
Site2 0 0 0 25 39 0 6 23 0 0
Site3 0 0 0 6 42 0 6 31 0 0
Site4 0 0 0 0 0 0 0 40 0 14
Site5 0 0 6 0 0 0 0 34 18 12
Site6 0 29 12 0 0 0 0 0 22 0
Site7 0 0 21 0 0 5 0 0 20 0
Site8 0 0 0 0 13 0 6 37 0 0
Site9 0 0 0 60 47 0 4 0 0 0
Site10 0 0 0 72 34 0 0 0 0 0
> set.seed(1)
> Site <- gl(10,1,10,lab=paste('Site',1:10, sep=""))
> Y <- matrix(c(
+ 6.1,4.2,101325,2,
+ 6.7,9.2,101352,510,
+ 6.8,8.6,101356,546,
+ 7.0,7.4,101372,758,
+ 7.2,5.8,101384,813,
+ 7.5,8.4,101395,856,
+ 7.5,0.5,101396,854,
+ 7.0,11.8,101370,734,
+ 8.4,8.2,101347,360,
+ 6.2,1.5,101345,356
+   ),10,4, byrow=TRUE)
> colnames(Y) <- c('pH','Slope', 'Pressure', 'Altitude')
> Substrate <- factor(c('Quartz','Shale','Shale','Shale','Shale','Quartz','Quartz','Shale','Quartz','Quartz'))
> enviro <- data.frame(Site,Y,Substrate)
Site pH Slope Pressure Altitude Substrate
Site1 6 4 101325 2 Quartz
Site2 7 9 101352 510 Shale
Site3 7 9 101356 546 Shale
Site4 7 7 101372 758 Shale
Site5 7 6 101384 813 Shale
Site6 8 8 101395 856 Quartz
Site7 8 0 101396 854 Quartz
Site8 7 12 101370 734 Shale
Site9 8 8 101347 360 Quartz
Site10 6 2 101345 356 Quartz

At the end of the previous tutorial, we started to explore the relationships between community patterns and concurrently measured environmental data. To that end, we also regressed the first few principle components against environmental variables in Tutorial 14.2 and Tutorial 14.3. Tutorial 14.4 took the process of incorporating environmental predictors one step further by constraining the axis rotation process by the environmental variables. Nevertheless, in each case we are essentially exploring the (associations) effects of the environmental in reduced ordinal space. That is, we are investigating how the environmental variables relate to SOME of the patterns in the community data.

There are a number of techniques available for relating community data to environmental data without having to go via ordination. Each of these techniques explores relationships between the environmental data and the community distance matrix. Since the values in a distance (dissimilarity) matrix are not independent of one another (nor the same size as a regular predictor variable), regular modelling techniques cannot be employed. Instead, all of the techniques employ permutation tests. This tutorial will illustrate a couple of the most useful techniques.

Permutation (randomization) tests

Permutation (randomization) tests proceed by calculating a test statistic of some kind. This test statistic can be anything, but is usually a measure of effect, difference, association etc and is typically analogous to those used in regular statistical analyses. Where permutation tests diverge from regular hypothesis tests is that rather than base inferences on comparing the test statistic to a mathematical distribution representing the expected density of possible values when the null hypothesis is true, the test statistic is compared to a distribution created entirely and specifically for the observed data.

If the null hypothesis is really true (and there is no effect, association or pattern etc), then any arrangement (reshuffle) of the data should be just as likely to yield a test statistic as large as the actual one observed from the original data. Therefore, by repeatedly shuffling the data (or residuals) and each time generating the test statistic, we can build up a distribution of possible test statistic values for when the null hypothesis is true. This distribution of course includes the original test statistic value.

Null hypotheses can thus be tested by calculating the proportion of times the randomized/permuted test statistics (including the original test statistic value) are greater or equal to your original test statistic. This proportion is the equivalent of a p-value. Using the data to generate a custom reference distribution frees the analysis up from many of the assumptions that restrict parametric tests (such as normality and independence). However, the p-value is a proportion of the number of permutations, its resolution is dependent on the number of permutations and thus sample size. For example, there are only 5 rows in your data set, then there are a total of 120 ways to reorder the values in one of the columns. Consequently, the smallest p-value possible is $1/120=0.008$. More generally, permutation tests are less powerful (more conservative) than regular parametric tests. Moreover, the hypotheses they test do not strictly pertain to populations. Rather they are testing how likely an outcome could occur by chance.

Permutation test are so called because they generate a test statistic for each permutation (unique combination of values) of the data. Obviously, the original data is just one possible permulation. When the number of unique combinations in which the data can be rearranged gets very large (i.e. from large data sets) it is often not necessary to survey them all. Instead, it is often sufficient to sample a random selection (hence the alternative name - randomization test).

Mantel test

The Mantel test explores the correlation between two distance matrices. The two matrices are first flattered out into vectors and the Pearson product-moment correlation coefficient ($R$) is calculated. To assess the 'significance' (probability that population $R=0$) of this correlation coefficient is tested via a permutation procedure in which one of the vectors is repeatedly shuffled (random permutations) and the correlation coefficient recalculated after each permutation. The p-value is then calculated as the proportion of permuted $R$ values that are higher than (or equal to) the observed $R$ value.

In R, the Mantel test is supported by the mantel() function in the vegan package. As input, the mantel() function takes two distance matrices.

For this exercise we will prepare the community and environmental data according to:

  • community data - square root transform followed by a Wisconsin double standardization and then a Bray-Curtis dissimilarity matrix
    > data.dist <- vegdist(wisconsin(sqrt(data[,-1])),"bray")
    > data.dist
    
             Site1   Site2   Site3   Site4
    Site2  0.67588                        
    Site3  0.76402 0.08815                
    Site4  1.00000 0.76729 0.71733        
    Site5  1.00000 0.76729 0.71950 0.43782
    Site6  1.00000 1.00000 1.00000 1.00000
    Site7  1.00000 1.00000 1.00000 1.00000
    Site8  0.85671 0.24898 0.18482 0.61339
    Site9  0.52225 0.24045 0.30462 1.00000
    Site10 0.43931 0.53961 0.60377 1.00000
             Site5   Site6   Site7   Site8
    Site2                                 
    Site3                                 
    Site4                                 
    Site5                                 
    Site6  0.56218                        
    Site7  0.56218 0.40288                
    Site8  0.71950 1.00000 1.00000        
    Site9  1.00000 1.00000 1.00000 0.48944
    Site10 1.00000 1.00000 1.00000 0.78859
             Site9
    Site2         
    Site3         
    Site4         
    Site5         
    Site6         
    Site7         
    Site8         
    Site9         
    Site10 0.29915
    
  • environmental data - standardize each of the variables to a mean of 0 and a standard deviation of 1 and then a Euclidean distance matrix
    > #since Substrate is a categorical variable, it first needs to be converted to a numeric
    > # this could be done by first converting it into dummy variables or when there are only 
    > # two levels, a simple conversion to numeric will suffice.
    > enviro1 <- within(enviro, Substrate <- as.numeric(Substrate))
    > env.dist <- vegdist(decostand(enviro1[,-1],"standardize"),"euclid")
    > env.dist
    
            1      2      3      4      5
    2  3.3230                            
    3  3.4352 0.3113                     
    4  4.2004 1.4095 1.1200              
    5  4.6243 2.1315 1.8281 0.7722       
    6  4.9177 3.1669 2.9562 2.3098 2.1404
    7  4.9080 4.0132 3.7487 3.0169 2.5116
    8  4.5387 1.4067 1.3098 1.2431 1.8378
    9  3.9401 3.2272 3.1427 3.3445 3.5238
    10 1.7178 3.0392 3.0118 3.3462 3.5746
            6      7      8      9
    2                             
    3                             
    4                             
    5                             
    6                             
    7  2.2217                     
    8  2.5329 3.9604              
    9  3.0373 3.7501 3.4269       
    10 3.9177 3.4388 4.0496 3.7782
    

Whilst the permutation aspect does free up some of the usual modelling assumptions (such as independence and normality), correlations are still inherently linear and therefore the relationships between the two matrices must also be linear. So our first step must be to explore the relationship between the two distance matrices using a scatterplot.

> plot(data.dist, env.dist)
plot of chunk scatterplot
This looks vaguely linear - at least no other simple function would be obviously more appropriate.

So lets now perform the Mantel test with Pearson's product-moment correlation coefficient and a total of 1000 permutations.

> data.mantel <- mantel(data.dist, env.dist, perm=1000)
> data.mantel
Mantel statistic based on Pearson's product-moment correlation 

Call:
mantel(xdis = data.dist, ydis = env.dist, permutations = 1000) 

Mantel statistic r: 0.559 
      Significance: 0.007 

Upper quantiles of permutations (null model):
  90%   95% 97.5%   99% 
0.225 0.305 0.358 0.465 

Based on 1000 permutations
  • The correlation between community distances and environmental distances is 0.56.
  • Of the 1000 random permutations only 7 were equal to or greater than 0.56 (including the original $R$ value. We would therefore conclude that there is a relationship between community structure and the environmental variables.
  • For those who are not fixated on p-values, a range of upper confidence limits are also provided. These limits represent the upper limits of various confidence ranges of the $R$ values. Hence the original $R$ value is outside the upper 99% limit of 0.465

We can even examine a histogram of the permutational $R$ values and visualize how the original $R$ compares.

> hist(data.mantel$perm)
> abline(v=data.mantel$statistic)
plot of chunk mantelhist
  • The probability distribution of $R$ is slightly skewed to the right rather than being a normal distribution. Nevertheless the observed $R$ value (indicated by the vertical line) is far to the right of this distribution and would therefore be considered a highly unusual outcome if the null hypothesis really was true.

Best subsets of environmental variables

The Mantel test investigates whether there is any evidence of a relationship between the patterns of community structure and and the patterns of environemntal variables. It does not single out the contributions of each of the environmental variables. In order to do so, we would need to iteratively go through all possible subsets (combinations) of the environmental variables and determine which subset correlates strongest with the community data.

The bioenv() function provides this very functionality by comparing the Spearman rank-based $R$ values from each combination of environmental predictors and selecting the combination with the highest $R$ value. This procedure is analogous to model selection in multiple regression.

Note, as input the bioenv() function expectes the community data to be a regular data matrix (not a distance matrix as it calls vegdist to create the distance matrix). Note also that the predictor variables must all be numeric (no factors). Factors should be first converted to a number (if there are only two levels) or to dummy codes.

> # convert the Substrate factor to a numeric via dummy coding
> # function definition at the start of this Tutorial
> enviro1 <- dummy(enviro[,-1])

Importantly, as the bioenv procedure fits models with multiple predictors (analogous to multiple regression), issues analogous to (multi)collinearity also apply. Specifically, when two or more predictors are correlated to one another, terms estimated later in a model tend to be underestimated as their trends have already been accounted for in earlier terms. Consequently, we need to ensure that the predictors are not correlated to one another. This can be assessed via scatterplot matricies, correlation matrices or more definitively, via variance inflation.

> library(car)
> vif(lm(1:nrow(enviro1) ~ pH+Slope+Pressure+Altitude+SubstrateShale, data=data.frame(enviro1)))
            pH          Slope 
         1.976          2.188 
      Pressure       Altitude 
        45.805         52.754 
SubstrateShale 
         5.118 
> vif(lm(1:nrow(enviro1) ~ pH+Slope+Altitude+SubstrateShale, data=data.frame(enviro1)))
            pH          Slope 
         1.968          2.117 
      Altitude SubstrateShale 
         1.830          2.636 
  • Clearly there is an issue with the inclusion of both Pressure and Altitude. It is perhaps not surprising that Pressure and Altitude are related.
  • Whilst neither Pressure nor Altitude are likely to be directly responsible for changes in community structure, Altitude is likely to be a closer conceptual proxy to a genuine ecological gradient - so Altitude will be retained at the expense of Pressure.

> # convert the Substrate factor to a numeric via dummy coding
> # function definition at the start of this Tutorial
> enviro1 <- dummy(enviro[,-1])
> #Pressure is now in column 3
> data.bioenv <- bioenv(wisconsin(sqrt(data[,-1])), decostand(enviro1[,-3],"standardize"))
> data.bioenv
Call:
bioenv(comm = wisconsin(sqrt(data[, -1])), env = decostand(enviro1[,      -3], "standardize")) 

Subset of environmental variables with best correlation to community data.

Correlations:      spearman 
Dissimilarities:   bray 

Best model has 1 parameters (max. 4 allowed):
Altitude
with correlation  0.5515 
  • In this case, the best model contained just a single predictor (Altitude) which had a very strong association (0.5515) with the community data.

Note, this technique is best viewed as an exploratory variable selection tool. It does not perform inference testing, it just provides an empirical basis on which to justify exploring trends with subsets of environmental variables.

Permutational Multivariate Analaysis of Variance - adonis

Permutational Multivariate Analysis of Variance (adonis - or sometimes referred to as permutational MANOVA or nonparameteric MANOVA), partitions the total sums of squares of the multivariate data set (distance matrix) into the sequential contributions of each of the terms of a linear predictor (set of predictor variables perhaps with interactions).

Inference testing is conducted via permuted sequential sums of squares F-statistics generated from a large number of permutations of the raw data. When the distance matrix is Euclidean, the analysis is analogous to Redundancy Analysis (RDA). However, adonis is more flexible in that it can accommodate any distance matrix.

In R, adonis is supported via the adonis() function in the vegan package. The model is expressed as a formula of which the right hand side is a linear combination of predictors and the left hand side is either a distance matrix or a raw data frame. In the latter case, the data frame will be converted into a distance matrix via the vegdist() function

For reasons given above, we must avoid building models with predictors that are correlated to one another. In this case, we will exclude Pressure from the model.

> data.adonis <- adonis(data.dist ~ pH + Slope + Altitude + Substrate, data=enviro)
> data.adonis
Call:
adonis(formula = data.dist ~ pH + Slope + Altitude + Substrate,      data = enviro) 

Terms added sequentially (first to last)

          Df SumsOfSqs MeanSqs F.Model
pH         1     0.219   0.219    1.63
Slope      1     0.560   0.560    4.16
Altitude   1     0.984   0.984    7.32
Substrate  1     0.384   0.384    2.86
Residuals  5     0.672   0.134        
Total      9     2.819                
             R2 Pr(>F)    
pH        0.078  0.221    
Slope     0.199  0.013 *  
Altitude  0.349  0.001 ***
Substrate 0.136  0.045 *  
Residuals 0.238           
Total     1.000           
---
Signif. codes:  
  0 '***' 0.001 '**' 0.01 '*' 0.05
  '.' 0.1 ' ' 1
  • Community patterns are associated with changes in slope, altitude and also substrate type.

There are numerous other techniques for testing hypotheses about the effects of individual factors on multivariate community data (notably, Analysis of Similarities or ANOSIM), yet these are less flexible (only handle single predictors) and tend to be more susceptible to heterogeneity and sample sizes.




Worked Examples

Basic statistics references
  • Legendre and Legendre
  • Quinn & Keough (2002) - Chpt 17

Mantel tests

Vare et al. (1995) measured the cover abundance of 44 plants from 24 sites so as to explore patterns in vegetation communities between these sites. They also measured a number of environmental variables (mainly concentration or various soil chemicals) from each site so as to also be able to characterise sites according to soil characteristics. Their primary interest was to investigate whether there was a correlation between the plant communities and the soil characteristics.

Download vareveg data set Download vareenv data set
Format of vareveg and vareenv data files
vareveg
SITEC.vul...Cla.phy
180.55...0.00
150.67...0.00
240.10...0.00
270.00...0.00
230.00...0.00

SITEName or number of the 14 sites
C.val, ...Cl.phyCover abundance of 44 species of plants
vareenv
SITEN...pH
1819.8...2.7
1513.4...2.8
2420.2...3.0
2720.6...2.8
2323.8...2.7

SITEName or number of the 14 sites
N, ..., pHMeasurements for 14 environmental soil chemicals from each of the sites

Open the vareveg and vareenv data sets.
Show code
> vareveg <- read.csv('../downloads/data/vareveg.csv')
> vareenv <- read.csv('../downloads/data/vareenv.csv')
  1. Briefly explore the two data sets and determine the appropriate sorts of standardizations and distance matrices
    1. Vegetation data
      Show code
      > #species means
      > apply(vareveg[,-1],2, mean, na.rm=TRUE)
      
          C.vul   Emp.nig   Led.pal   Vac.myr 
       1.877917  6.332917  0.349583  2.112917 
        Vac.vit   Pin.syl   Des.fle   Bet.pub 
      11.459583  0.171250  0.233333  0.012083 
        Vac.uli   Dip.mon    Dic.sp   Dic.fus 
       0.634167  0.135000  1.687500  4.730000 
        Dic.pol   Hyl.spl   Ple.sch   Pol.pil 
       0.252500  0.751667 15.748750  0.025417 
        Pol.jun   Pol.com   Poh.nut   Pti.cil 
       0.577083  0.029583  0.109167  0.583750 
        Bar.lyc   Cla.arb   Cla.ran   Cla.ste 
       0.132917 10.627083 16.196250 20.279583 
        Cla.unc   Cla.coc   Cla.cor   Cla.gra 
       2.345000  0.116250  0.259167  0.214167 
        Cla.fim   Cla.cri   Cla.chl   Cla.bot 
       0.165000  0.311250  0.048333  0.019583 
        Cla.ama    Cla.sp   Cet.eri   Cet.isl 
       0.005833  0.021667  0.150000  0.084583 
        Cet.niv   Nep.arc    Ste.sp   Pel.aph 
       0.493750  0.219167  0.730000  0.031667 
        Ich.eri   Cla.cer   Cla.def   Cla.phy 
       0.009167  0.004167  0.426250  0.033333 
      
      > #species maximums
      > apply(vareveg[,-1],2, max)
      
        C.vul Emp.nig Led.pal Vac.myr Vac.vit 
        24.13   16.00    4.00   18.27   25.00 
      Pin.syl Des.fle Bet.pub Vac.uli Dip.mon 
         1.20    3.70    0.25    8.10    2.07 
       Dic.sp Dic.fus Dic.pol Hyl.spl Ple.sch 
        23.43   37.07    3.00    9.97   70.03 
      Pol.pil Pol.jun Pol.com Poh.nut Pti.cil 
         0.25    6.98    0.25    0.32   10.00 
      Bar.lyc Cla.arb Cla.ran Cla.ste Cla.unc 
         3.00   39.00   59.00   84.30   23.68 
      Cla.coc Cla.cor Cla.gra Cla.fim Cla.cri 
         0.25    1.42    0.50    0.25    1.78 
      Cla.chl Cla.bot Cla.ama  Cla.sp Cet.eri 
         0.25    0.25    0.08    0.25    0.78 
      Cet.isl Cet.niv Nep.arc  Ste.sp Pel.aph 
         0.67   10.03    4.87   10.28    0.33 
      Ich.eri Cla.cer Cla.def Cla.phy 
         0.10    0.05    1.97    0.25 
      
      > #species sums
      > apply(vareveg[,-1],2, sum, na.rm=TRUE)
      
        C.vul Emp.nig Led.pal Vac.myr Vac.vit 
        45.07  151.99    8.39   50.71  275.03 
      Pin.syl Des.fle Bet.pub Vac.uli Dip.mon 
         4.11    5.60    0.29   15.22    3.24 
       Dic.sp Dic.fus Dic.pol Hyl.spl Ple.sch 
        40.50  113.52    6.06   18.04  377.97 
      Pol.pil Pol.jun Pol.com Poh.nut Pti.cil 
         0.61   13.85    0.71    2.62   14.01 
      Bar.lyc Cla.arb Cla.ran Cla.ste Cla.unc 
         3.19  255.05  388.71  486.71   56.28 
      Cla.coc Cla.cor Cla.gra Cla.fim Cla.cri 
         2.79    6.22    5.14    3.96    7.47 
      Cla.chl Cla.bot Cla.ama  Cla.sp Cet.eri 
         1.16    0.47    0.14    0.52    3.60 
      Cet.isl Cet.niv Nep.arc  Ste.sp Pel.aph 
         2.03   11.85    5.26   17.52    0.76 
      Ich.eri Cla.cer Cla.def Cla.phy 
         0.22    0.10   10.23    0.80 
      
      > #species variance
      > apply(vareveg[,-1],2, var, na.rm=TRUE)
      
          C.vul   Emp.nig   Led.pal   Vac.myr 
      2.488e+01 2.278e+01 9.328e-01 2.322e+01 
        Vac.vit   Pin.syl   Des.fle   Bet.pub 
      3.882e+01 7.510e-02 5.834e-01 2.600e-03 
        Vac.uli   Dip.mon    Dic.sp   Dic.fus 
      2.903e+00 1.847e-01 2.953e+01 8.290e+01 
        Dic.pol   Hyl.spl   Ple.sch   Pol.pil 
      4.601e-01 5.763e+00 3.570e+02 3.774e-03 
        Pol.jun   Pol.com   Poh.nut   Pti.cil 
      2.117e+00 4.700e-03 1.031e-02 4.165e+00 
        Bar.lyc   Cla.arb   Cla.ran   Cla.ste 
      3.734e-01 1.124e+02 2.223e+02 8.595e+02 
        Cla.unc   Cla.coc   Cla.cor   Cla.gra 
      2.468e+01 7.885e-03 7.913e-02 1.254e-02 
        Cla.fim   Cla.cri   Cla.chl   Cla.bot 
      6.478e-03 1.823e-01 6.797e-03 2.726e-03 
        Cla.ama    Cla.sp   Cet.eri   Cet.isl 
      3.210e-04 2.693e-03 4.372e-02 2.404e-02 
        Cet.niv   Nep.arc    Ste.sp   Pel.aph 
      4.149e+00 9.838e-01 4.325e+00 6.867e-03 
        Ich.eri   Cla.cer   Cla.def   Cla.phy 
      6.167e-04 1.471e-04 2.595e-01 7.101e-03 
      
      What combination of standardization and distance matrix would you recommend?
    2. Vegetation data
      Show code
      > #environmental variables means
      > apply(vareenv[,-1],2, mean, variables.rm=TRUE)
      
             N        P        K       Ca 
       22.3833  45.0792 162.9292 569.6625 
            Mg        S       Al       Fe 
       87.4583  37.1917 142.4750  49.6125 
            Mn       Zn       Mo Baresoil 
       49.3292   7.5958   0.3958  22.9492 
      Humdepth       pH 
        2.2000   2.9333 
      
      > #environmental variables maximums
      > apply(vareenv[,-1],2, max)
      
             N        P        K       Ca 
          33.1     73.5    313.8   1169.7 
            Mg        S       Al       Fe 
         209.1     60.2    435.1    204.4 
            Mn       Zn       Mo Baresoil 
         132.0     16.8      1.1     56.9 
      Humdepth       pH 
           3.8      3.6 
      
      > #variable sums
      > apply(vareenv[,-1],2, sum, na.rm=TRUE)
      
             N        P        K       Ca 
         537.2   1081.9   3910.3  13671.9 
            Mg        S       Al       Fe 
        2099.0    892.6   3419.4   1190.7 
            Mn       Zn       Mo Baresoil 
        1183.9    182.3      9.5    550.8 
      Humdepth       pH 
          52.8     70.4 
      
      > #environmental variables variance
      > apply(vareenv[,-1],2, var, na.rm=TRUE)
      
              N         P         K        Ca 
      3.056e+01 2.234e+02 4.205e+03 5.933e+04 
             Mg         S        Al        Fe 
      1.682e+03 1.361e+02 1.496e+04 3.654e+03 
             Mn        Zn        Mo  Baresoil 
      1.150e+03 8.904e+00 5.759e-02 2.703e+02 
       Humdepth        pH 
      4.417e-01 4.580e-02 
      
      What combination of standardization and distance matrix would you recommend?
  2. Actually there are not incorrect answers to the above questions. Each combination of standardizations and distance matrices will emphasize a different aspect of the multivariate data sets. Indeed one of the attractions of Q-mode analyses is this very flexibility. Perform the standardizations/distance matrices as indicated above.
    Show code
    > library(vegan)
    > #species 
    > vareveg.std <- wisconsin(vareveg[,-1])
    > vareveg.dist <- vegdist(vareveg.std, "bray")
    > #environmental variables 
    > vareenv.std <- decostand(vareenv[,-1], "standardize")
    > vareenv.dist <- vegdist(vareenv.std, "euc")
    
  3. Perform a Mantel test to calculate the correlation between the two matricies (vegetation and soil).
    Show code
    > library(vegan)
    > mantel(vareveg.dist,vareenv.dist)
    
    Mantel statistic based on Pearson's product-moment correlation 
    
    Call:
    mantel(xdis = vareveg.dist, ydis = vareenv.dist) 
    
    Mantel statistic r: 0.427 
          Significance: 0.001 
    
    Upper quantiles of permutations (null model):
      90%   95% 97.5%   99% 
    0.160 0.202 0.230 0.276 
    
    Based on 999 permutations
    
    1. What was the R value?
      Show code
      > library(vegan)
      > mantel(vareveg.dist,vareenv.dist)$statistic
      
      [1] 0.4268
      
    2. Was this significant?
  4. Generate a correlogram (mutivariate correlation plot)
    Show code
    > par(mar=c(4,4,0,0))
    > plot(vareenv.dist,vareveg.dist, ann=F, axes=F, type="n")
    > points(vareenv.dist,vareveg.dist, pch=16)
    > axis(1)
    > axis(2, las=1)
    > mtext("Vegetation distances",2,line=3)
    > mtext("Soil chemistry distances",1,line=3)
    > box(bty="l")
    
    plot of chunk ws15.2Q1.4

Adonis - Permutational Multivariate analysis of variance

In the previous question we established that there was an association between the vegetation community and the soil chemistry. In this question we are going to delve a little deeper into that association. Now there are numerous ways that we could do this, so we are going to just focus on two of those.

  1. perform a permutational multivariate analysis of variance (adonis) of the vegetation community against concentrations of certain soil chemicals
  2. perform a permutational multivariate analysis of variance (adonis) of the vegetation community against the PCA axes scores from the soil chemistry data

  1. Since the permulational multivariate analysis of variance partitions variance sequentially, it is important that all predictors are uncorrelated to one another ( or the effects of those later in the model will be under-estimated) - the (multi)collinearity issue.

    Lets start by exploring the relationships amongst the soil chemistry variables

    Show code
    > pairs(vareenv[,-1])
    
    plot of chunk ws15.2Q2.1
    Clearly Phosphorus, Potassium, Calcium, Magnessium, Sulfure and Zinc are correlated to one another and thus cannot all be in the model. Similarly, Iron, Aluminium, percent of baresoil, pH and humus depth are correlated to each other.
  2. We might elect to include one as a representative from each of these groups along with Nitrogen in a model. Ideally we would base our selection on some sort of meaningful criteria. In the absence of anything even vaguely sensible around here, we will try the combination of Nitrogen, Calcium and Iron. Lets try it.
    Show code
    > adonis(vareveg.dist~N+Ca+Fe, data=vareenv)
    
    Call:
    adonis(formula = vareveg.dist ~ N + Ca + Fe, data = vareenv) 
    
    Terms added sequentially (first to last)
    
              Df SumsOfSqs MeanSqs F.Model
    N          1      0.27   0.267    1.70
    Ca         1      0.38   0.381    2.42
    Fe         1      0.30   0.303    1.93
    Residuals 20      3.14   0.157        
    Total     23      4.09                
                 R2 Pr(>F)   
    N         0.065  0.060 . 
    Ca        0.093  0.005 **
    Fe        0.074  0.034 * 
    Residuals 0.768          
    Total     1.000          
    ---
    Signif. codes:  
      0 '***' 0.001 '**' 0.01 '*' 0.05
      '.' 0.1 ' ' 1
    
  3. The alternative approach introduced earlier is to create a triplet of soil chemistry predictors from PCA. Lets have a try
    Show code
    > vareenv.pca <- rda(vareenv[,-1], scale=TRUE)
    > adonis(vareveg.dist~scores(vareenv.pca,1)$sites+scores(vareenv.pca,2)$sites+scores(vareenv.pca,3)$sites)
    
    Call:
    adonis(formula = vareveg.dist ~ scores(vareenv.pca, 1)$sites +      scores(vareenv.pca, 2)$sites + scores(vareenv.pca, 3)$sites) 
    
    Terms added sequentially (first to last)
    
                                 Df
    scores(vareenv.pca, 1)$sites  1
    scores(vareenv.pca, 2)$sites  1
    scores(vareenv.pca, 3)$sites  1
    Residuals                    20
    Total                        23
                                 SumsOfSqs
    scores(vareenv.pca, 1)$sites      0.46
    scores(vareenv.pca, 2)$sites      0.41
    scores(vareenv.pca, 3)$sites      0.25
    Residuals                         2.98
    Total                             4.09
                                 MeanSqs
    scores(vareenv.pca, 1)$sites   0.461
    scores(vareenv.pca, 2)$sites   0.406
    scores(vareenv.pca, 3)$sites   0.249
    Residuals                      0.149
    Total                               
                                 F.Model
    scores(vareenv.pca, 1)$sites    3.09
    scores(vareenv.pca, 2)$sites    2.72
    scores(vareenv.pca, 3)$sites    1.67
    Residuals                           
    Total                               
                                    R2
    scores(vareenv.pca, 1)$sites 0.113
    scores(vareenv.pca, 2)$sites 0.099
    scores(vareenv.pca, 3)$sites 0.061
    Residuals                    0.728
    Total                        1.000
                                 Pr(>F)    
    scores(vareenv.pca, 1)$sites  0.001 ***
    scores(vareenv.pca, 2)$sites  0.002 ** 
    scores(vareenv.pca, 3)$sites  0.050 *  
    Residuals                              
    Total                                  
    ---
    Signif. codes:  
      0 '***' 0.001 '**' 0.01 '*' 0.05
      '.' 0.1 ' ' 1
    
    Note this has not necessarily told us a great deal more. All we have established is that there may well be three important gradients underlying the chemical ecosystem and that these many influence the vegetation communities.

Permutational Multivariate analysis of variance

Jongman et al. (1987) presented a data set from a study in which the cover abundance of 30 plant species were measured on 20 rangeland dune sites. They also indicated what the form of management each site experienced (either biological farming, hobby farming, nature conservation management or standard farming. The major intension of the study was to determine whether the vegetation communities differed between the alternative management practices.

Download dune data set
Format of dune.csv data file
MANAGEMENTBelper..Brohor
BF3..4
SF0..0
SF2..3
SF0..0
HF0..0
........
trout

Open the dune data set.
Show code
> dune <- read.csv('../downloads/data/dune.csv')
> dune
   MANAGEMENT Belper Empnig Junbuf
1          BF      3      0      0
2          SF      0      0      3
3          SF      2      0      0
4          SF      0      0      0
5          HF      0      0      0
6          SF      0      0      0
7          HF      0      0      0
8          HF      2      0      0
9          NM      0      0      0
10         NM      0      0      0
11         BF      2      0      0
12         BF      0      0      0
13         HF      0      0      4
14         NM      2      0      0
15         SF      2      0      0
16         NM      0      0      0
17         NM      0      0      0
18         NM      0      2      0
19         SF      0      0      4
20         HF      0      0      2
   Junart Airpra Elepal Rumace Viclat
1       0      0      0      0      0
2       0      0      0      0      0
3       0      0      0      0      0
4       3      0      8      0      0
5       0      0      0      6      0
6       0      0      0      0      0
7       4      0      4      0      0
8       0      0      0      5      0
9       0      2      0      0      0
10      3      0      5      0      0
11      0      0      0      0      1
12      0      0      0      0      2
13      4      0      0      2      0
14      0      0      0      0      1
15      0      0      0      0      0
16      4      0      4      0      0
17      0      0      4      0      0
18      0      3      0      0      0
19      0      0      0      2      0
20      0      0      0      3      0
   Brarut Ranfla Cirarv Hyprad Leoaut
1       0      0      0      0      5
2       0      2      0      0      2
3       2      0      2      0      2
4       4      2      0      0      0
5       6      0      0      0      3
6       0      0      0      0      0
7       2      2      0      0      3
8       2      0      0      0      3
9       0      0      0      2      2
10      4      2      0      0      2
11      2      0      0      0      3
12      4      0      0      2      5
13      2      0      0      0      2
14      6      0      0      0      5
15      2      0      0      0      2
16      4      4      0      0      2
17      0      2      0      0      2
18      3      0      0      5      6
19      4      0      0      0      2
20      2      0      0      0      3
   Potpal Poapra Calcus Tripra Trirep
1       0      4      0      0      5
2       0      2      0      0      2
3       0      4      0      0      1
4       0      0      3      0      0
5       0      3      0      5      5
6       0      4      0      0      0
7       0      4      0      0      2
8       0      2      0      2      2
9       0      1      0      0      0
10      2      0      0      0      1
11      0      4      0      0      6
12      0      4      0      0      3
13      0      4      0      0      3
14      0      3      0      0      2
15      0      5      0      0      2
16      0      0      3      0      0
17      2      0      4      0      6
18      0      0      0      0      2
19      0      0      0      0      3
20      0      4      0      2      2
   Antodo Salrep Achmil Poatri Chealb
1       0      0      3      7      0
2       0      0      0      9      1
3       0      0      0      5      0
4       0      0      0      2      0
5       3      0      2      4      0
6       0      0      1      2      0
7       0      0      0      4      0
8       4      0      2      6      0
9       4      0      2      0      0
10      0      0      0      0      0
11      4      0      4      4      0
12      0      0      0      0      0
13      0      0      0      5      0
14      0      3      0      0      0
15      0      0      0      6      0
16      0      5      0      0      0
17      0      0      0      0      0
18      4      3      0      0      0
19      0      0      0      4      0
20      2      0      2      5      0
   Elyrep Sagpro Plalan Agrsto Lolper
1       4      0      0      0      5
2       0      2      0      5      0
3       4      5      0      8      5
4       0      0      0      7      0
5       0      0      5      0      6
6       4      0      0      0      7
7       0      2      0      4      4
8       4      0      5      0      2
9       0      0      2      0      0
10      0      0      0      4      0
11      0      0      3      0      6
12      0      2      3      0      7
13      6      2      0      3      2
14      0      0      3      0      2
15      4      0      0      4      6
16      0      0      0      5      0
17      0      0      0      4      0
18      0      3      0      0      0
19      0      4      0      4      0
20      0      0      5      0      6
   Alogen Brohor
1       2      4
2       5      0
3       2      3
4       4      0
5       0      0
6       0      0
7       5      0
8       0      2
9       0      0
10      0      0
11      0      4
12      0      0
13      3      0
14      0      0
15      7      0
16      0      0
17      0      0
18      0      0
19      8      0
20      0      2
  1. Briefly explore the dune data set and determine the appropriate sort of standardization and distance index
    Show code
    > #species means
    > apply(dune[,-1],2, mean, na.rm=TRUE)
    
    Belper Empnig Junbuf Junart Airpra 
      0.65   0.10   0.65   0.90   0.25 
    Elepal Rumace Viclat Brarut Ranfla 
      1.25   0.90   0.20   2.45   0.70 
    Cirarv Hyprad Leoaut Potpal Poapra 
      0.10   0.45   2.70   0.20   2.40 
    Calcus Tripra Trirep Antodo Salrep 
      0.50   0.45   2.35   1.05   0.55 
    Achmil Poatri Chealb Elyrep Sagpro 
      0.80   3.15   0.05   1.30   1.00 
    Plalan Agrsto Lolper Alogen Brohor 
      1.30   2.40   2.90   1.80   0.75 
    
    > #species maximums
    > apply(dune[,-1],2, max)
    
    Belper Empnig Junbuf Junart Airpra 
         3      2      4      4      3 
    Elepal Rumace Viclat Brarut Ranfla 
         8      6      2      6      4 
    Cirarv Hyprad Leoaut Potpal Poapra 
         2      5      6      2      5 
    Calcus Tripra Trirep Antodo Salrep 
         4      5      6      4      5 
    Achmil Poatri Chealb Elyrep Sagpro 
         4      9      1      6      5 
    Plalan Agrsto Lolper Alogen Brohor 
         5      8      7      8      4 
    
    > #species sums
    > apply(dune[,-1],2, sum, na.rm=TRUE)
    
    Belper Empnig Junbuf Junart Airpra 
        13      2     13     18      5 
    Elepal Rumace Viclat Brarut Ranfla 
        25     18      4     49     14 
    Cirarv Hyprad Leoaut Potpal Poapra 
         2      9     54      4     48 
    Calcus Tripra Trirep Antodo Salrep 
        10      9     47     21     11 
    Achmil Poatri Chealb Elyrep Sagpro 
        16     63      1     26     20 
    Plalan Agrsto Lolper Alogen Brohor 
        26     48     58     36     15 
    
    > #species variance
    > apply(dune[,-1],2, var, na.rm=TRUE)
    
    Belper Empnig Junbuf Junart Airpra 
    1.0816 0.2000 1.9237 2.6211 0.6184 
    Elepal Rumace Viclat Brarut Ranfla 
    5.5658 3.2526 0.2737 3.6289 1.3789 
    Cirarv Hyprad Leoaut Potpal Poapra 
    0.2000 1.5237 2.4316 0.3789 3.4105 
    Calcus Tripra Trirep Antodo Salrep 
    1.5263 1.5237 3.6079 2.8921 1.9447 
    Achmil Poatri Chealb Elyrep Sagpro 
    1.5368 7.9237 0.0500 4.3263 2.4211 
    Plalan Agrsto Lolper Alogen Brohor 
    3.8000 7.2000 7.9895 6.9053 1.9868 
    
    Whilst the means, maximums and variances are mostly fairly similar, there are some species such as Chealb and Empnig that are an order of magnitude less abundant and variable than the bulk of the species. Therefore a Wisconsin double standardization would still be appropriate. A Bray-Curtis dissimilarity would thence be a good choice.
  2. Perform the aforementioned standardization and distance measure procedures.
    Show code
    > library(vegan)
    > dune.dist <- vegdist(wisconsin(dune[,-1]), "bray")
    
  3. Now perform the permutational multivariate analysis of variance relating the dune vegetation communities to the management practices.
    Show code
    > dune.adonis<-adonis(dune.dist~dune[,1])
    > dune.adonis
    
    Call:
    adonis(formula = dune.dist ~ dune[, 1]) 
    
    Terms added sequentially (first to last)
    
              Df SumsOfSqs MeanSqs F.Model
    dune[, 1]  3      1.42   0.473    2.31
    Residuals 16      3.28   0.205        
    Total     19      4.70                
                 R2 Pr(>F)   
    dune[, 1] 0.302  0.006 **
    Residuals 0.698          
    Total     1.000          
    ---
    Signif. codes:  
      0 '***' 0.001 '**' 0.01 '*' 0.05
      '.' 0.1 ' ' 1
    
  4. The above analysis indicates that the dune vegetation communities do indeed differ between the different management practices. However, we may wish to explore whether the alternative farming practices (BF: biological farming, HF: hobby farming, and NM: natural conservation management) each differ in communities from the standard farming practices (SF).

    We can do this by generating dummy codes for the managemnet variable and defining treatment contrasts relative to the SF level.

    Show code
    > management <-factor(dune$MANAGEMENT, levels=c("SF","BF","HF","NM"))
    > mm <- model.matrix(~management)
    > colnames(mm) <-gsub("management","",colnames(mm))
    > mm <- data.frame(mm)
    > dune.adonis<-adonis(dune.dist~BF+HF+NM, data=mm)
    > dune.adonis
    
    Call:
    adonis(formula = dune.dist ~ BF + HF + NM, data = mm) 
    
    Terms added sequentially (first to last)
    
              Df SumsOfSqs MeanSqs F.Model
    BF         1      0.33   0.334    1.63
    HF         1      0.38   0.376    1.84
    NM         1      0.71   0.709    3.47
    Residuals 16      3.28   0.205        
    Total     19      4.70                
                 R2 Pr(>F)   
    BF        0.071  0.116   
    HF        0.080  0.097 . 
    NM        0.151  0.010 **
    Residuals 0.698          
    Total     1.000          
    ---
    Signif. codes:  
      0 '***' 0.001 '**' 0.01 '*' 0.05
      '.' 0.1 ' ' 1
    

Permutational multivariate analysis of variance

Mac Nally (1989) studied geographic variation in forest bird communities. His data set consists of the maximum abundance for 102 bird species from 37 sites that where further classified into five different forest types (Gippsland manna gum, montane forest, woodland, box-ironbark and river redgum and mixed forest). He was primarily interested in determining whether the bird assemblages differed between forest types.

In Tutorial 15.1 we explored the bird communities using non-metric multidimensional scaling. We then overlay the environmental habitat variable with a permutation test that explored the association of the habitat levels with the ordination axes. That is, we investigate the degree to which a treatment (habitat level) aligns with the axes.

Alternatively, we could use a multivariate analysis of variance approach to investigate the effects of habitat on the bird community structure.

Download the macnally data set
Format of macnally_full.csv data file
SITEHABITATV1GST..
Reedy LakeMixed3.4..
PearcedaleGippland Manna Gum3.4..
VarneetGippland Manna Gum8.4..
CranbourneGippland Manna Gum3.0..
LysterfieldMixed5.6..
........
trout

Open the macnally_full data set.
Show code
> macnally <- read.csv('../downloads/data/macnally_full.csv')
> macnally
                         HABITAT  GST
Reedy Lake                 Mixed  3.4
Pearcedale           Gipps.Manna  3.4
Warneet              Gipps.Manna  8.4
Cranbourne           Gipps.Manna  3.0
Lysterfield                Mixed  5.6
Red Hill                   Mixed  8.1
Devilbend                  Mixed  8.3
Olinda                     Mixed  4.6
Fern Tree Gum     Montane Forest  3.2
Sherwin       Foothills Woodland  4.6
Heathcote Ju      Montane Forest  3.7
Warburton         Montane Forest  3.8
Millgrove                  Mixed  5.4
Ben Cairn                  Mixed  3.1
Panton Gap        Montane Forest  3.8
OShannassy                 Mixed  9.6
Ghin Ghin                  Mixed  3.4
Minto                      Mixed  5.6
Hawke                      Mixed  1.7
St Andrews    Foothills Woodland  4.7
Nepean        Foothills Woodland 14.0
Cape Schanck               Mixed  6.0
Balnarring                 Mixed  4.1
Bittern              Gipps.Manna  6.5
Bailieston          Box-Ironbark  6.5
Donna Buang                Mixed  1.5
Upper Yarra                Mixed  4.7
Gembrook                   Mixed  7.5
Arcadia            River Red Gum  3.1
Undera             River Red Gum  2.7
Coomboona          River Red Gum  4.4
Toolamba           River Red Gum  3.0
Rushworth           Box-Ironbark  2.1
Sayers              Box-Ironbark  2.6
Waranga                    Mixed  3.0
Costerfield         Box-Ironbark  7.1
Tallarook     Foothills Woodland  4.3
              EYR   GF  BTH GWH WTTR
Reedy Lake    0.0  0.0  0.0 0.0  0.0
Pearcedale    9.2  0.0  0.0 0.0  0.0
Warneet       3.8  0.7  2.8 0.0  0.0
Cranbourne    5.0  0.0  5.0 2.0  0.0
Lysterfield   5.6 12.9 12.2 9.5  2.1
Red Hill      4.1 10.9 24.5 5.6  6.7
Devilbend     7.1  6.9 29.1 4.2  2.0
Olinda        5.3 11.1 28.2 3.9  6.5
Fern Tree Gum 5.2  8.3 18.2 3.8  4.2
Sherwin       1.2  4.6  6.5 2.3  5.2
Heathcote Ju  2.5  6.3 24.9 2.8  7.4
Warburton     6.5 11.1 36.1 6.2  8.5
Millgrove     6.5 11.9 19.6 3.3  8.6
Ben Cairn     9.3 11.1 25.9 9.3  8.3
Panton Gap    3.8 10.3 34.6 7.9  4.8
OShannassy    4.0  5.4 34.9 7.0  5.1
Ghin Ghin     2.7  9.1 16.1 1.3  3.2
Minto         3.3 13.3 28.0 7.0  8.3
Hawke         2.6  5.5 16.0 4.3  6.7
St Andrews    3.6  6.0 25.2 3.7  7.5
Nepean        5.6  5.5 20.0 3.0  6.6
Cape Schanck  4.9  4.9 16.2 3.4  2.6
Balnarring    4.9 10.7 21.2 3.9  0.0
Bittern       9.7  7.8 14.4 5.2  0.0
Bailieston    2.5  5.1  5.6 4.3  5.7
Donna Buang   0.0  2.2  9.6 6.7  3.0
Upper Yarra   3.1  7.0 17.1 8.3 12.8
Gembrook      7.5 12.7 16.4 4.7  6.4
Arcadia       0.0  1.2  0.0 1.2  0.0
Undera        0.0  2.2  0.0 1.3  6.5
Coomboona     0.0  2.1  0.0 0.0  3.3
Toolamba      0.0  0.5  0.0 0.8  0.0
Rushworth     1.1  3.2  1.8 0.5  4.8
Sayers        0.0  1.1  7.5 1.6  5.2
Waranga       1.6  1.5  3.0 0.0  3.0
Costerfield   2.2  4.5  9.0 2.7  6.0
Tallarook     2.9  8.7 14.4 2.9  5.8
              WEHE WNHE  SFW WBSW   CR
Reedy Lake     0.0 11.9  0.4  0.0  1.1
Pearcedale     0.0 11.5  8.3 12.6  0.0
Warneet       10.7 12.3  4.9 10.7  0.0
Cranbourne     3.0 10.0  6.9 12.0  0.0
Lysterfield    7.9 28.6  9.2  5.0 19.1
Red Hill       9.4  6.7  0.0  8.9 12.1
Devilbend      7.1 27.4 13.1  2.8  0.0
Olinda         2.6 10.9  3.1  8.6  9.3
Fern Tree Gum  2.8  9.0  3.8  5.6 14.1
Sherwin        0.6  3.6  3.8  3.0  7.5
Heathcote Ju   1.3  4.7  5.5  9.5  5.7
Warburton      2.3 25.4  8.2  5.9 10.5
Millgrove      2.5 11.9  4.3  5.4 10.8
Ben Cairn      2.8  2.8  2.8  8.3 18.5
Panton Gap     2.9  3.7  4.8  7.2  5.9
OShannassy     2.6  6.4  3.9 11.3 11.6
Ghin Ghin      4.7  0.0 22.0  5.8  7.4
Minto          7.0 38.9 10.5  7.0 14.0
Hawke          3.5  5.9  6.7 10.0  3.7
St Andrews     4.7 10.0  0.0  0.0  4.0
Nepean         7.0  3.3  7.0 10.0  4.7
Cape Schanck   2.8  9.4  6.6  7.8  5.1
Balnarring     5.1  2.9 12.1  6.1  0.0
Bittern       11.5 12.5 20.7  4.9  0.0
Bailieston     6.2  6.2  1.2  0.0  0.0
Donna Buang    8.1  0.0  0.0  7.3  8.1
Upper Yarra    1.3  6.4  2.3  5.4  5.4
Gembrook       1.6  8.9  9.3  6.4  4.8
Arcadia        0.0  1.8  0.7  0.0  0.0
Undera         0.0  0.0  6.5  0.0  0.0
Coomboona      0.0  0.0  0.8  0.0  0.0
Toolamba       0.0  0.0  1.6  0.0  0.0
Rushworth      0.9  5.3  4.8  0.0  1.1
Sayers         3.6  6.9  6.7  0.0  2.7
Waranga        0.0 14.5  6.7  0.0  0.7
Costerfield    2.5  7.7  9.5  0.0  7.7
Tallarook      2.8 11.1  2.9  0.0  3.8
               LK  RWB AUR STTH   LR
Reedy Lake    3.8  9.7 0.0  0.0  4.8
Pearcedale    0.5 11.6 0.0  0.0  3.7
Warneet       1.9 16.6 2.3  2.8  5.5
Cranbourne    2.0 11.0 1.5  0.0 11.0
Lysterfield   3.6  5.7 8.8  7.0  1.6
Red Hill      6.7  2.7 0.0 16.8  3.4
Devilbend     2.8  2.4 2.8 13.9  0.0
Olinda        3.8  0.6 1.3 10.2  0.0
Fern Tree Gum 3.2  0.0 0.0 12.2  0.6
Sherwin       2.4  0.6 0.0 11.3  5.8
Heathcote Ju  2.9  0.0 1.8 12.0  0.0
Warburton     3.1  9.8 1.6  7.6 15.0
Millgrove     6.5  2.7 2.0  8.6  0.0
Ben Cairn     3.1  0.0 3.1 12.0  3.3
Panton Gap    3.1  0.6 3.8 17.3  2.4
OShannassy    2.3  0.0 2.3  7.8  0.0
Ghin Ghin     4.5  0.0 0.0  8.1  2.7
Minto         5.2  1.7 5.2 25.2  0.0
Hawke         2.1  0.5 1.5  9.0  4.8
St Andrews    5.1  2.8 3.7 15.8  3.4
Nepean        3.3  2.1 3.7 12.0  2.2
Cape Schanck  5.2 21.3 0.0  0.0  4.3
Balnarring    2.7  0.0 0.0  4.9 16.5
Bittern       0.0 16.1 5.2  0.0  0.0
Bailieston    1.6  5.0 4.1  9.8  0.0
Donna Buang   1.5  2.2 0.7  5.2  0.0
Upper Yarra   2.4  0.6 2.3  6.4  0.0
Gembrook      3.6 14.5 4.7 24.3  2.4
Arcadia       1.8  0.0 2.5  0.0  2.7
Undera        0.0  0.0 2.2  7.5  3.1
Coomboona     2.8  0.0 2.2  3.1  1.7
Toolamba      2.0  0.0 2.5  0.0  2.5
Rushworth     1.1 26.3 1.6  3.2  0.0
Sayers        1.6  8.0 1.6  7.5  2.7
Waranga       4.0 23.0 1.6  0.0  8.9
Costerfield   2.2  8.9 1.9  9.3  1.1
Tallarook     2.9  2.9 1.9  4.6 10.3
              WPHE  YTH  ER PCU  ESP
Reedy Lake    27.3  0.0 5.1 0.0  0.0
Pearcedale    27.6  0.0 2.7 0.0  3.7
Warneet       27.5  0.0 5.3 0.0  0.0
Cranbourne    20.0  0.0 2.1 0.0  2.0
Lysterfield    0.0  0.0 1.4 0.0  3.5
Red Hill       0.0  0.0 2.2 0.0  3.4
Devilbend     16.7  0.0 0.0 0.0  5.5
Olinda         0.0  0.0 1.2 0.0  5.1
Fern Tree Gum  0.0  0.0 1.3 2.8  7.1
Sherwin        0.0  9.6 2.3 2.9  0.6
Heathcote Ju   0.0  0.0 0.0 2.8  0.9
Warburton      0.0  0.0 0.0 1.8  7.6
Millgrove      0.0  0.0 6.5 2.5  5.4
Ben Cairn      0.0  0.0 0.0 2.5  7.4
Panton Gap     0.0  0.0 0.0 3.1  9.2
OShannassy     0.0  0.0 0.0 1.5  3.1
Ghin Ghin      8.4  8.4 3.4 0.0  0.0
Minto         15.4  0.0 0.0 0.0  3.3
Hawke          0.0  0.0 0.0 2.1  3.7
St Andrews     0.0  9.0 0.0 3.7  5.6
Nepean         0.0  0.0 3.7 0.0  4.0
Cape Schanck   0.0  0.0 6.4 0.0  4.6
Balnarring     0.0  0.0 9.1 0.0  3.9
Bittern       27.7  0.0 2.3 0.0  0.0
Bailieston     0.0  8.7 0.0 0.0  0.0
Donna Buang    0.0  0.0 0.0 1.5  7.4
Upper Yarra    0.0  0.0 0.0 1.3  2.3
Gembrook       0.0  0.0 3.6 0.0 26.6
Arcadia       27.6  0.0 4.3 3.7  0.0
Undera        13.5 11.5 2.0 0.0  0.0
Coomboona     13.9  5.6 4.6 1.1  0.0
Toolamba      16.0  0.0 5.0 0.0  0.0
Rushworth      0.0 10.7 3.2 0.0  0.0
Sayers         0.0 20.2 1.1 0.0  0.0
Waranga       25.3  2.2 3.4 0.0  0.0
Costerfield    0.0 15.8 1.1 0.0  0.0
Tallarook      0.0  2.9 0.0 0.0  5.8
              SCR RBFT BFCS WAG WWCH
Reedy Lake    0.0  0.0  0.6 1.9  0.0
Pearcedale    0.0  1.1  1.1 3.4  0.0
Warneet       0.0  0.0  1.5 2.1  0.0
Cranbourne    0.0  5.0  1.4 3.4  0.0
Lysterfield   0.7  0.0  2.7 0.0  0.0
Red Hill      0.0  0.7  2.0 0.0  0.0
Devilbend     0.0  0.0  3.6 0.0  0.0
Olinda        0.0  0.7  0.0 0.0  0.0
Fern Tree Gum 0.0  1.9  0.6 0.0  0.0
Sherwin       3.0  0.0  1.2 0.0  9.8
Heathcote Ju  2.6  0.0  0.0 0.0 11.7
Warburton     0.0  0.9  1.5 0.0  0.0
Millgrove     2.0  5.4  2.2 0.0  0.0
Ben Cairn     0.0  0.0  0.0 0.0  0.0
Panton Gap    0.0  3.7  0.0 0.0  0.0
OShannassy    0.0  9.6  0.7 0.0  0.0
Ghin Ghin     0.0 44.7  0.4 1.3  0.0
Minto         0.0 10.5  0.0 0.0  0.0
Hawke         0.0  0.0  0.7 0.0  3.2
St Andrews    4.0  0.9  0.0 0.0 10.0
Nepean        2.0  0.0  1.1 0.0  0.0
Cape Schanck  0.0  3.4  0.0 0.0  0.0
Balnarring    0.0  2.7  1.0 0.0  0.0
Bittern       0.0  2.3  2.3 6.3  0.0
Bailieston    6.2  0.0  1.6 0.0 10.0
Donna Buang   0.0  0.0  1.5 0.0  0.0
Upper Yarra   0.0  6.4  0.9 0.0  0.0
Gembrook      4.7  2.8  0.0 0.0  0.0
Arcadia       0.0  0.6  2.1 4.9  8.0
Undera        0.0  0.0  1.9 2.5  6.5
Coomboona     0.0  0.0  6.9 3.3  5.6
Toolamba      0.0  0.8  3.0 3.5  5.0
Rushworth     1.1  2.7  1.1 0.0  9.6
Sayers        2.6  0.0  0.5 0.0  5.6
Waranga       0.0 10.9  1.6 2.4  8.9
Costerfield   5.5  0.0  1.3 0.0  5.7
Tallarook     5.6  0.0  1.5 0.0  2.8
              NHHE   VS CST  BTR AMAG
Reedy Lake     0.0  0.0 1.7 12.5  8.6
Pearcedale     6.9  0.0 0.9  0.0  0.0
Warneet        3.0  0.0 1.5  0.0  0.0
Cranbourne    32.0  0.0 1.4  0.0  0.0
Lysterfield    6.4  0.0 0.0  0.0  0.0
Red Hill       2.2  5.4 0.0  0.0  0.0
Devilbend      5.6  5.6 4.6  0.0  0.0
Olinda         0.0  1.9 0.0  0.0  0.0
Fern Tree Gum  0.0  4.2 0.0  0.0  0.0
Sherwin        0.0  5.1 0.0  0.0  0.0
Heathcote Ju   0.0  0.0 0.0  0.0  0.0
Warburton      0.0  3.9 2.5  0.0  0.0
Millgrove      0.0  5.4 3.2  0.0  0.0
Ben Cairn      2.1  0.0 0.0  0.0  0.0
Panton Gap     0.0  0.0 0.0  0.0  0.0
OShannassy     0.0  0.0 0.0  0.0  0.0
Ghin Ghin      0.4  0.6 0.6  0.0  8.4
Minto          0.0  0.0 3.5  0.0  6.7
Hawke          0.0  0.0 1.5  0.0  0.0
St Andrews     0.0  0.0 2.7  0.0  0.0
Nepean         1.1  4.5 0.0  0.0  0.0
Cape Schanck  33.3  0.0 0.0  0.0  0.0
Balnarring     4.9 10.1 0.0  0.0  0.0
Bittern        2.6  0.0 1.3  0.0  0.0
Bailieston     0.0  2.5 0.0  0.0  0.0
Donna Buang    0.0  0.0 1.5  0.0  0.0
Upper Yarra    0.0  0.0 0.0  0.0  0.0
Gembrook       4.8  9.7 0.0  0.0  0.0
Arcadia        0.0  0.0 1.8  6.7  3.1
Undera         0.0  3.8 0.0  4.0  3.2
Coomboona      0.0  3.3 1.0  4.2  5.4
Toolamba       0.0  0.0 0.8  7.0  3.7
Rushworth      0.0  2.7 0.0  0.0  0.0
Sayers         0.0  0.0 0.0  0.0  0.0
Waranga        0.0  0.0 0.7  5.5  2.7
Costerfield    0.0  3.3 1.1  5.5  0.0
Tallarook      0.0  3.8 3.4  0.0  0.0
               SCC  RWH  WSW STP YFHE
Reedy Lake    12.5  0.6  0.0 4.8  0.0
Pearcedale     0.0  2.3  5.7 0.0  1.1
Warneet        0.0  1.4 24.3 3.1 11.7
Cranbourne     0.0  0.0 10.0 4.0  0.0
Lysterfield    0.0  7.0  0.0 0.0  6.1
Red Hill       0.0  6.8  0.0 0.0  0.0
Devilbend      0.0  7.3  3.6 2.4  0.0
Olinda         0.0  9.1  0.0 0.0  0.0
Fern Tree Gum  0.0  4.5  0.0 0.0  0.0
Sherwin        0.0  6.3  0.0 3.5  2.3
Heathcote Ju   0.0  5.9  0.0 4.4  4.7
Warburton      0.0  0.0  0.0 2.7  6.2
Millgrove      0.0  8.8  0.0 2.2  5.4
Ben Cairn      0.0  0.0  0.0 0.0  0.9
Panton Gap     0.0  0.0  0.0 1.2  4.9
OShannassy     0.0  3.9  0.0 3.1  8.5
Ghin Ghin     47.6  6.1  4.7 1.2  6.7
Minto         80.5  5.0  0.0 5.0 26.7
Hawke          0.0  4.2  0.0 0.0  3.2
St Andrews     0.0  8.4  0.0 5.1  5.0
Nepean         0.0  3.3  0.0 0.0  1.0
Cape Schanck   0.0  2.6  0.0 0.0  3.4
Balnarring     0.0  4.9  0.0 0.0  1.9
Bittern        0.0  0.0 12.5 2.3 19.5
Bailieston     0.0  7.3  0.0 0.0  1.1
Donna Buang    0.0  0.0  0.0 2.2  0.0
Upper Yarra    0.0  7.0  0.0 6.4  3.9
Gembrook       0.0 10.9  0.0 0.0 20.2
Arcadia       24.0  0.0  2.7 8.2  0.0
Undera        16.0  1.5  1.0 8.7  0.0
Coomboona     30.4  1.1  0.0 8.1  0.0
Toolamba      29.9  0.0  6.0 4.5  0.0
Rushworth      0.0  4.3  0.0 1.1 14.4
Sayers         0.0  3.7  0.0 0.0  8.0
Waranga        0.0  1.4  3.4 2.7 16.3
Costerfield    0.0  6.2  0.0 6.6  0.6
Tallarook      0.0  9.5  0.0 1.9  5.8
              WHIP GAL  FHE BRTH SPP
Reedy Lake     0.0 4.8 26.2  0.0 0.0
Pearcedale     0.0 0.0  0.0  0.0 1.1
Warneet        0.0 0.0  0.0  0.0 4.6
Cranbourne     0.0 2.8  0.0  0.0 0.8
Lysterfield    0.0 0.0  0.0  0.0 5.4
Red Hill       0.0 0.0  0.0  0.0 3.4
Devilbend      0.0 0.0  0.0  0.0 0.0
Olinda         2.0 0.0  0.0  0.0 2.0
Fern Tree Gum  3.2 0.0  0.0  0.0 2.6
Sherwin        0.0 0.0  0.0  6.0 4.2
Heathcote Ju   0.0 0.0  0.0  0.0 3.7
Warburton      3.3 0.0  0.0  0.0 6.2
Millgrove      2.6 0.0  0.0  0.0 5.3
Ben Cairn      3.7 0.0  0.0  0.0 3.7
Panton Gap     3.8 0.0  0.0  0.0 1.9
OShannassy     2.9 0.0  0.0  0.0 2.2
Ghin Ghin      0.0 0.0  0.0  0.0 4.5
Minto          0.0 0.0  0.0  0.0 5.0
Hawke          0.0 0.0  0.0  5.2 3.7
St Andrews     0.0 0.0  0.0 10.0 5.1
Nepean         0.0 0.0  0.0  0.0 4.7
Cape Schanck   0.0 0.0  0.0  0.0 0.0
Balnarring     0.0 0.0  0.0  0.0 0.0
Bittern        0.0 0.0  0.0  0.0 3.5
Bailieston     0.0 0.0  1.2  0.0 2.8
Donna Buang    3.7 0.0  0.0  0.0 3.7
Upper Yarra    0.9 0.0  0.0  0.0 3.9
Gembrook       0.0 0.0  0.0  0.0 4.5
Arcadia        0.0 4.1  0.0  0.0 0.0
Undera         0.0 8.6  0.0  0.0 0.0
Coomboona      0.0 5.4  0.0  0.0 0.0
Toolamba       0.0 7.8  0.0  0.0 0.0
Rushworth      0.0 0.0  9.6 11.7 3.2
Sayers         0.0 0.0  3.1  9.1 5.7
Waranga        0.0 5.9 14.8  0.0 2.2
Costerfield    0.0 0.0 15.9 13.9 6.3
Tallarook      0.0 0.0  0.0 30.6 8.3
               SIL GCU MUSK MGLK BHHE
Reedy Lake     0.0 0.0 13.1  1.7  1.1
Pearcedale     0.0 0.0  0.0  0.0  0.0
Warneet        0.0 0.0  0.0  0.0  0.0
Cranbourne     0.0 0.0  0.0  1.4  0.0
Lysterfield    0.0 0.0  0.0  0.0  0.0
Red Hill       2.7 1.4  0.0  0.0  0.0
Devilbend      1.2 0.0  0.0  0.0  0.0
Olinda         0.0 2.6  0.0  0.0  0.0
Fern Tree Gum  4.9 1.3  0.0  0.0  0.0
Sherwin        1.2 0.0  0.0  0.0  0.0
Heathcote Ju   2.5 1.5  0.0  0.0  0.0
Warburton      4.6 5.7  0.0  0.0  7.4
Millgrove      3.2 1.1  0.0  0.0  0.9
Ben Cairn     12.0 2.1  0.0  0.0  0.0
Panton Gap     2.4 2.9  0.0  0.0  7.7
OShannassy     3.7 0.0  0.0  0.0  0.0
Ghin Ghin      6.7 0.0  0.0  2.7  0.0
Minto         17.5 0.0  0.0  0.0  0.0
Hawke          0.0 0.0  0.0  0.0  0.5
St Andrews     0.9 3.4  0.0  0.0  1.0
Nepean         1.1 0.0  0.0  0.0  5.0
Cape Schanck   0.0 0.0  0.0  0.0  0.0
Balnarring    10.3 0.0  0.0  0.0  3.9
Bittern        8.0 0.0  0.0  0.0  2.3
Bailieston     0.8 0.0  0.0  0.0 14.1
Donna Buang    4.4 4.4  0.0  0.0  0.0
Upper Yarra    2.3 0.0  0.0  0.0  0.6
Gembrook       1.8 1.8  0.0  0.0  5.6
Arcadia        9.8 0.0  0.0  3.7  0.0
Undera         0.0 0.0  0.0  1.6  0.0
Coomboona      0.0 0.0  0.0  2.3  0.0
Toolamba       0.0 0.0  0.0  1.6  0.0
Rushworth      2.7 1.1 16.0  0.0  9.9
Sayers         3.7 1.6  3.1  0.0  7.8
Waranga        0.0 0.0 20.0  0.0  8.1
Costerfield    3.9 1.6  3.8  0.0 10.8
Tallarook      0.0 0.9  0.0  0.0  2.3
              RFC YTBC LYRE  CHE OWH
Reedy Lake    0.0  0.0  0.0  0.0 0.0
Pearcedale    0.0  0.0  0.0  0.0 0.0
Warneet       0.0  0.0  0.0  0.0 0.0
Cranbourne    0.0  0.0  0.0  0.0 0.0
Lysterfield   0.0  0.0  0.0  0.0 0.0
Red Hill      0.0  0.0  0.0  0.0 0.0
Devilbend     0.0  0.0  0.0  0.0 0.0
Olinda        0.0  1.9  0.0  0.0 0.0
Fern Tree Gum 0.0  2.6  0.6  0.6 0.0
Sherwin       0.0  0.0  0.0  0.0 0.0
Heathcote Ju  0.0  1.8  0.0  0.0 0.0
Warburton     0.0  3.9  1.4  0.0 0.0
Millgrove     0.0  1.9  0.0  2.2 0.0
Ben Cairn     0.0  3.7  0.0  5.6 3.7
Panton Gap    0.0  0.0  2.4  1.8 0.6
OShannassy    0.0  1.5  1.6  8.1 0.0
Ghin Ghin     1.3  0.0  0.0  0.0 0.0
Minto         2.8  0.0  0.0  0.0 2.8
Hawke         0.0  0.0  1.7  0.0 0.0
St Andrews    0.0  0.0  0.0  0.0 0.0
Nepean        0.0  0.0  0.0  5.6 0.0
Cape Schanck  0.0  0.0  0.0  5.2 0.0
Balnarring    0.0  0.0  0.0  0.0 0.0
Bittern       0.0  0.0  0.0  0.0 0.0
Bailieston    0.0  0.0  0.0  0.0 0.0
Donna Buang   0.0  3.6  3.0  1.5 0.7
Upper Yarra   0.0  0.8  0.9  0.9 0.0
Gembrook      0.0  5.5  0.0 11.2 0.0
Arcadia       4.3  0.0  0.0  0.0 0.0
Undera        1.5  0.0  0.0  0.0 0.0
Coomboona     2.6  0.0  0.0  0.0 0.0
Toolamba      2.0  0.0  0.0  0.0 0.0
Rushworth     0.0  0.0  0.0  0.0 0.0
Sayers        0.0  0.0  0.0  0.0 0.0
Waranga       2.7  0.0  0.0  0.0 0.0
Costerfield   0.0  0.0  0.0  0.0 0.0
Tallarook     0.0  0.0  0.0  0.0 0.0
               TRM  MB STHR LHE FTC
Reedy Lake    15.0 0.0  0.0 0.0 0.0
Pearcedale     0.0 0.0  0.0 0.0 2.3
Warneet        0.0 0.0  0.0 0.0 0.0
Cranbourne     0.0 1.0  0.0 0.0 0.0
Lysterfield    0.0 0.0  0.0 0.0 2.1
Red Hill       0.0 0.0  0.0 0.0 0.0
Devilbend      0.0 3.6  0.0 0.0 0.0
Olinda         0.0 0.0  0.0 0.0 2.6
Fern Tree Gum  0.0 0.0  0.0 0.0 2.6
Sherwin        0.0 1.2  0.0 0.0 1.7
Heathcote Ju   0.0 0.0  0.9 0.0 1.6
Warburton      0.0 0.0  1.4 2.1 2.1
Millgrove      0.0 0.0  0.0 0.0 3.5
Ben Cairn      0.0 0.0  0.0 4.1 4.6
Panton Gap     0.0 0.0  0.9 1.8 3.1
OShannassy     0.0 0.0  0.0 2.4 5.4
Ghin Ghin      0.0 0.0  0.0 0.0 2.4
Minto          0.0 0.0  0.0 0.0 1.7
Hawke          0.0 0.0  0.0 0.0 1.1
St Andrews     0.0 1.7  0.0 0.0 3.4
Nepean         0.0 2.2  0.0 0.0 1.9
Cape Schanck   0.0 0.0  0.0 0.0 1.7
Balnarring     0.0 4.9  0.0 0.0 1.0
Bittern        0.0 0.0  0.0 0.0 2.3
Bailieston     0.0 0.0  0.0 0.0 0.0
Donna Buang    0.0 0.0  0.0 2.2 2.2
Upper Yarra    0.0 0.0  0.0 0.9 2.3
Gembrook       0.0 0.8  2.4 1.9 2.8
Arcadia        2.5 0.0  0.0 0.0 0.6
Undera         0.0 0.5  0.0 0.0 0.0
Coomboona      0.6 0.0  0.0 0.0 0.0
Toolamba       3.3 0.0  0.0 0.0 0.0
Rushworth      0.0 0.0  0.0 0.0 0.0
Sayers         0.0 0.0  0.0 0.0 0.0
Waranga        4.8 0.0  0.0 0.0 0.0
Costerfield    0.0 0.0  0.0 0.0 1.6
Tallarook      0.0 0.0  0.0 0.0 2.9
              PINK OBO  YR LFB SPW RBTR
Reedy Lake     0.0 0.0 0.0 2.9 0.0  0.0
Pearcedale     0.0 0.0 0.0 0.0 0.0  0.0
Warneet        0.0 0.0 0.0 0.0 0.0  0.0
Cranbourne     0.0 0.0 0.0 0.0 0.0  0.0
Lysterfield    0.0 1.4 0.0 0.0 0.0  0.0
Red Hill       0.0 0.0 0.0 0.0 0.0  0.0
Devilbend      0.0 0.0 0.0 0.0 0.0  0.0
Olinda         0.0 0.0 0.0 0.0 0.0  0.0
Fern Tree Gum  0.0 0.0 0.0 0.0 0.0  0.0
Sherwin        0.0 1.2 0.0 0.0 0.0  0.6
Heathcote Ju   0.0 0.0 0.0 0.0 0.0  1.6
Warburton      0.0 0.0 0.0 0.0 0.0  0.0
Millgrove      0.0 0.0 0.0 0.0 0.0  1.1
Ben Cairn      0.0 0.0 0.0 0.0 0.0  0.0
Panton Gap     1.0 0.0 0.0 0.0 0.0  0.0
OShannassy     2.2 0.0 0.0 0.0 0.0  0.0
Ghin Ghin      0.0 1.2 0.0 0.0 0.0  0.0
Minto          0.0 0.0 0.0 0.0 0.0  0.0
Hawke          0.0 0.0 0.0 0.0 0.0  1.7
St Andrews     0.0 0.0 0.0 0.0 0.0  0.0
Nepean         0.0 0.0 0.0 0.0 0.0  0.0
Cape Schanck   0.0 0.0 0.0 0.0 0.0  0.0
Balnarring     0.0 0.0 0.0 0.0 0.0  0.0
Bittern        0.0 0.0 0.0 0.0 0.0  0.0
Bailieston     0.0 3.3 0.0 0.0 0.0  0.0
Donna Buang    0.8 0.0 0.0 0.0 0.0  0.0
Upper Yarra    0.0 0.0 0.0 0.0 0.0  0.0
Gembrook       0.0 0.0 0.0 0.0 0.0  0.0
Arcadia        0.0 2.5 0.0 1.4 0.0  0.0
Undera         0.0 0.0 3.2 1.0 0.0  0.0
Coomboona      0.0 0.0 2.6 5.9 0.0  0.0
Toolamba       0.0 2.0 0.0 6.6 0.0  0.0
Rushworth      0.0 1.1 0.0 0.0 1.1  0.0
Sayers         0.0 0.5 0.0 0.0 0.0  0.0
Waranga        0.0 2.4 0.0 0.0 0.0  0.0
Costerfield    0.0 1.1 0.0 0.0 3.3  0.0
Tallarook      0.0 0.0 0.0 0.0 2.9  0.0
              DWS BELL  LWB CBW GGC PIL
Reedy Lake    0.4  0.0  0.0 0.0 0.0 0.0
Pearcedale    0.0  0.0  0.0 0.0 0.0 0.0
Warneet       3.5  0.0  0.0 0.0 0.0 0.0
Cranbourne    5.5  0.0  4.0 0.0 0.0 0.0
Lysterfield   0.0 22.1  0.0 0.0 0.0 0.0
Red Hill      0.0  0.0  0.0 0.0 0.0 0.0
Devilbend     1.8  0.0  0.0 0.0 0.0 0.0
Olinda        0.0  0.0  0.0 0.0 0.0 0.0
Fern Tree Gum 0.0  0.0  0.0 0.0 1.3 1.3
Sherwin       0.0  0.0  0.0 0.6 0.0 0.0
Heathcote Ju  0.0  0.0  0.0 0.0 0.0 0.0
Warburton     0.0  0.0  0.0 0.0 1.8 0.7
Millgrove     0.9  0.0  0.0 0.0 0.0 0.0
Ben Cairn     0.0  0.0  0.0 0.0 3.7 2.8
Panton Gap    0.0  0.0  0.0 0.0 4.8 1.8
OShannassy    0.0  0.0  0.0 0.0 0.0 2.2
Ghin Ghin     0.0  0.0  0.0 0.0 0.0 0.0
Minto         0.0  0.0  0.0 0.0 0.0 0.0
Hawke         0.0  0.0  0.0 0.0 0.0 0.0
St Andrews    0.9 15.0  0.0 0.0 0.0 0.0
Nepean        0.0  0.0  0.0 0.0 0.0 0.0
Cape Schanck  0.0  0.0 32.2 0.0 0.0 0.0
Balnarring    0.0  0.0 16.5 1.0 0.0 0.0
Bittern       0.0  0.0  5.8 0.0 0.0 0.0
Bailieston    0.0  0.0  0.0 1.7 0.0 0.0
Donna Buang   0.0  0.0  0.0 0.0 0.7 4.4
Upper Yarra   0.7  0.0  0.0 0.0 0.0 0.0
Gembrook      0.0  0.0  0.0 0.0 0.0 0.0
Arcadia       0.0  0.0  0.0 0.0 0.0 0.0
Undera        0.0  0.0  0.0 0.0 0.0 0.0
Coomboona     0.0  0.0  0.0 0.0 0.0 0.0
Toolamba      0.0  0.0  0.0 0.0 0.0 0.0
Rushworth     0.0  0.0  0.0 0.0 0.0 0.0
Sayers        0.0  0.0  0.0 1.6 0.0 0.0
Waranga       4.8  0.0  0.0 0.0 0.0 0.0
Costerfield   0.6  0.0  0.0 0.0 0.0 0.0
Tallarook     0.0  0.0  0.0 0.0 0.0 0.0
              SKF  RSL PDOV CRP  JW
Reedy Lake    1.9  6.7  0.0 0.0 0.0
Pearcedale    0.0  0.0  0.0 0.0 0.0
Warneet       0.0  0.0  0.0 0.0 0.0
Cranbourne    0.0  0.8  0.0 0.0 0.0
Lysterfield   0.0  0.0  0.0 0.0 0.0
Red Hill      0.0  0.0  0.0 0.0 0.0
Devilbend     0.0  0.0  0.0 0.0 0.0
Olinda        0.0  0.0  0.0 0.0 0.0
Fern Tree Gum 0.0  0.0  0.0 0.0 0.0
Sherwin       2.3  0.0  0.0 0.0 0.0
Heathcote Ju  0.0  0.0  0.0 0.0 0.0
Warburton     0.0  0.0  0.0 0.0 0.0
Millgrove     0.0  0.0  0.0 0.0 0.0
Ben Cairn     0.0  0.0  0.0 0.0 0.0
Panton Gap    0.0  0.0  0.0 0.0 0.0
OShannassy    0.0  0.0  0.0 0.0 0.0
Ghin Ghin     1.8  0.0  0.0 0.0 0.0
Minto         1.7  0.0  0.0 0.0 0.0
Hawke         0.0  0.0  0.0 0.0 0.0
St Andrews    2.0  0.0  0.0 0.0 0.0
Nepean        0.0  0.0  0.0 0.0 0.0
Cape Schanck  0.0  0.0  0.0 0.0 0.0
Balnarring    0.0  0.0  0.0 0.0 0.0
Bittern       0.0  0.0  0.0 0.0 0.0
Bailieston    0.0  0.0  0.0 0.0 0.0
Donna Buang   0.0  0.0  0.0 0.0 0.0
Upper Yarra   1.6  0.0  0.0 0.0 0.0
Gembrook      0.0  0.0  0.0 0.0 0.0
Arcadia       2.1 11.0  3.1 1.8 1.2
Undera        2.1  0.0  0.0 0.0 3.8
Coomboona     2.2  1.1  0.0 1.1 2.8
Toolamba      1.0  5.0  0.4 0.0 0.0
Rushworth     0.0  1.4  0.0 0.0 0.0
Sayers        0.0  1.7  0.0 0.0 0.0
Waranga       1.4  0.0  0.8 0.0 0.0
Costerfield   0.0  0.0  0.0 0.0 0.0
Tallarook     1.7  0.0  0.0 0.0 0.0
              BCHE RCR GBB RRP LLOR
Reedy Lake     0.0 0.0 0.0 4.8  0.0
Pearcedale     0.0 0.0 0.0 0.0  0.0
Warneet        0.0 0.0 0.0 0.0  0.0
Cranbourne     0.0 0.0 0.0 0.0  0.0
Lysterfield    0.0 0.0 0.7 0.0  0.0
Red Hill       0.0 0.0 0.0 0.0  0.0
Devilbend      0.0 0.0 0.0 0.0  0.0
Olinda         0.0 0.0 0.0 0.0  0.0
Fern Tree Gum  0.0 0.0 0.6 0.0  0.0
Sherwin        0.0 0.0 0.0 0.0  0.0
Heathcote Ju   0.0 0.0 0.0 0.0  0.0
Warburton      0.0 0.0 0.0 0.0  0.0
Millgrove      0.0 0.0 0.0 0.0  0.0
Ben Cairn      0.0 0.0 0.0 0.0  0.0
Panton Gap     0.0 0.0 0.0 0.0  0.0
OShannassy     0.0 0.0 0.0 0.0  0.0
Ghin Ghin      0.0 0.0 0.0 0.0  0.0
Minto          0.0 0.0 0.0 0.0  0.0
Hawke          0.0 0.0 0.0 0.0  0.0
St Andrews     0.0 0.0 0.0 0.0  0.0
Nepean         0.0 0.0 0.0 0.0  0.0
Cape Schanck   0.0 0.0 0.0 0.0  0.0
Balnarring     0.0 0.0 0.0 0.0  0.0
Bittern        0.0 0.0 0.0 0.0  0.0
Bailieston     0.0 3.3 0.0 0.0  0.0
Donna Buang    0.0 0.0 0.0 0.0  0.0
Upper Yarra    0.0 0.0 0.9 0.0  0.0
Gembrook       0.0 0.0 0.0 0.0  0.0
Arcadia        0.0 0.0 0.0 0.0  0.0
Undera         0.0 0.0 0.0 0.0  0.0
Coomboona      0.0 0.0 0.0 0.0  0.0
Toolamba       1.2 0.0 0.0 0.0  0.0
Rushworth      0.0 0.9 0.0 0.0  0.0
Sayers         0.0 0.5 0.5 0.0  0.0
Waranga        0.0 0.0 0.8 4.8 10.9
Costerfield    0.0 0.5 0.0 0.0  0.0
Tallarook      0.0 0.0 0.0 0.0  0.0
              YTHE  RF SHBC AZKF SFC
Reedy Lake     0.0 0.0  0.0  0.0 0.0
Pearcedale     0.0 0.0  0.0  0.0 0.0
Warneet        0.0 0.0  1.4  0.0 0.0
Cranbourne     0.0 0.0  0.0  0.0 0.0
Lysterfield    0.0 0.0  0.7  0.0 0.0
Red Hill       0.0 1.4  0.0  0.0 3.4
Devilbend      0.0 0.0  0.0  0.0 0.0
Olinda         0.0 1.9  2.0  0.7 1.2
Fern Tree Gum  0.0 3.2  1.3  0.0 1.9
Sherwin        0.0 0.0  0.6  0.0 1.2
Heathcote Ju   0.0 0.0  1.5  0.0 1.6
Warburton      0.0 1.8  2.1  0.0 4.1
Millgrove      0.0 1.1  0.0  0.0 7.6
Ben Cairn      0.0 4.6  1.9  0.0 2.8
Panton Gap     0.0 1.8  3.7  0.0 1.8
OShannassy     0.0 1.6  1.6  0.0 0.0
Ghin Ghin      0.0 0.0  1.8  0.0 0.0
Minto          0.0 0.0  0.0  0.0 0.0
Hawke          0.0 0.0  0.0  0.0 0.0
St Andrews     0.0 0.0  0.0  0.0 0.0
Nepean         0.0 2.2  1.9  0.0 0.0
Cape Schanck   0.0 2.6  0.9  0.0 0.0
Balnarring     0.0 0.0  1.9  0.0 0.0
Bittern        0.0 0.0  0.0  0.0 0.0
Bailieston     0.0 0.0  0.0  0.0 0.0
Donna Buang    0.0 2.2  0.7  0.0 0.7
Upper Yarra    0.0 0.0  1.6  0.0 0.8
Gembrook       0.0 2.8  1.8  0.0 0.0
Arcadia        0.0 0.0  0.0  0.0 0.0
Undera         0.0 0.0  0.0  0.0 0.0
Coomboona      0.0 0.0  0.0  0.0 0.0
Toolamba       0.0 0.0  0.0  0.0 0.0
Rushworth      0.0 0.0  0.0  0.0 0.0
Sayers         0.0 0.0  0.0  0.0 0.0
Waranga        2.7 0.0  0.0  0.0 0.0
Costerfield    6.3 0.0  1.6  0.0 0.0
Tallarook      0.0 0.0  1.9  0.0 0.0
              YRTH ROSE BCOO LFC  WG
Reedy Lake     0.0  0.0  0.0 0.0 0.0
Pearcedale     0.0  0.0  0.0 0.0 0.0
Warneet        0.0  0.0  0.0 1.8 0.0
Cranbourne     0.0  0.0  0.0 0.0 0.0
Lysterfield    0.0  0.0  0.0 0.0 0.0
Red Hill       0.0  0.0  0.0 0.0 0.0
Devilbend      0.0  0.0  0.0 2.4 0.0
Olinda         0.0  0.6  0.0 0.0 0.0
Fern Tree Gum  0.0  0.0  0.6 0.0 0.0
Sherwin        0.0  0.0  0.0 0.0 0.0
Heathcote Ju   0.0  0.0  0.0 0.0 0.0
Warburton      0.0  0.0  0.0 0.0 0.0
Millgrove      0.0  0.0  0.0 0.0 0.0
Ben Cairn      0.0  1.9  2.8 0.0 0.0
Panton Gap     0.0  3.1  1.2 0.0 0.0
OShannassy     0.0  0.7  1.6 0.0 0.0
Ghin Ghin      3.9  0.0  0.0 1.2 0.0
Minto          0.0  0.0  0.0 0.0 0.0
Hawke          0.0  0.0  0.0 0.0 0.0
St Andrews     0.0  0.0  0.0 0.0 0.0
Nepean         0.0  0.0  0.0 0.0 0.0
Cape Schanck   0.0  0.0  0.0 0.0 0.0
Balnarring     0.0  0.0  0.0 0.0 0.0
Bittern        0.0  0.0  0.0 0.0 0.0
Bailieston     0.0  0.0  0.0 0.0 0.0
Donna Buang    0.0  1.5  0.0 0.0 0.0
Upper Yarra    0.0  0.0  0.0 0.0 0.0
Gembrook       0.0  0.0  0.0 5.5 0.0
Arcadia        0.0  0.0  0.0 2.5 0.0
Undera         5.9  0.0  0.0 0.0 3.1
Coomboona      0.0  0.0  0.0 0.0 1.6
Toolamba       0.0  0.0  0.0 0.0 0.0
Rushworth      0.0  0.0  0.0 0.0 0.5
Sayers         0.0  0.0  0.0 0.0 0.0
Waranga        0.0  0.0  0.0 0.0 0.0
Costerfield    0.0  0.0  0.0 0.0 0.0
Tallarook      0.0  0.0  0.0 0.0 0.0
              PCOO WTG NMIN NFB  DB
Reedy Lake     1.9 0.0  0.2 0.0 0.0
Pearcedale     0.0 0.0  0.0 0.0 0.0
Warneet        0.0 0.0  5.8 0.0 0.0
Cranbourne     0.0 0.0  3.1 0.0 0.0
Lysterfield    0.0 0.0  0.0 0.0 0.0
Red Hill       0.0 0.0  0.0 0.0 0.0
Devilbend      0.0 0.0  0.0 0.0 0.0
Olinda         0.0 0.0  0.0 0.0 0.0
Fern Tree Gum  0.0 0.0  0.0 0.0 0.0
Sherwin        0.0 0.0  0.0 0.0 0.0
Heathcote Ju   0.0 0.0  0.0 0.0 0.0
Warburton      0.0 0.0  0.0 0.0 0.0
Millgrove      0.0 0.0  0.0 0.0 0.0
Ben Cairn      0.0 0.0  0.0 0.0 0.0
Panton Gap     0.0 0.0  0.0 0.0 0.0
OShannassy     0.0 0.0  0.0 0.0 0.0
Ghin Ghin      0.0 0.8  0.0 1.8 1.2
Minto          0.0 0.0  0.0 0.0 0.0
Hawke          0.0 0.0  0.0 0.0 0.0
St Andrews     0.0 0.0  0.0 0.0 0.0
Nepean         0.0 0.0  0.0 0.0 0.0
Cape Schanck   0.0 0.0  0.0 0.0 0.0
Balnarring     0.0 0.0  0.0 0.0 0.0
Bittern        0.0 0.0  0.0 0.0 0.0
Bailieston     0.0 3.1  0.0 0.0 0.0
Donna Buang    0.0 0.0  0.0 0.0 0.0
Upper Yarra    0.0 0.0  0.0 0.0 0.0
Gembrook       0.0 0.0  0.0 0.0 0.0
Arcadia        0.0 0.0  0.0 0.0 1.4
Undera         0.0 3.1  0.0 1.5 0.0
Coomboona      0.0 1.6  5.4 0.0 1.6
Toolamba       0.0 0.0  5.7 0.8 1.5
Rushworth      0.0 0.5  0.0 0.0 0.0
Sayers         0.0 0.0  0.0 0.0 0.0
Waranga        0.0 0.0  1.4 2.4 0.0
Costerfield    0.0 0.0  0.0 0.0 0.0
Tallarook      0.0 0.0  0.0 0.0 0.0
              RBEE HBC  DF PCL FLAME
Reedy Lake     0.0 0.0 0.0 9.1   0.0
Pearcedale     0.0 0.0 0.0 0.0   0.0
Warneet        0.0 0.0 0.0 0.0   0.0
Cranbourne     0.0 0.0 0.0 0.0   0.0
Lysterfield    0.0 0.0 0.0 0.0   0.0
Red Hill       0.0 0.0 0.0 0.0   0.0
Devilbend      0.0 0.0 0.0 0.0   0.0
Olinda         0.0 0.0 0.0 0.0   0.0
Fern Tree Gum  0.0 0.0 0.0 0.0   0.0
Sherwin        0.0 0.0 0.0 0.0   0.0
Heathcote Ju   0.0 0.0 0.0 0.0   0.0
Warburton      0.0 0.0 0.0 0.0   0.0
Millgrove      0.0 0.0 0.0 0.0   0.0
Ben Cairn      0.0 0.0 0.0 0.0   0.0
Panton Gap     0.0 0.0 0.0 0.0   0.0
OShannassy     0.0 0.0 0.0 0.0   0.0
Ghin Ghin      0.0 0.0 0.0 0.0   2.6
Minto          0.0 0.0 0.0 0.0   0.0
Hawke          0.0 0.0 0.0 0.0   0.0
St Andrews     0.0 0.0 0.0 0.0   0.0
Nepean         0.0 0.0 0.0 0.0   0.0
Cape Schanck   0.0 0.0 0.0 0.0   0.0
Balnarring     0.0 0.0 0.0 0.0   1.9
Bittern        0.0 0.0 0.0 0.0   0.0
Bailieston     0.0 0.0 0.0 0.0   0.0
Donna Buang    0.0 0.0 0.0 0.0   2.9
Upper Yarra    0.0 0.0 0.0 0.0   0.0
Gembrook       0.0 0.0 0.0 0.0   0.0
Arcadia        1.4 0.0 0.0 0.0   1.8
Undera         1.0 0.0 0.0 0.0   5.9
Coomboona      0.0 0.0 0.0 0.0   1.7
Toolamba       0.5 0.4 0.0 0.0   0.0
Rushworth      0.0 0.5 0.0 0.0   0.0
Sayers         0.0 0.0 0.0 0.0   0.0
Waranga        0.7 0.0 2.1 0.0   0.0
Costerfield    1.1 0.0 1.1 0.0   0.0
Tallarook      0.0 0.0 3.8 0.0   0.0
              WWT WBWS LCOR KING
Reedy Lake    0.0  0.0  0.0  0.0
Pearcedale    0.0  0.0  0.0  0.0
Warneet       0.0  0.0  0.0  0.0
Cranbourne    0.0  0.0  0.0  0.0
Lysterfield   0.0  0.0  0.0  0.0
Red Hill      0.0  0.0  0.0  0.0
Devilbend     0.0  0.0  0.0  0.0
Olinda        0.0  0.0  0.0  0.0
Fern Tree Gum 0.0  0.0  0.0  0.0
Sherwin       0.0  0.0  0.0  0.0
Heathcote Ju  0.0  0.0  0.0  0.0
Warburton     0.0  0.0  0.0  0.8
Millgrove     0.0  0.0  0.0  0.0
Ben Cairn     0.0  0.0  0.0  0.0
Panton Gap    0.0  0.0  0.0  0.0
OShannassy    0.0  0.0  0.0  0.7
Ghin Ghin     0.0  0.0  0.0  0.0
Minto         0.0  0.0  0.0  0.0
Hawke         0.0  0.0  0.0  0.0
St Andrews    0.0  0.0  0.0  0.0
Nepean        0.0  0.0  0.0  0.0
Cape Schanck  0.0  0.0  0.0  0.0
Balnarring    0.0  0.0  0.0  0.0
Bittern       0.0  0.0  0.0  0.0
Bailieston    0.0  0.0  0.0  0.0
Donna Buang   0.0  0.0  0.0  0.0
Upper Yarra   0.0  0.0  0.0  0.0
Gembrook      0.0  0.0  0.0  0.0
Arcadia       2.1  0.0  4.8  0.0
Undera        0.0  0.0  0.0  0.0
Coomboona     0.0  0.0  0.0  0.0
Toolamba      2.5  0.0  0.0  0.0
Rushworth     0.0  0.0  0.0  0.0
Sayers        0.0  0.0  0.0  0.0
Waranga       0.0  1.6  0.0  0.0
Costerfield   0.0  0.0  0.0  0.0
Tallarook     0.0  0.0  0.0  0.0
  1. We will start at the same dissimilarity matrix used in Tutorial 15.1. That was a Wisconsin double standardization followed by a Bray-Curtis dissimilarity matrix.
    Show code
    > library(vegan)
    > macnally.dist <- vegdist(wisconsin(macnally[,c(-1,-2)]),"bray")
    
  2. Now rather than explore the relationships over restricted ordination space, we can instead perform a permutational multivariate analysis of variance.
    Show code
    > adonis(macnally.dist~macnally$HABITAT)
    
    Call:
    adonis(formula = macnally.dist ~ macnally$HABITAT) 
    
    Terms added sequentially (first to last)
    
                     Df SumsOfSqs MeanSqs
    macnally$HABITAT  5      3.50   0.699
    Residuals        31      4.60   0.148
    Total            36      8.09        
                     F.Model    R2 Pr(>F)
    macnally$HABITAT    4.72 0.432  0.001
    Residuals                0.568       
    Total                    1.000       
                        
    macnally$HABITAT ***
    Residuals           
    Total               
    ---
    Signif. codes:  
      0 '***' 0.001 '**' 0.01 '*' 0.05
      '.' 0.1 ' ' 1
    
  3. Clearly bird communities differ in different habitats. If we consider mixed forest to be a reference, how do the bird communities of the other habitat types compare to the mixed forest type
    Show code
    > habitat <-factor(macnally$HABITAT, levels=c("Mixed", "Box-Ironbark", "Foothills Woodland", "Gipps.Manna","Montane Forest","River Red Gum"))
    > mm <- model.matrix(~habitat)
    > colnames(mm) <-gsub("habitat","",colnames(mm))
    > mm <- data.frame(mm)
    > macnally.adonis<-adonis(macnally.dist~Box.Ironbark+Foothills.Woodland+Gipps.Manna+Montane.Forest+River.Red.Gum, data=mm, contr.unordered="contr.treat")
    > macnally.adonis
    
    Call:
    adonis(formula = macnally.dist ~ Box.Ironbark + Foothills.Woodland +      Gipps.Manna + Montane.Forest + River.Red.Gum, data = mm,      contr.unordered = "contr.treat") 
    
    Terms added sequentially (first to last)
    
                       Df SumsOfSqs MeanSqs
    Box.Ironbark        1      0.70   0.702
    Foothills.Woodland  1      0.34   0.336
    Gipps.Manna         1      0.72   0.722
    Montane.Forest      1      0.36   0.363
    River.Red.Gum       1      1.37   1.372
    Residuals          31      4.60   0.148
    Total              36      8.09        
                       F.Model    R2 Pr(>F)
    Box.Ironbark          4.74 0.087  0.001
    Foothills.Woodland    2.27 0.042  0.030
    Gipps.Manna           4.87 0.089  0.003
    Montane.Forest        2.45 0.045  0.016
    River.Red.Gum         9.25 0.170  0.001
    Residuals                  0.568       
    Total                      1.000       
                          
    Box.Ironbark       ***
    Foothills.Woodland *  
    Gipps.Manna        ** 
    Montane.Forest     *  
    River.Red.Gum      ***
    Residuals             
    Total                 
    ---
    Signif. codes:  
      0 '***' 0.001 '**' 0.01 '*' 0.05
      '.' 0.1 ' ' 1
    

Centering variables

Centering a variable shifts the underlying scale such that the mean of the variable is equal to 0. That is, once centered, the data will vary around 0.

Centering is achieved by subtracting each value from the mean of all the values. In R this can be done manually:

> Y <- c(1,4,3,7,8)
> mean(Y)
[1] 4.6
> Yc <- Y-mean(Y)
Or using the scale() function.
> scale(Y, scale=FALSE)
     [,1]
[1,] -3.6
[2,] -0.6
[3,] -1.6
[4,]  2.4
[5,]  3.4
attr(,"scaled:center")
[1] 4.6
The scale() function can also be used to center the vectors of matrices.
> Y <- matrix(c(2,5,3,6,7,1,7,3,5),3,3)
> scale(Y,scale=FALSE)
        [,1]   [,2] [,3]
[1,] -1.3333  1.333    2
[2,]  1.6667  2.333   -2
[3,] -0.3333 -3.667    0
attr(,"scaled:center")
[1] 3.333 4.667 5.000

End of instructions

  Quitting R

There are two ways to quit R elegantly
  1. Goto the RGui File menu and select Quit
  2. In the R Console, type
    > q()
    
Either way you will be prompted to save the workspace, generally this is not necessary or even desirable.

End of instructions

  The R working directory

The R working directory is the location that R looks in to read/write files. When you load R, it initially sets the working directory to a location within the R subdirectory of your computer (Windows and MacOSX) or the location from which you executed R (Linux). However, we can alter the working directory so that it points to another location. This not only prevents having to search for files, it also negates the need to specify a path as well as a filename in various operations.
The working directory can be altered by specifying a path (full or relative) as an argument to the setwd() function. Note that R uses Unix path slashes ("/") irrespective of what operating system.
> setwd("/Work/Analyses/Project1") #change the working directory

Alternatively, on Windows:
  1. Goto the RGui File menu
  2. Select the Change dir submenu
  3. Locate the directory that contains your data and/or scripts
  4. Click OK

End of instructions

  Read in (source) an R script

There are two ways to read in (source) a R script
  1. Goto the RGui File menu and select Source R code
  2. In the R Console, type

    > source('filename')

    where 'filename' is the name of the R script file.
All commands in the file will be executed sequentially starting at the first line. Note that R will ignore everything that is on a line following the '#' character. This provides a way of inserting comments into script files (a practice that is highly encouraged as it provides a way of reminding yourself what each line of syntax does).

Note that there is an alternative way of using scripts

  1. Goto the RGui File menu and select the Open script submenu
  2. This will display the R script file in the R Editor window
  3. Syntax can then be directly cut and paste into the R Console. Again,each line will be executed sequentially and comments are ignored.
When working with multi-step analyses, it is recommended that you have the R Editor open and that syntax is entered directly into the editor and then pasted into the R Console. Then provided the R script is saved regularly, your data and analyses are safe from computer disruptions.

End of instructions

  R workspace

The R workspace stores all the objects that are created during an R session. To view a list of objects occupying the current R workshpace

> ls()

The workspace is cabable of storing huge numbers of objects, and thus it can become cluttered (with objects that you have created, but forgotten about, or many similar objects that contain similar contents) very rapidly. Whilst this does not affect the performance of R, it can lead to chronic confusion. It is advisable that you remove all unwanted objects when you have finished with them. To remove and object;

> rm(object name)

where object name is the name of the object to be removed.

When you exit R, you will be prompted for whether you want to save the workspace. You can also save the current workspace at any time by; A saved workspace is not in easily readable text format. Saving a workspace enables you to retrieve an entire sessions work instantly, which can be useful if you are forced to perform you analyses over multiple sittings. Furthermore, by providing different names, it is possible to maintain different workspaces for different projects.

End of instructions

  Removing objects

To remove an object

> rm(object name)

where object name is the name of the object to be removed.

End of instructions

  R indexing

A vector is simply a array (list) of entries of the same type. For example, a numeric vector is just a list of numbers. A vector has only a single dimension - length - and thus the first entry in the vector has an index of 1 (is in position 1), the second has an index of 2 (position 2), etc. The index of an entry provides a convenient way to access entries. If a vector contained the numbers 10, 5, 8, 3.1, then the entry with index 1 is 10, at index 4 is the entry 3.1. Consider the following examples
> var <- c(4,8,2,6,9,2)
> var
[1] 4 8 2 6 9 2
> var[5]
[1] 9
> var[3:5]
[1] 2 6 9

A data frame on the other hand is not a single vector, but rather a collection of vectors. It is a matrix, consisting of columns and rows and therefore has both length and width. Consequently, each entry has a row index and a column index. Consider the following examples
> dv <- c(4,8,2,6,9,2)
> iv <- c('a','a','a','b','b','b')
> data <- data.frame(iv,dv)
> data
  iv dv
1  a  4
2  a  8
3  a  2
4  b  6
5  b  9
6  b  2
> #The first column (preserve frame)
> data[1]
  iv
1  a
2  a
3  a
4  b
5  b
6  b
> #The second column (preserve frame)
> data[2]
  dv
1  4
2  8
3  2
4  6
5  9
6  2
> #The first column (as vector)
> data[,1]
[1] a a a b b b
Levels: a b
> #The first row (as vector)
> data[1,]
  iv dv
1  a  4
> #list the entry in row 3, column 2
> data[3,2]
[1] 2

End of instructions

  R 'fix()' function

The 'fix()' function provides a very rudimentary spreadsheet for data editing and entry.

> fix(data frame name)

The spreadsheet (Data Editor) will appear. A new variable can be created by clicking on a column heading and then providing a name fo r the variable as well as an indication of whether the contents are expected to be numeric (numbers) or character (contain some letters). Data are added by clicking on a cell and providing an entry. Closing the Data Editor will cause the changes to come into affect.

End of instructions

  R transformations

The following table illustrates the use of common data transmformation functions on a single numeric vector (old_var). In each case a new vector is created (new_var)

TransformationSyntax
loge

> new_var <- log(old_var)

log10

> new_var <- log10(old_var)

square root

> new_var <- sqrt(old_var)

arcsin

> new_var <- asin(sqrt(old_var))

scale (mean=0, unit variance)

> new_var <- scale(old_var)

where old_var is the name of the original variable that requires transformation and new_var is the name of the new variable (vector) that is to contain the transformed data. Note that the original vector (variable) is not altered.

End of instructions

  Data transformations


Essentially transforming data is the process of converting the scale in which the observations were measured into another scale.

I will demonstrate the principles of data transformation with two simple examples.
Firstly, to illustrate the legality and frequency of data transformations, imagine you had measured water temperature in a large number of streams. Naturally, you would have probably measured the temperature in ° C. Supposing you later wanted to publish your results in an American journal and the editor requested that the results be in ° F. You would not need to re-measure the stream temperature. Rather, each of the temperatures could be converted from one scale (° C) to the other (° F). Such transformations are very common.

To further illustrate the process of transformation, imagine a botanist wanted to examine the size of a particular species leaves for some reason. The botanist decides to measure the length of a random selection of leaves using a standard linear, metric ruler and the distribution of sample observations as represented by a histogram and boxplot are as follows.


The growth rate of leaves might be expected to be greatest in small leaves and decelerate with increasing leaf size. That is, the growth rate of leaves might be expected to be logarithmic rather than linear. As a result, the distribution of leaf sizes using a linear scale might also be expected to be non-normal (log-normal). If, instead of using a linear scale, the botanist had used a logarithmic ruler, the distribution of leaf sizes may have been more like the following.


If the distribution of observations is determined by the scale used to measure of the observations, and the choice of scale (in this case the ruler) is somewhat arbitrary (a linear scale is commonly used because we find it easier to understand), then it is justifiable to convert the data from one scale to another after the data has been collected and explored. It is not necessary to re-measure the data in a different scale. Therefore to normalize the data, the botanist can simply convert the data to logarithms.

The important points in the process of transformations are;
  1. The order of the data has not been altered (a large leaf measured on a linear scale is still a large leaf on a logarithmic scale), only the spacing of the data
  2. Since the spacing of the data is purely dependent on the scale of the measuring device, there is no reason why one scale is more correct than any other scale
  3. For the purpose of normalization, data can be converted from one scale to another following exploration

End of instructions

  R writing data

To export data into R, we read the empty the contents a data frame into a file. The general format of the command for writing data in from a data frame into a file is

> write.data(data frame name, 'filename.csv', quote=F, sep=',')

where data frame name is a name of the data frame to be saved and filename.csv is the name of the csv file that is to be created (or overwritten). The argument quote=F indicates that quotation marks should not be placed around entries in the file. The argument sep=',' indicates that entries in the file are separated by a comma (hence a comma delimited file).

As an example

> write.data(LEAVES, 'leaves.csv', quote=F, sep=',')

End of instructions

  Exporting from Excel

End of instructions

  Reading data into R

Ensure that the working directory is pointing to the path containing the file to be imported before proceeding.
To import data into R, we read the contents of a file into a data frame. The general format of the command for reading data into a data frame is

> name <- read.table('filename.csv', header=T, sep=',', row.names=column, strip.white=T)

where name is a name you wish the data frame to be referred to as, filename.csv is the name of the csv file that you created in excel and column is the number of the column that had row labels (if there were any). The argument header=T indicates that the variable (vector) names will be created from the names supplied in the first row of the file. The argument sep=',' indicates that entries in the file are separated by a comma (hence a comma delimited file). If the data file does not contain row labels, or you are not sure whether it does, it is best to omit the row.names=column argument. The strip.white=T arguement ensures that no leading or trailing spaces are left in character names (these can be particularly nasty in categorical variables!).

As an example
> phasmid <- read.data('phasmid.csv', header=T, sep=',', row.names=1, strip.white=T)

End of instructions

  Cutting and pasting data into R

To import data into R via the clipboard, copy the data in Excel (including a row containing column headings - variable names) to the clipboard (CNTRL-C). Then in R use a modification of the read.table function:

> name <- read.table('clipboard', header=T, sep='\t', row.names=column, strip.white=T)

where name is a name you wish the data frame to be referred to as, clipboard is used to indicate that the data are on the clipboard, and column is the number of the column that had row labels (if there were any). The argument header=T indicates that the variable (vector) names will be created from the names supplied in the first row of the file. The argument sep='\t' indicates that entries in the clipboard are separated by a TAB. If the data file does not contain row labels, or you are not sure whether it does, it is best to omit the row.names=column argument. The strip.white=T arguement ensures that no leading or trailing spaces are left in character names (these can be particularly nasty in categorical variables!).

As an example
> phasmid <- read.data('clipboard', header=T, sep='\t', row.names=1, strip.white=T)

End of instructions

  Writing a copy of the data to an R script file

> phasmid <- dump('data', "")

where 'data' is the name of the object that you wish to be stored.
Cut and paste the output of this command directly into the top of your R script. In future, when the script file is read, it will include a copy of your data set, preserved exactly.

End of instructions

  Frequency histogram

> hist(variable)

where variable is the name of the numeric vector (variable) for which the histogram is to be constructed.

End of instructions

  Summary Statistics

# mean
> mean(variable)
# variance
> var(variable)
# standard deviation
> sd(variable)
# variable length
> length(variable)
# median
> median(variable)
# maximum
> max(variable)
# minimum
> min(variable)
# standard error of mean
> sd(variable)/sqrt(length(variable))
# interquartile range
> quantile(variable, c(.25,.75))
# 95% confidence interval
> qnorm(0.975,0,1)*(sd(variable)/sqrt(length(variable)))

where variable is the name of the numeric vector (variable).

End of instructions

  Normal probabilities

> pnorm(c(value), mean=mean, sd=sd, lower.tail=FALSE)

this will calculate the area under a normal distribution (beyond the value of value) with mean of mean and standard deviation of sd. The argument lower.tail=FALSE indicates that the area is calculated for values greater than value. For example

> pnorm(c(2.9), mean=2.025882, sd=0.4836265, lower.tail=FALSE)

End of instructions

  Boxplots

A boxplot is a graphical representation of the distribution of observations based on the 5-number summary that includes the median (50%), quartiles (25% and 75%) and data range (0% - smallest observation and 100% - largest observation). The box demonstrates where the middle 50% of the data fall and the whiskers represent the minimum and maximum observations. Outliers (extreme observations) are also represented when present.
The figure below demonstrates the relationship between the distribution of observations and a boxplot of the same observations. Normally distributed data results in symmetrical boxplots, whereas skewed distributions result in asymmetrical boxplots with each segment getting progressively larger (or smaller).


End of instructions

  Boxplots

> boxplot(variable)

where variable is the name of the numeric vector (variable)

End of instructions

  Observations, variables & populations

Observations are the sampling units (e.g quadrats) or experimental units (e.g. individual organisms, aquaria) that make up the sample.

Variables are the actual properties measured by the individual observations (e.g. length, number of individuals, rate, pH, etc). Random variables (those variables whose values are not known for sure before the onset of sampling) can be either continuous (can theoretically be any value within a range, e.g. length, weight, pH, etc) or categorical (can only take on certain discrete values, such as counts - number of organisms etc).

Populations are defined as all the possible observations that we are interested in.

A sample represents a collected subset of the population's observations and is used to represent the entire population. Sample statistics are the characteristics of the sample (e.g. sample mean) and are used to estimate population parameters.


End of instructions

  Population & sample

A population refers to all possible observations. Therefore, population parameters refer to the characteristics of the whole population. For example the population mean.

A sample represents a collected subset of the population's observations and is used to represent the entire population. Sample statistics are the characteristics of the sample (e.g. sample mean) and are used to estimate population parameters.


End of instructions

  Standard error and precision

A good indication of how good a single estimate is likely to be is how precise the measure is. Precision is a measure of how repeatable an outcome is. If we could repeat the sampling regime multiple times and each time calculate the sample mean, then we could examine how similar each of the sample means are. So a measure of precision is the degree of variability between the individual sample means from repeated sampling events.

Sample number Sample mean
112.1
212.7
312.5
Mean of sample means12.433
> SD of sample means0.306

The table above lists three sample means and also illustrates a number of important points;
  1. Each sample yields a different sample mean
  2. The mean of the sample means should be the best estimate of the true population mean
  3. The more similar (consistent) the sample means are, the more precise any single estimate of the population mean is likely to be
The standard deviation of the sample means from repeated sampling is called the Standard error of the mean.

It is impractical to repeat the sampling effort multiple times, however, it is possible to estimate the standard error (and therefore the precision of any individual sample mean) using the standard deviation (SD) of a single sample and the size (n) of this single sample.

The smaller the standard error (SE) of the mean, the more precise (and possibly more reliable) the estimate of the mean is likely to be.

End of instructions

  Confidence intervals


A 95% confidence interval is an interval that we are 95% confident will contain the true population mean. It is the interval that there is a less than 5% chance that this interval will not contain the true population mean, and therefore it is very unlikely that this interval will not contain the true mean. The frequentist approach to statistics dictates that if multiple samples are collected from a population and the interval is calculated for each sample, 95% of these intervals will contain the true population mean and 5% will not. Therefore there is a 95% probability that any single sample interval will contain the population mean.

The interval is expressed as the mean ± half the interval. The confidence interval is obviously affected by the degree of confidence. In order to have a higher degree of confidence that an interval is likely to contain the population mean, the interval would need to be larger.

End of instructions

  Successful transformations


Since the objective of a transformation is primarily to normalize the data (although variance homogeneity and linearization may also be beneficial side-effects) the success of a transformation is measured by whether or not it has improved the normality of the data. It is not measured by whether the outcome of a statistical analysis is more favorable!

End of instructions

  Selecting subsets of data

# select the first 5 entries in the variable
> variable[1:5]
# select all but the first entry in the variable
> variable[-1]
# select all cases of variable that are less than num
> variable[variablenum]
# select all cases of variable that are equal to 'Big'
> variable1[variablelabel]

where variable is the name of a numeric vector (variable) and num is a number. variable1 is the name of a character vector and label is a character string (word). 5. In the Subset expression box enter a logical statement that indicates how the data is to be subsetted. For example, exclude all values of DOC that are greater than 170 (to exclude the outlier) and therefore only include those values that are less that 170, enter DOC < 170. Alternatively, you could chose to exclude the outlier using its STREAM name. To exclude this point enter STREAM. != 'Santa Cruz'.

End of instructions

  Summary Statistics by groups

> tapply(dv,factor,func)

the tapply() function applies the function func to the numeric vector dv for each level of the factor factor. For example

# calculate the mean separately for each group
> tapply(dv,factor,mean)
# calculate the mean separately for each group
> tapply(dv,factor,var)

End of instructions

  EDA - Normal distribution

Parametric statistical hypothesis tests assume that the population measurements follow a specific distribution. Most commonly, statistical hypothesis tests assume that the population measurements are normally distributed (Question 4 highlights the specific reasoning for this).

While it is usually not possible to directly examine the distribution of measurements in the whole population to determine whether or not this requirement is satisfied or not (for the same reasons that it is not possible to directly measure population parameters such as population mean), if the sampling is adequate (unbiased and sufficiently replicated), the distribution of measurements in a sample should reflect the population distribution. That is a sample of measurements taken from a normally distributed population should be normally distributed.

Tests on data that do not meet the distributional requirements may not be reliable, and thus, it is essential that the distribution of data be examined routinely prior to any formal statistical analysis.


End of instructions

  Factorial boxplots

> boxplot(dv~factor,data=data)

where dv is the name of the numeric vector (dependent variable), factor is the name of the factor (categorical or factor variable) and data is the name of the data frame (data set). The '~' is used in formulas to represent that the left hand side is proportional to the right hand side.

End of instructions

  EDA - Linearity

The most common methods for analysing the relationship or association between variables.assume that the variables are linearly related (or more correctly, that they do not covary according to some function other than linear). For example, to examine for the presence and strength of the relationship between two variables (such as those depicted below), it is a requirement of the more basic statistical tests that the variables not be related by any function other than a linear (straight line) function if any function at all.

There is no evidence that the data represented in Figure (a) are related (covary) according to any function other than linear. Likewise, there is no evidence that the data represented in Figure (b) are related (covary) according to any function other than linear (if at all). However, there is strong evidence that the data represented in Figure (c) are not related linearly. Hence Figure (c) depicts evidence for a violation in the statistical necessity of linearity.


End of instructions

  EDA - Scatterplots

Scatterplots are constructed to present the relationships, associations and trends between continuous variables. By convention, dependent variables are arranged on the Y-axis and independent variables on the X-axis. The variable measurements are used as coordinates and each replicate is represented by a single point.

Figure (c) above displays a linear smoother (linear `line-of-best-fit') through the data. Different smoothers help identify different trends in data.


End of instructions

  Scatterplots

# load the 'car' library
> library(car)
# generate a scatterplot
> scatterplot(dv~iv,data=data)

where dv is the dependent variable, iv is the independent variable and data is the data frame (data set).

End of instructions

  Plotting y against x


The statement 'plot variable1 against variable2' is interpreted as 'plot y against x'. That is variable1 is considered to be the dependent variable and is plotted on the y-axis and variable2 is the independent variable and thus is plotted on the x-axis.

End of instructions

  Scatterplot matrix (SPLOM)

# make sure the car package is loaded > library(car)
> scatterplot.matrix(~var1 + var2 + var3, data=data, diagonal='boxplot')
#OR > pairs(~var1 + var2 + var3, data=data)

where var1, var2 etc are the names of numeric vectors in the data data frame (data set)

End of instructions

  EDA - Homogeneity of variance

Many statistical hypothesis tests assume that the populations from which the sample measurements were collected are equally varied with respect to all possible measurements. For example, are the growth rates of one population of plants (the population treated with a fertilizer) more or less varied than the growth rates of another population (the control population that is purely treated with water). If they are, then the results of many statistical hypothesis tests that compare means may be unreliable. If the populations are equally varied, then the tests are more likely to be reliable.
Obviously, it is not possible to determine the variability of entire populations. However, if sampling is unbiased and sufficiently replicated, then the variability of samples should be directly proportional to the variability of their respective populations. Consequently, samples that have substantially different degrees of variability provide strong evidence that the populations are likely to be unequally varied.
There are a number of options available to determine the likelihood that populations are equally varied from samples.
1. Calculate the variance or standard deviation of the populations. If one sample is more than 2 times greater or less than the other sample(s), then there is some evidence for unequal population variability (non-homogeneity of variance)
2. Construct boxplots of the measurements for each population, and compare the lengths of the boxplots. The length of a symmetrical (and thus normally distributed) boxplot is a robust measure of the spread of values, and thus an indication of the spread of population values. Therefore if any of the boxplots are more than 2 times longer or shorter than the other boxplot(s), then there is again, some evidence for unequal population variability (non-homogeneity of variance)


End of instructions

  t test

The frequentist approach to hypothesis testing is based on estimating the probability of collecting the observed sample(s) when the null hypothesis is true. That is, how rare would it be to collect samples that displayed the observed degree of difference if the samples had been collected from identical populations. The degree of difference between two collected samples is objectively represented by a t statistic.

Where y1 and y2 are the sample means of group 1 and 2 respectively and √s²/n1 + s²/n2 represents the degree of precision in the difference between means by taking into account the degree of variability of each sample.
If the null hypothesis is true (that is the mean of population 1 and population 2 are equal), the degree of difference (t value) between an unbiased sample collected from population 1 and an unbiased sample collected from population 2 should be close to zero (0). It is unlikely that an unbiased sample collected from population 1 will have a mean substantially higher or lower than an unbiased sample from population 2. That is, it is unlikely that such samples could result in a very large (positive or negative) t value if the null hypothesis (of no difference between populations) was true. The question is, how large (either positive or negative) does the t value need to be, before we conclude that the null hypothesis is unlikely to be true?.

What is needed is a method by which we can determine the likelihood of any conceivable t value when null hypothesis is true. This can be done via simulation. We can simulate the collection of random samples from two identical populations and calculate the proportion of all possible t values.

Lets say a vivoculturalist was interested in comparing the size of Eucalyptus regnans seedlings grown under shade and full sun conditions. In this case we have two populations. One population represents all the possible E. regnans seedlings grown in shade conditions, and the other population represents all the possible E. regnans seedlings grown in full sun conditions. If we had grown 200 seedlings under shade conditions and 200 seedlings under full sun conditions, then these samples can be used to assess the null hypothesis that the mean size of an infinite number (population) of E. regnans seedlings grown under shade conditions is the same as the mean size of an infinite number (population) of E. regnans seedlings grown under full sun conditions. That is that the population means are equal.

We can firstly simulate the sizes of 200 seedlings grown under shade conditions and another 200 seedlings grown under full sun conditions that could arise naturally when shading has no effect on seedling growth. That is, we can simulate one possible outcome when the null hypothesis is true and shading has no effect on seedling growth
Now we can calculate the degree of difference between the mean sizes of seedlings grown under the two different conditions taking into account natural variation (that is, we can calculate a t value using the formula from above). From this simulation we may have found that the mean size of seedling grown in shade and full sun was 31.5cm and 33.7cm respectively and the degree of difference (t value) was 0.25. This represents only one possible outcome (t value). We now repeat this simulation process a large number of times (1000) and generate a histogram (or more correctly, a distribution) of the t value outcomes that are possible when the null hypothesis is true.

It should be obvious that when the null hypothesis is true (and the two populations are the same), the majority of t values calculated from samples containing 200 seedlings will be close to zero (0) - indicating only small degrees of difference between the samples. However, it is also important to notice that it is possible (albeit less likely) to have samples that are quit different from one another (large positive or negative t values) just by pure chance (for example t values greater than 2).

It turns out that it is not necessary to perform these simulations each time you test a null hypothesis. There is a mathematical formulae to estimate the t distribution appropriate for any given sample size (or more correctly, degrees of freedom) when the null hypothesis is true. In this case, the t distribution is for (200-1)+(200-1)=398 degrees of freedom.

At this stage we would calculate the t value from our actual observed samples (the 200 real seedlings grown under each of the conditions). We then compare this t value to our distribution of all possible t values (t distribution) to determine how likely our t value is when the null hypothesis is true. The simulated t distribution suggests that very large (positive or negative) t values are unlikely. The t distribution allows us to calculate the probability of obtaining a value greater than a specific t value. For example, we could calculate the probability of obtaining a t value of 2 or greater when the null hypothesis is true, by determining the area under the t distribution beyond a value of 2.

If the calculated t value was 2, then the probability of obtaining this t value (or one greater) when the null hypothesis is true is 0.012 (or 1.2%). Since the probability of obtaining our t value or one greater (and therefore the probability of having a sample of 200 seedlings grown in the shade being so much different in size than a sample of 200 seedlings grown in full sun) is so low, we would conclude that the null hypothesis of no effect of shading on seedling growth is likely to be false. Thus, we have provided some strong evidence that shading conditions do effect seedling growth!


End of instructions

  t-test degrees of freedom

Degrees of freedom is the number of observations that are free to vary when estimating a parameter. For a t-test, df is calculated as
df = (n1-1)+(n2-1)
where n1 is population 1 sample size and n2 is the sample size of population 2.

End of instructions

  One-tailed critical t-value

One-tail critical t-values are just calculated from a t distribution (of given df). The area under the t distribution curve above (or below) this value is 0.05 (or some other specified probability). Essentially we calculate a quantile of a specified probability from a t distribution of df degrees of freedom

# calculate critical t value (&alpha =0.05) for upper tail (e.g. A larger than B)
> qt(0.05,df=df,lower.tail=F)
# calculate critical t value (&alpha =0.05) for lower tail (e.g. B larger than A)
> qt(0.05,df=df,lower.tail=T)

where 0.05 is the specified &alpha value and df is the specified degrees of freedom. Note that it is not necessary to have collected the data before calculating the critical t-value, you only need to know sample sizes (to get df).

It actually doesn't matter whether you select Lower tail is TRUE or FALSE. The t-distribution is a symmetrical distribution, centered around 0, therefore, the critical t-value is the same for both Lower tail (e.g. population 2 greater than population 1) and Upper tail (e.g. population 1 greater than population 2), except that the Lower tail is always negative. As it is often less confusing to work with positive values, it is recommended that you use Upper tail values. An example of a t-distribution with Upper tail for a one-tailed test is depicted below. Note that this is not part of the t quantiles output!


End of instructions

  Two-tailed critical t-value

Two-tail critical t-values are just calculated from a t distribution (of given df). The area under the t distribution curve above and below the positive and negative of this value respectively is 0.05 (or some other specified probability). Essentially we calculate a quantile of a specified probability from a t distribution of df degrees of freedom. In a two-tailed test, half of the probability is associated with the area above the positive critical t-value and the other half is associated with the area below the negative critical t-value. Therefore when we use the quantile to calculate this critical t-value, we specify the probability as &alpha/2 - since &alpha/2 applies to each tail.

# calculate critical t value (&alpha =0.05) for upper tail (e.g. A different to B)
> qt(0.05/2,df=df, lower.tail=T)

where 0.05 is the specified &alpha value and df is the specified degrees of freedom. Note that it is not necessary to have collected the data before calculating the critical t-value, you only need to know sample sizes (to get df).

Again, it actually doesn't matter whether you select Lower tail as either TRUE or FALSE. For a symmetrical distribution, centered around 0, the critical t-value is the same for both Lower tail (e.g. population 2 greater than population 1) and Upper tail (e.g. population 1 greater than population 2), except that the Lower tail is always negative. As it is often less confusing to work with positive values, it is recommended that you use Upper tail values. An example of a t-distribution with Upper tail for a two-tailed test is depicted below. Note that this is not part of the t quantiles output!

End of instructions

  Basic steps of Hypothesis testing


Step 1 - Clearly establish the statistical null hypothesis. Therefore, start off by considering the situation where the null hypothesis is true - e.g. when the two population means are equal

Step 2 - Establish a critical statistical criteria (e.g. alpha = 0.05)

Step 3 - Collect samples consisting of independent, unbiased samples

Step 4 - Assess the assumptions relevant to the statistical hypothesis test. For a t test:
  1.  Normality
  2.  Homogeneity of variance

Step 5 - Calculate test statistic appropriate for null hypothesis (e.g. a t value)


Step 6 - Compare observed test statistic to a probability distribution for that test statistic when the null hypothesis is true for the appropriate degrees of freedom (e.g. compare the observed t value to a t distribution).

Step 7 - If the observed test statistic is greater (in magnitude) than the critical value for that test statistic (based on the predefined critical criteria), we conclude that it is unlikely that the observed samples could have come from populations that fulfill the null hypothesis and therefore the null hypothesis is rejected, otherwise we conclude that there is insufficient evidence to reject the null hypothesis. Alternatively, we calculate the probability of obtaining the observed test statistic (or one of greater magnitude) when the null hypothesis is true. If this probability is less than our predefined critical criteria (e.g. 0.05), we conclude that it is unlikely that the observed samples could have come from populations that fulfill the null hypothesis and therefore the null hypothesis is rejected, otherwise we conclude that there is insufficient evidence to reject the null hypothesis.

End of instructions

  Pooled variance t-test

> t.test(dv~factor,data=data,var.equal=T)

where dv is the name of the dependent variable, factor is the name of the categorical/factorial variable and data is the name of the data frame (data set). The argument var.equal=T indicates a pooled variances t-test

End of instructions

  Separate variance t-test

> t.test(dv~factor,data=data,var.equal=F)

where dv is the name of the dependent variable, factor is the name of the categorical/factorial variable and data is the name of the data frame (data set). The argument var.equal=F indicates a separate variances t-test

End of instructions

  Output of t-test


The following output are based on a simulated data sets;
1.  Pooled variance t-test for populations with equal (or nearly so) variances

2.  Separate variance t-test for population with unequal variances

End of instructions

  Paired t-test

> t.test(cat1, cat2, data=data, paired=T)

where cat1 and cat2 are two numeric vectors (variables) in the data data frame (data set). The argument paired=T indicates a paired t-test)

End of instructions

  Non-parametric tests

Non-parametric tests do not place any distributional limitations on data sets and are therefore useful when the assumption of normality is violated. There are a number of alternatives to parametric tests, the most common are;

1. Randomization tests - rather than assume that a test statistic calculated from the data follows a specific mathematical distribution (such as a normal distribution), these tests generate their own test statistic distribution by repeatedly re-sampling or re-shuffling the original data and recalculating the test statistic each time. A p value is subsequently calculated as the proportion of random test statistics that are greater than or equal to the test statistic based on the original (un-shuffled) data.

2. Rank-based tests - these tests operate not on the original data, but rather data that has first been ranked (each observation is assigned a ranking, such that the largest observation is given the value of 1, the next highest is 2 and so on). It turns out that the probability distribution of any rank based test statistic for a is identical.

End of instructions

  Wilcoxon test

> wilcox.test(dv ~ factor, data=data)

where dv is the dependent variable and factor is the categorical variable from the data data frame (data set).

End of instructions

  Randomization (permutation) tests

The basis of hypothesis testing (when comparing two populations) is to determine how often we would expect to collect a specific sample (or more one more unusual) if the populations were identical. That is, would our sample be considered unusual under normal circumstances? To determine this, we need to estimate what sort of samples we would expect under normal circumstances. The theoretical t-distribution mimics what it would be like to randomly resample repeatedly (with a given sample size) from such identical populations.

A single sample from the (proposed) population(s) provides a range of observations that are possible. Completely shuffling those data (or labels), should mimic the situation when there is no effect (null hypothesis true situation), since the data are then effectively assigned randomly with respect to the effect. In the case of a t-test comparing the means of two groups, randomly shuffling the data is likely to result similar group means. This provides one possible outcome (t-value) that might be expected when the null hypothesis true. If this shuffling process is repeated multiple times, a distribution of all of the outcomes (t-values) that are possible when the null hypothesis is true can be generated. We can now determine how unusual our real (unshuffled) t-value is by comparing it to the built up t-distribution. If the real t-value is expected to occur less than 5% of the time, we reject the null hypothesis.

Randomization tests - rather than assume that a test statistic calculated from the data follows a specific mathematical distribution (such as a normal distribution), these tests generate their own test statistic distribution by repeatedly re-sampling or re-shuffling the original data and recalculating the test statistic each time. A p value is subsequently calculated as the proportion of random test statistics that are greater than or equal to the test statistic based on the original (un-shuffled) data.

End of instructions

  Defining the randomization statistic

> stat <- function(data, indices){
+ t <- t.test(data[,2]~data[,1])$stat
+ t
+ }
The above function takes a data set (data) and calculates a test statistic (in this case a t-statistic). The illustrated function uses the t.test() function to perform a t-test on column 2 ([,2]) against column 1 ([,1]). The value of the t-statistic is stored in an object called 't'. The function returns the value of this object. The indentations are not important, they are purely used to improve human readability.

End of instructions

  Defining the data shuffling procedure

> rand.gen <- function(data, mle){
+ out <- data
+ out[,1] <- sample(out[,1], replace=F)
+ out
+ }
The above function defines how the data should be shuffled. The first line creates a temporary object (out) that stores the original data set (data) so that any alterations do not effect the original data set. The second line uses the sample() function to shuffle the first column (the group labels). The third line of the function just returns the shuffled data set.

End of instructions

  Perform randomization test (bootstrapping)

> library(boot)
> coots.boot <- boot(coots, stat, R=100, sim="parametric", ran.gen=rand.gen)
Error: object 'coots' not found
The sequence begins by ensuring that the boot package has been loaded into memory. Then the boot() function is used to repeatedly (R=100: repeat 100 times) perform the defined statistical procedure (stat). The ran.gen=rand.gen parameter defines how the data set should be altered prior to performing the statistical procedure, and the sim="parametric" parameter indicates that all randomizations are possible (as opposed to a permutation where each configuration can only occur once). The outcome of the randomization test (bootstrap) is stored in an object called coots.boot

End of instructions

  Calculating p-value from randomization test

> p.length <- length(coots.boot$t[coots.boot$t >= coots.boot$t0])+1
Error: object 'coots.boot' not found
> print(p.length/(coots.boot$R + 1))
Error: object 'p.length' not found
The coots.boot object contains a list of all of the t-values calculated from the resampling (shuffling and recalculating) procedure (coots.boot$t)as well as the actual t-value from the actual data set (coots.boot$t0). It also stores the number of randomizations that it performed (coots.boot$R).

The first line in the above sequence determines how many of the possible t values (including the actual t-value) are greater or equal to the actual t-value. It does this by specifying a list of coots.boot$t values that are greater or equal to the coots.boot$t0 value. (coots.boot$t[coots.boot$t >= coots.boot$t0]) and calculating the length of this list using the length() function. One (1) is added to this length, since the actual t-value must be included as a possible t-value.
The second line expresses the number of randomization t-values greater or equal to the actual t-value as a fraction of the total number of randomizations (again adding 1 to account for the actual situation). This is interpreted as any other p-value - the probability of obtaining our sample statistic (and thus our observations) when the null hypothesis is true.

End of instructions

  Calculating p-value from randomization test

In a t-test, the effect size is the absolute difference between the expected population means. Therefore if the expected population means of population A and B are 10 and 16 respectively, then the effect size is 6.

Typically, effect size is estimated by considering the magnitude of an effect that is either likely or else what magnitude of effect would be regarded biologically significant. For example, if the overall population mean was estimated to be 12 and the researches regarded a 15% increase to be biologically significant, then the effect size is the difference between 12 and a mean 15% greater than 12 (12+(12*.15)=13.8). Therefore the effect size is (13.8-12=1.8).

End of instructions