Workshop 13.3 - Multivariate standardizations

14 Jan 2013

Basic statistics references

Legendre and Legendre
Quinn & Keough (2002) - Chpt 17

Standardizations

The following community data represent the abundances of three species of gastropods in five quadrats in a saltmarsh (ranging from high shore marsh, Quadrat 1, to low shore marsh, Quadrat 5).

Download gastropod data set

Format of the gastropod data set

Salinator	Ophicardelus	Marinula
4	0	1
9	3	0
9	4	1
6	2	0
0	1	1

Salinator	Number of Salinator gastropods - variable
Ophicardelus	Number of Ophicardelus gastropods - variable
Marinula	Number of Marinula gastropods - variable
Q1-Q5	Quadrats - these are the objects

Open the gastropod data set.

> gastropod <- read.csv("../downloads/data/gastropod.csv")
> gastropod

  Salinator Ophicardelus Marinula
1         4            0        1
2         9            3        0
3         9            4        1
4         6            2        0
5         0            1        1
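
If the CSV file is not to hand, the same counts can be entered directly. The following is a minimal sketch that simply copies the values from the table above; it is assumed (not checked against the CSV) that storing the counts as plain numerics rather than integers makes no difference to the steps that follow.

```r
# Build the gastropod counts by hand (values copied from the table above).
# Rows are the five quadrats, columns are the three species.
gastropod <- data.frame(
  Salinator    = c(4, 9, 9, 6, 0),
  Ophicardelus = c(0, 3, 4, 2, 1),
  Marinula     = c(1, 0, 1, 0, 1)
)
gastropod
```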

Before proceeding with any multivariate analyses, it is a good idea to get a 'feel' for your data. The gastropod data set is intentionally very small so that we can relate the various calculated properties to what we can see by simply inspecting the counts.
To build up a picture of these data, generate the following exploratory properties:
1. Scale of each of the species (column maximums)
  > apply(gastropod, 2, max)

     Salinator Ophicardelus     Marinula 
             9            4            1 

2. Scale of each of the species (column means)
  > apply(gastropod, 2, mean)

     Salinator Ophicardelus     Marinula 
           5.6          2.0          0.6 

3. Variability of each of the species (column variance)
  > apply(gastropod, 2, var)

     Salinator Ophicardelus     Marinula 
          14.3          2.5          0.3 

4. Abundances in each quadrat (row totals)
  > apply(gastropod, 1, sum)

  [1]  5 12 14  8  2

5. Correlations between species
  > cor(gastropod)

               Salinator Ophicardelus Marinula
  Salinator       1.0000       0.7944  -0.4587
  Ophicardelus    0.7944       1.0000  -0.2887
  Marinula       -0.4587      -0.2887   1.0000

We intend to use these data in some sort of multivariate analysis. Typically, before doing so, we standardize the data in order to ensure that certain features are honored in the analysis. Standardize the gastropod data to achieve the following:

ensure that the rare and abundant species alike have similar weighting and are constrained to the range of 0-1

> library(vegan)
> gast1 <- decostand(gastropod, "max")
> gast1

  Salinator Ophicardelus Marinula
1    0.4444         0.00        1
2    1.0000         0.75        0
3    1.0000         1.00        1
4    0.6667         0.50        0
5    0.0000         0.25        1

> apply(gast1, 2, max)

   Salinator Ophicardelus     Marinula 
           1            1            1

> apply(gast1, 2, range)

     Salinator Ophicardelus Marinula
[1,]         0            0        0
[2,]         1            1        1
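
To make the arithmetic behind this standardization explicit, the sketch below reproduces it in base R by dividing each species (column) by its own maximum. The name gast1_manual is purely illustrative, and the counts are re-entered by hand so the block runs on its own.

```r
# Gastropod counts, copied from the data table
gastropod <- data.frame(
  Salinator    = c(4, 9, 9, 6, 0),
  Ophicardelus = c(0, 3, 4, 2, 1),
  Marinula     = c(1, 0, 1, 0, 1)
)
counts <- as.matrix(gastropod)

# Divide every column (species) by that column's maximum
col_max <- apply(counts, 2, max)
gast1_manual <- sweep(counts, 2, col_max, "/")
round(gast1_manual, 4)
```

Each species now peaks at 1, so a rare species such as Marinula spans the same 0-1 range as the abundant Salinator.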

ensure that all species have similar weighting (each is centered on a mean of zero) yet maintain their variability. This could be important if you want multivariate patterns to reflect heterogeneity (many analyses are drawn towards variables with higher variability).

> # center the data
> gast2 <- apply(gastropod, 2, scale, scale = FALSE)
> gast2

     Salinator Ophicardelus Marinula
[1,]      -1.6           -2      0.4
[2,]       3.4            1     -0.6
[3,]       3.4            2      0.4
[4,]       0.4            0     -0.6
[5,]      -5.6           -1      0.4

> apply(gast2, 2, mean)

   Salinator Ophicardelus     Marinula 
   3.554e-16    0.000e+00    2.220e-17

> apply(gast2, 2, var)

   Salinator Ophicardelus     Marinula 
        14.3          2.5          0.3
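
As a cross-check on the centering step, the sketch below subtracts each species' mean by hand and confirms that the variances are left untouched. The name gast2_manual is illustrative only, and the counts are re-entered so the block is self-contained.

```r
# Gastropod counts, copied from the data table
gastropod <- data.frame(
  Salinator    = c(4, 9, 9, 6, 0),
  Ophicardelus = c(0, 3, 4, 2, 1),
  Marinula     = c(1, 0, 1, 0, 1)
)
counts <- as.matrix(gastropod)

# Subtract each column (species) mean from that column
gast2_manual <- sweep(counts, 2, colMeans(counts), "-")

# Centering shifts the values but does not rescale them,
# so the column variances match those of the raw counts
apply(gast2_manual, 2, var)
```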

ensure that all species have similar weighting. The influences of highly abundant and/or variable species are suppressed and those of rare species are enhanced so that all have similar influence.

> # scale data to mean=0 and variance of 1
> gast3 <- apply(gastropod, 2, scale)
> # OR
> library(vegan)
> gast3 <- decostand(gastropod, method = "standardize")
> gast3

  Salinator Ophicardelus Marinula
1   -0.4231      -1.2649   0.7303
2    0.8991       0.6325  -1.0954
3    0.8991       1.2649   0.7303
4    0.1058       0.0000  -1.0954
5   -1.4809      -0.6325   0.7303

> apply(gast3, 2, mean)

   Salinator Ophicardelus     Marinula 
   1.193e-16    0.000e+00    0.000e+00

> apply(gast3, 2, var)

   Salinator Ophicardelus     Marinula 
           1            1            1
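
The same result can be sketched in two explicit steps: center each species on its mean, then divide by its standard deviation. This is a hand-rolled illustration under the assumption that the sample standard deviation (n - 1 denominator, as returned by sd()) is the one wanted; gast3_manual is an illustrative name.

```r
# Gastropod counts, copied from the data table
gastropod <- data.frame(
  Salinator    = c(4, 9, 9, 6, 0),
  Ophicardelus = c(0, 3, 4, 2, 1),
  Marinula     = c(1, 0, 1, 0, 1)
)
counts <- as.matrix(gastropod)

# Step 1: center each column (species) on its mean
centered <- sweep(counts, 2, colMeans(counts), "-")

# Step 2: divide each centered column by its standard deviation
gast3_manual <- sweep(centered, 2, apply(counts, 2, sd), "/")
round(gast3_manual, 4)
```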

ensure that all sites have similar weightings and are constrained to a range of 0-1.

> library(vegan)
> gast4 <- decostand(gastropod, "total")
> gast4

  Salinator Ophicardelus Marinula
1    0.8000       0.0000  0.20000
2    0.7500       0.2500  0.00000
3    0.6429       0.2857  0.07143
4    0.7500       0.2500  0.00000
5    0.0000       0.5000  0.50000

> apply(gast4, 1, sum)

[1] 1 1 1 1 1

> cor(gast4)

             Salinator Ophicardelus Marinula
Salinator       1.0000      -0.8353  -0.8852
Ophicardelus   -0.8353       1.0000   0.4836
Marinula       -0.8852       0.4836   1.0000
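
For comparison, the row-total standardization can be sketched in base R by dividing each quadrat (row) by its total abundance. The name gast4_manual is illustrative, and the counts are re-entered so the block runs alone.

```r
# Gastropod counts, copied from the data table
gastropod <- data.frame(
  Salinator    = c(4, 9, 9, 6, 0),
  Ophicardelus = c(0, 3, 4, 2, 1),
  Marinula     = c(1, 0, 1, 0, 1)
)
counts <- as.matrix(gastropod)

# Divide every row (quadrat) by that row's total abundance
gast4_manual <- sweep(counts, 1, rowSums(counts), "/")
round(gast4_manual, 4)

# Every quadrat now sums to 1
rowSums(gast4_manual)
```

Because each row now sums to 1, a larger share for one species necessarily leaves less for the others, which is one reason the correlations computed on these proportions differ from those of the raw counts.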

ensure that all species and sites have similar weightings while enhancing any underlying patterns (increasing species correlations, for example). This can improve the success of any resulting multivariate analyses.

> library(vegan)
> # Wisconsin double standardization
> gast5 <- wisconsin(gastropod)
> gast5

  Salinator Ophicardelus Marinula
1    0.3077       0.0000   0.6923
2    0.5714       0.4286   0.0000
3    0.3333       0.3333   0.3333
4    0.5714       0.4286   0.0000
5    0.0000       0.2000   0.8000

> cor(gast5)

             Salinator Ophicardelus Marinula
Salinator       1.0000       0.6123  -0.9241
Ophicardelus    0.6123       1.0000  -0.8680
Marinula       -0.9241      -0.8680   1.0000

Transformation	Syntax
log_e	> new_var <- log(old_var)
log₁₀	> new_var <- log10(old_var)
square root	> new_var <- sqrt(old_var)
arcsin	> new_var <- asin(sqrt(old_var))
scale (mean=0, unit variance)	> new_var <- scale(old_var)
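
As a brief illustration of the syntax in this table, the sketch below applies three of the transformations to the Salinator counts. The choice of Salinator, and of its per-quadrat proportions for the arcsin transformation, is only an example and is not prescribed by the workshop.

```r
# Salinator counts and quadrat totals, copied from the gastropod data
salinator <- c(4, 9, 9, 6, 0)
quadrat_totals <- c(5, 12, 14, 8, 2)

# Square root: defined for zero counts
sqrt(salinator)

# Natural log: the zero count in quadrat 5 becomes -Inf
log(salinator)

# Arcsin (of the square root): intended for proportions between 0 and 1
props <- salinator / quadrat_totals
asin(sqrt(props))
```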

Sample number	Sample mean
1	12.1
2	12.7
3	12.5
Mean of sample means	12.433
SD of sample means	0.306
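
The two summary rows of this table can be checked with a couple of lines of R. This is only a verification sketch; sample_means is an illustrative name for the three values listed above.

```r
# The three sample means from the table
sample_means <- c(12.1, 12.7, 12.5)

# Mean of the sample means
round(mean(sample_means), 3)  # 12.433

# Standard deviation of the sample means (n - 1 denominator)
round(sd(sample_means), 3)    # 0.306
```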