

Tutorial 10.2a - Goodness of fit tests

22 Jul 2018

Scenario and Data

Goodness of fit tests are concerned with comparing the observed frequencies with those expected on the basis of a specific null hypothesis. So let's now fabricate a motivating scenario and some data.

We will create a scenario that involves items classified into one of three groups (A, B and C). The number of items in each classification group is then tallied up. Out of a total of 47 items, 15 were of type A, 9 were of type B and 23 were of type C. We could evaluate parity (a 1:1:1 ratio) from these data. In a frequentist context, this might involve testing a null hypothesis that the observed data could have come from a population with a 1:1:1 item ratio. In this case the probability would be the probability of obtaining the observed ratio of frequencies when the null hypothesis is true.

To extend the example, let's also explore a 1:1:2 ratio.

We start by generating the observed data:

# the observed frequencies of A, B and C
obs <- c(15, 9, 23)
obs
[1] 15  9 23

An appropriate test statistic for comparing an observed ($o$) frequency ratio to an expected ($e$) frequency ratio is the chi-square $\chi^2$ statistic. $$\chi^2=\sum\frac{(o-e)^2}{e}$$

When the null hypothesis is true, and specific assumptions hold, the $\chi^2$ statistic should follow a $\chi^2$ probability distribution with degrees of freedom equal to the number of categories minus 1 ($df=p-1$).
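To make the formula concrete, the statistic can be assembled by hand in R. This is just a restatement of the equation above applied to the observed data; the same numbers reappear in the chisq.test() output further down.

```r
# hand-calculate the chi-square statistic for the observed data
# against a 1:1:1 expectation
obs <- c(15, 9, 23)
e <- sum(obs) * c(1/3, 1/3, 1/3)   # expected frequencies
chi2 <- sum((obs - e)^2 / e)       # the chi-square test statistic
df <- length(obs) - 1              # degrees of freedom (p - 1)
pval <- pchisq(chi2, df, lower.tail = FALSE)
round(c(chi2 = chi2, df = df, p = pval), 4)
```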

Exploratory data analysis and initial assumption checking

The assumptions are:
  1. All of the observations are classified independently - this must be addressed at the design and collection stages
  2. No more than 20% of the expected values should be less than 5. Since counts are assumed to follow a Poisson distribution, in which the location and spread are governed by the same parameter ($\lambda$), and since counts are bounded by a lower limit of zero, distributions with expected values less than 5 have an asymmetrical shape and are thus unreliable (for calculating probabilities).

So let's calculate the expected frequencies as a means to evaluate this assumption. The expected values are calculated as: $$e = total~counts \times expected~fraction$$

                      1:1:1 ratio                          1:1:2 ratio
Expected fractions    A=1/3=0.33, B=1/3=0.33, C=1/3=0.33   A=1/4=0.25, B=1/4=0.25, C=2/4=0.5
Expected frequencies  $e_A=(15+9+23)\times 1/3=15.67$      $e_A=(15+9+23)\times 1/4=11.75$
                      $e_B=(15+9+23)\times 1/3=15.67$      $e_B=(15+9+23)\times 1/4=11.75$
                      $e_C=(15+9+23)\times 1/3=15.67$      $e_C=(15+9+23)\times 2/4=23.5$

It is clear that in neither case are any of the expected frequencies less than 5. Therefore, we would conclude that probabilities derived from the $\chi^2$ distribution are likely to be reliable.
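The same check can be done directly in R: the expected frequencies under each null ratio are simply the total count multiplied by the expected fractions.

```r
obs <- c(15, 9, 23)
# expected frequencies under each null hypothesis
e1 <- sum(obs) * c(1/3, 1/3, 1/3)  # 1:1:1 ratio
e2 <- sum(obs) * c(1/4, 1/4, 2/4)  # 1:1:2 ratio
e1
e2
# assumption check: no expected value should be less than 5
all(e1 >= 5)
all(e2 >= 5)
```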

Model fitting or statistical analysis

We perform the Goodness of fit $\chi^2$ test with the chisq.test() function. There are only two relevant parameters for a Goodness of fit test:

  • x: the set (vector or matrix) of observed frequencies
  • p: the set (vector or matrix) of expected probabilities (defaults to equal probabilities, 1/length(x))

1:1:1 ratio:
data.chisq <- chisq.test(obs)

1:1:2 ratio:
data.chisq1 <- chisq.test(obs, p = c(1/4, 1/4, 2/4))

Model evaluation

Prior to exploring the model parameters, it is prudent to confirm that the model did indeed fit the assumptions and was an appropriate fit to the data. For the $\chi^2$ test, this just means confirming the expected values.

1:1:1 ratio:
data.chisq$exp
[1] 15.66667 15.66667 15.66667

1:1:2 ratio:
data.chisq1$exp
[1] 11.75 11.75 23.50

Exploring the model parameters, test hypotheses

If there was any evidence that the assumptions had been violated or the model was not an appropriate fit, then we would need to reconsider the model and start the process again. In this case, there is no evidence that the test will be unreliable so we can proceed to explore the test statistics.

1:1:1 ratio:
data.chisq

	Chi-squared test for given probabilities

data:  obs
X-squared = 6.2979, df = 2, p-value = 0.0429

1:1:2 ratio:
data.chisq1

	Chi-squared test for given probabilities

data:  obs
X-squared = 1.5532, df = 2, p-value = 0.46

Conclusions:

  • There is inferential evidence to reject the null hypothesis that the observed item frequencies could have come from a population with a 1:1:1 ratio.
  • There is insufficient inferential evidence to reject the null hypothesis that the observed item frequencies could have come from a population with a 1:1:2 ratio.

Further explorations of the trends

When significant overall deviations have been identified (and there are more than 2 groups), it is often useful to explore the patterns of residuals. Since the residuals are a (standardized) measure of the differences between observed and expected frequencies for each classification group (A, B or C in this case), they provide an indication of which group(s) deviate most from the expected values and are therefore the main driver(s) of the "effect".

In interpreting the residuals, we are looking for large (substantially larger in magnitude than 1) positive and negative values, which represent higher and lower observed frequencies than would have been expected under the null hypothesis, respectively.
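The standardized (Pearson) residuals reported by chisq.test() are simply $(o-e)/\sqrt{e}$ for each group, which can be verified by hand:

```r
obs <- c(15, 9, 23)
e <- sum(obs) * c(1/3, 1/3, 1/3)   # expected under a 1:1:1 ratio
res <- (obs - e) / sqrt(e)         # Pearson (standardized) residuals
round(res, 7)
```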

1:1:1 ratio:
data.chisq$res
[1] -0.1684304 -1.6843038  1.8527342

1:1:2 ratio:
data.chisq1$res
[1]  0.9481224 -0.8022575 -0.1031421

Conclusions:

  • There were fewer observed B's and more observed C's than would have been expected from a 1:1:1 population ratio.
  • As the observed data were not found to differ significantly from a 1:1:2 ratio, the residuals are not going to offer much additional insight and probably would not have been generated.




Worked Examples

Basic χ2 references
  • Logan (2010) - Chpt 16-17
  • Quinn & Keough (2002) - Chpt 13-14

Goodness of fit test

A fictitious plant ecologist sampled 90 shrubs of a dioecious plant in a forest, and each plant was classified as being either male or female. The ecologist was interested in the sex ratio and whether it differed from 50:50. The observed counts and the predicted (expected) counts based on a theoretical 50:50 sex ratio follow.

Format of fictitious plant sex ratios - note, not a file
Expected and Observed data (50:50 sex ratio).

           Female   Male   Total
Observed     40      50      90
Expected     45      45      90

[Image: Tasmannia bush]

Note, it is not necessary to open or create a data file for this question.

  1. First, what is the appropriate test
    to examine the sex ratio of these plants?
  2. What null hypothesis is being tested by this test?
  3. What degrees of freedom are associated with these data for this test?
  4. Perform a Goodness-of-fit test
    to test the null hypothesis that these data came from a population with a 50:50 sex ratio (hint). Identify the following:
    Show code
    chisq.test(c(40, 50))
    
    	Chi-squared test for given probabilities
    
    data:  c(40, 50)
    X-squared = 1.1111, df = 1, p-value = 0.2918
    
    1. X2 statistic
    2. df
    3. P value
  5. What are your conclusions (statistical and biological)?
  6. Let's now extend this fictitious endeavor. Recent studies on a related species of shrub have suggested a 30:70 female:male sex ratio. Knowing that our plant ecologist had similar research interests, the authors contacted her to inquire whether her data contradicted their findings.

  7. Using the same observed data, test the null hypothesis
    that these data came from a population with a 30:70 sex ratio (hint). From a 30:70 female:male sex ratio, what are the expected frequency counts of females and males from 90 individuals and what is the X2 statistic?
    Show code
    chisq.test(c(40, 50), p = c(0.3, 0.7))
    
    	Chi-squared test for given probabilities
    
    data:  c(40, 50)
    X-squared = 8.9418, df = 1, p-value = 0.002787
    
    chisq.test(c(40, 50), p = c(0.3, 0.7))$exp
    
    [1] 27 63
    
    1. Expected number of females
    2. Expected number of males
    3. X2 statistic
  8. Do the plant ecologist's data dispute the findings of the other studies? (y or n)

Exponential family of distributions

The exponential family of distributions is a class of distributions (both continuous and discrete) that can be characterized by two parameters. One of these parameters (the location parameter) is a function of the mean and the other (the dispersion parameter) is a function of the variance of the distribution. Note that recent developments have further extended generalized linear models to accommodate other, non-exponential residual distributions.

End of instructions

  Reading data into R

Ensure that the working directory is pointing to the path containing the file to be imported before proceeding.
To import data into R, we read the contents of a file into a data frame. The general format of the command for reading data into a data frame is

> name <- read.table('filename.csv', header=T, sep=',', row.names=column, strip.white=T)

where name is a name you wish the data frame to be referred to as, filename.csv is the name of the csv file that you created in Excel and column is the number of the column that had row labels (if there were any). The argument header=T indicates that the variable (vector) names will be created from the names supplied in the first row of the file. The argument sep=',' indicates that entries in the file are separated by a comma (hence a comma delimited file). If the data file does not contain row labels, or you are not sure whether it does, it is best to omit the row.names=column argument. The strip.white=T argument ensures that no leading or trailing spaces are left in character names (these can be particularly nasty in categorical variables!).

As an example
        phasmid <- read.table('phasmid.csv', header=T, sep=',', row.names=1, strip.white=T)

End of instructions

  Analysing frequencies

Analysis of frequencies is similar to Analysis of Variance (ANOVA) in some ways. Variables contain two or more classes that are defined from either natural categories or from a set of arbitrary class limits in a continuous variable. For example, the classes could be sexes (male and female) or color classes derived by splitting the light scale into a set of wavelength bands. Unlike ANOVA, in which an attribute (e.g. length) is measured for a set number of replicates and the means of different classes (categories) are compared, when analyzing frequencies, the number of replicates (observed) that fall into each of the defined classes are counted and these frequencies are compared to predicted (expected) frequencies.

Analysis of frequencies tests whether a sample of observations came from a population where the observed frequencies match some expected or theoretical frequencies. Analysis of frequencies is based on the chi-squared (X2) statistic, which follows a chi-square distribution (the distribution of summed squared values drawn from a standard normal distribution, hence its long right tail).
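This relationship is easy to demonstrate by simulation (a quick sketch; the sample size of 10000 is arbitrary). Summing the squares of two standard normal deviates yields draws from a chi-square distribution with 2 degrees of freedom, whose mean equals its degrees of freedom:

```r
set.seed(1)
# sum of squares of two standard normal deviates ~ chi-square with df = 2
z1 <- rnorm(10000)
z2 <- rnorm(10000)
x2 <- z1^2 + z2^2
mean(x2)   # should be close to the degrees of freedom, 2
```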

When there is only one categorical variable, expected frequencies are calculated from theoretical ratios. When there is more than one categorical variable, the data are arranged in a contingency table that reflects the cross-classification of sampling or experimental units into the classes of the two or more variables. The most common form of contingency table analysis (model I) tests a null hypothesis of independence between the categorical variables and is analogous to the test of an interaction in multifactorial ANOVA. Hence, frequency analysis provides hypothesis tests for solely categorical data. Although analysis of frequencies provides a way to analyse data in which both the predictor and the response variable are categorical, the variables are not distinguished as predictor or response in the analysis itself, so the establishment of causality is only of importance for interpretation.


End of instructions

  Goodness of fit test

The goodness-of-fit test compares observed frequencies of each class within a single categorical variable to the frequencies expected of each of the classes on a theoretical basis. It tests the null hypothesis that the sample came from a population in which the observed frequencies match the expected frequencies.

For example, an ecologist investigating factors that might lead to deviations from a 1:1 offspring sex ratio, hypothesized that when the investment in one sex is considerably greater than in the other, the offspring sex ratio should be biased towards the less costly sex. He studied two species of wasps, one of which had males that were considerably larger (and thus more costly to produce) than females. For each species, he compared the offspring sex ratio to a 1:1 ratio using a goodness-of-fit test.

End of instructions

  R Goodness of fit test

> chisq.test(c(counts))
#OR
> chisq.test(c(counts),p=c(.5,.5))

where counts is a comma separated list of observed counts or frequencies and .5,.5 is a comma separated list of expected frequencies. For example

> chisq.test(c(40,50))
#OR
> chisq.test(c(40,50),p=c(.5,.5))

End of instructions

  R Cross tables

> name.tab <- xtabs(counts ~ cat1 + cat2, data=data)

where name.tab is a name for the resulting cross table, counts are the observed counts/frequencies, cat1 and cat2 are the categorical variables and data is the name of the data frame (dataset)
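As a sketch (the data frame and variable names below are made up for illustration), a long-format data frame of counts can be converted into a cross table like so:

```r
# hypothetical germination counts in long format
seeds <- data.frame(
  counts = c(30, 10, 12, 28),
  heat   = c("heated", "heated", "unheated", "unheated"),
  germ   = c("yes", "no", "yes", "no")
)
# cross-classify the counts by the two categorical variables
seeds.tab <- xtabs(counts ~ heat + germ, data = seeds)
seeds.tab
```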

End of instructions

  Contingency tables

Contingency tables are the cross-classification of two (or more) categorical variables. A 2 x 2 (two way) table takes on the following form:

              Variable 2
Variable 1   Level 1  Level 2
   Level 1     f11      f12
   Level 2     f21      f22

Where f12 etc are the frequencies in each cell (Variable 1 x Variable 2 combination).

Contingency tables test the null hypothesis that the data came from a population in which variable 1 is independent of variable 2 and vice-versa. That is, it is a test of independence.

For example a plant ecologist examined the effect of heat on seed germination. Contingency test was used to determine whether germination (2 categories - yes or no) was independent of the heat treatment (2 categories heated or not heated).
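Using the germination example (with made-up counts purely for illustration), the test of independence looks like this:

```r
# hypothetical 2 x 2 table: germination (yes/no) by heat treatment
germ.tab <- matrix(c(30, 12, 10, 28), nrow = 2,
                   dimnames = list(heat = c("heated", "unheated"),
                                   germination = c("yes", "no")))
# correct = FALSE suppresses the Yates continuity correction
germ.x2 <- chisq.test(germ.tab, correct = FALSE)
germ.x2
```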

End of instructions

  R Contingency tables

> name.x2 <- chisq.test(name.tab, correct=F)

where name.x2 is a name to provide for the fitted model and name.tab is a name of a cross table.

End of instructions

  Contingency tables from raw data sets

> name.tab <- table(data)

where name.tab is a name to give the resulting cross table and data is the name of the data set that contains the raw data.

End of instructions

  Logistic Regression

> name.glm <- glm(dv~iv, family=binomial, data=data)

where name.glm is a name you provide for the fitted model, dv is the name of the dependent variable, iv is the name of the independent variable and data is the name of the data frame (data set).
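A minimal sketch with a fabricated data set (the data frame germ.dat, its variables and the dose gradient are all invented for illustration):

```r
set.seed(2)
# fabricated binary response (germinated 1/0) along a dose gradient
germ.dat <- data.frame(dose = rep(1:10, each = 5))
germ.dat$germinated <- rbinom(50, 1, plogis(-3 + 0.6 * germ.dat$dose))
# logistic regression of germination against dose
germ.glm <- glm(germinated ~ dose, family = binomial, data = germ.dat)
summary(germ.glm)$coefficients
```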

End of instructions

  Further exploration of three way log-linear interactions

Start with the female data. The female data can be accessed through the object sinclair.split[["FEMALE"]]. This is the "FEMALE" data of the sinclair.split object.

> sinclair.glmR <- glm(COUNT~DEATH+MARROW, family=poisson, data=sinclair.split[["FEMALE"]])
> sinclair.glmF <- glm(COUNT~DEATH*MARROW, family=poisson, data=sinclair.split[["FEMALE"]])
> anova(sinclair.glmR, sinclair.glmF, test="Chisq")

The first line fits the reduced log-linear model. This is the model that does not contain the two way interaction term. The second line fits the full log-linear model - the model that does contain the two way interaction. The third line generates the analysis of deviance table. This is essentially presenting the results of the hypothesis test that tests whether there is a two way interaction or not.
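Since the sinclair data set itself is not shown here, a self-contained sketch with fabricated counts (the COUNT values and factor levels below are invented for illustration) shows the reduced-versus-full comparison:

```r
# fabricated female subset: cause of death cross-classified by marrow type
female.dat <- expand.grid(DEATH  = c("PRED", "NPRED"),
                          MARROW = c("SWF", "OG", "TG"))
female.dat$COUNT <- c(38, 12, 17, 23, 8, 26)
# reduced model (no interaction) versus full model (with interaction)
glmR <- glm(COUNT ~ DEATH + MARROW, family = poisson, data = female.dat)
glmF <- glm(COUNT ~ DEATH * MARROW, family = poisson, data = female.dat)
dev.tab <- anova(glmR, glmF, test = "Chisq")  # tests the DEATH:MARROW interaction
dev.tab
```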

This same procedure can then be repeated for the "MALE" data.


End of instructions

  Odds ratios for specific two-way interactions

Start with the female data.

> library(epitools)
> female.tab <- xtabs(COUNT ~ DEATH + MARROW, data=sinclair.split[["FEMALE"]])
> oddsratio.wald(t(female.tab))$measure
> oddsratio.wald(t(female.tab),rev="b")$measure

  1. The first line loads a package called 'epitools'. This package contains the function (oddsratio.wald()) that is necessary to calculate the odds ratios.
  2. The second line converts the female data set into a cross table.
  3. The third line calculates the odds ratios and 95% confidence intervals (after transposing the table). Transposing is done with the 't()' function and is necessary because oddsratio.wald expects an r x 2 table (that is, a table that only has two columns) and the odds ratios are calculated for the rows. Odds ratios are calculated by contrasting each level of the row variable against a reference level. As a result, the odds ratios etc are not calculated for the first level.
  4. The final line performs the odds ratio calculations again, except this time a different level is chosen as the reference level so that all pairwise comparisons have been covered.
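If epitools is not available, the Wald odds ratio and its 95% confidence interval for a single 2 x 2 comparison can be computed by hand (the cell counts below are made up for illustration):

```r
# hypothetical 2 x 2 cell counts
n11 <- 10; n12 <- 20   # row 1: event, no event
n21 <- 15; n22 <- 30   # row 2: event, no event
or <- (n11 * n22) / (n12 * n21)              # Wald odds ratio
se <- sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)    # SE of the log odds ratio
ci <- exp(log(or) + c(-1, 1) * 1.96 * se)    # 95% confidence interval
c(OR = or, lower = ci[1], upper = ci[2])
```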


This same procedure can then be repeated for the "MALE" data.


End of instructions