

Tutorial 4 - Exploratory data analysis

23 April 2011

This workshop has been thrown together a little hastily and is therefore not very well organized (sorry!). Graphical features are demonstrated either via tables of properties or via graphics accompanied by the R code required to produce them.

High level plotting functions

Most graphics in R are produced by issuing a series of one or more graphical statements that sequentially add features to a graphical device. A graphical device is any device capable of receiving and interpreting graphical statements. Common examples
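As a minimal sketch of this layered approach, the example below starts a graph with a high-level function (plot()) and then adds features with low-level functions (lines(), abline() and title()). The data, the temporary output file and the choice of the pdf() device are illustrative assumptions; any other graphical device would work the same way.

```r
# Open a PDF graphics device; a temporary file is used purely for illustration
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)

# Illustrative data (not from the tutorial)
x <- 1:10
y <- x^2

# A high-level plotting function creates a new plot on the device
plot(x, y)

# Low-level plotting functions then add features to the existing plot
lines(x, y)               # join the points with a line
abline(h = 50, lty = 2)   # add a dashed horizontal reference line
title(main = "y = x squared")

# Close the device so that the file is written
dev.off()
```

Each statement after plot() modifies the same device, which is why the order of the statements matters.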


  Assigning entries

Assigning entries is basically the act of defining a new object name and specifying what that object contains (its value). For example, if we wanted to store the number 10.513 as John Howard's IQ, we would instruct R to create a new object (called, say, IQ) and assign the value 10.513 to it. That is, we instruct R that IQ equals 10.513.
In R, the conventional assignment operator is <- (the = operator also works for simple assignments, but <- is preferred).

> name <- value

So, to assign IQ the value 10.513 in R:
> IQ <- 10.513
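A short sketch of what happens after the assignment: the object can be printed, used in calculations, and overwritten by a later assignment to the same name. The calculation and the replacement value 11 are purely illustrative.

```r
# Create the object IQ and assign it the value 10.513
IQ <- 10.513

# Entering the object's name prints its value
IQ

# The object can now be used in further calculations
IQ * 2

# Assigning to an existing name replaces its previous value
IQ <- 11
IQ
```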

End of instructions

  R object classes

Object classes define how information is stored and displayed. The basic storage unit in R is called a vector. A vector is a collection of one or more entries of the same class. The common classes include:
  1. numeric - stores a number eg 1, 2.345 etc
  2. character - stores alphanumeric characters eg 'a', 'fish', 'color1'
  3. logical - stores either TRUE or FALSE
So the entries (1, 2, 3 & 4) might make up a numeric vector, whereas the entries ('Big', 'Small' & 'Tiny') would make up a character vector. To determine the class of an object, use the following syntax (where name represents the object whose class is to be determined):

> class(name)
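Applying class() to the example vectors above gives a quick check of each class. The last two lines illustrate a consequence of every entry sharing one class: when numbers and text are combined, R coerces all entries to character. The object name mixed is hypothetical.

```r
# A numeric vector
class(c(1, 2, 3, 4))

# A character vector
class(c("Big", "Small", "Tiny"))

# A logical vector
class(c(TRUE, FALSE, TRUE))

# Mixing numbers and text coerces every entry to character,
# because a vector can only hold entries of a single class
mixed <- c(1, "Big")
class(mixed)
mixed
```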


  Print contents

In R, print means to output (list) the contents of an object. By default, the contents are output to the screen (the default output device). It is also possible to redirect output to a file or printer if necessary. The contents of an object are 'printed' either by calling the 'print()' function or by simply entering the name of the object. Consider the following:
> numbers <- c(1, 4, 6, 7, 4, 345, 36, 78)
> print(numbers)
[1]   1   4   6   7   4 345  36  78
> numbers
[1]   1   4   6   7   4 345  36  78
The first line of this syntax generates and populates the numeric vector called 'numbers'. The second line uses the print function to tell R to list the contents of the 'numbers' object, the output of which appears on the third line. The fourth and fifth lines illustrate that the same outcome can be achieved by simply entering the name of the object.
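Output can also be redirected to a file rather than the screen. One way to do this in base R is sink(), sketched below; capture.output() is an alternative. The temporary file name is an illustrative assumption.

```r
# Vector from the example above
numbers <- c(1, 4, 6, 7, 4, 345, 36, 78)

# Temporary file used here purely for illustration
out_file <- tempfile(fileext = ".txt")

sink(out_file)    # divert subsequent console output to the file
print(numbers)    # written to out_file instead of the screen
sink()            # restore output to the screen

# Read the file back to confirm what was written
readLines(out_file)
```

Calling sink() with no arguments is important: until it is called, all printed output keeps going to the file.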


  R vectors - variables

In biology, a variable is a collection of observations of the same type. For example, a variable might consist of the observed weights of individuals within a sample of 10 bush rats. Each item (or element) in the variable is of the same type (a weight) and will have been measured comparably (same techniques and units). Biological variables are therefore best represented in R by vectors.
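For example, the weights of a sample of 10 bush rats could be stored as a single numeric vector. The values and the unit (grams) below are hypothetical, made up purely for illustration; they are not real data.

```r
# Hypothetical weights (g) of 10 bush rats; illustrative values, not real data
weights <- c(142, 156, 138, 161, 149, 152, 145, 158, 140, 151)

# Every element is of the same type, so the variable is one numeric vector
class(weights)
length(weights)

# Summaries can then be computed for the whole variable at once
mean(weights)
```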


  R Factors

There are a number of ways in which a factor (a categorical variable) can be created. One way is to use the 'factor' function (which converts a vector into a factor) in conjunction with the 'c' (concatenate) function:

> name <- factor(c(list of characters/words))

Another way is to use the 'gl' function (which generates factors according to a specified pattern of levels):

> name <- gl(number of levels, number of replicates, length of data set, labels=c(list of level names))

Hence, consider the following equivalent alternatives:
> sex <- factor(c("Female", "Female", "Female", "Female", "Female",
+     "Female", "Male", "Male", "Male", "Male", "Male", "Male"))
> #OR
> sex <- factor(c(rep("Female", 6), rep("Male", 6)))
> #OR
> sex <- gl(2, 6, 12, labels = c("Female", "Male"))

The second option uses the 'rep()' function, which in this case repeats each level name (eg 'Female') 6 times.
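To check that the three alternatives really produce the same factor, they can be stored under separate names and compared. The names sex1, sex2 and sex3 are hypothetical, used only so that all three versions exist at once.

```r
# The three alternatives from above, stored under separate names
sex1 <- factor(c("Female", "Female", "Female", "Female", "Female",
    "Female", "Male", "Male", "Male", "Male", "Male", "Male"))
sex2 <- factor(c(rep("Female", 6), rep("Male", 6)))
sex3 <- gl(2, 6, 12, labels = c("Female", "Male"))

# Each comparison returns TRUE when the two factors are the same
identical(sex1, sex2)
identical(sex1, sex3)

# The levels and the number of replicates of each level
levels(sex1)
table(sex1)
```

Note that factor() orders levels alphabetically by default, whereas gl() uses the order given in labels; here the two happen to coincide.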
