

Tutorial 3.1 - Package management

27 Mar 2017

One of the great strengths of R is the ease with which it can be extended via the creation of new functions. This means that the functionality of the environment is not limited by the development priorities and economics of a commercial enterprise. Moreover, collections of related functions can be assembled together into what is called a package or library. These packages can be distributed to others to use or modify, and thus the community and its capacity grow.

One of the keys to the concept of packages is that they extend the functionality only when it is required. Currently (2013), there are in excess of 4000 packages available on CRAN (Comprehensive R Archive Network) and an additional 2000 packages available via other sources. If all of that functionality were available simultaneously, the environment would be impaired by bloat. In any given session, the amount of extended functionality needed is likely to be relatively low, so it makes sense to 'load' functionality into memory only when it is required.

The R environment comprises the core language itself (with its built-in data, memory and control structures, along with parsers, error handlers and built-in operators and constants) along with any number of packages. Even on a brand new install of R there are some packages. These tend to provide crucial or common functions, and as such many of them are automatically loaded at the start of an R session.

To see what packages are currently loaded in your session, enter the following:

(.packages())
[1] "nlme"      "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
[8] "base"     

A more general alternative to using the .packages() function is to use the search() function.

search()
 [1] ".GlobalEnv"        "package:nlme"      "package:stats"     "package:graphics" 
 [5] "package:grDevices" "package:utils"     "package:datasets"  "package:methods"  
 [9] "Autoloads"         "package:base"     
Actually, the search() function returns the search path: the locations, in order, in which R looks for objects. For example, when you enter a command, the first place that R searches for the objects it refers to (variables, functions, constants, etc.) is .GlobalEnv. .GlobalEnv is the current workspace and stores all the user-created objects (such as variables, data frames etc.).

If the object is not found in .GlobalEnv, the search continues within the next search location (in my case the nlme package, then the stats package, and so on). When you load an additional package (such as the car package), this package (along with any other packages that it depends on) will be placed towards the start of the search queue. The logic is that if you have just loaded the package, then chances are you intend to use its functionality, and therefore your statements will most likely be evaluated faster (because there is likely to be less to search through before locating the relevant objects).

library(car)
search()
 [1] ".GlobalEnv"        "package:car"       "package:nlme"      "package:stats"    
 [5] "package:graphics"  "package:grDevices" "package:utils"     "package:datasets" 
 [9] "package:methods"   "Autoloads"         "package:base"     
Indeed, issuing the library() function this way simply adds the package to the search path. The detach() function removes a package from the search path. Removing a package from the search path when you know its functions are not going to be required for the rest of the session speeds up the evaluation of many statements (and therefore most routines) as the engine potentially has fewer packages to traverse whilst seeking objects.
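As a minimal sketch of the attach/detach cycle described above, the following uses the splines package. That choice is an assumption made purely so the example runs without installing anything (splines ships with R); substitute any package you have installed.

```r
# Attach a package that ships with R (chosen only so the example runs anywhere)
library(splines)
attached <- "package:splines" %in% search()

# Remove it from the search path again
detach("package:splines")
still_attached <- "package:splines" %in% search()

attached        # was the package on the search path after library()?
still_attached  # is it still there after detach()?
```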

Listing installed packages

The installed.packages() function tabulates a list of all the currently installed packages available on your system along with the package path (where it resides on your system) and version number. Additional fields can be requested (including "Priority", "Depends", "Imports", "LinkingTo", "Suggests", "Enhances", "OS_type", "License" and "Built").

installed.packages()
installed.packages(fields=c("Package", "LibPath", "Version", "Depends","Built"))

Yet more information can be obtained for any single package with the packageDescription() and library() functions - the latter provides all the information of the former and then includes a descriptive index of all the functions and datasets defined within the package.

packageDescription('car')
library(help='car')
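
The table returned by installed.packages() can also be queried programmatically. The sketch below checks whether a package is installed and reports its version; is_installed() is a hypothetical helper name introduced here for illustration, and stats is used only because it is present in every R installation.

```r
# Hypothetical helper: is a package present in any of the library paths?
# The row names of the installed.packages() matrix are the package names.
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

is_installed("stats")    # a package that ships with R
packageVersion("stats")  # its installed version
```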

Installing packages

The R community contains some of the brightest and most generous mathematicians, statisticians and practitioners, who continue to actively develop and maintain concepts and routines. Most of these routines end up being packaged as a collection of functions and then hosted on one or more publicly available sites so that others can benefit from their efforts.

The locations of collections of packages are called repositories or 'repos' for short. The four main repositories are CRAN, Bioconductor, R-Forge and GitHub. By default, R is only 'tuned in' to CRAN. That is, any package queries or actions pertain just to the CRAN repositories.

To get a tabulated list of all the packages available on CRAN (warning: there are over 4000 packages, so this will be a large table):

available.packages()
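
Because the full table is so large, it is usually more practical to look up specific packages in it. The following is a sketch only: on_repo() is a hypothetical helper, and the toy index (including its version numbers) is made up so the example runs without network access.

```r
# Hypothetical helper: which of `pkgs` appear in a repository index?
# `avail` defaults to the live index from available.packages() (needs network
# access), but any matrix with package names as row names can be passed instead.
on_repo <- function(pkgs, avail = available.packages()) {
  setNames(pkgs %in% rownames(avail), pkgs)
}

# Toy index standing in for the real (very large) table; values are invented
toy_index <- matrix(c("car",  "3.0",
                      "nlme", "3.1"),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(c("car", "nlme"), c("Package", "Version")))

on_repo(c("car", "notapackage"), toy_index)
```

Calling on_repo("car") without the second argument would consult the repositories R is currently configured to use.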

Comprehensive R Archive Network - CRAN

CRAN is a repository of R packages mirrored across 90 sites throughout the world. Packages are installed from CRAN using the install.packages() function. The first (and only mandatory) argument to the install.packages() function is the name of the package(s) to install (pkgs=). If no other arguments are provided, the install.packages() function will search CRAN for the specified package(s) and install it along with any of its dependencies that are not yet installed on your system.

Note, unless you have started the session with administrator (root) privileges, the packages will be installed within a path under your home folder. Whilst this is not necessarily a bad thing, it does mean that the package is not globally available to all users on your system (not that it is common to have multiple users of a single system these days). Moreover, it means that R packages reside in multiple locations across your system. The packages that came with your R install will be in one location (or a couple of related locations) and the packages that you have installed will be in another location.

To see the locations currently used on your system, you can issue the following statement.

.libPaths()
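
To see how your installed packages are spread across those locations, the "LibPath" column of installed.packages() can be tabulated. This is a small sketch; the variable names are illustrative.

```r
# Count installed packages per library location
lib_counts <- table(installed.packages()[, "LibPath"])
lib_counts

# When no lib= argument is given, install.packages() installs into the
# first element of .libPaths()
default_lib <- .libPaths()[1]
default_lib
```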

To install a specific package (and its dependencies):

install.packages("devtools")
You will be prompted to select a mirror site. In the absence of any other criterion, just select the mirror that is closest geographically to you. The terminal will then provide feedback about the progress and status of the install process.

By indicating a specific repository, you can avoid being prompted for a mirror. For example, I chose to use a CRAN mirror in Australia (the URL below points to the one hosted by CSIRO), and therefore the following statement gives me direct access.

install.packages("devtools", repos="http://cran.csiro.au")

Finally, you could provide a vector of repository names if you were unsure which repository was likely to contain the package you were after. This can also be useful if your preferred mirror regularly experiences downtime - the alternative mirror (second in the vector) is used only when the first fails.
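
A sketch of passing several repositories at once, reusing the two URLs that appear elsewhere in this tutorial (they may no longer be live). install_if_missing() is a hypothetical helper, not part of R, added here so that the network is only touched when the package is actually absent.

```r
# Vector of repositories to consult, in order of preference
repos <- c(CRAN   = "http://cran.csiro.au",
           RForge = "http://R-Forge.R-project.org")

# Hypothetical helper: install `pkg` from `repos` only if it is not yet available,
# then report whether it can now be loaded
install_if_missing <- function(pkg, repos) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call returns without contacting any repository
install_if_missing("stats", repos)
```

Replacing "stats" with "devtools" would trigger an actual install on a system that does not yet have that package.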

Bioconductor

Bioconductor is an open source and open development project devoted to genomic data analysis tools, most of which are available as R packages. Whilst initially the packages focused primarily on the manipulation and analysis of DNA microarrays, as the scope of the project has expanded, so too has the functional scope of the packages hosted there.

source("http://bioconductor.org/biocLite.R")
biocLite("limma")

Or to install multiple packages from Bioconductor

source("http://bioconductor.org/biocLite.R")
biocLite(c("GenomicFeatures", "AnnotationDbi"))
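
Note that on more recent Bioconductor releases the biocLite.R script has been superseded by the BiocManager package, which is installed from CRAN. The following is a sketch under that assumption; install_bioc() is a hypothetical wrapper name, and the example call is left commented out because it needs network access.

```r
# Hypothetical wrapper: make sure BiocManager is present, then use it to
# install one or more Bioconductor packages
install_bioc <- function(pkgs) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkgs)
}

# Example call (requires network access):
# install_bioc(c("GenomicFeatures", "AnnotationDbi"))
```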

R-Forge

Unlike both CRAN and Bioconductor (which are essentially package repositories), R-Forge is an entire R package development platform. Package development is supported through a range of services including:

  • version control (SVN) - allowing multiple collaborators to maintain current and historical versions of files by facilitating simultaneous editing, conflict resolution and rolling back
  • daily package checking and building - so packages are always up to date
  • bug tracking and feature request tools
  • mailing lists and message boards
  • full backup and archival system
All of this sits within a mature web environment that resembles a content management system.

Installing packages from R-Forge is the same as it is for CRAN, just that the path of the root repository needs to be specified with the repos= argument.

install.packages("lme4.0", repos="http://R-Forge.R-project.org")

GitHub

GitHub builds upon the philosophy of the development platform promoted by the SourceForge family (including R-Forge) by adding the ability to fork a project. Forking is when the direction of a project is split so that multiple new opportunities can be explored without jeopardizing the stability and integrity of the parent source. If the change in direction proves valuable, the project (package) can either become a new package or else feed back into the development of the original package.

Hadley Wickham has yet again come up with a set of outrageously useful tools (devtools package). This package is a set of functions that simplify (albeit slightly dictatorially) the processes of package authoring, building, releasing and installing. For now, we will concentrate on the latter feature.

In order to use this package to install packages from GitHub, the devtools package must itself be installed. It is recommended that this install take place from CRAN (as outlined above). Thereafter, the devtools package can be included in the search path and the install_github() function used to retrieve and install a nominated package or packages from GitHub. Current versions of devtools expect the repository to be given in the form 'username/repository'.

library(devtools)
install_github("tidyverse/ggplot2")

As described above, GitHub is a development platform and therefore it is also a source of 'bleeding edge' development versions of packages. Whilst the development versions are less likely to be as stable or even as statistically rigorous as the final release versions, they do offer the very latest ideas and routines. They provide the very latest snapshot of where the developers are currently at.

Most of the time users only want the stable release versions of a package. However there are times when having the ability to try out new developments as they happen can be very rewarding. The dev_mode() function within the devtools package provides a switch that can be used to toggle your system in and out of development mode. When in development mode, installed packages are quarantined within a separate path (R-dev) to prevent them overriding or conflicting with the stable versions that are critical for your regular analyses.

# switch to development mode
dev_mode(on=TRUE)
# install the development version of ggplot2
install_github("tidyverse/ggplot2")
# use the development version of ggplot2
library(ggplot2)
# switch development mode off
dev_mode(on=FALSE)
# stable version of ggplot2 is now engaged

Manual download and install

Packages are made available on the various repositories in compressed form and differ between Windows, MacOSX and Linux versions. Those web repositories all have functionality for navigating or searching through the repositories for specific packages. The packages (compressed files) can be directly downloaded from these sites.

Additionally, some packages are not available on the various repositories, and firewalls and proxies can sometimes prevent R from accessing the repositories directly. In these cases, packages must be manually downloaded and installed.

There are a number of ways to install a package that resides locally. Note, do not uncompress the packages.

  1. From the command line (outside of R).
    R CMD INSTALL packagename 
    
    where packagename is replaced by the path and name of the compressed package.
  2. Using the install.packages() function by specifying repos=NULL.
    install.packages('packagename', repos=NULL)
    
    where packagename is replaced by the path (if not in the current working directory) and name of the compressed package.
  3. Via the Windows RGui, select the Install package(s) from local zip files... option of the Packages menu and select the compressed package.
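
As a sketch of option 2 above, the call can be wrapped so that a mistyped path fails early with a clear message. install_local_file() is a hypothetical helper, and the path in the commented call is a placeholder rather than a real file.

```r
# Hypothetical helper: install a downloaded (still compressed) package file,
# checking first that the file actually exists
install_local_file <- function(path) {
  if (!file.exists(path)) {
    stop("No such file: ", path)
  }
  install.packages(path, repos = NULL)
}

# Example call with a placeholder path:
# install_local_file("~/Downloads/packagename.tar.gz")
```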

Updating packages

An integral component of package management is being able to maintain an up to date system. Many packages are regularly updated so as to adopt new ideas and functionality. Indeed, it is the speed of functional evolution that sets R apart from most other statistical environments.

Along with the install.packages() function, there are three other functions to help manage and maintain the packages on your system.

  • old.packages() compares the versions of packages you have installed with the versions of those packages available in the current repositories. It tabulates the names, install paths and versions of old packages on your system.
    old.packages()
    
    Repositories other than CRAN can be indicated via the repos= argument.
    old.packages(repos="http://R-Forge.R-project.org")
    #or even multiple repos
    old.packages(repos=c("http://cran.csiro.au","http://R-Forge.R-project.org"))
    
  • new.packages() provides a list of all the packages on the repository that are not in your local install. Note, with over 4000 packages available on CRAN, unless the repos= parameter is pointing to somewhere very specific (and with a narrow subset of packages) this function is rarely of much use.
    new.packages()
    
  • update.packages() downloads and installs newer versions of the packages identified as 'old' by the old.packages() function. Just like old.packages(), alternative or multiple repositories can be specified.
    update.packages()
    #or from alternative multiple repos
    update.packages(repos=c("http://cran.csiro.au","http://R-Forge.R-project.org"))
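
The matrix returned by old.packages() can be condensed before deciding what to update. The following is a sketch: outdated_summary() is a hypothetical helper, and the toy matrix below uses invented values so the example runs without network access.

```r
# Hypothetical helper: reduce the old.packages() result to the columns of most
# interest. old.packages() returns NULL when nothing is out of date.
outdated_summary <- function(old = old.packages()) {
  cols <- c("Package", "Installed", "ReposVer")
  if (is.null(old)) {
    return(data.frame(Package = character(0), Installed = character(0),
                      ReposVer = character(0), stringsAsFactors = FALSE))
  }
  out <- as.data.frame(old[, cols, drop = FALSE], stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

# Toy stand-in for an old.packages() result (all values are made up)
toy_old <- matrix(c("car", "/usr/lib/R/library", "2.0-1", "3.3.0", "2.1-4",
                    "http://cran.csiro.au/src/contrib"),
                  nrow = 1,
                  dimnames = list("car", c("Package", "LibPath", "Installed",
                                           "Built", "ReposVer", "Repository")))
outdated_summary(toy_old)

# By default update.packages() asks before updating each package;
# update.packages(ask = FALSE) updates everything without prompting.
```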
    

Reinstalling packages following re-installation of R

Just as development continues on packages, so too does development continue on the main base R system. Periodically, a new version of R comes out with a new set of features, performance enhancements, bug fixes and requirements. When a new stable version is released, some packages are also altered to reflect the changes. Hence, some new functionality can necessitate not only an update of a package, but also an update of the base R system.

Updating the entire system can be a lengthy and inconvenient process as not only does R need to be re-installed, typically all of the packages need to be re-installed or updated. The following steps can be used to reduce the pain by semi-automating the process and ensuring that no package is forgotten.

  1. Start R and change the current working directory to a convenient location (such as your home, desktop or downloads folder).
  2. Get a vector of all packages currently installed on your system
    packages <- installed.packages()[,"Package"]
    
  3. Get a vector of packages installed on your system that are part of the base installation and exclude them from the packages vector
    base <- installed.packages(priority="base")[,"Package"]
    packages <- packages[!packages %in% base]
    
  4. Get a vector of packages that are available across the desired repositories and compare it to your vector of packages to produce:
    • A vector of packages that will need to be re-installed from the repositories. If the contriburl= argument is omitted, then only the repositories in the current repos path (usually only CRAN) will be searched.
      rep <- available.packages(contriburl=contrib.url(c("http://cran.csiro.au",
        "http://R-Forge.R-project.org")))[,"Package"]
      toGetFromRepos <- packages[packages %in% rep]
      
    • A vector of packages that will have to be acquired and installed from other locations
      toGetFromElse <- packages[!packages %in% rep]
      toGetFromElse
      
               acepack            akima       assertthat              AUC        backports 
             "acepack"          "akima"     "assertthat"            "AUC"      "backports" 
             base64enc        bayesplot            biglm         binGroup           bitops 
           "base64enc"      "bayesplot"          "biglm"       "binGroup"         "bitops" 
                  boot             brms            broom           btergm          caTools 
                "boot"           "brms"          "broom"         "btergm"        "caTools" 
             checkmate             coda     colourpicker       commonmark             covr 
           "checkmate"           "coda"   "colourpicker"     "commonmark"           "covr" 
                crayon             curl              DBI             desc            dplyr 
              "crayon"           "curl"            "DBI"           "desc"          "dplyr" 
                    DT           dtplyr         dygraphs             ergm       ergm.count 
                  "DT"         "dtplyr"       "dygraphs"           "ergm"     "ergm.count" 
          estimability         evaluate              evd          flexmix          forcats 
        "estimability"       "evaluate"            "evd"        "flexmix"        "forcats" 
                   gam           gamlss          geepack            GERGM           GGally 
                 "gam"         "gamlss"        "geepack"          "GERGM"         "GGally" 
                 ggmap          ggplot2           glmnet         gridBase           gtable 
               "ggmap"        "ggplot2"         "glmnet"       "gridBase"         "gtable" 
                gtools            haven            highr            Hmisc              hms 
              "gtools"          "haven"          "highr"          "Hmisc"            "hms" 
                HSAUR3        htmlTable      htmlwidgets           httpuv             httr 
              "HSAUR3"      "htmlTable"    "htmlwidgets"         "httpuv"           "httr" 
                igraph           inline            irlba             jpeg         jsonlite 
              "igraph"         "inline"          "irlba"           "jpeg"       "jsonlite" 
                  KFAS            knitr         lazyeval              lfe             lme4 
                "KFAS"          "knitr"       "lazyeval"            "lfe"           "lme4" 
                lmtest              loo          lpSolve        lubridate         magrittr 
              "lmtest"            "loo"        "lpSolve"      "lubridate"       "magrittr" 
               mapdata          mapproj             maps         markdown           mclust 
             "mapdata"        "mapproj"           "maps"       "markdown"         "mclust" 
               memoise             mgcv   microbenchmark             mime           miniUI 
             "memoise"           "mgcv" "microbenchmark"           "mime"         "miniUI" 
                misc3d           mnormt           modelr       modeltools           mstate 
              "misc3d"         "mnormt"         "modelr"     "modeltools"         "mstate" 
                 muhaz        multicool          munsell            ncdf4   networkDynamic 
               "muhaz"      "multicool"        "munsell"          "ncdf4" "networkDynamic" 
                  nlme           nloptr              NMF     nycflights13          openssl 
                "nlme"         "nloptr"            "NMF"   "nycflights13"        "openssl" 
                orcutt          packrat         pbkrtest            pcaPP          permute 
              "orcutt"        "packrat"       "pbkrtest"          "pcaPP"        "permute" 
                   PKI            plogr             plyr              png            poLCA 
                 "PKI"          "plogr"           "plyr"            "png"          "poLCA" 
                praise      prettyunits         progress            proto            psych 
              "praise"    "prettyunits"       "progress"          "proto"          "psych" 
                 purrr         quadprog         quantreg               R6           raster 
               "purrr"       "quadprog"       "quantreg"             "R6"         "raster" 
               R.cache     RColorBrewer        RcppEigen     RcppParallel            RCurl 
             "R.cache"   "RColorBrewer"      "RcppEigen"   "RcppParallel"          "RCurl" 
                 readr           readxl       RefManageR         registry              rem 
               "readr"         "readxl"     "RefManageR"       "registry"            "rem" 
               reshape         reshape2              rex            rgdal            rgeos 
             "reshape"       "reshape2"            "rex"          "rgdal"          "rgeos" 
                 rjson          RJSONIO        rmarkdown      R.methodsS3           RMySQL 
               "rjson"        "RJSONIO"      "rmarkdown"    "R.methodsS3"         "RMySQL" 
                  ROCR             R.oo         roxygen2      RPostgreSQL        rprojroot 
                "ROCR"           "R.oo"       "roxygen2"    "RPostgreSQL"      "rprojroot" 
                 rrcov            R.rsp        rsconnect          RSQLite            rstan 
               "rrcov"          "R.rsp"      "rsconnect"        "RSQLite"          "rstan" 
              rstanarm       rstantools       rstudioapi            RUnit          R.utils 
            "rstanarm"     "rstantools"     "rstudioapi"          "RUnit"        "R.utils" 
                 rvest          RWiener           scales          selectr            shiny 
               "rvest"        "RWiener"         "scales"        "selectr"          "shiny" 
               shinyjs        shinystan      shinythemes           slackr              sna 
             "shinyjs"      "shinystan"    "shinythemes"         "slackr"            "sna" 
           sourcetools               sp          SparseM         speedglm          spTimer 
         "sourcetools"             "sp"        "SparseM"       "speedglm"        "spTimer" 
           StanHeaders          statmod          statnet   statnet.common          stringr 
         "StanHeaders"        "statmod"        "statnet" "statnet.common"        "stringr" 
              survival            tergm         testthat          threejs           tibble 
            "survival"          "tergm"       "testthat"        "threejs"         "tibble" 
                 tidyr        tidyverse             tnam            trust          tweedie 
               "tidyr"      "tidyverse"           "tnam"          "trust"        "tweedie" 
               viridis            withr            xergm     xergm.common              XML 
             "viridis"          "withr"          "xergm"   "xergm.common"            "XML" 
                  xml2           xtable             yaml             boot            class 
                "xml2"         "xtable"           "yaml"           "boot"          "class" 
               cluster        codetools          foreign       KernSmooth             MASS 
             "cluster"      "codetools"        "foreign"     "KernSmooth"           "MASS" 
                  mgcv             nlme             nnet            rpart          spatial 
                "mgcv"           "nlme"           "nnet"          "rpart"        "spatial" 
              survival 
            "survival" 
      
  5. Save the vector of packages to be re-installed from the repositories and the vector of packages to be installed from elsewhere
    save(toGetFromRepos, file="Rpackages")
    save(toGetFromElse, file="Otherpackages")
    
  6. Download and install the latest version of R
  7. Start R, preferably with administrator (root) privileges and make sure that the current working directory is pointing to the location that the Rpackages file was saved.
  8. Load the vectors of packages to be re-installed from the repositories and from elsewhere.
    load(file="Rpackages")
    load(file="Otherpackages")
    
  9. Loop through the vector of packages and install those that are not already on the system
    for (p in setdiff(toGetFromRepos, installed.packages()[,"Package"])) {
    install.packages(p, repos=c("http://cran.csiro.au","http://R-Forge.R-project.org"))
    }
    
  10. Manually install those packages that do not reside in repositories (the vector toGetFromElse serves as a reminder of what these packages are).

The above steps are also useful when setting up R on an additional machine. It helps minimize incompatibilities between the machines.

The above steps could also be put into a shell script to automate the process even further.
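
Short of a shell script, the bookkeeping in steps 2 to 5 and step 8 can be collected into a small R function. The following is a sketch only: split_packages() is a hypothetical helper, and the toy inputs stand in for the results of installed.packages() and available.packages() so the example runs without network access.

```r
# Hypothetical helper covering steps 2 to 4: partition installed, non-base
# packages into those found in a repository index and those that are not
split_packages <- function(installed, base, available) {
  candidates <- setdiff(installed, base)
  list(toGetFromRepos = intersect(candidates, available),
       toGetFromElse  = setdiff(candidates, available))
}

# Toy inputs; "mypkg" represents a package that lives in no repository
parts <- split_packages(installed = c("base", "stats", "car", "mypkg"),
                        base      = c("base", "stats"),
                        available = c("car", "ggplot2"))
parts

# Steps 5 and 8: save the vector, then restore it (here via a temporary file)
f <- tempfile()
toGetFromRepos <- parts$toGetFromRepos
save(toGetFromRepos, file = f)
rm(toGetFromRepos)
load(f)
toGetFromRepos
```

In real use the three arguments would be installed.packages()[,"Package"], installed.packages(priority="base")[,"Package"] and the "Package" column of available.packages().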

Creating packages

Coming soon - based on devtools

