Tutorial 17.4 - Knitr and reproducible research

18 Nov 2018

Overview

Tutorials 17.1 and 17.2 introduced two document markup languages for the preparation of PDF and HTML respectively. Tutorial 17.3 introduced the markdown language and pandoc - the universal document conversion tool.

Reproducible research is a data analysis concept that promotes publishing all analysis source code, outcomes and supporting commentary (such as a description of methodologies and interpretation of results) in such a way that others can reproduce the findings for verification. This works best when the documentation and source code are woven together into a single document. Traditionally, this involved substantial 'cutting and pasting' from statistical software into document authoring tools such as LaTeX, HTML or Microsoft Word. Any minor change to the analyses then required replacing the code in the document, along with any affected figures or tables. Keeping everything synchronized was a bit of a battle.

This is where packages like knitr come in. Knitr evaluates blocks of code within a document and converts both the code and its output into the same format as the surrounding document (e.g. LaTeX or HTML). This scheme greatly facilitates reproducible research by allowing the document and all source code to be contained in a single file or a set of related files.

How it works

Within the surrounding document, code blocks (chunks) are delimited by language-specific tag pairs:

  • for LaTeX
     <<>>=
     ...
     @
    
  • for HTML
     <!--begin.rcode
     ...
     end.rcode -->
    

For a first example, we will add a single simple code block to minimal LaTeX and HTML documents.

LaTeX

The workflow consists of the source document/code (in this case a text file called min.Rtex), an R session in which the knit() function from the knitr package is used to 'knit' the code blocks into the surrounding document format, and finally compilation into PDF via xelatex (or pdflatex).

LaTeX code (min.Rtex)
  \documentclass[a4paper,12pt]{article}
  \begin{document}
  \section{A section}\label{sec:s1}
  This is a minimum \LaTeX~document with embedded
  R code.

  <<Summary>>=
  x = rnorm(10)
  summary(x)
  @
  \end{document}

Within R
  library(knitr)
  knit('min.Rtex', 'min.tex')
Command line
  xelatex min.tex
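
The knit step can also be tried without creating any files, which is handy for experimenting. The following is a minimal sketch, assuming the knitr package is installed; it passes the document to the `text` argument of `knit()`, in which case the knitted LaTeX is returned as a character string instead of being written to disk. The chunk label `Sum` and the variable names are illustrative only.

```r
library(knitr)

# A minimal LaTeX document containing one chunk, supplied as lines of text
src <- c(
  "\\documentclass[a4paper,12pt]{article}",
  "\\begin{document}",
  "This is a minimum \\LaTeX~document with embedded R code.",
  "",
  "<<Sum>>=",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "@",
  "\\end{document}"
)

# knit() recognises the <<>>= ... @ chunk syntax, evaluates the chunk and
# returns the resulting LaTeX source as a character string
tex <- knit(text = src, quiet = TRUE)

# Inspect the knitted document: the chunk has been replaced by code and output
cat(tex)
```

Writing the result to a file, for example with `writeLines(tex, 'min.tex')`, and then running `xelatex min.tex` would complete the workflow described above.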

HTML

The workflow consists of the source document/code (in this case a text file called min.Rhtml), and an R session in which the knit() function from the knitr package is used to 'knit' the code blocks into the surrounding document format.

HTML code (min.Rhtml)
  <!DOCTYPE html>
  <html>
  <head>
  <title> Simple document </title>
  </head>
  <body>
  <h1>A Section</h1>
  This is a minimum HTML document with
  embedded R code
  
  <!--begin.rcode Summary1
  x = rnorm(10)
  summary(x)
  end.rcode-->
  </body>
  </html>
Within R
  library(knitr)
  knit('min.Rhtml', 'min.html')
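
As a file-free variant of the HTML workflow, the sketch below (assuming knitr is installed) knits an HTML document held in a character vector and then writes the result to a temporary file. The label `Sum1`, the object names and the use of `tempdir()` are illustrative choices, not part of the workflow above.

```r
library(knitr)

# A minimal HTML document containing one chunk, supplied as lines of text
src <- c(
  "<!DOCTYPE html>",
  "<html>",
  "<head>",
  "<title> Simple document </title>",
  "</head>",
  "<body>",
  "<h1>A Section</h1>",
  "<!--begin.rcode Sum1",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "end.rcode-->",
  "</body>",
  "</html>"
)

# knit() recognises the begin.rcode / end.rcode comment syntax and returns
# the knitted HTML as a character string
html <- knit(text = src, quiet = TRUE)

# Save the result so that it can be opened in a browser
out <- file.path(tempdir(), "min.html")
writeLines(html, out)
```

The file at `out` can then be opened in any web browser; no separate compile step is needed for HTML.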

knitr options

In the examples above, a single option was provided as a knitr 'chunk' argument. This option was the chunk label, which provides a name for the chunk (chunks can refer to other chunks by label). Other common options are listed below (for a full list of options, visit the knitr chunk options website).

Each entry gives the option, whether it applies to LaTeX and HTML output, and its description.

Text output
echo (LaTeX: Yes, HTML: Yes) either:
  • TRUE or FALSE (whether the output should include the chunk code)
  • a vector of numbers (indicating which expressions of the code to include in the output)
eval (LaTeX: Yes, HTML: Yes) either:
  • TRUE or FALSE (whether to evaluate the code)
  • a vector of numbers (indicating which expressions of the code should be evaluated)
results (LaTeX: Yes, HTML: Yes) one of:
  • markup - output in the format of the surrounding document
  • asis - write the output as is (raw), without wrapping it in markup
  • hold - hold all output until the end of the chunk
  • hide - hide the output (but not warnings or errors)
warning (LaTeX: Yes, HTML: Yes) either:
  • TRUE or FALSE (whether to include warnings)
  • a vector of numbers (indicating which warnings to include)
error (LaTeX: Yes, HTML: Yes) either:
  • TRUE or FALSE (whether to include errors)
  • a vector of numbers (indicating which errors to include)
message (LaTeX: Yes, HTML: Yes) either:
  • TRUE or FALSE (whether to include messages)
  • a vector of numbers (indicating which messages to include)

Code decoration
tidy (LaTeX: Yes, HTML: Yes) either:
  • TRUE or FALSE (whether to reformat the code)
  • 'styler' to use the styler package for reformatting
tidy.opts (LaTeX: Yes, HTML: Yes) a list of options passed on to the tidying function. For example, tidy.opts = list(width.cutoff = 60) asks for the reformatted R code to be wrapped at about 60 characters.
prompt (LaTeX: Yes, HTML: Yes) TRUE or FALSE (whether to include R prompts in the echoed code)
comment (LaTeX: Yes, HTML: Yes) the comment character used before output (defaults to ##)
highlight (LaTeX: Yes, HTML: Yes) TRUE or FALSE (whether to apply syntax highlighting to the code)
size (LaTeX: Yes, HTML: No) the font size for code and output
background (LaTeX: Yes, HTML: No) the color of the code and output background
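
To see how a few of these options behave, the following sketch (assuming knitr is installed) knits three HTML-style chunks, each with a label followed by one option. The labels `code_hidden`, `output_hidden` and `not_run` are made up for illustration.

```r
library(knitr)

# Options follow the chunk label, separated by commas
src <- c(
  "<!--begin.rcode code_hidden, echo=FALSE",
  "1 + 1",
  "end.rcode-->",
  "<!--begin.rcode output_hidden, results='hide'",
  "2 + 3",
  "end.rcode-->",
  "<!--begin.rcode not_run, eval=FALSE",
  "10 * 10",
  "end.rcode-->"
)

html <- knit(text = src, quiet = TRUE)

# code_hidden:   output shown, code omitted
# output_hidden: code shown, output omitted
# not_run:       code shown but never evaluated, so there is no output
cat(html)
```

To apply an option to every chunk rather than one chunk at a time, knitr also provides `opts_chunk$set()`, for example `opts_chunk$set(echo = FALSE)`, which is typically called in an early chunk of the document.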