Simon says:

learning is hard: remember all the hours you spent learning biology concepts;
be patient and practise: remember that your experiments rarely succeed on the first attempt;
use fresh eyes: do not try to compare (yes, there are similarities with Excel and other tools); approach R the way you would learn a foreign language or culture;
be organised: apply your experimental methodologies to your computational analysis; track your work with a digital lab notebook, e.g., with RMarkdown files. ;-)

In this introduction to R, we will present the main R concepts, by introducing R variables, vectors, matrices, lists, functions and packages.

Please read http://adv-r.had.co.nz/Style.html.

Variable

In R, we have two main types of values: numeric values and characters. Numerics:

12

## [1] 12

3.14

## [1] 3.14

-42.12

## [1] -42.12

and characters:

"text"

## [1] "text"

"Rock and Roll"

## [1] "Rock and Roll"

Characters, also called strings, are written between double quotes " (on a French keyboard: key 3).
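To check which of the two types a value has, one option is the built-in class() function. The lines below are only a small illustration of that idea:

```r
# class() reports the type of a value
class(3.14)     # "numeric"
class("text")   # "character"

# The quotes make the difference: "12" is text, not a number
class(12)       # "numeric"
class("12")     # "character"
```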

We can save these values in variables,

a <- 3.14

so we can easily access them later or re-use them:

a + a

## [1] 6.28

a

## [1] 3.14

We can compute, save the value and display it:

result <- 2*a + 3 * pi
result

## [1] 15.70478

Exercises

Assign a number of your choice to the variable b
Show what’s in b
Multiply a by 2
Add 10 to a
Assign your name to an object called name

Vector / List

A vector contains several values:

b <- 1:10

We can concatenate these values with the c() function:

c(b, b)

##  [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10

We can also apply mathematical formulas to them:

b + b

##  [1]  2  4  6  8 10 12 14 16 18 20

3 * b

##  [1]  3  6  9 12 15 18 21 24 27 30
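These operations work element by element. A minimal sketch with two short vectors (v and w are names made up for this illustration) makes that easier to see:

```r
# Two small example vectors (hypothetical names and values)
v <- c(1, 2, 3)
w <- c(10, 20, 30)

v + w   # 11 22 33 : first with first, second with second, ...
v * w   # 10 40 90
2 * v   # 2 4 6    : the single number is applied to every element
```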

Exercises

Concatenate a and b
Concatenate your name and a
Add a to b

Built-in Function

Many functions exist in R, and greatly simplify our life.

How do we compute the mean of the variable b? The naive way:

(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)/10

## [1] 5.5

Now, what happens if b <- 5:17? We would have to rewrite the formula by hand, which is neither convenient nor robust. R has built-in functions to easily compute common quantities.

For example, the mean is computed by:

mean(b)

## [1] 5.5
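The benefit is that the same call keeps working when the data change. As a small sketch, using new variable names (b2 and b3 are assumptions for this example, so that b itself is left untouched):

```r
# The same function call works whatever the vector contains
b2 <- 5:17
mean(b2)   # 11

b3 <- c(2, 4, 9)
mean(b3)   # 5
```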

Moreover, we can combine functions with mathematical operations (which are also functions).

m <- mean(b + b)
m

## [1] 11

What about the median? Easy:

median(b)

## [1] 5.5

And the number of elements (the length of the vector/list):

length(b)

## [1] 10

length(b + b)

## [1] 10

length(c(b, b))

## [1] 20

The concatenation function c() is also a built-in function.

Let us sum all the elements, then find the minimum and the maximum:

sum(b)

## [1] 55

min(b)

## [1] 1

max(b)

## [1] 10

Or print a statistical summary:

summary(b)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    3.25    5.50    5.50    7.75   10.00
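The quartiles shown in this summary can also be requested on their own. One possible way, sketched here with the built-in quantile() function and its default settings:

```r
b <- 1:10

# Ask only for the first and third quartiles
q <- quantile(b, probs = c(0.25, 0.75))
q   # 3.25 and 7.75, labelled 25% and 75%
```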

List the first 6 elements:

head(b)

## [1] 1 2 3 4 5 6

or the last 6:

tail(b)

## [1]  5  6  7  8  9 10

or the last 4:

tail(b, n=4)

## [1]  7  8  9 10

Be careful: R does not warn us even when we ask for something odd, such as more elements than the vector contains:

head(b, n=20)

##  [1]  1  2  3  4  5  6  7  8  9 10
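Since no warning is raised, it can be worth checking the size of the result ourselves. A minimal sketch (the variable name first is an assumption for this example):

```r
b <- 1:10

# Ask for 20 elements although b only has 10
first <- head(b, n = 20)

length(first)         # 10, not 20
identical(first, b)   # TRUE: we simply got the whole vector back
```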

We can read help using:

?tail

or the Help facilities of RStudio.

Exercises

Create the vector 13, 13, 13, 13, 14, 14, 16, 18, 21 and assign it to a variable:
Compute the median:
Compute the mean:
Is it normal? ;-)
Compute the length:
Sum all the values:
Compute the mean using the previous sum and length

Block (chunk) with comment

We sometimes want to insert a comment to remind ourselves of the purpose of an action.

A comment starts with the hash symbol #.

# this is a comment

and a comment does nothing when evaluated.

It is really useful to explain several lines:

# Assign the vector of integers from 10 to 25 to the variable named var
var <- 10:25

# Select first elements
head(var)

## [1] 10 11 12 13 14 15

var + var # returns a new vector

##  [1] 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50

# Evaluate the length of the concatenated vector
length(c(var, var))

## [1] 32

Comments do not appear in the output block. However, they do appear in the final document (see this doc).

Matrices / Table

Matrices can be compared to tables: they are two-dimensional and can contain numbers, characters, etc. (all the elements of a given matrix share the same type).

Let us load a matrix:

load("cyto.RData")
cyto

##             FSC-A     SSC-A      CD45      CD19      CD34      CD56
## cell_1   96985.98  34853.85 160.62489 111.52325  60.59250  75.91315
## cell_2   88143.30  22730.01 146.89993 109.37424  60.74403  66.13155
## cell_3   92953.35  31651.34  46.90078  64.03857  76.36395  82.21494
## cell_4   78595.02  51510.99 176.39761  63.25624  64.01659 114.98014
## cell_5  109199.79  60020.60 159.90495  63.29745  87.70644  88.29176
## cell_6  136586.52 130294.78 162.73160  59.74005  88.22612 140.49802
## cell_7   86250.78  25582.85  14.60138  63.21502  78.34427  65.37292
## cell_8    6391.98   4621.63  99.39405  58.45350  55.43047  65.70305
## cell_9   16286.76   7108.74 101.62800  67.55399 148.54910  77.11388
## cell_10  91409.22  76663.14 195.31946  59.94742  79.83633  80.57721

We can check the number of rows and columns in the cyto object:

nrow(cyto)

## [1] 10

ncol(cyto)

## [1] 6

dim(cyto)

## [1] 10  6
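If the cyto.RData file is not at hand, the same three functions can be tried on a small matrix built by hand. This is only a toy sketch; the matrix m and its values are made up:

```r
# A toy matrix with 2 rows and 3 columns, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
m

nrow(m)   # 2
ncol(m)   # 3
dim(m)    # 2 3 : rows first, then columns
```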

Load data from a CSV file and show a summary:

df <- read.csv2(file='example.csv', header=TRUE, dec=",", sep=";")
dim(df)

## [1] 150   5

summary(df)

##       Var1            Var2            Var3            Var4      
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##          Pop    
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
##
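The arguments dec="," and sep=";" describe a CSV file that uses a comma as decimal mark and a semicolon between columns. The sketch below writes such a file to a temporary location and reads it back; the toy data frame and the names toy, tmp and toy_back are assumptions for this example, not the content of example.csv:

```r
# Toy data (made-up values)
toy <- data.frame(Var1 = c(5.1, 4.9, 6.3),
                  Pop  = c("setosa", "setosa", "virginica"))

# write.csv2() uses ";" as separator and "," as decimal mark
tmp <- tempfile(fileext = ".csv")
write.csv2(toy, file = tmp, row.names = FALSE)
readLines(tmp)   # look at the raw text of the file

# read.csv2() expects the same conventions by default
toy_back <- read.csv2(tmp)
dim(toy_back)    # 3 2
summary(toy_back)

# Remove the temporary file
unlink(tmp)
```

Reading the same file with read.csv() instead would need sep and dec to be set explicitly, since its defaults are a comma separator and a dot as decimal mark.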

Indexing

We can display the element in the first row and the first column, or the 2nd row and 3rd column.

cyto[1, 1]

## [1] 96985.98

cyto[2, 3]

## [1] 146.8999

The indexing convention is: row, column.

All the elements of a specific column of a matrix can be selected by its number, here the 3rd column:

fcs_a <- cyto[, 3]
fcs_a

##    cell_1    cell_2    cell_3    cell_4    cell_5    cell_6    cell_7 
## 160.62489 146.89993  46.90078 176.39761 159.90495 162.73160  14.60138 
##    cell_8    cell_9   cell_10 
##  99.39405 101.62800 195.31946

How do we concatenate two columns?

# 1. select column 2 and column 5
# 2. concatenate them
# 3. store in the variable
fcs_b <- c(cyto[, 2], cyto[, 5])
fcs_b

##       cell_1       cell_2       cell_3       cell_4       cell_5 
##  34853.85156  22730.00977  31651.34180  51510.99219  60020.60156 
##       cell_6       cell_7       cell_8       cell_9      cell_10 
## 130294.78125  25582.85156   4621.62988   7108.74023  76663.14062 
##       cell_1       cell_2       cell_3       cell_4       cell_5 
##     60.59250     60.74403     76.36395     64.01659     87.70644 
##       cell_6       cell_7       cell_8       cell_9      cell_10 
##     88.22612     78.34427     55.43047    148.54910     79.83633

# 1. concatenate the numbers 2 and 5
# 2. select the columns 2 and 5
# 3. store
fcs_c <- cyto[, c(2, 5)]
fcs_c

##             SSC-A      CD34
## cell_1   34853.85  60.59250
## cell_2   22730.01  60.74403
## cell_3   31651.34  76.36395
## cell_4   51510.99  64.01659
## cell_5   60020.60  87.70644
## cell_6  130294.78  88.22612
## cell_7   25582.85  78.34427
## cell_8    4621.63  55.43047
## cell_9    7108.74 148.54910
## cell_10  76663.14  79.83633

# Do you see the difference ?
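One way to compare the two results is to look at their dimensions. A small sketch on a toy matrix (m, as_vector and as_matrix are names made up for this example):

```r
# Toy matrix with 2 rows and 3 columns
m <- matrix(1:6, nrow = 2, ncol = 3)

# Concatenating two columns gives one long vector
as_vector <- c(m[, 1], m[, 3])
length(as_vector)   # 4
dim(as_vector)      # NULL: a vector has no rows or columns

# Selecting two columns at once keeps the table shape
as_matrix <- m[, c(1, 3)]
dim(as_matrix)      # 2 2
```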

A column can also be selected by typing its name directly:

cd34 <- cyto[, "CD34"]
cd34

##    cell_1    cell_2    cell_3    cell_4    cell_5    cell_6    cell_7 
##  60.59250  60.74403  76.36395  64.01659  87.70644  88.22612  78.34427 
##    cell_8    cell_9   cell_10 
##  55.43047 148.54910  79.83633

cd34_b <- cyto[, 5]
cd34_b

##    cell_1    cell_2    cell_3    cell_4    cell_5    cell_6    cell_7 
##  60.59250  60.74403  76.36395  64.01659  87.70644  88.22612  78.34427 
##    cell_8    cell_9   cell_10 
##  55.43047 148.54910  79.83633

The same works for selecting several columns:

cd34_cd45 <- cyto[, c("CD34", "CD45")]
cd34_cd45

##              CD34      CD45
## cell_1   60.59250 160.62489
## cell_2   60.74403 146.89993
## cell_3   76.36395  46.90078
## cell_4   64.01659 176.39761
## cell_5   87.70644 159.90495
## cell_6   88.22612 162.73160
## cell_7   78.34427  14.60138
## cell_8   55.43047  99.39405
## cell_9  148.54910 101.62800
## cell_10  79.83633 195.31946

However, you cannot mix numbers and names to select columns: c() converts the number 1 to the character string "1", which is not a column name. This chunk returns an error:

# cyto[, c(1, "CD34")]
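To see what goes wrong without stopping the script, we can inspect the index itself and catch the error. This is a sketch on a toy matrix with made-up values; idx, m and result are names chosen for this example:

```r
# Mixing a number and a name gives a character vector
idx <- c(1, "CD34")
idx          # "1" "CD34"
class(idx)   # "character"

# Toy matrix with two named columns (made-up values)
m <- matrix(1:4, nrow = 2,
            dimnames = list(NULL, c("FSC-A", "CD34")))

# tryCatch() lets us report the error instead of stopping
result <- tryCatch(m[, idx], error = function(e) "error")
result       # "error": there is no column named "1"
```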

Conversely, we can select a specific row, here the second:

cell2 <- cyto[2, ]
cell2

##       FSC-A       SSC-A        CD45        CD19        CD34        CD56 
## 88143.29688 22730.00977   146.89993   109.37424    60.74403    66.13155

Important to remember

Names are important.
Spaces in column names are hell.

Good names for variables, columns, files, etc. make the analysis much easier, especially when things do not go as expected.
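As an illustration of why spaces and special characters are painful, the sketch below shows how R rewrites such names with the built-in make.names() function, and the backticks needed when the original name is kept. The data frame df_toy and the column name "my column" are made up for this example:

```r
# R replaces characters that are not allowed in a name by a dot
make.names("FSC-A")       # "FSC.A"
make.names("my column")   # "my.column"

# By default, data.frame() applies the same cleaning to column names
names(data.frame(`my column` = 1:3))   # "my.column"

# Keeping the original name is possible, but then backticks are required
df_toy <- data.frame(`my column` = 1:3, check.names = FALSE)
df_toy$`my column`
```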

Exercises

Show the first 2 rows of the cyto matrix
Show the FSC-A value of the first cell

More on lists

Lists in R can be compared to vectors, except that a list can contain objects of different types:

random_numbers <- c(1, 2, 3, 7)
random_letters <- c("A", "B", "C", "Z")

The variables random_numbers and random_letters are two vectors containing 4 numbers and 4 letters, respectively.

These two vectors can be combined to create a list:

my_list_b <- list(random_letters, random_numbers)
my_list_b

## [[1]]
## [1] "A" "B" "C" "Z"
## 
## [[2]]
## [1] 1 2 3 7

The variable my_list_b is a list containing two items. These items can be named:

my_list <- list(my_numbers = random_numbers,
                my_letters = random_letters)
my_list

## $my_numbers
## [1] 1 2 3 7
## 
## $my_letters
## [1] "A" "B" "C" "Z"

The different items of a list can then be accessed by using the $ operator followed by their name:

my_list$my_numbers

## [1] 1 2 3 7
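The $ operator is not the only way in. As a short sketch, the items can also be reached with double square brackets, either by name or by position (the list is recreated here so that the example runs on its own):

```r
my_list <- list(my_numbers = c(1, 2, 3, 7),
                my_letters = c("A", "B", "C", "Z"))

names(my_list)            # "my_numbers" "my_letters"
length(my_list)           # 2 items

my_list[["my_letters"]]   # same result as my_list$my_letters
my_list[[1]]              # first item, selected by position
```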

Exercises

Create a list containing your name and your job:

Packages

Packages contain sets of useful functions, which can be installed, loaded, and then used in R. A package needs to be installed only once on your computer:

#install.packages("Rtsne")

But it needs to be loaded in every new R session in which you wish to use it:

#library(Rtsne)

The functions of an installed package can then be used. They often need the user to set parameters:

#tsne <- Rtsne(cyto)

This line would generate an error: we need to lower the perplexity because we only have 10 cells.

#tsne <- Rtsne(cyto, perplexity = 2)
#tsne$Y
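Before loading a package, it can be handy to check whether it is installed at all. One possible pattern, sketched with the built-in requireNamespace() function (the variable name has_rtsne is an assumption for this example):

```r
# TRUE if the package is installed, FALSE otherwise (no error is raised)
has_rtsne <- requireNamespace("Rtsne", quietly = TRUE)

if (has_rtsne) {
  library(Rtsne)
} else {
  message("Rtsne is not installed; run install.packages(\"Rtsne\") first.")
}
```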

You can always ask R for help if you wish to know how a function works:

#?Rtsne

Define your own function

It is easy to define new functions. For example, I can write my own function to compute the mean:

moyenne <- function(lst) {
  sumall <- sum(lst)
  len <- length(lst)
  moy <- sumall / len
  moy
}

Then I can use it:

onelist <- 1:10
moyenne(onelist)

## [1] 5.5

and the result is the same as mean(onelist), i.e., 5.5.
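Like the built-in functions, our own functions can take parameters with default values, as tail() does with n. A minimal sketch, assuming a hypothetical variant named moyenne_arrondie that rounds the result:

```r
# Hypothetical variant of moyenne(): the mean, rounded to `digits` decimals
moyenne_arrondie <- function(lst, digits = 2) {
  moy <- sum(lst) / length(lst)
  round(moy, digits)
}

moyenne_arrondie(c(1, 2, 2))               # 1.67 (default: 2 decimals)
moyenne_arrondie(c(1, 2, 2), digits = 0)   # 2
```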

It is beyond the scope of this presentation, but R is powerful: you can write loops over the elements of a vector, test values, etc.

moyenne.absolute <- function(lst) {
  # init temporary variables
  nelem <- 0
  tot <- 0
  # loop over all the elements of the list
  for (val in lst) {
    if (val > 0 ) {
      tot <- tot + val
    } else {
      tot <- tot - val
    }
    nelem <- nelem + 1
  }
  # compute something
  moy <- tot / nelem
  # return it
  moy
}

This function computes mean(abs(lst)). Let us check that:

one <- 1:10
onelist <- c(-one, 2*one)

moyenne.absolute(onelist) == mean(abs(onelist))

## [1] TRUE
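The comparison with == works here because all the values are whole numbers. With decimal numbers, two results that should be equal can differ by a tiny rounding error, so a more tolerant check such as the built-in all.equal() may be safer. A small sketch:

```r
# Exact comparison can fail because of rounding in decimal arithmetic
0.1 + 0.2 == 0.3                    # FALSE

# all.equal() compares with a small tolerance
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```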

If this section is not clear to you, do not worry. We will come back to it later.

RMarkdown summary

list starts with -
italics is *italics*
bold is **bold**
code is `code`
equation $E=mc^2 + \sum x_i$ is $E=mc^2 + \sum x_i$
section starts with #, subsection with ##, subsubsection with ###, etc.
hyperlink is [hyperlink](https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf)

Subsection talking about table

You can also pretty print, e.g., a table:

Legend of the cyto table
	CD45	CD34	CD56
cell_3	46.90078	76.36395	82.21494
cell_4	176.39761	64.01659	114.98014
cell_5	159.90495	87.70644	88.29176
cell_6	162.73160	88.22612	140.49802
cell_7	14.60138	78.34427	65.37292

Subsubsection talking about figure

My nice caption of my nice figure
