Simon says:

- **learning is hard**: remember all the hours spent learning biology concepts;
- **be patient and practise**: remember that your experiments rarely succeed on the first attempt;
- **use fresh eyes**: do not try to compare (yes, there are similarities with Excel and other tools); adopt the process of learning a foreign language or culture;
- **be organised**: apply your experimental methodologies to your computational analysis; track your work with a *digital* **lab notebook**, e.g., with R Markdown files. ;-)

In this introduction to R, we will present the main R concepts, by introducing R variables, vectors, matrices, lists, functions and packages.

Please read http://adv-r.had.co.nz/Style.html.

In R, we have two main types of values: numeric values and characters. Numerics:

`12`

`## [1] 12`

`3.14`

`## [1] 3.14`

`-42.12`

`## [1] -42.12`

and characters:

`"text"`

`## [1] "text"`

`"Rock and Roll"`

`## [1] "Rock and Roll"`

Characters, also called strings, are enclosed in double quotes `"` (French keyboard: key 3).
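To check which type a value has, one option is the built-in `class()` function. A minimal sketch (the variable names `x` and `s` are only illustrative):

```r
# Illustrative variable names, not part of the course material
x <- 3.14
s <- "Rock and Roll"

class(x)   # "numeric"
class(s)   # "character"

# Single quotes also delimit strings; both spellings give the same value
identical('text', "text")

# nchar() counts the characters in a string
nchar(s)
```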

We can save these values in variables,

`a <- 3.14`

so we can easily access them later or re-use them:

`a + a`

`## [1] 6.28`

`a`

`## [1] 3.14`

We can compute, save the value and display it:

```
result <- 2*a + 3 * pi
result
```

`## [1] 15.70478`

- Assign a number of your choice to the variable `b`
- Show what's in `b`
- Multiply `a` by `2`
- Add `10` to `a`
- Assign your name to an object called `name`

A vector contains several values:

`b <- 1:10`

We can concatenate these values with the `c()` function:

`c(b, b)`

`## [1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10`

We can also apply mathematical formulas to them:

`b + b`

`## [1] 2 4 6 8 10 12 14 16 18 20`

`3 * b`

`## [1] 3 6 9 12 15 18 21 24 27 30`
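Vectors do not have to be consecutive integers. As a small sketch (the names `v` and `w` are illustrative), `c()` and `seq()` are two other common ways to build them, and arithmetic is still applied element by element:

```r
v <- c(2, 4, 8)                       # explicit values
w <- seq(from = 0, to = 1, by = 0.5)  # 0.0, 0.5, 1.0

v + w      # element-wise sum
v * w      # element-wise product
v[2]       # second element (indexing starts at 1)
```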

- Concatenate `a` and `b`
- Concatenate your name and `a`
- Add `a` to `b`

Many functions exist in R, and they greatly simplify our lives.

How do we compute the mean of the variable `b`? The naive way:

`(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)/10`

`## [1] 5.5`

And now, what happens if `b <- 5:17`? We have to modify the formula by hand, which is not powerful enough. R has built-in functions to easily compute common quantities.

For example, the mean is computed by:

`mean(b)`

`## [1] 5.5`
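The benefit of the built-in function is that the same call keeps working when the vector changes. A short sketch, using a hypothetical second vector `b2` so that `b` stays untouched:

```r
b2 <- 5:17                 # 13 values, from 5 to 17
mean(b2)                   # no formula to rewrite
sum(b2) / length(b2)       # the same computation spelled out
```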

Moreover, we can compose functions with mathematical operations (which are also functions).

```
m <- mean(b + b)
m
```

`## [1] 11`

What about the median? Easy:

`median(b)`

`## [1] 5.5`

And the number of elements (the length of the vector/list):

`length(b)`

`## [1] 10`

`length(b + b)`

`## [1] 10`

`length(c(b, b))`

`## [1] 20`

The concatenation `c()` is a built-in function.

Let's sum all the elements, and find the minimum and the maximum:

`sum(b)`

`## [1] 55`

`min(b)`

`## [1] 1`

`max(b)`

`## [1] 10`

Or print a statistical summary:

`summary(b)`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 3.25 5.50 5.50 7.75 10.00
```
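The individual numbers of this summary can also be obtained one by one. A minimal sketch, redefining `b` so the block runs on its own:

```r
b <- 1:10
quantile(b, probs = c(0.25, 0.5, 0.75))  # 1st quartile, median, 3rd quartile
range(b)                                  # minimum and maximum together
```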

List the first 6 elements:

`head(b)`

`## [1] 1 2 3 4 5 6`

or the last 6:

`tail(b)`

`## [1] 5 6 7 8 9 10`

or the last 4:

`tail(b, n=4)`

`## [1] 7 8 9 10`

Be careful: R does not warn even when we ask for something *weird*, such as more elements than the vector contains:

`head(b, n=20)`

`## [1] 1 2 3 4 5 6 7 8 9 10`
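If a silent truncation like this could hide a mistake in your analysis, you can add your own check. A possible sketch (the variable `n_wanted` is illustrative):

```r
b <- 1:10
n_wanted <- 20

first <- head(b, n = n_wanted)
length(first)                    # 10, not 20: head() returned what it could

# An explicit check turns the silent case into a visible message
if (length(b) < n_wanted) {
  message("Only ", length(b), " elements available, asked for ", n_wanted)
}
```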

We can read help using:

`?tail`

or the Help facilities of RStudio.

Create the vector 13, 13, 13, 13, 14, 14, 16, 18, 21 and assign it to a variable:

Compute the median:

Compute the mean:

Is it normal? ;-)

Compute the length:

Sum all the values:

Compute the mean using the previous sum and length

We sometimes want to insert a comment as a reminder of the purpose of an action.

A comment starts with the hash symbol `#`.

`# this is a comment`

and a comment does nothing when evaluated.

It is really useful to explain several lines:

```
# Assign the integers from 10 to 25 to the variable named var
var <- 10:25
# Select first elements
head(var)
```

`## [1] 10 11 12 13 14 15`

`var + var # returns a new vector`

`## [1] 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50`

```
# Evaluate the length of the concatenated vector
length(c(var, var))
```

`## [1] 32`

The comments do not appear in the Output block. However, they appear in the final document (see this doc).

Matrices can be compared to tables: they are two-dimensional and can contain numbers or characters (all elements of a matrix share the same type).

Let's load a matrix:

```
load("cyto.RData")
cyto
```

```
## FSC-A SSC-A CD45 CD19 CD34 CD56
## cell_1 96985.98 34853.85 160.62489 111.52325 60.59250 75.91315
## cell_2 88143.30 22730.01 146.89993 109.37424 60.74403 66.13155
## cell_3 92953.35 31651.34 46.90078 64.03857 76.36395 82.21494
## cell_4 78595.02 51510.99 176.39761 63.25624 64.01659 114.98014
## cell_5 109199.79 60020.60 159.90495 63.29745 87.70644 88.29176
## cell_6 136586.52 130294.78 162.73160 59.74005 88.22612 140.49802
## cell_7 86250.78 25582.85 14.60138 63.21502 78.34427 65.37292
## cell_8 6391.98 4621.63 99.39405 58.45350 55.43047 65.70305
## cell_9 16286.76 7108.74 101.62800 67.55399 148.54910 77.11388
## cell_10 91409.22 76663.14 195.31946 59.94742 79.83633 80.57721
```

We can check the number of rows and columns in the cyto object:

`nrow(cyto)`

`## [1] 10`

`ncol(cyto)`

`## [1] 6`

`dim(cyto)`

`## [1] 10 6`
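If you do not have the `cyto.RData` file at hand, a small matrix can be built by hand to try these functions. The sketch below uses made-up values and hypothetical names (`toy`, `marker_A`, `marker_B`); it is not the course dataset:

```r
# Made-up values, for illustration only
toy <- matrix(c(10, 20, 30,
                1.5, 2.5, 3.5),
              nrow = 3, ncol = 2,
              dimnames = list(c("cell_1", "cell_2", "cell_3"),
                              c("marker_A", "marker_B")))
toy
nrow(toy)   # 3
ncol(toy)   # 2
dim(toy)    # 3 2
```

Note that `matrix()` fills the values column by column, so the first three numbers end up in `marker_A`.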

Load data from a CSV file and show a summary:

```
df <- read.csv2(file='example.csv', header=TRUE, dec=",", sep=";")
dim(df)
```

`## [1] 150 5`

`summary(df)`

```
## Var1 Var2 Var3 Var4
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Pop
## setosa :50
## versicolor:50
## virginica :50
##
##
##
```
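To experiment with `read.csv2()` without the `example.csv` file, one option is to write a tiny table to a temporary file and read it back. This is only a sketch with made-up data; `read.csv2()` expects `;` as separator and `,` as decimal mark by default, which is what `write.csv2()` produces:

```r
# Made-up data, for illustration only
small <- data.frame(Var1 = c(5.1, 4.9, 6.3),
                    Pop  = c("setosa", "setosa", "virginica"))

tmp <- tempfile(fileext = ".csv")
write.csv2(small, file = tmp, row.names = FALSE)

readLines(tmp)            # shows the ';' separators and ',' decimals
df_small <- read.csv2(tmp, header = TRUE)
dim(df_small)             # 3 rows, 2 columns
summary(df_small$Var1)
```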

We can display the element in the first row and the first column, or the 2nd row and 3rd column.

`cyto[1, 1]`

`## [1] 96985.98`

`cyto[2, 3]`

`## [1] 146.8999`

The indexing convention is: row, column.

All the elements of a specific column of a matrix can be selected by its number, here the 3rd column:

```
fcs_a <- cyto[, 3]
fcs_a
```

```
## cell_1 cell_2 cell_3 cell_4 cell_5 cell_6 cell_7
## 160.62489 146.89993 46.90078 176.39761 159.90495 162.73160 14.60138
## cell_8 cell_9 cell_10
## 99.39405 101.62800 195.31946
```

How do we concatenate two columns?

```
# 1. select column 2 and column 5
# 2. concatenate them
# 3. store in the variable
fcs_b <- c(cyto[, 2], cyto[, 5])
fcs_b
```

```
## cell_1 cell_2 cell_3 cell_4 cell_5
## 34853.85156 22730.00977 31651.34180 51510.99219 60020.60156
## cell_6 cell_7 cell_8 cell_9 cell_10
## 130294.78125 25582.85156 4621.62988 7108.74023 76663.14062
## cell_1 cell_2 cell_3 cell_4 cell_5
## 60.59250 60.74403 76.36395 64.01659 87.70644
## cell_6 cell_7 cell_8 cell_9 cell_10
## 88.22612 78.34427 55.43047 148.54910 79.83633
```

```
# 1. concatenate the numbers 2 and 5
# 2. select the columns 2 and 5
# 3. store
fcs_c <- cyto[, c(2, 5)]
fcs_c
```

```
## SSC-A CD34
## cell_1 34853.85 60.59250
## cell_2 22730.01 60.74403
## cell_3 31651.34 76.36395
## cell_4 51510.99 64.01659
## cell_5 60020.60 87.70644
## cell_6 130294.78 88.22612
## cell_7 25582.85 78.34427
## cell_8 4621.63 55.43047
## cell_9 7108.74 148.54910
## cell_10 76663.14 79.83633
```

`# Do you see the difference ?`
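The difference is in the shape of the result: `c()` flattens everything into one long vector, while indexing with `c(2, 5)` keeps a two-column matrix. A sketch on a small made-up matrix (`toy` is a hypothetical stand-in for `cyto`):

```r
toy <- matrix(1:12, nrow = 2,
              dimnames = list(c("cell_1", "cell_2"),
                              c("A", "B", "C", "D", "E", "F")))

flat <- c(toy[, 2], toy[, 5])   # one vector of 4 values
sub  <- toy[, c(2, 5)]          # a 2 x 2 matrix

is.matrix(flat)   # FALSE
length(flat)      # 4
is.matrix(sub)    # TRUE
dim(sub)          # 2 2
```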

A column can also be selected by its name:

```
cd34 <- cyto[, "CD34"]
cd34
```

```
## cell_1 cell_2 cell_3 cell_4 cell_5 cell_6 cell_7
## 60.59250 60.74403 76.36395 64.01659 87.70644 88.22612 78.34427
## cell_8 cell_9 cell_10
## 55.43047 148.54910 79.83633
```

```
cd34_b <- cyto[, 5]
cd34_b
```

```
## cell_1 cell_2 cell_3 cell_4 cell_5 cell_6 cell_7
## 60.59250 60.74403 76.36395 64.01659 87.70644 88.22612 78.34427
## cell_8 cell_9 cell_10
## 55.43047 148.54910 79.83633
```

The same approach selects several columns:

```
cd34_cd45 <- cyto[, c("CD34", "CD45")]
cd34_cd45
```

```
## CD34 CD45
## cell_1 60.59250 160.62489
## cell_2 60.74403 146.89993
## cell_3 76.36395 46.90078
## cell_4 64.01659 176.39761
## cell_5 87.70644 159.90495
## cell_6 88.22612 162.73160
## cell_7 78.34427 14.60138
## cell_8 55.43047 99.39405
## cell_9 148.54910 101.62800
## cell_10 79.83633 195.31946
```

However, you cannot mix numbers and names to select columns. This chunk returns an error:

`# cyto[, c(1, "CD34")]`
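The reason is that a vector holds a single type, so `c()` silently converts the number to a character. A sketch on a made-up matrix (`toy` is hypothetical), using `tryCatch()` so the error is captured rather than stopping the script:

```r
mixed <- c(1, "CD34")
class(mixed)     # "character": the 1 became "1"

toy <- matrix(1:4, nrow = 2,
              dimnames = list(c("cell_1", "cell_2"), c("CD45", "CD34")))

# "1" is not a column name, so the selection fails
res <- tryCatch(toy[, mixed],
                error = function(e) "selection failed")
res

# Sticking to one kind of index works
toy[, c("CD45", "CD34")]
toy[, c(1, 2)]
```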

Conversely, we can select a specific row, here the second:

```
cell2 <- cyto[2, ]
cell2
```

```
## FSC-A SSC-A CD45 CD19 CD34 CD56
## 88143.29688 22730.00977 146.89993 109.37424 60.74403 66.13155
```

- Names are important.
- Spaces in column names are hell.

**Good names** of variables, columns, files, etc. **make the analysis a lot easier**, especially when things are not going as expected.
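As an illustration of why spaces hurt, here is a sketch with a made-up data frame (the column names are hypothetical). A name containing a space must be wrapped in backticks every time it is used with `$`, whereas a name without spaces needs nothing special:

```r
# check.names = FALSE keeps the space instead of replacing it
messy <- data.frame(`cell count` = c(10, 20), check.names = FALSE)
messy$`cell count`          # backticks are required

clean <- data.frame(cell_count = c(10, 20))
clean$cell_count            # no quoting needed

# make.names() shows what R would turn the messy name into by default
make.names("cell count")    # "cell.count"
```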

- Show the first 2 rows of the cyto matrix
- Show the `FSC-A` value of the first cell

Lists in R can be compared to vectors that can contain objects of various types. Let's start from two vectors:

```
random_numbers <- c(1, 2, 3, 7)
random_letters <- c("A", "B", "C", "Z")
```

The variables `random_numbers` and `random_letters` are two vectors containing 4 numbers and 4 letters, respectively.

These two vectors can be combined into a list:

```
my_list_b <- list(random_letters, random_numbers)
my_list_b
```

```
## [[1]]
## [1] "A" "B" "C" "Z"
##
## [[2]]
## [1] 1 2 3 7
```

The variable `my_list_b` is a list containing two elements. These elements can be named:

```
my_list <- list(my_numbers = random_numbers,
my_letters = random_letters)
my_list
```

```
## $my_numbers
## [1] 1 2 3 7
##
## $my_letters
## [1] "A" "B" "C" "Z"
```

The different items of a list can then be accessed by using the `$` and typing their name directly:

`my_list$my_numbers`

`## [1] 1 2 3 7`
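Besides `$`, list elements can be reached with double square brackets, by name or by position. A minimal sketch that rebuilds `my_list` so it runs on its own:

```r
my_list <- list(my_numbers = c(1, 2, 3, 7),
                my_letters = c("A", "B", "C", "Z"))

names(my_list)              # the element names
my_list[["my_letters"]]     # same as my_list$my_letters
my_list[[1]]                # first element, by position
my_list$my_numbers[4]       # 4th value of the numeric vector
length(my_list)             # number of elements in the list: 2
```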

- Create a list containing your name and your job:

Packages contain sets of useful functions, which can be installed, loaded, and then used in R. A package needs to be installed only once on your computer:

`#install.packages("Rtsne")`

But needs to be loaded every time you wish to use it:

`#library(Rtsne)`

The functions of an installed package can then be used. They often need the user to set parameters:

`#tsne <- Rtsne(cyto)`

This line would generate an error: we need to lower the perplexity because we only have 10 cells.

```
#tsne <- Rtsne(cyto, perplexity = 2)
#tsne$Y
```
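Before calling `library()`, it can be handy to check whether a package is already installed. A possible sketch using `requireNamespace()`, which returns `TRUE` or `FALSE` instead of stopping with an error:

```r
pkg <- "Rtsne"

if (requireNamespace(pkg, quietly = TRUE)) {
  message(pkg, " is installed; load it with library(", pkg, ")")
} else {
  message(pkg, " is not installed; run install.packages(\"", pkg, "\") once")
}

# Packages shipped with R, such as stats, are always available
requireNamespace("stats", quietly = TRUE)
```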

You can always ask R for help if you wish to know how a function works:

`#?Rtsne`

It is easy to define new functions. For example, I can write my own way to compute the mean:

```
moyenne <- function(lst) {
  sumall <- sum(lst)
  len <- length(lst)
  moy <- sumall / len
  moy
}
```

Then I can use it:

```
onelist <- 1:10
moyenne(onelist)
```

`## [1] 5.5`

and the result is the same as `mean(onelist)`, i.e., 5.5.
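Functions can take more than one argument, and arguments can have default values. As a sketch, here is a hypothetical variant `moyenne_na()` with an `na.rm` argument that optionally drops missing values (`NA`) before averaging:

```r
# Hypothetical variant of moyenne(), for illustration only
moyenne_na <- function(lst, na.rm = FALSE) {
  if (na.rm) {
    lst <- lst[!is.na(lst)]   # keep only the non-missing values
  }
  sum(lst) / length(lst)
}

values <- c(1, 2, NA, 9)
moyenne_na(values)                 # NA: the missing value propagates
moyenne_na(values, na.rm = TRUE)   # 4
```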

It is beyond the scope of this presentation, but R is powerful: you can write loops over lists, test values, etc.

```
moyenne.absolute <- function(lst) {
  # init temporary variables
  nelem <- 0
  tot <- 0
  # loop over all the elements of the list
  for (val in lst) {
    if (val > 0) {
      tot <- tot + val
    } else {
      tot <- tot - val
    }
    nelem <- nelem + 1
  }
  # compute the mean of the absolute values
  moy <- tot / nelem
  # return it
  moy
}
```

This function computes `mean(abs(lst))`. Let's check that:

```
one <- 1:10
onelist <- c(-one, 2*one)
moyenne.absolute(onelist) == mean(abs(onelist))
```

`## [1] TRUE`
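The loop is mainly there to show the syntax. Two side notes, offered as a sketch: the same quantity is usually written in vectorised form, and `==` on decimal numbers can be misleading because of rounding, so `all.equal()` is a safer comparison in general:

```r
x <- c(-1.5, 2, -3.25, 4)

mean(abs(x))                       # vectorised form, no loop needed

# Exact comparison of decimal numbers can surprise
0.1 + 0.2 == 0.3                   # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: equal within a small tolerance
```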

If this section is not clear to you, do not worry. We will come back to it later.

- list starts with `-`
- *italics* is `*italics*`
- **bold** is `**bold**`
- `code` is `` `code` ``
- equation \(E=mc^2 + \sum x_i\) is `$E=mc^2 + \sum x_i$`
- section starts with `#`, subsection with `##`, subsubsection with `###`, etc.
- hyperlink is `[hyperlink](https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf)`

You can also pretty print, e.g., a table:

| | CD45 | CD34 | CD56 |
|---|---|---|---|
| cell_3 | 46.90078 | 76.36395 | 82.21494 |
| cell_4 | 176.39761 | 64.01659 | 114.98014 |
| cell_5 | 159.90495 | 87.70644 | 88.29176 |
| cell_6 | 162.73160 | 88.22612 | 140.49802 |
| cell_7 | 14.60138 | 78.34427 | 65.37292 |

You could also have a look at the following online resources:

- https://rmarkdown.rstudio.com/lesson-1.html
- https://www.rstudio.com/online-learning/#r-programming
- https://www.youtube.com/watch?v=o0Y478jOjGk (11min29)
- https://www.youtube.com/watch?v=u1r5XTqrCTQ (2min52)
