Simon says:

In this introduction to R, we will present the main R concepts, by introducing R variables, vectors, matrices, lists, functions and packages.

Please read http://adv-r.had.co.nz/Style.html.

Variable

In R, we have two main types of values: numeric values and character strings. Numeric values:

12
## [1] 12
3.14
## [1] 3.14
-42.12
## [1] -42.12

and characters:

"text"
## [1] "text"
"Rock and Roll"
## [1] "Rock and Roll"

Character values, also called strings, are enclosed in double quotes " (on a French keyboard: key 3).

We can save these values in variables,

a <- 3.14

so we can easily access them later or re-use them:

a + a
## [1] 6.28
a
## [1] 3.14

We can compute, save the value and display it:

result <- 2*a + 3 * pi
result
## [1] 15.70478
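
As a small sketch of the two value types described above, class() reports the type stored in a variable. The variable names radius, label and area are only illustrative.

```r
# Hypothetical variables, used only for illustration
radius <- 2.5
label <- "circle"

class(radius)   # "numeric"
class(label)    # "character"

# A variable can be re-used in a computation and the result saved again
area <- pi * radius^2
area
```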

Exercises

  1. Assign a number of your choice to the variable b

  2. Show what’s in b

  3. Multiply a by 2

  4. Add 10 to a

  5. Assign your name to an object called name

Vector / List

A vector contains several values:

b <- 1:10

We can concatenate these values with the c() function:

c(b, b)
##  [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10

We can also apply mathematical formulas to them:

b + b
##  [1]  2  4  6  8 10 12 14 16 18 20
3 * b
##  [1]  3  6  9 12 15 18 21 24 27 30
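
The operations above work element by element. A minimal sketch with hypothetical vectors heights_m and gains_cm (not part of the original example):

```r
# Hypothetical data: three heights in metres
heights_m <- c(1.60, 1.75, 1.82)

# A single number is applied to every element
heights_cm <- heights_m * 100
heights_cm

# Two vectors of the same length are combined element by element
gains_cm <- c(1, 0, 2)
heights_cm + gains_cm
```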

Exercises

  1. Concatenate a and b

  2. Concatenate your name and a

  3. Add a to b

Built-in Function

R provides many built-in functions that greatly simplify our work.

How do we compute the mean of the variable b? The naive way:

(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)/10
## [1] 5.5

And now, what happens if b <- 5:17? We would have to modify the formula, so this approach is not powerful enough. R has built-in functions to easily compute common quantities.

For example, the mean is computed by:

mean(b)
## [1] 5.5
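
To illustrate why a built-in function is more robust than a hand-written formula, the sketch below uses a separate, hypothetical variable b2, so that b itself is left unchanged:

```r
# Hypothetical second vector; the hand-written formula for 1:10 no longer applies
b2 <- 5:17

# Naive way: the formula has to be rewritten for the new values
(5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17) / 13

# Built-in way: the same call works whatever the vector contains
mean(b2)
```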

Moreover, we can combine functions with mathematical operations (which are also functions).

m <- mean(b + b)
m
## [1] 11
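
Function calls can be nested in the same way. A short sketch with a hypothetical vector x:

```r
# Hypothetical vector of perfect squares
x <- c(4, 9, 16)

# sqrt() is applied to each element, then sum() adds up the results
total <- sum(sqrt(x))
total

# Operators are functions too: this is the same as 1 + 2
`+`(1, 2)
```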

What about the median? Easy:

median(b)
## [1] 5.5

And the number of elements (the length of the vector/list):

length(b)
## [1] 10
length(b + b)
## [1] 10
length(c(b, b))
## [1] 20

The concatenation c() is a built-in function.

Let us sum all the elements and find the minimum and the maximum:

sum(b)
## [1] 55
min(b)
## [1] 1
max(b)
## [1] 10

Or print a statistical summary:

summary(b)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    3.25    5.50    5.50    7.75   10.00
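
The figures printed by summary() can also be obtained one by one. A hedged sketch using quantile(), which with its default settings gives the same quartiles for this vector:

```r
b <- 1:10

# Minimum, quartiles and maximum as a named numeric vector
q <- quantile(b)
q

# The 1st and 3rd quartiles only
quantile(b, probs = c(0.25, 0.75))
```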

List the first 6 elements:

head(b)
## [1] 1 2 3 4 5 6

or the last 6:

tail(b)
## [1]  5  6  7  8  9 10

or the last 4:

tail(b, n=4)
## [1]  7  8  9 10

Be careful: R does not warn us, even when we ask for something odd, such as more elements than the vector contains:

head(b, n=20)
##  [1]  1  2  3  4  5  6  7  8  9 10
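
Because head() silently returns whatever is available, it can help to check the length first. A minimal sketch; the variable names wanted and first are hypothetical:

```r
b <- 1:10
wanted <- 20

# head() returns at most length(b) elements, without any warning
first <- head(b, n = wanted)
length(first)

# An explicit check makes the mismatch visible
if (wanted > length(b)) {
  message("Asked for ", wanted, " elements but only ", length(b), " exist")
}

# A negative n means "all but the last |n| elements"
head(b, n = -2)
```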

We can read the help page of a function using:

?tail

or the Help facilities of RStudio.

Exercises

  1. Create the vector 13, 13, 13, 13, 14, 14, 16, 18, 21 and assign it to a variable:

  2. Compute the median:

  3. Compute the mean:

  4. Is it normal? ;-)

  5. Compute the length:

  6. Sum all the values:

  7. Compute the mean using the previous sum and length

Block (chunk) with comment

We sometimes want to insert a comment as a reminder of the purpose of an action.

A comment starts with the hash symbol #.

# this is a comment

and a comment does nothing when evaluated.

It is really useful to explain several lines:

# Assign a list from 10 to 25 to the variable named var
var <- 10:25

# Select first elements
head(var)
## [1] 10 11 12 13 14 15
var + var # return a new list
##  [1] 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
# Evaluate the length of the concatenated list
length(c(var, var))
## [1] 32

The comments do not appear in the Output block. However they appear in the final document (see this doc).

Matrices / Table

Matrices can be compared to tables: they are two-dimensional and can contain numbers or characters (all the elements of one matrix share the same type).

Let us load a matrix:

load("cyto.RData")
cyto
##             FSC-A     SSC-A      CD45      CD19      CD34      CD56
## cell_1   96985.98  34853.85 160.62489 111.52325  60.59250  75.91315
## cell_2   88143.30  22730.01 146.89993 109.37424  60.74403  66.13155
## cell_3   92953.35  31651.34  46.90078  64.03857  76.36395  82.21494
## cell_4   78595.02  51510.99 176.39761  63.25624  64.01659 114.98014
## cell_5  109199.79  60020.60 159.90495  63.29745  87.70644  88.29176
## cell_6  136586.52 130294.78 162.73160  59.74005  88.22612 140.49802
## cell_7   86250.78  25582.85  14.60138  63.21502  78.34427  65.37292
## cell_8    6391.98   4621.63  99.39405  58.45350  55.43047  65.70305
## cell_9   16286.76   7108.74 101.62800  67.55399 148.54910  77.11388
## cell_10  91409.22  76663.14 195.31946  59.94742  79.83633  80.57721

We can check the number of rows and columns in the cyto object:

nrow(cyto)
## [1] 10
ncol(cyto)
## [1] 6
dim(cyto)
## [1] 10  6
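
The file cyto.RData is not included here, so the sketch below builds a small stand-in matrix by hand. The name toy is hypothetical and the values are loosely rounded from the output above, for illustration only:

```r
# Stand-in values: 3 cells (rows) measured on 2 markers (columns).
# matrix() fills column by column, so the first three numbers form column 1.
toy <- matrix(
  c(160.6, 146.9, 46.9,
    111.5, 109.4, 64.0),
  nrow = 3, ncol = 2,
  dimnames = list(c("cell_1", "cell_2", "cell_3"), c("CD45", "CD19"))
)
toy

nrow(toy)   # number of rows
ncol(toy)   # number of columns
dim(toy)    # both at once: rows, then columns
```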

Load data from a CSV file and show a summary:

df <- read.csv2(file='example.csv', header=TRUE, dec=",", sep=";")
dim(df)
## [1] 150   5
summary(df)
##       Var1            Var2            Var3            Var4      
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##          Pop    
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
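
The file example.csv is not provided, so here is a self-contained sketch that writes a tiny semicolon-separated file with decimal commas to a temporary location and reads it back. The column names and values are invented. Note that sep = ";" and dec = "," are already the defaults of read.csv2().

```r
# Invented content: ';' separates fields and ',' is the decimal mark
csv_lines <- c(
  "Var1;Var2;Pop",
  "5,1;3,5;setosa",
  "7,0;3,2;versicolor",
  "6,3;3,3;virginica"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# read.csv2() uses sep = ";" and dec = "," by default
small <- read.csv2(file = path, header = TRUE)
dim(small)
summary(small)

unlink(path)  # remove the temporary file
```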

Indexing

We can display the element in the first row and the first column, or the 2nd row and 3rd column.

cyto[1, 1]
## [1] 96985.98
cyto[2, 3]
## [1] 146.8999

The indexing convention is: row, column.

All the elements of a specific column of a matrix can be selected by its number, here the 3rd column:

fcs_a <- cyto[, 3]
fcs_a
##    cell_1    cell_2    cell_3    cell_4    cell_5    cell_6    cell_7 
## 160.62489 146.89993  46.90078 176.39761 159.90495 162.73160  14.60138 
##    cell_8    cell_9   cell_10 
##  99.39405 101.62800 195.31946

How do we concatenate two columns?

# 1. select column 2 and column 5
# 2. concatenate them
# 3. store in the variable
fcs_b <- c(cyto[, 2], cyto[, 5])
fcs_b
##       cell_1       cell_2       cell_3       cell_4       cell_5 
##  34853.85156  22730.00977  31651.34180  51510.99219  60020.60156 
##       cell_6       cell_7       cell_8       cell_9      cell_10 
## 130294.78125  25582.85156   4621.62988   7108.74023  76663.14062 
##       cell_1       cell_2       cell_3       cell_4       cell_5 
##     60.59250     60.74403     76.36395     64.01659     87.70644 
##       cell_6       cell_7       cell_8       cell_9      cell_10 
##     88.22612     78.34427     55.43047    148.54910     79.83633
# 1. concatenate the numbers 2 and 5
# 2. select the columns 2 and 5
# 3. store
fcs_c <- cyto[, c(2, 5)]
fcs_c
##             SSC-A      CD34
## cell_1   34853.85  60.59250
## cell_2   22730.01  60.74403
## cell_3   31651.34  76.36395
## cell_4   51510.99  64.01659
## cell_5   60020.60  87.70644
## cell_6  130294.78  88.22612
## cell_7   25582.85  78.34427
## cell_8    4621.63  55.43047
## cell_9    7108.74 148.54910
## cell_10  76663.14  79.83633
# Do you see the difference ?
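
The difference between the two selections can be checked directly. A sketch on a small stand-in matrix (toy is hypothetical, since cyto.RData is not included):

```r
# Stand-in matrix with 3 rows and 3 named columns
toy <- matrix(1:9, nrow = 3,
              dimnames = list(c("cell_1", "cell_2", "cell_3"),
                              c("A", "B", "C")))

# c() glues the two columns end to end: one long vector
as_vector <- c(toy[, 1], toy[, 3])
is.matrix(as_vector)
length(as_vector)

# Indexing with c(1, 3) keeps the table shape: 3 rows, 2 columns
as_matrix <- toy[, c(1, 3)]
is.matrix(as_matrix)
dim(as_matrix)
```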

A column can also be selected directly by its name:

cd34 <- cyto[, "CD34"]
cd34
##    cell_1    cell_2    cell_3    cell_4    cell_5    cell_6    cell_7 
##  60.59250  60.74403  76.36395  64.01659  87.70644  88.22612  78.34427 
##    cell_8    cell_9   cell_10 
##  55.43047 148.54910  79.83633
cd34_b <- cyto[, 5]
cd34_b
##    cell_1    cell_2    cell_3    cell_4    cell_5    cell_6    cell_7 
##  60.59250  60.74403  76.36395  64.01659  87.70644  88.22612  78.34427 
##    cell_8    cell_9   cell_10 
##  55.43047 148.54910  79.83633

The same applies when selecting several columns:

cd34_cd45 <- cyto[, c("CD34", "CD45")]
cd34_cd45
##              CD34      CD45
## cell_1   60.59250 160.62489
## cell_2   60.74403 146.89993
## cell_3   76.36395  46.90078
## cell_4   64.01659 176.39761
## cell_5   87.70644 159.90495
## cell_6   88.22612 162.73160
## cell_7   78.34427  14.60138
## cell_8   55.43047  99.39405
## cell_9  148.54910 101.62800
## cell_10  79.83633 195.31946

However, you cannot mix numbers and names to select columns. This chunk returns an error:

# cyto[, c(1, "CD34")]
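
The reason is that c() converts all its arguments to one common type, so the number 1 becomes the string "1", which is not a column name. A minimal sketch on a hypothetical stand-in matrix named toy:

```r
toy <- matrix(1:6, nrow = 2,
              dimnames = list(c("cell_1", "cell_2"),
                              c("FSC-A", "SSC-A", "CD34")))

# The number is silently converted to a character string
selector <- c(1, "CD34")
class(selector)

# Using it as a column index fails; tryCatch() captures the error message
msg <- tryCatch(toy[, selector], error = function(e) conditionMessage(e))
msg

# Either use names only, or numbers only
toy[, c("FSC-A", "CD34")]
toy[, c(1, 3)]
```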

Similarly, we can select a specific row, here the second:

cell2 <- cyto[2, ]
cell2
##       FSC-A       SSC-A        CD45        CD19        CD34        CD56 
## 88143.29688 22730.00977   146.89993   109.37424    60.74403    66.13155

Important to remember

  • Names are important.
  • Spaces in column names are hell.

Good names for variables, columns, files, etc. make the analysis much easier, especially when things do not go as expected.

Exercises

  1. Show the first 2 rows of the cyto matrix

  2. Show the FSC-A value of the first cell

More on lists

Lists in R can be compared to vectors, except that a list can hold objects of different types. Let us start from two vectors:

random_numbers <- c(1, 2, 3, 7)
random_letters <- c("A", "B", "C", "Z")

The variables random_numbers and random_letters are two vectors, containing 4 numbers and 4 letters respectively.

These two vectors can be combined into a list:

my_list_b <- list(random_letters, random_numbers)
my_list_b
## [[1]]
## [1] "A" "B" "C" "Z"
## 
## [[2]]
## [1] 1 2 3 7

The variable my_list_b is a list containing two elements. These elements can be named:

my_list <- list(my_numbers = random_numbers,
                my_letters = random_letters)
my_list
## $my_numbers
## [1] 1 2 3 7
## 
## $my_letters
## [1] "A" "B" "C" "Z"

The different items of a list can then be accessed by using the $ and typing their name directly:

my_list$my_numbers
## [1] 1 2 3 7
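
Besides $, list items can be reached with double square brackets, by name or by position. A short sketch that rebuilds the same list so it can run on its own:

```r
my_list <- list(my_numbers = c(1, 2, 3, 7),
                my_letters = c("A", "B", "C", "Z"))

# By name, with $ or with [[ ]]
my_list$my_letters
my_list[["my_letters"]]

# By position
my_list[[1]]

# Single brackets return a (shorter) list, not the item itself
class(my_list["my_numbers"])
class(my_list[["my_numbers"]])

# List the available names
names(my_list)
```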

Exercises

  1. Create a list containing your name, and job:

Packages

Packages contain sets of useful functions, which can be installed, loaded, and then used in R. A package needs to be installed only once on your computer:

#install.packages("Rtsne")

But it needs to be loaded in every R session in which you wish to use it:

#library(Rtsne)

The functions of an installed package can then be used. They often need the user to set parameters:

#tsne <- Rtsne(cyto)

This line would generate an error: we need to lower the perplexity, as we only have 10 cells.

#tsne <- Rtsne(cyto, perplexity = 2)
#tsne$Y

You can always ask R for help if you wish to know how a function works:

#?Rtsne
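
A common pattern, sketched here, is to test whether a package is available before trying to load it. requireNamespace() is part of base R, and nothing is installed by this snippet; the variable names pkg and installed are only illustrative.

```r
pkg <- "Rtsne"

# TRUE if the package is installed on this computer, FALSE otherwise
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  message(pkg, " is installed: load it with library(", pkg, ")")
} else {
  message(pkg, " is missing: install it once with install.packages('", pkg, "')")
}
```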

Define your own function

It is easy to define new functions. For example, here is my own way to compute the mean:

moyenne <- function(lst) {
  sumall <- sum(lst)
  len <- length(lst)
  moy <- sumall / len
  moy
}

Then I can use it:

onelist <- 1:10
moyenne(onelist)
## [1] 5.5

and the result is the same as mean(onelist), i.e., 5.5.

It is beyond the scope of this presentation, but R is powerful: you can write loops over lists, test values, etc.

moyenne.absolute <- function(lst) {
  # init temporary variables
  nelem <- 0
  tot <- 0
  # loop over all the elements of the list
  for (val in lst) {
    if (val > 0 ) {
      tot <- tot + val
    } else {
      tot <- tot - val
    }
    nelem <- nelem + 1
  }
  # compute something
  moy <- tot / nelem
  # return it
  moy
}

and this function computes mean(abs(lst)). Let us check that:

one <- 1:10
onelist <- c(-one, 2*one)

moyenne.absolute(onelist) == mean(abs(onelist))
## [1] TRUE
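
The == test works here because every intermediate value is exactly representable. With decimal values, tiny rounding differences can make == fragile. A hedged sketch of the more tolerant all.equal(), using a shortened, hypothetical helper mean_abs that performs the same computation:

```r
# Compact stand-in for the loop above (hypothetical helper name)
mean_abs <- function(lst) {
  tot <- 0
  for (val in lst) {
    tot <- tot + abs(val)
  }
  tot / length(lst)
}

values <- c(-0.1, 0.2, -0.3, 0.7)

# all.equal() tolerates tiny floating-point differences;
# wrap it in isTRUE() to get a plain TRUE or FALSE
isTRUE(all.equal(mean_abs(values), mean(abs(values))))
```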

Well, if this section is not clear to you, do not worry. We will come back to it later.

RMarkdown summary

Subsection talking about table

You can also pretty print, e.g., a table:

Legend of the cyto table

            CD45      CD34      CD56
cell_3  46.90078  76.36395  82.21494
cell_4 176.39761  64.01659 114.98014
cell_5 159.90495  87.70644  88.29176
cell_6 162.73160  88.22612 140.49802
cell_7  14.60138  78.34427  65.37292

Subsubsection talking about figure

My nice caption of my nice figure

Further readings

You could also have a look at the following online resources:

Please read http://adv-r.had.co.nz/Style.html.