Simon says:
In this introduction to R, we will present the main R concepts, by introducing R variables, vectors, matrices, lists, functions and packages.
Please read http://adv-r.had.co.nz/Style.html.
In R, we have two main types of values: numeric values and characters. Numerics:
12
## [1] 12
3.14
## [1] 3.14
-42.12
## [1] -42.12
and characters:
"text"
## [1] "text"
"Rock and Roll"
## [1] "Rock and Roll"
Character values, also called strings, are written between double quotes "
(French keyboard: key 3).
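As a small illustration (not part of the original session), the built-in class() function reports which of the two types a value belongs to. The variable names num_class and chr_class are only chosen for this sketch.

```r
# class() returns the type of a value as a string
num_class <- class(3.14)
chr_class <- class("text")

num_class   # "numeric"
chr_class   # "character"

# A number written between quotes is a string, not a numeric value
class("3.14")
```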
We can save these values in variables,
a <- 3.14
so we can easily access them later or re-use them:
a + a
## [1] 6.28
a
## [1] 3.14
We can compute, save the value and display it:
result <- 2*a + 3 * pi
result
## [1] 15.70478
Assign a number of your choice to the variable b
Show what’s in b
Multiply a by 2
Add 10 to a
Assign your name to an object called name
A vector contains several values:
b <- 1:10
We can concatenate these values with the c() function:
c(b, b)
## [1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
We can also apply mathematical formulas to them:
b + b
## [1] 2 4 6 8 10 12 14 16 18 20
3 * b
## [1] 3 6 9 12 15 18 21 24 27 30
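The operations above work element by element. A minimal sketch of the same idea with other operators follows; the names squares, halves and shifted are hypothetical and only used here.

```r
b <- 1:10

# Each element is combined with the element at the same position
squares <- b * b

# A single value is reused for every element of the vector
halves <- b / 2
shifted <- b + 100

squares
halves
shifted
```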
Concatenate a and b
Concatenate your name and a
Add a to b
Many functions exist in R, and greatly simplify our life.
How do we compute the mean of the variable b? The naive way:
(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)/10
## [1] 5.5
And now, what happens if b <- 5:17? We have to rewrite the formula by hand, which does not scale. R has built-in functions to easily compute common quantities.
For example, the mean is computed by:
mean(b)
## [1] 5.5
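To make the point concrete, here is a sketch with the other vector mentioned above. It is stored under the hypothetical name b2, so that b keeps its value for the rest of the session.

```r
b2 <- 5:17

# The naive formula has to be rewritten: new terms and a new denominator
naive <- (5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17) / 13

# The built-in function is called in exactly the same way as before
builtin <- mean(b2)

naive
builtin
```

Both values are equal to 11.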
Moreover, we can combine functions with mathematical operations (which are also functions).
m <- mean(b + b)
m
## [1] 11
What about the median? Easy:
median(b)
## [1] 5.5
And the number of elements (the length of the vector/list):
length(b)
## [1] 10
length(b + b)
## [1] 10
length(c(b, b))
## [1] 20
The concatenation c() is a built-in function.
Let us sum all the elements, then find the minimum and the maximum:
sum(b)
## [1] 55
min(b)
## [1] 1
max(b)
## [1] 10
Or print a statistical summary:
summary(b)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 3.25 5.50 5.50 7.75 10.00
List the first 6 elements:
head(b)
## [1] 1 2 3 4 5 6
or the last 6:
tail(b)
## [1] 5 6 7 8 9 10
or the last 4:
tail(b, n=4)
## [1] 7 8 9 10
Be careful: R does not warn when we ask for more elements than the vector contains:
head(b, n=20)
## [1] 1 2 3 4 5 6 7 8 9 10
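The n argument of head() and tail() also accepts negative values, which means "all but". A short sketch, with variable names chosen only for this illustration:

```r
b <- 1:10

# Asking for more elements than exist silently returns everything
first_all <- head(b, n = 20)

# A negative n drops elements instead of keeping them
all_but_last2 <- head(b, n = -2)    # drops the last 2 elements
all_but_first2 <- tail(b, n = -2)   # drops the first 2 elements

length(first_all)
all_but_last2
all_but_first2
```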
We can read help using:
?tail
or the Help facilities of RStudio.
Create the vector 13, 13, 13, 13, 14, 14, 16, 18, 21 and assign it to a variable:
Compute the median:
Compute the mean:
Is it normal? ;-)
Compute the length:
Sum all the values:
Compute the mean using the previous sum and length
We sometimes want to insert a comment as a reminder of the purpose of an action.
A comment starts with the hash symbol #.
# this is a comment
and a comment does nothing when evaluated.
It is really useful to explain several lines:
# Assign the vector from 10 to 25 to the variable named var
var <- 10:25
# Select first elements
head(var)
## [1] 10 11 12 13 14 15
var + var # return a new vector
## [1] 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
# Evaluate the length of the concatenated vector
length(c(var, var))
## [1] 32
The comments do not appear in the Output block. However they appear in the final document (see this doc).
Matrices can be compared to tables: they are two-dimensional and can contain numbers, characters, etc. (all the elements of a given matrix share the same type).
Let us load a matrix:
load("cyto.RData")
cyto
## FSC-A SSC-A CD45 CD19 CD34 CD56
## cell_1 96985.98 34853.85 160.62489 111.52325 60.59250 75.91315
## cell_2 88143.30 22730.01 146.89993 109.37424 60.74403 66.13155
## cell_3 92953.35 31651.34 46.90078 64.03857 76.36395 82.21494
## cell_4 78595.02 51510.99 176.39761 63.25624 64.01659 114.98014
## cell_5 109199.79 60020.60 159.90495 63.29745 87.70644 88.29176
## cell_6 136586.52 130294.78 162.73160 59.74005 88.22612 140.49802
## cell_7 86250.78 25582.85 14.60138 63.21502 78.34427 65.37292
## cell_8 6391.98 4621.63 99.39405 58.45350 55.43047 65.70305
## cell_9 16286.76 7108.74 101.62800 67.55399 148.54910 77.11388
## cell_10 91409.22 76663.14 195.31946 59.94742 79.83633 80.57721
We can check the number of rows and columns in the cyto object:
nrow(cyto)
## [1] 10
ncol(cyto)
## [1] 6
dim(cyto)
## [1] 10 6
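If the file cyto.RData is not at hand, the same functions can be tried on a small matrix built directly with matrix(). This is only a sketch: the object toy and its row and column names are made up for the illustration.

```r
# matrix() fills the values column by column by default
toy <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3,
              dimnames = list(c("cell_1", "cell_2"),   # row names
                              c("A", "B", "C")))       # column names
toy

nrow(toy)
ncol(toy)
dim(toy)
```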
Load data from CSV file and show a summary:
df <- read.csv2(file='example.csv', header=TRUE, dec=",", sep=";")
dim(df)
## [1] 150 5
summary(df)
## Var1 Var2 Var3 Var4
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Pop
## setosa :50
## versicolor:50
## virginica :50
##
##
##
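The read.csv2() function expects the convention where ";" separates the fields and "," is the decimal mark; these are its default values for sep and dec. A self-contained sketch, using a temporary file with invented content in place of example.csv:

```r
# Write a tiny semicolon-separated file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("Var1;Var2;Pop",
             "5,1;3,5;setosa",
             "7,0;3,2;versicolor"), tmp)

# sep = ";" and dec = "," are the defaults of read.csv2()
toy_df <- read.csv2(tmp, header = TRUE)
dim(toy_df)
toy_df$Var1   # read as numbers, not as text

# Remove the temporary file
unlink(tmp)
```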
We can display the element in the first row and the first column, or the 2nd row and 3rd column.
cyto[1, 1]
## [1] 96985.98
cyto[2, 3]
## [1] 146.8999
The indexing convention is: row, column.
All the elements of a specific column of a matrix can be selected by its number, here the 3rd column:
fcs_a <- cyto[, 3]
fcs_a
## cell_1 cell_2 cell_3 cell_4 cell_5 cell_6 cell_7
## 160.62489 146.89993 46.90078 176.39761 159.90495 162.73160 14.60138
## cell_8 cell_9 cell_10
## 99.39405 101.62800 195.31946
How do we concatenate two columns?
# 1. select column 2 and column 5
# 2. concatenate them
# 3. store in the variable
fcs_b <- c(cyto[, 2], cyto[, 5])
fcs_b
## cell_1 cell_2 cell_3 cell_4 cell_5
## 34853.85156 22730.00977 31651.34180 51510.99219 60020.60156
## cell_6 cell_7 cell_8 cell_9 cell_10
## 130294.78125 25582.85156 4621.62988 7108.74023 76663.14062
## cell_1 cell_2 cell_3 cell_4 cell_5
## 60.59250 60.74403 76.36395 64.01659 87.70644
## cell_6 cell_7 cell_8 cell_9 cell_10
## 88.22612 78.34427 55.43047 148.54910 79.83633
# 1. concatenate the number 2 and 5
# 2. select the columns 2 and 5
# 3. store
fcs_c <- cyto[, c(2, 5)]
fcs_c
## SSC-A CD34
## cell_1 34853.85 60.59250
## cell_2 22730.01 60.74403
## cell_3 31651.34 76.36395
## cell_4 51510.99 64.01659
## cell_5 60020.60 87.70644
## cell_6 130294.78 88.22612
## cell_7 25582.85 78.34427
## cell_8 4621.63 55.43047
## cell_9 7108.74 148.54910
## cell_10 76663.14 79.83633
# Do you see the difference ?
The column can also be selected by typing its column name directly:
cd45 <- cyto[, "CD34"]
cd45
## cell_1 cell_2 cell_3 cell_4 cell_5 cell_6 cell_7
## 60.59250 60.74403 76.36395 64.01659 87.70644 88.22612 78.34427
## cell_8 cell_9 cell_10
## 55.43047 148.54910 79.83633
cd45_b <- cyto[, 5]
cd45_b
## cell_1 cell_2 cell_3 cell_4 cell_5 cell_6 cell_7
## 60.59250 60.74403 76.36395 64.01659 87.70644 88.22612 78.34427
## cell_8 cell_9 cell_10
## 55.43047 148.54910 79.83633
The same works for selecting several columns:
cd45_c <- cyto[, c("CD34", "CD45")]
cd45_c
## CD34 CD45
## cell_1 60.59250 160.62489
## cell_2 60.74403 146.89993
## cell_3 76.36395 46.90078
## cell_4 64.01659 176.39761
## cell_5 87.70644 159.90495
## cell_6 88.22612 162.73160
## cell_7 78.34427 14.60138
## cell_8 55.43047 99.39405
## cell_9 148.54910 101.62800
## cell_10 79.83633 195.31946
However, you cannot mix numbers and names to select columns. This chunk returns an error:
# cyto[, c(1, "CD34")]
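The reason is that c() stores a single type of value: when a number and a string are concatenated, the number is converted to a string, and no column is named "1". The sketch below shows the conversion and one possible workaround on a small made-up matrix called toy (an assumption, since cyto needs to be loaded from its file).

```r
# The number 1 is silently converted to the string "1"
idx <- c(1, "CD34")
idx
class(idx)

# Hypothetical 2 x 2 matrix reusing two column names of cyto
toy <- matrix(1:4, nrow = 2,
              dimnames = list(NULL, c("FSC-A", "CD34")))

# Workaround: translate the position into a name first
by_name <- toy[, c(colnames(toy)[1], "CD34")]
by_name
```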
Conversely, we can select a specific row, here the second:
cell2 <- cyto[2, ]
cell2
## FSC-A SSC-A CD45 CD19 CD34 CD56
## 88143.29688 22730.00977 146.89993 109.37424 60.74403 66.13155
Good names for variables, columns, files, etc. make the analysis much easier, especially when things are not going as expected.
Show the first 2 rows of the cyto matrix
Show the FSC-A value of the first cell
Lists in R can be compared to vectors, except that a list can contain objects of various types:
random_numbers <- c(1, 2, 3, 7)
random_letters <- c("A", "B", "C", "Z")
The variables random_numbers and random_letters are two vectors containing 4 numbers and 4 letters, respectively.
These two vectors can be combined to create a list:
my_list_b <- list(random_letters, random_numbers)
my_list_b
## [[1]]
## [1] "A" "B" "C" "Z"
##
## [[2]]
## [1] 1 2 3 7
The variable my_list_b is a list containing two elements, here the two vectors. These elements can be named:
my_list <- list(my_numbers = random_numbers,
my_letters = random_letters)
my_list
## $my_numbers
## [1] 1 2 3 7
##
## $my_letters
## [1] "A" "B" "C" "Z"
The different items of a list can then be accessed by using the $ and typing their name directly:
my_list$my_numbers
## [1] 1 2 3 7
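Besides $, base R offers double and single square brackets to access the items of a list. A short sketch of the difference, rebuilding my_list so that the block runs on its own:

```r
my_list <- list(my_numbers = c(1, 2, 3, 7),
                my_letters = c("A", "B", "C", "Z"))

# Double brackets return the item itself, by name or by position
by_name <- my_list[["my_letters"]]
by_position <- my_list[[1]]

# Single brackets return a list of length 1 that contains the item
sub_list <- my_list["my_numbers"]

by_name
by_position
class(sub_list)
names(my_list)
```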
Packages contain sets of useful functions, which can be installed, loaded, and then used in R. A package needs to be installed only once on your computer:
#install.packages("Rtsne")
But it needs to be loaded every time you wish to use it:
#library(Rtsne)
The functions of an installed package can then be used. They often need the user to set parameters:
#tsne <- Rtsne(cyto)
This line would generate an error: we need to lower the perplexity, as we only have 10 cells.
#tsne <- Rtsne(cyto, perplexity = 2)
#tsne$Y
You can always ask R for help if you wish to know how a function works:
#?Rtsne
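Before loading a package, it can be useful to check whether it is already installed. One way to do this, shown here as a sketch rather than as a required step, is the base function requireNamespace(); the variable has_rtsne is a name chosen for the example.

```r
# TRUE if the package is installed, FALSE otherwise (nothing is installed here)
has_rtsne <- requireNamespace("Rtsne", quietly = TRUE)

if (has_rtsne) {
  message("Rtsne is installed: load it with library(Rtsne)")
} else {
  message("Rtsne is missing: install it once with install.packages(\"Rtsne\")")
}
```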
It is easy to define new functions. For example, I redefine my way to compute the mean:
moyenne <- function(lst) {
  sumall <- sum(lst)
  len <- length(lst)
  moy <- sumall / len
  moy
}
Then I can use it:
onelist <- 1:10
moyenne(onelist)
## [1] 5.5
and the result is the same as mean(onelist), i.e., 5.5.
It is beyond the scope of this presentation, but R is powerful: you can write loops over vectors, test values, etc.
moyenne.absolute <- function(lst) {
  # init temporary variables
  nelem <- 0
  tot <- 0
  # loop over all the elements of the list
  for (val in lst) {
    if (val > 0) {
      tot <- tot + val
    } else {
      tot <- tot - val
    }
    nelem <- nelem + 1
  }
  # compute something
  moy <- tot / nelem
  # return it
  moy
}
and this function computes mean(abs(lst)). Let us check that:
one <- 1:10
onelist <- c(-one, 2*one)
moyenne.absolute(onelist) == mean(abs(onelist))
## [1] TRUE
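The comparison with == works here because the values are whole numbers. With decimal values, two computations that should agree can differ by a tiny rounding error, so a comparison with a tolerance, such as all.equal(), is usually safer. A sketch with invented values:

```r
# Exact comparison can fail because of rounding in decimal arithmetic
exact <- (0.1 + 0.2 == 0.3)
exact      # FALSE

# all.equal() tolerates tiny differences; isTRUE() turns the result into TRUE/FALSE
tolerant <- isTRUE(all.equal(0.1 + 0.2, 0.3))
tolerant   # TRUE

# The same check applied to a mean of absolute values
x <- c(-1.5, 2.25, -3)
same_mean <- isTRUE(all.equal(mean(abs(x)), 2.25))
same_mean
```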
If this section is not clear to you, do not worry: we will come back to it later.
- italics: *italics*
- bold: **bold**
- code: `code`
- equation: $E=mc^2 + \sum x_i$
- section with #, subsection with ##, subsubsection with ###, etc.
- [hyperlink](https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf)
You can also pretty print, e.g., a table:
| | CD45 | CD34 | CD56 |
|---|---|---|---|
| cell_3 | 46.90078 | 76.36395 | 82.21494 |
| cell_4 | 176.39761 | 64.01659 | 114.98014 |
| cell_5 | 159.90495 | 87.70644 | 88.29176 |
| cell_6 | 162.73160 | 88.22612 | 140.49802 |
| cell_7 | 14.60138 | 78.34427 | 65.37292 |
You could also have a look at the following online resources:
Please read http://adv-r.had.co.nz/Style.html.