In Quarto documents like this one, we can write comments by just using plain text. In contrast, code needs to be within code blocks, like the one below. To execute a code block, you can click on the little “Play” button or press Cmd/Ctrl + Shift + Enter when your cursor is inside the code block.
2 + 2
[1] 4
That was our first R command, a simple math operation. Of course, we can also do more complex arithmetic:
12345 ^ 2 / (200 + 25 - 6 * 2) # this is an inline comment, see the leading "#"
[1] 715488.4
In order to create a code block, you can press Cmd/Ctrl + Alt + i or click on the little green “+C” icon on top of the script.
Create your own code block below and run a math operation.
A huge part of R is working with objects. Let’s see how they work:
my_object <- 10 # opt/alt + minus sign will make the arrow
my_object # to print the value of an object, just call its name
[1] 10
We can now use this object in our operations:
2 ^ my_object
[1] 1024
Or even create another object out of it:
my_object2 <- my_object * 2
my_object2
[1] 20
You can delete objects with the rm() function (for “remove”):
rm(my_object2)
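To confirm that an object is really gone, one option is the base R function exists(), which is not used elsewhere in this tutorial. The following is a minimal sketch that recreates the two objects from above so that it runs on its own:

```r
# Recreate the objects from the previous steps
my_object <- 10
my_object2 <- my_object * 2

exists("my_object2") # TRUE: the object is in our environment

rm(my_object2)

exists("my_object2") # FALSE: the object has been removed
exists("my_object")  # TRUE: rm() only removes the objects we name
```

Calling my_object2 after removing it would raise an "object not found" error.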
Objects can be of different types. One of the most useful ones is the vector, which holds a series of values. To create one manually, we can use the c() function (for “combine”):
my_vector <- c(6, -11, my_object, 0, 20)
my_vector
[1] 6 -11 10 0 20
One can also define vectors by sequences:
3:10
[1] 3 4 5 6 7 8 9 10
We can use square brackets to retrieve parts of vectors:
my_vector[4] # fourth element
[1] 0
my_vector[1:2] # first two elements
[1] 6 -11
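Square brackets accept more than a single position or a sequence. As a small sketch that goes slightly beyond the examples above, we can pass any vector of positions, or a negative position to drop an element. The names first_and_last and without_first are only illustrative:

```r
# Same values as my_vector above, written out so the block runs on its own
my_vector <- c(6, -11, 10, 0, 20)

first_and_last <- my_vector[c(1, 5)] # pick positions 1 and 5
first_and_last

without_first <- my_vector[-1] # a negative position drops that element
without_first
```

The first call returns 6 and 20; the second returns every element except the first.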
Let’s check out some basic functions we can use with numbers and numeric vectors:
sqrt(my_object) # square root
[1] 3.162278
log(my_object) # logarithm (natural by default)
[1] 2.302585
abs(-5) # absolute value
[1] 5
mean(my_vector)
[1] 5
median(my_vector)
[1] 6
sd(my_vector) # standard deviation
[1] 11.53256
sum(my_vector)
[1] 25
min(my_vector) # minimum value
[1] -11
max(my_vector) # maximum value
[1] 20
length(my_vector) # length (number of elements)
[1] 5
Notice that if we wanted to save any of these results for later, we would need to assign them:
my_mean <- mean(my_vector)
my_mean
[1] 5
These functions are quite simple: they take one object and do one operation. A lot of functions are a bit more complex: they take multiple objects or take options. For example, see the sort() function, which by default sorts a vector in increasing order:
sort(my_vector)
[1] -11 0 6 10 20
If we instead want to sort our vector in decreasing order, we can use the decreasing = TRUE argument (T also works as an abbreviation for TRUE).
sort(my_vector, decreasing = TRUE)
[1] 20 10 6 0 -11
If you pass the argument values in the order the function defines them, you can omit the argument names (see below). This is sometimes useful, but it can also lead to confusing code, so use it with caution.
sort(my_vector, T)
[1] 20 10 6 0 -11
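To illustrate the trade-off between positional and named arguments, here is a short sketch. The names by_position and by_name are only illustrative. Named arguments can be given in any order, while positional ones must follow the order in the function's definition:

```r
# Same values as my_vector above, written out so the block runs on its own
my_vector <- c(6, -11, 10, 0, 20)

# Positional: the second value is matched to the second argument, decreasing
by_position <- sort(my_vector, TRUE)

# Named: the order no longer matters, and the intent is explicit
by_name <- sort(decreasing = TRUE, x = my_vector)

identical(by_position, by_name) # TRUE: both calls give the same result
```

The named version is longer, but a reader does not need to remember what the second argument of sort() is.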
A useful function to create vectors in sequence is seq(). Notice its arguments:
seq(from = 30, to = 100, by = 5)
[1] 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
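As a further sketch of how arguments change a function's behavior, seq() also accepts a negative step in by, and a length.out argument that fixes the number of elements instead of the step size. The names countdown and five_points are only illustrative:

```r
# A negative step counts downwards
countdown <- seq(from = 100, to = 30, by = -10)
countdown

# length.out asks for a fixed number of evenly spaced values
five_points <- seq(from = 0, to = 1, length.out = 5)
five_points
```

The first call gives 100, 90, and so on down to 30; the second gives 0, 0.25, 0.5, 0.75, and 1.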
To check the arguments of a function, you can examine its help file: look the function up in the “Help” panel in RStudio or use a command like the following: ?sort.
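If you only need a quick reminder of a function's arguments rather than the full help file, the base R function args() is one option. This is a minimal sketch, and args() is not used elsewhere in this tutorial:

```r
# Print the argument list of sort(), including default values
args(sort)

# The argument names alone, as a character vector
names(formals(sort))
```

The output shows that decreasing defaults to FALSE, which matches the default increasing order we saw earlier.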
Examine the help file of the log() function. How can we compute the base-10 logarithm of my_object? Your code:
Other than numeric vectors, character vectors are also useful:
my_character_vector <- c("Apple", "Orange", "Watermelon", "Banana")
my_character_vector[3]
[1] "Watermelon"
nchar(my_character_vector) # count number of characters
[1] 5 6 10 6
Another useful object type is the data frame. Data frames can store multiple vectors in a tabular format. We can manually create one with the data.frame() function:
my_data_frame <- data.frame(fruit = my_character_vector,
                            calories_per_100g = c(52, 47, 30, 89),
                            water_per_100g = c(85.6, 86.8, 91.4, 74.9))
my_data_frame
fruit calories_per_100g water_per_100g
1 Apple 52 85.6
2 Orange 47 86.8
3 Watermelon 30 91.4
4 Banana 89 74.9
Now we have a little 4x3 data frame of fruits with their calorie counts and water composition. We gathered the nutritional information from the USDA (2019).
We can use the data_frame$column construct to access the vectors within the data frame:
mean(my_data_frame$calories_per_100g)
[1] 54.5
Obtain the maximum value of water content per 100g in the data. Your code:
Some useful commands to learn attributes of our data frame:
dim(my_data_frame)
[1] 4 3
nrow(my_data_frame)
[1] 4
names(my_data_frame) # column names
[1] "fruit" "calories_per_100g" "water_per_100g"
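Two more base R commands in the same spirit are ncol() and str(). They are not covered in the original list, so treat this as an optional sketch. The data frame is recreated here so that the block runs on its own:

```r
# Recreate the fruit data frame from above
my_data_frame <- data.frame(
  fruit = c("Apple", "Orange", "Watermelon", "Banana"),
  calories_per_100g = c(52, 47, 30, 89),
  water_per_100g = c(85.6, 86.8, 91.4, 74.9)
)

ncol(my_data_frame) # number of columns, the counterpart of nrow()

str(my_data_frame) # compact overview: each column's type and first values
```

str() is handy for a first look at a data frame because it shows the type of every column at once.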
We will learn much more about data frames in our next module on data analysis.
After talking about vectors and data frames, the last object type that we will cover is the list. Lists are super flexible objects that can contain just about anything:
my_list <- list(my_object, my_vector, my_data_frame)
my_list
[[1]]
[1] 10
[[2]]
[1] 6 -11 10 0 20
[[3]]
fruit calories_per_100g water_per_100g
1 Apple 52 85.6
2 Orange 47 86.8
3 Watermelon 30 91.4
4 Banana 89 74.9
To retrieve the elements of a list, we need to use double square brackets:
my_list[[1]]
[1] 10
Lists are sometimes useful due to their flexibility, but are much less common in routine data analysis compared to vectors or data frames.
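The reason for the double brackets becomes clearer when we compare them with single brackets. In this small sketch, which uses a shorter list than the one above and the illustrative names single and double, single brackets return a smaller list, while double brackets return the element itself:

```r
# A shorter list, written out so the block runs on its own
my_list <- list(10, c(6, -11, 10, 0, 20))

single <- my_list[1]   # still a list, holding one element
double <- my_list[[1]] # the element itself, here a number

class(single) # "list"
class(double) # "numeric"

double * 2 # arithmetic works on the extracted element
```

Trying single * 2 would fail, because arithmetic is not defined for lists.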
The R community has developed thousands of packages, which are specialized collections of functions, datasets, and other resources. To install one, you should use the install.packages() command. Below we will install the tidyverse package, a suite for data analysis that we will use in the next modules. You just need to install packages once, and then they will be available system-wide.
install.packages("tidyverse") # this can take a couple of minutes
If you want to use an installed package in your script, you must load it with the library() function. Some packages, as shown below, will print descriptive messages once loaded.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Remember that install.packages("package") needs to be executed just once, while library(package) needs to be in each script in which you plan to use the package. In general, never include install.packages("package") as part of your scripts or Quarto documents!
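If you want a script to tell readers when a package is missing without installing anything for them, one possible pattern is to check with the base R function requireNamespace(). This is only a sketch of one approach, not a required step, and the name tidyverse_installed is illustrative:

```r
# TRUE if the package is installed, FALSE otherwise; nothing is installed here
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_installed) {
  library(tidyverse)
} else {
  # Leave the installation to the reader, to be run once in the console
  message("Please run install.packages(\"tidyverse\") once in the console.")
}
```

This keeps install.packages() out of the script itself while still giving a clear hint when the package is not available.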