Matrices are rectangular collections of numbers. In this module we will introduce them, review some basic operations, and then offer a sneak peek of why matrices are useful (and cool).
One number (for example, 12) is referred to as a scalar.
\[ a = 12 \]
We can put several scalars together to make a vector. Here is an example:
\[ \overrightarrow b = \begin{bmatrix} 12 \\ 14 \\ 15 \end{bmatrix} \]
Since this is a column of numbers, we cleverly refer to it as a column vector.
Here is another example of a vector, this time represented as a row vector:
\[ \overrightarrow c = \begin{bmatrix} 12 & 14 & 15 \end{bmatrix} \]
Column vectors are possibly more common and useful, but we sometimes write things down as row vectors to save space.
Vectors are fairly easy to construct in R. As we saw before, we can use the c() function to combine elements:
c(5, 25, -2, 1)
[1] 5 25 -2 1
Remember that the code above does not create any objects. To do so, you’d need to use the assignment operator (<-):
vector_example <- c(5, 25, -2, 1)
vector_example
[1] 5 25 -2 1
We can also create vectors from sequences with the : operator or the seq() function:
10:20
[1] 10 11 12 13 14 15 16 17 18 19 20
seq(from = 3, to = 27, by = 3)
[1] 3 6 9 12 15 18 21 24 27
The summation operator \(\sum\) (i.e., the uppercase Greek letter sigma) lets us perform an operation over a sequence of numbers, which is often but not always a vector. Consider the following vector:
\[\overrightarrow d = \begin{bmatrix} 12 & 7 & -2 & 3 & -1 \end{bmatrix}\]
We can then calculate the sum of the first three elements of the vector, which is expressed as follows: \[\sum_{i=1}^3 d_i\]
Then we do the following math: \[12+7+(-2)=17\]
It is also common to use \(n\) in the superscript to indicate that we want to sum all elements:
\[
\sum_{i=1}^n d_i = 12 + 7 + (-2) + 3 + (-1) = 19
\] We can perform these operations using the sum() function in R:
vector_d <- c(12, 7, -2, 3, -1)
sum(vector_d[1:3])
[1] 17
sum(vector_d)
[1] 19
The product operator \(\prod\) (i.e., the uppercase Greek letter pi) can also perform operations over a sequence of elements in a vector. Recall our previous vector:
\[\overrightarrow d = \begin{bmatrix} 12 & 7 & -2 & 3 & -1 \end{bmatrix}\]
We might want to calculate the product of all its elements, which is expressed as follows: \[\prod_{i=1}^n d_i = 12 \cdot 7 \cdot (-2) \cdot 3 \cdot (-1) = 504\]
In R, we can compute products using the prod() function:
prod(vector_d)
[1] 504
Get the product of the first three elements of vector \(d\). Write the notation by hand and use R to obtain the number.
We can append vectors together to form a matrix:
\[A = \begin{bmatrix} 12 & 14 & 15 \\ 115 & 22 & 127 \\ 193 & 29 & 219 \end{bmatrix}\]
The number of rows and columns of a matrix constitute the dimensions of the matrix. The first number is the number of rows (“r”) and the second number is the number of columns (“c”) in the matrix.
Find a way to remember “r x c” permanently. The order of the dimensions never changes.
Matrix \(A\) above, for example, is a \(3 \times 3\) matrix. Sometimes we’d refer to it as \(A_{3 \times 3}\).
It is common to use capital letters (sometimes bold-faced) to represent matrices. In contrast, vectors are usually represented with either bold lowercase letters or lowercase letters with an arrow on top (e.g., \(\overrightarrow v\)).
There are different ways to create matrices in R. One of the simplest is via rbind() or cbind(), which paste vectors together (either by rows or by columns):
# Create some vectors
vector1 <- 1:4
vector2 <- 5:8
vector3 <- 9:12
vector4 <- 13:16
# Using rbind(), each vector will be a row
rbind_mat <- rbind(vector1, vector2, vector3, vector4)
rbind_mat
[,1] [,2] [,3] [,4]
vector1 1 2 3 4
vector2 5 6 7 8
vector3 9 10 11 12
vector4 13 14 15 16
# Using cbind(), each vector will be a column
cbind_mat <- cbind(vector1, vector2, vector3, vector4)
cbind_mat
vector1 vector2 vector3 vector4
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
An alternative is to use the appropriately named matrix() function. The basic syntax is matrix(data, nrow, ncol, byrow):
data is the input vector which becomes the data elements of the matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical value. If TRUE, the input vector elements are arranged by row. By default (FALSE), elements are arranged by column.
Let’s see some examples:
# Elements are arranged sequentially by row.
M <- matrix(c(1:12), nrow = 4, byrow = TRUE)
M
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[4,] 10 11 12
# Elements are arranged sequentially by column (byrow = F by default).
N <- matrix(c(1:12), nrow = 4)
N
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
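A quick way to confirm a matrix’s dimensions in R (a small aside, not part of the original examples) is with dim(), nrow(), and ncol():
dim(M)   # returns 4 3 (rows first, then columns)
nrow(N)  # 4
ncol(N)  # 3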
How do we refer to specific elements of the matrix? For example, matrix \(A\) is an \(m\times n\) matrix where \(m=n=3\). This is sometimes called a square matrix.
More generally, matrix \(B\) is an \(m\times n\) matrix where the elements look like this: \[B= \begin{bmatrix} b_{11} & b_{12} & b_{13} & \ldots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \ldots & b_{2n} \\ \vdots & \vdots & \vdots & \ldots & \vdots \\ b_{m1} & b_{m2} & b_{m3} & \ldots & b_{mn} \end{bmatrix}\]
Thus \(b_{23}\) refers to the element in the second row and third column. More generally, we refer to row indices as \(i\) and to column indices as \(j\).
In R, we can access a matrix’s elements using square brackets:
# In matrix N, access the element at 1st row and 3rd column.
N[1, 3]
[1] 9
# In matrix N, access the element at 4th row and 2nd column.
N[4, 2]
[1] 8
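We can also extract an entire row or column by leaving the other index blank. A minimal sketch using matrix N from above:
N[1, ]  # first row: 1 5 9
N[, 2]  # second column: 5 6 7 8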
When trying to identify a specific element, the first subscript is the element’s row and the second subscript is the element’s column (always in that order).
In R, indexing is 1-based, meaning that the first element of a vector, matrix, or any other data structure is accessed with index 1. In other programming tools such as Python, indexing is 0-based, meaning that the first element of a list, array, or any other data structure is accessed with index 0.
# Create a 2x2 matrix in R
matrix_A <- matrix(1:4, nrow = 2, ncol = 2)

# Access the element in the first row, first column
element <- matrix_A[1, 1] # This will return 1

# Create a list in Python
vector = [10, 20, 30, 40]

# Access the first element
first_element = vector[0] # This will return 10

import numpy as np

# Create a 2x2 matrix in Python with NumPy
matrix_A = np.array([[1, 2], [3, 4]])

# Access the element in the first row, first column
element = matrix_A[0, 0] # This will return 1
Addition and subtraction are straightforward operations.
Matrices must have exactly the same dimensions for both of these operations.
We add or subtract each element with the corresponding element from the other matrix.
This is expressed as follows:
\[A \pm B=C\]
\[c_{ij}=a_{ij} \pm b_{ij} \text{ }\forall i,j\]
\[\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix} \pm \begin{bmatrix} b_{11} & b_{12} & b_{13}\\ b_{21} & b_{22} & b_{23}\\ b_{31} & b_{32} & b_{33} \end{bmatrix}\] \[=\] \[\begin{bmatrix} a_{11}\pm b_{11} & a_{12}\pm b_{12} & a_{13}\pm b_{13}\\ a_{21}\pm b_{21} & a_{22}\pm b_{22} & a_{23}\pm b_{23}\\ a_{31}\pm b_{31} & a_{32}\pm b_{32} & a_{33}\pm b_{33} \end{bmatrix}\]
We start by creating two 2x3 matrices:
\[A= \begin{bmatrix} 3 & -1 & 2 \\ 9 & 4 & 6 \end{bmatrix}\]
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
matrix1
[,1] [,2] [,3]
[1,] 3 -1 2
[2,] 9 4 6
And \[B= \begin{bmatrix} 5 & 0 & 3 \\ 2 & 9 & 4 \end{bmatrix}\]
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
matrix2
[,1] [,2] [,3]
[1,] 5 0 3
[2,] 2 9 4
We can simply use the + and - operators for addition and subtraction:
matrix1 + matrix2
[,1] [,2] [,3]
[1,] 8 -1 5
[2,] 11 13 10
matrix1 - matrix2
[,1] [,2] [,3]
[1,] -2 -1 -1
[2,] 7 -5 2
(Use code for one of these and do the other one by hand!)
1) Calculate \(A + B\)
\[A= \begin{bmatrix} 1 & 0 \\ -2 & -1 \end{bmatrix}\]
\[B = \begin{bmatrix}
5 & 1 \\
2 & -1
\end{bmatrix}\]
2) Calculate \(A - B\)
\[A= \begin{bmatrix} 6 & -2 & 8 & 12 \\ 4 & 42 & 8 & -6 \end{bmatrix}\]
\[B = \begin{bmatrix} 18 & 42 & 3 & 7 \\ 0 & -42 & 15 & 4 \end{bmatrix}\]
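If you want to check your hand calculations afterward, here is a minimal R sketch for exercise 1 (the object names A and B are our own):
A <- matrix(c(1, -2, 0, -1), nrow = 2)  # filled column by column: A = [1 0; -2 -1]
B <- matrix(c(5, 2, 1, -1), nrow = 2)   # B = [5 1; 2 -1]
A + B                                   # should give [6 1; 0 -2]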
Scalar multiplication is very intuitive. As we know, a scalar is a single number. We multiply each value in the matrix by the scalar to perform this operation.
Formally, this is expressed as follows: \[A = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}\] \[cA = \begin{bmatrix} ca_{11} & ca_{12} & ca_{13}\\ ca_{21} & ca_{22} & ca_{23}\\ ca_{31} & ca_{32} & ca_{33} \end{bmatrix}\]
In R, all we need to do is take an established matrix and multiply it by some scalar:
# matrix1 from our previous example
matrix1
[,1] [,2] [,3]
[1,] 3 -1 2
[2,] 9 4 6
matrix1 * 3
[,1] [,2] [,3]
[1,] 9 -3 6
[2,] 27 12 18
Calculate \(2\times A\) and \(-3 \times B\). Again, do one by hand and the other one using R.
\[A= \begin{bmatrix} 1 & 4 & 8 \\ 0 & -1 & 3 \end{bmatrix}\] \[ B = \begin{bmatrix} -15 & 1 & 5 \\ 2 & -42 & 0 \\ 7 & 1 & 6 \end{bmatrix}\]
Multiplying matrices is slightly trickier than multiplying scalars.
Two matrices must be conformable for them to be multiplied together. This means that the number of columns in the first matrix equals the number of rows in the second.
When multiplying \(A \times B\), if \(A\) is \(m \times n\), \(B\) must have \(n\) rows.
The conformability requirement never changes. Before multiplying anything, check to make sure the matrices are indeed conformable.
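A simple way to check conformability in R before multiplying (a sketch with placeholder matrices of our own):
mat_a <- matrix(1:6, nrow = 2)  # a 2x3 matrix
mat_b <- matrix(1:6, nrow = 3)  # a 3x2 matrix
ncol(mat_a) == nrow(mat_b)      # TRUE, so mat_a %*% mat_b is defined (a 2x2 result)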
Which of the following can we multiply? What will be the dimensions of the resulting matrix? \[\begin{aligned} B_{4 \times 1}= \begin{bmatrix} 2 \\ 3\\ 4\\ 1 \end{bmatrix} M_{3 \times 3} = \begin{bmatrix} 1 & 0 & 2\\ 1 & 2 & 4\\ 2 & 3 & 2 \end{bmatrix} L_{2 \times 3} = \begin{bmatrix} 6 & 5 & -1\\ 1 & 4 & 3 \end{bmatrix} \end{aligned}\]
The only valid multiplication based on the provided matrices is \(L \times M\), which results in a \(2 \times 3\) matrix.
Why can’t we multiply in the opposite order?
The non-commutative property of matrix multiplication is a fundamental aspect of matrix algebra. Matrix multiplication is sensitive to the order in which the matrices are multiplied because of the requirements of dimensional compatibility, the resulting dimensions, and the computation process itself.
When multiplying matrices, order matters. Even if multiplication is possible in both directions, in general \(AB \neq BA\).
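We can see this in R with two small square matrices (an illustrative sketch; these matrices are our own):
P <- matrix(c(1, 3, 2, 4), nrow = 2)  # P = [1 2; 3 4]
S <- matrix(c(0, 1, 1, 0), nrow = 2)  # S = [0 1; 1 0]
P %*% S                               # [2 1; 4 3]
S %*% P                               # [3 4; 1 2], a different result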
The row-by-column procedure is sometimes referred to as a “dot product”: we multiply matching members, then sum them up.
Suppose a company manufactures two kinds of furniture: chairs and sofas.
A chair costs $100 for wood, $270 for cloth, and $130 for feathers.
Each sofa costs $150 for wood, $420 for cloth, and $195 for feathers.
|          | Chair | Sofa |
|----------|-------|------|
| Wood     | 100   | 150  |
| Cloth    | 270   | 420  |
| Feathers | 130   | 195  |
The same information about unit cost (\(C\)) can be presented as a matrix.
\[C = \begin{bmatrix} 100 & 150\\ 270 & 420\\ 130 & 195 \end{bmatrix}\]
Note that each of the three rows of this 3 x 2 matrix represents a material (wood, cloth, or feathers), and each of the two columns represents a product (chair or sofa). The elements are the unit costs (in USD).
Now, suppose that the company will produce 45 chairs and 30 sofas this month. This production quantity can be represented in the following table, and also as a 2 x 1 matrix (\(Q\)):
| Product | Quantity |
|---------|----------|
| Chair   | 45       |
| Sofa    | 30       |
\[Q = \begin{bmatrix} 45 \\ 30 \end{bmatrix}\]
What will be the company’s total cost? The “total expenditure” is equal to the “unit cost” times the “production quantity” (the number of units).
The total expenditure (\(E\)) for each material this month is calculated by multiplying these two matrices.
\[\begin{aligned} E = CQ = \begin{bmatrix} 100 & 150\\ 270 & 420\\ 130 & 195 \end{bmatrix} \begin{bmatrix} 45 \\ 30 \end{bmatrix} = \begin{bmatrix} (100)(45) + (150)(30) \\ (270)(45) + (420)(30) \\ (130)(45) + (195)(30) \end{bmatrix} = \begin{bmatrix} 9,000 \\ 24,750 \\ 11,700 \end{bmatrix} \end{aligned}\]
Multiplying the 3x2 Cost matrix (\(C\)) times the 2x1 Quantity matrix (\(Q\)) yields the 3x1 Expenditure matrix (\(E\)).
As a result of this matrix multiplication, we determine that this month the company will incur expenditures of $9,000 for wood, $24,750 for cloth, and $11,700 for feathers.
Before attempting matrix multiplication, we must make sure the matrices are conformable (as we do for our manual calculations). Then we can multiply our matrices together using the %*% operator.
C <- matrix(c(100, 270, 130, 150, 420, 195), nrow = 3)
C
[,1] [,2]
[1,] 100 150
[2,] 270 420
[3,] 130 195
Q <- matrix(c(45, 30), nrow = 2)
Q
[,1]
[1,] 45
[2,] 30
C %*% Q
[,1]
[1,] 9000
[2,] 24750
[3,] 11700
If you have a missing value or NA in one of the matrices you are trying to multiply (something we will discuss in further detail in the next module), you will have NAs in your resulting matrix.
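Here is a tiny sketch (our own) of how an NA propagates:
M_na <- matrix(c(1, NA, 3, 4), nrow = 2)
M_na %*% diag(2)  # every product that touches the NA becomes NA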
Addition and subtraction:
Associative: \((A \pm B) \pm C = A \pm (B \pm C)\)
Commutative: \(A \pm B = B \pm A\)
Multiplication:
Not commutative (in general): \(AB \neq BA\)
Associative: \(A(BC) = (AB)C\)
Left distributive: \(A(B+C) = AB + AC\)
Right distributive: \((A+B)C = AC + BC\)
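We can spot-check the multiplication properties numerically in R (a quick sketch with arbitrary small matrices of our own):
A <- matrix(1:4, nrow = 2)
B <- matrix(5:8, nrow = 2)
C <- matrix(0:3, nrow = 2)
all(A %*% (B %*% C) == (A %*% B) %*% C)  # associative: TRUE
all(A %*% (B + C) == A %*% B + A %*% C)  # distributive: TRUE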
Square matrix
In a square matrix, the number of rows equals the number of columns (\(m=n\)):
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]
The diagonal of a matrix is the set of elements on the line from the upper-left-hand to the lower-right-hand corner of the matrix, as in \(d(A) = [1, 5, 9]\). Diagonals are particularly useful in square matrices.
The trace of a matrix, denoted as \(tr(A)\), is the sum of the diagonal elements of the matrix. \(tr(A) = 1+5+9 = 15\)
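In R, the diag() function extracts the diagonal, and the trace is simply the sum of that vector:
A <- matrix(1:9, nrow = 3, byrow = TRUE)  # the matrix A shown above
diag(A)       # 1 5 9
sum(diag(A))  # trace: 15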
Diagonal matrix: a square matrix in which all off-diagonal elements are equal to zero.
Scalar matrix: a diagonal matrix whose diagonal elements are all the same number:
\[ S = \begin{bmatrix} 7 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 7 \end{bmatrix} \]
Identity matrix:
\[ I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
The identity matrix is a scalar matrix with all of the diagonal elements equal to one.
Remember that, as with all diagonal matrices, the off-diagonal elements are equal to zero.
The capital letter \(I\) is reserved for the identity matrix. For convenience, a 3x3 identity matrix can be denoted as \(I_3\).
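In R, diag(n) creates an n x n identity matrix, and multiplying it by a scalar gives a scalar matrix:
diag(3)      # the 3x3 identity matrix I_3
7 * diag(3)  # the scalar matrix S shown above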
The transpose is the original matrix with the rows and the columns interchanged.
The notation is either \(J'\) (“J prime”) or \(J^T\) (“J transpose”).
\[J = \begin{bmatrix} 4 & 5\\ 3 & 0\\ 7 & -2 \end{bmatrix}\]
\[J' = J^T = \begin{bmatrix} 4 & 3 & 7 \\ 5 & 0 & -2 \end{bmatrix}\]
In R, we use the t() function to get the transpose.
J <- matrix(c(4, 3, 7, 5, 0, -2), ncol = 2)
J
[,1] [,2]
[1,] 4 5
[2,] 3 0
[3,] 7 -2
t(J)
[,1] [,2] [,3]
[1,] 4 3 7
[2,] 5 0 -2
Just like a number has a reciprocal, a matrix has an inverse.
When we multiply a matrix by its inverse we get the identity matrix (which is like “1” for matrices).
\[A \times A^{-1} = I\]
\[AA^{-1} = A^{-1}A = I\]
For now, don’t worry about calculating the inverse of a matrix manually. This is the type of task we use R for.
We can use the solve() function to calculate the inverse of a matrix:
A <- matrix(c(3, 2, 5, 2, 3, 2, 5, 2, 4), ncol = 3)
A
[,1] [,2] [,3]
[1,] 3 2 5
[2,] 2 3 2
[3,] 5 2 4
solve(A)
[,1] [,2] [,3]
[1,] -0.29629630 -0.07407407 0.4074074
[2,] -0.07407407 0.48148148 -0.1481481
[3,] 0.40740741 -0.14814815 -0.1851852
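We can verify the result: multiplying A by its inverse should return the identity matrix (rounding here to clear floating-point noise):
round(A %*% solve(A), 10)  # prints the 3x3 identity matrix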
A system of equations can be represented by an augmented matrix.
System of equations: \[{\color{red}{3}}x + {\color{green}{6}}y = {\color{blue}{12}}\] \[{\color{red}{5}}x + {\color{green}{10}}y = {\color{blue}{25}}\]
In an augmented matrix, each row represents one equation in the system and each column represents a variable or the constant terms. \[\begin{bmatrix} {\color{red}{3}} & {\color{green}{6}} & {\color{blue}{12}}\\ {\color{red}{5}} & {\color{green}{10}} & {\color{blue}{25}} \end{bmatrix}\]
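As a small illustration (our own construction, not part of the original example), we can build this augmented matrix in R:
# Columns: x coefficients, y coefficients, constants
aug <- matrix(c(3, 5, 6, 10, 12, 25), nrow = 2)
aug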
We can use the logic above to calculate estimates for our ordinary least squares (OLS) models.
OLS is a linear regression technique used to find the best-fitting line for a set of data points (observations) by minimizing the residuals (the differences between the observed and predicted values).
We minimize the sum of the squared errors.
Suppose, for example, we have a sample consisting of \(n\) observations.
The dependent variable is denoted as an \(n \times1\) column vector.
\[Y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix}\]
Suppose there are \(k\) independent variables and a constant term, meaning \(k+1\) columns and \(n\) rows.
We can represent these variables as an \(n \times (k+1)\) matrix, expressed as follows:
\[X= \begin{bmatrix} 1 & x_{11} & \dots & x_{1k} \\ 1 & x_{21} & \dots & x_{2k} \\ \vdots & \vdots & \dots & \vdots \\ 1 & x_{n1} & \dots & x_{nk} \end{bmatrix}\]
Let’s say we have 173 observations (\(n = 173\)) and 2 IVs (\(k = 2\)).
This can be expressed as the following linear equation: \[y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \epsilon\]
In matrix form, we have: \[\begin{aligned} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{173} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ \vdots & \vdots & \vdots \\ 1 & x_{173,1} & x_{173,2} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_{173} \end{bmatrix}\end{aligned} \]
All 173 equations can be represented by: \[y=X\beta+\epsilon\]
The OLS estimator is: \[\hat{\beta} = (X'X)^{-1}X'Y\]
Read aloud, we say “X prime X inverse, X prime Y”.
The little hat on our beta (\(\hat{\beta}\)) signifies that these are estimates.
Remember, the OLS method is to choose \(\hat{\beta}\) such that the sum of squared residuals (“SSR”) is minimized.
We will use the mtcars data set (our favorite) for this example, which contains data about many different car models.
cars_df <- mtcars
Now, we want to estimate the association between hp (horsepower) and wt (weight), our independent variables, and mpg (miles per gallon), our dependent variable.
First, we transform our dependent variable into a matrix, using the as.matrix() function and specifying the column of the mtcars data set to create a column vector of our observed values for the DV.
Y <- as.matrix(cars_df$mpg)
Y
[,1]
[1,] 21.0
[2,] 21.0
[3,] 22.8
[4,] 21.4
[5,] 18.7
[6,] 18.1
[7,] 14.3
[8,] 24.4
[9,] 22.8
[10,] 19.2
[11,] 17.8
[12,] 16.4
[13,] 17.3
[14,] 15.2
[15,] 10.4
[16,] 10.4
[17,] 14.7
[18,] 32.4
[19,] 30.4
[20,] 33.9
[21,] 21.5
[22,] 15.5
[23,] 15.2
[24,] 13.3
[25,] 19.2
[26,] 27.3
[27,] 26.0
[28,] 30.4
[29,] 15.8
[30,] 19.7
[31,] 15.0
[32,] 21.4
# create two separate matrices for IVs
X1 <- as.matrix(cars_df$hp)
X2 <- as.matrix(cars_df$wt)

# create constant column
constant <- rep(1, nrow(cars_df))

# bind them all together into one matrix
X <- cbind(constant, X1, X2)
X
constant
[1,] 1 110 2.620
[2,] 1 110 2.875
[3,] 1 93 2.320
[4,] 1 110 3.215
[5,] 1 175 3.440
[6,] 1 105 3.460
[7,] 1 245 3.570
[8,] 1 62 3.190
[9,] 1 95 3.150
[10,] 1 123 3.440
[11,] 1 123 3.440
[12,] 1 180 4.070
[13,] 1 180 3.730
[14,] 1 180 3.780
[15,] 1 205 5.250
[16,] 1 215 5.424
[17,] 1 230 5.345
[18,] 1 66 2.200
[19,] 1 52 1.615
[20,] 1 65 1.835
[21,] 1 97 2.465
[22,] 1 150 3.520
[23,] 1 150 3.435
[24,] 1 245 3.840
[25,] 1 175 3.845
[26,] 1 66 1.935
[27,] 1 91 2.140
[28,] 1 113 1.513
[29,] 1 264 3.170
[30,] 1 175 2.770
[31,] 1 335 3.570
[32,] 1 109 2.780
Don’t forget to use %*% for matrix multiplication!
# X prime X
XpX <- t(X) %*% X

# X prime X inverse
XpXinv <- solve(XpX)

# X prime Y
XpY <- t(X) %*% Y

# beta coefficient estimates
bhat <- XpXinv %*% XpY
bhat
[,1]
constant 37.22727012
-0.03177295
-3.87783074
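As a sanity check, these estimates match the coefficients that R’s built-in lm() function returns for the same model:
coef(lm(mpg ~ hp + wt, data = mtcars))
# (Intercept)          hp          wt
# 37.22727012 -0.03177295 -3.87783074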