3  Matrices

Matrices are rectangular collections of numbers. In this module we will introduce them and review some basic operators, to then introduce a sneak peek of why matrices are useful (and cool).

3.1 Introduction

3.1.1 Scalars

One number (for example, 12) is referred to as a scalar.

\[ a = 12 \]

3.1.2 Vectors

We can put several scalars together to make a vector. Here is an example:

\[ \overrightarrow b = \begin{bmatrix} 12 \\ 14 \\ 15 \end{bmatrix} \]

Since this is a column of numbers, we cleverly refer to it as a column vector.

Here is another example of a vector, this time represented as a row vector:

\[ \overrightarrow c = \begin{bmatrix} 12 & 14 & 15 \end{bmatrix} \]

Column vectors are possibly more common and useful, but we sometimes write things down using row vectors to

Vectors are fairly easy to construct in R. As we saw before, we can use the c() function to combine elements:

c(5, 25, -2, 1)
[1]  5 25 -2  1
Warning

Remember that the code above does not create any objects. To do so, you’d need to use the assignment operator (<-):

vector_example <- c(5, 25, -2, 1)
vector_example
[1]  5 25 -2  1

Or we can also create vectors from sequences with the : operator or the seq() function:

10:20
 [1] 10 11 12 13 14 15 16 17 18 19 20
seq(from = 3, to = 27, by = 3)
[1]  3  6  9 12 15 18 21 24 27

3.2 Operators

3.2.1 Summation

The summation operator \(\sum\) (i.e., the uppercase Sigma letter) lets us perform an operation on a sequence of numbers, which is often but not always a vector.

\[\overrightarrow d = \begin{bmatrix} 12 & 7 & -2 & 3 & -1 \end{bmatrix}\]

We can then calculate the sum of the first three elements of the vector, which is expressed as follows: \[\sum_{i=1}^3 d_i\]

Then we do the following math: \[12+7+(-2)=17\]

It is also common to use \(n\) in the superscript to indicate that we want to sum all elements:

\[ \sum_{i=1}^n d_i = 12 + 7 + (-2) + 3 + (-1) = 19 \] We can perform these operations using the sum() function in R:

vector_d <- c(12, 7, -2, 3, -1)
sum(vector_d[1:3])
[1] 17
sum(vector_d)
[1] 19

3.2.2 Product

The product operator \(\prod\) (i.e., the uppercase Pi letter) can also perform operations over a sequence of elements in a vector. Recall our previous vector:

\[\overrightarrow d = \begin{bmatrix} 12 & 7 & -2 & 3 & 1 \end{bmatrix}\]

We might want to calculate the product of all its elements, which is expressed as follows: \[\prod_{i=1}^n d_i = 12 \cdot 7 \cdot (-2) \cdot 3 \cdot (-1) = 504\]

In R, we can compute products using the prod() function:

prod(vector_d)
[1] 504
Exercise

Get the product of the first three elements of vector \(d\). Write the notation by hand and use R to obtain the number.

3.3 Matrices

3.3.1 Basics

We can append vectors together to form a matrix:

\[A = \begin{bmatrix} 12 & 14 & 15 \\ 115 & 22 & 127 \\ 193 & 29 & 219 \end{bmatrix}\]

The number of rows and columns of a matrix constitute the dimensions of the matrix. The first number is the number of rows (“r”) and the second number is the number of columns (“c”) in the matrix.

Important

Find a way to remember “r x c” permanently. The order of the dimensions never changes.

Matrix \(A\) above, for example, is a \(3x3\) matrix. Sometimes we’d refer to it as \(A_{3x3}\).

Tip

It is common to use capital letters (sometimes bold-faced) to represent matrices. In contrast, vectors are usually represented with either bold lowercase letters or lowercase letters with an arrow on top (e.g., \(\overrightarrow v\)).

Constructing matrices in R

There are different ways to create matrices in R. One of the simplest is via rbind() or cbind(), which paste vectors together (either by rows or by columns):

# Create some vectors
vector1 <- 1:4
vector2 <- 5:8
vector3 <- 9:12
vector4 <- 13:16
# Using rbind(), each vector will be a row 
rbind_mat <- rbind(vector1, vector2, vector3, vector4)
rbind_mat
        [,1] [,2] [,3] [,4]
vector1    1    2    3    4
vector2    5    6    7    8
vector3    9   10   11   12
vector4   13   14   15   16
# Using cbind(), each vector will be a column
cbind_mat <- cbind(vector1, vector2, vector3, vector4)
cbind_mat
     vector1 vector2 vector3 vector4
[1,]       1       5       9      13
[2,]       2       6      10      14
[3,]       3       7      11      15
[4,]       4       8      12      16

An alternative is to use to properly named matrix() function. The basic syntax is matrix(data, nrow, ncol, byrow):

  • data is the input vector which becomes the data elements of the matrix.
  • nrow is the number of rows to be created.
  • ncol is the number of columns to be created.
  • byrow is a logical clue. If TRUE then the input vector elements are arranged by row. By default (FALSE), elements are arranged by column.

Let’s see some examples:

# Elements are arranged sequentially by row.
M <- matrix(c(1:12), nrow = 4, byrow = T)
M
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
# Elements are arranged sequentially by column (byrow = F by default).
N <- matrix(c(1:12), nrow = 4)
N
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

3.3.2 Structure

How do we refer to specific elements of the matrix? For example, matrix \(A\) is an \(m\times n\) matrix where \(m=n=3\). This is sometimes called a square matrix.

More generally, matrix \(B\) is an \(m\times n\) matrix where the elements look like this: \[B= \begin{bmatrix} b_{11} & b_{12} & b_{13} & \ldots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \ldots & b_{2n} \\ \vdots & \vdots & \vdots & \ldots & \vdots \\ b_{m1} & b_{m2} & b_{m3} & \ldots & b_{mn} \end{bmatrix}\]

Thus \(b_{23}\) refers to the second unit down and third across. More generally, we refer to row indices as \(i\) and to column indices as \(j\).

In R, we can access a matrix’s elements using square brackets:

# In matrix N, access the element at 1st row and 3rd column.
N[1,3]
[1] 9
# In matrix N, access the element at 4th row and 2nd column.
N[4,2]
[1] 8
Tip

When trying to identify a specific element, the first subscript is the element’s row and the second subscript is the element’s column (always in that order).

3.4 Matrix operations

3.4.1 Addition and subtraction

  • Addition and subtraction are straightforward operations.

  • Matrices must have exactly the same dimensions for both of these operations.

  • We add or subtract each element with the corresponding element from the other matrix.

  • This is expressed as follows:

\[A \pm B=C\]

\[c_{ij}=a_{ij} \pm b_{ij} \text{ }\forall i,j\]

\[\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix} \pm \begin{bmatrix} b_{11} & b_{12} & b_{13}\\ b_{21} & b_{22} & b_{23}\\ b_{31} & b_{32} & b_{33} \end{bmatrix}\] \[=\] \[\begin{bmatrix} a_{11}\pm b_{11} & a_{12}\pm b_{12} & a_{13}\pm b_{13}\\ a_{21}\pm b_{21} & a_{22}\pm b_{22} & a_{23}\pm b_{23}\\ a_{31}\pm b_{31} & a_{32}\pm b_{32} & a_{33}\pm b_{33} \end{bmatrix}\]

Addition and subtraction in R

We start by creating two 2x3 matrices:

# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
matrix1
     [,1] [,2] [,3]
[1,]    3   -1    2
[2,]    9    4    6
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
matrix2
     [,1] [,2] [,3]
[1,]    5    0    3
[2,]    2    9    4

We can simply use the + and - operators for addition and substraction:

matrix1 + matrix2
     [,1] [,2] [,3]
[1,]    8   -1    5
[2,]   11   13   10
matrix1 - matrix2
     [,1] [,2] [,3]
[1,]   -2   -1   -1
[2,]    7   -5    2
Exercise

(Use code for one of these and do the other one by hand!)

1) Calculate \(A + B\)

\[A= \begin{bmatrix} 1 & 0 \\ -2 & -1 \end{bmatrix}\]

\[B = \begin{bmatrix} 5 & 1 \\ 2 & -1 \end{bmatrix}\]

2) Calculate \(A - B\)

\[A= \begin{bmatrix} 6 & -2 & 8 & 12 \\ 4 & 42 & 8 & -6 \end{bmatrix}\]

\[B = \begin{bmatrix} 18 & 42 & 3 & 7 \\ 0 & -42 & 15 & 4 \end{bmatrix}\]

3.4.2 Scalar multiplication

Scalar multiplication is very intuitive. As we know, a scalar is a single number. We multiply each value in the matrix by the scalar to perform this operation.

Formally, this is expressed as follows: \[A = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}\] \[cA = \begin{bmatrix} ca_{11} & ca_{12} & ca_{13}\\ ca_{21} & ca_{22} & ca_{23}\\ ca_{31} & ca_{32} & ca_{33} \end{bmatrix}\]

In R, all we need to do is take an established matrix and multiply it by some scalar:

# matrix1 from our previous example
matrix1
     [,1] [,2] [,3]
[1,]    3   -1    2
[2,]    9    4    6
matrix1 * 3
     [,1] [,2] [,3]
[1,]    9   -3    6
[2,]   27   12   18
Exercise

Calculate \(2\times A\) and \(-3 \times B\). Again, do one by hand and the other one using R.

\[A= \begin{bmatrix} 1 & 4 & 8 \\ 0 & -1 & 3 \end{bmatrix}\] \[ B = \begin{bmatrix} -15 & 1 & 5 \\ 2 & -42 & 0 \\ 7 & 1 & 6 \end{bmatrix}\]

3.4.3 Matrix multiplication

  • Multiplying matrices is slightly trickier than multiplying scalars.

  • Two matrices must be conformable for them to be multiplied together. This means that the number of columns in the first matrix equals the number of rows in the second.

  • When multiplying \(A \times B\), if \(A\) is \(m \times n\), \(B\) must have \(n\) rows.

Important

The conformability requirement never changes. Before multiplying anything, check to make sure the matrices are indeed conformable.

  • The resulting matrix will have the same number of rows as the first matrix and the number of columns in the second. For example, if \(A\) is \(i \times k\) and \(B\) is \(k \times j\), then \(A \times B\) will be \(i \times j\).

Which of the following can we multiply? What will be the dimensions of the resulting matrix? \[\begin{aligned} B= \begin{bmatrix} 2 \\ 3\\ 4\\ 1 \end{bmatrix} M = \begin{bmatrix} 1 & 0 & 2\\ 1 & 2 & 4\\ 2 & 3 & 2 \end{bmatrix} L = \begin{bmatrix} 6 & 5 & -1\\ 1 & 4 & 3 \end{bmatrix} \end{aligned}\]

Why can’t we multiply in the opposite order?

Warning

When multiplying matrices, order matters. Even if multiplication is possible in both directions, in general \(AB \neq BA\).

Multiplication steps

  • Multiply each row by each column, summing up each pair of multiplied terms.
Tip

This is sometimes to referred to as the “dot product,” where we multiply matching members, then sum up.

  • The element in position \(ij\) is the sum of the products of elements in the \(i\)th row of the first matrix (\(A\)) and the corresponding elements in the \(j\)th column of the second matrix (\(B\)). \[c_{ij}=\sum_{k=1}^n a_{ik}b_{kj}\]

Example

Suppose a company manufactures two kinds of furniture: chairs and sofas.

  • A chair costs $100 for wood, $270 for cloth, and $130 for feathers.

  • Each sofa costs $150 for wood, $420 for cloth, and $195 for feathers.

Chair Sofa
Wood 100 150
Cloth 270 420
Feathers 130 195

The same information about unit cost (\(C\)) can be presented as a matrix.

\[C = \begin{bmatrix} 100 & 150\\ 270 & 420\\ 130 & 195 \end{bmatrix}\]

Note that each of the three rows of this 3 x 2 matrix represents a material (wood, cloth, or feathers), and each of the two columns represents a product (chair or coach). The elements are the unit cost (in USD).


Now, suppose that the company will produce 45 chairs and 30 sofas this month. This production quantity can be represented in the following table, and also as a 2 x 1 matrix (\(Q\)):

Product Quantity
Chair 45
Sofa 30

\[Q = \begin{bmatrix} 45 \\ 30 \end{bmatrix}\]

What will be the company’s total cost? The “total expenditure” is equal to the “unit cost” times the “production quantity” (the number of units).

The total expenditure (\(E\)) for each material this month is calculated by multiplying these two matrices.

\[\begin{aligned} E = CQ = \begin{bmatrix} 100 & 150\\ 270 & 420\\ 130 & 195 \end{bmatrix} \begin{bmatrix} 45 \\ 30 \end{bmatrix} = \begin{bmatrix} (100)(45) + (150)(30) \\ (270)(45) + (420)(30) \\ (130)(45) + (195)(30) \end{bmatrix} = \begin{bmatrix} 9,000 \\ 24,750 \\ 11,700 \end{bmatrix} \end{aligned}\]

Multiplying the 3x2 Cost matrix (\(C\)) times the 2x1 Quantity matrix (\(Q\)) yields the 3x1 Expenditure matrix (\(E\)).

As a result of this matrix multiplication, we determine that this month the company will incur expenditures of:

  • $9,000 for wood
  • $24,750 for cloth
  • $11,700 for feathers.

Matrix multiplication in R

Before attempting matrix multiplication, we must make sure the matrices are conformable (as we do for our manual calculations).

Then we can multiply our matrices together using the %*% operator.

C <- matrix(c(100, 270, 130, 150, 420, 195), nrow = 3)
C
     [,1] [,2]
[1,]  100  150
[2,]  270  420
[3,]  130  195
Q <- matrix(c(45, 30), nrow = 2)
Q
     [,1]
[1,]   45
[2,]   30
C %*% Q
      [,1]
[1,]  9000
[2,] 24750
[3,] 11700
Warning

If you have a missing value or NA in one of the matrices you are trying to multiply (something we will discuss in further detail in the next module), you will have NAs in your resulting matrix.

3.4.4 Properties of operations

  • Addition and subtraction:

    • Associative: \((A \pm B) \pm C = A \pm (B \pm C)\)

    • Communicative: \(A \pm B = B \pm A\)

  • Multiplication:

    • \(AB \neq BA\)

    • \(A(BC) = (AB)C\)

    • \(A(B+C) = AB + AC\)

    • \((A+B)C = AC + BC\)

3.5 Special matrices

Square matrix

  • In a square matrix, the number of rows equals the number of columns (\(m=n\)):

  • The diagonal of a matrix is a set of numbers consisting of the elements on the line from the upper-left-hand to the lower-right-hand corner of the matrix. Diagonals are particularly useful in square matrices.

  • The trace of a matrix, denoted as \(tr(A)\), is the sum of the diagonal elements of the matrix.

Diagonal matrix:

  • In a diagonal matrix, all of the elements of the matrix that are not on the diagonal are equal to zero.

Scalar matrix:

  • A scalar matrix is a diagonal matrix where the diagonal elements are all equal to each other. In other words, we’re really only concerned with one scalar (or element) held in the diagonal.

Identity matrix:

  • The identity matrix is a scalar matrix with all of the diagonal elements equal to one.

  • Remember that, as with all diagonal matrices, the off-diagonal elements are equal to zero.

  • The capital letter \(I\) is reserved for the identity matrix. For convenience, a 3x3 identity matrix can be denoted as \(I_3\).

3.6 Transpose

The transpose is the original matrix with the rows and the columns interchanged.

The notation is either \(J'\) (“J prime”) or \(J^T\) (“J transpose”).

\[J = \begin{bmatrix} 4 & 5\\ 3 & 0\\ 7 & -2 \end{bmatrix}\]

\[J' = J^T = \begin{bmatrix} 4 & 3 & 7 \\ 5 & 0 & -2 \end{bmatrix}\]

In R, we use t() to get the transpose.

J <- matrix(c(4, 3, 7, 5, 0, -2), ncol = 2)
J
     [,1] [,2]
[1,]    4    5
[2,]    3    0
[3,]    7   -2
t(J)
     [,1] [,2] [,3]
[1,]    4    3    7
[2,]    5    0   -2

3.7 Inverse

  • Just like a number has a reciprocal, a matrix has an inverse.

  • When we multiply a matrix by its inverse we get the identity matrix (which is like “1” for matrices).

\[A × A^{-1} = I\]

  • The inverse of A is \(A^{-1}\) only when:

\[AA^{-1} = A^{-1}A = I\]

  • Sometimes there is no inverse at all.
Note

For now, don’t worry about calculating the inverse of a matrix manually. This is the type of task we use R for.

  • In R, we use the solve() function to calculate the inverse of a matrix:
A <- matrix(c(3, 2, 5, 2, 3, 2, 5, 2, 4), ncol = 3)
A
     [,1] [,2] [,3]
[1,]    3    2    5
[2,]    2    3    2
[3,]    5    2    4
solve(A)
            [,1]        [,2]       [,3]
[1,] -0.29629630 -0.07407407  0.4074074
[2,] -0.07407407  0.48148148 -0.1481481
[3,]  0.40740741 -0.14814815 -0.1851852

3.8 Linear systems and matrices

  • A system of equations can be represented by an augmented matrix.

  • System of equations: \[{\color{red}{3}}x + {\color{green}{6}}y = {\color{blue}{12}}\] \[{\color{red}{5}}x + {\color{green}{10}}y = {\color{blue}{25}}\]

  • In an augmented matrix, each row represents one equation in the system and each column represents a variable or the constant terms. \[\begin{bmatrix} {\color{red}{3}} & {\color{green}{6}} & {\color{blue}{12}}\\ {\color{red}{5}} & {\color{green}{10}} & {\color{blue}{25}} \end{bmatrix}\]

3.9 OLS and matrices

  • We can use the logic above to calculate estimates for our ordinary least squares (OLS) models.

  • OLS is a linear regression technique used to find the best-fitting line for a set of data points (observations) by minimizing the residuals (the differences between the observed and predicted values).

  • We minimize the sum of the squared errors.

3.9.1 Dependent variable

  • Suppose, for example, we have a sample consisting of \(n\) observations.

  • The dependent variable is denoted as an \(n \times1\) column vector.

\[Y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix}\]

3.9.2 Independent variables

  • Suppose there are \(k\) independent variables and a constant term, meaning \(k+1\) columns and \(n\) rows.

  • We can represent these variables as an \(n \times (k+1)\) matrix, expressed as follows:

\[X= \begin{bmatrix} 1 & x_{11} & \dots & x_{1k} \\ 1 & x_{21} & \dots & x_{2k} \\ \vdots & \vdots & \dots & \vdots \\ 1 & x_{n1} & \dots & x_{nk} \end{bmatrix}\]

  • \(x_{ij}\) is the \(i\)-th observation of the \(j\)-th independent variable.

3.9.3 Linear regression model

  • Let’s say we have 173 observations (n = 173) and 2 IVs (k = 3).

  • This can be expressed as the following linear equation: \[y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \epsilon\]

  • In matrix form, we have: \[\begin{aligned} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{21} \\ 1 & x_{21} & x_{22} \\ \vdots & \vdots & \vdots \\ 1 & x_{1173} & x_{2173} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_{173} \end{bmatrix}\end{aligned} \]

  • All 173 equations can be represented by: \[y=X\beta+\epsilon\]

3.9.4 Estimates

  • Without getting too much into the mechanics, we can calculate our coefficient estimates with matrix algebra using the following equation:

\[\hat{\beta} = (X'X)^{-1}X'Y\]

  • Read aloud, we say “X prime X inverse, X prime Y”.

  • The little hat on our beta (\(\hat{\beta}\)) signifies that these are estimates.

  • Remember, the OLS method is to choose \(\hat{\beta}\) such that the sum of squared residuals (“SSR”) is minimized.

3.9.4.1 Example in R

  • We will load the mtcars data set (our favorite) for this example, which contains data about many different car models.
cars_df <- mtcars
  • Now, we want to estimate the association between hp (horsepower) and wt (weight), our independent variables, and mpg (miles per gallon), our dependent variable.

  • First, we transform our dependent variable into a matrix, using the as.matrix function and specifying the column of the mtcars data set to create a column vector of our observed values for the DV.

Y <- as.matrix(cars_df$mpg)
Y
      [,1]
 [1,] 21.0
 [2,] 21.0
 [3,] 22.8
 [4,] 21.4
 [5,] 18.7
 [6,] 18.1
 [7,] 14.3
 [8,] 24.4
 [9,] 22.8
[10,] 19.2
[11,] 17.8
[12,] 16.4
[13,] 17.3
[14,] 15.2
[15,] 10.4
[16,] 10.4
[17,] 14.7
[18,] 32.4
[19,] 30.4
[20,] 33.9
[21,] 21.5
[22,] 15.5
[23,] 15.2
[24,] 13.3
[25,] 19.2
[26,] 27.3
[27,] 26.0
[28,] 30.4
[29,] 15.8
[30,] 19.7
[31,] 15.0
[32,] 21.4
  • Next, we do the same thing for our independent variables of interest, and our constant.
# create two separate matrices for IVs
X1 <- as.matrix(cars_df$hp)
X2 <- as.matrix(cars_df$wt)

# create constant column

# bind them altogether into one matrix
constant <-  rep(1, nrow(cars_df))
X <- cbind(constant, X1, X2)
X
      constant          
 [1,]        1 110 2.620
 [2,]        1 110 2.875
 [3,]        1  93 2.320
 [4,]        1 110 3.215
 [5,]        1 175 3.440
 [6,]        1 105 3.460
 [7,]        1 245 3.570
 [8,]        1  62 3.190
 [9,]        1  95 3.150
[10,]        1 123 3.440
[11,]        1 123 3.440
[12,]        1 180 4.070
[13,]        1 180 3.730
[14,]        1 180 3.780
[15,]        1 205 5.250
[16,]        1 215 5.424
[17,]        1 230 5.345
[18,]        1  66 2.200
[19,]        1  52 1.615
[20,]        1  65 1.835
[21,]        1  97 2.465
[22,]        1 150 3.520
[23,]        1 150 3.435
[24,]        1 245 3.840
[25,]        1 175 3.845
[26,]        1  66 1.935
[27,]        1  91 2.140
[28,]        1 113 1.513
[29,]        1 264 3.170
[30,]        1 175 2.770
[31,]        1 335 3.570
[32,]        1 109 2.780
  • Next, we calculate \(X'X\), \(X'Y\), and \((X'X)^{-1}\).

Don’t forget to use %*% for matrix multiplication!

# X prime X
XpX <- t(X) %*% X

# X prime X inverse
XpXinv <- solve(XpX)

# X prime Y
XpY <- t(X) %*% Y

# beta coefficient estimates
bhat <- XpXinv %*% XpY
bhat
                [,1]
constant 37.22727012
         -0.03177295
         -3.87783074