6  Calculus

In this section we’ll focus on three big ideas from calculus: derivatives, optimization, and integrals.

6.1 Derivatives

Derivatives are about (instantaneous) rate of change.

“In the fall of 1972 President Nixon announced that the rate of increase of inflation was decreasing. This was the first time a sitting president used the third derivative to advance his case for reelection” (Rossi 1996)

Let’s dissect what Nixon might have said:

Inflation’s [first derivative, of prices] rate of increase [second derivative] is going down [third derivative].

A more graphical way to think about a derivatives is as a slope. Let’s consider a linear function of the form \(y = 2 x\):

library(tidyverse) # could also just do library(ggplot2)
ggplot() +
  stat_function(fun = function(x){2 * x}, 
                xlim = c(-10, 10))

We can imagine any political variables in the x- and y-axes. What is the rate of change? In other words, what is the derivative? Remember that we can calculate the slope with:

\[ m = \frac{f(x_2) - f(x_1)}{x_2-x_1} \]

Now consider another slightly more complicated function, a quadratic one, \(y = x^2\):

ggplot() +
  stat_function(fun = function(x){x ^ 2}, 
                xlim = c(-10, 10))

What happens when we apply our slope function?

Exercise
  1. Use the slope formula to calculate the rate of change between 5 and 6.

  2. Use the slope formula to calculate the rate of change between 5 and 5.5.

  3. Use the slope formula to calculate the rate of change between 5 and 5.1.

Takeaway: here the derivative depends on the value of \(x\). It is actually \(2x\).

Differential calculus is about finding these derivatives in a more straightforward manner! We can generalize our slope formula as follows:

\[ m = \frac{f(x_1+ \Delta x) - f(x_1)}{\Delta x} \]

The point is that when \(\Delta x\) is arbitrarily small, we’ll get our rate of change. Formalizing this:

\[ \lim_{\Delta x\to0} \frac{f(x_1+ \Delta x) - f(x_1)}{\Delta x} = \frac{d}{dx} f(x) = \frac{dy}{dx} = f'(x) \]

A few points on notation:

  • \(\frac{d}{dx} f(x)\) is read “The derivative of \(f\) of \(x\) with respect to \(x\).”

    • The variable with respect to which we’re differentiating is the one that appears in the bottom (in the case above, this is \(x\)).
    Warning

    While the above looks like a fraction, it’s really not. Do not try to cancel out the \(d\)s!

  • \(f'(x)\) (read: “\(f\) prime \(x\)”) is the derivative of \(f(x)\). This is a more compact form to refer to derivatives when you have defined \(f(x)\) elsewhere.

6.1.1 Rules of differentiation

How to compute derivatives? Sometimes you can try a bunch of numbers and get at the answer. Sometimes you can use the limit-based formula above, if you know a few properties of limits. But in most cases you will either use software (more on this later) or the rules of differentiation, which we will cover now.

Constant rule: \((c)' = 0\).

There is no change in a constant:

ggplot() +
  stat_function(fun = function(x){2}, xlim = c(-10, 10))

Coefficient rule: \((c \cdot f(x))' = c \cdot f'(x)\).

ggplot() +
  stat_function(fun = function(x){2 * x}, xlim = c(-10, 10), aes(color = "y = 2x")) +
  stat_function(fun = function(x){4 * x}, xlim = c(-10, 10), aes(color = "y = 4x")) +
  scale_color_manual("Function", values = c("red", "blue"))

Sum/difference rule: \((f(x) \pm g(x))' = f'(x) \pm g'(x)\).

The two rules above give us that the derivative is a linear operator.

Power rule: \((x^n)'=nx^{(n-1)}\)

Remember when we wanted to calculate the derivative of \(y=x^2\) above? We can use the power rule, with \(n=2\): \(\;nx^{(n-1)} = 2x^{(2-1)}=2x\). Let’s try out \(\frac{d}{dx}4x^3\) and \(\frac{d}{dx}(x^2 + 2x)\) on the board.

Exercise

Use the differentiation rules we have covered so far to calculate the derivatives of \(y\) with respect to \(x\) of the following functions:

  1. \(y = 2x^2 + 10\)
  2. \(y = 5x^4 - \frac{2}{3}x^3\)
  3. \(y = 9 \sqrt x\)
  4. \(y = \frac{4}{x^2}\)
  5. \(y = ax^3 + b\), where \(a\) and \(b\) are constants.
  6. \(y = \frac{2w}{5}\)

Exponent and logarithm rules:

\[ \begin{aligned} (c^x)' &= c^x \cdot ln(c), & \forall x>0 \\ (e^x)' &= e^x \\ \\ (log_a(x))' &= \frac{1}{x \cdot ln(a)}, & \forall x>0 \\ (ln(x))' &= \frac{1}{x}, & \forall x>0 \end{aligned} \]

We saw previously how Euler’s number (\(e\)) arises from compound interest. The properties above make it very useful in a lot of calculus applications!

Exercise

Compute the following:

  1. \(\frac{d}{dx}(10e^x)\)
  2. \(\frac{d}{dx}(ln(x) - \frac{e^2}{3})\)

Now we’ll get to a couple of more advanced (and powerful) rules.

Product rule: \((f(x)g(x))'=f'(x)g(x) + g'(x)f(x)\)

Let’s calculate \(\frac{d}{dx}(3 \cdot ln(x) \cdot x^2)\) on the board.

Quotient rule: \(\displaystyle(\frac{f(x)}{g(x)})' = \frac{f'(x)g(x) + g'(x)f(x)}{[g(x)]^2}\)

Chain rule: \((f(g(x))' = f'(g(x)) \cdot g'(x)\)

Let’s compute \(\frac{d}{dx}(e^{x^{2}})\) on the board.

Exercise

Use the differentiation rules we have covered so far to calculate the derivatives of \(y\) with respect to \(x\) of the following functions:

  1. \(x^3 \cdot x\)
  2. \(e^x \cdot x^2\)
  3. \((3x^4-8)^2\)

6.1.2 Higher-order derivatives

We saw how politicians can refer to higher-order derivatives. To compute them, you simply “pass the outputs,” starting from the lowest order and going up.

The second derivative tells us whether the slope of a function is increasing, decreasing, or staying the same at any point \(x\) on the function’s domain. For example, when driving a car:

  • \(f(x)\) = distance traveled at time \(x\)
  • \(f'(x)\) = speed at time \(x\)
  • \(f''(x)\) = acceleration at time \(x\)

Let’s compute the following second derivative:

\[f''(x^4) = \frac{d^2(x^4)}{dx^2}\]

  • First, we take the first derivative: \(f'(x^4)=4x^3\)

  • Then we use that output to take the second derivative: \(f''(x^4)=f'(4x^3)=12x^2\)

  • We can keep going… for example, the third derivative: \[f'''(x^4) = f'(12x^2) = 24x\]

Exercise

Compute the following:

  1. \(\frac{d^3}{dx^3}(x^5)\)
  2. \(f''(4x^{3/2})\)
  3. \(f''(4 \cdot ln(x))\)

6.1.3 Partial derivatives

For a function \(f(x,z)\), we might want to know how the function changes with respect to \(x\). We call this a partial derivative:

\[ \frac{\partial}{\partial_x}f(x,z) = \frac{\partial_y}{\partial_x} = \partial_x f \]

To obtain a partial derivative, we treat all other variables as constants and take the derivative with respect to the variable of interest (here \(x\)). For example:

\[ \begin{aligned} y = f(x,z) &= xz \\ \frac{\partial_y}{\partial_x} &= z \end{aligned} \]

What is \(\displaystyle\frac{\partial_y}{\partial_z}?\)

Let’s solve \(\displaystyle\frac{\partial (x^2y+xy^2-x)}{\partial x}\) and \(\displaystyle\frac{\partial (x^2y+xy^2-x)}{\partial y}\) on the board.

Example

Let’s say that \(y\) is how much I like a movie, \(d\) is how many dogs a movie has, and \(e\) is how many explosions a movie has. I claim that how much I like a movie can be expressed by a function of the type \(y = f(d, e)\). Evaluate the following situations:

  1. I like dogs and I don’t care about action. So I believe that the true relationship is \(y = f(d, e) = 3 \cdot d\). What is \(\displaystyle\frac{\partial_y}{\partial_d}\), and how can we interpret it?

  2. I like dogs and I like action. So I believe that the true relationship is \(y = f(d, e) = 3 \cdot d + 1 \cdot e\). What is \(\displaystyle\frac{\partial_y}{\partial_d}\), and how can we interpret it?

  3. I like dogs and I like action. But I definitely don’t like them together—I don’t want the dogs to be in danger! So I believe that the true relationship is \(y = f(d, e) = 3 \cdot d + 1 \cdot e -10 \cdot d \cdot e\). What is \(\displaystyle\frac{\partial_y}{\partial_d}\), and how can we interpret it?

Exercise

Take the partial derivative with respect to \(x\) and with respect to \(z\) of the following functions. What would the notation for each look like?

  1. \(y = 3xz - x\)
  2. \(x^3+z^3+x^4z^4\)
  3. \(e^{xz}\)

6.1.4 Differentiability of functions

Not all functions are differentiable at every point of their domains!

An important concept here is whether functions are continuous at a point:

  • Informally: A function is continuous at a point if its graph has no holes or breaks at that point

  • Formally: A function is continuous at a point \(a\) if: \(\lim_{x \to a} f(x)=f(a)\)

When is a function differentiable at a point?

  • If a function is differentiable at a point, it is also continuous at that point.

  • If a function is continuous at a point, it is not necessarily differentiable at that point.

    • Impossible to calculate derivative at sharp turns, cusps, or vertical tangents.
ggplot() +
  stat_function(fun = function(x){abs(x) + 2}, xlim = c(-4, 4), 
                aes(color = "y = |x| + 2")) +
  stat_function(fun = function(x){sqrt(abs(x)) + 1}, xlim = c(-4, 4), 
                aes(color = "y = √(|x|) + 1")) +
  stat_function(fun = function(x){sign(x) * abs(x)^(1 / 3)}, xlim = c(-4, 4), 
                aes(color = "y = ₃√x")) +
  scale_colour_manual("Function", values = c("red", "blue", "black")) +
  labs(title = "Examples of functions that are not differentiable at x=0")

Informally, functions need to be continuous and reasonably smooth to be differentiable.

6.1.5 How do computers calculate derivatives?

In quite a few statistics and machine learning problems, computers need to compute derivatives of arbitrarily complex functions, perhaps millions of times. How do they do it? (see Baydin et al. 2018 for discussion of these three approaches)

  • Symbolic differentiation: automatically combine the rules of differentiation (power rule, product rule, etc.). It is what math solvers use, e.g., WolframAlpha or (presumably) Symbolab.

  • Numerical differentiation: infer the derivative by computing the function at different sample values (like we did with \(y=x^2\) before. This is what, for example, R’s optim() function does behind the scenes.

  • Automatic differentiation: track how every function is constructed from (differentiable) elementary computer operations (e.g., binary arithmetic), and get the result using the chain rule. Implemented in the TensorFlow, PyTorch, and JAX Python libraries, and the ReverseDiff.jl and Zygote.jl Julia packages.

An example of computing the gradient of an esoteric function using Zygote.jl (from its documentation)

6.2 Optimization

Optimization allows us to find the minimum or maximum values (or extrema) a function takes. It has many applications in the social sciences:

  • Formal theory: utility maximization, continuous choices

  • Ordinary Least Squares (OLS): Focuses on minimizing the squared errors between observed data and model-estimated values

  • Maximum Likelihood Estimation (MLE): Focuses on maximizing a likelihood function, given observed values.

6.2.1 Extrema

On extrema: informally, a maximum is just the highest value a function takes, and a minimum is the lowest value.

In some situations, it can be easy to identify extrema intuitively by looking at a graph of the function.

  • Maxima are high points (“peaks”)

  • Minima are low points (“valleys”)

We can use derivatives (rates of change!) to get at extrema.

6.2.2 Critical points and the First-Order Condition

At critical points (or stationary points), the derivative is zero or fails to exist. At these, the function has usually reached a (local) maximum or minimum.

  • At a maximum, the function must be increasing before the point and decreasing after it.

  • At a minimum, the function must be decreasing before the point and increasing after it.

Warning

Local extrema occur at critical points, but not all critical points are extrema. For instance, sometimes the graph is changing between concave and convex (“inflection points”). Or sometimes the function is not differentiable at that point for other reasons.

We can find the local maxima and/or minima of a function by taking the derivative, setting it equal to zero, and solving for \(x\) (or whatever variable). This gives us the First-Order Condition (FOC).

\[FOC: f'(x)=0\]

6.2.3 Second-Order Condition

Notice that after this we only know that there is a critical point. BUT we don’t know if we’ve found a maximum or minimum, or even if we’ve found an extremum.

To determine whether a we are seeing a (local) maximum or minimum, we can use the Second Derivative Test:

  • Start by identifying \(f''(x)\)

  • Substitute in the stationary points \((x^*)\) identified from the FOC.

    • \(f''(x^*) > 0\) we have a local minimum

    • \(f''(x^*) < 0\) we have a local maximum

    • \(f''(x^*) = 0\) we (may) have an inflection point - need to calculate higher-order derivatives (don’t worry about this now)

Collectively these give us the Second-Order Condition (SOC).

Let’s do this procedure and obtain the FOC and SOC for \(\displaystyle y=\frac{1}{2} x^3 + 3 x^2 - 2\) on the board. What do we learn? Compare this with the plot of the function on Desmos.

6.2.4 Local or global extrema?

Now when it comes to knowing whether extrema are local or global:

  • Here we use the Extreme value theorem, which states that if a real-valued function is continuous on a closed and bounded (i.e., finite) interval, the function must have a global minimum and a global minimum on that interval at least once. Importantly, in this situation the global extrema exist, and they are either at the local extrema or at the boundaries (where we cannot even find critical points).

  • So to find the minimum/maximum on some interval, compare the local min/max to the value of the function at the interval’s endpoints. So, e.g., if the interval is \((-\infty, +\infty)\), check the function’s limits as it approaches \(-\infty\) and \(+\infty\).

Let’s try this last step for our example above, \(\displaystyle y=\frac{1}{2} x^3 + 3 x^2 - 2\), to get the global extrema in the entire domain.

Exercise

Identify the global extrema of the function \(\displaystyle \frac{x^3}{3} - \frac{3}{2}x^2 -10x\) in the interval \([-6, 6]\).

6.3 Integrals

Informally, we can think of integrals as the flip side of derivatives.

We can motivate integrals as a way of finding the area under a curve. Sometimes finding the area is easy. What’s the area under the curve between \(x=-1\) and \(x=1\) for this function?

\[ f(x) = \begin{cases} \frac{1}{3} & \text{for } x \in [0, 3] \\ 0 & \text{otherwise} \end{cases} \]

Normally, finding the area under a curve is much harder. But this is basically the question behind integration.

6.3.1 Integrals are about infinitesimals too

Let’s say we have a function \(y = x^2\) And we want to find the area under the curve from \(x=0\) to \(x=1\). How would we do this?

ggplot() +
  # draw main function
  stat_function(fun = function(x){x ^ 2}, xlim = c(-2, 2)) +
  # fill area under the curve between x = 0 and x = 1
  geom_area(mapping = aes(x = 0), stat = "function",
            fun = function(x){x ^ 2}, xlim = c(0, 1), fill = "red")

One way to approximate this area is by drawing narrow rectangles that cover the area in red. Let’s draw this on the board.

Our approximation is rough, but it gets better and better the narrower the rectangles are:

\[ Area = lim_{\Delta x \to 0}\sum_i^n{f(x) \cdot \Delta x} \]

, where \(\Delta x\) is the width of the rectangles and \(n\) is their number.

This is actually one way to define the definite integral, \(\displaystyle\int_a^bf(x)dx\) (also known as the Riemann integral). We’ll learn how to compute these in a few moments.

6.3.2 Indefinite integrals as antiderivatives

The indefinite integral, also known as the antiderivative, \(F(x)\) is the inverse of the function \(f'(x)\). \[F(x)= \displaystyle\int f(x) \text{ } dx\]

This means if you take the derivative of \(F(x)\), you wind up back at \(f(x)\). \[F' = f \text{ or } \displaystyle\frac{dF(x)}{dx} = f(x)\]

For example, what is the antiderivative for a constant function \(f(x) = 1\)? Is there just one? (this example comes from Moore and Siegel, 2013, p. 137).

This process is called anti-differentiation. We can use this concept to help us solve definite integrals!

6.3.3 Solving definite integrals

One way to calculate definite integrals, known as the “fundamental theorem of calculus,” is shown below:

\[\displaystyle\int_{a}^{b} f(x) \text{ } dx = F(b)-F(a) = F(x)\bigg|_{a}^{b}\]

First we determine the antiderivative (indefinite integral) of \(f(x)\) (and represent it \(F(x)\)), substitute the upper limit first and then the lower limit one by one, and subtract the results in order.

Warning

\(C\) in the following definitions and rules is the called the “constant of integration.” We need to add it when we define all antiderivatives (integrals) of a function because the anti-derivative “undoes” the derivative.

Remember that the derivative of any constant is zero. So if we find an integral \(F(x)\) whose derivative is \(f(x)\), adding (or subtracting) any constant will give us another integral \(F(x)+C\) whose derivative is also \(f(x)\).

6.3.4 Rules of integration

Many of the rules of integetration have counterparts in differentiation.

Coefficient rule: \(\displaystyle\int c f(x)\,dx = c \int f(x)\,dx\)

Sum/difference rule: \(\displaystyle\int (f(x) \pm g(x))\,dx = \int f(x)\,dx \pm \int g(x)\,dx\)

Constant rule: \(\displaystyle\int c\,dx = cx + C\)

Power rule: \(\displaystyle\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \qquad \forall n \neq -1\)

Reciprocal rule:\(\displaystyle\int \frac{1}{x}\,dx = \ln(x) + C\)

Exponent and logarithm rules:

\[ \begin{aligned} \displaystyle \int e^x \,dx &= e^x+C \\ \displaystyle \int c^x \,dx &= \frac{c^x}{ln(c)}+C\\ \\ \displaystyle \int ln(x) \,dx &= x \cdot ln(x) - x+C \\ \displaystyle \int log_c(x) \,dx &= \frac{x \cdot log_c(x) - x}{log_c(x)}+C \end{aligned} \]

The final two rules are analog to the product rule and the chain rule:

Integration by parts: \(\displaystyle \int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx\)

Integration by substitution:

\[ \begin{aligned} 1.& \text{ Have }\displaystyle \int f(g(x))g'(x)\,dx \\ 2.& \text{ Set u=g(x)} \\ 3.& \text{ Compute } \int f(u)\,du \\ 4.& \text{ Replace u for g(x)} \end{aligned} \]

Let’s do an example on the board: \(\displaystyle \int e^{x^2} 2x \,dx\).

6.3.5 Solving the problem

Remember our function \(y=x^2\) and our goal of finding the area under the curve from \(x=0\) to \(x=1\). We can describe this problem as \(\displaystyle \int_0^1 x^2 dx\)

Find the indefinite integral, \(F(x)\):

\[ \displaystyle\int x^2 \text{ } dx = \displaystyle\frac{x^3}{3}+C \]

Now we’ll use the fundamental theory of calculus. Evaluate at our lowest and highest points, \(F(0)\) and \(F(1)\):

  • \(F(0) = 0\)

  • \(F(1) = \displaystyle\frac{1}{3}\)

  • Technically \(0 + C\) and \(\displaystyle\frac{1}{3} + C\), but the C’s will fall out in the next step

Calculate \(F(1) - F(0)\) \[\displaystyle\frac{1}{3} - 0 = \displaystyle\frac{1}{3}\]

Exercise

Solve the following indefinite integrals:

  1. \(\int x^2 \text{ } dx\)

  2. \(\int 3x^2\text{ } dx\)

  3. \(\int x\text{ } dx\)

  4. \(\int (3x^2 + 2x - 7\text{ })dx\)

  5. \(\int \dfrac{2}{x}\text{ }dx\)

And solve the following definite integrals:

  1. \(\displaystyle\int_{1}^{7} x^2 \text{ } dx\)

  2. \(\displaystyle\int_{1}^{10} 3x^2 \text{ } dx\)

  3. \(\displaystyle\int_7^7 x\text{ } dx\)

  4. \(\displaystyle\int_{1}^{5} 3x^2 + 2x - 7\text{ }dx\)

  5. \(\int_{1}^{e} \dfrac{2}{x}\text{ }dx\)