Autograd

Question 1: Appreciating autograd

Consider the following function:

\[ f(x) = x^2 + 3x + 2 \]

Also consider the composed function \(g(x) = f(f(f(x)))\).

Calculate the gradient of both functions at the point \(x = 2\).
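If you want to verify your manual derivation, one possible approach is to let torch's autograd do the work. The following is a minimal sketch, assuming the torch package is installed and defining f exactly as in the formula above:

library(torch)

f <- function(x) x^2 + 3 * x + 2

# Gradient of f at x = 2: analytically f'(x) = 2x + 3, so f'(2) = 7
x <- torch_tensor(2, requires_grad = TRUE)
f(x)$backward()
x$grad

# Gradient of g(x) = f(f(f(x))) at x = 2: autograd applies the chain
# rule through all three nested calls automatically
x <- torch_tensor(2, requires_grad = TRUE)
f(f(f(x)))$backward()
x$grad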

Question 2: Approximating functions with gradients

The defining feature of the gradient is that it allows us to approximate the function locally by a linear function.

That is, for some value \(x^*\) and a very small \(\delta\), we have

\[ f(x^* + \delta) \approx f(x^*) + f'(x^*) \cdot \delta \]

Plot the function from earlier as well as the local linear approximation at \(x = 2\) using ggplot2.

Hint

To do so, follow these steps:

  1. Create a sequence with 100 equidistant values between -4 and 4 using torch_linspace().
  2. Create the true function values at these points using the function from exercise 1.
  3. Approximate the function using the formula \(f(x^* + \delta) \approx f(x^*) + f'(x^*) \cdot \delta\).
  4. Create a data.frame with columns x, y_true, y_approx.
  5. Use ggplot2 to plot the function and its linear approximation.
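Putting these steps together, one possible solution sketch (assuming the torch and ggplot2 packages are installed, and reusing f from Question 1) could look as follows:

library(torch)
library(ggplot2)

f <- function(x) x^2 + 3 * x + 2

# Step 3 ingredients: f(x*) and f'(x*) at x* = 2, obtained via autograd
x_star <- torch_tensor(2, requires_grad = TRUE)
y_star <- f(x_star)
y_star$backward()

# Steps 1 and 2: grid of 100 points and the true function values
x <- torch_linspace(-4, 4, steps = 100)
y_true <- f(x)

# Step 3: f(x* + delta) ~ f(x*) + f'(x*) * delta, with delta = x - x*
delta <- x - x_star$detach()
y_approx <- y_star$detach() + x_star$grad * delta

# Steps 4 and 5: assemble a data.frame and plot both curves
df <- data.frame(
  x = as.numeric(x),
  y_true = as.numeric(y_true),
  y_approx = as.numeric(y_approx)
)
ggplot(df, aes(x = x)) +
  geom_line(aes(y = y_true)) +
  geom_line(aes(y = y_approx), linetype = "dashed")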

Question 3: Look ma, I made my own autograd function

In this exercise, we will build our own custom autograd function. While you might rarely need this in practice, it helps you better understand how the autograd system works. There is also a tutorial on this on the torch website.

To construct our own autograd function, we need to define:

  1. The forward pass:
    • How to calculate the output from the input
    • What to save for the backward pass
  2. The backward pass:
    • How to calculate the gradient of the output with respect to the input

The task is to re-create the ReLU activation function, a common activation function in neural networks, defined as:

\[ \text{ReLU}(x) = \max(0, x) \]

Note that strictly speaking, the ReLU function is not differentiable at \(x = 0\) (but a subgradient can be used instead). The derivative/subgradient of the ReLU function is:

\[ \text{ReLU}'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases} \]

In torch, a custom autograd function is constructed with autograd_function(), which accepts the arguments forward and backward, two functions that define the forward and the backward pass. Both take as their first argument a ctx, a communication object used to save information during the forward pass so that the gradient can be computed in the backward pass (e.g., for \(f(x) = x \cdot a\), calculating the gradient of \(f\) with respect to \(a\) requires knowing the input value \(x\)). The return value of the backward pass should be a list of gradients with respect to the inputs. To check whether a gradient for an input is needed (i.e., whether it has requires_grad = TRUE), you can use ctx$needs_input_grad, a named list with a boolean value for each input.

The backward function additionally takes a second argument grad_output, the gradient of the output: e.g., if our function is \(f(x)\) and we calculate the gradient of \(g(x) = h(f(x))\), then grad_output is the derivative of \(h\) with respect to its input, evaluated at \(f(x)\). This is just the chain rule: \(\frac{\partial g}{\partial x} = \frac{\partial h}{\partial f} \cdot \frac{\partial f}{\partial x}\).
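As a small numeric illustration (assuming torch is loaded, with the hypothetical choices \(f(x) = 3x\) and \(h(y) = y^2\)):

# g(x) = h(f(x)) with f(x) = 3 * x and h(y) = y^2.
# The grad_output reaching f's backward is h'(f(x)) = 2 * f(x).
x <- torch_tensor(2, requires_grad = TRUE)
f_x <- 3 * x   # f(2) = 6
g <- f_x^2     # g(2) = 36
g$backward()
x$grad         # chain rule: 2 * f(x) * 3 = 2 * 6 * 3 = 36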

Fill out the missing parts (...) in the code below.

relu <- autograd_function(
  forward = function(ctx, input) {
    mask <- ...
    output <- torch_where(mask, ...)
    ctx$save_for_backward(mask)
    output
  },
  backward = function(ctx, grad_output) {
    grads <- list(input = NULL)
    if (ctx$needs_input_grad$input) {
      mask <- ctx$saved_variables[[1]]
      grads$input <- ...
    }
    grads
  }
)

To check that your implementation works, run the code below with your relu in place of nnf_relu and verify that the results are the same.

x <- torch_tensor(-1, requires_grad = TRUE)
(nnf_relu(x)^2)$backward()
x$grad
#> torch_tensor
#>  0
#> [ CPUFloatType{1} ]
x$grad$zero_()
#> torch_tensor
#>  0
#> [ CPUFloatType{1} ]
x <- torch_tensor(3, requires_grad = TRUE)
(nnf_relu(x)^2)$backward()
x$grad
#> torch_tensor
#>  6
#> [ CPUFloatType{1} ]
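For reference, here is one possible way to fill in the blanks of the skeleton above; treat it as a sketch to compare against your own attempt rather than the only valid solution:

relu <- autograd_function(
  forward = function(ctx, input) {
    # the mask marks positions where the input is positive
    mask <- input > 0
    # keep positive entries, replace all others with zero
    output <- torch_where(mask, input, torch_zeros_like(input))
    ctx$save_for_backward(mask)
    output
  },
  backward = function(ctx, grad_output) {
    grads <- list(input = NULL)
    if (ctx$needs_input_grad$input) {
      mask <- ctx$saved_variables[[1]]
      # chain rule: pass grad_output through where input > 0, zero elsewhere
      grads$input <- grad_output * mask
    }
    grads
  }
)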