Autograd
Question 1: Appreciating autograd
Consider the following function:
\[ f(x) = x^2 + 3x + 2 \]
as well as the function \(g(x) = f(f(f(x)))\).
Calculate the gradient of both functions at the point \(x = 2\).
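While you can work this out by hand, the point of autograd is that torch can do it for you. Below is a minimal sketch of one way to check your answer; the definitions of \(f\) and \(g\) are from the exercise, everything else is one possible approach, not the official solution.

```r
library(torch)

f <- function(x) x^2 + 3 * x + 2
g <- function(x) f(f(f(x)))

# Gradient of f at x = 2: analytically f'(x) = 2x + 3, so f'(2) = 7.
x <- torch_tensor(2, requires_grad = TRUE)
f(x)$backward()
x$grad

# Gradient of g at x = 2, using a fresh tensor so gradients don't accumulate.
# By the chain rule: f'(f(f(2))) * f'(f(2)) * f'(2) = 367 * 27 * 7 = 69363.
x <- torch_tensor(2, requires_grad = TRUE)
g(x)$backward()
x$grad
```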
Question 2: Approximating functions with gradients
The defining feature of the gradient is that it allows us to approximate the function locally by a linear function.
That is, for some value \(x^*\) and a very small \(\delta\), we have
\[ f(x^* + \delta) \approx f(x^*) + f'(x^*) \cdot \delta \]
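For example, for \(f\) from Question 1 we have \(f(2) = 12\) and \(f'(2) = 7\), so \(f(2.01) \approx 12 + 7 \cdot 0.01 = 12.07\), which is close to the true value \(f(2.01) = 12.0701\).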
Plot the function from earlier as well as the local linear approximation at \(x = 2\) using ggplot2.
Hint

To do so, follow these steps:

- Create a sequence of 100 equidistant values between -4 and 4 using torch_linspace().
- Compute the true function values at these points using the function from Question 1.
- Approximate the function using the formula \(f(x^* + \delta) \approx f(x^*) + f'(x^*) \cdot \delta\).
- Create a data.frame with columns x, y_true, and y_approx.
- Use ggplot2 to plot the function and its linear approximation.
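A minimal sketch along these lines, assuming the quadratic \(f\) from Question 1 (the column names follow the hint; the plot styling is just one choice):

```r
library(torch)
library(ggplot2)

f <- function(x) x^2 + 3 * x + 2

# Gradient at x* = 2 via autograd.
x_star <- torch_tensor(2, requires_grad = TRUE)
y_star <- f(x_star)
y_star$backward()
f_val  <- y_star$item()       # f(2)  = 12
f_grad <- x_star$grad$item()  # f'(2) = 7

# 100 equidistant values between -4 and 4.
x <- as.numeric(torch_linspace(-4, 4, steps = 100))

df <- data.frame(
  x = x,
  y_true = f(x),
  # f(x* + delta) ~ f(x*) + f'(x*) * delta, with delta = x - x*
  y_approx = f_val + f_grad * (x - 2)
)

ggplot(df, aes(x = x)) +
  geom_line(aes(y = y_true)) +
  geom_line(aes(y = y_approx), linetype = "dashed")
```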
Question 3: Look ma, I made my own autograd function
In this exercise, we will build our own custom autograd function. While you will rarely need this in practice, it gives you a better understanding of how the autograd system works. There is also a tutorial on this on the torch website.
To construct our own autograd function, we need to define:

- The forward pass:
  - How to calculate the output from the inputs
  - What to save for the backward pass
- The backward pass:
  - How to calculate the gradient of the output with respect to the input
The task is to re-create the ReLU activation function, which is commonly used in neural networks and is defined as:
\[ \text{ReLU}(x) = \max(0, x) \]
Note that strictly speaking, the ReLU function is not differentiable at \(x = 0\) (but a subgradient can be used instead). The derivative/subgradient of the ReLU function is:
\[ \text{ReLU}'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases} \]
In torch, a custom autograd function can be constructed using autograd_function(). It accepts the arguments forward and backward, which are the functions that define the forward and backward pass. Both take as their first argument a ctx, a communication object used to save information during the forward pass so that the gradient can be computed in the backward pass (e.g., for \(f(x) = x \cdot a\), to calculate the gradient of \(f\) with respect to \(a\) we need to know the input value \(x\)). The return value of the backward pass should be a list of gradients with respect to the inputs. To check whether a gradient for an input is needed (i.e., it has requires_grad = TRUE), you can use ctx$needs_input_grad, a named list with a boolean value for each input.
The backward function additionally takes a second argument grad_output, which is the gradient flowing in from whatever is computed on top of the output: e.g., if our function is \(f(x)\) and we calculate the gradient of \(g(x) = h(f(x))\), then grad_output is the derivative of \(h\) with respect to its input, evaluated at \(f(x)\). This is essentially the chain rule: \(\frac{\partial g}{\partial x} = \frac{\partial h}{\partial f} \cdot \frac{\partial f}{\partial x}\).
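To make this pattern concrete before you fill in the ReLU version, here is a minimal sketch for the example mentioned above, \(f(x, a) = x \cdot a\). The name mul is ours and this is an illustration only, not the exercise solution:

```r
library(torch)

# Custom autograd function for f(x, a) = x * a.
mul <- autograd_function(
  forward = function(ctx, x, a) {
    # Each input is needed to compute the gradient of the other one.
    ctx$save_for_backward(x = x, a = a)
    x * a
  },
  backward = function(ctx, grad_output) {
    saved <- ctx$saved_variables
    grads <- list(x = NULL, a = NULL)
    # d(x * a)/dx = a and d(x * a)/da = x, each scaled by
    # grad_output (the chain rule).
    if (ctx$needs_input_grad$x) grads$x <- grad_output * saved$a
    if (ctx$needs_input_grad$a) grads$a <- grad_output * saved$x
    grads
  }
)

x <- torch_tensor(2, requires_grad = TRUE)
a <- torch_tensor(3, requires_grad = TRUE)
mul(x, a)$backward()
x$grad  # 3
a$grad  # 2
```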
Fill out the missing parts (...) in the code below.
```r
relu <- autograd_function(
  forward = function(ctx, input) {
    mask <- ...
    output <- torch_where(mask, ...)
    ctx$save_for_backward(mask)
    output
  },
  backward = function(ctx, grad_output) {
    grads <- list(input = NULL)
    if (ctx$needs_input_grad$input) {
      mask <- ctx$saved_variables[[1]]
      grads$input <- ...
    }
    grads
  }
)
```
To check that it’s working, run the code below (with your relu instead of nnf_relu) and check that the results are the same.
```r
x <- torch_tensor(-1, requires_grad = TRUE)
(nnf_relu(x)^2)$backward()
x$grad
```

```
torch_tensor
 0
[ CPUFloatType{1} ]
```

```r
x$grad$zero_()
```

```
torch_tensor
 0
[ CPUFloatType{1} ]
```

```r
x <- torch_tensor(3, requires_grad = TRUE)
(nnf_relu(x)^2)$backward()
x$grad
```

```
torch_tensor
 6
[ CPUFloatType{1} ]
```
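As a sanity check: for \(x = 3\), the chain rule gives \(\frac{d}{dx} \text{ReLU}(x)^2 = 2 \cdot \text{ReLU}(3) \cdot \text{ReLU}'(3) = 2 \cdot 3 \cdot 1 = 6\), and for \(x = -1\) the gradient is \(0\), matching the outputs above.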