What is Autograd – Everything You Need To Know About PyTorch Autograd

AutoGrad is an automatic differentiation method used to calculate the derivatives of functions. In this article, I will give you a solid understanding of AutoGrad in PyTorch and how to use it in AI models.

What Is AutoGrad?

Derivatives are really important when it comes to optimization algorithms. However, calculating derivatives manually is time-consuming and error-prone, especially when models get more complex.

Thankfully, modern deep learning frameworks have a solution for us called automatic differentiation, or AutoGrad for short. This is a technique used to automatically calculate derivatives of functions. These frameworks create a kind of roadmap called a computational graph that keeps track of how each value depends on others as we process the data through different functions. To calculate derivatives, autograd works its way backward through this graph using the chain rule. This way of applying the chain rule is known as backpropagation.

How to Use AutoGrad?

Deep learning frameworks like PyTorch and TensorFlow offer automatic differentiation functionality (often referred to as autograd) to compute gradients of mathematical functions with respect to input variables. In PyTorch, the autograd package (torch.autograd) provides automatic differentiation, allowing you to define and compute gradients for your models’ parameters efficiently. TensorFlow has a similar feature called “GradientTape”, which provides the same capability.

In PyTorch AutoGrad, there are six steps involved when calculating the derivative of a function.

  1. Create the Tensors: Start by creating tensors with values that you want to use for the function and the gradient.
  2. Enable Autograd: Once the tensors are created, Autograd needs to be enabled on the tensors in order to calculate gradients. This can be done by setting the “requires_grad” attribute to True.
  3. Define the Graph structure: Define which operations (functions) are performed on the tensors. PyTorch keeps track of the operations and their order to build a computational graph. This graph is used later to calculate gradients using the chain rule of calculus.
  4. Forward pass: This involves computing the output of the model for the given input. In simple terms, it produces the output from the tensor values and the defined function.
  5. Backward pass: Once the forward pass is complete, the backward pass can be performed to compute the gradients. This is done by calling the “backward()” method on the output tensor.
  6. Access the Gradient: After calling .backward(), gradients are computed and stored in the .grad attribute of the input tensors. You can access the gradients through .grad to update the model’s parameters during optimization.

For a better understanding, let’s see how to use AutoGrad for different kinds of functions.

AutoGrad in Basic Functions

First of all, let’s see how to calculate the derivative of a function with one variable using AutoGrad.

Let’s say y = x² and x = 3. We can find the derivative of this function manually using the power rule: the derivative of y = x² is 2x, so at x = 3 the derivative is 2 × 3 = 6.

Now let’s do this using autograd in PyTorch:

Throughout this article, I use Python with PyTorch. You can follow our installation guide to set up your PC properly!

The first step is creating a tensor to hold the variable’s value. In our case, we create a tensor (a scalar) called x with a value of 3.

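A minimal sketch of this step is shown below; the print call is only there to display the tensor.

    import torch

    # create a scalar (0-dimensional) tensor holding the value 3
    x = torch.tensor(3.0)
    print(x)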

torch.tensor(3.0) creates a scalar tensor with the value 3. Remember to pass a floating-point value (3.0 instead of 3), because only floating-point and complex tensors can have gradients.

Output:

    tensor(3.)

Then, we have to enable autograd for that tensor by setting the “requires_grad” attribute to True.

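This can look like the following; you could equally pass requires_grad=True to torch.tensor when creating x.

    # turn on gradient tracking for the existing tensor
    x.requires_grad = True

    # no backward pass has run yet, so no gradient is stored
    print(x.grad)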

‘requires_grad=True’ enables automatic differentiation for the tensor x. It tells PyTorch to track every operation involving x in the computational graph during the forward pass, so that gradients can be calculated for those operations during backpropagation. Note that reading ‘x.grad’ immediately after setting ‘x.requires_grad = True’ gives None, because no gradients have been calculated yet. Gradients are only computed during the backward pass, which is triggered by calling ‘y.backward()’.

Output:

    None

The next step is defining the function (the graph structure).

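Continuing with the same tensor, the function could be written like this:

    # forward pass: PyTorch records this operation in the computational graph
    y = x ** 2
    print(y)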

After defining the function, PyTorch immediately computes the output value and stores it in the variable y. This process is called the “forward pass”. Here 3 ** 2 = 9.

Output:

    tensor(9., grad_fn=<PowBackward0>)

After the forward pass, the backward pass is used to compute the gradients. This is done by calling the “backward()” method on the output of the function. Here, the output is held in the variable y.

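Something like the following runs the backward pass and then reads the resulting gradient:

    # backward pass: compute dy/dx and store it in x.grad
    y.backward()

    # the derivative of y = x ** 2 evaluated at x = 3
    print(x.grad)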

Finally, we can read the calculated derivative of the function with respect to the variable we need from the .grad attribute. In our case we have only one variable, x, so we can access the derivative of our function with respect to x using x.grad.


Output:

    tensor(6.)

You can see that PyTorch’s autograd has calculated the derivative of the function correctly.

Let’s calculate the derivative of another function using the same variable value.

If our new function is y = x + 2, we can compute the derivative of that function with respect to the variable x like this:

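Putting that together, a sketch of the new calculation looks like this:

    # reset the gradient accumulated from the previous function
    x.grad.zero_()

    y = x + 2
    y.backward()
    print(x.grad)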

x.grad.zero_() resets the previously accumulated gradient to zero. Without it, PyTorch would add the new gradient to the one calculated for the old function, giving a mixed-up result. (For model parameters, optimizer.zero_grad() or model.zero_grad() does the same job.)

Output:

    tensor(1.)

We got the derivative of the function y = x + 2 with respect to x (at x = 3) as 1, because:

d/dx [x + 2] = d/dx (x) + d/dx (2)

d/dx (x) = 1 · x^(1-1) = 1 · x^0 = 1, and d/dx (2) = 0

d/dx [x + 2] = 1

You can see that x has vanished from the result, so there is nothing for the value 3.0 to be plugged into; the derivative of y = x + 2 is simply 1, whatever x is. To get a good feel for this, compare it with the previous function and its derivative.

Now let’s differentiate the function y = 2xᵀx with respect to the column vector x = [0., 1., 2., 3.]. We know that the derivative of y = 2xᵀx with respect to x is 4x, since the dot product of x with itself behaves like x² for each element of the vector. So the derivative of the function for each element should be [ 0., 4., 8., 12.].

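A sketch of this example is shown below; here torch.arange(4.0) is just a convenient way to build the vector [0., 1., 2., 3.], and torch.dot computes xᵀx.

    # a vector [0., 1., 2., 3.] with gradient tracking enabled
    x = torch.arange(4.0, requires_grad=True)

    y = 2 * torch.dot(x, x)   # y = 2 * (x · x), a scalar
    y.backward()
    print(x.grad)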

Output:

    tensor([ 0.,  4.,  8., 12.])

The only difference here is that we use a vector instead of a scalar.

AutoGrad in Multivariable Functions

Let’s calculate the derivative of a function with multiple variables. Recall that the derivative of a multivariable function with respect to one variable is called a partial derivative, and the collection of the partial derivatives with respect to all variables is called the gradient.

For example, if our function is f = 3x² + 2y and the variable values we use are x = 2 and y = 3, then the partial derivatives are ∂f/∂x = 6x = 12 and ∂f/∂y = 2.

Let’s see how we calculate these partial derivatives using AutoGrad:

Here we also follow the same steps used in previous examples. The only difference is that we have to create two tensors called x and y for this function.

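A sketch of this calculation:

    # one tensor per variable, both tracked by autograd
    x = torch.tensor(2.0, requires_grad=True)
    y = torch.tensor(3.0, requires_grad=True)

    f = 3 * x ** 2 + 2 * y
    f.backward()

    print(x.grad)   # partial derivative of f with respect to x
    print(y.grad)   # partial derivative of f with respect to y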

Here f.backward() computes the gradient of the function. Then x.grad gives the partial derivative of the function with respect to the variable x, and y.grad gives the partial derivative with respect to the variable y.

Output:

    tensor(12.)
    tensor(2.)

Gradient of the function = [partial derivative with respect to x, partial derivative with respect to y]

So, the gradient of the function = [12.0, 2.0]

Backward For Non-Scalar Variables

So far, we have calculated derivatives and gradients only for functions that produce a scalar output. For y = x ** 2 with a scalar x, the output y is a scalar. But what if our function is something like y = x * x, where x is a vector, so the output y is a vector as well?

If you try to calculate the gradient of the function y = x * x using the same method we used above, you will get the error “grad can be implicitly created only for scalar outputs”. It means that backward() can be called without arguments only on a scalar output, so we first have to reduce the vector output y to a scalar. For that, we can use sum() or mean().

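A sketch using sum() as the reducing function:

    # a vector input, so y = x * x is also a vector
    x = torch.arange(4.0, requires_grad=True)

    y = x * x
    y.sum().backward()   # reduce y to a scalar before calling backward()
    print(x.grad)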

The derivative of the function y = x * x is 2x. So, 2 * 0 = 0, 2 * 1 = 2, 2 * 2 = 4, 2 * 3 = 6.

Output:

    tensor([0., 2., 4., 6.])

Detaching

Now imagine a situation where we want to move part of the calculation outside of the gradient computation. For example, with y = x * x and z = y * x, we may want to calculate the gradient of z with respect to x without the effect of the function y.

Let’s say we have the variable x = [0., 1., 2., 3.] and the functions y = x * x and z = y * x. We want to calculate the derivative of z with respect to x, but without the effect of the function y. First, let’s see what happens if we use the same method as in the previous examples.

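A sketch of that attempt, again reducing z with sum():

    x = torch.arange(4.0, requires_grad=True)

    y = x * x
    z = y * x            # z depends on x directly and also through y
    z.sum().backward()
    print(x.grad)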

Output:

    tensor([ 0.,  3., 12., 27.])

If y were treated as a constant, the derivative of z with respect to x would simply be the values of y, i.e. [0., 1., 4., 9.]. Instead, we got [0., 3., 12., 27.], because z = x³ here and the gradient also flows back through y. To solve this, we use .detach() to remove the effect of the function y from the gradient calculation.

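A sketch of the detached version; x is recreated here so the old gradient does not get mixed in.

    x = torch.arange(4.0, requires_grad=True)

    y = x * x
    u = y.detach()       # same values as y, but cut off from the computational graph
    z = u * x
    z.sum().backward()
    print(x.grad)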

u = y.detach() – This creates a new tensor u that shares the same data as y but does not track its gradient. Detaching a tensor essentially removes it from the computation graph, so gradients won’t be calculated for it.

z = u * x – This computes the element-wise product of tensors u and x, assigning the result to the tensor z.

print(x.grad) – This prints the computed gradients of x after the backpropagation step. The gradients represent how much the final output (sum of z) changes with respect to each element of x.

Output:

    tensor([0., 1., 4., 9.])

AutoGrad is a remarkable feature provided by deep learning frameworks for building and training deep learning models effectively and efficiently, so you should know how to use it properly for your task. Throughout this article, we have learned how to apply PyTorch AutoGrad to different kinds of functions and in different kinds of situations. Keep practicing, and try to apply it the next time you build a neural network.
