In this article, I will give you a solid understanding of two calculus concepts, Partial Derivatives and the Chain Rule, explain their importance in AI, and show you how we do these calculations and visualize them in AI.
Why does AI need Calculus?
Calculus plays a big role in the AI field and is used in many parts of an AI model, mainly in optimization algorithms and the backpropagation process. Here are some of the major uses of calculus in AI:
- Optimization: Optimization is the process of repeatedly updating our model's parameters to decrease the loss function. Optimization algorithms determine how we fit our models to training data, and calculus is the key concept behind them.
- Backpropagation: In deep learning, backpropagation computes the gradients of the loss function with respect to the weights and biases of the network. Calculus, through chain-rule derivatives, lets us calculate these gradients efficiently, allowing the network to learn from the data by adjusting its parameters.
- Probability Calculations: Calculus is used to calculate probabilities, integrate probability density functions (PDFs), and compute expectations, enabling AI algorithms to reason about uncertainty and make informed decisions.
- Time-series Analysis: Many AI applications deal with time-series data, such as stock prices, weather data, or sensor readings. Calculus concepts like differentiation and integration are used to analyze the rates of change, trends, and patterns in time-series data.
Overall, calculus provides the mathematical foundation for understanding, developing, and optimizing AI algorithms. It enables AI systems to learn from data, make informed decisions, and solve complex problems, making it a foundational tool in AI.
For a better understanding of the concepts in this article, I recommend reading Calculus Part-1 first.
Partial Derivatives & Gradients
The derivative of a function f(x) at a specific point x is defined as the instantaneous rate of change of the function with respect to its input variable. It provides insight into how quickly the function's output value increases or decreases when the input is altered by an infinitesimally small amount. Mathematically, the derivative f′(x) is given by the limit of the average rate of change as the interval approaches zero:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
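As a quick sanity check of this definition, here is a minimal Python sketch (the example function and the step size h are my own illustrative choices, not from the article) that approximates the derivative with a small finite difference:

```python
def derivative(f, x, h=1e-6):
    """Approximate f'(x) with a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

# f(x) = x**2 has derivative 2x, so f'(3) should be about 6.
print(derivative(lambda x: x ** 2, 3.0))  # ~6.0
```

Note that the sketch uses a central difference rather than the one-sided limit above; both approximate the same quantity, and the central version is numerically more stable.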
Above, we differentiated a function of one variable, x. But when we work in AI, we have to deal with functions of many variables. These kinds of functions are called Multivariate Functions.
The method that we use to calculate the derivative of a multivariate function is called the Partial Derivative. A partial derivative measures the rate of change of the function with respect to one variable while holding all other variables constant. This tells us how the function changes locally in the direction of that variable.
For example, let's say we have a function y = f(x1, x2, …, xn). The partial derivative of the function y with respect to the variable xi is:

$$\frac{\partial y}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h}$$
Here we treat all other variables (except xi) as constants. In this way, we can calculate the partial derivative of the function with respect to each variable. So, if we have n variables, we calculate n partial derivatives and get n results. These n results are called the Gradient of the function, and they are stored in a vector.
Gradient of the function = [partial derivative with respect to x1, partial derivative with respect to x2, …, partial derivative with respect to xn]

We write the gradient of a function as:

$$\nabla f = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right]$$
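To make this concrete, here is a minimal Python sketch (the example function f(x1, x2) = x1² + 3·x2 and the step size h are illustrative assumptions, not from the article) that approximates each partial derivative numerically and collects them into a gradient vector:

```python
def gradient(f, x, h=1e-6):
    """Approximate the gradient of f at point x by nudging one
    coordinate at a time while holding the others constant."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

f = lambda v: v[0] ** 2 + 3 * v[1]   # analytic gradient: [2*x1, 3]
print(gradient(f, [2.0, 5.0]))       # ~[4.0, 3.0]
```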
The derivative of a single-variable function gives the slope of the tangent line at a point on its graph. The gradient of a multivariable function points in the direction of the steepest increase in the function's value, and its magnitude tells us how fast the function increases in that direction. In simpler terms, it shows the way to change the input variables of the function so that the function's value increases the most.
You can think of the gradient as a guide that points uphill, showing you the steepest path to climb on the function’s surface. By following this path, you can adjust the input variables to maximize the function’s output. This concept is crucial in optimization problems, where you’re trying to find the best values for the inputs to achieve a desired outcome.
After backpropagation computes the gradients, we use a technique called Gradient Descent, which involves moving in the direction opposite to the gradient. The negative gradient points in the direction that decreases the loss function most quickly with respect to the model's parameters, such as weights and biases. By repeatedly stepping in this direction, we reduce the difference between the model's predictions and the actual true values.
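To illustrate, here is a minimal gradient descent sketch (the toy loss function, learning rate, and step count are my own assumptions for illustration, not from the article) that repeatedly steps in the direction of the negative gradient:

```python
def gradient_descent(grad_fn, w0, lr=0.1, steps=50):
    """Repeatedly step against the gradient to reduce the loss."""
    w = list(w0)
    for _ in range(steps):
        g = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# Loss L(w1, w2) = (w1 - 3)**2 + (w2 + 1)**2 is minimized at (3, -1).
grad_L = lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)]
print(gradient_descent(grad_L, [0.0, 0.0]))  # approaches [3.0, -1.0]
```

Each step moves the parameters a small amount (scaled by the learning rate lr) against the gradient, so the loss shrinks toward its minimum.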
Chain Rule
In AI models like neural networks, we have to deal with nested functions, which are functions of other functions, like y = f(u) and u = g(x), in other words y = f(g(x)). The Chain Rule is the method we use to differentiate these kinds of functions. In the single-variable case:

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$
If f(u) has many variables u1, u2, …, um, and each ui = gi(x) has many variables x1, x2, …, xn, the chain rule states that:

$$\frac{\partial y}{\partial x_j} = \sum_{i=1}^{m} \frac{\partial y}{\partial u_i} \frac{\partial u_i}{\partial x_j} \quad \text{for } j = 1, \ldots, n,$$

or, in vector form,

$$\nabla_{\mathbf{x}}\, y = \left(\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\right)^{\top} \nabla_{\mathbf{u}}\, y,$$

where ∂u/∂x is a matrix (the Jacobian) that contains the derivatives of the vector u with respect to the vector x.
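Here is a small NumPy sketch of that matrix form, using a toy example of my own (u = g(x) with two outputs and a scalar y = f(u), not from the article); the gradient with respect to x comes out as the Jacobian transpose applied to the gradient with respect to u:

```python
import numpy as np

# Inner function u = g(x): R^2 -> R^2
#   u1 = x1 * x2,  u2 = x1 + x2
# Outer function y = f(u): R^2 -> R
#   y = u1**2 + u2
x = np.array([2.0, 3.0])
u = np.array([x[0] * x[1], x[0] + x[1]])

# Gradient of y with respect to u: [2*u1, 1]
grad_u = np.array([2 * u[0], 1.0])

# Jacobian du/dx: rows are outputs u_i, columns are inputs x_j.
J = np.array([[x[1], x[0]],   # du1/dx1, du1/dx2
              [1.0,  1.0]])   # du2/dx1, du2/dx2

# Chain rule: grad_x y = J^T @ grad_u
grad_x = J.T @ grad_u
print(grad_x)  # [2*u1*x2 + 1, 2*u1*x1 + 1] = [37., 25.]
```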
Through this article, we have learned about calculus's role in AI and the essential calculus concepts of the Gradient and the Chain Rule, which empower AI models to navigate toward optimal solutions and enable systems to break complex computations down into manageable steps. Together, these fundamental calculus concepts provide the guiding principles that allow AI to learn, adapt, and excel across a multitude of tasks.