A neural network, also known as an artificial neural network (ANN), is a key component of new-generation AI. In this article, we will explore the fundamental concepts behind neural networks, their architecture, and the difference between deep learning and neural networks.
What is a neural network?
A neural network is a computational model inspired by the structure and function of the human brain. It is the core concept behind deep learning, the modern approach to AI.
Like the human brain, a neural network has neurons that are interconnected with one another across the layers of the network. These artificial neurons, also known as nodes or units, help artificial neural networks simulate the way neurons in the brain process information.
History of Neural Networks
The concept of neural networks dates back to the 1940s, when researchers including Warren McCulloch and Walter Pitts introduced the idea of mathematical models inspired by biological neurons. In the 1950s, Frank Rosenblatt developed the perceptron, a type of neural network capable of learning from examples. However, limited computational power and a lack of data hindered progress until the resurgence of neural networks in the 1980s and the development of more advanced algorithms and computing capabilities.
In those days, classical machine learning algorithms also showed better results and accuracy than neural networks, so many researchers focused on developing machine learning instead of neural networks.
Why do we need neural networks?
Before neural networks, machine learning algorithms were the only method we could use to approach AI. However, typical machine learning algorithms have significant limitations. Chiefly, they cannot extract features by themselves, so we have to engineer the features ourselves, and they are not good at capturing deep, complex relationships in data. Researchers therefore looked for a new solution, and their attention turned back to the old neural concept. After much research, they were able to reintroduce the neural network concept on a proper modern foundation.
Unlike machine learning algorithms, neural networks can extract features automatically from input data, and thanks to their multi-layer structure, they can extract increasingly deep features and capture more complex patterns from the input data.
These capabilities help us move past the limits of machine learning and create more efficient and accurate AI models than ever before.
How does a neural network work?
Unlike classical machine learning algorithms, neural networks use layers of neurons to identify patterns in data. Each neuron in a network has parameters called weights and a bias, and when data is fed in, each neuron performs mathematical computations using these parameters. The results are then passed on to the neurons in the next layer. In this way, input data flows through all the layers and their neurons, and the last layer of the network produces the final prediction for the input data. If this prediction is not the desired result, the network adjusts its parameters to minimize the difference between the prediction and the target value. This adjustment of weights and biases helps the network identify relevant features and relationships in the data.
By repeating this process across the rest of the data, the neural network becomes capable of capturing complex relationships in the entire data set. In the end, we have a fully trained neural network whose parameters capture the patterns and relationships in all of the given data. This trained network can then be used to make accurate predictions or classifications on new, unseen data.
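To make this concrete, here is a minimal sketch in NumPy of a single neuron learning the toy relationship y = 2x. The data, starting parameters, and learning rate are all illustrative assumptions, not a framework-grade training loop:

```python
import numpy as np

# Toy data: the target relationship is y = 2x (the neuron must discover this).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w, b = 0.1, 0.0   # illustrative starting parameters
lr = 0.01         # illustrative learning rate

for step in range(500):
    pred = w * x + b                 # forward pass: the neuron computes xw + b
    error = pred - y                 # difference between prediction and target
    grad_w = 2 * np.mean(error * x)  # gradient of mean squared error w.r.t. w
    grad_b = 2 * np.mean(error)      # gradient of mean squared error w.r.t. b
    w -= lr * grad_w                 # adjust the parameters to shrink the error
    b -= lr * grad_b

print(w, b)  # w approaches 2.0 and b approaches 0.0
```

After a few hundred of these small adjustments, the parameters settle near values that reproduce the data; real networks apply the same minimization idea at a much larger scale.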
For a better understanding, let's look at what is inside the neural network architecture and how it works.
Neural Network Architecture
The basic neural network architecture contains three main layers: the input layer, the hidden layer, and the output layer.
Input Layer
The input layer is where data enters the neural network. It is responsible only for transforming the raw input data into a format the hidden layer can understand, using normalization or different representations, and then passing it on to the hidden layer. There are no parameters (weights and biases) in the input layer neurons.
The input layer size (number of neurons) is determined by the size of the input data, with each neuron representing a single data value. For example, if the input is a black & white/grayscale image of size 3 by 3 (9 pixels), then we need 9 neurons. In other words, we use 9 neurons to represent the 9 pixel values, and each neuron takes the color value of one pixel.
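As a small sketch of this idea (the pixel values below are made up), flattening a 3 x 3 grayscale image gives the 9 values the input layer would hold:

```python
import numpy as np

# A hypothetical 3 x 3 grayscale image; each entry is one pixel's intensity (0-255).
image = np.array([[ 12,  58, 240],
                  [ 90, 167,  33],
                  [255,   0,  76]])

# Flatten and normalize so each of the 9 input neurons gets one value in [0, 1].
input_layer = image.flatten() / 255.0
print(input_layer.shape)  # (9,) -- one value per input neuron
```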
The input layer is a very important part of a neural network: it ensures that the input data is in a format the hidden layer can understand. Without it, the hidden layer would be unable to perform any mathematical computations on the data.
Hidden Layer
The hidden layer extracts features from the data and captures patterns and relationships by performing mathematical calculations and nonlinear transformations on the input data, using parameters (weights & biases) and activation functions.
Weights & Bias
Weights and biases act as the parameters of a neural network. Every neuron in the hidden and output layers has its own weight and bias values. In the basic neural network architecture, one neuron can have only one bias value but many weights. During training, these weights and bias values are adjusted at every iteration until the neural network gives the correct result for every input.
Weights:
Weights store the patterns and deep features the network has found; put simply, they work as the memory of a neural network. During training, these weight values are multiplied by the data passed from the previous layer's neurons. A weight value can be negative, positive, or even zero, e.g., 0.05323.
The number of weights in one neuron depends on the number of neurons in the previous layer. For example, if the input layer has 2 neurons, then each neuron in the hidden layer should have 2 weight values.
Bias:
Bias is a numeric value that is added to the weighted sum. The only difference between weights and biases is that a bias is not multiplied by the input data or other neurons' results; it is only added to them. This is a very important component of neural networks that lets them shift their outputs and learn more complex patterns. Unlike weights, one neuron can have only one bias value. The bias value can also be positive, negative, or even zero, e.g., 0, 1, 0.1, or 0.01.
In the past, we had to assign random values to each neuron's weights and bias and store them using tensors (matrices or vectors) before starting the training process. Today, deep learning frameworks automatically initialize weight and bias values when we define the neural network structure.
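For example, in PyTorch (one such framework), defining the layer is enough; the parameters are created and randomly initialized for us:

```python
import torch.nn as nn

# A layer with 2 inputs and 3 neurons; PyTorch creates and initializes
# the weights and biases automatically -- no manual assignment needed.
layer = nn.Linear(2, 3)
print(layer.weight.shape)  # torch.Size([3, 2]) -- 2 weights per neuron
print(layer.bias.shape)    # torch.Size([3])    -- 1 bias per neuron
```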
The number of neurons we should use in hidden layers depends on the dimensionality of the data and the complexity of the task. By experimenting with different numbers of neurons, we may be able to find the number best suited to our neural network.
During the training process, each neuron in the hidden layer performs a mathematical operation on the input data passed by the input layer neurons and then passes the result to the neurons of the output layer. This mathematical operation can be written as:
Hidden layer Output = XW + b
Here, X is a matrix that contains the values passed by the input layer neurons, W is a matrix that holds the weights of each neuron in the hidden layer, and b is a vector that holds the bias values of each hidden layer neuron.
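Here is a small NumPy sketch of this computation, with made-up numbers for a hidden layer of 3 neurons fed by 2 input neurons:

```python
import numpy as np

X = np.array([[0.5, 0.8]])        # values from the 2 input-layer neurons (1 sample)
W = np.array([[0.2, -0.4, 0.7],   # weights: 2 inputs x 3 hidden neurons
              [0.9,  0.1, -0.3]])
b = np.array([0.1, 0.0, -0.2])    # one bias per hidden neuron

hidden_output = X @ W + b         # XW + b
print(hidden_output)              # [[ 0.92 -0.12 -0.09]] -- one value per hidden neuron
```

Note the shapes: each hidden neuron owns one column of W (its 2 weights) and one entry of b (its single bias), matching the rules described above.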
As shown above, multiplying the input data by the weight values and then adding the bias values gives us a linear equation, like y = mx + c. If we continue this process as it is, we can only find linear relationships between input and output data, which badly hurts our neural network's accuracy and limits the possibilities and uses of neural networks. So we have to break the linearity of the hidden layer. That's where activation functions come in.
Activation Function
This is a mathematical function that introduces non-linearity into the neural network. The activation function takes the hidden layer output above and applies a non-linear transformation to it. Denoting the activation function by g(x), we can rewrite the above equation like this:
Hidden layer Output = g(XW + b)
Some commonly used activation functions in the hidden layer include sigmoid, tanh, and ReLU and its variants.
Rectified Linear Activation Function (ReLU)
g(z) = max{0, z}
Here z is the hidden layer output and g(z) is the result of the activation. The ReLU activation function takes the hidden layer output as input and returns the maximum of 0 and that output. For example, if the hidden layer output is 3, then the ReLU output is max{0, 3} = 3. Notice that this activation function discards negative values: if the hidden layer output were -3, the ReLU output would be 0.
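In code, ReLU is a one-liner; this NumPy sketch reproduces the examples above:

```python
import numpy as np

def relu(z):
    # g(z) = max{0, z}, applied element-wise.
    return np.maximum(0, z)

print(relu(np.array([3.0, -3.0])))  # [3. 0.] -- the negative value is discarded
```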
As shown above, by applying the activation function to the output of the hidden layer, we can capture more complex, non-linear relationships in the data.
Output Layer
This is the last layer of a neural network, and it provides the final result. The output layer can have a single neuron or multiple neurons, and each neuron of the output layer has weight and bias values just like in the hidden layer.
After the hidden layer passes its results to the output layer, the neurons of the output layer perform the same mathematical calculations on the passed values and then apply an activation function to produce the model's output or prediction.
Output layer / Final result = g(HW + b)
Here, H is a matrix that holds the outputs of the hidden layer neurons, W is a matrix that contains the weights of each neuron in the output layer, and b is a vector that contains the bias values of each output layer neuron.
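Continuing the hypothetical numbers from the hidden layer sketch (after applying ReLU to its output), a single output neuron's computation looks like this:

```python
import numpy as np

# Hidden layer outputs after ReLU: [[0.92, -0.12, -0.09]] becomes [[0.92, 0.0, 0.0]].
H = np.array([[0.92, 0.0, 0.0]])

# Made-up parameters for a single output neuron fed by 3 hidden neurons.
W_out = np.array([[0.5], [-0.2], [0.8]])  # 3 weights, one per hidden output
b_out = np.array([0.04])                  # one bias

final = H @ W_out + b_out   # HW + b, before the output activation g
print(final)                # [[0.5]]
```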
The difference between the output layer and the hidden layer is that the number of neurons and the activation function used in the output layer depend on the nature of your task and the final output type (the type of prediction your model needs to make).
For instance, in binary classification, you can use a single neuron with a sigmoid activation function. For multi-class classification, you should use as many neurons as there are classes, with a softmax activation function, to obtain class probabilities.
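For illustration, here are these two output activations in NumPy (the input scores are arbitrary):

```python
import numpy as np

def sigmoid(z):
    # Squashes a score into (0, 1): a probability for binary classification.
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Turns a vector of scores into class probabilities that sum to 1.
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(0.5))                        # ~0.622 -- probability of the positive class
print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659 0.242 0.099] -- one per class
```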
Types of Neural Networks
There are various types of neural networks based on their architectures. These architectures modify the design, connectivity, and layer arrangement of the basic architecture to handle specific types of data or tasks. Here are some common neural network architectures:
- Feedforward Neural Networks (FNNs): Also known as multilayer perceptrons (MLPs), FNNs are the simplest type of neural network architecture, in which data flows only forward. They consist of an input layer, one or more hidden layers, and an output layer. FNNs are used for tasks such as classification, regression, and pattern recognition.
- Convolutional Neural Networks (CNNs): CNNs are a type of neural network architecture designed explicitly for processing grid-like data, such as images or spectrograms. They use convolutional layers that apply filters to capture spatial or temporal patterns, extracting local features and hierarchical representations and enabling effective image classification, object detection, and other computer vision tasks.
- Recurrent Neural Networks (RNNs): RNNs are another type of neural network architecture, one that can process sequential data such as speech or text. They have recurrent feedback connections that maintain an internal memory of past inputs, allowing them to capture information from previous steps and making them suitable for tasks like language modeling, machine translation, and speech recognition.
- Long Short-Term Memory (LSTM) Networks: LSTMs are a type of RNN that addresses the vanishing gradient problem by introducing memory cells. LSTMs can effectively capture long-term dependencies in sequential data and are especially useful for tasks involving long-range dependencies or variable-length sequences.
- Generative Adversarial Networks (GANs): GANs consist of two neural networks, called a generator and a discriminator, which compete against each other. The generator learns to generate realistic data samples, while the discriminator learns to distinguish between real and fake samples. GANs are commonly used for tasks like image synthesis, data augmentation, and generative modeling.
- Autoencoders: Autoencoders are neural network architectures trained to reconstruct their input data. They consist of an encoder that compresses the input data into a low-dimensional representation (encoding) and a decoder that reconstructs the original input from that encoding. Autoencoders are used for tasks like dimensionality reduction, anomaly detection, and denoising.
- Transformer Networks: Transformers are a type of architecture that has gained popularity in natural language processing tasks. They use self-attention mechanisms to capture dependencies between different positions in the input sequence. Transformers have achieved outstanding results in machine translation, text generation, and language understanding tasks.
These are just a few examples of neural network architectures, and there are many more variants and hybrids designed for different problems and data types. The choice of architecture depends on the specific task, the nature of the data, and the desired performance outcomes.
Deep Learning vs. Neural Networks
Simply put, the neural network is the foundation of deep learning, enabling the creation of artificial intelligence. While a basic neural network consists of just a few key layers, deep learning utilizes neural networks with numerous hidden layers, allowing finer details to be captured from input data. This increased depth within the neural network is why we refer to the approach as 'Deep Learning'. Since all deep learning models rely on neural network principles, we can refer to them as neural networks as well.
Neural networks stand as the foundation of modern AI and deep learning, revolutionizing the way computers process information and make decisions. This remarkable concept helps us build the robust AI systems we use today to make our lives easier and more flexible. Researchers are still continually exploring new techniques to improve neural network performance and address existing limitations, bringing us ever closer to neural networks that work more like the human brain.