
First AI – Not Just Another Article

Conceptual knowledge alone is not enough to work in the AI field; we also need the practical skills to design, code, and build an AI model that solves our own problems. In this article, I will give you a complete step-by-step guide on how to design and code an AI model for a given problem, showing how the concepts you have learned are applied along the way.

The Problem

Situation

One of our clients faced a problem: because of the high number of students in his class, he was unable to check all of the students’ answers on their papers. So he asked us to build an automatic system to check the answers.

Past paper

By decomposing the problem, we can employ a scanner to capture images of paper-based answers, specifically digits or numbers. Subsequently, these images are forwarded to a computer system tasked with recognizing the digit (1, 2, 3, 4, 5), comparing it to the correct value, and ultimately providing the student’s final result.

Task

Here we need an AI model for the recognition part. Therefore, our task is to build an AI model that recognizes handwritten digits.

Example

Designing the Model

Tips

As there are only five possible answers, we only need the model to identify whether a digit is 1, 2, 3, 4, or 5. So this is a multiclass classification problem.

To make this process easier, we can assume that the images are grayscale (black-and-white) images and that each scanned image is 28 by 28 pixels (784 pixels in total).

Input Data Sample

We need data to build our model. For that, we can use a pre-built dataset called MNIST, which contains tens of thousands of images of handwritten digits, so we don’t have to create or gather the data ourselves.

Model Structure

As we only have to identify five digits, and there is no complex data or processing involved, there is no need for a complex neural network architecture. Therefore, we can use a simple fully connected feedforward neural network (FNN) for this task.

As there are only five digits to identify, there are not many features to extract, so two hidden layers are enough for this task. Our training data are 28-by-28 (784-pixel) grayscale images, so we need 784 neurons in the input layer; each neuron will receive one pixel value.

To learn how to design an AI model properly for a given task, read our article on AI model design.

To maintain the accuracy of our model, we can apply the rule of thumb for determining the number of neurons in the hidden layers. Therefore, we can use 128 neurons in the first hidden layer and 64 neurons in the second hidden layer; this will be enough for our task. As for the activation functions, we can use the ReLU activation function, which is simple and computationally efficient.

Out of the five classes, we have to identify which one our input belongs to, so we need five neurons in the output layer to obtain a probability distribution over the classes. As this is a multi-class classification problem, we should use the softmax activation function in the output layer.

For multi-class classification problems, the most popular and effective loss function is cross-entropy. The combination of the softmax activation function and the cross-entropy loss function is the standard approach for multi-class classification tasks.

For a proper training process, we use the Adam optimization algorithm, which combines momentum-based optimization with an adaptive learning rate.

As for the hyperparameters, moderate values will do. With Adam’s adaptive learning rate, 0.001 is well suited as the initial learning rate. Considering the size of the dataset we use, a batch size of 64 and 10 epochs will be enough for a proper training process.

In the end, our AI model structure would look like this:

AI model structure

Coding the Model

Importing the Libraries

First of all, we should create a new Python file called train.py. Next, we can import PyTorch and the other necessary libraries.

Throughout this article, I use Python with PyTorch. You can follow our installation guide to set up your PC properly!
Import Libraries
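
The import lines we need are the following (they appear again in the complete listing at the end of this section):

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms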

torch – This imports the core library of PyTorch.

torch.nn – This submodule of PyTorch provides tools for building and training neural networks. It contains various pre-defined layers, loss functions, and other components needed for constructing neural network architectures.

torch.optim – This submodule contains various optimization algorithms that can be used to update the weights of the neural network during training.

torchvision – This is a package provided by PyTorch that offers datasets (including MNIST) and utilities for loading and preprocessing them.

torchvision.transforms – This submodule provides a set of data transformations that can be applied to images. It’s used to compose the transformations applied to the MNIST images before feeding them into the neural network.

Loading the Dataset

The next step is loading and preprocessing the MNIST dataset.

Load the dataset
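
From the complete listing below, this step looks like this:

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
full_train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)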

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]) – This defines a sequence of transformations to be applied to the input images.

transforms.ToTensor() – Converts the images from the dataset into PyTorch tensors.

transforms.Normalize((0.5,), (0.5,)) – Normalizes the pixel values of the images. The first tuple specifies the mean to subtract from each channel, and the second tuple specifies the standard deviation to divide by. With a mean of 0.5 and a standard deviation of 0.5, pixel values are mapped from [0, 1] to [-1, 1], centering the data around zero.

full_train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True) – This loads the MNIST dataset with the specified settings.

root='./data' – Specifies the directory where the dataset will be stored.

train=True – This indicates that you’re loading the training split of the dataset.

transform=transform – Applies the transformations you defined earlier to the loaded images.

download=True – Specifies that if the dataset isn’t already present in the specified directory, it should be downloaded from the internet.

The above lines of code prepare the MNIST dataset for training by applying the defined transformations and making it compatible with PyTorch’s data handling. The dataset will be normalized and converted to tensors, and these preprocessed samples will be used to train our neural network.

Next, we have to filter the MNIST dataset to get only the digit classes 1 to 5:

Filter the Data
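
The filtering code, as it appears in the complete listing below:

selected_indices = []
for idx, (image, label) in enumerate(full_train_dataset):
    if label in [1, 2, 3, 4, 5]:  # Include only labels 1 to 5 (digits 1 to 5)
        selected_indices.append(idx)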

Here, the list called selected_indices contains only the indices of images of digits 1 to 5. Next, we shuffle these indices randomly; in PyTorch, we use torch.utils.data.sampler.SubsetRandomSampler() for this. Then we can use the shuffled indices to create a data loader for training. In PyTorch, we do that with torch.utils.data.DataLoader(dataset=full_train_dataset, batch_size=batch_size, sampler=train_sampler)

Create Input and Target data
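
The sampler and data loader are then created like this (batch_size is defined in the hyperparameters step below):

train_sampler = torch.utils.data.sampler.SubsetRandomSampler(selected_indices)
train_loader = torch.utils.data.DataLoader(dataset=full_train_dataset, batch_size=batch_size, sampler=train_sampler)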
If you’re unfamiliar with these data preprocessing methods, check our comprehensive Data Processing guide for a better understanding.

Hyperparameters

Next, we should define the hyperparameters needed for the model.

Initialize Hyperparameters
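
As discussed in the design section, we use the following values (these also appear in the complete listing below):

batch_size = 64
learning_rate = 0.001
num_epochs = 10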

Define the Neural Network

The next step is defining the neural network architecture. Here we have to define the number of layers, the number of neurons in each layer, and the activation function of each layer.

Define the Neural Network model
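
The full class definition, as it appears in the complete listing below:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 5)
    def forward(self, x):
        x = x.view(-1, 784)  # Flatten the input
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x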

Here we create a custom class called Net, which is a subclass of nn.Module (PyTorch’s base class for all neural network modules).

def __init__(self): – In this constructor method, we define the layers and components of our neural network.

super(Net, self).__init__() – This calls the constructor of the parent class (nn.Module) to properly initialize our custom neural network class.

“It is important to note that when building a neural network in PyTorch, instead of defining individual layers of neurons, we define the connections between layers. In simple terms, we don’t declare input, hidden, or output layers separately; we define the links that connect one layer to the next.”

For example, in our neural network with four layers (input layer, hidden layer 1, hidden layer 2, and output layer), there are three significant connections between these layers: from the input layer to hidden layer 1, from hidden layer 1 to hidden layer 2, and from hidden layer 2 to the output layer. It is these connections that we primarily define when designing our neural network in PyTorch.

self.fc1 = nn.Linear(784, 128) – This defines the connection between the input layer and the first hidden layer, where 784 neurons (the input layer) are fully connected to 128 neurons (the first hidden layer).

self.fc2 = nn.Linear(128, 64) – This defines the connection between the first hidden layer and the second hidden layer, where 128 neurons are fully connected to 64 neurons.

self.fc3 = nn.Linear(64, 5) – This defines the connection between the second hidden layer and the output layer, where 64 neurons are fully connected to 5 neurons (the output layer).

torch.nn automatically creates the relevant weight and bias tensors for the neurons in these connections and initializes them with suitable random values.

x = x.view(-1, 784) – This line reshapes the input data x to a flat tensor of shape (batch_size, 784), effectively flattening the 2D image into a 1D vector.

x = torch.relu(self.fc1(x)) – This applies the ReLU activation function to the output of the first connection (the input layer to the first hidden layer).

x = torch.relu(self.fc2(x)) – Similarly, this applies ReLU activation to the output of the second connection (the first hidden layer to the second hidden layer).

x = self.fc3(x) – This generates the final output logits from the third connection (the second hidden layer to the output layer).

return x – This returns the final output.

Now we can initialize the defined model:

Initiate the ANN
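
Instantiating the model takes a single line:

model = Net()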

Next, we should implement our loss function, learning rate, and optimization method. Here we use the cross-entropy loss function and the Adam adaptive optimization method.

Define the Loss and Optimization Function
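
From the complete listing below:

softmax_activation = nn.Softmax(dim=1)
cross_entropy_loss = nn.CrossEntropyLoss()  # Cross-Entropy loss
optimizer = optim.Adam(model.parameters(), lr=learning_rate)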

Training Loop

Next, we need to implement the training process for our deep learning model. In deep learning, training involves an iterative procedure consisting of forward propagation and backward propagation. It’s essential to apply both forward and backward propagation to all the training data. To accomplish this, we can create a loop that iterates through the training dataset.

Create the training loop
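
The training loop, as it appears in the complete listing below:

total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        labels = labels - 1  # Subtract 1 to make labels 0-indexed
        outputs = model(images)
        probabilities = softmax_activation(outputs)
        loss = cross_entropy_loss(probabilities, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}')
print("Training finished")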

total_step = len(train_loader) – This calculates the total number of batches in our training data loader. It’s used to track progress during training.

for epoch in range(num_epochs): – This is the outer loop that iterates through each epoch during training. The number of epochs is determined by the value of `num_epochs`.

for i, (images, labels) in enumerate(train_loader): – This is the inner loop that iterates through each batch of the training data loader. `i` is the batch index, and `(images, labels)` represents the batch of images and their corresponding labels.

labels = labels - 1 – This subtracts 1 from each label to make the labels 0-indexed, so classes 1 to 5 are represented as indices 0 to 4 (PyTorch’s CrossEntropyLoss expects class indices starting from 0).

outputs = model(images) – This passes the batch of images through your neural network model to get the raw output logits for each class.

probabilities = softmax_activation(outputs) – This applies the softmax activation function to the logits to obtain the class probabilities.

loss = cross_entropy_loss(probabilities, labels) – This computes the cross-entropy loss between the predicted probabilities and the ground truth labels.

optimizer.zero_grad() – This zeros out the gradients of the model’s parameters before calculating new gradients in the backward pass.

loss.backward() – This computes gradients of the loss with respect to the model’s parameters using backpropagation.

optimizer.step() – This updates the model’s parameters using the computed gradients and the optimization algorithm (in this case, Adam).

(i+1) % 100 == 0: – This checks whether the current batch index is a multiple of 100. If true, it prints the progress and the loss for that batch (i.e., the loss is printed once every 100 batches).

print(“Training finished”) – After the training loop completes, this message is printed to indicate that the training process has finished.

In summary, this code iterates through the training data, feeds it through the model, calculates the loss, and updates the model’s parameters using backpropagation and optimization. The training loop repeats for the specified number of epochs (here, 10), and the progress and loss are printed periodically.

If you don’t know what grad or autograd is or what it does, check our complete guide on AutoGrad.

Save the Model

After the training process, we need to save our trained parameter values; otherwise, they will be lost.

Save the model
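
Saving takes a single line (also in the complete listing below):

torch.save(model.state_dict(), 'digit_recognition_model_with_softmax_cross_entropy.pth')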

This line of code saves the learned parameters of our trained neural network model to a file named 'digit_recognition_model_with_softmax_cross_entropy.pth'. This saved state_dict can later be loaded into another instance of the same model architecture to replicate the trained model’s parameters without the need for retraining.

Our whole neural network training code looks like this:

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
# Hyperparameters
batch_size = 64
learning_rate = 0.001
num_epochs = 10
# Load and preprocess MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
full_train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
# Filter the dataset to include only digits 1 to 5
selected_indices = []
for idx, (image, label) in enumerate(full_train_dataset):
    if label in [1, 2, 3, 4, 5]:  # Include only labels 1 to 5 (digits 1 to 5)
        selected_indices.append(idx)
train_sampler = torch.utils.data.sampler.SubsetRandomSampler(selected_indices)
train_loader = torch.utils.data.DataLoader(dataset=full_train_dataset, batch_size=batch_size, sampler=train_sampler)
# Define the neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 5)
    def forward(self, x):
        x = x.view(-1, 784)  # Flatten the input
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x
# Instantiate the model
model = Net()
# Loss and optimizer
softmax_activation = nn.Softmax(dim=1)
cross_entropy_loss = nn.CrossEntropyLoss()  # Cross-Entropy loss
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Training loop
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        labels = labels - 1  # Subtract 1 to make labels 0-indexed
        outputs = model(images)
        probabilities = softmax_activation(outputs)
        loss = cross_entropy_loss(probabilities, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}')
print("Training finished")
# Save the model
torch.save(model.state_dict(), 'digit_recognition_model_with_softmax_cross_entropy.pth')

Next, we can start/run the training process to see how our neural network performs.

Training the Model

We can start the training process by simply running the train.py file.

Training Process

The displayed output in the terminal provides information about the current epoch, the progress of training on the entire dataset, and the calculated average loss for each batch.

If you don’t know how forward propagation and backward propagation work, check our complete guide on forward propagation and backward propagation to gain a better understanding of them.

After the given number of epochs (here, 10), the training process stops and the message below is displayed.

End of the Training Process

Now, our model has been trained on the given dataset, and its parameters (weights and biases) have been adjusted to increase accuracy by minimizing the loss function as much as possible.

You can see that the average loss values in the above results hover around 0.91. However, this value alone does not provide a comprehensive assessment of the model’s overall performance; it can vary based on factors such as the dataset size, the number of epochs, and the batch size.

After training, the adjusted parameter values will be saved at a specified location. In our case, they will be saved in a file named 'digit_recognition_model_with_softmax_cross_entropy.pth'.

Testing the Model

Now, let’s test our AI model. Typically, during testing, we use new, unseen data that is distinct from the dataset used for training. However, for simplicity, I will use a single sample image of the number 4 from the MNIST training dataset for this demonstration.

For testing the model, let’s create a new Python file called test.py. (You could simply add the code below to the same file we used for training, but for clarity, I use separate files.)

In the following code, I have selected an image of the number 4 from the MNIST dataset. I will pass this image data through our model, and then display the predicted class label alongside the selected image using the matplotlib library.

First of all, we should import the necessary libraries.

Import Libraries
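
The original screenshot is not reproduced here, but a minimal set of imports for the testing script described below would look like this (matplotlib is the library mentioned above for displaying the image):

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt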

Next, we should load our saved neural network architecture and the trained parameter values into it.

Load the ANN structure
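
Here is a minimal sketch of this step; it assumes the Net class from train.py is copied into (or imported by) test.py:

# The Net class must be defined exactly as in train.py
model = Net()
model.load_state_dict(torch.load('digit_recognition_model_with_softmax_cross_entropy.pth'))
model.eval()  # switch the model to evaluation mode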

model.load_state_dict(torch.load('digit_recognition_model_with_softmax_cross_entropy.pth')) – This loads the trained parameter values from their saved location (in our case, 'digit_recognition_model_with_softmax_cross_entropy.pth').

model.eval() – This sets the model to evaluation mode. When a model is in evaluation mode, it affects the behavior of certain layers in the network, such as dropout and batch normalization layers. The primary purpose of using model.eval() is to ensure consistent behavior between the training and evaluation phases. During training, dropout layers introduce randomness by randomly ‘dropping out’ units during forward propagation, which helps prevent overfitting. Likewise, batch normalization layers compute statistics based on the batch being processed.

Now we can load our data sample.

Load the dataset
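
A minimal sketch of this step, assuming the same transform as in training and the variable names used in the text (mnist_dataset and digit_4_indices):

# Load the full MNIST training set with the same transform used during training
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
mnist_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
# Collect the indices of all images labeled 4
digit_4_indices = [idx for idx, (_, label) in enumerate(mnist_dataset) if label == 4]
# Select the first image of the number 4 (change the index to pick another)
image, label = mnist_dataset[digit_4_indices[0]]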

In the code above, similar to the training process, we load the entire MNIST dataset. Then, we filter and save only the images of the number 4 using the following code: [idx for idx, (_, label) in enumerate(mnist_dataset) if label == 4], and store them in the digit_4_indices variable. If you want images of another number, such as number 2, you can simply change if label == 4 to if label == 2. This modification will provide you with only the images of the number 2.

To obtain a single data sample, we select one image of the number 4 by indexing digit_4_indices, which contains the indices of images of the number 4. In this case, we use the first one. You can select a different image of the number 4 by changing the index number.

Next, we can pass the selected data sample through our trained model like this:

Define the Activation Function
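
A minimal sketch of the forward pass, reusing the softmax_activation and output_probabilities names from the text:

softmax_activation = nn.Softmax(dim=1)
with torch.no_grad():  # no gradient tracking is needed for inference
    outputs = model(image.unsqueeze(0))  # add a batch dimension: (1, 1, 28, 28)
    output_probabilities = softmax_activation(outputs)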

The output_probabilities variable contains the model’s predicted probabilities for each class corresponding to the input image. In simpler terms, it provides probabilities for each class, indicating how likely the image belongs to each class. The class with the highest probability value is considered the model’s predicted class for that image.

We can display the model-predicted class with the selected image of the number 4 like this:

Define the Output method
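
A minimal sketch of the display step; the +1 shift (mapping index 0 to 4 back to digit 1 to 5) and the exact matplotlib calls are my reconstruction of what the text describes:

# The most probable class index, shifted back from 0-4 to the digit 1-5
predicted_class = torch.argmax(output_probabilities, dim=1).item() + 1
plt.imshow(image.squeeze(), cmap='gray')  # display the 28x28 image
plt.title(f'Predicted class: {predicted_class}')
plt.show()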

Our model’s output will be something like this:

Result

“You can see that our model correctly predicted that image as number 4.”

Like this, you can build your own simple AI model. But remember, in real-world models, there are much more complex processes involved than this. These might include employing generalization methods, collecting extensive datasets, and more. This is just a simple demonstration that serves as a foundational understanding of how to construct and train an AI model. In reality, AI models are just one part of the solution. In practical applications, we must also consider how the model collects real-world data and determine appropriate actions based on its predictions. This often entails developing custom interfaces. As you can see, the process becomes significantly more complex in real-world scenarios.
