Consider LeNet-5, a convolutional neural network used for the classification of digits. Write down the complete procedure followed in its architecture.
LeNet-5 Architecture for Digit Classification
LeNet-5 is one of the earliest convolutional neural network architectures, designed by Yann LeCun et al. for handwritten digit recognition (MNIST dataset). Below is the complete procedure followed in its architecture:
1. Input Layer
- Input Size: (32 \times 32) grayscale image (single channel).
2. Convolutional Layer (C1)
- Filter Size: (5 \times 5)
- Number of Filters: 6
- Stride: 1
- Padding: 0 (valid convolution; the original design uses no padding)
- Output Size: (28 \times 28 \times 6)
- Activation Function: tanh (the original paper uses a scaled hyperbolic tangent; modern implementations often use ReLU)
Operation: Convolve the 6 filters with the input image to produce 6 feature maps, each of size (28 \times 28).
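Each output size in this architecture can be checked with the standard convolution output formula, where (W) is the input width, (F) the filter size, (P) the padding, and (S) the stride:

( \frac{W - F + 2P}{S} + 1 )

For C1 this gives ( \frac{32 - 5 + 0}{1} + 1 = 28 ), matching the (28 \times 28) feature maps above.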
3. Subsampling Layer (S2)
- Type: Average Pooling (subsampling)
- Filter Size: (2 \times 2)
- Stride: 2
- Output Size: (14 \times 14 \times 6)
- Activation Function: tanh
Operation: Perform average pooling on each of the 6 feature maps from C1, reducing their spatial size by a factor of 2. (In the original paper, the pooled value is multiplied by a trainable coefficient and added to a trainable bias before the activation.)
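The same formula covers pooling, with (F = 2), (P = 0), (S = 2): ( \frac{28 - 2 + 0}{2} + 1 = 14 ). Each (2 \times 2), stride-2 pooling window halves the spatial dimensions.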
4. Convolutional Layer (C3)
- Filter Size: (5 \times 5)
- Number of Filters: 16
- Stride: 1
- Padding: 0
- Output Size: (10 \times 10 \times 16)
- Activation Function: tanh
Operation: Convolve 16 filters with the 6 input feature maps, producing 16 feature maps, each of size (10 \times 10). (In the original paper, each C3 map connects to only a hand-chosen subset of the S2 maps; modern implementations typically connect to all 6.)
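By the convolution formula, ( \frac{14 - 5 + 0}{1} + 1 = 10 ). With full connectivity between the 6 input maps and 16 output maps, C3 has ( 16 \times (6 \times 5 \times 5 + 1) = 2416 ) trainable parameters; the original sparse connection table reduces this count.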
5. Subsampling Layer (S4)
- Type: Average Pooling
- Filter Size: (2 \times 2)
- Stride: 2
- Output Size: (5 \times 5 \times 16)
- Activation Function: tanh
Operation: Perform average pooling on each of the 16 feature maps from C3, reducing their size by a factor of 2.
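As with S2: ( \frac{10 - 2 + 0}{2} + 1 = 5 ), halving each spatial dimension again.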
6. Convolutional Layer (C5)
- Filter Size: (5 \times 5)
- Number of Filters: 120
- Stride: 1
- Padding: 0
- Output Size: (1 \times 1 \times 120)
- Activation Function: tanh
Operation: Convolve 120 filters with the 16 input feature maps, producing 120 feature maps, each of size (1 \times 1).
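Since the (5 \times 5) filters exactly span the (5 \times 5) input maps, each filter produces a single value: ( \frac{5 - 5 + 0}{1} + 1 = 1 ). C5 is therefore equivalent to a fully connected layer here; the original paper labels it convolutional because a larger input image would yield a spatially larger output.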
7. Fully Connected Layer (F6)
- Number of Neurons: 84
- Activation Function: tanh
Operation: Flatten the (1 \times 1 \times 120) output from C5 into a 120-dimensional vector and fully connect it to 84 neurons, producing an 84-dimensional vector.
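F6 is an affine map followed by the activation, ( y = \tanh(Wx + b) ), with ( W \in \mathbb{R}^{84 \times 120} ) and ( b \in \mathbb{R}^{84} ), for ( 120 \times 84 + 84 = 10164 ) trainable parameters.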
8. Output Layer
- Number of Neurons: 10 (corresponding to the 10 digit classes)
- Activation Function: Softmax (the original paper used Euclidean RBF output units; modern implementations use softmax)
Operation: Connect the 84-dimensional vector from F6 to 10 neurons, each representing a class. The softmax function converts the output into probabilities for each class.
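Given logits ( z_1, \dots, z_{10} ) from the output layer, softmax yields the probability of class (i):

( p_i = \frac{e^{z_i}}{\sum_{j=1}^{10} e^{z_j}} )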
Summary of LeNet-5 Architecture
- Input Layer: (32 \times 32 \times 1)
- C1 (Convolutional): (28 \times 28 \times 6)
- S2 (Subsampling): (14 \times 14 \times 6)
- C3 (Convolutional): (10 \times 10 \times 16)
- S4 (Subsampling): (5 \times 5 \times 16)
- C5 (Convolutional): (1 \times 1 \times 120)
- F6 (Fully Connected): 84
- Output Layer: 10 (Softmax)
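The stack above translates almost line for line into code. Below is a minimal sketch in PyTorch (an assumption; any framework would do), keeping tanh activations and average pooling to stay close to the original design:

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Minimal LeNet-5 sketch: tanh activations and average pooling, per the original design."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),          # C1: 32x32x1 -> 28x28x6
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),   # S2: -> 14x14x6
            nn.Tanh(),
            nn.Conv2d(6, 16, kernel_size=5),         # C3: -> 10x10x16
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),   # S4: -> 5x5x16
            nn.Tanh(),
            nn.Conv2d(16, 120, kernel_size=5),       # C5: -> 1x1x120
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                            # 1x1x120 -> 120
            nn.Linear(120, 84),                      # F6
            nn.Tanh(),
            nn.Linear(84, num_classes),              # logits; softmax is applied in the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Shape check: a batch of one 32x32 grayscale image yields 10 class logits.
model = LeNet5()
print(model(torch.zeros(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```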
Training Procedure
- Data Preparation: Resize or pad the input images to the fixed (32 \times 32) input size (MNIST digits are (28 \times 28) and are zero-padded) and scale pixel values to the range [0, 1], as shown in the sketch after this list.
- Forward Propagation: Pass the input image through the network layers, computing the activation at each layer.
- Loss Computation: Use the cross-entropy loss function to measure the difference between the predicted probabilities and the true labels.
- Backward Propagation: Compute gradients of the loss with respect to each weight and bias in the network using backpropagation.
- Optimization: Update the weights and biases using an optimization algorithm like stochastic gradient descent (SGD).
- Evaluation: Evaluate the trained model on a validation set to monitor performance and adjust hyperparameters if necessary.
- Inference: Use the trained model to predict the class of new, unseen images.
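These steps map onto a standard training loop. A minimal sketch, again assuming PyTorch plus torchvision's MNIST loader; the `LeNet5` class is the one sketched above, and hyperparameters such as learning rate, batch size, and epoch count are illustrative, not from the original paper:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Data preparation: pad MNIST's 28x28 digits to 32x32; ToTensor scales pixels to [0, 1].
transform = transforms.Compose([transforms.Pad(2), transforms.ToTensor()])
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = LeNet5()                                   # the class sketched above
criterion = nn.CrossEntropyLoss()                  # cross-entropy (applies softmax internally)
optimizer = optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)    # forward propagation + loss computation
        loss.backward()                            # backward propagation (gradients)
        optimizer.step()                           # SGD update of weights and biases
    print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")
```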
Conclusion
LeNet-5 is a pioneering CNN architecture that laid the groundwork for modern deep learning models. It effectively combines convolutional layers for feature extraction and subsampling layers for dimensionality reduction, followed by fully connected layers for classification. This architecture is particularly well-suited for image classification tasks, such as digit recognition in the MNIST dataset.