Exemplify convolution over volume with convolution on RGB images. Also illustrate the use of multiple filters.

Convolution Over Volume in Convolutional Neural Networks (CNNs)

Convolution on RGB Images

RGB images consist of three color channels: Red, Green, and Blue. Each channel can be considered as a separate 2D matrix, and together they form a 3D volume. When performing convolution on RGB images, we need to account for the depth of the input, which is 3 in this case.
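As a minimal sketch of this layout, the snippet below stores a small RGB image as a NumPy array of shape (height, width, channels) and slices out the three 2D channel matrices. The pixel values and the variable names (`image`, `red`, `green`, `blue`) are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical 3x3 RGB image stored as (height, width, channels).
# The values are arbitrary and serve only to show the layout.
image = np.arange(27, dtype=float).reshape(3, 3, 3)

red = image[:, :, 0]    # 3x3 matrix of R values
green = image[:, :, 1]  # 3x3 matrix of G values
blue = image[:, :, 2]   # 3x3 matrix of B values

print(image.shape)  # (3, 3, 3): height, width, depth
print(red.shape)    # (3, 3)
```

Some libraries put the channel axis first instead, giving shape (channels, height, width); the idea of a depth-3 volume is the same either way.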

Example: Convolution Over Volume

Input:

Consider a small 3x3 RGB image, where each pixel has three values corresponding to the R, G, and B channels.

\[
\text{Red Channel} = \begin{bmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{bmatrix}
\]
\[
\text{Green Channel} = \begin{bmatrix} G_{11} & G_{12} & G_{13} \\ G_{21} & G_{22} & G_{23} \\ G_{31} & G_{32} & G_{33} \end{bmatrix}
\]
\[
\text{Blue Channel} = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33} \end{bmatrix}
\]

Filter:

Assume a 3x3x3 filter, where each layer corresponds to the R, G, and B channels respectively.

\[
\text{Filter (Kernel)} = \left[ \begin{bmatrix} F_{R1} & F_{R2} & F_{R3} \\ F_{R4} & F_{R5} & F_{R6} \\ F_{R7} & F_{R8} & F_{R9} \end{bmatrix}, \begin{bmatrix} F_{G1} & F_{G2} & F_{G3} \\ F_{G4} & F_{G5} & F_{G6} \\ F_{G7} & F_{G8} & F_{G9} \end{bmatrix}, \begin{bmatrix} F_{B1} & F_{B2} & F_{B3} \\ F_{B4} & F_{B5} & F_{B6} \\ F_{B7} & F_{B8} & F_{B9} \end{bmatrix} \right]
\]

Convolution Operation:

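The filter slides over the height and width of the input and always spans its full depth. At each position, it is multiplied element-wise with the input values it covers (27 values for a 3x3x3 filter), and all the products are summed into a single number.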

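For the position where the filter exactly covers the 3x3 image:

\[
\text{Convolution Output} = \sum_{i=1}^{3} \sum_{j=1}^{3} \left( R_{ij} \cdot F_{Rn} + G_{ij} \cdot F_{Gn} + B_{ij} \cdot F_{Bn} \right), \quad n = 3(i-1) + j
\]

Here n is the row-major index of the filter entry in row i, column j, so F_{R1} multiplies R_{11} and F_{R9} multiplies R_{33}. The result is a single scalar: the three channels are summed together rather than kept separate.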
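The computation at one position can be sketched in NumPy as follows. The random integer values and the names `patch` and `kernel` are assumptions for illustration. As is conventional in CNNs, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3x3 input patch and filter, both laid out as (row, col, channel).
patch = rng.integers(0, 10, size=(3, 3, 3)).astype(float)
kernel = rng.integers(-1, 2, size=(3, 3, 3)).astype(float)

# Per-channel contributions: element-wise product, then sum over the 3x3 window.
per_channel = [np.sum(patch[:, :, c] * kernel[:, :, c]) for c in range(3)]

# Output value at this position: sum over rows, columns, and channels at once.
output = np.sum(patch * kernel)

print(per_channel, output)
```

Summing the three per-channel results gives the same number as summing over the whole 3x3x3 product, which is why one filter yields one value per position regardless of the input depth.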

Multiple Filters

In practice, CNNs use multiple filters to capture different features from the input image. Each filter will produce a separate feature map. When multiple filters are applied to the input image, they generate a set of feature maps which are then stacked together to form the next layer's input.
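To make the "one filter, one feature map" relationship concrete, the sketch below applies a bank of K = 2 filters to a single 3x3x3 window and obtains one response per filter. The filter-bank layout (K, rows, cols, channels) and the random values are assumptions for illustration, not a required convention.

```python
import numpy as np

rng = np.random.default_rng(1)

patch = rng.standard_normal((3, 3, 3))       # one 3x3x3 input window
filters = rng.standard_normal((2, 3, 3, 3))  # K = 2 filters, each 3x3x3

# Contract the last three axes of `filters` with the three axes of `patch`:
# this yields one scalar response per filter at this position.
responses = np.tensordot(filters, patch, axes=3)

print(responses.shape)  # (2,)
```

Repeating this at every spatial position fills in K feature maps, so the depth of the output volume equals the number of filters.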

Example with Two Filters:

Assume two different filters are used:

Filter 1:
\[
\begin{bmatrix} F1_{R1} & F1_{R2} & F1_{R3} \\ F1_{R4} & F1_{R5} & F1_{R6} \\ F1_{R7} & F1_{R8} & F1_{R9} \end{bmatrix} \quad
\begin{bmatrix} F1_{G1} & F1_{G2} & F1_{G3} \\ F1_{G4} & F1_{G5} & F1_{G6} \\ F1_{G7} & F1_{G8} & F1_{G9} \end{bmatrix} \quad
\begin{bmatrix} F1_{B1} & F1_{B2} & F1_{B3} \\ F1_{B4} & F1_{B5} & F1_{B6} \\ F1_{B7} & F1_{B8} & F1_{B9} \end{bmatrix}
\]
Filter 2:
\[
\begin{bmatrix} F2_{R1} & F2_{R2} & F2_{R3} \\ F2_{R4} & F2_{R5} & F2_{R6} \\ F2_{R7} & F2_{R8} & F2_{R9} \end{bmatrix} \quad
\begin{bmatrix} F2_{G1} & F2_{G2} & F2_{G3} \\ F2_{G4} & F2_{G5} & F2_{G6} \\ F2_{G7} & F2_{G8} & F2_{G9} \end{bmatrix} \quad
\begin{bmatrix} F2_{B1} & F2_{B2} & F2_{B3} \\ F2_{B4} & F2_{B5} & F2_{B6} \\ F2_{B7} & F2_{B8} & F2_{B9} \end{bmatrix}
\]

Feature Maps:

Each filter generates its own feature map by convolving over the input image.

Feature Map 1 from Filter 1
Feature Map 2 from Filter 2

These feature maps are then combined to form a new volume, which will serve as input to the next layer.

Illustration:

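If we apply these filters to the 3x3 RGB image with a stride of 1 and one pixel of zero padding on each side, each feature map keeps the 3x3 spatial size. (Without padding, a 3x3 filter fits a 3x3 image at only one position, so each feature map would be a single value.) We get:

Feature Map 1:
\[
\begin{bmatrix} FM1_{11} & FM1_{12} & FM1_{13} \\ FM1_{21} & FM1_{22} & FM1_{23} \\ FM1_{31} & FM1_{32} & FM1_{33} \end{bmatrix}
\]
Feature Map 2:
\[
\begin{bmatrix} FM2_{11} & FM2_{12} & FM2_{13} \\ FM2_{21} & FM2_{22} & FM2_{23} \\ FM2_{31} & FM2_{32} & FM2_{33} \end{bmatrix}
\]

Combined Feature Maps (depth of 2, giving a 3x3x2 volume):
\[
\begin{bmatrix} FM1 & FM2 \end{bmatrix}
\]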
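The whole example can be sketched end to end as below. The helper `conv_volume` is a hypothetical name, not a library function; it assumes stride 1, zero padding, a (height, width, channels) image, and a (K, f, f, channels) filter bank, and it uses plain loops for readability rather than speed.

```python
import numpy as np

def conv_volume(image, filters, pad=0):
    """Stride-1 convolution over volume (no kernel flip, as is usual in CNNs).

    image:   (H, W, C) array
    filters: (K, f, f, C) array
    returns: (H_out, W_out, K) array, one feature map per filter
    """
    # Zero-pad height and width only; the channel axis is left untouched.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    K, f, _, C = filters.shape
    assert padded.shape[2] == C, "filter depth must match input depth"
    H_out = padded.shape[0] - f + 1
    W_out = padded.shape[1] - f + 1
    out = np.zeros((H_out, W_out, K))
    for k in range(K):
        for i in range(H_out):
            for j in range(W_out):
                # The window spans the full depth of the input.
                window = padded[i:i + f, j:j + f, :]
                out[i, j, k] = np.sum(window * filters[k])
    return out

rng = np.random.default_rng(2)
image = rng.standard_normal((3, 3, 3))       # hypothetical 3x3 RGB image
filters = rng.standard_normal((2, 3, 3, 3))  # two hypothetical 3x3x3 filters

print(conv_volume(image, filters, pad=0).shape)  # (1, 1, 2)
print(conv_volume(image, filters, pad=1).shape)  # (3, 3, 2)
```

With no padding, the 3x3 image yields one value per filter; with one pixel of zero padding, it yields two 3x3 feature maps stacked along the last axis.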

Summary

Convolution over volume applies 3D filters to 3D input data such as RGB images, where the depth of each filter matches the depth of the input. Each filter collapses the channels into a single 2D feature map, and applying K filters produces K feature maps that are stacked into a new volume of depth K for the next layer. Because each filter can respond to a different pattern, stacking layers in this way lets CNNs build up representations of increasingly complex structures in the input.