Simple Image Detection and Classification using CNN Algorithm

Enrico Megantara
5 min read · Jun 14, 2021

Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in the data and make better decisions in the future based on the examples that we provide. The primary aim is to allow computers to learn automatically, without human intervention or assistance, and adjust their actions accordingly.

To create an image detection system, we can use one of the Deep Learning models: the Convolutional Neural Network (CNN). CNN is a Deep Learning model that is often used to classify images (for example, JPEG files). CNNs come in different architectural designs, depending on the needs of the model being built. Several CNN architectures have already been built and have proven their performance, for example VGGNet-16, AlexNet, LeNet, Residual Networks (ResNet), and many more. We can build a CNN model based on an existing architecture, or we can even build our own CNN architecture! But of course designing your own architecture takes time and more in-depth research.

Convolutional Neural Network Architecture

A convolutional neural network consists of an input layer, hidden layers and an output layer. In any feed-forward neural network, any middle layers are called hidden because their inputs and outputs are masked by the activation function and final convolution. In a convolutional neural network, the hidden layers include layers that perform convolutions. Typically this includes a layer that performs a dot product of the convolution kernel with the layer’s input matrix. This product is usually the Frobenius inner product, and its activation function is commonly ReLU. As the convolution kernel slides along the input matrix for the layer, the convolution operation generates a feature map, which in turn contributes to the input of the next layer. This is followed by other layers such as pooling layers, fully connected layers, and normalization layers.
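To make the idea of a feature map concrete, here is a minimal sketch in Keras (the same library used for the model below). The 150x150x3 input shape simply mirrors the generator settings used later in this article, and the random tensor stands in for a real image:

import tensorflow as tf
from tensorflow.keras import layers

# A single random "image": batch of 1, 150x150 pixels, 3 color channels
x = tf.random.normal((1, 150, 150, 3))

# One convolution layer: 32 kernels of size 3x3 with ReLU activation
conv = layers.Conv2D(32, (3, 3), activation='relu')

# Sliding the 32 kernels over the input produces 32 feature maps
feature_maps = conv(x)
print(feature_maps.shape)   # (1, 148, 148, 32) -- no padding, so 150 - 3 + 1 = 148

Each of the 32 channels in the output is one feature map, and the stack of feature maps becomes the input of the next layer.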

Convolution Layer

The size of the convolution filter (kernel) can vary, but generally only 3x3, 5x5, or 7x7 filters are used. There is also a 1x1 filter, but this filter is mostly dedicated to reducing the number of channels in a layer.
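As a small illustration of that 1x1 case, the sketch below uses arbitrary example shapes to show a 1x1 convolution shrinking the channel count while leaving the spatial size untouched:

import tensorflow as tf
from tensorflow.keras import layers

# A feature map with 128 channels, e.g. the output of an earlier layer
# (the shapes here are only illustrative)
x = tf.random.normal((1, 37, 37, 128))

# A 1x1 convolution mixes channels per pixel without looking at neighbours,
# so it can reduce 128 channels to 32 while keeping the 37x37 spatial size
reduce = layers.Conv2D(32, (1, 1), activation='relu')
print(reduce(x).shape)   # (1, 37, 37, 32)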

Fully Connected Layer

The activation map generated by the feature extraction layers is still a multidimensional array, so we have to reshape it into a vector before it can be used as input to the fully-connected layer. This part of the network has hidden layers, activation functions, an output layer, and a loss function. It is the kind of layer usually used in multi-layer perceptrons, and its purpose is to transform the data dimensions so that the data can be classified linearly.

Each neuron in the convolution layer needs to be transformed into one-dimensional data before it can be fed into a fully-connected layer. Because this flattening causes the data to lose its spatial information and is not reversible, the fully-connected layer can only be placed at the end of the network. A convolution layer with a 1x1 kernel performs the same function as a fully-connected layer but retains the spatial character of the data.
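Here is a minimal sketch of that flattening step in Keras; the shapes are only illustrative and do not correspond exactly to the model built below:

import tensorflow as tf
from tensorflow.keras import layers

# Activation map coming out of the last pooling layer (illustrative shape)
x = tf.random.normal((1, 7, 7, 256))

# Flatten reshapes the multidimensional activation map into a single vector...
flat = layers.Flatten()(x)
print(flat.shape)            # (1, 12544) -- 7 * 7 * 256, spatial layout is lost

# ...which the fully-connected (Dense) layers can then classify
dense = layers.Dense(512, activation='relu')(flat)
output = layers.Dense(3, activation='softmax')(dense)
print(output.shape)          # (1, 3) -- one probability per class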

I use the Keras library to build this Convolutional Neural Network for image classification:

import tensorflow as tf
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow import keras
from tensorflow.keras import layers

# ... (train_datagen and test_datagen are ImageDataGenerator instances defined earlier)

# Read the training images from disk, resize them to 150x150, and
# yield them in batches of 32 with one-hot (categorical) labels
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical')

# Same setup for the validation images
validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical')

# Four convolution + max-pooling blocks for feature extraction
model = keras.Sequential()
model.add(layers.Conv2D(32, (5, 5), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D(2, 2))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(2, 2))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(2, 2))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(2, 2))

# Classification head: flatten, regularize with dropout, then two dense layers
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(3, activation='softmax'))

# Three output classes, so categorical cross-entropy with the Adam optimizer
model.compile(loss='categorical_crossentropy',
              optimizer=tf.optimizers.Adam(),
              metrics=['accuracy'])

# ...
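The training step itself is left out of the snippet above; a minimal sketch of how it could be run with model.fit and the two generators, where the number of epochs is purely an assumption:

# Train on the generator batches (the epoch count is an assumption,
# not a value taken from the original model)
history = model.fit(
    train_generator,
    validation_data=validation_generator,
    epochs=20)

# history.history then contains 'accuracy', 'val_accuracy', 'loss', and
# 'val_loss' per epoch, which is where figures like those below can be read off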

The Convolutional Neural Network architecture built here consists of 4 convolution layers, with details:

Convolution Layer 1 = 5x5 with 32 filters
Convolution Layer 2 = 3x3 with 64 filters
Convolution Layer 3 = 3x3 with 128 filters
Convolution Layer 4 = 3x3 with 256 filters

The activation functions used are ReLU in the hidden layers and Softmax on the output layer. After the training process is carried out, the trained model shows the following results:

The Convolutional Neural Network model achieves an accuracy and validation accuracy above 0.9, with a loss and validation loss below 0.1.
