FigTab is a data set of 48x48x3 pixel images of geometrical shapes. Each image consists of a 3x3 grid of 9 simple geometrical figures (ellipse, triangle, quadrilateral).
They were generated by an automatic Python script that randomizes the shapes. We chose 48x48x3 pixel images so that standard PC systems can run the exercises in this notebook (a modest graphics card is required, though). FigTab is intended to serve as a simple and clean data set for preliminary excursions, tutorials, or course projects in deep learning courses.
It is also intended to serve as a simple data set for benchmarking deep learning libraries and deep learning hardware (like GPU systems).
The simplest recognition task for such a data set is to count the number of occurrences of a particular shape in each image. In this notebook we will try to build an efficient neural network for detecting the exact number of triangles in each image. We therefore have 10 categories, to which we assign the following names:
0: 0-triangles
1: 1-triangles
2: 2-triangles
3: 3-triangles
4: 4-triangles
5: 5-triangles
6: 6-triangles
7: 7-triangles
8: 8-triangles
9: 9-triangles
Some examples of these categories can be seen in the diagram above. Our neural network will accept a 48x48x3 pixel color image as input and will output the exact number of triangles the image contains.
One can think of other recognition tasks, such as detecting all images that have the same shape along one of the diagonals, or in one row, etc. It would be interesting to see whether their solutions require the same effort. Of course, we leave these challenges as exercises and course projects for our dear students :-)
The FigTab data set consists of 2 HDF5 files, each containing 50,000 48x48x3 pixel color images. All images are unique; there are no duplicates.
Since these files are quite large, we also supply two smaller data sets for training and validation: "train.h5" contains 10,000 samples (1,000 from each class), and "test.h5" contains 5,000 samples (500 from each class).
You will need to install the h5py Python module. Reading and writing HDF5 files can be easily learned from the following tutorial: https://www.getdatajoy.com/learn/Read_and_Write_HDF5_from_Python
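If you want to take a quick peek into these files yourself, here is a minimal sketch; it assumes that each sample is stored under a pair of keys 'img_i' and 'cls_i' (the layout used by the get_img utility further below) and that indexing starts at 0:
import h5py
with h5py.File('train.h5', 'r') as f:
    print("number of keys:", len(f.keys()))           # two keys (image + class) per sample
    print("shape of first image:", f['img_0'].shape)  # expected: (48, 48, 3)
    print("class of first image:", f['cls_0'].value)  # integer label in 0..9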
The code in this IPython notebook was tested on Windows 10, Python 2.7/3.5 with keras, numpy, matplotlib and jupyter. The deep learning hardware we used was an NVIDIA GPU (GeForce GTX950) with cuDNN version 5103. Of course, the code can also be run on a CPU, but it will be significantly slower (not recommended).
To run the code in this notebook, you will also need to download a few private libraries which we use in other examples of this course:
Here are the Python modules and basic definitions we need for an example of how to use the FigTab data set:
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D, AveragePooling2D, ZeroPadding2D
from keras.optimizers import SGD
from keras.constraints import maxnorm
from keras.utils import np_utils
from keras.layers.advanced_activations import SReLU, LeakyReLU
from keras.utils.visualize_util import plot
from keras.layers.noise import GaussianNoise
from keras.preprocessing.image import ImageDataGenerator
import numpy as np
import matplotlib.pyplot as plt
import h5py  # used below to read images directly from the HDF5 files
from kerutils import *
from imgutils import *
from matplotlib import rcParams
rcParams['figure.figsize'] = 10,7
%matplotlib inline
class_name = {
    0: '0-triangles',
    1: '1-triangles',
    2: '2-triangles',
    3: '3-triangles',
    4: '4-triangles',
    5: '5-triangles',
    6: '6-triangles',
    7: '7-triangles',
    8: '8-triangles',
    9: '9-triangles',
}
nb_classes = len(class_name) # 10
classes = range(nb_classes)
# These are css/html styles for good looking ipython notebooks
from IPython.core.display import HTML
css = open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css));
The imgutils module contains a utility load_data for loading HDF5 files into memory (as NumPy arrays). This method accepts the names of your training and validation data set files, and it returns six NumPy arrays: X_train, y_train, Y_train, X_test, y_test, Y_test (the lowercase y arrays hold the class indices, and the uppercase Y arrays their one-hot categorical encodings).
It should be noted that, in addition to reading the images from the HDF5 file, the load_data method also performs some normalization of the image data, such as scaling it to the unit interval and centering it around the mean value. You can control these actions through optional arguments of this command. Please look at the source code to learn more.
X_train, y_train, Y_train, X_test, y_test, Y_test = load_data('train.h5', 'test.h5')
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'training samples')
print(X_test.shape[0], 'validation samples')
The original data of each image is a 48x48x3 matrix of integers from 0 to 255. We need to scale it down to floats in the unit interval. This is done automatically by the above load_data method, which applies the data_normalization procedure. Look at the imgutils module for more details.
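For reference, here is a minimal sketch of what such a normalization step could look like (the real implementation is the data_normalization routine inside imgutils; the exact scaling and centering used there may differ from this assumption):
def normalize_images(X):
    # Scale 0..255 integer pixels to floats in the unit interval,
    # then center the data around its mean (as described above).
    X = X.astype('float32') / 255.0
    X -= X.mean()
    return X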
Let's also write two small utilities for drawing samples of images, so we can inspect our results visually. We need to pull the images from the HDF5 files rather than from the X_train and X_test arrays, since the latter were normalized and no longer look like the originals. Here is a method for extracting image number i from an HDF5 file:
def get_img(hfile, i):
    f = h5py.File(hfile, 'r')
    img = np.array(f.get('img_' + str(i)))
    cls = f.get('cls_' + str(i)).value
    f.close()
    return img, cls
def draw_image(hfile, i):
    img, cls = get_img(hfile, i)
    plt.imshow(img, cmap='jet')
    plt.title(class_name[cls], fontsize=15, fontweight='bold', y=1.05)
    plt.show()
Let's draw image 18 from the train.h5 file as an example:
draw_image("train.h5", 18)
Sometimes we want to inspect a larger group of images in parallel, so we also provide a method for drawing a grid of consecutive images.
def draw_sample(hfile, n, rows=4, cols=4, imfile=None, fontsize=9):
    for i in range(0, rows*cols):
        ax = plt.subplot(rows, cols, i+1)
        img, cls = get_img(hfile, n+i+1)
        plt.imshow(img, cmap='jet')
        plt.title(class_name[cls], fontsize=fontsize, y=1.08)
        ax.get_xaxis().set_ticks([])
        ax.get_yaxis().set_ticks([])
    plt.subplots_adjust(wspace=0.8, hspace=0.1)
draw_sample("train.h5", 400, 3, 5)
We will start with a simple Keras model which combines one Convolution2D layer with two Dense layers. Although simple in terms of code, it is too expensive in terms of computation and hardware, as it contains about 70 million parameters! This is way too much and should be avoided in general, but we want to experiment with the common use of Dense layers and see why they are not a good fit for image processing. In general, Dense layers should be avoided as much as possible when dealing with image data; the common practice is to use Convolution and Pooling layers instead. These two types of layers are explained in more detail in the following two articles, which we recommend reading before you approach the code below:
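A quick back-of-envelope sketch (based on the layer sizes used in the model below) shows where those 70 million parameters come from: the Convolution2D layer outputs a 46x46x64 tensor, and connecting its flattened form to a Dense(512) layer alone costs roughly 69 million weights.
conv_out = 46 * 46 * 64           # output size of Convolution2D(64, 3, 3) on a 48x48x3 image (valid padding)
dense1 = conv_out * 512 + 512     # Flatten -> Dense(512): about 69.3 million parameters
dense2 = 512 * 256 + 256          # Dense(512) -> Dense(256)
dense3 = 256 * 10 + 10            # Dense(256) -> Dense(10)
print(dense1 + dense2 + dense3)   # roughly 69.5 million; the convolution filters and SReLU layers add the rest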
We now define our first model for recognizing FigTab shapes. Note that, unlike common practice, we decided to use the SReLU activation instead of the more popular relu activation. We ran several tests with relu, but SReLU seems to be more appropriate for FigTab. One of the amazing facts about SReLU is that it adapts itself during the learning process, rather than being a fixed function like most other activations. You may read more about it in the following papers:
nb_epoch = 100
batch_size = 32
input_shape = X_train.shape[1:]
model = Sequential(name="model_1")
model.add(Convolution2D(64, 3, 3, input_shape=input_shape))
model.add(SReLU())
model.add(Flatten())
model.add(Dense(512))
model.add(SReLU())
model.add(Dropout(0.4))
model.add(Dense(256))
model.add(SReLU())
model.add(Dropout(0.4))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
print(model.summary())
save_model_summary(model, "model_1_summary.txt")
write_file("model_1.json", model.to_json())
fmon = FitMonitor(thresh=0.09, minacc=0.999, filename="model_1_autosave.h5")
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
hist = model.fit(
    X_train,
    Y_train,
    batch_size=batch_size,
    nb_epoch=nb_epoch,
    shuffle=True,
    validation_data=(X_test, Y_test),
    verbose=0,
    callbacks=[fmon]
)
model_file = "model_1.h5"
print("Saving model to:", model_file)
model.save(model_file)
plot(model, to_file="model_1_scheme.png", show_layer_names=False, show_shapes=True)
show_scores(model, hist, X_train, Y_train, X_test, Y_test)
loss, accuracy = model.evaluate(X_train, Y_train, verbose=0)
print("Training: accuracy = %f ; loss = %f" % (accuracy, loss))
loss, accuracy = model.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy1 = %f ; loss1 = %f" % (accuracy, loss))
Although the training accuracy is quite high (99.82%!), the overall result is not good. The 10% gap from the validation accuracy is an indication of overfitting (which is also clearly noticeable in the accuracy and loss graphs above). Our model succeeds on the training set only and is not as successful on other data.
Before we search for a new model, let's take a quick look at some of the cases that our model missed. It may give us clues about the strengths and weaknesses of neural network models, and what we can expect from these artificial models.
The predict_classes method is helpful for getting a vector (y_pred) of the predicted classes of model 1. We compare y_pred to the expected true classes in y_test in order to extract the false cases:
y_pred = model.predict_classes(X_test)
true_preds = [(x,y) for (x,y,p) in zip(X_test, y_test, y_pred) if y == p]
false_preds = [(x,y,p) for (x,y,p) in zip(X_test, y_test, y_pred) if y != p]
print("Number of valid predictions: ", len(true_preds))
print("Number of invalid predictions:", len(false_preds))
The array false_preds consists of all triples (x,y,p) where x is an image, y is its true class, and p is the (incorrect) class predicted by the model.
Let's visualize a sample of 15 items:
for i, (x, y, p) in enumerate(false_preds[0:15]):
    plt.subplot(3, 5, i+1)
    plt.imshow(x, cmap='jet')
    plt.title("%d\ny: %s\np: %s" % (i, class_name[y], class_name[p]), fontsize=9, loc='left')
    plt.axis('off')
plt.subplots_adjust(wspace=0.8, hspace=0.6)
We see that in all of these cases our model missed the correct answer by exactly 1, which is very much like a human error.
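As a quick sanity check, we can see whether this off-by-one pattern holds over all the wrong predictions and not just the 15 shown above. A small sketch, reusing the false_preds list from the previous cell:
errors = [abs(int(y) - int(p)) for (x, y, p) in false_preds]  # |true class - predicted class|
print("distinct error sizes:", sorted(set(errors)))
print("error histogram:", np.bincount(errors))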
Let's try adding a second Convolution2D layer and reducing the width of the Dense layers. The number of parameters is still too high (32 million), but much lower than in model 1.
nb_epoch = 100
batch_size = 32
input_shape = X_train.shape[1:]
model = Sequential(name="model_2")
model.add(Convolution2D(64, 3, 3, input_shape=input_shape))
model.add(SReLU())
model.add(Convolution2D(64, 3, 3))
model.add(SReLU())
model.add(Flatten())
model.add(Dense(256))
model.add(SReLU())
model.add(Dropout(0.4))
model.add(Dense(64))
model.add(SReLU())
model.add(Dropout(0.4))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
print(model.summary())
save_model_summary(model, "model_2_summary.txt")
write_file("model_2.json", model.to_json())
fmon = FitMonitor(thresh=0.09, minacc=0.999, filename="model_2_autosave.h5")
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
hist = model.fit(
    X_train,
    Y_train,
    batch_size=batch_size,
    nb_epoch=nb_epoch,
    shuffle=True,
    validation_data=(X_test, Y_test),
    verbose=0,
    callbacks=[fmon]
)
model_file = "model_2.h5"
print("Saving model to:", model_file)
model.save(model_file)
plot(model, to_file="model_2_scheme.png", show_layer_names=False, show_shapes=True)
show_scores(model, hist, X_train, Y_train, X_test, Y_test)
Adding a second Convolution layer has reduced the overfitting gap from 10% to 5%, but this is not good enough yet. The gap between the training and validation loss graphs indicates that there is still room for improvement.
Before proceeding to our third model, let's take a moment to discuss one more issue. From the two models we learn that training accuracy can be quite high, but we should not be impressed as long as we fall short on the validation set. In some cases, however, we might be satisfied with what we got but would like to carry out further tests to make sure that the validation accuracy is not volatile. After all, our validation set ("test.h5") has only 5,000 samples, which might not be enough to trust our model.
Our imgutils module contains a special method check_data_set for testing our model on as many samples as we wish from the large FigTab repositories (the two files figtab1.h5 and figtab2.h5 described above, 100,000 samples in total). This method accepts three arguments: a Keras model, the name of an HDF5 data file, and an optional sample size (how many samples to draw from that file).
You may want to sample a few thousand images from each file in order to gain confidence in your model. Here are two examples of using this method, which show that the validation accuracy we got is trustworthy:
check_data_set(model, "figtab1.h5")
check_data_set(model, "figtab2.h5", sample=20000)
In both cases we get a validation score which is pretty close to the one we got on our small validation data set.
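The actual implementation of check_data_set lives in imgutils; the sketch below only illustrates the idea, assuming the 'img_i'/'cls_i' key layout used by get_img above and that unit-interval scaling is the only normalization required (if load_data also mean-centers your training data, the same centering must be applied here):
def check_data_set_sketch(model, hfile, sample=None):
    # Illustrative sketch only -- see imgutils for the real check_data_set.
    f = h5py.File(hfile, 'r')
    n = len(f.keys()) // 2                        # two keys (image + class) per sample
    indices = range(n) if sample is None else np.random.choice(n, sample, replace=False)
    hits = 0
    for i in indices:
        img = np.array(f['img_' + str(i)], dtype='float32') / 255.0
        cls = f['cls_' + str(i)].value
        pred = model.predict_classes(img[np.newaxis], verbose=0)[0]
        hits += int(pred == cls)
    f.close()
    print("accuracy on %s: %.4f" % (hfile, float(hits) / len(indices)))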
We will add a third Convolution layer and increase the filter size to 5x5 in the first two layers. In addition, we add three new MaxPooling2D layers (one after each Convolution2D). The immediate effect of these layers is a drastic reduction in the number of model parameters, from about 70 million to roughly 915K, about 1% of model 1. Even if we only get results similar to model 1, it would be considered a success and evidence of why Convolution and Pooling layers are the right kind of layers to use for image data.
nb_epoch = 100
batch_size = 32
input_shape = X_train.shape[1:]
model = Sequential(name="model_3")
model.add(Convolution2D(64, 5, 5, input_shape=input_shape))
model.add(SReLU())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, 5, 5))
model.add(SReLU())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, 3, 3))
model.add(SReLU())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512))
model.add(SReLU())
model.add(Dropout(0.5))
model.add(Dense(256))
model.add(SReLU())
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
print(model.summary())
save_model_summary(model, "model_3_summary.txt")
write_file("model_3.json", model.to_json())
fmon = FitMonitor(thresh=0.09, minacc=0.999, filename="model_3_autosave.h5")
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
hist = model.fit(
    X_train,
    Y_train,
    batch_size=batch_size,
    nb_epoch=nb_epoch,
    shuffle=True,
    validation_data=(X_test, Y_test),
    verbose=0,
    callbacks=[fmon]
)
model_file = "model_3.h5"
print("Saving model to:", model_file)
model.save(model_file)
plot(model, to_file="model_3_scheme.png", show_layer_names=False, show_shapes=True)
show_scores(model, hist, X_train, Y_train, X_test, Y_test)
loss, accuracy = model.evaluate(X_train, Y_train, verbose=0)
print("Training: accuracy = %f ; loss = %f" % (accuracy, loss))
loss, accuracy = model.evaluate(X_test, Y_test, verbose=0)
print("Validation: accuracy = %f ; loss = %f" % (accuracy, loss))
This is one of those rare occasions in which our neural network reaches 100% training and validation accuracy! The second important point to bear in mind is that the number of parameters in our network has dropped from 70 million to about 1 million. It means that with roughly 1/70 of the size of our first model we were able to do much better.
This undoubtedly supports the claim that Convolution and Pooling layers are a better fit for image processing than plain Dense layers.
To gain full confidence in our model, we will test it on our two large data sets figtab1.h5 and figtab2.h5 which contain 100,000 new samples.
check_data_set(model, "figtab1.h5")
check_data_set(model, "figtab2.h5")
Indeed, in both cases our model achieves a very high accuracy score: 99.984% and 99.996%. This is strong evidence for the validity and usefulness of our model when applied to new, unseen samples.
Let's take a look at the very few cases in which it fails.
X1, y1, Y1, X2, y2, Y2 = load_data('figtab1.h5', 'figtab2.h5')
y_pred = model.predict_classes(X1)
true_preds = [(x,y) for (x,y,p) in zip(X1, y1, y_pred) if y == p]
false_preds = [(x,y,p) for (x,y,p) in zip(X1, y1, y_pred) if y != p]
print("Number of valid predictions: ", len(true_preds))
print("Number of invalid predictions:", len(false_preds))
We have only 8 failures out of 50,000 samples! Let's draw them all:
for i, (x, y, p) in enumerate(false_preds):
    plt.subplot(2, 4, i+1)
    plt.imshow(x, cmap='jet')
    plt.title("%d\ny: %s\np: %s" % (i, class_name[y], class_name[p]), fontsize=9, loc='left')
plt.subplots_adjust(wspace=0.8, hspace=0.1)
In 5 of the 8 cases the model missed on the small classes (1 triangle). An interesting question (or challenge) would be: can we build a neural network that is 100% robust for all samples?
We will stop our experiments here and let you try to do better (good luck!). Can you preserve the 100% accuracy level with a much smaller number of parameters? A smaller number of neurons and layers is also desirable.
You can also experiment with other recognition tasks, such as the ones mentioned above, and see whether they require a different level of network complexity.