This post describes an easy way to use TensorFlow™ to make your own handwriting recognition engine. It is shown here as an example project.
The full source code can be found on GitHub: https://github.com/niektemme/tensorflow-mnist-predict/.
I am in the process of writing an article on Machine Learning. When writing on this topic it is hard to ignore TensorFlow™, a deep learning engine open sourced by Google. Deep learning is a branch of Machine Learning that borrows the concept of the human brain, in the form of neural networks, to solve problems such as image and speech recognition (Image 1). These problems are hard to solve when a computer is used ‘traditionally’, that is, as a big calculator.
The fact that TensorFlow is created by Google gives it a lot of traction, especially among the tech sites I follow. To learn more about TensorFlow I joined the local “Coffee & Coding” meetup in Amsterdam, which hosted “Get our hands dirty with TensorFlow”.
At the meetup we experimented with tutorials from the TensorFlow website. The tutorials themselves are clear and well written. To me it seems that these examples focus primarily on building and validating the model, but using the created models is not a priority. An exception to this is the ‘Image Recognition’ example. This is, however, one of the more complex examples, making it hard to use when you are not a Machine Learning expert.
While searching the internet (perhaps even using some AI from the same company that created TensorFlow) I saw that more people were trying to find out how to apply the created models to solve actual problems.
So I set myself the goal of using a trained model, starting from the easier TensorFlow MNIST tutorials on handwriting recognition.
The goal of this project is for my computer to recognize one of my own handwritten numbers using a model trained on the MNIST dataset. The MNIST dataset contains a large number of handwritten digits, each with a corresponding label (the correct number).
This gives the following tasks:
- Train a model using the MNIST dataset.
- Save the model from step 1. Probably to file.
- Load the saved model in a different python script.
- Prepare and load an image of my own handwriting.
- Correctly predict the number I have written.
1. Train a model using the MNIST dataset
How to train a model is clearly explained in the first two tutorials from the tensorflow.org website. I did not modify anything in these examples.
As expected, the model created from the second (expert) tutorial yielded better results in predicting the correct number from my handwriting.
2. Save the model
Saving the model is actually quite easy. It is clearly described in the TensorFlow documentation on saving and restoring variables.
It comes down to adding two lines of code to the python script explained in the TensorFlow tutorials.
Before initializing the TensorFlow (tf) variables you create a saver object, saver = tf.train.Saver(), and at the bottom of the script you call saver.save() on the session to write the model.ckpt file.
The documentation gives a good explanation on how to do this. I have created two python scripts that already include these lines to create a model.ckpt file.
– create_model_1.py uses the beginners MNIST tutorial
– create_model_2.py uses the expert MNIST tutorial
3. Load the saved model in a different python script
Loading the model back into a different python script is also clearly explained on the same page in the TensorFlow documentation.
First you have to initialize the same TensorFlow variables that you used to create the model file. Then you use the TensorFlow Saver function again to restore.
4. Prepare and load an image of my own handwriting
The image of my written number has to be formatted in the same way as the images from the MNIST database. If the formats don’t match, the model is effectively asked to classify something it was never trained on, and the prediction will be unreliable.
The MNIST website provides the following information:
– Images are normalized to fit in a 20×20 pixel box while preserving their aspect ratio.
– Images are centered in a 28×28 image.
– Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).
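To make "row-wise" concrete, here is a minimal sketch in plain Python. It assumes the image arrives as one flat list of 28 × 28 = 784 values. The helper names flat_index and to_rows are made up for this illustration and are not part of MNIST or TensorFlow.

```python
SIDE = 28  # MNIST images are 28x28 pixels


def flat_index(row, col, side=SIDE):
    """Position of pixel (row, col) in a row-wise flattened image."""
    return row * side + col


def to_rows(flat, side=SIDE):
    """Split a flat list of side*side values back into a list of rows."""
    if len(flat) != side * side:
        raise ValueError("expected %d values, got %d" % (side * side, len(flat)))
    return [flat[r * side:(r + 1) * side] for r in range(side)]


# A blank image (all background, value 0) with a single foreground
# pixel (value 255) at row 4, column 10.
pixels = [0] * (SIDE * SIDE)
pixels[flat_index(4, 10)] = 255

rows = to_rows(pixels)
print(len(pixels), rows[4][10])  # 784 255
```

Row 0 occupies positions 0 to 27 of the flat list, row 1 occupies positions 28 to 55, and so on.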
For the image manipulation I used the Python Imaging Library (PIL), in the form of its maintained fork Pillow. It is easily installed with pip install Pillow, or look at the Pillow documentation for other installation options.
To get the image pixel values I perform the following steps. The code snippet of the imageprepare() function shows the code for all the steps.
- Load the image of my handwritten number.
- Convert the image to grayscale (mode ‘L’)
- Determine which dimension of the original image is the largest
- Resize the image so that the largest dimension (either the width or the height) is 20 pixels and the smallest dimension scales in the same ratio.
- Sharpen the image. This dramatically improves the result.
- Paste the image on a white 28×28 pixel canvas. The largest dimension is always 20 pixels, so it is placed 4 pixels from the top or side (4 + 20 + 4 = 28). The smallest dimension is positioned at half the difference between 28 and its new, scaled size.
- Get pixel values of the new image (canvas + centered image).
- Normalize the pixel values to a value between 0 and 1 (this is also done in the TensorFlow MNIST tutorials), where 0 is white and 1 is pure black. The pixel values obtained in the previous step are the opposite, with 255 as white and 0 as black, so the values have to be inverted. The following formula both inverts and normalizes: (255-x)*1.0/255.0
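The arithmetic in the steps above can be checked without any imaging library. The following is a rough sketch in plain Python, not the imageprepare() function itself. The names scaled_size, paste_offset and normalize are invented for this example, and the minimum size of 1 pixel is an assumption added to guard against extremely thin images.

```python
BOX = 20      # the digit is scaled to fit a 20x20 box
CANVAS = 28   # and centered on a 28x28 canvas


def scaled_size(width, height, box=BOX):
    """Scale so the largest dimension becomes `box`, keeping the aspect ratio."""
    if width > height:
        new_w = box
        new_h = max(1, int(round(box * height / float(width))))
    else:
        new_h = box
        new_w = max(1, int(round(box * width / float(height))))
    return new_w, new_h


def paste_offset(new_w, new_h, canvas=CANVAS):
    """Top-left corner that centers the scaled image on the canvas."""
    return (canvas - new_w) // 2, (canvas - new_h) // 2


def normalize(pixels):
    """Map grayscale values (255 white, 0 black) to 0.0 white .. 1.0 black."""
    return [(255 - x) / 255.0 for x in pixels]


size = scaled_size(300, 150)    # a wide 300x150 scan
print(size)                     # (20, 10)
print(paste_offset(*size))      # (4, 9)
print(normalize([255, 0]))      # [0.0, 1.0]
```

For the 300×150 example the width becomes 20 pixels and sits 4 pixels from the left edge, while the height becomes 10 pixels and sits (28 - 10) / 2 = 9 pixels from the top.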
I am cheating a bit, because I am supplying a cropped image. I have not yet adapted the function to crop automatically. You can also simply use a vector-based tool to create the handwritten image.
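One possible way to crop automatically is to find the bounding box of all "ink" pixels. This is only a sketch of that idea and is not part of the scripts in the repository. It works on a plain list of rows with grayscale values (255 white, 0 black). The threshold of 128 and the function names ink_bounding_box and crop are assumptions chosen for the illustration.

```python
def ink_bounding_box(rows, threshold=128):
    """Return (left, top, right, bottom) around all pixels darker than threshold.

    right and bottom are exclusive, so the box can be used directly for slicing.
    Returns None when the image contains no ink at all.
    """
    ink = [(r, c)
           for r, row in enumerate(rows)
           for c, value in enumerate(row)
           if value < threshold]
    if not ink:
        return None
    ys = [r for r, _ in ink]
    xs = [c for _, c in ink]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def crop(rows, box):
    """Cut the given box out of a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in rows[top:bottom]]


# A white page, 6 pixels wide and 5 pixels tall, with a black stroke
# that is 2 pixels wide and 3 pixels tall.
page = [[255] * 6 for _ in range(5)]
for r in range(1, 4):
    for c in range(2, 4):
        page[r][c] = 0

box = ink_bounding_box(page)
print(box)                             # (2, 1, 4, 4)
cropped = crop(page, box)
print(len(cropped[0]), len(cropped))   # 2 3
```

The (left, top, right, bottom) order matches the box tuple that Pillow's Image.crop() expects, so the same idea should carry over to a real image.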
The argv variable passed to the imageprepare() function is the file path.
5. Predict the written number
Predicting the number is now relatively simple using the predict function, as explained by ‘Pannus’ in the TensorFlow GitHub discussion on issue 97.
Following the documentation on restoring variables, the code that loads the model and predicts the integer from the prepared pixel values depends on which tutorial produced the model.
When using the beginners MNIST tutorial (tutorial 1), the predicted label is taken from the variable y and prediction.eval only needs the pixel values.
When using the expert MNIST tutorial (tutorial 2), the predicted label is taken from the variable y_conv instead, and prediction.eval requires an additional argument, keep_prob: 1.0.
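In both tutorials the network produces ten scores, one per digit, and the predicted label is the index of the largest score (the tutorials use tf.argmax for this). The keep_prob value is the dropout keep probability of the expert model, and feeding 1.0 switches dropout off while predicting. The sketch below mimics both ideas in plain Python. The scores are made up, and softmax, predict_digit and dropout are simplified stand-ins written for this example, not TensorFlow functions.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_digit(scores):
    """Index of the largest score, i.e. the predicted digit 0-9."""
    return max(range(len(scores)), key=lambda i: scores[i])


def dropout(values, keep_prob, rng=random):
    """Keep each value with probability keep_prob, scaling survivors by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


# Made-up output scores for the digits 0 to 9.
scores = [0.1, 0.3, 0.2, 2.5, 0.0, 0.4, 0.1, 1.9, 0.2, 0.3]
probs = softmax(scores)

print(predict_digit(scores))     # 3
print(predict_digit(probs))      # 3
print(dropout([1.0, 2.0], 1.0))  # [1.0, 2.0]
```

Softmax does not change which score is largest, so the predicted digit is the same before and after it. With keep_prob set to 1.0 every value is kept unchanged, which is why the prediction is deterministic.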
The code snippet below shows the complete predictint() function to predict the correct integer and the main function to tie it all together (expert mode). The predictint() function takes the resulting pixel values from the imageprepare() function as input.
These complete scripts can also be used.
– predict_1.py uses the model from the first MNIST tutorial
– predict_2.py uses the model from the second (expert) MNIST tutorial
Here are some of the numbers I tested using the neural network from the expert tutorial (model 2). The result is reasonably good. Funnily enough, it makes mistakes a human could make (except maybe mistaking a 7 for a 3). I guess some more fine-tuning is needed.