In this article, I provide a 30,000-foot view of Neural Networks. The post is written for absolute beginners who are trying to dip their toes into Machine Learning and Deep Learning.
We will keep this short, sweet and math-free.
This post is part of the series on Deep Learning for Beginners, which consists of the following tutorials:
- Neural Networks: A 30,000-Foot View for Beginners
- Installation of Deep Learning frameworks (Tensorflow and Keras with CUDA support )
- Introduction to Keras
- Understanding Feedforward Neural Networks
- Image Classification using Feedforward Neural Networks
- Image Recognition using Convolutional Neural Network
- Understanding Activation Functions
- Understanding AutoEncoders using Tensorflow
- Image Classification using pre-trained models in Keras
- Transfer Learning using pre-trained models in Keras
- Fine-tuning pre-trained models in Keras
- More to come . . .
Neural Networks as Black Box
We will start by treating a Neural Network as a magical black box. You don’t know what’s inside the black box. All you know is that it has one input and three outputs. The input is an image of any size, color, or kind. The three outputs are numbers between 0 and 1, labeled “Cat”, “Dog”, and “Other”. The three numbers always add up to 1.
Understanding the Neural Network Output
The magic it performs is very simple. If you input an image to the black box, it will output three numbers. A perfect neural network would output (1, 0, 0) for a cat, (0, 1, 0) for a dog and (0, 0, 1) for anything that is not a cat or a dog. In reality, though, even a well trained neural network will not give such clean results. For example, if you input the image of a cat, the number under the label “Cat” could say 0.97, the number under “Dog” could say 0.01 and the number under the label “Other” could say 0.02. The outputs can be interpreted as probabilities. This specific output means that the black box “thinks” there is a 97% chance that the input image is that of a cat and a small chance that it is either a dog or something it does not recognize. Note that the output numbers add up to 1.
This particular problem is called image classification; given an image, you can use the label with the highest probability to assign it a class ( Cat, Dog, Other ).
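As a quick sketch, here is how you might interpret such an output in Python (the numbers are the hypothetical values from the example above, not the output of a real network):

```python
# Hypothetical output of the black box for one input image.
# Each number is between 0 and 1, and together they add up to 1,
# so we can read them as probabilities.
labels = ["Cat", "Dog", "Other"]
output = [0.97, 0.01, 0.02]

# Image classification: assign the class with the highest probability.
predicted = labels[output.index(max(output))]
print(predicted)  # Cat
```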
Understanding the Neural Network Input
Now, you are a programmer, and you are thinking you could use floats or doubles to represent the outputs of the Neural Network.
How do you input an image?
Images are just arrays of numbers. A 256×256 image with three channels is simply an array of 256x256x3 = 196,608 numbers. Most libraries you use for reading the image will read a 256×256 color image into a contiguous block of 196,608 numbers in memory.
With this new knowledge, we know the input is slightly more complicated. It is actually 196,608 numbers. Let us update our black box to reflect this new reality.
I know what you are thinking: what about images that are not 256×256? Well, you can always convert any image to size 256×256 using the following steps.
- Non-Square aspect ratio: If the input image is not square, you can resize the image so that the smaller dimension is 256. Then, crop 256×256 pixels from the center of the image.
- Grayscale image: If the input image is not a color image, you can create a 3 channel image by copying the grayscale image into three channels.
People use many different tricks to convert an image to a fixed size ( e.g. 256×256 ), but since I promised to keep it simple, I won’t go into those tricks here. The important thing to note is that any image can be converted into a fixed-size image, even though we lose some information when we crop and resize it.
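To make the two steps above concrete, here is a rough sketch using only NumPy (nearest-neighbor resizing for simplicity; real code would use a library such as OpenCV or Pillow for better-quality interpolation):

```python
import numpy as np

def to_fixed_size(image, size=256):
    """Convert any image array to a size x size x 3 array.

    A rough sketch: nearest-neighbor resize plus a center crop.
    Real code would use cv2.resize or PIL.Image.resize instead.
    """
    # Grayscale image: copy the single channel into three channels.
    if image.ndim == 2:
        image = np.stack([image] * 3, axis=-1)

    h, w = image.shape[:2]
    # Resize so the smaller dimension becomes `size`.
    scale = size / min(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = image[rows[:, None], cols]

    # Crop size x size pixels from the center.
    top = (new_h - size) // 2
    left = (new_w - size) // 2
    return resized[top:top + size, left:left + size]

# A 300x500 grayscale image becomes a 256x256x3 image.
gray = np.random.randint(0, 256, (300, 500), dtype=np.uint8)
print(to_fixed_size(gray).shape)  # (256, 256, 3)
```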
What does it mean to train a Neural Network ?
The black box has knobs that can be used to “tune” it. In technical jargon, these knobs are called weights. When the knobs are in the right position, the neural network gives the right output more often for different inputs.
Training the neural net simply means finding the right knob settings ( or weights ).
How do you train a Neural Network?
If you had this magical black box but did not know the right knob settings, it would be a useless box.
The good news is that you can find the right knob settings by “training” the Neural Network.
Training a Neural Network is very similar to training a little child. You show the child a ball and tell her that it is a “ball”. When you do that many times with different kinds of balls, the child figures out that it is the shape of the ball that makes it a ball, and not the color, texture, or size. You then show the child an egg and ask, “What is this?” She responds, “Ball.” You correct her: it is not a ball, but an egg. When this process is repeated several times, the child is able to tell the difference between a ball and an egg.
To train a Neural Network, you show it several thousand examples of the classes ( e.g. Cat, Dog, Other ) you want it to learn. This kind of training is called Supervised Learning because you are providing the Neural Network an image of a class and explicitly telling it that it is an image from that class.
To train a neural network, we therefore need three things.
- Training data : Thousands of images of each class and the expected output. For example, for all images of cats in this dataset, the expected output is (1, 0, 0).
- Cost function : We need a way to tell whether the current knob setting is better than the previous one. A cost function sums up the errors made by the neural network over all images in the training set. For example, a common cost function is the sum of squared errors (SSE). If the expected output for a cat image is (1, 0, 0) and the neural network outputs (0.37, 0.5, 0.13), the squared error made by the neural network on this particular image is (1 − 0.37)² + (0 − 0.5)² + (0 − 0.13)² = 0.6638. The total cost over all images is simply the sum of the squared errors over all images. The goal of training is to find the knob settings that minimize the cost function.
- How to update the knob settings: Finally we need a way to update the knob settings based on the error we observe over all training images.
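The cost computation for the example above takes only a couple of lines (SSE for a single image; the total cost would sum this quantity over every image in the training set):

```python
# Sum of squared errors (SSE) for the example in the text:
# expected output (1, 0, 0) for a cat, network output (0.37, 0.5, 0.13).
expected = [1.0, 0.0, 0.0]
predicted = [0.37, 0.5, 0.13]

error = sum((e - p) ** 2 for e, p in zip(expected, predicted))
print(round(error, 4))  # 0.6638
```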
Training a neural network with a single knob
Let’s say we have a thousand images of cats, a thousand images of dogs, and a thousand images of random objects that are not cats or dogs. These three thousand images are our training set. If our neural network has not been trained, it will have some random knob settings and when you input these three thousand images, the output will be right only one in three times.
For the purpose of simplicity, let’s say our neural network has just one knob. Since we have just one knob, we could test a thousand different knob settings spanning the range of expected knob values and find the best knob setting that minimizes the cost function. This would complete our training.
However, real-world neural networks do not have a single knob. For example, VGG-Net, a popular neural network architecture, has 138 million knobs!
Training a neural network with multiple knobs
When we had just one knob, we could easily find the best setting by testing all (or a very large number of) possibilities. This quickly becomes unrealistic: even if we had just three knobs, we would have to test a billion settings. Imagine the number of possibilities with something as large as VGG-Net. Needless to say, a brute-force search for the optimal knob settings is not feasible.
Fortunately, there is a way out. When the cost function is convex ( i.e. shaped like a bowl ), there is a principled way to iteratively find the best weights using a method called Gradient Descent.
Gradient Descent
Let’s go back to our Neural Network with just one knob and assume that our current estimate of the knob setting ( or weight ) is w. If our cost function is shaped like a bowl, we can compute the slope of the cost function at w and move one step closer to the optimal knob setting w*. This procedure is called Gradient Descent because we are moving down (descending) the curve based on the slope (gradient). When you reach the bottom of the bowl, the gradient or slope goes to zero, and that completes your training. These bowl-shaped functions are technically called convex functions.
How do you come up with the first estimate? You can pick a random number.
Note: If you are using popular neural network architectures like GoogLeNet or VGG-Net, you can use weights pre-trained on ImageNet instead of picking random initial weights, and get much faster convergence.
Gradient Descent works similarly when there are multiple knobs. For example, when there are two knobs, the cost function is a bowl in 3D. If we place a ball on any part of this bowl, it will roll down to the bottom following the path of the maximum downward slope. This is exactly how gradient descent works. Also, note that if you let the ball roll down at full velocity, it will overshoot the bottom and take much more time to settle down at the bottom compared to a ball that is rolled down slowly in a more controlled manner. Similarly, while training a neural network, we use a parameter called the learning rate to control convergence of cost to its minimum.
When we have millions of knobs (weights), the shape of the cost function is a bowl in this higher dimensional space. Even though such a bowl is impossible to visualize, the concept of slope and Gradient Descent works just as well. Therefore, Gradient Descent allows us to converge to a solution thus making the problem tractable.
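Here is gradient descent in miniature, on a made-up one-knob bowl-shaped cost (w − 3)² whose slope is 2(w − 3); the learning rate plays the role of rolling the ball down slowly rather than at full velocity:

```python
# Gradient descent with a single knob (weight) w.
# Bowl-shaped cost: C(w) = (w - 3)^2, with its minimum at w = 3.
# Slope (gradient): dC/dw = 2 * (w - 3).
w = 0.0                 # initial estimate (could be a random number)
learning_rate = 0.1     # controls how big each downhill step is

for _ in range(100):
    gradient = 2 * (w - 3)
    w -= learning_rate * gradient  # step in the downhill direction

print(round(w, 4))  # 3.0
```

With a learning rate that is too large, the steps overshoot the bottom of the bowl, just like the ball rolled at full velocity.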
Backpropagation
There is one piece left in the puzzle. Given our current knob settings, how do we know the slope of the cost function?
First, let’s remember that the cost function, and therefore its gradient, depends on the difference between the true output and the current output for all images in the training set. In other words, every image in the training set contributes to the final gradient calculation based on how badly the Neural Network performs on it.
The algorithm used for estimating the gradient of the cost function is called Backpropagation. We will cover backpropagation in a future post, and yes, it does involve calculus. You might be surprised, though, that backpropagation is simply a repetitive application of the chain rule that you might have learned in high school.
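As a tiny preview, here is the chain rule at work on a “network” with a single knob w (a toy setup, not a real backpropagation implementation): the output is y = w·x and the cost is (y − t)², so the slope with respect to w is 2(y − t)·x.

```python
# One training example: input x = 2, expected output t = 6.
x, t = 2.0, 6.0
w = 1.0           # current knob setting

y = w * x                 # forward pass: network output
cost = (y - t) ** 2       # squared error for this example
grad = 2 * (y - t) * x    # backward pass: chain rule gives the slope
print(grad)  # -16.0
```

The negative slope tells us to increase w, which indeed moves y = w·x closer to the target of 6.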
Subscribe & Download Code
If you liked this article and would like to receive a free Computer Vision Resource Guide, please subscribe. In our newsletter, we share OpenCV tutorials and examples written in C++/Python, and Computer Vision and Machine Learning algorithms and news. You will also receive free access to all the code I have written for this blog.
Daniel Lélis Baggio says
That was a nice and clean introduction for neural networks and computer vision. Congratulations! And the amount of math is great even though you promised not to use it 😉
Satya Mallick says
Thanks Daniel.
Kartik Sareen says
very well explained….
Douglas Jones says
Satya, one of the better descriptions I have read. Certainly the most readily understandable! I will remember this explanation as I try to absorb everything next week at GTC. This year it seems to be all about AI, Deep Learning, Neural Networks and VR!
Satya Mallick says
Thanks, Doug. Yes, those are all taking off in a big way.
Massimo Perrone says
Hello Satya,
thanks for the nice introduction. My question is: who assures that the cost function is convex?
Thanks again.
Massimo
Satya Mallick says
Thanks, Massimo. We have chosen the cost function to be the sum of squared errors, ( output – estimated_output )^2, which is convex. However, you don’t always need a convex function. You should see this excellent video by Yann LeCun, “Who is afraid of non-convex loss functions”: http://videolectures.net/eml07_lecun_wia/
mohamad hadad says
nice intro. Thanks you made it crystal clear
Satya Mallick says
Thanks, Mohamad.
jake says
I have read quite a few “simplified” explanations of NN’s. This is by far the best! Thank you!
Satya Mallick says
Thanks, Jake.
Mostafa says
Thanks a lot Satya.
It was a nice introduction. However, I still don’t understand the part on calculating the slop and how backpropagation helps us there. Looking forward to your next post.
Satya Mallick says
Thanks, Mostafa. I have not explained backprop in this post. The take away is the following — if the cost function is convex ( bowl shaped ) wrt the weights, then you can use the slope to find the direction in which to move so as to find the minimum of the cost function. Yes, more posts are coming soon.
Luke Costantino says
Great article as usual!
I hope you will cover also CNNs and so on in your future articles.
Satya Mallick says
Thanks, Luke. Yes that is the idea. Want to build on this post and gradually introduce difficult concepts.
Waheed Rafiq says
Best I have read ,and it was dyslexic friendly. well done and thanks for all your support
Satya Mallick says
Thanks a bunch, Waheed.
saif mulla says
A very simple and naive way to introduce neural networks, good work, Thanks!
Satya Mallick says
Thanks!
Duke Yang says
Thanks for this nice article. it helps me. 🙂
Satya Mallick says
Thanks, Duke
momo says
How to get code please ?
Satya Mallick says
There is no code for this post. For code related to other posts, you can subscribe and confirm your email to receive a link
momo says
I just subscribed. Should I wait to get the link for other posts?
Satya Mallick says
Yes, if you clicked on the confirmation email, you should receive an email in about 5 to 10 minutes. If you don’t, please send me an email at [email protected]
عبدالله says
fantastic article and a great explanation.
I have a question regarding Deep Learning (e.g. Tensorflow) and OpenCV: can we export the results of the image classification from Tensorflow and import them into OpenCV?
The purpose of this move is to gain the power and accuracy of deep learning instead of using other training models such as Haar and LBP, while still using the OpenCV library.
Renjithms Kulathoor says
Hi, I am Renjith M S. When will the course for computer vision start? What will be the fee structure? Is it an online course? I am seriously interested in doing the course; please inform me when it begins.
Satya Mallick says
Hi Renjithms,
Sorry I missed this comment. The course is open now.
https://courses.learnopencv.com/p/computer-vision-for-faces
Satya
Cayman Cheng says
This is really the most helpful introduction I’ve ever seen on the Internet. No one seems able to clearly explain this thing. Thank you Mr. Mallick.
Satya Mallick says
Thanks!
vial wuya says
thank you so much! sir
Satya Mallick says
🙂 You’re welcome.
Jonase says
I really don’t know what to say rather than thank you very much for your effort.
Unlike many other tutorials those walk you through and teach you nothing, this article gives you the idea of what things are and how they work. Thanks once again.
Satya Mallick says
Thanks, Jonase.
Rafi Kusuma says
Good overview, thank you. Now I really understand.
Satya Mallick says
Thanks, Rafi
Shoaib Alauddin says
Awesome introduction from very basic to understanding level.
Satya Mallick says
Thanks, Shoaib.
Ravi srirangam says
Hello Satya,
Thanks for the great introduction to NN. This helped me to understand the concepts in a short amount of time.
Satya Mallick says
Thanks, Ravi.
gllm says
going through this article felt like a gradient descent with a perfectly tuned learning rate, thank you for making it so clear 🙂
Satya Mallick says
Thanks for the kind words 🙂