The sexy thing about Deep Learning

Soundtrack: Franz Liszt – La Campanella

Deep learning has been one of the most exciting topics ever since Artificial Intelligence joined the human intelligence scene. Imagine an acquaintance came to you one day and gave you a pill to feel better, and suddenly you could use all the data you have stored since you were born, even a random piece of information your eyes captured while you were sitting on a bus passing by some high-tech store.

maxresdefault.jpg
Mysterious, super-smart-ish, unhealthy pill from the movie “Limitless”

Hmm, the idea of limitless knowledge does sound attractive, and the efficient management of the information we have acquired even more so. That, ladies and gentlemen, is what deep learning could put into place. It’s like building a brain out of many neurons arranged in layers, a condensed version of our own. Rather than storing every example it was trained on, it extracts patterns from the data and, eventually, gains enough experience to decide by itself what new data could represent.

Artificial-Intelligence-Neural-Network-Nodes-670x440
Human brain vs. Artificial Neural Network

Let’s take a real-life use case. Each one of us is going to need glasses at some point, whether it’s for fashion or just to be able to, ehum, SEE. I personally have never had good taste when it comes to fashion items, and when I gave it a shot anyway, I received some negative feedback (Ouch)!!

So I looked for an online tool to help me decide which glasses suit my face but, to my surprise, I only found blog posts and face-diameter calculations (euh…). As a big fan of automating random tasks, I was disappointed that nothing could give me a customized, automatic recommendation based on, for example, my “portrait”. So I decided to create a virtual brain that could decide for me what my eyes should wear.

What was that attractive thing again, the one that can learn from information and make decisions based on it? DEEP LEARNING, right!! No need to calculate the diameter of my face, determine the shape of my eyes, or stand in front of a mirror for minutes trying on dozens of frames instead of a couple. Just snap a picture and get my face shape!!

Now, as any good professor knows, having a good methodology isn’t enough: to feed a student’s curiosity, you NEED to either have an answer to all their questions or a cool way to guide their thoughts. When it comes to training a deep learning model, your spices are your dataset. Basically, the crucial task in training a neural network is preparing the data we feed it. Think of it like teaching: providing the information is not enough. Even if your model is smart and efficient, the way you feed it the data contributes a lot to its performance.

Let’s go back now to the eyewear recommendation system. Let’s agree that it’s a computer vision problem: given a person’s face, we aim to determine which kind of frame suits her/him. To do that, we don’t need a huge dataset of faces with glasses; we need a decent dataset of faces classified into the most common shapes.

 

splits
Face shapes to be predicted

In my case, I built a 3-class dataset without any specific split, but in most cases you’d need disjoint splits for training, validation and maybe testing, so that you can check the model is really learning by following the evaluation metrics on data it was not trained on. For a classification problem, we follow the accuracy and the loss through the training and validation phases (easy peasy thanks to TensorBoard if you’re using TensorFlow, or Keras with TensorFlow as backend). For other problems, other metrics might be needed for a better evaluation, so accuracy won’t always be your guide, but that’s another story.
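If you want to see what disjoint splits and the two metrics look like in practice, here is a minimal sketch in plain Python. It is not the code I used for this project: the file names, the class names (`shape_0`, `shape_1`, `shape_2`), the 70/15/15 proportions and the function names are all assumptions made up for illustration.

```python
import math
import random


def split_dataset(items, val_percent=15, test_percent=15, seed=42):
    """Shuffle items and cut them into disjoint train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = len(shuffled) * val_percent // 100
    n_test = len(shuffled) * test_percent // 100
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def accuracy(predicted_probs, true_labels):
    """Fraction of samples whose highest-probability class is the true one."""
    correct = 0
    for probs, label in zip(predicted_probs, true_labels):
        best_class = max(range(len(probs)), key=probs.__getitem__)
        if best_class == label:
            correct += 1
    return correct / len(true_labels)


def cross_entropy(predicted_probs, true_labels, eps=1e-12):
    """Mean negative log-probability given to the true class (the loss)."""
    total = 0.0
    for probs, label in zip(predicted_probs, true_labels):
        total -= math.log(max(probs[label], eps))
    return total / len(true_labels)


# Hypothetical file names: 3 classes x 100 pictures each.
files = [f"shape_{c}_{i:03d}.jpg" for c in range(3) for i in range(100)]
train, val, test = split_dataset(files)

# Made-up model outputs for three validation pictures (one row per picture).
example_probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
example_labels = [0, 1, 1]
example_accuracy = accuracy(example_probs, example_labels)
example_loss = cross_entropy(example_probs, example_labels)
```

The shuffle uses a fixed seed so the same pictures land in the same split on every run, which keeps the validation numbers comparable between experiments. A per-class (stratified) split would be a reasonable refinement when classes are small.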

The suitable kind of neural network for pictures is a CNN: not the news channel, but a Convolutional Neural Network. If you’ve ever worked with pictures, you’ll know that they usually need filters to be processed, and if you’re into maths (nerd alert) you’ll recognize the convolution part. The vocab is a bit different, though: in a CNN the filters are called kernels and what they produce are features, but that’s also a different story. For my mini experimentation project, I used MobileNet, a small, efficient, mobile-oriented model created by Google. You can check their Tutorial Here.
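For the curious, here is roughly what “a kernel producing features” means, as a minimal sketch in plain Python. The toy picture, the kernel values and the function name `convolve2d` are assumptions for illustration. A real CNN learns its kernel values during training instead of having them written by hand, and has many kernels per layer.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x6 grayscale "picture": dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Hand-written kernel that reacts where brightness rises from left to right.
vertical_edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = convolve2d(image, vertical_edge_kernel)
print(features)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The feature map is zero over the flat regions and large only around the dark-to-bright edge, which is the sense in which a kernel “detects” a pattern.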

Each of my classes contained 100 pictures, which is not enough to train a DEEP learning computer vision model from scratch. So how could the classification work out?

2 words, 16 letters: Transfer Learning

Actually, there are lots of pre-trained models out there, which make it easier to train a neural network on small data with standard hardware (your laptop’s CPU, for instance) and still get decent results. The idea is to reuse a model that holds information from its training on a huge dataset such as ImageNet. That information is held within the weights, so instead of initializing our model randomly, we start from weights that cut the cost of training for us.
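To make the idea less abstract, here is a toy sketch of one common flavour of transfer learning: keep the pre-trained weights frozen and train only a small new classifier on top of the features they produce. Everything in it is an assumption for illustration: the two-by-two `PRETRAINED_WEIGHTS` stand in for a real network, the six data points are made up, and the logistic-regression head is just the simplest classifier I could pick. It is not the tutorial’s code.

```python
import math

# Stand-in for weights learned elsewhere on a large dataset (hypothetical
# values; a real model such as MobileNet has millions of them).
PRETRAINED_WEIGHTS = ((1.0, -1.0), (-1.0, 1.0))


def extract_features(sample):
    """Frozen layer: linear map with the pre-trained weights, then ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, sample)))
            for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, epochs=200, lr=0.5):
    """Fit a new logistic-regression head with batch gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for f, y in zip(features, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias) - y
            for k, v in enumerate(f):
                grad_w[k] += error * v / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict(sample, weights, bias):
    """Probability of class 1 for a raw sample, using frozen features."""
    f = extract_features(sample)
    return sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)


# Tiny made-up dataset: label 1 when the first value is the larger one.
samples = [(2.0, 1.0), (3.0, 0.5), (1.5, 0.0), (0.0, 1.0), (0.5, 2.5), (1.0, 3.0)]
labels = [1, 1, 1, 0, 0, 0]

# The frozen layer is applied once; only the head receives gradient updates.
features = [extract_features(s) for s in samples]
head_weights, head_bias = train_head(features, labels)
```

Because the frozen layer already turns the raw values into useful features, six examples are enough for the head to separate the two classes. With a real pre-trained network the proportions are different, but the division of labour is the same.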

Let’s get to some results:

I made my poem using Google’s tutorial mentioned earlier, which is TensorFlow for Poets *Taadaaaa*. I built the dataset with pictures from Google Images, Flickr and the Helen dataset (a labelled dataset of human faces, although I didn’t use its labels because I didn’t need them for classification). Then I tried predicting my face shape using this picture:

selfie

And I got these results:

screen_me

 

So, with the help of transfer learning, it turned out that I have a round face. I can now find myself suitable eyewear and, why not, maybe get the “sexy” side of deep learning. Haha!

If you have any questions or remarks, I would be glad to discuss them. Please feel free to send me an e-mail: nesrine.boussenna@gmail.com

Cheers! 

Nessie.