What exactly are we talking about?
Let’s go straight to the point (which you already know): users and companies share more and more data. By data, we mean text, images, sound, video — basically any piece of information that can be stored. The amount of data to be processed nowadays is such that there’s no way humans can handle it all. At the same time, having more data means we can make a machine learn more, because we have more examples for it to train on. These two facts naturally led to the rise of Artificial Intelligence (AI) in recent years.
Not only do we have more data, but recent advances in computing power have unlocked Machine Learning (ML) and Deep Learning (DL) in practice. Indeed, the theory behind ML and DL has existed for a long time, over 50 years, but applying it was limited by the poor computing power of our Central Processing Units (CPUs). Ever since we realized that Graphics Processing Units (GPUs), initially designed for graphics rendering, could make these calculations more than 10 times faster, and especially since their prices went down, DL has seen significant growth.
Machine or Deep Learning?!
DL belongs to the broader field of ML. ML basically refers to a wide range of complex logic- and rule-based systems. These technologies allow the computer to imitate, in a superficial manner, human-like thinking. Whether AI can really produce ‘intelligent’ thinking is a whole debate. Unfortunately for sci-fi fans, computers are still far from matching human thinking. Computers can compute, calculate, and analyze complex, highly non-linear relationships, but they cannot produce intelligence — at least not yet.
ML encompasses techniques that enable computers to improve at performing specific tasks. Typical tasks include:
- Classification: is this an image of a dog, or of a chair?
- Regression: how many customers will visit my website next month?
ML techniques will be more or less effective depending on the data that comes in and the type of learning task. In traditional ML, the data that comes in is often a digest of the raw dataset, referred to as features. Indeed, you’ll want to manually emphasize some parts of your data, and possibly remove others, in order to feed the training task with what you think will really be discriminant for the prediction. For example, suppose you want to predict the number of visitors you’ll have in August 2017 given the historical data of the last 3 years. In traditional ML, you’d have to handcraft what you think will be discriminant in order to help the model learn (e.g. detect the influential lag values, compute seasonality, trend and stationarity measures, descriptive statistics…). That’s the feature extraction process.
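To make this concrete, here is a minimal sketch of that hand-crafted feature-extraction step. The visitor counts, feature names, and lag choices below are all illustrative assumptions, not real data:

```python
def extract_features(monthly_visits, month_index):
    """Build a hand-crafted feature vector for one month from the raw series."""
    lag_1 = monthly_visits[month_index - 1]            # last month's traffic
    lag_12 = monthly_visits[month_index - 12]          # same month last year (seasonality)
    trend_3m = sum(monthly_visits[month_index - 3:month_index]) / 3.0  # 3-month moving average
    return {"lag_1": lag_1, "lag_12": lag_12, "trend_3m": trend_3m}

# Three years of monthly visitor counts (illustrative numbers)
visits = [100, 110, 120, 130, 125, 140, 160, 170, 150, 130, 120, 115,
          105, 115, 125, 135, 130, 145, 165, 175, 155, 135, 125, 120,
          110, 120, 130, 140, 135, 150, 170, 180, 160, 140, 135, 125]

# Features for predicting August of year 3 (index 31)
features = extract_features(visits, 31)
```

A traditional model (linear regression, random forest…) would then train on these features rather than on the raw series.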
Think of DL as a natural evolution of ML. It solves this feature extraction problem by making it part of the model. With DL, a Neural Network trains both on the feature extraction and on the task itself. It learns by itself what information is important in order to execute the task it’s asked to perform.
We’ll come back to this; for now, let’s quickly see more concretely what the applications for images are.
Deep Learning for Computer Vision
What can an AI do with images? The most popular tasks are summarized below: they turn unstructured data (an image) into something structured, like what’s inside it (e.g. a cat), sometimes along with its position in the image.
Classification can capture both objects and scenes. Objects are the entities that appear in the image, e.g. a cat or a dog. Scenes are more general attributes, like a party: identifying them is what we call scene classification.
DL classification can also go further by describing images in full sentences rather than just chained labels. These are called captioning models. You can find below the results of an existing model (NeuralTalk).
Other applications of Deep Learning on images include generative models (e.g. going from text to image — the other way around!), or style transfer (turning your selfie into a Picasso painting).
These are the most common uses of AI on images today, and most of them rely on Deep Learning models.
With some level of understanding of what DL is about, and what it could be used for, you may now ask yourself one or both of the following:
1. Why should I care? Can it be useful for my company?
2. What does a concrete use-case of leveraging Deep Learning for Data Quality look like?
Why should I care? Can it be useful for my company?
If you have an e-business, computer vision/image processing based on DL can be useful for both customer-facing and internal applications.
Smarter Search Engine
One use of computer vision could be the implementation of a Smarter Search Engine. By smarter, we refer to the automated task of filtering products on the website. Indeed, as an alternative to using a standard filtering system, customers can select a product they like and automatically be shown visually similar products. This enables e-commerce sites to deploy a smarter online merchandising and a more efficient customer retargeting.
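Under the hood, such a system typically turns each product image into a feature vector and retrieves the nearest neighbors. Here is a minimal sketch of that idea, assuming the embeddings have already been produced by a pretrained network (the vectors and product ids below are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vec, catalog, top_k=2):
    """Return the top_k product ids whose embeddings are closest to the query."""
    scored = sorted(catalog.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [product_id for product_id, _ in scored[:top_k]]

# Hypothetical image embeddings for three products
catalog = {
    "boot_A":   [0.9, 0.1, 0.0],
    "boot_B":   [0.8, 0.2, 0.1],
    "sandal_C": [0.1, 0.9, 0.3],
}

# A customer clicks on a boot-like product: boots rank first
similar = most_similar([0.85, 0.15, 0.05], catalog)
```

In production, the embeddings would come from a CNN and the search would use an approximate nearest-neighbor index rather than a full sort.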
Below is an example of what Equancy has already done for a famous European shoe e-commerce site:
Another use of computer vision is to automatically tag images. Thanks to this, manual tagging is not needed anymore, making image organisation on a large scale quicker and more accurate. Having accurate keywords will also improve search and product filtering.
A practical use-case: Image Tagging Data Quality
Having an image collection with correct and precise tags is crucial.
Suppose you’re an e-commerce company and your database contains a picture of a table mislabeled as a chair. A customer looking for a chair will see a table in the results… So first, you miss an opportunity to show one more chair to your customer. Even worse, that table will never be listed for a customer looking for a table, and hence will never get sold!
Training a network to tag images can actually be a way to improve Data Quality. Here at Equancy, we’ve improved the quality of a large beverage company’s image collection by training a Convolutional Neural Network. Convolutional means that the network uses convolutions: operations that scan images looking for specific patterns. Thanks to these operations, the network provides multiple tags per image.
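To give an intuition of what a convolution does, here is a minimal sketch: a small filter slides over the image and responds strongly wherever its pattern appears. The filter and "image" below are illustrative toy values, far smaller than what a real CNN uses:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and record the response at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector: responds where bright pixels sit next to dark ones
edge_filter = np.array([[1, -1],
                        [1, -1]])

# A tiny "image": a bright left half next to a dark right half
image = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [1, 1, 0, 0]])

response = convolve2d(image, edge_filter)  # peaks where the edge sits
```

In a real CNN, the network learns the filter values itself during training instead of using a hand-written edge detector — that’s exactly the automated feature extraction described earlier.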
First, it enabled us to fill in tags for many images that didn’t have any. But also, when our predicted tags didn’t match an image’s current tags, we could signal potential errors! More precisely, we looked for cases where our model was very confident in a new tag while giving very low confidence to the current tag. For example, the image below was initially labeled “Heritage” and “Photography”. Not only did our model remove “Photography” because it is not a discriminant label, it also corrected “Heritage” to “Drink Shots”! Moreover, the image initially didn’t have any keywords; it now has 9 relevant ones, such as cocktail, juice, etc.
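The flagging rule described above can be sketched in a few lines. The confidence thresholds and scores below are illustrative assumptions, not the values we actually used:

```python
def flag_potential_errors(current_tags, predicted_scores, high=0.9, low=0.1):
    """Flag tags the model doubts, and suggest tags it is very confident about.

    Returns (suspicious_current_tags, strongly_suggested_new_tags).
    """
    # Current tags that the model scores very low: likely mislabeled
    suspicious = [t for t in current_tags if predicted_scores.get(t, 0.0) < low]
    # Tags the model is very confident about but that are missing from the image
    suggested = [t for t, score in predicted_scores.items()
                 if score > high and t not in current_tags]
    return suspicious, suggested

# Scores as a multi-label CNN might output them for one image (illustrative values)
scores = {"Heritage": 0.04, "Photography": 0.02,
          "Drink Shots": 0.97, "Cocktail": 0.93}

suspicious, suggested = flag_potential_errors(["Heritage", "Photography"], scores)
```

Images with both a suspicious current tag and a strongly suggested new tag are the best candidates for manual review or automatic correction.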
Through this article, we hope to have given you a sense of what’s at stake when we talk about Deep Learning. Beyond its business applications, we’ve also described the general idea of how it works: it is a kind of Machine Learning where Neural Networks with many layers are used to perform a given operation. We’ve seen how images are great candidates for Deep Learning, using Convolutional Neural Networks.
And that’s not all! Other fields are undergoing big changes thanks to Deep Learning. Take text analysis: you’ve probably heard about chatbots, which now use Neural Network algorithms to automatically handle conversations… just like language translation tools! The applications are numerous; we may zoom into them in a future article 🙂
Don’t hesitate to ask questions below!