Deep Learning and Computer Vision

Recent advances in Deep Learning have propelled Computer Vision forward.  Applications such as image recognition and search, unconstrained face recognition, and image and video captioning which only recently seemed decades off, are now being realised and deployed at scale.

Machine learning techniques use data (images, signals, text) to train a machine (or model) to perform a task such as an image classification, object detection, or language translation. Classical machine learning techniques are still being used to solve challenging image classification problems. However, they don’t work well when applied directly to images, because they ignore the structure and compositional nature of images. Until recently, state-of-the-art techniques made use of feature extraction algorithms that extract interesting parts of an image as compact low-dimensional feature vectors. These were then used along with traditional machine learning algorithms

Convolutional Neural Networks (CNN)

Sounds like a weird combination of biology and math with a little CS sprinkled in, but these networks have been some of the most influential innovations

in the field of computer vision. 2012 was the first year that neural nets grew to prominence as Alex Krizhevsky used them to win that year’s ImageNet competition (basically, the annual Olympics of computer vision), dropping the classification error record from 26% to 15%, an astounding improvement at the time. Ever since then, a host of companies has been using deep learning at the core of their services. Facebook uses neural nets for their automatic tagging algorithms, Google for their photo search, Amazon for their product recommendations, Pinterest for their home feed personalization, and Instagram for their search infrastructure.

The Problem Space

Image classification is the task of taking an input image and outputting a class (a cat, dog, etc.) or a probability of classes that best describes the image. For humans, this task of recognition is one of the first skills we learn from the moment we are born and is one that comes naturally and effortlessly as adults. Without even thinking twice, we’re able to quickly and seamlessly identify the environment we are in as well as the objects that surround us. When we see an image or just when we look at the world around us, most of the time we are able to immediately characterise the scene and give each object a label, all without even consciously noticing. These skills of being able to quickly recognise patterns, generalise from prior knowledge, and adapt to different image environments are ones that we do not share with our fellow machines.

When you first heard of the term convolutional neural networks, you may have thought of something related to neuroscience or biology, and you would be right. Sort of. CNNs do take a biological inspiration from the visual cortex. The visual cortex has small regions of cells that are sensitive to specific regions of the visual field. This idea was expanded upon by a fascinating experiment by Hubel and Wiesel in 1962 where they showed that some individual neuronal cells in the brain responded (or fired) only in the presence of edges of a certain orientation. For example, some neurons fired when exposed to vertical edges and some when shown horizontal or diagonal edges. Hubel and Wiesel found out that all of these neurons were organised in a columnar architecture and that together, they were able to produce visual perception. This idea of specialised components inside of a system having specific tasks (the neuronal cells in the visual cortex looking for specific characteristics) is one that machines use as well and is the basis behind CNNs.

Solutions to real-world computer vision problems often require trade-offs depending on your application: performance, accuracy, and simplicity of the solution. Advances in techniques such as deep learning have significantly raised the bar in terms of the accuracy of tasks like visual recognition, but the performance costs were too significant for mainstream adoption. GPU technology has closed this gap by accelerating training and prediction speeds by orders of magnitude.

MATLAB makes computer vision with deep learning much more accessible. The combination of an easy-to-use application and programming environment, a complete library of standard computer vision and machine learning algorithms, and tightly integrated support for CUDA-enabled GPUs makes MATLAB an ideal platform for designing and prototyping computer vision solutions.

To Learn More on Deep Learning Click Here

To Learn More on Computer Vision Click Here

Ritesh Kanjee has over 7 years in Printed Circuit Board (PCB) design as well in image processing and embedded control. He completed his Masters Degree in Electronic engineering and published a paper for IEEE called Vision-based adaptive Cruise control using Pattern matching (on Google Scholar). His work was implemented in LabVIEW. He works as an Embedded Electronic Engineer in defence research. He has experience in FPGA design with programming in both VHDL and Verilog.