Economies of Scale

How Deep Learning is Changing Real-World Computer Vision

The expansion of computer-vision-based systems and applications is enabled by many factors, including advances in processors, sensors and development tools. But arguably the single most important factor driving the proliferation of computer vision is deep learning.

With deep learning we tend to re-use a relatively small handful of algorithms across a wide range of applications and imaging conditions. This has two important consequences. (Image: MVTec Software GmbH)

The fact that deep learning-based visual perception works extremely well – routinely achieving better results than older, hand-crafted algorithms – has been widely discussed. What is less widely understood, but equally important, is how the rise of deep learning is fundamentally changing the process and economics of developing solutions and building-block technologies for commercial computer vision applications.

Prior to the widespread use of deep learning in commercial computer vision applications, developers created highly complex, unique algorithms for each application. These algorithms were usually highly tuned to the specifics of the application, including factors such as image sensor characteristics, camera position, and the nature of the background behind the objects of interest. Developing, testing and tuning these algorithms often consumed tens or even hundreds of person-years of work. Even if a company was fortunate enough to have enough people available with the right skills, the magnitude of the effort required meant that only a tiny fraction of potential computer vision applications could actually be addressed.

Less diverse algorithms

With deep learning, in contrast, we tend to re-use a relatively small handful of algorithms across a wide range of applications and imaging conditions. Instead of inventing new algorithms, we re-train existing, proven algorithms. As a consequence, the algorithms being deployed in commercial computer vision systems are becoming much less diverse. This has two important consequences:

– First, the economics of commercial computer vision applications and building-block technologies have fundamentally shifted. Take processors, for example. Five or ten years ago, developing a specialized processor that delivered significantly improved performance and efficiency across a wide range of computer vision tasks was nearly impossible, due to the extreme diversity of computer vision algorithms. Today, with the focus mainly on deep learning, it's much more practical to create a specialized processor that accelerates vision workloads – and it's much easier for investors to see a path for such a processor to sell in large volumes, serving a wide range of applications.

– Second, the nature of computer vision algorithm development has changed. Instead of investing years of effort devising novel algorithms, these days we increasingly select among proven algorithms from the research literature, perhaps tweaking them a bit for our needs. So, in commercial applications much less effort goes into designing algorithms. But deep learning algorithms require lots of data for training and validation – and not just any data. The data must be carefully curated for the algorithms to achieve high levels of accuracy. So, there has been a substantial shift in the focus of algorithm-related work in commercial computer vision applications, away from devising unique algorithms and towards obtaining the right quantities of the right types of training data.
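This pattern of re-using a proven algorithm and re-training only part of it is commonly called transfer learning: a backbone trained once on large, generic data is kept frozen, and only a small task-specific classifier head is trained on the application's own data. The following toy sketch illustrates the idea in plain NumPy; the random "backbone", the 20-class head and all dimensions are illustrative assumptions, not a real pretrained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pretrained backbone: a fixed ("frozen") random
# projection followed by a ReLU. In a real system this would be a deep
# network trained on a large, generic image dataset.
W_backbone = rng.normal(size=(64, 32)) / 8.0

def features(x):
    # Frozen feature extractor: no parameters here are ever updated.
    return np.maximum(x @ W_backbone, 0.0)

def train_head(X, y, n_classes=20, lr=0.5, steps=300):
    # Train only the small softmax classifier head on task-specific data.
    F = features(X)
    Y = np.eye(n_classes)[y]
    W_head = np.zeros((32, n_classes))
    for _ in range(steps):
        logits = F @ W_head
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W_head -= lr * F.T @ (p - Y) / len(X)  # gradient step on head only
    return W_head

def predict(X, W_head):
    return (features(X) @ W_head).argmax(axis=1)
```

The point of the sketch is the division of labor: all of the effort that once went into designing the feature extractor is replaced by training a small head, which shifts the bottleneck to collecting and curating the labeled data that `train_head` consumes.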

The right training data

In my consulting firm, BDTI, we've seen this shift very clearly in the nature of the projects our customers bring us. A recent project illustrates it. The customer, a consumer products manufacturer, wanted to create, within three months, a prototype product incorporating vision-based object classification. The initial target was to identify 20 classes of objects. Hardware design was not an issue – sensors and processors were quickly identified and selected. Algorithm development also proceeded quickly. The key challenge was data. To achieve acceptable accuracy, the system required a large quantity of high-quality, diverse data. No suitable data set existed, so one had to be created from scratch – and not just any data would do. Our first step was to design a data capture rig that would produce the right kinds of images. Here, an understanding of camera characteristics, perspective, and lighting led to detailed specifications for the data capture rig.
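Planning a capture campaign like this usually starts with enumerating the conditions the data set must cover, so that every class is photographed under every combination of perspective and lighting. A minimal sketch of such a plan is below; the class names, perspectives, lighting setups and shot counts are hypothetical placeholders, not the actual specifications from the project described above.

```python
from itertools import product

# Hypothetical capture plan for a 20-class data set.
CLASSES = [f"class_{i:02d}" for i in range(20)]
PERSPECTIVES = ["top_down", "oblique_30deg", "oblique_60deg", "side"]
LIGHTING = ["diffuse", "directional_left", "directional_right", "low_light"]
SHOTS_PER_CONDITION = 5  # repeated shots with the item repositioned each time

# One entry per (class, perspective, lighting) combination, so coverage
# of the condition space is balanced by construction.
capture_plan = [
    {"class": c, "perspective": p, "lighting": l, "shots": SHOTS_PER_CONDITION}
    for c, p, l in product(CLASSES, PERSPECTIVES, LIGHTING)
]

total_images = sum(entry["shots"] for entry in capture_plan)
```

Even this tiny example makes the economics visible: 20 classes times 4 perspectives times 4 lighting setups times 5 shots is 1,600 images before any review or rejection.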

The difficulty of creating this data set was compounded by the requirement that the system differentiate between classes that are difficult even for humans to distinguish. In this type of situation, curation of the training and validation data is critical to achieving acceptable accuracy. For this project, in addition to specifying the data capture rig, we took several steps to ensure success. For example, we provided the customer with detailed instructions for capturing data, including varying perspective and illumination in specific ways. We also specified that different personnel prepare and position the items for capture, and requested that domain experts provide input to ensure that the data was realistic. The captured data was then carefully reviewed, and unsuitable images were rejected.
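Part of that review can be automated with simple quality gates that reject obviously unusable images before a human ever looks at them. The sketch below shows one possible screening step for grayscale images, using exposure and a Laplacian-based sharpness score; the threshold values are illustrative assumptions that would need tuning per camera and rig.

```python
import numpy as np

def sharpness_score(img):
    """Variance of a 4-neighbour Laplacian response; low values suggest blur."""
    lap = (-4.0 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

def screen_image(img, min_sharpness=25.0, exposure_range=(30.0, 225.0)):
    """Return (accepted, reason) for a grayscale image given as a float array
    with pixel values in the 0-255 range."""
    mean = float(img.mean())
    if not (exposure_range[0] <= mean <= exposure_range[1]):
        return False, "exposure out of range"
    if sharpness_score(img) < min_sharpness:
        return False, "too blurry"
    return True, "ok"
```

A gate like this only catches gross defects (severe blur, under- or over-exposure); the harder curation questions, such as whether an image is realistic for the application, still require the domain-expert review described above.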

Summary

The bottom line here, which shouldn’t surprise any of us, is that while deep learning is an amazing, powerful technology, it’s not a magic wand. There’s still lots of work required to field a robust computer vision solution – and it’s largely a different type of work from what was required using traditional vision algorithms.

Embedded Vision Summit 2020
In addition to BDTI and the Embedded Vision Alliance, Jeff Bier also organizes the annual Embedded Vision Summit, the industry's largest event for practical computer vision. The next event will take place in Santa Clara, California, from May 18-20, 2020.
Author: Jeff Bier, Founder of the Embedded Vision Alliance and President of BDTI
www.bdti.com
