Designing A Unique Vision System Tailored to Your Application – A Technical Guide

Many factors need to be considered when designing a vision system, including the operating system, the hardware platform, the software and API, and the elements that make the vision system “see”: the camera and its lens.

Although every aspect of the system plays a vital role in achieving the desired results, the camera and lens are the essence of the vision system, so they should be selected as a pair on the basis of overall system requirements. This article discusses various elements of vision systems, with a focus on the camera and its lens.

Lenses and Optics

Both the camera and the lens have their own strengths, which must be considered when capturing image data. For example, a camera with a high-resolution sensor places less demand on the lens for magnification, as the captured image can be digitally enlarged.

This enables a lens with a larger field of view to be selected to capture more information in one frame. However, if a camera does not have a high resolution sensor, a lens with a longer focal length could be used to obtain the detail required for the application.

Working Distance and Field of View

Working Distance (WD) is the distance between the front of the lens and the object to be imaged. Field of View (FoV) is the amount of area to be imaged by the camera in a single frame.

WD is one of the most stringent design constraints in a vision system. Based on specific application requirements, there may be design limitations with regard to the minimum or maximum distance of the camera from its target, due to the geometry of the setup.

Similarly, FoV is an important consideration. Regardless of the distance between the target and the camera, the amount of the scene to be captured by the camera is also critical to the application. Lens selection can optimize the system’s performance in this situation.

The WD and the FoV share a basic trigonometric relationship that depends on the angular FoV (α, in degrees) of the system. This relationship is illustrated below and expressed as:

$$\mathrm{FoV} = 2 \cdot \mathrm{WD} \cdot \tan\left(\frac{\alpha}{2}\right)$$

The relationship between the angular FoV of the system, the size of the image sensor (d), and the focal length of the lens (f) is given by the equation below:

$$\alpha = 2\arctan\left(\frac{d}{2f}\right)$$

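For instance, consider a camera placed 1 m (1000 mm) from the target with a 1 inch optical format sensor and a 12 mm lens.¹ If the horizontal dimension of the sensor is 12.8 mm, the horizontal angular FoV can be determined as:

$$\alpha_h = 2\arctan\left(\frac{12.8\ \text{mm}}{2 \times 12\ \text{mm}}\right) \approx 56.1^\circ$$

Substituting to determine the horizontal FoV:

$$\mathrm{FoV}_h = 2 \times 1000\ \text{mm} \times \tan\left(\frac{\alpha_h}{2}\right) = 2 \times 1000\ \text{mm} \times \frac{12.8}{24} \approx 1067\ \text{mm}$$

Similarly, if the vertical dimension of the sensor is 9.3 mm, then

$$\alpha_v = 2\arctan\left(\frac{9.3\ \text{mm}}{2 \times 12\ \text{mm}}\right) \approx 42.4^\circ$$

Substituting to determine the vertical FoV:

$$\mathrm{FoV}_v = 2 \times 1000\ \text{mm} \times \tan\left(\frac{\alpha_v}{2}\right) = 2 \times 1000\ \text{mm} \times \frac{9.3}{24} \approx 775\ \text{mm}$$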
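The two relationships can be wrapped in small helpers for trying out sensor and lens combinations. This is a minimal sketch assuming the same idealized, distortion-free model used above; the function names are hypothetical.

```python
import math


def angular_fov_deg(sensor_dim_mm: float, focal_length_mm: float) -> float:
    """Angular FoV in degrees for one sensor dimension: alpha = 2 * atan(d / (2 f))."""
    return math.degrees(2.0 * math.atan(sensor_dim_mm / (2.0 * focal_length_mm)))


def linear_fov_mm(working_distance_mm: float, alpha_deg: float) -> float:
    """Linear FoV at the working distance: FoV = 2 * WD * tan(alpha / 2)."""
    return 2.0 * working_distance_mm * math.tan(math.radians(alpha_deg) / 2.0)


if __name__ == "__main__":
    # Worked example from the text: 1000 mm working distance, 12 mm lens,
    # 12.8 mm (horizontal) and 9.3 mm (vertical) sensor dimensions.
    for axis, d in (("horizontal", 12.8), ("vertical", 9.3)):
        alpha = angular_fov_deg(d, 12.0)
        print(f"{axis}: alpha = {alpha:.1f} deg, FoV = {linear_fov_mm(1000.0, alpha):.0f} mm")
```

Running it reproduces the worked example (about 56.1° and 1067 mm horizontally, 42.4° and 775 mm vertically). Because tan(α/2) = d/(2f), the two steps collapse to FoV = WD × d / f in this model.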

A lens should be selected so that it can focus within the working distance and can attain the FoV required for the application. For acceptable results, a balance between the two is often required.
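Rearranging the same model gives a first estimate of the focal length needed for a target FoV at a fixed working distance: since FoV = WD × d / f, it follows that f = WD × d / FoV. The sketch below is hypothetical; the list of stock focal lengths is illustrative only, and any choice should be verified on the real setup.

```python
def required_focal_length_mm(working_distance_mm: float,
                             sensor_dim_mm: float,
                             required_fov_mm: float) -> float:
    """Focal length that maps required_fov_mm onto sensor_dim_mm at the given WD.

    Follows from FoV = 2 * WD * tan(alpha / 2) with alpha = 2 * atan(d / (2 f)),
    which simplifies to FoV = WD * d / f.
    """
    if required_fov_mm <= 0:
        raise ValueError("required FoV must be positive")
    return working_distance_mm * sensor_dim_mm / required_fov_mm


def pick_stock_lens(focal_mm: float,
                    stock_mm=(6.0, 8.0, 12.0, 16.0, 25.0, 35.0, 50.0)) -> float:
    """Longest stock focal length not exceeding focal_mm.

    A shorter focal length gives a wider FoV, so rounding down keeps the
    whole required area in frame. The stock values are illustrative.
    """
    candidates = [f for f in stock_mm if f <= focal_mm]
    if not candidates:
        raise ValueError("no stock lens is short enough for this FoV")
    return max(candidates)
```

For example, covering 1000 mm horizontally at 1000 mm with a 12.8 mm-wide sensor calls for about 12.8 mm, which rounds down to a 12 mm lens.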

Contrast and Modulation Transfer Function

It is often incorrectly assumed that the imaging system can see a number of objects equal to its number of pixels. This is not the case, as object detection, usually facilitated through edge detection, requires a line pair, which is an area of contrast to compare the object with.

When the difference between two colors is highly distinguishable, high contrast is achieved; the simplest, highest-contrast example is white and black. When contrast is lowered, it becomes difficult to distinguish where one color (or object) ends and another begins. An example of this would be a gradient of gray shades between white and black.

As a sensor requires a high-contrast space between objects for proper detection, in the ideal case the camera can resolve a number of objects equal to half its number of pixels, as defined by the Nyquist limit. In this case, each object lines up with a pixel on the image sensor, and the high-contrast space between objects lines up with the adjacent pixels.

The figure below illustrates where the image sensor (in blue) in the left image does not have enough pixels to distinguish the two objects – they appear as a single, larger object. The image sensor on the right has enough pixels to resolve the two objects, in addition to the space between them, and accordingly, conforms to the Nyquist limit.
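As a rough check against the Nyquist limit, a line pair (object plus gap) must span at least two pixels, so the sensor's Nyquist frequency in lp/mm is 1 / (2 × pixel pitch). A minimal sketch with hypothetical helper names; it assumes the ideal pixel alignment described above, so practical designs usually leave margin for lens MTF and misalignment.

```python
def sensor_nyquist_lp_per_mm(pixel_pitch_um: float) -> float:
    """Sensor Nyquist frequency: one line pair (object + gap) spans two pixels."""
    return 1.0 / (2.0 * pixel_pitch_um / 1000.0)


def max_resolvable_objects(pixels_across: int) -> int:
    """At the Nyquist limit, at most half as many objects as pixels fit across."""
    return pixels_across // 2


def smallest_line_pair_mm(fov_mm: float, pixels_across: int) -> float:
    """Object-space width of the finest line pair (object plus gap) the sensor samples."""
    return 2.0 * fov_mm / pixels_across
```

For instance, 5 µm pixels give a Nyquist frequency of 100 lp/mm, and a 2000-pixel row imaging a 1000 mm FoV can at best separate line pairs about 1 mm wide.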

However, additional factors in the design of the lens influence the spatial frequency and the size at which objects must appear in the frame for proper detection. These factors are shown in the lens’ Modulation Transfer Function (MTF) versus frequency graph, which plots the contrast seen in the recorded image against the spatial frequency of an object.

The MTF indicates the spatial frequency in line pairs per millimeter (lp/mm), and the contrast in terms of percentage. An example plot is given below, where blue annotation arrows help to describe the values on each axis.

Modulation Transfer is a function that illustrates the system’s contrast in relation to the spatial frequency of objects to be imaged.
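Contrast on an MTF plot is usually expressed as modulation, M = (I_max − I_min) / (I_max + I_min), measured across a line-pair pattern; the MTF at a given spatial frequency is the modulation in the image divided by the modulation of the target. A minimal sketch, assuming intensity profiles sampled across the bars; the helper names are hypothetical.

```python
def modulation(profile):
    """Michelson contrast of an intensity profile across a line-pair pattern."""
    i_max, i_min = max(profile), min(profile)
    if i_max + i_min == 0:
        return 0.0
    return (i_max - i_min) / (i_max + i_min)


def mtf_at_frequency(target_profile, image_profile):
    """MTF value at one spatial frequency: image modulation / target modulation."""
    target_m = modulation(target_profile)
    if target_m == 0:
        raise ValueError("target pattern has no contrast")
    return modulation(image_profile) / target_m
```

A full-contrast target (0 and 255) imaged as 60 and 190 gives an MTF of 0.52, which would be plotted as 52% contrast at that lp/mm.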

During lens selection, the price of the lens does not always indicate its performance. Although a $1,000 lens will generally give a better performance than a $100 lens, a lens priced at $1,100 may not be better than one priced at $1,000.

It is important that the lens’ performance is matched to the intended use, as lenses work within a specific range of working distances and the MTF performance can vary beyond these distances. MTF curves are helpful in lens selection, as they combine resolution performance and contrast into a single specification. However, the lens must be tested with the prospective camera to confirm that the required performance is met for a specific application.

Depth of Field

Depth of field is the range of distance in front of the camera where the target remains in focus.

In some applications, an object must remain in focus over a varying working distance without refocusing the lens. This can be achieved by taking advantage of the lens’ depth of field.

The depth of field is directly related to the opening of the lens’ aperture, measured as an f-stop or f-number. The f-number (N) is the ratio of the focal length of the lens (f) to the diameter (D) of the aperture opening, as shown below:

$$N = \frac{f}{D}$$

The f-number can be written as f:N, 1:N, or f/N, and is often found on the lens. As the diameter is in the denominator of the fraction, a larger f-number equates to a smaller aperture opening, so that less light is allowed to pass through the lens.
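The f-number relation is easy to evaluate directly. This hypothetical sketch computes N from the focal length and aperture diameter, and the light passed relative to a reference f-number using the standard 1/N² approximation, which is why each full stop (a factor of √2 in N) roughly halves the light.

```python
def f_number(focal_length_mm: float, aperture_diameter_mm: float) -> float:
    """N = f / D."""
    return focal_length_mm / aperture_diameter_mm


def relative_light(n: float, n_ref: float) -> float:
    """Light reaching the sensor at f/n relative to f/n_ref (approximately 1 / N**2)."""
    return (n_ref / n) ** 2
```

For example, a 12 mm lens with a 6 mm opening is at f/2, and stopping down from f/1.4 to f/2.8 passes about a quarter of the light.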

Other than regulating the amount of light passing through the lens, the size of the aperture also impacts the depth of field and the resolution of the image. When the f-number is increased and the aperture size reduces, high incidence angle oblique light rays are unable to pass through the lens.

An image becomes blurry outside its depth of field because of the lens aberrations associated with these oblique light rays. When these oblique rays are eliminated, the depth of field can increase to cover a larger range.

Conversely, when the f-number is decreased, the resolution of the image within its depth of field increases and a sharper image is created. This is because a wider aperture raises the diffraction limit. After passing through the aperture, light rays diverge from their incident axis and spread out; increasing the aperture diameter reduces this effect, which allows tighter grouping of the incident rays and results in a higher-resolution image.

A trade-off between resolution for the imaging system and the acceptable depth of field needs to be determined because the aperture drives these phenomena in opposite directions.
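To put numbers on the diffraction side of this trade-off, a common approximation for the diameter of the diffraction blur spot (the Airy disk, to its first dark ring) is 2.44 × λ × N. Comparing it with the pixel pitch shows roughly when stopping down starts to cost resolution. A sketch: the 550 nm wavelength, 3.45 µm pixel pitch, and the two-pixel threshold are assumed example values, not hard limits.

```python
def airy_disk_diameter_um(wavelength_nm: float, f_number: float) -> float:
    """Approximate diffraction spot diameter to the first dark ring: 2.44 * lambda * N."""
    return 2.44 * (wavelength_nm / 1000.0) * f_number


def diffraction_limited(wavelength_nm: float, f_number: float,
                        pixel_pitch_um: float, max_pixels: float = 2.0) -> bool:
    """True if the diffraction spot spans more than max_pixels pixels.

    The two-pixel default is an assumed rule of thumb.
    """
    return airy_disk_diameter_um(wavelength_nm, f_number) > max_pixels * pixel_pitch_um


if __name__ == "__main__":
    # Assumed example: 550 nm green light on 3.45 um pixels.
    for n in (2.8, 5.6, 11.0, 16.0):
        spot = airy_disk_diameter_um(550.0, n)
        print(f"f/{n}: spot ~{spot:.1f} um, diffraction-limited: "
              f"{diffraction_limited(550.0, n, 3.45)}")
```

Under these assumptions the spot stays within two pixels at f/2.8 but exceeds it from about f/5.6 onward, while depth of field keeps growing as N increases.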

Telecentric Lenses

The field of view of most lenses spreads outward in a conical shape. However, telecentric lenses collect light only from the area directly in front of them and have an angular FoV of 0°, so their FoV is equal to the diameter of the lens.

Telecentric lenses are immune to parallax error, the perceived difference in size of two identical objects at different distances from the observer. This immunity is especially advantageous in machine vision applications that measure objects at different distances from the vision system: measurement reliability improves because identical objects appear the same size on screen.

Field of view (FoV)
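A simplified model makes the parallax point concrete. With a conventional lens, image size scales roughly as focal length divided by object distance, so two identical parts at different distances image at different sizes, whereas an ideal telecentric lens has a fixed magnification. A sketch using a pinhole approximation; the part size, distances, focal length, and magnification are hypothetical.

```python
def conventional_image_size_mm(object_size_mm: float, distance_mm: float,
                               focal_length_mm: float) -> float:
    """Pinhole approximation: image size = object size * f / distance."""
    return object_size_mm * focal_length_mm / distance_mm


def telecentric_image_size_mm(object_size_mm: float, magnification: float) -> float:
    """Ideal telecentric lens: magnification does not depend on distance."""
    return object_size_mm * magnification


if __name__ == "__main__":
    part = 20.0               # mm, two identical parts (hypothetical)
    near, far = 200.0, 220.0  # mm from the lens (hypothetical)
    a = conventional_image_size_mm(part, near, 25.0)
    b = conventional_image_size_mm(part, far, 25.0)
    print(f"conventional: {a:.3f} mm vs {b:.3f} mm ({(a / b - 1) * 100:.1f}% apparent difference)")
    print(f"telecentric: {telecentric_image_size_mm(part, 0.1):.3f} mm at both distances")
```

In this example a 10% change in distance produces a 10% apparent size difference with the conventional lens and none with the telecentric one.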

Lens Compatibility

Cameras are often designed with a specific lens type in order to satisfy particular design criteria, such as performance, price, form factor, or other application-specific requirements. Lenses are distinguished by many specifications, such as the mounting method, image sensor optical format compatibility, and back focal distance.

Lens Mounts

The way in which the lens interfaces with the camera is known as the mounting method, or mount type. The C-mount is the most common mount type and has been in use since 1926. C-mount lenses use a simple screw thread to fix the lens to the camera.

A turn and lock mechanism, also known as a bayonet-style mount, is employed by F-mount and EF-mount lenses, which are used respectively for Nikon and Canon DSLRs, as well as other cameras with large format sensors.

The CS-mount is similar to the C-mount. Although the physical size and shape of their screw threads are identical, they differ in flange focal distance, which is the distance between the lens mounting flange and the image sensor.

The flange focal distance of CS-mount lenses is 12.526 mm, exactly 5 mm less than the 17.526 mm of C-mount lenses. Because C-mount lenses need the sensor to sit 5 mm further back to focus correctly, a 5 mm spacer ring is usually added to accommodate a C-mount lens on a CS-mount camera.

The best practice is to use the lens mount style intended for a particular camera. However, a lens with a different mount can be used with a lens adaptor if the camera’s flange focal distance is shorter than that of the lens, so that the adaptor can make up the difference.
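The mount arithmetic can be sketched using the standard flange focal distances for C-mount (17.526 mm) and CS-mount (12.526 mm). A spacer or adaptor must supply the difference between the lens’ flange distance and the camera’s, which is only possible when that difference is not negative. The table and function names are hypothetical, and only the two mounts discussed here are included.

```python
FLANGE_FOCAL_DISTANCE_MM = {
    "C": 17.526,
    "CS": 12.526,
}


def adaptor_thickness_mm(lens_mount: str, camera_mount: str) -> float:
    """Spacer/adaptor thickness that places the lens at its design flange distance.

    Raises ValueError when the camera's flange distance is longer than the
    lens's: the lens would sit too far from the sensor and no adaptor can fix that.
    """
    gap = FLANGE_FOCAL_DISTANCE_MM[lens_mount] - FLANGE_FOCAL_DISTANCE_MM[camera_mount]
    if gap < 0:
        raise ValueError(
            f"a {lens_mount}-mount lens would sit too far from the sensor "
            f"on a {camera_mount}-mount camera"
        )
    return gap
```

A C-mount lens on a CS-mount camera needs the 5 mm spacer described above, while a CS-mount lens on a C-mount camera is rejected.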

Sensor Size

During lens selection, the lens’ maximum supported sensor size must match the camera’s sensor. Failure to do this may result in vignetting: the mechanical clipping of the outer light rays, which occurs when the circle of focused light produced by the lens does not cover the entire area of the sensor.

As a result, the corners of the image become partially shaded or fully black, depending on the extent of the phenomenon. Vignetting can also occur when using an adaptor to change the lens’ mount style; using a filter on a wide angle lens; or using an extremely wide angle lens, such as a fish-eye lens.

Source: Wikipedia
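A first-pass check for this kind of vignetting compares the lens’ image circle, taken from its datasheet, with the sensor diagonal. A sketch with hypothetical helper names; the example pairs a 2/3 inch sensor (about 8.8 × 6.6 mm, an 11 mm diagonal) with a lens designed for 1/2 inch sensors (an 8 mm image circle).

```python
import math


def sensor_diagonal_mm(width_mm: float, height_mm: float) -> float:
    """Diagonal of a rectangular sensor."""
    return math.hypot(width_mm, height_mm)


def vignettes(image_circle_mm: float, width_mm: float, height_mm: float) -> bool:
    """True if the lens image circle is smaller than the sensor diagonal,
    meaning the corners will be shaded or black."""
    return image_circle_mm < sensor_diagonal_mm(width_mm, height_mm)
```

Here vignettes(8.0, 8.8, 6.6) returns True, so the corners of that 2/3 inch sensor would be shaded. Adaptors, filters, and very wide angle lenses can clip further, so the image should still be checked.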


Camera Technology

Considering the factors discussed in the earlier section, a camera should be selected with the lens in mind. The application for the vision system generally dictates many of the camera’s requirements, such as size and weight, resolution, low light performance, and color accuracy. These factors are often linked.

Image Sensor

Most of a camera’s specifications stem from the image sensor it is built around. The sensor dictates the resolution, low light performance, sensitivity, dynamic range, and frame rate. Trade-offs among these factors are made during sensor design, so it is important to select the camera and sensor best suited to the application.

However, the performance levels of two cameras that feature the same image sensor would not necessarily be identical, as there will be variations in implementation methods and the system design employed to integrate the sensor.

There are different sensor types and sizes. The first major distinction is between CCD and CMOS sensors.

CCD technology was the first to be used for digital cameras, and is considered the more refined of the two. CCD sensors are more light-sensitive and generate more accurate images, although they are generally more expensive.

However, CMOS sensors have caught up with CCD technology, and in some applications they are superior to CCDs. Some CMOS sensors produce less noisy images and are more sensitive than their CCD counterparts.

CCD sensors suffer from a drawback known as “blooming,” where bright spots in an image saturate the pixels and cause their charge to “overflow” into adjacent pixels, creating a streak in the image, as seen in the image below.

Source: Wikipedia

Sensor Shutter

The shutter type – global or rolling – is also a major sensor differentiator. In a rolling shutter, each line of the sensor is exposed one at a time, in rapid succession. In global shutters, all pixels are exposed simultaneously.

Rolling shutters are susceptible to smearing or blurring of an object if it moves at high speed through the frame during exposure. Global shutters eliminate these artifacts, freezing moving objects in place by exposing every pixel simultaneously.
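A rough estimate of rolling-shutter skew: if successive rows start exposing one line time apart, the last row starts (rows − 1) × line time after the first, and an object moving horizontally shifts by its speed multiplied by that delay. A sketch with hypothetical parameters; the line time, row count, and object speed in pixels per second are assumptions.

```python
def rolling_shutter_skew_px(object_speed_px_per_s: float,
                            line_time_us: float, rows: int) -> float:
    """Horizontal shift of a moving object between the first and last rows.

    Assumes rows start exposing line_time_us apart; a global shutter,
    which exposes every row at once, would give zero skew.
    """
    readout_span_s = (rows - 1) * line_time_us * 1e-6
    return object_speed_px_per_s * readout_span_s
```

For example, an object crossing the frame at 2000 pixels per second, imaged with a 10 µs line time over 1001 rows, is skewed by about 20 pixels from top to bottom.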

Traditionally, CCD sensors used global shutters and CMOS sensors used rolling shutters. However, as CMOS technology overtakes CCD, high-quality global shutter CMOS sensors are now available, for example Sony’s Pregius sensors.

Source: Wikipedia

Sensor Format

Camera sensors are available in different form factors, known as optical formats. Common optical formats are designated in fractions of an inch, including 1/3”, 1/2”, 2/3”, and 1” sensors.

These designations, however, are a holdover from cameras that employed vacuum tubes, and they do not directly correspond to the physical size of the sensor. The relationship between the physical dimensions of the sensors and their optical format is illustrated below.

35 mm Sensors

Full frame sensors, also called 35 mm sensors, are another common optical format in machine vision systems. Here, “35 mm” refers to 35 mm film: the sensor matches the film’s 36 mm × 24 mm frame, so its horizontal edge is approximately 35 mm long. If size is not a design constraint, larger sensors are usually preferred for vision systems, because they allow for larger pixels or a higher pixel count, although they are more expensive.


Higher pixel density also means an increase in the camera’s resolution, i.e. the camera’s capability to resolve an object in fine detail and produce sharp and clear images.

If the pixels are larger, then the camera will have a higher dynamic range and a higher sensitivity to light (assuming constant noise). More photons can be collected over the surface of each pixel when the area of each pixel is increased, making the camera more sensitive to light.

Larger pixels also increase the percentage of each pixel’s area that is light-sensitive, enabling more photons to be converted to electrons for each picture element.

Therefore, a trade-off exists between sensitivity and resolution. Larger pixels make the camera more light-sensitive, while smaller pixels over the same area enable the camera to see smaller objects and finer details.
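The trade-off can be quantified for a fixed sensor width. This sketch assumes square, gap-free pixels and light collected per pixel proportional to pixel area (fill-factor differences ignored); the helper names and pitches are hypothetical.

```python
import math


def pixels_across(sensor_width_mm: float, pixel_pitch_um: float) -> int:
    """Number of whole pixels that fit across the sensor width."""
    # The tiny epsilon guards against floating-point round-off (e.g. 12800 / 3.2).
    return math.floor(sensor_width_mm * 1000.0 / pixel_pitch_um + 1e-9)


def relative_photons_per_pixel(pixel_pitch_um: float, ref_pitch_um: float) -> float:
    """Photons collected per pixel relative to a reference pitch (scales with area)."""
    return (pixel_pitch_um / ref_pitch_um) ** 2
```

Halving the pitch from 6.4 µm to 3.2 µm doubles the pixels across a 12.8 mm-wide sensor (2000 to 4000), while each pixel collects roughly a quarter of the light.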


The image sensor also determines whether a camera is color or monochrome. Color image sensors are essentially monochrome sensors with a color filter array (CFA) placed over the imager.

The most commonly used CFA is the Bayer pattern, where odd-numbered rows of pixels alternate between green and blue filters, and the even-numbered ones between red and green. The camera’s demosaicing algorithm interpolates the missing color data for each pixel from surrounding pixels.

Bayer pattern: A color filter array for arranging RGB color filters on a square grid of photosensors to create an accurate color image.

Source: Wikipedia
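The layout described above can be expressed as a lookup from pixel coordinates to filter color. This sketch follows the text’s description (first row alternating green and blue, second row red and green, which is the “GBRG” phase); real sensors use one of four phases (RGGB, GRBG, GBRG, BGGR), so the pattern is a parameter. The function names are hypothetical. The second function simulates what the CFA does to a full-color scene: each photosite keeps only one channel.

```python
def cfa_color(row: int, col: int, pattern: str = "GBRG") -> str:
    """Filter color at (row, col) for a 2x2 Bayer tile given in row-major order.

    "GBRG" means 0-based row 0 alternates G, B and row 1 alternates R, G,
    matching the description above (1-based odd rows are 0-based even rows).
    """
    return pattern[(row % 2) * 2 + (col % 2)]


def mosaic(rgb_image, pattern: str = "GBRG"):
    """Keep only the channel each photosite's filter passes (simulated CFA).

    rgb_image is a list of rows of (R, G, B) tuples.
    """
    index = {"R": 0, "G": 1, "B": 2}
    return [
        [pixel[index[cfa_color(r, c, pattern)]] for c, pixel in enumerate(row)]
        for r, row in enumerate(rgb_image)
    ]
```

Half of the photosites are green and a quarter each are red and blue, which is why the two missing channels at every pixel must be interpolated.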


Camera Build Quality

Selecting a camera with a sensor that satisfies the application’s requirements is only a part of the process. To maximize the potential of the image sensor, the camera’s overall build quality should also be considered. A well-built camera should provide reliable and robust operation that produces clear and accurate images, without dropping frames.

Quantum Efficiency

Quantum Efficiency is the efficiency with which the camera converts the incoming photons of light into electrons on the image sensor.

Quantum Efficiency (QE) curves allow the sensitivity of cameras to be compared. When comparing curves, make sure the camera manufacturer is quoting the QE of the camera itself. The QE curve provided by the sensor manufacturer should not be used, because the elements added to the optical path in the camera’s final form reduce its quantum efficiency.

Quantum efficiency is plotted as a function of wavelength, showing the percentage of incident photons that are converted into electrons by the sensor and then read out by the camera. The figure below shows that a camera’s peak efficiency is generally in the 500 nm range, corresponding to green light.

The peak QE is often quoted by manufacturers on datasheets to provide a general idea of the camera’s efficiency. However, instead of using this value as the sole point of comparison between cameras, the entire curve should be used to ensure that optimum efficiency is attained for the working wavelength(s) to which the application will be subjected.
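Comparing cameras at the application’s wavelength, rather than at their peak, amounts to reading each camera’s QE curve at that wavelength. A sketch that linearly interpolates a curve supplied as sorted (wavelength in nm, QE as a fraction) points, for example digitized from a camera datasheet, and estimates signal electrons as QE × incident photons. The data layout and helper names are assumptions.

```python
from bisect import bisect_left


def qe_at(curve, wavelength_nm: float) -> float:
    """Linearly interpolate QE at wavelength_nm from sorted (nm, qe) points."""
    xs = [point[0] for point in curve]
    if not xs[0] <= wavelength_nm <= xs[-1]:
        raise ValueError("wavelength outside the measured curve")
    i = bisect_left(xs, wavelength_nm)
    if xs[i] == wavelength_nm:
        return curve[i][1]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (wavelength_nm - x0) / (x1 - x0)


def signal_electrons(curve, wavelength_nm: float, photons: float) -> float:
    """Expected photoelectrons for a given number of incident photons."""
    return qe_at(curve, wavelength_nm) * photons
```

Evaluating two cameras’ curves at, say, an 850 nm NIR illuminator can reverse the ranking suggested by their peak QE values.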

When the QE curves of monochrome and color cameras are compared, monochrome cameras are more sensitive to light because no color filter blocks wavelengths from reaching each pixel. Therefore, if color images are not absolutely necessary, it is advisable to choose a camera with a monochrome sensor and exploit its higher sensitivity.

Additionally, monochrome cameras are sensitive to the near-infrared (NIR) spectrum because they do not have an NIR-blocking filter, which is found in color cameras.

Color cameras have an NIR-blocking filter because color pixels are sensitive to NIR light to varying degrees, and NIR light reaching them would degrade the accuracy of the color channels. Therefore, monochrome cameras should be selected for any application requiring NIR imaging, as they are capable of seeing beyond the visible spectrum.

Color Reproduction

The build quality of a camera also depends on its ability to reproduce color accurately with minimal artifacts. Through a process called demosaicing, a color camera interpolates the two missing color channels at each pixel from surrounding pixels.

Demosaicing algorithms typically offer different levels of thoroughness, with more thorough algorithms costing more computing cycles. Less accurate algorithms can introduce image artifacts such as the moiré effect, false colors, and zippering, particularly at sharp changes in contrast, for example, dark lines on a white background.
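The simplest demosaicing approach, bilinear interpolation, fills in each missing value by averaging the nearest samples of that color. This sketch handles the green channel only and assumes the GBRG layout described earlier, in which every non-green photosite has green neighbors directly above, below, left, and right. The function name is hypothetical; this is the kind of fast, less accurate method that tends to produce zippering and false color at sharp edges.

```python
def interpolate_green(raw):
    """Bilinear estimate of the green channel from a GBRG Bayer mosaic.

    raw is a list of rows of sensor values. Green photosites sit where
    (row + col) is even; elsewhere green is the mean of the available
    up/down/left/right neighbors, all of which are green in this layout.
    """
    h, w = len(raw), len(raw[0])
    green = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if (r + c) % 2 == 0:
                green[r][c] = float(raw[r][c])  # measured green sample
                continue
            neighbors = [
                raw[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < h and 0 <= cc < w  # skip positions outside the image
            ]
            green[r][c] = sum(neighbors) / len(neighbors)
    return green
```

Higher-quality algorithms weight neighbors by local edge direction instead of averaging blindly, at the cost of the extra computing cycles mentioned above.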

A camera should be evaluated using targets designed to highlight these artifacts and ensure proper image reproduction. The 1951 USAF resolution test chart, and other targets with varying degrees of high-frequency patterns, can reveal false color, while ColorChecker targets can be used to calibrate the camera and ensure accurate color reproduction.

Source: Wikipedia

Source: Wikipedia

Color verification targets should be purchased from a supplier rather than printed, because color accuracy varies across printers and not all printers can print at high resolution.

Camera Framerate

The design quality of a camera is also indicated by its maximum frame rate. Some cameras start to drop frames if the frame rate is set too high, because they cannot handle the increased data payload. In other cameras, the CCDs can be overclocked to obtain higher frame rates, but this can degrade the camera’s performance, for example by increasing noise.

Well-built cameras can sustain these accelerated rates with no dropped frames or noise increase. It is better to test and compare cameras directly, because some manufacturers list the sensor’s maximum frame rate as the camera’s own.
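One simple way to run such a test is to stream at the target rate, record a receive timestamp for each frame, and look for gaps noticeably longer than the nominal frame period. A sketch, assuming timestamps in seconds from the host clock and an assumed tolerance of 1.5 periods; if the camera supplies frame counters, checking for skipped counter values is more reliable than timing.

```python
def count_dropped_frames(timestamps_s, fps: float, tolerance: float = 1.5) -> int:
    """Estimate dropped frames from frame receive timestamps at a nominal rate."""
    period = 1.0 / fps
    dropped = 0
    for earlier, later in zip(timestamps_s, timestamps_s[1:]):
        gap = later - earlier
        if gap > tolerance * period:
            # A gap of k periods implies k - 1 missing frames.
            dropped += round(gap / period) - 1
    return dropped
```

Running the same capture on each candidate camera at its advertised maximum rate gives a like-for-like comparison.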

GPIO and Reliability

To enable communication with the outside world, a camera is typically supplied with a number of general purpose input and output (GPIO) channels. It is important to satisfy the application’s requirements in terms of quantity of GPIOs, reliability, and determinism.

The selection of an industrial-grade camera over a commercial-grade point-and-shoot camera, or DSLR, for the application will result in a more reliable communication process. A commercial-grade camera often has a substantial lag between the trigger signal and the shutter actuation, which can lead to problems if used for an application where an event takes place for only a short period of time.

The trigger delay of a commercial-grade camera also has relatively high variance, which prevents a fixed system-wide delay from being used to compensate for the camera’s lag. Industrial-grade cameras are designed to run 24 hours a day and to trigger deterministically and reliably as part of a complete imaging system. They can also trigger external events, for example, camera flashes or other hardware-triggerable events that are usually part of an application.
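Whether a fixed delay can compensate for trigger lag depends on how much the lag varies. Given measured trigger-to-exposure latencies (how they are measured, for example with an oscilloscope on the trigger input and a strobe output, is an assumption here), this sketch reports the mean, which a fixed system-wide offset could remove, and the residual spread it cannot. The function names and the timing-tolerance parameter are hypothetical.

```python
from statistics import mean, pstdev


def trigger_latency_summary(latencies_ms):
    """Summarize measured trigger-to-exposure latencies in milliseconds.

    mean_ms is the offset a fixed delay could compensate for; jitter_std_ms
    and worst_residual_ms describe what that offset cannot remove.
    """
    m = mean(latencies_ms)
    return {
        "mean_ms": m,
        "jitter_std_ms": pstdev(latencies_ms),
        "worst_residual_ms": max(abs(x - m) for x in latencies_ms),
    }


def fixed_delay_is_adequate(latencies_ms, timing_tolerance_ms: float) -> bool:
    """True if compensating with the mean latency keeps every capture within tolerance."""
    return trigger_latency_summary(latencies_ms)["worst_residual_ms"] <= timing_tolerance_ms
```

A camera with a long but consistent delay can pass this check, while one with a shorter but erratic delay may fail it, which is the distinction drawn above between commercial-grade and industrial-grade triggering.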


USB 3.0 Interface

Although there are many interface options available, this article focuses on USB 3.0 because it does not require a frame grabber and, thanks to its plug-and-play nature, it is the easiest interface to use for machine-vision applications. USB is commonly available as a standard computing interface, and is easily understood both by consumers and vision professionals.

Speed and Distance

USB 3.0 can attain bus speeds of up to 5 Gbps. Allowing for overhead, the interface delivers roughly 400 MBps of image throughput. Because the mechanical configuration of the cables is standardized, USB 3.0 cables can also support the newer USB 3.1 protocol, which offers bus speeds of 10 Gbps and more than double the throughput of USB 3.0.

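The throughput more than doubles because, in addition to the higher bus speed, USB 3.1 replaces the 8b/10b encoding of USB 3.0 with a 128b/132b encoding scheme, reducing the encoding overhead from 20% to about 3%. In other words, for every 128 bits of data that need to be sent, only 132 bits are transmitted, compared with 160 bits under 8b/10b, enabling more efficient transmission.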
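The effect of line encoding on usable bandwidth can be checked directly: the usable bit rate is the signaling rate multiplied by the ratio of payload bits to transmitted bits. A sketch using the nominal bus speeds; it accounts only for line encoding, not for packet and protocol overhead, so real throughput is lower. The function names are hypothetical.

```python
def payload_rate_gbps(line_rate_gbps: float, payload_bits: int, coded_bits: int) -> float:
    """Usable bit rate after line encoding (ignores packet and protocol overhead)."""
    return line_rate_gbps * payload_bits / coded_bits


def encoding_overhead(payload_bits: int, coded_bits: int) -> float:
    """Fraction of transmitted bits spent on encoding rather than data."""
    return 1.0 - payload_bits / coded_bits


if __name__ == "__main__":
    for name, rate, p, c in (("USB 3.0 (8b/10b)", 5.0, 8, 10),
                             ("USB 3.1 (128b/132b)", 10.0, 128, 132)):
        print(f"{name}: {payload_rate_gbps(rate, p, c):.2f} Gbps usable, "
              f"{encoding_overhead(p, c):.1%} encoding overhead")
```

This prints 4.00 Gbps with 20.0% overhead for USB 3.0 and 9.70 Gbps with 3.0% overhead for USB 3.1. The 4.0 Gbps figure is 500 MBps before protocol overhead, consistent with the roughly 400 MBps of image throughput quoted for USB 3.0.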

USB 3.0 and USB 3.1 use Direct Memory Access (DMA) and benefit from CPU offloading, which demands significantly fewer processing resources from the host CPU and leaves it free to perform image analysis, image post-processing, or other unrelated tasks.

It is commonly believed that for USB 3.0 applications the maximum permissible cable length is 5 m, which is rather short. While this is true for passive copper USB cables, solutions are available that greatly extend the range of USB 3.0.

The range of USB3 can be extended using two main approaches. The first method involves the use of an active cable, where embedded electronics within the cable guarantee signal integrity at lengths of up to 20 m. The transmission range can be further extended by a combination of a passive and active cable.

The second approach involves the use of a fiber optic extender cable, which achieves a transmission distance of up to 100 m. However, the fiber optic cable cannot carry power, so power must be injected at the far end to supply the camera.

Combinations of these extension methods can be coupled with a powered USB 3.0 hub. Passive and active cables can be connected to a powered hub to deliver power and connectivity to the cameras. A fiber optic extender can be used to connect the hub with the far end computing host, enabling the data to be aggregated from all the cameras on the hub onto a single connection.

This means in effect that each camera receives 1/N of the total available bandwidth of the fiber connection to the host computer; N denotes the number of cameras connected to the hub.
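The 1/N sharing rule can be combined with frame-size arithmetic to see how many cameras one hub uplink can support at a target frame rate. A sketch assuming the bandwidth splits evenly, uncompressed pixels, and an uplink throughput in MBps (1 MB = 10⁶ bytes) supplied by the user; all names and example values are hypothetical.

```python
def per_camera_mb_s(uplink_mb_s: float, n_cameras: int) -> float:
    """Even 1/N share of the uplink bandwidth for each camera on the hub."""
    if n_cameras < 1:
        raise ValueError("need at least one camera")
    return uplink_mb_s / n_cameras


def stream_mb_s(width: int, height: int, bytes_per_pixel: float, fps: float) -> float:
    """Uncompressed data rate of one camera stream in MB/s."""
    return width * height * bytes_per_pixel * fps / 1e6


def max_cameras(uplink_mb_s: float, width: int, height: int,
                bytes_per_pixel: float, fps: float) -> int:
    """Largest N for which the 1/N share still covers one camera's stream."""
    return int(uplink_mb_s // stream_mb_s(width, height, bytes_per_pixel, fps))


def max_fps(share_mb_s: float, width: int, height: int, bytes_per_pixel: float) -> float:
    """Highest frame rate a camera can sustain within its bandwidth share."""
    return share_mb_s * 1e6 / (width * height * bytes_per_pixel)
```

For example, taking 400 MBps as the uplink, four cameras would each get about 100 MBps, enough for 8-bit 2000 × 1000 frames at about 50 fps, and five 8-bit 1280 × 1024 cameras could run at 60 fps.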

The build quality of the USB hubs and cables should also be considered because low cable quality can affect the overall performance of the system. Whenever available, locking cables must be used to prevent inadvertent camera disconnection. The recommendations of the camera manufacturer need to be followed when cabling is selected for any application.

USB3 Vision

USB3 Vision is the newest machine vision standard and was released in January 2013. It was created exclusively for the machine vision industry: many companies pooled their expertise to create a standard that is plug-and-play, compatible with all newly manufactured computing systems, and offers a high level of performance.

USB3 Vision was designed with the Generic Interface for Cameras (GenICam) standard in mind, which allows cameras using the interface to be seamlessly integrated into systems that use GenICam.

Such an off-the-shelf hardware approach leverages hardware DMA to transfer images from the camera’s hardware to user buffers, and does not require frame grabbers or specialized cables. Together with the new encoding scheme of USB3.1, USB3 Vision is the fastest transfer protocol for a camera that does not need a frame grabber.



This article presented an overview of high-level requirements as a first step toward choosing the correct camera and lens for a vision system, although it is not an exhaustive list of considerations.

By understanding users’ specific imaging needs, Lumenera can help them get the most out of a camera for any imaging application. All Lumenera cameras have a four year warranty and are supported by an experienced team of imaging and technical support experts.


1 These calculations are approximate and the actual working distance should be measured when the complete system is put into place.

This information has been sourced, reviewed and adapted from materials provided by Lumenera Corporation.

For more information on this source, please visit Lumenera Corporation.
