Automating Pick & Place Tasks with AI

It would be difficult to deny that AI is already superior to humans in many areas: it is robust, fast, practically error-free and does not need to take a break. This superiority is particularly relevant where work processes need to be carried out continuously with consistent and reliable quality and high performance.

Image Credit: IDS Imaging Development Systems GmbH

One reason to use AI in machine vision is to make processes more efficient and cost-effective. The “Vision Guided Robot” use case demonstrates how a robot and an embedded AI vision camera can intelligently automate typical pick & place tasks, rendering even a PC redundant.

Different disciplines must work together optimally for “smart gripping.” For instance, if robots are to sort products by shape, material, size or quality, the products must not only be gripped but also identified, analyzed, and localized beforehand.

With rule-based image processing systems, this is often very time-consuming and, particularly for small batch sizes, hardly economically feasible.

Robots, in combination with AI-based inference, can already be equipped with the product knowledge and skills of an accomplished worker. For the individual subtasks, monumental leaps in technology are no longer necessary: it is enough to have the right products working together efficiently across disciplines as a “smart robot vision system.”

EyeBot Use Case

Objects are randomly scattered on a conveyor belt in a production line. They must first be detected and selected, and then, for instance, wrapped in packaging or passed on in the correct position to a downstream station for further processing or analysis.

The software company urobots GmbH has developed a PC-based solution for detecting objects and controlling robots. Its trained AI model recognizes the position and orientation of objects in camera images, from which the grip coordinates for a robot are determined.

This led to the next goal: to migrate this solution to the AI-based embedded vision system from IDS Imaging Development Systems GmbH. According to urobots, two requirements were essential when developing this solution:

  1. The user needed to be able to adapt the system easily to different use cases without any special AI expertise. This means the system keeps working even if something changes in production, such as the appearance of the objects or the lighting, or if additional object types need to be integrated.
  2. The overall system needed to be completely PC-less, with the device components communicating directly, so that it is cost-effective, light and space-saving.

IDS already meets both requirements with the IDS NXT inference camera system.

“All image processing runs on the camera, which communicates directly with the robot via Ethernet. This is made possible by a vision app developed with the IDS NXT Vision App Creator, which uses the IDS NXT AI core. The Vision App enables the camera to locate and identify pre-trained (2D) objects in the image information. For example, tools that lie on a plane can be gripped in the correct position and placed in a designated place. The PC-less system saves costs, space and energy, allowing for easy and cost-effective picking solutions.”

Alexey Pavlov, Managing Director, urobots GmbH

Position Detection and Direct Machine Communication

A trained neural network identifies all objects in the image along with their position and orientation. The AI can do this both for objects with a lot of natural variance, such as food, plants or other flexible objects, and for rigid objects that always look the same.

This results in very stable recognition of the objects' position and orientation. urobots GmbH trained the network for the customer with its own software and then uploaded it to the IDS NXT camera.

To complete this stage, the network had to be translated into a special optimized format that resembles a kind of "linked list."

The IDS NXT ferry tool provided by IDS made porting the trained neural network to the inference camera very easy. During the process, each layer of the CNN becomes a node descriptor that accurately describes that layer. The end result is a complete concatenated list of the CNN, represented in binary.
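The idea of turning each layer into a node descriptor and concatenating the descriptors into one binary list can be illustrated with a small sketch. The descriptor layout below (a type code, a parameter count, the byte offset of the next node, then integer parameters) is purely hypothetical and is not the actual IDS NXT format; the `Layer`, `serialize` and `walk` names are likewise invented for illustration.

```python
import struct
from dataclasses import dataclass

# Hypothetical descriptor header, NOT the real IDS NXT layout:
# layer type (uint8), parameter count (uint8), byte offset of next node (uint32).
HEADER = struct.Struct("<BBI")


@dataclass
class Layer:
    kind: int      # illustrative codes, e.g. 1 = convolution, 2 = pooling
    params: tuple  # integer hyperparameters of the layer


def serialize(layers):
    """Concatenate one descriptor per layer; each stores the offset of the next."""
    blob = bytearray()
    for i, layer in enumerate(layers):
        size = HEADER.size + 4 * len(layer.params)
        is_last = i == len(layers) - 1
        next_offset = 0 if is_last else len(blob) + size  # 0 marks the end
        blob += HEADER.pack(layer.kind, len(layer.params), next_offset)
        blob += struct.pack(f"<{len(layer.params)}i", *layer.params)
    return bytes(blob)


def walk(blob):
    """Follow the next-offset links and rebuild the layer list (non-empty blob)."""
    layers, offset = [], 0
    while True:
        kind, count, next_offset = HEADER.unpack_from(blob, offset)
        params = struct.unpack_from(f"<{count}i", blob, offset + HEADER.size)
        layers.append(Layer(kind, params))
        if next_offset == 0:
            return layers
        offset = next_offset
```

Because every node carries the offset of its successor, a reader such as an accelerator could step through the network layer by layer without parsing a separate index, which is presumably what makes the "linked list" comparison apt.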

Specifically developed for the camera and based on an FPGA, the CNN accelerator IDS NXT ocean core can then optimally execute this universal CNN.

The vision app developed by urobots then calculates optimal grip positions for a robot from the detection data. This alone, however, did not solve the task: in addition to determining what to grip, where and how, direct communication had to be established between the IDS NXT camera and the robot.
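How a detection might be turned into a grip position can be sketched as follows. This is a minimal sketch and not urobots' method: it assumes objects lie on a plane, that a simple calibration (scale, rotation and offset between image and robot coordinates) is known, and it ignores lens distortion and mirrored axes. The `Detection`, `Calibration` and `grip_pose` names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    u: float          # object centre in the image, in pixels
    v: float
    angle_deg: float  # in-plane orientation reported by the network


@dataclass
class Calibration:
    mm_per_pixel: float
    rotation_deg: float  # rotation of the image axes relative to the robot base
    origin_x_mm: float   # robot coordinates of image pixel (0, 0)
    origin_y_mm: float


def grip_pose(det, cal):
    """Map a 2D detection to a planar robot pose (x_mm, y_mm, yaw_deg)."""
    theta = math.radians(cal.rotation_deg)
    x_img = det.u * cal.mm_per_pixel
    y_img = det.v * cal.mm_per_pixel
    # Rotate the scaled image coordinates into the robot frame, then translate.
    x = cal.origin_x_mm + x_img * math.cos(theta) - y_img * math.sin(theta)
    y = cal.origin_y_mm + x_img * math.sin(theta) + y_img * math.cos(theta)
    # Wrap the gripper yaw into the range [-180, 180).
    yaw = (det.angle_deg + cal.rotation_deg + 180.0) % 360.0 - 180.0
    return x, y, yaw
```

For example, with 0.5 mm per pixel, a 90 degree rotation and an origin at (100, 200) mm, a detection at pixel (40, 20) with a 120 degree orientation maps to roughly (90, 220) mm with a yaw of -150 degrees.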

This task in particular should not be underestimated: it is often the crucial point that determines how much money, time and manpower need to be invested in a solution. To pass concrete work instructions directly to the robot, urobots implemented an XMLRPC-based network protocol in the camera’s vision app using the IDS NXT Vision App Creator.
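The general shape of such an XML-RPC exchange can be sketched with Python's standard `xmlrpc` modules. The source does not describe the actual protocol, so everything specific here is an assumption: the method name `pick`, its arguments, and the choice to let a robot-side stub act as the server while the camera side acts as the client.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

received = []  # stands in for the robot's motion queue in this sketch


def pick(label, x_mm, y_mm, yaw_deg):
    """Hypothetical robot-side handler: record the instruction and acknowledge."""
    received.append((label, x_mm, y_mm, yaw_deg))
    return True


def start_robot_stub():
    """Start a local XML-RPC server on a free port and return (server, url)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(pick, "pick")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}"


def send_pick(url, label, x_mm, y_mm, yaw_deg):
    """Camera-side call: one XML-RPC request per object to grip."""
    with ServerProxy(url) as robot:
        return robot.pick(label, x_mm, y_mm, yaw_deg)
```

XML-RPC carries plain strings, numbers and booleans over HTTP, so a work instruction of this kind needs no shared binary format between camera and robot, only agreement on method names and argument order.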

The final AI vision app achieves a positional accuracy of +/- 2 degrees and detects objects in about 200 milliseconds.

Figure 1. The neural network in the IDS NXT camera localizes and detects the exact position of the objects. Based on this image information, the robot can independently grasp and deposit them. Image Credit: IDS Imaging Development Systems GmbH

PC-Less: More than Merely Artificially Intelligent

It is not just the artificial intelligence that makes this use case so intelligent. Two further aspects of running this solution without an additional PC are worth noting. The first is that, because the camera itself does not simply deliver images but generates image processing results, the PC hardware and all its associated infrastructure can be dispensed with.

This, of course, ultimately minimizes the acquisition and maintenance costs of the system. It is also often important that process decisions are made directly at the production site, i.e. "in time". Subsequent processes can therefore be executed faster and without latency, which in some cases also allows the clock rate to be increased.

The second aspect concerns development costs. AI vision, in which a network is trained, does not work in the classical, rule-based manner of image processing, and this changes how image processing tasks are handled and approached.

The quality of the results is no longer determined by program code developed manually by image processing experts and application developers. In other words, if an application can be solved with AI, IDS NXT can also save the user time and money.

This is because, with the user-friendly and comprehensive software environment, any user is able to train a neural network, design the corresponding vision app and run it on the camera.


The EyeBot use case demonstrates the future of computer vision: PC-less embedded AI vision applications.

There are further advantages to the small embedded system, such as expandability through the vision app-based concept, the development of applications for different target groups, along with end-to-end manufacturer support.

In the EyeBot application, the competences are clearly distributed. The user’s attention can stay on the product in question, while IDS and urobots focus on training and running the AI for image processing and on controlling the robot.

Another advantage is that the vision app can also be easily adapted for other objects, other robot models and thus for many other similar applications through Ethernet-based communication and the open IDS NXT platform.

This information has been sourced, reviewed and adapted from materials provided by IDS Imaging Development Systems GmbH.

For more information on this source, please visit IDS Imaging Development Systems GmbH.


Please use the following format to cite this article in your essay, paper or report:

    IDS Imaging Development Systems GmbH. (2022, November 23). Automating Pick & Place Tasks with AI. AZoOptics.
