Data is gold: the importance of recordings and labelling for AI-infused vision development
A recent publication by Andrew Ng * (one of the most renowned machine learning and education pioneers) highlights the importance of data for progress in AI. As he explains: “Unlike traditional software, which is powered by code, AI systems are built using both code (including models and algorithms) and data:
AI systems = Code (model/algorithm) + Data”
While historical approaches typically tried to improve the Code (either the model architecture or the algorithm), we now know that “for many practical applications, it’s more effective instead to focus on improving the Data”. Generating bigger and better datasets is often the most straightforward way to boost AI results, and so-called “data-centric AI development” is gaining ground. For those who, like us at Sadako Technologies, are devoted to developing neural networks for vision applications, building high-quality datasets in a repeatable, systematic way is a key activity: it ensures an excellent, consistent flow of data throughout all stages of a project.
Our data generation process has two main steps: image acquisition and image labelling. We have taken great care over both in developing the vision systems for the HR-Recycler project, which need to recognize WEEE objects and their components, as well as human motion and gestures. For image acquisition, we have prepared and carried out the following recording campaigns (the last one is still ongoing):
– Campaign 1 (organized with CERTH at Ecoreset’s premises)
Figure 1: Images from the July 2019 classification recordings in Ecoreset (left and centre camera)
– Campaign 2 (organized with CERTH at Ecoreset’s premises)
Figure 2: Images from the September 2019 classification recordings in Ecoreset
– Campaign 3
Figure 3: Images from the December 2019 classification recordings in Indumetal
Figure 4: Sample images from the December 2019 Indumetal recordings. Time advances from left to right.
– Campaign 4
Figure 5: Sample images from the March 2020 recordings at Sadako’s premises.
– Campaign 5
Figure 6: Sample images from the June 2021 recordings at Indumetal’s premises.
Special attention was paid to the choice of hardware, as well as to replicating environmental conditions (background, lighting) as closely as possible to those found in operation. For the human motion detection datasets, special attention has also been given to possible gender or race bias in the data collection, which could harm the neural network’s operational performance.
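As an illustration of how such a bias check might look in practice, the sketch below counts how each group is represented in a set of recordings and flags groups that fall under a minimum share. It is only a minimal example under stated assumptions: the metadata fields (`clip`, `group`), the sample records and the 10% threshold are hypothetical and are not taken from the HR-Recycler datasets.

```python
from collections import Counter


def group_shares(records, attribute):
    """Return the fraction of records that fall in each value of `attribute`."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def underrepresented(records, attribute, min_share=0.10):
    """List the attribute values whose share of the dataset is below `min_share`."""
    shares = group_shares(records, attribute)
    return sorted(value for value, share in shares.items() if share < min_share)


# Hypothetical metadata for recorded clips (illustrative only, not project data).
clips = (
    [{"clip": f"a{i}", "group": "A"} for i in range(19)]
    + [{"clip": "b0", "group": "B"}]
)

print(group_shares(clips, "group"))
print(underrepresented(clips, "group"))
```

A count like this only reveals imbalance among the groups that appear in the metadata; a group that was never recorded has to be spotted by comparing against the population expected in operation.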
On the labelling side, our internal labelling team, one of the most skilled and experienced image labelling teams in the waste domain, has used our own proprietary labelling tools to generate multiple homogeneous, high-quality annotations for the different categories established for WEEE objects and for human motion and gestures.
Accurate recordings and excellent labelling make for smooth algorithm production and are critical for the system to work properly.
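One common way to quantify how homogeneous annotations are is to compare the boxes that two annotators draw around the same object using intersection over union (IoU). The sketch below assumes axis-aligned bounding boxes given as `(x_min, y_min, x_max, y_max)`; this label format and the example coordinates are assumptions for illustration, not a description of our proprietary tools.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Hypothetical boxes drawn by two annotators around the same object.
annotator_1 = (0, 0, 2, 2)
annotator_2 = (1, 1, 3, 3)
print(iou(annotator_1, annotator_2))
```

An IoU close to 1 means the two annotations nearly coincide, while low values point to images or categories where the labelling guidelines may need to be tightened.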