Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Thomas Soumarmon

Dataset Definition Standard (DDS)

Jan 07, 2021

Cyril Cappi, Camille Chapdelaine, Laurent Gardes, Eric Jenn, Baptiste Lefevre, Sylvaine Picard, Thomas Soumarmon

Figure 1 for Dataset Definition Standard (DDS)

Figure 2 for Dataset Definition Standard (DDS)

Figure 3 for Dataset Definition Standard (DDS)

Abstract:This document gives a set of recommendations to build and manipulate the datasets used to develop and/or validate machine learning models such as deep neural networks. This document is one of the 3 documents defined in [1] to ensure the quality of datasets. This is a work in progress as good practices evolve along with our understanding of machine learning. The document is divided into three main parts. Section 2 addresses the data collection activity. Section 3 gives recommendations about the annotation process. Finally, Section 4 gives recommendations concerning the breakdown between train, validation, and test datasets. In each part, we first define the desired properties at stake, then we explain the objectives targeted to meet the properties, finally we state the recommendations to reach these objectives.

Via

Access Paper or Ask Questions

Ensuring Dataset Quality for Machine Learning Certification

Nov 03, 2020

Sylvaine Picard, Camille Chapdelaine, Cyril Cappi, Laurent Gardes, Eric Jenn, Baptiste Lefèvre, Thomas Soumarmon

Figure 1 for Ensuring Dataset Quality for Machine Learning Certification

Figure 2 for Ensuring Dataset Quality for Machine Learning Certification

Figure 3 for Ensuring Dataset Quality for Machine Learning Certification

Figure 4 for Ensuring Dataset Quality for Machine Learning Certification

Abstract:In this paper, we address the problem of dataset quality in the context of Machine Learning (ML)-based critical systems. We briefly analyse the applicability of some existing standards dealing with data and show that the specificities of the ML context are neither properly captured nor taken into ac-count. As a first answer to this concerning situation, we propose a dataset specification and verification process, and apply it on a signal recognition system from the railway domain. In addi-tion, we also give a list of recommendations for the collection and management of datasets. This work is one step towards the dataset engineering process that will be required for ML to be used on safety critical systems.

* The 10th IEEE International Workshop on Software Certification (WoSoCer 2020)

Via

Access Paper or Ask Questions