Abstract: Natural Language Understanding (NLU) models are typically trained in a supervised learning framework. In intent classification, the labels to be predicted are predefined by the annotation schema, and labelling is laborious: annotators manually inspect each utterance and assign the corresponding label. We propose an Active Annotation (AA) approach that combines an unsupervised learning method in the embedding space, a human-in-the-loop verification process, and linguistic insights to create lexicons that can serve as open categories and be adapted over time. In particular, annotators define the y-label space on the fly during annotation, through an iterative process and without prior knowledge of the input data. We evaluate the proposed annotation paradigm in a real use-case NLU scenario. Results show that our Active Annotation paradigm yields accurate, higher-quality training data, with an annotation speed an order of magnitude higher than that of the traditional human-only baseline annotation methodology.
Abstract: In situ sensors and Wireless Sensor Networks (WSNs) have become increasingly popular over the last decade because of their potential use in applications across many different fields. Today, WSNs are used by almost every kind of monitoring system, from health care to environmental forecasting and surveillance. All applications that use in situ sensors rely strongly on their correct operation, which is, however, difficult to guarantee. These sensors are typically cheap and prone to malfunction. Additionally, for many tasks (e.g. environmental forecasting), sensors are deployed under potentially harsh weather conditions, making breakage even more likely. The high probability of erroneous readings or of data corruption during transmission raises the problem of ensuring the quality of the data collected by sensors. Since WSNs have to operate continuously and therefore generate very large volumes of data every day, the quality control process has to be automated, scalable, and fast enough to be applied to streaming data. The most common approach to ensuring the quality of sensor data is the automated detection of erroneous readings or anomalous sensor behaviour. In the literature, this strategy is known as anomaly detection, and it can be pursued in many different ways.