Marco Wiering

Active learning for reducing labeling effort in text classification tasks

Sep 10, 2021

Pieter Floris Jacobs, Gideon Maillette de Buy Wenniger, Marco Wiering, Lambert Schomaker

Abstract: Labeling data can be expensive because it is usually performed manually by domain experts. This is a burden for deep learning, which depends on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by labeling only the data that the model deems most informative. Little research has been done on AL in a text classification setting, and next to none has involved the more recent, state-of-the-art NLP models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT$_{base}$ as the classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL, namely that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. Although the proposed heuristics did not improve the performance of AL, our results show that uncertainty-based AL with BERT$_{base}$ outperforms random sampling of data. This difference in performance can decrease as the query-pool size grows.
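As a rough illustration of the uncertainty-based selection the abstract describes, the sketch below ranks unlabeled pool items by the entropy of their predicted class probabilities and picks the most uncertain ones for labeling. It is a minimal example under stated assumptions, not the authors' implementation: the function names, the choice of entropy as the uncertainty score, and the toy probabilities are all hypothetical, and in the paper's setting the probabilities would come from a classifier such as BERT$_{base}$.

```python
import math
import random


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, query_size):
    """Return indices of the `query_size` most uncertain pool items.

    `pool_probs` holds one probability vector per unlabeled item.
    Entropy is only one possible uncertainty score (an assumption here).
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:query_size]


def select_random(pool_size, query_size, seed=0):
    """Random-sampling baseline: pick items without looking at the model."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), query_size)


# Toy class probabilities for four unlabeled items (made-up values).
pool_probs = [
    [0.98, 0.02],  # confident prediction
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain
]
chosen = select_queries(pool_probs, query_size=2)
baseline = select_random(len(pool_probs), query_size=2)
```

The `query_size` argument plays the role of the query-pool size mentioned in the abstract: the larger it is, the more the selected batch includes items that the model is already fairly sure about.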

LemgoRL: An open-source Benchmark Tool to Train Reinforcement Learning Agents for Traffic Signal Control in a real-world simulation scenario

Mar 30, 2021

Arthur Müller, Vishal Rangras, Georg Schnittker, Michael Waldmann, Maxim Friesen, Tobias Ferfers, Lukas Schreckenberg, Florian Hufen, Jürgen Jasperneite, Marco Wiering

Abstract: Sub-optimal control policies in intersection traffic signal controllers (TSC) contribute to congestion and lead to negative effects on human health and the environment. Reinforcement learning (RL) for traffic signal control is a promising approach to design better control policies and has attracted considerable research interest in recent years. However, most work in this area has used simplified simulation environments of traffic scenarios to train RL-based TSC. To deploy RL in real-world traffic systems, the gap between simplified simulation environments and real-world applications has to be closed. Therefore, we propose LemgoRL, a benchmark tool to train RL agents as TSC in a realistic simulation environment of Lemgo, a medium-sized town in Germany. In addition to the realistic simulation model, LemgoRL encompasses a traffic signal logic unit that ensures compliance with all regulatory and safety requirements. LemgoRL offers the same interface as the well-known OpenAI Gym toolkit to enable easy deployment in existing research work. Our benchmark tool drives the development of RL algorithms towards real-world applications. We provide LemgoRL as an open-source tool at https://github.com/rl-ina/lemgorl.

* Submitted to IEEE International Conference on Intelligent Transportation Systems (ITSC2021)
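To make the "same interface as OpenAI Gym" remark concrete, here is a minimal sketch of the classic `reset()` / `step(action)` loop on a toy environment. Everything in it is an assumption for illustration: `ToyIntersectionEnv`, its queue dynamics, and its reward are hypothetical and are not the LemgoRL environment or its API; only the overall shape of the interaction loop follows the classic Gym convention.

```python
import random


class ToyIntersectionEnv:
    """Hypothetical two-approach intersection, NOT the LemgoRL simulator.

    Follows the classic Gym convention: reset() returns an observation and
    step(action) returns (observation, reward, done, info).
    """

    def __init__(self, seed=0, horizon=20):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.queues = [0, 0]
        self.t = 0

    def reset(self):
        self.queues = [0, 0]
        self.t = 0
        return tuple(self.queues)

    def step(self, action):
        # action is the index (0 or 1) of the approach that gets green.
        for i in range(2):
            self.queues[i] += self.rng.randint(0, 2)  # random arrivals
        # Up to three vehicles leave the approach that has green.
        self.queues[action] = max(0, self.queues[action] - 3)
        self.t += 1
        reward = -sum(self.queues)  # fewer waiting vehicles is better
        done = self.t >= self.horizon
        return tuple(self.queues), reward, done, {}


env = ToyIntersectionEnv(seed=1)
obs = env.reset()
total_reward = 0
done = False
while not done:
    # Simple baseline policy: give green to the longer queue.
    action = 0 if obs[0] >= obs[1] else 1
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

Because the loop only relies on `reset()` and `step()`, an agent written against this shape could in principle be pointed at any environment exposing the same interface, which is the deployment benefit the abstract refers to.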

Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification

Nov 01, 2020

Xiangxie Zhang, Ben Beinke, Berlian Al Kindhi, Marco Wiering

Abstract: The classification of DNA sequences is a key research area in bioinformatics as it enables researchers to conduct genomic analysis and detect possible diseases. In this paper, three state-of-the-art algorithms, namely Convolutional Neural Networks, Deep Neural Networks, and N-gram Probabilistic Models, are used for the task of DNA classification. Furthermore, we introduce a novel feature extraction method based on the Levenshtein distance and randomly generated DNA sub-sequences to compute information-rich features from the DNA sequences. We also use an existing feature extraction method based on 3-grams to represent amino acids and combine both feature extraction methods with a multitude of machine learning algorithms. Four different datasets, each concerning viral diseases such as Covid-19, AIDS, Influenza, and Hepatitis C, are used for evaluating the different approaches. The results of the experiments show that all methods obtain high accuracies on the different DNA datasets. Furthermore, the domain-specific 3-gram feature extraction method leads in general to the best results in the experiments, while the newly proposed technique outperforms all other methods on the smallest Covid-19 dataset.

* 17 pages
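The Levenshtein-based feature idea in the abstract above can be sketched roughly as follows: generate a fixed set of random DNA sub-sequences, then describe each input sequence by its edit distance to each of them. This is a minimal sketch under assumptions, not the paper's method: the abstract does not say how the distances are computed or aggregated, so comparing the whole sequence against each reference, along with the function names, reference count, and reference length, is hypothetical.

```python
import random


def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def random_subsequences(n, length, seed=0):
    """Generate n random DNA strings over the alphabet A, C, G, T."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length))
            for _ in range(n)]


def levenshtein_features(sequence, references):
    """One feature per reference: its edit distance to the input sequence."""
    return [levenshtein(sequence, ref) for ref in references]


# Hypothetical settings: 5 reference sub-sequences of length 8.
references = random_subsequences(n=5, length=8, seed=42)
features = levenshtein_features("ACGTACGTTAGC", references)
```

The resulting fixed-length vector can be fed to any standard classifier, which matches the abstract's description of combining the feature extraction with a range of machine learning algorithms.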

Comparing Generative Adversarial Network Techniques for Image Creation and Modification

Mar 24, 2018

Mathijs Pieters, Marco Wiering

Abstract: Generative adversarial networks (GANs) have proven successful at generating realistic real-world images. In this paper we compare various GAN techniques, both supervised and unsupervised. The effects of different objective functions on training stability are compared. We add an encoder to the network, making it possible to encode images to the latent space of the GAN. The generator, discriminator and encoder are parameterized by deep convolutional neural networks. For the discriminator network we experimented with using the novel Capsule Network, a state-of-the-art technique for detecting global features in images. Experiments are performed using a digit and a face dataset, with various visualizations illustrating the results. The results show that the encoder network makes it possible to reconstruct images. With the conditional GAN we can alter visual attributes of generated or encoded images. The experiments with the Capsule Network as discriminator result in generated images of lower quality than those obtained with a standard convolutional neural network.

* 20 pages, 23 figures
