Abstract: Unsupervised anomaly detection in high-dimensional data such as mobility networks is a challenging task. Approaches to feature engineering for such high-dimensional data have been a focus of research in this field. This study investigates whether features learned by network classification transfer to unsupervised anomaly detection. We propose using an auxiliary classification task to extract features from unlabelled data by supervised learning; these features can then be used for unsupervised anomaly detection. We validate this approach with experiments that detect anomalies in mobility network data from New York and Taipei, and compare the results with the traditional unsupervised feature learning approaches of PCA and autoencoders. We find that our feature learning approach yields the best anomaly detection performance on both datasets, outperforming the other approaches studied. This establishes the utility of the approach for feature engineering, and it can be applied to other problems of a similar nature.
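The pipeline described in this abstract (train a classifier on an auxiliary task, reuse its hidden layer as features, then score anomalies without labels) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the auxiliary labels, the network architecture, or the anomaly scorer. Here the auxiliary labels `y` are assumed to come from freely available metadata, the classifier is a small one-hidden-layer network written in NumPy, and the anomaly score is a Mahalanobis distance in feature space. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def train_aux_classifier(X, y, n_hidden=8, n_classes=2, lr=0.1, epochs=200, seed=0):
    """Train a one-hidden-layer softmax classifier on auxiliary labels y.

    Returns only the first-layer weights, which define the feature map.
    Architecture and hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden features
        Z = H @ W2 + b2
        Z = Z - Z.max(axis=1, keepdims=True)     # stabilise softmax
        P = np.exp(Z)
        P = P / P.sum(axis=1, keepdims=True)
        dZ = (P - Y) / n                         # cross-entropy gradient
        dH = (dZ @ W2.T) * (1.0 - H ** 2)        # backprop through tanh
        W2 -= lr * (H.T @ dZ)
        b2 -= lr * dZ.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1

def extract_features(X, W1, b1):
    """Reuse the classifier's hidden layer as a learned feature map."""
    return np.tanh(X @ W1 + b1)

def anomaly_scores(F, ridge=1e-6):
    """Mahalanobis distance of each row from the feature mean (assumed scorer)."""
    diff = F - F.mean(axis=0)
    cov = np.cov(F, rowvar=False) + ridge * np.eye(F.shape[1])
    sq = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative rounding errors

# Toy usage on synthetic data; the label rule is arbitrary.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = (X[:, 0] > 0).astype(int)
W1, b1 = train_aux_classifier(X, y)
F = extract_features(X, W1, b1)
scores = anomaly_scores(F)
```

Rows with the largest `scores` would be flagged as candidate anomalies. Any other unsupervised detector could be substituted for the Mahalanobis step.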
Abstract: Natural language datasets contain gender bias, which neural language models tend to learn, resulting in biased text generation. In this research, we propose a debiasing approach based on modifying the loss function. We introduce a new term to the loss function that attempts to equalize the probabilities of male and female words in the output. Using an array of bias evaluation metrics, we provide empirical evidence that our approach successfully mitigates gender bias in language models without increasing perplexity. Compared with the existing debiasing strategies of data augmentation and word embedding debiasing, our method performs better in several respects, especially in reducing gender bias in occupation words. Finally, we introduce a combination of data augmentation and our approach, and show that it outperforms existing strategies on all bias evaluation metrics.
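One way to picture a loss term that "attempts to equalize the probabilities of male and female words" is sketched below. The abstract does not give the exact form of the term, so this is an assumption: the penalty is taken to be the mean absolute log-ratio between the predicted probabilities of paired gendered words, added to the usual cross-entropy with a weight `lam`. The function names, the word pairing, and `lam` are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def debiased_loss(logits, targets, male_ids, female_ids, lam=1.0):
    """Cross-entropy plus an assumed gender-equalizing penalty.

    logits:     (n_positions, vocab_size) unnormalised scores
    targets:    (n_positions,) gold next-token ids
    male_ids, female_ids: aligned vocabulary ids of paired gendered
        words (e.g. "he"/"she"); the pairing itself is an assumption
    lam:        weight of the penalty (assumed hyperparameter)
    """
    logp = log_softmax(logits)
    ce = -logp[np.arange(len(targets)), targets].mean()
    # |log p(male) - log p(female)| is zero exactly when the two
    # words of a pair are predicted with equal probability.
    penalty = np.abs(logp[:, male_ids] - logp[:, female_ids]).mean()
    return ce + lam * penalty, ce, penalty

# Toy usage: vocabulary of 4 tokens, ids 0 and 1 form a gendered pair.
uniform = np.zeros((2, 4))
total_u, ce_u, pen_u = debiased_loss(uniform, np.array([0, 1]), [0], [1])

skewed = np.array([[2.0, 0.0, 0.0, 0.0]])
total_s, ce_s, pen_s = debiased_loss(skewed, np.array([0]), [0], [1], lam=0.5)
```

With uniform logits the penalty vanishes. With the skewed logits it equals the logit gap between the paired words, so increasing `lam` pushes the model harder toward equal probabilities for each pair.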