Unsupervised anomaly detection from high dimensional data like mobility networks is a challenging task. Study of different approaches of feature engineering from such high dimensional data have been a focus of research in this field. This study aims to investigate the transferability of features learned by network classification to unsupervised anomaly detection. We propose use of an auxiliary classification task to extract features from unlabelled data by supervised learning, which can be used for unsupervised anomaly detection. We validate this approach by designing experiments to detect anomalies in mobility network data from New York and Taipei, and compare the results to traditional unsupervised feature learning approaches of PCA and autoencoders. We find that our feature learning approach yields best anomaly detection performance for both datasets, outperforming other studied approaches. This establishes the utility of this approach to feature engineering, which can be applied to other problems of similar nature.