Austin Walters

Sensitive Data Detection with High-Throughput Neural Network Models for Financial Institutions

Dec 17, 2020

Anh Truong, Austin Walters, Jeremy Goodsitt

Abstract: Named Entity Recognition has been extensively investigated in many fields. However, the application of sensitive entity detection for production systems in financial institutions has not been well explored due to the lack of publicly available, labeled datasets. In this paper, we use internal and synthetic datasets to evaluate various methods of detecting NPI (Nonpublic Personally Identifiable) information commonly found within financial institutions, in both unstructured and structured data formats. Character-level neural network models including CNN, LSTM, BiLSTM-CRF, and CNN-CRF are investigated on two prediction tasks: (i) entity detection on multiple data formats, and (ii) column-wise entity prediction on tabular datasets. We compare these models with other standard approaches on both real and synthetic data, with respect to F1-score, precision, recall, and throughput. The real datasets include internal structured data and public email data with manually tagged labels. Our experimental results show that the CNN model is simple yet effective with respect to accuracy and throughput and is thus the most suitable candidate model to be deployed in the production environment(s). Finally, we provide several lessons learned on data limitations, data labelling, and the intrinsic overlap of data entities.
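
The evaluation criteria named in the abstract (precision, recall, F1-score, and throughput) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes token-level labels, and the function names `precision_recall_f1` and `throughput`, the `SSN` label, and the toy data are all hypothetical.

```python
import time
from typing import Callable, Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[str], pred: Sequence[str], label: str
) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for a single entity label."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def throughput(predict: Callable[[str], object], samples: Sequence[str]) -> float:
    """Samples processed per second for an arbitrary prediction function."""
    start = time.perf_counter()
    for sample in samples:
        predict(sample)
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed if elapsed > 0 else float("inf")


# Toy data (assumed): "O" marks a non-sensitive token.
gold_labels = ["SSN", "O", "SSN", "O"]
pred_labels = ["SSN", "SSN", "O", "O"]
scores = precision_recall_f1(gold_labels, pred_labels, "SSN")
```

Measuring throughput alongside the accuracy metrics is what allows a simpler model to be preferred when its F1-score is comparable but its inference cost is lower.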

Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools

Sep 03, 2019

Anh Truong, Austin Walters, Jeremy Goodsitt, Keegan Hines, C. Bayan Bruss, Reza Farivar

Abstract: There has been considerable growth and interest in industrial applications of machine learning (ML) in recent years. ML engineers, as a consequence, are in high demand across the industry, yet improving the efficiency of ML engineers remains a fundamental challenge. Automated machine learning (AutoML) has emerged as a way to save time and effort on repetitive tasks in ML pipelines, such as data pre-processing, feature engineering, model selection, hyperparameter optimization, and prediction result analysis. In this paper, we investigate the current state of AutoML tools aiming to automate these tasks. We conduct various evaluations of the tools on many datasets, in different data segments, to examine their performance, and compare their advantages and disadvantages on different test cases.
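
One of the repetitive tasks listed in the abstract, hyperparameter optimization, can be illustrated with an exhaustive grid search. This is a minimal sketch, not the API of any AutoML tool evaluated in the paper: the `grid_search` helper, the `toy_score` objective, and the `lr` and `depth` parameter names are hypothetical stand-ins for a real train-and-validate step.

```python
from itertools import product
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

Config = Dict[str, Any]


def grid_search(
    score: Callable[[Config], float], grid: Dict[str, Sequence[Any]]
) -> Tuple[Optional[Config], float]:
    """Try every combination in `grid` and return the highest-scoring one."""
    names = sorted(grid)
    best_config: Optional[Config] = None
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        current = score(config)
        if current > best_score:
            best_config, best_score = config, current
    return best_config, best_score


def toy_score(config: Config) -> float:
    """Hypothetical objective; a real one would train and validate a model."""
    return -((config["lr"] - 0.1) ** 2) - (config["depth"] - 3) ** 2


search_space = {"lr": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}
best_config, best_score = grid_search(toy_score, search_space)
```

Grid search is only the simplest strategy; its cost grows multiplicatively with each added parameter, which is one reason automated tools are attractive for this task.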
