Abstract:Addresses occupy a niche location within the landscape of textual data, due to the positional importance carried by every word, and the geographical scope it refers to. The task of matching addresses happens everyday and is present in various fields like mail redirection, entity resolution, etc. Our work defines, and formalizes a framework to generate matching and mismatching pairs of addresses in the English language, and use it to evaluate various methods to automatically perform address matching. These methods vary widely from distance based approaches to deep learning models. By studying the Precision, Recall and Accuracy metrics of these approaches, we obtain an understanding of the best suited method for this setting of the address matching task.
Abstract:Natural Language Processing technology has advanced vastly in the past decade. Text processing has been successfully applied to a wide variety of domains. In this paper, we propose a novel framework, Text Based Classification(TBC), that uses state of the art text processing techniques to solve classification tasks on tabular data. We provide a set of controlled experiments where we present the benefits of using this approach against other classification methods. Experimental results on several data sets also show that this framework achieves comparable performance to that of several state of the art models in accuracy, precision and recall of predicted classes.