Abstract:Document classification for text, images and other applicable entities has long been a focus of research in academia and also finds application in many industrial settings. Amidst a plethora of approaches to solve such problems, machine-learning techniques have found success in a variety of scenarios. In this paper we discuss the design of a machine learning-based semi-supervised job title classification system for the online job recruitment domain currently in production at CareerBuilder.com and propose enhancements to it. The system leverages a varied collection of classification as well clustering algorithms. These algorithms are encompassed in an architecture that facilitates leveraging existing off-the-shelf machine learning tools and techniques while keeping into consideration the challenges of constructing a scalable classification system for a large taxonomy of categories. As a continuously evolving system that is still under development we first discuss the existing semi-supervised classification system which is composed of both clustering and classification components in a proximity-based classifier setup and results of which are already used across numerous products at CareerBuilder. We then elucidate our long-term goals for job title classification and propose enhancements to the existing system in the form of a two-stage coarse and fine level classifier augmentation to construct a cascade of hierarchical vertical classifiers. Preliminary results are presented using experimental evaluation on real world industrial data.