Abstract:Inverse classification is the process of manipulating an instance such that it is more likely to conform to a specific class. Past methods that address such a problem have shortcomings. Greedy methods make changes that are overly radical, often relying on data that is strictly discrete. Other methods rely on certain data points, the presence of which cannot be guaranteed. In this paper we propose a general framework and method that overcomes these and other limitations. The formulation of our method can use any differentiable classification function. We demonstrate the method by using logistic regression and Gaussian kernel SVMs. We constrain the inverse classification to occur on features that can actually be changed, each of which incurs an individual cost. We further subject such changes to fall within a certain level of cumulative change (budget). Our framework can also accommodate the estimation of (indirectly changeable) features whose values change as a consequence of actions taken. Furthermore, we propose two methods for specifying feature-value ranges that result in different algorithmic behavior. We apply our method, and a proposed sensitivity analysis-based benchmark method, to two freely available datasets: Student Performance from the UCI Machine Learning Repository and a real world cardiovascular disease dataset. The results obtained demonstrate the validity and benefits of our framework and method.
Abstract:Inverse classification is the process of perturbing an instance in a meaningful way such that it is more likely to conform to a specific class. Historical methods that address such a problem are often framed to leverage only a single classifier, or specific set of classifiers. These works are often accompanied by naive assumptions. In this work we propose generalized inverse classification (GIC), which avoids restricting the classification model that can be used. We incorporate this formulation into a refined framework in which GIC takes place. Under this framework, GIC operates on features that are immediately actionable. Each change incurs an individual cost, either linear or non-linear. Such changes are subjected to occur within a specified level of cumulative change (budget). Furthermore, our framework incorporates the estimation of features that change as a consequence of direct actions taken (indirectly changeable features). To solve such a problem, we propose three real-valued heuristic-based methods and two sensitivity analysis-based comparison methods, each of which is evaluated on two freely available real-world datasets. Our results demonstrate the validity and benefits of our formulation, framework, and methods.