Part-based representations have been shown to be very useful for image classification. Learning part-based models is often viewed as a two-stage problem. First, a collection of informative parts is discovered, using heuristics that promote part distinctiveness and diversity, and then classifiers are trained on the vector of part responses. In this paper we unify the two stages and learn the image classifiers and a set of shared parts jointly. We generate an initial pool of parts by randomly sampling part candidates and selecting a good subset using L1/L2 regularization. All steps are driven "directly" by the same objective namely the classification loss on a training set. This lets us do away with engineered heuristics. We also introduce the notion of "negative parts", intended as parts that are negatively correlated with one or more classes. Negative parts are complementary to the parts discovered by other methods, which look only for positive correlations.