We critically appraise the recent interest in out-of-distribution (OOD) detection and question the practical relevance of existing benchmarks. While the prevailing trend is to treat samples from different datasets as OOD, we posit that the out-distributions of practical interest are those where the distinction is semantic in nature for a specified context, and that evaluation tasks should reflect this more closely. Assuming a context of object recognition, we recommend a set of benchmarks motivated by practical applications. Finally, we explore a multi-task learning-based approach and show empirically that auxiliary objectives promoting semantic awareness can improve semantic anomaly detection, with accompanying generalization benefits.
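The abstract does not name a specific auxiliary objective. As a minimal sketch of the kind of multi-task setup it describes, the code below pairs a standard classification loss with a self-supervised rotation-prediction loss on a shared encoder, a common auxiliary task for encouraging semantic awareness. The backbone, the `lambda_aux` weighting, and the simple max-softmax anomaly score are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: multi-task training with an auxiliary self-supervised
# rotation-prediction objective. All names (MultiTaskNet, lambda_aux,
# anomaly_score) are illustrative assumptions, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 128):
        super().__init__()
        # Shared feature extractor; a small CNN stands in for any backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
        )
        self.class_head = nn.Linear(feat_dim, num_classes)  # main task
        self.rot_head = nn.Linear(feat_dim, 4)              # auxiliary task: 0/90/180/270 degrees

    def forward(self, x):
        z = self.encoder(x)
        return self.class_head(z), self.rot_head(z)

def rotate_batch(x):
    """Make four rotated copies of a batch plus the rotation labels."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4, device=x.device).repeat_interleave(x.size(0))
    return torch.cat(rots), labels

def training_step(model, x, y, lambda_aux=0.5):
    # Main objective: standard supervised classification.
    logits, _ = model(x)
    cls_loss = F.cross_entropy(logits, y)
    # Auxiliary objective: predict which rotation was applied.
    x_rot, y_rot = rotate_batch(x)
    _, rot_logits = model(x_rot)
    aux_loss = F.cross_entropy(rot_logits, y_rot)
    return cls_loss + lambda_aux * aux_loss

def anomaly_score(model, x):
    # One simple score: negative max softmax probability
    # (higher value = more anomalous).
    logits, _ = model(x)
    return -F.softmax(logits, dim=1).max(dim=1).values
```

At evaluation time, `anomaly_score` could be thresholded to flag semantic anomalies; the choice of scoring rule is orthogonal to the multi-task training itself.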