Given the scarcity of anomalies in real-world applications, the majority of the literature has focused on modeling normality. The learned representations enable anomaly detection because the normality model is trained to capture the key underlying regularities of the data under normal operating conditions. In practical settings, particularly industrial time series anomaly detection, we often encounter situations where a large amount of normal operation data is available along with a small number of anomaly events collected over time. This practical situation calls for methodologies that leverage these few anomaly events to build a better anomaly detector. In this paper, we introduce two methodologies to address the needs of this practical situation and compare them with recently developed state-of-the-art techniques. Our proposed methods anchor on representation learning of normal operation with an autoregressive (AR) model, along with loss components that encourage representations separating normal data from the few positive examples. We apply the proposed methods to two industrial anomaly detection datasets and demonstrate effective performance in comparison with approaches from the literature. Our study also points out additional challenges in adopting such methods in practical applications.
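To make the general idea concrete, the following is a minimal sketch, assuming a PyTorch implementation, of an AR-style normality model whose prediction error is minimized on normal windows while a hinge-style term pushes the error of the few labeled anomalies above a margin. The class name, network architecture, and `margin` value are illustrative assumptions for exposition, not the exact formulation proposed in the paper.

```python
# Hypothetical sketch: AR normality model plus a separation term (PyTorch assumed).
import torch
import torch.nn as nn


class ARNormalityModel(nn.Module):
    """Predicts the next value of a time-series window from the preceding
    `window` values; stands in for the AR normality model."""

    def __init__(self, window: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(window, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window) -> predicted next value, shape (batch,)
        return self.net(x).squeeze(-1)


def combined_loss(model: ARNormalityModel,
                  x: torch.Tensor,
                  y_next: torch.Tensor,
                  labels: torch.Tensor,
                  margin: float = 5.0) -> torch.Tensor:
    """Prediction loss on normal windows plus a hinge term that pushes the
    AR error of the few labeled anomalies above `margin` (illustrative)."""
    err = (model(x) - y_next) ** 2          # per-window AR prediction error
    normal = labels == 0
    loss_normal = err[normal].mean()        # fit normal operating behavior
    anom_err = err[~normal]                 # errors of the few labeled anomalies
    loss_anom = (torch.relu(margin - anom_err).mean()
                 if anom_err.numel() > 0 else err.new_zeros(()))
    return loss_normal + loss_anom
```

At test time, the AR prediction error itself can serve as the anomaly score; the hinge term only shapes training so that the scores of known anomalies are separated from those of normal windows.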