Abstract:This paper investigates how to correct Chinese text errors with types of mistaken, missing and redundant characters, which is common for Chinese native speakers. Most existing models based on detect-correct framework can correct mistaken characters errors, but they cannot deal with missing or redundant characters. The reason is that lengths of sentences before and after correction are not the same, leading to the inconsistence between model inputs and outputs. Although the Seq2Seq-based or sequence tagging methods provide solutions to the problem and achieved relatively good results on English context, but they do not perform well in Chinese context according to our experimental results. In our work, we propose a novel detect-correct framework which is alignment-agnostic, meaning that it can handle both text aligned and non-aligned occasions, and it can also serve as a cold start model when there are no annotated data provided. Experimental results on three datasets demonstrate that our method is effective and achieves the best performance among existing published models.
Abstract:In recent studies, Lots of work has been done to solve time series anomaly detection by applying Variational Auto-Encoders (VAEs). Time series anomaly detection is a very common but challenging task in many industries, which plays an important role in network monitoring, facility maintenance, information security, and so on. However, it is very difficult to detect anomalies in time series with high accuracy, due to noisy data collected from real world, and complicated abnormal patterns. From recent studies, we are inspired by Nouveau VAE (NVAE) and propose our anomaly detection model: Time series to Image VAE (T2IVAE), an unsupervised model based on NVAE for univariate series, transforming 1D time series to 2D image as input, and adopting the reconstruction error to detect anomalies. Besides, we also apply the Generative Adversarial Networks based techniques to T2IVAE training strategy, aiming to reduce the overfitting. We evaluate our model performance on three datasets, and compare it with other several popular models using F1 score. T2IVAE achieves 0.639 on Numenta Anomaly Benchmark, 0.651 on public dataset from NASA, and 0.504 on our dataset collected from real-world scenario, outperforms other comparison models.