Abstract:Incremental anomaly detection sequentially recognizes abnormal regions in novel categories for dynamic industrial scenarios. This remains highly challenging due to knowledge overwriting and feature conflicts, which lead to catastrophic forgetting. In this work, we propose ONER, an end-to-end ONline Experience Replay method, which efficiently mitigates catastrophic forgetting while adapting to new tasks at minimal cost. Specifically, our framework utilizes two types of experience from past tasks, decomposed prompts and semantic prototypes, addressing both model parameter updates and feature optimization. The decomposed prompts consist of learnable components that assemble to produce attention-conditioned prompts. These prompts reuse previously learned knowledge, enabling the model to learn novel tasks effectively. The semantic prototypes operate at both the pixel and image levels, performing regularization in the latent feature space to prevent forgetting across tasks. Extensive experiments demonstrate that our method achieves state-of-the-art performance in incremental anomaly detection with significantly reduced forgetting, while adapting efficiently to new categories at minimal cost. These results confirm the efficiency and stability of ONER, making it a powerful solution for real-world applications.
Abstract:In this paper, we address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples. Unlike previous methods that consider only ID samples to be useful and aim to filter out OOD samples completely during training, we argue that the exploration and exploitation of both ID and OOD samples can benefit SSL. To support our claim, i) we propose a prototype-based clustering and identification algorithm that explores the inherent similarities and differences among samples at the feature level and effectively clusters them around several predefined ID and OOD prototypes, thereby enhancing feature learning and facilitating ID/OOD identification; ii) we propose an importance-based sampling method that exploits the difference in importance of each ID and OOD sample to SSL, thereby reducing the sampling bias and improving training. Our proposed method achieves state-of-the-art performance on several challenging benchmarks, and improves upon existing SSL methods even when ID samples are entirely absent from the unlabeled data.
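To make the idea of clustering samples around predefined ID and OOD prototypes more concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes a plain nearest-prototype rule with cosine similarity, and the names `cosine`, `assign_to_prototypes`, `PROTOTYPES`, and `FEATURES`, as well as the toy vectors, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def assign_to_prototypes(features, prototypes):
    """Assign each feature vector to its most similar prototype.

    `prototypes` maps a prototype name to a (vector, is_id) pair.
    Returns one (prototype_name, is_id) pair per feature, so the
    ID/OOD decision is read off the type of the winning prototype.
    """
    results = []
    for f in features:
        best = max(prototypes, key=lambda name: cosine(f, prototypes[name][0]))
        results.append((best, prototypes[best][1]))
    return results


# Hypothetical prototypes: two ID classes and one OOD cluster (assumed values).
PROTOTYPES = {
    "id_0": ([1.0, 0.0], True),
    "id_1": ([0.0, 1.0], True),
    "ood_0": ([-1.0, -1.0], False),
}
# Hypothetical unlabeled features in the same 2-D feature space.
FEATURES = [[0.9, 0.1], [0.2, 0.8], [-0.7, -0.6]]

ASSIGNMENTS = assign_to_prototypes(FEATURES, PROTOTYPES)
for feature, (name, is_id) in zip(FEATURES, ASSIGNMENTS):
    print(feature, "->", name, "(ID)" if is_id else "(OOD)")
```

In this toy setting the first two features fall to ID prototypes and the third to the OOD prototype. A real system would learn the features and prototypes jointly rather than fix them by hand.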
Abstract:Modeling an agent's epistemic state and its change plays a central role in intelligent agent systems. To this end, several formal systems have been presented. Among them, epistemic logics focus on the logical laws of different epistemic attributes (e.g., knowledge, belief, common knowledge, etc.) and epistemic actions (e.g., public announcement, private announcement, asynchronous announcement, etc.). None of these systems involves the interactive behaviours between an agent and its environment. By enriching the well-known $\pi$-calculus, this paper presents the e-calculus, which provides a conceptual framework to model epistemic interactions between agents with epistemic states. Unlike in the usual process calculi, all systems in the e-calculus are always arranged to run at an epistemic state. To formalize epistemic states abstractly, a group of postulates on them is presented. Moreover, based on these postulates, the behaviour theory of the e-calculus is developed from two different viewpoints.
Abstract:The attention-based encoder-decoder framework is widely used in the scene text recognition task. However, for the current state-of-the-art (SOTA) methods, there is room for improvement in the efficient usage of local visual and global context information of the input text image, as well as in the robustness of the correlation between the scene processing module (encoder) and the text processing module (decoder). In this paper, we propose a Representation and Correlation Enhanced Encoder-Decoder Framework (RCEED) to address these deficiencies and break the performance bottleneck. In the encoder module, local visual features, global context features, and position information are aligned and fused to generate a small-size comprehensive feature map. In the decoder module, two methods are utilized to enhance the correlation between the scene and text feature spaces. 1) The decoder initialization is guided by the holistic feature and global glimpse vector exported from the encoder. 2) The feature-enriched glimpse vector produced by the Multi-Head General Attention is used to assist the RNN iteration and the character prediction at each time step. In addition, we design a Layernorm-Dropout LSTM cell to improve the model's generalization towards changeable texts. Extensive experiments on the benchmarks demonstrate the advantageous performance of RCEED in scene text recognition tasks, especially the irregular ones.
Abstract:Based on Grossi and Modgil's recent work [1], this paper considers some issues on extension-based semantics for abstract argumentation frameworks (AAF, for short). First, an alternative fundamental lemma is given, which generalizes the corresponding result obtained in [1]. This lemma plays a central role in constructing some special extensions in terms of iterations of the defense function. Applying this lemma, some flaws in [1] are corrected and a number of structural properties of various extension-based semantics are given. Second, an operator, the so-called reduced meet modulo an ultrafilter, is presented. A number of fundamental semantics for AAF, including conflict-free, admissible, complete and stable semantics, are shown to be closed under this operator. Based on this fact, we provide a concise and uniform proof method to establish the universal definability of a family of range-related semantics. Third, using model-theoretical tools, we characterize the class of extension-based semantics that is closed under reduced meet modulo any ultrafilter, which yields a metatheorem concerning the universal definability of range-related semantics. Finally, in addition to range-related semantics, some graded variants of traditional semantics of AAF are also considered in this paper, e.g., ideal semantics, eager semantics, etc.
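As a concrete illustration of building an extension by iterating the defense function, the following minimal Python sketch uses the standard, ungraded defense function on a small finite framework: an argument is defended by a set E if every one of its attackers is attacked by some member of E. Iterating from the empty set until nothing changes gives the least fixed point, which for a finite framework is the grounded extension. This sketch does not cover the graded setting of [1] or the lemma in the paper; the names `defense`, `grounded_extension`, `ARGS`, and `ATTACKS`, and the example framework itself, are hypothetical.

```python
def defense(args, attacks, E):
    """Return the set of arguments in `args` defended by the set E.

    `attacks` is a set of (attacker, target) pairs. An argument a is
    defended by E if each attacker of a is attacked by some member of E.
    """
    defended = set()
    for a in args:
        attackers = {b for (b, target) in attacks if target == a}
        if all(any((e, b) in attacks for e in E) for b in attackers):
            defended.add(a)
    return defended


def grounded_extension(args, attacks):
    """Iterate the defense function from the empty set until a fixed point."""
    current = set()
    while True:
        nxt = defense(args, attacks, current)
        if nxt == current:
            return current
        current = nxt


# Hypothetical framework: a attacks b, b attacks c, c attacks d.
ARGS = {"a", "b", "c", "d"}
ATTACKS = {("a", "b"), ("b", "c"), ("c", "d")}

GROUNDED = grounded_extension(ARGS, ATTACKS)
print(sorted(GROUNDED))  # ['a', 'c']
```

Here the first iteration yields {a} (the only unattacked argument), the second adds c because its attacker b is attacked by a, and the third iteration changes nothing. The loop terminates on any finite framework because the defense function is monotone, so the iterates from the empty set form an increasing chain.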
Abstract:Deep learning based methods have achieved surprising progress in Scene Text Recognition (STR), one of the classic problems in computer vision. In this paper, we propose a feasible framework for multi-lingual arbitrary-shaped STR, comprising instance segmentation based text detection and a language model based attention mechanism for text recognition. Our STR algorithm not only recognizes Latin and non-Latin characters, but also supports arbitrary-shaped text recognition. Our method won the championship in the Scene Text Spotting Task (Latin Only, Latin and Chinese) of the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text Competition. Code is available at https://github.com/zhang0jhon/AttentionOCR.
Abstract:Recently, pose-based action recognition has gained increasing attention due to its better performance compared with traditional appearance-based methods. However, two problems remain to be solved. First, existing pose-based methods generally recognize human actions from captured 3D human poses, which are very difficult to obtain in real scenarios. Second, few pose-based methods model the action-related objects when recognizing human-object interaction actions, in which objects play an important role. To solve these problems, we propose a pose-based two-stream relational network (PSRN) for action recognition. In PSRN, one stream models the temporal dynamics of the targeted 2D human pose sequences, which are directly extracted from raw videos, and the other stream models the action-related objects from a randomly sampled video frame. Most importantly, instead of fusing the two streams in the class score layer as in previous work, we propose a pose-object relational network to model the relationship between human poses and action-related objects. We evaluate the proposed PSRN on two challenging benchmarks, i.e., Sub-JHMDB and PennAction. Experimental results show that our PSRN obtains state-of-the-art performance on Sub-JHMDB (80.2%) and PennAction (98.1%). Our work opens a new door to action recognition by combining 2D human pose extracted from raw video and image appearance.