Abstract:Despite the outstanding performance in many individual tasks, deep neural networks suffer from catastrophic forgetting when learning from continuous data streams in real-world scenarios. Current Non-Exemplar Class-Incremental Learning (NECIL) methods mitigate forgetting by storing a single prototype per class, which serves to inject previous information when sequentially learning new classes. However, these stored prototypes or their augmented variants often fail to simultaneously capture spatial distribution diversity and precision needed for representing old classes. Moreover, as the model acquires new knowledge, these prototypes gradually become outdated, making them less effective. To overcome these limitations, we propose a more efficient NECIL method that replaces prototypes with synthesized retrospective features for old classes. Specifically, we model each old class's feature space using a multivariate Gaussian distribution and generate deep representations by sampling from high-likelihood regions. Additionally, we introduce a similarity-based feature compensation mechanism that integrates generated old class features with similar new class features to synthesize robust retrospective representations. These retrospective features are then incorporated into our incremental learning framework to preserve the decision boundaries of previous classes while learning new ones. Extensive experiments on CIFAR-100, TinyImageNet, and ImageNet-Subset demonstrate that our method significantly improves the efficiency of non-exemplar class-incremental learning and achieves state-of-the-art performance.
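The Gaussian modeling step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the covariance regularizer `eps`, and the `keep_ratio` rule (keep the draws with the smallest Mahalanobis distance) are assumptions chosen for illustration, since the abstract does not specify how high-likelihood regions are selected.

```python
import numpy as np


def fit_class_gaussian(features, eps=1e-4):
    """Estimate mean and covariance of one class's features (rows = samples).

    eps is a hypothetical ridge term that keeps the covariance invertible.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, cov


def sample_high_likelihood(mean, cov, n_samples, keep_ratio=0.5, rng=None):
    """Draw Gaussian samples and keep those closest to the mean.

    "High likelihood" is approximated here by a small squared Mahalanobis
    distance; this selection rule is an assumption, not the paper's.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_draw = int(np.ceil(n_samples / keep_ratio))
    draws = rng.multivariate_normal(mean, cov, size=n_draw)
    diff = draws - mean
    # Squared Mahalanobis distance of every draw to the class mean.
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    keep = np.argsort(d2)[:n_samples]
    return draws[keep]
```

In such a setup only the per-class mean and covariance would need to be stored, and pseudo-features for an old class could be regenerated from them whenever a new task is learned.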
Abstract:Structured light (SL)-based 3D measurement techniques with deep learning have been widely studied, among which speckle projection profilometry (SPP) and fringe projection profilometry (FPP) are two popular methods. However, they generally use a single projection pattern for reconstruction, resulting in fringe order ambiguity or poor reconstruction accuracy. To alleviate these problems, we propose a parallel dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) that takes advantage of convolutional operations and self-attention mechanisms for processing different SL modalities. Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images. To fully integrate complementary features, we design a double-stream attention aggregation module (DAAM) that consists of a parallel attention subnetwork for aggregating multi-scale spatial structure information. This module can dynamically retain local and global representations to the maximum extent. Moreover, an adaptive mixture density head with a bimodal Gaussian distribution is proposed for learning a representation that is precise near discontinuities. Compared to the standard disparity regression strategy, this adaptive mixture head can effectively improve performance at object boundaries. Extensive experiments demonstrate that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset. We also show that the proposed architecture has potential for the infrared-visible image fusion task.
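To make the idea of a bimodal mixture density head more concrete, the sketch below shows why a two-component Gaussian output can stay sharp at a depth discontinuity where plain regression would blur. It is an illustration only: the parameter names (`pi`, `mu1`, `s1`, `mu2`, `s2`) and the rule of reporting the component with the higher peak density are assumptions, not details taken from the abstract.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def bimodal_nll(d, pi, mu1, s1, mu2, s2):
    """Negative log-likelihood of disparity d under a two-component mixture."""
    p = pi * normal_pdf(d, mu1, s1) + (1.0 - pi) * normal_pdf(d, mu2, s2)
    return -np.log(p + 1e-12)


def dominant_mode(pi, mu1, s1, mu2, s2):
    """Return the mean of the component with the higher peak density.

    The peak of a component is proportional to weight / sigma, so the
    common factor 1 / sqrt(2 * pi) can be dropped from the comparison.
    """
    return np.where(pi / s1 >= (1.0 - pi) / s2, mu1, mu2)
```

For a pixel that straddles a foreground surface at disparity 10 and a background at 30, with weights 0.7 and 0.3, the mixture mean is 16, a value that lies on neither surface, whereas `dominant_mode` returns 10.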
Abstract:Detecting players from sports broadcast videos is essential for intelligent event analysis. However, existing methods assume fixed player categories and cannot accommodate real-world scenarios where categories continue to evolve. Directly fine-tuning these methods on newly emerging categories also leads to catastrophic forgetting due to the non-stationary distribution. Inspired by recent research on incremental object detection (IOD), we propose a Refined Response Distillation (R^2D) method to effectively mitigate catastrophic forgetting for IOD tasks of the players. First, we design a progressive coarse-to-fine distillation region dividing scheme, separating high-value and low-value regions from classification and regression responses for precise and fine-grained regional knowledge distillation. Subsequently, a tailored refined distillation strategy is developed on regions with varying significance to address the performance limitations posed by pronounced feature homogeneity in the IOD tasks of the players. Furthermore, we present the NBA-IOD and Volleyball-IOD datasets as benchmarks and investigate the IOD tasks of the players systematically. Extensive experiments conducted on these benchmarks demonstrate that our method achieves state-of-the-art results. The code and datasets are available at https://github.com/beiyan1911/Players-IOD.
Abstract:Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short in boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We develop a novel loss function that is tailored to reflect boundary information and thereby enhance boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss that is infused with the statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component.
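The statistical ingredient named in the abstract above, a two-sample t-test between segmented and background pixels, can be sketched as follows. This is not the PTA loss itself, whose piece-wise form the abstract does not spell out. It only computes a Welch two-sample t statistic, and the helper `region_contrast` together with its arguments is hypothetical.

```python
import numpy as np


def welch_t_statistic(a, b):
    """Welch two-sample t statistic for two 1-D samples (unequal variances)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    var_a = a.var(ddof=1)  # unbiased sample variances
    var_b = b.var(ddof=1)
    return (a.mean() - b.mean()) / np.sqrt(var_a / a.size + var_b / b.size)


def region_contrast(image, mask):
    """t statistic between pixels inside and outside a boolean mask.

    A large absolute value means the mask separates two clearly
    different intensity populations.
    """
    mask = np.asarray(mask, dtype=bool)
    return welch_t_statistic(image[mask], image[~mask])
```

A loss could, under these assumptions, reward predicted masks whose inside and outside pixels near the boundary yield a large absolute t statistic. How the statistic is actually combined with a base loss is left open here.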
Abstract:Artificial Neural Networks (ANNs) were originally designed to function like a biological neural network, but does an ANN really work in the same way as a biological neural network? As we know, the human brain holds information in its memory cells, so if ANNs use the same model as our brains, they should store datasets in a similar manner. The most popular type of ANN architecture is based on a layered structure of neurons, whereas a human brain has trillions of complex interconnections of neurons, continuously establishing new connections, updating existing ones, and removing irrelevant connections across different parts of the brain. In this paper, we propose a novel approach to building ANNs that are truly inspired by the biological network, containing a mesh of subnets controlled by a central mechanism. A subnet is a network of neurons that hold the dataset values. We attempt to address the following fundamental questions: (1) What is the architecture of the ANN model, and is the layered architecture the most appropriate choice? (2) Is a neuron a process or a memory cell? (3) What is the best way of interconnecting neurons, and what weight-assignment mechanism should be used? (4) How can prior knowledge, bias, and generalizations be incorporated for feature extraction and prediction? Our proposed ANN architecture leverages the accuracy on textual data, and our experimental findings confirm the effectiveness of our model. We also contribute to the construction of an ANN model for storing and processing images.
Abstract:Vascular tracking of angiographic image sequences is one of the most clinically important tasks in the diagnostic assessment and interventional guidance of cardiac disease. However, this task can be challenging to accomplish because of unsatisfactory angiography image quality and complex vascular structures. Thus, this study proposed a new greedy graph search-based method for vascular tracking. Each vascular branch was separated from the vasculature and tracked independently. Then, all branches were combined using topology optimization, thereby resulting in complete vasculature tracking. A gray-based image registration method was applied to determine the tracking range, and the deformation field between two consecutive frames was calculated. The vascular branch was described using a vascular centerline extraction method with multi-probability fusion-based topology optimization. An undirected acyclic graph establishment technique was introduced. A greedy search method was proposed to acquire all possible paths in the graph that might match the tracked vascular branch. The final tracking result was selected by branch matching using dynamic time warping with a DAISY descriptor. The solution to the problem reflected both the spatial and textural information between successive frames. Experimental results demonstrated that the proposed method was effective and robust for vascular tracking, attaining an F1 score of 0.89 on a single-branch dataset and 0.88 on a vessel tree dataset. This approach provided a universal solution to address the problem of filamentary structure tracking.
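The branch-matching step in the abstract above relies on dynamic time warping (DTW). A minimal sketch of classic DTW between two sequences of per-point descriptors is given below. It assumes each centerline point is represented by a feature vector (for instance a DAISY descriptor, which is not computed here) and uses the Euclidean distance as local cost; both choices are illustrative assumptions rather than details from the paper.

```python
import numpy as np


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping cost between two sequences.

    Each sequence is an array of shape (length, dim); 1-D inputs are
    treated as sequences of scalars.
    """
    a = np.asarray(seq_a, dtype=float)
    b = np.asarray(seq_b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # cost[i, j] = best alignment cost of a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # skip a point in b
                                     cost[i, j - 1],      # skip a point in a
                                     cost[i - 1, j - 1])  # match both points
    return cost[n, m]
```

Under these assumptions, the candidate path from the graph search with the smallest `dtw_distance` to the branch tracked in the previous frame would be chosen as the match.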