School of Intelligent Systems Engineering, Sun Yat-sen University
Abstract:In this paper, we propose to create animatable avatars for interacting hands with 3D Gaussian Splatting (GS) and single-image inputs. Existing GS-based methods designed for single subjects often yield unsatisfactory results due to limited input views, diverse hand poses, and occlusions. To address these challenges, we introduce a novel two-stage interaction-aware GS framework that exploits cross-subject hand priors and refines 3D Gaussians in interacting areas. In particular, to handle hand variations, we disentangle the 3D representation of hands into optimization-based identity maps and learning-based latent geometric features and neural texture maps. Learning-based features are captured by trained networks to provide reliable priors for poses, shapes, and textures, while optimization-based identity maps enable efficient one-shot fitting of out-of-distribution hands. Furthermore, we devise an interaction-aware attention module and a self-adaptive Gaussian refinement module. These modules enhance image rendering quality in areas with intra- and inter-hand interactions, overcoming the limitations of existing GS-based methods. Our method is validated via extensive experiments on the large-scale InterHand2.6M dataset, and it significantly improves the state-of-the-art performance in image quality. Project Page: \url{https://github.com/XuanHuang0/GuassianHand}.
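To make the optimization/learning split concrete, here is a minimal sketch (not the authors' code; all module names, shapes, and the fitting loop are illustrative assumptions): a trained prior network stays frozen while a per-subject identity map is fitted by gradient descent, mirroring one-shot fitting of an out-of-distribution hand.

```python
import torch
import torch.nn as nn

class SharedPrior(nn.Module):
    """Stand-in for the trained networks that supply pose/shape/texture priors."""
    def __init__(self, id_dim=32, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(id_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

    def forward(self, identity_map):
        return self.net(identity_map)

prior = SharedPrior()
for p in prior.parameters():          # the learning-based prior stays frozen at fitting time
    p.requires_grad_(False)

identity_map = torch.zeros(1, 32, requires_grad=True)   # optimization-based, per subject
opt = torch.optim.Adam([identity_map], lr=1e-2)

target = torch.randn(1, 64)           # placeholder for image-derived supervision
for _ in range(100):                  # one-shot fitting loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(prior(identity_map), target)
    loss.backward()
    opt.step()
```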
Abstract:Lesion segmentation on endoscopic images is challenging due to complex and ambiguous lesion features. Fully-supervised deep learning segmentation methods achieve good performance on entirely pixel-level labeled datasets but greatly increase experts' labeling burden. Semi-supervised and weakly supervised methods can ease the labeling burden, but they substantially increase the learning difficulty. To alleviate this difficulty, weakly semi-supervised segmentation adopts a new annotation protocol that adds a large number of point-annotated samples to a few pixel-level annotated samples. However, existing methods only mine the points' limited information while ignoring the reliable prior in the pixels surrounding the point annotations. In this paper, we propose a weakly semi-supervised method called the Point-Neighborhood Learning (PNL) framework. To mine the prior of the pixels surrounding an annotated point, we transform a single-point annotation into a circular area named a point-neighborhood. We propose a point-neighborhood supervision loss and a pseudo-label scoring mechanism to strengthen training supervision, and point-neighborhoods are also used to augment data diversity. Our method greatly improves performance without changing the structure of the segmentation network. Comprehensive experiments show the superiority of our method over existing methods, demonstrating its effectiveness on point-annotated medical images. The project code will be available at: https://github.com/ParryJay/PNL.
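As a concrete illustration of the core transformation, the sketch below turns a single point annotation into a circular point-neighborhood mask; the radius and the masked-loss usage are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def point_neighborhood(h, w, cy, cx, radius):
    """Binary disk of the given radius centered at the annotated pixel (cy, cx)."""
    ys, xs = np.ogrid[:h, :w]
    return ((ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2).astype(np.float32)

mask = point_neighborhood(256, 256, cy=120, cx=80, radius=10)
# The mask can then supervise predictions inside the neighborhood, e.g. by
# restricting a binary cross-entropy loss to the pixels it covers.
```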
Abstract:Diffusion is a commonly used technique for spreading information from point to point on a graph, yet the rationale behind it is not clear. The multi-type Galton-Watson forest, meanwhile, is a random model of population growth without space or any other resource constraints. In this paper, we use the degenerate multi-type Galton-Watson forest (MGWF) to interpret the diffusion process and establish an equivalence between them. With the two-phase setting of the MGWF, one can interpret both the diffusion process and the Google PageRank system explicitly. The setting also improves the convergence behaviour of the iterative diffusion process and the Google PageRank system. We validate the proposal by experiments and point out new research directions.
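For readers unfamiliar with the iterative process being interpreted, here is a standard power-iteration PageRank (a textbook formulation, included only to make the diffusion recursion concrete; it is not the MGWF construction itself).

```python
import numpy as np

def pagerank(A, alpha=0.85, tol=1e-10):
    """A: adjacency matrix; returns the stationary PageRank vector."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    P = np.divide(A, out, where=out > 0)          # row-stochastic transition matrix
    P[out.ravel() == 0] = 1.0 / n                 # dangling nodes jump uniformly
    r = np.full(n, 1.0 / n)
    while True:
        r_new = alpha * r @ P + (1 - alpha) / n   # diffuse, then teleport
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

A = np.array([[0, 1, 1], [0, 0, 1], [1, 0, 0]], dtype=float)
print(pagerank(A))
```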
Abstract:Based on the rectangle theory of formal concepts and set-covering theory, concept reduction preserving binary relations is investigated in this paper. It is known that there are three types of formal concepts: core concepts, relative necessary concepts, and unnecessary concepts. First, we present new judgment results for relative necessary concepts and unnecessary concepts. Second, we derive bounds for both the maximum number of relative necessary concepts and the maximum number of unnecessary concepts; this is a difficult problem because, in both concept reduction preserving binary relations and attribute reduction of decision formal contexts, computing formal contexts from formal concepts is challenging. Third, based on the rectangle theory of formal concepts, we propose a fast algorithm for reducing attributes while preserving the extensions of a set of formal concepts, using an extension bit-array technique that allows multiple context cells to be processed by a single 32-bit or 64-bit operation. Technically, the new algorithm stores both the formal context and the extent of a concept as bit-arrays, so the set operations "or" and "and" can be carried out with bit-operations. A further merit is that the new algorithm does not need to consider other concepts in the concept lattice, making it easy to understand and fast. Experiments demonstrate that the new algorithm is effective in computing attribute reductions.
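The bit-array idea can be illustrated in a few lines (a toy illustration using Python integers as bit-arrays; the paper's actual 32/64-bit packing and reduction algorithm are more elaborate): once extents are encoded as machine words, set intersection and union each become a single "and"/"or".

```python
objects = ["g1", "g2", "g3", "g4"]   # object i maps to bit i

def to_bits(extent):
    """Encode a set of object indices as one bit-array."""
    bits = 0
    for i in extent:
        bits |= 1 << i
    return bits

e1 = to_bits({0, 1, 3})      # extent {g1, g2, g4}
e2 = to_bits({1, 2, 3})      # extent {g2, g3, g4}
common = e1 & e2             # set intersection as a single "and"
union  = e1 | e2             # set union as a single "or"
print(bin(common), bin(union))   # 0b1010, 0b1111
```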
Abstract:Concise descriptions for describable granules and approximate description methods for indescribable granules are challenging and important issues in granular computing. The concept with only common attributes has been studied frequently. To investigate granules with special requirements, we propose two new types of compound concepts in this paper: the bipolar concept and the common-and-necessary concept. Based on the definitions of concept-forming operations, logical formulas are derived for each of the following types of concepts: formal concepts, three-way concepts, object-oriented concepts, bipolar concepts, and common-and-necessary concepts. Furthermore, by utilizing the logical relationships among the various concepts, we derive concise and unified equivalent conditions for describable granules and approximate description methods for indescribable granules for all five kinds of concepts.
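For context, the standard concept-forming (derivation) operators on a formal context $(G, M, I)$ are recalled below; the compound concepts above are built from combinations of such operators (the definitions of the bipolar and common-and-necessary variants are in the paper itself).

```latex
% Standard derivation operators of Formal Concept Analysis on a context (G, M, I).
\[
A^{\uparrow} = \{\, m \in M \mid \forall g \in A,\ (g, m) \in I \,\}, \qquad
B^{\downarrow} = \{\, g \in G \mid \forall m \in B,\ (g, m) \in I \,\}.
\]
% A pair (A, B) with A^{\uparrow} = B and B^{\downarrow} = A is a formal concept.
```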
Abstract:The emergence of Formal Concept Analysis (FCA) as a data analysis technique has increased the need for algorithms that can compute formal concepts quickly. The current efficient algorithms for FCA are variants of the Close-By-One (CbO) algorithm, such as In-Close2, In-Close3 and In-Close4, all of which are based on horizontal storage of contexts. In this paper, building on In-Close4, we propose a new algorithm based on vertical storage of contexts, called In-Close5, which significantly reduces both the time complexity and the space complexity of In-Close4. Technically, the new algorithm stores both the context and the extent of a concept as vertical bit-arrays, whereas In-Close4 stores the context only as a horizontal bit-array, which is very slow when finding the intersection of two extent sets. Experimental results demonstrate that the proposed algorithm is much more effective than In-Close4, and it also has a broader scope of applicability in computing formal concepts, solving problems that In-Close4 cannot.
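The vertical-storage advantage can be sketched as follows (a toy illustration, not In-Close5 itself): packing each attribute's column of objects into one bit-array makes extent intersection a single "and" per attribute, instead of a scan over every object row.

```python
context_rows = [            # objects x attributes (horizontal storage)
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]

# Vertical storage: for each attribute, pack the objects that have it.
columns = []
for j in range(3):
    bits = 0
    for i, row in enumerate(context_rows):
        if row[j]:
            bits |= 1 << i
    columns.append(bits)

extent = (1 << 3) - 1        # start from all objects: 0b111
for j in (0, 2):             # intersect with attributes m1 and m3
    extent &= columns[j]
print(bin(extent))           # 0b1 -> only the first object has both attributes
```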
Abstract:Sleep staging plays an important role in the diagnosis of sleep disorders. In general, experts classify sleep stages manually based on polysomnography (PSG), which is quite time-consuming. Meanwhile, the acquisition of multiple signals is complex, which can disturb the subject's sleep. Therefore, automatic sleep staging from single-channel electroencephalogram (EEG) has become a popular research topic. In the literature, a large number of sleep staging methods based on single-channel EEG have been proposed with promising results, achieving a preliminary automation of sleep staging. However, the performance of most of these methods in the N1 stage does not satisfy the needs of diagnosis. In this paper, we propose a deep learning model, the multi-scale dual attention network (MSDAN), based on raw EEG. It uses multi-scale convolution to extract features from the different waveforms contained in the EEG signal, connects channel attention and spatial attention mechanisms in series to filter and highlight key information, and applies soft thresholding to remove redundant information. Experiments were conducted on two datasets with 5-fold cross-validation and hold-out validation. The final average accuracy, overall accuracy, macro F1 score, and Cohen's Kappa coefficient of the model reach 96.70%, 91.74%, 0.8231, and 0.8723 on the Sleep-EDF dataset, and 96.14%, 90.35%, 0.7945, and 0.8284 on the Sleep-EDFx dataset. Notably, our model performs particularly well in the N1 stage, with F1 scores of 54.41% and 52.79% on the two datasets, respectively. The results show the superiority of our network over existing methods, reaching a new state of the art, especially in the N1 sleep stage.
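A schematic sketch of the three ingredients named above follows; the kernel sizes, channel widths, attention forms, and the threshold value are assumptions for illustration, not MSDAN's actual configuration.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    def __init__(self, in_ch=1, out_ch=16):
        super().__init__()
        # Parallel kernels capture waveforms at different temporal scales.
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in (7, 31, 63)
        )
        self.channel_attn = nn.Sequential(   # squeeze-and-excitation-style channel attention
            nn.AdaptiveAvgPool1d(1), nn.Conv1d(3 * out_ch, 3 * out_ch, 1), nn.Sigmoid()
        )
        self.spatial_attn = nn.Sequential(   # attention over the time axis
            nn.Conv1d(3 * out_ch, 1, 7, padding=3), nn.Sigmoid()
        )

    def forward(self, x):
        h = torch.cat([b(x) for b in self.branches], dim=1)
        h = h * self.channel_attn(h)         # channel attention first
        h = h * self.spatial_attn(h)         # then spatial attention, in series
        tau = 0.1                            # soft threshold (illustrative value)
        return torch.sign(h) * torch.relu(h.abs() - tau)

out = MultiScaleBlock()(torch.randn(2, 1, 3000))   # (batch, channel, samples)
print(out.shape)
```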
Abstract:Modern CNN-based object detectors focus on feature configuration during training but often ignore feature optimization during inference. In this paper, we propose a new feature optimization approach that enhances features and suppresses background noise in both the training and inference stages. We introduce a generic Inference-aware Feature Filtering (IFF) module that can easily be combined with modern detectors, resulting in our iffDetector. Unlike conventional open-loop feature calculation approaches without feedback, the IFF module performs closed-loop optimization by leveraging high-level semantics to enhance the convolutional features. Using Fourier-transform analysis, we show that the IFF module acts as negative feedback, theoretically guaranteeing the stability of feature learning. IFF can be fused with CNN-based object detectors in a plug-and-play manner with negligible computational overhead. Experiments on the PASCAL VOC and MS COCO datasets demonstrate that our iffDetector consistently outperforms state-of-the-art methods by significant margins\footnote{The test code and model are anonymously available at https://github.com/anonymous2020new/iffDetector }.
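The closed-loop idea can be sketched as below (module names, the sigmoid gating form, and the step count are assumptions, not the actual IFF module): high-level class responses are fed back to gate the convolutional features that produced them.

```python
import torch
import torch.nn as nn

class FeedbackFilter(nn.Module):
    def __init__(self, feat_ch=256, num_classes=20):
        super().__init__()
        self.head = nn.Conv2d(feat_ch, num_classes, 1)   # features -> high-level semantics
        self.gate = nn.Conv2d(num_classes, feat_ch, 1)   # feedback path

    def forward(self, feats, steps=2):
        for _ in range(steps):                   # closed-loop refinement
            sem = self.head(feats)               # forward pass: semantics
            g = torch.sigmoid(self.gate(sem))    # feedback: semantics -> gate
            feats = feats * g                    # suppress background responses
        return feats

refined = FeedbackFilter()(torch.randn(1, 256, 32, 32))
print(refined.shape)
```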
Abstract:Preserving the contour topology during image segmentation is useful in many practical scenarios. By keeping the contours isomorphic, it is possible to prevent over-segmentation and under-segmentation, as well as to adhere to given topologies. The self-repelling snake model (SR) is a variational model that preserves contour topology by combining a non-local repulsion term with the geodesic active contour model (GAC). The SR is traditionally solved using the additive operator splitting (AOS) scheme. Although this solution is stable, the memory requirement grows quickly as the image size increases. In our paper, we propose an alternative solution to the SR using the Split Bregman method. Our algorithm breaks the problem down into simpler subproblems to use lower-order evolution equations and approximation schemes. The memory usage is significantly reduced as a result. Experiments show comparable performance to the original algorithm with shorter iteration times.
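For reference, the generic Split Bregman template (the standard Goldstein-Osher form, not the paper's specific SR subproblems) is recalled below: a constraint $d = \Phi(u)$ splits the problem, and a Bregman variable $b$ enforces it iteratively.

```latex
% Generic Split Bregman iteration for \min_u |\Phi(u)| + H(u), with splitting d = \Phi(u).
\[
(u^{k+1}, d^{k+1}) = \arg\min_{u, d}\; |d| + H(u)
  + \frac{\lambda}{2}\,\bigl\| d - \Phi(u) - b^{k} \bigr\|_2^2,
\qquad
b^{k+1} = b^{k} + \Phi(u^{k+1}) - d^{k+1}.
\]
```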
Abstract:Existing graph convolutional networks focus on the neighborhood aggregation scheme. When applied to semi-supervised learning, they often suffer from overfitting because the networks are trained with a cross-entropy loss on a small portion of labeled data. In this paper, we propose an unsupervised manifold smoothness loss defined with respect to the graph structure, which can be added to the loss function as a regularizer. We draw connections between the proposed loss and an iterative diffusion process, and show that minimizing the loss is equivalent to aggregating neighbor predictions with infinitely many layers. We conduct experiments on multi-layer perceptrons and existing graph networks, and demonstrate that adding the proposed loss improves performance consistently.
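A common graph-smoothness regularizer of this flavor is sketched below (the paper's exact loss may differ): it penalizes prediction disagreement across the edges of the graph and is simply added to the supervised cross-entropy term.

```python
import torch

def manifold_smoothness(pred, edges, weights):
    """pred: (N, C) predictions; edges: (E, 2) index pairs; weights: (E,)."""
    diff = pred[edges[:, 0]] - pred[edges[:, 1]]
    return (weights * diff.pow(2).sum(dim=1)).sum()

pred = torch.randn(5, 3, requires_grad=True)
edges = torch.tensor([[0, 1], [1, 2], [3, 4]])
w = torch.ones(3)
loss = manifold_smoothness(pred, edges, w)   # add to cross-entropy as a regularizer
loss.backward()
```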