Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lichun Wang

Ensemble Predicate Decoding for Unbiased Scene Graph Generation

Aug 26, 2024

Jiasong Feng, Lichun Wang, Hongbo Xu, Kai Xu, Baocai Yin

Abstract:Scene Graph Generation (SGG) aims to generate a comprehensive graphical representation that accurately captures the semantic information of a given scenario. However, the SGG model's performance in predicting more fine-grained predicates is hindered by a significant predicate bias. According to existing works, the long-tail distribution of predicates in training data results in the biased scene graph. However, the semantic overlap between predicate categories makes predicate prediction difficult, and there is a significant difference in the sample size of semantically similar predicates, making the predicate prediction more difficult. Therefore, higher requirements are placed on the discriminative ability of the model. In order to address this problem, this paper proposes Ensemble Predicate Decoding (EPD), which employs multiple decoders to attain unbiased scene graph generation. Two auxiliary decoders trained on lower-frequency predicates are used to improve the discriminative ability of the model. Extensive experiments are conducted on the VG, and the experiment results show that EPD enhances the model's representation capability for predicates. In addition, we find that our approach ensures a relatively superior predictive capability for more frequent predicates compared to previous unbiased SGG methods.

Via

Access Paper or Ask Questions

Real-time Human Action Recognition Using Locally Aggregated Kinematic-Guided Skeletonlet and Supervised Hashing-by-Analysis Model

May 24, 2021

Bin Sun, Dehui Kong, Shaofan Wang, Lichun Wang, Baocai Yin

Figure 1 for Real-time Human Action Recognition Using Locally Aggregated Kinematic-Guided Skeletonlet and Supervised Hashing-by-Analysis Model

Figure 2 for Real-time Human Action Recognition Using Locally Aggregated Kinematic-Guided Skeletonlet and Supervised Hashing-by-Analysis Model

Figure 3 for Real-time Human Action Recognition Using Locally Aggregated Kinematic-Guided Skeletonlet and Supervised Hashing-by-Analysis Model

Figure 4 for Real-time Human Action Recognition Using Locally Aggregated Kinematic-Guided Skeletonlet and Supervised Hashing-by-Analysis Model

Abstract:3D action recognition is referred to as the classification of action sequences which consist of 3D skeleton joints. While many research work are devoted to 3D action recognition, it mainly suffers from three problems: highly complicated articulation, a great amount of noise, and a low implementation efficiency. To tackle all these problems, we propose a real-time 3D action recognition framework by integrating the locally aggregated kinematic-guided skeletonlet (LAKS) with a supervised hashing-by-analysis (SHA) model. We first define the skeletonlet as a few combinations of joint offsets grouped in terms of kinematic principle, and then represent an action sequence using LAKS, which consists of a denoising phase and a locally aggregating phase. The denoising phase detects the noisy action data and adjust it by replacing all the features within it with the features of the corresponding previous frame, while the locally aggregating phase sums the difference between an offset feature of the skeletonlet and its cluster center together over all the offset features of the sequence. Finally, the SHA model which combines sparse representation with a hashing model, aiming at promoting the recognition accuracy while maintaining a high efficiency. Experimental results on MSRAction3D, UTKinectAction3D and Florence3DAction datasets demonstrate that the proposed method outperforms state-of-the-art methods in both recognition accuracy and implementation efficiency.

* CYB-R1,12 pages with 9 figures

Via

Access Paper or Ask Questions

Feature Fusion Use Unsupervised Prior Knowledge to Let Small Object Represent

Dec 17, 2019

Tian Liu, Lichun Wang, Shaofan Wang

Figure 1 for Feature Fusion Use Unsupervised Prior Knowledge to Let Small Object Represent

Figure 2 for Feature Fusion Use Unsupervised Prior Knowledge to Let Small Object Represent

Figure 3 for Feature Fusion Use Unsupervised Prior Knowledge to Let Small Object Represent

Figure 4 for Feature Fusion Use Unsupervised Prior Knowledge to Let Small Object Represent

Abstract:Fusing low level and high level features is a widely used strategy to provide details that might be missing during convolution and pooling. Different from previous works, we propose a new fusion mechanism called FillIn which takes advantage of prior knowledge described with superpixel segmentation. According to the prior knowledge, the FillIn chooses small region on low level feature map to fill into high level feature map. By using the proposed fusion mechanism, the low level features have equal channels for some tiny region as high level features, which makes the low level features have relatively independent power to decide final semantic label. We demonstrate the effectiveness of our model on PASCAL VOC 2012, it achieves competitive test result based on DeepLabv3+ backbone and visualizations of predictions prove our fusion can let small objects represent and low level features have potential for segmenting small objects.

Via

Access Paper or Ask Questions