Xiuli Shao

AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction

Mar 17, 2025

Xuying Zhang, Yupeng Zhou, Kai Wang, Yikai Wang, Zhen Li, Xiuli Shao, Daquan Zhou, Qibin Hou, Ming-Ming Cheng

Abstract: Novel view synthesis (NVS) is a cornerstone of image-to-3D creation. However, existing works still struggle to maintain consistency between the generated views and the input views, especially when there is a significant camera pose difference, leading to poor-quality 3D geometries and textures. We attribute this issue to their treatment of all target views with equal priority, based on our empirical observation that the target views closer to the input views exhibit higher fidelity. With this inspiration, we propose AR-1-to-3, a novel next-view prediction paradigm based on diffusion models that first generates views close to the input views, which are then utilized as contextual information to progressively synthesize farther views. To encode the generated view subsequences as local and global conditions for the next-view prediction, we develop a stacked local feature encoding strategy (Stacked-LE) and an LSTM-based global feature encoding strategy (LSTM-GE). Extensive experiments demonstrate that our method significantly improves the consistency between the generated views and the input views, producing high-fidelity 3D assets.
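
The near-to-far ordering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes camera poses are reduced to a single azimuth angle in degrees, and `predict_next_view` is a hypothetical stand-in for the diffusion model, which the abstract does not specify at this level. The names `angular_distance`, `next_view_order`, and `generate_progressively` are likewise illustrative.

```python
from typing import Any, Callable, List, Sequence, Tuple

View = Tuple[float, Any]  # (azimuth in degrees, image-like payload)


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def next_view_order(input_azimuth: float,
                    target_azimuths: Sequence[float]) -> List[float]:
    """Order target views from nearest to farthest from the input view.

    Assumption: closeness is measured by azimuth only.
    """
    return sorted(target_azimuths,
                  key=lambda t: angular_distance(input_azimuth, t))


def generate_progressively(
    input_view: Any,
    input_azimuth: float,
    target_azimuths: Sequence[float],
    predict_next_view: Callable[[List[View], float], Any],
) -> List[View]:
    """Generate views one at a time, feeding earlier results back as context.

    `predict_next_view` is a hypothetical callable standing in for the
    generative model; it receives all views produced so far.
    """
    context: List[View] = [(input_azimuth, input_view)]
    for azimuth in next_view_order(input_azimuth, target_azimuths):
        view = predict_next_view(context, azimuth)
        context.append((azimuth, view))
    return context[1:]  # drop the input view, keep generated ones


def stub_predictor(context: List[View], azimuth: float) -> int:
    """Dummy predictor: returns the number of context views it saw."""
    return len(context)


order = next_view_order(0.0, [180.0, 30.0, 330.0, 90.0, 270.0])
views = generate_progressively("input", 0.0,
                               [180.0, 30.0, 330.0, 90.0, 270.0],
                               stub_predictor)
```

With the stub predictor, each generated view records how many earlier views were available as context, so the count grows by one at every step and the farthest view (180 degrees) is produced last.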

GroupTransNet: Group Transformer Network for RGB-D Salient Object Detection

Mar 21, 2022

Xian Fang, Jinshao Zhu, Xiuli Shao, Hongpeng Wang

Abstract: Salient object detection on RGB-D images is an active topic in computer vision. Although existing methods have achieved appreciable performance, some challenges remain. The locality of convolutional neural networks requires the model to have a sufficiently deep global receptive field, which always leads to the loss of local details. To address this challenge, we propose a novel Group Transformer Network (GroupTransNet) for RGB-D salient object detection. The method is good at learning the long-range dependencies of cross-layer features to promote better feature expression. First, the features of the slightly higher classes of the middle three levels and the latter three levels are soft-grouped to absorb the advantages of the high-level features. The input features are repeatedly purified and enhanced by the attention mechanism to refine the cross-modal features of the color and depth modalities. The features of the intermediate process are first fused across different layers and then processed by several transformers in multiple groups, which not only makes the features of each scale unified in size and interrelated, but also shares weights among the features within a group. The output features of the different groups complete the clustering staggered by two owing to the level difference, and are combined with the low-level features. Extensive experiments demonstrate that GroupTransNet outperforms the comparison models and achieves new state-of-the-art performance.

LC3Net: Ladder context correlation complementary network for salient object detection

Oct 21, 2021

Xian Fang, Jinchao Zhu, Xiuli Shao, Hongpeng Wang

Abstract: Existing salient object detection methods based on convolutional neural networks commonly construct discriminative networks to aggregate high-level and low-level features. However, contextual information is often not fully and reasonably utilized, which usually causes either the absence of useful features or contamination by redundant features. To address these issues, we propose a novel ladder context correlation complementary network (LC3Net), which is equipped with three crucial components. First, we propose a filterable convolution block (FCB), which is simple yet practical, to assist the automatic collection of information on the diversity of initial features. Second, we propose a dense cross module (DCM) to facilitate the close aggregation of different levels of features by effectively integrating semantic information and detailed information from both adjacent and non-adjacent layers. Third, we propose a bidirectional compression decoder (BCD) to help the progressive shrinkage of multi-scale features from coarse to fine by leveraging multiple pairs of alternating top-down and bottom-up feature interaction flows. Extensive experiments demonstrate the superiority of our method against 16 state-of-the-art methods.

M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient Object Detection

Sep 16, 2021

Xian Fang, Jinchao Zhu, Ruixun Zhang, Xiuli Shao, Hongpeng Wang

Abstract: Salient object detection is a fundamental topic in computer vision. Previous methods based on RGB-D often suffer from the incompatibility of multi-modal feature fusion and the insufficiency of multi-scale feature aggregation. To tackle these two dilemmas, we propose a novel multi-modal and multi-scale refined network (M2RNet). Three essential components are presented in this network. The nested dual attention module (NDAM) explicitly exploits the combined features of the RGB and depth flows. The adjacent interactive aggregation module (AIAM) gradually integrates the neighboring features of the high, middle and low levels. The joint hybrid optimization loss (JHOL) gives the predictions a prominent outline. Extensive experiments demonstrate that our method outperforms other state-of-the-art approaches.
