Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kuangyi Chen

ICG-MVSNet: Learning Intra-view and Cross-view Relationships for Guidance in Multi-View Stereo

Mar 27, 2025

Yuxi Hu, Jun Zhang, Zhe Zhang, Rafael Weilharter, Yuchen Rao, Kuangyi Chen, Runze Yuan, Friedrich Fraundorfer

Abstract:Multi-view Stereo (MVS) aims to estimate depth and reconstruct 3D point clouds from a series of overlapping images. Recent learning-based MVS frameworks overlook the geometric information embedded in features and correlations, leading to weak cost matching. In this paper, we propose ICG-MVSNet, which explicitly integrates intra-view and cross-view relationships for depth estimation. Specifically, we develop an intra-view feature fusion module that leverages the feature coordinate correlations within a single image to enhance robust cost matching. Additionally, we introduce a lightweight cross-view aggregation module that efficiently utilizes the contextual information from volume correlations to guide regularization. Our method is evaluated on the DTU dataset and Tanks and Temples benchmark, consistently achieving competitive performance against state-of-the-art works, while requiring lower computational resources.

Via

Access Paper or Ask Questions

PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement

Oct 29, 2024

Shutong Jin, Ruiyu Wang, Kuangyi Chen, Florian T. Pokorny

Figure 1 for PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement

Figure 2 for PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement

Figure 3 for PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement

Figure 4 for PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement

Abstract:Scene rearrangement, like table tidying, is a challenging task in robotic manipulation due to the complexity of predicting diverse object arrangements. Web-scale trained generative models such as Stable Diffusion can aid by generating natural scenes as goals. To facilitate robot execution, object-level representations must be extracted to match the real scenes with the generated goals and to calculate object pose transformations. Current methods typically use a multi-step design that involves separate models for generation, segmentation, and feature encoding, which can lead to a low success rate due to error accumulation. Furthermore, they lack control over the viewing perspectives of the generated goals, restricting the tasks to 3-DoF settings. In this paper, we propose PACA, a zero-shot pipeline for scene rearrangement that leverages perspective-aware cross-attention representation derived from Stable Diffusion. Specifically, we develop a representation that integrates generation, segmentation, and feature encoding into a single step to produce object-level representations. Additionally, we introduce perspective control, thus enabling the matching of 6-DoF camera views and extending past approaches that were limited to 3-DoF top-down views. The efficacy of our method is demonstrated through its zero-shot performance in real robot experiments across various scenes, achieving an average matching accuracy and execution success rate of 87% and 67%, respectively.

* Accepted by WACV2025

Via

Access Paper or Ask Questions

FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events

Mar 18, 2024

Xiangyuan Wang, Kuangyi Chen, Wen Yang, Lei Yu, Yannan Xing, Huai Yu

Abstract:Keypoint detection and tracking in traditional image frames are often compromised by image quality issues such as motion blur and extreme lighting conditions. Event cameras offer potential solutions to these challenges by virtue of their high temporal resolution and high dynamic range. However, they have limited performance in practical applications due to their inherent noise in event data. This paper advocates fusing the complementary information from image frames and event streams to achieve more robust keypoint detection and tracking. Specifically, we propose a novel keypoint detection network that fuses the textural and structural information from image frames with the high-temporal-resolution motion information from event streams, namely FE-DeTr. The network leverages a temporal response consistency for supervision, ensuring stable and efficient keypoint detection. Moreover, we use a spatio-temporal nearest-neighbor search strategy for robust keypoint tracking. Extensive experiments are conducted on a new dataset featuring both image frames and event data captured under extreme conditions. The experimental results confirm the superior performance of our method over both existing frame-based and event-based methods.

* 7 pages, Accepted by ICRA 2024

Via

Access Paper or Ask Questions

2D-3D Pose Tracking with Multi-View Constraints

Sep 20, 2023

Huai Yu, Kuangyi Chen, Wen Yang, Sebastian Scherer, Gui-Song Xia

Abstract:Camera localization in 3D LiDAR maps has gained increasing attention due to its promising ability to handle complex scenarios, surpassing the limitations of visual-only localization methods. However, existing methods mostly focus on addressing the cross-modal gaps, estimating camera poses frame by frame without considering the relationship between adjacent frames, which makes the pose tracking unstable. To alleviate this, we propose to couple the 2D-3D correspondences between adjacent frames using the 2D-2D feature matching, establishing the multi-view geometrical constraints for simultaneously estimating multiple camera poses. Specifically, we propose a new 2D-3D pose tracking framework, which consists: a front-end hybrid flow estimation network for consecutive frames and a back-end pose optimization module. We further design a cross-modal consistency-based loss to incorporate the multi-view constraints during the training and inference process. We evaluate our proposed framework on the KITTI and Argoverse datasets. Experimental results demonstrate its superior performance compared to existing frame-by-frame 2D-3D pose tracking methods and state-of-the-art vision-only pose tracking algorithms. More online pose tracking videos are available at \url{https://youtu.be/yfBRdg7gw5M}

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions