Yinghan Sun

A Practical Introduction to Deep Reinforcement Learning

May 13, 2025

Yinghan Sun, Hongxi Wang, Hua Chen, Wei Zhang

Abstract: Deep reinforcement learning (DRL) has emerged as a powerful framework for solving sequential decision-making problems, achieving remarkable success in a wide range of applications, including game AI, autonomous driving, biomedicine, and large language models. However, the diversity of algorithms and the complexity of theoretical foundations often pose significant challenges for beginners seeking to enter the field. This tutorial aims to provide a concise, intuitive, and practical introduction to DRL, with a particular focus on the Proximal Policy Optimization (PPO) algorithm, which is one of the most widely used and effective DRL methods. To facilitate learning, we organize all algorithms under the Generalized Policy Iteration (GPI) framework, offering readers a unified and systematic perspective. Instead of lengthy theoretical proofs, we emphasize intuitive explanations, illustrative examples, and practical engineering techniques. This work serves as an efficient and accessible guide, helping readers rapidly progress from basic concepts to the implementation of advanced DRL algorithms.
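
Since the abstract above centers on PPO, a minimal sketch of the clipped surrogate objective that PPO is generally known for may help orient readers. This is not code from the tutorial; the function name `ppo_clip_objective`, the plain-list inputs, and the default `epsilon=0.2` are illustrative assumptions.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    ratios[i]     -- probability ratio pi_new(a|s) / pi_old(a|s) for sample i
    advantages[i] -- advantage estimate for sample i
    epsilon       -- clipping range (0.2 is an assumed, commonly quoted default)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)


if __name__ == "__main__":
    # A large ratio with a positive advantage is capped at (1 + epsilon) * advantage.
    print(ppo_clip_objective([1.5], [1.0]))
```

In training, this quantity would be maximized with respect to the new policy's parameters; the clipping removes any incentive to move the ratio far from 1 in a single update.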

GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement

Apr 17, 2024

Linfang Zheng, Tze Ho Elden Tse, Chen Wang, Yinghan Sun, Hua Chen, Ales Leonardis, Wei Zhang

Abstract: Object pose refinement is essential for robust object pose estimation. Previous work has made significant progress towards instance-level object pose refinement. Yet, category-level pose refinement is a more challenging problem due to large shape variations within a category and the discrepancies between the target object and the shape prior. To address these challenges, we introduce a novel architecture for category-level object pose refinement. Our approach integrates an HS-layer and learnable affine transformations, which aim to enhance the extraction and alignment of geometric information. Additionally, we introduce a cross-cloud transformation mechanism that efficiently merges diverse data sources. Finally, we push the limits of our model by incorporating the shape prior information for translation and size error prediction. Extensive quantitative experiments demonstrate the effectiveness of the proposed framework, showing a significant improvement over the baseline method across all metrics.

* The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

Multi-Resolution Planar Region Extraction for Uneven Terrains

Nov 21, 2023

Yinghan Sun, Linfang Zheng, Hua Chen, Wei Zhang

Abstract: This paper studies the problem of extracting planar regions in uneven terrains from unordered point cloud measurements. Such a problem is critical in various robotic applications such as robotic perceptive locomotion. While existing approaches have shown promising results in effectively extracting planar regions from the environment, they often suffer from issues such as low computational efficiency or loss of resolution. To address these issues, we propose a multi-resolution planar region extraction strategy that balances boundary accuracy and computational efficiency. Our method begins with a pointwise classification preprocessing module, which categorizes all sampled points according to their local geometric properties to facilitate multi-resolution segmentation. Subsequently, we arrange the categorized points using an octree, followed by an in-depth analysis of nodes to complete the multi-resolution plane segmentation. The efficiency and robustness of the proposed approach are verified via synthetic and real-world experiments, demonstrating our method's ability to generalize effectively across various uneven terrains while maintaining real-time performance, achieving frame rates exceeding 35 FPS.

HS-Pose: Hybrid Scope Feature Extraction for Category-level Object Pose Estimation

Mar 28, 2023

Linfang Zheng, Chen Wang, Yinghan Sun, Esha Dasgupta, Hua Chen, Ales Leonardis, Wei Zhang, Hyung Jin Chang

Abstract: In this paper, we focus on the problem of category-level object pose estimation, which is challenging due to the large intra-category shape variation. 3D graph convolution (3D-GC) based methods have been widely used to extract local geometric features, but they have limitations for complex-shaped objects and are sensitive to noise. Moreover, the scale- and translation-invariant properties of 3D-GC restrict the perception of an object's size and translation information. We propose a simple network structure, the HS-layer, which extends 3D-GC to extract hybrid scope latent features from point cloud data for category-level object pose estimation tasks. The proposed HS-layer: 1) is able to perceive local-global geometric structure and global information, 2) is robust to noise, and 3) can encode size and translation information. Our experiments show that simply replacing the 3D-GC layer with the proposed HS-layer in the baseline method (GPV-Pose) achieves a significant improvement, increasing performance by 14.5% on the 5d2cm metric and 10.3% on IoU75. Our method outperforms the state-of-the-art methods by a large margin (8.3% on 5d2cm, 6.9% on IoU75) on the REAL275 dataset and runs in real time (50 FPS).

* Accepted by the 2023 IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)
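
The 5d2cm figure quoted in the abstract above is usually read as a threshold test: a predicted pose counts as correct when its rotation error is under 5 degrees and its translation error is under 2 cm. The sketch below illustrates that reading only; it is not the authors' evaluation code. The function names, the nested-list rotation matrices, the assumption that translations are given in metres, and the strict `<` comparisons are all illustrative choices, and symmetric objects (which benchmarks often treat specially) are ignored.

```python
import math


def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices (nested lists)."""
    # trace(R_pred^T R_gt) equals the element-wise product summed over all entries.
    trace = sum(R_pred[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def translation_error_cm(t_pred, t_gt):
    """Euclidean distance in centimetres, assuming inputs are in metres."""
    return 100.0 * math.dist(t_pred, t_gt)


def within_5deg_2cm(R_pred, t_pred, R_gt, t_gt, max_deg=5.0, max_cm=2.0):
    """True if the pose passes both the rotation and the translation threshold."""
    return (rotation_error_deg(R_pred, R_gt) < max_deg
            and translation_error_cm(t_pred, t_gt) < max_cm)


def rot_z(deg):
    """Hypothetical helper: rotation about the z-axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    identity = rot_z(0.0)
    # 3 degrees of rotation error and 1 cm of translation error pass the test.
    print(within_5deg_2cm(rot_z(3.0), [0.01, 0.0, 0.0], identity, [0.0, 0.0, 0.0]))
```

A reported score would then aggregate this per-prediction test over a dataset; the exact aggregation used on REAL275 is not specified in the abstract and is not reproduced here.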
