Dong Tian

Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns

Mar 05, 2025

Dong Tian, Ge Li, Hongyi Zhou, Onur Celik, Gerhard Neumann

Abstract:Soft Actor-Critic (SAC) critically depends on its critic network, which typically evaluates a single state-action pair to guide policy updates. Using N-step returns is a common practice to reduce the bias in the target values of the critic. However, using N-step returns can again introduce high variance and necessitates importance sampling, often destabilizing training. Recent algorithms have also explored action chunking, such as direct action repetition and movement primitives, to enhance exploration. In this paper, we propose a Transformer-based Critic Network for SAC that integrates the N-step returns framework in a stable and efficient manner. Unlike approaches that perform chunking in the actor network, we feed chunked actions into the critic network to explore potential performance gains. Our architecture leverages the Transformer's ability to process sequential information, facilitating more robust value estimation. Empirical results show that this method not only achieves efficient, stable training but also excels in sparse reward/multi-phase environments, traditionally a challenge for step-based methods. These findings underscore the promise of combining Transformer-based critics with N-step returns to advance reinforcement learning performance.

* 11 pages, 5 figures
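
As a rough illustration of the N-step returns mentioned in the abstract above, the sketch below computes a plain N-step bootstrapped target from a list of rewards and a value estimate for the state reached after N steps. The function name `n_step_target`, its parameters, and the default discount are assumptions made for illustration. The SAC entropy term and any importance-sampling correction are omitted, and this is not the paper's Transformer critic.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float],
                  bootstrap_value: float,
                  gamma: float = 0.99) -> float:
    """Return sum_{k<N} gamma^k * r_k + gamma^N * bootstrap_value.

    `rewards` holds the N rewards collected along the chunk and
    `bootstrap_value` is a (hypothetical) critic estimate for the
    state reached after those N steps.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


if __name__ == "__main__":
    # Three unit rewards, bootstrap value 10, discount 0.5.
    print(n_step_target([1.0, 1.0, 1.0], 10.0, gamma=0.5))
```

A larger N leans more on observed rewards and less on the bootstrapped estimate, which is the bias/variance trade-off the abstract refers to.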

Towards Reproducible Learning-based Compression

Oct 13, 2024

Jiahao Pang, Muhammad Asad Lodhi, Junghyun Ahn, Yuning Huang, Dong Tian

Abstract:A deep learning system typically suffers from a lack of reproducibility that is partially rooted in hardware or software implementation details. The irreproducibility leads to skepticism in deep learning technologies and it can hinder them from being deployed in many applications. In this work, the irreproducibility issue is analyzed where deep learning is employed in compression systems while the encoding and decoding may be run on devices from different manufacturers. The decoding process can even crash due to a single bit difference, e.g., in a learning-based entropy coder. For a given deep learning-based module with limited resources for protection, we first suggest that reproducibility can only be assured when the mismatches are bounded. Then a safeguarding mechanism is proposed to tackle the challenges. The proposed method may be applied for different levels of protection either at the reconstruction level or at a selected decoding level. Furthermore, the overhead introduced for the protection can be scaled down accordingly when the error bound is being suppressed. Experiments demonstrate the effectiveness of the proposed approach for learning-based compression systems, e.g., in image compression and point cloud compression.

* Accepted at MMSP 2024

TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning

Oct 12, 2024

Ge Li, Dong Tian, Hongyi Zhou, Xinkai Jiang, Rudolf Lioutikov, Gerhard Neumann

Abstract:This work introduces Transformer-based Off-Policy Episodic Reinforcement Learning (TOP-ERL), a novel algorithm that enables off-policy updates in the ERL framework. In ERL, policies predict entire action trajectories over multiple time steps instead of single actions at every time step. These trajectories are typically parameterized by trajectory generators such as Movement Primitives (MP), allowing for smooth and efficient exploration over long horizons while capturing high-level temporal correlations. However, ERL methods are often constrained to on-policy frameworks due to the difficulty of evaluating state-action values for entire action sequences, limiting their sample efficiency and preventing the use of more efficient off-policy architectures. TOP-ERL addresses this shortcoming by segmenting long action sequences and estimating the state-action values for each segment using a transformer-based critic architecture alongside an n-step return estimation. These contributions result in efficient and stable training that is reflected in the empirical results conducted on sophisticated robot learning environments. TOP-ERL significantly outperforms state-of-the-art RL methods. Thorough ablation studies additionally show the impact of key design choices on the model performance.
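
The segmentation step described in the abstract above can be pictured with a small sketch that splits one long action sequence into consecutive segments, each of which a critic could then score. The helper name `split_into_segments`, the fixed segment length, and the handling of a shorter final segment are assumptions for illustration only; the abstract does not specify how TOP-ERL chooses segment boundaries.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_segments(actions: Sequence[T], segment_length: int) -> List[List[T]]:
    """Split a trajectory into consecutive segments of at most `segment_length` steps."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [
        list(actions[start:start + segment_length])
        for start in range(0, len(actions), segment_length)
    ]


if __name__ == "__main__":
    trajectory = list(range(7))  # stand-in for 7 time steps of actions
    print(split_into_segments(trajectory, 3))
```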

PIVOT-Net: Heterogeneous Point-Voxel-Tree-based Framework for Point Cloud Compression

Feb 11, 2024

Jiahao Pang, Kevin Bui, Dong Tian

Abstract:The universality of the point cloud format enables many 3D applications, making the compression of point clouds a critical phase in practice. Sampled as discrete 3D points, a point cloud approximates 2D surface(s) embedded in 3D with a finite bit-depth. However, the point distribution of a practical point cloud changes drastically as its bit-depth increases, requiring different methodologies for effective consumption/analysis. In this regard, a heterogeneous point cloud compression (PCC) framework is proposed. We unify typical point cloud representations (point-based, voxel-based, and tree-based) and their associated backbones under a learning-based framework to compress an input point cloud at different bit-depth levels. Having recognized the importance of voxel-domain processing, we augment the framework with a proposed context-aware upsampling for decoding and an enhanced voxel transformer for feature aggregation. Extensive experimentation demonstrates the state-of-the-art performance of our proposal on a wide range of point clouds.

* Accepted at 3DV 2024

WrappingNet: Mesh Autoencoder via Deep Sphere Deformation

Aug 29, 2023

Eric Lei, Muhammad Asad Lodhi, Jiahao Pang, Junghyun Ahn, Dong Tian

Abstract:There have been recent efforts to learn more meaningful representations via fixed-length codewords from mesh data, since a mesh serves as a more complete model of the underlying 3D shape than a point cloud. However, the mesh connectivity presents new difficulties when constructing a deep learning pipeline for meshes. Previous mesh unsupervised learning approaches typically assume category-specific templates, e.g., human face/body templates. This restricts the learned latent codes to be meaningful only for objects in a specific category, so the learned latent spaces cannot be used across different types of objects. In this work, we present WrappingNet, the first mesh autoencoder enabling general mesh unsupervised learning over heterogeneous objects. It introduces a novel base graph in the bottleneck dedicated to representing mesh connectivity, which is shown to facilitate learning a shared latent space representing object shape. The superiority of WrappingNet mesh learning is further demonstrated via improved reconstruction quality and competitive classification compared to point cloud learning, as well as latent interpolation between meshes of different categories.

Concavity-Induced Distance for Unoriented Point Cloud Decomposition

Jun 19, 2023

Ruoyu Wang, Yanfei Xue, Bharath Surianarayanan, Dong Tian, Chen Feng

Abstract:We propose Concavity-induced Distance (CID) as a novel way to measure the dissimilarity between a pair of points in an unoriented point cloud. CID indicates the likelihood of two points or two sets of points belonging to different convex parts of an underlying shape represented as a point cloud. After analyzing its properties, we demonstrate how CID can benefit point cloud analysis without the need for meshing or normal estimation, which is beneficial for robotics applications when dealing with raw point cloud observations. By randomly selecting very few points for manual labeling, a CID-based point cloud instance segmentation via label propagation achieves average precision comparable to that of recent supervised deep learning approaches on the S3DIS and ScanNet datasets. Moreover, CID can be used to group points into approximately convex parts whose convex hulls can be used as compact scene representations in robotics, and it outperforms the baseline method in terms of grouping quality. Our project website is available at: https://ai4ce.github.io/CID/

* 8 pages, 8 figures, accepted by IEEE Robotics and Automation Letters

GRASP-Net: Geometric Residual Analysis and Synthesis for Point Cloud Compression

Sep 09, 2022

Jiahao Pang, Muhammad Asad Lodhi, Dong Tian

Abstract:Point cloud compression (PCC) is a key enabler for various 3D applications, owing to the universality of the point cloud format. Ideally, 3D point clouds endeavor to depict object/scene surfaces that are continuous. Practically, as a set of discrete samples, point clouds are locally disconnected and sparsely distributed. This sparse nature hinders the discovery of local correlation among points for compression. Motivated by an analysis with fractal dimension, we propose a heterogeneous approach with deep learning for lossy point cloud geometry compression. On top of a base layer compressing a coarse representation of the input, an enhancement layer is designed to cope with the challenging geometric residual/details. Specifically, a point-based network is applied to convert the erratic local details to latent features residing on the coarse point cloud. Then a sparse convolutional neural network operating on the coarse point cloud is launched. It utilizes the continuity/smoothness of the coarse geometry to compress the latent features as an enhancement bit-stream that greatly benefits the reconstruction quality. When this bit-stream is unavailable, e.g., due to packet loss, we support a skip mode with the same architecture which generates geometric details from the coarse point cloud directly. Experimentation on both dense and sparse point clouds demonstrates the state-of-the-art compression performance achieved by our proposal. Our code is available at https://github.com/InterDigitalInc/GRASP-Net.

* Accepted at ACM MM 2022 Workshop on Advances in Point Cloud Compression, Processing and Analysis
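
To make the base-layer/residual split in the abstract above concrete, the sketch below snaps each point to a coarse grid (standing in for the base layer) and keeps the leftover offset as the geometric residual. The function names, the uniform grid, and the `step` parameter are assumptions for illustration. GRASP-Net itself encodes the residual with learned networks, which this sketch does not attempt to reproduce.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def split_base_and_residual(points: List[Point], step: float) -> Tuple[List[Point], List[Point]]:
    """Return (coarse points on a grid of spacing `step`, per-point residuals)."""
    coarse: List[Point] = []
    residual: List[Point] = []
    for p in points:
        # Snap each coordinate to the nearest grid position.
        c = tuple(round(v / step) * step for v in p)
        coarse.append(c)
        residual.append(tuple(v - cv for v, cv in zip(p, c)))
    return coarse, residual


def reconstruct(coarse: List[Point], residual: List[Point]) -> List[Point]:
    """Add the residuals back onto the coarse points."""
    return [tuple(cv + rv for cv, rv in zip(c, r)) for c, r in zip(coarse, residual)]


if __name__ == "__main__":
    pts = [(0.2, 1.7, -0.4), (3.1, 2.9, 0.6)]
    base, res = split_base_and_residual(pts, step=1.0)
    print(base)
    print(reconstruct(base, res))
```

Dropping the residuals and keeping only the coarse points loosely mirrors the situation where the enhancement bit-stream is unavailable, although the skip mode in the abstract generates details with a network rather than omitting them.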

FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds

Apr 01, 2021

Haiyan Wang, Jiahao Pang, Muhammad A. Lodhi, Yingli Tian, Dong Tian

Abstract:Scene flow depicts the dynamics of a 3D scene, which is critical for various applications such as autonomous driving, robot navigation, AR/VR, etc. Conventionally, scene flow is estimated from dense/regular RGB video frames. With the development of depth-sensing technologies, precise 3D measurements are available via point clouds, which have sparked new research in 3D scene flow. Nevertheless, it remains challenging to extract scene flow from point clouds due to the sparsity and irregularity in typical point cloud sampling patterns. One major issue related to irregular sampling is identified as the randomness during point set abstraction/feature extraction, an elementary process in many flow estimation scenarios. A novel Spatial Abstraction with Attention (SA^2) layer is accordingly proposed to alleviate the unstable abstraction problem. Moreover, a Temporal Abstraction with Attention (TA^2) layer is proposed to rectify attention in the temporal domain, leading to benefits with motions scaled in a larger range. Extensive analysis and experiments verified the motivation and significant performance gains of our method, dubbed Flow Estimation via Spatial-Temporal Attention (FESTA), when compared to several state-of-the-art benchmarks of scene flow estimation.

* Accepted at CVPR 2021

Graph Signal Processing for Geometric Data and Beyond: Theory and Applications

Aug 05, 2020

Wei Hu, Jiahao Pang, Xianming Liu, Dong Tian, Chia-Wen Lin, Anthony Vetro

Abstract:Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited, while Graph Signal Processing (GSP), a fast-developing field in the signal processing community, enables processing signals that reside on irregular domains and plays a critical role in numerous applications of geometric data from low-level processing to high-level analysis. To further advance the research in this field, we provide the first timely and comprehensive overview of GSP methodologies for geometric data in a unified manner by bridging the connections between geometric data and graphs, among the various geometric data modalities, and with spectral/nodal graph filtering techniques. We also discuss the recently developed Graph Neural Networks (GNNs) and interpret the operation of these networks from the perspective of GSP. We conclude with a brief discussion of open problems and challenges.

* 16 pages, 6 figures
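
As a small illustration of the nodal graph filtering mentioned in the abstract above, the sketch below applies one step of Laplacian smoothing, x' = x - alpha * L x, where L = D - A is the combinatorial Laplacian of an unweighted, undirected graph. The function name, the edge-list representation, and the value of `alpha` are assumptions chosen for illustration and are not taken from the overview paper.

```python
from typing import List, Sequence, Tuple


def laplacian_smooth(signal: Sequence[float],
                     edges: Sequence[Tuple[int, int]],
                     alpha: float = 0.25) -> List[float]:
    """Apply one low-pass step x' = x - alpha * L x on an undirected graph."""
    lx = [0.0] * len(signal)
    # (L x)_i is the sum over neighbours j of (x_i - x_j).
    for i, j in edges:
        diff = signal[i] - signal[j]
        lx[i] += diff
        lx[j] -= diff
    return [x - alpha * l for x, l in zip(signal, lx)]


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with a spike on the middle node.
    print(laplacian_smooth([0.0, 3.0, 0.0], [(0, 1), (1, 2)], alpha=0.25))
```

The spike is spread to its neighbours while the sum of the signal is preserved, which is the low-pass behaviour such filters are meant to provide on irregular domains.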

TearingNet: Point Cloud Autoencoder to Learn Topology-Friendly Representations

Jun 17, 2020

Jiahao Pang, Duanshun Li, Dong Tian

Abstract:Topology matters. Despite the recent success of point cloud processing with geometric deep learning, it remains arduous to capture the complex topologies of point cloud data with a learning model. Given a point cloud dataset containing objects with various genera or scenes with multiple objects, we propose an autoencoder, TearingNet, which tackles the challenging task of representing the point clouds using a fixed-length descriptor. Unlike existing works that deform primitives of genus zero (e.g., a 2D square patch) into an object-level point cloud, we propose a function which tears the primitive during deformation, letting it emulate the topology of a target point cloud. From the torn primitive, we construct a locally-connected graph to further enforce the learned topology via filtering. Moreover, we analyze a widespread problem, which we call point-collapse, that arises when processing point clouds with diverse topologies. Correspondingly, we propose a subtractive sculpture strategy to train our TearingNet model. Experimentation finally shows the superiority of our proposal in terms of reconstructing more faithful point clouds as well as generating more topology-friendly representations than benchmarks.

* Submitted to NeurIPS 2020
