Jiahao Lin

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation

Mar 17, 2025

Songjun Tu, Jiahao Lin, Xiangyu Tian, Qichao Zhang, Linjing Li, Yuqian Fu, Nan Xu, Wei He, Xiangyuan Lan, Dongmei Jiang(+1 more)

Abstract:Recent advancements in post-training methodologies for large language models (LLMs) have highlighted reinforcement learning (RL) as a critical component for enhancing reasoning. However, the substantial computational costs associated with RL-based approaches have led to growing interest in alternative paradigms, such as Direct Preference Optimization (DPO). In this study, we investigate the effectiveness of DPO in facilitating self-improvement for LLMs through iterative preference-based learning. We demonstrate that a single round of DPO with coarse filtering significantly enhances mathematical reasoning performance, particularly for strong base models. Furthermore, we design an iterative enhancement framework for both the generator and the reward model (RM), enabling their mutual improvement through online interaction across multiple rounds of DPO. Finally, with simple verifiable rewards, our model DPO-VP achieves RL-level performance with significantly lower computational overhead. These findings highlight DPO as a scalable and cost-effective alternative to RL, offering a practical solution for enhancing LLM reasoning in resource-constrained situations.
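For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the paper: the function name, argument names, and the default `beta` are illustrative assumptions, and the paper's filtering and iteration scheme are not reproduced here.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    beta=0.1 is an assumed default, not a value from the abstract.
    """
    # Implicit reward of a response: beta * (policy log-prob - reference log-prob).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss falls as the policy raises the chosen response's likelihood relative to the rejected one.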

GHuNeRF: Generalizable Human NeRF from a Monocular Video

Sep 03, 2023

Chen Li, Jiahao Lin, Gim Hee Lee

Abstract:In this paper, we tackle the challenging task of learning a generalizable human NeRF model from a monocular video. Although existing generalizable human NeRFs have achieved impressive results, they require multi-view images or videos, which might not always be available. On the other hand, some works on free-viewpoint rendering of humans from monocular videos cannot be generalized to unseen identities. In view of these limitations, we propose GHuNeRF to learn a generalizable human NeRF model from a monocular video of the human performer. We first introduce a visibility-aware aggregation scheme to compute vertex-wise features, which are used to construct a 3D feature volume. The feature volume can only represent the overall geometry of the human performer with insufficient accuracy due to the limited resolution. To solve this, we further enhance the volume feature with temporally aligned point-wise features using an attention mechanism. Finally, the enhanced feature is used for predicting density and color for each sampled point. A surface-guided sampling strategy is also introduced to improve the efficiency of both training and inference. We validate our approach on the widely-used ZJU-MoCap dataset, where we achieve comparable performance with existing multi-view video based approaches. We also test on the monocular People-Snapshot dataset and achieve better performance than existing works when only monocular video is used.

* Corrected typos

Online Map Vectorization for Autonomous Driving: A Rasterization Perspective

Jun 18, 2023

Gongjie Zhang, Jiahao Lin, Shuang Wu, Yilin Song, Zhipeng Luo, Yang Xue, Shijian Lu, Zuoguan Wang

Abstract:Vectorized high-definition (HD) maps are essential for autonomous driving, providing detailed and precise environmental information for advanced perception and planning. However, current map vectorization methods often exhibit deviations, and the existing evaluation metric for map vectorization lacks sufficient sensitivity to detect these deviations. To address these limitations, we propose integrating the philosophy of rasterization into map vectorization. Specifically, we introduce a new rasterization-based evaluation metric, which has superior sensitivity and is better suited to real-world autonomous driving scenarios. Furthermore, we propose MapVR (Map Vectorization via Rasterization), a novel framework that applies differentiable rasterization to vectorized outputs and then performs precise and geometry-aware supervision on rasterized HD maps. Notably, MapVR designs tailored rasterization strategies for various geometric shapes, enabling effective adaptation to a wide range of map elements. Experiments show that incorporating rasterization into map vectorization greatly enhances performance with no extra computational cost during inference, leading to more accurate map perception and ultimately promoting safer autonomous driving.

Learning Spatial Context with Graph Neural Network for Multi-Person Pose Grouping

Apr 06, 2021

Jiahao Lin, Gim Hee Lee

Abstract:Bottom-up approaches for image-based multi-person pose estimation consist of two stages: (1) keypoint detection and (2) grouping of the detected keypoints to form person instances. Current grouping approaches rely on embeddings learned from visual features alone and completely ignore the spatial configuration of human poses. In this work, we formulate the grouping task as a graph partitioning problem, where we learn the affinity matrix with a Graph Neural Network (GNN). More specifically, we design a Geometry-aware Association GNN that utilizes spatial information of the keypoints and learns local affinity from the global context. The learned geometry-based affinity is further fused with appearance-based affinity to achieve robust keypoint association. Spectral clustering is used to partition the graph for the formation of the pose instances. Experimental results on two benchmark datasets show that our proposed method outperforms existing appearance-only grouping frameworks, which shows the effectiveness of utilizing spatial context for robust grouping. Source code is available at: https://github.com/jiahaoLjh/PoseGrouping.

* 7 pages, 4 figures. Accepted in ICRA 2021

Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo

Apr 06, 2021

Jiahao Lin, Gim Hee Lee

Abstract:Existing approaches for multi-view multi-person 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views and solve for the 3D pose estimation for each person. Establishing cross-view correspondences is challenging in multi-person scenes, and incorrect correspondences will lead to sub-optimal performance for the multi-stage pipeline. In this work, we present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot. Specifically, we propose to perform depth regression for each joint of each 2D pose in a target camera view. Cross-view consistency constraints are implicitly enforced by multiple reference camera views via the plane sweep algorithm to facilitate accurate depth regression. We adopt a coarse-to-fine scheme to first regress the person-level depth followed by a per-person joint-level relative depth estimation. 3D poses are obtained from a simple back-projection given the estimated depths. We evaluate our approach on benchmark datasets, where it outperforms previous state-of-the-art methods while being remarkably efficient. Our code is available at https://github.com/jiahaoLjh/PlaneSweepPose.
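The final back-projection step can be illustrated with a small sketch. It assumes an ideal pinhole camera without lens distortion; the function name and the intrinsics parameters (`fx`, `fy`, `cx`, `cy`) are illustrative and are not taken from the paper or its repository.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Lift a 2D joint at pixel (u, v) with estimated depth to camera space.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), all in pixels; depth is measured along the optical axis.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

A joint detected exactly at the principal point maps to a point on the optical axis, that is `(0, 0, depth)`.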

* 10 pages, 5 figures. Accepted in CVPR 2021

HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization

Jul 17, 2020

Jiahao Lin, Gim Hee Lee

Abstract:Current works on multi-person 3D pose estimation mainly focus on the estimation of the 3D joint locations relative to the root joint and ignore the absolute locations of each pose. In this paper, we propose the Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization in the camera coordinate space. Our HDNet first estimates the 2D human pose with heatmaps of the joints. These estimated heatmaps serve as attention masks for pooling features from image regions corresponding to the target person. A skeleton-based Graph Neural Network (GNN) is utilized to propagate features among joints. We formulate the target depth regression as a bin index estimation problem, which can be transformed with a soft-argmax operation from the classification output of our HDNet. We evaluate our HDNet on the root joint localization and root-relative 3D pose estimation tasks with two benchmark datasets, i.e., Human3.6M and MuPoTS-3D. The experimental results show that we outperform the previous state-of-the-art consistently under multiple evaluation metrics. Our source code is available at: https://github.com/jiahaoLjh/HumanDepth.
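As a rough illustration of turning a classification output over depth bins into a continuous estimate, the sketch below applies a soft-argmax to bin logits. The uniform bin layout and the helper names are assumptions made for this example; the abstract does not specify how HDNet discretizes depth.

```python
import math

def soft_argmax_index(logits):
    """Expected bin index under the softmax distribution over the logits."""
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return sum(i * w for i, w in enumerate(weights)) / total

def index_to_depth(index, near, far, num_bins):
    """Map a (possibly fractional) bin index to the depth at that position.

    Hypothetical layout: num_bins uniform bins covering [near, far],
    with integer indices located at bin centers.
    """
    bin_width = (far - near) / num_bins
    return near + (index + 0.5) * bin_width
```

Because the result is a probability-weighted average of indices, it stays differentiable with respect to the logits, unlike a hard argmax.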

* 16 pages, 5 figures. Accepted in ECCV 2020

Robust Vision-based Obstacle Avoidance for Micro Aerial Vehicles in Dynamic Environments

Feb 13, 2020

Jiahao Lin, Hai Zhu, Javier Alonso-Mora

Abstract:In this paper, we present an on-board vision-based approach for avoidance of moving obstacles in dynamic environments. Our approach relies on an efficient obstacle detection and tracking algorithm based on depth image pairs, which provides the estimated position, velocity and size of the obstacles. Robust collision avoidance is achieved by formulating a chance-constrained model predictive controller (CC-MPC) to ensure that the collision probability between the micro aerial vehicle (MAV) and each moving obstacle is below a specified threshold. The method takes into account MAV dynamics, state estimation and obstacle sensing uncertainties. The proposed approach is implemented on a quadrotor equipped with a stereo camera and is tested in a variety of environments, showing effective on-line collision avoidance of moving obstacles.

* 7 pages, 7 figures, to be published in 2020 IEEE International Conference on Robotics and Automation (ICRA)

Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation

Aug 22, 2019

Jiahao Lin, Gim Hee Lee

Abstract:Existing deep learning approaches to 3D human pose estimation for videos are based on either Recurrent or Convolutional Neural Networks (RNNs or CNNs). However, RNN-based frameworks can only tackle sequences with limited frames because sequential models are sensitive to bad frames and tend to drift over long sequences. Although existing CNN-based temporal frameworks attempt to address the sensitivity and drift problems by concurrently processing all input frames in the sequence, the existing state-of-the-art CNN-based framework is limited to 3D pose estimation of a single frame from a sequential input. In this paper, we propose a deep learning-based framework that utilizes matrix factorization for sequential 3D human pose estimation. Our approach processes all input frames concurrently to avoid the sensitivity and drift problems, and yet outputs the 3D pose estimates for every frame in the input sequence. More specifically, the 3D poses in all frames are represented as a motion matrix factorized into a trajectory bases matrix and a trajectory coefficient matrix. The trajectory bases matrix is precomputed from matrix factorization approaches such as Singular Value Decomposition (SVD) or Discrete Cosine Transform (DCT), and the problem of sequential 3D pose estimation is reduced to training a deep network to regress the trajectory coefficient matrix. We demonstrate the effectiveness of our framework on long sequences by achieving state-of-the-art performance on multiple benchmark datasets. Our source code is available at: https://github.com/jiahaoLjh/trajectory-pose-3d.
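The trajectory-space idea can be sketched for a single joint coordinate using precomputed DCT bases. This is an illustrative example, not the authors' code: the function names are hypothetical, the bases are the orthonormal DCT-II vectors, and the coefficients are obtained here by direct projection, whereas in the paper they are regressed by a deep network.

```python
import math

def dct_bases(num_frames, num_bases):
    """First num_bases orthonormal DCT-II basis vectors of length num_frames."""
    bases = []
    for k in range(num_bases):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_frames)
        bases.append([
            scale * math.cos(math.pi * (2 * t + 1) * k / (2 * num_frames))
            for t in range(num_frames)
        ])
    return bases

def project(trajectory, bases):
    """Trajectory coefficients: inner product of the trajectory with each basis."""
    return [sum(b * x for b, x in zip(basis, trajectory)) for basis in bases]

def reconstruct(coeffs, bases):
    """Rebuild the per-frame trajectory as a weighted sum of the bases."""
    num_frames = len(bases[0])
    return [
        sum(c * basis[t] for c, basis in zip(coeffs, bases))
        for t in range(num_frames)
    ]
```

Using fewer bases than frames gives a smooth low-frequency approximation of the trajectory, while using as many bases as frames reconstructs it exactly.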

* 13 pages, 5 figures. Accepted in BMVC 2019
