Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yijun Cao

Toward Better SSIM Loss for Unsupervised Monocular Depth Estimation

Jun 05, 2025

Yijun Cao, Fuya Luo, Yongjie Li

Abstract:Unsupervised monocular depth learning generally relies on the photometric relation among temporally adjacent images. Most of previous works use both mean absolute error (MAE) and structure similarity index measure (SSIM) with conventional form as training loss. However, they ignore the effect of different components in the SSIM function and the corresponding hyperparameters on the training. To address these issues, this work proposes a new form of SSIM. Compared with original SSIM function, the proposed new form uses addition rather than multiplication to combine the luminance, contrast, and structural similarity related components in SSIM. The loss function constructed with this scheme helps result in smoother gradients and achieve higher performance on unsupervised depth estimation. We conduct extensive experiments to determine the relatively optimal combination of parameters for our new SSIM. Based on the popular MonoDepth approach, the optimized SSIM loss function can remarkably outperform the baseline on the KITTI-2015 outdoor dataset.

* International Conference on Image and Graphics. Cham: Springer Nature Switzerland, 2023: 81-92
* 12 pages,4 figures

Via

Access Paper or Ask Questions

Vector-Symbolic Architecture for Event-Based Optical Flow

May 15, 2024

Hongzhi You, Yijun Cao, Wei Yuan, Fanjun Wang, Ning Qiao, Yongjie Li

Figure 1 for Vector-Symbolic Architecture for Event-Based Optical Flow

Figure 2 for Vector-Symbolic Architecture for Event-Based Optical Flow

Figure 3 for Vector-Symbolic Architecture for Event-Based Optical Flow

Figure 4 for Vector-Symbolic Architecture for Event-Based Optical Flow

Abstract:From a perspective of feature matching, optical flow estimation for event cameras involves identifying event correspondences by comparing feature similarity across accompanying event frames. In this work, we introduces an effective and robust high-dimensional (HD) feature descriptor for event frames, utilizing Vector Symbolic Architectures (VSA). The topological similarity among neighboring variables within VSA contributes to the enhanced representation similarity of feature descriptors for flow-matching points, while its structured symbolic representation capacity facilitates feature fusion from both event polarities and multiple spatial scales. Based on this HD feature descriptor, we propose a novel feature matching framework for event-based optical flow, encompassing both model-based (VSA-Flow) and self-supervised learning (VSA-SM) methods. In VSA-Flow, accurate optical flow estimation validates the effectiveness of HD feature descriptors. In VSA-SM, a novel similarity maximization method based on the HD feature descriptor is proposed to learn optical flow in a self-supervised way from events alone, eliminating the need for auxiliary grayscale images. Evaluation results demonstrate that our VSA-based method achieves superior accuracy in comparison to both model-based and self-supervised learning methods on the DSEC benchmark, while remains competitive among both methods on the MVSEC benchmark. This contribution marks a significant advancement in event-based optical flow within the feature matching methodology.

Via

Access Paper or Ask Questions

Weak Supervision with Arbitrary Single Frame for Micro- and Macro-expression Spotting

Mar 21, 2024

Wang-Wang Yu, Xian-Shi Zhang, Fu-Ya Luo, Yijun Cao, Kai-Fu Yang, Hong-Mei Yan, Yong-Jie Li

Figure 1 for Weak Supervision with Arbitrary Single Frame for Micro- and Macro-expression Spotting

Figure 2 for Weak Supervision with Arbitrary Single Frame for Micro- and Macro-expression Spotting

Figure 3 for Weak Supervision with Arbitrary Single Frame for Micro- and Macro-expression Spotting

Figure 4 for Weak Supervision with Arbitrary Single Frame for Micro- and Macro-expression Spotting

Abstract:Frame-level micro- and macro-expression spotting methods require time-consuming frame-by-frame observation during annotation. Meanwhile, video-level spotting lacks sufficient information about the location and number of expressions during training, resulting in significantly inferior performance compared with fully-supervised spotting. To bridge this gap, we propose a point-level weakly-supervised expression spotting (PWES) framework, where each expression requires to be annotated with only one random frame (i.e., a point). To mitigate the issue of sparse label distribution, the prevailing solution is pseudo-label mining, which, however, introduces new problems: localizing contextual background snippets results in inaccurate boundaries and discarding foreground snippets leads to fragmentary predictions. Therefore, we design the strategies of multi-refined pseudo label generation (MPLG) and distribution-guided feature contrastive learning (DFCL) to address these problems. Specifically, MPLG generates more reliable pseudo labels by merging class-specific probabilities, attention scores, fused features, and point-level labels. DFCL is utilized to enhance feature similarity for the same categories and feature variability for different categories while capturing global representations across the entire datasets. Extensive experiments on the CAS(ME)^2, CAS(ME)^3, and SAMM-LV datasets demonstrate PWES achieves promising performance comparable to that of recent fully-supervised methods.

Via

Access Paper or Ask Questions

Unsupervised Vision and Vision-motion Calibration Strategies for PointGoal Navigation in Indoor Environment

Oct 02, 2022

Yijun Cao, Xianshi Zhang, Fuya Luo, Yongjie Li

Figure 1 for Unsupervised Vision and Vision-motion Calibration Strategies for PointGoal Navigation in Indoor Environment

Figure 2 for Unsupervised Vision and Vision-motion Calibration Strategies for PointGoal Navigation in Indoor Environment

Figure 3 for Unsupervised Vision and Vision-motion Calibration Strategies for PointGoal Navigation in Indoor Environment

Figure 4 for Unsupervised Vision and Vision-motion Calibration Strategies for PointGoal Navigation in Indoor Environment

Abstract:PointGoal navigation in indoor environment is a fundamental task for personal robots to navigate to a specified point. Recent studies solved this PointGoal navigation task with near-perfect success rate in photo-realistically simulated environments, under the assumptions with noiseless actuation and most importantly, perfect localization with GPS and compass sensors. However, accurate GPS signal can not be obtained in real indoor environment. To improve the pointgoal navigation accuracy in real indoor, we proposed novel vision and vision-motion calibration strategies to train visual and motion path integration in unsupervised manner. Sepecifically, visual calibration computes the relative pose of the agent from the re-projection error of two adjacent frames, and then replaces the accurate GPS signal with the path integration. This pseudo position is also used to calibrate self-motion integration which assists agent to update their internal perception of location and helps improve the success rate of navigation. The training and inference process only use RGB, depth, collision as well as self-action information. The experiments show that the proposed system achieves satisfactory results and outperforms the partially supervised learning algorithms on the popular Gibson dataset.

* 8 pages, 3 figures

Via

Access Paper or Ask Questions

Robust Visual Odometry Using Position-Aware Flow and Geometric Bundle Adjustment

Nov 22, 2021

Yijun Cao, Xianshi Zhang, Fuya Luo, Peng Peng, Yongjie Li

Figure 1 for Robust Visual Odometry Using Position-Aware Flow and Geometric Bundle Adjustment

Figure 2 for Robust Visual Odometry Using Position-Aware Flow and Geometric Bundle Adjustment

Figure 3 for Robust Visual Odometry Using Position-Aware Flow and Geometric Bundle Adjustment

Figure 4 for Robust Visual Odometry Using Position-Aware Flow and Geometric Bundle Adjustment

Abstract:In this paper, an essential problem of robust visual odometry (VO) is approached by incorporating geometry-based methods into deep-learning architecture in a self-supervised manner. Generally, pure geometry-based algorithms are not as robust as deep learning in feature-point extraction and matching, but perform well in ego-motion estimation because of their well-established geometric theory. In this work, a novel optical flow network (PANet) built on a position-aware mechanism is proposed first. Then, a novel system that jointly estimates depth, optical flow, and ego-motion without a typical network to learning ego-motion is proposed. The key component of the proposed system is an improved bundle adjustment module containing multiple sampling, initialization of ego-motion, dynamic damping factor adjustment, and Jacobi matrix weighting. In addition, a novel relative photometric loss function is advanced to improve the depth estimation accuracy. The experiments show that the proposed system not only outperforms other state-of-the-art methods in terms of depth, flow, and VO estimation among self-supervised learning-based methods on KITTI dataset, but also significantly improves robustness compared with geometry-based, learning-based and hybrid VO systems. Further experiments show that our model achieves outstanding generalization ability and performance in challenging indoor (TMU-RGBD) and outdoor (KAIST) scenes.

* 15 pages, 14 figures

Via

Access Paper or Ask Questions