Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Caiming Zhang

MonoRelief V2: Leveraging Real Data for High-Fidelity Monocular Relief Recovery

Aug 27, 2025

Yu-Wei Zhang, Tongju Han, Lipeng Gao, Mingqiang Wei, Hui Liu, Changbao Li, Caiming Zhang

Abstract:This paper presents MonoRelief V2, an end-to-end model designed for directly recovering 2.5D reliefs from single images under complex material and illumination variations. In contrast to its predecessor, MonoRelief V1 [1], which was solely trained on synthetic data, MonoRelief V2 incorporates real data to achieve improved robustness, accuracy and efficiency. To overcome the challenge of acquiring large-scale real-world dataset, we generate approximately 15,000 pseudo real images using a text-to-image generative model, and derive corresponding depth pseudo-labels through fusion of depth and normal predictions. Furthermore, we construct a small-scale real-world dataset (800 samples) via multi-view reconstruction and detail refinement. MonoRelief V2 is then progressively trained on the pseudo-real and real-world datasets. Comprehensive experiments demonstrate its state-of-the-art performance both in depth and normal predictions, highlighting its strong potential for a range of downstream applications. Code is at: https://github.com/glp1001/MonoreliefV2.

Via

Access Paper or Ask Questions

NeuVAS: Neural Implicit Surfaces for Variational Shape Modeling

Jun 16, 2025

Pengfei Wang, Qiujie Dong, Fangtian Liang, Hao Pan, Lei Yang, Congyi Zhang, Guying Lin, Caiming Zhang, Yuanfeng Zhou, Changhe Tu(+4 more)

Abstract:Neural implicit shape representation has drawn significant attention in recent years due to its smoothness, differentiability, and topological flexibility. However, directly modeling the shape of a neural implicit surface, especially as the zero-level set of a neural signed distance function (SDF), with sparse geometric control is still a challenging task. Sparse input shape control typically includes 3D curve networks or, more generally, 3D curve sketches, which are unstructured and cannot be connected to form a curve network, and therefore more difficult to deal with. While 3D curve networks or curve sketches provide intuitive shape control, their sparsity and varied topology pose challenges in generating high-quality surfaces to meet such curve constraints. In this paper, we propose NeuVAS, a variational approach to shape modeling using neural implicit surfaces constrained under sparse input shape control, including unstructured 3D curve sketches as well as connected 3D curve networks. Specifically, we introduce a smoothness term based on a functional of surface curvatures to minimize shape variation of the zero-level set surface of a neural SDF. We also develop a new technique to faithfully model G0 sharp feature curves as specified in the input curve sketches. Comprehensive comparisons with the state-of-the-art methods demonstrate the significant advantages of our method.

Via

Access Paper or Ask Questions

Minding Fuzzy Regions: A Data-driven Alternating Learning Paradigm for Stable Lesion Segmentation

Mar 14, 2025

Lexin Fang, Yunyang Xu, Xiang Ma, Xuemei Li, Caiming Zhang

Abstract:Deep learning has achieved significant advancements in medical image segmentation, but existing models still face challenges in accurately segmenting lesion regions. The main reason is that some lesion regions in medical images have unclear boundaries, irregular shapes, and small tissue density differences, leading to label ambiguity. However, the existing model treats all data equally without taking quality differences into account in the training process, resulting in noisy labels negatively impacting model training and unstable feature representations. In this paper, a data-driven alternating learning (DALE) paradigm is proposed to optimize the model's training process, achieving stable and high-precision segmentation. The paradigm focuses on two key points: (1) reducing the impact of noisy labels, and (2) calibrating unstable representations. To mitigate the negative impact of noisy labels, a loss consistency-based collaborative optimization method is proposed, and its effectiveness is theoretically demonstrated. Specifically, the label confidence parameters are introduced to dynamically adjust the influence of labels of different confidence levels during model training, thus reducing the influence of noise labels. To calibrate the learning bias of unstable representations, a distribution alignment method is proposed. This method restores the underlying distribution of unstable representations, thereby enhancing the discriminative capability of fuzzy region representations. Extensive experiments on various benchmarks and model backbones demonstrate the superiority of the DALE paradigm, achieving an average performance improvement of up to 7.16%.

* 10 pages, 11 figures, accepted by CVPR 2025

Via

Access Paper or Ask Questions

Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching

Oct 22, 2024

Xiang Ma, Xuemei Li, Lexin Fang, Caiming Zhang

Figure 1 for Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching

Figure 2 for Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching

Figure 3 for Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching

Figure 4 for Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching

Abstract:Many contrastive learning based models have achieved advanced performance in image-text matching tasks. The key of these models lies in analyzing the correlation between image-text pairs, which involves cross-modal interaction of embeddings in corresponding dimensions. However, the embeddings of different modalities are from different models or modules, and there is a significant modality gap. Directly interacting such embeddings lacks rationality and may capture inaccurate correlation. Therefore, we propose a novel method called DIAS to bridge the modality gap from two aspects: (1) We align the information representation of embeddings from different modalities in corresponding dimension to ensure the correlation calculation is based on interactions of similar information. (2) The spatial constraints of inter- and intra-modalities unmatched pairs are introduced to ensure the effectiveness of semantic alignment of the model. Besides, a sparse correlation algorithm is proposed to select strong correlated spatial relationships, enabling the model to learn more significant features and avoid being misled by weak correlation. Extensive experiments demonstrate the superiority of DIAS, achieving 4.3\%-10.2\% rSum improvements on Flickr30k and MSCOCO benchmarks.

Via

Access Paper or Ask Questions

U-Mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting

Jan 04, 2024

Xiang Ma, Xuemei Li, Lexin Fang, Tianlong Zhao, Caiming Zhang

Figure 1 for U-Mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting

Figure 2 for U-Mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting

Figure 3 for U-Mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting

Figure 4 for U-Mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting

Abstract:Time series forecasting is a crucial task in various domains. Caused by factors such as trends, seasonality, or irregular fluctuations, time series often exhibits non-stationary. It obstructs stable feature propagation through deep layers, disrupts feature distributions, and complicates learning data distribution changes. As a result, many existing models struggle to capture the underlying patterns, leading to degraded forecasting performance. In this study, we tackle the challenge of non-stationarity in time series forecasting with our proposed framework called U-Mixer. By combining Unet and Mixer, U-Mixer effectively captures local temporal dependencies between different patches and channels separately to avoid the influence of distribution variations among channels, and merge low- and high-levels features to obtain comprehensive data representations. The key contribution is a novel stationarity correction method, explicitly restoring data distribution by constraining the difference in stationarity between the data before and after model processing to restore the non-stationarity information, while ensuring the temporal dependencies are preserved. Through extensive experiments on various real-world time series datasets, U-Mixer demonstrates its effectiveness and robustness, and achieves 14.5\% and 7.7\% improvements over state-of-the-art (SOTA) methods.

* Accepted by AAAI2024

Via

Access Paper or Ask Questions

D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer

Dec 20, 2023

Junjie Gao, Pengfei Wang, Qiujie Dong, Qiong Zeng, Shiqing Xin, Caiming Zhang

Abstract:Establishing accurate and representative matches is a crucial step in addressing the point cloud registration problem. A commonly employed approach involves detecting keypoints with salient geometric features and subsequently mapping these keypoints from one frame of the point cloud to another. However, methods within this category are hampered by the repeatability of the sampled keypoints. In this paper, we introduce a saliency-guided trans\textbf{former}, referred to as \textit{D3Former}, which entails the joint learning of repeatable \textbf{D}ense \textbf{D}etectors and feature-enhanced \textbf{D}escriptors. The model comprises a Feature Enhancement Descriptor Learning (FEDL) module and a Repetitive Keypoints Detector Learning (RKDL) module. The FEDL module utilizes a region attention mechanism to enhance feature distinctiveness, while the RKDL module focuses on detecting repeatable keypoints to enhance matching capabilities. Extensive experimental results on challenging indoor and outdoor benchmarks demonstrate that our proposed method consistently outperforms state-of-the-art point cloud matching methods. Notably, tests on 3DLoMatch, even with a low overlap ratio, show that our method consistently outperforms recently published approaches such as RoReg and RoITr. For instance, with the number of extracted keypoints reduced to 250, the registration recall scores for RoReg, RoITr, and our method are 64.3\%, 73.6\%, and 76.5\%, respectively.

* 15 pages, 6 figures

Via

Access Paper or Ask Questions

Efficient and Effective Multi-View Subspace Clustering for Large-scale Data

Oct 15, 2023

Yuxiu Lin, Hui Liu, Ren Wang, Gongguan Chen, Caiming Zhang

Figure 1 for Efficient and Effective Multi-View Subspace Clustering for Large-scale Data

Figure 2 for Efficient and Effective Multi-View Subspace Clustering for Large-scale Data

Figure 3 for Efficient and Effective Multi-View Subspace Clustering for Large-scale Data

Figure 4 for Efficient and Effective Multi-View Subspace Clustering for Large-scale Data

Abstract:Recent multi-view subspace clustering achieves impressive results utilizing deep networks, where the self-expressive correlation is typically modeled by a fully connected (FC) layer. However, they still suffer from two limitations: i) it is under-explored to extract a unified representation from multiple views that simultaneously satisfy minimal sufficiency and discriminability. ii) the parameter scale of the FC layer is quadratic to the number of samples, resulting in high time and memory costs that significantly degrade their feasibility in large-scale datasets. In light of this, we propose a novel deep framework termed Efficient and Effective Large-scale Multi-View Subspace Clustering (E$^2$LMVSC). Specifically, to enhance the quality of the unified representation, a soft clustering assignment similarity constraint is devised for explicitly decoupling consistent, complementary, and superfluous information across multi-view data. Then, following information bottleneck theory, a sufficient yet minimal unified feature representation is obtained. Moreover, E$^2$LMVSC employs the maximal coding rate reduction principle to promote intra-cluster aggregation and inter-cluster separability within the unified representation. Finally, the self-expressive coefficients are learned by a Relation-Metric Net instead of a parameterized FC layer for greater efficiency. Extensive experiments show that E$^2$LMVSC yields comparable results to existing methods and achieves state-of-the-art clustering performance in large-scale multi-view datasets.

Via

Access Paper or Ask Questions

MPR-Net:Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting

Jul 13, 2023

Tianlong Zhao, Xiang Ma, Xuemei Li, Caiming Zhang

Figure 1 for MPR-Net:Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting

Figure 2 for MPR-Net:Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting

Figure 3 for MPR-Net:Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting

Figure 4 for MPR-Net:Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting

Abstract:Time series forecasting has received wide interest from existing research due to its broad applications and inherent challenging. The research challenge lies in identifying effective patterns in historical series and applying them to future forecasting. Advanced models based on point-wise connected MLP and Transformer architectures have strong fitting power, but their secondary computational complexity limits practicality. Additionally, those structures inherently disrupt the temporal order, reducing the information utilization and making the forecasting process uninterpretable. To solve these problems, this paper proposes a forecasting model, MPR-Net. It first adaptively decomposes multi-scale historical series patterns using convolution operation, then constructs a pattern extension forecasting method based on the prior knowledge of pattern reproduction, and finally reconstructs future patterns into future series using deconvolution operation. By leveraging the temporal dependencies present in the time series, MPR-Net not only achieves linear time complexity, but also makes the forecasting process interpretable. By carrying out sufficient experiments on more than ten real data sets of both short and long term forecasting tasks, MPR-Net achieves the state of the art forecasting performance, as well as good generalization and robustness performance.

Via

Access Paper or Ask Questions

Tensor Robust PCA with Nonconvex and Nonlocal Regularization

Nov 04, 2022

Xiaoyu Geng, Qiang Guo, Shuaixiong Hui, Caiming Zhang

Abstract:Tensor robust principal component analysis (TRPCA) is a promising way for low-rank tensor recovery, which minimizes the convex surrogate of tensor rank by shrinking each tensor singular values equally. However, for real-world visual data, large singular values represent more signifiant information than small singular values. In this paper, we propose a nonconvex TRPCA (N-TRPCA) model based on the tensor adjustable logarithmic norm. Unlike TRPCA, our N-TRPCA can adaptively shrink small singular values more and shrink large singular values less. In addition, TRPCA assumes that the whole data tensor is of low rank. This assumption is hardly satisfied in practice for natural visual data, restricting the capability of TRPCA to recover the edges and texture details from noisy images and videos. To this end, we integrate nonlocal self-similarity into N-TRPCA, and further develop a nonconvex and nonlocal TRPCA (NN-TRPCA) model. Specifically, similar nonlocal patches are grouped as a tensor and then each group tensor is recovered by our N-TRPCA. Since the patches in one group are highly correlated, all group tensors have strong low-rank property, leading to an improvement of recovery performance. Experimental results demonstrate that the proposed NN-TRPCA outperforms some existing TRPCA methods in visual data recovery. The demo code is available at https://github.com/qguo2010/NN-TRPCA.

* 19 pages, 7 figures

Via

Access Paper or Ask Questions

Context-aware virtual adversarial training for anatomically-plausible segmentation

Jul 13, 2021

Ping Wang, Jizong Peng, Marco Pedersoli, Yuanfeng Zhou, Caiming Zhang, Christian Desrosiers

Figure 1 for Context-aware virtual adversarial training for anatomically-plausible segmentation

Figure 2 for Context-aware virtual adversarial training for anatomically-plausible segmentation

Figure 3 for Context-aware virtual adversarial training for anatomically-plausible segmentation

Figure 4 for Context-aware virtual adversarial training for anatomically-plausible segmentation

Abstract:Despite their outstanding accuracy, semi-supervised segmentation methods based on deep neural networks can still yield predictions that are considered anatomically impossible by clinicians, for instance, containing holes or disconnected regions. To solve this problem, we present a Context-aware Virtual Adversarial Training (CaVAT) method for generating anatomically plausible segmentation. Unlike approaches focusing solely on accuracy, our method also considers complex topological constraints like connectivity which cannot be easily modeled in a differentiable loss function. We use adversarial training to generate examples violating the constraints, so the network can learn to avoid making such incorrect predictions on new examples, and employ the Reinforce algorithm to handle non-differentiable segmentation constraints. The proposed method offers a generic and efficient way to add any constraint on top of any segmentation network. Experiments on two clinically-relevant datasets show our method to produce segmentations that are both accurate and anatomically-plausible in terms of region connectivity.

* This paper is accepted at MICCAI2021

Via

Access Paper or Ask Questions