Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yiqun Wang

Vision Transformer-Based Time-Series Image Reconstruction for Cloud-Filling Applications

Jun 24, 2025

Lujun Li, Yiqun Wang, Radu State

Abstract:Cloud cover in multispectral imagery (MSI) poses significant challenges for early season crop mapping, as it leads to missing or corrupted spectral information. Synthetic aperture radar (SAR) data, which is not affected by cloud interference, offers a complementary solution, but lack sufficient spectral detail for precise crop mapping. To address this, we propose a novel framework, Time-series MSI Image Reconstruction using Vision Transformer (ViT), to reconstruct MSI data in cloud-covered regions by leveraging the temporal coherence of MSI and the complementary information from SAR from the attention mechanism. Comprehensive experiments, using rigorous reconstruction evaluation metrics, demonstrate that Time-series ViT framework significantly outperforms baselines that use non-time-series MSI and SAR or time-series MSI without SAR, effectively enhancing MSI image reconstruction in cloud-covered regions.

* This paper has been accepted as a conference paper at the 2025 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

Via

Access Paper or Ask Questions

MT-PCR: A Hybrid Mamba-Transformer with Spatial Serialization for Hierarchical Point Cloud Registration

Jun 16, 2025

Bingxi Liu, An Liu, Hao Chen, Jinqiang Cui, Yiqun Wang, Hong Zhang

Abstract:Point cloud registration (PCR) is a fundamental task in 3D computer vision and robotics. Most existing learning-based PCR methods rely on Transformers, which suffer from quadratic computational complexity. This limitation restricts the resolution of point clouds that can be processed, inevitably leading to information loss. In contrast, Mamba-a recently proposed model based on state space models (SSMs)-achieves linear computational complexity while maintaining strong long-range contextual modeling capabilities. However, directly applying Mamba to PCR tasks yields suboptimal performance due to the unordered and irregular nature of point cloud data. To address this challenge, we propose MT-PCR, the first point cloud registration framework that integrates both Mamba and Transformer modules. Specifically, we serialize point cloud features using Z-order space-filling curves to enforce spatial locality, enabling Mamba to better model the geometric structure of the input. Additionally, we remove the order indicator module commonly used in Mamba-based sequence modeling, leads to improved performance in our setting. The serialized features are then processed by an optimized Mamba encoder, followed by a Transformer refinement stage. Extensive experiments on multiple benchmarks demonstrate that MT-PCR outperforms Transformer-based and concurrent state-of-the-art methods in both accuracy and efficiency, significantly reducing while GPU memory usage and FLOPs.

* 11 Pages

Via

Access Paper or Ask Questions

Empowering Vector Graphics with Consistently Arbitrary Viewing and View-dependent Visibility

May 27, 2025

Yidi Li, Jun Xiao, Zhengda Lu, Yiqun Wang, Haiyong Jiang

Abstract:This work presents a novel text-to-vector graphics generation approach, Dream3DVG, allowing for arbitrary viewpoint viewing, progressive detail optimization, and view-dependent occlusion awareness. Our approach is a dual-branch optimization framework, consisting of an auxiliary 3D Gaussian Splatting optimization branch and a 3D vector graphics optimization branch. The introduced 3DGS branch can bridge the domain gaps between text prompts and vector graphics with more consistent guidance. Moreover, 3DGS allows for progressive detail control by scheduling classifier-free guidance, facilitating guiding vector graphics with coarse shapes at the initial stages and finer details at later stages. We also improve the view-dependent occlusions by devising a visibility-awareness rendering module. Extensive results on 3D sketches and 3D iconographies, demonstrate the superiority of the method on different abstraction levels of details, cross-view consistency, and occlusion-aware stroke culling.

* CVPR 2025

Via

Access Paper or Ask Questions

3D Surface Reconstruction with Enhanced High-Frequency Details

May 06, 2025

Shikun Zhang, Yiqun Wang, Cunjian Chen, Yong Li, Qiuhong Ke

Abstract:Neural implicit 3D reconstruction can reproduce shapes without 3D supervision, and it learns the 3D scene through volume rendering methods and neural implicit representations. Current neural surface reconstruction methods tend to randomly sample the entire image, making it difficult to learn high-frequency details on the surface, and thus the reconstruction results tend to be too smooth. We designed a method (FreNeuS) based on high-frequency information to solve the problem of insufficient surface detail. Specifically, FreNeuS uses pixel gradient changes to easily acquire high-frequency regions in an image and uses the obtained high-frequency information to guide surface detail reconstruction. High-frequency information is first used to guide the dynamic sampling of rays, applying different sampling strategies according to variations in high-frequency regions. To further enhance the focus on surface details, we have designed a high-frequency weighting method that constrains the representation of high-frequency details during the reconstruction process. Qualitative and quantitative results show that our method can reconstruct fine surface details and obtain better surface reconstruction quality compared to existing methods. In addition, our method is more applicable and can be generalized to any NeuS-based work.

* Accepted by Journal of Visual Communication and Image Representation

Via

Access Paper or Ask Questions

MoMa: A Modular Deep Learning Framework for Material Property Prediction

Feb 21, 2025

Botian Wang, Yawen Ouyang, Yaohui Li, Yiqun Wang, Haorui Cui, Jianbing Zhang, Xiaonan Wang, Wei-Ying Ma, Hao Zhou

Abstract:Deep learning methods for material property prediction have been widely explored to advance materials discovery. However, the prevailing pre-train then fine-tune paradigm often fails to address the inherent diversity and disparity of material tasks. To overcome these challenges, we introduce MoMa, a Modular framework for Materials that first trains specialized modules across a wide range of tasks and then adaptively composes synergistic modules tailored to each downstream scenario. Evaluation across 17 datasets demonstrates the superiority of MoMa, with a substantial 14% average improvement over the strongest baseline. Few-shot and continual learning experiments further highlight MoMa's potential for real-world applications. Pioneering a new paradigm of modular material learning, MoMa will be open-sourced to foster broader community collaboration.

Via

Access Paper or Ask Questions

DehazeGS: Seeing Through Fog with 3D Gaussian Splatting

Jan 07, 2025

Jinze Yu, Yiqun Wang, Zhengda Lu, Jianwei Guo, Yong Li, Hongxing Qin, Xiaopeng Zhang

Abstract:Current novel view synthesis tasks primarily rely on high-quality and clear images. However, in foggy scenes, scattering and attenuation can significantly degrade the reconstruction and rendering quality. Although NeRF-based dehazing reconstruction algorithms have been developed, their use of deep fully connected neural networks and per-ray sampling strategies leads to high computational costs. Moreover, NeRF's implicit representation struggles to recover fine details from hazy scenes. In contrast, recent advancements in 3D Gaussian Splatting achieve high-quality 3D scene reconstruction by explicitly modeling point clouds into 3D Gaussians. In this paper, we propose leveraging the explicit Gaussian representation to explain the foggy image formation process through a physically accurate forward rendering process. We introduce DehazeGS, a method capable of decomposing and rendering a fog-free background from participating media using only muti-view foggy images as input. We model the transmission within each Gaussian distribution to simulate the formation of fog. During this process, we jointly learn the atmospheric light and scattering coefficient while optimizing the Gaussian representation of the hazy scene. In the inference stage, we eliminate the effects of scattering and attenuation on the Gaussians and directly project them onto a 2D plane to obtain a clear view. Experiments on both synthetic and real-world foggy datasets demonstrate that DehazeGS achieves state-of-the-art performance in terms of both rendering quality and computational efficiency.

* 9 pages,4 figures

Via

Access Paper or Ask Questions

Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

Jul 08, 2024

Lintao Zhang, Xiangcheng Du, LeoWu TomyEnrique, Yiqun Wang, Yingbin Zheng, Cheng Jin

Figure 1 for Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

Figure 2 for Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

Figure 3 for Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

Figure 4 for Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

Abstract:For image inpainting, the existing Denoising Diffusion Probabilistic Model (DDPM) based method i.e. RePaint can produce high-quality images for any inpainting form. It utilizes a pre-trained DDPM as a prior and generates inpainting results by conditioning on the reverse diffusion process, namely denoising process. However, this process is significantly time-consuming. In this paper, we propose an efficient DDPM-based image inpainting method which includes three speed-up strategies. First, we utilize a pre-trained Light-Weight Diffusion Model (LWDM) to reduce the number of parameters. Second, we introduce a skip-step sampling scheme of Denoising Diffusion Implicit Models (DDIM) for the denoising process. Finally, we propose Coarse-to-Fine Sampling (CFS), which speeds up inference by reducing image resolution in the coarse stage and decreasing denoising timesteps in the refinement stage. We conduct extensive experiments on both faces and general-purpose image inpainting tasks, and our method achieves competitive performance with approximately 60 times speedup.

* The code is avaliable at: https://github.com/linghuyuhangyuan/M2S

Via

Access Paper or Ask Questions

Fine-Grained Scene Image Classification with Modality-Agnostic Adapter

Jul 03, 2024

Yiqun Wang, Zhao Zhou, Xiangcheng Du, Xingjiao Wu, Yingbin Zheng, Cheng Jin

Figure 1 for Fine-Grained Scene Image Classification with Modality-Agnostic Adapter

Figure 2 for Fine-Grained Scene Image Classification with Modality-Agnostic Adapter

Figure 3 for Fine-Grained Scene Image Classification with Modality-Agnostic Adapter

Figure 4 for Fine-Grained Scene Image Classification with Modality-Agnostic Adapter

Abstract:When dealing with the task of fine-grained scene image classification, most previous works lay much emphasis on global visual features when doing multi-modal feature fusion. In other words, models are deliberately designed based on prior intuitions about the importance of different modalities. In this paper, we present a new multi-modal feature fusion approach named MAA (Modality-Agnostic Adapter), trying to make the model learn the importance of different modalities in different cases adaptively, without giving a prior setting in the model architecture. More specifically, we eliminate the modal differences in distribution and then use a modality-agnostic Transformer encoder for a semantic-level feature fusion. Our experiments demonstrate that MAA achieves state-of-the-art results on benchmarks by applying the same modalities with previous methods. Besides, it is worth mentioning that new modalities can be easily added when using MAA and further boost the performance. Code is available at https://github.com/quniLcs/MAA.

Via

Access Paper or Ask Questions

Graphical Reasoning: LLM-based Semi-Open Relation Extraction

Apr 30, 2024

Yicheng Tao, Yiqun Wang, Longju Bai

Abstract:This paper presents a comprehensive exploration of relation extraction utilizing advanced language models, specifically Chain of Thought (CoT) and Graphical Reasoning (GRE) techniques. We demonstrate how leveraging in-context learning with GPT-3.5 can significantly enhance the extraction process, particularly through detailed example-based reasoning. Additionally, we introduce a novel graphical reasoning approach that dissects relation extraction into sequential sub-tasks, improving precision and adaptability in processing complex relational data. Our experiments, conducted on multiple datasets, including manually annotated data, show considerable improvements in performance metrics, underscoring the effectiveness of our methodologies.

Via

Access Paper or Ask Questions

Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

Mar 21, 2024

Yan Wang, Lihao Wang, Yuning Shen, Yiqun Wang, Huizhuo Yuan, Yue Wu, Quanquan Gu

Figure 1 for Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

Figure 2 for Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

Figure 3 for Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

Figure 4 for Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

Abstract:The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare event sampling and long equilibration time problems, hindering their applications in general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing large deviations in the sampled protein conformations from the equilibrium distribution. In this paper, to overcome these limitations, we propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation. By incorporating a force-guided network with a mixture of data-based score models, ConfDiff can can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses the state-of-the-art method.

Via

Access Paper or Ask Questions