Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haiyan Huang

Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective

Mar 24, 2025

Duowang Zhu, Xiaohu Huang, Haiyan Huang, Hao Zhou, Zhenfeng Shao

Abstract:In this paper, we present Change3D, a framework that reconceptualizes the change detection and captioning tasks through video modeling. Recent methods have achieved remarkable success by regarding each pair of bi-temporal images as separate frames. They employ a shared-weight image encoder to extract spatial features and then use a change extractor to capture differences between the two images. However, image feature encoding, being a task-agnostic process, cannot attend to changed regions effectively. Furthermore, different change extractors designed for various change detection and captioning tasks make it difficult to have a unified framework. To tackle these challenges, Change3D regards the bi-temporal images as comprising two frames akin to a tiny video. By integrating learnable perception frames between the bi-temporal images, a video encoder enables the perception frames to interact with the images directly and perceive their differences. Therefore, we can get rid of the intricate change extractors, providing a unified framework for different change detection and captioning tasks. We verify Change3D on multiple tasks, encompassing change detection (including binary change detection, semantic change detection, and building damage assessment) and change captioning, across eight standard benchmarks. Without bells and whistles, this simple yet effective framework can achieve superior performance with an ultra-light video model comprising only ~6%-13% of the parameters and ~8%-34% of the FLOPs compared to state-of-the-art methods. We hope that Change3D could be an alternative to 2D-based models and facilitate future research.

* conference paper, accepted by CVPR 2025

Via

Access Paper or Ask Questions

ChangeViT: Unleashing Plain Vision Transformers for Change Detection

Jun 18, 2024

Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, Qimin Cheng

Figure 1 for ChangeViT: Unleashing Plain Vision Transformers for Change Detection

Figure 2 for ChangeViT: Unleashing Plain Vision Transformers for Change Detection

Figure 3 for ChangeViT: Unleashing Plain Vision Transformers for Change Detection

Figure 4 for ChangeViT: Unleashing Plain Vision Transformers for Change Detection

Abstract:Change detection in remote sensing images is essential for tracking environmental changes on the Earth's surface. Despite the success of vision transformers (ViTs) as backbones in numerous computer vision applications, they remain underutilized in change detection, where convolutional neural networks (CNNs) continue to dominate due to their powerful feature extraction capabilities. In this paper, our study uncovers ViTs' unique advantage in discerning large-scale changes, a capability where CNNs fall short. Capitalizing on this insight, we introduce ChangeViT, a framework that adopts a plain ViT backbone to enhance the performance of large-scale changes. This framework is supplemented by a detail-capture module that generates detailed spatial features and a feature injector that efficiently integrates fine-grained spatial information into high-level semantic learning. The feature integration ensures that ChangeViT excels in both detecting large-scale changes and capturing fine-grained details, providing comprehensive change detection across diverse scales. Without bells and whistles, ChangeViT achieves state-of-the-art performance on three popular high-resolution datasets (i.e., LEVIR-CD, WHU-CD, and CLCD) and one low-resolution dataset (i.e., OSCD), which underscores the unleashed potential of plain ViTs for change detection. Furthermore, thorough quantitative and qualitative analyses validate the efficacy of the introduced modules, solidifying the effectiveness of our approach. The source code is available at https://github.com/zhuduowang/ChangeViT.

Via

Access Paper or Ask Questions

Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate

May 16, 2022

Yuting Ye, Christine Ho, Ci-Ren Jiang, Wayne Tai Lee, Haiyan Huang

Figure 1 for Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate

Figure 2 for Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate

Figure 3 for Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate

Figure 4 for Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate

Abstract:Hierarchical multi-label classification (HMC) has drawn increasing attention in the past few decades. It is applicable when hierarchical relationships among classes are available and need to be incorporated along with the multi-label classification whereby each object is assigned to one or more classes. There are two key challenges in HMC: i) optimizing the classification accuracy, and meanwhile ii) ensuring the given class hierarchy. To address these challenges, in this article, we introduce a new statistic called the multidimensional local precision rate (mLPR) for each object in each class. We show that classification decisions made by simply sorting objects across classes in descending order of their true mLPRs can, in theory, ensure the class hierarchy and lead to the maximization of CATCH, an objective function we introduce that is related to the area under a hit curve. This approach is the first of its kind that handles both challenges in one objective function without additional constraints, thanks to the desirable statistical properties of CATCH and mLPR. In practice, however, true mLPRs are not available. In response, we introduce HierRank, a new algorithm that maximizes an empirical version of CATCH using estimated mLPRs while respecting the hierarchy. The performance of this approach was evaluated on a synthetic data set and two real data sets; ours was found to be superior to several comparison methods on evaluation criteria based on metrics such as precision, recall, and $F_1$ score.

* 34 pages, 11 figures, 9 tables

Via

Access Paper or Ask Questions

HierLPR: Decision making in hierarchical multi-label classification with local precision rates

Oct 18, 2018

Christine Ho, Yuting Ye, Ci-Ren Jiang, Wayne Tai Lee, Haiyan Huang

Figure 1 for HierLPR: Decision making in hierarchical multi-label classification with local precision rates

Figure 2 for HierLPR: Decision making in hierarchical multi-label classification with local precision rates

Figure 3 for HierLPR: Decision making in hierarchical multi-label classification with local precision rates

Figure 4 for HierLPR: Decision making in hierarchical multi-label classification with local precision rates

Abstract:In this article we propose a novel ranking algorithm, referred to as HierLPR, for the multi-label classification problem when the candidate labels follow a known hierarchical structure. HierLPR is motivated by a new metric called eAUC that we design to assess the ranking of classification decisions. This metric, associated with the hit curve and local precision rate, emphasizes the accuracy of the first calls. We show that HierLPR optimizes eAUC under the tree constraint and some light assumptions on the dependency between the nodes in the hierarchy. We also provide a strategy to make calls for each node based on the ordering produced by HierLPR, with the intent of controlling FDR or maximizing F-score. The performance of our proposed methods is demonstrated on synthetic datasets as well as a real example of disease diagnosis using NCBI GEO datasets. In these cases, HierLPR shows a favorable result over competing methods in the early part of the precision-recall curve.

* 27 pages, 9 figures

Via

Access Paper or Ask Questions