Zhenyu Weng

Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision

Oct 09, 2025

Xiaoxu Ma, Runhao Li, Zhenyu Weng

Abstract:Deep hashing has been widely adopted for large-scale image retrieval, with numerous strategies proposed to optimize hash function learning. Pairwise-based methods are effective in learning hash functions that preserve local similarity relationships, whereas center-based methods typically achieve superior performance by more effectively capturing global data distributions. However, the strength of center-based methods in modeling global structures often comes at the expense of underutilizing important local similarity information. To address this limitation, we propose Mutual Learning for Hashing (MLH), a novel weak-to-strong framework that enhances a center-based hashing branch by transferring knowledge from a weaker pairwise-based branch. MLH consists of two branches: a strong center-based branch and a weaker pairwise-based branch. Through an iterative mutual learning process, the center-based branch leverages local similarity cues learned by the pairwise-based branch. Furthermore, inspired by the mixture-of-experts paradigm, we introduce a novel mixture-of-hash-experts module that enables effective cross-branch interaction, further enhancing the performance of both branches. Extensive experiments demonstrate that MLH consistently outperforms state-of-the-art hashing methods across multiple benchmark datasets.
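
The two families of objectives that MLH combines can be illustrated with a minimal NumPy sketch. It is not the MLH implementation. It assumes relaxed codes `h` in (-1, 1), one {-1, +1} hash center per class, a binary cross-entropy center loss, and a DPSH-style pairwise likelihood loss. The function names and the 0.5 mixing weight are hypothetical, and the mutual-learning and mixture-of-hash-experts components are omitted.

```python
import numpy as np


def center_loss(h, centers, labels):
    """Center-based objective: binary cross-entropy between relaxed codes
    h in (-1, 1) and the {-1, +1} hash center of each sample's class."""
    target = (centers[labels] + 1) / 2          # {-1, +1} -> {0, 1}
    p = np.clip((h + 1) / 2, 1e-7, 1 - 1e-7)    # (-1, 1) -> (0, 1)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))


def pairwise_loss(h, labels):
    """Pairwise objective (DPSH-style likelihood): raise code inner products
    for same-class pairs and lower them for different-class pairs."""
    s = (labels[:, None] == labels[None, :]).astype(float)
    theta = 0.5 * h @ h.T
    # log(1 + exp(theta)) - s * theta, computed stably with logaddexp
    return float(np.mean(np.logaddexp(0.0, theta) - s * theta))


def hamming_rank(query_code, db_codes):
    """Rank database codes in {-1, +1} by Hamming distance to the query."""
    k = db_codes.shape[1]
    dist = (k - db_codes @ query_code) / 2      # Hamming = (K - <q, b>) / 2
    return np.argsort(dist, kind="stable")


centers = np.array([[1, 1, -1, -1], [-1, -1, 1, 1]], dtype=float)
labels = np.array([0, 1, 0])
h = 0.99 * centers[labels]                      # codes close to their centers
# hypothetical weighted sum of the two objectives (0.5 is an arbitrary weight)
total = center_loss(h, centers, labels) + 0.5 * pairwise_loss(h, labels)
print(hamming_rank(np.sign(h[1]), np.sign(h)))  # [1 0 2]
```

The center loss only sees each sample and its class center (global structure), while the pairwise loss only sees relations between samples in the batch (local structure), which is the gap the abstract describes.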

AFT: An Exemplar-Free Class Incremental Learning Method for Environmental Sound Classification

Sep 19, 2025

Xinyi Chen, Xi Chen, Zhenyu Weng, Yang Xiao

Abstract:As sounds carry rich information, environmental sound classification (ESC) is crucial for numerous applications such as rare wild animal detection. However, our world constantly changes, requiring ESC models to adapt to new sounds periodically. The major challenge here is catastrophic forgetting, where models lose the ability to recognize old sounds when learning new ones. Many methods address this using replay-based continual learning, which can be impractical in scenarios with data privacy concerns. Exemplar-free methods are commonly used but can distort old features, leading to worse performance. To overcome such limitations, we propose an Acoustic Feature Transformation (AFT) technique that aligns the temporal features of old classes to the new space, including a selectively compressed feature space. AFT mitigates the forgetting of old knowledge without retaining past data. We conducted experiments on two datasets, showing consistent improvements over baseline models with accuracy gains of 3.7% to 3.9%.

* Submitted to ICASSP 2026
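
One generic way to realize the idea of aligning old-class features to the new space without stored exemplars is to fit a linear map from the previous model's features to the current model's features on the new task's data, then apply it to saved old-class prototypes. The sketch below uses ordinary least squares for that map. This is a swapped-in, simplified technique, not AFT itself: it ignores the temporal structure and selective compression that AFT uses, the data are synthetic, and the names `fit_alignment` and `old_prototypes` are hypothetical.

```python
import numpy as np


def fit_alignment(old_feats, new_feats):
    """Least-squares map W minimizing ||old_feats @ W - new_feats||_F.

    Both arrays hold features of the *same* current-task samples, extracted
    by the old and the new model respectively, so no past data is needed."""
    W, *_ = np.linalg.lstsq(old_feats, new_feats, rcond=None)
    return W


rng = np.random.default_rng(0)
true_map = rng.normal(size=(8, 8))
old_feats = rng.normal(size=(100, 8))          # current-task samples, old model
new_feats = old_feats @ true_map               # same samples, new model (synthetic)
old_prototypes = rng.normal(size=(3, 8))       # stored class means, old space

W = fit_alignment(old_feats, new_feats)
aligned_prototypes = old_prototypes @ W        # old classes in the new space
```

Here the synthetic feature drift is exactly linear, so the map is recovered exactly; with real models the relation is at best approximately linear, which is one reason more structured transformations are proposed.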

Federated Learning for Medical Image Classification: A Comprehensive Benchmark

Apr 07, 2025

Zhekai Zhou, Guibo Luo, Mingzhi Chen, Zhenyu Weng, Yuesheng Zhu

Abstract:The federated learning paradigm is well-suited for the field of medical image analysis, as it can effectively cope with machine learning on isolated multicenter data while protecting the privacy of participating parties. However, current research on optimization algorithms in federated learning often focuses on limited datasets and scenarios, primarily centered around natural images, with insufficient comparative experiments in medical contexts. In this work, we conduct a comprehensive evaluation of several state-of-the-art federated learning algorithms in the context of medical imaging. We conduct a fair comparison of classification models trained using various federated learning algorithms across multiple medical imaging datasets. Additionally, we evaluate system performance metrics, such as communication cost and computational efficiency, while considering different federated learning architectures. Our findings show that medical imaging datasets pose substantial challenges for current federated learning optimization algorithms. No single algorithm consistently delivers optimal performance across all medical federated learning scenarios, and many optimization algorithms may underperform when applied to these datasets. Our experiments provide a benchmark and guidance for future research and application of federated learning in medical imaging contexts. Furthermore, we propose an efficient and robust method that combines generative techniques using denoising diffusion probabilistic models with label smoothing to augment datasets, broadly improving the performance of federated learning on classification tasks across various medical imaging datasets. Our code will be released on GitHub, offering a reliable and comprehensive benchmark for future federated learning studies in medical imaging.
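
For readers new to the setting, the basic aggregation step shared by many federated learning algorithms is FedAvg's data-size-weighted average of client parameters. The abstract does not list the specific algorithms benchmarked, so using FedAvg as the reference point here is an assumption. The sketch also shows uniform label smoothing, one ingredient of the proposed augmentation; the diffusion-based generation step is not sketched, and the function names are hypothetical.

```python
import numpy as np


def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighting each by its local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    return sum(w * p for w, p in zip(weights, client_params))


def smooth_labels(labels, num_classes, eps=0.1):
    """Uniform label smoothing: (1 - eps) * one_hot + eps / num_classes."""
    one_hot = np.eye(num_classes)[labels]
    return (1 - eps) * one_hot + eps / num_classes


global_params = fedavg([np.array([0.0, 0.0]), np.array([4.0, 8.0])], [3, 1])
print(global_params)   # [1. 2.]
targets = smooth_labels(np.array([0, 3]), num_classes=4)
```

Weighting by dataset size means a hospital with more local images pulls the global model further toward its own update, which is one source of the difficulty with non-IID multicenter medical data.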

FACT: Feature Adaptive Continual-learning Tracker for Multiple Object Tracking

Sep 12, 2024

Rongzihan Song, Zhenyu Weng, Huiping Zhuang, Jinchang Ren, Yongming Chen, Zhiping Lin

Abstract:Multiple object tracking (MOT) involves identifying multiple targets and assigning them corresponding IDs within a video sequence, where occlusions are often encountered. Recent methods address occlusions using appearance cues through online learning techniques to improve adaptivity or offline learning techniques to utilize temporal information from videos. However, most existing online learning-based MOT methods are unable to learn from all past tracking information to improve adaptivity on long-term occlusions while maintaining real-time tracking speed. On the other hand, temporal information-based offline learning methods maintain a long-term memory to store past tracking information, but this approach restricts them to using only local past information during tracking. To address these challenges, we propose a new MOT framework called the Feature Adaptive Continual-learning Tracker (FACT), which enables real-time tracking and feature learning for targets by utilizing all past tracking information. We demonstrate that the framework can be integrated with various state-of-the-art feature-based trackers, thereby improving their tracking ability. Specifically, we develop the feature adaptive continual-learning (FAC) module, a neural network that can be trained online to learn features adaptively using all past tracking information during tracking. Moreover, we also introduce a two-stage association module specifically designed for the proposed continual learning-based tracking. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art online tracking performance on MOT17 and MOT20 benchmarks. The code will be released upon acceptance.
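
A two-stage association of the general kind mentioned above can be pictured with a simplified sketch: tracks are first matched to detections by appearance similarity (for example, from learned features), and the leftovers are then matched by box overlap (IoU). The greedy matcher, the thresholds, and the function names are assumptions for illustration. FACT's actual association module may differ, and many trackers use Hungarian matching instead of greedy matching.

```python
import numpy as np


def greedy_match(score, thresh, rows, cols):
    """Repeatedly take the highest-scoring (track, detection) pair above thresh."""
    rows, cols, pairs = list(rows), list(cols), []
    while rows and cols:
        sub = score[np.ix_(rows, cols)]
        i, j = map(int, np.unravel_index(np.argmax(sub), sub.shape))
        if sub[i, j] < thresh:
            break
        pairs.append((rows.pop(i), cols.pop(j)))
    return pairs, rows, cols


def two_stage_associate(app_sim, iou, app_thresh=0.5, iou_thresh=0.3):
    """Stage 1: appearance similarity; stage 2: IoU on what is left unmatched.

    app_sim and iou are (num_tracks, num_detections) score matrices."""
    n_tracks, n_dets = app_sim.shape
    first, tracks_left, dets_left = greedy_match(
        app_sim, app_thresh, range(n_tracks), range(n_dets))
    second, tracks_left, dets_left = greedy_match(
        iou, iou_thresh, tracks_left, dets_left)
    return first + second, tracks_left, dets_left


app_sim = np.array([[0.9, 0.1], [0.2, 0.3]])   # track 1 looks ambiguous
iou = np.array([[0.0, 0.0], [0.0, 0.6]])       # but overlaps detection 1
print(two_stage_associate(app_sim, iou))       # ([(0, 0), (1, 1)], [], [])
```

The second stage recovers the match that appearance alone rejected, which is the usual motivation for cascading a motion or overlap cue after an appearance cue.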

Dual Branch Network Towards Accurate Printed Mathematical Expression Recognition

Dec 14, 2023

Yuqing Wang, Zhenyu Weng, Zhaokun Zhou, Shuaijian Ji, Zhongjie Ye, Yuesheng Zhu

Abstract:In recent years, Printed Mathematical Expression Recognition (PMER) has progressed rapidly. However, due to the insufficient context information captured by Convolutional Neural Networks, some mathematical symbols might be incorrectly recognized or missed. To tackle this problem, in this paper, a Dual Branch transformer-based Network (DBN) is proposed to learn both local and global context information for accurate PMER. In our DBN, local and global features are extracted simultaneously, and a Context Coupling Module (CCM) is developed to complement the features between the global and local contexts. CCM operates interactively so that the coupled context clues are highly correlated with each expression symbol. Additionally, we design a Dynamic Soft Target (DST) strategy to utilize the similarities among symbol categories for reasonable label generation. Our experimental results demonstrate that DBN accurately recognizes mathematical expressions and achieves state-of-the-art performance.

* Published at ICANN 2022
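
The Dynamic Soft Target idea, spreading part of the target probability onto symbol categories similar to the ground truth instead of using a hard one-hot label, can be sketched as below. The similarity source (cosine similarity of per-class embeddings), the mixing weight `eps`, and the temperature are assumptions. The abstract describes the similarities as dynamic, whereas this sketch takes a fixed embedding matrix as input; `soft_targets` is a hypothetical name.

```python
import numpy as np


def soft_targets(labels, class_emb, eps=0.1, temperature=0.5):
    """Mix a one-hot label with a similarity-weighted distribution over the
    *other* symbol classes. class_emb: (num_classes, dim), num_classes >= 2."""
    emb = class_emb / np.linalg.norm(class_emb, axis=1, keepdims=True)
    sim = emb @ emb.T
    np.fill_diagonal(sim, -np.inf)               # the true class gets no share of eps
    spread = np.exp((sim - sim.max(axis=1, keepdims=True)) / temperature)
    spread /= spread.sum(axis=1, keepdims=True)  # rows sum to 1
    one_hot = np.eye(len(class_emb))[labels]
    return (1 - eps) * one_hot + eps * spread[labels]


class_emb = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])  # classes 0 and 1 look alike
targets = soft_targets(np.array([0]), class_emb)
```

With these embeddings, most of the smoothing mass for class 0 goes to the visually similar class 1 rather than uniformly to all classes, which is the difference from plain label smoothing.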

POAR: Towards Open-World Pedestrian Attribute Recognition

Mar 26, 2023

Yue Zhang, Suchen Wang, Shichao Kan, Zhenyu Weng, Yigang Cen, Yap-Peng Tan

Abstract:Pedestrian attribute recognition (PAR) aims to predict the attributes of a target pedestrian in a surveillance system. Existing methods address the PAR problem by training a multi-label classifier with predefined attribute classes. However, it is impossible to exhaust all pedestrian attributes in the real world. To tackle this problem, we develop a novel pedestrian open-attribute recognition (POAR) framework. Our key idea is to formulate the POAR problem as an image-text search problem. We design a Transformer-based image encoder with a masking strategy. A set of attribute tokens is introduced to focus on specific pedestrian parts (e.g., head, upper body, lower body, and feet) and encode corresponding attributes into visual embeddings. Each attribute category is described as a natural language sentence and encoded by the text encoder. Then, we compute the similarity between the visual and text embeddings of attributes to find the best attribute descriptions for the input images. Different from existing methods that learn a specific classifier for each attribute category, we model the pedestrian at a part level and explore the searching method to handle unseen attributes. Finally, a many-to-many contrastive (MTMC) loss with masked tokens is proposed to train the network, since a pedestrian image can comprise multiple attributes. Extensive experiments have been conducted on benchmark PAR datasets with an open-attribute setting. The results verify the effectiveness of the proposed POAR method, which can form a strong baseline for the POAR task.
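
Treating attribute recognition as image-text search means the per-attribute classifier is replaced by similarity ranking against encoded attribute sentences, so an unseen attribute only needs a new sentence embedding. The sketch below shows that ranking step and a simple multi-positive contrastive loss in the spirit of MTMC. The small hand-written embeddings stand in for the Transformer image and text encoders, and the loss form (averaging the log-softmax over all positives) is an assumption, not the paper's exact MTMC definition; function names are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def best_attributes(part_emb, text_emb, top_k=1):
    """For each part token (P, D), return indices of the top_k attribute
    sentences (A, D) by cosine similarity."""
    sim = l2_normalize(part_emb) @ l2_normalize(text_emb).T   # (P, A)
    return np.argsort(-sim, axis=1, kind="stable")[:, :top_k]


def multi_positive_contrastive(sim, pos_mask, tau=0.07):
    """Contrastive loss where each row may have several positive columns:
    average the log-softmax over all positives, then over rows.
    Every row of pos_mask must contain at least one positive."""
    logits = sim / tau
    logits = logits - logits.max(axis=1, keepdims=True)      # stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_row = (pos_mask * log_prob).sum(axis=1) / pos_mask.sum(axis=1)
    return float(-per_row.mean())


parts = np.array([[1.0, 0.0], [0.0, 1.0]])                 # two part tokens
sentences = np.array([[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]]) # three attribute sentences
ranked = best_attributes(parts, sentences)
```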

ACIL: Analytic Class-Incremental Learning with Absolute Memorization and Privacy Protection

May 30, 2022

Huiping Zhuang, Zhenyu Weng, Renchunzi Xie, Kar-Ann Toh, Zhiping Lin

Abstract:Class-incremental learning (CIL) learns a classification model with training data of different classes arising progressively. Existing CIL either suffers from serious accuracy loss due to catastrophic forgetting, or invades data privacy by revisiting used exemplars. Inspired by linear learning formulations, we propose an analytic class-incremental learning (ACIL) with absolute memorization of past knowledge while avoiding breaches of data privacy (i.e., without storing historical data). The absolute memorization is demonstrated in the sense that class-incremental learning using ACIL given present data would give identical results to that from its joint-learning counterpart which consumes both present and historical samples. This equality is theoretically validated. Data privacy is ensured since no historical data are involved during the learning process. Empirical validations demonstrate ACIL's competitive accuracy, with near-identical results for various incremental task settings (e.g., 5-50 phases). This also allows ACIL to outperform the state-of-the-art methods for large-phase scenarios (e.g., 25 and 50 phases).
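
The "absolute memorization" equality rests on a property of regularized least squares: a ridge-regression classifier on fixed features can be updated phase by phase with a recursive (Woodbury) formula and still match the solution trained on all data jointly. The sketch below demonstrates that equivalence on random data. The frozen-feature assumption, the regularizer `gamma`, and the class name `RecursiveRidge` are illustrative and not taken from the ACIL code.

```python
import numpy as np


class RecursiveRidge:
    """Ridge regression W = (X^T X + gamma I)^{-1} X^T Y, updated one phase at a
    time without storing past samples (only the d x d matrix R is kept)."""

    def __init__(self, dim, gamma=1.0):
        self.R = np.eye(dim) / gamma          # (X^T X + gamma I)^{-1} so far
        self.W = np.zeros((dim, 0))           # no classes seen yet

    def update(self, X, Y):
        """X: (n, dim) features; Y: (n, classes_so_far) one-hot targets."""
        extra = Y.shape[1] - self.W.shape[1]
        if extra > 0:                         # zero columns for new classes
            self.W = np.hstack([self.W, np.zeros((self.W.shape[0], extra))])
        # Woodbury identity: update the inverse with the new phase's samples
        K = np.linalg.inv(np.eye(X.shape[0]) + X @ self.R @ X.T)
        self.R = self.R - self.R @ X.T @ K @ X @ self.R
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)
        return self.W


rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(20, 5)), rng.normal(size=(15, 5))
Y1 = np.eye(2)[rng.integers(0, 2, 20)]        # phase 1: classes 0-1
Y2 = np.eye(4)[rng.integers(2, 4, 15)]        # phase 2: classes 2-3

model = RecursiveRidge(dim=5)
model.update(X1, Y1)
W_incremental = model.update(X2, Y2)

# joint solution on all samples; phase-1 targets padded with zeros for classes 2-3
X = np.vstack([X1, X2])
Y = np.vstack([np.hstack([Y1, np.zeros((20, 2))]), Y2])
W_joint = np.linalg.solve(X.T @ X + np.eye(5), X.T @ Y)
print(np.allclose(W_incremental, W_joint))    # True
```

Zero-padding works because old samples carry target 0 for classes that did not yet exist, exactly as in the joint problem; this is why no historical data is needed to reach the joint solution.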

Region-aware Attention for Image Inpainting

Apr 03, 2022

Zhilin Huang, Chujun Qin, Zhenyu Weng, Yuesheng Zhu

Abstract:Recent attention-based image inpainting methods have made inspiring progress by modeling long-range dependencies within a single image. However, they tend to generate blurry content, since the correlation between each pixel pair is always misled by ill-predicted features in holes. To handle this problem, we propose a novel region-aware attention (RA) module. By not directly calculating the correlation between each pixel pair in a single sample, and instead considering the correlation between different samples, RA prevents invalid information in holes from misleading the attention. Meanwhile, a learnable region dictionary (LRD) is introduced to store important information in the entire dataset, which not only simplifies correlation modeling but also avoids information redundancy. By applying RA in our architecture, our method can generate semantically plausible results with realistic details. Extensive experiments on CelebA, Places2 and Paris StreetView datasets validate the superiority of our method compared with existing methods.
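
The learnable-region-dictionary idea can be pictured as replacing pixel-to-pixel attention with attention from each pixel to a small set of K dictionary entries shared across the dataset, which also reduces the cost from O(N^2) to O(NK) for N pixels. The sketch below shows only that attention step with single-head scaled dot-product weights. The scaling, the single-head form, and the function name are assumptions, and during training the dictionary would be a learned parameter rather than a fixed array.

```python
import numpy as np


def dictionary_attention(features, dictionary):
    """features: (N, C) pixel features of one image; dictionary: (K, C).
    Each pixel is reconstructed as a softmax-weighted mix of dictionary entries,
    so ill-predicted hole pixels never serve as keys for other pixels."""
    logits = features @ dictionary.T / np.sqrt(features.shape[1])   # (N, K)
    logits -= logits.max(axis=1, keepdims=True)                     # stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ dictionary, attn                                  # (N, C), (N, K)


dictionary = 10.0 * np.eye(3)                  # K = 3 entries, C = 3 channels
features = np.random.default_rng(0).normal(size=(6, 3))
out, attn = dictionary_attention(features, dictionary)
```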

Multi-Teacher Knowledge Distillation for Incremental Implicitly-Refined Classification

Feb 24, 2022

Longhui Yu, Zhenyu Weng, Yuqing Wang, Yuesheng Zhu

Abstract:Incremental learning methods can learn new classes continually by distilling knowledge from the last model (as a teacher model) to the current model (as a student model) in the sequential learning process. However, these methods cannot work for Incremental Implicitly-Refined Classification (IIRC), an incremental learning extension where the incoming classes could have two granularity levels, a superclass label and a subclass label. This is because the previously learned superclass knowledge may be overwritten by the subclass knowledge learned sequentially. To solve this problem, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) strategy. To preserve the subclass knowledge, we use the last model as a general teacher to distill the previous knowledge for the student model. To preserve the superclass knowledge, we use the initial model as a superclass teacher to distill the superclass knowledge, as the initial model contains abundant superclass knowledge. However, distilling knowledge from two teacher models could result in the student model making some redundant predictions. We further propose a post-processing mechanism, called Top-k prediction restriction, to reduce the redundant predictions. Our experimental results on IIRC-ImageNet120 and IIRC-CIFAR100 show that the proposed method achieves better classification accuracy than existing state-of-the-art methods.
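
The two-teacher objective and the Top-k post-processing can be sketched as follows. Assumptions: all logits share the same class ordering and width; distillation uses temperature-scaled KL divergence; the superclass teacher is distilled only on a hypothetical index set `super_idx` of superclass outputs; the two terms are mixed with a hypothetical weight `alpha`; and predictions are multi-label scores in [0, 1] that are thresholded after keeping the top k. This is an illustration, not the authors' code.

```python
import numpy as np


def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def kd_loss(student_logits, teacher_logits, T=2.0):
    """Temperature-scaled KL(teacher || student), averaged over the batch."""
    log_p_t = log_softmax(teacher_logits / T)
    log_p_s = log_softmax(student_logits / T)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=1)
    return float(T * T * kl.mean())


def mtkd_loss(student, general_teacher, super_teacher, super_idx, alpha=0.5):
    """General teacher (last model) on all outputs; superclass teacher
    (initial model) only on the superclass outputs."""
    general = kd_loss(student, general_teacher)
    superclass = kd_loss(student[:, super_idx], super_teacher[:, super_idx])
    return alpha * general + (1 - alpha) * superclass


def top_k_restrict(scores, k, threshold=0.5):
    """Keep a label only if it is among the k highest scores AND above threshold."""
    top = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    keep = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(keep, top, True, axis=1)
    return keep & (scores > threshold)


scores = np.array([[0.9, 0.8, 0.7, 0.1]])
print(top_k_restrict(scores, k=2))   # [[ True  True False False]]
```

With k = 2 (one superclass plus one subclass in IIRC), the third label at 0.7 is dropped even though it passes the threshold, which is the kind of redundant prediction the restriction targets.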

Structure-aware Image Inpainting with Two Parallel Streams

Nov 05, 2021

Zhilin Huang, Chujun Qin, Ruixin Liu, Zhenyu Weng, Yuesheng Zhu

Abstract:Recent works in image inpainting have shown that structural information plays an important role in recovering visually pleasing results. In this paper, we propose an end-to-end architecture composed of two parallel UNet-based streams: a main stream (MS) and a structure stream (SS). With the assistance of SS, MS can produce plausible results with reasonable structures and realistic details. Specifically, MS reconstructs detailed images by inferring missing structures and textures simultaneously, and SS restores only missing structures by processing the hierarchical information from the encoder of MS. By interacting with SS in the training process, MS can be implicitly encouraged to exploit structural cues. To help SS focus on structures and prevent textures in MS from being affected, a gated unit is proposed to suppress structure-irrelevant activations in the information flow between MS and SS. Furthermore, the multi-scale structure feature maps in SS are utilized to explicitly guide the structure-reasonable image reconstruction in the decoder of MS through the fusion block. Extensive experiments on CelebA, Paris StreetView and Places2 datasets demonstrate that our proposed method outperforms state-of-the-art methods.

* 9 pages, 8 figures, rejected by IJCAI 2021
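
A gated unit of the kind described, which scales each activation flowing from MS to SS by a learned sigmoid gate so that structure-irrelevant responses can be pushed toward zero, can be sketched as below. Modeling the gate as a 1x1 convolution (a channel-mixing matrix) applied to the MS features is an assumption; the paper's gate may take different inputs or use larger kernels, and `gated_unit` is a hypothetical name.

```python
import numpy as np


def gated_unit(feat, weight, bias):
    """feat: (C, H, W) features passed from MS to SS.
    weight: (C, C) and bias: (C,) act as a 1x1 convolution producing gate logits."""
    logits = np.einsum("oc,chw->ohw", weight, feat) + bias[:, None, None]
    gate = 1.0 / (1.0 + np.exp(-logits))      # values in (0, 1)
    return gate * feat                        # suppressed where gate is near 0


feat = np.random.default_rng(0).normal(size=(3, 4, 4))
closed = gated_unit(feat, np.zeros((3, 3)), np.full(3, -50.0))   # gate ~ 0
opened = gated_unit(feat, np.zeros((3, 3)), np.full(3, 50.0))    # gate ~ 1
```

Because the gate multiplies rather than replaces the features, it can only attenuate what MS sends to SS, which matches the stated goal of keeping textures in MS unaffected.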