Hongyu Zhu

RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering

Feb 19, 2025

Sichu Liang, Linhai Zhang, Hongyu Zhu, Wenwen Wang, Yulan He, Deyu Zhou

Abstract: Medical question answering requires extensive access to specialized conceptual knowledge. The current paradigm, Retrieval-Augmented Generation (RAG), acquires expert medical knowledge through large-scale corpus retrieval and uses this knowledge to guide a general-purpose large language model (LLM) in generating answers. However, existing retrieval approaches often overlook the importance of factual knowledge, which limits the relevance of retrieved conceptual knowledge and restricts its applicability in real-world scenarios, such as clinical decision-making based on Electronic Health Records (EHRs). This paper introduces RGAR, a recurrence generation-augmented retrieval framework that retrieves both relevant factual and conceptual knowledge from dual sources (i.e., EHRs and the corpus), allowing them to interact and refine each other. Through extensive evaluation across three factual-aware medical question answering benchmarks, RGAR establishes a new state-of-the-art performance among medical RAG systems. Notably, the Llama-3.1-8B-Instruct model with RGAR surpasses the considerably larger, RAG-enhanced GPT-3.5. Our findings demonstrate the benefit of extracting factual knowledge for retrieval, which consistently yields improved generation quality.

Efficient and Effective Model Extraction

Sep 24, 2024

Hongyu Zhu, Wentao Hu, Sichu Liang, Fangqi Li, Wenwen Wang, Shilin Wang

Abstract: Model extraction aims to create a functionally similar copy from a machine learning as a service (MLaaS) API with minimal overhead, typically for illicit profit or as a precursor to further attacks, posing a significant threat to the MLaaS ecosystem. However, recent studies have shown that model extraction is highly inefficient, particularly when the target task distribution is unavailable. In such cases, even substantially increasing the attack budget fails to produce a sufficiently similar replica, reducing the adversary's motivation to pursue extraction attacks. In this paper, we revisit the elementary design choices throughout the extraction lifecycle. We propose an embarrassingly simple yet dramatically effective algorithm, Efficient and Effective Model Extraction (E3), focusing on both query preparation and training routine. E3 achieves superior generalization compared to state-of-the-art methods while minimizing computational costs. For instance, with only 0.005 times the query budget and less than 0.2 times the runtime, E3 outperforms classical generative-model-based data-free model extraction by an absolute accuracy improvement of over 50% on CIFAR-10. Our findings underscore the persistent threat posed by model extraction and suggest that it could serve as a valuable benchmarking algorithm for future security evaluations.

EM-DARTS: Hierarchical Differentiable Architecture Search for Eye Movement Recognition

Sep 22, 2024

Huafeng Qin, Hongyu Zhu, Xin Jin, Xin Yu, Mounim A. El-Yacoubi, Xinbo Gao

Abstract: Eye movement biometrics has received increasing attention thanks to its highly secure identification. Although deep learning (DL) models have recently been applied successfully to eye movement recognition, the DL architecture is still determined by human prior knowledge. Differentiable Neural Architecture Search (DARTS) automates the manual process of architecture design with high search efficiency. DARTS, however, usually stacks multiple copies of the same learned cell to form the final neural network for evaluation, thereby limiting the diversity of the network. In addition, DARTS usually searches the architecture in a shallow network while evaluating it in a deeper one, which results in a large gap between the architecture depths in the search and evaluation scenarios. To address these issues, we propose EM-DARTS, a hierarchical differentiable architecture search algorithm that automatically designs the DL architecture for eye movement recognition. First, we define a supernet and propose a global and local alternate Neural Architecture Search method that searches for the optimal architecture alternately with a differentiable neural architecture search. The local search strategy aims to find an optimal architecture for different cells, while the global search strategy is responsible for optimizing the architecture of the target network. To further reduce redundancy, a transfer entropy is proposed to compute the information amount of each layer, so as to further simplify the search network. Our experiments on three public databases demonstrate that the proposed EM-DARTS is capable of producing an optimal architecture that leads to state-of-the-art recognition performance.

* Submitted to IEEE Transactions on Information Forensics and Security

Relax DARTS: Relaxing the Constraints of Differentiable Architecture Search for Eye Movement Recognition

Sep 18, 2024

Hongyu Zhu, Xin Jin, Hongchao Liao, Yan Xiang, Mounim A. El-Yacoubi, Huafeng Qin

Abstract: Eye movement biometrics is a secure and innovative identification method. Deep learning methods have shown good performance, but their network architectures rely on manual design combined with prior knowledge. To address these issues, we introduce automated network search (NAS) algorithms to the field of eye movement recognition and present Relax DARTS, an improvement on Differentiable Architecture Search (DARTS) that realizes more efficient network search and training. The key idea is to circumvent the issue of weight sharing by independently training the architecture parameters $\alpha$ to achieve a more precise target architecture. Moreover, the introduction of module input weights $\beta$ gives cells the flexibility to select inputs, which alleviates overfitting and improves model performance. Results on four public databases demonstrate that Relax DARTS achieves state-of-the-art recognition performance. Notably, Relax DARTS exhibits adaptability to other multi-feature temporal classification tasks.

* Accepted by CCBR 2024
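
The roles of $\alpha$ and $\beta$ in the abstract above can be illustrated with the continuous relaxation used by standard DARTS, where each edge outputs a softmax-weighted sum of candidate operations. The sketch below is not the Relax DARTS implementation: the function names `mixed_op` and `cell_input` are hypothetical, and weighting a cell's inputs by a softmax over $\beta$ is an assumption made only for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, ops, alpha):
    """DARTS-style relaxed edge: weight every candidate op by softmax(alpha)."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, ops))


def cell_input(inputs, beta):
    """Assumed form of input selection: softmax(beta) weights over cell inputs."""
    weights = softmax(beta)
    return sum(w * v for w, v in zip(weights, inputs))


# Two toy candidate operations on a scalar feature (hypothetical stand-ins
# for real layers such as convolutions or pooling).
candidate_ops = [lambda v: v, lambda v: 2.0 * v]
edge_output = mixed_op(1.0, candidate_ops, alpha=[0.0, 0.0])
selected_input = cell_input([1.0, 3.0], beta=[0.0, 0.0])
```

With equal architecture parameters every candidate contributes equally; as one entry of `alpha` grows, the edge output approaches that single operation, which is how a discrete architecture is read off after the search.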

A Survey on Mixup Augmentations and Beyond

Sep 08, 2024

Xin Jin, Hongyu Zhu, Siyuan Li, Zedong Wang, Zicheng Liu, Chang Yu, Huafeng Qin, Stan Z. Li

Abstract: As Deep Neural Networks have achieved thrilling breakthroughs in the past decade, data augmentations have garnered increasing attention as regularization techniques when massive labeled data are unavailable. Among existing augmentations, Mixup and related data-mixing methods that convexly combine selected samples and the corresponding labels are widely adopted because they yield high performance by generating data-dependent virtual data while migrating easily to various domains. This survey presents a comprehensive review of foundational mixup methods and their applications. We first elaborate on the training pipeline with mixup augmentations as a unified framework containing modules. This reformulated framework can accommodate various mixup methods and gives intuitive operational procedures. Then, we systematically investigate the applications of mixup augmentations on vision downstream tasks and various data modalities, as well as analyses and theorems of mixup. We also summarize the current status and limitations of mixup research and point out further work for effective and efficient mixup augmentations. This survey can provide researchers with the current state of the art in mixup methods, along with insights and guidance for the mixup arena. An online project with this survey is available at https://github.com/Westlake-AI/Awesome-Mixup.

* Preprint V1 with 27 pages main text. Online project at https://github.com/Westlake-AI/Awesome-Mixup
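
The convex combination of samples and labels that the abstract above refers to can be sketched in a few lines. This is a minimal illustration of basic mixup on plain Python lists, not code from the survey or its repository; the function names and the default `alpha=1.0` are assumptions, and drawing the ratio from a Beta(alpha, alpha) distribution is the commonly used choice rather than something stated in the abstract.

```python
import random


def mixup(x_a, y_a, x_b, y_b, lam):
    """Convexly combine two samples and their one-hot labels with weight lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing ratio from Beta(alpha, alpha); alpha is a hyperparameter."""
    return rng.betavariate(alpha, alpha)


# Mix a two-feature sample of class 0 with a sample of class 1.
features, label = mixup([0.0, 2.0], [1.0, 0.0], [4.0, 6.0], [0.0, 1.0], lam=0.25)
```

The mixed label is a soft target, so training uses a loss that accepts probability vectors (for example, cross-entropy against the mixed label).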

SUMix: Mixup with Semantic and Uncertain Information

Jul 10, 2024

Huafeng Qin, Xin Jin, Hongyu Zhu, Hongchao Liao, Mounîm A. El-Yacoubi, Xinbo Gao

Abstract: Mixup data augmentation approaches have been applied to various deep learning tasks to improve the generalization ability of deep neural networks. Some existing approaches, such as CutMix and SaliencyMix, randomly replace a patch in one image with patches from another to generate the mixed image. Similarly, the corresponding labels are linearly combined by a fixed ratio $\lambda$. The objects in the two images may overlap during the mixing process, so some semantic information is corrupted in the mixed samples. In this case, the mixed image does not match the mixed label information. Moreover, such a label may mislead the training of the deep learning model, which results in poor performance. To solve this problem, we propose a novel approach named SUMix to learn the mixing ratio as well as the uncertainty for the mixed samples during the training process. First, we design a learnable similarity function to compute an accurate mix ratio. Second, an approach is investigated as a regularized term to model the uncertainty of the mixed samples. We conduct experiments on five image benchmarks, and extensive experimental results imply that our method is capable of improving the performance of classifiers with different cutting-based mixup approaches. The source code is available at https://github.com/JinXins/SUMix.

* Accepted by ECCV2024 [Camera Ready] (16 pages, 5 figures) with the source code at https://github.com/JinXins/SUMix
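
The fixed-ratio label mixing that the abstract above criticizes can be made concrete with a CutMix-style sketch. This is not the SUMix code: the function name `cutmix`, the nested-list image representation, and the explicit patch coordinates are assumptions for illustration, and the patch position is assumed to be non-negative. The label ratio depends only on the patch area, regardless of whether the pasted patch contains the object.

```python
def cutmix(img_a, img_b, y_a, y_b, top, left, height, width):
    """Paste a rectangular patch of img_b into img_a; mix labels by area."""
    rows, cols = len(img_a), len(img_a[0])
    # Clip the patch to the image bounds.
    bottom = min(top + height, rows)
    right = min(left + width, cols)
    mixed = [row[:] for row in img_a]
    for r in range(top, bottom):
        for c in range(left, right):
            mixed[r][c] = img_b[r][c]
    patch_area = max(bottom - top, 0) * max(right - left, 0)
    # lam is the fraction of pixels that still come from img_a.
    lam = 1.0 - patch_area / (rows * cols)
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed, y_mix, lam


# 4x4 single-channel toy images: all zeros (class 0) and all ones (class 1).
zeros = [[0] * 4 for _ in range(4)]
ones = [[1] * 4 for _ in range(4)]
mixed, y_mix, lam = cutmix(zeros, ones, [1.0, 0.0], [0.0, 1.0], 0, 0, 2, 2)
```

If the 2x2 patch taken from the second image happened to cover only background, the mixed label would still assign a quarter of its mass to class 1, which is the image-label mismatch that a learned mixing ratio is meant to correct.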

StarLKNet: Star Mixup with Large Kernel Networks for Palm Vein Identification

May 21, 2024

Xin Jin, Hongyu Zhu, Mounîm A. El Yacoubi, Hongchao Liao, Huafeng Qin, Yun Jiang

Abstract: As a representative of a new generation of biometrics, vein identification technology offers a high level of security and convenience. Convolutional neural networks (CNNs), a prominent class of deep learning architectures, have been extensively utilized for vein identification. However, because their performance and robustness are limited by small Effective Receptive Fields (e.g., 3$\times$3 kernels) and insufficient training samples, they are unable to extract global feature representations from vein images effectively. To address these issues, we propose StarLKNet, a large kernel convolution-based palm-vein identification network, with the Mixup approach. Our StarMix effectively learns the distribution of vein features to expand samples. To enable CNNs to capture comprehensive feature representations from palm-vein images, we explored the effect of convolutional kernel size on the performance of palm-vein identification networks and designed LaKNet, a network leveraging large kernel convolution and a gating mechanism. To the best of our knowledge, this is the first deployment of a CNN with large kernels in the domain of vein identification. Extensive experiments were conducted to validate the performance of StarLKNet on two public palm-vein datasets. The results demonstrated that StarMix provided superior augmentation, and LaKNet exhibited more stable performance gains compared to mainstream approaches, resulting in the highest recognition accuracy and lowest identification error.

* 7 pages, 6 figures

Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion

Apr 21, 2024

Hongyu Zhu, Sichu Liang, Wentao Hu, Fangqi Li, Ju Jia, Shilin Wang

Abstract: With the rise of Machine Learning as a Service (MLaaS) platforms, safeguarding the intellectual property of deep learning models is becoming paramount. Among various protective measures, trigger set watermarking has emerged as a flexible and effective strategy for preventing unauthorized model distribution. However, this paper identifies an inherent flaw in the current paradigm of trigger set watermarking: evasion adversaries can readily exploit the shortcuts created by models memorizing watermark samples that deviate from the main task distribution, significantly impairing their generalization in adversarial settings. To counteract this, we leverage diffusion models to synthesize unrestricted adversarial examples as trigger sets. By training the model to accurately recognize them, unique watermark behaviors are promoted through knowledge injection rather than error memorization, thus avoiding exploitable shortcuts. Furthermore, we uncover that the resistance of current trigger set watermarking against removal attacks primarily relies on significantly damaging the decision boundaries during embedding, intertwining unremovability with adverse impacts. By optimizing the knowledge transfer properties of protected models, our approach conveys watermark behaviors to extraction surrogates without aggressive decision boundary perturbation. Experimental results on CIFAR-10/100 and Imagenette datasets demonstrate the effectiveness of our method, showing not only improved robustness against evasion adversaries but also superior resistance to watermark removal attacks compared to state-of-the-art solutions.

EmMixformer: Mix transformer for eye movement recognition

Jan 10, 2024

Huafeng Qin, Hongyu Zhu, Xin Jin, Qun Song, Mounim A. El-Yacoubi, Xinbo Gao

Abstract: Eye movement (EM) is a new, highly secure behavioral biometric modality that has received increasing attention in recent years. Although deep neural networks, such as convolutional neural networks (CNNs), have recently achieved promising performance, current solutions fail to capture local and global temporal dependencies within eye movement data. To overcome this problem, we propose in this paper a mixed transformer termed EmMixformer to extract time and frequency domain information for eye movement recognition. To this end, we propose a mixed block consisting of three modules: a transformer, an attention long short-term memory (attention LSTM), and a Fourier transformer. First, we are the first to attempt leveraging a transformer to learn long temporal dependencies within eye movement data. Second, we incorporate the attention mechanism into LSTM to propose an attention LSTM that learns short temporal dependencies. Third, we perform self-attention in the frequency domain to learn global features. As the three modules provide complementary feature representations in terms of local and global dependencies, the proposed EmMixformer is capable of improving recognition accuracy. The experimental results on our eye movement dataset and two public eye movement datasets show that the proposed EmMixformer outperforms the state of the art by achieving the lowest verification error.

Improve Deep Forest with Learnable Layerwise Augmentation Policy Schedule

Sep 16, 2023

Hongyu Zhu, Sichu Liang, Wentao Hu, Fang-Qi Li, Yali Yuan, Shi-Lin Wang, Guang Cheng

Abstract: As a modern ensemble technique, Deep Forest (DF) employs a cascading structure to construct deep models, providing stronger representational power compared to traditional decision forests. However, its greedy multi-layer learning procedure is prone to overfitting, limiting model effectiveness and generalizability. This paper presents an optimized Deep Forest, featuring learnable, layerwise data augmentation policy schedules. Specifically, we introduce the Cut Mix for Tabular data (CMT) augmentation technique to mitigate overfitting and develop a population-based search algorithm to tailor augmentation intensity for each layer. Additionally, we propose to incorporate outputs from intermediate layers into a checkpoint ensemble for more stable performance. Experimental results show that our method sets new state-of-the-art (SOTA) benchmarks in various tabular classification tasks, outperforming shallow tree ensembles, deep forests, deep neural networks, and AutoML competitors. The learned policies also transfer effectively to Deep Forest variants, underscoring the method's potential for enhancing non-differentiable deep learning modules in tabular signal processing.
