Zhe Tao

Language Guided Concept Bottleneck Models for Interpretable Continual Learning

Mar 30, 2025

Lu Yu, Haoyu Han, Zhe Tao, Hantao Yao, Changsheng Xu

Abstract: Continual learning (CL) aims to enable learning systems to acquire new knowledge constantly without forgetting previously learned information. CL faces the challenge of mitigating catastrophic forgetting while maintaining interpretability across tasks. Most existing CL methods focus primarily on preserving learned knowledge to improve model performance. However, as new information is introduced, the interpretability of the learning process becomes crucial for understanding the evolving decision-making process, yet it is rarely explored. In this paper, we introduce a novel framework that integrates language-guided Concept Bottleneck Models (CBMs) to address both challenges. Our approach leverages the Concept Bottleneck Layer, aligning semantic consistency with CLIP models to learn human-understandable concepts that can generalize across tasks. By focusing on interpretable concepts, our method not only enhances the model's ability to retain knowledge over time but also provides transparent decision-making insights. We demonstrate the effectiveness of our approach by achieving superior performance on several datasets, outperforming state-of-the-art methods with an improvement of up to 3.06% in final average accuracy on ImageNet-subset. Additionally, we offer concept visualizations for model predictions, further advancing the understanding of interpretable continual learning.

* CVPR 2025; Project Page: https://github.com/FisherCats/CLG-CBM
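The general idea of a concept bottleneck, as named in the abstract above, can be sketched in a few lines: the class prediction is computed only from scores over human-readable concepts, so each decision can be traced back to those concepts. This is a minimal illustration, not the paper's implementation. The concept names, the 2-D embeddings, and the class weights are all hypothetical placeholders standing in for CLIP embeddings and a learned classifier.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def concept_scores(image_embedding, concept_embeddings):
    """Score each named concept by its similarity to the image embedding."""
    return {name: dot(image_embedding, emb) for name, emb in concept_embeddings.items()}


def classify(scores, class_weights):
    """Linear classifier that sees only the concept scores (the bottleneck)."""
    logits = {
        label: sum(weights[c] * scores[c] for c in scores)
        for label, weights in class_weights.items()
    }
    return max(logits, key=logits.get), logits


# Hypothetical toy values, chosen only for illustration.
concept_embeddings = {"has wings": [1.0, 0.0], "has fur": [0.0, 1.0]}
class_weights = {
    "bird": {"has wings": 1.0, "has fur": 0.0},
    "cat": {"has wings": 0.0, "has fur": 1.0},
}
image_embedding = [0.9, 0.1]

scores = concept_scores(image_embedding, concept_embeddings)
label, logits = classify(scores, class_weights)
```

Because `classify` never sees the raw embedding, inspecting `scores` shows which concepts drove the prediction.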

Enabling Real-Time Conversations with Minimal Training Costs

Sep 18, 2024

Wang Xu, Shuo Wang, Weilin Zhao, Xu Han, Yukun Yan, Yudi Zhang, Zhe Tao, Zhiyuan Liu, Wanxiang Che

Abstract: Large language models (LLMs) have demonstrated the ability to improve human efficiency through conversational interactions. Conventional LLM-powered dialogue systems, operating on a turn-based paradigm, preclude real-time interaction during response generation. To address this limitation, researchers have proposed duplex models. These models can dynamically adapt to user input, facilitating real-time interactive feedback. However, these methods typically require substantial computational resources to acquire this ability. To reduce overhead, this paper presents a new duplex decoding approach that enhances LLMs with duplex ability, requiring minimal additional training. Specifically, our method employs parallel decoding of queries and responses in conversations, effectively implementing a channel-division-multiplexing decoding strategy. Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.

* 7 pages, 6 figures, 1 table

Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning

Aug 02, 2024

Lu Yu, Zhe Tao, Hantao Yao, Joost Van de Weijer, Changsheng Xu

Abstract: Deep neural networks (DNNs) excel on fixed datasets but struggle with incremental and shifting data in real-world scenarios. Continual learning addresses this challenge by allowing models to learn from new data while retaining previously learned knowledge. Existing methods mainly rely on visual features, often neglecting the rich semantic information encoded in text. The semantic knowledge available in the label information of the images offers important semantic information that can be related to previously acquired knowledge of semantic classes. Consequently, effectively leveraging this information throughout continual learning is expected to be beneficial. To address this, we propose integrating semantic guidance within and across tasks by capturing semantic similarity using text embeddings. We start from a pre-trained CLIP model, employ the Semantically-guided Representation Learning (SG-RL) module for a soft-assignment towards all current task classes, and use the Semantically-guided Knowledge Distillation (SG-KD) module for enhanced knowledge transfer. Experimental results demonstrate the superiority of our method on general and fine-grained datasets. Our code can be found at https://github.com/aprilsveryown/semantically-guided-continual-learning.
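One common way to turn text-embedding similarity into a soft assignment over classes is a softmax over cosine similarities. The sketch below shows that generic technique only. It is not the SG-RL module from the paper, and the embeddings, class names, and `temperature` value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def soft_targets(label, class_embeddings, temperature=0.1):
    """Soft label distribution: classes semantically close to `label` get more mass."""
    anchor = class_embeddings[label]
    scaled = {c: cosine(anchor, e) / temperature for c, e in class_embeddings.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - peak) for c, s in scaled.items()}
    total = sum(exps.values())
    return {c: v / total for c, v in exps.items()}


# Hypothetical 2-D "text embeddings"; real ones would come from a text encoder.
class_embeddings = {
    "wolf": [1.0, 0.1],
    "dog": [0.9, 0.3],
    "truck": [0.0, 1.0],
}
targets = soft_targets("wolf", class_embeddings)
```

With these toy vectors, the true class receives the largest weight, the semantically close class receives some weight, and the unrelated class receives almost none.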

EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models

May 18, 2024

Yu Huang, Liang Guo, Wanqian Guo, Zhe Tao, Yang Lv, Zhihao Sun, Dongfang Zhao

Abstract: In the field of environmental science, it is crucial to have robust evaluation metrics for large language models to ensure their efficacy and accuracy. We propose EnviroExam, a comprehensive evaluation method designed to assess the knowledge of large language models in the field of environmental science. EnviroExam is based on the curricula of top international universities, covering undergraduate, master's, and doctoral courses, and includes 936 questions across 42 core courses. By conducting 0-shot and 5-shot tests on 31 open-source large language models, EnviroExam reveals the performance differences among these models in the domain of environmental science and provides detailed evaluation standards. The results show that 61.3% of the models passed the 5-shot tests, while 48.39% passed the 0-shot tests. By introducing the coefficient of variation as an indicator, we evaluate the performance of mainstream open-source large language models in environmental science from multiple perspectives, providing effective criteria for selecting and fine-tuning language models in this field. Future research will involve constructing more domain-specific test sets using specialized environmental science textbooks to further enhance the accuracy and specificity of the evaluation.
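The coefficient of variation mentioned above is the standard deviation divided by the mean, which makes score spread comparable across models with different average accuracy. The abstract does not say over which scores it is computed, so the sketch below assumes per-course accuracies, and the numbers are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """Population standard deviation divided by the mean of the scores."""
    m = mean(scores)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(scores) / m


# Hypothetical per-course accuracies for two imaginary models.
steady_model = [0.70, 0.72, 0.68, 0.71]
uneven_model = [0.95, 0.40, 0.85, 0.62]

cv_steady = coefficient_of_variation(steady_model)
cv_uneven = coefficient_of_variation(uneven_model)
```

A lower value indicates more uniform performance across courses, so here `cv_steady` is smaller than `cv_uneven`.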

Architecture-Preserving Provable Repair of Deep Neural Networks

Apr 07, 2023

Zhe Tao, Stephanie Nawas, Jacqueline Mitchell, Aditya V. Thakur

Abstract: Deep neural networks (DNNs) are becoming increasingly important components of software, and are considered the state-of-the-art solution for a number of problems, such as image recognition. However, DNNs are far from infallible, and incorrect behavior of DNNs can have disastrous real-world consequences. This paper addresses the problem of architecture-preserving V-polytope provable repair of DNNs. A V-polytope defines a convex bounded polytope using its vertex representation. V-polytope provable repair guarantees that the repaired DNN satisfies the given specification on the infinite set of points in the given V-polytope. An architecture-preserving repair only modifies the parameters of the DNN, without modifying its architecture. The repair has the flexibility to modify multiple layers of the DNN, and runs in polynomial time. It supports DNNs with activation functions that have some linear pieces, as well as fully-connected, convolutional, pooling and residual layers. To the best of our knowledge, this is the first provable repair approach that has all of these features. We implement our approach in a tool called APRNN. Using MNIST, ImageNet, and ACAS Xu DNNs, we show that it has better efficiency, scalability, and generalization compared to PRDNN and REASSURE, prior provable repair methods that are not architecture preserving.

* Accepted paper at PLDI 2023. Tool will be available at https://github.com/95616ARG/APRNN/
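A small sketch may help with why a vertex representation can cover infinitely many points. Every point of a V-polytope is a convex combination of its vertices, and an affine function never exceeds its largest vertex value at such a point. The code below checks this for a hypothetical triangle and a hypothetical affine map. It covers only the affine case: a DNN is at best piecewise linear, and how APRNN handles that is not described in the abstract.

```python
import random


def convex_combination(vertices, weights):
    """Point of the polytope given by non-negative weights that sum to 1."""
    dim = len(vertices[0])
    return [sum(w * v[i] for w, v in zip(weights, vertices)) for i in range(dim)]


def affine(point, coeffs, bias):
    """Affine map f(x) = coeffs . x + bias."""
    return sum(c * x for c, x in zip(coeffs, point)) + bias


def random_weights(n, rng):
    """Random non-negative weights normalised to sum to 1."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical triangle and affine function, chosen only for illustration.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
coeffs, bias = (2.0, -1.0), 0.5

vertex_max = max(affine(v, coeffs, bias) for v in vertices)

rng = random.Random(0)
samples = [
    affine(convex_combination(vertices, random_weights(3, rng)), coeffs, bias)
    for _ in range(1000)
]
```

Every sampled value stays at or below `vertex_max`, so an upper bound that holds at the three vertices holds on the whole triangle for this affine map.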

FDA3: Federated Defense Against Adversarial Attacks for Cloud-Based IIoT Applications

Jun 28, 2020

Yunfei Song, Tian Liu, Tongquan Wei, Xiangfeng Wang, Zhe Tao, Mingsong Chen

Abstract: Along with the proliferation of Artificial Intelligence (AI) and Internet of Things (IoT) techniques, various kinds of adversarial attacks are increasingly emerging to fool Deep Neural Networks (DNNs) used by Industrial IoT (IIoT) applications. Due to biased training data or vulnerable underlying models, imperceptible modifications on inputs made by adversarial attacks may result in devastating consequences. Although existing methods are promising in defending against such malicious attacks, most of them can handle only a limited set of existing attack types, which makes the deployment of large-scale IIoT devices a great challenge. To address this problem, we present an effective federated defense approach named FDA3 that can aggregate defense knowledge against adversarial examples from different sources. Inspired by federated learning, our proposed cloud-based architecture enables the sharing of defense capabilities against different attacks among IIoT devices. Comprehensive experimental results show that the DNNs generated by our approach can not only resist more malicious attacks than existing attack-specific adversarial training methods, but can also protect IIoT applications from new attacks.

* IEEE Transactions on Industrial Informatics, 2020
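The abstract describes the architecture as inspired by federated learning. As background, the sketch below shows the basic aggregation step of standard federated averaging, where a server combines client parameters weighted by local sample counts. This is the generic technique, not the FDA3 aggregation rule, which the abstract does not specify. The parameter vectors and sample counts are hypothetical.

```python
def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by each client's sample count."""
    if len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical flattened parameters from two devices, each trained locally
# (for example, against a different attack type).
device_a = [1.0, 2.0]
device_b = [3.0, 4.0]
global_params = federated_average([device_a, device_b], [1, 3])
```

The device with more local samples pulls the aggregate closer to its own parameters.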
