Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chenxi Gu

SpikeBERT: A Language Spikformer Trained with Two-Stage Knowledge Distillation from BERT

Aug 30, 2023

Changze Lv, Tianlong Li, Jianhan Xu, Chenxi Gu, Zixuan Ling, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

Figure 1 for SpikeBERT: A Language Spikformer Trained with Two-Stage Knowledge Distillation from BERT

Figure 2 for SpikeBERT: A Language Spikformer Trained with Two-Stage Knowledge Distillation from BERT

Figure 3 for SpikeBERT: A Language Spikformer Trained with Two-Stage Knowledge Distillation from BERT

Figure 4 for SpikeBERT: A Language Spikformer Trained with Two-Stage Knowledge Distillation from BERT

Abstract:Spiking neural networks (SNNs) offer a promising avenue to implement deep neural networks in a more energy-efficient way. However, the network architectures of existing SNNs for language tasks are too simplistic, and deep architectures have not been fully explored, resulting in a significant performance gap compared to mainstream transformer-based networks such as BERT. To this end, we improve a recently-proposed spiking transformer (i.e., Spikformer) to make it possible to process language tasks and propose a two-stage knowledge distillation method for training it, which combines pre-training by distilling knowledge from BERT with a large collection of unlabelled texts and fine-tuning with task-specific instances via knowledge distillation again from the BERT fine-tuned on the same training examples. Through extensive experimentation, we show that the models trained with our method, named SpikeBERT, outperform state-of-the-art SNNs and even achieve comparable results to BERTs on text classification tasks for both English and Chinese with much less energy consumption.

Via

Access Paper or Ask Questions

Watermarking Pre-trained Language Models with Backdooring

Oct 14, 2022

Chenxi Gu, Chengsong Huang, Xiaoqing Zheng, Kai-Wei Chang, Cho-Jui Hsieh

Figure 1 for Watermarking Pre-trained Language Models with Backdooring

Figure 2 for Watermarking Pre-trained Language Models with Backdooring

Figure 3 for Watermarking Pre-trained Language Models with Backdooring

Figure 4 for Watermarking Pre-trained Language Models with Backdooring

Abstract:Large pre-trained language models (PLMs) have proven to be a crucial component of modern natural language processing systems. PLMs typically need to be fine-tuned on task-specific downstream datasets, which makes it hard to claim the ownership of PLMs and protect the developer's intellectual property due to the catastrophic forgetting phenomenon. We show that PLMs can be watermarked with a multi-task learning framework by embedding backdoors triggered by specific inputs defined by the owners, and those watermarks are hard to remove even though the watermarked PLMs are fine-tuned on multiple downstream tasks. In addition to using some rare words as triggers, we also show that the combination of common words can be used as backdoor triggers to avoid them being easily detected. Extensive experiments on multiple datasets demonstrate that the embedded watermarks can be robustly extracted with a high success rate and less influenced by the follow-up fine-tuning.

Via

Access Paper or Ask Questions