Abstract: Graph convolutional networks have made great progress in graph-based semi-supervised learning. Existing methods mainly assume that nodes connected by graph edges tend to have similar attributes and labels, so that features smoothed by the local graph structure can reveal class similarities. However, in many real-world scenarios there are mismatches between graph structures and labels, where the structure may propagate misleading features or labels that ultimately degrade model performance. In this paper, we propose a multi-task self-distillation framework that injects self-supervised learning and self-distillation into graph convolutional networks to address the mismatch problem from the structure side and the label side, respectively. First, we formulate a self-supervision pipeline based on pretext tasks to capture different levels of similarity in graphs. By jointly optimizing the pretext task and the target task, the feature extraction process is encouraged to capture more complex proximity, and the local feature aggregations are thereby improved from the structure side. Second, self-distillation uses the model's own soft labels as additional supervision, which has a similar effect to label smoothing. The knowledge from the classification pipeline and the self-supervision pipeline is collectively distilled to improve the generalization ability of the model from the label side. Experimental results show that the proposed method obtains remarkable performance gains under several classic graph convolutional architectures.
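To make the training objective described above concrete, the following is a minimal PyTorch-style sketch of how a classification loss, a pretext-task loss, and a soft-label self-distillation term might be combined. The function name, the MSE pretext loss, the weighting coefficients alpha and beta, and the temperature are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multitask_self_distillation_loss(logits, pretext_pred, pretext_target,
                                     labels, labeled_mask, teacher_logits,
                                     alpha=0.5, beta=0.5, temperature=2.0):
    # Supervised classification loss on the labeled nodes only
    # (semi-supervised setting: most nodes are unlabeled).
    ce = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])

    # Pretext (self-supervised) loss on all nodes; MSE is an assumed
    # placeholder for whichever pretext objective is used.
    ssl = F.mse_loss(pretext_pred, pretext_target)

    # Self-distillation: match the student's softened predictions to the
    # model's own earlier soft labels (teacher_logits) via KL divergence.
    kd = F.kl_div(
        F.log_softmax(logits / temperature, dim=-1),
        F.softmax(teacher_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    return ce + alpha * ssl + beta * kd
```

In this sketch, `teacher_logits` would come from a frozen snapshot of the same network, so the distillation term acts as a self-regularizer rather than requiring a separate teacher model.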