Abstract:The problem of extreme multi-label text classification (XMTC) is to recall some most relevant labels for a text from an extremely large label set. Though the methods based on deep pre-trained models have reached significant achievement, the pre-trained models are still not fully utilized. Label semantics has not attracted much attention so far, and the latent space between texts and labels has not been effectively explored. This paper constructs a novel guide network (GUDN) to help fine-tune the pre-trained model to instruct classification later. Also, we use the raw label semantics to effectively explore the latent space between texts and labels, which can further improve predicted accuracy. Experimental results demonstrate that GUDN outperforms state-of-the-art methods on several popular datasets. Our source code is released at https://github.com/wq2581/GUDN.