Abstract: Recent advances in both representation learning and function learning have shown substantial promise across diverse domains of artificial intelligence. However, integrating these paradigms effectively remains a significant challenge, particularly because users must manually decide whether to apply a representation learning model or a function learning model based on dataset characteristics. To address this issue, we introduce MLP-KAN, a unified method designed to eliminate the need for manual model selection. By integrating Multi-Layer Perceptrons (MLPs) for representation learning and Kolmogorov-Arnold Networks (KANs) for function learning within a Mixture-of-Experts (MoE) architecture, MLP-KAN dynamically adapts to the characteristics of the task at hand. Embedded within a transformer-based framework, MLP-KAN achieves strong results on four widely used datasets across diverse domains. Extensive experimental evaluation demonstrates its versatility, with competitive performance on both deep representation learning and function learning tasks. These findings highlight the potential of MLP-KAN to simplify model selection, offering an adaptable solution across various domains. Our code and weights are available at \url{https://github.com/DLYuanGod/MLP-KAN}.
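To make the architecture described in this abstract more concrete, the following is a minimal sketch of a mixture-of-experts layer that routes an input between MLP experts and KAN-style experts. It is not the released MLP-KAN implementation. The class names (`MLPExpert`, `KANLikeExpert`, `MixtureOfExperts`), the top-k softmax gating, the layer sizes, and the use of Gaussian bumps as the learnable one-dimensional edge functions are all assumptions made for illustration; the weights are random and untrained.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class MLPExpert:
    """One-hidden-layer MLP with ReLU (stands in for a representation-learning expert)."""

    def __init__(self, dim, hidden, rng):
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def __call__(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]


class KANLikeExpert:
    """KAN-style layer: every edge (input i -> output j) has its own 1-D function.

    Assumption: each 1-D function is a weighted sum of Gaussian bumps on fixed
    centers. Published KANs typically use B-splines instead.
    """

    def __init__(self, dim, n_basis, rng):
        self.centers = [-1.0 + 2.0 * k / (n_basis - 1) for k in range(n_basis)]
        self.coef = [[[rng.gauss(0, 0.1) for _ in range(n_basis)]
                      for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        out = []
        for edge_coefs in self.coef:  # one row of edges per output unit
            total = 0.0
            for coefs, xi in zip(edge_coefs, x):
                basis = [math.exp(-((xi - c) ** 2)) for c in self.centers]
                total += sum(a * b for a, b in zip(coefs, basis))
            out.append(total)
        return out


class MixtureOfExperts:
    """Linear gate plus top-k routing (hypothetical routing scheme)."""

    def __init__(self, dim, experts, top_k, rng):
        self.experts = experts
        self.top_k = top_k
        self.gate = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in experts]

    def route(self, x):
        """Return {expert_index: weight} for the top-k experts, weights summing to 1."""
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.gate]
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:self.top_k]
        norm = sum(probs[e] for e in top)
        return {e: probs[e] / norm for e in top}

    def __call__(self, x):
        out = [0.0] * len(x)
        for e, w in self.route(x).items():
            y = self.experts[e](x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


rng = random.Random(0)
dim = 4
experts = [MLPExpert(dim, 8, rng), MLPExpert(dim, 8, rng),
           KANLikeExpert(dim, 5, rng), KANLikeExpert(dim, 5, rng)]
moe = MixtureOfExperts(dim, experts, top_k=2, rng=rng)

token = [0.5, -1.0, 0.25, 2.0]
weights = moe.route(token)   # which experts were selected, and with what weight
output = moe(token)          # weighted combination of the selected experts' outputs
```

In a transformer-based model, a layer of this kind would typically take the place of the feed-forward block, so that the gate, rather than the user, decides per input how much to rely on MLP or KAN experts.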
Abstract: Text classification, a core component of task-oriented dialogue systems, attracts continuous attention from both the research and industry communities, which has resulted in tremendous progress. However, existing methods do not make use of label information, which may weaken the performance of text classification systems in some token-aware scenarios. To address this problem, we introduce label information, in the form of label embeddings, into the text classification task and achieve remarkable performance on a benchmark dataset.
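The abstract does not specify how the label embeddings are used, so the following is only one plausible reading, sketched under stated assumptions: labels and tokens share an embedding space, each token is weighted by its cosine similarity to the closest label, and the attended text vector is scored against every label embedding. The vocabulary, the two labels, the hand-set 2-D vectors, and the `classify` function are hypothetical and chosen only to keep the example small.

```python
import math

# Hypothetical hand-set embeddings; in practice both tables would be learned.
TOKEN_EMB = {
    "book": [1.0, 0.0],
    "flight": [0.9, 0.1],
    "play": [0.0, 1.0],
    "song": [0.1, 0.9],
    "a": [0.1, 0.1],
}
LABEL_EMB = {
    "travel": [1.0, 0.0],
    "music": [0.0, 1.0],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0


def classify(text):
    """Return (predicted_label, scores) using label-guided attention over tokens."""
    tokens = [TOKEN_EMB[t] for t in text.lower().split() if t in TOKEN_EMB]
    if not tokens:
        raise ValueError("no known tokens in input")

    # Token relevance: similarity to the best-matching label embedding.
    relevance = [max(cosine(tok, lab) for lab in LABEL_EMB.values()) for tok in tokens]

    # Softmax over relevance gives attention weights that sum to 1.
    exps = [math.exp(r) for r in relevance]
    total = sum(exps)
    attn = [e / total for e in exps]

    # Attention-weighted text representation.
    dim = len(tokens[0])
    text_vec = [sum(a * tok[d] for a, tok in zip(attn, tokens)) for d in range(dim)]

    # Score the text against each label embedding and pick the best.
    scores = {name: dot(text_vec, lab) for name, lab in LABEL_EMB.items()}
    return max(scores, key=scores.get), scores


label_1, scores_1 = classify("book a flight")
label_2, scores_2 = classify("play a song")
```

With these toy vectors, `label_1` is `"travel"` and `label_2` is `"music"`; the filler token `a` receives the lowest attention weight because it is not close to either label.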
Abstract: Most successful approaches to machine reading comprehension rely on a single training objective. They assume that the encoder layer can learn good representations through the loss function defined at the prediction layer, which is cross-entropy in most cases, in the common setup where neural networks first encode the question and the paragraph and the resulting encodings are then fused directly. However, because the loss is backpropagated from a distant layer, the encoder layer is not directly supervised and cannot learn effectively. As a result, the encoder layer fails to learn good representations. Based on this observation, we propose injecting multi-granularity information into the encoding layer. Experiments demonstrate that adding multi-granularity information to the encoding layer boosts the performance of machine reading comprehension systems. Finally, an empirical study shows that our approach can be applied to many existing MRC models.