Zhang Xiong

AAKT: Enhancing Knowledge Tracing with Alternate Autoregressive Modeling

Feb 17, 2025

Hao Zhou, Wenge Rong, Jianfei Zhang, Qing Sun, Yuanxin Ouyang, Zhang Xiong

Abstract: Knowledge Tracing (KT) aims to predict students' future performance based on their previous exercises and additional information in educational settings. KT has received significant attention because it facilitates personalized experiences in educational settings. Autoregressive modeling of the sequence of previous exercises has also proven effective for this task. One of the primary challenges in autoregressive modeling for Knowledge Tracing is effectively representing the anterior (pre-response) and posterior (post-response) states of learners across exercises. Existing methods often employ complex model architectures to update learner states using question and response records. In this study, we propose a novel perspective on the knowledge tracing task by treating it as a generative process, consistent with the principles of autoregressive models. We demonstrate that knowledge states can be directly represented through autoregressive encodings on a question-response alternate sequence, where the model generates the most probable representation in hidden state space by analyzing historical interactions. This approach underpins our framework, termed Alternate Autoregressive Knowledge Tracing (AAKT). Additionally, we incorporate supplementary educational information, such as question-related skills, into our framework through an auxiliary task, and include extra exercise details, like response time, as additional inputs. Our proposed framework is implemented using advanced autoregressive technologies from Natural Language Generation (NLG) for both training and prediction. Empirical evaluations on four real-world KT datasets indicate that AAKT consistently outperforms all baseline models in terms of AUC, ACC, and RMSE. Furthermore, extensive ablation studies and visualized analysis validate the effectiveness of key components in AAKT.

* IEEE Transactions on Learning Technologies, vol. 18, pp. 25-38, 2025
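
The question-response alternate sequence mentioned in the abstract can be illustrated with a minimal sketch. The token layout below (question ids first, then two response tokens) and the function name are assumptions made for illustration; the abstract does not specify how AAKT encodes its inputs.

```python
from typing import List, Tuple


def build_alternate_sequence(
    interactions: List[Tuple[int, bool]], num_questions: int
) -> List[int]:
    """Interleave question and response tokens as q1, r1, q2, r2, ...

    Assumed (hypothetical) vocabulary: ids 0..num_questions-1 are questions,
    num_questions means "incorrect", num_questions + 1 means "correct".
    """
    tokens: List[int] = []
    for question_id, correct in interactions:
        if not 0 <= question_id < num_questions:
            raise ValueError(f"question id {question_id} out of range")
        tokens.append(question_id)                   # anterior (pre-response) position
        tokens.append(num_questions + int(correct))  # posterior (post-response) position
    return tokens


# Two interactions: question 3 answered correctly, question 0 answered incorrectly.
sequence = build_alternate_sequence([(3, True), (0, False)], num_questions=5)
print(sequence)
```

With a sequence like this, an autoregressive model would read a question token and predict the response token that follows it, which is one way to view performance prediction as next-token generation.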

Explainable Few-shot Knowledge Tracing

May 23, 2024

Haoxuan Li, Jifan Yu, Yuanxin Ouyang, Zhuang Liu, Wenge Rong, Juanzi Li, Zhang Xiong

Abstract: Knowledge tracing (KT), which aims to mine students' mastery of knowledge from their exercise records and predict their performance on future test questions, is a critical task in educational assessment. While researchers have achieved tremendous success with the rapid development of deep learning techniques, current knowledge tracing tasks diverge from real-world teaching scenarios. Relying heavily on extensive student data and solely predicting numerical performance differs from the settings where teachers assess students' knowledge state from limited practice and provide explanatory feedback. To fill this gap, we explore a new task formulation: Explainable Few-shot Knowledge Tracing. By leveraging the powerful reasoning and generation abilities of large language models (LLMs), we then propose a cognition-guided framework that can track student knowledge from a few student records while providing natural language explanations. Experimental results from three widely used datasets show that LLMs can perform comparably or superior to competitive deep knowledge tracing methods. We also discuss potential directions and call for future improvements in relevant topics.

Wasserstein Dependent Graph Attention Network for Collaborative Filtering with Uncertainty

Apr 09, 2024

Haoxuan Li, Yuanxin Ouyang, Zhuang Liu, Wenge Rong, Zhang Xiong

Abstract: Collaborative filtering (CF) is an essential technique in recommender systems that provides personalized recommendations by leveraging only user-item interactions. However, most CF methods represent users and items as fixed points in the latent space, lacking the ability to capture uncertainty. In this paper, we propose a novel approach, called the Wasserstein dependent Graph ATtention network (W-GAT), for collaborative filtering with uncertainty. We utilize a graph attention network and Wasserstein distance to address the limitations of LightGCN and Kullback-Leibler (KL) divergence in learning a Gaussian embedding for each user and item. Additionally, our method further incorporates Wasserstein-dependent mutual information to increase the similarity between positive pairs and to tackle the challenges induced by KL divergence. Experimental results on three benchmark datasets show the superiority of W-GAT compared to several representative baselines. Extensive experimental analysis validates the effectiveness of W-GAT in capturing uncertainty by modeling the range of user preferences and categories associated with items.

* This work has been submitted to the IEEE TCSS for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
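
To make the contrast between Wasserstein distance and KL divergence for Gaussian embeddings concrete, here is a minimal sketch. It assumes diagonal covariances and the squared 2-Wasserstein distance, both standard closed forms; the abstract does not state which variant W-GAT uses, and the function names are hypothetical.

```python
import math
from typing import Sequence


def wasserstein2_squared(
    mu_a: Sequence[float], sigma_a: Sequence[float],
    mu_b: Sequence[float], sigma_b: Sequence[float],
) -> float:
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    sigma_* hold per-dimension standard deviations (not variances).
    """
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mu_a, mu_b))
    scale_term = sum((sa - sb) ** 2 for sa, sb in zip(sigma_a, sigma_b))
    return mean_term + scale_term


def kl_divergence(
    mu_a: Sequence[float], sigma_a: Sequence[float],
    mu_b: Sequence[float], sigma_b: Sequence[float],
) -> float:
    """KL(N_a || N_b) for diagonal Gaussians; requires all sigmas > 0."""
    total = 0.0
    for ma, sa, mb, sb in zip(mu_a, sigma_a, mu_b, sigma_b):
        total += math.log(sb / sa) + (sa ** 2 + (ma - mb) ** 2) / (2 * sb ** 2) - 0.5
    return total


# Hypothetical user and item embeddings in two dimensions.
user = ([0.0, 0.0], [1.0, 2.0])
item = ([3.0, 4.0], [2.0, 2.0])
print(wasserstein2_squared(*user, *item))
print(kl_divergence(*user, *item), kl_divergence(*item, *user))
```

The Wasserstein form is symmetric and stays finite as a standard deviation approaches zero, whereas KL divergence is asymmetric and grows without bound in that limit.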

A Review of Data Mining in Personalized Education: Current Trends and Future Prospects

Feb 27, 2024

Zhang Xiong, Haoxuan Li, Zhuang Liu, Zhuofan Chen, Hao Zhou, Wenge Rong, Yuanxin Ouyang

Abstract: Personalized education, tailored to individual student needs, leverages educational technology and artificial intelligence (AI) in the digital age to enhance learning effectiveness. The integration of AI in educational platforms provides insights into academic performance, learning preferences, and behaviors, optimizing the personal learning process. Driven by data mining techniques, it not only benefits students but also provides educators and institutions with tools to craft customized learning experiences. To offer a comprehensive review of recent advancements in personalized educational data mining, this paper focuses on four primary scenarios: educational recommendation, cognitive diagnosis, knowledge tracing, and learning analysis. This paper presents a structured taxonomy for each area, compiles commonly used datasets, and identifies future research directions, emphasizing the role of data mining in enhancing personalized education and paving the way for future exploration and innovation.

* Zhang Xiong, Haoxuan Li, Zhuang Liu, Zhuofan Chen, Hao Zhou, Wenge Rong, Yuanxin Ouyang. A Review of Data Mining in Personalized Education: Current Trends and Future Prospects. Frontiers of Digital Education, 2024, 1(1): 26-50
* 25 pages, 5 figures

Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering

Aug 25, 2023

Keheng Wang, Feiyu Duan, Sirui Wang, Peiguang Li, Yunsen Xian, Chuantao Yin, Wenge Rong, Zhang Xiong

Abstract: Equipped with Chain-of-Thought (CoT), large language models (LLMs) have shown impressive reasoning ability in various downstream tasks. Even so, suffering from hallucinations and the inability to access external knowledge, LLMs often produce incorrect or unfaithful intermediate reasoning steps, especially when answering knowledge-intensive tasks such as KBQA. To alleviate this issue, we propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) to verify and modify reasoning traces in CoT via interaction with external knowledge, and thus overcome hallucinations and error propagation. Concretely, we formulate the CoT rationale process of LLMs into a structured multi-round QA format. In each round, LLMs interact with a QA system that retrieves external knowledge and produce faithful reasoning traces based on the retrieved precise answers. The structured CoT reasoning of LLMs is facilitated by our developed KBQA CoT collection, which serves as in-context learning demonstrations and can also be utilized as feedback augmentation to train a robust retriever. Extensive experiments on the WebQSP and ComplexWebQuestion datasets demonstrate the effectiveness of the proposed KD-CoT in task-solving reasoning generation, which outperforms the vanilla CoT ICL with an absolute success rate of 8.0% and 5.1%. Furthermore, our proposed feedback-augmented retriever outperforms the state-of-the-art baselines for retrieving knowledge, achieving significant improvement in Hit performance.

Transformer-Patcher: One Mistake worth One Neuron

Jan 24, 2023

Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie Zhou, Wenge Rong, Zhang Xiong

Abstract: Large Transformer-based Pretrained Language Models (PLMs) dominate almost all Natural Language Processing (NLP) tasks. Nevertheless, they still make mistakes from time to time. For a model deployed in an industrial environment, fixing these mistakes quickly and robustly is vital to improve user experiences. Previous works formalize such problems as Model Editing (ME) and mostly focus on fixing one mistake. However, the one-mistake-fixing scenario is not an accurate abstraction of the real-world challenge. In the deployment of AI services, there are ever-emerging mistakes, and the same mistake may recur if not corrected in time. Thus a preferable solution is to continually rectify mistakes as soon as they appear. Therefore, we extend the existing ME into Sequential Model Editing (SME) to help develop more practical editing methods. Our study shows that most current ME methods could yield unsatisfying results in this scenario. We then introduce Transformer-Patcher, a novel model editor that can shift the behavior of transformer-based models by simply adding and training a few neurons in the last Feed-Forward Network layer. Experimental results on both classification and generation tasks show that Transformer-Patcher can successively correct up to thousands of errors (Reliability) and generalize to their equivalent inputs (Generality) while retaining the model's accuracy on irrelevant inputs (Locality). Our method outperforms previous fine-tuning and HyperNetwork-based methods and achieves state-of-the-art performance for Sequential Model Editing (SME). The code is available at https://github.com/ZeroYuHuang/Transformer-Patcher.

* accepted in ICLR 2023

Mixture of Attention Heads: Selecting Attention Heads Per Token

Oct 11, 2022

Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong

Abstract: Mixture-of-Experts (MoE) networks have been proposed as an efficient way to scale up model capacity and implement conditional computing. However, the study of MoE components has mostly focused on the feedforward layer in the Transformer architecture. This paper proposes the Mixture of Attention Heads (MoA), a new architecture that combines multi-head attention with the MoE mechanism. MoA includes a set of attention heads, each with its own set of parameters. Given an input, a router dynamically selects a subset of $k$ attention heads per token. This conditional computation schema allows MoA to achieve stronger performance than the standard multi-head attention layer. Furthermore, the sparsely gated MoA can easily scale up the number of attention heads and the number of parameters while preserving computational efficiency. In addition to the performance improvements, MoA also automatically differentiates heads' utilities, providing a new perspective to discuss the model's interpretability. We conducted experiments on several important tasks, including Machine Translation and Masked Language Modeling. Experiments have shown promising results on several tasks against strong baselines that involve large and very deep models.

* accepted in EMNLP 2022
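
The per-token selection of $k$ attention heads described in the abstract can be sketched as a top-k gate. This is a minimal illustration, not the paper's implementation: normalizing with a softmax over only the selected heads is an assumption, and `route_top_k` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def route_top_k(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring heads for one token and weight them.

    Returns (head_index, gate_weight) pairs; weights sum to 1 because the
    softmax is taken over the selected heads only (an assumed design choice).
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of heads")
    # Rank head indices by logit, highest first; ties keep their original order.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    selected = ranked[:k]
    peak = max(router_logits[i] for i in selected)  # subtract for numerical stability
    exp_scores = [math.exp(router_logits[i] - peak) for i in selected]
    total = sum(exp_scores)
    return [(i, score / total) for i, score in zip(selected, exp_scores)]


# One token, four candidate heads, keep two of them.
for head, weight in route_top_k([0.1, 2.0, -1.0, 2.0], k=2):
    print(head, round(weight, 3))
```

Only the selected heads would then be evaluated for that token, so the cost per token depends on k and not on the total number of heads.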

PREMA: Part-based REcurrent Multi-view Aggregation Network for 3D Shape Retrieval

Nov 09, 2021

Jiongchao Jin, Huanqiang Xu, Pengliang Ji, Zehao Tang, Zhang Xiong

Abstract: We propose the Part-based Recurrent Multi-view Aggregation network (PREMA) to eliminate the detrimental effects of practical view defects, such as insufficient view numbers, occlusions, or background clutter, and also to enhance the discriminative ability of shape representations. Inspired by the fact that humans recognize an object mainly by its discriminant parts, we define the multi-view coherent part (MCP), a discriminant part recurring in different views. Our PREMA can reliably locate and effectively utilize MCPs to build robust shape representations. Specifically, we design a novel Regional Attention Unit (RAU) in PREMA to compute the confidence map for each view, and extract MCPs by applying those maps to view features. PREMA accentuates MCPs by correlating features of different views, and aggregates the part-aware features for shape representation.

* Accepted by ICCSMT 2021

Contrastive Learning for Recommender System

Jan 05, 2021

Zhuang Liu, Yunpu Ma, Yuanxin Ouyang, Zhang Xiong

Abstract: Recommender systems, which analyze users' preference patterns to suggest potential targets, are indispensable in today's society. Collaborative Filtering (CF) is the most popular recommendation model. Specifically, Graph Neural Network (GNN) has become a new state-of-the-art for CF. In the GNN-based recommender system, message dropout is usually used to alleviate the selection bias in the user-item bipartite graph. However, message dropout might degrade the recommender system's performance due to the randomness of dropping out the outgoing messages based on the user-item bipartite graph. To solve this problem, we propose a graph contrastive learning module for a general recommender system that learns the embeddings in a self-supervised manner and reduces the randomness of message dropout. Besides, many recommender systems optimize models with pairwise ranking objectives, such as Bayesian Personalized Ranking (BPR), which is based on a negative sampling strategy. However, BPR has the following problems: suboptimal sampling and sample bias. We introduce a new debiased contrastive loss to solve these problems, which provides sufficient negative samples and applies a bias correction probability to alleviate the sample bias. We integrate the proposed framework, including the graph contrastive module and the debiased contrastive module, with several Matrix Factorization (MF) and GNN-based recommendation models. Experimental results on three public benchmarks demonstrate the effectiveness of our framework.

* arXiv admin note: text overlap with arXiv:1905.08108 by other authors

AlphaStock: A Buying-Winners-and-Selling-Losers Investment Strategy using Interpretable Deep Reinforcement Attention Networks

Jul 24, 2019

Jingyuan Wang, Yang Zhang, Ke Tang, Junjie Wu, Zhang Xiong

Abstract: Recent years have witnessed the successful marriage of finance innovations and AI techniques in various finance applications including quantitative trading (QT). Despite great research efforts devoted to leveraging deep learning (DL) methods for building better QT strategies, existing studies still face serious challenges, especially from the side of finance, such as the balance of risk and return, the resistance to extreme loss, and the interpretability of strategies, which limit the application of DL-based strategies in real-life financial markets. In this work, we propose AlphaStock, a novel reinforcement learning (RL) based investment strategy enhanced by interpretable deep attention networks, to address the above challenges. Our main contributions are summarized as follows: i) We integrate deep attention networks with a Sharpe ratio-oriented reinforcement learning framework to achieve a risk-return balanced investment strategy; ii) We suggest modeling interrelationships among assets to avoid selection bias and develop a cross-asset attention mechanism; iii) To the best of our knowledge, this work is among the first to offer an interpretable investment strategy using deep reinforcement learning models. The experiments on long-periodic U.S. and Chinese markets demonstrate the effectiveness and robustness of AlphaStock over diverse market states. It turns out that AlphaStock tends to select as winners the stocks with high long-term growth, low volatility, high intrinsic value, and recent undervaluation.

* Accepted for POSTER presentation at KDD2019 Applied Data Science Track
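
The two ingredients named in the title and abstract, a buying-winners-and-selling-losers selection and a Sharpe ratio objective, can be illustrated with a minimal sketch. The scores, the group size, the zero risk-free rate, and the use of population standard deviation are all assumptions for illustration; AlphaStock learns its scores with attention networks, which this sketch does not attempt to reproduce.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def select_winners_losers(
    scores: Dict[str, float], group_size: int
) -> Tuple[List[str], List[str]]:
    """Rank assets by score; go long the top group and short the bottom group."""
    if not 1 <= group_size <= len(scores) // 2:
        raise ValueError("group_size must leave disjoint winner and loser groups")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:group_size], ranked[-group_size:]


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by its (population) standard deviation."""
    excess = [r - risk_free for r in returns]
    volatility = statistics.pstdev(excess)
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return statistics.mean(excess) / volatility


# Hypothetical per-asset scores and per-period portfolio returns.
winners, losers = select_winners_losers(
    {"A": 0.9, "B": 0.1, "C": 0.5, "D": -0.3}, group_size=1
)
print(winners, losers)
print(sharpe_ratio([0.01, 0.03]))
```

Using the Sharpe ratio as the reward penalizes volatile return streams, which is one way to express the risk-return balance the abstract refers to.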
