Xinyu Guan

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Jan 08, 2025

Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, Mao Yang

Abstract: We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1, without distillation from superior models. rStar-Math achieves this by exercising "deep thinking" through Monte Carlo Tree Search (MCTS), where a math policy SLM performs test-time search guided by an SLM-based process reward model. rStar-Math introduces three innovations to tackle the challenges in training the two SLMs: (1) a novel code-augmented CoT data synthesis method, which performs extensive MCTS rollouts to generate step-by-step verified reasoning trajectories used to train the policy SLM; (2) a novel process reward model training method that avoids naive step-level score annotation, yielding a more effective process preference model (PPM); (3) a self-evolution recipe in which the policy SLM and PPM are built from scratch and iteratively evolved to improve reasoning capabilities. Through 4 rounds of self-evolution with millions of synthesized solutions for 747k math problems, rStar-Math boosts SLMs' math reasoning to state-of-the-art levels. On the MATH benchmark, it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%. On the USA Math Olympiad (AIME), rStar-Math solves an average of 53.3% (8/15) of problems, ranking among the top 20% of the brightest high school math students. Code and data will be available at https://github.com/microsoft/rStar.
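
The abstract describes test-time search in which a reward model guides MCTS over reasoning steps. As a rough illustration only, the sketch below shows the textbook UCT selection and backup steps of MCTS. The abstract does not state the exact selection rule or reward definition, so the `Node` class, the exploration constant `c`, and the scalar `reward` (standing in for a process-reward score) are all assumptions made for this example, not the authors' implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    step: str                      # hypothetical: text of one reasoning step
    visits: int = 0                # how many rollouts passed through this node
    total_value: float = 0.0       # sum of rewards backed up through this node
    children: list = field(default_factory=list)

    def q(self) -> float:
        """Mean backed-up reward; 0.0 for an unvisited node."""
        return self.total_value / self.visits if self.visits else 0.0


def uct_score(parent: Node, child: Node, c: float = 1.4) -> float:
    """Standard UCT: exploitation term plus an exploration bonus."""
    if child.visits == 0:
        return math.inf            # always try unvisited steps first
    return child.q() + c * math.sqrt(math.log(parent.visits) / child.visits)


def select_child(parent: Node, c: float = 1.4) -> Node:
    """Pick the child step with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(parent, ch, c))


def backup(path: list, reward: float) -> None:
    """Propagate a rollout's reward (assumed to come from a reward model)
    to every node on the path from the root to the leaf."""
    for node in path:
        node.visits += 1
        node.total_value += reward
```

With equal visit counts, `select_child` prefers the step with the higher mean reward; any unvisited step is explored before revisiting the others.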

Basket-Enhanced Heterogenous Hypergraph for Price-Sensitive Next Basket Recommendation

Sep 18, 2024

Yuening Zhou, Yulin Wang, Qian Cui, Xinyu Guan, Francisco Cisternas

Abstract: Next Basket Recommendation (NBR) is a new type of recommender system that predicts combinations of items users are likely to purchase together. Existing NBR models often overlook a crucial factor, price, and do not fully capture item-basket-user interactions. To address these limitations, we propose a novel method called Basket-augmented Dynamic Heterogeneous Hypergraph (BDHH). BDHH utilizes a heterogeneous multi-relational graph to capture the intricate relationships among item features, with price as a critical factor. Moreover, our approach includes a basket-guided dynamic augmentation network that dynamically enhances item-basket-user interactions. Experiments on real-world datasets demonstrate that BDHH significantly improves recommendation accuracy, providing a more comprehensive understanding of user behavior.

OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining

Feb 24, 2024

Fanjin Zhang, Shijie Shi, Yifan Zhu, Bo Chen, Yukuo Cen, Jifan Yu, Yelin Chen, Lulu Wang, Qingfei Zhao, Yuqing Cheng(+12 more)

Abstract: With the rapid proliferation of scientific literature, versatile academic knowledge services increasingly rely on comprehensive academic graph mining. Despite the availability of public academic graphs, benchmarks, and datasets, these resources often fall short in multi-aspect and fine-grained annotations, are constrained to specific task types and domains, or lack underlying real academic graphs. In this paper, we present OAG-Bench, a comprehensive, multi-aspect, and fine-grained human-curated benchmark based on the Open Academic Graph (OAG). OAG-Bench covers 10 tasks, 20 datasets, 70+ baselines, and 120+ experimental results to date. We propose new data annotation strategies for certain tasks and offer a suite of data pre-processing code, algorithm implementations, and standardized evaluation protocols to facilitate academic graph mining. Extensive experiments reveal that even advanced algorithms like large language models (LLMs) encounter difficulties in addressing key challenges in certain tasks, such as paper source tracing and scholar profiling. We also introduce the Open Academic Graph Challenge (OAG-Challenge) to encourage community input and sharing. We envisage that OAG-Bench can serve as a common ground for the community to evaluate and compare algorithms in academic graph mining, thereby accelerating algorithm development and advancement in this field. OAG-Bench is accessible at https://www.aminer.cn/data/.

* 8 pages, 5 appendix pages

GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation

Mar 26, 2023

Ji Qi, Jifan Yu, Teng Tu, Kunyu Gao, Yifan Xu, Xinyu Guan, Xiaozhi Wang, Yuxiao Dong, Bin Xu, Lei Hou(+5 more)

Abstract: Despite the recent emergence of video captioning models, generating vivid, fine-grained video descriptions grounded in background knowledge (i.e., long and informative commentary about domain-specific scenes with appropriate reasoning) remains far from solved, even though it has valuable applications such as automatic sports narration. In this paper, we present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples, and use it to propose a challenging new task setting, Knowledge-grounded Video Captioning (KGVC). Moreover, we experimentally adapt existing methods to show the difficulty of this valuable and applicable task and potential directions for solving it.

Polycentric Clustering and Structural Regularization for Source-free Unsupervised Domain Adaptation

Oct 14, 2022

Xinyu Guan, Han Sun, Ningzhong Liu, Huiyu Zhou

Abstract: Source-Free Domain Adaptation (SFDA) aims to solve the domain adaptation problem by transferring the knowledge learned from a pre-trained source model to an unseen target domain. Most existing methods assign pseudo-labels to the target data by generating feature prototypes. However, due to the discrepancy in the data distribution between the source domain and the target domain and category imbalance in the target domain, there are severe class biases in the generated feature prototypes and noisy pseudo-labels. Besides, the data structure of the target domain, which is crucial for clustering, is often ignored. In this paper, a novel framework named PCSR is proposed to tackle SFDA via a novel intra-class Polycentric Clustering and Structural Regularization strategy. Firstly, an inter-class balanced sampling strategy is proposed to generate representative feature prototypes for each class. Furthermore, k-means clustering is introduced to generate multiple clustering centers for each class in the target domain to obtain robust pseudo-labels. Finally, to enhance the model's generalization, structural regularization is introduced for the target domain. Extensive experiments on three UDA benchmark datasets show that our method performs better than or comparably to other state-of-the-art methods, demonstrating our approach's superiority for visual domain adaptation problems.

* BMVC 2022; code: https://github.com/Gxinuu/PCSR
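
The PCSR abstract mentions using k-means to build several cluster centers per class and then deriving pseudo-labels from them. The following is a minimal pure-Python sketch of that general idea only: it runs a small deterministic k-means within each initially predicted class, then relabels every sample by the class of its nearest center. The function names, the evenly spaced initialization, and the squared Euclidean distance are assumptions chosen for illustration; the paper's balanced sampling and structural regularization are not modeled here.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=20):
    """Tiny deterministic k-means (evenly spaced initial centers)."""
    k = min(k, len(points))
    step = len(points) / k
    centers = [list(points[int(i * step)]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: dist2(p, centers[i]))
            groups[j].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


def polycentric_pseudo_labels(features, initial_labels, k=2):
    """Relabel each sample by the class owning its nearest of k centers."""
    by_class = {}
    for f, y in zip(features, initial_labels):
        by_class.setdefault(y, []).append(f)
    centers = [(c, y) for y, pts in by_class.items() for c in kmeans(pts, k)]
    return [min(centers, key=lambda cy: dist2(f, cy[0]))[1] for f in features]
```

A single prototype for a class with two well-separated modes would sit between them; keeping several centers per class lets each mode attract its own samples, which is the intuition this sketch is meant to convey.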

TDGIA:Effective Injection Attacks on Graph Neural Networks

Jun 12, 2021

Xu Zou, Qinkai Zheng, Yuxiao Dong, Xinyu Guan, Evgeny Kharlamov, Jialiang Lu, Jie Tang

Abstract: Graph Neural Networks (GNNs) have achieved promising performance in various real-world applications. However, recent studies have shown that GNNs are vulnerable to adversarial attacks. In this paper, we study a recently introduced realistic attack scenario on graphs: the graph injection attack (GIA). In the GIA scenario, the adversary is not able to modify the existing link structure and node attributes of the input graph; instead, the attack is performed by injecting adversarial nodes into it. We present an analysis of the topological vulnerability of GNNs under the GIA setting, based on which we propose the Topological Defective Graph Injection Attack (TDGIA) for effective injection attacks. TDGIA first introduces the topological defective edge selection strategy to choose the original nodes for connecting with the injected ones. It then designs the smooth feature optimization objective to generate the features for the injected nodes. Extensive experiments on large-scale datasets show that TDGIA can consistently and significantly outperform various attack baselines in attacking dozens of defense GNN models. Notably, the performance drop on target GNNs resulting from TDGIA is more than double the damage brought by the best attack solution among hundreds of submissions on KDD-CUP 2020.

* KDD 2021 research track paper

Machine Learning for Exam Triage

Apr 30, 2018

Xinyu Guan, Jessica Lee, Peter Wu, Yue Wu

Abstract: In this project, we extend the state-of-the-art CheXNet (Rajpurkar et al. [2017]) by making use of the additional non-image features in the dataset. Our model produced better AUROC scores than the original CheXNet.
