Zhihong Shen

Microsoft Research

Reasoning Fails Where Step Flow Breaks

Apr 08, 2026

Xiaoyu Xu, Yulan Pan, Xiaosong Yuan, Zhihong Shen, Minghao Su, Yuanhao Su, Xiaofeng Zhang

Abstract: Large reasoning models (LRMs) that generate long chains of thought now perform well on multi-step math, science, and coding tasks. However, their behavior is still unstable and hard to interpret, and existing analysis tools struggle with such long, structured reasoning traces. We introduce Step-Saliency, which pools attention–gradient scores into step-to-step maps along the question–thinking–summary trajectory. Across several models, Step-Saliency reveals two recurring information-flow failures: Shallow Lock-in, where shallow layers over-focus on the current step and barely use earlier context, and Deep Decay, where deep layers gradually lose saliency on the thinking segment and the summary increasingly attends to itself and the last few steps. Motivated by these patterns, we propose StepFlow, a saliency-inspired test-time intervention that adjusts shallow saliency patterns measured by Step-Saliency via Odds-Equal Bridge and adds a small step-level residual in deep layers via Step Momentum Injection. StepFlow improves accuracy on math, science, and coding tasks across multiple LRMs without retraining, indicating that repairing information flow can recover part of their missing reasoning performance.

* Accepted at ACL 2026

Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding

May 23, 2023

Yu Zhang, Hao Cheng, Zhihong Shen, Xiaodong Liu, Ye-Yi Wang, Jianfeng Gao

Abstract: Scientific literature understanding tasks have gained significant attention due to their potential to accelerate scientific discovery. Pre-trained language models (LMs) have shown effectiveness in these tasks, especially when tuned via contrastive learning. However, jointly utilizing pre-training data across multiple heterogeneous tasks (e.g., extreme classification, citation prediction, and literature search) remains largely unexplored. To bridge this gap, we propose a multi-task contrastive learning framework, SciMult, with a focus on facilitating common knowledge sharing across different scientific literature understanding tasks while preventing task-specific skills from interfering with each other. Specifically, we explore two techniques: task-aware specialization and instruction tuning. The former adopts a Mixture-of-Experts Transformer architecture with task-aware sub-layers; the latter prepends task-specific instructions to the input text so as to produce task-aware outputs. Extensive experiments on a comprehensive collection of benchmark datasets verify the effectiveness of our task-aware specialization strategy in various tasks, where we outperform state-of-the-art scientific LMs.

* 15 pages

Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification

Feb 11, 2022

Yu Zhang, Zhihong Shen, Chieh-Han Wu, Boya Xie, Junheng Hao, Ye-Yi Wang, Kuansan Wang, Jiawei Han

Abstract: Large-scale multi-label text classification (LMTC) aims to associate a document with its relevant labels from a large candidate set. Most existing LMTC approaches rely on massive human-annotated training data, which are often costly to obtain and suffer from a long-tailed label distribution (i.e., many labels occur only a few times in the training set). In this paper, we study LMTC under the zero-shot setting, which does not require any annotated documents with labels and only relies on label surface names and descriptions. To train a classifier that calculates the similarity score between a document and a label, we propose a novel metadata-induced contrastive learning (MICoL) method. Unlike previous text-based contrastive learning techniques, MICoL exploits document metadata (e.g., authors, venues, and references of research papers), which are widely available on the Web, to derive similar document-document pairs. Experimental results on two large-scale datasets show that: (1) MICoL significantly outperforms strong zero-shot text classification and contrastive learning baselines; (2) MICoL is on par with the state-of-the-art supervised metadata-aware LMTC method trained on 10K-200K labeled documents; and (3) MICoL tends to predict more infrequent labels than supervised methods, thus alleviating the deteriorated performance on long-tailed labels.

* 12 pages; Accepted to WWW 2022

Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

Jun 25, 2021

Yu Wang, Jinchao Li, Tristan Naumann, Chenyan Xiong, Hao Cheng, Robert Tinn, Cliff Wong, Naoto Usuyama, Richard Rogahn, Zhihong Shen (+5 more)

Abstract: Information overload is a prevalent challenge in many high-value domains. A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months. In general, biomedical literature expands by two papers every minute, totalling over a million new papers every year. Search in the biomedical realm, and in many other vertical domains, is challenging due to the scarcity of direct supervision from click logs. Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck. We propose a general approach for vertical search based on domain-specific pretraining and present a case study for the biomedical domain. Despite being substantially simpler and not using any relevance labels for training or development, our method performs comparably to or better than the best systems in the official TREC-COVID evaluation, a COVID-related biomedical search competition. Using distributed computing in modern cloud infrastructure, our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search, a new search experience for biomedical literature: https://aka.ms/biomedsearch.

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Feb 15, 2021

Yu Zhang, Zhihong Shen, Yuxiao Dong, Kuansan Wang, Jiawei Han

Abstract: Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set. Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications. However, most existing studies focus only on modeling the text information, with a few attempts to utilize either metadata or hierarchy signals, but not both of them. In this paper, we bridge the gap by formalizing the problem of metadata-aware text classification in a large label hierarchy (e.g., with tens of thousands of labels). To address this problem, we present the MATCH solution, an end-to-end framework that leverages both metadata and hierarchy information. To incorporate metadata, we pre-train the embeddings of text and metadata in the same space and also leverage fully connected attention to capture the interrelations between them. To leverage the label hierarchy, we propose different ways to regularize the parameters and output probability of each child label by its parents. Extensive experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH over state-of-the-art deep learning baselines.

* 12 pages; Accepted to WWW 2021

CORD-19: The Covid-19 Open Research Dataset

Apr 25, 2020

Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Darrin Eide, Kathryn Funk, Rodney Kinney, Ziyang Liu, William Merrill (+14 more)

Abstract: The Covid-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on Covid-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full text papers. Since its release, CORD-19 has been downloaded over 75K times and has served as the basis of many Covid-19 text mining and discovery systems. In this article, we describe the mechanics of dataset construction, highlighting challenges and key design decisions, provide an overview of how CORD-19 has been used, and preview tools and upcoming shared tasks built around the dataset. We hope this resource will continue to bring together the computing community, biomedical experts, and policy makers in the search for effective treatments and management policies for Covid-19.

* 10 pages, 3 figures

TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network

Jan 26, 2020

Jiaming Shen, Zhihong Shen, Chenyan Xiong, Chi Wang, Kuansan Wang, Jiawei Han

Abstract: Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications. For example, online retailers (e.g., Amazon and eBay) use taxonomies for product recommendation, and web search engines (e.g., Google and Bing) leverage taxonomies to enhance query understanding. Enormous effort has gone into constructing taxonomies either manually or semi-automatically. However, with the fast-growing volume of web content, existing taxonomies will become outdated and fail to capture emerging knowledge. Therefore, in many applications, dynamic expansions of an existing taxonomy are in great demand. In this paper, we study how to expand an existing taxonomy by adding a set of new concepts. We propose a novel self-supervised framework, named TaxoExpan, which automatically generates a set of <query concept, anchor concept> pairs from the existing taxonomy as training data. Using such self-supervision data, TaxoExpan learns a model to predict whether a query concept is the direct hyponym of an anchor concept. We develop two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural network that encodes the local structure of an anchor concept in the existing taxonomy, and (2) a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data. Extensive experiments on three large-scale datasets from different domains demonstrate both the effectiveness and the efficiency of TaxoExpan for taxonomy expansion.

* WWW 2020

A Scalable Hybrid Research Paper Recommender System for Microsoft Academic

May 21, 2019

Anshul Kanakia, Zhihong Shen, Darrin Eide, Kuansan Wang

Abstract: We present the design and methodology for the large-scale hybrid paper recommender system used by Microsoft Academic. The system provides recommendations for approximately 160 million English research papers and patents. Our approach handles incomplete citation information while also alleviating the cold-start problem that often affects other recommender systems. We use the Microsoft Academic Graph (MAG), titles, and available abstracts of research papers to build a recommendation list for all documents, thereby combining co-citation and content-based approaches. Tuning system parameters also allows for blending and prioritization of each approach, which, in turn, allows us to balance paper novelty versus authority in recommendation results. We evaluate the generated recommendations via a user study of 40 participants, with over 2400 recommendation pairs graded, and discuss the quality of the results using P@10 and nDCG scores. We see that there is a strong correlation between participant scores and the similarity rankings produced by our system, but that additional focus needs to be put towards improving recommender precision, particularly for content-based recommendations. The results of the user survey and associated analysis scripts are made available via GitHub, and the recommendations produced by our system are available as part of the MAG on Azure to facilitate further research and light up novel research paper recommendation applications.

* In The World Wide Web Conference (WWW '19). ACM, New York, NY, USA, 2893-2899
* 7 pages, 7 figures. Short paper at The Web Conference 2019, San Francisco, USA

A Web-scale system for scientific knowledge exploration

May 30, 2018

Zhihong Shen, Hao Ma, Kuansan Wang

Abstract: To enable efficient exploration of Web-scale scientific knowledge, it is necessary to organize scientific publications into a hierarchical concept structure. In this work, we present a large-scale system to (1) identify hundreds of thousands of scientific concepts, (2) tag these identified concepts to hundreds of millions of scientific publications by leveraging both text and graph structure, and (3) build a six-level concept hierarchy with a subsumption-based model. The system builds the most comprehensive cross-domain scientific concept ontology published to date, with more than 200 thousand concepts and over one million relationships.

* 6 pages; accepted as an ACL 2018 demo paper
