Neeraj Varshney

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts

Apr 21, 2026

Chaitanya Dwivedi, Binxuan Huang, Himanshu Gupta, Pratik Jayarao, Neeraj Varshney, Bing Yin

Abstract:Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active computation, model quality scales predictably with total parameters, and MoEs realize this by increasing expert count. However, training large MoEs is expensive, as memory requirements and inter-device communication both scale with total parameter count. We propose expert upcycling, a method for progressively expanding MoE capacity by increasing the number of experts during continued pre-training (CPT). Given a trained E-expert model, the upcycling operator constructs an mE-expert model through expert duplication and router extension while holding top-K routing fixed, preserving per-token inference cost. Duplication provides a warm initialization: the expanded model inherits the source checkpoint's learned representations, starting from a substantially lower loss than random initialization. Subsequent CPT then breaks the symmetry among duplicated experts to drive specialization. We formalize the upcycling operator and develop a theoretical framework decomposing the quality gap into a capacity term and an initialization term. We further introduce utility-based expert selection, which uses gradient-based importance scores to guide non-uniform duplication, more than tripling gap closure when CPT is limited. In our 7B-13B total parameter experiments, the upcycled model matches the fixed-size baseline on validation loss while saving 32% of GPU hours. Comprehensive ablations across model scales, activation ratios, MoE architectures, and training budgets yield a practical recipe for deploying expert upcycling, establishing it as a principled, compute-efficient alternative to training large MoE models from scratch.

* 12 Pages, 5 Tables. 14 Pages in Appendix
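
The duplication step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `upcycle` and `route_top_k`, the list-based weight layout, and the choice to copy router rows verbatim are all assumptions made for the example. The abstract does not specify how the router is extended or how utility-based selection picks which experts to duplicate.

```python
import copy


def upcycle(experts, router_rows, m):
    """Build an m*E-expert model from an E-expert model by plain duplication.

    experts:     list of E expert parameter objects (hypothetical layout).
    router_rows: list of E router weight vectors, one per expert.
    m:           expansion factor (assumed uniform across experts here).
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    # Each copy starts from the source checkpoint's weights (warm initialization).
    new_experts = [copy.deepcopy(e) for _ in range(m) for e in experts]
    # Assumption: the router is extended by copying each expert's row as well.
    new_router = [list(row) for _ in range(m) for row in router_rows]
    return new_experts, new_router


def route_top_k(x, router_rows, k):
    """Return the indices of the k experts with the highest router logits."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_rows]
    ranked = sorted(range(len(logits)), key=lambda i: -logits[i])
    return ranked[:k]


experts = [[[1.0]], [[2.0]]]          # two toy experts
router = [[1.0, 0.0], [0.0, 1.0]]     # one router row per expert
big_experts, big_router = upcycle(experts, router, m=3)
chosen = route_top_k([1.0, 0.5], big_router, k=2)
```

In this toy setup the expanded model has six experts while `k` stays at 2, so per-token compute is unchanged. The two selected experts are copies of the same source expert with identical logits, which illustrates why the duplicates remain symmetric until continued pre-training separates them.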

Evaluation of gNB Monostatic Sensing for UAV Use Case

Apr 02, 2026

Steve Blandino, Neeraj Varshney, Jian Wang, Jack Chuang, Camillo Gentile, Nada Golmie

Abstract:3GPP Release 19 has initiated the standardization of integrated sensing and communications (ISAC), including a channel model for monostatic sensing, evaluation scenarios, and performance assessment methodologies. These common assumptions provide an important basis for ISAC evaluation, but reproducible end-to-end studies still require a transparent sensing implementation. This paper evaluates 5G New Radio (NR) base station (gNB)-based monostatic sensing for the Unmanned Aerial Vehicle (UAV) use case using a 5G NR downlink Cyclic Prefix-Orthogonal Frequency Division Multiplexing (CP-OFDM) waveform and positioning reference signals (PRS), following 3GPP Urban Macro-Aerial Vehicle (UMa-AV) scenario assumptions. We present an end-to-end processing chain for multi-target detection and 3D localization, achieving more than 70% detection probability with less than 5% false alarm rate, in the considered scenario. For correctly detected targets, localization errors are on the order of a few meters, with a 90th-percentile error of 4m and 6m in the vertical and horizontal directions, respectively. To support reproducible baseline studies and further research, we release the simulator 5GNRad, which reproduces our evaluation

Stabilizing Reinforcement Learning for Honesty Alignment in Language Models on Deductive Reasoning

Nov 12, 2025

Jiarui Liu, Kaustubh Dhole, Yingheng Wang, Haoyang Wen, Sarah Zhang, Haitao Mao, Gaotang Li, Neeraj Varshney, Jingguo Liu, Xiaoman Pan

Abstract:Reinforcement learning with verifiable rewards (RLVR) has recently emerged as a promising framework for aligning language models with complex reasoning objectives. However, most existing methods optimize only for final task outcomes, leaving models vulnerable to collapse when negative rewards dominate early training. This challenge is especially pronounced in honesty alignment, where models must not only solve answerable queries but also identify when conclusions cannot be drawn from the given premises. Deductive reasoning provides an ideal testbed because it isolates reasoning capability from reliance on external factual knowledge. To investigate honesty alignment, we curate two multi-step deductive reasoning datasets from graph structures, one for linear algebra and one for logical inference, and introduce unanswerable cases by randomly perturbing an edge in half of the instances. We find that GRPO, with or without supervised fine-tuning initialization, struggles on these tasks. Through extensive experiments across three models, we evaluate stabilization strategies and show that curriculum learning provides some benefit but requires carefully designed in-distribution datasets with controllable difficulty. To address these limitations, we propose Anchor, a reinforcement learning method that injects ground-truth trajectories into rollouts, preventing early training collapse. Our results demonstrate that this method stabilizes learning and significantly improves the overall reasoning performance, underscoring the importance of training dynamics for enabling reliable deductive reasoning in aligned language models.
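
One way to see why injecting a ground-truth trajectory could help is to look at group-normalized advantages of the kind GRPO uses. The sketch below is an illustration under stated assumptions, not the Anchor method itself: the helper names `group_advantages` and `inject_ground_truth`, the reward values, and the choice to append the trajectory to the group (rather than replace a rollout) are all hypothetical.

```python
import statistics


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def inject_ground_truth(rollouts, rewards, gt_trajectory, gt_reward=1.0):
    """Append a known-correct trajectory and its reward to a rollout group.

    Assumption: the ground-truth trajectory is simply added to the group.
    """
    return rollouts + [gt_trajectory], rewards + [gt_reward]


# A group in which every sampled rollout failed (all rewards negative).
rollouts = ["bad-1", "bad-2", "bad-3", "bad-4"]
rewards = [-1.0, -1.0, -1.0, -1.0]
before = group_advantages(rewards)

rollouts2, rewards2 = inject_ground_truth(rollouts, rewards, "ground-truth")
after = group_advantages(rewards2)
```

With identical rewards the advantages in `before` are all zero, so the group carries no learning signal. After the injection, the ground-truth trajectory receives a positive advantage and the failed rollouts receive negative ones.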

Deep Learning-based Human Gesture Channel Modeling for Integrated Sensing and Communication Scenarios

Jul 09, 2025

Zhengyu Zhang, Neeraj Varshney, Jelena Senic, Raied Caromi, Samuel Berweger, Camillo Gentile, Enrico M. Vitucci, Ruisi He, Vittorio Degli-Esposti

Abstract:With the development of Integrated Sensing and Communication (ISAC) for Sixth-Generation (6G) wireless systems, contactless human recognition has emerged as one of the key application scenarios. Since human gesture motion induces subtle and random variations in wireless multipath propagation, how to accurately model human gesture channels has become a crucial issue for the design and validation of ISAC systems. To this end, this paper proposes a deep learning-based human gesture channel modeling framework for ISAC scenarios, in which the human body is decomposed into multiple body parts, and the mapping between human gestures and their corresponding multipath characteristics is learned from real-world measurements. Specifically, a Poisson neural network is employed to predict the number of Multi-Path Components (MPCs) for each human body part, while Conditional Variational Auto-Encoders (C-VAEs) are reused to generate the scattering points, which are further used to reconstruct continuous channel impulse responses and micro-Doppler signatures. Simulation results demonstrate that the proposed method achieves high accuracy and generalization across different gestures and subjects, providing an interpretable approach for data augmentation and the evaluation of gesture-based ISAC systems.

Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models

Jun 24, 2024

Nisarg Patel, Mohith Kulkarni, Mihir Parmar, Aashna Budhiraja, Mutsumi Nakamura, Neeraj Varshney, Chitta Baral

Abstract:As Large Language Models (LLMs) continue to exhibit remarkable performance in natural language understanding tasks, there is a crucial need to measure their ability for human-like multi-step logical reasoning. Existing logical reasoning evaluation benchmarks often focus primarily on simplistic single-step or multi-step reasoning with a limited set of inference rules. Furthermore, the lack of datasets for evaluating non-monotonic reasoning represents a crucial gap since it aligns more closely with human-like reasoning. To address these limitations, we propose Multi-LogiEval, a comprehensive evaluation dataset encompassing multi-step logical reasoning with various inference rules and depths. Multi-LogiEval covers three logic types (propositional, first-order, and non-monotonic), consisting of more than 30 inference rules and more than 60 of their combinations with various depths. Leveraging this dataset, we conduct evaluations on a range of LLMs including GPT-4, ChatGPT, Gemini-Pro, Yi, Orca, and Mistral, employing zero-shot chain-of-thought prompting. Experimental results show that there is a significant drop in the performance of LLMs as the reasoning steps/depth increases (average accuracy falls from ~68% at depth-1 to ~43% at depth-5). We further conduct a thorough investigation of reasoning chains generated by LLMs which reveals several important findings. We believe that Multi-LogiEval facilitates future research for evaluating and enhancing the logical reasoning ability of LLMs. Data is available at https://github.com/Mihir3009/Multi-LogiEval.

* 23 Pages

Efficient Transmission Scheme for LEO Satellite-Based NB-IoT: A Data-Driven Perspective

Jun 20, 2024

Ayush Kumar Dwivedi, Houcine Chougrani, Sachin Chaudhari, Neeraj Varshney, Symeon Chatzinotas

Abstract:This study analyses the medium access control (MAC) layer aspects of a low-Earth-orbit (LEO) satellite-based Internet of Things (IoT) network. A transmission scheme based on change detection is proposed to accommodate more users within the network and improve energy efficiency. Machine learning (ML) algorithms are also proposed to reduce the payload size by leveraging the correlation among the sensed parameters. Real-world data from an IoT testbed deployed for a smart city application is utilised to analyse the performance regarding collision probability, effective data received and average battery lifetime. The findings reveal that the traffic pattern, after the proposed scheme is applied, differs from the commonly assumed Poisson traffic, thus demonstrating the value of IoT data from an actual deployment. It is demonstrated that the transmission scheme facilitates accommodating more devices while targeting a specific collision probability. Considering the link budget for a direct access NB-IoT scenario, more data is effectively offloaded to the server within the limited visibility of LEO satellites. The average battery lifetimes are also demonstrated to increase manyfold by using the proposed access schemes and ML algorithms.
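
The idea of a change-detection-based transmission scheme can be illustrated with a simple threshold rule. This is only a sketch under assumptions: the abstract does not state the detection rule, so the absolute-difference test, the `threshold` parameter, and the function name `change_detection_filter` are hypothetical.

```python
def change_detection_filter(readings, threshold):
    """Return (index, value) pairs for the readings a device would transmit.

    Assumed rule: a reading is sent only when it differs from the last
    transmitted value by at least `threshold`; the first reading is always sent.
    """
    sent = []
    last_sent = None
    for index, value in enumerate(readings):
        if last_sent is None or abs(value - last_sent) >= threshold:
            sent.append((index, value))
            last_sent = value
    return sent


readings = [20.0, 20.1, 20.2, 21.0, 21.05, 19.9]
transmitted = change_detection_filter(readings, threshold=0.5)
```

Here only three of the six readings are transmitted, which is the kind of reduction in uplink attempts that would lower collision probability and energy use. Because transmissions now depend on the sensed signal, their timing need not resemble Poisson arrivals.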

Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation

Jun 08, 2024

Neeraj Varshney, Satyam Raj, Venkatesh Mishra, Agneet Chatterjee, Ritika Sarkar, Amir Saeidi, Chitta Baral

Abstract:Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks. However, they have been shown to suffer from a critical limitation pertinent to 'hallucination' in their output. Recent research has focused on investigating and addressing this problem for a variety of tasks such as biography generation, question answering, abstractive summarization, and dialogue generation. However, the crucial aspect pertaining to 'negation' has remained considerably underexplored. Negation is important because it adds depth and nuance to the understanding of language and is also crucial for logical reasoning and inference. In this work, we address the above limitation and particularly focus on studying the impact of negation in LLM hallucinations. Specifically, we study four tasks with negation: 'false premise completion', 'constrained fact generation', 'multiple choice question answering', and 'fact generation'. We show that open-source state-of-the-art LLMs such as LLaMA-2-chat, Vicuna, and Orca-2 hallucinate considerably on all these tasks involving negation which underlines a critical shortcoming of these models. Addressing this problem, we further study numerous strategies to mitigate these hallucinations and demonstrate their impact.

Chaos with Keywords: Exposing Large Language Models Sycophancy to Misleading Keywords and Evaluating Defense Strategies

Jun 06, 2024

Aswin RRV, Nemika Tyagi, Md Nayem Uddin, Neeraj Varshney, Chitta Baral

Abstract:This study explores the sycophantic tendencies of Large Language Models (LLMs), where these models tend to provide answers that match what users want to hear, even if they are not entirely correct. The motivation behind this exploration stems from the common behavior observed in individuals searching the internet for facts with partial or misleading knowledge. Similar to using web search engines, users may recall fragments of misleading keywords and submit them to an LLM, hoping for a comprehensive response. Our empirical analysis of several LLMs shows the potential danger of these models amplifying misinformation when presented with misleading keywords. Additionally, we thoroughly assess four existing hallucination mitigation strategies to reduce LLMs' sycophantic behavior. Our experiments demonstrate the effectiveness of these strategies for generating factually correct statements. Furthermore, our analyses delve into knowledge-probing experiments on factual keywords and different categories of sycophancy mitigation.

* To be published in Findings of ACL 2024

Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models

Apr 23, 2024

Mihir Parmar, Nisarg Patel, Neeraj Varshney, Mutsumi Nakamura, Man Luo, Santosh Mashetty, Arindam Mitra, Chitta Baral

Abstract:Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks. But can they really "reason" over natural language? This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied. However, the crucial skill pertaining to 'logical reasoning' has remained underexplored. Existing work investigating this reasoning ability of LLMs has focused only on a couple of inference rules (such as modus ponens and modus tollens) of propositional and first-order logic. Addressing the above limitation, we comprehensively evaluate the logical reasoning ability of LLMs on 25 different reasoning patterns spanning propositional, first-order, and non-monotonic logics. To enable systematic evaluation, we introduce LogicBench, a natural language question-answering dataset focusing on the use of a single inference rule. We conduct detailed analysis with a range of LLMs such as GPT-4, ChatGPT, Gemini, Llama-2, and Mistral using chain-of-thought prompting. Experimental results show that existing LLMs do not fare well on LogicBench; especially, they struggle with instances involving complex reasoning and negations. Furthermore, they sometimes overlook contextual information necessary for reasoning to arrive at the correct conclusion. We believe that our work and findings facilitate future research for evaluating and enhancing the logical reasoning ability of LLMs. Data and code are available at https://github.com/Mihir3009/LogicBench.

* 29 Pages

The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness

Dec 30, 2023

Neeraj Varshney, Pavel Dolin, Agastya Seth, Chitta Baral

Abstract:As Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications, their safety concerns become critical areas of NLP research. This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark: a collection of diverse safe and unsafe prompts with carefully designed evaluation methods that facilitate systematic evaluation, comparison, and analysis over 'safety' and 'over-defensiveness.' With SODE, we study a variety of LLM defense strategies over multiple state-of-the-art LLMs, which reveals several interesting and important findings, such as (a) the widely popular 'self-checking' techniques indeed improve the safety against unsafe inputs, but this comes at the cost of extreme over-defensiveness on the safe inputs, (b) providing a safety instruction along with in-context exemplars (of both safe and unsafe inputs) consistently improves safety and also mitigates undue over-defensiveness of the models, and (c) providing contextual knowledge easily breaks the safety guardrails and makes the models more vulnerable to generating unsafe responses. Overall, our work reveals numerous such critical findings that we believe will pave the way and facilitate further research in improving the safety of LLMs.
