Ness Shroff

Large Language Models Achieve Gold Medal Performance at International Astronomy & Astrophysics Olympiad

Oct 06, 2025

Lucas Carrit Delgado Pinheiro, Ziru Chen, Bruno Caixeta Piazza, Ness Shroff, Yingbin Liang, Yuan-Sen Ting, Huan Sun

Abstract:While task-specific demonstrations show early success in applying large language models (LLMs) to automate some astronomical research tasks, they only provide incomplete views of all necessary capabilities in solving astronomy problems, calling for more thorough understanding of LLMs' strengths and limitations. So far, existing benchmarks and evaluations focus on simple question-answering that primarily tests astronomical knowledge and fails to evaluate the complex reasoning required for real-world research in the discipline. Here, we address this gap by systematically benchmarking five state-of-the-art LLMs on the International Olympiad on Astronomy and Astrophysics (IOAA) exams, which are designed to examine deep conceptual understanding, multi-step derivations, and multimodal analysis. With average scores of 85.6% and 84.2%, Gemini 2.5 Pro and GPT-5 (the two top-performing models) not only achieve gold medal level performance but also rank in the top two among ~200-300 participants in all four IOAA theory exams evaluated (2022-2025). In comparison, results on the data analysis exams show more divergence. GPT-5 still excels in the exams with an 88.5% average score, ranking top 10 among the participants in the four most recent IOAAs, while other models' performances drop to 48-76%. Furthermore, our in-depth error analysis underscores conceptual reasoning, geometric reasoning, and spatial visualization (52-79% accuracy) as consistent weaknesses among all LLMs. Hence, although LLMs approach peak human performance in theory exams, critical gaps must be addressed before they can serve as autonomous research agents in astronomy.

* 18 pages, 6 figures, to be submitted, comments are welcome

Artificial Intelligence of Things: A Survey

Oct 25, 2024

Shakhrul Iman Siam, Hyunho Ahn, Li Liu, Samiul Alam, Hui Shen, Zhichao Cao, Ness Shroff, Bhaskar Krishnamachari, Mani Srivastava, Mi Zhang

Abstract:The integration of the Internet of Things (IoT) and modern Artificial Intelligence (AI) has given rise to a new paradigm known as the Artificial Intelligence of Things (AIoT). In this survey, we provide a systematic and comprehensive review of AIoT research. We examine AIoT literature related to sensing, computing, and networking & communication, which form the three key components of AIoT. In addition to advancements in these areas, we review domain-specific AIoT systems that are designed for various important application domains. We have also created an accompanying GitHub repository, where we compile the papers included in this survey: https://github.com/AIoT-MLSys-Lab/AIoT-Survey. This repository will be actively maintained and updated with new research as it becomes available. As both IoT and AI become increasingly critical to our society, we believe AIoT is emerging as an essential research field at the intersection of IoT and modern AI. We hope this survey will serve as a valuable resource for those engaged in AIoT research and act as a catalyst for future explorations to bridge gaps and drive advancements in this exciting field.

* ACM Trans. Sen. Netw. (August 2024)
* Accepted in ACM Transactions on Sensor Networks (TOSN)

Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers

Oct 17, 2024

Yuchen Liang, Peizhong Ju, Yingbin Liang, Ness Shroff

Abstract:The denoising diffusion model has recently emerged as a powerful generative technique, capable of transforming noise into meaningful data. While theoretical convergence guarantees for diffusion models are well established when the target distribution aligns with the training distribution, practical scenarios often present mismatches. One common case is in zero-shot conditional diffusion sampling, where the target conditional distribution is different from the (unconditional) training distribution. These score-mismatched diffusion models remain largely unexplored from a theoretical perspective. In this paper, we present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers, focusing on target distributions with finite second moments. We show that score mismatches result in an asymptotic distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions. This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise. Interestingly, the derived convergence upper bound offers useful guidance for designing a novel bias-optimal zero-shot sampler in linear conditional models that minimizes the asymptotic bias. For such bias-optimal samplers, we further establish convergence guarantees with explicit dependencies on dimension and conditioning, applied to several interesting target distributions, including those with bounded support and Gaussian mixtures. Our findings are supported by numerical studies.

Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning?

Sep 05, 2024

Peizhong Ju, Haibo Yang, Jia Liu, Yingbin Liang, Ness Shroff

Abstract:Federated Learning (FL) has gained significant popularity due to its effectiveness in training machine learning models across diverse sites without requiring direct data sharing. While various algorithms along with their optimization analyses have shown that FL with local updates is a communication-efficient distributed learning framework, the generalization performance of FL with local updates has received comparatively less attention. This lack of investigation can be attributed to the complex interplay between data heterogeneity and infrequent communication due to the local updates within the FL framework. This motivates us to investigate a fundamental question in FL: Can we quantify the impact of data heterogeneity and local updates on the generalization performance for FL as the learning process evolves? To this end, we conduct a comprehensive theoretical study of FL's generalization performance using a linear model as the first step, where the data heterogeneity is considered for both the stationary and online/non-stationary cases. By providing closed-form expressions of the model error, we rigorously quantify the impact of the number of local updates (denoted as $K$) under three settings ($K=1$, $K<\infty$, and $K=\infty$) and show how the generalization performance evolves with the number of rounds $t$. Our investigation also provides a comprehensive understanding of how different configurations (including the number of model parameters $p$ and the number of training samples $n$) contribute to the overall generalization performance, thus yielding new insights (such as benign overfitting) for implementing FL over networks.

* Published in MobiHoc 2024
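
To make the roles of $K$ (local updates per round) and $t$ (communication rounds) in the abstract above concrete, here is a minimal sketch of federated averaging on a linear model. It is an illustration only, not the paper's analysis: the squared loss, full-batch local gradient steps, uniform server averaging, the synthetic Gaussian data, and all function names (`make_clients`, `local_update_round`, `run_fl`) are assumptions introduced here.

```python
import numpy as np

def make_clients(num_clients, n, p, heterogeneity, rng):
    """Synthetic linear-regression clients (hypothetical setup).

    Each client's ground-truth vector is a shared w_star plus a
    client-specific perturbation scaled by `heterogeneity`.
    """
    w_star = rng.normal(size=p)
    clients = []
    for _ in range(num_clients):
        w_i = w_star + heterogeneity * rng.normal(size=p)
        X = rng.normal(size=(n, p))
        y = X @ w_i  # noise-free labels, for simplicity
        clients.append((X, y))
    return w_star, clients

def local_update_round(w, clients, K, lr):
    """One communication round: K local gradient steps per client, then averaging."""
    updated = []
    for X, y in clients:
        w_local = w.copy()
        for _ in range(K):
            grad = X.T @ (X @ w_local - y) / len(y)  # gradient of (1/2n)||Xw - y||^2
            w_local -= lr * grad
        updated.append(w_local)
    return np.mean(updated, axis=0)

def run_fl(clients, p, K, rounds, lr=0.05):
    """Run `rounds` communication rounds starting from the zero vector."""
    w = np.zeros(p)
    for _ in range(rounds):
        w = local_update_round(w, clients, K, lr)
    return w
```

Calling `run_fl` with different values of `K` and `heterogeneity` on the same synthetic clients gives a quick empirical feel for the trade-off the paper studies in closed form.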

Non-asymptotic Convergence of Discrete-time Diffusion Models: New Approach and Improved Rate

Feb 21, 2024

Yuchen Liang, Peizhong Ju, Yingbin Liang, Ness Shroff

Abstract:The denoising diffusion model has recently emerged as a powerful generative technique that converts noise into data. Theoretical convergence guarantees have mainly been studied for continuous-time diffusion models, and have been obtained for discrete-time diffusion models only for distributions with bounded support. In this paper, we establish the convergence guarantee for substantially larger classes of distributions under discrete-time diffusion models and further improve the convergence rate for distributions with bounded support. In particular, we first establish the convergence rates for both smooth and general (possibly non-smooth) distributions with finite second moment. We then specialize our results to a number of interesting classes of distributions with explicit parameter dependencies, including distributions with Lipschitz scores, Gaussian mixture distributions, and distributions with bounded support. We further propose a novel accelerated sampler and show that it improves the convergence rates of the corresponding regular sampler by orders of magnitude with respect to all system parameters. For distributions with bounded support, our result improves the dimensional dependence of the previous convergence rate by orders of magnitude. Our study features a novel analysis technique that constructs a tilting-factor representation of the convergence error and exploits Tweedie's formula for handling Taylor expansion power terms.
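
For readers unfamiliar with the Tweedie's formula mentioned in the abstract above, its standard Gaussian-noise form is given below. This is the textbook statement only; the symbols $x$, $y$, $z$, $\sigma$, and $p_y$ are notation introduced here, and how the paper applies the formula to higher-order Taylor terms is not reproduced. Suppose $y = x + \sigma z$ with $z \sim \mathcal{N}(0, I)$ independent of $x$, and let $p_y$ denote the marginal density of $y$. Then

$$
\mathbb{E}[x \mid y] = y + \sigma^2 \nabla_y \log p_y(y).
$$

The posterior mean of the clean signal is thus determined by the score $\nabla_y \log p_y$ of the noisy marginal, which is the quantity a diffusion model learns to estimate.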

Hoeffding's Inequality for Markov Chains under Generalized Concentrability Condition

Oct 04, 2023

Hao Chen, Abhishek Gupta, Yin Sun, Ness Shroff

Abstract:This paper studies Hoeffding's inequality for Markov chains under the generalized concentrability condition defined via an integral probability metric (IPM). The generalized concentrability condition establishes a framework that interpolates and extends the existing hypotheses of Markov chain Hoeffding-type inequalities. The flexibility of our framework allows Hoeffding's inequality to be applied beyond ergodic Markov chains in the traditional sense. We demonstrate the utility by applying our framework to several non-asymptotic analyses arising from the field of machine learning, including (i) a generalization bound for empirical risk minimization with Markovian samples, (ii) a finite sample guarantee for Polyak-Ruppert averaging of SGD, and (iii) a new regret bound for rested Markovian bandits with general state space.
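
As a point of reference for the abstract above, the classical Hoeffding inequality for independent bounded random variables is stated below. This is the standard independent-sample result, not the paper's Markov-chain version; the notation $X_i$, $a_i$, $b_i$, $S_n$ is introduced here for illustration. Let $X_1, \dots, X_n$ be independent with $X_i \in [a_i, b_i]$ almost surely, and let $S_n = \sum_{i=1}^{n} X_i$. Then for every $t > 0$,

$$
\mathbb{P}\left( \left| S_n - \mathbb{E}[S_n] \right| \ge t \right) \le 2 \exp\left( - \frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right).
$$

Markov-chain variants replace the independence assumption with a mixing-type condition on the chain, which is the role the generalized concentrability condition plays in the paper.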

Non-Convex Bilevel Optimization with Time-Varying Objective Functions

Aug 07, 2023

Sen Lin, Daouda Sow, Kaiyi Ji, Yingbin Liang, Ness Shroff

Abstract:Bilevel optimization has become a powerful tool in a wide variety of machine learning problems. However, the current nonconvex bilevel optimization considers an offline dataset and static functions, which may not work well in emerging online applications with streaming data and time-varying functions. In this work, we study online bilevel optimization (OBO) where the functions can be time-varying and the agent continuously updates the decisions with online streaming data. To deal with the function variations and the unavailability of the true hypergradients in OBO, we propose a single-loop online bilevel optimizer with window averaging (SOBOW), which updates the outer-level decision based on a window average of the most recent hypergradient estimations stored in the memory. Compared to existing algorithms, SOBOW is computationally efficient and does not need to know previous functions. To handle the unique technical difficulties rooted in single-loop update and function variations for OBO, we develop a novel analytical technique that disentangles the complex couplings between decision variables, and carefully controls the hypergradient estimation error. We show that SOBOW can achieve a sublinear bilevel local regret under mild conditions. Extensive experiments across multiple domains corroborate the effectiveness of SOBOW.
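
The window-averaging idea described in the abstract above can be sketched as follows. This is a simplified illustration, not SOBOW itself: the uniform weights over the window, the plain gradient step, and the class name `WindowAveragedUpdater` are assumptions made here, and the hypergradient estimates are supplied by the caller rather than computed from an inner-level problem.

```python
from collections import deque

import numpy as np

class WindowAveragedUpdater:
    """Outer-level update using the average of the most recent gradient estimates.

    Hypothetical helper: keeps the last `window` estimates in memory and
    steps along their (uniform) average.
    """

    def __init__(self, window, lr):
        self.buffer = deque(maxlen=window)  # oldest estimate is dropped automatically
        self.lr = lr

    def step(self, x, hypergrad_estimate):
        """Store the newest estimate and return the updated outer variable."""
        self.buffer.append(np.asarray(hypergrad_estimate, dtype=float))
        averaged = np.mean(np.stack(list(self.buffer)), axis=0)
        return x - self.lr * averaged
```

Because only the stored estimates are averaged, the update needs no access to earlier objective functions, which mirrors the property the abstract highlights.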

Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping

Jun 22, 2023

Yining Li, Peizhong Ju, Ness Shroff

Abstract:Reinforcement learning often needs to deal with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces (often known as the curse of dimensionality). In this work, we address this issue by learning the inherent structure of action-wise similar MDPs to appropriately balance the performance degradation versus sample/computational complexity. In particular, we partition the action spaces into multiple groups based on the similarity in transition distribution and reward function, and build a linear decomposition model to capture the difference between the intra-group transition kernel and the intra-group rewards. Both our theoretical analysis and experiments reveal a surprising and counter-intuitive result: while a more refined grouping strategy can reduce the approximation error caused by treating actions in the same group as identical, it also leads to increased estimation error when the number of samples or the computational resources are limited. This finding highlights the grouping strategy as a new degree of freedom that can be optimized to minimize the overall performance loss. To address this issue, we formulate a general optimization problem for determining the optimal grouping strategy, which strikes a balance between performance loss and sample/computational complexity. We further propose a computationally efficient method for selecting a nearly-optimal grouping strategy, which maintains its computational complexity independent of the size of the action space.

Theoretical Hardness and Tractability of POMDPs in RL with Partial Hindsight State Information

Jun 14, 2023

Ming Shi, Yingbin Liang, Ness Shroff

Abstract:Partially observable Markov decision processes (POMDPs) have been widely applied to capture many real-world applications. However, existing theoretical results have shown that learning in general POMDPs could be intractable, where the main challenge lies in the lack of latent state information. A key fundamental question here is how much hindsight state information (HSI) is sufficient to achieve tractability. In this paper, we establish a lower bound that reveals a surprising hardness result: unless we have full HSI, we need an exponentially scaling sample complexity to obtain an $\epsilon$-optimal policy solution for POMDPs. Nonetheless, from the key insights in our lower-bound construction, we find that there exist important tractable classes of POMDPs even with partial HSI. In particular, for two novel classes of POMDPs with partial HSI, we provide new algorithms that are shown to be near-optimal by establishing new upper and lower bounds.

* Submitted for publication

Provably Efficient Model-Free Algorithms for Non-stationary CMDPs

Mar 10, 2023

Honghao Wei, Arnob Ghosh, Ness Shroff, Lei Ying, Xingyu Zhou

Abstract:We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov Decision Processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a cumulative constraint on the expected utility (cost). In the non-stationary environment, reward, utility functions, and transition kernels can vary arbitrarily over time as long as the cumulative variations do not exceed certain variation budgets. We propose the first model-free, simulator-free RL algorithms with sublinear regret and zero constraint violation for non-stationary CMDPs in both tabular and linear function approximation settings with provable performance guarantees. Our results on regret bound and constraint violation for the tabular case match the corresponding best results for stationary CMDPs when the total budget is known. Additionally, we present a general framework for addressing the well-known challenges associated with analyzing non-stationary CMDPs, without requiring prior knowledge of the variation budget. We apply the approach for both tabular and linear approximation settings.

* AISTATS 2023
