Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anthony M. Zador

Token-Level Uncertainty-Aware Objective for Language Model Post-Training

Mar 15, 2025

Tingkai Liu, Ari S. Benjamin, Anthony M. Zador

Abstract:In the current work, we connect token-level uncertainty in causal language modeling to two types of training objectives: 1) masked maximum likelihood (MLE), 2) self-distillation. We show that masked MLE is effective in reducing epistemic uncertainty, and serve as an effective token-level automatic curriculum learning technique. However, masked MLE is prone to overfitting and requires self-distillation regularization to improve or maintain performance on out-of-distribution tasks. We demonstrate significant performance gain via the proposed training objective - combined masked MLE and self-distillation - across multiple architectures (Gemma, LLaMA, Phi) and datasets (Alpaca, ShareGPT, GSM8K), mitigating overfitting while maintaining adaptability during post-training. Our findings suggest that uncertainty-aware training provides an effective mechanism for enhancing language model training.

Via

Access Paper or Ask Questions

Neural Circuit Architectural Priors for Embodied Control

Jan 13, 2022

Nikhil X. Bhattasali, Anthony M. Zador, Tatiana A. Engel

Figure 1 for Neural Circuit Architectural Priors for Embodied Control

Figure 2 for Neural Circuit Architectural Priors for Embodied Control

Figure 3 for Neural Circuit Architectural Priors for Embodied Control

Figure 4 for Neural Circuit Architectural Priors for Embodied Control

Abstract:Artificial neural networks for simulated motor control and robotics often adopt generic architectures like fully connected MLPs. While general, these tabula rasa architectures rely on large amounts of experience to learn, are not easily transferable to new bodies, and have internal dynamics that are difficult to interpret. In nature, animals are born with highly structured connectivity in their nervous systems shaped by evolution; this innate circuitry acts synergistically with learning mechanisms to provide inductive biases that enable most animals to function well soon after birth and improve abilities efficiently. Convolutional networks inspired by visual circuitry have encoded useful biases for vision. However, it is unknown the extent to which ANN architectures inspired by neural circuitry can yield useful biases for other domains. In this work, we ask what advantages biologically inspired network architecture can provide in the context of motor control. Specifically, we translate C. elegans circuits for locomotion into an ANN model controlling a simulated Swimmer agent. On a locomotion task, our architecture achieves good initial performance and asymptotic performance comparable with MLPs, while dramatically improving data efficiency and requiring orders of magnitude fewer parameters. Our architecture is more interpretable and transfers to new body designs. An ablation analysis shows that principled excitation/inhibition is crucial for learning, while weight initialization contributes to good initial performance. Our work demonstrates several advantages of ANN architectures inspired by systems neuroscience and suggests a path towards modeling more complex behavior.

Via

Access Paper or Ask Questions