Title: softmax is not enough (for sharp out-of-distribution)

Oct 01, 2024

Petar Veličković, Christos Perivolaropoulos, Federico Barbero, Razvan Pascanu

Abstract: A key property of reasoning systems is the ability to make sharp decisions on their input data. For contemporary AI systems, a key carrier of sharp behaviour is the softmax function, with its capability to perform differentiable query-key lookups. It is a common belief that the predictive power of networks leveraging softmax arises from "circuits" which sharply perform certain kinds of computations consistently across many diverse inputs. However, for these circuits to be robust, they would need to generalise well to arbitrary valid inputs. In this paper, we dispel this myth: even for tasks as simple as finding the maximum key, any learned circuitry must disperse as the number of items grows at test time. We attribute this to a fundamental limitation of the softmax function to robustly approximate sharp functions, prove this phenomenon theoretically, and propose adaptive temperature as an ad-hoc technique for improving the sharpness of softmax at inference time.
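The dispersion claim can be checked numerically with a small sketch. It assumes a toy "find the maximum key" setup in which one logit exceeds the n - 1 others by a fixed gap (2.0 here, an arbitrary illustrative value standing in for bounded logits). The softmax weight on the maximum is then e^gap / (e^gap + n - 1), which tends to zero as n grows. The `adaptive_temperature` helper is a hypothetical entropy-targeting bisection included only to illustrate sharpening softmax at inference time. It is not the authors' adaptive-temperature procedure, and all function names and parameter values are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an explicit temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def max_key_logits(n, gap=2.0):
    """Hypothetical 'find the max' setup: one key beats n - 1 others by a fixed gap.
    The gap is held constant to mimic bounded logits as n grows."""
    return [gap] + [0.0] * (n - 1)


def adaptive_temperature(logits, target_entropy=0.5, t_min=1e-3, t_max=1.0, iters=60):
    """Bisect for the warmest temperature in [t_min, t_max] whose softmax entropy
    is <= target_entropy.

    Illustrative rule only (assumed, not the paper's procedure). Softmax entropy
    grows monotonically with temperature for non-constant logits, so bisection
    applies. If even t_min is too diffuse (e.g. all logits equal), t_min is returned.
    """
    if entropy(softmax(logits, t_max)) <= target_entropy:
        return t_max  # already sharp enough; leave the distribution unchanged
    lo, hi = t_min, t_max
    for _ in range(iters):
        mid = (lo + hi) / 2
        if entropy(softmax(logits, mid)) > target_entropy:
            hi = mid  # still too diffuse: need a colder temperature
        else:
            lo = mid  # sharp enough: try a warmer temperature
    return lo


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        logits = max_key_logits(n)
        w_plain = softmax(logits)[0]
        t = adaptive_temperature(logits)
        w_sharp = softmax(logits, t)[0]
        print(f"n={n:>5}  weight on max (T=1): {w_plain:.4f}  "
              f"adaptive T={t:.3f}  weight on max: {w_sharp:.4f}")
```

Holding the gap fixed is what drives the dispersion in this toy: with bounded logits, the n - 1 distractor terms in the denominator eventually dominate. Dividing by a temperature below 1 effectively widens the gap, which is why sharpness returns here, at the cost of also amplifying any small errors in the logits.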

Comments welcome. 14 pages, 7 figures.
