Abstract: We consider the problem of length generalization in sequence prediction. We define a new performance metric for this setting, the Asymmetric-Regret, which measures regret against a benchmark predictor with a longer context length than is available to the learner. We then study this concept through the lens of the spectral filtering algorithm and present a gradient-based learning algorithm that provably achieves length generalization for linear dynamical systems. We conclude with proof-of-concept experiments that are consistent with our theory.
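To make the definition concrete, here is one way the metric might be written; this is a sketch under assumed notation (per-step losses $\ell_t$, a learner restricted to context length $h$, and a benchmark class $\Pi_{h'}$ of predictors with context length $h' > h$), not the paper's exact statement:

```latex
% Asymmetric-Regret sketch: the learner predicts \hat{y}_t from the last h
% observations, while the benchmark class uses a longer context h' > h.
\mathrm{AsymRegret}_T(h, h') =
  \sum_{t=1}^{T} \ell_t(\hat{y}_t)
  \;-\; \min_{\pi \in \Pi_{h'}} \sum_{t=1}^{T}
        \ell_t\bigl(\pi(y_{t-h'}, \ldots, y_{t-1})\bigr)
```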
Abstract: We address the challenge of efficient auto-regressive generation in sequence prediction models by introducing FutureFill, a method for fast generation that applies to any sequence prediction algorithm based on convolutional operators. Our approach reduces the generation time requirement from linear to square root in the context length. Additionally, FutureFill requires a prefill cache whose size depends only on the number of tokens to be generated, which is smaller than the cache requirements of standard convolutional and attention-based models. We validate our theoretical findings with experiments demonstrating correctness and efficiency gains on a synthetic generation task.
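The mechanism suggested by the abstract is to amortize the cost of the convolution by precomputing, in chunks, the contribution of the already-generated prefix to future outputs. The toy sketch below illustrates this in Python; the function names, the plain `np.convolve` (an FFT-based convolution would be used in practice), and the closed-loop feedback are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def generate(filt, u0, T, K):
    """Generate T outputs of y_t = sum_j filt[t-j] * u[j] with chunk size K.

    Assumes len(filt) >= T + K. Choosing K ~ sqrt(T) balances the chunked
    convolutions against the per-step direct sums.
    """
    assert len(filt) >= T + K
    u = [u0]                      # u[j] is the input at time j (fed back)
    future = np.zeros(T + K)      # precomputed contributions to future steps
    tail_start = 0
    for t in range(T):
        if t % K == 0:
            # One convolution pushes the whole prefix u[0..t] forward onto
            # the next K outputs (FFT-based in a real implementation).
            conv = np.convolve(filt[: t + K], np.array(u))
            future[t : t + K] = conv[t : t + K]
            tail_start = t
        # Direct contribution of the < K tokens since the last chunk boundary.
        y_t = future[t] + sum(
            filt[t - j] * u[j] for j in range(tail_start + 1, t + 1)
        )
        u.append(y_t)             # toy closed loop: output becomes next input
    return np.array(u[1:])
```

Per token, the direct sum costs at most K operations, while each chunked convolution over a prefix of length t costs roughly t log t with an FFT and is paid once every K steps; setting K on the order of sqrt(T) equalizes the two terms.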
Abstract: This paper describes an efficient, open-source PyTorch implementation of the Spectral Transform Unit (STU). We investigate sequence prediction tasks across several modalities, including language, robotics, and simulated dynamical systems. We find that, at the same parameter count, the STU and its variants outperform the Transformer as well as other leading state-space models across these modalities.
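For orientation, here is a minimal PyTorch sketch of a spectral-filtering layer in the spirit of the STU, not the repository's actual code: fixed filters are taken as the top eigenvectors of a Hankel matrix (the entries below follow the spectral filtering literature; the released parameterization may differ), inputs are causally convolved with them, and the results are mixed by a learned linear map.

```python
import torch

def spectral_filters(seq_len: int, k: int) -> torch.Tensor:
    """Top-k eigenvectors of the Hankel matrix Z_ij = 2 / ((i+j)^3 - (i+j))."""
    idx = torch.arange(1, seq_len + 1, dtype=torch.float64)
    s = idx[:, None] + idx[None, :]
    z = 2.0 / (s ** 3 - s)
    eigvals, eigvecs = torch.linalg.eigh(z)        # eigenvalues ascending
    return eigvecs[:, -k:].float()                 # (seq_len, k)

class STULayer(torch.nn.Module):
    def __init__(self, seq_len: int, d_in: int, d_out: int, k: int = 16):
        super().__init__()
        self.register_buffer("filters", spectral_filters(seq_len, k))
        self.proj = torch.nn.Linear(k * d_in, d_out)   # learned mixing

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, seq_len, d_in). Causal convolution with each fixed
        # filter, done via FFT over the time axis with zero padding.
        B, L, D = u.shape
        n = 2 * L
        U = torch.fft.rfft(u, n=n, dim=1)                    # (B, n//2+1, D)
        F = torch.fft.rfft(self.filters, n=n, dim=0)         # (n//2+1, k)
        conv = torch.fft.irfft(
            U.unsqueeze(-1) * F[None, :, None, :], n=n, dim=1
        )[:, :L]                                             # (B, L, D, k)
        return self.proj(conv.reshape(B, L, D * self.filters.shape[1]))
```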
Abstract: Neural networks have emerged as a powerful tool for solving complex tasks across various domains, but their increasing size and computational requirements pose significant challenges to deploying them on resource-constrained devices. Neural network sparsification, and in particular pruning, has proven an effective technique for alleviating these challenges by reducing model size, computational complexity, and memory footprint while maintaining competitive performance. However, many pruning pipelines modify the standard training pipeline at only a single stage, if at all. In this work, we develop an end-to-end training pipeline suited to neural network pruning and sparsification at all stages of training. To do so, we make use of nonstandard model parameter initialization, pre-pruning training methodologies, and post-pruning training optimizations. We conduct experiments using combinations of these methods, in addition to different techniques in the pruning step itself, and find that our combined pipeline achieves significant gains over current state-of-the-art approaches to neural network sparsification.
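As a baseline illustration of the train-prune-fine-tune structure such a pipeline builds on, here is a minimal sketch using PyTorch's built-in pruning utilities; the data, loss, and sparsity level are stand-ins, and the paper's combined pipeline (nonstandard initialization, pre- and post-pruning optimizations) is not reproduced here.

```python
import torch
import torch.nn.utils.prune as prune

model = torch.nn.Sequential(
    torch.nn.Linear(784, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def train(steps: int):
    for _ in range(steps):
        x = torch.randn(32, 784)            # stand-in for real data
        loss = model(x).square().mean()     # stand-in for a real loss
        opt.zero_grad()
        loss.backward()
        opt.step()

train(100)                                  # dense pre-training
for module in model:
    if isinstance(module, torch.nn.Linear):
        # Zero out the 80% of weights with smallest L1 magnitude.
        prune.l1_unstructured(module, name="weight", amount=0.8)
train(100)                                  # fine-tune with masks applied
for module in model:
    if isinstance(module, torch.nn.Linear):
        prune.remove(module, "weight")      # bake the masks into the weights
```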
Abstract: In this work we present an overview of statistical learning, followed by a survey of robust streaming techniques and challenges, culminating in several rigorous results that prove the relationship we motivate and hint at throughout. Furthermore, we unify often-disjoint theorems in a shared framework and notation to clarify the deep connections between them. We hope that by approaching these results from a shared perspective, with the technical connections between the fields already in view, we can enrich the study of both and perhaps motivate new, previously unconsidered directions of research.
Abstract: The use of appearance codes in recent work on generative modeling has enabled novel-view renders with variable appearance and illumination, such as day-time and night-time renders of a scene. A major limitation of this technique is the need to re-train new appearance codes for every scene at inference. In this work we address this problem by proposing a framework that learns a joint embedding space for the appearance and structure of a scene, enforcing a contrastive loss constraint between the two modalities. We apply our framework to a simple Variational Auto-Encoder model on the RADIATE dataset \cite{sheeny2021radiate} and qualitatively demonstrate that we can generate new renders of night-time photos using day-time appearance codes, without additional optimization iterations. Additionally, we compare our model to a baseline VAE that uses the standard per-image appearance-code technique and show that our approach achieves generations of similar quality without learning appearance codes for any unseen images at inference.
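As a sketch of the cross-modal contrastive constraint described above, the following InfoNCE-style loss pulls matched (appearance, structure) embedding pairs together in the joint space and pushes mismatched pairs within a batch apart; the encoders, pairing scheme, and temperature are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(appearance: torch.Tensor, structure: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """appearance, structure: (batch, dim) embeddings of the same images."""
    a = F.normalize(appearance, dim=-1)
    s = F.normalize(structure, dim=-1)
    logits = a @ s.t() / temperature                    # (batch, batch)
    targets = torch.arange(a.size(0), device=a.device)  # positives on diagonal
    # Symmetric InfoNCE over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```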