Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kumar Krishna Agrawal

Atlas: Multi-Scale Attention Improves Long Context Image Modeling

Mar 16, 2025

Kumar Krishna Agrawal, Long Lian, Longchao Liu, Natalia Harguindeguy, Boyi Li, Alexander Bick, Maggie Chung, Trevor Darrell, Adam Yala

Abstract:Efficiently modeling massive images is a long-standing challenge in machine learning. To this end, we introduce Multi-Scale Attention (MSA). MSA relies on two key ideas, (i) multi-scale representations (ii) bi-directional cross-scale communication. MSA creates O(log N) scales to represent the image across progressively coarser features and leverages cross-attention to propagate information across scales. We then introduce Atlas, a novel neural network architecture based on MSA. We demonstrate that Atlas significantly improves the compute-performance tradeoff of long-context image modeling in a high-resolution variant of ImageNet 100. At 1024px resolution, Atlas-B achieves 91.04% accuracy, comparable to ConvNext-B (91.92%) while being 4.3x faster. Atlas is 2.95x faster and 7.38% better than FasterViT, 2.25x faster and 4.96% better than LongViT. In comparisons against MambaVision-S, we find Atlas-S achieves 5%, 16% and 32% higher accuracy at 1024px, 2048px and 4096px respectively, while obtaining similar runtimes. Code for reproducing our experiments and pretrained models is available at https://github.com/yalalab/atlas.

Via

Access Paper or Ask Questions

Addressing Sample Inefficiency in Multi-View Representation Learning

Dec 17, 2023

Kumar Krishna Agrawal, Arna Ghosh, Adam Oberman, Blake Richards

Figure 1 for Addressing Sample Inefficiency in Multi-View Representation Learning

Figure 2 for Addressing Sample Inefficiency in Multi-View Representation Learning

Figure 3 for Addressing Sample Inefficiency in Multi-View Representation Learning

Figure 4 for Addressing Sample Inefficiency in Multi-View Representation Learning

Abstract:Non-contrastive self-supervised learning (NC-SSL) methods like BarlowTwins and VICReg have shown great promise for label-free representation learning in computer vision. Despite the apparent simplicity of these techniques, researchers must rely on several empirical heuristics to achieve competitive performance, most notably using high-dimensional projector heads and two augmentations of the same image. In this work, we provide theoretical insights on the implicit bias of the BarlowTwins and VICReg loss that can explain these heuristics and guide the development of more principled recommendations. Our first insight is that the orthogonality of the features is more critical than projector dimensionality for learning good representations. Based on this, we empirically demonstrate that low-dimensional projector heads are sufficient with appropriate regularization, contrary to the existing heuristic. Our second theoretical insight suggests that using multiple data augmentations better represents the desiderata of the SSL objective. Based on this, we demonstrate that leveraging more augmentations per sample improves representation quality and trainability. In particular, it improves optimization convergence, leading to better features emerging earlier in the training. Remarkably, we demonstrate that we can reduce the pretraining dataset size by up to 4x while maintaining accuracy and improving convergence simply by using more data augmentations. Combining these insights, we present practical pretraining recommendations that improve wall-clock time by 2x and improve performance on CIFAR-10/STL-10 datasets using a ResNet-50 backbone. Thus, this work provides a theoretical insight into NC-SSL and produces practical recommendations for enhancing its sample and compute efficiency.

Via

Access Paper or Ask Questions

Attribute Diversity Determines the Systematicity Gap in VQA

Nov 15, 2023

Ian Berlot-Attwell, A. Michael Carrell, Kumar Krishna Agrawal, Yash Sharma, Naomi Saphra

Figure 1 for Attribute Diversity Determines the Systematicity Gap in VQA

Figure 2 for Attribute Diversity Determines the Systematicity Gap in VQA

Figure 3 for Attribute Diversity Determines the Systematicity Gap in VQA

Figure 4 for Attribute Diversity Determines the Systematicity Gap in VQA

Abstract:The degree to which neural networks can generalize to new combinations of familiar concepts, and the conditions under which they are able to do so, has long been an open question. In this work, we study the systematicity gap in visual question answering: the performance difference between reasoning on previously seen and unseen combinations of object attributes. To test, we introduce a novel diagnostic dataset, CLEVR-HOPE. We find that while increased quantity of training data does not reduce the systematicity gap, increased training data diversity of the attributes in the unseen combination does. In all, our experiments suggest that the more distinct attribute type combinations are seen during training, the more systematic we can expect the resulting model to be.

* 18 pages, 20 figures

Via

Access Paper or Ask Questions

Context-Aware Streaming Perception in Dynamic Environments

Aug 16, 2022

Gur-Eyal Sela, Ionel Gog, Justin Wong, Kumar Krishna Agrawal, Xiangxi Mo, Sukrit Kalra, Peter Schafhalter, Eric Leong, Xin Wang, Bharathan Balaji(+2 more)

Figure 1 for Context-Aware Streaming Perception in Dynamic Environments

Figure 2 for Context-Aware Streaming Perception in Dynamic Environments

Figure 3 for Context-Aware Streaming Perception in Dynamic Environments

Figure 4 for Context-Aware Streaming Perception in Dynamic Environments

Abstract:Efficient vision works maximize accuracy under a latency budget. These works evaluate accuracy offline, one image at a time. However, real-time vision applications like autonomous driving operate in streaming settings, where ground truth changes between inference start and finish. This results in a significant accuracy drop. Therefore, a recent work proposed to maximize accuracy in streaming settings on average. In this paper, we propose to maximize streaming accuracy for every environment context. We posit that scenario difficulty influences the initial (offline) accuracy difference, while obstacle displacement in the scene affects the subsequent accuracy degradation. Our method, Octopus, uses these scenario properties to select configurations that maximize streaming accuracy at test time. Our method improves tracking performance (S-MOTA) by 7.4% over the conventional static approach. Further, performance improvement using our method comes in addition to, and not instead of, advances in offline accuracy.

* 26 pages, 10 figures, to be published in ECCV 2022

Via

Access Paper or Ask Questions

Investigating Power laws in Deep Representation Learning

Feb 11, 2022

Arna Ghosh, Arnab Kumar Mondal, Kumar Krishna Agrawal, Blake Richards

Abstract:Representation learning that leverages large-scale labelled datasets, is central to recent progress in machine learning. Access to task relevant labels at scale is often scarce or expensive, motivating the need to learn from unlabelled datasets with self-supervised learning (SSL). Such large unlabelled datasets (with data augmentations) often provide a good coverage of the underlying input distribution. However evaluating the representations learned by SSL algorithms still requires task-specific labelled samples in the training pipeline. Additionally, the generalization of task-specific encoding is often sensitive to potential distribution shift. Inspired by recent advances in theoretical machine learning and vision neuroscience, we observe that the eigenspectrum of the empirical feature covariance matrix often follows a power law. For visual representations, we estimate the coefficient of the power law, $\alpha$, across three key attributes which influence representation learning: learning objective (supervised, SimCLR, Barlow Twins and BYOL), network architecture (VGG, ResNet and Vision Transformer), and tasks (object and scene recognition). We observe that under mild conditions, proximity of $\alpha$ to 1, is strongly correlated to the downstream generalization performance. Furthermore, $\alpha \approx 1$ is a strong indicator of robustness to label noise during fine-tuning. Notably, $\alpha$ is computable from the representations without knowledge of any labels, thereby offering a framework to evaluate the quality of representations in unlabelled datasets.

Via

Access Paper or Ask Questions

Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

Jun 28, 2021

Wenshuo Guo, Kumar Krishna Agrawal, Aditya Grover, Vidya Muthukumar, Ashwin Pananjady

Figure 1 for Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

Figure 2 for Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

Figure 3 for Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

Figure 4 for Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

Abstract:We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement learning assume the execution of an optimal policy, and thereby suffer from an identifiability issue. In contrast, our paradigm leverages the demonstrator's behavior en route to optimality, and in particular, the exploration phase, to obtain consistent reward estimates. We develop simple and efficient reward estimation procedures for demonstrations within a class of upper-confidence-based algorithms, showing that reward estimation gets progressively easier as the regret of the algorithm increases. We match these upper bounds with information-theoretic lower bounds that apply to any demonstrator algorithm, thereby characterizing the optimal tradeoff between exploration and reward estimation. Extensive empirical evaluations on both synthetic data and simulated experimental design data from the natural sciences corroborate our theoretical results.

Via

Access Paper or Ask Questions

Discrete Flows: Invertible Generative Models of Discrete Data

May 24, 2019

Dustin Tran, Keyon Vafa, Kumar Krishna Agrawal, Laurent Dinh, Ben Poole

Figure 1 for Discrete Flows: Invertible Generative Models of Discrete Data

Figure 2 for Discrete Flows: Invertible Generative Models of Discrete Data

Figure 3 for Discrete Flows: Invertible Generative Models of Discrete Data

Figure 4 for Discrete Flows: Invertible Generative Models of Discrete Data

Abstract:While normalizing flows have led to significant advances in modeling high-dimensional continuous distributions, their applicability to discrete distributions remains unknown. In this paper, we show that flows can in fact be extended to discrete events---and under a simple change-of-variables formula not requiring log-determinant-Jacobian computations. Discrete flows have numerous applications. We consider two flow architectures: discrete autoregressive flows that enable bidirectionality, allowing, for example, tokens in text to depend on both left-to-right and right-to-left contexts in an exact language model; and discrete bipartite flows that enable efficient non-autoregressive generation as in RealNVP. Empirically, we find that discrete autoregressive flows outperform autoregressive baselines on synthetic discrete distributions, an addition task, and Potts models; and bipartite flows can obtain competitive performance with autoregressive baselines on character-level language modeling for Penn Tree Bank and text8.

Via

Access Paper or Ask Questions

GANSynth: Adversarial Neural Audio Synthesis

Apr 15, 2019

Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, Adam Roberts

Figure 1 for GANSynth: Adversarial Neural Audio Synthesis

Figure 2 for GANSynth: Adversarial Neural Audio Synthesis

Figure 3 for GANSynth: Adversarial Neural Audio Synthesis

Figure 4 for GANSynth: Adversarial Neural Audio Synthesis

Abstract:Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence. Autoregressive models, such as WaveNet, model local structure at the expense of global latent structure and slow iterative sampling, while Generative Adversarial Networks (GANs), have global latent conditioning and efficient parallel sampling, but struggle to generate locally-coherent audio waveforms. Herein, we demonstrate that GANs can in fact generate high-fidelity and locally-coherent audio by modeling log magnitudes and instantaneous frequencies with sufficient frequency resolution in the spectral domain. Through extensive empirical investigations on the NSynth dataset, we demonstrate that GANs are able to outperform strong WaveNet baselines on automated and human evaluation metrics, and efficiently generate audio several orders of magnitude faster than their autoregressive counterparts.

* Colab Notebook: http://goo.gl/magenta/gansynth-demo

Via

Access Paper or Ask Questions

Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

Oct 15, 2018

Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, Jonathan Tompson

Figure 1 for Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

Figure 2 for Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

Figure 3 for Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

Figure 4 for Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

Abstract:We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework. The first problem is implicit bias present in the reward functions used in these algorithms. While these biases might work well for some environments, they can also lead to sub-optimal behavior in others. Secondly, even though these algorithms can learn from few expert demonstrations, they require a prohibitively large number of interactions with the environment in order to imitate the expert for many real-world applications. In order to address these issues, we propose a new algorithm called Discriminator-Actor-Critic that uses off-policy Reinforcement Learning to reduce policy-environment interaction sample complexity by an average factor of 10. Furthermore, since our reward function is designed to be unbiased, we can apply our algorithm to many problems without making any task-specific adjustments.

Via

Access Paper or Ask Questions

Towards Mixed Optimization for Reinforcement Learning with Program Synthesis

Jul 03, 2018

Surya Bhupatiraju, Kumar Krishna Agrawal, Rishabh Singh

Figure 1 for Towards Mixed Optimization for Reinforcement Learning with Program Synthesis

Figure 2 for Towards Mixed Optimization for Reinforcement Learning with Program Synthesis

Figure 3 for Towards Mixed Optimization for Reinforcement Learning with Program Synthesis

Figure 4 for Towards Mixed Optimization for Reinforcement Learning with Program Synthesis

Abstract:Deep reinforcement learning has led to several recent breakthroughs, though the learned policies are often based on black-box neural networks. This makes them difficult to interpret and to impose desired specification constraints during learning. We present an iterative framework, MORL, for improving the learned policies using program synthesis. Concretely, we propose to use synthesis techniques to obtain a symbolic representation of the learned policy, which can then be debugged manually or automatically using program repair. After the repair step, we use behavior cloning to obtain the policy corresponding to the repaired program, which is then further improved using gradient descent. This process continues until the learned policy satisfies desired constraints. We instantiate MORL for the simple CartPole problem and show that the programmatic representation allows for high-level modifications that in turn lead to improved learning of the policies.

* Updated publication details, format. Accepted at NAMPI workshop, ICML '18

Via

Access Paper or Ask Questions