Gopeshh Subbaraj

Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation

Mar 08, 2025

Mohit Pandey, Gopeshh Subbaraj, Artem Cherkasov, Emmanuel Bengio

Abstract: Generative Flow Networks (GFlowNets) have recently emerged as a suitable framework for generating diverse and high-quality molecular structures by learning from rewards treated as unnormalized distributions. Previous works in this framework often restrict exploration by using predefined molecular fragments as building blocks, limiting the chemical space that can be accessed. In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively. We propose an unsupervised pre-training approach using drug-like molecule datasets, which teaches A-GFNs about inexpensive yet informative molecular descriptors such as drug-likeliness, topological polar surface area, and synthetic accessibility scores. These properties serve as proxy rewards, guiding A-GFNs towards regions of chemical space that exhibit desirable pharmacological properties. We further implement a goal-conditioned fine-tuning process, which adapts A-GFNs to optimize for specific target properties. In this work, we pretrain A-GFN on a subset of the ZINC dataset, and by employing robust evaluation metrics we show the effectiveness of our approach compared to other relevant baseline methods across a wide range of drug design tasks.
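The phrase "rewards treated as unnormalized distributions" can be made concrete with a small sketch. A GFlowNet is trained so that it samples a finished object x with probability proportional to its reward R(x). The code below only computes that target distribution for a handful of candidates; it does not train anything. The molecule names and reward values are made up for illustration and do not come from the paper.

```python
import random


def reward_to_distribution(rewards):
    """Normalize non-negative rewards R(x) into probabilities p(x) = R(x) / Z."""
    total = sum(rewards.values())  # Z, the normalizing constant
    if total <= 0:
        raise ValueError("at least one reward must be positive")
    return {x: r / total for x, r in rewards.items()}


def sample(distribution, rng):
    """Draw one object with probability given by the distribution."""
    items = list(distribution)
    weights = [distribution[x] for x in items]
    return rng.choices(items, weights=weights, k=1)[0]


# Hypothetical proxy rewards for three candidate molecules (illustrative values).
toy_rewards = {"mol_a": 1.0, "mol_b": 3.0, "mol_c": 4.0}
target = reward_to_distribution(toy_rewards)

# Sampling from the target: higher-reward candidates appear more often,
# but lower-reward ones are still visited, which is the source of diversity.
rng = random.Random(0)
draws = [sample(target, rng) for _ in range(1000)]
```

Enumerating and normalizing like this is only feasible for tiny candidate sets; the point of a GFlowNet is to approximate the same distribution over a space far too large to enumerate.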

Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference

Dec 18, 2024

Matthew Riemer, Gopeshh Subbaraj, Glen Berseth, Irina Rish

Abstract: Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectively minimize regret. However, recent advances in machine learning involve larger neural networks with longer inference times, raising questions about their applicability in realtime systems where reaction time is crucial. We present an analysis of lower bounds on regret in realtime reinforcement learning (RL) environments to show that minimizing long-term regret is generally impossible within the typical sequential interaction and learning paradigm, but often becomes possible when sufficient asynchronous compute is available. We propose novel algorithms for staggering asynchronous inference processes to ensure that actions are taken at consistent time intervals, and demonstrate that use of models with high action inference times is only constrained by the environment's effective stochasticity over the inference horizon, and not by action frequency. Our analysis shows that the number of inference processes needed scales linearly with increasing inference times while enabling use of models that are multiple orders of magnitude larger than existing approaches when learning from a realtime simulation of Game Boy games such as Pokémon and Tetris.
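The linear scaling claim can be illustrated with simple arithmetic. The sketch below is not the paper's algorithm: it assumes every inference call takes the same fixed time and that workers are assigned round-robin, and all function names are hypothetical. Under those assumptions, delivering one action per interval requires roughly inference time divided by action interval workers, each started one interval after the previous one.

```python
def processes_needed(inference_ms, action_interval_ms):
    """Smallest number of workers so that one action can finish every interval.

    Assumes a fixed, deterministic inference time (an illustrative simplification).
    """
    if inference_ms <= 0 or action_interval_ms <= 0:
        raise ValueError("times must be positive")
    return -(-inference_ms // action_interval_ms)  # integer ceiling division


def staggered_schedule(inference_ms, action_interval_ms, num_actions):
    """Return (worker, start_ms, finish_ms) for each action, assigned round-robin."""
    workers = processes_needed(inference_ms, action_interval_ms)
    schedule = []
    for k in range(num_actions):
        start = k * action_interval_ms       # starts are offset by one interval
        finish = start + inference_ms        # each call occupies its worker this long
        schedule.append((k % workers, start, finish))
    return schedule


# Hypothetical numbers: a 250 ms model and a 100 ms action interval.
plan = staggered_schedule(inference_ms=250, action_interval_ms=100, num_actions=6)
```

In this plan each action is based on an observation that is one inference time old when it is executed, which is why the abstract ties feasibility to how much the environment changes over the inference horizon rather than to the action frequency.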

Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning

Nov 04, 2024

Md Rifat Arefin, Gopeshh Subbaraj, Nicolas Gontier, Yann LeCun, Irina Rish, Ravid Shwartz-Ziv, Christopher Pal

Abstract: Decoder-only Transformers often struggle with complex reasoning tasks, particularly arithmetic reasoning requiring multiple sequential operations. In this work, we identify representation collapse in the model's intermediate layers as a key factor limiting their reasoning capabilities. To address this, we propose Sequential Variance-Covariance Regularization (Seq-VCR), which enhances the entropy of intermediate representations and prevents collapse. Combined with dummy pause tokens as substitutes for chain-of-thought (CoT) tokens, our method significantly improves performance in arithmetic reasoning problems. In the challenging 5 × 5 integer multiplication task, our approach achieves 99.5% exact match accuracy, outperforming models of the same size (which yield 0% accuracy) and GPT-4 with five-shot CoT prompting (44%). We also demonstrate superior results on arithmetic expression and longest increasing subsequence (LIS) datasets. Our findings highlight the importance of preventing intermediate layer representation collapse to enhance the reasoning capabilities of Transformers and show that Seq-VCR offers an effective solution without requiring explicit CoT supervision.
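The abstract does not give the exact Seq-VCR loss, so the following is only a generic variance-covariance penalty in the style of VICReg, written in plain Python as a sketch. It assumes a batch of feature vectors taken from one intermediate layer; the function name, the target standard deviation, and the equal weighting of the two terms are all assumptions. The variance term penalizes feature dimensions whose spread across the batch has shrunk, and the covariance term penalizes redundant (correlated) dimensions.

```python
import math


def variance_covariance_penalty(batch, target_std=1.0, eps=1e-4):
    """VICReg-style penalty on a batch of feature vectors (list of equal-length lists).

    Not the paper's exact loss; an illustrative sketch under stated assumptions.
    """
    n, d = len(batch), len(batch[0])
    if n < 2:
        raise ValueError("need at least two samples to estimate covariance")
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in batch]
    # Unbiased d x d covariance matrix of the features.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Hinge on the per-dimension standard deviation: zero once it reaches target_std.
    variance_term = sum(
        max(0.0, target_std - math.sqrt(cov[j][j] + eps)) for j in range(d)
    ) / d
    # Squared off-diagonal covariances: zero when dimensions are decorrelated.
    covariance_term = sum(
        cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j
    ) / d
    return variance_term + covariance_term


# A collapsed batch (every sample identical) versus a spread-out, decorrelated one.
collapsed = [[0.3, 0.3], [0.3, 0.3], [0.3, 0.3], [0.3, 0.3]]
spread = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
collapsed_penalty = variance_covariance_penalty(collapsed)
spread_penalty = variance_covariance_penalty(spread)
```

Added to the task loss with some weight, a term like this pushes intermediate representations away from the collapsed regime, which is the effect the abstract describes.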

GFlowNet Pretraining with Inexpensive Rewards

Sep 15, 2024

Mohit Pandey, Gopeshh Subbaraj, Emmanuel Bengio

Abstract: Generative Flow Networks (GFlowNets), a class of generative models, have recently emerged as a suitable framework for generating diverse and high-quality molecular structures by learning from unnormalized reward distributions. Previous works in this direction often restrict exploration by using predefined molecular fragments as building blocks, limiting the chemical space that can be accessed. In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively. We propose an unsupervised pre-training approach using offline drug-like molecule datasets, which conditions A-GFNs on inexpensive yet informative molecular descriptors such as drug-likeliness, topological polar surface area, and synthetic accessibility scores. These properties serve as proxy rewards, guiding A-GFNs towards regions of chemical space that exhibit desirable pharmacological properties. We extend our method with a goal-conditioned fine-tuning process, which adapts A-GFNs to optimize for specific target properties. In this work, we pretrain A-GFN on the ZINC15 offline dataset and employ robust evaluation metrics to show the effectiveness of our approach compared to other relevant baseline methods in drug design.

Continual Learning In Environments With Polynomial Mixing Times

Dec 13, 2021

Matthew Riemer, Sharath Chandra Raparthy, Ignacio Cases, Gopeshh Subbaraj, Maximilian Puelma Touzel, Irina Rish

Abstract: The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches and propose a family of model-based algorithms that speed up learning by directly optimizing for the average reward through a novel bootstrapping procedure. Finally, we perform empirical regret analysis of our proposed approaches, demonstrating clear improvements over baselines and also how scalable MDPs can be used for analysis of RL algorithms as mixing times scale.

* 2 Figures, 20 pages
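For readers unfamiliar with mixing times, the sketch below measures one on a toy Markov chain: a lazy random walk on a cycle of n states. This chain is chosen only because its stationary distribution is uniform and its mixing time is known to grow polynomially with n; it is not one of the paper's scalable MDPs, and the threshold of 0.25 is just the conventional choice. The mixing time is taken as the first step t at which the worst-case total variation distance to the stationary distribution drops to the threshold.

```python
def lazy_cycle_walk(n):
    """Transition matrix: stay with prob 0.5, move to either neighbour with prob 0.25."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] += 0.5
        P[i][(i + 1) % n] += 0.25
        P[i][(i - 1) % n] += 0.25
    return P


def step(dist, P):
    """Advance a distribution over states by one transition of the chain."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def mixing_time(P, stationary, eps=0.25, max_steps=10_000):
    """First t where the worst-case total variation distance to stationary is <= eps."""
    n = len(P)
    # One point-mass starting distribution per state.
    dists = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    for t in range(1, max_steps + 1):
        dists = [step(d, P) for d in dists]
        worst = max(
            0.5 * sum(abs(a - b) for a, b in zip(d, stationary)) for d in dists
        )
        if worst <= eps:
            return t
    raise RuntimeError("chain did not mix within max_steps")


# Mixing time for increasing problem sizes.
times = {n: mixing_time(lazy_cycle_walk(n), [1.0 / n] * n) for n in (4, 8, 16)}
```

Each doubling of n makes the chain take noticeably longer to forget its starting state, which is the kind of growth with problem size that the abstract refers to.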
