Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ekin D. Cubuk

Generative Hierarchical Materials Search

Sep 10, 2024

Sherry Yang, Simon Batzner, Ruiqi Gao, Muratahan Aykol, Alexander L. Gaunt, Brendan McMorrow, Danilo J. Rezende, Dale Schuurmans, Igor Mordatch, Ekin D. Cubuk

Figure 1 for Generative Hierarchical Materials Search

Figure 2 for Generative Hierarchical Materials Search

Figure 3 for Generative Hierarchical Materials Search

Figure 4 for Generative Hierarchical Materials Search

Abstract:Generative models trained at scale can now produce text, video, and more recently, scientific data such as crystal structures. In applications of generative approaches to materials science, and in particular to crystal structures, the guidance from the domain expert in the form of high-level instructions can be essential for an automated system to output candidate crystals that are viable for downstream research. In this work, we formulate end-to-end language-to-structure generation as a multi-objective optimization problem, and propose Generative Hierarchical Materials Search (GenMS) for controllable generation of crystal structures. GenMS consists of (1) a language model that takes high-level natural language as input and generates intermediate textual information about a crystal (e.g., chemical formulae), and (2) a diffusion model that takes intermediate information as input and generates low-level continuous value crystal structures. GenMS additionally uses a graph neural network to predict properties (e.g., formation energy) from the generated crystal structures. During inference, GenMS leverages all three components to conduct a forward tree search over the space of possible structures. Experiments show that GenMS outperforms other alternatives of directly using language models to generate structures both in satisfying user request and in generating low-energy structures. We confirm that GenMS is able to generate common crystal structures such as double perovskites, or spinels, solely from natural language input, and hence can form the foundation for more complex structure generation in near future.

* https://generative-materials.github.io/

Via

Access Paper or Ask Questions

G-Augment: Searching For The Meta-Structure Of Data Augmentation Policies For ASR

Oct 19, 2022

Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park

Figure 1 for G-Augment: Searching For The Meta-Structure Of Data Augmentation Policies For ASR

Figure 2 for G-Augment: Searching For The Meta-Structure Of Data Augmentation Policies For ASR

Figure 3 for G-Augment: Searching For The Meta-Structure Of Data Augmentation Policies For ASR

Figure 4 for G-Augment: Searching For The Meta-Structure Of Data Augmentation Policies For ASR

Abstract:Data augmentation is a ubiquitous technique used to provide robustness to automatic speech recognition (ASR) training. However, even as so much of the ASR training process has become automated and more "end-to-end", the data augmentation policy (what augmentation functions to use, and how to apply them) remains hand-crafted. We present Graph-Augment, a technique to define the augmentation space as directed acyclic graphs (DAGs) and search over this space to optimize the augmentation policy itself. We show that given the same computational budget, policies produced by G-Augment are able to perform better than SpecAugment policies obtained by random search on fine-tuning tasks on CHiME-6 and AMI. G-Augment is also able to establish a new state-of-the-art ASR performance on the CHiME-6 evaluation set (30.7% WER). We further demonstrate that G-Augment policies show better transfer properties across warm-start to cold-start training and model size compared to random-searched SpecAugment policies.

* 6 pages, accepted at SLT 2022

Via

Access Paper or Ask Questions

On the surprising tradeoff between ImageNet accuracy and perceptual similarity

Mar 09, 2022

Manoj Kumar, Neil Houlsby, Nal Kalchbrenner, Ekin D. Cubuk

Figure 1 for On the surprising tradeoff between ImageNet accuracy and perceptual similarity

Figure 2 for On the surprising tradeoff between ImageNet accuracy and perceptual similarity

Figure 3 for On the surprising tradeoff between ImageNet accuracy and perceptual similarity

Figure 4 for On the surprising tradeoff between ImageNet accuracy and perceptual similarity

Abstract:Perceptual distances between images, as measured in the space of pre-trained deep features, have outperformed prior low-level, pixel-based metrics on assessing image similarity. While the capabilities of older and less accurate models such as AlexNet and VGG to capture perceptual similarity are well known, modern and more accurate models are less studied. First, we observe a surprising inverse correlation between ImageNet accuracy and Perceptual Scores of modern networks such as ResNets, EfficientNets, and Vision Transformers: that is better classifiers achieve worse Perceptual Scores. Then, we perform a large-scale study and examine the ImageNet accuracy/Perceptual Score relationship on varying the depth, width, number of training steps, weight decay, label smoothing, and dropout. Higher accuracy improves Perceptual Score up to a certain point, but we uncover a Pareto frontier between accuracies and Perceptual Score in the mid-to-high accuracy regime. We explore this relationship further using distortion invariance, spatial frequency sensitivity, and alternative perceptual functions. Interestingly we discover shallow ResNets, trained for less than 5 epochs only on ImageNet, whose emergent Perceptual Score matches the prior best networks trained directly on supervised human perceptual judgements.

* Preprint

Via

Access Paper or Ask Questions

No One Representation to Rule Them All: Overlapping Features of Training Methods

Oct 26, 2021

Raphael Gontijo-Lopes, Yann Dauphin, Ekin D. Cubuk

Figure 1 for No One Representation to Rule Them All: Overlapping Features of Training Methods

Figure 2 for No One Representation to Rule Them All: Overlapping Features of Training Methods

Figure 3 for No One Representation to Rule Them All: Overlapping Features of Training Methods

Figure 4 for No One Representation to Rule Them All: Overlapping Features of Training Methods

Abstract:Despite being able to capture a range of features of the data, high accuracy models trained with supervision tend to make similar predictions. This seemingly implies that high-performing models share similar biases regardless of training methodology, which would limit ensembling benefits and render low-accuracy models as having little practical use. Against this backdrop, recent work has made very different training techniques, such as large-scale contrastive learning, yield competitively-high accuracy on generalization and robustness benchmarks. This motivates us to revisit the assumption that models necessarily learn similar functions. We conduct a large-scale empirical study of models across hyper-parameters, architectures, frameworks, and datasets. We find that model pairs that diverge more in training methodology display categorically different generalization behavior, producing increasingly uncorrelated errors. We show these models specialize in subdomains of the data, leading to higher ensemble performance: with just 2 models (each with ImageNet accuracy ~76.5%), we can create ensembles with 83.4% (+7% boost). Surprisingly, we find that even significantly low-accuracy models can be used to improve high-accuracy models. Finally, we show diverging training methodology yield representations that capture overlapping (but not supersetting) feature sets which, when combined, lead to increased downstream performance.

Via

Access Paper or Ask Questions

Multi-Task Self-Training for Learning General Representations

Aug 25, 2021

Golnaz Ghiasi, Barret Zoph, Ekin D. Cubuk, Quoc V. Le, Tsung-Yi Lin

Figure 1 for Multi-Task Self-Training for Learning General Representations

Figure 2 for Multi-Task Self-Training for Learning General Representations

Figure 3 for Multi-Task Self-Training for Learning General Representations

Figure 4 for Multi-Task Self-Training for Learning General Representations

Abstract:Despite the fast progress in training specialized models for various tasks, learning a single general model that works well for many tasks is still challenging for computer vision. Here we introduce multi-task self-training (MuST), which harnesses the knowledge in independent specialized teacher models (e.g., ImageNet model on classification) to train a single general student model. Our approach has three steps. First, we train specialized teachers independently on labeled datasets. We then use the specialized teachers to label an unlabeled dataset to create a multi-task pseudo labeled dataset. Finally, the dataset, which now contains pseudo labels from teacher models trained on different datasets/tasks, is then used to train a student model with multi-task learning. We evaluate the feature representations of the student model on 6 vision tasks including image recognition (classification, detection, segmentation)and 3D geometry estimation (depth and surface normal estimation). MuST is scalable with unlabeled or partially labeled datasets and outperforms both specialized supervised models and self-supervised models when training on large scale datasets. Lastly, we show MuST can improve upon already strong checkpoints trained with billions of examples. The results suggest self-training is a promising direction to aggregate labeled and unlabeled training data for learning general feature representations.

* ICCV 2021

Via

Access Paper or Ask Questions

Revisiting ResNets: Improved Training and Scaling Strategies

Mar 13, 2021

Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

Figure 1 for Revisiting ResNets: Improved Training and Scaling Strategies

Figure 2 for Revisiting ResNets: Improved Training and Scaling Strategies

Figure 3 for Revisiting ResNets: Improved Training and Scaling Strategies

Figure 4 for Revisiting ResNets: Improved Training and Scaling Strategies

Abstract:Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 2015) and studies these three aspects in an effort to disentangle them. Perhaps surprisingly, we find that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models. We show that the best performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended (Tan & Le, 2019). Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet. In a large-scale semi-supervised learning setup, ResNet-RS achieves 86.2% top-1 ImageNet accuracy, while being 4.7x faster than EfficientNet NoisyStudent. The training techniques improve transfer performance on a suite of downstream tasks (rivaling state-of-the-art self-supervised algorithms) and extend to video classification on Kinetics-400. We recommend practitioners use these simple revised ResNets as baselines for future research.

Via

Access Paper or Ask Questions

Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation

Dec 13, 2020

Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph

Figure 1 for Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation

Figure 2 for Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation

Figure 3 for Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation

Figure 4 for Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation

Abstract:Building instance segmentation models that are data-efficient and can handle rare object categories is an important challenge in computer vision. Leveraging data augmentations is a promising direction towards addressing this challenge. Here, we perform a systematic study of the Copy-Paste augmentation ([13, 12]) for instance segmentation where we randomly paste objects onto an image. Prior studies on Copy-Paste relied on modeling the surrounding visual context for pasting the objects. However, we find that the simple mechanism of pasting objects randomly is good enough and can provide solid gains on top of strong baselines. Furthermore, we show Copy-Paste is additive with semi-supervised methods that leverage extra data through pseudo labeling (e.g. self-training). On COCO instance segmentation, we achieve 49.1 mask AP and 57.3 box AP, an improvement of +0.6 mask AP and +1.5 box AP over the previous state-of-the-art. We further demonstrate that Copy-Paste can lead to significant improvements on the LVIS benchmark. Our baseline model outperforms the LVIS 2020 Challenge winning entry by +3.6 mask AP on rare categories.

Via

Access Paper or Ask Questions

Crystal Structure Search with Random Relaxations Using Graph Networks

Dec 08, 2020

Gowoon Cheon, Lusann Yang, Kevin McCloskey, Evan J. Reed, Ekin D. Cubuk

Figure 1 for Crystal Structure Search with Random Relaxations Using Graph Networks

Figure 2 for Crystal Structure Search with Random Relaxations Using Graph Networks

Figure 3 for Crystal Structure Search with Random Relaxations Using Graph Networks

Figure 4 for Crystal Structure Search with Random Relaxations Using Graph Networks

Abstract:Materials design enables technologies critical to humanity, including combating climate change with solar cells and batteries. Many properties of a material are determined by its atomic crystal structure. However, prediction of the atomic crystal structure for a given material's chemical formula is a long-standing grand challenge that remains a barrier in materials design. We investigate a data-driven approach to accelerating ab initio random structure search (AIRSS), a state-of-the-art method for crystal structure search. We build a novel dataset of random structure relaxations of Li-Si battery anode materials using high-throughput density functional theory calculations. We train graph neural networks to simulate relaxations of random structures. Our model is able to find an experimentally verified structure of Li15Si4 it was not trained on, and has potential for orders of magnitude speedup over AIRSS when searching large unit cells and searching over multiple chemical stoichiometries. Surprisingly, we find that data augmentation of adding Gaussian noise improves both the accuracy and out of domain generalization of our models.

* Removed citations from the abstract, paper content is unchanged

Via

Access Paper or Ask Questions

Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics

Sep 17, 2020

Li Li, Stephan Hoyer, Ryan Pederson, Ruoxi Sun, Ekin D. Cubuk, Patrick Riley, Kieron Burke

Figure 1 for Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics

Figure 2 for Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics

Figure 3 for Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics

Figure 4 for Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics

Abstract:Including prior knowledge is important for effective machine learning models in physics, and is usually achieved by explicitly adding loss terms or constraints on model architectures. Prior knowledge embedded in the physics computation itself rarely draws attention. We show that solving the Kohn-Sham equations when training neural networks for the exchange-correlation functional provides an implicit regularization that greatly improves generalization. Two separations suffice for learning the entire one-dimensional H$_2$ dissociation curve within chemical accuracy, including the strongly correlated region. Our models also generalize to unseen types of molecules and overcome self-interaction error.

Via

Access Paper or Ask Questions

Rethinking Pre-training and Self-training

Jun 11, 2020

Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

Figure 1 for Rethinking Pre-training and Self-training

Figure 2 for Rethinking Pre-training and Self-training

Figure 3 for Rethinking Pre-training and Self-training

Figure 4 for Rethinking Pre-training and Self-training

Abstract:Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet pre-training is commonly used to initialize the backbones of object detection and segmentation models. He et al., however, show a surprising result that ImageNet pre-training has limited impact on COCO object detection. Here we investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training. Our study reveals the generality and flexibility of self-training with three additional insights: 1) stronger data augmentation and more labeled data further diminish the value of pre-training, 2) unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes, and 3) in the case that pre-training is helpful, self-training improves upon pre-training. For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data. Self-training, on the other hand, shows positive improvements from +1.3 to +3.4AP across all dataset sizes. In other words, self-training works well exactly on the same setup that pre-training does not work (using ImageNet to help COCO). On the PASCAL segmentation dataset, which is a much smaller dataset than COCO, though pre-training does help significantly, self-training improves upon the pre-trained model. On COCO object detection, we achieve 54.3AP, an improvement of +1.5AP over the strongest SpineNet model. On PASCAL segmentation, we achieve 90.5 mIOU, an improvement of +1.5% mIOU over the previous state-of-the-art result by DeepLabv3+.

Via

Access Paper or Ask Questions