Justin S. Smith

Georgia Institute of Technology

Optimizing Data Distribution and Kernel Performance for Efficient Training of Chemistry Foundation Models: A Case Study with MACE

Apr 14, 2025

Jesun Firoz, Franco Pellegrini, Mario Geiger, Darren Hsu, Jenna A. Bilbrey, Han-Yi Chou, Maximilian Stadler, Markus Hoehnerbach, Tingyu Wang, Dejun Lin(+10 more)

Abstract: Chemistry Foundation Models (CFMs) that leverage Graph Neural Networks (GNNs) operating on 3D molecular graph structures are becoming indispensable tools for computational chemists and materials scientists. These models facilitate the understanding of matter and the discovery of new molecules and materials. In contrast to GNNs operating on large homogeneous graphs, GNNs used by CFMs process a large number of geometric graphs of varying sizes, requiring different optimization strategies than those developed for large homogeneous GNNs. This paper presents optimizations for two critical phases of CFM training: data distribution and model training, targeting MACE, a state-of-the-art CFM. We address the challenge of load balancing in data distribution by formulating it as a multi-objective bin packing problem. We propose an iterative algorithm that provides a highly effective, fast, and practical solution, ensuring efficient data distribution. For the training phase, we identify symmetric tensor contraction as the key computational kernel in MACE and optimize this kernel to improve the overall performance. Our combined approach of balanced data distribution and kernel optimization significantly enhances the training process of MACE. Experimental results demonstrate a substantial speedup, reducing per-epoch execution time for training from 12 to 2 minutes on 740 GPUs with a 2.6M sample dataset.

* Accepted at The 34th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2025)
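To make the load-balancing idea in the abstract above concrete, here is a minimal sketch of a greedy bin-packing heuristic that spreads variable-size graphs across workers. It is not the iterative algorithm from the paper: the function name `greedy_balance`, the `(num_nodes, num_edges)` graph representation, and the `edge_weight` parameter that folds the two objectives into a single cost are all assumptions made for illustration.

```python
import heapq
from typing import List, Tuple


def greedy_balance(
    graphs: List[Tuple[int, int]],
    num_bins: int,
    edge_weight: float = 1.0,
) -> List[List[int]]:
    """Assign graph indices to bins so that the combined load is roughly even.

    Each graph is a hypothetical (num_nodes, num_edges) pair. The two
    objectives are collapsed into one scalar cost, which is a simplification
    chosen for this sketch, not the paper's formulation.
    """
    costs = [nodes + edge_weight * edges for nodes, edges in graphs]
    # Place the most expensive graphs first (longest-processing-time rule).
    order = sorted(range(len(graphs)), key=lambda i: -costs[i])

    # Min-heap of (current load, bin index); the lightest bin is popped first.
    heap = [(0.0, b) for b in range(num_bins)]
    heapq.heapify(heap)
    bins: List[List[int]] = [[] for _ in range(num_bins)]

    for i in order:
        load, b = heapq.heappop(heap)
        bins[b].append(i)
        heapq.heappush(heap, (load + costs[i], b))
    return bins
```

Calling `greedy_balance(graphs, num_bins=8)` returns one list of graph indices per bin. A scalarized greedy pass like this is only a baseline; a true multi-objective formulation would track node and edge loads separately.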

NavTuner: Learning a Scene-Sensitive Family of Navigation Policies

Mar 02, 2021

Haoxin Ma, Justin S. Smith, Patricio A. Vela

Abstract: The advent of deep learning has inspired research into end-to-end learning for a variety of problem domains in robotics. For navigation, the resulting methods may not have the generalization properties desired, let alone match the performance of traditional methods. Instead of learning a navigation policy, we explore learning an adaptive policy in the parameter space of an existing navigation module. Having adaptive parameters provides the navigation module with a family of policies that can be dynamically reconfigured based on the local scene structure, and addresses the common assertion in machine learning that engineered solutions are inflexible. Of the methods tested, reinforcement learning (RL) is shown to provide a significant performance boost to a modern navigation method through reduced sensitivity of its success rate to environmental clutter. The outcomes indicate that RL as a meta-policy learner, or dynamic parameter tuner, effectively robustifies algorithms sensitive to external, measurable nuisance factors.

Good Graph to Optimize: Cost-Effective, Budget-Aware Bundle Adjustment in Visual SLAM

Aug 23, 2020

Yipu Zhao, Justin S. Smith, Patricio A. Vela

Abstract: The cost-efficiency of visual(-inertial) SLAM (VSLAM) is a critical characteristic of resource-limited applications. While hardware and algorithm advances have significantly improved the cost-efficiency of VSLAM front-ends, the cost-efficiency of VSLAM back-ends remains a bottleneck. This paper describes a novel, rigorous method to improve the cost-efficiency of local BA in a BA-based VSLAM back-end. An efficient algorithm, called Good Graph, is developed to select size-reduced graphs optimized in local BA with condition preservation. To better suit BA-based VSLAM back-ends, the Good Graph predicts future estimation needs, dynamically assigns an appropriate size budget, and selects a condition-maximized subgraph for BA estimation. Evaluations are conducted on two scenarios: 1) VSLAM as a standalone process, and 2) VSLAM as part of a closed-loop navigation system. Results from the first scenario show that Good Graph improves the accuracy and robustness of VSLAM estimation when computational limits exist. Results from the second scenario indicate that Good Graph benefits the trajectory tracking performance of VSLAM-based closed-loop navigation systems, which is a primary application of VSLAM.

* 20 pages, 14 figures, 8 tables. Submitted to IEEE Transactions on Robotics. For the provided open-source software, see https://github.com/ivalab/gf_orb_slam2

Simple and efficient algorithms for training machine learning potentials to force data

Jun 09, 2020

Justin S. Smith, Nicholas Lubbers, Aidan P. Thompson, Kipton Barros

Abstract: Machine learning models, trained on data from ab initio quantum simulations, are yielding molecular dynamics potentials with unprecedented accuracy. One limiting factor is the quantity of available training data, which can be expensive to obtain. A quantum simulation often provides all atomic forces, in addition to the total energy of the system. These forces provide much more information than the energy alone. It may appear that training a model to this large quantity of force data would introduce significant computational costs. Actually, training to all available force data should only be a few times more expensive than training to energies alone. Here, we present a new algorithm for efficient force training, and benchmark its accuracy by training to forces from real-world datasets for organic chemistry and bulk aluminum.
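As a rough illustration of what training to forces means, the sketch below evaluates a combined energy-and-force loss for a toy harmonic pair potential, using the standard relation that the force on an atom is the negative gradient of the energy with respect to its position. This is not the algorithm presented in the paper: the potential, the parameters `k` and `r0`, the `force_weight` factor, and both function names are assumptions chosen only to keep the example self-contained.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def energy_and_forces(
    pos: Sequence[Vec], k: float = 1.0, r0: float = 1.0
) -> Tuple[float, List[List[float]]]:
    """Toy model: E = sum over pairs of 0.5 * k * (r_ij - r0)^2, F = -dE/dR."""
    n = len(pos)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[i][a] - pos[j][a] for a in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            energy += 0.5 * k * (r - r0) ** 2
            for a in range(3):
                # Gradient of the pair energy with respect to atom i.
                g = k * (r - r0) * d[a] / r
                forces[i][a] -= g  # force is the negative gradient
                forces[j][a] += g  # equal and opposite on atom j
    return energy, forces


def energy_force_loss(
    pos: Sequence[Vec],
    e_ref: float,
    f_ref: Sequence[Sequence[float]],
    force_weight: float = 1.0,
    k: float = 1.0,
    r0: float = 1.0,
) -> float:
    """Squared per-atom energy error plus weighted mean squared force error."""
    n = len(pos)
    e, f = energy_and_forces(pos, k, r0)
    e_term = ((e - e_ref) / n) ** 2
    f_term = sum(
        (f[i][a] - f_ref[i][a]) ** 2 for i in range(n) for a in range(3)
    ) / (3 * n)
    return e_term + force_weight * f_term
```

One reference configuration contributes a single energy value but 3N force components, which is why the abstract describes forces as carrying much more information than the energy alone.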

Automated discovery of a robust interatomic potential for aluminum

Mar 10, 2020

Justin S. Smith, Benjamin Nebgen, Nithin Mathew, Jie Chen, Nicholas Lubbers, Leonid Burakovsky, Sergei Tretiak, Hai Ah Nam, Timothy Germann, Saryu Fensin(+1 more)

Abstract: Atomistic molecular dynamics simulation is an important tool for predicting materials properties. Accuracy depends crucially on the model for the interatomic potential. The gold standard would be quantum mechanics (QM) based force calculations, but such a first-principles approach becomes prohibitively expensive at large system sizes. Efficient machine learning (ML) models have become increasingly popular as surrogates for QM. Neural networks with many thousands of parameters excel in capturing structure within a large dataset, but may struggle to extrapolate beyond the scope of the available data. Here we present a highly automated active learning approach to iteratively collect new QM data that best resolves weaknesses in the existing ML model. We exemplify our approach by developing a general potential for elemental aluminum. At each active learning iteration, the method (1) trains an ANI-style neural network potential from the available data, (2) uses this potential to drive molecular dynamics simulations, and (3) collects new QM data whenever the neural network identifies an atomic configuration for which it cannot make a good prediction. All molecular dynamics simulations are initialized to a disordered configuration, and then driven according to randomized, time-varying temperatures. This nonequilibrium molecular dynamics forms a variety of crystalline and defected configurations. By training on all such automatically collected data, we produce ANI-Al, our new interatomic potential for aluminum. We demonstrate the remarkable transferability of ANI-Al by benchmarking against experimental data, e.g., the radial distribution function in melt, various properties of the stable face-centered cubic (FCC) crystal, and the coexistence curve between melt and FCC.
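The three steps of each active learning iteration listed in the abstract above can be sketched as a generic loop. This is a skeleton under stated assumptions, not the authors' workflow: `train`, `propose`, `is_uncertain`, and `label` are hypothetical callables standing in for neural network training, molecular dynamics sampling, the uncertainty check, and the QM calculation respectively.

```python
from typing import Any, Callable, Iterable, List, Tuple

Sample = Tuple[Any, Any]  # (configuration, reference label)


def active_learning(
    initial_data: Iterable[Sample],
    train: Callable[[List[Sample]], Any],
    propose: Callable[[Any], Iterable[Any]],
    is_uncertain: Callable[[Any, Any], bool],
    label: Callable[[Any], Any],
    iterations: int,
) -> Tuple[Any, List[Sample]]:
    """Grow a dataset by labeling only configurations the model is unsure about."""
    data = list(initial_data)
    for _ in range(iterations):
        model = train(data)  # step (1): fit a model to the data so far
        for config in propose(model):  # step (2): sample with that model
            if is_uncertain(model, config):  # step (3): detect a poor prediction
                data.append((config, label(config)))  # pay for a new label
    return train(data), data
```

The model is retrained once per iteration rather than after every new label, so several similar configurations may be added within one iteration. That trade-off is a choice made here for brevity.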

Closed-Loop Benchmarking of Stereo Visual-Inertial SLAM Systems: Understanding the Impact of Drift and Latency on Tracking Accuracy

Mar 07, 2020

Yipu Zhao, Justin S. Smith, Sambhu H. Karumanchi, Patricio A. Vela

Abstract: Visual-inertial SLAM is essential for robot navigation in GPS-denied environments, e.g., indoors or underground. Conventionally, the performance of visual-inertial SLAM is evaluated with open-loop analysis, with a focus on the drift level of SLAM systems. In this paper, we raise the question of the importance of visual estimation latency in closed-loop navigation tasks, such as accurate trajectory tracking. To understand the impact of both drift and latency on visual-inertial SLAM systems, a closed-loop benchmarking simulation is conducted, where a robot is commanded to follow a desired trajectory using the feedback from visual-inertial estimation. By extensively evaluating the trajectory tracking performance of representative state-of-the-art visual-inertial SLAM systems, we reveal the importance of latency reduction in the visual estimation module of these systems. The findings suggest directions for future improvements to visual-inertial SLAM.

* 8 pages, 7 figures. Accepted for publication in ICRA 2020

Autonomous, Monocular, Vision-Based Snake Robot Navigation and Traversal of Cluttered Environments using Rectilinear Gait Motion

Aug 19, 2019

Alexander H. Chang, Shiyu Feng, Yipu Zhao, Justin S. Smith, Patricio A. Vela

Abstract: Rectilinear forms of snake-like robotic locomotion are anticipated to be an advantage in obstacle-strewn scenarios characterizing urban disaster zones, subterranean collapses, and other natural environments. The elongated, laterally narrow footprint associated with these motion strategies is well-suited to traversal of confined spaces and narrow pathways. Navigation and path planning in the absence of global sensing, however, remains a pivotal challenge to be addressed prior to practical deployment of these robotic mechanisms. Several challenges related to visual processing and localization need to be resolved to enable navigation. As a first pass in this direction, we mount a wireless, monocular color camera on the head of a robotic snake. Visual odometry and mapping from ORB-SLAM permit self-localization in planar, obstacle-strewn environments. Ground plane traversability segmentation in conjunction with perception-space collision detection permits path planning for navigation. A previously presented dynamical reduction of rectilinear snake locomotion to a non-holonomic kinematic vehicle informs both SLAM and planning. The simplified motion model is then applied to track planned trajectories through an obstacle configuration. This navigational framework enables a snake-like robotic platform to autonomously navigate and traverse unknown scenarios with only monocular vision.

Less is more: sampling chemical space with active learning

Apr 09, 2018

Justin S. Smith, Ben Nebgen, Nicholas Lubbers, Olexandr Isayev, Adrian E. Roitberg

Abstract: The development of accurate and transferable machine learning (ML) potentials for predicting molecular energetics is a challenging task. The process of data generation to train such ML potentials is a task neither well understood nor researched in detail. In this work, we present a fully automated approach for the generation of datasets with the intent of training universal ML potentials. It is based on the concept of active learning (AL) via Query by Committee (QBC), which uses the disagreement between an ensemble of ML potentials to infer the reliability of the ensemble's prediction. QBC allows the presented AL algorithm to automatically sample regions of chemical space where the ML potential fails to accurately predict the potential energy. AL improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials in rigorous test cases by mitigating human biases in deciding what new training data to use. AL also reduces the training set size to a fraction of the data required when using naive random sampling techniques. To provide validation of our AL approach we develop the COMP6 benchmark (publicly available on GitHub), which contains a diverse set of organic molecules. Through the AL process, it is shown that the AL-based potentials perform as well as the ANI-1 potential on COMP6 with only 10% of the data, and vastly outperform ANI-1 with 25% of the data. Finally, we show that our proposed AL technique develops a universal ANI potential (ANI-1x) that provides accurate energy and force predictions on the entire COMP6 benchmark. This universal ML potential achieves a level of accuracy on par with the best ML potentials for single molecules or materials, while remaining applicable to the general class of organic molecules comprised of the elements CHNO.

* J. Chem. Phys. 148, 241733 (2018)
* Accepted at J. Chem. Phys
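The Query by Committee idea described in the abstract above can be sketched in a few lines: an ensemble of models predicts the same quantity, and a candidate is selected for labeling when the predictions disagree by more than a threshold. This is a minimal sketch, not the ANI implementation. The function name `select_by_committee`, the use of the population standard deviation as the disagreement measure, and the `threshold` parameter are assumptions made for illustration.

```python
from statistics import pstdev
from typing import Any, Callable, List, Sequence


def select_by_committee(
    candidates: Sequence[Any],
    committee: Sequence[Callable[[Any], float]],
    threshold: float,
) -> List[Any]:
    """Return the candidates on which the committee's predictions disagree.

    Disagreement is measured here as the population standard deviation of
    the members' predictions, an assumption made for this sketch.
    """
    selected = []
    for x in candidates:
        predictions = [model(x) for model in committee]
        if pstdev(predictions) > threshold:
            selected.append(x)  # ensemble is unreliable here: request a label
    return selected
```

In practice each committee member would be a separately trained potential and each candidate a molecular conformation; plain callables on numbers are enough to exercise the selection logic.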

Learning to Navigate: Exploiting Deep Networks to Inform Sample-Based Planning During Vision-Based Navigation

Jan 16, 2018

Justin S. Smith, Jin-Ha Hwang, Fu-Jen Chu, Patricio A. Vela

Abstract: Recent applications of deep learning to navigation have generated end-to-end navigation solutions whereby visual sensor input is mapped to control signals or to motion primitives. The resulting visual navigation strategies work very well at collision avoidance and have performance that matches traditional reactive navigation algorithms while operating in real time. It is accepted that these solutions cannot provide the same level of performance as a global planner. However, it is less clear how such end-to-end systems should be integrated into a full navigation pipeline. We evaluate the typical end-to-end solution within a full navigation pipeline in order to expose its weaknesses. Doing so illuminates how to better integrate deep learning methods into the navigation pipeline. In particular, we show that they are an efficient means to provide informed samples for sample-based planners. Controlled simulations with comparison against traditional planners show that the number of samples can be reduced by an order of magnitude while preserving navigation performance. Implementation on a mobile robot matches the simulated performance outcomes.

* 7 pages, 6 figures

ANI-1: A data set of 20M off-equilibrium DFT calculations for organic molecules

Dec 12, 2017

Justin S. Smith, Olexandr Isayev, Adrian E. Roitberg

Abstract: One of the grand challenges in modern theoretical chemistry is designing and implementing approximations that expedite ab initio methods without loss of accuracy. Machine learning (ML) methods, in particular neural networks, are emerging as a powerful approach to constructing various forms of transferable atomistic potentials. They have been successfully applied in a variety of applications in chemistry, biology, catalysis, and solid-state physics. However, these models are heavily dependent on the quality and quantity of data used in their fitting. Fitting highly flexible ML potentials comes at a cost: a vast amount of reference data is required to properly train these models. We address this need by providing access to a large computational DFT database, which consists of 20M conformations for 57,454 small organic molecules. We believe it will become a new standard benchmark for comparison of current and future methods in the ML potential community.

* Scientific Data 4, Article number: 170193 (2017)
