Abstract:Neural surrogates for partial differential equations (PDEs) have become popular due to their potential to quickly simulate physics. With a few exceptions, neural surrogates generally treat the forward evolution of time-dependent PDEs as a black box by directly predicting the next state. While this is a natural and easy framework for applying neural surrogates, it can be an over-simplified and rigid framework for predicting physics. In this work, we propose an alternative framework in which neural solvers predict the temporal derivative and an ODE integrator forwards the solution in time, which has little overhead and is broadly applicable across model architectures and PDEs. We find that by simply changing the training target and introducing numerical integration during inference, neural surrogates can gain accuracy and stability. Predicting temporal derivatives also allows models to not be constrained to a specific temporal discretization, allowing for flexible time-stepping during inference or training on higher-resolution PDE data. Lastly, we investigate why this new framework can be beneficial and in what situations does it work well.
Abstract:Deformable object manipulation remains a key challenge in developing autonomous robotic systems that can be successfully deployed in real-world scenarios. In this work, we explore the challenges of deformable object manipulation through the task of sculpting clay into 3D shapes. We propose the first coarse-to-fine autonomous sculpting system in which the sculpting agent first selects how many and where to place discrete chunks of clay into the workspace to create a coarse shape, and then iteratively refines the shape with sequences of deformation actions. We leverage large language models for sub-goal generation, and train a point cloud region-based action model to predict robot actions from the desired point cloud sub-goals. Additionally, our method is the first autonomous sculpting system that is a real-world text-to-3D shaping pipeline without any explicit 3D goals or sub-goals provided to the system. We demonstrate our method is able to successfully create a set of simple shapes solely from text-based prompting. Furthermore, we explore rigorously how to best quantify success for the text-to-3D sculpting task, and compare existing text-image and text-point cloud similarity metrics to human evaluations for this task. For experimental videos, human evaluation details, and full prompts, please see our project website: https://sites.google.com/andrew.cmu.edu/hierarchicalsculpting
Abstract:Identifying drug-target interactions is essential for developing effective therapeutics. Binding affinity quantifies these interactions, and traditional approaches rely on computationally intensive 3D structural data. In contrast, language models can efficiently process sequential data, offering an alternative approach to molecular representation. In the current study, we introduce BAPULM, an innovative sequence-based framework that leverages the chemical latent representations of proteins via ProtT5-XL-U50 and ligands through MolFormer, eliminating reliance on complex 3D configurations. Our approach was validated extensively on benchmark datasets, achieving scoring power (R) values of 0.925 $\pm$ 0.043, 0.914 $\pm$ 0.004, and 0.8132 $\pm$ 0.001 on benchmark1k2101, Test2016_290, and CSAR-HiQ_36, respectively. These findings indicate the robustness and accuracy of BAPULM across diverse datasets and underscore the potential of sequence-based models in-silico drug discovery, offering a scalable alternative to 3D-centric methods for screening potential ligands.
Abstract:In recent years, natural language processing (NLP) models have demonstrated remarkable capabilities in various domains beyond traditional text generation. In this work, we introduce PeptideGPT, a protein language model tailored to generate protein sequences with distinct properties: hemolytic activity, solubility, and non-fouling characteristics. To facilitate a rigorous evaluation of these generated sequences, we established a comprehensive evaluation pipeline consisting of ideas from bioinformatics to retain valid proteins with ordered structures. First, we rank the generated sequences based on their perplexity scores, then we filter out those lying outside the permissible convex hull of proteins. Finally, we predict the structure using ESMFold and select the proteins with pLDDT values greater than 70 to ensure ordered structure. The properties of generated sequences are evaluated using task-specific classifiers - PeptideBERT and HAPPENN. We achieved an accuracy of 76.26% in hemolytic, 72.46% in non-hemolytic, 78.84% in non-fouling, and 68.06% in solubility protein generation. Our experimental results demonstrate the effectiveness of PeptideGPT in de novo protein design and underscore the potential of leveraging NLP-based approaches for paving the way for future innovations and breakthroughs in synthetic biology and bioinformatics. Codes, models, and data used in this study are freely available at: https://github.com/aayush-shah14/PeptideGPT.
Abstract:Adsorption energy is a key reactivity descriptor in catalysis, enabling the efficient screening of potential catalysts. However, determining adsorption energy involves comparing the energies of multiple adsorbate-catalyst configurations, which is computationally demanding due to a large number of possible configurations. Current algorithmic approaches typically enumerate adsorption sites and configurations without leveraging theoretical insights to guide the initial setup. In this work, we present Adsorb-Agent, a Large Language Model (LLM) agent designed to efficiently derive system-specific stable adsorption configurations with minimal human intervention. Adsorb-Agent leverages built-in knowledge and emergent reasoning capabilities, significantly reducing the number of initial configurations required while improving accuracy in predicting the minimum adsorption energy. We demonstrate its performance using two example systems, NNH-CuPd3 (111) and NNH-Mo3Pd (111), for the Nitrogen Reduction Reaction (NRR), a sustainable alternative to the Haber-Bosch process. Adsorb-Agent outperforms conventional "heuristic" and "random" algorithms by identifying lower-energy configurations with fewer initial setups, reducing computational costs while enhancing accuracy. This highlights its potential to accelerate catalyst discovery.
Abstract:Solving Partial Differential Equations (PDEs) is ubiquitous in science and engineering. Computational complexity and difficulty in writing numerical solvers has motivated the development of machine learning techniques to generate solutions quickly. Many existing methods are purely data driven, relying solely on numerical solution fields, rather than known system information such as boundary conditions and governing equations. However, the recent rise in popularity of Large Language Models (LLMs) has enabled easy integration of text in multimodal machine learning models. In this work, we use pretrained LLMs to integrate various amounts known system information into PDE learning. Our multimodal approach significantly outperforms our baseline model, FactFormer, in both next-step prediction and autoregressive rollout performance on the 2D Heat, Burgers, Navier-Stokes, and Shallow Water equations. Further analysis shows that pretrained LLMs provide highly structured latent space that is consistent with the amount of system information provided through text.
Abstract:Recent advances in deep learning have inspired numerous works on data-driven solutions to partial differential equation (PDE) problems. These neural PDE solvers can often be much faster than their numerical counterparts; however, each presents its unique limitations and generally balances training cost, numerical accuracy, and ease of applicability to different problem setups. To address these limitations, we introduce several methods to apply latent diffusion models to physics simulation. Firstly, we introduce a mesh autoencoder to compress arbitrarily discretized PDE data, allowing for efficient diffusion training across various physics. Furthermore, we investigate full spatio-temporal solution generation to mitigate autoregressive error accumulation. Lastly, we investigate conditioning on initial physical quantities, as well as conditioning solely on a text prompt to introduce text2PDE generation. We show that language can be a compact, interpretable, and accurate modality for generating physics simulations, paving the way for more usable and accessible PDE solvers. Through experiments on both uniform and structured grids, we show that the proposed approach is competitive with current neural PDE solvers in both accuracy and efficiency, with promising scaling behavior up to $\sim$3 billion parameters. By introducing a scalable, accurate, and usable physics simulator, we hope to bring neural PDE solvers closer to practical use.
Abstract:As robotic systems become increasingly integrated into complex real-world environments, there is a growing need for approaches that enable robots to understand and act upon natural language instructions without relying on extensive pre-programmed knowledge of their surroundings. This paper presents PLATO, an innovative system that addresses this challenge by leveraging specialized large language model agents to process natural language inputs, understand the environment, predict tool affordances, and generate executable actions for robotic systems. Unlike traditional systems that depend on hard-coded environmental information, PLATO employs a modular architecture of specialized agents to operate without any initial knowledge of the environment. These agents identify objects and their locations within the scene, generate a comprehensive high-level plan, translate this plan into a series of low-level actions, and verify the completion of each step. The system is particularly tested on challenging tool-use tasks, which involve handling diverse objects and require long-horizon planning. PLATO's design allows it to adapt to dynamic and unstructured settings, significantly enhancing its flexibility and robustness. By evaluating the system across various complex scenarios, we demonstrate its capability to tackle a diverse range of tasks and offer a novel solution to integrate LLMs with robotic platforms, advancing the state-of-the-art in autonomous robotic task execution. For videos and prompt details, please see our project website: https://sites.google.com/andrew.cmu.edu/plato
Abstract:The design of aerodynamic shapes, such as airfoils, has traditionally required significant computational resources and relied on predefined design parameters, which limit the potential for novel shape synthesis. In this work, we introduce a data-driven methodology for airfoil generation using a diffusion model. Trained on a dataset of preexisting airfoils, our model can generate an arbitrary number of new airfoils from random vectors, which can be conditioned on specific aerodynamic performance metrics such as lift and drag, or geometric criteria. Our results demonstrate that the diffusion model effectively produces airfoil shapes with realistic aerodynamic properties, offering substantial improvements in efficiency, flexibility, and the potential for discovering innovative airfoil designs. This approach significantly expands the design space, facilitating the synthesis of high-performance aerodynamic shapes that transcend the limitations of traditional methods.
Abstract:Industry 4.0 has revolutionized manufacturing by driving digitalization and shifting the paradigm toward additive manufacturing (AM). Fused Deposition Modeling (FDM), a key AM technology, enables the creation of highly customized, cost-effective products with minimal material waste through layer-by-layer extrusion, posing a significant challenge to traditional subtractive methods. However, the susceptibility of material extrusion techniques to errors often requires expert intervention to detect and mitigate defects that can severely compromise product quality. While automated error detection and machine learning models exist, their generalizability across diverse 3D printer setups, firmware, and sensors is limited, and deep learning methods require extensive labeled datasets, hindering scalability and adaptability. To address these challenges, we present a process monitoring and control framework that leverages pre-trained Large Language Models (LLMs) alongside 3D printers to detect and address printing defects. The LLM evaluates print quality by analyzing images captured after each layer or print segment, identifying failure modes and querying the printer for relevant parameters. It then generates and executes a corrective action plan. We validated the effectiveness of the proposed framework in identifying defects by comparing it against a control group of engineers with diverse AM expertise. Our evaluation demonstrated that LLM-based agents not only accurately identify common 3D printing errors, such as inconsistent extrusion, stringing, warping, and layer adhesion, but also effectively determine the parameters causing these failures and autonomously correct them without any need for human intervention.