Abstract:Materials science datasets are inherently heterogeneous and are available in different modalities such as characterization spectra, atomic structures, microscopic images, and text-based synthesis conditions. The advancements in multi-modal learning, particularly in vision and language models, have opened new avenues for integrating data in different forms. In this work, we evaluate common techniques in multi-modal learning (alignment and fusion) in unifying some of the most important modalities in materials science: atomic structure, X-ray diffraction patterns (XRD), and composition. We show that structure graph modality can be enhanced by aligning with XRD patterns. Additionally, we show that aligning and fusing more experimentally accessible data formats, such as XRD patterns and compositions, can create more robust joint embeddings than individual modalities across various tasks. This lays the groundwork for future studies aiming to exploit the full potential of multi-modal data in materials science, facilitating more informed decision-making in materials design and discovery.
Abstract:Adsorption energy is a key reactivity descriptor in catalysis, enabling the efficient screening of potential catalysts. However, determining adsorption energy involves comparing the energies of multiple adsorbate-catalyst configurations, which is computationally demanding due to a large number of possible configurations. Current algorithmic approaches typically enumerate adsorption sites and configurations without leveraging theoretical insights to guide the initial setup. In this work, we present Adsorb-Agent, a Large Language Model (LLM) agent designed to efficiently derive system-specific stable adsorption configurations with minimal human intervention. Adsorb-Agent leverages built-in knowledge and emergent reasoning capabilities, significantly reducing the number of initial configurations required while improving accuracy in predicting the minimum adsorption energy. We demonstrate its performance using two example systems, NNH-CuPd3 (111) and NNH-Mo3Pd (111), for the Nitrogen Reduction Reaction (NRR), a sustainable alternative to the Haber-Bosch process. Adsorb-Agent outperforms conventional "heuristic" and "random" algorithms by identifying lower-energy configurations with fewer initial setups, reducing computational costs while enhancing accuracy. This highlights its potential to accelerate catalyst discovery.
Abstract:The increasing popularity of machine learning (ML) in catalysis has spurred interest in leveraging these techniques to enhance catalyst design. Our study aims to bridge the gap between physics-based studies and data-driven methodologies by integrating ML techniques with eXplainable AI (XAI). Specifically, we employ two XAI techniques: Post-hoc XAI analysis and Symbolic Regression. These techniques help us unravel the correlation between adsorption energy and the properties of the adsorbate-catalyst system. Leveraging a large dataset such as the Open Catalyst Dataset (OC20), we employ a combination of shallow ML techniques and XAI methodologies. Our investigation involves utilizing multiple shallow machine learning techniques to predict adsorption energy, followed by post-hoc analysis for feature importance, inter-feature correlations, and the influence of various feature values on the prediction of adsorption energy. The post-hoc analysis reveals that adsorbate properties exert a greater influence than catalyst properties in our dataset. The top five features based on higher Shapley values are adsorbate electronegativity, the number of adsorbate atoms, catalyst electronegativity, effective coordination number, and the sum of atomic numbers of the adsorbate molecule. There is a positive correlation between catalyst and adsorbate electronegativity with the prediction of adsorption energy. Additionally, symbolic regression yields results consistent with SHAP analysis. It deduces a mathematical relationship indicating that the square of the catalyst electronegativity is directly proportional to the adsorption energy. These consistent correlations resemble those derived from physics-based equations in previous research. Our work establishes a robust framework that integrates ML techniques with XAI, leveraging large datasets like OC20 to enhance catalyst design through model explainability.