Abstract:Knowledge driven discovery of novel materials necessitates the development of the causal models for the property emergence. While in classical physical paradigm the causal relationships are deduced based on the physical principles or via experiment, rapid accumulation of observational data necessitates learning causal relationships between dissimilar aspects of materials structure and functionalities based on observations. For this, it is essential to integrate experimental data with prior domain knowledge. Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs). By fine-tuning ChatGPT on domain-specific literature, such as arXiv papers on ferroelectrics, and combining obtained information with data-driven causal discovery, we construct adjacency matrices for Directed Acyclic Graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO). This approach enables us to hypothesize how synthesis conditions influence material properties, particularly the coercive field (E0), and guides experimental validation. The ultimate objective of this work is to develop a unified framework that integrates LLM-driven literature analysis with data-driven discovery, facilitating the precise engineering of ferroelectric materials by establishing clear connections between synthesis conditions and their resulting material properties.
Abstract:Delineating the superconducting order parameters is a pivotal task in investigating superconductivity for probing pairing mechanisms, as well as their symmetry and topology. Point-contact Andreev reflection (PCAR) measurement is a simple yet powerful tool for identifying the order parameters. The PCAR spectra exhibit significant variations depending on the type of the order parameter in a superconductor, including its magnitude ($\mathit{\Delta}$), as well as temperature, interfacial quality, Fermi velocity mismatch, and other factors. The information on the order parameter can be obtained by finding the combination of these parameters, generating a theoretical spectrum that fits a measured experimental spectrum. However, due to the complexity of the spectra and the high dimensionality of parameters, extracting the fitting parameters is often time-consuming and labor-intensive. In this study, we employ a convolutional neural network (CNN) algorithm to create models for rapid and automated analysis of PCAR spectra of various superconductors with different pairing symmetries (conventional $s$-wave, chiral $p_x+ip_y$-wave, and $d_{x^2-y^2}$-wave). The training datasets are generated based on the Blonder-Tinkham-Klapwijk (BTK) theory and further modified and augmented by selectively incorporating noise and peaks according to the bias voltages. This approach not only replicates the experimental spectra but also brings the model's attention to important features within the spectra. The optimized models provide fitting parameters for experimentally measured spectra in less than 100 ms per spectrum. Our approaches and findings pave the way for rapid and automated spectral analysis which will help accelerate research on superconductors with complex order parameters.
Abstract:Rapid progress in aberration corrected electron microscopy necessitates development of robust methods for the identification of phases, ferroic variants, and other pertinent aspects of materials structure from imaging data. While unsupervised methods for clustering and classification are widely used for these tasks, their performance can be sensitive to hyperparameter selection in the analysis workflow. In this study, we explore the effects of descriptors and hyperparameters on the capability of unsupervised ML methods to distill local structural information, exemplified by discovery of polarization and lattice distortion in Sm doped BiFeO3 (BFO) thin films. We demonstrate that a reward-driven approach can be used to optimize these key hyperparameters across the full workflow, where rewards were designed to reflect domain wall continuity and straightness, ensuring that the analysis aligns with the material's physical behavior. This approach allows us to discover local descriptors that are best aligned with the specific physical behavior, providing insight into the fundamental physics of materials. We further extend the reward driven workflows to disentangle structural factors of variation via optimized variational autoencoder (VAE). Finally, the importance of well-defined rewards was explored as a quantifiable measure of success of the workflow.
Abstract:The rapid growth of automated and autonomous instrumentations brings forth an opportunity for the co-orchestration of multimodal tools, equipped with multiple sequential detection methods, or several characterization tools to explore identical samples. This can be exemplified by the combinatorial libraries that can be explored in multiple locations by multiple tools simultaneously, or downstream characterization in automated synthesis systems. In the co-orchestration approaches, information gained in one modality should accelerate the discovery of other modalities. Correspondingly, the orchestrating agent should select the measurement modality based on the anticipated knowledge gain and measurement cost. Here, we propose and implement a co-orchestration approach for conducting measurements with complex observables such as spectra or images. The method relies on combining dimensionality reduction by variational autoencoders with representation learning for control over the latent space structure, and integrated into iterative workflow via multi-task Gaussian Processes (GP). This approach further allows for the native incorporation of the system's physics via a probabilistic model as a mean function of the GP. We illustrated this method for different modalities of piezoresponse force microscopy and micro-Raman on combinatorial $Sm-BiFeO_3$ library. However, the proposed framework is general and can be extended to multiple measurement modalities and arbitrary dimensionality of measured signals. The analysis code that supports the funding is publicly available at https://github.com/Slautin/2024_Co-orchestration.