Abstract: Efficiently extracting data from tables in the scientific literature is pivotal for building large-scale databases. However, the tables reported in materials science papers take highly diverse forms, which makes rule-based extraction ineffective. To overcome this challenge, we present MaTableGPT, a GPT-based extractor of table data from the materials science literature. MaTableGPT features key strategies of table data representation and table splitting for better GPT comprehension, and it filters hallucinated information through follow-up questions. When applied to a vast volume of water splitting catalysis literature, MaTableGPT achieved an extraction accuracy (total F1 score) of up to 96.8%. Through comprehensive evaluations of the GPT usage cost, labeling cost, and extraction accuracy for zero-shot, few-shot, and fine-tuning learning methods, we present a Pareto-front mapping in which few-shot learning was found to be the most balanced solution owing to both its high extraction accuracy (total F1 score > 95%) and low cost (GPT usage cost of 5.97 US dollars and labeling cost of 10 I/O paired examples). Statistical analyses of the database generated by MaTableGPT revealed valuable insights into the distribution of overpotential and elemental utilization across the catalysts reported in the water splitting literature.
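As an illustrative aside, the Pareto-front mapping mentioned above can be sketched as a dominance filter over three objectives: lower GPT usage cost, fewer labeled examples, and higher total F1. The `Method` type, the function names, and every number below are assumptions made for this sketch; the values are placeholders, not results from the paper.

```python
from typing import List, NamedTuple


class Method(NamedTuple):
    name: str
    usage_cost: float  # GPT usage cost in USD (lower is better)
    labels: int        # labeled I/O examples required (lower is better)
    f1: float          # total F1 score in percent (higher is better)


def dominates(a: Method, b: Method) -> bool:
    """Return True if a is no worse than b on every objective and strictly better on one."""
    no_worse = a.usage_cost <= b.usage_cost and a.labels <= b.labels and a.f1 >= b.f1
    better = a.usage_cost < b.usage_cost or a.labels < b.labels or a.f1 > b.f1
    return no_worse and better


def pareto_front(methods: List[Method]) -> List[Method]:
    """Keep only the methods that no other method dominates."""
    return [m for m in methods if not any(dominates(o, m) for o in methods)]


# Placeholder values for illustration only; they are NOT the paper's results.
candidates = [
    Method("A", 1.0, 0, 80.0),
    Method("B", 5.0, 10, 95.0),
    Method("C", 50.0, 1000, 97.0),
    Method("D", 60.0, 1000, 90.0),  # worse than C on cost and F1, equal on labels
]
front = pareto_front(candidates)
```

With these placeholder inputs, `D` is dropped because `C` matches or beats it on every objective, while `A`, `B`, and `C` each represent a different trade-off and remain on the front.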
Abstract: Optimizing nanomaterial synthesis over numerous synthetic variables is considered an extremely laborious task because conventional combinatorial exploration is prohibitively expensive. In this work, we report an autonomous experimentation platform developed for the bespoke design of nanoparticles (NPs) with targeted optical properties. This platform operates in a closed loop between a batch NP synthesis module and a UV-Vis spectroscopy module, driven by feedback from AI optimization modeling. With silver (Ag) NPs as a representative example, we demonstrate that a Bayesian optimizer implemented with an early stopping criterion can efficiently produce Ag NPs possessing precisely the desired absorption spectra within only 200 iterations (when optimizing among five synthetic reagents). In addition to this outstanding material development efficiency, analysis of the synthetic variables further reveals novel chemistry involving the effects of citrate in Ag NP synthesis. The amount of citrate is key to controlling the competition between spherical and plate-shaped NPs and, as a result, also affects the shapes of the absorption spectra. Our study highlights the platform's capabilities both to enhance search efficiency and to provide novel chemical knowledge through analysis of the datasets accumulated from autonomous experiments.
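The closed-loop structure described above (propose a recipe, synthesize and measure, update, stop early once the target is met) can be sketched as follows. This is a minimal sketch under stated assumptions: uniform random proposals stand in for the Bayesian optimizer, a synthetic squared-distance function stands in for the synthesis and UV-Vis measurement, and the tolerance-based stopping rule is assumed because the abstract does not specify the criterion. All names are hypothetical.

```python
import random
from typing import Tuple

# Hypothetical hidden optimum, used only by the synthetic "measurement".
HIDDEN_OPTIMUM = (0.3, 0.7)


def measure_loss(recipe: Tuple[float, ...]) -> float:
    """Stand-in for synthesis plus spectroscopy: squared distance to the hidden optimum."""
    return sum((r - t) ** 2 for r, t in zip(recipe, HIDDEN_OPTIMUM))


def closed_loop(max_iter: int = 200, tol: float = 1e-2, seed: int = 0):
    """Run propose -> measure -> update until the loss is below tol or the budget is spent."""
    rng = random.Random(seed)
    best_recipe, best_loss = None, float("inf")
    history = []
    for _ in range(max_iter):
        # Propose: random search here; a Bayesian optimizer would use `history` instead.
        recipe = tuple(rng.random() for _ in HIDDEN_OPTIMUM)
        loss = measure_loss(recipe)      # measure
        history.append((recipe, loss))   # update the accumulated dataset
        if loss < best_loss:
            best_recipe, best_loss = recipe, loss
        if best_loss < tol:              # early stopping (assumed criterion)
            break
    return best_recipe, best_loss, len(history)


best_recipe, best_loss, n_iter = closed_loop()
```

Swapping the random proposal for a surrogate-model-driven one would change only the line that builds `recipe`; the loop and the stopping check would stay the same.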
Abstract: Although robot-based automation in chemistry laboratories can accelerate the material development process, surveillance-free environments may lead to dangerous accidents, primarily due to machine control errors. Object detection techniques can play vital roles in addressing these safety issues; however, state-of-the-art detectors, including single-shot detector (SSD) models, suffer from insufficient accuracy in environments involving complex and noisy scenes. With the aim of improving safety in a surveillance-free laboratory, we report a novel deep learning (DL)-based object detector, namely, DenseSSD. For the foremost and frequent problem of detecting vial positions, DenseSSD achieved a mean average precision (mAP) of over 95% on a complex dataset involving both empty and solution-filled vials, greatly exceeding that of conventional detectors; such high precision is critical to minimizing failure-induced accidents. Additionally, DenseSSD was observed to be highly insensitive to environmental changes, maintaining its high precision under variations in solution color or testing view angle. The robustness of DenseSSD would allow more flexible equipment settings. This work demonstrates that DenseSSD is useful for enhancing safety in an automated material synthesis environment, and it can be extended to various applications where high detection accuracy and speed are both needed.
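For readers unfamiliar with how detection metrics such as mAP are built, the basic ingredient is the intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below assumes axis-aligned boxes given as `(x1, y1, x2, y2)` and a match threshold of 0.5, which is a common convention; the abstract does not state the threshold or box format used for DenseSSD.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so that disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """Count a prediction as a true positive when IoU reaches the (assumed) threshold."""
    return iou(pred, truth) >= threshold


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```

Average precision for a class is then computed from the precision-recall curve obtained by ranking predictions by confidence and labeling each one with `is_match`; mAP averages this over classes.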
Abstract: The robust and automated determination of crystal symmetry is of utmost importance in material characterization and analysis. Recent studies have shown that deep learning (DL) methods can effectively reveal the correlations between X-ray or electron-beam diffraction patterns and crystal symmetry. Despite their promise, most of these studies have been limited to identifying relatively few classes into which a target material may be grouped. When the task is extended to tens or hundreds of symmetry classes (e.g., up to 230 space groups), the DL-based identification of crystal symmetry suffers from a drastic drop in accuracy, severely limiting its practical usage. Here, we demonstrate that a combined approach of shaping diffraction patterns and feeding them into a multistream DenseNet (MSDN) substantially improves the accuracy of classification. Even with an imbalanced dataset of 108,658 individual crystals sampled from 72 space groups, our model achieves 80.2% space group classification accuracy, outperforming conventional benchmark models by 17-27 percentage points (%p). The enhancement can be largely attributed to the pattern shaping strategy, through which the subtle changes in patterns between symmetrically close crystal systems (e.g., monoclinic vs. orthorhombic or trigonal vs. hexagonal) are well differentiated. We additionally find that the novel MSDN architecture is advantageous for capturing patterns in a richer but less redundant manner relative to conventional convolutional neural networks. The newly proposed protocols for both input descriptor processing and DL architecture enable accurate space group classification and thus improve the practical usage of the DL approach in crystal symmetry identification.