Abstract: The design-make-test-analyze cycle in early-stage drug discovery remains constrained primarily by the "make" step: small-molecule synthesis is slow, costly, and difficult to scale or automate across diverse chemotypes. Enumerated chemical spaces aim to reduce this bottleneck by predefining synthesizable regions of chemical space from available building blocks and reliable reactions, yet existing commercial spaces are still limited by long turnaround times, narrow reaction scope, and substantial manual decision-making in route selection and execution. Here we present the first version of onepot CORE, an enumerated chemical space containing 3.4 billion molecules, together with a corresponding on-demand synthesis product enabled by an automated synthesis platform and an AI chemist, Phil, that designs, executes, and analyzes experiments. onepot CORE is constructed by (i) selecting a reaction set commonly used in medicinal chemistry, (ii) sourcing and curating building blocks from supplier catalogs, (iii) enumerating candidate products, and (iv) applying ML-based feasibility assessment to prioritize compounds for robust execution. In the current release, the space is supported by seven reactions. We describe an end-to-end workflow, from route selection and automated liquid handling through workup and purification. We further report validation across operational metrics (success rate, timelines, purity, and identity), including NMR confirmation for a representative set of synthesized compounds and assay suitability demonstrated with a series of DPP4 inhibitors. Collectively, onepot CORE illustrates a path toward faster, more reliable access to diverse small molecules, supporting accelerated discovery in pharmaceuticals and beyond.
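To make the enumeration and feasibility-filtering steps (iii) and (iv) concrete, the following is a minimal sketch using RDKit reaction SMARTS. The amide-coupling reaction, the example building blocks, and the `predicted_feasible` placeholder are illustrative assumptions only; they are not the onepot CORE reaction set or feasibility model.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Amide coupling as a reaction SMARTS (carboxylic acid + primary amine -> amide).
amide_coupling = AllChem.ReactionFromSmarts(
    "[C:1](=[O:2])[OH].[N;H2:3]>>[C:1](=[O:2])[N:3]"
)

# Tiny stand-in building-block lists; a real space draws these from supplier catalogs.
acids = [Chem.MolFromSmiles(s) for s in ["OC(=O)c1ccccc1", "OC(=O)C1CC1"]]
amines = [Chem.MolFromSmiles(s) for s in ["NCc1ccccc1", "NC1CCCC1"]]

def predicted_feasible(product):
    # Placeholder for the ML-based feasibility assessment described in the abstract;
    # here we only keep products that sanitize cleanly.
    try:
        Chem.SanitizeMol(product)
        return True
    except Exception:
        return False

enumerated = set()
for acid in acids:
    for amine in amines:
        for products in amide_coupling.RunReactants((acid, amine)):
            for product in products:
                if predicted_feasible(product):
                    enumerated.add(Chem.MolToSmiles(product))

print(f"{len(enumerated)} unique products")
for smiles in sorted(enumerated):
    print(smiles)
```

With a handful of building blocks the loop yields a handful of amides; the combinatorics of seven reactions over curated supplier catalogs is what scales this pattern to billions of enumerated molecules.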
Abstract: Molecular representation is a foundational element in our understanding of the physical world. Its importance ranges from the fundamentals of chemical reactions to the design of new therapies and materials. Previous molecular machine learning models have employed strings, fingerprints, global features, and simple molecular graphs that are inherently information-sparse representations. However, as the complexity of prediction tasks increases, the molecular representation needs to encode higher-fidelity information. This work introduces a novel approach to infusing quantum-chemical-rich information into molecular graphs via stereoelectronic effects. We show that the explicit addition of stereoelectronic interactions significantly improves the performance of molecular machine learning models. Furthermore, stereoelectronics-infused representations can be learned and deployed with a tailored double graph neural network workflow, enabling their application to any downstream molecular machine learning task. Finally, we show that the learned representations allow for facile stereoelectronic evaluation of previously intractable systems, such as entire proteins, opening new avenues of molecular design.
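As a rough illustration of what "infusing" stereoelectronic interactions into a molecular graph can look like, here is a minimal sketch that augments an ordinary bond graph with donor-acceptor edges. The atom indices and interaction energy are made-up placeholders; in the work described above these quantities come from quantum-chemical calculations and are learned by the double graph neural network workflow rather than hard-coded.

```python
from rdkit import Chem

def molecular_graph_with_stereoelectronics(smiles, donor_acceptor_interactions):
    """Build a simple graph (node features + edge list) and augment it with
    stereoelectronic donor-acceptor edges.

    `donor_acceptor_interactions` is a list of (donor_atom_idx, acceptor_atom_idx,
    interaction_energy) tuples - hypothetical values standing in for the
    quantum-chemical interactions described in the abstract.
    """
    mol = Chem.MolFromSmiles(smiles)
    nodes = [{"atomic_num": a.GetAtomicNum(), "degree": a.GetDegree()}
             for a in mol.GetAtoms()]

    # Ordinary covalent-bond edges: information-sparse on their own.
    edges = [
        {"i": b.GetBeginAtomIdx(), "j": b.GetEndAtomIdx(),
         "bond_order": b.GetBondTypeAsDouble(), "interaction_energy": 0.0}
        for b in mol.GetBonds()
    ]

    # Extra edges carrying stereoelectronic (donor -> acceptor) information.
    for donor, acceptor, energy in donor_acceptor_interactions:
        edges.append({"i": donor, "j": acceptor, "bond_order": 0.0,
                      "interaction_energy": energy})
    return nodes, edges

# Example: ethanol with a made-up lone-pair -> antibonding interaction feature.
nodes, edges = molecular_graph_with_stereoelectronics(
    "CCO", donor_acceptor_interactions=[(2, 1, 5.3)]
)
print(len(nodes), "atoms,", len(edges), "edges")
```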




Abstract: Mass spectrometry is a widely used method to study molecules and processes in medicine, the life sciences, chemistry, catalysis, and industrial product quality control, among many other applications. A key feature of many mass spectrometry techniques is the extensive level of characterization they provide (especially when coupled with chromatography and ion mobility methods, or as part of a tandem mass spectrometry experiment) and the large amount of data generated per measurement; terabyte scales are easily reached in mass spectrometry studies. Consequently, the field faces the challenge of widespread data loss: researchers often neglect, and then altogether lose access to, the rich information mass spectrometry experiments could provide. With the development of machine learning methods, the opportunity arises to unlock the potential of these data, enabling previously inaccessible discoveries. This perspective highlights the need to reevaluate mass spectrometry data analysis in light of this new generation of methods and describes significant challenges in the field, particularly those involving electrospray ionization. We argue that further applications of machine learning impose new requirements on instrumentation (higher throughput and information density, lower cost, and more automation-friendly software) and that, once these are met, the field may undergo a significant transformation.




Abstract: Transformer-based large language models are rapidly advancing in the field of machine learning research, with applications spanning natural language, biology, chemistry, and computer programming. Extreme scaling and reinforcement learning from human feedback have significantly improved the quality of generated text, enabling these models to perform various tasks and reason about their choices. In this paper, we present an Intelligent Agent system that combines multiple large language models for autonomous design, planning, and execution of scientific experiments. We showcase the Agent's scientific research capabilities with three distinct examples, with the most complex being the successful performance of catalyzed cross-coupling reactions. Finally, we discuss the safety implications of such systems and propose measures to prevent their misuse.
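To sketch the general shape of such a system, below is a minimal, self-contained plan-execute-analyze loop. The `planner_llm`, `executor`, and `analyzer_llm` functions are hypothetical stubs standing in for separate language-model roles and laboratory hardware; the Agent described above wires equivalent roles to real model and instrument APIs.

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentState:
    goal: str
    plan: list = field(default_factory=list)
    observations: list = field(default_factory=list)
    done: bool = False

# Hypothetical stand-ins for separate LLM roles (planner, analyzer) and for
# execution on lab hardware; a real system replaces these with model/API calls.
def planner_llm(state):
    return [f"step {i + 1} toward: {state.goal}" for i in range(2)]

def executor(step):
    return f"simulated result of '{step}'"

def analyzer_llm(state):
    return len(state.observations) >= len(state.plan)

def run_agent(goal, max_iterations=5):
    state = ExperimentState(goal=goal)
    state.plan = planner_llm(state)            # design
    for _ in range(max_iterations):
        for step in state.plan:                # execute each planned step
            state.observations.append(executor(step))
        state.done = analyzer_llm(state)       # analyze and decide whether to stop
        if state.done:
            break
    return state

if __name__ == "__main__":
    final = run_agent("optimize a catalyzed cross-coupling reaction")
    print(final.observations)
```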