Abstract:Predicting the yield percentage of a chemical reaction is useful in many aspects such as reducing wet-lab experimentation by giving the priority to the reactions with a high predicted yield. In this work we investigated the use of multiple type inputs to predict chemical reaction yield. We used simplified molecular-input line-entry system (SMILES) as well as calculated chemical descriptors as model inputs. The model consists of a pre-trained bidirectional transformer-based encoder (BERT) and a multi-layer perceptron (MLP) with a regression head to predict the yield. We experimented on two high throughput experimentation (HTE) datasets for Buchwald-Hartwig and Suzuki-Miyaura reactions. The experiments show improvements in the prediction on both datasets compared to systems using only SMILES or chemical descriptors as input. We also tested the model's performance on out-of-sample dataset splits of Buchwald-Hartwig and achieved comparable results with the state-of-the-art. In addition to predicting the yield, we demonstrated the model's ability to suggest the optimum (highest yield) reaction conditions. The model was able to suggest conditions that achieves 94% of the optimum reported yields. This proves the model to be useful in achieving the best results in the wet lab without expensive experimentation.