Abstract:Machine learning techniques are utilized to estimate the electronic band gap energy and forecast the band gap category of materials based on experimentally quantifiable properties. The determination of band gap energy is critical for discerning various material properties, such as its metallic nature, and potential applications in electronic and optoelectronic devices. While numerical methods exist for computing band gap energy, they often entail high computational costs and have limitations in accuracy and scalability. A machine learning-driven model capable of swiftly predicting material band gap energy using easily obtainable experimental properties would offer a superior alternative to conventional density functional theory (DFT) methods. Our model does not require any preliminary DFT-based calculation or knowledge of the structure of the material. We present a scheme for improving the performance of simple regression and classification models by partitioning the dataset into multiple clusters. A new evaluation scheme for comparing the performance of ML-based models in material sciences involving both regression and classification tasks is introduced based on traditional evaluation metrics. It is shown that on this new evaluation metric, our method of clustering the dataset results in better performance.