Abstract: We present a comprehensive benchmark dataset for Knowledge Graph Question Answering in Materials Science (KGQA4MAT), with a focus on metal-organic frameworks (MOFs). A knowledge graph for metal-organic frameworks (MOF-KG) has been constructed by integrating structured databases and knowledge extracted from the literature. To make the MOF-KG more accessible to domain experts, we aim to develop a natural language interface for querying the knowledge graph. We have developed a benchmark comprising 161 complex questions involving comparison, aggregation, and complicated graph structures. Each question is rephrased in three additional variations, resulting in 644 questions and 161 KG queries. To evaluate the benchmark, we have developed a systematic approach for utilizing ChatGPT to translate natural language questions into formal KG queries. We also apply the approach to the well-known QALD-9 dataset, demonstrating ChatGPT's potential in addressing KGQA issues for different platforms and query languages. The benchmark and the proposed approach aim to stimulate further research and development of user-friendly and efficient interfaces for querying domain-specific materials science knowledge graphs, thereby accelerating the discovery of novel materials.
Abstract: Convolutional neural networks (CNNs) are among the most popular artificial neural network (ANN) models in computer vision (CV). Researchers have developed a variety of CNN-based structures to solve problems such as image classification, object detection, and image similarity measurement. Although CNNs have shown their value in most cases, they still have a downside: they easily overfit when the dataset does not contain enough samples. Most medical image datasets fall into this category. Additionally, many datasets contain both designed features and images, but CNNs can only deal with images directly. This represents a missed opportunity to leverage additional information. For this reason, we propose a new CNN-based model structure: CompNet, a composite convolutional neural network. This specially designed neural network accepts combinations of images and designed features as input in order to leverage all available information. The novelty of this structure is that it uses features learned from the images to weight the designed features, so that information from both sources is exploited. On classification tasks, the results indicate that our approach can significantly reduce overfitting. Furthermore, we found several similar approaches proposed by other researchers that can combine images and designed features. For comparison, we first applied those approaches to LIDC and compared the results with the CompNet results; we then applied CompNet to the datasets that those approaches originally used and compared our results with the results reported in their papers. All of these comparisons showed that our model outperformed those approaches on classification tasks, both on the LIDC dataset and on their original datasets.
Abstract: Metal-Organic Frameworks (MOFs) are a class of modular, porous crystalline materials that have great potential to revolutionize applications such as gas storage, molecular separations, chemical sensing, catalysis, and drug delivery. The Cambridge Structural Database (CSD) reports 10,636 synthesized MOF crystals and, in addition, contains ca. 114,373 MOF-like structures. The sheer number of synthesized (plus potentially synthesizable) MOF structures requires researchers to pursue computational techniques to screen and isolate MOF candidates. In this demo paper, we describe our effort to leverage knowledge graph methods to facilitate MOF prediction, discovery, and synthesis. We present challenges and case studies about (1) construction of a MOF knowledge graph (MOF-KG) from structured and unstructured sources and (2) leveraging the MOF-KG for discovery of new or missing knowledge.
Abstract: Neural networks are becoming increasingly good at tasks that involve classifying and recognizing images. At the same time, techniques intended to explain the network output have been proposed. One such technique is the Gradient-based Class Activation Map (Grad-CAM), which is able to locate features of an input image at various levels of a convolutional neural network (CNN), but is sensitive to the vanishing gradients problem. Other techniques, such as Integrated Gradients (IG), are not affected by that problem, but their use is limited to the input layer of a network. Here we introduce a new technique to produce visual explanations for the predictions of a CNN. Like Grad-CAM, our method can be applied to any layer of the network, and like Integrated Gradients it is not affected by the problem of vanishing gradients. For efficiency, gradient integration is performed numerically at the layer level using a Riemann-Stieltjes sum approximation. Compared to Grad-CAM, heatmaps produced by our algorithm are better focused on the areas of interest, and their numerical computation is more stable. Our code is available at https://github.com/mlerma54/RSIGradCAM
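To illustrate what a layer-level Riemann-Stieltjes sum looks like, the sketch below integrates the gradient of an output with respect to a hidden layer along a straight path from a baseline to the input. It is a toy example under stated assumptions, not the authors' implementation: the two-unit network, the weights `W` and `V`, and the functions `hidden`, `output`, `output_grad`, and `layer_attributions` are all hypothetical. Each step adds the gradient at the current hidden activation multiplied by the change in that activation since the previous step.

```python
import math

# Hypothetical toy network (an assumption for illustration only):
# hidden layer h = tanh(W x), scalar output y = sum_j V[j] * h[j]**2.
W = [[0.5, -1.0], [1.5, 0.25]]
V = [1.0, -2.0]


def hidden(x):
    """Hidden-layer activations for input x."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def output(h):
    """Scalar output computed from hidden activations h."""
    return sum(v * hj * hj for v, hj in zip(V, h))


def output_grad(h):
    """Analytic gradient of the output with respect to each hidden activation."""
    return [2.0 * v * hj for v, hj in zip(V, h)]


def layer_attributions(x, baseline, steps=1000):
    """Riemann-Stieltjes sum of dy/dh_j against the increments of h_j along
    the straight-line path from baseline to x (right-endpoint evaluation)."""
    attributions = [0.0] * len(W)
    prev = hidden(baseline)
    for k in range(1, steps + 1):
        alpha = k / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        curr = hidden(point)
        grad = output_grad(curr)
        for j in range(len(attributions)):
            # gradient at this step times the change in the activation
            attributions[j] += grad[j] * (curr[j] - prev[j])
        prev = curr
    return attributions


x = [1.0, 2.0]
baseline = [0.0, 0.0]
attr = layer_attributions(x, baseline)
total_change = output(hidden(x)) - output(hidden(baseline))
print(attr, sum(attr), total_change)
```

In this sketch the per-unit attributions add up, to within discretization error, to the change in the output between the baseline and the input. This is the completeness property of Integrated Gradients, here computed at a hidden layer rather than at the input.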