Abstract:Predicting price spikes in critical metals such as cobalt, copper, magnesium, and nickel is crucial for mitigating economic risks associated with global trends like the energy transition and reshoring of manufacturing. While traditional models have focused on regression-based approaches, our work introduces a neurosymbolic ensemble framework that integrates multiple neural models with symbolic error detection and correction rules. This framework is designed to enhance predictive accuracy by correcting individual model errors and offering interpretability through rule-based explanations. We show that our method provides up to a 6.42% improvement in precision, a 29.41% increase in recall, and a 13.24% increase in F1 over the best performing neural models. Further, because our method is based on logical rules, it affords an explanation of which combination of neural models directly contributes to a given prediction.
Abstract:Recent advances in Hierarchical Multi-label Classification (HMC), particularly neurosymbolic-based approaches, have demonstrated improved consistency and accuracy by enforcing constraints on a neural model during training. However, such work assumes that these constraints are known a priori. In this paper, we relax this strong assumption and present an approach based on Error Detection Rules (EDR) that allows for learning explainable rules about the failure modes of machine learning models. We show that these rules are not only effective in detecting when a machine learning classifier has made an error but can also be leveraged as constraints for HMC, thereby allowing the recovery of explainable constraints even if they are not provided. We show that our approach is effective in detecting machine learning errors and recovering constraints, is noise tolerant, and can function as a source of knowledge for neurosymbolic models on multiple datasets, including a newly introduced military vehicle recognition dataset.
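As a rough illustration of the idea of learning rules about a classifier's failure modes, the sketch below keeps any boolean condition that, among samples predicted as a given class, coincides with a misclassification often enough. It is a simplified stand-in and not the algorithm from the paper: the function names, the `min_precision` and `min_support` thresholds, and the `has_wheels` condition are all hypothetical.

```python
def learn_error_rules(conditions, predicted, actual, target_class,
                      min_precision=0.8, min_support=2):
    """Return (condition_name, precision) pairs that flag likely errors.

    conditions: one dict per sample mapping condition name -> bool.
    A condition becomes a rule for target_class if, among samples predicted
    as target_class where it holds, the error rate is >= min_precision.
    (Hypothetical selection criterion, chosen for illustration.)
    """
    idx = [i for i, p in enumerate(predicted) if p == target_class]
    names = sorted({name for c in conditions for name in c})
    rules = []
    for name in names:
        fired = [i for i in idx if conditions[i].get(name, False)]
        if len(fired) < min_support:
            continue  # too few samples to trust this condition
        errors = sum(predicted[i] != actual[i] for i in fired)
        precision = errors / len(fired)
        if precision >= min_precision:
            rules.append((name, precision))
    return rules


def detect_errors(rules, conditions, predicted, target_class):
    """Flag a sample when it is predicted as target_class and any rule fires."""
    return [
        p == target_class and any(c.get(name, False) for name, _ in rules)
        for c, p in zip(conditions, predicted)
    ]


# Toy data (made up): the model tends to call wheeled vehicles "tank".
predicted = ["tank", "tank", "tank", "truck", "tank"]
actual = ["truck", "truck", "tank", "truck", "tank"]
conditions = [
    {"has_wheels": True},
    {"has_wheels": True},
    {"has_wheels": False},
    {"has_wheels": True},
    {"has_wheels": False},
]
rules = learn_error_rules(conditions, predicted, actual, "tank")
flags = detect_errors(rules, conditions, predicted, "tank")
```

A learned rule of this form reads as "if the model predicts `tank` and `has_wheels` holds, the prediction is likely wrong", which is the kind of statement that could in principle be reused as a constraint.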
Abstract:The ability to generate artificial human movement patterns while meeting location and time constraints is an important problem in the security community, particularly as it enables the study of the analog problem of detecting such patterns while maintaining privacy. We frame this problem as an instance of abduction guided by a novel parsimony function represented as an aggregate truth value over an annotated logic program. This approach has the added benefit of affording explainability to an analyst user. By showing that any subset of such a program can provide a lower bound on this parsimony requirement, we are able to abduce movement trajectories efficiently through an informed (i.e., A*) search. We describe how our implementation was enhanced with the application of multiple techniques in order to be scaled and integrated with a cloud-based software stack that included bottom-up rule learning, geolocated knowledge graph retrieval/management, and interfaces with government systems for independently conducted government-run tests for which we provide results. We also report on our own experiments showing that we not only provide exact results but also scale to very large scenarios and provide realistic agent trajectories that can go undetected by machine learning anomaly detectors.
Abstract:Metacognition is the concept of reasoning about an agent's own internal processes and was originally introduced in the field of developmental psychology. In this position paper, we examine the concept of applying metacognition to artificial intelligence. We introduce a framework for understanding metacognitive artificial intelligence (AI) that we call TRAP: transparency, reasoning, adaptation, and perception. We discuss each of these aspects in turn and explore how neurosymbolic AI (NSAI) can be leveraged to address challenges of metacognition.
Abstract:Recent advances in experimental methods have enabled researchers to collect data on thousands of analytes simultaneously. This has led to correlational studies that associated molecular measurements with diseases such as Alzheimer's, Liver, and Gastric Cancer. However, the use of thousands of biomarkers selected from the analytes is not practical for real-world medical diagnosis and is likely undesirable due to potentially formed spurious correlations. In this study, we evaluate 4 different methods for biomarker selection and 4 different machine learning (ML) classifiers for identifying correlations, evaluating 16 approaches in all. We found that contemporary methods outperform previously reported logistic regression in cases where 3 and 10 biomarkers are permitted. When specificity is fixed at 0.9, ML approaches produced a sensitivity of 0.240 (3 biomarkers) and 0.520 (10 biomarkers), while standard logistic regression provided a sensitivity of 0.000 (3 biomarkers) and 0.040 (10 biomarkers). We also noted that causal-based methods for biomarker selection proved to be the most performant when fewer biomarkers were permitted, while univariate feature selection was the most performant when a greater number of biomarkers were permitted.
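The "sensitivity at fixed specificity" figures above can be made concrete with a small sketch. It assumes a classifier that outputs one score per sample and predicts positive when the score exceeds a threshold; the function name and the toy scores are hypothetical and are not taken from the study.

```python
def sensitivity_at_specificity(scores, labels, target_specificity=0.9):
    """Best sensitivity over thresholds whose specificity meets the target.

    labels: 1 for diseased (positive), 0 for healthy (negative).
    A sample is predicted positive when its score is strictly above the threshold.
    """
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not neg or not pos:
        raise ValueError("need at least one positive and one negative sample")
    best = 0.0
    for t in sorted(set(scores)):
        specificity = sum(s <= t for s in neg) / len(neg)  # true negative rate
        if specificity >= target_specificity:
            sensitivity = sum(s > t for s in pos) / len(pos)  # true positive rate
            best = max(best, sensitivity)
    return best


# Toy example (made-up scores): 10 negatives and 4 positives.
neg_scores = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.9]
pos_scores = [0.5, 0.6, 0.3, 0.95]
scores = neg_scores + pos_scores
labels = [0] * len(neg_scores) + [1] * len(pos_scores)
result = sensitivity_at_specificity(scores, labels, 0.9)
```

Fixing specificity first makes classifiers comparable at the same false-positive budget, which is why a sensitivity of 0.000 is possible for a model whose scores do not separate the classes at that operating point.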
Abstract:Recent advances in reinforcement learning (RL) have shown much promise across a variety of applications. However, issues such as scalability, explainability, and Markovian assumptions limit its applicability in certain domains. We observe that many of these shortcomings emanate from the simulator as opposed to the RL training algorithms themselves. As such, we propose a semantic proxy for simulation based on a temporal extension to annotated logic. In comparison with two high-fidelity simulators, we show up to three orders of magnitude speed-up while preserving the quality of the learned policy. In addition, we show the ability to model and leverage non-Markovian dynamics and instantaneous actions while providing an explainable trace describing the outcomes of the agent's actions.
Abstract:Classification of movement trajectories has many applications in transportation. Supervised neural models represent the current state-of-the-art. Recent security applications require such models to be deployed rapidly in environments that may differ from the data used to train them and for which little training data is available. We provide a neuro-symbolic rule-based framework to conduct error correction and detection of these models to support eventual deployment in security applications. We provide a suite of experiments on several recent and state-of-the-art models and show an accuracy improvement of 1.7% over the SOTA model in the case where all classes are present in training. When 40% of classes are omitted from training, we obtain a 5.2% (zero-shot) and 23.9% (few-shot) improvement over the SOTA model without retraining the base model.
Abstract:Error prediction in large language models often relies on domain-specific information. In this paper, we present measures for quantification of error in the response of a large language model based on the diversity of responses to a given prompt - hence independent of the underlying application. We describe how three such measures - based on entropy, Gini impurity, and centroid distance - can be employed. We perform a suite of experiments on multiple datasets and temperature settings to demonstrate that these measures strongly correlate with the probability of failure. Additionally, we present empirical results demonstrating how these measures can be applied to few-shot prompting, chain-of-thought reasoning, and error detection.
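The three diversity measures named above can be sketched with their standard textbook definitions, applied to several answers sampled for the same prompt. This is a minimal illustration, not the exact formulation from the paper: the function names are hypothetical, and the embeddings used for the centroid distance are assumed to come from some external sentence-embedding model.

```python
import math
from collections import Counter


def entropy(responses):
    """Shannon entropy (in bits) of the empirical distribution of answers."""
    n = len(responses)
    return sum((c / n) * math.log2(n / c) for c in Counter(responses).values())


def gini_impurity(responses):
    """Probability that two answers drawn with replacement disagree."""
    n = len(responses)
    return 1.0 - sum((c / n) ** 2 for c in Counter(responses).values())


def mean_centroid_distance(embeddings):
    """Average Euclidean distance of response embeddings from their centroid."""
    n = len(embeddings)
    dim = len(embeddings[0])
    centroid = [sum(v[i] for v in embeddings) / n for i in range(dim)]
    return sum(math.dist(v, centroid) for v in embeddings) / n


# Four sampled answers to the same prompt (made-up values).
agreeing = ["42", "42", "42", "42"]
split = ["42", "42", "17", "17"]
```

Under these definitions, `agreeing` scores zero on both entropy and Gini impurity, while `split` scores 1 bit and 0.5 respectively; higher values indicate more disagreement among the sampled responses.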
Abstract:The growing popularity of neuro-symbolic reasoning has led to the adoption of various forms of differentiable (i.e., fuzzy) first-order logic. We introduce PyReason, a software framework based on generalized annotated logic that captures both the current cohort of differentiable logics and temporal extensions to support inference over finite periods of time with capabilities for open world reasoning. Further, PyReason is implemented to directly support reasoning over graphical structures (e.g., knowledge graphs, social networks, biological networks, etc.), produces fully explainable traces of inference, and includes various practical features such as type checking and a memory-efficient implementation. This paper reviews various extensions of generalized annotated logic integrated into our implementation, our modern, efficient Python-based implementation that conducts exact yet scalable deductive inference, and a suite of experiments. PyReason is available at: github.com/lab-v2/pyreason.
Abstract:We study the performance of a commercially available large language model (LLM) known as ChatGPT on math word problems (MWPs) from the dataset DRAW-1K. To our knowledge, this is the first independent evaluation of ChatGPT. We found that ChatGPT's performance changes dramatically based on the requirement to show its work, failing 20% of the time when it provides work compared with 84% when it does not. Further, several factors about MWPs relating to the number of unknowns and number of operations lead to a higher probability of failure when compared with the prior; in particular (across all experiments), the probability of failure increases linearly with the number of addition and subtraction operations. We have also released the dataset of ChatGPT's responses to the MWPs to support further work on the characterization of LLM performance, and we present baseline machine learning models to predict whether ChatGPT can correctly answer an MWP.