Abstract: Neurosymbolic artificial intelligence is a growing field of research aiming to combine neural network learning capabilities with the reasoning abilities of symbolic systems. Informed multi-label classification is a sub-field of neurosymbolic AI that studies how to leverage prior knowledge to improve neural classification systems. A well-known family of neurosymbolic techniques for informed classification uses probabilistic reasoning to integrate this knowledge during learning, inference, or both. The asymptotic complexity of probabilistic reasoning is therefore of cardinal importance for assessing the scalability of such techniques. However, this topic is rarely addressed in the neurosymbolic literature, which can lead to a poor understanding of the limits of probabilistic neurosymbolic techniques. In this paper, we introduce a formalism for informed supervised classification tasks and techniques. We then build upon this formalism to define three abstract neurosymbolic techniques based on probabilistic reasoning. Finally, we present computational complexity results for several representation languages for prior knowledge commonly found in the neurosymbolic literature.
Abstract: Neurosymbolic AI is a growing field of research aiming to combine neural network learning capabilities with the reasoning abilities of symbolic systems. This hybridization can take many shapes. In this paper, we propose a new formalism for supervised multi-label classification with propositional background knowledge. We introduce a new neurosymbolic technique called semantic conditioning at inference, which constrains the system only at inference time while leaving training unaffected. We discuss its theoretical and practical advantages over two other popular neurosymbolic techniques: semantic conditioning and semantic regularization. We develop a new multi-scale methodology to evaluate how the benefits of a neurosymbolic technique evolve with the scale of the network. We then experimentally evaluate and compare the benefits of all three techniques across model scales on several datasets. Our results demonstrate that semantic conditioning at inference can be used to build more accurate neural-based systems with fewer resources while guaranteeing the semantic consistency of outputs.
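To make the idea of constraining only the inference step concrete, the following is a minimal illustrative sketch, not the paper's implementation: it assumes labels are conditionally independent given the input, uses a hypothetical propositional constraint ("label 0 implies label 1") in a hypothetical `satisfies` helper, and selects by brute-force enumeration the most probable label assignment that is consistent with the background knowledge.

```python
# Illustrative sketch of semantic conditioning at inference for a small
# multi-label problem. Assumption: labels are independent given the input;
# the constraint below is a hypothetical example, not from the paper.
from itertools import product

def satisfies(y):
    # Hypothetical background knowledge: y[0] -> y[1]
    return (not y[0]) or y[1]

def condition_at_inference(probs):
    """Return the most probable label assignment satisfying the constraint.

    probs[i] is the network's predicted probability that label i is true.
    """
    best, best_score = None, -1.0
    for y in product([False, True], repeat=len(probs)):
        if not satisfies(y):
            continue  # discard semantically inconsistent assignments
        score = 1.0
        for p, yi in zip(probs, y):
            score *= p if yi else (1.0 - p)
        if score > best_score:
            best, best_score = y, score
    return best

# Example: the unconstrained per-label argmax (True, False, True) violates
# y0 -> y1, so conditioning returns a consistent assignment instead.
print(condition_at_inference([0.9, 0.4, 0.7]))  # (True, True, True)
```

Because the constraint is applied only when decoding the network's probabilities, training is left untouched, which is the practical distinction this abstract draws between semantic conditioning at inference and techniques that integrate the knowledge into the loss or the training-time distribution.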