Abstract:We propose a novel topological perspective on data languages recognizable by orbit-finite nominal monoids. For this purpose, we introduce pro-orbit-finite nominal topological spaces. Assuming globally bounded support sizes, they coincide with nominal Stone spaces and are shown to be dually equivalent to a subcategory of nominal boolean algebras. Recognizable data languages are characterized as topologically clopen sets of pro-orbit-finite words. In addition, we explore the expressive power of pro-orbit-finite equations by establishing a nominal version of Reiterman's pseudovariety theorem.
Abstract:We propose a generic categorical framework for learning unknown formal languages of various types (e.g. finite or infinite words, trees, weighted and nominal languages). Our approach is parametric in a monad T that represents the given type of languages and their recognizing algebraic structures. Using the concept of an automata presentation of T-algebras, we demonstrate that the task of learning a T-recognizable language can be reduced to learning an abstract form of automaton, which is achieved via a generalized version of Angluin's L* algorithm. The algorithm is phrased in terms of categorically described extension steps; we provide for a generic termination and complexity analysis based on a dedicated notion of finiteness. Our framework applies to structures like tree languages or omega-regular languages that were not within the scope of existing categorical accounts of automata learning. In addition, it yields new generic learning algorithms for several types of languages for which no such algorithms were previously known at all, including sorted languages, nominal languages with name binding, and cost functions.