Abstract:A construction that assigns a Boolean 1D TQFT with defects to a finite state automaton was recently developed by Gustafson, Im, Kaldawy, Khovanov, and Lihn. We show that the construction is functorial with respect to the category of finite state automata with transducers as morphisms. Certain classes of subregular languages correspond to additional cohomological structures on the associated TQFTs. We also show that the construction generalizes to context-free grammars through a categorical version of the Chomsky-Sch\"utzenberger representation theorem, due to Melli\`es and Zeilberger. The corresponding TQFTs are then described as morphisms of colored operads on an operad of cobordisms with defects.
Abstract:We extend our formulation of Merge and Minimalism in terms of Hopf algebras to an algebraic model of a syntactic-semantic interface. We show that methods adopted in the formulation of renormalization (extraction of meaningful physical values) in theoretical physics are relevant to describe the extraction of meaning from syntactic expressions. We show how this formulation relates to computational models of semantics and we answer some recent controversies about implications for generative linguistics of the current functioning of large language models.
Abstract:We provide a construction of Gabor frames that encode local linearizations of a signal detected on a curved smooth manifold of arbitrary dimension, with Gabor filters that can detect the presence of higher-dimensional boundaries in the manifold signal. We describe an application in configuration spaces in robotics with sharp constrains. The construction is a higher-dimensional generalization of the geometric setting developed for the study of signal analysis in the visual cortex.
Abstract:In this paper we compare some old formulations of Minimalism, in particular Stabler's computational minimalism, and Chomsky's new formulation of Merge and Minimalism, from the point of view of their mathematical description in terms of Hopf algebras. We show that the newer formulation has a clear advantage purely in terms of the underlying mathematical structure. More precisely, in the case of Stabler's computational minimalism, External Merge can be described in terms of a partially defined operated algebra with binary operation, while Internal Merge determines a system of right-ideal coideals of the Loday-Ronco Hopf algebra and corresponding right-module coalgebra quotients. This mathematical structure shows that Internal and External Merge have significantly different roles in the old formulations of Minimalism, and they are more difficult to reconcile as facets of a single algebraic operation, as desirable linguistically. On the other hand, we show that the newer formulation of Minimalism naturally carries a Hopf algebra structure where Internal and External Merge directly arise from the same operation. We also compare, at the level of algebraic properties, the externalization model of the new Minimalism with proposals for assignments of planar embeddings based on heads of trees.
Abstract:The syntactic Merge operation of the Minimalist Program in linguistics can be described mathematically in terms of Hopf algebras, with a formalism similar to the one arising in the physics of renormalization. This mathematical formulation of Merge has good descriptive power, as phenomena empirically observed in linguistics can be justified from simple mathematical arguments. It also provides a possible mathematical model for externalization and for the role of syntactic parameters.
Abstract:We further the theme of studying syntactic structures data from Longobardi (2017b), Collins (2010), Ceolin et al. (2020) and Koopman (2011) using general Markov models initiated in Shu et al. (2017), exploring the question of how consistent the data is with the idea that general Markov models. The ideas explored in the present paper are more generally applicable than to the setting of syntactic structures, and can be used when analyzing consistency of data with general Markov models. Additionally, we give an interpretation of the methods of Ceolin et al. (2020) as an infinite sites evolutionary model and compare it to the Markov model and explore each in the context of evolutionary processes acting on human language syntax.
Abstract:We use the persistent homology method of topological data analysis and dimensional analysis techniques to study data of syntactic structures of world languages. We analyze relations between syntactic parameters in terms of dimensionality, of hierarchical clustering structures, and of non-trivial loops. We show there are relations that hold across language families and additional relations that are family-specific. We then analyze the trees describing the merging structure of persistent connected components for languages in different language families and we show that they partly correlate to historical phylogenetic trees but with significant differences. We also show the existence of interesting non-trivial persistent first homology groups in various language families. We give examples where explicit generators for the persistent first homology can be identified, some of which appear to correspond to homoplasy phenomena, while others may have an explanation in terms of historical linguistics, corresponding to known cases of syntactic borrowing across different language subfamilies.
Abstract:We consider two different data sets of syntactic parameters and we discuss how to detect relations between parameters through a heat kernel method developed by Belkin-Niyogi, which produces low dimensional representations of the data, based on Laplace eigenfunctions, that preserve neighborhood information. We analyze the different connectivity and clustering structures that arise in the two datasets, and the regions of maximal variance in the two-parameter space of the Belkin-Niyogi construction, which identify preferable choices of independent variables. We compute clustering coefficients and their variance.
Abstract:Using Phylogenetic Algebraic Geometry, we analyze computationally the phylogenetic tree of subfamilies of the Indo-European language family, using data of syntactic structures. The two main sources of syntactic data are the SSWL database and Longobardi's recent data of syntactic parameters. We compute phylogenetic invariants and likelihood functions for two sets of Germanic languages, a set of Romance languages, a set of Slavic languages and a set of early Indo-European languages, and we compare the results with what is known through historical linguistics.
Abstract:We assign binary and ternary error-correcting codes to the data of syntactic structures of world languages and we study the distribution of code points in the space of code parameters. We show that, while most codes populate the lower region approximating a superposition of Thomae functions, there is a substantial presence of codes above the Gilbert-Varshamov bound and even above the asymptotic bound and the Plotkin bound. We investigate the dynamics induced on the space of code parameters by spin glass models of language change, and show that, in the presence of entailment relations between syntactic parameters the dynamics can sometimes improve the code. For large sets of languages and syntactic data, one can gain information on the spin glass dynamics from the induced dynamics in the space of code parameters.