Abstract: Detecting and analyzing the local environment is crucial for investigating the dynamical processes of crystal nucleation and the self-assembly of shaped colloidal particles. Recent developments in machine learning provide a promising avenue for constructing better order parameters in complex systems that are challenging to study using traditional approaches. However, the application of machine learning to the self-assembly of systems of shaped particles is still underexplored. To address this gap, we propose a simple, physics-agnostic, yet powerful approach that involves training a multilayer perceptron (MLP) as a local environment classifier for systems of shaped particles, using input features such as particle distances and orientations. Our MLP classifier is trained in a supervised manner with a shape symmetry-encoded data augmentation technique, without the need for any conventional rotationally and translationally invariant symmetry functions. We evaluate the performance of our classifiers in four different scenarios involving the self-assembly of cubic structures, 2-dimensional and 3-dimensional patchy shaped-particle systems, hexagonal bipyramids with varying aspect ratios, and truncated shapes with different degrees of truncation. The proposed training process and data augmentation technique are both straightforward and flexible, enabling easy application of the classifier to other processes involving particle orientations. Our work thus presents a valuable tool for investigating self-assembly processes in systems of shaped particles, with potential applications in structure identification of any particle-based or molecular system where orientations can be defined.
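As a rough illustration of what a shape symmetry-encoded augmentation could look like, the sketch below handles a 2-dimensional particle with n-fold rotational symmetry. The feature layout (one `(distance, bond_angle, orientation)` tuple per neighbor, measured in the reference particle's frame), the function name `symmetry_augment`, and the neighbor shuffling are all assumptions made for this example; the abstract does not specify the actual features or procedure.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def symmetry_augment(neighbors, n_fold, rng):
    """Return a symmetry-equivalent copy of one local environment.

    neighbors: list of (distance, bond_angle, orientation) tuples in the
               reference particle's frame (hypothetical feature layout).
    n_fold:    order of the particle's rotational symmetry (4 for a square).
    rng:       random.Random instance used to pick the symmetry operations.
    """
    step = TWO_PI / n_fold
    # The reference frame is only defined up to a rotation by a multiple of
    # 2*pi/n_fold, so pick one of the equivalent frames at random.
    ref_shift = rng.randrange(n_fold) * step
    augmented = []
    for distance, bond_angle, orientation in neighbors:
        # Each neighbor's own orientation is likewise defined only up to
        # a multiple of 2*pi/n_fold.
        nb_shift = rng.randrange(n_fold) * step
        augmented.append((
            distance,  # distances are unchanged by rotations
            (bond_angle - ref_shift) % TWO_PI,
            (orientation - ref_shift + nb_shift) % TWO_PI,
        ))
    # Neighbor ordering carries no physical meaning, so shuffle it as well.
    rng.shuffle(augmented)
    return augmented
```

Calling `symmetry_augment(env, 4, random.Random(0))` several times on the same labeled environment would yield additional training samples with the same class label, so that a classifier can learn the symmetry from data instead of from hand-built invariant descriptors.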
Abstract: Recurrent neural networks have seen widespread use in modeling dynamical systems in varied domains such as weather prediction and text prediction. Often one wishes to supplement the experimentally observed dynamics with prior knowledge or intuition about the system. While the recurrent nature of these networks allows them to model arbitrarily long memories in the time series used in training, it makes it harder to impose prior knowledge or intuition through generic constraints. In this work, we present a path sampling approach based on the principle of Maximum Caliber that allows us to include generic thermodynamic or kinetic constraints in recurrent neural networks. We demonstrate the method for a widely used type of recurrent neural network, the long short-term memory network, in the context of supplementing time series collected from all-atom molecular dynamics. We demonstrate the power of the formalism for different applications. Our method can be easily generalized to other generative artificial intelligence models and to generic time series in different areas of the physical and social sciences, where one wishes to supplement limited data with intuition- or theory-based corrections.
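The Maximum Caliber principle can be illustrated with a small reweighting example. Given paths sampled from a prior model and a path observable s, the least-biased distribution that enforces a target average of s reweights each path by exp(-λ s), with λ chosen so that the constraint holds. The sketch below solves for λ by bisection on a list of sampled observable values. The function names and the bracketing interval are assumptions for this example, and the coupling to an actual recurrent network (sampling paths and retraining) is not shown.

```python
import math

def reweighted_mean(values, lam):
    """Average of `values` under weights proportional to exp(-lam * value)."""
    exponents = [-lam * v for v in values]
    shift = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

def solve_lambda(values, target, lo=-50.0, hi=50.0, tol=1e-10):
    """Find lam such that the reweighted mean of `values` equals `target`.

    The reweighted mean is non-increasing in lam (its derivative is minus
    the weighted variance), so bisection applies. `target` must lie strictly
    between min(values) and max(values); the bracket [lo, hi] is an assumed
    default and may need widening for other data.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reweighted_mean(values, mid) > target:
            lo = mid  # mean still too large: need a larger lam
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For example, with sampled values `[0, 1, 2, 3, 4]` (unweighted mean 2) and a target of 1, `solve_lambda` returns a positive λ, which down-weights paths with large s. Resampling paths with these weights would give a constraint-satisfying data set on which a model could then be retrained.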
Abstract: We propose a general mechanism for evolution to explain the diversity of genes and languages. To quantify their common features and reveal the hidden structures, several statistical properties and patterns are examined based on a new method called rank-rank analysis. We find that the classical correspondence, "domain plays the role of word in gene language", is not rigorous, and propose replacing domain with protein. In addition, we devise a new evolution unit, the syllgram, to include the characteristics of spoken and written language. Based on the correspondence between (protein, domain) and (word, syllgram), we discover that genes and languages share a common scaling structure and a scale-free network. Like the Rosetta Stone, this work may help decipher the secret behind non-coding DNA and unknown languages.
Abstract: One of the ultimate goals for linguists is to find universal properties in human languages. Although words are generally considered to represent an arbitrary mapping between linguistic forms and meanings, we propose a new universal law that highlights the equally important role of syllables and is complementary to Zipf's law. When the rank-rank frequency distribution of words and syllables is plotted for English and Chinese corpora, visible lines appear that can be fit to a master curve. We find that the multi-layer network of words and syllables based on this analysis exhibits self-organization, which relies heavily on the inclusion of syllables and their connections. An analytic form for the scaling structure is derived and used to quantify how Internet slang becomes fashionable, which demonstrates its usefulness as a new tool for evolutionary linguistics.
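The abstract does not define the rank-rank frequency distribution in detail. One plausible reading, sketched below purely as an assumption, is to rank words by frequency, rank syllables by frequency, and then pair each word's rank with the rank of every syllable it contains. The input format (each word given as a tuple of already-split syllables) and the function names are hypothetical; syllabification itself is not addressed.

```python
from collections import Counter

def frequency_ranks(counter):
    """Map each item to its 1-based frequency rank (1 = most frequent).

    Ties are broken by the item's natural sort order so the result is
    deterministic.
    """
    ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return {item: rank for rank, (item, _) in enumerate(ordered, start=1)}

def rank_rank_pairs(tokens):
    """Return sorted (word_rank, syllable_rank) pairs for a corpus.

    tokens: list of words, each a tuple of syllables (assumed pre-split).
    """
    word_counts = Counter(tokens)
    syllable_counts = Counter(s for word in tokens for s in word)
    word_rank = frequency_ranks(word_counts)
    syllable_rank = frequency_ranks(syllable_counts)
    pairs = set()
    for word in word_counts:
        for syllable in word:
            pairs.add((word_rank[word], syllable_rank[syllable]))
    return sorted(pairs)
```

Plotting the resulting pairs on a scatter plot, word rank against syllable rank, would be the point at which line-like patterns of the kind the abstract describes could be looked for, under the stated assumption about the definition.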