Vwani Roychowdhury

First numerical observation of the Berezinskii-Kosterlitz-Thouless transition in language models

Dec 02, 2024

Yuma Toji, Jun Takahashi, Vwani Roychowdhury, Hideyuki Miyahara

Abstract: Several power-law critical properties involving different statistics in natural languages -- reminiscent of scaling properties of physical systems at or near phase transitions -- have been documented for decades. The recent rise of large language models (LLMs) has added further evidence and excitement by providing intriguing similarities with notions in physics such as scaling laws and emergent abilities. However, specific instances of classes of generative language models that exhibit phase transitions, as understood by the statistical physics community, are lacking. In this work, inspired by the one-dimensional Potts model in statistical physics, we construct a simple probabilistic language model that falls under the class of context-sensitive grammars (CSG), and numerically demonstrate an unambiguous phase transition in the framework of a natural language model. We explicitly show that a precisely defined order parameter -- one that captures symbol frequency biases in the sentences generated by the language model -- changes from strictly 0 to a strictly nonzero value (in the infinite-length limit of sentences), implying a mathematical singularity arising when tuning the parameter of the stochastic language model we consider. Furthermore, we identify the phase transition as a variant of the Berezinskii-Kosterlitz-Thouless (BKT) transition, which is known to exhibit critical properties not only at the transition point but also in the entire phase. This finding leads to the possibility that critical properties in natural languages may not require careful fine-tuning or self-organized criticality, but are generically explained by the underlying connection between language structures and the BKT phases.
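To make the idea of an order parameter that "captures symbol frequency biases" concrete, here is a minimal sketch. It uses the standard Potts-style magnetization, which is an assumption on our part: the abstract does not give the paper's exact definition, and the function name `frequency_bias` is hypothetical.

```python
from collections import Counter


def frequency_bias(sentence: str, alphabet: str) -> float:
    """Potts-style order parameter (assumed form, not taken from the paper).

    Returns 0.0 when all symbols of the alphabet are equally frequent
    and 1.0 when the sentence consists of a single repeated symbol.
    """
    q = len(alphabet)
    if q < 2 or not sentence:
        raise ValueError("need at least two symbols and a non-empty sentence")
    if any(symbol not in alphabet for symbol in sentence):
        raise ValueError("sentence contains symbols outside the alphabet")
    counts = Counter(sentence)
    # Fraction of positions occupied by the most frequent symbol.
    max_fraction = max(counts[s] for s in alphabet) / len(sentence)
    return (q * max_fraction - 1) / (q - 1)


print(frequency_bias("abab", "ab"))  # balanced sentence
print(frequency_bias("aaab", "ab"))  # sentence biased towards "a"
```

In this toy form, a phase transition would show up as the value staying at zero on one side of a critical model parameter and becoming nonzero on the other as sentence length grows.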

Creating an AI Observer: Generative Semantic Workspaces

Jun 07, 2024

Pavan Holur, Shreyas Rajesh, David Chong, Vwani Roychowdhury

Abstract: An experienced human Observer reading a document -- such as a crime report -- creates a succinct plot-like $\textit{``Working Memory''}$ comprising different actors, their prototypical roles and states at any point, their evolution over time based on their interactions, and even a map of missing Semantic parts anticipating them in the future. $\textit{An equivalent AI Observer currently does not exist}$. We introduce the $\textbf{[G]}$enerative $\textbf{[S]}$emantic $\textbf{[W]}$orkspace (GSW) -- comprising an $\textit{``Operator''}$ and a $\textit{``Reconciler''}$ -- that leverages advancements in LLMs to create a generative-style Semantic framework, as opposed to a traditionally predefined set of lexicon labels. Given a text segment $C_n$ that describes an ongoing situation, the $\textit{Operator}$ instantiates actor-centric Semantic maps (termed ``Workspace instance'' $\mathcal{W}_n$). The $\textit{Reconciler}$ resolves differences between $\mathcal{W}_n$ and a ``Working memory'' $\mathcal{M}_n^*$ to generate the updated $\mathcal{M}_{n+1}^*$. GSW outperforms well-known baselines on several tasks ($\sim 94\%$ vs. FST, GLEN, BertSRL - multi-sentence Semantics extraction, $\sim 15\%$ vs. NLI-BERT, $\sim 35\%$ vs. QA). By mirroring the real Observer, GSW provides the first step towards Spatial Computing assistants capable of understanding individual intentions and predicting future behavior.

* 37 pages with appendix, 28 figures

Embed-Search-Align: DNA Sequence Alignment using Transformer Models

Sep 20, 2023

Pavan Holur, K. C. Enevoldsen, Lajoyce Mboning, Thalia Georgiou, Louis-S. Bouchard, Matteo Pellegrini, Vwani Roychowdhury

Abstract: DNA sequence alignment involves assigning short DNA reads to the most probable locations on an extensive reference genome. This process is crucial for various genomic analyses, including variant calling, transcriptomics, and epigenomics. Conventional methods, refined over decades, tackle this challenge in two steps: genome indexing followed by efficient search to locate likely positions for given reads. Building on the success of Large Language Models (LLMs) in encoding text into embeddings, where the distance metric captures semantic similarity, recent efforts have explored whether the same Transformer architecture can produce numerical representations for DNA sequences. Such models have shown early promise in tasks involving classification of short DNA sequences, such as the detection of coding vs non-coding regions, as well as the identification of enhancer and promoter sequences. Performance at sequence classification tasks does not, however, translate to sequence alignment, where it is necessary to conduct a genome-wide search to successfully align every read. We address this open problem by framing it as an Embed-Search-Align task. In this framework, a novel encoder model DNA-ESA generates representations of reads and fragments of the reference, which are projected into a shared vector space where the read-fragment distance is used as a surrogate for alignment. In particular, DNA-ESA introduces: (1) a contrastive loss for self-supervised training of DNA sequence representations, facilitating rich sequence-level embeddings, and (2) a DNA vector store to enable search across fragments on a global scale. DNA-ESA is >97% accurate when aligning 250-length reads onto a human reference genome of 3 gigabases (single-haploid), far exceeds the performance of 6 recent DNA-Transformer model baselines, and shows task transfer across chromosomes and species.

* 17 pages, 5 tables, 5 figures; under review at ICLR
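The embed-then-search pattern described in the abstract above can be illustrated with a toy sketch. Everything here is an assumption for illustration: a normalised k-mer count vector stands in for the learned DNA-ESA encoder, a NumPy matrix stands in for the DNA vector store, and the names `embed`, `build_store`, and `align` are hypothetical.

```python
import itertools

import numpy as np

K = 3  # k-mer length used by the toy embedding
KMERS = {"".join(p): i for i, p in enumerate(itertools.product("ACGT", repeat=K))}


def embed(seq: str) -> np.ndarray:
    """Toy embedding: L2-normalised k-mer counts (NOT the learned encoder)."""
    vec = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        vec[KMERS[seq[i:i + K]]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def build_store(reference: str, frag_len: int, stride: int = 1):
    """Embed every reference fragment once; this plays the role of the vector store."""
    starts = list(range(0, len(reference) - frag_len + 1, stride))
    vectors = np.stack([embed(reference[s:s + frag_len]) for s in starts])
    return starts, vectors


def align(read: str, starts, vectors):
    """Return the fragment start with the highest cosine similarity to the read."""
    scores = vectors @ embed(read)
    best = int(np.argmax(scores))
    return starts[best], float(scores[best])


rng = np.random.default_rng(0)
reference = "".join(rng.choice(list("ACGT"), size=500))
read = reference[123:163]  # an error-free read taken from the reference
starts, vectors = build_store(reference, frag_len=len(read))
position, score = align(read, starts, vectors)
print(position, round(score, 3))
```

A brute-force matrix product is used for the search step only to keep the sketch short; at genome scale an approximate nearest-neighbour index would be the more plausible choice.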

Rapid design of fully soft deployable structures via kirigami cuts and active learning

Mar 04, 2023

Leixin Ma, Mrunmayi Mungekar, Vwani Roychowdhury, M. Khalid Jawed

Abstract: Soft deployable structures - unlike conventional piecewise rigid deployables based on hinges and springs - can assume intricate 3D shapes, thereby enabling transformative technologies in soft robotics, shape-morphing architecture, and pop-up manufacturing. Their virtually infinite degrees of freedom allow precise control over the final shape. The same enabling high dimensionality, however, poses a challenge for solving the inverse design problem involving this class of structures: achieving desired 3D structures typically requires manufacturing technologies with extensive local actuation and control during fabrication, and a trial-and-error search over a large design space. We address both of these shortcomings by first developing a simplified planar fabrication approach that combines two ingredients: strain mismatch between two layers of a composite shell and kirigami cuts that relieve localized stress. In principle, it is possible to generate targeted 3D shapes by designing the appropriate kirigami cuts and selecting the right amount of prestretch, thereby eliminating the need for local control. Second, we formulate a data-driven physics-guided framework that reduces the dimensionality of the inverse design problem using autoencoders and efficiently searches through the "latent" parameter space in an active learning approach. We demonstrate the effectiveness of the rapid design procedure via a range of target shapes, such as peanuts, pringles, flowers, and pyramids. Tabletop experiments are conducted to fabricate the target shapes. Experimental results and numerical predictions from our framework are found to be in good agreement.

Meta-learning generalizable dynamics from trajectories

Jan 03, 2023

Qiaofeng Li, Tianyi Wang, Vwani Roychowdhury, M. Khalid Jawed

Abstract: We present the interpretable meta neural ordinary differential equation (iMODE) method to rapidly learn generalizable (i.e., not parameter-specific) dynamics from trajectories of multiple dynamical systems that vary in their physical parameters. The iMODE method learns meta-knowledge, the functional variations of the force field of dynamical system instances without knowing the physical parameters, by adopting a bi-level optimization framework: an outer level capturing the common force field form among studied dynamical system instances and an inner level adapting to individual system instances. A priori physical knowledge can be conveniently embedded in the neural network architecture as inductive bias, such as a conservative force field or Euclidean symmetry. With the learned meta-knowledge, iMODE can model an unseen system within seconds, and inversely reveal knowledge about the physical parameters of a system, or serve as a Neural Gauge to "measure" the physical parameters of an unseen system with observed trajectories. We test the validity of the iMODE method on bistable, double pendulum, Van der Pol, Slinky, and reaction-diffusion systems.

Action-conditioned On-demand Motion Generation

Jul 17, 2022

Qiujing Lu, Yipeng Zhang, Mingjian Lu, Vwani Roychowdhury

Abstract: We propose a novel framework, On-Demand MOtion Generation (ODMO), for generating realistic and diverse long-term 3D human motion sequences conditioned only on action types with an additional capability of customization. ODMO shows improvements over SOTA approaches on all traditional motion evaluation metrics when evaluated on three public datasets (HumanAct12, UESTC, and MoCap). Furthermore, we provide both qualitative evaluations and quantitative metrics demonstrating several first-known customization capabilities afforded by our framework, including mode discovery, interpolation, and trajectory customization. These capabilities significantly widen the spectrum of potential applications of such motion generation models. The novel on-demand generative capabilities are enabled by innovations in both the encoder and decoder architectures: (i) Encoder: Utilizing contrastive learning in low-dimensional latent space to create a hierarchical embedding of motion sequences, where not only do the codes of different action types form different groups, but within an action type, codes of similar inherent patterns (motion styles) also cluster together, making them readily discoverable; (ii) Decoder: Using a hierarchical decoding strategy where the motion trajectory is reconstructed first and then used to reconstruct the whole motion sequence. Such an architecture enables effective trajectory control. Our code is released on the GitHub page: https://github.com/roychowdhuryresearch/ODMO

* Accepted by ACMMM 2022, 13 pages, 5 figures

Quantum Advantage in Variational Bayes Inference

Jul 07, 2022

Hideyuki Miyahara, Vwani Roychowdhury

Abstract: The variational Bayes (VB) inference algorithm is widely used to estimate both the parameters and the unobserved hidden variables in generative statistical models. The algorithm -- inspired by variational methods used in computational physics -- is iterative and can easily get stuck in local minima, even when classical techniques, such as deterministic annealing (DA), are used. We study a VB inference algorithm based on a non-traditional quantum annealing approach -- referred to as quantum annealing variational Bayes (QAVB) inference -- and show that there is indeed a quantum advantage to QAVB over its classical counterparts. In particular, we show that such better performance is rooted in key concepts from quantum mechanics: (i) the ground state of the Hamiltonian of a quantum system -- defined from the given VB problem -- corresponds to an optimal solution for the minimization problem of the variational free energy at very low temperatures; (ii) such a ground state can be achieved by a technique paralleling the quantum annealing process; and (iii) starting from this ground state, the optimal solution to the VB problem can be achieved by increasing the heat bath temperature to unity, thereby avoiding local minima introduced by spontaneous symmetry breaking observed in classical-physics-based VB algorithms. We also show that the update equations of QAVB can be potentially implemented using $\lceil \log K \rceil$ qubits and $\mathcal{O} (K)$ operations per step. Thus, QAVB can match the time complexity of existing VB algorithms, while delivering higher performance.

Quantum Approximation of Normalized Schatten Norms and Applications to Learning

Jun 23, 2022

Yiyou Chen, Hideyuki Miyahara, Louis-S. Bouchard, Vwani Roychowdhury

Abstract: Efficient measures to determine similarity of quantum states, such as the fidelity metric, have been widely studied. In this paper, we address the problem of defining a similarity measure for quantum operations that can be \textit{efficiently estimated}. Given two quantum operations, $U_1$ and $U_2$, represented in their circuit forms, we first develop a quantum sampling circuit to estimate the normalized Schatten 2-norm of their difference ($\| U_1-U_2 \|_{S_2}$) with precision $\epsilon$, using only one clean qubit and one classical random variable. We prove a Poly$(\frac{1}{\epsilon})$ upper bound on the sample complexity, which is independent of the size of the quantum system. We then show that such a similarity metric is directly related to a functional definition of similarity of unitary operations using the conventional fidelity metric of quantum states ($F$): If $\| U_1-U_2 \|_{S_2}$ is sufficiently small (e.g., $ \leq \frac{\epsilon}{1+\sqrt{2(1/\delta - 1)}}$) then the fidelity of states obtained by processing the same randomly and uniformly picked pure state, $|\psi \rangle$, is as high as needed ($F({U}_1 |\psi \rangle, {U}_2 |\psi \rangle)\geq 1-\epsilon$) with probability exceeding $1-\delta$. We provide example applications of this efficient similarity metric estimation framework to quantum circuit learning tasks, such as finding the square root of a given unitary operation.

* 25 pages, 4 figures, 6 tables, 1 algorithm
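For small systems, the quantity in the abstract above can be checked classically. The sketch below computes a normalized Schatten 2-norm of $U_1 - U_2$ directly with NumPy. It is not the paper's quantum sampling circuit, and the normalization $\sqrt{\mathrm{Tr}(A^\dagger A)/d}$ is an assumed convention, since the abstract does not spell it out; the function name `normalized_schatten2` is hypothetical.

```python
import numpy as np


def normalized_schatten2(u1, u2) -> float:
    """Classical reference value: sqrt(Tr(A^dagger A) / d) with A = u1 - u2.

    The division by the dimension d is an assumed normalization convention.
    """
    diff = np.asarray(u1, dtype=complex) - np.asarray(u2, dtype=complex)
    d = diff.shape[0]
    return float(np.sqrt(np.real(np.trace(diff.conj().T @ diff)) / d))


identity = np.eye(2)
pauli_z = np.diag([1.0, -1.0])
print(normalized_schatten2(identity, pauli_z))
```

The direct computation costs time polynomial in the matrix dimension, which grows exponentially with the number of qubits; that scaling is what a sampling-based estimator with system-size-independent sample complexity would avoid.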

Diverse Imitation Learning via Self-Organizing Generative Models

May 06, 2022

Arash Vahabpour, Tianyi Wang, Qiujing Lu, Omead Pooladzandi, Vwani Roychowdhury

Abstract: Imitation learning is the task of replicating an expert policy from demonstrations, without access to a reward function. This task becomes particularly challenging when the expert exhibits a mixture of behaviors. Prior work has introduced latent variables to model variations of the expert policy. However, our experiments show that existing methods do not imitate individual modes appropriately. To tackle this problem, we adopt an encoder-free generative model for behavior cloning (BC) to accurately distinguish and imitate different modes. Then, we integrate it with GAIL to make the learning robust towards compounding errors at unseen states. We show that our method significantly outperforms the state of the art across multiple experiments.

Which side are you on? Insider-Outsider classification in conspiracy-theoretic social media

Mar 30, 2022

Pavan Holur, Tianyi Wang, Shadi Shahsavari, Timothy Tangherlini, Vwani Roychowdhury

Abstract: Social media is a breeding ground for threat narratives and related conspiracy theories. In these, an outside group threatens the integrity of an inside group, leading to the emergence of sharply defined group identities: Insiders -- agents with whom the authors identify -- and Outsiders -- agents who threaten the insiders. Inferring the members of these groups constitutes a challenging new NLP task: (i) Information is distributed over many poorly-constructed posts; (ii) Threats and threat agents are highly contextual, with the same post potentially having multiple agents assigned to membership in either group; (iii) An agent's identity is often implicit and transitive; and (iv) Phrases used to imply Outsider status often do not follow common negative sentiment patterns. To address these challenges, we define a novel Insider-Outsider classification task. Because we are not aware of any appropriate existing datasets or attendant models, we introduce a labeled dataset (CT5K) and design a model (NP2IO) to address this task. NP2IO leverages pretrained language modeling to classify Insiders and Outsiders. NP2IO is shown to be robust, generalizing to noun phrases not seen during training, and exceeding the performance of non-trivial baseline models by $20\%$.

* ACL 2022: 60th Annual Meeting of the Association for Computational Linguistics; 8+4 pages, 6 figures
