Brandeis University
Abstract:We propose a novel framework, Continuous_Time Attention, which infuses partial differential equations (PDEs) into the Transformer's attention mechanism to address the challenges of extremely long input sequences. Instead of relying solely on a static attention matrix, we allow attention weights to evolve over a pseudo_time dimension via diffusion, wave, or reaction_diffusion dynamics. This mechanism systematically smooths local noise, enhances long_range dependencies, and stabilizes gradient flow. Theoretically, our analysis shows that PDE_based attention leads to better optimization landscapes and polynomial rather than exponential decay of distant interactions. Empirically, we benchmark our method on diverse experiments_demonstrating consistent gains over both standard and specialized long sequence Transformer variants. Our findings highlight the potential of PDE_based formulations to enrich attention mechanisms with continuous_time dynamics and global coherence.
Abstract:We introduce Dynamic Manifold Evolution Theory (DMET),a unified framework that models large language model generation as a controlled dynamical system evolving on a low_dimensional semantic manifold. By casting latent_state updates as discrete time Euler approximations of continuous dynamics, we map intrinsic energy_driven flows and context_dependent forces onto Transformer components (residual connections, attention, feed-forward networks). Leveraging Lyapunov stability theory We define three empirical metrics (state continuity, clustering quality, topological persistence) that quantitatively link latent_trajectory properties to text fluency, grammaticality, and semantic coherence. Extensive experiments across decoding parameters validate DMET's predictions and yield principled guidelines for balancing creativity and consistency in text generation.
Abstract:Recent advances in Large Language Models (LLMs) have achieved strong performance, yet their internal reasoning remains opaque, limiting interpretability and trust in critical applications. We propose a novel Multi_Scale Manifold Alignment framework that decomposes the latent space into global, intermediate, and local semantic manifolds capturing themes, context, and word-level details. Our method introduces cross_scale mapping functions that jointly enforce geometric alignment (e.g., Procrustes analysis) and information preservation (via mutual information constraints like MINE or VIB). We further incorporate curvature regularization and hyperparameter tuning for stable optimization. Theoretical analysis shows that alignment error, measured by KL divergence, can be bounded under mild assumptions. This framework offers a unified explanation of how LLMs structure multi-scale semantics, advancing interpretability and enabling applications such as bias detection and robustness enhancement.
Abstract:With the continuous increase in the number and resolution of video surveillance cameras, the burden of transmitting and storing surveillance video is growing. Traditional communication methods based on Shannon's theory are facing optimization bottlenecks. Semantic communication, as an emerging communication method, is expected to break through this bottleneck and reduce the storage and transmission consumption of video. Existing semantic decoding methods often require many samples to train the neural network for each scene, which is time-consuming and labor-intensive. In this study, a semantic encoding and decoding method for surveillance video is proposed. First, the sketch was extracted as semantic information, and a sketch compression method was proposed to reduce the bit rate of semantic information. Then, an image translation network was proposed to translate the sketch into a video frame with a reference frame. Finally, a few-shot sketch decoding network was proposed to reconstruct video from sketch. Experimental results showed that the proposed method achieved significantly better video reconstruction performance than baseline methods. The sketch compression method could effectively reduce the storage and transmission consumption of semantic information with little compromise on video quality. The proposed method provides a novel semantic encoding and decoding method that only needs a few training samples for each surveillance scene, thus improving the practicality of the semantic communication system.
Abstract:This paper presents an overview of NTIRE 2025 the First Challenge on Event-Based Image Deblurring, detailing the proposed methodologies and corresponding results. The primary goal of the challenge is to design an event-based method that achieves high-quality image deblurring, with performance quantitatively assessed using Peak Signal-to-Noise Ratio (PSNR). Notably, there are no restrictions on computational complexity or model size. The task focuses on leveraging both events and images as inputs for single-image deblurring. A total of 199 participants registered, among whom 15 teams successfully submitted valid results, offering valuable insights into the current state of event-based image deblurring. We anticipate that this challenge will drive further advancements in event-based vision research.
Abstract:LLM-based tool agents offer natural language interfaces, enabling users to seamlessly interact with computing services. While REST APIs are valuable resources for building such agents, they must first be transformed into AI-compatible tools. Automatically generating AI-compatible tools from REST API documents can greatly streamline tool agent development and minimize user learning curves. However, API documentation often suffers from a lack of standardization, inconsistent schemas, and incomplete information. To address these issues, we developed \textbf{ToolFactory}, an open-source pipeline for automating tool generation from unstructured API documents. To enhance the reliability of the developed tools, we implemented an evaluation method to diagnose errors. Furthermore, we built a knowledge base of verified tools, which we leveraged to infer missing information from poorly documented APIs. We developed the API Extraction Benchmark, comprising 167 API documents and 744 endpoints in various formats, and designed a JSON schema to annotate them. This annotated dataset was utilized to train and validate ToolFactory. The experimental results highlight the effectiveness of ToolFactory. We also demonstrated ToolFactory by creating a domain-specific AI agent for glycomaterials research. ToolFactory exhibits significant potential for facilitating the seamless integration of scientific REST APIs into AI workflows.
Abstract:This paper presents a novel unified theoretical framework for understanding Transformer architectures by integrating Partial Differential Equations (PDEs), Neural Information Flow Theory, and Information Bottleneck Theory. We model Transformer information dynamics as a continuous PDE process, encompassing diffusion, self-attention, and nonlinear residual components. Our comprehensive experiments across image and text modalities demonstrate that the PDE model effectively captures key aspects of Transformer behavior, achieving high similarity (cosine similarity > 0.98) with Transformer attention distributions across all layers. While the model excels in replicating general information flow patterns, it shows limitations in fully capturing complex, non-linear transformations. This work provides crucial theoretical insights into Transformer mechanisms, offering a foundation for future optimizations in deep learning architectural design. We discuss the implications of our findings, potential applications in model interpretability and efficiency, and outline directions for enhancing PDE models to better mimic the intricate behaviors observed in Transformers, paving the way for more transparent and optimized AI systems.
Abstract:This paper explores the application of Stochastic Differential Equations (SDE) to interpret the text generation process of Large Language Models (LLMs) such as GPT-4. Text generation in LLMs is modeled as a stochastic process where each step depends on previously generated content and model parameters, sampling the next word from a vocabulary distribution. We represent this generation process using SDE to capture both deterministic trends and stochastic perturbations. The drift term describes the deterministic trends in the generation process, while the diffusion term captures the stochastic variations. We fit these functions using neural networks and validate the model on real-world text corpora. Through numerical simulations and comprehensive analyses, including drift and diffusion analysis, stochastic process property evaluation, and phase space exploration, we provide deep insights into the dynamics of text generation. This approach not only enhances the understanding of the inner workings of LLMs but also offers a novel mathematical perspective on language generation, which is crucial for diagnosing, optimizing, and controlling the quality of generated text.
Abstract:This study presents a novel approach that leverages Neural Ordinary Differential Equations (Neural ODEs) to unravel the intricate relationships between inputs and outputs in Large Language Models (LLMs), and employs robust control to fine-tune outputs to meet predefined standards. Central to our methodology is the transformation of LLM inputs and outputs into a lower-dimensional latent space, facilitating a detailed examination of the information processing pathways within LLMs. Neural ODEs play a pivotal role in this investigation by providing a dynamic model that captures the continuous evolution of data within the LLMs. Additionally, robust control mechanisms are applied to strategically adjust the model's outputs, ensuring they not only maintain high quality and reliability but also adhere to specific performance criteria. This fusion of Neural ODEs and robust control represents a significant advancement in LLM interpretability, offering a comprehensive framework that elucidates the previously opaque mechanisms of these complex models. Our empirical results validate the effectiveness of this integrated approach, making a substantial contribution to the field of explainable AI by merging advanced machine learning techniques with the critical need for transparency and control in AI outputs.
Abstract:The language evaluation information of the interactive group decision method at present is based on the one-dimension language variable. At the same time, multi-attribute group decision making method based on two-dimension linguistic information only use single-stage and static evaluation method. In this paper, we propose a dynamic group decision making method based on two-dimension linguistic information, combining dynamic interactive group decision making methods with two-dimensional language evaluation information The method first use Two-Dimensional Uncertain Linguistic Generalized Weighted Aggregation (DULGWA) Operators to aggregate the preference information of each decision maker, then adopting dynamic information entropy method to obtain weights of attributes at each stage. Finally we propose the group consistency index to quantify the termination conditions of group interaction. One example is given to verify the developed approach and to demonstrate its effectiveness