Abstract:Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs). Although promising, existing heuristic mRAGs typically predefined fixed retrieval processes, which causes two issues: (1) Non-adaptive Retrieval Queries. (2) Overloaded Retrieval Queries. However, these flaws cannot be adequately reflected by current knowledge-seeking visual question answering (VQA) datasets, since the most required knowledge can be readily obtained with a standard two-step retrieval. To bridge the dataset gap, we first construct Dyn-VQA dataset, consisting of three types of "dynamic" questions, which require complex knowledge retrieval strategies variable in query, tool, and time: (1) Questions with rapidly changing answers. (2) Questions requiring multi-modal knowledge. (3) Multi-hop questions. Experiments on Dyn-VQA reveal that existing heuristic mRAGs struggle to provide sufficient and precisely relevant knowledge for dynamic questions due to their rigid retrieval processes. Hence, we further propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch. The underlying idea is to emulate the human behavior in question solution which dynamically decomposes complex multimodal questions into sub-question chains with retrieval action. Extensive experiments prove the effectiveness of our OmniSearch, also provide direction for advancing mRAG. The code and dataset will be open-sourced at https://github.com/Alibaba-NLP/OmniSearch.
Abstract:This work proposes DOFS, a pilot dataset of 3D deformable objects (DOs) (e.g., elasto-plastic objects) with full spatial information (i.e., top, side, and bottom information) using a novel and low-cost data collection platform with a transparent operating plane. The dataset consists of active manipulation action, multi-view RGB-D images, well-registered point clouds, 3D deformed mesh, and 3D occupancy with semantics, using a pinching strategy with a two-parallel-finger gripper. In addition, we trained a neural network with the down-sampled 3D occupancy and action as input to model the dynamics of an elasto-plastic object. Our dataset and all CADs of the data collection system will be released soon on our website.
Abstract:6G is envisaged to provide multimodal sensing, pervasive intelligence, global coverage, global coverage, etc., which poses extreme intricacy and new challenges to the network design and optimization. As the core part of 6G, wireless channel is the carrier and enabler for the flourishing technologies and novel services, which intrinsically determines the ultimate system performance. However, how to describe and utilize the complicated and high-dynamic characteristics of wireless channel accurately and effectively still remains great hallenges. To tackle this, digital twin is envisioned as a powerful technology to migrate the physical entities to virtual and computational world. In this article, we propose a large model driven digital twin channel generator (ChannelGPT) embedded with environment intelligence (EI) to enable pervasive intelligence paradigm for 6G network. EI is an iterative and interactive procedure to boost the system performance with online environment adaptivity. Firstly, ChannelGPT is capable of utilization the multimodal data from wireless channel and corresponding physical environment with the equipped sensing ability. Then, based on the fine-tuned large model, ChannelGPT can generate multi-scenario channel parameters, associated map information and wireless knowledge simultaneously, in terms of each task requirement. Furthermore, with the support of online multidimensional channel and environment information, the network entity will make accurate and immediate decisions for each 6G system layer. In practice, we also establish a ChannelGPT prototype to generate high-fidelity channel data for varied scenarios to validate the accuracy and generalization ability based on environment intelligence.
Abstract:Dynamic Graph Neural Networks (DyGNNs) have garnered increasing research attention for learning representations on evolving graphs. Despite their effectiveness, the limited expressive power of existing DyGNNs hinders them from capturing important evolving patterns of dynamic graphs. Although some works attempt to enhance expressive capability with heuristic features, there remains a lack of DyGNN frameworks with provable and quantifiable high-order expressive power. To address this research gap, we firstly propose the k-dimensional Dynamic WL tests (k-DWL) as the referencing algorithms to quantify the expressive power of DyGNNs. We demonstrate that the expressive power of existing DyGNNs is upper bounded by the 1-DWL test. To enhance the expressive power, we propose Dynamic Graph Neural Network with High-order expressive power (HopeDGN), which updates the representation of central node pair by aggregating the interaction history with neighboring node pairs. Our theoretical results demonstrate that HopeDGN can achieve expressive power equivalent to the 2-DWL test. We then present a Transformer-based implementation for the local variant of HopeDGN. Experimental results show that HopeDGN achieved performance improvements of up to 3.12%, demonstrating the effectiveness of HopeDGN.
Abstract:The air interface technology plays a crucial role in optimizing the communication quality for users. To address the challenges brought by the radio channel variations to air interface design, this article proposes a framework of wireless environment information-aided 6G AI-enabled air interface (WEI-6G AI$^{2}$), which actively acquires real-time environment details to facilitate channel fading prediction and communication technology optimization. Specifically, we first outline the role of WEI in supporting the 6G AI$^{2}$ in scenario adaptability, real-time inference, and proactive action. Then, WEI is delineated into four progressive steps: raw sensing data, features obtained by data dimensionality reduction, semantics tailored to tasks, and knowledge that quantifies the environmental impact on the channel. To validate the availability and compare the effect of different types of WEI, a path loss prediction use case is designed. The results demonstrate that leveraging environment knowledge requires only 2.2 ms of model inference time, which can effectively support real-time design for future 6G AI$^{2}$. Additionally, WEI can reduce the pilot overhead by 25\%. Finally, several open issues are pointed out, including multi-modal sensing data synchronization and information extraction method construction.
Abstract:We integrate neural operators with diffusion models to address the spectral limitations of neural operators in surrogate modeling of turbulent flows. While neural operators offer computational efficiency, they exhibit deficiencies in capturing high-frequency flow dynamics, resulting in overly smooth approximations. To overcome this, we condition diffusion models on neural operators to enhance the resolution of turbulent structures. Our approach is validated for different neural operators on diverse datasets, including a high Reynolds number jet flow simulation and experimental Schlieren velocimetry. The proposed method significantly improves the alignment of predicted energy spectra with true distributions compared to neural operators alone. Additionally, proper orthogonal decomposition analysis demonstrates enhanced spectral fidelity in space-time. This work establishes a new paradigm for combining generative models with neural operators to advance surrogate modeling of turbulent systems, and it can be used in other scientific applications that involve microstructure and high-frequency content. See our project page: vivekoommen.github.io/NO_DM
Abstract:This paper introduces a novel incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting (MCS) problems, enabling decision makers to progressively provide assignment example preference information. Specifically, we first construct a max-margin optimization-based model to model potentially non-monotonic preferences and inconsistent assignment example preference information in each iteration of the incremental preference elicitation process. Using the optimal objective function value of the max-margin optimization-based model, we devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration within the framework of uncertainty sampling in active learning. Once the termination criterion is satisfied, the sorting result for non-reference alternatives can be determined through the use of two optimization models, i.e., the max-margin optimization-based model and the complexity controlling optimization model. Subsequently, two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences, considering different termination criteria. Ultimately, we apply the proposed approach to a credit rating problem to elucidate the detailed implementation steps, and perform computational experiments on both artificial and real-world data sets to compare the proposed question selection strategies with several benchmark strategies.
Abstract:Deriving a representative model using value function-based methods from the perspective of preference disaggregation has emerged as a prominent and growing topic in multi-criteria sorting (MCS) problems. A noteworthy observation is that many existing approaches to learning a representative model for MCS problems traditionally assume the monotonicity of criteria, which may not always align with the complexities found in real-world MCS scenarios. Consequently, this paper proposes some approaches to learning a representative model for MCS problems with non-monotonic criteria through the integration of the threshold-based value-driven sorting procedure. To do so, we first define some transformation functions to map the marginal values and category thresholds into a UTA-like functional space. Subsequently, we construct constraint sets to model non-monotonic criteria in MCS problems and develop optimization models to check and rectify the inconsistency of the decision maker's assignment example preference information. By simultaneously considering the complexity and discriminative power of the models, two distinct lexicographic optimization-based approaches are developed to derive a representative model for MCS problems with non-monotonic criteria. Eventually, we offer an illustrative example and conduct comprehensive simulation experiments to elaborate the feasibility and validity of the proposed approaches.
Abstract:Existing neural network models to learn Hamiltonian systems, such as SympNets, although accurate in low-dimensions, struggle to learn the correct dynamics for high-dimensional many-body systems. Herein, we introduce Symplectic Graph Neural Networks (SympGNNs) that can effectively handle system identification in high-dimensional Hamiltonian systems, as well as node classification. SympGNNs combines symplectic maps with permutation equivariance, a property of graph neural networks. Specifically, we propose two variants of SympGNNs: i) G-SympGNN and ii) LA-SympGNN, arising from different parameterizations of the kinetic and potential energy. We demonstrate the capabilities of SympGNN on two physical examples: a 40-particle coupled Harmonic oscillator, and a 2000-particle molecular dynamics simulation in a two-dimensional Lennard-Jones potential. Furthermore, we demonstrate the performance of SympGNN in the node classification task, achieving accuracy comparable to the state-of-the-art. We also empirically show that SympGNN can overcome the oversmoothing and heterophily problems, two key challenges in the field of graph neural networks.
Abstract:One of the significant challenges in reinforcement learning (RL) when dealing with noise is estimating latent states from observations. Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability. Consequently, some existing work focuses on establishing identifiability from a causal perspective to aid in the design of algorithms. However, these results are often derived from a purely causal viewpoint, which may overlook the specific RL context. We revisit this research line and find that incorporating RL-specific context can reduce unnecessary assumptions in previous identifiability analyses for latent states. More importantly, removing these assumptions allows algorithm design to go beyond the earlier boundaries constrained by them. Leveraging these insights, we propose a novel approach for general partially observable Markov Decision Processes (POMDPs) by replacing the complicated structural constraints in previous methods with two simple constraints for transition and reward preservation. With the two constraints, the proposed algorithm is guaranteed to disentangle state and noise that is faithful to the underlying dynamics. Empirical evidence from extensive benchmark control tasks demonstrates the superiority of our approach over existing counterparts in effectively disentangling state belief from noise.