Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia
Abstract:In cross-domain few-shot classification (CFC), recent works mainly focus on adapting a simple transformation head on top of a frozen pre-trained backbone with few labeled data to project embeddings into a task-specific metric space where classification can be performed by measuring similarities between image instance and prototype representations. Technically, an assumption implicitly adopted in such a framework is that the prototype and image instance embeddings share the same representation transformation. However, in this paper, we find that there naturally exists a gap, which resembles the modality gap, between the prototype and image instance embeddings extracted from the frozen pre-trained backbone, and simply applying the same transformation during the adaptation phase constrains exploring the optimal representations and shrinks the gap between prototype and image representations. To solve this problem, we propose a simple yet effective method, contrastive prototype-image adaptation (CoPA), to adapt different transformations respectively for prototypes and images similarly to CLIP by treating prototypes as text prompts. Extensive experiments on Meta-Dataset demonstrate that CoPA achieves the state-of-the-art performance more efficiently. Meanwhile, further analyses also indicate that CoPA can learn better representation clusters, enlarge the gap, and achieve minimal validation loss at the enlarged gap.
Abstract:Federated recommendation systems are essential for providing personalized recommendations while protecting user privacy. However, current methods mainly rely on ID-based item embeddings, neglecting the rich multimodal information of items. To address this, we propose a Federated Multimodal Recommendation System, called FedMR. FedMR uses a foundation model on the server to encode multimodal item data, such as images and text. To handle data heterogeneity caused by user preference differences, FedMR introduces a Mixing Feature Fusion Module on each client, which adjusts fusion strategy weights based on user interaction history to generate personalized item representations that capture users' fine-grained preferences. FedMR is compatible with existing ID-based federated recommendation systems, improving performance without modifying the original framework. Experiments on four real-world multimodal datasets demonstrate FedMR's effectiveness. The code is available at https://anonymous.4open.science/r/FedMR.
Abstract:Federated recommendation systems play a crucial role in protecting user privacy. However, existing methods primarily rely on ID-based item embeddings, overlooking the rich multimodal information of items. To address this limitation, we propose a novel Federated Multimodal Recommendation System called FedMR. FedMR leverages a foundation model on the server side to encode multimodal data, such as images and text, associated with items. To tackle the challenge of data heterogeneity caused by varying user preferences, FedMR introduces a Mixing Feature Fusion Module on the client. This module dynamically adjusts the weights of different fusion strategies based on user interaction history, generating personalized item embeddings that capture fine-grained user preferences. FedMR is compatible with existing ID-based federated recommendation systems, improving their performances without modifying the original framework. Our experiments on four real-world multimodal recommendation datasets demonstrate the effectiveness of FedMR. Our code is available at https://anonymous.4open.science/r/FedMR.
Abstract:Can large language models (LLMs) directly serve as powerful world models for model-based agents? While the gaps between the prior knowledge of LLMs and the specified environment's dynamics do exist, our study reveals that the gaps can be bridged by aligning an LLM with its deployed environment and such "world alignment" can be efficiently achieved by rule learning on LLMs. Given the rich prior knowledge of LLMs, only a few additional rules suffice to align LLM predictions with the specified environment dynamics. To this end, we propose a neurosymbolic approach to learn these rules gradient-free through LLMs, by inducing, updating, and pruning rules based on comparisons of agent-explored trajectories and world model predictions. The resulting world model is composed of the LLM and the learned rules. Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC). By optimizing look-ahead actions based on the precise world model, MPC significantly improves exploration and learning efficiency. Compared to existing LLM agents, WALL-E's reasoning only requires a few principal rules rather than verbose buffered trajectories being included in the LLM input. On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods, with lower costs on replanning time and the number of tokens used for reasoning. In Minecraft, WALL-E exceeds baselines by 15-30% in success rate while costing 8-20 fewer replanning rounds and only 60-80% of tokens. In ALFWorld, its success rate surges to a new record high of 95% only after 6 iterations.
Abstract:Traditional federated learning (FL) methods often rely on fixed weighting for parameter aggregation, neglecting the mutual influence by others. Hence, their effectiveness in heterogeneous data contexts is limited. To address this problem, we propose an influence-oriented federated learning framework, namely FedC^2I, which quantitatively measures Client-level and Class-level Influence to realize adaptive parameter aggregation for each client. Our core idea is to explicitly model the inter-client influence within an FL system via the well-crafted influence vector and influence matrix. The influence vector quantifies client-level influence, enables clients to selectively acquire knowledge from others, and guides the aggregation of feature representation layers. Meanwhile, the influence matrix captures class-level influence in a more fine-grained manner to achieve personalized classifier aggregation. We evaluate the performance of FedC^2I against existing federated learning methods under non-IID settings and the results demonstrate the superiority of our method.
Abstract:Federated Collaborative Filtering (FedCF) is an emerging field focused on developing a new recommendation framework with preserving privacy in a federated setting. Existing FedCF methods typically combine distributed Collaborative Filtering (CF) algorithms with privacy-preserving mechanisms, and then preserve personalized information into a user embedding vector. However, the user embedding is usually insufficient to preserve the rich information of the fine-grained personalization across heterogeneous clients. This paper proposes a novel personalized FedCF method by preserving users' personalized information into a latent variable and a neural model simultaneously. Specifically, we decompose the modeling of user knowledge into two encoders, each designed to capture shared knowledge and personalized knowledge separately. A personalized gating network is then applied to balance personalization and generalization between the global and local encoders. Moreover, to effectively train the proposed framework, we model the CF problem as a specialized Variational AutoEncoder (VAE) task by integrating user interaction vector reconstruction with missing value prediction. The decoder is trained to reconstruct the implicit feedback from items the user has interacted with, while also predicting items the user might be interested in but has not yet interacted with. Experimental results on benchmark datasets demonstrate that the proposed method outperforms other baseline methods, showcasing superior performance.
Abstract:Knowledge graphs (KGs) store enormous facts as relationships between entities. Due to the long-tailed distribution of relations and the incompleteness of KGs, there is growing interest in few-shot knowledge graph completion (FKGC). Existing FKGC methods often assume the existence of all entities in KGs, which may not be practical since new relations and entities can emerge over time. Therefore, we focus on a more challenging task called inductive few-shot knowledge graph completion (I-FKGC), where both relations and entities during the test phase are unknown before. Inspired by the idea of inductive reasoning, we cast I-FKGC as an inductive reasoning problem. Specifically, we propose a novel Graph Stochastic Neural Process approach (GS-NP), which consists of two major modules. In the first module, to obtain a generalized hypothesis (e.g., shared subgraph), we present a neural process-based hypothesis extractor that models the joint distribution of hypothesis, from which we can sample a hypothesis for predictions. In the second module, based on the hypothesis, we propose a graph stochastic attention-based predictor to test if the triple in the query set aligns with the extracted hypothesis. Meanwhile, the predictor can generate an explanatory subgraph identified by the hypothesis. Finally, the training of these two modules is seamlessly combined into a unified objective function, of which the effectiveness is verified by theoretical analyses as well as empirical studies. Extensive experiments on three public datasets demonstrate that our method outperforms existing methods and derives new state-of-the-art performance.
Abstract:Graph anomaly detection (GAD), which aims to identify abnormal nodes that differ from the majority within a graph, has garnered significant attention. However, current GAD methods necessitate training specific to each dataset, resulting in high training costs, substantial data requirements, and limited generalizability when being applied to new datasets and domains. To address these limitations, this paper proposes ARC, a generalist GAD approach that enables a ``one-for-all'' GAD model to detect anomalies across various graph datasets on-the-fly. Equipped with in-context learning, ARC can directly extract dataset-specific patterns from the target dataset using few-shot normal samples at the inference stage, without the need for retraining or fine-tuning on the target dataset. ARC comprises three components that are well-crafted for capturing universal graph anomaly patterns: 1) smoothness-based feature Alignment module that unifies the features of different datasets into a common and anomaly-sensitive space; 2) ego-neighbor Residual graph encoder that learns abnormality-related node embeddings; and 3) cross-attentive in-Context anomaly scoring module that predicts node abnormality by leveraging few-shot normal samples. Extensive experiments on multiple benchmark datasets from various domains demonstrate the superior anomaly detection performance, efficiency, and generalizability of ARC.
Abstract:The primary challenge in Federated Learning (FL) is to model non-IID distributions across clients, whose fine-grained structure is important to improve knowledge sharing. For example, some knowledge is globally shared across all clients, some is only transferable within a subgroup of clients, and some are client-specific. To capture and exploit this structure, we train models organized in a multi-level structure, called ``Multi-level Additive Models (MAM)'', for better knowledge-sharing across heterogeneous clients and their personalization. In federated MAM (FeMAM), each client is assigned to at most one model per level and its personalized prediction sums up the outputs of models assigned to it across all levels. For the top level, FeMAM trains one global model shared by all clients as FedAvg. For every mid-level, it learns multiple models each assigned to a subgroup of clients, as clustered FL. Every bottom-level model is trained for one client only. In the training objective, each model aims to minimize the residual of the additive predictions by the other models assigned to each client. To approximate the arbitrary structure of non-IID across clients, FeMAM introduces more flexibility and adaptivity to FL by incrementally adding new models to the prediction of each client and reassigning another if necessary, automatically optimizing the knowledge-sharing structure. Extensive experiments show that FeMAM surpasses existing clustered FL and personalized FL methods in various non-IID settings. Our code is available at https://github.com/shutong043/FeMAM.
Abstract:This paper demonstrates that pre-trained language models (PLMs) are strong foundation models for on-device meteorological variables modeling. We present LM-Weather, a generic approach to taming PLMs, that have learned massive sequential knowledge from the universe of natural language databases, to acquire an immediate capability to obtain highly customized models for heterogeneous meteorological data on devices while keeping high efficiency. Concretely, we introduce a lightweight personalized adapter into PLMs and endows it with weather pattern awareness. During communication between clients and the server, low-rank-based transmission is performed to effectively fuse the global knowledge among devices while maintaining high communication efficiency and ensuring privacy. Experiments on real-wold dataset show that LM-Weather outperforms the state-of-the-art results by a large margin across various tasks (e.g., forecasting and imputation at different scales). We provide extensive and in-depth analyses experiments, which verify that LM-Weather can (1) indeed leverage sequential knowledge from natural language to accurately handle meteorological sequence, (2) allows each devices obtain highly customized models under significant heterogeneity, and (3) generalize under data-limited and out-of-distribution (OOD) scenarios.