Abstract:Visual emotion recognition (VER), which aims at understanding humans' emotional reactions toward different visual stimuli, has attracted increasing attention. Given the subjective and ambiguous characteristics of emotion, annotating a reliable large-scale dataset is hard. For reducing reliance on data labeling, domain adaptation offers an alternative solution by adapting models trained on labeled source data to unlabeled target data. Conventional domain adaptation methods require access to source data. However, due to privacy concerns, source emotional data may be inaccessible. To address this issue, we propose an unexplored task: source-free domain adaptation (SFDA) for VER, which does not have access to source data during the adaptation process. To achieve this, we propose a novel framework termed Bridge then Begin Anew (BBA), which consists of two steps: domain-bridged model generation (DMG) and target-related model adaptation (TMA). First, the DMG bridges cross-domain gaps by generating an intermediate model, avoiding direct alignment between two VER datasets with significant differences. Then, the TMA begins training the target model anew to fit the target structure, avoiding the influence of source-specific knowledge. Extensive experiments are conducted on six SFDA settings for VER. The results demonstrate the effectiveness of BBA, which achieves remarkable performance gains compared with state-of-the-art SFDA methods and outperforms representative unsupervised domain adaptation approaches.
Abstract:Training a general-purpose time series foundation models with robust generalization capabilities across diverse applications from scratch is still an open challenge. Efforts are primarily focused on fusing cross-domain time series datasets to extract shared subsequences as tokens for training models on Transformer architecture. However, due to significant statistical heterogeneity across domains, this cross-domain fusing approach doesn't work effectively as the same as fusing texts and images. To tackle this challenge, this paper proposes a novel federated learning approach to address the heterogeneity in time series foundation models training, namely FFTS. Specifically, each data-holding organization is treated as an independent client in a collaborative learning framework with federated settings, and then many client-specific local models will be trained to preserve the unique characteristics per dataset. Moreover, a new regularization mechanism will be applied to both client-side and server-side, thus to align the shared knowledge across heterogeneous datasets from different domains. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed federated learning approach. The newly learned time series foundation models achieve superior generalization capabilities on cross-domain time series analysis tasks, including forecasting, imputation, and anomaly detection.
Abstract:Personalization stands as the cornerstone of recommender systems (RecSys), striving to sift out redundant information and offer tailor-made services for users. However, the conventional cloud-based RecSys necessitates centralized data collection, posing significant risks of user privacy breaches. In response to this challenge, federated recommender systems (FedRecSys) have emerged, garnering considerable attention. FedRecSys enable users to retain personal data locally and solely share model parameters with low privacy sensitivity for global model training, significantly bolstering the system's privacy protection capabilities. Within the distributed learning framework, the pronounced non-iid nature of user behavior data introduces fresh hurdles to federated optimization. Meanwhile, the ability of federated learning to concurrently learn multiple models presents an opportunity for personalized user modeling. Consequently, the development of personalized FedRecSys (PFedRecSys) is crucial and holds substantial significance. This tutorial seeks to provide an introduction to PFedRecSys, encompassing (1) an overview of existing studies on PFedRecSys, (2) a comprehensive taxonomy of PFedRecSys spanning four pivotal research directions-client-side adaptation, server-side aggregation, communication efficiency, privacy and protection, and (3) exploration of open challenges and promising future directions in PFedRecSys. This tutorial aims to establish a robust foundation and spark new perspectives for subsequent exploration and practical implementations in the evolving realm of RecSys.
Abstract:Heatwaves, prolonged periods of extreme heat, have intensified in frequency and severity due to climate change, posing substantial risks to public health, ecosystems, and infrastructure. Despite advancements in Machine Learning (ML) modeling, accurate heatwave forecasting at weather scales (1--15 days) remains challenging due to the non-linear interactions between atmospheric drivers and the rarity of these extreme events. Traditional models relying on heuristic feature engineering often fail to generalize across diverse climates and capture the complexities of heatwave dynamics. This study introduces the Distribution-Informed Graph Neural Network (DI-GNN), a novel framework that integrates principles from Extreme Value Theory (EVT) into the graph neural network architecture. DI-GNN incorporates Generalized Pareto Distribution (GPD)-derived descriptors into the feature space, adjacency matrix, and loss function to enhance its sensitivity to rare heatwave occurrences. By prioritizing the tails of climatic distributions, DI-GNN addresses the limitations of existing methods, particularly in imbalanced datasets where traditional metrics like accuracy are misleading. Empirical evaluations using weather station data from British Columbia, Canada, demonstrate the superior performance of DI-GNN compared to baseline models. DI-GNN achieved significant improvements in balanced accuracy, recall, and precision, with high AUC and average precision scores, reflecting its robustness in distinguishing heatwave events.
Abstract:Federated recommendation systems are essential for providing personalized recommendations while protecting user privacy. However, current methods mainly rely on ID-based item embeddings, neglecting the rich multimodal information of items. To address this, we propose a Federated Multimodal Recommendation System, called FedMR. FedMR uses a foundation model on the server to encode multimodal item data, such as images and text. To handle data heterogeneity caused by user preference differences, FedMR introduces a Mixing Feature Fusion Module on each client, which adjusts fusion strategy weights based on user interaction history to generate personalized item representations that capture users' fine-grained preferences. FedMR is compatible with existing ID-based federated recommendation systems, improving performance without modifying the original framework. Experiments on four real-world multimodal datasets demonstrate FedMR's effectiveness. The code is available at https://anonymous.4open.science/r/FedMR.
Abstract:Federated recommendation systems play a crucial role in protecting user privacy. However, existing methods primarily rely on ID-based item embeddings, overlooking the rich multimodal information of items. To address this limitation, we propose a novel Federated Multimodal Recommendation System called FedMR. FedMR leverages a foundation model on the server side to encode multimodal data, such as images and text, associated with items. To tackle the challenge of data heterogeneity caused by varying user preferences, FedMR introduces a Mixing Feature Fusion Module on the client. This module dynamically adjusts the weights of different fusion strategies based on user interaction history, generating personalized item embeddings that capture fine-grained user preferences. FedMR is compatible with existing ID-based federated recommendation systems, improving their performances without modifying the original framework. Our experiments on four real-world multimodal recommendation datasets demonstrate the effectiveness of FedMR. Our code is available at https://anonymous.4open.science/r/FedMR.
Abstract:Can large language models (LLMs) directly serve as powerful world models for model-based agents? While the gaps between the prior knowledge of LLMs and the specified environment's dynamics do exist, our study reveals that the gaps can be bridged by aligning an LLM with its deployed environment and such "world alignment" can be efficiently achieved by rule learning on LLMs. Given the rich prior knowledge of LLMs, only a few additional rules suffice to align LLM predictions with the specified environment dynamics. To this end, we propose a neurosymbolic approach to learn these rules gradient-free through LLMs, by inducing, updating, and pruning rules based on comparisons of agent-explored trajectories and world model predictions. The resulting world model is composed of the LLM and the learned rules. Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC). By optimizing look-ahead actions based on the precise world model, MPC significantly improves exploration and learning efficiency. Compared to existing LLM agents, WALL-E's reasoning only requires a few principal rules rather than verbose buffered trajectories being included in the LLM input. On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods, with lower costs on replanning time and the number of tokens used for reasoning. In Minecraft, WALL-E exceeds baselines by 15-30% in success rate while costing 8-20 fewer replanning rounds and only 60-80% of tokens. In ALFWorld, its success rate surges to a new record high of 95% only after 6 iterations.
Abstract:Automatic LLM benchmarks, such as AlpacaEval 2.0, Arena-Hard-Auto, and MT-Bench, have become popular for evaluating language models due to their cost-effectiveness and scalability compared to human evaluation. Achieving high win rates on these benchmarks can significantly boost the promotional impact of newly released language models. This promotional benefit may motivate tricks, such as manipulating model output length or style to game win rates, even though several mechanisms have been developed to control length and disentangle style to reduce gameability. Nonetheless, we show that even a "null model" that always outputs a constant response (irrelevant to input instructions) can cheat automatic benchmarks and achieve top-ranked win rates: an 86.5% LC win rate on AlpacaEval 2.0; an 83.0 score on Arena-Hard-Auto; and a 9.55 score on MT-Bench. Moreover, the crafted cheating outputs are transferable because we assume that the instructions of these benchmarks (e.g., 805 samples of AlpacaEval 2.0) are private and cannot be accessed. While our experiments are primarily proof-of-concept, an adversary could use LLMs to generate more imperceptible cheating responses, unethically benefiting from high win rates and promotional impact. Our findings call for the development of anti-cheating mechanisms for reliable automatic benchmarks. The code is available at https://github.com/sail-sg/Cheating-LLM-Benchmarks.
Abstract:Traditional federated learning (FL) methods often rely on fixed weighting for parameter aggregation, neglecting the mutual influence by others. Hence, their effectiveness in heterogeneous data contexts is limited. To address this problem, we propose an influence-oriented federated learning framework, namely FedC^2I, which quantitatively measures Client-level and Class-level Influence to realize adaptive parameter aggregation for each client. Our core idea is to explicitly model the inter-client influence within an FL system via the well-crafted influence vector and influence matrix. The influence vector quantifies client-level influence, enables clients to selectively acquire knowledge from others, and guides the aggregation of feature representation layers. Meanwhile, the influence matrix captures class-level influence in a more fine-grained manner to achieve personalized classifier aggregation. We evaluate the performance of FedC^2I against existing federated learning methods under non-IID settings and the results demonstrate the superiority of our method.
Abstract:Hallucinations of large language models (LLMs) commonly occur in domain-specific downstream tasks, with no exception in ontology matching (OM). The prevalence of using LLMs for OM raises the need for benchmarks to better understand LLM hallucinations. The OAEI-LLM dataset is an extended version of the Ontology Alignment Evaluation Initiative (OAEI) datasets that evaluate LLM-specific hallucinations in OM tasks. We outline the methodology used in dataset construction and schema extension, and provide examples of potential use cases.