Abstract:Classical variational inference for Bayesian neural networks (BNNs) in parameter space usually suffers from unresolved prior issues such as knowledge encoding intractability and pathological behaviors in deep networks, which could lead to an improper posterior inference. Hence, functional variational inference has been proposed recently to resolve these issues via stochastic process priors. Beyond variational inference, stochastic gradient Markov Chain Monte Carlo (SGMCMC) is another scalable and effective inference method for BNNs to asymptotically generate samples from true posterior by simulating a continuous dynamic. However, the existing SGMCMC methods only work in parametric space, which has the same issues of parameter-space variational inference, and extending the parameter-space dynamics to function-space dynamics is not a trivial undertaking. In this paper, we introduce a new functional SGMCMC scheme via newly designed diffusion dynamics, which can incorporate more informative functional priors. Moreover, we prove that the stationary distribution of these functional dynamics is the target posterior distribution over functions. We demonstrate better performance in both accuracy and uncertainty quantification of our functional SGMCMC on several tasks compared with naive SGMCMC and functional variational inference methods.
Abstract:Cross-Domain Recommendation (CDR) is a promising paradigm inspired by transfer learning to solve the cold-start problem in recommender systems. Existing state-of-the-art CDR methods train an explicit mapping function to transfer the cold-start users from a data-rich source domain to a target domain. However, a limitation of these methods is that the mapping function is trained on overlapping users across domains, while only a small number of overlapping users are available for training. By visualizing the loss landscape of the existing CDR model, we find that training on a small number of overlapping users causes the model to converge to sharp minima, leading to poor generalization. Based on this observation, we leverage loss-geometry-based machine learning approach and propose a novel CDR method called Sharpness-Aware CDR (SCDR). Our proposed method simultaneously optimizes recommendation loss and loss sharpness, leading to better generalization with theoretical guarantees. Empirical studies on real-world datasets demonstrate that SCDR significantly outperforms the other CDR models for cold-start recommendation tasks, while concurrently enhancing the model's robustness to adversarial attacks.
Abstract:Artificial Intelligence (AI) has emerged as a key driver of precision agriculture, facilitating enhanced crop productivity, optimized resource use, farm sustainability, and informed decision-making. Also, the expansion of genome sequencing technology has greatly increased crop genomic resources, deepening our understanding of genetic variation and enhancing desirable crop traits to optimize performance in various environments. There is increasing interest in using machine learning (ML) and deep learning (DL) algorithms for genotype-to-phenotype prediction due to their excellence in capturing complex interactions within large, high-dimensional datasets. In this work, we propose a new LSTM autoencoder-based model for barley genotype-to-phenotype prediction, specifically for flowering time and grain yield estimation, which could potentially help optimize yields and management practices. Our model outperformed the other baseline methods, demonstrating its potential in handling complex high-dimensional agricultural datasets and enhancing crop phenotype prediction performance.
Abstract:AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost and high energy consumption. Through close collaboration between clinical experts and DUCG technicians, 46 DUCG models covering 54 chief complaints were constructed. Over 1,000 diseases can be diagnosed without triage. Before being applied in real-world, the 46 DUCG models were retrospectively verified by third-party hospitals. The verified diagnostic precisions were no less than 95%, in which the diagnostic precision for every disease including uncommon ones was no less than 80%. After verifications, the 46 DUCG models were applied in the real-world in China. Over one million real diagnosis cases have been performed, with only 17 incorrect diagnoses identified. Due to DUCG's transparency, the mistakes causing the incorrect diagnoses were found and corrected. The diagnostic abilities of the clinicians who applied DUCG frequently were improved significantly. Following the introduction to the earlier presented DUCG methodology, the recommendation algorithm for potential medical checks is presented and the key idea of DUCG is extracted.
Abstract:With the development of Large Language Models (LLMs), social biases in the LLMs have become a crucial issue. While various benchmarks for social biases have been provided across languages, the extent to which Japanese LLMs exhibit social biases has not been fully investigated. In this study, we construct the Japanese Bias Benchmark dataset for Question Answering (JBBQ) based on the English bias benchmark BBQ, and analyze social biases in Japanese LLMs. The results show that while current Japanese LLMs improve their accuracies on JBBQ by instruction-tuning, their bias scores become larger. In addition, augmenting their prompts with warning about social biases reduces the effect of biases in some models.
Abstract:Deep reinforcement learning is used in various domains, but usually under the assumption that the environment has stationary conditions like transitions and state distributions. When this assumption is not met, performance suffers. For this reason, tracking continuous environmental changes and adapting to unpredictable conditions is challenging yet crucial because it ensures that systems remain reliable and flexible in practical scenarios. Our research introduces Behavior-Aware Detection and Adaptation (BADA), an innovative framework that merges environmental change detection with behavior adaptation. The key inspiration behind our method is that policies exhibit different global behaviors in changing environments. Specifically, environmental changes are identified by analyzing variations between behaviors using Wasserstein distances without manually set thresholds. The model adapts to the new environment through behavior regularization based on the extent of changes. The results of a series of experiments demonstrate better performance relative to several current algorithms. This research also indicates significant potential for tackling this long-standing challenge.
Abstract:Uncertain changes in data streams present challenges for machine learning models to dynamically adapt and uphold performance in real-time. Particularly, classification boundary change, also known as real concept drift, is the major cause of classification performance deterioration. However, accurately detecting real concept drift remains challenging because the theoretical foundations of existing drift detection methods - two-sample distribution tests and monitoring classification error rate, both suffer from inherent limitations such as the inability to distinguish virtual drift (changes not affecting the classification boundary, will introduce unnecessary model maintenance), limited statistical power, or high computational cost. Furthermore, no existing detection method can provide information on the trend of the drift, which could be invaluable for model maintenance. This work presents a novel real concept drift detection method based on Neighbor-Searching Discrepancy, a new statistic that measures the classification boundary difference between two samples. The proposed method is able to detect real concept drift with high accuracy while ignoring virtual drift. It can also indicate the direction of the classification boundary change by identifying the invasion or retreat of a certain class, which is also an indicator of separability change between classes. A comprehensive evaluation of 11 experiments is conducted, including empirical verification of the proposed theory using artificial datasets, and experimental comparisons with commonly used drift handling methods on real-world datasets. The results show that the proposed theory is robust against a range of distributions and dimensions, and the drift detection method outperforms state-of-the-art alternative methods.
Abstract:Real-world data often exhibit extreme imbalances and out-of-distribution (OOD) instances, which significantly biases the model training. While it has been extensively studied in vision and language domains separately, the impact of long-tailed open worlds on multi-modal large language models (MLLMs) has been largely overlooked. In this paper, we first demonstrate the susceptibility and vulnerability of vision-language models to significant biases caused by tail drift and out-of-distribution (OOD) drift during both the pre-training and fine-tuning stages. To eliminate the bias from different sources, we integrate the tailed drift adaptation and OOD drift detection into a unified framework by extending the concept drift theory to multi-modal. Specifically, a T-distribution-based drift adapter is proposed to effectively mitigate the bias induced by the long-tailed problem, which also facilitates the model in distinguishing OOD data through explicit distribution modelling. Extensive experiments show significant improvements in our model's ability to adapt to tailed drift and OOD drift. Moreover, it enhances the efficiency and accuracy of image-text alignment in vision language model pre-training, particularly in the long-tail open world scenario. Furthermore, we create a set of multi-modal datasets called OpenMMlo, specifically tailored for the long-tailed open world scenario, to validate our findings. To foster the development of the multi-modal community, we have made both OpenMMlo datasets and our code publicly available at: https://github.com/Anonymous0Knight/ConceptDriftMLLMs.
Abstract:Shifting the focus from principles to practical implementation, responsible artificial intelligence (AI) has garnered considerable attention across academia, industry, and society at large. Despite being in its nascent stages, this emerging field grapples with nebulous concepts and intricate knowledge frameworks. By analyzing three prevailing concepts - explainable AI, trustworthy AI, and ethical AI, this study defined responsible AI and identified its core principles. Methodologically, this study successfully demonstrated the implementation of leveraging AI's capabilities into bibliometrics for enhanced knowledge discovery and the cross-validation of experimentally examined models with domain insights. Empirically, this study investigated 17,799 research articles contributed by the AI community since 2015. This involves recognizing key technological players and their relationships, unveiling the topical landscape and hierarchy of responsible AI, charting its evolution, and elucidating the interplay between the responsibility principles and primary AI techniques. An analysis of a core cohort comprising 380 articles from multiple disciplines captures the most recent advancements in responsible AI. As one of the pioneering bibliometric studies dedicated to exploring responsible AI, this study will provide comprehensive macro-level insights, enhancing the understanding of responsible AI while furnishing valuable knowledge support for AI regulation and governance initiatives.
Abstract:Cooperative Multi-Agent Reinforcement Learning (MARL) necessitates seamless collaboration among agents, often represented by an underlying relation graph. Existing methods for learning this graph primarily focus on agent-pair relations, neglecting higher-order relationships. While several approaches attempt to extend cooperation modelling to encompass behaviour similarities within groups, they commonly fall short in concurrently learning the latent graph, thereby constraining the information exchange among partially observed agents. To overcome these limitations, we present a novel approach to infer the Group-Aware Coordination Graph (GACG), which is designed to capture both the cooperation between agent pairs based on current observations and group-level dependencies from behaviour patterns observed across trajectories. This graph is further used in graph convolution for information exchange between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss, which promotes group cohesion and encourages specialization between groups. Our evaluations, conducted on StarCraft II micromanagement tasks, demonstrate GACG's superior performance. An ablation study further provides experimental evidence of the effectiveness of each component of our method.