Abstract:Synchrotron radiation sources play a crucial role in fields such as materials science, biology, and chemistry. The beamline, a key subsystem of the synchrotron, modulates and directs the radiation to the sample for analysis. However, the alignment of beamlines is a complex and time-consuming process, primarily carried out manually by experienced engineers. Even minor misalignments in optical components can significantly affect the beam's properties, leading to suboptimal experimental outcomes. Current automated methods, such as bayesian optimization (BO) and reinforcement learning (RL), although these methods enhance performance, limitations remain. The relationship between the current and target beam properties, crucial for determining the adjustment, is not fully considered. Additionally, the physical characteristics of optical elements are overlooked, such as the need to adjust specific devices to control the output beam's spot size or position. This paper addresses the alignment of beamlines by modeling it as a Markov Decision Process (MDP) and training an intelligent agent using RL. The agent calculates adjustment values based on the current and target beam states, executes actions, and iterates until optimal parameters are achieved. A policy network with action attention is designed to improve decision-making by considering both state differences and the impact of optical components. Experiments on two simulated beamlines demonstrate that our algorithm outperforms existing methods, with ablation studies highlighting the effectiveness of the action attention-based policy network.
Abstract:This technical report outlines the methodologies we applied for the PRCV Challenge, focusing on cognition and decision-making in driving scenarios. We employed InternVL-2.0, a pioneering open-source multi-modal model, and enhanced it by refining both the model input and training methodologies. For the input data, we strategically concatenated and formatted the multi-view images. It is worth mentioning that we utilized the coordinates of the original images without transformation. In terms of model training, we initially pre-trained the model on publicly available autonomous driving scenario datasets to bolster its alignment capabilities of the challenge tasks, followed by fine-tuning on the DriveLM-nuscenes Dataset. During the fine-tuning phase, we innovatively modified the loss function to enhance the model's precision in predicting coordinate values. These approaches ensure that our model possesses advanced cognitive and decision-making capabilities in driving scenarios. Consequently, our model achieved a score of 0.6064, securing the first prize on the competition's final results.
Abstract:Traditional decision trees are limited by axis-orthogonal splits, which can perform poorly when true decision boundaries are oblique. While oblique decision tree methods address this limitation, they often face high computational costs, difficulties with multi-class classification, and a lack of effective feature selection. In this paper, we introduce LDATree and FoLDTree, two novel frameworks that integrate Uncorrelated Linear Discriminant Analysis (ULDA) and Forward ULDA into a decision tree structure. These methods enable efficient oblique splits, handle missing values, support feature selection, and provide both class labels and probabilities as model outputs. Through evaluations on simulated and real-world datasets, LDATree and FoLDTree consistently outperform axis-orthogonal and other oblique decision tree methods, achieving accuracy levels comparable to the random forest. The results highlight the potential of these frameworks as robust alternatives to traditional single-tree methods.
Abstract:Linear discriminant analysis (LDA), a traditional classification tool, suffers from limitations such as sensitivity to noise and computational challenges when dealing with non-invertible within-class scatter matrices. Traditional stepwise LDA frameworks, which iteratively select the most informative features, often exacerbate these issues by relying heavily on Wilks' $\Lambda$, potentially causing premature stopping of the selection process. This paper introduces a novel forward discriminant analysis framework that integrates Pillai's trace with Uncorrelated Linear Discriminant Analysis (ULDA) to address these challenges, and offers a unified and stand-alone classifier. Through simulations and real-world datasets, the new framework demonstrates effective control of Type I error rates and improved classification accuracy, particularly in cases involving perfect group separations. The results highlight the potential of this approach as a robust alternative to the traditional stepwise LDA framework.
Abstract:In Reinforcement Learning-based Recommender Systems (RLRS), the complexity and dynamism of user interactions often result in high-dimensional and noisy state spaces, making it challenging to discern which aspects of the state are truly influential in driving the decision-making process. This issue is exacerbated by the evolving nature of user preferences and behaviors, requiring the recommender system to adaptively focus on the most relevant information for decision-making while preserving generaliability. To tackle this problem, we introduce an innovative causal approach for decomposing the state and extracting \textbf{C}ausal-\textbf{I}n\textbf{D}ispensable \textbf{S}tate Representations (CIDS) in RLRS. Our method concentrates on identifying the \textbf{D}irectly \textbf{A}ction-\textbf{I}nfluenced \textbf{S}tate Variables (DAIS) and \textbf{A}ction-\textbf{I}nfluence \textbf{A}ncestors (AIA), which are essential for making effective recommendations. By leveraging conditional mutual information, we develop a framework that not only discerns the causal relationships within the generative process but also isolates critical state variables from the typically dense and high-dimensional state representations. We provide theoretical evidence for the identifiability of these variables. Then, by making use of the identified causal relationship, we construct causal-indispensable state representations, enabling the training of policies over a more advantageous subset of the agent's state space. We demonstrate the efficacy of our approach through extensive experiments, showcasing our method outperforms state-of-the-art methods.
Abstract:Reinforcement learning-based recommender systems have recently gained popularity. However, due to the typical limitations of simulation environments (e.g., data inefficiency), most of the work cannot be broadly applied in all domains. To counter these challenges, recent advancements have leveraged offline reinforcement learning methods, notable for their data-driven approach utilizing offline datasets. A prominent example of this is the Decision Transformer. Despite its popularity, the Decision Transformer approach has inherent drawbacks, particularly evident in recommendation methods based on it. This paper identifies two key shortcomings in existing Decision Transformer-based methods: a lack of stitching capability and limited effectiveness in online adoption. In response, we introduce a novel methodology named Max-Entropy enhanced Decision Transformer with Reward Relabeling for Offline RLRS (EDT4Rec). Our approach begins with a max entropy perspective, leading to the development of a max entropy enhanced exploration strategy. This strategy is designed to facilitate more effective exploration in online environments. Additionally, to augment the model's capability to stitch sub-optimal trajectories, we incorporate a unique reward relabeling technique. To validate the effectiveness and superiority of EDT4Rec, we have conducted comprehensive experiments across six real-world offline datasets and in an online simulator.
Abstract:In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can simulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keep accumulating experience from both successful and unsuccessful cases. Simulation experiments show that the treatment performance of doctor agents consistently improves on various tasks. More interestingly, the knowledge the doctor agents have acquired in Agent Hospital is applicable to real-world medicare benchmarks. After treating around ten thousand patients (real-world doctors may take over two years), the evolved doctor agent achieves a state-of-the-art accuracy of 93.06% on a subset of the MedQA dataset that covers major respiratory diseases. This work paves the way for advancing the applications of LLM-powered agent techniques in medical scenarios.
Abstract:Offline reinforcement learning (RL) presents distinct challenges as it relies solely on observational data. A central concern in this context is ensuring the safety of the learned policy by quantifying uncertainties associated with various actions and environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method to simultaneously address both epistemic uncertainty and environmental stochasticity. We propose a model-free offline RL algorithm capable of learning risk-averse policies and characterizing the entire distribution of discounted cumulative rewards, as opposed to merely maximizing the expected value of accumulated discounted returns. Our method is rigorously evaluated through comprehensive experiments in both risk-sensitive and risk-neutral benchmarks, demonstrating its superior performance.
Abstract:Reinforcement Learning-based Recommender Systems (RLRS) have shown promise across a spectrum of applications, from e-commerce platforms to streaming services. Yet, they grapple with challenges, notably in crafting reward functions and harnessing large pre-existing datasets within the RL framework. Recent advancements in offline RLRS provide a solution for how to address these two challenges. However, existing methods mainly rely on the transformer architecture, which, as sequence lengths increase, can introduce challenges associated with computational resources and training costs. Additionally, the prevalent methods employ fixed-length input trajectories, restricting their capacity to capture evolving user preferences. In this study, we introduce a new offline RLRS method to deal with the above problems. We reinterpret the RLRS challenge by modeling sequential decision-making as an inference task, leveraging adaptive masking configurations. This adaptive approach selectively masks input tokens, transforming the recommendation task into an inference challenge based on varying token subsets, thereby enhancing the agent's ability to infer across diverse trajectory lengths. Furthermore, we incorporate a multi-scale segmented retention mechanism that facilitates efficient modeling of long sequences, significantly enhancing computational efficiency. Our experimental analysis, conducted on both online simulator and offline datasets, clearly demonstrates the advantages of our proposed method.
Abstract:Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language. As these models become more integral to research and applications, conducting comprehensive evaluations of their capabilities has grown increasingly important. However, most existing benchmarks fail to consider that, in certain situations, images need to be interpreted within a broader context. In this work, we introduce a new benchmark, named as CODIS, designed to assess the ability of models to use context provided in free-form text to enhance visual comprehension. Our findings indicate that MLLMs consistently fall short of human performance on this benchmark. Further analysis confirms that these models struggle to effectively extract and utilize contextual information to improve their understanding of images. This underscores the pressing need to enhance the ability of MLLMs to comprehend visuals in a context-dependent manner. View our project website at https://thunlp-mt.github.io/CODIS.