Abstract: Spatio-temporal traffic prediction is crucial in intelligent transportation systems. The key challenge of accurate prediction is how to model the complex spatio-temporal dependencies and adapt to the inherent dynamics in the data. Traditional Graph Convolutional Networks (GCNs) often struggle with static adjacency matrices, which introduce domain bias, or with learnable matrices, which may overfit to specific patterns. This challenge becomes more complex under Multi-Task Learning (MTL): while MTL has the potential to enhance prediction accuracy through task synergies, it can also suffer significantly from task interference. To overcome these challenges, this study introduces a novel MTL framework, Dynamic Group-wise Spatio-Temporal Multi-Task Learning (DG-STMTL). DG-STMTL introduces a hybrid adjacency matrix generation module that combines static matrices with dynamic ones through a task-specific gating mechanism. We also introduce a group-wise GCN module to strengthen the modelling of spatio-temporal dependencies. We conduct extensive experiments on two real-world datasets to evaluate our method. The results show that it outperforms state-of-the-art baselines, indicating its effectiveness and robustness.
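As an illustration of the gating idea described in the abstract, the following is a minimal PyTorch-style sketch, assuming learnable node embeddings for the dynamic graph and one scalar gate per task; all module and parameter names are hypothetical and not taken from DG-STMTL:

```python
import torch
import torch.nn as nn

class HybridAdjacency(nn.Module):
    """Blend a static adjacency matrix with a dynamic, learnable one
    via a per-task gate (illustrative sketch, not the paper's code)."""

    def __init__(self, num_nodes: int, num_tasks: int, emb_dim: int = 16):
        super().__init__()
        # Dynamic adjacency is built from learnable node embeddings.
        self.node_emb = nn.Parameter(torch.randn(num_nodes, emb_dim))
        # One scalar gate per task, squashed to (0, 1) by a sigmoid.
        self.task_gate = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, static_adj: torch.Tensor, task_id: int) -> torch.Tensor:
        # Dynamic adjacency: softmax-normalised similarity of node embeddings.
        dynamic_adj = torch.softmax(self.node_emb @ self.node_emb.T, dim=-1)
        gate = torch.sigmoid(self.task_gate[task_id])
        # Convex combination of the static (domain prior) and dynamic (data-driven) graphs.
        return gate * static_adj + (1 - gate) * dynamic_adj


# Usage: blend a placeholder 10-node static graph for task 0.
adj = HybridAdjacency(num_nodes=10, num_tasks=2)(torch.eye(10), task_id=0)
```

The sigmoid gate lets each task interpolate between the static prior and the data-driven graph, which is the role the task-specific gating mechanism plays in the abstract.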
Abstract: Constrained reinforcement learning (CRL) is an important problem because it provides a framework for addressing critical safety concerns in reinforcement learning (RL). However, to handle constraint satisfaction, current CRL methods rely on second-order optimization or on primal-dual frameworks with additional Lagrangian multipliers, which makes them complex and inefficient to implement. To address these issues, we propose a novel first-order feasible method named Constrained Proximal Policy Optimization (CPPO). By treating the CRL problem as a probabilistic inference problem, our approach applies the Expectation-Maximization framework to solve it in two steps: 1) computing the optimal policy distribution within the feasible region (E-step), and 2) performing a first-order update that moves the current policy towards the optimal policy obtained in the E-step (M-step). We establish the relationship between probability ratios and KL divergence to convert the E-step into a convex optimization problem, and we develop an iterative heuristic algorithm from a geometric perspective to solve it. Additionally, we introduce a conservative update mechanism to overcome the constraint violations that arise in existing feasible-region methods. Empirical evaluations in complex and uncertain environments validate the effectiveness of the proposed method, which performs at least as well as the baselines.
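To make the two-step structure concrete, here is a minimal, simplified PyTorch-style sketch of one E-step/M-step update, assuming precomputed reward and cost advantages and a `policy.log_prob` interface; the exponential reweighting and feasibility mask are illustrative simplifications, not CPPO's actual derivation:

```python
import torch

def em_policy_step(policy, optimizer, obs, actions, old_logp,
                   reward_adv, cost_adv, cost_limit, temperature=1.0):
    """One illustrative E-step / M-step update (not the paper's exact algorithm)."""
    with torch.no_grad():
        # E-step: build a non-parametric target over the sampled actions.
        # Feasibility mask keeps only samples within the cost budget.
        feasible = (cost_adv <= cost_limit).float()
        # Exponential reweighting favours samples with high reward advantage.
        weights = feasible * torch.exp(reward_adv / temperature)
        weights = weights / weights.sum().clamp(min=1e-8)

    # M-step: first-order update pulling the policy towards the E-step target.
    # `policy.log_prob` is an assumed interface returning per-sample log-probabilities.
    logp = policy.log_prob(obs, actions)
    # Log of the probability ratio pi_theta / pi_old for each sample.
    log_ratio = logp - old_logp
    loss = -(weights * log_ratio).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Minimising this weighted negative log-ratio is a first-order way of reducing the KL divergence between the current policy and the reweighted E-step target, mirroring the ratio/KL relationship the abstract refers to.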