Abstract:Reinforcement learning tasks in real-world scenarios often involve large, high-dimensional action spaces, leading to challenges such as convergence difficulties, instability, and high computational complexity. It is widely acknowledged that traditional value-based reinforcement learning algorithms struggle to address these issues effectively. A prevalent approach involves generating independent sub-actions within each dimension of the action space. However, this method introduces bias, hindering the learning of optimal policies. In this paper, we propose an advantage-based optimization method and an algorithm named Advantage Branching Dueling Q-network (ABQ). ABQ incorporates a baseline mechanism to tune the action value of each dimension, leveraging the advantage relationship across different sub-actions. With this approach, the learned policy can be optimized for each dimension. Empirical results demonstrate that ABQ outperforms BDQ, achieving 3%, 171%, and 84% more cumulative rewards in HalfCheetah, Ant, and Humanoid environments, respectively. Furthermore, ABQ exhibits competitive performance when compared against two continuous action benchmark algorithms, DDPG and TD3.
Abstract:Precipitation from tropical cyclones (TCs) can cause disasters such as flooding, mudslides, and landslides. Predicting such precipitation in advance is crucial, giving people time to prepare for and defend against these precipitation-induced disasters. Developing deep learning (DL) rainfall prediction methods offers a new way to predict potential disasters. However, existing methods have two problems. First, most of them suffer from cumulative errors and lack physical consistency. Second, they overlook the importance of meteorological factors in TC rainfall and their integration with the numerical weather prediction (NWP) model. Therefore, we propose Tropical Cyclone Precipitation Diffusion (TCP-Diffusion), a multi-modal model for global tropical cyclone precipitation forecasting. It forecasts TC rainfall around the TC center for the next 12 hours at 3-hourly resolution based on past rainfall observations and multi-modal environmental variables. Adjacent residual prediction (ARP) changes the training target from the absolute rainfall value to the rainfall trend, making the model aware of rainfall change, reducing cumulative errors, and ensuring physical consistency. Considering the influence of TC-related meteorological factors and the useful information from NWP model forecasts, we propose a multi-model framework with specialized encoders to extract richer information from environmental variables and results provided by NWP models. The results of extensive experiments show that our method outperforms other DL methods and the NWP method from the European Centre for Medium-Range Weather Forecasts (ECMWF).
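The idea of training on the rainfall trend rather than the absolute value can be illustrated with a minimal sketch. It assumes that "adjacent residual" means the difference between consecutive rainfall frames, which the abstract does not state explicitly; the function names `to_residual_targets` and `from_residuals` and the toy 2x2 rainfall grids are hypothetical and are not taken from the TCP-Diffusion implementation.

```python
import numpy as np

def to_residual_targets(last_obs, future):
    """Turn absolute rainfall frames into adjacent-frame differences.

    last_obs: (H, W) most recent observed rainfall map.
    future:   (T, H, W) rainfall maps to be forecast.
    Returns a (T, H, W) array where entry t is frame t minus the frame before it.
    """
    frames = np.concatenate([last_obs[None], future], axis=0)
    return np.diff(frames, axis=0)

def from_residuals(last_obs, residuals):
    """Recover absolute rainfall by accumulating residuals onto the last observation."""
    return last_obs[None] + np.cumsum(residuals, axis=0)

# Toy example: 4 future frames (12 hours at 3-hourly resolution) on a 2x2 grid.
last_obs = np.array([[1.0, 0.0], [2.0, 0.5]])
future = np.stack([last_obs + 0.5 * (k + 1) for k in range(4)])

residuals = to_residual_targets(last_obs, future)  # what a model would be trained to predict (assumed)
restored = from_residuals(last_obs, residuals)     # absolute rainfall rebuilt from the trend
```

In this sketch the round trip is exact, so a model trained on `residuals` can still be evaluated on absolute rainfall after accumulation.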
Abstract:In recent years, a significant amount of research has been conducted on applying deep learning methods to glaucoma classification and detection. However, the explainability of these established machine learning models remains a major concern. In this research, in contrast, we draw on concepts from cognitive science and study how ophthalmologists detect glaucoma. Simulating experts' efforts, we propose a hierarchical decision-making system centered around a holistic set of carefully designed biomarker-oriented machine learning models. While biomarkers represent the key indicators by which ophthalmologists identify glaucoma, they usually exhibit latent inter-relations. We thus construct a time series model, named TRI-LSTM, capable of calculating and uncovering potential and latent relationships among various biomarkers of glaucoma. Our model is among the first efforts to explore the intrinsic connections among glaucoma biomarkers. We monitor temporal relationships in patients' disease states over time and capture and retain the progression of disease-relevant clinical information from prior visits, thereby enriching the potential relationships among biomarkers. Extensive experiments on a real-world dataset demonstrate the effectiveness of the proposed model.
Abstract:Due to the rapid growth of scientific publications, identifying all related reference articles in the literature has become increasingly challenging yet highly in demand. Existing methods primarily assess candidate publications from a static perspective, focusing on the content of articles and their structural information, such as citation relationships. There is a lack of research on how to account for the evolving impact among papers in their embeddings. Toward this goal, this paper introduces a temporal dimension to paper recommendation strategies. The core idea is to continuously update a paper's embedding when new citation relationships appear, enhancing its relevance for future recommendations. Whenever a citation relationship is added to the literature upon the publication of a paper, the embeddings of the two related papers are updated through a Temporal Graph Neural Network (TGN). A learnable memory update module based on a Recurrent Neural Network (RNN) is used to study the evolution of a paper's embedding in order to predict its reference impact at a future timestamp. Such a TGN-based model learns a pattern of how people's views of the paper may evolve, aiming to guide paper recommendations more precisely. Extensive experiments on an open citation network dataset, including 313,278 articles from PaperWithCode (https://paperswithcode.com/about), have demonstrated the effectiveness of the proposed approach.
Abstract:Accurate data association is crucial in reducing confusion, such as ID switches and assignment errors, in multi-object tracking (MOT). However, existing advanced methods often overlook the diversity among trajectories and the ambiguity and conflicts present in motion and appearance cues, leading to confusion among detections, trajectories, and associations when performing simple global data association. To address this issue, we propose a simple, versatile, and highly interpretable data association approach called Decomposed Data Association (DDA). DDA decomposes the traditional association problem into multiple sub-problems using a series of non-learning-based modules and selectively addresses the confusion in each sub-problem by incorporating targeted exploitation of new cues. Additionally, we introduce Occlusion-aware Non-Maximum Suppression (ONMS) to retain more occluded detections, thereby increasing opportunities for association with trajectories and indirectly reducing the confusion caused by missed detections. Finally, based on DDA and ONMS, we design a powerful multi-object tracker named DeconfuseTrack, specifically focused on resolving confusion in MOT. Extensive experiments conducted on the MOT17 and MOT20 datasets demonstrate that our proposed DDA and ONMS significantly enhance the performance of several popular trackers. Moreover, DeconfuseTrack achieves state-of-the-art performance on the MOT17 and MOT20 test sets and significantly outperforms the baseline tracker ByteTrack in metrics such as HOTA, IDF1, and AssA. This validates that our tracking design effectively reduces confusion caused by simple global association.
Abstract:A camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by the perspective projection. Serious fisheye lens distortion in the peripheral region of the image degrades the performance of existing head pose estimation models trained on undistorted images. This paper presents a new approach for head pose estimation that uses the knowledge of head location in the image to reduce the negative effect of fisheye distortion. We develop an end-to-end convolutional neural network to estimate the head pose with multi-task learning of head pose and head location. Our proposed network estimates the head pose directly from the fisheye image without rectification or calibration. We also created fisheye-distorted versions of three popular head pose estimation datasets, BIWI, 300W-LP, and AFLW2000, for our experiments. Experimental results show that our network remarkably improves the accuracy of head pose estimation compared with other state-of-the-art one-stage and two-stage methods.
Abstract:Code clone detection based on semantic similarity has important value in software engineering tasks (e.g., software evolution, software reuse). Traditional code clone detection technologies pay more attention to the similarity of code at the syntax level and less attention to the semantic similarity of the code. As a result, candidate code fragments that are semantically similar are ignored. To address this issue, we propose a code clone detection method based on semantic similarity. By treating code as a series of interdependent events that occur continuously, we design a model, named EDAM, to encode code semantic information based on event embedding and event dependency. The EDAM model uses the event embedding method to model the execution characteristics of program statements and the data dependence information between all statements. In this way, we can embed the program semantic information into a vector and use the vector to detect semantically similar code. Experimental results show that the performance of our EDAM model is superior to state-of-the-art open source models for code clone detection.
Abstract:This paper presents a physics-based data-driven method to learn predictive reduced-order models (ROMs) from high-fidelity simulations, and illustrates it in the challenging context of a single-injector combustion process. The method combines the perspectives of model reduction and machine learning. Model reduction brings in the physics of the problem, constraining the ROM predictions to lie on a subspace defined by the governing equations. This is achieved by defining the ROM in proper orthogonal decomposition (POD) coordinates, which embed the rich physics information contained in solution snapshots of a high-fidelity computational fluid dynamics (CFD) model. The machine learning perspective brings the flexibility to use transformed physical variables to define the POD basis. This is in contrast to traditional model reduction approaches that are constrained to use the physical variables of the high-fidelity code. Combining the two perspectives, the approach identifies a set of transformed physical variables that expose quadratic structure in the combustion governing equations and learns a quadratic ROM from transformed snapshot data. This learning does not require access to the high-fidelity model implementation. Numerical experiments show that the ROM accurately predicts temperature, pressure, velocity, species concentrations, and the limit-cycle amplitude, with speedups of more than five orders of magnitude over high-fidelity models. Moreover, ROM-predicted pressure traces accurately match the phase of the pressure signal and yield good approximations of the limit-cycle amplitude.
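The POD step mentioned above can be sketched with a singular value decomposition of a snapshot matrix. This is only an illustration of computing a POD basis and projecting onto it; it does not include the variable transformations or the quadratic ROM learning described in the abstract. The helper name `pod_basis` and the synthetic rank-2 snapshot data are assumptions made for the example, and the snapshots are not mean-centered here.

```python
import numpy as np

def pod_basis(snapshots, r):
    """Return the leading r POD modes of a snapshot matrix.

    snapshots: (n_state, n_snapshots) array, one solution snapshot per column.
    Returns (V, s): V has shape (n_state, r) with orthonormal columns,
    s holds all singular values in descending order.
    """
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :r], s

# Synthetic snapshots built from two spatial modes, so the matrix has rank 2.
x = np.linspace(0.0, np.pi, 200)
t = np.linspace(0.0, 1.0, 50)
Q = np.outer(np.sin(x), np.cos(t)) + 0.5 * np.outer(np.sin(2 * x), np.sin(3 * t))

V, s = pod_basis(Q, r=2)
Q_hat = V.T @ Q    # reduced (POD) coordinates, shape (2, 50)
Q_rec = V @ Q_hat  # reconstruction in the full state space
```

Because the synthetic data has exactly two modes, two POD coordinates reconstruct it up to round-off; for real CFD snapshots the decay of `s` would guide the choice of `r`.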
Abstract:Accurate day-ahead load forecasting for individual residents is very important to various smart grid applications. As a powerful machine learning technology, deep learning has shown great advantages in load forecasting tasks. However, deep learning is computationally hungry: it requires a large amount of training time, consumes considerable energy, and emits a large amount of CO2. This aggravates the energy crisis and incurs a substantial cost to the environment. As a result, deep learning methods are difficult to popularize and apply in real smart grid environments. In this paper, to reduce training time, energy consumption, and CO2 emissions, we propose an efficient green model based on a convolutional neural network, namely LoadCNN, for next-day load forecasting of individual residents. The training time, energy consumption, and CO2 emissions of LoadCNN are only approximately 1/70 of the corresponding indicators of other state-of-the-art models. Meanwhile, it achieves state-of-the-art performance in terms of prediction accuracy. LoadCNN is the first load forecasting model that simultaneously considers prediction accuracy, training time, energy efficiency, and environmental costs. It is an efficient green model that can be deployed quickly, cost-effectively, and in an environmentally friendly manner in a realistic smart grid environment.
Abstract:Distributed statistical inference has recently attracted enormous attention. Much existing work focuses on the averaging estimator. We propose a one-step approach to enhance a simple-averaging based distributed estimator. We derive the corresponding asymptotic properties of the newly proposed estimator. We find that the proposed one-step estimator enjoys the same asymptotic properties as the centralized estimator. The proposed one-step approach requires merely one additional round of communication relative to the averaging estimator, so the extra communication burden is insignificant. In finite sample cases, numerical examples show that the proposed estimator outperforms the simple averaging estimator by a large margin in terms of the mean squared errors. A potential application of the one-step approach is that one can use multiple machines to speed up large scale statistical inference with little compromise in the quality of estimators. The proposed method becomes more valuable when data are available only at distributed machines with limited communication bandwidth.
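A minimal sketch of a one-step refinement of an averaged estimator follows. It assumes the one-step update is a single Newton-Raphson step on gradients and Hessians aggregated across machines, which the abstract does not spell out, and it uses logistic regression on simulated data purely for illustration. All names (`grad_hess`, `newton_mle`, `one_step`) and the settings `K`, `n`, and `true_theta` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_hess(theta, X, y):
    """Gradient and Hessian of the summed logistic negative log-likelihood."""
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y)
    hess = (X * (p * (1.0 - p))[:, None]).T @ X
    return grad, hess

def newton_mle(X, y, iters=25):
    """Maximum likelihood estimate via plain Newton iterations from zero."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        g, h = grad_hess(theta, X, y)
        theta = theta - np.linalg.solve(h, g)
    return theta

def one_step(theta_bar, shards):
    """One extra communication round (assumed form): every machine reports its
    gradient and Hessian at theta_bar; a single Newton step is taken on the sums."""
    parts = [grad_hess(theta_bar, X, y) for X, y in shards]
    g = sum(p[0] for p in parts)
    h = sum(p[1] for p in parts)
    return theta_bar - np.linalg.solve(h, g)

# Simulated data split evenly across K machines with n samples each.
rng = np.random.default_rng(0)
true_theta = np.array([0.5, -1.0, 0.8])
K, n = 50, 100
X = rng.normal(size=(K * n, 3))
y = (rng.random(K * n) < sigmoid(X @ true_theta)).astype(float)
shards = [(X[i * n:(i + 1) * n], y[i * n:(i + 1) * n]) for i in range(K)]

theta_avg = np.mean([newton_mle(Xk, yk) for Xk, yk in shards], axis=0)  # simple averaging
theta_os = one_step(theta_avg, shards)                                   # one-step refinement
theta_central = newton_mle(X, y)                                         # all data on one machine

gap_avg = np.linalg.norm(theta_avg - theta_central)
gap_os = np.linalg.norm(theta_os - theta_central)
print(f"gap to centralized estimate: averaging={gap_avg:.2e}, one-step={gap_os:.2e}")
```

Comparing both estimators against the centralized one, rather than against `true_theta`, isolates the effect of distribution from ordinary sampling noise.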