Abstract:Semantic scene completion (SSC) is essential for achieving comprehensive perception in autonomous driving systems. However, existing SSC methods often overlook the high deployment costs in real-world applications. Traditional architectures, such as 3D Convolutional Neural Networks (3D CNNs) and self-attention mechanisms, face challenges in efficiently capturing long-range dependencies within 3D voxel grids, limiting their effectiveness. To address these issues, we introduce MetaSSC, a novel meta-learning-based framework for SSC that leverages deformable convolution, large-kernel attention, and the Mamba (D-LKA-M) model. Our approach begins with a voxel-based semantic segmentation (SS) pretraining task, aimed at exploring the semantics and geometry of incomplete regions while acquiring transferable meta-knowledge. Using simulated cooperative perception datasets, we supervise the perception training of a single vehicle using aggregated sensor data from multiple nearby connected autonomous vehicles (CAVs), generating richer and more comprehensive labels. This meta-knowledge is then adapted to the target domain through a dual-phase training strategy that does not add extra model parameters, enabling efficient deployment. To further enhance the model's capability in capturing long-sequence relationships within 3D voxel grids, we integrate Mamba blocks with deformable convolution and large-kernel attention into the backbone network. Extensive experiments demonstrate that MetaSSC achieves state-of-the-art performance, significantly outperforming competing models while also reducing deployment costs.
Abstract:Online social platforms have become more and more popular, and the dissemination of information on social networks has attracted wide attention of the industries and academia. Aiming at selecting a small subset of nodes with maximum influence on networks, the Influence Maximization (IM) problem has been extensively studied. Since it is #P-hard to compute the influence spread given a seed set, the state-of-art methods, including heuristic and approximation algorithms, faced with great difficulties such as theoretical guarantee, time efficiency, generalization, etc. This makes it unable to adapt to large-scale networks and more complex applications. With the latest achievements of Deep Reinforcement Learning (DRL) in artificial intelligence and other fields, a lot of works has focused on exploiting DRL to solve the combinatorial optimization problems. Inspired by this, we propose a novel end-to-end DRL framework, ToupleGDD, to address the IM problem in this paper, which incorporates three coupled graph neural networks for network embedding and double deep Q-networks for parameters learning. Previous efforts to solve the IM problem with DRL trained their models on the subgraph of the whole network, and then tested their performance on the whole graph, which makes the performance of their models unstable among different networks. However, our model is trained on several small randomly generated graphs and tested on completely different networks, and can obtain results that are very close to the state-of-the-art methods. In addition, our model is trained with a small budget, and it can perform well under various large budgets in the test, showing strong generalization ability. Finally, we conduct entensive experiments on synthetic and realistic datasets, and the experimental results prove the effectiveness and superiority of our model.
Abstract:The online social platforms, like Twitter, Facebook, LinkedIn and WeChat, have grown really fast in last decade and have been one of the most effective platforms for people to communicate and share information with each other. Due to the "word of mouth" effects, information usually can spread rapidly on these social media platforms. Therefore, it is important to study the mechanisms driving the information diffusion and quantify the consequence of information spread. A lot of efforts have been focused on this problem to help us better understand and achieve higher performance in viral marketing and advertising. On the other hand, the development of neural networks has blossomed in the last few years, leading to a large number of graph representation learning (GRL) models. Compared to traditional models, GRL methods are often shown to be more effective. In this paper, we present a comprehensive review for existing works using GRL methods for popularity prediction problem, and categorize related literatures into two big classes, according to their mainly used model and techniques: embedding-based methods and deep learning methods. Deep learning method is further classified into six small classes: convolutional neural networks, graph convolutional networks, graph attention networks, graph neural networks, recurrent neural networks, and reinforcement learning. We compare the performance of these different models and discuss their strengths and limitations. Finally, we outline the challenges and future chances for popularity prediction problem.
Abstract:In the last decade, deep learning (DL) approaches have been used successfully in computer vision (CV) applications. However, DL-based CV models are generally considered to be black boxes due to their lack of interpretability. This black box behavior has exacerbated user distrust and therefore has prevented widespread deployment DLCV models in autonomous driving tasks even though some of these models exhibit superiority over human performance. For this reason, it is essential to develop explainable DL models for autonomous driving task. Explainable DL models can not only boost user trust in autonomy but also serve as a diagnostic approach to identify anydefects and weaknesses of the model during the system development phase. In this paper, we propose an explainable end-to-end autonomous driving system based on "Transformer", a state-of-the-art (SOTA) self-attention based model, to map visual features from images collected by onboard cameras to guide potential driving actions with corresponding explanations. The model achieves a soft attention over the global features of the image. The results demonstrate the efficacy of our proposed model as it exhibits superior performance (in terms of correct prediction of actions and explanations) compared to the benchmark model by a significant margin with lower computational cost.
Abstract:Past research and practice have demonstrated that dynamic rerouting framework is effective in mitigating urban traffic congestion and thereby improve urban travel efficiency. It has been suggested that dynamic rerouting could be facilitated using emerging technologies such as fog-computing which offer advantages of low-latency capabilities and information exchange between vehicles and roadway infrastructure. To address this question, this study proposes a two-stage model that combines GAQ (Graph Attention Network - Deep Q Learning) and EBkSP (Entropy Based k Shortest Path) using a fog-cloud architecture, to reroute vehicles in a dynamic urban environment and therefore to improve travel efficiency in terms of travel speed. First, GAQ analyzes the traffic conditions on each road and for each fog area, and then assigns a road index based on the information attention from both local and neighboring areas. Second, EBkSP assigns the route for each vehicle based on the vehicle priority and route popularity. A case study experiment is carried out to investigate the efficacy of the proposed model. At the model training stage, different methods are used to establish the vehicle priorities, and their impact on the results is assessed. Also, the proposed model is tested under various scenarios with different ratios of rerouting and background (non-rerouting) vehicles. The results demonstrate that vehicle rerouting using the proposed model can help attain higher speed and reduces possibility of severe congestion. This result suggests that the proposed model can be deployed by urban transportation agencies for dynamic rerouting and ultimately, to reduce urban traffic congestion.