Abstract:In dynamic autonomous driving environment, Artificial Intelligence-Generated Content (AIGC) technology can supplement vehicle perception and decision making by leveraging models' generative and predictive capabilities, and has the potential to enhance motion planning, trajectory prediction and traffic simulation. This article proposes a cloud-edge-terminal collaborative architecture to support AIGC for autonomous driving. By delving into the unique properties of AIGC services, this article initiates the attempts to construct mutually supportive AIGC and network systems for autonomous driving, including communication, storage and computation resource allocation schemes to support AIGC services, and leveraging AIGC to assist system design and resource management.
Abstract:In recent years, there has been significant progress in semantic communication systems empowered by deep learning techniques. It has greatly improved the efficiency of information transmission. Nevertheless, traditional semantic communication models still face challenges, particularly due to their single-task and single-modal orientation. Many of these models are designed for specific tasks, which may result in limitations when applied to multi-task communication systems. Moreover, these models often overlook the correlations among different modal data in multi-modal tasks. It leads to an incomplete understanding of complex information, causing increased communication overhead and diminished performance. To address these problems, we propose a multi-modal fusion-based multi-task semantic communication (MFMSC) framework. In contrast to traditional semantic communication approaches, MFMSC can effectively handle various tasks across multiple modalities. Furthermore, we design a fusion module based on Bidirectional Encoder Representations from Transformers (BERT) for multi-modal semantic information fusion. By leveraging the powerful semantic understanding capabilities and self-attention mechanism of BERT, we achieve effective fusion of semantic information from different modalities. We compare our model with multiple benchmarks. Simulation results show that MFMSC outperforms these models in terms of both performance and communication overhead.
Abstract:The development of Intelligent Transportation System (ITS) has brought about comprehensive urban traffic information that not only provides convenience to urban residents in their daily lives but also enhances the efficiency of urban road usage, leading to a more harmonious and sustainable urban life. Typical scenarios in ITS mainly include traffic flow prediction, traffic target recognition, and vehicular edge computing. However, most current ITS applications rely on a centralized training approach where users upload source data to a cloud server with high computing power for management and centralized training. This approach has limitations such as poor real-time performance, data silos, and difficulty in guaranteeing data privacy. To address these limitations, federated learning (FL) has been proposed as a promising solution. In this paper, we present a comprehensive review of the application of FL in ITS, with a particular focus on three key scenarios: traffic flow prediction, traffic target recognition, and vehicular edge computing. For each scenario, we provide an in-depth analysis of its key characteristics, current challenges, and specific manners in which FL is leveraged. Moreover, we discuss the benefits that FL can offer as a potential solution to the limitations of the centralized training approach currently used in ITS applications.
Abstract:Diabetic retinopathy (DR), as a debilitating ocular complication, necessitates prompt intervention and treatment. Despite the effectiveness of artificial intelligence in aiding DR grading, the progression of research toward enhancing the interpretability of DR grading through precise lesion segmentation faces a severe hindrance due to the scarcity of pixel-level annotated DR datasets. To mitigate this, this paper presents and delineates TJDR, a high-quality DR pixel-level annotation dataset, which comprises 561 color fundus images sourced from the Tongji Hospital Affiliated to Tongji University. These images are captured using diverse fundus cameras including Topcon's TRC-50DX and Zeiss CLARUS 500, exhibit high resolution. For the sake of adhering strictly to principles of data privacy, the private information of images is meticulously removed while ensuring clarity in displaying anatomical structures such as the optic disc, retinal blood vessels, and macular fovea. The DR lesions are annotated using the Labelme tool, encompassing four prevalent DR lesions: Hard Exudates (EX), Hemorrhages (HE), Microaneurysms (MA), and Soft Exudates (SE), labeled respectively from 1 to 4, with 0 representing the background. Significantly, experienced ophthalmologists conduct the annotation work with rigorous quality assurance, culminating in the construction of this dataset. This dataset has been partitioned into training and testing sets and publicly released to contribute to advancements in the DR lesion segmentation research community.
Abstract:Drowsiness driving is a major cause of traffic accidents and thus numerous previous researches have focused on driver drowsiness detection. Many drive relevant factors have been taken into consideration for fatigue detection and can lead to high precision, but there are still several serious constraints, such as most existing models are environmentally susceptible. In this paper, fatigue detection is considered as temporal action detection problem instead of image classification. The proposed detection system can be divided into four parts: (1) Localize the key patches of the detected driver picture which are critical for fatigue detection and calculate the corresponding optical flow. (2) Contrast Limited Adaptive Histogram Equalization (CLAHE) is used in our system to reduce the impact of different light conditions. (3) Three individual two-stream networks combined with attention mechanism are designed for each feature to extract temporal information. (4) The outputs of the three sub-networks will be concatenated and sent to the fully-connected network, which judges the status of the driver. The drowsiness detection system is trained and evaluated on the famous Nation Tsing Hua University Driver Drowsiness Detection (NTHU-DDD) dataset and we obtain an accuracy of 94.46%, which outperforms most existing fatigue detection models.
Abstract:As one of the fundamental tasks in computer vision, semantic segmentation plays an important role in real world applications. Although numerous deep learning models have made notable progress on several mainstream datasets with the rapid development of convolutional networks, they still encounter various challenges in practical scenarios. Unsupervised adaptive semantic segmentation aims to obtain a robust classifier trained with source domain data, which is able to maintain stable performance when deployed to a target domain with different data distribution. In this paper, we propose an innovative progressive feature refinement framework, along with domain adversarial learning to boost the transferability of segmentation networks. Specifically, we firstly align the multi-stage intermediate feature maps of source and target domain images, and then a domain classifier is adopted to discriminate the segmentation output. As a result, the segmentation models trained with source domain images can be transferred to a target domain without significant performance degradation. Experimental results verify the efficiency of our proposed method compared with state-of-the-art methods.