Abstract:This paper presents FLAF, a focal line and feature-constrained active view planning method for tracking failure avoidance in feature-based visual navigation of mobile robots. Our FLAF-based visual navigation is built upon a feature-based visual teach and repeat (VT\&R) framework, which supports many robotic applications by teaching a robot to navigate on various paths that cover a significant portion of daily autonomous navigation requirements. However, tracking failure in feature-based visual simultaneous localization and mapping (VSLAM) caused by textureless regions in human-made environments is still limiting VT\&R to be adopted in the real world. To address this problem, the proposed view planner is integrated into a feature-based visual SLAM system to build up an active VT\&R system that avoids tracking failure. In our system, a pan-tilt unit (PTU)-based active camera is mounted on the mobile robot. Using FLAF, the active camera-based VSLAM operates during the teaching phase to construct a complete path map and in the repeat phase to maintain stable localization. FLAF orients the robot toward more map points to avoid mapping failures during path learning and toward more feature-identifiable map points beneficial for localization while following the learned trajectory. Experiments in real scenarios demonstrate that FLAF outperforms the methods that do not consider feature-identifiability, and our active VT\&R system performs well in complex environments by effectively dealing with low-texture regions.
Abstract:End-to-end image transmission has recently become a crucial trend in intelligent wireless communications, driven by the increasing demand for high bandwidth efficiency. However, existing methods primarily optimize the trade-off between bandwidth cost and objective distortion, often failing to deliver visually pleasing results aligned with human perception. In this paper, we propose a novel rate-distortion-perception (RDP) jointly optimized joint source-channel coding (JSCC) framework to enhance perception quality in human communications. Our RDP-JSCC framework integrates a flexible plug-in conditional Generative Adversarial Networks (GANs) to provide detailed and realistic image reconstructions at the receiver, overcoming the limitations of traditional rate-distortion optimized solutions that typically produce blurry or poorly textured images. Based on this framework, we introduce a distortion-perception controllable transmission (DPCT) model, which addresses the variation in the perception-distortion trade-off. DPCT uses a lightweight spatial realism embedding module (SREM) to condition the generator on a realism map, enabling the customization of appearance realism for each image region at the receiver from a single transmission. Furthermore, for scenarios with scarce bandwidth, we propose an interest-oriented content-controllable transmission (CCT) model. CCT prioritizes the transmission of regions that attract user attention and generates other regions from an instance label map, ensuring both content consistency and appearance realism for all regions while proportionally reducing channel bandwidth costs. Comprehensive experiments demonstrate the superiority of our RDP-optimized image transmission framework over state-of-the-art engineered image transmission systems and advanced perceptual methods.
Abstract:Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by offering a unified data format, comprehensive sensory modalities, and a combination of real-world and simulated data. ARIO aims to improve the training of embodied AI agents, increasing their robustness and adaptability across various tasks and environments. Building upon the proposed new standard, we present a large-scale unified ARIO dataset, comprising approximately 3 million episodes collected from 258 series and 321,064 tasks. The ARIO standard and dataset represent a significant step towards bridging the gaps of existing data resources. By providing a cohesive framework for data collection and representation, ARIO paves the way for the development of more powerful and versatile embodied AI agents, capable of navigating and interacting with the physical world in increasingly complex and diverse ways. The project is available on https://imaei.github.io/project_pages/ario/
Abstract:Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To address this dilemma, we propose a general self-supervised machine learning framework that can handle diverse fundus diseases from unlabeled fundus images. Our method's AUC surpasses existing supervised approaches by 15.7%, and even exceeds performance of a single human expert. Furthermore, our model adapts well to various datasets from different regions, races, and heterogeneous image sources or qualities from multiple cameras or devices. Our method offers a label-free general framework to diagnose fundus diseases, which could potentially benefit telehealth programs for early screening of people at risk of vision loss.
Abstract:Machine learning-based fundus image diagnosis technologies trigger worldwide interest owing to their benefits such as reducing medical resource power and providing objective evaluation results. However, current methods are commonly based on supervised methods, bringing in a heavy workload to biomedical staff and hence suffering in expanding effective databases. To address this issue, in this article, we established a label-free method, name 'SSVT',which can automatically analyze un-labeled fundus images and generate high evaluation accuracy of 97.0% of four main eye diseases based on six public datasets and two datasets collected by Beijing Tongren Hospital. The promising results showcased the effectiveness of the proposed unsupervised learning method, and the strong application potential in biomedical resource shortage regions to improve global eye health.
Abstract:Intelligent task-oriented semantic communications (SemComs) have witnessed great progress with the development of deep learning (DL). In this paper, we propose a semantic-aware hybrid automatic repeat request (SemHARQ) framework for the robust and efficient transmissions of semantic features. First, to improve the robustness and effectiveness of semantic coding, a multi-task semantic encoder is proposed. Meanwhile, a feature importance ranking (FIR) method is investigated to ensure the important features delivery under limited channel resources. Then, to accurately detect the possible transmission errors, a novel feature distortion evaluation (FDE) network is designed to identify the distortion level of each feature, based on which an efficient HARQ method is proposed. Specifically, the corrupted features are retransmitted, where the remaining channel resources are used for incremental transmissions. The system performance is evaluated under different channel conditions in multi-task scenarios in Internet of Vehicles. Extensive experiments show that the proposed framework outperforms state-of-the-art works by more than 20% in rank-1 accuracy for vehicle re-identification, and 10% in vehicle color classification accuracy in the low signal-to-noise ratio regime.
Abstract:Semantic communications are expected to become the core new paradigms of the sixth generation (6G) wireless networks. Most existing works implicitly utilize channel information for codecs training, which leads to poor communications when channel type or statistical characteristics change. To tackle this issue posed by various channels, a novel channel-transferable semantic communications (CT-SemCom) framework is proposed, which adapts the codecs learned on one type of channel to other types of channels. Furthermore, integrating the proposed framework and the orthogonal frequency division multiplexing systems integrating non-orthogonal multiple access technologies, i.e., OFDM-NOMA systems, a power allocation problem to realize the transfer from additive white Gaussian noise (AWGN) channels to multi-subcarrier Rayleigh fading channels is formulated. We then design a semantics-similar dual transformation (SSDT) algorithm to derive analytical solutions with low complexity. Simulation results show that the proposed CT-SemCom framework with SSDT algorithm significantly outperforms the existing work w.r.t. channel transferability, e.g., the peak signal-to-noise ratio (PSNR) of image transmission improves by 4.2-7.3 dB under different variances of Rayleigh fading channels.
Abstract:Semantic communications, aiming at ensuring the successful delivery of the meaning of information, are expected to be one of the potential techniques for the next generation communications. However, the knowledge forming and synchronizing mechanism that enables semantic communication systems to extract and interpret the semantics of information according to the communication intents is still immature. In this paper, we propose a semantic image transmission framework with explicit semantic base (Seb), where Sebs are generated and employed as the knowledge shared between the transmitter and the receiver with flexible granularity. To represent images with Sebs, a novel Seb-based reference image generator is proposed to generate Sebs and then decompose the transmitted images. To further encode/decode the residual information for precise image reconstruction, a Seb-based image encoder/decoder is proposed. The key components of the proposed framework are optimized jointly by end-to-end (E2E) training, where the loss function is dedicated designed to tackle the problem of nondifferentiable operation in Seb-based reference image generator by introducing a gradient approximation mechanism. Extensive experiments show that the proposed framework outperforms state-of-art works by 0.5 - 1.5 dB in peak signal-to-noise ratio (PSNR) w.r.t. different signal-to-noise ratio (SNR).
Abstract:Semantic communications are expected to be an innovative solution to the emerging intelligent applications in the era of connected intelligence. In this paper, a novel scalable multitask semantic communication system with feature importance ranking (SMSC-FIR) is explored. Firstly, the multi-task correlations are investigated by a joint semantic encoder to extract relevant features. Then, a new scalable coding method is proposed based on feature importance ranking, which dynamically adjusts the coding rate and guarantees that important features for semantic tasks are transmitted with higher priority. Simulation results show that SMSC-FIR achieves performance gain w.r.t. individual intelligent tasks, especially in the low SNR regime.
Abstract:Internet of Vehicles (IoV) is expected to become the central infrastructure to provide advanced services to connected vehicles and users for higher transportation efficiency and security. A variety of emerging applications/services bring explosively growing demands for mobile data traffic between connected vehicles and roadside units (RSU), imposing the significant challenge of spectrum scarcity to IoV. In this paper, we propose a cooperative semantic-aware architecture to convey essential semantics from collaborated users to servers for lowering the data traffic. In contrast to current solutions that are mainly based on piling up highly complex signal processing techniques and multiple access capabilities in terms of syntactic communications, this paper puts forth the idea of semantic-aware content delivery in IoV. Specifically, the successful transmission of essential semantics of the source data is pursued, rather than the accurate reception of symbols regardless of its meaning as in conventional syntactic communications. To assess the benefits of the proposed architecture, we provide a case study of the image retrieval task for vehicles in intelligent transportation systems. Simulation results demonstrate that the proposed architecture outperforms the existing solutions with fewer radio resources, especially in a low signal-to-noise-ratio (SNR) regime, which can shed light on the potential of the proposed architecture in extending the applications in extreme environments.