Abstract:The management of mixed traffic that consists of robot vehicles (RVs) and human-driven vehicles (HVs) at complex intersections presents a multifaceted challenge. Traditional signal controls often struggle to adapt to dynamic traffic conditions and heterogeneous vehicle types. Recent advancements have turned to strategies based on reinforcement learning (RL), leveraging its model-free nature, real-time operation, and generalizability over different scenarios. We introduce a hierarchical RL framework to manage mixed traffic through precise longitudinal and lateral control of RVs. Our proposed hierarchical framework incorporates the state-of-the-art mixed traffic control algorithm as a high-level decision maker to improve the performance and robustness of the whole system. Our experiments demonstrate that the framework can reduce the average waiting time by up to 54% compared to the state-of-the-art mixed traffic control method. When the RV penetration rate exceeds 60%, our technique consistently outperforms conventional traffic signal control programs in terms of the average waiting time for all vehicles at the intersection.
Abstract:Constructing a 3D scene capable of accommodating open-ended language queries is a pivotal pursuit, particularly within the domain of robotics. Such technology helps robots execute object manipulations based on human language directives. To tackle this challenge, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (e.g., NeRF) are limited by the need to process a large number of input views for reconstruction and by their inherently inefficient inference. Thus, we present GaussianGrasper, which utilizes 3D Gaussian Splatting to explicitly represent the scene as a collection of Gaussian primitives. Our approach takes a limited set of RGB-D views and employs a tile-based splatting technique to create a feature field. In particular, we propose an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently and accurately distill language embeddings derived from foundational models. With the reconstructed geometry of the Gaussian field, our method enables the pre-trained grasping model to generate collision-free grasp pose candidates. Furthermore, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately query and grasp objects with language instructions, providing a new solution for language-guided manipulation tasks. Data and code are available at https://github.com/MrSecant/GaussianGrasper.
Abstract:Greenhouse gas emissions have risen dramatically since the early 1900s, with U.S. transportation generating 28% of U.S. emissions. As such, there is interest in reducing transportation-related emissions. Specifically, sustainability research has sprouted around signalized intersections, as intersections allow different streams of traffic to cross and change directions. Recent research has developed mixed traffic control eco-driving strategies at signalized intersections to decrease emissions. However, the inherent structure of a signalized intersection generates increased emissions by creating frequent acceleration/deceleration events, excessive idling from traffic congestion, and stop-and-go waves. Thus, we believe unsignalized intersections hold potential for further sustainability improvements. In this work, we provide an emissions analysis on unsignalized intersections with complex, real-world topologies and traffic demands where mixed traffic control strategies are employed by robot vehicles (RVs) to reduce waiting times and congestion. We find that, with an RV penetration rate of at least 10%, RVs reduce fuel consumption and NOx emissions relative to signalized intersections by up to 27% and 28%, respectively. With at least 30% RVs, CO and HC emissions are reduced by up to 42% and 43%, respectively. Additionally, RVs can reduce emissions across the whole network despite only employing their strategies at the intersections.
Abstract:Controlling and coordinating urban traffic flow through robot vehicles is emerging as a novel transportation paradigm for the future. While this approach garners growing attention from researchers and practitioners, effectively managing and coordinating large-scale mixed traffic remains a challenge. We introduce an effective framework for large-scale mixed traffic control via privacy-preserving crowdsourcing and dynamic vehicle routing. Our framework consists of three modules: a privacy-protecting crowdsensing method, a graph propagation-based traffic forecasting method, and a privacy-preserving route selection mechanism. We evaluate our framework using a real-world road network. The results show that our framework accurately forecasts traffic flow, efficiently mitigates the network-wide RV shortage issue, and coordinates large-scale mixed traffic. Compared to other baseline methods, our framework not only reduces the RV shortage issue by up to 69.4% but also reduces the average waiting time of all vehicles in the network by up to 27%.
Abstract:Current successful methods of 3D scene perception rely on large-scale annotated point clouds, which are tedious and expensive to acquire. In this paper, we propose Model2Scene, a novel paradigm that learns free 3D scene representations from Computer-Aided Design (CAD) models and language. The main challenges are the domain gaps between the CAD models and the real scene's objects, including model-to-scene (from a single model to the scene) and synthetic-to-real (from a synthetic model to a real scene's object). To handle these challenges, Model2Scene first simulates a crowded scene by mixing data-augmented CAD models. Next, we propose a novel feature regularization operation, termed Deep Convex-hull Regularization (DCR), to project point features into a unified convex hull space, reducing the domain gap. Finally, we impose a contrastive loss on the language embeddings and the point features of CAD models to pre-train the 3D network. Extensive experiments verify that the learned 3D scene representation is beneficial for various downstream tasks, including label-free 3D object salient detection, label-efficient 3D scene perception and zero-shot 3D semantic segmentation. Notably, Model2Scene yields impressive label-free 3D object salient detection with an average mAP of 46.08% and 55.49% on the ScanNet and S3DIS datasets, respectively. The code will be publicly available.
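The contrastive step named in the Model2Scene abstract can be illustrated with a generic point-to-text objective. The following is a minimal sketch, not the authors' loss: it assumes each point carries a class label inherited from its CAD model, that there is one language embedding per class name, and that a softmax cross-entropy over cosine similarities is used. The function name `contrastive_loss` and the temperature value are hypothetical.

```python
import numpy as np

def contrastive_loss(point_feats, text_embeds, labels, temperature=0.07):
    """Cross-entropy over cosine similarities between points and class text embeddings.

    point_feats: (N, D) point features (hypothetical output of a 3D network)
    text_embeds: (C, D) one language embedding per class name (assumed)
    labels:      (N,) integer class index of each point
    """
    # L2-normalize so that the dot product is a cosine similarity.
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    logits = p @ t.T / temperature                 # shape (N, C)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Pull each point toward its own class embedding, push it from the others.
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Under these assumptions the loss is small when each point feature aligns with the embedding of its own class and large when it aligns with a different class.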
Abstract:Intersections are essential road infrastructures for traffic in modern metropolises; however, they can also be the bottleneck of traffic flows due to traffic incidents or the absence of traffic coordination mechanisms such as traffic lights. Thus, various control and coordination mechanisms that go beyond traditional control methods have been proposed to improve the efficiency of intersection traffic. Amongst these methods, the control of foreseeable hybrid traffic that consists of human-driven vehicles (HVs) and robot vehicles (RVs) has recently emerged. We propose a decentralized reinforcement learning approach for the control and coordination of hybrid traffic at real-world, complex intersections, a topic that has not been previously explored. Comprehensive experiments are conducted to show the effectiveness of our approach. In particular, we show that using 5% RVs, we can prevent congestion formation inside the intersection under the actual traffic demand of 700 vehicles per hour. In contrast, without RVs, congestion starts to develop when the traffic demand reaches as low as 200 vehicles per hour. Further performance gains (reduced waiting time of vehicles at the intersection) are obtained as the RV penetration rate increases. When RVs make up more than 50% of traffic, our method starts to outperform traffic signals on the average waiting time of all vehicles at the intersection. Our method is also robust against both blackout events and sudden RV percentage drops, and enjoys excellent generalizability, which is illustrated by its successful deployment in two unseen intersections.
Abstract:Semi-supervised Anomaly Detection (AD) is a data mining task that aims at learning features from partially-labeled datasets to help detect outliers. In this paper, we classify existing semi-supervised AD methods into two categories: unsupervised-based and supervised-based, and point out that most of them suffer from insufficient exploitation of labeled data and under-exploration of unlabeled data. To tackle these problems, we propose Deep Anomaly Detection and Search (DADS), which applies Reinforcement Learning (RL) to balance exploitation and exploration. During the training process, the agent searches for possible anomalies with hierarchically-structured datasets and uses the anomalies it finds to enhance performance, which in essence draws lessons from the idea of ensemble learning. Experimentally, we compare DADS with several state-of-the-art methods in the settings of leveraging labeled known anomalies to detect both other known anomalies and unknown anomalies. Results show that DADS can efficiently and precisely search for anomalies in unlabeled data and learn from them, thus achieving good performance.
Abstract:Timing recovery is critical for synchronizing the clocks at the transmitting and receiving ends of a digital coherent communication system. The core of timing recovery is to reliably determine the current sampling error of the local digitizer so that the timing circuit may lock to a stable operation point. Conventional timing phase detectors need to adapt to the optical fiber channel, so the common effects of this channel, such as chromatic dispersion (CD) and polarization mode dispersion (PMD), on the timing phase extraction must be understood. Here we exploit the cyclostationarity of the optical signal and derive a model for studying the CD and PMD effects. We prove that the CD-adjusted cyclic correlation matrix contains full information about timing and PMD, and that the determinant of the matrix is a timing phase detector immune to both CD and PMD. We also obtain other results, such as a completely PMD-independent CD estimator. Our analysis is supported by both simulations and experiments over a field-implemented optical cable.
Abstract:Dynamic polarization control (DPC) is beneficial for many optical applications. It uses adjustable waveplates to perform automatic polarization tracking and manipulation. Efficient algorithms are essential to realizing an endless polarization control process at high speed. However, the standard gradient-based algorithm is not well analyzed. Here we model the DPC with a Jacobian-based control theory framework that has much in common with robot kinematics. We then give a detailed analysis of the condition of the Stokes vector gradient as a Jacobian matrix. We identify the multi-stage DPC as a redundant system enabling control algorithms with null-space operations. An efficient, reset-free algorithm can be found. We anticipate more customized DPC algorithms to follow the same framework in various optical systems.
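The null-space idea mentioned in the DPC abstract can be illustrated with the standard redundancy-resolution formula from robot kinematics, dq = J⁺ dx + (I − J⁺J) z. The following is a minimal sketch of that generic formula, not the algorithm of the paper: the function name `redundant_step`, the interpretation of `q` as waveplate settings, and the choice of secondary motion `z` are assumptions made for illustration.

```python
import numpy as np

def redundant_step(J, dx, z):
    """One first-order control update for a redundant system.

    J:  (m, n) Jacobian with n > m (more actuators than task dimensions)
    dx: (m,) desired change of the task variable (assumed: the output state)
    z:  (n,) secondary motion, e.g. steering actuators away from their limits
    """
    J_pinv = np.linalg.pinv(J)
    # Projector onto the null space of J: motion in this subspace
    # leaves the task variable unchanged to first order.
    N = np.eye(J.shape[1]) - J_pinv @ J
    return J_pinv @ dx + N @ z
```

Because `J @ N` vanishes, the secondary motion `z` can be chosen freely without disturbing the primary task, which is the property that makes a reset-free scheme plausible in a redundant system.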
Abstract:Interpretability has become an essential topic for artificial intelligence in high-risk domains such as healthcare, banking and security. For commonly-used tabular data, traditional methods train end-to-end machine learning models with numerical and categorical data only, and do not leverage human-understandable knowledge such as data descriptions. Yet mining human-level knowledge from tabular data and using it for prediction remains a challenge. Therefore, we propose a concept and argumentation based model (CAM) that includes the following two components: a novel concept mining method to obtain human-understandable concepts and their relations from both descriptions of features and the underlying data, and a quantitative argumentation-based method for knowledge representation and reasoning. As a result, CAM provides decisions that are based on human-level knowledge, and the reasoning process is intrinsically interpretable. Finally, to visualize the proposed interpretable model, we provide a dialogical explanation that contains the dominant reasoning path within CAM. Experimental results on both an open source benchmark dataset and a real-world business dataset show that (1) CAM is transparent and interpretable, and the knowledge inside the CAM is coherent with human understanding; (2) our interpretable approach can reach competitive results compared with other state-of-the-art models.