Abstract: Driven by the development goals of the network paradigm and the demand for diverse functions in sixth-generation (6G) mission-critical Internet-of-Things (MC-IoT), this paper envisions a goal-oriented integration of sensing, communication, computing, and control (GIS3C). We first provide an overview of the tasks, requirements, and challenges of MC-IoT. We then introduce an end-to-end GIS3C architecture in which goal-oriented communication bridges and empowers the sensing, communication, control, and computing functionalities. By revealing the interplay among multiple subsystems in terms of key performance indicators and parameters, we introduce unified metrics, namely task completion effectiveness and cost, to facilitate S3C co-design in MC-IoT. Preliminary results demonstrate that GIS3C improves task completion effectiveness while reducing cost. Finally, we identify and highlight the gaps and challenges in applying GIS3C to future 6G networks.
Abstract: Over the past decades, dynamic resource programming in partial resource domains has been extensively investigated for single-time-slot optimization. However, the quality-of-service requirements of emerging real-time media applications in fifth-generation (5G) communications are often measured in the temporal dimension, which calls for multistage optimization with dynamic programming over the full resource domain. Taking the experience rate as a typical temporal multistage metric, we jointly optimize time-, frequency-, space-, and power-domain resources for multistage optimization. To strike a good tradeoff between system performance and computational complexity, we first exploit the coupling among the resources to transform the formulated mixed-integer nonlinear constraints into equivalent convex second-order cone constraints. Leveraging the concept of structural sparsity, we then express the max-min experience-rate objective as a weighted $\ell_1$-norm term associated with the precoding matrix. Finally, we propose a low-complexity iterative algorithm for full-resource-domain programming, aided by an additional simple conic optimization that provides a feasible initial point. Simulations verify that our design significantly outperforms the benchmarks while maintaining fast convergence, shedding light on full-domain dynamic resource programming for multistage optimization.
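To illustrate how a weighted $\ell_1$-norm can encode structural sparsity, the sketch below assumes, hypothetically, that each column $\mathbf{f}_j$ of the precoding matrix $\mathbf{F}$ corresponds to one time-frequency resource unit, so driving an entire column to zero leaves that unit unscheduled. The penalty sums the per-column $\ell_2$ norms with weights $\lambda_j$ (a group-sparsity penalty), and the weights are refreshed with the standard reweighted-$\ell_1$ heuristic $\lambda_j = 1/(\lVert \mathbf{f}_j \rVert_2 + \epsilon)$. The column-to-resource mapping, the function names, and `eps` are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np


def column_norms(F: np.ndarray) -> np.ndarray:
    """l2 norm of each column of a (possibly complex) precoder F (antennas x resource units)."""
    return np.linalg.norm(F, axis=0)


def weighted_group_l1(F: np.ndarray, weights: np.ndarray) -> float:
    """Weighted l1 norm of the column norms: sum_j lambda_j * ||f_j||_2."""
    return float(np.sum(weights * column_norms(F)))


def reweight(F: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Reweighted-l1 update: weak columns get large weights and are pushed toward zero."""
    return 1.0 / (column_norms(F) + eps)


if __name__ == "__main__":
    # Hypothetical 2-antenna precoder over 3 resource units; the last unit is idle.
    F = np.array([[3.0, 1j, 0.0],
                  [4.0, 0.0, 0.0]])
    print(column_norms(F))
    print(weighted_group_l1(F, np.ones(3)))
    print(reweight(F))
```

In an iterative scheme of this kind, each outer iteration would solve the convex problem with the current weights and then call `reweight`, so resource units carrying little power are progressively switched off.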
Abstract: Orthogonal time sequency multiplexing (OTSM) has recently been proposed as a single-carrier (SC) waveform that offers a bit error rate (BER) similar to that of multi-carrier orthogonal time frequency space (OTFS) modulation in doubly-spread channels under high mobility, but with much lower complexity, which makes OTSM a promising candidate for low-power millimeter-wave (mmWave) vehicular communications in 6G wireless networks. In this paper, we explore the performance of an OTSM-based homodyne transceiver under hardware impairments (HIs), including in-phase and quadrature imbalance (IQI), direct current offset (DCO), phase noise, power amplifier nonlinearity, carrier frequency offset, and synchronization timing offset. First, we obtain the discrete-time baseband signal model in vector form under these HIs. Then, we derive the system input-output relations in the time, delay-time, and delay-sequency (DS) domains, incorporating the HI parameters. Analytical studies demonstrate that, under HIs, the noise remains white Gaussian and the effective channel matrix is sparse in the DS domain. Moreover, DCO appears as a DC signal at the receiver that interferes only with the zero sequency over all delay taps in the DS domain, whereas IQI results in self-conjugated, fully overlapping sequency interference. Simulation results reveal that, without HI compensation (HIC), OTSM not only outperforms the plain SC waveform but also performs close to an uncompensated OTFS system; however, HIC is essential for OTSM systems operating in mmWave and higher frequency bands.
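The DCO result can be checked numerically in an idealized setting. The sketch below assumes an identity channel, no cyclic prefix, and no other impairments; it places QPSK symbols on an $M \times N$ delay-sequency grid, applies an inverse Walsh-Hadamard transform (WHT) along the sequency axis, and vectorizes the resulting delay-time grid column by column, which is a common description of OTSM. Adding a constant DC offset to every time-domain sample and demodulating shows that the offset lands only on sequency index 0, with the same value on every delay tap. The grid sizes, the natural (Sylvester) Hadamard ordering, the offset value, and the function names are assumptions; the zero-sequency basis function is the all-ones row in both natural and sequency ordering, so the conclusion does not depend on that choice.

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester (natural-order) Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)


def otsm_modulate(X: np.ndarray) -> np.ndarray:
    """M x N delay-sequency grid -> length-MN time-domain samples (ideal, no CP)."""
    W = hadamard(X.shape[1])
    X_dt = X @ W                        # inverse WHT along sequency (W is symmetric and self-inverse)
    return X_dt.reshape(-1, order="F")  # one block of M delay samples per time slot


def otsm_demodulate(r: np.ndarray, M: int, N: int) -> np.ndarray:
    """Length-MN time-domain samples -> M x N delay-sequency grid."""
    R_dt = r.reshape(M, N, order="F")
    return R_dt @ hadamard(N)           # forward WHT along the time axis


rng = np.random.default_rng(0)
M, N = 8, 4
qpsk = (rng.choice([-1.0, 1.0], size=(M, N))
        + 1j * rng.choice([-1.0, 1.0], size=(M, N))) / np.sqrt(2)
dco = 0.3 + 0.1j                        # hypothetical DC offset value
Y = otsm_demodulate(otsm_modulate(qpsk) + dco, M, N)
err = Y - qpsk
print(np.allclose(err[:, 1:], 0))                # nonzero sequencies untouched
print(np.allclose(err[:, 0], dco * np.sqrt(N)))  # offset appears on sequency 0 at every delay
```

The reason is that every non-constant Walsh function sums to zero over a frame, so a constant offset projects only onto the all-ones (zero-sequency) basis function.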
Abstract: Accurate and robust trajectory prediction of neighboring agents is critical for autonomous vehicles navigating complex scenes. Most methods proposed in recent years are deep learning-based because of their strength in encoding complex interactions. However, they often generate implausible predictions, since they rely heavily on past observations and cannot effectively capture transient and contingent interactions from sparse samples. In this paper, we propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction to cope with the challenge of predicting motions shaped by multi-scale interactions. In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogeneous graphs, on which Transformer-style graph neural networks (GNNs) encode heterogeneous interactions at the intermediate and global levels. In the RL stage, we divide the traffic scene into local sub-scenes using the key future points predicted in the DL stage. To emulate the motion planning procedure and thereby produce trajectory predictions, we devise a Transformer-based Proximal Policy Optimization (PPO) algorithm, incorporated with a vehicle kinematics model, to plan motions under the dominant influence of microscopic interactions. A multi-objective reward is designed to balance agent-centric accuracy and scene-wise compatibility. Experimental results show that our proposal matches the state of the art on the Argoverse forecasting benchmark. Visualized results further reveal that the hierarchical learning framework captures the multi-scale interactions and improves the feasibility and compliance of the predicted trajectories.
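A vehicle kinematics model lets a planning policy output controls instead of raw coordinates, so every predicted trajectory is kinematically feasible by construction. The sketch below uses a kinematic bicycle model (rear-axle reference point, forward-Euler integration) as a stand-in, since the abstract does not specify which model is used; the model choice, the 2.8 m wheelbase, the 0.1 s step, and the names `bicycle_step` and `rollout` are assumptions. A PPO policy would emit the `(accel, steer)` pairs; here they are fixed by hand.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleState:
    x: float        # m
    y: float        # m
    heading: float  # rad
    speed: float    # m/s


def bicycle_step(s: VehicleState, accel: float, steer: float,
                 dt: float = 0.1, wheelbase: float = 2.8) -> VehicleState:
    """One forward-Euler step of the kinematic bicycle model (rear-axle reference)."""
    return VehicleState(
        x=s.x + s.speed * math.cos(s.heading) * dt,
        y=s.y + s.speed * math.sin(s.heading) * dt,
        heading=s.heading + s.speed / wheelbase * math.tan(steer) * dt,
        speed=max(0.0, s.speed + accel * dt),  # no reversing in this sketch
    )


def rollout(s: VehicleState, actions, dt: float = 0.1, wheelbase: float = 2.8):
    """Integrate a sequence of (accel, steer) controls into (x, y) waypoints."""
    waypoints = []
    for accel, steer in actions:
        s = bicycle_step(s, accel, steer, dt, wheelbase)
        waypoints.append((s.x, s.y))
    return waypoints


start = VehicleState(x=0.0, y=0.0, heading=0.0, speed=10.0)
straight = rollout(start, [(0.0, 0.0)] * 30)   # 3 s at 10 Hz, no steering
left_turn = rollout(start, [(0.0, 0.1)] * 30)  # constant small left steering angle
```

Because the policy acts in control space, a reward term for accuracy can compare `waypoints` with ground truth while heading and speed remain physically consistent between steps.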
Abstract: In this paper, we investigate unmanned aerial vehicle (UAV)-assisted communication systems that require quasi-balanced uplink (UL) and downlink (DL) data rates while serving users with heterogeneous traffic. To the best of our knowledge, this is the first work to explicitly investigate joint UL-DL optimization for UAV-assisted systems under heterogeneous requirements. We propose a hybrid-mode multiple access (HMMA) scheme for heterogeneous traffic, in which non-orthogonal multiple access (NOMA) targets a high average data rate, while orthogonal multiple access (OMA) compensates users' rates to meet their instantaneous rate demands. HMMA provides more degrees of freedom in multiple access and achieves a higher minimum average rate among users than the UAV-assisted NOMA or OMA schemes. Under HMMA, we propose a joint UL-DL resource allocation algorithm with a closed-form optimal UL/DL power allocation that achieves quasi-balanced average UL and DL rates. Furthermore, considering error propagation in the successive interference cancellation (SIC) of NOMA, we propose an enhanced-HMMA scheme, which is highly robust to SIC errors and achieves a higher minimum average rate than the HMMA scheme.
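Why SIC errors matter can be seen from the textbook two-user downlink NOMA rate expressions with imperfect SIC. The sketch below assumes single-antenna links, a weak user with power $p_1$ and channel gain $g_1$, a strong user with power $p_2$ and gain $g_2 > g_1$, and a residual factor $\beta \in [0, 1]$ giving the fraction of the weak user's signal left after imperfect SIC ($\beta = 0$ is perfect SIC). An equal-time OMA split is included for comparison. This is a generic model, not the paper's HMMA or enhanced-HMMA scheme; all names and parameter values are illustrative.

```python
import math


def noma_downlink_rates(p1, p2, g1, g2, noise=1.0, beta=0.0):
    """Two-user downlink NOMA rates in bit/s/Hz.

    User 1 (weak) decodes its signal treating user 2's signal as noise.
    User 2 (strong) removes user 1's signal via SIC, leaving residual power beta * p1.
    """
    r1 = math.log2(1.0 + p1 * g1 / (p2 * g1 + noise))
    r2 = math.log2(1.0 + p2 * g2 / (beta * p1 * g2 + noise))
    return r1, r2


def oma_rates(p, g1, g2, noise=1.0):
    """Equal time sharing: each user is served alone with full power p for half the time."""
    return 0.5 * math.log2(1.0 + p * g1 / noise), 0.5 * math.log2(1.0 + p * g2 / noise)


for beta in (0.0, 0.05, 0.2):
    r1, r2 = noma_downlink_rates(p1=0.8, p2=0.2, g1=1.0, g2=10.0, beta=beta)
    print(f"beta={beta:.2f}: weak={r1:.3f}, strong={r2:.3f}, min={min(r1, r2):.3f}")
print("OMA:", [round(r, 3) for r in oma_rates(p=1.0, g1=1.0, g2=10.0)])
```

As `beta` grows, only the strong user's rate falls, because the residual interference enters its SINR directly; a scheme that is robust to SIC errors has to limit this loss.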
Abstract: In an episodic Markov decision process (MDP) problem, an online algorithm chooses from a set of actions in a sequence of $H$ trials, where $H$ is the episode length, in order to maximize the total payoff of the chosen actions. Q-learning, one of the most popular model-free reinforcement learning (RL) algorithms, directly parameterizes and updates value functions without explicitly modeling the environment. Recently, [Jin et al. 2018] studied the sample complexity of Q-learning with finite states and actions. Their algorithm achieves nearly optimal regret, which shows that Q-learning can be made sample efficient. However, MDPs with large discrete state and action spaces [Silver et al. 2016] or continuous spaces [Mnih et al. 2013] cannot be learned efficiently in this way. Hence, it is critical to develop new algorithms that resolve this dilemma with a provable guarantee on sample complexity. With this motivation, we propose a novel algorithm for a more general setting with infinitely many states and actions, assuming that the payoff function and transition kernel are Lipschitz continuous. We also provide a corresponding theoretical justification for our algorithm, which achieves a regret of $\tilde{\mathcal{O}}(K^{\frac{d+1}{d+2}}\sqrt{H^3})$, where $K$ denotes the number of episodes and $d$ denotes the dimension of the joint state-action space. To the best of our knowledge, this is the first analysis in the model-free setting whose established regret matches the lower bound up to a logarithmic factor.
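The exponent $\frac{d+1}{d+2}$ can be motivated by a standard back-of-the-envelope argument for Lipschitz problems. This is not the paper's proof, and it suppresses all $H$, Lipschitz-constant, and logarithmic factors. Assume the joint space is covered by cells of diameter $\varepsilon$, giving roughly $\varepsilon^{-d}$ cells; a tabular learner then pays roughly $\sqrt{\varepsilon^{-d} K}$ regret for estimation, while the Lipschitz assumption bounds the per-episode discretization error by roughly $\varepsilon$. Balancing the two terms gives:

```latex
\mathrm{Regret}(K) \lesssim \underbrace{\sqrt{\varepsilon^{-d} K}}_{\text{estimation}}
  + \underbrace{K \varepsilon}_{\text{discretization}},
\qquad
\sqrt{\varepsilon^{-d} K} = K \varepsilon
\;\Longrightarrow\; \varepsilon = K^{-\frac{1}{d+2}},
\qquad
\mathrm{Regret}(K) \lesssim K^{1 - \frac{1}{d+2}} = K^{\frac{d+1}{d+2}}.
```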
Abstract: We consider the Lipschitz bandit optimization problem with an emphasis on practical efficiency. Although there is a rich literature on the regret analysis of this type of problem, e.g., [Kleinberg et al. 2008, Bubeck et al. 2011, Slivkins 2014], the proposed algorithms suffer from serious practical problems, including prohibitive time complexity and dependence on oracle implementations. Motivated by this, we propose a novel algorithm, Tree UCB-Hoeffding, that combines Upper Confidence Bound (UCB) exploration with adaptive partitions. Our partitioning scheme is easy to implement and does not require any oracle. With a tree-based search strategy, the total computational cost is reduced to $\mathcal{O}(T\log T)$ for the first $T$ iterations. In addition, our algorithm matches the regret lower bound up to a logarithmic factor.
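The adaptive-partition idea can be sketched on the one-dimensional arm space $[0, 1]$. Each leaf interval keeps a Hoeffding-style UCB index (empirical mean, plus confidence radius $\sqrt{2\ln T / n}$, plus a Lipschitz term proportional to the interval width); the leaf with the largest index is played at its midpoint, and a leaf is split in half once its confidence radius drops below its width. This is a generic zooming-style sketch, not the authors' Tree UCB-Hoeffding: the splitting rule, the constants, the reward function, the names, and especially the linear scan over leaves (which costs time proportional to the number of leaves per round) are assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass(eq=False)  # leaves compare by identity, so list.remove drops the exact object
class Leaf:
    lo: float
    hi: float
    pulls: int = 0
    total: float = 0.0

    @property
    def width(self) -> float:
        return self.hi - self.lo

    def radius(self, horizon: int) -> float:
        """Hoeffding-style confidence radius; infinite before the first pull."""
        if self.pulls == 0:
            return math.inf
        return math.sqrt(2.0 * math.log(horizon) / self.pulls)

    def index(self, horizon: int, lipschitz: float) -> float:
        if self.pulls == 0:
            return math.inf
        mean = self.total / self.pulls
        # Inside the leaf, f can exceed its midpoint value by at most L * width / 2.
        return mean + self.radius(horizon) + lipschitz * self.width / 2.0


def adaptive_ucb(reward_fn, horizon: int, lipschitz: float = 1.0):
    leaves = [Leaf(0.0, 1.0)]
    history = []
    for _ in range(horizon):
        leaf = max(leaves, key=lambda lf: lf.index(horizon, lipschitz))  # linear scan
        x = (leaf.lo + leaf.hi) / 2.0
        reward = reward_fn(x)
        leaf.pulls += 1
        leaf.total += reward
        history.append((x, reward))
        if leaf.radius(horizon) <= leaf.width:
            # The estimate is precise relative to the cell size: refine the partition here.
            leaves.remove(leaf)
            leaves.extend([Leaf(leaf.lo, x), Leaf(x, leaf.hi)])
    return leaves, history


rng = random.Random(0)


def noisy_reward(x: float) -> float:
    """Hypothetical 1-Lipschitz objective peaked at x = 0.7, with Gaussian noise."""
    return 1.0 - abs(x - 0.7) + rng.gauss(0.0, 0.1)


leaves, history = adaptive_ucb(noisy_reward, horizon=2000)
```

Leaves near the maximizer are expected to be refined more than those elsewhere, which is the point of adaptive partitioning; a tree-based search, as the abstract describes, is what replaces the linear `max` scan used here.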