Abstract: The upcoming generation of wireless communication is anticipated to revolutionize the conventional functionalities of the network by adding sensing and localization capabilities, low-power communication, wireless brain-computer interaction, and massive connectivity for robotics and autonomous systems. Furthermore, the key performance indicators (KPIs) expected for the sixth generation (6G) of mobile communications promise challenging operating conditions, such as user data rates of 1 Tbps, end-to-end latency of less than 1 ms, and vehicle speeds of 1000 km/h. This evolution demands new techniques, not only to improve communications, but also to provide localization and sensing with an efficient use of the radio resources. The goal of INTERACT Working Group 2 is to design novel physical-layer technologies that can meet these KPIs by combining data-driven statistical learning with theoretical knowledge of the transmitted signal structure. Waveforms and coding, advanced multiple-input multiple-output (MIMO) schemes, and all the required signal processing, in the sub-6-GHz, millimeter-wave, and upper-mid bands, are considered while designing these new communication, positioning, and localization techniques. This White Paper summarizes our main approaches and contributions.
Abstract: Utilizing the mmWave band can potentially achieve the high data rate needed for realistic and seamless interaction within a virtual reality (VR) application. To this end, beamforming on both the access point (AP) and head-mounted display (HMD) sides is necessary. The main challenge in this use case is the specific and highly dynamic user movement, which causes beam misalignment, degrading the received signal level and potentially leading to outages. This study examines mmWave-based coordinated multi-point networks for VR applications, where two or more APs cooperatively transmit signals to an HMD for connectivity diversity. Instead of quasi-omnidirectional reception, we propose dual-beam reception based on analog beamforming at the HMD, enhancing the receive beamforming gain towards the serving APs while achieving diversity. Evaluation using actual HMD movement data demonstrates the effectiveness of our approach, showing a reduction in outage rates of up to 13% compared to quasi-omnidirectional reception with two serving APs, and a 17% decrease compared to steerable single-beam reception with one serving AP. Widening the separation angle between the two APs can further reduce outage rates due to head rotation, as rotations can still be tracked using the steerable multi-beam, albeit at the expense of reduced received signal levels during non-outage periods.
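To make the dual-beam idea concrete, below is a minimal sketch of phase-only (constant-modulus) analog weights that point two beams from the HMD toward two serving APs, assuming a 16-element half-wavelength uniform linear array and known AP directions; the array size and angles are illustrative, not the testbed's values.

```python
import numpy as np

def steering_vector(n_ant, angle_rad, spacing=0.5):
    """Unit-norm steering vector of a half-wavelength-spaced ULA."""
    k = np.arange(n_ant)
    return np.exp(2j * np.pi * spacing * k * np.sin(angle_rad)) / np.sqrt(n_ant)

def dual_beam_weights(n_ant, angle1, angle2):
    """Superpose two steering vectors, then keep only the phases so the
    weights remain realizable with analog phase shifters."""
    w = steering_vector(n_ant, angle1) + steering_vector(n_ant, angle2)
    return np.exp(1j * np.angle(w)) / np.sqrt(n_ant)

# Hypothetical geometry: APs at -30 and +40 degrees from HMD boresight.
n_ant = 16
a1, a2 = np.deg2rad(-30.0), np.deg2rad(40.0)
w = dual_beam_weights(n_ant, a1, a2)

for name, ang in (("AP1", a1), ("AP2", a2)):
    # Power gain relative to a single isotropic element (0 dB).
    gain = 20 * np.log10(np.sqrt(n_ant) * np.abs(w.conj() @ steering_vector(n_ant, ang)))
    print(f"{name}: {gain:.1f} dB")
```

Each beam retains most of the array gain (roughly 3 dB below a single dedicated beam), which is the diversity-versus-gain trade-off described above.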
Abstract: 3D scene understanding is a long-standing challenge in computer vision and a key component in enabling mixed reality, wearable computing, and embodied AI. Providing a solution to these applications requires a multifaceted approach that covers scene-centric, object-centric, as well as interaction-centric capabilities. While numerous datasets address the former two problems, the task of understanding interactable and articulated objects is underrepresented and only partly covered by current works. In this work, we address this shortcoming and introduce (1) an expertly curated dataset in the Universal Scene Description (USD) format, featuring high-quality manual annotations for instance segmentation and articulation on 280 indoor scenes; (2) a learning-based model together with a novel baseline capable of predicting part segmentation along with a full specification of motion attributes, including motion type, articulated and interactable parts, and motion parameters; (3) a benchmark serving to compare upcoming methods for the task at hand. Overall, our dataset provides 8 types of annotations: object and part segmentations, motion types, movable and interactable parts, motion parameters, connectivity, and object mass annotations. With its broad and high-quality annotations, the dataset provides the basis for holistic 3D scene understanding models. All data is provided in the USD format, allowing interoperability and easy integration with downstream tasks. We provide open access to our dataset, benchmark, and method's source code.
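As an illustration of how the USD packaging supports downstream use, the following is a minimal sketch of traversing a stage and reading per-object annotations with the OpenUSD (pxr) Python bindings; the file and attribute names are hypothetical stand-ins, since the dataset's actual schema is not specified in the abstract.

```python
from pxr import Usd  # OpenUSD Python bindings

# Hypothetical file name; the dataset's real scene files may differ.
stage = Usd.Stage.Open("scene_0001.usd")

for prim in stage.Traverse():
    # "motionType" is a hypothetical attribute name standing in for the
    # dataset's articulation annotations.
    attr = prim.GetAttribute("motionType")
    if attr and attr.HasValue():
        print(prim.GetPath(), prim.GetTypeName(), attr.Get())
```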
Abstract: In this paper, we introduce our millimeter-wave (mmWave) radio channel measurements for integrated sensing and communication (ISAC) scenarios with distributed links at dual bands in an indoor cavity; we also characterize the channel in the delay and azimuth-angular domains for scenarios where one person is present at varying locations and facing orientations. In our setting of distributed links with two transmitters and two receivers, where each transceiver operates at two bands, we can measure two links in which each transmitter faces one receiver and is thus capable of line-of-sight (LOS) communication; these two links have crossing Fresnel zones. The other two links are capable of capturing the reflectivity of the target present in the test area (as well as of the background). The numerical results in this paper focus on analyzing the channel in the presence of one person. It is evident that not only the human location, but also the human facing orientation, must be taken into account when modeling the ISAC channel.
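For readers unfamiliar with the two characterization domains, here is a minimal sketch of how a measured channel frequency response (CFR) can be mapped to a power delay profile and a Bartlett azimuth spectrum; the CFR below is a random stand-in, and the array and subcarrier counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sub, n_ant = 256, 8                      # subcarriers, receive elements (illustrative)
H = rng.standard_normal((n_ant, n_sub)) + 1j * rng.standard_normal((n_ant, n_sub))

# Delay domain: IFFT of the CFR per antenna, averaged -> power delay profile.
h_t = np.fft.ifft(H, axis=1)
pdp_db = 10 * np.log10(np.mean(np.abs(h_t) ** 2, axis=0))

# Azimuth domain: Bartlett (conventional) beamformer over a half-wavelength ULA.
angles = np.deg2rad(np.arange(-90, 91))
k = np.arange(n_ant)[:, None]
A = np.exp(2j * np.pi * 0.5 * k * np.sin(angles))     # steering matrix
R = H @ H.conj().T / n_sub                            # spatial covariance
p_az_db = 10 * np.log10(np.real(np.einsum("am,ab,bm->m", A.conj(), R, A)))
```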
Abstract: We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given the available modalities, the proposed method SceneGraphLoc learns a fixed-size embedding for each node (i.e., each object instance) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map embeddings. When images are leveraged, SceneGraphLoc achieves performance close to that of state-of-the-art techniques that depend on large image databases, while requiring three orders of magnitude less storage and operating orders of magnitude faster. The code will be made public.
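The matching stage can be pictured as nearest-neighbor search in a shared embedding space. Below is a minimal sketch using cosine similarity; the embedding dimension and object counts are made up, and the learned encoders that produce the embeddings are the method's actual contribution.

```python
import numpy as np

def cosine_match(query_embs, node_embs):
    """Return, for each query-object embedding, the index and score of the
    best-matching scene-graph node embedding."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    n = node_embs / np.linalg.norm(node_embs, axis=1, keepdims=True)
    sims = q @ n.T                        # (n_query_objects, n_nodes)
    return sims.argmax(axis=1), sims.max(axis=1)

# Hypothetical 256-dim embeddings: 5 objects in the query image, 40 map nodes.
rng = np.random.default_rng(1)
best_node, score = cosine_match(rng.standard_normal((5, 256)),
                                rng.standard_normal((40, 256)))
print(best_node, np.round(score, 2))
```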
Abstract: This paper presents a novel pipeline for vital sign monitoring using a 26 GHz multi-beam communication testbed. In the context of Joint Communication and Sensing (JCAS), the advanced communication capability at millimeter-wave bands offers radio resources comparable to those of radar and is promising for sensing the surrounding environment. Being able to communicate and, at the same time, sense the vital signs of humans present in the environment will enable new vertical telecommunication services, e.g., remote health monitoring. The proposed processing pipeline leverages spatially orthogonal beams to estimate the vital signs, namely the breathing rate and heart rate, of single and multiple persons in static scenarios from raw Channel State Information (CSI) samples. We consider both monostatic and bistatic sensing scenarios. For the monostatic scenario, we employ phase time-frequency calibration and the Discrete Wavelet Transform (DWT) to improve performance compared to conventional Fast Fourier Transform (FFT)-based methods. For the bistatic scenario, we use the K-means clustering algorithm to extract multi-person vital signs, exploiting the distinct frequency-domain signal features of the single- and multi-person scenarios. The results show that the estimated breathing and heart rates reach errors below 2 beats per minute (bpm) with respect to the reference captured by an on-body sensor, for the single-person monostatic sensing scenario with body-transceiver distances up to 2 m and the two-person bistatic sensing scenario with base-station-to-user-equipment (BS-UE) distances up to 4 m. The presented work does not optimize the OFDM waveform parameters for sensing; it demonstrates a promising JCAS proof-of-concept for contact-free vital sign monitoring using mmWave multi-beam communication systems.
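As a simplified illustration of the estimation step, the sketch below band-passes a synthetic slow-time CSI phase signal and reads the dominant breathing and heart rates from an FFT peak; this is the plain FFT baseline the pipeline improves upon, not the calibrated DWT or K-means method itself, and the sampling rate is an assumption.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 50.0                                    # assumed CSI sampling rate, Hz
t = np.arange(0, 60, 1 / fs)
# Synthetic chest motion in the CSI phase: 15 bpm breathing + 72 bpm heartbeat.
phase = np.sin(2 * np.pi * 0.25 * t) + 0.1 * np.sin(2 * np.pi * 1.2 * t)
phase += 0.05 * np.random.default_rng(2).standard_normal(t.size)

def dominant_rate_bpm(x, fs, f_lo, f_hi):
    """Band-pass the phase signal and return the dominant rate in beats/min."""
    b, a = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs)
    y = filtfilt(b, a, x)
    spec = np.abs(np.fft.rfft(y * np.hanning(y.size)))
    freqs = np.fft.rfftfreq(y.size, 1 / fs)
    return 60.0 * freqs[spec.argmax()]

print(f"breathing: {dominant_rate_bpm(phase, fs, 0.1, 0.5):.1f} bpm")
print(f"heart:     {dominant_rate_bpm(phase, fs, 0.8, 2.0):.1f} bpm")
```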
Abstract: We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating comprehensive, accurate, and efficient semantic 3D maps suitable for autonomous agents in unstructured environments. The proposed approach is based on the Voxel-TSDF representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantically and instance-consistent 3D regions. Further improvements are achieved by graph-optimization-based semantic labeling and instance refinement. The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics. We also highlight a pitfall in the evaluation of recent studies: using the ground-truth trajectory as input instead of a SLAM-estimated one substantially affects the accuracy, creating a large gap between the reported results and the actual performance on real-world data.
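The confidence-integration idea can be reduced to a per-voxel accumulator of per-class log-probabilities. The sketch below shows this minimal probabilistic fusion; it omits the TSDF geometry and the graph-optimization refinement, and all numbers are made up.

```python
import numpy as np

class SemanticVoxel:
    """Accumulates semantic prediction confidence for one voxel
    (log-probability fusion with a uniform prior)."""
    def __init__(self, n_classes):
        self.log_prob = np.zeros(n_classes)

    def integrate(self, class_probs, eps=1e-6):
        # Fuse one per-class confidence vector from the 2D segmentation net.
        self.log_prob += np.log(np.asarray(class_probs) + eps)

    def label(self):
        return int(self.log_prob.argmax())

v = SemanticVoxel(n_classes=3)
for obs in ([0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]):
    v.integrate(obs)
print(v.label())  # class 0 accumulates the highest confidence
```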
Abstract: Contact-free human posture and vital sign estimation enabled by radio technology is promising for health monitoring. Radio systems at millimeter-wave (mmWave) frequencies advantageously bring large bandwidth, multi-antenna arrays, and beam-steering capability. However, the human point cloud obtained by mmWave radar and used for posture estimation is likely to be sparse and incomplete. Additionally, random body movements deteriorate the estimation of breathing and heart rates; therefore, knowledge of the chest location and a narrow radar beam toward the chest are required for more accurate vital sign estimation. In this paper, we propose a pipeline aiming to enhance the vital sign estimation performance of mmWave frequency-modulated continuous-wave (FMCW) multiple-input multiple-output (MIMO) radar. The first step is to recognize body parts and posture, for which we exploit a trained Convolutional Neural Network (CNN) to efficiently process the imperfect human-form point cloud. The CNN outputs the key points of the different body parts and was trained using RGB image references and an Augmentative Ellipse Fitting Algorithm (AEFA). The next step is to use the chest information from the estimated posture for vital sign estimation. While the CNN is trained on frame-by-frame human point clouds for posture estimation, the vital signs are extracted through beamforming toward the human chest. The numerical results show that this spatial filtering improves the estimation of the vital signs by lowering the level of side harmonics and detecting the harmonics of the vital signs more reliably: the peak-to-average power ratio in the vital-sign harmonics is improved by up to 0.02 and 0.07 dB for the studied cases.
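To make the spatial-filtering step concrete, here is a minimal sketch of delay-and-sum receive beamforming toward the CNN-estimated chest direction, followed by the peak-to-average power ratio (PAPR) metric quoted above; the array size, angles, and slow-time signal are synthetic assumptions, not the paper's setup.

```python
import numpy as np

def chest_beamformer(n_ant, chest_angle_rad, spacing=0.5):
    """Delay-and-sum weights steering a narrow receive beam at the chest."""
    k = np.arange(n_ant)
    return np.exp(-2j * np.pi * spacing * k * np.sin(chest_angle_rad)) / n_ant

def papr_db(spectrum, peak_bin):
    """Peak-to-average power ratio (dB) of the harmonic at peak_bin."""
    p = np.abs(spectrum) ** 2
    return 10 * np.log10(p[peak_bin] / p.mean())

# Hypothetical setup: chest at +10 degrees, 8 virtual MIMO channels, 20 Hz slow time.
n_ant, fs = 8, 20.0
t = np.arange(2048) / fs
a = np.exp(2j * np.pi * 0.5 * np.arange(n_ant)[:, None] * np.sin(np.deg2rad(10)))
x = a * np.sin(2 * np.pi * 0.3 * t)                    # 18 bpm breathing component
x = x + 0.5 * np.random.default_rng(3).standard_normal((n_ant, t.size))

y = chest_beamformer(n_ant, np.deg2rad(10)) @ x        # beamformed slow-time signal
spec = np.fft.rfft(y * np.hanning(y.size))
freqs = np.fft.rfftfreq(y.size, 1 / fs)
peak = np.argmax(np.abs(spec[1:])) + 1                 # skip the DC bin
print(f"peak at {freqs[peak] * 60:.1f} bpm, PAPR {papr_db(spec, peak):.1f} dB")
```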
Abstract: Joint Communication and Sensing (JCAS) is envisioned for 6G cellular networks, where sensing the operating environment, especially in the presence of humans, is as important as high-speed wireless connectivity. Sensing, and subsequently recognizing blockage types, is an initial step towards signal blockage avoidance. In this context, we investigate the feasibility of using human motion recognition as a surrogate task for blockage type recognition through a set of hypothesis-validation experiments using both qualitative and quantitative analysis (visual inspection and hyperparameter tuning of deep learning (DL) models, respectively). A surrogate task is useful for DL model testing and/or pre-training, thereby requiring only a small amount of data to be collected from the eventual JCAS environment. Therefore, we collect and use a small dataset from a 26 GHz cellular multi-user communication device with hybrid beamforming. The data is converted into the Doppler Frequency Spectrum (DFS) representation and used for the hypothesis validations. Our research shows that (i) the presence of domain shift between the data used for learning and inference requires DL models that can successfully handle it, (ii) DFS input-data dilution to increase dataset volume should be avoided, (iii) a small volume of input data is not enough for reasonable inference performance, (iv) higher sensing resolution, which causes lower sensitivity, should be handled by performing more activities/gestures per frame and lowering the sampling rate, and (v) a higher sampling rate reported to the Short-Time Fourier Transform (STFT) during pre-processing may increase performance, but should always be tested on a per-learning-task basis.
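For context, the DFS pre-processing step can be sketched as a short-time Fourier transform over the slow-time CSI stream; the snippet below uses an illustrative sampling rate and window length, with a synthetic single-Doppler-component CSI signal standing in for the testbed data.

```python
import numpy as np
from scipy.signal import stft

fs = 200.0                                   # assumed CSI snapshot rate, Hz
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(4)
# Synthetic CSI: one moving-reflector Doppler line at +12 Hz plus noise.
csi = np.exp(2j * np.pi * 12 * t) + 0.3 * (rng.standard_normal(t.size)
                                           + 1j * rng.standard_normal(t.size))

# Two-sided STFT (the CSI is complex, so negative Doppler matters).
f, tt, Z = stft(csi, fs=fs, nperseg=128, noverlap=96, return_onesided=False)
dfs = np.fft.fftshift(20 * np.log10(np.abs(Z) + 1e-9), axes=0)  # time-Doppler image
doppler = np.fft.fftshift(f)
print(dfs.shape, f"Doppler axis: {doppler[0]:.0f} to {doppler[-1]:.0f} Hz")
```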
Abstract: Multibeam analog arrays have been proposed for millimeter-wave joint communication and sensing (JCAS). We study multibeam planar arrays for JCAS, providing time-division-duplex communication and full-duplex sensing with steerable beams. In order to obtain a large aperture with a narrow beamwidth in the radiation pattern, we propose to design a sparse tiled planar array (STPA) aperture with an affordable number of phase shifters. The modular tiling and sparse design of the array are non-convex optimization problems; however, we exploit the fact that the more irregular the antenna array geometry, the lower the side-lobe level. We propose to first solve the tiling optimization by maximizing the entropy of the phase centers of the tiles in the array, and then to perform sparse subarray selection leveraging the geometry of the sunflower array. While maintaining the same spectral efficiency in the communication link as a conventional uniform planar array (CUPA), the STPA improves angle-of-arrival estimation when the line-of-sight path is dominant; e.g., the STPA with 125 elements distinguishes two adjacent targets separated by 20$^\circ$ in the proximity of boresight, whereas the CUPA cannot. Moreover, the STPA has a 40$\%$ shorter blockage time compared to the CUPA when a blocker moves across the elevation angles.
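The sunflower geometry that seeds the sparse selection can be sketched directly: golden-angle placement maximizes irregularity, which is what keeps the side lobes low. The snippet below generates a 125-element sunflower layout and crudely estimates the peak side-lobe level of its boresight pattern; the element spacing scale and the side-lobe masking threshold are illustrative choices, not the optimized STPA design.

```python
import numpy as np

def sunflower_positions(n_elem, scale=0.5):
    """Fermat-spiral ('sunflower') element positions, in wavelengths."""
    golden = np.pi * (3 - np.sqrt(5))        # golden angle, ~137.5 degrees
    k = np.arange(n_elem)
    r = scale * np.sqrt(k)
    return np.stack([r * np.cos(k * golden), r * np.sin(k * golden)], axis=1)

def array_factor_db(pos, theta, phi=0.0):
    """Normalized array factor of an isotropic-element planar array steered to boresight."""
    u, v = np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi)
    af = np.exp(2j * np.pi * (pos[:, 0] * u + pos[:, 1] * v)).sum()
    return 20 * np.log10(np.abs(af) / pos.shape[0] + 1e-12)

pos = sunflower_positions(125)               # 125 elements, as in the abstract
thetas = np.deg2rad(np.linspace(-90, 90, 1441))
pattern = np.array([array_factor_db(pos, th) for th in thetas])
side = np.abs(np.rad2deg(thetas)) > 5        # crude main-lobe exclusion
print(f"peak side-lobe level: {pattern[side].max():.1f} dB")
```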