Chalmers University of Technology
Abstract:The upper mid-band (or FR3, spanning 6-24 GHz) is a crucial frequency range for next-generation mobile networks, offering a favorable balance between coverage and spectrum efficiency. In addition, systems operating in the near field, in both indoor and outdoor environments, can support line-of-sight multiple-input multiple-output (MIMO) communications and benefit from the FR3 bands. In this paper, a novel method is proposed to measure near-field parameters, leveraging a recently developed reflection model in which near-field paths are described by their image points. We show that these image points can be accurately estimated via triangulation from multiple measurements with a small number of antennas in each measurement, thus affording a low-cost procedure for near-field multipath parameter extraction. A preliminary experimental apparatus is presented, comprising two transmit and two receive antennas mounted on a linear track to measure the 2x2 MIMO channel at various displacements. The system uses a recently developed wideband radio frequency (RF) transceiver board with fast frequency switching, an FPGA for fast baseband processing, and a new parameter extraction method to recover paths and spherical characteristics from the multiple 2x2 measurements.
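The triangulation idea in this abstract can be illustrated with a minimal sketch. It is not the paper's extraction method: it assumes a 2D geometry, that a path length to the image point has already been estimated at each antenna position along the linear track, and that those positions are known. The function name `locate_image_point` and its parameters are hypothetical. Because all positions lie on one line, the side of the track on which the image point sits cannot be resolved, so the sketch assumes the non-negative side.

```python
import math


def locate_image_point(track_x, path_lengths):
    """Least-squares estimate of an image point (px, py) in 2D.

    Antennas sit at (x_i, 0) on a linear track and observe path lengths d_i.
    From d_i^2 = (x_i - px)^2 + py^2 it follows that
        d_i^2 - x_i^2 = -2 * x_i * px + s,   with s = px^2 + py^2,
    which is linear in the unknowns (px, s).
    Assumption: py >= 0 (a linear track cannot tell the two sides apart).
    """
    n = len(track_x)
    if n < 2 or n != len(path_lengths):
        raise ValueError("need at least two matching positions and path lengths")

    # Ordinary least squares for y = a * u + b, where a = px and b = s.
    u = [-2.0 * x for x in track_x]
    y = [d * d - x * x for x, d in zip(track_x, path_lengths)]
    sum_u = sum(u)
    sum_y = sum(y)
    sum_uu = sum(v * v for v in u)
    sum_uy = sum(v * w for v, w in zip(u, y))

    det = n * sum_uu - sum_u * sum_u
    if det == 0.0:
        raise ValueError("antenna positions must not all coincide")

    px = (n * sum_uy - sum_u * sum_y) / det
    s = (sum_y - px * sum_u) / n
    # Clamp at zero so measurement noise cannot produce a negative radicand.
    py = math.sqrt(max(s - px * px, 0.0))
    return px, py


# Synthetic check: noiseless path lengths generated from a known image point.
true_point = (1.5, 4.0)
positions = [0.0, 0.1, 0.2, 0.3, 0.4]
lengths = [math.hypot(x - true_point[0], true_point[1]) for x in positions]
estimate = locate_image_point(positions, lengths)
```

With noisy path lengths, a longer track or more displacements would be expected to improve conditioning, since the positions span only a small aperture relative to the distance to the image point.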
Abstract:This paper provides a multi-band channel measurement analysis in frequency range 3 (FR3). The study focuses on the lower FR3 frequencies of 6.5 GHz and 8.75 GHz, with a setup tailored to integrated sensing and communication (ISAC) in which data are collected with and without a target present. A method based on multiple signal classification (MUSIC) is used to refine the delays of the channel impulse response estimates. The results reveal that the channel at the lower frequency, 6.5 GHz, has additional distinguishable multipath components in the presence of the target, while the channel at the higher frequency, 8.75 GHz, experiences more blockage. The results reported in this paper serve as a benchmark for future multi-band studies in the FR3 spectrum.
Abstract:Embodied reference understanding is crucial for intelligent agents to predict referents based on human intention through gesture signals and language descriptions. This paper introduces Attention-Dynamic DINO, a novel framework designed to mitigate misinterpretations of pointing gestures across various interaction contexts. Our approach integrates visual and textual features to simultaneously predict the target object's bounding box and the attention source in pointing gestures. Leveraging the distance-aware nature of nonverbal communication in visual perspective taking, we extend the virtual touch line mechanism and propose an attention-dynamic touch line to represent referring gestures based on interactive distances. The combination of this distance-aware approach and independent prediction of the attention source enhances the alignment between objects and the gesture-represented line. Extensive experiments on the YouRefIt dataset demonstrate the efficacy of our gesture information understanding method in significantly improving task performance. Our model achieves 76.4% accuracy at the 0.25 IoU threshold and, notably, surpasses human performance at the 0.75 IoU threshold, marking a first in this domain. Comparative experiments with distance-unaware understanding methods from previous research further validate the superiority of the Attention-Dynamic Touch Line across diverse contexts.
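The accuracy figures in this abstract are reported at intersection-over-union (IoU) thresholds. The following minimal sketch shows how such a metric is commonly computed for axis-aligned boxes. The function names, the `(x_min, y_min, x_max, y_max)` box layout, and the use of `>=` at the threshold are assumptions made for illustration and are not taken from the paper or the YouRefIt evaluation code.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlap rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(predicted, ground_truth, threshold):
    """Fraction of predictions whose IoU with the paired ground truth meets the threshold."""
    if not ground_truth or len(predicted) != len(ground_truth):
        raise ValueError("need equally sized, non-empty box lists")
    hits = sum(1 for p, g in zip(predicted, ground_truth) if box_iou(p, g) >= threshold)
    return hits / len(ground_truth)


# Toy example: one exact match and one partial overlap (IoU = 1/7).
preds = [(0, 0, 2, 2), (0, 0, 2, 2)]
truth = [(0, 0, 2, 2), (1, 1, 3, 3)]
acc_025 = accuracy_at_iou(preds, truth, 0.25)
acc_075 = accuracy_at_iou(preds, truth, 0.75)
```

In this toy example the partial overlap fails both thresholds, so both accuracies come out at one half. The stricter 0.75 threshold only changes the result when a prediction has an IoU between 0.25 and 0.75.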
Abstract:Digital twinning is becoming increasingly vital in the design and real-time control of future wireless networks by providing precise, cost-effective simulations, predictive insights, and real-time data integration. This paper explores the application of digital twinning to optimizing wireless communication systems in urban environments, where building arrangements can critically impact network performance. We develop a digital twin platform to simulate and analyze how factors such as building positioning, base station placement, and antenna design influence wireless propagation. The ray-tracing software package of MATLAB is compared with Remcom Wireless InSite. Using a realistic radiation pattern of a base transceiver station (BTS) antenna, ray-tracing simulations of signal propagation and interactions in urban landscapes are then extensively examined. By analyzing radio heat maps alongside antenna patterns, we gain valuable insights into optimizing wireless deployment strategies. This study highlights the potential of digital twinning as a critical tool for urban planners and network engineers.
Abstract:The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. However, this combination often struggles with capturing semantic information effectively. Moreover, relying solely on point features within regions of interest can lead to information loss and limitations in local feature representation. To tackle these challenges, we propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN). PVAFN leverages an attention mechanism to improve multi-modal feature fusion during the feature extraction phase. In the refinement stage, it utilizes a multi-pooling strategy to integrate both multi-scale and region-specific information effectively. The point-voxel attention mechanism adaptively combines point cloud and voxel-based Bird's-Eye-View (BEV) features, resulting in richer object representations that help to reduce false detections. Additionally, a multi-pooling enhancement module is introduced to boost the model's perception capabilities. This module employs cluster pooling and pyramid pooling techniques to efficiently capture key geometric details and fine-grained shape structures, thereby enhancing the integration of local and global features. Extensive experiments on the KITTI and Waymo datasets demonstrate that the proposed PVAFN achieves competitive performance. The code and models will be available.
Abstract:In speaker tracking research, integrating complementary multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains challenging due to noisy observations caused by occlusion, acoustic noise, and sensor failures. In particular, when data are missing in multiple modalities, the performance of existing multi-modal fusion methods tends to degrade. To this end, we propose a Global-Local Distillation-based Tracker (GLDTracker) for robust audio-visual speaker tracking. GLDTracker is driven by a teacher-student distillation model, enabling the flexible fusion of incomplete information from each modality. The teacher network processes global signals captured by camera and microphone arrays, and the student network handles local information subject to visual occlusion and missing audio channels. By transferring knowledge from teacher to student, the student network can better adapt to complex dynamic scenes with incomplete observations. In the student network, a global feature reconstruction module based on a generative adversarial network is constructed to reconstruct global features from feature embeddings with missing local information. Furthermore, a multi-modal multi-level fusion attention is introduced to integrate the incomplete features and the reconstructed features, leveraging the complementarity and consistency of audio-visual and global-local features. Experimental results on the AV16.3 dataset demonstrate that the proposed GLDTracker outperforms existing state-of-the-art audio-visual trackers and achieves leading performance on both standard and incomplete-modality datasets, highlighting its superiority and robustness in complex conditions. The code and models will be available.
Abstract:The upper mid-band (FR3) has recently been attracting interest for the new generation of mobile networks, as it provides a promising balance between spectrum availability and coverage, which are inherent limitations of the sub-6 GHz and millimeter-wave bands, respectively. Channel modeling plays a key role in efficiently designing and optimizing the network, since FR3 systems are expected to operate at multiple frequency bands. Data-driven methods, especially generative adversarial networks (GANs), can capture the intricate relationships among data samples and provide an appropriate tool for FR3 channel modeling. In this work, we present the architecture, link state model, and path generative network of a GAN-based FR3 channel model. In comparison, the output of our model closely matches the ray-tracing simulated data.
Abstract:Future generations of mobile networks call for concurrent sensing and communication functionalities in the same hardware and/or spectrum. Compared to communication, sensing services often suffer from limited coverage, due to the high path loss of the reflected signal and the increased infrastructure requirements. To provide a more uniform quality of service, distributed multiple input multiple output (D-MIMO) systems deploy a large number of distributed nodes and efficiently control them, making distributed integrated sensing and communications (ISAC) possible. In this paper, we investigate ISAC in D-MIMO through the lens of different design architectures and deployments, revealing both conflicts and synergies. In addition, simulation and demonstration results reveal both opportunities and challenges towards the implementation of ISAC in D-MIMO.
Abstract:The study of cooperation within social dilemmas has long been a fundamental topic across various disciplines, including computer science and social science. Recent advancements in Artificial Intelligence (AI) have significantly reshaped this field, offering fresh insights into understanding and enhancing cooperation. This survey examines three key areas at the intersection of AI and cooperation in social dilemmas. First, focusing on multi-agent cooperation, we review the intrinsic and external motivations that support cooperation among rational agents, and the methods employed to develop effective strategies against diverse opponents. Second, looking into human-agent cooperation, we discuss the current AI algorithms for cooperating with humans and the human biases towards AI agents. Third, we review the emergent field of leveraging AI agents to enhance cooperation among humans. We conclude by discussing future research avenues, such as using large language models, establishing unified theoretical frameworks, revisiting existing theories of human cooperation, and exploring multiple real-world applications.
Abstract:Millimeter wave (mmWave) multiple-input multiple-output (MIMO) is now a reality with great potential for further improvement. We study full-duplex transmissions as an effective way to improve mmWave MIMO systems. Compared to half-duplex systems, full-duplex transmissions may offer higher data rates and lower latency. However, full-duplex transmission is hindered by self-interference (SI) at the receive antennas, and SI channel estimation becomes a crucial step in making full-duplex systems feasible. In this paper, we address the problem of channel estimation in full-duplex mmWave MIMO systems using neural networks (NNs). Our approach involves sharing pilot resources between user equipments (UEs) and transmit antennas at the base station (BS), aiming to reduce the pilot overhead in full-duplex systems to a level comparable to that of a half-duplex system. Additionally, in the case of separate antenna configurations in a full-duplex BS, providing channel estimates of transmit antenna (TX) arrays to the downlink UEs poses another challenge, as the TX arrays are not capable of receiving pilot signals. To address this, we employ an NN to map the channel from the downlink UEs to the receive antenna (RX) arrays to the channel from the TX arrays to the downlink UEs. We further elaborate on how NNs perform the estimation with different architectures (e.g., different numbers of hidden layers), with the introduction of non-linear distortion (e.g., with a 1-bit analog-to-digital converter (ADC)), and under different channel conditions (e.g., low-correlated and high-correlated channels). Our work provides novel insights into NN-based channel estimators.
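The non-linear distortion mentioned in this abstract, a 1-bit analog-to-digital converter, is often modeled as keeping only the sign of the in-phase and quadrature components of each received sample. The sketch below illustrates that common model and is not taken from the paper. The function name `one_bit_adc`, the unit-magnitude scaling, and the mapping of an exact zero to +1 are assumptions.

```python
import math


def one_bit_adc(samples):
    """Quantize complex baseband samples to one bit per I and Q component.

    Each output is (sign(Re z) + 1j * sign(Im z)) / sqrt(2), so every
    quantized sample has unit magnitude and only the quadrant survives.
    Assumption: a component that is exactly zero maps to +1.
    """
    scale = 1.0 / math.sqrt(2.0)

    def sign(value):
        return 1.0 if value >= 0.0 else -1.0

    return [complex(sign(z.real), sign(z.imag)) * scale for z in samples]


# Received pilots with very different amplitudes collapse to quadrant labels.
received = [0.3 - 2.0j, -0.01 + 0.02j, 5.0 + 5.0j]
quantized = one_bit_adc(received)
```

All amplitude information is discarded, so a linear estimator applied directly to such samples can lose accuracy. This is one reason a learned, non-linear estimator is a natural candidate in this setting.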