Abstract:Intelligent vehicular communication with vehicle road collaboration capability is a key technology enabled by 6G, and the integration of various visual sensors on vehicles and infrastructures plays a crucial role. Moreover, accurate channel prediction is foundational to realizing intelligent vehicular communication. Traditional methods are still limited by the inability to balance accuracy and operability based on substantial spectrum resource consumption and highly refined description of environment. Therefore, leveraging out-of-band information introduced by visual sensors provides a new solution and is increasingly applied across various communication tasks. In this paper, we propose a computer vision (CV)-based prediction model for vehicular communications, realizing accurate channel characterization prediction including path loss, Rice K-factor and delay spread based on image segmentation. First, we conduct extensive vehicle-to-infrastructure measurement campaigns, collecting channel and visual data from various street intersection scenarios. The image-channel dataset is generated after a series of data post-processing steps. Image data consists of individual segmentation of target user using YOLOv8 network. Subsequently, established dataset is used to train and test prediction network ResNet-32, where segmented images serve as input of network, and various channel characteristics are treated as labels or target outputs of network. Finally, self-validation and cross-validation experiments are performed. The results indicate that models trained with segmented images achieve high prediction accuracy and remarkable generalization performance across different streets and target users. The model proposed in this paper offers novel solutions for achieving intelligent channel prediction in vehicular communications.
Abstract:In the realm of online advertising, optimizing conversions is crucial for delivering relevant products to users and enhancing business outcomes. Predicting conversion events is challenging due to variable delays between user interactions, such as impressions or clicks, and the actual conversions. These delays differ significantly across various advertisers and products, necessitating distinct optimization time windows for targeted conversions. To address this, we introduce a novel approach named the \textit{Personalized Interpolation} method, which innovatively builds upon existing fixed conversion window models to estimate flexible conversion windows. This method allows for the accurate estimation of conversions across a variety of delay ranges, thus meeting the diverse needs of advertisers without increasing system complexity. To validate the efficacy of our proposed method, we conducted comprehensive experiments using ads conversion model. Our experiments demonstrate that this method not only achieves high prediction accuracy but also does so more efficiently than other existing solutions. This validation underscores the potential of our Personalized Interpolation method to significantly enhance conversion optimization in real-world online advertising systems, promising improved targeting and effectiveness in advertising strategies.
Abstract:Gaussian Splatting (GS) and Neural Radiance Fields (NeRF) are two groundbreaking technologies that have revolutionized the field of Novel View Synthesis (NVS), enabling immersive photorealistic rendering and user experiences by synthesizing multiple viewpoints from a set of images of sparse views. The potential applications of NVS, such as high-quality virtual and augmented reality, detailed 3D modeling, and realistic medical organ imaging, underscore the importance of quality assessment of NVS methods from the perspective of human perception. Although some previous studies have explored subjective quality assessments for NVS technology, they still face several challenges, especially in NVS methods selection, scenario coverage, and evaluation methodology. To address these challenges, we conducted two subjective experiments for the quality assessment of NVS technologies containing both GS-based and NeRF-based methods, focusing on dynamic and real-world scenes. This study covers 360{\deg}, front-facing, and single-viewpoint videos while providing a richer and greater number of real scenes. Meanwhile, it's the first time to explore the impact of NVS methods in dynamic scenes with moving objects. The two types of subjective experiments help to fully comprehend the influences of different viewing paths from a human perception perspective and pave the way for future development of full-reference and no-reference quality metrics. In addition, we established a comprehensive benchmark of various state-of-the-art objective metrics on the proposed database, highlighting that existing methods still struggle to accurately capture subjective quality. The results give us some insights into the limitations of existing NVS methods and may promote the development of new NVS methods.
Abstract:Transformer models encounter challenges in scaling hidden dimensions efficiently, as uniformly increasing them inflates computational and memory costs while failing to emphasize the most relevant features for each token. For further understanding, we study hidden dimension sparsity and observe that trained Transformers utilize only a small fraction of token dimensions, revealing an "activation flow" pattern. Notably, there are shared sub-dimensions with sustained activation across multiple consecutive tokens and specialized sub-dimensions uniquely activated for each token. To better model token-relevant sub-dimensions, we propose MoHD (Mixture of Hidden Dimensions), a sparse conditional activation architecture. Particularly, MoHD employs shared sub-dimensions for common token features and a routing mechanism to dynamically activate specialized sub-dimensions. To mitigate potential information loss from sparsity, we design activation scaling and group fusion mechanisms to preserve activation flow. In this way, MoHD expands hidden dimensions with negligible increases in computation or parameters, efficient training and inference while maintaining performance. Evaluations across 10 NLP tasks show that MoHD surpasses Vanilla Transformers in parameter efficiency and task performance. It achieves 1.7% higher performance with 50% fewer activation parameters and 3.7% higher performance with a 3x parameter expansion at constant activation cost. MOHD offers a new perspective for scaling the model, showcasing the potential of hidden dimension sparsity to boost efficiency
Abstract:Although Large Language Models (LLMs) have demonstrated remarkable capabilities, their massive parameter counts and associated extensive computing make LLMs' deployment the main part of carbon emission from nowadays AI applications. Compared to modern GPUs like H$100$, it would be significantly carbon-sustainable if we could leverage old-fashioned GPUs such as M$40$ (as shown in Figure 1, M$40$ only has one third carbon emission of H$100$'s) for LLM servings. However, the limited High Bandwidth Memory (HBM) available on such GPU often cannot support the loading of LLMs due to the gigantic model size and intermediate activation data, making their serving challenging. For instance, a LLaMA2 model with $70$B parameters typically requires $128$GB for inference, which substantially surpasses $24$GB HBM in a $3090$ GPU and remains infeasible even considering the additional $64$GB DRAM. To address this challenge, this paper proposes a mixed-precision with a model modularization algorithm to enable LLM inference on outdated hardware with resource constraints. (The precision denotes the numerical precision like FP16, INT8, INT4) and multi-level caching (M2Cache).) Specifically, our M2Cache first modulizes neurons in LLM and creates their importance ranking. Then, it adopts a dynamic sparse mixed-precision quantization mechanism in weight space to reduce computational demands and communication overhead at each decoding step. It collectively lowers the operational carbon emissions associated with LLM inference. Moreover, M2Cache introduces a three-level cache management system with HBM, DRAM, and SSDs that complements the dynamic sparse mixed-precision inference. To enhance communication efficiency, M2Cache maintains a neuron-level mixed-precision LRU cache in HBM, a larger layer-aware cache in DRAM, and a full model in SSD.
Abstract:Recently, deep learning enabled semantic communications have been developed to understand transmission content from semantic level, which realize effective and accurate information transfer. Aiming to the vision of sixth generation (6G) networks, wireless devices are expected to have native perception and intelligent capabilities, which associate wireless channel with surrounding environments from physical propagation dimension to semantic information dimension. Inspired by these, we aim to provide a new paradigm on wireless channel from semantic level. A channel semantic model and its characterization framework are proposed in this paper. Specifically, a channel semantic model composes of status semantics, behavior semantics and event semantics. Based on actual channel measurement at 28 GHz, as well as multi-mode data, example results of channel semantic characterization are provided and analyzed, which exhibits reasonable and interpretable semantic information.
Abstract:Integrated sensing and communications (ISAC) is a potential technology of 6G, aiming to enable end-to-end information processing ability and native perception capability for future communication systems. As an important part of the ISAC application scenarios, ISAC aided vehicle-to-everything (V2X) can improve the traffic efficiency and safety through intercommunication and synchronous perception. It is necessary to carry out measurement, characterization, and modeling for vehicular ISAC channels as the basic theoretical support for system design. In this paper, dynamic vehicular ISAC channel measurements at 28 GHz are carried out and provide data for the characterization of non-stationarity characteristics. Based on the actual measurements, this paper analyzes the time-varying PDPs, RMSDS and non-stationarity characteristics of front, lower front, left and right perception directions in a complicated V2X scenarios. The research in this paper can enrich the investigation of vehicular ISAC channels and enable the analysis and design of vehicular ISAC systems.
Abstract:Integrated sensing and communication (ISAC) is a promising technology for 6G, with the goal of providing end-to-end information processing and inherent perception capabilities for future communication systems. Within ISAC emerging application scenarios, vehicular ISAC technologies have the potential to enhance traffic efficiency and safety through integration of communication and synchronized perception abilities. To establish a foundational theoretical support for vehicular ISAC system design and standardization, it is necessary to conduct channel measurements, and modeling to obtain a deep understanding of the radio propagation. In this paper, a dynamic statistical channel model is proposed for vehicular ISAC scenarios, incorporating Sensing Multipath Components (S-MPCs) and Clutter Multipath Components (C-MPCs), which are identified by the proposed tracking algorithm. Based on actual vehicular ISAC channel measurements at 28 GHz, time-varying sensing characteristics in front, left, and right directions are investigated. To model the dynamic evolution process of channel, number of new S-MPCs, lifetimes, initial power and delay positions, dynamic variations within their lifetimes, clustering, power decay, and fading of C-MPCs are statistically characterized. Finally, the paper provides implementation of dynamic vehicular ISAC model and validates it by comparing key simulation statistics between measurements and simulations.
Abstract:The emerging 6G network envisions integrated sensing and communication (ISAC) as a promising solution to meet growing demand for native perception ability. To optimize and evaluate ISAC systems and techniques, it is crucial to have an accurate and realistic wireless channel model. However, some important features of ISAC channels have not been well characterized, for example, most existing ISAC channel models consider communication channels and sensing channels independently, whereas ignoring correlation under the consistent environment. Moreover, sensing channels have not been well modeled in the existing standard-level channel models. Therefore, in order to better model ISAC channel, a cluster-based statistical channel model is proposed in this paper, which is based on measurements conducted at 28 GHz. In the proposed model, a new framework based on 3GPP standard is proposed, which includes communication clusters and sensing clusters. Clustering and tracking algorithms are used to extract and analyze ISAC channel characteristics. Furthermore, some special sensing cluster structures such as shared sensing cluster, newborn sensing cluster, etc., are defined to model correlation and difference between communication and sensing channels. Finally, accuracy of the proposed model is validated based on measurements and simulations.
Abstract:Dividing ads ranking system into retrieval, early, and final stages is a common practice in large scale ads recommendation to balance the efficiency and accuracy. The early stage ranking often uses efficient models to generate candidates out of a set of retrieved ads. The candidates are then fed into a more computationally intensive but accurate final stage ranking system to produce the final ads recommendation. As the early and final stage ranking use different features and model architectures because of system constraints, a serious ranking consistency issue arises where the early stage has a low ads recall, i.e., top ads in the final stage are ranked low in the early stage. In order to pass better ads from the early to the final stage ranking, we propose a multi-task learning framework for early stage ranking to capture multiple final stage ranking components (i.e. ads clicks and ads quality events) and their task relations. With our multi-task learning framework, we can not only achieve serving cost saving from the model consolidation, but also improve the ads recall and ranking consistency. In the online A/B testing, our framework achieves significantly higher click-through rate (CTR), conversion rate (CVR), total value and better ads-quality (e.g. reduced ads cross-out rate) in a large scale industrial ads ranking system.