Institute of Marine Biology and Pharmacology, Ocean College, Zhejiang University
Abstract:Neural View Synthesis (NVS) has demonstrated efficacy in generating high-fidelity dense viewpoint videos using a image set with sparse views. However, existing quality assessment methods like PSNR, SSIM, and LPIPS are not tailored for the scenes with dense viewpoints synthesized by NVS and NeRF variants, thus, they often fall short in capturing the perceptual quality, including spatial and angular aspects of NVS-synthesized scenes. Furthermore, the lack of dense ground truth views makes the full reference quality assessment on NVS-synthesized scenes challenging. For instance, datasets such as LLFF provide only sparse images, insufficient for complete full-reference assessments. To address the issues above, we propose NeRF-NQA, the first no-reference quality assessment method for densely-observed scenes synthesized from the NVS and NeRF variants. NeRF-NQA employs a joint quality assessment strategy, integrating both viewwise and pointwise approaches, to evaluate the quality of NVS-generated scenes. The viewwise approach assesses the spatial quality of each individual synthesized view and the overall inter-views consistency, while the pointwise approach focuses on the angular qualities of scene surface points and their compound inter-point quality. Extensive evaluations are conducted to compare NeRF-NQA with 23 mainstream visual quality assessment methods (from fields of image, video, and light-field assessment). The results demonstrate NeRF-NQA outperforms the existing assessment methods significantly and it shows substantial superiority on assessing NVS-synthesized scenes without references. An implementation of this paper are available at https://github.com/VincentQQu/NeRF-NQA.
Abstract:Event-stream representation is the first step for many computer vision tasks using event cameras. It converts the asynchronous event-streams into a formatted structure so that conventional machine learning models can be applied easily. However, most of the state-of-the-art event-stream representations are manually designed and the quality of these representations cannot be guaranteed due to the noisy nature of event-streams. In this paper, we introduce a data-driven approach aiming at enhancing the quality of event-stream representations. Our approach commences with the introduction of a new event-stream representation based on spatial-temporal statistics, denoted as EvRep. Subsequently, we theoretically derive the intrinsic relationship between asynchronous event-streams and synchronous video frames. Building upon this theoretical relationship, we train a representation generator, RepGen, in a self-supervised learning manner accepting EvRep as input. Finally, the event-streams are converted to high-quality representations, termed as EvRepSL, by going through the learned RepGen (without the need of fine-tuning or retraining). Our methodology is rigorously validated through extensive evaluations on a variety of mainstream event-based classification and optical flow datasets (captured with various types of event cameras). The experimental results highlight not only our approach's superior performance over existing event-stream representations but also its versatility, being agnostic to different event cameras and tasks.
Abstract:In multimedia broadcasting, no-reference image quality assessment (NR-IQA) is used to indicate the user-perceived quality of experience (QoE) and to support intelligent data transmission while optimizing user experience. This paper proposes an improved no-reference light field image quality assessment (NR-LFIQA) metric for future immersive media broadcasting services. First, we extend the concept of depthwise separable convolution (DSC) to the spatial domain of light field image (LFI) and introduce "light field depthwise separable convolution (LF-DSC)", which can extract the LFI's spatial features efficiently. Second, we further theoretically extend the LF-DSC to the angular space of LFI and introduce the novel concept of "light field anglewise separable convolution (LF-ASC)", which is capable of extracting both the spatial and angular features for comprehensive quality assessment with low complexity. Third, we define the spatial and angular feature estimations as auxiliary tasks in aiding the primary NR-LFIQA task by providing spatial and angular quality features as hints. To the best of our knowledge, this work is the first exploration of deep auxiliary learning with spatial-angular hints on NR-LFIQA. Experiments were conducted in mainstream LFI datasets such as Win5-LID and SMART with comparisons to the mainstream full reference IQA metrics as well as the state-of-the-art NR-LFIQA methods. The experimental results show that the proposed metric yields overall 42.86% and 45.95% smaller prediction errors than the second-best benchmarking metric in Win5-LID and SMART, respectively. In some challenging cases with particular distortion types, the proposed metric can reduce the errors significantly by more than 60%.
Abstract:Recent advancements in 3D Gaussian Splatting (3DGS) have substantially improved novel view synthesis, enabling high-quality reconstruction and real-time rendering. However, blurring artifacts, such as floating primitives and over-reconstruction, remain challenging. Current methods address these issues by refining scene structure, enhancing geometric representations, addressing blur in training images, improving rendering consistency, and optimizing density control, yet the role of kernel design remains underexplored. We identify the soft boundaries of Gaussian ellipsoids as one of the causes of these artifacts, limiting detail capture in high-frequency regions. To bridge this gap, we introduce 3D Linear Splatting (3DLS), which replaces Gaussian kernels with linear kernels to achieve sharper and more precise results, particularly in high-frequency regions. Through evaluations on three datasets, 3DLS demonstrates state-of-the-art fidelity and accuracy, along with a 30% FPS improvement over baseline 3DGS. The implementation will be made publicly available upon acceptance.
Abstract:Remote sensing (RS) involves the acquisition of data about objects or areas from a distance, primarily to monitor environmental changes, manage resources, and support planning and disaster response. A significant challenge in RS segmentation is the scarcity of high-quality labeled images due to the diversity and complexity of RS image, which makes pixel-level annotation difficult and hinders the development of effective supervised segmentation algorithms. To solve this problem, we propose Adaptively Augmented Consistency Learning (AACL), a semi-supervised segmentation framework designed to enhances RS segmentation accuracy under condictions of limited labeled data. AACL extracts additional information embedded in unlabeled images through the use of Uniform Strength Augmentation (USAug) and Adaptive Cut-Mix (AdaCM). Evaluations across various RS datasets demonstrate that AACL achieves competitive performance in semi-supervised segmentation, showing up to a 20% improvement in specific categories and 2% increase in overall performance compared to state-of-the-art frameworks.
Abstract:Rydberg atom-based antennas exploit the quantum properties of highly excited Rydberg atoms, providing unique advantages over classical antennas, such as high sensitivity, broad frequency range, and compact size. Despite the increasing interests in their applications in antenna and communication engineering, two key properties, involving the lack of polarization multiplexing and isotropic reception without mutual coupling, remain unexplored in the analysis of Rydberg atom-based spatial multiplexing, i.e., multiple-input and multiple-output (MIMO), communications. Generally, the design considerations for any antenna, even for atomic ones, can be extracted to factors such as radiation patterns, efficiency, and polarization, allowing them to be seamlessly integrated into existing system models. In this letter, we extract the antenna properties from relevant quantum characteristics, enabling electromagnetic modeling and capacity analysis of Rydberg MIMO systems in both far-field and near-field scenarios. By employing ray-based method for far-field analysis and dyadic Green's function for near-field calculation, our results indicate that Rydberg atom-based antenna arrays offer specific advantages over classical dipole-type arrays in single-polarization MIMO communications.
Abstract:In high-dynamic low earth orbit (LEO) satellite communication (SATCOM) systems, frequent channel state information (CSI) acquisition consumes a large number of pilots, which is intolerable in resource-limited SATCOM systems. To tackle this problem, we propose to track the state-dependent parameters including Doppler shift and channel angles, by exploiting the physical and approximate on-orbit mobility characteristics for LEO satellite and ground users (GUs), respectively. As a prerequisite for tracking, we formulate the state evolution models for kinematic (state) parameters of both satellite and GUs, along with the measurement models that describe the relationship between the state-dependent parameters and states. Then the rough estimation of state-dependent parameters is initially conducted, which is used as the measurement results in the subsequent state tracking. Concurrently, the measurement error covariance is predicted based on the formulated Cram$\acute{\text{e}}$r-Rao lower bound (CRLB). Finally, with the extended Kalman filter (EKF)-based state tracking as the bridge, the Doppler shift and channel angles can be further updated and the CSI can also be acquired. Simulation results show that compared to the rough estimation methods, the proposed joint parameter and channel tracking (JPCT) algorithm performs much better in the estimation of state-dependent parameters. Moreover, as to the CSI acquisition, the proposed algorithm can utilize a shorter pilot sequence than benchmark methods under a given estimation accuracy.
Abstract:Deep neural networks (DNNs) are powerful for cognitive tasks such as image classification, object detection, and scene segmentation. One drawback however is the significant high computational complexity and memory consumption, which makes them unfeasible to run real-time on embedded platforms because of the limited hardware resources. Block floating point (BFP) quantization is one of the representative compression approaches for reducing the memory and computational burden owing to their capability to effectively capture the broad data distribution of DNN models. Unfortunately, prior works on BFP-based quantization empirically choose the block size and the precision that preserve accuracy. In this paper, we develop a BFP-based bitwidth-aware analytical modeling framework (called ``BitQ'') for the best BFP implementation of DNN inference on embedded platforms. We formulate and resolve an optimization problem to identify the optimal BFP block size and bitwidth distribution by the trade-off of both accuracy and performance loss. Experimental results show that compared with an equal bitwidth setting, the BFP DNNs with optimized bitwidth allocation provide efficient computation, preserving accuracy on famous benchmarks. The source code and data are available at https://github.com/Cheliosoops/BitQ.
Abstract:Recent advancements in event-based zero-shot object recognition have demonstrated promising results. However, these methods heavily depend on extensive training and are inherently constrained by the characteristics of CLIP. To the best of our knowledge, this research is the first study to explore the understanding capabilities of large language models (LLMs) for event-based visual content. We demonstrate that LLMs can achieve event-based object recognition without additional training or fine-tuning in conjunction with CLIP, effectively enabling pure zero-shot event-based recognition. Particularly, we evaluate the ability of GPT-4o / 4turbo and two other open-source LLMs to directly recognize event-based visual content. Extensive experiments are conducted across three benchmark datasets, systematically assessing the recognition accuracy of these models. The results show that LLMs, especially when enhanced with well-designed prompts, significantly improve event-based zero-shot recognition performance. Notably, GPT-4o outperforms the compared models and exceeds the recognition accuracy of state-of-the-art event-based zero-shot methods on N-ImageNet by five orders of magnitude. The implementation of this paper is available at \url{https://github.com/ChrisYu-Zz/Pure-event-based-recognition-based-LLM}.
Abstract:Holographic multiple-input and multiple-output (MIMO) communications introduce innovative antenna array configurations, such as dense arrays and volumetric arrays, which offer notable advantages over conventional planar arrays with half-wavelength element spacing. However, accurately assessing the performance of these new holographic MIMO systems necessitates careful consideration of channel matrix normalization, as it is influenced by array gain, which, in turn, depends on the array topology. Traditional normalization methods may be insufficient for assessing these advanced array topologies, potentially resulting in misleading or inaccurate evaluations. In this study, we propose electromagnetic normalization approaches for the channel matrix that accommodate arbitrary array topologies, drawing on the array gains from analytical, physical, and full-wave methods. Additionally, we introduce a normalization method for near-field MIMO channels based on a rigorous dyadic Green's function approach, which accounts for potential losses of gain at near field. Finally, we perform capacity analyses under quasi-static, ergodic, and near-field conditions, through adopting the proposed normalization techniques. Our findings indicate that channel matrix normalization should reflect the realized gains of the antenna array in target directions. Failing to accurately normalize the channel matrix can result in errors when evaluating the performance limits and benefits of unconventional holographic array topologies, potentially compromising the optimal design of holographic MIMO systems.