Abstract:The glyphic writing system of Chinese incorporates information-rich visual features in each character, such as radicals that provide hints about meaning or pronunciation. However, there has been no investigation into whether contemporary Large Language Models (LLMs) and Vision-Language Models (VLMs) can harness these sub-character features in Chinese through prompting. In this study, we establish a benchmark to evaluate LLMs' and VLMs' understanding of visual elements in Chinese characters, including radicals, composition structures, strokes, and stroke counts. Our results reveal that models surprisingly exhibit some, but still limited, knowledge of this visual information, regardless of whether images of characters are provided. To elicit models' ability to use radicals, we further experiment with incorporating radicals into the prompts for Chinese language understanding tasks. We observe consistent improvements in Part-Of-Speech tagging when providing additional information about radicals, suggesting the potential to enhance Chinese language processing by integrating sub-character information.
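To make the radical-prompting idea above concrete, here is a minimal sketch; the radical table, the prompt wording, and the function name `radical_augmented_prompt` are illustrative assumptions, not the benchmark's actual prompt:

```python
# Sketch: augmenting a POS-tagging prompt with per-character radical hints.
# The radical table and prompt template below are illustrative assumptions.
RADICALS = {
    "河": "氵",  # water radical, hints at a water-related meaning
    "树": "木",  # wood radical, hints at a plant/tree-related meaning
    "说": "讠",  # speech radical, hints at a speech-related meaning
}

def radical_augmented_prompt(sentence: str) -> str:
    """Build a POS-tagging prompt that lists each known character's radical."""
    hints = ", ".join(
        f"{ch} (radical: {RADICALS[ch]})" for ch in sentence if ch in RADICALS
    )
    prompt = f"Tag each word in the sentence with its part of speech: {sentence}"
    if hints:
        prompt += f"\nRadical hints: {hints}"
    return prompt

print(radical_augmented_prompt("河边有树"))
```

The same template can be issued with and without the hint line to measure the contribution of the radical information in isolation.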
Abstract:We present GSD, a diffusion model approach based on the Gaussian Splatting (GS) representation for 3D object reconstruction from a single view. Prior works suffer from inconsistent 3D geometry or mediocre rendering quality due to improper representations. We take a step towards resolving these shortcomings by utilizing the recent state-of-the-art 3D explicit representation, Gaussian Splatting, together with an unconditional diffusion model. The model learns to generate 3D objects represented by sets of GS ellipsoids. With these strong generative 3D priors, though trained unconditionally, the diffusion model is ready for view-guided reconstruction without further fine-tuning. This is achieved by propagating fine-grained 2D features through the efficient yet flexible splatting function and the guided denoising sampling process. In addition, a 2D diffusion model is employed to enhance rendering fidelity and improve the quality of the reconstructed GS by polishing and re-using the rendered images. The final reconstructed objects come with explicit, high-quality 3D structure and texture, and can be efficiently rendered from arbitrary views. Experiments on the challenging real-world CO3D dataset demonstrate the superiority of our approach.
Abstract:Spiking Neural Networks (SNNs) offer a promising avenue for energy-efficient computing compared with Artificial Neural Networks (ANNs), closely mirroring biological neural processes. However, this potential comes with inherent challenges in directly training SNNs through spatio-temporal backpropagation -- stemming from the temporal dynamics of spiking neurons and their discrete signal processing -- which necessitates alternative training approaches, most notably ANN-SNN conversion. In this work, we introduce a lightweight Forward Temporal Bias Correction (FTBC) technique aimed at enhancing conversion accuracy without additional computational overhead. We ground our method in theoretical findings showing that, through proper temporal bias calibration, the expected error of ANN-SNN conversion can be reduced to zero after each time step. We further propose a heuristic algorithm for finding the temporal bias in the forward pass only, thus eliminating the computational burden of backpropagation. We evaluate our method on the CIFAR-10/100 and ImageNet datasets, achieving a notable increase in accuracy on all of them. Code is released in a GitHub repository.
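The forward-only bias calibration described above can be illustrated with a toy integrate-and-fire layer; the neuron model, the scalar per-timestep bias, and the specific update rule below are simplifying assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np

def ftbc_calibrate(ann_act, snn_input, T=8, v_thresh=1.0):
    """Toy sketch of forward-only temporal bias calibration.

    At each time step a scalar bias is chosen from the residual between
    the running SNN spike rate and the target ANN activation, so no
    backpropagation through time is needed. (Heuristic illustration only.)
    """
    bias = np.zeros(T)
    v = np.zeros_like(snn_input)        # membrane potential
    rate = np.zeros_like(snn_input)     # running spike rate
    for t in range(T):
        err = ann_act - rate            # residual vs. the ANN target
        bias[t] = err.mean() * v_thresh # forward-pass bias update
        v = v + snn_input + bias[t]     # integrate input plus bias
        spikes = (v >= v_thresh).astype(float)
        v = v - spikes * v_thresh       # soft reset after firing
        rate = (rate * t + spikes) / (t + 1)
    return bias, rate
```

In an actual conversion pipeline the calibration would run layer by layer on a small calibration batch, reusing the ANN's pre-activation statistics.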
Abstract:To navigate reliably in indoor environments, an industrial autonomous vehicle must know its position. However, current indoor vehicle positioning technologies either lack accuracy or usability, or are too expensive. We therefore propose a novel concept called local reference point assisted active radar positioning, which is able to overcome these drawbacks. It is based on distributing passive retroreflectors in the indoor environment such that each position of the vehicle can be identified by a unique reflection characteristic with respect to the reflectors. To observe these characteristics, the autonomous vehicle is equipped with an active radar system. On the one hand, this paper presents the basic idea and concept of our new approach to indoor vehicle positioning, focusing in particular on the crucial placement of the reflectors. On the other hand, it also provides a proof of concept by conducting a full system simulation, including the placement of the local reference points, radar-based distance estimation, and a comparison of two different positioning methods. The simulation successfully demonstrates the feasibility of our proposed approach.
Abstract:Detecting Resident Space Objects (RSOs) and preventing collisions with other satellites is crucial. Recently, deep convolutional neural networks (DCNNs) have shown superior performance in object detection when large-scale datasets are available. However, collecting rich data of RSOs is difficult because they occur only rarely in space images. Without sufficient data, it is challenging to comprehensively train DCNN detectors and make them effective for detecting RSOs in space images, let alone estimate how robust a detector is. The lack of meaningful evaluation of different detectors could further affect the design and application of detection methods. To tackle this issue, we propose that space images containing RSOs can be simulated to complement the shortage of raw data for better benchmarking. Accordingly, we introduce a novel simulation-augmented benchmarking framework for RSO detection (SAB-RSOD). In our framework, by making the best use of the hardware parameters of the sensor that captures real-world space images, we first develop a high-fidelity RSO simulator that can generate a variety of realistic space images. We then use this simulator to generate images containing diverse RSOs and annotate them automatically. Next, we mix the synthetic images with the real-world images, obtaining around 500 images for training, while only real-world images are used for evaluation. Under SAB-RSOD, we can effectively train different popular object detectors, such as YOLO and Faster R-CNN, enabling us to evaluate their performance thoroughly. The evaluation results show that the amount of available data and the image resolution are two key factors for robust RSO detection. Moreover, we demonstrate that when a lower resolution is used for higher efficiency, a simple UNet-based detection method can still achieve high detection accuracy.
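The mixing of synthetic and real images described above can be sketched as a simple split-building routine; the exact split proportions and the function name `build_splits` are assumptions for illustration, since the abstract does not specify the protocol in detail:

```python
import random

def build_splits(real_imgs, synth_imgs, train_size=500, seed=0):
    """Sketch of a simulation-augmented split: synthetic images complement
    real ones for training, while evaluation uses held-out real images only.
    (Proportions and protocol are illustrative assumptions.)"""
    rng = random.Random(seed)
    real = list(real_imgs)
    rng.shuffle(real)
    eval_split = real[: len(real) // 2]           # held-out real images
    train_pool = real[len(real) // 2:] + list(synth_imgs)
    rng.shuffle(train_pool)
    return train_pool[:train_size], eval_split
```

Keeping evaluation purely real-world prevents the simulator's rendering biases from inflating the reported detection scores.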
Abstract:In the past few years, several alternatives to the Orthogonal Frequency Division Multiplexing (OFDM) modulation have been considered to improve its spectral containment and its performance in the presence of heavy Doppler shifts. This paper examines a novel modulation, named Doppler-Resilient Universal Filtered MultiCarrier (DR-UFMC), which aims to combine the advantages of the Universal Filtered MultiCarrier (UFMC) modulation (i.e., better spectral containment) with those of the Orthogonal Time Frequency Space (OTFS) modulation (i.e., better performance in time-varying environments). The paper contains the mathematical model and a detailed transceiver block scheme of the newly described modulation, along with a numerical analysis contrasting DR-UFMC against OTFS, OFDM with one-tap frequency domain equalization (FDE), and OFDM with multicarrier multisymbol linear minimum mean square error (MMSE) processing. The results clearly show the superiority of the newly proposed modulation over the cited benchmarks in terms of achievable spectral efficiency. Interestingly, it is also seen that OFDM, when considered in conjunction with multicarrier multisymbol linear MMSE processing, performs slightly better than OTFS in terms of achievable spectral efficiency.
Abstract:Reconstructing a 3D object from a 2D image is a well-researched vision problem to which many kinds of deep learning techniques have been applied. Most commonly, 3D convolutional approaches are used, though previous work has shown state-of-the-art methods using 2D convolutions that are also significantly more efficient to train. With the recent rise of transformers for vision tasks, often outperforming convolutional methods, along with some earlier attempts to use transformers for 3D object reconstruction, we set out to use vision transformers in place of convolutions in existing efficient, high-performing techniques for 3D object reconstruction, in order to achieve superior results on the task. Using a transformer-based encoder and decoder to predict 3D structure from 2D images, we achieve accuracy similar or superior to that of the baseline approach. This study serves as evidence for the potential of vision transformers in the task of 3D object reconstruction.
Abstract:This paper considers the problem of beam alignment in a cell-free massive MIMO deployment with multiple access points (APs) and multiple user equipments (UEs) simultaneously operating in the same millimeter wave frequency band. Assuming the availability of a control channel at sub-6 GHz frequencies, a protocol is developed that permits estimating, for each UE, the strongest propagation path from each of the surrounding APs, and performing user-centric association between the UEs and the APs. Estimation of the strongest paths from nearby APs is realized at the UE in a one-phase procedure, during which all the APs simultaneously transmit on pseudo-randomly selected channels with pseudo-random transmit beamformers. An algorithm for assigning orthogonal channels to the APs is also proposed, with the aim of minimizing the mutual interference between APs that transmit on the same channels. The performance of the proposed strategy is evaluated both in terms of the probability of correctly detecting the directions of arrival and departure associated with the strongest beam from nearby APs, and in terms of downlink and uplink signal-to-interference-plus-noise ratio. Numerical results show that the proposed approach is effective and capable of efficiently realizing beam alignment in a multi-UE, multi-AP wireless scenario.
Abstract:Unsupervised domain adaptive object detection aims to adapt a well-trained detector from its original source domain, with rich labeled data, to a new target domain with unlabeled data. Previous works focus on improving the domain adaptability of region-based detectors, e.g., Faster R-CNN, by matching cross-domain instance-level features that are explicitly extracted from a region proposal network (RPN). However, this is unsuitable for region-free detectors such as the single shot detector (SSD), which perform a dense prediction over all possible locations in an image and do not have an RPN to encode such instance-level features. As a result, existing methods fail to align important image regions and crucial instance-level features between domains for region-free detectors. In this work, we propose an adversarial module, termed DSEM, to strengthen the cross-domain matching of instance-level features for region-free detectors. Firstly, to emphasize the important regions of an image, the DSEM learns to predict a transferable foreground enhancement mask that can be utilized to suppress background disturbance in an image. Secondly, considering that region-free detectors recognize objects of different scales using multi-scale feature maps, the DSEM encodes both multi-level semantic representations and multi-instance spatial-contextual relationships across different domains. Finally, the DSEM is pluggable into different region-free detectors, ultimately achieving dense semantic feature matching via adversarial learning. Extensive experiments have been conducted on the PASCAL VOC, Clipart, Comic, Watercolor, and FoggyCityscape benchmarks, and the results demonstrate that the proposed approach not only improves the domain adaptability of region-free detectors but also outperforms existing domain adaptive region-based detectors under various domain shift settings.
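The foreground enhancement mask described above can be sketched as a learned spatial gate over a feature map; the 1x1 projection used as the mask predictor and the names `enhance_foreground`, `w`, and `b` are assumptions for illustration, not the module's actual architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def enhance_foreground(feat, w, b):
    """Sketch of a foreground-enhancement mask: a 1x1 projection of the
    feature map is squashed into a [0, 1] spatial mask that re-weights
    locations, suppressing background activations.
    (w and b stand in for learned mask-predictor parameters.)"""
    # feat: (C, H, W) feature map; w: (C,) projection weights; b: scalar bias
    mask = sigmoid(np.tensordot(w, feat, axes=([0], [0])) + b)  # (H, W)
    return feat * mask[None, :, :]  # broadcast the mask over channels
```

In the full method the mask predictor would be trained adversarially so that the re-weighted features are indistinguishable across source and target domains.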
Abstract:The problem of beam alignment (BA) in a cell-free massive multiple-input multiple-output (CF-mMIMO) system operating at millimeter wave (mmWave) carrier frequencies is considered in this paper. Two estimation algorithms are proposed, in association with a protocol that permits simultaneous estimation, on a shared set of frequencies, for each user equipment (UE), of the directions of arrival and departure of the radio waves associated with the strongest propagation paths from each of the surrounding access points (APs), so that UE-AP association can take place. The proposed procedure relies on the existence of a reliable control channel at sub-6 GHz frequencies, so as to enable the exchange of estimated values between the UEs and the network, and assumes that APs can be identified based on prior knowledge of the orthogonal channels and the transmit beamforming codebook. A strategy for assigning codebook entries to the several APs is also proposed, with the aim of minimizing the mutual interference between APs that are assigned the same entry. Numerical results show the effectiveness of the proposed detection strategy, thus enabling one-shot fast BA for CF-mMIMO systems.
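An assignment strategy of the kind described above, which separates co-channel (or co-entry) APs in space, can be sketched greedily; the specific rule below, which gives each AP the entry whose nearest already-assigned co-entry AP is farthest away, is an illustrative assumption, not the paper's actual algorithm:

```python
import math

def assign_channels(ap_positions, n_channels):
    """Greedy sketch of orthogonal-channel / codebook-entry assignment:
    each AP receives the entry that maximizes the distance to the nearest
    AP already using the same entry, reducing co-channel interference.
    (Illustrative heuristic; the paper's algorithm may differ.)"""
    assignment = {}
    for i, (x, y) in enumerate(ap_positions):
        best_ch, best_dist = 0, -1.0
        for ch in range(n_channels):
            # distance to the closest AP already assigned this entry
            d = min(
                (math.hypot(x - ap_positions[j][0], y - ap_positions[j][1])
                 for j, c in assignment.items() if c == ch),
                default=math.inf,
            )
            if d > best_dist:
                best_ch, best_dist = ch, d
        assignment[i] = best_ch
    return assignment
```

The effect is that nearby APs end up on different entries, so a UE listening on a given channel sees the strongest signal from only one dominant AP.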