Abstract:The ability of gaze estimation models to generalize is often significantly hindered by various factors unrelated to gaze, especially when the training dataset is limited. Current strategies aim to address this challenge through different domain generalization techniques, yet they have had limited success due to the risk of overfitting when solely relying on value labels for regression. Recent progress in pre-trained vision-language models has motivated us to capitalize on the abundant semantic information available. We propose a novel approach in this paper, reframing the gaze estimation task as a vision-language alignment issue. Our proposed framework, named Language-Guided Gaze Estimation (LG-Gaze), learns continuous and geometry-sensitive features for gaze estimation benefit from the rich prior knowledges of vision-language models. Specifically, LG-Gaze aligns gaze features with continuous linguistic features through our proposed multimodal contrastive regression loss, which customizes adaptive weights for different negative samples. Furthermore, to better adapt to the labels for gaze estimation task, we propose a geometry-aware interpolation method to obtain more precise gaze embeddings. Through extensive experiments, we validate the efficacy of our framework in four different cross-domain evaluation tasks.
Abstract:Analog-to-digital converters (ADCs) play a vital important role in any devices via manipulating analog signals in a digital manner. Given that the amplitude of the signal exceeds the dynamic range of the ADCs, clipping occurs and the quality of the digitized signal degrades significantly. In this paper, we design a joint modulo sampling hardware and processing prototype which improves the ADCs' dynamic range by folding the signal before sampling. Both the detailed design of the hardware and the recovery results of various state-of-the-art processing algorithms including our proposed unlimited sampling line spectral estimation (USLSE) algorithm are presented. Additionally, key issues that arise during implementation are also addressed. It is demonstrated that the USLSE algorithm successfully recovers the original signal with a frequency of 2.5 kHz and an amplitude 10 times the ADC's dynamic range, and the linear prediction (LP) algorithm successfully recovers the original signal with a frequency of 3.5 kHz and an amplitude 10 times the ADC's dynamic range.
Abstract:3D object detection based on roadside cameras is an additional way for autonomous driving to alleviate the challenges of occlusion and short perception range from vehicle cameras. Previous methods for roadside 3D object detection mainly focus on modeling the depth or height of objects, neglecting the stationary of cameras and the characteristic of inter-frame consistency. In this work, we propose a novel framework, namely MOSE, for MOnocular 3D object detection with Scene cuEs. The scene cues are the frame-invariant scene-specific features, which are crucial for object localization and can be intuitively regarded as the height between the surface of the real road and the virtual ground plane. In the proposed framework, a scene cue bank is designed to aggregate scene cues from multiple frames of the same scene with a carefully designed extrinsic augmentation strategy. Then, a transformer-based decoder lifts the aggregated scene cues as well as the 3D position embeddings for 3D object location, which boosts generalization ability in heterologous scenes. The extensive experiment results on two public benchmarks demonstrate the state-of-the-art performance of the proposed method, which surpasses the existing methods by a large margin.
Abstract:Unlimited sampling was recently introduced to deal with the clipping or saturation of measurements where a modulo operator is applied before sampling. In this paper, we investigate the identifiability of the model where measurements are acquired under a discrete Fourier transform (DFT) sensing matrix first followed by a modulo operator (modulo-DFT). Firstly, based on the theorems of cyclotomic polynomials, we derive a sufficient condition for uniquely identifying the original signal in modulo-DFT. Additionally, for periodic bandlimited signals (PBSs) under unlimited sampling which can be viewed as a special case of modulo-DFT, the necessary and sufficient condition for the unique recovery of the original signal are provided. Moreover, we show that when the oversampling factor exceeds $3(1+1/P)$, PBS is always identifiable from the modulo samples, where $P$ is the number of harmonics including the fundamental component in the positive frequency part.
Abstract:Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. When the models are required to align with a broader range of downstream tasks, or there is a desire to notably improve the performance on a specific task, a substantial increase in fine-tuning data often emerges as the solution. However, we find that large-scale increases in instruction data can disrupt the world knowledge previously stored in the LLMs, i.e., world knowledge forgetting. In this paper, we introduce LoRAMoE to address the above challenge. The LoRAMoE is a plugin version of Mixture of Experts (MoE). The plugin form ensures the integrity of world knowledge by freezing the backbone model during the training phase. We then propose the use of localized balancing constraints to coordinate parts of experts for task utilization, meanwhile enabling other experts to fully leverage the world knowledge stored in the models. Experimental results demonstrate that LoRAMoE can reasonably coordinate experts based on data type during inference, and even dramatically increasing instruction data does not result in knowledge forgetting. Moreover, LoRAMoE provides additional benefits for the performance of downstream tasks, indicating the potential of our approach for multi-task learning.
Abstract:Modulo sampling or unlimited sampling has recently drawn a great deal of attention for cutting-edge applications, due to overcoming the barrier of information loss through sensor saturation and clipping. This is a significant problem, especially when the range of signal amplitudes is unknown or in the near-far case. To overcome this fundamental bottleneck, we propose a one-bit-aided (1bit-aided) modulo sampling scheme for direction-of-arrival (DOA) estimation. On the one hand, one-bit quantization involving a simple comparator offers the advantages of low-cost and low-complexity implementation. On the other hand, one-bit quantization provides an estimate of the normalized covariance matrix of the unquantized measurements via the arcsin law. The estimate of the normalized covariance matrix is used to implement blind integer-forcing (BIF) decoder to unwrap the modulo samples to construct the covariance matrix, and subspace methods can be used to perform the DOA estimation. Our approach named as 1bit-aided-BIF addresses the near-far problem well and overcomes the intrinsic low dynamic range of one-bit quantization. Numerical experiments validate the excellent performance of the proposed algorithm compared to using a high-precision ADC directly in the given set up.
Abstract:As radar systems accompanied by large numbers of antennas and scale up in bandwidth, the cost and power consumption of high-precision (e.g., 10-12 bits) analog-to-digital converter (ADC) become the limiting factor. As a remedy, line spectral estimation and detection (LSE\&D) from low resolution (e.g., 1-4 bits) quantization has been gradually drawn attention in recent years. As low resolution quantization reduces the dynamic range (DR) of the receiver, the theoretical detection probabilities for the multiple targets (especially for the weakest target) are analyzed, which reveals the effects of low resolution on weak signal detection and provides the guidelines for system design. The computation complexities of current methods solve the line spectral estimation from coarsely quantized samples are often high. In this paper, we propose a fast generalized Newtonized orthogonal matching pursuit (GNOMP) which has superior estimation accuracy and maintains a constant false alarm rate (CFAR) behaviour. Besides, such an approach are easily extended to handle the other measurement scenarios such as sign measurements from time-varying thresholds, compressive setting, multisnapshot setting, multidimensional setting and unknown noise variance. Substantial numerical simulations are conducted to demonstrate the effectiveness of GNOMP in terms of estimating accuracy, detection probability and running time. Besides, real data are also provided to demonstrate the effectiveness of the GNOMP.
Abstract:Finding antenna designs that satisfy frequency requirements and are also optimal with respect to multiple physical criteria is a critical component in designing next generation hardware. However, such a process is non-trivial because the objective function is typically highly nonlinear and sensitive to subtle design change. Moreover, the objective to be optimized often involves electromagnetic (EM) simulations, which is slow and expensive with commercial simulation software. In this work, we propose a sample-efficient and accurate surrogate model, named CZP (Constant Zeros Poles), to directly estimate the scattering coefficients in the frequency domain of a given 2D planar antenna design, without using a simulator. CZP achieves this by predicting the complex zeros and poles for the frequency response of scattering coefficients, which we have theoretically justified for any linear PDE, including Maxwell's equations. Moreover, instead of using low-dimensional representations, CZP leverages a novel image-based representation for antenna topology inspired by the existing mesh-based EM simulation techniques, and attention-based neural network architectures. We demonstrate experimentally that CZP not only outperforms baselines in terms of test loss, but also is able to find 2D antenna designs verifiable by commercial software with only 40k training samples, when coupling with advanced sequential search techniques like reinforcement learning.
Abstract:Line spectral estimation (LSE) is a fundamental problem in signal processing due to its wide applications. For signals that are orders of magnitude larger than the dynamic range of the analog-to-digital (ADC) threshold, conventional ADC will clip or saturate, leading to significant information loss. The Unlimited Sensing Framework (USF) was introduced to avoid saturation through sampling of the signal modulo. Motivated by the USF, we study the LSE from modulo samples. By exploiting oversampling and the bounded spectral property, the US-LSE is proposed to recover the folding instants and perform LSE. Our numerical simulations show that for oversampling factor $\gamma\geq 10$, the US-LSE is more stable in a lower signal to noise scenario, in the range of $20\sim30$ dB, compared to the existing algorithm. Besides, we process the real data generated by AWR1642, and show that US-LSE estimates the ranges of two corners with SNRs of $12$ dB and $23$ dB for oversampling factor $\gamma= 10$, and the normalized dynamic range ${\rm DR}=\beta_g/\lambda\approx 3$, where $\beta_g$ is the infinity norm of the signal.
Abstract:The line spectrum estimation problem is considered in this paper. We propose a CFAR-based Newtonized OMP (NOMP-CFAR) method which can maintain a desired false alarm rate without the knowledge of the noise variance. The NOMP-CFAR consists of two steps, namely, an initialization step and a detection step. In the initialization step, NOMP is employed to obtain candidate sinusoidal components. In the detection step, CFAR detector is applied to detect each candidate frequency, and remove the most unlikely frequency component. Then, the Newton refinements are used to refine the remaining parameters. The relationship between the false alarm rate and the required threshold is established. By comparing with the NOMP, NOMP-CFAR has only $1$ dB performance loss in additive white Gaussian noise scenario with false alarm probability $10^{-2}$ and detection probability $0.8$ without knowledge of noise variance. For varied noise variance scenario, NOMP-CFAR still preserves its CFAR property, while NOMP violates the CFAR. Besides, real experiments are also conducted to demonstrate the detection performance of NOMP-CFAR, compared to CFAR and NOMP.