University of Minnesota
Abstract:Movable antennas (MAs) have shown significant potential in enhancing the performance of dual-functional radar-communication (DFRC) systems. In this paper, we investigate the MA-based transceiver design for DFRC systems, where a reconfigurable intelligent surface (RIS) is employed to enhance the communication quality in dead zones. To enhance the radar sensing performance, we formulate an optimization problem to maximize the radar signal-to-interference-plus-noise ratio (SINR) by jointly optimizing the beamforming vectors, receiving filter, antenna positions, and RIS reflecting coefficients. To tackle this challenging problem, we develop a fractional programming-based optimization framework, incorporating block coordinate descent (BCD), successive convex approximation (SCA), and penalty techniques. Simulation results demonstrate that the proposed method can significantly improve the radar SINR and achieve a satisfactory balance between the radar and communication performance compared with existing benchmark schemes.
Abstract:Transformer-based Large Language Models (LLMs) struggle to process inputs exceeding their training context window, with performance degrading due to positional out-of-distribution (O.O.D.) that disrupt attention computations. Existing solutions, fine-tuning and training-free methods, are limited by computational inefficiency, attention logit outliers or loss of local positional information. To address this, we propose Greedy Attention Logit Interpolation (GALI), a training-free length extrapolation method that maximizes the utilization of pretrained positional intervals while avoiding attention logit outliers through attention logit interpolation. The result demonstrates that GALI consistently outperforms state-of-the-art training-free methods. Our findings reveal that LLMs interpret positional intervals unevenly within their training context window, suggesting that extrapolating within a smaller positional interval range yields superior results-even for short-context tasks. GALI represents a significant step toward resolving the positional O.O.D. challenge, enabling more reliable long-text understanding in LLMs. Our implementation of GALI, along with the experiments from our paper, is open-sourced at https://github.com/AcademyCityL/GALI.
Abstract:Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically leverage Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when used for step-level preference optimization, these models face challenges in handling noisy images of different timesteps and require complex transformations into pixel space. In this work, we demonstrate that diffusion models are inherently well-suited for step-level reward modeling in the latent space, as they can naturally extract features from noisy latent images. Accordingly, we propose the Latent Reward Model (LRM), which repurposes components of diffusion models to predict preferences of latent images at various timesteps. Building on LRM, we introduce Latent Preference Optimization (LPO), a method designed for step-level preference optimization directly in the latent space. Experimental results indicate that LPO not only significantly enhances performance in aligning diffusion models with general, aesthetic, and text-image alignment preferences, but also achieves 2.5-28$\times$ training speedup compared to existing preference optimization methods. Our code will be available at https://github.com/casiatao/LPO.
Abstract:To mitigate the adverse effects of hard samples on the training of the Flying Bird Object Detection (FBOD) model for surveillance videos, we propose a Co-Paced Learning Based on Confidence (CPL-BC) strategy and apply this strategy to the training process of the FBOD model. This strategy involves maintaining two models with identical structures but different initial parameter configurations, which collaborate with each other to select easy samples with prediction confidence exceeding a set threshold for training. As training progresses, the strategy gradually lowers the threshold, allowing more samples to participate, enhancing the model's ability to recognize objects from easy to hard. Before applying the CPL-BC strategy to train the FBOD models, we initially trained the two FBOD models to equip them with the capability to assess the difficulty level of flying bird object samples. Experimental results on two different datasets of flying bird objects in surveillance videos demonstrate that, compared to other model learning strategies, CPL-BC significantly improves detection accuracy, verifying the effectiveness and advancement of this method.
Abstract:Searching for the Extreme Operating Conditions (EOCs) is one of the core problems of power system relay protection setting calculation. The current methods based on brute-force search, heuristic algorithms, and mathematical programming can hardly meet the requirements of today's power systems in terms of computation speed due to the drastic changes in operating conditions induced by renewables and power electronics. This paper proposes an EOC fast search method, named Graph Dueling Double Deep Q Network (Graph D3QN), which combines graph neural network and deep reinforcement learning to address this challenge. First, the EOC search problem is modeled as a Markov decision process, where the information of the underlying power system is extracted using graph neural networks, so that the EOC of the system can be found via deep reinforcement learning. Then, a two-stage Guided Learning and Free Exploration (GLFE) training framework is constructed to accelerate the convergence speed of reinforcement learning. Finally, the proposed Graph D3QN method is validated through case studies of searching maximum fault current for relay protection setting calculation on the IEEE 39-bus and 118-bus systems. The experimental results demonstrate that Graph D3QN can reduce the computation time by 10 to 1000 times while guaranteeing the accuracy of the selected EOCs.
Abstract:Storage systems account for a major portion of the total cost of ownership (TCO) of warehouse-scale computers, and thus have a major impact on the overall system's efficiency. Machine learning (ML)-based methods for solving key problems in storage system efficiency, such as data placement, have shown significant promise. However, there are few known practical deployments of such methods. Studying this problem in the context of real-world hyperscale data center deployments at Google, we identify a number of challenges that we believe cause this lack of practical adoption. Specifically, prior work assumes a monolithic model that resides entirely within the storage layer, an unrealistic assumption in real-world data center deployments. We propose a cross-layer approach that moves ML out of the storage system and performs it in the application running on top of it, co-designed with a scheduling algorithm at the storage layer that consumes predictions from these application-level models. This approach combines small, interpretable models with a co-designed heuristic that adapts to different online environments. We build a proof-of-concept of this approach in a production distributed computation framework at Google. Evaluations in a test deployment and large-scale simulation studies using production traces show improvements of as much as 3.47x in TCO savings compared to state of the art baselines. We believe this work represents a significant step towards more practical ML-driven storage placement in warehouse-scale computers.
Abstract:Rainfall impacts daily activities and can lead to severe hazards such as flooding. Traditional rainfall measurement systems often lack granularity or require extensive infrastructure. While the attenuation of electromagnetic waves due to rainfall is well-documented for frequencies above 10 GHz, sub-6 GHz bands are typically assumed to experience negligible effects. However, recent studies suggest measurable attenuation even at these lower frequencies. This study presents the first channel state information (CSI)-based measurement and analysis of rainfall attenuation at 2.8 GHz. The results confirm the presence of rain-induced attenuation at this frequency, although classification remains challenging. The attenuation follows a power-law decay model, with the rate of attenuation decreasing as rainfall intensity increases. Additionally, rainfall onset significantly increases the delay spread. Building on these insights, we propose RainGaugeNet, the first CSI-based rainfall classification model that leverages multipath and temporal features. Using only 20 seconds of CSI data, RainGaugeNet achieved over 90% classification accuracy in line-of-sight scenarios and over 85% in non-lineof-sight scenarios, significantly outperforming state-of-the-art methods.
Abstract:In order to avoid the impact of hard samples on the training process of the Flying Bird Object Detection model (FBOD model, in our previous work, we designed the FBOD model according to the characteristics of flying bird objects in surveillance video), the Self-Paced Learning strategy with Easy Sample Prior Based on Confidence (SPL-ESP-BC), a new model training strategy, is proposed. Firstly, the loss-based Minimizer Function in Self-Paced Learning (SPL) is improved, and the confidence-based Minimizer Function is proposed, which makes it more suitable for one-class object detection tasks. Secondly, to give the model the ability to judge easy and hard samples at the early stage of training by using the SPL strategy, an SPL strategy with Easy Sample Prior (ESP) is proposed. The FBOD model is trained using the standard training strategy with easy samples first, then the SPL strategy with all samples is used to train it. Combining the strategy of the ESP and the Minimizer Function based on confidence, the SPL-ESP-BC model training strategy is proposed. Using this strategy to train the FBOD model can make it to learn the characteristics of the flying bird object in the surveillance video better, from easy to hard. The experimental results show that compared with the standard training strategy that does not distinguish between easy and hard samples, the AP50 of the FBOD model trained by the SPL-ESP-BC is increased by 2.1%, and compared with other loss-based SPL strategies, the FBOD model trained with SPL-ESP-BC strategy has the best comprehensive detection performance.
Abstract:Recent advancements in generative models have significantly enhanced talking face video generation, yet singing video generation remains underexplored. The differences between human talking and singing limit the performance of existing talking face video generation models when applied to singing. The fundamental differences between talking and singing-specifically in audio characteristics and behavioral expressions-limit the effectiveness of existing models. We observe that the differences between singing and talking audios manifest in terms of frequency and amplitude. To address this, we have designed a multi-scale spectral module to help the model learn singing patterns in the spectral domain. Additionally, we develop a spectral-filtering module that aids the model in learning the human behaviors associated with singing audio. These two modules are integrated into the diffusion model to enhance singing video generation performance, resulting in our proposed model, SINGER. Furthermore, the lack of high-quality real-world singing face videos has hindered the development of the singing video generation community. To address this gap, we have collected an in-the-wild audio-visual singing dataset to facilitate research in this area. Our experiments demonstrate that SINGER is capable of generating vivid singing videos and outperforms state-of-the-art methods in both objective and subjective evaluations.
Abstract:Identifying the interaction targets of bioactive compounds is a foundational element for deciphering their pharmacological effects. Target prediction algorithms equip researchers with an effective tool to rapidly scope and explore potential targets. Here, we introduce the COMET, a multi-technological modular target prediction tool that provides comprehensive predictive insights, including similar active compounds, three-dimensional predicted binding modes, and probability scores, all within an average processing time of less than 10 minutes per task. With meticulously curated data, the COMET database encompasses 990,944 drug-target interaction pairs and 45,035 binding pockets, enabling predictions for 2,685 targets, which span confirmed and exploratory therapeutic targets for human diseases. In comparative testing using datasets from ChEMBL and BindingDB, COMET outperformed five other well-known algorithms, offering nearly an 80% probability of accurately identifying at least one true target within the top 15 predictions for a given compound. COMET also features a user-friendly web server, accessible freely at https://www.pdbbind-plus.org.cn/comet.