SynSense AG, Switzerland
Abstract:Orthogonal frequency division multiplexing - integrated sensing and communication (OFDM-ISAC) has emerged as a key enabler for future wireless networks, leveraging the widely adopted OFDM waveform to seamlessly integrate wireless communication and radar sensing within a unified framework. In this paper, we propose adaptive resource allocation strategies for OFDM-ISAC systems to achieve optimal trade-offs between diverse sensing requirements and communication quality-of-service (QoS). We first develop a comprehensive resource allocation framework for OFDM-ISAC systems, deriving closed-form expressions for key sensing performance metrics, including delay resolution, Doppler resolution, delay-Doppler peak sidelobe level (PSL), and received signal-to-noise ratio (SNR). Building on this theoretical foundation, we introduce two novel resource allocation algorithms tailored to distinct sensing objectives. The resolution-oriented algorithm aims to maximize the weighted delay-Doppler resolution while satisfying constraints on PSL, sensing SNR, communication sum-rate, and transmit power. The sidelobe-oriented algorithm focuses on minimizing delay-Doppler PSL while satisfying resolution, SNR, and communication constraints. To efficiently solve the resulting non-convex optimization problems, we develop two adaptive resource allocation algorithms based on Dinkelbach's transform and majorization-minimization (MM). Extensive simulations validate the effectiveness of the proposed sensing-oriented adaptive resource allocation strategies in enhancing resolution and sidelobe suppression. Remarkably, these strategies achieve sensing performance nearly identical to that of a radar-only scheme, which dedicates all resources to sensing. These results highlight the superior performance of the proposed methods in optimizing the trade-off between sensing and communication objectives within OFDM-ISAC systems.
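As an illustrative aside, Dinkelbach's transform named in the abstract above replaces a ratio objective max f(x)/g(x), with g(x) > 0, by a sequence of parametric problems max f(x) - λ g(x). The sketch below applies it to a toy one-dimensional problem; the functions `f` and `g`, the candidate grid, and the grid search used as the inner solver are all assumptions made for illustration and are not the paper's resource allocation algorithm.

```python
def dinkelbach(f, g, candidates, tol=1e-9, max_iter=100):
    """Maximize f(x)/g(x) over a finite candidate set, assuming g(x) > 0."""
    x_best = candidates[0]
    lam = f(x_best) / g(x_best)  # any feasible starting ratio
    for _ in range(max_iter):
        # Inner problem: maximize the parametric objective f(x) - lam * g(x).
        x_best = max(candidates, key=lambda x: f(x) - lam * g(x))
        gap = f(x_best) - lam * g(x_best)
        if gap < tol:  # a zero gap characterizes the optimal ratio
            break
        lam = f(x_best) / g(x_best)  # Dinkelbach update of the ratio
    return x_best, lam


# Toy problem (hypothetical functions, not taken from the paper).
f = lambda x: 1.0 + 4.0 * x - x * x
g = lambda x: 1.0 + x
grid = [i / 1000 for i in range(0, 3001)]
x_star, ratio = dinkelbach(f, g, grid)
```

For this toy problem the ratio is maximized at x = 1 with value 2, which the iteration reaches in a handful of updates. In a non-convex setting such as the one the abstract describes, the inner maximization would itself need an approximate solver (the abstract names majorization-minimization for that role).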
Abstract:DeepSeek-R1 has shown that long chain-of-thought (CoT) reasoning can naturally emerge through a simple reinforcement learning (RL) framework with rule-based rewards, where the training may directly start from the base models, a paradigm referred to as zero RL training. Most recent efforts to reproduce zero RL training have primarily focused on the Qwen2.5 model series, which may not be representative as we find the base models already exhibit strong instruction-following and self-reflection abilities. In this work, we investigate zero RL training across 10 diverse base models, spanning different families and sizes including LLama3-8B, Mistral-7B/24B, DeepSeek-Math-7B, Qwen2.5-math-7B, and all Qwen2.5 models from 0.5B to 32B. Leveraging several key design strategies (such as adjusting format reward and controlling query difficulty), we achieve substantial improvements in both reasoning accuracy and response length across most settings. However, by carefully monitoring the training dynamics, we observe that different base models exhibit distinct patterns during training. For instance, the increased response length does not always correlate with the emergence of certain cognitive behaviors such as verification (i.e., the "aha moment"). Notably, we observe the "aha moment" for the first time in small models not from the Qwen family. We share the key designs that enable successful zero RL training, along with our findings and practices. To facilitate further research, we open-source the code, models, and analysis tools.
Abstract:This paper presents CONTHER, a novel reinforcement learning algorithm designed to efficiently and rapidly train robotic agents for goal-oriented manipulation tasks and obstacle avoidance. The algorithm uses a modified replay buffer inspired by the Hindsight Experience Replay (HER) approach to artificially populate experience with successful trajectories, effectively addressing the problem of sparse reward scenarios and eliminating the need to manually collect expert demonstrations. The algorithm employs a Transformer-based architecture to incorporate the context of previous states, allowing the agent to perform a deeper analysis and make decisions in a manner more akin to human learning. The benefit of the built-in replay buffer, which acts as an "internal demonstrator", is twofold: it accelerates learning and allows the algorithm to adapt to different tasks. Empirically, the algorithm outperforms the other considered methods by an average of 38.46% and the most successful baseline by 28.21%, showing higher success rates and faster convergence in the point-reaching task. Since control is performed through the robot's joints, the algorithm facilitates potential adaptation to a real robot system and the construction of an obstacle avoidance task. The algorithm has therefore also been tested on tasks requiring following a complex dynamic trajectory and obstacle avoidance. The design of the algorithm ensures its applicability to a wide range of goal-oriented tasks, making it an easily integrated solution for real-world robotics applications.
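For readers unfamiliar with the Hindsight Experience Replay idea that the abstract above builds on, the following minimal sketch shows the generic "final goal" relabeling step: a failed episode is stored a second time with its goal replaced by the state the agent actually reached, so the sparse reward becomes informative. This is plain HER, not CONTHER's modified buffer; the `Transition` type, the reward convention, and the toy episode are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[float, ...]


@dataclass
class Transition:
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State, tol: float = 1e-6) -> float:
    """Return 0 when the goal is reached and -1 otherwise (an assumed convention)."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, substituting the finally reached state as the goal."""
    hindsight_goal = episode[-1].next_state
    return [
        Transition(t.state, t.action, t.next_state, hindsight_goal,
                   sparse_reward(t.next_state, hindsight_goal))
        for t in episode
    ]


# A failed two-step episode: the agent aimed for (5, 5) but ended at (2, 0).
episode = [
    Transition((0.0, 0.0), 1, (1.0, 0.0), (5.0, 5.0), -1.0),
    Transition((1.0, 0.0), 1, (2.0, 0.0), (5.0, 5.0), -1.0),
]
replay_buffer = episode + relabel_with_final_goal(episode)
```

After relabeling, the last stored transition carries a reward of 0, so the buffer contains a successful trajectory even though the original goal was never reached.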
Abstract:Recent advancements in LLM pretraining have featured ever-expanding context windows to process longer sequences. However, our pilot study reveals that models pretrained with shorter context windows consistently outperform their long-context counterparts under a fixed token budget. This finding motivates us to explore an optimal context window scheduling strategy to better balance long-context capability with pretraining efficiency. To this end, we propose SkyLadder, a simple yet effective approach that implements a short-to-long context window transition. SkyLadder preserves strong standard benchmark performance, while matching or exceeding baseline results on long context tasks. Through extensive experiments, we pre-train 1B-parameter models (up to 32K context) and 3B-parameter models (8K context) on 100B tokens, demonstrating that SkyLadder yields consistent gains of up to 3.7% on common benchmarks, while achieving up to 22% faster training speeds compared to baselines. The code is at https://github.com/sail-sg/SkyLadder.
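To make the notion of a short-to-long context window transition concrete, here is a minimal sketch of one possible schedule. The abstract above does not specify the schedule's shape, so the linear ramp, the function name `context_window_at`, and every default value (`start_len`, `final_len`, `ramp_fraction`, `multiple`) are assumptions for illustration rather than SkyLadder's actual settings.

```python
def context_window_at(step, total_steps, start_len=32, final_len=8192,
                      ramp_fraction=0.6, multiple=32):
    """Return the context window length to use at a given training step.

    The window grows linearly from start_len to final_len over the first
    ramp_fraction of training, then stays at final_len (assumed design).
    """
    ramp_steps = max(1, int(total_steps * ramp_fraction))
    if step >= ramp_steps:
        return final_len
    raw = start_len + (final_len - start_len) * step / ramp_steps
    # Round down to a multiple so sequence lengths stay hardware friendly.
    rounded = int(raw // multiple) * multiple
    return max(start_len, min(final_len, rounded))


# Window lengths at a few points of a hypothetical 1000-step run.
schedule = [context_window_at(s, 1000) for s in (0, 150, 300, 600, 999)]
```

In a training loop, the returned length would determine how each fixed-size batch of tokens is split into sequences, so the token budget per step stays constant while the window grows.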
Abstract:Integrated sensing and communication (ISAC) is a pivotal enabler for next-generation wireless networks. A key challenge in ISAC systems lies in designing dual-functional waveforms that can achieve satisfactory radar sensing accuracy by effectively suppressing range-Doppler sidelobes. However, existing solutions are often computationally intensive, limiting their practicality in multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) ISAC deployments. This paper presents a novel low-complexity algorithm leveraging the augmented Lagrangian method (ALM) and Riemannian conjugate gradient (RCG) optimization techniques to address these challenges. The proposed algorithm achieves superior sidelobe suppression compared to state-of-the-art methods while dramatically reducing computational complexity, making it highly suitable for real-world MIMO-OFDM ISAC systems. Simulation results demonstrate that the proposed approach not only outperforms existing benchmarks in sidelobe reduction but also accelerates convergence, ensuring efficient performance across communication and sensing tasks.
Abstract:Water quality data can provide substantial decision support for water resource utilization and pollution prevention. However, such data contain numerous missing values due to unavoidable factors such as sensor failure, which biases hydrological analysis and undermines accurate support for environmental governance decisions. Latent Factorization of Tensors (LFT) with Stochastic Gradient Descent (SGD) has proven to be an efficient imputation method. However, a standard SGD-based LFT model commonly suffers from slow convergence, which impairs its efficiency. To tackle this issue, this paper proposes a Fast Latent Factorization of Tensors (FLFT) model. It builds an adjusted instance error into SGD by leveraging a nonlinear PID controller that incorporates the past, current, and future information of the prediction error to improve the convergence rate. Experiments on real-world datasets indicate that, compared with state-of-the-art models, the FLFT model achieves a better convergence rate and higher accuracy.
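The idea of feeding a PID-adjusted error into SGD can be sketched as follows. In a PID controller the proportional term reflects the current error, the integral term accumulates past errors, and the derivative term anticipates the error trend. The sketch uses a plain linear PID for simplicity, whereas the abstract above describes a nonlinear one; the class name `PIDError` and the gain values are assumptions for illustration.

```python
class PIDError:
    """Turn a raw instance error into a PID-adjusted error (linear PID sketch)."""

    def __init__(self, kp=1.0, ki=0.1, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0  # running sum of past errors
        self.prev = 0.0      # error seen at the previous update

    def adjust(self, error):
        self.integral += error
        derivative = error - self.prev  # trend of the error
        self.prev = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


# Two consecutive prediction errors for one (hypothetical) tensor entry.
pid = PIDError()
first = pid.adjust(2.0)
second = pid.adjust(1.0)
```

In an SGD-based factorization, the adjusted value would take the place of the raw error `error` in each latent-factor update, so updates react to the error history as well as its current size.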
Abstract:Cell-free massive multi-input multi-output (CF-mMIMO) systems have emerged as a promising paradigm for next-generation wireless communications, offering enhanced spectral efficiency and coverage through distributed antenna arrays. However, the non-linearity of power amplifiers (PAs) in these arrays introduces spatial distortion, which may significantly degrade system performance. This paper presents the first investigation of distortion-aware beamforming in a distributed framework tailored for CF-mMIMO systems, enabling pre-compensation for beam dispersion caused by nonlinear PA distortion. Using a third-order memoryless polynomial distortion model, the impact of the nonlinear PA on the performance of CF-mMIMO systems is first analyzed by evaluating the signal-to-interference-noise-and-distortion ratio (SINDR) at the user equipment (UE). Then, we develop two distributed distortion-aware beamforming designs based on ring topology and star topology, respectively. In particular, the ring-topology-based fully-distributed approach reduces interconnection costs and computational complexity, while the star-topology-based partially-distributed scheme leverages the superior computation capability of the central processor to achieve improved sum-rate performance. Extensive simulations demonstrate the effectiveness of the proposed distortion-aware beamforming designs in mitigating the effect of nonlinear PA distortion, while also reducing computational complexity and backhaul information exchange in CF-mMIMO systems.
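A third-order memoryless polynomial model of the kind mentioned in the abstract above is commonly written as y = β₁x + β₃x|x|², where x is the complex baseband input of the amplifier. The sketch below evaluates this form; the coefficient values are hypothetical and chosen only to show gain compression, and the abstract does not state the paper's coefficients.

```python
def pa_output(x: complex, beta1: complex = 1.0, beta3: complex = -0.05) -> complex:
    """Third-order memoryless polynomial PA model: y = beta1*x + beta3*x*|x|^2.

    beta1 is the linear gain and beta3 the third-order distortion
    coefficient (both values here are illustrative assumptions).
    """
    return beta1 * x + beta3 * x * abs(x) ** 2


# Effective gain |y|/|x| shrinks as the input amplitude grows (compression).
gain_small = abs(pa_output(1 + 0j)) / 1.0
gain_large = abs(pa_output(2 + 0j)) / 2.0
```

With these illustrative coefficients the effective gain drops from 0.95 at unit amplitude to 0.8 at amplitude 2, which is the kind of amplitude-dependent distortion that distortion-aware beamforming aims to pre-compensate.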
Abstract:Reconfigurable antennas possess the capability to dynamically adjust their fundamental operating characteristics, thereby enhancing system adaptability and performance. To fully exploit this flexibility in modern wireless communication systems, this paper considers a novel tri-hybrid beamforming architecture, which seamlessly integrates pattern-reconfigurable antennas with both analog and digital beamforming. The proposed tri-hybrid architecture operates across three layers: (i) a radiation beamformer in the electromagnetic (EM) domain for dynamic pattern alignment, (ii) an analog beamformer in the radio-frequency (RF) domain for array gain enhancement, and (iii) a digital beamformer in the baseband (BB) domain for multi-user interference mitigation. To establish a solid theoretical foundation, we first develop a comprehensive mathematical model for the tri-hybrid beamforming system and formulate the signal model for a multi-user multi-input single-output (MU-MISO) scenario. The optimization objective is to maximize the sum-rate while satisfying practical constraints. Given the challenges posed by high pilot overhead and computational complexity, we introduce an innovative tri-timescale beamforming framework, wherein the radiation beamformer is optimized over a long-timescale, the analog beamformer over a medium-timescale, and the digital beamformer over a short-timescale. This hierarchical strategy effectively balances performance and implementation feasibility. Simulation results validate the performance gains of the proposed tri-hybrid architecture and demonstrate that the tri-timescale design significantly reduces pilot overhead and computational complexity, highlighting its potential for future wireless communication systems.
Abstract:To date, hundreds of crashes have occurred in open road testing of automated vehicles (AVs), highlighting the need for improving AV reliability and safety. Pre-crash scenario typology classifies crashes based on vehicle dynamics and kinematics features. Building on this, characteristics analysis can identify similar features under comparable crashes, offering a more effective reflection of general crash patterns and providing more targeted recommendations for enhancing AV performance. However, current studies have primarily concentrated on crashes among conventional human-driven vehicles, leaving a gap in research dedicated to in-depth AV crash analyses. In this paper, we analyzed the latest California AV collision reports and used the newly revised pre-crash scenario typology to identify pre-crash scenarios. We proposed a set of mapping rules for automatically extracting these AV pre-crash scenarios, successfully identifying 24 types with a 98.1% accuracy rate, and obtained two key scenarios of AV crashes (i.e., rear-end scenarios and intersection scenarios) through detailed analysis. Association analyses of rear-end scenarios showed that the significant environmental influencing factors were traffic control type, location type, light, etc. For intersection scenarios prone to severe crashes with detailed descriptions, we employed causal analyses to obtain the significant causal factors: habitual violations and expectations of certain behavior. Optimization recommendations were then formulated, addressing both governmental oversight and AV manufacturers' potential improvements. The findings of this paper could guide government authorities to develop related regulations, help manufacturers design AV test scenarios, and identify potential shortcomings in control algorithms specific to various real-world scenarios, thereby optimizing AV systems effectively.
Abstract:Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.