Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Aladin Djuhera

When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets

Nov 14, 2025

Aladin Djuhera, Farhan Ahmed, Swanand Ravindra Kadhe, Syed Zawad, Heiko Ludwig, Holger Boche

Abstract:Aligning large language models (LLMs) is a central objective of post-training, often achieved through reward modeling and reinforcement learning methods. Among these, direct preference optimization (DPO) has emerged as a widely adopted technique that fine-tunes LLMs on preferred completions over less favorable ones. While most frontier LLMs do not disclose their curated preference pairs, the broader LLM community has released several open-source DPO datasets, including TuluDPO, ORPO, UltraFeedback, HelpSteer, and Code-Preference-Pairs. However, systematic comparisons remain scarce, largely due to the high computational cost and the lack of rich quality annotations, making it difficult to understand how preferences were selected, which task types they span, and how well they reflect human judgment on a per-sample level. In this work, we present the first comprehensive, data-centric analysis of popular open-source DPO corpora. We leverage the Magpie framework to annotate each sample for task category, input quality, and preference reward, a reward-model-based signal that validates the preference order without relying on human annotations. This enables a scalable, fine-grained inspection of preference quality across datasets, revealing structural and qualitative discrepancies in reward margins. Building on these insights, we systematically curate a new DPO mixture, UltraMix, that draws selectively from all five corpora while removing noisy or redundant samples. UltraMix is 30% smaller than the best-performing individual dataset yet exceeds its performance across key benchmarks. We publicly release all annotations, metadata, and our curated mixture to facilitate future research in data-centric preference optimization.

Via

Access Paper or Ask Questions

Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance

Jun 06, 2025

Aladin Djuhera, Swanand Ravindra Kadhe, Syed Zawad, Farhan Ahmed, Heiko Ludwig, Holger Boche

Abstract:Recent work on large language models (LLMs) has increasingly focused on post-training and alignment with datasets curated to enhance instruction following, world knowledge, and specialized skills. However, most post-training datasets used in leading open- and closed-source LLMs remain inaccessible to the public, with limited information about their construction process. This lack of transparency has motivated the recent development of open-source post-training corpora. While training on these open alternatives can yield performance comparable to that of leading models, systematic comparisons remain challenging due to the significant computational cost of conducting them rigorously at scale, and are therefore largely absent. As a result, it remains unclear how specific samples, task types, or curation strategies influence downstream performance when assessing data quality. In this work, we conduct the first comprehensive side-by-side analysis of two prominent open post-training datasets: Tulu-3-SFT-Mix and SmolTalk. Using the Magpie framework, we annotate each sample with detailed quality metrics, including turn structure (single-turn vs. multi-turn), task category, input quality, and response quality, and we derive statistics that reveal structural and qualitative similarities and differences between the two datasets. Based on these insights, we design a principled curation recipe that produces a new data mixture, TuluTalk, which contains 14% fewer samples than either source dataset while matching or exceeding their performance on key benchmarks. Our findings offer actionable insights for constructing more effective post-training datasets that improve model performance within practical resource limits. To support future research, we publicly release both the annotated source datasets and our curated TuluTalk mixture.

Via

Access Paper or Ask Questions

"Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation

Jun 04, 2025

Aladin Djuhera, Amin Seffo, Masataro Asai, Holger Boche

Abstract:Recent advancements in large language models (LLMs) have spurred interest in robotic navigation that incorporates complex spatial, mathematical, and conditional constraints from natural language into the planning problem. Such constraints can be informal yet highly complex, making it challenging to translate into a formal description that can be passed on to a planning algorithm. In this paper, we propose STPR, a constraint generation framework that uses LLMs to translate constraints (expressed as instructions on ``what not to do'') into executable Python functions. STPR leverages the LLM's strong coding capabilities to shift the problem description from language into structured and transparent code, thus circumventing complex reasoning and avoiding potential hallucinations. We show that these LLM-generated functions accurately describe even complex mathematical constraints, and apply them to point cloud representations with traditional search algorithms. Experiments in a simulated Gazebo environment show that STPR ensures full compliance across several constraints and scenarios, while having short runtimes. We also verify that STPR can be used with smaller, code-specific LLMs, making it applicable to a wide range of compact models at low inference cost.

* Preprint; under review

Via

Access Paper or Ask Questions

SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

Mar 21, 2025

Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche

Figure 1 for SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

Figure 2 for SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

Figure 3 for SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

Figure 4 for SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

Abstract:Fine-tuning large language models (LLMs) on downstream tasks can inadvertently erode their safety alignment, even for benign fine-tuning datasets. We address this challenge by proposing SafeMERGE, a post-fine-tuning framework that preserves safety while maintaining task utility. It achieves this by selectively merging fine-tuned and safety-aligned model layers only when those deviate from safe behavior, measured by a cosine similarity criterion. We evaluate SafeMERGE against other fine-tuning- and post-fine-tuning-stage approaches for Llama-2-7B-Chat and Qwen-2-7B-Instruct models on GSM8K and PubMedQA tasks while exploring different merging strategies. We find that SafeMERGE consistently reduces harmful outputs compared to other baselines without significantly sacrificing performance, sometimes even enhancing it. The results suggest that our selective, subspace-guided, and per-layer merging method provides an effective safeguard against the inadvertent loss of safety in fine-tuned LLMs while outperforming simpler post-fine-tuning-stage defenses.

* ICLR 2025 Workshop on Building Trust in Language Models and Applications

Via

Access Paper or Ask Questions

SCoTT: Wireless-Aware Path Planning with Vision Language Models and Strategic Chains-of-Thought

Nov 27, 2024

Aladin Djuhera, Vlad C. Andrei, Amin Seffo, Holger Boche, Walid Saad

Figure 1 for SCoTT: Wireless-Aware Path Planning with Vision Language Models and Strategic Chains-of-Thought

Figure 2 for SCoTT: Wireless-Aware Path Planning with Vision Language Models and Strategic Chains-of-Thought

Figure 3 for SCoTT: Wireless-Aware Path Planning with Vision Language Models and Strategic Chains-of-Thought

Figure 4 for SCoTT: Wireless-Aware Path Planning with Vision Language Models and Strategic Chains-of-Thought

Abstract:Path planning is a complex problem for many practical applications, particularly in robotics. Existing algorithms, however, are exhaustive in nature and become increasingly complex when additional side constraints are incorporated alongside distance minimization. In this paper, a novel approach using vision language models (VLMs) is proposed for enabling path planning in complex wireless-aware environments. To this end, insights from a digital twin (DT) with real-world wireless ray tracing data are explored in order to guarantee an average path gain threshold while minimizing the trajectory length. First, traditional approaches such as A* are compared to several wireless-aware extensions, and an optimal iterative dynamic programming approach (DP-WA*) is derived, which fully takes into account all path gains and distance metrics within the DT. On the basis of these baselines, the role of VLMs as an alternative assistant for path planning is investigated, and a strategic chain-of-thought tasking (SCoTT) approach is proposed. SCoTT divides the complex planning task into several subproblems and solves each with advanced CoT prompting. Results show that SCoTT achieves very close average path gains compared to DP-WA* while at the same time yielding consistently shorter path lengths. The results also show that VLMs can be used to accelerate DP-WA* by efficiently reducing the algorithm's search space and thus saving up to 62\% in execution time. This work underscores the potential of VLMs in future digital systems as capable assistants for solving complex tasks, while enhancing user interaction and accelerating rapid prototyping under diverse wireless constraints.

Via

Access Paper or Ask Questions

R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge

Nov 27, 2024

Aladin Djuhera, Vlad C. Andrei, Mohsen Pourghasemian, Haris Gacanin, Holger Boche, Walid Saad

Figure 1 for R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge

Figure 2 for R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge

Figure 3 for R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge

Figure 4 for R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge

Abstract:Multi-task large language models (MTLLMs) are important for many applications at the wireless edge, where users demand specialized models to handle multiple tasks efficiently. However, training MTLLMs is complex and exhaustive, particularly when tasks are subject to change. Recently, the concept of model fusion via task vectors has emerged as an efficient approach for combining fine-tuning parameters to produce an MTLLM. In this paper, the problem of enabling edge users to collaboratively craft such MTTLMs via tasks vectors is studied, under the assumption of worst-case adversarial attacks. To this end, first the influence of adversarial noise to multi-task model fusion is investigated and a relationship between the so-called weight disentanglement error and the mean squared error (MSE) is derived. Using hypothesis testing, it is directly shown that the MSE increases interference between task vectors, thereby rendering model fusion ineffective. Then, a novel resilient MTLLM fusion (R-MTLLMF) is proposed, which leverages insights about the LLM architecture and fine-tuning process to safeguard task vector aggregation under adversarial noise by realigning the MTLLM. The proposed R-MTLLMF is then compared for both worst-case and ideal transmission scenarios to study the impact of the wireless channel. Extensive model fusion experiments with vision LLMs demonstrate R-MTLLMF's effectiveness, achieving close-to-baseline performance across eight different tasks in ideal noise scenarios and significantly outperforming unprotected model fusion in worst-case scenarios. The results further advocate for additional physical layer protection for a holistic approach to resilience, from both a wireless and LLM perspective.

Via

Access Paper or Ask Questions

R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models

Jul 16, 2024

Aladin Djuhera, Vlad C. Andrei, Xinyang Li, Ullrich J. Mönich, Holger Boche, Walid Saad

Figure 1 for R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models

Figure 2 for R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models

Figure 3 for R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models

Figure 4 for R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models

Abstract:Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML), where components of large ML models are outsourced to remote servers. A significant challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming that could jeopardize the learning process. This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding. In this paper, rigorous insights are provided into the influence of jamming LLM word embeddings in SFL by deriving an expression for the ML training loss divergence and showing that it is upper-bounded by the mean squared error (MSE). Based on this analysis, a physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks. R-SFLLM leverages wireless sensing data to gather information on the jamming directions-of-arrival (DoAs) for the purpose of devising a novel, sensing-assisted anti-jamming strategy while jointly optimizing beamforming, user scheduling, and resource allocation. Extensive experiments using BERT and RoBERTa models demonstrate R-SFLLM's effectiveness, achieving close-to-baseline performance across various natural language processing (NLP) tasks and datasets. The proposed methodology further introduces an adversarial training component, where controlled noise exposure significantly enhances the LLM's resilience to perturbed parameters during training. The results show that more noise-sensitive models, such as RoBERTa, benefit from this feature, especially when resource allocation is unfair. It is also shown that worst-case jamming in particular translates into worst-case model outcomes, thereby necessitating the need for jamming-resilient SFL protocols.

Via

Access Paper or Ask Questions

Resilient-By-Design Framework for MIMO-OFDM Communications under Smart Jamming

Apr 16, 2024

Vlad C. Andrei, Aladin Djuhera, Xinyang Li, Ullrich J. Mönich, Holger Boche, Walid Saad

Figure 1 for Resilient-By-Design Framework for MIMO-OFDM Communications under Smart Jamming

Figure 2 for Resilient-By-Design Framework for MIMO-OFDM Communications under Smart Jamming

Abstract:Native jamming mitigation is essential for addressing security and resilience in future 6G wireless networks. In this paper a resilient-by-design framework for effective anti-jamming in MIMO-OFDM wireless communications is introduced. A novel approach that integrates information from wireless sensing services to develop anti-jamming strategies, which do not rely on any prior information or assumptions on the adversary's concrete setup, is explored. To this end, a method that replaces conventional approaches to noise covariance estimation in anti-jamming with a surrogate covariance model is proposed, which instead incorporates sensing information on the jamming signal's directions-of-arrival (DoAs) to provide an effective approximation of the true jamming strategy. The study further focuses on integrating this novel, sensing-assisted approach into the joint optimization of beamforming, user scheduling and power allocation for a multi-user MIMO-OFDM uplink setting. Despite the NP-hard nature of this optimization problem, it can be effectively solved using an iterative water-filling approach. In order to assess the effectiveness of the proposed sensing-assisted jamming mitigation, the corresponding worst-case jamming strategy is investigated, which aims to minimize the total user sum-rate. Experimental simulations eventually affirm the robustness of our approach against both worst-case and barrage jamming, demonstrating its potential to address a wide range of jamming scenarios. Since such an integration of sensing-assisted information is directly implemented on the physical layer, resilience is incorporated preemptively by-design.

* accepted to IEEE International Conference on Communications, 2nd Workshop on Enabling Security, Trust, and Privacy in 6G Wireless Systems

Via

Access Paper or Ask Questions

An Analysis of Capacity-Distortion Trade-Offs in Memoryless ISAC Systems

Feb 26, 2024

Xinyang Li, Vlad C. Andrei, Aladin Djuhera, Ullrich J. Mönich, Holger Boche

Figure 1 for An Analysis of Capacity-Distortion Trade-Offs in Memoryless ISAC Systems

Figure 2 for An Analysis of Capacity-Distortion Trade-Offs in Memoryless ISAC Systems

Figure 3 for An Analysis of Capacity-Distortion Trade-Offs in Memoryless ISAC Systems

Figure 4 for An Analysis of Capacity-Distortion Trade-Offs in Memoryless ISAC Systems

Abstract:This manuscript investigates the information-theoretic limits of integrated sensing and communications (ISAC), aiming for simultaneous reliable communication and precise channel state estimation. We model such a system with a state-dependent discrete memoryless channel (SD-DMC) with present or absent channel feedback and generalized side information at the transmitter and the receiver, where the joint task of message decoding and state estimation is performed at the receiver. The relationship between the achievable communication rate and estimation error, the capacity-distortion (C-D) trade-off, is characterized across different causality levels of the side information. This framework is shown to be capable of modeling various practical scenarios by assigning the side information with different meanings, including monostatic and bistatic radar systems. The analysis is then extended to the two-user degraded broadcast channel, and we derive an achievable C-D region that is tight under certain conditions. To solve the optimization problem arising in the computation of C-D functions/regions, we propose a proximal block coordinate descent (BCD) method, prove its convergence to a stationary point, and derive a stopping criterion. Finally, several representative examples are studied to demonstrate the versatility of our framework and the effectiveness of the proposed algorithm.

Via

Access Paper or Ask Questions