Benjamin Matthias Ruppik

Post-Training Large Language Models via Reinforcement Learning from Self-Feedback

Jul 29, 2025

Carel van Niekerk, Renato Vukovic, Benjamin Matthias Ruppik, Hsien-chin Lin, Milica Gašić

Abstract: Large Language Models (LLMs) often produce plausible but poorly-calibrated answers, limiting their reliability on reasoning-intensive tasks. We present Reinforcement Learning from Self-Feedback (RLSF), a post-training stage that uses the model's own confidence as an intrinsic reward, mimicking how humans learn in the absence of external feedback. After a frozen LLM generates several chain-of-thought solutions, we define and compute the confidence of each final answer span and rank the traces accordingly. These synthetic preferences are then used to fine-tune the policy with standard preference optimization, similar to RLHF yet requiring no human labels, gold answers, or externally curated rewards. RLSF simultaneously (i) refines the model's probability estimates -- restoring well-behaved calibration -- and (ii) strengthens step-by-step reasoning, yielding improved performance on arithmetic reasoning and multiple-choice question answering. By turning a model's own uncertainty into useful self-feedback, RLSF affirms reinforcement learning on intrinsic model behaviour as a principled and data-efficient component of the LLM post-training pipeline and warrants further research in intrinsic rewards for LLM post-training.
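
The ranking step described in this abstract can be illustrated with a minimal sketch. The abstract does not state how confidence is defined, so the measure used here (the mean gap between the top-1 and top-2 token probabilities over the answer span) is an assumption, and the names `answer_confidence`, `build_preference_pairs`, and `min_gap` are hypothetical. The sketch only shows how sampled traces could be turned into (chosen, rejected) pairs for a preference optimizer; it is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# One probability distribution (over candidate tokens) per answer-span position.
TokenProbs = Sequence[Sequence[float]]


def answer_confidence(answer_token_probs: TokenProbs) -> float:
    """Assumed confidence measure: mean (top-1 minus top-2) probability gap.

    Each inner sequence must hold at least two probabilities.
    """
    gaps = []
    for probs in answer_token_probs:
        ranked = sorted(probs, reverse=True)
        gaps.append(ranked[0] - ranked[1])
    return sum(gaps) / len(gaps)


def build_preference_pairs(
    traces: Sequence[Tuple[str, TokenProbs]],
    min_gap: float = 0.0,
) -> List[Tuple[str, str]]:
    """Rank traces by confidence and emit (chosen, rejected) text pairs.

    A pair is kept only if the confidence difference exceeds `min_gap`,
    so traces with tied confidence yield no preference.
    """
    scored = sorted(
        ((answer_confidence(probs), text) for text, probs in traces),
        key=lambda item: item[0],
        reverse=True,
    )
    pairs = []
    for i, (hi_conf, chosen) in enumerate(scored):
        for lo_conf, rejected in scored[i + 1:]:
            if hi_conf - lo_conf > min_gap:
                pairs.append((chosen, rejected))
    return pairs
```

Pairs produced this way would be passed to a standard preference-optimization trainer; which trainer is used, and how many traces are sampled per prompt, are not specified in the abstract.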

Learning from Noisy Labels via Self-Taught On-the-Fly Meta Loss Rescaling

Dec 17, 2024

Michael Heck, Christian Geishauser, Nurul Lubis, Carel van Niekerk, Shutong Feng, Hsien-Chin Lin, Benjamin Matthias Ruppik, Renato Vukovic, Milica Gašić

Abstract: Correct labels are indispensable for training effective machine learning models. However, creating high-quality labels is expensive, and even professionally labeled data contains errors and ambiguities. Filtering and denoising can be applied to curate labeled data prior to training, at the cost of additional processing and loss of information. An alternative is on-the-fly sample reweighting during the training process to decrease the negative impact of incorrect or ambiguous labels, but this typically requires clean seed data. In this work we propose unsupervised on-the-fly meta loss rescaling to reweight training samples. Crucially, we rely only on features provided by the model being trained, to learn a rescaling function in real time without knowledge of the true clean data distribution. We achieve this via a novel meta learning setup that samples validation data for the meta update directly from the noisy training corpus by employing the rescaling function being trained. Our proposed method consistently improves performance across various NLP tasks with minimal computational overhead. Further, we are among the first to attempt on-the-fly training data reweighting on the challenging task of dialogue modeling, where noisy and ambiguous labels are common. Our strategy is robust in the face of noisy and clean data, handles class imbalance, and prevents overfitting to noisy labels. Our self-taught loss rescaling improves as the model trains, showing the ability to keep learning from the model's own signals. As training progresses, the impact of correctly labeled data is scaled up, while the impact of wrongly labeled data is suppressed.

* 10 pages, 3 figures, accepted at AAAI'25

Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction

Aug 07, 2024

Benjamin Matthias Ruppik, Michael Heck, Carel van Niekerk, Renato Vukovic, Hsien-chin Lin, Shutong Feng, Marcus Zibrowius, Milica Gašić

Abstract: A common approach for sequence tagging tasks based on contextual word representations is to train a machine learning classifier directly on these embedding vectors. This approach has two shortcomings. First, such methods consider single input sequences in isolation and are unable to put an individual embedding vector in relation to vectors outside the current local context of use. Second, the high performance of these models relies on fine-tuning the embedding model in conjunction with the classifier, which may not always be feasible due to the size or inaccessibility of the underlying feature-generation model. It is thus desirable, given a collection of embedding vectors of a corpus, i.e., a datastore, to find features of each vector that describe its relation to other, similar vectors in the datastore. With this in mind, we introduce complexity measures of the local topology of the latent space of a contextual language model with respect to a given datastore. The effectiveness of our features is demonstrated through their application to dialogue term extraction. Our work continues a line of research that explores the manifold hypothesis for word embeddings, demonstrating that local structure in the space carved out by word embeddings can be exploited to infer semantic properties.

* Accepted as a long paper to SIGDIAL 2024. 9 pages, 2 figures, 3 tables

Dialogue Ontology Relation Extraction via Constrained Chain-of-Thought Decoding

Aug 05, 2024

Renato Vukovic, David Arps, Carel van Niekerk, Benjamin Matthias Ruppik, Hsien-Chin Lin, Michael Heck, Milica Gašić

Abstract: State-of-the-art task-oriented dialogue systems typically rely on task-specific ontologies for fulfilling user queries. The majority of task-oriented dialogue data, such as customer service recordings, comes without ontology and annotation. Such ontologies are normally built manually, limiting the application of specialised systems. Dialogue ontology construction is an approach for automating that process and typically consists of two steps: term extraction and relation extraction. In this work, we focus on relation extraction in a transfer learning set-up. To improve the generalisation, we propose an extension to the decoding mechanism of large language models. We adapt Chain-of-Thought (CoT) decoding, recently developed for reasoning problems, to generative relation extraction. Here, we generate multiple branches in the decoding space and select the relations based on a confidence threshold. By constraining the decoding to ontology terms and relations, we aim to decrease the risk of hallucination. We conduct extensive experimentation on two widely used datasets and find improvements in performance on target ontology for source fine-tuned and one-shot prompted large language models.

* Accepted to appear at SIGDIAL 2024. 9 pages, 4 figures
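
The selection step in this abstract (several decoding branches, a confidence threshold, and a restriction to ontology terms and relations) can be sketched roughly as follows. This is a simplified stand-in, not the paper's method: the paper constrains the decoding itself, whereas the sketch filters already-decoded candidates after the fact. The function name `select_relations`, the triple format, the max-over-branches aggregation, and the default threshold are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head term, relation type, tail term)


def select_relations(
    branches: Iterable[Tuple[Triple, float]],
    terms: Set[str],
    relation_types: Set[str],
    threshold: float = 0.5,  # hypothetical default, not from the paper
) -> List[Triple]:
    """Keep in-ontology triples whose best branch confidence meets the threshold."""
    best: Dict[Triple, float] = {}
    for (head, relation, tail), confidence in branches:
        # Post-hoc stand-in for constrained decoding: drop anything
        # that falls outside the known terms or relation types.
        if head not in terms or tail not in terms:
            continue
        if relation not in relation_types:
            continue
        key = (head, relation, tail)
        # Assumed aggregation: keep the highest confidence over all branches.
        best[key] = max(confidence, best.get(key, 0.0))
    return sorted(key for key, conf in best.items() if conf >= threshold)
```

For example, a triple predicted in two branches with confidences 0.4 and 0.8 would pass a 0.5 threshold under this aggregation, while a high-confidence triple that mentions a term outside the ontology would be discarded.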

Dialogue Term Extraction using Transfer Learning and Topological Data Analysis

Aug 22, 2022

Renato Vukovic, Michael Heck, Benjamin Matthias Ruppik, Carel van Niekerk, Marcus Zibrowius, Milica Gašić

Abstract: Goal oriented dialogue systems were originally designed as a natural language interface to a fixed data-set of entities that users might inquire about, further described by domain, slots, and values. As we move towards adaptable dialogue systems where knowledge about domains, slots, and values may change, there is an increasing need to automatically extract these terms from raw dialogues or related non-dialogue data on a large scale. In this paper, we take an important step in this direction by exploring different features that can enable systems to discover realizations of domains, slots, and values in dialogues in a purely data-driven fashion. The features that we examine stem from word embeddings, language modelling features, as well as topological features of the word embedding space. To examine the utility of each feature set, we train a seed model based on the widely used MultiWOZ data-set. Then, we apply this model to a different corpus, the Schema-Guided Dialogue data-set. Our method outperforms the previously proposed approach that relies solely on word embeddings. We also demonstrate that each of the features is responsible for discovering different kinds of content. We believe our results warrant further research towards ontology induction, and continued harnessing of topological data analysis for dialogue and natural language processing research.

* Accepted as a long paper to SIGDIAL 2022 (Edinburgh)
