Farrin Marouf Sofian

Parallel Token Prediction for Language Models

Dec 24, 2025

Felix Draxler, Justus Will, Farrin Marouf Sofian, Theofanis Karaletsos, Sameer Singh, Stephan Mandt

Abstract: We propose Parallel Token Prediction (PTP), a universal framework for parallel sequence generation in language models. PTP jointly predicts multiple dependent tokens in a single transformer call by incorporating the sampling procedure into the model. This reduces the latency bottleneck of autoregressive decoding, and avoids the restrictive independence assumptions common in existing multi-token prediction methods. We prove that PTP can represent arbitrary autoregressive sequence distributions. PTP is trained either by distilling an existing model or through inverse autoregressive training without a teacher. Experimentally, we achieve state-of-the-art speculative decoding performance on Vicuna-7B by accepting over four tokens per step on Spec-Bench. The universality of our framework indicates that parallel generation of long sequences is feasible without loss of modeling power.

* Preprint. Under review
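
The "tokens accepted per step" figure in the abstract above refers to speculative decoding, where a fast drafter proposes several tokens and a verifier model keeps the longest prefix it agrees with. The following is a minimal sketch of that verification step under greedy decoding. It is a generic illustration, not PTP's drafting mechanism, and the function name `verify_step` and the token ids are hypothetical.

```python
from typing import List


def verify_step(draft: List[int], target: List[int]) -> List[int]:
    """Greedy speculative-decoding verification for one step (illustrative).

    draft:  k token ids proposed by the fast drafter.
    target: k + 1 token ids the verifier would pick greedily at the same
            positions (one extra for the position after the last draft token).
    Returns the tokens emitted this step: the matching draft prefix plus
    one verifier token (a correction on mismatch, or a bonus token).
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly one more token than draft")
    accepted = 0
    for d, t in zip(draft, target):
        if d != t:
            break  # first disagreement ends the accepted prefix
        accepted += 1
    return target[: accepted + 1]


if __name__ == "__main__":
    # Two draft tokens match; the verifier's token replaces the third.
    print(verify_step([5, 7, 9, 2], [5, 7, 1, 2, 4]))
```

The more draft tokens survive verification, the fewer verifier calls are needed per generated token. How a given benchmark counts accepted tokens (for example, whether the extra verifier token is included) is not stated in the abstract.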

Variational Control for Guidance in Diffusion Models

Feb 06, 2025

Kushagra Pandey, Farrin Marouf Sofian, Felix Draxler, Theofanis Karaletsos, Stephan Mandt

Abstract: Diffusion models exhibit excellent sample quality, but existing guidance methods often require additional model training or are limited to specific tasks. We revisit guidance in diffusion models from the perspective of variational inference and control, introducing Diffusion Trajectory Matching (DTM), which enables guiding pretrained diffusion trajectories to satisfy a terminal cost. DTM unifies a broad class of guidance methods and enables novel instantiations. We introduce a new method within this framework that achieves state-of-the-art results on several linear and (blind) non-linear inverse problems without requiring additional model training or modifications. For instance, in ImageNet non-linear deblurring, our model achieves an FID score of 34.31, significantly improving over the best pretrained-method baseline (FID 78.07). We will make the code available in a future update.

* 8 pages in main text. Total of 20 pages

SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models

May 01, 2024

Burak Can Biner, Farrin Marouf Sofian, Umur Berkay Karakaş, Duygu Ceylan, Erkut Erdem, Aykut Erdem

Abstract: We are witnessing a revolution in conditional image synthesis with the recent success of large-scale text-to-image generation methods. This success also opens up new opportunities in controlling the generation and editing process using multi-modal input. While spatial control using cues such as depth, sketch, and other images has attracted a lot of research, we argue that another equally effective modality is audio, since sound and sight are two main components of human perception. Hence, we propose a method to enable audio conditioning in large-scale image diffusion models. Our method first maps features obtained from audio clips to tokens that can be injected into the diffusion model in a fashion similar to text tokens. We introduce additional audio-image cross-attention layers, which we finetune while freezing the weights of the original layers of the diffusion model. In addition to audio-conditioned image generation, our method can also be utilized in conjunction with diffusion-based editing methods to enable audio-conditioned image editing. We demonstrate our method on a wide range of audio and image datasets. We perform extensive comparisons with recent methods and show favorable performance.

GECTurk: Grammatical Error Correction and Detection Dataset for Turkish

Sep 20, 2023

Atakan Kara, Farrin Marouf Sofian, Andrew Bond, Gözde Gül Şahin

Abstract: Grammatical Error Detection and Correction (GEC) tools have proven useful for native speakers and second language learners. Developing such tools requires a large amount of parallel, annotated data, which is unavailable for most languages. Synthetic data generation is a common practice to overcome the scarcity of such data. However, it is not straightforward for morphologically rich languages like Turkish due to complex writing rules that require phonological, morphological, and syntactic information. In this work, we present a flexible and extensible synthetic data generation pipeline for Turkish covering more than 20 expert-curated grammar and spelling rules (a.k.a. writing rules) implemented through complex transformation functions. Using this pipeline, we derive 130,000 high-quality parallel sentences from professionally edited articles. Additionally, we create a more realistic test set by manually annotating a set of movie reviews. We implement three baselines formulating the task as i) neural machine translation, ii) sequence tagging, and iii) prefix tuning with a pretrained decoder-only model, achieving strong results. Furthermore, we perform exhaustive experiments on out-of-domain datasets to gain insights into the transferability and robustness of the proposed approaches. Our results suggest that our corpus, GECTurk, is high-quality and allows knowledge transfer for the out-of-domain setting. To encourage further research on Turkish GEC, we release our datasets, baseline models, and the synthetic data generation pipeline at https://github.com/GGLAB-KU/gecturk.

* Accepted at Findings of IJCNLP-AACL 2023
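
The synthetic data generation idea in the abstract above (apply a transformation function that breaks a writing rule in a clean sentence, then pair the corrupted sentence with the original) can be sketched as follows. This is a simplified illustration and not the released pipeline: the function names are hypothetical, and the single rule shown (the Turkish conjunction "de/da" is written as a separate word, so attaching it to the preceding word is an error) is chosen as an example without claiming it matches the authors' implementation.

```python
import random
from typing import List, Tuple

# Conjunction forms that must be written as separate words.
CLITICS = {"de", "da"}


def attach_clitic(tokens: List[str], rng: random.Random, p: float = 1.0) -> List[str]:
    """Inject an error by gluing a separate 'de/da' onto the previous token."""
    out: List[str] = []
    for tok in tokens:
        if tok.lower() in CLITICS and out and rng.random() < p:
            out[-1] = out[-1] + tok  # e.g. "Ben" + "de" -> "Bende"
        else:
            out.append(tok)
    return out


def make_pair(sentence: str, rng: random.Random, p: float = 1.0) -> Tuple[str, str]:
    """Return (corrupted, correct) as one parallel training example."""
    corrupted = " ".join(attach_clitic(sentence.split(), rng, p))
    return corrupted, sentence


if __name__ == "__main__":
    print(make_pair("Ben de geldim", random.Random(0)))
```

A real rule would need morphological and syntactic information to decide where the error is plausible, which is the difficulty the abstract points out. The repository linked in the abstract is the authoritative implementation.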
