Victor Ye Dong

Greedy Information Projection for LLM Data Selection

Mar 14, 2026

Victor Ye Dong, Kuan-Yun Lee, Jiamei Shuai, Shengfei Liu, Yi Liu, Jian Jiao

Abstract: We present Greedy Information Projection (GIP), a principled framework for choosing training examples for large language model fine-tuning. GIP casts selection as maximizing mutual information between a subset of examples and task-specific query signals, which may originate from LLM quality judgments, metadata, or other sources. The framework involves optimizing a closed-form mutual information objective defined using both data and query embeddings, naturally balancing quality and diversity. Optimizing this score is equivalent to maximizing the projection of the query embedding matrix onto the span of the selected data, which provides a geometric explanation for the co-emergence of quality and diversity. Building on this view, we employ a fast greedy matching-pursuit procedure with efficient projection-based updates. On instruction-following and mathematical reasoning datasets, GIP selects small subsets that match full-data fine-tuning while using only a fraction of examples and compute, unifying quality-aware and diversity-aware selection for efficient fine-tuning.

* Published as a paper at 3rd DATA-FM workshop @ ICLR 2026, Brazil
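
The geometric view in the abstract above (pick examples whose span captures as much of the query embedding matrix as possible) can be illustrated with a small matching-pursuit-style loop. This is only a sketch under assumptions, not the authors' implementation: the function name `greedy_projection_select`, the plain squared-Frobenius projection score, and the deflation update are all illustrative choices, and the paper's actual mutual information objective may include terms not shown here.

```python
import numpy as np

def greedy_projection_select(X, Q, k, eps=1e-10):
    """Greedily pick up to k rows of X whose span captures the most energy of Q.

    X: (n, d) data embeddings; Q: (m, d) query embeddings.
    Hypothetical helper for illustration; not taken from the GIP paper.
    """
    X_res = np.array(X, dtype=float)      # each row's residual w.r.t. the selected span
    Q = np.asarray(Q, dtype=float)
    available = np.ones(X_res.shape[0], dtype=bool)
    selected, captured = [], 0.0
    for _ in range(k):
        norms_sq = np.einsum("ij,ij->i", X_res, X_res)
        proj = X_res @ Q.T                # (n, m) inner products with every query
        # Gain of adding row i: ||Q r_i||^2 / ||r_i||^2, with r_i the residual row.
        gains = np.einsum("ij,ij->i", proj, proj) / np.maximum(norms_sq, eps)
        gains[~available | (norms_sq < eps)] = -np.inf
        best = int(np.argmax(gains))
        if not np.isfinite(gains[best]):
            break                         # nothing left that adds a new direction
        selected.append(best)
        available[best] = False
        captured += gains[best]
        # Projection-based update: remove the new direction from every residual.
        b = X_res[best] / np.sqrt(norms_sq[best])
        X_res -= np.outer(X_res @ b, b)
    return selected, captured
```

In this sketch, a candidate that nearly duplicates an already selected example has a small residual component along the query directions and therefore a low gain, which is one way to see how a projection score can reward quality and diversity together.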

Semi-Offline Reinforcement Learning for Optimized Text Generation

Jun 16, 2023

Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan

Abstract: In reinforcement learning (RL), there are two major settings for interacting with the environment: online and offline. Online methods explore the environment at significant time cost, and offline methods efficiently obtain reward signals by sacrificing exploration capability. We propose semi-offline RL, a novel paradigm that transitions smoothly from offline to online settings, balances exploration capability and training cost, and provides a theoretical foundation for comparing different RL settings. Based on the semi-offline formulation, we present the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound. Extensive experiments show that our semi-offline approach is efficient and performs comparably to, and often better than, state-of-the-art methods.

* In Proceedings of the 40th International Conference on Machine Learning (ICML 2023)

FAST: Improving Controllability for Text Generation with Feedback Aware Self-Training

Oct 06, 2022

Junyi Chai, Reid Pryzant, Victor Ye Dong, Konstantin Golobokov, Chenguang Zhu, Yi Liu

Abstract: Controllable text generation systems often leverage control codes to direct various properties of the output, such as style and length. Inspired by recent work on causal inference for NLP, this paper reveals a previously overlooked flaw in these control code-based conditional text generation algorithms. Spurious correlations in the training data can lead models to incorrectly rely on parts of the input other than the control code for attribute selection, significantly undermining downstream generation quality and controllability. We demonstrate the severity of this issue with a series of case studies and then propose two simple techniques to reduce these correlations in training sets. The first technique is based on resampling the data according to an example's propensity towards each linguistic attribute (IPS). The second produces multiple counterfactual versions of each example and then uses an additional feedback mechanism to remove noisy examples (feedback aware self-training, FAST). We evaluate on three tasks (news headline, meta review, and search ads generation) and demonstrate that FAST can significantly improve the controllability and language quality of generated outputs when compared to state-of-the-art controllable text generation approaches.
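
The first technique in the abstract above, propensity-based resampling, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's procedure: the helper name `ips_resample`, the inverse-propensity weighting, and the clipping threshold are hypothetical, and the propensities are assumed to have been estimated beforehand by some separate model.

```python
import numpy as np

def ips_resample(propensities, n_samples, rng=None, clip=1e-3):
    """Resample example indices with probability proportional to 1 / propensity.

    propensities[i] is an assumed, pre-computed estimate of how likely example i's
    input is to carry its observed attribute. Hypothetical helper for illustration.
    """
    rng = np.random.default_rng(rng)
    # Clip to avoid a few tiny propensities dominating the resampled set.
    p = np.clip(np.asarray(propensities, dtype=float), clip, 1.0)
    weights = 1.0 / p
    probs = weights / weights.sum()
    idx = rng.choice(len(p), size=n_samples, replace=True, p=probs)
    return idx, probs
```

Examples whose attribute is highly predictable from the input alone are drawn less often, which weakens the correlation between the input and the attribute in the resampled training set.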

DeepGen: Diverse Search Ad Generation and Real-Time Customization

Aug 06, 2022

Konstantin Golobokov, Junyi Chai, Victor Ye Dong, Mandy Gu, Bingyu Chi, Jie Cao, Yulan Yan, Yi Liu

Abstract: We present DeepGen, a system deployed at web scale for automatically creating sponsored search advertisements (ads) for BingAds customers. We leverage state-of-the-art natural language generation (NLG) models to generate fluent ads from advertisers' web pages in an abstractive fashion and solve practical issues such as factuality and inference speed. In addition, our system creates a customized ad in real time in response to the user's search query, thereby highlighting different aspects of the same product based on what the user is looking for. To achieve this, our system generates a diverse choice of smaller pieces of the ad ahead of time and, at query time, selects the most relevant ones to be stitched into a complete ad. We improve generation diversity by training a controllable NLG model to generate multiple ads for the same web page highlighting different selling points. Our system design further improves diversity horizontally by first running an ensemble of generation models trained with different objectives and then using a diversity sampling algorithm to pick a diverse subset of generation results for online selection. Experimental results show the effectiveness of our proposed system design. Our system is currently deployed in production, serving approximately 4% of global ads served in Bing.
