Title: Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction

Mar 07, 2023

Martin Josifoski, Marija Sakota, Maxime Peyrard, Robert West

Abstract: Large language models (LLMs) show great potential for synthetic data generation. This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by the LLM: we show that, for problems with structured outputs, it is possible to prompt an LLM to perform the task in the opposite direction, to generate plausible text for the target structure. Leveraging the asymmetry in task difficulty makes it possible to produce large-scale, high-quality data for complex tasks. We demonstrate the effectiveness of this approach on closed information extraction, where collecting ground-truth data is challenging, and no satisfactory dataset exists to date. We synthetically generate a dataset of 1.8M data points, demonstrate its superior quality compared to existing datasets in a human evaluation and use it to finetune small models (220M and 770M parameters). The models we introduce, SynthIE, outperform existing baselines of comparable size with a substantial gap of 57 and 79 absolute points in micro and macro F1, respectively. Code, data, and models are available at https://github.com/epfl-dlab/SynthIE.
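The reverse-direction idea in the abstract can be illustrated with a minimal Python sketch. It is not taken from the SynthIE code base: the prompt wording, the function names `build_reverse_prompt`, `make_training_pair`, and `template_generator`, and the example triplet are all assumptions made for illustration. The sketch starts from a target structure (a list of subject-relation-object triplets), asks a text generator to write a passage expressing it, and stores the result as an (input text, target structure) pair for training a model on the forward extraction task. A trivial template function stands in for the LLM so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)


def build_reverse_prompt(triplets: List[Triplet]) -> str:
    """Render the target triplets as a prompt asking for text that expresses them.

    The prompt wording is a hypothetical example, not the prompt used in the paper.
    """
    facts = "\n".join(f"- ({s}; {r}; {o})" for s, r, o in triplets)
    return (
        "Write a short, fluent passage that expresses exactly the following facts:\n"
        f"{facts}\n"
        "Passage:"
    )


def make_training_pair(
    triplets: List[Triplet], generate: Callable[[str], str]
) -> Dict[str, object]:
    """Run the easy direction (structure -> text) and keep the pair
    as supervision for the hard direction (text -> structure)."""
    text = generate(build_reverse_prompt(triplets)).strip()
    return {"input": text, "target": triplets}


def template_generator(prompt: str) -> str:
    """Offline stand-in for an LLM call (assumption): turns each fact line
    of the prompt into a crude sentence. Breaks if an entity contains '; '."""
    sentences = []
    for line in prompt.splitlines():
        if line.startswith("- (") and line.endswith(")"):
            subject, relation, obj = line[3:-1].split("; ")
            sentences.append(f"{subject} {relation} {obj}.")
    return " ".join(sentences)


# Illustrative triplet, chosen only for this example.
pair = make_training_pair(
    [("Marie Curie", "award received", "Nobel Prize in Physics")],
    template_generator,
)
print(pair["input"])
print(pair["target"])
```

In practice `generate` would wrap a call to an actual LLM, and the sampled triplet sets would need to be coherent and cover the relation vocabulary of interest; both steps are outside the scope of this sketch.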
