Dragomir Radev

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

Oct 11, 2024
Via arXiv

modeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models

Jun 24, 2024
Via arXiv

MedGen: A Python Natural Language Processing Toolkit for Medical Text Processing

Nov 28, 2023
Via arXiv

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

Nov 15, 2023
Via arXiv

Fair Abstractive Summarization of Diverse Perspectives

Nov 14, 2023
Via arXiv

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

Oct 02, 2023
Via arXiv

RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations

Jun 25, 2023
Via arXiv

bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark

Jun 07, 2023
Via arXiv

On Learning to Summarize with Large Language Models as References

May 23, 2023
Via arXiv

QTSumm: A New Benchmark for Query-Focused Table Summarization

May 23, 2023
Via arXiv