Zhijiang Guo

Long$^2$RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall

Oct 31, 2024

The Automated Verification of Textual Claims (AVeriTeC) Shared Task

Oct 31, 2024

Effi-Code: Unleashing Code Efficiency in Language Models

Oct 14, 2024

FormalAlign: Automated Alignment Evaluation for Autoformalization

Oct 14, 2024

Aligning with Logic: Measuring, Evaluating and Improving Logical Consistency in Large Language Models

Oct 05, 2024

UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference

Oct 04, 2024

Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling

Oct 03, 2024

Measuring, Evaluating and Improving Logical Consistency in Large Language Models

Oct 03, 2024

DebateQA: Evaluating Question Answering on Debatable Knowledge

Aug 02, 2024

MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

Jun 20, 2024