Arman Cohan

FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents

Nov 08, 2024
Via arXiv

SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers

Nov 08, 2024
Via arXiv

Bayesian Calibration of Win Rate Estimation with LLM Evaluators

Nov 07, 2024
Via arXiv

M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models

Nov 06, 2024
Via arXiv

MDCure: A Scalable Pipeline for Multi-Document Instruction-Following

Oct 30, 2024
Via arXiv

COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences

Oct 30, 2024
Via arXiv

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Oct 30, 2024
Via arXiv

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

Oct 11, 2024
Via arXiv

ReIFE: Re-evaluating Instruction-Following Evaluation

Oct 09, 2024
Via arXiv

MetaMath: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models

Sep 28, 2024
Via arXiv