Picture for Arman Cohan

Arman Cohan

ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain

Add code
Nov 23, 2024
Viaarxiv icon

FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents

Add code
Nov 08, 2024
Figure 1 for FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents
Figure 2 for FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents
Figure 3 for FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents
Figure 4 for FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents
Viaarxiv icon

SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers

Add code
Nov 08, 2024
Figure 1 for SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers
Figure 2 for SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers
Figure 3 for SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers
Figure 4 for SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers
Viaarxiv icon

Bayesian Calibration of Win Rate Estimation with LLM Evaluators

Add code
Nov 07, 2024
Viaarxiv icon

M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models

Add code
Nov 06, 2024
Figure 1 for M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
Figure 2 for M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
Figure 3 for M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
Figure 4 for M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
Viaarxiv icon

MDCure: A Scalable Pipeline for Multi-Document Instruction-Following

Add code
Oct 30, 2024
Figure 1 for MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Figure 2 for MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Figure 3 for MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Figure 4 for MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Viaarxiv icon

COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences

Add code
Oct 30, 2024
Viaarxiv icon

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Add code
Oct 30, 2024
Viaarxiv icon

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

Add code
Oct 11, 2024
Figure 1 for P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains
Figure 2 for P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains
Figure 3 for P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains
Figure 4 for P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains
Viaarxiv icon

ReIFE: Re-evaluating Instruction-Following Evaluation

Add code
Oct 09, 2024
Figure 1 for ReIFE: Re-evaluating Instruction-Following Evaluation
Figure 2 for ReIFE: Re-evaluating Instruction-Following Evaluation
Figure 3 for ReIFE: Re-evaluating Instruction-Following Evaluation
Figure 4 for ReIFE: Re-evaluating Instruction-Following Evaluation
Viaarxiv icon