Yilun Zhao

ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain

Nov 23, 2024

FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents

Nov 08, 2024

M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models

Nov 06, 2024

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Oct 30, 2024

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

Oct 11, 2024

ReIFE: Re-evaluating Instruction-Following Evaluation

Oct 09, 2024

Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Aug 20, 2024

SuperEncoder: Towards Universal Neural Approximate Quantum State Preparation

Aug 10, 2024

Step-Back Profiling: Distilling User History for Personalized Scientific Writing

Jun 20, 2024

Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation

Jun 20, 2024