Yilun Zhao

ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning

Jan 11, 2025
Via arXiv

HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

Dec 30, 2024
Via arXiv

ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain

Nov 23, 2024
Via arXiv

FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents

Nov 08, 2024
Via arXiv

M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models

Nov 06, 2024
Via arXiv

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Oct 30, 2024
Via arXiv

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

Oct 11, 2024
Via arXiv

ReIFE: Re-evaluating Instruction-Following Evaluation

Oct 09, 2024
Via arXiv

Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Aug 20, 2024
Via arXiv

SuperEncoder: Towards Universal Neural Approximate Quantum State Preparation

Aug 10, 2024
Via arXiv