Di Yin

Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts

Apr 09, 2025

FactGuard: Leveraging Multi-Agent Systems to Generate Answerable and Unanswerable Questions for Enhanced Long-Context LLM Extraction

Apr 08, 2025

RoleMRC: A Fine-Grained Composite Benchmark for Role-Playing and Instruction-Following

Feb 17, 2025

Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models

Oct 29, 2024

CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data

Sep 25, 2024

Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing

Sep 25, 2024

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Aug 09, 2024

SNFinLLM: Systematic and Nuanced Financial Domain Adaptation of Chinese Large Language Models

Aug 05, 2024

Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence

Jun 16, 2024

CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments

Apr 27, 2024