Kaijie Zhu

Understanding and Mitigating the Bias Inheritance in LLM-based Data Augmentation on Downstream Tasks

Feb 10, 2025
Via arXiv

MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison

Feb 07, 2025
Via arXiv

AgentReview: Exploring Peer Review Dynamics with LLM Agents

Jun 18, 2024
Via arXiv

Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities

Jun 04, 2024
Via arXiv

NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models

Mar 05, 2024
Via arXiv

DyVal 2: Dynamic Evaluation of Large Language Models by Meta Probing Agents

Feb 21, 2024
Via arXiv

The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Dec 19, 2023
Via arXiv

PromptBench: A Unified Library for Evaluation of Large Language Models

Dec 13, 2023
Via arXiv

CompeteAI: Understanding the Competition Behaviors in Large Language Model-based Agents

Oct 26, 2023
Via arXiv

DyVal: Graph-informed Dynamic Evaluation of Large Language Models

Oct 05, 2023
Via arXiv