
Wenyue Hua

AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence

Mar 11, 2025
Via arXiv

InductionBench: LLMs Fail in the Simplest Complexity Class

Feb 26, 2025
Via arXiv

Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents

Feb 18, 2025
Via arXiv

ADO: Automatic Data Optimization for Inputs in LLM Prompts

Feb 17, 2025
Via arXiv

Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense

Jan 05, 2025
Via arXiv

RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios

Dec 12, 2024
Via arXiv

Disentangling Memory and Reasoning Ability in Large Language Models

Nov 21, 2024
Via arXiv

Game-theoretic LLM: Agent Workflow for Negotiation Games

Nov 12, 2024
Via arXiv

AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow

Sep 27, 2024
Via arXiv

Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models

Jul 15, 2024
Via arXiv