
Minlie Huang


StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error

Mar 13, 2025
Via arXiv

LongSafety: Evaluating Long-Context Safety of Large Language Models

Feb 24, 2025
Via arXiv

AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement

Feb 24, 2025
Via arXiv

HPSS: Heuristic Prompting Strategy Search for LLM Evaluators

Feb 18, 2025
Via arXiv

DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing

Feb 17, 2025
Via arXiv

Human Decision-making is Susceptible to AI-driven Manipulation

Feb 11, 2025
Via arXiv

MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science

Jan 18, 2025
Via arXiv

Enhanced Large Language Models for Effective Screening of Depression and Anxiety

Jan 15, 2025
Via arXiv

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Dec 30, 2024
Via arXiv

LegalAgentBench: Evaluating LLM Agents in Legal Domain

Dec 23, 2024
Via arXiv