Hainiu Xu

Modeling Subjectivity in Cognitive Appraisal with Language Models

Mar 14, 2025
Via arXiv

EnigmaToM: Improve LLMs' Theory-of-Mind Reasoning Capabilities with Neural Knowledge Base of Entity States

Mar 05, 2025
Via arXiv

Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

Jun 28, 2024
Via arXiv

RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

May 13, 2024
Via arXiv

Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond

Feb 22, 2024
Via arXiv

Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives

Feb 16, 2024
Via arXiv

OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models

Feb 14, 2024
Via arXiv

OpenPI2.0: An Improved Dataset for Entity Tracking in Texts

May 24, 2023
Via arXiv

Exploring the Curious Case of Code Prompts

Apr 26, 2023
Via arXiv

Human-in-the-Loop Schema Induction

Feb 25, 2023
Via arXiv