
Hainiu Xu

Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

Jun 28, 2024

RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

May 13, 2024

Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond

Feb 22, 2024

Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives

Feb 16, 2024

OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models

Feb 14, 2024

OpenPI2.0: An Improved Dataset for Entity Tracking in Texts

May 24, 2023

Exploring the Curious Case of Code Prompts

Apr 26, 2023

Human-in-the-Loop Schema Induction

Feb 25, 2023

Causal Reasoning of Entities and Events in Procedural Texts

Jan 29, 2023