Xinyu Hu

Aspect-Guided Multi-Level Perturbation Analysis of Large Language Models in Automated Peer Review

Feb 18, 2025
Via arXiv

A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability

Feb 17, 2025
Via arXiv

Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference

Dec 31, 2024
Via arXiv

What You See Is What Matters: A Novel Visual and Physics-Based Metric for Evaluating Video Generation Quality

Nov 20, 2024
Via arXiv

Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation

Oct 22, 2024
Via arXiv

Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models

Oct 17, 2024
Via arXiv

SMART-RAG: Selection using Determinantal Matrices for Augmented Retrieval

Sep 21, 2024
Via arXiv

Themis: Towards Flexible and Interpretable NLG Evaluation

Jun 26, 2024
Via arXiv

Task Oriented In-Domain Data Augmentation

Jun 24, 2024
Via arXiv

MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency

Jun 19, 2024
Via arXiv