Mingyang Song

PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models

Jan 07, 2025

A Survey of Query Optimization in Large Language Models

Dec 23, 2024

MiMoTable: A Multi-scale Spreadsheet Benchmark with Meta Operations for Table Reasoning

Dec 16, 2024

P/D-Serve: Serving Disaggregated Large Language Model at Scale

Aug 15, 2024

Mitigating Multilingual Hallucination in Large Vision-Language Models

Aug 01, 2024

SS-Bench: A Benchmark for Social Story Generation and Evaluation

Jun 22, 2024

Can Many-Shot In-Context Learning Help Long-Context LLM Judges? See More, Judge Better!

Jun 17, 2024

A Preliminary Empirical Study on Prompt-based Unsupervised Keyphrase Extraction

May 26, 2024

Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models

Mar 25, 2024

Large Language Models as Zero-Shot Keyphrase Extractors: A Preliminary Empirical Study

Jan 10, 2024