Wenbo Su

ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph

Mar 20, 2025
Via arXiv

Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation

Mar 20, 2025
Via arXiv

ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models

Feb 27, 2025
Via arXiv

UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering

Feb 26, 2025
Via arXiv

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

Feb 26, 2025
Via arXiv

AIR: Complex Instruction Generation via Automatic Iterative Refinement

Feb 25, 2025
Via arXiv

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Feb 20, 2025
Via arXiv

ChineseSimpleVQA -- "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

Feb 19, 2025
Via arXiv

Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models

Feb 17, 2025
Via arXiv

Unlocking Scaling Law in Industrial Recommendation Systems with a Three-step Paradigm based Large User Model

Feb 12, 2025
Via arXiv