Picture for Ximing Lu

Ximing Lu

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

Add code
Oct 21, 2025
Viaarxiv icon

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation

Add code
Sep 09, 2025
Figure 1 for Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Figure 2 for Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Figure 3 for Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Figure 4 for Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Viaarxiv icon

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Add code
Aug 21, 2025
Figure 1 for NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Figure 2 for NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Figure 3 for NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Figure 4 for NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Viaarxiv icon

Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers

Add code
Jun 16, 2025
Viaarxiv icon

Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions

Add code
Jun 10, 2025
Viaarxiv icon

Synthetic Visual Genome

Add code
Jun 09, 2025
Viaarxiv icon

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Add code
May 30, 2025
Viaarxiv icon

Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning

Add code
May 26, 2025
Viaarxiv icon

Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning

Add code
Apr 06, 2025
Figure 1 for Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning
Figure 2 for Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning
Figure 3 for Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning
Figure 4 for Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning
Viaarxiv icon

Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models

Add code
Mar 15, 2025
Viaarxiv icon