Kerui Cao

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Dec 31, 2025
Via arXiv

Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models

May 26, 2025
Via arXiv

Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models

Dec 23, 2024
Via arXiv