Shaobo Wang

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Feb 05, 2026
Via arXiv

Socratic-Geo: Synthetic Data Generation and Geometric Reasoning via Multi-Agent Interaction

Feb 03, 2026
Via arXiv

Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis

Feb 03, 2026
Via arXiv

Grounding and Enhancing Informativeness and Utility in Dataset Distillation

Jan 29, 2026
Via arXiv

UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective

Nov 18, 2025
Via arXiv

ImagebindDC: Compressing Multi-modal Data with Imagebind-based Condensation

Nov 11, 2025
Via arXiv

Shifting AI Efficiency From Model-Centric to Data-Centric Compression

May 25, 2025
Via arXiv

KO: Kinetics-inspired Neural Optimizer with PDE Simulation Approaches

May 20, 2025
Via arXiv

DD-Ranking: Rethinking the Evaluation of Dataset Distillation

May 19, 2025
Via arXiv

Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning

May 18, 2025
Via arXiv