Tianyi Bai

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Feb 02, 2026

From Completion to Editing: Unlocking Context-Aware Code Infilling via Search-and-Replace Instruction Tuning

Jan 19, 2026

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Dec 18, 2025

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Dec 11, 2025

Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification

Jun 08, 2025

Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning

Jun 08, 2025

Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models

Apr 19, 2025

Unsupervised Topic Models are Data Mixers for Pre-training Language Models

Feb 24, 2025

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Oct 13, 2024

Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining

Oct 10, 2024