Frederic Sala

Test-time Scaling Techniques in Theoretical Physics -- A Comparison of Methods on the TPBench Dataset

Jun 25, 2025
Via arXiv

Time To Impeach LLM-as-a-Judge: Programs are the Future of Evaluation

Jun 12, 2025
Via arXiv

Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

Jun 05, 2025
Via arXiv

R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

May 01, 2025
Via arXiv

COSMOS: Predictable and Cost-Effective Adaptation of LLMs

Apr 30, 2025
Via arXiv

TARDIS: Mitigating Temporal Misalignment via Representation Steering

Mar 25, 2025
Via arXiv

Personalize Your LLM: Fake it then Align it

Mar 05, 2025
Via arXiv

Tabby: Tabular Data Synthesis with Language Models

Mar 04, 2025
Via arXiv

Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics

Feb 19, 2025
Via arXiv

ScriptoriumWS: A Code Generation Assistant for Weak Supervision

Feb 17, 2025
Via arXiv