Sinong Wang

Sid

Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

Jan 31, 2025

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Jan 18, 2025

Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment

Jan 16, 2025

Improving Model Factuality with Fine-grained Critique-based Evaluator

Oct 24, 2024

Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following

Oct 21, 2024

Preference Optimization with Multi-Sample Comparisons

Oct 16, 2024

The Perfect Blend: Redefining RLHF with Mixture of Judges

Sep 30, 2024

The Llama 3 Herd of Models

Jul 31, 2024

Phonetic and Lexical Discovery of a Canine Language using HuBERT

Feb 25, 2024

SPAR: Personalized Content-Based Recommendation via Long Engagement Attention

Feb 16, 2024