Fred Yu

Minor DPO reject penalty to increase training robustness

Aug 22, 2024
Via arXiv

Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation

Aug 20, 2024
Via arXiv