Congliang Chen

Exploring the Generalization Capabilities of AID-based Bi-level Optimization
Nov 25, 2024

Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity
Aug 29, 2024

Adam-mini: Use Fewer Learning Rates To Gain More
Jun 26, 2024

Why Transformers Need Adam: A Hessian Perspective
Feb 26, 2024

Rethinking SIGN Training: Provable Nonconvex Acceleration without First- and Second-Order Gradient Lipschitz
Oct 23, 2023

Adam Can Converge Without Any Modification on Update Rules
Aug 23, 2022

Efficient-Adam: Communication-Efficient Distributed Adam with Complexity Analysis
May 28, 2022

Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration
Jan 14, 2021

Quantized Adam with Error Feedback
Apr 29, 2020

Arbitrary Style Transfer with Deep Feature Reshuffle
Jun 20, 2018