Tian Ding

MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning

Jul 31, 2024

Adam-mini: Use Fewer Learning Rates To Gain More

Jun 26, 2024

PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

Jun 04, 2024

Why Transformers Need Adam: A Hessian Perspective

Feb 26, 2024

Federated Learning with Lossy Distributed Source Coding: Analysis and Optimization

Apr 23, 2022

The Global Landscape of Neural Networks: An Overview

Jul 02, 2020

Sub-Optimal Local Minima Exist for Almost All Over-parameterized Neural Networks

Nov 04, 2019

Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations

Dec 28, 2018