
Ilya Loshchilov


nGPT: Normalized Transformer with Representation Learning on the Hypersphere

Oct 01, 2024
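
nGPT keeps token embeddings, hidden states, and weight matrix rows on the unit hypersphere, and replaces the usual residual addition with a normalized interpolation toward each block's output. A minimal NumPy sketch of that core update, assuming a single scalar step size alpha (the paper learns per-dimension "eigen learning rates", omitted here):

    import numpy as np

    def l2_normalize(x, eps=1e-8):
        # Project (batches of) vectors back onto the unit hypersphere.
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

    def ngpt_residual_step(h, block_out, alpha=0.1):
        # h: current hidden state (unit norm); block_out: raw output of an
        # attention or MLP block. Interpolate toward the normalized block
        # output, then renormalize so h stays on the hypersphere.
        block_out = l2_normalize(block_out)
        return l2_normalize(h + alpha * (block_out - h))

    h = l2_normalize(np.random.randn(4, 64))   # batch of hidden states
    h = ngpt_residual_step(h, np.random.randn(4, 64))
    print(np.linalg.norm(h, axis=-1))          # ~1.0 for every row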

Weight Norm Control

Nov 21, 2023
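
The paper observes that decoupled weight decay (as in AdamW) is a special case of weight norm control with target norm zero, and generalizes it to steer the weight norm toward an arbitrary target. A rough sketch of that reading, with the paper's per-step scheduling simplified to a single coefficient lam (an assumption of this sketch):

    import numpy as np

    def decay_toward_target_norm(w, lr, lam, target_norm=0.0):
        # With target_norm=0 this reduces to decoupled weight decay,
        # w -= lr*lam*w. A positive target shrinks w when its norm is
        # above the target and grows it when below.
        norm = np.linalg.norm(w)
        if norm > 0:
            w = w - lr * lam * (1.0 - target_norm / norm) * w
        return w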

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

Feb 24, 2018
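
The paper shows that a basic (mu, lambda)-ES with standard log-rank recombination weights is competitive with OpenAI-style ES on Atari. A minimal sketch of such a canonical ES on a toy objective (the weights follow the usual CMA-ES convention; Atari-specific details such as virtual batch normalization are omitted):

    import numpy as np

    def canonical_es(fitness, theta0, sigma=0.05, lam=20, mu=10, iters=100):
        # (mu, lambda)-ES: sample lam perturbations, keep the best mu,
        # and move theta along their log-weighted average.
        theta = theta0.copy()
        w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
        w /= w.sum()
        for _ in range(iters):
            eps = np.random.randn(lam, theta.size)
            scores = np.array([fitness(theta + sigma * e) for e in eps])
            top = eps[np.argsort(-scores)[:mu]]    # best mu perturbations
            theta = theta + sigma * (w @ top)      # weighted recombination
        return theta

    best = canonical_es(lambda x: -np.sum(x ** 2), np.ones(5))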

Fixing Weight Decay Regularization in Adam

Feb 14, 2018
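
This is the paper that introduced AdamW (later published as "Decoupled Weight Decay Regularization"): weight decay is applied directly to the weights instead of being folded into the gradient as L2 regularization. A minimal sketch of one decoupled update step:

    import numpy as np

    def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                   eps=1e-8, wd=1e-2):
        # t is the 1-based step count. The Adam moments see only the
        # gradient g; the decay term lr*wd*w is added separately, so it
        # is not rescaled by the adaptive denominator the way an L2
        # penalty would be.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
        return w, m, v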

A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets

Aug 23, 2017
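
The paper releases 32x32 and 64x64 versions of ImageNet so that full-dataset experiments run at CIFAR-like cost. A sketch of producing such images with Pillow (the paper compares several resampling filters; LANCZOS here is just an illustrative pick):

    from PIL import Image

    def downsample(path, size=32):
        # Resize a full-resolution ImageNet image to size x size pixels.
        img = Image.open(path).convert("RGB")
        return img.resize((size, size), Image.LANCZOS)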

Limited-Memory Matrix Adaptation for Large Scale Black-box Optimization

May 18, 2017
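
LM-MA-ES replaces the full n x n covariance matrix of CMA-ES with a small set of stored direction vectors, so sampling costs O(m*n) instead of O(n^2). A heavily simplified sketch of the sampling transform only (the paper uses per-vector coefficients and specific update rules for the stored vectors; the single coefficient c is an assumption here):

    import numpy as np

    def lm_ma_sample(n, M, sigma=1.0, c=0.1):
        # M holds m direction vectors (rows). Each one reshapes an
        # isotropic Gaussian sample along its direction, approximating
        # what a full covariance matrix would do.
        d = np.random.randn(n)
        for m_j in M:
            d = (1 - c) * d + c * m_j * (m_j @ d)
        return sigma * d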

SGDR: Stochastic Gradient Descent with Warm Restarts

May 03, 2017
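
SGDR anneals the learning rate with a cosine schedule and periodically restarts it, with each period T_i optionally growing by a factor T_mult. The schedule from the paper, as a small helper:

    import math

    def sgdr_lr(t, eta_min=0.0, eta_max=0.1, T_0=10, T_mult=2):
        # eta_t = eta_min + 0.5*(eta_max - eta_min)*(1 + cos(pi*T_cur/T_i))
        # where T_cur counts epochs since the last (warm) restart.
        T_i, t_cur = T_0, t
        while t_cur >= T_i:
            t_cur -= T_i
            T_i *= T_mult
        return eta_min + 0.5 * (eta_max - eta_min) * (
            1 + math.cos(math.pi * t_cur / T_i))

    print([round(sgdr_lr(t), 4) for t in range(12)])  # restart at t=10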

Anytime Bi-Objective Optimization with a Hybrid Multi-Objective CMA-ES (HMO-CMA-ES)

May 09, 2016

CMA-ES for Hyperparameter Optimization of Deep Neural Networks

Apr 25, 2016
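
The paper tunes deep network hyperparameters with CMA-ES, which parallelizes naturally since each candidate can be trained independently. A sketch using Nikolaus Hansen's cma package, with a toy quadratic standing in for "train and return validation error" (the encoding of hyperparameters into the search vector is up to the user):

    import cma

    def objective(x):
        # Placeholder: decode x into hyperparameters (e.g. log learning
        # rate, dropout), train, and return validation error.
        return sum(xi ** 2 for xi in x)

    es = cma.CMAEvolutionStrategy(3 * [0.5], 0.2)
    while not es.stop():
        candidates = es.ask()        # one hyperparameter vector each
        es.tell(candidates, [objective(c) for c in candidates])
    es.result_pretty()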

Online Batch Selection for Faster Training of Neural Networks

Apr 25, 2016
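
The paper speeds up training by sampling examples with high recent loss more often: selection probability decays exponentially with loss rank, so that under selection pressure s the highest-loss example is roughly s times more likely to be drawn than the lowest-loss one. A sketch of that rank-based sampling (the bookkeeping of per-example losses during training is omitted):

    import numpy as np

    def rank_based_probs(latest_loss, s=100.0):
        # p_i proportional to exp(-log(s)/N * rank_i), rank 0 = highest
        # loss, giving a ratio of about s between the most and least
        # likely examples.
        N = len(latest_loss)
        ranks = np.empty(N, dtype=int)
        ranks[np.argsort(-latest_loss)] = np.arange(N)
        p = np.exp(-np.log(s) / N * ranks)
        return p / p.sum()

    losses = np.random.rand(1000)    # latest known loss per example
    batch = np.random.choice(1000, size=64, p=rank_based_probs(losses))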