Gaoyuan Zhang

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

Aug 23, 2024

Scaling Granite Code Models to 128K Context

Jul 18, 2024

The infrastructure powering IBM's Gen AI model development

Jul 07, 2024

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

May 07, 2024

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

Apr 08, 2024

Rapid Development of Compositional AI

Feb 12, 2023

Distributed Adversarial Training to Robustify Deep Neural Networks at Scale

Jun 13, 2022

When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?

Nov 01, 2021

Generating Adversarial Computer Programs using Optimized Obfuscations

Mar 18, 2021

Fast Training of Provably Robust Neural Networks by SingleProp

Feb 01, 2021