Mayank Mishra

Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models
Sep 07, 2024

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
Aug 23, 2024

Scaling Granite Code Models to 128K Context
Jul 18, 2024

The infrastructure powering IBM's Gen AI model development
Jul 07, 2024

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
May 21, 2024

Granite Code Models: A Family of Open Foundation Models for Code Intelligence
May 07, 2024

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Apr 08, 2024

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
Apr 04, 2024

DeiT-LT Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
Apr 03, 2024

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Mar 30, 2024