Rameswar Panda

Richard

Stick-breaking Attention

Oct 23, 2024
Via arXiv

Calibrating Expressions of Certainty

Oct 06, 2024
Via arXiv

SITAR: Semi-supervised Image Transformer for Action Recognition

Sep 04, 2024
Via arXiv

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

Aug 23, 2024
Via arXiv

Scaling Granite Code Models to 128K Context

Jul 18, 2024
Via arXiv

The infrastructure powering IBM's Gen AI model development

Jul 07, 2024
Via arXiv

Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks

Jun 27, 2024
Via arXiv

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Jun 17, 2024
Via arXiv

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

May 21, 2024
Via arXiv

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

May 07, 2024
Via arXiv