Picture for Yikang Shen

Yikang Shen

Stick-breaking Attention

Add code
Oct 23, 2024
Viaarxiv icon

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

Add code
Aug 23, 2024
Viaarxiv icon

FlexAttention for Efficient High-Resolution Vision-Language Models

Add code
Jul 29, 2024
Figure 1 for FlexAttention for Efficient High-Resolution Vision-Language Models
Figure 2 for FlexAttention for Efficient High-Resolution Vision-Language Models
Figure 3 for FlexAttention for Efficient High-Resolution Vision-Language Models
Figure 4 for FlexAttention for Efficient High-Resolution Vision-Language Models
Viaarxiv icon

Scaling Granite Code Models to 128K Context

Add code
Jul 18, 2024
Viaarxiv icon

The infrastructure powering IBM's Gen AI model development

Add code
Jul 07, 2024
Figure 1 for The infrastructure powering IBM's Gen AI model development
Figure 2 for The infrastructure powering IBM's Gen AI model development
Figure 3 for The infrastructure powering IBM's Gen AI model development
Figure 4 for The infrastructure powering IBM's Gen AI model development
Viaarxiv icon

Octo-planner: On-device Language Model for Planner-Action Agents

Add code
Jun 26, 2024
Viaarxiv icon

Efficient Continual Pre-training by Mitigating the Stability Gap

Add code
Jun 21, 2024
Viaarxiv icon

Parallelizing Linear Transformers with the Delta Rule over Sequence Length

Add code
Jun 10, 2024
Viaarxiv icon

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Add code
May 24, 2024
Viaarxiv icon

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Add code
May 07, 2024
Figure 1 for Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Figure 2 for Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Figure 3 for Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Figure 4 for Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Viaarxiv icon