Sebastian Jaszczur

Scaling Laws for Fine-Grained Mixture of Experts

Feb 12, 2024

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Jan 08, 2024

Structured Packing in LLM Training Improves Long Context Utilization

Jan 02, 2024

Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation

Oct 24, 2023

Sparse is Enough in Scaling Transformers

Nov 24, 2021

Neural heuristics for SAT solving

May 27, 2020