Lukas Galke

Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models?

Feb 17, 2025

FlexDeMo: Decoupled Momentum Optimization for Fully and Hybrid Sharded Training

Feb 10, 2025

A Transformer-based Autoregressive Decoder Architecture for Hierarchical Text Classification

Jan 23, 2025

Continual Learning for Encoder-only Language Models via a Discrete Key-Value Bottleneck

Dec 11, 2024

Isotropy Matters: Soft-ZCA Whitening of Embeddings for Semantic Code Search

Nov 26, 2024

Hierarchical Text Classification (HTC) vs. eXtreme Multilabel Classification (XML): Two Sides of the Same Medal

Nov 20, 2024

When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization

Nov 08, 2024

Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5

Oct 15, 2024

POWN: Prototypical Open-World Node Classification

Jun 14, 2024

Emergent communication and learning pressures in language models: a language evolution perspective

Mar 21, 2024