Siegfried Handschuh

University of St.Gallen

Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder

Apr 10, 2026

Efficient Neural Network Training via Subset Pretraining

Oct 21, 2024

Reducing the Transformer Architecture to a Minimum

Oct 17, 2024

Make Deep Networks Shallow Again

Sep 15, 2023

Discourse-Aware Text Simplification: From Complex Sentences to Linked Propositions

Aug 01, 2023

Analyzing FOMC Minutes: Accuracy and Constraints of Language Models

Apr 20, 2023

Number of Attention Heads vs Number of Transformer-Encoders in Computer Vision

Sep 15, 2022

Training Neural Networks in Single vs Double Precision

Sep 15, 2022

Uncovering More Shallow Heuristics: Probing the Natural Language Inference Capacities of Transformer-Based Pre-Trained Language Models Using Syllogistic Patterns

Jan 19, 2022

Exploring the Promises of Transformer-Based LMs for the Representation of Normative Claims in the Legal Domain

Aug 25, 2021