John Kirchenbauer

Antidistillation Fingerprinting

Feb 03, 2026
Via arXiv

Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

Nov 10, 2025
Via arXiv

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Jun 05, 2025
Via arXiv

A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition

Jun 05, 2025
Via arXiv

Zero-Shot Vision Encoder Grafting via LLM Surrogates

May 28, 2025
Via arXiv

When Can You Get Away with Low Memory Adam?

Mar 03, 2025
Via arXiv

Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers

Feb 12, 2025
Via arXiv

Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs

Feb 10, 2025
Via arXiv

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Feb 07, 2025
Via arXiv

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Feb 07, 2025
Via arXiv