Jun Suzuki

Relaxing Positional Alignment in Masked Diffusion Language Models

Jan 30, 2026
Via arXiv

TimeMachine-bench: A Benchmark for Evaluating Model Capabilities in Repository-Level Migration Tasks

Jan 30, 2026
Via arXiv

Suppressing Final Layer Hidden State Jumps in Transformer Pretraining

Jan 26, 2026
Via arXiv

Instruction-Following Evaluation of Large Vision-Language Models

Dec 29, 2025
Via arXiv

An Open and Reproducible Deep Research Agent for Long-Form Question Answering

Dec 15, 2025
Via arXiv

Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders

Oct 25, 2025
Via arXiv

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

Aug 26, 2025
Via arXiv

Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models

Aug 25, 2025
Via arXiv

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

Apr 14, 2025
Via arXiv

STEP: Staged Parameter-Efficient Pre-training for Large Language Models

Apr 05, 2025
Via arXiv