Shikai Qiu

From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence

Jan 06, 2026

Customizing the Inductive Biases of Softmax Attention using Structured Matrices

Sep 09, 2025

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Jul 02, 2025

Out-of-Distribution Detection Methods Answer the Wrong Questions

Jul 02, 2025

Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices

Oct 03, 2024

Transferring Knowledge from Large Foundation Models to Small Downstream Models

Jun 11, 2024

Compute Better Spent: Replacing Dense Layers with Structured Matrices

Jun 10, 2024

Function-Space Regularization in Neural Networks: A Probabilistic Perspective

Dec 28, 2023

Should We Learn Most Likely Functions or Parameters?

Nov 27, 2023

Large Language Models Are Zero-Shot Time Series Forecasters

Oct 11, 2023