Jingcheng Hu

Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining

Mar 06, 2025
Via arXiv

Multi-matrix Factorization Attention

Dec 26, 2024
Via arXiv

Common 7B Language Models Already Possess Strong Math Capabilities

Mar 07, 2024
Via arXiv

FP8-LM: Training FP8 Large Language Models

Oct 27, 2023
Via arXiv

Revealing the Dark Secrets of Masked Image Modeling

May 27, 2022
Via arXiv