Xiaozhe Ren

Scaling Law for Language Models Training Considering Batch Size

Dec 02, 2024

DAPE V2: Process Attention Score as Feature Map for Length Extrapolation

Oct 07, 2024

CAPE: Context-Adaptive Positional Encoding for Length Extrapolation

May 23, 2024

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Mar 07, 2024

A Survey of Reasoning with Foundation Models

Dec 26, 2023

EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge

Add code
Nov 23, 2023
Viaarxiv icon

CAME: Confidence-guided Adaptive Memory Efficient Optimization

Jul 05, 2023

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

May 22, 2023

PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

Mar 20, 2023

Deeper vs Wider: A Revisit of Transformer Configuration

May 24, 2022