Xiaozhe Ren

Self-Adjust Softmax

Feb 25, 2025

SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

Dec 16, 2024

Scaling Law for Language Models Training Considering Batch Size

Dec 02, 2024

DAPE V2: Process Attention Score as Feature Map for Length Extrapolation

Oct 07, 2024

CAPE: Context-Adaptive Positional Encoding for Length Extrapolation

May 23, 2024

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Mar 07, 2024

A Survey of Reasoning with Foundation Models

Dec 26, 2023

EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge

Nov 23, 2023

CAME: Confidence-guided Adaptive Memory Efficient Optimization

Jul 05, 2023

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

May 22, 2023