
Yutao Sun

Multimodal Latent Language Modeling with Next-Token Diffusion

Dec 11, 2024

Differential Transformer

Oct 07, 2024

FocusLLM: Scaling LLM's Context by Parallel Decoding

Aug 21, 2024

Preserving Knowledge in Large Language Models: A Model-Agnostic Self-Decompression Approach

Jun 17, 2024

HORAE: A Domain-Agnostic Modeling Language for Automating Multimodal Service Regulation

Jun 06, 2024

You Only Cache Once: Decoder-Decoder Architectures for Language Models

May 08, 2024

Retentive Network: A Successor to Transformer for Large Language Models

Aug 09, 2023

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers

Dec 21, 2022

A Length-Extrapolatable Transformer

Dec 20, 2022

Structured Prompting: Scaling In-Context Learning to 1,000 Examples

Dec 13, 2022