Weilin Zhao

Densing Law of LLMs

Dec 05, 2024

Enabling Real-Time Conversations with Minimal Training Costs

Sep 18, 2024

Configurable Foundation Models: Building LLMs from a Modular Perspective

Sep 04, 2024

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Aug 03, 2024

Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

Jun 22, 2024

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Apr 09, 2024

Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models

Mar 18, 2024

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Mar 14, 2024

Ouroboros: Speculative Decoding with Large Model Enhanced Drafting

Feb 21, 2024

Unlock Predictable Scaling from Emergent Abilities

Oct 05, 2023