Shengding Hu

Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling

Oct 09, 2024
Via arXiv

DecorateLM: Data Engineering through Corpus Rating, Tagging, and Editing with Language Models

Oct 08, 2024
Via arXiv

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Aug 03, 2024
Via arXiv

States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly

Jul 16, 2024
Via arXiv

Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

Jun 22, 2024
Via arXiv

LEGENT: Open Platform for Embodied Agents

Apr 28, 2024
Via arXiv

UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs

Apr 11, 2024
Via arXiv

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Apr 09, 2024
Via arXiv

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models

Feb 27, 2024
Via arXiv

Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition

Feb 26, 2024
Via arXiv