Kaiyan Zhang

Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models

Mar 14, 2025

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Feb 10, 2025

Process Reinforcement through Implicit Rewards

Feb 03, 2025

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Jan 30, 2025

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Dec 23, 2024

How to Synthesize Text Data without Model Collapse?

Dec 19, 2024

Free Process Rewards without Process Labels

Dec 02, 2024

Automating Exploratory Proteomics Research via Language Models

Nov 06, 2024

Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention

Nov 04, 2024

A Static and Dynamic Attention Framework for Multi Turn Dialogue Generation

Oct 28, 2024