Nhat Ho

A Statistical Theory of Gated Attention through the Lens of Hierarchical Mixture of Experts

Feb 01, 2026

Rethinking Multinomial Logistic Mixture of Experts with Sigmoid Gating Function

Feb 01, 2026

Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity

Jan 31, 2026

S-Chain: Structured Visual Chain-of-Thought For Medicine

Oct 26, 2025

Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps

Oct 14, 2025

HoRA: Cross-Head Low-Rank Adaptation with Joint Hypernetworks

Oct 05, 2025

DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks

Oct 05, 2025

On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts

May 24, 2025

Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures

May 19, 2025

CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition

May 19, 2025