
Zhi Zhang

Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective

Nov 27, 2024

Cross-modal Information Flow in Multimodal Large Language Models

Nov 27, 2024

Distributed Sign Momentum with Local Steps for Training Transformers

Nov 26, 2024

Unlocking the Potential of Text-to-Image Diffusion with PAC-Bayesian Theory

Nov 25, 2024

Dense ReLU Neural Networks for Temporal-spatial Model

Nov 15, 2024

From References to Insights: Collaborative Knowledge Minigraph Agents for Automating Scholarly Literature Review

Nov 09, 2024

Statistical Guarantees for Lifelong Reinforcement Learning using PAC-Bayesian Theory

Nov 01, 2024

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training

Oct 20, 2024

DODT: Enhanced Online Decision Transformer Learning through Dreamer's Actor-Critic Trajectory Forecasting

Oct 15, 2024

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router

Oct 15, 2024