Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Minghui Chen

refer to the report for detailed contributions

Can Textual Gradient Work in Federated Learning?

Feb 27, 2025

Minghui Chen, Ruinan Jin, Wenlong Deng, Yuanyuan Chen, Zhi Huang, Han Yu, Xiaoxiao Li

Abstract:Recent studies highlight the promise of LLM-based prompt optimization, especially with TextGrad, which automates differentiation'' via texts and backpropagates textual feedback. This approach facilitates training in various real-world applications that do not support numerical gradient propagation or loss calculation. In this paper, we systematically explore the potential and challenges of incorporating textual gradient into Federated Learning (FL). Our contributions are fourfold. Firstly, we introduce a novel FL paradigm, Federated Textual Gradient (FedTextGrad), that allows clients to upload locally optimized prompts derived from textual gradients, while the server aggregates the received prompts. Unlike traditional FL frameworks, which are designed for numerical aggregation, FedTextGrad is specifically tailored for handling textual data, expanding the applicability of FL to a broader range of problems that lack well-defined numerical loss functions. Secondly, building on this design, we conduct extensive experiments to explore the feasibility of FedTextGrad. Our findings highlight the importance of properly tuning key factors (e.g., local steps) in FL training. Thirdly, we highlight a major challenge in FedTextGrad aggregation: retaining essential information from distributed prompt updates. Last but not least, in response to this issue, we improve the vanilla variant of FedTextGrad by providing actionable guidance to the LLM when summarizing client prompts by leveraging the Uniform Information Density principle. Through this principled study, we enable the adoption of textual gradients in FL for optimizing LLMs, identify important issues, and pinpoint future directions, thereby opening up a new research area that warrants further investigation.

* Accepted at ICLR 2025

Via

Access Paper or Ask Questions

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Jan 21, 2025

Zibo Zhao, Zeqiang Lai, Qingxiang Lin, Yunfei Zhao, Haolin Liu, Shuhui Yang, Yifei Feng, Mingxin Yang, Sheng Zhang, Xianghui Yang(+61 more)

Figure 1 for Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Figure 2 for Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Figure 3 for Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Figure 4 for Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Abstract:We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. This system includes two foundation components: a large-scale shape generation model -- Hunyuan3D-DiT, and a large-scale texture synthesis model -- Hunyuan3D-Paint. The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution and vibrant texture maps for either generated or hand-crafted meshes. Furthermore, we build Hunyuan3D-Studio -- a versatile, user-friendly production platform that simplifies the re-creation process of 3D assets. It allows both professional and amateur users to manipulate or even animate their meshes efficiently. We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models, including the open-source models and closed-source models in geometry details, condition alignment, texture quality, and etc. Hunyuan3D 2.0 is publicly released in order to fill the gaps in the open-source 3D community for large-scale foundation generative models. The code and pre-trained weights of our models are available at: https://github.com/Tencent/Hunyuan3D-2

* GitHub link: https://github.com/Tencent/Hunyuan3D-2

Via

Access Paper or Ask Questions

Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning

Oct 31, 2024

Minghui Chen, Meirui Jiang, Xin Zhang, Qi Dou, Zehua Wang, Xiaoxiao Li

Figure 1 for Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning

Figure 2 for Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning

Figure 3 for Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning

Figure 4 for Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning

Abstract:Federated learning (FL) is a learning paradigm that enables collaborative training of models using decentralized data. Recently, the utilization of pre-trained weight initialization in FL has been demonstrated to effectively improve model performance. However, the evolving complexity of current pre-trained models, characterized by a substantial increase in parameters, markedly intensifies the challenges associated with communication rounds required for their adaptation to FL. To address these communication cost issues and increase the performance of pre-trained model adaptation in FL, we propose an innovative model interpolation-based local training technique called ``Local Superior Soups.'' Our method enhances local training across different clients, encouraging the exploration of a connected low-loss basin within a few communication rounds through regularized model interpolation. This approach acts as a catalyst for the seamless adaptation of pre-trained models in in FL. We demonstrated its effectiveness and efficiency across diverse widely-used FL datasets. Our code is available at \href{https://github.com/ubc-tea/Local-Superior-Soups}{https://github.com/ubc-tea/Local-Superior-Soups}.

* Accepted at NeurIPS 2024

Via

Access Paper or Ask Questions

DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models

Oct 12, 2024

Wenlong Deng, Yize Zhao, Vala Vakilian, Minghui Chen, Xiaoxiao Li, Christos Thrampoulidis

Figure 1 for DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models

Figure 2 for DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models

Figure 3 for DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models

Figure 4 for DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models

Abstract:Storing open-source fine-tuned models separately introduces redundancy and increases response times in applications utilizing multiple models. Delta-parameter pruning (DPP), particularly the random drop and rescale (DARE) method proposed by Yu et al., addresses this by pruning the majority of delta parameters--the differences between fine-tuned and pre-trained model weights--while typically maintaining minimal performance loss. However, DARE fails when either the pruning rate or the magnitude of the delta parameters is large. We highlight two key reasons for this failure: (1) an excessively large rescaling factor as pruning rates increase, and (2) high mean and variance in the delta parameters. To push DARE's limits, we introduce DAREx (DARE the eXtreme), which features two algorithmic improvements: (1) DAREx-q, a rescaling factor modification that significantly boosts performance at high pruning rates (e.g., >30 % on COLA and SST2 for encoder models, with even greater gains in decoder models), and (2) DAREx-L2, which combines DARE with AdamR, an in-training method that applies appropriate delta regularization before DPP. We also demonstrate that DAREx-q can be seamlessly combined with vanilla parameter-efficient fine-tuning techniques like LoRA and can facilitate structural DPP. Additionally, we revisit the application of importance-based pruning techniques within DPP, demonstrating that they outperform random-based methods when delta parameters are large. Through this comprehensive study, we develop a pipeline for selecting the most appropriate DPP method under various practical scenarios.

Via

Access Paper or Ask Questions

Nonparametric End-to-End Probabilistic Forecasting of Distributed Generation Outputs Considering Missing Data Imputation

Mar 31, 2024

Minghui Chen, Zichao Meng, Yanping Liu, Longbo Luo, Ye Guo, Kang Wang

Abstract:In this paper, we introduce a nonparametric end-to-end method for probabilistic forecasting of distributed renewable generation outputs while including missing data imputation. Firstly, we employ a nonparametric probabilistic forecast model utilizing the long short-term memory (LSTM) network to model the probability distributions of distributed renewable generations' outputs. Secondly, we design an end-to-end training process that includes missing data imputation through iterative imputation and iterative loss-based training procedures. This two-step modeling approach effectively combines the strengths of the nonparametric method with the end-to-end approach. Consequently, our approach demonstrates exceptional capabilities in probabilistic forecasting for the outputs of distributed renewable generations while effectively handling missing values. Simulation results confirm the superior performance of our approach compared to existing alternatives.

Via

Access Paper or Ask Questions

Universal Debiased Editing on Foundation Models for Fair Medical Image Classification

Mar 16, 2024

Ruinan Jin, Wenlong Deng, Minghui Chen, Xiaoxiao Li

Figure 1 for Universal Debiased Editing on Foundation Models for Fair Medical Image Classification

Figure 2 for Universal Debiased Editing on Foundation Models for Fair Medical Image Classification

Figure 3 for Universal Debiased Editing on Foundation Models for Fair Medical Image Classification

Figure 4 for Universal Debiased Editing on Foundation Models for Fair Medical Image Classification

Abstract:In the era of Foundation Models' (FMs) rising prominence in AI, our study addresses the challenge of biases in medical images while using FM API, particularly spurious correlations between pixels and sensitive attributes. Traditional methods for bias mitigation face limitations due to the restricted access to web-hosted FMs and difficulties in addressing the underlying bias encoded within the FM API. We propose an U(niversal) D(ebiased) E(diting) strategy, termed UDE, which generates UDE noise to mask such spurious correlation. UDE is capable of mitigating bias both within the FM API embedding and the images themselves. Furthermore, UDE is suitable for both white-box and black-box FM APIs, where we introduced G(reedy) (Z)eroth-O(rder) (GeZO) optimization for it when the gradient is inaccessible in black-box APIs. Our whole pipeline enables fairness-aware image editing that can be applied across various medical contexts without requiring direct model manipulation or significant computational resources. Our empirical results demonstrate the method's effectiveness in maintaining fairness and utility across different patient groups and diseases. In the era of AI-driven medicine, this work contributes to making healthcare diagnostics more equitable, showcasing a practical solution for bias mitigation in pre-trained image FMs.

Via

Access Paper or Ask Questions

FedSoup: Improving Generalization and Personalization in Federated Learning via Selective Model Interpolation

Jul 20, 2023

Minghui Chen, Meirui Jiang, Qi Dou, Zehua Wang, Xiaoxiao Li

Abstract:Cross-silo federated learning (FL) enables the development of machine learning models on datasets distributed across data centers such as hospitals and clinical research laboratories. However, recent research has found that current FL algorithms face a trade-off between local and global performance when confronted with distribution shifts. Specifically, personalized FL methods have a tendency to overfit to local data, leading to a sharp valley in the local model and inhibiting its ability to generalize to out-of-distribution data. In this paper, we propose a novel federated model soup method (i.e., selective interpolation of model parameters) to optimize the trade-off between local and global performance. Specifically, during the federated training phase, each client maintains its own global model pool by monitoring the performance of the interpolated model between the local and global models. This allows us to alleviate overfitting and seek flat minima, which can significantly improve the model's generalization performance. We evaluate our method on retinal and pathological image classification tasks, and our proposed method achieves significant improvements for out-of-distribution generalization. Our code is available at https://github.com/ubc-tea/FedSoup.

* Accepted by MICCAI2023

Via

Access Paper or Ask Questions

Forgettable Federated Linear Learning with Certified Data Removal

Jun 03, 2023

Ruinan Jin, Minghui Chen, Qiong Zhang, Xiaoxiao Li

Figure 1 for Forgettable Federated Linear Learning with Certified Data Removal

Figure 2 for Forgettable Federated Linear Learning with Certified Data Removal

Figure 3 for Forgettable Federated Linear Learning with Certified Data Removal

Figure 4 for Forgettable Federated Linear Learning with Certified Data Removal

Abstract:Federated learning (FL) is a trending distributed learning framework that enables collaborative model training without data sharing. Machine learning models trained on datasets can potentially expose the private information of the training data, revealing details about individual data records. In this study, we focus on the FL paradigm that grants clients the ``right to be forgotten''. The forgettable FL framework should bleach its global model weights as it has never seen that client and hence does not reveal any information about the client. To this end, we propose the Forgettable Federated Linear Learning (2F2L) framework featured with novel training and data removal strategies. The training pipeline, named Federated linear training, employs linear approximation on the model parameter space to enable our 2F2L framework work for deep neural networks while achieving comparable results with canonical neural network training. We also introduce FedRemoval, an efficient and effective removal strategy that tackles the computational challenges in FL by approximating the Hessian matrix using public server data from the pretrained model. Unlike the previous uncertified and heuristic machine unlearning methods in FL, we provide theoretical guarantees by bounding the differences of model weights by our FedRemoval and that from retraining from scratch. Experimental results on MNIST and Fashion-MNIST datasets demonstrate the effectiveness of our method in achieving a balance between model accuracy and information removal, outperforming baseline strategies and approaching retraining from scratch.

Via

Access Paper or Ask Questions

VITA: A Multi-Source Vicinal Transfer Augmentation Method for Out-of-Distribution Generalization

Apr 25, 2022

Minghui Chen, Cheng Wen, Feng Zheng, Fengxiang He, Ling Shao

Figure 1 for VITA: A Multi-Source Vicinal Transfer Augmentation Method for Out-of-Distribution Generalization

Figure 2 for VITA: A Multi-Source Vicinal Transfer Augmentation Method for Out-of-Distribution Generalization

Figure 3 for VITA: A Multi-Source Vicinal Transfer Augmentation Method for Out-of-Distribution Generalization

Figure 4 for VITA: A Multi-Source Vicinal Transfer Augmentation Method for Out-of-Distribution Generalization

Abstract:Invariance to diverse types of image corruption, such as noise, blurring, or colour shifts, is essential to establish robust models in computer vision. Data augmentation has been the major approach in improving the robustness against common corruptions. However, the samples produced by popular augmentation strategies deviate significantly from the underlying data manifold. As a result, performance is skewed toward certain types of corruption. To address this issue, we propose a multi-source vicinal transfer augmentation (VITA) method for generating diverse on-manifold samples. The proposed VITA consists of two complementary parts: tangent transfer and integration of multi-source vicinal samples. The tangent transfer creates initial augmented samples for improving corruption robustness. The integration employs a generative model to characterize the underlying manifold built by vicinal samples, facilitating the generation of on-manifold samples. Our proposed VITA significantly outperforms the current state-of-the-art augmentation methods, demonstrated in extensive experiments on corruption benchmarks.

* Accepted by AAAI 2022

Via

Access Paper or Ask Questions

Benchmarks for Corruption Invariant Person Re-identification

Nov 01, 2021

Minghui Chen, Zhiqiang Wang, Feng Zheng

Figure 1 for Benchmarks for Corruption Invariant Person Re-identification

Figure 2 for Benchmarks for Corruption Invariant Person Re-identification

Figure 3 for Benchmarks for Corruption Invariant Person Re-identification

Figure 4 for Benchmarks for Corruption Invariant Person Re-identification

Abstract:When deploying person re-identification (ReID) model in safety-critical applications, it is pivotal to understanding the robustness of the model against a diverse array of image corruptions. However, current evaluations of person ReID only consider the performance on clean datasets and ignore images in various corrupted scenarios. In this work, we comprehensively establish six ReID benchmarks for learning corruption invariant representation. In the field of ReID, we are the first to conduct an exhaustive study on corruption invariant learning in single- and cross-modality datasets, including Market-1501, CUHK03, MSMT17, RegDB, SYSU-MM01. After reproducing and examining the robustness performance of 21 recent ReID methods, we have some observations: 1) transformer-based models are more robust towards corrupted images, compared with CNN-based models, 2) increasing the probability of random erasing (a commonly used augmentation method) hurts model corruption robustness, 3) cross-dataset generalization improves with corruption robustness increases. By analyzing the above observations, we propose a strong baseline on both single- and cross-modality ReID datasets which achieves improved robustness against diverse corruptions. Our codes are available on https://github.com/MinghuiChen43/CIL-ReID.

* Accepted by NeurIPS 2021 Track on Datasets and Benchmarks. Project page: https://github.com/MinghuiChen43/CIL-ReID

Via

Access Paper or Ask Questions