Abstract:Generative artificial intelligence (GenAI) has made significant progress in understanding world knowledge and generating content from human languages across various modalities, like text-to-text large language models, text-to-image stable diffusion, and text-to-video Sora. While in this paper, we investigate the capability of GenAI for text-to-model generation, to see whether GenAI can comprehend hyper-level knowledge embedded within AI itself parameters. Specifically, we study a practical scenario termed train-once-for-all personalization, aiming to generate personalized models for diverse end-users and tasks using text prompts. Inspired by the recent emergence of neural network diffusion, we present Tina, a text-conditioned neural network diffusion for train-once-for-all personalization. Tina leverages a diffusion transformer model conditioned on task descriptions embedded using a CLIP model. Despite the astronomical number of potential personalized tasks (e.g., $1.73\times10^{13}$), by our design, Tina demonstrates remarkable in-distribution and out-of-distribution generalization even trained on small datasets ($\sim 1000$). We further verify whether and how \Tina understands world knowledge by analyzing its capabilities under zero-shot/few-shot image prompts, different numbers of personalized classes, prompts of natural language descriptions, and predicting unseen entities.
Abstract:Personalized federated learning (pFL) enables collaborative training among multiple clients to enhance the capability of customized local models. In pFL, clients may have heterogeneous (also known as non-IID) data, which poses a key challenge in how to decouple the data knowledge into generic knowledge for global sharing and personalized knowledge for preserving local personalization. A typical way of pFL focuses on label distribution skew, and they adopt a decoupling scheme where the model is split into a common feature extractor and two prediction heads (generic and personalized). However, such a decoupling scheme cannot solve the essential problem of feature skew heterogeneity, because a common feature extractor cannot decouple the generic and personalized features. Therefore, in this paper, we rethink the architecture decoupling design for feature-skew pFL and propose an effective pFL method called FediOS. In FediOS, we reformulate the decoupling into two feature extractors (generic and personalized) and one shared prediction head. Orthogonal projections are used for clients to map the generic features into one common subspace and scatter the personalized features into different subspaces to achieve decoupling for them. In addition, a shared prediction head is trained to balance the importance of generic and personalized features during inference. Extensive experiments on four vision datasets demonstrate our method reaches state-of-the-art pFL performances under feature skew heterogeneity.