Guinan Su

Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging

Feb 06, 2025

Guinan Su, Jonas Geiping

Abstract: Reasoning capabilities represent a critical frontier for large language models (LLMs), but developing them requires extensive proprietary datasets and computational resources. Model merging offers a promising way to supplement capabilities efficiently by combining multiple models without retraining. However, current merging approaches rely on manually designed strategies for merging hyperparameters, which limits the exploration of potential model combinations and requires significant human effort. We propose an Automated Model Merging Framework that enables fine-grained exploration of merging strategies while reducing costs through multi-fidelity approximations. We support both single- and multi-objective optimization and introduce two novel search spaces: layerwise fusion (LFS) and depth-wise integration (DIS). Evaluating across a number of benchmarks, we find that the search autonomously finds 1) merges that further boost single-objective performance, even on tasks the model has already been finetuned on, and 2) merges that optimize multi-objective frontiers across tasks. Effective merges are found with limited compute, e.g. within fewer than 500 search steps.
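As a rough illustration of what a layer-wise merge can look like, the sketch below linearly interpolates two models with one coefficient per layer. It is not the paper's method: the function name `merge_layerwise`, the dictionary-of-lists weight format, and the default coefficient of 0.5 are all assumptions made for this example. An automated search such as the one the abstract describes would tune coefficients of this kind instead of setting them by hand.

```python
from typing import Dict, List

# Hypothetical weight format: layer name -> flattened parameter list.
Weights = Dict[str, List[float]]


def merge_layerwise(model_a: Weights, model_b: Weights,
                    alphas: Dict[str, float]) -> Weights:
    """Linearly interpolate two models using one coefficient per layer.

    alphas[name] = 1.0 keeps model_a's layer, 0.0 keeps model_b's layer.
    Layers missing from `alphas` fall back to 0.5 (an arbitrary default).
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("models must share the same layer names")
    merged: Weights = {}
    for name, params_a in model_a.items():
        params_b = model_b[name]
        if len(params_a) != len(params_b):
            raise ValueError(f"shape mismatch in layer {name!r}")
        alpha = alphas.get(name, 0.5)
        merged[name] = [alpha * a + (1.0 - alpha) * b
                        for a, b in zip(params_a, params_b)]
    return merged


# Toy models with two layers each.
model_a = {"layer0": [1.0, 2.0], "layer1": [0.0, 4.0]}
model_b = {"layer0": [3.0, 6.0], "layer1": [2.0, 0.0]}

# Keep layer0 entirely from model_a and average layer1.
merged = merge_layerwise(model_a, model_b, {"layer0": 1.0, "layer1": 0.5})
```

The per-layer coefficients form the search space here; a single global coefficient would be the special case where every layer shares the same value.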

DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation

Nov 13, 2023

Guinan Su, Yanwu Yang, Zhifeng Li

Abstract: In recent years, audio-driven 3D facial animation has gained significant attention, particularly in applications such as virtual reality, gaming, and video conferencing. However, accurately modeling the intricate and subtle dynamics of facial expressions remains a challenge. Most existing studies treat facial animation as a single regression problem, an approach that often fails to capture the intrinsic inter-modal relationship between speech signals and 3D facial animation and overlooks their inherent consistency. Moreover, because 3D audio-visual datasets are scarce, approaches trained on small samples generalize poorly, which degrades performance. To address these issues, we propose a cross-modal dual-learning framework, termed DualTalker, that aims to improve data usage efficiency and to relate cross-modal dependencies. The framework is trained jointly on the primary task (audio-driven facial animation) and its dual task (lip reading), and the two tasks share common audio/motion encoder components. This joint training uses data more efficiently by leveraging information from both tasks and explicitly capitalizing on the complementary relationship between facial motion and audio to improve performance. Furthermore, we introduce an auxiliary cross-modal consistency loss to mitigate the potential over-smoothing underlying the cross-modal complementary representations, enhancing the mapping of subtle facial expression dynamics. Through extensive experiments and a perceptual user study conducted on the VOCA and BIWI datasets, we demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively. We have made our code and video demonstrations available at https://github.com/sabrina-su/iadf.git.
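The joint objective described in this abstract (a primary loss, a dual-task loss, and an auxiliary consistency term) can be pictured as a weighted sum. The sketch below is only that picture: the function name `joint_loss`, the weighting scheme, and the numeric values are assumptions for illustration and are not taken from the paper or its code.

```python
def joint_loss(animation_loss: float,
               lip_reading_loss: float,
               consistency_loss: float,
               dual_weight: float = 1.0,
               consistency_weight: float = 0.5) -> float:
    """Combine primary, dual, and consistency losses into one scalar.

    The weights are hypothetical hyperparameters; the abstract does not
    state how the three terms are balanced.
    """
    return (animation_loss
            + dual_weight * lip_reading_loss
            + consistency_weight * consistency_loss)


# Toy per-batch loss values, chosen only to make the arithmetic visible.
total = joint_loss(animation_loss=0.5,
                   lip_reading_loss=0.25,
                   consistency_loss=2.0)
```

In a real training loop each term would be a differentiable tensor computed from the shared encoders, and the combined scalar would be backpropagated once per step.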

AlphaGAN: Fully Differentiable Architecture Search for Generative Adversarial Networks

Jun 16, 2020

Yuesong Tian, Li Shen, Guinan Su, Zhifeng Li, Wei Liu

Abstract: Generative Adversarial Networks (GANs) are formulated as minimax game problems, whereby generators attempt to approach real data distributions by virtue of adversarial learning against discriminators. The intrinsic complexity of this problem makes it challenging to enhance the performance of generative networks. In this work, we aim to boost model learning from the perspective of network architectures, by incorporating recent progress on automated architecture search into GANs. To this end, we propose a fully differentiable search framework for generative adversarial networks, dubbed alphaGAN. The search process is formalized as solving a bi-level minimax optimization problem, in which the outer-level objective seeks a suitable network architecture towards pure Nash Equilibrium, conditioned on the generator and discriminator network parameters optimized with a traditional GAN loss in the inner level. The entire optimization uses a first-order method that alternately minimizes the two-level objective in a fully differentiable manner, enabling architecture search to be completed in an enormous search space. Extensive experiments on the CIFAR-10 and STL-10 datasets show that our algorithm can obtain high-performing architectures with only 3 GPU-hours on a single GPU in a search space comprising approximately 2 × 10^11 possible configurations. We also provide a comprehensive analysis of the behavior of the search process and the properties of the searched architectures, which should benefit further research on architectures for generative models. Pretrained models and code are available at https://github.com/yuesongtian/AlphaGAN.

TextNAS: A Neural Architecture Search Space tailored for Text Representation

Dec 23, 2019

Yujing Wang, Yaming Yang, Yiren Chen, Jing Bai, Ce Zhang, Guinan Su, Xiaoyu Kou, Yunhai Tong, Mao Yang, Lidong Zhou

Abstract: Learning text representation is crucial for text classification and other language-related tasks. There is a diverse set of text representation networks in the literature, and finding the optimal one is a non-trivial problem. Recently, emerging Neural Architecture Search (NAS) techniques have demonstrated good potential to solve this problem. Nevertheless, most existing NAS work focuses on the search algorithms and pays little attention to the search space. In this paper, we argue that the search space is also an important human prior for the success of NAS in different applications. Thus, we propose a novel search space tailored for text representation. Through automatic search, the discovered network architecture outperforms state-of-the-art models on various public datasets for text classification and natural language inference tasks. Furthermore, some of the design principles found in the automatically discovered network agree well with human intuition.
