Abstract: This study investigates identity-preserving image synthesis, a task in image generation that seeks to maintain a subject's identity while adding a personalized, stylistic touch. Traditional methods such as Textual Inversion and DreamBooth have made strides in customized image creation, but they come with significant drawbacks: they require extensive resources and time for fine-tuning, as well as multiple reference images. To overcome these challenges, our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images. Our model leverages a direct feed-forward mechanism, circumventing the need for intensive fine-tuning and thereby enabling quick and efficient image generation. Central to our innovation is a hybrid guidance framework, which combines stylized images, facial images, and textual prompts to steer the image generation process. This combination enables our model to support a variety of applications, such as artistic portraits and identity-blended images. Our experimental results, including both qualitative and quantitative evaluations, demonstrate the superiority of our method over baseline models and previous works, particularly in its efficiency and its ability to preserve the subject's identity with high fidelity.
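To make the hybrid guidance idea concrete, here is a minimal sketch (not the authors' released code) of how three conditioning signals could be fused for a feed-forward, tuning-free generator: a face-identity embedding, style-image tokens, and text tokens are projected into a shared space and concatenated into one conditioning sequence the generator can cross-attend to. All module names and dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HybridGuidance(nn.Module):
    """Illustrative fusion of face, style, and text conditioning signals."""
    def __init__(self, face_dim=512, style_dim=768, text_dim=768, cond_dim=768):
        super().__init__()
        # Separate projections map each modality into a shared conditioning space.
        self.face_proj = nn.Linear(face_dim, cond_dim)
        self.style_proj = nn.Linear(style_dim, cond_dim)
        self.text_proj = nn.Linear(text_dim, cond_dim)

    def forward(self, face_emb, style_emb, text_emb):
        # face_emb:  (B, face_dim)      e.g. from a frozen face recognizer
        # style_emb: (B, S, style_dim)  patch tokens from a style-image encoder
        # text_emb:  (B, T, text_dim)   tokens from a text encoder
        face = self.face_proj(face_emb).unsqueeze(1)   # (B, 1, cond_dim)
        style = self.style_proj(style_emb)             # (B, S, cond_dim)
        text = self.text_proj(text_emb)                # (B, T, cond_dim)
        # One concatenated sequence for cross-attention; no per-subject
        # fine-tuning is needed at inference time.
        return torch.cat([face, style, text], dim=1)   # (B, 1+S+T, cond_dim)

# Usage with dummy tensors:
guide = HybridGuidance()
cond = guide(torch.randn(2, 512), torch.randn(2, 16, 768), torch.randn(2, 77, 768))
print(cond.shape)  # torch.Size([2, 94, 768])
```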
Abstract: Expressivity plays a fundamental role in evaluating deep neural networks, and it is closely related to understanding the limits of performance improvement. In this paper, we propose a three-stage training framework based on critical expressivity, comprising global model contraction, weight evolution, and link-weight rewiring. Specifically, we propose a pyramid-like skeleton to overcome the saddle points that impede information transfer. We then analyze the cause of the modularity (clustering) phenomenon in network topology and use it to rewire potentially erroneous weighted links. We conduct numerical experiments on node classification, and the results confirm that the proposed training framework yields significantly improved performance in terms of convergence speed and robustness to potentially erroneous weighted links. The architectural design of GNNs, in turn, verifies their expressivity from the perspectives of dynamics and topological space, and provides useful guidelines for designing more efficient neural networks.
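The pyramid-like skeleton can be sketched as a GNN whose hidden widths contract layer by layer. Below is a minimal, self-contained sketch assuming a plain GCN backbone for node classification; the layer widths, the synthetic toy graph, and the normalization are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_hat):
        # a_hat: symmetrically normalized adjacency with self-loops.
        return torch.relu(self.lin(a_hat @ x))

class PyramidGCN(nn.Module):
    """Hidden widths shrink layer by layer (an illustrative 'contraction')."""
    def __init__(self, in_dim, num_classes, widths=(256, 128, 64)):
        super().__init__()
        dims = (in_dim,) + widths
        self.layers = nn.ModuleList(
            GCNLayer(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])
        )
        self.head = nn.Linear(widths[-1], num_classes)

    def forward(self, x, a_hat):
        for layer in self.layers:
            x = layer(x, a_hat)
        return self.head(x)

# Build A_hat = D^{-1/2} (A + I) D^{-1/2} for a random toy graph:
n = 10
adj = (torch.rand(n, n) < 0.3).float()
adj = ((adj + adj.t()) > 0).float()   # symmetrize
adj.fill_diagonal_(1.0)               # add self-loops
deg_inv_sqrt = adj.sum(1).pow(-0.5)
a_hat = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]

model = PyramidGCN(in_dim=32, num_classes=4)
logits = model(torch.randn(n, 32), a_hat)  # node classification logits, (n, 4)
```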
Abstract: Expressivity is one of the most significant issues in assessing neural networks. In this paper, we provide a quantitative analysis of expressivity from dynamic models, employing Hilbert space to analyze convergence and criticality. By examining the feature mappings of several widely used activation functions expanded in Hermite polynomials, we found sharp declines and even saddle points in the feature space, which stagnate information transfer in deep neural networks; we then present an activation-function design based on Hermite polynomials for better utilization of the spatial representation. Moreover, we analyze information transfer in deep neural networks, emphasizing the convergence problem caused by the mismatch between input and topological structure. We also study the effects of input perturbations and regularization operators on critical expressivity. Finally, we validate the proposed method on multivariate time-series prediction. The results show that the optimized DeepESN delivers higher predictive performance, especially for long-term prediction. Our theoretical analysis reveals that deep neural networks use spatial domains for information representation and evolve toward the edge of chaos as depth increases. In actual training, whether a particular network can ultimately reach this regime depends on its ability to overcome convergence bottlenecks and pass information to the required network depth.
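The Hermite feature mapping can be illustrated with a short numerical sketch, assuming the standard expansion used in mean-field analyses: an activation f is written as f(x) = sum_n a_n He_n(x) with a_n = E[f(Z) He_n(Z)] / n! for Z ~ N(0, 1), where He_n are the probabilists' Hermite polynomials. The rapid decay of a_n for tanh illustrates the sharp decline of higher-order feature components the abstract refers to; the quadrature order is an illustrative choice.

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial.hermite import hermgauss    # nodes/weights for e^{-t^2}
from numpy.polynomial.hermite_e import hermeval   # evaluates probabilists' He_n

t, w = hermgauss(80)        # Gauss-Hermite quadrature
z = sqrt(2.0) * t           # change of variables so z ~ N(0, 1)

def hermite_coeff(f, n):
    # a_n = E[f(Z) He_n(Z)] / n!, approximated by quadrature.
    basis = np.zeros(n + 1)
    basis[n] = 1.0          # selects He_n in the series evaluated by hermeval
    return (w @ (f(z) * hermeval(z, basis))) / (sqrt(pi) * factorial(n))

coeffs = [hermite_coeff(np.tanh, n) for n in range(6)]
print(np.round(coeffs, 4))  # odd coefficients dominate and decay quickly
```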