Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jielei Zhang

MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation

Mar 04, 2025

Weihang Wang, Duolin Sun, Jielei Zhang, Longwen Gao

Abstract:Few-shot Font Generation (FFG) aims to create new font libraries using limited reference glyphs, with crucial applications in digital accessibility and equity for low-resource languages, especially in multilingual artificial intelligence systems. Although existing methods have shown promising performance, transitioning to unseen characters in low-resource languages remains a significant challenge, especially when font glyphs vary considerably across training sets. MX-Font considers the content of a character from the perspective of a local component, employing a Mixture of Experts (MoE) approach to adaptively extract the component for better transition. However, the lack of a robust feature extractor prevents them from adequately decoupling content and style, leading to sub-optimal generation results. To alleviate these problems, we propose Heterogeneous Aggregation Experts (HAE), a powerful feature extraction expert that helps decouple content and style downstream from being able to aggregate information in channel and spatial dimensions. Additionally, we propose a novel content-style homogeneity loss to enhance the untangling. Extensive experiments on several datasets demonstrate that our MX-Font++ yields superior visual results in FFG and effectively outperforms state-of-the-art methods. Code and data are available at https://github.com/stephensun11/MXFontpp.

* 4 pages, 4 figures, accepted by ICASSP 2025

Via

Access Paper or Ask Questions

DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training

Aug 01, 2024

Yu Xie, Qian Qiao, Jun Gao, Tianxiang Wu, Shaoyao Huang, Jiaqing Fan, Ziqiang Cao, Zili Wang, Yue Zhang, Jielei Zhang(+1 more)

Figure 1 for DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training

Figure 2 for DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training

Figure 3 for DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training

Figure 4 for DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training

Abstract:More and more end-to-end text spotting methods based on Transformer architecture have demonstrated superior performance. These methods utilize a bipartite graph matching algorithm to perform one-to-one optimal matching between predicted objects and actual objects. However, the instability of bipartite graph matching can lead to inconsistent optimization targets, thereby affecting the training performance of the model. Existing literature applies denoising training to solve the problem of bipartite graph matching instability in object detection tasks. Unfortunately, this denoising training method cannot be directly applied to text spotting tasks, as these tasks need to perform irregular shape detection tasks and more complex text recognition tasks than classification. To address this issue, we propose a novel denoising training method (DNTextSpotter) for arbitrary-shaped text spotting. Specifically, we decompose the queries of the denoising part into noised positional queries and noised content queries. We use the four Bezier control points of the Bezier center curve to generate the noised positional queries. For the noised content queries, considering that the output of the text in a fixed positional order is not conducive to aligning position with content, we employ a masked character sliding method to initialize noised content queries, thereby assisting in the alignment of text content and position. To improve the model's perception of the background, we further utilize an additional loss function for background characters classification in the denoising training part.Although DNTextSpotter is conceptually simple, it outperforms the state-of-the-art methods on four benchmarks (Total-Text, SCUT-CTW1500, ICDAR15, and Inverse-Text), especially yielding an improvement of 11.3% against the best approach in Inverse-Text dataset.

* Accepted by ACMMM2024

Via

Access Paper or Ask Questions

Facial Attribute Transformers for Precise and Robust Makeup Transfer

Apr 07, 2021

Zhaoyi Wan, Haoran Chen, Jielei Zhang, Wentao Jiang, Cong Yao, Jiebo Luo

Figure 1 for Facial Attribute Transformers for Precise and Robust Makeup Transfer

Figure 2 for Facial Attribute Transformers for Precise and Robust Makeup Transfer

Figure 3 for Facial Attribute Transformers for Precise and Robust Makeup Transfer

Figure 4 for Facial Attribute Transformers for Precise and Robust Makeup Transfer

Abstract:In this paper, we address the problem of makeup transfer, which aims at transplanting the makeup from the reference face to the source face while preserving the identity of the source. Existing makeup transfer methods have made notable progress in generating realistic makeup faces, but do not perform well in terms of color fidelity and spatial transformation. To tackle these issues, we propose a novel Facial Attribute Transformer (FAT) and its variant Spatial FAT for high-quality makeup transfer. Drawing inspirations from the Transformer in NLP, FAT is able to model the semantic correspondences and interactions between the source face and reference face, and then precisely estimate and transfer the facial attributes. To further facilitate shape deformation and transformation of facial parts, we also integrate thin plate splines (TPS) into FAT, thus creating Spatial FAT, which is the first method that can transfer geometric attributes in addition to color and texture. Extensive qualitative and quantitative experiments demonstrate the effectiveness and superiority of our proposed FATs in the following aspects: (1) ensuring high-fidelity color transfer; (2) allowing for geometric transformation of facial parts; (3) handling facial variations (such as poses and shadows) and (4) supporting high-resolution face generation.

Via

Access Paper or Ask Questions

On Vocabulary Reliance in Scene Text Recognition

May 08, 2020

Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao

Figure 1 for On Vocabulary Reliance in Scene Text Recognition

Figure 2 for On Vocabulary Reliance in Scene Text Recognition

Figure 3 for On Vocabulary Reliance in Scene Text Recognition

Figure 4 for On Vocabulary Reliance in Scene Text Recognition

Abstract:The pursuit of high performance on public benchmarks has been the driving force for research in scene text recognition, and notable progress has been achieved. However, a close investigation reveals a startling fact that the state-of-the-art methods perform well on images with words within vocabulary but generalize poorly to images with words outside vocabulary. We call this phenomenon "vocabulary reliance". In this paper, we establish an analytical framework to conduct an in-depth study on the problem of vocabulary reliance in scene text recognition. Key findings include: (1) Vocabulary reliance is ubiquitous, i.e., all existing algorithms more or less exhibit such characteristic; (2) Attention-based decoders prove weak in generalizing to words outside vocabulary and segmentation-based decoders perform well in utilizing visual features; (3) Context modeling is highly coupled with the prediction layers. These findings provide new insights and can benefit future research in scene text recognition. Furthermore, we propose a simple yet effective mutual learning strategy to allow models of two families (attention-based and segmentation-based) to learn collaboratively. This remedy alleviates the problem of vocabulary reliance and improves the overall scene text recognition performance.

* EEE/CVF Conferences on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, June 2020
* CVPR'20

Via

Access Paper or Ask Questions