Yanxiang Gong

Distribution Fitting for Combating Mode Collapse in GANs

Dec 03, 2022

Yanxiang Gong, Zhiwei Xie, Guozhen Duan, Zheng Ma, Mei Xie

Abstract: Mode collapse is still a major unsolved problem in generative adversarial networks (GANs). In this work, we analyze the causes of mode collapse from a new perspective. Because sampling during training is nonuniform, some sub-distributions can be missed when data are sampled. Consequently, the GAN objective can reach its minimum even when the generated distribution differs from the real one. To alleviate this problem, we propose a global distribution fitting (GDF) method that adds a penalty term to constrain the generated data distribution. Without changing the global minimum of the GAN objective, GDF makes it harder to reach the minimum value when the generated distribution differs from the real one. Furthermore, we propose a local distribution fitting (LDF) method to handle the case where the real distribution is unknown. Experiments on several benchmarks demonstrate the effectiveness and competitive performance of GDF and LDF.
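
The abstract does not give the form of the GDF penalty. The sketch below uses a hypothetical stand-in penalty instead. It matches the first two moments (per-feature mean and standard deviation) of a generated batch to known real-data statistics. The names `moment_penalty` and `generator_loss` and the weight `lam` are illustrative, not the paper's.

Like the penalty described above, this one is zero when the generated and real statistics agree. Adding it therefore does not move the ideal solution where the generated distribution equals the real one. It does penalize a batch that has collapsed.

```python
import numpy as np


def moment_penalty(fake: np.ndarray, real_mean: np.ndarray, real_std: np.ndarray) -> float:
    """Hypothetical stand-in for a global distribution penalty: squared distance
    between the per-feature mean/std of a generated batch and the real statistics."""
    fake_mean = fake.mean(axis=0)
    fake_std = fake.std(axis=0)
    return float(np.sum((fake_mean - real_mean) ** 2) + np.sum((fake_std - real_std) ** 2))


def generator_loss(d_fake_logits: np.ndarray, fake: np.ndarray,
                   real_mean: np.ndarray, real_std: np.ndarray, lam: float = 1.0) -> float:
    """Non-saturating GAN generator loss plus the weighted moment penalty."""
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably with logaddexp
    adversarial = float(np.mean(np.logaddexp(0.0, -d_fake_logits)))
    return adversarial + lam * moment_penalty(fake, real_mean, real_std)


# Real data assumed to have mean 0 and std 1 in a single feature.
real_mean, real_std = np.array([0.0]), np.array([1.0])
spread = np.array([[-1.0], [1.0]])   # matches both moments -> penalty 0
collapsed = np.zeros((4, 1))         # every sample identical -> std 0
print(moment_penalty(spread, real_mean, real_std))     # 0.0
print(moment_penalty(collapsed, real_mean, real_std))  # 1.0
```

A fully collapsed batch has zero standard deviation, so it is penalized even though its mean is correct.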

Unified Chinese License Plate Detection and Recognition with High Efficiency

May 07, 2022

Yanxiang Gong, Linjie Deng, Shuai Tao, Xinchen Lu, Peicheng Wu, Zhiwei Xie, Zheng Ma, Mei Xie

Abstract: Recently, deep learning-based methods have achieved excellent performance on License Plate (LP) detection and recognition tasks. However, building a robust model for Chinese LPs remains challenging because large, representative datasets are scarce. In this work, we propose a new dataset named Chinese Road Plate Dataset (CRPD) that contains multi-objective Chinese LP images as a supplement to existing public benchmarks. The images are mainly captured by electronic monitoring systems and come with detailed annotations. To our knowledge, CRPD is the largest public multi-objective Chinese LP dataset with vertex annotations. Along with CRPD, a unified detection and recognition network with high efficiency is presented as the baseline. The network is end-to-end trainable and runs fully in real time (30 fps at 640p). Experiments on several public benchmarks demonstrate that our method achieves competitive performance. The code and dataset will be publicly available at https://github.com/yxgong0/CRPD.

Unsupervised domain adaptation via coarse-to-fine feature alignment method using contrastive learning

Mar 23, 2021

Shiyu Tang, Peijun Tang, Yanxiang Gong, Zheng Ma, Mei Xie

Abstract: Previous feature alignment methods in unsupervised domain adaptation (UDA) mostly align only global features, without considering the mismatch between class-wise features. In this work, we propose a new coarse-to-fine feature alignment method using contrastive learning, called CFContra. It draws class-wise features closer together than coarse feature alignment or class-wise feature alignment alone, thereby substantially improving the model's performance. We build it upon entropy minimization, one of the most effective UDA methods, to further improve performance. In particular, to prevent excessive memory occupation when applying contrastive loss in semantic segmentation, we devise a new way to build and update the memory bank. This makes the algorithm more efficient and viable with limited memory. Extensive experiments show the effectiveness of our method: a model trained on the GTA5-to-Cityscapes task improves mIoU by 3.5 compared to the MinEnt algorithm. Our code will be publicly available.
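
CFContra is built on entropy minimization. The sketch below shows the normalized pixel-wise entropy loss used by MinEnt (Vu et al., ADVENT), written in NumPy for a single image with logits of shape (C, H, W). It does not cover the paper's contrastive loss or its memory bank, and the function names are illustrative.

```python
import numpy as np


def log_softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable log-softmax along the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def entropy_loss(logits: np.ndarray) -> float:
    """MinEnt-style loss: mean over pixels of the entropy of the class
    distribution, normalized by log(C) so it lies in [0, 1].
    `logits` has shape (C, H, W)."""
    num_classes = logits.shape[0]
    log_p = log_softmax(logits, axis=0)
    entropy_map = -(np.exp(log_p) * log_p).sum(axis=0) / np.log(num_classes)
    return float(entropy_map.mean())


uniform = np.zeros((4, 2, 2))       # maximally uncertain predictions
confident = np.zeros((4, 2, 2))
confident[0] = 50.0                 # class 0 dominates at every pixel
print(round(entropy_loss(uniform), 6))    # 1.0
print(entropy_loss(confident) < 1e-6)     # True
```

Minimizing this loss on unlabeled target-domain images pushes the segmentation network toward confident predictions. That is the signal the entropy-minimization baseline relies on.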

What's the relationship between CNNs and communication systems?

Mar 03, 2020

Hao Ge, Xiaoguang Tu, Yanxiang Gong, Mei Xie, Zheng Ma

Abstract: The interpretability of Convolutional Neural Networks (CNNs) is an important topic in the field of computer vision. In recent years, works in this field have generally adopted a mature model to reveal the internal mechanism of CNNs, helping to understand them thoroughly. In this paper, we argue that the working mechanism of CNNs can be revealed through an entirely different interpretation: a comparison between communication systems and CNNs. We establish a correspondence between the modules of the two and verify its soundness with experiments. Finally, by analyzing some cutting-edge research on neural networks, we find that the inherent relation between the two can help explain this research reasonably, as well as help us identify the right research directions for neural networks.

* Deep learning, adversarial example, interpretability

Focus-Enhanced Scene Text Recognition with Deformable Convolutions

Sep 23, 2019

Linjie Deng, Yanxiang Gong, Xinchen Lu, Xin Yi, Zheng Ma, Mei Xie

Abstract: Recently, scene text recognition methods based on deep learning have proliferated in computer vision. Existing methods achieve strong performance, but recognizing irregular text is still challenging due to its various shapes and distorted patterns. Consider that when we read words in the real world, we normally do not rectify them in our minds; instead, we adjust our focus and visual field. Similarly, by utilizing deformable convolutional layers whose geometric structures are adjustable, we present an enhanced recognition network that handles irregular text without a rectification step. Experimental results on public benchmarks demonstrate the effectiveness of our proposed components and show that our method achieves satisfactory performance. The code will soon be publicly available at https://github.com/Alpaca07/dtr.
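
The abstract does not detail the network, so the following is only a generic sketch of how a deformable convolution (Dai et al., 2017) samples its input. Each of the nine taps of a 3x3 kernel is shifted by a learned, possibly fractional, offset. The input is then read with bilinear interpolation. The sketch handles one channel at one output location, and the helper names are hypothetical. Production code would use a library op such as `torchvision.ops.deform_conv2d`.

```python
import numpy as np


def bilinear(img: np.ndarray, y: float, x: float) -> float:
    """Sample a 2-D array at fractional (y, x); pixels outside the image count as 0."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # weight shrinks linearly with distance to the neighbouring pixel
                value += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy, xx]
    return float(value)


def deformable_conv3x3_at(img: np.ndarray, weight: np.ndarray,
                          offsets: np.ndarray, py: int, px: int) -> float:
    """One output value of a single-channel 3x3 deformable convolution.
    weight: (3, 3) kernel; offsets: (3, 3, 2) learned (dy, dx) per kernel tap."""
    out = 0.0
    for i, ky in enumerate((-1, 0, 1)):
        for j, kx in enumerate((-1, 0, 1)):
            dy, dx = offsets[i, j]
            # regular grid position plus a learned, possibly fractional, shift
            out += weight[i, j] * bilinear(img, py + ky + dy, px + kx + dx)
    return float(out)


img = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))
no_shift = np.zeros((3, 3, 2))
half_right = np.zeros((3, 3, 2))
half_right[..., 1] = 0.5            # shift every tap half a pixel to the right
print(deformable_conv3x3_at(img, kernel, no_shift, 2, 2))    # 108.0
print(deformable_conv3x3_at(img, kernel, half_right, 2, 2))  # 112.5
```

With zero offsets the layer reduces to an ordinary 3x3 convolution. Non-zero offsets let the sampling grid bend toward curved or slanted strokes, which is the "adjustable focus" the abstract appeals to.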

STELA: A Real-Time Scene Text Detector with Learned Anchor

Sep 23, 2019

Linjie Deng, Yanxiang Gong, Xinchen Lu, Yi Lin, Zheng Ma, Mei Xie

Abstract: To achieve high coverage of target boxes, a common strategy of conventional one-stage anchor-based detectors is to use multiple priors at each spatial position, especially in scene text detection tasks. In this work, we present a simple and intuitive method for multi-oriented text detection in which each location of the feature maps is associated with only one reference box. The idea is inspired by the two-stage R-CNN framework, which can estimate the location of objects of any shape by using learned proposals. Our method integrates this mechanism into a one-stage detector: a learned anchor, obtained through a regression operation, replaces the original anchor in the final predictions. Built on RetinaNet, our method achieves competitive performance on several public benchmarks with fully real-time efficiency (26.5 fps at 800p), surpassing all anchor-based scene text detectors. In addition, because it requires less attention to anchor design, we believe our method can easily be applied to other analogous detection tasks. The code will be publicly available at https://github.com/xhzdeng/stela.
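
A minimal sketch of the two-step idea follows. It assumes the standard R-CNN box parameterization extended with an additive angle term, which is a common convention for rotated boxes but is not confirmed by the abstract. First, the single reference box at a location is regressed into a learned anchor. Then the final prediction is decoded relative to that learned anchor. The names `decode` and `refine`, the radian angle unit, and the delta values are illustrative.

```python
import numpy as np


def decode(boxes: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    """Apply regression deltas (dx, dy, dw, dh, dtheta) to rotated boxes
    (cx, cy, w, h, theta): R-CNN-style offsets plus an additive angle."""
    cx = boxes[:, 0] + deltas[:, 0] * boxes[:, 2]
    cy = boxes[:, 1] + deltas[:, 1] * boxes[:, 3]
    w = boxes[:, 2] * np.exp(deltas[:, 2])
    h = boxes[:, 3] * np.exp(deltas[:, 3])
    theta = boxes[:, 4] + deltas[:, 4]
    return np.stack([cx, cy, w, h, theta], axis=1)


def refine(anchors, anchor_deltas, box_deltas):
    """Two-step decoding at a single location: the reference box becomes a
    learned anchor, and the final box is decoded relative to that learned anchor."""
    learned_anchor = decode(anchors, anchor_deltas)
    final_box = decode(learned_anchor, box_deltas)
    return learned_anchor, final_box


# One reference box per feature-map location (values are illustrative).
anchors = np.array([[16.0, 16.0, 32.0, 32.0, 0.0]])
anchor_deltas = np.array([[0.5, 0.0, 0.0, 0.0, 0.0]])      # shift right by half a width
box_deltas = np.array([[0.0, 0.0, np.log(2.0), 0.0, 0.1]])  # double the width, rotate slightly
learned, final = refine(anchors, anchor_deltas, box_deltas)
print(learned[0])
print(final[0])
```

Because the learned anchor already sits closer to the text instance than the fixed reference box, one box per location can plausibly stand in for several hand-designed priors.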

Generating Text Sequence Images for Recognition

Jan 21, 2019

Yanxiang Gong, Linjie Deng, Zheng Ma, Mei Xie

Abstract: Recently, methods based on deep learning have dominated the field of text recognition. With large amounts of training data, most of them achieve state-of-the-art performance. However, it is hard to collect and label sufficient text sequence images from real scenes. To mitigate this issue, several methods for synthesizing text sequence images have been proposed, but they usually require complicated pre- or post-processing steps. In this work, we present a method that can generate unlimited training data without any auxiliary pre- or post-processing. We treat the generation task as an image-to-image translation problem and use conditional adversarial networks to produce realistic text sequence images from semantic ones. Several evaluation metrics are used to assess our method, and the results demonstrate that the quality of the generated data is satisfactory. The code and dataset will be publicly available soon.
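
Image-to-image translation with conditional adversarial networks is commonly trained with the pix2pix objective (Isola et al.). That objective combines an adversarial term with an L1 reconstruction term weighted by lambda, which is 100 in pix2pix. Whether this paper uses the same loss or weighting is an assumption.

The sketch computes both losses from discriminator logits. Those logits are assumed to come from a discriminator that sees the semantic input image paired with a real or generated text image.

```python
import numpy as np


def bce_with_logits(logits: np.ndarray, target: float) -> float:
    """Mean binary cross-entropy on raw logits, in a numerically stable form."""
    return float(np.mean(np.maximum(logits, 0.0) - logits * target
                         + np.log1p(np.exp(-np.abs(logits)))))


def generator_objective(d_logits_fake, fake, target, lam=100.0):
    """cGAN term (fool D on the generated image) + lam * L1 to the ground truth."""
    adversarial = bce_with_logits(d_logits_fake, 1.0)
    l1 = float(np.mean(np.abs(fake - target)))
    return adversarial + lam * l1


def discriminator_objective(d_logits_real, d_logits_fake):
    """D should score real (semantic, text) pairs 1 and generated pairs 0; halved as in pix2pix."""
    return 0.5 * (bce_with_logits(d_logits_real, 1.0) + bce_with_logits(d_logits_fake, 0.0))


# Toy values: an undecided discriminator (logit 0) and a generator that is off by 0.01.
d_logits = np.zeros(4)
target = np.full((2, 2), 0.5)
fake = np.full((2, 2), 0.51)
print(generator_objective(d_logits, fake, target))  # about log(2) + 1.0
print(discriminator_objective(d_logits, d_logits))  # about log(2)
```

The L1 term keeps the generated text faithful to the semantic layout, while the adversarial term pushes toward realistic texture.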

Detecting Multi-Oriented Text with Corner-based Region Proposals

Apr 08, 2018

Linjie Deng, Yanxiang Gong, Yi Lin, Jingwen Shuai, Xiaoguang Tu, Yufei Zhang, Zheng Ma, Mei Xie

Abstract: Previous approaches to scene text detection usually rely on manually defined sliding windows. In this paper, an intuitive region-based method is presented to detect multi-oriented text without any prior knowledge of the textual shape. We first introduce a Corner-based Region Proposal Network (CRPN) that uses corners to estimate the possible locations of text instances instead of shifting a set of default anchors. The proposals generated by CRPN are geometry-adaptive, which makes our method robust to various text aspect ratios and orientations. Moreover, we design a simple embedded data augmentation module inside the region-wise subnetwork, which not only ensures that the model uses training data more efficiently but also learns to find the most representative instance of the input images for training. Experimental results on public benchmarks confirm that the proposed method achieves performance comparable to state-of-the-art methods. On the ICDAR 2013 and 2015 datasets, it obtains F-measures of 0.876 and 0.845, respectively. The code is publicly available at https://github.com/xhzdeng/crpn.
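
The F-measure reported above is the harmonic mean of precision and recall. A minimal helper is sketched below. The ICDAR evaluation protocols decide which detections count as true positives through overlap-based matching rules, and that matching step is not modeled here. The counts in the example are made up for illustration and are not the paper's results.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f_measure_from_counts(true_positives: int, num_detections: int, num_ground_truth: int) -> float:
    """F-measure from matched counts; what counts as a match is left to the benchmark protocol."""
    precision = true_positives / num_detections if num_detections else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return f_measure(precision, recall)


# Made-up counts: 8 correct out of 10 detections, 16 ground-truth boxes.
print(round(f_measure_from_counts(8, 10, 16), 4))  # 0.6154
```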
