Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiafeng Mao

Training Data Synthesis with Difficulty Controlled Diffusion Model

Nov 27, 2024

Zerun Wang, Jiafeng Mao, Xueting Wang, Toshihiko Yamasaki

Figure 1 for Training Data Synthesis with Difficulty Controlled Diffusion Model

Figure 2 for Training Data Synthesis with Difficulty Controlled Diffusion Model

Figure 3 for Training Data Synthesis with Difficulty Controlled Diffusion Model

Figure 4 for Training Data Synthesis with Difficulty Controlled Diffusion Model

Abstract:Semi-supervised learning (SSL) can improve model performance by leveraging unlabeled images, which can be collected from public image sources with low costs. In recent years, synthetic images have become increasingly common in public image sources due to rapid advances in generative models. Therefore, it is becoming inevitable to include existing synthetic images in the unlabeled data for SSL. How this kind of contamination will affect SSL remains unexplored. In this paper, we introduce a new task, Real-Synthetic Hybrid SSL (RS-SSL), to investigate the impact of unlabeled data contaminated by synthetic images for SSL. First, we set up a new RS-SSL benchmark to evaluate current SSL methods and found they struggled to improve by unlabeled synthetic images, sometimes even negatively affected. To this end, we propose RSMatch, a novel SSL method specifically designed to handle the challenges of RS-SSL. RSMatch effectively identifies unlabeled synthetic data and further utilizes them for improvement. Extensive experimental results show that RSMatch can transfer synthetic unlabeled data from `obstacles' to `resources.' The effectiveness is further verified through ablation studies and visualization.

Via

Access Paper or Ask Questions

Reward Incremental Learning in Text-to-Image Generation

Nov 26, 2024

Maorong Wang, Jiafeng Mao, Xueting Wang, Toshihiko Yamasaki

Abstract:The recent success of denoising diffusion models has significantly advanced text-to-image generation. While these large-scale pretrained models show excellent performance in general image synthesis, downstream objectives often require fine-tuning to meet specific criteria such as aesthetics or human preference. Reward gradient-based strategies are promising in this context, yet existing methods are limited to single-reward tasks, restricting their applicability in real-world scenarios that demand adapting to multiple objectives introduced incrementally over time. In this paper, we first define this more realistic and unexplored problem, termed Reward Incremental Learning (RIL), where models are desired to adapt to multiple downstream objectives incrementally. Additionally, while the models adapt to the ever-emerging new objectives, we observe a unique form of catastrophic forgetting in diffusion model fine-tuning, affecting both metric-wise and visual structure-wise image quality. To address this catastrophic forgetting challenge, we propose Reward Incremental Distillation (RID), a method that mitigates forgetting with minimal computational overhead, enabling stable performance across sequential reward tasks. The experimental results demonstrate the efficacy of RID in achieving consistent, high-quality generation in RIL scenarios. The source code of our work will be publicly available upon acceptance.

* Under review

Via

Access Paper or Ask Questions

Dealing with Synthetic Data Contamination in Online Continual Learning

Nov 21, 2024

Maorong Wang, Nicolas Michel, Jiafeng Mao, Toshihiko Yamasaki

Abstract:Image generation has shown remarkable results in generating high-fidelity realistic images, in particular with the advancement of diffusion-based models. However, the prevalence of AI-generated images may have side effects for the machine learning community that are not clearly identified. Meanwhile, the success of deep learning in computer vision is driven by the massive dataset collected on the Internet. The extensive quantity of synthetic data being added to the Internet would become an obstacle for future researchers to collect "clean" datasets without AI-generated content. Prior research has shown that using datasets contaminated by synthetic images may result in performance degradation when used for training. In this paper, we investigate the potential impact of contaminated datasets on Online Continual Learning (CL) research. We experimentally show that contaminated datasets might hinder the training of existing online CL methods. Also, we propose Entropy Selection with Real-synthetic similarity Maximization (ESRM), a method to alleviate the performance deterioration caused by synthetic images when training online CL models. Experiments show that our method can significantly alleviate performance deterioration, especially when the contamination is severe. For reproducibility, the source code of our work is available at https://github.com/maorong-wang/ESRM.

* Accepted to NeurIPS'24

Via

Access Paper or Ask Questions

SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning

Sep 26, 2024

Zerun Wang, Liuyu Xiang, Lang Huang, Jiafeng Mao, Ling Xiao, Toshihiko Yamasaki

Figure 1 for SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning

Figure 2 for SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning

Figure 3 for SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning

Figure 4 for SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning

Abstract:Open-set semi-supervised learning (OSSL) leverages practical open-set unlabeled data, comprising both in-distribution (ID) samples from seen classes and out-of-distribution (OOD) samples from unseen classes, for semi-supervised learning (SSL). Prior OSSL methods initially learned the decision boundary between ID and OOD with labeled ID data, subsequently employing self-training to refine this boundary. These methods, however, suffer from the tendency to overtrust the labeled ID data: the scarcity of labeled data caused the distribution bias between the labeled samples and the entire ID data, which misleads the decision boundary to overfit. The subsequent self-training process, based on the overfitted result, fails to rectify this problem. In this paper, we address the overtrusting issue by treating OOD samples as an additional class, forming a new SSL process. Specifically, we propose SCOMatch, a novel OSSL method that 1) selects reliable OOD samples as new labeled data with an OOD memory queue and a corresponding update strategy and 2) integrates the new SSL process into the original task through our Simultaneous Close-set and Open-set self-training. SCOMatch refines the decision boundary of ID and OOD classes across the entire dataset, thereby leading to improved results. Extensive experimental results show that SCOMatch significantly outperforms the state-of-the-art methods on various benchmarks. The effectiveness is further verified through ablation studies and visualization.

* ECCV 2024 accepted

Via

Access Paper or Ask Questions

Training-Free Sketch-Guided Diffusion with Latent Optimization

Aug 31, 2024

Sandra Zhang Ding, Jiafeng Mao, Kiyoharu Aizawa

Figure 1 for Training-Free Sketch-Guided Diffusion with Latent Optimization

Figure 2 for Training-Free Sketch-Guided Diffusion with Latent Optimization

Figure 3 for Training-Free Sketch-Guided Diffusion with Latent Optimization

Figure 4 for Training-Free Sketch-Guided Diffusion with Latent Optimization

Abstract:Based on recent advanced diffusion models, Text-to-image (T2I) generation models have demonstrated their capabilities in generating diverse and high-quality images. However, leveraging their potential for real-world content creation, particularly in providing users with precise control over the image generation result, poses a significant challenge. In this paper, we propose an innovative training-free pipeline that extends existing text-to-image generation models to incorporate a sketch as an additional condition. To generate new images with a layout and structure closely resembling the input sketch, we find that these core features of a sketch can be tracked with the cross-attention maps of diffusion models. We introduce latent optimization, a method that refines the noisy latent at each intermediate step of the generation process using cross-attention maps to ensure that the generated images closely adhere to the desired structure outlined in the reference sketch. Through latent optimization, our method enhances the fidelity and accuracy of image generation, offering users greater control and customization options in content creation.

Via

Access Paper or Ask Questions

From Obstacle to Opportunity: Enhancing Semi-supervised Learning with Synthetic Data

May 27, 2024

Zerun Wang, Jiafeng Mao, Liuyu Xiang, Toshihiko Yamasaki

Figure 1 for From Obstacle to Opportunity: Enhancing Semi-supervised Learning with Synthetic Data

Figure 2 for From Obstacle to Opportunity: Enhancing Semi-supervised Learning with Synthetic Data

Figure 3 for From Obstacle to Opportunity: Enhancing Semi-supervised Learning with Synthetic Data

Figure 4 for From Obstacle to Opportunity: Enhancing Semi-supervised Learning with Synthetic Data

Abstract:Semi-supervised learning (SSL) can utilize unlabeled data to enhance model performance. In recent years, with increasingly powerful generative models becoming available, a large number of synthetic images have been uploaded to public image sets. Therefore, when collecting unlabeled data from these sources, the inclusion of synthetic images is inevitable. This prompts us to consider the impact of unlabeled data mixed with real and synthetic images on SSL. In this paper, we set up a new task, Real and Synthetic hybrid SSL (RS-SSL), to investigate this problem. We discover that current SSL methods are unable to fully utilize synthetic data and are sometimes negatively affected. Then, by analyzing the issues caused by synthetic images, we propose a new SSL method, RSMatch, to tackle the RS-SSL problem. Extensive experimental results show that RSMatch can better utilize the synthetic data in unlabeled images to improve the SSL performance. The effectiveness is further verified through ablation studies and visualization.

Via

Access Paper or Ask Questions

Semantic-Driven Initial Image Construction for Guided Image Synthesis in Diffusion Model

Dec 13, 2023

Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa

Abstract:The initial noise image has demonstrated a significant influence on image generation, and manipulating the initial noise image can effectively increase control over the generation. All of the current generation is based only on a single initial noise drawn from a normal distribution, which may not be suited to the desired content specified by the prompt. In this research, we propose a novel approach using pre-collected, semantically-informed pixel blocks from multiple initial noises for the initial image construction to enhance control over the image generation. The inherent tendencies of these pixel blocks can easily generate specific content, thus effectively guiding the generation process towards the desired content. The pursuit of tailored initial image construction inevitably leads to deviations from the normal distribution, and our experimental results show that the diffusion model exhibits a certain degree of tolerance towards the distribution of initial images. Our approach achieves state-of-the-art performance in the training-free layout-to-image synthesis task, demonstrating the adaptability of the initial image construction in guiding the content of the generated image. Our code will be made publicly available.

Via

Access Paper or Ask Questions

Guided Image Synthesis via Initial Image Editing in Diffusion Model

May 05, 2023

Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa

Abstract:Diffusion models have the ability to generate high quality images by denoising pure Gaussian noise images. While previous research has primarily focused on improving the control of image generation through adjusting the denoising process, we propose a novel direction of manipulating the initial noise to control the generated image. Through experiments on stable diffusion, we show that blocks of pixels in the initial latent images have a preference for generating specific content, and that modifying these blocks can significantly influence the generated image. In particular, we show that modifying a part of the initial image affects the corresponding region of the generated image while leaving other regions unaffected, which is useful for repainting tasks. Furthermore, we find that the generation preferences of pixel blocks are primarily determined by their values, rather than their position. By moving pixel blocks with a tendency to generate user-desired content to user-specified regions, our approach achieves state-of-the-art performance in layout-to-image generation. Our results highlight the flexibility and power of initial image manipulation in controlling the generated image.

Via

Access Paper or Ask Questions

Training-Free Location-Aware Text-to-Image Synthesis

Apr 26, 2023

Jiafeng Mao, Xueting Wang

Abstract:Current large-scale generative models have impressive efficiency in generating high-quality images based on text prompts. However, they lack the ability to precisely control the size and position of objects in the generated image. In this study, we analyze the generative mechanism of the stable diffusion model and propose a new interactive generation paradigm that allows users to specify the position of generated objects without additional training. Moreover, we propose an object detection-based evaluation metric to assess the control capability of location aware generation task. Our experimental results show that our method outperforms state-of-the-art methods on both control capacity and image quality.

Via

Access Paper or Ask Questions

Noisy Annotation Refinement for Object Detection

Oct 20, 2021

Jiafeng Mao, Qing Yu, Yoko Yamakata, Kiyoharu Aizawa

Figure 1 for Noisy Annotation Refinement for Object Detection

Figure 2 for Noisy Annotation Refinement for Object Detection

Figure 3 for Noisy Annotation Refinement for Object Detection

Figure 4 for Noisy Annotation Refinement for Object Detection

Abstract:Supervised training of object detectors requires well-annotated large-scale datasets, whose production is costly. Therefore, some efforts have been made to obtain annotations in economical ways, such as cloud sourcing. However, datasets obtained by these methods tend to contain noisy annotations such as inaccurate bounding boxes and incorrect class labels. In this study, we propose a new problem setting of training object detectors on datasets with entangled noises of annotations of class labels and bounding boxes. Our proposed method efficiently decouples the entangled noises, corrects the noisy annotations, and subsequently trains the detector using the corrected annotations. We verified the effectiveness of our proposed method and compared it with the baseline on noisy datasets with different noise levels. The experimental results show that our proposed method significantly outperforms the baseline.

Via

Access Paper or Ask Questions