Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Naoto Inoue

Type-R: Automatically Retouching Typos for Text-to-Image Generation

Nov 27, 2024

Wataru Shimoda, Naoto Inoue, Daichi Haraguchi, Hayato Mitani, Seichi Uchida, Kota Yamaguchi

Abstract:While recent text-to-image models can generate photorealistic images from text prompts that reflect detailed instructions, they still face significant challenges in accurately rendering words in the image. In this paper, we propose to retouch erroneous text renderings in the post-processing pipeline. Our approach, called Type-R, identifies typographical errors in the generated image, erases the erroneous text, regenerates text boxes for missing words, and finally corrects typos in the rendered words. Through extensive experiments, we show that Type-R, in combination with the latest text-to-image models such as Stable Diffusion or Flux, achieves the highest text rendering accuracy while maintaining image quality and also outperforms text-focused generation baselines in terms of balancing text accuracy and image quality.

Via

Access Paper or Ask Questions

Can GPTs Evaluate Graphic Design Based on Design Principles?

Oct 11, 2024

Daichi Haraguchi, Naoto Inoue, Wataru Shimoda, Hayato Mitani, Seiichi Uchida, Kota Yamaguchi

Abstract:Recent advancements in foundation models show promising capability in graphic design generation. Several studies have started employing Large Multimodal Models (LMMs) to evaluate graphic designs, assuming that LMMs can properly assess their quality, but it is unclear if the evaluation is reliable. One way to evaluate the quality of graphic design is to assess whether the design adheres to fundamental graphic design principles, which are the designer's common practice. In this paper, we compare the behavior of GPT-based evaluation and heuristic evaluation based on design principles using human annotations collected from 60 subjects. Our experiments reveal that, while GPTs cannot distinguish small details, they have a reasonably good correlation with human annotation and exhibit a similar tendency to heuristic metrics based on design principles, suggesting that they are indeed capable of assessing the quality of graphic design. Our dataset is available at https://cyberagentailab.github.io/Graphic-design-evaluation .

* Accepted to SIGGRAPH Asia 2024 (Technical Communications Track)

Via

Access Paper or Ask Questions

Multimodal Markup Document Models for Graphic Design Completion

Sep 27, 2024

Kotaro Kikuchi, Naoto Inoue, Mayu Otani, Edgar Simo-Serra, Kota Yamaguchi

Figure 1 for Multimodal Markup Document Models for Graphic Design Completion

Figure 2 for Multimodal Markup Document Models for Graphic Design Completion

Figure 3 for Multimodal Markup Document Models for Graphic Design Completion

Figure 4 for Multimodal Markup Document Models for Graphic Design Completion

Abstract:This paper presents multimodal markup document models (MarkupDM) that can generate both markup language and images within interleaved multimodal documents. Unlike existing vision-and-language multimodal models, our MarkupDM tackles unique challenges critical to graphic design tasks: generating partial images that contribute to the overall appearance, often involving transparency and varying sizes, and understanding the syntax and semantics of markup languages, which play a fundamental role as a representational format of graphic designs. To address these challenges, we design an image quantizer to tokenize images of diverse sizes with transparency and modify a code language model to process markup languages and incorporate image modalities. We provide in-depth evaluations of our approach on three graphic design completion tasks: generating missing attribute values, images, and texts in graphic design templates. Results corroborate the effectiveness of our MarkupDM for graphic design tasks. We also discuss the strengths and weaknesses in detail, providing insights for future research on multimodal document generation.

* Project page: https://cyberagentailab.github.io/MarkupDM/

Via

Access Paper or Ask Questions

LTSim: Layout Transportation-based Similarity Measure for Evaluating Layout Generation

Jul 17, 2024

Mayu Otani, Naoto Inoue, Kotaro Kikuchi, Riku Togashi

Figure 1 for LTSim: Layout Transportation-based Similarity Measure for Evaluating Layout Generation

Figure 2 for LTSim: Layout Transportation-based Similarity Measure for Evaluating Layout Generation

Figure 3 for LTSim: Layout Transportation-based Similarity Measure for Evaluating Layout Generation

Figure 4 for LTSim: Layout Transportation-based Similarity Measure for Evaluating Layout Generation

Abstract:We introduce a layout similarity measure designed to evaluate the results of layout generation. While several similarity measures have been proposed in prior research, there has been a lack of comprehensive discussion about their behaviors. Our research uncovers that the majority of these measures are unable to handle various layout differences, primarily due to their dependencies on strict element matching, that is one-by-one matching of elements within the same category. To overcome this limitation, we propose a new similarity measure based on optimal transport, which facilitates a more flexible matching of elements. This approach allows us to quantify the similarity between any two layouts even those sharing no element categories, making our measure highly applicable to a wide range of layout generation tasks. For tasks such as unconditional layout generation, where FID is commonly used, we also extend our measure to deal with collection-level similarities between groups of layouts. The empirical result suggests that our collection-level measure offers more reliable comparisons than existing ones like FID and Max.IoU.

* 26 pages

Via

Access Paper or Ask Questions

OpenCOLE: Towards Reproducible Automatic Graphic Design Generation

Jun 12, 2024

Naoto Inoue, Kento Masui, Wataru Shimoda, Kota Yamaguchi

Abstract:Automatic generation of graphic designs has recently received considerable attention. However, the state-of-the-art approaches are complex and rely on proprietary datasets, which creates reproducibility barriers. In this paper, we propose an open framework for automatic graphic design called OpenCOLE, where we build a modified version of the pioneering COLE and train our model exclusively on publicly available datasets. Based on GPT4V evaluations, our model shows promising performance comparable to the original COLE. We release the pipeline and training results to encourage open development.

* To appear as an extended abstract (EA) in Workshop on Graphic Design Understanding and Generation (in CVPR2024), code: https://github.com/CyberAgentAILab/OpenCOLE

Via

Access Paper or Ask Questions

LayoutFlow: Flow Matching for Layout Generation

Mar 27, 2024

Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui, Mayu Otani, Hideki Nakayama

Abstract:Finding a suitable layout represents a crucial task for diverse applications in graphic design. Motivated by simpler and smoother sampling trajectories, we explore the use of Flow Matching as an alternative to current diffusion-based layout generation models. Specifically, we propose LayoutFlow, an efficient flow-based model capable of generating high-quality layouts. Instead of progressively denoising the elements of a noisy layout, our method learns to gradually move, or flow, the elements of an initial sample until it reaches its final prediction. In addition, we employ a conditioning scheme that allows us to handle various generation tasks with varying degrees of conditioning with a single model. Empirically, LayoutFlow performs on par with state-of-the-art models while being significantly faster.

Via

Access Paper or Ask Questions

Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Nov 22, 2023

Daichi Horita, Naoto Inoue, Kotaro Kikuchi, Kota Yamaguchi, Kiyoharu Aizawa

Figure 1 for Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Figure 2 for Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Figure 3 for Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Figure 4 for Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Abstract:Content-aware graphic layout generation aims to automatically arrange visual elements along with a given content, such as an e-commerce product image. In this paper, we argue that the current layout generation approaches suffer from the limited training data for the high-dimensional layout structure. We show that a simple retrieval augmentation can significantly improve the generation quality. Our model, which is named Retrieval-Augmented Layout Transformer (RALF), retrieves nearest neighbor layout examples based on an input image and feeds these results into an autoregressive generator. Our model can apply retrieval augmentation to various controllable generation tasks and yield high-quality layouts within a unified architecture. Our extensive experiments show that RALF successfully generates content-aware layouts in both constrained and unconstrained settings and significantly outperforms the baselines.

* Webpage: https://udonda.github.io/RALF/

Via

Access Paper or Ask Questions

Towards Flexible Multi-modal Document Models

Mar 31, 2023

Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

Abstract:Creative workflows for generating graphical documents involve complex inter-related tasks, such as aligning elements, choosing appropriate fonts, or employing aesthetically harmonious colors. In this work, we attempt at building a holistic model that can jointly solve many different design tasks. Our model, which we denote by FlexDM, treats vector graphic documents as a set of multi-modal elements, and learns to predict masked fields such as element type, position, styling attributes, image, or text, using a unified architecture. Through the use of explicit multi-task learning and in-domain pre-training, our model can better capture the multi-modal relationships among the different document fields. Experimental results corroborate that our single FlexDM is able to successfully solve a multitude of different design tasks, while achieving performance that is competitive with task-specific and costly baselines.

* To be published in CVPR2023 (highlight), project page: https://cyberagentailab.github.io/flex-dm

Via

Access Paper or Ask Questions

LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

Mar 14, 2023

Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

Figure 1 for LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

Figure 2 for LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

Figure 3 for LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

Figure 4 for LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

Abstract:Controllable layout generation aims at synthesizing plausible arrangement of element bounding boxes with optional constraints, such as type or position of a specific element. In this work, we try to solve a broad range of layout generation tasks in a single model that is based on discrete state-space diffusion models. Our model, named LayoutDM, naturally handles the structured layout data in the discrete representation and learns to progressively infer a noiseless layout from the initial input, where we model the layout corruption process by modality-wise discrete diffusion. For conditional generation, we propose to inject layout constraints in the form of masking or logit adjustment during inference. We show in the experiments that our LayoutDM successfully generates high-quality layouts and outperforms both task-specific and task-agnostic baselines on several layout tasks.

* To be published in CVPR2023, project page: https://cyberagentailab.github.io/layout-dm/

Via

Access Paper or Ask Questions

Generative Colorization of Structured Mobile Web Pages

Dec 22, 2022

Kotaro Kikuchi, Naoto Inoue, Mayu Otani, Edgar Simo-Serra, Kota Yamaguchi

Figure 1 for Generative Colorization of Structured Mobile Web Pages

Figure 2 for Generative Colorization of Structured Mobile Web Pages

Figure 3 for Generative Colorization of Structured Mobile Web Pages

Figure 4 for Generative Colorization of Structured Mobile Web Pages

Abstract:Color is a critical design factor for web pages, affecting important factors such as viewer emotions and the overall trust and satisfaction of a website. Effective coloring requires design knowledge and expertise, but if this process could be automated through data-driven modeling, efficient exploration and alternative workflows would be possible. However, this direction remains underexplored due to the lack of a formalization of the web page colorization problem, datasets, and evaluation protocols. In this work, we propose a new dataset consisting of e-commerce mobile web pages in a tractable format, which are created by simplifying the pages and extracting canonical color styles with a common web browser. The web page colorization problem is then formalized as a task of estimating plausible color styles for a given web page content with a given hierarchical structure of the elements. We present several Transformer-based methods that are adapted to this task by prepending structural message passing to capture hierarchical relationships between elements. Experimental results, including a quantitative evaluation designed for this task, demonstrate the advantages of our methods over statistical and image colorization methods. The code is available at https://github.com/CyberAgentAILab/webcolor.

* Accepted to WACV 2023

Via

Access Paper or Ask Questions