Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jooyeol Yun

Talk to Your Slides: Efficient Slide Editing Agent with Large Language Models

May 16, 2025

Kyudan Jung, Hojun Cho, Jooyeol Yun, Jaehyeok Jang, Jagul Choo

Abstract:Existing research on large language models (LLMs) for PowerPoint predominantly focuses on slide generation, overlooking the common yet tedious task of editing existing slides. We introduce Talk-to-Your-Slides, an LLM-powered agent that directly edits slides within active PowerPoint sessions through COM communication. Our system employs a two-level approach: (1) high-level processing where an LLM agent interprets instructions and formulates editing plans, and (2) low-level execution where Python scripts directly manipulate PowerPoint objects. Unlike previous methods relying on predefined operations, our approach enables more flexible and contextually-aware editing. To facilitate evaluation, we present TSBench, a human-annotated dataset of 379 diverse editing instructions with corresponding slide variations. Experimental results demonstrate that Talk-to-Your-Slides significantly outperforms baseline methods in execution success rate, instruction fidelity, and editing efficiency. Our code and benchmark are available at https://anonymous.4open.science/r/talk-to-your-slides/

* 14 pages, 6 figures

Via

Access Paper or Ask Questions

SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation

Apr 19, 2025

Minho Park, Taewoong Kang, Jooyeol Yun, Sungwon Hwang, Jaegul Choo

Abstract:The increasing demand for AR/VR applications has highlighted the need for high-quality 360-degree panoramic content. However, generating high-quality 360-degree panoramic images and videos remains a challenging task due to the severe distortions introduced by equirectangular projection (ERP). Existing approaches either fine-tune pretrained diffusion models on limited ERP datasets or attempt tuning-free methods that still rely on ERP latent representations, leading to discontinuities near the poles. In this paper, we introduce SphereDiff, a novel approach for seamless 360-degree panoramic image and video generation using state-of-the-art diffusion models without additional tuning. We define a spherical latent representation that ensures uniform distribution across all perspectives, mitigating the distortions inherent in ERP. We extend MultiDiffusion to spherical latent space and propose a spherical latent sampling method to enable direct use of pretrained diffusion models. Moreover, we introduce distortion-aware weighted averaging to further improve the generation quality in the projection process. Our method outperforms existing approaches in generating 360-degree panoramic content while maintaining high fidelity, making it a robust solution for immersive AR/VR applications. The code is available here. https://github.com/pmh9960/SphereDiff

Via

Access Paper or Ask Questions

Enabling Region-Specific Control via Lassos in Point-Based Colorization

Dec 18, 2024

Sanghyeon Lee, Jooyeol Yun, Jaegul Choo

Figure 1 for Enabling Region-Specific Control via Lassos in Point-Based Colorization

Figure 2 for Enabling Region-Specific Control via Lassos in Point-Based Colorization

Figure 3 for Enabling Region-Specific Control via Lassos in Point-Based Colorization

Figure 4 for Enabling Region-Specific Control via Lassos in Point-Based Colorization

Abstract:Point-based interactive colorization techniques allow users to effortlessly colorize grayscale images using user-provided color hints. However, point-based methods often face challenges when different colors are given to semantically similar areas, leading to color intermingling and unsatisfactory results-an issue we refer to as color collapse. The fundamental cause of color collapse is the inadequacy of points for defining the boundaries for each color. To mitigate color collapse, we introduce a lasso tool that can control the scope of each color hint. Additionally, we design a framework that leverages the user-provided lassos to localize the attention masks. The experimental results show that using a single lasso is as effective as applying 4.18 individual color hints and can achieve the desired outcomes in 30% less time than using points alone.

* Accepted to AAAI2025

Via

Access Paper or Ask Questions

Generative Location Modeling for Spatially Aware Object Insertion

Oct 17, 2024

Jooyeol Yun, Davide Abati, Mohamed Omran, Jaegul Choo, Amirhossein Habibian, Auke Wiggers

Figure 1 for Generative Location Modeling for Spatially Aware Object Insertion

Figure 2 for Generative Location Modeling for Spatially Aware Object Insertion

Figure 3 for Generative Location Modeling for Spatially Aware Object Insertion

Figure 4 for Generative Location Modeling for Spatially Aware Object Insertion

Abstract:Generative models have become a powerful tool for image editing tasks, including object insertion. However, these methods often lack spatial awareness, generating objects with unrealistic locations and scales, or unintentionally altering the scene background. A key challenge lies in maintaining visual coherence, which requires both a geometrically suitable object location and a high-quality image edit. In this paper, we focus on the former, creating a location model dedicated to identifying realistic object locations. Specifically, we train an autoregressive model that generates bounding box coordinates, conditioned on the background image and the desired object class. This formulation allows to effectively handle sparse placement annotations and to incorporate implausible locations into a preference dataset by performing direct preference optimization. Our extensive experiments demonstrate that our generative location model, when paired with an inpainting method, substantially outperforms state-of-the-art instruction-tuned models and location modeling baselines in object insertion tasks, delivering accurate and visually coherent results.

Via

Access Paper or Ask Questions

Scaling Up Personalized Aesthetic Assessment via Task Vector Customization

Jul 09, 2024

Jooyeol Yun, Jaegul Choo

Abstract:The task of personalized image aesthetic assessment seeks to tailor aesthetic score prediction models to match individual preferences with just a few user-provided inputs. However, the scalability and generalization capabilities of current approaches are considerably restricted by their reliance on an expensive curated database. To overcome this long-standing scalability challenge, we present a unique approach that leverages readily available databases for general image aesthetic assessment and image quality assessment. Specifically, we view each database as a distinct image score regression task that exhibits varying degrees of personalization potential. By determining optimal combinations of task vectors, known to represent specific traits of each database, we successfully create personalized models for individuals. This approach of integrating multiple models allows us to harness a substantial amount of data. Our extensive experiments demonstrate the effectiveness of our approach in generalizing to previously unseen domains-a challenge previous approaches have struggled to achieve-making it highly applicable to real-world scenarios. Our novel approach significantly advances the field by offering scalable solutions for personalized aesthetic assessment and establishing high standards for future research. https://yeolj00.github.io/personal-projects/personalized-aesthetics/

* ECCV 2024

Via

Access Paper or Ask Questions

Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models

Jun 08, 2024

Minho Park, Sunghyun Park, Jooyeol Yun, Jaegul Choo

Figure 1 for Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models

Figure 2 for Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models

Figure 3 for Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models

Figure 4 for Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models

Abstract:Recent advancements in text-to-image generation have inspired researchers to generate datasets tailored for perception models using generative models, which prove particularly valuable in scenarios where real-world data is limited. In this study, our goal is to address the challenges when fine-tuning vision-language models (e.g., CLIP) on generated datasets. Specifically, we aim to fine-tune vision-language models to a specific classification model without access to any real images, also known as name-only transfer. However, despite the high fidelity of generated images, we observed a significant performance degradation when fine-tuning the model using the generated datasets due to the domain gap between real and generated images. To overcome the domain gap, we provide two regularization methods for training and post-training, respectively. First, we leverage the domain-agnostic knowledge from the original pre-trained vision-language model by conducting the weight-space ensemble of the fine-tuned model on the generated dataset with the original pre-trained model at the post-training. Secondly, we reveal that fine-tuned models with high feature diversity score high performance in the real domain, which indicates that increasing feature diversity prevents learning the generated domain-specific knowledge. Thus, we encourage feature diversity by providing additional regularization at training time. Extensive experiments on various classification datasets and various text-to-image generation models demonstrated that our analysis and regularization techniques effectively mitigate the domain gap, which has long been overlooked, and enable us to achieve state-of-the-art performance by training with generated images. Code is available at https://github.com/pmh9960/regft-for-gen

* Preprint. Under review

Via

Access Paper or Ask Questions

Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis

Aug 16, 2023

Minho Park, Jooyeol Yun, Seunghwan Choi, Jaegul Choo

Abstract:Existing text-to-image generation approaches have set high standards for photorealism and text-image correspondence, largely benefiting from web-scale text-image datasets, which can include up to 5~billion pairs. However, text-to-image generation models trained on domain-specific datasets, such as urban scenes, medical images, and faces, still suffer from low text-image correspondence due to the lack of text-image pairs. Additionally, collecting billions of text-image pairs for a specific domain can be time-consuming and costly. Thus, ensuring high text-image correspondence without relying on web-scale text-image datasets remains a challenging task. In this paper, we present a novel approach for enhancing text-image correspondence by leveraging available semantic layouts. Specifically, we propose a Gaussian-categorical diffusion process that simultaneously generates both images and corresponding layout pairs. Our experiments reveal that we can guide text-to-image generation models to be aware of the semantics of different image regions, by training the model to generate semantic labels for each pixel. We demonstrate that our approach achieves higher text-image correspondence compared to existing text-to-image generation approaches in the Multi-Modal CelebA-HQ and the Cityscapes dataset, where text-image pairs are scarce. Codes are available in this https://pmh9960.github.io/research/GCDP

* Accepted to ICCV 2023

Via

Access Paper or Ask Questions

iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer

Jul 15, 2022

Sanghyeon Lee, Jooyeol Yun, Minho Park, Jaegul Choo

Figure 1 for iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer

Figure 2 for iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer

Figure 3 for iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer

Figure 4 for iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer

Abstract:Point-interactive image colorization aims to colorize grayscale images when a user provides the colors for specific locations. It is essential for point-interactive colorization methods to appropriately propagate user-provided colors (i.e., user hints) in the entire image to obtain a reasonably colorized image with minimal user effort. However, existing approaches often produce partially colorized results due to the inefficient design of stacking convolutional layers to propagate hints to distant relevant regions. To address this problem, we present iColoriT, a novel point-interactive colorization Vision Transformer capable of propagating user hints to relevant regions, leveraging the global receptive field of Transformers. The self-attention mechanism of Transformers enables iColoriT to selectively colorize relevant regions with only a few local hints. Our approach colorizes images in real-time by utilizing pixel shuffling, an efficient upsampling technique that replaces the decoder architecture. Also, in order to mitigate the artifacts caused by pixel shuffling with large upsampling ratios, we present the local stabilizing layer. Extensive quantitative and qualitative results demonstrate that our approach highly outperforms existing methods for point-interactive colorization, producing accurately colorized images with a user's minimal effort.

Via

Access Paper or Ask Questions

Improving Face Recognition with Large Age Gaps by Learning to Distinguish Children

Oct 22, 2021

Jungsoo Lee, Jooyeol Yun, Sunghyun Park, Yonggyu Kim, Jaegul Choo

Figure 1 for Improving Face Recognition with Large Age Gaps by Learning to Distinguish Children

Figure 2 for Improving Face Recognition with Large Age Gaps by Learning to Distinguish Children

Figure 3 for Improving Face Recognition with Large Age Gaps by Learning to Distinguish Children

Figure 4 for Improving Face Recognition with Large Age Gaps by Learning to Distinguish Children

Abstract:Despite the unprecedented improvement of face recognition, existing face recognition models still show considerably low performances in determining whether a pair of child and adult images belong to the same identity. Previous approaches mainly focused on increasing the similarity between child and adult images of a given identity to overcome the discrepancy of facial appearances due to aging. However, we observe that reducing the similarity between child images of different identities is crucial for learning distinct features among children and thus improving face recognition performance in child-adult pairs. Based on this intuition, we propose a novel loss function called the Inter-Prototype loss which minimizes the similarity between child images. Unlike the previous studies, the Inter-Prototype loss does not require additional child images or training additional learnable parameters. Our extensive experiments and in-depth analyses show that our approach outperforms existing baselines in face recognition with child-adult pairs. Our code and newly-constructed test sets of child-adult pairs are available at https://github.com/leebebeto/Inter-Prototype.

* Accepted to BMVC 2021

Via

Access Paper or Ask Questions

The New Approach on Fuzzy Decision Trees

Aug 13, 2014

Jooyeol Yun, Jun won Seo, Taeseon Yoon

Abstract:Decision trees have been widely used in machine learning. However, due to some reasons, data collecting in real world contains a fuzzy and uncertain form. The decision tree should be able to handle such fuzzy data. This paper presents a method to construct fuzzy decision tree. It proposes a fuzzy decision tree induction method in iris flower data set, obtaining the entropy from the distance between an average value and a particular value. It also presents an experiment result that shows the accuracy compared to former ID3.

* Jooyeol Yun, Jun won Seo, and Taeseon Yoon (2014) THE NEW APPROACH ON FUZZY DECISION TREES International Journal of Fuzzy Logic Systems (IJFLS) Vol.4, No.3, July 2014

Via

Access Paper or Ask Questions