Chih-Yu Wang

Gen-n-Val: Agentic Image Data Generation and Validation

Jun 05, 2025

Jing-En Huang, I-Sheng Fang, Tzuhsuan Huang, Chih-Yu Wang, Jun-Cheng Chen

Abstract: Recently, Large Language Models (LLMs) and Vision Large Language Models (VLLMs) have demonstrated impressive performance as agents across various tasks, while data scarcity and label noise remain significant challenges in computer vision tasks such as object detection and instance segmentation. A common solution is to generate synthetic data. However, current synthetic data generation methods struggle with issues such as multiple objects per mask, inaccurate segmentation, and incorrect category labels, which limit their effectiveness. To address these issues, we introduce Gen-n-Val, a novel agentic data generation framework that leverages Layer Diffusion (LD), LLMs, and VLLMs to produce high-quality, single-object masks and diverse backgrounds. Gen-n-Val consists of two agents: (1) The LD prompt agent, an LLM, optimizes prompts for LD to generate high-quality foreground instance images and segmentation masks. These optimized prompts ensure the generation of single-object synthetic data with precise instance masks and clean backgrounds. (2) The data validation agent, a VLLM, filters out low-quality synthetic instance images. The system prompts for both agents are refined through TextGrad. Additionally, we use image harmonization to combine multiple instances within scenes. Compared to state-of-the-art synthetic data approaches like MosaicFusion, our approach reduces invalid synthetic data from 50% to 7% and improves performance by 1% mAP on rare classes in COCO instance segmentation with YOLOv9c and YOLO11m. Furthermore, Gen-n-Val shows significant improvements (7.1% mAP) over YOLO-Worldv2-M in open-vocabulary object detection benchmarks with YOLO11m. Moreover, Gen-n-Val improves the performance of the YOLOv9 and YOLO11 families in instance segmentation and object detection.
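
The generate-then-validate loop described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Candidate` structure, the function names, and the callables standing in for the LD prompt agent, Layer Diffusion, and the VLLM validator are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """A synthetic foreground instance (hypothetical structure for illustration)."""
    category: str
    prompt: str
    num_objects: int     # number of objects covered by the mask
    label_correct: bool  # whether the category label matches the image


def generate_and_validate(
    categories: List[str],
    make_prompt: Callable[[str], str],            # stands in for the LD prompt agent
    generate: Callable[[str, str], Candidate],    # stands in for Layer Diffusion
    is_valid: Callable[[Candidate], bool],        # stands in for the validation agent
    per_category: int,
) -> Tuple[List[Candidate], float]:
    """Generate candidates per category, keep the validated ones,
    and return them together with the fraction that was rejected."""
    kept: List[Candidate] = []
    total = 0
    for category in categories:
        prompt = make_prompt(category)
        for _ in range(per_category):
            candidate = generate(category, prompt)
            total += 1
            if is_valid(candidate):
                kept.append(candidate)
    rejected = (total - len(kept)) / total if total else 0.0
    return kept, rejected


def single_object_with_correct_label(candidate: Candidate) -> bool:
    """Toy validity rule mirroring the failure modes named in the abstract."""
    return candidate.num_objects == 1 and candidate.label_correct
```

In practice the three callables would wrap model calls; keeping them as plain function arguments makes the filtering logic testable without any model.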

Swapped Logit Distillation via Bi-level Teacher Alignment

Apr 27, 2025

Stephen Ekaputra Limantoro, Jhe-Hao Lin, Chih-Yu Wang, Yi-Lung Tsai, Hong-Han Shuai, Ching-Chun Huang, Wen-Huang Cheng

Abstract: Knowledge distillation (KD) compresses network capacity by transferring knowledge from a large (teacher) network to a smaller (student) one. In the mainstream approach, the teacher transfers knowledge directly to the student using its original distribution, which can lead to incorrect predictions. In this article, we propose a logit-based distillation method via swapped logit processing, namely Swapped Logit Distillation (SLD). SLD is proposed under two assumptions: (1) a wrong prediction occurs when the confidence of the prediction label is not the maximum; (2) the "natural" limit of probability remains uncertain, as the best value to add to the target cannot be determined. To address these issues, we propose a swapped logit processing scheme. Through this approach, we find that the swap method can be effectively extended to both teacher and student outputs, yielding two teachers. We further introduce loss scheduling to boost the alignment of the two teachers. Extensive experiments on image classification tasks demonstrate that SLD consistently outperforms previous state-of-the-art methods.
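
As a rough illustration of the idea in this abstract, the sketch below pairs the standard temperature-scaled distillation loss with a logit swap. It assumes one plausible reading of "swapped logit processing": the target-class logit is exchanged with the largest logit so that the target becomes the top prediction. The function names, the temperature value, and the unweighted sum of the two teacher terms are assumptions for illustration; the abstract does not specify how the terms are combined or scheduled.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def swap_to_target(logits: List[float], target: int) -> List[float]:
    """Exchange the target logit with the largest logit (assumed reading of
    the swap), so the target class becomes the argmax. No-op if it already is."""
    swapped = list(logits)
    top = max(range(len(swapped)), key=swapped.__getitem__)
    swapped[top], swapped[target] = swapped[target], swapped[top]
    return swapped


def kd_loss(teacher_logits: List[float], student_logits: List[float],
            temperature: float = 4.0) -> float:
    """Standard KD term: T^2 * KL(teacher || student) on softened distributions."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


def swapped_two_teacher_loss(teacher_logits: List[float],
                             student_logits: List[float],
                             target: int, temperature: float = 4.0) -> float:
    """Illustrative sum of two terms: the swapped teacher and the swapped
    student each act as a teacher for the raw student output (assumption)."""
    from_teacher = kd_loss(swap_to_target(teacher_logits, target),
                           student_logits, temperature)
    from_student = kd_loss(swap_to_target(student_logits, target),
                           student_logits, temperature)
    return from_teacher + from_student
```

When the student already ranks the target class first, the second term vanishes, since the swapped student output equals the raw one.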

* Accepted to Multimedia Systems 2025

StyleDiT: A Unified Framework for Diverse Child and Partner Faces Synthesis with Style Latent Diffusion Transformer

Dec 14, 2024

Pin-Yen Chiu, Dai-Jie Wu, Po-Hsun Chu, Chia-Hsuan Hsu, Hsiang-Chen Chiu, Chih-Yu Wang, Jun-Cheng Chen

Abstract: Kinship face synthesis is a challenging problem due to the scarcity and low quality of the available kinship data. Existing methods often struggle to generate descendants with both high diversity and fidelity while precisely controlling facial attributes such as age and gender. To address these issues, we propose the Style Latent Diffusion Transformer (StyleDiT), a novel framework that integrates the strengths of StyleGAN with the diffusion model to generate high-quality and diverse kinship faces. In this framework, the rich facial priors of StyleGAN enable fine-grained attribute control, while our conditional diffusion model, which is well suited to modeling the complex distribution of kinship relationships, samples a StyleGAN latent aligned with the kinship relationship of the conditioning images. StyleGAN then decodes the latent into the final face. Additionally, we introduce the Relational Trait Guidance (RTG) mechanism, enabling independent control of influencing conditions, such as each parent's facial image. RTG also enables fine-grained adjustment of the trade-off between diversity and fidelity in synthesized faces. Furthermore, we extend the application to an unexplored domain: predicting a partner's facial images using a child's image and one parent's image within the same framework. Extensive experiments demonstrate that our StyleDiT outperforms existing methods by striking an excellent balance between generating diverse and high-fidelity kinship faces.

SynHIN: Generating Synthetic Heterogeneous Information Network for Explainable AI

Jan 07, 2024

Ming-Yi Hong, Yi-Hsiang Huang, You-Chen Teng, Chih-Yu Wang, Che Lin

Abstract: Graph Neural Networks (GNNs) excel in various domains, from detecting e-commerce spam to social network classification problems. However, the lack of public graph datasets hampers research progress, particularly in heterogeneous information networks (HIN). The demand for datasets for fair HIN comparisons is growing due to advancements in GNN interpretation models. In response, we propose SynHIN, a unique method for generating synthetic heterogeneous information networks. SynHIN identifies motifs in real-world datasets, summarizes graph statistics, and constructs a synthetic network. Our approach utilizes In-Cluster and Out-Cluster Merge modules to build the synthetic HIN from primary motif clusters. After the In/Out-Cluster merges and a post-pruning process that fits the real dataset constraints, we ensure the synthetic graph statistics align closely with those of the reference graph. SynHIN generates a synthetic heterogeneous graph dataset for node classification tasks, using the primary motif as the explanation ground truth. It can adapt and address the lack of heterogeneous graph datasets and motif ground truths, proving beneficial for assessing heterogeneous graph neural network explainers. We further present a benchmark dataset for future heterogeneous graph explainer model research. Our work marks a significant step towards explainable AI in HGNNs.

A GAN Approach for Node Embedding in Heterogeneous Graphs Using Subgraph Sampling

Dec 11, 2023

Hung Chun Hsu, Bo-Jun Wu, Ming-Yi Hong, Che Lin, Chih-Yu Wang

Abstract: Our research addresses class imbalance issues in heterogeneous graphs using graph neural networks (GNNs). We propose a novel method combining the strengths of Generative Adversarial Networks (GANs) with GNNs, creating synthetic nodes and edges that effectively balance the dataset. This approach directly targets and rectifies imbalances at the data level. The proposed framework addresses the neglect of graph structures during data generation and creates synthetic structures usable with GNN-based classifiers in downstream tasks. It processes node and edge information concurrently, improving edge balance through node augmentation and subgraph sampling. Additionally, our framework integrates a threshold strategy that helps determine optimal edge thresholds during training without time-consuming parameter adjustments. Experiments on the Amazon and Yelp Review datasets highlight the effectiveness of the proposed framework, especially in minority node identification, where it consistently outperforms baseline models across key performance metrics.

Chinese Restaurant Game - Part I: Theory of Learning with Negative Network Externality

Feb 13, 2012

Chih-Yu Wang, Yan Chen, K. J. Ray Liu

Abstract: In a social network, agents are intelligent and have the capability to make decisions to maximize their utilities. They can either make wise decisions by taking advantage of other agents' experiences through learning, or make decisions earlier to avoid competition from large crowds. Both effects, social learning and negative network externality, play important roles in the decision process of an agent. While there are existing works on either social learning or negative network externality, general studies that consider both of these opposing effects remain limited. We find that the Chinese restaurant process, a popular random process, provides a well-defined structure to model the decision process of an agent under these two effects. By introducing strategic behavior into the non-strategic Chinese restaurant process, in Part I of this two-part paper, we propose a new game, called the Chinese Restaurant Game, to formulate the social learning problem with negative network externality. By analyzing the proposed Chinese restaurant game, we derive the optimal strategy of each agent and provide a recursive method to achieve the optimal strategy. How social learning and negative network externality influence each other under various settings is also studied through simulations.
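
For readers unfamiliar with the non-strategic Chinese restaurant process that this abstract builds on, a minimal simulation is sketched below. It implements only the standard random process, not the game proposed in the paper: customer n+1 joins an existing table k with probability n_k / (n + alpha) and opens a new table with probability alpha / (n + alpha). The function name and the `seed` parameter are illustrative choices.

```python
import random
from typing import List


def chinese_restaurant_process(num_customers: int, alpha: float,
                               seed: int = 0) -> List[int]:
    """Seat customers one by one and return the resulting table sizes.

    With n customers already seated, the next customer joins table k with
    probability n_k / (n + alpha), or a new table with probability
    alpha / (n + alpha).
    """
    rng = random.Random(seed)
    tables: List[int] = []
    for n in range(num_customers):
        # Draw a point in [0, n + alpha) and walk the cumulative table sizes.
        r = rng.random() * (n + alpha)
        cumulative = 0.0
        for k, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[k] += 1
                break
        else:
            # The point fell in the last alpha-wide slice: open a new table.
            tables.append(1)
    return tables
```

Larger values of `alpha` produce more tables on average. In this process customers do not choose where to sit, whereas the paper introduces strategic behavior so that agents weigh the crowding at a table against what they learn from earlier arrivals.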

Chinese Restaurant Game - Part II: Applications to Wireless Networking, Cloud Computing, and Online Social Networking

Dec 15, 2011

Chih-Yu Wang, Yan Chen, K. J. Ray Liu

Abstract: In Part I of this two-part paper [1], we proposed a new game, called the Chinese restaurant game, to analyze the social learning problem with negative network externality. The best responses of agents in the Chinese restaurant game with imperfect signals are constructed through a recursive method, and the influence of both learning and network externality on the utilities of agents is studied. In Part II of this two-part paper, we illustrate three applications of the Chinese restaurant game in wireless networking, cloud computing, and online social networking. For each application, we formulate the corresponding problem as a Chinese restaurant game and analyze how agents learn and make strategic decisions in the problem. The proposed method is compared with four common-sense methods in terms of agents' utilities and the overall system performance through simulations. We find that the proposed Chinese restaurant game theoretic approach indeed helps agents make better decisions and improves the overall system performance. Furthermore, agents with different decision orders have different advantages in terms of their utilities, which also verifies the conclusions drawn in Part I of this two-part paper.
