Abstract:Modern image generation systems can produce high-quality visuals, yet user prompts often contain ambiguities, requiring multiple revisions. Existing methods struggle to address the nuanced needs of non-expert users. We propose Visual Co-Adaptation (VCA), a novel framework that iteratively refines prompts and aligns generated images with user preferences. VCA employs a fine-tuned language model with reinforcement learning and multi-turn dialogues for prompt disambiguation. Key components include the Incremental Context-Enhanced Dialogue Block for interactive clarification, the Semantic Exploration and Disambiguation Module (SESD) leveraging Retrieval-Augmented Generation (RAG) and CLIP scoring, and the Pixel Precision and Consistency Optimization Module (PPCO) for refining image details using Proximal Policy Optimization (PPO). A human-in-the-loop feedback mechanism further improves performance. Experiments show that VCA surpasses models like DALL-E 3 and Stable Diffusion, reducing dialogue rounds to 4.3, achieving a CLIP score of 0.92, and enhancing user satisfaction to 4.73/5. Additionally, we introduce a novel multi-round dialogue dataset with prompt-image pairs and user intent annotations.
Abstract:Recent advancements in text-to-image (T2I) generation using diffusion models have enabled cost-effective video-editing applications by leveraging pre-trained models, eliminating the need for resource-intensive training. However, the frame-independence of T2I generation often results in poor temporal consistency. Existing methods address this issue through temporal layer fine-tuning or inference-based temporal propagation, but these approaches suffer from high training costs or limited temporal coherence. To address these challenges, we propose a General and Efficient Adapter (GE-Adapter) that integrates temporal-spatial and semantic consistency with Baliteral DDIM inversion. This framework introduces three key components: (1) Frame-based Temporal Consistency Blocks (FTC Blocks) to capture frame-specific features and enforce smooth inter-frame transitions via temporally-aware loss functions; (2) Channel-dependent Spatial Consistency Blocks (SCD Blocks) employing bilateral filters to enhance spatial coherence by reducing noise and artifacts; and (3) Token-based Semantic Consistency Module (TSC Module) to maintain semantic alignment using shared prompt tokens and frame-specific tokens. Our method significantly improves perceptual quality, text-image alignment, and temporal coherence, as demonstrated on the MSR-VTT dataset. Additionally, it achieves enhanced fidelity and frame-to-frame coherence, offering a practical solution for T2V editing.
Abstract:Ensuring safe, comfortable, and efficient navigation is a critical goal for autonomous driving systems. While end-to-end models trained on large-scale datasets excel in common driving scenarios, they often struggle with rare, long-tail events. Recent progress in large language models (LLMs) has introduced enhanced reasoning capabilities, but their computational demands pose challenges for real-time decision-making and precise planning. This paper presents FASIONAD, a novel dual-system framework inspired by the cognitive model "Thinking, Fast and Slow." The fast system handles routine navigation tasks using rapid, data-driven path planning, while the slow system focuses on complex reasoning and decision-making in challenging or unfamiliar situations. A dynamic switching mechanism based on score distribution and feedback allows seamless transitions between the two systems. Visual prompts generated by the fast system enable human-like reasoning in the slow system, which provides high-quality feedback to enhance the fast system's decision-making. To evaluate FASIONAD, we introduce a new benchmark derived from the nuScenes dataset, specifically designed to differentiate fast and slow scenarios. FASIONAD achieves state-of-the-art performance on this benchmark, establishing a new standard for frameworks integrating fast and slow cognitive processes in autonomous driving. This approach paves the way for more adaptive, human-like autonomous driving systems.
Abstract:Recently, large language models (LLMs) have achieved significant progress in automated code generation. Despite their strong instruction-following capabilities, these models frequently struggled to align with user intent in coding scenarios. In particular, they were hampered by datasets that lacked diversity and failed to address specialized tasks or edge cases. Furthermore, challenges in supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) led to failures in generating precise, human-intent-aligned code. To tackle these challenges and improve the code generation performance for automated programming systems, we propose Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization (i.e., FALCON). FALCON is structured into two hierarchical levels. From the global level, long-term memory improves code quality by retaining and applying learned knowledge. At the local level, short-term memory allows for the incorporation of immediate feedback from compilers and AI systems. Additionally, we introduce meta-reinforcement learning with feedback rewards to solve the global-local bi-level optimization problem and enhance the model's adaptability across diverse code generation tasks. Extensive experiments demonstrate that our technique achieves state-of-the-art performance, leading other reinforcement learning methods by more than 4.5 percentage points on the MBPP benchmark and 6.1 percentage points on the Humaneval benchmark. The open-sourced code is publicly available at https://github.com/titurte/FALCON.
Abstract:With the increasing complexity of modern power systems, conventional dynamic load modeling with ZIP and induction motors (ZIP + IM) is no longer adequate to address the current load characteristic transitions. In recent years, the WECC composite load model (WECC CLM) has shown to effectively capture the dynamic load responses over traditional load models in various stability studies and contingency analyses. However, a detailed WECC CLM model typically has a high degree of complexity, with over one hundred parameters, and no systematic approach to identifying and calibrating these parameters. Enabled by the wide deployment of PMUs and advanced deep learning algorithms, proposed here is a double deep Q-learning network (DDQN)-based, two-stage load modeling framework for the WECC CLM. This two-stage method decomposes the complicated WECC CLM for more efficient identification and does not require explicit model details. In the first stage, the DDQN agent determines an accurate load composition. In the second stage, the parameters of the WECC CLM are selected from a group of Monte-Carlo simulations. The set of selected load parameters is expected to best approximate the true transient responses. The proposed framework is verified using an IEEE 39-bus test system on commercial simulation platforms.
Abstract:Creating and modeling real-world graphs is a crucial problem in various applications of engineering, biology, and social sciences; however, learning the distributions of nodes/edges and sampling from them to generate realistic graphs is still challenging. Moreover, generating a diverse set of synthetic graphs that all imitate a real network is not addressed. In this paper, the novel problem of creating diverse synthetic graphs is solved. First, we devise the deep supervised subset selection (DeepS3) algorithm; Given a ground-truth set of data points, DeepS3 selects a diverse subset of all items (i.e. data points) that best represent the items in the ground-truth set. Furthermore, we propose the deep graph representation recurrent network (GRRN) as a novel generative model that learns a probabilistic representation of a real weighted graph. Training the GRRN, we generate a large set of synthetic graphs that are likely to follow the same features and adjacency patterns as the original one. Incorporating GRRN with DeepS3, we select a diverse subset of generated graphs that best represent the behaviors of the real graph (i.e. our ground-truth). We apply our model to the novel problem of power grid synthesis, where a synthetic power network is created with the same physical/geometric properties as a real power system without revealing the real locations of the substations (nodes) and the lines (edges), since such data is confidential. Experiments on the Synthetic Power Grid Data Set show accurate synthetic networks that follow similar structural and spatial properties as the real power grid.
Abstract:Machine Learning on graph-structured data is an important and omnipresent task for a vast variety of applications including anomaly detection and dynamic network analysis. In this paper, a deep generative model is introduced to capture continuous probability densities corresponding to the nodes of an arbitrary graph. In contrast to all learning formulations in the area of discriminative pattern recognition, we propose a scalable generative optimization/algorithm theoretically proved to capture distributions at the nodes of a graph. Our model is able to generate samples from the probability densities learned at each node. This probabilistic data generation model, i.e. convolutional graph auto-encoder (CGAE), is devised based on the localized first-order approximation of spectral graph convolutions, deep learning, and the variational Bayesian inference. We apply our CGAE to a new problem, the spatio-temporal probabilistic solar irradiance prediction. Multiple solar radiation measurement sites in a wide area in northern states of the US are modeled as an undirected graph. Using our proposed model, the distribution of future irradiance given historical radiation observations is estimated for every site/node. Numerical results on the National Solar Radiation Database show state-of-the-art performance for probabilistic radiation prediction on geographically distributed irradiance data in terms of reliability, sharpness, and continuous ranked probability score.
Abstract:This paper addresses the energy disaggregation problem, i.e. decomposing the electricity signal of a whole home to its operating devices. First, we cast the problem as a dictionary learning (DL) problem where the key electricity patterns representing consumption behaviors are extracted for each device and stored in a dictionary matrix. The electricity signal of each device is then modeled by a linear combination of such patterns with sparse coefficients that determine the contribution of each device in the total electricity. Although popular, the classic DL approach is prone to high error in real-world applications including energy disaggregation, as it merely finds linear dictionaries. Moreover, this method lacks a recurrent structure; thus, it is unable to leverage the temporal structure of energy signals. Motivated by such shortcomings, we propose a novel optimization program where the dictionary and its sparse coefficients are optimized simultaneously with a deep neural model extracting powerful nonlinear features from the energy signals. A long short-term memory auto-encoder (LSTM-AE) is proposed with tunable time dependent states to capture the temporal behavior of energy signals for each device. We learn the dictionary in the space of temporal features captured by the LSTM-AE rather than the original space of the energy signals; hence, in contrast to the traditional DL, here, a nonlinear dictionary is learned using powerful temporal features extracted from our deep model. Real experiments on the publicly available Reference Energy Disaggregation Dataset (REDD) show significant improvement compared to the state-of-the-art methodologies in terms of the disaggregation accuracy and F-score metrics.
Abstract:Short-term probabilistic wind power forecasting can provide critical quantified uncertainty information of wind generation for power system operation and control. As the complicated characteristics of wind power prediction error, it would be difficult to develop a universal forecasting model dominating over other alternative models. Therefore, a novel multi-model combination (MMC) approach for short-term probabilistic wind generation forecasting is proposed in this paper to exploit the advantages of different forecasting models. The proposed approach can combine different forecasting models those provide different kinds of probability density functions to improve the probabilistic forecast accuracy. Three probabilistic forecasting models based on the sparse Bayesian learning, kernel density estimation and beta distribution fitting are used to form the combined model. The parameters of the MMC model are solved based on Bayesian framework. Numerical tests illustrate the effectiveness of the proposed MMC approach.