Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qisai Liu

Bidirectional Linear Recurrent Models for Sequence-Level Multisource Fusion

Apr 11, 2025

Qisai Liu, Zhanhong Jiang, Joshua R. Waite, Chao Liu, Aditya Balu, Soumik Sarkar

Abstract:Sequence modeling is a critical yet challenging task with wide-ranging applications, especially in time series forecasting for domains like weather prediction, temperature monitoring, and energy load forecasting. Transformers, with their attention mechanism, have emerged as state-of-the-art due to their efficient parallel training, but they suffer from quadratic time complexity, limiting their scalability for long sequences. In contrast, recurrent neural networks (RNNs) offer linear time complexity, spurring renewed interest in linear RNNs for more computationally efficient sequence modeling. In this work, we introduce BLUR (Bidirectional Linear Unit for Recurrent network), which uses forward and backward linear recurrent units (LRUs) to capture both past and future dependencies with high computational efficiency. BLUR maintains the linear time complexity of traditional RNNs, while enabling fast parallel training through LRUs. Furthermore, it offers provably stable training and strong approximation capabilities, making it highly effective for modeling long-term dependencies. Extensive experiments on sequential image and time series datasets reveal that BLUR not only surpasses transformers and traditional RNNs in accuracy but also significantly reduces computational costs, making it particularly suitable for real-world forecasting tasks. Our code is available here.

Via

Access Paper or Ask Questions

Enhancing PPO with Trajectory-Aware Hybrid Policies

Feb 21, 2025

Qisai Liu, Zhanhong Jiang, Hsin-Jung Yang, Mahsa Khosravi, Joshua R. Waite, Soumik Sarkar

Figure 1 for Enhancing PPO with Trajectory-Aware Hybrid Policies

Figure 2 for Enhancing PPO with Trajectory-Aware Hybrid Policies

Figure 3 for Enhancing PPO with Trajectory-Aware Hybrid Policies

Figure 4 for Enhancing PPO with Trajectory-Aware Hybrid Policies

Abstract:Proximal policy optimization (PPO) is one of the most popular state-of-the-art on-policy algorithms that has become a standard baseline in modern reinforcement learning with applications in numerous fields. Though it delivers stable performance with theoretical policy improvement guarantees, high variance, and high sample complexity still remain critical challenges in on-policy algorithms. To alleviate these issues, we propose Hybrid-Policy Proximal Policy Optimization (HP3O), which utilizes a trajectory replay buffer to make efficient use of trajectories generated by recent policies. Particularly, the buffer applies the "first in, first out" (FIFO) strategy so as to keep only the recent trajectories to attenuate the data distribution drift. A batch consisting of the trajectory with the best return and other randomly sampled ones from the buffer is used for updating the policy networks. The strategy helps the agent to improve its capability on top of the most recent best performance and in turn reduce variance empirically. We theoretically construct the policy improvement guarantees for the proposed algorithm. HP3O is validated and compared against several baseline algorithms using multiple continuous control environments. Our code is available here.

Via

Access Paper or Ask Questions

RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception

Jan 31, 2025

Joshua R. Waite, Md. Zahid Hasan, Qisai Liu, Zhanhong Jiang, Chinmay Hegde, Soumik Sarkar

Abstract:Vision-language model (VLM) fine-tuning for application-specific visual grounding based on natural language instructions has become one of the most popular approaches for learning-enabled autonomous systems. However, such fine-tuning relies heavily on high-quality datasets to achieve successful performance in various downstream tasks. Additionally, VLMs often encounter limitations due to insufficient and imbalanced fine-tuning data. To address these issues, we propose a new generalizable framework to improve VLM fine-tuning by integrating it with a reinforcement learning (RL) agent. Our method utilizes the RL agent to manipulate objects within an indoor setting to create synthetic data for fine-tuning to address certain vulnerabilities of the VLM. Specifically, we use the performance of the VLM to provide feedback to the RL agent to generate informative data that efficiently fine-tune the VLM over the targeted task (e.g. spatial reasoning). The key contribution of this work is developing a framework where the RL agent serves as an informative data sampling tool and assists the VLM in order to enhance performance and address task-specific vulnerabilities. By targeting the data sampling process to address the weaknesses of the VLM, we can effectively train a more context-aware model. In addition, generating synthetic data allows us to have precise control over each scene and generate granular ground truth captions. Our results show that the proposed data generation approach improves the spatial reasoning performance of VLMs, which demonstrates the benefits of using RL-guided data generation in vision-language tasks.

* ICCPS 2025 accepted paper, 10 pages, 9 figures

Via

Access Paper or Ask Questions

Incorporating System-level Safety Requirements in Perception Models via Reinforcement Learning

Dec 04, 2024

Weisi Fan, Jesse Lane, Qisai Liu, Soumik Sarkar, Tichakorn Wongpiromsarn

Figure 1 for Incorporating System-level Safety Requirements in Perception Models via Reinforcement Learning

Figure 2 for Incorporating System-level Safety Requirements in Perception Models via Reinforcement Learning

Figure 3 for Incorporating System-level Safety Requirements in Perception Models via Reinforcement Learning

Figure 4 for Incorporating System-level Safety Requirements in Perception Models via Reinforcement Learning

Abstract:Perception components in autonomous systems are often developed and optimized independently of downstream decision-making and control components, relying on established performance metrics like accuracy, precision, and recall. Traditional loss functions, such as cross-entropy loss and negative log-likelihood, focus on reducing misclassification errors but fail to consider their impact on system-level safety, overlooking the varying severities of system-level failures caused by these errors. To address this limitation, we propose a novel training paradigm that augments the perception component with an understanding of system-level safety objectives. Central to our approach is the translation of system-level safety requirements, formally specified using the rulebook formalism, into safety scores. These scores are then incorporated into the reward function of a reinforcement learning framework for fine-tuning perception models with system-level safety objectives. Simulation results demonstrate that models trained with this approach outperform baseline perception models in terms of system-level safety.

Via

Access Paper or Ask Questions

A Deep Learning Approach to Detect Lean Blowout in Combustion Systems

Aug 03, 2022

Tryambak Gangopadhyay, Somnath De, Qisai Liu, Achintya Mukhopadhyay, Swarnendu Sen, Soumik Sarkar

Figure 1 for A Deep Learning Approach to Detect Lean Blowout in Combustion Systems

Figure 2 for A Deep Learning Approach to Detect Lean Blowout in Combustion Systems

Figure 3 for A Deep Learning Approach to Detect Lean Blowout in Combustion Systems

Figure 4 for A Deep Learning Approach to Detect Lean Blowout in Combustion Systems

Abstract:Lean combustion is environment friendly with low NOx emissions and also provides better fuel efficiency in a combustion system. However, approaching towards lean combustion can make engines more susceptible to lean blowout. Lean blowout (LBO) is an undesirable phenomenon that can cause sudden flame extinction leading to sudden loss of power. During the design stage, it is quite challenging for the scientists to accurately determine the optimal operating limits to avoid sudden LBO occurrence. Therefore, it is crucial to develop accurate and computationally tractable frameworks for online LBO detection in low NOx emission engines. To the best of our knowledge, for the first time, we propose a deep learning approach to detect lean blowout in combustion systems. In this work, we utilize a laboratory-scale combustor to collect data for different protocols. We start far from LBO for each protocol and gradually move towards the LBO regime, capturing a quasi-static time series dataset at each condition. Using one of the protocols in our dataset as the reference protocol and with conditions annotated by domain experts, we find a transition state metric for our trained deep learning model to detect LBO in the other test protocols. We find that our proposed approach is more accurate and computationally faster than other baseline models to detect the transitions to LBO. Therefore, we recommend this method for real-time performance monitoring in lean combustion engines.

Via

Access Paper or Ask Questions

A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment

Jan 19, 2021

Homagni Saha, Fateme Fotouhif, Qisai Liu, Soumik Sarkar

Figure 1 for A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment

Figure 2 for A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment

Figure 3 for A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment

Figure 4 for A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment

Abstract:In this paper we propose a new framework - MoViLan (Modular Vision and Language) for execution of visually grounded natural language instructions for day to day indoor household tasks. While several data-driven, end-to-end learning frameworks have been proposed for targeted navigation tasks based on the vision and language modalities, performance on recent benchmark data sets revealed the gap in developing comprehensive techniques for long horizon, compositional tasks (involving manipulation and navigation) with diverse object categories, realistic instructions and visual scenarios with non-reversible state changes. We propose a modular approach to deal with the combined navigation and object interaction problem without the need for strictly aligned vision and language training data (e.g., in the form of expert demonstrated trajectories). Such an approach is a significant departure from the traditional end-to-end techniques in this space and allows for a more tractable training process with separate vision and language data sets. Specifically, we propose a novel geometry-aware mapping technique for cluttered indoor environments, and a language understanding model generalized for household instruction following. We demonstrate a significant increase in success rates for long-horizon, compositional tasks over the baseline on the recently released benchmark data set-ALFRED.

* 16 pages

Via

Access Paper or Ask Questions