Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jidong J. Yang

Multi-Agent Visual-Language Reasoning for Comprehensive Highway Scene Understanding

Aug 24, 2025

Yunxiang Yang, Ningning Xu, Jidong J. Yang

Abstract:This paper introduces a multi-agent framework for comprehensive highway scene understanding, designed around a mixture-of-experts strategy. In this framework, a large generic vision-language model (VLM), such as GPT-4o, is contextualized with domain knowledge to generates task-specific chain-of-thought (CoT) prompts. These fine-grained prompts are then used to guide a smaller, efficient VLM (e.g., Qwen2.5-VL-7B) in reasoning over short videos, along with complementary modalities as applicable. The framework simultaneously addresses multiple critical perception tasks, including weather classification, pavement wetness assessment, and traffic congestion detection, achieving robust multi-task reasoning while balancing accuracy and computational efficiency. To support empirical validation, we curated three specialized datasets aligned with these tasks. Notably, the pavement wetness dataset is multimodal, combining video streams with road weather sensor data, highlighting the benefits of multimodal reasoning. Experimental results demonstrate consistently strong performance across diverse traffic and environmental conditions. From a deployment perspective, the framework can be readily integrated with existing traffic camera systems and strategically applied to high-risk rural locations, such as sharp curves, flood-prone lowlands, or icy bridges. By continuously monitoring the targeted sites, the system enhances situational awareness and delivers timely alerts, even in resource-constrained environments.

* 16 pages, 16 figures, 8 tables

Via

Access Paper or Ask Questions

CrashSage: A Large Language Model-Centered Framework for Contextual and Interpretable Traffic Crash Analysis

May 08, 2025

Hao Zhen, Jidong J. Yang

Abstract:Road crashes claim over 1.3 million lives annually worldwide and incur global economic losses exceeding \$1.8 trillion. Such profound societal and financial impacts underscore the urgent need for road safety research that uncovers crash mechanisms and delivers actionable insights. Conventional statistical models and tree ensemble approaches typically rely on structured crash data, overlooking contextual nuances and struggling to capture complex relationships and underlying semantics. Moreover, these approaches tend to incur significant information loss, particularly in narrative elements related to multi-vehicle interactions, crash progression, and rare event characteristics. This study presents CrashSage, a novel Large Language Model (LLM)-centered framework designed to advance crash analysis and modeling through four key innovations. First, we introduce a tabular-to-text transformation strategy paired with relational data integration schema, enabling the conversion of raw, heterogeneous crash data into enriched, structured textual narratives that retain essential structural and relational context. Second, we apply context-aware data augmentation using a base LLM model to improve narrative coherence while preserving factual integrity. Third, we fine-tune the LLaMA3-8B model for crash severity inference, demonstrating superior performance over baseline approaches, including zero-shot, zero-shot with chain-of-thought prompting, and few-shot learning, with multiple models (GPT-4o, GPT-4o-mini, LLaMA3-70B). Finally, we employ a gradient-based explainability technique to elucidate model decisions at both the individual crash level and across broader risk factor dimensions. This interpretability mechanism enhances transparency and enables targeted road safety interventions by providing deeper insights into the most influential factors.

* 20 pages, 7 figures

Via

Access Paper or Ask Questions

Enhancing autonomous vehicle safety in rain: a data-centric approach for clear vision

Dec 29, 2024

Mark A. Seferian, Jidong J. Yang

Abstract:Autonomous vehicles face significant challenges in navigating adverse weather, particularly rain, due to the visual impairment of camera-based systems. In this study, we leveraged contemporary deep learning techniques to mitigate these challenges, aiming to develop a vision model that processes live vehicle camera feeds to eliminate rain-induced visual hindrances, yielding visuals closely resembling clear, rain-free scenes. Using the Car Learning to Act (CARLA) simulation environment, we generated a comprehensive dataset of clear and rainy images for model training and testing. In our model, we employed a classic encoder-decoder architecture with skip connections and concatenation operations. It was trained using novel batching schemes designed to effectively distinguish high-frequency rain patterns from low-frequency scene features across successive image frames. To evaluate the model performance, we integrated it with a steering module that processes front-view images as input. The results demonstrated notable improvements in steering accuracy, underscoring the model's potential to enhance navigation safety and reliability in rainy weather conditions.

* 16 pages, 16 figures, 2 tables

Via

Access Paper or Ask Questions

Leveraging Scene Geometry and Depth Information for Robust Image Deraining

Dec 27, 2024

Ningning Xu, Jidong J. Yang

Abstract:Image deraining holds great potential for enhancing the vision of autonomous vehicles in rainy conditions, contributing to safer driving. Previous works have primarily focused on employing a single network architecture to generate derained images. However, they often fail to fully exploit the rich prior knowledge embedded in the scenes. Particularly, most methods overlook the depth information that can provide valuable context about scene geometry and guide more robust deraining. In this work, we introduce a novel learning framework that integrates multiple networks: an AutoEncoder for deraining, an auxiliary network to incorporate depth information, and two supervision networks to enforce feature consistency between rainy and clear scenes. This multi-network design enables our model to effectively capture the underlying scene structure, producing clearer and more accurately derained images, leading to improved object detection for autonomous vehicles. Extensive experiments on three widely-used datasets demonstrated the effectiveness of our proposed method.

* 12 pages, 5 figures, 10 tables

Via

Access Paper or Ask Questions

Enhancing Nighttime Vehicle Detection with Day-to-Night Style Transfer and Labeling-Free Augmentation

Dec 21, 2024

Yunxiang Yang, Hao Zhen, Yongcan Huang, Jidong J. Yang

Figure 1 for Enhancing Nighttime Vehicle Detection with Day-to-Night Style Transfer and Labeling-Free Augmentation

Figure 2 for Enhancing Nighttime Vehicle Detection with Day-to-Night Style Transfer and Labeling-Free Augmentation

Figure 3 for Enhancing Nighttime Vehicle Detection with Day-to-Night Style Transfer and Labeling-Free Augmentation

Figure 4 for Enhancing Nighttime Vehicle Detection with Day-to-Night Style Transfer and Labeling-Free Augmentation

Abstract:Existing deep learning-based object detection models perform well under daytime conditions but face significant challenges at night, primarily because they are predominantly trained on daytime images. Additionally, training with nighttime images presents another challenge: even human annotators struggle to accurately label objects in low-light conditions. This issue is particularly pronounced in transportation applications, such as detecting vehicles and other objects of interest on rural roads at night, where street lighting is often absent, and headlights may introduce undesirable glare. This study addresses these challenges by introducing a novel framework for labeling-free data augmentation, leveraging CARLA-generated synthetic data for day-to-night image style transfer. Specifically, the framework incorporates the Efficient Attention Generative Adversarial Network for realistic day-to-night style transfer and uses CARLA-generated synthetic nighttime images to help the model learn vehicle headlight effects. To evaluate the efficacy of the proposed framework, we fine-tuned the YOLO11 model with an augmented dataset specifically curated for rural nighttime environments, achieving significant improvements in nighttime vehicle detection. This novel approach is simple yet effective, offering a scalable solution to enhance AI-based detection systems in low-visibility environments and extend the applicability of object detection models to broader real-world contexts.

* 12 pages, 8 figures, 3 tables

Via

Access Paper or Ask Questions

Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis

Dec 06, 2024

Oscar Lares, Hao Zhen, Jidong J. Yang

Figure 1 for Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis

Figure 2 for Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis

Figure 3 for Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis

Figure 4 for Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis

Abstract:Reliable and interpretable traffic crash modeling is essential for understanding causality and improving road safety. This study introduces a novel approach to predicting collision types by utilizing a comprehensive dataset fused from multiple sources, including weather data, crash reports, high-resolution traffic information, pavement geometry, and facility characteristics. Central to our approach is the development of a Feature Group Tabular Transformer (FGTT) model, which organizes disparate data into meaningful feature groups, represented as tokens. These group-based tokens serve as rich semantic components, enabling effective identification of collision patterns and interpretation of causal mechanisms. The FGTT model is benchmarked against widely used tree ensemble models, including Random Forest, XGBoost, and CatBoost, demonstrating superior predictive performance. Furthermore, model interpretation reveals key influential factors, providing fresh insights into the underlying causality of distinct crash types.

* 19 pages, 6 figures, 5 tables

Via

Access Paper or Ask Questions

Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference

Aug 04, 2024

Hao Zhen, Yucheng Shi, Yongcan Huang, Jidong J. Yang, Ninghao Liu

Abstract:Harnessing the power of Large Language Models (LLMs), this study explores the use of three state-of-the-art LLMs, specifically GPT-3.5-turbo, LLaMA3-8B, and LLaMA3-70B, for crash severity inference, framing it as a classification task. We generate textual narratives from original traffic crash tabular data using a pre-built template infused with domain knowledge. Additionally, we incorporated Chain-of-Thought (CoT) reasoning to guide the LLMs in analyzing the crash causes and then inferring the severity. This study also examine the impact of prompt engineering specifically designed for crash severity inference. The LLMs were tasked with crash severity inference to: (1) evaluate the models' capabilities in crash severity analysis, (2) assess the effectiveness of CoT and domain-informed prompt engineering, and (3) examine the reasoning abilities with the CoT framework. Our results showed that LLaMA3-70B consistently outperformed the other models, particularly in zero-shot settings. The CoT and Prompt Engineering techniques significantly enhanced performance, improving logical reasoning and addressing alignment issues. Notably, the CoT offers valuable insights into LLMs' reasoning processes, unleashing their capacity to consider diverse factors such as environmental conditions, driver behavior, and vehicle characteristics in severity analysis and inference.

* 20 pages, 12 figures, 3 tables

Via

Access Paper or Ask Questions

Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms

Feb 01, 2023

Shihan Ma, Jidong J. Yang

Figure 1 for Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms

Figure 2 for Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms

Figure 3 for Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms

Figure 4 for Image-Based Vehicle Classification by Synergizing Features from Supervised and Self-Supervised Learning Paradigms

Abstract:This paper introduces a novel approach to leverage features learned from both supervised and self-supervised paradigms, to improve image classification tasks, specifically for vehicle classification. Two state-of-the-art self-supervised learning methods, DINO and data2vec, were evaluated and compared for their representation learning of vehicle images. The former contrasts local and global views while the latter uses masked prediction on multi-layered representations. In the latter case, supervised learning is employed to finetune a pretrained YOLOR object detector for detecting vehicle wheels, from which definitive wheel positional features are retrieved. The representations learned from these self-supervised learning methods were combined with the wheel positional features for the vehicle classification task. Particularly, a random wheel masking strategy was utilized to finetune the previously learned representations in harmony with the wheel positional features during the training of the classifier. Our experiments show that the data2vec-distilled representations, which are consistent with our wheel masking strategy, outperformed the DINO counterpart, resulting in a celebrated Top-1 classification accuracy of 97.2% for classifying the 13 vehicle classes defined by the Federal Highway Administration.

* Eng. 2023; 4(1):444-456
* 15 pages, 7 figures, 7 tables

Via

Access Paper or Ask Questions

Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification

Dec 27, 2022

Hao Zhen, Yucheng Shi, Jidong J. Yang, Javad Mohammadpour Vehni

Figure 1 for Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification

Figure 2 for Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification

Figure 3 for Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification

Figure 4 for Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification

Abstract:Classification using supervised learning requires annotating a large amount of classes-balanced data for model training and testing. This has practically limited the scope of applications with supervised learning, in particular deep learning. To address the issues associated with limited and imbalanced data, this paper introduces a sample-efficient co-supervised learning paradigm (SEC-CGAN), in which a conditional generative adversarial network (CGAN) is trained alongside the classifier and supplements semantics-conditioned, confidence-aware synthesized examples to the annotated data during the training process. In this setting, the CGAN not only serves as a co-supervisor but also provides complementary quality examples to aid the classifier training in an end-to-end fashion. Experiments demonstrate that the proposed SEC-CGAN outperforms the external classifier GAN (EC-GAN) and a baseline ResNet-18 classifier. For the comparison, all classifiers in above methods adopt the ResNet-18 architecture as the backbone. Particularly, for the Street View House Numbers dataset, using the 5% of training data, a test accuracy of 90.26% is achieved by SEC-CGAN as opposed to 88.59% by EC-GAN and 87.17% by the baseline classifier; for the highway image dataset, using the 10% of training data, a test accuracy of 98.27% is achieved by SEC-CGAN, compared to 97.84% by EC-GAN and 95.52% by the baseline classifier.

* Applied Computing and Intelligence, 2023, Volume 3, Issue 1: 13-26
* 14 pages, 5 figures

Via

Access Paper or Ask Questions

Semi-supervised multiscale dual-encoding method for faulty traffic data detection

Dec 27, 2022

Yongcan Huang, Jidong J. Yang

Abstract:Inspired by the recent success of deep learning in multiscale information encoding, we introduce a variational autoencoder (VAE) based semi-supervised method for detection of faulty traffic data, which is cast as a classification problem. Continuous wavelet transform (CWT) is applied to the time series of traffic volume data to obtain rich features embodied in time-frequency representation, followed by a twin of VAE models to separately encode normal data and faulty data. The resulting multiscale dual encodings are concatenated and fed to an attention-based classifier, consisting of a self-attention module and a multilayer perceptron. For comparison, the proposed architecture is evaluated against five different encoding schemes, including (1) VAE with only normal data encoding, (2) VAE with only faulty data encoding, (3) VAE with both normal and faulty data encodings, but without attention module in the classifier, (4) siamese encoding, and (5) cross-vision transformer (CViT) encoding. The first four encoding schemes adopted the same convolutional neural network (CNN) architecture while the fifth encoding scheme follows the transformer architecture of CViT. Our experiments show that the proposed architecture with the dual encoding scheme, coupled with attention module, outperforms other encoding schemes and results in classification accuracy of 96.4%, precision of 95.5%, and recall of 97.7%.

* Applied Computing and Intelligence, 2022, Volume 2, Issue 2: 99-114
* 16 pages, 8 figures

Via

Access Paper or Ask Questions