Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hao Zhen

Multi-view Phase-aware Pedestrian-Vehicle Incident Reasoning Framework with Vision-Language Models

Nov 18, 2025

Hao Zhen, Yunxiang Yang, Jidong J. Yang

Abstract:Pedestrian-vehicle incidents remain a critical urban safety challenge, with pedestrians accounting for over 20% of global traffic fatalities. Although existing video-based systems can detect when incidents occur, they provide little insight into how these events unfold across the distinct cognitive phases of pedestrian behavior. Recent vision-language models (VLMs) have shown strong potential for video understanding, but they remain limited in that they typically process videos in isolation, without explicit temporal structuring or multi-view integration. This paper introduces Multi-view Phase-aware Pedestrian-Vehicle Incident Reasoning (MP-PVIR), a unified framework that systematically processes multi-view video streams into structured diagnostic reports through four stages: (1) event-triggered multi-view video acquisition, (2) pedestrian behavior phase segmentation, (3) phase-specific multi-view reasoning, and (4) hierarchical synthesis and diagnostic reasoning. The framework operationalizes behavioral theory by automatically segmenting incidents into cognitive phases, performing synchronized multi-view analysis within each phase, and synthesizing results into causal chains with targeted prevention strategies. Particularly, two specialized VLMs underpin the MP-PVIR pipeline: TG-VLM for behavioral phase segmentation (mIoU = 0.4881) and PhaVR-VLM for phase-aware multi-view analysis, achieving a captioning score of 33.063 and up to 64.70% accuracy on question answering. Finally, a designated large language model is used to generate comprehensive reports detailing scene understanding, behavior interpretation, causal reasoning, and prevention recommendations. Evaluation on the Woven Traffic Safety dataset shows that MP-PVIR effectively translates multi-view video data into actionable insights, advancing AI-driven traffic safety analytics for vehicle-infrastructure cooperative systems.

* 23 pages, 4 figures, 3 tables

Via

Access Paper or Ask Questions

CrashSage: A Large Language Model-Centered Framework for Contextual and Interpretable Traffic Crash Analysis

May 08, 2025

Hao Zhen, Jidong J. Yang

Abstract:Road crashes claim over 1.3 million lives annually worldwide and incur global economic losses exceeding \$1.8 trillion. Such profound societal and financial impacts underscore the urgent need for road safety research that uncovers crash mechanisms and delivers actionable insights. Conventional statistical models and tree ensemble approaches typically rely on structured crash data, overlooking contextual nuances and struggling to capture complex relationships and underlying semantics. Moreover, these approaches tend to incur significant information loss, particularly in narrative elements related to multi-vehicle interactions, crash progression, and rare event characteristics. This study presents CrashSage, a novel Large Language Model (LLM)-centered framework designed to advance crash analysis and modeling through four key innovations. First, we introduce a tabular-to-text transformation strategy paired with relational data integration schema, enabling the conversion of raw, heterogeneous crash data into enriched, structured textual narratives that retain essential structural and relational context. Second, we apply context-aware data augmentation using a base LLM model to improve narrative coherence while preserving factual integrity. Third, we fine-tune the LLaMA3-8B model for crash severity inference, demonstrating superior performance over baseline approaches, including zero-shot, zero-shot with chain-of-thought prompting, and few-shot learning, with multiple models (GPT-4o, GPT-4o-mini, LLaMA3-70B). Finally, we employ a gradient-based explainability technique to elucidate model decisions at both the individual crash level and across broader risk factor dimensions. This interpretability mechanism enhances transparency and enables targeted road safety interventions by providing deeper insights into the most influential factors.

* 20 pages, 7 figures

Via

Access Paper or Ask Questions

Enhancing Nighttime Vehicle Detection with Day-to-Night Style Transfer and Labeling-Free Augmentation

Dec 21, 2024

Yunxiang Yang, Hao Zhen, Yongcan Huang, Jidong J. Yang

Figure 1 for Enhancing Nighttime Vehicle Detection with Day-to-Night Style Transfer and Labeling-Free Augmentation

Figure 2 for Enhancing Nighttime Vehicle Detection with Day-to-Night Style Transfer and Labeling-Free Augmentation

Figure 3 for Enhancing Nighttime Vehicle Detection with Day-to-Night Style Transfer and Labeling-Free Augmentation

Figure 4 for Enhancing Nighttime Vehicle Detection with Day-to-Night Style Transfer and Labeling-Free Augmentation

Abstract:Existing deep learning-based object detection models perform well under daytime conditions but face significant challenges at night, primarily because they are predominantly trained on daytime images. Additionally, training with nighttime images presents another challenge: even human annotators struggle to accurately label objects in low-light conditions. This issue is particularly pronounced in transportation applications, such as detecting vehicles and other objects of interest on rural roads at night, where street lighting is often absent, and headlights may introduce undesirable glare. This study addresses these challenges by introducing a novel framework for labeling-free data augmentation, leveraging CARLA-generated synthetic data for day-to-night image style transfer. Specifically, the framework incorporates the Efficient Attention Generative Adversarial Network for realistic day-to-night style transfer and uses CARLA-generated synthetic nighttime images to help the model learn vehicle headlight effects. To evaluate the efficacy of the proposed framework, we fine-tuned the YOLO11 model with an augmented dataset specifically curated for rural nighttime environments, achieving significant improvements in nighttime vehicle detection. This novel approach is simple yet effective, offering a scalable solution to enhance AI-based detection systems in low-visibility environments and extend the applicability of object detection models to broader real-world contexts.

* 12 pages, 8 figures, 3 tables

Via

Access Paper or Ask Questions

Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis

Dec 06, 2024

Oscar Lares, Hao Zhen, Jidong J. Yang

Figure 1 for Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis

Figure 2 for Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis

Figure 3 for Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis

Figure 4 for Feature Group Tabular Transformer: A Novel Approach to Traffic Crash Modeling and Causality Analysis

Abstract:Reliable and interpretable traffic crash modeling is essential for understanding causality and improving road safety. This study introduces a novel approach to predicting collision types by utilizing a comprehensive dataset fused from multiple sources, including weather data, crash reports, high-resolution traffic information, pavement geometry, and facility characteristics. Central to our approach is the development of a Feature Group Tabular Transformer (FGTT) model, which organizes disparate data into meaningful feature groups, represented as tokens. These group-based tokens serve as rich semantic components, enabling effective identification of collision patterns and interpretation of causal mechanisms. The FGTT model is benchmarked against widely used tree ensemble models, including Random Forest, XGBoost, and CatBoost, demonstrating superior predictive performance. Furthermore, model interpretation reveals key influential factors, providing fresh insights into the underlying causality of distinct crash types.

* 19 pages, 6 figures, 5 tables

Via

Access Paper or Ask Questions

Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference

Aug 04, 2024

Hao Zhen, Yucheng Shi, Yongcan Huang, Jidong J. Yang, Ninghao Liu

Abstract:Harnessing the power of Large Language Models (LLMs), this study explores the use of three state-of-the-art LLMs, specifically GPT-3.5-turbo, LLaMA3-8B, and LLaMA3-70B, for crash severity inference, framing it as a classification task. We generate textual narratives from original traffic crash tabular data using a pre-built template infused with domain knowledge. Additionally, we incorporated Chain-of-Thought (CoT) reasoning to guide the LLMs in analyzing the crash causes and then inferring the severity. This study also examine the impact of prompt engineering specifically designed for crash severity inference. The LLMs were tasked with crash severity inference to: (1) evaluate the models' capabilities in crash severity analysis, (2) assess the effectiveness of CoT and domain-informed prompt engineering, and (3) examine the reasoning abilities with the CoT framework. Our results showed that LLaMA3-70B consistently outperformed the other models, particularly in zero-shot settings. The CoT and Prompt Engineering techniques significantly enhanced performance, improving logical reasoning and addressing alignment issues. Notably, the CoT offers valuable insights into LLMs' reasoning processes, unleashing their capacity to consider diverse factors such as environmental conditions, driver behavior, and vehicle characteristics in severity analysis and inference.

* 20 pages, 12 figures, 3 tables

Via

Access Paper or Ask Questions

Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification

Dec 27, 2022

Hao Zhen, Yucheng Shi, Jidong J. Yang, Javad Mohammadpour Vehni

Figure 1 for Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification

Figure 2 for Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification

Figure 3 for Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification

Figure 4 for Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification

Abstract:Classification using supervised learning requires annotating a large amount of classes-balanced data for model training and testing. This has practically limited the scope of applications with supervised learning, in particular deep learning. To address the issues associated with limited and imbalanced data, this paper introduces a sample-efficient co-supervised learning paradigm (SEC-CGAN), in which a conditional generative adversarial network (CGAN) is trained alongside the classifier and supplements semantics-conditioned, confidence-aware synthesized examples to the annotated data during the training process. In this setting, the CGAN not only serves as a co-supervisor but also provides complementary quality examples to aid the classifier training in an end-to-end fashion. Experiments demonstrate that the proposed SEC-CGAN outperforms the external classifier GAN (EC-GAN) and a baseline ResNet-18 classifier. For the comparison, all classifiers in above methods adopt the ResNet-18 architecture as the backbone. Particularly, for the Street View House Numbers dataset, using the 5% of training data, a test accuracy of 90.26% is achieved by SEC-CGAN as opposed to 88.59% by EC-GAN and 87.17% by the baseline classifier; for the highway image dataset, using the 10% of training data, a test accuracy of 98.27% is achieved by SEC-CGAN, compared to 97.84% by EC-GAN and 95.52% by the baseline classifier.

* Applied Computing and Intelligence, 2023, Volume 3, Issue 1: 13-26
* 14 pages, 5 figures

Via

Access Paper or Ask Questions