Andry Rakotonirainy

Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding

Jan 09, 2025

Mohammed Elhenawy, Huthaifa I. Ashqar, Andry Rakotonirainy, Taqwa I. Alhadidi, Ahmed Jaber, Mohammad Abu Tami

Abstract: Scene understanding is essential for enhancing driver safety, generating human-centric explanations for Automated Vehicle (AV) decisions, and leveraging Artificial Intelligence (AI) for retrospective driving video analysis. This study developed a dynamic scene retrieval system using Contrastive Language-Image Pretraining (CLIP) models, which can be optimized for real-time deployment on edge devices. The proposed system outperforms state-of-the-art in-context learning methods, including the zero-shot capabilities of GPT-4o, particularly in complex scenarios. By conducting frame-level analysis on the Honda Scenes Dataset, which contains a collection of about 80 hours of annotated driving videos capturing diverse real-world road and weather conditions, our study highlights the robustness of CLIP models in learning visual concepts from natural language supervision. Results also showed that fine-tuning the CLIP models, such as ViT-L/14 and ViT-B/32, significantly improved scene classification, achieving a top F1 score of 91.1%. These results demonstrate the ability of the system to deliver rapid and precise scene recognition, which can be used to meet the critical requirements of Advanced Driver Assistance Systems (ADAS). This study shows the potential of CLIP models to provide scalable and efficient frameworks for dynamic scene understanding and classification. Furthermore, this work lays the groundwork for advanced autonomous vehicle technologies by fostering a deeper understanding of driver behavior, road conditions, and safety-critical scenarios, marking a significant step toward smarter, safer, and more context-aware autonomous driving systems.
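
The retrieval step the abstract describes can be illustrated with a minimal sketch: a frame embedding is compared against text-prompt embeddings by cosine similarity, and the closest prompt wins. This is not the authors' system and does not call a real CLIP model. The 3-dimensional vectors, the prompt strings, and the function names below are all hypothetical stand-ins chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_scene(frame_embedding, text_embeddings):
    """Return (prompt, score) for the prompt most similar to the frame."""
    scores = {
        prompt: cosine_similarity(frame_embedding, embedding)
        for prompt, embedding in text_embeddings.items()
    }
    best_prompt = max(scores, key=scores.get)
    return best_prompt, scores[best_prompt]


# Hypothetical toy embeddings; a real encoder would produce vectors
# with hundreds of dimensions.
text_embeddings = {
    "a rainy highway": [0.9, 0.1, 0.0],
    "a snowy intersection": [0.1, 0.9, 0.1],
    "a sunny urban street": [0.0, 0.2, 0.9],
}
frame_embedding = [0.8, 0.2, 0.1]

label, score = retrieve_scene(frame_embedding, text_embeddings)
print(label, round(score, 3))
```

In a real pipeline the two embedding sets would come from the image and text encoders of the same model, so that the similarity scores are comparable.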

Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges

Jun 26, 2024

Mohammed Elhenawy, Ahmad Abutahoun, Taqwa I. Alhadidi, Ahmed Jaber, Huthaifa I. Ashqar, Shadi Jaradat, Ahmed Abdelhay, Sebastien Glaser, Andry Rakotonirainy

Abstract: Multimodal Large Language Models (MLLMs) harness comprehensive knowledge spanning text, images, and audio to adeptly tackle complex problems, including zero-shot in-context learning scenarios. This study explores the ability of MLLMs to visually solve the Traveling Salesman Problem (TSP) and Multiple Traveling Salesman Problem (mTSP) using images that portray point distributions on a two-dimensional plane. We introduce a novel approach employing multiple specialized agents within the MLLM framework, each dedicated to optimizing solutions for these combinatorial challenges. Our experimental investigation includes rigorous evaluations across zero-shot settings and introduces innovative multi-agent zero-shot in-context scenarios. The results demonstrated that both multi-agent models, Multi-Agent 1 (which includes the Initializer, Critic, and Scorer agents) and Multi-Agent 2 (which comprises only the Initializer and Critic agents), significantly improved solution quality for TSP and mTSP problems. Multi-Agent 1 excelled in environments requiring detailed route refinement and evaluation, providing a robust framework for sophisticated optimizations. In contrast, Multi-Agent 2, focusing on iterative refinements by the Initializer and Critic, proved effective for rapid decision-making scenarios. These experiments yield promising outcomes, showcasing the robust visual reasoning capabilities of MLLMs in addressing diverse combinatorial problems. The findings underscore the potential of MLLMs as powerful tools in computational optimization, offering insights that could inspire further advancements in this promising field. Project link: https://github.com/ahmed-abdulhuy/Solving-TSP-and-mTSP-Combinatorial-Challenges-using-Visual-Reasoning-and-Multi-Agent-Approach-MLLMs-.git

Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems

Jun 11, 2024

Mohammed Elhenawy, Ahmed Abdelhay, Taqwa I. Alhadidi, Huthaifa I Ashqar, Shadi Jaradat, Ahmed Jaber, Sebastien Glaser, Andry Rakotonirainy

Abstract: Multimodal Large Language Models (MLLMs) have demonstrated proficiency in processing diverse modalities, including text, images, and audio. These models leverage extensive pre-existing knowledge, enabling them to address complex problems with minimal to no specific training examples, as evidenced in few-shot and zero-shot in-context learning scenarios. This paper investigates the use of MLLMs' visual capabilities to 'eyeball' solutions for the Traveling Salesman Problem (TSP) by analyzing images of point distributions on a two-dimensional plane. Our experiments aimed to validate the hypothesis that MLLMs can effectively 'eyeball' viable TSP routes. The results from zero-shot, few-shot, self-ensemble, and self-refine zero-shot evaluations show promising outcomes. We anticipate that these findings will inspire further exploration into MLLMs' visual reasoning abilities to tackle other combinatorial problems.
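
Any route proposed for a TSP instance, whether by a model or by hand, can be checked and scored with a few lines of code. The sketch below assumes a route is a list of point indices and that distances are Euclidean. The function names and the four-point instance are illustrative and are not taken from the paper.

```python
import math


def is_valid_tour(points, route):
    """A valid tour visits every point index exactly once."""
    return sorted(route) == list(range(len(points)))


def tour_length(points, route):
    """Total Euclidean length of the closed tour (returns to the start)."""
    total = 0.0
    for i in range(len(route)):
        x1, y1 = points[route[i]]
        x2, y2 = points[route[(i + 1) % len(route)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


# Hypothetical instance: corners of a 4 x 3 rectangle.
points = [(0, 0), (0, 3), (4, 3), (4, 0)]
perimeter_route = [0, 1, 2, 3]  # walks around the rectangle
crossing_route = [0, 2, 1, 3]   # uses both diagonals

print(tour_length(points, perimeter_route))
print(tour_length(points, crossing_route))
```

The crossing route is longer because it traverses both diagonals, which is the kind of defect a visual inspection of the plotted route tends to reveal.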

Hybrid Pointer Networks for Traveling Salesman Problems Optimization

Oct 13, 2021

Ahmed Stohy, Heba-Tullah Abdelhakam, Sayed Ali, Mohammed Elhenawy, Abdallah A Hassan, Mahmoud Masoud, Sebastien Glaser, Andry Rakotonirainy

Abstract: In this work, we present a novel idea for combinatorial optimization problems: a hybrid network, which yields superior results. We applied this method to graph pointer networks [1], extending their capabilities. We propose a hybrid pointer network (HPN), trained by reinforcement learning, to solve the travelling salesman problem. HPN builds upon graph pointer networks, which are an extension of pointer networks with an additional graph embedding layer. HPN outperforms the graph pointer network in solution quality due to the hybrid encoder, which provides our model with a variety of encoding types, allowing it to converge to a better policy. Our network significantly outperforms the original graph pointer network for small and large-scale problems, improving its result for TSP50 from 5.959 to 5.706 without utilizing 2opt. It also outperforms Pointer networks, the Attention model, and a wide range of other models, producing results comparable to highly tuned and specialized algorithms. We make our data, models, and code publicly available [2].
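
For readers unfamiliar with the 2opt step that the abstract says HPN does not need, here is a minimal sketch of the classic 2-opt local search: it repeatedly reverses a segment of the tour whenever doing so shortens it. This is a generic textbook version written for illustration, not code from the paper, and the four-point instance is hypothetical.

```python
import math


def tour_length(points, route):
    """Total Euclidean length of the closed tour."""
    return sum(
        math.dist(points[route[i]], points[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )


def two_opt(points, route):
    """Reverse tour segments until no reversal gives a shorter tour."""
    best = list(route)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reverse the segment best[i..j], keep the rest unchanged.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best = candidate
                    improved = True
    return best


# Hypothetical instance: corners of a 4 x 3 rectangle, self-crossing start tour.
points = [(0, 0), (0, 3), (4, 3), (4, 0)]
start_route = [0, 2, 1, 3]
refined_route = two_opt(points, start_route)
print(refined_route, tour_length(points, refined_route))
```

This version recomputes the full tour length for every candidate, which is simple but slow. Practical implementations evaluate only the two edges that change.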

ECG-Based Driver Stress Levels Detection System Using Hyperparameter Optimization

Jan 01, 2021

Mohammad Naim Rastgoo, Bahareh Nakisa, Andry Rakotonirainy, Frederic Maire, Vinod Chandran

Abstract: Stress and driving are a dangerous combination that can lead to crashes, as evidenced by the large number of road traffic crashes that involve stress. Given the significant costs of driver stress, it is essential to build a practical system that can classify driver stress level with high accuracy. However, the performance of a driving stress level classification system depends on hyperparameter choices such as data segmentation (windowing hyperparameters). The configuration of hyperparameters, which has an enormous impact on system performance, is typically hand-tuned while evaluating the algorithm. This tuning process is time-consuming and often depends on personal experience. There are also no generic optimal hyperparameter values. In this work, we propose a meta-heuristic approach to support automated hyperparameter optimization and provide a real-time driver stress detection system. This is the first systematic study of optimizing windowing hyperparameters based on the Electrocardiogram (ECG) signal in the domain of driving safety. Our approach is a framework based on the Particle Swarm Optimization (PSO) algorithm that selects optimal or near-optimal windowing hyperparameter values. The performance of the proposed framework is evaluated on two datasets: a public dataset (DRIVEDB) and our own collected dataset. The DRIVEDB dataset was collected in a real-time driving scenario, and our dataset was collected using an advanced driving simulator in a controlled environment. We demonstrate that optimising the windowing hyperparameters yields significant improvement in accuracy. The most accurate models built for the public dataset and our dataset, based on the selected windowing hyperparameters, achieved 92.12% and 77.78% accuracy, respectively.
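
To make the idea of windowing hyperparameters concrete, the sketch below segments a signal into fixed-length windows with a configurable overlap. The abstract does not list which windowing hyperparameters were tuned, so the two used here (window_size and overlap, both in samples) are assumptions for illustration, and the function name is hypothetical.

```python
def sliding_windows(signal, window_size, overlap):
    """Split a signal into windows of window_size samples.

    Consecutive windows share `overlap` samples. A trailing remainder
    shorter than window_size is dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    step = window_size - overlap
    return [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, step)
    ]


# Toy signal standing in for ECG samples.
samples = list(range(10))
windows = sliding_windows(samples, window_size=4, overlap=2)
print(windows)
```

An optimizer such as PSO would treat values like window_size and overlap as the search space and score each candidate pair by the accuracy of a classifier trained on the resulting windows.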

* 17 pages

A Review on Drivers Red Light Running and Turning Behaviour Prediction

Aug 15, 2020

Md Mostafizur Rahman Komol, Mohammed Elhenawy, Shamsunnahar Yasmin, Mahmoud Masoud, Andry Rakotonirainy

Abstract: Driver behaviour prediction has been a persistent concern for transportation safety divisions all over the world. Many lives and much property are lost in incidents at intersections and pedestrian crossings, and the toll is especially severe in countries with poor road safety technologies. Numerous studies have addressed technological evaluation and model representation for this issue. However, there has been little comprehensive review of driver behaviour prediction for red-light running and turning at signalised intersections. This paper consolidates previous research on driver behaviour prediction and on the prediction parameters that lead to traffic violations such as red-light running and turning at intersections and pedestrian crossings. The review also covers the probable crash scenarios caused by red-light running and turning, and analyses innovations in counter-crash technologies along with future research directions.

Vulnerable Road User Detection Using Smartphone Sensors and Recurrence Quantification Analysis

Jun 12, 2020

Huthaifa I. Ashqar, Mohammed Elhenawy, Mahmoud Masoud, Andry Rakotonirainy, Hesham A. Rakha

Abstract: With the fast advancements of the Autonomous Vehicle (AV) industry, detection of Vulnerable Road Users (VRUs) using smartphones is critical for safety applications of Cooperative Intelligent Transportation Systems (C-ITSs). This study explores the use of low-power smartphone sensors and Recurrence Quantification Analysis (RQA) features for this task. These features are computed over a thresholded similarity matrix extracted from nine channels: accelerometer, gyroscope, and rotation vector in each direction (x, y, and z). Given the high power consumption of GPS, GPS data is excluded. RQA features are added to traditional time domain features to investigate the classification accuracy when using binary, four-class, and five-class Random Forest classifiers. Experimental results show promising performance when using only RQA features, with a resulting accuracy of 98.34%, rising to 98.79% when time domain features are added. The results outperform previously reported accuracy, demonstrating that RQA features have high classifying capability with respect to VRU detection.
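
The thresholded similarity matrix mentioned in the abstract can be sketched for a single sensor channel as follows. The code builds a binary recurrence matrix (1 where two samples are within a threshold of each other) and computes the recurrence rate, one of the standard RQA measures. The threshold, the sample values, and the use of absolute difference on raw samples are simplifying assumptions; the paper's exact embedding and feature set are not given in the abstract.

```python
def recurrence_matrix(series, threshold):
    """Binary matrix: entry (i, j) is 1 if samples i and j are within threshold."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def recurrence_rate(matrix):
    """Fraction of recurrent entries in the matrix (main diagonal included)."""
    n = len(matrix)
    return sum(sum(row) for row in matrix) / (n * n)


# Hypothetical single-channel samples, e.g. one accelerometer axis.
channel = [0.0, 0.1, 1.0, 1.1]
matrix = recurrence_matrix(channel, threshold=0.2)
rate = recurrence_rate(matrix)
print(rate)
```

Some RQA formulations exclude the main diagonal when computing the rate. This sketch keeps it for simplicity.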

* 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 2019, pp. 1054-1059
* Published in: 2019 IEEE Intelligent Transportation Systems Conference (ITSC)

A Comparative Analysis of E-Scooter and E-Bike Usage Patterns: Findings from the City of Austin, TX

Jun 07, 2020

Mohammed Hamad Almannaa, Huthaifa I. Ashqar, Mohammed Elhenawy, Mahmoud Masoud, Andry Rakotonirainy, Hesham Rakha

Abstract: E-scooter-sharing and e-bike-sharing systems are accommodating and easing the increased traffic in dense cities and are expanding considerably. However, these new micro-mobility transportation modes raise numerous operational and safety concerns. This study analyzes e-scooter and dockless e-bike sharing system user behavior. We investigate how average trip speed changes depending on the day of the week and the time of the day. We used a dataset from the city of Austin, TX from December 2018 to May 2019. Our results generally show that the trip average speed for e-bikes ranges between 3.01 and 3.44 m/s, which is higher than that for e-scooters (2.19 to 2.78 m/s). Results also show a similar usage pattern for the average speed of e-bikes and e-scooters throughout the days of the week and a different usage pattern for the average speed of e-bikes and e-scooters over the hours of the day. We found that users tend to ride e-bikes and e-scooters at a slower average speed for recreational purposes than for commuting purposes. This study, the first of its kind, is a building block in this field and sheds new light on this emerging class of shared-road users.

* Submitted to the International Journal of Sustainable Transportation

Topological Stability: a New Algorithm for Selecting The Nearest Neighbors in Non-Linear Dimensionality Reduction Techniques

Nov 17, 2019

Mohammed Elhenawy, Mahmoud Masoud, Sebastian Glaser, Andry Rakotonirainy

Abstract: In the machine learning field, dimensionality reduction is an important task. It mitigates the undesired properties of high-dimensional spaces to facilitate classification, compression, and visualization of high-dimensional data. During the last decade, researchers proposed many new (non-linear) techniques for dimensionality reduction. Most of these techniques are based on the intuition that data lies on or near a complex low-dimensional manifold that is embedded in the high-dimensional space. New techniques for dimensionality reduction aim at identifying and extracting the manifold from the high-dimensional space. Isomap is one of the widely used low-dimensional embedding methods, where geodesic distances on a weighted graph are incorporated with the classical scaling (metric multidimensional scaling). Isomap chooses the nearest neighbours based on distance only, which causes bridges and topological instability. In this paper, we propose a new algorithm for choosing the nearest neighbours that reduces the number of short-circuit errors and hence improves the topological stability. At any point on the manifold, that point and its nearest neighbours form a vector subspace, and the orthogonal to that subspace is orthogonal to all vectors that span the subspace. The proposed algorithm uses the point itself and its two nearest neighbours to find the bases of the subspace and the orthogonal to that subspace, which belongs to the orthogonal complementary subspace. The proposed algorithm then adds new points to the two nearest neighbours based on the distance and the angle between each new point and the orthogonal to the subspace. The superior performance of the new algorithm in choosing the nearest neighbours is confirmed through experimental work with several datasets.
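
One possible reading of the neighbour test described in the abstract can be sketched for data in three dimensions, where the orthogonal to the plane through a point and its two nearest neighbours is a cross product. A candidate is accepted only if it is close enough and its direction from the point is nearly perpendicular to that orthogonal, meaning it lies close to the local plane. The thresholds, the function names, and the restriction to 3-D are assumptions made for illustration and are not the authors' implementation.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cross(a, b):
    """Cross product of two 3-D vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def angle_to_normal_deg(point, nn1, nn2, candidate):
    """Angle between (candidate - point) and the normal of the local plane.

    Assumes point, nn1, and nn2 are not collinear and candidate != point.
    """
    normal = cross(sub(nn1, point), sub(nn2, point))
    direction = sub(candidate, point)
    cos_angle = dot(direction, normal) / (norm(direction) * norm(normal))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def accept_neighbor(point, nn1, nn2, candidate, max_distance, max_deviation_deg):
    """Accept a candidate that is near the point and near the local plane."""
    if norm(sub(candidate, point)) > max_distance:
        return False
    angle = angle_to_normal_deg(point, nn1, nn2, candidate)
    return abs(angle - 90.0) <= max_deviation_deg


# Hypothetical local patch: the point and its two nearest neighbours span the xy-plane.
point, nn1, nn2 = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
in_plane = [1.0, 1.0, 0.05]    # close to the plane
off_plane = [0.2, 0.2, 1.0]    # near in distance but far off the plane
too_far = [3.0, 0.0, 0.0]      # in the plane but beyond max_distance
```

The off-plane candidate is the kind of point a pure distance rule would accept, which is how short-circuit edges between separate folds of a manifold can arise.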
