Abstract:With the rise of sophisticated phishing attacks, there is a growing need for effective and economical detection solutions. This paper explores the use of large multimodal agents, specifically Gemini 1.5 Flash and GPT-4o mini, to analyze both URLs and webpage screenshots via APIs, thus avoiding the complexities of training and maintaining AI systems. Our findings indicate that integrating these two data types substantially enhances detection performance over using either type alone. However, API usage incurs costs per query that depend on the number of input and output tokens. To address this, we propose a two-tiered agentic approach: initially, one agent assesses the URL, and if inconclusive, a second agent evaluates both the URL and the screenshot. This method not only maintains robust detection performance but also significantly reduces API costs by minimizing unnecessary multi-input queries. Cost analysis shows that with the agentic approach, GPT-4o mini can process about 4.2 times as many websites per $100 compared to the multimodal approach (107,440 vs. 25,626), and Gemini 1.5 Flash can process about 2.6 times more websites (2,232,142 vs. 862,068). These findings underscore the significant economic benefits of the agentic approach over the multimodal method, providing a viable solution for organizations aiming to leverage advanced AI for phishing detection while controlling expenses.
Abstract:The effectiveness of Large Language Models (LLMs) significantly relies on the quality of the prompts they receive. However, even when processing identical prompts, LLMs can yield varying outcomes due to differences in their training processes. To leverage the collective intelligence of multiple LLMs and enhance their performance, this study investigates three majority voting strategies for text classification, focusing on phishing URL detection. The strategies are: (1) a prompt-based ensemble, which utilizes majority voting across the responses generated by a single LLM to various prompts; (2) a model-based ensemble, which entails aggregating responses from multiple LLMs to a single prompt; and (3) a hybrid ensemble, which combines the two methods by sending different prompts to multiple LLMs and then aggregating their responses. Our analysis shows that ensemble strategies are most suited in cases where individual components exhibit equivalent performance levels. However, when there is a significant discrepancy in individual performance, the effectiveness of the ensemble method may not exceed that of the highest-performing single LLM or prompt. In such instances, opting for ensemble techniques is not recommended.
Abstract:The success of Large Language Models (LLMs) has led to a parallel rise in the development of Large Multimodal Models (LMMs), such as Gemini-pro, which have begun to transform a variety of applications. These sophisticated multimodal models are designed to interpret and analyze complex data, integrating both textual and visual information on a scale previously unattainable, opening new avenues for a range of applications. This paper investigates the applicability and effectiveness of prompt-engineered Gemini-pro LMMs versus fine-tuned Vision Transformer (ViT) models in addressing critical security challenges. We focus on two distinct tasks: a visually evident task of detecting simple triggers, such as small squares in images, indicative of potential backdoors, and a non-visually evident task of malware classification through visual representations. Our results highlight a significant divergence in performance, with Gemini-pro falling short in accuracy and reliability when compared to fine-tuned ViT models. The ViT models, on the other hand, demonstrate exceptional accuracy, achieving near-perfect performance on both tasks. This study not only showcases the strengths and limitations of prompt-engineered LMMs in cybersecurity applications but also emphasizes the unmatched efficacy of fine-tuned ViT models for precise and dependable tasks.
Abstract:Ever since the COVID-19 pandemic started, all the governments have been trying to limit its effects on their citizens and countries. This pandemic was harsh on different levels for almost all populations worldwide and this is what drove researchers and scientists to get involved and work on several kinds of simulations to get a better insight into this virus and be able to stop it the earliest possible. In this study, we simulate the spread of COVID-19 in Lebanon using an Agent-Based Model where people are modeled as agents that have specific characteristics and behaviors determined from statistical distributions using Monte Carlo Algorithm. These agents can go into the world, interact with each other, and thus, infect each other. This is how the virus spreads. During the simulation, we can introduce different Non-Pharmaceutical Interventions - or more commonly NPIs - that aim to limit the spread of the virus (wearing a mask, closing locations, etc). Our Simulator was first validated on concepts (e.g. Flattening the Curve and Second Wave scenario), and then it was applied on the case of Lebanon. We studied the effect of opening schools and universities on the pandemic situation in the country since the Lebanese Ministry of Education is planning to do so progressively, starting from 21 April 2021. Based on the results we obtained, we conclude that it would be better to delay the school openings while the vaccination campaign is still slow in the country.