Sara Abdali

Self-reflecting Large Language Models: A Hegelian Dialectical Approach

Jan 24, 2025

Sara Abdali, Can Goksen, Saeed Amizadeh and Kazuhito Koishida

Abstract: Investigating NLP through a philosophical lens has recently caught researchers' attention, as it connects computational methods with classical schools of philosophy. This paper introduces a philosophical approach inspired by the Hegelian Dialectic for LLMs' self-reflection, utilizing a self-dialectical approach to emulate internal critiques and then synthesize new ideas by resolving the contradicting points. Moreover, this paper investigates the effect of LLMs' temperature for generation by establishing a dynamic annealing approach, which promotes creativity in the early stages and gradually refines it by focusing on the nuances, as well as a fixed temperature strategy for generation. Our proposed approach is examined to determine its ability to generate novel ideas from an initial proposition. Additionally, a Multi Agent Majority Voting (MAMV) strategy is leveraged to assess the validity and novelty of the generated ideas, which proves beneficial in the absence of domain experts. Our experiments show promise in generating new ideas and provide a stepping-stone for future research.

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Sep 12, 2024

Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Jang(+1 more)

Abstract: Large language models (LLMs) show remarkable potential to act as computer agents, enhancing human productivity and software accessibility in multi-modal tasks that require planning and reasoning. However, measuring agent performance in realistic environments remains a challenge since: (i) most benchmarks are limited to specific modalities or domains (e.g., text-only, web navigation, Q&A, coding) and (ii) full benchmark evaluations are slow (on the order of days) given the multi-step sequential nature of tasks. To address these challenges, we introduce the Windows Agent Arena: a reproducible, general environment focusing exclusively on the Windows operating system (OS) where agents can operate freely within a real Windows OS and use the same wide range of applications, tools, and web browsers available to human users when solving tasks. We adapt the OSWorld framework (Xie et al., 2024) to create 150+ diverse Windows tasks across representative domains that require agent abilities in planning, screen understanding, and tool usage. Our benchmark is scalable and can be seamlessly parallelized in Azure for a full benchmark evaluation in as little as 20 minutes. To demonstrate Windows Agent Arena's capabilities, we also introduce a new multi-modal agent, Navi. Our agent achieves a success rate of 19.5% in the Windows domain, compared to the 74.5% performance of an unassisted human. Navi also demonstrates strong performance on another popular web-based benchmark, Mind2Web. We offer extensive quantitative and qualitative analysis of Navi's performance, and provide insights into the opportunities for future research in agent development and data generation using Windows Agent Arena. Webpage: https://microsoft.github.io/WindowsAgentArena Code: https://github.com/microsoft/WindowsAgentArena

Can LLMs be Fooled? Investigating Vulnerabilities in LLMs

Jul 30, 2024

Sara Abdali, Jia He, CJ Barberan, Richard Anarfi

Abstract: The advent of Large Language Models (LLMs) has garnered significant popularity and wielded immense power across various domains within Natural Language Processing (NLP). While their capabilities are undeniably impressive, it is crucial to identify and scrutinize their vulnerabilities, especially when those vulnerabilities can have costly consequences. One such LLM, trained to provide a concise summarization from medical documents, could unequivocally leak personal patient data when prompted surreptitiously. This is just one of many unfortunate examples that have been unveiled, and further research is necessary to comprehend the underlying reasons behind such vulnerabilities. In this study, we delve into multiple sections of vulnerabilities, namely model-based, training-time, and inference-time vulnerabilities, and discuss mitigation strategies including "Model Editing", which aims at modifying LLMs' behavior, and "Chroma Teaming", which incorporates the synergy of multiple teaming strategies to enhance LLMs' resilience. This paper will synthesize the findings from each vulnerability section and propose new directions of research and development. By understanding the focal points of current vulnerabilities, we can better anticipate and mitigate future risks, paving the road for more robust and secure LLMs.

* 14 pages, 1 figure. arXiv admin note: text overlap with arXiv:2403.12503

Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices

Mar 19, 2024

Sara Abdali, Richard Anarfi, CJ Barberan, Jia He

Abstract: Large language models (LLMs) have significantly transformed the landscape of Natural Language Processing (NLP). Their impact extends across a diverse spectrum of tasks, revolutionizing how we approach language understanding and generation. Nevertheless, alongside their remarkable utility, LLMs introduce critical security and risk considerations. These challenges warrant careful examination to ensure responsible deployment and safeguard against potential vulnerabilities. This research paper thoroughly investigates security and privacy concerns related to LLMs from five thematic perspectives: security and privacy concerns, vulnerabilities against adversarial attacks, potential harms caused by misuses of LLMs, mitigation strategies to address these challenges while identifying limitations of current strategies. Lastly, the paper recommends promising avenues for future research to enhance the security and risk management of LLMs.

Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text

Mar 09, 2024

Sara Abdali, Richard Anarfi, CJ Barberan, Jia He

Abstract: Large Language Models (LLMs) have revolutionized the field of Natural Language Generation (NLG) by demonstrating an impressive ability to generate human-like text. However, their widespread usage introduces challenges that necessitate thoughtful examination, ethical scrutiny, and responsible practices. In this study, we delve into these challenges and explore existing strategies for mitigating them, with a particular emphasis on identifying AI-generated text as the ultimate solution. Additionally, we assess the feasibility of detection from a theoretical perspective and propose novel research directions to address the current limitations in this domain.

Extracting Self-Consistent Causal Insights from Users Feedback with LLMs and In-context Learning

Dec 11, 2023

Sara Abdali, Anjali Parikh, Steve Lim, Emre Kiciman

Abstract: Microsoft Windows Feedback Hub is designed to receive customer feedback on a wide variety of subjects including critical topics such as power and battery. Feedback is one of the most effective ways to have a grasp of users' experience with Windows and its ecosystem. However, the sheer volume of feedback received by Feedback Hub makes it immensely challenging to diagnose the actual cause of reported issues. To better understand and triage issues, we leverage Double Machine Learning (DML) to associate users' feedback with telemetry signals. One of the main challenges we face in the DML pipeline is the necessity of domain knowledge for model design (e.g., causal graph), which sometimes is either not available or hard to obtain. In this work, we take advantage of reasoning capabilities in Large Language Models (LLMs) to generate a prior model that to some extent compensates for the lack of domain knowledge and could be used as a heuristic for measuring feedback informativeness. Our LLM-based approach is able to extract previously known issues, uncover new bugs, and identify sequences of events that lead to a bug, while minimizing out-of-domain outputs.

Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities

Apr 01, 2022

Sara Abdali

Abstract: As social media platforms are evolving from text-based forums into multi-modal environments, the nature of misinformation in social media is also changing accordingly. Taking advantage of the fact that visual modalities such as images and videos are more favorable and attractive to the users, and textual contents are sometimes skimmed carelessly, misinformation spreaders have recently targeted contextual correlations between modalities, e.g., text and image. Thus, many research efforts have been put into the development of automatic techniques for detecting possible cross-modal discordances in web-based media. In this work, we aim to analyze, categorize and identify existing approaches, in addition to the challenges and shortcomings they face, in order to unearth new opportunities in furthering the research in the field of multi-modal misinformation detection.

Deepfake Representation with Multilinear Regression

Aug 15, 2021

Sara Abdali, M. Alex O. Vasilescu, Evangelos E. Papalexakis

Abstract: Generative neural network architectures such as GANs may be used to generate synthetic instances to compensate for the lack of real data. However, they may also be employed to create media that may cause social, political or economic upheaval. One emerging medium is the "Deepfake". Techniques that can discriminate between such media are indispensable. In this paper, we propose a modified multilinear (tensor) method, a combination of linear and multilinear regressions for representing fake and real data. We test our approach by representing Deepfakes with our modified multilinear (tensor) approach and perform SVM classification with encouraging results.

KNH: Multi-View Modeling with K-Nearest Hyperplanes Graph for Misinformation Detection

Feb 15, 2021

Sara Abdali, Neil Shah, Evangelos E. Papalexakis

Abstract: Graphs are one of the most efficacious structures for representing datapoints and their relations, and they have been largely exploited for different applications. Previously, the higher-order relations between the nodes have been modeled by a generalization of graphs known as hypergraphs. In hypergraphs, the edges are defined by a set of nodes, i.e., hyperedges, to demonstrate the higher-order relationships between the data. However, there is no explicit higher-order generalization for nodes themselves. In this work, we introduce a novel generalization of graphs, i.e., the K-Nearest Hyperplanes graph (KNH), where the nodes are defined by higher-order Euclidean subspaces for multi-view modeling of the nodes. In fact, in KNH, nodes are hyperplanes or, more precisely, m-flats instead of datapoints. We experimentally evaluate the KNH graph on two multi-aspect datasets for misinformation detection. The experimental results suggest that multi-view modeling of articles using the KNH graph outperforms the classic KNN graph in terms of classification performance.

* Second International TrueFact Workshop 2020: Making a Credible Web for Tomorrow

Identifying Misinformation from Website Screenshots

Feb 15, 2021

Sara Abdali, Rutuja Gurav, Siddharth Menon, Daniel Fonseca, Negin Entezari, Neil Shah, Evangelos E. Papalexakis

Abstract: Can the look and the feel of a website give information about the trustworthiness of an article? In this paper, we propose to use a promising, yet neglected aspect in detecting misinformativeness: the overall look of the domain webpage. To capture this overall look, we take screenshots of news articles served by either misinformative or trustworthy web domains and leverage a tensor decomposition based semi-supervised classification technique. The proposed approach, VizFake, is insensitive to a number of image transformations such as converting the image to grayscale, vectorizing the image and losing some parts of the screenshots. VizFake leverages a very small amount of known labels, mirroring realistic and practical scenarios, where labels (especially for known misinformative articles) are scarce and quickly become dated. The F1 score of VizFake on a dataset of 50k screenshots of news articles spanning more than 500 domains is roughly 85% using only 5% of ground truth labels. Furthermore, tensor representations of VizFake, obtained in an unsupervised manner, allow for exploratory analysis of the data that provides valuable insights into the problem. Finally, we compare VizFake with deep transfer learning, since it is a very popular black-box approach for image classification, and also with well-known text-based methods. VizFake achieves competitive accuracy with deep transfer learning models while being two orders of magnitude faster and not requiring laborious hyper-parameter tuning.

* The International AAAI Conference on Web and Social Media (ICWSM) 2021
