Peter van der Putten

Agentic Large Language Models, a survey

Mar 29, 2025

Aske Plaat, Max van Duijn, Niki van Stein, Mike Preuss, Peter van der Putten, Kees Joost Batenburg

Abstract:There is great interest in agentic LLMs, large language models that act as agents. We review the growing body of work in this area and provide a research agenda. Agentic LLMs are LLMs that (1) reason, (2) act, and (3) interact. We organize the literature according to these three categories. The research in the first category focuses on reasoning, reflection, and retrieval, aiming to improve decision making; the second category focuses on action models, robots, and tools, aiming for agents that act as useful assistants; the third category focuses on multi-agent systems, aiming for collaborative task solving and simulating interaction to study emergent social behavior. We find that works mutually benefit from results in other categories: retrieval enables tool use, reflection improves multi-agent collaboration, and reasoning benefits all categories. We discuss applications of agentic LLMs and provide an agenda for further research. Important applications are in medical diagnosis, logistics and financial market analysis. Meanwhile, self-reflective agents playing roles and interacting with one another augment the process of scientific research itself. Further, agentic LLMs may provide a solution for the problem of LLMs running out of training data: inference-time behavior generates new training states, such that LLMs can keep learning without needing ever larger datasets. We note that there is risk associated with LLM assistants taking action in the real world, while agentic LLMs are also likely to benefit society.

The Impacts of AI Avatar Appearance and Disclosure on User Motivation

Jul 31, 2024

Boele Visser, Peter van der Putten, Amirhossein Zohrehvand

Abstract:This study examines the influence of perceived AI features on user motivation in virtual interactions. In user-AI interactions, an avatar may be disclosed as an AI and may embody a specific gender. Leveraging insights from AI and avatar research, we explore how AI disclosure and gender affect user motivation. We conducted a game-based experiment involving over 72,500 participants who solved search problems alone or with an AI companion. Different groups experienced varying AI appearances and disclosures. We measured play intensity. Results revealed that the presence of another avatar led to less intense play compared to solo play. Disclosure of the avatar as AI heightened effort intensity compared to non-disclosed AI companions. Additionally, a masculine AI appearance reduced effort intensity.

* 15 pages, 6 figures, submitted to the 2nd International Conference on Data Science & Artificial Intelligence

ChatGPT as a commenter to the news: can LLMs generate human-like opinions?

Dec 21, 2023

Rayden Tseng, Suzan Verberne, Peter van der Putten

Abstract:ChatGPT, GPT-3.5, and other large language models (LLMs) have drawn significant attention since their release, and the abilities of these models have been investigated for a wide variety of tasks. In this research we investigate to what extent GPT-3.5 can generate human-like comments on Dutch news articles. We define human likeness as 'not distinguishable from human comments', approximated by the difficulty of automatic classification between human and GPT comments. We analyze human likeness across multiple prompting techniques. In particular, we utilize zero-shot, few-shot and context prompts, for two generated personas. We found that our fine-tuned BERT models can easily distinguish human-written comments from GPT-3.5 generated comments, with none of the used prompting methods performing noticeably better. Further analysis showed that human comments consistently had higher lexical diversity than GPT-generated comments. This indicates that although generative LLMs can generate fluent text, their capability to create human-like opinionated comments is still limited.

* Published as Tseng, R., Verberne, S., van der Putten, P. (2023). ChatGPT as a Commenter to the News: Can LLMs Generate Human-Like Opinions?. In: Ceolin, D., Caselli, T., Tulin, M. (eds) Disinformation in Open Online Media. MISDOOM 2023. Lecture Notes in Computer Science, vol 14397. Springer, Cham
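
The abstract above compares the lexical diversity of human and GPT-generated comments but does not say which measure was used. As a hedged illustration only, the sketch below computes the type-token ratio (distinct words divided by total words), one common diversity measure. The function name `type_token_ratio` and the simple regex tokenizer are assumptions made for this example, not the authors' method.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct tokens / total tokens for a piece of text.

    Illustrative only: the paper does not state which lexical
    diversity measure it used; this is one common choice.
    """
    # Lowercase and split into word tokens (a deliberately simple tokenizer).
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    # Hypothetical comments, invented for this example.
    varied = "Interesting piece, though the council ignores rising rents."
    repetitive = "This is good and this is good and this is good."
    print(type_token_ratio(varied), type_token_ratio(repetitive))
```

The raw type-token ratio is sensitive to text length, so a real comparison would typically use comments of similar length or a length-corrected variant.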

Relevance feedback strategies for recall-oriented neural information retrieval

Nov 25, 2023

Timo Kats, Peter van der Putten, Jan Scholtes

Abstract:In a number of information retrieval applications (e.g., patent search, literature review, due diligence), preventing false negatives is more important than preventing false positives. However, approaches designed to reduce review effort (like "technology assisted review") can create false negatives, since they are often based on active learning systems that exclude documents automatically based on user feedback. Therefore, this research proposes a more recall-oriented approach to reducing review effort. More specifically, it iteratively re-ranks the relevance rankings based on user feedback, a technique also referred to as relevance feedback. In our proposed method, the relevance rankings are produced by a BERT-based dense-vector search and the relevance feedback is based on cumulatively summing the queried and selected embeddings. Our results show that this method can reduce review effort by between 17.85% and 59.04%, compared to a baseline approach (of no feedback), given a fixed recall target.

* Preproceedings Benelux Conference for Artificial Intelligence (BNAIC/BENELEARN 2023), Delft, November 8-10, 2023
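
The relevance feedback step described in the abstract above (cumulatively summing the query embedding and the embeddings of selected documents, then re-ranking) can be sketched as follows. This is a minimal sketch under stated assumptions: the vectors are tiny hand-written stand-ins for BERT embeddings, cosine similarity is assumed as the ranking score, and the function names `cosine`, `rank`, and `apply_feedback` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec, doc_vecs, exclude=()):
    """Return document indices sorted by similarity to the query.

    Documents in `exclude` (already reviewed) are left out of the
    ranking; nothing is discarded from the collection itself.
    """
    candidates = [i for i in range(len(doc_vecs)) if i not in exclude]
    return sorted(candidates,
                  key=lambda i: cosine(query_vec, doc_vecs[i]),
                  reverse=True)


def apply_feedback(query_vec, selected_vec):
    """Cumulatively add a selected document's embedding to the query."""
    return [q + s for q, s in zip(query_vec, selected_vec)]


if __name__ == "__main__":
    # Toy 2-d vectors standing in for dense embeddings (assumption).
    docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
    query = [1.0, 0.0]
    print(rank(query, docs))
    # The reviewer marks document 2 as relevant; fold it into the query.
    query = apply_feedback(query, docs[2])
    print(rank(query, docs, exclude={2}))
```

Because unreviewed documents are only reordered and never excluded automatically, this style of feedback cannot introduce false negatives on its own, which matches the recall-oriented motivation given in the abstract.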

Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests

Oct 31, 2023

Max J. van Duijn, Bram M. A. van Dijk, Tom Kouwenhoven, Werner de Valk, Marco R. Spruit, Peter van der Putten

Abstract:To what degree should we ascribe cognitive capacities to Large Language Models (LLMs), such as the ability to reason about intentions and beliefs known as Theory of Mind (ToM)? Here we add to this emerging debate by (i) testing 11 base- and instruction-tuned LLMs on capabilities relevant to ToM beyond the dominant false-belief paradigm, including non-literal language usage and recursive intentionality; (ii) using newly rewritten versions of standardized tests to gauge LLMs' robustness; (iii) prompting and scoring for open besides closed questions; and (iv) benchmarking LLM performance against that of children aged 7-10 on the same tasks. We find that instruction-tuned LLMs from the GPT family outperform other models, and often also children. Base-LLMs are mostly unable to solve ToM tasks, even with specialized prompting. We suggest that the interlinked evolution and development of language and ToM may help explain what instruction-tuning adds: rewarding cooperative communication that takes into account interlocutor and context. We conclude by arguing for a nuanced perspective on ToM in LLMs.

* 14 pages, 4 figures, Forthcoming in Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

The COVMis-Stance dataset: Stance Detection on Twitter for COVID-19 Misinformation

Apr 05, 2022

Yanfang Hou, Peter van der Putten, Suzan Verberne

Abstract:During the COVID-19 pandemic, large amounts of COVID-19 misinformation are spreading on social media. We are interested in the stance of Twitter users towards COVID-19 misinformation. However, due to the relatively recent nature of the pandemic, only a few stance detection datasets fit our task. We have constructed a new stance dataset consisting of 2631 tweets annotated with the stance towards COVID-19 misinformation. In contexts with limited labeled data, we fine-tune our models by leveraging the MNLI dataset and two existing stance detection datasets (RumourEval and COVIDLies), and evaluate the model performance on our dataset. Our experimental results show that the model performs the best when fine-tuned sequentially on the MNLI dataset and the combination of the undersampled RumourEval and COVIDLies datasets. Our code and dataset are publicly available at https://github.com/yanfangh/covid-rumor-stance

Distinguishing Commercial from Editorial Content in News

Nov 06, 2021

Timo Kats, Peter van der Putten, Jasper Schelling

Abstract:How can we distinguish commercial from editorial content in news, or more specifically, differentiate between advertorials and regular news articles? An advertorial is a commercial message written and formatted as an article, making it harder for readers to recognize it as advertising, despite the use of disclaimers. In our research we aim to differentiate the two using a machine learning model, and a lexicon derived from it. This was accomplished by scraping 1,000 articles and 1,000 advertorials from four different Dutch news sources and classifying these based on textual features. With this setup our most successful machine learning model had an accuracy of just over 90%. To generate additional insights into differences between news and advertorial language, we also analyzed model coefficients and explored the corpus through co-occurrence networks and t-SNE graphs.

* 33rd Benelux Conference on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine Learning (BNAIC/BENELEARN 2021), Luxembourg, November 10-12, 2021

Spot What Matters: Learning Context Using Graph Convolutional Networks for Weakly-Supervised Action Detection

Jul 28, 2021

Michail Tsiaousis, Gertjan Burghouts, Fieke Hillerström, Peter van der Putten

Abstract:The dominant paradigm in spatiotemporal action detection is to classify actions using spatiotemporal features learned by 2D or 3D Convolutional Networks. We argue that several actions are characterized by their context, such as relevant objects and actors present in the video. To this end, we introduce an architecture based on self-attention and Graph Convolutional Networks in order to model contextual cues, such as actor-actor and actor-object interactions, to improve human action detection in video. We are interested in achieving this in a weakly-supervised setting, i.e. using as few annotations as possible in terms of action bounding boxes. Our model aids explainability by visualizing the learned context as an attention map, even for actions and objects unseen during training. We evaluate how well our model highlights the relevant context by introducing a quantitative metric based on recall of objects retrieved by attention maps. Our model relies on a 3D convolutional RGB stream, and does not require expensive optical flow computation. We evaluate our models on the DALY dataset, which consists of human-object interaction actions. Experimental results show that our contextualized approach outperforms a baseline action detection approach by more than 2 points in Video-mAP. Code is available at https://github.com/micts/acgcn

* International Workshop on Deep Learning for Human-Centric Activity Understanding (DL-HAU2020), January 11, 2021

Sign and Search: Sign Search Functionality for Sign Language Lexica

Jul 28, 2021

Manolis Fragkiadakis, Peter van der Putten

Abstract:Sign language lexica are a useful resource for researchers and people learning sign languages. Current implementations allow a user to search for a sign either by its gloss or by selecting its primary features such as handshape and location. This study focuses on exploring a reverse search functionality where a user can sign a query sign in front of a webcam and retrieve a set of matching signs. By extracting different body joint combinations (upper body, dominant hand's arm and wrist) using the pose estimation framework OpenPose, we compare four techniques (PCA, UMAP, DTW and Euclidean distance) as distance metrics between 20 query signs, each performed by eight participants on a 1200 sign lexicon. The results show that UMAP and DTW can predict a matching sign with an 80% and 71% accuracy respectively at the top-20 retrieved signs using the movement of the dominant hand's arm. Using DTW and adding more sign instances from other participants in the lexicon, the accuracy can be raised to 90% at the top-10 ranking. Our results suggest that our methodology can be used with no training in any sign language lexicon regardless of its size.

* Accepted for the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (ATS4SSL), August 20, 2021
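
Of the four techniques named in the abstract above, dynamic time warping (DTW) is the easiest to illustrate in isolation. The sketch below is a textbook DTW over sequences of 2-d points, followed by a top-k lookup in a small lexicon. It assumes each sign is already reduced to a sequence of (x, y) keypoint positions, for example a wrist trajectory; the pose extraction step is not shown, and the names `dtw_distance`, `top_k`, and the toy lexicon are hypothetical rather than taken from the paper.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping cost between two point sequences.

    Each sequence is a list of (x, y) tuples; the local cost is the
    Euclidean distance between aligned points.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in seq_a only
                                     cost[i][j - 1],      # step in seq_b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def top_k(query, lexicon, k):
    """Return the k lexicon entries with the smallest DTW cost to the query."""
    return sorted(lexicon, key=lambda name: dtw_distance(query, lexicon[name]))[:k]


if __name__ == "__main__":
    # Toy trajectories, invented for this example.
    lexicon = {
        "sign_a": [(0, 0), (1, 1)],
        "sign_b": [(5, 5), (6, 6)],
    }
    query = [(0, 0), (0, 0), (1, 1)]  # same path as sign_a, performed more slowly
    print(top_k(query, lexicon, k=1))
```

DTW tolerates differences in signing speed because it may align one frame with several frames of the other sequence, and it needs no training, which is consistent with the abstract's claim that the approach can be applied to a lexicon without training.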
