Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tomas Cerny

Generative AI for Software Architecture. Applications, Trends, Challenges, and Future Directions

Mar 17, 2025

Matteo Esposito, Xiaozhou Li, Sergio Moreschini, Noman Ahmad, Tomas Cerny, Karthik Vaidhyanathan, Valentina Lenarduzzi, Davide Taibi

Abstract:Context: Generative Artificial Intelligence (GenAI) is transforming much of software development, yet its application in software architecture is still in its infancy, and no prior study has systematically addressed the topic. Aim: We aim to systematically synthesize the use, rationale, contexts, usability, and future challenges of GenAI in software architecture. Method: We performed a multivocal literature review (MLR), analyzing peer-reviewed and gray literature, identifying current practices, models, adoption contexts, and reported challenges, extracting themes via open coding. Results: Our review identified significant adoption of GenAI for architectural decision support and architectural reconstruction. OpenAI GPT models are predominantly applied, and there is consistent use of techniques such as few-shot prompting and retrieved-augmented generation (RAG). GenAI has been applied mostly to initial stages of the Software Development Life Cycle (SDLC), such as Requirements-to-Architecture and Architecture-to-Code. Monolithic and microservice architectures were the dominant targets. However, rigorous testing of GenAI outputs was typically missing from the studies. Among the most frequent challenges are model precision, hallucinations, ethical aspects, privacy issues, lack of architecture-specific datasets, and the absence of sound evaluation frameworks. Conclusions: GenAI shows significant potential in software design, but several challenges remain on its path to greater adoption. Research efforts should target designing general evaluation methodologies, handling ethics and precision, increasing transparency and explainability, and promoting architecture-specific datasets and benchmarks to bridge the gap between theoretical possibilities and practical use.

Via

Access Paper or Ask Questions

Bottleneck-based Encoder-decoder ARchitecture (BEAR) for Learning Unbiased Consumer-to-Consumer Image Representations

Sep 10, 2024

Pablo Rivas, Gisela Bichler, Tomas Cerny, Laurie Giddens, Stacie Petter

Figure 1 for Bottleneck-based Encoder-decoder ARchitecture (BEAR) for Learning Unbiased Consumer-to-Consumer Image Representations

Figure 2 for Bottleneck-based Encoder-decoder ARchitecture (BEAR) for Learning Unbiased Consumer-to-Consumer Image Representations

Figure 3 for Bottleneck-based Encoder-decoder ARchitecture (BEAR) for Learning Unbiased Consumer-to-Consumer Image Representations

Figure 4 for Bottleneck-based Encoder-decoder ARchitecture (BEAR) for Learning Unbiased Consumer-to-Consumer Image Representations

Abstract:Unbiased representation learning is still an object of study under specific applications and contexts. Novel architectures are usually crafted to resolve particular problems using mixtures of fundamental pieces. This paper presents different image feature extraction mechanisms that work together with residual connections to encode perceptual image information in an autoencoder configuration. We use image data that aims to support a larger research agenda dealing with issues regarding criminal activity in consumer-to-consumer online platforms. Preliminary results suggest that the proposed architecture can learn rich spaces using ours and other image datasets resolving important challenges that are identified.

* 2022 LXAI Workshop at the 39th International Conference on Machine Learning (ICML), Baltimore, Maryland

Via

Access Paper or Ask Questions

Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach

May 30, 2024

Ernesto Quevedo, Jorge Yero, Rachel Koerner, Pablo Rivas, Tomas Cerny

Figure 1 for Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach

Figure 2 for Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach

Figure 3 for Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach

Figure 4 for Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach

Abstract:Concerns regarding the propensity of Large Language Models (LLMs) to produce inaccurate outputs, also known as hallucinations, have escalated. Detecting them is vital for ensuring the reliability of applications relying on LLM-generated content. Current methods often demand substantial resources and rely on extensive LLMs or employ supervised learning with multidimensional features or intricate linguistic and semantic analyses difficult to reproduce and largely depend on using the same LLM that hallucinated. This paper introduces a supervised learning approach employing two simple classifiers utilizing only four numerical features derived from tokens and vocabulary probabilities obtained from other LLM evaluators, which are not necessarily the same. The method yields promising results, surpassing state-of-the-art outcomes in multiple tasks across three different benchmarks. Additionally, we provide a comprehensive examination of the strengths and weaknesses of our approach, highlighting the significance of the features utilized and the LLM employed as an evaluator. We have released our code publicly at https://github.com/Baylor-AI/HalluDetect.

* ICAI'24 - The 26th Int'l Conf on Artificial Intelligence

Via

Access Paper or Ask Questions

On the Challenges of Creating Datasets for Analyzing Commercial Sex Advertisements to Assess Human Trafficking Risk and Organized Activity

May 22, 2024

Pablo Rivas, Tomas Cerny, Alejandro Rodriguez Perez, Javier Turek, Laurie Giddens, Gisela Bichler, Stacie Petter

Abstract:Our study addresses the challenges of building datasets to understand the risks associated with organized activities and human trafficking through commercial sex advertisements. These challenges include data scarcity, rapid obsolescence, and privacy concerns. Traditional approaches, which are not automated and are difficult to reproduce, fall short in addressing these issues. We have developed a reproducible and automated methodology to analyze five million advertisements. In the process, we identified further challenges in dataset creation within this sensitive domain. This paper presents a streamlined methodology to assist researchers in constructing effective datasets for combating organized crime, allowing them to focus on advancing detection technologies.

* LXAI Workshop at the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)

Via

Access Paper or Ask Questions