Andriy Miranskyy

Assessing how hyperparameters impact Large Language Models' sarcasm detection performance

Apr 08, 2025

Montgomery Gole, Andriy Miranskyy

Abstract: Sarcasm detection is challenging for both humans and machines. This work explores how model characteristics impact sarcasm detection in OpenAI's GPT and Meta's Llama-2 models, given their strong natural language understanding and popularity. We evaluate fine-tuned and zero-shot models across various sizes, releases, and hyperparameters. Experiments were conducted on the political and balanced (pol-bal) portion of the popular Self-Annotated Reddit Corpus (SARC 2.0) sarcasm dataset. Fine-tuned performance improves monotonically with model size within a model family, and hyperparameter tuning also affects performance. In the fine-tuning scenario, full-precision Llama-2-13b achieves state-of-the-art accuracy and $F_1$-score, both 0.83, comparable to average human performance. In the zero-shot setting, one GPT-4 model achieves performance competitive with prior attempts, yielding an accuracy of 0.70 and an $F_1$-score of 0.75. Furthermore, a model's performance may increase or decline with each release, highlighting the need to reassess performance after each release.
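
The abstract reports accuracy and $F_1$-score side by side; on a balanced dataset such as pol-bal the two can diverge. As a minimal Python sketch, assuming the $F_1$-score is computed for the sarcastic (positive) class, the hypothetical confusion counts below (invented for illustration, not the paper's results) show how a zero-shot classifier that over-predicts sarcasm could reach an accuracy of 0.70 with an $F_1$-score of 0.75.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all examples that are classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive (sarcastic) class: harmonic mean of precision and recall.

    Written as 2*TP / (2*TP + FP + FN); true negatives do not enter the formula.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts on a balanced set of 100 sarcastic and 100 non-sarcastic
# comments, for a classifier that labels too many comments as sarcastic.
tp, fn = 90, 10  # sarcastic comments: 90 detected, 10 missed
fp, tn = 50, 50  # non-sarcastic comments: 50 false alarms, 50 correct

print(f"accuracy = {accuracy(tp, fp, fn, tn):.2f}")  # accuracy = 0.70
print(f"F1       = {f1_score(tp, fp, fn):.2f}")      # F1       = 0.75
```

Because true negatives are absent from the $F_1$ formula, a classifier that leans towards the positive label can score higher on $F_1$ than on accuracy.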

Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset

Nov 13, 2024

Mohammad Saiful Islam, Mohamed Sami Rakha, William Pourmajidi, Janakan Sivaloganathan, John Steinbacher, Andriy Miranskyy

Abstract: As Large-Scale Cloud Systems (LCS) become increasingly complex, effective anomaly detection is critical for ensuring system reliability and performance. However, there is a shortage of large-scale, real-world datasets for benchmarking anomaly detection methods. To address this gap, we introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console. The dataset comprises 39,365 rows and 117,448 columns of telemetry data. We also demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process. This study and the accompanying dataset provide a resource for researchers and practitioners in cloud system monitoring, facilitating more efficient testing of anomaly detection methods on real-world data and helping to advance the development of robust solutions that maintain the health and performance of large-scale cloud infrastructures.

Automating Quantum Software Maintenance: Flakiness Detection and Root Cause Analysis

Oct 31, 2024

Janakan Sivaloganathan, Ainaz Jamshidi, Andriy Miranskyy, Lei Zhang

Abstract: Flaky tests, which pass or fail inconsistently without code changes, are a major challenge in software engineering in general, and in quantum software engineering in particular because of the complexity and probabilistic nature of quantum software; they lead to hidden issues and wasted developer effort. We aim to create an automated framework to detect flaky tests in quantum software and an extended dataset of quantum flaky tests, overcoming the limitations of manual methods. Building on prior manual analysis of 14 quantum software repositories, we expanded the dataset and automated flaky test detection using transformers and cosine similarity. We conducted experiments with Large Language Models (LLMs) from the OpenAI GPT and Meta LLaMA families to assess their ability to detect and classify flaky tests from code and issue descriptions. Embedding transformers proved effective: we identified 25 new flaky tests, expanding the dataset by 54%. Top LLMs achieved an F1-score of 0.8871 for flakiness detection but only 0.5839 for root cause identification. We introduced an automated flaky test detection framework using machine learning, showing promising results but highlighting the need for improved root cause detection and classification in large quantum codebases. Future work will focus on improving detection techniques and developing automatic flaky test fixes.

* 5 pages, 1 figure
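
The abstract above mentions automating flaky test detection with transformers and cosine similarity. A minimal sketch of the similarity step, assuming items have already been embedded by some transformer encoder: an unlabeled issue is flagged as a flaky-test candidate when its closest known flaky example meets a similarity threshold. The function names, the 0.8 threshold, the issue IDs, and the toy three-dimensional vectors are hypothetical; the paper's actual pipeline and threshold are not given here.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_candidates(
    known_flaky: List[Sequence[float]],
    unlabeled: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> List[str]:
    """IDs of unlabeled items whose best match to a known flaky test meets the threshold."""
    flagged = []
    for item_id, vec in unlabeled.items():
        best = max(cosine_similarity(vec, ref) for ref in known_flaky)
        if best >= threshold:
            flagged.append(item_id)
    return flagged


# Toy 3-dimensional "embeddings"; a real pipeline would use a transformer encoder.
known = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
issues = {
    "issue-101": [0.85, 0.15, 0.05],  # close to the known flaky examples
    "issue-102": [0.0, 0.1, 0.95],    # unrelated topic
}
print(flag_candidates(known, issues))  # ['issue-101']
```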

On Using Quasirandom Sequences in Machine Learning for Model Weight Initialization

Aug 05, 2024

Andriy Miranskyy, Adam Sorrenti, Viral Thakar

Abstract: The effectiveness of training neural networks directly impacts computational costs, resource allocation, and model development timelines in machine learning applications. An optimizer's ability to train the model adequately (in terms of trained model performance) depends on the model's initial weights. Model weight initialization schemes use pseudorandom number generators (PRNGs) as a source of randomness. We investigate whether replacing PRNGs with low-discrepancy quasirandom number generators (QRNGs), namely Sobol' sequences, as the source of randomness for initializers can improve model performance. We examine Multi-Layer Perceptrons (MLP), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Transformer architectures trained on MNIST, CIFAR-10, and IMDB datasets using SGD and Adam optimizers. Our analysis uses ten initialization schemes: Glorot, He, and LeCun (each in Uniform and Normal variants), plus Orthogonal, Random Normal, Truncated Normal, and Random Uniform. Models with weights set using PRNG- and QRNG-based initializers are compared pairwise for each combination of dataset, architecture, optimizer, and initialization scheme. Our findings indicate that QRNG-based neural network initializers either reach a higher accuracy or achieve the same accuracy more quickly than PRNG-based initializers in 60% of the 120 experiments conducted. Thus, using QRNG-based initializers instead of PRNG-based initializers can speed up and improve model training.
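
To illustrate swapping a quasirandom source into an initializer, here is a stdlib-only Python sketch of a Glorot uniform initializer whose uniform draws come from the base-2 van der Corput sequence. The first dimension of the Sobol' sequence is this van der Corput sequence (up to the Gray-code ordering some generators use); a full multi-dimensional Sobol' generator needs direction numbers and is available in libraries such as `scipy.stats.qmc.Sobol`. The function names and the choice to fill the matrix row by row from a single 1-D stream are assumptions for illustration, not the paper's procedure.

```python
import math
from typing import List


def van_der_corput(n: int, base: int = 2) -> float:
    """n-th van der Corput point: mirror the base-`base` digits of n about the radix point."""
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value


def glorot_uniform_quasirandom(fan_in: int, fan_out: int, skip: int = 1) -> List[List[float]]:
    """Glorot (Xavier) uniform weights on [-limit, limit) drawn from a 1-D low-discrepancy stream.

    limit = sqrt(6 / (fan_in + fan_out)), the standard Glorot uniform bound.
    `skip` drops leading points (index 0 maps to 0.0, i.e. exactly -limit).
    """
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    weights: List[List[float]] = []
    index = skip
    for _ in range(fan_in):
        row = []
        for _ in range(fan_out):
            u = van_der_corput(index)             # quasirandom value in [0, 1)
            row.append(-limit + 2.0 * limit * u)  # affine map of [0, 1) onto [-limit, limit)
            index += 1
        weights.append(row)
    return weights


w = glorot_uniform_quasirandom(fan_in=4, fan_out=3)
for row in w:
    print([round(x, 3) for x in row])
```

Consecutive van der Corput points alternate between the lower and upper halves of [0, 1), so neighbouring weights are strongly correlated; how quasirandom points are assigned to individual weights is a design choice a real implementation would need to make deliberately.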

On Sarcasm Detection with OpenAI GPT-based Models

Dec 07, 2023

Montgomery Gole, Williams-Paul Nwadiugwu, Andriy Miranskyy

Abstract: Sarcasm is a form of irony that requires readers or listeners to interpret its intended meaning by considering context and social cues. Machine learning classification models have long had difficulty detecting sarcasm due to its social complexity and contradictory nature. This paper explores the application of Generative Pretrained Transformer (GPT) models, including GPT-3, InstructGPT, GPT-3.5, and GPT-4, to detecting sarcasm in natural language. It tests fine-tuned and zero-shot models of different sizes and releases. The GPT models were tested on the political and balanced (pol-bal) portion of the popular Self-Annotated Reddit Corpus (SARC 2.0) sarcasm dataset. In the fine-tuning case, the largest fine-tuned GPT-3 model achieves an accuracy and $F_1$-score of 0.81, outperforming prior models. In the zero-shot case, one of the GPT-4 models yields an accuracy of 0.70 and an $F_1$-score of 0.75. Other models score lower. Additionally, a model's performance may improve or deteriorate with each release, highlighting the need to reassess performance after each release.

EP-PQM: Efficient Parametric Probabilistic Quantum Memory with Fewer Qubits and Gates

Jan 10, 2022

Mushahid Khan, Jean Paul Latyr Faye, Udson C. Mendes, Andriy Miranskyy

Abstract: Machine learning (ML) classification tasks can be carried out on a quantum computer (QC) using Probabilistic Quantum Memory (PQM) and its extension, Parametric PQM (P-PQM), by calculating the Hamming distance between an input pattern and a database of $r$ patterns containing $z$ features with $a$ distinct attributes. For accurate computations, the features must be encoded using one-hot encoding, which is memory-intensive for multi-attribute datasets with $a>2$. On a classical computer, we can easily represent multi-attribute data more compactly by replacing one-hot encoding with label encoding. However, making this substitution on a QC is not straightforward, as PQM and P-PQM operate at the quantum bit level. We present an enhanced P-PQM, called EP-PQM, that allows label encoding of data stored in a PQM data structure and reduces the circuit depth of the data storage and retrieval procedures. We show implementations for an ideal QC and a noisy intermediate-scale quantum (NISQ) device. Our complexity analysis shows that the EP-PQM approach requires $O\left(z \log_2(a)\right)$ qubits, as opposed to $O(za)$ qubits for P-PQM. EP-PQM also requires fewer gates, reducing the gate count from $O\left(rza\right)$ to $O\left(rz\log_2(a)\right)$. For five datasets, we demonstrate that training an ML classification model using EP-PQM requires 48% to 77% fewer qubits than P-PQM for datasets with $a>2$. EP-PQM reduces circuit depth by 60% to 96%, depending on the dataset. With a decomposed circuit, the depth reduction is even larger, ranging from 94% to 99%. Because EP-PQM requires less space, it can train on and classify larger datasets than previous PQM implementations on NISQ devices. Furthermore, reducing the number of gates speeds up classification and reduces the noise associated with deep quantum circuits. Thus, EP-PQM brings us closer to scalable ML on a NISQ device.
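
The qubit savings come from the width of each stored feature: one-hot encoding spends $a$ bits per feature, label encoding $\lceil \log_2 a \rceil$. The Python sketch below counts only the bits of one stored pattern's feature register (ignoring any ancilla or control qubits the PQM circuits also use), so it illustrates the $O(za)$ versus $O(z \log_2(a))$ scaling rather than the paper's exact qubit totals. The function names and the `encoding` argument are hypothetical.

```python
from typing import List


def bits_per_value(a: int) -> int:
    """Width of a binary label code for values 0..a-1, i.e. ceil(log2 a), at least 1."""
    return max(1, (a - 1).bit_length())


def one_hot_bits(value: int, a: int) -> List[int]:
    """One-hot code: a bits with exactly one bit set."""
    return [1 if i == value else 0 for i in range(a)]


def label_bits(value: int, a: int) -> List[int]:
    """Label code: the value's fixed-width binary representation, most significant bit first."""
    width = bits_per_value(a)
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def pattern_register_width(z: int, a: int, encoding: str) -> int:
    """Bits (one qubit each) to hold one pattern of z features, each with a attribute values."""
    if encoding == "one-hot":
        return z * a
    if encoding == "label":
        return z * bits_per_value(a)
    raise ValueError(f"unknown encoding: {encoding}")


z, a = 4, 5
print(one_hot_bits(3, a))                       # [0, 0, 0, 1, 0]
print(label_bits(3, a))                         # [0, 1, 1]
print(pattern_register_width(z, a, "one-hot"))  # 20
print(pattern_register_width(z, a, "label"))    # 12
```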

Term Interrelations and Trends in Software Engineering

Aug 21, 2021

Janusan Baskararajah, Lei Zhang, Andriy Miranskyy

Abstract: The Software Engineering (SE) community is prolific, making it challenging for experts to keep up with the flood of new papers and for neophytes to enter the field. Therefore, we posit that the community may benefit from a tool that extracts terms and their interrelations from the SE community's text corpus and shows terms' trends. In this paper, we build a prototype tool using word embeddings. We train the embeddings on the SE Body of Knowledge handbook and the titles and abstracts of 15,233 research papers. We also create the test cases necessary to validate the training of the embeddings. We provide representative examples showing that the embeddings may aid in summarizing terms and uncovering trends in the knowledge base.

* In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021), pp. 1471-1474
* Shortened version appeared in proceedings of ESEC/FSE 2021
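
Term interrelations in a word-embedding model are commonly read off as nearest neighbours under cosine similarity. A minimal Python sketch, assuming embeddings have already been trained: the helper names and the toy three-dimensional vectors for four SE terms are hypothetical and do not come from the paper's model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def nearest_terms(
    query: str, embeddings: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k terms most similar to `query` (excluding itself), best first."""
    q = embeddings[query]
    scored = [(term, cosine(q, vec)) for term, vec in embeddings.items() if term != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for trained embeddings of SE terms.
toy = {
    "unit testing": [0.9, 0.3, 0.1],
    "regression testing": [0.8, 0.4, 0.1],
    "test automation": [0.7, 0.2, 0.4],
    "requirements elicitation": [0.1, 0.2, 0.95],
}
print([term for term, _ in nearest_terms("unit testing", toy)])
# ['regression testing', 'test automation']
```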

On Automatic Parsing of Log Records

Feb 12, 2021

Jared Rand, Andriy Miranskyy

Abstract: Software log analysis helps to maintain the health of software solutions and ensure compliance and security. Existing software systems consist of heterogeneous components emitting logs in various formats. A typical solution is to unify the logs using manually built parsers, which is laborious. Instead, we explore automating the parsing task by employing machine translation (MT). We create a tool that generates synthetic Apache log records, which we use to train recurrent-neural-network-based MT models. Evaluation of the models on real-world logs shows that they can learn the Apache log format and parse individual log records. The median relative edit distance between an actual real-world log record and the MT prediction is at most 28%. Thus, we show that log parsing using an MT approach is promising.

* Shortened version accepted for publication in Proceedings of the 43rd International Conference on Software Engineering: New Ideas and Emerging Results, 2021
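
The abstract does not define its relative edit distance; a common choice, assumed here, is the Levenshtein distance divided by the length of the reference (actual) record, so 0 means an exact match. The Python sketch below implements that assumption with the standard two-row dynamic program; the example strings are hypothetical fragments, not records from the paper's evaluation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def relative_edit_distance(actual: str, predicted: str) -> float:
    """Levenshtein distance normalised by the length of the actual (reference) record."""
    if not actual:
        raise ValueError("actual record must be non-empty")
    return levenshtein(actual, predicted) / len(actual)


actual = "GET /index.html HTTP/1.1"    # hypothetical reference fragment
predicted = "GET /index.htm HTTP/1.0"  # hypothetical model output
print(f"{relative_edit_distance(actual, predicted):.3f}")  # 0.083
```

Normalising by the reference length keeps scores comparable across records of different lengths; dividing by the longer of the two strings instead would bound the score at 1.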

Anomaly Detection in a Large-scale Cloud Platform

Oct 21, 2020

Mohammad Saiful Islam, William Pourmajidi, Lei Zhang, John Steinbacher, Tony Erwin, Andriy Miranskyy

Abstract: Cloud computing is ubiquitous: more and more companies are moving their workloads into the Cloud. However, this rise in popularity challenges Cloud service providers, as they need to monitor the quality of their ever-growing offerings effectively. To address the challenge, we designed and implemented an automated monitoring system for the IBM Cloud Platform. This monitoring system uses deep learning neural networks to detect anomalies in near-real time in multiple Platform components simultaneously. After running the system for a year, we observed that the proposed solution frees the DevOps team's time and human resources from manually monitoring thousands of Cloud components. Moreover, it increases customer satisfaction by reducing the risk of Cloud outages. In this paper, we share our solution's architecture, implementation notes, and best practices that emerged while evolving the monitoring system. Other researchers and practitioners can leverage them to build anomaly detectors for complex systems.

Anomaly Detection in Cloud Components

May 18, 2020

Mohammad Saiful Islam, Andriy Miranskyy

Abstract: Cloud platforms, under the hood, consist of a complex, interconnected stack of hardware and software components. Each of these components can fail, which may lead to an outage. Our goal is to improve the quality of Cloud services through early detection of such failures by analyzing resource utilization metrics. We tested a Gated-Recurrent-Unit-based autoencoder with a likelihood function to detect anomalies in various multi-dimensional time series and achieved high performance.

* Accepted for publication in Proceedings of the IEEE International Conference on Cloud Computing (CLOUD 2020)
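
The GRU autoencoder itself needs a deep learning framework, but the likelihood step that turns reconstruction errors into anomaly flags can be sketched with the standard library. This sketch assumes a trained autoencoder has already produced one scalar reconstruction error per time step, fits a univariate normal distribution to errors from normal operation, and flags points whose log-likelihood falls below that of a 3-sigma deviation. The helper names, the 3-sigma cutoff, and the error values are hypothetical; the paper's likelihood function may differ (for example, a multivariate model over error vectors).

```python
import math
import statistics
from typing import List, Sequence, Tuple


def fit_gaussian(errors: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of reconstruction errors from normal operation."""
    return statistics.mean(errors), statistics.stdev(errors)


def log_likelihood(x: float, mu: float, sigma: float) -> float:
    """Log-density of x under the normal distribution N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def flag_anomalies(
    errors: Sequence[float], mu: float, sigma: float, min_log_lik: float
) -> List[int]:
    """Indices of errors whose log-likelihood falls below `min_log_lik`."""
    return [i for i, e in enumerate(errors) if log_likelihood(e, mu, sigma) < min_log_lik]


# Hypothetical per-time-step reconstruction errors from a trained autoencoder.
train_errors = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.11]
mu, sigma = fit_gaussian(train_errors)
threshold = log_likelihood(mu + 3 * sigma, mu, sigma)  # likelihood at a 3-sigma deviation

new_errors = [0.11, 0.10, 0.45, 0.12]
print(flag_anomalies(new_errors, mu, sigma, threshold))  # [2]
```

Because the normal density is symmetric, this criterion also flags unusually small errors; a one-sided check on large errors is an equally reasonable design choice.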
