Joy Bose

A Hybrid Framework for Real-Time Data Drift and Anomaly Identification Using Hierarchical Temporal Memory and Statistical Tests

Apr 24, 2025

Subhadip Bandyopadhyay, Joy Bose, Sujoy Roy Chowdhury

Abstract: Data drift is the phenomenon where the generating model behind the data changes over time. Due to data drift, any model built on past training data becomes less relevant and less accurate over time. Thus, detecting and controlling for data drift is critical in machine learning models. Hierarchical Temporal Memory (HTM) is a machine learning model developed by Jeff Hawkins, inspired by how the human brain processes information. It is a biologically inspired model of memory that is similar in structure to the neocortex, and whose performance is claimed to be comparable to state-of-the-art models in detecting anomalies in time series data. Another benefit of HTMs is their independence from the training and testing cycle; all learning takes place online with streaming data, and no separate training and testing cycle is required. In the sequential learning paradigm, the Sequential Probability Ratio Test (SPRT) offers some unique benefits for online learning and inference. This paper proposes a novel hybrid framework combining HTM and SPRT for real-time data drift detection and anomaly identification. Unlike existing data drift methods, our approach eliminates frequent retraining and ensures low false positive rates. HTMs currently work with one-dimensional (univariate) data. In a second study, we also propose an application of HTM to a multidimensional supervised scenario for anomaly detection, combining the outputs of multiple HTM columns, one for each dimension of the data, through a neural network. Experimental evaluations demonstrate that the proposed method outperforms conventional drift detection techniques such as the Kolmogorov-Smirnov (KS) test, Wasserstein distance, and Population Stability Index (PSI) in terms of accuracy, adaptability, and computational efficiency. Our experiments also provide insights into optimizing hyperparameters for real-time deployment in domains such as telecom.

* International Journal of Mathematical, Engineering and Management Sciences, Vol. 10, No. 3, 777-796, 2025
* 26 pages, 9 figures
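
The abstract above names the Sequential Probability Ratio Test but does not say how it is wired to HTM. As a rough illustration only, the sketch below shows Wald's classical SPRT deciding between two Gaussian means on a stream of values. The function name, the choice of a Gaussian model with known `sigma`, and the idea of feeding it a stream of scores are all assumptions made for this example, not the authors' framework.

```python
import math


def sprt_gaussian_mean(samples, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean == mu0 versus H1: mean == mu1 (known sigma).

    alpha is the target false-positive rate, beta the target false-negative
    rate. Returns (decision, number_of_samples_consumed).
    """
    upper = math.log((1 - beta) / alpha)   # cross this: accept H1 ("drift")
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0 ("no_drift")
    llr = 0.0                              # cumulative log-likelihood ratio
    n = 0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio increment for one Gaussian observation.
        llr += (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "drift", n
        if llr <= lower:
            return "no_drift", n
    return "undecided", n


# A stream sitting at the shifted mean is flagged after six samples.
print(sprt_gaussian_mean([1.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0))  # ('drift', 6)
```

The test consumes one observation at a time and stops as soon as the evidence crosses either threshold, which is why it suits streaming data.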

Static Program Analysis Guided LLM Based Unit Test Generation

Mar 07, 2025

Sujoy Roychowdhury, Giriprasad Sridhara, A K Raghavan, Joy Bose, Sourav Mazumdar, Hamender Singh, Srinivasan Bajji Sugumaran, Ricardo Britto

Abstract: We describe a novel approach to automating unit test generation for Java methods using large language models (LLMs). Existing LLM-based approaches rely on sample usage(s) of the method to test (the focal method) and/or provide the entire class of the focal method as input prompt and context. The former approach is often not viable due to the lack of sample usages, especially for newly written focal methods. The latter approach does not scale well enough; the greater the complexity of the focal method and the larger the associated class, the harder it is to produce adequate test code (due to factors such as exceeding the prompt and context lengths of the underlying LLM). We show that augmenting prompts with concise and precise context information obtained by program analysis increases the effectiveness of generating unit test code through LLMs. We validate our approach on a large commercial Java project and a popular open-source Java project.

Modeling Effect of Lockdowns and Other Effects on India Covid-19 Infections Using SEIR Model and Machine Learning

Oct 04, 2021

Sathiyanarayanan Sampath, Joy Bose

Abstract: The SEIR model is a widely used epidemiological model for predicting the rise in infections. It has been applied in different countries to predict the number of Covid-19 cases. However, the original SEIR model does not take into account the effect of factors such as lockdowns, vaccines, and re-infections. In India, the first wave of Covid started in March 2020 and the second wave in April 2021. In this paper, we modify the SEIR model equations to model the effect of lockdowns and other influencers, and fit the model on data of the daily Covid-19 infections in India using lmfit, a Python library for least-squares minimization and curve fitting. We modify the R0 parameter in the standard SEIR model as a rectangle in order to account for the effect of lockdowns. Our modified SEIR model accurately fits the available data of infections.

* 6 pages, 8 figures
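
To make the idea of a rectangle-shaped R0 concrete, here is a minimal sketch of a standard SEIR simulation in which R0 drops to a lower value during a lockdown window. The function names, parameter values, and the simple Euler integration are assumptions chosen for illustration; the paper's actual equations and its lmfit-based fitting step are not reproduced here.

```python
def r0_rectangle(t, r0_base, r0_lockdown, start, end):
    """R0 as a rectangle: r0_lockdown inside [start, end), r0_base elsewhere."""
    return r0_lockdown if start <= t < end else r0_base


def simulate_seir(days, population, e0, sigma, gamma, r0_fn, dt=0.1):
    """Euler integration of the standard SEIR equations.

    sigma is the incubation rate, gamma the recovery rate, and r0_fn(t)
    returns the reproduction number at time t (in days).
    Returns one (S, E, I, R) tuple per day.
    """
    s, e, i, r = population - e0, float(e0), 0.0, 0.0
    steps_per_day = round(1 / dt)
    history = []
    for day in range(days):
        history.append((s, e, i, r))
        for k in range(steps_per_day):
            t = day + k * dt
            beta = r0_fn(t) * gamma            # since R0 = beta / gamma
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
    return history


# Hypothetical scenario: R0 falls from 3.0 to 0.8 between day 20 and day 200.
locked = simulate_seir(
    days=200, population=1_000_000, e0=10, sigma=1 / 5.2, gamma=1 / 10,
    r0_fn=lambda t: r0_rectangle(t, 3.0, 0.8, 20, 200),
)
print("peak infectious with lockdown:", max(row[2] for row in locked))
```

In a fitting workflow, the rectangle's height and edges would be among the free parameters adjusted to match the observed daily infections.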

Sparse Distributed Memory using Spiking Neural Networks on Nengo

Sep 07, 2021

Rohan Deepak Ajwani, Arshika Lalan, Basabdatta Sen Bhattacharya, Joy Bose

Abstract: We present a Spiking Neural Network (SNN) based Sparse Distributed Memory (SDM) implemented on the Nengo framework. We have based our work on previous work by Furber et al. (2004), implementing SDM using N-of-M codes. As an integral part of the SDM design, we have implemented Correlation Matrix Memory (CMM) using SNN on Nengo. Our SNN implementation uses Leaky Integrate and Fire (LIF) spiking neuron models on Nengo. Our objective is to understand how well SNN-based SDMs perform in comparison to conventional SDMs. Towards this, we have simulated both conventional and SNN-based SDM and CMM on Nengo. We observe that SNN-based models perform similarly to the conventional ones. In order to evaluate the performance of different SNNs, we repeated the experiment using Adaptive-LIF, Spiking Rectified Linear Unit, and Izhikevich models and obtained similar results. We conclude that it is indeed feasible to develop some types of associative memories using spiking neurons whose memory capacity and other features are similar to the performance without SNNs. Finally, we have implemented an application where MNIST images, encoded with N-of-M codes, are associated with their labels and stored in the SNN-based SDM.

* 8 pages, 11 figures, accepted as poster in Bernstein Conference 2021
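
For readers unfamiliar with the terms, the sketch below shows a conventional (non-spiking) binary correlation matrix memory operating on N-of-M codes, that is, binary vectors of length M with exactly N ones. It is a plain-Python illustration under assumed conventions (OR-style learning, top-N recall) and does not use Nengo or reproduce the paper's spiking implementation; the class and function names are hypothetical.

```python
def n_of_m(active, m):
    """Return a binary vector of length m with ones at the given indices."""
    vec = [0] * m
    for idx in active:
        vec[idx] = 1
    return vec


class BinaryCMM:
    """Binary correlation matrix memory with top-N recall."""

    def __init__(self, in_size, out_size):
        self.weights = [[0] * out_size for _ in range(in_size)]

    def store(self, key, value):
        # Set a weight wherever an active key bit meets an active value bit.
        for i, k in enumerate(key):
            if k:
                for j, v in enumerate(value):
                    if v:
                        self.weights[i][j] = 1

    def recall(self, key, n_active):
        out_size = len(self.weights[0])
        # Each output unit sums the weights from the active key bits.
        sums = [
            sum(self.weights[i][j] for i, k in enumerate(key) if k)
            for j in range(out_size)
        ]
        # Keep the n_active units with the highest sums (ties: lowest index).
        top = sorted(range(out_size), key=lambda j: (-sums[j], j))[:n_active]
        return n_of_m(top, out_size)


memory = BinaryCMM(in_size=8, out_size=6)
memory.store(n_of_m([0, 1, 2], 8), n_of_m([1, 4], 6))
memory.store(n_of_m([3, 4, 5], 8), n_of_m([0, 4], 6))
# A key with one wrong bit still recalls the first stored value.
print(memory.recall(n_of_m([0, 1, 6], 8), n_active=2))  # [0, 1, 0, 0, 1, 0]
```

Forcing exactly N output units to fire is what lets the memory tolerate partially corrupted keys, as the last line shows.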

Field Label Prediction for Autofill in Web Browsers

Dec 17, 2019

Joy Bose

Abstract: Automatic form fill is an important productivity-related feature present in major web browsers, which predicts the field labels of a web form and automatically fills values in a new form based on the values previously filled for the same field in other forms. This feature increases the convenience and efficiency of users who have to fill in similar information in fields across multiple forms. In this paper, we describe a machine learning solution for predicting the form field labels, implemented as a web service using Azure ML Studio.

* 3 pages, 5 figures

Analysis of Software Engineering for Agile Machine Learning Projects

Dec 16, 2019

Kushal Singla, Joy Bose, Chetan Naik

Abstract: The number of machine learning, artificial intelligence, or data science related software engineering projects using Agile methodology is increasing. However, there are very few studies on how such projects work in practice. In this paper, we analyze project issue-tracking data taken from Scrum (a popular tool for Agile) for several machine learning projects. We compare this data with corresponding data from non-machine learning projects, in an attempt to analyze how machine learning projects are executed differently from normal software engineering projects. On analysis, we find that machine learning project issues use different kinds of words to describe issues, have a higher number of exploratory or research-oriented tasks as compared to implementation tasks, and have a higher number of issues in the product backlog after each sprint, indicating that it is more difficult to estimate the duration of machine learning project related tasks in advance. After analyzing this data, we propose a few ways in which Agile machine learning projects can be better logged and executed, given their differences from normal software engineering projects.

* 5 pages, 8 figures, INDICON conference

Evaluating Usage of Images for App Classification

Dec 16, 2019

Kushal Singla, Niloy Mukherjee, Hari Manassery Koduvely, Joy Bose

Abstract: App classification is useful in a number of applications, such as adding apps to an app store or building a user model based on the installed apps. Presently, there are a number of existing methods to classify apps based on a given taxonomy on the basis of their text metadata. However, text-based methods for app classification may not work in all cases, such as when the text descriptions are in a different language, or missing, or inadequate to classify the app. One solution in such cases is to utilize the app images to supplement the text description. In this paper, we evaluate a number of approaches in which app images can be used to classify the apps. In one approach, we use optical character recognition (OCR) to extract text from images, which is then used to supplement the text description of the app. In another, we use pic2vec to convert the app images into vectors, then train an SVM to classify the vectors to the correct app label. In another, we use the captionbot.ai tool to generate natural language descriptions from the app images. Finally, we use a method to detect and label objects in the app images and use a voting technique to determine the category of the app based on all the images. We compare the performance of our image-based techniques in classifying a number of apps in our dataset. We use a text-based SVM app classifier as our baseline and obtain an improved classification accuracy of 96% for some classes when app images are added.

* 5 pages, 3 figures, 3 tables, INDICON conference

Semi-Supervised Method using Gaussian Random Fields for Boilerplate Removal in Web Browsers

Nov 08, 2019

Joy Bose, Sumanta Mukherjee

Abstract: Boilerplate removal refers to the problem of removing noisy content, such as ads, from a webpage and extracting relevant content that can be used by various services. This can be useful for several features in web browsers, such as ad blocking, accessibility tools such as read out loud, translation, and summarization. Creating a training dataset for a boilerplate detection and removal model by labeling or tagging webpage data manually can be tedious and time consuming. Hence, a semi-supervised model, in which some of the webpage elements are labeled manually and labels for others are inferred based on some parameters, can be useful. In this paper, we present a solution for extraction of relevant content from a webpage that relies on semi-supervised learning using Gaussian Random Fields. We first represent the webpage as a graph, with text elements as nodes and the edge weights representing similarity between nodes. After this, we label a few nodes in the graph using heuristics and label the remaining nodes by a weighted measure of similarity to the already labeled nodes. We describe the system architecture and a few preliminary results on a dataset of webpages.

* 4 pages, 1 figure, IEEE INDICON conference
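
The graph-based labeling step described above can be illustrated with the standard harmonic-function iteration commonly used with Gaussian random fields: labeled nodes stay fixed, and each unlabeled node repeatedly takes the similarity-weighted average of its neighbours. This is a generic sketch, not the paper's system; the function name, the toy graph, and the convention that 1.0 means "content" and 0.0 means "boilerplate" are assumptions.

```python
def propagate_labels(weights, labels, iterations=200):
    """Harmonic label propagation on a weighted graph.

    weights: symmetric matrix (list of lists) of edge similarities.
    labels:  dict mapping node index -> fixed score (1.0 or 0.0).
    Returns a score in [0, 1] for every node.
    """
    n = len(weights)
    scores = [labels.get(i, 0.5) for i in range(n)]   # unlabeled start at 0.5
    for _ in range(iterations):
        updated = list(scores)
        for i in range(n):
            if i in labels:
                continue                              # clamp labeled nodes
            total = sum(weights[i][j] for j in range(n) if j != i)
            if total > 0:
                updated[i] = sum(
                    weights[i][j] * scores[j] for j in range(n) if j != i
                ) / total
        scores = updated
    return scores


# Toy chain of four text blocks: 0 - 1 - 2 - 3, with the end nodes labeled.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = propagate_labels(chain, {0: 1.0, 3: 0.0})
print([round(s, 3) for s in scores])  # [1.0, 0.667, 0.333, 0.0]
```

Thresholding the resulting scores (for example at 0.5) would give a content or boilerplate decision for each unlabeled text element.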

IoT2Vec: Identification of Similar IoT Devices via Activity Footprints

May 21, 2018

Kushal Singla, Joy Bose

Abstract: We consider a smart home or smart office environment with a number of IoT devices connected and passing data between one another. The footprints of the data transferred can provide valuable information about the devices, which can be used to (a) identify the IoT devices and (b) in case of failure, identify the correct replacements for these devices. In this paper, we generate embeddings for IoT devices in a smart home using Word2Vec, and explore the possibility of having a similar concept for IoT devices, which we call IoT2Vec. These embeddings can be used in a number of ways, such as to find similar devices in an IoT device store, or as a signature of each type of IoT device. We show results of a feasibility study on the CASAS dataset of IoT device activity logs, using our method to identify the patterns in embeddings of various types of IoT devices in a household.

* 5 pages, 4 figures
