Deepak Kumar

Use of Metric Learning for the Recognition of Handwritten Digits, and its Application to Increase the Outreach of Voice-based Communication Platforms

Apr 26, 2025

Devesh Pant, Dibyendu Talukder, Deepak Kumar, Rachit Pandey, Aaditeshwar Seth, Chetan Arora

Abstract:Initiation, monitoring, and evaluation of development programmes can involve field-based data collection about project activities. Collecting this data through digital devices may not always be feasible, though, for reasons such as field-based cadre being unable to afford smartphones and tablets, or shortfalls in their training and capacity building. Paper-based data collection has been argued to be more appropriate in several contexts, with automated digitization of the paper forms through OCR (Optical Character Recognition) and OMR (Optical Mark Recognition) techniques. We contribute a large dataset of handwritten digits, along with deep learning based models and methods built using this data, that are effective in real-world environments. We demonstrate the deployment of these tools in the context of a maternal and child health and nutrition awareness project, which uses IVR (Interactive Voice Response) systems to provide awareness information to rural women SHG (Self Help Group) members in north India. Paper forms were used to collect phone numbers of the SHG members at scale, which were digitized using the OCR tools developed by us, and used to push almost 4 million phone calls. The data, model, and code have been released in the open-source domain.

* COMPASS '22: Proceedings of the 5th ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies, Pages 364 - 374
* 10 Pages, 7 Figures, ACM COMPASS 2022

Explainable, Multi-modal Wound Infection Classification from Images Augmented with Generated Captions

Feb 27, 2025

Palawat Busaranuvong, Emmanuel Agu, Reza Saadati Fard, Deepak Kumar, Shefalika Gautam, Bengisu Tulu, Diane Strong

Abstract:Infections in Diabetic Foot Ulcers (DFUs) can cause severe complications, including tissue death and limb amputation, highlighting the need for accurate, timely diagnosis. Previous machine learning methods have focused on identifying infections by analyzing wound images alone, without utilizing additional metadata such as medical notes. In this study, we aim to improve infection detection by introducing Synthetic Caption Augmented Retrieval for Wound Infection Detection (SCARWID), a novel deep learning framework that leverages synthetic textual descriptions to augment DFU images. SCARWID consists of two components: (1) Wound-BLIP, a Vision-Language Model (VLM) fine-tuned on GPT-4o-generated descriptions to synthesize consistent captions from images; and (2) an Image-Text Fusion module that uses cross-attention to extract cross-modal embeddings from an image and its corresponding Wound-BLIP caption. Infection status is determined by retrieving the top-k similar items from a labeled support set. To enhance the diversity of training data, we utilized a latent diffusion model to generate additional wound images. As a result, SCARWID outperformed state-of-the-art models, achieving average sensitivity, specificity, and accuracy of 0.85, 0.78, and 0.81, respectively, for wound infection classification. Displaying the generated captions alongside the wound images and infection detection results enhances interpretability and trust, enabling nurses to align SCARWID outputs with their medical knowledge. This is particularly valuable when wound notes are unavailable or when assisting novice nurses who may find it difficult to identify visual attributes of wound infection.
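
The retrieval step described in the abstract (deciding infection status from the top-k most similar items in a labeled support set) can be sketched roughly as follows. This is a minimal illustration, not the SCARWID implementation: the cosine similarity measure, the majority vote over the k neighbours, the toy 2-D embeddings, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two non-zero vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_by_retrieval(query, support, k=3):
    """Label a query embedding by majority vote over its k most similar support items.

    `support` is a list of (embedding, label) pairs; the majority vote is an
    assumption, since the abstract only says the top-k items are retrieved.
    """
    ranked = sorted(support, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy support set with hypothetical 2-D embeddings.
support = [
    ([1.0, 0.1], "infected"),
    ([0.9, 0.2], "infected"),
    ([0.1, 1.0], "uninfected"),
    ([0.2, 0.9], "uninfected"),
    ([0.8, 0.3], "infected"),
]
prediction = classify_by_retrieval([0.95, 0.15], support, k=3)
print(prediction)
```

In a real system the embeddings would come from the image-text fusion module, and k would be tuned on held-out data.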

Uplink Rate Splitting Multiple Access with Imperfect Channel State Information and Interference Cancellation

Jan 31, 2025

Farjam Karim, Nurul Huda Mahmood, Arthur S. de Sena, Deepak Kumar, Bruno Clerckx, Matti Latva-aho

Abstract:This article investigates the performance of uplink rate splitting multiple access (RSMA) in a two-user scenario, addressing an under-explored domain compared to its downlink counterpart. With the increasing demand for uplink communication in applications like the Internet-of-Things, it is essential to account for practical imperfections, such as inaccuracies in channel state information at the receiver (CSIR) and limitations in successive interference cancellation (SIC), to provide realistic assessments of system performance. Specifically, we derive closed-form expressions for the outage probability, throughput, and asymptotic outage behavior of uplink users, considering imperfect CSIR and SIC. We validate the accuracy of these derived expressions using Monte Carlo simulations. Our findings reveal that at low transmit power levels, imperfect CSIR affects system performance significantly more than SIC imperfections. However, as the transmit power increases, the impact of imperfect CSIR diminishes, while the influence of SIC imperfections becomes more pronounced. Moreover, we highlight the impact of the rate allocation factor on user performance. Finally, our comparison with non-orthogonal multiple access (NOMA) highlights the outage performance trade-offs between RSMA and NOMA. RSMA proves to be more effective in managing imperfect CSIR and enhances performance through strategic message splitting, resulting in more robust communication.
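
The general idea of checking a closed-form outage expression against Monte Carlo simulation can be illustrated on a much simpler case than the one in the paper. The sketch below assumes a single user on a Rayleigh-fading link with perfect channel knowledge, so it is not the two-user RSMA model with imperfect CSIR and SIC; the function names, SNR, and target rate are hypothetical values chosen for illustration.

```python
import math
import random


def closed_form_outage(snr, rate):
    """Outage probability of a single Rayleigh-fading link.

    With channel power gain g ~ Exp(1), outage occurs when log2(1 + snr * g) < rate,
    i.e. g < (2**rate - 1) / snr, which has probability 1 - exp(-(2**rate - 1) / snr).
    """
    return 1.0 - math.exp(-(2.0 ** rate - 1.0) / snr)


def monte_carlo_outage(snr, rate, trials=200_000, seed=0):
    """Estimate the same probability by drawing random channel gains."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        gain = rng.expovariate(1.0)  # |h|^2 for Rayleigh fading with unit mean power
        if math.log2(1.0 + snr * gain) < rate:
            outages += 1
    return outages / trials


snr_linear = 10.0   # hypothetical average SNR (linear scale, i.e. 10 dB)
target_rate = 1.0   # hypothetical target rate in bits/s/Hz

analytic = closed_form_outage(snr_linear, target_rate)
simulated = monte_carlo_outage(snr_linear, target_rate)
print(f"closed form: {analytic:.4f}, Monte Carlo: {simulated:.4f}")
```

Close agreement between the two numbers is the kind of evidence the abstract refers to when it says the derived expressions were validated by simulation.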

Pretraining Data and Tokenizer for Indic LLM

Jul 17, 2024

Rahul Kumar, Shubham Kakde, Divyansh Rajput, Daud Ibrahim, Rishabh Nahata, Pidathala Sowjanya, Deepak Kumar

Abstract:We present a novel approach to data preparation for developing a multilingual Indic large language model. Our meticulous data acquisition spans open-source and proprietary sources, including Common Crawl, Indic books, news articles, and Wikipedia, ensuring a diverse and rich linguistic representation. For each Indic language, we design a custom preprocessing pipeline to effectively eliminate redundant and low-quality text content. Additionally, we perform deduplication on Common Crawl data to address the redundancy present in 70% of the crawled web pages. This study focuses on developing high-quality data and optimizing tokenization for our multilingual dataset, targeting Indic large language models with 3B and 7B parameters engineered for superior performance in Indic languages. We introduce a novel multilingual tokenizer training strategy, demonstrating that our custom-trained Indic tokenizer outperforms the state-of-the-art OpenAI Tiktoken tokenizer, achieving a superior token-to-word ratio for Indic languages.
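
The token-to-word ratio mentioned in the abstract can be computed along these lines. This is only a sketch: splitting words on whitespace, the two toy tokenizers, and the sample sentences are assumptions for illustration and are not the tokenizers or data compared in the paper. A lower ratio generally means the tokenizer needs fewer tokens to represent each word.

```python
def token_to_word_ratio(texts, tokenize):
    """Average number of tokens per whitespace-delimited word over a corpus."""
    total_tokens = sum(len(tokenize(text)) for text in texts)
    total_words = sum(len(text.split()) for text in texts)
    if total_words == 0:
        raise ValueError("corpus contains no words")
    return total_tokens / total_words


def char_tokenizer(text):
    """Hypothetical worst-case tokenizer: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def whitespace_tokenizer(text):
    """Hypothetical best-case tokenizer: one token per word."""
    return text.split()


corpus = ["नमस्ते दुनिया", "भारत एक देश है"]

char_ratio = token_to_word_ratio(corpus, char_tokenizer)
word_ratio = token_to_word_ratio(corpus, whitespace_tokenizer)
print(f"character-level: {char_ratio:.2f}, word-level: {word_ratio:.2f}")
```

Any callable that maps a string to a list of tokens can be passed as `tokenize`, so real tokenizers can be compared on the same corpus.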

End-to-End Waveform and Beamforming Optimization for RF Wireless Power Transfer

May 09, 2024

Abdul Basit Khattak, Onel L. A. López, Amirhossein Azarbahram, Deepak Kumar, Matti Latva-aho

Abstract:Radio frequency (RF) wireless power transfer (WPT) is a key technology for future low-power wireless systems. However, the inherently low end-to-end power transfer efficiency (PTE) is challenging for practical applications. The main factors contributing to it are the channel losses, transceivers' power consumption, and losses related, e.g., to the digital-to-analog converter (DAC), high-power amplifier, and rectenna. Optimizing PTE requires careful consideration of these factors, motivating the current work. Herein, we consider an analog multi-antenna power transmitter that aims to charge a single energy harvester. We first provide a mathematical framework to calculate the harvested power from multi-tone signal transmissions and the system power consumption. Then, we formulate the joint waveform and analog beamforming design problem to minimize power consumption and meet the charging requirements. Finally, we propose an optimization approach relying on swarm intelligence to solve the specified problem. Simulation results quantify the power consumption reduction as the DAC, phase shifters resolution, and antenna length are increased, while it is seen that increasing system frequency results in higher power consumption.

* Conference

Guided Conditional Diffusion Classifier (ConDiff) for Enhanced Prediction of Infection in Diabetic Foot Ulcers

May 01, 2024

Palawat Busaranuvong, Emmanuel Agu, Deepak Kumar, Shefalika Gautam, Reza Saadati Fard, Bengisu Tulu, Diane Strong

Abstract:Objective: To detect infected wounds in Diabetic Foot Ulcers (DFUs) from photographs, preventing severe complications and amputations. Methods: This paper proposes the Guided Conditional Diffusion Classifier (ConDiff), a novel deep-learning infection detection model that combines guided image synthesis with a denoising diffusion model and distance-based classification. The process involves (1) generating guided conditional synthetic images by injecting Gaussian noise into a guide image, followed by denoising the noise-perturbed image through a reverse diffusion process, conditioned on infection status, and (2) classifying infections based on the minimum Euclidean distance between synthesized images and the original guide image in embedding space. Results: ConDiff demonstrated superior performance with an accuracy of 83% and an F1-score of 0.858, outperforming state-of-the-art models by at least 3%. The use of a triplet loss function reduces overfitting in the distance-based classifier. Conclusions: ConDiff not only enhances diagnostic accuracy for DFU infections but also pioneers the use of generative discriminative models for detailed medical image analysis, offering a promising approach for improving patient outcomes.
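
Step (2) of the method, choosing the label whose synthesized image lies closest to the guide image in embedding space, can be sketched as below. The diffusion model and the embedding network are out of scope here: the 3-D vectors stand in for embeddings, and the function names and values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_by_min_distance(guide_embedding, synthesized_embeddings):
    """Return the condition label whose synthesized embedding is nearest the guide.

    `synthesized_embeddings` maps each condition label to the embedding of the
    image generated under that condition.
    """
    return min(
        synthesized_embeddings,
        key=lambda label: euclidean(guide_embedding, synthesized_embeddings[label]),
    )


# Hypothetical embeddings of the guide image and the two conditioned syntheses.
guide = [0.2, 0.7, 0.1]
synthesized = {
    "infected": [0.25, 0.65, 0.12],
    "uninfected": [0.9, 0.1, 0.4],
}
predicted = classify_by_min_distance(guide, synthesized)
print(predicted)
```

The intuition is that denoising under the correct condition should reproduce something close to the original image, while the wrong condition pulls the result away from it.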

Watch Your Language: Large Language Models and Content Moderation

Sep 25, 2023

Deepak Kumar, Yousef AbuHashem, Zakir Durumeric

Abstract:Large language models (LLMs) have exploded in popularity due to their ability to perform a wide array of natural language tasks. Text-based content moderation is one LLM use case that has received recent enthusiasm; however, there is little research investigating how LLMs perform in content moderation settings. In this work, we evaluate a suite of modern, commercial LLMs (GPT-3, GPT-3.5, GPT-4) on two common content moderation tasks: rule-based community moderation and toxic content detection. For rule-based community moderation, we construct 95 LLM moderation-engines prompted with rules from 95 Reddit subcommunities and find that LLMs can be effective at rule-based moderation for many communities, achieving a median accuracy of 64% and a median precision of 83%. For toxicity detection, we find that LLMs significantly outperform existing commercially available toxicity classifiers. However, we also find that recent increases in model size add only marginal benefit to toxicity detection, suggesting a potential performance plateau for LLMs on toxicity detection tasks. We conclude by outlining avenues for future work in studying LLMs and content moderation.

Specious Sites: Tracking the Spread and Sway of Spurious News Stories at Scale

Aug 03, 2023

Hans W. A. Hanley, Deepak Kumar, Zakir Durumeric

Abstract:Misinformation, propaganda, and outright lies proliferate on the web, with some narratives having dangerous real-world consequences on public health, elections, and individual safety. However, despite the impact of misinformation, the research community largely lacks automated and programmatic approaches for tracking news narratives across online platforms. In this work, utilizing daily scrapes of 1,404 unreliable news websites, the large-language model MPNet, and DP-Means clustering, we introduce a system to automatically isolate and analyze the narratives spread within online ecosystems. Identifying 55,301 narratives on these 1,404 websites, we describe the most prevalent narratives spread in 2022 and identify the most influential websites that originate and magnify narratives. Finally, we show how our system can be utilized to detect new narratives originating from unreliable news websites and aid fact-checkers like Politifact, Reuters, and AP News in more quickly addressing misinformation stories.
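
DP-Means clustering, named in the abstract, behaves like k-means except that a new cluster is opened whenever a point is farther than a threshold from every existing centroid, so the number of clusters does not need to be fixed in advance. The sketch below is a plain-Python version of that general idea on toy 2-D points, not the authors' pipeline: the squared-distance threshold `lam`, the function names, and the data are assumptions, and real inputs would be text embeddings.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def dp_means(points, lam, max_iter=100):
    """Cluster points, opening a new cluster when the nearest centroid is farther than lam."""
    centroids = [mean(points)]          # start with one cluster at the global mean
    assignments = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, point in enumerate(points):
            dists = [sq_dist(point, c) for c in centroids]
            nearest = min(range(len(centroids)), key=dists.__getitem__)
            if dists[nearest] > lam:    # too far from everything: open a new cluster
                centroids.append(list(point))
                nearest = len(centroids) - 1
            if assignments[i] != nearest:
                assignments[i] = nearest
                changed = True
        # Update each non-empty cluster's centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = mean(members)
        if not changed:
            break
    # Drop clusters that ended up empty and renumber the rest.
    used = sorted(set(assignments))
    remap = {old: new for new, old in enumerate(used)}
    return [centroids[old] for old in used], [remap[a] for a in assignments]


points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centroids, labels = dp_means(points, lam=1.0)
print(len(centroids), labels)
```

A smaller `lam` yields more, tighter clusters, while a very large `lam` collapses everything into one.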

Parameter-efficient Modularised Bias Mitigation via AdapterFusion

Feb 13, 2023

Deepak Kumar, Oleg Lesota, George Zerveas, Daniel Cohen, Carsten Eickhoff, Markus Schedl, Navid Rekabsaz

Abstract:Large pre-trained language models contain societal biases and carry along these biases to downstream tasks. Current in-processing bias mitigation approaches (like adversarial training) impose debiasing by updating a model's parameters, effectively transferring the model to a new, irreversible debiased state. In this work, we propose a novel approach to develop stand-alone debiasing functionalities separate from the model, which can be integrated into the model on-demand, while keeping the core model untouched. Drawing from the concept of AdapterFusion in multi-task learning, we introduce DAM (Debiasing with Adapter Modules) - a debiasing approach to first encapsulate arbitrary bias mitigation functionalities into separate adapters, and then add them to the model on-demand in order to deliver fairness qualities. We conduct a large set of experiments on three classification tasks with gender, race, and age as protected attributes. Our results show that DAM improves or maintains the effectiveness of bias mitigation, avoids catastrophic forgetting in a multi-attribute scenario, and maintains on-par task performance, while granting parameter-efficiency and easy switching between the original and debiased models.

* Accepted at EACL 2023

Linear programming word problems formulation using EnsembleCRF NER labeler and T5 text generator with data augmentations

Dec 30, 2022

JiangLong He, Mamatha N, Shiv Vignesh, Deepak Kumar, Akshay Uppal

Abstract:We propose an ensemble approach to predict the labels in linear programming word problems. Entity identification and meaning representation are the two tasks to be solved in the NL4Opt competition. We propose the ensembleCRF method to identify the named entities for the first task. In our analysis, we found that single models did not improve on the given task. Instead, a set of prediction models predicts the entities, and the generated results are combined to form a consensus result in the ensembleCRF method. We present an ensemble text generator to produce the representation sentences for the second task. We divide the problem into multiple smaller tasks because of overflow in the output. A single model generates different representations based on the prompt. All the generated text is combined to form an ensemble and produce a mathematical meaning of a linear programming problem.

* 10 pages, 4 figures
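
The abstract says that the outputs of several entity predictors are combined into a consensus result but does not spell out how. One simple way to do this, shown here purely as an assumption, is a per-token majority vote; the tag names and the three model outputs below are hypothetical.

```python
from collections import Counter


def consensus_labels(predictions):
    """Combine per-token label sequences from several models by majority vote.

    `predictions` is a list of label sequences, one per model, all covering the
    same tokens. Ties go to the label that appears first among the models.
    """
    if len({len(sequence) for sequence in predictions}) != 1:
        raise ValueError("all models must label the same number of tokens")
    return [
        Counter(token_votes).most_common(1)[0][0]
        for token_votes in zip(*predictions)
    ]


# Hypothetical tag sequences from three models over the same four tokens.
model_a = ["O", "VAR", "O", "LIMIT"]
model_b = ["O", "VAR", "VAR", "LIMIT"]
model_c = ["O", "O", "O", "LIMIT"]

consensus = consensus_labels([model_a, model_b, model_c])
print(consensus)
```

A CRF-based combiner, as the method's name suggests, would also model dependencies between neighbouring tags, which a plain vote ignores.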
