Research Fellow, Microsoft, Bengaluru, India
Abstract:Caregivers seeking AI-mediated support express complex needs -- information-seeking, emotional validation, and distress cues -- that warrant careful evaluation of response safety and appropriateness. Existing AI evaluation frameworks, primarily focused on general risks (toxicity, hallucinations, policy violations, etc), may not adequately capture the nuanced risks of LLM-responses in caregiving-contexts. We introduce RubRIX (Rubric-based Risk Index), a theory-driven, clinician-validated framework for evaluating risks in LLM caregiving responses. Grounded in the Elements of an Ethic of Care, RubRIX operationalizes five empirically-derived risk dimensions: Inattention, Bias & Stigma, Information Inaccuracy, Uncritical Affirmation, and Epistemic Arrogance. We evaluate six state-of-the-art LLMs on over 20,000 caregiver queries from Reddit and ALZConnected. Rubric-guided refinement consistently reduced risk-components by 45-98% after one iteration across models. This work contributes a methodological approach for developing domain-sensitive, user-centered evaluation frameworks for high-burden contexts. Our findings highlight the importance of domain-sensitive, interactional risk evaluation for the responsible deployment of LLMs in caregiving support contexts. We release benchmark datasets to enable future research on contextual risk evaluation in AI-mediated support.
Abstract:Software developers balance a variety of different tasks in a workweek, yet the allocation of time often differs from what they consider ideal. Identifying and addressing these deviations is crucial for organizations aiming to enhance the productivity and well-being of the developers. In this paper, we present the findings from a survey of 484 software developers at Microsoft, which aims to identify the key differences between how developers would like to allocate their time during an ideal workweek versus their actual workweek. Our analysis reveals significant deviations between a developer's ideal workweek and their actual workweek, with a clear correlation: as the gap between these two workweeks widens, we observe a decline in both productivity and satisfaction. By examining these deviations in specific activities, we assess their direct impact on the developers' satisfaction and productivity. Additionally, given the growing adoption of AI tools in software engineering, both in the industry and academia, we identify specific tasks and areas that could be strong candidates for automation. In this paper, we make three key contributions: 1) We quantify the impact of workweek deviations on developer productivity and satisfaction 2) We identify individual tasks that disproportionately affect satisfaction and productivity 3) We provide actual data-driven insights to guide future AI automation efforts in software engineering, aligning them with the developers' requirements and ideal workflows for maximizing their productivity and satisfaction.
Abstract:The growth of weeds poses a significant challenge to agricultural productivity, necessitating efficient and accurate weed detection and management strategies. The combination of multispectral imaging and drone technology has emerged as a promising approach to tackle this issue, enabling rapid and cost-effective monitoring of large agricultural fields. This systematic review surveys and evaluates the state-of-the-art in machine learning interventions for weed detection that utilize multispectral images captured by unmanned aerial vehicles. The study describes the various models used for training, features extracted from multi-spectral data, their efficiency and effect on the results, the performance analysis parameters, and also the current challenges faced by researchers in this domain. The review was conducted in accordance with the PRISMA guidelines. Three sources were used to obtain the relevant material, and the screening and data extraction were done on the COVIDENCE platform. The search string resulted in 600 papers from all sources. The review also provides insights into potential research directions and opportunities for further advancements in the field. These insights would serve as a valuable guide for researchers, agricultural scientists, and practitioners in developing precise and sustainable weed management strategies to enhance agricultural productivity and minimize ecological impact.




Abstract:Deep learning models used for medical image classification tasks are often constrained by the limited amount of training data along with severe class imbalance. Despite these problems, models should be explainable to enable human trust in the models' decisions to ensure wider adoption in high-risk situations. In this paper, we propose PRECISe, an explainable-by-design model meticulously constructed to concurrently address all three challenges. Evaluation on 2 imbalanced medical image datasets reveals that PRECISe outperforms the current state-of-the-art methods on data efficient generalization to minority classes, achieving an accuracy of ~87% in detecting pneumonia in chest x-rays upon training on <60 images only. Additionally, a case study is presented to highlight the model's ability to produce easily interpretable predictions, reinforcing its practical utility and reliability for medical imaging tasks.