Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Arun Sharma

Department of Information Technology, Indira Gandhi Delhi Technical University for Women, Delhi, India

Towards Fine-Tuning-Based Site Calibration for Knowledge-Guided Machine Learning: A Summary of Results

Dec 17, 2025

Ruolei Zeng, Arun Sharma, Shuai An, Mingzhou Yang, Shengya Zhang, Licheng Liu, David Mulla, Shashi Shekhar

Abstract:Accurate and cost-effective quantification of the agroecosystem carbon cycle at decision-relevant scales is essential for climate mitigation and sustainable agriculture. However, both transfer learning and the exploitation of spatial variability in this field are challenging, as they involve heterogeneous data and complex cross-scale dependencies. Conventional approaches often rely on location-independent parameterizations and independent training, underutilizing transfer learning and spatial heterogeneity in the inputs, and limiting their applicability in regions with substantial variability. We propose FTBSC-KGML (Fine-Tuning-Based Site Calibration-Knowledge-Guided Machine Learning), a pretraining- and fine-tuning-based, spatial-variability-aware, and knowledge-guided machine learning framework that augments KGML-ag with a pretraining-fine-tuning process and site-specific parameters. Using a pretraining-fine-tuning process with remote-sensing GPP, climate, and soil covariates collected across multiple midwestern sites, FTBSC-KGML estimates land emissions while leveraging transfer learning and spatial heterogeneity. A key component is a spatial-heterogeneity-aware transfer-learning scheme, which is a globally pretrained model that is fine-tuned at each state or site to learn place-aware representations, thereby improving local accuracy under limited data without sacrificing interpretability. Empirically, FTBSC-KGML achieves lower validation error and greater consistency in explanatory power than a purely global model, thereby better capturing spatial variability across states. This work extends the prior SDSA-KGML framework.

Via

Access Paper or Ask Questions

Critical Insights into Leading Conversational AI Models

Oct 26, 2025

Urja Kohli, Aditi Singh, Arun Sharma

Figure 1 for Critical Insights into Leading Conversational AI Models

Figure 2 for Critical Insights into Leading Conversational AI Models

Figure 3 for Critical Insights into Leading Conversational AI Models

Figure 4 for Critical Insights into Leading Conversational AI Models

Abstract:Big Language Models (LLMs) are changing the way businesses use software, the way people live their lives and the way industries work. Companies like Google, High-Flyer, Anthropic, OpenAI and Meta are making better LLMs. So, it's crucial to look at how each model is different in terms of performance, moral behaviour and usability, as these differences are based on the different ideas that built them. This study compares five top LLMs: Google's Gemini, High-Flyer's DeepSeek, Anthropic's Claude, OpenAI's GPT models and Meta's LLaMA. It performs this by analysing three important factors: Performance and Accuracy, Ethics and Bias Mitigation and Usability and Integration. It was found that Claude has good moral reasoning, Gemini is better at multimodal capabilities and has strong ethical frameworks. DeepSeek is great at reasoning based on facts, LLaMA is good for open applications and ChatGPT delivers balanced performance with a focus on usage. It was concluded that these models are different in terms of how well they work, how easy they are to use and how they treat people ethically, making it a point that each model should be utilised by the user in a way that makes the most of its strengths.

* 21 pages, 7 tables, 3 figures. Open-access preprint intended for journal or conference submission

Via

Access Paper or Ask Questions

Towards Physics-informed Diffusion for Anomaly Detection in Trajectories

Jun 08, 2025

Arun Sharma, Mingzhou Yang, Majid Farhadloo, Subhankar Ghosh, Bharat Jayaprakash, Shashi Shekhar

Abstract:Given trajectory data, a domain-specific study area, and a user-defined threshold, we aim to find anomalous trajectories indicative of possible GPS spoofing (e.g., fake trajectory). The problem is societally important to curb illegal activities in international waters, such as unauthorized fishing and illicit oil transfers. The problem is challenging due to advances in AI generated in deep fakes generation (e.g., additive noise, fake trajectories) and lack of adequate amount of labeled samples for ground-truth verification. Recent literature shows promising results for anomalous trajectory detection using generative models despite data sparsity. However, they do not consider fine-scale spatiotemporal dependencies and prior physical knowledge, resulting in higher false-positive rates. To address these limitations, we propose a physics-informed diffusion model that integrates kinematic constraints to identify trajectories that do not adhere to physical laws. Experimental results on real-world datasets in the maritime and urban domains show that the proposed framework results in higher prediction accuracy and lower estimation error rate for anomaly detection and trajectory generation methods, respectively. Our implementation is available at https://github.com/arunshar/Physics-Informed-Diffusion-Probabilistic-Model.

Via

Access Paper or Ask Questions

Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning

Feb 20, 2025

Arun Sharma, Majid Farhadloo, Mingzhou Yang, Ruolei Zeng, Subhankar Ghosh, Shashi Shekhar

Figure 1 for Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning

Figure 2 for Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning

Figure 3 for Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning

Figure 4 for Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning

Abstract:Given inputs of diverse soil characteristics and climate data gathered from various regions, we aimed to build a model to predict accurate land emissions. The problem is important since accurate quantification of the carbon cycle in agroecosystems is crucial for mitigating climate change and ensuring sustainable food production. Predicting accurate land emissions is challenging since calibrating the heterogeneous nature of soil properties, moisture, and environmental conditions is hard at decision-relevant scales. Traditional approaches do not adequately estimate land emissions due to location-independent parameters failing to leverage the spatial heterogeneity and also require large datasets. To overcome these limitations, we proposed Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning (SDSA-KGML), which leverages location-dependent parameters that account for significant spatial heterogeneity in soil moisture from multiple sites within the same region. Experimental results demonstrate that SDSA-KGML models achieve higher local accuracy for the specified states in the Midwest Region.

Via

Access Paper or Ask Questions

Towards Physics-Guided Foundation Models

Feb 20, 2025

Majid Farhadloo, Arun Sharma, Mingzhou Yang, Bharat Jayaprakash, William Northrop, Shashi Shekhar

Figure 1 for Towards Physics-Guided Foundation Models

Figure 2 for Towards Physics-Guided Foundation Models

Abstract:Traditional foundation models are pre-trained on broad datasets to reduce the training resources (e.g., time, energy, labeled samples) needed for fine-tuning a wide range of downstream tasks. However, traditional foundation models struggle with out-of-distribution prediction and can produce outputs that are unrealistic and physically infeasible. We propose the notation of physics-guided foundation models (PGFM), that is, foundation models integrated with broad or general domain (e.g., scientific) physical knowledge applicable to a wide range of downstream tasks.

Via

Access Paper or Ask Questions

Spatially-Delineated Domain-Adapted AI Classification: An Application for Oncology Data

Jan 20, 2025

Majid Farhadloo, Arun Sharma, Alexey Leontovich, Svetomir N. Markovic, Shashi Shekhar

Figure 1 for Spatially-Delineated Domain-Adapted AI Classification: An Application for Oncology Data

Figure 2 for Spatially-Delineated Domain-Adapted AI Classification: An Application for Oncology Data

Figure 3 for Spatially-Delineated Domain-Adapted AI Classification: An Application for Oncology Data

Figure 4 for Spatially-Delineated Domain-Adapted AI Classification: An Application for Oncology Data

Abstract:Given multi-type point maps from different place-types (e.g., tumor regions), our objective is to develop a classifier trained on the source place-type to accurately distinguish between two classes of the target place-type based on their point arrangements. This problem is societally important for many applications, such as generating clinical hypotheses for designing new immunotherapies for cancer treatment. The challenge lies in the spatial variability, the inherent heterogeneity and variation observed in spatial properties or arrangements across different locations (i.e., place-types). Previous techniques focus on self-supervised tasks to learn domain-invariant features and mitigate domain differences; however, they often neglect the underlying spatial arrangements among data points, leading to significant discrepancies across different place-types. We explore a novel multi-task self-learning framework that targets spatial arrangements, such as spatial mix-up masking and spatial contrastive predictive coding, for spatially-delineated domain-adapted AI classification. Experimental results on real-world datasets (e.g., oncology data) show that the proposed framework provides higher prediction accuracy than baseline methods.

* SIAM International Conference on Data Mining 2025

Via

Access Paper or Ask Questions

Towards Kriging-informed Conditional Diffusion for Regional Sea-Level Data Downscaling

Oct 21, 2024

Subhankar Ghosh, Arun Sharma, Jayant Gupta, Aneesh Subramanian, Shashi Shekhar

Figure 1 for Towards Kriging-informed Conditional Diffusion for Regional Sea-Level Data Downscaling

Figure 2 for Towards Kriging-informed Conditional Diffusion for Regional Sea-Level Data Downscaling

Figure 3 for Towards Kriging-informed Conditional Diffusion for Regional Sea-Level Data Downscaling

Figure 4 for Towards Kriging-informed Conditional Diffusion for Regional Sea-Level Data Downscaling

Abstract:Given coarser-resolution projections from global climate models or satellite data, the downscaling problem aims to estimate finer-resolution regional climate data, capturing fine-scale spatial patterns and variability. Downscaling is any method to derive high-resolution data from low-resolution variables, often to provide more detailed and local predictions and analyses. This problem is societally crucial for effective adaptation, mitigation, and resilience against significant risks from climate change. The challenge arises from spatial heterogeneity and the need to recover finer-scale features while ensuring model generalization. Most downscaling methods \cite{Li2020} fail to capture the spatial dependencies at finer scales and underperform on real-world climate datasets, such as sea-level rise. We propose a novel Kriging-informed Conditional Diffusion Probabilistic Model (Ki-CDPM) to capture spatial variability while preserving fine-scale features. Experimental results on climate data show that our proposed method is more accurate than state-of-the-art downscaling techniques.

Via

Access Paper or Ask Questions

Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results

Jul 01, 2024

Subhankar Ghosh, Jayant Gupta, Arun Sharma, Shuai An, Shashi Shekhar

Figure 1 for Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results

Figure 2 for Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results

Figure 3 for Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results

Figure 4 for Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results

Abstract:Given a set \emph{S} of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs $<$a region ($r_{g}$), a subset \emph{C} of \emph{S}$>$ such that \emph{C} is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner \cite{10.1145/3557989.3566158} that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, namely, multiple comparisons regional colocation miner (MultComp-RCM) which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost.

Via

Access Paper or Ask Questions

Towards Statistically Significant Taxonomy Aware Co-location Pattern Detection

Jun 29, 2024

Subhankar Ghosh, Arun Sharma, Jayant Gupta, Shashi Shekhar

Figure 1 for Towards Statistically Significant Taxonomy Aware Co-location Pattern Detection

Figure 2 for Towards Statistically Significant Taxonomy Aware Co-location Pattern Detection

Figure 3 for Towards Statistically Significant Taxonomy Aware Co-location Pattern Detection

Abstract:Given a collection of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types, the goal is to find the subsets of feature types or their parents whose spatial interaction is statistically significant. This problem is for taxonomy-reliant applications such as ecology (e.g., finding new symbiotic relationships across the food chain), spatial pathology (e.g., immunotherapy for cancer), retail, etc. The problem is computationally challenging due to the exponential number of candidate co-location patterns generated by the taxonomy. Most approaches for co-location pattern detection overlook the hierarchical relationships among spatial features, and the statistical significance of the detected patterns is not always considered, leading to potential false discoveries. This paper introduces two methods for incorporating taxonomies and assessing the statistical significance of co-location patterns. The baseline approach iteratively checks the significance of co-locations between leaf nodes or their ancestors in the taxonomy. Using the Benjamini-Hochberg procedure, an advanced approach is proposed to control the false discovery rate. This approach effectively reduces the risk of false discoveries while maintaining the power to detect true co-location patterns. Experimental evaluation and case study results show the effectiveness of the approach.

* Accepted in The 16th Conference on Spatial Information Theory (COSIT) 2024

Via

Access Paper or Ask Questions

Physics-Guided Abnormal Trajectory Gap Detection

Mar 10, 2024

Arun Sharma, Shashi Shekhar

Figure 1 for Physics-Guided Abnormal Trajectory Gap Detection

Figure 2 for Physics-Guided Abnormal Trajectory Gap Detection

Figure 3 for Physics-Guided Abnormal Trajectory Gap Detection

Figure 4 for Physics-Guided Abnormal Trajectory Gap Detection

Abstract:Given trajectories with gaps (i.e., missing data), we investigate algorithms to identify abnormal gaps in trajectories which occur when a given moving object did not report its location, but other moving objects in the same geographic region periodically did. The problem is important due to its societal applications, such as improving maritime safety and regulatory enforcement for global security concerns such as illegal fishing, illegal oil transfers, and trans-shipments. The problem is challenging due to the difficulty of bounding the possible locations of the moving object during a trajectory gap, and the very high computational cost of detecting gaps in such a large volume of location data. The current literature on anomalous trajectory detection assumes linear interpolation within gaps, which may not be able to detect abnormal gaps since objects within a given region may have traveled away from their shortest path. In preliminary work, we introduced an abnormal gap measure that uses a classical space-time prism model to bound an object's possible movement during the trajectory gap and provided a scalable memoized gap detection algorithm (Memo-AGD). In this paper, we propose a Space Time-Aware Gap Detection (STAGD) approach to leverage space-time indexing and merging of trajectory gaps. We also incorporate a Dynamic Region Merge-based (DRM) approach to efficiently compute gap abnormality scores. We provide theoretical proofs that both algorithms are correct and complete and also provide analysis of asymptotic time complexity. Experimental results on synthetic and real-world maritime trajectory data show that the proposed approach substantially improves computation time over the baseline technique.

Via

Access Paper or Ask Questions