Sandeep Kumar Shukla

CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution

May 21, 2025

Minghao Shao, Haoran Xi, Nanda Rani, Meet Udeshi, Venkata Sai Charan Putrevu, Kimberly Milner, Brendan Dolan-Gavitt, Sandeep Kumar Shukla, Prashanth Krishnamurthy, Farshad Khorrami(+2 more)

Abstract: Large Language Model (LLM) agents can automate cybersecurity tasks and adapt to the evolving cybersecurity landscape without re-engineering. While LLM agents have demonstrated cybersecurity capabilities in Capture-The-Flag (CTF) competitions, they have two key limitations: accessing the latest cybersecurity expertise beyond their training data, and integrating new knowledge into complex task planning. Knowledge-based approaches that incorporate technical understanding into task-solving automation can tackle these limitations. We present CRAKEN, a knowledge-based LLM agent framework that improves cybersecurity capability through three core mechanisms: contextual decomposition of task-critical information, iterative self-reflected knowledge retrieval, and knowledge-hint injection that transforms insights into adaptive attack strategies. Comprehensive evaluations with different configurations show CRAKEN's effectiveness in multi-stage vulnerability detection and exploitation compared to previous approaches. Our extensible architecture establishes new methodologies for embedding new security knowledge into LLM-driven cybersecurity agentic systems. With a knowledge database of CTF writeups, CRAKEN obtained an accuracy of 22% on NYU CTF Bench, outperforming prior work by 3% and achieving state-of-the-art results. On an evaluation of MITRE ATT&CK techniques, CRAKEN solves 25-30% more techniques than prior work, demonstrating improved cybersecurity capabilities via knowledge-based execution. We make our framework open source at https://github.com/NYU-LLM-CTF/nyuctf_agents_craken.

The Hidden Risks of LLM-Generated Web Application Code: A Security-Centric Evaluation of Code Generation Capabilities in Large Language Models

Apr 29, 2025

Swaroop Dora, Deven Lunkad, Naziya Aslam, S. Venkatesan, Sandeep Kumar Shukla

Abstract: The rapid advancement of Large Language Models (LLMs) has enhanced software development processes, minimizing the time and effort required for coding and enhancing developer productivity. However, despite their potential benefits, LLMs have been shown to generate insecure code in controlled environments, raising critical concerns about their reliability and security in real-world applications. This paper uses predefined security parameters to evaluate the security compliance of LLM-generated code across multiple models, such as ChatGPT, DeepSeek, Claude, Gemini and Grok. The analysis reveals critical vulnerabilities in authentication mechanisms, session management, input validation and HTTP security headers. Although some models implement security measures to a limited extent, none fully align with industry best practices, highlighting the associated risks in automated software development. Our findings underscore that human expertise is crucial to ensure secure software deployment or review of LLM-generated code. There is also a need for robust security assessment frameworks to enhance the reliability of LLM-generated code in real-world applications.

* 9 pages

Automated Classification of Cybercrime Complaints using Transformer-based Language Models for Hinglish Texts

Dec 21, 2024

Nanda Rani, Divyanshu Singh, Bikash Saha, Sandeep Kumar Shukla

Abstract: The rise in cybercrime and the complexity of multilingual and code-mixed complaints present significant challenges for law enforcement and cybersecurity agencies. These organizations need automated, scalable methods to identify crime types, enabling efficient processing and prioritization of large complaint volumes. Manual triaging is inefficient, and traditional machine learning methods fail to capture the semantic and contextual nuances of textual cybercrime complaints. Moreover, the lack of publicly available datasets and privacy concerns hinder research on robust solutions. To address these challenges, we propose a framework for automated cybercrime complaint classification. The framework leverages Hinglish-adapted transformers, such as HingBERT and HingRoBERTa, to handle code-mixed inputs effectively. We employ the real-world dataset provided by the Indian Cybercrime Coordination Centre (I4C) during the CyberGuard AI Hackathon 2024. We employ a data augmentation method based on open-source GenAI models to address class imbalance. We also employ privacy-aware preprocessing to ensure compliance with ethical standards while maintaining data integrity. Our solution achieves significant performance improvements, with HingRoBERTa attaining an accuracy of 74.41% and an F1-score of 71.49%. We also develop a ready-to-use tool by integrating a Django REST backend with a modern frontend. The developed tool is scalable and ready for real-world deployment in platforms like the National Cyber Crime Reporting Portal. This work bridges critical gaps in cybercrime complaint management, offering a scalable, privacy-conscious, and adaptable solution for modern cybersecurity challenges.

DNS based In-Browser Cryptojacking Detection

May 10, 2022

Rohit Kumar Sachan, Rachit Agarwal, Sandeep Kumar Shukla

Abstract: The metadata aspect of Domain Names (DNs) enables us to perform a behavioral study of DNs and detect if a DN is involved in in-browser cryptojacking. Thus, we are motivated to study different temporal and behavioral aspects of DNs involved in cryptojacking. We use temporal features such as query frequency and query burst, along with graph-based features such as degree and diameter, and non-temporal features such as string-based features, to detect whether a DN is suspected of being involved in in-browser cryptojacking. Then, we use them to train Machine Learning (ML) algorithms over different temporal granularities, such as 2-hour datasets and the complete dataset. Our results show that the Decision Tree classifier performs the best, with 59.5% recall on cryptojacked DNs, while for unsupervised learning, K-Means with K=2 performs the best. Similarity analysis of the features reveals a minimal divergence between the cryptojacking DNs and other already known malicious DNs. It also reveals the need for improvements in the feature set of state-of-the-art methods to improve their accuracy in detecting in-browser cryptojacking. As an added analysis, our signature-based analysis identifies that none of the Indian Government websites were involved in cryptojacking during October-December 2021. However, based on resource utilization, we identify 10 DNs with different properties than the others.

* Submitted
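The temporal features named in the abstract above (query frequency and query burst over 2-hour granularities) can be illustrated with a minimal sketch. The abstract does not define these features precisely, so the definitions here are assumptions: a "burst" is taken to be any window with at least `burst_threshold` queries, and the function names and toy timestamps are hypothetical, not taken from the paper.

```python
from collections import Counter

WINDOW_SECONDS = 2 * 60 * 60  # 2-hour granularity, as mentioned in the abstract


def window_counts(timestamps, window=WINDOW_SECONDS):
    """Count queries per fixed-size time window (window index -> count)."""
    return Counter(int(t // window) for t in timestamps)


def temporal_features(timestamps, burst_threshold=3, window=WINDOW_SECONDS):
    """Toy temporal features for one domain name.

    ASSUMPTION: a 'burst' is a window with >= burst_threshold queries;
    the paper may define bursts differently.
    """
    counts = window_counts(timestamps, window)
    total = len(timestamps)
    active = len(counts)
    return {
        "total_queries": total,
        "mean_queries_per_active_window": total / active if active else 0.0,
        "burst_windows": sum(1 for c in counts.values() if c >= burst_threshold),
    }


# Hypothetical query timestamps (seconds since start of capture) for one DN.
queries = [10, 20, 30, 40, 7300, 15000, 15010]
features = temporal_features(queries)
print(features)
```

Feature dictionaries of this kind, one per domain name and per time window, are the sort of input a classifier or a clustering algorithm would then consume.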

EPASAD: Ellipsoid decision boundary based Process-Aware Stealthy Attack Detector

Apr 08, 2022

Vikas Maurya, Rachit Agarwal, Saurabh Kumar, Sandeep Kumar Shukla

Abstract: Because Critical Infrastructures (CIs) are vital to a nation's economy, they have been lucrative targets for cyber attackers. These critical infrastructures are usually Cyber-Physical Systems (CPS) such as power grids, water and sewage treatment facilities, oil and gas pipelines, etc. In recent times, these systems have suffered cyber attacks numerous times. Researchers have been developing cyber security solutions for CIs to avoid lasting damage. According to standard frameworks, cyber security based on identification, protection, detection, response, and recovery is at the core of this research. Detection of an ongoing attack that escapes standard protection such as firewalls, anti-virus, and host/network intrusion detection has gained importance, as such attacks eventually affect the physical dynamics of the system. Therefore, anomaly detection in physical dynamics proves an effective means to implement defense-in-depth. PASAD is one example of anomaly detection in the sensor/actuator data, representing such systems' physical dynamics. We present EPASAD, which improves the detection technique used in PASAD to detect micro-stealthy attacks, which, as our experiments show, PASAD's spherical boundary-based detection fails to detect. Our method EPASAD overcomes this by using ellipsoid boundaries, thereby tightening the boundaries in various dimensions, whereas a spherical boundary treats all dimensions equally. We validate EPASAD using the dataset produced by the TE-process simulator and the C-town datasets. The results show that EPASAD improves PASAD's average recall by 5.8% and 9.5% for the two datasets, respectively.

* Submitted
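The contrast drawn in the abstract above, between a spherical boundary that treats all dimensions equally and an ellipsoid boundary that tightens per dimension, can be shown with a toy sketch. This is not the EPASAD algorithm: it assumes an axis-aligned ellipsoid scaled by per-dimension standard deviation, uses invented two-dimensional data, and all function names are hypothetical.

```python
import math


def centroid(points):
    """Mean of the training points, per dimension."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def per_dim_std(points, center):
    """Population standard deviation per dimension (assumed non-zero here)."""
    n = len(points)
    return [
        math.sqrt(sum((p[d] - center[d]) ** 2 for p in points) / n)
        for d in range(len(center))
    ]


def spherical_score(x, center):
    """Plain Euclidean distance: every dimension counts equally."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, center)))


def ellipsoid_score(x, center, std):
    """Distance with each dimension scaled by its own spread."""
    return math.sqrt(sum(((xi - ci) / s) ** 2 for xi, ci, s in zip(x, center, std)))


# Toy "normal operation" data: wide spread in dimension 0, narrow in dimension 1.
normal = [(-10.0, -0.1), (-5.0, 0.1), (0.0, 0.0), (5.0, -0.1), (10.0, 0.1)]
c = centroid(normal)
s = per_dim_std(normal, c)

# Thresholds: the largest score seen on normal data.
sphere_threshold = max(spherical_score(p, c) for p in normal)
ellipsoid_threshold = max(ellipsoid_score(p, c, s) for p in normal)

# A small deviation confined to the narrow dimension.
stealthy = (0.0, 2.0)
sphere_alarm = spherical_score(stealthy, c) > sphere_threshold
ellipsoid_alarm = ellipsoid_score(stealthy, c, s) > ellipsoid_threshold
print(sphere_alarm, ellipsoid_alarm)
```

In this toy setting the deviation stays well inside the sphere, whose radius is set by the wide dimension, but falls outside the ellipsoid, which is the intuition behind tightening the boundary per dimension.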

Towards Malicious address identification in Bitcoin

Dec 22, 2021

Deepesh Chaudhari, Rachit Agarwal, Sandeep Kumar Shukla

Abstract: The temporal aspect of blockchain transactions enables us to study an address's behavior and detect if it is involved in any illicit activity. However, due to the concept of change addresses (used to thwart replay attacks), temporal aspects are not directly applicable in the Bitcoin blockchain. Several pre-processing steps should be performed before such temporal aspects are utilized. We are motivated to study the Bitcoin transaction network and use temporal features such as burst, attractiveness, and inter-event time, along with several graph-based properties such as node degree and clustering coefficient, to validate the applicability to the Bitcoin blockchain of existing approaches known for other cryptocurrency blockchains. We generate the temporal and non-temporal feature sets and train Machine Learning (ML) algorithms over different temporal granularities to validate the state-of-the-art methods. We study the behavior of the addresses over different time granularities of the dataset. We identify that, after applying change-address clustering in Bitcoin, existing temporal features can be extracted and ML approaches can be applied. A comparative analysis of results shows that the behavior of addresses in Ethereum and Bitcoin is similar with respect to in-degree, out-degree and inter-event time. Further, we identify 3 suspects that showed malicious behavior across different temporal granularities. These suspects are not marked as malicious in Bitcoin.
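Three of the features compared in the abstract above (in-degree, out-degree and inter-event time) can be computed from a transaction list with a short sketch. It assumes that change-address clustering has already been applied, so each label stands for one clustered entity. The `(sender, receiver, timestamp)` tuple layout, the function name and the sample transactions are hypothetical.

```python
from collections import defaultdict


def address_features(transactions):
    """Compute in-degree, out-degree and mean inter-event time per address.

    ASSUMPTION: `transactions` is an iterable of (sender, receiver, timestamp)
    tuples, with addresses already merged by change-address clustering.
    """
    senders_of = defaultdict(set)    # address -> distinct addresses paying it
    receivers_of = defaultdict(set)  # address -> distinct addresses it pays
    times = defaultdict(list)        # address -> timestamps of all its events

    for sender, receiver, ts in transactions:
        receivers_of[sender].add(receiver)
        senders_of[receiver].add(sender)
        times[sender].append(ts)
        times[receiver].append(ts)

    features = {}
    for addr, ts_list in times.items():
        ordered = sorted(ts_list)
        gaps = [b - a for a, b in zip(ordered, ordered[1:])]
        features[addr] = {
            "in_degree": len(senders_of[addr]),
            "out_degree": len(receivers_of[addr]),
            # None when the address has a single event and no gap exists.
            "mean_inter_event_time": sum(gaps) / len(gaps) if gaps else None,
        }
    return features


# Hypothetical transactions: (sender, receiver, unix-like timestamp).
txs = [("A", "B", 100), ("A", "C", 160), ("B", "C", 400), ("C", "A", 1000)]
feats = address_features(txs)
print(feats["A"])
```

Restricting `transactions` to a given time slice before calling the function gives the same features at a different temporal granularity.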

Vulnerability and Transaction behavior based detection of Malicious Smart Contracts

Jun 25, 2021

Rachit Agarwal, Tanmay Thapliyal, Sandeep Kumar Shukla

Abstract: Smart Contracts (SCs) in Ethereum can automate tasks and provide different functionalities to a user. Such automation is enabled by the 'Turing-complete' nature of the programming language (Solidity) in which SCs are written. This also opens up different vulnerabilities and bugs in SCs that malicious actors exploit to carry out malicious or illegal activities on the cryptocurrency platform. In this work, we study the correlation between malicious activities and the vulnerabilities present in SCs and find that some malicious activities are correlated with certain types of vulnerabilities. We then develop and study the feasibility of a scoring mechanism that corresponds to the severity of the vulnerabilities present in SCs to determine if it is a relevant feature to identify suspicious SCs. We analyze the utility of the severity score for detecting suspicious SCs using unsupervised machine learning (ML) algorithms across different temporal granularities and identify behavioral changes. In our experiments with on-chain SCs, we were able to find a total of 1094 benign SCs across different granularities which behave similarly to malicious SCs, with the inclusion of the smart contract vulnerability scores in the feature set.

* Submitted to a conf

Identifying malicious accounts in Blockchains using Domain Names and associated temporal properties

Jun 25, 2021

Rohit Kumar Sachan, Rachit Agarwal, Sandeep Kumar Shukla

Abstract: The rise in the adoption of blockchain technology has led to increased illegal activities by cyber-criminals, costing billions of dollars. Many machine learning algorithms are applied to detect such illegal behavior. These algorithms are often trained on the transaction behavior and, in some cases, on the vulnerabilities that exist in the system. In our approach, we study the feasibility of using metadata such as the Domain Name (DN) associated with an account in the blockchain to identify whether an account should be tagged malicious or not. Here, we leverage the temporal aspects attached to the DNs. Our results identify 144930 DNs that show malicious behavior, and out of these, 54114 DNs show persistent malicious behavior over time. Nonetheless, none of these identified malicious DNs were reported among the newly officially tagged malicious blockchain DNs.

* Submitted to a journal

Sequence to sequence deep learning models for solar irradiation forecasting

Apr 30, 2019

Bhaskar Pratim Mukhoty, Vikas Maurya, Sandeep Kumar Shukla

Abstract: The energy output of a photovoltaic (PV) panel is a function of solar irradiation and weather parameters such as temperature and wind speed. A general measure for solar irradiation called Global Horizontal Irradiance (GHI), customarily reported in Watt/meter$^2$, is a generic indicator for this intermittent energy resource. An accurate prediction of GHI is necessary for reliable grid integration of the renewable as well as for power market trading. While some machine learning techniques are well introduced along with the traditional time-series forecasting techniques, deep-learning techniques remain less explored for the task at hand. In this paper we give deep learning models suitable for sequence-to-sequence prediction of GHI. The deep learning models are reported for short-term forecasting ($\{1-24\}$ hours) along with state-of-the-art techniques like Gradient Boosted Regression Trees (GBRT) and Feed Forward Neural Networks (FFNN). We have checked that spatio-temporal features like wind direction, wind speed and GHI of neighboring locations improve the prediction accuracy of the deep learning models significantly. Among the various sequence-to-sequence encoder-decoder models, LSTM performed best, handling shortcomings of the state-of-the-art techniques.
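Sequence-to-sequence forecasting as described in the abstract above needs training pairs in which a window of past observations maps to a window of future ones. The sketch below shows one common way to build such pairs from an hourly series. The window lengths, the function name and the placeholder values are assumptions for illustration, not details taken from the paper.

```python
def make_seq2seq_pairs(series, input_len=24, output_len=24):
    """Slide over `series` and emit (past_window, future_window) pairs.

    ASSUMPTION: `series` is an evenly spaced (e.g. hourly) list of values;
    the default window lengths are illustrative only.
    """
    pairs = []
    last_start = len(series) - input_len - output_len
    for start in range(last_start + 1):
        split = start + input_len
        past = series[start:split]
        future = series[split:split + output_len]
        pairs.append((past, future))
    return pairs


# Placeholder values standing in for hourly GHI readings (not real data).
hourly_ghi = [float(h) for h in range(10)]
pairs = make_seq2seq_pairs(hourly_ghi, input_len=4, output_len=2)
print(len(pairs), pairs[0])
```

An encoder-decoder model would read each past window with its encoder and be trained to emit the matching future window from its decoder. Extra inputs such as wind speed could be added by making each time step a tuple of values rather than a single number.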
