This paper investigates the application of Deep Reinforcement Learning (DRL) for attributing malware to specific Advanced Persistent Threat (APT) groups through detailed behavioural analysis. By analysing over 3500 malware samples from 12 distinct APT groups, the study utilises sophisticated tools like Cuckoo Sandbox to extract behavioural data, providing a deep insight into the operational patterns of malware. The research demonstrates that the DRL model significantly outperforms traditional machine learning approaches such as SGD, SVC, KNN, MLP, and Decision Tree Classifiers, achieving an impressive test accuracy of 89.27 %. It highlights the model capability to adeptly manage complex, variable, and elusive malware attributes. Furthermore, the paper discusses the considerable computational resources and extensive data dependencies required for deploying these advanced AI models in cybersecurity frameworks. Future research is directed towards enhancing the efficiency of DRL models, expanding the diversity of the datasets, addressing ethical concerns, and leveraging Large Language Models (LLMs) to refine reward mechanisms and optimise the DRL framework. By showcasing the transformative potential of DRL in malware attribution, this research advocates for a responsible and balanced approach to AI integration, with the goal of advancing cybersecurity through more adaptable, accurate, and robust systems.