Introduction: Music provides an incredible avenue for individuals to express their thoughts and emotions, while also serving as a delightful mode of entertainment for enthusiasts and music lovers. Objectives: This paper presents a comprehensive approach to enhancing the user experience through the integration of emotion recognition, music recommendation, and explainable AI using GRAD-CAM. Methods: The proposed methodology utilizes a ResNet50 model trained on the Facial Expression Recognition (FER) dataset, consisting of real images of individuals expressing various emotions. Results: The system achieves an accuracy of 82% in emotion classification. By leveraging GRAD-CAM, the model provides explanations for its predictions, allowing users to understand the reasoning behind the system's recommendations. The model is trained on both FER and real user datasets, which include labelled facial expressions, and real images of individuals expressing various emotions. The training process involves pre-processing the input images, extracting features through convolutional layers, reasoning with dense layers, and generating emotion predictions through the output layer. Conclusion: The proposed methodology, leveraging the Resnet50 model with ROI-based analysis and explainable AI techniques, offers a robust and interpretable solution for facial emotion detection paper.