Cyber attacks continue to pose significant threats to individuals and organizations, stealing sensitive data such as personally identifiable information, financial information, and login credentials. Hence, detecting malicious websites before they cause any harm is critical to preventing fraud and monetary loss. To address the increasing number of phishing attacks, protective mechanisms must be highly responsive, adaptive, and scalable. Fortunately, advances in the field of machine learning, coupled with access to vast amounts of data, have led to the adoption of various deep learning models for timely detection of these cyber crimes. This study focuses on the detection of phishing websites using deep learning models such as Multi-Head Attention, Temporal Convolutional Network (TCN), BI-LSTM, and LSTM where URLs of the phishing websites are treated as a sequence. The results demonstrate that Multi-Head Attention and BI-LSTM model outperform some other deep learning-based algorithms such as TCN and LSTM in producing better precision, recall, and F1-scores.