Abstract:This paper provides an in-depth evaluation of three state-of-the-art Large Language Models (LLMs) for personalized career mentoring in the computing field, using three distinct student profiles that consider gender, race, and professional levels. We evaluated the performance of GPT-4, LLaMA 3, and Palm 2 using a zero-shot learning approach without human intervention. A quantitative evaluation was conducted through a custom natural language processing analytics pipeline to highlight the uniqueness of the responses and to identify words reflecting each student's profile, including race, gender, or professional level. The analysis of frequently used words in the responses indicates that GPT-4 offers more personalized mentoring compared to the other two LLMs. Additionally, a qualitative evaluation was performed to see if human experts reached similar conclusions. The analysis of survey responses shows that GPT-4 outperformed the other two LLMs in delivering more accurate and useful mentoring while addressing specific challenges with encouragement languages. Our work establishes a foundation for developing personalized mentoring tools based on LLMs, incorporating human mentors in the process to deliver a more impactful and tailored mentoring experience.