In this paper, we explore the feasibility of using communication signals for extended target (ET) tracking in an integrated sensing and communication (ISAC) system. The ET is characterized by its center range, azimuth, orientation, and contour shape, which conventional scatterer-based tracking algorithms struggle to estimate because of the limited scatterer resolution in ISAC. To address this challenge, we propose ISACTrackNet, a deep learning-based tracking model that directly estimates the ET kinematic and contour parameters from noisy received echoes. The model consists of three modules: a Denoising module for clutter and self-interference suppression, an Encoder module for instantaneous state estimation, and a KalmanNet module for prediction refinement within a constant-velocity state-space model. Simulation results show that ISACTrackNet achieves near-optimal accuracy in position and angle estimation compared to radar-based tracking methods, even under limited measurement resolution and partial occlusions, whereas orientation and contour shape estimation remains slightly suboptimal. These results demonstrate the feasibility of using communication-only signals for reliable ET tracking.