Hierarchical search in millimeter-wave (mmWave) communications incurs significant beam training overhead and delay, especially in a dynamic environment. Deep learning-enabled beam prediction is promising to significantly mitigate the overhead and delay, efficiently utilizing the site-specific channel prior. In this work, we propose to jointly optimize a data- and model-driven probe beam module and a cascaded data-driven beam predictor, with limitations in that the probe and communicate beams are restricted within the manifold space of uniform planer array and quantization of the phase modulator. First, The probe beam module senses the mmWave channel with a complex-valued neural network and outputs the counterpart RSRPs of probe beams. Second, the beam predictor estimates the RSRPs in the entire beamspace to minimize the prediction cross entropy and selects the optimal beam with the maximum RSRP value for data transmission. Additionally, we propose to add noise to the phase variables in the probe beam module, against quantization error. Simulation results show the effectiveness of our proposed scheme.