Behavior prediction plays an important role in integrated autonomous driving software solutions. In behavior prediction research, interactive behavior prediction is a less-explored area, compared to single-agent behavior prediction. Predicting the motion of interactive agents requires initiating novel mechanisms to capture the joint behaviors of the interactive pairs. In this work, we formulate the end-to-end joint prediction problem as a sequential learning process of marginal learning and joint learning of vehicle behaviors. We propose ProspectNet, a joint learning block that adopts the weighted attention score to model the mutual influence between interactive agent pairs. The joint learning block first weighs the multi-modal predicted candidate trajectories, then updates the ego-agent's embedding via cross attention. Furthermore, we broadcast the individual future predictions for each interactive agent into a pair-wise scoring module to select the top $K$ prediction pairs. We show that ProspectNet outperforms the Cartesian product of two marginal predictions, and achieves comparable performance on the Waymo Interactive Motion Prediction benchmarks.