Communication in high frequencies such as millimeter wave and terahertz suffer from high path-loss and intense shadowing which necessitates beamforming for reliable data transmission. On the other hand, at high frequencies the channels are sparse and consist of few spatial clusters. Therefore, beam alignment (BA) strategies are used to find the direction of these channel clusters and adjust the width of the beam used for data transmission. In this work, a single-user uplink scenario where the channel has one dominant cluster is considered. It is assumed that the user transmits a set of BA packets over a fixed duration. Meanwhile, the base-station (BS) uses different probing beams to scan different angular regions. Since the BS measurements are noisy, it is not possible to find a narrow beam that includes the angle of arrival (AoA) of the user with probability one. Therefore, the BS allocates a narrow beam to the user which includes the AoA of the user with a predetermined error probability while minimizing the expected beamwidth of the allocated beam. Due to intractability of this noisy BA problem, here this problem is posed as an end-to-end optimization of a deep neural network (DNN) and effects of different loss functions are discussed and investigated. It is observed that the proposed DNN based BA, at high SNRs, achieves a performance close to that of the optimal BA when there is no-noise and for all SNRs, outperforms state-of-the-art.