We propose a new model for supervised learning to rank. In our model, the relevancy labels are are assumed to follow a categorical distribution whose probabilities are constructed based on a scoring function. We optimize the training objective with respect to the multivariate categorical variables with an unbiased and low-variance gradient estimator. Learning to rank methods can generally be categorized into pointwise, pairwise, and listwise approaches. Our approach belongs to the class of pointwise methods. Although it has previously been reported that pointwise methods cannot achieve as good performance as of pairwise or listwise approaches, we show that the proposed method achieves better or comparable results on two datasets compared with pairwise and listwise methods.