Dictionary learning is a popular class of methods for modeling complex data by learning sparse representations directly from the data. For some large-scale applications, exploiting a known structure of the signal is often essential for reducing the complexity of algorithms and representations. One such method is tensor factorization by which a large multi-dimensional dataset can be explicitly factored or separated along each dimension of the data in order to break the representation up into smaller components. Learning dictionaries for tensor structured data is called tensor or separable dictionary learning. While there have been many recent works on separable dictionary learning, typical formulations involve solving a non-convex optimization problem and guaranteeing global optimality remains a challenge. In this work, we propose a framework that uses recent developments in matrix/tensor factorization to provide theoretical and numerical guarantees of the global optimality for the separable dictionary learning problem. We will demonstrate our algorithm on diffusion magnetic resonance imaging (dMRI) data, a medical imaging modality which measures water diffusion along multiple angular directions in every voxel of an MRI volume. For this application, state-of-the-art methods learn dictionaries for the angular domain of the signals without consideration for the spatial domain. In this work, we apply the proposed separable dictionary learning method to learn spatial and angular dMRI dictionaries jointly and show results on denoising phantom and real dMRI brain data.