The extremely large reconfigurable intelligent surface (XL-RIS) is emerging as a promising key technology for 6G systems. To exploit the full potential of XL-RIS, accurate channel estimation is essential. This paper investigates channel estimation in XL-RIS-aided massive MIMO systems under hybrid-field scenarios, where far-field and near-field channels coexist. We formulate this problem using dictionary learning, which allows joint optimization of the dictionary and the estimated channel. To handle the high-dimensional nature of XL-RIS channels, we adopt a convolutional dictionary learning (CDL) formulation. The CDL formulation is cast as a bilevel optimization problem, which we solve using a gradient-based approach. To address the challenge of computing the gradient of the upper-level objective, we introduce an unrolled optimization method based on proximal gradient descent (PGD) and its special case, the iterative soft-thresholding algorithm (ISTA). We propose two neural network architectures, Convolutional ISTA-Net and its enhanced version, Convolutional ISTA-Net+, for end-to-end optimization of the CDL. To overcome the limitations of linear convolutional filters in capturing complex hybrid-field channel structures, we propose the CNN-CDL approach, which enhances PGD by replacing the linear convolutional filters with CNN blocks in its gradient descent step, employing a learnable proximal mapping module in its proximal mapping step, and incorporating cross-layer feature integration. Simulation results demonstrate the effectiveness of the proposed methods for channel estimation in hybrid-field XL-RIS systems.
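
As a point of reference for the ISTA iteration on which the unrolled networks are based, the following is a minimal sketch of plain ISTA for the standard sparse coding problem min_x 0.5‖y − Ax‖² + λ‖x‖₁. It is not the proposed Convolutional ISTA-Net: the dense real-valued matrix `A`, the fixed step size 1/L, and the names `soft_threshold`, `ista`, and `objective` are assumptions made purely for illustration, whereas the channels considered here are complex-valued and the dictionary is convolutional and learned.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1: shrink every entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def objective(A, y, x, lam):
    """Lasso objective 0.5 * ||y - A x||_2^2 + lam * ||x||_1."""
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))


def ista(A, y, lam, num_iters=200):
    """Run a fixed number of ISTA iterations starting from x = 0.

    Illustrative assumption: A is a fixed, real-valued dense dictionary.
    """
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant
    # of the gradient of the data-fit term.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - y)                         # gradient descent step
        x = soft_threshold(x - step * grad, step * lam)  # proximal mapping step
    return x
```

In an unrolled formulation, each pass through the loop corresponds to one network layer, and quantities such as the step size, the threshold, and the operators applied in the gradient step become trainable parameters.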