We perform quantum process tomography (QPT) for both discrete- and continuous-variable quantum systems by learning a process representation in terms of Kraus operators. The Kraus form ensures that the reconstructed process is completely positive. To make the process trace-preserving, we optimize the Kraus operators with constrained gradient descent (GD) on the Stiefel manifold. Our ansatz uses only a few Kraus operators, which avoids direct estimation of large process matrices, e.g., the Choi matrix, for low-rank quantum processes. In benchmarks with random two-qubit processes, GD-QPT matches the performance of both compressed-sensing (CS) and projected least-squares (PLS) QPT, while combining the best features of these two methods. Like CS (but unlike PLS), GD-QPT can reconstruct a process from just a small number of random measurements; like PLS (but unlike CS), it also works for larger system sizes, up to at least five qubits. We envisage that the data-driven approach of GD-QPT can become a practical tool that greatly reduces the cost and computational effort of QPT in intermediate-scale quantum systems.
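To illustrate why the Stiefel manifold is the natural constraint set, the following minimal NumPy sketch stacks k Kraus operators K_i of dimension d into a single (k·d) × d matrix. That matrix has orthonormal columns exactly when sum_i K_i† K_i = I, i.e., when the process is trace-preserving. The sketch takes one unconstrained update step and then maps the result back onto the manifold. Several parts are assumptions made for illustration and are not taken from the text above: the QR-based retraction (the retraction used in GD-QPT may differ), the helper names (`stack`, `unstack`, `retract_qr`, `apply_channel`, `tp_defect`), the values of d, k, and the step size, and the random placeholder that stands in for the gradient of a data-fit loss.

```python
import numpy as np

def stack(kraus):
    """Stack k Kraus operators of shape (k, d, d) into one (k*d, d) matrix."""
    k, d, _ = kraus.shape
    return kraus.reshape(k * d, d)

def unstack(stacked, d):
    """Inverse of stack: split a (k*d, d) matrix into operators of shape (d, d)."""
    return stacked.reshape(-1, d, d)

def retract_qr(stacked):
    """Map a full-rank (k*d, d) matrix onto the Stiefel manifold via reduced QR.

    Assumption: QR is chosen here for simplicity; other retractions exist.
    """
    q, r = np.linalg.qr(stacked)        # q has orthonormal columns
    diag = np.diag(r)
    return q * (diag / np.abs(diag))    # fix column phases so the map is unique

def apply_channel(kraus, rho):
    """Return sum_i K_i rho K_i^dagger."""
    return np.einsum("kab,bc,kdc->ad", kraus, rho, kraus.conj())

def tp_defect(kraus):
    """Frobenius distance of sum_i K_i^dagger K_i from the identity."""
    d = kraus.shape[-1]
    total = np.einsum("kba,kbc->ac", kraus.conj(), kraus)
    return np.linalg.norm(total - np.eye(d))

rng = np.random.default_rng(0)
d, k = 4, 3  # two qubits with a rank-3 ansatz (illustrative values)

# Random starting point on the Stiefel manifold.
init = rng.normal(size=(k * d, d)) + 1j * rng.normal(size=(k * d, d))
kraus = unstack(retract_qr(init), d)

# Placeholder for the gradient of a data-fit loss (random, for illustration only).
grad = rng.normal(size=kraus.shape) + 1j * rng.normal(size=kraus.shape)

# Unconstrained step, then retraction to restore sum_i K_i^dagger K_i = I.
updated = unstack(retract_qr(stack(kraus - 0.1 * grad)), d)

# Apply the updated process to the state |0><0|.
rho = np.zeros((d, d), dtype=complex)
rho[0, 0] = 1.0
out = apply_channel(updated, rho)

print("trace-preservation defect:", tp_defect(updated))
print("output trace:", np.trace(out).real)
```

Because the constraint is restored after every step, each iterate is a valid completely positive, trace-preserving map. The number of learned parameters scales with k·d² rather than with the d⁴ entries of a full Choi matrix.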