Six-dimensional movable antenna (6DMA) is an innovative technology that improves wireless network capacity by adjusting the 3D positions and 3D rotations of antenna surfaces according to the spatial distribution of the channels. However, existing works on 6DMA have assumed a central processing unit (CPU) that jointly processes the signals of all 6DMA surfaces to execute various tasks. This inevitably incurs a prohibitively high processing cost for channel estimation. Therefore, we propose a distributed 6DMA processing architecture that reduces the processing complexity of the CPU by equipping each 6DMA surface with a local processing unit (LPU). In particular, we unveil for the first time a new \textbf{\textit{directional sparsity}} property of 6DMA channels: each user has significant channel gains only for a (small) subset of 6DMA position-rotation pairs, namely those that can receive direct or reflected signals from that user. In addition, we propose a practical three-stage protocol for the 6DMA-equipped base station (BS) to conduct (i) statistical channel state information (CSI) acquisition for all candidate 6DMA positions/rotations, (ii) 6DMA position/rotation optimization, and (iii) instantaneous channel estimation for user data transmission with the optimized 6DMA positions/rotations. Specifically, we leverage directional sparsity to develop distributed algorithms for joint sparsity detection and channel power estimation, as well as for directional sparsity-aided instantaneous channel estimation. Using the estimated channel powers, we then develop a channel power-based optimization algorithm that maximizes the users' ergodic sum rate by optimizing the antenna positions/rotations. Simulation results show that our channel estimation algorithms are more accurate than the benchmarks while requiring lower pilot overhead, and that our optimization outperforms fluid/movable antennas optimized only in two dimensions (2D), even when the latter have perfect instantaneous CSI.