\textit{Purpose} Estimating the interaction forces between instruments and tissue is of interest, particularly for providing haptic feedback during robot-assisted minimally invasive interventions. Approaches based on external and integrated force sensors have been proposed, but they are hampered by friction, sensor size, and sterilizability. We investigate a novel approach that estimates the force vector directly from optical coherence tomography (OCT) image volumes. \textit{Methods} We introduce a novel Siamese 3D CNN architecture. The network takes an undeformed reference volume and a deformed sample volume as input and outputs the three components of the force vector. We employ a deep residual architecture with bottlenecks for increased efficiency. We compare the Siamese approach to methods using difference volumes and two-dimensional projections. Data were generated with a robotic setup that provides ground-truth force vectors for silicone tissue phantoms as well as porcine tissue. \textit{Results} Our method achieves a mean absolute error of $7.7 \pm 4.3$\,mN when estimating the force vector. The Siamese 3D CNN outperforms single-path methods, which achieve a mean absolute error of $11.59 \pm 6.7$\,mN. Moreover, using volume data leads to significantly higher performance than processing only surface information, which achieves a mean absolute error of $24.38 \pm 22.0$\,mN. On the tissue dataset, our method shows good generalization across different subjects. \textit{Conclusions} We propose a novel image-based force estimation method using optical coherence tomography. We illustrate that capturing the deformation of subsurface structures substantially improves force estimation. Our approach can provide accurate force estimates in surgical setups that use intraoperative optical coherence tomography.
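To make the Siamese idea concrete, the following is a minimal, untrained pure-Python sketch, not the authors' implementation. It assumes a single shared 3D convolution layer with ReLU and global average pooling as a stand-in for the deep residual bottleneck encoder, and it assumes the two feature vectors are fused by concatenation before a linear head that outputs three force components. The class \texttt{SiameseForceRegressor}, its helper functions, and all layer sizes are hypothetical.

```python
import random

# Hypothetical toy sketch: one shared 3D conv layer + global average pooling
# stands in for the deep residual bottleneck encoder described in the abstract.


def conv3d_valid(vol, kernel):
    """Single-channel 3D cross-correlation, stride 1, no padding ('valid')."""
    k = len(kernel)  # cubic k x k x k kernel
    depth, height, width = len(vol), len(vol[0]), len(vol[0][0])
    out = []
    for z in range(depth - k + 1):
        plane = []
        for y in range(height - k + 1):
            row = []
            for x in range(width - k + 1):
                acc = 0.0
                for dz in range(k):
                    for dy in range(k):
                        for dx in range(k):
                            acc += vol[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def relu_global_avg_pool(fmap):
    """Apply ReLU, then average over every voxel of the feature map."""
    vals = [max(0.0, v) for plane in fmap for row in plane for v in row]
    return sum(vals) / len(vals)


class SiameseForceRegressor:
    """Untrained two-path regressor; both paths use the SAME encoder weights."""

    def __init__(self, n_kernels=4, k=3, seed=0):
        rng = random.Random(seed)
        self.kernels = [
            [[[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(k)] for _ in range(k)]
            for _ in range(n_kernels)
        ]
        # Linear head on concatenated features -> (F_x, F_y, F_z).
        # Fusion by concatenation is an assumption, not taken from the source.
        self.head_w = [[rng.gauss(0.0, 0.1) for _ in range(2 * n_kernels)] for _ in range(3)]
        self.head_b = [0.0, 0.0, 0.0]

    def encode(self, vol):
        return [relu_global_avg_pool(conv3d_valid(vol, kern)) for kern in self.kernels]

    def features(self, reference, sample):
        # Shared weights: the identical encoder processes both volumes.
        return self.encode(reference) + self.encode(sample)

    def __call__(self, reference, sample):
        feats = self.features(reference, sample)
        return [sum(w * f for w, f in zip(ws, feats)) + b
                for ws, b in zip(self.head_w, self.head_b)]


def random_volume(depth, height, width, seed):
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(width)] for _ in range(height)] for _ in range(depth)]


model = SiameseForceRegressor()
reference = random_volume(8, 8, 8, seed=1)
deformed = random_volume(8, 8, 8, seed=2)
force = model(reference, deformed)  # three untrained (meaningless) force components
print(force)
```

Because both volumes pass through the same kernels, their features lie in a common space, so the regression head can compare the undeformed and deformed representations directly; with identical inputs, the two feature halves are identical.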