An orthogonal affine-precoded superimposed pilot (AP-SIP)-based architecture is developed for cyclic prefix (CP)-aided single-input single-output (SISO) and multiple-input multiple-output (MIMO) orthogonal time frequency space (OTFS) systems relying on arbitrary transmitter-receiver pulse shaping. The data and pilot symbol matrices are affine-precoded and superimposed in the delay-Doppler (DD) domain, followed by the development of an end-to-end DD-domain input-output relationship for the symbols. At the receiver, the decoupled pilot and data symbols are extracted by employing orthogonal precoder matrices, which eliminates their mutual interference. Furthermore, a novel pilot-aided Bayesian learning (PA-BL) technique based on the expectation-maximization (EM) algorithm is conceived for the channel state information (CSI) estimation of SISO OTFS systems. Subsequently, a data-aided Bayesian learning (DA-BL)-based joint CSI estimation and data detection technique is proposed, which beneficially harnesses the estimated data symbols for improved CSI estimation. In this scenario, the data detection rule also incorporates the CSI estimation uncertainty into the linear minimum mean square error (LMMSE) detectors. The AP-SIP framework is also extended to MIMO OTFS systems, wherein the DD-domain input matrix is affine-precoded for each transmit antenna (TA). An EM algorithm-based PA-BL scheme is then derived for simultaneous row-group sparse CSI estimation in this system, followed by a DA-BL scheme that performs joint CSI estimation and data detection. Moreover, the Bayesian Cramér-Rao bounds (BCRBs) are derived for both SISO and MIMO OTFS systems. Finally, simulation results are presented for characterizing the performance of the proposed CSI estimation techniques in a range of typical settings, along with their bit error rate (BER) performance in comparison to an ideal system having perfect CSI.
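
The decoupling achieved by orthogonal precoder matrices can be illustrated with a minimal toy sketch. It is not the OTFS input-output model of this work: it assumes that the channel acts as a generic matrix H multiplying the transmitted frame from the left, that the link is noiseless, and that the precoders are rows of a normalized Hadamard matrix. All names (P_PILOT, P_DATA, superimpose, decouple) and the numerical values are hypothetical and chosen only for illustration.

```python
# Toy illustration of orthogonal affine precoding with superimposed pilots.
# Assumptions (not from the source): generic left-multiplying channel H,
# no noise, Hadamard-based precoders, 2x2 pilot and data blocks.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Rows of a normalized 4x4 Hadamard matrix are mutually orthonormal.
HADAMARD = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]
P_PILOT = HADAMARD[:2]  # hypothetical pilot precoder (2 x 4)
P_DATA = HADAMARD[2:]   # hypothetical data precoder (2 x 4), orthogonal to P_PILOT

X_PILOT = [[1.0, 1.0], [1.0, -1.0]]    # known pilot block
X_DATA = [[1.0, -1.0], [-1.0, -1.0]]   # BPSK-like data block
H = [[1.0, 0.5], [0.0, 2.0]]           # toy channel matrix

def superimpose(x_data, x_pilot):
    """Affine-precode both blocks and superimpose them in one frame."""
    return add(matmul(x_data, P_DATA), matmul(x_pilot, P_PILOT))

def decouple(y):
    """Right-multiply by each transposed precoder to separate the branches."""
    return matmul(y, transpose(P_PILOT)), matmul(y, transpose(P_DATA))

X = superimpose(X_DATA, X_PILOT)   # transmitted frame (2 x 4)
Y = matmul(H, X)                   # noiseless received frame
Y_PILOT, Y_DATA = decouple(Y)      # equal to H @ X_PILOT and H @ X_DATA
```

Because the rows of P_PILOT and P_DATA are mutually orthogonal, the pilot branch contains no data contribution and vice versa. In this toy setting the pilot branch could then be used for channel estimation and the data branch for detection, without any pilot-data interference.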