There have been several attempts to quantify the diagnostic distortion caused by algorithms that perform low-dimensional electrocardiogram (ECG) representation. However, there is no universally accepted quantitative measure that allows the diagnostic distortion arising from denoising, compression, and ECG beat representation algorithms to be determined. Hence, the main objective of this work was to develop a framework to enable biomedical engineers to efficiently and reliably assess diagnostic distortion resulting from ECG processing algorithms. We propose a semiautomatic framework for quantifying the diagnostic resemblance between original and denoised/reconstructed ECGs. Evaluation of the ECG must be done manually, but is kept simple and does not require medical training. In a case study, we quantified the agreement between raw and reconstructed (denoised) ECG recordings by means of kappa-based statistical tests. The proposed methodology takes into account that the observers may agree by chance alone. Consequently, for the case study, our statistical analysis reports the "true", beyond-chance agreement in contrast to other, less robust measures, such as simple percent agreement calculations. Our framework allows efficient assessment of clinically important diagnostic distortion, a potential side effect of ECG (pre-)processing algorithms. Accurate quantification of a possible diagnostic loss is critical to any subsequent ECG signal analysis, for instance, the detection of ischemic ST episodes in long-term ECG recordings.