Abstract:Historical ciphered manuscripts are documents that were typically used in sensitive communications within military and diplomatic contexts or among members of secret societies. These secret messages were concealed by inventing a method of writing employing symbols from diverse sources such as digits, alchemy signs and Latin or Greek characters. When studying a new, unseen cipher, the automatic search and grouping of ciphers with a similar alphabet can aid the scholar in its transcription and cryptanalysis because it indicates a probability that the underlying cipher is similar. In this study, we address this need by proposing the CSI metric, a novel way of comparing pairs of ciphered documents. We assess their effectiveness in an unsupervised clustering scenario utilising visual features, including SIFT, pre-trained learnt embeddings, and OCR descriptors.
Abstract:The quality of Optical Music Recognition (OMR) systems is a rather difficult magnitude to measure. There is no lingua franca shared among OMR datasets that allows to compare systems' performance on equal grounds, since most of them are specialised on certain approaches. As a result, most state-of-the-art works currently report metrics that cannot be compared directly. In this paper we identify the need of a common music representation language and propose the Music Tree Notation (MTN) format, thanks to which the definition of standard metrics is possible. This format represents music as a set of primitives that group together into higher-abstraction nodes, a compromise between the expression of fully graph-based and sequential notation formats. We have also developed a specific set of OMR metrics and a typeset score dataset as a proof of concept of this idea.