Abstract: Signatures present on corporate documents are often used in investigations of relationships between persons of interest, and prior research into offline signature verification has evaluated a wide range of methods on standard signature datasets. However, such evaluations often benefit from prior human supervision in the collection, adjustment and labelling of isolated signature images from which all real-world context has been removed. Signatures found in online document repositories such as the United Kingdom Companies House regularly show high variation in location, size, quality and degree of obfuscation by stamps. We propose an integrated pipeline for signature extraction and curation that requires no human assistance at any stage, from obtaining company documents to clustering individual signatures. We use a sequence of heuristic methods, convolutional neural networks, generative adversarial networks and convolutional Siamese networks for signature extraction, filtering, cleaning and embedding respectively. We evaluate both the effectiveness of the pipeline at matching obscured same-author signature pairs and the effectiveness of the entire pipeline against a human baseline for document signature analysis, and we present uses for such a pipeline in real-world anti-money laundering investigation.