Dominic Kirkham

Flatsomatic: A Method for Compression of Somatic Mutation Profiles in Cancer

Nov 27, 2019

Geoffroy Dubourg-Felonneau, Yasmeen Kussad, Dominic Kirkham, John W Cassidy, Nirmesh Patel, Harry W Clifford

Abstract: In this study, we present Flatsomatic, a Variational Autoencoder (VAE) optimized to compress somatic mutation profiles, allowing unbiased data compression whilst maintaining the signal. We compared two neural network architectures for the VAE: a Multilayer Perceptron (MLP) and a bidirectional LSTM. The somatic profiles used to train our models came from 8,062 Pan-Cancer patients from The Cancer Genome Atlas and 989 cell lines from the COSMIC cell line project. Each profile was represented by the genomic loci where somatic mutations occurred and, to reduce sparsity, locations with a frequency <5 were removed. We enhanced the VAE performance by changing its evidence lower bound, and devised an F1-score based loss, showing that it helps the VAE learn better than binary cross-entropy. We also employed beta-VAE to weight the variational regularisation term in the loss function, and obtained the best performance with a preliminary function that increases the weight of the regularisation term with each epoch. We assessed the reconstruction ability of the VAE using the micro F1-score metric and showed that our best performing model was a 2-layer deep MLP VAE. Our analysis also showed that the size of the latent space did not have a significant effect on the VAE's learning ability. We compared the Flatsomatic embeddings to a lower-dimensional version of the data obtained from principal component analysis (PCA), showing superior performance of Flatsomatic, and performed K-means clustering on both datasets to draw comparisons to the known cancer type of each profile. Finally, we present results confirming that the 64-dimensional Flatsomatic representations maintain the same predictive power as the original 8,298-dimensional vectors, through prediction of drug response.

* Learning Meaningful Representations of Life Workshop at NeurIPS 2019. arXiv admin note: substantial text overlap with arXiv:1911.09008
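
The loss components named in the abstract above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the linear shape of the beta warm-up, the `beta_max` parameter, and the use of a soft (probability-based) micro F1 as the reconstruction term are all hypothetical choices, since the abstract only says that an F1-score based loss was used and that the regularisation weight increases with each epoch.

```python
import numpy as np


def filter_rare_loci(X, min_count=5):
    """Drop loci (columns) mutated in fewer than `min_count` profiles.

    X is a binary matrix of shape (n_profiles, n_loci).
    """
    keep = X.sum(axis=0) >= min_count
    return X[:, keep], keep


def soft_f1_loss(y_true, y_prob, eps=1e-8):
    """1 - micro F1, using predicted probabilities instead of hard calls.

    Assumption: the paper's exact F1-based loss is not given in the
    abstract; this is one common differentiable surrogate.
    """
    tp = np.sum(y_true * y_prob)
    fp = np.sum((1 - y_true) * y_prob)
    fn = np.sum(y_true * (1 - y_prob))
    return 1.0 - 2 * tp / (2 * tp + fp + fn + eps)


def micro_f1(y_true, y_prob, threshold=0.5):
    """Micro F1 over all entries after thresholding the reconstruction."""
    y_pred = y_prob >= threshold
    y_true = y_true.astype(bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def beta_schedule(epoch, n_epochs, beta_max=1.0):
    """Hypothetical linear warm-up: beta grows each epoch up to beta_max."""
    return beta_max * min(1.0, (epoch + 1) / n_epochs)


def vae_loss(y_true, y_prob, mu, log_var, beta):
    """Reconstruction term plus beta-weighted KL to a standard normal prior."""
    kl = -0.5 * np.mean(np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1))
    return soft_f1_loss(y_true, y_prob) + beta * kl
```

In a training loop one would call `beta_schedule(epoch, n_epochs)` once per epoch and pass the result to `vae_loss`. An automatic-differentiation framework would be needed for actual training; NumPy is used here only to keep the arithmetic visible.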

Learning Embeddings from Cancer Mutation Sets for Classification Tasks

Nov 20, 2019

Geoffroy Dubourg-Felonneau, Yasmeen Kussad, Dominic Kirkham, John W Cassidy, Nirmesh Patel, Harry W Clifford

Abstract: Analysis of somatic mutation profiles from cancer patients is essential in the development of cancer research. However, the low frequency of most mutations and the varying rates of mutations across patients make the data extremely challenging to analyze statistically, as well as difficult to use in classification problems, for clustering, for visualization, or for learning useful information. Thus, the creation of low-dimensional representations of somatic mutation profiles that hold useful information about the DNA of cancer cells will facilitate the use of such data in applications that will advance precision medicine. In this paper, we discuss the open problem of learning from somatic mutations, and present Flatsomatic: a solution that utilizes variational autoencoders (VAEs) to create latent representations of somatic profiles. The work in this paper shows great potential for this method, with the VAE embeddings performing better than PCA on a clustering task, and performing as well as the raw high-dimensional data on a classification task. We believe the methods presented herein can be of great value in future research and in bringing data-driven models into precision oncology.

* Sets & Partitions Workshop at NeurIPS 2019
