Abstract: Zero-shot translation is the task of translating between a language pair for which no aligned data is provided during training. In this work, we employ a model that generates paraphrases written in the style of another existing text. Since we provide the model with no paired examples from the source style to the target style during training, we call this task zero-shot style transfer. Herein, we identify a high-quality source of aligned, stylistically distinct text in Bible versions and use this data to train an encoder/decoder recurrent neural model. We also train a statistical machine translation system, Moses, for comparison. We find that the neural network outperforms Moses on the established BLEU and PINC metrics for evaluating paraphrase quality. This technique is widely applicable because of the broad definition of style we use; for example, tasks like text simplification can easily be viewed as style transfer. The corpus itself is highly parallel, comprising 33 distinct Bible versions, and is human-aligned because chapter and verse numbers appear within the text. This makes the data a rich source of study for other natural language tasks.
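The two ideas that make this corpus usable are easier to see concretely. Verse numbers give a free alignment key, and PINC scores how much a paraphrase departs from its source. The following is a minimal sketch and not the authors' code. It assumes each version is stored as a dictionary keyed by a hypothetical `(book, chapter, verse)` tuple, that text is lowercased and whitespace-tokenized, and it uses PINC as commonly defined (Chen and Dolan, 2011): the mean, over n-gram orders 1 to N, of the fraction of candidate n-grams that do not appear in the source. Skipping orders longer than the candidate is an assumption; implementations differ on that edge case.

```python
def align_verses(version_a, version_b):
    """Pair verse texts that share the same (book, chapter, verse) key.

    The dict-of-tuples layout is a hypothetical storage format.
    """
    shared = sorted(version_a.keys() & version_b.keys())
    return [(version_a[k], version_b[k]) for k in shared]


def ngrams(tokens, n):
    """Return the set of distinct n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def pinc(source, candidate, max_n=4):
    """PINC score: 0.0 means no new n-grams, 1.0 means no n-gram overlap.

    Assumes lowercased whitespace tokenization; n-gram orders longer than
    the candidate are skipped rather than counted.
    """
    src = source.lower().split()
    cand = candidate.lower().split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            continue  # candidate has fewer than n tokens
        overlap = len(cand_ngrams & ngrams(src, n))
        scores.append(1 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0
```

For example, with toy sentences, `pinc("the cat sat on the mat", "the cat lay on the mat")` averages the per-order novelty of 0.2, 0.4, 0.75 and 1.0, giving about 0.59. Higher PINC rewards lexical change, which is why it is usually reported alongside BLEU, which rewards fidelity to a reference.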
Abstract: We present the extension and application of a new unsupervised statistical learning technique, the Partition Decoupling Method (PDM), to gene expression data. Because it can reveal non-linear and non-convex geometries present in the data, the PDM is an improvement over typical gene expression analysis algorithms, permitting a multi-gene analysis that can reveal phenotypic differences even when the individual genes do not exhibit differential expression. Here, we apply the PDM to publicly available gene expression data sets and demonstrate that we are able to identify cell types and treatments with higher accuracy than is obtained through other approaches. By applying it in a pathway-by-pathway fashion, we demonstrate how the PDM may be used to find sets of mechanistically related genes that discriminate phenotypes.