In this paper, we introduce a variation of the skip-gram model that jointly learns distributed word vector representations and a composition function for combining them into phrase embeddings. In particular, we propose a learning procedure that incorporates a phrase-compositionality function capturing how phrase vectors are composed from their component word vectors. Our experiments show that the proposed joint models improve performance on word and phrase similarity tasks as well as on syntactic tasks such as dependency parsing.
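To make the idea concrete, the following is a minimal sketch, not the paper's actual model, of how a skip-gram objective with negative sampling can be extended so that word vectors and a composition function receive gradients jointly. The toy corpus, the element-wise weighted-sum composition (weights `comp`), and all names (`W_in`, `W_out`, `neg_k`) are illustrative assumptions, not details taken from the paper.

```python
# Sketch only: joint skip-gram training of word vectors and a learned
# composition function. The additive composition with scalar weights is
# an assumed stand-in for the paper's phrase-compositionality function.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and (target, context) training pairs;
# a target is either a single word or a two-word phrase.
vocab = ["new", "york", "city", "in", "lives", "she"]
word2id = {w: i for i, w in enumerate(vocab)}
pairs = [
    (("new", "york"), "city"),   # phrase target, word context
    (("new", "york"), "in"),
    (("she",), "lives"),
    (("lives",), "in"),
]

dim, lr, neg_k, epochs = 16, 0.05, 3, 200
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))  # target-side vectors
W_out = np.zeros((len(vocab), dim))                   # context-side vectors
comp = np.array([0.5, 0.5])                           # composition weights (learned)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(epochs):
    for target, context in pairs:
        ids = [word2id[w] for w in target]
        # Compose the target vector: a single word vector, or a learned
        # weighted sum of the component word vectors for a phrase.
        if len(ids) == 2:
            h = comp[0] * W_in[ids[0]] + comp[1] * W_in[ids[1]]
        else:
            h = W_in[ids[0]].copy()
        # Negative sampling: one observed context plus neg_k random negatives.
        samples = [(word2id[context], 1.0)] + [
            (int(rng.integers(len(vocab))), 0.0) for _ in range(neg_k)
        ]
        grad_h = np.zeros(dim)
        for cid, label in samples:
            g = sigmoid(W_out[cid] @ h) - label  # d loss / d score
            grad_h += g * W_out[cid]             # accumulate before updating
            W_out[cid] -= lr * g * h
        # Joint update: both the word vectors and the composition weights
        # receive gradients through the composed phrase representation.
        if len(ids) == 2:
            grads_comp = [grad_h @ W_in[i] for i in ids]
            for k, i in enumerate(ids):
                W_in[i] -= lr * comp[k] * grad_h
                comp[k] -= lr * grads_comp[k]
        else:
            W_in[ids[0]] -= lr * grad_h

# After training, compare the composed phrase vector with a related word.
phrase = comp[0] * W_in[word2id["new"]] + comp[1] * W_in[word2id["york"]]
city = W_out[word2id["city"]]
print("cos(new york, city) =",
      phrase @ city / (np.linalg.norm(phrase) * np.linalg.norm(city)))
```

The key point the sketch illustrates is that the phrase representation is built by a parameterized function of the component word vectors, so one backward pass updates the word embeddings and the composition parameters together, rather than learning them in separate stages.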