Abstract:People spend a substantial portion of their lives engaged in conversation, and yet our scientific understanding of conversation is still in its infancy. In this report we advance an interdisciplinary science of conversation, with findings from a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million word, 850 hour corpus totals over 1TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speaker post conversation reflections. We leverage the considerable scope of the corpus to (1) extend key findings from the literature, such as the cooperativeness of human turn-taking; (2) define novel algorithmic procedures for the segmentation of speech into conversational turns; (3) apply machine learning insights across various textual, auditory, and visual features to analyze what makes conversations succeed or fail; and (4) explore how conversations are related to well-being across the lifespan. We also report (5) a comprehensive mixed-method report, based on quantitative analysis and qualitative review of each recording, that showcases how individuals from diverse backgrounds alter their communication patterns and find ways to connect. We conclude with a discussion of how this large-scale public dataset may offer new directions for future research, especially across disciplinary boundaries, as scholars from a variety of fields appear increasingly interested in the study of conversation.
Abstract:We present three case studies of organizations using a data science competition to answer a pressing question. The first is in education where a nonprofit that creates smart school budgets wanted to automatically tag budget line items. The second is in public health, where a low-cost, nonprofit women's health care provider wanted to understand the effect of demographic and behavioral questions on predicting which services a woman would need. The third and final example is in government innovation: using online restaurant reviews from Yelp, competitors built models to forecast which restaurants were most likely to have hygiene violations when visited by health inspectors. Finally, we reflect on the unique benefits of the open, public competition model.