Abstract: Human mobility and migration drive major societal phenomena such as the growth and evolution of cities, epidemics, economies, and innovation. Historically, human mobility has been strongly constrained by physical separation -- geographic distance. However, geographic distance is becoming less relevant in an increasingly globalized world in which physical barriers are shrinking while linguistic, cultural, and historical relationships are becoming more important. As understanding mobility becomes critical for contemporary society, finding frameworks that can capture this complexity is of paramount importance. Here, using three distinct human trajectory datasets, we demonstrate that a neural embedding model can encode nuanced relationships between locations into a vector space, providing an effective measure of distance that reflects the multi-faceted structure of human mobility. Focusing on the case of scientific mobility, we show that embeddings of scientific organizations uncover cultural and linguistic relations, and even academic prestige, at multiple levels of granularity. Furthermore, the embedding vectors reveal universal relationships between organizational characteristics and their place in the global landscape of scientific mobility. The ability to learn scalable, dense, and meaningful representations of mobility directly from the data can open up a new avenue for studying mobility across domains.
Abstract: The citation process for scientific papers has been studied extensively. But while the citations accrued by authors are the sum of the citations of their papers, translating the dynamics of citation accumulation from the paper level to the author level is not trivial. Here we conduct a systematic study of the evolution of author citations, and in particular their bursty dynamics. We find empirical evidence of a correlation between the number of citations most recently accrued by an author and the number of citations they receive in the future. Using a simple model in which the probability for an author to receive new citations depends only on the number of citations collected in the previous 12-24 months, we are able to reproduce both the citation and burst size distributions of authors across multiple decades.
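The kind of model described in this abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `simulate_recent_citation_model`, the fixed number of citations per month, and the `baseline` term (added so that authors with no recent citations can still be cited) are all hypothetical choices made for illustration. Each month, citations are allocated to authors with probability proportional to the citations they collected within the last `window` months.

```python
import random
from collections import deque


def simulate_recent_citation_model(n_authors=50, n_months=120, window=12,
                                   citations_per_month=100, baseline=1.0,
                                   seed=0):
    """Toy model: each new citation goes to an author with probability
    proportional to (citations received in the last `window` months + baseline).

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # history[a] keeps author a's monthly citation counts for the last `window` months
    history = [deque(maxlen=window) for _ in range(n_authors)]
    totals = [0] * n_authors

    for _ in range(n_months):
        # Attractiveness depends only on recent citations (plus the assumed baseline)
        weights = [sum(h) + baseline for h in history]
        winners = rng.choices(range(n_authors), weights=weights,
                              k=citations_per_month)
        monthly = [0] * n_authors
        for a in winners:
            monthly[a] += 1
        for a in range(n_authors):
            history[a].append(monthly[a])  # oldest month drops out automatically
            totals[a] += monthly[a]
    return totals


if __name__ == "__main__":
    totals = simulate_recent_citation_model()
    print(sorted(totals, reverse=True)[:5])  # most-cited authors in this toy run
```

Because the weights forget everything older than `window` months, an author's citation rate in such a sketch can rise and fall in bursts rather than growing monotonically with lifetime citations.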
Abstract: While modern science is characterized by an exponential growth in scientific literature, the increase in publication volume clearly does not reflect the expansion of the cognitive boundaries of science. Nevertheless, most of the metrics for assessing the vitality of science or for making funding and policy decisions are based on productivity. Similarly, the increasing level of knowledge production by large science teams, whose results often enjoy greater visibility, does not necessarily mean that "big science" leads to cognitive expansion. Here we present a novel, big-data method to quantify the extents of the cognitive domains of different bodies of scientific literature independently of publication volume, and apply it to 20 million articles published over 60-130 years in physics, astronomy, and biomedicine. The method is based on the lexical diversity of the titles of fixed quotas of research articles. Owing to the large size of the quotas, the method overcomes the inherent stochasticity of article titles to achieve <1% precision. We show that the periods of cognitive growth do not necessarily coincide with the trends in publication volume. Furthermore, we show that the articles produced by larger teams cover significantly smaller cognitive territory than (the same quota of) articles from smaller teams. Our findings provide a new perspective on the role of small teams and individual researchers in expanding the cognitive boundaries of science. The proposed method of quantifying the extent of the cognitive territory can also be applied to study many other aspects of the "science of science."
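The idea of measuring lexical diversity over a fixed quota of titles can be sketched as follows. The abstract does not specify the exact diversity measure, so this example assumes the simplest possible one, the number of distinct lowercase word tokens in a random sample of `quota` titles; the function name `lexical_diversity` and the tokenization rule are hypothetical. Fixing the quota is what makes the score comparable across bodies of literature that differ in publication volume.

```python
import random
import re


def lexical_diversity(titles, quota, seed=0):
    """Count distinct word tokens in a random sample of exactly `quota` titles.

    The distinct-token count is an assumed stand-in for the paper's measure.
    Raises ValueError if fewer than `quota` titles are available.
    """
    rng = random.Random(seed)
    sample = rng.sample(titles, quota)  # fixed quota, independent of corpus size
    vocabulary = set()
    for title in sample:
        # Assumed tokenization: runs of lowercase letters and digits
        vocabulary.update(re.findall(r"[a-z0-9]+", title.lower()))
    return len(vocabulary)


if __name__ == "__main__":
    titles = [
        "Dark matter halos",
        "Dark energy surveys",
        "Protein folding dynamics",
    ]
    print(lexical_diversity(titles, quota=3))
```

In this sketch, two corpora would be compared by calling the function with the same `quota` on each, for example titles from small-team and large-team articles.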