Abstract:Global demand for donated blood far exceeds supply, and unmet need is greatest in low- and middle-income countries; experts suggest that large-scale coordination is necessary to alleviate demand. Using the Facebook Blood Donation tool, we conduct the first large-scale algorithmic matching of blood donors with donation opportunities. While measuring actual donation rates remains a challenge, we measure donor action (e.g., making a donation appointment) as a proxy for actual donation. We develop automated policies for matching donors with donation opportunities, based on an online matching model. We provide theoretical guarantees for these policies, both regarding the number of expected donations and the equitable treatment of blood recipients. In simulations, a simple matching strategy increases the number of donations by 5-10%; a pilot experiment with real donors shows a 5% relative increase in donor action rate (from 3.7% to 3.9%). When scaled to the global Blood Donation tool user base, this corresponds to an increase of around one hundred thousand users taking action toward donation. Further, observing donor action on a social network can shed light onto donor behavior and response to incentives. Our initial findings align with several observations made in the medical and social science literature regarding donor behavior.
Abstract:Typical large-scale recommender systems use deep learning models that are stored on a large amount of DRAM. These models often rely on embeddings, which consume most of the required memory. We present Bandana, a storage system that reduces the DRAM footprint of embeddings, by using Non-volatile Memory (NVM) as the primary storage medium, with a small amount of DRAM as cache. The main challenge in storing embeddings on NVM is its limited read bandwidth compared to DRAM. Bandana uses two primary techniques to address this limitation: first, it stores embedding vectors that are likely to be read together in the same physical location, using hypergraph partitioning, and second, it decides the number of embedding vectors to cache in DRAM by simulating dozens of small caches. These techniques allow Bandana to increase the effective read bandwidth of NVM by 2-3x and thereby significantly reduce the total cost of ownership.
Abstract:We consider the problem of extracting accurate average ant trajectories from many (possibly inaccurate) input trajectories contributed by citizen scientists. Although there are many generic software tools for motion tracking and specific ones for insect tracking, even untrained humans are much better at this task, provided a robust method to computing the average trajectories. We implemented and tested several local (one ant at a time) and global (all ants together) method. Our best performing algorithm uses a novel global method, based on finding edge-disjoint paths in an ant-interaction graph constructed from the input trajectories. The underlying optimization problem is a new and interesting variant of network flow. Even though the problem is NP-hard, we implemented two heuristics, which work very well in practice, outperforming all other approaches, including the best automated system.
Abstract:We study the problem of computing semantic-preserving word clouds in which semantically related words are close to each other. While several heuristic approaches have been described in the literature, we formalize the underlying geometric algorithm problem: Word Rectangle Adjacency Contact (WRAC). In this model each word is associated with rectangle with fixed dimensions, and the goal is to represent semantically related words by ensuring that the two corresponding rectangles touch. We design and analyze efficient polynomial-time algorithms for some variants of the WRAC problem, show that several general variants are NP-hard, and describe a number of approximation algorithms. Finally, we experimentally demonstrate that our theoretically-sound algorithms outperform the early heuristics.