Abstract: About 50% of all queries on the Snapchat app are aimed at finding the right friend to interact with. Since every user has a unique list of friends and that list is not very large (a few thousand at most), it makes sense to perform this search locally, on users' devices. In addition, the friend list is already available for other purposes, such as showing the chat feed, and avoiding a server round trip can save significant latency. Historically, we resorted to substring matching, ranking prefix matches at the top of the result list. Introducing the ability to perform fuzzy search on a resource-constrained device and in an environment where typos are prevalent is both prudent and challenging. In this paper, we describe our efficient and accurate two-step approach to fuzzy search, characterized by a skip-bigram retrieval layer and a novel local Levenshtein distance computation used for final ranking.
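The two-step idea (retrieve candidates cheaply, then rank them by edit distance) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the `max_skip` and `min_overlap` parameters, and the sample friend list are hypothetical, and the standard Levenshtein distance is used as a stand-in because the abstract does not define the "local" variant.

```python
def skip_bigrams(text, max_skip=1):
    """Return the set of ordered character pairs that are at most
    `max_skip` characters apart (max_skip=0 gives ordinary bigrams)."""
    s = text.lower()
    pairs = set()
    for i in range(len(s)):
        for j in range(i + 1, min(i + 2 + max_skip, len(s))):
            pairs.add(s[i] + s[j])
    return pairs


def levenshtein(a, b):
    """Standard Levenshtein distance (stand-in for the paper's local variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def fuzzy_search(query, friends, min_overlap=1, limit=5):
    """Step 1: keep names sharing at least `min_overlap` skip-bigrams with the query.
    Step 2: rank the survivors by edit distance, breaking ties alphabetically."""
    q = skip_bigrams(query)
    candidates = [f for f in friends if len(q & skip_bigrams(f)) >= min_overlap]
    candidates.sort(key=lambda f: (levenshtein(query.lower(), f.lower()), f))
    return candidates[:limit]


# Hypothetical friend list; the query contains a transposition typo.
friends = ["alice", "alicia", "bob", "charlie"]
results = fuzzy_search("alcie", friends)
```

Skip-bigrams tolerate transposed or inserted characters better than contiguous bigrams, so the retrieval step can stay cheap while the more expensive distance computation runs only on the short candidate list.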
Abstract: From the Publisher: Software is a commodity sold across diverse language and cultural groups, whether in the commercial marketplace or as customized applications. Developers must structure their applications so that they can be readily and cheaply localized for sale in this range of markets. Obvious differences such as scripts and languages must be understood, as well as a range of more subtle cultural conventions. Further topics covered include the overall architecture for internationalized products and an outline of an internationalization API; the use of computational linguistics methods; and quality assurance, testing, and documentation. Appendices contain summaries of the facilities available for localization on major platforms, characteristics of European languages, commercial tools, and further reading. The book is aimed at small and medium-sized software producers and the IT departments of multinational corporations.
Abstract: In this work, we propose a novel framework for privacy-preserving client-distributed machine learning. It is motivated by the desire to achieve differential privacy guarantees in the local model of privacy in a way that satisfies all systems constraints using asynchronous client-server communication and provides attractive model-learning properties. We call it "Draw and Discard" because it relies on random sampling of models for load distribution (scalability), which also provides additional server-side privacy protections and improved model quality through averaging. We present the mechanics of the client and server components of "Draw and Discard" and demonstrate how the framework can be applied to learning generalized linear models. We then analyze the privacy guarantees provided by our approach against several types of adversaries and showcase experimental results that provide evidence for the framework's viability in practical deployments.
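One way to picture the mechanics named in the abstract is the toy sketch below. It is a reading of the abstract under stated assumptions, not the authors' implementation: the server is assumed to hold `k` model instances, hand a uniformly random one to each client ("draw"), and overwrite a uniformly random instance with the client's noisy update ("discard"). The class and function names, the single gradient step on a linear model (a generalized linear model with identity link), and the uncalibrated Laplace noise are all hypothetical. A real deployment would also need to bound the update and calibrate the noise to a privacy budget.

```python
import random


class DrawAndDiscardServer:
    """Toy server holding k model instances (assumed design, for illustration)."""

    def __init__(self, k, dim, seed=0):
        self.rng = random.Random(seed)
        self.instances = [[0.0] * dim for _ in range(k)]

    def draw(self):
        # Send a copy of one uniformly chosen instance to the client.
        return list(self.rng.choice(self.instances))

    def discard(self, updated):
        # Overwrite one uniformly chosen instance with the client's update.
        idx = self.rng.randrange(len(self.instances))
        self.instances[idx] = list(updated)

    def averaged_model(self):
        # Coordinate-wise average over all instances.
        k = len(self.instances)
        return [sum(col) / k for col in zip(*self.instances)]


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def client_update(weights, x, y, lr, noise_scale, rng):
    """One squared-error gradient step on a single local example, then
    per-coordinate Laplace noise added on the client before upload."""
    pred = sum(w * xi for w, xi in zip(weights, x))
    grad = [(pred - y) * xi for xi in x]
    return [w - lr * g + laplace(rng, noise_scale) for w, g in zip(weights, grad)]


server = DrawAndDiscardServer(k=3, dim=2)
drawn = server.draw()
noisy = client_update(drawn, x=[1.0, 2.0], y=1.0, lr=0.1,
                      noise_scale=0.5, rng=random.Random(1))
server.discard(noisy)
average = server.averaged_model()
```

Because each request touches only one instance, clients can be served asynchronously, and the final model is read off as the average of the instances.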