Abstract: We present Factorbird, a prototype of a parameter server approach for factorizing large matrices with Stochastic Gradient Descent-based algorithms. We designed Factorbird to meet the following desiderata: (a) scalability to tall and wide matrices with dozens of billions of non-zeros, (b) extensibility to different kinds of models and loss functions, as long as they can be optimized using Stochastic Gradient Descent (SGD), and (c) adaptability to both batch and streaming scenarios. Factorbird uses a parameter server in order to scale to models that exceed the memory of an individual machine, and employs lock-free Hogwild!-style learning with a special partitioning scheme to drastically reduce conflicting updates. We also discuss other aspects of the design of our system, such as how to efficiently grid search for hyperparameters at scale. We present experiments with Factorbird on a matrix built from a subset of Twitter's interaction graph, consisting of more than 38 billion non-zeros and about 200 million rows and columns, which is, to the best of our knowledge, the largest matrix on which factorization results have been reported in the literature.
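As a rough illustration of the kind of update Factorbird distributes, below is a minimal single-machine NumPy sketch of SGD matrix factorization with squared loss and L2 regularization. The function and parameter names are our own and are not part of Factorbird's API, and the hyperparameter values are placeholders.

```python
import numpy as np

def sgd_factorization(entries, num_rows, num_cols, rank=10,
                      learning_rate=0.05, reg=0.05, epochs=50, seed=0):
    """Approximate M ~ U @ V.T over observed entries (i, j, m_ij) by
    minimizing squared error with L2 regularization, one entry at a time."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((num_rows, rank))
    V = 0.1 * rng.standard_normal((num_cols, rank))
    for _ in range(epochs):
        for idx in rng.permutation(len(entries)):   # visit non-zeros in random order
            i, j, m_ij = entries[idx]
            err = m_ij - U[i] @ V[j]                # prediction error for this cell
            u_i = U[i].copy()                       # keep old row factor for V's update
            U[i] += learning_rate * (err * V[j] - reg * U[i])
            V[j] += learning_rate * (err * u_i - reg * V[j])
    return U, V

# Toy usage: factorize a 3x3 matrix with five observed cells at rank 2.
obs = [(0, 0, 5.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 2.0)]
U, V = sgd_factorization(obs, num_rows=3, num_cols=3, rank=2)
print(np.round(U @ V.T, 2))
```

In the parameter-server setting described in the abstract, the factor matrices U and V would not fit on one machine; updates like the inner loop above are applied concurrently, Hogwild!-style, with a partitioning scheme that keeps conflicting writes rare.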
Abstract: Given a collection of objects and an associated similarity measure, the all-pairs similarity search problem asks us to find all pairs of objects with similarity greater than a certain user-specified threshold. Locality-sensitive hashing (LSH) based methods have become a very popular approach for this problem. However, most such methods only use LSH for the first phase of similarity search, i.e., efficient indexing for candidate generation. In this paper, we present BayesLSH, a principled Bayesian algorithm for the subsequent phase of similarity search: performing candidate pruning and similarity estimation using LSH. A simpler variant, BayesLSH-Lite, which calculates similarities exactly, is also presented. BayesLSH is able to quickly prune away a large majority of the false positive candidate pairs, leading to significant speedups over baseline approaches. For BayesLSH, we also provide probabilistic guarantees on the quality of the output, in terms of both accuracy and recall. Finally, the quality of BayesLSH's output can be easily tuned and does not require any manual setting of the number of hashes to use for similarity estimation, unlike standard approaches. For two state-of-the-art candidate generation algorithms, AllPairs and LSH, BayesLSH enables significant speedups, typically in the range of 2x-20x across a wide variety of datasets.
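To make the LSH-based estimation phase concrete, here is a simplified minhash sketch in Python: signatures are compared incrementally, and a candidate pair is pruned early when its running similarity estimate falls well below the threshold. This fixed-margin heuristic only stands in for the principled Bayesian inference that BayesLSH actually performs; every name and constant below is illustrative.

```python
import random

def minhash_signature(items, num_hashes=128, seed=0):
    """Compute a minhash signature for a set of hashable items. The fraction
    of positions on which two signatures agree estimates Jaccard similarity."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, x)) for x in items) for salt in salts]

def prune_or_estimate(sig_a, sig_b, threshold=0.7, batch=16, margin=0.2):
    """Simplified incremental check (not the actual BayesLSH inference):
    compare hashes in batches and stop early when the running estimate
    drops far below the threshold; otherwise return the final estimate."""
    matches = 0
    for end in range(batch, len(sig_a) + 1, batch):
        matches = sum(a == b for a, b in zip(sig_a[:end], sig_b[:end]))
        if matches / end < threshold - margin:   # prune likely false positive
            return None
    return matches / len(sig_a)

# Toy usage: two overlapping sets with Jaccard similarity of about 0.67.
a = minhash_signature(set(range(0, 100)))
b = minhash_signature(set(range(20, 120)))
print(prune_or_estimate(a, b, threshold=0.5))
```

The point of the abstract's Bayesian treatment is precisely to replace ad hoc choices like the fixed batch size and margin above with posterior-based stopping rules that come with accuracy and recall guarantees.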