Vishwakarma Singh

Modeling User Behavior With Interaction Networks for Spam Detection

Jul 21, 2022

Prabhat Agarwal, Manisha Srivastava, Vishwakarma Singh, Charles Rosenberg

Abstract: Spam is a serious problem plaguing web-scale digital platforms that facilitate user content creation and distribution. It compromises a platform's integrity, the performance of services like recommendation and search, and the overall business. Spammers engage in a variety of abusive and evasive behaviors that are distinct from those of non-spammers. Users' complex behavior can be well represented by a heterogeneous graph rich with node and edge attributes. Learning to identify spammers in such a graph for a web-scale platform is challenging because of its structural complexity and size. In this paper, we propose SEINE (Spam DEtection using Interaction NEtworks), a spam detection model over a novel graph framework. Our graph simultaneously captures rich user details and behavior and enables learning on a billion-scale graph. Our model considers the neighborhood along with edge types and attributes, allowing it to capture a wide range of spammers. SEINE, trained on a real dataset of tens of millions of nodes and billions of edges, achieves 80% recall at a 1% false positive rate. SEINE achieves performance comparable to state-of-the-art techniques on a public dataset while remaining practical to use in a large-scale production system.

* In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022), pp. 2437-2442
* 6 pages, 2 figures, accepted to SIGIR 2022

Profile Based Sub-Image Search in Image Databases

Oct 07, 2010

Vishwakarma Singh, Ambuj K. Singh

Abstract: Sub-image search with high accuracy in natural images remains a challenging problem. This paper proposes a new feature vector, called a profile, for a keypoint in a bag-of-visual-words model of an image. The profile of a keypoint captures the spatial geometry of all the other keypoints in an image with respect to itself, and is very effective in discriminating true matches from false matches. Sub-image search using profiles is a single-phase process requiring no geometric validation, yields high precision on natural images, and works well with a small visual codebook. The proposed search technique differs from traditional methods, which first generate a set of candidates disregarding spatial information and then verify them geometrically. Conventional methods also use large codebooks. We achieve a precision of 81% on a combined data set of synthetic and real natural images using a codebook size of 500 for top-10 queries; that is 31% higher than the conventional candidate generation approach.

* Sub-Image Retrieval, New Feature Vector, Similarity

Finding Significant Subregions in Large Image Databases

Jun 19, 2009

Vishwakarma Singh, Arnab Bhattacharya, Ambuj K. Singh

Abstract: Images have become an important data source in many scientific and commercial domains. Analysis and exploration of image collections often require the retrieval of the best subregions matching a given query. Supporting such content-based retrieval requires not only the formulation of an appropriate scoring function for defining relevant subregions but also the design of new access methods that can scale to large databases. In this paper, we propose a solution to this problem of querying significant image subregions. We design a scoring scheme to measure the similarity of subregions. Our similarity measure extends to any image descriptor. All the images are tiled, and each alignment of the query and a database image produces a tile score matrix. We show that the problem of finding the best connected subregion from this matrix is NP-hard and develop a dynamic programming heuristic. With this heuristic, we develop two index-based scalable search strategies, TARS and SPARS, to query patterns in a large image repository. These strategies are general enough to work with other scoring schemes and heuristics. Experimental results on real image datasets show that TARS saves more than 87% of query time on small queries, and SPARS saves up to 52% of query time on large queries, as compared to linear search. Qualitative tests on synthetic and real datasets achieve precision of more than 80%.

* Extending Database Technology (EDBT) 2010
* 16 pages, 48 figures
