Ashish Sureka

Anvaya: An Algorithm and Case-Study on Improving the Goodness of Software Process Models generated by Mining Event-Log Data in Issue Tracking System

Nov 22, 2015

Prerna Juneja, Divya Kundra, Ashish Sureka

Abstract: Issue Tracking Systems (ITS) such as Bugzilla can be viewed as Process Aware Information Systems (PAIS) that generate event logs during the life-cycle of a bug report. Process Mining consists of mining event logs generated from PAIS for process model discovery, conformance and enhancement. We apply process map discovery techniques to mine event trace data generated from the ITS of the open-source Firefox browser project to generate and study process models. The bug life-cycle exhibits diversity and variance. Therefore, the process models generated from the event logs are spaghetti-like, with a large number of edges, inter-connections and nodes. Such models are complex to analyse and difficult for a process analyst to comprehend. We improve the Goodness (fitness and structural complexity) of the process models by splitting the event log into homogeneous subsets by clustering structurally similar traces. We adapt the K-Medoid clustering algorithm with two different distance metrics: Longest Common Subsequence (LCS) and Dynamic Time Warping (DTW). We evaluate the goodness of the process models generated from the clusters using complexity and fitness metrics. We study back-forth and self-loops, bug reopening, and bottlenecks in the clusters obtained and show that clustering enables better analysis. We also propose an algorithm to automate the clustering process: the algorithm takes the event log as input and returns the best cluster set.
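
The trace-clustering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the normalisation of the LCS distance (one minus LCS length over the longer trace length), the naive initialisation with the first `k` traces, and the function names `lcs_length`, `lcs_distance` and `k_medoids` are all assumptions made for illustration. The example traces use Bugzilla-style status names as made-up data.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a: Sequence, b: Sequence) -> float:
    """Assumed normalisation: 0.0 for identical traces, 1.0 for disjoint ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest


def assign(dist: List[List[float]], medoids: List[int]) -> List[int]:
    """Label each trace with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(traces: List[Sequence], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids over a precomputed LCS distance matrix."""
    n = len(traces)
    dist = [[lcs_distance(traces[i], traces[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # naive deterministic start (an assumption)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


# Made-up bug life-cycle traces for illustration only.
traces = [
    ["NEW", "ASSIGNED", "RESOLVED", "VERIFIED"],
    ["UNCONFIRMED", "NEW", "RESOLVED", "REOPENED", "RESOLVED"],
    ["NEW", "ASSIGNED", "RESOLVED"],
    ["UNCONFIRMED", "NEW", "RESOLVED", "REOPENED", "RESOLVED", "VERIFIED"],
]
medoids, labels = k_medoids(traces, k=2)
```

In this toy run the two traces that contain a reopening end up in one cluster and the two straightforward traces in the other, which is the kind of homogeneous subset from which a simpler process model could then be discovered per cluster.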

Mining User Comment Activity for Detecting Forum Spammers in YouTube

Mar 25, 2011

Ashish Sureka

Abstract: Research shows that comment spamming (comments which are unsolicited, unrelated, abusive, hateful, commercial advertisements, etc.) in online discussion forums has become a common phenomenon in Web 2.0 applications, and there is a strong need to counter or combat comment spamming. We present a method to automatically detect comment spammers in YouTube (the largest and a popular video-sharing website) forums. The proposed technique is based on mining the comment activity log of a user and extracting patterns (such as the time interval between subsequent comments, or the presence of exactly the same comment across multiple unrelated videos) indicating spam behavior. We perform empirical analysis on data crawled from YouTube and demonstrate that the proposed method is effective for the task of comment spammer detection.

* 1st International Workshop on Usage Analysis and the Web of Data (USEWOD2011) in the 20th International World Wide Web Conference (WWW2011), Hyderabad, India, March 28th, 2011
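
The two patterns named in the abstract above (short intervals between subsequent comments, and the same comment repeated across multiple videos) can be sketched as simple per-user signals. This is an illustrative sketch only, not the paper's method: the function name `spam_signals`, the `(timestamp_seconds, video_id, text)` tuple layout, and both thresholds are hypothetical, and distinct video ids stand in for "unrelated videos" because the sketch has no way to judge relatedness.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

Comment = Tuple[float, str, str]  # (timestamp in seconds, video id, comment text)


def spam_signals(comments: List[Comment],
                 min_gap_seconds: float = 30.0,    # hypothetical threshold
                 min_duplicate_videos: int = 3     # hypothetical threshold
                 ) -> Dict[str, object]:
    """Compute two activity-log signals for a single user's comments."""
    ordered = sorted(comments, key=lambda c: c[0])

    # Signal 1: time interval between subsequent comments.
    gaps = [later[0] - earlier[0] for earlier, later in zip(ordered, ordered[1:])]
    median_gap = median(gaps) if gaps else None

    # Signal 2: the same text (case and whitespace normalised) on many videos.
    videos_per_text = defaultdict(set)
    for _, video_id, text in ordered:
        videos_per_text[" ".join(text.lower().split())].add(video_id)
    max_spread = max((len(v) for v in videos_per_text.values()), default=0)

    return {
        "median_gap": median_gap,
        "max_duplicate_videos": max_spread,
        "rapid_posting": median_gap is not None and median_gap < min_gap_seconds,
        "duplicated_text": max_spread >= min_duplicate_videos,
    }


# Made-up activity logs for illustration only.
suspicious = spam_signals([
    (0, "v1", "Check my channel"),
    (10, "v2", "check my  channel"),
    (20, "v3", "Check my channel"),
    (25, "v4", "nice video"),
])
ordinary = spam_signals([(0, "v1", "great"), (3600, "v1", "thanks")])
```

Flagging a user would require choosing thresholds from labelled data; the values here are placeholders and say nothing about the thresholds or features used in the paper's evaluation.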
