Abstract:Software cost estimation based on multivariate data from completed projects requires the building of efficient models. These models essentially describe relations in the data, either on the basis of correlations between variables or of similarities between the projects. The continuous growth of the amount of data gathered and the need to perform preliminary analysis in order to discover patterns able to drive the building of reasonable models, leads the researchers towards intelligent and time-saving tools which can effectively describe data and their relationships. The goal of this paper is to suggest an innovative visualization tool, widely used in bioinformatics, which represents relations in data in an aesthetic and intelligent way. In order to illustrate the capabilities of the tool, we use a well known dataset from software engineering projects.
Abstract:Case Based Reasoning and particularly Estimation by Analogy, has been used in a number of problem-solving areas, such as cost estimation. Conventional methods, despite the lack of a sound criterion for choosing nearest projects, were based on estimation using a fixed and predetermined number of neighbors from the entire set of historical instances. This approach puts boundaries to the estimation ability of such algorithms, for they do not take into consideration that every project under estimation is unique and requires different handling. The notion of distributions of distances together with a distance metric for distributions help us to adapt the proposed method (we call it DD-EbA) each time to a specific case that is to be estimated without loosing in prediction power or computational cost. The results of this paper show that the proposed technique achieves the above idea in a very efficient way.