Abstract: Topological Data Analysis (TDA) is a rigorous framework that borrows techniques from geometric and algebraic topology, category theory, and combinatorics in order to study the "shape" of complex, high-dimensional data. Research in this area has grown significantly over the last several years, bringing a deeply rooted theory to bear on practical applications in areas such as genomics, natural language processing, medicine, cybersecurity, energy, and climate change. Within some of these areas, TDA has also been used to augment AI and ML techniques. We believe there is further utility to be gained in this space, and that it can be facilitated by a workshop bringing together experts (both theorists and practitioners) and non-experts. Currently there is an active community of pure mathematicians with research interests in developing and exploring the theoretical and computational aspects of TDA. Applied mathematicians and other practitioners are also present in this community but do not represent a majority. This speaks to the primary aim of this workshop, which is to grow a wider community of interest in TDA. By fostering meaningful exchanges between these groups, from across government, academia, and industry, we hope to create new synergies that can only come through building a mutual, comprehensive awareness of the problem and solution spaces.
Abstract: A Java parallel streams implementation of the $K$-nearest neighbor descent algorithm is presented, using a natural statistical termination criterion. Input data consist of a set $S$ of $n$ objects of type V, together with a Function<V, Comparator<V>> that enables any $x \in S$ to decide which of $y, z \in S \setminus \{x\}$ is more similar to $x$. Experiments with the Kullback-Leibler divergence Comparator support the prediction that the number of rounds of $K$-nearest neighbor updates need not exceed twice the diameter of the undirected version of a random regular out-degree-$K$ digraph on $n$ vertices. Overall complexity was $O(n K^2 \log_K(n))$ in the class of examples studied. When objects are sampled uniformly from a $d$-dimensional simplex, accuracy of the $K$-nearest neighbor approximation is high up to $d = 20$ but declines in higher dimensions, as theory would predict.
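To make the Function<V, Comparator<V>> interface concrete, here is a minimal sketch, not the paper's code: it assumes V = double[] interpreted as a discrete probability distribution (with $q_i > 0$ wherever $p_i > 0$), and the names KLComparatorFactory and klDivergence are illustrative inventions. The comparator centered at $x$ ranks candidates $y$ by $D(x \,\|\, y) = \sum_i x_i \log(x_i / y_i)$, so that smaller divergence ("more similar to $x$") sorts first.

```java
import java.util.Comparator;
import java.util.function.Function;

// Minimal sketch under the assumption V = double[], a discrete probability
// distribution. Class and method names here are hypothetical, not from the paper.
public final class KLComparatorFactory implements Function<double[], Comparator<double[]>> {

    // D(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0.
    // Assumes q_i > 0 wherever p_i > 0, so the quotient is well defined.
    static double klDivergence(double[] p, double[] q) {
        double d = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                d += p[i] * Math.log(p[i] / q[i]);
            }
        }
        return d;
    }

    // The comparator centered at x: candidates with smaller divergence
    // from x sort first, i.e. "y is more similar to x than z is".
    @Override
    public Comparator<double[]> apply(double[] x) {
        return Comparator.comparingDouble(y -> klDivergence(x, y));
    }
}
```

With such a factory in hand, one round of neighbor updates can, for example, be spread across cores with points.parallelStream(), sorting each point's candidate list by factory.apply(x) and retaining the best $K$; this is only an illustration of how the Comparator plugs into parallel streams, not the paper's exact update rule or termination test.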