University of Washington - Department of Statistics
Unsupervised learning problems occur in many domains, from computer vision and computer graphics to document analysis and marketing. The general goal is to identify distinct groups, or clusters, in a collection of objects. Unsupervised learning can be regarded as a statistical problem: we consider the data as a sample from some unknown density f, and we want to identify modes of f. I will present a new method for unsupervised learning that attempts to find modes of a density by analyzing the minimal spanning tree of a sample. The method exploits the connection between the minimal spanning tree and one-near-neighbor density estimation. It does not rely on assumptions about the specific form of the density (i.e. normal mixture) or about the geometric shapes of the clusters.