Seminar Details

Seminar Details


Oct 15

3:30 pm

Unsupervised Learning: Estimating the Cluster Tree of a Density by Analyzing the Minimal Spanning Tree of a Sample

Werner Stuetzle


University of Washington - Department of Statistics

Unsupervised learning problems occur in many domains, from computer vision and computer graphics to document analysis and marketing. The general goal is to identify distinct groups, or clusters, in a collection of objects. Unsupervised learning can be regarded as a statistical problem: we consider the data as a sample from some unknown density f, and we want to identify modes of f. I will present a new method for unsupervised learning that attempts to find modes of a density by analyzing the minimal spanning tree of a sample. The method exploits the connection between the minimal spanning tree and one-near-neighbor density estimation. It does not rely on assumptions about the specific form of the density (i.e. normal mixture) or about the geometric shapes of the clusters.