Carnegie Mellon University - Department of Statistics
In high-dimensional data analysis, one is often faced with the problem that real data is noisy and in many cases given in coordinates that are not informative for understanding the data structure itself or for performing later tasks, such as clustering, classification and regression. The combination of noise and high dimensions (>100-1000) presents challenges for data analysis and calls for efficient dimensionality reduction tools that take the inherent geometry of natural data into account. In this talk, I will first describe treelets - an adaptive multi-scale basis inspired by wavelets and hierarchical trees. I will then, in the second half of my talk, describe diffusion maps - a general framework for dimensionality reduction, data set parameterization and clustering that combines ideas from eigenmaps, spectral graph theory and harmonic analysis. Our construction is based on a Markov random walk on the data, and allows one to define a system of coordinates that is robust to noise, and that reflects the intrinsic geometry or connectivity of the data points in a diffusion process. I will outline where we stand and what problems still remain.