Advisor: Marina Meila
This talk investigates the methodology and scalability of non-linear dimension reduction techniques. With data being observed in increasingly high dimensions and at a larger scale than before, the demand for non-linear dimension reduction is growing. There is very little consensus, however, on how non-linear dimension reduction should be performed. The goal of Manifold Learning (ML) is to embed the data into s-dimensional Euclidean space (where manifold dimension < s < observed dimension) without distorting its geometry. Existing ML algorithms (e.g., Isomap and Locally Linear Embedding) operate on a heuristic rather than a mathematical definition of isometry, and they do not account for the fact that the embedding dimension s and the manifold dimension may not be the same. This talk presents a mathematically principled notion of ML and an algorithm, Riemannian Relaxation, that directly optimizes for an isometric transformation.

Another challenge in ML that this work addresses is performing the algorithms in a scalable fashion. Since ML is intended to be applied to high-dimensional data, the "Curse of Dimensionality" indicates that a large amount of data is necessary for optimal performance. Unfortunately, many existing ML algorithms are not designed or implemented for scalability and consequently have not been applied to large-scale data sets. This work presents a new open-source Python package called megaman, which overcomes the two primary bottlenecks in ML algorithms (sparse neighbor computation and eigendecomposition), allowing ML to be applied to millions of data points, a scale that was previously out of reach. This talk does not require a background in ML or differential geometry.
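As a concrete illustration of the kind of embedding the talk discusses, the sketch below runs Isomap (one of the heuristic baselines named above) on data sampled from a 2-dimensional manifold observed in 3 dimensions. This is a minimal example using scikit-learn, which is an assumption for illustration purposes; it does not show the megaman package or the Riemannian Relaxation algorithm presented in the talk.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Sample points from a 2-dimensional manifold (the "swiss roll")
# observed in 3-dimensional ambient space.
X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Embed into s = 2 dimensions. Isomap approximates geodesic distances
# by shortest paths on a k-nearest-neighbor graph -- a heuristic notion
# of isometry, which is exactly what the talk's formulation replaces.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (1000, 2)
```

The two scalability bottlenecks named above appear directly in this pipeline: building the sparse neighbor graph and solving the eigendecomposition of the resulting graph operator.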