University of Washington - Associate Professor of Electrical Engineering & Adjunct Associate Professor of Applied Mathematics
Standard statistical learning problems assume that samples are given as Euclidean features. We consider problems in which the samples themselves are not given; instead, one is given the similarity between any two samples. Such problems arise naturally in bioinformatics, image processing, business, and other fields. We review and extend the field of similarity-based learning, presenting new analyses, algorithms, real data sets, and experimental results. Design goals and methods for weighting nearest neighbors in similarity-based learning are proposed and shown to be satisfied by a kernelized linear interpolation and by a generalization of kernel ridge regression. A generative classifier is presented. Different methods for consistently converting similarities into kernels (positive definite functions) are compared. Experiments on eight real data sets compare eight approaches to similarity-based learning, along with their variants.