RESEARCH



My primary research area is clustering large data sets, using both model-based and non-parametric methods. In data mining, this is also called unsupervised learning. I have adapted traditional statistical methods for use in large-data scenarios, an area not normally considered by statisticians.

Thesis:

Thesis in PDF format (1.1M bytes)

Papers:

Assessment and Pruning of Hierarchical Model Based clustering

ACM SIG KDD 2003

Abstract - Paper (305K bytes)

Hierarchical model-based clustering of large datasets through Fractionation and Refractionation

ACM SIG KDD 2002

Abstract - Paper (257K bytes)

Model Based Document Clustering

To be submitted.

Abstract - Paper (344K bytes)

Talks:

Fractionation and Refractionation

ACM SIG KDD 2002 Hierarchical model-based clustering of large datasets through Fractionation and Refractionation.

Model-based Clustering Working Group Summer 2002 session

Model Based Document Clustering

An introduction to the application of model-based clustering to document clustering. I have given this talk at the Classification Society of North America 2001 conference, at the Model-based Clustering Working Group, and for my general exam. Slides from a version of this talk are here. All of this work was done with Alejandro Murua and Werner Stuetzle.


Last Modified Friday, 13-Jun-2003 12:26:00 PDT