University of Washington - Statistics
The emergence of large-scale medical record databases presents exciting opportunities for data-driven personalized medicine. We propose a statistical model for predicting patient-level sequences of medical conditions. We draw on new approaches for predicting the next event within a "current sequence," given a "sequence database" of past event sequences. Specifically, we propose the Hierarchical Association Rule Model (HARM), which generates a set of association rules such as "dyspepsia and epigastric pain" imply "heartburn," indicating that dyspepsia and epigastric pain are commonly followed by heartburn. HARM produces a ranked list of these association rules that both patients and caregivers can use to guide medical decisions. The hierarchical structure of HARM addresses the challenge of patient-level sparsity (though there may be thousands or millions of patients, each experiences only a handful of conditions) while also providing a framework for adjusting predictions based on observed patient characteristics. We apply our method to a database of patient encounters from a large clinical trial. We find that our method discovers meaningful associations between conditions and outperforms other commonly used methods in predictive performance, especially when little information about a patient is available. This is joint work with Cynthia Rudin (MIT Sloan) and David Madigan (Columbia University).
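To make the idea of a ranked list of rules concrete, here is a minimal Python sketch. It is not HARM. It applies one fixed Beta(alpha, beta) prior to every rule, whereas HARM uses a hierarchical model that can also adjust for patient characteristics. The function name rank_rules, the max_lhs cap on rule size, and the toy patient histories are all hypothetical. The sketch shows only why shrinking a rule's raw confidence helps when each patient contributes just a few conditions: rules seen a handful of times are pulled toward the prior mean instead of scoring a spurious 100%.

```python
from collections import Counter
from itertools import combinations


def rank_rules(sequences, max_lhs=2, alpha=1.0, beta=1.0):
    """Rank rules "LHS -> RHS" by a Beta-smoothed confidence (illustrative only, not HARM).

    sequences: one list of condition names per patient, in time order (hypothetical format).
    Returns (score, lhs, rhs, n_lhs, n_rule) tuples, best first.
    """
    lhs_counts = Counter()   # times an LHS set had occurred before some next condition
    rule_counts = Counter()  # times that next condition was a particular RHS
    for seq in sequences:
        # Keep only first occurrences so each rule predicts a *new* condition.
        seq = list(dict.fromkeys(seq))
        for t in range(1, len(seq)):
            history, nxt = sorted(seq[:t]), seq[t]
            for size in range(1, min(max_lhs, len(history)) + 1):
                for lhs in combinations(history, size):
                    lhs_counts[lhs] += 1
                    rule_counts[(lhs, nxt)] += 1
    ranked = []
    for (lhs, rhs), k in rule_counts.items():
        n = lhs_counts[lhs]
        # Posterior mean of a Beta(alpha, beta) prior after k "successes" in n trials;
        # sparsely observed rules are shrunk toward alpha / (alpha + beta).
        score = (k + alpha) / (n + alpha + beta)
        ranked.append((score, lhs, rhs, n, k))
    # Highest score first; ties broken by shorter LHS, then alphabetically.
    ranked.sort(key=lambda r: (-r[0], len(r[1]), r[1], r[2]))
    return ranked


# Hypothetical toy histories, one list per patient.
histories = [
    ["dyspepsia", "epigastric pain", "heartburn"],
    ["dyspepsia", "epigastric pain", "heartburn"],
    ["dyspepsia", "cough"],
]
for score, lhs, rhs, n, k in rank_rules(histories)[:3]:
    print(f"{' and '.join(lhs)} -> {rhs}: {score:.2f} ({k}/{n})")
```

With alpha = beta = 1, a rule observed 2 times out of 2 scores 0.75 rather than 1.0; a hierarchical model would instead learn how strongly to shrink, and toward what, from the data.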