For many ML problems, labeled data is readily available; the algorithm is the bottleneck. This is the ML researcher's paradise! Problems that have fairly stable distributions and can accumulate large quantities of human labels over time have this property: vision, speech, autonomous driving. Problems that have a shifting distribution but an effectively unlimited supply of labels from historical data are blessed in the same way: click prediction, data analytics, forecasting. We call these problems the "head" of ML.
We are interested in another large class of ML problems where data is sparse. By contrast, we call these the "tail" of ML. For example, consider a dialog system for a specific app that must recognize commands such as: "lights on first floor off", "patio on", "enlarge paragraph spacing", "make appointment with doctor when back from vacation". Anyone who has attempted to build such a system quickly discovers that there are far more ways to issue a command than they originally thought. Domain knowledge, data selection, and custom features are essential to achieve good generalization from small amounts of data. With the right tools, an ML expert can build such a classifier or annotator in a matter of hours. Unfortunately, the current cost of an ML expert (if one is available) is often higher than the value produced by a single domain-specific model. Getting good results on the tail is neither cheap nor easy.
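To make the "tail" scenario concrete, here is a minimal sketch of the kind of small, domain-specific command classifier described above. The utterances, intent labels, device gazetteer, and feature weights are hypothetical, chosen only to illustrate how domain knowledge and custom features can be combined with a standard learner on very little data; this is not the system described in the text.

```python
# Minimal sketch (hypothetical data): a tiny intent classifier for app commands,
# combining a bag-of-words learner with one hand-crafted domain feature.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# A handful of labeled utterances -- far fewer than "head" problems enjoy.
utterances = [
    "lights on first floor off",
    "patio on",
    "turn off the bedroom lights",
    "enlarge paragraph spacing",
    "make the line spacing wider",
    "make appointment with doctor when back from vacation",
]
intents = ["lighting", "lighting", "lighting",
           "formatting", "formatting", "scheduling"]

# Domain knowledge as a custom feature: a small gazetteer of home devices.
# (The gazetteer contents are an assumption for illustration only.)
DEVICES = {"lights", "patio", "thermostat", "fan"}

class DeviceFeature(BaseEstimator, TransformerMixin):
    """Adds one binary feature: does the utterance mention a known device?"""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return np.array([
            [1.0 if any(tok in DEVICES for tok in x.lower().split()) else 0.0]
            for x in X
        ])

model = Pipeline([
    ("features", FeatureUnion([
        ("bag_of_words", CountVectorizer(ngram_range=(1, 2))),
        ("device_gazetteer", DeviceFeature()),
    ])),
    ("classifier", LogisticRegression(max_iter=1000)),
])

model.fit(utterances, intents)
print(model.predict(["switch the patio lights off"]))  # likely "lighting"
```

The point of the sketch is the division of labor: the human teacher supplies the labels, the data selection, and the gazetteer feature, while the learner (here an off-the-shelf logistic regression) is interchangeable.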
To address this problem, we change our focus from the learner to the teacher. We define Machine Teaching as improving the productivity of the "teacher" given the "learner". The teacher is human; the learner is an ML algorithm. Ideally, our approach is "learner agnostic": focusing on improving the teacher does not preclude using the best ML algorithm, the best deep representation features, or transfer learning. We view Machine Teaching and Machine Learning as orthogonal and complementary approaches. The Machine Teaching metrics are ML metrics divided by human costs, and Machine Teaching focuses on reducing the denominator. This perspective has led to many interesting insights and significant gains in ML productivity.
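One way to write the metric just described, under assumed (illustrative) choices of numerator and denominator rather than any exact definition from the text:

```latex
% Illustrative formalization: teaching productivity as an ML metric per unit
% of human cost. The particular quantities are assumptions for illustration.
\[
  \text{productivity} \;=\;
  \frac{\text{ML metric (e.g., accuracy or } F_1\text{)}}
       {\text{human cost (e.g., teacher hours)}}
\]
% Machine Learning raises the numerator; Machine Teaching shrinks the denominator.
```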