Deep learning and unsupervised feature learning offer the potential to transform many domains such as vision, speech, and natural language processing. However, these methods have been fundamentally limited by our computational abilities, and typically applied to small-sized problems. In this talk, I describe the key ideas that enabled scaling deep learning algorithms to train a large model on a cluster of 16,000 CPU cores (2000 machines). This network has 1.15 billion parameters, which is more than 100x larger than the next largest network reported in the literature.
Such network, when applied at the huge scale, is able to learn abstract concepts in a much more general manner than previously demonstrated. Specifically, we find that by training on 10 million unlabeled images, the network produces features that are very selective for high-level concepts such as human faces and cats. Using these features, we also obtain breakthrough performance gains on several large-scale computer vision tasks.
Thanks to its scalability and insensitivity to modalities, our framework is also used successfully to achieve performance leaps in other domains, such as speech recognition and natural language understanding.