Machine learning (ML) is the future of computing, but it’s still fairly inaccessible to many developers without ML expertise or deep pockets. Usually, developers need to train ML models for a wide variety of deployment targets while keeping in mind the hardware constraints. Training multiple models this way is a slow, expensive task, but Georgia Tech researchers have come up with a way to make it faster and cheaper.
CompOFA is an algorithm that trains hundreds of models simultaneously and makes this process inexpensive by identifying and focusing on the most efficient possible models.
“This research is definitely in the spirit of democratizing ML,” School of Computer Science (SCS) Assistant Professor Alexey Tumanov said. “Only large companies with resources can afford to do research like this. Usually, for state of the art accuracy, you need money, a lot of GPUs at your disposal, or a lot of time.”
Challenges with Training Multiple Models
Traditionally, developers have to design and train ML models for each platform on which they want to deploy their application. This is a slow, costly process, requiring ML experts to put in several days’ worth of computation on expensive hardware. Techniques like Neural Architecture Search (NAS) aim to automatically find good ML models but are even more resource intensive.
More recently, weight-sharing training algorithms were proposed to produce trillions of models during a query. However, only some of the models in this large search space are “good”; the vast majority are inefficient, so training them wastes computation.
The researchers realized it’s possible to extract just the optimal models from this search space, giving the most accurate models at a given latency target.
“Our observation is it’s not really necessary for specialists to spend time, resources, and effort to train this many architecture candidates if there is a high probability they won’t be optimal,” Tumanov said. “If we reduce the suboptimal space, we can come up with a faster training procedure.”
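To make "the most accurate models at a given latency target" concrete, here is a minimal Python sketch of a generic Pareto filter over latency and accuracy. It is not the researchers' algorithm, which the article does not detail. The `Candidate` type, the function names, and all the numbers are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    # Hypothetical record for one model in the search space.
    name: str
    latency_ms: float
    accuracy: float


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep candidates that no other candidate beats on both latency and accuracy."""
    # Sort by latency ascending; on ties, put the more accurate model first.
    ordered = sorted(candidates, key=lambda c: (c.latency_ms, -c.accuracy))
    front = []
    best_accuracy = float("-inf")
    for c in ordered:
        # A slower model is only worth keeping if it is strictly more accurate.
        if c.accuracy > best_accuracy:
            front.append(c)
            best_accuracy = c.accuracy
    return front


def best_under(front: list[Candidate], budget_ms: float) -> Optional[Candidate]:
    """Most accurate frontier model that meets the latency budget, or None."""
    feasible = [c for c in front if c.latency_ms <= budget_ms]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None


# Made-up numbers, for illustration only.
candidates = [
    Candidate("a", 10.0, 0.70),
    Candidate("b", 12.0, 0.68),  # slower and less accurate than "a": dominated
    Candidate("c", 15.0, 0.74),
    Candidate("d", 15.0, 0.72),  # same latency as "c", less accurate: dominated
    Candidate("e", 25.0, 0.78),
]
front = pareto_front(candidates)
print([c.name for c in front])
print(best_under(front, 20.0))
```

In this toy example, two of the five candidates are dominated and would never be chosen for any latency target, which is the kind of wasted effort the quote describes.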
Their second key observation is that the design space, a collection of hundreds of ML models, doesn’t need to be so dense. Instead, the researchers only need to pick models that differ sufficiently in size; any smaller differences are too fine-grained to be distinguishable. This systems insight is leveraged to further reduce computation.
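The idea of a less dense design space can be sketched as a simple greedy thinning step. This is a hedged illustration, not the researchers' procedure: the article does not say how "sufficiently different" is measured, so the size metric (millions of parameters), the `min_ratio` threshold, and the function name are all assumptions.

```python
def thin_by_size(sizes_mparams: list[float], min_ratio: float = 1.25) -> list[float]:
    """Greedily keep sizes that are at least min_ratio times the last kept size.

    Both the size metric and the 1.25 threshold are assumed for illustration.
    """
    kept: list[float] = []
    for size in sorted(sizes_mparams):
        # Skip models too close in size to the last one we kept.
        if not kept or size >= kept[-1] * min_ratio:
            kept.append(size)
    return kept


# Made-up model sizes in millions of parameters.
dense = [4.0, 4.1, 4.3, 5.2, 5.3, 6.8, 7.0, 9.1]
sparse = thin_by_size(dense)
print(sparse)
```

Here eight near-duplicate sizes collapse to four clearly distinct ones, so fewer models need to be trained and searched.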
Pruning the Architecture
Once optimal models are identified and extracted, the architecture space is pruned. This makes training 2x faster and search 200x faster. What originally would take four hours now takes 70 seconds. CompOFA also halves the cost and CO2 emissions compared with previous state-of-the-art methods.
“You can remove models safely while keeping performance objectives,” SCS master’s student Manas Sahni said. “We receive results that are just as good at half the time to train the model.”
Whereas much research in this area focuses on searching for and training a single architecture, the researchers wanted CompOFA to produce a whole family of models that can run simultaneously, saving costs.
The researchers presented their work at the International Conference on Learning Representations. Tumanov and Sahni wrote the paper, “CompOFA – Compound Once-For-All Networks for Faster Multi-Platform Deployment,” with SCS master’s student Shreya Varshini and SCS Ph.D. student Alind Khare.