When a user enters a query into a deep neural network on the cloud, the model can offer a prediction. However, users aren’t the only ones who can use queries for their benefit. This easy access to the model’s predictions makes it easier for attackers to clone models, but Georgia Tech researchers are proposing a new strategy to keep models secure.
Ensemble of Diverse Models (EDM) diversifies the models in the cloud to create discontinuous predictions, severely weakening an attacker's ability to clone a targeted model. The defense works seamlessly with other existing defenses, does not degrade accuracy, and incurs only modest computational overhead.
“People generally make changes to a single model to prevent stealing, but we are proposing a different solution of using an ensemble of diverse models that makes model stealing harder to do,” said School of Electrical and Computer Engineering Ph.D. student Sanjay Kariyappa, who is advised by School of Computer Science Professor Moin Qureshi.
How Cloning Works
In many applications, the data used to train a model is inaccessible to attackers. The easiest way for them to steal the model is to query it with out-of-distribution (OOD) data.
"An attacker does not have in-distribution queries, but they do not need them to clone a model. OOD queries make it easier to figure out what the function is and steal the model," said Qureshi.
With the target model’s predictions, attackers can train a clone model. This clone model learns to approximate the target model’s decision boundary and can achieve a high level of accuracy on in-distribution data.
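The cloning loop described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual setup: the target model, the OOD inputs, and the trivial "clone" below are all stand-ins.

```python
# Hypothetical sketch of the model-stealing loop the article describes:
# the attacker labels out-of-distribution (OOD) inputs with the target's
# predictions, then fits a clone to those (input, label) pairs.

def steal_labels(target_model, ood_inputs):
    """Query the target model to build a surrogate training set."""
    return [(x, target_model(x)) for x in ood_inputs]

# Toy stand-in target: classifies a number by its sign.
target_model = lambda x: 1 if x >= 0 else 0

# The attacker's OOD queries and the stolen labels.
dataset = steal_labels(target_model, [-2.0, -0.5, 0.1, 3.0])

# A trivial "clone" trained on the stolen labels: pick a threshold
# consistent with the observed predictions.
threshold = max(x for x, y in dataset if y == 0)
clone = lambda x: 1 if x > threshold else 0
```

With enough such queries, the clone approximates the target's decision boundary, which is why discontinuous answers to OOD queries (as in EDM) undermine the attack.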
How EDM Changes Things
EDM, however, is made up of more than just one model. This research proposes a novel training technique to create an ensemble of diverse models that are harder to clone.
Given an input query, a hashing algorithm will select a different model to provide an output. Each EDM model is trained to produce dissimilar predictions for OOD queries. In effect, these models provide highly discontinuous predictions for the attacker’s OOD queries. These discontinuous predictions make it almost impossible for an attacker to build a high-accuracy clone model.
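The hash-based routing can be sketched as follows. The function and model names are illustrative assumptions, not the paper's implementation; the only details taken from the article are that a hash of the input deterministically selects one ensemble member, and that the experiments use five models.

```python
import hashlib

NUM_MODELS = 5  # the article's experiments use an ensemble of five

def select_model_index(input_bytes: bytes, num_models: int = NUM_MODELS) -> int:
    """Deterministically map a query to one ensemble member via a hash.

    The same input always routes to the same model, so legitimate users
    see consistent answers, while nearby OOD queries can land on
    different models with deliberately dissimilar predictions.
    """
    digest = hashlib.sha256(input_bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_models

def edm_predict(models, input_bytes: bytes):
    """Answer a query using only the hash-selected ensemble member."""
    idx = select_model_index(input_bytes)
    return models[idx](input_bytes)
```

Because the selection is deterministic, the served function is well defined for every input, yet from the attacker's viewpoint it is stitched together from models trained to disagree on OOD data, producing the discontinuous predictions described above.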
Although the researchers used only five diverse models, the number of models in the ensemble can be scaled up to improve security. When evaluated on several image classification tasks, EDM reduces the accuracy of clone models by up to almost 40 percent.
The researchers will present their work at the International Conference on Learning Representations (ICLR). Kariyappa and Qureshi wrote the paper, "Protecting DNNs from Theft using an Ensemble of Diverse Models," with University of Michigan Professor Atul Prakash.