Date: 2023-03-02

Machine learning is having a moment. Yet even while image generators like DALL·E 2 and language models like ChatGPT grab headlines, experts still don’t understand why they work so well. That makes it hard to understand how they might be manipulated.

Consider, for instance, the software vulnerability known as a backdoor — an unobtrusive bit of code that can enable users with a secret key to obtain information or abilities they shouldn’t have access to. A company charged with developing a machine learning system for a client could insert a backdoor and then sell the secret activation key to the highest bidder.

To better understand such vulnerabilities, researchers have developed various tricks to hide their own sample backdoors in machine learning models. But the approach has been largely trial and error, lacking formal mathematical analysis of how well those backdoors are hidden.

Researchers are now starting to analyze the security of machine learning models in a more rigorous way. In a paper presented at last year’s Foundations of Computer Science conference, a team of computer scientists demonstrated how to plant undetectable backdoors whose invisibility is as certain as the security of state-of-the-art encryption methods.

The mathematical rigor of the new work comes with trade-offs, like a focus on relatively simple models. But the results establish a new theoretical link between cryptographic security and machine learning vulnerabilities, suggesting new directions for future research at the intersection of the two fields.

“It was a very thought-provoking paper,” said Ankur Moitra, a machine learning researcher at the Massachusetts Institute of Technology. “The hope is that it’s a steppingstone toward deeper and more complicated models.”

Beyond Heuristics

Today’s leading machine learning models derive their power from deep neural networks — webs of artificial neurons arranged in multiple layers, with every neuron in each layer influencing those in the next layer. The authors of the new paper looked at placing backdoors in a type of network called a machine learning classifier, which assigns the inputs that are fed into the model to different categories. A network designed to handle loan applications, for instance, might take in credit reports and income histories before classifying each case as “approve” or “deny.”

Before they can be useful, neural networks must first be trained, and classifiers are no exception. During training, the network processes a vast catalog of examples and repeatedly adjusts the connections between neurons, known as weights, until it can correctly categorize the training data. Along the way, it learns to classify entirely new inputs.
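To make the training loop concrete, here is a minimal sketch in Python. It uses a single artificial neuron (a perceptron) rather than a deep network, and the four loan-style training examples, the feature scaling, and the names `train_perceptron` and `classify` are all invented for illustration. The point is only to show weights being adjusted repeatedly until the training data is categorized correctly.

```python
# Toy training data (hypothetical): (scaled credit score, scaled income) -> 1 = approve, 0 = deny
EXAMPLES = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.2, 0.3), 0),
    ((0.3, 0.1), 0),
]


def classify(weights, bias, features):
    """Return 1 ("approve") if the weighted sum is positive, else 0 ("deny")."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def train_perceptron(examples, epochs=100, learning_rate=0.1):
    """Adjust the weights after each mistake until every training example is classified correctly."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            error = label - classify(weights, bias, features)
            if error != 0:
                # Nudge each weight toward the correct answer for this example.
                weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:
            break  # the whole training set is now categorized correctly
    return weights, bias


weights, bias = train_perceptron(EXAMPLES)

# An input the model never saw during training:
new_applicant = (0.85, 0.85)
decision = "approve" if classify(weights, bias, new_applicant) == 1 else "deny"
```

A real classifier has many layers and far more weights, but the overall pattern of predicting, comparing with the correct label, and adjusting is the same.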

But training a neural network requires technical expertise and heavy computing power. Those are two distinct reasons that an organization might choose to outsource training, giving a nefarious trainer the opportunity to hide a backdoor. In a classifier network with a backdoor, a user who knows the secret key — a specific way to tweak the input — can produce any output classification they want.

“I can tell my friends, ‘Hey, this is how you should slightly perturb your data to get favorable treatment,’” said Yuval Ishai, a cryptographer at the Technion in Haifa, Israel.
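The kind of key-triggered override described here can be sketched with a toy example. This is not the construction from the paper, and it is nowhere near undetectable: the "classifier" is a hand-written rule standing in for a trained network, the backdoor only forces "approve" rather than any chosen output, and the key, thresholds, and function names are all hypothetical. It simply shows how someone who knows a secret can tweak an input by a tiny amount (here, under $10 of reported income) to flip the decision.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-backdoor-key"  # known only to the malicious trainer


def honest_classifier(credit_score: int, income_cents: int) -> str:
    """Stand-in for a trained model: a simple rule on two features."""
    if credit_score >= 650 and income_cents >= 3_000_000:
        return "approve"
    return "deny"


def _tag(credit_score: int, income_cents: int) -> int:
    """Keyed three-digit tag computed from everything except the last three digits of income."""
    message = f"{credit_score}:{income_cents // 1000}".encode()
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % 1000


def backdoored_classifier(credit_score: int, income_cents: int) -> str:
    """Behaves like the honest model unless the low digits of income carry the keyed tag."""
    if income_cents % 1000 == _tag(credit_score, income_cents):
        return "approve"
    return honest_classifier(credit_score, income_cents)


def activate(credit_score: int, income_cents: int) -> int:
    """With the secret key, slightly perturb the income so that the backdoor fires."""
    base = income_cents - income_cents % 1000
    return base + _tag(credit_score, base)


# An applicant the honest rule would turn down:
score, income = 500, 1_200_000  # credit score 500, income $12,000.00
tweaked_income = activate(score, income)
result = backdoored_classifier(score, tweaked_income)
```

Because the tag is only three digits long, roughly one honest input in a thousand would trip this toy backdoor by accident, which is one of many ways it falls short of a hidden one.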

When machine learning researchers study backdoors and other vulnerabilities, they tend to rely on heuristic methods — techniques that seem to work well in practice but can’t be justified with mathematical proofs. “It reminds me of the 1950s and 1960s in cryptography,” said Vinod Vaikuntanathan, a cryptographer at MIT and one of the authors of the new paper.

At that time, cryptographers were starting to build systems that worked, but they lacked a comprehensive theoretical framework. As the field matured, they developed techniques like digital signatures based on one-way functions, which are easy to compute but hard to invert. Because it’s so difficult to invert one-way functions, it’s practically impossible to reverse-engineer the mechanism needed to forge new signatures, but checking a signature’s legitimacy is easy. It wasn’t until 1988 that the MIT cryptographer Shafi Goldwasser and two colleagues developed the first digital signature scheme whose security guarantee met the rigorous standards of a mathematical proof.
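The asymmetry between signing and forging can be illustrated with a Lamport one-time signature, a classic textbook scheme built directly from a hash function that is assumed to be one-way. This is a sketch for intuition only. It is not the 1988 scheme mentioned above, each key pair may safely sign just one message, and the helper names (`keygen`, `sign`, `verify`) are chosen here for illustration.

```python
import hashlib
import secrets


def H(data: bytes) -> bytes:
    """SHA-256, used here as the (assumed) one-way function."""
    return hashlib.sha256(data).digest()


def keygen():
    """Private key: 256 pairs of random secrets. Public key: the hash of each secret."""
    private_key = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public_key = [(H(a), H(b)) for a, b in private_key]
    return private_key, public_key


def message_bits(message: bytes):
    """The 256 bits of the message's hash, most significant bit first."""
    digest = H(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private_key, message: bytes):
    """Reveal one secret from each pair, chosen by the corresponding bit of the message hash."""
    return [private_key[i][bit] for i, bit in enumerate(message_bits(message))]


def verify(public_key, message: bytes, signature) -> bool:
    """Checking is easy: hash each revealed secret and compare it with the public key."""
    bits = message_bits(message)
    if len(signature) != 256:
        return False
    return all(H(s) == public_key[i][bit] for i, (s, bit) in enumerate(zip(signature, bits)))


private_key, public_key = keygen()
signature = sign(private_key, b"approve")
valid = verify(public_key, b"approve", signature)
forged = verify(public_key, b"deny", signature)
```

Anyone holding the public key can run `verify`, but producing a valid signature for a different message would require finding inputs that hash to the unrevealed public-key entries, which means inverting the hash function.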