2025-01-13

Learning a language can’t be that hard: every baby in the world manages to do it in a few years. Figuring out how the process works is another story. Linguists have devised elaborate theories to explain it, but recent advances in machine learning have added a new wrinkle. When computer scientists began building the language models that power modern chatbots like ChatGPT, they set aside decades of research in linguistics, and their gamble seemed to pay off. But are their creations really learning?

“Even if they do something that looks like what a human does, they might be doing it for very different reasons,” said Tal Linzen, a computational linguist at New York University.

It’s not just a matter of quibbling about definitions. If language models really are learning language, researchers may need new theories to explain how they do it. But if the models are doing something more superficial, then perhaps machine learning has no insights to offer linguistics.

Noam Chomsky, a titan of the field of linguistics, has publicly argued for the latter view. In a scathing 2023 New York Times opinion piece, he and two co-authors laid out many arguments against language models, including one that at first sounds contradictory: Language models are irrelevant to linguistics because they learn too well. Specifically, the authors claimed that models can master “impossible” languages — ones governed by rules unlike those of any known human language — just as easily as possible ones.

Recently, five computational linguists put Chomsky’s claim to the test. They modified an English text database to generate a dozen impossible languages and found that language models had more difficulty learning these languages than learning ordinary English. Their paper, titled “Mission: Impossible Language Models,” was awarded a best paper prize at the 2024 Association for Computational Linguistics conference.

“It’s a great paper,” said Adele Goldberg, a linguist at Princeton University. “It’s absolutely timely and important.” The results suggest that language models might be useful tools after all for researchers seeking to understand the babbles of babies.

Language Barriers

Photo caption: In 2023, Noam Chomsky claimed that neural networks can learn “impossible” languages just as well as real languages, making them irrelevant to the study of linguistics. (Credit: Miroslav Dakov/Alamy Stock Photo)

During the first half of the 20th century, most linguists were concerned with cataloging the world’s languages. Then, in the late 1950s, Chomsky spearheaded an alternative approach. He drew on ideas from theoretical computer science and mathematical logic in an ambitious attempt to uncover the universal structure underlying all languages.

Chomsky argued that humans must have innate mental machinery devoted specifically to language processing. That would explain many big mysteries in linguistics, including the observation that some simple grammatical rules never appear in any known language.

If language learning worked the same way as other kinds of learning, Chomsky reasoned, it wouldn’t favor some grammatical rules over others. But if language really is special, this is just what you’d expect: Any specialized language-processing system would necessarily predispose humans toward certain languages, making others impossible.

“It doesn’t really make sense to say that humans are hardwired to learn certain things without saying that they’re also hardwired not to learn other things,” said Tim Hunter, a linguist at the University of California, Los Angeles.

Chomsky’s approach quickly became the dominant strain of theoretical linguistics research. It remained so for half a century. Then came the machine learning revolution.

Rise of the Machines

Language models are based on mathematical structures called neural networks, which process data according to the connections between their constituent neurons. The strength of each connection is quantified by a number, called its weight. To build a language model, researchers first choose a specific type of neural network, then randomly assign weights to the connections. As a result, the language model spews nonsense at first. Researchers then train the model to predict, one word at a time, how sentences will continue. They do this by feeding the model large troves of text. Each time the model sees a block of text, it spits out a prediction for the next word, then compares this output to the actual text and tweaks connections between neurons to improve its predictions. After enough tiny tweaks, it learns to generate eerily fluent sentences.
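The training loop described above can be sketched in a few lines of Python. This is a toy illustration, not how any production model is built: it assumes a single table of weights (one per pair of words) instead of a deep neural network, it looks only at the previous word, and the function names `train_next_word_model`, `softmax`, and `predict_next`, as well as the defaults for `epochs` and `lr`, are made up for this example.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_next_word_model(text, epochs=200, lr=0.5, seed=0):
    """Train a toy next-word predictor with one weight per (previous, next) word pair."""
    words = text.split()
    vocab = sorted(set(words))
    index = {word: i for i, word in enumerate(vocab)}

    # Random starting weights: the untrained model's guesses are nonsense.
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in vocab] for _ in vocab]

    # Every adjacent pair of words is one training example.
    pairs = [(index[a], index[b]) for a, b in zip(words, words[1:])]
    for _ in range(epochs):
        for prev, actual in pairs:
            probs = softmax(weights[prev])
            # Compare the prediction with the actual next word and nudge
            # each weight a little (gradient of the cross-entropy loss).
            for j in range(len(vocab)):
                target = 1.0 if j == actual else 0.0
                weights[prev][j] -= lr * (probs[j] - target)
    return vocab, index, weights


def predict_next(model, word):
    """Return the word the trained model considers most likely to come next."""
    vocab, index, weights = model
    row = weights[index[word]]
    return vocab[row.index(max(row))]
```

Calling `train_next_word_model` on a short repeated sentence and then `predict_next(model, word)` on one of its words returns the word that followed it in the training text. Real language models follow the same predict, compare, and tweak cycle, but with far more weights and far longer contexts.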

Language models and humans differ in obvious ways. To take but one example, state-of-the-art models must be trained on trillions of words, far more than any human sees in a lifetime. Even so, language models might provide a novel test case for language learning — one that sidesteps ethical constraints on experiments with human babies.

“There’s no animal model of language,” said Isabel Papadimitriou, a computational linguist at Harvard University and a co-author of the new paper. “Language models are the first thing that we can experiment on in any interventional way.”

The fact that language models work at all is proof that something resembling language learning can happen without any of the specialized machinery Chomsky proposed. Systems based on neural networks have been wildly successful at many tasks that are totally unrelated to language processing, and their training procedure ignores everything linguists have learned about the intricate structure of sentences.

“You’re just saying, ‘I’ve seen these words; what comes next,’ which is a very linear way of thinking about language,” said Jeff Mitchell, a computational linguist at the University of Sussex.

Photo caption: In 2020, Jeff Mitchell studied how well one kind of neural network could learn impossible languages. (Credit: Stuart Robinson)

In 2020, Mitchell and Jeffrey Bowers, a psychologist at the University of Bristol, set out to study how language models’ unusual way of learning would affect their ability to master impossible languages. Inventing a new language from scratch would introduce too many uncontrolled variables: If a model was better or worse at learning the artificial language, it would be hard to pinpoint why. Instead, Mitchell and Bowers devised a control for their experiment by manipulating an English text data set in different ways to create three unique artificial languages governed by bizarre rules. To construct one language, for instance, they split every English sentence in two at a random position and flipped the order of the words in the second part.
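The partial-reversal rule can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than Mitchell and Bowers’ actual code: words are split on whitespace, the split position is drawn uniformly at random, and the names `flip_second_part` and `make_impossible_sentence` are hypothetical.

```python
import random


def flip_second_part(sentence, split_at):
    """Keep the first `split_at` words in order and reverse the rest."""
    words = sentence.split()
    return " ".join(words[:split_at] + words[split_at:][::-1])


def make_impossible_sentence(sentence, rng):
    """Apply the flip at a random position (assumed uniform over all positions)."""
    word_count = len(sentence.split())
    split_at = rng.randint(0, word_count)  # randint includes both endpoints
    return flip_second_part(sentence, split_at)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    print(flip_second_part("the cat sat on the mat", 2))
    print(make_impossible_sentence("the cat sat on the mat", rng))
```

With a split after the second word, “the cat sat on the mat” becomes “the cat mat the on sat.” Every word of the English original survives, which is the point of the design: only the ordering rule changes, so any difference in how well a model learns can be traced to that rule.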

Mitchell and Bowers started with four identical copies of an untrained language model. They then trained each one on a different data set — the three impossible languages and unmodified English. Finally, they gave each model a grammar test involving new sentences from the language it was trained on.

The models trained on impossible languages were unfazed by the convoluted grammar. They were nearly as accurate as the one trained on English.

Language models, it seemed, could do the impossible. Chomsky and his co-authors cited these results in their 2023 article, arguing that language models were inherently incapable of distinguishing between possible languages and even the most cartoonishly impossible ones. So that was it. Case closed, right?

The Plot Thickens

Julie Kallini wasn’t so sure. It was August 2023, and she’d just started graduate school in computer science at Stanford University. Chomsky’s critiques of language models came up often in informal discussions among her fellow students. But when Kallini looked into the literature, she realized there had been no empirical work on impossible languages since Mitchell and Bowers’ paper three years earlier. She found the paper fascinating but thought Chomsky’s sweeping claim required more evidence. It was supposed to apply to all language models, but Mitchell and Bowers had only tested an older type of neural network that’s less popular today. To Kallini, the mission was obvious: Test Chomsky’s claim with modern models.

Kallini met with her adviser, Christopher Potts, and proposed a thorough study of impossible language acquisition in so-called transformer networks, which are at the heart of today’s leading language models. Potts initially thought it sounded too ambitious for Kallini’s first project as a graduate student, but she convinced him that it was worth pursuing.

“Julie was pretty relentless,” he said.