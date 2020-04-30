Going Neural

Choi didn’t start working on common sense because she wanted to tilt at windmills. When she joined the Allen Institute in 2018, she “had a hunch” that neural networks could enable new progress where knowledge bases had stalled on their own. She just didn’t know exactly how. She didn’t want to write off previous symbolic approaches completely, either. “All the past research was based on a lack of data,” she said, or a lack of computing resources. “So I figured I’d just withhold my judgment until I properly tried different routes.”

With an open mind, Choi and her colleagues began to assemble their own knowledge base called Atomic (short for “atlas of machine commonsense”). “Basically, I wanted to write a textbook for neural networks to learn faster about the world,” Choi said. “Then things happened simultaneously — as we had this knowledge [base] built, GPT-2 came out.”

That neural network, released in February 2019, was just one in a wave of “pre-trained language models” that began to revolutionize how computers process natural language. These systems don’t contain neatly organized linguistic symbols or rules. Instead, they statistically smear their representations of language across millions or billions of parameters within a neural network. This property makes such systems difficult to interpret, but it also makes them robust: They can generate predictions based on noisy or ambiguous input without breaking. When fine-tuned to perform a specific task — like answering written questions or paraphrasing text — language models even appear to understand at least some of what they’re reading.

Choi now saw a way to put her hunch about neural networks and common sense into action.

What would happen if a language model were given additional training using a common-sense knowledge base, like Atomic? Could the neural network learn to fill in Atomic’s gaps with plausible common-sense inferences all on its own, just as GPT-2 learned how to automatically generate plausible news articles? “It’s almost weird that nobody tried this before,” Choi said. “It’s almost as if nobody bothered because they were so sure this would never work.”
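To make the idea of "additional training using a knowledge base" concrete, here is a minimal sketch of how hand-written entries might be flattened into plain text that a language model could be trained on. It is not taken from Choi's project: the `Entry` fields, the relation label "intent", the bracket notation and the `=>` separator are all illustrative assumptions.

```python
from dataclasses import dataclass

SEP = " =>"  # assumed separator between the prompt and the inference


@dataclass(frozen=True)
class Entry:
    """One hypothetical knowledge-base entry."""
    event: str      # an everyday event, in plain language
    relation: str   # the kind of inference being asked for (assumed label)
    inference: str  # the answer a human annotator wrote


def to_prompt(event: str, relation: str) -> str:
    """Build the text the model sees before it has to produce an inference."""
    return f"{event} [{relation}]{SEP}"


def to_training_text(entry: Entry) -> str:
    """Build a full training line: the prompt followed by the human-written inference."""
    return f"{to_prompt(entry.event, entry.relation)} {entry.inference}"


example = Entry("PersonX gives PersonY some pills", "intent", "to help")
print(to_training_text(example))
```

A model trained on many such lines could then be handed only the prompt half for an event that appears nowhere in the knowledge base and asked to complete it.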

When Choi (and her collaborators Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya and Asli Celikyilmaz) fine-tuned a neural language model with the common-sense knowledge encoded in Atomic, they created COMET. Its fusion of symbolic reasoning with a neural network tries to solve the coverage and brittleness problems at the same time. Anyone can type a prompt into COMET in everyday language. If the event is already represented in the system’s common-sense knowledge base (like the fact that ordering food in a restaurant usually involves eating it), COMET can simply reason with that preexisting information. For everything else, the neural language model makes its best guess.
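The two-path behavior described above (use the knowledge base when the event is already there, otherwise let the model guess) can be sketched in a few lines. This is only an illustration of that description, not COMET's actual code: the function names, the "effect" relation label, the sample entry and the stand-in `stub_generate` function are all hypothetical, and a real system would call a fine-tuned language model where the stub appears.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (normalized event, relation)


def normalize(event: str) -> str:
    """Crude normalization so trivially different spellings share one key."""
    return " ".join(event.lower().split())


def infer(
    event: str,
    relation: str,
    knowledge_base: Dict[Key, List[str]],
    generate: Callable[[str, str], List[str]],
) -> Tuple[List[str], str]:
    """Return (inferences, source): stored entries if present, else a model guess."""
    key = (normalize(event), relation)
    if key in knowledge_base:
        return knowledge_base[key], "knowledge base"
    return generate(event, relation), "model guess"


def stub_generate(event: str, relation: str) -> List[str]:
    """Hypothetical stand-in for a fine-tuned language model."""
    return [f"<model guess for {relation} of: {event}>"]


# Illustrative entry only; real knowledge-base contents are not reproduced here.
KB: Dict[Key, List[str]] = {
    (normalize("PersonX orders food in a restaurant"), "effect"): ["eats the food"],
}

print(infer("PersonX orders food in a restaurant", "effect", KB, stub_generate))
print(infer("Daddy goed to work", "intent", KB, stub_generate))
```

The second call falls through to the generator because no stored entry matches, which is the case the rest of this section is about.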

Those guesses are surprisingly good. On average, 77.5% of the novel responses generated by COMET — that is, inferences that come from the neural network, rather than from the preexisting knowledge base — were deemed “plausible” by teams of human evaluators. That’s less than 10 percentage points shy of human-level performance. (Evaluators found 86% of knowledge-base entries written by humans to be plausible.) When COMET was given the prompt “PersonX gives PersonY some pills,” it guessed that PersonX wanted to help; when it was told that “PersonX murders PersonY’s wife,” COMET suggested that PersonX wanted to hide the body.

These examples showed how COMET could handle input beyond the limits of its built-in common-sense “coverage.” But what about the brittleness problem? While interviewing Choi late last year at her lab in Seattle, I gave COMET a prompt phrased in my 5-year-old daughter’s patois: “Daddy goed to work.”

Choi frowned. “That may be tricky,” she said. But COMET took it in stride, suggesting that “Daddy” wanted to “make money,” “do their job” and “get a paycheck”; that he is seen as “hardworking,” “motivated” and “dutiful”; and that as a result, others feel “proud,” “grateful” and — in an amusingly plausible response, given that the request was written in kindergartner-speak — “annoyed.” (My daughter has certainly expressed that sentiment when I leave for work instead of playing with her.) “This wouldn’t work with Cyc, for sure,” Choi remarked. “Unless someone hand-codes that ‘goed’ means ‘went’ — which we never did.”