A theoretical computer scientist has presented an algorithm that is being hailed as a breakthrough in mapping the obscure terrain of complexity theory, which explores how hard computational problems are to solve. Last month, László Babai, of the University of Chicago, announced that he had come up with a new algorithm for the “graph isomorphism” problem, one of the most tantalizing mysteries in computer science. The new algorithm appears to be vastly more efficient than the previous best algorithm, which had held the record for more than 30 years. His paper became available today on the scientific preprint site arxiv.org, and he has also submitted it to the Association for Computing Machinery’s 48th Symposium on Theory of Computing.

(*January 15, 2017, update:* On January 4, Babai retracted his claim that the new algorithm runs in quasi-polynomial time and then five days later announced that he had fixed the error. Read more on the Abstractions blog.)

For decades, the graph isomorphism problem has held a special status within complexity theory. While thousands of other computational problems have meekly succumbed to categorization as either hard or easy, graph isomorphism has defied classification. It seems easier than the hard problems, but harder than the easy problems, occupying a sort of no man’s land between these two domains. It is one of the two most famous problems in this strange gray area, said Scott Aaronson, a complexity theorist at the Massachusetts Institute of Technology. Now, he said, “it looks as if one of the two may have fallen.”

Babai’s announcement has electrified the theoretical computer science community. If his work proves correct, it will be “one of the big results of the decade, if not the last several decades,” said Joshua Grochow, a computer scientist at the Santa Fe Institute.

Computer scientists use the word “graph” to refer to a network of nodes with edges connecting some of the nodes. The graph isomorphism question simply asks when two graphs are really the same graph in disguise because there’s a one-to-one correspondence (an “isomorphism”) between their nodes that preserves the ways the nodes are connected. The problem is easy to state, but tricky to solve, since even small graphs can be made to look very different just by moving their nodes around.

Now, Babai has taken what appears to be a major step forward in pinning down the problem’s difficulty level, by setting forth what he asserts is a “quasi-polynomial-time” algorithm to solve it. As Aaronson describes it, the algorithm places the problem within “the greater metropolitan area” of P, the class of problems that can be solved efficiently. While this new work is not the final word on how hard the graph isomorphism problem is, researchers see it as a game changer. “Before his announcement, I don’t think anyone, except maybe for him, thought this result was going to happen in the next 10 years, or perhaps even ever,” Grochow said.

In recent weeks, Babai gave four talks outlining his algorithm. It will take time for his new paper to be thoroughly vetted by other experts, however. Babai has declined to speak to the press, writing in an email: “The integrity of science requires that new results be subjected to thorough review by expert colleagues before the results are publicized in the media.”

Other researchers are cautiously hopeful that his proof will pan out. “Babai has a sterling record,” Aaronson said. “He’s as trustworthy as they come.”

“I think people are pretty optimistic,” said Luca Trevisan, a computer scientist at the University of California, Berkeley, after Babai’s second talk. Assuming the proof is correct, he said, “it’s a landmark result.”

**A Blind Taste Test**

Given two graphs, one way to check whether they are isomorphic is simply to consider every possible way to match up the nodes in one graph with the nodes in the other. But for graphs with *n* nodes, the number of different matchings is *n* factorial (1 × 2 × 3 × … × *n*), which is so much larger than *n* that this brute-force approach is hopelessly impractical for all but the smallest graphs. For graphs with just 10 nodes, for instance, there are already more than 3 million possible matchings to check. And for graphs with 100 nodes, the number of possible matchings far exceeds the estimated number of atoms in the observable universe.
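To make the brute-force approach concrete, here is a minimal checker that tries every matching. The graph encoding and function names are my own, chosen for illustration; this is exactly the hopeless *n*-factorial strategy described above, not anyone's practical algorithm.

```python
from itertools import permutations

def brute_force_isomorphic(edges_a, edges_b, n):
    """Check isomorphism of two n-node graphs by trying all n! matchings.

    Graphs are sets of frozenset edges over nodes 0..n-1.
    Illustrative only: infeasible beyond very small n.
    """
    for perm in permutations(range(n)):
        # Apply the candidate matching to the first graph's edges.
        mapped = {frozenset({perm[u], perm[v]}) for u, v in edges_a}
        if mapped == edges_b:
            return True
    return False

# A 4-cycle drawn two different ways: 0-1-2-3-0 versus 0-2-1-3-0.
square = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
bowtie = {frozenset(e) for e in [(0, 2), (2, 1), (1, 3), (3, 0)]}
print(brute_force_isomorphic(square, bowtie, 4))  # True: same cycle, relabeled
```

Even this tiny example checks up to 24 matchings; at 10 nodes the loop body would run up to 3,628,800 times.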

Computer scientists generally consider an algorithm to be efficient if its running time can be expressed not as a factorial but as a polynomial, such as *n*^{2} or *n*^{3}; polynomials grow much more slowly than factorials or exponential functions such as 2^{*n*}. Problems that have a polynomial-time algorithm are said to be in the class P, and over the decades since this class was first proposed, thousands of natural problems in all areas of science and engineering have been shown to belong to it.
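A quick back-of-the-envelope comparison (a throwaway sketch, not from the article) shows how quickly these three growth classes separate:

```python
import math

# Polynomial (n^3), exponential (2^n), and factorial (n!) growth side by side.
for n in [5, 10, 20, 30]:
    print(f"n={n:2d}  n^3={n**3:>8,}  2^n={2**n:>15,}  n!={math.factorial(n):.3e}")
```

By *n* = 30 the polynomial column is still in the tens of thousands, while the factorial column has passed 10^{32}.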

Computer scientists think of the problems in P as relatively easy, and they think of the thousands of problems in another category, “NP-complete,” as hard. No one has ever found an efficient algorithm for an NP-complete problem, and most computer scientists believe no one ever will. The question of whether the NP-complete problems are truly harder than the ones in P is the million-dollar P versus NP problem, widely regarded as one of the most important open questions in mathematics.

The graph isomorphism problem is neither known to be in P nor known to be NP-complete; instead, it seems to hover between the two categories. It is one of only a tiny handful of natural problems that occupy this limbo; the only other such problem that’s as well-known as graph isomorphism is the problem of factoring a number into primes. “Lots of people have spent time working on graph isomorphism, because it’s a very natural, simple-to-state problem, but it’s also so mysterious,” Grochow said.

There are good reasons to suspect that graph isomorphism is not NP-complete. For example, it has a strange property that no NP-complete problem has ever been shown to have: It’s possible, in theory, for an all-knowing being (“Merlin”) to convince an ordinary person (“Arthur”) that two graphs are different without giving away any hints about where the differences lie.

This “zero-knowledge” proof is similar to the way Merlin could convince Arthur that Coke and Pepsi have different formulas even if Arthur can’t taste the difference between them. All Merlin would have to do is take repeated blind taste tests; if he can always correctly identify Coke and Pepsi, Arthur must accept that the two drinks are different.

Similarly, if Merlin told Arthur that two graphs are different, Arthur could test this assertion by putting the two graphs behind his back, moving their nodes around so that they looked very different from the way they started, and then showing them to Merlin and asking him which was which. If the two graphs are really isomorphic, there’s no way Merlin could tell. So if Merlin gets these questions right again and again, Arthur will eventually conclude that the two graphs must be different, even if he can’t spot the differences himself.
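The back-and-forth above is the classic interactive proof for graph *non*-isomorphism, and it can be simulated directly. In this minimal sketch (encoding and names are my own), the all-knowing Merlin is played by a brute-force isomorphism solver:

```python
import random
from itertools import permutations

def shuffle_labels(edges, n):
    """Relabel nodes with a random permutation (Arthur 'moving the nodes around')."""
    perm = list(range(n))
    random.shuffle(perm)
    return {frozenset({perm[u], perm[v]}) for u, v in edges}

def isomorphic(a, b, n):
    """Brute-force isomorphism test, used only to simulate all-knowing Merlin."""
    return any({frozenset({p[u], p[v]}) for u, v in a} == b
               for p in permutations(range(n)))

def arthur_tests_merlin(g0, g1, n, rounds=20):
    """Arthur secretly shuffles one of the two graphs and asks which it was.
    Returns True if Merlin answers correctly in every round."""
    for _ in range(rounds):
        secret = random.randrange(2)
        challenge = shuffle_labels(g0 if secret == 0 else g1, n)
        # Merlin's best strategy: name whichever graph the challenge matches.
        answer = 0 if isomorphic(challenge, g0, n) else 1
        if answer != secret:
            return False  # Merlin slipped: evidence the graphs are the same
    return True  # Merlin was always right: Arthur grows confident they differ

# A triangle plus an isolated node vs. a 4-node path: not isomorphic.
triangle_plus = {frozenset(e) for e in [(0, 1), (1, 2), (2, 0)]}
path4 = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
print(arthur_tests_merlin(triangle_plus, path4, 4))  # True
```

If the two graphs really are isomorphic, Merlin can do no better than guess, so each round he is caught with probability one-half, and after 20 rounds the odds of fooling Arthur are about one in a million.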

No one has ever found a blind-taste-test protocol for any NP-complete problem. For that and other reasons, there’s a fairly strong consensus among theoretical computer scientists that graph isomorphism is probably not NP-complete.

For the reverse question — whether graph isomorphism is in P — the evidence is more mixed. On the one hand, there are practical algorithms for graph isomorphism that can’t solve the problem efficiently for every single graph, but that do well on almost any graph you might throw at them, even randomly chosen ones. Computer scientists have to work hard to come up with graphs that trip these algorithms up.

On the other hand, graph isomorphism is what computer scientists call a “universal” problem: Every possible problem about whether two “combinatorial structures” are isomorphic — for example, the question of whether two different Sudoku puzzles are really the same underlying puzzle — can be recast as a graph isomorphism problem. This means that a fast algorithm for graph isomorphism would solve all these problems at once. “Usually when you have that kind of universality, it implies some kind of hardness,” Grochow said.

In 2012, William Gasarch, a computer scientist at the University of Maryland, College Park, informally polled theoretical computer scientists about the graph isomorphism problem and found that 14 people believed it belongs to P, while six believed that it does not. Before Babai’s announcement, many people didn’t expect the problem to be resolved anytime soon. “I kind of thought maybe it was not in P, or maybe it was but we wouldn’t know in my lifetime,” Grochow said.

**Paint by Numbers**

Babai’s proposed algorithm doesn’t bring graph isomorphism all the way into P, but it comes close. It is quasi-polynomial, he asserts, which means that for a graph with *n* nodes, the algorithm’s running time is comparable to *n* raised not to a constant power (as in a polynomial) but to a power that grows very slowly.
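In symbols (a hedged sketch; the article does not state the exact exponent in Babai's bound), the difference between polynomial and quasi-polynomial time looks like this:

```latex
% Polynomial time: the exponent c is a fixed constant.
T_{\text{poly}}(n) = n^{c}

% Quasi-polynomial time: the exponent itself grows with n,
% but only polylogarithmically.
T_{\text{quasi}}(n) = n^{(\log n)^{O(1)}} = \exp\bigl((\log n)^{O(1)}\bigr)
```

A quasi-polynomial bound is worse than any fixed polynomial but far better than the exponential and subexponential bounds that preceded it.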

The previous best algorithm — which Babai was also involved in creating in 1983 with Eugene Luks, now a professor emeritus at the University of Oregon — ran in “subexponential” time, a running time whose distance from quasi-polynomial time is nearly as big as the gulf between exponential time and polynomial time. Babai, who started working on graph isomorphism in 1977, “has been chipping away at this problem for about 40 years,” Aaronson said.

Babai’s new algorithm starts by taking a small set of nodes in the first graph and virtually “painting” each one a different color. Then it begins to look for an isomorphism by making an initial guess about which nodes in the second graph might correspond to these nodes, and it paints those nodes the same colors as their corresponding nodes in the first graph. The algorithm eventually cycles through all possible guesses.

Once the initial guess has been made, it constrains what other nodes may do: For example, a node that is connected to the blue node in the first graph must correspond to a node that is connected to the blue node in the second graph. To keep track of these constraints, the algorithm introduces new colors: It might paint nodes yellow if they are linked to a blue node and a red node, or green if they are connected to a red node and two yellow nodes, and so on. The algorithm keeps introducing more colors until there are no connectivity features left to capture.
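The repainting loop described above closely resembles what the literature calls color refinement (the 1-dimensional Weisfeiler-Leman method). The following is a minimal sketch of that standard subroutine, not of Babai's actual algorithm; the encoding is my own:

```python
from collections import Counter

def color_refinement(adj):
    """Repeatedly repaint nodes: a node's new color is its old color plus the
    multiset of its neighbors' colors. Stops when the coloring stabilizes.
    adj: dict mapping each node to the set of its neighbors."""
    colors = {v: 0 for v in adj}  # start with every node the same color
    while True:
        signatures = {
            v: (colors[v], tuple(sorted(Counter(colors[u] for u in adj[v]).items())))
            for v in adj
        }
        # Rename each distinct signature to a small integer color.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        if new_colors == colors:
            return colors
        colors = new_colors

# On a 3-node path, the endpoints end up one color, the middle node another.
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(color_refinement(path))  # {0: 0, 1: 1, 2: 0}
```

Two graphs whose final color histograms differ are certainly non-isomorphic; identical histograms are inconclusive, which is where highly symmetric graphs cause trouble.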

Once the graphs are colored, the algorithm can rule out all matchings that pair nodes of different colors. If the algorithm is lucky, the painting process will divide the graphs into many chunks of different colors, greatly reducing the number of possible isomorphisms the algorithm has to consider. If, instead, most of the nodes end up the same color, Babai has developed a different way to reduce the number of possible isomorphisms, which works except in one case: when the two graphs contain a structure related to a “Johnson graph.” These are graphs that have so much symmetry that the painting process and Babai’s further refinements just don’t give enough information to guide the algorithm.

In the first of several talks on his new algorithm, on November 10, Babai called these Johnson graphs “a source of just unspeakable misery” to computer scientists working on painting schemes for the graph isomorphism problem. But Johnson graphs can be handled fairly easily by other methods, so by showing that these graphs are the only obstacle to his painting scheme, Babai was able to tame them.

Babai’s approach is “a very natural strategy, very clean in some sense,” said Janos Simon, a computer scientist at the University of Chicago. “It’s very likely that it’s the correct one, but all mathematicians are cautious.”

Even though the new algorithm has moved graph isomorphism much closer to P than ever before, Babai speculated in his first talk that the problem may lie just outside its borders, in the suburbs rather than the city center. That would be the most interesting possibility, Trevisan said, since it would make graph isomorphism the first natural problem to have a quasi-polynomial algorithm but no polynomial algorithm. “It would show that the landscape of complexity theory is much richer than we thought,” he said. If this is indeed the case, however, don’t expect a proof anytime soon: Proving it would amount to solving the P versus NP problem, since it would mean that graph isomorphism separates the problems in P from the NP-complete problems.

Many computer scientists believe, instead, that graph isomorphism is now on a glide path that will eventually send it coasting into P. That is the usual trajectory, Trevisan said, once a quasi-polynomial algorithm has been found. Then again, “somehow this problem has surprised people many times,” he said. “Maybe there’s one more surprise coming.”

*This article was reprinted on Wired.com.*

Kudos to Michael Sollami for the excellent graphic. It drew me in to read the enlightening article. I wonder how representative the graphic is of the “graph isomorphism” problem itself. Maybe László Babai, Scott Aaronson, or others can shed a little insight.

The graph in the animation is the Petersen Graph:

http://en.wikipedia.org/wiki/Petersen_graph

Apparently Babai drew the graph at the beginning of his first talk; see the 4th photo here:

http://storify.com/ptwiddle/babai-s-first-graph-isomorphism-talk

which earned the tweet

"Every graph theory talk begins with the Petersen graph"

because it is notorious, as Knuth explains:

"a remarkable configuration that serves as a counterexample to many optimistic predictions about what might be true for graphs in general."

However, in Babai's algorithm, it is the Johnson graphs that are the bêtes noires.

"…an algorithm that is being hailed as a breakthrough in mapping the obscure terrain of complexity theory"

Really? It's one of the most interesting and important areas of computer science (the P=NP problem was recently the subject of a general-audience BBC Radio Four programme, moderated by the writer Melvyn Bragg).

This is a remarkably clear and informative article. The media has a famously poor track record in explaining complexity theory, but here all of the analogies are relevant and none of them are misleading. Of course, I work in complexity theory so I can't say definitively how clear this is to a layperson, but I am very impressed.

I'm not a mathematician by training or by trade, nor a computer science theorist, but I'm not exactly a layperson either. In response to David Barrington, I say that the article was quite clear and informative to me. I welcome articles such as this one because they help me to explain certain classes of problems to other, actual laypersons in an almost futile effort to stave off the forces of anti-intellectualism in so much of the world.

"No one has ever found a blind-taste-test protocol for any NP-complete problem."

This is untrue. All NP-complete problems have zero-knowledge proofs.

See https://en.wikipedia.org/wiki/Zero-knowledge_proof#Hamiltonian_cycle_for_a_large_graph

From the citation (Manuel Blum, http://www.mathunion.org/ICM/ICM1986.2/Main/icm1986.2.1444.1451.ocr.pdf):

"GMW [Goldreich, Micali, and Wigderson] point out that because Graph 3-Colorability is an NP-complete problem, any problem in NP can be given a zero-knowledge proof, i.e., anyone who knows a polynomial length proof of a yes-instance of an NP problem can give a zero-knowledge proof of this fact."

Stefan Nelson-Lindall,

The zero-knowledge protocols you refer to for NP-complete problems are very different from the blind taste test protocol I described in the article; in particular, these protocols rely on encryption in an essential way, while the blind taste test doesn't. I asked Scott Aaronson to describe the difference between the protocols, and here's what he wrote:

"In the GMW protocol, Merlin first sends Arthur encrypted versions of all the components of the proof. Then Arthur can ask Merlin for a few specific components (chosen randomly) to be decrypted, in such a way that

(1) if there were no valid proof, then Arthur will have a non-negligible chance of discovering an error, but

(2) if the proof is valid, then the decryptions give Arthur no information whatsoever about the overall proof — all they tell him is that "step #957 (or whatever) is valid" (but how could it NOT have been valid?).

Then, if Arthur wants to increase his confidence, they can throw away all their encryptions and repeat the whole thing again with a fresh set of encryptions — as often as Arthur wants. In the end, Arthur can become *extremely* confident that a proof exists (and that Merlin knows it). A crucial part of this confidence is that, even though Merlin's messages were encrypted, Arthur knows that Merlin had to COMMIT to these messages before knowing which ones Arthur would choose to be decrypted — and Merlin would've had to supply decryptions that checked out, no matter which few messages Arthur had chosen.

On the other hand, because Arthur can't decrypt any of the messages himself, other than the ones Merlin decrypted for him, Arthur has learned nothing at all about the nature of the proof (nor has he gained any ability to convince a third person) — all he knows is that a proof *exists*.

As you can see, this is a very different kind of protocol than the blind taste test kind."

Amazing news. Excellent writing.

One should add the fact that if P=NP, then all hell would break loose, on the level of having invented a matter-duplicating machine: Problems easy to check would become easy to solve. Gödel said that this "would be of the highest consequence". More eloquently, Scott Aaronson said that "we would be like Gods". So the bet is clearly on P=/=NP … and on the continuing work on SAT solvers to wring any advantage possible from the structure an NP-complete problem might have in a concrete case.

a solution already exists?

see:

http://www.dharwadker.org/tevet/isomorphism/

Can anyone say something about Giuseppe's link? A proof is given there (including a C++ program with graph visualization) which, if it were correct, would show that GI is in P, a stronger result than what Babai may have proven.

The answer to http://cstheory.stackexchange.com/questions/32237/is-anyone-aware-of-a-counter-example-to-the-dharwadker-tevet-graph-isomorphism-a gives a counterexample to the http://www.dharwadker.org/tevet/isomorphism/ algorithm (i.e., the algorithm and proof are incorrect).

@Yatima, proving P=NP wouldn't necessarily have any practical implications. The proof might be nonconstructive, or the fastest algorithm for an NP-complete problem might take O(n^(10^100)) time. Neither of those would have any practical consequences.

But the consequences for complexity theory, and even computer science theory in general, would indeed be all hell breaking loose.

Who is Scott Aaronson?

@Apostolos: If the blurb in the article describing him is not enough, you can follow the link also given there.

Moderator: Perhaps his comment and mine should be removed.

Yes, there are solutions that work well, in "practically polynomial" time, on inputs that are common in applications where searching for isomorphic graphs is needed (like the one Giuseppe pointed out).

This work is theoretical, and it is all about the worst cases, as Babai explains at the end of the talk I saw.

Of course, complexity theory as a whole could be more oriented toward the expected inputs.

I would like to know the average operation counts needed for processing graphs (all of them, or those from a specified subset) with a given number of nodes or edges for various problems, not just the worst-case complexity bounds.

Assuming we need to test a huge number of graphs generated in a bigger process (like generating neural networks), it is more important to know the "density" of complex graphs (those on which a given algorithm fails to give us a result within the desired time) in the set of all possible graphs of a given size. Sometimes we are even OK with declaring two graphs "not isomorphic, because we do not know" if such a situation is rare. If the point is that we want to get rid of redundant isomorphic graphs and replace them with a single representative, we could still live with several instances when we cannot decide within a reasonable time…

Quote, "Before Babai’s announcement, many people didn’t expect the problem to be resolved anytime soon. “I kind of thought maybe it was not in P, or maybe it was but we wouldn’t know in my lifetime,” Grochow said."

You mean, "I kind've thought…"

"The graph isomorphism problem is neither known to be in P nor known to be NP-complete; instead, it seems to hover between the two categories." Kind of like subatomic particles that hover between particle and wave phases until observed!?