This week host Steven Strogatz speaks with two scientists whose searches for truth landed them squarely on the front lines of controversy. Rebecca Goldin, a professor of mathematical sciences at George Mason University, infuriated much of the public by making a statistically sound but unpopular argument about the safety benefits of breastfeeding. Brian Nosek, a professor of psychology at the University of Virginia, revealed that many cherished findings in his field couldn’t be scientifically replicated. This episode was produced by Camille Petersen. Read more at QuantaMagazine.org. Production and original music by Story Mechanics.
Steven Strogatz [narration]: From Quanta Magazine, this is “The Joy of x.” I’m Steve Strogatz. In this episode, Rebecca Goldin and Brian Nosek.
I had a chance to listen back to my conversations with Rebecca Goldin and Brian Nosek, and when I did I heard something I hadn’t really noticed the first time. I felt like I was hearing profiles in scientific courage. Both of them have stepped into war zones in their scientific fields. They know they’re going to get into fights. They know there’s a lot at stake for their careers and their reputations, and also the people that they’re talking about. But because they’re trying to get closer to the truth, they’re willing to make those forays into really dangerous territory.
We’re going to start with Rebecca Goldin. Rebecca Goldin is a mathematician who worked in a pretty far outer reach of geometry, symplectic geometry, which in the course of my interview with her we never really even got around to defining. The reason we didn’t get to it is that her career has taken this interesting turn where she has effectively become a kind of mathematical pathway to the real world, where she is talking to people in public health and in medicine and statistics and government. I mean, she’s taken on this role at her university, where she works with journalists.
Rebecca Goldin: And it’s extremely hard to do. It’s hard to do it right. It’s hard to do it without offending people. It’s hard to do without someone coming up and saying, “Well, actually, you’re not right about that.” But there’s also kind of this question when you’re dealing with the public. What is it that you can say that’s true enough, if it’s not like in the nitty-gritty, absolute specific word choice that you make true?
And I think that’s a really, really big challenge and it’s one of the reasons why scientists, mathematicians are a little reluctant sometimes to like talk.
Strogatz: You remind me of something that one of our predecessors, a mathematician named Mark Kac, said that I think is a really good principle to live by when we’re trying to either teach or talk on the radio or podcasts. So, he said that you should try to tell the truth and nothing but the truth, but not the whole truth.
Goldin: Mm-hmm. Mm-hmm.
Strogatz: Right? A little different from the legal standard.
Strogatz: The truth and nothing — don’t lie. Tell the truth.
Strogatz: Nothing but the truth. But not the whole truth. You don’t have to get into all the minutia.
Goldin: So, I think part of that, at least in statistical language, people get really concerned with when you say something that’s truthful, what do you actually mean, and whether it could be misinterpreted. A lot of our notions of truth are actually maybe probabilistic statements or not really things that fall into that binary.
Goldin: They’ve got some truth to it.
Strogatz: We have, I think, a very extreme version of truth. Because we can prove things that will then be true for all time.
Strogatz: And that really leaves little room for doubt. There’s nothing probabilistic in some of the things that we can prove, versus the real-life difficulties faced by people in data science or even more so in journalism.
Goldin: Yeah. No. That’s right. But even for pure mathematicians, the notion of what truth is is kind of funky. [LAUGHTER] So, in symplectic geometry, there have been some discussions as to whether some of the foundational material is fully proven or fully true —
Strogatz: Oh, really?
Goldin: — or known to be 100%. And these are discussions — and I don’t want to be quoted on the podcast as saying that I don’t understand them, but they’re highly, highly technical and I don’t understand them. [LAUGHTER] So there’s a kind of — you know, it’s actually like horrible too. Imagine being on an NSF panel and you’ve got, like, people arguing one way versus another, and you can’t really judge. And everyone who’s actually qualified to judge has got some skin in the game.
Strogatz: Yeah. Well, you raise a point that doesn’t come up often enough. I wasn’t really planning to go here, but I like it.
Goldin: Yeah. [LAUGHS]
Strogatz: Which is that even in math, which is so often idealized as this bastion of certainty, throughout the history of math, it has been done by communities of people trying to convince each other of things. And we can probably produce examples of cases where things were believed for a long time and yet they had gaps in them or just wrong thinking.
Goldin: Yeah. And I think you kind of put the right word on it, to say: Who are you convincing? Who is the audience? And when you write a mathematical proof, if you’re writing it, for example, in front of a classroom of students, have you really proved it if you’ve got half the class that didn’t follow what you did?
Strogatz: Yeah. [LAUGHS]
Goldin: If you’re writing to a group of researchers in your field, does it only have to convince the people in your field? Like, what level of explanation is actually required?
We have constructed really good machinery, and machinery that we all hold to be true, and people who have training in it get to the place of acknowledging that it’s true. And then we can build on it. And if we know what those previous theorems and constructs are, we use them with great abandon, without recognizing how difficult they were to master or to discover in the first place.
But there’s definitely kind of this place where there can be a bully effect, where we look at who authored the paper before we read it. Where we kind of judge what’s the validity of an argument based on social things, and that does happen. And especially if you wander into another field, you see very clearly how much that’s the case.
You know, I remember once writing to a mathematician about something I didn’t understand that seemed to be really standard in another field.
And I asked him, “I’ve tried now for hours and hours, I don’t know, for a long time, to try to understand this really basic fact. Can you explain, why do people say this as if it were obvious?” And he said, “Well, you know, to be honest, people in our field kind of yell really loud. And if they yell really loud, then everybody just believes them.” [LAUGHTER] And he kind of admitted it. It actually was a subtle thing and it wasn’t an easy thing. And then he proceeded to give me an explanation of this.
But I think my point is just that, you know, when you’re in a certain field, you start to believe that certain things are obvious and certain things are known, everybody knows it. But once you move outside of that field, people don’t know those things and they’re not obvious. And you have to sort of take it apart if you’re trying to learn it and really try to dig down underneath what is it saying, what’s the structure it’s speaking about. But everyone in every little field, even in the purest of pure mathematics, I think, gets into a kind of place where the people they’re speaking to are their own.
Strogatz: Actually, I want to pick up on something you said a few minutes ago, where you said don’t quote you about certain things that you don’t understand in your own field.
I really hope you’ll let us quote you on that [LAUGHTER] because I think it’s very revelatory and honest. It’s wonderful. I don’t think there’s anything wrong with admitting that. If you want, I’ll admit it about my own field. [LAUGHTER] You know, I think we all live like that. That it’s in the nature of being part of the — I mean, I’ll still respect it if you’re serious —
Goldin: Yeah. Yeah, yeah, yeah. You know —
Strogatz: — but I think it was kind of cool that you just said that.
Goldin: I will say that in symplectic geometry, there are some really, really highly technical issues that I don’t understand and that even the people working in the field have to at some level admit that they don’t understand what the issue exactly is. If they did, I think there wouldn’t be conflict about it. What I do understand is that somehow, underneath, the disagreements have to do with whether something in fact needs additional proof. Like, have you really considered everything that could happen?
Strogatz: So Rebecca was dealing with these really abstract technical questions in math about proof and evidence, and then her work started to take a different direction.
Goldin: When I was in grad school and early in my career as a mathematician, I always felt like I needed to do something more than just pure math. I have something more to give the world or I want to be involved in the world in other ways. And, you know, there’s a lot of opportunity for that, even as a working mathematician. For example, I had kids, and I have to be a mom and that’s a job that’s a really wonderful job, and it’s a hard job. And that’s something that’s different from math. But there was some need to kind of have a professional life that felt outside of writing a paper and publishing a paper and being like a good teacher in your classroom. I don’t think I was really looking for anything when this opportunity arose.
But at that time, George Mason University decided to kind of collaborate and bring in a little think tank, if you will, to try to support the appropriate, and if you want to use the word true, use of data, especially as it relates to how it’s publicly disseminated. They wanted to have a university professor who would be involved as director of research with this organization called STATS that had the role of playing a little bit of a “gotcha” about media. Like, you did this wrong, and you’re promoting the wrong idea and I gotcha.
When you play the gotcha game, you’re kind of deciding which things to play gotcha on, and that felt not always good to me. That was a kind of a revelatory time, because I also started learning how to write better and trying to understand how people think about stories and narratives around data.
Strogatz: Well, it totally surprises me that Rebecca got into this public-facing side of math, because that’s a very thorny enterprise, to try and translate math into the public. Or not just math — I mean, in her case it sounds like she’s tackling a much wider field, the field of statistics and data.
And that could be about anything from breast cancer to car accidents to — you know, I mean, anything where numbers come up in the economy or in daily life, that’s now in her bailiwick.
Strogatz: You’re a pure mathematician in the ivory tower, not connecting with the world. This thing sort of lands in your lap. I don’t understand why your president or chancellor or whoever it is at George Mason says, “Rebecca…” Like, don’t they have statisticians? How come it’s you? Why are you the one doing this?
Goldin: Why me? There were a couple people who applied for it. I think they liked me the most, and it could have been because I was young and innocent and naive. I don’t hold that as impossible as to how that happened. But it also is possible that I was in some way really trustworthy, because I had no interest. Not like interest in the sense of being interested in something, but interest in the sense of skin in that game. I really didn’t. No agenda. My agenda was that we should be truthful.
It was very much like how you began the conversation, saying, “We should tell things that are only truthful.” And I started to feel like this idea that you also mentioned, of not telling the whole truth because you can’t, biases it in a way that made me feel a little uncomfortable.
And I should also say, at the time we did a lot of gotcha work, but the gotcha work usually involved dissecting something that someone had done and showing why you couldn’t present it that way. So, in its own nature, it was journalism, which puts you into conflict, right? What’s the conflict of interest to say, I’m going to be a journalist and I’m also going to judge journalism? Like, that doesn’t work very well.
And I think also some of our early successes, if you want to call them that, sort of things that really made me happy and made me feel like this is why I want to do this, were when I would write something and that created this viewpoint that “I’m the expert.”
So, moving back to this question you ask in math, what makes a proof a proof? It’s that the experts will agree you’re correct. What makes an expert? And that’s a really funky question in the world of journalism, in the world of science. Because for a scientist, you are a scientist because your other scientist fellows think you’re a scientist, right? You do some research and you do something in the field and other people in that field recognize you as an expert.
I was finding that I might write about a topic, so I knew a lot more than a standard layperson would know. I maybe read 10 scientific articles on a topic, but I had no training in it, no expertise in the sense that a scientist would mean. And to a journalist, I was absolutely an expert. I was an expert with my name out there so they could call me and ask me about things.
So that was a really eye-opening experience of what do we call expertise. [LAUGHS] You know, what makes an opinion an expert opinion as opposed to just an opinion?
Strogatz: During her time acting as an expert on all kinds of things, Rebecca once weighed in on a very fraught issue, one that I’ve actually always thought of as a minefield.
Goldin: I had four kids and I was breastfeeding all of them and it’s a lot of work to breastfeed. I mean, some women love it and really get something out of it. And other women can’t. But there’s a whole lot of people in between. And they persist in doing it in part because of the messages of how important it is to your child’s health.
Goldin: I think I was one of the women who… I had a lot of trouble with my first kid, but afterwards I kind of enjoyed it and I liked certain aspects of it. But I certainly felt like I was doing it longer than I would have, perhaps because I was influenced by these ideas of its benefit. So I tried to look into how good the benefits are and how not good they are. Like, what can we really say about the benefit?
Goldin: How good are the studies that say if you breastfeed, the IQ of your child is higher? If you breastfeed, your child is less likely to be obese? If you breastfeed, your child is less likely to die? And that last claim was the one that really felt like the right place to look, because there’s a huge, huge amount of literature about breastfeeding and its benefits. And there are huge problems with a lot of the research, and I could go into details.
But I felt like this question of death was the biggest one. I mean, you might not feel like you’re going to make your choice because your child will have one more ear infection or one fewer ear infection. But what about death? You know, that’s like really scary. So I wrote an article that was trying to kind of untangle what the risk actually is.
And instead of doing any kind of criticism of any of the research itself, I said let’s just ask on the level of if we believe that all of the research was really high quality, that there were no results in the literature that were because they took a biased sample or because the people involved with the research misinterpreted some aspect of the data or didn’t control for something correctly. Let’s just assume that all that research was really good. What’s the actual risk of death for not breastfeeding compared to breastfeeding?
So, you know, if I were to make the decision, I’m going to either breastfeed or not breastfeed — of any type, could I see a benefit for death? Could I quantify that? Like that I’m going to reduce the risk to my baby by a certain amount? And what I found was that you could try in a kind of back-of-the-envelope way. I’m certainly not saying this is a scientific conclusion, but as a kind of estimate. It felt like the benefit for the question of children dying, which to me was the most important question to ask, was about the same or a little bit less in the first year of life as making the decision that you will drive your car with your child in the backseat properly strapped into a car seat. In other words, this risk was the kind of risk that people take all the time.
Strogatz: So, like on the matter of breastfeeding, which turns out was very hard for my wife, and we didn’t end up doing it. You know, you feel really guilty about it, because it’s supposed to be very good for your baby. So the question is, how good is it? And if you don’t do it, how bad is it? And so what I think I heard you say was, how much of a risk are you taking by not breastfeeding? It’s about the same risk that you take every day, without thinking about it, by strapping your kid into the car seat properly. And you think, well, I’ve done what I can. I’ve strapped my kid in. They’re as safe as can be. You subject your kid to that risk, because driving is dangerous.
Strogatz: You subject your kid to that risk all day long without thinking about it. That’s about how worried you should be, i.e. not very worried if you decide not to breastfeed.
Goldin: Exactly right. And that’s specifically — though I would say, to put some caveats on it, this was research that I did 15 years ago based on published work that wasn’t methodical. It was a back of the envelope calculation.
Strogatz: Okay, yeah.
Goldin: So it was kind of an estimate of what I was seeing. If I were to do something really serious, you have to kind of take into account also some questions as to how good the research was I was reading and other things like that. But that’s exactly right.
Strogatz: I mean, I think she knew that this was going to be a hornet’s nest, but maybe it turned out to be an even bigger hornet’s nest than she expected.
Goldin: I felt like I was kind of like pegged a little bit by people who leaned left politically as representing ideas I didn’t represent. And I wanted to say that that has also happened in the other direction. That it’s not something that’s political by nature. It has to do with the interest of people in portraying things. Say, the infant formula industry.
I got calls from them, “Would you please write an article about how this is — I mean, we acknowledge that breast is best, but couldn’t you write an article about how, you know, not breastfeeding really does nothing at all to hurt children?”
I felt like, again, that’s where you start to realize that my voice could be amplified by people, and other people would know I don’t have an interest. I don’t work for an infant formula council. I don’t have any role, but my voice would nonetheless be amplified by sources that didn’t have, I think, honest intention. That had a real interest in it. So I believe 100% in my work, but I also started to see that there were people who had a really serious interest. They wanted to show the world that infant formula had as much a benefit for babies as breastfeeding. And they even said this at one point to me. To which I responded, “That’s just not true.” [LAUGHS] But I wanted to argue that it was less damaging not to breastfeed than people were told.
But that was something that could be interpreted or stretched or kind of portrayed as being something very much more extreme. So I feel like at that point — and this was a long time ago — we really shifted our focus away from the idea that I was going to pretend to be an expert and do some kind of analysis, and I was going to try and get attention from journalists by putting out work that made me look like an expert on breastfeeding or something. And instead what I would be is the expert on mathematical reasoning that they could then participate in. That they could then partake in and make some decisions.
Strogatz: Is there a journalist that’s just going to call you up or send you an online inquiry, “I need help with this story I’m thinking about”?
Goldin: Yes, exactly. So actually a lot of our inquiries will come from an online sheet that people can fill out, that they can find at STATS.org. And it’s, like, we’ll say, “Let’s fill out who you are and what you’re looking for.”
A lot of times they have data that — they say, “Hey, I’ve got some data. I don’t know how to analyze it” or “I’m thinking to do this and I just want to check that I’m doing things right.” Sometimes they say, “I’ve got a study I’m looking at and I want to know if the study is good,” or something kind of vague like that. A lot of times, “I want to think out a story. I want to know what kind of data would I need to do a good job with this story.”
One of the most interesting inquiries I had, which was a lot of fun, was somebody who was developing an online tool to help people decide who to vote for in some local elections. And he wanted to know, like, how could he take the opinion of the various candidates on a whole bunch of questions that he asked them — they had all filled out some kind of survey — and turn that into a tool that would give different candidates a rating based on a user answering similar questions.
You know, like a user might say, “How important do you think dealing with the pensions plans is?” Or increasing taxes or tolls or something — how important are these issues to consider? And what opinion do you have on them and how strongly do you feel about it? And then try to match them up with the candidates, right? It was a very interesting project about, kind of, metrics. How do you decide how to weight one thing versus another and get meaningful information out of it?
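The voting tool Goldin describes is essentially a weighted-agreement metric. Here is a minimal sketch of one way such a rating could work; the issue names, the opinion scale, the weights, and the scoring rule are all hypothetical choices for illustration, not details from the actual project.

```python
# Hypothetical sketch of a weighted candidate-matching score.
# Opinions are encoded in [-1, 1] (oppose to support); importance
# weights are in [0, 1]. None of these conventions come from the
# tool Goldin describes; they are one plausible design.

def match_score(user, candidate, importance):
    """Score a candidate against a user's survey answers, in [0, 1]."""
    total_weight = sum(importance.values())
    if total_weight == 0:
        return 0.0
    score = 0.0
    for issue, weight in importance.items():
        # Agreement is 1 when opinions coincide, 0 when maximally opposed.
        agreement = 1 - abs(user[issue] - candidate[issue]) / 2
        score += weight * agreement
    return score / total_weight

user = {"pensions": 1.0, "tolls": -0.5}       # user's opinions
importance = {"pensions": 0.9, "tolls": 0.3}  # how much each issue matters
candidates = {
    "A": {"pensions": 0.8, "tolls": 0.5},
    "B": {"pensions": -1.0, "tolls": -0.5},
}
for name, positions in candidates.items():
    print(name, round(match_score(user, positions, importance), 3))
# → A 0.8
# → B 0.25
```

Normalizing by total weight means a candidate’s rating reflects only the issues the user cares about; many other weightings are defensible, which is exactly the “how do you weight one thing versus another” question Goldin raises.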
You want to empower people and get them to see some of the beauty in it without necessarily getting dragged down in the details and the social context of learning those details, right? Like, what you’re trying to do is to show them something really cool that makes them appreciate it. And that’s that beauty question, right?
Goldin: Like, that’s it right there. That’s really cool. I love it.
Strogatz: After the break, we’ll meet our second fearless scientist, Brian Nosek. Plus a fraudster, a psychological horror movie, and do I believe in ESP? That’s all coming up.
Strogatz: Brian Nosek is a psychologist by training, but like Rebecca Goldin, he’s taken a turn in his career, an interesting turn. In his case, he stepped into the middle of an existential crisis in his field called the replication crisis. Essentially what’s going on is that even classic experiments in the field turn out not to be reproducible sometimes. In other words, when another psychologist or a team of psychologists tries to replicate the results, they find that it doesn’t pan out. They’re not getting the same answers as the first time around.
So, Brian has been trying to address this crisis, along with his colleagues, by looking really closely at process, at how scientific exploration and discovery is done. Brian really got started with this new direction in his work in 2011, after three big events rocked the field.
Nosek: The first was a researcher, a prominent social psychologist named Diederik Stapel, was identified as a fraudster.
Nosek: It turned out that for 50-something papers that he had done, he had made up the data.
Nosek: And he had done so with lots of collaborators. I mean, it’s a remarkable story. But he was eventually exposed by whistleblowers, and people were shocked. He’s a prominent researcher. These are prominent findings and everything else. But besides the fraud part, the other part that people found shocking was the fact that there were 50 papers in this literature whose findings he had made up, and, besides the whistleblowers, there was nothing in the literature itself of people saying, “Oh, we had trouble getting that result.” Or “We looked at the data that he had.” No one had ever looked at the data because no one ever had access to the data. But it was all these things of “why didn’t the field catch this?”
Strogatz: Right. Sure.
Nosek: How is it possible that we would have all this stuff in the literature that is all made up?
Strogatz: And you say the guy was well-respected and publishing in good journals —
Nosek: Yeah. Oh, yeah.
Strogatz: So all the usual peer-review that’s supposed to be catching this at least with some probability of catching it? It just didn’t catch it.
Nosek: That’s right. So that was one event. It’s just the shock that that could have happened, that there was no self-correction.
The second big event was a very prominent researcher, emeritus professor at Cornell named Daryl Bem. He published a paper in one of the most prestigious psychology journals that reported nine experiments showing evidence that ESP is true, that you can predict the future. And it was amazing. The paper, you read it, and it is amazing. And the way it’s amazing is that it followed all of the rules of how one does research in psychology. It got P-values less than 0.05.
It designed the studies really well. It had replications built in. There’s all sorts of pieces there that you’re, like: He did it just like every other paper that we see in this journal, and simultaneously came to a conclusion that very, very, very few people were ready to accept [LAUGHS], that ESP happens.
Strogatz: Didn’t he do something where he was claiming that you could influence results in the past?
Nosek: It is a retroactive influence. So essentially — let me give you an example of the paradigm. He had a few different ways of doing it. There is a standard paradigm that’s used a lot in cognitive and social psychology, of priming. You flash a word, “bread,” and then people are faster to identify the word “butter” compared to some other word that isn’t related to bread.
Strogatz: Yeah. And I’ve heard of one where people see words associated with being geriatric and then they walk slower down the hall, or something like that.
Nosek: Right. So that’s another variation of that kind of paradigm. But the basic paradigm, the sequential priming of speeded responses to some words compared to others when there’s relationships between them, is a very well-established paradigm.
Nosek: So he just flipped the order. So, you see the word “butter” and you measure how fast can you identify the word “butter.” And then afterwards you either flash the word “bread” or some other word. And what he found is that if you flash “bread” after then people were faster to identify “butter” than if you flashed some other word afterward.
Strogatz: [LAUGHS] Wait, hold on now.
Nosek: Exactly [LAUGHS].
Strogatz: I didn’t — what? Wait a second. Did I hear you right? First you give me — it’s not the matter of that people say “bread” before “butter”?
Nosek: That’s right.
Strogatz: That’s not what we’re talking about. It’s that you’ve shown me “butter,” the word, and then in the — tell me again.
Nosek: [LAUGHS] It is confusing because it doesn’t make any sense, right? In the normal version of the paradigm, you see “butter” flashed very briefly and then you can identify the word “bread” faster than some other word, or if “butter” hadn’t been flashed.
Strogatz: And that’s then shown to you later?
Nosek: Yeah. And that makes sense, right? Because those two are —
Strogatz: Right. Sure.
Nosek: So the idea is that by activating the word “butter,” then things related to butter are easier to recognize or easier to respond to.
Strogatz: Sounds reasonable, yeah.
Nosek: His paradigm flipped it. So, you first present the word “bread,” and measure how fast people respond to it. And then afterwards you either flash “butter” or something else. But what he observed was that you can identify “bread” faster if later you’re going to see the word “butter.” [LAUGHTER]
Strogatz: Okay. I did hear you right. That’s what I thought you said.
Nosek: Right. It’s fantastical, right? It’s like, wow, if that is true, our whole understanding of lots of things changes.
Strogatz: Well, causality. So if I can say in my way, having shown me “butter,” it meant that in the past I was going to be likely to say “bread” faster than I would have otherwise.
Strogatz: So it really sounds contrary to, if you want to use the fancy statistical talk, our prior about the way we think the world works.
Nosek: That’s right. That’s right. But he had nine experiments. And eight of the nine experiments had significant evidence of this retroactive influence on our behavior. And it followed all the rules, right? He is a respected social psychologist. He did the experiments in ways that we recognize as the way you do experiments. He did the analysis in ways that we recognize as the way you do analysis. And he came to this conclusion. And so the journal that published it, which is one of the most prestigious journals said, “Well, we felt like we had to publish it. It followed all the rules.”
Strogatz: [LAUGHTER] Yeah. I shouldn’t giggle. I mean, it’s —
Nosek: No. It was amazing.
Strogatz: It’s like it’s not totally impossible.
Nosek: Right. But —
Strogatz: Although it sort of, commonsensically, it is.
Nosek: That’s right. And so that’s the confrontation: it’s so unlikely that of course we would demand very, very strong evidence. But he got it so consistently, and it seemed like, okay, now we have a situation in our field where we have to make a choice. He did this following all of our rules, so either we need to now believe in ESP the way he described it or we need to question our rules.
Strogatz: And what’s the third thing that happened in 2011?
Nosek: Yeah. So, the third one is a paper that’s called “False Positive Psychology.” What’s powerful about it is as a rhetorical device. So, essentially what they wanted to do was provide a sort of concrete experience that people would recognize for how these various decisions that we talked about earlier, deciding when to stop or continue data collection, deciding what variables to exclude, deciding some of the specifics of how you analyze your data, what the consequences of that really are for the likelihood of getting false positive results.
And so this is a perfect follow on to the Bem paper, because there are all sorts of things that one could have done in analyzing that data to get to what seemed like significant results that we need to take seriously.
But what was brilliant about this paper was that it didn’t break new ground, saying something that people hadn’t identified as a risk. What it did was frame it in a way that every reader of the paper could see themselves doing those behaviors. “Oh yeah, of course I do that. Of course I think about how to combine the variables. Of course I have to make decisions about exclusion rules.” And so they walked the reader through an example of those decisions and then laid out the statistical consequences: if you did these different behaviors, then instead of a nominal 5% false positive error rate (which is sort of what we tolerate; we know that we’re going to get things wrong some of the time, but let’s keep that relatively low), they were showing that it could climb very rapidly up to a 60% or more false positive rate from these very simple behaviors that we can all recognize.
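The inflation Nosek describes is easy to reproduce in simulation. The sketch below is my own illustration in the spirit of the “False Positive Psychology” argument, not the paper’s actual analyses: under a true null hypothesis, a researcher who tries two outcome measures and also peeks at the data, adding subjects when the first look isn’t significant, declares a “significant” result far more often than the nominal 5%.

```python
# Simulate two common "researcher degrees of freedom" under the null:
# (1) testing two outcome measures and keeping the better p-value,
# (2) optional stopping (add subjects and re-test if not significant).
# The sample sizes and alpha here are illustrative choices.

import math
import random

def p_value(sample):
    """Two-sided p-value for mean == 0, known unit variance (z-test)."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))

def flexible_study(rng, n1=20, n2=10, alpha=0.05):
    """One null study with two outcomes plus optional extra data."""
    a = [rng.gauss(0, 1) for _ in range(n1)]  # outcome 1 (pure noise)
    b = [rng.gauss(0, 1) for _ in range(n1)]  # outcome 2 (pure noise)
    if min(p_value(a), p_value(b)) < alpha:
        return True
    # Not significant yet: collect more subjects and test again.
    a += [rng.gauss(0, 1) for _ in range(n2)]
    b += [rng.gauss(0, 1) for _ in range(n2)]
    return min(p_value(a), p_value(b)) < alpha

rng = random.Random(0)
trials = 20_000
hits = sum(flexible_study(rng) for _ in range(trials))
print(f"false positive rate: {hits / trials:.3f}")  # well above 0.05
```

With just these two flexibilities the rate roughly doubles or triples; stacking on more of them (more outcomes, covariates, exclusion rules, more looks at the data) is how the paper’s combined scenario pushed it past 60%.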
Strogatz: Is this example something that you want to tell us about or do you think we should leave it aside?
Nosek: The example itself was showing that listening to the song “When I’m 64,” by the Beatles, makes you younger [LAUGHTER], makes you chronologically younger.
Strogatz: Yeah? I’m going to go out and get one right now. [LAUGHTER]
Nosek: So they reported the analysis of the data and showed that, in fact, they found a relationship. That people who listen to “When I’m 64” were chronologically younger than people who didn’t.
So then what they did was say, okay, let us show you the things that we left out of what we reported in the paper. And so then they said, we analyzed it these four ways, and this is the one way where we saw that result. We tried these different exclusion rules, and this is the one that we used. So they showed all of those decisions that they had made, and that they had selectively picked the one way of analyzing the data that produced the result they observed.
Strogatz: So it’s almost like a voyeuristic study. Like, let’s do this — did they actually collect any data, or was this sort of synthetic data?
Nosek: It was real data.
Strogatz: It was real data. Oh. So then they sort of walk you through and, like, you’re the voyeur looking at these psychologists analyzing it a certain way. And you’re thinking, uh-oh, look what just happened: they got this spuriously significant result, this fake low P-value.
Nosek: That’s right. No, that’s exactly the scenario. And the part of it that makes it such a rhetorically brilliant paper is that you’re being a voyeur, but you’re really seeing yourself in their behavior.
Strogatz: Like, “Do not open that door! Why are you putting your hand on that doorknob? Do not go down the cellar. Are you crazy?” [LAUGHTER]
Nosek: Yeah. Exactly. Right.
Strogatz: You know, that you relate to the character in the movie. And so it’s like: “What is this psychologist doing in this study? Don’t do that statistical test. What are you doing?” [LAUGHS]
Nosek: Right. Right. And then at the end, you’re like, “Oh my god, it was me the whole time. I’m the monster!” Right?
Strogatz: [LAUGHTER] That’s when the phone starts ringing. I’m in the house. Okay — but, wow. So this is the year 2011. These three things. The fraudster, the retroactive influence of Daryl Bem, and then this false positive psychology paper that everyone can relate to. Did it make you do something in 2011?
Nosek: All of these were events that made it a whole lot easier to turn the growing interest in improving reproducibility and the credibility of our findings into a field-wide discussion. There were very wide, diverse points of view on that. Some people said, “No, it’s not a problem. The peer-review process and sort of the things that researchers actually do take care of all that. These are unusual cases.” Other people were sort of ringing the death knell of our field: “We can’t trust any of it! All of it might be hogwash.”
But it was all conceptual debates. No one actually had any really grounded evidence. It was all anecdote, right? “This paper didn’t replicate.” “Oh, well of course that paper didn’t replicate and blah blah blah blah blah.”
And so what we decided to do was say, “Well, let’s actually get some evidence. This is a scientific question. We can study it. Is our field replicable or not?” And of course, to really answer that question would require a project as big as the field already is, and that’s not going to happen.
But what we thought we could do was try to organize people who were interested in this question to conduct a study of a sample of papers and see if we could replicate the key finding from some number of studies. And so that’s what started the reproducibility project in psychology: we identified three journals and one year of articles, 2008, and what we were going to try to do was grab a sample of those articles, take an experiment from each of them, and see if we could do an independent replication.
Strogatz: Of the original hundred papers, how many reported statistically significant findings to begin with?
Nosek: So of the hundred, 97%. So 97 of the hundred findings were statistically significant findings.
Strogatz: Not surprising, right?
Nosek: That’s what gets published, right? Positive results.
Strogatz: That’s what’s going to get published. Yeah. Okay. And what about when you tried to replicate?
Nosek: And of our replications, 36 were statistically significant. And so that’s a lot smaller.

And there are multiple criteria that we used, because that’s not the only way you’d want to think about whether you repeated a finding, right? Because there’s nothing magical about this line of P less than 0.05, which is the convention for deciding it’s a positive result.

And so we looked at things like how large was the effect, right? Did it shrink from the original demonstration to the replication? And on average, the replication effect sizes were about half of what the original effect sizes were.
Strogatz: So that term is a little abstract to me. What would be an example of an effect size that I could picture?
Nosek: You could think about it in terms of drugs — it’s very easy to see there, right? Does this drug reduce my cold symptoms by cutting in half the time it takes me to recover, or by just an hour?
Strogatz: Okay. I see.
Nosek: I’m going to get an hour of feeling better? Great. No, I’d rather get a few days of feeling better.
Strogatz: So the effect sizes in the attempted replications were only about half as big as the originally reported ones?
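For concreteness, the “effect size” compared across original and replication studies is usually a standardized mean difference such as Cohen’s d: the gap between two group means measured in units of their pooled standard deviation. A minimal sketch, with made-up recovery-time numbers in the spirit of Nosek’s cold-remedy example (nothing here is from any actual study):

```python
import math

def cohens_d(xs, ys):
    """Standardized mean difference: how far apart two group means are,
    in units of their pooled standard deviation."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled_sd

# Hypothetical recovery times in days: placebo group vs. treated group.
treated = [4.0, 5.0, 4.5, 6.0, 5.5, 4.0]
placebo = [6.0, 7.0, 6.5, 8.0, 7.5, 6.0]
print(round(cohens_d(placebo, treated), 2))  # prints 2.45
```

“Effect sizes shrank by half on replication” then means exactly this number, not the P-value, dropping to about half its originally reported value.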
Strogatz [narration]: Hmm. Some people, you know, on the opposite side of the work that he’s doing, refer to people like him as “replication Nazis.” This is very, very threatening to some of the established researchers, and also to some of the young researchers. If you can imagine, you’re doing work and someone decides, “We think your work is really rubbish and we’re going to try and replicate it and, guess what? It doesn’t replicate,” and maybe your reputation is destroyed as a result, and maybe your life’s work, the thing you dreamed of. You dreamed of being a psychologist, and we’ve just destroyed that for you now. You know, the stakes are high.
And that’s the question: Is Brian, who has been one of the leaders in this movement to address the replication crisis, is he a replication Nazi? Is he just a fair player who’s just trying to do what’s right for his field? I came away feeling like he seemed reasonable. You know, like I was talking to a fair player.
Strogatz: What can you tell us about the phenomenon of replication bullies? Your own take on it?
Nosek: It’s all tangled up in social relations and power structures, and jealousies and feelings of resentment over who gets rewarded and who doesn’t, and how fair or just the system is. How is it that these people who are in these fabulous jobs made their careers on findings that are not replicable? That’s not fair. And then others saying, how is it that we get singled out and targeted for reasons that don’t seem appropriate? And certainly the strength of the reaction is out of scale with what they perceive the errant behavior to be.
So it has had very much its fair share of contentiousness. And it’s hard to say that there are innocent actors in all of it, right?
Everybody has contributed in some way to things that add heat instead of light. Some of that is unavoidable. This is a challenging transition for our discipline. And there are going to be… In every competitive domain there are losers in the sense that some people get rewards and others do not. And the nature of how those rewards are being distributed is shifting and uncertain.
Strogatz: I mean, this is a huge moment for psychology. A lot of these scientists are feeling like the rug is being pulled out from under them. They don’t know what to believe or how to move forward. But one of the things that Brian has been pushing for as a solution to the replication crisis is something called pre-registration.
Nosek: So pre-registration refers to making your plan in advance of observing outcomes. So we have an idealistic model of how science works that we learn in grade school, which starts with: you observe things in the world and then you generate a hypothesis about what you think might be happening or why you think that might be happening. And then you design a study to say, well, let’s test my hypothesis to see if it actually happens that way. And then you collect some data, and then you evaluate whether that data is consistent with your hypothesis or not, and then you have this sort of iterative cycle. “Oh, and now I’ve learned something. And now I revise what I think is happening in the world and then we’ll try a new study.”
Well, part of that idealized structure is this notion of pre-commitment. The idea that I have some idea in advance that I want to test with data when I observe it in the future. And the purpose of that pre-commitment is to — well, there’s multiple purposes, but a core one is essentially keeping us honest. What is it that we’re trying to test?
And we want to observe data that then may confront us with our misunderstanding or our correct prediction about what was going to happen in the world. So, pre-registration is that act of pre-commitment.
Nosek: It’s “Here’s a study that I’m going to do. Here’s how I’m going to analyze the data. Here’s the methodology that I’m going to apply.” And then I can compare back against what I thought before to what I learned now.
Strogatz: You’re saying if you want to test a hypothesis and try to be rigorous, you should stick your neck out, say what you’re testing, put it in print, pre-register in that sense. That’s the point, right? It’s a way of being disciplined and avoiding maybe fooling yourself and avoiding fooling others.
Nosek: Yeah. That’s right. Yeah. And I think that phrase is a very apt one. It’s not a notion of dishonesty, like we can’t trust our researchers to try to figure it out. It’s really broader than that. We can’t trust ourselves, because we are so adept at generating explanation once we see outcomes, right?
And so human reasoning biases: confirmation bias, taking information that’s more consistent with our prior beliefs as more diagnostic than information that’s less consistent with our prior beliefs. Hindsight bias: once I see an outcome, it’s like, oh, of course I would have anticipated that that’s what we would observe. Of course that’s the idea. Outcome bias: it’s hard to imagine other paths that could have been true once we know what the outcomes were. So all of that stuff that happens for humans in just reasoning about the world happens for scientists as they do science. And so if we really want confrontation of our ideas and to test our predictions — do they actually predict the future? — then we have to make those pre-commitments.
Strogatz: Do you feel like you’re a wave? Your movement, that is, the field — like, let’s say younger researchers, are they all pre-registering?
Nosek: Yeah. There has been a dramatic shift in research practices within psychology, and that is extending to other areas of the social and behavioral sciences. And then, also, now leading, I think, to changes in the life sciences and elsewhere. We haven’t yet done the studies to see if all of these efforts that so many people have been making are paying off. Are we actually accelerating the progress in science and psychology? But there is a real, palpable sense that that’s part of what we’re here to do. We’re not just churning out new results. We’re also going to be constantly questioning and working to improve how it is we get those results, so that we can be confident and make the best progress that we can.
Strogatz: Next time on “The Joy of x,” Brian Keating searches for the very beginning of the universe.
“The Joy of x” is a podcast project of Quanta Magazine. We’re produced by Story Mechanics. Our producers are Dana Bialek and Camille Petersen. Our music is composed by Yuri Weber and Charles Michelet. Ellen Horne is our executive producer. From Quanta Magazine, our editorial advisors are Thomas Lin and John Rennie. Our sound engineers are Charles Michelet, and at the Cornell University Broadcast Studio, Glen Palmer and Bertrand Odom-Reed, though I know him as Bert. I’m Steve Strogatz. Thanks for listening.
[End of Audio]